A topic (experience, test) to share with you, for ARM processor and system "experts", esp. in terms of the ARM instruction set, MPU and relocatable code...
Question first:
Does anybody know how to mix long_calls with short_calls in C/C++ code?
It means: how to declare, define and call functions which are too far away from the calling code to be reached via a PC-relative branch, as used in position-independent code (PIC)?
My intention: I want to load code into external SDRAM (visible at address 0x60000000) and execute it from SDRAM (e.g. for "Apps" loaded from QSPI or SD card).
OK, here is what I think we have to bear in mind when we want to implement this:
- Obviously, the SDRAM has to be initialized, e.g. via:
SDRAMClass sdram;
sdram.begin();
- But I am not sure if code execution is allowed and possible on SDRAM. In the default ARMv7-M memory map, the external RAM range 0x60000000 ... 0x7FFFFFFF is "normal" memory and executable. But mbed, an RTOS etc. could configure the MPU and disable code fetch and execution there. This is actually recommended when the external SDRAM is not (yet) available: it avoids speculative code fetches and cache line fills from unavailable memory, which otherwise can end in the HardFault_Handler (which I had for several days). So, consider setting up an MPU entry for the SDRAM region.
- Place data and code on SDRAM: OK, you can use __attribute__((section(...))) to do so.
- Extend the linker_script.ld: you have to define sections there so that the code is linked to SDRAM addresses, while the code and data are kept as a "copy" in FLASH ROM (I use this for a simple test: my SDRAM functions sit as a copy in FLASH ROM, just to make it easier).
- Have startup code: before the first use, even with SDRAM initialized, the content (data and code = text) has to be written (copied) to SDRAM. Have code (as an assembly file) to "move" it from the other location (FLASH ROM) into the SDRAM region.
- Very important: the generated code is position-independent in the sense that branches and function calls are encoded "relative to the PC value" (see also compiler options such as -fPIC or -mlong-calls / -mno-long-calls). Due to this, the called function cannot be arbitrarily far away: a Thumb-2 BL reaches about ±16 MB (the older 16-bit Thumb BL only about ±4 MB). Calling a function located in FLASH ROM (0x08000000) from SDRAM (0x60000000) is TOO FAR AWAY for a relative Thumb branch! The toolchain will complain (the linker did so often, with "relocation truncated to fit"). So, you need "helper functions", called "trampoline functions" or "veneers", to extend such a call over a larger distance (mainly by loading a 32-bit address into a register and jumping via "next PC is register content").
I have it working (sharing the files as attachments here). The biggest problem is to deal with this PC-relative code and the distance being larger than the branch range. The only solution I have: write assembly code which will "extend" the call from the SDRAM region to FLASH ROM, in order to call a function available there.
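To make the distance problem concrete, here is a small sketch that checks whether a PC-relative BL could reach a target. It assumes the Thumb-2 BL encoding (signed 25-bit byte offset, roughly ±16 MB, relative to the BL address + 4); the function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed reach of a Thumb-2 BL: signed 25-bit, halfword-aligned byte offset.
constexpr std::int64_t kBlMinOffset = -16777216;  // -16 MB
constexpr std::int64_t kBlMaxOffset = 16777214;   // +16 MB - 2

// Offset the BL instruction would have to encode (PC reads as BL address + 4).
constexpr std::int64_t branch_offset(std::uint32_t bl_address, std::uint32_t target) {
    return static_cast<std::int64_t>(target) -
           (static_cast<std::int64_t>(bl_address) + 4);
}

// True if a direct, PC-relative BL can reach the target.
constexpr bool bl_reachable(std::uint32_t bl_address, std::uint32_t target) {
    return branch_offset(bl_address, target) >= kBlMinOffset &&
           branch_offset(bl_address, target) <= kBlMaxOffset;
}

// SDRAM -> FLASH ROM is far out of range, SDRAM -> SDRAM is fine.
static_assert(!bl_reachable(0x60000000u, 0x08000000u), "needs a veneer");
static_assert(bl_reachable(0x60000000u, 0x60000200u), "direct BL is enough");
```

The gap between 0x60000000 and 0x08000000 is about 1.4 GB, so no PC-relative branch can bridge it and the call has to go through a register.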
Here are the code files and some remarks:
Have a function which copies code and data from FLASH ROM to SDRAM (after SDRAM init was done):
ITCM_Startup.h (401 Bytes)
ITCM_Startup.cpp (1.8 KB)
ITCM_Startup.cpp is actually an assembly file and should be named ITCM_Startup.S!
These files have a function SDRAM_Startup() and functions to "extend" (forward) calls from SDRAM to FLASH ROM (where the regular FW is).
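What SDRAM_Startup() has to do can be sketched in C++ as a plain word copy from the load address (the copy in FLASH ROM) to the run address (SDRAM). This is only an illustration, not the attached assembly code; on the target, the three pointers would come from symbols defined in the linker script (their names are not shown here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy one section, word by word, from its load address to its run address.
// Returns the number of 32-bit words copied.
inline std::size_t copy_section(const std::uint32_t* load_start,
                                std::uint32_t* run_start,
                                std::uint32_t* run_end) {
    assert(run_start <= run_end);
    std::size_t words = 0;
    while (run_start < run_end) {
        *run_start++ = *load_start++;
        ++words;
    }
    return words;
}
```

If the caches are enabled, it is probably also necessary to clean the data cache and invalidate the instruction cache after copying code (CMSIS offers SCB_CleanDCache() and SCB_InvalidateICache() for this), otherwise the core may fetch stale content.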
These files are associated with the linker_script.ld:
linker_script.h (3.9 KB)
The file name is linker_script.ld! Place it under your project or potentially under the LIB reference file path (used from there, not sure), e.g.: C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\hardware\mbed_portenta\3.0.1\variants\PORTENTA_H7_M7
And you have to bear in mind: assembly code does not "find" C++ functions. So, you have to declare functions used from assembly code as "plain C functions" (no "name mangling"). Also, if you want to branch in assembly code to a function: the called function should be "plain C".
This linker script has the definition of the sections, used via attribute, e.g. the code in C/C++ looks like this:
char SDRAM_String[80] __attribute__((section(".sdramdata"))) = { "Hallo SDRAM\r\n" };
Remark: as a new code section name, do NOT use .textsdram! I use .sdramtext because the linker script already has a rule *(.text*), which would still place all of .textsdram in FLASH ROM (the second * is a wildcard for any suffix, so .textsdram is treated the same as .text - in FLASH ROM!).
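The naming trap can be illustrated with a tiny wildcard matcher. It assumes the existing rule in the linker script is the common *(.text*) input pattern; the matcher is a simplified stand-in for what the linker does, not its real implementation.

```cpp
#include <cassert>

// Minimal wildcard match: '*' matches any run of characters (including none).
inline bool glob_match(const char* pattern, const char* name) {
    assert(pattern != nullptr && name != nullptr);
    if (*pattern == '\0') return *name == '\0';
    if (*pattern == '*') {
        // Either '*' matches nothing, or it swallows one more character.
        return glob_match(pattern + 1, name) ||
               (*name != '\0' && glob_match(pattern, name + 1));
    }
    return *pattern == *name && glob_match(pattern + 1, name + 1);
}
```

With this, ".text*" matches ".textsdram" (so it would end up in FLASH ROM), but it does not match ".sdramtext", which is why the prefix-style name is the safer choice.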
Configure the MPU: even if it does not seem to be necessary to configure an MPU entry for the SDRAM region (0x60000000, 128MB), you can (and should) do it:
/* configure the MPU - allow code execution on SDRAM - it does not seem to be needed */
print_log(UART_OUT, "old MPU setting\r\n");
hex_dump((char *)0xE000ED90, 0x30, 4, UART_OUT);
{
#define MPU_SDRAM_EXEC_REGION_NUMBER MPU_REGION_SDRAM1
#define MPU_SDRAM_REGION_TEX (0x4 << MPU_RASR_TEX_Pos) /* Cached memory */
#define MPU_SDRAM_EXEC_REGION_SIZE (22 << MPU_RASR_SIZE_Pos) /* 2^(22+1) = 8MB */
#define MPU_SDRAM_ACCESS_PERMSSION (0x03UL << MPU_RASR_AP_Pos)
#define MPU_SDRAM_REGION_CACHABLE (0x01UL << MPU_RASR_C_Pos)
#define MPU_SDRAM_REGION_BUFFERABLE (0x01UL << MPU_RASR_B_Pos)
MPU->CTRL &= ~MPU_CTRL_ENABLE_Msk;
/* Configure SDRAM region as first region */
MPU->RNR = MPU_SDRAM_EXEC_REGION_NUMBER;
/* Set MPU SDRAM base address (0x60000000) */
MPU->RBAR = SDRAM_START_ADDRESS;
/*
- Execute region: RASR[size] = 22 -> 2^(22+1) -> size 8MB
- Access permission: Full access: RASR[AP] = 0b011
- Cached memory: RASR[TEX] = 0b0100
- Disable the Execute Never option: to allow the code execution on SDRAM: RASR[XN] = 0
- Enable the region MPU: RASR[EN] = 1
*/
MPU->RASR = (MPU_SDRAM_EXEC_REGION_SIZE | MPU_SDRAM_ACCESS_PERMSSION | MPU_SDRAM_REGION_TEX | \
MPU_RASR_ENABLE_Msk | MPU_SDRAM_REGION_BUFFERABLE) & ~MPU_RASR_XN_Msk; /* XN stays cleared so that code execution is possible */
/* Enable MPU and leave the predefined regions to default configuration */
MPU->CTRL |= MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk;
__DSB(); /* we should use barriers to make sure all updated */
__ISB();
}
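The magic number 22 in the size field follows from the ARMv7-M rule that RASR.SIZE = n describes a region of 2^(n+1) bytes. A small helper (the names are made up for this sketch) computes the field from a byte count and checks that the base address is aligned to the region size, which the MPU also requires:

```cpp
#include <cassert>
#include <cstdint>

constexpr bool is_power_of_two(std::uint32_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// RASR.SIZE value for a region of region_bytes (power of two, at least 32 bytes).
inline std::uint32_t mpu_size_field(std::uint32_t region_bytes) {
    assert(is_power_of_two(region_bytes) && region_bytes >= 32u);
    std::uint32_t log2 = 0;
    while ((1u << log2) < region_bytes) ++log2;
    return log2 - 1u;  // region size is 2^(SIZE+1)
}

// The region base address must be a multiple of the region size.
constexpr bool mpu_base_aligned(std::uint32_t base, std::uint32_t region_bytes) {
    return (base & (region_bytes - 1u)) == 0;
}
```

For the 8 MB used above, mpu_size_field(8u * 1024u * 1024u) gives 22, and 0x60000000 is a valid base for that size.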
So, the user code looks like this:
extern "C" {
void SDRAM_Startup(void); /* initialize: copy code and data to SDRAM */
int SDRAM_FuncL(int i); /* a LONG_JUMP call, via FLASH ROM */
int SDRAM_Func(int i); /* the function as assembly code for SDRAM */
int SDRAM_Func2(int i); /* another function defined as C-code for SDRAM */
void SDRAM_UARTSend(const char* str, int chrs, EResultOut out); /* forward to UARTSend() */
}
char SDRAM_String[80] __attribute__((section(".sdramdata"))) = { "Hallo SDRAM\r\n" };
#ifdef THIS_FAILS
////#pragma push /* this generates a very strange error: "instantiation of mbed::Callback<(void()>" */
////#pragma long_calls
/* this causes an error "relocation truncated" because the PC-relative (relocatable) call is not possible
* how to define a C/C++ function placed on SDRAM??
*/
int __attribute__((section(".sdramtext"), long_call)) SDRAM_Func2(int i) {
return i * 10;
}
////#pragma short_calls
////#pragma pop
#endif
int SDRAM_Func2(int i) {
return i * 10;
}
typedef int (*pFunc_t)(int); //define a type for function pointers
And inside a regular function I do this (before I call a function on SDRAM):
SDRAM_Startup(); /* copy .data and .text to SDRAM! */
print_log(UART_OUT, "new MPU setting\r\n");
hex_dump((char*)0xE000ED90, 0x30, 4, UART_OUT);
print_log(UART_OUT, "SDRAM INIT result\r\n");
hex_dump((char*)0x60000000, 64, 4, UART_OUT);
print_log(UART_OUT, "SDRAM data : %lx\r\n", (unsigned long)SDRAM_String);
print_log(UART_OUT, "SDRAM func long : %lx\r\n", (unsigned long)&SDRAM_FuncL);
print_log(UART_OUT, "SDRAM func address : %lx\r\n", (unsigned long)&SDRAM_Func);
print_log(UART_OUT, "SDRAM func2 address: %lx\r\n", (unsigned long)&SDRAM_Func2);
strcpy(SDRAM_String, "NEW STRING!\r\n");
int i = 10;
i = SDRAM_FuncL(i);
print_log(UART_OUT, "##1 result is: %d\r\n", i);
i = SDRAM_Func2(i);
print_log(UART_OUT, "##2 result is: %d\r\n", i);
hex_dump((char*)0x60000000, 64, 4, UART_OUT);
{
/* this works! but be careful when moving relocatable code to SDRAM: can fail */
unsigned long* memP1 = (unsigned long*)0x60000200; //SDRAM address for code (the SDRAM_Func2)
unsigned long* memP2 = (unsigned long*)(((unsigned long)&SDRAM_Func2) & 0xFFFFFFFEul); //ATT: it is thumb code! odd address!
pFunc_t fPtr = (pFunc_t)0x60000201; //ATT: it must be odd, for thumb code!
print_log(UART_OUT, "##3 copy function to SDRAM\r\n");
memcpy(memP1, memP2, 80);
//try to call now the function on SDRAM
i = 20;
i = fPtr(i);
print_log(UART_OUT, "##3 result is: %d\r\n", i);
}
/* call function forwarded to FLASH ROM */
SDRAM_UARTSend("Hallo from SDRAM\r\n", 18, UART_OUT);
SDRAM_UARTSend(SDRAM_String, 13, UART_OUT);
So: the data, this SDRAM_String, is not an issue. Just do NOT use (NOLOAD) in the linker script: then this string is copied from FLASH ROM to SDRAM and becomes available on SDRAM. With (NOLOAD) in the linker script, the space is allocated but the global init value (e.g. the string) is not: it does not sit in FLASH ROM as a copy for the SDRAM content.
Just for all the function calls: I have not found a way to declare a C/C++ function as a long_call. Often, the toolchain complains with "relocation truncated" or even a very strange error message such as "callback mbed::...".
So, I forward all the long_calls from SDRAM via assembly code helper functions to FLASH ROM. Not so nice, but I have no idea yet how to cope with it otherwise.
PIC code instructions
When you study the ARM instruction set, you will see that most of the instructions for branching and calling a function are "PC-relative". It means: the code will branch to or call an address as "current-PC-value + offset". The beauty: it is position-independent. The same code can be moved and located somewhere else and it still works (there are no absolute references to where it is located or where the calling code is located, as long as the target stays at the same distance relative to the caller).
This does not work anymore when code is executed on SDRAM (0x60000000) and should call a function in FLASH ROM (0x08000000): it is "too far away" for relative addressing. So, you have to "extend" the call: you need a "veneer" function where the full 32-bit address is loaded into a register and then branched to (see assembly code).
OK, fine, but: I need a free register (here I use R3) to do so. This limits the maximum number of register parameters for a function call (normally max. 4: R0..R3). Now I need R3 free for this "long_jump" and I can have only 3 parameters on a function call.
I have not (yet) found a way to declare "long_calls" and to forward to a FAR address without losing a temporary register just to do the "long_jump".
How to mix "long_calls" with "short_calls" (the default)? How to get rid of these "relocation truncated" error messages without writing so many assembly code handlers?
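One possible direction, offered only as an untested sketch: a call through a function pointer is an indirect call, so the compiler typically emits a register branch (BLX with a register) instead of a PC-relative BL. The distance then does not matter, and the compiler picks the scratch register itself, so all four argument registers stay available. The function fw_sum4() below is a hypothetical stand-in for a firmware function in FLASH ROM.

```cpp
#include <cassert>

// Hypothetical firmware function, imagined to live in FLASH ROM.
extern "C" int fw_sum4(int a, int b, int c, int d) { return a + b + c + d; }

using Sum4Fn = int (*)(int, int, int, int);

// volatile keeps the compiler from turning this back into a direct call.
static Sum4Fn volatile g_fw_sum4 = &fw_sum4;

// Code placed in SDRAM would call the firmware only through the pointer.
inline int call_far_sum4(int a, int b, int c, int d) {
    Sum4Fn fn = g_fw_sum4;  // loads the full 32-bit address
    assert(fn != nullptr);
    return fn(a, b, c, d);  // indirect call, four arguments intact
}
```

For hand-written veneers, the ARM procedure call standard (AAPCS) designates R12 (IP) as the intra-procedure-call scratch register, so loading the target address into R12 instead of R3 should leave R0..R3 free for parameters.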
THUMB code
Also to bear in mind: our CM7 uses THUMB instructions, not ARM instructions. It means: bit 0 of every address loaded into the PC, e.g. when you call a function via a pointer, is set to 1. The address looks "strange": it is odd. If you print the address of a function, it is odd (bit 0 is set).
It is mandatory to keep it (as an odd address): bit 0 as 0 would mean "change to the ARM instruction set (away from THUMB)", and this will FAIL on CM7 (CM3, CM4).
The compiler properly generates THUMB function addresses (bit 0 set). But when you handle code addresses in assembly code, make sure entry addresses are always odd (+1).
So: even though the compiler already generates proper odd addresses for function calls, I make sure that bit 0 of a function address remains set (not for data! data still uses the "real" address).
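The bit 0 handling can be written down as two small helpers: one strips the Thumb bit to get the real location of the instructions (e.g. as source or destination of a memcpy), the other sets it again to get an address that is safe to call. The helper names and the relocation function are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kThumbBit = 1u;

// Real location of the instructions (use this for copying code).
constexpr std::uint32_t code_address(std::uint32_t fn_address) {
    return fn_address & ~kThumbBit;
}

// Address to put into a function pointer (bit 0 set = stay in Thumb state).
constexpr std::uint32_t call_address(std::uint32_t code_addr) {
    return code_addr | kThumbBit;
}

// Function pointer value after the code was moved from old_base to new_base.
inline std::uint32_t relocated_call_address(std::uint32_t fn_address,
                                            std::uint32_t old_base,
                                            std::uint32_t new_base) {
    assert(code_address(fn_address) >= old_base);
    return call_address(code_address(fn_address) - old_base + new_base);
}
```

This is the same arithmetic as in the test code above: the copy goes to 0x60000200, the call goes to 0x60000201.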
Relocatable Code
I do something really ugly: I have a function SDRAM_Func2() which sits in FLASH ROM (0x08000000). I could not call this function directly from SDRAM (too far away). But I take the code of this function and copy it from FLASH ROM into SDRAM. Then I call this function on SDRAM (after it was copied).
It works: due to the fact that the generated code is position-independent, it works in the same way at the new location. I can call it and it is OK.
BUT: if such a copied function would call another sub-function, it will NOT work anymore. It will crash!
The sub-function call is still a "relative PC+offset call", but the sub-function is NOT on SDRAM at that offset, and the original is too far away. It cannot work.
So, even though I have a SDRAM function working: if it would call any other function, that call has to be a "long_call". Otherwise it cannot work.
So, all OK, the approach is clear, the framework (test) works. Just if I do something like this:
int __attribute__((section(".sdramtext"), long_call)) SDRAM_Func2(int i) {
return i * 10;
}
It gives me a very strange "Callback mbed::..." error. Why? This "long_call" does not seem to work.
And if I do just:
int __attribute__((section(".sdramtext"))) SDRAM_Func2(int i) {
it still complains with "relocation truncated" (SDRAM_Func2 is still too far away for a PC-relative call). How to mix long and short calls?
And: using the assembly code helper functions ("veneer code") is a work-around, but how to still have 4 parameters on function calls (without losing one register just to do the long_call)?
OK: I could push a register to the stack, branch forward, pop it back from the stack... very ugly looking code.
Never mind, the good news is: I can load code to SDRAM and execute it from there. Cool: now I can write "Apps", e.g. loaded from SD card or QSPI flash (which does not let me execute directly from it as a QSPI memory-mapped device - a "wrong" chip soldered on Portenta H7?).
But it sounds to me like I have to create a "BIOS vector table" (to extend calls from SDRAM to FLASH ROM) in assembly code, and create the "App" as an independent, self-contained project (it does not use FW and FLASH ROM functions directly); instead it uses the "BIOS vector table" to call external functions.
Not a big deal, just effort (and a somewhat complicated SW structure).
If you have any idea how to deal with long_calls and how to mix code (with short and long jumps) ... I would appreciate it (any idea how to deal with the "problem" is welcome).
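Such a "BIOS vector table" does not necessarily need assembly code. A rough sketch, with all names hypothetical: the firmware exports a struct of function pointers, and the "App" gets only the address of this table and calls everything through it (indirect calls, so the distance between SDRAM and FLASH ROM does not matter).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical service table exported by the firmware (lives in FLASH ROM).
struct BiosTable {
    std::uint32_t version;
    void (*uart_send)(const char* str, int chars);
    int (*times_ten)(int value);
};

// Firmware side: stand-in implementations for this sketch.
static int g_sent_chars = 0;
static void fw_uart_send(const char*, int chars) { g_sent_chars += chars; }
static int fw_times_ten(int value) { return value * 10; }

static const BiosTable g_bios = {1u, &fw_uart_send, &fw_times_ten};

// App side: knows only the table layout, not the firmware addresses.
inline int app_main(const BiosTable* bios) {
    assert(bios != nullptr && bios->version == 1u);
    bios->uart_send("Hallo from App\r\n", 16);
    return bios->times_ten(4);
}
```

On the target, the firmware would pass &g_bios to the App entry point (or place the table at an agreed fixed address); the version field is there so that an App can detect a table layout it does not understand.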
Thank you.
Debug Hooks
Due to the fact that Portenta H7 does not really have a debugger (just this bootloader), it is a bit difficult to debug. I cannot, but it would be nice to, connect an ST-Link debugger and trace my code. Especially I want to know when the HardFault_Handler is called, what the processor PSPR register value is etc. (where the fault trap was generated).
OK. Ideas (I use):
- After compiling, check the generated *.MAP file (in the project/Debug directory). It gives me a clear indication whether the linker, via linker_script.ld, has properly placed the code, e.g. in SDRAM. Also, where to find the copy in FLASH ROM, which is copied from there to SDRAM by SDRAM_Startup().
- I have created a hex_dump() function (see my example code, the implementation is not provided): I can print any MCU memory region, also registers, e.g. the MPU registers. This is really helpful to debug: I can see what is inside the memory and registers of the MCU. (I use a UART command line and "hd" is a command to 'inspect' any memory.)
- I could also have a UART command to write any memory or register with a value, e.g. "wr ...". This could help me to try something from the command line, w/o changing and compiling my code again.
The message is: if you have (and use) a UART, have a simple "monitor program" to get access to memory and registers from a command line. It makes it easier to try something in interactive mode, without compiling the code again and seeing it crash.
The compile time for Arduino, Portenta H7... is anyway so "bad" (long) that it is faster to debug via a UART command line. A hex_dump function from the command line is my "biggest friend" meanwhile.
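Since the hex_dump() implementation is not attached, here is a minimal sketch of how such a function could look. It is not the original: the name, the signature and the output format (address prefix, four 32-bit words per line) are assumptions, and it writes into a string instead of the UART so that it can be tried on a PC as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Format `bytes` bytes at `data` as 32-bit hex words, four words per line.
// `shown_address` is the address printed at the start of the first line.
inline std::string hex_dump_to_string(const void* data, std::size_t bytes,
                                      std::uint32_t shown_address) {
    assert(data != nullptr && bytes % 4 == 0);
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::string out;
    char buf[16];
    for (std::size_t i = 0; i < bytes; i += 4) {
        if (i % 16 == 0) {  // start of a line: print the address
            const std::uint32_t addr =
                shown_address + static_cast<std::uint32_t>(i);
            std::snprintf(buf, sizeof buf, "%08lx:",
                          static_cast<unsigned long>(addr));
            out += buf;
        }
        std::uint32_t word;
        std::memcpy(&word, p + i, sizeof word);  // safe for any alignment
        std::snprintf(buf, sizeof buf, " %08lx",
                      static_cast<unsigned long>(word));
        out += buf;
        if (i % 16 == 12 || i + 4 == bytes) out += "\r\n";  // end of line
    }
    return out;
}
```

On the target, the returned string would simply be sent out via the UART; dumping the MPU registers would then be something like hex_dump_to_string((const void*)0xE000ED90, 0x30, 0xE000ED90).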