Portenta H7 dual core shared memory

Hello,

I have been using the STM32H747 (not the Portenta H7, my own board) through STM32CubeIDE.
In my project, I am using shared memory to communicate between the two cores.
My application processes data under strict timing constraints, so I need shared memory for communication.

Does Arduino support a shared memory feature for the Portenta H7?

I'm not sure if this helps, but it is possible to pass data between the cores using RPC. See the example in the link below:

https://www.st.com/resource/en/application_note/an5617-stm32h745755-and-stm32h747757-lines-interprocessor-communications-stmicroelectronics.pdf. To sum up what is written there: you can use shared memory between the two cores, since they have the SRAMx regions in common. In my project I use DMA requests to write to shared memory, and I read from it on CPU1 or CPU2. I trigger the TC (Transfer Complete) interrupt on the NVIC of the CPU that is supposed to read from memory, to tell it to read.

Best of luck

My comments on dual core (Portenta H7) and shared memory:

First of all:
Both cores can see the same system memories. You can share data between the cores just by using the same memory location (same address), e.g. in SRAM.
Only the DTCM of the CM7 is not accessible from the CM4; every other memory is already shared between both (see the bus matrix).

The only "problem":

  • how to make the access atomic (only one core accesses the shared memory at a time)?

  • how to inform the other core that one core has updated the memory?

For this, you need the RPC (Remote Procedure Call) mechanism. It is based mainly on a HW semaphore (a real register in the MCU, not part of your RTOS). This makes sure that only one core writes/reads while the other has to wait before using the same resource (the same memory). Without it, concurrent access is a race condition.

Potentially, you can let both cores access the same memory location. But there is a race condition: you do not know which core wins the write, and when a core reads, it might not get the update. Therefore, an RPC with a HW semaphore is a "nice" approach to solve this issue.

For the 'race to read', can't you just define a structure in the shared memory in which a variable tells you when you can read the buffer, after the other core is done with it?
Something like:

#include <stdint.h>

struct shared_data
{
  uint8_t  sts_buff1_or2;   // status: 1=M7toM4_1, 0=M7toM4_2
  uint16_t bufferM4_M7[10]; // buffer
};
// Fixed address in shared RAM, visible to both cores
volatile struct shared_data * const xfr_ptr = (volatile struct shared_data *)0x38001000;

It seemed to work for the person who came up with the idea:
Shared memory
It would make no sense if the fastest way to tell a core that it can read data were a method as 'slow' as RPC.
HSEM is also an option instead of RPC (advised by many STM users).

Using a software "semaphore" only works statistically. There is still a race condition, for sure. You may not hit it most of the time, but at some point it can fail. It is "not atomic" and therefore not safe.

You have two masters in the system, the CM7 and the CM4. Also consider that the CM7 can use caches. If your memory-based (software) "semaphore" (sts_buff1_or2) sits in the CM7 data cache, the CM4 cannot see the update (the backing memory is not updated yet, and the CM4 has no cache coherency with the CM7).

In a multi-master system you need a real HW semaphore. That is why the HW semaphore is part of the MCU, and why it is used by the "official" RPC mechanism (the only correct way).

Imagine this situation:
Both cores try to read the semaphore to figure out if it is free. They do it at nearly the same time (not exactly, due to arbitration on the HW bus). Both can still see it as free, and both go on to allocate it. This does not prevent both masters from grabbing the "critical section" resource at the same time.

And due to the ARM execution pipeline it is also possible that the variable is read a bit before it is evaluated by the following instruction.
There is certainly a tiny window in which this fails: both cores read the same semaphore and think it is free.

Only a HW semaphore can solve the issue: a single read access, a single instruction, already allocates the semaphore. This cannot be intercepted by another master.
But any SW-based semaphore in shared memory can be intercepted.

A "read-verify-write" is a sequence of several instructions. The other MCU core (a master; a DMA is also a master) can interleave the same sequence while the first core is in the middle of it.
It is "not atomic".


Well, correct me if I am wrong, but HSEM is the Hardware Semaphore: https://www.st.com/resource/en/reference_manual/dm00176879-stm32h745755-and-stm32h747757-advanced-armbased-32bit-mcus-stmicroelectronics.pdf p.554. So, based on what you are saying, I create no source of conflict with my register-based semaphore, and I don't see the issue with this form of communication. I agree that the user in the link used software semaphores, so it might not have been a good example, but that's where my idea comes from, and I don't plan to use SW, only HW.

Yes, HSEM is the Hardware Semaphore.
Think about the "atomic action" that is needed: a "read-check-write" on a SW-based (memory-based) semaphore is NOT safe. It is NOT an "atomic action". An atomic action must not be interrupted, interleaved or intercepted.

Two masters (cores) executing these instructions (more than one line of code, or more than one processor instruction) can intercept each other. Both can see "it is free" and use the resource at the same time. They are not in sync.

core 1          core 2
read
                read
check
                check
free
                free
write
                write

Core 2 will read before Core 1 could write to it. Both will see "free".
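
The interleaving above can be replayed deterministically on a host PC. This is only a minimal sketch, not target code: core_t, step_read, step_check_and_write and replay_race are made-up names for illustration, and the two "cores" are simply stepped by hand in the order shown in the diagram.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model: one shared flag, two "cores" stepped by hand. */
static volatile uint8_t semaphore = 0;   /* 0 = free, nonzero = taken */

typedef struct {
    uint8_t id;    /* value this core writes when it takes the flag */
    uint8_t seen;  /* value observed by the "read" step */
    bool    owns;  /* true once the core believes it holds the flag */
} core_t;

/* Step 1: read the flag into a local copy. */
static void step_read(core_t *c)
{
    c->seen = semaphore;
}

/* Remaining steps: check the local copy and, if it looked free, write our id. */
static void step_check_and_write(core_t *c)
{
    assert(c->id != 0);  /* 0 is reserved for "free" */
    if (c->seen == 0) {
        semaphore = c->id;
        c->owns = true;
    }
}

/* Replays: core1 read, core2 read, core1 check+write, core2 check+write.
 * Returns how many cores believe they own the flag. */
int replay_race(void)
{
    core_t c1 = { 1, 0, false };
    core_t c2 = { 2, 0, false };

    semaphore = 0;
    step_read(&c1);
    step_read(&c2);            /* core 2 reads before core 1 has written */
    step_check_and_write(&c1);
    step_check_and_write(&c2);

    return (int)c1.owns + (int)c2.owns;
}
```

Calling replay_race() with this ordering reports that both cores think they own the flag, which is exactly the failure described above.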

A HW semaphore (HSEM) is there to make it "atomic": access to the HSEM (a register on the bus) is handled by the bus fabric, so only one core is allowed to access it at a time. A single atomic read, as a single instruction (not C code, a real processor instruction), automatically allocates the HSEM if it is free (done in HW). This can never be interrupted by another master (core).

If you keep going with a SW-based semaphore, I would suggest checking it twice, so that you can detect whether the other core has kicked in and grabbed the semaphore "at the same time".
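
To make the contrast concrete, here is a host-side model of the "take it in one indivisible step" idea, using a C11 atomic compare-and-swap to stand in for what the HSEM peripheral does in hardware. It is only a sketch under assumptions: hsem_model_t, hsem_model_take and hsem_model_release are made-up names, and this is not the register interface of the real peripheral. On the target, the STM32 HAL provides HAL_HSEM_FastTake() and HAL_HSEM_Release() for this purpose.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical host-side model of one hardware semaphore.
 * 0 means free; any other value is the id of the owning core. */
typedef struct {
    atomic_uint owner;
} hsem_model_t;

/* One indivisible "test and take": succeeds only if the semaphore was free. */
bool hsem_model_take(hsem_model_t *s, unsigned core_id)
{
    assert(core_id != 0);  /* 0 is reserved for "free" */
    unsigned expected = 0;
    return atomic_compare_exchange_strong(&s->owner, &expected, core_id);
}

/* In this model only the current owner can release; returns false otherwise. */
bool hsem_model_release(hsem_model_t *s, unsigned core_id)
{
    assert(core_id != 0);
    unsigned expected = core_id;
    return atomic_compare_exchange_strong(&s->owner, &expected, 0u);
}
```

With this model, a second take by another core id fails until the owner has released, because the test and the write cannot be separated by the other party.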

Example:

// Core 1                        // Core 2
if (semaphore == 0)              if (semaphore == 0)
    semaphore = 1;                   semaphore = 2;
if (semaphore == 1)              if (semaphore == 2)
{                                {
    //... I got the semaphore        //... I got the semaphore
}                                }

When the other core "wins" - you might see the other value there.
But just doing "if (semaphore == 0)" is not safe!

And even this "double-check" is not safe!
What if Core2 also sees a 0 there (free), but Core2's write to allocate happens right after Core1 has done its check for "if (semaphore == 1)" (a tiny bit later)?
Core1 proceeds. But Core2 will still do its check for "if (semaphore == 2)" and will also see its own value there. It has overwritten the value of Core1, which is a bit ahead.
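
The failing order for the double-check can also be written out step by step. Again a host-side sketch only: dc_semaphore and replay_double_check are hypothetical names, and the ordering of the steps is forced by hand to match the case described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static volatile uint8_t dc_semaphore = 0;  /* 0 = free, 1 = Core1, 2 = Core2 */

/* Replays the failing order for the double-check scheme:
 *   Core1: first check   (sees 0)
 *   Core2: first check   (sees 0)
 *   Core1: write 1
 *   Core1: second check  (sees 1, proceeds)
 *   Core2: write 2       (overwrites Core1's value)
 *   Core2: second check  (sees 2, proceeds)
 * Returns the number of cores that entered the critical section. */
int replay_double_check(void)
{
    bool core1_in = false;
    bool core2_in = false;

    dc_semaphore = 0;

    bool core1_saw_free = (dc_semaphore == 0);
    bool core2_saw_free = (dc_semaphore == 0);

    if (core1_saw_free) dc_semaphore = 1;
    if (dc_semaphore == 1) core1_in = true;

    if (core2_saw_free) dc_semaphore = 2;
    if (dc_semaphore == 2) core2_in = true;

    assert(dc_semaphore == 2);  /* Core1's value has been overwritten */
    return (int)core1_in + (int)core2_in;
}
```

Both checks pass in this ordering, so both cores enter, even though each one verified "its own" value.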

There is a race condition which you cannot really solve in software. You need HW support for it.
Therefore the HSEM is there.
BTW: such HW semaphores (HSEM) are typically found on systems with two or more masters (cores). The bus matrix allows concurrent access to memory by two cores running in parallel. A C "instruction" (one line of code) is a sequence of several processor instructions, and those can be "scheduled" in any order, interleaving with each other (core1 and core2 issue their instructions on the bus in an arbitrary order).
The processor has no instruction to say "let me do SOME instructions but do not interrupt me" (on the bus fabric; this is not related to interrupts). Accesses on the bus fabric are arbitrated by the bus access logic, not by the software.

