Unsure why this code does not work on Portenta H7 when swapping cores

I have two sketches:

Sketch 1:

#include "RPC.h"

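int led;

// Bound as an RPC handler: pulses the LED for one second.
// The Portenta's built-in LEDs are active-low, so LOW turns the LED on.
void blinkLed() {
  digitalWrite(led, LOW);
  delay(1000);
  digitalWrite(led, HIGH);
}

void setup() {

  led = LEDR;

  pinMode(led, OUTPUT);
  digitalWrite(led, HIGH);  // start with the LED off

  RPC.begin();
  RPC.bind("blinkLed", blinkLed);  // expose blinkLed() to the other core

  #ifdef CORE_CM7
    bootM4();  // release the M4 core, which is held at boot on the Portenta H7
    Serial.begin(115200);
    Serial.println("Setup");
  #endif

}

void loop() {

}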

Sketch 2:

#include "RPC.h"

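void setup() {

  #ifdef CORE_CM7
    bootM4();  // release the M4 core, which is held at boot on the Portenta H7
    Serial.begin(115200);
    Serial.println("Setup");
  #endif

  RPC.begin();

}

void loop() {
  // loop() is already called repeatedly, so no inner while (true) is needed.
  Serial.println("Called blink method");
  RPC.call("blinkLed");  // ask the other core to run blinkLed()
  delay(3000);
}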

When I run Sketch 1 on the M7 and Sketch 2 on the M4, everything works perfectly, but it does not work when I upload Sketch 1 to the M4 and Sketch 2 to the M7. Is there a reason why?

My two cents (all of this is rather speculative, pieced together from what I know).
My first thought was: "if you swap the sketches, the CORE_CM7 checks have to be flipped as well". But after a closer look, the code looks reasonable.

The remaining question is simply: who creates the RPC channel (and the HW semaphore it uses), and who owns it?

As I understand it:
The RPC mechanism uses the real HW semaphores of the STM32H7xx MCU, the HSEM.
Such an HSEM has "ownership": it knows who has taken (locked) or released it. The CPU ID is used deep inside, on the bus, to tell which CPU has taken or released a semaphore.

I think I saw that the CM7 startup code initializes such an HSEM. It is also used during startup to find out if and when the other core, the CM4, has booted and is running.
I could imagine that the underlying HSEM is always and only initialized by the CM7 core (which always boots first). So any system boot with the CM7 running first might already set up the semaphore.

I do not know what RPC.begin() does: maybe it just releases the HSEM, assuming it was already set up (by the CM7).

If you now swap the sketches, the logic may end up in a deadlock: now the CM4 wants to set up, take or release the semaphore, but it was (and still is) initialized by the CM7 core. The CM4 might not be able to take or "re-initialize" the HSEM, and everything is stuck.

As I understand (interpret) it, semaphores have "ownership" and a clear "master/slave" relationship. The "master" that took the semaphore has to release it before a "slave" can use or take it.
If you now swap your sketches, this HSEM setup and "ownership" has potentially not changed. You can end up with: "the CM4, now acting as master, cannot set up or release the HSEM because it is still owned by the CM7". The way the lower-level code creates and uses the HSEM has not changed.
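To make the ownership idea concrete, here is a small host-side C++ model of one HSEM channel, written from my reading of the STM32H7 reference manual: a lock records the core ID and process ID of whoever took it, a take "succeeds" only if the read-back shows you as the owner, and a release is silently ignored unless both IDs match. This is a simulation for illustration only, not the HAL (on the chip you would use HAL_HSEM_Take / HAL_HSEM_Release or the HSEM registers); the `HsemChannel` class and the `Core` enum are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two core IDs seen by the HSEM.
enum class Core { CM7, CM4 };

// Simplified model of ONE hardware semaphore channel.
class HsemChannel {
public:
    // Modelled on the 2-step take: lock if free, then "read back" the owner.
    // Returns true only if the caller (core + process) now owns the channel.
    bool take(Core core, std::uint8_t processId) {
        if (!locked_) {
            locked_ = true;
            owner_ = core;
            processId_ = processId;
        }
        return owner_ == core && processId_ == processId;
    }

    // A release from anyone but the owner is ignored, as on the real HSEM.
    void release(Core core, std::uint8_t processId) {
        if (locked_ && owner_ == core && processId_ == processId) {
            locked_ = false;
        }
    }

    bool isLocked() const { return locked_; }
    Core owner() const { return owner_; }

private:
    bool locked_ = false;
    Core owner_ = Core::CM7;
    std::uint8_t processId_ = 0;
};
```

In this model a free channel can be taken by either core; what blocks the other core is a channel that its owner takes and never releases.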

I would assume this: the dual-core STM32 is configured by the startup code (BSP) in such a way that the CM4 is always the "slave" and the CM7 the "master". Swapping the sketch code does not change this assumption or the HSEM (and RPC) logic. You cannot turn the CM4 into a "master", because it is not intended to be one, nor is that supported by the HW and the BSP (startup code).

It could also be a HW limitation, e.g. that only the CM7 can initialize and set up a HW semaphore (I am not sure, but the HSEM definitely sees and uses the core ID). Also, as far as I know, the boot setup is not designed for the case where the CM4 boots first and then releases the CM7. The boot sequence is fixed: either both cores boot in parallel (there is a user-programmable fuse bit for this), or only (and always) the CM7 boots and can then release the CM4.

From what I have seen, the fuse bit that starts the CM4 at boot is not enabled on the Portenta H7 (the default is overridden to keep the CM4 in reset).
So swapping the logic can also mean that the CM4 wants to release the CM7, which is not supported and not the intended way (only the CM7, as master, releases the CM4).
So swapping the sketch code might interfere with the HW logic, the BSP (startup) code etc., especially when it comes to the "ownership" of shared system resources such as the HSEM.

Sorry, this is speculative. Why do you want to make it work? Starting the CM4 as master is certainly not the intended or supported way. Keep treating the CM7 as the "master of the system" and the CM4 as 'just' a "slave of the CM7" (not the other way around).
When swapping the sketch code, this constraint (assumption) does not change (it remains in place from a HW perspective).

@tjaekel since my CM7 is performing some resource-intensive tasks (object detection), I want to reduce its workload by using the CM4 to take care of the hardware part.

This is why I want to call RPC functions from the M7 and run them on the M4.

Sure, nothing against using both cores. It makes sense to share the workload between them.
Just a few comments on doing it:

The cores should not block each other when running their code. If they fetched their code from the same flash memory, they would block each other. But the STM32H747 has its 2 MB of flash split into two banks of 1 MB each, so one core can access the lower 1 MB and the other the upper 1 MB (via a different path on the bus fabric).
I think the Arduino linker script does this: it has a CM4_BINARY_START which, I think, is set to 0x08100000 for the CM4. So the cores use different parts of the flash. OK.
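As a quick sanity check of this split, the helper below classifies an address into the two 1 MB flash banks, assuming the STM32H747 layout described above (Bank 1 from 0x08000000, Bank 2 from 0x08100000) and the CM4_BINARY_START value quoted in the thread. The function names are hypothetical, and the Arduino IDE's flash-split option may place the CM4 image at a different address, so check the linker script of your core version.

```cpp
#include <cassert>
#include <cstdint>

// Assumed STM32H747 layout: 2 MB of internal flash as two 1 MB banks.
constexpr std::uint32_t kFlashBase = 0x08000000u;
constexpr std::uint32_t kBankSize  = 0x00100000u;  // 1 MB
constexpr std::uint32_t kCm4Start  = 0x08100000u;  // CM4_BINARY_START as quoted above

// Returns 1 or 2 for an address inside internal flash, 0 otherwise.
int flashBank(std::uint32_t addr) {
    if (addr < kFlashBase || addr >= kFlashBase + 2 * kBankSize) {
        return 0;
    }
    return (addr - kFlashBase) < kBankSize ? 1 : 2;
}

// True if an image [start, start + size) lies entirely in Bank 1,
// i.e. a CM7 image that does not spill into the bank the CM4 runs from.
bool fitsInBank1(std::uint32_t start, std::uint32_t size) {
    if (size == 0) {
        return true;
    }
    std::uint32_t last = start + size - 1;
    return flashBank(start) == 1 && flashBank(last) == 1;
}
```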

When it comes to sharing data (besides using RPC to synchronize), e.g. the CM4 should access data prepared by the CM7 (and vice versa), you have to think about "cache coherency".
The CM7 has caches, the CM4 does not (it accesses memory directly). So make sure to do cache maintenance (clean and invalidate) on the CM7 side, so that both cores see the real memory content (and it is not just sitting in the CM7 cache).
You might need to use the MPU to configure write-through or even non-cacheable memory regions for the shared memory. You may also need to set the shareable attribute in the MPU (two masters, CM7 and CM4, now access the same memory region).
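Two pieces of arithmetic come up when doing this. The CMSIS functions SCB_CleanDCache_by_Addr() and SCB_InvalidateDCache_by_Addr() work on whole 32-byte D-cache lines on the Cortex-M7, so a shared buffer is best aligned and padded to that size (otherwise an invalidate can discard unrelated data sharing a line). And an ARMv7-M MPU region must be a power of two of at least 32 bytes, with its base aligned to its size; the RASR SIZE field encodes log2(size) - 1. The host-side helpers below only do that arithmetic; their names and the example addresses are hypothetical, and the actual CMSIS/MPU calls would run on the CM7 and are only mentioned in comments.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kDCacheLine = 32;  // Cortex-M7 data cache line size in bytes

struct AlignedRange {
    std::uint32_t start;  // rounded down to a cache-line boundary
    std::uint32_t size;   // rounded up so the range ends on a cache-line boundary
};

// Range to pass to SCB_CleanDCache_by_Addr (after the CM7 writes shared data)
// or SCB_InvalidateDCache_by_Addr (before the CM7 reads data written by the CM4).
AlignedRange cacheMaintenanceRange(std::uint32_t addr, std::uint32_t len) {
    std::uint32_t start = addr & ~(kDCacheLine - 1);
    std::uint32_t end = (addr + len + kDCacheLine - 1) & ~(kDCacheLine - 1);
    return {start, end - start};
}

// ARMv7-M MPU rule: size is a power of two >= 32 bytes and base is aligned to size.
// Returns the RASR.SIZE field value (log2(size) - 1), or -1 if the region is invalid.
int mpuSizeField(std::uint32_t base, std::uint32_t size) {
    if (size < 32 || (size & (size - 1)) != 0 || (base & (size - 1)) != 0) {
        return -1;
    }
    int log2 = 0;
    while ((1u << log2) < size) {
        ++log2;
    }
    return log2 - 1;
}
```

Marking the shared region non-cacheable (or write-through) in the MPU, as suggested above, avoids the explicit maintenance calls at the cost of slower CM7 access to that region.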

You can benefit from the dual-core system, but only with a careful assignment of memory regions: if both cores try to access the same memories (and flash) in parallel, it can slow both cores down.

Another question is which real-time-sensitive parts you place on the CM7 vs. the CM4. The CM4 runs at only half the speed (240 MHz), has no caches and no TCMs. When you need really fast code execution, e.g. for interrupt handlers or high-speed interfaces (ETH, USB, SPI), I would use the CM7. The CM4 could handle the RTOS, housekeeping, a UART interface to the host or other low-speed stuff (e.g. BT, WiFi).

The CM7 has many options to tune performance (TCM, caches); the CM4 does not. I consider the CM4 more of a slave co-processor than the main processor.

Also important to bear in mind: on the Portenta H7 the user fuse bit for booting the CM4 is cleared, so only the CM7 boots and it has to release the CM4.
You could program this fuse bit back to "CM4 enabled" (e.g. with STM32CubeProgrammer and an external debugger) so that the CM4 boots by itself on reset.

When it comes to using peripherals in the system, check the bus fabric (the connectivity matrix) carefully: not all devices are accessible from the CM4, e.g. SDMMC1 or LTDC.

And: the CM4 has its own vector table. If the CM4 should handle the interrupt for a device, the CM4 vector has to be set. Here I am not sure how the CM7 is kept from taking the same HW interrupt trigger; I guess simply by not enabling this interrupt on the CM7. If you enable the same interrupt in both cores "by mistake", both could be triggered, ending up in a race condition with two interrupt handlers running at the same time.
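Each core has its own NVIC, so one way to avoid that race is to decide per interrupt which core owns it and have both sketches consult the same table before enabling anything. The host-side sketch below models only that decision; the IRQ names, the table and `shouldEnableIrq` are hypothetical, and on the board the enable itself would be an NVIC_EnableIRQ() call (or simply an `#ifdef CORE_CM7` / `#ifdef CORE_CM4` around it).

```cpp
#include <array>
#include <cassert>
#include <string_view>

enum class Core { CM7, CM4 };

// Hypothetical ownership table, e.g. kept in a header shared by both sketches.
struct IrqOwner {
    std::string_view irq;
    Core owner;
};

constexpr std::array<IrqOwner, 3> kIrqOwners{{
    {"USART1_IRQn", Core::CM4},
    {"SPI1_IRQn", Core::CM7},
    {"ETH_IRQn", Core::CM7},
}};

// A core enables an IRQ only if the table says it owns it.
// IRQs missing from the table stay disabled on both cores.
bool shouldEnableIrq(std::string_view irq, Core self) {
    for (const auto& entry : kIrqOwners) {
        if (entry.irq == irq) {
            return entry.owner == self;
        }
    }
    return false;
}
```

Because each IRQ has exactly one owner in the table, no interrupt can end up enabled on both cores by accident.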

All in all: using both cores is a good idea, a nice project with some specific challenges, and definitely possible. It just needs a specific SW design and partitioning.
I wish you great success.

@tjaekel Thanks a lot for those wonderful tips!

Only the M7 core can print Serial data to the USB-C port (as a virtual serial device).
On the M4, the Serial object refers to UART1 (pins 13 and 14). That might explain why you do not see the logs in the terminal.
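A common workaround is to let the M4 hand its text to the M7, which owns the USB serial port and prints it. As far as I know, the Arduino mbed core's RPC object can be printed to like a Stream on the M4 and read on the M7, which is how the official dual-core examples forward M4 output (check this against your core version). Underneath, the idea is a single-producer/single-consumer queue; the host-side C++ below models it with two threads standing in for the cores. `SpscQueue` and `forwardLog` are hypothetical names, and this is not the RPC library's implementation. On the board such a buffer would have to live in shared RAM that is non-cacheable or cache-maintained on the CM7.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>

// Single-producer/single-consumer byte queue: one side only pushes, the other only pops.
template <std::size_t N>
class SpscQueue {
public:
    bool push(char c) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        buf_[head] = c;
        head_.store(next, std::memory_order_release);  // publish the byte
        return true;
    }

    bool pop(char& c) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        c = buf_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<char, N> buf_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// The "M4" thread writes a log line; the "M7" thread drains it into a string standing in for Serial.
std::string forwardLog(const std::string& line) {
    SpscQueue<16> queue;
    std::string serialOut;
    std::thread m4([&] {
        for (char c : line) {
            while (!queue.push(c)) {
                std::this_thread::yield();  // queue full: wait for the reader
            }
        }
    });
    std::thread m7([&] {
        char c;
        while (serialOut.size() < line.size()) {
            if (queue.pop(c)) {
                serialOut.push_back(c);
            } else {
                std::this_thread::yield();
            }
        }
    });
    m4.join();
    m7.join();
    return serialOut;
}
```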

@rreignier the LED does not blink either

Ohhh, that is true! CM4 vs. CM7: they do not "see" the same UART; they end up on different UART pins.
Good point!

"The LED does not blink either": what does that mean?
A thread or a core (e.g. the CM4) can be blocked from running, e.g. because the CM7 has not released it (RPC).

In my opinion, it does not make sense to think of the CM4 and CM7 as cores of a "Symmetric Multi-Processing" (SMP) platform.
They are NOT equal: each core has limitations, e.g. in what it can access in the system (the CM4 cannot access everything the CM7 can). The CM4 is more of a slave and might need the master to do things first, e.g. initialize peripherals.

If the CM4 runs before the CM7, but the CM7 should do the GPIO initialization (for the LEDs), and the CM4 assumes the CM7 has already done it, nothing blinks (the CM7 has not done it yet).
Or if you "swap" the code: should the CM4 now release the CM7 (which is not the intended way)?

Swapping cores does not mean that the SW logic still works properly (it is not "symmetrical"):
Who runs first?
Who initializes the resources needed (e.g. GPIOs)?
Is it guaranteed that the core in charge of system initialization (e.g. clock configuration, setting up the HW semaphore to release the other core) has already run?
Does one core (e.g. the CM4) have to wait until the other core (the CM7) has finished?
Can the "slave core" (CM4) really access and do everything the "master core" (CM7) can?

What does your debugger (or some debug log messages) tell you?
Is the "system logic" correct (right sequence, e.g. the CM7 initializes the GPIOs and only later does the CM4 use them)?
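These ordering questions boil down to a handshake: the core that initializes a shared resource publishes a "ready" flag only after initialization is complete, and the other core waits for that flag before touching the resource. Below is a host-side model with two std::threads and a release/acquire std::atomic flag; `SharedState`, `masterInit`, `waitForMaster` and `runHandshake` are hypothetical names. On the board the flag would have to live in shared RAM that is non-cacheable (or cache-maintained) on the CM7, and the timeout makes a core that never comes up show up as an error instead of a silent hang.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

struct SharedState {
    int ledPinMode = 0;              // stand-in for GPIO/clock setup done by the "CM7"
    std::atomic<bool> ready{false};  // set only after the setup above is complete
};

// "CM7" side: do all initialization first, then publish with release semantics.
void masterInit(SharedState& s) {
    s.ledPinMode = 1;  // e.g. pinMode(LEDR, OUTPUT) in a real sketch
    s.ready.store(true, std::memory_order_release);
}

// "CM4" side: wait (with a timeout) for the flag; acquire makes the setup visible.
bool waitForMaster(const SharedState& s, std::chrono::milliseconds timeout) {
    auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!s.ready.load(std::memory_order_acquire)) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // master never signalled: report instead of hanging
        }
        std::this_thread::yield();
    }
    return true;
}

// Runs both "cores"; returns the pin mode the CM4 observed, or -1 on timeout.
int runHandshake(bool masterRuns) {
    SharedState s;
    int observed = -1;
    std::thread cm4([&] {
        if (waitForMaster(s, std::chrono::milliseconds(200))) {
            observed = s.ledPinMode;
        }
    });
    std::thread cm7([&] {
        if (masterRuns) {
            masterInit(s);
        }
    });
    cm7.join();
    cm4.join();
    return observed;
}
```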

Also bear in mind: the HW semaphore used to release a core (in practice only the CM4, released by the CM7) or for the RPC depends on which core initializes and releases it. It has ownership; the semaphore knows who the "master" is. Swapping the code between the cores might therefore also cause issues with the RPC or with the semaphore that releases the core.

Or: the CM4 expects the CM7 to have configured the system, e.g. clocks, FLASH_LATENCY, PLLs etc.
If the CM4 is released too early, the system might not be configured properly.
The CM7 remains ALWAYS the master, not vice versa. The CM4 should only be released once the CM7 has done this job. Swapping code might never work because the CM4 cannot really do everything in the same way (open question: can the CM4 configure clocks, PLLs ... in the same way? I do not believe so, or at least would not trust it).

Personally, I think it is wasted time to write code where you can swap the CM4 code and the CM7 code. Think of a system where one core (the CM7) is the master and the other (the CM4) a slave or "co-processor". Getting such a system to work is challenging enough.

And swapping cores does not help in the end: the CM4 cannot do exactly the same as the CM7 (e.g. it has no access to SDMMC1). They also have separate interrupt vector tables. When you swap the cores, you also swap where an interrupt should go (and whether it is initialized on that core).
It sounds to me like the "most complicated project" to get working (for no real reason or purpose).

This MCU is not a symmetric dual-core system. It is a main processor (CM7) with an auxiliary processor (CM4), not a multi-core system where you can assign any code thread to any core ("symmetrical").
Sorry, but what is all that effort for?

Woahh thanks a lot for the information and tips. I found it really helpful
