Portenta H7 - Cache Coherency

Sorry, I'd like to "abuse" the forum a bit to share my thoughts (and knowledge and experience).

You might have heard the phrase "Cache Coherency". What is it? Why do we need to bear it in mind, and when?

Our Portenta H7 has TWO cores, the CM7 and the CM4, and it has DMA engines. Let's call all these cores as well as the DMA engines MASTERs.

So, different MASTERs can access the memories as they want: they can interleave with each other (a DMA interrupts a core doing memory access), or one MASTER (e.g. a DMA) updates the memory while an MCU core tries to read from that same memory.

The main question is: "can a memory update really be seen by another MASTER?"

Bear in mind: we have caches (here mainly data caches). It can happen that an MCU core, as a MASTER, reads from memory. That memory content is now cached. Instead of reading the real memory again, the core uses the "snapshot" taken from memory (sitting in the cache) and reads from that one.
If another MASTER modifies the memory in between, e.g. a DMA has placed new data in real memory, the MCU will not realize it: it is not informed about the update of main memory (nothing forces an update of the cached "snapshot").
It keeps going with the content in the cache and does not see the change.

The similar issue exists the other way around: the MCU updates memory, i.e. it writes to a memory address, but the write goes via the cache. The MCU reads it back - all OK (the value comes back from the cache). But it is not yet in the final, real memory: the cache line is only evicted when necessary (e.g. all cache lines are full), or, with a "Write-Back" attribute configured, the memory is written at some later time, whenever convenient, just to update the main memory - and the MCU does not know when that "write back" was actually done.

So, there is a "coherency issue": in a multi-master system it can happen that not all MASTERs see the real, updated physical memory. Data might still sit in a cache (not yet written to memory), or a MASTER (esp. an MCU core) still reads from its cache, without realizing that the cache must be refreshed because the external memory was updated.

This is "cache coherency": not all MASTERs see the same content on physical, real memory.
And it can result in issues:
a CPU does not realize when a DMA has changed memory,
a DMA does not get what the CPU had changed in memory,
the other core (e.g. CM4) does not see what CM7 has written (still in CM7 cache only).

There are two solutions to deal with this issue:

  1. Use the MPU and declare regions as "Shareable" (or "non-cacheable"): if you do an MPU config and a region is "Shareable", you effectively disable the cache for it: any write goes directly to memory. So, the other MASTER can see the updated memory, with nothing still sitting in the cache of a core/MASTER (see the sketch after this list).

  2. Use "cache maintenance" (policy) functions: there are CMSIS functions to control, maintain the cache, e.g.: SCB_CleanDCache() or SCB_InvalidateDCache().
    They will help to "synchronize" with other MASTERs (but via SW, a need to use and code).
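Here is a minimal sketch of option 1, using the CMSIS MPU helper functions (mpu_armv7.h). The region base address, region size and region number are just placeholders (assumptions) - adapt them to your own memory map and linker script:

```c
#include "stm32h7xx.h"   /* pulls in core_cm7.h / mpu_armv7.h - header name may differ in your build */

#define SHARED_MEM_BASE  0x30040000UL   /* placeholder: a RAM region reachable by all MASTERs */

void MPU_ConfigSharedRegion(void)
{
  ARM_MPU_Disable();

  /* Region 0: 1 KB, full access, Shareable, not Cacheable, not Bufferable */
  ARM_MPU_SetRegion(
      ARM_MPU_RBAR(0UL, SHARED_MEM_BASE),
      ARM_MPU_RASR(0UL,               /* DisableExec = 0: execution allowed (not relevant here) */
                   ARM_MPU_AP_FULL,   /* full read/write access */
                   0UL,               /* TEX level 0 */
                   1UL,               /* Shareable */
                   0UL,               /* not Cacheable: writes go straight to memory */
                   0UL,               /* not Bufferable */
                   0x00UL,            /* no sub-regions disabled */
                   ARM_MPU_REGION_SIZE_1KB));

  /* re-enable the MPU, keep the default map for all other addresses */
  ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk);
}
```

Any buffer you place into such a region (e.g. via a linker section) is then seen identically by CM7, CM4 and the DMA engines - at the price of losing the cache speed-up for it.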

If the MCU updates memory but the write goes via the cache - you cannot be sure when it reaches the real memory: SCB_CleanDCache() makes sure it is written back to real memory.
If the MCU cannot be sure whether a DMA has updated the memory, so that the data cache might hold a "wrong" (stale) snapshot of it: SCB_InvalidateDCache() forces the current DCache entries to be dropped, so everything is read again (and the DMA changes become visible to this MCU).
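A minimal sketch of option 2, using the by-address variants (which touch only the affected cache lines instead of the whole DCache). The DMA driver calls are placeholders; buffer size and the 32-byte cache-line alignment are assumptions you have to match to your own setup:

```c
#include "stm32h7xx.h"
#include <string.h>

#define BUF_SIZE 64u   /* multiple of the 32-byte CM7 cache line size */

static uint8_t tx_buf[BUF_SIZE] __attribute__((aligned(32)));
static uint8_t rx_buf[BUF_SIZE] __attribute__((aligned(32)));

void send_via_dma(const uint8_t *data, size_t len)
{
  memcpy(tx_buf, data, len);
  /* CPU wrote via the DCache: clean (write back) so the DMA reads
     the new content from real memory, not stale RAM */
  SCB_CleanDCache_by_Addr((uint32_t *)tx_buf, BUF_SIZE);
  /* dma_start_tx(tx_buf, len);  <- placeholder for your DMA driver call */
}

void dma_receive_complete(void)   /* call after the DMA transfer has finished */
{
  /* DMA wrote real memory behind the cache's back: invalidate the
     stale cached snapshot so the CPU re-reads from RAM */
  SCB_InvalidateDCache_by_Addr((uint32_t *)rx_buf, BUF_SIZE);
  /* from here on, what the CPU reads in rx_buf is what the DMA delivered */
}
```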

So, sharing data between different cores, e.g. between CM4 and CM7, or in addition with a DMA engine, can be tricky: your SW and program logic can fail just because they do not realize that a memory or cache update would be needed but was not done (yet).

BTW: volatile
You might think: why does volatile not handle this issue? It cannot: it is a compiler "hint" telling the compiler "do not trust that this is unchanged - read it again". Without it, a compiler could optimize the code and assume: "I have already written it, or I have already read it - why do it again? It is already there, in a register - no need to access memory again." "volatile" just forces a "read again", but the compiler does not care whether that read is served from the cache again.

The compiler has no clue about caches: whether they are there, how they work, when the real memory is updated - it has no idea. A compiler optimizes code for the single-master use case only. A compiler is not able to deal with a multi-MASTER system or with caches!

So, volatile does not solve your problem. You have to know yourself when to clean or invalidate the caches, just to keep the other MASTERs in sync with you.
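A small sketch of the difference, assuming a flag that some other MASTER (CM4 or a DMA) sets directly in real memory - the variable name and the "other master" are placeholders:

```c
#include "stm32h7xx.h"

/* assumed to be written by the other MASTER directly in real memory;
   ideally give it its own 32-byte cache line so the invalidate does
   not throw away neighboring data */
static volatile uint32_t ready_flag __attribute__((aligned(32)));

void wait_for_other_master(void)
{
  do {
    /* drop the possibly stale cached copy before every check */
    SCB_InvalidateDCache_by_Addr((uint32_t *)&ready_flag, sizeof(ready_flag));
    /* "volatile" only guarantees a load instruction on every iteration;
       without the invalidate above, that load could be served from the
       stale DCache line forever */
  } while (ready_flag == 0u);
}
```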

If you write FW and you think: "why the hell do I never see this updated variable?" Or:
"why the hell does the device send garbage (e.g. a DMA sending out on a peripheral)?" Or: "I know it came in from a peripheral - but my code (MCU) does not see what was received" ... it is a cache coherency issue.

So, the most obvious case where it can fail is sharing a "global variable" between CM7 and CM4:
the CM7 has and uses caches, the CM4 does NOT. And the CM4 cannot see what is in the CM7 cache; it uses the real physical memory.
But the CM7 writes to memory via its cache. So, the updated value looks all OK to the CM7, but it sits in the CM7 cache and the CM4 cannot see it yet.
The same for the opposite way: the CM4 writes, and the value even lands in real memory (the CM4 does not have a DCache), but the CM7 reads the memory via its DCache. The cache controller does not know that the external memory was changed; it just sees the CM7 polling the same memory address all the time, so it keeps providing the value from the DCache to the core. So, the CM7 will not realize that the CM4 has updated the real memory.
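As a sketch for this CM7/CM4 case, assuming the variable is placed in a RAM region reachable by both cores (the section name is a placeholder you have to define in the linker script): the CM7 side wraps every access in cache maintenance calls, while the CM4 side (no DCache) needs nothing special.

```c
#include "stm32h7xx.h"

/* placeholder section; must map to RAM both cores can access,
   aligned to (and ideally padded to) a 32-byte CM7 cache line */
volatile uint32_t shared_flag
    __attribute__((section(".shared_ram"), aligned(32)));

/* CM7 side: write, then clean, so the value lands in real RAM for the CM4 */
void cm7_set_flag(uint32_t v)
{
  shared_flag = v;
  SCB_CleanDCache_by_Addr((uint32_t *)&shared_flag, sizeof(shared_flag));
}

/* CM7 side: invalidate first, so a CM4 update becomes visible */
uint32_t cm7_get_flag(void)
{
  SCB_InvalidateDCache_by_Addr((uint32_t *)&shared_flag, sizeof(shared_flag));
  return shared_flag;
}

/* CM4 side: a plain "shared_flag = 1;" or "x = shared_flag;" is enough -
   the CM4 has no DCache and works on the real memory directly */
```

Alternatively, put the variable into an MPU region configured as Shareable/non-cacheable (as in the sketch further above), and the cache maintenance calls are not needed at all.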

In such "simple" systems, without a "Cache Coherency Interface" (CCI) available in HW - your FW has to make sure to configure the MPU and/'or to use cache maintenance (policy) functions when dealing with more as one master in the system (and one MCU plus a DMA is already a Multi-Master system! Consider a DMA as a MASTER!).
