You are on the right track: "Cache Coherency".
The CM7 has caches, the CM4 does not. The CM4 behaves like any other bus master (e.g. a DMA engine): it reads and writes memory directly. The CM7, however, sees the memory content through its cache, and it is not informed when the real memory was changed by the other core.
So, yes, you should use Cache Maintenance operations on the CM7 side: invalidate the cache before reading, and also clean the cache after writing, so that the CM4 can see the updated content.
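Best practice is to align the data buffers to the cache line size (32 bytes). It also works without alignment: just start the cache invalidate at the beginning of the cache line holding the first byte of the buffer, and make it long enough to cover the entire buffer (I think the function should itself align to the cache line size and cover the entire length). The catch: invalidating a line that the buffer only partly occupies also discards any pending (not yet cleaned) writes to other variables in that line, which is why aligned buffers, with a size that is a multiple of 32 bytes, are preferred.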
Whether SDRAM is cached or not depends on how the FW (Arduino code) sets it up. It can be cached by default (depending on whether the external memory address region is handled as "strongly ordered", "device" or "cacheable"). I think I saw that the default is cached (maybe "write-through" for the SDRAM region).
The most important thing to know: is the MPU configured (it potentially is!)? Is the external SDRAM address region configured as "cacheable"?
Also: if it is cacheable but "write-through", the CM4 can see what the CM7 has written, even without cleaning the cache. But the CM7 still cannot see what the CM4 has written (an invalidate is still needed for that direction).
So, the MPU configuration in effect (and how its regions are defined) is the most important thing to know.
If you do not know it: assume you always have to do cache maintenance (on the CM7 side: invalidate and clean the cache).
BTW: the same applies to other memories (internal SRAM). Their cache behavior can also be controlled by the MPU.
It is hard to say without knowing how Arduino/mbed initializes the MPU. I am fairly sure the memory region used for the RPC itself is uncached, but what about all the other memories?
You can also define another memory region as uncached. How to do the MPU configuration in Arduino/mbed, I have no idea.
When not sure, just do cache maintenance (on the CM7) all the time. It adds a bit of overhead (and "unpredictable" latency) at runtime.
Thinking about "cache coherency" is always mandatory in "multi-core" programming (CM7 and CM4, CM7 with DMAs, ...), so it is very reasonable to keep it in mind.