How to Transfer Strings Between Two Cores on the Pico?

Hello everyone,

I’m working on a project with a Raspberry Pi Pico and using both cores (core0 and core1). I’ve successfully run tasks on each core, but I’m facing an issue when it comes to transferring strings between the two cores.

I'm currently using rp2040.fifo for communication between the cores. However, the problem is that rp2040.fifo only handles 32-bit values like uint32_t, and I'm having trouble figuring out the best way to pass a full string or char array between the cores.

Here’s a simplified version of what I’m doing:


// On core0
if (Serial1.available()) {
  String incomingMessage = Serial1.readString(); // Read the string
  rp2040.fifo.push(sha1(incomingMessage));      // Attempt to send it (hashed for testing)
}

// On core1
if (rp2040.fifo.available()) {
  uint32_t receivedData = rp2040.fifo.pop();
  Serial.println("Received data: " + String(receivedData));  // Expected string
}

The API does not support pushing anything other than a 4-byte value.

The documentation states:

Communicating Between Cores

The RP2040 provides a hardware FIFO for communicating between cores, but it is used exclusively for the idle/resume calls described above. Instead, please use the following functions to access a software-managed, multicore safe FIFO. There are two FIFOs, one written to by core 0 and read by core 1, and the other written to by core 1 and read by core 0.

You can (and probably should) use shared memory (such as volatile globals) or other normal multiprocessor communication algorithms to transfer data or work between cores, but for simple tasks these FIFO routines can suffice.

void rp2040.fifo.push(uint32_t)

Pushes a value to the other core. Will block if the FIFO is full.

bool rp2040.fifo.push_nb(uint32_t)

Pushes a value to the other core. If the FIFO is full, returns false immediately and doesn't block. If the push is successful, returns true.

uint32_t rp2040.fifo.pop()

Reads a value from this core's FIFO. Blocks until one is available.

bool rp2040.fifo.pop_nb(uint32_t *dest)

Reads a value from this core's FIFO and places it in dest. Will return true if successful, or false if the pop would block.

int rp2040.fifo.available()

Returns the number of values available to read in this core's FIFO.

You could send a reference to the String rather than trying to force the whole string through the FIFO. A better solution would be to use a queue and send the reference via the queue. You get points for not trying to use Serial on both cores.

Here is a horrible example to show the concept. IT SHOULD NOT BE USED. Casting a pointer to a uint32_t only works in this case because the two happen to be the same size - this may not be the case on other systems. If you use a queue, the correct data type size can be specified.

#include <FreeRTOS.h>

String test = "test string";
String test2 = "test string 2";

void setup() {
  // On core0
  rp2040.fifo.push((uint32_t)&test);   // send the address of the first String
  rp2040.fifo.push((uint32_t)&test2);  // send the address of the second String
}

void loop() {
}

void setup1() {
  // On core1
  Serial.begin(115200);
  while (!Serial) {
    ; // wait for serial port to connect. Needed for native USB
  }
  Serial.println("running on core 1");
}

void loop1() {
  String *testString;                 // pointer to String

  while (!rp2040.fifo.available());   // wait for data
  testString = (String *)rp2040.fifo.pop();
  Serial.println(*testString);        // print the String the pointer refers to
}

A String is a C++ container class object, which is more than a char array.

A string is a char array of ASCII text chars ending with a terminating zero/NUL.

A String will contain a string, but it can't be handled as a string because it's special.
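
For example (illustrative names), the char array inside a String can be reached with c_str():

String S = "hello";          // String object managing a char array internally
const char *s = S.c_str();   // pointer to its NUL-terminated char array
Serial.println(s);           // now it can be handled as a plain string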

To follow up on my earlier post: if you pass references, you need a way to communicate back to the sending process that you are done with the data, so that the storage can be freed or reused. Inter-process communication is tedious (some say hard 🙂).

So don’t pass references; just use volatile globals and the usual semaphore/mutex for syncing, as advocated in the doc.


+1

Depending on what you're doing, the shared data may be a table or list you want to keep in place, at least for the most part.

But coordinating two or more independent cores will require a minimum of extra code and data, much like coordinating independent tasks on the same core. How close you get to that minimum is, to me, a design challenge.


Volatile has no place in inter-thread communication. Either use normal variables with a semaphore/mutex, or use an atomic variable (which may not be supported on the Cortex-M0+ of the RP2040).

Volatile is used for side effects outside of the C++ code, like hardware registers or interrupts. Adding volatile unnecessarily results in poor compiler optimizations, prevents caching in registers, etc.


that's a fair point.

I just stupidly copied over from the documentation and added (without thinking) "usual semaphore/mutex for syncing"

You are right it does not make any sense to have them volatile if you use a mutex.

I was under the impression that volatile variables got updated in place, whereas non-volatile variables might be changed while temporarily held and worked on in CPU registers as part of optimization.

If I have a variable that I want another core to access... maybe I want that variable to be volatile, since such access would certainly be outside of the C++ code the first core is running?

Or do I once again have it all wrong? It wouldn't be the first time!

I did this by creating a pair of circular buffers, one for each direction. Core 0 writes to its buffer and then increments the head. Core 1 monitors the head of core 0's buffer; when it sees that the head doesn't match the tail, it removes data from the buffer and increments the tail until it matches the head again. As long as reads and writes to the head and tail are atomic, there's no problem with one core writing the head while the other reads it. Make sure the buffer is big enough not to overflow in your application, or implement some kind of protection to prevent overflow.
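
Something like this minimal sketch of the idea (names and sizes are illustrative; single producer, single consumer; as a later reply points out, plain head/tail accesses also depend on memory ordering):

#define BUF_SIZE 64                  // size it so it cannot overflow in practice

volatile uint8_t buf[BUF_SIZE];
volatile uint32_t head = 0;          // written only by core 0
volatile uint32_t tail = 0;          // written only by core 1

// Core 0: producer
bool buf_put(uint8_t b) {
  uint32_t next = (head + 1) % BUF_SIZE;
  if (next == tail) return false;    // full - overflow protection
  buf[head] = b;                     // write the data first...
  head = next;                       // ...then advance the head
  return true;
}

// Core 1: consumer
bool buf_get(uint8_t *b) {
  if (tail == head) return false;    // empty - nothing to remove
  *b = buf[tail];
  tail = (tail + 1) % BUF_SIZE;
  return true;
}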

What do you mean? There's C++ code running on the other core as well.

If both cores could possibly access the same variable (e.g. because it is global, or because a pointer to it was published somewhere in a global data structure or passed to any function that the compiler is not able to look into), the compiler assumes that that variable could be modified by other threads, even without volatile. However, the compiler is also allowed to assume that there are no race conditions, so this enables further optimizations.

The main difference between interrupts and threads is that an interrupt could fire and change any variable at any point, whereas another thread could only change a shared variable if there's an atomic or mutex operation involved. That's because without such operations, there would be a data race (which would be Undefined Behavior, so the compiler is allowed to assume that it cannot occur).

For example:

std::mutex mtx;
int shared = 0;

void foo() {
    std::lock_guard lck{mtx};
    shared += 8;
    // Other threads cannot change `shared` here,
    // because that would be a data race.
    shared += 8;
    // So the compiler is allowed to optimize this to
    //    shared += 16;
}

But with volatile:

volatile int shared = 0;

void foo() {
    shared += 8;
    // An interrupt could have changed `shared` here
    shared += 8;
    // So the compiler cannot merge the two additions,
    // it needs to reload the value from memory.
}

This is not true: atomicity alone is not enough; you need to use the correct memory order, otherwise you will still end up with data races. With these kinds of concurrent data structures, there's a lot of room for very subtle bugs, and incorrect implementations may seem to work fine 99% of the time.

See e.g. the excellent “atomic<> Weapons” talks by Herb Sutter: “atomic<> Weapons: The C++ Memory Model and Modern Hardware” – Sutter’s Mill

Also note that the Cortex-M0+ (used in e.g. a Raspberry Pi Pico) does not support the atomic/exclusive instructions that are required to atomically update the head and tail of the buffer. In such a case, you'll need to use the spinlocks or hardware FIFOs provided by the chip to achieve synchronization. In practice, you should probably use the thread-safe queues provided by the platform rather than rolling your own, though.
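
For example, a sketch using the Pico SDK's multicore-safe queue (reachable from an arduino-pico sketch via pico/util/queue.h; MSG_LEN and msg_queue are illustrative names, not from the thread). Each fixed-size message is copied into the queue, so there are no shared pointers to manage:

#include "pico/util/queue.h"

#define MSG_LEN 64
queue_t msg_queue;

void setup() {
  // On core0
  queue_init(&msg_queue, MSG_LEN, 8);  // element size, element count
  Serial1.begin(115200);
  rp2040.fifo.push(0);                 // tell core 1 the queue is ready
}

void loop() {
  if (Serial1.available()) {
    char msg[MSG_LEN];
    Serial1.readStringUntil('\n').toCharArray(msg, MSG_LEN);
    queue_add_blocking(&msg_queue, msg);   // copies MSG_LEN bytes; blocks if full
  }
}

void setup1() {
  // On core1
  rp2040.fifo.pop();                   // wait for core 0 to initialize the queue
  Serial.begin(115200);
}

void loop1() {
  char msg[MSG_LEN];
  queue_remove_blocking(&msg_queue, msg);  // blocks until a message arrives
  Serial.println(msg);
}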


Thank you for the video Pieter, very interesting.

I just got my Pico a couple of days ago and have been using this thread as an excuse to study the documentation. Here is my attempt to use semaphores.

/****************************************
 * Using semaphores to sync tasks running on different cores
 * 
 * Note: Using non-thread-safe libraries like Serial in multiple
 *   tasks is a bad idea.
 *   Since we are only reading in one task and writing in another
 *   task, we may get away with it :-)
 *   
 **************************************/

#include <FreeRTOS.h>
#include <semphr.h>

String test;    // data buffer

SemaphoreHandle_t dataRead;  // indicate data read from input - ready to use
SemaphoreHandle_t dataWrite;  // indicate data written to output - new data can be read
SemaphoreHandle_t unleashCore1;   // have core 1 wait on startup

void setup() {
 // On core0
  // create binary semaphores for task sync
  dataRead = xSemaphoreCreateBinary();   // data read 
  dataWrite = xSemaphoreCreateBinary();  // data written
  unleashCore1 = xSemaphoreCreateBinary();  // core 1 start
  
  test.reserve(128);  // reserve some space to minimize heap fragmentation
   
  Serial.begin(115200);
  while (!Serial) {
       ; // wait for serial port to connect. Needed for native USB
  }

  xSemaphoreGive(dataWrite);   // indicate new data can be read

  // let core 1 start looking for data
  xSemaphoreGive(unleashCore1);   

}

void loop() {
  // continue getting data
  xSemaphoreTake(dataWrite, portMAX_DELAY);  // wait for the buffer to be free
  while (Serial.available() == 0) {}         // wait for data available
  test = Serial.readStringUntil('\n');       // read the string
  xSemaphoreGive(dataRead);    // indicate data ready


}
void setup1() {
 // On core1
  xSemaphoreTake(unleashCore1, portMAX_DELAY); // wait for core 0 to finish setup
 
}

void loop1() {
  
 xSemaphoreTake(dataRead, portMAX_DELAY);  // wait for new data
 Serial.println(test);     // print string
 xSemaphoreGive(dataWrite); // indicate print done
 
}

I started This Thread last year trying to get atomicity and memory order into my head. I came away with a basic understanding, but this is a lot easier:

FreeRTOS provides several well-tested methods for sharing data between tasks ... queues, direct notifications, mutexes, etc. Using the OS may incur more overhead, but (IMO) it's more bulletproof (and easier to understand) than rolling your own with atomics and memory ordering.
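
For instance, a FreeRTOS queue of fixed-size messages solves the original String-passing problem by copying the data rather than sharing it (a sketch; assumes the FreeRTOS port used by arduino-pico, and msgQueue/MSG_LEN are illustrative names):

#include <FreeRTOS.h>
#include <queue.h>

#define MSG_LEN 64
QueueHandle_t msgQueue;

void setup() {
  // On core0
  Serial1.begin(115200);
  msgQueue = xQueueCreate(8, MSG_LEN);   // room for 8 messages of MSG_LEN bytes
}

void loop() {
  if (Serial1.available()) {
    char msg[MSG_LEN];
    Serial1.readStringUntil('\n').toCharArray(msg, MSG_LEN);
    xQueueSend(msgQueue, msg, portMAX_DELAY);   // the queue copies the buffer
  }
}

void setup1() {
  // On core1
  Serial.begin(115200);
}

void loop1() {
  char msg[MSG_LEN];
  if (msgQueue == nullptr) return;   // queue may not be created yet
  if (xQueueReceive(msgQueue, msg, portMAX_DELAY) == pdTRUE) {
    Serial.println(msg);   // the received copy is a plain NUL-terminated string
  }
}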


If one began with a single memory location holding one value, and that value named which core could write to a buffer, plus enough other locations to convey the buffer's address and count, then no other core would read or write that buffer until it was the core named by that single location. You would never get a race condition as long as the owner didn't change that one value until it was done, because by the rules, once it changed that value it could no longer touch the buffer.

The buffer access would be slower because of the lock, but locked it would be. The same scheme could work for more than two cores, only slower overall.
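
A sketch of that idea (illustrative names; a later reply explains why the owner word also needs proper memory ordering to be safe):

uint8_t buf[64];
volatile uint32_t owner = 0;      // which core may touch the buffer right now
volatile uint32_t buf_count = 0;  // how many bytes of buf are valid

// Core 0: may only touch buf while owner == 0
void core0_side() {
  if (owner != 0) return;         // not ours; we only read the owner word
  buf[0] = 42;                    // ...fill the buffer...
  buf_count = 1;
  owner = 1;                      // the LAST thing the owner does: hand over
}

// Core 1: may only touch buf while owner == 1
void core1_side() {
  if (owner != 1) return;
  Serial.write(buf, buf_count);   // ...consume the buffer...
  owner = 0;                      // hand the buffer back
}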

When Commodore made the Amiga, they had the CPU and the Copper/Blitter share RAM by running the RAM at twice the speed of the chips. The chips accessed the RAM every other cycle and behaved themselves well; atomicity guaranteed.
The Amigas were fantastic for their time! It took Jack Tramiel (Trample) to screw up a good thing; the bastard left Commodore and took over Atari to do it. I knew one of the many programmers he mass-fired back in the day.


There's an unspoken assumption here:
Optimisation by the compiler is always a Good Thing. Maybe it's not. Maybe, given the application and the processing power available, it's not needed, or at least doesn't need to be as good as it can be. Maybe code that executes in the order the programmer wrote it will do what the programmer intended just fine. Maybe not having to worry about the compiler's efforts to make the slickest possible code is worth more than having the slickest possible code. Weather-forecasting code needs to be very well optimised and run on the most powerful hardware available; microcontroller code, maybe not so much.


Without applying the proper memory-order model, there is no guarantee that two tasks/threads (on the same or different cores) will see multiple memory locations change in any assumed order. So, yes, a race condition is possible.
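
For example (illustrative names; assumes the C++11 <atomic> header, which the arduino-pico toolchain provides, and that word-sized atomic loads/stores are lock-free on the Cortex-M0+), the release/acquire pairing that would make the earlier head/tail scheme sound:

#include <atomic>
#include <cstdint>

uint8_t buf[64];
std::atomic<uint32_t> head{0};   // written by core 0, read by core 1

// Core 0 (producer): write the data FIRST, then publish the new head.
void put(uint8_t b) {
  uint32_t h = head.load(std::memory_order_relaxed);  // only core 0 writes head
  buf[h % 64] = b;
  head.store(h + 1, std::memory_order_release);       // publish
}

// Core 1 (consumer): an acquire load of head guarantees that if the new
// value is seen, the buffer write that preceded it is visible as well.
bool get(uint32_t tail, uint8_t *b) {
  if (head.load(std::memory_order_acquire) == tail) return false;  // empty
  *b = buf[tail % 64];
  return true;
}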


If only ONE core changes or even reads that RAM at any given time, please explain how there can be any race condition, or admit you didn't read and comprehend what I posted.

ONE core accesses that RAM, and the LAST THING it changes is to give another core control. Until another core passes control back to it, it does not read or write, except to read that one single value. There can be no conflict, but IT IS SLOWER THAT WAY. The sacrifice is speed. There is only one-at-a-time access, and that access must be relinquished for another. One at a time, never more; figure it out.