Pseudo-Multithreading with Arduino

I am looking for code examples of a Finite State Machine (roughly a single-core multithread).

I'm building a motor controller with an Arduino that needs to output serial data. The loop execution timing needs to be fixed. The loop has a routine that measures each loop's execution time and adds a few microseconds delay, so that each loop finishes in the same amount of time.

A serial transmit is required about once every 20 loops. The loop that contains the serial transmit takes much longer than the normal loop time (about twice as long). I wrote a software serial buffer to distribute the workload of sending a serial transmission across multiple loops, so no single loop would be encumbered by the serial TX. The problem still remained: one of the loops was still taking a very long time.

Unfortunately, I found that String concatenation is very slow on the Arduino, so the code in the loop that assembles the serial string is a huge asymmetry.

The solution is to start a new pseudo-thread that concatenates the output string in a low-priority "background task" while the foreground loop is executing. That way, the foreground loop can have fixed loop timing, and the background task can assemble and transmit the serial string when the foreground task is idle. I can add a few more delays into the foreground loop, which would give the background loop more time to execute.

The background task may need to halt and resume code mid-instruction. For example, each string concatenation instruction takes approximately 200 microseconds (I wrote a little benchmark to time individual instructions). Since some individual instructions take so long, they may need to be interrupted mid-instruction by the foreground task and resumed later:

outputstring = String(somevariable); // this instruction takes 200us
                                     // may be interrupted mid-instruction by foreground task
                                     // and resumed later

As the code may have a lot of time-intensive content (i.e. floating-point calculations, etc.), forking a low-priority "background task" for lower-priority housekeeping or data reporting tasks would be very useful.

Unfortunately, I found that string concatenation is very slow on the Arduino,

No, it isn't. It's lightning fast. On the other hand, String concatenation IS slow.

Any ideas on how to achieve this with Arduino?

Quit using Strings.

At a minimum, your terminology is completely backwards. If you have some task that needs to be executed at specific times, then use a timer to call that task based on a timer interrupt. That task is then a "background" task. The non-time-critical stuff then gets executed in the foreground task, and will be completely oblivious to the fact that the background task is even running.

And "outputstring = String(somevariable); // this instruction takes 200us" is not an "instruction"; it is a "statement", which may well compile to tens, hundreds, or even thousands of actual machine instructions. If it actually takes 200 us to execute, then it is using roughly 3,200 clock cycles at 16 MHz, which is on the order of thousands of machine instructions.

Regards, Ray L.

smoore: The loop that contains the serial transmit takes much longer than the normal loop time (about twice as long). I wrote a software serial buffer to distribute the workload of sending a serial transmission across multiple loops, so no single loop would be encumbered by the serial TX. The problem still remained: one of the loops was still taking a very long time.

As far as I understand serial output is not slow at all (if there is less pushed in than the buffer can take). If it has to wait for the buffer to empty, it gets bound to the baudrate.

It seems to me that the process that generates the output is too slow, as you removed the accused serial slowdown and the program was still too slow.

Arduino uses the serial port in synchronous mode, but you can also use it in asynchronous mode. String concatenation can also be pretty fast; you first need to work out whether you are using C++ String objects or plain character arrays as strings. Multitasking on a single core is never a good choice, especially when the chip is slow and you are after performance. However, if you search this forum you can find some basic examples.

Thank you for the replies:

  1. I should use something other than Strings to create an array of output characters.
  2. Serial.print is very fast. It's the String concatenation that takes a long time.
  3. I edited my OP to include there are other time-intensive things, such as floating-point calculations, that could also be moved into the lower-priority space. So my query is not entirely predicated on Strings. Using Strings as an example ended up being a distraction, which diluted my original question of tasking.
  4. Reversing my thinking is likely the best solution: use interrupts to run the high-priority work, and let the main loop run the low-priority work.

I think your ideas are a little confused; you cannot say that the serial is fast and string concatenation is slow! Show how you concatenate the strings. I concatenate strings while sending data asynchronously: while the chip is sending, I concatenate, whereas Arduino sits in a loop until the data has been sent. If you think about it, this is real multitasking and not "pseudo".

You may find some useful stuff in the demo Several Things at a Time

…R

Actually, either approach on single-core CPUs is pseudo-multithreading, but more or less intelligible and powerful.

There are really good and easy and powerful multithreading libs available for AVRs and ARM:

AVRs (e.g., Arduino Uno and Mega): http://forum.arduino.cc/index.php?topic=347188.0 http://www.rtos48.com/

Arduino Due: https://www.arduino.cc/en/Reference/Scheduler

http://forum.arduino.cc/index.php?topic=318084.0 http://francois.pessaux.perso.sfr.fr/arduino.html#Babix

smoore:
Unfortunately, I found that String concatenation is very slow on the Arduino, so the code in the loop that assembles the serial string is a huge asymmetry.

I think you have to distinguish two cases when sending bytes/chars to Serial:

  • string output is up to 63 bytes at a time
  • string output is 64 or more bytes at a time

The serial outgoing buffer can hold (at any time) up to 63 characters, that will be sent in the background with no significant delay.

But if you want to send, let’s say, 100 bytes at once to Serial, your application will be blocked for some time. Blocking time depends on baudrate. For example when using 9600 baud, you can send about 1 character per millisecond. If you try to send 100 bytes at once, this will happen:

  • 63 bytes go into the output buffer, 37 bytes cannot be sent
  • one millisecond later: 1 byte is sent from buffer, another byte can go into the outgoing buffer
    and this happens 37 milliseconds in a row, until all 100 bytes are filled into the outgoing buffer.

So sending 63 bytes (or less) at once will need nearly no time (stuffing them into outgoing buffer).
But sending 100 bytes at once to Serial will last 37 milliseconds and blocks every program execution for 37 milliseconds.
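The arithmetic above can be sketched as a small helper. This is a rough estimate only: blockingMicros is a hypothetical name, and it assumes the TX buffer is empty before the write and 10 bits on the wire per character (8N1 framing).

```cpp
#include <stdint.h>

// Rough estimate of how long a single write of `bytes` blocks, assuming the
// TX buffer is empty beforehand and 10 bits per character (8N1 framing).
// `bufferCapacity` is the usable buffer space (63 in the description above).
uint32_t blockingMicros(uint32_t bytes, uint32_t bufferCapacity, uint32_t baud) {
    if (bytes <= bufferCapacity) return 0;        // everything fits at once
    uint32_t overflow = bytes - bufferCapacity;   // bytes that have to wait
    uint64_t us = (uint64_t)overflow * 10u * 1000000u / baud;
    return (uint32_t)us;
}
```

With 100 bytes, a 63-byte buffer and 9600 baud this gives roughly 38.5 ms, in line with the "about 1 character per millisecond" rule of thumb; at 115200 baud the same write blocks for only about 3.2 ms.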

If you block your program by trying to send more bytes to Serial than fit into the Serial outgoing buffer, while at other times Serial sits idle, you are better off doing your own double buffering for Serial: create an extra outgoing buffer as big as needed for the biggest output, then feed the Serial outgoing buffer from it over time.

jurs: But if you want to send, let's say, 100 bytes at once to Serial, your application will be blocked for some time.

Unless you edit the Serial library to increase the buffer size. On my Due projects, I use a 1K buffer...

Regards, Ray L.

On Arduino the buffer works only for reading; writing is synchronous. For an RTOS, it is poor in resources.

vbextreme:
On Arduino the buffer works only for reading; writing is synchronous.

I think hardwareserial tells a different story

// Define constants and variables for buffering incoming serial data.  We're
// using a ring buffer (I think), in which head is the index of the location
// to which to write the next incoming character and tail is the index of the
// location from which to read.
// NOTE: a "power of 2" buffer size is reccomended to dramatically
//       optimize all the modulo operations for ring buffers.
#if !defined(SERIAL_TX_BUFFER_SIZE)
#if (RAMEND < 1000)
#define SERIAL_TX_BUFFER_SIZE 16
#else
#define SERIAL_TX_BUFFER_SIZE 64
#endif
#endif
#if !defined(SERIAL_RX_BUFFER_SIZE)
#if (RAMEND < 1000)
#define SERIAL_RX_BUFFER_SIZE 16
#else
#define SERIAL_RX_BUFFER_SIZE 64
#endif
#endif

Oops, I remembered wrong, sorry.

size_t HardwareSerial::write(uint8_t c)
{
  _written = true;
  // If the buffer and the data register is empty, just write the byte
  // to the data register and be done. This shortcut helps
  // significantly improve the effective datarate at high (>
  // 500kbit/s) bitrates, where interrupt overhead becomes a slowdown.
  if (_tx_buffer_head == _tx_buffer_tail && bit_is_set(*_ucsra, UDRE0)) {
    *_udr = c;
    sbi(*_ucsra, TXC0);
    return 1;
  }
  tx_buffer_index_t i = (_tx_buffer_head + 1) % SERIAL_TX_BUFFER_SIZE;

  // If the output buffer is full, there's nothing for it other than to 
  // wait for the interrupt handler to empty it a bit
  while (i == _tx_buffer_tail) {
    if (bit_is_clear(SREG, SREG_I)) {
      // Interrupts are disabled, so we'll have to poll the data
      // register empty flag ourselves. If it is set, pretend an
      // interrupt has happened and call the handler to free up
      // space for us.
      if(bit_is_set(*_ucsra, UDRE0))
        _tx_udr_empty_irq();
    } else {
      // nop, the interrupt handler will free up space for us
    }
  }

  _tx_buffer[_tx_buffer_head] = c;
  _tx_buffer_head = i;

  sbi(*_ucsrb, UDRIE0);
  
  return 1;
}

Nothing like success to clear the waters ...

I rewrote my routine to use char[n] instead of Strings. This required the use of strncat(), dtostrf(), and ltoa() to convert floating point or integer variables into characters and assemble an output string. This resulted in a considerable speed pickup over Strings.
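One thing to watch with strncat(): its third argument limits how many characters are taken from the source, not the size of the destination, so it does not by itself prevent an overflow. A minimal sketch of a bounded append follows; appendBounded is a hypothetical helper, not a library function.

```cpp
#include <stddef.h>
#include <string.h>

// Hypothetical helper: append `src` to the NUL-terminated string in `dst`
// without writing past `dstSize` bytes. Returns false if `src` was truncated.
bool appendBounded(char *dst, size_t dstSize, const char *src) {
    size_t used = strlen(dst);
    if (used + 1 >= dstSize) return src[0] == '\0';  // no room left at all
    size_t room = dstSize - used - 1;                // space excluding the NUL
    size_t n = strlen(src);
    strncat(dst, src, room);                         // copies at most `room` chars
    return n <= room;
}
```

Called once per field (value, comma, value, and so on), it lets the caller notice when the output buffer was too small instead of silently corrupting memory.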

Next, I used Serial.availableForWrite() to determine if the serial transmit buffer had space. If it had space, I'd assemble the array and Serial.print(char_array). The actual Serial.print(char_array) executes very fast if there is room in the TX buffer.

I still had a loop time asymmetry. The loop that triggered on Serial.availableForWrite() and assembled the output string was a third longer than other loops.

So I set up Timer1 as an interrupt. I benchmarked my critical stuff at about 2200us, so I made Timer1 interrupt every 3000us. I put the time-critical stuff in the interrupt service routine (ISR). In the main loop, I put the low-priority housekeeping, output data string fabrication, and the Serial.print (still using availableForWrite to trigger fetching and sending the data).
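As a cross-check of the timer period, here is a minimal sketch of the compare-value arithmetic for a timer in CTC mode. The function names are hypothetical, and a 16 MHz clock with the 1024 prescaler is assumed. In CTC mode the timer counts from 0 up to the compare value inclusive, so one period lasts (compare value + 1) ticks.

```cpp
#include <stdint.h>

// Compare value for a CTC-mode timer: the timer counts 0..OCR inclusive,
// so one period lasts (OCR + 1) ticks. Rounds to the nearest tick.
// The result must fit the timer width (16 bits for Timer1).
uint16_t ctcCompareValue(uint32_t cpuHz, uint32_t prescaler, uint32_t periodUs) {
    uint64_t ticks = ((uint64_t)cpuHz * periodUs + (uint64_t)prescaler * 500000u)
                     / ((uint64_t)prescaler * 1000000u);
    return ticks > 0 ? (uint16_t)(ticks - 1) : 0;
}

// Actual period in microseconds produced by a given compare value.
uint32_t ctcPeriodUs(uint32_t cpuHz, uint32_t prescaler, uint16_t ocr) {
    return (uint32_t)(((uint64_t)(ocr + 1u) * prescaler * 1000000u) / cpuHz);
}
```

Under those assumptions one tick is 64us, so a compare value of 45 yields 2944us and 46 yields 3008us; exactly 3000us is not reachable with the 1024 prescaler.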

The most crucial item is an I2C communication. This requires Wire.h, which uses interrupts. So I had to nest the Wire calls inside my ISR, using sei() before the Wire call. Yuck! Nesting interrupts! But being my most critical requirement, there it is. I'm guaranteed to be flamed for this. To make myself feel better, I put a cli() after the Wire call.

In short, my motor PID controller is running at fixed-frequency 3000us timing (333Hz), which is pretty respectable. Oscilloscope trace looks great. It grabs and spits out a new dataset over UART as soon as the TX buffer has space. It works at 9600 baud as well as 115200. I don't like the nested interrupts, but with a little testing and some error traps, it may prove to be a robust solution.

Can you show how you create the string?

Sure. Here is the completed code framework, including the interrupt and ISR. If you need to assemble using integers, use ltoa() instead of dtostrf().

Disclaimer: someone will invariably flame me for constructing an output string and immediately going to Serial.print(). They will suggest to use a bunch of Serial.print() instead of making an output string altogether. I have obviously removed a lot of code in order to post a stripped-down framework. It is up to you, depending on your application and requirements, if you want to assemble an output string to Serial.print at a later time. In my case, the best way to reach my functionality requirements was to build an output string that I could then Serial.print() at my leisure.

#include <math.h>
#include <Wire.h>

float Variable_1;
float Variable_2;
float Variable_3;
int dataready;

void setup() {
  Serial.begin(115200);
  delay(100);
  Serial.println(" Starting Init");
  Wire.begin();
  delay(100);
  // put your init routine here
  // this is where I setup my I2C device and init variables
  Serial.println("Finished Init");
}

void loop() { 
  char tmp1[10];
  char tmp2[10];
  char tmp3[10];
  char outputstring[60];
  int serialbuffer;
  
  float Variable_1_Buffered;
  float Variable_2_Buffered;
  float Variable_3_Buffered;

  Serial.println("Enable Interrupt");
  cli(); 
  TCCR1A = 0;
  TCCR1B = 0;
  OCR1A = 45;               // set interrupt at about 3000us (using 1024 prescaler)
  TCCR1B |= (1 << WGM12);   // turn on CTC mode
  TCCR1B |= (1 << CS10);    // CS10 and CS12 together
  TCCR1B |= (1 << CS12);    // select the 1024 prescaler
  TIMSK1 |= (1 << OCIE1A);  // enable timer compare interrupt
  sei();
  Serial.println("Interrupt Enabled");
  // forever loop
  while(1 > 0){
    serialbuffer = Serial.availableForWrite();  // check serial TX buffer
    if(serialbuffer > 60){						// if serial TX buffer is (nearly) empty
      if(dataready > 0) {						// if data is ready
        Variable_1_Buffered = Variable_1;		// Grab the global variables into local variables
        Variable_2_Buffered = Variable_2;
        Variable_3_Buffered = Variable_3;
        dtostrf(Variable_1_Buffered,6,4,tmp1);	// Convert variables into char[] strings
        dtostrf(Variable_2_Buffered,6,4,tmp2);
        dtostrf(Variable_3_Buffered,6,4,tmp3);

        outputstring[0] = (char)0;				// reset the output char[] string
        strncat(outputstring,tmp1,5);			// Assemble output char[] strings
        strncat(outputstring,",",1);
        strncat(outputstring,tmp2,5);
        strncat(outputstring,",",1);
        strncat(outputstring,tmp3,5);
        Serial.println(outputstring);			// Print the output char[] string
        dataready = 0;							// clear the flag and wait for new data
      }
    }
  }
}

ISR(TIMER1_COMPA_vect) {
  // Perform mission-critical calculations
  // and save parameters as global variables
  Variable_1 = ... ;
  Variable_2 = ... ;
  Variable_3 = ... ;
  // set global flag that data is ready to be printed
  dataready = 1;
}

I think you need something like this

#include <math.h>
#include <Wire.h>

volatile double Variable_1;
volatile double Variable_2;
volatile double Variable_3;
volatile int dataready = 0;

char doublebuffer[60];
byte ir;
byte onsend;

ISR(TIMER1_COMPA_vect) 
{
  if ( dataready ) return;
  // Perform mission-critical calculations
  // and save parameters as global variables
  Variable_1 = 0;
  Variable_2 = 0;
  Variable_3 = 0;
  // set global flag that data is ready to be printed
  dataready = 1;
}


void setup() {
  Serial.begin(115200);
  delay(100);
  Serial.println(" Starting Init");
  Wire.begin();
  delay(100);
  // put your init routine here
  // this is where I setup my I2C device and init variables

  Serial.println("Enable Interrupt");
  cli(); 
  TCCR1A = 0;
  TCCR1B = 0;
  OCR1A = 45;               // set interrupt at about 3000us (using 1024 prescaler)
  TCCR1B |= (1 << WGM12);   // turn on CTC mode
  TCCR1B |= (1 << CS10);    // CS10 and CS12 together
  TCCR1B |= (1 << CS12);    // select the 1024 prescaler
  TIMSK1 |= (1 << OCIE1A);  // enable timer compare interrupt
  sei();
  Serial.println("Interrupt Enabled");

  ir = 0;
  onsend = 0;
  
  Serial.println("Finished Init");
}

void loop() 
{ 
    if ( !onsend && dataready )
    {
        sprintf(doublebuffer,"%f,%f,%f\r\n", Variable_1, Variable_2, Variable_3);
        ir = 0;
        onsend = !onsend;
    }
  
    int mxs;
    if ( onsend && (mxs = Serial.availableForWrite()))
    {
        while ( mxs-- && doublebuffer[ir] )
            Serial.write(doublebuffer[ir++]);
            
        if ( !doublebuffer[ir] )
        {
            onsend = !onsend;
            dataready = 0;
        }
    }
}

Although sprintf() is slow, it is definitely more powerful than your approach.
The main cost in yours is that strcat() has to perform an internal strlen() on every concatenation.
Alternatively, you could replace strcpy() with a more useful version that returns a pointer to the end of the copied string:

char* cstrcpy(char* d, const char* s)
{
    while ( (*d++ = *s++) );
    return d-1;
}

usage:

    const char *a = "hello";
    const char *b = " ";
    const char *c = "world";
    const char *e = "!";
    
    char dest[80];
    char* d;
    
    d = cstrcpy(dest,a);
    d = cstrcpy(d,b);
    d = cstrcpy(d,c);
    d = cstrcpy(d,e);
    
    Serial.println(dest);

Sorry, I had forgotten the AVR limitations of sprintf(): the default AVR build does not support %f. Here is a version that uses dtostrf() instead:

#include <math.h>
#include <Wire.h>

volatile double Variable_1;
volatile double Variable_2;
volatile double Variable_3;
volatile int dataready;

char doublebuffer[60];
byte ir;
byte onsend;

ISR(TIMER1_COMPA_vect) 
{
  if ( dataready ) return;
  // Perform mission-critical calculations
  // and save parameters as global variables
  Variable_1 = 0;
  Variable_2 = 0;
  Variable_3 = 0;
  // set global flag that data is ready to be printed
  dataready = 1;
}


void setup() {
  Serial.begin(115200);
  delay(100);
  Serial.println(" Starting Init");
  Wire.begin();
  delay(100);
  // put your init routine here
  // this is where I setup my I2C device and init variables

  Serial.println("Enable Interrupt");
  cli(); 
  TCCR1A = 0;
  TCCR1B = 0;
  OCR1A = 45;               // set interrupt at about 3000us (using 1024 prescaler)
  TCCR1B |= (1 << WGM12);   // turn on CTC mode
  TCCR1B |= (1 << CS10);    // CS10 and CS12 together
  TCCR1B |= (1 << CS12);    // select the 1024 prescaler
  TIMSK1 |= (1 << OCIE1A);  // enable timer compare interrupt
  sei();
  Serial.println("Interrupt Enabled");

  ir = 0;
  onsend = 0;
  
  Serial.println("Finished Init");
}

char* cstrcpy(char* d, const char* s)
{
    while ( (*d++ = *s++) );
    return d-1;
}

void loop() 
{ 
    if ( !onsend && dataready )
    {
        char fbuf[20];
        char* b = doublebuffer;
        b = cstrcpy(b, dtostrf(Variable_1, 6, 4, fbuf));
        *b++ = ',';
        b = cstrcpy(b, dtostrf(Variable_2, 6, 4, fbuf));
        *b++ = ',';
        b = cstrcpy(b, dtostrf(Variable_3, 6, 4, fbuf));
        cstrcpy(b, "\r\n");   // end the line, as the sprintf() version did
        
        ir = 0;
        onsend = !onsend;
    }
  
    int mxs;
    if ( onsend && (mxs = Serial.availableForWrite()))
    {
        while ( mxs-- && doublebuffer[ir] )
            Serial.write(doublebuffer[ir++]);
            
        if ( !doublebuffer[ir] )
        {
            onsend = !onsend;
            dataready = 0;
        }
    }
}

Thanks for the contribution, vbextreme. I’m interested if the cstrcpy() is more efficient than strncat(). I’ll peek under the hood at the actual strncat() code. I suspect cstrcpy() will have fewer instructions (therefore fewer clock cycles). Overall, your implementation is more elegant.

Nice catch on the volatile declaration. This contribution belongs solidly in this thread about multitasking. Changing globals in ISRs is risky business indeed, and the volatile declaration is important. Volatile declaration would also apply to multitasking in OS48 or similar environments where variables may be changed in spawned tasks.
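A related point: volatile only forces the compiler to re-read the variable; it does not make a multi-byte read atomic. On an 8-bit AVR a 4-byte float is read one byte at a time, so an ISR could change it halfway through. A minimal sketch of taking a consistent snapshot with interrupts briefly masked is below. g_a, g_b, g_c, Sample and readSample are hypothetical names, and the stubs in the #ifndef block exist only so the sketch also compiles off-target.

```cpp
#include <stdint.h>

#ifndef ARDUINO
// Host-side stand-ins so the sketch compiles off-target; on an Arduino the
// real noInterrupts()/interrupts() from the core are used instead.
static int g_maskDepth = 0;
inline void noInterrupts() { ++g_maskDepth; }
inline void interrupts()   { --g_maskDepth; }
#endif

// Values written by the ISR and read by the main loop (hypothetical names).
volatile float g_a = 0.0f;
volatile float g_b = 0.0f;
volatile float g_c = 0.0f;

struct Sample { float a; float b; float c; };

// Copy all three values while interrupts are masked, so the ISR cannot
// update them halfway through the copy. Keep the masked region short.
Sample readSample() {
    Sample s;
    noInterrupts();
    s.a = g_a;
    s.b = g_b;
    s.c = g_c;
    interrupts();   // note: re-enables unconditionally; call from loop() only
    return s;
}
```

The slow work (dtostrf(), string assembly, Serial output) then runs on the local copy with interrupts enabled again.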