Real time-sharing based multi-threading on AVR/Arduino

I need a "kind of" real multi-threading library on AVR/Arduino, e.g. define a function and start a thread which executes it indefinitely in the background.

Does any library or example exist for this?

FYI: I'm not talking about having multiple functions called inside the main loop to simulate threading that way. I need the real thing, e.g. with _startThread(void()) etc.

example usage :

In my case I need precise GPS calculations, and as the AVR supports only 32-bit floats I used the Float64 and Math64 libraries, which provide 64-bit float support, and built the GPS calculations on top of them. Although I managed to speed them up about 100x, they still take about 150-500 ms, which is a serious amount of time for everything else to stop. Hence I need some kind of background threading model based on time sharing. Before building it myself on top of a timer interrupt, I want to make sure I'm not reinventing the wheel; if something is available off the shelf, I'm happy to use it.

btw: The GPS library can be found here :

vtomanov:
but although I managed to improve them about 100x they still take about 150-500 millis which is serious time for all other things to stop - hence I need some kind of background threading model based on time sharing

It sounds to me like you are hoping to conjure up extra CPU cycles out of thin air.

The Blink Without Delay technique is about the most efficient "multi-tasking" system you will get. Everything else just dresses that up in fancy clothes at the cost of using up extra CPU cycles.

The demo Several Things at a Time is an extended example of BWoD
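For readers who have not seen it, the Blink Without Delay idea boils down to comparing timestamps instead of calling delay(). Below is a minimal sketch of that pattern, written as plain C++ so it also compiles off-target. The names PeriodicTask, runIfDue, toggleLed and blinkCount are made up for illustration; on an Arduino the nowMs argument would come from millis() and runIfDue would be called from loop().

```cpp
#include <cstdint>

// One periodic job: run `action` whenever `intervalMs` has elapsed.
struct PeriodicTask {
    uint32_t intervalMs;  // how often the task should run
    uint32_t lastRunMs;   // timestamp of the previous run
    void (*action)();     // work to do; it must return quickly
};

// Hypothetical stand-in for real work such as toggling an LED pin.
int blinkCount = 0;
void toggleLed() { ++blinkCount; }

// Call this on every pass of the main loop with the current time.
// The unsigned subtraction stays correct when the counter wraps around.
bool runIfDue(PeriodicTask &task, uint32_t nowMs) {
    if (nowMs - task.lastRunMs >= task.intervalMs) {
        task.lastRunMs = nowMs;
        task.action();
        return true;
    }
    return false;
}
```

This only works well if every action is short. A single 150-500 ms calculation inside one action would still stall all the other tasks, which is the problem the original post describes.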

It is entirely possible that you just need a faster microprocessor.

If you want more advice you will need to post your program.

...R

You can certainly write a time-slicing kernel that runs on the Arduino (I haven't checked to see if anyone else already has, but I wouldn't be surprised if it exists). The biggest problem is going to be memory constraints. You need to have enough memory to statically allocate a stack and space to store registers for each task. You will also need to decide just how much to "snapshot" when switching tasks. Normally, this is only the CPU registers, but gets more complicated when there are a large number of on-board devices. You'll probably also have to implement some sort of semaphores and locking mechanisms to allow waiting for threads to complete, prevent collisions on global variables and hardware registers, etc.

It's a lot of work, and has lots of caveats, but could be a fun project. For those of you who say it won't work, keep in mind that such kernels have been written for 8-bit (6502, 8080, Z80, etc.) computers with just a few K of RAM.

Oh, one more thing - this will all have to be written in assembly language.

--Doug

I wrote a little one once for a 6809 - nicest 8-bitter ever made. Assy of course - and a long time ago.

regards

Allan.

"Precise GPS calculations" Well, floating point isn't precise. As you've found out a 32-bit number can't represent every meter of the earth's surface. I suggest that if you're using 64-bit floats, you're solving the wrong problem. A local coordinate system is one possible method.
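One way to read the local coordinate system suggestion: keep positions as integers, subtract a nearby origin exactly, and only then convert the small difference to metres in 32-bit float. The sketch below assumes positions are available as int32_t values in units of 1e-7 degrees and uses a flat-earth (equirectangular) approximation; toLocal, localDistance and LocalXY are illustrative names. It is an approximation, not a replacement for the haversine formula, and its error grows with distance from the origin.

```cpp
#include <cmath>
#include <cstdint>

// Position relative to a local origin, in metres.
struct LocalXY {
    float eastM;
    float northM;
};

// Metres per degree of arc on a sphere of radius 6371000 m (R * pi / 180).
const float kMetersPerDegree = 111194.93f;
const float kDegToRadF = 0.017453293f;

// Assumption: inputs are degrees * 1e7 stored in int32_t, and the point lies
// close to the origin (no handling of the +/-180 degree meridian).
LocalXY toLocal(int32_t latE7, int32_t lonE7,
                int32_t originLatE7, int32_t originLonE7) {
    int32_t dLatE7 = latE7 - originLatE7;  // exact integer difference
    int32_t dLonE7 = lonE7 - originLonE7;
    float originLatRad = (originLatE7 * 1e-7f) * kDegToRadF;
    LocalXY p;
    p.northM = dLatE7 * 1e-7f * kMetersPerDegree;
    // Meridians converge towards the poles, so scale east-west by cos(lat).
    p.eastM = dLonE7 * 1e-7f * kMetersPerDegree * std::cos(originLatRad);
    return p;
}

// Straight-line distance between two local points, in metres.
float localDistance(const LocalXY &a, const LocalXY &b) {
    float dx = a.eastM - b.eastM;
    float dy = a.northM - b.northM;
    return std::sqrt(dx * dx + dy * dy);
}
```

Because the large part of each coordinate cancels in the integer subtraction, the float only has to represent the small offset, which is where 32 bits are enough.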

One problem with a multi-threaded operating system is access to the peripherals. You probably have sensors and other devices connected on SPI or I2C. You can't stop an SPI transfer in the middle and go and run some other function. Even worse if that other function wants to access a different SPI device. So the operating system has to provide 'drivers' to do that low-level device access independent of the user threads. Once you do this, you've filled the Uno's memory and there's no space for user threads.

If a single GPS calculation is taking too long, can you split it into smaller chunks and only do half the calculation before returning to the main loop?

It's probably time to upgrade to a 32-bit Arduino. The Due is nominally discontinued but still available and still supported. Unfortunately it's only 3.3V, so it won't plug straight into your system. The Teensy 3.2 is 5V-tolerant but it's a different shape, so it may or may not help you. The Teensy 3.5 is significantly faster again and it has hardware floating-point.

The idea is to have one background thread and one foreground thread running the main loop {} - all external devices to be handled from the foreground thread and the background thread to be used for heavy calculation e.g. like Gps64 calculation and communicate with the foreground using shared variables.

The switching can be done using a Timer1 interrupt, and noInterrupts() and interrupts() can be used for synchronization.
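The reason shared variables need guarding here is that an 8-bit AVR reads and writes multi-byte values one byte at a time, so a timer-driven switch can land in the middle of an access. Below is a minimal sketch of a guarded wrapper. It is written for a host compiler, so enterCritical and leaveCritical are hypothetical stand-ins that only track a flag; on the target they would save the interrupt state, call noInterrupts(), and restore the state afterwards. Shared is an illustrative name, not the API of any existing library.

```cpp
#include <cstdint>

// Host-side stand-ins (assumption): on an AVR, enterCritical() would save
// SREG and call noInterrupts(), and leaveCritical() would restore SREG.
bool g_interruptsEnabled = true;

bool enterCritical() {
    bool wasEnabled = g_interruptsEnabled;
    g_interruptsEnabled = false;
    return wasEnabled;
}

void leaveCritical(bool wasEnabled) {
    // Restore the previous state so that nested critical sections work.
    g_interruptsEnabled = wasEnabled;
}

// Wrapper for a simple value shared between foreground and background code.
template <typename T>
class Shared {
public:
    explicit Shared(T initial) : value_(initial) {}

    // Copy the value out while no task switch can occur.
    T get() const {
        bool was = enterCritical();
        T copy = value_;
        leaveCritical(was);
        return copy;
    }

    // Store a new value while no task switch can occur.
    void set(T v) {
        bool was = enterCritical();
        value_ = v;
        leaveCritical(was);
    }

private:
    volatile T value_;
};
```

Restoring the saved state, rather than unconditionally re-enabling interrupts, keeps the wrapper safe to call from code that has already disabled them.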

I actually think it can all be written in C, including the ugly jumps from one place in the code to another using labels, which are required for starting the two threads :)

And yes, it is kind of ugly in C, but it will be a lot more readable for other people who decide to use it. And yes, SRAM is the problem: even on the ATmega2560 with 8 KB, memory is the main issue, because writing it in C generates a lot of wrapper code, and I need to push and pop all 32 registers plus some extra pushing and popping from the wrapper. That means about ~300 bytes of stack are lost per thread for thread switching, and of course it will be slower than having everything in one thread.

I have started to build the lib. I'm still having issues and it is very difficult to debug :) but using avr-objdump -S to see the actual assembly code generated helps a lot. Initial sketch uploaded:

vtomanov:
The idea is to have one background thread and one foreground thread running the main loop {} - all external devices to be handled from the foreground thread and the background thread to be used for heavy calculation e.g. like Gps64 calculation and communicate with the foreground using shared variables.

The switching can be done using Timer1 interrupt and for synchronization can be used noInterrupts() and interrupts();

It seems to me all that Thread switching will just use up CPU cycles that could better be used for getting your calculations done?

What am I missing?

...R

Yes - correct - but the time spent on thread switching is tiny compared to a 12 sec calculation...

Robin2:
It seems to me all that Thread switching will just use up CPU cycles that could better be used for getting your calculations done?

What am I missing?

...R

Sorry if I seem dense ...

From your Original Post I got the impression that something that takes 150-500 millis is too slow. And to my mind a threading system will just make it slower.

Can you not just break up the long task into smaller pieces so that no single piece takes too much time (whatever "too much" might be)?

I don't understand where the 12 seconds comes from?

...R

If I understand the thread correctly, vtomanov has a calculation that takes 'ages' to complete. To not block the rest of the Arduino program, he wants to have this calculation running in a background process and give it a slice of time every so-now-and-then. This will slow down the calculation but at least keeps the other process (e.g. monitoring of buttons) responsive.

Whether 'ages' is 500ms or 12 seconds is quite irrelevant ;)

Looking at the code on GitHub (vtomanov/Gps64: GPS Calculations for Arduino (distance, bearing and destination point)), the function below takes 150 ms:

// Fast distance between two coordinates in meters
inline f64 dist64o(f64 & latStart,
                   f64 & lngStart,

                   f64 & latEnd,
                   f64 & lngEnd)
{
  f64 _latStart(dtor64(latStart));
  f64 _lngStart(dtor64(lngStart));

  f64 _latEnd(dtor64(latEnd));
  f64 _lngEnd(dtor64(lngEnd));

  f64 dlon(_lngEnd - _lngStart);
  f64 dlat(_latEnd - _latStart);

  f64 sin64_dlat(sin64o(dlat / f64(2L)));
  f64 sin64_dlon(sin64o(dlon / f64(2L)));

  f64 a(sin64_dlat * sin64_dlat + cos64o(_latStart) * cos64o(_latEnd) * sin64_dlon * sin64_dlon);
  f64 c(f64(2L) * atan264o(sqrt64o(a), sqrt64o(f64(1L) - a)));
  return f64(6371000L) * c;
}

Improvements can be made by implementing it as some form of state machine that is called repeatedly until the calculation has completed. The state machine needs to do the smallest possible step on each call and keep a flag that indicates whether the calculation is complete. How effective that is remains to be seen.
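A rough sketch of that state machine idea applied to the distance function above. To keep it compilable on a host it uses double and the standard <cmath> functions as stand-ins for f64, sin64o, cos64o and so on; the DistanceJob class and its state names are invented for illustration and are not part of Gps64. Each call to step() performs one of the expensive operations and then returns, so the main loop can service other work in between.

```cpp
#include <cmath>

const double kDegToRad = 3.14159265358979323846 / 180.0;
const double kEarthRadiusM = 6371000.0;

class DistanceJob {
public:
    // Store the inputs (converted to radians) and arm the state machine.
    void start(double latStartDeg, double lngStartDeg,
               double latEndDeg, double lngEndDeg) {
        latStart_ = latStartDeg * kDegToRad;
        lngStart_ = lngStartDeg * kDegToRad;
        latEnd_ = latEndDeg * kDegToRad;
        lngEnd_ = lngEndDeg * kDegToRad;
        state_ = kSinLat;
    }

    // Do one small piece of work; returns true once the result is ready.
    bool step() {
        switch (state_) {
        case kSinLat:
            sinDlat_ = std::sin((latEnd_ - latStart_) / 2.0);
            state_ = kSinLon;
            break;
        case kSinLon:
            sinDlon_ = std::sin((lngEnd_ - lngStart_) / 2.0);
            state_ = kCosStart;
            break;
        case kCosStart:
            cosStart_ = std::cos(latStart_);
            state_ = kCosEnd;
            break;
        case kCosEnd:
            cosEnd_ = std::cos(latEnd_);
            state_ = kCombine;
            break;
        case kCombine: {
            double a = sinDlat_ * sinDlat_ +
                       cosStart_ * cosEnd_ * sinDlon_ * sinDlon_;
            result_ = kEarthRadiusM * 2.0 *
                      std::atan2(std::sqrt(a), std::sqrt(1.0 - a));
            state_ = kDone;
            break;
        }
        default:  // kIdle or kDone: nothing left to do
            break;
        }
        return state_ == kDone;
    }

    bool done() const { return state_ == kDone; }
    double result() const { return result_; }

private:
    enum State { kIdle, kSinLat, kSinLon, kCosStart, kCosEnd, kCombine, kDone };
    State state_ = kIdle;
    double latStart_ = 0, lngStart_ = 0, latEnd_ = 0, lngEnd_ = 0;
    double sinDlat_ = 0, sinDlon_ = 0, cosStart_ = 0, cosEnd_ = 0;
    double result_ = 0;
};
```

In a sketch, loop() would call step() once per pass and read result() when it returns true. The granularity is limited by the slowest single operation, so if one 64-bit trig call is itself too slow, this approach cannot split it any further without changes inside the math library.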

Exactly as described by sterretje. BTW: the first version is implemented, tested and working with an example:

This revision:

  • gives equal time to both the foreground (the main loop function) and whatever function is chosen to run in the background.

  • supports only one foreground thread and one background thread.

  • also provides templates for safe access to shared ordinal variables.

  • together with the threading, the user also gets a timer counter that increments a variable every 10 ms, which can be used for patterns like: many function calls in the main loop, each one started every X milliseconds.

Next:

I will build a real-world example including the threading and the timer ...

Ref:

The code of the lib is in a single .h file, and I try to use as little assembler as possible to make it easy to understand for most people.

A real-world, slightly more complicated example has also been added:

Provider(in foreground) -> Processor(in background) -> Consumer(in foreground)
and also two separate functions on a timer in the foreground.
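The Provider -> Processor -> Consumer flow can be pictured as two single-slot mailboxes between three stages. The sketch below is a host-side illustration only, not the code from the library. Mailbox, provide, processOnce, consumeOnce and the doubling "calculation" are all invented for the example. On the target, the bodies of put() and take() would additionally need to be protected against a task switch, for example with noInterrupts() and interrupts().

```cpp
// Single-slot mailbox: holds at most one item at a time.
template <typename T>
class Mailbox {
public:
    // Store an item if the slot is free; returns false if it is occupied.
    bool put(const T &item) {
        if (full_) return false;
        item_ = item;
        full_ = true;
        return true;
    }

    // Remove the item if there is one; returns false if the slot is empty.
    bool take(T &out) {
        if (!full_) return false;
        out = item_;
        full_ = false;
        return true;
    }

    bool full() const { return full_; }

private:
    T item_{};
    volatile bool full_ = false;
};

Mailbox<int> toProcessor;  // provider -> processor
Mailbox<int> toConsumer;   // processor -> consumer
int consumedTotal = 0;

// Foreground: hand a raw reading to the background stage.
bool provide(int reading) { return toProcessor.put(reading); }

// Background: stands in for the heavy calculation (here it just doubles).
void processOnce() {
    int raw;
    // Only take new input when there is room for the output.
    if (!toConsumer.full() && toProcessor.take(raw)) {
        toConsumer.put(raw * 2);
    }
}

// Foreground: pick up a finished result, if any.
bool consumeOnce() {
    int result;
    if (!toConsumer.take(result)) return false;
    consumedTotal += result;
    return true;
}
```

With single slots the provider simply gets false back while the processor is busy, so the foreground can drop or retry the reading without ever blocking.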

Deficiencies: it can lead to some wasted cycles, but the more complicated the sketch becomes (as a real-world one will), the less idle time the CPU will have.

New macros and a simplified example for the producer->processor->consumer pattern have been added.

Added a full-featured new example. Using it you can build, with an Arduino, a fixed marine GPS for pennies, similar to this model: (all calculations and code are in the example; just add the GPS sensor and a display, and send the data to the display instead of the serial port. It includes reset of the starting point and simultaneous calculation of average speed, distance and bearing from the starting point, and current distance, speed and bearing from the last point.)

Example: Thread64WithGps64Example.ino

Updated: Thread64WithGps64Example.ino now uses Yield to solve (and demonstrate how to solve) the problem of unused CPU cycles.