Strtok() bites me again

PieterP · January 22, 2022, 4:38pm

These kinds of parsers and other state machines can often be replaced by simpler coroutines.
For example, the following tokenize function is easier to read and follow. It doesn't need to manually restore the string or rely on static, non-reentrant state, and it doesn't invert the control flow: the parser returns its tokens to the caller, so you don't have to provide a callback for the parser to call.

mp_coro::generator<str_view> tokenize(str_view s, str_view separators) {
    auto prev_split = begin(s);
    for (auto it = begin(s); it != end(s); ++it) { // iterate over the string
        if (separators.contains(*it)) { // if the current character is a separator
            co_yield {prev_split, it};  // return the substring
            prev_split = it + 1;        // skip over the separator
        }
    }
    if (!s.empty()) // return no substrings if the original string is empty
        co_yield {prev_split, end(s)}; // from the final separator to the end
}

int main() {
    char arrayA[] {"a b/c/d e f/g"};
    println("parsing '", arrayA, "'");
    for (auto substr : tokenize(arrayA, " ")) {
        println("'", substr, "'");
        for (auto subpart : tokenize(substr, "/")) {
            println("  → '", subpart, "'");
        }
    }
}

parsing 'a b/c/d e f/g'
'a'
  → 'a'
'b/c/d'
  → 'b'
  → 'c'
  → 'd'
'e'
  → 'e'
'f/g'
  → 'f'
  → 'g'

Full working example: Compiler Explorer

The co_yield keyword “returns” a value to the caller, but unlike a traditional return statement, it does not terminate the function: all your local variables are still intact, and when you resume the function later, it continues executing right after the co_yield statement.
This is similar to generators in Python, for example (where they use the yield keyword).

Unfortunately, the ESP32 Arduino core still uses GCC 8.4, whereas you need GCC 11 for coroutines (*), so you'll have to wait for them to upgrade before you can use it in practice.

(*) Or GCC 10 with the -fcoroutines flag.

J-M-L · January 22, 2022, 5:39pm

cool stuff but I guess the point of the example was to have something that runs on a UNO.

I doubt that #include <mp-coro/generator.h> will be recognized if you try to port that example to an Arduino UNO

PieterP · January 22, 2022, 6:24pm

Sure, it was just a simple demonstration of how the principle of coroutines can simplify code like this.

mp-coro is a header-only library, so it is easy to port to Arduino: you just put it in your ~/Arduino/libraries folder. The generator class definition and the necessary boilerplate for the promise and iterator types take less than 100 lines of code, with no external dependencies. You could simply implement your own generator if you wanted to; I just used this library for simplicity.
It relies on the standard library of course, so it won't work on AVR boards, but the Arduino ecosystem is much more than these outdated 8-bit microcontrollers.
I've successfully used coroutines on the Raspberry Pi Pico (which allows using a modern compiler much more easily than the Arduino IDE), and it wouldn't be a problem on ESP32 or ARM-based Arduino boards either, once they upgrade their compilers.

Coroutines are pretty new, and we'll see improvements in compiler optimization (e.g. of the coroutine frame allocation) and perhaps some bug fixes, but they're interesting to try out already: they greatly simplify things like parsers, state machines, cooperative multitasking, asynchronous operations, and so on. Especially the latter would be really useful on network-enabled microcontrollers like the ESP32.

The goal is to standardize things like generator in the C++ standard library, but they don't want to rush it because standards are hard to change if you get it wrong. So for the coming years we'll have to rely on other libraries (or your own implementations) to provide coroutine utilities.

J-M-L · January 22, 2022, 6:29pm

thanks for the explanation. good stuff there!

alto777 · January 22, 2022, 6:48pm

Haha, new to whom? Or what?

a7

PieterP · January 22, 2022, 6:54pm

C++ compiler developers

They were introduced in C++20, the latest standard. Of course they were available in other languages way before that (since the 1950s or 60s IIRC).

bperrybap · January 22, 2022, 7:31pm

There are multiple versions of the xxprintf() libraries on the AVR.
The default one does not support floating point to save code space.
To get floating point and double support you have to link in a different (larger) version than the default version.
More than 10 years ago the Arduino.cc team was even offered the code to update the preferences GUI to be able to select the different versions of the AVR libraries. It was nice and worked well.
They turned it down.
Their decision to not support a printf() method in the Print class is not based on technical reasons.
It is a choice/preference.

GigaNerdTheReckoning · January 23, 2022, 6:50am

Good to know about this issue with strtok() on ESP. I'm wondering whether you could replace practically every function that deals with char arrays and byte arrays with some for-loops and/or while-loops (to either read or manipulate the arrays), if future compatibility issues arise?

J-M-L · January 23, 2022, 9:20am

Those functions work with loops and char arrays too, but functions like strtok() depend on a single static index or buffer. They were written a loooong time ago.
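As an illustration of why that single hidden state matters (a general sketch, not necessarily the exact problem from the start of this thread), consider what happens when two strtok() loops are nested. `nested_strtok` is a hypothetical helper name.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Splits `text` on spaces and, inside that loop, splits each word on '/'.
// Returns the words that the OUTER loop actually got to see.
// Note: strtok() modifies `text` in place by writing '\0' over separators.
std::vector<std::string> nested_strtok(char *text) {
    std::vector<std::string> seen;
    for (char *word = std::strtok(text, " "); word != nullptr;
         word = std::strtok(nullptr, " ")) { // resumes from the INNER state!
        seen.push_back(word);
        for (char *part = std::strtok(word, "/"); part != nullptr;
             part = std::strtok(nullptr, "/")) {
            // sub-tokens would be processed here
        }
    }
    return seen;
}
```

Called on a writable buffer containing `a b/c/d e f/g`, this returns only `a`. The inner loop runs to the end of the first word and leaves strtok()'s saved position there, so the outer loop's next call finds nothing left and stops early.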

You just need to know about the risks. You can use them when it makes sense, write your own, or use a safe alternative; reentrancy and thread safety are not new concerns.
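One way to "write your own" is to keep all the state in an object that the caller owns, so that several tokenizers can be active at once. The sketch below is one possible design, not a standard API: `Tokenizer` and `split_nested` are hypothetical names, and it assumes C++17 for `std::string_view`, so it will not build on AVR boards without a standard library. Unlike strtok(), it never modifies the input and it keeps empty tokens between adjacent separators.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Reentrant tokenizer sketch: no static state, input is left untouched.
class Tokenizer {
  public:
    Tokenizer(std::string_view text, std::string_view separators)
        : rest_(text), seps_(separators), done_(text.empty()) {}

    // Stores the next token in `token` and returns true,
    // or returns false when there are no tokens left.
    bool next(std::string_view &token) {
        if (done_) return false;
        const std::size_t pos = rest_.find_first_of(seps_);
        token = rest_.substr(0, pos); // npos means "the whole remainder"
        if (pos == std::string_view::npos)
            done_ = true;                 // that was the last token
        else
            rest_.remove_prefix(pos + 1); // skip token and separator
        return true;
    }

  private:
    std::string_view rest_; // part of the input not yet consumed
    std::string_view seps_; // separator characters
    bool done_;             // true once the last token was returned
};

// Nested use: split on spaces, then split each word on '/'.
std::vector<std::string_view> split_nested(std::string_view text) {
    std::vector<std::string_view> parts;
    Tokenizer words(text, " ");
    std::string_view word, part;
    while (words.next(word)) {
        Tokenizer pieces(word, "/"); // independent state per tokenizer
        while (pieces.next(part)) parts.push_back(part);
    }
    return parts;
}
```

For the input `a b/c/d e f/g`, `split_nested` returns `a`, `b`, `c`, `d`, `e`, `f`, `g`, because the outer and inner tokenizers do not share any state. Where the C library provides it, strtok_r() follows the same idea by having the caller pass in the saved position.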

system · July 22, 2022, 9:21am

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
