These kinds of parsers and other state machines can often be replaced by simpler coroutines.
For example, the following tokenize function is easier to read and follow: it doesn't need to manually restore the string or rely on static, non-reentrant state, and it doesn't invert the control flow (i.e. the parser returns its tokens to the caller, so you don't have to provide a callback for the parser to call).
mp_coro::generator<str_view> tokenize(str_view s, str_view separators) {
    auto prev_split = begin(s);
    for (auto it = begin(s); it != end(s); ++it) { // iterate over the string
        if (separators.contains(*it)) {            // if the current character is a separator
            co_yield {prev_split, it};             // return the substring
            prev_split = it + 1;                   // skip over the separator
        }
    }
    if (!s.empty())                    // return no substrings if the original string is empty
        co_yield {prev_split, end(s)}; // from the final separator to the end
}
int main() {
    char arrayA[] {"a b/c/d e f/g"};
    println("parsing '", arrayA, "'");
    for (auto substr : tokenize(arrayA, " ")) {
        println("'", substr, "'");
        for (auto subpart : tokenize(substr, "/")) {
            println(" → '", subpart, "'");
        }
    }
}

parsing 'a b/c/d e f/g'
'a'
 → 'a'
'b/c/d'
 → 'b'
 → 'c'
 → 'd'
'e'
 → 'e'
'f/g'
 → 'f'
 → 'g'
Full working example: Compiler Explorer
The co_yield keyword "returns" a value to the caller, but unlike a traditional return statement, it does not terminate the function: all your local variables are still intact, and when you resume the function later, it continues executing right after the co_yield statement.
This is similar to generators in Python, for example (where they use the yield keyword).
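To see what co_yield saves you from, here is a rough sketch of the same tokenizer written by hand, without coroutines. Everything the coroutine keeps alive between two co_yield statements (the split position, the current position, and whether it has finished) has to be moved into member variables, and each call to next() has to pick up where the previous one stopped. The names ManualTokenizer, next and collect_tokens are made up for this illustration, and it uses std::string_view and std::optional instead of the str_view and mp_coro::generator types from the example above, so that it compiles as plain C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical hand-written counterpart of the tokenize() generator.
// The coroutine's local variables become members so they survive between calls.
class ManualTokenizer {
  public:
    ManualTokenizer(std::string_view s, std::string_view separators)
        : s_(s), separators_(separators), done_(s.empty()) {}

    // Returns the next token, or std::nullopt once the input is exhausted.
    std::optional<std::string_view> next() {
        if (done_)
            return std::nullopt;
        // Resume scanning at the position where the previous call stopped.
        for (; pos_ < s_.size(); ++pos_) {
            if (separators_.find(s_[pos_]) != std::string_view::npos) {
                auto token = s_.substr(prev_split_, pos_ - prev_split_);
                prev_split_ = ++pos_; // skip over the separator
                return token;
            }
        }
        done_ = true; // last token: from the final separator to the end
        assert(prev_split_ <= s_.size());
        return s_.substr(prev_split_);
    }

  private:
    std::string_view s_;
    std::string_view separators_;
    std::size_t prev_split_ = 0; // start of the token currently being scanned
    std::size_t pos_ = 0;        // current scan position
    bool done_;                  // true once the final token has been returned
};

// Helper for illustration only: drain a tokenizer into a vector of strings.
std::vector<std::string> collect_tokens(std::string_view s, std::string_view separators) {
    ManualTokenizer tok(s, separators);
    std::vector<std::string> out;
    while (auto t = tok.next())
        out.emplace_back(*t);
    return out;
}
```

The behaviour is meant to match the generator version, but the loop state is now spread over three members and an early-exit check, which is exactly the bookkeeping that co_yield does for you.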
Unfortunately, the ESP32 Arduino core still uses GCC 8.4, whereas you need GCC 11 for coroutines*, so you'll have to wait for them to upgrade before you can use coroutines in practice.
(*) Or GCC 10 with the -fcoroutines flag.
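If you're not sure whether your toolchain has coroutines enabled, one way to find out is to check the standard feature-test macro __cpp_impl_coroutine, which the compiler predefines only when coroutine support is switched on. The sketch below is just an illustration: the names kCoroutineFeatureLevel, kCoroutinesEnabled and print_coroutine_support are made up, and on an Arduino board you would print through Serial rather than printf.

```cpp
#include <cstdio>

// __cpp_impl_coroutine is predefined by the compiler only when coroutine
// support is enabled for this translation unit; otherwise it is undefined.
#if defined(__cpp_impl_coroutine)
constexpr long kCoroutineFeatureLevel = __cpp_impl_coroutine;
#else
constexpr long kCoroutineFeatureLevel = 0;
#endif

// Hypothetical convenience flag derived from the macro above.
constexpr bool kCoroutinesEnabled = kCoroutineFeatureLevel != 0;

// Hypothetical helper: report at startup whether coroutines are available.
void print_coroutine_support() {
    if (kCoroutinesEnabled)
        std::printf("coroutines enabled (__cpp_impl_coroutine = %ld)\n",
                    kCoroutineFeatureLevel);
    else
        std::printf("coroutines not enabled; a generator-based tokenize will not compile\n");
}
```

A flag like this lets the same sketch fall back to a hand-written tokenizer on older compilers instead of failing to build.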