Serial Input Basic's usage of Strtok() unsafe?

Hello,

I've used Example 5 from the Serial Input Basics tutorial many times for various projects over the years Serial Input Basics - updated - #3 by Robin2

I've never run into any problems using it. However, I have recently seen various threads discussing that the function "strtok()" is not safe to use, as it isn't "thread-safe". Serial input basics uses this same function.

Here's one thread I read discussing these problems with strtok; Strtok() bites me again, though I'm not quite advanced enough to understand what the specific problem(s) with strtok() actually are.

Here is the usage of strtok() from example 5 of the SIB tutorial;

void parseData() {      // split the data into its parts

    char * strtokIndx; // this is used by strtok() as an index

    strtokIndx = strtok(tempChars,",");      // get the first part - the string
    strcpy(messageFromPC, strtokIndx); // copy it to messageFromPC
 
    strtokIndx = strtok(NULL, ","); // this continues where the previous call left off
    integerFromPC = atoi(strtokIndx);     // convert this part to an integer

    strtokIndx = strtok(NULL, ",");
    floatFromPC = atof(strtokIndx);     // convert this part to a float

}

Is the author's usage of strtok() in this example problematic? If so, what would be a better way of parsing multiple variables out of a single char array?

FFI, I run nearly all my projects on ESP32-WROOM-32U boards these days.

thanks

For numerical data, I would use strtol or strtod to read the first integer or floating point number in a string.

char const line[] = "10, 1.1, 100";
char* endptr;
Serial.println(strtol(line, &endptr, 10));
Serial.println(strtod(endptr + 1, &endptr));  // +1 to skip the comma.
Serial.println(strtol(endptr + 1, NULL, 10));

This leaves the original string unchanged, so it should be safe.

[edit] Example modified thanks to @anon57585045 (see post #6).

1 Like

The function strtok() stores a pointer locally. That means if strtok() is used in a interrupt and in the loop(), then it goes wrong. Also when used from multiple tasks in a multitasking system then it goes wrong.

You won't notice any problem, unless you use a ESP32 and decide to start another FreeRTOS task which also uses strtok(). The chance that you do that and also have forgotten about the risks of strtok() might be 1%. Then why take a risk of 1% that your project will fail ? Just don't use strtok() anymore.

1 Like

see strtok_r()

1 Like

Thanks. Your explanation makes sense to me. Even if it's such a rare occurrence that I'd ever use strtok() on 2 cores, it seems trivial to just use a different method.

I will see if I can modify the example with a different function/method. I will post my results to this thread.

No, strtol() can store the end pointer. You just chose to discard it by making the second argument NULL. If you keep the end pointer, after the conversion of an unknown number of digits, you have a pointer to the remaining string,

char const line[] = "10,100";
char* dex;
Serial.println(strtol(line, dex, 10));
Serial.println(strtol(dex+1, NULL, 10));

" If endptr is not NULL, strtoul() stores the address of the first invalid character in *endptr. If there were no digits at all, however, strtoul() stores the original value of nptr in endptr. (Thus, if *nptr is not '\0' but **endptr is '\0' on return, the entire string was valid.)"

An unknown number and type of delimiters is a different matter, e.g. '<space>,<space>' so that would require some extra footwork. But I've used this technique to collect a series of <space> separated numeric arguments. Because also:

" The string may begin with an arbitrary amount of white space (as determined by isspace())"

1 Like
Serial.println(strtol(line, &dex, 10));
1 Like

Thanks everyone.

I have modified the parseData() function of the Serial Input Basics - updated - #3 by Robin2 tutorial to use the strtok_r function instead. The link provided in Post #4 of this thread seems to indicate that this is a thread-safe alternative to strtok. I tested it and it seems to work fine.


void parseData() {
  
  char *str;
  char *p = tempChars; //example "hello,123,3.14"
  byte counter = 0;
  
  while ((str = strtok_r(p, ",", &p)) != NULL)  // Don't use \n here it fails
    {
     switch(counter){
      case 0:
      strcpy(messageFromPC, str);
      break;
      
      case 1:
      integerFromPC = atoi(str);
      break;
      
      case 2:
      floatFromPC = atof(str);
      break;
     }

     counter++;
    }

}

I based that off a simple example of using the strtok_r() function found here: Splitting delimited strings with strtok_r - #5 by headingwest

Please let me know if there are any troubles with my code & if this modification solves the issue mentioned in post #3.

thanks

1 Like

The following may be a bit over the top, but when tucked away in a library it could be a convenient way to parse strings.

The idea is to let the parseLine function decide how to parse a field based on the types of the parameters it receives. New types can be added by overloading the parse function.

void parse(int& result, char** endptr) {
  result = strtol(*endptr, endptr, 10);
}

void parse(double& result, char** endptr) {
  result = strtod(*endptr, endptr);
}

void parse(char* result, char** endptr) {
  while(**endptr and **endptr != ',') {
    *result = **endptr;
    result++;
    (*endptr)++;
  }
  *result = 0;
}

inline void parseLine_(char**) {}

template <class H, class... Tail> void parseLine_(char** endptr, H& head, Tail&... tail) {
  parse(head, endptr);
  (*endptr)++;
  parseLine_(endptr, tail...);
}

template <class... Args> void parseLine(char const* line, Args&... args) {
  char* endptr = const_cast<char*>(line);
  parseLine_(&endptr, args...);
}

Usage should be relatively straightforward, for example:

char const* lines[] = {
  "hello, 1.23, 45",
  "world, 2.34, 56"};
for (char const* line: lines) {
  char a[10];
  double b;
  int c;
  parseLine(line, a, b, c);
  Serial.println(a);
  Serial.println(b);
  Serial.println(c);
}

That might actually come in handy for me one day, but a lot of the code is breaking my brain for now :rofl:

if there are spaces in addition to commas isn't one redundant?

The spaces can be omitted.

[edit]

Here I put everything in a class to make the delimiter configurable.

so what exactly is the delimiter: " ", "," or " ,". (rhetorical)

if it's coding between machines, why have a space.

In the example above, "," is used as the delimiter. In the link to the simulator the delimiter is ", ". For machine to machine communication I would not use text at all, so I assume that the incoming data is generated by a spreadsheet program or something like that.

Can I also get an answer to my question about whether using strtok_r() in post #8, instead of strtok() will prevent this from happening?

strtok() captures the initial ptr it is given, which means subsequent calls with a NULL ptr are based on it and calls with a new non-NULL ptr overwriting its internally saved ptr and possibly interfering with the original string being further processed.

with strtok_r(), a ptr variable is passed to it to use, so that it doesn't need an internal ptr, allowing multiple calls to strtok_r() where each is passed a unique ptr variable

1 Like

Have you checked that all the other functions your code uses are also reentrant-safe? All your functions? All the functions in the libraries you're using?

If not, why are you obsessing over strtok() specifically?

I have not checked my other functions/library.

I'm not obsessed. I was simply asking for information about a concept that I didn't understand.

Another scenario where it goes wrong is if you're parsing something like 12=1,14=0,6=55.
I strtok on the comma first and get 12=1; next I strtok that result on the equal sign before doing the next strtok on the comma.

Thanks. But sorry, I'm not really following it though. Are you using both the comma and equals sign as separate delimiters?