strtok() when string contains a null?

I am trying to parse a String - yes, I swore off String class months ago, but this is what I have to work with.

The code below works fine with
String messageString = "10,11,12";

But if the incoming String contains a null value, I crash.
String messageString = "10,,12";

Is there a workaround that I am missing?

void setup() {
  Serial.begin(115200);
  delay(10);
  Serial.println();
  Serial.println(F("Tokenize demo"));
  
  String messageString = "10,11,12";
  char buf[10];

  strcpy(buf, messageString.c_str());

  // Tokenize
  char *tmpbuf;
  tmpbuf = strtok(buf, ",");
  int i1 = atoi(tmpbuf);
  Serial.print(F("i1= "));
  Serial.println(i1);

  tmpbuf = strtok(NULL, ",");
  int i2 = atoi(tmpbuf);
  Serial.print(F("i2= "));
  Serial.println(i2);

  tmpbuf = strtok(NULL, ",");
  int i3 = atoi(tmpbuf);
  Serial.print(F("i3= "));
  Serial.println(i3);

}

void loop() {
}

Is it strtok that crashes, or atoi when presented with a null string?
I suspect atoi....

Without being able to test it, I’d try....

int i2 = (tmpbuf == NULL || *tmpbuf == 0) ? 0 : atoi(tmpbuf);

You are using strtok() on a String. Surely it is designed to work on a string.

If you must use a String, then either use a String function such as indexOf() to find the commas and separate the String into its component parts, or turn it into a char array before using strtok(). Better still, don't use Strings in the first place.

What Arduino crashes? Or is it esp8266 or esp32?

I don't think you have a null, you have a zero length string. Use strlen to check and if so forget atoi and return whatever you fancy to represent "nothing there".
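A minimal sketch of that check, written as plain C++ so it can be tried off the board. The name tokenToInt and the fallback parameter are made up for illustration; they are not part of any library.

```cpp
#include <cstdlib>   // atoi, NULL

// Hypothetical helper: convert a token returned by strtok() to an int,
// returning `fallback` when the token is missing (NULL) or zero-length.
int tokenToInt(const char *token, int fallback) {
  if (token == NULL || *token == '\0') {
    return fallback;            // nothing there: never hand NULL to atoi()
  }
  return atoi(token);
}
```

Something like tokenToInt(strtok(NULL, ","), -1) would then give -1 for "nothing there", assuming -1 is never a valid value in your data.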

UKHeliBob:
You are using strtok() on a String. Surely it is designed to work on a string.

If you must use a String, then either use a String function such as indexOf() to find the commas and separate the String into its component parts, or turn it into a char array before using strtok(). Better still, don't use Strings in the first place.

That's why the first thing I do is strcpy message into a char buffer.

I would rather not use the String, but (until now) I hadn't grokked how to use the message in this call:

void callback(String topic, byte * message, unsigned int length) {

DOH!, Cast it.

tmpbuf = strtok((char *)message, ",");

No String required.
But this doesn't fix the actual question.
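One thing to watch with the cast: as far as I know, PubSubClient does not null-terminate the payload, so strtok() on the raw pointer can run past the end of the message. A hedged sketch of copying it into a terminated buffer first; copyPayload is a hypothetical helper, not a library call, and the buffer size is up to the caller.

```cpp
#include <cstddef>   // size_t
#include <cstring>   // memcpy

// Hypothetical helper: copy `length` payload bytes into `out` and add the
// terminating '\0' that strtok() and atoi() rely on.
// Returns false (and leaves `out` empty) if the payload does not fit.
bool copyPayload(const unsigned char *payload, unsigned int length,
                 char *out, size_t outSize) {
  if (outSize == 0) return false;
  if (length >= outSize) {      // need room for the terminator as well
    out[0] = '\0';
    return false;
  }
  memcpy(out, payload, length);
  out[length] = '\0';
  return true;
}
```

Inside the callback this would be called with message and length, and the local buffer is then safe to tokenize.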

New data- I'll post at the end of the thread while I respond to the other inputs.

wildbill:
I don't think you have a null, you have a zero length string. Use strlen to check and if so forget atoi and return whatever you fancy to represent "nothing there".

"" is a null.

cattledog:
What Arduino crashes? Or is it esp8266 or esp32?

It's an esp8266, but I can't imagine that the strtok() function compiles differently from an Uno.

It's an esp8266, but I can't imagine that the strtok() function compiles differently from an Uno.

If strtok returns a null pointer, that is, a pointer to memory address 0, the esp8266 will give a fatal error ("crash") when that pointer is dereferenced, but on the UNO it will just print a 0.

pcbbc:
Is it strtok that crashes, or atoi when presented with a null string?
I suspect atoi....

Without being able to test it, I’d try....

int i2 = (tmpbuf == NULL || *tmpbuf == 0) ? 0 : atoi(tmpbuf);

It's the strtok that crashes.

In the example where the input is "10,11,12", I expect an output of:

i1= 10
i2= 11
i3= 12

If I test for NULL as suggested, the output is

i1= 10
i2= 12

In other words, i2 got the value of the third item, then the next strtok crashes because there's no more data to tokenize.

What I would like to get is:

i1= 10
i2= 
i3= 12

void callback(String topic, byte * message, unsigned int length) {

Is that an MQTT callback ? If so then which MQTT library are you using ?

cattledog:
If strtok returns a null pointer, that is, a pointer to memory address 0, the esp8266 will give a fatal error ("crash") when that pointer is dereferenced, but on the UNO it will just print a 0.

I do believe that's exactly what's happening. That explains the results from my post #9.

Now, on to fix it if the pointer==0.

This is a start:

void setup() {
  Serial.begin(115200);
  delay(10);
  Serial.println();
  Serial.println(F("Tokenize demo"));

  String messageString = "10,,12";
  char buf[10];

  strcpy(buf, messageString.c_str());

  // Tokenize
  int values[2];
  byte k = 0;
  char *tmpbuf;
  tmpbuf = strtok(buf, ",");
  values[k++] = (tmpbuf == NULL || *tmpbuf == 0) ? 0 : atoi(tmpbuf);

  tmpbuf = strtok(NULL, ",");
  values[k++] = (tmpbuf == NULL || *tmpbuf == 0) ? 0 : atoi(tmpbuf);

  tmpbuf = strtok(NULL, ",");
  values[k++] = (tmpbuf == NULL || *tmpbuf == 0) ? 0 : atoi(tmpbuf);

  for (k = 0; k < 3; k++) Serial.println(values[k]);

}

void loop() {
}

But the output is still not as expected:

10
12
-17829890

The third value is obviously pointing outside of my array.

UKHeliBob:

void callback(String topic, byte * message, unsigned int length) {

Is that an MQTT callback ? If so then which MQTT library are you using ?

Yes, it is PubSubClient.h

As I said in #5, just cast it... doh!

My test code in #11 is just for testing the tokenizing with a null value. I think that Cattledog nailed it in #8 that in the ESP, strtok() returns a zero for a null item.

So, the crashing has stopped, but still the question, how to parse a string that contains an empty value:
'10,,12'.

I would like to get:

i1= 10
i2=
i3= 12

But I am getting:

i1= 10
i2= 12
i3= -17829890

I don't think you can do it with strtok. I found this:

To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). And then scans starting from this beginning of the token for the first character contained in delimiters, which becomes the end of the token. The scan also stops if the terminating null character is found.

That implies (as is borne out by your tests) that strtok will skip over multiple delimiters as it looks for the start of the next token.

You will need to parse it "by hand".
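The skipping is easy to see by counting what strtok() hands back. A small sketch in plain C++; countTokens is a made-up name and the 32-byte working buffer is an arbitrary assumption.

```cpp
#include <cstring>   // strtok, strncpy

// Count the tokens strtok() reports for a comma-separated line.
// Works on a local copy because strtok() writes '\0' into its input.
int countTokens(const char *line) {
  char buf[32];
  strncpy(buf, line, sizeof(buf) - 1);
  buf[sizeof(buf) - 1] = '\0';

  int count = 0;
  for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
    count++;
  }
  return count;
}
```

countTokens("10,11,12") comes back as 3, but countTokens("10,,12") comes back as 2, so the third call in the original sketch gets NULL.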

wildbill:
I don't think you can do it with strtok. I found this:
That implies (as is borne out by your tests) that strtok will skip over multiple delimiters as it looks for the start of the next token.

You will need to parse it "by hand".

Probably. I am a bit surprised that no one else has run into this before.

I am a bit surprised that no one else has run into this before.

They have. For instance, "strtok() and empty fields" on Stack Overflow, 10 years ago.

how to parse a string that contains an empty value:
'10,,12'.

There is a workaround which adds a space between the two commas. It adds one to the length of the String. I don't know how safe it is with regard to memory fragmentation, as I don't typically work with Strings. There is probably a strstr solution using the c-string functions after the String has been converted to a c-string.

 messageString.replace(",,", ", ,");

Then

void setup() {
  Serial.begin(115200);
  delay(10);
  Serial.println();
  Serial.println(F("Tokenize demo"));

  //String messageString = "10,11,12";
  String messageString = "10,,12";

  //Serial.println(messageString.length());
  messageString.replace(",,", ", ,");
  //Serial.println(messageString.length());
  //char buf[10];
  char buf[messageString.length() + 1];
  strcpy(buf, messageString.c_str());

  // Tokenize
  char *tmpbuf;
  tmpbuf = strtok(buf, ",");
  int i1 = atoi(tmpbuf);
  Serial.print(F("i1= "));
  if (i1 != 0)
    Serial.println(i1);
  else
    Serial.println();

  tmpbuf = strtok(NULL, ",");
  int i2 = atoi(tmpbuf);
  Serial.print(F("i2= "));
  if (i2 != 0)
    Serial.println(i2);
  else
    Serial.println();

  tmpbuf = strtok(NULL, ",");
  int i3 = atoi(tmpbuf);
  Serial.print(F("i3= "));
  if (i3 != 0)
    Serial.println(i3);
  else
    Serial.println();

}

void loop() {
}

Gives

Tokenize demo
i1= 10
i2= 
i3= 12
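One caveat with printing a blank whenever atoi() returns 0: a field that really contains 0 would also print as blank. A hedged sketch of telling the two apart with strtol() and its end pointer; parseField is a hypothetical name.

```cpp
#include <cstdlib>   // strtol, NULL

// Hypothetical helper: returns true and stores the number in *out when the
// field starts with a number; returns false for NULL, "" or a blank field
// such as " ".
bool parseField(const char *field, int *out) {
  if (field == NULL) return false;
  char *end;
  long value = strtol(field, &end, 10);
  if (end == field) return false;   // strtol consumed nothing: no number here
  *out = (int)value;
  return true;
}
```

With this, the blank is printed only when parseField() returns false, and a genuine 0 still prints as 0.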

You could mimic the function of strtok using strchr to locate the commas and tokenizing the string manually, or just create your own version of strtok that does not skip over delimiters.

void setup() {
  Serial.begin(115200);
  delay(10);
  Serial.println();
  Serial.println(F("Tokenize demo"));

  String messageString = "10,,12";
  char buf[10];

  strcpy(buf, messageString.c_str());

  // Tokenize
  int values[3];
  char *tmpbuf = buf;
  char *tmpbufend;
  for (byte k = 0; k < 3; k++) {
    tmpbufend = strchr(tmpbuf, ','); //locate comma
    if (tmpbufend != NULL) *tmpbufend = '\0'; //if comma found, set position of comma as end of string
    values[k] = atoi(tmpbuf); 
    if (tmpbufend != NULL) tmpbuf = tmpbufend + 1; //if comma found, set start of string to first character past the comma
  }

  for (byte k = 0; k < 3; k++) Serial.println(values[k]);

}

void loop() {
}
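For the second option, a version of strtok that does not skip over delimiters, here is one possible sketch built on strcspn(). The name nextField and the save-pointer argument are my own choices for illustration, not a standard function.

```cpp
#include <cstring>   // strcspn, NULL

// Hypothetical strtok() look-alike that returns empty fields instead of
// skipping them. Pass the buffer on the first call and NULL afterwards;
// `*save` keeps the position between calls.
// Returns NULL once no fields remain.
char *nextField(char *str, const char *delims, char **save) {
  if (str == NULL) str = *save;
  if (str == NULL) return NULL;             // previous call returned the last field

  char *end = str + strcspn(str, delims);   // first delimiter or the final '\0'
  if (*end != '\0') {
    *end = '\0';                            // terminate this field
    *save = end + 1;                        // next field starts after the delimiter
  } else {
    *save = NULL;                           // that was the last field
  }
  return str;
}
```

On "10,,12" the successive calls return "10", "" and "12", then NULL. Like strtok(), it modifies the buffer it is given.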

Thank you all. Cattledog nailed it. The blank data will cause a crash on the ESP8266.

I did write a function to do this for my purposes, before I saw the other solutions offered above.

void iParse(char *buf, const char *delim, int (& intArray) [10]) {
  /*
   * buf contains a set of integers in a c-string, for example '12,13,14'
   * delim is the delimiter between integers, ","; only its first character is used
   * intArray is an integer array that will receive the results (at most 10)
   * 
   * Why did I write this instead of using strtok()?
   * The strtok() function cannot properly handle blank data.
   * If the input data is, for example, '12,,14', strtok() returns 12 and 14.
   * (If you are running it on an ESP processor, the NULL it returns for the
   * missing third token will crash atoi()).
   * iParse would return 12, 0, and 14 in this example.
   * This has only been tested on a Wemos D1 Mini (ESP8266).
   * 
   */
  char tmpBuf[10] = "";                     // Temp buffer, zero-filled so atoi() always sees a terminated string
  size_t j = 0;
  int intArrayPtr = 0;

  for (size_t i = 0; buf[i] != '\0'; i++) { //For each character in the input buffer...
    if (buf[i] != delim[0]) {               //Is this a delimiter character?
      if (j < sizeof(tmpBuf) - 1) tmpBuf[j++] = buf[i];   //No, copy the character into the result (extra characters are dropped)
    } else {
      if (intArrayPtr >= 9) break;          //Only one slot left: stop and let the final save below fill it
      intArray[intArrayPtr++] = atoi(tmpBuf);   //Save the numeric result in the output array
      j = 0;                                //clear the temp buffer.
      memset(tmpBuf, 0, sizeof(tmpBuf));
    }
  }
  intArray[intArrayPtr] = atoi(tmpBuf);         //Save the last numeric result in the output array
}