Parsing comma and pipe in string

I am looking to parse a string that will look like the following:

14,255,255,255|15,255,0,0|16,0,157,25|17,18,213,274

I'm trying to see if the following code here will take care of this or if there is a better, faster way:

    char theString[] = "14,255,255,255|15,255,0,0|16,0,157,25|17,18,213,274";
    char *p          = theString;
    char *pipe;
    char *comma;

    while ((pipe = strtok_r(p, "|", &p)) != NULL) {
        while ((comma = strtok_r(pipe, ",", &pipe)) != NULL) {
            Serial.println(comma); // This should loop 4 times (ex. 14,255,255,255)
        }

        // Loop until it's at the last "|"
    }

Does this seem like it will work or is there a better way of doing it?

Looks okay to me if you don't need to preserve the results:

void setup() {
  // put your setup code here, to run once:
  Serial.begin(9600);
  char theString[] = "14,255,255,255|15,255,0,0|16,0,157,25|17,18,213,274";
  char *p          = theString;
  char *pipe;
  char *comma;

  while ((pipe = strtok_r(p, "|", &p)) != NULL) {
    while ((comma = strtok_r(pipe, ",", &pipe)) != NULL) {
      Serial.print("  ");
      Serial.print(comma); //This should loop 4 times (ex. 14,255,255,255)
    }
    Serial.println();
    //Loop until its at the last "|"
  }
}

void loop() {

}
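
If the results do need to be preserved, one option is to convert each token as it is found and store it in an array. The following is only a sketch in plain C, not tested on an Arduino: `parse_groups` and `FIELDS_PER_GROUP` are made-up names, it assumes every group holds exactly four numbers, and it uses a separate context pointer for each level with NULL on the follow-up calls, which is the calling pattern the strtok_r documentation describes.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r on strict compilers */
#include <stdlib.h>
#include <string.h>

#define FIELDS_PER_GROUP 4  /* assumption: every group holds exactly four numbers */

/* Hypothetical helper: splits `input` in place and stores the numbers in `rows`.
   Returns the number of complete groups stored, at most `max_rows`. */
int parse_groups(char *input, int rows[][FIELDS_PER_GROUP], int max_rows)
{
    char *group_ctx;   /* saved position for the '|' pass */
    char *field_ctx;   /* saved position for the ',' pass */
    int count = 0;

    for (char *group = strtok_r(input, "|", &group_ctx);
         group != NULL && count < max_rows;
         group = strtok_r(NULL, "|", &group_ctx)) {
        int n = 0;
        for (char *field = strtok_r(group, ",", &field_ctx);
             field != NULL && n < FIELDS_PER_GROUP;
             field = strtok_r(NULL, ",", &field_ctx)) {
            rows[count][n++] = atoi(field);
        }
        if (n == FIELDS_PER_GROUP) {
            count++;   /* keep only groups that had all four fields */
        }
    }
    return count;
}
```

In a sketch this could be called from setup() with a writable char array and an `int rows[4][FIELDS_PER_GROUP]`, then printed with Serial.print(). The input must be a modifiable array, because the delimiters are overwritten with '\0'.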

That doesn't look like correct use of strtok_r() to me, according to the documentation.
It returns the "subsequent" token, so I think you'll end up looking at the wrong token:
"subsequent calls should pass NULL as the first argument."
The third argument is for saving context; it's not clear whether you can point it back at the first argument.
My man page also says this is replaced by "strsep".

Did you try it? Does it actually do what you want?

or if there is a better, faster way:

Of course there is a better way. The strtok_r() is the thread-safe re-entrant version of strtok(). It is bigger and slower and uses more memory than strtok(). And for what? So multiple threads can parse the same string at one time, without conflict. How many threads are you using on the Arduino?

Use strtok()!

I would use strtok() and split the string twice - once for the sections delimited by the pipe character and subsequently split each of those sections based on the commas

I presume the pipe character is intended to delimit groups - otherwise it might be easier to split on the comma first.

The parse example in Serial Input Basics uses strtok()

...R
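
The "split twice" idea can be sketched roughly as below, in plain C. This is an illustration under stated assumptions, not the code from Serial Input Basics: `split_twice`, `MAX_GROUPS` and `FIELDS_PER_GROUP` are hypothetical names, and the limits of eight groups and four fields per group are guesses based on the example string. Because strtok() keeps only one saved position, the pipe pass is allowed to finish before the comma pass starts.

```c
#include <string.h>

#define MAX_GROUPS 8         /* assumption: upper bound on groups per message */
#define FIELDS_PER_GROUP 4   /* assumption: four numbers per group */

/* Hypothetical helper: fills `fields` with pointers to each number's text.
   Returns the number of groups found. `input` is modified in place. */
int split_twice(char *input, char *fields[MAX_GROUPS][FIELDS_PER_GROUP])
{
    char *groups[MAX_GROUPS];
    int group_count = 0;

    /* Pass 1: cut the string at each '|' and remember where each group starts. */
    for (char *g = strtok(input, "|");
         g != NULL && group_count < MAX_GROUPS;
         g = strtok(NULL, "|")) {
        groups[group_count++] = g;
    }

    /* Pass 2: pass 1 is finished, so strtok's single saved position is free. */
    for (int i = 0; i < group_count; i++) {
        char *f = strtok(groups[i], ",");
        for (int n = 0; n < FIELDS_PER_GROUP; n++) {
            fields[i][n] = f;            /* NULL if the group is short */
            if (f != NULL) {
                f = strtok(NULL, ",");
            }
        }
    }
    return group_count;
}
```

The stored pointers refer into the original buffer, so that buffer has to stay alive and unmodified for as long as the fields are used.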

PaulS:
Of course there is a better way. The strtok_r() is the thread-safe re-entrant version of strtok(). It is bigger and slower and uses more memory than strtok().

Why would any of those be true? (They aren't.)
What is different between strtok() and strtok_r() implementation-wise?

Why would any of those be true? (They aren't.)

They are ALL true. The strtok_r() version needs to make a copy of the string to be parsed. The strtok() version does not. The strtok_r() function needs to include code to check which thread is trying to access which data. The strtok() function does not. The strtok_r() function needs to do more work. Doing more work is never faster than doing less work.

What is different between strtok() and strtok_r() implementation-wise?

Feel free to look at the code. Even the argument lists give a big clue.

Can't strtok take two delimiter chars and tell you which one terminated the current section of the string?

Ah - no, it doesn't.

Well, in that case, roll your own:

char *p          = theString;
char *fragment;
char termninator;

while(*p) {
  fragment = p;
  while(*p != '|' && *p != ',' && *p!='\0') 
    p++;
  terminator = *p;
  if(termintator != '\0') p++;

  Serial.print("number is ");
  Serial.print(fragment);
  Serial.print(", and the terminator was ");
  if(termintator == '\0')
    Serial.print("end of string");
  else
    Serial.print(terminator);
  Serial.println();


}

PaulMurrayCbr:

char termninator;

terminator = *p;

if(termintator != '\0') p++;

spelling
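
For reference, here is one way the roll-your-own loop above might look with the variable name spelled consistently. It is a plain-C sketch with a hypothetical helper name, `next_fragment`, and it makes one change beyond spelling: the delimiter is overwritten with '\0', since otherwise printing the fragment would print everything up to the end of the whole string rather than one number.

```c
#include <stddef.h>

/* Hypothetical helper: returns the fragment starting at *cursor, cuts it off
   at the next ',' or '|', stores that delimiter (or '\0' at end of string)
   in *terminator, and advances *cursor past it. Returns NULL when done. */
char *next_fragment(char **cursor, char *terminator)
{
    char *p = *cursor;
    if (*p == '\0') {
        return NULL;   /* nothing left to scan */
    }
    char *fragment = p;
    while (*p != '|' && *p != ',' && *p != '\0') {
        p++;
    }
    *terminator = *p;
    if (*p != '\0') {
        *p = '\0';     /* end the fragment here so it prints as one number */
        p++;
    }
    *cursor = p;
    return fragment;
}
```

A caller would start with `char *cursor = theString;` and loop while `next_fragment(&cursor, &terminator)` returns non-NULL, printing the fragment and checking whether the terminator was ',', '|' or '\0'.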

PaulS:
They are ALL true.

None of the statements are true. Let's take them one at a time.

The strtok_r() version needs to make a copy of the string to be parsed.

Not true. It does not make a copy of the string. That is not required for reentrancy.

The strtok_r() function needs to include code to check which thread is trying to access which data.

Not true. A function doesn't have to be aware of threads to be reentrant.

The strtok_r() function needs to do more work.

Definitely not true. It does the same work that strtok() would do. The only difference between strtok() and strtok_r() is that the first uses a shared context (making it non-reentrant) while the second takes an argument to the context.

If that doesn't convince you that strtok_r() is not "bigger and slower and uses more memory" than strtok(), consider this: strtok() is implemented (in avr-libc) as a single call to strtok_r():

char *
strtok(char *s, const char *delim)
{
    return strtok_r(s, delim, &p);
}

Clearly, then, strtok() cannot be smaller, faster, or use less memory than strtok_r().

Feel free to read the specification for strtok_r and how strtok_r is actually implemented before spreading misinformation. The implementation can be found at https://github.com/vancegroup-mirrors/avr-libc/blob/master/avr-libc/libc/string/strtok_r.S, which is in assembly language but includes a C version as a reference.

PaulS:
Feel free to look at the code.

I already did.

strtok_r does not copy anything or give a damn about which thread is calling it.

This is a typical strtok implementation when both functions are defined.

char * strtok(char *s, const char *delim)
{
     static char *last;
     return strtok_r(s, delim, &last);
}

Where would it copy the string anyways? When would it release the space reserved for the copy?

christop:
None of the statements are true. Let's take them one at a time.
Not true. It does not make a copy of the string. That is not required for reentrancy.
Not true. A function doesn't have to be aware of threads to be reentrant.
Definitely not true. It does the same work that strtok() would do. The only difference between strtok() and strtok_r() is that the first uses a shared context (making it non-reentrant) while the second takes an argument to the context.

If that doesn't convince you that strtok_r() is not "bigger and slower and uses more memory" than strtok(), consider this: strtok() is implemented (in avr-libc) as a single call to strtok_r():

char *
strtok(char *s, const char *delim)
{
    return strtok_r(s, delim, &p);
}

Clearly, then, strtok() cannot be smaller, faster, or use less memory than strtok_r().

Feel free to read the specification for strtok_r and how strtok_r is actually implemented before spreading misinformation. The implementation can be found at https://github.com/vancegroup-mirrors/avr-libc/blob/master/avr-libc/libc/string/strtok_r.S, which is in assembly language but includes a C version as a reference.

Well, gee.... If you're going to derail the whole discussion by dragging up actual facts, you're going to take all the fun out of it.... :)

Regards,
Ray L.

I am looking to parse a string that will look like the following:

14,255,255,255|15,255,0,0|16,0,157,25|17,18,213,274

I'm trying to see if the following code here will take care of this or if there is a better, faster way:

What do you want the final output to look like?

So... This code (slightly modified from the original post) does seem to work:

#include <stdio.h>
#include <string.h>

int main() {
    char theString[] = "14,255,255,255|15,255,0,0|16,0,157,25|17,18,213,274";
    char *p          = theString;
    char *pipe;
    char *comma;

    while ((pipe = strtok_r(p, "|", &p)) != NULL) {
        while ((comma = strtok_r(pipe, ",", &pipe)) != NULL) {
            printf("%s ", comma); // This should loop 4 times (ex. 14,255,255,255)
        }
        printf("\n");             // Loop until it's at the last "|"
    }
    return 0;
}

It produces:

./a.out
14 255 255 255
15 255 0 0
16 0 157 25
17 18 213 274

Presumably you need to use the _r variant because otherwise the saved context of the first tokenization (for "|") would be overwritten by the second tokenization (for ",").

My only objection at this point is the use of the same variable for the string and the context. For this to work, you are relying on undocumented and perhaps implementation-dependent behavior...

It's probably not the most efficient code possible, since strtok() checks for any number of delimiters, while you only actually need to check for one (which is much easier), but it doesn't seem that that should be horribly inefficient, either.
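
To make the point about the overwritten context concrete, here is a small plain-C sketch that nests two plain strtok() loops and simply counts what it visits. The function name `count_with_nested_strtok` is hypothetical and exists only for this illustration.

```c
#include <string.h>

/* Hypothetical demo: nests two strtok() loops and reports how many groups
   and fields were visited. `input` is modified in place. */
void count_with_nested_strtok(char *input, int *groups, int *fields)
{
    *groups = 0;
    *fields = 0;
    for (char *g = strtok(input, "|"); g != NULL; g = strtok(NULL, "|")) {
        (*groups)++;
        /* This inner call replaces the position saved by the outer call. */
        for (char *f = strtok(g, ","); f != NULL; f = strtok(NULL, ",")) {
            (*fields)++;
        }
    }
}
```

With the string from the original post this visits one group and four fields, not four groups and sixteen fields: once the inner loop has run to the end of the first group, the outer strtok(NULL, "|") resumes from the inner loop's saved position and finds nothing left. Giving each level its own context pointer through strtok_r() avoids that.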

westfw:
So... This code (slightly modified from the original post) does seem to work:

int main() {
    char theString[] = "14,255,255,255|15,255,0,0|16,0,157,25|17,18,213,274";
    char *p          = theString;
    char *pipe;
    char *comma;

    while ((pipe = strtok_r(p, "|", &p)) != NULL) {
        while ((comma = strtok_r(pipe, ",", &pipe)) != NULL) {
            printf("%s ", comma); //This should loop 4 times (ex. 14,255,255,255)
        }
        printf("\n");        //Loop until its at the last "|"
    }
}


Presumably you need to use the _r variant because otherwise the saved context of the first tokenization (for "|") would be overwritten by the second tokenization (for ",")

My only objection at this point is the use of the same variable for the string and the context. For this to work, you are relying on undocumented and perhaps implementation-dependent behavior...

It's probably not the most efficient code possible, since strtok() checks for any number of delimiters, while you only actually need to check for one (which is much easier), but it doesn't seem that that should be horribly inefficient, either.

That worked great, Tesla! Thanks! :)

This works too:

// Look Ma! No libraries!

void setup( void )
{
  Serial.begin( 115200 );
  
  char theString[] = "14,255,255,255|15,255,0,0|16,0,157,25|17,18,213,274";
  char *chp = theString;
  
  while ( *chp )
  {
    if ( *chp == '|' )
    {
      Serial.println();
    }
    else if ( *chp == ',' )
    {
      Serial.print( " " );
    }
    else
    {
      Serial.print( *chp );
    }
    chp++;
  }
  Serial.println();
}

void loop( void )
{
}

14 255 255 255
15 255 0 0
16 0 157 25
17 18 213 274