Char corrupted by strtok

Hi everyone,

I am programmed a parser for my Arduino Uno and beside an odd problem it is working fine.
With strtok I split a char but in some cases, I don't know why, the last token it extractes is damaged at the end. When printing it it shows that some characters of the string were replaced by a varying number of special characters. It looks like this:

WAT=OFFPER1=10200PER2=9800PI1=10PI2=10PTT=2~LIGHT RAT=30 COL=255,255,0~BLACK RAT=20~LIGHT RAT=30 COLB=255,255,0 COLT=255,255,0~BLACK RAT=20END
LIGHT RAT=30 COL=255,255,0
BLACK RAT=20
LIGHT RAT=30 COLB=255,255,0 COLT=255,255,0
BLACK RAT=2h⸮⸮⸮

The first line is the total string. The 4 bottom lines are the tokens. The last one actually has more weired symbols but it could only copy these four...

The backgrund is the following:

I use #define to at the begining of my code to store information about different modes of the device I am building.

#define DEFINITIONS_MODE_1   "\
WAT=OFF\
PER1=10200\
PER2=9800\
PI1=10\
PI2=10\
PTT=2\
~LIGHT\
 RAT=30\
 COL=255,255,0\
~BLACK\
 RAT=20\   
~LIGHT\
 RAT=30\
 COLB=255,255,0\
 COLT=255,255,0\
~BLACK\
 RAT=20\
"

Then I pass it to the parser function:

read_definitions(PSTR(DEFINITIONS_MODE_1));

what as far as I understand stores it in Flash and saves ram (what it definitly does).
In the parser the const char* is then copied to a char* in ram and is splitted by strtok.

void read_definitions(const char* str_flash) {

  char str_ram[strlen_P(*str_flash)];
  strcpy_P(str_ram, str_flash);
//  str_ram[strlen(str_ram)-1] = NULL;
  Serial.println(str_ram);
 
  char* ptr = strtok(str_ram, "~");
  char* mode_definitions = ptr;
  char* wave_definitions[MAX_NUM_WAVES];

  byte wave_counter = 0;

  while((ptr = strtok(NULL, "~")) != NULL) {
    wave_definitions[wave_counter] = ptr; 
    Serial.println(ptr);

    wave_counter++;
  }

As you see above it splits them fine but the end is damaged... it does not appear on other definitions and the error is reproducable as long as I dont change anything, but if I change a definition or add another one. I know this is probably not a beautiful way to program this but I have bad experiences with reading from PROGMEM either.
Thanks ahead!

It's entirely possible I don't know what I'm talking about but, try appending a concluding null character '\0' to your list of definitions. Functions which act on C-strings look for this as a termination character.

Thanks for the fast reply.
Do you see this commented line here in the code, where I tried exactly this? I also tried with '\0'... nothing helped

strtok() does modify the input string, by replacing tokens with zeros. You might check to see what it is did.

Also check that the array bound MAX_NUM_WAVES is not being exceeded.

I think this might be the issue:

void read_definitions(const char* str_flash) {

  char str_ram[strlen_P(*str_flash)];
  strcpy_P(str_ram, str_flash);
//  str_ram[strlen(str_ram)-1] = NULL;
  Serial.println(str_ram);

strlen() (and I assume strlen_P() by extension) returns the length of the string not including the terminating NULL byte. So you're allocating an array on the stack which is one byte too short for the string you're trying to copy into it.

Try:

void read_definitions(const char* str_flash) {
  char str_ram[strlen_P(*str_flash)+1] = {};
  strcpy_P(str_ram, str_flash);
  Serial.println(str_ram);

As an aside -- I think by copying the string from flash into the str_ram array in read_definitions() you're possibly losing the benefit of leaving the string in flash in the first place.

char str_ram[strlen_P(*str_flash)];

That line is creating an array that is the exact length of the input string, NOT counting the terminating NULL. It should be:

char str_ram[strlen_P(*str_flash) + 1];

That defnitly solved another issue of allways loosing the last character. But unfortunatly the corruption is still present.

Maybe it helps to know that there are several of these definitions:

DEFINITIONS_MODE_0
DEFINITIONS_MODE_1
etc

The problem only appreas on

DEFINITIONS_MODE_1

and appears somewhere else when something is modified or another defintion is added... it seams they interact with each other but I dont know how

Always post ALL the code, or at least the minimum, running code that demonstrates the problem.

Is the code I posted not enough? What are you missing? I cant post the complete code due to the 9000 characters limit...

PS: I just fund out that changes in the definitions can also cause corruption within the strings not just at the end or it causes the program to crash and restart...
Something is really wrong here, would be so great if someone finds it. I am not very experienced with cstrings ;-/

apollo91:
Is the code I posted not enough? What are you missing? I cant post the complete code due to the 9000 characters limit...

NOBODY wants to look at hundreds or thousands of lines of cluttered unrelated code. Post an MRE. That's the smallest COMPLETE code that compiles and demonstrates the problem at hand. And, only that problem.

What are you missing?

Obviously, you did not read reply #4, which points out that MAX_NUM_WAVES is not defined in the snippet you posted.

Have you checked yet whether array bounds are being violated? You certainly missed the instances pointed out above.

Oh you gonna love this! Already this piece of code causes an error.

#include <string.h>

//BEGIN Mode Definition

   
#define DEFINITIONS_MODE_1   "\
WAT=OFF\
PER1=10200\
PER2=9800\
PI1=10\
PI2=10\
PTT=2\
~LIGHT\
 RAT=30\
 COL=255,255,0\
~BLACK\
 RAT=20\   
~LIGHT\
 RAT=30\
 COLB=255,255,0\
 COLT=255,255,0\
"

void setup() {
  Serial.begin(9600);
   Serial.println("Start");
 
  read_definitions(PSTR(DEFINITIONS_MODE_1));
}

void read_definitions(const char* str_flash) {

  char str_ram[strlen_P(*str_flash)+1] = {};
  strcpy_P(str_ram, str_flash);
}

void loop() {
}

The terminal only prints "Start" if the String definition is shortened to a view characters. If it is longer it only prints "St"... Is there a limit for Strings in #define or am I totally missing something here?

The code in your last post will not compile and run without warnings and errors.

Are you paying attention to ANYTHING we are saying?

A simple working example of (possibly) what you want:

#include <string.h>

#define DEFINITIONS_MODE_1   "\
WAT=OFF\
PER1=10200\
PER2=9800\
PI1=10\
PI2=10\
PTT=2\
~LIGHT\
 RAT=30\
 COL=255,255,0\
~BLACK\
 RAT=20\   
~LIGHT\
 RAT=30\
 COLB=255,255,0\
 COLT=255,255,0\
"

#define MAX_NUM_WAVES 8

void setup() {
  Serial.begin(115200);
  Serial.println("Start");
  read_definitions(DEFINITIONS_MODE_1);
}

void read_definitions(const char* str) {
  char *wave_definitions[MAX_NUM_WAVES] = {};
  char * ptr = strtok(str, "~");
  byte idx = 0;

  if (!ptr) return;

  wave_definitions[idx++] = ptr;

  while((ptr = strtok(NULL, "~"))) {
    wave_definitions[idx++] = ptr;
    if (idx >= MAX_NUM_WAVES) break;
  }

  for (idx = 0; idx < MAX_NUM_WAVES; idx++) {
    if (!wave_definitions[idx]) break;
    Serial.print(F("wave "));
    Serial.print(idx);
    Serial.print(F(" "));
    Serial.println(wave_definitions[idx]);
  }
}

void loop() {
}

Spits out this on my Duemilanove:

Start
wave 0 WAT=OFFPER1=10200PER2=9800PI1=10PI2=10PTT=2
wave 1 LIGHT RAT=30 COL=255,255,0
wave 2 BLACK RAT=20
wave 3 LIGHT RAT=30 COLB=255,255,0 COLT=255,255,0
#include <string.h>

const char DEF_M_1_P[] PROGMEM = {
  "WAT=OFFPER1=10200PER2=9800PI1=10PI2=10,PTT=2~LIGHT RAT=30 COL=255,255,0~BLACK RAT=20~LIGHT RAT=30 COLB=255,255,0 COLT=255,255,0"
};

#define MAX_NUM_WAVES 8

void setup() {
  Serial.begin(115200);
  Serial.println("Start");
  read_definitions(DEF_M_1_P, strlen_P(DEF_M_1_P)+1);
}

void read_definitions(const char *str, size_t len) {
  char *wave_definitions[MAX_NUM_WAVES] = {};
  char tmp[len];
  byte idx = 0;
  char *ptr;
  
  strncpy_P(tmp, str, len);
  
  ptr = strtok(tmp, "~");
  if (!ptr) return;

  wave_definitions[idx++] = ptr;

  while((ptr = strtok(NULL, "~"))) {
    wave_definitions[idx++] = ptr;
    if (idx >= MAX_NUM_WAVES) break;
  }

  for (idx = 0; idx < MAX_NUM_WAVES; idx++) {
    if (!wave_definitions[idx]) break;
    Serial.print(F("wave "));
    Serial.print(idx);
    Serial.print(F(" "));
    Serial.println(wave_definitions[idx]);
  }
}

void loop() {
}

And there's a version which uses PROGMEM. Don't quote me on this one since I'm not 100% sure on passing the char array as a plain old const char * pointer, although it "appears" to work; and does indeed save some RAM.

One thing that occurs to me: the wave_definitions array AND the backing array it refers to are both declared on the stack in read_definitions() -- so although it's OK for the sake of a demonstration, you must be careful to ensure that the scope of these arrays stays in sync if you need to access the parsed tokens outside of read_definitions.

You cannot store pointers returned by strtok like you do. Whenever "strtok(NULL, ..)", the previously returned token pointer cannot be expected to be valid anymore. If you want to store all the tokens, you will need to "strdup(token)" in order to create a (new buffer with a) copy of the token. Storing token pointers in an array as you do should not be necessary, though.

apollo91:
Thanks for the fast reply.
Do you see this commented line here in the code, where I tried exactly this? I also tried with '\0'... nothing helped

I'll have to admit, I did not see that.

Danois90:
You cannot store pointers returned by strtok like you do. Whenever "strtok(NULL, ..)", the previously returned token pointer cannot be expected to be valid anymore. If you want to store all the tokens, you will need to "strdup(token)" in order to create a (new buffer with a) copy of the token. Storing token pointers in an array as you do should not be necessary, though.

Nonsense! I do it all the time. Each call to strtok() places a '\0' at the location of the delimiter it finds, and does not in any way alter ANY other part of the array. Al of the tokens remain completely intact until YOU over-write the buffer.

Thanks tomparkin! I am gonna test ist tomorrow.
(I know there where errors, I just didnt understand where they come from)

tomparkin:
One thing that occurs to me: the wave_definitions array AND the backing array it refers to are both declared on the stack in read_definitions() -- so although it's OK for the sake of a demonstration, you must be careful to ensure that the scope of these arrays stays in sync if you need to access the parsed tokens outside of read_definitions.

I tested it it seems to work. Thanks!
What exactly so you mean with "you must be careful to ensure that the scope of these arrays stays in sync"?
I pass the tokens to a function identifyng the required values.

mode_definition_wave_fade_intervals[w][FADE_BRIGHT] = read_value(wave_definitions[w], PSTR("FIB="));
unsigned short read_value(char* definition_str, const char* keyword_str) {
  
  char* relevant_str = strstr_P(definition_str, keyword_str) + strlen_P(keyword_str);

  byte pos = 0;
  unsigned short value = 0;

  while(isDigit(relevant_str[pos])) {
    value = value * 10 + relevant_str[pos] - '0';
    pos ++;
  }

  if(value == 0) {
    error_flag = true;
    Serial.println(F("DEFINITION_ERROR: No valid value found!"));
  }
  
  return value;
}

It appears to work so far but do you think that could cause problems if it is done to every single token?