String vs string vs char + Arrays? Correct approach?

Hey gang-

I'm hoping someone can educate me here, as I'm a little unclear on how to approach this.

I know/understand that we SHOULD NOT (ever) use the "S"tring class (capital "S" here)... and should use string (lowercase "s", i.e. a null-terminated char array) instead.

It's a shame the String class causes those stack/heap (whatever) errors... as it is easier to understand/work with.
*IMHO at least.. but what do I know.. I'm a spoiled web developer, so taking in to account memory/space issues is foreign to me.. and I'm having some difficulty wrapping my brain around how one goes about doing the following.

Summary:

  • I want to take some incoming serial data...
  • It has start & end-of-packet characters (SOP/EOP) to signify the start and end of the incoming data
  • The data 'between' the SOP/EOP characters is varying length.

What I'd like to do (in a perfect world).. would be to take this incoming serial data, and chop the data between the SOP/EOP characters by the comma delimiter... and put each of these 'chunks' of data into an Array.

Here is where I get a little unclear on how to do things.

Strings are bad... so using an array of strings is also bad.....right?
If we use char arrays... each 'character' is an item/index in the array..... right?

So how do we 'store' several characters together as one 'index' in an array?
I have read serial basics...etc.. but I dont think that applies to what I am asking here.. I'm not asking how to collect the incoming serial data to some variable.

I'm not asking how to break it up by some delimiter (well sorta!..lol)...

I'm specifically asking about breaking it up and having those groups of values be stored 'together' in an array index?

Here is some code to perhaps help illustrate things.

//serial format examples
//<b=1:1,b=2:1,m=6:200>
//<b=1:1,b=2:1,b=5:2,m=6:100,m=2:300>

There is no telling how many 'snippets/chunks' there will be between the < & > characters..
There could be 2 or 3.. or up to maybe 10?

I would say that each 'chunk' would NOT be more than 8-10 chars (tops...probably not more than 8)..
But for the sake of planning.. let's say 10 'chunks' of no more than 10 characters each.. so 100 characters (max) on any incoming serial 'command'.

I'd like to get each:
b=1:1,
b=2:1,
b=5:2,
m=6:100,
m=2:300

into an array... so I can just loop through the array and parse each 'action'.. and when the array is done.. there are no more 'actions/steps' to be performed...etc..

fake example:
parse...parse

actionArray[counter] = (add delimited chunk here,,blah blah);

*(but this would need to be a String array...right? (which we do not want?)

in the end left with something like:

actionArray['b=1:1','b=2:1','b=5:2','m=6:100','m=2:300'];

*(but this would need to be a String array...right? (which we do not want?)

Then loop through the above array:

for(int z=0; z<30; z++) {
        if(actions[z] != "0") {
          parseAction(actions[z]);
        }
}

So the above (for now) is my intended goal... but understanding how to go about WITHOUT using the String class has me a bit stumped.

In the code example below..

The parsing part is still assuming an older format.... (but even that was just to practice breaking up the string at specific delimiter).... the same question holds true.. how can I get each snippet into an array to be parsed (acted upon) later on?

//example recipe format (old format)
//<b=1:1><b=2:1><m=6:200> = bottle#1/1 shot, bottle#2/1 shot, mixer#6/200ms

//example recipe format (new format)
//<b=1:1,b=2:1,m=6:200> = bottle#1/1 shot, bottle#2/1 shot, mixer#6/200ms

#define SOP '<'
#define EOP '>'
bool hasStarted = false;
bool hasEnded = false;

char incomingSerialData[100];
byte index;

int positionValue = 0;
int amountValue = 0;

void setup() {
  //serial monitor output
  Serial.begin(9600);
}


void loop(){

  // Read all serial data available, as fast as possible
  while(Serial.available() > 0){
    char incomingCharacter = Serial.read();
    //Serial.println(incomingCharacter);
    if(incomingCharacter == SOP){
      index = 0;
      incomingSerialData[index] = '\0';
      hasStarted = true;
      hasEnded = false;
    }else if (incomingCharacter == EOP){
      hasEnded = true;
      break;
    }else{
      if (index < sizeof(incomingSerialData) - 1) {  // leave room for the terminating '\0'
        incomingSerialData[index] = incomingCharacter;
        index++;
        incomingSerialData[index] = '\0';
      }
    }
  }

 // Packet data done...parse/evaluate data
  if(hasStarted && hasEnded){

    Serial.print("TOTAL COMMAND PACKET (CP) CHECK: ");
    Serial.println(incomingSerialData);
    char *actionToken = strtok(incomingSerialData, "="); //split action value from location/amount values
    if(actionToken){
      Serial.print("ACTION COMMAND: ");
      Serial.println(actionToken);   
           
      //check if bottle 'action' data coming over
      if(strcmp(actionToken, "b") == 0){       
        //grab position value     
        char *positionToken = strtok(NULL, ":");
        if(positionToken){
          Serial.print("BOTTLE #: ");
          Serial.println(positionToken);
          positionValue = atoi(positionToken);

          //now grab 'amount' value
          char *amountToken = strtok(NULL, "\0");
          if(amountToken){
            Serial.print("SHOT COUNT: ");
            Serial.println(amountToken);
            amountValue = atoi(amountToken);
          }
        }
      }

      //check if mixer action data coming over
      if(strcmp(actionToken, "m") == 0){
        char *positionToken = strtok(NULL, ":");
        if(positionToken){
          Serial.print("VALVE #: ");
          Serial.println(positionToken);
          positionValue = atoi(positionToken);
          //now grab 'amount' value
          char *amountToken = strtok(NULL, "\0");
          if(amountToken){
            Serial.print("OPEN TIME: ");
            Serial.println(amountToken);
            amountValue = atoi(amountToken);
          }
        }
      }
    }
    Serial.println("");
    // Reset for the next packet
    hasStarted = false;
    hasEnded = false;
    index = 0;
    incomingSerialData[index] = '\0';
  }
}

So for now.. we can ignore the innards of the:

if(hasStarted && hasEnded){}

conditional... and just use that to break things up into an array.

Or at least that is my goal here.

Maybe there is a different better approach to be had for this scenario?

I know the serial 'format' could be changed to make it easier to parse..etc.. and when I get further along and I'll regroup and refactor that..but for now.. I am just concerned/curious about how to get those comma delimited values (strings?) into an array?

Especially if we are NOT supposed to be using the String class? Maybe it's my thinking/understanding of the char arrays that is flawed?

Maybe this is a situation where the String class may be acceptable? (although that incoming serial data >> array will happen over and over and over and over)

Basically a touch screen 'menu' will be sending this command/action (serial string) over, anytime someone uses the touch screen menu.

So to re-cap:

How can I parse incoming serial data by a comma delimiter and store those values into an array.. if we are NOT supposed to use the String class?

Thanks.

You want an array of C-style strings. Each string is itself an array of chars, so in other words you want a two-dimensional array of chars.

char scientists[4][10] = {"Newton", 
                          "Maxwell", 
                          "Einstein", 
                          "Feynman"};

The only caveat is that you have to pick a maximum length for each field, and that length has to include one extra char for the terminating null.
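
To make the two-dimensional idea concrete, here is a minimal sketch of putting text into such an array and reading it back. The names chunks, chunkCount and storeChunk(), and the 10 x 10 limits, are made up for illustration and do not come from any library. One thing worth noting: a row cannot be assigned with =, so the characters have to be copied in, here with strcpy() after a length check.

```cpp
#include <string.h>

// Assumed limits: at most 10 chunks, each at most 10 characters long.
const int MAX_CHUNKS = 10;
const size_t MAX_CHUNK_LEN = 11;   // 10 characters plus the terminating '\0'

char chunks[MAX_CHUNKS][MAX_CHUNK_LEN];
int chunkCount = 0;                // how many rows currently hold valid text

// Copy one piece of text into the next free row.
// Returns false if the array is full or the text does not fit.
bool storeChunk(const char *text) {
  if (chunkCount >= MAX_CHUNKS) return false;
  if (strlen(text) >= MAX_CHUNK_LEN) return false;
  strcpy(chunks[chunkCount], text);  // safe: the length was checked above
  chunkCount++;
  return true;
}
```

In a sketch, storeChunk("b=1:1") would fill row 0, and Serial.println(chunks[0]) would print that row back as ordinary text.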

See example #5 of the serial input basics thread.. You end up with an array of character arrays.

@Perehama

Thanks for the reply.

Looking at your example though... you seem to pre-define the number of elements in your array:

char scientists[4][10]

ie: [4]
But there is no way to know this before hand..
Is it acceptable to just use a value that is over the threshold?
ie:

char scientists[10][10]

And when adding do I just do:

scientists[x] = (parsed, comma delimited chunk)

Thinking that there will NOT be more than '10 steps' containing more than '10 characters'?.. even if one command comes in with only say 2 actions of less characters each? As long as I dont go BEYOND that declared limit.. I'm fine?

Lastly.. how do I handle 'clean up' then?

How do I clear out my

char scientists[10][10]

array? So it can accept a newly parsed incoming serial line?
And how do I access that then? As simple as:

scientists[x]

??
to get the intended index?

@groundFungus
I'm not sure what part of the #5 examples is shedding light on my question I guess? (maybe I dont understand it properly?)
I see the parseData() function.

void parseData() {      // split the data into its parts

    char * strtokIndx; // this is used by strtok() as an index

    strtokIndx = strtok(tempChars,",");      // get the first part - the string
    strcpy(messageFromPC, strtokIndx); // copy it to messageFromPC
 
    strtokIndx = strtok(NULL, ","); // this continues where the previous call left off
    integerFromPC = atoi(strtokIndx);     // convert this part to an integer

    strtokIndx = strtok(NULL, ",");
    floatFromPC = atof(strtokIndx);     // convert this part to a float

}

It looks like you have to KNOW how many delimited chunks there will be? (not acceptable in this scenario)

I'm guessing this line is the focus:

strtokIndx = strtok(tempChars,",");      // get the first part - the string
    strcpy(messageFromPC, strtokIndx); // copy it to messageFromPC

but I don't really understand the strtok(32, ","); line? Where is the tempChars/32 value coming into play here? (why?)
And then copies it into the:

char messageFromPC[numChars] = {0};

char array?
Why doesnt this have to be a multi-dimensional array?

Thanks!

*Update: not sure why all the code snippets have so much extra space after them? Cant seem to delete it.

Yes, this is the beauty of a null-terminated C string. With char scientists[4][10], your max size is 10 bytes per field (9 characters plus the terminating null) and your max number of fields is 4. If you only have 3 fields of 3 chars each, you put a null ('\0') in the 4th position of fields 1-3 and a null in the 1st position of field 4, and then you have an array of 3 strings.

When it comes to parsing the data from the serial port, you receive one byte at a time, and each byte falls into one of 4 categories: start, string, delimiter and stop. Assuming your strings never contain the characters used for start, delimiter and stop, you can use simple comparisons to decide when to store the received byte in the array and when to insert a null.
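
The byte-by-byte idea above could be sketched roughly as follows. This is only an illustration under assumptions: the function name feedChar(), the array name fields and the 10 x 10 limits are invented here, empty fields are skipped, and characters beyond the field length are silently dropped.

```cpp
#include <string.h>

const char SOP = '<';
const char EOP = '>';
const char DELIM = ',';
const int MAX_FIELDS = 10;      // assumed limit on fields per packet
const int MAX_FIELD_LEN = 11;   // 10 characters plus the terminating '\0'

char fields[MAX_FIELDS][MAX_FIELD_LEN];
int fieldCount = 0;     // number of completed fields
int charPos = 0;        // write position inside the current field
bool inPacket = false;  // true between SOP and EOP

// Feed one received byte. Returns true when EOP completes a packet.
bool feedChar(char c) {
  if (c == SOP) {                 // start: throw away any old state
    fieldCount = 0;
    charPos = 0;
    inPacket = true;
    return false;
  }
  if (!inPacket) return false;    // ignore anything outside a packet

  if (c == DELIM || c == EOP) {   // delimiter or stop: close the current field
    if (charPos > 0 && fieldCount < MAX_FIELDS) {
      fields[fieldCount][charPos] = '\0';
      fieldCount++;
    }
    charPos = 0;
    if (c == EOP) {
      inPacket = false;
      return true;
    }
    return false;
  }

  // string: store the character if there is still room for it
  if (fieldCount < MAX_FIELDS && charPos < MAX_FIELD_LEN - 1) {
    fields[fieldCount][charPos] = c;
    charPos++;
  }
  return false;
}
```

In loop() this would be called as feedChar(Serial.read()) while Serial.available() > 0; when it returns true, fields[0] to fields[fieldCount - 1] hold the chunks of that packet.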

Thanks!

ok.. just so I'm clear...

Even if there is no data left to add to the array.. I still need to populate any/all remaining (empty) indexes with a NULL ('\0') character?

From the example #5 linked to above... why does that one, -not- look (to me at least) to be a multi-dimensional array?

Looks like (and I could be understanding it all wrong), that he is taking the characters from index * to first instance of delimited character, and copying it to a char array?

strcpy(messageFromPC, strtokIndx); // copy it to messageFromPC

How is this getting him an array of strings??
To me it looks like he has stripped out '3' specific and known (count) values... and saved them to 3 separate variables?

--

Just saw your new message as I posted the above.

You are correct.. none of the data (chunks) will contain any of the characters used for:

start, end, delimiter values..

So.. I would then need to pass each character to

myArray[10][0] = (while incoming character that is not start/stop/delimiter)

when the next delimiter is hit..then I

myArray[10][1] = (while incoming character that is not start/stop/delimiter)..

So you're suggesting to do this while the serial data is being received? And not take the whole incoming data, save it to a char array, and then chop it up and put it in a multi-dimensional array?

You need only one null directly after the last valid char to terminate the string inside the array.... The processor finds the start of each string by its address, knowing the next one is exactly x characters over, and the end of each string by the null. If you fill the rest with garbage, or nothing, or null, it's the same.
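
As a small illustration of that point, "clearing" the array only needs a null in the first position of each row; whatever bytes are left behind are never looked at. The names slots, clearSlots() and countUsedSlots() are hypothetical and only serve this sketch.

```cpp
#include <string.h>

char slots[10][11];   // hypothetical: 10 strings of up to 10 characters each

// "Clear" every row by turning it into an empty string.
// Only the first byte of each row has to change.
void clearSlots() {
  for (int i = 0; i < 10; i++) {
    slots[i][0] = '\0';
  }
}

// A row counts as used when its first byte is not the terminator.
int countUsedSlots() {
  int used = 0;
  for (int i = 0; i < 10; i++) {
    if (slots[i][0] != '\0') used++;
  }
  return used;
}
```

After clearSlots(), strlen(slots[x]) is 0 for every row even though the old characters from position 1 onwards are still sitting in memory.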

How is this getting him an array of strings??
To me it looks like he has stripped out '3' specific and known (count) values... and saved them to 3 separate variables?

You are correct. My apologies.

Here is a strtok demo that makes an array of pointers to the pieces. Don't know if it helps, but it is what I was thinking of.

char array[] = "act,bore,cascade,different,eye,fill";
char *strings[10];  
char *ptr = NULL;

void setup()
{
    Serial.begin(9600);
    //Serial.print(array);
    byte index = 0;
    ptr = strtok(array, ",");
    while(ptr != NULL && index < 10)   // stop before running past the end of strings[]
    {
        strings[index] = ptr;
        index++;
        ptr = strtok(NULL, ",");
    }
    //Serial.println(index);
    for(int n = 0; n < index; n++)
    {
        Serial.println(strings[n]);
    }
}

void loop()
{
    // put your main code here, to run repeatedly:

}

groundFungus:
You are correct. My apologies.


Whew! LOL I wasnt trying to challenge you by any means! (to be clear)
I just started to doubt myself.. and how I was understanding/reading things!
I'll check out your example once I get home from work tonight.
Just for full disclosure..
I found this open source variant of the same type of project I am doing.

Which uses the String class.. (which I'm trying to avoid)..
but the 'logic' and approach makes more sense to me.
At least the part of getting these values into an array and looping through the 'command array' to execute 1 to many 'steps'
I have some questions on your code example... but I'm going to wait and play with it to see if things make more sense once the code is run.

Thanks.

The Evils of Arduino Strings article explains why to avoid Strings if you don't know how they work.

Yeah.. I've read Majenko's article.. (where I learned about the problems.. stack/heap...etc..as well as a thread here between RayLivingston and Robin2)

Just a little confused about the work-arounds using char arrays..

for me..(in my world..lol).. its a VERY easy thing..

//parse into array
$myString = "<b=1:1,b=2:1,m=6:200>";
$myArray = explode(",", trim($myString, "<>"));

//check (access) array/index (not needed, just a debug line)
echo 'ACTION 1: ' . $myArray[0];

//execute command(s)
for($i = 0; $i<count($myArray); $i++){
    //do the 'work'
    parseAction($myArray[$i]);
}

done.

so I'm trying to wrap my brain around how to add the 'string groups' (character groups) as single items in an array,.... then access them after the fact. (I see some things that are unclear to me from your example.. and the other provided here, as one uses a multi-dimensional array.... and I -think- yours uses a pointer to the string/array?)

Just having a hard time grasping the basics of how to use a char array.. since its not a true 'array'... but one where each character is an item/index.

The Evils of Strings article is not just that Strings are bad. After explaining why not to use Strings there is a good tutorial on using strings (small s).

*not sure if I should make a new thread or not?

anyways..

ok.. so I have something 'close' (I think).. LOL..

The problem is when I try to use the 'sub-parsing' function called in the 'while loop'.

If I comment out the parseAction() call in the while() loop... it executes, and delivers what I expect in the serial monitor.

*but that just a simple break down of sections by comma delimited character.. (havent stored them anywhere where they can be passed around and further parsed/broken down..etc).

if I un-comment that line.. and let it call the parseAction() function..

it executes ONE time perfectly.... then nothing else.

I get a DEBUG line after the WHILE loop saying that parsing of the comma delimited chunks is complete.. (when really only the FIRST one was parsed and attempted to be passed to the parseAction() function.)

Here is my code..

  • paste this into the serial monitor:
    <b=1:1,b=2:1,m=6:200> and comment/un-comment line 72 for the parseAction(actionPacket); behavior results.
#define SOP '<'
#define EOP '>'
bool hasStarted = false;
bool hasEnded = false;

char incomingSerialData[100];
byte index;

char *actionPacket;
byte packetCounter;

int positionValue = 0;
int amountValue = 0;

void setup() {
  //serial monitor output
  Serial.begin(9600);
}


//example recipe format  (paste into serial monitor for output example)
//<b=1:1,b=2:1,m=6:200> = bottle#1/1 shot, bottle#2/1 shot, mixer#6/200ms



void loop(){
  // Read all serial data available, as fast as possible
  while(Serial.available() > 0){
    char incomingCharacter = Serial.read();
    //Serial.println(incomingCharacter);
    if(incomingCharacter == SOP){
      index = 0;
      incomingSerialData[index] = '\0';
      hasStarted = true;
      hasEnded = false;
    }else if (incomingCharacter == EOP){
      hasEnded = true;
      break;
    }else{
      if (index < sizeof(incomingSerialData) - 1) {  // leave room for the terminating '\0'
        incomingSerialData[index] = incomingCharacter;
        index++;
        incomingSerialData[index] = '\0';
      }
    }
  }

 // Packet data done...parse/evaluate data
  if(hasStarted && hasEnded){
      
    Serial.println("");
    Serial.print("TOTAL COMMAND PACKET (CP) CHECK: ");
    Serial.println(incomingSerialData);
    Serial.println("");

    //packet counter reset
    packetCounter = 0;

    //get first delimited packets from command string
    actionPacket = strtok(incomingSerialData, ","); //split action packets from command string

    //keep parsing these delimited packets from the command string until there are none left to parse
    while(actionPacket != NULL){
      packetCounter++;
      Serial.println("-----------------------------------");
      Serial.print("PARSED ACTION PACKET (AP) ");
      Serial.print(packetCounter);
      Serial.print(": ");
      Serial.println(actionPacket);    
      
      //send off to parse function/loop for processing      
      parseAction(actionPacket); //problem stems from here!!!!

      //update parsing params to next comma delimited chunk/action packet
      //only works if we dont call the 'sub parsing function' and pass in the pointer/value as a param
      actionPacket = strtok(NULL, ",");
    }

    //parsing complete
    Serial.println("");
    Serial.print("PARSING PACKETS COMPLETE");
         
    // Reset for the next packet
    hasStarted = false;
    hasEnded = false;
    index = 0;
    incomingSerialData[index] = '\0';
    
  }

}



void parseAction(char *packetData) {
  Serial.println("");
  Serial.print("TOTAL ACTION PACKET (AP) RECEIVED CHECK: ");
  Serial.println(packetData);
  char *actionToken = strtok(packetData, "="); //split action value from location/amount values
  if(actionToken){
    Serial.print("ACTION COMMAND: ");
    Serial.println(actionToken);  
        
    //check if bottle 'action' data coming over
    if(strcmp(actionToken, "b") == 0){      
      //grab position value    
      char *positionToken = strtok(NULL, ":");
      if(positionToken){
        Serial.print("BOTTLE #: ");
        Serial.println(positionToken);
        positionValue = atoi(positionToken);

        //now grab 'amount' value
        char *amountToken = strtok(NULL, "\0");
        //char *amountToken = strtok(NULL, ",");
        if(amountToken){
          Serial.print("SHOT COUNT: ");
          Serial.println(amountToken);
          amountValue = atoi(amountToken);
        }
      }
    }

    //check if mixer action data coming over
    if(strcmp(actionToken, "m") == 0){
      char *positionToken = strtok(NULL, ":");
      if(positionToken){
        Serial.print("VALVE #: ");
        Serial.println(positionToken);
        positionValue = atoi(positionToken);
        //now grab 'amount' value
        char *amountToken = strtok(NULL, "\0");
        //char *amountToken = strtok(NULL, ",");
        if(amountToken){
          Serial.print("OPEN TIME: ");
          Serial.println(amountToken);
          amountValue = atoi(amountToken);
        }
      }
    }
  }

   //update parsing params
   //actionPacket = strtok(NULL, ",");
}

I feel like something in the parseAction() function is messing up the 'pointer' for the main parent while loop to think it is at the end of the comma delimiters/char array text.

How can I fix that?

You didn't post the parseAction() function.

Where is parseAction()?

DOH!.

sorry about that..(I didnt copy down far enough) post above updated!

I have it heavily commented/debug lines.. so I can try and follow things.

I know the 'format' can be streamlined.. but I'm not really focused on that right now.

This char array vs String/string stuff.... and how it's not really text but pointers..etc is really throwing me for a loop right now.

I parse out the first 'chunk' of comma delimited data.. and I want to pass that to a function to further parse/break it down and do some 'actions'.. then return back to my PARENT while() loop and move on to stripping out the next comma delimited chunk and send that to the function....etc..

My initial approach was to store these in an array.. and then loop through and parse them.... but I guess this works just as well?..

Thanks for the 'eyes' guys!

strtok() keeps an internal variable that remembers where it was up to. That is how it is able to work when you give it NULL.

You have used it in two places. One outside the function and one inside. It cannot remember both positions.

If your delimiters always go in the order = : , then process it all in one loop. If not, then you will have to do one of those layers of parsing yourself. Just scan down the string, note the position of the first comma, and then copy everything up to it into another string to pass to the inner parser.
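
A rough sketch of doing the outer layer by hand might look like this. It is a variation on the suggestion above: instead of copying each chunk to another string, it overwrites the comma with a null so the chunk can be parsed in place. The struct Action and the functions parseOne() and parseCommand() are hypothetical names for illustration. strtok() is now only used inside the inner parser, so its remembered position is never disturbed by the outer loop. (Where the toolchain provides it, the re-entrant strtok_r() is another way around the same problem.)

```cpp
#include <string.h>
#include <stdlib.h>

// Hypothetical record for one parsed action, e.g. "b=1:1" or "m=6:200".
struct Action {
  char type;      // 'b' or 'm'
  int position;   // bottle or valve number
  int amount;     // shot count or open time
};

// Inner parser: the only place that calls strtok().
bool parseOne(char *chunk, Action *out) {
  char *type = strtok(chunk, "=");
  char *pos = strtok(NULL, ":");
  char *amt = strtok(NULL, "");   // empty delimiter set: the rest of the chunk
  if (type == NULL || pos == NULL || amt == NULL) return false;
  out->type = type[0];
  out->position = atoi(pos);
  out->amount = atoi(amt);
  return true;
}

// Outer splitter: finds the commas by hand with strchr().
// Returns the number of actions stored in actions[].
int parseCommand(char *command, Action *actions, int maxActions) {
  int count = 0;
  char *start = command;
  while (start != NULL && *start != '\0' && count < maxActions) {
    char *comma = strchr(start, ',');
    if (comma != NULL) *comma = '\0';            // end this chunk at the comma
    if (parseOne(start, &actions[count])) count++;
    start = (comma != NULL) ? comma + 1 : NULL;  // move on to the next chunk
  }
  return count;
}
```

Called on the buffer holding b=1:1,b=2:1,m=6:200 this fills three Action entries, which can then be looped over and executed one by one.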

Thanks MorganS for the reply.

I'm not quite following the pointer issue..

In my mind.. I am moving the pointer once.. (to get the first 'chunk' of comma delimited data)....

I thought I was moving the same pointer again in another function (to break down/sub-parse that trimmed out comma delimited snippet)

Help me understand?

Once called in the main loop.. and once called in the parseAction() function..

Is the one in the parseAction() function not seen because of scope or something? (ie: I accidentally created a new pointer reference or something?)

  • Can you show me an example of how to parse it all in 'one loop'? (I never fully understood the examples of using multiple delimiters, and the how to access them after?)

  • Can you elaborate on how I would 'layer parsing' manually? (myself) (how do I note the position of the comma? for later use?)

Thank you.

An example of using strtok in a loop: C library function - strtok(). Instead of printing, you can copy the token pointer to an element of an array of pointers

void showNewData()
{
  if (newData == true)
  {
    Serial.print("This just in ... ");
    Serial.println(receivedChars);
    newData = false;

    // array to store 10 texts (tokens)
    char *text[10];
    // position where to store a token
    int textIndex = 0;

    /* get the first token */
    char *token = strtok(receivedChars, ",");

    /* walk through other tokens */
    while ( token != NULL )
    {
      Serial.println(token );
      // copy token pointer to text array
      text[textIndex++] = token;
      // there is only space for 10 pointers, so break when we reach that
      if(textIndex == 10)
      {
        break;
      }
      token = strtok(NULL, ",");
    }

    for (uint8_t cnt = 0; cnt < textIndex; cnt++)
    {
      Serial.println(text[cnt]);
    }
    Serial.println("---");

  }
}

Notes:
1)
At the moment that you start receiving new characters, the data in receivedChars is no longer valid; and hence the pointers in the text array are no longer valid. Not a problem in the above code as it processes all data before receivedChars might be overwritten.
2)
I don't know if it was mentioned but in a system with limited resources, you will have to set limits; e.g. on a system with a 328P processor, you simply cannot receive 2 kByte of data in one go. 10 chunks of 10 characters will not pose a problem if you adjust receivedChars so it can hold 100 + 9 + 1 characters.
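
For the "process it all in one loop" idea raised earlier in the thread, a minimal sketch could look like the following. It assumes every action has the exact shape type=position:amount, so that tokens always arrive in groups of three; it does not check which delimiter ended which token, so malformed input would be misread. The struct Step and the function parseSteps() are made-up names.

```cpp
#include <string.h>
#include <stdlib.h>

// Hypothetical record for one step of a command.
struct Step {
  char type;      // 'b' or 'm'
  int position;   // bottle or valve number
  int amount;     // shot count or open time
};

// One strtok() loop: a token ends at any of ',', '=' or ':'.
// Returns the number of steps stored in steps[].
int parseSteps(char *command, Step *steps, int maxSteps) {
  int count = 0;
  char *type = strtok(command, ",=:");
  while (type != NULL && count < maxSteps) {
    char *pos = strtok(NULL, ",=:");
    char *amt = strtok(NULL, ",=:");
    if (pos == NULL || amt == NULL) break;   // incomplete group: stop here
    steps[count].type = type[0];
    steps[count].position = atoi(pos);
    steps[count].amount = atoi(amt);
    count++;
    type = strtok(NULL, ",=:");              // first token of the next group
  }
  return count;
}
```

Because the values are converted to numbers straight away, nothing in steps[] points back into the receive buffer, so the buffer can be reused for the next packet as soon as parseSteps() returns.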

I'm a spoiled web developer, so taking in to account memory/space issues is foreign to me

In that case you're lucky or you know what you're doing; I've recently seen an asp.net application crash with an "out of memory" exception after being 3 months in production 8)