Reference use of String

Hello all, I'm putting together a reference programming example and I believe it represents good practice for the use of String objects to process commands sent over serial to set the time of a Clock (although feedback is welcome).

However, it looks like I might need to port the code back to Arduino 1.0.5 to be able to target a particular school's computing lab. I therefore need to remove the use of String#reserve() and String#remove() as these came in a later version of Arduino. For this reason I'm puzzling how to observe best practice for memory management of String objects without these.

Can anyone make suggestions for an effective way to go about transforming the string parsing code to target 1.0.5 ?

If I'm targeting Arduino 1.0.5 is there any legitimate way to use the String object at all, (given the issues with malloc, free and the String implementation itself) or do I have no choice but to go back to explicitly allocating c-style strings and using pointers?

It's a novice programming example, so strtok and strcmp will most likely do students heads in, whilst with the String object method invocations it's at least fairly explicit what's going on, and with explicit memory reservations, buffer management and operator-overloading-for-append the memory allocation behaviour is respectable I think, but I have no choice but to try and avoid the use of these later features.

The full code is at projects/Clock02SetTime.ino at fe5cb1776dd2a3e084aa307ea4ec3ff020475c9e · ShrimpingIt/projects · GitHub

...and the following elements which read the data from serial and parse it to numbers are the most relevant bits, which are currently dependent on the String library later than 1.0.5 especially the coupling between reserve() and remove() ...

  //reserve enough string memory for ISO8601 date string like 2015-06-01T12:00:00
  command.reserve(COMMANDMAX + 1); 
  field.reserve(COMMANDMAX + 1);

[...]

  while(Serial.available()){
    char nextChar = Serial.read();
    if(nextChar != '\n'){
      command += nextChar;
    }
    else{
      processCommand();
      command.remove(0);
    }
  }

[...]

int nextIntField(char terminator){
  int fieldEnd = command.indexOf(terminator, fieldStart);
  if(fieldEnd != -1){
    field += command.substring(fieldStart,fieldEnd);
    fieldStart = fieldEnd + 1;
    int value = field.toInt();
    field.remove(0);
    return value;
  }
  else{
    return -1;
  }
}

int remainingIntField(){
  field += command.substring(fieldStart);
  int value = field.toInt();
  field.remove(0);
  return value;
}

Hi, @Delta_G your point is well taken from a technical point of view..

However, two comments for the two main points you've made.

I believe that we have a significant surplus of resources on the microcontroller for this project, and for many similarly simple projects. An emphasis of the outreach work we're doing is to enable people to achieve real-world results from their programming experimentation by doing physical computing projects. This is exactly so they develop interest in the craft and want to optimise and push the boundaries. I don't think we can expect them to start there. A good programmer would be forced to understand your point of view and get to grips with it, but hopefully on project #20 not project #0. Being able to send commands and process them simply is a kind of '101' activity I think so we would like to find a way for them to do that that's accessible. We're somewhat in line with Guido van Rossum on this, that "Code is much more often read than written" (http://elementary.io/es/docs/code/code-style) so we're willing to put up with a lack of efficiency in favour of these other objectives, but not actual bugs.

Secondly, when targeting 1.6.3 I'd developed a lot more confidence in the String object from an efficiency and bug-free point of view, and understood that most of the underlying problems which had bugged people for years were now out of the picture. However perhaps you know otherwise, or you've seen some aspect of our code which actually means there could be runaway memory allocation issues. We're certainly not expecting to have students ignore memory allocation, but we would like operations to be explicit and explainable for the job (such as String#reserve() and String#remove()).

@Delta_G Isn't that problem explicitly fixed by reserve() and the operator overloading which handles += for the String object ?

I was trying to specifically eliminate this bug after following the trail of the way reserve() - Arduino Reference had been added to the library in response to Google Code Archive - Long-term storage for Google Code Project Hosting.

Did I mess up? That's what I was banking on, so if I've got that wrong then I definitely have to go back to the drawing board.

I believe that we have a significant surplus of resources on the microcontroller for this project

You and zoomkat are in the minority. In fact, you two are the only ones that believe that.

cefn:
it represents good practice for the use of String objects

Not on an Arduino.

...R

@PaulS I don't think you're familiar with my project although you seem to suggest that in your reply. It's a RealTimeClock project which just allows you to set and get the current date and time via Serial. I don't know if the person @zoomcat is also making a similar RealTimeClock, is he?

I believe this project, as implemented via character arrays, doesn't use very many of the resources (operations, memory) of the microcontroller. For this reason, we have what might be called a "surplus of resources".

If the concern with Strings is merely inefficiency, rather than actual bugs, then we might choose to employ that surplus of resources by adopting programming structures which are not the most efficient, for the sake of readability by novices. Please let me know if this isn't clear. Hopefully this gives you a start to understand something about the project and its intent.

@Robin2 you say "not on an Arduino" which suggests you have some kind of issue in mind. What is the issue with the way we have used String objects which means it's not best practice for the use of String objects? That's exactly what we're hoping to understand so if you have a handle on it, it would be great to hear your point of view, but it's hard to figure out from your post.

You may be saying there's no such thing as best practice for the use of String objects, because no-one should use them. Is that your view? Do you mean no-one should use them even in version 1.6.3?

I hope that the issues arising from naive String concatenation are in fact addressed, (assuming well-formed commands sent over Serial) by the use of the reserve() invocation in our setup() function and the default behaviour of String#concat() and String#remove(). If this isn't the case it would be good to know.

I believe the call to String#reserve() ensures that memory allocation happens once, and subsequent calls to remove(0) after filling some or all of the backing array causes the string to start writing again from the beginning of the allocated memory, hence shouldn't trigger additional allocation assuming the commands sent are not longer than expected. To handle malicious actors, we should additionally check the length of the string to prevent runaway allocation from being forced by malformed serial data I guess.

Wondering if I've triggered a kind of automatic response to the use of String objects rather than being able to have a discussion about the details. I'm not hearing any comment on the technical questions raised. To clarify these are twofold...

  1. Can the existing implementation targeting 1.6.3 with a careful use of reserve() concatenation and remove() prevent runaway memory allocation when using String objects. If not what change should be made, or if it's impossible I'd be grateful if someone could help me understand why it's impossible and where the bug is. I could try and file it on github.

  2. Can any implementation targeting 1.0.5 prevent runaway memory allocation when using String objects, or is the only way to avoid the bug in such an early version actually to reimplement with char arrays (was hoping for a workaround). I think I've drawn the conclusion from Arduino/WString.cpp at 1b3ae5fa63c2ca7c5a585f3929d382b441bbba9f · arduino/Arduino · GitHub which seems to be the source for version 1.0.5 of the String object that there's no sane way to use that version at all whilst preventing memory allocation problems.

Whilst your use of reserve may reduce or eliminate memory fragmentation in the latest versions of the IDE, I am not keen on teaching the use of String classes for beginners, especially if they happen to use earlier versions of the IDE, or take away that you can use String (and not bother with all that "reserve" stuff).

If I may offer an alternative, teach the use of state machines. I have a state machine write-up which goes over the fundamentals. Here is code to solve your particular problem of parsing a date/time string:

// Expected input format:
//    YYYY-MM-DDThh:mm:ss

// possible input states
enum { WANT_YEAR,
       WANT_MONTH,
       WANT_DAY,
       WANT_HOUR,
       WANT_MINUTE,
       WANT_SECOND,
       HAVE_ERROR   // got error, wait for new line
     } state = WANT_YEAR;

unsigned int currentValue;

// show an error, go to error state
void error (const char * why)
  {
  Serial.print ("ERROR: ");
  Serial.println (why);
  state = HAVE_ERROR;
  }  // end of error
  
unsigned int year, month, day, hour, minute, second;
  
bool setDate ()
  {
  // validation
 
  if (year < 1900 || year > 2050)
    {
    error ("Year out of range 1900 to 2050");
    return false;
    }
    
  if (month < 1 || month > 12)
    {
    error ("Month out of range 1 to 12");
    return false;
    }

  if (day < 1 || day > 31)
    {
    error ("Day out of range 1 to 31");
    return false;
    }
   
  Serial.print (F("Setting date to: "));
  Serial.print (year);
  Serial.print (F("-"));  
  Serial.print (month);
  Serial.print (F("-"));  
  Serial.println (day);
  return true;  // set date OK
  }  // end of setDate
  
void setTime ()
  {
  // validation
 
  if (hour > 23)
    {
    error ("Hour greater than 23");
    return;
    }
    
  if (minute > 59)
    {
    error ("Minute greater than 59");
    return;
    }

  if (second > 59)
    {
    error ("Second greater than 59");
    return;
    }
   
  Serial.print (F("Setting time to: "));
  Serial.print (hour);
  Serial.print (F(":"));  
  Serial.print (minute);
  Serial.print (F(":"));  
  Serial.println (second);
    
  }  // end of setTime
  
void processIncomingByte (const byte inByte)
  {
  // if previous error, discard input until newline
  if (state == HAVE_ERROR)
    {
    if (inByte == '\n')
      {
      state = WANT_YEAR;
      currentValue = 0;
      }
    return;
    } // end of error
    
  // if incoming digit, add to previous value after multiplying it by 10
  if (isdigit (inByte))
    {
    // cannot exceed 65535 after multiply by 10 and adding 9
    if (currentValue > 6552) 
      {
      error ("Number too big");
      return;
      }  // end of too large
    currentValue *= 10;
    currentValue += inByte - '0';
    return;
    }  // end of digit

  // not a digit, process possible sequences
  switch (inByte)
    {
      
    // hyphen
    case '-':
       switch (state)
         {
         case WANT_YEAR:
            year = currentValue;
            currentValue = 0;
            state = WANT_MONTH;
            break;
         case WANT_MONTH:
            month = currentValue;
            currentValue = 0;
            state = WANT_DAY;
            break;
         default:
            error ("Unexpected '-'");
            return;
         }  // end of switch on state
         break;
      
    // colon   
    case ':':
       switch (state)
         {
         case WANT_HOUR:
            hour = currentValue;
            currentValue = 0;
            state = WANT_MINUTE;
            break;
         case WANT_MINUTE:
            minute = currentValue;
            currentValue = 0;
            state = WANT_SECOND;
            break;
         default:
            error ("Unexpected ':'");
            return;
         }  // end of switch on state
         break;
         
    // the letter 'T'
    case 'T':
    case 't':
         if (state == WANT_DAY)
           {
           day = currentValue;
           currentValue = 0;
           state = WANT_HOUR;
           break;
           }
         else
           {
           error ("Unexpected 'T'");
           return;  
           }
         break;
         
    // carriage-return
    case '\r':
          break;  // ignore carriage-return
          
    // newline
    case '\n':
          switch (state)
             {
             case WANT_DAY:
               day = currentValue;
               setDate ();
               break;
               
             case WANT_MINUTE:
               minute = currentValue;
               second = 0;
               if (setDate ())
                 setTime ();
               break;
             
             case WANT_SECOND:
               second = currentValue;
               if (setDate ())
                 setTime ();
               break;
               
             default:
                error ("Unexpected end of line");
                break;  // drop back to wanting the year again
             }  // end of switch on state
           currentValue = 0;
           state = WANT_YEAR;
           break;
          
      // other 
      default:
           error ("Unexpected character");
           return;
    }  // end of switch on inByte
  }  // end of processIncomingByte

void setup ()
  {
  Serial.begin (115200);
  Serial.println ();
  Serial.println ("Starting ...");
  }  // end of setup

void loop ()
  {
  while (Serial.available () > 0)
    processIncomingByte (Serial.read ());
  // do other stuff here    
  }  // end of loop

My code does not use String at all, nor even character arrays. Instead by using a state machine you keep memory use to a minimum. This sort of technique is also handy for parsing things like GPS strings, incoming HTML, etc. where the incoming text might be lengthy (more than would fit in RAM even) but you only want a few salient details from it, like the latitude and longitude.

My view is that this sort of stuff (from your Github page) is confusing:

int nextIntField(char terminator){
  int fieldEnd = command.indexOf(terminator, fieldStart);
  if(fieldEnd != -1){
    field += command.substring(fieldStart,fieldEnd);
    fieldStart = fieldEnd + 1;
    int value = field.toInt();
    field.remove(0);
    return value;
  }
  else{
    return -1;
  }
}

int remainingIntField(){
  field += command.substring(fieldStart);
  int value = field.toInt();
  field.remove(0);
  return value;
}

If this is for beginners, you are using something simple - String - and then throwing in a lot of complex stuff like concatenation, indexOf, substring, toInt, remove, reserve, etc. which effectively destroys the simplicity you are aiming for in the first place.

A state machine, on the other hand, is more like how we humans process things. We look for the year, then the month, then the day and so on, which is what the state machine does.

Another approach you can use is to create wrapper functions around the stdlib functions for all the char string operations that you think would be useful for the students to complete the task.

It would make a good introduction to calling functions for the beginners, and advanced students could examine them, learn from them, and modify or create similar ones for themselves.

It would teach modularity, a cornerstone of good programming.

Another approach, which is closer to your use of tokenizing, is to use strtok, collect the input as a complete string, and then tokenize. This is more "string oriented" if that is your objective:

// Expected input format:
//    YYYY-MM-DDThh:mm:ss

const int MAX_INPUT = 20;

// show an error
void error (const char * why)
  {
  Serial.print ("ERROR: ");
  Serial.println (why);
  }  // end of error
  
unsigned int year, month, day, hour, minute, second;
  
bool setDate ()
  {
  // validation
 
  if (year < 1900 || year > 2050)
    {
    error ("Year out of range 1900 to 2050");
    return false;
    }
    
  if (month < 1 || month > 12)
    {
    error ("Month out of range 1 to 12");
    return false;
    }

  if (day < 1 || day > 31)
    {
    error ("Day out of range 1 to 31");
    return false;
    }
   
  Serial.print (F("Setting date to: "));
  Serial.print (year);
  Serial.print (F("-"));  
  Serial.print (month);
  Serial.print (F("-"));  
  Serial.println (day);
  return true;  // set date OK
  }  // end of setDate
  
void setTime ()
  {
  // validation
 
  if (hour > 23)
    {
    error ("Hour greater than 23");
    return;
    }
    
  if (minute > 59)
    {
    error ("Minute greater than 59");
    return;
    }

  if (second > 59)
    {
    error ("Second greater than 59");
    return;
    }
   
  Serial.print (F("Setting time to: "));
  Serial.print (hour);
  Serial.print (F(":"));  
  Serial.print (minute);
  Serial.print (F(":"));  
  Serial.println (second);
    
  }  // end of setTime
  
bool isNumeric (char * str)
  {
  while (*str)
    {
    if (!isdigit (*str))
     return false;
    else
      str++; 
    }  // end of while
  return true;
  }  // end of isNumeric
  
void process_data (char * date_time)
  {
  char * strYear;
  char * strMonth;
  char * strDay;
  char * strHour;
  char * strMinute;
  char * strSecond;
  
  // split into tokens
  strYear   = strtok (date_time, "-");
  strMonth  = strtok (NULL, "-");
  strDay    = strtok (NULL, "Tt");
  strHour   = strtok (NULL, ":");
  strMinute = strtok (NULL, ":");
  strSecond = strtok (NULL, "");
    
  if (strYear == NULL || strMonth == NULL || strDay == NULL)
    {
    error ("Invalid date");
    return;
    }
 
  if (!isNumeric (strYear))
    {
    error ("Year not numeric");  
    return;
    }
  if (!isNumeric (strMonth))
    {
    error ("Month not numeric");  
    return;
    }
  if (!isNumeric (strDay))
    {
    error ("Day not numeric");  
    return;
    }
    
  year  = atoi (strYear);
  month = atoi (strMonth);
  day   = atoi (strDay);
  
  // is the time supplied?
  if (strHour != NULL)
    {
    if (strMinute == NULL)
      {
      error ("Invalid time");
      return;
      }
        
    if (!isNumeric (strHour))
      {
      error ("Hour not numeric");  
      return;
      }
    if (!isNumeric (strMinute))
      {
      error ("Minute not numeric");  
      return;
      }

    hour  = atoi (strHour);
    minute = atoi (strMinute);

    if (strSecond)
      {
      if (!isNumeric (strSecond))
        {
        error ("Second not numeric");  
        return;
        }
      second   = atoi (strSecond);      
      }
    else
      second = 0;

    if (setDate ())
      setTime ();
    }  // end of having the time too
  else
    setDate ();
    
  }  // end of process_data
  
void processIncomingByte (const byte inByte)
  {
  static char input_line [MAX_INPUT];
  static unsigned int input_pos = 0;

  switch (inByte)
    {

    case '\n':   // end of text
      input_line [input_pos] = 0;  // terminating null byte
      
      // terminator reached! process input_line here ...
      process_data (input_line);
      
      // reset buffer for next time
      input_pos = 0;  
      break;

    case '\r':   // discard carriage return
      break;

    default:
      // keep adding if not full ... allow for terminating null byte
      if (input_pos < (MAX_INPUT - 1))
        input_line [input_pos++] = inByte;
      break;

    }  // end of switch
   
  } // end of processIncomingByte  

void setup ()
  {
  Serial.begin (115200);
  Serial.println ();
  Serial.println ("Starting ...");
  }  // end of setup

void loop ()
  {
  while (Serial.available () > 0)
    processIncomingByte (Serial.read ());
  // do other stuff here    
  }  // end of loop

Hello all, I'm putting together a reference programming example and I believe it represents good practice for the use of String objects to process commands sent over serial to set the time of a Clock (although feedback is welcome).

Keep up with your project. As you can see, most of the respondents are all over the place with their issues and suggestions. If your project helps new persons, it has performed its mission. Note that when newbies have String questions, the typical forum responders bash the newbie, but never post compilable replacement code to support whatever it is they are talking about.

... but never post compilable replacement code to support whatever it is they are talking about.

Apart from the two above posts by me, I presume you mean.

... whatever it is they are talking about ...

We are talking about people who post that their fish-feeder worked for 18 hours and and then stopped working. That sort of trivial stuff.

zoomkat:
As you can see, most of the respondents are all over the place with their issues and suggestions.

I don't see how this could be a bad thing. Wouldn't it provide a richer field from which to select promising avenues? To get diverse answers to a question, is a large part of the purpose of a public forum.

aarg:
I don't see how this could be a bad thing. Wouldn't it provide a richer field from which to select promising avenues? To get diverse answers to a question, is a large part of the purpose of a public forum.

I just stated my observations from being in this forum for a while, and provided my positive thoughts on the project, unlike others have done.

I wouldn't make any tutorial for newbies that use the String class.

I am not keen on teaching the use of String classes for beginners,

cefn:
@Robin2 you say "not on an Arduino" which suggests you have some kind of issue in mind. What is the issue with the way we have used String objects which means it's not best practice for the use of String objects? That's exactly what we're hoping to understand so if you have a handle on it, it would be great to hear your point of view, but it's hard to figure out from your post.

String objects are just a higher level convenience that hides from the programmer the nuts and bolts of handling strings - which is what C/C++ uses down in the basement where the guys stoke the furnace.

The code that makes String objects work was not written with 2k of RAM in mind.

A major problem in the Arduino system is the incompatibility of libraries - partly because people don't take any trouble to make them compatibe but also, and maybe more so, because it is hard to avoid overlaping demands when there are few resources. Everyone wants the cherry to put on his iced bun.

If you encourage people to be extravagant with resources because there is no obvious restriction for a small demo program that extravagance will come back to bite them when they want to build a more complex project. What's more, because they have not learned the use of char arrays they will not know what the problem is or how to fix it.

This is all said much better by Joel Spolsky in the Law of leaky abstractions.

Another, and separate reason for teaching newbies about strings (char arrays) is that it provides a very useful insight into the way the Arduino actually stores and uses data.

...R

zoomkat:
I just stated my observations from being in this forum for a while, and provided my positive thoughts on the project, unlike others have done.

Here's the problem, and it is a big one.

If you teach String class, it will work in simple applications. Applications where you don't get memory fragmentation or use up available memory.

So you get enthusiastic. You start to love String, like zoomkat does. You think it will solve every problem. So you start building a bigger and bigger application, and now available memory starts to shrink (heck, you only ever had 2k to start with, and once you use Serial, and Wire, and SD classes, your memory is probably down to 500 bytes).

Now the String class comes to bite you. Your code crashes intermittently. You blame the Arduino. You blame the compiler. You blame ... global warming.

The fact is that you have been sucked into using something simple, easy ... and ultimately dangerous.

Thanks for all the considered answers which I've found inspiring, insightful and incredibly useful.

The contributions since my last post (while I was asleep - UK time) have been extraordinary and offered plenty more resources for me to draw upon and explore. They've reassured me (indirectly) that there's no actual bugs in our code. They've also convinced me that there are bugs in our approach!

Although I don't have enough time to do it today, I'm actually looking forward to refactoring this code to remove String using state machines or tokenization, and I feel able to implement this in a comprehensible way, as well as properly representing the issues to the students I'm working with.

I was partly reading my nextIntField() implementation this from a Java programmer's point of view (the learners are also learning Java) so the complexity seemed very low, but reflecting on the inherent complexity of what's going on underneath, as well as the sheer number of different string operations employed to achieve this end, it's not so different to write comprehensible wrappers around string tokenization/state machines and seems to be directly required if we're targeting 1.0.5

Here's the problem, and it is a big one.

Here is the problem, and it is a big one. I'd say 90% of people posting in the forum are working on a project and are not here to learn c programming in itself. I try to supply working code to newbies to make their project start happening instead of beating them down for their approach to developing code for their project. If the project is simple, use simple coding approaches to keep moving on with the project. If the simple coding methods don't work, try more cryptic c methods and see if that fixes the problem. From what I've observed most issues initially blamed on the use of Strings (or simply trigger the c-sting rants) are not simply fixed by using c-strings. People should use what they understand to get the project underway, otherwise they will have to keep returning and deal with some forum participants that seem to have the single mission of trolling the poster. Those that can post working example code do, those that can't post working example code revert to being the forum trolls. Sad

zoomkat:
Here is the problem, and it is a big one. I'd say 90% of people posting in the forum are working on a project and are not here to learn c programming in itself.

Those that can post working example code do, those that can't post working example code revert to being the forum trolls. Sad

I have some sympathy for the first point. However if people don't already know C programming isn't it just as easy to teach them char arrays as to teach them Strings? I suspect most of the trouble with Strings is due to people bringing PC programming knowledge to the Arduino and I see nothing wrong with telling them the environment is very different.

I think I have a reasonable record of posting working code.

...R

zoomkat:
People should use what they understand to get the project underway, otherwise they will have to keep returning and deal with some forum participants that seem to have the single mission of trolling the poster.

So you aren't unduly concerned that the project, once underway, and with String throughout it, then fails after 10 hours? And then there is another forum post? And then they have to do the work twice, by removing String?

Why on earth would we troll people if their code is likely to work perfectly?

Those that can post working example code do ...

Which is exactly what I do. I post links to:

In this particular thread I wrote two examples from scratch, to illustrate this particular point. So flying off with the "troll" handle is a little unfair.