Issues creating a library with char arrays

Hello,

I'm pretty new to writing libraries for Arduino. I'm writing one because my code is getting messy and too long. I was using Strings in my code before, but after reading a lot of advice not to use them because of memory issues, heap fragmentation, etc., I stopped (my ESP8266 also kept overflowing over and over...).

So, long story short, I am writing a class in the library that needs to do a lot of data parsing, concatenating, finding certain keywords, etc. However, I have not been able to get far because of the many issues I ran into using char arrays in a library. To start off, I can't declare one in my header file, since that gives an error. Not too much of a problem, since I can move it to a public set/get function. However, when I try to modify the data into something useful I keep running into overflows, errors and other issues. My question is: how should I do this properly and safely (memory-wise)?

Here is an example of the class .h file (I edited the class a little to rename sensitive variables and remove some unused/irrelevant stuff):

#ifndef Test_h
#define Test_h

#ifndef stdlib_h
#include <stdlib.h>
#endif


class Test
{
	public:
		Test(int ID);
		char *request_msg(void);
		void process_data(char *payload);
		
		int ID = 0;
		int a = 1500;
		int b = 1500;
		
};
#endif

Here is the .cpp file:

#include "Test.h"

#ifndef stdio_h
#include <stdio.h>
#endif

#ifndef string_h
#include <string.h>
#endif

Test::Test(int ID)
{
  // initialize this instance's variables
  this->ID = ID;
}

char* Test::request_msg(void)
{
  char buffer[25];
  snprintf(buffer, sizeof(buffer), "init_r=%d", ID);
  return buffer;
}

void Test::process_data(char *payload)
{
  const char delimiter[] = "=";
  char parsedStrings[6][20];
  char *token = strtok(payload, delimiter);                     // Points to the first token (text before the first delimiter)
  strncpy(parsedStrings[0], token, sizeof(parsedStrings[0]));   // Copy the first one
  for (int i = 1; i < 7; i++)
  {
    token = strtok(NULL, delimiter);
    strncpy(parsedStrings[i], token, sizeof(parsedStrings[i]));
  }

  this->a = atoi(parsedStrings[2]);
  this->b = atoi(parsedStrings[3]);
}

Now I'm aware that, for example, request_msg() should use a buffer defined in the header, since it now returns rubbish because the buffer only exists inside the function. By the time I use the result in my code, other data could already have been written to that memory. As for the process_data() function, I'm completely clueless how to get it working. I want to parse incoming data like: identifier=text=20=5=5.0 and then it should parse it into char arrays of:

  • text
  • 20
  • 5
  • 5.0

which I later convert to float/int etc. This all works just fine in a .ino code I wrote, but here I'm clueless..

Any help or tips on how to write this would be appreciated!

when you return buffer from request_msg(), you allocate the buffer on the stack and return a pointer to the caller... but that memory is just transient and the stack gets reused for the next operations... so the pointer you gave to the caller is useless
you need to have memory statically allocated for your instance (one option) or let the caller provide the buffer (and a size) and you fill that in for the caller.
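
A minimal sketch of both options, applied to the Test class from the question. The member name msgBuf, its size of 25 and the two-argument overload are assumptions made for illustration; they are not code from the thread.

```cpp
#include <stdio.h>
#include <stddef.h>

class Test {
  public:
    explicit Test(int ID) : ID(ID) {}

    // Option 1: the buffer lives inside the instance, so the returned
    // pointer stays valid for as long as the object exists.
    const char *request_msg() {
      snprintf(msgBuf, sizeof(msgBuf), "init_r=%d", ID);
      return msgBuf;
    }

    // Option 2: the caller owns the buffer and passes its size.
    // Returns the number of characters the full message needs.
    int request_msg(char *buf, size_t size) const {
      return snprintf(buf, size, "init_r=%d", ID);
    }

    int ID = 0;

  private:
    char msgBuf[25];  // hypothetical name and size
};
```

With option 1 every call overwrites the previous message, so the caller has to use or copy the result before calling again. Option 2 avoids that, at the cost of the caller declaring a buffer.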

Now I’m aware that for example request_msg() should use a buffer defined in the header, since it will now return rubbish since it’s only defined in there.

Yes, you are right. I already tried using a buffer defined in my header file but couldn't get it working. Do you have an example? Do you maybe also know how I can get process_data working?

you could define the buffer inside the function as static but then the function would not be re-entrant. similarly, you could define it as private to the class.

why not have the caller pass in a buffer along with its size?

Could you give an example? I'm not sure whether I understand you :slight_smile:

Edit: This seems to work when I add the buffer to my class. It doesn't work when I pass sizeof(buffer) to snprintf, but it does seem to work with a constant length of 25. I'm not sure whether this is how I should do it:

class Test
{
	private:
        char* buffer = (char*) malloc(100);
};

char* Test::request_msg(void)
{
  const int len = 25;
  snprintf(this->buffer, len, "init_r=%d", ID);
  return this->buffer;
}
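
A likely reason sizeof(buffer) failed in the version above: buffer is declared as a char*, so sizeof yields the size of the pointer (4 bytes on a 32-bit chip such as the ESP8266), not the 100 bytes behind it. The sketch below shows the difference; the struct names and the fixed size of 25 are made up for illustration.

```cpp
#include <stddef.h>

struct WithPointer {
  char *buffer;      // sizeof(buffer) is the size of a pointer
};

struct WithArray {
  char buffer[25];   // sizeof(buffer) is 25, known at compile time
};

// Helpers that report what sizeof sees for each kind of member.
size_t pointerMemberSize() {
  WithPointer p{nullptr};
  return sizeof(p.buffer);
}

size_t arrayMemberSize() {
  WithArray a{};
  return sizeof(a.buffer);
}
```

Declaring the member as a plain char array makes sizeof usable in snprintf and removes the malloc call altogether.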

The other function process_data() still leaves me clueless, it's instantly crashing the ESP8266.

Little tip..

When starting a base class library.. include Arduino.h at the top. That'll give you most all of the goodies you are used to having for programming these things.

-jim lee

The #ifndef stdlib_h ... #endif wrapper around the include doesn't do anything, just use #include <stdlib.h>.

The malloc'ed member buffer is much worse than using Strings. It dynamically allocates, just like the String class, and on top of that it causes memory leaks.
You should avoid manual calls to memory allocators at all costs (and definitely don't use malloc in C++). Use containers that manage your memory for you, so you never have to worry about memory leaks. You can use a String or an std::string in this case, just reserve the memory beforehand.
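
A minimal sketch of the "reserve beforehand" idea with std::string. The function name buildRequest and the capacity of 24 are assumptions, and it presumes your toolchain provides std::to_string.

```cpp
#include <string>

// Builds "init_r=<id>" in a std::string, which releases its own memory
// when it goes out of scope.
std::string buildRequest(int id) {
  std::string msg;
  msg.reserve(24);             // reserve capacity so the appends below do not reallocate
  msg += "init_r=";
  msg += std::to_string(id);
  return msg;
}
```

Call msg.c_str() wherever an API expects a const char*.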

Why include Arduino.h? He included all he needs, why pull in all the unnecessary Arduino code?

that's just not the way to do it

  • why malloc a buffer when it could simply be defined as a character string
  • this isn't the way malloc() is intended to be used. it is intended for allocating unknown at compile time size buffers or an unknown number of buffers.
  • why define the size of the buffer in the function instead of defining it as a class member, if you were to do that
  • of course you're using snprintf() because you think this is more professional. but there's no need if the function has control of the data and knows the size of the buffer

the conventional, professional way to do something like this is for the caller function to pass a pointer to a buffer and its size. using this approach, the function should use snprintf to populate the buffer limited to the size of the buffer.

if you really want to use malloc(), that function could malloc() an appropriate size buffer and return it to the caller. but now the caller is responsible for freeing the buffer. (is there a better approach? see above)

Thanks for pointing out, I wasn't aware.

That sounds good, but I'm not sure how to implement this, do you have an example? I would like this method and prevent unnecessary data allocation or overflows.

So.. you're bringing in a string then cutting it up into a bunch of tokens? Text, 20 5 5.0 and you then want them to be a list of strings and values? Just trying to get straight what your overall plan is. (In human)

-jim lee

the code below produces the following

main: my secret mysterious message
main: my secret mys

#include <stdio.h>
#include <string.h>

int
getMsg (
    char *buf,
    int   size )
{
    snprintf (buf, size-1, "my secret mysterious message");
    return strlen (buf);
}

int
main ()
{
    char s [100];

    getMsg(s, sizeof(s));
    printf ("%s: %s\n", __func__, s);

    getMsg(s, 15);
    printf ("%s: %s\n", __func__, s);
}

If you want to prevent overflow, you let the compiler figure out the size for you:

#include <stdio.h>
#include <string.h>
#include <stddef.h>

size_t getMsg(char *buf, size_t size) {
  return snprintf(buf, size, "my secret mysterious message");
}
template <size_t N>
size_t getMsg(char (&buf)[N]) {
    return getMsg(buf, N);
}

int main() {
  char s[10];
  getMsg(s);
  puts(s);
}

This prints:

my secret

As a side note, snprintf expects the buffer size, not the buffer size minus 1. From the cppreference.com page on std::printf, std::fprintf, std::sprintf, std::snprintf:

At most buf_size - 1 characters are written. The resulting character string will be terminated with a null character, unless buf_size is zero. If buf_size is zero, nothing is written and buffer may be a null pointer, however the return value (number of bytes that would be written not including the null terminator) is still calculated and returned.
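
Because the return value is the length the full text would have needed, it can be used to detect truncation. A small sketch, assuming a hypothetical helper formatRequest that builds the init_r message from the original question:

```cpp
#include <stdio.h>
#include <stddef.h>

// Returns true only if the whole message fit into buf.
bool formatRequest(char *buf, size_t size, int id) {
  int needed = snprintf(buf, size, "init_r=%d", id);
  // needed excludes the terminating '\0'; a negative value signals an error.
  return needed >= 0 && static_cast<size_t>(needed) < size;
}
```

When the function returns false for a non-zero size, buf still holds a null-terminated but truncated string, so the caller can decide whether to retry with a larger buffer or drop the message.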

I'm receiving data in the following format: answer=sometext=50=100=30=5.5=3.5, where all these values need to be split at the delimiter = in the function process_data(). The next step is to convert them to either floats or ints (or leave them as text). The positions of these values are fixed, so the value at position 3 will always be an int, for example. This function already works in my .ino sketch but not in the library, where it causes an instant overflow on my ESP8266, no clue why.

Thanks! A question to begin with, the function getMsg() could also be a void and not return anything and still work right? Since returning the strlen of buf is not necessary, and you send in a pointer so the function is directly writing in the buffer we're passing.

This seems to work just fine :slight_smile:, and then in other functions I can just use the same buffer in the caller I use in this function so it can overwrite the buffer?

Thanks, this looks pretty new to me. size_t automatically gets the buffer size? Does this happen because you define the function getMsg() as size_t?

Now the function request_msg() is resolved, process_data() to go :).

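No, size_t is just a data type like int or float. It is the unsigned integer type used to store sizes of arrays etc. It has no special semantics; which built-in type it corresponds to depends on the platform (for example unsigned long on 64-bit Linux, unsigned int on many 32-bit microcontrollers).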
The buffer size is deduced because of the use of the function template.

char (&buf)[10] declares buf as a reference to an array of 10 characters. Passing the array by reference keeps its length as part of the parameter type; a plain char* parameter would still let the function write to the buffer, but the length would be lost.
Because of the [10], this function would only work for buffers of length 10, if you want it to work for any size, you introduce a template parameter N of type size_t. The compiler automatically determines the value of N for you whenever you call the function.

puts is just “put string”, it prints the given string followed by a newline.
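
The deduction of N described above can be seen in isolation with a tiny sketch. The names capacityOf, demoSmall and demoLarge are made up for illustration.

```cpp
#include <stddef.h>

// N is deduced from the array type of the argument at each call site.
template <size_t N>
size_t capacityOf(char (&)[N]) {
  return N;
}

size_t demoSmall() {
  char a[10];
  return capacityOf(a);   // instantiates capacityOf<10>
}

size_t demoLarge() {
  char b[100];
  return capacityOf(b);   // instantiates capacityOf<100>
}
```

Passing a char* instead of an array does not compile, because a pointer carries no length for the compiler to deduce.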

Ahh wow, now I understand your previous post a lot better. It's first going through the template function to determine the buffer size, and then through the actual function with the snprintf, interesting.
How does the template <size_t N> work? I'm getting compile errors that N is not defined. How should I define that in my header file? I added this to my header file now:

size_t request_msg(char* buffer, size_t size);
size_t request_msg(char (&buf)[N]);

You forgot to add the template <size_t N>. You need all of this in your header:

size_t request_msg(char* buffer, size_t size);
template <size_t N>
size_t request_msg(char (&buf)[N]) {
    return request_msg(buf, N);
}

The actual implementation of request_msg(char*, size_t) can be in your implementation file (.cpp), the template has to be in your header.

The template <...> lets the compiler know that what follows is a function template, not an ordinary function. The names declared between the angle brackets <...> are the template parameters. These can be either types or values, and the compiler will deduce the matching arguments for you whenever you call the function template.

Works like a charm, thanks! I don't understand it 100% but I'll google some more info on how templates, [X] and size_t work :slight_smile: .

Now do you have an idea why my second function process_data doesn't work? I call it with this test sample data I put in:

uint8_t payload[100] = "init_answer=Yes=25=75=50=2.5=6.5";  
char* _payload = (char*) &payload[0];

I use this as test data because the WebSocket gives me a uint8_t payload, which I convert to a char* like that in my .ino code. I copy-pasted the process_data function from that sketch, where it parses all the data perfectly. In my library, however, it doesn't work; the crash seems to happen in the for loop.

There are quite some issues with process_data. See my comments:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {
  char payload[100] = "init_answer=Yes=25=75=50=2.5=6.5";  
  const char delimiter[] = "=";
  char parsedStrings[7][20] {}; // Note initialization to '\0' and size
  char *token = strtok(payload, delimiter);
  for (auto &parsedString : parsedStrings) { // Range-based for to prevent index out of bounds error
    if (token) // You cannot pass nullptr to strncpy
        strncpy(parsedString, token, sizeof(parsedString) - 1); // Size of buffer - 1, not just size of buffer
    // If you don't initialize parsedStrings to '\0', you have to null-terminate here, strncpy doesn't do that for you!
    token = strtok(nullptr, delimiter);
  }

  int a = atoi(parsedStrings[2]); // Shouldn't you check if there's a valid number in the string?
  int b = atoi(parsedStrings[3]);

  // Print the results
  for (const auto&parsedString : parsedStrings)
    puts(parsedString);
  printf("a=%d, b=%d\n", a, b);
}
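
On the question in the comment about checking for a valid number: atoi returns 0 for text such as "Yes", which cannot be told apart from a real 0. A possible sketch using strtol instead; the helper name parseInt is an assumption, not part of the code above.

```cpp
#include <stdlib.h>
#include <errno.h>
#include <limits.h>

// Parses text as a base-10 int. Returns false for empty input,
// trailing garbage, or values that do not fit in an int.
bool parseInt(const char *text, int &out) {
  if (!text || *text == '\0') return false;
  char *end = nullptr;
  errno = 0;
  long value = strtol(text, &end, 10);
  if (end == text || *end != '\0') return false;   // no digits, or leftovers
  if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return false;
  out = static_cast<int>(value);
  return true;
}
```

out is only written when parsing succeeds, so a caller can keep its previous value (for example the default of 1500 for a and b) when a field is malformed.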

I highly recommend checking out the sanitizers in your compiler (not on ESP8266 of course, but when you run it on your computer):
Running your original code with the sanitizers turned on: Compiler Explorer
My version: https://godbolt.org/z/6xzsoTdhs

==1==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe65731d18 at pc 0x00000044f1f4 bp 0x7ffe65731bd0 sp 0x7ffe65731380
WRITE of size 20 at 0x7ffe65731d18 thread T0
    #0 0x44f1f3  (/app/output.s+0x44f1f3)
    #1 0x4fb481  (/app/output.s+0x4fb481)
    #2 0x7f90f759b0b2  (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #3 0x41f2fd  (/app/output.s+0x41f2fd)

Address 0x7ffe65731d18 is located in stack of thread T0 at offset 312 in frame
    #0 0x4fb07f  (/app/output.s+0x4fb07f)

  This frame has 3 object(s):
    [32, 132) 'payload' (line 6)
    [176, 178) 'delimiter' (line 7)
    [192, 312) 'parsedStrings' (line 8) <== Memory access at offset 312 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/app/output.s+0x44f1f3) 
Shadow bytes around the buggy address:
[...]

Ok.. From what I read here, this is the functionality you are looking for? Always these 7 bits? But different values?

like this? (ino version)

#define  BUFF_BYTES   100
#define  DELIM       "="

struct theBits {
   char* str1;
   char* str2;
   int   int1;
   int   int2;
   int   int3;
   float float1;
   float float2;
};

char     inBuff[BUFF_BYTES];
theBits  parsedBits;


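void parseStr(const char* payload) {

   char *token;
   
   if (payload) {
      // Bounded copy so a long payload cannot overflow inBuff.
      strncpy(inBuff, payload, BUFF_BYTES - 1);
      inBuff[BUFF_BYTES - 1] = '\0';
      parsedBits.str1 = strtok(inBuff, DELIM);
      parsedBits.str2 = strtok(NULL, DELIM);
      // strtok returns NULL once the fields run out; atoi/atof must not be given NULL.
      token = strtok(NULL, DELIM);
      parsedBits.int1 = token ? atoi(token) : 0;
      token = strtok(NULL, DELIM);
      parsedBits.int2 = token ? atoi(token) : 0;
      token = strtok(NULL, DELIM);
      parsedBits.int3 = token ? atoi(token) : 0;
      token = strtok(NULL, DELIM);
      parsedBits.float1 = token ? atof(token) : 0.0f;
      token = strtok(NULL, DELIM);
      parsedBits.float2 = token ? atof(token) : 0.0f;
   }
}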


void setup(void) {

   Serial.begin(115200);   // Baud rate is an assumption; match your serial monitor
   parseStr("answer=sometext=50=100=30=5.5=3.5");
   Serial.println("Output");
   Serial.print("str1 : ");
   Serial.println(parsedBits.str1);
   Serial.print("str2 : ");
   Serial.println(parsedBits.str2);
   Serial.print("int1 : ");
   Serial.println(parsedBits.int1);
   Serial.print("int2 : ");
   Serial.println(parsedBits.int2);
   Serial.print("int3 : ");
   Serial.println(parsedBits.int3);
   Serial.print("float1 : ");
   Serial.println(parsedBits.float1);
   Serial.print("float2 : ");
   Serial.println(parsedBits.float2);
}

void loop() { }

-jim lee

If you are doing a lot of text parsing and moving away from Strings then check out my SafeString library which provides similar Arduino String text parsing methods, and more, but works with static char[] underneath.

One of the advantages of SafeStrings is that you don't have to pass the buffer size around to all your methods. SafeStrings also have better text to int conversion routines than atoi().
SafeString also gives you precise error msgs if you get the char[] sizes wrong.

BTW if you have a sketch for the ESP8266 using Arduino String that was causing you problems can you msg me with it.
My Taming Arduino Strings covers how to avoid String memory problems and I am interested in people's real life examples to check the solutions work in those cases.