Using strings: porting from FreeBasic

I am confused by the difference between char, array of chars, char pointers and strings and when they should be used. I am ignoring String for now.

Here is a piece of FreeBasic code that simply reads in several pairs of strings and puts them into two arrays. The first string of each pair is a search key; when it is matched, the second string is sent out a serial port.

// read in short description and corresponding SSC-32 command
Dim Move As String , CtrlrCmd As String;
Do While Not Eof(1)And ErrorFound =0 {
    Input #1, Move, CtrlrCmd;
    If UCase(Move) = "END" { Exit Do}  
    MoveIndex += 1;
    ReDim Preserve MoveDescription(MoveIndex)As String, MoveCommand(MoveIndex)As String;
    MoveDescription(MoveIndex) = UCase(Move);
    MoveCommand(MoveIndex) = CtrlrCmd+cr;       
}

How would this look in Arduino C? Anything like this?

// read in short description and corresponding SSC-32 command
char* MoveDescription[64][20];
char* MoveCommand[64][20];
char* Move[20];
char* CtrlrCmd[20];

Do While Not Eof(1)And ErrorFound =0 {
    Input #1, Move, CtrlrCmd;
    If UCase(Move) == "END"  break   
    MoveIndex += 1;
    MoveDescription[MoveIndex] = UCase(Move);
    MoveCommand(MoveIndex) = CtrlrCmd+cr;   // concatenation?   
}

What about the differences between char, array of chars, char pointers and strings? I think I understand char vs array of char, but the other formats and their usages are not clear.

There are major differences between Basic and C. As a starting point, I would read an online tutorial on C so you can see the syntax differences. Also, C has no intrinsic I/O functions like Basic. They are all part of the Standard C library. If you read a tutorial, you will see how important libraries are to C, where they almost don't exist in many dialects of Basic.

As to which tutorial to select, that's a matter of personal preference, plus I'm a little biased. Google "C tutorial" and read a little and, if you don't like what you read, try another until you find one that makes sense to you. After you complete the tutorial, chances are you won't even need to come back here with your question.

I rushed my post. I didn't mean to use the Basic intrinsic input function in my C example. I've written programs in both Basic and C, but none of them has yet required string manipulation.

So ignore the input function. What else do I need to do to accomplish my objective? I can provide a simpler example if that will make communication easier.

The Arduino IDE and its underlying compiler recognize strings in the C sense, that is, null-terminated character arrays. Because the underlying compiler also supports C++, you can use the String class as well. (Note the uppercase 'S'.) I think most people prefer the character array approach for string data on the Arduino.

There are numerous string processing functions available to you from the C library. Most of these begin with "str" (e.g., strcmp(), strcat(), strtok(), etc.). There are also character-handling routines declared in <ctype.h>, such as toupper(), tolower(), isalpha(), etc.; they look like functions, although some C libraries implement them as macros. If you take a little time and do some Google searching, you'll likely find what you need. A redim is another animal: a C array cannot be resized once it is defined, and there is no underlying operating system to do garbage collection for you, so you either size the array for the worst case up front or manage the memory yourself.

I am ignoring String for now.

Forever, you mean.

I am confused by the difference between char, array of chars, char pointers and strings and when they should be used.

A char holds one character. A char array can hold more than one character, if the size is greater than one. A string is a char array that is null terminated, meaning the last character used is '\0' (a byte with the value 0). A pointer is another matter altogether.

Imagine a post office, with a row of mail boxes. You can put a letter in a box (that is a char). You can put a bunch of letters in a bunch of successive boxes (that's an array). You can put a bunch of letters in successive boxes, with a special letter (in a red envelope, let's say) in the last box that indicates that that is the end of the bunch of letters (that's a string).

A pointer is like a 3x5 card with a post office address and box number on it. You go to that address, and find that box number to get the first letter. You open the adjacent boxes to get the rest of the letters, until you get to the one with the red envelope. Then, you have the string pointed to by the pointer.

Suppose you have the following code in a program:

int val;
int *ptr;

Now, take a clean sheet of paper and on the right side near the top write val. Draw a 45 degree line from the base of val down and to the right. Label the line rvalue. (Historically the term means a value that can appear on the right side of an assignment; for this diagram, read it as the value stored in the variable.) Now draw another 45 degree line from the base of val down and to the left and label it lvalue. (Historically, something that can appear on the left side of an assignment; here, read it as "location value".) Let's pretend the compiler placed val at memory location 1000, so write that number at the end of your lvalue line. My Bucket Analogy likens what you've drawn to a bucket: the lvalue tells you where the bucket is located in memory, the rvalue tells you what is inside the bucket, and the int type specifier tells you that the bucket is big enough to hold 2 bytes of data on an 8-bit AVR Arduino. (An int on other platforms might be a 4 byte bucket.)

Now on the left side of the sheet, draw the same lvalue/rvalue diagram for ptr. We'll pretend that the compiler placed this variable at memory location 1050, so write that number at the base of its lvalue. Now consider this code:

val = 10;

To perform this assignment, the code needs to know what value to assign and where to put it in memory. Assignments change the rvalue of a variable, so the code "goes" to memory address 1000, takes 2 bytes of data that hold a binary 10, and pours them into val's bucket. The rvalue of val is now 10.

Now consider the statement:

ptr = &val;

This is also an assignment, but fashioned for a pointer. The address-of operator (the ampersand, &, in front of val) says that, instead of doing an rvalue-to-rvalue assignment, the code should get the address of val (i.e., 1000) and assign it to ptr. This means that ptr now has an rvalue that is the memory address of where the val bucket resides in memory. This leads to an important fact about pointers: A valid pointer should only store one of two things: 1) a null value, which means the pointer points to nothing useful, or 2) a valid memory address where some data are stored.

Now consider the statement:

*ptr = 20;

The indirection operator (*) says: "Go to the memory address held in ptr's rvalue (i.e., 1000), and place 2 bytes of data there which hold the value 20." The process, called indirection, allows you to change the rvalue of one variable (val) using a pointer (ptr). Note that the type specifier for ptr (an int) is critical, because it tells the pointer how big the bucket is. If by mistake you define ptr as:

long *ptr;

the assignment using indirection would likely fail because you are trying to pour 4 bytes of data into a 2 byte bucket. Also, something like:

ptr++;

always changes the rvalue of the pointer by the size of its type specifier: a 2 byte displacement for an int, but a 4 byte change for a long or a float.

I hope this helps.

A lot depends on what you want to do with the array of characters at a later stage. There are many C/C++ functions that are designed to work on a "string" - an array of characters with a byte of 0 as the end marker. However, if you only intend to process the data as an array of chars, the end marker may be unnecessary.

One thing to bear in mind is that C/C++ doesn't automatically extend character arrays to make more space for a longer "string". This is especially important on the Arduino where SRAM memory space is very limited.

I suggest creating a char array long enough for your longest "string" + 1 extra char for the null terminator. You can easily store shorter strings - that just depends on where the terminator is.

...R

This is helpful. What I am trying to do is create an array of array of char, to hold a list of text strings. For example…

"Dogs drool" "Cats rule" "Hamsters sleep" "Lizards leap"

Then I want to be able to search through this list for a match. For the first part, I would define the array of char arrays as… char* myList[4][15]?

dj

You’ve got it right, but I added some alternative ways to do things:

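void setup()
{
  // Each row is 15 bytes: room for the longest string (14 chars) plus the '\0'
  char myList[][15] = {
    "Dogs drool",
    "Cats rule",
    "Hamsters sleep",
    "Lizards leap"
  };
  Serial.begin(9600);
  // sizeof(myList) / sizeof(myList[0]) is the number of rows (4 here)
  for (size_t i = 0; i < sizeof(myList) / sizeof(myList[0]); i++)
    Serial.println(myList[i]);
}  //End setup

void loop()
{
  // Nothing to do here; a sketch still needs loop() to build
}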

You can leave the first array dimension empty and the compiler will fill its value in from the initializer list. That way, if you add more strings to the list, the array size adjusts automatically. Note that the for loop calculates the number of elements in the array, so you don’t need to change the loop if you change the list.

Thanks. I find this example very helpful. dj

I have this working in an Arduino simulator.

Since all these strings won’t be changed, copied, compared, etc… I defined them as constants. This also saves me SRAM memory which I need for other data structures. I had plenty of flash memory available in my program, since it uses only about 25% of code space. (Or does it; where is the text stored?)

My understanding is that this definition creates a scalar array of pointers to the actual text, hence my array only has one dimension. It also allows me to refer to the text messages with a single index.

One more non-critical question. Why does the simulator keep complaining about the “};” line after my constant array definition?

const char* MoveCommand[]={
"#0P1500#1P1500#2P1000#3P570#4P1500#5P2400#6P510#8P1500#9P1500#10P1500#11P1400T1000",
"#2P1100",
"#0P2000 #1P1100 T100",
"#2P750",
"#9P860S3000 #11P2500S1500",
"#9P900S2500 #11P1200S1500 #8P900S3000 #10P900S2500",
"#9P1500S2000 #11P1500S1000 T500",
"#9P2200S3000 #11P870S1500",
"#10P1100S1667 #8P2250S1667 T500",
"#10P1500S1667 #8P1500S1667 T500",
"#10P2200S1667 #8P750S1667 T500",
"#8P1500S1667 #9P1500S2000 #10P1500S1667 #11P1500S1000 T500",
"#8P1500S1667 #9P1500S2000 #10P1500S1667 #11P1500S1000 T1500",
"#5P2500 #6P650 T100",
"#4P2200",
"#4P1500",
"#4P750",
"#4P2200T1000",
"#9P900S2000  #11P1200S1000 #8P1000S1667 #10P900S1667  T300",
"#3P570",
"#3P600",
"#3P680",
"#3P790",
"#3P1005"
};
 
 const char cr=13;


void setup()
{ 
  Serial.begin(9600);
}

void loop()
{ 
  for (int i=0;i<63;++i) { 
  Serial.print("Value[");
  Serial.print(i);
  Serial.print("]=");
  Serial.print(MoveCommand[i]);
  Serial.println(cr);
  }
while(1){};
}

Can you have an array of variable length arrays?

...R

It compiles and simulates fine.

My understanding is that it is an array of pointers to character arrays, not an array of arrays. Subtle difference, but it fits my requirements.

Robin2:
Can you have an array of variable length arrays?

…R

Not really… and it surely can be wasteful on resources.
Using Quincy 2005 (free download) on 32-bit Windows, the following code compiles:

#include <iostream>	// std::cout
#include <cstddef>	// size_t

using namespace std ;

// allocating WORST CASE length ... wasteful for short strings
char myList[][15] = {
    "Dogs drool",
    "Cats rule",
    "Hamsters sleep",
    "Lizards leap"
  };

int main()
{
	cout << "Beginning test run... \n\n\n" ;
	// total bytes reserved: 4 rows of 15 bytes each
	int size = sizeof(myList) ;
	cout << "Size of []:" << size << "\r\n" ;

	// one string per line, with its index
	for (size_t i = 0; i < sizeof(myList) / sizeof(myList[0]); i++)
	{
    	cout << i << "= " << myList[i] << "\r\n";
	}

	// all strings back to back, to count the characters actually used
	for (size_t i = 0; i < sizeof(myList) / sizeof(myList[0]); i++)
	{
    	cout << myList[i] ;
	}

   return 0;
}

and produces this console output:

Beginning test run…

Size of []:60
0= Dogs drool
1= Cats rule
2= Hamsters sleep
3= Lizards leap
Dogs droolCats ruleHamsters sleepLizards leap
Press Enter to return to Quincy…

So the 45 characters + 4 nulls (\0) = 49 bytes of text actually took 60 bytes of storage because each line was blocked at 15 characters!

Ray

djsfantasi:
It compiles and simulates fine.

My understanding is that it is an array of pointers to character arrays, not an array of arrays. Subtle difference, but it fits my requirements.

In your example, the program compiles but blows up in the loop at index = [24].

Beginning test run…

Size of []:96
Value[0]=#0P1500#1P1500#2P1000#3P570#4P1500#5P2400#6P510#8P1500#9P1500#10P1500#1
1P1400T1000
Value[1]=#2P1100
Value[2]=#0P2000 #1P1100 T100
Value[3]=#2P750
Value[4]=#9P860S3000 #11P2500S1500
Value[5]=#9P900S2500 #11P1200S1500 #8P900S3000 #10P900S2500
Value[6]=#9P1500S2000 #11P1500S1000 T500
Value[7]=#9P2200S3000 #11P870S1500
Value[8]=#10P1100S1667 #8P2250S1667 T500
Value[9]=#10P1500S1667 #8P1500S1667 T500
Value[10]=#10P2200S1667 #8P750S1667 T500
Value[11]=#8P1500S1667 #9P1500S2000 #10P1500S1667 #11P1500S1000 T500
Value[12]=#8P1500S1667 #9P1500S2000 #10P1500S1667 #11P1500S1000 T1500
Value[13]=#5P2500 #6P650 T100
Value[14]=#4P2200
Value[15]=#4P1500
Value[16]=#4P750
Value[17]=#4P2200T1000
Value[18]=#9P900S2000 #11P1200S1000 #8P1000S1667 #10P900S1667 T300
Value[19]=#3P570
Value[20]=#3P600
Value[21]=#3P680
Value[22]=#3P790
Value[23]=#3P1005
Value[24]=
Press Enter to return to Quincy…

edited:

#include <stdio.h> 
#include <conio.h> 
#include <iostream>	// std::cout

using namespace std ;

// array of pointers: each string literal takes only the space it needs
const char* MoveCommand[]={
"#0P1500#1P1500#2P1000#3P570#4P1500#5P2400#6P510#8P1500#9P1500#10P1500#11P1400T1000",
"#2P1100",
"#0P2000 #1P1100 T100",
"#2P750",
"#9P860S3000 #11P2500S1500",
"#9P900S2500 #11P1200S1500 #8P900S3000 #10P900S2500",
"#9P1500S2000 #11P1500S1000 T500",
"#9P2200S3000 #11P870S1500",
"#10P1100S1667 #8P2250S1667 T500",
"#10P1500S1667 #8P1500S1667 T500",
"#10P2200S1667 #8P750S1667 T500",
"#8P1500S1667 #9P1500S2000 #10P1500S1667 #11P1500S1000 T500",
"#8P1500S1667 #9P1500S2000 #10P1500S1667 #11P1500S1000 T1500",
"#5P2500 #6P650 T100",
"#4P2200",
"#4P1500",
"#4P750",
"#4P2200T1000",
"#9P900S2000  #11P1200S1000 #8P1000S1667 #10P900S1667  T300",
"#3P570",
"#3P600",
"#3P680",
"#3P790",
"#3P1005"
};
 
 const char cr=13;

int main()
{
	cout << "Beginning test run... \n\n\n" ;
	int size = sizeof(MoveCommand) ;
	cout << "Size of []:" << size << "\r\n" ;

  	for (int i=0;i<63;++i) 
  	{ 
  		cout << "Value[" ;
  		cout << i ;
  		cout << "]=" ;
  		cout << MoveCommand[i] ;
  		cout << "\r\n" ;
  	}

   return 0;
}

Typo in the program. The test in the for statement should be “i<24”, not “i<63”. It blows up because there are only 24 strings defined (indices 0 through 23). The full structure in my full program has 63 elements; the test program only has 24.

djsfantasi:
Typo in the program. The test in the for statement should be “i<24”, not “i<63”. It blows up because there are only 24 strings defined (indices 0 through 23). The full structure in my full program has 63 elements; the test program only has 24.

Yes, I was aware of the issue. Typos happen.

Ray

djsfantasi: I defined them as constants. This also saves me SRAM memory which I need for other data structures.

I don't think so. You'll need to define them using the PROGMEM keyword to actually save any SRAM (see http://arduino.cc/en/Reference/PROGMEM).

I could be wrong, but isn't that what PROGMEM is for?

Regards,

Brad KF7FER