Writing/Reading a looooooong String to/from Program Space

Hi folks --

Running this on an UNO.

I have a long string -- the longer I can make it, the better -- and right now it seems to be choking the dynamic memory if I use more than 750 or so characters (or a Global variable total over 52% of the whole).

Since I don’t need to do anything other than read the string when the program’s running, Program Space seemed like a perfect solution. I’m just slightly mystified by the syntax (I’m NOT a programmer, I’m an artist). So right now I have this:

String dnaString = "CGGAC...GGATTTA"; // the longer the string the better!

// some other stuff, then

if (dnaBase < dnaString.length()) {
    if (dnaString.charAt(dnaBase) == 'A') { // etc...

So I have this link (found elsewhere on the forum) avr-libc: Data in Program Space, but I can’t quite follow it, especially when it comes to getting the length AND the charAt of the string back out....

I guess I could just hard-code the length, but it would be better later if I don’t do that for obvious reasons...

Thanks in advance!

I know you can put variables in program memory. However, you cannot change them at run time on a microcontroller. I have put variables in program memory on an 8051-based board before. To get started, I would look at the link below. It should show the syntax, etc., of what you need for an Arduino board.

I am curious about what you are working on. I'll be checking for updates. More information on what you need the memory for would be helpful.

If you want sequences of characters, C strings are much simpler to store and recall from progmem than Strings.

String ---> GAME OVER!

String objects are created in RAM. Always!

Writing ---> GAME OVER!

PROGMEM flash memory is "read only". What's in PROGMEM is uploaded with your compiled program and is a constant that cannot be changed during runtime.

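The only type of strings you can have in PROGMEM are null-terminated C-strings.

PROGMEM data read with the ordinary pgm_read_byte()-style functions is limited to the first 64 KB of flash, because those functions use 16-bit addresses.
A single array can be at most 32 KB (32767 bytes).
So if your controller has enough flash memory, you could have two C-strings, each just under 32 KB long.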

Oops: fixed the link in my post, somehow I dropped the “L” at the end... :-[

Delta_G:
Certainly considering that for almost all of the standard c-string methods they have the _P versions that can handle working from progmem in ways that look just like working from SRAM.

Okay, sounds like a plan! Now... What’s a c-string when it’s at home? The only result that seemed relevant and/or applicable was this one: http://linux.dd.com.au/wiki/Arduino_Static_Strings
So... Still help? Like I said, artist, not programmer. Just need to give the Arduino a whole bunch of letters then read them back one at a time. THANKS ALL! :slight_smile:

What's a c-string when it's at home?

A NUL-terminated array of chars: the characters themselves, followed by a single zero byte ('\0') that marks the end.
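
A tiny sketch of what that looks like, assuming a made-up four-letter sequence (the names `bases`, `basesArraySize` and `basesLength` are only for illustration). The compiler adds the terminating zero byte for you, so the array ends up one byte bigger than the text:

```cpp
#include <string.h>

// Hypothetical example sequence, not taken from the original post.
// The compiler appends the terminating '\0' automatically.
static const char bases[] = "ACGT";

// Size of the array in bytes, terminator included.
static const size_t basesArraySize = sizeof(bases);

// Number of characters before the terminator.
static size_t basesLength() {
  return strlen(bases);
}
```

Here `basesArraySize` is 5 and `basesLength()` returns 4; the difference is the terminator. This version lives in RAM, so plain `strlen` works on it.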

So... Still help? Like I said, artist, not programmer. Just need to give the Arduino a whole bunch of letters then read them back one at a time. THANKS ALL!

Perhaps it might be best to explain just what your code is trying to accomplish. Maybe you don't need the long character string approach you are using.

I second zoomkat's suggestion above, but here is an example of defining long byte arrays in program memory (from the Talkie voice synthesis project). C-strings end with a zero byte, of course. Each definition is on one line, but it need not be.

    const uint8_t spZERO[] PROGMEM = {0x69,0xFB,0x59,0xDD,0x51,0xD5,0xD7,0xB5,0x6F,0x0A,0x78,0xC0,0x52,0x01,0x0F,0x50,0xAC,0xF6,0xA8,0x16,0x15,0xF2,0x7B,0xEA,0x19,0x47,0xD0,0x64,0xEB,0xAD,0x76,0xB5,0xEB,0xD1,0x96,0x24,0x6E,0x62,0x6D,0x5B,0x1F,0x0A,0xA7,0xB9,0xC5,0xAB,0xFD,0x1A,0x62,0xF0,0xF0,0xE2,0x6C,0x73,0x1C,0x73,0x52,0x1D,0x19,0x94,0x6F,0xCE,0x7D,0xED,0x6B,0xD9,0x82,0xDC,0x48,0xC7,0x2E,0x71,0x8B,0xBB,0xDF,0xFF,0x1F};
    const uint8_t spONE[] PROGMEM = {0x66,0x4E,0xA8,0x7A,0x8D,0xED,0xC4,0xB5,0xCD,0x89,0xD4,0xBC,0xA2,0xDB,0xD1,0x27,0xBE,0x33,0x4C,0xD9,0x4F,0x9B,0x4D,0x57,0x8A,0x76,0xBE,0xF5,0xA9,0xAA,0x2E,0x4F,0xD5,0xCD,0xB7,0xD9,0x43,0x5B,0x87,0x13,0x4C,0x0D,0xA7,0x75,0xAB,0x7B,0x3E,0xE3,0x19,0x6F,0x7F,0xA7,0xA7,0xF9,0xD0,0x30,0x5B,0x1D,0x9E,0x9A,0x34,0x44,0xBC,0xB6,0x7D,0xFE,0x1F};

If you are representing the nucleobases in a DNA sequence, you shouldn't be using characters or strings; you can encode four nucleobases in 1 byte:

#define NBASE_A 0x00
#define NBASE_C 0x01
#define NBASE_G 0x02
#define NBASE_T 0x03

This allows you to store a nucleobase using only 2 bits, and four nucleobases in 1 byte, using the bitwise shift and OR operators:

uint8_t quad = (NBASE_A << 6) | (NBASE_C << 4) | (NBASE_C << 2) | NBASE_T; // ACCT sequence
uint8_t anotherQuad = (NBASE_G << 6) | (NBASE_A << 4) | (NBASE_C << 2) | NBASE_A; // GACA sequence

This means you will be able to store a DNA sequence four times as long as the same DNA represented as a string.
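
Getting a single base back out is the reverse: pick the byte, shift, and mask. Here is a rough sketch of both directions, assuming the same 2-bit codes as above with the first base in the top two bits. The function names and the `example` array are hypothetical, and unknown letters are simply treated as 'A':

```cpp
#include <stdint.h>
#include <stddef.h>

// Map a base letter to its 2-bit code (A=0, C=1, G=2, T=3).
// Assumption: any other character is treated as 'A' in this sketch.
static uint8_t baseToCode(char base) {
  switch (base) {
    case 'C': return 0x01;
    case 'G': return 0x02;
    case 'T': return 0x03;
    default:  return 0x00;
  }
}

// Pack four base letters into one byte, first base in the top two bits.
static uint8_t packFour(char b0, char b1, char b2, char b3) {
  return (uint8_t)((baseToCode(b0) << 6) | (baseToCode(b1) << 4) |
                   (baseToCode(b2) << 2) | baseToCode(b3));
}

// Two packed bytes holding the sequences ACCT and GACA.
static const uint8_t example[] = {0x17, 0x84};

// Return the letter of base number `index` (0-based) from a packed array in RAM.
static char baseAt(const uint8_t *packed, size_t index) {
  static const char letters[] = "ACGT";
  uint8_t byteValue = packed[index / 4];            // which byte holds the base
  uint8_t shift = (uint8_t)(6 - 2 * (index % 4));   // position inside that byte
  return letters[(byteValue >> shift) & 0x03];      // keep only the two bits
}
```

With this, `baseAt(example, 3)` gives 'T' and `baseAt(example, 4)` gives 'G'. If the packed array is stored in PROGMEM, the read `packed[index / 4]` would need to become `pgm_read_byte(&packed[index / 4])`.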

If you define all 256 four-base combinations, it will be easy to define the sequence in PROGMEM:

#define NB_AAAA 0x00
#define NB_AAAC 0x01
#define NB_AAAG 0x02
#define NB_AAAT 0x03
#define NB_AACA 0x04
#define NB_AACC 0x05
#define NB_AACG 0x06
#define NB_AACT 0x07
#define NB_AAGA 0x08
#define NB_AAGC 0x09
#define NB_AAGG 0x0A
#define NB_AAGT 0x0B
#define NB_AATA 0x0C
#define NB_AATC 0x0D
#define NB_AATG 0x0E
#define NB_AATT 0x0F
#define NB_ACAA 0x10
#define NB_ACAC 0x11
#define NB_ACAG 0x12
#define NB_ACAT 0x13
#define NB_ACCA 0x14
// keeps going, but you got the point

const uint8_t DNA[] PROGMEM = {NB_CAGT, NB_TGCA};

It might be worth writing a little program to generate the defines and the sequence above, and storing them in a separate header file that you can include.

So you are trying to store a long string in PROGMEM and to use it in chunks because working memory is limited.

The issue, I suspect, is that C strings are sequences of characters terminated with a NUL, a '\0'. All of the various string functions (length, strcpy and so on) rely on this. If your array in progmem is not terminated with nuls, then you have to use the memcpy functions and you need to supply those functions with the length explicitly.

So store your array in PROGMEM, use memcpy_P (the program-memory version of memcpy) to copy a chunk of it into a char array whose size is one byte longer than the chunk you are working with, and explicitly set that last byte to '\0'.

Here is a complete example

#include <avr/pgmspace.h>

// I have put an exclamation point to make the size  
// an odd number, to confirm that the code works ok
const char dna[] PROGMEM  = {"DNA-DNA-DNA-DNA-DNA-DNA-DNA-DNA-DNA-DNA-DNA-DNA-DNA-DNA-DNA-DNA-DNA-DNA!"};

void setup() {
  Serial.begin(9600);
  Serial.println("PAUSING TO GIVE YOU TIME TO OPEN THE SERIAL CONSOLE :)");
  for (int i = 5; i >= 0; i--) {
    delay(1000);
    Serial.print(i);
    Serial.print(' ');
  }
  Serial.println();
  Serial.println("begin!");

  // read chunks from dna and work on them 10 bases at a time

  const int CHUNKSIZE = 10; // number of bases to process per chunk
  const int DNA_LENGTH = sizeof(dna) - 1; // sizeof also counts the terminating '\0'
  char workingSpace[CHUNKSIZE + 1];
  int baseIndex = 0;

  while (baseIndex < DNA_LENGTH) {
    int chunk;
    if (baseIndex + CHUNKSIZE > DNA_LENGTH) {
      chunk = DNA_LENGTH - baseIndex; // last chunk may be shorter
    }
    else {
      chunk = CHUNKSIZE;
    }
    memcpy_P(workingSpace, dna + baseIndex, chunk); // copy from flash into RAM
    workingSpace[chunk] = '\0';

    Serial.print("Got a chunk [");
    Serial.print(workingSpace);
    Serial.println("]");
    
    baseIndex += CHUNKSIZE;
  }
}

void loop() {}

amateur6:
So... Still help? Like I said, artist, not programmer. Just need to give the Arduino a whole bunch of letters then read them back one at a time. THANKS ALL! :slight_smile:

So here is a programming example for a long string in PROGMEM and how to read the chars back and print on Serial:

const char dnaString[] PROGMEM =
"ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCC"
"CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGC"
"CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGG"
"AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCC"
"CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAG"
;

void setup() {
  Serial.begin(9600);
  for (int dnaBase=0; dnaBase<strlen_P(dnaString); dnaBase++)
  {
    char dnaChar= pgm_read_byte(&dnaString[dnaBase]);
    Serial.println(dnaChar);
  }

}

void loop() {
}

The maximum length doing it that way is 32767 bytes per string (a C-string here, not a String object), but you will need an Arduino MEGA to fit one or two strings of that size in your program.
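
If you would rather keep something that looks like `length()` and `charAt()`, one option is a pair of small helpers. This is just a sketch: `dnaSample`, `DNA_SAMPLE_LENGTH` and `dnaCharAt` are made-up names, the sequence is a short stand-in, and the `#else` branch is only there so the code also compiles on a PC. The length comes from `sizeof` at compile time, so nothing is hard-coded and nothing has to be scanned at run time:

```cpp
#include <stdint.h>
#include <stddef.h>

#if defined(__AVR__)
#include <avr/pgmspace.h>
#else
// Fallback for non-AVR builds (assumption, for testing on a PC only).
#ifndef PROGMEM
#define PROGMEM
#endif
#ifndef pgm_read_byte
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif
#endif

// Short stand-in sequence; hypothetical, replace with the real one.
const char dnaSample[] PROGMEM = "CGGACGGATTTA";

// Length known at compile time; sizeof counts the terminating '\0', so subtract 1.
const size_t DNA_SAMPLE_LENGTH = sizeof(dnaSample) - 1;

// Rough stand-in for String::charAt(): returns '\0' when index is out of range.
char dnaCharAt(size_t index) {
  if (index >= DNA_SAMPLE_LENGTH) {
    return '\0';
  }
  return (char)pgm_read_byte(&dnaSample[index]);
}
```

The test from the original question would then read `if (dnaBase < DNA_SAMPLE_LENGTH) { if (dnaCharAt(dnaBase) == 'A') { ...`.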

Do you really have to define the string in advance? Wouldn't random characters be enough?

You could generate as many random characters as you want at run time, as long as you only need some random characters and not a specific sequence that you have defined beforehand.

If your string has to be defined in advance and must be much longer than 32 KB, you could use an SD card as mass storage. You can put your string into a file on the SD card. The maximum file size is 2 gigabytes, so on an SD card you could have a string 2147483647 characters long and access each character by index, but access would be slower.

:o

Thanks everyone for all the replies! Great suggestions and help all the way around; since I needed to finish the project back on the 9th, I just used the first 750 or so characters that I could fit in a normal String...

To answer a few questions: it was for an art project that involved using a flashing LED to display the nucleobases in Morse code (honestly, 750 bases was probably more than enough). Yes, I could have just generated random characters, but it was important to me (and, I think, to the integrity of the project) that the bases be truly mine -- so better to use fewer of them than to fake it.

Again, thanks very much and I may refer back to this if I return to the project -- and maybe it will help someone else further down the timestream...

:smiley: