Reading a 10-byte buffer out of a PROGMEM string stored in a string table

Hi,

I’m storing a large snippet of text in three separate PROGMEM char arrays:

const char script1[] PROGMEM = "....";
const char script2[] PROGMEM = "....";
const char script3[] PROGMEM = "....";

The first two are 32766 characters, and the last one is 23867 characters (89400 characters in total). I store these in PROGMEM on an Arduino Mega 2560 (which has 256K of program memory available).

I stored these three strings in a table, together with their lengths:

const int TABLE_SIZE = 3;
const char * const scriptTable[] PROGMEM = 
{   
  script1,
  script2,
  script3
};

const int scriptLengths[] {32767, 32767, 23867};

I’m having trouble splitting the text into 10-byte chunks and processing it that way. I want to iterate through the table: process the first string in 10-byte chunks, then move on to the second one, and finally the third.

For a single string, the following works fine:

const char code[] PROGMEM = "....";

// Chunk up large code string in PROGMEM.
const int CHUNKSIZE = 10; // CHUNKSIZE *must* be an even number!!
char workingSpace[CHUNKSIZE + 1];
int baseIndex = 0;

while (baseIndex < sizeof(code)) {
    int chunk;
    if (baseIndex + CHUNKSIZE > sizeof(code)) {
        chunk = sizeof(code) - baseIndex;
    }
    else {
        chunk = CHUNKSIZE;
    }
    // (uint_farptr_t)code is only a valid far address while code lies below 64K
    memcpy_PF(workingSpace, ((uint_farptr_t)code) + baseIndex, chunk);
    workingSpace[chunk] = '\0';

    // for debugging
    Serial.print("#");
    Serial.print(baseIndex);
    Serial.print("– Got a chunk [");
    Serial.print(workingSpace);
    Serial.println("]");

    baseIndex += CHUNKSIZE;
}
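To check the chunking arithmetic away from the board, here is a minimal host-side sketch in plain C++ (not Arduino code). It assumes an ordinary RAM array in place of the PROGMEM string and uses std::memcpy where the sketch above uses memcpy_PF. The helper name chunkString is hypothetical. Note that the loop above uses sizeof(code), which also counts the terminating '\0'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for the chunking loop above.
// std::memcpy plays the role of memcpy_PF; `data` is an ordinary RAM array.
std::vector<std::string> chunkString(const char* data, std::size_t len,
                                     std::size_t chunkSize) {
    std::vector<std::string> chunks;
    std::vector<char> workingSpace(chunkSize + 1);  // +1 for the '\0'
    for (std::size_t baseIndex = 0; baseIndex < len; baseIndex += chunkSize) {
        // The final chunk may be shorter than chunkSize.
        std::size_t chunk =
            (baseIndex + chunkSize > len) ? len - baseIndex : chunkSize;
        std::memcpy(workingSpace.data(), data + baseIndex, chunk);
        workingSpace[chunk] = '\0';
        // Keep exactly `chunk` bytes, even if the data contains a '\0'.
        chunks.push_back(std::string(workingSpace.data(), chunk));
    }
    return chunks;
}
```

With the 14-character string "...-.|.-..|--." and a chunk size of 10, this yields "...-.|.-.." followed by the 4-byte chunk "|--.".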

However, I’m having some difficulties doing the memcpy_PF call using the table.

int tableIndex = 0;
// for each of the three script strings
for (tableIndex = 0; tableIndex < TABLE_SIZE; tableIndex++) {
    // Chunk up large script strings in PROGMEM.
    const int CHUNKSIZE = 10; // CHUNKSIZE *must* be an even number!!
    char workingSpace[CHUNKSIZE + 1];
    int baseIndex = 0;

    // sizeof only works on the array itself, not through a pointer taken from the table,
    // and strlen is expensive (it loops until it finds '\0'), so the lengths are
    // calculated once and stored statically in the scriptLengths table.
    Serial.print("tableIndex: ");
    Serial.println(tableIndex);
    int strLength = scriptLengths[tableIndex];
    Serial.print("length script string: ");
    Serial.println(strLength);

    while (baseIndex < strLength) {
        int chunk;
        if (baseIndex + CHUNKSIZE > strLength) {
            chunk = strLength - baseIndex;
        }
        else {
            chunk = CHUNKSIZE;
        }
        // do memcpy_PF call here
        workingSpace[chunk] = '\0';
        baseIndex += CHUNKSIZE;
    }
}

I have tried several options here, mostly having difficulties with the second parameter (how to address the string stored in the table, and increase its indexing parameter). I read that the way to get a string out of a string table is to use pgm_read_word. However, with both of these below, I either get gibberish in the buffer string, or it seems to start in an incorrect location in the string.

        memcpy_PF(workingSpace, ((uint_farptr_t) &scriptTable[tableIndex][baseIndex]), chunk);
        memcpy_PF(workingSpace, ((uint_farptr_t) (char*)pgm_read_word(&(scriptTable[tableIndex])) + baseIndex), chunk);

Any help would be greatly appreciated!

Thanks!

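This will not work with standard char* pointers: they stay 2 bytes (16 bits) even on a Mega, so they cannot hold an address above 64k.

You could instead initialize a RAM table with the far addresses returned by pgm_get_far_address():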

const char below[4 * 1028] PROGMEM = "above 64";
const char filler1[31 * 1024] PROGMEM = "filler1";
const char filler2[31 * 1024] PROGMEM = "filler2";
const char filler3[31 * 1024] PROGMEM = "filler3";
const char above[1024] PROGMEM = "below 64";

unsigned long table[5];

void setup() {
  Serial.begin(250000);
  table[0] = pgm_get_far_address(below);
  table[1] = pgm_get_far_address(filler1);
  table[2] = pgm_get_far_address(filler2);
  table[3] = pgm_get_far_address(filler3);
  table[4] = pgm_get_far_address(above);
  for (byte i = 0; i < 5; i++) {
    dumpFF(table[i], 0x10);
  }
}

void dumpFF(unsigned long adr, int len) {
  byte idx;
  if (len) {
    for (; len > 0; len -= 16, adr += 16) {
      phBytesB((byte*)&adr, 4);
      Serial.write(':');
      Serial.write(' ');
      for (idx = 0; idx < 16; idx++) {
        if (idx < len ) {
          phByte(pgm_read_byte_far(adr + idx));
          Serial.write(' ');
        } else {
          Serial.print(F("   "));
        }
      }
      Serial.write('\'');
      for (idx = 0; (idx < 16) && (idx < len); idx++) {
        byte curr = pgm_read_byte_far(adr + idx);
        Serial.write(curr < 0x20 ? '.' : curr);
      }
      Serial.write('\'');
      Serial.println();
    }
  }
}

void psdec(void (*pprint)(byte), byte* ptr, byte len) {
  for (ptr += len - 1; len--;) {
    (*pprint)(*ptr--);
  }
}

void phBytesB(byte* ptr, byte len) {
  psdec(phByte, ptr, len);
}

void phByte(byte val) {
  phNibble(val >> 4);
  phNibble(val);
}

void phNibble(byte val) {
  val &= 0xF;
  Serial.write(val + (val < 10 ? '0' : 'A' - 10));
}

void loop() {}

Output:
000178E4: 61 62 6F 76 65 20 36 34 00 00 00 00 00 00 00 00 'above 64........'
0000FCE4: 66 69 6C 6C 65 72 31 00 00 00 00 00 00 00 00 00 'filler1.........'
000080E4: 66 69 6C 6C 65 72 32 00 00 00 00 00 00 00 00 00 'filler2.........'
000004E4: 66 69 6C 6C 65 72 33 00 00 00 00 00 00 00 00 00 'filler3.........'
000000E4: 62 65 6C 6F 77 20 36 34 00 00 00 00 00 00 00 00 'below 64........'

Hi Whandall,

Thanks for your reply! Could you explain a little bit what you are doing here? What are above and below? What are you doing in the functions below?

I dump the first 16 bytes of each of several PROGMEM strings scattered over roughly 100 KB of flash, accessed via a table of far (32-bit) addresses. Instead of dumping them, you could feed the data into your logic.

Wasn't that what you tried to do?

The code demonstrates one way of accessing PROGMEM via far pointers.

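P.S. The linker placed the arrays in reverse order of declaration, so the names ended up backwards; the string contents are what count. As the dump shows, below (holding "above 64") sits at 0x178E4, above 64k, so its address does not fit in a char*, while above (holding "below 64") sits at 0xE4, which fits.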

Hi Whandall,

Hmm, I didn't know it would be this complex, so you are getting it byte by byte? How do I know which one has an address above or below 64K? Why 4x 1028 (and not 1024)?

Is the above 64K and below 64K related to this:

"If possible, put your constant tables in the lower 64K and use pgm_read_byte_near() or pgm_read_word_near() instead of pgm_read_byte_far() or pgm_read_word_far() since it is more efficient that way, and you can still use the upper 64K for executable code."

Source: https://cours.etsmtl.ca/ele542/labo/Ref-AVRStudio/avr-libc-user-manual/group__avr__pgmspace.html

Thanks!

jozilla: so you are getting it byte by byte?

Yes, just for dumping the data. You could use memcpy_PF() to copy whole chunks instead.

jozilla: How do I know which one has an address above or below 64K?

With that much data, I would just treat all of them as possibly above 64k. The dumpFF function prints each address.

jozilla: Why 4x 1028 (and not 1024)?

Probably a typo. The real number does not matter here, just the placement.

The compiler treats all data addresses as 16-bit values, so addresses above 64k are silently truncated (they wrap around). That makes it a little convoluted to access far data and to hold pointers to it.
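If you do want to check placement in code rather than by reading the dump, a minimal sketch (plain C++, helper name hypothetical) could compare an array's far address plus its size against the 64k boundary. It assumes the far address comes from pgm_get_far_address() and the size from sizeof.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if an object of `size` bytes starting at the far
// (32-bit) flash address `farAddr` lies entirely below 64K, i.e. every byte
// can be reached with a 16-bit address (pgm_read_byte_near() and friends).
bool fitsBelow64K(uint32_t farAddr, uint32_t size) {
    return farAddr + size <= 0x10000UL;
}
```

Using the addresses from the dump, fitsBelow64K(0xE4, 1024) is true, while fitsBelow64K(0x178E4, 16) is false. An array that starts below 64k but runs past it, such as fitsBelow64K(0xF000, 0x2000), is also false.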

OK, that makes sense, so that’s why we need long (32-bit) addresses?
So I tried storing them the way you did, as follows:

const char script1[] PROGMEM = "....";
const char script2[] PROGMEM = "....";
const char script3[] PROGMEM = "....";

const int TABLE_SIZE = 3;
unsigned long scriptTable[3];

void setup()
{
  scriptTable[0] = pgm_get_far_address(script1);
  scriptTable[1] = pgm_get_far_address(script2);
  scriptTable[2] = pgm_get_far_address(script3);
}
void loop() 
{
  int tableIndex = 0;
  // for each of the three script strings
  for (tableIndex = 0; tableIndex < TABLE_SIZE; tableIndex++) {
    Serial.print("String #");
    Serial.println(tableIndex);
    
    long len = scriptLengths[tableIndex];
    Serial.println(len);
    for (unsigned long k = 0; k < len; k++)
    {
      unsigned long address = scriptTable[tableIndex];
      char myChar = pgm_read_byte((uint_farptr_t) address + k);
      Serial.print(myChar);
      Serial.print(" - ");
      Serial.println(k);
    }
    Serial.println();
  }  
}

This works well, but it seems to read garbage after k > 8365:

String #0
32767
2 - 0
. - 1
. - 2
. - 3
. - 4
| - 5

<snip>

| - 8366
. - 8367
| - 8368
. - 8369
- - 8370
| - 8371
. - 8372
- - 8373
. - 8374
 - 8375
� - 8376
� - 8377
� - 8378

Any ideas? This seems to be a memory problem. The strange thing is that it does work for reading script2 and script3 (all the way until the end).

I don't have any experience with far pointers to progmem in Arduino, but I noticed this:

const int CHUNKSIZE = 10; // CHUNKSIZE *must* be an even number!!

but your arrays have odd-numbered sizes (e.g. 32767), so the last chunk to be processed will NOT have an even size; it will be 7 bytes. Is this what messes up the processing for the subsequent array?
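The 7 is easy to confirm: 32767 = 3276 * 10 + 7, so the final chunk holds the remainder. A small plain C++ sketch of that calculation (helper name hypothetical) gives the same value as the if/else in the chunking loop:

```cpp
#include <cassert>
#include <cstddef>

// Length of the final chunk when `total` bytes are cut into pieces of
// `chunkSize` bytes. Assumes total > 0 and chunkSize > 0.
std::size_t lastChunkLength(std::size_t total, std::size_t chunkSize) {
    std::size_t rest = total % chunkSize;
    return rest == 0 ? chunkSize : rest;
}
```

For the lengths in this thread, lastChunkLength(32767, 10) and lastChunkLength(23867, 10) both come out as 7.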

Hi,

Not using that CHUNKSIZE variable anymore, so not sure if that's what it is.

-- Jo

OK, sorry, I hadn't noticed that your last example wasn't using CHUNKSIZE. I'm puzzled by what's in the arrays in the first place and how you declared them. I assume your example

const char script1[] PROGMEM = "....";

is just for simplicity in your post, and the actual code has real data. If so, how are you creating it; are you sure it's OK in the first place? If not, and the program fails with just the simple definition of four dots, why is your example printout giving a '2' then four dots and then a '|'? The output shows, apart from dots, '-' and '|' until 8375, so where are those characters coming from?

Why do you use

pgm_read_byte((uint_farptr_t) address + k);

instead of the needed (for addresses above 64k)

pgm_read_byte_far(address + k);

?
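To see the difference in isolation, here is a toy model in plain C++. It uses no AVR headers and all names are hypothetical. nearRead keeps only the low 16 bits of the address, as pgm_read_byte() (pgm_read_byte_near()) does, while farRead uses the full address, as pgm_read_byte_far() does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a 128 KB flash, for illustration only.
struct ToyFlash {
    std::vector<uint8_t> bytes = std::vector<uint8_t>(0x20000, 0xFF);

    // Like a near read: the address is cut down to 16 bits first.
    uint8_t nearRead(uint32_t addr) const {
        return bytes[static_cast<uint16_t>(addr)];  // upper bits are lost
    }
    // Like a far read: the full 32-bit address is used.
    uint8_t farRead(uint32_t addr) const {
        return bytes.at(addr);
    }
};
```

With a byte placed at 0x178E4, farRead(0x178E4) returns it, whereas nearRead(0x178E4) returns whatever sits at 0x78E4. Below 64k both reads agree.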

Hi Whandall,

The string is a Morse code string interspersed with numbers. Sorry, the … was just for simplicity. Good catch about pgm_read_byte_far, that is probably what caused the problem!

Thanks!
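Why the near read returned good data up to a point and then garbage: one explanation consistent with the output is that script1 starts below 64K but extends past it. This is an assumption, since the actual address of script1 is never shown. pgm_read_byte() only uses the low 16 bits of its argument, so each read is correct until base + k reaches 0x10000, after which it wraps to the start of flash. script2 and script3 would read correctly if they lie entirely below 64K. The hypothetical helper below computes that first bad offset for a given start address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a string starting at far address `base` is read with a
// 16-bit (near) address. Return the first offset k at which base + k reaches
// 0x10000, i.e. the first read that wraps to the start of flash. Returns 0 if
// the string already starts at or above 64K (then every near read is wrong).
// If the string is shorter than the returned offset, no read wraps at all.
uint32_t firstWrappedOffset(uint32_t base) {
    return base < 0x10000UL ? 0x10000UL - base : 0;
}
```

For example, a string starting at 0xF000 would read correctly for the first 0x1000 (4096) bytes only.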

With such big data arrays, you could run into other problems:

Arduino Mega 2560 oddity: Large PROGMEM arrays break basic functionality

https://github.com/arduino/Arduino/issues/2226

Wow, that's crazy, also that it's still an open issue.

We now have a working version that uses chunks and memcpy_PF, based on pgm_get_far_address(), which you pointed me to. Fortunately, we have also reduced the total size of the strings to about 40 KB, so hopefully we won't run into these issues.

const int TABLE_SIZE = 2;

uint_farptr_t scriptTable[2];

const unsigned int scriptLengths[] {
    sizeof (script1), 
    sizeof (script2) 
};

void setup()
{
    // Get 'far' addresses for strings
    scriptTable[0] = pgm_get_far_address(script1);
    scriptTable[1] = pgm_get_far_address(script2);
}

...
// Relevant chunking code
if (needsNewChunk) {
    unsigned long strLength = scriptLengths[tableIndex];
    int chunk;
    if (stringIndex + BUFFERSIZE > strLength) {
        chunk = strLength - stringIndex;
    }
    else {
        chunk = BUFFERSIZE;
    }
    // Copy chunk over to local buffer
    const uint_farptr_t address = scriptTable[tableIndex];
    memcpy_PF(chunkBuffer, ((uint_farptr_t) address + stringIndex), chunk);

    chunkBuffer[chunk] = '\0';
}
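The surrounding code is elided above, so here is a host-side sketch in plain C++ of one way the tableIndex/stringIndex pair could advance across the table, chunk by chunk. It is an assumption, not the poster's actual code: RAM pointers and std::memcpy stand in for far addresses and memcpy_PF, and ChunkReader is a hypothetical name. The lengths here exclude the '\0', whereas sizeof in the code above includes it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Host-side model of walking a table of strings chunk by chunk.
struct ChunkReader {
    const char* const* table;    // start of each string
    const std::size_t* lengths;  // number of bytes to read from each
    std::size_t tableSize;
    std::size_t tableIndex;
    std::size_t stringIndex;

    ChunkReader(const char* const* t, const std::size_t* l, std::size_t n)
        : table(t), lengths(l), tableSize(n), tableIndex(0), stringIndex(0) {}

    // Copies the next chunk (at most bufferSize bytes) into buf and
    // '\0'-terminates it. buf must hold bufferSize + 1 bytes.
    // Returns the chunk length, or 0 once every string has been consumed.
    std::size_t next(char* buf, std::size_t bufferSize) {
        while (tableIndex < tableSize && stringIndex >= lengths[tableIndex]) {
            ++tableIndex;  // current string finished: move to the next one
            stringIndex = 0;
        }
        if (tableIndex >= tableSize) {
            buf[0] = '\0';
            return 0;
        }
        std::size_t strLength = lengths[tableIndex];
        std::size_t chunk = (stringIndex + bufferSize > strLength)
                                ? strLength - stringIndex
                                : bufferSize;
        std::memcpy(buf, table[tableIndex] + stringIndex, chunk);
        buf[chunk] = '\0';
        stringIndex += chunk;
        return chunk;
    }
};
```

With a 12-character first string and a 3-character second string, a 10-byte buffer yields chunks of 10, 2 and 3 bytes, then 0.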