Is the reported global dynamic memory usage correct?

Hi Friends,

Maybe you can help me understand this.

I am curious about the dynamic memory usage that the Arduino IDE reports.
Is there an explanation somewhere of how this works in detail (especially with classes)?

My problem is that I need every byte of the Arduino's 2 KB of RAM, so I decided to put most things on the stack.
But if I use functions from an AES library I found on the internet, my dynamic memory usage goes up.
I wonder why this is happening, because the AES code is based on a class and I create a new instance for each encryption/decryption.
All the needed buffers should be inside the class, and everything else seems to be in PROGMEM.

Here is an example:

void encode32Byte(uint8_t inbuf[], uint8_t outbuf[], uint8_t key[]) {
  byte iv[] =  { 0xa6, 0xfc, 0x78, 0x10, 0x23, 0x82, 0xf8, 0xd1, 0x73, 0x3a, 0x37, 0x2f, 0x02, 0x0e, 0x00, 0x01 };

  AES aes;
  aes.set_key(key, 32);
  aes.cbc_encrypt (inbuf, outbuf, 2, iv);
  aes.clean();
}

Output is:
Sketch uses 2478 bytes (8%) of program storage space. Maximum is 30720 bytes.
Global variables use 25 bytes of dynamic memory.

In this example it consumes 16 additional bytes (which seems to match the size of iv).
In my current program it consumes 100 additional bytes (I call it from 2 functions).

Is this caused by the class?
Defining iv and using it without the class doesn't seem to cause the problem.

This all looks so weird and random to me.

This is the aes.h file:

#ifndef __AES_H__
#define __AES_H__

#include <avr/pgmspace.h>

typedef unsigned char byte ;

#define N_ROW                   4
#define N_COL                   4
#define N_BLOCK   (N_ROW * N_COL)
#define N_MAX_ROUNDS           14
#define KEY_SCHEDULE_BYTES ((N_MAX_ROUNDS + 1) * N_BLOCK)
#define SUCCESS (0)
#define FAILURE (-1)

class AES
{
 public:

/*  The following calls are for a precomputed key schedule

    NOTE: If the length_type used for the key length is an
    unsigned 8-bit character, a key length of 256 bits must
    be entered as a length in bytes (valid inputs are hence
    128, 192, 16, 24 and 32).
*/
  byte set_key (byte key[], int keylen) ;
  void clean () ;  // delete key schedule after use
  void copy_n_bytes (byte * dest, byte * src, byte n) ;

  byte encrypt (byte plain [N_BLOCK], byte cipher [N_BLOCK]) ;
  byte cbc_encrypt (byte * plain, byte * cipher, int n_block, byte iv [N_BLOCK]) ;

  byte decrypt (byte cipher [N_BLOCK], byte plain [N_BLOCK]) ;
  byte cbc_decrypt (byte * cipher, byte * plain, int n_block, byte iv [N_BLOCK]) ;

 private:
  int round ;
  byte key_sched [KEY_SCHEDULE_BYTES] ;
} ;

#endif

Thx

I wonder why this is happening, because the aes is based on a class

What does "based on a class" have to do with anything? The object code that is produced knows NOTHING about classes or objects.

My problem is that I need every byte of the Arduino's 2 KB of RAM, so I decided to put most things on the stack.

Where do you think the stack lives?

OK, I guess I explained it badly; I thought it was clearer.

Of course all variables end up in RAM, but I mean on the stack rather than in the global section. Can I put it like that?

An example: at some point I need a big 1 KB graphics buffer, so I put all the rendering code in one function and the state handling in another, just so the graphics buffer is freed whenever it isn't being used (pretty common, I guess?).

If I compile an empty sketch, it uses 9 bytes for global variables (for whatever reason).
If I write a function (no matter how complex), its memory is allocated and the stack grows at the moment I call the function (right?)

void loop() {
   statehandling();
   rendering();
   someshitforfun();
   // ...
}
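A minimal host-side sketch of that pattern, with hypothetical names (`handleState`, `renderFrame`, `kFrameBytes`) that are not from the poster's code: the 1 KB buffer is an automatic local of `renderFrame`, so its space is taken from the stack only while that call runs, whereas the small `static` state variable lives in static RAM for the whole program.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical frame buffer size from the post (1 KB).
const size_t kFrameBytes = 1024;

// Small persistent state: static storage, so it is counted as a global.
static uint8_t g_state = 0;

void handleState() {
    g_state = static_cast<uint8_t>(g_state + 1);  // advance some state
}

// The large buffer is an automatic (stack) variable: it exists only while
// renderFrame() runs and its space is released when the function returns.
uint8_t renderFrame() {
    uint8_t frame[kFrameBytes];
    for (size_t i = 0; i < kFrameBytes; ++i) {
        frame[i] = static_cast<uint8_t>(i ^ g_state);  // "draw" something
    }
    uint8_t checksum = 0;
    for (size_t i = 0; i < kFrameBytes; ++i) {
        checksum ^= frame[i];  // stand-in for pushing the frame to a display
    }
    return checksum;
}
```

Calling `handleState()` and then `renderFrame()` from `loop()` gives the same split as the pseudocode: only `g_state` is permanent, and the buffer's 1 KB is reused by whatever function runs next.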

The Serial interface needs memory for its buffers, the state needs memory, and 2 KB is used up pretty quickly.
So, my question: why does the function in my first post consume extra global memory?
It should be allocated temporarily on the stack, together with the function.

What did I misunderstand?

Greets

If I write a function (no matter how complex), its memory is allocated and the stack grows at the moment I call the function (right?)

Assuming the function uses no static variables, yes.
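To illustrate the "no static variables" caveat with a minimal sketch (the function names are hypothetical): an automatic local is recreated on the stack for every call, while a `static` local is allocated once in the same static RAM area as globals, so it keeps its value between calls and, on an AVR build, shows up in the IDE's global-variables figure.

```cpp
#include <assert.h>

// Automatic local: lives on the stack for the duration of each call,
// so it starts from 0 every time.
int countAutomatic() {
    int calls = 0;
    return ++calls;
}

// Static local: allocated once in static storage (like a global),
// so it persists between calls and is counted as global RAM.
int countStatic() {
    static int calls = 0;
    return ++calls;
}
```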

Why does the function in my first post consume extra global memory?

What makes you think that it does?

Well, that's the funny thing.
I don't think it does, but the compiler output in the Arduino IDE says:


Sketch uses 2478 bytes (8%) of program storage space. Maximum is 30720 bytes.
Global variables use 25 bytes of dynamic memory.

If I don't use the construct I showed in my first post, it's just 9 bytes of dynamic memory.
The extra space seems to be tied to the size of the iv array IN the function,
which makes no sense at all.

That makes me think:
Either the compiler produces wrong output, which is (hopefully) unlikely, or I do not understand what the output means.

Without seeing all of your code, and having links to the libraries you are using, it's hard to say for certain what is happening. But I'm going to guess that the compiler, or linker, sees that the array is never modified, so it makes a global instance in static RAM, rather than allocating the array on the stack every time the function is called.
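One common workaround, sketched under the assumption that the extra bytes are the iv initializer being kept in SRAM, as guessed above: store the constant in flash with `PROGMEM` and copy it into a stack buffer with `memcpy_P` (both from avr-libc's `<avr/pgmspace.h>`) only when the function runs. `kIvFlash` and `loadIv` are hypothetical names, and the `#else` branch is only a test shim so the sketch also compiles with a desktop compiler.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifdef __AVR__
#include <avr/pgmspace.h>  // PROGMEM, memcpy_P
#else
// Host shim (assumption, for testing only): flash and RAM share one address space.
#define PROGMEM
#define memcpy_P memcpy
#endif

// The 16-byte IV from the first post, placed in flash on AVR instead of SRAM.
static const uint8_t kIvFlash[16] PROGMEM = {
    0xa6, 0xfc, 0x78, 0x10, 0x23, 0x82, 0xf8, 0xd1,
    0x73, 0x3a, 0x37, 0x2f, 0x02, 0x0e, 0x00, 0x01};

// Copy the IV into a caller-provided (stack) buffer just before use.
void loadIv(uint8_t iv[16]) {
    memcpy_P(iv, kIvFlash, sizeof kIvFlash);
}
```

Inside `encode32Byte()`, the initialized `byte iv[]` declaration would then become `byte iv[16]; loadIv(iv);`, so the 16 stack bytes exist only during the call and the constant itself stays in flash.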

OP, why make your life difficult sweating out every byte of your precious 2K of RAM? There comes a time to cut your losses and move up to a more capable processor. Something with an ARM or ESP8266.

gfvalvo:
OP, why make your life difficult sweating out every byte of your precious 2K of RAM? There comes a time to cut your losses and move up to a more capable processor. Something with an ARM or ESP8266.

Or if you just need a few bytes, upload via the ICSP and ditch the bootloader.

Or if you just need a few bytes, upload via the ICSP and ditch the bootloader.

How will removing the bootloader from flash memory save any SRAM?

Here is a minimal sketch that compiles and shows the behavior:

#define N_ROW                   4
#define N_COL                   4
#define N_BLOCK   (N_ROW * N_COL)

void xor_block (byte * d, byte * s)
{
  for (byte i = 0 ; i < N_BLOCK ; i += 4)
    {
      *d++ ^= *s++ ;  // some unrolling
      *d++ ^= *s++ ;
      *d++ ^= *s++ ;
      *d++ ^= *s++ ;
    }
}

class TST
{
 public:
  byte testfkt (byte * plain, byte * cipher, int n_block, byte iv [N_BLOCK]) ;
};


byte TST::testfkt (byte * plain, byte * cipher, int n_block, byte iv [N_BLOCK])
{
  while (n_block--)
    {
      xor_block (iv, plain) ;
    }
  return 0;
}
  
void setup() {
}

void loop() {
  // put your main code here, to run repeatedly:

  uint8_t inbuf[32], outbuf[32], key[32];
  
  encode32Byte( inbuf, outbuf, key);  
}


void encode32Byte(uint8_t inbuf[], uint8_t outbuf[], uint8_t key[]) {
  byte iv[] =  { 0xa6, 0xfc, 0x78, 0x10, 0x23, 0x82, 0xf8, 0xd1, 0x73, 0x3a, 0x37, 0x2f, 0x02, 0x0e, 0x00, 0x01, 0xa6, 0xfc, 0x78, 0x10, 0x23, 0x82, 0xf8, 0xd1, 0x73, 0x3a, 0x37, 0x2f, 0x02, 0x0e, 0x00, 0x01 };

  TST test;
  test.testfkt (inbuf, outbuf, 2, iv);  
}

Compiler output is:
Sketch uses 676 bytes (2%) of program storage space. Maximum is 30720 bytes.
Global variables use 41 bytes of dynamic memory.

41 bytes = 9 bytes + the length of iv (32 bytes)
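A compile-time check of that arithmetic, assuming the 9-byte baseline of an empty sketch stays fixed: the iv initializer in the minimal sketch holds 32 one-byte values. This only confirms that the numbers line up; it does not by itself show where the extra bytes are placed.

```cpp
#include <assert.h>
#include <stdint.h>

// The 32-byte initializer of iv in encode32Byte() from the minimal sketch.
static const uint8_t iv[] = {
    0xa6, 0xfc, 0x78, 0x10, 0x23, 0x82, 0xf8, 0xd1,
    0x73, 0x3a, 0x37, 0x2f, 0x02, 0x0e, 0x00, 0x01,
    0xa6, 0xfc, 0x78, 0x10, 0x23, 0x82, 0xf8, 0xd1,
    0x73, 0x3a, 0x37, 0x2f, 0x02, 0x0e, 0x00, 0x01};

// Global RAM reported for an empty sketch (figure from this thread).
const unsigned kEmptySketchGlobals = 9;

static_assert(sizeof iv == 32, "iv holds 32 bytes");
static_assert(kEmptySketchGlobals + sizeof iv == 41,
              "9-byte baseline + 32-byte iv matches the reported 41 bytes");
```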

I already suspected it might be caused by the compiler's optimization.
Any idea how this could be controlled?
I don't want to change all the libraries I use so that they read their default values from PROGMEM manually.
It would be nice to get this working on my ATmega328.

greetz

Wouldn't it be easier to just use an Arduino Mega with double the SRAM for global variables?

boylesg:
Wouldn't it be easier to just use an Arduino Mega with double the SRAM for global variables?

Yes, this would be easier, but it wouldn't answer my question.
Maybe I will need something better for other reasons, but if I can get away with my ATmega328 I won't buy an ATmega644,
because if I did it would make me sad for the rest of my life :smiley: :smiley: :smiley:

You could always store this sort of static data in the EEPROM:

byte iv[] = { 0xa6, 0xfc, 0x78, 0x10, 0x23, 0x82, 0xf8, 0xd1, 0x73, 0x3a, 0x37, 0x2f, 0x02, 0x0e, 0x00, 0x01 };

See the EEPROM and EEPROMEx libraries.

Well, this is really some weird voodoo s***!

I took your code and messed with it, trying to nail it down, and eventually found this:

#define N_ROW                   4
#define N_COL                   4
#define N_BLOCK   (N_ROW * N_COL)

class TST
{
 public:
  byte test2( byte *plain, byte * cipher, int n_block, byte iv[N_BLOCK] );
};

void test2func( byte * a, byte *b )
{
  for (byte i = 0 ; i < N_BLOCK ; i += 4)
  {
    // THIS IS THE BIT HERE------------>>>


    //*a++= *b;           // Sketch uses 444 bytes, Global vars use 9 bytes
    //*a= *b++;           // Sketch uses 444 bytes, Global vars use 9 bytes
    *a++= *b++;          // Sketch uses 596 bytes, Global vars use 41 bytes
  }
}

byte TST::test2( byte *plain, byte * cipher, int n_block, byte iv[N_BLOCK] ){
  while (n_block--)
    {
      test2func( iv, plain );
    }
  return 0;
}

  
void setup() {
}

void loop() {
  // put your main code here, to run repeatedly:

  uint8_t inbuf[32], outbuf[32], key[32];
  
  encode32Byte( inbuf, outbuf, key);  
}

void encode32Byte(uint8_t inbuf[], uint8_t outbuf[], uint8_t key[]) {
  byte iv[] =  { 0xa6, 0xfc, 0x78, 0x10, 0x23, 0x82, 0xf8, 0xd1, 0x73, 0x3a, 0x37, 0x2f, 0x02, 0x0e, 0x00, 0x01, 0xa6, 0xfc, 0x78, 0x10, 0x23, 0x82, 0xf8, 0xd1, 0x73, 0x3a, 0x37, 0x2f, 0x02, 0x0e, 0x00, 0x01 };

  TST test;
  test.test2( inbuf, outbuf, 2, iv );
}

Uncomment the 3 lines in test2func() one at a time and compile; each reports the memory usage shown in its comment.

Eh?

Must be doing something really stupid but I just can't see it :confused:

Yours,
TonyWilk

//*a++= *b;           // Sketch uses 444 bytes, Global vars use 9 bytes
//*a= *b++;           // Sketch uses 444 bytes, Global vars use 9 bytes
*a++= *b++;           // Sketch uses 596 bytes, Global vars use 41 bytes

Presumably the optimiser is doing something interesting. Note that 41-9 is 32, an interestingly round number. The for loop, once the constants are evaluated, amounts to for(i=0; i<16; i+=4). Perhaps the compiler is unrolling the loop in some way?

Incidentally, I believe the AVR is a 16-bit processor (or is it 32?). So if you want the XOR to be quicker, declare the pointers to be pointers to uint16_t and XOR them like that. Be sure to divide the number of bytes by 2!
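A sketch of the word-at-a-time XOR suggested above, assuming the block length is even; `xor_block16` is a hypothetical name. It loads and stores the 16-bit words with `memcpy`, which avoids the alignment and strict-aliasing problems of casting a `byte *` to `uint16_t *` directly. Since the ATmega328 is an 8-bit AVR, each 16-bit XOR still becomes two 8-bit operations, so this mainly halves the loop count and may not be faster in practice.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// XOR n_bytes of s into d, two bytes at a time. n_bytes must be even.
void xor_block16(uint8_t *d, const uint8_t *s, size_t n_bytes) {
    for (size_t i = 0; i < n_bytes / 2; ++i) {  // number of bytes divided by 2
        uint16_t dw, sw;
        memcpy(&dw, d + 2 * i, sizeof dw);  // alignment-safe load
        memcpy(&sw, s + 2 * i, sizeof sw);
        dw ^= sw;
        memcpy(d + 2 * i, &dw, sizeof dw);  // store the result back
    }
}
```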

Another trick for optimisation is to count down to zero rather than counting up to a number. Checking if a number is zero is easy. Checking if it is < a value involves a subtraction.
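A sketch of the count-down idea applied to the byte-wise XOR loop (`xor_block_down` is a hypothetical name): the counter runs from the byte count down to zero, so each loop test is a comparison against zero. Whether avr-gcc actually emits shorter code this way depends on the optimiser, which may already make this transformation on its own.

```cpp
#include <assert.h>
#include <stdint.h>

// XOR n bytes of s into d, counting the loop variable down to zero.
void xor_block_down(uint8_t *d, const uint8_t *s, uint8_t n) {
    while (n--) {        // test the old value against zero, then decrement
        *d++ ^= *s++;
    }
}
```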

And, of course, if you are really, really worried about the speed of this XOR, then jam a bit of assembler in there and call it good.

In any case, if you are concerned about RAM usage, find out which compiler optimisation flags minimise it, and work out how to feed them to the compilation step of the build.

boylesg:
Wouldn't it be easier to just use an Arduino Mega with double the SRAM for global variables?

8K / 2K is not 2.

PaulS:
8K / 2K is not 2.

My memory is malfunctioning then; I thought the Mega had 4K of SRAM.

4 times the memory then :slight_smile:

The ATmega328 has 2 KB SRAM / 32 KB flash, and it's an 8-bit AVR.
There is a wide range of ATmega MCUs, up to 16 KB SRAM / 128 KB flash.

Anyway, here is what I figured out:
Like PaulS said, it looks like the compiler puts some arrays into SRAM.
I checked the output, and the consumed memory is what the compiler reports.

I still don't know the exact reason, because it happens only sometimes, but I am now going to force arrays into flash with PROGMEM:

const uint8_t avalue[] PROGMEM = { ... };
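On AVR, an array declared with `PROGMEM` should not be read like an ordinary RAM array; its bytes are fetched with `pgm_read_byte()` (or copied out with `memcpy_P()`) from avr-libc's `<avr/pgmspace.h>`. A minimal sketch, with `kTable` and `sumTable` as hypothetical names and a host shim (an assumption for testing only) so it also compiles off-target:

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>  // PROGMEM, pgm_read_byte
#else
// Host shim: read "flash" as normal memory.
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

// Example table kept in flash on AVR (hypothetical contents).
const uint8_t kTable[] PROGMEM = {1, 2, 3, 4};

// Sum the table, fetching each byte through pgm_read_byte().
unsigned sumTable() {
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof kTable; ++i) {
        sum += pgm_read_byte(&kTable[i]);
    }
    return sum;
}
```

This is also why moving a library's tables into flash usually means touching every place that reads them, which matches the concern about changing all the libraries.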

If arrays grow too big, the compiler seems to do this by itself,
which is super annoying, because I would like to force the compiler to do it by default.
I searched endlessly for good documentation on the avr-gcc compiler flags, but couldn't find anything about this.
Maybe such a flag doesn't exist, but I can't believe that, because this has to be a common problem.

greetz