Optimising Flash and RAM usage

Hi

I have a program that is running out of space (and I would still like to add some more features). I have tackled everything I can find:

  • Removing the boot loader
  • Making sure variables are the right size
  • Edited SoftwareSerial to reduce the RX buffer (I only need to TX with SoftwareSerial)
  • Using PROGMEM and F() for some arrays and strings
  • Optimising the code, removing floats and strings.

and I get to here:

Sketch uses 32300 bytes (98%) of program storage space. Maximum is 32768 bytes.
Global variables use 1648 bytes (80%) of dynamic memory, leaving 400 bytes for local variables. Maximum is 2048 bytes.
Low memory available, stability problems may occur.

But now I am stuck:

  • Can I turn on Link Time Optimisation? I have read about it not being safe?
  • atoi() is apparently 600 bytes of code, can I do that differently?
  • What can I do with or in the libraries I am using?

I haven't tried running the code yet - I am still building the PCB.

Thoughts, options, help appreciated!

I have had to attach the code otherwise the message is too long.

Thanks very much
Kevin

Ugh. Why do you have code in .h files? You shouldn't do that. Either name them ".cpp", and they'll be compiled separately, or ".ino", and they'll be concatenated to your sketch before compilation...

(But that shouldn't affect the sketch size. So, on to your questions:)

Can I turn on Link Time Optimisation? I have read about it not being safe?

LTO is on by default for the Arduino AVR boards, so it can't be THAT awful. I guess you're using MiniCore to be able to get back the bootloader space? Probably only you can try it out... (Size goes down to 26764 bytes when I compile with lto, here.)

atoi() is apparently 600 bytes of code, can I do that differently?

How are you measuring that? I don't even see atoi() in the binary (maybe you mean atol()?).
I think it could be improved by writing your own special-case code that isn't as careful...
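As a sketch of the kind of special-case code meant here, the function below parses only an optional minus sign followed by decimal digits, with no whitespace skipping and no overflow checks. The name `parse_i32` is hypothetical, and whether it actually comes out smaller than the avr-libc routine is something to confirm with avr-nm after building.

```cpp
#include <stdint.h>

// Minimal atol()-style parser: optional '-', then decimal digits.
// Stops at the first non-digit. No whitespace handling, no overflow check.
int32_t parse_i32(const char *s) {
  bool neg = (*s == '-');
  if (neg) ++s;
  uint32_t v = 0;
  while (*s >= '0' && *s <= '9') {
    v = v * 10u + (uint8_t)(*s - '0');
    ++s;
  }
  return neg ? -(int32_t)v : (int32_t)v;
}
```

Because it stops at the first non-digit, it can be pointed straight into the middle of a field such as "545.4" and returns 545.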

Top memory consumers:

avr-nm -SC --size-sort /tmp/Arduino1.8.12Build/*.elf

   :
0000181c 00000180 T makeTime(tmElements_t const&)
0000385c 0000019e T Si5351::set_pll(unsigned long long, si5351_pll)
000051dc 000001b2 T JTEncode::wspr_message_prep(char*, char*, unsigned char)
0000161c 000001b8 T breakTime(unsigned long, tmElements_t&)
00000520 000001d2 T loc8calc()
00003c04 00000200 T Si5351::reset()
00003f92 00000214 T Si5351::set_ms(si5351_clock, Si5351RegSet, unsigned char, unsigned char, unsigned char)
000055e0 0000021c T JTEncode::wspr_bit_packing(unsigned char*)
00004e66 00000240 T __vector_24
0000538e 00000252 T JTEncode::jt9_bit_packing(char*, unsigned char*)
000032f8 00000280 T Si5351::select_r_div(unsigned long long*)
00003578 00000288 T Si5351::select_r_div_ms67(unsigned long long*)
00005cf4 0000029a T JTEncode::init_rs_int(int, int, int, int, int, int)
000012ea 000002f2 T TXtiming()
0000215e 00000384 T TinyGPSPlus::endOfTermHandler()
00002f64 00000394 T Si5351::multisynth67_calc(unsigned long long, unsigned long long, Si5351RegSet*)
00000834 000003ea T loc_dbm_telem()
0000272c 000003ea T Si5351::pll_calc(si5351_pll, unsigned long long, Si5351RegSet*, long, unsigned char)
00002b16 0000044e T Si5351::multisynth_calc(unsigned long long, unsigned long long, Si5351RegSet*)
000041a6 000007fa T Si5351::set_freq(unsigned long long, si5351_clock)

What can I do with or in the libraries I am using?

It looks like your Si5351 library is really rather bloated: 64-bit integers, almost 2k for just set_freq(). You're essentially doing FM with a set of fixed frequencies, right? Those could be pre-computed, and that would probably save a lot of space (but it sounds like pretty obscure code).

Frankly, if you're that close to the limits of the processor, on code that isn't even DONE yet, you should find a board that has more resources.

Hmm. For the frequencies I think you are dealing with, it might be possible to use a version of the si5351 library that only uses 32-bit ints, which should help.
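A quick check of the arithmetic, assuming set_freq() keeps its 1/100 Hz units: a uint32_t holds centi-hertz values up to about 42.9 MHz, so a 20 m carrier plus the largest WSPR tone offset fits comfortably. The 146 (about 1.46 Hz) tone spacing is an assumption here, and this only covers the frequency value itself; whether the library's internal divider maths can also stay in 32 bits depends on the library.

```cpp
#include <stdint.h>

// set_freq() works in 1/100 Hz, so the largest frequency a uint32_t can
// represent is 0xFFFFFFFF / 100 Hz, i.e. about 42.9 MHz.
constexpr uint32_t MAX_HZ_IN_CENTIHZ_U32 = 0xFFFFFFFFUL / 100u;

constexpr bool fits_in_u32_centihz(uint32_t hz) {
  return hz <= MAX_HZ_IN_CENTIHZ_U32;
}

// WSPR_FREQ from the sketch, plus the top 4-FSK tone offset
// (3 x 146 centi-Hz, ASSUMING ~1.46 Hz spacing). 32-bit maths throughout.
constexpr uint32_t WSPR_FREQ_HZ = 14097175UL;
constexpr uint32_t WSPR_TOP_TONE_CHZ = WSPR_FREQ_HZ * 100u + 3u * 146u;

static_assert(fits_in_u32_centihz(WSPR_FREQ_HZ), "20 m fits in 32 bits");
```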

However, I came across:

// JT9 and WSPR Frequencies
#define JT9_FREQ        14096957UL           // = 1,357Hz // Set frequency here, 6hz wide, Default 14000000UL
#define WSPR_FREQ       14097175UL           // = 1,575Hz // Set frequency here! Default 14000000UL

and I don’t understand: the comments don’t match the code as to either the magnitude or the value of the frequency. 14096957 is surely 14 MHz if it’s in hertz (as indicated by "si5351.set_freq((freq * 100) + (tx_buffer * tone_spacing), SI5351_CLK0);"), or 140 kHz if it’s in the 1/100 Hz units used by set_freq().
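One reading that makes the comments consistent, offered as an assumption rather than something the sketch states: the "= 1,357Hz" and "= 1,575Hz" figures are audio offsets above the standard 20 m WSPR dial frequency of 14.0956 MHz (USB). Under that reading, both constants are plain hertz at about 14 MHz, and the arithmetic checks out exactly:

```cpp
#include <stdint.h>

// ASSUMPTION: the comments give the audio offset above the standard
// 20 m WSPR dial frequency, 14.0956 MHz.
constexpr uint32_t WSPR_DIAL_20M_HZ = 14095600UL;

constexpr uint32_t JT9_FREQ  = 14096957UL;  // from the sketch, in Hz
constexpr uint32_t WSPR_FREQ = 14097175UL;  // from the sketch, in Hz

constexpr uint32_t audio_offset_hz(uint32_t rf_hz) {
  return rf_hz - WSPR_DIAL_20M_HZ;
}

static_assert(audio_offset_hz(JT9_FREQ) == 1357u, "matches the JT9 comment");
static_assert(audio_offset_hz(WSPR_FREQ) == 1575u, "matches the WSPR comment");
```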

What board are you using? Have you considered using a board with more memory to start with?

I would recommend the Arduino Mega 2560, with 250,000+ bytes of program memory and 8192 bytes of RAM for local/global storage. They're relatively inexpensive at about 20+ dollars on Amazon. I'm using them for my DDS module, which generates frequencies up to 60 MHz with FM, AM, FSK, BPSK, single tone and more. I'm really close to finishing that off and have still only used about 10% of memory.
Good luck.
pamam

Hi All

Firstly thank you for all of your time and input to this, it is much appreciated.

Chip choice: I have PCBs made, so for now I am stuck with the 328 in a 32-lead QFN (5 mm x 5 mm) at about £1 each. I could redesign them; the next 'size' up that I can find is the ATmega1281 in a 64 MLF package (9 mm x 9 mm) at about £7 each. When I am making 10 or more, that is a big step up in price.

.h files - that's how I inherited it, will change them to .cpp :slight_smile:

Frequency - It is 14Mhz, not sure why the set_freq call multiplies it by 100, I need to work that one out!

I have now found someone else that is using their own si5351 code, so will check that out for size.

Sorry, typo in my original note. I meant itoa, not atoi: "itoa(gps.altitude.meters(), myalt, 10);". The 600-byte figure came from a discussion on the forums.
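A minimal sketch of a base-10-only replacement for that itoa() call. `i16_to_dec` is a hypothetical name, and it assumes the altitude fits in an int16_t (about ±32 km). It skips the general-radix code path, but the size difference versus avr-libc's itoa() should be measured rather than assumed.

```cpp
#include <stdint.h>

// Writes v in decimal to buf and returns a pointer to the terminating NUL.
// buf needs room for 7 chars ("-32768" plus NUL). Base 10 only.
char *i16_to_dec(int16_t v, char *buf) {
  char *p = buf;
  uint16_t u = (uint16_t)v;
  if (v < 0) {
    *p++ = '-';
    u = (uint16_t)(0u - u);  // two's-complement magnitude, safe for -32768
  }
  char digits[5];             // 65535 at most, so 5 digits
  uint8_t n = 0;
  do {
    digits[n++] = (char)('0' + u % 10u);
    u /= 10u;
  } while (u != 0);
  while (n != 0) *p++ = digits[--n];  // digits were produced in reverse
  *p = '\0';
  return p;
}
```

Usage would be along the lines of `i16_to_dec((int16_t)gps.altitude.meters(), myalt);`, with myalt at least 7 bytes long.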

SoftwareSerial seems an obvious thing to change, as I only need TX, it can be whatever baud rate I choose, and it is only for debugging, but I can't seem to find a 'mini TX-only software serial' library. Any clues?
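Since only TX is needed, the core of a TX-only software serial is small: send a start bit, eight data bits LSB first and a stop bit, holding each level for one bit time. The sketch below separates the framing (testable on a PC) from the pin toggling. DEBUG_TX_PIN, the 4800 baud rate and the function names are hypothetical. It assumes a 16 MHz board: digitalWrite() adds a few microseconds per bit, so a slow baud rate keeps the timing error small, and BIT_US may need trimming on the bench (especially at 8 MHz).

```cpp
#include <stdint.h>

// 8N1 frame for one byte: start bit (0), 8 data bits LSB first, stop bit (1).
// Bit 0 of the returned value is the first bit on the wire.
uint16_t uart_frame_8n1(uint8_t data) {
  return (uint16_t)((1u << 9) | ((uint16_t)data << 1));  // bit 0 (start) stays 0
}

// Shifts one frame out through set_bit(level), which must drive the TX line
// and then wait one bit period.
template <typename SetBit>
void uart_send_byte(uint8_t data, SetBit set_bit) {
  uint16_t frame = uart_frame_8n1(data);
  for (uint8_t i = 0; i < 10; ++i) {
    set_bit((uint8_t)((frame >> i) & 1u));
  }
}

#ifdef ARDUINO
#include <Arduino.h>

// Hypothetical pin and rate: any free pin and a slow baud rate.
const uint8_t DEBUG_TX_PIN = 3;
const unsigned int BIT_US = 208;  // nominal bit time at 4800 baud

void debug_begin() {
  pinMode(DEBUG_TX_PIN, OUTPUT);
  digitalWrite(DEBUG_TX_PIN, HIGH);  // UART idle level is high
}

void debug_write(uint8_t c) {
  // Interrupts stay enabled so millis() and the GPS UART keep running;
  // at 4800 baud a short ISR is a small fraction of a bit time.
  uart_send_byte(c, [](uint8_t level) {
    digitalWrite(DEBUG_TX_PIN, level ? HIGH : LOW);
    delayMicroseconds(BIT_US);
  });
}

// Prints an F("...") string without copying it out of flash.
void debug_print(const __FlashStringHelper *s) {
  const char *p = reinterpret_cast<const char *>(s);
  for (char c; (c = pgm_read_byte(p)) != '\0'; ++p) debug_write((uint8_t)c);
}
#endif
```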

Thanks very much
Kevin

KevWal:
I have a program that is running out of space (and I would still like to add some more features). I have tackled everything I can find:

  • Removing the boot loader

Right about there, I would have simply switched to a processor board with more resources. I haven't looked at your code. But, is there anything in there that's so AVR-specific that you couldn't convert it for a different processor? If not, simply get yourself a Feather M0 or Teensy 3.2 and move on with your project.

Are you going to be using the USB connection for serial communications in the final project? If not, you could use the hardware serial in place of software serial.

Turn on link time optimization; it helps considerably with program space usage. And do not wait until the entire code is complete to begin testing: test each part of the code as you finish it. That can prevent considerable backtracking when you discover that an early part of the code, and consequently everything you wrote afterwards, needs modification.

Hi David

No USB connection on a 328P, I am using the 328 hardware serial port for comms with the GPS, and hence need a software serial option for debugging.

Testing: agreed, I need to look at my testing options and work out how to separate hardware and software issues. Thoughts on test setups, simulated environments, simulated GPS input, etc., are welcome.
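For simulated GPS input, one option is to replay canned NMEA sentences into the parser (TinyGPS++ accepts characters one at a time through gps.encode(c)). TinyGPS++ validates the checksum, so hand-edited test sentences need a correct one. A minimal helper, assuming standard NMEA 0183 framing (XOR of the characters between '$' and '*'); the function names are hypothetical:

```cpp
#include <stdint.h>

// NMEA 0183 checksum: XOR of every character between '$' and '*'.
// The leading '$' is optional; scanning stops at '*' or at the end of the string.
uint8_t nmea_checksum(const char *s) {
  if (*s == '$') ++s;
  uint8_t sum = 0;
  while (*s != '\0' && *s != '*') {
    sum ^= (uint8_t)*s++;
  }
  return sum;
}

static char hex_digit(uint8_t n) {
  return (char)(n < 10 ? '0' + n : 'A' + (n - 10));
}

// Writes "*HH\r\n" plus a NUL into out (6 bytes), to finish a test sentence.
void nmea_append_checksum(char *out, uint8_t sum) {
  out[0] = '*';
  out[1] = hex_digit(sum >> 4);
  out[2] = hex_digit(sum & 0x0F);
  out[3] = '\r';
  out[4] = '\n';
  out[5] = '\0';
}
```

For example, a second Arduino could print a list of such sentences at the GPS baud rate into the 328's RX pin, so the rest of the firmware can be exercised on the bench without a satellite fix.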

Thanks
Kevin

For testing, since you intend to use an atmega328, I would get a mega and hook everything up on a breadboard, that way you don't have to be overly concerned with memory usage until you get the software working properly.

You really need to get everything working properly before designing the pc board, the atmega328 may end up being unsuitable for this project.

The 328PB has an extra hardware serial port, same footprint. Maybe eliminating software serial would gain you some memory.

Thanks all, some good ideas.

328PB to get rid of SoftwareSerial (thanks aarg): this works well, as the second serial port is on MISO and MOSI, which I have already brought out to external connectors for programming. However, I know HardwareSerial still takes some library code, so it's not a total saving, but it's easy to test.

SAMD21 looks like a fantastic chip (thanks gfvalvo), but I wonder how much pain the total change of architecture might give me. I am thinking of timing, interrupts and libraries (especially the SI5351A and its I2C bus).

Right now I am working through the SI5351A code, looking to make myself a cut-down version of it, and I am still on the lookout for a tiny TX-only serial library with limited baud rates.

In the meantime, PCB building is coming along: it looks like I successfully hand-soldered a 328P 32MLF tonight, that's the no-lead VQFN package! Well, it programs and runs Blink anyway :slight_smile:

Cheers
Kev

Frequency - It is 14Mhz, not sure why the set_freq call multiplies it by 100, I need to work that one out!

The way I read the code, set_freq() uses a value that is in 1/100th Hz, and overall the code implements a radio transmission by encoding a message into a sequence of symbols, which are essentially offsets from a carrier frequency, and then the si chip is manipulated into generating the frequencies; essentially making a modulated output.
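The arithmetic being described, as a small sketch: convert the carrier from hertz to 1/100 Hz, then add the symbol number times the tone spacing. The 146 (about 1.46 Hz, from WSPR's 12000/8192 Hz tone spacing) is an assumed value for the sketch's tone_spacing, and symbol_freq_chz is a hypothetical name.

```cpp
#include <stdint.h>

// WSPR tone spacing is 12000/8192 Hz (about 1.4648 Hz), i.e. about 146 in
// 1/100 Hz units. ASSUMPTION: the sketch's tone_spacing holds a value like this.
const uint16_t WSPR_TONE_SPACING_CHZ = 146;

// Output frequency for one channel symbol (0..3 for WSPR's 4-FSK), in the
// 1/100 Hz units set_freq() takes: carrier * 100 + symbol * spacing.
uint64_t symbol_freq_chz(uint32_t carrier_hz, uint8_t symbol,
                         uint16_t spacing_chz) {
  return (uint64_t)carrier_hz * 100u + (uint64_t)symbol * spacing_chz;
}
```

Each result is what the sketch passes to si5351.set_freq(..., SI5351_CLK0) for one symbol; with only four possible symbols, there are only four distinct values for a given carrier.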

Cute. And it's about there that my radio and signal processing knowledge ends. :frowning: It LOOKS like there are only four different frequencies used, which means that instead of re-calculating the si5351 register settings for each of the four frequencies EVERY TIME, you should be able to pre-calculate the register settings and get rid of nearly all of the si5351 library. Just i2cwrite(registers, symbolindex); or the equivalent instead.
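A sketch of the pre-computed approach, under stated assumptions: the PLL is configured once at start-up, and only the eight Multisynth 0 parameter registers (42 to 49 in the Si5351 register map; check against the datasheet/AN619) change between tones, written in one burst to the default I2C address 0x60. The table values are placeholders, not real settings; they would have to be captured once from the full library for the actual carrier and tone spacing. Whether a bare Multisynth update switches tones cleanly (without a PLL reset) is also worth confirming on the bench.

```cpp
#include <stdint.h>
#include <string.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
// Host build: PROGMEM data is ordinary memory.
#define PROGMEM
#define memcpy_P memcpy
#endif

// ASSUMPTIONS: PLL set once; only Multisynth 0 (regs 42..49) differs per tone.
// WSPR is 4-FSK; JT9 would need 9 rows.
const uint8_t NUM_TONES = 4;
const uint8_t MS0_FIRST_REG = 42;
const uint8_t MS0_REG_COUNT = 8;

// PLACEHOLDER values: capture the real bytes once from the full library.
static const uint8_t tone_regs[NUM_TONES][MS0_REG_COUNT] PROGMEM = {
  {0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0},
  {0, 0, 0, 0, 0, 0, 0, 0},
};

// Copies one tone's register block out of flash and hands it to
// write_burst(first_reg, data, len), which performs the I2C transfer.
template <typename WriteBurst>
void set_tone(uint8_t symbol, WriteBurst write_burst) {
  uint8_t buf[MS0_REG_COUNT];
  memcpy_P(buf, tone_regs[symbol], MS0_REG_COUNT);
  write_burst(MS0_FIRST_REG, buf, MS0_REG_COUNT);
}

#if defined(ARDUINO) && defined(__AVR__)
#include <Wire.h>

// Burst-writes one tone's registers (Wire.begin() must have been called).
void si5351_write_tone(uint8_t symbol) {
  set_tone(symbol, [](uint8_t reg, const uint8_t *data, uint8_t len) {
    Wire.beginTransmission(0x60);  // Si5351A default address
    Wire.write(reg);               // first register; the chip auto-increments
    Wire.write(data, len);
    Wire.endTransmission();
  });
}
#endif
```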

I have now found someone else that is using their own si5351 code, so will check that out for size.

Yeah, what you probably want is a variant of that library that is optimized for this particular FSK use...

Optimising the code, removing floats and strings.

The code still includes floats. TinyGPS is full of them :frowning:

Is it essential that your telemetry be human readable? If not, you can format your transmissions using a much simpler protocol like CSV, you will release some memory and also your transmissions will be faster. Then, at the receive end, you can run the data through a conversion program that formats everything nicely for reading.

Some of your code can be further optimized, this is from TelemFunctions.h:

if (dbm_telemetry == 0) dbm_telemetry = 0;
  else if (dbm_telemetry == 1) dbm_telemetry = 3;
  else if (dbm_telemetry == 2) dbm_telemetry = 7;
  else if (dbm_telemetry == 3) dbm_telemetry = 10;
  else if (dbm_telemetry == 4) dbm_telemetry = 13;
  else if (dbm_telemetry == 5) dbm_telemetry = 17;
  else if (dbm_telemetry == 6) dbm_telemetry = 20;
  else if (dbm_telemetry == 7) dbm_telemetry = 23;
  else if (dbm_telemetry == 8) dbm_telemetry = 27;
  else if (dbm_telemetry == 9) dbm_telemetry = 30;
  else if (dbm_telemetry == 10) dbm_telemetry = 33;
  else if (dbm_telemetry == 11) dbm_telemetry = 37;
  else if (dbm_telemetry == 12) dbm_telemetry = 40;
  else if (dbm_telemetry == 13) dbm_telemetry = 43;
  else if (dbm_telemetry == 14) dbm_telemetry = 47;
  else if (dbm_telemetry == 15) dbm_telemetry = 50;
  else if (dbm_telemetry == 16) dbm_telemetry = 53;
  else if (dbm_telemetry == 17) dbm_telemetry = 57;
  else if (dbm_telemetry == 18) dbm_telemetry = 60;

That just screams, "please put me in a look up table!". If it's derived from a formula, you might just calculate the value. It looks very close to

dbm_telemetry *= 3;
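For the record, *= 3 drifts from the table (index 2 gives 6, not 7, and index 18 gives 54, not 60). The values are 10 × n / 3 rounded to the nearest integer, which integer arithmetic can compute exactly as (10 × n + 1) / 3. A sketch with a hypothetical function name; whether the division costs less flash than a 19-byte table would need checking with avr-nm:

```cpp
#include <stdint.h>

// WSPR power levels for index n = 0..18: 0, 3, 7, 10, 13, 17, ... 60 dBm.
// They are 10 * n / 3 rounded to the nearest integer; adding 1 before the
// truncating division gives exactly that rounding for every n.
uint8_t dbm_from_index(uint8_t n) {
  return (uint8_t)((10u * n + 1u) / 3u);
}
```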

I've engaged some of my Ham friends...

Hi Aarg

aarg:
Is it essential that your telemetry be human readable?

Some of your code can be further optimized, this is from TelemFunctions.h:

if (dbm_telemetry == 0) dbm_telemetry = 0;
  else if (dbm_telemetry == 1) dbm_telemetry = 3;
  else if (dbm_telemetry == 2) dbm_telemetry = 7;

That just screams, "please put me in a look up table!".

The format for the WSPR messages is very fixed unfortunately, and is already a fine balance of meeting the protocol specifications whilst encoding as much information as possible.

The dbm_telemetry for example can only use the exact list on the right hand side of the code above, but now implemented as a lookup saves me 100 bytes - thank you for the idea - it all counts :slight_smile:

  // Index 0..18 maps to the WSPR power levels, with 0 -> 0 dBm first.
  static const PROGMEM uint8_t dbm_lookup[19] = {0,3,7,10,13,17,20,23,27,30,33,37,40,43,47,50,53,57,60};
  dbm_telemetry = pgm_read_byte(dbm_lookup + dbm_telemetry);

Thanks very much
Kevin

Well, I have replaced the SI5351 library (thanks westfw for the links), and that saved me about 10k of flash and 100 bytes of RAM. I would still like to find a bit more flash saving, but things are looking a lot healthier now.

Thanks all.

Cheers
Kev

Post your new code (and a pointer to the library) and we’ll continue to analyze, if you want. It’s an interesting problem!
(A .zip file of the sketch directory would work better than individual files…)

There are some constant tables in JTEncode that could go into PROGMEM…
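A note on what PROGMEM buys, with a sketch: on classic AVRs like the 328, a plain const array is copied from flash into RAM at start-up, so moving such a table to PROGMEM saves RAM rather than flash. The catch is that every read must then go through pgm_read_byte() (or memcpy_P()), because indexing the array directly reads RAM at that address instead. example_table and table_at are hypothetical names with made-up values:

```cpp
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
// Host build: no separate flash address space, so read directly.
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

// Hypothetical constant table. Without PROGMEM, avr-gcc would also
// reserve RAM for it and copy it there at start-up.
static const uint8_t example_table[8] PROGMEM = {10, 20, 30, 40, 50, 60, 70, 80};

// Every read of a PROGMEM table must go through pgm_read_byte().
uint8_t table_at(uint8_t i) {
  return pgm_read_byte(&example_table[i]);
}
```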