Program storage space usage monitoring

Hi,

A project with Arduino IDE 2.2.1, the Blue Pill.

Compiler output:

...
Sketch uses 40828 bytes (62%) of program storage space. Maximum is 65536 bytes.
...

As a Blinky it used about 40%. The project functionality is still nonexistent, just a few libraries added:

#define _TASK_SCHEDULING_OPTIONS
// #define _TASK_TIMECRITICAL
// #define _TASK_SLEEP_ON_IDLE_RUN
#include <TaskScheduler.h>

// Display mode: 1 - I2C; 2 - 10-pin.
#define _LCD_TYPE 1
#include <LCD_1602_RUS_ALL.h>

#include <EasyButton.h>
#include <STM32encoder.h>

LCD_1602_RUS lcd(0x27, 16, 2);
...

This is a hint for me that I have to use the program storage space sparingly. Is the Output console the only tool to monitor the program storage space usage? Something more sophisticated is very welcome :blush: .

Why? Your program will not shrink/expand after flashing.


I think the best tool is the map file provided by the linker; with the STM32 it is possibly already available in the sketch temp folder.

Unfortunately general purpose libraries can eat up the space pretty quickly.

Eh? The development of my program hasn't yet been really started. In all probability, I'll have to ditch this or that to stay in available memory. I'm interested in an analytic instrument that will show which uses what :D. That's what the theme is about.

I would not worry till it's too late :smiley:

A simple sketch (and any sketch for that matter) does a load of work behind the scenes, mostly to do the initialisation of the processor's registers as needed; this is a once-off thing.

Your taskscheduler might also be expensive; no idea.

Just for fun, compile the same sketch for 3 blinking LEDs. How much does the required storage space increase?

It so happened, that those three blinking LEDs were already there :slight_smile: . I just removed them (callbacks, initialization and everything), for fun.

Sketch uses 40316 bytes (61%) of program storage space. Maximum is 65536 bytes.

Total gain about 500 bytes.

Assuming you are using STM32duino with a 64K Blue Pill config, a map file is produced in the sketch "build" folder. That folder is something like "C:\Users\bobco\AppData\Local\Temp\arduino\sketches\621650FED2CA4AD36C0A52D03AFE0F22"

In there is a .map file called <sketch name>.ino.map. Unfortunately that file has a lot of cruft and is not easy to read. The interesting section starts at a line beginning with ".text". The following lines have a name, address and size, as well as the name of the file that created the object.

Probably you have not shown the part of your code that consumes most space.

So the developers preferred to hide the ugly file, instead of creating a friendly tool for it :slight_smile: .

Start and do this.

Find "nm" - it doesn't have to match architectures, any nm should work.
Something like: "C:\Users\westf\AppData\Local\Arduino15\packages\arduino\tools\arm-none-eabi-gcc\7-2017q4\bin\arm-none-eabi-gcc-nm.exe"

Then do "...arm-none-eabi-gcc-nm -SC --size-sort myobjectfile.elf"
(Finding the .elf file can be tricky.)
It will show you all the functions and data in your sketch, sorted by how big they are.
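If you want to post-process that listing on a PC, here is a minimal sketch, assuming each line has the "address size type name" layout that nm -S prints, with address and size in hex. The names Symbol, parseNmLine and totalCodeSize are made up for illustration; they are not part of nm or of any Arduino tool.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One symbol as printed by `nm -S`: hex address, hex size, type letter, name.
struct Symbol {
    std::uint32_t address;
    std::uint32_t size;
    char type;
    std::string name;
};

// Parses one line of the form "<hex address> <hex size> <type> <name>".
// Returns false for lines that do not match, e.g. symbols printed without a size.
bool parseNmLine(const std::string& line, Symbol& out) {
    std::istringstream in(line);
    std::string type;
    if (!(in >> std::hex >> out.address >> out.size >> type)) return false;
    if (type.size() != 1) return false;
    out.type = type[0];
    in >> std::ws;
    std::getline(in, out.name);  // rest of the line, since demangled names may contain spaces
    return !out.name.empty();
}

// Sums the sizes of all code symbols (type 'T' or 't') in the given lines.
std::uint32_t totalCodeSize(const std::vector<std::string>& lines) {
    std::uint32_t total = 0;
    for (const std::string& line : lines) {
        Symbol s;
        if (parseNmLine(line, s) && (s.type == 'T' || s.type == 't')) {
            total += s.size;
        }
    }
    return total;
}
```

Feed it the lines of the nm output and you get sizes as plain numbers, which makes it easier to add them up per library or compare two builds.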


Do you really need a task scheduler to write pseudo-parallel tasking code?

You don't have to. The way to write Main Loop tasking on a single thread was known over 40 years ago and spread in the 80's computer boom; it began with EE's in the 70's, if not the 60's. Every time void loop() runs again, time has advanced. Do it right and every task either runs one step or returns quickly (wait not over), so an Uno can read multiple pins tens of times per millisecond. Meanwhile, the scheduler switches tasks every how many ms?
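A minimal sketch of that Main Loop style, assuming two made-up tasks: one that toggles an LED flag every 500 ms and one that does its work on every pass. The millis() here is a stand-in so the code also compiles on a PC; on a board you would drop it and use the core's millis(). All names and the period are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in clock for PC builds; on a real board remove these two lines
// and rely on the Arduino core's millis().
static std::uint32_t fakeNowMs = 0;
std::uint32_t millis() { return fakeNowMs; }

const std::uint32_t kBlinkPeriodMs = 500;  // hypothetical blink period
std::uint32_t lastToggleMs = 0;
bool ledOn = false;            // stands in for a digitalWrite() target
std::uint32_t pollCount = 0;   // stands in for "read a pin"

// Task 1: toggles the LED flag when the period has elapsed, otherwise returns at once.
void blinkTask() {
    std::uint32_t now = millis();
    if (now - lastToggleMs < kBlinkPeriodMs) return;  // wait not over
    lastToggleMs += kBlinkPeriodMs;  // unsigned math stays correct across millis() rollover
    ledOn = !ledOn;
}

// Task 2: does one short step on every pass through loop().
void pollTask() { ++pollCount; }

// Each pass gives every task one chance to run; nothing blocks.
void loop() {
    blinkTask();
    pollTask();
}
```

Neither task ever waits, so the polling task gets to run on every pass no matter how long the blink period is.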

If I have it right, this task scheduler does just that: takes the time checking out of sight, no more.

Removed, for fun, any trace of the TaskScheduler from the sketch:

Sketch uses 39080 bytes (59%) of program storage space. Maximum is 65536 bytes.

The overhead looks more than acceptable, IMHO.

Tricky it is. I can't see any .elf file possibly generated by the Arduino IDE.

$ find /tmp -name \*.elf

The .elf should be in the same folder as the .map file.

How thick are the task time slices?
If it uses interrupts, every trigger costs 85 cycles plus the IRQ code overhead, so how often does the task switcher switch tasks?
How many tasks each get a slice of run time?

It can interact with interrupts, but it doesn't need them. You define callbacks and the respective call periods, you register them, you call the scheduler from loop(). The scheduler calls a callback if the time has come. The sequence of checking is the sequence of callback registering. Nothing fancy, everything like it always used to happen inside loop(). Plus readability, plus certain minor benefits. Unless I miss something crucial :slight_smile: .
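To illustrate the mechanism described above (register callbacks with periods, call the scheduler from loop(), callbacks checked in registration order), here is a minimal sketch. It is not TaskScheduler's API or implementation; MiniScheduler, add, execute and the example callbacks are made-up names, and the current time is passed in explicitly so it can be tried on a PC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef void (*Callback)();

struct PeriodicTask {
    Callback callback;
    std::uint32_t periodMs;
    std::uint32_t lastRunMs;
};

// Hypothetical cooperative scheduler: no interrupts, no context switching.
class MiniScheduler {
public:
    // Registers a callback with its call period; returns false when the table is full.
    bool add(Callback cb, std::uint32_t periodMs, std::uint32_t nowMs) {
        if (count_ == kMaxTasks) return false;
        tasks_[count_++] = PeriodicTask{cb, periodMs, nowMs};
        return true;
    }

    // Call from loop(): runs each callback whose period has elapsed,
    // checking them in the order they were registered.
    void execute(std::uint32_t nowMs) {
        for (std::size_t i = 0; i < count_; ++i) {
            PeriodicTask& t = tasks_[i];
            if (nowMs - t.lastRunMs >= t.periodMs) {
                t.lastRunMs = nowMs;
                t.callback();
            }
        }
    }

private:
    static constexpr std::size_t kMaxTasks = 8;
    PeriodicTask tasks_[kMaxTasks];
    std::size_t count_ = 0;
};

// Example callbacks (hypothetical): they only count how often they ran.
int fastRuns = 0;
int slowRuns = 0;
void fastTask() { ++fastRuns; }
void slowTask() { ++slowRuns; }
```

On a board, loop() would just call execute(millis()). A callback that blocks still holds up everything else, which is the point made in the following posts.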

If you run blocking code and your tasker does the switching, the blocking blocks loop() so how does the tasker switch except through interrupt?

With blocking code and switched tasks, execution has to be stopped, context-saved, context-switched to another task and run for some period long enough to justify the switching since running 100 task slices in 1 ms would be about 10% efficient. That's the price of context switching without blocking at even 10 ms pin watching.

Slice the code instead of time and you don't need context switching to be watching the pin closer to 10 us.

Do what you want. Just know the difference and buy a board with a fast AMD when you need speed that could be gotten from an Uno R3.

Post a sketch that is complete enough to compile (and specify where the libraries come from), and we could do the analysis. For a relatively small sketch (asciiTable example - compiles to ~20k) the big functions are:

080035cf 00000072 T CDC_ReceiveQueue_ReserveBlock
08002ea9 0000007c T HAL_UART_ErrorCallback
08000189 0000007c T loop
080038e1 00000084 t USBD_CDC_Control
08003781 0000008c T CDC_ReceiveQueue_Read
080004dd 0000008c T HAL_DMA_Abort_IT
08002307 00000092 T USB_EPClearStall
080042d5 000000a4 T HAL_PCD_MspInit
08003c3b 000000a6 T USBD_LL_DataInStage
080028f5 000000a8 T set_GPIO_Port_Clock
08001a13 000000b6 t UART_Receive_IT.isra.0
08004965 000000bc t USBD_CDC_Init
0800380d 000000ce T CDC_ReceiveQueue_ReadUntil
08002ff9 000000d8 t _GLOBAL__sub_I__Z22stm32_interrupt_enableP12GPIO_TypeDeftSt8functionIFvvEEm
08000581 000000e4 T HAL_PCD_Init
08004825 000000f0 t USBD_CDC_Setup
08001765 000000f4 T HAL_RCCEx_PeriphCLKConfig
0800162d 00000138 T HAL_RCC_ClockConfig
080040b7 0000013a T USBD_StdEPReq
0800185d 00000144 T HAL_TIM_IRQHandler
08002cad 0000017c T pin_function
08002121 000001a6 T USB_DeactivateEndpoint
08002a5d 000001f8 t pin_SetF1AFPin
08003dd5 00000288 T USBD_StdDevReq
08001e61 000002c0 T USB_ActivateEndpoint
08001ac9 000002d8 T HAL_UART_IRQHandler
080012a9 00000338 T HAL_RCC_OscConfig
080023d9 000004e4 T USB_EPStartXfer
080006c1 000009c4 T HAL_PCD_IRQHandler

The STM32 "HAL" (Hardware Abstraction Layer) code is well known to be pretty "bloated" (check out the 800+ bytes to configure the clock, HAL_RCC_OscConfig!), but it's essentially the price you pay for wide device support. If you only need Blue Pill compatibility, you can probably save a lot of space using the older Roger Clark core
(also not as easy to install, and I don't know if it'll support all the libraries you want to use).