Programming efficiency

Working with limited memory... exploring more efficient programming. I’ve noticed port manipulation makes a difference in compile size. Does the compile size get smaller using assembly? I’m at that "everything looks alien" phase. Curious

wolframore:
Does the compile size get smaller using assembly?

Always.

It's funny you mentioned that 'alien phase'.

I'm not sure I agree with that - I'm not sure you can do better in assembly vs direct port manipulation from C.

I apologize. I misread the question.

I have 3-channel switching done with port manipulation that compiled to 138 bytes, with some byte variables tracking time in nested loops. The Arduino blink sketch compiles to around 1K. I'm not convinced that assembly would reduce the size, since instructions and variables have a cost either way. In any case, every time I learn something new I find tools that can make life easier (or harder); looking forward to the challenge.
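Roughly the kind of thing I mean (a minimal sketch, assuming three outputs on PB0-PB2 of an ATmega328; the pin choice and delays are just placeholders):

void setup() {
  DDRB |= _BV(PB0) | _BV(PB1) | _BV(PB2);      // PB0-PB2 as outputs, set once
}

void loop() {
  PORTB = (PORTB & ~0b00000111) | 0b00000101;  // channels 1 and 3 on, channel 2 off, in one write
  delay(100);
  PORTB &= ~0b00000111;                        // all three channels off
  delay(100);
}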

@Coding - no apology necessary, I’m sure assembly compiles smaller than similar code written in Arduino C. My question wasn’t phrased well. Words are not my strength.

I would put effectiveness a long way ahead of efficiency.

And it is much easier to define effectiveness - does it actually do what you want?

There is little point wasting time or brain power reducing the size of a program that already fits comfortably in Arduino memory or reducing the number of machine cycles in a program that already runs fast enough.

On the other hand there can be a lot of advantage in ease of maintenance and future development by avoiding the use of clever techniques that save a byte here or a machine cycle there but which make a program hard to understand. Write code that is easy for humans first. Only optimize if it is essential to get the program to meet the performance spec. And don't make the spec any more demanding than is actually required.

...R

Hi Robin... I agree with your logic. The 328 has 32K of flash... my first computer was an 8-bit machine with 16K of RAM, and it was plenty for a lot of amazing things. And I suppose on the other side you could move to a Cortex running at 48 MHz with 256K and push all the bloated code you want with no ill effects.

I'm trying to push the ability of some of the lower chips (1, 2, 4, and 8K) for various reasons. I'm looking for efficiency. The nice thing is that if you start with a good series they can be upgraded to the next size up while staying pin compatible... but it bumps up the pricing a bit each time... until the point where I wonder why I didn't just use a cheap STM32 M0 (there are some reasons, but too much detail and not relevant to our topic).

Constraints:
1. Size
2. Cost
3. Power

for which the 8-pin SOIC and TSSOP varieties would work... Also looking at the 14-pin parts in the 20-pin TQFN package as a possibility... but I don't need all those pins.

Just trying to squeeze the last drop out.

At some point there are diminishing returns but I'm not there yet.

So to get back to the point - are there any other ways of getting more for less flash memory???

wolframore:
So to get back to the point - are there any other ways of getting more for less flash memory???

Use the appropriate datatype. If a value is byte-sized, use a byte-sized datatype.

Tell the compiler. This...

byte x;
byte y;
byte z;

z = x + y;

...generates more code than this...

byte x;
byte y;
byte z;

z = (byte)((byte)x + (byte)y);

...even though they are functionally equivalent.

Turn off / disable automatic inlining. The compiler favours speed over size just a bit too much for programming-in-the-very-tiny.
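For the inlining point, a sketch of one way to do it with AVR-GCC, assuming you mark helpers yourself (pulsePin and the PB0 mask below are made-up examples); the blanket route is a -fno-inline-small-functions style build flag set outside the sketch:

// noinline keeps a single copy of the helper in flash and makes a real call,
// which can shrink total size when it is called from several places.
void __attribute__((noinline)) pulsePin(uint8_t mask) {
  PORTB |= mask;               // pin high
  delayMicroseconds(10);
  PORTB &= ~mask;              // pin low
}

void setup() {
  DDRB |= _BV(PB0);            // PB0 as output
}

void loop() {
  pulsePin(_BV(PB0));
  delay(100);
}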

Hmm... I would have thought that the byte type was set in the declaration, so functionally and otherwise the two are identical... I'm curious to test that and figure out why...

I have had issues where a variable overflows and I've had to change its type or range to make it work. OK, my theory is that the summation can promote the bytes to a larger type, which is then truncated back to a byte on assignment... does that sound right? So the (byte) cast on the sum of x and y removes any doubt and any placeholder for the overflow. Would this be the same as your second example?

byte x;
byte y;
byte z;

z = (byte)(x+y);

Tell the compiler. This...

byte x;
byte y;
byte z;

z = x + y;

...generates more code than this...

byte x;
byte y;
byte z;

z = (byte)((byte)x + (byte)y);

...even though they are functionally equivalent.

No, it doesn't. :slight_smile:

I entered your code in Compiler Explorer to check; you can see the code AVR-GCC generates for this example here: Compiler Explorer. Both versions end up with exactly the same assembly.
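The promotion to int described above is real in the language rules; the optimizer just discards it whenever the result is immediately truncated back to a byte, which is why both versions compile identically. A small sketch (values picked arbitrarily) of where the intermediate width actually shows:

void setup() {
  Serial.begin(9600);
  byte x = 200;
  byte y = 100;
  Serial.println((x + y) > 255);          // prints 1: the sum is computed as int, so it is 300
  Serial.println((byte)(x + y) > 255);    // prints 0: the cast wraps the sum to 44
}

void loop() {}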

Huh. Looks like the optimizer has improved.

But back to the issue at hand: there is usually a trade-off between speed & flash usage, speed & RAM usage, and RAM & flash usage. It all depends on the program (and on the chip used).

And there is the consideration that Robin mentioned about a clear, readable programming style versus a faster or smaller program (also depending on the chip). For instance, I wrote a program for controlling addressable LEDs, and on an ATmega328 RAM is so limited that the input buffer ended up doubling as the output buffer (receiving 512 bytes of DMX and sending the NeoPixel signal, I had space for 170 extra RGB LEDs). Speed did not turn out to be a very scarce commodity, though I avoided float computations as much as possible and did things like uint8_t x = (uint16_t) y * z / 255; to perform a 16-bit computation on 8-bit values (rather than using 16-bit variables throughout the program when not required).

If several boolean variables are needed you can save them in a single 8-bit variable, at the expense of some flash space (and readable code); a quick sketch follows below.

For some patterns the functions were so similar that they could be considered versions of the same thing, and by adding a parameter I could reuse big parts of the same function, saving a lot of flash memory (but less readable code). All in all, 32 KB of flash is already quite a lot of space if you write most of the code yourself (twice the amount I had on my ZX81). 2 KB of RAM is not an awful lot (my ZX81 actually didn't have any flash; it was all RAM, divided between the program, the screen and the variables), and the 328 is quite fast (though not the fastest out there by a mile...). I share your passion for minimalism, and the ATtiny melody player was quite a fun project; amazing what you can do with 1 KB...
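For reference, a sketch of the boolean-packing idea mentioned above (the flag names and bit positions are arbitrary placeholders):

uint8_t flags = 0;                         // eight flags in one byte of RAM

const uint8_t FLAG_RUNNING = _BV(0);
const uint8_t FLAG_ERROR   = _BV(1);
const uint8_t FLAG_DMX_OK  = _BV(2);

void setFlag(uint8_t f)   { flags |= f; }
void clearFlag(uint8_t f) { flags &= ~f; }
bool testFlag(uint8_t f)  { return flags & f; }

void setup() {
  setFlag(FLAG_RUNNING);
  if (testFlag(FLAG_RUNNING) && !testFlag(FLAG_ERROR)) {
    // normal operation
  }
}

void loop() {}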

Yes, nowadays you practically can't beat the compiler.

My favorite example is how the compiler handles the Gaussian sum (1+2+...+n). Looks like it is at least as smart as Gauss was :slight_smile:

constexpr int gauss(int n)
{  
  return n == 0 ? 0 : n + gauss(n-1);  // recursively calculate the gaussian sum 1 + 2 + 3 + ... + n
}

int main() 
{      
    const int sum = gauss(100);  // result should be 5050
    int array[sum];
    return sum; 
}

Here's the funny result: Compiler Explorer

Hehe smart compiler. That is a great bit of programming wizardry.

Deva, I never imagined that 1K of flash would be of any use. Now I’m having fun trying to push it to its limits. I’ve designed a little board with a power supply and switched outputs for some experimenting. I have a severe dislike of breadboards.

I’m sure that I will end up using the 4 or 8K versions in the end, since I keep getting more and more requirements requested, but I really like the simplicity of the Tiny13s. I’ll get there, and I'm having fun at it.

I come here from C#, where you are working with 1000× more RAM and speed. Now I am looking to do shortened versions of complex tasks with far, far fewer resources on Arduino. If you are doing simple, typical stuff like blinking a couple of lights or spinning a couple of motors, I agree with Robin... if it works, then who cares?

But since these boards are limited, if you are looking to do anything advanced, then efficiency is probably more important than it is in any other type of programming.

But how basic operators work, or how the compiler responds to certain things... let it go.
The friendly, expert developers at Arduino have worked, and still work, to make the compiler as efficient as possible. Let that be their job.

Of course there are some common-knowledge things that hold true across any language, such as doing multiplication before division or using the smallest data type (a quick example below).
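For example (just an illustration with an ADC-style value; the numbers are made up), integer division throws away the remainder, so the order matters:

void setup() {
  Serial.begin(9600);
  uint16_t raw = 500;                          // e.g. an analogRead()-style value
  Serial.println(raw / 1023 * 100);            // prints 0: 500/1023 truncates to 0 first
  Serial.println((uint32_t)raw * 100 / 1023);  // prints 48: multiply first, then divide
}

void loop() {}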

If you are worried about efficiency in your project, focus on the overall approach or "system" you build with your code, so that you run calculations as rarely as possible and use the smallest number of variables you need. You can do a lot with these little boards.

Speed and variable limits are the two challenges particular to Arduino.

AFAIK the compiler does not include code for unused functions or variables.

Try it out, IIRC using Serial costs about 2 or 3K.
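Easy to check: compile the sketch below as-is, then again with the two Serial lines commented out, and compare the flash figure the IDE reports (the exact difference depends on the core version):

void setup() {
  Serial.begin(9600);
  Serial.println(F("hello"));
}

void loop() {}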

@smoke, I’m certain that’s the case. Compile an empty setup() and loop(); that’s the minimum compiled size. It’s been pointed out that compiling the basic blink sketch takes around 1K because of the pin-mapping lookups and other safety steps that digitalWrite() performs to make sure a noob won’t burn the chip out from the get-go. The issue is that all of this safety code runs on every call, slowing down a simple blink and bloating the compile. The solution is to just toggle the pin directly, but that requires the port registers to be set up correctly. For prototyping, digitalWrite() is fine; it lets you quickly check something.
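A sketch of the bare-register version of blink being described, assuming the usual Uno LED on PB5 (pin 13); the DDRB line is the setup work that pinMode()/digitalWrite() would otherwise hide:

void setup() {
  DDRB |= _BV(PB5);            // pin 13 / PB5 as output (what pinMode() does)
}

void loop() {
  PINB = _BV(PB5);             // writing 1 to a PIN bit toggles the output on these AVRs
  delay(500);
}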

There are situations where a lower-level approach should be explored:
If the project slows down as the code gets bigger
If it’s vital to sample or output at very high frequencies
If your code has grown and is now in danger of being too big for your chip
If you’re just OCD and love this stuff
If you’re the sort who opened things up as a kid

I’m not here to say this is the best approach or the only way to do things. It’s just what I would like to do, and I'm finding some interesting benefits.

There are other ways to program and compile than using the Arduino IDE. Arduino is great, but it was designed to make things easy, and easy is not always most efficient. Look at CircuitMaker: it uses visual blocks to code. It works, but... anyhow, it’s a great way to get beginners to play with Arduinos. I got my son to finally program an Arduino, but I hope he starts to dig into the code and not just keep playing with the blocks.

There are a lot of alternative ways to code a given algorithm or task, and that can have a huge impact on code size or speed. Beginners here often opt to use the String class over char arrays, perhaps because of some previous coding experience with another language (e.g., VB). We have also seen many examples where sprintf() is used when a series of str*() calls could save about 1K of memory. We have all seen algorithm choices based upon a misunderstanding of the issues at hand (e.g., Boyer-Moore string search over brute force, Shell sort over Bubble sort). All of us here have helped newbies see the light in one or more such areas, but a good measure of writing good code boils down to a better understanding of the environment you're working in...it's a learning experience.
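As a rough illustration of the sprintf() point (the buffer size and value below are invented), the formatted call drags in the vfprintf machinery, while the str*()/itoa version builds the same message with much smaller routines; compile each variant separately to see the difference:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char line[32];
int rpm = 1200;                       // made-up value for illustration

void setup() {
  // Formatted version: convenient, but pulls in the printf machinery.
  // sprintf(line, "rpm=%d", rpm);

  // Leaner version of the same message with str*()/itoa calls.
  char num[8];
  itoa(rpm, num, 10);
  strcpy(line, "rpm=");
  strcat(line, num);
}

void loop() {}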

You might want to take a look at:
Analysis-of-an-empty-arduino-sketch
(also available, more or less, here: https://forum.arduino.cc/index.php/topic,5680.0.html )

In general, you can "easily" get assembly language to be smaller than C for the case of very small programs, essentially by ignoring a bunch of principles that make larger programs work better. You can easily have global register variables, different calling conventions for each subroutine, minimize each operation to exactly what is needed, and so on. (Arduino's digitalWrite() function isn't big and slow because its authors were stupid or lazy, it's big and slow because it handles a very GENERAL case of pin-setting...)
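For what it's worth, the "global register variables" trick does exist as an AVR-GCC extension outside of plain Arduino sketches; a hedged sketch (bare avr-gcc, ATmega328-style registers, and you'd also have to reserve r3 for the whole build with an -ffixed-r3 style flag so nothing else clobbers it):

#include <avr/io.h>
#include <avr/interrupt.h>

register uint8_t tick_count asm("r3");   // the variable lives in r3, never in RAM

ISR(TIMER0_OVF_vect) {
  tick_count++;                          // a single register increment, no load/store
}

int main(void) {
  TIMSK0 |= _BV(TOIE0);                  // enable the Timer0 overflow interrupt
  TCCR0B |= _BV(CS01);                   // start Timer0 with a clk/8 prescaler
  sei();
  for (;;) { }
}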

But this tends to fall apart for larger programs. They get buggy and difficult to debug or manage.
I might write assembly language in order to fit "as much functionality as possible" into a 1k ATtiny13, but trying to convert a 40k ATmega328 program to fit in the available 32k, by converting Arduino to Assembly Language? Probably not...

There are other issues as well. Your Arduino sketch might easily port to a bigger, faster board, making any "efficiency improvements" unnecessary. But not an assembly language program...

Example: I've been maintaining the non-Arduino version of Optiboot for a while now. Its binary must be less than 512 bytes. Each time a new version of the C compiler comes out, I have to check whether "something has changed" that pushes it over the limit. So I've occasionally considered just re-writing it in assembly language (which would not have that problem). ("Nerd Ralph" has fit increased functionality in less than 256 bytes, so it's clearly possible...)

But Optiboot now supports "many" different AVR CPUs with slightly different architectures, and I'm pretty sure that it would be a big pain to conditionalize the ASM code to support all of their "slightly different" quirks that are made invisible by C...