I want to hand-optimize the assembly code generated by the compiler. I have spent weeks looking for this on Google, with no success. Can anyone help?
Thanks!
http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1281448690/0
Read this. In AVR Studio it is a lot easier to access the assembly file to edit it, but I doubt that you can optimize the assembly any further than the gcc C compiler already does. What you can optimize are the serial, two-wire, ADC and some other functions provided by the IDE: they are written in C++, which on AVRs tends to be less efficient than pure C, and they carry a lot of cruft because they are made to be easy to use, not to be code efficient.
Is there any other way of doing this more directly, as I run linux/OSX and cannot install AVR Studio?
Thanks again, and thanks for the quick reply.
Hand optimizing compiler code is a bad idea. This will lead to big trouble. A better idea is to implement the really critical stuff in assembler and use C for the rest.
Learn about direct port manipulation first. You will need it anyway if you intend to code assembler.
This will give a significant performance boost over the digitalWrite stuff.
If this is not sufficient then you might want to know this:
http://www.atmel.com/dyn/resources/prod_documents/DOC1022.PDF.
Of course you will also need to read and understand the datasheet.
If you want to do just some part in assembler you can do this with gcc. Google for avr gcc assembler to find lots of useful links like those below:
http://www.nongnu.org/avr-libc/user-manual/inline__asm.html
Be prepared for a good deal of confusion and frustration. This is considered part of the fun.
Udo
In older versions of the IDE the complete hex, map, asm, lst, and all the other files generated by the compiler were easy to access. Now they are a bit more hidden, but search the forum for how to find them. Then you just need to edit the .asm, recompile from the command line, and upload from the command line using avrdude. The gcc compiler will generally outsmart you, though, as it is pretty decent at doing its thing. As already said, you can use inline assembly in your C code; have a look at this nice tutorial about that:
Hand optimizing compiler code is a bad idea. This will lead to big trouble. A better idea is to implement the really critical stuff in assembler and use C for the rest.
The proper way to optimize your code is:
Profiling the code means setting a timer at some "beginning" and then comparing the amount of time it takes for the code to get to an "ending" point. Say you wanted to know how long a loop takes to execute: Set a timer at the beginning, and after the loop, look at how much time has elapsed. On the Arduino, the millis()/micros() functions can be used for this.
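As a rough illustration of the timing pattern described above, here is a minimal sketch in plain C. The names time_section, fake_clock and busy_loop are made up for this example and are not part of the Arduino core; the fake clock only exists so the pattern can be tried off-target. On an Arduino you would presumably pass micros as the time source instead.

```c
#include <stdint.h>

/* Hypothetical helper types for this example (not an Arduino API). */
typedef uint32_t (*clock_fn)(void);  /* time source, e.g. micros on an Arduino */
typedef void (*work_fn)(void);       /* the section of code being measured */

/* Take a timestamp at the "beginning", run the code, and return how many
 * ticks had elapsed at the "ending". */
uint32_t time_section(clock_fn now, work_fn work)
{
    uint32_t start = now();
    work();
    /* Unsigned subtraction stays correct across a single counter wrap. */
    return now() - start;
}

/* Stand-in clock (assumption): advanced by the workload itself. */
uint32_t fake_ticks = 0;
uint32_t fake_clock(void) { return fake_ticks; }

/* Example workload: a 10-iteration loop that pretends to cost 3 ticks each. */
void busy_loop(void)
{
    for (int i = 0; i < 10; i++) {
        fake_ticks += 3;
    }
}
```

Calling time_section(fake_clock, busy_loop) measures just that one loop; comparing the number before and after a change tells you whether the change was worth keeping.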
Once you've located your "slow" areas of code, optimize them, starting with the code inside inner loops (and perhaps the loop itself). You might find that you can use inline assembler, and "unroll" the loop, depending on its complexity (I am not sure if the optimizer in the compiler will do loop unrolling - something to check out). Some loops can be unrolled, others can't. What "unrolling" means is, say you have a loop that adds a value to a variable 10 times. Instead of this:
int x = 0;
for (int i = 0; i < 10; i++) {
    x++;
}
You could do this:
int x=0;
x++;
x++;
x++;
x++;
x++;
x++;
x++;
x++;
x++;
x++;
of course, that could be simplified into:
int x=0;
x += 10;
...but that is a very simplified example. The idea is that, instead of implementing a loop, you duplicate the loop body once for each iteration. Typically you would do this on the assembled version of the code and incorporate the unrolled part as inline assembler. Unrolling regular C code probably won't help much.
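For a slightly less trivial picture of the transformation, here is a sketch of a partially unrolled loop in plain C. The function names and the sample array are invented for this example. As said above, unrolling at the C level may not gain anything, and the compiler might do it on its own, so treat this as an illustration of the idea and profile before keeping it.

```c
#include <stdint.h>

/* Example data, invented for this sketch. */
const uint8_t sample[7] = {1, 2, 3, 4, 5, 6, 7};

/* Rolled version: one compare-and-branch per element. */
uint16_t sum_rolled(const uint8_t *data, unsigned int n)
{
    uint16_t sum = 0;
    for (unsigned int i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

/* Unrolled by four: one compare-and-branch per four elements,
 * followed by a short tail loop for the leftover 0..3 elements. */
uint16_t sum_unrolled4(const uint8_t *data, unsigned int n)
{
    uint16_t sum = 0;
    unsigned int i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += data[i];
        sum += data[i + 1];
        sum += data[i + 2];
        sum += data[i + 3];
    }
    for (; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
```

Both functions return the same sum for any length; the unrolled one trades a larger program for fewer loop-overhead instructions, which is the same trade you would be making by hand in assembler.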
You want to concentrate on the code in the inner loops first, because they will ultimately cause the most bang-for-the-buck when you can optimize them.
Also keep in mind that it is possible to "over optimize" a program; in fact, it is possible to optimize a set of code in such a way that the compiler tries to do its thing and makes it run slower. Keep this in mind; this is why it is important to profile your changes after you have done them, and keep records of what you have done - so you know if a given optimization has made a real difference.
Keep the good optimization in; if an optimization only shaves off a slight bit of time, it probably isn't needed, unless it is a very time-critical piece of code. Also, document in the code with comments the optimizations you make and why, especially if the code seems obscure or hard to read. You might find that in the future you need to update it, and not understand what you were doing when you made the optimization (at best, you will waste a fair bit of time getting back in the groove to make your mods).
Repeat all of this until it is "good enough"; if it performs the job properly, and fast enough for it to do its job while not frustrating the user, it is probably as fast as it needs to be. Leave it alone at that point, consider it "done". Don't over-optimize, trying to shave off a millisecond here or there (unless, as noted, the code is time critical and every millisecond counts). You'll probably not need to optimize the entire assembled version of the code at all, but if you do want to, follow the above steps again; likely you'll do some loop unrolling (more common in assembler), and you might find that you can insert that same unrolled assembler inline in your C code instead. Do that, keep it neat, add comments to the code as to why the inline assembler is there.
@cr0sh: my point was that hand optimizing compiler generated assembler code is a bad idea. Of course optimizing the source code is good idea. My point is that intermediate machine generated code should be left alone.
With regard to your suggested approach: yes, this is one of the standard approaches. Another approach is to
Here the point is that sometimes it pays off to change the architecture / choice of algorithm. This does not contradict your approach, both can and should be utilized.
Udo
Thanks guys, but all I need is how to optimise the assembly. I already know a lot about the C compiler and a lot of assembly, both for AVR and x86. I shall look for the hex files and use a disassembler to optimise, or read the code back with avrdude, then use avrdude to re-upload.
Thanks all!
EDIT:
Sorry for the late reply, my emails weren't coming through so I didn't know I had any replies!
Even simple code seems to take hundreds of instructions, why is this?
Code:
void setup(){
    DDRD=1;
}
void loop(){
    PORTD=1;
}
Although this code has no use at all, it produces a few hundred assembler instructions, which is really unnecessary. This could be achieved with code like:
sbi ddrd,0
loop:
sbi portd,0
rjmp loop
Is this anything to do with the bootloader?
Thanks!
Nothing to do with the bootloader. I'm sure the Arduino IDE links in some library or initialization functions, as even the minimal sketch shown below compiles to 448 bytes for a 328 chip.
void setup(){}
void loop(){}
Lefty
Even simple code seems to take hundreds of instructions, why is this?
You're missing the difference between a few instructions, and a C program. Don't forget your:
void setup () {}
void loop () {}
is just part of a C program :
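int main (void)
{
    init ();        // core initialization (e.g. the timer behind millis())
    setup ();       // your one-time setup
    for (;;)
        loop ();    // your loop(), called forever
}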
so most of your "hundreds of instructions" are probably in "init()" or even before "main".
You need to work out what you want to "optimise" and what those optimisations (speed, program memory, RAM memory...) need to be.
Even simple code seems to take hundreds of instructions, why is this?
It doesn't. A program containing simple code ends up containing hundreds of instructions, because it includes vectors, startup code, libraries, and core operating functions (like a timer ISR to count millis()), but your "simple code" actually compiles to very few instructions.
The first thing to learn to do if you want to optimize your code is to analyze the final program with objdump. It's been discussed somewhat in earlier messages; search the forums...
Your sample program compiles to 456 bytes. A completely empty Arduino sketch compiles to 448 bytes. So your example code is 8 bytes (4 instructions).
Note that your sample sketch and your sample assembler don't quite do the same things. If I correct the C to use "|=", it compiles to 452 bytes (two instructions); it's hard to imagine doing much better than that.
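To make the difference between "=" and "|=" concrete, here is a small sketch in plain C. It uses an ordinary variable, fake_portd, as a stand-in for the real PORTD register so it can be tried off-target; that variable and the two function names are assumptions made up for this example.

```c
#include <stdint.h>

/* Stand-in for an 8-bit I/O register such as PORTD (assumption: on a real
 * AVR the register itself comes from <avr/io.h>). */
volatile uint8_t fake_portd = 0;

/* Plain assignment, as in the sample sketch: writes all eight bits,
 * so bits 1..7 are forced to 0. */
void write_whole_register(void)
{
    fake_portd = 1;
}

/* Read-modify-write, as in the corrected C: sets bit 0 and leaves the
 * other seven bits alone, which is what "sbi portd,0" does. */
void set_bit0_only(void)
{
    fake_portd |= (1 << 0);
}
```

Starting from a register value of 0xF0, the first function leaves 0x01 behind while the second leaves 0xF1, which is why the sketch and the hand-written assembler are not equivalent.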
I've posted a detailed "accounting" of the 400+ bytes that make up an "empty" sketch, here: http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1283329855
Sorry for the very late reply, been very busy recently.
Thanks all for your help.