Disclaimer: I'm very new to the Arduino and the ATMega328P.
I recently purchased my first Arduino Uno and am having a lot of fun with it; however, I've come across a problem and could not find a solution for it. When I upload a sketch from the Arduino IDE, everything works perfectly, and when I press the on-board reset button, it resets and starts the loaded program again.
Since I wanted to try some assembly, I also installed AVR Studio and followed a guide to upload the generated HEX files to the Arduino through the USB connection. Again, everything works fine and the program starts when I upload it, but when I press the on-board reset button or toggle the power to the board, it takes almost an entire minute for it to restart the loaded program.
Now I don't know whether it is my code, the way in which I upload it, or something else that causes it. For completeness I will include my code for a blinking LED. Any tips besides the reset problem are also more than welcome!
.equ LED_dir = DDRB + 0x20
.equ LED_pos = PORTB + 0x20
ldi r16, 0x03
sts TCCR0A+0x20, r16 ;set timer 0 control register A
sts TCCR0B+0x20, r16 ;set timer 0 control register B
ldi r16, 0xfa ;r16 = 250
sts OCR0A + 0x20, r16 ;OCR0A – Output Compare Register A, set to 250
ldi r16, 0x02
sts TIMSK0, r16 ;Bit 1 – OCIE0A: Timer/Counter0 Output Compare Match A Interrupt Enable
ldi r17, 0x20
sts LED_dir, r17
sei ;enable interrupts
cpi r27, 0x01
cpi r26, 0xf4
ldi r26, 0x00
ldi r27, 0x00
sbrs r18, 0 ;branching if LSB is set
sts LED_pos, r17
ldi r18, 0x00
sts LED_pos, r1
ldi r18, 0x01
adiw x, 0x01
Assembly is generally the dark art of experts. It is usually used on parts with extremely limited flash to squeeze in a bit more code (this is less effective than it used to be, as compilers have gotten smarter), or when you need to know exactly how fast something executes (i.e., you are using instruction execution time as your timing, e.g., to control NeoPixels or something else with a single-wire interface). The latter case is generally handled via "inline assembler": most of the program is written in C/C++, but the timing-critical part is written in assembly.
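To make that split concrete, here is a minimal sketch in C. It is not taken from any real library. It assumes a GCC or Clang style compiler (the `__asm__ __volatile__` syntax), and the names `four_nops`, `fake_port` and `shift_out_byte` are made up for illustration. On an AVR each `nop` takes exactly one clock cycle, which is what makes this kind of padding usable as a timing tool; on a PC the same code compiles and runs, but the timing means nothing.

```c
#include <stdint.h>

/* Hypothetical stand-in for a real PORT register, so the sketch also
 * builds on a PC. On real hardware this would be something like PORTB. */
static volatile uint8_t fake_port;

/* Timing-critical part, written as inline assembler: four NOPs.
 * "volatile" stops the compiler from removing or reordering them. */
static inline void four_nops(void)
{
    __asm__ __volatile__(
        "nop\n\t"
        "nop\n\t"
        "nop\n\t"
        "nop\n\t"
    );
}

/* Ordinary C around it: put each bit of the byte on the "port", MSB first,
 * pad with the assembler block, and return how many 1 bits were sent. */
uint8_t shift_out_byte(uint8_t value)
{
    uint8_t ones = 0;
    for (uint8_t mask = 0x80; mask != 0; mask >>= 1) {
        if (value & mask) {
            fake_port = 1;
            ones++;
        } else {
            fake_port = 0;
        }
        four_nops();
    }
    return ones;
}
```

Note that a real single-wire driver would put the whole bit loop inside the assembler block, because the C loop around `four_nops()` compiles to an unknown number of cycles. This only shows the mechanics of mixing the two.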
The much more common use case for knowing assembly is being able to read it, particularly when interpreting assembly listings (my cores output these to the sketch directory when you do "export compiled binary"). This is very useful when you are trying to cut flash usage. You don't, strictly speaking, need to be able to read assembly to get use out of a listing, but it helps to understand what the code is doing, rather than just being able to see which functions are taking up the space.
An example of inline assembly is here:
Very well commented code (it's my optimized version of Adafruit_NeoPixel - I sure didn't write those beautiful comments)
The relevant part is show(), starting on line 101 - there's some lovely commentary explaining what's going on. Notice how starting at line 153 it uses #if to choose the right block of hand-tuned assembler for the clock frequency. A couple of particularly interesting parts not as exhaustively commented on:
Compare the length of the assembly for 16MHz and 8MHz: at 8MHz we have only 10 clock cycles per bit, while at 16MHz we have 20. And yet the assembly for 8MHz is much longer (and the generated code is larger). This illustrates the technique known as "loop unrolling": with so few cycles per bit, there is no time to spare for loop overhead, so the per-bit code is repeated instead.
For clock speeds below about 14.7MHz, we need PORT-specific implementations. This was part of the impetus for tinyNeoPixel: Adafruit_NeoPixel allowed you to output on any pin (hence, a pin on any port) and did not require this to be set at compile time, which meant that multiple copies of that same block of assembly were needed, one for each PORT, and each of these copies would eat some flash. There isn't a way to get a #define from the sketch into the library, but since tinyNeoPixel is integrated with the core, the port could instead be selected with a tools submenu.
Finally, note the block at line 292 which is commented out. I had attempted to slow down the 8MHz code for 10MHz. I had not taken note of the comments though - the resulting routine was too large to use a relative branch to jump back to the start. The approach I went with further down was based on slowing down the 12MHz code instead.
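The cycle budget and the unrolling trade-off described in the points above can be sketched in plain C. This is only an illustration, not the library's code: the 800 kHz bit rate is an assumption that matches the 10 and 20 cycle figures, the `F_CPU` default and the 16 MHz threshold are picked for the example, and every function name here is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#ifndef F_CPU
#define F_CPU 8000000UL          /* assumed clock speed for this sketch */
#endif

/* Clock cycles available to send one bit at a given data rate. */
unsigned long cycles_per_bit(unsigned long f_cpu_hz, unsigned long bit_rate_hz)
{
    assert(bit_rate_hz > 0);
    return f_cpu_hz / bit_rate_hz;
}

/* Rolled: one copy of the per-bit code, plus a counter, compare and branch
 * on every pass. Small in flash, but the loop overhead costs cycles. */
void emit_rolled(uint8_t value, uint8_t out[8])
{
    for (uint8_t i = 0; i < 8; i++) {
        out[i] = (value & 0x80) ? 1 : 0;   /* MSB first */
        value = (uint8_t)(value << 1);
    }
}

/* Unrolled: the per-bit code is written out eight times. Bigger in flash,
 * but there is no counter and no branch to pay for. */
#define EMIT_BIT(n) (out[(n)] = (uint8_t)((value >> (7 - (n))) & 1u))
void emit_unrolled(uint8_t value, uint8_t out[8])
{
    EMIT_BIT(0); EMIT_BIT(1); EMIT_BIT(2); EMIT_BIT(3);
    EMIT_BIT(4); EMIT_BIT(5); EMIT_BIT(6); EMIT_BIT(7);
}

/* Pick a variant at compile time based on the clock, in the same spirit
 * as the #if blocks in show(). The threshold is illustrative only. */
void emit_byte(uint8_t value, uint8_t out[8])
{
#if F_CPU >= 16000000UL
    emit_rolled(value, out);     /* enough cycles per bit for a loop */
#else
    emit_unrolled(value, out);   /* tight budget: spend flash instead */
#endif
}
```

Both variants produce the same bits; the only difference is where the cost lands (cycles versus flash), which is exactly the trade the 8MHz assembly makes.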
And yes, this probably doesn't make much sense to you if you're new to embedded programming, but it illustrates the depth of understanding you need in order to write assembler under the conditions where doing so makes sense!