Finally some sense writing assembly code in a sketch, or is there?

I am desperately trying to use assembler together with C. I copied the following code to a new Arduino sketch. The original listing is much longer, but what it does is not important here. I have an Arduino Mega 2560, but the code is for (a board with) an ATtiny13. That is irrelevant though;
I only want to learn how to write sensible assembly code together with C in a sketch.
I was surprised that when I verified the code, I got only a few errors:

core.a(main.cpp.o): In function `main':
C:\JIOY\Arduino\arduino-1.0.1\hardware\arduino\cores\arduino/main.cpp:5: undefined reference to `setup'
C:\JIOY\Arduino\arduino-1.0.1\hardware\arduino\cores\arduino/main.cpp:15: undefined reference to `loop'

My Qs:

  1. Why did I get the errors, what do they mean
  2. if defined(__AVR_ATtiny13__): where is this defined, and can/need I use my board and write something like __AVR_ATMega2560__?
  3. Can I really use such neat-looking and understandable assembly code in my sketch, or am I just a dreamer?

Thanks

#include <avr/io.h>

#if defined(__AVR_ATtiny13__) // what is this?
// can I write #if defined(__AVR_ATMega2560__) (= my board) instead?
#ifdef __ASSEMBLER__ // what is this?

#  define sreg_save r2
#  define flags r16
#  define counter_hi r4

#else /* !ASSEMBLER */

#include <stdint.h>

register uint8_t sreg_save asm("r2");
register uint8_t flags asm("r16");
register uint8_t counter_hi asm("r4");

#endif /* ASSEMBLER */

/*
 * Timer 0 hit TOP (0xff), i.e. it turns from up-counting
 * into down-counting direction.
 */
.global TIM0_COMPA_vect
TIM0_COMPA_vect:
    in sreg_save, _SFR_IO_ADDR(SREG)
    inc counter_hi
    clr flags
    out _SFR_IO_ADDR(SREG), sreg_save
    reti

/*
 * Timer 0 hit BOTTOM (0x00), i.e. it turns from down-counting
 * into up-counting direction.
 */
.global TIM0_OVF_vect
TIM0_OVF_vect:
    in sreg_save, _SFR_IO_ADDR(SREG)
    inc counter_hi
    ser flags
    out _SFR_IO_ADDR(SREG), sreg_save
    reti
// the original listing is much longer, but I only wanted to verify some assembly code

#endif

  1. Why did I get the errors, what do they mean

You get the errors about the missing setup() and loop() functions because you don't have setup() and loop() functions. Duh.

  2. if defined(__AVR_ATtiny13__) where is this defined

It's based on the board selected in the IDE, prior to your attempt to compile, link, and upload.
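To make that concrete: the symbol is predefined by avr-gcc itself, which is given an -mmcu option matching the board selected in the IDE. The real spellings carry double underscores and a lowercase "mega": __AVR_ATtiny13__ and __AVR_ATmega2560__. Below is a minimal sketch of selecting code per target. The enum and the function name are made up for illustration, and on a non-AVR compiler neither macro exists, so the last branch is taken.

```c
/* Hypothetical names, for illustration only. */
enum target { TARGET_OTHER, TARGET_ATTINY13, TARGET_ATMEGA2560 };

/* Reports which of the two chips discussed here the code was built for. */
enum target detect_target(void) {
#if defined(__AVR_ATtiny13__)
    return TARGET_ATTINY13;      /* built with -mmcu=attiny13 */
#elif defined(__AVR_ATmega2560__)
    return TARGET_ATMEGA2560;    /* built with -mmcu=atmega2560 (Mega 2560 board) */
#else
    return TARGET_OTHER;         /* any other chip, or a desktop compiler */
#endif
}
```

With the Mega 2560 selected, only the second branch reaches the compiler; the ATtiny13 branch is skipped entirely.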

  3. Can I really use such neat-looking and understandable assembly code in my sketch,

Not by itself. On the other hand, why?

or am I just a dreamer?

Only you can answer that.

PaulS:

  1. Why did I get the errors, what do they mean

You get the errors about the missing setup() and loop() functions because you don't have setup() and loop() functions. Duh.

Nope. Verify gives the same errors even if I add setup() and loop()

Actually I noticed that verify doesn't report any errors even if I write nonsense like "compile this please" in the assembly code. Huh?? So I am sure that no code is generated from the assembly instructions.
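A likely explanation for both observations, assuming setup() and loop() were added inside the #if block (the thread does not show where they were put): with a Mega 2560 selected, __AVR_ATtiny13__ is not defined, so the preprocessor throws away everything between the #if and its #endif before the compiler ever sees it. The sketch below illustrates the effect; the function names are hypothetical.

```c
#if defined(__AVR_ATtiny13__)
compile this please            /* an error on an ATtiny13 build, invisible elsewhere */
int tiny13_only(void) { return 1; }
#endif

/* Returns 1 only when the guarded section above was actually compiled. */
int listing_was_compiled(void) {
#if defined(__AVR_ATtiny13__)
    return 1;
#else
    return 0;
#endif
}
```

On any target other than the ATtiny13 this builds cleanly despite the nonsense line, and nothing inside the guard (including a setup() or loop() placed there) ends up in the program.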

  3. Can I really use such neat-looking and understandable assembly code in my sketch,

Not by itself. On the other hand, why?

Why I like to use neat assembly code?
Because it's neat and I want to use assembler.
Why is it so damn difficult to use assembler in this environment?

It's not hard. Here's some examples:

Thanks CR!
Now the mystery lies in variables like %0, %1.
Do the following statements tell something about them?
:: "I" (_SFR_IO_ADDR(DDRB)), "I" (DDB5)
Why are these there?
:: "I" (_SFR_IO_ADDR(PORTB)), "I" (PORTB5)
: "r18", "r20", "r21", "r24", "r25"
I already managed to use the registers R0-R31 but didn't find out how to use the I/O register names in asm.

I find it a bit odd that registers Rx have to be clobbered like:
::: "r24", "r25"
I think it's done so that the C compiler knows which registers it has free for use.
I can't imagine situations other than interrupt routines where there could be a problem, but a few PUSH and POP instructions could easily save and restore registers. I have always pushed and popped the registers I use when programming with assemblers. (well, not always, heh)
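For readers with the same questions, here is a hedged sketch of how GCC extended asm fits together. The placeholders %0, %1, ... refer to the operands listed after the colons, numbered in order (outputs first, then inputs). The "I" constraint asks for a constant in the range 0 to 63, which is what I/O instructions such as sbi, cbi, in and out take, and _SFR_IO_ADDR() converts a register name like DDRB into that I/O address. The final list names registers the asm text modifies on its own, so the compiler does not keep live values in them across the statement; it is a warning to the compiler rather than a request for free registers. The function names are invented for illustration, and the non-AVR fallback exists only so the example can be compiled on a desktop machine.

```c
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>
#endif

/* Adds b to a. On AVR the addition is written as inline assembly. */
uint8_t add_u8(uint8_t a, uint8_t b) {
#if defined(__AVR__)
    /* %0 is the first operand (a, read and written, any register);
       %1 is the second operand (b, read only, any register). */
    asm ("add %0, %1" : "+r" (a) : "r" (b));
    return a;
#else
    return (uint8_t)(a + b);   /* host fallback giving the same result */
#endif
}

#if defined(__AVR__)
/* Makes pin 5 of port B an output: %0 is DDRB's I/O address, %1 the bit number. */
void pb5_as_output(void) {
    asm volatile ("sbi %0, %1" : : "I" (_SFR_IO_ADDR(DDRB)), "I" (DDB5));
}

/* Short busy loop that uses r24 by name, so r24 is declared as clobbered. */
void short_delay(void) {
    asm volatile (
        "ldi r24, 10 \n"
        "1: dec r24  \n"
        "brne 1b     \n"
        : : : "r24");
}
#endif
```

Pushing and popping by hand also works, but letting the compiler pick registers through "r" operands usually avoids the need for either a clobber list or a manual save and restore.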

I don't know. I've only added some NOPs after a write to the SPI register to make sure the next write could happen as fast as possible, something like:

SPDR = array[0]; nop; nop; .... nop; (17 total I think)
SPDR = array[1]; nop; nop; .... nop; (17 total I think)
:
:
SPDR = array[15]; nop; nop; .... nop; (17 total I think)

After that, direct port manipulation is about as close to assembly as I've come since programming Z8080's back in college (fun - but not needed for Arduino).

Direct port manipulation is essentially assembler.

IMO there is no way to do neat inline ASM, it's a total dog's breakfast. The best way is to link in a .S file and write your code properly in a separate file. It's easy on the LPC environment but I don't know how to do it with the Arduino toolchain. Expect to dust off your makefile skills I think.

You can combine multiple ASM instructions in one asm() block,

ISR (WDT_vect, ISR_NAKED) {
  
  /////////////////////////////////////////////
  // DIY prologue so I know exactly where the
  // program counter and working registers are
  asm (
    "push r0\n"
    "in r0,0x3f\n"
    "push r0\npush r1\npush r2\npush r3\npush r4\n"
    "push r5\npush r6\npush r7\npush r8\npush r9\n"
    "push r10\npush r11\npush r12\npush r13\npush r14\n"
    "push r15\npush r16\npush r17\npush r18\npush r19\n"
    "push r20\npush r21\npush r22\npush r23\npush r24\n"
    "push r25\npush r26\npush r27\npush r28\npush r29\n"
    "push r30\npush r31\n"
  );  // 33 bytes on the stack
  // C code now
  byte command, addr_lo, addr_hi, data_lo, data_hi, resp_lo = 0, resp_hi = 0;

But to write an entire program like this is an exercise in frustration I think.


Rob

Your initial example is of "assembler source" suitable for being in a .S file, which is significantly different than the "inline assembler" that you can include in C or C++ code (or your sketch's .ino file). I'm surprised you don't get more errors...

Write your assembler code in a separate .S file.
See my blog post on writing assembler - straight assembler, not inline assembler:

And note that the GNU assembler does NOT have quite the same syntax as the Atmel-defined assembler.
(OTOH, Atmel assembler programs aren't linkable with C...)

CrossRoads:
I don't know. I've only added some NOPs after a write to the SPI register to make sure the next write could happen as fast as possible, something like:

SPDR = array[0]; nop; nop; .... nop; (17 total I think)

SPDR = array[1]; nop; nop; .... nop; (17 total I think)
:
:
SPDR = array[15]; nop; nop; .... nop; (17 total I think)



After that, direct port manipulation is about as close to assembly as I've come since programming Z8080's back in college (fun - but not needed for Arduino).

That's pretty tight timing when you can't afford the cycles to check SPIF. It only adds a few cycles of delay.

Wait_Transmit:
in r16, SPSR
sbrs r16, SPIF
rjmp Wait_Transmit
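The same polling idea can be written in C without any assembler. This is only a sketch: SPDR, SPSR and SPIF come from <avr/io.h> on a real AVR, while the stand-in variables in the #else branch are hypothetical and exist purely so the example compiles on a desktop machine. The function name is made up as well.

```c
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>                      /* real SPDR, SPSR, SPIF */
#else
/* Hypothetical host-side stand-ins, not part of any AVR header. */
static volatile uint8_t SPDR;
static volatile uint8_t SPSR = 0x80;     /* pretend the transfer is already done */
#define SPIF 7
#endif

/* Starts an SPI transfer and waits until the hardware reports completion. */
void spi_send_blocking(uint8_t value) {
    SPDR = value;                        /* writing SPDR starts the shift */
    while (!(SPSR & (1 << SPIF))) {
        /* spin until the SPI interrupt flag is set */
    }
}
```

The compiler typically turns the while loop into a short read, test and branch sequence much like the one above, which is where the few extra cycles per byte come from.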

Yes it is. I am setting up 45 shift registers to be updated at a 20 kHz rate.
The above method yielded about 47 µs to load all 45, leaving time to get ready to act upon an interrupt and create the output register clock signal, then start the load of the next 45 again.
Using a for loop added too much time.
I even disabled interrupts while loading because the micros() tick was throwing off the data going out.

The GNU asm extension is documented in the GCC manuals (though frankly it is tough sledding for many people, as the syntax comes from the internal way instructions are emitted in the final phase of the compiler): Extended Asm - Using the GNU Compiler Collection (GCC) (and follow on to the next page for Constraints, and go to the sub-pages).

it is tough sledding for many people

Ain't that the truth :)

Give me an S file and "normal" ASM any day, not that I've found the need for ASM for quite some time and don't expect to again.

Thanks for that link Ralph, filed in case I ever do need ASM again.


Rob

CrossRoads:
Yes it is. I am setting up 45 shift registers to be updated at a 20 kHz rate.
The above method yielded about 47 µs to load all 45.
Using a for loop added too much time.

Using an assembly loop with the check/branch would be no more than 20 cycles per byte, or 56.25 µs for all 45 bytes at 16 MHz. If that's still too slow, then you could write a loop that takes exactly 16 cycles (writing SPDR every 16 cycles). That would give you a total of exactly 45 µs.

Or if you use an ATtiny with USI, you can do it twice as fast - USI can do one shift per clock cycle.

  // three wire mode, clock from Counter0 compare match
  USICR = (1<<USIWM0)|(1<<USICS0);

Counter0 needs to go off every cycle:

  TCNT0 = 0; // reset counter
  TCCR0A = (1<<WGM01); // turn on CTC mode
  OCR0A = 0; // clear when count reached
  TCCR0B = (1<<CS00);  // turn on and clock from i/o clock

USI doesn't generate a clock for you, so you'd have to generate one, perhaps with the CKOUT fuse set and a transistor to turn it on/off. That would get your 45 shift registers filled in 22.5 µs. And if you run your AVR at 20 MHz, you're down to 18 µs.
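The timing figures in this post follow from simple cycle counting, assuming a 16 MHz clock unless stated otherwise. A small helper (hypothetical name) makes the arithmetic explicit:

```c
#include <stdint.h>

/* Time in nanoseconds to shift out `bytes` bytes when each byte costs
   `cycles_per_byte` CPU cycles at a clock of `f_cpu_hz` hertz. */
uint32_t load_time_ns(uint32_t bytes, uint32_t cycles_per_byte, uint32_t f_cpu_hz) {
    uint64_t cycles = (uint64_t)bytes * cycles_per_byte;   /* total CPU cycles */
    return (uint32_t)(cycles * 1000000000ULL / f_cpu_hz);  /* cycles to ns */
}
```

For 45 bytes: 20 cycles per byte at 16 MHz gives 56250 ns, 16 cycles per byte gives 45000 ns, and one cycle per bit (8 per byte) gives 22500 ns at 16 MHz or 18000 ns at 20 MHz.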

But I need '1284P so I have the 16K SRAM to use for a very large array: 45 bytes x 325 rows.
Having the 45 SPDRs with NOPs in loop of 325 worked out well.
I start testing with full up hardware tomorrow, so we'll see! Tests with logic analyzer and 200 MHz scope on subset of hardware looked good in initial prototyping before hardware was ordered & assembled.

There is also this manual section for inline assembler in avr-gcc: Inline Assembler Cookbook

Graynomad:
Give me an S file and "normal" ASM any day, not that I've found the need for ASM for quite some time and don't expect to again.

C doesn't expose low-level (and useful) things like the carry flag. I know of no way to check for integer overflow in C, yet it is extremely simple in assembler.
There doesn't seem to be any way to tell gcc you want to reserve registers. Say I want to use GPIOR1 in an ISR - I can't because gcc may be using it (I've heard it does use GPIOR0 for bitfields so it can use sbi/cbi to manipulate them).
I also find avr-gcc doesn't optimize very well. For most people most of the time it's good enough, but it annoys me that I can generally cut the size down by 10-20% while still sticking to gcc's register allocation rules.
What I'd really like to have is an enhanced assembler that would allow use of 8 and 16-bit variable names and do register allocation and optimization.
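On the overflow point: the carry flag itself is indeed not visible from C, but the same information can usually be recovered by comparison. The following is a sketch with hypothetical function names, not a claim about what avr-gcc generates for it. Unsigned arithmetic wraps around by definition, so a carry shows up as a sum smaller than an operand; signed overflow is undefined behavior, so it has to be tested before the addition is performed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned add: stores the wrapped sum and returns true if a carry occurred. */
bool add_u16_carry(uint16_t a, uint16_t b, uint16_t *sum) {
    *sum = (uint16_t)(a + b);
    return *sum < a;            /* wrapped past 65535, so the sum got smaller */
}

/* Signed add: returns true if a + b would not fit in int16_t. */
bool add_i16_overflows(int16_t a, int16_t b) {
    return (b > 0 && a > INT16_MAX - b) ||
           (b < 0 && a < INT16_MIN - b);
}
```

This is more verbose than a single brcs or brvs after an add, which is a fair illustration of the complaint above.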

CrossRoads:
But I need '1284P so I have the 16K SRAM to use for a very large array: 45 bytes x 325 rows.

Wow. What is it? I can't imagine it being a 360x325 LED display, as powering something like that would be quite a challenge.