Cannot call an assembler function more than once

Hi all,

I've got some AVR assembler code and I'm having trouble with one function. See this code:

static inline void delay_us (uint8_t usec)
{
    __asm__ __volatile__ (
        " push r16\n"
        " push r17\n"
        " ldi r16, %0\n"
        "l2: ldi r17, 4\n"
        "l1: dec r17\n"
        " nop\n"
        " brne l1\n"
        " dec r16\n"
        " brne l2\n"
        " pop r17\n"
        " pop r16\n"
        ::
        "M" (usec)
    );
}

If my program calls this function ONCE, it compiles fine. If I call it more than once, I get this error:

~~~
/tmp/ccowYDG1.s: Assembler messages:
/tmp/ccowYDG1.s:340: Error: symbol `l2' is already defined
/tmp/ccowYDG1.s:341: Error: symbol `l1' is already defined
make: *** [ws2812.o] Error 1

~~~

I'm guessing that the compiler is trying to generate one copy of the function for each time it's called, causing the labels to be "redefined".

I'm quite sure what I need is some kind of "local labels" that won't be duplicated, or else some way to make the compiler generate only one copy of the function no matter how many times it's called.

Any ideas or help will be greatly appreciated.

do you need it inline?

I did that so the compiler would not "optimize" anything away. The timing has to be VERY precise.

Besides, I already tried that. Took out "static", then "inline", then both. Same exact problem.

Not sure it works with AVR; it would with PICmicro.

Try local labels then. They follow the same syntax rules as global labels, but they must begin with a colon (:):

static inline void delay_us (uint8_t usec)
{
    __asm__ __volatile__ (
        " push r16\n"
        " push r17\n"
        " ldi r16, %0\n"
        ":l2: ldi r17, 4\n"
        ":l1: dec r17\n"
        " nop\n"
        " brne :l1\n"
        " dec r16\n"
        " brne :l2\n"
        " pop r17\n"
        " pop r16\n"
        ::
        "M" (usec)
    );
}

(I have not checked your code exactly; just replace l1 and l2 with :l1 and :l2.)

Good idea, but sadly it didn't work. :(

OK - you could always branch "by hand" such as

brne PC-1

(Double-check exactly where PC points; I think it's still on the current instruction. What I mean is: use PC-relative addressing to branch to the right place.)

(originally posted in French:)
bon, vous pouvez toujours mettre "à la main" genre

brne PC-1

(vérifiez exactement où est PC, je crois qu'il est toujours sur l'instruction courante - bref utilisez un adressage relatif par rapport à PC)

Sorry, I only speak English and German.

I did, however, manage to make it work. It's a kludge in my opinion, but it works.

I looked at the assembler listing for the delay code and saw the branch offsets. So, instead of using labels, I used branch offsets. No labels, no problems.

This one works:

static inline void delay_us (uint8_t usec)
{
    __asm__ __volatile__ (
        " push r16\n"
        " push r17\n"
        " ldi r16, %0\n"
        " ldi r17, 4\n"
        " dec r17\n"
        " nop\n"
        " brne .-6\n" // <- note hard coded offset
        " dec r16\n"
        " brne .-12\n" // <-- note
        " pop r17\n"
        " pop r16\n"
        ::
        "M" (usec)
    );
}

I don't like doing it this way, but it works......
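
For anyone adapting this, the hard-coded offsets can be checked with a little arithmetic. A minimal sketch in plain C (the helper name branch_offset_bytes is made up for illustration), assuming every instruction in the loop is one 16-bit word and that the offset is counted from the instruction after the branch, which is how the assembler listing shows them:

```c
#include <stdint.h>

/* Byte offset to write after "." in a relative branch.
   Assumptions: each instruction is one 16-bit word (2 bytes), and the
   offset is measured from the instruction that follows the branch.
   Indices count instructions from any convenient starting point. */
static int branch_offset_bytes(int branch_index, int target_index)
{
    return (target_index - (branch_index + 1)) * 2;
}
```

Counting `ldi r17, 4` as instruction 0, the first `brne` is instruction 3 and targets instruction 1, which gives -6; the second `brne` is instruction 5 and targets instruction 0, which gives -12. Any instruction added to or removed from the loop changes these numbers, which is why labels are less fragile.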

Thank you very much for your help! :)

Oops - not sure why I posted in French - fixing it :)

You're welcome.

Your solution is what I was suggesting above in French :) The dot (.) works like doing the math with PC.

You can use numbers as local labels.

static inline void delay_us (uint8_t usec)
{
    __asm__ __volatile__ (
        " push r16\n"
        " push r17\n"
        " ldi r16, %0\n"
        "2: ldi r17, 4\n"
        "1: dec r17\n"
        " nop\n"
        " brne 1b\n"       // jump backward to closest "1" label; forward would be 1f 
        " dec r16\n"
        " brne 2b\n"
        " pop r17\n"
        " pop r16\n"
        ::
        "M" (usec)
    );
}
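
Another option, if named labels read better: GCC replaces `%=` in an extended asm template with a number that is unique to each asm statement instance, so each inlined copy gets its own label names. A sketch, assuming this is built with avr-gcc; the function name delay_us_unique and the label names are made up for illustration, and the non-AVR branch is only there so the snippet compiles elsewhere:

```c
#include <stdint.h>

/* Hypothetical variant of delay_us() using "%=" to make label names unique. */
static inline void delay_us_unique(uint8_t usec)
{
#if defined(__AVR__)
    __asm__ __volatile__ (
        " push r16\n"
        " push r17\n"
        " ldi r16, %0\n"
        "outer_%=: ldi r17, 4\n"   /* "%=" expands to a per-instance number */
        "inner_%=: dec r17\n"
        " nop\n"
        " brne inner_%=\n"
        " dec r16\n"
        " brne outer_%=\n"
        " pop r17\n"
        " pop r16\n"
        ::
        "M" (usec)
    );
#else
    (void)usec;   /* not an AVR target: this sketch does nothing here */
#endif
}
```

The numeric labels above are shorter; `%=` mainly helps when a longer asm block benefits from descriptive label names.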

cool - did not know that one!

J-M-L:
bon, vous pouvez toujours mettre "à la main" genre

...

(vérifiez exactement où est PC, je crois qu'il est toujours sur l'instruction courante - bref utilisez un adressage relatif par rapport à PC)

Guys ... this is the English language part of the forum.


You know, there's a built-in function to delay for X cycles:

 __builtin_avr_delay_cycles(n)
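
A minimal sketch of how that could be wrapped for microsecond delays. Assumptions: F_CPU is the clock frequency in Hz and a whole number of MHz (16 MHz here is only a placeholder default), the macro names US_TO_CYCLES and DELAY_US are made up for illustration, and the argument is a compile-time constant, which the built-in requires:

```c
#ifndef F_CPU
#define F_CPU 16000000UL   /* assumption: 16 MHz clock; set to match your board */
#endif

/* Convert microseconds to CPU cycles (F_CPU assumed to be a whole number of MHz). */
#define US_TO_CYCLES(us) ((unsigned long)(us) * (F_CPU / 1000000UL))

#if defined(__AVR__)
/* The argument must be a compile-time constant. */
#define DELAY_US(us) __builtin_avr_delay_cycles(US_TO_CYCLES(us))
#else
/* Not an AVR target: only evaluate the conversion so the sketch still compiles. */
#define DELAY_US(us) ((void)US_TO_CYCLES(us))
#endif
```

With these definitions, `DELAY_US(50);` would request 800 cycles at 16 MHz.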

Yes, sorry about that. I guess I was tired and replied in French by mistake. I then went to edit my post but wanted to leave the original text, otherwise Krupski's remark would have looked odd.

I did not know that... sure makes things easier!

Thank you! That was ultimately what I was looking for.

The advantage is, the compiler knows what registers it is using, so it doesn't (necessarily) have to push and pop anything. It also adjusts the generated code in such a way that you can get cycle granularity, not just µs granularity.

Interesting results..... here's what one __builtin_avr_delay_cycles() compiles to (or should I say assembles to):

        __builtin_avr_delay_cycles(20);
  94:   86 e0           ldi     r24, 0x06       ; 6
  96:   8a 95           dec     r24
  98:   f1 f7           brne    .-4             ; 0x96 <main+0x1c>
  9a:   00 c0           rjmp    .+0             ; 0x9c <main+0x22>

Notice that the requested 20 cycles turned into a loop count of 6.
So, I did a bunch of them, left out the ones that added an extra NOP, and got this data (the first number is the input parameter, the second is the loop count generated):

input   loop count
 20      6
 30     10
 50     16
 60     20
 80     26
 90     30
110     36
120     40
140     46
150     50
170     56
180     60
200     66

Graphing it and doing a linear curve fit gives me these values:

[b]name:     Linear
kind:     Regression
family:   Linear Regressions
equation: m * x + b

Parameters:
b =    -3.331E-01
m =    3.331E-01

[/b]

So, with a slope of about 1/3 of a loop iteration per requested cycle, it looks like it takes about 3 cycles each time around the loop, which makes sense: the DEC instruction takes 1 cycle and the BRNE instruction takes 2 cycles if the branch is taken.

All quite interesting.......

OK:

  • LDI - 1 cycle
  • DEC - 1 cycle
  • BRNE - 1 cycle if no branch, 2 cycles if branch
  • RJMP - 2 cycles

So, 5 * DEC/BRNE with a branch (3 cycles each, total of 15 cycles). Add 1 * DEC/BRNE with no branch (2 cycles). Add LDI and RJMP. Total of 20 cycles. As requested. That's the smart sort of thing the compiler does to give you the exact requested number of cycles.
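
The same bookkeeping can be written down as a small model. This is only a sketch in plain C using the cycle counts listed above; the function and constant names are made up for illustration, and it assumes a loop count between 1 and 255:

```c
#include <stdint.h>

/* Cycle counts taken from the list above. */
enum {
    LDI_CYCLES     = 1,
    DEC_CYCLES     = 1,
    BRNE_TAKEN     = 2,
    BRNE_NOT_TAKEN = 1,
    RJMP_CYCLES    = 2
};

/* Total cycles for "ldi rN, loops / dec rN / brne <dec>" plus any padding
   instructions after the loop. Valid for loops in 1..255. */
static uint32_t delay_loop_cycles(uint8_t loops, uint32_t padding_cycles)
{
    /* Every pass except the last one takes the branch back. */
    uint32_t taken_passes = (uint32_t)(loops - 1) * (DEC_CYCLES + BRNE_TAKEN);
    /* On the last pass the branch falls through. */
    uint32_t last_pass = DEC_CYCLES + BRNE_NOT_TAKEN;
    return LDI_CYCLES + taken_passes + last_pass + padding_cycles;
}
```

For the listing above, `delay_loop_cycles(6, RJMP_CYCLES)` comes to 20, and a loop count of 10 with no padding comes to 30, which matches the 30-cycle entry in the data.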

Yup. So I see.

I remember doing a lot of cycle counting when I did Motorola HC11 and 6809 programming. Somehow, the assembler syntax of Motorola (and even Intel) is SO much easier to deal with than the "packed bits" gibberish of Atmel.

I mean.... in Motorola, [b]LDA #$20[/b] assembles to "[b]86 20[/b]". Makes perfect sense. In Atmel ASM, I can imagine it being something like [b]30 E2[/b] <----(whatever THAT means).

Kinda difficult to pop a NOP or a BRN into running code to debug it. Yes, Motorola has a "Branch Never" instruction, a 2 byte complement to BRA (Branch Always) which comes in VERY handy for debugging and for doing the "CMPX immediate" trick of jumping over 2 bytes.

Oh wait... you did Motorola stuff... you already know this, yes?

Yes, I used to hand-assemble Motorola code. :)