__asm__ __volatile__ question

Hi,

I was running a quick and dirty test where I needed a small delay. So, I did this:

    asm (
    " nop\n"
    " nop\n"
    ..lots more..
    " nop\n"
    );

...and it seemed as though they didn't even exist. They didn't slow down the loop at all!

But this:

    __asm__ __volatile__ (
        " nop\n"
        ...lots more...
        " nop\n"
    );

Worked as expected.

I ASSUME that somehow the compiler "optimized" out my NOPs, or rearranged them, or something. Does anyone have an idea or an explanation of why the volatile version worked while the "plain" one didn't?

Thanks!
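For reference, there is a way to avoid typing "..lots more.." NOPs by hand. This is a minimal sketch that assumes a GCC-compatible compiler whose assembler accepts the GNU .rept/.endr directives (GNU as does). The name nop_delay_bursts and the burst size of 8 are made up for illustration. Each pass through the loop issues 8 NOPs from a single __asm__ __volatile__ statement. The loop counter and branch add their own cycles, so the delay is not cycle-exact.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical burst size; must be a plain integer literal because it is
   pasted into the assembler's .rept directive. */
#define NOPS_PER_BURST 8
#define STR_(x) #x
#define STR(x) STR_(x)

/* Executes bursts * NOPS_PER_BURST NOP instructions (plus loop overhead)
   and returns how many NOPs were issued. */
unsigned nop_delay_bursts(unsigned bursts)
{
    assert(bursts <= UINT_MAX / NOPS_PER_BURST); /* keep the return value from overflowing */
    for (unsigned i = 0; i < bursts; i++) {
        /* __volatile__ asks GCC not to delete this statement even though
           it has no operands and no visible effect. */
        __asm__ __volatile__ (
            ".rept " STR(NOPS_PER_BURST) "\n\t"
            "nop\n\t"
            ".endr\n"
        );
    }
    return bursts * NOPS_PER_BURST;
}
```

For example, nop_delay_bursts(4) runs 32 NOPs in total, spread over four loop passes.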

I found this post on avrfreaks to be very helpful in understanding inline assembler: http://www.avrfreaks.net/forum/few-remarks-avr-gcc-inline-assembler?skey=inline%20assembler%20tutorial. Here’s a quote regarding the “volatile” keyword used with asm:

The exact effect of “volatile” keyword in conjunction with “asm” keyword may be different from some expectations (namely it does not provide “code barrier”, preventing reordering of code). It merely tells the compiler (optimizer) not to remove the code, even if it appears to do “nothing” (e.g. if it has no input/output operands (in which case the compiler sets the asm() statement implicitly volatile)) as it may have side effects (similarly to variables tagged volatile and (seemingly redundant) accesses to them). The compiler still can remove such code if it proves that it is never reachable.

The guy seems to love his parentheses. From that quote, it looks like your diagnosis is on the mark.

The Inline Assembler Cookbook for avr-gcc, in its many incarnations, suggests, but doesn’t directly say, something similar - it describes the compiler deleting references to variables that appear to have no effect, and says that it can be persuaded to keep them in the code with “volatile.”
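The same idea applies to ordinary variables, as the Cookbook describes. Here is a minimal sketch in which the function names are hypothetical. If a delay loop's counter has no observable effect, the optimizer may delete the loop outright. A volatile counter forces every load and store to happen, so that loop survives. The volatile version is still not a calibrated delay, because its timing depends on the clock and on the code the compiler generates.

```c
#include <assert.h>
#include <stdint.h>

/* Plain counter: nothing outside the function can observe i, so an
   optimizing compiler may delete the whole loop (no delay at all). */
void spin_plain(uint16_t n)
{
    for (uint16_t i = 0; i < n; i++) {
        /* empty body */
    }
}

/* volatile counter: every read and write of i must really happen, so the
   loop is kept. Returns the final counter value (equal to n). */
uint16_t spin_volatile(uint16_t n)
{
    volatile uint16_t i;
    for (i = 0; i < n; i++) {
        /* empty body; the work is the volatile load, compare and store */
    }
    return i;
}
```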

Another obscure problem solved! Thanks!

Do you know about the _delay_loop_1() function in tools/avr/avr/include/util/delay_basic.h?

... or __builtin_avr_delay_cycles()?

https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/AVR-Built-in-Functions.html
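The following sketch shows how __builtin_avr_delay_cycles() is typically fed a cycle count. It assumes that the build defines F_CPU as the clock in Hz, as Arduino and most avr-gcc makefiles do. US_TO_CYCLES and pause_10us are names made up here. Per the GCC documentation linked above, the tick count must be a compile-time integer constant, so the conversion is written as a macro rather than a function. The AVR-only part is guarded so that the arithmetic can be checked on any host. The builtin does not account for time spent in interrupts.

```c
#include <assert.h>

/* CPU cycles for us microseconds at f_cpu Hz, rounded down. Kept as a
   macro so the result is an integer constant expression, which
   __builtin_avr_delay_cycles() requires. The 64-bit intermediate avoids
   overflow for long delays. */
#define US_TO_CYCLES(f_cpu, us) \
    ((unsigned long)((unsigned long long)(f_cpu) * (us) / 1000000ULL))

#if defined(__AVR__) && defined(F_CPU)
/* AVR only: busy-wait about 10 us. Interrupts that fire meanwhile
   lengthen the delay; the builtin does not compensate for them. */
static inline void pause_10us(void)
{
    __builtin_avr_delay_cycles(US_TO_CYCLES(F_CPU, 10));
}
#endif
```

At 16 MHz, US_TO_CYCLES(16000000UL, 10) works out to 160 cycles.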

I suggest something else was going on with your code, as the compiler will not typically remove inline asm code, even if it appears to do nothing. My guess is that the compiler optimized out something else (possibly through dead-code or dead-store elimination, DCE/DSE) and then decided to remove your naked inline asm code as well.

For example, guess how many "nops" will be in this Arduino code:

    void loop() {
      asm ( "nop\n"  );
      return;
      asm volatile ( "nop\n" "nop\n" "nop\n" );
    }

Answer:

    000000d4 <loop>:
      d4:   00 00       nop
      d6:   08 95       ret

    000000d8 <pinMode>:
      d8:   cf 93       push    r28
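The GCC manual's rules match both JimEli's example and the parenthetical in the quote above. An asm statement with no operands (basic asm), or an extended asm with no output operands, is implicitly volatile, so the plain asm("nop") survives. Even volatile code is dropped when it is unreachable, as the statements after return are. Only an extended asm that has output operands and no volatile may be deleted when its outputs go unused. Below is a minimal sketch of the three cases. The function names are made up, and the templates are kept trivial so that the code builds on any GCC target, not just AVR.

```c
#include <assert.h>

/* 1. Basic asm (no operands): implicitly volatile, so GCC keeps it even
      though it has no visible effect. */
void basic_asm_nop(void)
{
    __asm__ ("nop");
}

/* 2. Extended asm WITH an output operand and no volatile: GCC may treat
      it as depending only on its inputs, so it can delete it when the
      result is unused or merge identical copies. The empty template only
      claims to read and modify v in a register. */
unsigned plain_passthrough(unsigned v)
{
    __asm__ ("" : "+r"(v));
    return v;
}

/* 3. The same statement marked volatile: GCC keeps it even if the result
      is unused (but may still drop it if it is unreachable). */
unsigned volatile_passthrough(unsigned v)
{
    __asm__ __volatile__ ("" : "+r"(v));
    return v;
}

/* Usage: all three run normally; they differ only in what the optimizer
   is allowed to remove. */
void asm_forms_demo(void)
{
    basic_asm_nop();
    assert(plain_passthrough(7u) == 7u);
    assert(volatile_passthrough(7u) == 7u);
}
```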

oqibidipo: ... or __builtin_avr_delay_cycles()?

So why did Team Arduino even bother with _delay_loop_1()? Just to have something with roughly three times the range?

They (Arduino) didn't -- _delay_loop_1() is part of avr-libc and has been there much longer (since 2002) than the GCC __builtin_avr_xxx() functions which were introduced in version 4.7 (released in 2012).
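On the range question: per the avr-libc documentation, _delay_loop_1() takes an 8-bit count, where 0 means 256 iterations, and it spends 3 CPU cycles per iteration. It therefore tops out at 768 cycles plus setup overhead. The following minimal sketch converts a cycle budget into that count. The helper loop1_count_for_cycles and the short_pause example are hypothetical, and only the guarded part needs avr-libc.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__AVR__)
#include <util/delay_basic.h>   /* avr-libc: declares _delay_loop_1() */
#endif

/* _delay_loop_1() runs 3 CPU cycles per iteration with an 8-bit count,
   where a count of 0 means 256 iterations (768 cycles, not counting the
   setup overhead). Hypothetical helper: turn a cycle budget (3..768)
   into that count, rounding down. */
uint8_t loop1_count_for_cycles(uint16_t cycles)
{
    assert(cycles >= 3 && cycles <= 768);
    uint16_t iterations = cycles / 3;   /* 1..256 */
    return (uint8_t)iterations;         /* 256 becomes 0, which means 256 */
}

#if defined(__AVR__)
/* Example: about 120 cycles, roughly 7.5 us at 16 MHz. */
void short_pause(void)
{
    _delay_loop_1(loop1_count_for_cycles(120));
}
#endif
```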

JimEli: I suggest something else was going on with your code, as the compiler will not typically remove inline asm code, even if it appears to do nothing.

This is quite strange… look at this source code snippet (compiled with -O3 using AVR-GCC v4.9.3):

        sleep_cpu(); // power down cpu (all clocks stopped)

        // cpu is now asleep - awaiting pushbutton LOW to fire INT0

        // debounce start button in case we use it for other
        // stuff such as tying it to the "select" pin.
        x = 10; // initial debounce count
        while (x--) {
            if (IO_PIN & BUTTON) { // if button bounced up...
                x = 10; // ...load debounce count
            }
            delay_msec (5);
        }

        asm (
            " nop\n" // plain nop 1
            " nop\n" // plain nop 2
            " nop\n" // plain nop 3
            " nop\n" // plain nop 4
            " nop\n" // plain nop 5
            " nop\n" // plain nop 6
            " nop\n" // plain nop 7
            " nop\n" // plain nop 8
        );

        __asm__ __volatile__ (
            " nop\n" // volatile nop 1
            " nop\n" // volatile nop 2
            " nop\n" // volatile nop 3
            " nop\n" // volatile nop 4
            " nop\n" // volatile nop 5
            " nop\n" // volatile nop 6
            " nop\n" // volatile nop 7
            " nop\n" // volatile nop 8
        );

        // send shutter1 or shutter2 command
        send_cmd (((IO_PIN & SELECT) ? shutter1 : shutter2), 3, 0);

        // wait for button to be RELEASED for 100 msec (prevent repeat if button held down)
        x = 10; // initial debounce count
        while (x--) {
            if (! (IO_PIN & BUTTON)) { // if button pressed...
                x = 10; // ...load debounce count
            }
            delay_msec (10);
        }
    }
}

…and here’s what’s in the listing (there are NO NOPs in the delay_msec() source!)

// msec delay using the IR timer interrupt (soft delays are
// inaccurate when an ISR is running and stealing CPU cycles!)
// valid delays are 0 to 65535 (0.0 to 65.535 seconds)
void delay_msec (uint16_t msec)
{
    while (msec--) { // for each millisecond
 1e6:   89 f7           brne    .-30        ; 0x1ca <main+0x54>
        // cpu is now asleep - awaiting pushbutton LOW to fire INT0

        // debounce start button in case we use it for other
        // stuff such as tying it to the "select" pin.
        x = 10; // initial debounce count
        while (x--) {
 1e8:   61 11           cpse    r22, r1
 1ea:   e9 cf           rjmp    .-46        ; 0x1be <main+0x48>
    ...
            " nop\n" // volatile nop 7
            " nop\n" // volatile nop 8
        );

        // send shutter1 or shutter2 command
        send_cmd (((IO_PIN & SELECT) ? shutter1 : shutter2), 3, 0);
 20c:   b1 99           sbic    0x16, 1 ; 22
 20e:   4c c0           rjmp    .+152       ; 0x2a8 <__stack+0x49>
 210:   ae e1           ldi r26, 0x1E   ; 30
 212:   b0 e0           ldi r27, 0x00   ; 0
 214:   63 e0           ldi r22, 0x03   ; 3
 216:   70 e0           ldi r23, 0x00   ; 0
 218:   9d 01           movw    r18, r26
        // incrementable pointer (*command is const)
        cmd_ptr = (uint16_t *) command;
        // this loop exits when read data = 0
        while (1) {
            // flag ISR busy
            busy = 1;

…and again here:

// msec delay using the IR timer interrupt (soft delays are
// inaccurate when an ISR is running and stealing CPU cycles!)
// valid delays are 0 to 65535 (0.0 to 65.535 seconds)
void delay_msec (uint16_t msec)
{
    while (msec--) { // for each millisecond
 294:   89 f7           brne    .-30        ; 0x278 <__stack+0x19>
        // send shutter1 or shutter2 command
        send_cmd (((IO_PIN & SELECT) ? shutter1 : shutter2), 3, 0);

        // wait for button to be RELEASED for 100 msec (prevent repeat if button held down)
        x = 10; // initial debounce count
        while (x--) {
 296:   8f ef           ldi r24, 0xFF   ; 255
 298:   89 0f           add r24, r25
 29a:   99 23           and r25, r25
 29c:   09 f4           brne    .+2         ; 0x2a0 <__stack+0x41>
 29e:   7e cf           rjmp    .-260       ; 0x19c <main+0x26>
            if (! (IO_PIN & BUTTON)) { // if button pressed...
 2a0:   b2 9b           sbis    0x16, 2 ; 22
 2a2:   e7 cf           rjmp    .-50        ; 0x272 <__stack+0x13>
 2a4:   98 2f           mov r25, r24
 2a6:   e6 cf           rjmp    .-52        ; 0x274 <__stack+0x15>
            " nop\n" // volatile nop 7
            " nop\n" // volatile nop 8
        );

        // send shutter1 or shutter2 command
        send_cmd (((IO_PIN & SELECT) ? shutter1 : shutter2), 3, 0);
 2a8:   a4 e7           ldi r26, 0x74   ; 116
 2aa:   b0 e0           ldi r27, 0x00   ; 0
 2ac:   b3 cf           rjmp    .-154       ; 0x214 <main+0x9e>

000002ae <_exit>:
ENDF _exit

AVR assembler is SO alien… I’m used to Motorola and Intel, where it makes sense. I have no idea WTH is going on here.

Are you certain there are no NOPs in there? . . .

 1ea:   e9 cf           rjmp    .-46        ; 0x1be <main+0x48>
    ...
            " nop\n" // volatile nop 7
            " nop\n" // volatile nop 8
        );

        // send shutter1 or shutter2 command
        send_cmd (((IO_PIN & SELECT) ? shutter1 : shutter2), 3, 0);
 20c:   b1 99           sbic    0x16, 1 ; 22
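The addresses suggest an answer. The rjmp at 0x1ea is a 2-byte instruction, so the next instruction starts at 0x1ec, yet the next line printed is at 0x20c. An AVR NOP is the word 0x0000 (see the "00 00 nop" in JimEli's listing). Assuming the listing came from GNU objdump, it skips runs of zero bytes by default and prints "..." in their place; its -z / --disassemble-zeroes option shows them. Here is the arithmetic as a small C sketch. The helper name is made up, and it assumes that the gap holds nothing but NOPs.

```c
#include <assert.h>
#include <stdint.h>

/* AVR instructions are 2 or 4 bytes long; NOP is the 2-byte word 0x0000. */
#define AVR_NOP_BYTES 2u

/* Hypothetical helper: number of NOPs between the end of one listed
   instruction (at last_addr, last_size bytes long) and the next listed
   address, assuming the gap contains only NOPs. */
unsigned nops_in_gap(uint32_t last_addr, uint32_t last_size, uint32_t next_addr)
{
    uint32_t gap_start = last_addr + last_size;
    assert(next_addr >= gap_start);
    assert((next_addr - gap_start) % AVR_NOP_BYTES == 0);
    return (unsigned)((next_addr - gap_start) / AVR_NOP_BYTES);
}
```

For the listing above, nops_in_gap(0x1ea, 2, 0x20c) gives (0x20c - 0x1ec) / 2 = 32 / 2 = 16. That matches 8 plain NOPs plus 8 volatile NOPs, which suggests both asm blocks were kept.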