
Topic: Using assembly NOP for accurate code timing...

LongTom

Evening all,

I've spent the last few evenings trying to figure out how to precisely time sections of my code with assembly NOP instructions.  I'm struggling because I'm not getting the results I'm expecting.  I'm using an Uno board...

Initially I was adding some NOP instructions to my code to see how the assembly is affected, and was surprised that the NOPs were not appearing in the listing as I expected.  A single instruction appears, but multiple instructions just put a "..." in the listing... Can anyone explain this?

I am already aware of the 4us resolution limit of the micros() function, but I'm still confused by the results I'm getting.

This code prints "took 20us":
Code:

unsigned long start =  micros();
delayMicroseconds(16);
unsigned long finished = micros();
Serial.print("took ");
Serial.print(finished - start);
Serial.println("us");


But this code prints "took 4us":
Code:

unsigned long start =  micros();
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
__asm__ __volatile__ ("nop\n\t");
unsigned long finished = micros();
Serial.print("took ");
Serial.print(finished - start);
Serial.println("us");


Can anyone explain why?  Are the NOP instructions being optimised out by the compiler? I didn't think that should happen...

If anyone can point me to a good resource for learning about the listing files generated by avr-objdump or for learning how I can understand and control the timing of my code that would be amazing.

Pulling my hair out!

Cheers,
Tom

holmes4

How many NOPs to the microsecond do you think there are?

Mark


nickgammon

You might want to use the -z flag for avr-objdump (since a NOP is zero):

Code:
  -z, --disassemble-zeroes       Do not skip blocks of zeroes when disassembling
Please post technical questions on the forum, not by personal message. Thanks!

More info: http://www.gammon.com.au/electronics

nickgammon

If you use IDE 1.5.8 (or higher) you can use DELAY_CYCLES to delay a specific number of cycles.

Code:

#define DELAY_CYCLES(n) __builtin_avr_delay_cycles(n)


The compiler cunningly generates optimized code to delay "n" cycles, rather than having to use lots of NOPs.
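
As a rough sketch of how this might be used: the builtin wants a compile-time constant cycle count, so a small macro can turn a time into cycles. `NS_TO_CYCLES` and `delay4us()` below are hypothetical helpers, not part of the Arduino core, and the 16 MHz `F_CPU` fallback is an assumption matching an Uno. The non-AVR branch is only a stand-in so the arithmetic can be checked on a PC.

```cpp
#ifndef F_CPU
#define F_CPU 16000000UL  // assumption: 16 MHz clock, as on an Uno
#endif

// Hypothetical helper: nanoseconds -> CPU cycles, rounded to nearest.
// Works in whole cycles per microsecond, so it assumes F_CPU is a
// multiple of 1 MHz.
#define NS_TO_CYCLES(ns) ((((ns) * (F_CPU / 1000000UL)) + 500UL) / 1000UL)

#if defined(__AVR__)
#define DELAY_CYCLES(n) __builtin_avr_delay_cycles(n)
#else
#define DELAY_CYCLES(n) ((void)(n))  // off-target stand-in: no real delay
#endif

// Busy-wait for about 4 us (64 cycles at 16 MHz), not counting the
// instructions around the call.
static inline void delay4us() {
  DELAY_CYCLES(NS_TO_CYCLES(4000UL));
}
```

The surrounding instructions (setting a pin, the call itself if it is not inlined) still cost cycles, so the listing is worth checking before trusting the total.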

nickgammon

Quote
I've spent the last few evenings trying to figure out how to precisely time sections of my code with assembly NOP instructions.
How does adding NOPs time code? It merely introduces delays.

westfw

Quote
was surprised that the NOPs were not appearing in the listing as I expected.
Be sure to include "-z" in your avr-objdump command, or the disassembler will show blocks of zeros (nops) as "...":

000000d0 <loop>:
  d0:   8f 92           push    r8
  d2:   9f 92           push    r9
  d4:   af 92           push    r10
  d6:   bf 92           push    r11
  d8:   cf 92           push    r12
  da:   df 92           push    r13
  dc:   ef 92           push    r14
  de:   ff 92           push    r15
  e0:   0e 94 f2 00     call    0x1e4   ; 0x1e4 <micros>
  e4:   6b 01           movw    r12, r22
  e6:   7c 01           movw    r14, r24
        ...
 108:   0e 94 f2 00     call    0x1e4   ; 0x1e4 <micros>

With -z, the same code disassembles with every NOP listed:


000000d0 <loop>:
  d0:   8f 92           push    r8
  d2:   9f 92           push    r9
  d4:   af 92           push    r10
  d6:   bf 92           push    r11
  d8:   cf 92           push    r12
  da:   df 92           push    r13
  dc:   ef 92           push    r14
  de:   ff 92           push    r15
  e0:   0e 94 f2 00     call    0x1e4   ; 0x1e4 <micros>
  e4:   6b 01           movw    r12, r22
  e6:   7c 01           movw    r14, r24
  e8:   00 00           nop
  ea:   00 00           nop
  ec:   00 00           nop
  ee:   00 00           nop
  f0:   00 00           nop
  f2:   00 00           nop
  f4:   00 00           nop
  f6:   00 00           nop
  f8:   00 00           nop
  fa:   00 00           nop
  fc:   00 00           nop
  fe:   00 00           nop
 100:   00 00           nop
 102:   00 00           nop
 104:   00 00           nop
 106:   00 00           nop
 108:   0e 94 f2 00     call    0x1e4   ; 0x1e4 <micros>



krupski

#7
Feb 13, 2015, 04:56 am Last Edit: Feb 13, 2015, 05:02 am by Krupski
At 16 MHz, each NOP takes 62.5 NANOseconds.

Doing the math, you find that you need 320 NOPs to generate a 20 usec delay.

Where do the numbers come from? A NOP takes 1 CPU cycle, so a NOP needs 1 / 16e6 seconds = 62.5 nsec.

You want 20 usec, so 20e-6 / 62.5e-9 = 320, therefore you need 320 NOPs.

(edit to add): I just tried it here and got 24 usec for 320 NOPs. I guess the extra 4 come from the overhead of getting the start time, then calculating the run time.
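
The arithmetic above can be written down as a small check. This is only a sketch: the function names are made up for illustration, and it assumes a 16 MHz clock with each NOP taking one cycle and occupying one 2-byte opcode (as in the avr-objdump listing earlier in the thread).

```cpp
#include <stdint.h>

// Assumptions: 16 MHz clock, NOP = 1 cycle and one 2-byte opcode.
constexpr uint32_t kCpuHz = 16000000UL;

// Number of single-cycle NOPs that fill `us` microseconds.
constexpr uint32_t nopsForMicros(uint32_t us) {
  return us * (kCpuHz / 1000000UL);
}

// Duration of `nops` NOPs in nanoseconds (62.5 ns each at 16 MHz).
constexpr double nanosForNops(uint32_t nops) {
  return nops * 1e9 / kCpuHz;
}

// Flash consumed by `nops` NOPs, in bytes.
constexpr uint32_t flashBytesForNops(uint32_t nops) {
  return nops * 2;
}
```

With these, `nopsForMicros(20)` is 320 and `flashBytesForNops(320)` is 640, while the 16 NOPs in the original test last only 1 usec, which is below the 4 usec step of micros().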
Gentlemen may prefer Blondes, but Real Men prefer Redheads!

larryd

One trap is that interrupts may affect the timing (though perhaps not in this case).
Turn interrupts off, then do nops, then turn interrupts on.
CLI
nop
.
.
.
nop
SEI
No technical PMs.
If you are asked a question, please respond with an answer.
If you are asked for more information, please supply it.
If you need clarification, ask for help.

nickgammon

Please post technical questions on the forum, not by personal message. Thanks!

More info: http://www.gammon.com.au/electronics

LongTom

Wow, thanks for all the helpful responses :D

Quote
How many NOPs to the microsecond do you think there are?

Mark
I think there should be 16 NOPs per microsecond.

Quote
Cycle counting info here.
Looks like a great resource, I will have a read tonight, thanks!

Quote
You might want to use the -z flag for avr-objdump (since a NOP is zero):

Code:
 -z, --disassemble-zeroes       Do not skip blocks of zeroes when disassembling

Amazing, this is exactly what I needed! I actually checked the switches but somehow didn't notice the -z option.  Tested and confirmed I can see the NOPs in the assembly output now :D

Quote
How does adding NOPs time code? It merely introduces delays.
I am trying to interface with serial-like hardware which requires very precise timing of high and low voltages on the line.  In some cases the line needs to be held high for exactly 4 microseconds, for example.  By looking at the generated assembly, carefully structuring the code and adding NOPs where required, I think I can achieve this?

Quote
At 16 MHz, each NOP takes 62.5 NANOseconds.

Doing the math, you find that you need 320 NOPs to generate a 20 usec delay.

Where do the numbers come from? A NOP takes 1 CPU cycle, so a NOP needs 1 / 16e6 seconds = 62.5 nsec.

You want 20 usec, so 20e-6 / 62.5e-9 = 320, therefore you need 320 NOPs.

(edit to add): I just tried it here and got 24 usec for 320 NOPs. I guess the extra 4 come from the overhead of getting the start time, then calculating the run time.

Thanks for this, it did help, though you are working in nanoseconds whereas I am working in microseconds.  This is my basic calculation:

16 MHz is 16000000 cycles per 1 second
1 second is 1000000 microseconds
16000000 / 1000000 = 16 cycles per 1 microsecond
1 / 16 = 0.0625 microseconds for one cycle
NOP is 1 cycle so 16 NOPs would be 1 microsecond

Quote
One trap is that interrupts may affect the timing (though perhaps not in this case).
Turn interrupts off, then do nops, then turn interrupts on.
CLI
nop
.
.
.
nop
SEI
Actually I was already using the noInterrupts() and interrupts() functions, I just forgot to mention it in my original post.  Thanks for pointing that out!

jboyton

Quote
I am trying to interface with serial-like hardware which requires very precise timing of high and low voltages on the line.  In some cases the line needs to be held high for exactly 4 microseconds, for example.  By looking at the generated assembly, carefully structuring the code and adding NOPs where required, I think I can achieve this?
What do you mean by "exactly"?

The resonator in a typical 16 MHz Uno doesn't cycle exactly 16 million times per second. There will be thousands more or fewer cycles each second, depending on the particular resonator and its temperature.

An alternative to using instructions is to use one of the timers. The timers can be programmed to increment at full clock speed.
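
A minimal sketch of the timer approach, assuming an ATmega328P-based Uno and that nothing else in the sketch needs Timer1 (the Servo library and PWM on pins 9 and 10 use it). `elapsedTicks()` is a made-up helper name; the register setup runs Timer1 in normal mode with no prescaler, so it counts CPU cycles.

```cpp
#include <stdint.h>

// Ticks between two 16-bit counter reads. Unsigned arithmetic keeps the
// result correct across one wrap from 65535 back to 0.
constexpr uint16_t elapsedTicks(uint16_t start, uint16_t end) {
  return static_cast<uint16_t>(end - start);
}

#if defined(__AVR__)
#include <Arduino.h>

void setup() {
  Serial.begin(9600);
  TCCR1A = 0;          // normal counting mode, no PWM
  TCCR1B = _BV(CS10);  // clock Timer1 from the CPU clock, no prescaler

  noInterrupts();
  uint16_t start = TCNT1;
  __asm__ __volatile__ ("nop\n\t" "nop\n\t" "nop\n\t" "nop\n\t");
  uint16_t end = TCNT1;
  interrupts();

  // The result includes the fixed cost of reading TCNT1; time an empty
  // section once and subtract that to get the section alone.
  Serial.println(elapsedTicks(start, end));
}

void loop() {}
#endif
```

At 16 MHz each tick is 62.5 ns and the 16-bit counter wraps about every 4.1 ms, so this only suits short sections unless overflows are counted as well.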

An example of instruction-based delays is the system SoftwareSerial library. I read somewhere that these routines were hand-tuned using an oscilloscope.

krupski

Quote
One trap is that interrupts may affect the timing (though perhaps not in this case).
Turn interrupts off, then do nops, then turn interrupts on.
CLI
nop
.
.
.
nop
SEI
Nope. Tried it - still an extra 4 usec.

krupski

Quote
What do you mean by "exactly"?

The resonator in a typical 16 MHz Uno doesn't cycle exactly 16 million times per second. There will be thousands more or fewer cycles each second, depending on the particular resonator and its temperature.

An alternative to using instructions is to use one of the timers. The timers can be programmed to increment at full clock speed.

An example of instruction based delays is the system SoftwareSerial library. I read somewhere that these routines were hand tuned using an oscilloscope.
I bought a bunch of 22.1184 MHz crystals (yes crystals, not resonators) in the 2 SMD package and replaced the resonators on all my Arduinos with crystals, then burned a new bootloader with the proper F_CPU.

They all work great, and 22.1184 MHz is an exact integer multiple of common baud rates, so my baud rates are exact, not approximate as they are with 16 MHz.
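
To illustrate the baud rate point, here is a hedged sketch of the divider arithmetic. It assumes the common 22.1184 MHz "baud rate" crystal value and the AVR USART in normal-speed mode, where the datasheet relation is UBRR = F_CPU / (16 * baud) - 1; the function names are invented for this example. The Arduino core can also use double-speed mode (divide by 8), which gives different error figures.

```cpp
#include <stdint.h>

// Divider the USART ends up using (UBRR + 1) in normal-speed mode,
// i.e. F_CPU / (16 * baud) rounded to the nearest integer.
constexpr uint32_t uartDivider(uint32_t fCpu, uint32_t baud) {
  return (fCpu + 8UL * baud) / (16UL * baud);
}

// Baud rate error in percent once the divider has been rounded.
// Negative means the real rate is slower than requested.
constexpr double baudErrorPercent(uint32_t fCpu, uint32_t baud) {
  return (static_cast<double>(fCpu) / (16.0 * uartDivider(fCpu, baud)) / baud
          - 1.0) * 100.0;
}
```

For 115200 baud, `uartDivider(22118400UL, 115200UL)` is exactly 12 with zero error, whereas at 16 MHz the divider rounds to 9 and the real rate comes out roughly 3.5% slow.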

jboyton

Quote
Nope. Tried it - still an extra 4 usec.
Using micros() and a single iteration you're never going to get anything better than +/- 4us.
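
A simplified model shows why. It assumes a 16 MHz clock and that micros() is driven by a timer ticking once every 64 CPU cycles, so readings move in 4 us steps; the names are hypothetical and the time taken by the micros() call itself is ignored.

```cpp
#include <stdint.h>

// Model assumptions: one timer tick per 64 CPU cycles, 4 us per tick.
constexpr uint32_t kCyclesPerTick = 64;
constexpr uint32_t kMicrosPerTick = 4;

// What micros() would report at a given absolute CPU cycle count.
constexpr uint32_t modelMicros(uint32_t cycle) {
  return (cycle / kCyclesPerTick) * kMicrosPerTick;
}

// Reported duration of a section `lengthCycles` long that starts
// `phaseCycles` after the most recent timer tick.
constexpr uint32_t measuredMicros(uint32_t phaseCycles, uint32_t lengthCycles) {
  return modelMicros(phaseCycles + lengthCycles) - modelMicros(phaseCycles);
}
```

In this model a 16-cycle (1 us) section reads as 0 or 4 us depending on where it starts relative to the tick, and a section slightly longer than 320 cycles reads as 20 or 24 us. Timing many iterations and dividing is the usual way around that.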
