extra ops between loops?

I wrote a simple sketch that does nothing but switch a pin on and off via port manipulation, with a one-microsecond delay. I then hooked it up to an oscilloscope and noticed that extra time seemed to be inserted between loops.

#define VIDEO_ON PORTB|=B00000010
#define VIDEO_OFF PORTB&=B11111101

#define NOP asm("nop")

inline void one_us()
{
  // 1us / 62.5ns = 16
  NOP; NOP; NOP; NOP;
  NOP; NOP; NOP; NOP;
  NOP; NOP; NOP; NOP;
  NOP; NOP;
  //NOP; remove one to offset time for next instruction to complete
  //NOP; remove one more due to switching time, but will
  //result in slightly less than 1 us (but less than 62.5ns off)
  //~0.996us vs 1.06us, slightly less drift
}

void loop()
{

  VIDEO_ON;
  one_us();
  VIDEO_OFF;
  NOP;
}

If my information is correct, one NOP should take 62.5ns; however, the time between the pin going low and going high again is more than 62.5ns. The time between high and low is about 1us, as planned.

I expect that there's going to be a jump back to the top, which takes a couple of instructions, but even then I'm measuring ~800ns between loop end and start.

What's going on between loops? Is there any way to cut down this extra time?

Well, in between the VIDEO_OFF and VIDEO_ON there is a call to a non-inline function, namely loop()!

What happens if you change loop() to:

void loop()
{
  while (true)
  {
    VIDEO_ON;
    one_us();
    VIDEO_OFF;
    NOP;
  }
}

?

Mikal

Yes, I realize that it calls loop(), which takes some time, but it can't possibly take 800ns! That's at least 10 cycles.

Interesting, I tried putting in that while loop and it dropped to 300ns. That's about 4~5 cycles, pretty reasonable for a while loop...

edit: I also tried modifying it to:

void loop()
{
  asm("VLoop:");
  VIDEO_ON;
  one_us();
  VIDEO_OFF;
  NOP;
  asm("jmp VLoop");
}

and got the same result (~300ns between pin going low and back high)

Hmm. I see:

000000a8 <loop>:
  a8:      29 9a             sbi      0x05, 1      ; 5
      ...
  c6:      29 98             cbi      0x05, 1      ; 5
  c8:      00 00             nop
  ca:      08 95             ret

000000cc <main>:
  cc:      0e 94 0e 01       call      0x21c      ; 0x21c <init>
  d0:      0e 94 54 00       call      0xa8      ; 0xa8 <loop>
  d4:      fd cf             rjmp      .-6            ; 0xd0 <main+0x4>

Note that "call" and "ret" are both 4-clock instructions, while rjmp is 2 clocks, so your "at least 10 cycles" is awfully close. sbi and cbi are also among the 2-cycle instructions.
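To make the cycle accounting concrete, here is a rough sketch that adds up the instructions between the cbi (pin low) and the next sbi (pin high) in the disassembly above. It assumes a 16 MHz clock (62.5ns per cycle), that the pin changes at the same point within sbi and cbi, and that the while(true) version compiles to a single rjmp back to the top. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 16 MHz clock, so one cycle lasts 62.5 ns. */
#define NS_PER_CYCLE 62.5

/* Cycle counts as quoted in the post above. */
enum { CYC_NOP = 1, CYC_SBI = 2, CYC_RJMP = 2, CYC_CALL = 4, CYC_RET = 4 };

/* Add up the cycle counts of a straight-line instruction sequence. */
unsigned sum_cycles(const unsigned *cycles, size_t count)
{
    unsigned total = 0;
    assert(cycles != NULL);
    for (size_t i = 0; i < count; i++)
        total += cycles[i];
    return total;
}

/* Convert a cycle count to nanoseconds at the assumed clock rate. */
double cycles_to_ns(unsigned cycles)
{
    return cycles * NS_PER_CYCLE;
}

/* Original sketch: nop, ret from loop(), rjmp in main, call loop(), sbi. */
unsigned gap_with_call(void)
{
    const unsigned path[] = { CYC_NOP, CYC_RET, CYC_RJMP, CYC_CALL, CYC_SBI };
    return sum_cycles(path, sizeof path / sizeof path[0]);
}

/* while(true) version (assumed): nop, rjmp back to the top, sbi. */
unsigned gap_with_inner_loop(void)
{
    const unsigned path[] = { CYC_NOP, CYC_RJMP, CYC_SBI };
    return sum_cycles(path, sizeof path / sizeof path[0]);
}
```

Under those assumptions, gap_with_call() comes to 13 cycles (812.5ns) and gap_with_inner_loop() to 5 cycles (312.5ns), which lines up with the ~800ns and ~300ns seen on the scope.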

If you're going to do video, you are probably going to need a more extensive set of tools than those provided by the Arduino environment. The above disassembly was obtained with "avr-objdump -d", which is included in the tools packaged with Arduino (.../hardware/tools/avr/bin/avr-objdump on my Mac. YMMV.)

(You also haven't done anything about the periodic timer tick, which will come along approximately every millisecond and take much more time.)

FYI (I had it handy):

// disable timer 0 overflow interrupt
TIMSK0&=!(1<<TOIE0);

You must use bitwise negation (~) rather than logical negation (!) here. !(1<<TOIE0) evaluates to 0, so the &= would clear every bit in TIMSK0, not just TOIE0:

TIMSK0&=~(1<<TOIE0);
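The difference is easy to check off-target. The sketch below is plain C for a PC, not AVR code: a uint8_t value stands in for TIMSK0, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clear one bit using bitwise negation: only that bit is affected. */
uint8_t clear_bit_bitwise(uint8_t reg, unsigned bit)
{
    assert(bit < 8);
    return (uint8_t)(reg & ~(1u << bit));
}

/* Same expression with logical negation: (1u << bit) is nonzero,
 * so !(1u << bit) is always 0 and the AND wipes the whole register. */
uint8_t clear_bit_logical(uint8_t reg, unsigned bit)
{
    assert(bit < 8);
    return (uint8_t)(reg & !(1u << bit));
}
```

Starting from 0x07 and clearing bit 0, the bitwise version leaves 0x06 while the logical version leaves 0x00, so any other interrupt enables in the same register would be lost as well.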

Thanks, I forgot to consider that certain instructions take more cycles.

p.s. I've been doing cli(), does that actually disable interrupts?


Yes, but note that cli() disables all interrupts. This may be an issue if serial I/O is occurring at the same time and interrupts stay disabled for more than 10 to 15 bit times.
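To get a feel for how long "10 to 15 bit times" is, here is a small sketch that converts a number of bit times to nanoseconds for a given baud rate. The baud rates mentioned afterwards are only examples, and the function name is made up.

```c
#include <assert.h>

/* Duration of `bits` bit times at `baud` bits per second,
 * in whole nanoseconds (fraction truncated). */
unsigned long long bit_times_to_ns(unsigned long baud, unsigned bits)
{
    assert(baud > 0);
    return (unsigned long long)bits * 1000000000ULL / baud;
}
```

At 9600 baud, 10 bit times is roughly 1.04ms and 15 bit times is about 1.56ms. At 115200 baud, 10 bit times is only about 87us, so the window shrinks quickly as the baud rate goes up.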