Minimize interrupt handler overhead?

Hello everyone,

I’m trying to build a cheap DIY stereo-3D system using an Arduino. It reads the signal from the VGA line and generates the signal for LCD shutter glasses.

The main problem so far is that the interrupt handlers carry too much overhead to do per-line work in time.

Here’s an example program. It converts an interlaced 3D signal into anaglyph (red/cyan) by shorting R or G/B to ground (by switching the pin mode) on every other line.

Used pins:
HSYNC - digital 2 (external interrupt INT0)
VSYNC - digital 8 (using pin-change interrupt)
R,G,B - digital 4,5,6

void setup() {
  // disable Timer0
  TCCR0B = 0;
  
  // INT0 (hsync)
  EIMSK |= _BV(INT0); // Enable external interrupt 0
  EICRA = _BV(ISC01); // Set ext INT0 mode
  // PCINT0 (vsync)
  PCICR |= _BV(PCIE0); // Enable Pin Change interrupt 0
  PCMSK0 |= _BV(PCINT0); // Set pin to interrupt (8 = B0)
  
  EICRA = _BV(ISC00)|_BV(ISC01); // + hsync
  //EICRA = _BV(ISC01); // - hsync
}

// hsync interrupt
ISR(INT0_vect) {
  DDRD ^= 0x70;
}

// vsync interrupt
ISR(PCINT0_vect) {
  DDRD = 0x10;
}

void loop() {
}

So basically, when a new frame starts, it shorts red (0x10 = pin 4), and when a new line starts, it inverts which of the RGB lines are shorted.

[continued in second message, can’t post links/pics in my first post]

I’ve drawn a picture in Paint to make the effect obvious:

And here’s how it looks on screen:

The area should be solid cyan, but the Arduino can’t switch in time.

Resolution is 1680x1050x60, and transitional zone width is one Arduino clock pulse (edit: 3 actually), so saving ~5-10 cycles would help a lot.
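
A rough way to size the time budget is sketched below in C. It assumes the standard (non-reduced-blanking) VESA timing for 1680x1050 at 60 Hz, that is a 146.25 MHz pixel clock and 2240 pixel clocks per line including blanking. Those two numbers are an assumption, not something measured in this thread, so check them against the actual video mode. The function names are made up for illustration.

```c
#include <assert.h>

/* ASSUMED VESA timing for 1680x1050@60 (non-reduced blanking). */
#define PIXEL_CLOCK_HZ   146250000.0
#define PIXELS_PER_LINE  2240.0      /* visible pixels + blanking */

/* CPU cycles available between two hsync pulses. */
double cycles_per_line(double cpu_hz) {
    assert(cpu_hz > 0.0);
    double line_rate_hz = PIXEL_CLOCK_HZ / PIXELS_PER_LINE;
    return cpu_hz / line_rate_hz;
}

/* Screen pixels scanned out during one CPU cycle. */
double pixels_per_cycle(double cpu_hz) {
    assert(cpu_hz > 0.0);
    return PIXEL_CLOCK_HZ / cpu_hz;
}
```

Under these assumptions, cycles_per_line(16e6) comes out at about 245 and pixels_per_cycle(16e6) at about 9, so every cycle of handler latency moves the switch point by roughly nine pixels.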

I’ve disassembled the resulting .elf using “avr-objdump -S” and here’s how interrupt handlers look:

000000cc <__vector_1>:

// hsync interrupt
ISR(INT0_vect) {
  cc:      1f 92             push      r1
  ce:      0f 92             push      r0
  d0:      0f b6             in      r0, 0x3f      ; 63
  d2:      0f 92             push      r0
  d4:      11 24             eor      r1, r1
  d6:      8f 93             push      r24
  d8:      9f 93             push      r25
  DDRD ^= 0x70;
  da:      8a b1             in      r24, 0x0a      ; 10
  dc:      90 e7             ldi      r25, 0x70      ; 112
  de:      89 27             eor      r24, r25
  e0:      8a b9             out      0x0a, r24      ; 10
}
  e2:      9f 91             pop      r25
  e4:      8f 91             pop      r24
  e6:      0f 90             pop      r0
  e8:      0f be             out      0x3f, r0      ; 63
  ea:      0f 90             pop      r0
  ec:      1f 90             pop      r1
  ee:      18 95             reti

000000f0 <__vector_3>:

// vsync interrupt
ISR(PCINT0_vect) {
  f0:      1f 92             push      r1
  f2:      0f 92             push      r0
  f4:      0f b6             in      r0, 0x3f      ; 63
  f6:      0f 92             push      r0
  f8:      11 24             eor      r1, r1
  fa:      8f 93             push      r24
  DDRD = 0x10;
  fc:      80 e1             ldi      r24, 0x10      ; 16
  fe:      8a b9             out      0x0a, r24      ; 10
}
 100:      8f 91             pop      r24
 102:      0f 90             pop      r0
 104:      0f be             out      0x3f, r0      ; 63
 106:      0f 90             pop      r0
 108:      1f 90             pop      r1
 10a:      18 95             reti

The actual code is in the middle (2-4 instructions), and everything surrounding it is unnecessary code I’d like to strip away.
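
To put numbers on that overhead, the C sketch below adds up instruction cycles for the INT0 handler from the listing, using the per-instruction timings from the ATmega328P datasheet (push and pop take 2 cycles, reti takes 4, in/out/eor/ldi take 1). The enum and function names are hypothetical. It does not count the interrupt response time (hardware entry plus the jump in the vector table), which no handler rewrite can remove.

```c
#include <assert.h>
#include <stddef.h>

/* Instruction kinds that appear in the disassembled handlers. */
typedef enum { OP_PUSH, OP_POP, OP_IN, OP_OUT, OP_EOR, OP_LDI, OP_RETI } op_t;

/* Cycle cost per instruction (ATmega328P, 16-bit program counter). */
static int op_cycles(op_t op) {
    switch (op) {
    case OP_PUSH: case OP_POP: return 2;
    case OP_RETI:              return 4;
    default:                   return 1;
    }
}

/* Cycles from handler entry up to and including the first OUT. */
int cycles_until_first_out(const op_t *ops, size_t n) {
    assert(ops != NULL);
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += op_cycles(ops[i]);
        if (ops[i] == OP_OUT) return total;
    }
    return -1; /* no OUT in the sequence */
}

/* Cycles for the whole handler, including reti. */
int total_cycles(const op_t *ops, size_t n) {
    assert(ops != NULL);
    int total = 0;
    for (size_t i = 0; i < n; i++) total += op_cycles(ops[i]);
    return total;
}

/* Compiler-generated INT0 handler, transcribed from the listing. */
const op_t hsync_isr[] = {
    OP_PUSH, OP_PUSH, OP_IN, OP_PUSH, OP_EOR, OP_PUSH, OP_PUSH, /* prologue */
    OP_IN, OP_LDI, OP_EOR, OP_OUT,                               /* DDRD ^= 0x70 */
    OP_POP, OP_POP, OP_POP, OP_OUT, OP_POP, OP_POP, OP_RETI      /* epilogue */
};
const size_t hsync_isr_len = sizeof hsync_isr / sizeof hsync_isr[0];
```

By this count the write to DDRD completes 16 cycles after the handler is entered, 12 of which are prologue, and the whole handler takes 31 cycles.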

I haven’t found any useful info on this subject, except this one, which doesn’t help much.

Is there a way to minimize the handler function? Maybe a fully-assembler handler or something…

Please suggest anything you know, but being able to use the convenient Arduino bootloader would be a huge plus.

Thank you for any helpful info.

See information about ISR_NAKED, for example here: http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1261537186 But also elsewhere in avr-gcc tutorials/etc.

The actual code is in the middle (2-4 instructions), and everything surrounding it is unnecessary code I'd like to strip away.

Eight instructions in the first example and four in the second. You MUST preserve the registers that are used.

These two are very important...

  d6:      8f 93             push      r24
  d8:      9f 93             push      r25

Because this code uses r24 and r25...

  DDRD ^= 0x70;
  da:      8a b1             in      r24, 0x0a      ; 10
  dc:      90 e7             ldi      r25, 0x70      ; 112
  de:      89 27             eor      r24, r25
  e0:      8a b9             out      0x0a, r24      ; 10

These two are important to restore the two registers...

  e2:      9f 91             pop      r25
  e4:      8f 91             pop      r24

If you don't preserve the used registers ... well, let's just say you'll be pulling your hair out trying to find the bug.

Thank you a lot :)

I've found how variables can be bound to registers, so I've reserved some:

register unsigned char rgb_mask asm("r2");
register unsigned char r7 asm("r7");

void setup() {
  ...
  rgb_mask = 0x70;
}

and now my ISR looks like this:

// hsync interrupt
ISR(INT0_vect, ISR_NAKED) {
  asm(""
    //"push r0\n"
    //"in   r0, 0x3F\n" // save status
  );
  asm( // DDRD ^= 0x70;
    "in r7, 0x0A\n"
    "eor r7, r2\n"
    "out 0x0A, r7\n"
  );
  asm(
    //"out 0x3F, r0\n" // restore status
    //"pop r0\n"
    "reti\n"
  );
}

If I leave the r0 stuff uncommented, I can still see the transition on the left of the screen. Is it needed at all?

You should do something so that the status bits at the end of the ISR are the same as when you entered, or "weird stuff" will happen in the main code. (Hmm. I guess it's possible that the main code never actually looks at the status bits, but that would be a dangerous assumption!) You don't have to do the full get/push/pop/set sequence, though. You could reserve another register... I'm not near my datasheets at the moment, but I think the only instruction in your ISR that affects the status is the eor, and I think it only changes some of the bits. Other ways of preserving the limited set of bits might be faster...
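
To make the point about status bits concrete, here is a small host-side model in C of what eor does to SREG according to the AVR instruction set manual: it updates Z, N and S, clears V, and leaves C, H, T and I alone. This is a model for illustration only, not AVR code, and the function name is invented.

```c
#include <assert.h>
#include <stdint.h>

/* SREG bit positions on AVR. */
enum { SREG_C = 0, SREG_Z = 1, SREG_N = 2, SREG_V = 3,
       SREG_S = 4, SREG_H = 5, SREG_T = 6, SREG_I = 7 };

/* Model of "eor rd, rr": stores the result in *rd, returns the new SREG. */
uint8_t avr_eor(uint8_t *rd, uint8_t rr, uint8_t sreg) {
    assert(rd != 0);
    uint8_t result = (uint8_t)(*rd ^ rr);
    uint8_t n = (uint8_t)((result >> 7) & 1u);  /* N: bit 7 of the result */
    uint8_t z = (uint8_t)(result == 0);         /* Z: result is zero */
    uint8_t v = 0;                              /* V: always cleared */
    uint8_t s = (uint8_t)(n ^ v);               /* S = N xor V */

    /* Drop the four flags EOR writes; C, H, T and I pass through. */
    sreg &= (uint8_t)~((1u << SREG_Z) | (1u << SREG_N) |
                       (1u << SREG_V) | (1u << SREG_S));
    sreg |= (uint8_t)((z << SREG_Z) | (n << SREG_N) |
                      (v << SREG_V) | (s << SREG_S));
    *rd = result;
    return sreg;
}
```

For example, if the main code has just done a comparison that set Z and the interrupt then toggles 0x10 with 0x70, the model returns SREG with Z cleared, so a following branch in the main code would go the wrong way unless SREG is saved and restored.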

OK, got it. I guess the program worked properly without saving SREG only because loop() is empty.

I’ve optimized this program as far as I can, and even gained two spare clock cycles which I’ll use for branching later. Here’s how it looks now:

register unsigned char rgb_mask asm("r2");
register unsigned char rgb_ddr asm("r3");
register unsigned char r7 asm("r7");

void setup() {
  // disable Timer0
  TCCR0B = 0;
  
  // INT0 (hsync)
  EIMSK |= _BV(INT0); // Enable external interrupt 0
  EICRA = _BV(ISC01); // Set ext INT0 mode
  // PCINT0 (vsync)
  PCICR |= _BV(PCIE0); // Enable Pin Change interrupt 0
  PCMSK0 |= _BV(PCINT0); // Set pin to interrupt (8 = B0)
  
  EICRA = _BV(ISC00)|_BV(ISC01); // + hsync
  //EICRA = _BV(ISC01); // - hsync
  
  rgb_mask = 0x70;
}

// hsync interrupt
ISR(INT0_vect, ISR_NAKED) {
  asm volatile( // wasting spare time :)
    "nop\n"
    "nop\n"
  );
  asm(
    "out 0x0A, r3\n" // DDRD <- rgb_ddr
    // everything above must not modify SREG
    "in r7, 0x3F\n" // save SREG
    "eor r3, r2\n" // rgb_ddr ^= rgb_mask
    "out 0x3F, r7\n" // restore SREG
    "reti\n"
  );
}

// vsync interrupt
ISR(PCINT0_vect) {
  rgb_ddr = 0x10;
}

void loop() {
}
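
The key change in the hsync handler above is the ordering: it first writes the value prepared during the previous line, and only then computes the value for the next line, so the pin change happens after a short, fixed delay. A host-side C model of that ordering is sketched below. It is not AVR code, and the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t ddrd;      /* models the DDRD register */
    uint8_t rgb_ddr;   /* value to output on the next hsync (r3) */
    uint8_t rgb_mask;  /* bits toggled on every line (r2) */
} shutter_state;

/* vsync: restart the pattern with red (pin 4) shorted. */
void on_vsync(shutter_state *s) {
    assert(s != 0);
    s->rgb_ddr = 0x10;
}

/* hsync: output the prepared value first, then prepare the next one. */
void on_hsync(shutter_state *s) {
    assert(s != 0);
    s->ddrd = s->rgb_ddr;        /* time-critical step, done first */
    s->rgb_ddr ^= s->rgb_mask;   /* bookkeeping for the following line */
}
```

With rgb_mask set to 0x70, calling on_vsync once and then on_hsync repeatedly makes ddrd alternate between 0x10 and 0x60, the same pattern the original DDRD ^= 0x70 handler produced.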

Thanks again for the info!

Two more things come to mind...

  1. The processor can be run at 20 MHz (at 5V). I believe there are bootloaders / cores available that work correctly at this speed. Rather than try to hand optimize the ISR, your time may be better spent either buying or building something that runs at the higher clock speed.

  2. For this item, I'm speculating. I honestly have no idea if this is or is not an issue. Or, if anything I propose is a solution.

The compiler is VERY aggressive at removing dead code. Without active references to rgb_mask, rgb_ddr, and r7, the compiler may not honor your request to reserve the corresponding registers. Including volatile (if possible) may be a good idea...

volatile register unsigned char rgb_mask asm("r2");
volatile register unsigned char rgb_ddr asm("r3");
volatile register unsigned char r7 asm("r7");

But, register may imply volatile making my suggestion irrelevant. Or, you may need to include artificial references.

In any case, I suggest looking over the dumped assembly from time-to-time to ensure the compiler really does leave those registers alone. The optimizer can be a tricky thing; keep a watchful eye on it!

  1. This is a hobby project, so quality and minimum hardware cost are of higher priority than time spent. I plan to port the whole thing to an ATtiny2313 after it works reliably on the Arduino, and put it in a VGA pass-through dongle. I guess that with such tight timing requirements I'll have to hook a 20MHz crystal to it; initially I thought the internal 8MHz oscillator would be enough.

  2. I think the compiler doesn't "reserve" the register anyway; it just binds a variable to a register for faster access. The avr-libc manual says it should be safe to use r2..r15. While browsing the disassembly I've found that these registers are used only in Arduino's monster functions like Serial::begin (r5 and up) or Print::printNumber (r0..r31 - all of them!), but I won't need anything from Serial/Print anyway; debug "output" directly on the monitor is much more effective :)

And I've tried the "volatile" keyword before register vars, and it produces identical code.

Please let us know how you progress!