Go Down

Topic: Mixing C with Assembler: Assembly code before push instructions in a function (Read 1 time) previous topic - next topic

westfw

Something like:
Code: [Select]
#include <avr/io.h>
#include <avr/interrupt.h>

volatile unsigned char myvar;

// define our ISR code to have NO prologue/epilogue
ISR(PCINT0_vect, ISR_NAKED)
{
    asm volatile(
// Note that none of these instructios affect "status register"
" push r31\n"        // Save R31
" in r31, 0x03\n"    // Read portB as soon as possible
" sts myvar, r31\n"  // Save it where gcc can see it
" rcall myint\n"     // Call a "normal" ISR function
" pop r31\n"         // restore r31
" reti\n"            // return from interrupt
);
}

// But define this func to have the "extra" prologue/etc needed by an ISR
void myint(void) __attribute__ ((signal));
void myint(void)
{
    if (myvar > 200) {
PORTD = 0;
    } else {
PORTD = 1;
    }
}


Which produces:
Code: [Select]
000000a6 <__vector_3>:
  a6:   ff 93           push    r31
  a8:   f3 b1           in      r31, 0x03       ; 3
  aa:   f0 93 00 01     sts     0x0100, r31
  ae:   02 d0           rcall   .+4             ; 0xb4 <myint>
  b0:   ff 91           pop     r31
  b2:   18 95           reti

000000b4 <myint>:
  b4:   1f 92           push    r1
  b6:   0f 92           push    r0
  b8:   0f b6           in      r0, 0x3f        ; 63
  ba:   0f 92           push    r0
  bc:   11 24           eor     r1, r1
  be:   8f 93           push    r24
  c0:   80 91 00 01     lds     r24, 0x0100
  c4:   89 3c           cpi     r24, 0xC9       ; 201
  c6:   10 f0           brcs    .+4             ; 0xcc <myint+0x18>
  c8:   1b b8           out     0x0b, r1        ; 11
  ca:   02 c0           rjmp    .+4             ; 0xd0 <myint+0x1c>
  cc:   81 e0           ldi     r24, 0x01       ; 1
  ce:   8b b9           out     0x0b, r24       ; 11
  d0:   8f 91           pop     r24
  d2:   0f 90           pop     r0
  d4:   0f be           out     0x3f, r0        ; 63
  d6:   0f 90           pop     r0
  d8:   1f 90           pop     r1
  da:   18 95           reti


I have my doubts that this is a good idea.  the V-usb code that bit-bangs low-speed USB doesn't need to be so aggressive sampling its inputs after an interrupt, for example.  Also, you've entered the realm where the required C code options are more mysterious than straight assembler would have been...

GreyGnome



I have my doubts that this is a good idea.  ...  Also, you've entered the realm where the required C code options are more mysterious than straight assembler would have been...


Yes, I'm reconsidering my original plan.  I think what I'll do is just read my register, first thing upon entering my ISR, and then call the function that I need.  Rather than reading the register inside the function.  Thus, there is only one set of push instructions in front of reading the register, instead of two.

GreyGnome


Quote
but doesn't the compiler push all registers it's concerned with?


Yes.

But, there is an assumption that whoever called the ISR preserved certain registers.  That's the purpose of all those pushes; to do what a caller would have done if the ISR had been called like a normal function.



Got it.  And I see that, for example, if the ISR calls a C++ method, eg:
Code: [Select]

     4a0:       83 e7           ldi     r24, 0x73       ; 115
     4a2:       91 e0           ldi     r25, 0x01       ; 1
     4a4:       0e 94 8f 01     call    0x31e   ; 0x31e <_ZN9PCintPort5PCintEv>

...it uses r24 and r25.  There's no telling which ones may be used in circumstances like that, so if the ISR doesn't push/pop the registers, things could get quite sticky quite quickly.

GreyGnome


I am calling a function from an interrupt, and due to timing issues I would like to preserve the state of a register very quickly when the interrupt takes place.

Why do you want to do this?


In order to grab and make available to the user of my library the state of the pin at interrupt.  The further the check of the pin state, the less likely (under certain circumstances, such as switch bounce) the port read is going to actually reflect the state of the port at the time of the interrupt.

Quote

Also, I don't know if you can do much better than the compiler. For example:

Code: [Select]
ISR (SPI_STC_vect)
{
// do nothing  
}


Generates:

Code: [Select]
...(deleted for brevity)...



I noticed that it's pretty lean in the case of an empty ISR.  In my circumstances, since I'm calling a method from inside the ISR, the preamble looks more like this:

Code: [Select]

ISR(PCINT1_vect) {
    442:       1f 92           push    r1
    444:       0f 92           push    r0
    446:       0f b6           in      r0, 0x3f        ; 63
    448:       0f 92           push    r0
    44a:       11 24           eor     r1, r1
    44c:       2f 93           push    r18
    44e:       3f 93           push    r19
    450:       4f 93           push    r20
    452:       5f 93           push    r21
    454:       6f 93           push    r22
    456:       7f 93           push    r23
    458:       8f 93           push    r24
    45a:       9f 93           push    r25
    45c:       af 93           push    r26
    45e:       bf 93           push    r27
    460:       ef 93           push    r30
    462:       ff 93           push    r31


I am thinking of actually making the ISR "naked", duplicating that bit of code, but inserting my little bit of work at the beginning of all of that.

Nick Gammon

Well in my test, the generated code for this sketch:

Code: [Select]
volatile byte savedPort;
byte bar;

void foo ()
  {
  bar = savedPort; 
  }
 
ISR (PCINT0_vect)
{
  savedPort = PINB;
  foo ();
}  // end of PCINT0_vect

void setup ()
{
   // pin change interrupts
  PCMSK0 = _BV (PCINT1);  // only want pin 9
  PCIFR  = _BV (PCIF0);   // clear any outstanding interrupts
  PCICR |= _BV (PCIE0);   // enable pin change interrupts for PCINT0..7
}

void loop () {}


Only had this before saving the port:

Code: [Select]
ISR (PCINT0_vect)
100: 1f 92        push r1
102: 0f 92        push r0
104: 0f b6        in r0, 0x3f ; 63
106: 0f 92        push r0
108: 11 24        eor r1, r1
10a: 8f 93        push r24
{
  savedPort = PINB;
10c: 83 b1        in r24, 0x03 ; 3
10e: 80 93 00 01 sts 0x0100, r24


Which isn't much worse than you can do with assembler. You have to save the status register, and before you do that you have to save R0.

Now I caution you about getting too carried away about shaving nanoseconds off ISRs. I did a few tests earlier, using this code:

Code: [Select]
ISR (PCINT0_vect)
{
  PORTB = 4;  // turn on pin 10
}  // end of PCINT0_vect


void setup ()
{
  digitalWrite (9, HIGH); // pullup

  // pin change interrupts
  PCMSK0 = _BV (PCINT1);  // only want pin 9
  PCIFR  = _BV (PCIF0);   // clear any outstanding interrupts
  PCICR |= _BV (PCIE0);   // enable pin change interrupts for PCINT0..7
 
  pinMode (10, OUTPUT);
  digitalWrite (10, LOW);
}

void loop () {}


Now, measuring the time taken between pin 9 going low (by my touching it to ground) and the time that D10 is brought high, as promptly as I could, I got these figures on consecutive tests:

Code: [Select]

1.2500 uS
1.4375 uS
1.5625 uS
1.3750 uS


That's a difference of 0.3125 uS (5 clock cycles) in what should be a repeatable experiment! I think at least 4 can be accounted for by the fact that main does a CALL to call loop, and CALL takes 4 clock cycles. Once the instruction starts, it has to finish before the interrupt can be serviced. Probably the 5th would be because of the exact time the interrupt occurred with reference to when the clock pulses.

So you already have something like 5 clock cycles of "jitter", and that is without doing anything else. For example, Timer 0 will cause an interrupt. Whether or not it is higher or lower priority than your pin change isn't the point. Once it starts, it has to finish. So that could be another 5 or 6 uS down the drain. And if your code calls millis() that turns interrupts off briefly. So that delays things too.

So with all these variables, whilst it is nice to design for a fast response, all this assembler code might be bit of an overkill.

Quote
The further the check of the pin state, the less likely (under certain circumstances, such as switch bounce) the port read is going to actually reflect the state of the port at the time of the interrupt.


Pin change interrupts can be deduced somewhat by comparing the now value to the previous one. Of course it could change back quickly, but switches don't tend to bounce that fast. And for other interrupts (eg. a falling level interrupt) if it fired you know what the new state is.
http://www.gammon.com.au/electronics

GreyGnome


Well in my test, the generated code ... Only had this before saving the port:

Code: [Select]
ISR (PCINT0_vect)
100: 1f 92        push r1
102: 0f 92        push r0
104: 0f b6        in r0, 0x3f ; 63
106: 0f 92        push r0
108: 11 24        eor r1, r1
10a: 8f 93        push r24
{
  savedPort = PINB;
10c: 83 b1        in r24, 0x03 ; 3
10e: 80 93 00 01 sts 0x0100, r24


Which isn't much worse than you can do with assembler. You have to save the status register, and before you do that you have to save R0.


In this small case I would agree.  As a matter of fact, it's exactly what I would do with assembler.  Interestingly, I have created a silly ISR that calls a function with a large number of varargs: 45 arguments, to be exact.  The ISR created this set of push instructions:

Code: [Select]

ISR(PCINT2_vect) {
12c:   1f 92           push    r1
12e:   0f 92           push    r0
130:   0f b6           in      r0, 0x3f        ; 63
132:   0f 92           push    r0
134:   11 24           eor     r1, r1
136:   2f 92           push    r2
138:   3f 92           push    r3
13a:   4f 92           push    r4
13c:   5f 92           push    r5
13e:   6f 92           push    r6
140:   7f 92           push    r7
142:   8f 92           push    r8
144:   9f 92           push    r9
146:   af 92           push    r10
148:   bf 92           push    r11
14a:   cf 92           push    r12
14c:   df 92           push    r13
14e:   ef 92           push    r14
150:   ff 92           push    r15
152:   0f 93           push    r16
154:   1f 93           push    r17
156:   2f 93           push    r18
158:   3f 93           push    r19
15a:   4f 93           push    r20
15c:   5f 93           push    r21
15e:   6f 93           push    r22
160:   7f 93           push    r23
162:   8f 93           push    r24
164:   9f 93           push    r25
166:   af 93           push    r26
168:   bf 93           push    r27
16a:   ef 93           push    r30
16c:   ff 93           push    r31
16e:   df 93           push    r29
170:   cf 93           push    r28


Quote

Now I caution you about getting too carried away about shaving nanoseconds off ISRs. ...


So what I have learned, and what I suspected from everybody's dire warnings and general uncomfortableness with my idea, is that the compiler will create a variable number of push instructions, up to and including pushing the entire set of registers.  And, even though the only thing my ISR does is call another function, some of the registers used in the function are not pushed in the function.  The registers are pushed by the ISR.  Thus, trying to outsmart the compiler writers is probably a bad idea.  If I wanted to do the compiler's work and create a "naked" ISR, I would by necessity need to save all the registers, all the time.

So my little experiment puts the final smackdown on my idea.  Not to mention, again, everyone's general uncomfortableness with my plan.

I think the only time I would use ISR_NAKED is if I was writing a hand-assembled ISR and I knew the static set of registers that were going to be utilized.

Thanks everybody for taking your time.  I have learned a lot, believe it or not.  I am going to write my humble little ISR like this:

Code: [Select]
ISR(PCINT0_vect) {
    PCintPort::curr = portB.portInputReg;
    portB.PCint();
}


...Saving the input register's state into the PCintPort::curr static class variable as soon as is prudently possible.  I'll leave the assembly code for the super low-level guys.


Nick Gammon

It's great you are looking into it in this depth.

As for the registers, if you get into the "mind" of the compiler you can probably save some pushes. For example, if your ISR calls another function, which itself calls another one, which might potentially call virtually anything, the compiler probably just pushes all registers to be on the safe side.

The potential problem with making a class is that, perhaps, it will use more registers than a straight function.

I've been having similar issues to you in another thread:

http://arduino.cc/forum/index.php/topic,91219.0.html

In the end, I had to rejig and redesign a bit to get any sort of reasonable performance. For example, using an 8-bit timer just caused too many interrupts for it to work reliably. Using the 16-bit timer gave much more reliable performance (there were less timer overflows).

And, remembering this thread, I saved an important counter (the current timer state) as the very first instruction in the ISR. Taking all that into account, I am pleased with the performance of the timer.
http://www.gammon.com.au/electronics

Coding Badly

Quote
As for the registers, if you get into the "mind" of the compiler you can probably save some pushes.


That's been my experience.  You will be lucky to shave off a few machine instructions but, in my opinion, it is not worth the effort or the risk.  Your time and money would be better spent slapping a 20 MHz crystal on the processor.

The most dramatic difference is when you can eliminate all function calls.  That's when the optimizer can really go to town.

The only clear exception I have found is the the R1 register (the "zero value" register) setup.  The compiler always outputs the three machine instructions whether or not R1 is used.  Eliminating three machine instructions is certainly not worth hand-assembling an interrupt service routine.

Go Up