Pages: 1 [2]   Go Down
Author Topic: Mixing C with Assembler: Assembly code before push instructions in a function  (Read 1485 times)
0 Members and 1 Guest are viewing this topic.
SF Bay Area (USA)
Offline Offline
Tesla Member
***
Karma: 124
Posts: 6652
Strongly opinionated, but not official!
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Something like:
Code:
#include <avr/io.h>
#include <avr/interrupt.h>

volatile unsigned char myvar;

// define our ISR code to have NO prologue/epilogue
ISR(PCINT0_vect, ISR_NAKED)
{
    asm volatile(
// Note that none of these instructios affect "status register"
" push r31\n"        // Save R31
" in r31, 0x03\n"    // Read portB as soon as possible
" sts myvar, r31\n"  // Save it where gcc can see it
" rcall myint\n"     // Call a "normal" ISR function
" pop r31\n"         // restore r31
" reti\n"            // return from interrupt
);
}

// But define this func to have the "extra" prologue/etc needed by an ISR
void myint(void) __attribute__ ((signal));
void myint(void)
{
    if (myvar > 200) {
PORTD = 0;
    } else {
PORTD = 1;
    }
}

Which produces:
Code:
000000a6 <__vector_3>:
  a6:   ff 93           push    r31
  a8:   f3 b1           in      r31, 0x03       ; 3
  aa:   f0 93 00 01     sts     0x0100, r31
  ae:   02 d0           rcall   .+4             ; 0xb4 <myint>
  b0:   ff 91           pop     r31
  b2:   18 95           reti

000000b4 <myint>:
  b4:   1f 92           push    r1
  b6:   0f 92           push    r0
  b8:   0f b6           in      r0, 0x3f        ; 63
  ba:   0f 92           push    r0
  bc:   11 24           eor     r1, r1
  be:   8f 93           push    r24
  c0:   80 91 00 01     lds     r24, 0x0100
  c4:   89 3c           cpi     r24, 0xC9       ; 201
  c6:   10 f0           brcs    .+4             ; 0xcc <myint+0x18>
  c8:   1b b8           out     0x0b, r1        ; 11
  ca:   02 c0           rjmp    .+4             ; 0xd0 <myint+0x1c>
  cc:   81 e0           ldi     r24, 0x01       ; 1
  ce:   8b b9           out     0x0b, r24       ; 11
  d0:   8f 91           pop     r24
  d2:   0f 90           pop     r0
  d4:   0f be           out     0x3f, r0        ; 63
  d6:   0f 90           pop     r0
  d8:   1f 90           pop     r1
  da:   18 95           reti

I have my doubts that this is a good idea.  the V-usb code that bit-bangs low-speed USB doesn't need to be so aggressive sampling its inputs after an interrupt, for example.  Also, you've entered the realm where the required C code options are more mysterious than straight assembler would have been...
Logged

0
Offline Offline
Jr. Member
**
Karma: 4
Posts: 90
Arduino rocks
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset


I have my doubts that this is a good idea.  ...  Also, you've entered the realm where the required C code options are more mysterious than straight assembler would have been...

Yes, I'm reconsidering my original plan.  I think what I'll do is just read my register, first thing upon entering my ISR, and then call the function that I need.  Rather than reading the register inside the function.  Thus, there is only one set of push instructions in front of reading the register, instead of two.
Logged

0
Offline Offline
Jr. Member
**
Karma: 4
Posts: 90
Arduino rocks
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
but doesn't the compiler push all registers it's concerned with?

Yes.

But, there is an assumption that whoever called the ISR preserved certain registers.  That's the purpose of all those pushes; to do what a caller would have done if the ISR had been called like a normal function.


Got it.  And I see that, for example, if the ISR calls a C++ method, eg:
Code:
     4a0:       83 e7           ldi     r24, 0x73       ; 115
     4a2:       91 e0           ldi     r25, 0x01       ; 1
     4a4:       0e 94 8f 01     call    0x31e   ; 0x31e <_ZN9PCintPort5PCintEv>
...it uses r24 and r25.  There's no telling which ones may be used in circumstances like that, so if the ISR doesn't push/pop the registers, things could get quite sticky quite quickly.
Logged

0
Offline Offline
Jr. Member
**
Karma: 4
Posts: 90
Arduino rocks
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

I am calling a function from an interrupt, and due to timing issues I would like to preserve the state of a register very quickly when the interrupt takes place.

Why do you want to do this?

In order to grab and make available to the user of my library the state of the pin at interrupt.  The further the check of the pin state, the less likely (under certain circumstances, such as switch bounce) the port read is going to actually reflect the state of the port at the time of the interrupt.

Quote
Also, I don't know if you can do much better than the compiler. For example:

Code:
ISR (SPI_STC_vect)
{
// do nothing  
}

Generates:

Code:
...(deleted for brevity)...


I noticed that it's pretty lean in the case of an empty ISR.  In my circumstances, since I'm calling a method from inside the ISR, the preamble looks more like this:

Code:
ISR(PCINT1_vect) {
     442:       1f 92           push    r1
     444:       0f 92           push    r0
     446:       0f b6           in      r0, 0x3f        ; 63
     448:       0f 92           push    r0
     44a:       11 24           eor     r1, r1
     44c:       2f 93           push    r18
     44e:       3f 93           push    r19
     450:       4f 93           push    r20
     452:       5f 93           push    r21
     454:       6f 93           push    r22
     456:       7f 93           push    r23
     458:       8f 93           push    r24
     45a:       9f 93           push    r25
     45c:       af 93           push    r26
     45e:       bf 93           push    r27
     460:       ef 93           push    r30
     462:       ff 93           push    r31

I am thinking of actually making the ISR "naked", duplicating that bit of code, but inserting my little bit of work at the beginning of all of that.
« Last Edit: February 08, 2012, 11:49:15 pm by GreyGnome » Logged

Global Moderator
Offline Offline
Brattain Member
*****
Karma: 474
Posts: 18696
Lua rocks!
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Well in my test, the generated code for this sketch:

Code:
volatile byte savedPort;
byte bar;

void foo ()
  {
  bar = savedPort; 
  }
 
ISR (PCINT0_vect)
{
  savedPort = PINB;
  foo ();
}  // end of PCINT0_vect

void setup ()
{
   // pin change interrupts
  PCMSK0 = _BV (PCINT1);  // only want pin 9
  PCIFR  = _BV (PCIF0);   // clear any outstanding interrupts
  PCICR |= _BV (PCIE0);   // enable pin change interrupts for PCINT0..7
}

void loop () {}

Only had this before saving the port:

Code:
ISR (PCINT0_vect)
 100: 1f 92        push r1
 102: 0f 92        push r0
 104: 0f b6        in r0, 0x3f ; 63
 106: 0f 92        push r0
 108: 11 24        eor r1, r1
 10a: 8f 93        push r24
{
  savedPort = PINB;
 10c: 83 b1        in r24, 0x03 ; 3
 10e: 80 93 00 01 sts 0x0100, r24

Which isn't much worse than you can do with assembler. You have to save the status register, and before you do that you have to save R0.

Now I caution you about getting too carried away about shaving nanoseconds off ISRs. I did a few tests earlier, using this code:

Code:
ISR (PCINT0_vect)
{
  PORTB = 4;  // turn on pin 10
}  // end of PCINT0_vect


void setup ()
{
  digitalWrite (9, HIGH); // pullup
 
  // pin change interrupts
  PCMSK0 = _BV (PCINT1);  // only want pin 9
  PCIFR  = _BV (PCIF0);   // clear any outstanding interrupts
  PCICR |= _BV (PCIE0);   // enable pin change interrupts for PCINT0..7
 
  pinMode (10, OUTPUT);
  digitalWrite (10, LOW);
}

void loop () {}

Now, measuring the time taken between pin 9 going low (by my touching it to ground) and the time that D10 is brought high, as promptly as I could, I got these figures on consecutive tests:

Code:
1.2500 uS
1.4375 uS
1.5625 uS
1.3750 uS

That's a difference of 0.3125 uS (5 clock cycles) in what should be a repeatable experiment! I think at least 4 can be accounted for by the fact that main does a CALL to call loop, and CALL takes 4 clock cycles. Once the instruction starts, it has to finish before the interrupt can be serviced. Probably the 5th would be because of the exact time the interrupt occurred with reference to when the clock pulses.

So you already have something like 5 clock cycles of "jitter", and that is without doing anything else. For example, Timer 0 will cause an interrupt. Whether or not it is higher or lower priority than your pin change isn't the point. Once it starts, it has to finish. So that could be another 5 or 6 uS down the drain. And if your code calls millis() that turns interrupts off briefly. So that delays things too.

So with all these variables, whilst it is nice to design for a fast response, all this assembler code might be bit of an overkill.

Quote
The further the check of the pin state, the less likely (under certain circumstances, such as switch bounce) the port read is going to actually reflect the state of the port at the time of the interrupt.

Pin change interrupts can be deduced somewhat by comparing the now value to the previous one. Of course it could change back quickly, but switches don't tend to bounce that fast. And for other interrupts (eg. a falling level interrupt) if it fired you know what the new state is.
Logged

0
Offline Offline
Jr. Member
**
Karma: 4
Posts: 90
Arduino rocks
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Well in my test, the generated code ... Only had this before saving the port:

Code:
ISR (PCINT0_vect)
 100: 1f 92        push r1
 102: 0f 92        push r0
 104: 0f b6        in r0, 0x3f ; 63
 106: 0f 92        push r0
 108: 11 24        eor r1, r1
 10a: 8f 93        push r24
{
  savedPort = PINB;
 10c: 83 b1        in r24, 0x03 ; 3
 10e: 80 93 00 01 sts 0x0100, r24

Which isn't much worse than you can do with assembler. You have to save the status register, and before you do that you have to save R0.

In this small case I would agree.  As a matter of fact, it's exactly what I would do with assembler.  Interestingly, I have created a silly ISR that calls a function with a large number of varargs: 45 arguments, to be exact.  The ISR created this set of push instructions:

Code:
ISR(PCINT2_vect) {
 12c:   1f 92           push    r1
 12e:   0f 92           push    r0
 130:   0f b6           in      r0, 0x3f        ; 63
 132:   0f 92           push    r0
 134:   11 24           eor     r1, r1
 136:   2f 92           push    r2
 138:   3f 92           push    r3
 13a:   4f 92           push    r4
 13c:   5f 92           push    r5
 13e:   6f 92           push    r6
 140:   7f 92           push    r7
 142:   8f 92           push    r8
 144:   9f 92           push    r9
 146:   af 92           push    r10
 148:   bf 92           push    r11
 14a:   cf 92           push    r12
 14c:   df 92           push    r13
 14e:   ef 92           push    r14
 150:   ff 92           push    r15
 152:   0f 93           push    r16
 154:   1f 93           push    r17
 156:   2f 93           push    r18
 158:   3f 93           push    r19
 15a:   4f 93           push    r20
 15c:   5f 93           push    r21
 15e:   6f 93           push    r22
 160:   7f 93           push    r23
 162:   8f 93           push    r24
 164:   9f 93           push    r25
 166:   af 93           push    r26
 168:   bf 93           push    r27
 16a:   ef 93           push    r30
 16c:   ff 93           push    r31
 16e:   df 93           push    r29
 170:   cf 93           push    r28

Quote
Now I caution you about getting too carried away about shaving nanoseconds off ISRs. ...

So what I have learned, and what I suspected from everybody's dire warnings and general uncomfortableness with my idea, is that the compiler will create a variable number of push instructions, up to and including pushing the entire set of registers.  And, even though the only thing my ISR does is call another function, some of the registers used in the function are not pushed in the function.  The registers are pushed by the ISR.  Thus, trying to outsmart the compiler writers is probably a bad idea.  If I wanted to do the compiler's work and create a "naked" ISR, I would by necessity need to save all the registers, all the time.

So my little experiment puts the final smackdown on my idea.  Not to mention, again, everyone's general uncomfortableness with my plan.

I think the only time I would use ISR_NAKED is if I was writing a hand-assembled ISR and I knew the static set of registers that were going to be utilized.

Thanks everybody for taking your time.  I have learned a lot, believe it or not.  I am going to write my humble little ISR like this:

Code:
ISR(PCINT0_vect) {
    PCintPort::curr = portB.portInputReg;
    portB.PCint();
}

...Saving the input register's state into the PCintPort::curr static class variable as soon as is prudently possible.  I'll leave the assembly code for the super low-level guys.

Logged

Global Moderator
Offline Offline
Brattain Member
*****
Karma: 474
Posts: 18696
Lua rocks!
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

It's great you are looking into it in this depth.

As for the registers, if you get into the "mind" of the compiler you can probably save some pushes. For example, if your ISR calls another function, which itself calls another one, which might potentially call virtually anything, the compiler probably just pushes all registers to be on the safe side.

The potential problem with making a class is that, perhaps, it will use more registers than a straight function.

I've been having similar issues to you in another thread:

http://arduino.cc/forum/index.php/topic,91219.0.html

In the end, I had to rejig and redesign a bit to get any sort of reasonable performance. For example, using an 8-bit timer just caused too many interrupts for it to work reliably. Using the 16-bit timer gave much more reliable performance (there were less timer overflows).

And, remembering this thread, I saved an important counter (the current timer state) as the very first instruction in the ISR. Taking all that into account, I am pleased with the performance of the timer.
Logged

Global Moderator
Dallas
Offline Offline
Shannon Member
*****
Karma: 200
Posts: 12779
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
As for the registers, if you get into the "mind" of the compiler you can probably save some pushes.

That's been my experience.  You will be lucky to shave off a few machine instructions but, in my opinion, it is not worth the effort or the risk.  Your time and money would be better spent slapping a 20 MHz crystal on the processor.

The most dramatic difference is when you can eliminate all function calls.  That's when the optimizer can really go to town.

The only clear exception I have found is the the R1 register (the "zero value" register) setup.  The compiler always outputs the three machine instructions whether or not R1 is used.  Eliminating three machine instructions is certainly not worth hand-assembling an interrupt service routine.
Logged

Pages: 1 [2]   Go Up
Jump to: