Mixing C with Assembler: Assembly code before push instructions in a function

Hello,
I am calling a function from an interrupt, and due to timing issues I would like to preserve the state of a register very quickly when the interrupt takes place.

The C compiler generates a bunch of code to preserve registers when the ISR (or any function) is called. I want to insert some assembler which is to be run before those registers are preserved. Is this possible? Is there some directive I can add to my asm line to tell it exactly where to inject itself in the C code? Thanks.

The code looks like this. I know the push instructions are there to preserve registers. Consider my asm to be pseudo-code; I will figure out how to use __tmp_reg__ if I decide that my efforts are worth it. Thanks:

ISR(PCINT2_vect) {
    uint8_t value;
    // I want this to take place before the compiler-created push instructions.
    asm volatile("in %0, %1" : "=r" (value) : "I" (_SFR_IO_ADDR(PORTD)));
    portD.PCint();
}

Do you understand the purpose of the push instructions?

I want to insert some assembler which is to be run before those registers are preserved.

Does the "some assembler" include an in instruction (like your pseudo-code)?

I want to insert some assembler which is to be run before those registers are preserved.

Why? The registers are copied, not altered. You can still access the register afterwards, and get the same value.

GreyGnome:
I would like to preserve the state of a register very quickly when the interrupt takes place.

I guess you mean you want to preserve the state of an I/O register at the earliest possible point in an ISR.

Have you thought of a way to do that without using any processor registers? Since you can't know what processor registers are in use you mustn't use them until you have preserved their state so that they can be restored afterwards.

Yes, and yes. I plan on taking advantage of this:

Register r0 may be freely used by your assembler code and need not be restored at the end of your code. It's a good idea to use __tmp_reg__ ... instead of r0 or r1, just in case a new compiler version changes the register usage definitions.

PeterH:

GreyGnome:
I would like to preserve the state of a register very quickly when the interrupt takes place.

I guess you mean you want to preserve the state of an I/O register at the earliest possible point in an ISR.

Exactly! You hit the nail on the head. I am writing a library and I am preserving the state of an I/O register- ostensibly at the time of the interrupt- but with all the push instructions, it takes place some 4 microseconds (I figure) after the actual interrupt.

Have you thought of a way to do that without using any processor registers? Since you can't know what processor registers are in use you mustn't use them until you have preserved their state so that they can be restored afterwards.

Not really. I am aware of this:

Register r0 may be freely used by your assembler code and need not be restored at the end of your code. It's a good idea to use __tmp_reg__ and __zero_reg__ instead of r0 or r1, just in case a new compiler version changes the register usage definitions.

So I believe I can safely use r0. All I need is to use the in instruction, then save that register to RAM, and I should be good to go.

That sounds viable, although I don't know enough about the behaviour of those temp registers to know how long their contents will be preserved. For example, if they're trampled by a push operation then using one of these as a scratch area while you do the push would not work very well. :slight_smile:

The only way I can think of to achieve that solution is to write your ISR handler in assembler, have this save your I/O register in your temp register and then call a 'C'/C++ function for whatever logic you want to code in higher level languages. You sound as if you're comfortable writing in assembler, but if you aren't you could get a starting point by listing the assembler generated for your existing handler.

PeterH:
... if they're trampled by a push operation then using one of these as a scratch area while you do the push would not work very well.

... write your ISR handler in assembler, have this save your I/O register in your temp register and then call a 'C'/C++ function for whatever logic you want to code in higher level languages. You sound as if you're comfortable writing in assembler, but if you aren't you could get a starting point by listing the assembler generated for your existing handler.

Excellent suggestion, thanks for the tips! I am not comfortable writing in assembler, so I am going to list the assembler generated and work from there :slight_smile: . But, I am comfortable living life on the edge! So I'm going to try it.

My intention is to get the heck out of the way of the compiler as quickly as possible, so I'm thinking I might need 2 assembly instructions: 1 to read the register, the other to save it to a C variable (i.e., RAM).

I notice that my idea is not so far-fetched; as a matter of fact I found this link: avr-libc: <avr/interrupt.h>: Interrupts. ISR_NAKED is what I want. So my idea is the following, in pseudo-code:

ISR(PCINT0_vect, ISR_NAKED)
{
    asm (read_in_register_to_r0)
    asm (store_to_ram)
    // The following function call will preserve the registers on the stack.  Since I haven't
    // mucked with any but r0 thus far in my ISR, I should be ok.  The function call will
    // also restore the necessary registers.
    C_function_call();
    asm (reti);
}

I notice that the ISR pushes a whole bunch of registers on the stack. Function calls do not necessarily save the same large set. I assume that's because the compiler is smart enough to know that a function may not stomp on all the registers, so it's judicious in its saving. Thus, it seems to me the ISR is storing such a large set because it doesn't have any foreknowledge of what it should or should not preserve, so it paints with a large brush.

But if the only thing my ISR does is as listed: 1. my assembly, 2. call a C function, 3. return, then I assume that the heavy lifting of pushing and popping the necessary registers, if handled solely by the C function call in the ISR, should be sufficient. There are no other dragons lying about, are there? Again, assuming my ISR is just that lean.

Right now my ISR looks like this:

ISR(PCINT0_vect) {
        portD.PCint();
}

...so it's doing two bunches of push/pop operations: The entry into the ISR, and the entry into the function call, which is overkill.

...Just checking my assumptions...

GreyGnome:

Register r0 may be freely used by your assembler code and need not be restored at the end of your code. It's a good idea to use __tmp_reg__ and __zero_reg__ instead of r0 or r1, just in case a new compiler version changes the register usage definitions.

So I believe I can safely use r0. All I need is to use the in instruction, then save that register to RAM, and I should be good to go.

No, you are in an ISR. The main code might be using r0 temporarily when your ISR gets called - if you touch r0, you mess up the main code.

I once tried doing a similar thing with r1, thinking that because it should always be 0 I could quickly use it in an ISR and zero it again afterward - and it didn't work either. Turns out the zero register wasn't always zero!

Good point. I may use r0, but I'll push/pop it. Thanks- 1 push instruction won't hurt.

An addendum to @stimmer's post... Before your interrupt service routine or any function called by your ISR uses any register (R0 - R31), the register value must be preserved. Before your interrupt service routine or any function called by your ISR performs any math operation, the status register value (SREG) must be preserved.

But if the only thing my ISR does is as listed: 1. my assembly, 2. call a C function, 3. return, then I assume that the heavy lifting of pushing and popping the necessary registers, if handled solely by the C function call in the ISR, should be sufficient.

Wrong.

GreyGnome:
Good point. I may use r0, but I'll push/pop it. Thanks- 1 push instruction won't hurt.

Not good enough. You have to preserve the registers used by your function before the function is called.

I understand about r0 (which I'll use) and SREG, but doesn't the compiler push all registers it's concerned with? I've been rummaging around in avr-objdump and I observe that the compiler preserves registers upon entry into the function, not before functions are called.

I am setting ISR_NAKED for the ISR, but when the ISR calls another function that will not be naked.

but doesn't the compiler push all registers it's concerned with?

Yes.

But, there is an assumption that whoever called the ISR preserved certain registers. That's the purpose of all those pushes; to do what a caller would have done if the ISR had been called like a normal function.

GreyGnome:
I am calling a function from an interrupt, and due to timing issues I would like to preserve the state of a register very quickly when the interrupt takes place.

Why do you want to do this?

Also, I don't know if you can do much better than the compiler. For example:

ISR (SPI_STC_vect)
{
// do nothing  
}

Generates:

ISR (SPI_STC_vect)
 100:	1f 92       	push	r1   // save the "zero" register
 102:	0f 92       	push	r0   // save R0
 104:	0f b6       	in	r0, 0x3f	; 63  // get SREG into R0
 106:	0f 92       	push	r0   // save SREG
 108:	11 24       	eor	r1, r1  // make sure R1 is zero
{
// do nothing  
}
 10a:	0f 90       	pop	r0   // get SREG back
 10c:	0f be       	out	0x3f, r0	; 63  
 10e:	0f 90       	pop	r0   // get the old R0 back
 110:	1f 90       	pop	r1   // get the "zero" register back
 112:	18 95       	reti

You can't afford to do much less than that, except perhaps the R1 stuff. Leaving out the R1 stuff would save 3 clock cycles. I don't think you can omit the rest and have it work.
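As a rough check of the "3 clock cycles" figure, the entry sequence in the listing can be tallied by hand. This is a host-side sketch, not AVR code. The cycle counts (2 for push, 1 for in and eor) are the usual values for classic ATmega parts and are an assumption here, and the enum, function and array names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical instruction tags for this sketch (not part of avr-libc). */
enum insn { INSN_PUSH, INSN_IN, INSN_EOR };

/* Cycle counts assumed for a classic ATmega such as the ATmega328P. */
static unsigned insn_cycles(enum insn i)
{
    switch (i) {
    case INSN_PUSH: return 2;
    case INSN_IN:   return 1;
    case INSN_EOR:  return 1;
    }
    return 0;
}

/* Sum the cycles of a sequence of n instructions. */
static unsigned total_cycles(const enum insn *seq, size_t n)
{
    unsigned sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += insn_cycles(seq[k]);
    return sum;
}

/* Entry sequence the compiler generated for the empty ISR. */
static const enum insn entry_full[] = {
    INSN_PUSH, /* push r1            */
    INSN_PUSH, /* push r0            */
    INSN_IN,   /* in   r0, 0x3f      */
    INSN_PUSH, /* push r0 (SREG)     */
    INSN_EOR,  /* eor  r1, r1        */
};

/* The same entry sequence with the r1 handling left out. */
static const enum insn entry_no_r1[] = { INSN_PUSH, INSN_IN, INSN_PUSH };
```

Under these assumptions total_cycles(entry_full, 5) is 8 and total_cycles(entry_no_r1, 3) is 5, so dropping the r1 handling saves 3 cycles before the ISR body starts. The matching pop r1 on exit is not counted here.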

What is it, exactly, that you need to save so quickly?

Something like:

#include <avr/io.h>
#include <avr/interrupt.h>

volatile unsigned char myvar;

// define our ISR code to have NO prologue/epilogue
ISR(PCINT0_vect, ISR_NAKED)
{
    asm volatile(
	// Note that none of these instructions affect the status register (SREG)
	" push r31\n"        // Save R31
	" in r31, 0x03\n"    // Read portB as soon as possible
	" sts myvar, r31\n"  // Save it where gcc can see it
	" rcall myint\n"     // Call a "normal" ISR function
	" pop r31\n"         // restore r31
	" reti\n"            // return from interrupt
	);
}

// But define this func to have the "extra" prologue/etc needed by an ISR
void myint(void) __attribute__ ((signal));
void myint(void)
{
    if (myvar > 200) {
	PORTD = 0;
    } else {
	PORTD = 1;
    }
}

Which produces:

000000a6 <__vector_3>:
  a6:   ff 93           push    r31
  a8:   f3 b1           in      r31, 0x03       ; 3
  aa:   f0 93 00 01     sts     0x0100, r31
  ae:   02 d0           rcall   .+4             ; 0xb4 <myint>
  b0:   ff 91           pop     r31
  b2:   18 95           reti

000000b4 <myint>:
  b4:   1f 92           push    r1
  b6:   0f 92           push    r0
  b8:   0f b6           in      r0, 0x3f        ; 63
  ba:   0f 92           push    r0
  bc:   11 24           eor     r1, r1
  be:   8f 93           push    r24
  c0:   80 91 00 01     lds     r24, 0x0100
  c4:   89 3c           cpi     r24, 0xC9       ; 201
  c6:   10 f0           brcs    .+4             ; 0xcc <myint+0x18>
  c8:   1b b8           out     0x0b, r1        ; 11
  ca:   02 c0           rjmp    .+4             ; 0xd0 <myint+0x1c>
  cc:   81 e0           ldi     r24, 0x01       ; 1
  ce:   8b b9           out     0x0b, r24       ; 11
  d0:   8f 91           pop     r24
  d2:   0f 90           pop     r0
  d4:   0f be           out     0x3f, r0        ; 63
  d6:   0f 90           pop     r0
  d8:   1f 90           pop     r1
  da:   18 95           reti

I have my doubts that this is a good idea. The V-USB code that bit-bangs low-speed USB doesn't need to be so aggressive about sampling its inputs after an interrupt, for example. Also, you've entered the realm where the required C code options are more mysterious than straight assembler would have been...

westfw:
I have my doubts that this is a good idea. ... Also, you've entered the realm where the required C code options are more mysterious than straight assembler would have been...

Yes, I'm reconsidering my original plan. I think what I'll do is just read my register, first thing upon entering my ISR, and then call the function that I need. Rather than reading the register inside the function. Thus, there is only one set of push instructions in front of reading the register, instead of two.

Got it. And I see that, for example, if the ISR calls a C++ method, eg:

     4a0:       83 e7           ldi     r24, 0x73       ; 115
     4a2:       91 e0           ldi     r25, 0x01       ; 1
     4a4:       0e 94 8f 01     call    0x31e   ; 0x31e <_ZN9PCintPort5PCintEv>

...it uses r24 and r25. There's no telling which ones may be used in circumstances like that, so if the ISR doesn't push/pop the registers, things could get quite sticky quite quickly.
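Which registers a called function may overwrite is in fact fixed by the avr-gcc calling convention, which is why an ISR that calls a function pushes a predictable set. A minimal host-side sketch of that rule follows, assuming the standard avr-gcc ABI (r18 to r27, r30 and r31 are call-clobbered; r0 and r1 are handled separately as the temporary and zero registers). The function names are hypothetical.

```c
#include <stdbool.h>

/* True if a normal C function is allowed to overwrite register `reg`
 * without restoring it, under the assumed avr-gcc ABI. */
static bool is_call_clobbered(unsigned reg)
{
    return (reg >= 18 && reg <= 27) || reg == 30 || reg == 31;
}

/* Number of call-clobbered registers among r0..r31. */
static unsigned count_call_clobbered(void)
{
    unsigned n = 0;
    for (unsigned r = 0; r < 32; r++)
        if (is_call_clobbered(r))
            n++;
    return n;
}
```

Here r24 and r25 fall inside the clobbered range, while r28 and r29 (the Y pointer) do not. A naked ISR that calls an ordinary function would have to save all twelve of these itself, in addition to r0, r1 and SREG.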

I want to capture the state of the pin at the time of the interrupt and make it available to the user of my library. The later the pin state is checked, the less likely (under certain circumstances, such as switch bounce) the port read is to reflect the state of the port at the time of the interrupt.

Also, I don't know if you can do much better than the compiler. For example:

ISR (SPI_STC_vect)
{
// do nothing  
}

Generates:

...(deleted for brevity)...

I noticed that it's pretty lean in the case of an empty ISR. In my circumstances, since I'm calling a method from inside the ISR, the preamble looks more like this:

ISR(PCINT1_vect) {
     442:       1f 92           push    r1
     444:       0f 92           push    r0
     446:       0f b6           in      r0, 0x3f        ; 63
     448:       0f 92           push    r0
     44a:       11 24           eor     r1, r1
     44c:       2f 93           push    r18
     44e:       3f 93           push    r19
     450:       4f 93           push    r20
     452:       5f 93           push    r21
     454:       6f 93           push    r22
     456:       7f 93           push    r23
     458:       8f 93           push    r24
     45a:       9f 93           push    r25
     45c:       af 93           push    r26
     45e:       bf 93           push    r27
     460:       ef 93           push    r30
     462:       ff 93           push    r31

I am thinking of actually making the ISR "naked", duplicating that bit of code, but inserting my little bit of work at the beginning of all of that.
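To put a number on that preamble, here is a small host-side calculation. It is a sketch under stated assumptions: a 16 MHz clock (typical for an Arduino Uno), 2 cycles per push, and 1 cycle each for the in and eor instructions. The macro and function names are invented for illustration, and the interrupt response time and the jump through the vector table are not included.

```c
/* Assumed CPU clock in Hz; 16 MHz is typical for an Arduino Uno. */
#define CPU_CLOCK_HZ 16000000ULL

/* Cycles spent in an avr-gcc style ISR prologue that performs n_pushes
 * pushes (2 cycles each) plus "in r0, 0x3f" and "eor r1, r1"
 * (1 cycle each). */
static unsigned prologue_cycles(unsigned n_pushes)
{
    return 2u * n_pushes + 2u;
}

/* Convert a cycle count to nanoseconds at CPU_CLOCK_HZ. */
static unsigned long long cycles_to_ns(unsigned cycles)
{
    return (unsigned long long)cycles * 1000000000ULL / CPU_CLOCK_HZ;
}
```

The PCINT1_vect listing contains 15 push instructions, so prologue_cycles(15) gives 32 cycles, and cycles_to_ns(32) gives 2000 ns, i.e. about 2 microseconds before the first line of the ISR body runs.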

Well in my test, the generated code for this sketch:

volatile byte savedPort;
byte bar;

void foo ()
  {
  bar = savedPort;  
  }
  
ISR (PCINT0_vect) 
{
  savedPort = PINB;
  foo ();
}  // end of PCINT0_vect

void setup ()
{
   // pin change interrupts
  PCMSK0 = _BV (PCINT1);  // only want pin 9
  PCIFR  = _BV (PCIF0);   // clear any outstanding interrupts
  PCICR |= _BV (PCIE0);   // enable pin change interrupts for PCINT0..7
}

void loop () {}

Only had this before saving the port:

ISR (PCINT0_vect) 
 100:	1f 92       	push	r1
 102:	0f 92       	push	r0
 104:	0f b6       	in	r0, 0x3f	; 63
 106:	0f 92       	push	r0
 108:	11 24       	eor	r1, r1
 10a:	8f 93       	push	r24
{
  savedPort = PINB;
 10c:	83 b1       	in	r24, 0x03	; 3
 10e:	80 93 00 01 	sts	0x0100, r24

Which isn't much worse than you can do with assembler. You have to save the status register, and before you do that you have to save R0.

Now I caution you about getting too carried away about shaving nanoseconds off ISRs. I did a few tests earlier, using this code:

ISR (PCINT0_vect) 
{
  PORTB = 4;  // turn on pin 10
}  // end of PCINT0_vect


void setup ()
{
  digitalWrite (9, HIGH); // pullup
 
  // pin change interrupts
  PCMSK0 = _BV (PCINT1);  // only want pin 9
  PCIFR  = _BV (PCIF0);   // clear any outstanding interrupts
  PCICR |= _BV (PCIE0);   // enable pin change interrupts for PCINT0..7
  
  pinMode (10, OUTPUT);
  digitalWrite (10, LOW);
}

void loop () {}

Now, measuring the time between pin 9 going low (by my touching it to ground) and D10 being brought high as promptly as the ISR could manage, I got these figures on consecutive tests:

1.2500 µs
1.4375 µs
1.5625 µs
1.3750 µs

That's a difference of 0.3125 µs (5 clock cycles at 16 MHz) in what should be a repeatable experiment! I think at least 4 can be accounted for by the fact that main does a CALL to call loop, and CALL takes 4 clock cycles. Once an instruction starts, it has to finish before the interrupt can be serviced. The 5th is probably down to exactly when the interrupt occurred relative to the clock edges.

So you already have something like 5 clock cycles of "jitter", and that is without doing anything else. For example, Timer 0 will cause an interrupt. Whether it is higher or lower priority than your pin change isn't the point. Once its ISR starts, it has to finish. So that could be another 5 or 6 µs down the drain. And if your code calls millis(), that turns interrupts off briefly. So that delays things too.
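The conversion from the four measured latencies to clock cycles can be reproduced on a PC. This is only a sketch: the 16 MHz clock is an assumption (it is what makes 0.3125 microseconds equal 5 cycles), and the names are hypothetical.

```c
#include <stddef.h>

/* Assumed CPU clock in MHz, i.e. clock cycles per microsecond. */
#define CPU_MHZ 16.0

/* Convert a latency in microseconds to the nearest whole clock cycle. */
static unsigned us_to_cycles(double us)
{
    return (unsigned)(us * CPU_MHZ + 0.5);
}

/* Spread between the slowest and fastest response, in clock cycles.
 * The caller must pass at least one measurement (n >= 1). */
static unsigned jitter_cycles(const double *us, size_t n)
{
    unsigned lo = us_to_cycles(us[0]);
    unsigned hi = lo;
    for (size_t i = 1; i < n; i++) {
        unsigned c = us_to_cycles(us[i]);
        if (c < lo) lo = c;
        if (c > hi) hi = c;
    }
    return hi - lo;
}

/* The four consecutive measurements quoted above, in microseconds. */
static const double measured_us[] = { 1.2500, 1.4375, 1.5625, 1.3750 };
```

With these inputs the latencies come out as 20, 23, 25 and 22 cycles, so jitter_cycles(measured_us, 4) is 5.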

So with all these variables, whilst it is nice to design for a fast response, all this assembler code might be a bit of overkill.

The further the check of the pin state, the less likely (under certain circumstances, such as switch bounce) the port read is going to actually reflect the state of the port at the time of the interrupt.

With pin change interrupts, what changed can be deduced to some extent by comparing the current value with the previous one. Of course the pin could change back quickly, but switches don't tend to bounce that fast. And for other interrupts (e.g. a falling level interrupt), if it fired you know what the new state is.
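The comparison of the current port value with the previous one can be sketched with a few bit operations. This is plain C that runs on a PC. On the AVR the current value would come from a port read such as PINB inside the ISR, and the previous value from a variable kept between interrupts. The function names are hypothetical.

```c
#include <stdint.h>

/* Bits that differ between the previous and current port reads. */
static uint8_t changed_pins(uint8_t previous, uint8_t current)
{
    return (uint8_t)(previous ^ current);
}

/* Bits that went from 0 to 1. */
static uint8_t rising_pins(uint8_t previous, uint8_t current)
{
    return (uint8_t)(~previous & current);
}

/* Bits that went from 1 to 0. */
static uint8_t falling_pins(uint8_t previous, uint8_t current)
{
    return (uint8_t)(previous & ~current);
}
```

For example, if bit 1 was high on the previous read (0x02) and the port now reads 0x00, changed_pins and falling_pins both return 0x02 and rising_pins returns 0. If the pin bounced back before the read, the two values are equal and no change is reported, which is the limitation described above.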