How are registers saved in interrupts?

I was reading an article where someone tries to measure how much the main program slows down while high-frequency interrupts are running. It seems there is very little overhead beyond the user-supplied interrupt service routine. How is that possible? :o One might expect that plenty of stuff (program counter, status/flag register, 32 CPU registers) has to be pushed and popped from the stack for the main program to be able to continue as if nothing happened. Is there some smarter solution implemented? Are there two sets of CPU registers, one for interrupts and one for the main program? Or does the compiler allocate, say, 16 registers for use by the main program and 16 for interrupts so they don't need to be saved?

As part of the function call/return code overhead, the compiler generates code to save and restore the registers that the function uses.

More compiler magic. :)


And this magic includes the compiler tracking which registers need to be saved. Hence, as a rule of thumb: less code in the interrupt --> fewer registers to save --> less overhead per interrupt call. Putting too much code into the interrupt handler is a bad idea.

If you code in assembler you can control what is saved and how it is saved. Except for the program counter and the status register. Those always get pushed to the stack.


So my initial impressions (I am very new here) of interrupts was that they were for special cases. But it sounds to me like they would be of great use for events where you just change state that is then handled in your loop. Is this correct?

@artjumble... Does any of this bring clarity...

But it sounds to me like they would be of great use for events where you just change state that is then handled in your loop. Is this correct?

Exactly - that's a great way to use interrupts.

You want to keep the ISR short and fast, so that other interrupts aren't missed while it is executing.
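To make the "set a flag in the ISR, handle it in the loop" idea concrete, here is a minimal sketch. The names event_pending, events_handled, on_event and poll_events are made up for illustration, and INT0_vect is only an assumption about which interrupt is the event source. The AVR-specific part is guarded so the rest also compiles on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/interrupt.h>
#endif

/* Set by the interrupt, cleared by the main loop. 'volatile' keeps the
   compiler from caching the flag in a register between checks. */
static volatile uint8_t event_pending = 0;

/* Hypothetical counter standing in for "the real work". */
static uint16_t events_handled = 0;

/* The whole job of the interrupt: note that something happened. */
static inline void on_event(void)
{
    event_pending = 1;
}

#ifdef __AVR__
/* Assumption: external interrupt 0 is the event source. */
ISR(INT0_vect)
{
    on_event();
}
#endif

/* Call this from loop(): the slow work happens here, not in the ISR. */
static void poll_events(void)
{
    if (event_pending) {
        event_pending = 0;   /* acknowledge first, then do the work */
        events_handled++;
    }
}
```

Because the ISR body is a single store to one byte, the compiler has very few registers to save, which is exactly the "less code, less overhead" rule of thumb from earlier in the thread.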


If you code in assembler you can control what is saved and how it is saved. Except for the program counter and the status register. Those always get pushed to the stack.


From the datasheet, page 14:

Note that the Status Register is not automatically stored when entering an interrupt routine, nor restored when returning from an interrupt routine. This must be handled by software.

It would have to be very special code in the main loop not to require the status register! :o

A read-modify-write of a port might not change the status register, and there are other examples too, but not saving SREG is asking for big trouble.

I've seen avr-gcc do excessive pushes and pops on the stack even if you just reload a timer counter with a constant. I did not find out how to fix this, even after digging around at avrfreaks. In the end you can always code a short ISR in assembler. I am not at all into assembler, and don't understand most of the concepts, but I was able to reduce the size of a critical ISR used for UART communication.

I am not at all into assembler, and don't understand most of the concepts

If you ever have the chance, take some time to learn a bit about it. Once you do, you will have a very special understanding of just what and how most computers work.

I don't have any experience with AVR assembler, so I can't speak on it directly, but I do have some experience with 6809 (8-bit), 6502 (8-bit), 68000 (16 bit) and 80x86 (16 bit - never got into 32 bit) assembler coding. With the ATMegaxx8 being an 8-bit microcontroller, it shouldn't be too difficult to learn from (though I am not sure how its Harvard architecture plays into things).

Ultimately, what you learn is that for the most part, computers (well, CPUs) are nothing more than high-speed player pianos. You have a bunch of counters (registers), that can be incremented, decremented (sometimes not, sometimes you have to add the 2s-complement to decrement - IIRC), shifted left or right, and if you're really lucky, you can multiply and divide. There's the "status" register, which is a bunch of bits (flags) that typically can be used for branching based on their values; they set things like overflow, error conditions, etc. You have special counters that automatically increment (like the program counter - which keeps track of which address you are on), but that you can (usually) change at will to modify where you are at in the code.
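As a rough illustration of the "registers plus status flags" picture, here is a small sketch in plain C that mimics an 8-bit add and the two flags it would typically set. The Flags struct and the add8 function are invented for this example. They do not model any particular CPU exactly.

```c
#include <stdint.h>

/* Two of the flags a typical 8-bit status register holds. */
typedef struct {
    uint8_t carry;  /* result did not fit into 8 bits */
    uint8_t zero;   /* 8-bit result is exactly 0 */
} Flags;

/* Add two 8-bit "registers" and update the flags the way an ALU would. */
static uint8_t add8(uint8_t a, uint8_t b, Flags *f)
{
    uint16_t wide = (uint16_t)a + (uint16_t)b;  /* keep the ninth bit */
    uint8_t result = (uint8_t)wide;             /* what the register holds */

    f->carry = (wide > 0xFF);
    f->zero = (result == 0);
    return result;
}
```

For instance, add8(255, 1, &f) leaves 0 in the result with both carry and zero set, which is the sort of condition a branch instruction would then test.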

Now, probably something with the ATMega that you can't do (this is just speculation as like I said, I have no experience with assembler on the ATMega, but it is a Harvard architecture device), that you can with a regular CPU - since you can point to any address, and modify the data in that memory cell, you can effectively change the code you are actively running (this has a caveat with certain larger processors and such, where you can set up a "protected mode" to disallow your code from modifying other processes code)! With the ATMega, this shouldn't be possible, because the running code shouldn't be able to modify itself in a Harvard architecture (however, the ATMega may not be a strict HA device, it may be a special case that allows some form?).

You may wonder why you would want to do this, but believe me, on a limited memory 8 or 16-bit machine, it allows you to do some very interesting things that otherwise couldn't be done without a lot more memory available (many of the old-school games and such in the "bad-old" days of 8 and 16 bit machines like the Apple, TRS-80, Amiga, Atari, etc - wouldn't have been possible without the trick).

Then there are concepts like a "stack" and "heap" (basically the same thing, one's larger than the other, usually - and I think there are other differences, but conceptually they're the same) - these are small areas of memory set aside by the programmer (and a register set up to point to them) for "pushing" and "popping" data off of for register saving and recall, as well as data. Typically, registers are pushed/popped on the stack, while the heap is used for data. So you would push your data onto the heap, jump to your routine (either a JMP instruction, which is like a GOTO, or some other instruction which I can't remember a basic mnemonic for but it's like a GOSUB - JSR?), push your registers onto the stack, pop your data off the heap, do some calcs, push your results onto the heap, pop your registers off the stack (in case your calcs changed anything), return from your routine, then pop your results off the heap as needed.
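The register-saving half of that description (the stack part only) can be sketched in ordinary C. Everything here is a toy model: the Stack struct, push, pop and double_it are hypothetical names, the "register" is just a variable, and there are no overflow checks.

```c
#include <stdint.h>

#define STACK_SIZE 16

/* A tiny stack: a block of memory plus a stack pointer. */
typedef struct {
    uint8_t mem[STACK_SIZE];
    uint8_t sp;   /* index of the next free slot, grows downward */
} Stack;

static void stack_init(Stack *s) { s->sp = STACK_SIZE - 1; }

/* Store the value, then move the pointer down (no overflow check). */
static void push(Stack *s, uint8_t v) { s->mem[s->sp--] = v; }

/* Move the pointer back up, then read the value (no underflow check). */
static uint8_t pop(Stack *s) { return s->mem[++s->sp]; }

/* A "routine" that needs register r as scratch space. It saves the
   caller's value on entry and restores it before returning. */
static uint8_t double_it(Stack *s, uint8_t *r, uint8_t arg)
{
    push(s, *r);                  /* save the caller's register */
    *r = arg;                     /* clobber it for our own calculation */
    *r = (uint8_t)(*r + *r);
    uint8_t result = *r;
    *r = pop(s);                  /* restore, as if nothing happened */
    return result;
}
```

An interrupt handler does the same dance as double_it, except that it can be entered at any moment, so it has to save every register it touches (and the status register too).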

Think of it as doing everything the compiler usually does for setting up and using functions, etc - because really, that's what a compiler does! Regarding registers - you only have so many; they are essentially high-speed memory locations in the CPU, but you only have a few, especially in small machines. They'll have small names like "A" (usually the "accumulator" - because historically it is used for adding/subtracting), "B", "C" (which may be a combo of "A" and "B" - so if "A" and "B" are 8-bit registers, "C" will be a 16-bit register that is a mashup of "A" and "B"), "X", "Y", etc. "PC" will be the program counter register, etc.

Finally, realize that each of these assembler instructions, which you enter using "assembler mnemonics" (the shorthand "code"), actually represents values of bits (arranged in bytes) that are nothing more than HIGH and LOW value electrical signals internal to the CPU. These values, which are put on a "bus" (an address and data bus - basically ports on a CPU, exactly like the ports on the ATMega used for digital I/O), effect on each transition of the clock (leading or trailing edge of the clock, usually, although on most processors there is processing done on both - sometimes writes and reads are interleaved in this fashion, among other schemes) a change in the state (or electrical patterns) that appear on the bus(es).

On older 8-bit machines (like the Altair 8800), you would "hand toggle" these bit values into memory - an assembler is essentially a "compiler" that does this for you.

Really - it is a giant player piano - I am serious! Look into how player pianos work (some of the old machines of the era even had rudimentary branching ability!) - they are based on the older technology of Jacquard weaving looms, which used punched cards to dictate how to weave the patterns they produced, and inspired both Charles Babbage (Difference Engine and Analytical Engine) and Herman Hollerith (1890 US Census - his company, the Tabulating Machine Company, later merged into the firm that became known as...IBM).

I am just touching the tip of the iceberg here; as you can probably tell, computing and its history are a fascination of mine. The history of computation is one of the greatest stories of mankind; it is literally the story of getting machines to "think", of our (humanity's) quest to realize in a machine that which makes us intelligent. Modern machines, for all their power, barely come close, but there have been recent (and not so recent) ideas and implementations of machinery floated and tried that may ultimately, one day, lead to true human-level and beyond "artificial intelligence". We are starting to transition now from single CPUs to multiple-CPUs in a desktop system; it is now possible to buy a teraflop multi-core desktop machine for under $10,000 US. I expect that as we move on into this decade, we will see more of a transition to "desktop parallel clusters" for our daily use; we are already there for the most part with dual/quad and larger core machines. Hopefully this will translate into more parallel processing software, perhaps systems which embody neural-network strategies to increase the intelligence level of our software systems.

All of this started from fairly humble beginnings - indeed, much of the thought on whether machines can think was started by Aristotle and the Ancient Greeks, and we haven't stopped pondering the concept.

I will shut up now. Sorry for the book!


@drhex, you are right, this was a mistake. Of course SREG is not pushed to the stack automatically. Obviously it is possible to do this in the ISR. However, in 99% of cases this is the first thing you will have to do. Otherwise it will be very hard to get any reliable behaviour.


AVR LibC gives us an option to create "naked" ISR's as in the following:

// user code here

With the ISR_NAKED attribute, the compiler generates no prologue or epilogue at all; it only patches the interrupt vector table, so even the closing reti has to be supplied by hand. The sketch is 20 bytes smaller than with a default (empty) ISR definition. Apparently, then, the minimum interrupt overhead is about ten instructions (roughly 20 CPU cycles) plus the call to the ISR itself.
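To see where the 20 bytes might come from, here is a back-of-the-envelope tally in plain C. It assumes the prologue and epilogue that avr-gcc versions of that era emit for an empty ISR, and cycle counts for a classic ATmega with a 16-bit program counter. Both are assumptions worth checking against your own .lss listing and the instruction set manual. The table and the function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *insn;
    uint8_t bytes;
    uint8_t cycles;
} Insn;

/* Assumed code for a default, empty ISR (not taken from this thread). */
static const Insn default_isr_overhead[] = {
    { "push r1",       2, 2 },
    { "push r0",       2, 2 },
    { "in r0, 0x3f",   2, 1 },  /* read SREG */
    { "push r0",       2, 2 },
    { "clr r1",        2, 1 },  /* gcc expects r1 to hold zero */
    { "pop r0",        2, 2 },
    { "out 0x3f, r0",  2, 1 },  /* restore SREG */
    { "pop r0",        2, 2 },
    { "pop r1",        2, 2 },
    { "reti",          2, 4 },
};

#define N_INSNS (sizeof default_isr_overhead / sizeof default_isr_overhead[0])

/* Flash bytes occupied by the assumed prologue and epilogue. */
static unsigned overhead_bytes(void)
{
    unsigned total = 0;
    for (size_t i = 0; i < N_INSNS; i++)
        total += default_isr_overhead[i].bytes;
    return total;
}

/* CPU cycles spent executing them once. */
static unsigned overhead_cycles(void)
{
    unsigned total = 0;
    for (size_t i = 0; i < N_INSNS; i++)
        total += default_isr_overhead[i].cycles;
    return total;
}
```

Under these assumptions the tally comes to 20 bytes and 19 cycles, so bytes and cycles land close together here more or less by coincidence. The hardware's own interrupt entry and the jump in the vector table come on top of that.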

Excellent thread, and thanks for the extensive write up cr0sh! I am pretty sure a lot of people that started with just an Arduino will go deeper into computing and this is something that might make them feel more comfortable. Just for illustration the ISR I was talking about:

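// Timer0 Overflow interrupt is used to trigger the sampling of signals on the USI ports.
void TIMER0_OVF0_vect(void) __attribute__((signal,naked));
void TIMER0_OVF0_vect(void)
{
    /*     TCNT0 += TIMER0_SEED; */

    asm volatile( "push            r24" );         // save the only register we use
    asm volatile( "in            r24, 0x3f" );     // read SREG (I/O address 0x3f)
    asm volatile( "push            r24" );         // and save it as well

    asm volatile( "in            r24, 0x32" );     // read the timer counter (TCNT0)
    // subtracting the negated seed is the same as adding the seed
    asm volatile( "subi            r24, %[seed]" :: [seed] "M" (TIMER0_SEED_INV) );
    asm volatile( "out            0x32, r24" );    // write the counter back

    asm volatile( "pop            r24" );          // restore SREG
    asm volatile( "out            0x3f, r24" );
    asm volatile( "pop            r24" );          // restore r24

    asm volatile ("reti");                         // naked: we must return ourselves
}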

What I did was read the assembler listing that avr-gcc generated and write a naked ISR without the excessive pushes and pops. gcc has its own way of doing things and some logic for which (general purpose) registers to use in specific situations, which I did not understand.