Inline Assembler with SPI DAC

So I'm working on yet another improvement to my ongoing MIDI synth project: adding a 12-bit DAC and reworking all the synthesis code.
I'm using an MCP4922 SPI DAC.
If I use the normal SPI library functions, I can successfully write values to the DAC.

But I have an interrupt routine running at ~32 kHz, so I'm using the GCC inline assembler to write the interrupt in assembly.
Here is the interrupt routine, stripped down to what I believe are the problem sections:

ISR(TIMER1_OVF_vect) {                    // ASSEMBLER WOOT

  byte tempOutput;

  __asm__ volatile (
    // some I/O port addresses, to make things easier
    ".equ PORTB,0x05"                    "\n\t"
    ".equ SPDR,0x2e"                     "\n\t"
    ".equ SPSR,0x2d"                     "\n\t"
    "mov %[debugByte],%B[outputA]"       "\n\t"

    // this variable changes in the first "batch" of oscillators, so a copy must be made
    "mov %[tempOutput],%A[outputA]"      "\n\t"
    // slave select pin is pin B2, for the DAC
    "cbi PORTB,2"                        "\n\t"
    // begin SPI transfer: first, most significant, byte for the DAC
    "out SPDR,%B[outputA]"               "\n\t"

    // reset outputA. done after the transfer starts, to have a buffer.
    // set all the necessary config bits,
    // and set the output to the middle value.  0x78 = 0b01111000
    "ldi %B[outputA],0x78"               "\n\t"
    // lower byte of output is all data, no config bits
    "ldi %A[outputA],0x00"               "\n\t"

    // ... first batch of OSCILLATOR CODE (omitted) ...

    // send lower byte, previously moved, since the first "batch" edited the original output byte
    "Wait1:"                             "\n\t"
    // wait until bit 7 of SPSR (SPIF, the SPI flag) is set. SPI should be done by now, but just in case.
    // annoyingly, SPSR (0x2d) is outside the 0x00-0x1f range sbis can reach, so read it into a register first.
    "in r1,SPSR"                         "\n\t"
    "sbrs r1,7"                          "\n\t"
    "rjmp Wait1"                         "\n\t"

    "out SPDR,%[tempOutput]"             "\n\t"

    // ... second batch of OSCILLATOR CODE (omitted) ...

    "Wait2:"                             "\n\t"
    // same wait for the second byte
    "in r1,SPSR"                         "\n\t"
    "sbrs r1,7"                          "\n\t"
    "rjmp Wait2"                         "\n\t"
    "rjmp AllDoneYay"                    "\n\t"

    "AllDoneYay:"                        "\n\t"
    // r1 is avr-gcc's zero register (__zero_reg__); compiled C code expects it to be 0 again
    "clr r1"                             "\n\t"
    "sbi PORTB,2"                        "\n\t"

    : // outputs
      [outputA] "=&d" (outputA),
      [tempOutput] "=&d" (tempOutput),
      [debugByte] "=&d" (debugByte)
    : // inputs

    : // clobbered
  );
}

outputA is a global volatile int declared at the beginning of the program.
debugByte is a global volatile byte used to monitor the value of outputA; loop() sends it repeatedly over serial.
The OSCILLATOR CODE parts are where the code that calculates each oscillator's output would go. The problem persists with or without those parts, so they're not relevant.
I interleaved the oscillator code with both SPI transfers so that no time is wasted waiting on the SPI: the oscillator code executes while each byte is shifted out. The oscillator code, though, modifies outputA, which is why I had to copy the low byte of the output into tempOutput first.

With the above code, I get loads of garbage.
If I move the debugByte mov instruction to after the ldi instructions that load outputA, I get the expected value, 120 (0x78).
If I move the outputA ldi instructions to before the SPI transfers, the transfers work fine and I get the expected ~2.5 V output. (The DAC takes 2 bytes to set one channel. The upper nybble of the high byte holds configuration bits for the DAC, which should be 0x7; the remaining 12 bits are data, so 0x7800 gives an output in the middle of the range.)
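The 16-bit word described above can be sketched in C. This assumes the MCP4922 write-command layout from its datasheet (bit 15 selects DAC A or B, bit 14 buffers VREF, bit 13 selects 1x gain, bit 12 enables the output) and, for the voltage helper, a 5 V reference; the function and macro names are hypothetical.

```c
#include <stdint.h>

/* Assumed MCP4922 control bits (upper nybble of the 16-bit SPI word). */
#define MCP4922_DAC_B   0x8000u  /* bit 15: 0 = DAC A, 1 = DAC B          */
#define MCP4922_BUF     0x4000u  /* bit 14: 1 = VREF input buffered       */
#define MCP4922_GA_1X   0x2000u  /* bit 13: 1 = 1x gain, 0 = 2x gain      */
#define MCP4922_ACTIVE  0x1000u  /* bit 12: 1 = output on (not shut down) */

/* Build the word for DAC A: config nybble 0x7 plus a 12-bit data code. */
uint16_t mcp4922_word(uint16_t code12)
{
    return (uint16_t)(MCP4922_BUF | MCP4922_GA_1X | MCP4922_ACTIVE |
                      (code12 & 0x0FFFu));
}

/* Ideal output voltage at 1x gain: VOUT = VREF * code / 4096. */
double mcp4922_vout(uint16_t code12, double vref)
{
    return vref * (double)(code12 & 0x0FFFu) / 4096.0;
}
```

Mid-scale is code 0x800, so `mcp4922_word(0x800)` yields 0x7800: the high byte 0x78 goes out first, then 0x00, and with a 5 V reference the output is 2.5 V.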

I want the oscillator code to come after the SPI transfers (and the moves to temporary bytes) so the DAC is updated at the same point in every interrupt, keeping the sample timing constant regardless of how long the rest of the interrupt takes.
But something's just getting messed up somewhere.

Any ideas?

GCC doesn't produce code that runs fast enough?

Not with what I have tried.

For this application, it was way easier to just write it in assembler than to try to figure out C code that compiled to be fast enough.

To give an idea of how fast this needs to be, this interrupt routine is running at ~32 kHz (31.25 kHz exactly, 16 MHz / 512). With a 16 MHz clock, that's only 512 clock cycles between interrupt calls.
Through lots of assembler optimization, I was able to get the code that runs for each oscillator down to about 32-35 clock cycles. With 8 (or more) note polyphony, that means the interrupt routine is constantly taking up well over 50% of the CPU.
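The timing budget above can be checked with a few lines of arithmetic. This sketch uses the thread's own figures; the SPI clock divider is an assumption, since the post does not say how the SPI port is configured, and the names are hypothetical.

```c
#include <stdint.h>

/* Figures quoted in the thread: 16 MHz clock, one interrupt every 512 clocks,
   roughly 32-35 clocks of work per oscillator. */
#define CLOCK_HZ          16000000UL
#define CYCLES_PER_SAMPLE 512UL

/* Interrupt (sample) rate when the timer fires every CYCLES_PER_SAMPLE clocks. */
unsigned long sample_rate_hz(void)
{
    return CLOCK_HZ / CYCLES_PER_SAMPLE;
}

/* Share of each sample period (integer percent, rounded down) spent in the
   oscillator code alone; ISR entry/exit and the SPI writes are not counted. */
unsigned long oscillator_load_percent(unsigned voices, unsigned cycles_per_voice)
{
    return 100UL * voices * cycles_per_voice / CYCLES_PER_SAMPLE;
}

/* CPU clocks needed to shift out one SPI byte when the SPI clock is
   CLOCK_HZ / divider (assumed; 2 is the fastest AVR setting). */
unsigned spi_byte_cycles(unsigned divider)
{
    return 8u * divider;
}
```

Eight voices at 35 clocks each use 280 of the 512 clocks (about 55%) before any overhead. At the fastest SPI setting a byte takes 16 clocks, less than one 32-35-clock oscillator block, which is why each transfer can be hidden behind oscillator code.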
All the waveforms are also generated by the interrupt routine: there are no wavetables, to keep SRAM freed up. Variable pulse width square and triangle waveforms are transformed from a base sawtooth waveform. Each oscillator also has 256 volume levels.
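Deriving pulse and triangle waves from a sawtooth, plus an 8-bit volume scale, can be sketched in portable C. This is not the poster's assembly; it assumes a hypothetical 16-bit phase accumulator whose top byte is the sawtooth, and all names are illustrative.

```c
#include <stdint.h>

/* Assumed 16-bit phase accumulator: its top 8 bits are the sawtooth. */
uint8_t saw(uint16_t phase)
{
    return (uint8_t)(phase >> 8);
}

/* Variable-width pulse: high while the sawtooth is below the width threshold. */
uint8_t pulse(uint16_t phase, uint8_t width)
{
    return saw(phase) < width ? 255u : 0u;
}

/* Triangle: fold the sawtooth around its midpoint, then double it (0..254). */
uint8_t triangle(uint16_t phase)
{
    uint8_t s = saw(phase);
    return (uint8_t)((s < 128u ? s : 255u - s) << 1);
}

/* 256 volume levels: 8x8 multiply, keep the high byte (0 = silent). */
uint8_t apply_volume(uint8_t sample, uint8_t volume)
{
    return (uint8_t)(((uint16_t)sample * volume) >> 8);
}
```

On AVR the volume step maps onto the 8x8 `mul` instruction, whose 16-bit result lands in r1:r0, so keeping the high byte means reading r1.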

I don't think that I could get GCC to compile something fast enough for this. :)

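Since you know which registers you are using, you may save more time by declaring the ISR with ISR_NAKED and pushing only the 2-3 registers you use. (A naked ISR gets no prologue or epilogue, so it must also save SREG itself and end with reti.)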

I assume that the compiler pushes all registers because it doesn't know what's going on inside the volatile asm block (although it may be clever enough to figure it out).

As for what's wrong with the code, I might spot it if it were in normal ASM, but I can never follow the GCC inline assembler syntax.


I don't think ISR_NAKED would help much. I see what you mean, but the compiler does know which registers are used.
At the end of the asm block there's a list of output, input, and mangled (clobbered) variables/registers/whatever.
The compiler assigns registers to the inputs/outputs automatically, which is why I never refer to register names in the assembler code... except for the mangled registers. The mangled registers are any other registers the routine modifies. The mul instruction writes its result to r1:r0, so R1 is listed as mangled.
So between the inputs/outputs and the mangled register list, the compiler knows exactly what to push/pop.

And woohoo! In typing that, I figured out what my problem was. outputA was only listed as an output operand, so GCC never loads its current value into the register before the asm runs; the first mov and out instructions were reading whatever happened to be in that register. I added an input operand tied to the same register, and it works!
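The fix described above can be sketched with GCC's `+` constraint modifier, which marks an operand as both read and written so GCC loads its current value before the asm runs; it is equivalent to the output operand plus tied input operand the poster used. The function name `latch_and_reset` is hypothetical, the asm branch only builds with avr-gcc, and the `#else` branch is a plain-C stand-in so the logic can be checked on a desktop compiler.

```c
#include <stdint.h>

volatile uint16_t outputA = 0;  /* global sample word, as in the thread */

/* Hypothetical helper: copies the low byte of outputA, resets outputA to the
   mid-scale DAC word 0x7800, and returns the copied byte. */
uint8_t latch_and_reset(void)
{
    uint16_t out = outputA;     /* read the volatile global once */
    uint8_t tempOutput;
#if defined(__AVR__)
    __asm__ volatile (
        "mov %[tmp], %A[out]" "\n\t"   /* reads the value GCC loaded for us */
        "ldi %B[out], 0x78"   "\n\t"   /* ldi only works on r16-r31: "d"    */
        "ldi %A[out], 0x00"   "\n\t"
        : [out] "+d" (out),            /* "+" = read AND written            */
          [tmp] "=&r" (tempOutput)
    );
#else
    /* Host build: plain C equivalent of the three instructions above. */
    tempOutput = (uint8_t)(out & 0xFFu);
    out = 0x7800u;
#endif
    outputA = out;
    return tempOutput;
}
```

With `"=&d"` alone, as in the original listing, the `mov` reads an uninitialized register, which matches the garbage seen in debugByte.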

Anything else I can help you with :)

Often just describing a problem leads you to the answer.

I'm quite happy working in assembler but that inline syntax really sucks IMO.


Yeah, it's annoying.
When I was coming up with the assembler code, I actually just handwrote it. Found it easier and much more readable... :) Then I typed it up and troubleshot (? troubleshooted?) it into the program.

Then I spent some time playing aro—OH GOD WHAT AM I DOING