Author Topic: Inline Assembler with SPI DAC  (Read 1050 times)
Nowhere
Offline Offline
God Member
*****
Karma: 3
Posts: 852
|-\ |\|\

So I'm working on yet another improvement to my ongoing MIDI synth project: adding a 12-bit DAC and reworking all the synthesis code.
I'm using an MCP4922 SPI DAC.
If I use the normal SPI library functions, I can successfully write values to the DAC.

But I have an interrupt routine running at ~32 kHz, so I'm using the GCC inline assembler to write the interrupt in assembly.
Here is the interrupt routine, stripped down to what I believe are the problem sections:

Code:
ISR(TIMER1_OVF_vect)  {                  //ASSEMBLER WOOT


  byte tempOutput;


  __asm__ volatile (
  //some port addresses, to make things easier
  ".equ PORTB,0x05" "\n\t"
    ".equ SPDR,0x2e"  "\n\t"
    ".equ SPSR,0x2d"  "\n\t"
//------------------------------------------------------------------------
"mov %[debugByte],%B[outputA]" "\n\t"
   
    //output first, most significant, byte for DAC thru SPI
  //this variable changes in the first "batch" of oscillators, so a copy must be made
  "mov %[tempOutput],%A[outputA]"                                "\n\t"
    //slave select pin is pin B2, for the DAC
  "cbi PORTB,2"                                                  "\n\t"
    //begin SPI transfer
  "out SPDR,%B[outputA]"                                         "\n\t"

    //reset outputA.  done after transfer, to have a buffer.
  //set all the necessary config bits.
  //and set the output to the middle value.   0x78 = 0b01111000
  "ldi %B[outputA],0x78"                                         "\n\t"
    //lower byte of output is all data, no config bits
  "ldi %A[outputA], 0x00"                                              "\n\t"

//OSCILLATOR CODE

    //send lower byte, previously moved, since the first "batch" edited the original output byte
  "Wait1:"                                                       "\n\t"
    //wait until bit 7 of SPSR is set, the SPI flag.  SPI should be done by now.  but just in case.
  //annoyingly, SPSR is not usable in the sbis command.  so put in a register first.
  "in r1,SPSR"                                                 "\n\t"
    "sbrs r1,7"                                               "\n\t"
    "rjmp Wait1"                                                 "\n\t"
   

    "out SPDR,%[tempOutput]"                                       "\n\t"

//OSCILLATOR CODE

    "Wait2:"                                                       "\n\t"
    //wait until bit 7 of SPSR is set, the SPI flag.  SPI should be done by now.  but just in case.
  //annoyingly, SPSR is not usable in the sbis command.  so put in a register first.
  "in r1,SPSR"                                                 "\n\t"
    "sbrs r1,7"                                               "\n\t"
    "rjmp Wait2"                                                 "\n\t"
    "rjmp AllDoneYay"                                            "\n\t"




    //--------------------------------------------------------------------
  "AllDoneYay:"                                                "\n\t"
    "sbi PORTB,2"                                              "\n\t"
    //=======================================================================


:  //outputs
  [outputA] "=&d" (outputA),
  [tempOutput] "=&d" (tempOutput),
  [debugByte] "=&d" (debugByte)
 
:  //inputs

:  //clobbered
  );
}

outputA is a global volatile int declared at the beginning of the program.
debugByte is a volatile global byte used to monitor the value of outputA.  In loop(), debugByte is sent repeatedly over serial.
The OSCILLATOR CODE parts are where the code that calculates the output of each oscillator would go.  The problem persists with or without these parts, so they're not relevant.
I interleaved the two SPI transfers with the oscillator code so that no time is wasted waiting on a transfer; the oscillator code runs while the transfer is carried out.  The oscillator code modifies the outputA variable, though, which is why I had to copy the low byte of the output to tempOutput first.

With the above code, I get loads of garbage.
If I move the debugByte mov instruction to after the ldi's of outputA, I get the expected message "120", or 0x78.
If I move the outputA ldi's to before the SPI transfers, the transfers work fine and I get the expected ~2.5V output.  (The DAC takes 2 bytes to set one channel.  The first nybble of the high byte holds the configuration bits for the DAC; they should be 0x7.  The other 12 bits are data bits, so 0x7800 gives an output in the middle of the range.)
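
For reference, the 16-bit word described above (config nybble 0x7 plus 12 data bits) can be sketched in plain C. This is only an illustration, not the code from the interrupt routine: the function and macro names are made up here, and the bit meanings are as I read the MCP4922 datasheet, so double-check them against your copy.

```c
#include <assert.h>
#include <stdint.h>

/* Config bits in the top nybble (names are illustrative, not from the post). */
#define MCP4922_CHANNEL_B 0x8000u  /* bit 15: 0 = DAC A, 1 = DAC B            */
#define MCP4922_BUFFERED  0x4000u  /* bit 14: 1 = buffered VREF input         */
#define MCP4922_GAIN_1X   0x2000u  /* bit 13: 1 = 1x gain, 0 = 2x gain        */
#define MCP4922_ACTIVE    0x1000u  /* bit 12: 1 = output active, 0 = shutdown */

/* Combine the config nybble with a 12-bit sample; extra bits are masked off. */
static uint16_t mcp4922_word(uint16_t config, uint16_t value)
{
    return (uint16_t)((config & 0xF000u) | (value & 0x0FFFu));
}

/* The DAC expects the high byte first, then the low byte. */
static uint8_t high_byte(uint16_t word) { return (uint8_t)(word >> 8); }
static uint8_t low_byte(uint16_t word)  { return (uint8_t)(word & 0xFFu); }

/* Mid-scale on channel A gives the 0x78, 0x00 byte pair from the post. */
static void mcp4922_self_check(void)
{
    uint16_t mid = mcp4922_word(MCP4922_BUFFERED | MCP4922_GAIN_1X | MCP4922_ACTIVE, 0x0800u);
    assert(mid == 0x7800u);
    assert(high_byte(mid) == 0x78u);
    assert(low_byte(mid) == 0x00u);
}
```

With config 0x7 this selects channel A, buffered reference, 1x gain, output active, which matches the ~2.5V mid-range result mentioned above if the reference is about 5V (that reference voltage is an assumption).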

I want the oscillator code to come after the SPI transfers (and the moves to temporary bytes) so the sample rate is constant regardless of any variation in the speed of the interrupt function.
But something's just getting messed up somewhere.

Any ideas?

Soundcloud page: http://soundcloud.com/beefinator-2
Youtube channel: http://www.youtube.com/user/beefinator14
Old soundcloud page (ran out o

"The old Europe"
Offline Offline
Edison Member
*
Karma: 1
Posts: 2005
Bootloaders suck!

GCC doesn't produce code that runs fast enough?

• Upload doesn't work? Do a loop-back test.
• There's absolutely NO excuse for not having an ISP!
• Your AVR needs a brain surgery? Use the online FUSE calculator.
My projects: RGB LED matrix, RGB LED ring, various ATtiny gadgets...
• Microsoft is not the answer. It is the question, and the answer is NO!

Nowhere

Not with what I have tried. 

For this application, it was way easier to just write it in assembler than to try to figure out C code that compiled to be fast enough.

To give an idea of how fast this needs to be: the interrupt routine runs at ~32 kHz.  With a 16 MHz clock, that leaves only 512 clock cycles between interrupt calls.
Through lots of assembler optimization, I was able to get the code that runs for each oscillator down to about 32-35 clock cycles.  With 8 (or more) note polyphony, that means the interrupt routine is constantly taking up well over 50% of the CPU.
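
The arithmetic behind those numbers, as a small host-side sketch. The function names are invented for illustration, and "~32 kHz" is taken to mean 31.25 kHz, which is the rate that 512 cycles at 16 MHz implies. Only the oscillator code is counted; interrupt entry/exit and the SPI waits come on top.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles available between two consecutive interrupts. */
static uint32_t cycles_per_interrupt(uint32_t f_cpu_hz, uint32_t isr_rate_hz)
{
    assert(isr_rate_hz > 0);
    return f_cpu_hz / isr_rate_hz;
}

/* Share of that budget spent in the oscillators, in whole percent (rounded down). */
static uint32_t oscillator_load_percent(uint32_t voices,
                                        uint32_t cycles_per_voice,
                                        uint32_t budget_cycles)
{
    assert(budget_cycles > 0);
    return (100u * voices * cycles_per_voice) / budget_cycles;
}
```

For example, `cycles_per_interrupt(16000000, 31250)` gives 512, and 8 voices at 32 to 35 cycles each come out at 50% to 54% of that budget before any overhead.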
All the waveforms are also generated by the interrupt routine: there are no wavetables, to keep SRAM freed up.  Variable pulse width square and triangle waveforms are transformed from a base sawtooth waveform.   Each oscillator also has 256 volume levels.
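
A rough C sketch of what "transformed from a base sawtooth" can look like. This is not the assembler from the synth, just one common way to do it, and every name here (and the 16-bit phase accumulator) is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Sawtooth: the top 8 bits of a free-running 16-bit phase accumulator. */
static uint8_t saw_from_phase(uint16_t phase)
{
    return (uint8_t)(phase >> 8);
}

/* Variable pulse width: high while the ramp is below the width threshold. */
static uint8_t pulse_from_saw(uint8_t saw, uint8_t width)
{
    return (saw < width) ? 255u : 0u;
}

/* Triangle: fold the second half of the ramp back down, then double it. */
static uint8_t triangle_from_saw(uint8_t saw)
{
    uint8_t folded = (saw & 0x80u) ? (uint8_t)~saw : saw;  /* 0..127 */
    return (uint8_t)(folded << 1);                         /* 0..254 */
}

/* 256 volume levels: an 8x8 multiply, keeping the high byte of the product. */
static uint8_t apply_volume(uint8_t sample, uint8_t volume)
{
    return (uint8_t)(((uint16_t)sample * volume) >> 8);
}

/* A few spot checks on the shapes. */
static void waveform_self_check(void)
{
    assert(saw_from_phase(0x8000u) == 0x80u);
    assert(triangle_from_saw(64u) == 128u);
    assert(apply_volume(200u, 128u) == 100u);
}
```

The volume step is the kind of thing that maps onto a single mul on the AVR, with the result's high byte used as the scaled sample.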

I don't think that I could get GCC to compile something fast enough for this.  :)

nr Bundaberg, Australia
Offline Offline
Tesla Member
***
Karma: 129
Posts: 8601
Scattered showers my arse -- Noah, 2348BC.

As you know what registers you are using you may save more time by using the ISR_NAKED directive for the ISR and pushing the 2-3 regs you use.

I assume that the compiler pushes all regs because it doesn't know what's going on inside the volatile ASM block (although it may be clever enough to figure it out).

As for what's wrong with the code, I might spot it if it were in normal ASM, but I can never follow the GCC inline assembler syntax.
______
Rob
« Last Edit: May 27, 2012, 06:59:24 pm by Graynomad »

Rob Gray aka the GRAYnomad www.robgray.com

Nowhere

I don't believe ISR_NAKED would help much.  I see what you mean, but yes, the compiler knows which registers are used.
At the end of the inline assembler block, there's a list of output operands, input operands, and clobbered registers.
The inputs/outputs are automatically assigned to registers; this is why I never refer to register names in the assembler code... except for the clobbered registers.  Those are any other registers that are modified in the routine.  The mul instruction modifies R1, so that's listed as clobbered.
So between the inputs/outputs and the clobber list, the compiler knows exactly what to push/pop.

And woohoo! In typing that I thought of what my problem was.  The outputA variable was only listed as an output operand, so GCC does not load its value into the register at the beginning.  I put an input operand linked to the same register, and it works!
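
For anyone hitting the same thing, here is a minimal host-side sketch of the two usual ways to tell GCC that an operand is read as well as written. It uses an empty asm template and the generic "r" constraint so it compiles on a PC; the helper names are made up for the example. On the AVR the constraint letter would stay "d" as in the posted code, for example `[outputA] "+d" (outputA)`, but that line is not compiled or tested here.

```c
#include <assert.h>
#include <stdint.h>

/* "+r": read-write operand. GCC loads v into the register before the asm
   runs and takes the register's contents as the new value afterwards. */
static uint16_t pass_through_rw(uint16_t v)
{
    __asm__ volatile ("" : "+r" (v));
    return v;
}

/* Same effect with an output operand plus an input tied to it: the "0"
   constraint places the input in the same register as operand 0. */
static uint16_t pass_through_tied(uint16_t v)
{
    uint16_t out;
    __asm__ volatile ("" : "=r" (out) : "0" (v));
    return out;
}

/* With a plain "=r" output and no tied input, the register would hold
   whatever happened to be in it, which is the garbage seen in the post. */
static void constraint_self_check(void)
{
    assert(pass_through_rw(0x7800u) == 0x7800u);
    assert(pass_through_tied(0x7800u) == 0x7800u);
}
```

The tied-input form is the one described above ("an input operand linked to the same register"); "+" is just the shorthand for it.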
Durrr...
:P

nr Bundaberg, Australia

Anything else I can help you with? :)

Often just describing a problem leads you to the answer.

I'm quite happy working in assembler but that inline syntax really sucks IMO.

_____
Rob
« Last Edit: May 27, 2012, 09:41:45 pm by Graynomad »

Nowhere

Yeah it's annoying.
When I was coming up with the assembler code, I actually just handwrote it.  Found it easier and much more readable...  :)  Then I typed it up and troubleshot (? troubleshooted?) it to make the program.

Then I spent some time playing aro—OH GOD WHAT AM I DOING

nr Bundaberg, Australia

Neat.

_____
Rob