best solution for 12 analog outputs (5 bits)

How about this:
Use SPI.transfer() to send 8 bytes to 8 daisy-chained 74HC595 shift registers => 64 bits, and pick off the outputs in groups of 5 (12 x 5 = 60 bits used).
Each group of 5 outputs drives an R-2R ladder,
http://www.bourns.com/pdfs/r2r.pdf (the heart of most DACs),
and then 3 quad op-amps buffer the 12 ladder voltages going into your controller.
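To make the "groups of 5" idea concrete, here is a rough sketch of packing twelve 5-bit levels into the 8 bytes that get shifted out, plus the ideal unloaded ladder voltage for a given code. The names packLevels and ladderVolts are made up for illustration, and the bit layout (output n in stream bits 5n..5n+4, bit 0 being the LSB of the first byte) is only an assumption; the real layout depends on how you wire the registers to the ladders.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pack twelve 5-bit levels into 8 bytes (64 bits, 60 used).
// Assumed layout: output n occupies stream bits 5n..5n+4,
// where stream bit 0 is the LSB of outputArray[0].
void packLevels(const uint8_t levels[12], uint8_t outputArray[8]) {
  for (size_t i = 0; i < 8; i++) {
    outputArray[i] = 0;
  }
  for (size_t n = 0; n < 12; n++) {
    uint8_t v = levels[n] & 0x1F;            // keep only the low 5 bits
    for (uint8_t b = 0; b < 5; b++) {
      size_t pos = 5 * n + b;                // position in the 64-bit stream
      if (v & (1u << b)) {
        outputArray[pos / 8] |= (uint8_t)(1u << (pos % 8));
      }
    }
  }
}

// Hypothetical helper: ideal output of an unloaded 5-bit R-2R ladder,
// Vout = Vref * code / 32, assuming the 74HC595 outputs swing 0 V to Vref.
double ladderVolts(uint8_t code, double vref) {
  return vref * (code & 0x1F) / 32.0;
}
```

With a 5 V supply that works out to steps of about 0.156 V, topping out at 31/32 of 5 V. Real outputs will be a bit off because of resistor tolerance and the output resistance of the 74HC595.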

500 Hz is a 2 ms period = 32,000 clock cycles at 16 MHz. That seems like more than enough to update 8 bytes and shift them out via SPI to create your output levels.
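The cycle budget can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: the 16 MHz clock is what the 32,000-cycle figure implies, the SPI clock of one quarter of the CPU clock is an assumption, and the constant and function names are made up for illustration. It counts raw shifting time only and ignores the per-byte overhead of each transfer call.

```cpp
#include <cstdint>

constexpr uint32_t kCpuHz    = 16000000;    // assumed 16 MHz CPU clock
constexpr uint32_t kUpdateHz = 500;         // update rate from the post
constexpr uint32_t kSpiHz    = kCpuHz / 4;  // assumed SPI clock (4 MHz)
constexpr uint32_t kBits     = 8 * 8;       // 8 bytes to shift out

// CPU cycles available in one 2 ms update period.
constexpr uint32_t cyclesPerPeriod() { return kCpuHz / kUpdateHz; }

// CPU cycles spent purely clocking 64 bits out over SPI.
constexpr uint32_t shiftCycles() { return kBits * (kCpuHz / kSpiHz); }

static_assert(cyclesPerPeriod() == 32000, "2 ms at 16 MHz is 32,000 cycles");
static_assert(shiftCycles() == 256, "64 bits at 4 CPU cycles per bit");
```

Under those assumptions the shifting itself takes about 256 of the 32,000 cycles, under 1% of the period, so almost all of the 2 ms is left for computing the next set of levels.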
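Something like this (untested sketch):

#include <SPI.h>

const byte latch_signal = 10;         // assumed pin, wired to the latch (RCLK) of all 74HC595s
const unsigned long period = 2000UL;  // 2 ms in microseconds (500 Hz)
byte outputArray[8];                  // 64 bits, 12 outputs x 5 bits use 60 of them
unsigned long previousMicros = 0;

void setup() {
  pinMode(latch_signal, OUTPUT);
  SPI.begin();
}

void loop() {
  unsigned long currentMicros = micros();          // read current time in micros
  if (currentMicros - previousMicros >= period) {  // 2 ms elapsed
    previousMicros = currentMicros;                // store current time for next pass thru loop
    digitalWrite(latch_signal, LOW);   // use direct port manipulation to speed it up
    for (byte x = 0; x < 8; x++) {
      SPI.transfer(outputArray[x]);
    } // end transfer
    digitalWrite(latch_signal, HIGH);  // rising edge moves all 64 bits to the outputs at once
  } // end 2 ms check
  // now nearly 2 ms available to update the array before the next 2 ms window
} // end loop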

This is the essence of Blink Without Delay: every 2 ms you do a little burst of transfers, then spend the rest of the time doing other stuff.

Trade off cost too: 8 74HC595s, 12 resistor networks, and 3 quad op-amps
vs 3 quad DACs
vs 2 octal DACs,
plus the time saved wiring it up and the board real estate required.