Here's a very crazy idea I had the other day, which I still don't know if it's doable or not...
To combine a regular Arduino and an XMOS chip. (www.Xcore.com - www.Xmos.com)
The cheapest one does 4 x 100 MHz threads (you can't have a single 400 MHz thread, sadly) or 8 x 50 MHz threads. It's 32-bit with 64-bit accumulators (no idea how to use those yet), has 64 KB of RAM (used by the program + variables, AFAIK) and can auto-boot from an external SPI Flash.
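On the 64-bit accumulators: I haven't tried this on the XMOS itself, so take the following as a plain C++ sketch of the general idea rather than XMOS-specific code. The usual reason for a wide accumulator is to sum many 32 x 32-bit products at full precision and only scale back to 32 bits at the end. The name dot_q31 and the Q1.31 fixed-point format are my own assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Multiply-accumulate over two arrays of Q1.31 fixed-point samples.
// Each 32x32 product is kept at full 64-bit width in the accumulator,
// and only the final sum is scaled back down to 32 bits.
// Assumption: the inputs are small enough that the running sum fits in
// 64 bits and the scaled result fits in 32 bits (no saturation here).
int32_t dot_q31(const int32_t* a, const int32_t* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    int64_t acc = 0;  // plays the role of the 64-bit accumulator
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<int64_t>(a[i]) * b[i];  // 32x32 -> 64-bit product
    }
    // Shift back to Q1.31. Right-shifting a negative value is arithmetic
    // on common compilers (and guaranteed from C++20 on).
    return static_cast<int32_t>(acc >> 31);
}
```

For example, with a = {0.5, 0.5} and b = {0.5, 0.25} in Q1.31, the result is 0.375 in Q1.31, with no precision lost in the intermediate sum.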
Here's a single chip on SparkFun: XMOS Processor - XS1-L1-64 - COM-10109 - SparkFun Electronics
The idea is to make the Arduino board talk to the XMOS chip, telling it what to do, and also upload the data to the SPI Flash, so no extra tools would be required to program the XMOS chip. Only a user who really wanted to mess around with the chip would need an "XMOS XTAG2 Debug Adapter".
Here's a thread about how to start-up with those chips: Starting up... again... how to program blank chips? - XCore Exchange
And here are a couple of PDF files that talk about the basic structure of how to set up the chip.
http://www.xmos.com/published/xs1-l1-64lqfp-hardware-designcomponent-list
So, usually those chips use one thread for UART and another for PWM, which would eat up the whole chip on those simple things. BUT, since the ATmega328 will be taking care of this, we would use the XMOS chip only for 32-bit math, and let the ATmega328 (Arduino board) take care of the rest.
So, the block diagram would be simple. One timer, called every X samples, that does the 10-bit PWM, like we did on the BeatVox project: https://github.com/Beat707/BeatVox
Once the timer calls the interrupt, it sets the PWM of the two outputs to a previously saved variable, just like the BeatVox = 1 sample buffer delay.
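To make the "1 sample buffer delay" concrete, here is a rough sketch in plain C++ (no Arduino headers, so it only models the logic, not the real timer or PWM registers). All names here (to_pwm10, PwmOutputs, SampleLatch, on_timer_tick) are made up for illustration and are not taken from the BeatVox code; I'm also assuming signed 16-bit samples that get reduced to a 10-bit duty value.

```cpp
#include <cassert>
#include <cstdint>

// Convert a signed 16-bit sample to an unsigned 10-bit PWM duty (0..1023).
uint16_t to_pwm10(int16_t sample) {
    // Shift the signed range -32768..32767 up to 0..65535, then drop 6 bits.
    uint16_t offset = static_cast<uint16_t>(static_cast<int32_t>(sample) + 32768);
    uint16_t duty = static_cast<uint16_t>(offset >> 6);
    assert(duty <= 1023);
    return duty;
}

struct PwmOutputs {
    uint16_t left;
    uint16_t right;
};

struct SampleLatch {
    PwmOutputs pending{512, 512};  // sample saved during the previous tick (mid-scale at start)
    PwmOutputs live{512, 512};     // stand-in for the two hardware PWM duty registers

    // Body of the timer interrupt: output the previously saved sample first,
    // then save the freshly received one for the next tick.
    void on_timer_tick(int16_t next_left, int16_t next_right) {
        live = pending;
        pending = {to_pwm10(next_left), to_pwm10(next_right)};
    }
};
```

Whatever sample arrives during tick N only reaches the outputs on tick N+1, which is where the one-sample delay comes from. The upside is that the output timing never depends on how long the new sample took to arrive.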
After updating the PWM outputs, the Arduino will send data to the XMOS chip telling it which parameters changed, send any MIDI data, ask for the last processed sample buffer, store it, and rest. The XMOS chip will then calculate the next buffer using all its power-magic.
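I haven't settled on any actual protocol between the two chips yet, so the following is only a guess at what one per-tick message from the Arduino could look like, written in plain C++. The frame layout (a count of parameter changes, the id/value pairs, a count of MIDI bytes, then the raw MIDI bytes) and the names ParamChange and build_frame are purely hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One changed parameter: which one, and its new value (hypothetical layout).
struct ParamChange {
    uint8_t id;
    uint8_t value;
};

// Pack everything the Arduino wants to tell the XMOS chip for this tick
// into a single byte frame:
//   [param count][id, value]...[MIDI byte count][MIDI bytes]...
// Assumption: fewer than 256 entries of each kind, so one count byte is enough.
std::vector<uint8_t> build_frame(const std::vector<ParamChange>& params,
                                 const std::vector<uint8_t>& midi) {
    assert(params.size() < 256 && midi.size() < 256);
    std::vector<uint8_t> frame;
    frame.push_back(static_cast<uint8_t>(params.size()));
    for (const ParamChange& p : params) {
        frame.push_back(p.id);
        frame.push_back(p.value);
    }
    frame.push_back(static_cast<uint8_t>(midi.size()));
    frame.insert(frame.end(), midi.begin(), midi.end());
    return frame;
}
```

So one parameter change (id 3 set to 64) plus a MIDI note-on (0x90, 60, 127) would become the seven bytes 1, 3, 64, 3, 0x90, 60, 127. After sending a frame like this, the Arduino would read back the last processed sample buffer and store it for the next interrupt.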
Of course we could later add a 16-bit DAC to the process, and maybe even have the XMOS chip itself handle that or a better PWM. (couldn't get it to work so far)
All in all, this is just an idea for now. I have the XC1A kit, which is great for doing tests, and once I have something that works, I will start posting code. But I will do this slowly, as I wonder if the new Arduino DUE (96 MHz, 32-bit ARM) wouldn't just be easier to handle. It would have less power than 4 x 100 MHz threads, but still, we will see I guess.
Best Regards, WilliamK