Screen + MicroSD + Power amp (audio) - All in 1 Due solution

Hi,
I'm very much a pure assembly language programmer and I admit hardware defeats me. I saw an excellent Arduino Uno product that plugged straight into the headers and provided a TFT screen, MicroSD connector and power-amp so that a mono speaker could be soldered straight onto it.

Is there a similar product for the Due? I'm sorry to ask, but I have searched without luck and I don't trust myself to breadboard the thing together.

What I am actually doing is writing an MP3 & ACELP decoder in 100% assembly language. I'm deliberately sticking to 48MHz with branch speculation, bit-banding and cache disabled. All of the time-sensitive code sits within the 32Kbyte RAM budget I have set myself. The reason for these limits is to produce a decoder that will work on a slightly modified M0+ processor. The only Thumb-2 instructions I am using are CLZ, SMMLS & SMMLA, on the basis that an M0 can have these few instructions added in Verilog.
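For anyone weighing up the "add a few instructions in Verilog" idea, it helps to pin down exactly what those three instructions compute. Below is a minimal C reference model, following the ARM architecture definitions of CLZ and the non-rounding forms of SMMLA and SMMLS, that hardware or an emulation routine could be checked against. The function names are hypothetical and this is a sketch, not the decoder's actual code; it assumes the usual two's-complement conversion from uint32_t to int32_t that GCC and Clang provide.

```c
#include <stdint.h>

/* CLZ: number of leading zero bits; CLZ(0) == 32. */
static uint32_t ref_clz(uint32_t x)
{
    uint32_t n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* SMMLA Rd, Rn, Rm, Ra: Rd = top 32 bits of ((Ra << 32) + Rn * Rm).
   Accumulated in uint64_t so the wrap-around is well defined in C. */
static int32_t ref_smmla(int32_t rn, int32_t rm, int32_t ra)
{
    uint64_t acc = (uint64_t)(uint32_t)ra << 32;
    acc += (uint64_t)((int64_t)rn * rm);    /* 32x32 signed product fits in 64 bits */
    return (int32_t)(uint32_t)(acc >> 32);  /* keep bits 63..32, truncating */
}

/* SMMLS Rd, Rn, Rm, Ra: Rd = top 32 bits of ((Ra << 32) - Rn * Rm). */
static int32_t ref_smmls(int32_t rn, int32_t rm, int32_t ra)
{
    uint64_t acc = (uint64_t)(uint32_t)ra << 32;
    acc -= (uint64_t)((int64_t)rn * rm);
    return (int32_t)(uint32_t)(acc >> 32);
}
```

The rounding variants (SMMLAR/SMMLSR) would add 0x80000000 to the 64-bit accumulator before the top half is taken; the sketch leaves them out because only the truncating forms are mentioned above.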

In short, I'm trying to build the cheapest audiobook player possible. I'm even noting that some Bluetooth chips have a 48MHz Cortex M3, because I want to keep the IC count down.

It needn't be touch-screen or indeed colour (but if the extra price is tiny, I will go with it). The resolution just has to be enough to display the synopsis and an image of the book cover.

I'm sorry to sound so useless, but if I manage to get the MP3 player running on an M0 (Arduino Uno) then I will of course place it on GitHub for everyone. My ultimate goal is to provide an audiobook player that runs on a regular AA battery and works out cheaper than giving school children paper books. It's aimed at the Indian and Pakistani markets at the moment, because I believe that both countries have produced truly great literature and I am passionate about enabling every pupil to hear these works along with classics from the East and West.

I MAY end up driving the MicroSD card from the CPU, or possibly using the MicroSD card's own processor to perform the decoding, so that the main CPU can be an M0+ based SoC running at a lower clock speed.

So:

-256 x 128 pixel screen
-MicroSD slot
-Power amp for audio
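Sizing the screen against the 32Kbyte RAM budget is simple arithmetic: a 256 x 128 panel has 32,768 pixels, so a full framebuffer needs width x height x bits-per-pixel / 8 bytes. The sketch below works that out for a few bit depths; the depths chosen are illustrative assumptions, not requirements from the list above, and the function names are hypothetical. At 1 bpp the buffer is 4 KB and at 4-bit grey 16 KB, but 16-bit RGB565 colour needs 64 KB and cannot fit. Many TFT controllers keep their own frame memory, so a colour cover image could be streamed from the card rather than buffered in full.

```c
#include <stdint.h>

#define SCREEN_W         256u
#define SCREEN_H         128u
#define RAM_BUDGET_BYTES 32768u  /* the 32Kbyte budget mentioned in the post */

/* Bytes needed for a full-screen framebuffer at the given bit depth,
   rounded up to a whole byte. */
static uint32_t framebuffer_bytes(uint32_t w, uint32_t h, uint32_t bits_per_pixel)
{
    return (w * h * bits_per_pixel + 7u) / 8u;
}

/* Non-zero if a buffer of that size fits inside the RAM budget on its own
   (ignoring the decoder's own working memory). */
static int fits_in_budget(uint32_t bytes)
{
    return bytes <= RAM_BUDGET_BYTES;
}
```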

Please forgive me if I have stared right through an answer, but after finding such an item for the Uno Pro, I have been driven to distraction attempting to find an equivalent for the Due.

Many thanks to all

PS I have been lucky enough to be helped by Jens Bauer on the ARM forums, but using only M0 instructions a 32-bit x 32-bit ---> 64-bit signed multiply takes 18 cycles, and it is so common that I cannot place it in-line, so the call and return bring the total to 20 cycles. A multiply-and-accumulate takes 24 cycles, and a MULSHIFT32 (32-bit x 32-bit ---> top 32 bits of the result) takes 19 cycles. A count leading zeros takes 16-20 cycles as well, even if I use the zero-page (addresses 0-255, which saves a cycle when pointing to the lookup table), and once again I cannot place it in-line.
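Those cycle counts come from building wide results out of narrow operations, because the M0's MULS only returns the low 32 bits of a product. The C sketch below shows one common way to do it, not Jens Bauer's routine or the author's: four 16 x 16 -> 32 partial products with explicit carries, a sign correction to turn the unsigned product into a signed one, MULSHIFT32 as the top half of that, and a CLZ that narrows to the highest non-zero byte before a 256-entry table lookup, similar in spirit to the zero-page table described. All function names are hypothetical, and the casts assume two's-complement conversions as GCC and Clang provide.

```c
#include <stdint.h>

/* 32x32 -> 64 unsigned multiply using only 16x16 -> 32 partial products. */
static uint64_t umul32x32_64(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    uint32_t ll = al * bl;              /* weight 2^0  */
    uint32_t lh = al * bh;              /* weight 2^16 */
    uint32_t hl = ah * bl;              /* weight 2^16 */
    uint32_t hh = ah * bh;              /* weight 2^32 */

    uint32_t mid = lh + hl;             /* may wrap past 2^32 */
    uint32_t mid_carry = (mid < lh);    /* that wrap is worth 2^48 */

    uint32_t lo = ll + (mid << 16);
    uint32_t lo_carry = (lo < ll);      /* carry out of the low word */

    uint32_t hi = hh + (mid >> 16) + (mid_carry << 16) + lo_carry;
    return ((uint64_t)hi << 32) | lo;
}

/* Signed version: a negative operand was read as (value + 2^32), so
   subtract the other operand from the high word to undo it. */
static int64_t smul32x32_64(int32_t a, int32_t b)
{
    uint64_t p = umul32x32_64((uint32_t)a, (uint32_t)b);
    uint32_t hi = (uint32_t)(p >> 32);
    uint32_t lo = (uint32_t)p;
    if (a < 0) hi -= (uint32_t)b;
    if (b < 0) hi -= (uint32_t)a;
    return (int64_t)(((uint64_t)hi << 32) | lo);
}

/* MULSHIFT32: top 32 bits of the signed 64-bit product. */
static int32_t mulshift32(int32_t a, int32_t b)
{
    return (int32_t)(uint32_t)((uint64_t)smul32x32_64(a, b) >> 32);
}

/* Leading-zero count of each byte value; clz8_table[0] == 8. */
static uint8_t clz8_table[256];

static void clz8_init(void)
{
    for (int v = 0; v < 256; v++) {
        int n = 8;
        for (int x = v; x != 0; x >>= 1)
            n--;
        clz8_table[v] = (uint8_t)n;
    }
}

/* Software CLZ: shift the highest non-zero byte to the top, then look it up.
   Returns 32 for x == 0, matching the CLZ instruction. */
static int clz32_soft(uint32_t x)
{
    int n = 0;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    return n + clz8_table[x >> 24];
}
```

In an assembly version the table would normally be a constant in memory rather than built at start-up; clz8_init() just keeps the sketch short.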

PPS If you need any code speeding up, I'm happy to give it my best shot. I spent a couple of decades writing games in 100% assembly language so I pride myself that I can shave off cycles. If you need something optimizing - post below and I will do my best.