My next thought is to run it off a shift register, similar to how the guy in the spritemods link did it. I'm concerned about how slow the display would be, though: I've read about people needing 4 seconds to clear the screen in similar configurations. All I need it to do is display some text data streams and static images and run a couple of sliders, so my needs aren't too demanding and it might work this way.
The other plan is to run it off a Chinese Mega2560 clone ($10 from eBay; the ATmega2560 uC alone costs more than that everywhere I've looked) with the appropriate shield (another $7ish).
So could I get some input on the pros and cons of each way of doing it? Or perhaps other solutions to the problem?
For 16-bit parallel you'll need two 74HC595s daisy-chained. Use the hardware SPI to drive the shift registers at full speed (8 MHz)
and direct port manipulation to drive the nCS, D/C and nWR lines.
SCLK would drive the clock input of the shift registers, MOSI the data input, and another pin (SS works OK) the
latch pins. You only need to feed the shift registers when the value changes, which means that for repeated pixels
you can just toggle the nWR line and get excellent speed writing TFT RAM. This is a trick you cannot do on 8-bit parallel
or SPI TFT modules, BTW.
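
To make the repeated-pixel trick concrete, here is a rough sketch of the logic in C++. It is written to run on a PC, not on the AVR: MockBus, TftWriter, spiTransfer, latch and strobeWR are made-up names standing in for the SPI write, the latch pin toggle and the nWR toggle, and they only count activity. Sending the high byte first is also an assumption; the right order depends on how the two 74HC595s are wired to the data lines.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the hardware. On a real AVR these methods would
// write the SPI data register and toggle port bits; here they only record
// what happened so the logic can be checked on a PC.
struct MockBus {
    uint16_t shiftReg = 0;        // contents of the two chained 74HC595s
    uint16_t outputs = 0;         // latched outputs feeding the 16 data lines
    uint16_t lastWritten = 0;     // value the TFT saw on the last nWR strobe
    unsigned long spiBytes = 0;   // bytes clocked out over SPI
    unsigned long wrStrobes = 0;  // nWR pulses sent to the TFT

    void spiTransfer(uint8_t b) {
        shiftReg = static_cast<uint16_t>((shiftReg << 8) | b);
        ++spiBytes;
    }
    void latch() { outputs = shiftReg; }  // pulse the latch pins
    void strobeWR() {                     // pulse nWR low then high
        lastWritten = outputs;
        ++wrStrobes;
    }
};

struct TftWriter {
    MockBus &bus;
    bool haveValue = false;  // false until the registers hold a known value
    uint16_t current = 0;    // value currently latched on the data lines

    explicit TftWriter(MockBus &b) : bus(b) {}

    // Write one 16-bit pixel. The shift registers are reloaded only when
    // the colour differs from what is already latched.
    void writePixel(uint16_t color) {
        if (!haveValue || color != current) {
            bus.spiTransfer(static_cast<uint8_t>(color >> 8));    // high byte (assumed order)
            bus.spiTransfer(static_cast<uint8_t>(color & 0xFF));  // low byte
            bus.latch();
            current = color;
            haveValue = true;
        }
        bus.strobeWR();
    }

    // Fill a run of identical pixels: one reload at most, then nWR only.
    void fill(uint16_t color, unsigned long count) {
        for (unsigned long i = 0; i < count; ++i) {
            writePixel(color);
        }
    }
};
```

With this model, filling 1000 pixels with one colour costs 2 SPI bytes and 1000 nWR strobes, so a screen clear is limited by how fast you can toggle nWR, not by the shift registers. On real hardware you would still need to set nCS and D/C and send the controller's RAM-write command first, which this sketch leaves out.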