VGA controller using FIFO memory, discrete ICs and Arduino Uno/Mega?

I love the Arduino boards. They're super easy to use and give me access to fairly powerful microcontrollers for all of about $12. Unfortunately, 16MHz isn't quite fast enough to display more than 120x60, 2-bit colour VGA (which is what I have right now on my Uno, using the VGAX library). Even trying to squeeze out 3- or 4-bit colour requires a resolution drop.

I'm making an Arduino-powered games console for the fun of it (I love my Mega Drive and thought it might be fun to see what I could achieve with Arduino). My plan WAS to have an Arduino Mega2560 running the game code and storing all the sprites in its 8KB of RAM, loading and unloading as necessary to/from an SD card or other external storage. The Mega would then send the 2-bit pixel data, 4 pixels at a time, via an 8-bit parallel bus (literally 8 jumper wires with two "acknowledge" lines) to an Arduino Uno, which would act as the display unit and use 95%+ of its processing power just spewing pixels out to the monitor/TV. That was fine, and worked well... but 2-bit colour just isn't quite enough to reach Mega Drive-era games, and is pretty hard to work with. So I decided to try to invent some sort of custom VGA controller...

Ideally, what I would like to do is have a small SRAM chip (64KB would be HEAPS, even 8KB would be enough, though 16 would be better), which the Arduino talks to. The Arduino would create an array (i.e. the front buffer) in the first section of RAM, then just update the pixels as fast as it can. Then a custom circuit on the other side of the RAM, clocked at VGA-suitable speeds (~25MHz?), would read the pixels from the RAM (using some sort of counter to move through the array?). That way, even if the Arduino couldn't keep up, the VGA controller wouldn't be left without pixel data - rather, it would just show the pixels from the last frame. So you'd get tearing but that's not a major issue.

There are probably numerous reasons why that would be hard to do. Different clock speeds, for a start. I don't really know how SRAM works, but as I understand it, having two different clocks trying to write and read at the same time might be bad. So...

I came across this post. In it they mention FIFO memory. Bingo! I didn't even know this was a thing! They refer to the SN74ALVC7804, a 512x18-bit FIFO memory chip. It sounds perfect. It would appear that the chip doesn't care if the clock is consistent or not, or whether the input and output clocks differ, as long as they don't go above 40MHz. It has pins to show when it's nearly full (and when it's full). It's 18-bit, meaning I can spew 3 pixels of 6-bit colour (nice and neat - a 2-bit resistor DAC on each colour wire, giving me 64 colours to play with) into the buffer so as to keep up with the VGA controller on the other side as it reads each of the 6 bits of data... and this is where I'm stuck.
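A rough sketch of how three 6-bit pixels could be packed into one 18-bit FIFO word and picked apart again. The bit layout (pixel 0 in the lowest six bits, RRGGBB order within a pixel) and the function names are assumptions made for illustration, not anything the FIFO itself dictates. In hardware, the `selectPixel` step would be done by latches or a multiplexer rather than by code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed pixel format: RRGGBB, two bits per channel (one 2-bit resistor DAC each).
constexpr uint8_t makePixel(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint8_t>(((r & 0x3) << 4) | ((g & 0x3) << 2) | (b & 0x3));
}

// Assumed word layout: pixel 0 in bits 0-5, pixel 1 in bits 6-11, pixel 2 in bits 12-17.
constexpr uint32_t packWord(uint8_t p0, uint8_t p1, uint8_t p2) {
    return static_cast<uint32_t>(p0 & 0x3F)
         | (static_cast<uint32_t>(p1 & 0x3F) << 6)
         | (static_cast<uint32_t>(p2 & 0x3F) << 12);
}

// Pick one of the three pixels out of an 18-bit word (slot 0, 1 or 2).
inline uint8_t selectPixel(uint32_t word, unsigned slot) {
    assert(slot < 3);
    return static_cast<uint8_t>((word >> (6 * slot)) & 0x3F);
}
```

The Arduino side would call `packWord` once per three pixels, so the FIFO only needs one write for every three pixel clocks on the output side.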

So here is my real question: What would you suggest? I could always just buy an Arduino Zero/Due and be done with it, but that's no fun (and very expensive in Australia). I want some challenge, just so long as it's not physically impossible (e.g. trying to extract 120x60 4-bit colour from an Uno). I would need some way of switching between the first, second and third set of 6 bits from the FIFO. I'd also need some way of actually timing the pixel outputs (probably the hardest part, now that I think about it - not so much the timing itself but rather counting out the sync pulses, front/back porches, etc.). I'm not looking for someone to give me a straight-up BOM, just point me in the right direction.
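To make the "counting out the sync pulses and porches" part concrete, here is a small software model of the two counters a discrete timing circuit would need. It assumes the standard 640x480 @ 60 Hz mode (25.175 MHz pixel clock, 800 clocks per line, 525 lines per frame); the struct and function names are made up for this sketch. In that mode both sync signals are active-low on the wire, so `hsync`/`vsync` here mean "pulse asserted", not the pin level.

```cpp
#include <cstdint>

// One axis of the timing: visible area, front porch, sync pulse, back porch.
struct Axis {
    uint16_t visible, frontPorch, syncPulse, backPorch;
    constexpr uint16_t total() const {
        return static_cast<uint16_t>(visible + frontPorch + syncPulse + backPorch);
    }
};

constexpr Axis H{640, 16, 96, 48};  // counted in pixel clocks, 800 total
constexpr Axis V{480, 10, 2, 33};   // counted in lines, 525 total

struct Signals { bool hsync, vsync, visible; };

// What the comparators hanging off the counters would output at position (x, y).
inline Signals signalsAt(uint16_t x, uint16_t y) {
    const uint16_t hStart = H.visible + H.frontPorch;
    const uint16_t vStart = V.visible + V.frontPorch;
    return Signals{
        x >= hStart && x < hStart + H.syncPulse,
        y >= vStart && y < vStart + V.syncPulse,
        x < H.visible && y < V.visible};
}

// The counters themselves: x advances every pixel clock, y every line.
struct Counters {
    uint16_t x = 0, y = 0;
    void tick() {
        if (++x == H.total()) {
            x = 0;
            if (++y == V.total()) y = 0;
        }
    }
};
```

In hardware this would roughly correspond to a 10-bit horizontal counter, a 10-bit vertical counter clocked by the horizontal wrap, and a handful of comparators or decoded count values for the sync and blanking edges.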

I love electrical engineering, but I don't have much experience with it. Plus I have no clue what kind of ICs you can buy to achieve this kind of thing. And finally, even though I'm a programmer first and EE second, FPGA programming just melts my brain.

Looks like you can still get them: http://www.digikey.com/product-search/en?keywords=SN74ALVC7804. You have to put in all 18 bits at once, and read all 18 out at once. Maybe latch each 6 into separate output registers if you want to clock into DACs at different times from there.

Looks like you can still get them

Yeah, but that's only the 25MHz version. It might be fast enough, not sure, but I'd feel happier with the 40MHz chip - but that one's only available in lots of 80 :( PLUS the shipping costs from Digi-Key are insane: a minimum of $46 USD (~$64 AUD) for international postage!! :O For a thing smaller than a flash drive!

EDIT: It's actually $30 to post to Australia. Even so that's still 5 times the price of the product itself.

You have to put in all 18 bits at once, and read all 18 out at once. Maybe latch each 6 into separate output registers if you want to clock into DACs at different times from there.

Dangit, didn't think of that. Wait, if I manually created the clock signal, shouldn't the output pins stay set to the last 18 bits that were read until the next clock signal?

Ok, so I was wandering around outside, wondering if this project was worth continuing and also if there were other possible solutions, and had an idea that might let me spew out 120x60 4-bit colour. At first I thought the idea was amusing with little chance of success, but the more I considered it, the more potential problems I resolved, and... long story short, it might just work.

My idea is to use two Arduino Unos. Yes. This is a good idea.

The problem with the Uno/Mega is that it's only (ha, "only") 16MHz. That means it's just barely fast enough to pull the next pixel from memory and assign it to PORTD. It also only just has enough memory to store a 120x60 2-bit bitmap (totalling 1800 bytes). The obvious solution to getting 4-bit colour is to halve the resolution, but 120x60 is pretty small as is, so I wasn't keen to do that. But what if we halve the amount of work that has to be done instead?
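The memory side of that can be checked with a few lines of arithmetic. The helper name below is made up for this sketch; the 2048-byte figure is the SRAM size of the Uno's ATmega328P.

```cpp
#include <cstdint>

// Bytes needed for a packed framebuffer, rounded up to a whole byte.
constexpr uint32_t framebufferBytes(uint32_t width, uint32_t height, uint32_t bitsPerPixel) {
    return (width * height * bitsPerPixel + 7) / 8;
}

constexpr uint32_t kUnoSramBytes = 2048;  // ATmega328P SRAM

// 2-bit colour just fits alongside a little working memory; 4-bit does not fit at all.
static_assert(framebufferBytes(120, 60, 2) == 1800, "2-bit buffer");
static_assert(framebufferBytes(120, 60, 4) == 3600, "4-bit buffer");
static_assert(framebufferBytes(120, 60, 2) < kUnoSramBytes, "fits in one Uno");
static_assert(framebufferBytes(120, 60, 4) > kUnoSramBytes, "too big for one Uno");
```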

My thinking is this. Instead of a single Uno outputting the pixels based on a timer interrupt, I have a "host" Arduino (another Uno, maybe? or perhaps the game-logic-controlling Mega would have enough spare cycles - err...probably not) with timer interrupts marking when pixels should be sent to the monitor. On each call of the host's timer interrupt, the host sets one of two pins (each connected to one of the "client" Arduinos) high. Each client Arduino has an interrupt triggered on the rising edge of a particular pin. That interrupt does exactly what the normal timer interrupt does when using a single Uno, and renders the pixels to the screen.

"Ah," I hear you say, "but what about those 1800 bytes? You've now got 3600 bytes to store!" Well that's easily solved. Instead of each Uno holding the entire front buffer, each one just holds every second pixel. So client 1 might hold all the odd pixels and client 2 the even pixels. When a pixel needs to be updated, the host Arduino (probably actually the game-controlling Arduino) sends it to the appropriate Uno.

If you're rendering at 120x60 or any multiple of that, you could even use three client Unos, but I don't know whether you might then stretch the hardware of the host Arduino - haven't thought that through.
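A sketch of how the host could work out which client owns a given pixel, and where in that client's buffer it lives, for two or three clients. Everything here is an assumption for illustration: clients are numbered from 0, pixels are interleaved by x coordinate, and each client packs two 4-bit pixels per byte with the first pixel in the high nibble.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kWidth = 120;
constexpr unsigned kHeight = 60;

struct PixelSlot {
    uint8_t client;       // which client Uno stores this pixel (0-based)
    uint16_t byteOffset;  // offset into that client's buffer
    bool highNibble;      // true: upper 4 bits of the byte, false: lower 4 bits
};

// Interleave pixels across clients by x: with 2 clients, even x goes to client 0
// and odd x to client 1; with 3 clients, every third pixel goes to each.
inline PixelSlot locate(unsigned x, unsigned y, unsigned clients) {
    assert(x < kWidth && y < kHeight);
    assert(clients > 0 && kWidth % clients == 0);
    const unsigned perRow = kWidth / clients;             // pixels per row per client
    const unsigned localIndex = y * perRow + x / clients; // pixel index within the client
    return PixelSlot{static_cast<uint8_t>(x % clients),
                     static_cast<uint16_t>(localIndex / 2),
                     localIndex % 2 == 0};
}
```

With two clients the last pixel lands at byte 1799, so each client's buffer is 1800 bytes, the same as the original 2-bit buffer. With three clients it drops to 1200 bytes each.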

There's probably some really obvious thing preventing me from going forward with this plan that I haven't thought of, but at the very least I want to try it. It's almost certainly the cheapest and easiest method of achieving 120x60 4-bit colour (I only have two Unos and a Mega atm, and one Uno is acting as a keyboard with an IR receiver for our media centre/NAS PC, but another Uno is only ~$12, so I could buy two of them and still not spend as much as the postage alone for that chip from Digi-Key).

Potential problems that I can see:
- The client Unos might go out of sync periodically, causing flickering or a distorted picture. Their timers might be very slightly different, so when an interrupt is called on one it might not be called with the same delay on the second. I suppose it might be possible to use just one clock for both Unos...? :P