Toorum's Quest II - ATmega328P based retro video game with 256 color graphics

Hi!

EDIT: new video with music and sound effects!

Visit my blog at http://petenpaja.blogspot.fi for schematic, source code and technical info!


I'm finally ready to show you a project I've been working on, on and off, for several months. It's an 8-bit retro video game based on Arduino and the ATmega328P. The project is not finished yet: I'm still working on music playback, which is missing from the video.

Attached images: title screen, Arduino-based prototype, final standalone hardware.

Features:

Tile graphics mode with sprites: up to three sprites per scanline and an unlimited number of sprites vertically. Display resolution is 104x80 with 256 colors. NES controller support. Four audio channels with triangle, pulse, sawtooth and noise waveforms.

Music is still a work in progress and is not present in this early video footage.

The video signal is generated by the microcontroller outputting 8-bit RGB values. The program has been heavily optimized and written in assembler, so that the ATmega328P outputs a new pixel exactly every 6th clock cycle. The AD725 chip converts the RGB values to an NTSC composite signal.
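To put "a pixel every 6th cycle" in perspective, here is a small C sketch of the scanline budget. The 104-pixel width and 6 cycles per pixel come from the post; the NTSC subcarrier and line-rate constants are the broadcast definitions. The post does not state the CPU clock, so plugging in 4FSC (14.31818 MHz, the AD725's clock) below is purely an illustrative assumption. At that clock an NTSC line is 910 cycles long.

```c
/* Figures from the post. */
#define ACTIVE_PIXELS    104  /* horizontal resolution */
#define CYCLES_PER_PIXEL 6    /* a new pixel every 6th clock cycle */

/* NTSC definitions: colour subcarrier and horizontal line rate. */
#define NTSC_FSC_HZ  (315.0e6 / 88.0)  /* ~3.579545 MHz */
#define NTSC_LINE_HZ (4.5e6 / 286.0)   /* ~15734.27 Hz */

/* CPU cycles spent outputting the visible pixels of one scanline. */
unsigned active_cycles(void) {
    return ACTIVE_PIXELS * CYCLES_PER_PIXEL;
}

/* Total CPU cycles in one NTSC scanline at a given CPU clock. */
double cycles_per_line(double cpu_hz) {
    return cpu_hz / NTSC_LINE_HZ;
}
```

Under that assumed clock, `active_cycles()` gives 624 of the roughly 910 cycles per line, so about 69% of every visible line goes to pushing pixels. The rest has to cover sync, blanking and any per-line work.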

Audio is generated in software by mixing four waveforms together. This should allow C64 SID-quality music once the play routine is finalized and the music is composed.
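The actual mixer is hand-written assembly and its source isn't shown here. The following is only a minimal C sketch of one common way to mix four software channels. It assumes a 16-bit phase accumulator per channel, waveforms taken from the top 8 bits of the phase, a fixed 50% pulse duty, and a 16-bit Galois LFSR for noise. All names (`channel_t`, `mix_sample`, the `WAVE_*` constants) are hypothetical.

```c
#include <stdint.h>

enum { WAVE_TRIANGLE, WAVE_PULSE, WAVE_SAW, WAVE_NOISE };

typedef struct {
    uint16_t phase;   /* 16-bit phase accumulator */
    uint16_t step;    /* added to phase every sample; sets the pitch */
    uint8_t  volume;  /* 0..255 */
    uint8_t  wave;    /* one of the WAVE_* constants */
} channel_t;

static uint16_t lfsr = 0xACE1u;  /* noise generator state, must be non-zero */

/* Produce one 8-bit sample for a channel and advance its phase. */
static uint8_t channel_sample(channel_t *ch) {
    uint8_t p = (uint8_t)(ch->phase >> 8);
    uint8_t s;
    switch (ch->wave) {
    case WAVE_TRIANGLE:
        s = (p < 128) ? (uint8_t)(p * 2) : (uint8_t)((255 - p) * 2);
        break;
    case WAVE_PULSE:
        s = (p < 128) ? 255 : 0;  /* fixed 50% duty cycle */
        break;
    case WAVE_SAW:
        s = p;
        break;
    default:  /* WAVE_NOISE: 16-bit Galois LFSR */
        lfsr = (uint16_t)((lfsr >> 1) ^ (-(lfsr & 1u) & 0xB400u));
        s = (uint8_t)lfsr;
        break;
    }
    ch->phase += ch->step;
    return (uint8_t)((s * ch->volume) >> 8);  /* apply volume */
}

/* Mix four channels into one unsigned 8-bit sample (average of the four). */
uint8_t mix_sample(channel_t ch[4]) {
    uint16_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += channel_sample(&ch[i]);
    return (uint8_t)(sum >> 2);
}
```

Averaging by a shift keeps the sum in range without clipping, at the cost of each channel being at most a quarter of full scale.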

The game is fully playable and has 15 rooms.

The ATmega328P has only 2 KB of RAM and 32 KB of flash, so several tricks are needed to squeeze everything into memory. For example, room data is compressed in flash, and inactive rooms use only 1 byte of RAM each to store the state of their objects while the player is not around.
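The post doesn't say which compression scheme is used or how the state byte is laid out. Under two assumptions, both tricks can be sketched in C. First, the state byte is a bitmask with one bit per object, so up to 8 objects per room. Second, the room data uses a simple run-length encoding of (count, tile) byte pairs. Everything here (`room_state`, `rle_decode`, the RLE format) is hypothetical. On the AVR itself the compressed data would sit in PROGMEM and be read with avr-libc's `pgm_read_byte`. This host-side version reads a plain array instead.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_ROOMS 15

/* One byte per room: bit i set means object i (0..7) is collected or killed. */
static uint8_t room_state[NUM_ROOMS];

void mark_object_gone(uint8_t room, uint8_t obj) {
    room_state[room] |= (uint8_t)(1u << obj);
}

int object_is_gone(uint8_t room, uint8_t obj) {
    return (room_state[room] >> obj) & 1u;
}

/* Hypothetical RLE format: (count, tile) byte pairs.
   Returns the number of tiles written to dst (never more than dst_cap). */
size_t rle_decode(const uint8_t *src, size_t src_len, uint8_t *dst, size_t dst_cap) {
    size_t out = 0;
    for (size_t i = 0; i + 1 < src_len; i += 2) {
        uint8_t count = src[i];
        uint8_t tile  = src[i + 1];
        for (uint8_t k = 0; k < count && out < dst_cap; k++)
            dst[out++] = tile;
    }
    return out;
}
```

With 15 rooms, the whole game's persistent object state costs 15 bytes of RAM. Only the room the player is in needs to be decompressed in full.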

I'll probably do a full write-up with schematics when the music is done, if people are interested.

Credits:

Hardware & software by Petri Häkkinen
Graphics tiles by Antti Tiihonen
Titlescreen by Juho Salila

Nice :slight_smile:

Thanks! I noticed you're working on something similar. It's interesting to compare different approaches to video signal generation :slight_smile:

My work was aimed at making the screen work for my VIC-20/C64 emulator, but it turned out to be a better TVout solution :slight_smile:
I'm still going to use it for my emulator, even if it gets released as a library.

I have looked into overclocking the ATmega8 to 32.2 MHz to generate NTSC color directly, without add-on circuits.
It is possible at 32 MIPS.

You don't need 32 MHz for that. Linus Åkesson did it at 17.73447 MHz :slight_smile:

Anyway, getting control of colors is difficult with bitbanging because it's impossible to fine-tune the necessary frequencies. That's one of the reasons why I decided to use the AD725 chip.
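One way to see why bitbanged color ties you to particular clock frequencies: assuming the output pin can change at most once per CPU cycle, the CPU clock has to be a whole multiple of the color subcarrier. The number of CPU cycles per subcarrier period then sets how finely the hue (phase) can be chosen. The sketch below uses the standard NTSC and PAL-B/G subcarrier frequencies. The function names are hypothetical.

```c
/* Colour subcarrier frequencies (broadcast definitions). */
#define NTSC_FSC_HZ (315.0e6 / 88.0)  /* ~3.579545 MHz */
#define PAL_FSC_HZ  4433618.75        /* PAL-B/G */

/* CPU cycles per subcarrier period. For bitbanged colour this should be a
   whole number, because output edges can only land on clock cycles. */
double cycles_per_subcarrier(double cpu_hz, double fsc_hz) {
    return cpu_hz / fsc_hz;
}

/* Rough hue resolution: phase step between consecutive CPU cycles, in degrees. */
double hue_step_degrees(double cpu_hz, double fsc_hz) {
    return 360.0 / cycles_per_subcarrier(cpu_hz, fsc_hz);
}
```

17.734475 MHz is exactly 4x the PAL subcarrier (90 degree phase steps), and 32.2 MHz is close to 9x the NTSC subcarrier (about 32.216 MHz, 40 degree steps). A stock 16 MHz clock gives about 4.47 cycles per NTSC subcarrier period, which does not line up at all.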

Overclocking is an interesting subject because it makes it possible to have higher resolutions.

Very interesting. I've ordered some AD725 chips to try to do something like this.

Can you release the code please? I'd be interested to see how you got the video and sound throughput, plus game logic, in what is probably a tight amount of time.

PetriH:
I'm finally ready to show you a project I've been working on, on and off, for several months. It's an 8-bit retro video game based on Arduino and the ATmega328P. The project is not finished yet: I'm still working on music playback, which is missing from the video.

Impressive...!

How did you generate the 4FSC Clock Input at 14.318180 MHz? I see from the video you appear to have an oscillator on board. Is that all that is required, or is there more to it than that?

Would a simple 14.318 MHz crystal do? Or not? There seem to be plenty of those on eBay for like $10 for 100 of them.

Thanks for the interest and kind words! They make the endless hours spent on the project worthwhile :slight_smile:

Nick, a crystal by itself can't drive the AD725's clock input. The chip needs a strong (buffered) clock signal. The AD725 datasheet has a reference schematic for the oscillator, or you can simply use a crystal oscillator like I did, although those are more expensive and maybe harder to find.

Sure, I can release the source code. Just give me a few days to clean up the code and add the playback routine. The code is in a pretty messy state because of the many optimizations and because I wanted to release the video as soon as possible.

Here are a few things worth mentioning to give you an idea of what's going on:

The hardest part was getting sprites to work. I had to painstakingly cycle-count the scanline routine so that it fetches tile data and masks sprites on top of the scanline while outputting a pixel every 6 cycles. After pulling in tile indices from RAM and tile graphics from ROM, there is hardly any time left for sprites...
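The real routine is cycle-counted assembly that interleaves all of this with pixel output. Ignoring timing completely, the logic of one scanline can be sketched in C. It looks up the tile indices for the current tile row, copies the matching row of each tile's graphics, then masks up to three sprites on top, skipping transparent pixels. The 8x8 tile size, 8-pixel sprite width, color key 0 for transparency, and all names are assumptions for illustration. Only the 104-pixel width comes from the post.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed geometry: 104 px width is from the post; the rest is illustrative. */
#define SCREEN_W    104
#define TILE_W      8
#define TILE_H      8
#define TILES_X     (SCREEN_W / TILE_W)  /* 13 tiles per row */
#define SPRITE_W    8
#define TRANSPARENT 0                    /* assumed colour key */

typedef struct {
    int16_t x, y;           /* top-left position in pixels */
    const uint8_t *pixels;  /* SPRITE_W bytes per row, row-major */
    uint8_t height;
} sprite_t;

/* Compose one scanline: tile background first, then up to three sprites
   masked on top (pixels equal to TRANSPARENT are skipped). */
void render_scanline(int line,
                     const uint8_t *tile_map,    /* TILES_X indices per tile row (RAM) */
                     const uint8_t *tiles,       /* TILE_H*TILE_W bytes per tile (ROM) */
                     const sprite_t *active[3],  /* sprites for this line, or NULL */
                     uint8_t out[SCREEN_W])
{
    const uint8_t *row = tile_map + (line / TILE_H) * TILES_X;
    int ty = line % TILE_H;

    for (int tx = 0; tx < TILES_X; tx++) {
        const uint8_t *src = tiles + ((size_t)row[tx] * TILE_H + ty) * TILE_W;
        for (int px = 0; px < TILE_W; px++)
            out[tx * TILE_W + px] = src[px];
    }

    for (int s = 0; s < 3; s++) {
        const sprite_t *sp = active[s];
        if (sp == NULL)
            continue;
        int sy = line - sp->y;
        if (sy < 0 || sy >= sp->height)
            continue;
        for (int px = 0; px < SPRITE_W; px++) {
            int x = sp->x + px;
            uint8_t c = sp->pixels[sy * SPRITE_W + px];
            if (x >= 0 && x < SCREEN_W && c != TRANSPARENT)
                out[x] = c;
        }
    }
}
```

In C this is two separate passes. The difficulty described in the post is doing both within 6 cycles per output pixel.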

Audio is double buffered: at the start of each scanline, the video generation routine pulls a byte from one audio buffer, while an assembly routine mixes the 4 audio channels and fills the other buffer during vblank. The buffers are swapped at vsync. Timing needs to be precise, otherwise pops and clicks can be heard.
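The double-buffering scheme can be sketched in C as follows. It assumes one sample per scanline (so the sample rate equals the line rate, about 15.7 kHz on NTSC) and a buffer covering one 262-line NTSC frame. The buffer size and function names are hypothetical. On the real hardware this runs inside the scanline interrupt and assembly, so the shared state is marked `volatile`.

```c
#include <stdint.h>

#define SAMPLES_PER_FRAME 262  /* assumed: one sample per NTSC scanline */

static uint8_t audio_buf[2][SAMPLES_PER_FRAME];
static volatile uint8_t  front;     /* index of the buffer being played */
static volatile uint16_t play_pos;  /* next sample to play */

/* Video routine: called once at the start of every scanline. */
uint8_t audio_next_sample(void) {
    uint8_t s = audio_buf[front][play_pos];
    if (play_pos < SAMPLES_PER_FRAME - 1)
        play_pos++;
    return s;
}

/* Mixer: the buffer to fill during vblank (never the one being played). */
uint8_t *audio_back_buffer(void) {
    return audio_buf[front ^ 1];
}

/* Vsync: the freshly mixed buffer becomes the playing one. */
void audio_swap(void) {
    front ^= 1;
    play_pos = 0;
}
```

If the mixer hasn't finished the back buffer by vsync, the swap plays partly stale data. That is one way the pops and clicks mentioned above can arise.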

Audio mixing + game logic takes around 40 scanlines, so it just fits in the vblank period. There are hardly any cycles left during screen refresh.

I'm very much impressed.

Still, it's magical what they did in the '80s with the C64 at only 1 MHz (with the SID chip, of course).

PetriH:
There are hardly any cycles left during screen refresh.

Are there any left at all? Surely at some point it's better to just take over completely. The cycles spent responding to interrupts and saving/restoring state could be used for audio mixing, etc.

RobvdVeer:
Still, it's magical what they did in the '80s with the C64 at only 1 MHz (with the SID chip, of course).

The C64 also had a dedicated graphics chip, the VIC-II, which has 8 hardware sprites, for instance. The 6510 CPU alone is hardly powerful enough to produce an NTSC/PAL video signal.

fungus:

PetriH:
There are hardly any cycles left during screen refresh.

Are there any left at all? Surely at some point it's better to just take over completely. The cycles spent responding to interrupts and saving/restoring state could be used for audio mixing, etc.

Yes, I have thought about this idea: outputting the video signal directly without interrupts. But then everything has to be timed to the video signal, and it becomes a huge PITA. I agree that it's the only way to get the last few cycles out of the MCU...

It's better to squeeze all the cycles out during the active lines and run the game updates during the inactive lines.
My videoblaster uses all of the CPU while outputting pixels on 200 lines and none of it during the other 112 lines.

That is equivalent to an AVR running at about 6 MHz, which is fast compared to 6502s.
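The "about 6 MHz" figure is simple arithmetic. If the CPU is fully busy on the 200 active lines and free on the remaining 112 (312 lines, one PAL frame), the clock left for game code is the idle fraction of the full clock. The sketch assumes a 16 MHz AVR, the clock mentioned later in the thread. The function name is hypothetical.

```c
/* CPU clock effectively left for game code when the CPU is fully busy
   during the active lines and completely free during the blank lines. */
double effective_mhz(double cpu_mhz, int active_lines, int blank_lines) {
    return cpu_mhz * blank_lines / (active_lines + blank_lines);
}
```

`effective_mhz(16.0, 200, 112)` gives about 5.74 MHz, which rounds to the 6 MHz quoted.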

janost:
That is equivalent to an AVR running at about 6 MHz, which is fast compared to 6502s.

Probably equivalent to a 12 MHz 6502.

PetriH:
Nick, a crystal by itself can't drive the AD725's clock input. The chip needs a strong (buffered) clock signal. The AD725 datasheet has a reference schematic for the oscillator, or you can simply use a crystal oscillator like I did, although those are more expensive and maybe harder to find.

Ah yes, I see it now. Actually, I had an idea that a crystal could be used by simply hooking it up to an AVR chip (hopefully a small one like an ATtiny45) and then setting the CKOUT (clock output) fuse. That simply takes the clock and outputs it on one of the pins. You wouldn't need a sketch running, just the processor being active. And if I wanted PAL output (which we use in Australia), I could simply clock the ATmega328 at 17.734475 MHz from the crystal, set the CKOUT fuse, and connect that pin directly to the AD725. That way the main processor runs a bit faster than usual and I get the clock signal as well.

BTW I found 100 x crystals from eBay for $10, so I thought that was nice and cheap. :slight_smile: We'll see if they actually work. :stuck_out_tongue:

I remember sprites from the C64; my GPascal compiler had various functions to support them. You set up the bitmap, location, etc., and the chip did the rest. It also reported collisions, and I'll be interested to see how you handled those.

I'm impressed that even though you say you can handle 3 sprites per scanline, the images you posted usually look "busier" than that. It's impressive how much can happen on screen with such minimal hardware.

Good idea! That should work. Just make sure that you can generate frequencies close enough to the required horizontal and vertical sync rates.
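A quick way to check whether a given CPU clock can hit the sync rates: round the number of CPU cycles per line to a whole number (a cycle-counted routine can only wait in whole cycles) and see how far the resulting line rate drifts from nominal. The sketch below is a hypothetical helper. The nominal rates are the standard ones: 15625 Hz for PAL (625 lines x 25 Hz) and 4.5 MHz / 286 for NTSC.

```c
typedef struct {
    long   cycles;     /* whole CPU cycles per scanline */
    double actual_hz;  /* line rate actually produced */
    double error_ppm;  /* deviation from the nominal rate, parts per million */
} line_timing_t;

line_timing_t line_timing(double cpu_hz, double nominal_line_hz) {
    line_timing_t t;
    t.cycles    = (long)(cpu_hz / nominal_line_hz + 0.5);  /* round to nearest */
    t.actual_hz = cpu_hz / (double)t.cycles;
    t.error_ppm = (t.actual_hz - nominal_line_hz) / nominal_line_hz * 1e6;
    return t;
}
```

For the proposed 17.734475 MHz PAL clock this gives 1135 cycles per line with an error of about 5.6 ppm. A 4FSC NTSC clock (14.31818 MHz) gives exactly 910 cycles per line.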

BTW, we use PAL here in Finland too, but TVs and monitors these days seem to accept both PAL and NTSC.

Collisions are handled by the game logic. I simply check if the bounding rectangle of the player collides with the bounding rectangle of the enemies.
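A bounding-rectangle test like the one described can be written as an axis-aligned overlap check. This is a generic C sketch, not the game's code. It treats rectangles as half-open, so rectangles that merely touch along an edge do not count as colliding. `rect_t` and `rects_overlap` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { int16_t x, y, w, h; } rect_t;  /* top-left corner + size */

/* True if the two axis-aligned rectangles overlap on both axes. */
bool rects_overlap(rect_t a, rect_t b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}
```

The test is four comparisons per pair, which is cheap enough to run against every enemy in the room once per frame.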

Yes, there are a few tricks at play :slight_smile: There can only be three sprites per scanline, but I use multiplexing so that each scanline can have a different set of three sprites. There can be virtually any number of sprites on the screen vertically. Also, collectable items like gold and hearts are technically background tiles. I just animate the tiles, i.e. swap their tile pointers in RAM. I divide the screen vertically into regions and scan one region each frame. If a tile is a heart or gold, I update its pointer. This way updating the tiles is quite efficient.
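Both tricks can be sketched in C under stated assumptions. Sprite multiplexing here simply picks, for each scanline, the first three sprites whose vertical span covers that line. How the real engine orders sprites or handles a fourth one on the same line isn't described, so dropping extras is an assumption. For the collectables, the post swaps tile pointers; this sketch uses tile indices in a 13x10 map split into 5 bands instead, for brevity. All names and tile numbers are hypothetical.

```c
#include <stdint.h>

/* --- Sprite multiplexing --- */
#define MAX_PER_LINE 3

typedef struct { int16_t x, y; uint8_t h; } sprite_t;

/* Store indices of up to MAX_PER_LINE sprites covering `line` in idx.
   Extra sprites on a full line are skipped (an assumption). Returns count. */
int sprites_on_line(const sprite_t *s, int n, int line, int idx[MAX_PER_LINE]) {
    int count = 0;
    for (int i = 0; i < n && count < MAX_PER_LINE; i++)
        if (line >= s[i].y && line < s[i].y + s[i].h)
            idx[count++] = i;
    return count;
}

/* --- Animated collectables as background tiles --- */
#define MAP_W   13
#define MAP_H   10
#define REGIONS 5  /* assumed: 5 bands of 2 tile rows each */

/* Hypothetical tile indices for two-frame animations. */
enum { TILE_HEART_A = 10, TILE_HEART_B, TILE_GOLD_A, TILE_GOLD_B };

static uint8_t tile_map[MAP_H][MAP_W];

/* Scan one band per frame and flip each collectable to its other frame. */
void animate_region(unsigned frame) {
    const int rows  = MAP_H / REGIONS;
    const int first = (int)(frame % REGIONS) * rows;
    for (int r = first; r < first + rows; r++)
        for (int c = 0; c < MAP_W; c++) {
            uint8_t *t = &tile_map[r][c];
            switch (*t) {
            case TILE_HEART_A: *t = TILE_HEART_B; break;
            case TILE_HEART_B: *t = TILE_HEART_A; break;
            case TILE_GOLD_A:  *t = TILE_GOLD_B;  break;
            case TILE_GOLD_B:  *t = TILE_GOLD_A;  break;
            default: break;
            }
        }
}
```

Scanning one band per frame spreads the work out, so each collectable animates once every `REGIONS` frames without any single frame paying for the whole map.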

The mistake you made, Nick, with your 17 cycles and the 9th bit was not in the hardware; it was not checking your software.

A 16 MHz AVR can do a heck of a lot more than display 20 chars per line of VGA.

PetriH:
Yes, there are a few tricks at play :slight_smile: There can only be three sprites per scanline, but I use multiplexing so that each scanline can have a different set of three sprites. There can be virtually any number of sprites on the screen vertically. Also, collectable items like gold and hearts are technically background tiles. I just animate the tiles, i.e. swap their tile pointers in RAM. I divide the screen vertically into regions and scan one region each frame. If a tile is a heart or gold, I update its pointer. This way updating the tiles is quite efficient.

Does it render directly from tiles? (i.e. no charmap...)

janost:
The mistake you made, Nick, with your 17 cycles and the 9th bit was not in the hardware; it was not checking your software.

A 16 MHz AVR can do a heck of a lot more than display 20 chars per line of VGA.

I would be interested to hear how, excluding external hardware, given that 20 characters is 160 pixels (at 8 pixels per character).

According to my calculations, as described here, you have 31.75 µs for each horizontal scan line (using 525 lines at 60 Hz).

(1/60) / 525 * 1e6 ≈ 31.75 µs

Divide that by 800 pixels per line: 640 visible pixels plus the sync pulse (96 pixels), the back porch (48 pixels) and the front porch (16 pixels).

((1/60) / 525 * 1e9) / 800 ≈ 39.68 ns

So that is 39.68 ns per pixel.

Now clearly we can't clock out a pixel every 39.68 ns with a CPU running at 62.5 ns per clock cycle (16 MHz).

You can't clock out a pixel in a single clock cycle (I don't think you can, unless you clock out a fixed value), so you need two clock cycles for the SPI hardware to do it (125 ns per pixel at its fastest setting). The closest you can get, then, is:

 125 / 39.68 ≈ 3.15

So, rounding up, that's four "VGA" pixels in that time; that is, each pixel is stretched 4x horizontally.

So if you can demonstrate where my calculations are wrong, and that you can send "a heck of a lot more", I would be pleased to hear it.
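The timing arithmetic in this post can be reproduced with a short C sketch. It uses the post's rounded 60 Hz / 525-line / 800-pixel figures; the standard 640x480 mode actually runs at about 59.94 Hz with a 25.175 MHz pixel clock, or about 39.72 ns per pixel. It also assumes, as the post does, that the fastest AVR output is one pixel per 2 CPU cycles via SPI. The struct and function names are hypothetical.

```c
typedef struct {
    double line_us;   /* one horizontal line, microseconds */
    double pixel_ns;  /* one VGA pixel slot, nanoseconds */
    double out_ns;    /* fastest AVR output pixel, nanoseconds */
    double stretch;   /* VGA pixel slots covered by one AVR output pixel */
} vga_budget_t;

vga_budget_t vga_budget(double frame_hz, int lines, int px_per_line,
                        double cpu_hz, int cycles_per_px) {
    vga_budget_t b;
    b.line_us  = 1e6 / (frame_hz * lines);
    b.pixel_ns = b.line_us * 1000.0 / px_per_line;
    b.out_ns   = 1e9 / cpu_hz * cycles_per_px;
    b.stretch  = b.out_ns / b.pixel_ns;
    return b;
}
```

`vga_budget(60.0, 525, 800, 16.0e6, 2)` gives about 31.75 µs per line, 39.68 ns per pixel, 125 ns per output pixel and a stretch of about 3.15. Rounding the stretch up to 4 leaves 640 / 4 = 160 visible pixels, i.e. the 20 characters of 8 pixels discussed above.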