Toorum's Quest II - ATmega328P based retro video game with 256 color graphics

My work was aimed at making the screen work for my VIC-20/C64 emulator, but it turned out to be a better TVout solution :)
I'm still going to use it for my emulator, even if it gets released as a library.

I have looked into overclocking the ATmega8 to 32.2 MHz to generate NTSC color directly, without add-on circuits.
It is possible at 32 MIPS.

You don't need 32 MHz for that. Linus Åkesson did it with 17.73447 MHz :)

Anyway, getting control of colors is difficult with bitbanging because it's impossible to fine-tune the necessary frequencies. That is one of the reasons I decided to use the AD725 chip.

Overclocking is an interesting subject because it makes it possible to have higher resolutions.

Very interesting. I've ordered some AD725 chips to try to do something like this.

Can you release the code please? I'd be interested to see how you got the video and sound throughput, plus game logic, in what is probably a tight amount of time.

PetriH:
I'm finally ready to show you a project I've been working on, on and off, for several months. It's an 8-bit retro video game based on Arduino and ATmega328P. The project is not yet finished -- I'm working on music playback, which is still missing from the video.

Impressive...!

How did you generate the 4FSC Clock Input at 14.318180 MHz? I see from the video you appear to have an oscillator on board. Is that all that is required, or is there more to it than that?

Would a simple 14.318 MHz crystal do? Or not? There seem to be plenty of those on eBay for like $10 for 100 of them.

Thanks for the interest and kind words! They make the endless hours spent on the project worthwhile :)

Nick, a crystal can't directly produce the needed frequency for the AD725. The chip needs a strong (buffered) clock signal. The AD725 datasheet has a reference schematic for the oscillator, or you can simply use a crystal oscillator like I did, although they are more expensive and maybe harder to find.

Sure I can release the source code. Just give me a few days to clean up the code and add the playback routine. The code is in a pretty messy state because of many optimizations and because I wanted to release the video as soon as possible.

Here are a few things worth mentioning to give you an idea of what's going on:

The hardest part was getting sprites to work. I had to painfully cycle count the scanline routine so that it fetches tile data and masks sprites on top of the scanline while outputting a pixel every 6 cycles. After pulling in tile indices from RAM and tile graphics from ROM, there is hardly any time left to do sprites...

Audio is double buffered: at the start of each scanline, the video generation routine pulls a byte from an audio buffer. An assembly routine mixes 4 audio channels and fills up the other buffer during vblank. The buffers are swapped at vsync. Timing needs to be precise, otherwise pops and clicks can be heard.

Audio mix + game logic takes around 40 scanlines, so it just fits in the vblank period. There are hardly any cycles left during screen refresh.
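
The double-buffering scheme described here could be sketched in portable C roughly as follows. This is only an illustration, not the game's actual code (which is in assembly and had not been released at this point): the buffer length of 262 samples, the averaging mix, and all function names are assumptions.

```c
#include <stdint.h>

#define AUDIO_BUF_LEN 262   /* assumed: one sample per scanline of a frame */

static uint8_t  audio_buf[2][AUDIO_BUF_LEN];
static uint8_t  play_idx = 0;   /* buffer currently read by the video routine */
static uint16_t play_pos = 0;   /* next sample to output from that buffer */

/* Called at the start of every scanline: fetch the next sample to output. */
uint8_t audio_next_sample(void)
{
    uint8_t s = audio_buf[play_idx][play_pos];
    if (play_pos < AUDIO_BUF_LEN - 1) {
        play_pos++;
    }
    return s;
}

/* Called during vblank: mix 4 channels into the buffer NOT being played.
 * A plain average is used here; the real mixer may work differently. */
void audio_mix(const uint8_t *c0, const uint8_t *c1,
               const uint8_t *c2, const uint8_t *c3)
{
    uint8_t *dst = audio_buf[play_idx ^ 1];
    for (uint16_t i = 0; i < AUDIO_BUF_LEN; i++) {
        uint16_t sum = (uint16_t)(c0[i] + c1[i] + c2[i] + c3[i]);
        dst[i] = (uint8_t)(sum >> 2);
    }
}

/* Called at vsync: the freshly mixed buffer becomes the playback buffer. */
void audio_swap(void)
{
    play_idx ^= 1;
    play_pos = 0;
}
```

Because the mixer only ever writes to the buffer that is not being played, the scanline routine never reads a half-written sample, which is the point of swapping only at vsync.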

I'm very much impressed.

Still, it's magical what they did in the '80s with the C64 at only 1 MHz (with the SID chip, of course).

PetriH:
There are hardly any cycles left during screen refresh.

Are there any left at all? Surely at some point it's better to just take over completely. The cycles spent responding to interrupts and saving/restoring state could be used for audio mixing, etc.

RobvdVeer:
Still, it's magical what they did in the '80s with the C64 at only 1 MHz (with the SID chip, of course).

The C64 also had a dedicated graphics chip, the VIC-II, which has 8 hardware sprites, for instance. The 6510 CPU alone is hardly powerful enough to produce an NTSC/PAL video signal.

fungus:

PetriH:
There are hardly any cycles left during screen refresh.

Are there any left at all? Surely at some point it's better to just take over completely. The cycles spent responding to interrupts and saving/restoring state could be used for audio mixing, etc.

Yes, I have thought about this idea, to output the video signal directly without interrupts. But then everything has to be timed to the video signal and it becomes a huge PITA. I agree that to get the last few cycles out of the MCU it's the only way...

It's better to squeeze out all the cycles during the active lines and run the game updates during the inactive lines.
My videoblaster uses all of the CPU when outputting pixels on 200 lines and none of it during the other 112 lines.

That is equivalent to an AVR running at about 6 MHz, which is fast compared to a 6502.
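
The "about 6 MHz" figure can be checked with a line of arithmetic. The sketch assumes a 16 MHz AVR (the clock is not stated in this post) and a 312-line frame made up of the 200 active and 112 inactive lines mentioned; the function name is made up for illustration.

```c
/* Average clock rate left for game code when only the blanked lines are free. */
double effective_mhz(double clock_mhz, int free_lines, int total_lines)
{
    return clock_mhz * free_lines / total_lines;
}
```

With those assumptions, `effective_mhz(16.0, 112, 200 + 112)` comes out at roughly 5.74.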

janost:
That is equivalent to an AVR running at about 6 MHz, which is fast compared to a 6502.

Probably equivalent to a 12MHz 6502.

PetriH:
Nick, a crystal can't directly produce the needed frequency for the AD725. The chip needs a strong (buffered) clock signal. The AD725 datasheet has a reference schematic for the oscillator, or you can simply use a crystal oscillator like I did, although they are more expensive and maybe harder to find.

Ah yes, I see it now. Actually I had an idea that a crystal could be used by simply hooking it up to an AVR chip (hopefully a small one like an ATtiny45) and then setting the CKOUT fuse. That simply takes the clock and outputs it on one of the pins. You wouldn't need a sketch running; the processor just needs to be active. Although if I wanted PAL output (which we use in Australia) I could simply clock the ATmega328 at 17.734475 MHz from the crystal, set the CKOUT fuse, and connect that directly to the AD725. That way the main processor runs a bit faster than usual and I get the clock signal as well.

BTW I found 100 crystals on eBay for $10, so I thought that was nice and cheap. :) We'll see if they actually work. :P

I remember sprites from the C64, my GPascal compiler had various supporting functions for them. You set up the bitmap, location, etc. and the chip did the rest. It also reported collisions, which I will be interested to see how you handled.

I am impressed by the fact that even though you say you can handle 3 sprites per scan line, the images you posted usually look "busier" than that. It's quite impressive how there can be a lot happening on the screen, with the minimal hardware you used.

Good idea! That should work. Just make sure that you can generate frequencies close enough to the needed vertical and horizontal syncs.

btw. we use PAL here in Finland too, but TVs and monitors these days seem to accept PAL and NTSC.

Collisions are handled by the game logic. I simply check if the bounding rectangle of the player collides with the bounding rectangle of the enemies.
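
A bounding-rectangle test of this kind is usually written as an axis-aligned overlap check. The following is a minimal sketch; the `Rect` layout and the function name are assumptions, not taken from the game's source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Axis-aligned bounding box: top-left corner plus size, in pixels. */
typedef struct {
    int16_t x, y;
    uint8_t w, h;
} Rect;

/* True if the two rectangles share at least one pixel.
 * Rectangles that merely touch along an edge do not count as colliding. */
bool rects_overlap(const Rect *a, const Rect *b)
{
    return a->x < b->x + b->w &&
           b->x < a->x + a->w &&
           a->y < b->y + b->h &&
           b->y < a->y + a->h;
}
```

The game logic would call this once per enemy with the player's rectangle, which is cheap enough to fit in the vblank budget when only a handful of sprites are active.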

Yes, there are a few tricks at play :) There can only be three sprites per scanline, but I do multiplexing so that each scanline can have a different set of three sprites. There can be virtually any number of sprites on the screen vertically. Also, collectable items like gold and hearts are technically background tiles. I just animate the tiles, i.e. swap their tile pointers in RAM. I divide the screen vertically into regions and scan one region each frame. If the tile is a heart or gold I update its pointer. This way updating the tiles is quite efficient.
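
The region-by-region tile animation might look something like the sketch below. The 13x10 map size matches the tile pointer buffer mentioned elsewhere in the thread; the split into five regions of two rows, the tile names, and the two-frame heart animation are assumptions made up for illustration.

```c
#include <stdint.h>

#define MAP_W       13
#define MAP_H       10
#define REGION_ROWS 2    /* assumed: 5 regions of 2 tile rows each */

/* Hypothetical tile graphics; on the AVR these would live in flash. */
static const uint8_t tile_heart_a[1] = {1};
static const uint8_t tile_heart_b[1] = {2};
static const uint8_t tile_wall[1]    = {3};

/* Each map cell points at the start of its tile's graphics. */
static const uint8_t *tile_map[MAP_H][MAP_W];
static uint8_t next_region = 0;

/* Call once per frame: advance the animation in one region only,
 * so the per-frame cost stays small and constant. */
void animate_tiles_one_region(void)
{
    uint8_t row0 = (uint8_t)(next_region * REGION_ROWS);
    for (uint8_t r = row0; r < row0 + REGION_ROWS; r++) {
        for (uint8_t c = 0; c < MAP_W; c++) {
            if (tile_map[r][c] == tile_heart_a) {
                tile_map[r][c] = tile_heart_b;
            } else if (tile_map[r][c] == tile_heart_b) {
                tile_map[r][c] = tile_heart_a;
            }
        }
    }
    next_region = (uint8_t)((next_region + 1) % (MAP_H / REGION_ROWS));
}
```

With this split, each animated tile changes frame once every five video frames, and only 26 cells are inspected per frame instead of all 130.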

The mistake you made, Nick, with your 17 cycles and the 9th bit was not in the hardware; it was not checking your software.

A 16 MHz AVR can do a heck of a lot more than display 20 characters per line of VGA.

PetriH:
Yes, there are a few tricks at play :) There can only be three sprites per scanline, but I do multiplexing so that each scanline can have a different set of three sprites. There can be virtually any number of sprites on the screen vertically. Also, collectable items like gold and hearts are technically background tiles. I just animate the tiles, i.e. swap their tile pointers in RAM. I divide the screen vertically into regions and scan one region each frame. If the tile is a heart or gold I update its pointer. This way updating the tiles is quite efficient.

Does it render directly from tiles? (ie. no charmap...)

janost:
The mistake you made, Nick, with your 17 cycles and the 9th bit was not in the hardware; it was not checking your software.

A 16 MHz AVR can do a heck of a lot more than display 20 characters per line of VGA.

I would be interested to hear how, excluding external hardware, given that 20 characters is 160 pixels (at 8 pixels per character).

According to my calculations as described here you have 31.74 µs for each horizontal scan line (using 525 lines at 60 Hz).

(1/60) / 525 * 1e6 = 31.74 µs

Divide that by 800 pixels for one line, including the sync pulse width (96 pixels), the back porch (48 pixels) and the front porch (16 pixels).

((1/60) / 525 * 1e9) / 800 = 39.68 ns

So that is 39.68 ns per pixel.

Now clearly we can't clock out a pixel every 39.68 ns with a CPU clocking at 62.5 ns per clock pulse.

You can't clock out a pixel in a single clock cycle (I don't think, unless you clock out a fixed value), so you need two clock cycles for the SPI hardware to do it. Now the closest you get then is:

 125 / 39.68 = 3.15

So, rounding up, four "VGA" pixels in that time. That is, each pixel is stretched 4 x horizontally.
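
For anyone who wants to re-run these numbers, the calculation can be written out as a few small C functions. The function names are made up for illustration; the inputs (60 Hz, 525 lines, 800 pixel clocks per line, 125 ns per output pixel) are the figures used in this post.

```c
/* Duration of one scanline in nanoseconds. */
double line_time_ns(double refresh_hz, int total_lines)
{
    return 1e9 / refresh_hz / total_lines;
}

/* Duration of one native pixel in nanoseconds. */
double pixel_time_ns(double line_ns, int pixels_per_line)
{
    return line_ns / pixels_per_line;
}

/* Smallest whole number of native pixels that covers one output period. */
int stretch_factor(double output_ns, double pixel_ns)
{
    int n = (int)(output_ns / pixel_ns);
    if (n * pixel_ns < output_ns) {
        n++;
    }
    return n;
}
```

Feeding in `line_time_ns(60.0, 525)` and 800 pixels per line gives about 39.68 ns per pixel, and `stretch_factor(125.0, ...)` on that value gives 4. Dividing the 640 visible pixels by 4 yields the 160 pixels (20 characters of 8 pixels) mentioned earlier.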

So if you can demonstrate where my calculations are wrong, and you can send "a heck of a lot more" I would be pleased to hear it.

PetriH:
The hardest part was getting sprites to work. I had to painfully cycle count the scanline routine so that it fetches tile data and masks sprites on top of the scanline while outputting a pixel every 6 cycles.

I've been trying to guess your timing figures, and from the quoted 104 horizontal resolution, and the NTSC standard of 51.5 µs for the visible portion of the line, I am estimating that you have 8 clock cycles per pixel:

 8 * 62.5 * 104 = 52000 ns

Does that sound right? It can't, of course, because you said a pixel every 6 clock cycles, so where did I go wrong?
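
The two candidate figures are easy to compare with a small helper. This is only arithmetic on the numbers quoted in the thread, assuming a 16 MHz clock (62.5 ns per cycle); the function name is made up.

```c
/* Time in nanoseconds needed to output `pixels` pixels
 * when each pixel takes `cycles_per_pixel` CPU cycles. */
double active_line_ns(int cycles_per_pixel, int pixels, double clock_hz)
{
    return cycles_per_pixel * pixels * 1e9 / clock_hz;
}
```

At 8 cycles per pixel, 104 pixels take 52000 ns, while at 6 cycles per pixel they take 39000 ns. The latter would not fill the whole visible part of the line, which may mean the picture has side borders, although nothing in the thread so far confirms that.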

I thought I'd do a quick search on the RGB / NTSC chip...

http://belogic.com/uzebox/index.asp

It seems there's quite a lot going on!

fungus:
Does it render directly from tiles? (ie. no charmap...)

I have a 13x10 buffer of tile pointers in RAM. Each entry in the buffer points to the start of a tile in ROM. Storing addresses rather than tile indices is much faster.
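
To illustrate why storing addresses helps, here is a rough portable C sketch of a scanline being built from such a pointer buffer. It is not the actual renderer (which is cycle-counted assembly). The 8-pixel tile width follows from 104 pixels across 13 tiles; the 8-line tile height, one byte per pixel, and all names are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define TILE_W 8                    /* 104 pixels / 13 tiles */
#define TILE_H 8                    /* assumed tile height */
#define MAP_W  13
#define MAP_H  10
#define LINE_W (MAP_W * TILE_W)     /* 104 pixels */

/* Each entry points at the first pixel of a tile (in flash on the AVR). */
static const uint8_t *tile_ptrs[MAP_H][MAP_W];

/* Build one scanline of pixels, one byte (256 colors) per pixel. */
void render_scanline(uint8_t y, uint8_t out[LINE_W])
{
    const uint8_t *const *row = tile_ptrs[y / TILE_H];
    uint8_t offset = (uint8_t)((y % TILE_H) * TILE_W);  /* line within tile */

    for (uint8_t col = 0; col < MAP_W; col++) {
        /* No index-to-address multiply here: the pointer is used directly. */
        memcpy(&out[col * TILE_W], row[col] + offset, TILE_W);
    }
}
```

With tile indices, every tile fetch would need an extra multiply (or shift) and add to turn the index into an address. With stored pointers, that work is done once when the map is set up rather than 13 times per scanline.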