I have an update: plans have changed.
Are you using the SPI pins for anything? If you use the SCLK pin as your WR line, you can get a hardware clock from it at half the crystal frequency by doing a dummy SPI transfer.
I'm not up to date with the Mega; however, if the SPI pins are on the double-row header, then for now they are in use, as I have an LCD shield.
Krupski:
First of all, an LCD has its own internal processing time. No matter how fast you pulse the enable pin, the LCD can only work as fast as it's able.
Secondly, the physical response time of an LCD screen is slow (in the millisecond range). What benefit do you get from accessing it faster?
Lastly, if you know your clock speed, you can make a delay circuit of about 1/2 the period of the clock then XOR the two together and get a 2X clock.
It all seems pointless though (unless I'm missing something).
You and others are quite right about the processing time.
I managed to optimise the writes down to 7 instructions per pixel. The UTFT overhead came from writing the colour for every pixel, even when it was not needed. The code below (the operation for each pixel; there are other design differences too) cleared the screen in 81ms, down from 97ms and far quicker than the original 560ms.
//Pseudo code
wr low;
nop;
nop;
nop;
nop;
nop;
wr high;
The required timing is somewhere in the last nop instruction. As you can see, this is nowhere near the speeds I'm after. I know it can be written to faster, as it says it's an animation display, which isn't practical at these speeds (possible, but lacking in frame rate).
After getting frustrated I went hunting and managed to find a more complete datasheet (still lots missing though). I went through and completely modified the initialisation routine; it now uses more power and higher internal clock frequencies (less screen flicker). That let me get the pixel code down to 5 instructions:
//Pseudo code
wr low;
nop;
nop;
nop;
wr high;
This runs at 68ms per refresh, or ~14.7fps. Unfortunately this speed still sucks. I was about to give up when I decided to mess with the control algorithms. Then something strange happened:
When drawing a rectangle, it would normally be underfilled by a significant portion if toggled too fast, as pixels were lost. Suddenly, however, it was drawing more pixels than expected and corrupting the display by drawing into negative regions (unsigned rollover).
So I simply halved the length of my loop. This left the rectangle full apart from a few pixels, so I added 1 more iteration to the loop for a perfect fill.
Moment of relief, I found what I was looking for.
It turns out the few unfilled pixels were being used as a synchronisation pattern for a high-speed RAM update. This allows an amazing output of 2 pixels per 4 instructions, giving a full-screen clear to a random colour in 19.7ms, or ~50.76fps.
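To put the timings quoted so far side by side, here is a small sketch of the arithmetic: frame rate is simply 1000 divided by the fill time in milliseconds. The function names (framesPerSecond, speedup, printTimings) are made up for illustration, and the fill times are the ones measured above; nothing here is specific to the display.

```cpp
#include <cassert>
#include <cstdio>

// Convert a full-screen fill time in milliseconds to frames per second.
double framesPerSecond(double fillMs) {
    assert(fillMs > 0.0);  // a zero or negative fill time makes no sense
    return 1000.0 / fillMs;
}

// How many times faster `afterMs` is compared with `beforeMs`.
double speedup(double beforeMs, double afterMs) {
    assert(beforeMs > 0.0 && afterMs > 0.0);
    return beforeMs / afterMs;
}

// Print each fill time from the post with its frame rate and its
// improvement over the original 560 ms clear.
void printTimings() {
    const double fillMs[] = {560.0, 97.0, 81.0, 68.0, 19.7};
    for (double ms : fillMs) {
        std::printf("%6.1f ms -> %6.2f fps (%.1fx vs. 560 ms)\n",
                    ms, framesPerSecond(ms), speedup(560.0, ms));
    }
}
```

Calling printTimings() from a host program (or swapping printf for Serial.print on the board) lists all five figures; the 68ms and 19.7ms rows correspond to the ~14.7fps and ~50.76fps figures quoted above.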
The pins differ depending on what they were doing previously, but it is basically this pattern:
pin1 low;
pin2 high;
pin2 low;
pin1 high;
This is as fast as the Arduino is going to do it at 16MHz. The great thing about this is that the initialisation routine now sets up the high-speed updates to be clocked using the data control lines. That's what the synchronisation pattern is used for.
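For anyone wanting to try the pattern, here is a rough sketch of how the four writes might look with direct port access. Everything named here is an assumption for illustration: LCD_CTRL_PORT mapped to PORTC, the two bit masks, and the burstFill function are hypothetical, and the real lines depend on the shield wiring. Off the AVR a plain variable stands in for the port so the code still compiles.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>
#define LCD_CTRL_PORT PORTC                 // assumption: both lines share one port
#else
static volatile uint8_t LCD_CTRL_PORT = 0;  // host stand-in so this compiles off-target
#endif

const uint8_t PIN1_MASK = 1 << 0;  // hypothetical bit for "pin1"
const uint8_t PIN2_MASK = 1 << 1;  // hypothetical bit for "pin2"

// Repeat the four-write pattern `pairs` times. Returns the pixel count
// implied by the post's figure of 2 pixels per pattern.
uint32_t burstFill(uint16_t pairs) {
    LCD_CTRL_PORT |= PIN1_MASK;                 // start from idle: pin1 high
    LCD_CTRL_PORT &= (uint8_t)~PIN2_MASK;       //                  pin2 low
    for (uint16_t i = 0; i < pairs; ++i) {
        LCD_CTRL_PORT &= (uint8_t)~PIN1_MASK;   // pin1 low
        LCD_CTRL_PORT |= PIN2_MASK;             // pin2 high
        LCD_CTRL_PORT &= (uint8_t)~PIN2_MASK;   // pin2 low
        LCD_CTRL_PORT |= PIN1_MASK;             // pin1 high
    }
    return 2UL * pairs;
}
```

With constant single-bit masks on a low I/O port, avr-gcc will normally turn each of these writes into one sbi or cbi instruction. The loop counter adds its own overhead, so reaching the 4-instruction figure would probably mean unrolling the body.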
So now I need to create a dual pulse circuit to drive two lines at once. I'll be messing with the code for a bit, but I'd like to hear further ideas; I'll post when/if I have some success.
It's exciting, as this primitive hardware acceleration may make a game possible even without an external circuit.