Thanks for the comments! Much appreciated.
I tried adding the write-through cache, which I had initially resisted because it gobbles up a lot of RAM, although admittedly you might have that RAM free.
My measured figures certainly showed a performance improvement, particularly for large pixel-based operations (e.g. filling a large box). It was about 1.8 times as fast (I'm not sure I would call that "huge", but maybe that's a matter of opinion).
For example, executing this line:
lcd.fillRect (20, 20, 50, 50, 1);
- Without cache: 4.686 seconds
- With cache: 2.586 seconds
Maybe I didn't do it as efficiently as possible. Since I was allowing for multiple displays, the cache was a member variable of the lcd class, so accessing it took a couple of dereferences.
I would be more excited if the time went from 4 seconds to 0.4 seconds.
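To show what I mean by a write-through cache, here is a minimal sketch. It is not the library code: MockLcd, CachedLcd and the function names are made up for illustration, and it assumes the display stores 8 vertically stacked pixels per byte and starts out blank. Without the cache, setting one pixel means reading the byte back from the display and then writing it; with the cache, the read comes from RAM and only the write goes to the display.

```cpp
#include <cassert>
#include <cstdint>

const int LCD_WIDTH = 128;
const int LCD_PAGES = 8;   // 64 rows, 8 pixel rows per byte (assumed layout)

// Hypothetical stand-in for the real display; it just counts bus traffic.
struct MockLcd {
  uint8_t ram[LCD_PAGES][LCD_WIDTH] = {};
  unsigned long reads = 0;
  unsigned long writes = 0;

  uint8_t readByte(int x, int page) { ++reads; return ram[page][x]; }
  void writeByte(int x, int page, uint8_t value) { ++writes; ram[page][x] = value; }
};

// Set or clear the bit for row y inside a display byte.
static uint8_t applyPixel(uint8_t b, int y, bool on) {
  const uint8_t mask = static_cast<uint8_t>(1u << (y % 8));
  return on ? static_cast<uint8_t>(b | mask) : static_cast<uint8_t>(b & ~mask);
}

// No cache: read-modify-write against the display itself.
void setPixelUncached(MockLcd &lcd, int x, int y, bool on) {
  assert(x >= 0 && x < LCD_WIDTH && y >= 0 && y < LCD_PAGES * 8);
  const uint8_t current = lcd.readByte(x, y / 8);
  lcd.writeByte(x, y / 8, applyPixel(current, y, on));
}

// Write-through cache: 1024 bytes of RAM mirror the display contents.
class CachedLcd {
public:
  explicit CachedLcd(MockLcd &lcd) : lcd_(lcd) {}

  void setPixel(int x, int y, bool on) {
    assert(x >= 0 && x < LCD_WIDTH && y >= 0 && y < LCD_PAGES * 8);
    uint8_t &cached = cache_[y / 8][x];   // read from RAM, not from the display
    cached = applyPixel(cached, y, on);
    lcd_.writeByte(x, y / 8, cached);     // every change still goes straight through
  }

private:
  MockLcd &lcd_;
  uint8_t cache_[LCD_PAGES][LCD_WIDTH] = {};   // assumes the display starts blank
};
```

In this mock, filling a box from (20, 20) to (50, 50) inclusive one pixel at a time costs 961 reads plus 961 writes without the cache, and just 961 writes with it. Halving the bus traffic is at least consistent with the roughly twofold speed-up I measured, although I can't promise the real timings break down exactly that way.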
In terms of speed, the original is really quite fast for something like showing a bar graph of volume, temperature, etc.
For example, this test code here:
int sensorPin = A0;    // select the input pin for the potentiometer
char buf [20];

void loop ()
{
  // read the value from the sensor:
  int sensorValue = analogRead (sensorPin);

  // draw bar
  lcd.clear (0, 16, sensorValue / 10, 23, 0xFF);
  lcd.clear (sensorValue / 10 + 1, 16, 127, 23, 0);

  lcd.gotoxy (0, 32);
  lcd.clear (0, 32, 127, 39);
  sprintf (buf, "Value: %i", sensorValue);
  lcd.string (buf);

  delay (100);
}  // end of loop
This reads (random noise) from A0 and displays a bar using the (fast) clear routine. It also shows the value as a number. It ran so fast that it flickered annoyingly, hence the 100 ms delay at the end of the loop to slow it down a bit.
So I think a bit of careful screen layout, allowing for the more efficient use of boxes aligned on vertical 8-pixel boundaries, is what really speeds things up. Basically, each byte sent to the LCD covers 8 vertically stacked pixels, so an aligned box needs one write where pixel-by-pixel drawing needs 8, which is the big time-saver.
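As a rough way to see the saving, the following sketch just counts writes for a box. The function names are made up for illustration and are not part of the library; it assumes the layout where one byte covers 8 vertically stacked pixels, and it ignores any extra work needed for partly covered bytes at the top and bottom of an unaligned box.

```cpp
#include <cassert>

// Writes needed when every pixel in the box is set individually.
long writesPixelByPixel(int x1, int y1, int x2, int y2) {
  assert(x1 <= x2 && y1 <= y2);
  return static_cast<long>(x2 - x1 + 1) * (y2 - y1 + 1);
}

// Writes needed when each column of 8 pixels goes out as a single byte:
// one write per column for every 8-pixel band the box touches.
long writesByteAtATime(int x1, int y1, int x2, int y2) {
  assert(x1 <= x2 && y1 <= y2 && y1 >= 0);
  const int bands = y2 / 8 - y1 / 8 + 1;
  return static_cast<long>(x2 - x1 + 1) * bands;
}

// True when the box starts and ends exactly on 8-pixel boundaries,
// so whole bytes can be written without touching neighbouring pixels.
bool isBandAligned(int y1, int y2) {
  return y1 % 8 == 0 && y2 % 8 == 7;
}
```

For the bar in the test code, which covers rows 16 to 23 across all 128 columns, that is 1024 writes pixel by pixel against 128 writes a byte at a time.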
I didn't really emphasise it before, but with the I2C approach you could easily enough have multiple LCD screens, all connected to the same 2 pins on the Arduino. So for a project that needed to show a lot of data, that could be ideal. Of course, they are sharing the same data bus so throughput would be down a bit, but if the important thing to you is to show a lot of data, rather than updating it really quickly, that could be a nice solution.