Share your LCD optimization tips, please.

I'm looking to collate a list of tips and tricks to optimize LCD updating. Currently I'm getting ~25ms for a full screen rewrite on a 40x4 screen, which sucks when I'm also trying to run two stepper motors with precise timing to run a polargraph robot: GitHub - MarginallyClever/Makelangelo-firmware: CNC firmware for many different control boards and kinematic systems. Originally the brain of the Makelangelo art robot.

I'm currently using the Arduino default LiquidCrystal library v1.0.7 to run a SainSmart 40x4 panel and U8glib to support a full graphic smart controller. Refresh time on both devices is sad face - simply blitting the 40x4 characters to the Smart is a ~25ms blocking operation.

Another useful thread with tips is here: make faster looptime with LCD display - Displays - Arduino Forum

My current plan is to make a buffer which contains the text last sent to the display. All attempts to update the display compare against this buffer and set an isDirty flag only if the buffer contents change. When isDirty is true, rewrite the entire display at the end of the main loop. This way I avoid all calls to lcd.setCursor and lcd.clear.
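
A minimal sketch of that plan, written as plain C++ so it can be compiled and tested off the board. ShadowBuffer, print() and flush() are names made up for illustration; the writeChar callback stands in for whatever pushes one character to the display (for example a wrapper around lcd.write()). It assumes the display's cursor advances through the cells in the same order as the buffer, which depends on the panel.

```cpp
#include <cassert>
#include <cstring>

constexpr int LCD_WIDTH = 40;
constexpr int LCD_HEIGHT = 4;

// Hypothetical shadow buffer: holds the text last requested for the display.
struct ShadowBuffer {
  char cells[LCD_WIDTH * LCD_HEIGHT];
  bool isDirty;

  // Starts dirty so the first flush() paints the whole (blank) screen.
  ShadowBuffer() : isDirty(true) { std::memset(cells, ' ', sizeof(cells)); }

  // Copy text into the buffer at (col,row), clipping at the end of the row.
  // isDirty is set only when a stored character actually changes.
  void print(int col, int row, const char *text) {
    assert(col >= 0 && col < LCD_WIDTH && row >= 0 && row < LCD_HEIGHT);
    char *dst = &cells[row * LCD_WIDTH + col];
    for (int i = 0; text[i] != '\0' && col + i < LCD_WIDTH; ++i) {
      if (dst[i] != text[i]) {
        dst[i] = text[i];
        isDirty = true;
      }
    }
  }

  // Call once at the end of the main loop. Rewrites every cell if anything
  // changed and returns true; otherwise sends nothing and returns false.
  template <typename WriteChar>
  bool flush(WriteChar writeChar) {
    if (!isDirty) return false;
    for (char c : cells) writeChar(c);
    isDirty = false;
    return true;
  }
};
```

Printing the same text twice leaves isDirty false, so an unchanged status line costs a memory compare and no display traffic.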

Surprise! A loop of lcd.write() is faster than a single lcd.print().

lcd.print(temp); // ~23944 us
for(int i=0;i<LCD_WIDTH*LCD_HEIGHT;++i) lcd.write(temp[i]); // ~22700 us

So... apologies for the ramble. If you have other tips to speed up the display writing, please share!

Why not only re-write the characters which changed? Maybe only 1-2 characters change so it's worthwhile marching the cursor around to those positions instead of writing all 160 characters.
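
One way this could look, as a hedged sketch in plain C++ rather than a drop-in for any particular library. updateChanged() and FakeLcd are hypothetical names; FakeLcd only counts calls so the logic can be checked on a PC, and on the board its two methods would forward to lcd.setCursor() and lcd.write(). The function assumes the cursor auto-advances one column after each write, so adjacent changed characters need only one cursor move.

```cpp
#include <cassert>

constexpr int COLS = 40;
constexpr int ROWS = 4;

// Stand-in for the real display so the sketch runs off the board.
struct FakeLcd {
  int cursorMoves = 0;
  int writes = 0;
  void setCursor(int /*col*/, int /*row*/) { ++cursorMoves; }
  void write(char /*c*/) { ++writes; }
};

// shown  = what the display currently holds (updated in place)
// wanted = what it should hold; both are ROWS*COLS chars, row-major.
// Returns the number of bytes/commands sent to the display.
template <typename Lcd>
int updateChanged(Lcd &lcd, char *shown, const char *wanted) {
  assert(shown != nullptr && wanted != nullptr);
  int sent = 0;
  for (int row = 0; row < ROWS; ++row) {
    int cursorCol = -1;  // column the cursor is known to be at, -1 = unknown
    for (int col = 0; col < COLS; ++col) {
      const int i = row * COLS + col;
      if (shown[i] == wanted[i]) continue;
      if (cursorCol != col) {  // only move when the cursor is elsewhere
        lcd.setCursor(col, row);
        ++sent;
      }
      lcd.write(wanted[i]);
      ++sent;
      shown[i] = wanted[i];
      cursorCol = col + 1;  // assumed auto-increment after a write
    }
  }
  return sent;
}
```

A cursor move is itself one command, so for changes separated by a single unchanged character it may be just as cheap to rewrite that character instead of moving; this sketch does not attempt that.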

First question is how are you running a 40x4 display using the IDE bundled LiquidCrystal library?
That library only supports a single E line but I believe that two are needed to run a 40x4 display.

There are several things that can be done at the library level to speed things up quite a bit.
My hd44780 library does things smarter than the IDE bundled library to offer better performance.
But like the LiquidCrystal library, hd44780 also does not support a 40x4 LCD.
The hd44780 library is available in the IDE library manager.
It has several different i/o classes. hd44780_pinIO is for direct Arduino pin control, hd44780_I2Cexp is for using i2c backpacks.

As MorganS said, a big win can be achieved if you only send the characters that changed.
Using a shadow buffer and resending the entire screen whenever a single character has changed is not a very good way to handle things.

There are much better ways to handle things.

In terms of actual timing, here are few things to keep in mind.
home() and clear() are VERY slow. Avoid them if you can - in most cases they can be avoided.
If you are going to send an entire screen, there is no reason to use clear().
Just set the cursor to 0,0 and write the full screen.
Set the cursor to 0,0 rather than using home(), as home() is really slow, whereas
sending the command to set the cursor position takes the same amount of time as sending a regular character.
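
To illustrate why a cursor move is so cheap: as far as I understand the HD44780 instruction set, it is a single "set DDRAM address" command byte (0x80 OR'ed with the address), with the second line of each controller starting at address 0x40. The sketch below assumes the usual 40x4 arrangement of two 40x2 controllers, one per E line; cursorCommand40x4() and CursorCommand are made-up names, not part of any library.

```cpp
#include <cassert>
#include <cstdint>

struct CursorCommand {
  int controller;   // 0 = top half (rows 0-1), 1 = bottom half (rows 2-3)
  uint8_t command;  // byte to send with RS low
};

// Build the one-byte cursor positioning command for a 40x4 module,
// assuming it is wired as two independent 40x2 HD44780 controllers.
CursorCommand cursorCommand40x4(int col, int row) {
  assert(col >= 0 && col < 40 && row >= 0 && row < 4);
  const uint8_t lineBase = (row % 2 == 0) ? 0x00 : 0x40;
  const uint8_t address = static_cast<uint8_t>(lineBase + col);
  return CursorCommand{row / 2, static_cast<uint8_t>(0x80 | address)};
}
```

Position 0,0 comes out as the single byte 0x80 on controller 0, so it costs one ordinary write.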

--- bill

And if you must use a shadow buffer/ frame buffer, interlace your code to write one character at a time along with your other operations. You actually use the delay provided by the other operations to advantage in writing to the LCD.

You never need to update an alphanumeric display more often than five times per second.
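
Both suggestions (one character per pass, and no more than five refreshes per second) can be combined. The following is only a sketch under stated assumptions: PacedRefresher is a hypothetical name, the caller passes in the current millis() value, and writeChar stands in for the real display write. The caller is also assumed to position the cursor at the start of each frame.

```cpp
#include <cassert>
#include <cstdint>

// Sends a frame buffer one character per call, and starts a new pass
// over the frame at most once every intervalMs milliseconds.
class PacedRefresher {
  const char *frame_;
  int length_;
  int next_;              // index of the next character to send
  uint32_t lastStartMs_;  // when the current/last pass started
  uint32_t intervalMs_;

 public:
  PacedRefresher(const char *frame, int length, uint32_t intervalMs)
      : frame_(frame), length_(length), next_(length),
        lastStartMs_(0), intervalMs_(intervalMs) {
    assert(frame != nullptr && length > 0);
  }

  // Call once per main-loop pass. Sends at most one character and
  // returns true if it did. Unsigned subtraction keeps the interval
  // check correct when the millisecond counter wraps.
  template <typename WriteChar>
  bool step(uint32_t nowMs, WriteChar writeChar) {
    if (next_ >= length_) {  // idle: previous pass is complete
      if (nowMs - lastStartMs_ < intervalMs_) return false;
      lastStartMs_ = nowMs;
      next_ = 0;
    }
    writeChar(frame_[next_++]);
    return true;
  }
};
```

With intervalMs set to 200 the display is repainted at most five times per second, and each loop pass is blocked for only one character's worth of time.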

aggrav8d:
Currently I'm getting ~25ms for a full screen rewrite on a 40x4 screen ...

Exactly what library and Arduino are you using?
The LiquidCrystal library on an Uno @ 16 MHz using IDE version 1.8.7 transfers a character to the display in 285us.
(It also doesn't support 4x40 displays.)
You are transferring characters at 25ms/(4x40) = 25ms/160 chars = 0.15625 ms/char or ~156us.
That is quite a bit faster than what I've measured with LiquidCrystal and a typical 16 MHz AVR based Arduino.

Are you really using a 4x20 display?
If you were using a typical AVR based 16Mhz Arduino and using a 4x20 display, then your character transfer times would be around 312us which is much closer to the timing I'm measuring with the LiquidCrystal library.

As another data point, the hd44780 library hd44780_pinIO class transfers bytes/characters to the display in 92us
about triple the speed of LiquidCrystal.
The hd44780 library allows the Arduino to run in parallel with the LCD data/command processing whereas LiquidCrystal does not.
The difference is after sending data/command to the LCD, LiquidCrystal waits for the LCD to be ready for another data/command before it returns back to the sketch.
hd44780 is smarter. It never waits after sending data/commands to the LCD.
hd44780 only waits before sending data/command to the LCD if enough time has not elapsed to complete the previous data/command since it was sent to the LCD.
This frees the arduino to use that LCD execution time for other things rather than just spin in a busy/wait loop like pretty much all the other LCD libraries are doing, including LiquidCrystal.

--- bill

So you are using a micros() counter to do that then?

These delays in the LiquidCrystal library were a bugbear of mine for a while. As I understand it after reading through the code, the delays are needed to be in accordance with the datasheet.

Assuming you need to update a significant number of digits frequently and do a lot of other things besides in a timely manner:

  1. I've got a fork of LiquidCrystal that works with my task management (IoAbstraction) library. It will let other tasks be scheduled during the longer (100us or more) delays. Combined with only changing the values that need changing, significant performance improvements can be obtained.

  2. There's also another possibility, though I've never tried it: starting multiple loops. As I understand it, those are run during yields. But just like task manager, you'd have to make sure only one loop updates the display. It's also less clear than in task manager how those loops interact. I'm not sure what hardware supports this or even how stable it is, as I didn't even consider it viable.

Should option 1 be of interest the fork of the library is linked below:

However, this is only useful if you're prepared to use the IoAbstraction library to schedule tasks.

Paul__B:
So you are using a micros() counter to do that then?

There are two ways to handle the needed instruction execution times on the LCD to ensure the documented timing in the datasheet is honored.

Poll the LCD BUSY status or ensure that you don't send a future command until the previous LCD instruction has completed.
Using BUSY status on Arduino will almost always be slower because of the overhead of switching the pins around to read the busy status. When using direct pin control of the LCD, the issue is the poor implementation of the digital i/o code, i.e. pinMode(), digitalWrite(), digitalRead().
The overhead of those routines on a 16 MHz AVR is so high that just flipping the 4 pins used for the data lines from output to input and reading the busy pin one time takes longer than the instruction execution time of the LCD. That is why it makes no sense to use BUSY.

LCD BUSY status could be used in other environments like Teensy 2.x, which has the same 16 MHz AVR but where things like digitalWrite() are 20-30 times faster since the Teensy digital i/o code does not use the IDE supplied AVR code.

Some LCD code like newLiquidCrystal still does the same post instruction delays but attempts to shorten the delays based on assumptions about how long the other code in the library and the sketch takes.
It also steps outside of the standard Arduino digital i/o API functions and uses indirect raw port i/o in some cases.
Indirect raw port i/o, while not quite as fast as direct port i/o, is fairly portable and is much, much faster than the digital i/o functions like digitalWrite() in the AVR environment.
However, not all cores implement the less common digital i/o API functions needed for indirect port i/o.
So while these techniques do speed things up, they have issues: they break in certain environments and in some cases won't even compile.

The hd44780 library uses the Arduino APIs to be fully portable across all h/w platforms but is simply smarter about how it handles the needed LCD instruction timing delays.
All the other libraries I've seen including LiquidCrystal use a blind delay for the instruction timing immediately after sending the instruction. This adds a fixed blind delay for the full LCD instruction time for every LCD instruction sent to the LCD.
Instructions like home() and clear() are 2ms.
hd44780 does not wait at all after sending an instruction. Before sending an instruction, it looks at the elapsed time based on micros() to see if enough time has elapsed for the previously sent instruction to complete.
This allows the Arduino to run in parallel with the LCD rather than doing a blind full busy wait loop after sending the instruction to the LCD.
It also lets hd44780 count other overhead as part of the needed LCD instruction time delay: the digitalWrite() pin set up time, the time it takes to actually send the current instruction to the LCD h/w (which can vary depending on the Arduino board, i/o interface, or even IDE version), and any user sketch code that runs between calls to the library.

The hd44780 code will only do a busy loop wait if enough time has not elapsed since the previous instruction and even then the wait will only be for the fractional amount of needed time for the previous instruction to complete.
When using a 16 Mhz AVR the amount of time for an instruction to send a simple character to the display has elapsed by the time the next character is sent to the library so no added delay is needed.

Now if you are using a Teensy 2.x AVR board, which has much faster digitalWrite() routines, or a faster processor like an ESP8266, there might be some amount of busy wait delay, but only when necessary and only for the amount of time necessary, which is based on the actual elapsed time since the previous instruction was sent to the LCD.
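
The idea can be sketched in a few lines. This is not the hd44780 source, just an illustration of waiting before the next instruction instead of after the previous one; InstructionTimer and sendWhenReady() are hypothetical names, the clock is passed in (micros() on the board), and the execution times are whatever the caller supplies (for example the 2 ms mentioned above for home() and clear()).

```cpp
#include <cassert>
#include <cstdint>

// Remembers when the last instruction was sent and how long the LCD
// needs to execute it.
class InstructionTimer {
  uint32_t sentAtUs_ = 0;
  uint32_t execUs_ = 0;

 public:
  void markSent(uint32_t nowUs, uint32_t execUs) {
    sentAtUs_ = nowUs;
    execUs_ = execUs;
  }

  // Microseconds still to wait at time nowUs; 0 if the LCD is ready.
  // Unsigned subtraction stays correct across counter wrap-around.
  uint32_t remainingUs(uint32_t nowUs) const {
    const uint32_t elapsed = nowUs - sentAtUs_;
    return (elapsed >= execUs_) ? 0 : execUs_ - elapsed;
  }
};

// Wait only for whatever is left of the PREVIOUS instruction, then send
// and return immediately so the sketch runs while the LCD executes.
template <typename Clock, typename Send>
void sendWhenReady(InstructionTimer &timer, Clock nowUs, Send send,
                   uint32_t execUs) {
  assert(execUs > 0);
  while (timer.remainingUs(nowUs()) != 0) {
    // spin only for the remaining fraction, if any
  }
  send();
  timer.markSent(nowUs(), execUs);
}
```

If the sketch spends longer than the execution time between calls, remainingUs() is already 0 and no waiting happens at all.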

--- bill

davetcc:
However, this is only useful if you're prepared to use the IoAbstraction library to schedule tasks.

Yep.
Sketch code must be restructured to do things through the task manager otherwise it will make things worse given the additional overhead.

It doesn't seem to break even until you are going to write more than about 4-6 characters at a time to the LCD.

Here is the timing of writing to the LCD.
Times are in microseconds for pushing a single byte/command to the LCD when doing back to back writes.
Measurements were done on a 16 Mhz AVR Arduino board using IDE version 1.8.7

Arduino pin control of LCD
  92us hd44780_pinIO 
 285us LiquidCrystal
 340us LiquidCrystalIO 

PCF8574 over I2C
 198us hd44780_I2Cexp  (400khz clock)
 549us hd44780_I2Cexp  (100khz clock)
1230us LiquidCrystalIO (400khz clock)
3016us LiquidCrystalIO (100khz clock)
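
For a sense of scale, these per-byte figures can be multiplied out to a full-screen rewrite. A small sketch (fullScreenUs() and speedup() are made-up helper names; it ignores cursor positioning commands and assumes every byte costs the back-to-back time listed above):

```cpp
#include <cassert>
#include <cstdint>

// Time to write every cell of a cols x rows display back to back.
constexpr uint32_t fullScreenUs(uint32_t perByteUs, uint32_t cols,
                                uint32_t rows) {
  return perByteUs * cols * rows;
}

// Ratio between two per-byte times, e.g. LiquidCrystal vs hd44780_pinIO.
double speedup(uint32_t slowUs, uint32_t fastUs) {
  assert(fastUs > 0);
  return static_cast<double>(slowUs) / fastUs;
}

// 20x4 (80 characters) with direct pin control:
static_assert(fullScreenUs(92, 20, 4) == 7360, "hd44780_pinIO");
static_assert(fullScreenUs(285, 20, 4) == 22800, "LiquidCrystal");
// 20x4 over a PCF8574 backpack at 400 kHz:
static_assert(fullScreenUs(198, 20, 4) == 15840, "hd44780_I2Cexp");
static_assert(fullScreenUs(1230, 20, 4) == 98400, "LiquidCrystalIO");
```

The LiquidCrystal 20x4 figure works out to 22800 us, which is close to the ~22700 us reported at the top of the thread for the lcd.write() loop.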

There is a big hit on i2c vs hd44780. Without digging into it, I'm guessing that most of this is due to the smarter way of handling LCD execution delay timing used by hd44780.

--- bill

bperrybap:
hd44780 does not wait at all after sending an instruction. It looks at elapsed time based on micros() before sending an instruction to see if enough time has elapsed for the previously sent instruction before sending an instruction.

Figured as much. Only way to do it as far as I can see. :sunglasses:

bperrybap:
1230us LiquidCrystalIO (400khz clock)
3016us LiquidCrystalIO (100khz clock)

There is a big hit on i2c vs hd44780. Without digging into it, I'm guessing that most of this is due to the smarter way of handling LCD execution delay timing used by hd44780.

Agreed. At the moment there is a larger delay because between each write it's still executing the 100us delay, even though that may not be needed. In most cases where task-manager is used, it doesn't matter much because tasks are still running during the delays. I'll take a look at why it's so much slower though.

One thing I've been meaning to do is to provide an override in the constructor that indicates the relative speed of the IO device, and also to allow it to be configured with a port instead of pins. This would be especially useful on an i2c device. Further, in another forum page here someone has LiquidCrystalIO running in 8 bit mode on an MCP23017, which supports faster i2c speeds.

https://forum.arduino.cc/index.php?topic=569141.15

In a future version of IoAbstraction I want to look at async interrupt based i2c on devices that support it, so such updates can be queued to a certain extent in a ring buffer, maybe even on a trigger from the device if possible. The only issue is handling the atomicity around that where there's more than one core on a wide range of boards. But I've not thought this fully through yet. At the moment I'm so busy trying to finish my menu with inbuilt IoT capabilities that I can't prioritise it.

davetcc:
In a future version of IoAbstraction I want to look at async interrupt based i2c on devices that support it, ...

At the current time, while the Wire library does use buffers, the Wire API is blocking since the actual transmission and reception of data is synchronous and blocking.
i.e. Wire.endTransmission() and Wire.requestFrom() block for all the data bytes being transferred.

--- bill

Yep that’s why it’s in the long grass. I would abandon wire but only where suitable hardware was available. It would come with compromises. I’ve already looked over twi utilis to see how the i2c hardware works on AVR. It would only be available where hardware permitted and on a few boards.

davetcc:
Yep that’s why it’s in the long grass. I would abandon wire but only where suitable hardware was available. It would come with compromises. I’ve already looked over twi utilis to see how the i2c hardware works on AVR. It would only be available where hardware permitted and on a few boards.

I've made headway on a completely async driver, although it is still using Wire at the moment. I now have a completely re-written and entirely unit tested version of LiquidCrystalIO for 4 bit mode on the feature branch listed below. All other modes will be trivial, including a port version for the 23017 (that would have enough spare ports for a switch and rotary encoder via switches). There is a test for it and it performs much better than previous versions. The rendering functions are completely detached from the code that writes to the display. It won't suit everyone, but for users of task manager, this will be a very optimal and pluggable solution when finished.

The example that does all the timing is called HelloWorldCountIOA.

Branches:
https://github.com/davetcc/LiquidCrystalIO/tree/async-feature
https://github.com/davetcc/IoAbstraction/tree/async-feature

It can now be easily optimised for every case. In the medium term I may write an AVR and SAMD hardware variant of IoAbstraction that is completely async.

Example timings on MEGA2560 with default wire speed:

Avg Idle processing per iteration 34us. This is mainly caused by the blocking wire overhead. Also if there's nothing to do it bails out of idle processing quickly.

Move 24us <-- move cursor and print two characters
Move2 60us <-- move cursor and print 6 characters.
PrInt 240us <-- print a 4 or 5 digit integer value

The average overhead on each idle iteration is about 30 micros. It scales well to larger systems, as the number of times the idle function is called drops when task manager is busy. Also, idle only attempts to do one thing each turn.

It's far from ready for the wild yet, and needs finishing off, this is just the first pass at getting it right. As this is likely to be the most used driver with the tcMenu IoT solution, I want to get it as good as possible.

Avg Idle processing per iteration 34us. This is mainly caused by the blocking wire overhead. Also if there's nothing to do it bails out of idle processing quickly.

Forgot to mention that because of the buffering, only genuine changes (i.e. a char changes value) will actually be rendered, and only genuine cursor moves (i.e. during hardware updates there is a need to move position) are actually performed.

By the by, while we are talking of "optimisation", an unrelated hardware matter.

A blatant cock-up in the early use of the HD44780 chip has been mindlessly followed to the present with this chip and its derivatives, demonstrating how little thought engineers often put into designs! :astonished: One part of the original datasheet showed a "test" circuit in which a 10k potentiometer was used to set the voltage source for the LCD multiplexing voltage ladder; the point brought out as "Vo" or pin 3 on most of the present display modules. The ladder provided for the chip is a set of five 2k2 resistors appearing on these boards - predictably - as "R1" to "R5", tied at the top end to Vcc - 5 V - and totalling 11k. (R6 incidentally, is the clock oscillator resistor for the chip, while R8 and/ or R9 determine the backlight current.)

In most cases, the appropriate contrast voltage is about 4.5 to 4.7 V, so the voltage on Vo needs to be about 0.3 to 0.5 V presuming an accurate 5 V supply. A (10k) potentiometer connected between ground and Vcc is a quite inappropriate way to set this voltage, as it will be set very near ground and most of it will simply be in parallel with the built-in 11k ladder, unnecessarily wasting 500 µA, which is about the same current as the chip itself draws.

So if the connection of the potentiometer to Vcc is removed, the potentiometer, now acting as a variable resistor, will need to be set to twice the resistance, which is to say the contrast setting will be twice as sensitive. And the actual value will be less than 1k so that a 1k trimpot would spread the usable contrast setting over its whole range.

I therefore strongly advise as an automatic practice, not connecting the potentiometer to Vcc; connect that end back to the wiper instead, or on the "backpack" modules, if it is possible to cut that connection to Vcc without disturbing any other Vcc connections, do so.

Paul__B:
I therefore strongly advise as an automatic practice, not connecting the potentiometer to Vcc; connect that end back to the wiper instead, or on the "backpack" modules, if it is possible to cut that connection to Vcc without disturbing any other Vcc connections, do so.

I've been using PWM contrast for a while on all my projects, with an RC circuit to control the voltage that appears on Vo. I have noticed that nearly all displays I've tried have had very low contrast values of about 9/255; which makes perfect sense when put into this context.

Occasionally we see a posting or reference to an "instructable" suggesting the use of a PWM pin without an RC filter to smooth the contrast voltage. Now while this might intermittently work - especially if the PWM value is set to zero, it is certainly no surprise when it - almost always - completely fails. It might partially work and give an intermittently flickering display as the PWM frequency and display strobe interfere.

For this to work, the resistor value must be less than 1k, perhaps 470 or 220 Ohms. And you will require an electrolytic capacitor of several µF to match.
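
A rough way to sanity-check a resistor/capacitor pair is the cutoff frequency of a single-pole RC low-pass, f = 1/(2*pi*R*C), which should sit well below the PWM frequency. The sketch below is only arithmetic: the 10 µF value is my assumption for "several µF", the helper names are made up, and it ignores the loading of the module's own contrast ladder, which pulls Vo up somewhat.

```cpp
#include <cassert>

constexpr double kPi = 3.14159265358979323846;

// -3 dB cutoff frequency of a single-pole RC low-pass filter, in Hz.
double rcCutoffHz(double ohms, double farads) {
  assert(ohms > 0.0 && farads > 0.0);
  return 1.0 / (2.0 * kPi * ohms * farads);
}

// Ideal, unloaded average voltage of an 8-bit PWM output.
double pwmAverageVolts(int duty, double vcc) {
  assert(duty >= 0 && duty <= 255);
  return vcc * duty / 255.0;
}
```

With 470 Ohms and 10 µF the cutoff is about 34 Hz, and with 220 Ohms about 72 Hz; both are well under the roughly 490 Hz default analogWrite() frequency on most Uno pins. A duty value of 9/255 at 5 V averages to about 0.18 V before loading.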

It does however beg the question of why you would need such control of the display contrast which is really a "set and forget" optimisation.

I rarely use shields so I’m assembling the display from components. The PWM makes it a bit easier to build. I generally use TcMenu and put the contrast in the settings menu. As TcMenu supports Ethernet control I can adjust it from a pc / Mac to the best setting.

I’ve had great success with it, but I always use a suitable filter.

However, I’m sure there are many who try it without understanding PWM and it doesn’t work right.

bperrybap:
...my hd44780 library does things smarter than the IDE bundled library to offer better performance.
But like the LiquidCrystal library hd44780 also does not support a 40x4 LCD...

The advantages of simplification may be obtained via alternative backpack hardware/firmware here:

At $12.00 (US), it's worth looking into, instead of using two SPI/I2C backpacks, and hacking the library code to accommodate the 40x04 character display units.