Topic: Cosa/Boosting LCD 1602 performance (7X SR4W, 6.5X 4-bit parallel, 1.7-2.3X I2C)
Sweden
Sr. Member
****
Karma: 11
Posts: 452

Typical LCD controllers such as the HD44780, and displays such as the 1602/1604/2004, are connected to the Arduino through a parallel interface (4- or 8-bit). This takes a lot of pins. To leave more pins for the application, the LCD can be connected through an I2C IO expander. The major drawbacks are the low update speed and the blocking IO of the I2C implementation in Wiring.

Looking into the implementation (LiquidCrystal_I2C) I came across a bottleneck. When sending a command/data byte to the display, the library issues four I2C transmissions. Each transmission consists of the I2C address of the IO expander and the byte to be written to the port. There are four because 1) 4-bit access to the LCD is used, and 2) the LCD enable pin has to be toggled for each nibble.

A simple optimization is to merge these into a single I2C transmission: the I2C address followed by the four bytes to be written (in sequence) to the port. This saves three address bytes and the start-stop-ack-arbitration time. This is possible because the required delay between LCD commands (37 us) is much shorter than the time between I2C transmissions or between bytes within a transmission (at least 200 us and 100 us, respectively).
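
A minimal sketch of how the four port bytes for one LCD byte could be built, so that they can go out in a single transmission. The pin mapping (P0 = RS, P2 = EN, P3 = backlight, P4..P7 = D4..D7) and the name encode_lcd_byte are assumptions for illustration only; expander boards differ, and this is not taken from the Cosa or LiquidCrystal_I2C sources.

```cpp
#include <array>
#include <cstdint>

// Assumed PCF8574 backpack wiring (hypothetical; check your board):
// P0 = RS, P1 = RW, P2 = EN, P3 = backlight, P4..P7 = LCD D4..D7.
constexpr std::uint8_t PIN_RS = 0x01;
constexpr std::uint8_t PIN_EN = 0x04;
constexpr std::uint8_t PIN_BACKLIGHT = 0x08;

// Build the four port values that clock one LCD byte in 4-bit mode:
// high nibble with EN high, then EN low; low nibble with EN high, then EN low.
std::array<std::uint8_t, 4> encode_lcd_byte(std::uint8_t value, bool is_data,
                                            bool backlight) {
    std::uint8_t control = 0;
    if (is_data) control |= PIN_RS;            // RS high selects the data register
    if (backlight) control |= PIN_BACKLIGHT;   // keep the backlight bit stable
    const auto high = static_cast<std::uint8_t>((value & 0xF0) | control);
    const auto low = static_cast<std::uint8_t>(((value << 4) & 0xF0) | control);
    return {static_cast<std::uint8_t>(high | PIN_EN), high,
            static_cast<std::uint8_t>(low | PIN_EN), low};
}
```

With the stock Wire library the four bytes would then be written between one Wire.beginTransmission() and one Wire.endTransmission() instead of four such pairs.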

Read more about this on http://cosa-arduino.blogspot.se/.

There is also an optimized version of 4-bit parallel access that achieves 541 fps. The improvements are 80+ % compared to the New LiquidCrystal library.

Cheers!
« Last Edit: July 31, 2013, 10:33:39 am by kowalski » Logged

Dallas, TX USA
Faraday Member
**
Karma: 67
Posts: 2702

Yep. I did the same thing a while back.
I've got an update for the New LiquidCrystal library that does this.
It's part of an update that also auto-detects and supports the MCP23008 chip.
I saw the effective byte transfer time from sketch through Print class to LCD
go from 957us to 544us, i.e. a move from around 31 to 54 FPS.

If I understand your 4-bit fps number correctly, I would have expected much better than only a 50% improvement
for 4-bit mode, since the optimized 3-wire shift register code is already doing better than the 444 fps number
you quoted.

Can you run LCDiSpeed sketch (included in the examples directory of the New LiquidCrystal library) on your library?
It reports all the byte transfer numbers and FPS rates normalized so that the display size is not a factor.
I'm curious if you are seeing the same/similar numbers for i2c and what numbers you get for
the optimized parallel mode.

--- bill
« Last Edit: July 04, 2013, 06:56:48 pm by bperrybap » Logged

Western New York, USA
Faraday Member
**
Karma: 36
Posts: 4326

You could look into using the MCP23017 I/O expander.  Use one 8-bit port for an 8-bit LCD interface and the other to implement the RS, RW, and E.  This makes the programming a lot simpler and you can implement more than one E pin to run several LCDs if you wish.  It would also get rid of half of your overhead.

Could one of you explain why it is so important to speed up the byte transfer speed when you then have to wait a much longer time for the device to be ready for the next piece of information?

Couldn't you simply subtract the extra time the I2C implementation takes from the delay times that you insert between the bytes that you send to the LCD controller?  That would take care of half the problem with the single-port I2C devices and all of the problem with the two-port devices.


Don
Logged

Sweden

Hi Bill, thanks for the feedback. Your questions got me thinking about whether there was more to be done and whether I had missed something.

For the I2C IO expander the exec-delay (32 us) could be removed as there is plenty of time between port writes. This then gave approx. 53 fps.

The 4-bit direct port could be optimized further to 523 fps. There was also a bug: I had been a bit sloppy and forgot that the port update should be synchronized (interrupts turned off). There are a few further optimizations left before going to assembly.

Cheers!
Logged

Sweden

Don, that is an interesting I/O expander. The only problem I see with this device is the need to provide a register address in the protocol. The GPIO ports may be organized so that a 16-bit port write is reduced to a four-byte transmission (I2C address, register address, data1, data2). And the device requires multiple transmissions to flip any port bit, if I understand the spec correctly. This actually makes this expander slower than the PCF8574 even though it is 16-bit. But the SPI version could compensate for this with its higher serial bit rate (4 MHz, which is 40X compared to 100 kHz I2C).

Quote
Could one of you explain why it is so important to speed up the byte transfer speed when you then have to wait a much longer time for the device to be ready for the next piece of information?

Couldn't you simply subtract the extra time the I2C implementation takes from the delay times that you insert between the bytes that you send to the LCD controller?
This is more or less one of the optimizations I did lately where I simply removed the exec-delay (37 us) for the I2C IO expander adapter. This is not needed as the I2C serialization will give at least a 200+ us delay between two port updates (in a single transmission) which gives the LCD controller plenty of time.

The enable pulse should be at least 230 ns and as the time between two updates by two adjacent bytes in the same transmission is at least 100 us there is no need for an additional delay. 
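
As a rough cross-check of the reasoning above: every byte on the I2C bus occupies 9 SCL periods (8 data bits plus ACK), which gives a lower bound on the spacing between two port updates inside one transmission. The sketch below compares that bound with the two LCD timings quoted in this post. The function names are hypothetical, and start/stop conditions and software overhead are ignored, so real spacing is longer.

```cpp
// Lower bound on the time between two port updates inside one I2C
// transmission: each byte occupies 9 SCL periods (8 data bits + ACK).
constexpr double port_update_interval_us(double scl_hz) {
    return 9.0 * 1e6 / scl_hz;
}

constexpr double LCD_EXEC_TIME_US = 37.0;     // exec delay quoted in the post
constexpr double LCD_ENABLE_PULSE_US = 0.23;  // 230 ns minimum enable pulse

// True when the bus timing alone already provides the required spacing.
constexpr bool bus_covers_delay(double scl_hz, double required_us) {
    return port_update_interval_us(scl_hz) >= required_us;
}
```

At the default 100 kHz the bound is 90 us per byte, comfortably above both values. The interval shrinks in proportion to the bus clock, so the margin should be rechecked if the clock is raised.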

Cheers!
Logged

Dallas, TX USA

Quote
You could look into using the MCP23017 I/O expander.  Use one 8-bit port for an 8-bit LCD interface and the other to implement the RS, RW, and E.  This makes the programming a lot simpler and you can implement more than one E pin to run several LCDs if you wish.  It would also get rid of half of your overhead.

While it may be a bit simpler, I don't think it really makes programming a lot simpler.
Running in 4-bit mode and having to share the output port between the 4 data lines,
the control lines, and the backlight control is not that difficult.
The MCP23017/MCP23008 will have more i2c overhead than the PCF8574.
This is because those chips are more flexible.
Because of that flexibility, they have control/configuration registers.
Because of these registers,
the first byte transferred to the chip always goes to the address pointer to select which register
you really want to write.
This address pointer normally increments after every write.
You can put the MCP chips into BYTE mode, which disables this increment.
This allows writing to the OLAT register with back-to-back writes to the chip,
the way the PCF8574 works.
However, even if you put the chip into BYTE mode, you still have to send an
extra byte to the chip in each transmission to initially select the OLAT register.
So I don't think using an MCP23017 in 8-bit mode would be faster than using a PCF8574
in 4-bit mode.
ex:
4-bit mode PCF8574 to transfer a byte/command to LCD:
- start
- data byte: 4 bit LCD data/control E high
- data byte: 4 bit LCD data/control E low
- data byte: 4 bit LCD data/control E high
- data byte: 4 bit LCD data/control E low
- end

4-bit mode MCP23008 in BYTE mode:
- start
- data byte: 0x0A to point to OLAT
- data byte: 4 bit LCD data/control E high
- data byte: 4 bit LCD data/control E low
- data byte: 4 bit LCD data/control E high
- data byte: 4 bit LCD data/control E low
- end

8-bit mode MCP23017 in bank = 1, BYTE mode:
- start
- data byte: 0x0A to point to OLATA
- data byte: LCD data byte
- end
- start
- data byte: 0x1A to point to OLATB
- data byte: LCD control byte with E high
- data byte: LCD control byte with E low
- end


Oddly enough, while counter intuitive,
it looks like 4 bit mode on a MCP23008 will be faster
than 8 bit mode on the MCP23017. But both MCP chips will be slower
than the PCF8574 because there is simply more i2c overhead with those
chips.
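
The three sequences above can be tallied as a small worked example. The sketch counts bytes on the wire per LCD byte, and it also counts the I2C slave address byte that starts every transmission, which the listings leave implicit. The type and function names are made up for this illustration.

```cpp
struct BusCost {
    int transmissions;  // start/stop pairs
    int bytes;          // bytes on the wire, including the I2C address byte
};

// One transmission: address byte, optional register-pointer byte, payload.
constexpr BusCost transmission(int payload, bool needs_register_byte) {
    return BusCost{1, 1 + (needs_register_byte ? 1 : 0) + payload};
}

constexpr BusCost operator+(BusCost a, BusCost b) {
    return BusCost{a.transmissions + b.transmissions, a.bytes + b.bytes};
}

// PCF8574, 4-bit mode: four port writes, no register pointer.
constexpr BusCost pcf8574_4bit() { return transmission(4, false); }

// MCP23008, 4-bit BYTE mode: OLAT pointer plus four port writes.
constexpr BusCost mcp23008_4bit() { return transmission(4, true); }

// MCP23017, 8-bit mode: data byte to OLATA, then E high/low to OLATB.
constexpr BusCost mcp23017_8bit() {
    return transmission(1, true) + transmission(2, true);
}
```

Under these assumptions the PCF8574 needs 5 bytes in one transmission, the MCP23008 needs 6 bytes in one transmission, and the MCP23017 needs 7 bytes across two transmissions, which agrees with the ordering described in this post.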

One thing that could really speed things up on the MCP chips would be to bump
the speed of the i2c bus up, since those chips can handle 1.7 MHz clock rates vs
the standard/default 100 kHz.

Quote
Could one of you explain why it is so important to speed up the byte transfer speed when you then have to wait a much longer time for the device to be ready for the next piece of information?
Couldn't you simply subtract the extra time the I2C implementation takes from the delay times that you insert between the bytes that you send to the LCD controller?  That would take care of half the problem with the single-port I2C devices and all of the problem with the two-port devices.
In fm's NewLiquidCrystal library, the byte/command delays are inside the interface layers themselves and
take into consideration the interface transfer time as well as the execution time
of the LCD library code.
So for example on i2c there is no added delay between LCD data byte transfers.

Because of the optimizations already in place, the only area left to optimize for LCD data transfers
for i2c,  is the time to get the control and data information to the LCD.

The times I quoted are not actual byte transfer times but an averaged and normalized
effective byte transfer time based on updating the full display. This time includes
all the overhead to get from the sketch through the LCD code, over the i2c bus and
to the LCD.

Eliminating extra i2c starts/stops and extra byte transfers is actually pretty significant
as you can see from the 16x2 frame rate numbers.
It goes from 31 to 54 FPS on the PCF8574.
Just having to do the send for the extra byte for the address register on the MCP23008 drops the 54FPS to 45FPS.

--- bill



« Last Edit: July 05, 2013, 11:42:12 am by bperrybap » Logged

Sweden

In the latest round of 1602 LCD I2C optimization and tuning, the bus clock was changed from the standard 100 kHz to 400 kHz. The device driver performs correctly on an MJKDZ module connected to an Arduino Nano/Iteadstudio Nano IO Shield.

The frame rate was pushed to 133 fps, giving a 4.2X performance improvement compared to the original LiquidCrystal library (31 fps). The improvement compared to 100 kHz is "only" 2.5X (from 53 fps) even though the bus clock frequency is increased 4X.
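
One way to read the 2.5X versus 4X gap is a simple two-term model: part of the frame time scales with the bus clock and part does not (library code, interrupt handling and the like). The sketch below fits that model to the two frame rates quoted above. The model itself and the names FrameModel and fit are assumptions for illustration, not measurements from Cosa.

```cpp
struct FrameModel {
    double fixed_us;   // part of the frame time that ignores the bus clock
    double scaled_us;  // part that shrinks with the bus clock, at the slow clock
};

// Fit frame_time = fixed + scaled / clock_ratio from two measured frame rates.
FrameModel fit(double fps_slow, double fps_fast, double clock_ratio) {
    const double t_slow = 1e6 / fps_slow;  // frame time at the slow clock, us
    const double t_fast = 1e6 / fps_fast;  // frame time at the fast clock, us
    // t_slow - t_fast = scaled * (1 - 1/clock_ratio)
    const double scaled = (t_slow - t_fast) * clock_ratio / (clock_ratio - 1.0);
    return FrameModel{t_slow - scaled, scaled};
}
```

Calling fit(53, 133, 4) puts roughly 3.7 ms of each frame in the fixed part and about 15.1 ms in the clock-dependent part at 100 kHz. In this model the fixed part alone would cap the speedup well below 4X.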

More details on the blog http://cosa-arduino.blogspot.se/2013/07/object-oriented-lcd-management.html

Cheers!
« Last Edit: July 06, 2013, 04:30:23 pm by kowalski » Logged

Dallas, TX USA

Interesting. I thought about trying that to see if it worked,
but since it is way beyond the specs on the datasheets I've seen, I never actually tried it.
I wonder how stable it is, particularly with multiple devices on the bus.


--- bill
Logged

Sweden

Quote
Interesting. I thought about trying that to see if it worked,
but since it is way beyond the specs on the datasheets I've seen, I never actually tried it.
I wonder how stable it is, particularly with multiple devices on the bus.

I understand your concern. I checked the PCF8574 spec and it seems like most of them sold can handle 400 kHz. As I ran the initial test on an Arduino Nano/Nano IO Board for a few hours without any problems, I thought it would be interesting to report.

I didn't think about additional devices, so I have now set up an Arduino Mega (cheap Chinese clone/Funduino with bad contacts ;-) on an I2C bus on a breadboard with an RTC DS1307, a digital compass HMC5883L, and the IO expander for the LCD. The wiring has a total length of 20 cm in 3 sections. Lots of pF and bad contacts.

Anyway, it is running now and has done so for the last 45 minutes.

Get back to you later when I have stressed it some more,

Mikael
Logged

Dallas, TX USA

All the datasheets that I've been able to find (TI, Philips, & NXP) show 100 kHz as max,
which is probably the max for the lower voltages.
I'm curious which manufacturers you were able to find that show 400 kHz?

--- bill
Logged

Sweden

Quote
All the datasheets that I've been able to find (TI, Philips, & NXP) show 100 kHz as max,
which is probably the max for the lower voltages.
I'm curious which manufacturers you were able to find that show 400 kHz?

Hi again. Below are the data sheets I read (Philips/NXP). It is the PCA version that is specified for 400 kHz. I cannot really read the text on the MJKDZ module I am using, so I would not say this works for all I2C IO expanders. Maybe I was just lucky.

Anyway the test is still going strong ;-)

Hmm, there seems to be a 1 MHz version as well.

Mikael

http://ics.nxp.com/products/gpio.expanders/i2c/
http://www.nxp.com/documents/brochure/NXP_Journal_2012_0918.pdf
http://www.nxp.com/documents/data_sheet/PCA8574_PCA8574A.pdf
Logged

Sweden

Changed the breadboard wires on the test setup (Arduino Mega with three I2C devices on the bus as above) to extra long ones (20+ cm) and continued the test run (CosaLCDspeed.ino). Still no hiccups after nearly four hours, so I think I can say it is stable at 400 kHz, at least for hobby/education setups.

Would be fun to increase the I2C bus frequency until it breaks. Saving that for a rainy day ;-)

Cheers!

 
Logged

Global Moderator
Netherlands
Shannon Member
*****
Karma: 217
Posts: 13734
In theory there is no difference between theory and practice, however in practice there are many...

Good work kowalski, an interesting thread.

Although I think no one will use 130 frames per second on a character based display.
I would prefer to think of this optimization as minimizing the average time per char

31 fps == 33 millis/char
53 fps == 19 millis/char
130 fps == 7.7 millis/char

update: way off math eliminated (see below)

Faster times mean more time to make measurements and to do math.
Note the recent divmod10() optimization discussion which decreased the time to print numbers substantially - http://forum.arduino.cc/index.php?topic=167414.0 - As the print.cpp class is the base class for lcd.print combining these efforts could be very interesting.

For graphics displays the increased performance is evident.
« Last Edit: July 09, 2013, 01:28:03 pm by robtillaart » Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Dallas, TX USA

Quote
31 fps == 33 millis/char
53 fps == 19 millis/char
130 fps == 7.7 millis/char

Your math is way off here.
The times you calculated are not per character; they are per
full frame/display of characters.
Example:
53fps is much faster than 19 ms/char.
53fps is 19 ms per "full frame" of characters, which on a 16x2 display
is 32 characters and 2 set address commands (which take the same time as a data write).
So there are 34 bytes being transferred to a 16x2 display every frame.

The optimization of not disconnecting from the i2c slave between LCD nibble updates
changes the per byte/character transfer time from around 957us down to around 543us on a PCF8574.
Bumping the clock from the default 100 kHz rate to 400 kHz chops that down again by about 2.5x.
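
The relation between the effective byte transfer time and the frame rate can be written out as a small sketch. It assumes one set-address command per display row, which matches the 34 bytes quoted for a 16x2 display but is an assumption for other sizes; the function names are hypothetical and not taken from LCDiSpeed.

```cpp
// Bytes sent per full refresh: every character plus one set-address
// command per row (34 for a 16x2 display).
constexpr int bytes_per_frame(int cols, int rows) {
    return cols * rows + rows;
}

// Frames per second for a given effective byte transfer time in microseconds.
constexpr double frames_per_second(double byte_time_us, int cols, int rows) {
    return 1e6 / (byte_time_us * bytes_per_frame(cols, rows));
}
```

For a 16x2 display, 957us per byte works out to just under 31 FPS and 543us per byte to just over 54 FPS, in line with the figures quoted in this thread.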

The LCDiSpeed sketch included as an example in fm's library is very useful
for getting all this kind of timing information in a real operating environment.
It also calculates and displays timing information that can be compared across any sized display.
The sketch displays:

 
Code:
* - Single byte transfer speed (ByteXfer)
 * This is the time it takes for a single character to be sent from
 * the sketch to the LCD display.
 *
 * - Frame/Sec (FPS)
 * This is the number of times the full display can be updated
 * in one second.
 *    
 * - Frame Time (Ftime)
 * This is the amount of time it takes to update the full LCD display.
 *
 * The sketch will also report "independent" FPS and Ftime values.
 * These are timing values that are independent of the size of the LCD under test.
 * Currently they represent the timing for a 16x2 LCD
 * The value of always having numbers for a 16x2 display
 * is that these numbers can be compared to each other since they are
 * independent of the size of the actual LCD display that is running the test.
 * i.e. you also get 16x2 timing information even if the display is not 16x2
 *
 * All times & rates are measured and calculated from what a sketch "sees"
 * using the LiquidCrystal API.
 * It includes any/all s/w overhead including the time to go through the
 * Arduino Print class and LCD library.
 * The actual low level hardware times are obviously lower.





--- bill
« Last Edit: July 08, 2013, 06:33:39 pm by bperrybap » Logged

Sweden

Quote
Good work kowalski, an interesting thread.

Although I think no one will use 130 frames per second on a character based display.
I would prefer to think of this optimization as minimizing the average time per char
...
Faster times means more time to make measurements and to do math.
Note the recent divmod10() optimization discussion which decreased the time to print numbers substantially - http://forum.arduino.cc/index.php?topic=167414.0 - As the print.cpp class is the base class for lcd.print combining these efforts could be very interesting.

For graphics displays the increased performance is evident.
robtillaart, thanks for your interest and encouragement!

Making more processing time available is exactly one of my intentions with the LCD optimization. Another is a careful design of the Cosa IOStream::Device and LCD abstract interfaces, so that many more devices fit within that interface. The topic title is not very good ;-/

I have been following the divmod10 optimization thread with interest. In Cosa I have simply used the AVR standard functions for binary-to-string conversion; itoa, ltoa, utoa, ultoa.

http://www.nongnu.org/avr-libc/user-manual/group__avr__stdlib.html#ga4f6b3dd51c1f8519d5b8fce1dbf7a665

This is where the optimization should go I believe but while waiting and seeing the improvements it would be interesting to adapt that solution to Cosa/IOStream class.

http://dl.dropboxusercontent.com/u/993383/Cosa/doc/html/dd/d83/classIOStream.html

I have not yet pushed "the ultimate optimization" of the LCD driver for I2C. This is when the output to the device becomes asynchronous and works in the background. The Cosa TWI device driver supports this, and the application will then be allowed to continue with other work while data is transferred to the display.

The benchmark that writes characters to the display will not show any improvements as it saturates the I2C bus. There is only a very small fraction of the benchmark that could run concurrently with the transfer.

The benchmark that writes numbers could run the binary-to-textual conversion in parallel.

There is also yet another I2C level optimization possible for string output. Currently the Cosa LCD driver implements only IOStream::putchar() and handles puts() and write() as a sequence of putchar(). The function writes the character to the LCD but also handles form-feed, carriage-return-line-feed and a few other control characters (something that LiquidCrystal does not). It is possible to write numbers directly as the string will not contain any control characters. The only issue is text clipping or wrapping. This implies that the whole string could be translated to a single larger I2C block and written as one transaction. This removes the I2C addressing per digit character. This is the same as the nibble optimization only on the next transaction level.    
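
A minimal sketch of that idea, assuming the string holds only printable characters and a PCF8574 wired as P0 = RS, P2 = EN, P3 = backlight, P4..P7 = D4..D7. Both the wiring and the name encode_lcd_string are assumptions for illustration; this is not the Cosa implementation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed PCF8574 wiring (hypothetical; boards differ).
constexpr std::uint8_t RS_BIT = 0x01;
constexpr std::uint8_t EN_BIT = 0x04;
constexpr std::uint8_t BACKLIGHT_BIT = 0x08;

// Translate a whole string into one block of port writes (four per
// character), so it can be sent after a single I2C address byte.
std::vector<std::uint8_t> encode_lcd_string(const std::string& text,
                                            bool backlight) {
    const auto control = static_cast<std::uint8_t>(
        RS_BIT | (backlight ? BACKLIGHT_BIT : 0));
    std::vector<std::uint8_t> port_writes;
    port_writes.reserve(text.size() * 4);
    for (unsigned char ch : text) {
        for (int shift : {0, 4}) {  // high nibble first, then low nibble
            const auto nibble = static_cast<std::uint8_t>(
                ((ch << shift) & 0xF0) | control);
            port_writes.push_back(static_cast<std::uint8_t>(nibble | EN_BIT));
            port_writes.push_back(nibble);
        }
    }
    return port_writes;
}
```

For an n-character string this saves n - 1 address bytes and start/stop pairs compared to one transmission per character. A real driver would also have to respect the buffer size of the I2C layer and handle clipping or wrapping, which this sketch ignores.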

Cheers!

« Last Edit: July 09, 2013, 01:27:00 pm by kowalski » Logged
