Topic: Cosa/Boosting LCD 1602 performance (7X SR4W, 6.5X 4-bit parallel, 1.7-2.3X I2C)
Netherlands

Quote
Your math is way off here.
Thanks for the correction. I have updated my post and struck through the faulty math.

Still, I like to think of it as time per character, since the time per frame depends on the size of the frame whereas the time per character does not.


Quote
There is also yet another I2C-level optimization possible for string output. Currently the Cosa LCD driver implements only IOStream::putchar() and handles puts() and write() as a sequence of putchar() calls. That function writes the character to the LCD but also handles form-feed, carriage-return/line-feed and a few other control characters (something LiquidCrystal does not do). Numbers can be written directly, as their strings will not contain any control characters; the only issue is text clipping or wrapping. This means the whole string could be translated into a single larger I2C block and written as one transaction, removing the I2C addressing for each digit character. This is the same as the nibble optimization, only at the next transaction level.
That would really speed things up!
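The batching rule in the quoted post can be sketched as a simple check, assuming (as stated there) that only strings without control characters may bypass putchar(). The helpers below are hypothetical, not part of the Cosa API, and they ignore clipping/wrapping and buffer size limits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not Cosa API): true if the string has no control
// characters (form-feed, CR, LF, tab, DEL, ...), i.e. it could be sent to
// the LCD as one block instead of character by character via putchar().
bool is_batchable(const char* s)
{
  assert(s != nullptr);
  for (; *s != 0; s++) {
    unsigned char c = (unsigned char) *s;
    if (c < 0x20 || c == 0x7f) return false;
  }
  return true;
}

// Number of I2C transactions for a string: a single block when batchable,
// otherwise one per character (the putchar() fallback).
size_t transactions_needed(const char* s)
{
  size_t len = strlen(s);
  if (len == 0) return 0;
  return is_batchable(s) ? 1 : len;
}
```

A printed number such as "1234" needs one transaction instead of four, while a string containing a form-feed still goes through putchar() one character at a time.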

Rob Tillaart


Dallas, TX USA

Quote
Your math is way off here.
Thanks for the correction. I have updated my post and struck through the faulty math.

Still, I like to think of it as time per character, since the time per frame depends on the size of the frame whereas the time per character does not.
That is why I wrote LCDiSpeed to report timing 3 different ways:
- per byte, which does not depend on frame size
- per frame, which depends on frame size (reported both as FPS and as actual time)
- per "iFrame", which is the frame time on a 16x2 display regardless of the actual size of the display in use.

That way you get whatever you want or need.
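A minimal sketch of the arithmetic behind the three reports, assuming a frame is simply cols × rows byte writes (cursor positioning and other overhead are ignored, so real figures will be somewhat lower). The function names are hypothetical; this is not LCDiSpeed's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating the three ways of reporting LCD speed.
// Input: measured time to write one byte, in microseconds.

// Frame time in microseconds for a display of cols x rows characters.
uint32_t frame_us(uint32_t byte_us, uint8_t cols, uint8_t rows)
{
  return byte_us * cols * rows;
}

// Frames per second for that display size (integer division truncates).
uint32_t fps(uint32_t byte_us, uint8_t cols, uint8_t rows)
{
  uint32_t f = frame_us(byte_us, cols, rows);
  assert(f > 0);
  return 1000000UL / f;
}

// "iFrame": frame time normalized to a 16x2 display, independent of the
// size of the display actually connected.
uint32_t iframe_us(uint32_t byte_us)
{
  return frame_us(byte_us, 16, 2);
}
```

For example, at 92 us per byte, frame_us(92, 16, 2) gives 2944 us and fps(92, 16, 2) gives 339, while fps(92, 20, 4) gives 135; iframe_us(92) stays 2944 us whichever display is attached.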

--- bill

Sweden

Some further development with I2C LCD adapters.

I recently updated the Cosa I2C driver and refactored the TWI::Slave class for ATtiny. As a spin-off I created a Virtual LCD class that sends "commands" via TWI to an ATtiny84 running the LCD driver. This reduces the number of bytes transmitted even further: from the original four transmissions of 2 bytes each (address and port value), to the IO expander optimization with a single 5-byte message (address and four port values), and now down to a single 2-byte message (address and the character to print on the LCD).
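To see why fewer and shorter messages matter, here is a rough estimate of the bus time per character for the three message formats described above. It assumes 9 SCL clocks per byte (8 data bits plus ACK) and about one extra clock each for START and STOP, and it ignores all software overhead; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rough I2C bus time estimate (assumption: 9 SCL clocks per byte, i.e.
// 8 data bits + ACK, plus about one clock each for START and STOP;
// software and ISR overhead are ignored).
uint32_t transaction_clocks(uint32_t bytes)  // bytes includes the address byte
{
  return 9 * bytes + 2;
}

// Bus time in microseconds to send one character to the LCD, given the
// number of transactions and bytes per transaction, at scl_hz.
uint32_t char_bus_us(uint32_t transactions, uint32_t bytes, uint32_t scl_hz)
{
  assert(scl_hz > 0);
  uint64_t clocks = (uint64_t) transactions * transaction_clocks(bytes);
  return (uint32_t) (clocks * 1000000u / scl_hz);
}
```

At 100 kHz this gives 800 us for four 2-byte transmissions, 470 us for one 5-byte message and 200 us for one 2-byte message. The frame rates measured in the post are lower than these bus-only bounds would allow, so software overhead on both sides also plays a part.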



Running the LCD driver on the ATtiny84 at 8 MHz in 4-bit parallel mode gives a frame rate of 413 fps. Running the I2C slave Virtual LCD on the ATtiny84 gives approx. 72 fps. This includes the Cosa I2C driver ISR pushing an event and the dispatching of the event to the adapter. The current maximum with the I2C IO expander is 53 fps @ 100 kHz, so this is another 35+ % improvement (72/53 ≈ 1.36).

Further improvements are possible (when using an ATtiny as LCD slave), as the IOStream::Device functions puts() and write() can use single messages. Number conversion could also be moved to the slave by sending binary numbers instead of characters.
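A sketch of how number conversion could move to the slave. The message layout (a leading 0 as in the LCDslave sketch below, a hypothetical opcode 'd', then the value as two little-endian bytes) is an assumption for illustration, not an existing Cosa format. The gain is mainly that the master no longer spends time formatting digits; for a 5-digit value the message is also one byte shorter than the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical message layout (not part of Cosa): a leading 0 marks a
// command, as in the LCDslave sketch; a 4-byte message carries a binary
// uint16_t (little-endian) that the slave converts to decimal itself.
const uint8_t NUMBER_MSG_SIZE = 4;

// Master side: encode the number, return the message size.
uint8_t encode_number(uint16_t value, uint8_t* msg)
{
  msg[0] = 0;                          // command escape
  msg[1] = 'd';                        // hypothetical opcode: print decimal
  msg[2] = (uint8_t) (value & 0xff);   // low byte
  msg[3] = (uint8_t) (value >> 8);     // high byte
  return NUMBER_MSG_SIZE;
}

// Slave side: decode and format as decimal text; returns number of chars.
int decode_number(const uint8_t* msg, uint8_t size, char* text, size_t len)
{
  assert(size == NUMBER_MSG_SIZE && msg[0] == 0 && msg[1] == 'd');
  uint16_t value = (uint16_t) (msg[2] | (msg[3] << 8));
  return snprintf(text, len, "%u", (unsigned) value);
}
```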

Cheers!

Below is the LCD/TWI slave sketch running on the ATtiny84. It is a simple command interpreter for the LCD operations. The design is event driven: the ISR pushes an event for each incoming TWI request, and these end up in the implementation of the method on_request().
Code:
#include "Cosa/TWI.hh"
#include "Cosa/Watchdog.hh"
#include "Cosa/LCD/Driver/HD44780.hh"

HD44780::Port port;
HD44780 lcd(&port);

class LCDslave : public TWI::Slave {
private:
  static const uint8_t BUF_MAX = 64;
  uint8_t m_buf[BUF_MAX];

public:
  LCDslave() : TWI::Slave(0x5A)
  {
    set_write_buf(m_buf, sizeof(m_buf));
    set_read_buf(m_buf, sizeof(m_buf));
  }

  virtual void on_request(void* buf, size_t size);
};

void
LCDslave::on_request(void* buf, size_t size)
{
  // Message format (the first byte decides):
  //   non-zero first byte: text, print all received characters
  //   0, cmd:              backlight/display command (size 2)
  //   0, x, y:             set cursor position (size 3)
  if (size == 0) return;
  char c = (char) m_buf[0];
  if (c != 0) {
    lcd.putchar(c);
    for (size_t i = 1; i < size; i++)
      lcd.putchar(m_buf[i]);
    return;
  }
  if (size == 2) {
    uint8_t cmd = m_buf[1];
    switch (cmd) {
    case 0: lcd.backlight_off(); return;
    case 1: lcd.backlight_on(); return;
    case 2: lcd.display_off(); return;
    case 3: lcd.display_on(); return;
    }
  }
  else if (size == 3) {
    uint8_t x = m_buf[1];
    uint8_t y = m_buf[2];
    lcd.set_cursor(x, y);
  }
}

LCDslave slave;

void setup()
{
  Watchdog::begin();
  lcd.begin();
  lcd.puts_P(PSTR("CosaLCDslave"));
  slave.begin();
}

void loop()
{
  Event event;
  Event::queue.await(&event);
  event.dispatch();
}
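The LCDslave sketch only shows the receiving side. A hypothetical master-side encoder for the same message format might look like this; it only builds the byte sequences (how they are sent, e.g. with the Cosa TWI master, is left out), and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical master-side encoder for the LCDslave message format above:
//   text:    one or more non-zero characters
//   command: { 0, cmd }  cmd 0/1 = backlight off/on, 2/3 = display off/on
//   cursor:  { 0, x, y }
// Each function fills msg and returns the number of bytes to send in a
// single TWI write to the slave (address 0x5A in the sketch).
const size_t MSG_MAX = 64;   // must not exceed the slave's BUF_MAX

size_t encode_text(const char* s, uint8_t* msg)
{
  size_t len = strlen(s);
  if (len > MSG_MAX) len = MSG_MAX;   // truncate to the slave buffer
  memcpy(msg, s, len);
  return len;
}

enum Command : uint8_t { BACKLIGHT_OFF, BACKLIGHT_ON, DISPLAY_OFF, DISPLAY_ON };

size_t encode_command(Command cmd, uint8_t* msg)
{
  msg[0] = 0;
  msg[1] = cmd;
  return 2;
}

size_t encode_cursor(uint8_t x, uint8_t y, uint8_t* msg)
{
  msg[0] = 0;
  msg[1] = x;
  msg[2] = y;
  return 3;
}
```

Because a leading 0 is the command escape, a text message can never start with (or contain) a NUL character; that is the price of keeping the format this compact.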
« Last Edit: July 12, 2013, 03:35:38 pm by kowalski »

Dallas, TX USA

If you are looking for a fast, low pin count interface to an LCD (it can't get lower than a single pin),
you might be interested in this recent activity:
https://bitbucket.org/fmalpartida/new-liquidcrystal/pull-request/1/adding-an-optimized-implementation-of/diff#comment-366944
Although the interface uses a single pin, it can transfer a byte in 92 us, for a frame rate close to 320 FPS,
which is about 3.6 times faster than the standard LiquidCrystal library using 6 pins!
This is a great example of how inefficient Arduino core routines like digitalWrite() are.
It is about 6 times faster than the optimized I2C I/O expander interface.

While it uses more components and is a bit more complex than something like a PCF8574 I/O expander chip,
the total component cost should be lower, given that
595s can be had for about 20 cents (USD), transistors for about 2-3 cents,
and caps and resistors for about 1 cent, all in quantity 1 from places like Tayda.

--- bill

Sweden

Quote
If you are looking for a fast, low pin count interface to an LCD (it can't get lower than a single pin),
you might be interested in this recent activity:
https://bitbucket.org/fmalpartida/new-liquidcrystal/pull-request/1/adding-an-optimized-implementation-of/diff#comment-366944
Although the interface uses a single pin, it can transfer a byte in 92 us, for a frame rate close to 320 FPS,
which is about 3.6 times faster than the standard LiquidCrystal library using 6 pins!
This is a great example of how inefficient Arduino core routines like digitalWrite() are.
It is about 6 times faster than the optimized I2C I/O expander interface.
Hi Bill.

I have followed some of the development of the New LiquidCrystal library and its hardware support. Great job!! Very inspiring.

I thought of doing a version with a 595 connected to SPI. It would require two more pins, but at full speed the transfer rate could be 4 MHz, giving 4-5 us per byte. That is hard to beat in cost/performance. Using an ATtiny at a dollar is more expensive but gives a lot of interesting options. An interesting challenge.
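A back-of-the-envelope check of the 4-5 us figure, assuming one 8-bit SPI transfer per 4-bit nibble at a 4 MHz clock, with the enable driven by the latch, and ignoring latch and software overhead:

```latex
t_{\text{8 bits}} = \frac{8}{4\,\text{MHz}} = 2\,\mu\text{s},
\qquad
t_{\text{LCD byte, 4-bit mode}} = 2 \times t_{\text{8 bits}} = 4\,\mu\text{s}
```

This is well below the roughly 37 us an HD44780 needs to execute a data write, so with a shift register on SPI the LCD controller itself, not the interface, would set the pace.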

The poor performance of Arduino/Wiring and the lack of abstraction/structure were actually what got me started on what became the Cosa project. I stumbled upon Arduino by chance during last year's summer vacation, and the work on Cosa started in late November.

Anyway, the latest LCD slave is more a test run of the TWI slave, LCD driver and event framework on an ATtiny84. I needed a test example, and pushing I2C further seemed like fun. Moving an interface between two microcontrollers is also an interesting challenge. I hope to add some tooling for this so that it becomes easier, something along the lines of IDL/CORBA.

Cheers!
« Last Edit: July 12, 2013, 03:28:24 pm by kowalski »

Israel

@kowalski

I actually did this with two 595s to be able to use all 8 data bits of the LCD.
Since the transfer rate is so high, a delay of about 30 us has to be added in the code. This resulted in an average write speed of about 38-40 us per byte sent to the LCD, as I had trouble with missed letters when I tried to go down to the lowest spec of 37 us (in total).

Here's my schematic (please ignore the resistor network, as it wasn't tested; resistor R3 was also unnecessary, as my LCD already has a 100 ohm resistor built in):


Running prints of 80 chars starting with a random number:



And you can find a simple/limited library attached:

* LiquidCrystal_SPI.zip (1.76 KB)
« Last Edit: July 11, 2013, 08:24:01 pm by TheCoolest »


Sweden

@TheCoolest

That was exactly what I was considering ;-) Great job!! I'll see if I can get my hands on a few 595s and build a prototype board.

Added a picture of my setup: an Arduino Nano talking over TWI to an ATtiny84 running the LCD driver.

Cheers!
« Last Edit: July 12, 2013, 03:52:27 am by kowalski »

Israel

Thanks. I got my 20x4 LCD thinking it supported SPI (the eBay title said it did, and I still had no idea what's what).
And I found that the LiquidCrystal_I2C library I downloaded was awfully slow: about 1 ms to send a complete byte or command, and that is after I optimized it a little by removing the unnecessary delays and an extra expander write that wasn't needed.
Filling the screen with 80 chars takes about 78 ms; that's insane. With the SPI method it takes just over 3 ms for 80 chars, a huge improvement.
That frees up a ton of processor time for other important tasks.
I too want to build a small backpack for this LCD for the project I'm working on right now. The only benefit of I2C I can think of right now is that it is probably less susceptible to interference and long wires than SPI.
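The 78 ms and "just over 3 ms" figures follow from the per-byte times quoted in this thread (about 1 ms per byte with the unoptimized I2C library, about 38-40 us per byte with the SPI shift registers). A quick check for a full 20x4 screen of 80 characters, taking 39 us as a representative SPI value:

```latex
80 \times 1\,\text{ms} \approx 80\,\text{ms}
\qquad\text{vs.}\qquad
80 \times 39\,\mu\text{s} \approx 3.1\,\text{ms},
\qquad
\frac{78\,\text{ms}}{3.1\,\text{ms}} \approx 25
```

So the SPI version is roughly 25 times faster for a full-screen update.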


Dallas, TX USA

Quote
The only benefit of I2C I can think of right now is that it is probably less susceptible to interference and long wires than SPI.
I think the biggest benefit of I2C is when you need to interface to multiple devices, since no additional pins are needed.

Sweden

The Cosa I2C slave LCD driver is now complete. The initial design has been refactored into a new Virtual LCD class (VLCD), which allows any Cosa LCD device driver to be connected (not just the HD44780 driver). The VLCD class contains two parts: 1) the client part acts as an LCD proxy, translating LCD API calls into I2C messages; 2) the server part acts as an adapter that decodes the I2C messages and calls the LCD implementation.
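The proxy/adapter split can be sketched in plain C++, with a direct function call standing in for the I2C link. The interface and class names are hypothetical and much smaller than Cosa's LCD class; the message format follows the LCDslave sketch earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical minimal LCD interface (not the Cosa LCD class).
struct LcdIface {
  virtual ~LcdIface() {}
  virtual void write_char(char c) = 0;
  virtual void set_cursor(uint8_t x, uint8_t y) = 0;
};

// Stand-in for a real driver: records what would be shown.
struct RecordingLcd : LcdIface {
  std::string log;
  void write_char(char c) override { log += c; }
  void set_cursor(uint8_t x, uint8_t y) override {
    log += "[" + std::to_string(x) + "," + std::to_string(y) + "]";
  }
};

using Message = std::vector<uint8_t>;

// Server part: decodes a message and calls the real LCD implementation
// (text, or 0 followed by x, y for set_cursor).
struct Adapter {
  LcdIface& lcd;
  void on_request(const Message& m) {
    if (m.empty()) return;
    if (m[0] != 0) {
      for (uint8_t b : m) lcd.write_char((char) b);
      return;
    }
    if (m.size() == 3) lcd.set_cursor(m[1], m[2]);
  }
};

// Client part: an LCD proxy that turns each call into a message.
// Here the "bus" is a direct call to the adapter instead of a TWI write.
struct Proxy : LcdIface {
  Adapter& bus;
  explicit Proxy(Adapter& a) : bus(a) {}
  void write_char(char c) override { bus.on_request(Message{(uint8_t) c}); }
  void set_cursor(uint8_t x, uint8_t y) override {
    bus.on_request(Message{0, x, y});
  }
};
```

With `RecordingLcd rec; Adapter adapter{rec}; Proxy lcd(adapter);`, calling `lcd.set_cursor(1, 0)` and then `lcd.write_char('A')` leaves `rec.log` equal to "[1,0]A". Application code only sees `LcdIface`, so it cannot tell whether it talks to a local driver or a remote one.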



Below is the CosaLCDslave sketch. It uses the new Virtual LCD class and binds it to the HD44780 driver with the 4-bit parallel port IO. The sketch is compiled for an ATtiny84 in the example above but may be compiled for any Cosa-supported Arduino.

Code:
#include "Cosa/Watchdog.hh"
#include "Cosa/LCD/Driver/HD44780.hh"
#include "Cosa/VLCD.hh"

// Use a 4-bit parallel port for the HD44780 LCD (16X2 default)
HD44780::Port port;
HD44780 lcd(&port);

// And use the LCD for the implementation of the Virtual LCD slave
VLCD::Slave vlcd(&lcd);

void setup()
{
  Watchdog::begin();
  lcd.begin();
  vlcd.begin();
}

void loop()
{
  Event event;
  Event::queue.await(&event);
  event.dispatch();
}

The benchmark CosaLCDspeed.ino binds to the Virtual LCD and runs the measurements. It is the Arduino Nano in the picture above that runs this sketch. See the code on GitHub:

https://github.com/mikaelpatel/Cosa/blob/master/examples/LCD/CosaLCDspeed/CosaLCDspeed.ino
https://github.com/mikaelpatel/Cosa/blob/master/examples/TWI/CosaLCDslave/CosaLCDslave.ino

By implementing the IOStream::Device methods puts(), puts_P() and write(), the performance can be boosted to 50-98% of the performance of the I2C IO expander at 400 kHz. Below are some results from the benchmarking. The first table shows the performance (operations per second/frames per second) and compares the 4-bit and I2C IO expander implementations (at 100 kHz and 400 kHz).



The results above are used as the baseline for comparison with the second table below, which covers the version compiled for the ATtiny84 (internal 8 MHz clock) and the VLCD version. The comparison is between the 4-bit implementation and the VLCD implementation (with optimizations).



VLCD may be viewed as a "template" for how to construct I2C slave devices.
http://dl.dropboxusercontent.com/u/993383/Cosa/doc/html/d1/d1f/classVLCD.html

Cheers!
« Last Edit: July 13, 2013, 11:40:18 am by kowalski »

Sweden

The next step is to implement a Cosa USI-based TWI master for ATtiny and port the LCD support. Below is the LCD benchmark running on an LCD with an I2C IO expander and an ATtiny85 (internal 8 MHz clock, internal pull-ups).

The picture shows 39 operations per second (32 characters plus 2 set_cursor calls per operation). The result for a standard Arduino (Uno, Nano, etc.) is 53 fps.



The latest I2C optimizations include packaging larger I2C blocks (32 IO expander commands for 8 characters) in puts() and write().

Cheers!
« Last Edit: July 16, 2013, 02:24:10 pm by kowalski »

Sweden

Here are the numbers from the latest improvements to the Cosa LCD device driver. The table also contains the ratio compared to the New LiquidCrystal Library benchmark:
https://bitbucket.org/fmalpartida/new-liquidcrystal/wiki/Home#!performance-and-benchmakrs
Please note that the ATtiny84/85 benchmarking uses the internal 8 MHz clock.



The following I2C optimizations are included:

1. Packaging the I2C IO expander updates into a single TWI message for putchar(). Previously (LiquidCrystal_I2C), four TWI messages (address and 1 byte of data each) were sent to write one byte (data or command) to the LCD. This is compressed into a single TWI message with the address and the four bytes needed to send the byte to the LCD via the 4-bit parallel interface.

2. Packaging multiple encoded bytes into a single message for puts(). This applies the first optimization to a sequence of characters sent to the I2C IO expander, which (again) allows the repeated TWI address to be removed. The default internal buffer size is 32 bytes, i.e. 8 characters at 4 bytes each. This gives a 7-byte address reduction for an 8-byte string.
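The first optimization can be sketched as follows, assuming the PCF8574 pin mapping used by many I2C LCD backpacks (P0 = RS, P1 = RW, P2 = EN, P3 = backlight, P4-P7 = D4-D7). Other adapters wire the expander differently, so treat the constants as an assumption; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed PCF8574 pin mapping (varies between I2C LCD backpacks):
// P0 = RS, P1 = RW (kept 0 = write), P2 = EN, P3 = backlight, P4..P7 = D4..D7.
const uint8_t RS = 0x01, EN = 0x04, BL = 0x08;

// Encode one LCD byte (data if is_data, else command) as the four
// expander port values that clock it in over the 4-bit interface:
// high nibble with EN high, then EN low; low nibble likewise.
// All four can go in one TWI message (address + 4 bytes) instead of
// four separate messages (address + 1 byte each).
void encode_byte(uint8_t value, bool is_data, uint8_t out[4])
{
  uint8_t ctrl = BL | (is_data ? RS : 0);
  uint8_t hi = value & 0xf0;
  uint8_t lo = (uint8_t) (value << 4);
  out[0] = hi | ctrl | EN;   // present high nibble, EN high
  out[1] = hi | ctrl;        // EN falling edge latches it
  out[2] = lo | ctrl | EN;   // present low nibble, EN high
  out[3] = lo | ctrl;        // EN falling edge latches it
}
```

For the character 'A' (0x41) this yields 0x4D, 0x49, 0x1D, 0x19.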

The second optimization shows up in the puts() to puts_P() ratio, as program memory strings may contain control characters and are not compressed: 77/60 = 1.28X further improvement. It also shows up as an improvement when printing numbers (dec/bin in the benchmark).
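The block-size arithmetic of the second optimization as a small sketch: with a 32-byte buffer and 4 expander bytes per character, puts() fits 8 characters per TWI message. The constants come from the post; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumptions from the post: each character costs 4 expander bytes, the
// TWI buffer holds 32 bytes, and every message carries one address byte.
const size_t BYTES_PER_CHAR = 4;
const size_t BUF_SIZE = 32;
const size_t CHARS_PER_MSG = BUF_SIZE / BYTES_PER_CHAR;  // 8

// Number of TWI messages needed for a string of n characters.
size_t messages_for(size_t n)
{
  return (n + CHARS_PER_MSG - 1) / CHARS_PER_MSG;
}

// Address bytes saved compared to one message per character.
size_t address_bytes_saved(size_t n)
{
  return n - messages_for(n);
}
```

An 8-character string needs one message and saves 7 address bytes, matching the figure above; a 16-character line needs two messages and saves 14.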

For ATmega with TWI hardware, the processor goes into sleep mode while waiting for the I2C operation (write) to complete. A further optimization would be to let the processor continue and only synchronize when a new operation is issued. This would require some additional buffering. The Cosa TWI driver allows asynchronous calls, but this feature is not yet used by the LCD driver. The current ATtiny USI-based TWI is a bit-banging implementation with microsecond-level delays. Because of a timer conflict, the Cosa RTC (microsecond-level timer) for ATtiny must be redesigned before asynchronous TWI operation with an ISR is possible.
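The "continue and only sync on the next operation" idea can be sketched with double buffering. The asynchronous bus below is a hypothetical stand-in that only records calls (it does not model timing and is not the Cosa TWI API); the point is the order of operations in write(): encode into the free buffer, wait for the previous transfer, then start the next one and return.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical asynchronous bus: start() begins a transfer and returns at
// once; wait() blocks until the transfer in progress has completed.
// This fake only records the calls so the pattern can be checked on a PC.
struct FakeAsyncBus {
  const uint8_t* in_flight = nullptr;
  size_t starts = 0, waits = 0;
  void start(const uint8_t* buf, size_t) { in_flight = buf; starts++; }
  void wait() { if (in_flight) { waits++; in_flight = nullptr; } }
};

// Writer that returns right after starting a transfer and only synchronizes
// when the next write is issued. Two buffers let the next message be encoded
// while the previous one may still be on the bus.
struct DeferredWriter {
  FakeAsyncBus& bus;
  uint8_t buf[2][32];
  int next = 0;
  explicit DeferredWriter(FakeAsyncBus& b) : bus(b) {}
  void write(const uint8_t* data, size_t len) {
    assert(len <= sizeof(buf[0]));
    memcpy(buf[next], data, len);   // encode into the free buffer first
    bus.wait();                     // then sync with the previous transfer
    bus.start(buf[next], len);      // start this one and return at once
    next ^= 1;                      // swap buffers
  }
  void flush() { bus.wait(); }
};
```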

The last column in the table above contains the results when using an ATtiny84 as an I2C LCD adapter, which reduces the I2C message traffic even further. The improvement is then 2.3X.

Read more on the blog http://cosa-arduino.blogspot.se/2013/07/object-oriented-lcd-management.html

A port adapter for SR (74HC595) will be added to the Cosa LCD device driver library soon. Waiting for some more hardware to play with ;-)

Cheers!
« Last Edit: July 17, 2013, 09:46:42 am by kowalski »

Dallas, TX USA

Cool stuff.
I'm curious what core and I2C library you are using for the ATtiny.

--- bill

Sweden

Quote
Cool stuff.
I'm curious what core and I2C library you are using for the ATtiny.
@bperrybap

Thanks for your interest in this project.

I use the MIT ATtiny core by David Mellis, more or less only for the compiler settings, fuse bits, and building in the Arduino IDE. All code is Cosa. None of the Arduino "core" code or libraries are used (except for main() and init() ;-). The same goes for "Mighty".

Cosa is an OO framework. It supports the major Arduino ATmega/ATtiny processors within the framework itself through a Board abstraction. Cosa contains a newly written SPI and TWI class library; for ATtiny the implementation is USI-based. It supports all SPI modes and both TWI master and slave devices. I find the standard Arduino/Wiring/dtools/AVR TWI a bit difficult to work with ;-), too low-level and slow. Cosa InputPin and OutputPin operations are 3-5X faster than Arduino/Wiring. They are also object-oriented and symbolic, which makes configuration and reuse much easier.

I post Cosa updates and improvements on http://forum.arduino.cc/index.php?topic=150299.0

Cheers!
« Last Edit: July 17, 2013, 04:55:30 pm by kowalski »

Sweden

Received a bunch of 74HC595s today (eBay: $2 for 10 pcs), so now I could add and benchmark a shift-register-based port version for the Cosa LCD support. It uses basically the same method as suggested by @Nadir and above by @TheCoolest. It uses three pins: data, clock and latch, where the latch signal is also used as the LCD enable. Below is the 3-wire schematic from the Google Code arduinoshiftreglcd project page http://code.google.com/p/arduinoshiftreglcd/.


http://forum.arduino.cc/index.php/topic,15364.msg112755.html#msg112755

The port is used a bit differently, for further optimization (later on ;-). Below is the updated LCD benchmark with the initial results for the SR3W support added.



The table values are operations per second. For putchar, puts and puts_P this corresponds to frames per second on a 16X2 LCD with two set_cursor calls. The uint16_t dec benchmark is a 4-digit decimal print plus set_cursor, in operations per second. The uint16_t bin benchmark is a 14-digit binary number print (16 characters in total including the two-character prefix) plus set_cursor, in operations per second.

This SR3W implementation uses the Cosa OutputPin serialization function and is "high-level", i.e. it does not write the PORT registers directly like the optimized 4-bit parallel version does. SPI could be used to boost performance further.
Code:
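void
HD44780::SR3W::write4b(uint8_t data)
{
  // Update the data nibble in the shift register port image
  m_port.data = data;
  // Shift the port byte out on the data pin, clocked by m_scl
  m_sda.write(m_port.as_uint8, m_scl);
  // Pulse the latch; it also acts as the LCD enable
  m_en.toggle();
  m_en.toggle();
}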
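A host-side model of the SR3W transfer shows why a single latch pulse is enough: the 595 outputs (and thus the LCD pins) only change when the latch rises, so the same pulse can double as the LCD enable. The MSB-first bit order and the model class are assumptions; Cosa's OutputPin::write() may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of a 74HC595: bits are shifted in on each clock rising
// edge; the output register only changes when the latch (RCLK) rises.
struct Model595 {
  uint8_t shift = 0, outputs = 0;
  void clock_in(bool bit) { shift = (uint8_t) ((shift << 1) | (bit ? 1 : 0)); }
  void latch() { outputs = shift; }
};

// Shift out a byte MSB first (assumed bit order) and pulse the latch.
// In the SR3W wiring the same latch pulse also serves as the LCD enable,
// so one call presents a new port value and strobes the LCD.
void sr3w_write(Model595& sr, uint8_t value)
{
  for (int i = 7; i >= 0; i--)
    sr.clock_in((value >> i) & 1);
  sr.latch();
}
```

Shifting extra bits without a latch pulse leaves the outputs untouched, which is what keeps the LCD pins stable while the next value is being clocked in.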

Using the different LCD port adapters is easy. The LCD driver is a single source for all versions; only the port adapter needs to be implemented. This is one of the great OOP design patterns: delegation. Below is a snippet from the LCD benchmark.

Code:
// Select the LCD device for the benchmark
#include "Cosa/LCD/Driver/HD44780.hh"
// HD44780::Port port;
HD44780::SR3W port;
// HD44780::MJKDZ port;
// HD44780::DFRobot port;
HD44780 lcd(&port);
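The delegation idea in miniature: the driver is written once against an abstract port, and each hardware variant only implements write4b(). This is a hypothetical, heavily simplified sketch (no RS/enable handling, no timing), not Cosa's actual class layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical port interface: everything bus-specific (parallel pins,
// shift register, I2C expander) lives behind write4b().
struct Port {
  virtual ~Port() {}
  virtual void write4b(uint8_t nibble) = 0;
};

// Test port that records the nibbles instead of driving hardware.
struct RecordingPort : Port {
  std::vector<uint8_t> nibbles;
  void write4b(uint8_t n) override { nibbles.push_back(n); }
};

// Single driver source: it delegates the transfer to whatever Port it was
// given, so switching from parallel to SR3W or I2C only changes the port.
class Driver {
public:
  explicit Driver(Port* port) : m_port(port) {}
  void write(uint8_t data) {
    m_port->write4b(data >> 4);    // high nibble first (4-bit mode)
    m_port->write4b(data & 0x0f);  // then low nibble
  }
private:
  Port* m_port;
};
```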

The HD44780 LCD device driver implements the abstract class LCD and can be replaced by any other Cosa LCD device driver implementation in the benchmark source code, again by changing only a few lines. Below is yet another snippet:

Code:
// #include "Cosa/LCD/Driver/PCD8544.hh"
// PCD8544 lcd;
// #include "Cosa/LCD/Driver/ST7565.hh"
// ST7565 lcd;
// #include "Cosa/VLCD.hh"
// VLCD lcd;

These are all implementations of the LCD interface and are all benchmarked with the same code, basically by commenting the LCD under test in or out. Below is a link to the benchmark sketch.

https://github.com/mikaelpatel/Cosa/blob/master/examples/LCD/CosaLCDspeed/CosaLCDspeed.ino

After benchmarking the different LCD port alternatives, we can conclude that the shift register method has the best cost/performance and can match parallel access methods with a much lower pin count. It would be interesting to see this as part of future Arduino boards/shields.

Cheers!
« Last Edit: July 23, 2013, 10:37:49 am by kowalski »
