Enhanced LiquidCrystal

Back when 8 bit mode had a significant performance advantage in some situations, quietly ignoring the extra pins would have been sneaky. Increasingly, though, the situation is that the 8 bit code:

  1. is rarely used
  2. adds length, wastes resources
  3. adds complexity that makes the code harder to read.

Yes, I would agree, this version is probably about as fast as it's ever going to get, at least without ugly tricks like hard coded pins or assembly language.

You've already incorporated so much feedback from me and others, and I can understand wanting to do one last round of final testing and wrap everything up. But if I could talk you into reconsidering the userbusy API again, I just hope there's a simpler way.

Perhaps instead of userbusy, the same goals could be accomplished by making send() public? Then an example could be crafted which creates a subclass where send() is overridden. That would probably be even faster than userbusy, since the 8 digitalWrite() calls and the enable pulse could use digitalWriteFast, as could reading the busy bit and switching pin modes.
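To make the subclass idea concrete, here is a minimal host-side sketch rather than real library code. LcdCore, FastLcd, and writeNibble() are hypothetical stand-ins, not names from LiquidCrystal. It assumes send() would be declared virtual (otherwise write() in the base class would keep calling the original send()), and it only records the nibbles in a buffer where a real override would drive hard coded pins with digitalWriteFast or port writes.

```cpp
#include <stdint.h>
#include <stddef.h>

// Hypothetical stand-in for the library class; NOT the real LiquidCrystal.
class LcdCore {
public:
  virtual ~LcdCore() {}
  void command(uint8_t value) { send(value, false); }  // RS low: instruction
  void write(uint8_t value)   { send(value, true);  }  // RS high: character data
protected:
  // Assumption: send() is virtual so a subclass can replace the pin handling.
  virtual void send(uint8_t value, bool isData) = 0;
};

// Hypothetical subclass that supplies its own send().
class FastLcd : public LcdCore {
public:
  FastLcd() : count(0) {}
  uint8_t nibbles[16];  // nibbles "sent" so far (stand-in for the data pins)
  bool    rs[16];       // RS level that went with each nibble
  size_t  count;
protected:
  virtual void send(uint8_t value, bool isData) {
    writeNibble(value >> 4, isData);    // 4 bit mode sends the high nibble first
    writeNibble(value & 0x0F, isData);  // then the low nibble
  }
private:
  void writeNibble(uint8_t nibble, bool isData) {
    // A real override would set RS and the 4 data pins and pulse enable here.
    if (count < 16) {
      nibbles[count] = nibble;
      rs[count] = isData;
      count++;
    }
  }
};
```

Calling write('A') on a FastLcd goes through the overridden send() and records the nibbles 0x4 and 0x1 with RS high.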

Regarding not using 8 bit mode, perhaps it could be perceived as "sneaky", but if it's well documented that the 4 extra wires aren't actually used, that seems perfectly fair. I believe pretty much anybody would appreciate the code being much smaller and simpler. If they care about speed, I think they'll really appreciate all your hard work that's gone into benchmarking that proves using only 7 pins is actually fastest.

It's really great work you've done here. A lot of users are going to benefit once this gets merged into an official Arduino release!

Making send public rather than the user busy callback fcn is an interesting idea.

After all, every pin in the interface gets manipulated to test the busy flag. An example of how that could be overridden with digitalWriteFast and another example with port commands would make it perhaps more approachable than the callback fcn is.

I don't know if anyone else has actually used the user defined busy routine, and it's kind of ugly, so I haven't been motivated to fiddle with it.
User callback fcns are the basis for the arduino's approach to user supplied interrupt routines, but the way it works here may actually be harder to understand.

I have thought I was done several times already. I think introducing the little benchmark (which after all, is not all that important; all the interfaces go faster than the eye can follow) has kept this fun and interesting much longer than it otherwise might have been. The project has served to teach me quite a lot about C++ and the arduino that I wouldn't have run across otherwise and provides a practical focus when I read about the language that is much more useful than the umpteenth discussion of circles, squares, and triangles inheriting from shape.

I am starting through the test process with each of the different size LCDs and each of the interfaces; I do see one change in the way this version works on the 40x4: with the 40x4 you have always had to specify an argument for RW. If you want to ground the RW pin, rather than omitting the argument, you specify 255. In the past if you specified a userBusy argument and a (non-255) RW argument, it used userBusy.

With the version I'm testing now you will need to specify 255 for RW on the 40x4 (and 27x4) LCDs if you want it to use a userBusy routine. You will also have to digitalWrite LOW to the RW pin before you call lcd.begin(40,4).

We've changed the order of testing for the different options and in every other case, the new order is what makes sense. It is a little faster for the more common and easy to use option.

Other than the change described in my last post there are no new features in this version:

Speed testing
All of the interface modes go faster than the eye can follow. This version of the software is significantly slower than previous versions when using timed delays. I found an LCD (Axman) that needed longer delays and in the interests of making the code foolproof, I lengthened the delays to make that LCD work. However Paul Stoffregen has significantly speeded up the code when testing the busy flag and so those options run significantly faster than before. I compared the speeds of the different interfaces--writing 80 characters to the screen then 80 blanks and looping through that 20 times. The results on a Mega are:
Interface               Axman (ms)   nonAxman (ms)
4 data pins, no RW      1349         1349
4 data pins + RW        565          468
8 data pins, no RW      1314         1314
8 data pins + RW        520          500
4 pins + user busy      369          316

I also have a Teensy++2.0 board. One of the interesting things about that board is that the software that comes with it includes considerable optimization of digitalRead, digitalWrite etc. The board runs at 16 MHz, just like the Mega, but speeding up those commands results in an impressive change in the benchmarks:
Interface               Axman (ms)   nonAxman (ms)
4 data pins, no RW      1207         1207
4 data pins + RW        327          219
8 data pins, no RW      1212         1212
8 data pins + RW        361          296
4 pins + user busy      241          189

This version is available at:
http://www.healthriskappraisal.org/LiquidCrystalFastWith8bit.zip

Interesting speeds.

I made an i2c backpack for my LCD* this weekend. After some tweaking, and boosting the TWI speed to 300kHz, I can do a single command in 384us.

That translates to 1229 ms for a (80 chars + 80 blanks) * 20 reps.
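The arithmetic behind that estimate can be checked with a small helper. This is only a sketch: benchmarkMillis is a made-up name, and it assumes every character costs exactly one command of the measured length, ignoring cursor moves and loop overhead.

```cpp
#include <stdint.h>

// Estimated benchmark time in milliseconds, rounded to the nearest ms.
// usPerCommand: measured time for one LCD command, in microseconds
// charsPerPass: characters written per pass (80 chars + 80 blanks = 160)
// passes:       number of times the pass is repeated
uint32_t benchmarkMillis(uint32_t usPerCommand, uint32_t charsPerPass, uint32_t passes) {
  uint32_t totalUs = usPerCommand * charsPerPass * passes;
  return (totalUs + 500) / 1000;
}
```

With 384 us per command, benchmarkMillis(384, 160, 20) gives 1229, since 384 * 160 * 20 = 1,228,800 us.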

I could take the TWI speed higher, but then I would have to start adding delays, and separate commands.

Currently, I can send an LCD command with a single TWI transmission. Just 3 bytes (excluding address).

* Used an MCP23016 and the 8-bit LCD interface. RW is connected but not used.

Does the I2C controller check the busy flag or just use a timed delay or is all of the delay on the arduino side?

Due to the slow speed of I2C, I don't even need a single delay/busy check except at initialization :slight_smile:

From my testing it is stable without using delay up to 300kHz i2c speed, but it works ok @ 100kHz (the default in Arduino) too. Time per command at that speed is just under 700us.
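For reference, on an ATmega328P the TWI clock is set through the TWBR register: SCL = F_CPU / (16 + 2 * TWBR * prescaler). The sketch below assumes a prescaler of 1; twbrFor and sclFor are illustrative names, not part of the Wire library, and the post does not say how the speed was actually changed.

```cpp
#include <stdint.h>

// Smallest TWBR value whose SCL frequency does not exceed sclHz,
// assuming a TWI prescaler of 1. Clamped to the 8 bit register range.
uint8_t twbrFor(uint32_t cpuHz, uint32_t sclHz) {
  uint32_t divisor = (cpuHz + sclHz - 1) / sclHz;  // ceil(cpuHz / sclHz)
  if (divisor <= 16) return 0;                     // already at or below the fastest setting
  uint32_t twbr = (divisor - 16 + 1) / 2;          // ceil((divisor - 16) / 2)
  return twbr > 255 ? 255 : (uint8_t)twbr;
}

// SCL frequency that a given TWBR value produces (prescaler 1).
uint32_t sclFor(uint32_t cpuHz, uint8_t twbr) {
  return cpuHz / (16 + 2 * (uint32_t)twbr);
}
```

At 16 MHz this gives TWBR = 72 for the default 100 kHz and TWBR = 19 for a target of 300 kHz, which works out to an actual clock a little under 300 kHz because the register only takes whole numbers.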

I am going to see if I can get higher rates (eg 1 MHz) using delays and multiple commands, to see if it is worth it. I don't think using the busy check will be worth it due to the overhead of i2c.

Edit: I'll also do a proper bench timing to make sure my calculations are in fact correct :slight_smile: I am a bit surprised at the good speed I am getting.

Good thing I did! Your benchmark code posted on the previous page runs in 611 ms! Half the expected time... Something is probably not right...

Gonna check the code, I am also getting some unexpected output. :frowning:

Edit: Found the bug! Was running out of RAM on the 328p. Shortened strings, running again now.

Edit 2: Now getting 1229ms just as initially expected :slight_smile:

I looked at the MCP23016 datasheet, and it only does up to 400kHz. Not sure if it will be worth boosting it more.

Your benchmark code posted on the previous page

It does not seem to cause instability issues like the following does.

void loop()
{
  static int line = 0;
  static int col = 0;
  static char c = 'a';
  
  lcd.write(c);
  
  col++;
  
  if (col == 20)
  {
    col = 0;
    line++;
    lcd.setCursor(col, line);
    
    if (line == 4)
    {
      line = 0;
      
      c++;
    }
    
  }
}

Anything above a 300kHz TWI frequency and it crashes after a few minutes.

Deleting code goes much faster than writing code.

It seems like I already have something working, about 700 bytes less flash memory usage. At least 5 bytes less RAM usage.

If I do sizeof(lcd) it gives me a number (30 for the 4 bit version I'm working on today) that includes the RAM used for instance variables. The compiler gives a size for a compiled sketch's total flash memory usage. Is there an easy way to see the amount of flash memory LiquidCrystal itself uses? I honestly didn't make note yesterday of sketch sizes for the various options, it seems excessive to reinstall yesterday's code just to measure those, but it would be interesting to see a percentage difference in static memory usage.

I found that I needed to increase the per character delay a little more to get the Axman LCD to work using timed delays. When the delay was 320 usec I was seeing maybe one error in 100 characters. Quite possibly the version I posted on 5/30 needed a longer delay, too. I did the Axman LCD last that time and I was tired.

When I get a chance I think I will compare the length of this version of LiquidCrystal with Arduino-17/18. I suspect the length is pretty comparable now even with the bug fixes and additional modes and features.

I did change all of the items that were 'private' to 'protected'. I think that will allow what Paul was suggesting; writing a class that inherits from LiquidCrystal and then providing a new send() that has the pin numbers hard coded so that PORT type instructions or digitalWriteFast can be used to speed things up.

http://www.healthriskappraisal.org/LiquidCrystal4Bit.zip

I have a quick question... I did PM you but maybe you didn't get it...

With the 40x4 that has dual HD44780 chips, does that mean that you could have 16 custom characters, just 8 in the top half and 8 in the bottom half?

Mowcius

Very nice. The code's looking very good. I'll try to come up with a subclass example on send().

Here's a list of minor little things I noticed looking at the latest code. Nothing's critical, just little things.

_busyPin seems to always be _data_pins[3]. Eliminating it might reduce code size, and save an extra byte of per-instance RAM.

rwSave in init() appears to be unused now.

Inside init(), should en2 be checked for 255 instead of 0 to see if it's unused?

Inside init2(), it would be advisable to call delayMicroseconds with only 15000 and do the loop 9 times instead of 3. Even though delayMicroseconds takes a 16 bit input, it doesn't actually work properly beyond 16383. In fact, limiting your call to 8191 us might be a good idea, in anticipation of a 32 MHz AVR (eg, the xmega chips).

Can _displayfunction become only a local variable inside begin2(), possibly saving code size and one more byte of per-instance allocated RAM?
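One way to follow the delayMicroseconds advice above is to split any long wait into chunks below the limit. This is a sketch only, assuming the 8191 us ceiling suggested above; delayLongMicros and MAX_CHUNK_US are hypothetical names, and the delay function is passed in so the same code works with delayMicroseconds on the Arduino or with a stub elsewhere.

```cpp
#include <stdint.h>

// Assumed ceiling for a single delay call, per the suggestion above.
const uint16_t MAX_CHUNK_US = 8191;

// Waits totalUs microseconds by calling delayFn repeatedly, never asking
// for more than MAX_CHUNK_US in one call. Returns the number of calls made.
template <typename DelayFn>
uint16_t delayLongMicros(uint32_t totalUs, DelayFn delayFn) {
  uint16_t calls = 0;
  while (totalUs > 0) {
    uint16_t chunk = (totalUs > MAX_CHUNK_US) ? MAX_CHUNK_US : (uint16_t)totalUs;
    delayFn(chunk);
    totalUs -= chunk;
    calls++;
  }
  return calls;
}
```

In a sketch this would be called as delayLongMicros(50000, delayMicroseconds), which makes six calls of 8191 us and one of 854 us.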

Mowcius: the hardware would let you do that. My software, however, sends the user defined characters to both hd44780s. My philosophy has been to try to make the software for the 40x4 act like it's one device. You'd have to change the API if you wanted to be able to tell the software which lines you were defining the characters for. I suppose you could write your own code outside LiquidCrystal (or subclass it!) to LOAD the user definitions and LiquidCrystal would never know the difference.

Paul: thanks for the tips I will look at those, probably in a few days.

I will spend a little time poking into the Axman timings. Something seems fishy.

I knew you would have the answer :stuck_out_tongue:
Right. I might look into it. I think it might be useful for a few projects I have in mind.

Thanks,

Mowcius

Inside init(), should en2 be checked for 255 instead of 0 to see if it's unused?

That one looks like an actual bug to me. It would only show up if someone passed pin 0 for en2, but a bug nonetheless.

Thanks for pointing this out!!

I updated the zip file with the changes Paul pointed out:
http://www.healthriskappraisal.org/LiquidCrystal4Bit.zip

I realized a day or two ago that the advice I gave Mowcius is needlessly complex. The 40x4 display can be thought of as two 40x2 displays each with its own hd44780 controller chip and with all the pins in common except the enable line.

Here is some completely untested code to illustrate the concept:

#include <LiquidCrystal.h>

// Placeholder pin numbers for illustration only; substitute your own wiring.
const uint8_t rs = 12, rw = 11, enable1 = 10, enable2 = 9;
const uint8_t d0 = 5, d1 = 4, d2 = 3, d3 = 2;

LiquidCrystal lcdwhole(rs,rw,enable1,enable2,d0,d1,d2,d3);  //this refers to the whole 40x4

void setup (void){
      lcdwhole.begin(40,4); //initialize the whole LCD

      LiquidCrystal lcdBottomHalf(rs,rw,enable2,d0,d1,d2,d3); //define a temporary LCD object that only has 2 lines
      lcdBottomHalf.begin(40,2);  //initialize the 2 line object

      //now load custom characters into the whole LCD
      //(7 rows are listed for each; the 8th row defaults to 0):
      uint8_t bell[8] = {0x4,0xe,0xe,0xe,0x1f,0x0,0x4};
      uint8_t note[8] = {0x2,0x3,0x2,0xe,0x1e,0xc,0x0};
      uint8_t clock[8] = {0x0,0xe,0x15,0x17,0x11,0xe,0x0};
      uint8_t heart[8] = {0x0,0xa,0x1f,0x1f,0xe,0x4,0x0};
      uint8_t duck[8] = {0x0,0xc,0x1d,0xf,0xf,0x6,0x0};
      uint8_t check[8] = {0x0,0x1,0x3,0x16,0x1c,0x8,0x0};
      uint8_t cross[8] = {0x0,0x1b,0xe,0x4,0xe,0x1b,0x0};
      uint8_t retarrow[8] = {0x1,0x1,0x5,0x9,0x1f,0x8,0x4};
      lcdwhole.createChar(0, bell);
      lcdwhole.createChar(1, note);
      lcdwhole.createChar(2, clock);
      lcdwhole.createChar(3, heart);
      lcdwhole.createChar(4, duck);
      lcdwhole.createChar(5, check);
      lcdwhole.createChar(6, cross);
      lcdwhole.createChar(7, retarrow);

      //now redefine custom char 0 for the lower 2 lines:
      lcdBottomHalf.createChar(0,retarrow);

}  //end setup(); the temporary LCD object expires.

void loop (void){
      int i = 0;   //current line; was undeclared in the first draft
      lcdwhole.clear();
      while (i<4) {
            lcdwhole.setCursor(0,i);
            lcdwhole.print("user:");
            for (int j=0; j<7; j++) {
                  lcdwhole.print(j, BYTE);   //print custom characters 0 through 6
            }
            i++;
      }
}

Yeah. I had not had time to look into it in more detail so you saved me some time :slight_smile:

I will try that code later if my 40x4 wants to play ball.

Mowcius