The hd44780 library includes sketches that will measure the byte write time to the display for all the various interfaces supported.
Install the library using the library manager.
Then you can select the appropriate i/o class for whatever h/w you have.
The sketch you will want to run is the LCDiSpeed sketch - you will need to select the appropriate one depending on hardware (i/o class) you are testing.
The library can also run timings for other libraries, such as the LiquidCrystal library.
The LCDiSpeed sketch measures the time it takes to write a character to the display. It includes all the overhead of the library.
This is important since, depending on how the library is written, the time to write a character to the display can vary substantially even on the same hardware.
The time reported is the time a sketch will see when writing characters to the display.
As for using BUSY polling: yes, it is not worth it for most instructions.
I have spent MANY MANY hours (days/weeks) with a logic analyzer looking at the hd44780 timing and the timings of the various hardware interfaces used to communicate with these types of displays, across many different LCD libraries.
For clear and home instructions - it might be faster, depending on the library, but likely not.
For all others, it will slow things down quite a bit given the way Arduino defined the API for the digital i/o routines (digitalWrite(), digitalRead(), pinMode()). This is particularly true on the AVR platforms, given the combination of the way the AVR chip does its i/o and the suboptimal coding of the digital i/o routines provided by the Arduino.cc AVR core library.
For example, the hd44780 instruction to write a character to the screen takes no more than 37us to execute inside the chip.
Now let's look at the timing on a 16MHz AVR (UNO type board).
It can vary depending on version of the compiler but these numbers should be close.
Each digital i/o API call like digitalWrite(), pinMode(), digitalRead() takes 4.5-6us depending on which call.
So suppose you want to read the BUSY status.
You have to flip all the data pins from output to input, change R/W to high, then strobe E appropriately, while you read the DB7 pin.
In 4 bit mode:
pinMode() 4 times, 1 for each data line to input mode
digitalWrite() 4 times - RS low, R/W high, E high, E low
digitalRead() - DB7
digitalWrite() E high, E low (for second/low nibble)
pinMode() 4 times, 1 for each data line to output mode
digitalWrite() R/W low
4 * 5 + 4 * 6 + 6 + 2 * 6 + 4 * 5 + 5 = 87us
Even if you look only at the time needed just to read the DB7 pin (not the needed clean up after reading the pin)
you are at 4 * 5 + 4 * 6 + 6 = 50us
It is already longer than the instruction time so you would never even see a BUSY status for anything but clear and home.
So you can see it doesn't make sense to read BUSY since it takes longer just to get to the point of being able to read the DB7 pin than the LCD instruction takes to execute.
In 8 bit mode, counterintuitively, it takes even longer.
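To make the 4 bit tally easier to check, here is a minimal sketch of the same arithmetic in plain C++. The per-call costs are just the round estimates used above (5us for pinMode(), 6us for digitalWrite() and digitalRead(), 5us for the final R/W write), not measurements, and the names (Step, kBusyRead4Bit, elapsedUs) are made up for illustration.

```cpp
#include <cstddef>

// Approximate per-call costs in microseconds on a 16MHz AVR.
// These are the round estimates from the post, not measured values.
constexpr int kPinModeUs      = 5;
constexpr int kDigitalWriteUs = 6;
constexpr int kDigitalReadUs  = 6;
constexpr int kFinalWriteUs   = 5;   // the closing R/W-low write, tallied as 5
constexpr int kInstructionUs  = 37;  // hd44780 character write, worst case

struct Step {
    const char* what;
    int calls;
    int usPerCall;
};

// The 4 bit BUSY read sequence, in the order listed in the post.
constexpr Step kBusyRead4Bit[] = {
    {"pinMode() data lines to input",             4, kPinModeUs},
    {"digitalWrite() RS, R/W, E high, E low",     4, kDigitalWriteUs},
    {"digitalRead() DB7",                         1, kDigitalReadUs},
    {"digitalWrite() E high, E low (low nibble)", 2, kDigitalWriteUs},
    {"pinMode() data lines to output",            4, kPinModeUs},
    {"digitalWrite() R/W low",                    1, kFinalWriteUs},
};
constexpr std::size_t kStepCount =
    sizeof(kBusyRead4Bit) / sizeof(kBusyRead4Bit[0]);

// Microseconds spent once the first `steps` entries have completed.
// Written recursively so it stays a valid C++11 constexpr function.
constexpr int elapsedUs(std::size_t steps) {
    return steps == 0
        ? 0
        : elapsedUs(steps - 1)
              + kBusyRead4Bit[steps - 1].calls * kBusyRead4Bit[steps - 1].usPerCall;
}

// DB7 has been read after the first three steps, which is already
// past the worst-case instruction time.
static_assert(elapsedUs(3) > kInstructionUs,
              "BUSY read point is later than the instruction time");
```

With these estimates elapsedUs(3) is 50 and elapsedUs(kStepCount) is 87, matching the sums above. Note that elapsedUs(2) is already 44, so the 37us is used up before digitalRead() is even called.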
It changes to
pinMode() - 8 times, 1 for each data line to input mode
digitalWrite() - 4 times: RS low, R/W high, E high, E low
digitalRead() - DB7
pinMode() - 8 times, 1 for each data line to output mode
digitalWrite() - R/W low
8 * 5 + 4 * 6 + 6 + 8 * 5 + 5 = 115us
Even in 8 bit mode, it still takes longer to read BUSY than the instruction time.
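Why 8 bit mode comes out slower is easier to see if the tally is written as a function of the number of data lines. This is only a sketch using the same rough per-call estimates as above; busyReadTotalUs is a made-up name, and it counts the DB7 digitalRead() in both modes, which puts the 8 bit total at 115us.

```cpp
// Estimated time (us) for one complete BUSY read through the Arduino
// digital i/o API on a 16MHz AVR, for a 4 or 8 line data bus.
// Per-call costs are the post's rough estimates, not measurements.
constexpr int busyReadTotalUs(int dataLines) {
    return dataLines * 5                   // pinMode(): each data line to input
         + 4 * 6                           // digitalWrite(): RS, R/W, E high, E low
         + 6                               // digitalRead(): DB7
         + (dataLines == 4 ? 2 * 6 : 0)    // 4 bit only: strobe E for the low nibble
         + dataLines * 5                   // pinMode(): each data line back to output
         + 5;                              // digitalWrite(): R/W low
}

// 8 bit mode saves the second E strobe (12us) but pays for eight
// extra pinMode() calls (40us), so it ends up 28us slower overall.
static_assert(busyReadTotalUs(8) - busyReadTotalUs(4) == 28,
              "8 bit mode is slower by 28us with these estimates");
```

The pinMode() calls dominate: switching the direction of every data line twice costs more than the extra nibble strobe that 4 bit mode needs.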
Can reading BUSY ever be faster? Yes, but not when using Arduino and its APIs.
Yes you could write some faster code if the code was hard coded to certain ports/pins so that you could use the AVR specific bit set and bit clear instructions.
But that would no longer be using the Arduino API i/o functions and more importantly would no longer allow the user to configure the pins being used.
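For illustration only, here is roughly what such a hard coded BUSY read could look like in 4 bit mode. This is a sketch under assumptions, not code from the hd44780 library: the wiring (DB4..DB7 on PD4..PD7, RS on PB0, R/W on PB1, E on PB2), the name lcdIsBusy(), and the nop count in settle() are all made up and would need checking against the board and the hd44780 datasheet timing. The non-AVR branch only exists so the code can be compiled and exercised off-target.

```cpp
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
static inline void settle() {
    // A few cycles for the E pulse width and data delay.
    // The count is an assumption for a 16MHz part.
    __asm__ __volatile__("nop\n\tnop\n\tnop\n\tnop");
}
#else
// Stand-ins so the sketch compiles off-target. On an AVR these names
// are the real i/o registers from <avr/io.h>.
static volatile uint8_t DDRD, PORTD, PIND, PORTB;
static inline void settle() {}
#endif

// Hypothetical fixed wiring: DB4..DB7 = PD4..PD7, RS = PB0, R/W = PB1, E = PB2.
static const uint8_t DATA_MASK = 0xF0;
static const uint8_t RS_BIT = 0, RW_BIT = 1, E_BIT = 2, DB7_BIT = 7;

// Returns true if the LCD reports BUSY (DB7 high with RS low, R/W high).
bool lcdIsBusy() {
    DDRD  &= (uint8_t)~DATA_MASK;          // data lines to input
    PORTD &= (uint8_t)~DATA_MASK;          // internal pull-ups off
    PORTB &= (uint8_t)~(1 << RS_BIT);      // RS low: instruction register
    PORTB |= (uint8_t)(1 << RW_BIT);       // R/W high: read
    PORTB |= (uint8_t)(1 << E_BIT);        // E high
    settle();
    bool busy = (PIND & (1 << DB7_BIT)) != 0;  // sample DB7 while E is high
    PORTB &= (uint8_t)~(1 << E_BIT);       // E low
    settle();
    PORTB |= (uint8_t)(1 << E_BIT);        // clock out the low nibble (ignored)
    settle();
    PORTB &= (uint8_t)~(1 << E_BIT);
    PORTB &= (uint8_t)~(1 << RW_BIT);      // R/W low before driving the bus again
    DDRD  |= DATA_MASK;                    // data lines back to output
    return busy;
}
```

On an AVR the single bit operations with constant operands can typically compile down to the bit set / bit clear instructions, so the whole sequence should take a small fraction of the time of the digitalWrite()/pinMode() version. The price is exactly the one mentioned above: the pins are baked into the code and the user can no longer configure them.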
---- bill