Don't forget that the HD44780 is a character-based LCD driver, not a graphics (pixel-based) LCD driver. The ultimate speed of the display update also depends on the optical response time of the LCD itself, which is quite slow in the case of the text-based displays I've seen.
Well, I assume the graph is made by using custom characters on a basic character LCD. I think I'll try out the new I2C LCDs and see if they can do what I want.
You may not need those assembler nops. Even without any additional delay, the Arduino digitalWrite commands take longer than the Enable pulse width (450 ns) and Enable cycle time (1000 ns) specified in the HD44780 datasheet.
BTW, if you want the fastest performance then you should check the LCD busy flag to see when it's ready instead of using delays before writing data.
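To make the busy-flag suggestion concrete, here is a minimal sketch of the polling loop. The busy flag is bit 7 (DB7) of the status byte the HD44780 returns when read with RS low and R/W high. Everything else here is an assumption for illustration: `waitUntilReady`, the `readStatus` hook, and `FakeLcd` are hypothetical names, and the hardware read is left behind the hook so the loop can be tried on a PC.

```cpp
#include <cstdint>

// HD44780 status byte: bit 7 (DB7) is the busy flag,
// bits 6..0 hold the address counter.
constexpr std::uint8_t kBusyFlagMask = 0x80;

// Polls until the busy flag clears or maxPolls reads have been made.
// readStatus is a hypothetical hook: any callable returning the status byte.
// On real hardware it would set RS low and R/W high, pulse Enable,
// and sample the data pins.
// Returns true if the controller became ready, false on timeout.
template <typename ReadStatus>
bool waitUntilReady(ReadStatus readStatus, unsigned maxPolls) {
    for (unsigned i = 0; i < maxPolls; ++i) {
        if ((readStatus() & kBusyFlagMask) == 0) {
            return true;
        }
    }
    return false;
}

// Stand-in for the LCD (not real hardware): reports busy for a fixed
// number of reads, then ready.
struct FakeLcd {
    unsigned busyReads;
    std::uint8_t operator()() {
        if (busyReads > 0) {
            --busyReads;
            return 0x80;  // busy flag set
        }
        return 0x00;      // ready
    }
};
```

The timeout is there so that a disconnected or miswired display cannot hang the loop forever. Note that reading the flag needs the R/W line wired to a pin instead of tied to ground.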
It does not work without a line of code between the digitalWrite calls; it looks like there is no change at all on the line then.
Reading back from the LCD costs one more pin, and also time. When running in pure assembler with enough I/O left, OK, but for the Arduino I think the speed is fine...
It does not work without a line of code between the digitalWrite calls; it looks like there is no change at all on the line then.
Are you sure? This is what a digitalWrite(HIGH) followed by a digitalWrite(LOW) looks like on my Arduino without any assembler nops. The pulse width is 4.5 microseconds (this is much longer than needed but, as you say, is fast enough for most applications).
The shorter pulse that follows was created using direct port I/O, and this would need nops to increase the pulse width (from 126 nanoseconds to around 1 microsecond).
The horizontal scale used above is 1 microsecond per division.
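As a rough worked example of the nop padding mentioned for the direct port I/O pulse, the helper below estimates how many single-cycle nops are needed to stretch a pulse to a target width. It assumes a 16 MHz clock (62.5 ns per cycle, typical for an Arduino board but not stated above), and `nopsNeeded` is a hypothetical name, not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Number of single-cycle nops to add so that a pulse currently lasting
// currentNs lasts at least targetNs.
// cpuHz is the CPU clock in Hz (assumption: 16000000 on a typical Arduino).
inline std::uint32_t nopsNeeded(std::uint32_t cpuHz,
                                std::uint32_t currentNs,
                                std::uint32_t targetNs) {
    assert(cpuHz > 0);
    if (targetNs <= currentNs) {
        return 0;  // pulse is already long enough
    }
    // 64-bit intermediate: nanoseconds times Hz overflows 32 bits.
    const std::uint64_t extraNs = targetNs - currentNs;
    const std::uint64_t nsPerSecond = 1000000000ULL;
    // Round up so the padded pulse never falls short of the target.
    return static_cast<std::uint32_t>(
        (extraNs * cpuHz + nsPerSecond - 1) / nsPerSecond);
}
```

With these assumptions, `nopsNeeded(16000000, 126, 1000)` gives 14 nops to reach about 1 microsecond, and `nopsNeeded(16000000, 126, 450)` gives 6 nops to meet the 450 ns Enable pulse width. For the 4.5 microsecond digitalWrite pulse the result is 0.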
The display does not behave normally when there is nothing between the HIGH and LOW writes...
I remain surprised that a 65 nanosecond nop makes a difference to a function (digitalWrite) that takes over 2000 nanoseconds to execute, but if that's what you observed then I won't argue.
That's a project for the winter: making an LCD4Bit in pure assembler for the highest speed.
IMO, the Arduino LiquidCrystal library has many advantages over LCD4Bit; it would be a more fruitful base to work from if you want to tweak.