The display does not behave normally when there is nothing between the HIGH and LOW writes...
I remain surprised that a 65 nanosecond nop makes a difference to a function (digitalWrite) that takes over 2000 nanoseconds to execute, but if that's what you observed then I won't argue.
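For reference, here is a minimal sketch of the enable pulse under discussion. It assumes a 16 MHz AVR board, where one clock cycle (and so one nop) works out to 62.5 ns, close to the 65 ns figure quoted above. The names `cycleNs` and `pulseEnable` and the pin number are made up for illustration and are not taken from LCD4Bit; the host-side stubs only exist so the snippet compiles off-target.

```cpp
#include <stdint.h>

#ifndef ARDUINO
// Host-side stand-ins (assumption: not building for an Arduino target).
// They only record the sequence of levels written, nothing more.
#include <vector>
constexpr int HIGH = 1;
constexpr int LOW = 0;
static std::vector<int> pinLog;
inline void digitalWrite(uint8_t /*pin*/, uint8_t level) {
  pinLog.push_back(level);
}
#endif

// Length of one CPU clock cycle in nanoseconds for a given clock frequency.
constexpr double cycleNs(double clockHz) { return 1.0e9 / clockHz; }

// Hypothetical helper: pulse the LCD enable line with a single nop between
// the HIGH and LOW writes, as described in the posts above.
inline void pulseEnable(uint8_t enablePin) {
  digitalWrite(enablePin, HIGH);
  __asm__ __volatile__("nop");  // one clock cycle on AVR
  digitalWrite(enablePin, LOW);
}
```

On a board you would call something like `pulseEnable(9)` after putting the nibble on the data pins. With `cycleNs(16e6)` at 62.5 ns, the nop adds only about 3% to a digitalWrite that takes roughly 2000 ns, which is why the observed difference is surprising.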
That's a project for the winter: writing an LCD4Bit in pure assembler for the highest speed.
IMO, the Arduino LiquidCrystal library has many advantages over LCD4Bit, so it would be a more fruitful base to work from if you want to tweak.
But if you are out for speed, Peter Fleury's LCD library is highly optimised and can be made to work well with Arduino with a little effort: http://homepage.hispeed.ch/peterfleury/avr-lcd44780.html