Enhanced LiquidCrystal


8 data pins + RW 432 milliseconds
8 data pins - RW 641 milliseconds
4 data pins + RW 719 milliseconds
4 data pins - RW 1038 milliseconds

I don't think there should be that big a difference between the 8-bit and 4-bit speeds for the same RW configuration: your 4-bit times are about 1.6 to 1.7 times the 8-bit times in both cases. I haven't checked your code, but I hope you took into account that no delay (or busy check) is required between sending the high nibble and sending the low nibble.
