Using LCD with multiple shields...no pins left

each character will use around 75us of time for
a total of around 6ms of lost CPU time to fully refresh the entire display.

That figure is highly implementation-dependent. With software SPI (polling), I can send a byte in around 100us on a 1 MIPS AVR, or 6us on a 16 MIPS AVR. To refresh a 20x4 LCD, that means ~1ms (6us * 2 * 80, in 4-bit mode), which is much faster than the device can actually handle.

That time can be greatly reduced if you use interrupt-driven hardware SPI: each byte transmission would be 20 ticks (1us or so), which works out to roughly 200us per refresh.
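
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. The function names are made up for illustration, and it assumes two transfers per character (4-bit mode), a 20x4 display, and a 16 MHz clock for the 20-tick figure. It only adds up transfer time and ignores the LCD's own command execution delays.

```python
def refresh_time_us(byte_time_us, cols=20, rows=4, transfers_per_char=2):
    """Transfer time in microseconds to rewrite every character cell once.

    transfers_per_char=2 models 4-bit mode (two nibbles per character).
    """
    return byte_time_us * transfers_per_char * cols * rows


def ticks_to_us(ticks, clock_mhz):
    """Convert CPU ticks to microseconds for a given clock in MHz."""
    return ticks / clock_mhz


# Software (polled) SPI, per-byte times quoted above
print(refresh_time_us(6))    # 16 MIPS AVR: 960 us, i.e. ~1 ms
print(refresh_time_us(100))  # 1 MIPS AVR: 16000 us, i.e. 16 ms

# Interrupt-driven hardware SPI: 20 ticks per byte, assuming a 16 MHz clock
per_byte_us = ticks_to_us(20, 16)
print(per_byte_us)                   # 1.25 us per byte
print(refresh_time_us(per_byte_us))  # 200.0 us per refresh
```

Plugging in your own per-byte time and display size gives a quick estimate for other setups.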