Bit-banging instead of hardware SPI?

I am trying to interface to a 2416 dot matrix display. This is really just an LED matrix being driven by a Holtek HT1632.

What I don't get is that the commands, reads and writes do not follow a programmer's model, in the sense that they are not divided into whole bytes. It looks like I need to bit-bang instead, which means I can't use the built-in SPI. I do not understand why they would have designed it like this.

The LEDs are all memory mapped, and each location has 4 bits that represent the on/off state of 4 LEDs in a column.

To send a write you need to send the 3-bit ID 101b, followed by a 7-bit address and then the 4 bits of data.
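To make the write layout concrete, here is a minimal sketch that packs one write into a 14-bit frame. The names (ht1632_write_frame, HT1632_ID_WRITE) are made up for illustration, the ID value 101b is my reading of the Holtek datasheet, and the 4 data bits are taken in the order they go on the wire, so check the datasheet for which LED bit comes first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. 101b is the write ID as I read the HT1632
   datasheet; verify it against your copy. */
#define HT1632_ID_WRITE   0x5u
#define HT1632_WRITE_BITS 14u   /* 3 ID + 7 address + 4 data */

/* Pack one write into the low 14 bits of a uint16_t, first bit on the
   wire in bit 13. 'nibble' holds the 4 data bits in wire order. */
uint16_t ht1632_write_frame(unsigned addr, unsigned nibble)
{
    assert(addr < 0x80u);    /* address is only 7 bits wide */
    assert(nibble < 0x10u);  /* data is only 4 bits wide */
    return (uint16_t)((HT1632_ID_WRITE << 11) | (addr << 4) | nibble);
}
```

For example, ht1632_write_frame(0x01, 0x8) gives 0x2818, which is 101 0000001 1000 in binary.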

To send commands you send 100b followed by a 9-bit command code (wtf eh? even though the least significant bit is always zero).
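The command layout can be sketched the same way: 3 ID bits plus 9 command bits gives a 12-bit frame. The names here (ht1632_command_frame, HT1632_ID_COMMAND) are hypothetical, and I am assuming the meaningful part of the command is the upper 8 of the 9 bits, with the last bit left at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; 100b is the command ID quoted above. */
#define HT1632_ID_COMMAND   0x4u
#define HT1632_COMMAND_BITS 12u   /* 3 ID + 9 command */

/* Pack one command into the low 12 bits of a uint16_t, first bit on
   the wire in bit 11. 'code' is the 8-bit command; the trailing ninth
   bit is left at zero (assumption: it carries no information). */
uint16_t ht1632_command_frame(unsigned code)
{
    assert(code <= 0xFFu);
    return (uint16_t)((HT1632_ID_COMMAND << 9) | (code << 1));
}
```

For example, ht1632_command_frame(0x01) gives 0x802, which is 100 00000001 0 in binary.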

So a write is 3 + 7 + 4 = 14 bits and a command is 3 + 9 = 12 bits, and neither divides evenly into bytes.

The application notes show CS being pulled low at the start of the first 3 bits.

How should I deal with this???

Can I use the hardware SPI or do I need to do this in software?

From what I've done so far and what can be found in the datasheet, hardware SPI will only transfer 1 byte at a time.

I've come to the same conclusion with the Vinculum-based devices: to use SPI, it'll have to be bit-banged because they don't use byte-sized transfers.
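For what it's worth, a bit-banged sender for odd-length frames is only a few lines. This is a rough sketch, not tested on hardware: pin_write, the PIN_* names and bitbang_send are all made up, and I am assuming the chip latches DATA on the rising edge of its clock (WR) pin while CS is low, so check the timing diagrams. The pin_write here is a PC-side stand-in that records what a receiver would clock in; on a real micro you would replace its body with a GPIO write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PIN_CS, PIN_WR, PIN_DATA };       /* hypothetical pin names */

int     pin_level[3] = { 1, 1, 0 };      /* idle: CS high, WR high */
uint8_t captured[32];                    /* bits the receiver saw */
size_t  captured_len = 0;

/* Stand-in for a GPIO write. It models the receiver: a rising edge on
   WR while CS is low clocks in the current DATA level. */
void pin_write(int pin, int level)
{
    if (pin == PIN_WR && level && !pin_level[PIN_WR] &&
        !pin_level[PIN_CS] && captured_len < sizeof captured)
        captured[captured_len++] = (uint8_t)pin_level[PIN_DATA];
    pin_level[pin] = level;
}

/* Shift out the low 'nbits' of 'frame', most significant bit first,
   framed by CS low ... CS high. Works for 12, 14 or any 1..16 bits. */
void bitbang_send(uint16_t frame, unsigned nbits)
{
    assert(nbits >= 1u && nbits <= 16u);
    pin_write(PIN_CS, 0);                        /* select the chip */
    while (nbits--) {
        pin_write(PIN_WR, 0);                    /* clock low */
        pin_write(PIN_DATA, (frame >> nbits) & 1u);
        pin_write(PIN_WR, 1);                    /* rising edge latches */
    }
    pin_write(PIN_CS, 1);                        /* deselect */
}
```

Calling bitbang_send(0x802, 12) clocks out the twelve bits 1000 0000 0010 with CS held low for the whole frame. Real hardware may also need short delays between edges to meet the chip's minimum clock width.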