Need 40 ns delay max with Arduino IDE

This is correct. Nothing about these SSD130x devices requires strict 40 ns timing. They have no special signal conditioning requirements that would force you to jump through hoops. Just use a standard SPI interface and/or plain digitalWrite() calls to get the job done.
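To put a number on why ordinary pin writes are slow enough, here is a small sketch of the arithmetic. It assumes a 16 MHz CPU clock (typical for an Uno-class board, but check yours) and assumes the 40 ns figure is a minimum time, as datasheet timing figures of this kind usually are. The names cycleNs and meetsMinimum are made up for this illustration.

```cpp
#include <stdint.h>

// Assumed CPU clock: 16 MHz (typical for an Uno-class board; adjust for yours).
static const uint32_t kCpuHz = 16000000UL;

// Duration of one CPU clock cycle in nanoseconds.
double cycleNs(uint32_t cpuHz) {
    return 1e9 / (double)cpuHz;
}

// True if even a single CPU cycle already lasts at least minNs.
// A digitalWrite() call takes many cycles, so it is far slower still.
bool meetsMinimum(uint32_t cpuHz, double minNs) {
    return cycleNs(cpuHz) >= minNs;
}
```

At 16 MHz a single cycle is 62.5 ns, so under these assumptions no pin change the CPU makes can be shorter than 40 ns, and no extra delay is needed.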

You don't have to. You can save yourself the trouble of figuring this out.

A more relevant challenge when using these devices is to figure out (or find and copy) the required initialization routine. The datasheet is limited in this respect: it does explain the entire command set, but it isn't specific about how to get from the power-up state to a functional state where you can write data to the unit so that it appears on the display. The easiest way is to find a working example, copy it, and modify it for your purposes. I've done this a couple of times for the SSD1306 using I2C (but the same principle applies to SPI, bit-banged parallel, etc.).
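As a starting point, here is a sketch of the kind of initialization sequence you'll find in most SSD1306 examples. It assumes a 128x64 panel running from the internal charge pump; the contrast, COM pin, and multiplex values differ for other panel sizes, so treat them as assumptions and compare against an example for your exact module. The names SSD1306_INIT, SendCommandFn, and ssd1306Init are mine, not from any library, and the transport is passed in as a callback so the same table works for I2C or SPI.

```cpp
#include <stddef.h>
#include <stdint.h>

// Callback that sends one command byte over whatever bus you use.
typedef void (*SendCommandFn)(uint8_t cmd);

// Commonly used power-up sequence (assumed: 128x64 panel, internal charge pump).
static const uint8_t SSD1306_INIT[] = {
    0xAE,        // display off
    0xD5, 0x80,  // display clock divide ratio / oscillator frequency
    0xA8, 0x3F,  // multiplex ratio: 64 rows
    0xD3, 0x00,  // display offset: none
    0x40,        // display start line: 0
    0x8D, 0x14,  // charge pump: enabled
    0x20, 0x00,  // memory addressing mode: horizontal
    0xA1,        // segment remap (column 127 mapped to SEG0)
    0xC8,        // COM output scan direction: remapped
    0xDA, 0x12,  // COM pins hardware configuration
    0x81, 0xCF,  // contrast
    0xD9, 0xF1,  // pre-charge period
    0xDB, 0x40,  // VCOMH deselect level
    0xA4,        // output follows RAM contents
    0xA6,        // normal (non-inverted) display
    0xAF         // display on
};
static const size_t SSD1306_INIT_LEN =
    sizeof(SSD1306_INIT) / sizeof(SSD1306_INIT[0]);

// Sends every byte of the table, in order, through the supplied callback.
void ssd1306Init(SendCommandFn sendCommand) {
    for (size_t i = 0; i < SSD1306_INIT_LEN; ++i) {
        sendCommand(SSD1306_INIT[i]);
    }
}
```

With the Wire library, the callback body could be Wire.beginTransmission(0x3C); Wire.write(0x00); Wire.write(cmd); Wire.endTransmission(); where 0x00 is the control byte that marks the next byte as a command, and 0x3C is the usual I2C address (some modules use 0x3D). For SPI you would instead pull the D/C pin low and transfer the byte.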