ESP8266 alternate SPI library with ability to not wait and some optimizations

A friend gave me a few ESP8266s he had acquired and had no use for. I finally got around to seeing what they were and got them working.

In a few hours I managed to add support for them to my graphics library.

For most boards I have DMA access or native optimized hardware support (Due, Teensy, AVR), so I looked into the ESP8266 SPI library and found that it uses the 64 byte FIFO hardware pipeline.

I altered a copy of the code to include an optional "bool wait = true" parameter that tells the low level code if it should wait for the operation to finish before leaving or not.

Basically I took all the waits out and put them in an inline function, then used the wait/no-wait option internally to gain a bit of processing time, which effectively gets rid of loop overhead.
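To make the idea concrete, here is a minimal sketch of the wait/no-wait pattern that can be tried on a PC. It is not the attached code: `MockSpi`, `busyTicks`, `spinCount` and `tick()` are made-up names that stand in for the hardware busy flag, and the 8 tick transfer time is an arbitrary assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for the SPI peripheral (not the real ESP8266 registers).
struct MockSpi {
    int busyTicks = 0;     // > 0 means a transfer is still in flight
    int spinCount = 0;     // polling iterations spent doing nothing but waiting
    uint8_t lastByte = 0;  // last byte handed to the "hardware"

    // All waiting lives in one small inline function.
    // Returns true if it actually had to wait, false if the bus was already idle.
    inline bool waitBusy() {
        bool waited = busyTicks > 0;
        while (busyTicks > 0) { --busyTicks; ++spinCount; }
        return waited;
    }

    // Models time passing while the caller does its own useful work.
    inline void tick(int n) { busyTicks = (busyTicks > n) ? busyTicks - n : 0; }

    // wait defaults to true, so existing callers keep the blocking behaviour.
    inline void write(uint8_t data, bool wait = true) {
        waitBusy();        // never start while the previous byte is still going out
        lastByte = data;
        busyTicks = 8;     // assumed: 8 ticks to shift out one byte
        if (wait) waitBusy();
    }
};
```

With `wait = true` every write burns the full transfer time polling. With `wait = false` the polling only happens at the start of the next write, and only for whatever time the caller did not already use.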

A lot of my graphics routines rely on a few extra "free" CPU cycles while SPI is transferring, so this now allows that to happen.

Also optimized a few loops (i.e. one if per iteration instead of two).

By default something like SPI.write(buffer) behaves the same as before, because the extra parameter on the end is optional and defaults to true.
You would use it like this:
```
for (size_t i = 0; i < len; i++)
{
    // Something small. After the first iteration this runs while SPI is still transferring.
    uint8_t val = fn_small_calc(buffer[i]);
    SPI.write(val, false); // don't wait
}

SPI.waitBusy(); // return value tells us whether it was still waiting or not

disableCS();
```
Code changes attached. I renamed the classes because I made a local copy, so the changes are not lost when the ESP8266 support is updated. The original files were spi.cpp and spi.h.
A great use for this is when sending (in my case) a 4BPP screen to a 16BPP device. It lets me send the buffer via SPI at full speed and convert the 4BPP data into 16BPP data in the time that would otherwise be spent waiting.
dma_esp8266.cpp (12.5 KB)
dma_esp8266.h (3.5 KB)
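For the 4BPP to 16BPP case, the conversion step could look roughly like the sketch below. This is only an illustration, not what is in the attached files: the function names, the 16 entry RGB565 palette and the "high nibble is the first pixel" packing are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>

// Expand one packed 4BPP byte (two pixels) into two 16BPP values.
// Assumes the high nibble is the first pixel and a hypothetical 16 entry palette.
inline void expand4bpp(uint8_t packed, const uint16_t palette[16], uint16_t out[2]) {
    out[0] = palette[packed >> 4];
    out[1] = palette[packed & 0x0F];
}

// Walk a 4BPP buffer and hand each 16BPP pixel to `send`.
// On the device `send` would start a non-waiting SPI write, so the lookup for
// the next pixel runs while the previous one is still being shifted out.
template <typename SendFn>
void send4bppAs16bpp(const uint8_t* src, size_t bytes,
                     const uint16_t palette[16], SendFn send) {
    for (size_t i = 0; i < bytes; ++i) {
        uint16_t px[2];
        expand4bpp(src[i], palette, px);
        send(px[0]);
        send(px[1]);
    }
}
```

On the ESP8266 the callback would wrap the modified 16 bit write with wait set to false (assuming the local copy exposes one), followed by a single SPI.waitBusy() after the loop.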

I only looked at the write functions, but the transfer functions would probably benefit from not waiting as well.

In another library I have the wait ability on read() and read16() as well. The actual result is put into a volatile variable that can be accessed afterwards.

...
```
SPI.read16(false); // don't wait
fn_do_stuff();
SPI.waitBusy();
myres = SPI.getRes16();
```
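A rough sketch of how such a deferred read could be structured, runnable on a PC. `MockSpiReader`, `pending` and `busy` are hypothetical names that fake the device, and returning 0 from a non-waiting `read16()` is my assumption, not necessarily what the other library does.

```cpp
#include <cstdint>

// Hypothetical mock of a reader whose result is fetched after the transfer ends.
struct MockSpiReader {
    volatile uint16_t result = 0;  // filled in when the transfer "completes"
    uint16_t pending = 0;          // value the fake device will return
    bool busy = false;             // true while a read is in flight

    // Finish any read in flight. Returns true if there was one to finish.
    bool waitBusy() {
        if (!busy) return false;
        result = pending;  // real code would copy from the hardware data register
        busy = false;
        return true;
    }

    // Start a 16 bit read. With wait == false the return value is not valid;
    // call waitBusy() and then getRes16() to fetch the result.
    uint16_t read16(bool wait = true) {
        waitBusy();        // make sure the previous read has finished
        busy = true;
        if (wait) { waitBusy(); return result; }
        return 0;
    }

    uint16_t getRes16() const { return result; }
};
```

The point of the volatile result is that the value only becomes valid after waitBusy(), so the caller is free to do other work between starting the read and collecting it.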

A lot of the functions are very small and could be inlined.