fast parallel writes...

Hi, I'm getting a working TFT soon (the first one was DOA) so I can add TFT support to a graphics library I built from scratch a few months back. Yeah, I know, yet another graphics library.

At the moment the library only does 1 bit graphics and works for any LCD or dot matrix. It's device independent: the device code just tells the rendering library where the buffer is and which orientation it uses. Works great for my Nokia5110 and for a 48*8 LED matrix. The bits are always stored left-right, top-bottom at 1bpp, and it's up to the device code to sort out how it wants to write the bytes.

The TFT I have ordered has an 8 bit parallel interface, but as I'm mainly working on the Due I needed fast code to write 8 bits to 8 pins, similar to writing all 8 bits to a single Uno port. A lot of talk on forums basically said the Due ends up slower than the Uno here because of the pin layout and the bit banging needed to work around it.

I have some code below which seems to work well. Others might find it useful, as I've seen people requesting working code that does this. The code opens up 2 pin ranges where 8 or 9 bit writes can be done fast. I have tested it with an 8 LED bar graph. I'm also hoping the compiler optimizes the code down into something very simple, as the port address should be a constant.

/*
A lot of TFT screens and other parallel devices expect 8 bits to be set in parallel. On the Due and Mega this is awkward because the pins
are scattered across ports. Ideally you want 8 pins in a row on the same port, with bitmasks running from 1 to 128.

The Due is also 32 bit, so it is quite different from the 8 bit Arduinos.

Pins 33-40 have consecutive bitmasks on Port C (PC1-PC8), so a value maps onto them with a single 1 bit shift. They are also contiguous on the board.
Pins 51-44 are similar (PC12-PC19) but need a 12 bit shift because the run starts at bit 12 (note the pin numbers run in the opposite direction).

See http://arduino.cc/en/Hacking/PinMappingSAM3X for pin layout

+------+------+
|32    | 33(1)|
+------+------+
|34(2) | 35(3)|
+------+------+
|36(4) | 37(5)|
+------+------+
|38(6) | 39(7)|
+------+------+
|40(8) | 41*  |
+------+------+
|42    | 43   |
+------+------+

Digital pin 33 -> bit 1 = PC1
Digital pin 34 -> bit 2 = PC2
...
...
Digital pin 40 -> bit 8 = PC8


* Could be expanded for 9 bit easily as Digital pin 41 is PC9 which is on the same Port and is in a bit sequence
*/

static inline __attribute__((always_inline)) void pinFastWrite33To40_8(uint32_t val8) // fast write to pins 33-40 - pinMode must have previously been set to OUTPUT
{
#define mask33_40 (0xFFu << 1)

   val8 = (val8 << 1) & mask33_40; // shift onto PC1..PC8 and drop any stray bits above bit 7 of the input

   g_APinDescription[33].pPort->PIO_SODR = val8; // set all the 1's
   g_APinDescription[33].pPort->PIO_CODR = ((~val8) & mask33_40); // clear all the 0's
}

// 1st bit is PC12 so all bits need to be shifted 12 bits .. 0-255 -> 51(1) 50(2) 49(4) 48(8) 47(16) 46(32) 45(64) 44(128)

static inline __attribute__((always_inline)) void pinFastWrite51To44_8(uint32_t val8) // fast write to pins 51-44 - pinMode must have previously been set to OUTPUT
{
#define mask51_44 (0xFFu << 12)

   val8 = (val8 << 12) & mask51_44; // shift 12 bits because this run starts at PC12, and drop any stray high bits

   g_APinDescription[51].pPort->PIO_SODR = val8; // set all the 1's
   g_APinDescription[51].pPort->PIO_CODR = ((~val8) & mask51_44); // clear all the 0's
}

static inline __attribute__((always_inline)) void pinFastWrite33To41_9(uint32_t val9) // 9 bit version of pinFastWrite33To40_8, pins 33-41
{
#define mask33_41 (0x1FFu << 1)

   val9 = (val9 << 1) & mask33_41; // pin 33 is PC1, so the 9 bit value needs the same 1 bit shift as the 8 bit version

   g_APinDescription[33].pPort->PIO_SODR = val9; // set all the 1's
   g_APinDescription[33].pPort->PIO_CODR = ((~val9) & mask33_41); // clear all the 0's
}


static inline __attribute__((always_inline)) void pinFastWrite8(const uint32_t *pins, uint32_t val8) // works for any 8 pins (pins[0] = bit 0) - pinMode must have previously been set to OUTPUT
{
  uint32_t mask = 1;

  while (mask & 0xFF)
  {
    if (val8 & mask) 
    {
      g_APinDescription[*pins].pPort->PIO_SODR = g_APinDescription[*pins].ulPin; // set
    }
    else
    {
      g_APinDescription[*pins].pPort->PIO_CODR = g_APinDescription[*pins].ulPin; // unset
    }

    pins++;
    mask += mask;  // 1 -> 2 -> 4 etc
  }
}
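
For anyone who wants to check the set/clear arithmetic without a Due attached, here is a minimal host-side sketch. It is not part of the library: FakePort and writeField are hypothetical names made up for illustration, and FakePort only imitates the documented effect of PIO_SODR (1 bits drive pins high) and PIO_CODR (1 bits drive pins low) on a single 32 bit output word. The shift and width values follow the Port C layout described in the comment block above (pins 33-41 on PC1-PC9, pins 51-44 on PC12-PC19).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one PIO controller. Only the output level of
// each pin is modelled; this is NOT the real Pio struct from the Due core.
struct FakePort {
    uint32_t odsr = 0;                           // current output level, one bit per pin
    void sodr(uint32_t bits) { odsr |= bits; }   // like PIO_SODR: 1 bits go high
    void codr(uint32_t bits) { odsr &= ~bits; }  // like PIO_CODR: 1 bits go low
};

// Write `value` to `width` consecutive port bits starting at bit `shift`.
// Bits outside that field are left untouched. Assumes width < 32.
static inline void writeField(FakePort &port, uint32_t value,
                              unsigned shift, unsigned width)
{
    const uint32_t mask = ((1u << width) - 1u) << shift;
    const uint32_t bits = (value << shift) & mask;  // stray high bits are dropped
    port.sodr(bits);                                // set all the 1's
    port.codr(~bits & mask);                        // clear all the 0's inside the field
}

// A few checks mirroring the three fixed pin ranges discussed in the post.
static inline void runExamples()
{
    FakePort portC;

    // 8 bits on PC1..PC8 (pins 33-40); every other pin starts high and must stay high.
    portC.odsr = 0xFFFFFFFFu;
    writeField(portC, 0xA5u, 1, 8);
    assert(((portC.odsr >> 1) & 0xFFu) == 0xA5u);
    assert((portC.odsr | 0x1FEu) == 0xFFFFFFFFu);

    // 8 bits on PC12..PC19 (pins 51-44).
    portC.odsr = 0;
    writeField(portC, 0xFFu, 12, 8);
    assert(portC.odsr == 0x000FF000u);

    // 9 bits on PC1..PC9 (pins 33-41).
    portC.odsr = 0;
    writeField(portC, 0x1FFu, 1, 9);
    assert(portC.odsr == 0x000003FEu);
}
```

Calling runExamples() on a PC exercises all three cases. On the real hardware the two register writes happen one after the other, so pins that change from 0 to 1 switch slightly before pins that change from 1 to 0. That should not matter for a TFT as long as the write strobe is only pulsed after both writes are done.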

I also have a working SD card library for the Due. It's built on sdfatlib but only has hardware SPI access, as I gutted the other bits when making it Due compatible. It will also work on the Uno. I hope to get these libraries into a public area soon, but I need to do further testing first.