Faster SPI on the Zero?

AloyseTech:
Thanks for the code, I'll try that ASAP.

Have you tried replacing each "spiwrite(c)" with "SPI.transfer(c)" in SSD1351.cpp? Since we use hardware SPI, the test on _sid is not necessary, I guess, so we always end up using the SPI.transfer method anyway. It may not speed up the blitting from SD, but it could help speed things up with the standard graphics functions.

Off topic: what tool do you use to convert a BMP or any other image into a raw 16-bit file?

I have not modified the Adafruit library beyond removing that delayMicroseconds() line, because right now I'm just trying to speed up the SPI and SD card reading. I'm sure that library could be sped up a great deal, but I don't need that to be super fast at the moment and until I know I've got the SPI library running at full tilt, there's no point in optimizing elsewhere anyway.

The BMP converter tool I used is here:
http://elm-chan.org/fsw_e.html

After updating the other two transfer functions, used by the SD lib, to have the same loop optimization I got another slight speed increase for RAW image blitting to 7.75 FPS:

// This function transmits a buffer but discards the received data.
void SERCOM::transferDataSPI(void *buf, uint32_t count)
{
  uint8_t *buffer = reinterpret_cast<uint8_t *>(buf);

/*
  while(count-- > 0) {
     sercom->SPI.DATA.bit.DATA = *buffer; // Initiate byte transfer.
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     *buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
  }
*/
  
  sercom->SPI.DATA.bit.DATA = *buffer++; // Initiate byte transfer.

  while(count-- > 1) {
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     //*buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
     sercom->SPI.DATA.bit.DATA = *buffer++; // Initiate byte transfer.
  }

  while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
  //*buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.

}

// This function transmits bytes and returns the received data in a buffer.
void SERCOM::transferDataSPI(void *buf, uint32_t count, uint8_t transmit)
{
  uint8_t *buffer = reinterpret_cast<uint8_t *>(buf);
  
  sercom->SPI.DATA.bit.DATA = transmit; // Initiate byte transfer.

  while(count-- > 1) {
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     *buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
     sercom->SPI.DATA.bit.DATA = transmit; // Initiate byte transfer.
  }

  while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
  *buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.

}

// This function transmits bytes but discards the received data.
void SERCOM::transferDataSPI(uint32_t count, uint8_t transmit)
{

  sercom->SPI.DATA.bit.DATA = transmit; // Initiate byte transfer.

  while(count-- > 1) {
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     sercom->SPI.DATA.bit.DATA = transmit; // Initiate byte transfer.
  }

  while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.

}

Again, the first function is not compatible with the SPI library: it does not return any data in the buffer you pass; it only sends data.

With the source code you uploaded on your website, I have 1.05 FPS for BMP and 5.06 FPS for raw BIN files.

Well, either you missed changing something, put some file in the wrong folder, or the SD card you're using is slow, perhaps? I gave you the most up to date version I had.

My uSD card is a SanDisk 1GB, there's nothing else marked on it and I guess it is in fact pretty old. Maybe this is the issue.

I've put the SERCOM files in Arduino15/.../core, SPI in arduino15/.../libraries/SPI, and SD in arduinoApp/.../libraries/SD/src

I've tried with both your version and mine of SSD1351 but there is no change.

How do you test the code without the SD card? Do you put the .bin file in the code as a const int16 array?

I modify the RAW function like so, so it just reads one block from the card:
(Note the two f.read(buffer, 512) lines.)

// This function blits a full screen, raw, 16 bit 565 RGB color image to the display from the SD card.
void rawFullSPI(char *filename) { 
  
  File f;
  uint8_t buffer[512]; // Buffer two full rows at a time - 512 bytes.  This is the same size as an SD card block.
  uint8_t *b, *bmax; // Pointers into the buffer.
  
  // Specify size of region to be drawn.

    tft.writeCommand(SSD1351_CMD_SETCOLUMN);
    tft.writeData(0);
    tft.writeData(127);
    
    tft.writeCommand(SSD1351_CMD_SETROW);
    tft.writeData(0);
    tft.writeData(127);


  // Draw bitmap.
    
    tft.writeCommand(SSD1351_CMD_WRITERAM); // Tell display we're going to send it image data in a moment. (Not sure if necessary.) 
    digitalWrite(my_dc, HIGH); // Set DATA/COMMAND pin to DATA.    
    
    f = SD.open(filename); // Open file for reading.
    f.read(buffer, 512);

    for (byte row = 0; row < 128; row+=2) { // 2.79FPS without SPI_FULL_SPEED in SPI.begin, 3.75FPS with it.
      
      //f.read(buffer, 512); // Read the next two rows from the card into the image buffer. 
      // 2.79FPS when doing this read. 6.42 FPS when not doing this read.  (2.3x as fast)    
      // With new block transfer optimization, 7.15 FPS when doing this read, and 20.18 FPS when not doing this read.
      // The reason the screen goes white when doing this is because the buffer we're using to transmit is also the receive buffer, so it is overwritten on the first go round.
      
      /*
      b = buffer;
      bmax = b+512; // Calculate when we should stop and read the next two rows.
      
      digitalWrite(my_cs, LOW); // Tell display to pay attention to the incoming data.
             
      while (b < bmax) { // Write both rows to the display.
        SPI.transfer(*b); // Write low byte.
        b++;
      }
      */

      // Moving all the extra stuff here  outside the for loop and getting rid of SD reads gives 24.75 FPS, which is still slower than expected. 24.23 FPS with these in the loop.
      // Skipping the file opening and closing for three different images by using a loop inside this function does not improve performance much.  Still 24.71 FPS.
      // Unrolling the transfer loop didn't seem to improve things at all.  
         
      digitalWrite(my_cs, LOW); // Tell display to pay attention to the incoming data.

      //noInterrupts(); // 7.65 -> 7.70 FPS
      SPI.beginTransaction(SPISettings(12000000, MSBFIRST, SPI_MODE0)); 
      SPI.transfer(buffer, 512); 
      SPI.endTransaction();
      //interrupts();
                 
      digitalWrite(my_cs, HIGH); // Tell display we're done talking to it for now, so the next SD read doesn't corrupt the screen.
      
    }  
  
    f.close(); // Close the file.
        
}

With that change I get over 24fps, but of course the images don't display properly. They don't need to though, I'm just testing how fast I can output a full screen of data.
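The comment inside the loop above notes that when the transmit buffer doubles as the receive buffer, it is overwritten on the first pass. A small host-side mock can illustrate that effect. This is only a sketch, not the Arduino SPI library: mockTransferInPlace and sendTwice are made-up names, and the 0xFF fill assumes nothing is driving MISO, so every received byte reads back as all ones.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for an in-place SPI block transfer; this is
// not the Arduino SPI library. Each byte of buf is "sent" (recorded in wire)
// and then replaced by the byte "received" in exchange.
// Assumption: nothing drives MISO, so every received byte reads back as 0xFF.
void mockTransferInPlace(uint8_t *buf, size_t count, std::vector<uint8_t> &wire) {
  for (size_t i = 0; i < count; ++i) {
    wire.push_back(buf[i]); // Byte that goes out on MOSI.
    buf[i] = 0xFF;          // Received byte overwrites the transmitted one.
  }
}

// Sends the same buffer twice without refilling it in between, and returns
// everything that went out on the wire.
std::vector<uint8_t> sendTwice(std::vector<uint8_t> buf) {
  std::vector<uint8_t> wire;
  mockTransferInPlace(buf.data(), buf.size(), wire);
  mockTransferInPlace(buf.data(), buf.size(), wire);
  return wire;
}
```

Under those assumptions, only the first pass carries image data; every later pass sends 0xFF bytes, which would match the white screen described in the comment.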

But even though 24fps seems fast, it's only half of what it should be capable of and I can't for the life of me figure out why. I've checked everything I can think of. I even went so far as to make sure the sercom lib was calculating the baud rate correctly. And there is no 2x multiplier bit for the SPI like there is on the AVR so that can't be set wrong either.
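As a rough check on the "half of what it should be capable of" figure, the ceiling can be worked out from the frame size and the SPI clock. This is a back-of-the-envelope sketch: it assumes the 12 MHz clock passed to SPISettings in the test function and ignores all per-byte and per-block overhead. kFrameBytes and maxFps are names made up for the example.

```cpp
#include <cstdint>

// Bytes in one full frame: 128 x 128 pixels, 2 bytes (16-bit 565 color) each.
constexpr uint32_t kFrameBytes = 128u * 128u * 2u;

// Upper bound on frames per second if the SPI clock never idles:
// one bit per clock cycle, 8 bits per byte, no gaps between bytes.
constexpr double maxFps(uint32_t spiClockHz) {
  return static_cast<double>(spiClockHz) / (kFrameBytes * 8.0);
}
```

For 12 MHz this comes to roughly 45.8 FPS, so 24 FPS is indeed about half of the ceiling.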

I think you uploaded the wrong SD files. I added static SPISettings settings(12000000, MSBFIRST, SPI_MODE0); at the beginning of the file, and now I'm getting 1.57 FPS and 8.96 FPS for BMP and RAW respectively, as well as 22.4 FPS for RAW without updating the buffer (no f.read(buffer, 512) in the loop).

At the beginning of which file?

And when you say static SPISettings do you mean you added the static keyword in front? Why?

Ah, my bad, I think I did forget to include one file, because I forgot I'd modified it.

In SD.cpp:

boolean SDClass::begin(uint8_t csPin) {
  /*

    Performs the initialisation required by the sdfatlib library.

    Return true if initialization succeeds, false otherwise.

   */
  return card.init(SPI_FULL_SPEED, csPin) && // *** MODIFIED from SPI_HALF_SPEED ***
         volume.init(card) &&
         root.openRoot(volume);
}

Of course, in my defense, having the SD library default to half speed is really dumb. I mean, I know they did it because some people have long wires attached to their SD cards or crappy SD shields with resistors instead of a level shifter, but come on. There's no indication, unless you look under the hood, that the library is slowing you down, and no way to fix it short of modifying the library itself. Worst of all, every time you reinstall the IDE it goes back to the way it was, and the hunt for the cause begins anew, because it's easy to forget you changed that. Like I just did.

At the beginning of sd2card.h I modified

static SPISettings settings;

to

static SPISettings settings(12000000, MSBFIRST, SPI_MODE0);

and I changed the default speed to SPI_FULL_SPEED like you. It's strange that our results are different...

There is no "static SPISettings settings;" in sd2card.h. Did you mean sd2card.cpp?

Here's what the first few lines of mine look like:

#define USE_SPI_LIB
#include <Arduino.h>
#include "Sd2Card.h"
//------------------------------------------------------------------------------
#ifndef SOFTWARE_SPI
#ifdef USE_SPI_LIB
#include <SPI.h>
static SPISettings settings;
#endif

What version of the IDE are you running? A nightly build perhaps? I'm using the regular old 1.6.6 and I've downloaded the latest board files for the Zero.

The line you changed relates to this class in SPI.h:

class SPISettings {
  public:
  SPISettings(uint32_t clock, BitOrder bitOrder, uint8_t dataMode) {
    if (__builtin_constant_p(clock)) {
      init_AlwaysInline(clock, bitOrder, dataMode);
    } else {
      init_MightInline(clock, bitOrder, dataMode);
    }
  }

  // Default speed set to 4MHz, SPI mode set to MODE 0 and Bit order set to MSB first.
  SPISettings() { init_AlwaysInline(4000000, MSBFIRST, SPI_MODE0); }

  private:
  void init_MightInline(uint32_t clock, BitOrder bitOrder, uint8_t dataMode) {
    init_AlwaysInline(clock, bitOrder, dataMode);
  }

  void init_AlwaysInline(uint32_t clock, BitOrder bitOrder, uint8_t dataMode) __attribute__((__always_inline__)) {
    this->clockFreq = (clock >= (F_CPU / SPI_MIN_CLOCK_DIVIDER) ? F_CPU / SPI_MIN_CLOCK_DIVIDER : clock);

    this->bitOrder = (bitOrder == MSBFIRST ? MSB_FIRST : LSB_FIRST);

    switch (dataMode)
    {
      case SPI_MODE0:
        this->dataMode = SERCOM_SPI_MODE_0; break;
      case SPI_MODE1:
        this->dataMode = SERCOM_SPI_MODE_1; break;
      case SPI_MODE2:
        this->dataMode = SERCOM_SPI_MODE_2; break;
      case SPI_MODE3:
        this->dataMode = SERCOM_SPI_MODE_3; break;
      default:
        this->dataMode = SERCOM_SPI_MODE_0; break;
    }
  }

  uint32_t clockFreq;
  SercomSpiClockMode dataMode;
  SercomDataOrder bitOrder;

  friend class SPIClass;
};

Which relates to this in sd2card.cpp:

//------------------------------------------------------------------------------
/**
 * Initialize an SD flash memory card.
 *
 * \param[in] sckRateID SPI clock rate selector. See setSckRate().
 * \param[in] chipSelectPin SD chip select pin number.
 *
 * \return The value one, true, is returned for success and
 * the value zero, false, is returned for failure.  The reason for failure
 * can be determined by calling errorCode() and errorData().
 */
uint8_t Sd2Card::init(uint8_t sckRateID, uint8_t chipSelectPin) {
  errorCode_ = inBlock_ = partialBlockRead_ = type_ = 0;
  chipSelectPin_ = chipSelectPin;
  // 16-bit init start time allows over a minute
  uint16_t t0 = (uint16_t)millis();
  uint32_t arg;

  // set pin modes
  pinMode(chipSelectPin_, OUTPUT);
  digitalWrite(chipSelectPin_, HIGH);
#ifndef USE_SPI_LIB
  pinMode(SPI_MISO_PIN, INPUT);
  pinMode(SPI_MOSI_PIN, OUTPUT);
  pinMode(SPI_SCK_PIN, OUTPUT);
#endif

#ifndef SOFTWARE_SPI
#ifndef USE_SPI_LIB
  // SS must be in output mode even it is not chip select
  pinMode(SS_PIN, OUTPUT);
  digitalWrite(SS_PIN, HIGH); // disable any SPI device using hardware SS pin
  // Enable SPI, Master, clock rate f_osc/128
  SPCR = (1 << SPE) | (1 << MSTR) | (1 << SPR1) | (1 << SPR0);
  // clear double speed
  SPSR &= ~(1 << SPI2X);
#else // USE_SPI_LIB
  SPI.begin();
  settings = SPISettings(250000, MSBFIRST, SPI_MODE0);
#endif // USE_SPI_LIB
#endif // SOFTWARE_SPI

  // must supply min of 74 clock cycles with CS high.

...

#ifndef SOFTWARE_SPI
  return setSckRate(sckRateID);
#else  // SOFTWARE_SPI
  return true;
#endif  // SOFTWARE_SPI

 fail:
  chipSelectHigh();
  return false;
}

Which you can see calls setSckRate(sckRateID); at the end...

Which is this in the same file:

uint8_t Sd2Card::setSckRate(uint8_t sckRateID) {
  if (sckRateID > 6) {
    error(SD_CARD_ERROR_SCK_RATE);
    return false;
  }
#ifndef USE_SPI_LIB
  // see avr processor datasheet for SPI register bit definitions
  if ((sckRateID & 1) || sckRateID == 6) {
    SPSR &= ~(1 << SPI2X);
  } else {
    SPSR |= (1 << SPI2X);
  }
  SPCR &= ~((1 <<SPR1) | (1 << SPR0));
  SPCR |= (sckRateID & 4 ? (1 << SPR1) : 0)
    | (sckRateID & 2 ? (1 << SPR0) : 0);
#else // USE_SPI_LIB
  switch (sckRateID) {
    case 0:  settings = SPISettings(25000000, MSBFIRST, SPI_MODE0); break;
    case 1:  settings = SPISettings(4000000, MSBFIRST, SPI_MODE0); break;
    case 2:  settings = SPISettings(2000000, MSBFIRST, SPI_MODE0); break;
    case 3:  settings = SPISettings(1000000, MSBFIRST, SPI_MODE0); break;
    case 4:  settings = SPISettings(500000, MSBFIRST, SPI_MODE0); break;
    case 5:  settings = SPISettings(250000, MSBFIRST, SPI_MODE0); break;
    default: settings = SPISettings(125000, MSBFIRST, SPI_MODE0);
  }
#endif // USE_SPI_LIB
  return true;
}

But where does sckRateID come from?

That comes from SD.cpp:

boolean SDClass::begin(uint8_t csPin) {
  /*

    Performs the initialisation required by the sdfatlib library.

    Return true if initialization succeeds, false otherwise.

   */
  return card.init(SPI_FULL_SPEED, csPin) && // *** MODIFIED from SPI_HALF_SPEED ***
         volume.init(card) &&
         root.openRoot(volume);
}

As you can see, that needed to be modified to pass SPI_FULL_SPEED, which is defined in Sd2Card.h and equals 0.

So we can see that at the end of the SD card initialization, the speed is set to:

case 0:  settings = SPISettings(25000000, MSBFIRST, SPI_MODE0); break;

And before that when the card is initializing it's set to:

settings = SPISettings(250000, MSBFIRST, SPI_MODE0);

Which it needs to be for the card to work. So I'm not sure how your code could override either of those settings, and if it can, it could break reading on cards that require the slow initialization, which is part of the SD card standard.
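To make the chain from rate ID to actual clock concrete, here is a minimal host-side sketch. It copies the frequencies from the setSckRate() switch and applies the same clamp as the clockFreq line in SPISettings. The constants are assumptions for illustration: a 48 MHz core clock and a minimum divider of 4, which together give the 12 MHz figure used elsewhere in this thread. requestedClockHz and effectiveClockHz are made-up names, and unlike setSckRate() the sketch does not reject IDs above 6.

```cpp
#include <cstdint>

// Assumptions for illustration: a 48 MHz core clock and a minimum divider of 4,
// which together give a 12 MHz ceiling.
const uint32_t kCpuHz = 48000000;
const uint32_t kMinClockDivider = 4;

// Clock requested for each rate ID, copied from the setSckRate() switch.
uint32_t requestedClockHz(uint8_t sckRateID) {
  switch (sckRateID) {
    case 0:  return 25000000;
    case 1:  return 4000000;
    case 2:  return 2000000;
    case 3:  return 1000000;
    case 4:  return 500000;
    case 5:  return 250000;
    default: return 125000;
  }
}

// Same clamp as the clockFreq line in SPISettings::init_AlwaysInline():
// anything at or above the ceiling is reduced to the ceiling.
uint32_t effectiveClockHz(uint8_t sckRateID) {
  const uint32_t ceiling = kCpuHz / kMinClockDivider;
  const uint32_t requested = requestedClockHz(sckRateID);
  return requested >= ceiling ? ceiling : requested;
}
```

Under these assumptions, the 25 MHz requested for rate ID 0 would end up as 12 MHz, while the lower rates pass through unchanged.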

I did in fact modify sd2card.cpp, not the .h, sorry.

I will go back to the initial files and only modify the SD.begin function. I was not aware of a special slower initialisation for SD cards. It works at full speed though.

I'm also working with IDE 1.6.6 and samd core 1.6.2.

Dirk67:
Do you know about these ideas on SPI with DMA?
SPI write with DMA - Arduino Zero - Arduino Forum

GitHub - manitou48/ZERO
ZERO/SPIdma.ino at master · manitou48/ZERO · GitHub

Dirk67, thanks to your code, I have been able to achieve 10.15 FPS for a raw picture from the SD card to the OLED and 27.55 FPS for a raw array in program memory to the OLED. I used the spiwrite(...) function only; it is probably possible to add the spiread(...) function to the SD library as well. I will investigate.

Dirk67, thanks to your code...

the code is from mantoui, to be correct ... :)

Thanks to both of you then! :)

I've applied some more optimizations. It's not as fast as the DMA one, but it has more of a chance of making it into the official libraries.

http://rabidprototypes.com/wp-content/uploads/2015/11/pixel_speedtest.zip

The main thing I did, besides all the other optimizations we discussed earlier, is I added a write() method to the SPI class. Now instead of just having transfer() which writes and returns a byte you have an inline method which writes a byte and ignores the received data.

This greatly sped up the portion of my demo where I draw lots of colored rectangles, because in my fastrect() function I have to transfer two bytes for each pixel.

I could have, I suppose, stored the bytes in an integer and then called the transfer function I wrote that transfers a buffer and discards the returned data, but that would have resulted in a bunch of unnecessary overhead for passing the byte count, doing the loop, etc. Sometimes you just need to write a byte or two, and this is the fastest way of doing it.
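For reference, the two bytes per pixel are just the halves of the 16-bit 565 color. The sketch below shows one way to pack and split such a value on the host. The helper names are hypothetical, and sending the high byte first is an assumption about what the display expects, not something taken from the library.

```cpp
#include <cstdint>

// Hypothetical helper: packs 8-bit red, green and blue into a 16-bit 565 color
// (5 bits red, 6 bits green, 5 bits blue).
uint16_t packRgb565(uint8_t r, uint8_t g, uint8_t b) {
  return static_cast<uint16_t>(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

// The two bytes that would be written per pixel.
// Assumption: the high byte is sent first.
uint8_t highByte565(uint16_t color) { return static_cast<uint8_t>(color >> 8); }
uint8_t lowByte565(uint16_t color) { return static_cast<uint8_t>(color & 0xFF); }
```

On the hardware, each pixel would then be two calls to the single-byte write method described above, one per byte.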

Of course this may change the behavior of the write function in the sercom library in some way. But it seems to work just fine with the SD card reading and such.

void SERCOM::writeDataSPI(uint8_t data)
{

/*
  while( sercom->SPI.INTFLAG.bit.DRE == 0 )
  {
    // Waiting Data Registry Empty
  }

  sercom->SPI.DATA.bit.DATA = data; // Writing data into Data register

  while( sercom->SPI.INTFLAG.bit.TXC == 0 || sercom->SPI.INTFLAG.bit.DRE == 0 )
  {
    // Waiting Complete Transmission
  }
*/

  sercom->SPI.DATA.bit.DATA = data; // Initiate byte transfer.
  while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.

  // Is RXC cleared when writing data to the transfer buffer?

}

uint16_t SERCOM::readDataSPI()
{
  while( sercom->SPI.INTFLAG.bit.DRE == 0 || sercom->SPI.INTFLAG.bit.RXC == 0 )
  {
    // Waiting Complete Reception
  }

  return sercom->SPI.DATA.bit.DATA;  // Reading data
}

Looking at this again now, I see another optimization I might be able to do. I see no reason why, if I have arranged the write function to wait until the last bit of the returned byte shows up, that I cannot change the read function to simply return the contents of the DATA register, and skip those checks above.

With the optimized block transfer functions in place, this probably won't speed up the SD card reading noticeably, but there's no point in leaving unnecessary checks in there.

On the other hand, the read method may not even be being called any more, because the transfer method now calls an optimized transfer method in the sercom library. But maybe there's some hidden calls somewhere in the SD lib.

[edit]

Yeah, I just tried commenting out those checks and I don't see a change in speed, so I don't think the read method is being called any more.

@scswift: I tried to compile your demo (on the Pixel), but I am getting compiling errors - it seems it cannot find the new SERCOM functions. I added SERCOM into my libraries as I did with SPI and all the others in your zip file, but while the compiler recognizes the replacement for the SPI library, it obviously doesn't for the SERCOM library. What do I have to do in order to fix this?
Thanks a lot,
Willi
oe1wkl

A few related updates on this topic:

  1. Pull request #180 has been merged and is included as of the 1.6.9 SAMD core release. This improves the performance of the SPI.transfer(...) function.

  2. A new public facing API has been added to the SD library to select the SPI frequency (see pull request #25): SD.begin(spiFrequency, csPin)

Both of these changes increase SD card read performance.

Do we have a schematic for this?

Is it the same wiring as for an SD card on the UNO?