
Topic: Faster SPI on the Zero?

scswift

Quote
Hi Shawn,

Good work on these optimizations!

Also, even though I think you already found it, there is a delayMicroseconds(1) in the original Adafruit library for the OLED display that could be removed without causing any issue. You can then simply replace the "spitransfer" calls with "SPI.transfer".

Heh, the delayMicroseconds() was the first thing to go.  It also would only have affected the BMP drawing code if I were in fact using drawPixel as they did in the original example, which was also one of the first things I got rid of.

Here's the code I'm using for the raw image drawing:

Code: [Select]

// This function blits a full screen, raw, 16 bit 565 RGB color image to the display from the SD card.
void rawFullSPI(char *filename) {
 
  File f;
  uint8_t buffer[512]; // Buffer two full rows at a time - 512 bytes.  This is the same size as an SD card block.
  uint8_t *b, *bmax; // Pointers into the buffer.
 
  // Specify size of region to be drawn.

    tft.writeCommand(SSD1351_CMD_SETCOLUMN);
    tft.writeData(0);
    tft.writeData(127);
   
    tft.writeCommand(SSD1351_CMD_SETROW);
    tft.writeData(0);
    tft.writeData(127);


  // Draw bitmap.
   
    tft.writeCommand(SSD1351_CMD_WRITERAM); // Tell display we're going to send it image data in a moment. (Not sure if necessary.)
    digitalWrite(my_dc, HIGH); // Set DATA/COMMAND pin to DATA.   
   
    f = SD.open(filename); // Open file for reading.
    //f.read(buffer, 512);
   
    for (byte row = 0; row < 128; row+=2) { // 2.79FPS without SPI_FULL_SPEED in SPI.begin, 3.75FPS with it.
     
      f.read(buffer, 512); // Read the next two rows from the card into the image buffer.
      // 2.79FPS when doing this read. 6.42 FPS when not doing this read.  (2.3x as fast)   
      // With new block transfer optimization, 7.15 FPS when doing this read, and 20.18 FPS when not doing this read.
      // The reason the screen goes white when doing this is because the buffer we're using to transmit is also the receive buffer, so it is overwritten on the first go round.
     
      /*
      b = buffer;
      bmax = b+512; // Calculate when we should stop and read the next two rows.   
     
      digitalWrite(my_cs, LOW); // Tell display to pay attention to the incoming data.
             
      while (b < bmax) { // Write both rows to the display.
        SPI.transfer(*b); // Write low byte.
        b++;
      }
      */

      digitalWrite(my_cs, LOW); // Tell display to pay attention to the incoming data.

      SPI.beginTransaction(SPISettings(12000000, MSBFIRST, SPI_MODE0)); // Adding this boosts speed to over 9FPS when not reading from SD card.  So reading from SD card is 2x as slow as this?
      SPI.transfer(buffer, 512); 
      SPI.endTransaction();
                 
      digitalWrite(my_cs, HIGH); // Tell display we're done talking to it for now, so the next SD read doesn't corrupt the screen.
     
    } 
   
    f.close(); // Close the file.
       
}
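
The question in the comments above ("So reading from SD card is 2x as slow as this?") can be estimated from the two figures quoted there, 7.15 FPS with the read and 20.18 FPS without it. This is only a rough sketch: it assumes the SD read time and the display write time simply add up with no overlap, and the function names are made up for illustration.

```cpp
#include <cassert>

// Frame geometry from the post: 128 x 128 pixels, 2 bytes per pixel.
constexpr double kBytesPerFrame = 128.0 * 128.0 * 2.0; // 32768 bytes

// Seconds spent on one frame at a given frame rate.
double secondsPerFrame(double fps) {
  assert(fps > 0.0);
  return 1.0 / fps;
}

// Time per frame attributable to SD reads: frame time with reads minus
// frame time without them (assumes the two costs do not overlap).
double sdSecondsPerFrame(double fpsWithRead, double fpsWithoutRead) {
  return secondsPerFrame(fpsWithRead) - secondsPerFrame(fpsWithoutRead);
}

// Effective SD read throughput in bytes per second under that assumption.
double sdBytesPerSecond(double fpsWithRead, double fpsWithoutRead) {
  return kBytesPerFrame / sdSecondsPerFrame(fpsWithRead, fpsWithoutRead);
}
```

With 7.15 and 20.18 FPS this gives roughly 90 ms per frame for the SD reads against roughly 50 ms for the display writes, so under these assumptions the card costs a bit less than twice what the display does.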



Quote
I've been able to go from 16 FPS to 23 FPS while running my 3D Vector demo with those simple modifications coupled with SPI.setClockDivider(4).

Could you please provide your source code so I can test it on my similar setup? :)

I'll zip everything up as it stands right now so you can try it out for yourself and post that shortly.

As for SPI.setClockDivider(4), I found that it no longer works reliably because, for example, the SD library changes the settings with SPI.beginTransaction() and they aren't reverted afterward.  You're now supposed to call beginTransaction() before you start sending data, so that multiple SPI devices with different data rates and such can share the bus.
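
To illustrate why a one-time setClockDivider() call stops being reliable once another library uses transactions, here is a small host-side sketch. MockBus is a hypothetical stand-in, not the real SPI class: it only remembers the last clock that was set. The 48 MHz reference and the 4 MHz SD clock are assumptions picked for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a shared SPI bus; it only tracks the clock rate.
struct MockBus {
  uint32_t clockHz = 0;

  // Assumes the divider is applied to a 48 MHz reference clock.
  void setClockDivider(uint32_t divider) {
    assert(divider > 0);
    clockHz = 48000000u / divider;
  }
  void beginTransaction(uint32_t hz) { clockHz = hz; }
  void endTransaction() {} // Settings are left as they are, not reverted.
};

// Some other library (for example an SD driver) touching the bus.
void sdLibraryAccess(MockBus &bus, uint32_t sdClockHz) {
  bus.beginTransaction(sdClockHz);
  bus.endTransaction();
}

// Old style: relies on whatever clock happens to be configured right now.
uint32_t displayWriteLegacy(MockBus &bus) { return bus.clockHz; }

// Transaction style: states its own settings before every burst.
uint32_t displayWriteTransactional(MockBus &bus) {
  bus.beginTransaction(12000000u);
  uint32_t used = bus.clockHz;
  bus.endTransaction();
  return used;
}
```

In this mock, a legacy write after sdLibraryAccess() runs at the SD clock rather than the one the divider selected, while the transactional write always gets its 12 MHz.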

scswift

#16
Nov 09, 2015, 09:49 am Last Edit: Nov 09, 2015, 09:54 am by scswift
Here ya go, all the modified source files, plus the demo code, the bitmaps, and the raw images:
http://rabidprototypes.com/wp-content/uploads/2015/11/pixel_speedtest.zip

Note that I'm not really trying to optimize the BMP drawing code here, just the RAW images.  There's a ton of optimization that could be done to that BMP reader I'm sure, but what I'm concerned with is getting SPI block transfers and SD card reading as fast as possible.

scswift

I found one more small optimization, which increases the speed of RAW image reading to 7.36 FPS, but this optimization breaks the correct behavior of the SPI library: instead of the buffer you passed to the transfer function having the received data in it, I simply discard the data:

Code: [Select]

// This function transmits a buffer but discards the received data.
void SERCOM::transferDataSPI(void *buf, uint32_t count)
{
  uint8_t *buffer = reinterpret_cast<uint8_t *>(buf);

  while(count-- > 0) {
//sercom->SPI.DATA.bit.DATA = *buffer; // Initiate byte transfer.
     sercom->SPI.DATA.bit.DATA = *buffer++; // Initiate byte transfer.
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     //*buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
  }

}


Also I checked whether that & 0xFF is needed.  Seems it is not, but it also doesn't affect the speed if removed.

I think I'm gonna try to modify the loop so that the byte transfer is initiated just before the jump instead of jumping after reading the received byte.  The processor is probably fast enough that this optimization won't have much if any effect, but it helped a lot on the AVR.

AloyseTech

Thanks for the code, I'll try that ASAP.

Have you tried replacing each "spiwrite(c)" with "SPI.transfer(c)" in SSD1351.cpp? Since we use hardware SPI, I guess the test on _sid is not necessary, so we always end up using the SPI.transfer method anyway. It may not speed up the blitting from SD, but it could help speed things up with the standard graphics functions.

Off topic: what tool do you use to convert a BMP or any image into a raw 16-bit file?

scswift

Huh, so by rearranging the loop so the jump happens while the byte transfer is happening, I get that same speed boost to 7.33 FPS without discarding the received data:

Code: [Select]

void SERCOM::transferDataSPI(void *buf, uint32_t count)
{
  uint8_t *buffer = reinterpret_cast<uint8_t *>(buf);

/*
  while(count-- > 0) {
 sercom->SPI.DATA.bit.DATA = *buffer; // Initiate byte transfer.
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     *buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
  }
*/
  
  sercom->SPI.DATA.bit.DATA = *buffer; // Initiate byte transfer.

  while(count-- > 1) {
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     *buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
     sercom->SPI.DATA.bit.DATA = *buffer; // Initiate byte transfer.
  }

  while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
  *buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.

}
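
To check that the rearranged loop really moves the same data as the straightforward one, here is a host-side sketch against a mock peripheral. MockSpi is hypothetical: it completes every transfer instantly and answers each byte with its bitwise complement, so it verifies ordering and data only, not timing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical one-byte SPI peripheral. The "slave" replies with ~byte.
struct MockSpi {
  uint8_t rx = 0;
  bool rxc = false;          // "Receive complete" flag.
  std::vector<uint8_t> sent; // Log of bytes placed on the wire.

  void write(uint8_t b) {
    sent.push_back(b);
    rx = static_cast<uint8_t>(~b);
    rxc = true;
  }
  uint8_t read() {
    rxc = false; // Reading the data clears the flag in this mock.
    return rx;
  }
};

// Straightforward loop: start a byte, wait, read it back, repeat.
void transferSimple(MockSpi &spi, uint8_t *buffer, size_t count) {
  while (count-- > 0) {
    spi.write(*buffer);
    while (!spi.rxc) {}
    *buffer++ = spi.read();
  }
}

// Rearranged loop: the next byte is started before the loop jumps back.
// Note that this form always sends at least one byte, so count must be >= 1.
void transferPipelined(MockSpi &spi, uint8_t *buffer, size_t count) {
  assert(count > 0);
  spi.write(*buffer);
  while (count-- > 1) {
    while (!spi.rxc) {}
    *buffer++ = spi.read();
    spi.write(*buffer);
  }
  while (!spi.rxc) {}
  *buffer = spi.read();
}
```

Both functions leave the same received bytes in the buffer and put the same bytes on the wire. The one behavioral difference this sketch brings out is the count == 0 case, which the rearranged loop does not handle.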



And if I discard the received bytes, which, again, is not compatible with how the SPI.transfer() function is supposed to work, I get 7.58 FPS:

Code: [Select]

// This function transmits a buffer but discards the received data.
void SERCOM::transferDataSPI(void *buf, uint32_t count)
{
  uint8_t *buffer = reinterpret_cast<uint8_t *>(buf);

/*
  while(count-- > 0) {
 sercom->SPI.DATA.bit.DATA = *buffer; // Initiate byte transfer.
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     *buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
  }
*/
  
  sercom->SPI.DATA.bit.DATA = *buffer++; // Initiate byte transfer.

  while(count-- > 1) {
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     //*buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
     sercom->SPI.DATA.bit.DATA = *buffer++; // Initiate byte transfer.
  }

  while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
  //*buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; ; // Read received byte, then increment pointer into buffer.

}



Of course, part of the reason I'm not seeing as big a speed increase as I expect is the SD card reading is slower.  When I remove that from the equation, I get 24.23 FPS, which is a full 4 FPS faster than the previous best.  But it's also still half as fast as what I should be getting if my calculations are correct, so I'm not sure what's going on there.
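
For reference, the "half as fast" estimate can be worked out from the frame size and the SPI clock. This is a back-of-the-envelope sketch that assumes a 12 MHz SCK with no gaps between bytes and a 48 MHz CPU clock; the function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kWidth = 128;
constexpr uint32_t kHeight = 128;
constexpr uint32_t kBitsPerPixel = 16;

// Bits that must cross the bus for one full frame: 128 * 128 * 16 = 262144.
constexpr uint32_t bitsPerFrame() { return kWidth * kHeight * kBitsPerPixel; }

// Upper bound on frame rate if the bus never idles.
double maxFramesPerSecond(double sckHz) {
  assert(sckHz > 0.0);
  return sckHz / bitsPerFrame();
}

// CPU cycles actually spent per transmitted byte at a measured frame rate.
double cpuCyclesPerByte(double measuredFps, double cpuHz) {
  assert(measuredFps > 0.0);
  return cpuHz / (measuredFps * (bitsPerFrame() / 8.0));
}
```

At 12 MHz the ceiling comes out near 45.8 FPS, so 24.23 FPS is a little over half of it. Seen per byte, that is roughly 60 CPU cycles spent for every byte, where the eight bits themselves only need 32 cycles at this clock ratio.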

scswift

Quote
Thanks for the code, I'll try that ASAP.

Have you tried replacing each "spiwrite(c)" with "SPI.transfer(c)" in SSD1351.cpp? Since we use hardware SPI, I guess the test on _sid is not necessary, so we always end up using the SPI.transfer method anyway. It may not speed up the blitting from SD, but it could help speed things up with the standard graphics functions.

Off topic: what tool do you use to convert a BMP or any image into a raw 16-bit file?

I have not modified the Adafruit library beyond removing that delayMicroseconds() line, because right now I'm just trying to speed up the SPI and SD card reading.  I'm sure that library could be sped up a great deal, but I don't need that to be super fast at the moment, and until I know I've got the SPI library running at full tilt, there's no point in optimizing elsewhere anyway.

The BMP converter tool I used is here:
http://elm-chan.org/fsw_e.html
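
For anyone who would rather convert images themselves, the "16 bit 565" format packs each pixel as 5 bits of red, 6 of green and 5 of blue. Here is a minimal sketch of that packing. The byte order written to the file (high byte first) is an assumption and should be checked against the display and against whatever the converter tool produces.

```cpp
#include <cassert>
#include <cstdint>

// Pack 8-bit R, G, B into one RGB565 value: RRRRRGGG GGGBBBBB.
constexpr uint16_t packRgb565(uint8_t r, uint8_t g, uint8_t b) {
  return static_cast<uint16_t>(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

// Split a pixel into two bytes, high byte first (assumed file/wire order).
inline void toBytesHighFirst(uint16_t pixel, uint8_t out[2]) {
  out[0] = static_cast<uint8_t>(pixel >> 8);
  out[1] = static_cast<uint8_t>(pixel & 0xFF);
}
```

A 128 x 128 image converted this way is 32768 bytes, which is what the rawFullSPI() loop reads in 64 blocks of 512.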

scswift

After updating the other two transfer functions, used by the SD lib, to have the same loop optimization I got another slight speed increase for RAW image blitting to 7.75 FPS:

Code: [Select]

// This function transmits a buffer but discards the received data.
void SERCOM::transferDataSPI(void *buf, uint32_t count)
{
  uint8_t *buffer = reinterpret_cast<uint8_t *>(buf);

/*
  while(count-- > 0) {
sercom->SPI.DATA.bit.DATA = *buffer; // Initiate byte transfer.
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     *buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
  }
*/
 
  sercom->SPI.DATA.bit.DATA = *buffer++; // Initiate byte transfer.

  while(count-- > 1) {
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     //*buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
     sercom->SPI.DATA.bit.DATA = *buffer++; // Initiate byte transfer.
  }

  while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
  //*buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; ; // Read received byte, then increment pointer into buffer.

}

// This function transmits bytes and returns the received data in a buffer.
void SERCOM::transferDataSPI(void *buf, uint32_t count, uint8_t transmit)
{
  uint8_t *buffer = reinterpret_cast<uint8_t *>(buf);
 
  sercom->SPI.DATA.bit.DATA = transmit; // Initiate byte transfer.

  while(count-- > 1) {
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     *buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
     sercom->SPI.DATA.bit.DATA = transmit; // Initiate byte transfer.
  }

  while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
  *buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.

}

// This function transmits bytes but discards the received data.
void SERCOM::transferDataSPI(uint32_t count, uint8_t transmit)
{

  sercom->SPI.DATA.bit.DATA = transmit; // Initiate byte transfer.

  while(count-- > 1) {
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     sercom->SPI.DATA.bit.DATA = transmit; // Initiate byte transfer.
  }

  while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.

}


Again, the first function is not compatible with the SPI library: it does not return any data in the buffer you pass, it only sends data.

AloyseTech

With the source code you uploaded on your website, I have 1.05 FPS for BMP and 5.06 FPS for raw BIN files.

scswift

Well, either you missed changing something, put some file in the wrong folder, or the SD card you're using is slow, perhaps?  I gave you the most up to date version I had.

AloyseTech

#24
Nov 09, 2015, 06:29 pm Last Edit: Nov 09, 2015, 06:31 pm by AloyseTech
My uSD card is a SanDisk 1GB, there's nothing else marked on it and I guess it is in fact pretty old. Maybe this is the issue.

I've put sercoms files in Arduino15/.../core, spi in arduino15/.../libraries/SPI, and Sd in arduinoApp/.../libraries/SD/src

I've tried with both your version and mine of SSD1351 but there is no change.

How do you test the code without the SD card? Do you put the .bin file in the code as a const int16 array?

scswift

#25
Nov 09, 2015, 07:06 pm Last Edit: Nov 09, 2015, 07:07 pm by scswift
I modify the RAW function like so, so it just reads one block from the card:
(Note the two f.read(buffer, 512) lines.)

Code: [Select]

// This function blits a full screen, raw, 16 bit 565 RGB color image to the display from the SD card.
void rawFullSPI(char *filename) {
 
  File f;
  uint8_t buffer[512]; // Buffer two full rows at a time - 512 bytes.  This is the same size as an SD card block.
  uint8_t *b, *bmax; // Pointers into the buffer.
 
  // Specify size of region to be drawn.

    tft.writeCommand(SSD1351_CMD_SETCOLUMN);
    tft.writeData(0);
    tft.writeData(127);
   
    tft.writeCommand(SSD1351_CMD_SETROW);
    tft.writeData(0);
    tft.writeData(127);


  // Draw bitmap.
   
    tft.writeCommand(SSD1351_CMD_WRITERAM); // Tell display we're going to send it image data in a moment. (Not sure if necessary.)
    digitalWrite(my_dc, HIGH); // Set DATA/COMMAND pin to DATA.   
   
    f = SD.open(filename); // Open file for reading.
    f.read(buffer, 512);

    for (byte row = 0; row < 128; row+=2) { // 2.79FPS without SPI_FULL_SPEED in SPI.begin, 3.75FPS with it.
     
      //f.read(buffer, 512); // Read the next two rows from the card into the image buffer.
      // 2.79FPS when doing this read. 6.42 FPS when not doing this read.  (2.3x as fast)   
      // With new block transfer optimization, 7.15 FPS when doing this read, and 20.18 FPS when not doing this read.
      // The reason the screen goes white when doing this is because the buffer we're using to transmit is also the receive buffer, so it is overwritten on the first go round.
     
      /*
      b = buffer;
      bmax = b+512; // Calculate when we should stop and read the next two rows.   
     
      digitalWrite(my_cs, LOW); // Tell display to pay attention to the incoming data.
             
      while (b < bmax) { // Write both rows to the display.
        SPI.transfer(*b); // Write low byte.
        b++;
      }
      */

      // Moving all the extra stuff here  outside the for loop and getting rid of SD reads gives 24.75 FPS, which is still slower than expected. 24.23 FPS with these in the loop.
      // Skipping the file opening and closing for three different images by using a loop inside this function does not improve performance much.  Still 24.71 FPS.
      // Unrolling the transfer loop didn't seem to improve things at all. 
         
      digitalWrite(my_cs, LOW); // Tell display to pay attention to the incoming data.

      //noInterrupts(); // 7.65 -> 7.70 FPS
      SPI.beginTransaction(SPISettings(12000000, MSBFIRST, SPI_MODE0));
      SPI.transfer(buffer, 512);
      SPI.endTransaction();
      //interrupts();
                 
      digitalWrite(my_cs, HIGH); // Tell display we're done talking to it for now, so the next SD read doesn't corrupt the screen.
     
    } 
 
    f.close(); // Close the file.
       
}


With that change I get over 24fps, but of course the images don't display properly.  They don't need to though, I'm just testing how fast I can output a full screen of data.
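
The reason the image is lost after the first pass can be shown with a tiny host-side model of an in-place block transfer. This is only a sketch: transferInPlace is a made-up name, and the assumption that the unanswered input line reads back as 0xFF is inferred from the screen going white, not measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-place block transfer. Each byte is sent, then replaced by
// the byte "received" during the same clocks. The display never answers, so
// this model assumes the input reads as 0xFF.
// Returns the bytes that actually went out on the wire.
std::vector<uint8_t> transferInPlace(uint8_t *buffer, size_t count) {
  std::vector<uint8_t> wire;
  wire.reserve(count);
  for (size_t i = 0; i < count; ++i) {
    wire.push_back(buffer[i]); // Transmit the current contents.
    buffer[i] = 0xFF;          // Overwrite with the received byte.
  }
  return wire;
}
```

Calling it twice on the same buffer without refilling it sends the real pixels once and then nothing but 0xFF, which in 565 format is white. That matches the benchmark variant above, where the buffer is only read from the card once before the loop.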

But even though 24fps seems fast, it's only half of what it should be capable of and I can't for the life of me figure out why.  I've checked everything I can think of.  I even went so far as to make sure the sercom lib was calculating the baud rate correctly.  And there is no 2x multiplier bit for the SPI like there is on the AVR so that can't be set wrong either. 
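
As a cross-check on the baud rate calculation, here is the arithmetic as a sketch. It assumes the synchronous-mode relation f_sck = f_ref / (2 * (BAUD + 1)) and a 48 MHz reference clock. Both should be verified against the datasheet and the core's clock setup, and the function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// BAUD register value for a requested SCK, assuming
//   f_sck = f_ref / (2 * (BAUD + 1)).
inline uint32_t baudRegisterFor(uint32_t refHz, uint32_t sckHz) {
  assert(sckHz > 0 && refHz >= 2u * sckHz);
  return refHz / (2u * sckHz) - 1u;
}

// SCK that a given BAUD register value produces under the same relation.
inline uint32_t sckFromBaudRegister(uint32_t refHz, uint32_t baud) {
  return refHz / (2u * (baud + 1u));
}
```

Under these assumptions a 12 MHz request gives BAUD = 1, and BAUD = 0 would give 24 MHz, so the register value alone would not explain a factor-of-two shortfall.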

AloyseTech

#26
Nov 09, 2015, 07:46 pm Last Edit: Nov 09, 2015, 08:05 pm by AloyseTech
I think you uploaded the wrong SD files. I added static SPISettings settings(12000000, MSBFIRST, SPI_MODE0); at the beginning of the file, and now I get 1.57 FPS and 8.96 FPS for BMP and RAW respectively, as well as 22.4 FPS for RAW without updating the buffer (no f.read(buffer, 512) in the loop).

scswift

At the beginning of which file?

And when you say static SPISettings do you mean you added the static keyword in front?  Why?

scswift

#28
Nov 09, 2015, 11:58 pm Last Edit: Nov 09, 2015, 11:59 pm by scswift
Ah, my bad, I think I did forget to include one file, because I forgot I'd modified it. 

In SD.cpp:

Code: [Select]

boolean SDClass::begin(uint8_t csPin) {
  /*

    Performs the initialisation required by the sdfatlib library.

    Return true if initialization succeeds, false otherwise.

   */
  return card.init(SPI_FULL_SPEED, csPin) && // *** MODIFIED from SPI_HALF_SPEED ***
         volume.init(card) &&
         root.openRoot(volume);
}



Of course, in my defense, having the SD library default to half speed is really dumb.  I mean, I know they did it because some people have long wires attached to their SD cards or crappy SD shields with resistors instead of a level shifter, but come on.  There's no indication, unless you look under the hood, that the library is slowing you down, and no way short of modifying the library itself to fix it.  And worst of all, every time you reinstall the IDE it's going to go back to the way it was, and the search for the cause will begin anew, because it's easy to forget you changed that.  Like I just did.

AloyseTech

At the beginning of sd2card.h I modified

Code: [Select]
static SPISettings settings;
to
Code: [Select]
static SPISettings settings(12000000, MSBFIRST, SPI_MODE0);

and I changed the default speed to SPI_FULL_SPEED like you. It's strange that our results are different...
