Arduino Forum

Products => Arduino Due => Topic started by: weird_dave on Nov 23, 2016, 01:24 pm

Title: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Nov 23, 2016, 01:24 pm
I did a quick test to see how long it takes to send a UDP packet of 100 bytes. I've changed the SPI speed to 28MHz by editing w5500.cpp:
Code: [Select]
SPISettings wiznet_SPI_settings(28000000, MSBFIRST, SPI_MODE0);

I wrote a quick bit of code that sets a pin high before the transfer and low after it, and added a timer so that it repeats every 1ms.

Code: [Select]

const int EthTxBuf_Size = 100;
byte EthTxBuf[EthTxBuf_Size];

void loop()
{
  const unsigned long looptime = 1000;
  current_micros = micros();
  digitalWrite(Testpin, HIGH);
  Udp.beginPacket(Remote_IP, Remote_Port);
  Udp.write(EthTxBuf, EthTxBuf_Size);
  Udp.endPacket();
  digitalWrite(Testpin, LOW);
  while ((micros()- current_micros)<looptime)
  {
   
  }
}


The Testpin is high for almost 400us. Looking at the SPI clock, I can see a fair amount of dead time between each transfer. I would expect some software overhead, but during the portion where it is just sending the 100 bytes to the W5500, I am seeing a byte sent in 285ns (which is right for 28MHz) and the next one 1464ns later (one every 1749ns). That means the routine sending the bytes is only doing so for 16% of the time. What's it doing with the other 123 clock cycles (84MHz)?
Is there a way to improve this situation?
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Nov 24, 2016, 06:40 pm
Are you using Ethernet2 from Adafruit, or from Arduino.org, or some other source?  They're not all the same code, and small details matter greatly for performance.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Nov 24, 2016, 07:21 pm
I used the menu within the IDE (1.6.13 .cc not .org) to select it from the library manager which then downloaded it.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Nov 26, 2016, 03:43 am


sketch_nov25a: In function 'void loop()':
sketch_nov25a:7: error: 'current_micros' was not declared in this scope
   current_micros = micros();
   ^
sketch_nov25a:8: error: 'Testpin' was not declared in this scope
   digitalWrite(Testpin, HIGH);
                ^
sketch_nov25a:9: error: 'Udp' was not declared in this scope
   Udp.beginPacket(Remote_IP, Remote_Port);
   ^
sketch_nov25a:9: error: 'Remote_IP' was not declared in this scope
   Udp.beginPacket(Remote_IP, Remote_Port);
                   ^
sketch_nov25a:9: error: 'Remote_Port' was not declared in this scope
   Udp.beginPacket(Remote_IP, Remote_Port);
                              ^
'current_micros' was not declared in this scope
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Nov 26, 2016, 03:45 am
If you had posted a complete program, I probably would have run it on a couple different boards (I have nearly all Arduino boards here).  But the reality is I'm really not feeling like filling in the missing parts, just to figure out why Due performs badly.

That may or may not have led to anything useful, but if you couldn't be bothered to give a complete program that lets me start looking without guesswork, why should I go to the trouble?  At least you can see I did copy your code into an Arduino window and click Verify, but that's all.

Maybe someone else will take a stronger interest to look into what's wrong?
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Nov 28, 2016, 09:27 am
I certainly can post a complete program. It's not that I couldn't be bothered; I just didn't realise it was the norm to post the entire thing. How much code is too much?

Code: [Select]
#include <Ethernet2.h>

// Edit the SPI speed in C:\Users\[USERNAME]\Documents\Arduino\libraries\Ethernet2\src\utility\w5500.cpp
//  to 28000000, 28MHz, line 25: (copy/paste the following is easiest)
//  SPISettings wiznet_SPI_settings(28000000, MSBFIRST, SPI_MODE0);

byte mac[] = { 0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED };
const IPAddress MyIP( 192, 168, 0, 10 );
const IPAddress Remote_IP (192, 168, 0, 1);
const unsigned int Remote_Port = 12345;
const unsigned int Local_Port = 12345;
EthernetUDP Udp;

char packetBuffer[UDP_TX_PACKET_MAX_SIZE];

unsigned long current_micros;

const int Testpin = 52;
void setup()
{
  Ethernet.begin(mac,MyIP);
  Udp.begin(Local_Port);
  pinMode(Testpin, OUTPUT);
}

void loop()
{
  const unsigned long looptime = 1000;
  const int EthTxBuf_Size = 100;
  byte EthTxBuf[EthTxBuf_Size];
  digitalWrite(Testpin, HIGH);
  current_micros = micros();
  Udp.beginPacket(Remote_IP, Remote_Port);
  Udp.write(EthTxBuf, EthTxBuf_Size);
  Udp.endPacket();
  digitalWrite(Testpin, LOW);
  while ((micros()- current_micros)<looptime)
  {
   
  }
}
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: westfw on Nov 30, 2016, 11:55 am
Quote
what's it doing with the other 123 clock cycles (84MHz)?
The UDP write code isn't what I'd call "tight" (it calls SPI::transfer(<one byte>) N times instead of SPI::transfer(<N bytes>), for example, but I guess that's so it doesn't overwrite the buffer with "input"), but it doesn't look like it should be 123 clocks worth...


Code: [Select]
Udp.write(buf, size)
  Socket::bufferData(... buf, size)
    W5500Class::send_data_processing_offset(... buf, len)
       W5500Class::write(... buf, len)
         for (len)
           SPI.transfer(buf[i])
      byte SPIClass::transfer(byte _pin, uint8_t _data, SPITransferMode _mode) {
      uint32_t ch = BOARD_PIN_TO_SPI_CHANNEL(_pin);
819d0: 290a      cmp r1, #10
      // SPI_CSR_DLYBCT(1) keeps CS enabled for 32 MCLK after a completed
      // transfer. Some device needs that for working properly.
      SPI_ConfigureNPCS(spi, ch, mode[ch] | SPI_CSR_SCBR(divider[ch]) | SPI_CSR_DLYBCT(1));
      }

      byte SPIClass::transfer(byte _pin, uint8_t _data, SPITransferMode _mode) {
819d2: b430      push {r4, r5}
      uint32_t ch = BOARD_PIN_TO_SPI_CHANNEL(_pin);
819d4: d027      beq.n 81a26 <SPIClass::transfer(unsigned char, unsigned char, SPITransferMode)+0x56>
819d6: 2904      cmp r1, #4
819d8: d029      beq.n 81a2e <SPIClass::transfer(unsigned char, unsigned char, SPITransferMode)+0x5e>
819da: 2934      cmp r1, #52 ; 0x34
819dc: bf14      ite ne
819de: f44f 25e0 movne.w r5, #458752 ; 0x70000
819e2: f44f 2530 moveq.w r5, #720896 ; 0xb0000
819e6: bf14      ite ne
819e8: 2103      movne r1, #3
819ea: 2102      moveq r1, #2
      // Reverse bit order
      if (bitOrder[ch] == LSBFIRST)
819ec: 4401      add r1, r0
819ee: 7a0c      ldrb r4, [r1, #8]
819f0: b91c      cbnz r4, 819fa <SPIClass::transfer(unsigned char, unsigned char, SPITransferMode)+0x2a>
       */
      __attribute__( ( always_inline ) ) static __INLINE uint32_t __RBIT(uint32_t value)
      {
uint32_t result;

__ASM volatile ("rbit %0, %1" : "=r" (result) : "r" (value) );
819f2: fa92 f2a2 rbit r2, r2
       */
      __attribute__( ( always_inline ) ) static __INLINE uint32_t __REV(uint32_t value)
      {
uint32_t result;

__ASM volatile ("rev %0, %1" : "=r" (result) : "r" (value) );
819f6: ba12      rev r2, r2
      _data = __REV(__RBIT(_data));
819f8: b2d2      uxtb r2, r2
      uint32_t d = _data | SPI_PCS(ch);
      if (_mode == SPI_LAST)
819fa: 2b01      cmp r3, #1
      byte SPIClass::transfer(byte _pin, uint8_t _data, SPITransferMode _mode) {
      uint32_t ch = BOARD_PIN_TO_SPI_CHANNEL(_pin);
      // Reverse bit order
      if (bitOrder[ch] == LSBFIRST)
      _data = __REV(__RBIT(_data));
      uint32_t d = _data | SPI_PCS(ch);
819fc: ea42 0205 orr.w r2, r2, r5
81a00: 6803      ldr r3, [r0, #0]
      if (_mode == SPI_LAST)
      d |= SPI_TDR_LASTXFER;
81a02: bf08      it eq
81a04: f042 7280 orreq.w r2, r2, #16777216 ; 0x1000000

      // SPI_Write(spi, _channel, _data);
      while ((spi->SPI_SR & SPI_SR_TDRE) == 0)
81a08: 6919      ldr r1, [r3, #16]
81a0a: 0789      lsls r1, r1, #30
81a0c: d5fc      bpl.n 81a08 <SPIClass::transfer(unsigned char, unsigned char, SPITransferMode)+0x38>
      ;
      spi->SPI_TDR = d;
81a0e: 60da      str r2, [r3, #12]

      // return SPI_Read(spi);
      while ((spi->SPI_SR & SPI_SR_RDRF) == 0)
81a10: 691a      ldr r2, [r3, #16]
81a12: 07d2      lsls r2, r2, #31
81a14: d5fc      bpl.n 81a10 <SPIClass::transfer(unsigned char, unsigned char, SPITransferMode)+0x40>
      ;
      d = spi->SPI_RDR;
81a16: 6898      ldr r0, [r3, #8]
      // Reverse bit order
      if (bitOrder[ch] == LSBFIRST)
81a18: b914      cbnz r4, 81a20 <SPIClass::transfer(unsigned char, unsigned char, SPITransferMode)+0x50>
       */
      __attribute__( ( always_inline ) ) static __INLINE uint32_t __RBIT(uint32_t value)
      {
uint32_t result;

__ASM volatile ("rbit %0, %1" : "=r" (result) : "r" (value) );
81a1a: fa90 f0a0 rbit r0, r0
       */
      __attribute__( ( always_inline ) ) static __INLINE uint32_t __REV(uint32_t value)
      {
uint32_t result;

__ASM volatile ("rev %0, %1" : "=r" (result) : "r" (value) );
81a1e: ba00      rev r0, r0
      d = __REV(__RBIT(d));
      return d & 0xFF;
      }
81a20: b2c0      uxtb r0, r0
81a22: bc30      pop {r4, r5}
81a24: 4770      bx lr
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Nov 30, 2016, 12:55 pm
I tried the following snippet of code and it has the same deadtime:

Code: [Select]
SPI.beginTransaction(SPISettings(28000000, MSBFIRST, SPI_MODE0));
digitalWrite(FPGA_SPIpin, LOW);
for (int i = 0; i<EthTxBuf_Size; i++)
{
  EthRxBuf[i] = SPI.transfer (byte (EthTxBuf[i]));
}
digitalWrite(FPGA_SPIpin, HIGH);
SPI.endTransaction();


I've attached a scope plot showing the SPI clock (it doesn't resolve particularly well, but it is obvious), there's oodles of deadtime!
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: westfw on Nov 30, 2016, 02:21 pm
Try editing SPI.cpp and changing the occurrences of  SPI_CSR_DLYBCT(1) to  SPI_CSR_DLYBCT(0)

I believe the (1) will cause 32 clocks between transfers, and I believe that the comment about CS is wrong since CS is manipulated manually (SPI_CSR_CSAAT - although I'm not sure why that's missing from setClockDivider()).

Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Nov 30, 2016, 03:20 pm
Doesn't seem to have made any difference, it should have knocked off 381ns (3/4 of a graticule division on the scope plot) but it hasn't.
The location of SPI.cpp I edited was:
C:\Users\[USERNAME]\AppData\Local\Arduino15\packages\arduino\hardware\sam\1.6.9\libraries\SPI\src\SPI.cpp

The scope plot is set to trigger on the falling edge of the /CS pin, there's about 1.2us before the first SPI transaction, I wonder if all (most) of the delay is before the transaction rather than after?

It looks like this may be a suspect:
Code: [Select]
uint32_t ch = BOARD_PIN_TO_SPI_CHANNEL(_pin);
I have been unable to locate this function to see what it is doing.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: westfw on Dec 01, 2016, 05:09 am
Quote
I have been unable to locate [BOARD_PIN_TO_SPI_CHANNEL] to see what it is doing
It's a macro defined in variant.h:
Code: [Select]
#define BOARD_PIN_TO_SPI_CHANNEL(x) \
    (x==BOARD_SPI_SS0 ? 0 : \
    (x==BOARD_SPI_SS1 ? 1 : \
    (x==BOARD_SPI_SS2 ? 2 : 3)))


I can get faster using the multi-byte SPI transfer, but even that leaves almost a byte-time worth of gap between bytes, which is really strange.   There's a byte of buffering (a full byte transmission time to get the next byte ready); it should be able to output back-to-back, I would think.

Current test program:
Code: [Select]
#include <SPI.h>

Spi *myspi;

void setup()
{
  pinMode(13, OUTPUT);
  SPI.begin();
  Serial.begin(115200);
  myspi = SPI.spi;
}

#define DATSIZE 128

byte data[DATSIZE];

void loop() {
  Serial.println("Begin SPI test");
  SPI.beginTransaction(SPISettings(28000000, MSBFIRST, SPI_MODE0));
  digitalWrite(13, LOW);
#if 1
  for (byte i = 0; i < DATSIZE; i++)
  {
    data[i] = i;
  }
  SPI.transfer(data, sizeof(data), SPI_CONTINUE);
#else
  for (byte i = 0; i < DATSIZE; i++)
  {
    SPI.transfer (i, SPI_CONTINUE);
  }
#endif
  digitalWrite(13, HIGH);
  delay(100);
  SPI.endTransaction();
  for (int j = 0; j < 4; j++) {
    Serial.println(myspi->SPI_CSR[j], HEX);
  }
  delay(5000);

}
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Dec 01, 2016, 10:34 am
Thanks for telling me where that macro was, I was struggling to find it!

I was considering using the buffer transfer but I couldn't figure out from the documentation if it's the pointer that should be passed such that it overwrites the buffer with incoming data, else how do you receive? (I have wider concerns than just the Ethernet SPI, I want to use it for transfer with another device too, full duplex style).
This thread:
http://forum.arduino.cc/index.php?topic=407288.0
Suggests the pointer is passed to the function.
Looking at line 218 of SPI.cpp
Code: [Select]
void SPIClass::transfer(byte _pin, void *_buf, size_t _count, SPITransferMode _mode) {
...skipping to line 222
uint8_t *buffer = (uint8_t *)_buf;
...skipping to line 253
// Save read byte
if (reverse)
r = __REV(__RBIT(r));
*buffer = r;
buffer++;


My Cpp isn't strong but it looks like this will do both sending and receiving. Bit of a shame it overwrites the transmit buffer to do it tho :(
There's quite a bunch of commands in there, which I guess accounts for the gap between bytes. I'm seeing a gap a bit longer than a byte transmission time (about 1.4 times).
I'm also seeing what I guess is an interrupt occurring in the middle of the transmission (I've attached a screengrab of it), it's not a huge problem in the grand scheme of things but is annoying :)
(ignore the ends of the waveform, I've got other code running in there too. The first 2 transfers are a transfer16, just for comparison)

If this works, that's an increase in speed of about 3x for a 100 byte SPI transfer, things are starting to look up :)
Now I just need the Ethernet2 library to make use of it :(
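
One possible way around the overwritten transmit buffer is to copy the outgoing data into the receive buffer first and run the in-place transfer on that copy. The following is only a sketch under that assumption: transfer_keep_tx is a made-up helper name, and the in-place routine is passed in as a callable so the code can be compiled and tried on a PC.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: full-duplex transfer that leaves `tx` untouched.
// `xfer` is any callable that sends n bytes from a buffer and overwrites
// that same buffer with the bytes received (i.e. an in-place transfer).
template <typename InPlaceTransfer>
void transfer_keep_tx(InPlaceTransfer xfer, const uint8_t* tx,
                      uint8_t* rx, std::size_t n) {
    std::memcpy(rx, tx, n);  // rx temporarily holds the outgoing bytes
    xfer(rx, n);             // in-place transfer replaces them with received data
}
```

On the Due the callable could be a small lambda wrapping SPI.transfer(buffer, size). The cost is one extra copy of the buffer per transfer.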
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: westfw on Dec 01, 2016, 11:56 am
Quote
it looks like this will do both sending and receiving. Bit of a shame it overwrites the transmit buffer to do it tho
Yes, it does, and it is :-(   It seems that it would be nice if the SPI library had more features.

I've been trying to get some "bare" SPI code to work, so I can figure out whether this is inherent slowness, or some hardware/configuration issue.   But I'm having troubles getting it to work :-(

Ahh, there it goes...  This code produces a nice tight loop, and I'd really expect it to keep that SPI clock going pretty much continuously.  It doesn't :-(  Therefore, something "interesting" is happening!  (I had to modify SPI.h to make SPI.spi a publicly accessible value, but otherwise it has no modifications.)

Code: [Select]
#include <SPI.h>

Spi *myspi;

void setup()
{
  SPI.begin();
  Serial.begin(115200);
  myspi = SPI.spi;
}

void loop() {
  Serial.println("Begin bare metal SPI test");

  SPI.beginTransaction(SPISettings(28000000, MSBFIRST, SPI_MODE0));

  for (byte i = 0; i < 100; i++) {
    while ((myspi->SPI_SR & SPI_SR_TDRE) == 0)
      ; // spin
    myspi->SPI_TDR = SPI_PCS(3) | i;
  }
  SPI.endTransaction();

  Serial.println("Bare metal finished");
  delay(1000);
}



Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Dec 01, 2016, 01:30 pm
Hmmm, My C might not be good enough.
This line
Code: [Select]
myspi->SPI_TDR = SPI_PCS(3) | i;
is similar to:
Code: [Select]
spi->SPI_TDR = d | SPI_PCS(ch);
in SPI.cpp, how did you get 3? And why is SPI_PCS(3) bitwise Or'd with the data to be sent? I couldn't work out what SPI_PCS(3) means.

What does the creation of myspi give us? Is this the only way to gain access to SPI_SR etc..?

I just about managed to alter SPI.h to make it work.... (tho I had to run your demo, I couldn't get it to work within my code).

Not much speed difference from the library; it looks like the time goes in the while loop waiting for the transmit register to clear.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: westfw on Dec 02, 2016, 12:38 am
Quote
in SPI.cpp, how did you get 3? And why is SPI_PCS(3) bitwise Or'd with the data to be sent? I couldn't work out what SPI_PCS(3) means.
The SAM3 SPI controller can drive multiple chip-select (SS) pins as part of the SPI transactions, and the Due SPI library allows 4 separate devices with separate configuration, on one "SPI port."  PCS is "Peripheral Chip Select".
"3" is the default, backward-compatible with the AVR single-SS (I could have used BOARD_PIN_TO_SPI_CHANNEL(BOARD_SPI_DEFAULT_SS), from variant.h, which is what I did manually.)

The write to the SPI data register has data in the low bits and chipselect info in the high bits (apparently the upper bits get copied directly to pins, so they're active-low, which was what was causing my problems. the SPI_PCS macro fixes that.)
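
As a rough illustration of that encoding, here is a host-side C++ sketch. The pin numbers (10, 4, 52) and the resulting constants (0x70000, 0xb0000) are read off the disassembly earlier in this thread; the function names and the exact expression are a reconstruction, so treat it as an assumption rather than the library source.

```cpp
#include <cstdint>

// Channel lookup equivalent to BOARD_PIN_TO_SPI_CHANNEL, using the pin
// numbers seen in the compare instructions of the disassembly (10, 4, 52).
constexpr uint32_t pin_to_channel(uint32_t pin) {
    return pin == 10 ? 0u : pin == 4 ? 1u : pin == 52 ? 2u : 3u;
}

// Active-low, one-of-four chip-select field placed in bits 16..19:
// every bit is 1 except the one for the selected channel.
constexpr uint32_t spi_pcs(uint32_t ch) {
    return (~(1u << ch) & 0xFu) << 16;
}

// Word written to the transmit register: chip select on top, data below.
constexpr uint32_t tdr_word(uint32_t ch, uint8_t data) {
    return spi_pcs(ch) | data;
}
```

Channel 3 gives 0x70000 and channel 2 gives 0xb0000, the same two constants the compiler loads into r5 in the listing.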

myspi is just a pointer to the register set having to do with that SPI port.  By copying it from the library object, I avoided having to figure out whether there was more than one, or which one was used, and I could be sure that I could mix the library code with the bare metal code...

Quote
Not much speed difference than the library
To be honest, I don't know if we'll get any actual speedup at the UDP level.  8 bits every 1750ns (from your first message) is almost 5Mbps, which isn't bad (although the WizNet site says it should do 15Mbps.)  At the moment this is a puzzle in SPI behavior (for me.)

Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: westfw on Dec 02, 2016, 02:06 am
Quote
changing the occurrences of  SPI_CSR_DLYBCT(1) to  SPI_CSR_DLYBCT(0)
This does help significantly for the bare-metal SPI case...

With DLYBCT(1):
(http://forum.arduino.cc/index.php?action=dlattach;topic=437243.0;attach=189179)
With DLYBCT set to zero after beginTransaction():
(http://forum.arduino.cc/index.php?action=dlattach;topic=437243.0;attach=189181)
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: westfw on Dec 02, 2016, 02:52 am
Ah hah!  I get it!  Since the SPI single-byte "transaction" routines are all "return the read value", they MUST wait for the complete transmission to happen, so you get no benefit from the potential overlap of the transmission with other code.  We might as well be bit-banging :-(
The counted version of write() does a little better, but it's still not taking advantage of both the shift-register AND the TX buffer register.

Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Dec 02, 2016, 09:31 am
This is a longshot, but you might try my optimized Ethernet library.

https://github.com/PaulStoffregen/Ethernet

Hopefully if you just put it into Documents/Arduino/libraries it'll override Arduino's version.  Pay attention to the messages about duplicate libraries and which one the Arduino IDE is really using.

This doesn't fix the slowness of Due's SPI library, but it does eliminate the redundant accesses to those Wiznet index registers.  It also uses transactions at the socket level, rather than needlessly starting and stopping the SPI transaction over and over again at the W5500 read/write level.

I have been mostly testing with TCP, and there are still many unsolved mysteries of slowness.  I'm waiting for delivery of a network tap this weekend before I continue work on this... so I can see what the Wiznet chip is really doing with the packets.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Dec 02, 2016, 10:12 am
It seems there is a SPI_CSR_DLYBCT(1) hiding in SPI.h, I had only changed the ones in SPI.cpp, so that explains it a bit more :)
Having made this change, buffer transfers have improved as well, 100 bytes is now about 48us. It's also knocked 50us off the UDP time (450 down to 400us)
I shall try the wiznet library (this was Ethernet2 from the library manager) and see if there's an improvement.
Currently, I can see it is doing byte-by-byte transfers for all of it, so there are massive gaps.

I notice that the wiznet site says 15Mbps, but the datasheet says 80 for SPI and 15 for the Ethernet link, which makes sense given the wiznet library sets the SPI speed to 42MHz :)

Paul, you posted while I was still writing this post :)
I did try your library yesterday (I posted in this thread: https://forum.arduino.cc/index.php?topic=438559.0)
I couldn't get the retry and timeout to work, I really need these as I need to 'give up early' and carry on, I'm trying to get my looptime down
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Dec 02, 2016, 12:05 pm
I've tried the Wiznet library (at 28MHz) and it does seem to be faster, the UDP packet is down to about 270us. Clearly the wiznet library isn't using the faster buffer transfer :(

For testing, I've got a 100 byte SPI buffer transfer (not to the W5500) followed by the UDP transfer, just so I can compare on the fly. Here's the really odd thing: the wiznet library is causing a minor slowdown in the SPI time! A 100 byte transfer over SPI takes 48.6us with the Ethernet2 library in use and 52us when I use the Wiznet library. It's not a huge amount, but it is very noticeable when you are looking at 10us/div on the scope, as it crosses the 5 div boundary! Looking at the waveform, it is a small increase in deadtime between the SPI bursts, very strange!
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Dec 02, 2016, 01:43 pm
I've tidied up my test code so it can be posted here.
It's currently set up to use Paul's library (and wiznet by folder renaming), there are commented out includes at the start for changing to Ethernet2, along with lines 42-45 for the timeout and retry configuration.

The wiznet library seems the fastest of the 3, tho there is the oddity of it increasing other SPI deadtime. None of the libraries seem to be using "transfer (buffer, size);", this would boost the speed enormously without resorting to bare-metal code.

Code: [Select]
#include <SPI.h>
//#include <Ethernet2.h>
#include "Ethernet.h"
#include "w5100.h"
//#include <utility/w5500.h>
// Edit the SPI speed in C:\Users\[USERNAME]\Documents\Arduino\libraries\Ethernet2\src\utility\w5500.cpp
//  to 28000000, 28MHz, line 25: (copy/paste the following is easiest)
//  SPISettings wiznet_SPI_settings(28000000, MSBFIRST, SPI_MODE0);
//
// Edit occurrences of SPI_CSR_DLYBCT(1) to SPI_CSR_DLYBCT(0) in
//  C:\Users\[USERNAME]\AppData\Local\Arduino15\packages\arduino\hardware\sam\1.6.9\libraries\SPI\src\SPI.cpp
// and C:\Users\[USERNAME]\AppData\Local\Arduino15\packages\arduino\hardware\sam\1.6.9\libraries\SPI\src\SPI.h
//

byte mac[] = { 0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED };
const IPAddress MyIP( 192, 168, 0, 10 );
const IPAddress Cab1_IP (192, 168, 0, 2); //this address exists
const IPAddress Cab2_IP (192, 168, 0, 1); //this doesn't, for testing retry and timeout
unsigned int local_Port = 12345;
unsigned int Cab1_Port = 12345;
unsigned int Cab2_Port = 12345;
EthernetUDP Udp;

const int EthTxBuf_Size = 100;
byte EthTxBuf[EthTxBuf_Size];
byte EthRxBuf[EthTxBuf_Size];
unsigned long current_micros;
const unsigned long looptime = 2000;
const int Testpin = 52;
const int Errorpin = 22;
const int Framepin = 32;
const int FPGA_SPIpin = 4;

void setup()
{
  Ethernet.begin(mac,MyIP);
  Udp.begin(local_Port);
  pinMode(Testpin, OUTPUT);
  pinMode(Errorpin, OUTPUT);
  pinMode(Framepin, OUTPUT);
  pinMode(FPGA_SPIpin, OUTPUT);
  //w5500.setRetransmissionCount(1);
  //w5500.setRetransmissionTime(1);
  W5100.setRetransmissionCount(uint8_t(1));
  W5100.setRetransmissionTime(uint16_t(1));
}

void loop()
{
  current_micros = micros();
  for (byte j=0; j<EthTxBuf_Size; j++)
  {
    EthTxBuf[j] = j;
  }
  digitalWrite(Framepin, HIGH);
  digitalWrite(Framepin, LOW);
  digitalWrite(Testpin, HIGH);

  Udp.beginPacket(Cab1_IP, Cab1_Port);
  Udp.write(EthTxBuf, EthTxBuf_Size);
  if (Udp.endPacket() == 0)
  {
    digitalWrite(Errorpin, HIGH);
    digitalWrite(Errorpin, LOW);
  }
 
  digitalWrite(Testpin, LOW);

  for (byte j=0; j<EthTxBuf_Size; j++)
  {
    EthTxBuf[j] = j;
  }

  SPI.beginTransaction(SPISettings(28000000, MSBFIRST, SPI_MODE0));
  digitalWrite(FPGA_SPIpin, LOW);
  SPI.transfer (&EthTxBuf, EthTxBuf_Size);
  digitalWrite(FPGA_SPIpin, HIGH);
  SPI.endTransaction();
 
  digitalWrite(Testpin, HIGH);
  Udp.beginPacket(Cab2_IP, Cab2_Port);
  Udp.write(EthTxBuf, EthTxBuf_Size);
  if (Udp.endPacket() == 0)
  {
    digitalWrite(Errorpin, HIGH);
    digitalWrite(Errorpin, LOW);
  }
  digitalWrite(Testpin, LOW);
  while ((micros()- current_micros)<looptime)
  {
   
  }
}

Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Dec 05, 2016, 09:57 pm
Quote
None of the libraries seem to be using "transfer (buffer, size);", this would boost the speed enormously without resorting to bare-metal code.
I decided to try a quick sanity check for this theory.  I ran this code on an Arduino Due:

Code: [Select]

#include <SPI.h>

void setup() {
  SPI.begin();
  pinMode(10, OUTPUT);
}

void loop() {
  uint8_t data[5] = {0x55, 0x5A, 0x49, 0xAA, 0x96};
  digitalWrite(10, LOW);
  SPI.beginTransaction(SPISettings(25000000, MSBFIRST, SPI_MODE0));
  SPI.transfer(data, 5);
  SPI.endTransaction();
  digitalWrite(10, HIGH);
  delay(100);
}


Here is the rather disappointing result:

(http://forum.arduino.cc/index.php?action=dlattach;topic=437243.0;attach=189653)
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Dec 05, 2016, 10:02 pm
Then again, those 50% dead times between bytes are a LOT better than the overhead of calling SPI.transfer(byte) five times.

Here's how bad *that* is:

(http://forum.arduino.cc/index.php?action=dlattach;topic=437243.0;attach=189655)
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Dec 05, 2016, 10:12 pm
For comparison, here is Arduino Uno running the five SPI.transfer(byte) sketch:

(http://forum.arduino.cc/index.php?action=dlattach;topic=437243.0;attach=189657)

Even with only an 8 bit CPU running at one fifth the clock speed, Uno manages to transfer the 5 bytes at only 8 Mbit/sec in approximately the total time Due does at 21 Mbit/sec.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Dec 05, 2016, 10:15 pm
For completeness, here is how Uno performs with SPI.transfer(buf, 5):

(http://forum.arduino.cc/index.php?action=dlattach;topic=437243.0;attach=189659)
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Dec 05, 2016, 10:59 pm
Your results with the Due match what I'm seeing. The Uno results put the Due to shame really given the core speed. I don't own an Uno to play with unfortunately, so thanks for sharing that research.

Does your library buffer the transfers? My results suggest it doesn't, but I recall reading it did (or was supposed to). I suspect it's possible to get a 100-byte transmission done in well under 100us with a buffer transfer, that's the whole UDP SPI transfer, at 28MHz.
Also, could you confirm if the timeout and retry count work with your library? They didn't seem to work for me :(
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: dlloyd on Dec 06, 2016, 03:00 am
Some bare metal sprinkled in and no buffering ...

Code: [Select]
#include <SPI.h>
uint8_t data[5] = {0x55, 0x5A, 0x49, 0xAA, 0x96};
byte count;

void setup() {
  SPI.begin(10);
  SPI.setClockDivider(10, 5);  // 16.8MHz Clock
  REG_SPI0_CSR &= 0x00FFFFFF;  // DLYBCT = 0
}

void loop() {
  while (1) {
    if ((REG_SPI0_SR & 2) != 0) { // transmit when data register empty
      REG_SPI0_TDR = data[count];
      count++;
      if (count == 5) count = 0;
    }
  }
}

SPI clock at 16.8 MHz: Without while loop, 2.17µs delay between transfers

With while loop: no delay between transfers, 0.5µs/byte, 50µs/100bytes
(http://i.imgur.com/e0etHWR.png)

SPI clock at 21 MHz: 0.12µs delay, 0.5µs/byte including delay, 50µs/100bytes
(http://i.imgur.com/Q9c8kB9.png)

SPI clock at 28 MHz: 0.22µs delay, 0.5µs/byte including delay, 50µs/100bytes
(http://i.imgur.com/0cEQIqs.png)

Uno USART in MSPIM mode ...

Code: [Select]
uint8_t data[5] = {0x55, 0x5A, 0x49, 0xAA, 0x96};
byte count;

void setup() {
  UBRR0H = 0;
  UBRR0L = 0;
  DDRD |= _BV (4);                         // XCK as output enables master mode
  UCSR0C = (1 << UMSEL01) | (1 << UMSEL00) | (0 << UCPHA0) | (0 << UCPOL0); // Master SPI, mode 0
  UCSR0B = (1 << RXEN0) | (1 << TXEN0);    // Enable receiver and transmitter
  UBRR0L = 1;                              // 4MHz XCK on pin 4
  SPCR = (1 << SPE);                       // enable SPI
  TIMSK0 = 0;                              // disable timer0
}

void loop() {
  while (1) {
    if ((UCSR0A & 32) != 0) { // transmit when data register empty
      UDR0 = data[count];
      count++;
      if (count == 5) count = 0;
    }
  }
}

SPI clock at 4 MHz, no delay between bytes, 2µs/byte, 200µs/100bytes

(http://i.imgur.com/T8lNJae.png)
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: westfw on Dec 06, 2016, 07:45 am
https://community.atmel.com/forum/getting-back-back-spi-transfers-sam3x
I have a variation of the bare metal code:

Code: [Select]
 for (byte i = 0; i < BFSIZE; i++) {
    while ((myspi->SPI_SR & SPI_SR_TDRE) == 0)
      ; // spin
    myspi->SPI_TDR = SPI_PCS(3) | i;
    if (myspi->SPI_SR & SPI_SR_RDRF) {
      *inptr++ = (byte) myspi->SPI_RDR;
    }
  }

And it has some mysterious aspects.   Most mysterious: timing doesn't seem to change between using SPI_SR_TDRE and SPI_SR_TXEMPTY, even though the former SHOULD have a full byte-time worth of leeway...

Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: dlloyd on Dec 06, 2016, 08:52 am
Using DMA is supposed to optimize SPI transfers (haven't tried it), but here's an example on GitHub (https://github.com/manitou48/DUEZoo/blob/master/dmaspi.ino).
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Dec 06, 2016, 01:57 pm
Seems the SPI library on AVR has received careful optimization work, but the SPI library on Due... not so much.  :(
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Dec 07, 2016, 12:17 pm
I decided to write a separate piece of sample code to play around with SPI, without any of the Ethernet stuff. Problem is, it doesn't work!
It gets to the SPI transfer part then falls over, I see "Transferring" on the serial monitor and never the "Done" (lines 26 and 28, wrapped around the SPI.transfer on line 27).
The Ethernet2 shield is attached to the SPI but nothing else (well, scope probes are). The transfers don't happen, the SPI clock doesn't change state.
If anyone can spot the idiot mistake I've made that would be great!
The Ethernet/SPI test code I posted previously still works, so it's not a short or other daft hardware issue.

Code: [Select]
#include <SPI.h>

const int SPIbuf_Size = 100;
byte SPIbuf [SPIbuf_Size];
const int FPGA_SPI_CSpin = 4;
unsigned long current_micros;
const unsigned long looptime = 2000000;
const int Serial_Baud = 115200;

void setup() {
  pinMode(FPGA_SPI_CSpin, OUTPUT);
  Serial.begin(Serial_Baud);
  Serial.println(F("Setup Complete"));
}

void loop() {
  current_micros = micros();
  for (int j = 0; j<256; j++)
  {
    Serial.print(F("Doing : "));
    Serial.println(j);
    SPIbuf[0] = j;
    SPIbuf[SPIbuf_Size-1] = j;
    SPI.beginTransaction(SPISettings(1000000, MSBFIRST, SPI_MODE0));
    digitalWrite(FPGA_SPI_CSpin, LOW);
    Serial.println(F("Transferring"));
    SPI.transfer (&SPIbuf, SPIbuf_Size);
    Serial.println(F("Done"));
    digitalWrite(FPGA_SPI_CSpin, HIGH);
    SPI.endTransaction();
    Serial.print(j);
    Serial.print(F(" : "));
    Serial.print(SPIbuf[0]);
    Serial.print(F(" : "));
    Serial.println(SPIbuf[SPIbuf_Size-1]);
    while ((micros()- current_micros)<looptime)
    {
     
    }
  }
}
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Dec 07, 2016, 12:29 pm
Seems:
SPI.begin();
is required. I shoved it in setup and it now works. I guess the Ethernet libraries call this but never call
SPI.end();
when they are finished, which is why my other code works.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Dec 08, 2016, 01:02 pm
I just did a comparison between 28MHz and 42MHz. Although the clocking was indeed faster, the dead time increased, keeping the 100 byte transfer at about 53us in both cases. I couldn't get 84MHz to work; it seemed to still clock at 42 :(
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Dec 08, 2016, 02:41 pm
Are you communicating with other hosts on your local ethernet?  Or will you be talking to hosts on the internet, accessed through routers and high-latency links?
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: dlloyd on Dec 08, 2016, 03:24 pm
Quote
I just did a comparison between 28MHz and 42MHz. Although the clocking was indeed faster, the dead time increased, keeping the 100 byte transfer at about 53us in both cases. I couldn't get 84MHz to work; it seemed to still clock at 42 :(
Yes, that matches my findings in reply 26. The SAM3X SPI hardware works without deadtime at up to 16.8MHz clock. Beyond this rate, the deadtime increases to cancel out the byte transfer time improvement because a wall has been hit.

According to the datasheet, DMA will optimize transfer rate ... I guess the DMA improvement would be noticed only if the SPI clock is set higher than 16.8MHz.

This looks interesting...

32.7.3.9 Peripheral Deselection with DMAC
When the Direct Memory Access Controller is used, the chip select line will remain low during the whole transfer since the TDRE flag is managed by the DMAC itself. The reloading of the SPI_TDR by the DMAC is done as soon as TDRE flag is set to one.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: ard_newbie on Dec 08, 2016, 04:21 pm


Did you try turbospi.h library for Sam3x  ( https://github.com/anydream/TurboSPI ) ?

Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: dlloyd on Dec 08, 2016, 07:24 pm
Yeah, now that's more like it!

SPI SCK @ 42MHz, no deadtime, 100 bytes transferred in 19.04µs!

5.25 MBps (42Mbps)
(http://i.imgur.com/hhdoKap.png)
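
As a rough sanity check on the figures in this thread, here is a minimal sketch of the arithmetic. It is plain C++ with nothing Due-specific, the function names are made up for illustration, and the "ideal" case assumes 8 clock periods per byte with no gap between bytes.

```cpp
#include <cassert>

// Ideal time in microseconds to clock `bytes` bytes at `sck_hz`,
// assuming 8 clock periods per byte and no gap between bytes.
double ideal_transfer_us(unsigned bytes, double sck_hz) {
  assert(sck_hz > 0.0);
  return bytes * 8.0 / sck_hz * 1e6;
}

// Fraction of a measured transfer during which the clock was idle.
double dead_fraction(double measured_us, double ideal_us) {
  assert(measured_us > 0.0);
  return (measured_us - ideal_us) / measured_us;
}

// Payload throughput in megabytes per second
// (bytes per microsecond is numerically the same as MB/s).
double throughput_MBps(unsigned bytes, double measured_us) {
  assert(measured_us > 0.0);
  return bytes / measured_us;
}
```

With these, ideal_transfer_us(100, 42e6) comes out at about 19.05us, which lines up with the 19.04us measured here, and throughput_MBps(100, 19.04) is about 5.25. By the same arithmetic, a 53us transfer of 100 bytes means the clock is idle for roughly 64% of the time at 42MHz and roughly 46% at 28MHz.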

Code: [Select]
#include <TurboSPI.h>

TurboSPI    g_SPI;
DigitalPin  g_PinCS, g_PinRS;
uint8_t     g_Buffer[100];  // some data buffer to transfer
uint8_t     g_Divisor = 2;  // transfer speed set to MCU clock divided by 2

void setup()
{
  // setup pins
  g_PinCS.Begin(45);
  g_PinRS.Begin(47);

  g_PinCS.PinMode(OUTPUT);
  g_PinRS.PinMode(OUTPUT);

  // setup SPI
  g_SPI.Begin();

  // fill the buffer with data
  for (uint8_t i = 0; i < sizeof(g_Buffer); i++) {
    g_Buffer[i] = i + 1;
  }
}

void loop()
{
  // setup speed and select slave
  g_SPI.Init(g_Divisor);
  g_PinCS.Low();

  // set some pins
  g_PinRS.High();

  // transfer data to slave
  g_SPI.Send(g_Buffer, sizeof(g_Buffer));

  // unselect slave
  g_PinCS.High();
}
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Dec 09, 2016, 09:38 am
Quote
Are you communicating with other hosts on your local ethernet?  Or will you be talking to hosts on the internet, accessed through routers and high-latency links?
Neither that time; this was an SPI-only test of 100 bytes.

Quote
Yes, that matches my findings in reply 26. The SAM3X SPI hardware works without deadtime at up to 16.8MHz clock. Beyond this rate, the deadtime increases to cancel out the byte transfer time improvement because a wall has been hit.
Oops, I forgot to include my 21MHz result: it was slower than 28MHz, about 65us if memory serves...
Have you done this?
Code: [Select]
// Edit occurrences of SPI_CSR_DLYBCT(1) to SPI_CSR_DLYBCT(0) in
//  C:\Users\[USERNAME]\AppData\Local\Arduino15\packages\arduino\hardware\sam\1.6.9\libraries\SPI\src\SPI.cpp
// and C:\Users\[USERNAME]\AppData\Local\Arduino15\packages\arduino\hardware\sam\1.6.9\libraries\SPI\src\SPI.h


Quote
Yeah, now that's more like it!

SPI SCK @ 42MHz, no deadtime, 100 bytes transferred in 19.04µs!
That's rather good. It isn't obvious to me whether it receives while transmitting. If it does, this improves the SPI part of my project. (I'm effectively building an Ethernet-to-FPGA bridge; I don't want to control the Wiznet directly, hence the Due. I have the FPGA and Due talking via full-duplex SPI.)
The real task now is to try and modify the Ethernet library to use TurboSPI, not a task I'm looking forward to, there goes the weekend :D
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: dlloyd on Dec 09, 2016, 04:27 pm
Yes, it should receive at the same time. The MISO line is on pin 47.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: Hoek on Apr 24, 2017, 07:58 pm
Been doing a LOT of work and experimentation with the Due SPI/DMA and finally decided to implement all SPI transfers via DMA.

The throughput is... great!

I have an OLED1351 @ 128*128 rgb565 and can fill it at > 20 fps off SD card.

The SD card has a DIV=4 and OLED DIV=5 and they both play nicely.

In the low-level routines that call write8(), I found it made a large difference whether the function was inlined or not.

I got probably a 20-30% speedup by forcing inline.


I also broke the write8() functions down into 2 to cut down on overhead of setting the same registers multiple times.

Code: [Select]


__INLINE__ uint8_t cDMA_spi_send_do_wait_buffer()
{
  // wait for the DMA channel to finish...
  while (!due_dma_dmac_channel_transfer_done(DUE_DMA_SPI_TX_CH)) {}

  // ...then for the SPI transmitter to drain
  while ((SPI0->SPI_SR & SPI_SR_TXEMPTY) == 0) {}

  // leave RDR empty
  return SPI0->SPI_RDR;
}

// new routines 8 bit send -- DMA --

// resend using the channel setup already done by cDMA_spi_send()
__INLINE__ void cDMA_spi_send_again(uint8_t b, bool wait)
{
  __src8 = b;

  DMAC->DMAC_CH_NUM[DUE_DMA_SPI_TX_CH].DMAC_SADDR = (uint32_t)&__src8;
  DMAC->DMAC_CH_NUM[DUE_DMA_SPI_TX_CH].DMAC_CTRLA = DMAC_CTRLA_BTSIZE(1) | DMAC_CTRLA_SRC_WIDTH_BYTE | DMAC_CTRLA_DST_WIDTH_BYTE;
  due_dma_dmac_channel_enable(DUE_DMA_SPI_TX_CH);

  if (wait)
  {
    cDMA_spi_send_do_wait_buffer();
  }
}

// full channel setup, then send one byte
__INLINE__ void cDMA_spi_send(uint8_t b, bool wait)
{
  // due_dma_dmac_channel_disable(DUE_DMA_SPI_TX_CH);
  DMAC->DMAC_CH_NUM[DUE_DMA_SPI_TX_CH].DMAC_DSCR = 0;
  DMAC->DMAC_CH_NUM[DUE_DMA_SPI_TX_CH].DMAC_DADDR = (uint32_t)&SPI0->SPI_TDR;
  DMAC->DMAC_CH_NUM[DUE_DMA_SPI_TX_CH].DMAC_CTRLB = DMAC_CTRLB_SRC_INCR_INCREMENTING | DMAC_CTRLB_SRC_DSCR | DMAC_CTRLB_DST_DSCR | DMAC_CTRLB_FC_MEM2PER_DMA_FC | DMAC_CTRLB_DST_INCR_FIXED;
  DMAC->DMAC_CH_NUM[DUE_DMA_SPI_TX_CH].DMAC_CFG = DMAC_CFG_DST_PER(DUE_DMA_SPI_TX_IDX) | DMAC_CFG_DST_H2SEL | DMAC_CFG_FIFOCFG_ALAP_CFG | DMAC_CFG_SOD;

  cDMA_spi_send_again(b, wait);
}




Code: [Select]


cDMA_spi_send(b1, false);

// do some stuff here as have some cycles b4 the request ends


cDMA_spi_send_do_wait_buffer(); // now wait

cDMA_spi_send_again(b2, true);
cDMA_spi_send_again(b3, true);


With the ability to control whether to wait or not it allows some work to be done "for free". For the video streamer it means the pixel processing and looping is basically done for free as it's done while I would usually be waiting for the DMA request to end.

Also have the 16 bit send functions that don't require changing modes etc which also made a huge difference.

Just waiting on an AD5330 DAC so I can test the sound output with video. ATM the video is running 2x - 4x the normal speed so confident I should be able to support video with sound.

Made a SPIDevice class I use for all my projects. It will do things like check the DIV every time the chip is selected. However, it's important to only reset the DIV if necessary as it's a costly operation.

Code: [Select]


bool cDMA_spi_check_div(uintX_t sckDivisor, bool dma)  // check before each send to make sure each device is at the correct speed
{
  // may be SPI lib or DMA

  if (dma && last_div_dma != sckDivisor)
  {
    last_div_dma = sckDivisor;

    SPI0->SPI_CR = SPI_CR_SPIDIS;   // disable SPI
    SPI0->SPI_CR = SPI_CR_SWRST;    // reset SPI
    SPI0->SPI_MR = SPI_PCS(DUE_DMA_SPI_CHIP_SEL) | SPI_MR_MODFDIS | SPI_MR_MSTR; // no mode fault detection, set master mode
    SPI0->SPI_CSR[DUE_DMA_SPI_CHIP_SEL] = SPI_CSR_SCBR((uint8_t)sckDivisor) | SPI_CSR_NCPHA; // mode 0, 8-bit

    SPI0->SPI_CR = SPI_CR_SPIEN;    // enable SPI (SPI_CR is write-only, so assign rather than |=)

    return true;   // hardware was reprogrammed
  }

  return false;    // divisor unchanged, nothing to do
}


I suppose the other thing worth mentioning is I totally gutted sdfat to include an external library for all SPI and made it fat32 only. It's about as lean and mean as I can get.
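
The "only reset the DIV if necessary" idea from the divisor check above can be sketched independently of the hardware. This is a hypothetical illustration, not the SPIDevice class itself: DivisorCache, fake_reconfigure and g_reconfig_count are invented names, and the hook simply stands in for the register writes that reset and re-enable the SPI.

```cpp
#include <cassert>
#include <cstdint>

// Remembers the last SCK divisor that was programmed and only calls the
// (expensive) reconfigure hook when a different divisor is requested.
class DivisorCache {
 public:
  using ReconfigureFn = void (*)(uint8_t divisor);

  explicit DivisorCache(ReconfigureFn fn) : reconfigure_(fn) {
    assert(reconfigure_ != nullptr);
  }

  // Call every time a device is selected.
  // Returns true if the hardware had to be reprogrammed.
  bool select(uint8_t divisor) {
    if (valid_ && divisor == last_) return false;  // already at this speed
    reconfigure_(divisor);
    last_ = divisor;
    valid_ = true;
    return true;
  }

 private:
  ReconfigureFn reconfigure_;
  uint8_t last_ = 0;
  bool valid_ = false;  // false until the first divisor has been programmed
};

// Stand-in for the real register writes: just counts how often it runs.
static int g_reconfig_count = 0;
static void fake_reconfigure(uint8_t /*divisor*/) { ++g_reconfig_count; }
```

Selecting divisors 4, 4, 5, 4 in that order through a DivisorCache built on fake_reconfigure runs the hook three times rather than four, which is the saving when two devices with different divisors share the bus and one of them is accessed in bursts.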


Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: ard_newbie on Apr 24, 2017, 08:50 pm

AFAIK inlining prevents the compiler from adding a prologue (push {registers}) and epilogue (pop {registers}) to a function call, so logically it should be faster at the price of a larger code size.
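
For anyone wanting to try forced inlining, here is a minimal sketch assuming GCC or Clang (the Due toolchain is GCC-based). FORCE_INLINE and rgb565 are illustrative names only; how the __INLINE__ macro used earlier in the thread is actually defined was not posted.

```cpp
#include <cstdint>

// GCC/Clang spelling; other compilers use different keywords.
#define FORCE_INLINE inline __attribute__((always_inline))

// Small hot-path helper: packs 8-bit R, G, B into one RGB565 pixel.
// With always_inline the body is expanded at each call site, so there is
// no call, no register push/pop and no return for it.
static FORCE_INLINE uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b) {
  return static_cast<uint16_t>(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}
```

The trade-off is the one described above: every call site gets its own copy of the body, so this suits small functions called from tight loops.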

PDC DMA and AHB DMA are surely the best options on a DUE to speed things up whenever they are available.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Jul 30, 2018, 09:18 pm
Just a quick followup to this very old thread...

I recently released Ethernet library version 2.0.0, which brings my many optimizations originally written for Teensy to all Arduino boards, including Arduino Due.

W5200 & W5500 now utilize SPI.transfer(buffer, size).  I made many other optimizations, including native register I/O to avoid the slow digitalWrite on Due, and important higher level optimizations.  Details and benchmarks here:

https://www.pjrc.com/arduino-ethernet-library-2-0-0/

To get version 2.0.0, just use the library manager to update your Ethernet lib.


Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Jul 31, 2018, 05:09 pm
Do you have any like-for-like timing comparisons with the official wiznet library?
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Aug 01, 2018, 07:59 am
This library, right?

https://github.com/Wiznet/WIZ_Ethernet_Library

I installed the "Arduino IDE 1.5.x" version just now.  It doesn't work with my Seeed W5500 shield (with w5100.h edited to select W5500 - this lib doesn't auto-detect which chip you have).

It does work with my Arduino Ethernet R3 shield.  The speed measures 9.83 kbytes/sec.  Ethernet 2.0.0 gets 109.73 kbytes/sec.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Aug 01, 2018, 08:09 am
I switched to Arduino Uno.  Wiznet's library *does* work with the Seeed W5500 shield when using Uno.  The speed is 139.99 kbytes/sec.  For comparison, Ethernet 2.0.0 gets 329.00 kbytes/sec on the same test with Uno, and 689.69 kbytes/sec with Due.

I also retested W5100 (Arduino Ethernet R3).  Wiznet's library gets 10.17 kbytes/sec (yes, slightly faster than the 9.83 kbytes/sec it gets with Arduino Due).  Ethernet 2.0.0 gets 82.66 kbytes/sec when using W5100 on Uno, and 109.73 kbytes/sec with Due.

Without a doubt, Ethernet 2.0.0 is much faster than Wiznet's library.
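
To put numbers on "much faster", here is a small sketch that turns the figures quoted above into speedup ratios. The struct and names are made up for illustration; the values are copied from the posts.

```cpp
// One like-for-like measurement pair, in kbytes/sec.
struct Benchmark {
  const char* setup;
  double wiznet_kBps;          // Wiznet's library
  double ethernet_2_0_0_kBps;  // Ethernet 2.0.0
};

// Figures quoted in the posts above.
const Benchmark kResults[] = {
  {"W5100 (Arduino Ethernet R3) on Due", 9.83, 109.73},
  {"W5500 (Seeed) on Uno", 139.99, 329.00},
  {"W5100 (Arduino Ethernet R3) on Uno", 10.17, 82.66},
};

// How many times faster Ethernet 2.0.0 is on the same hardware.
double speedup(const Benchmark& b) {
  return b.ethernet_2_0_0_kBps / b.wiznet_kBps;
}
```

That works out to roughly 11x for the W5100 on Due, 2.35x for the W5500 on Uno and 8x for the W5100 on Uno.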
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Aug 01, 2018, 08:21 am
For one final test, I put the Arduino.org Ethernet2 shield on Arduino Due.  Wiznet's library does work with this shield.  I don't know why it fails on the Seeed W5500 shield.  Both work on Due with Ethernet 2.0.0.

Arduino Due with the W5500-based Arduino.org Ethernet2 speed is 394.80 kbytes/sec.  Ethernet 2.0.0 gets 695.35 kbytes/sec with that shield on Due.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Aug 01, 2018, 10:36 am
Thanks for sharing those results. Is there a result for the wiznet library on the 5500/Due?
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Aug 01, 2018, 12:37 pm
Quote
Thanks for sharing those results. Is there a result for the wiznet library on the 5500/Due?
Yes.

"Arduino Due with the W5500-based Arduino.org Ethernet2 speed is 394.80 kbytes/sec.  Ethernet 2.0.0 gets 695.35 kbytes/sec with that shield on Due."

Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Aug 01, 2018, 01:12 pm
With all these optimizations in Ethernet 2.0.0 (removing the *many* prior bottlenecks in the Ethernet library), I believe these benchmarks would at least double on Arduino Due if someone were to optimize SPI.transfer(buffer, length) well.  Much of the hard work for those SPI optimizations has been done in the messages on this thread.  But hardly anyone will ever benefit until someone goes to the trouble of actually updating Due's SPI library.
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: weird_dave on Aug 02, 2018, 10:24 am
OK, I took the Arduino.org Ethernet2 library not to mean the Wiznet library since they are different (or were last time I checked).
When I get some spare time, I'll give it a go, thanks for the effort :)
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: westfw on Aug 02, 2018, 10:51 am
Quote
I recently released Ethernet library version 2.0.0
Wait - you got all that improvement WITHOUT implementing write-only SPI functions?
Wow.
(Hmm.   Not that I'm sure that a write-only SPI would be much faster.  Mostly just ... easier?  No more overwriting your output buffer (?))
Title: Re: Ethernet2 (UDP) SPI transfers have a lot of dead time
Post by: pjrc on Aug 02, 2018, 12:03 pm
Yup, the old Ethernet lib was horribly inefficient on every level.

Due's SPI library is still very inefficient, which holds back Due's performance to ~700 kbytes/sec.  If someone were to improve the SPI lib, I believe Due could probably even outperform Teensy (where the SPI lib is highly optimized) on these tests, because Due is the only board that can actually produce a 14 MHz SPI clock.  Pretty much all the others use 8 or 12 MHz when SPISettings asks for 14 MHz max.
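
A minimal sketch of why the requested 14 MHz lands differently on different boards. It assumes two clock-divider styles: an integer divider of 1 to 255 (as I understand the SAM3X on Due, where SCK = MCK / divisor) and power-of-two dividers from 2 to 128 (as on a 16 MHz AVR such as Uno). The function names are invented for illustration and this is not the SPI library's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Integer divider 1..255: fastest SCK that does not exceed f_max
// (unless even the largest divisor is still too fast).
uint32_t sck_integer_divider(uint32_t f_cpu, uint32_t f_max) {
  assert(f_max > 0);
  uint32_t div = (f_cpu + f_max - 1) / f_max;  // round up
  if (div < 1) div = 1;
  if (div > 255) div = 255;
  return f_cpu / div;
}

// Power-of-two dividers 2, 4, ..., 128: fastest SCK that does not exceed
// f_max (unless even divide-by-128 is still too fast).
uint32_t sck_pow2_divider(uint32_t f_cpu, uint32_t f_max) {
  assert(f_max > 0);
  uint32_t div = 2;
  while (div < 128 && f_cpu / div > f_max) div *= 2;
  return f_cpu / div;
}
```

Under those assumptions, sck_integer_divider(84000000, 14000000) gives exactly 14 MHz (84 / 6), whereas sck_pow2_divider(16000000, 14000000) lands on 8 MHz.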

The SPI lib on Due isn't my project.  My dev cycles are funded by Teensy sales.  All this optimization work came from Teensy's fork of Ethernet.  Occasionally I try to contribute Teensy's improvements back to the rest of the Arduino community, so everyone can benefit.  Hope everyone gets some good use from it.  :)