Go Down

Topic: Ethernet2 (UDP) SPI transfers have a lot of dead time (Read 9227 times) previous topic - next topic

weird_dave

I did a quick test to see how long it takes to send a UDP packet of 100 bytes. I've changed the SPI speed to 28MHz by editing w5500.cpp:
Code: [Select]
SPISettings wiznet_SPI_settings(28000000, MSBFIRST, SPI_MODE0);

I did a quick bit of code to send set a pin high before the transfer then low after and added a timer to repeat after 1ms

Code: [Select]

const int EthTxBuf_Size = 100;
byte EthTxBuf[EthTxBuf_Size];

void loop()
{
  const unsigned long looptime = 1000;
  current_micros = micros();
  digitalWrite(Testpin, HIGH);
  Udp.beginPacket(Remote_IP, Remote_Port);
  Udp.write(EthTxBuf, EthTxBuf_Size);
  Udp.endPacket();
  digitalWrite(Testpin, LOW);
  while ((micros()- current_micros)<looptime)
  {
   
  }
}


The Testpin is high for almost 400us. Looking at the SPI clock, I can see a fair amount of dead time between each transfer. I would expect some software overhead, but during the portion where is is just sending the 100 bytes to the W5500, I am seeing a byte sent in 285ns (which is right for 28MHz) and the next one 1464ns later (one every 1749ns), that means the routine sending the bytes is only doing so for 16% of the time, what's it doing with the other 123 clock cycles (84MHz)?
Is there a way to improve this situation?

pjrc

Are you using Ethernet2 from Adafruit, or from Arduino.org, or some other source?  They're not all the same code, and small details matter greatly for performance.

weird_dave

I used the menu within the IDE (1.6.13 .cc not .org) to select it from the library manager which then downloaded it.

pjrc



sketch_nov25a: In function 'void loop()':
sketch_nov25a:7: error: 'current_micros' was not declared in this scope
   current_micros = micros();
   ^
sketch_nov25a:8: error: 'Testpin' was not declared in this scope
   digitalWrite(Testpin, HIGH);
                ^
sketch_nov25a:9: error: 'Udp' was not declared in this scope
   Udp.beginPacket(Remote_IP, Remote_Port);
   ^
sketch_nov25a:9: error: 'Remote_IP' was not declared in this scope
   Udp.beginPacket(Remote_IP, Remote_Port);
                   ^
sketch_nov25a:9: error: 'Remote_Port' was not declared in this scope
   Udp.beginPacket(Remote_IP, Remote_Port);
                              ^
'current_micros' was not declared in this scope

pjrc

#4
Nov 26, 2016, 03:45 am Last Edit: Nov 26, 2016, 03:51 am by Paul Stoffregen
If you had posted a complete program, I probably would have run it on a couple different boards (I have nearly all Arduino boards here).  But the reality is I'm really not feeling like filling in the missing parts, just to figure out what Due performs badly.

That may or may not have led to anything useful, but if you couldn't be bothered to give a complete program that lets me start looking without guesswork, why should I go to the trouble.  At least you can see I did copy your code into an Arduino window and click Verify, but that's all.

Maybe someone else will take a stronger interest to look into what's wrong?

weird_dave

I certainly can post a complete program, it's not that I couldn't be bothered I just didn't realise it was the norm to post the entire thing. How much code is too much?

Code: [Select]
#include <Ethernet2.h>

// Edit the SPI speed in C:\Users\[USERNAME]\Documents\Arduino\libraries\Ethernet2\src\utility\w5500.cpp
//  to 28000000, 28MHz, line 25: (copy/paste the following is easiest)
//  SPISettings wiznet_SPI_settings(28000000, MSBFIRST, SPI_MODE0);

byte mac[] = { 0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED };
const IPAddress MyIP( 192, 168, 0, 10 );
const IPAddress Remote_IP (192, 168, 0, 1);
const unsigned int Remote_Port = 12345;
const unsigned int Local_Port = 12345;
EthernetUDP Udp;

char packetBuffer[UDP_TX_PACKET_MAX_SIZE];

unsigned long current_micros;

const int Testpin = 52;
void setup()
{
  Ethernet.begin(mac,MyIP);
  Udp.begin(Local_Port);
  pinMode(Testpin, OUTPUT);
}

void loop()
{
  const unsigned long looptime = 1000;
  const int EthTxBuf_Size = 100;
  byte EthTxBuf[EthTxBuf_Size];
  digitalWrite(Testpin, HIGH);
  current_micros = micros();
  Udp.beginPacket(Remote_IP, Remote_Port);
  Udp.write(EthTxBuf, EthTxBuf_Size);
  Udp.endPacket();
  digitalWrite(Testpin, LOW);
  while ((micros()- current_micros)<looptime)
  {
   
  }
}

westfw

Quote
what's it doing with the other 123 clock cycles (84MHz)?
The UDP write code isn't what It'd call "tight" (it calls SPI::transfer(<one byte>) N types instead of SPI::transfer(<N bytes>), for example (but I guess that's so it doesn't overwrite with "input")), but it doesn't look like it should be 123 clocks worth...


Code: [Select]
Udp.write(buf, size)
  Socket::bufferData(... buf, size)
    W5500Class::send_data_processing_offset(... buf, len)
       W5500Class::write(... buf, len)
         for (len)
           SPI.transfer(buf[i])
      byte SPIClass::transfer(byte _pin, uint8_t _data, SPITransferMode _mode) {
      uint32_t ch = BOARD_PIN_TO_SPI_CHANNEL(_pin);
819d0: 290a      cmp r1, #10
      // SPI_CSR_DLYBCT(1) keeps CS enabled for 32 MCLK after a completed
      // transfer. Some device needs that for working properly.
      SPI_ConfigureNPCS(spi, ch, mode[ch] | SPI_CSR_SCBR(divider[ch]) | SPI_CSR_DLYBCT(1));
      }

      byte SPIClass::transfer(byte _pin, uint8_t _data, SPITransferMode _mode) {
819d2: b430      push {r4, r5}
      uint32_t ch = BOARD_PIN_TO_SPI_CHANNEL(_pin);
819d4: d027      beq.n 81a26 <SPIClass::transfer(unsigned char, unsigned char, SPITransferMode)+0x56>
819d6: 2904      cmp r1, #4
819d8: d029      beq.n 81a2e <SPIClass::transfer(unsigned char, unsigned char, SPITransferMode)+0x5e>
819da: 2934      cmp r1, #52 ; 0x34
819dc: bf14      ite ne
819de: f44f 25e0 movne.w r5, #458752 ; 0x70000
819e2: f44f 2530 moveq.w r5, #720896 ; 0xb0000
819e6: bf14      ite ne
819e8: 2103      movne r1, #3
819ea: 2102      moveq r1, #2
      // Reverse bit order
      if (bitOrder[ch] == LSBFIRST)
819ec: 4401      add r1, r0
819ee: 7a0c      ldrb r4, [r1, #8]
819f0: b91c      cbnz r4, 819fa <SPIClass::transfer(unsigned char, unsigned char, SPITransferMode)+0x2a>
       */
      __attribute__( ( always_inline ) ) static __INLINE uint32_t __RBIT(uint32_t value)
      {
uint32_t result;

__ASM volatile ("rbit %0, %1" : "=r" (result) : "r" (value) );
819f2: fa92 f2a2 rbit r2, r2
       */
      __attribute__( ( always_inline ) ) static __INLINE uint32_t __REV(uint32_t value)
      {
uint32_t result;

__ASM volatile ("rev %0, %1" : "=r" (result) : "r" (value) );
819f6: ba12      rev r2, r2
      _data = __REV(__RBIT(_data));
819f8: b2d2      uxtb r2, r2
      uint32_t d = _data | SPI_PCS(ch);
      if (_mode == SPI_LAST)
819fa: 2b01      cmp r3, #1
      byte SPIClass::transfer(byte _pin, uint8_t _data, SPITransferMode _mode) {
      uint32_t ch = BOARD_PIN_TO_SPI_CHANNEL(_pin);
      // Reverse bit order
      if (bitOrder[ch] == LSBFIRST)
      _data = __REV(__RBIT(_data));
      uint32_t d = _data | SPI_PCS(ch);
819fc: ea42 0205 orr.w r2, r2, r5
81a00: 6803      ldr r3, [r0, #0]
      if (_mode == SPI_LAST)
      d |= SPI_TDR_LASTXFER;
81a02: bf08      it eq
81a04: f042 7280 orreq.w r2, r2, #16777216 ; 0x1000000

      // SPI_Write(spi, _channel, _data);
      while ((spi->SPI_SR & SPI_SR_TDRE) == 0)
81a08: 6919      ldr r1, [r3, #16]
81a0a: 0789      lsls r1, r1, #30
81a0c: d5fc      bpl.n 81a08 <SPIClass::transfer(unsigned char, unsigned char, SPITransferMode)+0x38>
      ;
      spi->SPI_TDR = d;
81a0e: 60da      str r2, [r3, #12]

      // return SPI_Read(spi);
      while ((spi->SPI_SR & SPI_SR_RDRF) == 0)
81a10: 691a      ldr r2, [r3, #16]
81a12: 07d2      lsls r2, r2, #31
81a14: d5fc      bpl.n 81a10 <SPIClass::transfer(unsigned char, unsigned char, SPITransferMode)+0x40>
      ;
      d = spi->SPI_RDR;
81a16: 6898      ldr r0, [r3, #8]
      // Reverse bit order
      if (bitOrder[ch] == LSBFIRST)
81a18: b914      cbnz r4, 81a20 <SPIClass::transfer(unsigned char, unsigned char, SPITransferMode)+0x50>
       */
      __attribute__( ( always_inline ) ) static __INLINE uint32_t __RBIT(uint32_t value)
      {
uint32_t result;

__ASM volatile ("rbit %0, %1" : "=r" (result) : "r" (value) );
81a1a: fa90 f0a0 rbit r0, r0
       */
      __attribute__( ( always_inline ) ) static __INLINE uint32_t __REV(uint32_t value)
      {
uint32_t result;

__ASM volatile ("rev %0, %1" : "=r" (result) : "r" (value) );
81a1e: ba00      rev r0, r0
      d = __REV(__RBIT(d));
      return d & 0xFF;
      }
81a20: b2c0      uxtb r0, r0
81a22: bc30      pop {r4, r5}
81a24: 4770      bx lr

weird_dave

I tried the following snippet of code and it has the same deadtime:

Code: [Select]
SPI.beginTransaction(SPISettings(28000000, MSBFIRST, SPI_MODE0));
digitalWrite(FPGA_SPIpin, LOW);
for (int i = 0; i<EthTxBuf_Size; i++)
{
  EthRxBuf[i] = SPI.transfer (byte (EthTxBuf[i]));
}
digitalWrite(FPGA_SPIpin, HIGH);
SPI.endTransaction();


I've attached a scope plot showing the SPI clock (it doesn't resolve particularly well, but it is obvious), there's oodles of deadtime!

westfw

try editing SPI.cpp and changing the occurances of  SPI_CSR_DLYBCT(1) to  SPI_CSR_DLYBCT(0)

I believe the (1) will cause 32 clocks between transfers, and I believe that the comment about CS is wrong since CS is manipulated manually (SPI_CSR_CSAAT - although I'm not sure why that's missing from setClockDivider()


weird_dave

Doesn't seem to have made any difference, it should have knocked off 381ns (3/4 of a graticule division on the scope plot) but it hasn't.
The location of SPI.cpp I edited was:
C:\Users\[USERNAME]\AppData\Local\Arduino15\packages\arduino\hardware\sam\1.6.9\libraries\SPI\src\SPI.cpp

The scope plot is set to trigger on the falling edge of the /CS pin, there's about 1.2us before the first SPI transaction, I wonder if all (most) of the delay is before the transaction rather than after?

It looks like this may be a suspect:
Code: [Select]
uint32_t ch = BOARD_PIN_TO_SPI_CHANNEL(_pin);
I have been unable to locate this function to see what it is doing.

westfw

Quote
I have been unable to locate [BOARD_PIN_TO_SPI_CHANNEL] to see what it is doing
It's a macro defined in variant.h:
Code: [Select]
#define BOARD_PIN_TO_SPI_CHANNEL(x) \
    (x==BOARD_SPI_SS0 ? 0 : \
    (x==BOARD_SPI_SS1 ? 1 : \
    (x==BOARD_SPI_SS2 ? 2 : 3)))


I can get faster using the multi-byte SPI transfer, but even that leaves almost a byte-time worth of gap between bytes, which is really strange.   There's a byte of buffering (a full byte transmission time get get the next byte ready); it should be able to output back-to-back, I would think.

Current test program:
Code: [Select]
#include <SPI.h>

Spi *myspi;

void setup()
{
  pinMode(13, OUTPUT);
  SPI.begin();
  Serial.begin(115200);
  myspi = SPI.spi;
}

#define DATSIZE 128

byte data[DATSIZE];

void loop() {
  Serial.println("Begin SPI test");
  SPI.beginTransaction(SPISettings(28000000, MSBFIRST, SPI_MODE0));
  digitalWrite(13, LOW);
#if 1
  for (byte i = 0; i < DATSIZE; i++)
  {
    data[i] = i;
  }
  SPI.transfer(data, sizeof(data), SPI_CONTINUE);
#else
  for (byte i = 0; i < DATSIZE; i++)
  {
    SPI.transfer (i, SPI_CONTINUE);
  }
#endif
  digitalWrite(13, HIGH);
  delay(100);
  SPI.endTransaction();
  for (int j = 0; j < 4; j++) {
    Serial.println(myspi->SPI_CSR[j], HEX);
  }
  delay(5000);

}

weird_dave

Thanks for telling me where that macro was, I was struggling to find it!

I was considering using the buffer transfer but I couldn't figure out from the documentation if it's the pointer that should be passed such that it overwrites the buffer with incoming data, else how do you receive? (I have wider concerns than just the Ethernet SPI, I want to use it for transfer with another device too, full duplex style).
This thread:
http://forum.arduino.cc/index.php?topic=407288.0
Suggests the pointer is passed to the function.
Looking at line 218 of SPI.cpp
Code: [Select]
void SPIClass::transfer(byte _pin, void *_buf, size_t _count, SPITransferMode _mode) {
...skipping to line 222
uint8_t *buffer = (uint8_t *)_buf;
...skipping to line 253
// Save read byte
if (reverse)
r = __REV(__RBIT(r));
*buffer = r;
buffer++;


My Cpp isn't strong but it looks like this will do both sending and receiving. Bit of a shame it overwrites the transmit buffer to do it tho :(
There's quite a bunch of commands in there, which I guess is accounting for the gap between bytes. I'm seeing a gap about a bit longer than a byte transmission time (about 1.4 times).
I'm also seeing what I guess is an interrupt occurring in the middle of the transmission (I've attached a screengrab of it), it's not a huge problem in the grand scheme of things but is annoying :)
(ignore the ends of the waveform, I've got other code running in there too. The first 2 transfers are a transfer16, just for comparison)

If this works, that's an increase in speed of about 3x for a 100 byte SPI transfer, things are starting to look up :)
Now I just need the Ethernet2 library to make use of it :(

westfw

Quote
it looks like this will do both sending and receiving. Bit of a shame it overwrites the transmit buffer to do it tho
Yes, it does, and it is :-(   It seems that it would be nice if the SPI library had more features.

I've been trying to get some "bare" SPI code to work, so I can figure out whether this is inherent slowness, or some hardware/configuration issue.   But I'm having troubles getting it to work :-(

Ahh, there is goes...  This code produces a nice tight loop, and I'd really expect it to keep that SPI clock going pretty much continuously.  It doesn't :-(  Therefore, something "interesting" is happening!  (I had to modify SPI.h to make SPI.spi a publicly accessible value, but otherwise it has no modifications.)

Code: [Select]
#include <SPI.h>

Spi *myspi;

void setup()
{
  SPI.begin();
  Serial.begin(115200);
  myspi = SPI.spi;
}

void loop() {
  Serial.println("Begin bare metal SPI test");

  SPI.beginTransaction(SPISettings(28000000, MSBFIRST, SPI_MODE0));

  for (byte i = 0; i < 100; i++) {
    while ((myspi->SPI_SR & SPI_SR_TDRE) == 0)
      ; // spin
    myspi->SPI_TDR = SPI_PCS(3) | i;
  }
  SPI.endTransaction();

  Serial.println("Bare metal finished");
  delay(1000);
}




weird_dave

Hmmm, My C might not be good enough.
This line
Code: [Select]
myspi->SPI_TDR = SPI_PCS(3) | i;
similar to:
Code: [Select]
spi->SPI_TDR = d | SPI_PCS(ch);
in SPI.cpp, how did you get 3? And why is SPI_PCS(3) bitwise Or'd with the data to be sent? I couldn't work out what SPI_PCS(3) means.

What does the creation of myspi give us? Is this the only way to gain access to SPI_SR etc..?

I just about managed to alter SPI.h to make it work.... (tho I had to run your demo, I couldn't get it to work within my code).

Not much speed difference than the library, it looks like it's the while loop waiting for the clear signal

westfw

Quote
in SPI.cpp, how did you get 3? And why is SPI_PCS(3) bitwise Or'd with the data to be sent? I couldn't work out what SPI_PCS(3) means.
The SAM3 SPI controller can drive multiple chip-select (SS) pins as part of the SPI transactions, and the Due SPI library allows 4 separate devices with separate configuration, one one "spi port."  PCS is "peripheral Chip Select"
"3" is the default, backward-compatible with the AVR single-SS (I could have used BOARD_PIN_TO_SPI_CHANNEL(BOARD_SPI_DEFAULT_SS), from variant.h, which is what I did manually.)

The write to the SPI data register has data in the low bits and chipselect info in the high bits (apparently the upper bits get copied directly to pins, so they're active-low, which was what was causing my problems. the SPI_PCS macro fixes that.)

myspi is just a pointer to the register set having to do with that SPI port.  By copying it from the library object, I avoided having to figure out whether there was more than one, or which one was used, and I could be sure that I could mix the library code with the bare metal code...

Quote
Not much speed difference than the library
To be honest, I don't know if we'll get any actual speedup at the UDP level.  8 bits every 1750ns (from your first message) is almost 5Mbps, which isn't bad (although the WizNet site says it should do 15Mbps.)  At the moment this is a puzzle in SPI behavior (for me.)


Go Up