Bug in Wire timeout loop cause poor performance

I posted an issue here for a bug in wire read/write that causes the time to read seven RTC date/time registers to be increased from 259 μs to 2,146 μs.

The example I posted is for a PCF8523 RTC at 400 kHz.

The bug causes delay(1) to be called for almost every call to endTransmission or requestFrom.

1 Like

Did you mean, "requires"?

Did you mean, "requires"?

The bug doesn't require delay(1) to be called but delay(1) is almost always called.

Here is one of the places where the bug occurs:

   if(err == FSP_SUCCESS) {
      if(m_read != nullptr) {
        err = m_read(&m_i2c_ctrl,data,length,!sendStop);  // <<<<< call starts a read
      }
    }
    bus_status = WIRE_STATUS_UNSET;
    while(timeout_ms > 0 && bus_status == WIRE_STATUS_UNSET) {  // <<<<< this loop waits 
      timeout_ms--;
      delay(1);
    }
  }

The read is done by an interrupt routine that does a callback when finished.

The callback sets the volatile variable bus_status.

Almost always the wait loop executes once calling delay(1).

So the bug does not require delay(1) to be called but I have not caught a case where delay is not called with my logic analyzer.

As a test I commented out delay(1) and set timeout_ms to 10,000. This is not a fix, just a hack to verify the problem can be fixed.

Time to read seven registers from an RTC chip at 400kHz went from 2,146 μs to 259 μs.

Edit: There were two transactions, one to set the read address and one to read the seven bytes. This caused two calls to delay.

Did a quick test, since I have been playing with I2C sensors with the wifi board on Wire, Wire1 and Wire2 (pins 0,1 can be used if defined) with commenting out the delay(1). Here are the results just using wire scanner.

With delay(1):

Scanning(Wire)...
Device found at address 0x3C  (SSD1306,DigisparkOLED)
Device found at address 0x62  (LidarLite,MCP4725,AtlasEzoORP)
done

Scanning(Wire1)...
Device found at address 0x50  (EEPROM,FRAM)
Device found at address 0x68  (DS1307,DS3231,MPU6050,MPU9050,MPU9250,ITG3200,ITG3701,LSM9DS0,L3G4200D)
Device found at address 0x7C  (FRAM_ID)
done

Scanning(Wire2)...
Device found at address 0x29  (TSL2561,VL6180,TSL2561,TSL2591,BNO055,CAP1188)
Device found at address 0x4B  (ADS1115,TMP102,BNO080)
done

all devices found with the correct addresses.

without the delay(1) wire2 devices return but wrong addresses:

Scanning(Wire)...
Device found at address 0x3C  (SSD1306,DigisparkOLED)
Device found at address 0x62  (LidarLite,MCP4725,AtlasEzoORP)
done

Scanning(Wire1)...
Device found at address 0x50  (EEPROM,FRAM)
Device found at address 0x68  (DS1307,DS3231,MPU6050,MPU9050,MPU9250,ITG3200,ITG3701,LSM9DS0,L3G4200D)
Device found at address 0x7C  (FRAM_ID)
done

Scanning(Wire2)...
Device found at address 0x2A  (SGTL5000,CAP1188)
Device found at address 0x4E  (unknown chip)
done

with that said if I change the delay(1) to delayMicroseconds(50) it works. Be difficult to change all the timouts in all libraries.

Sorry to jump in here.

But if someone is going to change this code, I might humbly suggest, that maybe part of it should be reworked.

For example, suppose err != FSP_SUCCESS or m_read is NULLPTR, as such m_read is never called, so does anything else clear the bus_status? Or will it always have to wait until the timeout value.

Also suppose m_read is called, and right during that ending of that call, one or more interrupt handlers are called, which gives the wire object enough time to complete and change the bus_status. Then your next statement wipes out the fact that wire call completed. So again, I am guessing your call will wait for the entire timeout_ms.

Again, sorry for jumping in.

As long as the timeout is given in ms I see no[1] better solution in the timeout loop.

BUT I could imagine that a much shorter delayMicros() before that loop could prevent entering the loop at all. Perhaps depending on the I2C speed...

[1] Splitting up the delay into multiple delayMicros() in the loop may also help.

How about:

    uint32_t ms_start = millis();
    while(bus_status == WIRE_STATUS_UNSET) {  // <<<<< this loop waits 
      uint32_t delta_time = millis() - ms_start;
      if (delta_time > timeout_ms ) break;  
    }

Can probably don't need the intermediate variable delta_time, and do it in the next line, but was trying to handle the case where millis() rolls over.

Still depends on full ms :frowning:

@KurtE has it right. A lot of work needs to be done to remove the delay call and make it reliable.

That's why I said:

This is not a fix, just a hack to verify the problem can be fixed.

I looked at FSP and the rest of the code and decided not to work on a pull request because the fix won't be a simple change to the timeout loop.

Uno R3 uses interrupts for Wire and can run my test in 307 μs. So I believe Wire can be fixed for R4 and be a bit faster than R3, close to 259 μs.

Yes, to timeout...

But it will return ASAP when the data is there. It won't be stuck waiting full milliseconds.

Now if you can change the API, such that you have a delay in Microsconds, change the calls from millis() to micros.

I have not looked at WIre much, but would not surprise me.

Good luck

Now if you can change the API, such that you have a delay in Microsconds, change the calls from millis() to micros.

Make sure you use a version of R4 that has the micros() fix.

Also I posted two other issues with Wire on github. One is setClock(freq) only works correctly if you use 100kHz, 400kHz, or 1Mhz. It has a case statement for an enum and will do nothing for other values.

Uno R3 accepts any frequency, but with no range check so only works if the result fits in the 8-bit baud register.

Note that in the UNO/Mega AVR core Wire library timeouts are in microseconds.
There is even an API call to set the timeout.

It seems silly not to align this core functionality and its API with the UNO/Mega AVR Wire library.

Also note that on the write side in the wire.cpp code for Uno R4 in the function write_to() the polling delay loop will not check or wait for status to change if the timeout is zero.
In the Uno/Mega AVR core Wire library, a timeout of zero is allowed and means poll forever with no timeout.

--- bill

Does this work with the WiFi version? I have the problem with the Minima that Wire.setClock(400000); doesn't seem to work...

  Wire.begin();                   // Setup I2C 
  Wire.setClock(400000);          // Doesn't work... just get default 100kHz

I have to stamp on the registers directly to get the right speed

#define IICBASE 0x40050000 
#define IIC1_ICMR1 ((volatile unsigned char  *)(IICBASE + 0x3102))
#define ICMR1_CKS_2_0  4  // Internal Reference Clock Select bit-0 position
#define IIC1_ICBRL ((volatile unsigned char  *)(IICBASE + 0x3110))
#define IIC1_ICBRH ((volatile unsigned char  *)(IICBASE + 0x3111))

void set_serial_iic1_400(void)
  {
  *IIC1_ICMR1 &= ~(0b010 << ICMR1_CKS_2_0);   // Clear bit to clear CKS[2:0] to 0b000, and change BRL & BRH for 400kHz 
  *IIC1_ICBRL = ~0x06; // = 0xF9;             // Originally 0xFB - change BRL
  *IIC1_ICBRH = ~0x09; // = 0xF6;             // Originally 0xFA - change BRH
  }

You are correct Wire.setClock(400000); does not work. There is a PR that is working on correcting this - Wire: fix setClock #46

Started out as an issue. But I am getting different results for clock with the proposed settings.

2 Likes

A post was split to a new topic: My two hour delay code only delays seven minutes