tracking down slow or inefficient code

Hi all,

This is a two-part question that I have not found an answer to thus far by searching here and elsewhere - likely because I'm not asking it correctly :slight_smile:

I have a project which is suffering from a slowdown or bottleneck in the code.

The hardware is a Mega running a couple of I2C devices including a 64x48 OLED screen, a sensor on non-hardware-based SPI, a few buttons for input and four pins for output.

The code, mostly borrowed from various sources, performs simple control calculations and runs a menu system and state machines for the UI and for the control of the hardware. The control problem has low precision requirements: a once per second update of reads, outputs and state machines is adequate and is implemented using a timer library.

I figured out after a bunch of trial and error that the slow-down (visible in scrolling text on the display and, under certain conditions, in the feedback from the UI LEDs) is caused by the function that reads the sensor over (bit-banged) SPI.

So the first part of the question is a general one: are there any methods, other than trial and error, of finding slow or inefficient code?

The second part is more specific: how would I best go about working around this problem (code is below)? I am guessing that moving the SPI communication to the dedicated hardware pins would help, but may not completely solve the problem. Since I don't particularly care about the how quickly, or exactly when the information from the SPI arrives, could I prioritize the I2C communication? There is a bunch of floating point calculation happening too - I am sure this isn't helping...

Many thanks!

void readThermo() {
  // DPRINT("[readThermo - currentBoilerTemp: "); DPRINTLN(currentBoilerTemp);
  total -= priorReadings[readIndex];
  priorReadings[readIndex] = max.temperature(100, RREF);
  total += priorReadings[readIndex];
  readIndex++;
  if (readIndex >= THERMO_SAMPLES) {
    readIndex = 0;
  }
  currentBoilerTemp = ((((total / THERMO_SAMPLES) - RAW_LOW)* REFERENCE_RANGE) / RAW_RANGE) + REFERENCE_LOW;
}

My guess (just off the cuff, not much code to look at) is your SPI is 99% of your issue. Displays are SLOW! All the stuff done in the processor, no matter how "slow", is blindingly fast. Except delay(), that's continental drift slow.

-jim lee

brian kernighan who coined the term Unix, wrote The C Programming Language, The Unix Programming Environment, ... suggests making measurements and not guessing.

you can use micros() to capture timestamps at various points during execution to find where the most time is spent and then narrow it down.

the function that reads the sensor over (bit-banged) SPI.

Use the hardware SPI. It is easily 100x faster and possibly more, depending on the sensor specs.

What is "max.temperature()" doing ?

The floating point library for the AVR library is fast. It is extremely optimized.
Perhaps there is something else that makes it slow. Can you show all of your code ?

What gcjr mentioned is like this:

unsigned long t1 = micros();
void readThermo();
unsigned long t2 = micros();
Serial.print( "readThermo took ");
Serial.print( t2 - t1)
Serial.println( " microseconds");

Even if your sketch has no delay(), when sending too much data to the serial port will also slow down the sketch. When the TX buffer in the Serial library is full, the function Serial.println() waits until there is a free spot to put a byte in the buffer.

bidoowee:
.. the slow-down ... is caused by the function that reads the sensor over (bit-banged) SPI.

Unless you are producing a product in large volumes where every penny counts and you have enough time to optimize your software, hardware peripherals are usually the better choice. Buy a bigger controller if you need multiple peripherals. One hour of your time saved buys you a nice fast microcontroller.

bidoowee:
.. are there any methods, other than trial and error, of finding slow or inefficient code?

With professional development tools and bigger devices, you can use instruction trace to see exactly which instruction, function or library uses the most amount of time. Otherwise you can use some timers to measure time. Some processors have cycle counters and other features that are usually used by debug tools.

bidoowee:
There is a bunch of floating point calculation happening too - I am sure this isn't helping...

Floating point in hardware is at least ten times faster than software floating point. This depends on the architecture, compiler and libraries used. Floating-point is often not necessary on small microcontrollers and can be replaced by fixed point code. For cases with only a few FP instructions you can save the memory needed for the library and with instructions executed ten thousand times as second or more you can save a lot of processing time. But as said before, if you do not have a high-volume product you can simply buy some better hardware. This can even be cheaper.

asking your board to do a little math wont slow you down. Any chip will do math unfathomably fast. even little Arduino chips will do millions of instructions a second. Writing to serial can slow things down a bit. Writing to a display can slow you down a lot. Make sure you are not updating your display any more than you have to! if you cant figure out where your problem is just comment out lines till you find the problem.

jremington:
Use the hardware SPI. It is easily 100x faster and possibly more, depending on the sensor specs.

Ah - I had read that it was at more like 3x faster. This is a significant difference and would definitely solve the problem. I guess I will have to modify the hardware and re-test.

Klaus_K:
Unless you are producing a product in large volumes where every penny counts and you have enough time to optimize your software, hardware peripherals are usually the better choice. Buy a bigger controller if you need multiple peripherals. One hour of your time saved buys you a nice fast microcontroller.
With professional development tools and bigger devices, you can use instruction trace to see exactly which instruction, function or library uses the most amount of time. Otherwise you can use some timers to measure time. Some processors have cycle counters and other features that are usually used by debug tools.

Just buy a bigger motor :slight_smile: Yes, this is true, but switching controllers at this point would require significantly more than an hour.

I am curious to know if by 'professional development tools and bigger devices' you are excluding an Arduino with a hardware debugger and something like Visual Studio?

Sorry - this post was supposed to have preceded the last one - not sure how that happened.

In any case - many thanks for the helpful replies and suggestions.

jimLee:
My guess (just off the cuff, not much code to look at) is your SPI is 99% of your issue. Displays are SLOW!

-jim lee

The SPI part is indeed the problem. I added some measurements to the code as I should have done in the first place to quantify the issue. Note however, that the screen is I2C and the refresh time had already been eliminated a as cause for the slow-down.

As many suggested, the floating point part is of microseconds magnitude and not an issue.

The max.temperature() function (code below) from the Adafruit_MAX31865 library takes 75-79 milliseconds - which is readily perceivable.

float  Adafruit_MAX31865::temperature(float RTDnominal, float refResistor) {
  // http://www.analog.com/media/en/technical-documentation/application-notes/AN709_0.pdf

  float Z1, Z2, Z3, Z4, Rt, temp;

  Rt = readRTD();
  Rt /= 32768;
  Rt *= refResistor;
  
  //Serial.print("Resistance: "); Serial.println(Rt, 8);

  Z1 = -RTD_A;
  Z2 = RTD_A * RTD_A - (4 * RTD_B);
  Z3 = (4 * RTD_B) / RTDnominal;
  Z4 = 2 * RTD_B;

  temp = Z2 + (Z3 * Rt);
  temp = (sqrt(temp) + Z1) / Z4;
  
  if (temp >= 0) return temp;

  // ugh.
  float rpoly = Rt;

  temp = -242.02;
  temp += 2.2228 * rpoly;
  rpoly *= Rt;  // square
  temp += 2.5859e-3 * rpoly;
  rpoly *= Rt;  // ^3
  temp -= 4.8260e-6 * rpoly;
  rpoly *= Rt;  // ^4
  temp -= 2.8183e-8 * rpoly;
  rpoly *= Rt;  // ^5
  temp += 1.5243e-10 * rpoly;

  return temp;
}

The actual read function:

uint16_t Adafruit_MAX31865::readRTD (void) {
  clearFault();
  enableBias(true);
  delay(10);
  uint8_t t = readRegister8(MAX31856_CONFIG_REG);
  t |= MAX31856_CONFIG_1SHOT;      
  writeRegister8(MAX31856_CONFIG_REG, t);
  delay(65);

  uint16_t rtd = readRegister16(MAX31856_RTDMSB_REG);

  // remove fault
  rtd >>= 1;

  return rtd;
}

bidoowee:
I have a project which is suffering from a slowdown or bottleneck in the code.

You need to post the complete program.

However, having said that, if it is a long program I am unlikely to study it - I'm lazy. Problems should be sorted out before a program gets big.

...R

That is why I didn't post the whole project - there is 38k of code that no one has the time to read. But the specific problem has been definitively identified: commenting out the call to the SPI sensor reading makes the problem go away. Further testing using millis() shows that the two library functions I posted a few minutes ago are taking nearly 80 ms to execute.

That'll be the 75 ms of delays.

Rewrite as a simple state machine.

OMG - It didn't even occur to me that there would be actual delay() calls in the library.

However, I'm not sure exactly how to go about remedying the situation with a state machine. Could you elaborate a little?

Well, assuming (I haven't seen the device datasheet) that a measurement takes a certain amount of time in the hardware, you could split that call into clearFault / enableBias, then return, having started a software timer. (10ms)
Subsequent calls would simply return if the timer hadn't elapsed (with a status to indicate that, of course).
Then the read-modify-write of the register and another software timer for 65ms.
When that has lapsed, read the result and return.
I'd probably make the function return a bool, and pass the result back by reference.
A couple of static variables (state and timer) plus a switch/case driven by the state variable.

That's a quick and dirty way.

Rough untested semi-pseudo code example....

int state = 0;
unsigned long delayTime = 1000;
unsigned long lastStateChangeTime;

void DoSPIWorkInBackground()
{
  unsigned long now = millis();
  if (now - lastStateChangeTime >= delayTime)
  {
    switch (state)
    {
      case 0:
        //Waiting for more work
        break;
      case 1:
        clearFault();
        enableBias(true);
        state = 2;
        delayTime = 10;
        break;
      case 2:
        uint8_t t = readRegister8(MAX31856_CONFIG_REG);
        t |= MAX31856_CONFIG_1SHOT;      
        writeRegister8(MAX31856_CONFIG_REG, t);
        state = 3;
        delayTime = 65;
        break;
      case 3:
        uint16_t rtd = readRegister16(MAX31856_RTDMSB_REG);
        // remove fault
        rtd >>= 1;
        // do something with new reading
        // back to waiting state...
        state = 0;
        delayTime = 1000;
        break;      
    }
    lastStateChangeTime = now;
  }
}

You might want to revisit state 0.

TheMemberFormerlyKnownAsAWOL:
Well, assuming (I haven't seen the device datasheet) that a measurement takes a certain amount of time in the hardware, you could split that call into clearFault / enableBias, then return, having started a software timer. (10ms)
Subsequent calls would simply return if the timer hadn't elapsed (with a status to indicate that, of course).
Then the read-modify-write of the register and another software timer for 65ms.
When that has lapsed, read the result and return.
I'd probably make the function return a bool, and pass the result back by reference.
A couple of static variables (state and timer) plus a switch/case driven by the state variable.

That's a quick and dirty way.

pcbbc:
Rough untested semi-pseudo code example....

Thank you both for your help. I will have a bash at finishing and testing the pseudo code. I agree that state 0 needs some work :wink:

bidoowee:
I am curious to know if by 'professional development tools and bigger devices' you are excluding an Arduino with a hardware debugger and something like Visual Studio?

With bigger device I mean devices which have hardware instruction trace e.g. ETM for ARM Cortex-M processors. Because the pins for trace are not available during development, devices with small pin count usually do not have trace. During development engineers can use a device from the same family with more pins.

As for the tools there are many vendors like ARM/Keil, IAR, Lauterbach, Segger and others that provide trace for embedded processors but they are usually too expensive for hobby use.
Visual Studio is certainly a professional development tool (it costs money and you can buy support for it). I have seen engineers using it for embedded development but they used a separate debug environment like the ones mentioned above.

Many thanks for the replies - also to Klaus_K for responding again to the initial general question.

I've spent a couple of hours trying to complete the code posted by pcbbc and have reached an impasse and have some I'm-a-bit-out-of-my-depth/am-I-going-in-the-right-direction questions:

1 - the readRegister and writeRegister functions are not public in the library. I am assuming that moving them to public in the header file is ok? Implicit in this assumption is a second one i.e. that the intention was for the new read function to reside in the project code and not the library.

2 - my inelegant solution for exiting the wait state 0 was to pass a variable to use as the trigger for a state change. I am sure that this is not how it is supposed to be done. I am also not really clear about how the function is intended to be used as the delayTime is set to 1000 in state 3 which suggests to me that the intention is for the function to be called continuously in a loop. Which leads me to wonder why there is a wait state at all...

3 - finally, i ran into a bug which has me stumped. As written, the code passes correctly from state 0 through 2 but hangs before it reaches state 3. After exiting state 2 the lastStateChangeTime is always the same as now

Code and modified library header below.

As always, all comments and aid are massively appreciated!

#include <SPI.h>
#include "Adafruit_MAX31865_custom.h"


int state = 0;
unsigned long delayTime = 1000;
unsigned long lastStateChangeTime;

// existing hardware needs this pin
#define SOLENOID_PIN				      12

// Use software SPI: CS, DI, DO, CLK
Adafruit_MAX31865 max = Adafruit_MAX31865(14, 15, 16, 17);


void setup() {

  Serial.begin(115200);

  pinMode(SOLENOID_PIN, OUTPUT);
  digitalWrite(SOLENOID_PIN, HIGH);

  max.begin(MAX31865_3WIRE);

}


unsigned long lastTime;

void loop() {
 unsigned long timeNow = millis();
 if (timeNow - lastTime >= 100) {
   DoSPIWorkInBackground(true);
   lastTime = timeNow;
 }
 // DoSPIWorkInBackground(true);
}



void DoSPIWorkInBackground(boolean readNow)
{
  unsigned long now = millis();
  if (now - lastStateChangeTime >= delayTime)
  {
    switch (state)
    {
    case 0:
      Serial.print("state: ");
      Serial.println(state);
      if (readNow) state = 1;
      //Waiting for more work
      break;
    case 1:
      Serial.print("state: ");
      Serial.println(state);
      max.clearFault();
      max.enableBias(true);
      state = 2;
      delayTime = 10;

      break;
    case 2:
      Serial.print("state: ");
      Serial.println(state);
      uint8_t t = max.readRegister8(MAX31856_CONFIG_REG);
      t |= MAX31856_CONFIG_1SHOT;
      max.writeRegister8(MAX31856_CONFIG_REG, t);
      state = 3;
      Serial.print("state set to 3 in state 2: ");
      Serial.println(state);
      delayTime = 65;

      break;
    case 3:
      Serial.print("state: ");
      Serial.println(state);
      uint16_t rtd = max.readRegister16(MAX31856_RTDMSB_REG);
      // remove fault
      rtd >>= 1;
      // do something with new reading
      // back to waiting state...
      Serial.print("reading: ");
      Serial.println(rtd);
      state = 0;
      delayTime = 1000;

      break;
    }
    lastStateChangeTime = now;
  }
  
  Serial.print("now: ");
  Serial.print(now); Serial.print("  |  ");
  Serial.print("lastStateChangeTime: ");
  Serial.print(lastStateChangeTime); Serial.print("  |  ");
  Serial.print("now - lastStateChangeTime: ");
  Serial.println(now - lastStateChangeTime);

  Serial.print("delayTime: ");
  Serial.print(delayTime); Serial.print("  |  ");
  Serial.println();

}

custom library header

/*************************************************** 
  This is a library for the Adafruit PT100/P1000 RTD Sensor w/MAX31865

  Designed specifically to work with the Adafruit RTD Sensor
  ----> https://www.adafruit.com/products/3328

  This sensor uses SPI to communicate, 4 pins are required to  
  interface
  Adafruit invests time and resources providing this open source code, 
  please support Adafruit and open-source hardware by purchasing 
  products from Adafruit!

  Written by Limor Fried/Ladyada for Adafruit Industries.  
  BSD license, all text above must be included in any redistribution
 ****************************************************/

#ifndef ADAFRUIT_MAX31865_CUSTOM_H
#define ADAFRUIT_MAX31865_CUSTOM_H

#define MAX31856_CONFIG_REG            0x00
#define MAX31856_CONFIG_BIAS           0x80
#define MAX31856_CONFIG_MODEAUTO       0x40
#define MAX31856_CONFIG_MODEOFF        0x00
#define MAX31856_CONFIG_1SHOT          0x20
#define MAX31856_CONFIG_3WIRE          0x10
#define MAX31856_CONFIG_24WIRE         0x00
#define MAX31856_CONFIG_FAULTSTAT      0x02
#define MAX31856_CONFIG_FILT50HZ       0x01
#define MAX31856_CONFIG_FILT60HZ       0x00

#define MAX31856_RTDMSB_REG           0x01
#define MAX31856_RTDLSB_REG           0x02
#define MAX31856_HFAULTMSB_REG        0x03
#define MAX31856_HFAULTLSB_REG        0x04
#define MAX31856_LFAULTMSB_REG        0x05
#define MAX31856_LFAULTLSB_REG        0x06
#define MAX31856_FAULTSTAT_REG        0x07


#define MAX31865_FAULT_HIGHTHRESH     0x80
#define MAX31865_FAULT_LOWTHRESH      0x40
#define MAX31865_FAULT_REFINLOW       0x20
#define MAX31865_FAULT_REFINHIGH      0x10
#define MAX31865_FAULT_RTDINLOW       0x08
#define MAX31865_FAULT_OVUV           0x04


#define RTD_A 3.9083e-3
#define RTD_B -5.775e-7

#if (ARDUINO >= 100)
 #include "Arduino.h"
#else
 #include "WProgram.h"
#endif

typedef enum max31865_numwires { 
  MAX31865_2WIRE = 0,
  MAX31865_3WIRE = 1,
  MAX31865_4WIRE = 0
} max31865_numwires_t;


class Adafruit_MAX31865 {
 public:
  Adafruit_MAX31865(int8_t spi_cs, int8_t spi_mosi, int8_t spi_miso, int8_t spi_clk);
  Adafruit_MAX31865(int8_t spi_cs);

  boolean begin(max31865_numwires_t x = MAX31865_2WIRE);

  uint8_t readFault(void);
  void clearFault(void);
  uint16_t readRTD();


  void setWires(max31865_numwires_t wires);
  void autoConvert(boolean b);
  void enableBias(boolean b);

  float temperature(float RTDnominal, float refResistor);

  uint8_t  readRegister8(uint8_t addr);
  uint16_t readRegister16(uint8_t addr);

  void     writeRegister8(uint8_t addr, uint8_t reg);

 private:
  int8_t _sclk, _miso, _mosi, _cs;

  void readRegisterN(uint8_t addr, uint8_t buffer[], uint8_t n);

  //uint8_t  readRegister8(uint8_t addr);
  //uint16_t readRegister16(uint8_t addr);

  // void     writeRegister8(uint8_t addr, uint8_t reg);
  uint8_t spixfer(uint8_t addr);
};


#endif

Let me start with a few simple things I would change.

Make all global variables that are only used inside one function static and declare them inside the function. This makes the code easier to read, you can reuse the names and you can copy functions from one project to another without looking for all those global variables.

void DoSPIWorkInBackground(boolean readNow)
{
static int state = 0; // only set to 0 once
...
}

Use enum for the states to make the code easier to read and modify. Declare enum inside the function.

void DoSPIWorkInBackground(boolean readNow)
{
enum SPI_STATE_TYPE { SPI_STATE_WAIT,
SPI_STATE_SEND,
...
SPI_STATE_RESTART = 255
};
...
}

For simple state forward use state++; this will allow you to insert states.
For backward steps use the new state enum name e.g. state = SPI_STATE_RESTART;

Move the state print before the switch statement. Reduces code and makes it easier to read.

Remove the time check from outside the state machine. States that are simple should just return. You can use time checks inside the states. In your case just create separate states that wait. Makes it easy to read. You can have multiple of these in your state machine.

Add a default to the switch statement. Just in case you mess things up. Especially during development. Just reset your variables in this case and set the state to the first state.

bidoowee:
1 - the readRegister and writeRegister functions are not public in the library. I am assuming that moving them to public in the header file is ok? Implicit in this assumption is a second one i.e. that the intention was for the new read function to reside in the project code and not the library.

Sounds reasonable. You could also add your own public functions that just call the private functions inside. This would keep the old API the way it is. Most libraries try to hide these low layer functions and there could be a reason for it.

bidoowee:
2 - my inelegant solution for exiting the wait state 0 was to pass a variable to use as the trigger for a state change. I am sure that this is not how it is supposed to be done. I am also not really clear about how the function is intended to be used as the delayTime is set to 1000 in state 3 which suggests to me that the intention is for the function to be called continuously in a loop. Which leads me to wonder why there is a wait state at all...

Do not worry about how it is supposed to be done too much. If it is easy to read and runs with good performance go for it. When you find another way later you can always change it. :slight_smile:
Regarding the time as said before I would only use it for specific cases. In your case this depends on the sensor. You need to have a look into the datasheet.

bidoowee:
3 - finally, i ran into a bug which has me stumped. As written, the code passes correctly from state 0 through 2 but hangs before it reaches state 3. After exiting state 2 the lastStateChangeTime is always the same as now

Can you show what your Serial Monitor is printing? The code looks OK. Do you have an oscilloscope/logic analyzer to confirm what is happening on the signals?

Additionally, reading values from external interfaces takes time. You might want to look at the library and change the code in a way that you do not wait for values to be returned. You can start the read and return later to get the value. But this is a question of how slow your interface is compared to your processor.