Precision timing issues

Hi Everyone,

So I am making this post mostly to talk myself (and others) through some common problems with precision timing. I do have an unsolved problem that I could use some help with, but to start off with I will discuss how to get extremely accurate timing from your Arduino. In theory.

I am working on a precision stepper motor library, using a STEP/DIR based motor driver. Microstepping at high speeds requires that STEP pin be toggled fairly rapidly and fairly accurately if you want to do things like precisely move objects with sub-mm accuracy and with accurate speeds.

For those who just want to learn from this post, when I say "motor", just read that as "something that I need to turn on and off with high accuracy".

There are two main problems to overcome here. The biggest and baddest is timing drift. Here's some code that causes it and also happens to be an amazingly common technique. Note, that lastStepTime and stepInterval (member variables of a class) are also unsigned longs (uint32_t).

// this causes timing error!
uint32_t curTime = micros()
if ((curTime - lastStepTime) >= stepInterval)
{
	// do stuff to step the motor
	
	lastStepTime = curTime; // this is the line of code that causes the problem
}

The micros() function is accurate, but it's not the problem. We're all doing other stuff in our sketches, so it's not like we have bare-bones loop() routines that can just check micros() every single microsecond to determine if it's time to do something. Instead, micros() is more likely to be sampled much less frequently, maybe once every 10 microseconds. So you might toggle a pin a few microseconds late, who cares? Well, if you look at the code above, you're not just going to toggle the pin late that one time. You're baking the error into lastStepTime and next time around it will be even worse. It will continue to get worse with every iteration. This is cumulative error, and in the real world it can get bad. It's not just something for nitpickers to worry about! Here's what it looks like when microstepping (1/8 steps) a stepper motor between 50 and 400 steps per second:

Okay, so the motor is running more slowly that desired. And it's bad - up to 10 steps/second off at higher speeds. You would not want to make a 3d printer with this type of problem.

The fix is, thankfully, amazingly easy.

// this is the correct way!
uint32_t curTime = micros();
if ((curTime - lastStepTime) >= stepInterval)
{
	// do stuff to step the motor
	
	lastStepTime += stepInterval; // all better now!
}

The caveat is that you need to be careful that you set lastStepTime = micros() at the start of a sequence of steps. If you let it lag behind, you're going to get very unwelcome high-speed pulses. For a stepper motor, it will simply stall and your position index will become completely off. For other devices, worse might happen. So take care, but it's an easy thing to do.

Here's what it looks like when we correct for the timing drift (and where a few folks might notice my pesky problem):

It's much better but now! Our worst-case error went down from being 10 steps/second off to being 1.7 steps/second off.

But we still have error and its appearance tells us something about why. We have the motor tending to always be faster than expected. The reason for that lies in integer math and the rounding error that results. When you calculate your timing interval, whatever digits might be after the decimal point just get thrown away. If stepInterval should be 400.9, you're going to get 400 from integer math. This rounding error will always err on the side of making things just a tad too fast and it will be more obvious at higher speeds. But the solution is NOT to use floats. They are slow.

Here's how my calculation of stepInterval looks:

uint32_t numer = 1000000UL;
uint32_t denom = (uint32_t)stepsPerSec << driveMode;
uint32_t stepInterval = numer / denom; // here comes the rounding error!

Absolutely slaying the rounding error is fairly easy. When you calculate stepInterval, determine the remainder (using modulus) of the division operation. And use that remainder to determine what proportion of steps should actually be delayed by 1 us. For example, if stepInterval should be 400.9 in a perfect world, then in the integer world we delay our steps by 1 us 90% of the time. On average, then, we get 400.9 even though we're using integer math.

Here's the code for how to do that when calculating stepInterval:

uint32_t numer = 1000000UL;
uint32_t denom = (uint32_t)stepsPerSec << driveMode;
uint32_t stepInterval = numer / denom;
uint8_t speedAdjustProbability = ((numer % denom) * 255) / denom;

And the code for putting that to use:

curTime = micros();
if ((curTime - lastStepTime) >= stepInterval)
{
	// do stuff to step the motor
	lastStepTime += stepInterval;
	
	// increment lastStepTime if needed to fix rounding error
	// Note: speedAdjustCounter is a byte (uint8_t)
	if (speedAdjustCounter++ < speedAdjustProbability) lastStepTime++;
}

Nice and simple. Here is that final result of all these corrections:

And now our worst case error is 0.5 steps/second. Perhaps that is "problem solved" in many cases, but not in my case.

There is still cumulative error! And there shouldn't be. I cannot figure out why the error is rising with speed. I have tried everything I can think of to get rid of it. Is it caused by micros() itself? When running, micros() disables interrupts very briefly... and my code is efficient enough that this happens very frequently. Could calling micros() every 9 us or so actually cause timing problems by frequently disabling interrupts (including timer overflow interrupts)?

It's not the crystal, because I'm using the Arduino to diagnose itself. Even if the crystal were way off, it wouldn't notice.

Here's where it gets stranger still. I've timed the frequency of step pulses with my oscilloscope (Siglent SDS 1052DL) and it is reading accurate step timing, assuming I'm using it correctly. But I doubt myself just enough... could be code I'm using to measure my timing be at fault? Here it is:

I'm aware it probably doesn't make perfect sense without the rest of the sketch it is part of. If needed I can post the whole thing... but as an attachment no doubt.

void measureSpeed(uint32_t overDist)
{
	int32_t startPos = mot.getPos();
	int32_t targetPos = mot.getTarget();
	int32_t testEndPos = (targetPos > startPos) ? (startPos + overDist) : (startPos - overDist);
	uint32_t startTime, endTime;
	
	uint16_t startSpeed = mot.getCurSpeed();
	startTime = micros();
	while (mot.work())
	{
		if ((targetPos > startPos) && (mot.getPos() >= testEndPos)) break;
		else if ((targetPos < startPos) && (mot.getPos() <= testEndPos)) break;
	}
	endTime = micros();
	uint16_t endSpeed = mot.getCurSpeed();
	int32_t endPos = mot.getPos();
	
	// calculate speed
	int32_t dist = endPos - startPos; // microsteps
	uint32_t time = endTime - startTime; // us
	float measuredSpeed = (1000000.0 * dist) / (time * mot.fullStepVal);
	
	Serial.print(String(measuredSpeed));
	Serial.print(F(" st/s avg ("));
	if (startSpeed == endSpeed)
		Serial.print(String(startSpeed));
	else
	{
		Serial.print(String(startSpeed));
		Serial.print(F(" - "));
		Serial.print(String(endSpeed));
	}
	Serial.println(F(" st/s)"));

}

It's not pretty but it's a testing sketch, I haven't quite taken the time to make it as elegant as it could be.

Anything stand out there as a problem? This is driving me nuts. There absolutely should be a bit of error, but it should oscillate around 0. Not have a tendency to rise.

The business of using

lastStepTime += stepInterval; // all better now!

was extensively discussed in Several Things at a Time

If you want very accurate timing with short intervals why not use one of the hardware Timers ?

...R

All three timers seem to be used by Arduino core, so using any of them inevitably interferes with something. Timer 1 seems the safest but seemed to be used by some parts of the PWM library (I went snooping around the core code and saw it was being used in there).

I actually tried using Timer1. It absolutely made things better (and more efficient!) by counting clock cycles. But only marginally so. You really can go quite a long ways with micros(), with the proper corrections. When it came down to it, I compared the code I wrote for counting clock cycles with what micros() was doing... barring some #defines to account for different hardware, it was basically the same deal. Setting the timer to increment every tick, and dealing with the overflow ISR. In fact you (by which I mean whoever maintains the Arduino core) could probably just add a ticks() function rather easily to put all of these woes to rest once and for all. That way, portability is more likely because all the Arduino compatible folks out there will want to make sure that the public API still works on their hardware. But some random guy's library? Na, they won't care if that breaks.

And I got to thinking. Doesn't using Timer 1 myself reduce portability? If I code for the Timer1 on the ATMega328, will my library work with other hardware? What about "Arduino compatible" stuff like the Teensy? That may not be a favorite topic here, but the Arduino ecosystem has really taken off and it's nice to build libraries that are as portable as possible. All of those "Arduino Compatible" folks are just emulating your public API, so those really are the safe options.

I feel I'm already tempting fate enough by using some of the core's secret features (digitalPinToBitMask(), portOutputRegister(), digitalPinToPort()) to bit bang the STEP pin directly but still allow the library to be used by the end user with standard Arduino pin numbers. Seriously, all of these hidden features, constants like F_CPU, etc.. should be better documented! Some of them are amazingly useful.

Anyway, any idea as to why I'm seeing cumulative error?

Rylee_Isitt:
And I got to thinking. Doesn't using Timer 1 myself reduce portability?

Probably. But I had the impression you have a very specialised application where that would be of little consequence.

...R

This is for a general purpose stepper motor library.

Although such things may be used purely for simple robotics, home automation, etc... there are times when accuracy will be important, and I didn't want to preclude those uses.

However, looking at the code in wiring.c, I see that there are a few ifdef/elif/else preprocessor instructions that account for different hardware. If I do the same with Timer1 I believe that would make the code fairly portable. Specifically, the big gotcha is that the timer 1 overflow ISR has a different name for the ATTiny series, and some of the registers also seem to have different names for different chips.

I half expect my problem to mysteriously vanish tonight after I do a small refactor. I may never figure out the cause.

first micros() has a resolution of 4 uSec

Are you familiar with the Bresenham line algorithm for lines with a slope?
In a similar way you can sum the error made in lastStepTime by adding the error per step.
If it comes above the threshold_error lastStepTime is increased by 1.

working: due to the integer division in stepInterval = time_in_usec / steps; stepInterval is a bit too short (in time). The error per step equals ((float) time_in_usec - stepInterval * steps) / steps; . We can sum the error per step and if that sum >= 1 (threshold) we must increase stepInterval by 1. Then of course we must decrease the sum of the errors by 1 (threshold).
To remove the division by steps and the floating point operations we multiply the error logic by steps resulting in integer math only:
error_per_step = time_in_usec - stepInterval * steps;
threshold = steps;

This leads to the following code:

//
//    FILE: BresenhamTiming.ino
//  AUTHOR: Rob Tillaart
// VERSION: 0.1.00
// PURPOSE: demonstration
//    DATE: 2015-10-07
//     URL: http://forum.arduino.cc/index.php?topic=351925.0
//
// Released to the public domain
//

uint32_t time_in_usec = 0;
uint32_t steps = 0;
uint32_t stepInterval = 0;
uint32_t step_error = 0;
uint32_t threshold_error = 0;
uint32_t total_error = 0;

uint32_t lastStepTime = 0;

void setup()
{
  Serial.begin(115200);
  Serial.print("Start ");
  Serial.println(__FILE__);

  time_in_usec = 1000000UL;
  steps = 154;                                    // change this to the nr of steps you like.
  stepInterval = time_in_usec / steps;
  step_error = time_in_usec - stepInterval * steps;
  threshold_error = steps;
  total_error = 0;
}

void loop()
{
  uint32_t curTime = micros();
  if ((curTime - lastStepTime) >= stepInterval)
  {
    // do stuff to step the motor
    Serial.println(lastStepTime);

    lastStepTime += stepInterval;
    total_error += step_error ;
    if (total_error >= threshold_error)
    {
      lastStepTime++;
      total_error -= threshold_error;
    }
  }
}

please give it a try, I am curious what error you find for the different values.

a variation of the code checks if the timing matches 1000000 exactly for all values between 1 and 5000

//
//    FILE: BresenhamTiming.ino
//  AUTHOR: Rob Tillaart
// VERSION: 0.1.01
// PURPOSE: demonstration
//    DATE: 2015-10-07
//     URL: http://forum.arduino.cc/index.php?topic=351925.0
//
// Released to the public domain
//

uint32_t time_in_usec = 0;
uint32_t steps = 0;
uint32_t stepInterval = 0;
uint32_t step_error = 0;
uint32_t threshold_error = 0;
uint32_t total_error = 0;

uint32_t lastStepTime = 0;

void setup()
{
  Serial.begin(115200);
  Serial.print("Start ");
  Serial.println(__FILE__);

  for (int i = 1; i < 5001; i++)
  {
    time_in_usec = 1000000UL;
    steps = i;
    stepInterval = time_in_usec / steps;
    step_error = time_in_usec - stepInterval * steps;
    threshold_error = steps;
    total_error = 0;
    lastStepTime = 0;

    test(i);
  }
}

void loop()
{
}

void test(int n)
{
  Serial.print(n);
  Serial.print('\t');
  while (lastStepTime <= 1000000)
  {
    uint32_t curTime = micros();
    if ((curTime - lastStepTime) >= stepInterval)
    {
      // do stuff to step the motor
      if (lastStepTime == 1000000)  // check if it reaches 1000000 exactly
      {
        Serial.print(lastStepTime);
      }

      lastStepTime += stepInterval;
      total_error += step_error ;
      if (total_error >= threshold_error )
      {
        lastStepTime++;
        total_error -= threshold_error;
      }
    }
  }
  Serial.println();
}

give it a try

Hi Rob,

That seems to be a similar idea to what I'm doing already (see my original post, starting off where I write "But we still have error and its appearance tells us something about why").

Essentially I am using an extra byte to hold the decimals of the division operation, multiplied by 255. Then I increment a counter every time the timing condition holds true. If the counter is less than the remainder, the next scheduled step is offset by 1 microsecond.

So, if your decimals are 0.75, that becomes a byte value of 191, and the next step will be delayed by 1 microsecond between a counter value of 0 and 190. From 191 to 255, the next step will not be delayed. The next increment of the counter causes is to overflow back to 0 and we start adding the 1 us delays back in again. This seems to work extremely well, and you can see that most of the timing error is gone. Leaving a smooth non-linear curve.

I believe I should expect no more than 8 us of total error, regardless of speed. That is, the start time for a sequence of steps may be up to 4 us off, and the end time may also be up to 4 us off. If that were the case I would expect a fixed amount of error regardless of speed - my last graph would have no significant slope.

The fact that it does have a slope... and a non-linear slope, with apparently discrete levels of error (as you can see more prominently in the beginning), suggests something funny is going on. Maybe I've got a > when I should have a >=, or maybe the other way around. But I've gone and fiddled with most of that logic, and by and large changing those operators hardly makes a difference.

At higher speeds, it is necessary for the timing to be checked more frequently. At 400 steps/sec, which when 1/8 stepping requires a 3200 Hz square wave, I should be checking the timing at least every 312 microseconds and probably much more frequently. Turns out I'm checking it every 9 microseconds, so I don't think that's the problem either.

I might try your variation of the solution tonight and see how it goes. However I am more likely to ditch micros() and use a hardware timer. Funny because the code I'd write for a hardware timer is almost exactly the same as what appears in the micros() function in wiring.c... so close that it feels like re-inventing the wheel.

I'd set it to increment every clock cycle and just use F_CPU in my calculation of how many clock cycles to wait between steps. Doing that I could probably ditch the rounding error correction entirely.

Well I figured it out, where that remaining error is coming from. I didn't do anything wrong, and micros() is working as it should. Here's the explanation...

  1. The absolute amount of timing error (how accurate your start and end points are) are largely determined by how often you check timing. The resolution of micros() is a fixed quantity, but how often you check timing is variable.

  2. The clock cycles / number of instructions that run when you check timing is greater when it's time to step. That's because the bit of code that checks to see if it's time to step contains a block of instructions that only run when the timing check evaluates to TRUE.

  3. At higher speeds, you step more frequently. So, the speed of your timing code decreases, on average, at higher speeds.

  4. And, because the speed of your timing code decreases, it takes more time to execute, and is executed less frequently.

  5. So, no matter what you do, you will get more error at higher speeds so long as you use this micros() based method. That said, 400.5 steps/second instead of 400 steps/second is not much error. But maybe to some folks, it is enough to cause problems. You can avoid most (all?) of this error by using a hardware timer to trigger your stepping as an ISR (interrupt service routine - there are plenty of good how-tos online for setting those up). At that point your limits are mostly just the speed of your MCU.

  6. Another "solution" is to refactor and tweak and optimize until you get a level of precision that is acceptable to you. That just comes down to doing as much math & calculations outside of your timing check as possible, so that it only has to do the most trivial of tasks.

I intend to do some refactoring that I've been meaning to do for many days now that will shave off about 30% of the clock cycles needed by my timing routine. That includes using a hardware timer (Timer 1). Doing that, I will still see an upward error curve but I expect it to be reduced.

I'll post again when I have succeeded :slight_smile:

Rylee_Isitt:
5) And, because the speed of your timing code decreases, it takes more time to execute, and is executed less frequently.

I'm not quite clear about this. Is it the measuring that is causing the problem - in other words would the code work properly if you were not checking whether the code worked properly?

You can avoid most (all?) of this error by using a hardware timer to trigger your stepping as an ISR (interrupt service routine -

Reply #1 ?

Plus it must free up a lot of MCU cycles for other stuff.

...R

Essentially I am using an extra byte to hold the decimals of the division operation, multiplied by 255.

Note that (integer) division is a new cause for errors.

Hi Robin,

The timing error is real, but it is misleading to express it as speed error as I have done. It is heavily influenced by how you measure it. The error is really just jitter of where exactly the pulses lie, about +/- 9 us in my case since that's how long it takes my code to run through a single iteration before it can check timing again.

On average, though, the frequency of the pulses is reading on my scope as being right on the money. If you look at the trace, each pulse jitters around a little bit, but the average is correct. That jitter is expected when not using a compare match timer interrupt (CTC mode). You can reduce the jitter, but as long as you aren't using timer interrupts to trigger steps, it's always going to be there.

Measuring things by getting the time at the start, sending the motor over a known distance, getting the time at the end, and then back-calculating the step frequency assuming perfectly even steps can give you a poor understanding of what's going on. My frustration was caused by misinterpreting the results of my software-based measurements, and then questioning my oscilloscope as a result. "My code says there's error, my scope does not. What's going on? Maybe my scope isn't precise enough to see the error...". Well, it is, it just reports it in a different, less misleading way.

If you find yourself trying to decide if your relatively expensive oscilloscope is lying to you versus some code you whipped up in 15 minutes, trust the scope first :slight_smile:

I attempted using a hardware timer. It hasn't quite worked out yet. The problem is really mine, and has to do with my specific library's goals. I am doing two things - stepping at a variable speed, and accelerating/decelerating at a variable speed. Because I did not want to occupy two hardware timers using compare match interrupts, I thought I'd simply create a clock cycle counter and use a 16 bit overflow counter - similar to what micros() does but with 64 times the precision.

Unfortunately I have not got this working yet. When you have a timer incrementing literally every clock cycle, you can't actually read its value and the value of a volatile overflow counter simultaneously without first stopping the timer. Once you stop the timer, all is good in terms of getting the values of those two variables. Except that stopping the timer causes clock cycles to go unnoticed, which actually throws off timing quite badly. Every single instruction you use to try to compensate for that is even more clock cycles missed.

In order to get it working properly, I'd need to determine the exact number of clock cycles occupied by every instruction that runs between TCCR1B = 0; and TCCR1B = 1; and then increment TCNT1 to account for those skipped clock cycles before turning the timer back on with TCCR1B = 1; Of course something like TCNT1 += 4 also requires clock cycles to run, so that needs to be factored in there as well. Probably easy enough through trial and error, just increase the amount being added to TCNT1 until the timing is good. I just haven't gotten around to that yet.

When I had the hardware timer running, performance was about 10% better, which is hardly worth celebrating over. I've achieved more impressive performance improvements by finding ways to compare variables to 0, instead of comparing them to other variables with greater than/less than operators. If statements can be oddly intensive operations at times...

Rob,

I always had that big division operation in there, so it's not a "new" source of error :slight_smile: My library allows for a specification of literally any motor speed (within the confines of a 16 bit integer value) so I am not using pre-calculated intervals. Each time a speed is specified, that division operation determines the interval to use, and the following correction factor captures 8 bits of "decimal places". That is more than enough to make the integer rounding error essentially vanish to the extent that I am able to measure with my equipment. The error I am left with is jitter caused by the 4 microsecond granularity of micros() combined with my code, which only checks the timing about every 9 microseconds.

Or maybe you mean the division operation that appears here:

uint8_t speedAdjustProbability = ((numer % denom) * 255) / denom;

In which case, yes, there is some rounding error creeping in there. But I know it's not a hugely noticeable effect because when I have speeds that result in perfectly even multiples (so the division operation has no remainder and zero rounding error), I still see the expected amount of timing error. Even if I succeed in implementing the more precise hardware timer, the timing jitter will remain because I am still limited in how often to check the timing.

The only way I can see to make this ultra-precise is to use a compare match timer interrupt, which, given my library's goals, is not really a good option. Coding for portability and trying not to break Arduino core functions like PWM, tone(), delay(), etc has its drawbacks.

If you are interested, the actual library is available at github.

I'm fairly happy with it, but I continue to experiment with ways to increase its performance and accuracy while keeping its compiled size and use of dynamic memory to a minimum.

Rob, I still intend to try out what you're suggesting. Maybe it will be better than what I'm doing. We'll see :slight_smile:

I have finally implemented and tried out Rob's suggestion to compare with my own method for correcting the rounding error when determining stepInterval.

His method works better (less error overall) but has a larger compiled size and requires more dynamic memory. To some extent that can likely be optimized further. Here are the details. Note, I've changed how I make my graphs, they now show the error as a percentage deviation of the stepping speed: [100%*(measured speed / set speed)] - 100%

My method:
Compiled size of testing sketch: 17110 bytes
Global variables use 390 bytes of dynamic memory

timing loop benchmark without acceleration
Testing 50 st/s: 8.96 us avg over 446413 calls
Testing 100 st/s: 9.00 us avg over 222350 calls
Testing 200 st/s: 9.08 us avg over 110202 calls

timing loop benchmark with 200 st/s^2 acceleration
Testing 50 st/s: 13.41 us avg over 315806 calls
Testing 100 st/s: 13.85 us avg over 177941 calls
Testing 200 st/s: 14.85 us avg over 130716 calls

Rob's method:
Compiled size of testing sketch: 17246 bytes (+136)
Global variables use 400 bytes of dynamic memory (+10)

timing loop benchmark without acceleration
Testing 50 st/s: 8.98 us avg over 445510 calls (+0.02)
Testing 100 st/s: 9.03 us avg over 221459 calls (+0.03)
Testing 200 st/s: 9.16 us avg over 109155 calls (+0.08)

timing loop benchmark with 200 st/s^2 acceleration
Testing 50 st/s: 13.43 us avg over 315418 calls (+0.02)
Testing 100 st/s: 13.86 us avg over 177738 calls (+0.01)
Testing 200 st/s: 14.82 us avg over 130981 calls (-0.03)

You can see in both cases, the timing error is constant with speed (roughly). That's what I expected all along. I just was calculating the timing error incorrectly.

Rob's solution corresponds to the motor finishing its movement about 560 us earlier than expected regardless of speed. Since that is still much greater than the granularity of micros() or the frequency at which I check the value of micros(), I wonder if there is still a source of unaccounted for error. 0.06% of timing error over 1 second of stepping is not very much but I will continue to look into this.

Interesting that the timing error is one of 8 discrete values...

Rob, if I commit your solution to the library, would you like credit as a contributor? If so, let me know the details.

Victory!

There were some timing problems with my measurement code that were interfering with its accuracy.

After correcting those problems, the remaining timing error was so small that I had to increase the number of decimal places displayed by Serial.print from 2 to 4 in order to even see that there was still error. Now that's a nice problem to have.

The accuracy is very good, no doubt suitable for even highly demanding applications.

Both methods of rounding error correction are suitable. Rob's is easier to understand and document and better at correcting the rounding error, whereas mine has a lower resource usage. So which to use depends on your thoughts on those tradeoffs.

Rob, if I commit your solution to the library, would you like credit as a contributor? If so, let me know the details.
Yes please, credits would be nice. Maybe also a link to this thread?

After correcting those problems, the remaining timing error was so small that I had to increase the number of decimal places displayed by Serial.print from 2 to 4 in order to even see that there was still error. Now that's a nice problem to have.

Then a new source of errors might come in, the accuracy of the float on an Arduino is only ~7 digits :slight_smile:

About the size

Global variables use 400 bytes of dynamic memory (+10)
In the demo sketch above I declared every variable as 32 bit, you can make some of them int16_t so they use less RAM

//
//    FILE: BresenhamTiming.ino
//  AUTHOR: Rob Tillaart
// VERSION: 0.1.02
// PURPOSE: demonstration
//    DATE: 2015-10-07
//     URL: http://forum.arduino.cc/index.php?topic=351925.0
//
// Released to the public domain
//

uint16_t threshold_error = 0;
uint16_t stepInterval = 0;
uint16_t step_error = 0;
uint16_t total_error = 0;

uint32_t lastStepTime = 0;

void setup()
{
  Serial.begin(115200);
  Serial.print("Start ");
  Serial.println(__FILE__);

  for (int steps = 1; steps < 5001; steps++)
  {
    stepInterval = 1000000UL / steps;
    step_error = 1000000UL - stepInterval * steps;
    threshold_error = steps;
    total_error = 0;
    lastStepTime = 0;

    test(steps);
  }
}

void loop()
{
}

void test(int n)
{
  Serial.print(n);
  Serial.print('\t');
  while (lastStepTime <= 1000000)
  {
    uint32_t curTime = micros();
    if ((curTime - lastStepTime) >= stepInterval)
    {
      // do stuff to step the motor
      if (lastStepTime == 1000000)  // check if it reaches 1000000 exactly
      {
        Serial.print(lastStepTime);
      }

      lastStepTime += stepInterval;
      total_error += step_error ;
      if (total_error >= threshold_error )
      {
        lastStepTime++;
        total_error -= threshold_error;
      }
    }
  }
  Serial.println();
}

Compared to the 1.01 version this uses 16 bytes less RAM and 204 bytes less FLASH,
16 bit values might also be slightly faster.

However it fails for the values for steps 1..16 as 1000000/steps does not fit in an uint16.

Give it a try,

fixed first 16 by making stepInterval 32 bit

//
//    FILE: BresenhamTiming.ino
//  AUTHOR: Rob Tillaart
// VERSION: 0.1.03
// PURPOSE: demonstration
//    DATE: 2015-10-07
//     URL: http://forum.arduino.cc/index.php?topic=351925.0
//
// Released to the public domain
//

uint16_t threshold_error = 0;
uint32_t stepInterval = 0;
uint16_t step_error = 0;
uint16_t total_error = 0;

uint32_t lastStepTime = 0;

void setup()
{
  Serial.begin(115200);
  Serial.print("Start ");
  Serial.println(__FILE__);

  for (int steps = 1; steps < 5001; steps++)
  {
    stepInterval = 1000000UL / steps;
    step_error = 1000000UL - stepInterval * steps;
    threshold_error = steps;
    total_error = 0;
    lastStepTime = 0;

    test(steps);
  }
}

void loop()
{
}

void test(int n)
{
  Serial.print(n);
  Serial.print('\t');
  while (lastStepTime <= 1000000)
  {
    uint32_t curTime = micros();
    if ((curTime - lastStepTime) >= stepInterval)
    {
      // do stuff to step the motor
      if (lastStepTime == 1000000)  // check if it reaches 1000000 exactly
      {
        Serial.print(lastStepTime);
      }

      lastStepTime += stepInterval;
      total_error += step_error ;
      if (total_error >= threshold_error )
      {
        lastStepTime++;
        total_error -= threshold_error;
      }
    }
  }
  Serial.println();
}

compared to 1.01 version
14 bytes less RAM
184 bytes less FLASH

so still interesting.

Hi Rob,

My plan is to whip up an excel sheet to see what the valid ranges are for those variables and set the types accordingly.

However I'm still internally debating about which error correction solution to use. Mine is not quite as good as currently implemented, but only requires two bytes of dynamic memory and can be improved simply by moving up to two 16 bit variables.

I haven't actually tested what happens with my solution when I use larger variables to store more "decimal places". Something I will have to do today.

Edit:
I tested 16 bit variables for my method, as intended. It actually makes things quite a bit worse according to the measurement code. I say "according to" because the measurement code is calculating the "actual speed" over a 1 second period, when 1/8th stepping at various speeds. At the highest speed, 400 steps/sec, that is 3200 microsteps that the motor advances through before a "actual speed" is calculated. With 16 bit variables to track the advancement through my correction factor, 65535 microsteps are required to advance through the entire counter variable and apply the proper number of increments to the stepInterval. With 8 bit variables, only 255 microsteps are required.

So there is a tradeoff, using my method, between how many "decimal places" you capture, and how long it takes for them to be fully applied against stepInterval.

This suggests that the main difference between our approaches is that yours, Rob, applies corrections in a more "as needed" way. Perhaps it is possible for me to pull a few tricks here.

I had thought of capturing the last 8 bits of the curTime (which is the stored value of micros()) and using that as my "counter". In effect, it will act like a pseudo-random number and would make the application of my correction factor more "fuzzy logic", plus will save me 8 bits of dynamic memory. I'm going to try it.

So after a lot of trial and error, it became clear that an important factor in determining how well the rounding error correction works is how evenly distributed the corrections are. In my original method, the corrections were applied in clusters, which made short duration movements less accurate than longer duration movements.

That is why Rob's technique was initially doing noticeably better: it distributes the corrections extremely evenly (probably as good as it can get), so that even over a short duration, the right amount of correction is applied.

However, there is a way to achieve this with my technique, by iterating through the counter in a way that causes its value to jump around like a random number would, yet still hit every possible value exactly once per cycle, with a perfectly uniform distribution.

correctionCounter += 17;

works far better than

correctionCounter ++;

The result is to rapidly flip between applying corrections and not applying corrections, and in the correct proportion, so that even short duration movements remain accurate.

Here is some example code. My real code differs in several ways because it's part of a larger library with other stuff going on. This example code is simplified and for illustrative purposes only.

// in a real world situation, the code below
// would be probably be split between multiple methods
// of a library/class, and set member variables instead of local variables

// the speed you want to step at
unsigned int stepsPerSec = 140;

// a tracking variable for storing the time at which a step was last made
// make sure to set it equal to micros() very soon before starting a step sequence
unsigned long lastStepTime = micros();

// this is half a second because we're making a square wave on the STEP pin with equal time at high and low
const unsigned long halfPeriod = 500000UL;

// calculate the stepInterval
// the division operation introduces rounding error
unsigned long stepInterval = halfPeriod / stepsPerSec;

// this captures 8 bits of "decimal places"
// that would otherwise be lost in the above division operation
// another way of thinking about this variable is that it contains
// the number of times to apply a timing correction for every 256 steps
// if the value is 128, then a correction will be applied for 128 steps out of every 256
// which would correspond to .5  in terms of lost decimal places
byte errorCorrection = ((halfPeriod % stepsPerSec) << 8) / stepsPerSec;

// this is a counter for determining (in combination with errorCorrection)
// whether to apply a timing correction
byte correctionCounter = 0;

// make sure this is an odd number that, when repeatedly added to correctionCounter,
// and taking into account rollover, will cause fairly large jumps that still
// hit every single value between 0 and 255 (inclusive) with a perfectly uniform distribution
// Potentially useful formula: (2^x)+1
// So usable values are 3, 5, 9, 17, etc
// Some trial and error will tell you which you prefer.
const byte counterIncrement = 17;

// in a real situation, such as a library, the code below would likely be part of a
// class method (I call mine work()) that needs to be called repeatedly and often
while (stepping)
{
	// Step, if it's time...
	// Adding stepInterval to lastStepTime produces more accurate timing than setting lastStepTime = curTime
	unsigned long curTime = micros();
	if ((curTime - lastStepTime) >= stepInterval)
	{

		// this part should also contain code that actually makes the motor move

		// increment lastStepTime
		lastStepTime += stepInterval;

		// this adds timing corrections
		if (correctionCounter < errorCorrection) lastStepTime++;
		
		// increment the error correction counter
		correctionCounter += counterIncrement;

	}
}

Here is a real world test with my library, using this technique.

And Rob's method, unchanged from before (but with an expanded vertical axis to match the graph above)

Rob's method has a lower range of error but about the same absolute amount of error as mine (0.0016% at most, which is tiny). Because of the lower range of error, Rob's method may have a slight advantage in some circumstances. This is definitely splitting hairs. My method is less resource intensive, using only two bytes in dynamic memory for the correction variables. It is what I will be using for the library.

You'll notice that there are somewhat discrete (though not perfectly so) amounts of error. This is due to the 4 microsecond resolution of micros() (it always returns a multiple of 4). In agreement with that, the difference between two adjacent quantities of error is 0.0004% (0.000004 as a proportion), which multiplied by 1000000 (number of microseconds in one second - the duration of the test) is 4 microseconds.

So you can easily re-interpret the vertical axes of the graphs above. Each horizontal line of points outwards from 0 represents 4 additional microseconds of deviation from the expected 1000000 microseconds.

In other words, much of the error you're seeing here is due to the resolution of micros(), and we are getting close to the limits of what can be done with micros(). The error is really very small and amounts to double digit microseconds of error over one entire second of stepping - and it will remain a fixed quantity, so 10 seconds of stepping will still have only double digit microseconds of error.

Rob, although I have not used your suggested code in the end, I do want to thank you for giving me a reason to investigate why mine was not working as well - and finding a way to fix it :slight_smile: