Go Down

Topic: pulseIn Bug (Read 5467 times) previous topic - next topic

iwoloschin

I thought that might have something to do with it too, but I don't know why that would matter.  My timeout (usLong) is set to 100 mS, or 10,000 uS.  According to the reference page*, it needs to be between 10 uS and 3 minutes (1.8 x 108 uS), and it also says the following:

Quote
Reads a pulse (either HIGH or LOW) on a pin. For example, if value is HIGH, pulseIn() waits for the pin to go HIGH, starts timing, then waits for the pin to go LOW and stops timing. Returns the length of the pulse in microseconds. Gives up and returns 0 if no pulse starts within a specified time out.


I don't see why that change should be affecting this, as the pulses I'm measuring are 2 orders of magnitude below my timeout period.

*http://arduino.cc/en/Reference/pulseIn

robtillaart


My bugreport - http://code.google.com/p/arduino/issues/detail?id=675 - included math problems due to macro expansion with timeout too ... But the effect was that "long" timeouts went off too early due to overflow, not that it gave too large values....
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

iwoloschin



My bugreport - http://code.google.com/p/arduino/issues/detail?id=675 - included math problems due to macro expansion with timeout too ... But the effect was that "long" timeouts went off too early due to overflow, not that it gave too large values....



It looks like pulseIn is reporting the correct pulse length, just not returning right away which causes it to miss the next (closely, but not too closely) spaced pulse.

That could be due to unoptimized math happening at the end, I haven't really looked into that though.

bperrybap

When I look at your scope captures, I do not see a bug in pulsein() at all.
It sure looks to me like pulsein() is working exactly as it is supposed to
in both the "working" picture and the "non working" pictures.
(the problem appears to be elsewhere)

From what I can tell it looks like the Red wave is being set high just before pulsein() is called
and lowered just after pulsein() returns and the blue one is the signal being timed with pulsein().

Note: Given that there a different number of red waves in the two pictures,
and the timing in between red pulses (calls to pulsein() ), is substantially different,
it leads me to question whether the two pictures were actually taken
with identical sketches.
Ignoring that for a moment.......


You can see that in the "non working" picture, the red wave rises later - much later
than in the "working" picture.
So that in the "non working" case, the 2nd pulsein() appears to start when the wave to be timed
is low and since it has been told to time a LOW pulse, it must now wait for it to go high and then low again.
And if you look at when the RED wave drops, I don't not see a very long amount of time
after the waves goes high after the timing of the low pulse ends.
It looks to be about the overhead of digitalWrite().
And the amount of time seems to be consistent with the first pulsein().

BTW. For any serious type of timing you don't want to use digitalWrite() as it is WAY too slow.
Use direct port i/o. The overhead will reduce from close to 6us to 125ns to strobe a pulse.

But overall, it looks to me like pulsein() is working just fine.

The question is what is the code doing after the first pulsein() returned which must happen
in order to set the red wave low and the setting of the red wave high which is done
just before the second call to pulsein()?
This is code that is outside pulsein().

From what I see in the pictures, it appears that the code used to create the waves
in the two pictures is not the same, either that or something kicks in after the first pulse and is
hammering the processor with interrupts and slowing it down.

--- bill

Graynomad

I think you are correct bperrybap, it appears that the digitalWite() is taking too long and the next pulseIn() occurs in a LOW and therefore has to wait. 

Has digitalWrite() changed between versions?

Using direct port writes should prove that one way or another.

_____
Rob
Rob Gray aka the GRAYnomad www.robgray.com

bperrybap

#20
Jan 14, 2012, 08:58 am Last Edit: Jan 14, 2012, 09:01 am by bperrybap Reason: 1

I think you are correct bperrybap, it appears that the digitalWite() is taking too long and the next pulseIn() occurs in a LOW and therefore has to wait.  

Has digitalWrite() changed between versions?

Using direct port writes should prove that one way or another.

_____
Rob


Yes digitalWrite() has changed.
In fact I pushed the Arduino team to change it because as it was it would corrupt data bits in a port
when pins were shared between an ISR and normal code on the same port.
It took about a year to convince them that they needed to make the change.
http://code.google.com/p/arduino/issues/detail?id=146
In 0019 digitalWrite() and pinMode() were changed to save the SR, mask interrupts and then restore the SR
to set/clear bits.


But I don't believe that is the issue here.
If you look at the graphs, they do not seem to indicate this.
In both the working and non working times, the first pulsein() works just fine.
It is on the "non working" graph that you see a very long pause between the dropping of Y/RED
and the rising of Y/RED. It looks like several hundred microseconds.
That is way longer than the few hundred nanoseconds that was added to mask and restore interrupts.

--- bill

iwoloschin

Again, for those of you who doubt me, I ask you to try playing with this yourself.  As I've said, I'm pretty sure the voltage doesn't matter, a HIGH is anything over 3V.

The sketch was used for both images, some stuff may have changed elsewhere, but not this part.  Additionally, 6uS for digitalWrite is still fast enough that it pulseIn should be triggering properly, as there's at least 40 uS between pulses.

The only thing that changed, that I can find a difference for, is going from Arduino 1.0 back to Arduino 0021.  Again, I urge someone with more experience with the Arduino environment to regression test this.  If you're unwilling, well, that's kind of poor form.  If you're unable, at least say so.  If you're convinced I'm doing something wrong, I ask that you "prove it" to me by actually running a test.  I've done my testing, on several different Arduinos and found a workable, though unexplained solution.  The engineering side of me would love to know why there's a difference like this, but on the other hand, if the Arduino community doesn't seem to want to test this...I'm not going to waste any more of my time on it.

robtillaart

At the moment I am unable (resources) to test. To confirm the bug we need two sketches - one generating a pulsetrain - the other measuring it - to confirm the bug. Something like this:

simplest pulse generating sketch
Code: [Select]

#define PULSEPIN 5

unsigned long timeHigh = 100;  // to be tweaked to take digitalWrite() into account
unsigned long timeLow = 100;

void setup()
{
  Serial.begin(115200);
  Serial.println("Arduino Pulse Generator V0.1");
  Serial.print("micros  LOW: ");
  Serial.println(timeLow);
  Serial.print("micros HIGH: ");
  Serial.println(timeHigh);

  pinMode(PULSEPIN, OUTPUT);
  while(1)
  {
    digitalWrite(PUSLEPIN, HIGH);
    delayMicroseconds(timeHigh);
    digitalWrite(PUSLEPIN, LOW);
    delayMicroseconds(timeLow); 
  }
}
void loop(){}


simplest pulse measuring sketch
Code: [Select]

#define PULSEPIN 5

unsigned long duration;
unsigned long timeLow;

void setup()
{
  Serial.begin(115200);
  Serial.println("Arduino Pulse Measurement V0.1");

  pinMode(PULSEPIN, OUTPUT);
  while(1)
  {
    duration = pulseIn(PUSLEPIN, HIGH);
    Serial.print("Duration: ");
    Serial.println(duration);
  }
}
void loop(){}



Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

johnwasser


My timeout (usLong) is set to 100 mS, or 10,000 uS.


Which is it?  100 mS (100,000 uS) or 10,000 uS (10 mS).  Perhaps you should show the code.
Send Bitcoin tips to: 1L3CTDoTgrXNA5WyF77uWqt4gUdye9mezN
Send Litecoin tips to : LVtpaq6JgJAZwvnVq3ftVeHafWkcpmuR1e

bperrybap

Ok, so I'll admit that I initially misunderstood the transitions of your Y/RED wave based on the sample
code you showed us.
And that the transitions are on each call to pulsein() vs measuring the time around each pulsein(),
my bad on that, however, I don't think it changes the interpretation of what is happening.

I guess where I'm at is that the graphs in the pictures you've shown so far do not seem indicate a bug with pulsein().
Those graphs appear to show pulsein() working just as it should - but.....
do seem to indicate that there are certain uses of pulsein() that worked in the past that may have
have issues with the new pulsein() now that the timeout mechanism was added to the pulse measurement in 0022.

My suspicion at this point it that, that are are 2 things that are contributing to your measurement
code not working as anticipated in the newer releases, and while it may make
your previously working implementation fail to work, unfortunately I wouldn't necessarily call it a bug in pulsein().
It is more of case where behavior has changed.

Take a look at the pictures you've provided.
While its a bit hard to see given the scale,
would you agree that the "non working" pulesin() example appears to show that the 2nd pulsein()
(as indicated but the first falling edge of Y) was just a bit later than
than the 2nd pulsein() in the "working" example based?

The width of the first red pulse is  just a bit wider in the "non working" case.

In the "working" picture you can  see that X is high is when Y/RED drops and that it is near
the front part of the high part of the X pulse.
In the "non working" picture, it appears that X is already low or just about to be low
when Y/RED drops.

If you don't agree, I'd like to understand the different interpretation.

Since there is some additional overhead between Y/RED dropping and pulsein() being called much
less being able to time the desired pulse, it appears in the "non working" example that the call to pulsein()
is not soon enough to to catch the first low pulse so it must wait for the 2nd low pulse.

So while the "non working" example isn't measuring the pulse you want, the picture
seems to show pulsein() measuring the first pulse it could given when pulsein() is being called.

So the real question becomes, why is the second call to pulsein() delayed in the non working case?

I can think of 2 off hand:

- using digitalWrite()
- pulsein()  changed between 0021 and 0022 (comment shows granularity of width measurement doubled)

Lets look at each one:


digitalWrite()
--------------------
Like I said before using digitalWrite() is not a good thing to use for any type of critical timing measurement
as it inserts overhead. LOTS of overhead vs doing direct port i/o.
Just because the digitalWrite() overhead is much shorter than your 40us period between pulses means nothing.
It could be that there is already 35 us of overhead before the pulsein() call is actually to the point in its internal code
of being able to start watching the signal line. In that case if digitalWrite() adds more than 5us of additional
overhead, your measuring code "breaks".
Keep in mind that when you see the output signal change, that there is still additional timing
overhead before the digitalWrite() returns.
digitalWrite() changed in 0019 to prevent port bit corruption when using digitalWrite() in ISRs
(I verfied that it was in 0019), That change makes digitalWrite() a little bit slower and also adds additional
overhead between the signal changing and the actual return of digitalWrite().
So after 0018 not only did digitalWrite() get a bit slower, but the latency between when a signal changes
and when the digitalWrite() returns has increased slightly.

My advice: NEVER, NEVER, NEVER use it for things like this.

pulsein() code changing
------------------------------------
This is more than likely the biggest contributor to the failure.
The measurement code in pulsein() was changed in between 0021 and 0022
From this:
Code: [Select]
// wait for the pulse to stop
while ((*portInputRegister(port) & bit) == stateMask)
width++;

// convert the reading to microseconds. The loop has been determined
// to be 10 clock cycles long and have about 16 clocks between the edge
// and the start of the loop. There will be some error introduced by
// the interrupt handlers.
return clockCyclesToMicroseconds(width * 10 + 16);

to this:
Code: [Select]
// wait for the pulse to stop
while ((*portInputRegister(port) & bit) == stateMask) {
if (numloops++ == maxloops)
return 0;
width++;
}

// convert the reading to microseconds. The loop has been determined
// to be 20 clock cycles long and have about 16 clocks between the edge
// and the start of the loop. There will be some error introduced by
// the interrupt handlers.
return clockCyclesToMicroseconds(width * 21 + 16);


It was to add the timeout to the pulse measurement itself, to prevent the code
from hanging during pulse measurement.

While that seems innocent enough, it does affect how quickly pulsein() "notices" the pulse changing
which affects the granularity of the measurement and the latency before the pulsein() call returns.
One thing that concerns me is that the comment does not match the code.
comment says 20 and code uses 21 this extra +1 is not consistent with the previous loop
count of 10.
Regardless, the impact of the additional code is that it takes longer for the code to recognize a
transition. If it really is an additional 11 cycles over the previous code,
that is 11*62.5ns (using a 16mz clock) or 687.5ns is added to the granularity.
So while the old loop had a granularity of 625ns, the new loop has a granularity of 1312ns.
This means that you could potentially get stuck for up to an additional ~ 1.3us longer than
the real pulse length.



My guess as to what is happening here is that the new pulsein() granularity due to the timeout
capability combined with the dititalWrite() overhead, is creating just enough latency for the the 2nd pulsein()
call to just barely miss the start of desired pulse.

While technically, it isn't a "bug" it sure can cause previously working code to fail.

The documentation for pulseIn() really doesn't say how it handles the timeout of the
pulse itself. In fact the documentation on pulsein() currently clearly states that the timeout
is for waiting for the pulse to start and not for the pulse itself.

Having a granularity of 21 cycles seems like a lot and seems like it has the potential to break things,
like it seems to be doing in this case.
At a minimum, it creates additional error in the measurements.

Somebody thought that having a timeout for the actual pulse measurement was a good
thing and it got added in. However, it came with the cost of reducing the accuracy
of the measurements and everyone is now forced to live with this even if they
intentionally don't specify a timeout.

I guess the pulsein() code could be smarter when not using a timeout
so that it branched around to a different set of code that didn't do the timeout checks
vs just creating a really long timeout.
This would give better granularity and lower latencies with added risk of getting stuck
waiting for the signal to change.

pulseIn() could also be extended to have two timeouts,
one for the amount of time waiting for the pulse and one for the pulse itself to allow
the user to have better control over the timeouts.
i.e.
Code: [Select]

pulseIn(pin, value);
pulseIn(pin, value, waitforpulsetimeout);
pulseIn(pin, value, waitforpulsetimeout, pulselengthtimeout);


and then be smart enough to use code that does not do the checks when timeouts are not specified
so that better granularity and accuracy could be achieved.

I think going with double timeouts would offer the best flexibility and preserve the old
(pre 0022) behavior while adding the option for a timeout on the actual pulse measurement
all at the option of the user.
But the code has to be re-written to be smarter or potentially simply have 3 different routines
so that when timeouts are not specified, the pin monitoring loops are tighter.

--- bill

iwoloschin

100 mS, I must've missed a 0 when typing in the uSeconds.

I'm not sure I can actually post the code as it's not mine.

Jantje

bill
thanks for the extensive explanation. I had been reading this thread and simply didn't get what was going on. Your explanation is the first one which I understand  :D
Even though it doesn't have to be the real explanation of the pulse in problem at hand it is a real good candidate and it provides a very good example of why timing is so critical. It is also a very good example of "Why better is not always better".
Thanks again
Best regards
Jantje
Do not PM me a question unless you are prepared to pay for consultancy.
Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -

westfw

So is this "resolved" in the sense that the changed behavior is now understood?  If it's understood, is it believed to be a good change that happens to break this person's apps. or a bad changed that has caused a real bug?

johnwasser


So is this "resolved" in the sense that the changed behavior is now understood?  If it's understood, is it believed to be a good change that happens to break this person's apps. or a bad changed that has caused a real bug?


The change to pulseIn() is good because it protects against a possible hang condition:  a timeout is specified and a pulse starts before the timeout but never ends.  The change increased the granularity of the pulse measurement from 0.625 microseconds to 1.3 microseconds, which is not ideal but bearable.

If you need a finer grain measurement you can re-implement pulseIn() easily enough.  Remove the timeout check to get back to 0.625 microsecond granularity.  It might even be possible to speed the code up more if you don't have to support both HIGH and LOW pulses with the same loop.  For short pulses,  using an int counter rather than long  (or even byte for very short pulses) will save time incrementing the count.


Might be able to reduce the granularity even more by simplifying the loop code a bit:
Send Bitcoin tips to: 1L3CTDoTgrXNA5WyF77uWqt4gUdye9mezN
Send Litecoin tips to : LVtpaq6JgJAZwvnVq3ftVeHafWkcpmuR1e

bperrybap

I would say it is technically not a bug, but it does potentially break pre-existing code.
Whether that is a good thing or not depends on your point of view and whether or not
it breaks your code or makes you feel that your code is now more robust.

My personal opinion is that the current implementation is a bad thing.
While it does make code more robust by protecting from a potential hang condition, it does it in a way
that forces a lower granularity on everyone including those that may need the higher
level of granularity and are willing to give of that level of additional level of robustness
in order to get it.

An example of this is that pulsein() is used on things like ultrasonic distance sensors. Now all of a sudden
your distance measurements are not as accurate and there is nothing you can do about it in your application.

So my overall view is that it isn't the functionality of additional robustness , which I think is a
good thing, that I am opposed to, it is the cost associated with the current implementation
that I am opposed to.

There are other ways to handle this added functionality than the current implementation
that are still compatible with the current API and yet provide additional robustness.

I think the ideal solution would be to extend the API  to support 2 optional timeouts as I noted before.
That way, those that want/need the extra level of robustness can ask for
timeouts and get it,  and those that want/need the extra level of granularity can have
that as well.

It also allows all pre-existing code to continue to function with the behavior and
granularity that they had prior to this change.

The current implementation (and previous version as well)
uses the same code but then insert a super long
timeout value when timeouts are not needed/wanted.
Those applications that are not asking for a timeout are still paying the price to support a timeout.

From a granularity perspective, a better way to do it is to have
3 separate functions rather than have a common
function and make the timeouts "long" when they are not needed.
With separate functions, the code can be squeezed down to offer better granularity
when timeouts are not requested.


--- bill


Go Up