I have a string that's been read in from a digital sensor, it contains two sensor values. Occasionally it ends up being corrupted due to noise, and I'd like to be able to detect this so I can re-read it.
Here's a typical string:
-23.4 12.2
The first number is always negative, always has one decimal place, and can range from a single digit before the decimal point, e.g. -9.3, up to six digits, e.g. -100000.0. There is always a space between the two values, and the second value is always positive, with one decimal place. It may have one or two digits before the decimal point (0.0 to 99.9).
So, the full ranges of the values are:
-100000.0 to -1.0
0.0 to 99.9
jmusther:
I have a string that's been read in from a digital sensor, it contains two sensor values
That sounds as if you have no control over the format of the data.
And you have not said where the corruption occurs: is it within the sensors, or is it something that happens between the sensors and the Arduino?
In any case, if you want to turn the values into numbers so that you can check whether they are in range, the parse example in Serial Input Basics should point you in the right direction.
By the way, I am assuming that the incorrect data is still a valid number and not something like 8G.5 when it should be 87.5 (for example).
I have no control over how the sensor is sending the data.
I'm ultimately converting them into numeric values so I can work with them, plot them etc.
The corruption appears to be coming from the sensor line, I've scoped them and occasionally ambient interference makes them noisy, so the serial read gives dodgy data (or sometimes truncated).
So a corrupted string might look like:
-1g&#s#LK #(**
So what I'm really looking to do is check that:
There is a valid number with up to 6 digits before the decimal point, and one after, preceded by a '-'.
A space
A second number with up to two digits before the decimal point and one after.
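The three checks above can be sketched as a whole-string test. This is only a minimal sketch: the names looksValid and matchNumber are made up for illustration, and it assumes the complete reading is already in a NUL-terminated buffer with nothing before or after it.

```cpp
#include <ctype.h>
#include <stdint.h>

// Consume 1..maxDigits digits, then '.', then exactly one digit.
// On success, advances *p past the number.
static bool matchNumber(const char **p, uint8_t maxDigits)
{
  const char *s = *p;
  uint8_t n = 0;
  while (isdigit((unsigned char)*s)) {
    if (++n > maxDigits) return false;            // too many digits
    s++;
  }
  if (n == 0 || *s++ != '.') return false;        // need a digit, then '.'
  if (!isdigit((unsigned char)*s++)) return false; // one digit after '.'
  *p = s;
  return true;
}

// True only for '-', a number, one space, a number, and then the end.
bool looksValid(const char *s)
{
  if (*s++ != '-') return false;
  if (!matchNumber(&s, 6)) return false;
  if (*s++ != ' ') return false;
  if (!matchNumber(&s, 2)) return false;
  return *s == '\0';
}
```

This only checks the shape of the string. A digit that has been corrupted into a different digit would still pass.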
Maybe tokenize the string using strtok() and/or check for phrases with strstr().
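A rough sketch of the strtok() idea, assuming the line is in a writable buffer (strtok() overwrites the separators with NUL characters). The helper name splitReading is hypothetical.

```cpp
#include <string.h>

// Split `line` in place on spaces.
// Returns true only if it contains exactly two tokens.
bool splitReading(char *line, char **first, char **second)
{
  *second = NULL;
  *first = strtok(line, " ");
  if (*first == NULL) return false;      // empty line or only spaces
  *second = strtok(NULL, " ");
  if (*second == NULL) return false;     // only one token
  return strtok(NULL, " ") == NULL;      // a third token means reject
}
```

Splitting alone does not prove the data is good: a garbled line such as "-1g&#s#LK #(**" also splits into exactly two tokens, so each token would still need to be checked.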
You need a Finite-State Machine. I use one in my library, NeoGPS, to parse NMEA GPS sentences. It's much more complicated than what you need, but you can see the main part here. Like many other character-processing state machines, there is an initial test of the current state variable, and then the current character is handled in a section of code for that state (more detailed description as a PS)
In your case, you could have something like this:
state RX_WAITING: Accept only '\n'
state RX_START: Accept only '-'
state RX_FIRST_WHOLE: Accept up to 6 digits, then a period
state RX_FIRST_FRAC: Accept 1 digit
state RX_SPACE: Accept 1 space
state RX_SECOND_WHOLE: Accept up to 2 digits, then a period.
state RX_SECOND_FRAC: Accept 1 digit
At each step, a rejection sets the variable state to RX_WAITING. In code:
enum state_t
  { RX_WAITING, RX_START, RX_FIRST_WHOLE, RX_FIRST_FRAC, RX_SPACE, RX_SECOND_WHOLE, RX_SECOND_FRAC }; // the possible states

static state_t state; // a state variable of that enumerated type (basically an integer)

// Here are some pieces of what we're parsing from the input stream:
static int32_t first_whole;
static int8_t  first_frac;
static uint8_t second_whole;
static uint8_t second_frac;
static uint8_t digitCount;

// The magic function that handles one character at a time,
//   and returns true when the above variables are valid.

bool getReading( char c )
{
  bool gotIt = false; // Assume we're not done.

  switch (state) {

    case RX_WAITING: //--------------------------------
      if (c == '\n') {
        state = RX_START;
      }
      break;

    case RX_START: //--------------------------------
      if (c == '-') {
        state = RX_FIRST_WHOLE;

        // Initialize the two numbers we will be receiving
        first_whole  = 0;
        first_frac   = 0;
        second_whole = 0;
        second_frac  = 0;
        digitCount   = 0; // To make sure we don't get too many digits.
      } else {
        // Didn't get that negative sign we were expecting.
        state = RX_WAITING;
      }
      break;

    case RX_FIRST_WHOLE: //--------------------------------
      if (isdigit( c )) {
        if (digitCount++ < 6) {
          first_whole = first_whole*10 + (c - '0');
        } else {
          // Too many digits!
          state = RX_WAITING;
        }
      } else if ((digitCount > 0) && (c == '.')) {
        state = RX_FIRST_FRAC;
      } else {
        state = RX_WAITING;
      }
      break;

    case RX_FIRST_FRAC: //--------------------------------
      if (isdigit( c )) {
        first_frac = c - '0';
        state      = RX_SPACE;
        digitCount = 0; // count the second number's digits, too.
      } else {
        state = RX_WAITING;
      }
      break;

    case RX_SPACE: //--------------------------------
      if (c == ' ') {
        state = RX_SECOND_WHOLE;
      } else {
        state = RX_WAITING;
      }
      break;

    case RX_SECOND_WHOLE: //--------------------------------
      if (isdigit( c )) {
        if (digitCount++ < 2) {
          second_whole = second_whole*10 + (c - '0');
        } else {
          // Too many digits!
          state = RX_WAITING;
        }
      } else if ((digitCount > 0) && (c == '.')) {
        state = RX_SECOND_FRAC;
      } else {
        state = RX_WAITING;
      }
      break;

    case RX_SECOND_FRAC: //--------------------------------
      if (isdigit( c )) {
        second_frac = c - '0';
        gotIt       = true;
      }
      state = RX_WAITING;
      break;
  }

  return gotIt;
}

//----------------------------------------

void setup()
{
  Serial.begin( 9600 );
  Serial.println( F("FSM test started.") );
}

void loop()
{
  while (Serial.available()) {
    bool gotSome = getReading( Serial.read() );

    // Serial.print( (uint8_t) state ); // display the state number as each char is handled

    if (gotSome) {
      Serial.print( F("Got a valid reading: -") );
      Serial.print( first_whole );
      Serial.print( '.' );
      Serial.print( first_frac );
      Serial.print( ' ' );
      Serial.print( second_whole );
      Serial.print( '.' );
      Serial.print( second_frac );
      Serial.println();
    }
  }
}
Just type test strings into the Serial Monitor window to see if this does what you want. Uncomment that one debug print if you like.
Note that this saves a lot of program space by not using float. It saves RAM because it doesn't have to store the entire line in a buffer, and it uses the F() macro for string literals.
Also note that the notorious String class is not used. -_-
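If you want to keep avoiding float after parsing, one option is to combine the whole and fractional parts into integer tenths and range-check those. A minimal sketch follows; the function names are made up for illustration, and the limits are the ranges given in the original post (-100000.0 to -1.0, and 0.0 to 99.9).

```cpp
#include <stdint.h>

// Combine the parsed pieces into tenths, e.g. 23 and 4 become -234 (-23.4).
// The first value is always negative, so the sign is applied here.
int32_t firstTenths(int32_t whole, int8_t frac)
{
  return -(whole * 10 + frac);
}

// e.g. 12 and 2 become 122 (12.2)
int16_t secondTenths(uint8_t whole, uint8_t frac)
{
  return (int16_t)(whole * 10 + frac);
}

// Range check in tenths: -100000.0 .. -1.0 and 0.0 .. 99.9
bool inRange(int32_t first, int16_t second)
{
  return first >= -1000000L && first <= -10L
      && second >= 0 && second <= 999;
}
```

The range check still adds something on top of the state machine, because strings like "-0.5 1.0" or "-999999.9 1.0" have a valid shape but fall outside the stated ranges.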
Cheers,
/dev
P.S. NeoGPS state machine notes:
RxState IDLE (line 165): Wait for a '$'. All sentences begin with this character, so the state isn't actually tested. Any time the character is received, it starts watching for the header (next state).
RxState RCV_HEADER (line 203): Wait for a valid sentence type to be fully received. parseCommand returns COMPLETED when "GPGGA" (or other valid type) and a terminating comma is received (line 215).
RxState RCV_DATA (line 168): Wait for all field data to be received. The fields are comma-separated, and an asterisk indicates no more fields (line 171).
RxState RCV_CRC (line 220): Wait for the two-character checksum to be received. If it matches, decode returns COMPLETED (line 230). Otherwise, it rejects it and increments an error count (line 237).
jmusther:
So what I'm really looking to do is check that:
There is a valid number with up to 6 digits before the decimal point, and one after, preceded by a '-'.
A space
A second number with up to two digits before the decimal point and one after.
I certainly did not get that impression from your Original Post.
Then I would iterate over the data, checking that it only has the allowable characters. If there is any wrong character I would reject the whole thing.
After that there is still a doubt about whether you have two properly formed numbers separated by a space. The parse example in Serial Inp.... shows how you can split the data. If it does not split into the correct number of parts I would, again, reject the data.
Finally you can use atof() to convert the data into float variables.
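Those steps might look roughly like the following. It is a sketch only: parseReading is a hypothetical name, the whole line is assumed to be in a NUL-terminated buffer already, and the range limits are taken from the original post.

```cpp
#include <stdlib.h>
#include <string.h>

bool parseReading(const char *line, float *first, float *second)
{
  // 1. Reject anything containing a character that can never be valid.
  if (line[0] == '\0' || strspn(line, "-0123456789. ") != strlen(line))
    return false;

  // 2. Require exactly one space, with something on both sides of it.
  const char *space = strchr(line, ' ');
  if (space == NULL || space == line || space[1] == '\0') return false;
  if (strchr(space + 1, ' ') != NULL) return false;

  // 3. Convert. atof() stops at the first character it cannot use,
  //    so the first call stops at the space.
  *first  = (float)atof(line);
  *second = (float)atof(space + 1);

  // 4. Range check against the limits stated in the original post.
  return *first >= -100000.0f && *first <= -1.0f
      && *second >= 0.0f && *second <= 99.9f;
}
```

Note that this still accepts some malformed input: "-2.3.4 12.2" passes the character check, and atof() quietly reads the first part as -2.3.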
Hi Robin2,
Thanks for your suggestion, and I'm sorry that my original statement of my intention wasn't clear enough. In this case however, I think /dev's state machine provides a cleaner solution. I'm able to test each char as I read it in, and if at any point it fails, I can break, clear the serial buffer, and try again.
/dev, thanks for the diagram. Ultimately I've modified it to look for the '-' to signal the start, as there isn't a \n in the stream. But thanks for the introduction to this technique, it's certainly going to come in handy again.
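For anyone following along, a state machine that starts on '-' instead of '\n' could look something like this. It is not my actual sketch, just a minimal version written in plain C++ so it can be tested off the board. All names (feed, feedString, Reading, the state names) are made up for illustration, and the values are stored as integer tenths.

```cpp
#include <ctype.h>
#include <stdint.h>

enum RxState { WAIT_MINUS, FIRST_WHOLE, FIRST_FRAC, WAIT_SPACE, SECOND_WHOLE, SECOND_FRAC };

struct Reading {
  int32_t firstTenths;   // e.g. -234 for -23.4
  int16_t secondTenths;  // e.g.  122 for 12.2
};

static RxState state = WAIT_MINUS;
static int32_t acc;        // whole part of the number being received
static uint8_t digits;     // digits seen in the current whole part
static int32_t firstAcc;   // completed first value, in tenths

// Handle one character. Returns true when *out holds a complete reading.
bool feed(char c, Reading *out)
{
  bool isDig = isdigit((unsigned char)c) != 0;
  bool ok = false;         // did the current state accept this character?

  switch (state) {
    case WAIT_MINUS:
      break;               // handled by the restart logic below
    case FIRST_WHOLE:
    case SECOND_WHOLE: {
      uint8_t maxDigits = (state == FIRST_WHOLE) ? 6 : 2;
      if (isDig && digits < maxDigits) {
        acc = acc * 10 + (c - '0');
        digits++;
        ok = true;
      } else if (c == '.' && digits > 0) {
        state = (state == FIRST_WHOLE) ? FIRST_FRAC : SECOND_FRAC;
        ok = true;
      }
      break;
    }
    case FIRST_FRAC:
      if (isDig) {
        firstAcc = -(acc * 10 + (c - '0'));
        state = WAIT_SPACE;
        ok = true;
      }
      break;
    case WAIT_SPACE:
      if (c == ' ') {
        acc = 0;
        digits = 0;
        state = SECOND_WHOLE;
        ok = true;
      }
      break;
    case SECOND_FRAC:
      if (isDig) {
        out->firstTenths  = firstAcc;
        out->secondTenths = (int16_t)(acc * 10 + (c - '0'));
        state = WAIT_MINUS;
        return true;
      }
      break;
  }

  if (!ok) {
    // Idle or rejected: a '-' always (re)starts a reading, anything else waits.
    if (c == '-') {
      state = FIRST_WHOLE;
      acc = 0;
      digits = 0;
    } else {
      state = WAIT_MINUS;
    }
  }
  return false;
}

// Test helper: feed a whole string, return how many readings completed.
int feedString(const char *s, Reading *last)
{
  int count = 0;
  for (; *s; s++) {
    if (feed(*s, last)) count++;
  }
  return count;
}
```

On the Arduino, loop() would call feed( Serial.read(), &reading ) while Serial.available(). One limitation of having no terminator: the reading is reported as soon as the last digit arrives, so extra characters after it cannot be detected.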
jmusther:
I'm able to test each char as I read it in, and if at any point it fails, I can break, clear the serial buffer, and try again.
This is probably a matter of personal taste, but I feel I have more control over the process if I read in all the data first.
For example, if you decide to discard the data because the 5th character is wrong and you then dump all remaining data in the input buffer, you may not actually have dumped all the data from that transmission, because the Arduino is very much faster than the speed at which serial data arrives.
Actually, that's three bit errors. "-23.5" into "-33.5" is a single bit error. But this is still a great question. I've been wondering if this would come up.
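To see where those counts come from, you can XOR the character codes and count the bits that differ. A small sketch, with a made-up helper name, assuming plain ASCII strings of equal length:

```cpp
#include <stdint.h>
#include <string.h>

// Count the bits that differ between two equal-length strings.
// Returns -1 if the lengths differ.
int bitErrors(const char *sent, const char *received)
{
  if (strlen(sent) != strlen(received)) return -1;

  int count = 0;
  for (; *sent; sent++, received++) {
    uint8_t diff = (uint8_t)((uint8_t)*sent ^ (uint8_t)*received);
    while (diff) {           // count the set bits in the XOR
      count += diff & 1;
      diff >>= 1;
    }
  }
  return count;
}
```

'2' is 0x32 and '3' is 0x33, so "-23.5" and "-33.5" differ in one bit. '7' is 0x37 and 'G' is 0x47, so "87.5" and "8G.5" differ in three.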
The real problem is that in the "simple" mode, this sensor doesn't provide error detection. This sensor has a complicated mode that implements the SDI-12 protocol, which does offer a CRC-16. Unfortunately, it uses an unusual 7-bit, even parity mode, in half-duplex. There is a library to do this, so it may be possible to switch without too much trouble. Sorry, @jmusther!
Robin2:
I feel I have more control over the process if I read in all the data first.
But what is "all the data"? If the '\n' is bad, how long do you read? A character-driven FSM has control at, well, the individual character level. You can't get more control than that.
For example if you decide to discard the data because the 5th character is wrong and you then dump all remaining data in the input buffer you may not actually have dumped all the data from that transmission
That's not how the state machine works. If the 5th character is wrong, the FSM starts over with the 6th character. That may be where good data is really available, or it may be the rest of the bad message. The FSM will correctly accept a good message, and correctly reject the rest of a bad message. That's why I posted the State-Transition Diagram. Put your finger on the initial state, then try different kinds of garbled input to see what happens.
Actually, this is just an extension of your two-state "Receive with End Marker" example. Instead of having to save a whole line (which can be three garbled lines), getting 0.0 from atof, searching for a space, period, and/or invalid characters within the line array, the FSM looks through a very narrow window: 1 character wide. There is no buffering at all.
It's a good technique to know: it's used by compiler writers, too. You should google yacc, lex and BNF. This is part of any Computer Science curriculum. It does seem kind of upside-down at first, but this is a good example.
In my case, I've found that the data from this particular sensor, when corrupted, is completely corrupted. But yes, this method doesn't allow me to spot corruption leading to this sort of change (number to number).
I am going to look into getting the SDI-12 mode working with error checking. With summer marching on (in New Zealand), and the sensor successfully working in the ground, this may have to wait for a few months.