I recently spent a ton of time debugging my code to find a problem that was ultimately due to an incorrect implementation of the checksum calculation used for NMEA GPS messages. I'm hoping that by sharing what I learned through the process, someone else might avoid similar issues.
NMEA Checksum Calculation
In my opinion, the checksum method used by the NMEA protocol for GNSS message transmission is contrived, confusing, and poorly explained in other sources I've seen. Since I had such a hard time figuring it out, I thought I would share what I learned in case others have the same difficulties in the future.
NMEA Checksums Generally
Like many communication protocols, NMEA includes a feature that allows the message receiver to verify that the message has arrived intact. This makes the system more robust in noisy EM environments or when interfacing with poorly designed/implemented devices. Pretty standard stuff. Unlike many other protocols, NMEA messages do not include a "self-length" (number of bytes in the packet) field, which makes ingesting these messages a little more annoying. A self-length field is not required for a checksum to work, but it makes it easier to figure out where the checksum data sits in the packet. For NMEA, the checksum data comes immediately after a 0x2A byte, which is an ASCII '*'. The checksum is followed by 0x0D 0x0A, which is ASCII for CARRIAGE RETURN (CR) and LINE FEED (LF). This ending does help with confirming that a complete message arrived, but I think it is mainly included as an actual command for data display systems.
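For example, here is a complete GLL sentence (a standard example that appears in NMEA documentation) with all of the frame pieces written out:

$GPGLL,4916.45,N,12311.12,W,225444,A*31<CR><LF>

Everything between the '$' and the '*' is covered by the checksum, the two characters "31" immediately after the '*' are the checksum bytes, and the CR and LF bytes end the message.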
Source: u-Blox 6 Receiver Description Manual
Checksum Calculation
The actual NMEA checksum calculation is fairly straightforward. Take all the packet bytes between the first byte (always an ASCII '$') and the '*' character (the '*' is reserved in NMEA, so it should never show up elsewhere in the packet), align their least significant bits, and apply the XOR logical operator to each "column" of bits, one row at a time. As a reminder, XOR is a logic function like AND and OR, but it only returns TRUE if exactly one of its two inputs is true. When XOR is extended to more than two inputs by chaining two-input XORs together, it returns TRUE when an odd number of the n inputs are true. For calculating NMEA checksums, you don't take the XOR of multiple inputs at once but rather apply a two-input XOR multiple times. Below is an example of how this works; the arrows show which "bits" are used as the inputs to the XOR function. The XOR in row 1 uses the "bits" in rows 0 and 1. The XOR in row 2 uses the bit in row 2 and the result of the previous XOR in row 1. Adding more bits to the sequence repeats the process.
When you extend this to multiple digit binary numbers, align the numbers by their least significant bit and apply the sequential XOR along each "column". The result of this process is a new binary number.
The "calculated" NMEA checksum is this new binary number. In this case, 0b10101001 or 0xA9. Take note of the fact that the first four bits, 0b1010, are equal to the first digit in the hex form, 0xA. The same is true for the second four bits, 0b1001, and the second hex digit, 0x9. These four bit segments are often called "nybbles" or "nibbles" (a nibble being smaller than a bite).
Checksum Translation
The result of the checksum "calculation" is a single 8 bit number (1 byte). For some reason unknown to me, the NMEA group decided that this single byte checksum really would look better if it were two bytes. Why? I don't know. Logically you may be thinking "Hey, let's just use those two nibbles from earlier". Not a bad idea! But no. But also kind of yes? NMEA decided that the two checksum bytes should be equal to the ASCII representation of the value of each of the calculated nibbles from before. For example:
- Go through the sequential XOR checksum calculation that gives an 8 bit binary number: 0b10101001
- Split the 8 bit byte into two four bit nibbles: 0b1010 and 0b1001
- Convert the two nibbles from binary to hexadecimal: 0xA and 0x9
- "Transform" the nibbles from numbers into characters: 'A' and '9'
- Convert the new characters back into hexadecimal using the ASCII standard: 0x41 and 0x39
These two new bytes are the data that is actually included at the end of an NMEA message. The result of this particular checksum method is that when printing the NMEA messages as text, the calculated checksum value (0xA9) appears in the string of text as the characters "A9". I'm sure this was a feature to someone, but to me this is an annoyingly weird process. Below are a few more examples of the process in case it's helpful. CS1 and CS2 are the two checksum bytes that go in the NMEA message.
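(The checksum values here are arbitrary ones I chose to illustrate the pattern.)

Checksum   Nibbles     CS1          CS2
0xA9       0xA, 0x9    'A' (0x41)   '9' (0x39)
0x31       0x3, 0x1    '3' (0x33)   '1' (0x31)
0x0F       0x0, 0xF    '0' (0x30)   'F' (0x46)
0x7C       0x7, 0xC    '7' (0x37)   'C' (0x43)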
Implementation
Included below is my implementation of this algorithm in C++ (for use in Arduino code). This is by no means the only way to do it, but it works for me.
Inputs
- `uint8_t packet[]`: This is an array of bytes that contains a full NMEA packet/message, including the '$' start character, the transmitted checksum bytes, and the end characters, CR and LF.
- `uint8_t packetEndIndex`: This is the "length" of the packet expressed as the 0-starting index of the last byte in the packet. So if a full packet was 25 bytes long, `packetEndIndex` would be 24. I used the index instead of the actual size because I already had the variable from a previous function, but you could use the actual size instead.
- `uint8_t receivedCSbyte1`: This is the first (of two) checksum bytes that was actually received as part of the NMEA packet. Since this function also has the entire packet as an input, you could alternatively extract the checksum bytes from there instead of passing them as inputs. I already had the checksum bytes stored in their own variables, so I left them as separate function inputs.
- `uint8_t receivedCSbyte2`: Same as above.
Calculation
The process outlined here is the same, and in the same order, as the process described above in the Checksum Translation section.
- As was discussed earlier, the first step is to perform consecutive XOR operations on the NMEA packet. This is achieved with the `for()` loop. The loop begins at index 1 to skip the '$' start character, and stops before index `packetEndIndex-4`, so the last byte included is 5 from the end. This excludes the ending byte sequence ('*', CS1, CS2, CR, LF), because the NMEA checksum XOR calculation is done only over the bytes between '$' and '*'. Refer to the "NMEA Protocol Frame" graphic above.
- Next, the calculated byte (8 bits) is split into two nibbles (4 bits each). The first nibble is the upper four bits, and the second nibble is the lower four bits. The nibbles are "extracted" from the byte by performing a bitwise AND operation. Bitwise AND is binary multiplication of bits in the same position. In this case, the calculated byte is multiplied by 0xF0 or 0b11110000. If the byte was 0b10101001, multiplying by 0b11110000 would result in 0b10100000. (Nibble 2 is extracted the same way with 0x0F and needs no shift.) Both nibbles need to "look" like 4 bit numbers, so nibble 1 needs to be shifted to the right by 4 bits using the right shift operator: `0b10100000 >> 4 = 0b00001010`.
- Within the memory of a computer, the hexadecimal and binary representations of a number are literally identical. Writing 0xA is literally the same thing as writing 0b1010. This means that step 3 of the process above doesn't really "do" anything in code; it's only there to help the programmer understand what's going on. The nibbles from step 2 exist as both binary and hexadecimal at the same time. So in a sense, step 3 is completed at the same time as step 2.
- Remember that we're going to "transform" the value of the nibbles into the ASCII character that would display the nibble's value as text (0xA→'A', 0x9→'9'). Also remember that ASCII is just a standard that relates a symbol (mostly letters and numbers) to a number that can be represented in computer memory. So an ASCII 'A' character is saved in memory as 0x41, and a '9' character is saved as 0x39. In the ASCII standard, the characters '0' through '9' are represented by 0x30 through 0x39, and 'A' through 'F' are 0x41 through 0x46. The selection of these hex values is intentional: for '0' through '9', the second digit of the hex code is the same as the digit it represents, and for 'A' through 'F', the second digit is the letter's position in the alphabet ('A'→1, 'B'→2, and so on). In code, we can take advantage of this relationship. If a nibble is between 0x0 and 0x9, we can simply add it to the ASCII '0' to get the ASCII code for that nibble. For example, if the nibble is 0x4 then `0x4+'0' = 0x4+0x30 = 0x34 = '4'`. Looking at the ASCII code standard will help this make more sense. If a nibble is between 0xA and 0xF, subtract 10 (to "reset" back to zero) and add 'A'. If the nibble is 0xC then `0xC-10+'A' = 0xC-0xA+0x41 = 0x43 = 'C'`. In the example code below, the "ternary" operator `?` is used to check if the nibble is less than or equal to 0x9.
- Like step 3, when you're dealing in computer memory, binary and hex numbers mean the same thing, so once you have the output from step 4, that's the last thing the code needs to "do" in the checksum calculation.
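As an aside, a common alternative to the ternary-plus-arithmetic conversion in step 4 (my own variation, not what the example code below uses) is a small lookup table indexed by the nibble value:

const char hexChars[] = "0123456789ABCDEF"; // index n holds the ASCII character for nibble value n
uint8_t translatedByte1 = hexChars[nibble1]; // e.g. hexChars[0xA] == 'A'
uint8_t translatedByte2 = hexChars[nibble2]; // e.g. hexChars[0x9] == '9'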
Verification
At the end of this process, you have "transformed" a single calculated checksum byte into two new bytes which happen to "look" like the nibbles of the checksum byte. Remember that the purpose of all of this is to see if any data was lost or corrupted in transmission. When the GPS receiver sent the packet of data, it performed exactly the same checksum calculation and included its resulting two bytes at the end of the packet. As long as the data has arrived intact, when the calculation is performed again the two sets of bytes should be the same. The final piece of this puzzle is to confirm if this is the case. In my example code, the function returns `true` if the byte sets match and `false` if they don't. The function output can then be used as a signal to other parts of your code, indicating if that GPS packet is good to process or needs to be ignored.
Conclusion
I'm hoping that this explanation helps some people understand this process a little bit better. I realize it's rather lengthy, but I thought that when writing this kind of reference, more detail is better. If you get through all of this and are looking for some more practical information and examples, try looking for the datasheet for the GPS receiver you plan on using (there may be more than one document). In it you should find details about how that specific device handles NMEA messages, and there may even be some suggestions from the manufacturer on what your code should do. There are also many GPS code libraries for Arduino available to look at. While it is by no means the simplest, SparkFun's library for uBlox brand GPS receivers was a good reference for my project.
Example Code
bool nmeaChecksumCompare(const uint8_t packet[], const uint8_t packetEndIndex, const uint8_t receivedCSbyte1, const uint8_t receivedCSbyte2)
{
    uint8_t calcChecksum = 0;
    for(uint8_t i=1; i<packetEndIndex-4; ++i) // packetEndIndex is the "size" of the packet minus 1. Loop from index 1 up to (but not including) packetEndIndex-4 because the checksum is calculated only between $ and *
    {
        calcChecksum = calcChecksum^packet[i];
    }
    uint8_t nibble1 = (calcChecksum&0xF0) >> 4; // "Extracts" the first four bits and shifts them 4 bits to the right. Bitwise AND followed by a bitshift
    uint8_t nibble2 = calcChecksum&0x0F; // "Extracts" the last four bits. No shift needed
    uint8_t translatedByte1 = (nibble1<=0x9) ? (nibble1+'0') : (nibble1-10+'A'); // Converting the number "nibble1" into the ASCII representation of that number
    uint8_t translatedByte2 = (nibble2<=0x9) ? (nibble2+'0') : (nibble2-10+'A'); // Converting the number "nibble2" into the ASCII representation of that number
    if(translatedByte1==receivedCSbyte1 && translatedByte2==receivedCSbyte2) // Check if the checksum calculated from the packet payload matches the checksum in the packet
    {
        return true;
    }
    else
    {
        return false;
    }
}
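For a quick sanity check, here is a minimal usage sketch (my own addition, with made-up variable names) that feeds the function the example GLL sentence from earlier:

const char sentence[] = "$GPGLL,4916.45,N,12311.12,W,225444,A*31\r\n"; // known-good example sentence
const uint8_t endIndex = sizeof(sentence)-2; // index of the LF byte (sizeof also counts the string's null terminator)
bool valid = nmeaChecksumCompare((const uint8_t*)sentence, endIndex, sentence[endIndex-3], sentence[endIndex-2]); // CS1 ('3') and CS2 ('1') sit just before the CR LF, so valid should be true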