Data compression in my code

I need to store a long list of GPS coordinates. I use progmem to have them in the flash. Problem is, my program takes about 16K of the memory which leaves me with only 16K to store my coordinates. Each coord is basically two floats, so I need 8 bytes for each. This lets me store only ~2000 coords. Is there a way to compress my list?
I don't know if it's relevant, but if I take my coords list and save it in notepad, and then zip the file, I get a file half the size...

szangvil:
Each coord is basically two floats, so I need 8 bytes for each

Look for a more space-efficient way to represent the values. What resolution and range of values do you actually need to support?

If you were to try and compress your data you would need code to both compress and decompress it and your program would run slower.
Why not store the data on an SD card?

szangvil:
I need to store a long list of GPS coordinates. ... but if I take my coords list and save it in notepad, and then zip the file, I get a file half the size...

This implies you want to store 4000 coordinates. Can you define "long list" better? 3000? 4000? 10000? And to what resolution (ie. how many decimal places)?

How about storing only the difference between two readings ?
That would make up a starting point, and vector like data (which might be smaller in size)for next readings.

Or only storing the bytes that have changed ?
So the right hand side bytes that have changed, where you keep track of the number of bytes you need and compare this to what you get.
The left hand side bytes would then need to be copied from the previous value.

So:
8 byte value.
1 byte value (makes for 7 bytes + 1 byte)
3 byte value (5 byte + 3 byte)

If for some miraculous reason only byte number 3 (from the right) would have changed, you'd need to update bytes 2 and 1 as well.

As you are using variable lengths, you need to use a separator character to indicate the end of your data package (comma is often used for that, but you can go for some other character if you already use a comma for other purposes).

A Trie might help.

I wouldn't use floats though, but the ASCII numbers. It would probably be effective because a lot of coordinates would start with the same number (or numbers).

Agreed with most comments above, but would like to add another:
16k worth of code seems quite a bit, too. Maybe there's room to optimize?
If you use a lot of "Libraries", maybe you can trim/reduce them a bit?

The reason compression works is that you have some inside information about the data you're trying to compress. At this point all we know is that there are GPS coordinates and that they are stored as two floats. There are a couple of things we could do with just that information, but if you want a lot of compression it would help to know a few more details about the data.

  • How much accuracy do you need?
  • Are the coordinates in a certain part of the world? What are the lower and upper bounds of the X and Y coordinates?
  • Is the data sorted? If so, in what way? Would it be a problem if we re-sorted the data in order to compress it better?

The reason compression works is that you have some inside information about the data you're trying to compress.

Yes. If, for example, you have lots of coordinates at the same Latitude then you only need to store the Latitude once then only the Longitude of all the coordinates. Also of course the less accuracy you require the more Latitudes that will be "the same". As TanHadron was pointing out if you are only interested in a small part of the world you can further reduce what is stored.

Sounds like a fun challenge.

I like MAS3's idea. If your coordinates are all close together or consist of several tightly bunched clusters, you could store one or more "master" locations and store the rest of the locations as offsets from a master. Use only two or four bytes for an offset stored in "integer fixed-point" format. If an overflow of your integer would occur for a particular offset, create a new master point and store the offset from the new master.

Also. It looks like 1 degree of latitude at the equator is 111,319 meters but only about 7 meters near the poles. So for coordinates with higher longitudes, you can use less latitude resolution and keep the same amount of relative accuracy.

If you really want to get fancy, use packed bits as described near the end of this thread: forum.arduino.cc/index.php?topic=182430

radman:
Yes. If, for example, you have lots of coordinates at the same Latitude then you only need to store the Latitude once then only the Longitude of all the coordinates.

Effectively that is what a Trie does because it stores common prefixes. However I'm not so sure about a 2-dimensional array (lat and long).

tylernt:
It looks like 1 degree of latitude at the equator is 111,319 meters but only about 7 meters near the poles.
... 1 degree of equatorial latitude (111 km) is already a lot smaller than 1 degree of longitude (6,356 km) ...

A degree of latitude is everywhere the same: about 60 nautical miles, about 111 kilometers.
A degree of longitude is the same size = ~ 111 kilometers - at the equator, shrinking to zero at the poles. 6,356 kilometers looks more like a radian of longitude at the equator.

tmd3:
6,356 kilometers looks more like a radian of longitude at the equator.

Oops you're right, that's what happens when I Google something and copy/paste without engaging the brain. :blush:

Storing the GPS coordinates as floats is wasting space because you don't need the exponent information, since your coordinates are all within a fixed range and have to be stored (I presume) with the same absolute precision. First you need to decide what resolution you need. Suppose it is 0.1 arc minute. Then 90 degrees is (90 * 60/0.1) = 54000 units of 0.1 arc minute. So the range of latitudes you need to store is from +54000 to -54000, which you can store as a 17-bit signed integer. Similarly, for longitude you need to store from +108000 to -108000 units of 0.1 arc minute, which you can do in 18 bits. So you can store a complete coordinate in 35 bits (i.e. 8 coordinates in 35 bytes) - which is a lot less than the 64 bits per coordinate (8 coordinates in 64 bytes) you are using at present. And that's without applying compression to the table as a whole.