Compression Libraries

Does anyone know of a solid general-purpose compression library for the Arduino?

What kind of data do you expect to compress? Audio? Text? Byte streams? Images? Executable code?

Almost all general-purpose compression algorithms require a LOT of memory. They must analyze the uncompressed data, form statistical models of phrases or fragments of the data, and then access this statistical data to output the compressed form of the input.

Probably the simplest general-purpose compression algorithm would be a Huffman coding. The "statistical model" is a bit-tree which might fit in the ATmega168 for ASCII strings. Google it.

I'm familiar with Huffman coding. What I'm looking for is a pre-written library small enough to fit on the arduino. If I was going to just write my own Huffman coding scheme, I wouldn't be asking. :slight_smile:

Technically, our signal behaves the most like audio, but doesn't need quite as high a sampling rate. We're looking at around 5kHz sampling.

Same question as "datalurkur"

Compression schemes tend to be RAM-intensive and difficult to do on a microcontroller like the arduino, although there are audio-like compression schemes that use hardware assist for telco-like applications (voice-like A-D converts that feed you an 8kbit/s bitstream for "almost phone quality" audio.)
You might see if you can dig up the old CPM utility that came before lzhuf compression and zip; IIRC it used a huffman-like algorithm that created a document-specific decode table stored with the compressed output, and it wasn't TOO big.

If your sample data normally looks like a square wave, you can use Run-Length-Encoding, compressing the data without a lot of memory overhead whatsoever.

If you send sine-wave-like data, you can compress the data into deltas using delta encoding(Delta encoding - Wikipedia), sending two or three samples per byte. Wikipedia says it isn't useful by itself, but if your data is normally +-3, you can pack 3 samples per byte. If it is normally +-7, you can pack 2 samples per byte. You just need to send out flag bytes in your data stream to indicate which packing method you are using for the following bytes, none, 2bit, 4bit, etc.

If it looks more like an audio waveform with many frequencies and quick transistions from high to low, you may still be able to benefit from delta compression, but the compression won't occur as often.

Both are trivial to implement with an Arduino's space and speed limitations.

I am collecting weather station data. It samples once a min. Looks something like this, once it makes it into the MySQL DB.

mysql> select * from weather_stations_data limit 5;
+--------------------------+-------+-------+-------+-------+---------------+----------------+----------------+------------+------------------+---------------------+
| weather_stations_data_id | temp1 | temp2 | temp3 | temp4 | wireless_link | wireless_level | wireless_noise | link_to_ap | link_to_internet | date_code           |
+--------------------------+-------+-------+-------+-------+---------------+----------------+----------------+------------+------------------+---------------------+
|                        1 |  7914 |  7092 |  3729 |  5551 |            48 |            -72 |            -89 |          1 |                1 | 2008-01-01 00:00:00 | 
|                        1 |  7925 |  7092 |  3673 |  5540 |            48 |            -72 |            -89 |          1 |                1 | 2008-01-01 00:04:00 | 
|                        1 |  7925 |  7092 |  3706 |  5551 |            48 |            -72 |            -89 |          1 |                1 | 2008-01-01 00:01:00 | 
|                        1 |  7925 |  7092 |  3706 |  5551 |            48 |            -72 |            -89 |          1 |                1 | 2008-01-01 00:02:00 | 
|                        1 |  7914 |  7092 |  3706 |  5540 |            48 |            -72 |            -89 |          1 |                1 | 2008-01-01 00:09:00 | 
+--------------------------+-------+-------+-------+-------+---------------+----------------+----------------+------------+------------------+---------------------+
5 rows in set (0.00 sec)

How about the Mega? http://arduino.cc/en/Main/ArduinoBoardMega

Is it possible to add more RAM and run from that memory space?

Speed does not matter, I collect data once a min, and would like to transmit once an hour. The connection is done with a GSM (Cell) modem, so I pay by the byte, and it adds up FAST!

You can send just the changes (deltas) in that case:

Excluding Date (you can apply the date at the receiver's end when storing to the DB, for example) you could have 6 bytes the second minute, then 4 bytes, etc. You will have to mark which data is changed, then send the actual change, like:
[1][11][3][56][4][-9]

The first transmission should be the actual data, everything after that can be only the changes. In the example above, [1],[3], and [4] are the marker bytes indicating the values that changed, and the values follow.

If the changes are larger than 127, send a different mark, like
[11][01][56]

[11] marks a double-byte change in temp1 in my sample that increases 156 (two bytes reassembled) in the above line.

I think the overhead of alternate types of compression certainly won't show massive improvement on something simple like this...

Thanks Spinlock. I will give that a shot.

Thus far, I have had no luck writing a compression algorithm. I found one based on LZ that seems to fit both the memory and storage requirements. I just cant seem to get it to run. Found it from the open Solaris kernel source tree. http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/compress.c

I have done all my testing in CodeBlocks, and will move the running code to my arduino. If I can get it running. Any help would be great!

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

#define MATCH_BITS      6
#define MATCH_MIN       3
#define MATCH_MAX       ((1 << MATCH_BITS) + (MATCH_MIN - 1))
#define OFFSET_MASK     ((1 << (16 - MATCH_BITS)) - 1)
#define LEMPEL_SIZE     256
#define      NBBY                8
typedef      unsigned char      uchar_t;
typedef unsigned short uint16_t;
typedef      int                  intptr_t;
typedef      unsigned int            uintptr_t;

size_t
compress(char *s_start, char *d_start, size_t s_len)
{
        uchar_t *src = s_start;
        uchar_t *dst = d_start;
        uchar_t *cpy, *copymap;
        int copymask = 1 << (NBBY - 1);
        int mlen, offset;
        uint16_t *hp;
        uint16_t lempel[LEMPEL_SIZE];   /* uninitialized; see above */

        while (src < (uchar_t *)s_start + s_len) {
                if ((copymask <<= 1) == (1 << NBBY)) {
                        if (dst >= (uchar_t *)d_start + s_len - 1 - 2 * NBBY) {
                                mlen = s_len;
                                for (src = s_start, dst = d_start; mlen; mlen--)
                                        *dst++ = *src++;
                                return (s_len);
                        }
                        copymask = 1;
                        copymap = dst;
                        *dst++ = 0;
                }
                if (src > (uchar_t *)s_start + s_len - MATCH_MAX) {
                        *dst++ = *src++;
                        continue;
                }
                hp = &lempel[((src[0] + 13) ^ (src[1] - 13) ^ src[2]) &
                    (LEMPEL_SIZE - 1)];
                offset = (intptr_t)(src - *hp) & OFFSET_MASK;
                *hp = (uint16_t)(uintptr_t)src;
                cpy = src - offset;
                if (cpy >= (uchar_t *)s_start && cpy != src &&
                    src[0] == cpy[0] && src[1] == cpy[1] && src[2] == cpy[2]) {
                        *copymap |= copymask;
                        for (mlen = MATCH_MIN; mlen < MATCH_MAX; mlen++)
                                if (src[mlen] != cpy[mlen])
                                        break;
                        *dst++ = ((mlen - MATCH_MIN) << (NBBY - MATCH_BITS)) |
                            (offset >> NBBY);
                        *dst++ = (uchar_t)offset;
                        src += mlen;
                } else {
                        *dst++ = *src++;
                }
        }
        return (dst - (uchar_t *)d_start);
}

size_t
decompress(void *s_start, void *d_start, size_t s_len, size_t d_len)
{
        uchar_t *src = s_start;
        uchar_t *dst = d_start;
        uchar_t *s_end = (uchar_t *)s_start + s_len;
        uchar_t *d_end = (uchar_t *)d_start + d_len;
        uchar_t *cpy, copymap;
        int copymask = 1 << (NBBY - 1);

        if (s_len >= d_len) {
                size_t d_rem = d_len;
                while (d_rem-- != 0)
                        *dst++ = *src++;
                return (d_len);
        }

        while (src < s_end && dst < d_end) {
                if ((copymask <<= 1) == (1 << NBBY)) {
                        copymask = 1;
                        copymap = *src++;
                }
                if (copymap & copymask) {
                        int mlen = (src[0] >> (NBBY - MATCH_BITS)) + MATCH_MIN;
                        int offset = ((src[0] << NBBY) | src[1]) & OFFSET_MASK;
                        src += 2;
                        if ((cpy = dst - offset) >= (uchar_t *)d_start)
                                while (--mlen >= 0 && dst < d_end)
                                        *dst++ = *cpy++;
                        else
                                /*
                                 * offset before start of destination buffer
                                 * indicates corrupt source data
                                 */
                                return (dst - (uchar_t *)d_start);
                } else {
                        *dst++ = *src++;
                }
        }
        return (dst - (uchar_t *)d_start);
}


int main()
{
  char foo[80] = { 0 };
  char target[80] = { 0 };

  printf("String to compress: ");
  gets(foo);

  compress(foo, target, 80);

  //newline string, WTF, newline, from foo
  printf("\n%s,was inputed.\n", foo);
  printf("\n%s,was compressed.\n", target);
  return 0;
}

Pushed :wink:

http://code.google.com/p/fastlz/

I am not able to check this... but possibly a hint...
..have good luck... and please tell us, if you got it work...

Greetings
ChrisS

I would solve this as follows:

  1. The data is arranged such that all values of the same colum will be in successive order.
  2. The first value is the measure value
  3. Every following value is a delta

This will result in sequences with long runs of small values. Especially if the weather conditions do not change to fast.

  1. For the delta values analyze how many bits they actually use up (you may need to renormalize to make negative numbers like -1 not use all bits)

Protocoll will then be:

Start value, number of bits required per delta, delta bits

If this is still not sufficient then I would push this through huffman or something similar.

You may also want to consider

http://en.wikipedia.org/wiki/Burrows-Wheeler_transform

for preprocessing.

Udo

1 Like