Compression Libraries

system · February 25, 2009, 5:24pm

Does anyone know of a solid general-purpose compression library for the Arduino?

system · February 25, 2009, 5:35pm

What kind of data do you expect to compress? Audio? Text? Byte streams? Images? Executable code?

Almost all general-purpose compression algorithms require a LOT of memory. They must analyze the uncompressed data, form statistical models of phrases or fragments of the data, and then access this statistical data to output the compressed form of the input.

Probably the simplest general-purpose compression algorithm would be a Huffman coding. The "statistical model" is a bit-tree which might fit in the ATmega168 for ASCII strings. Google it.

system · February 25, 2009, 5:37pm

I'm familiar with Huffman coding. What I'm looking for is a pre-written library small enough to fit on the arduino. If I was going to just write my own Huffman coding scheme, I wouldn't be asking.

Technically, our signal behaves the most like audio, but doesn't need quite as high a sampling rate. We're looking at around 5kHz sampling.

system · October 9, 2009, 8:43pm

Same question as "datalurkur"

westfw · October 10, 2009, 2:08am

Compression schemes tend to be RAM-intensive and difficult to do on a microcontroller like the arduino, although there are audio-like compression schemes that use hardware assist for telco-like applications (voice-like A-D converts that feed you an 8kbit/s bitstream for "almost phone quality" audio.)
You might see if you can dig up the old CPM utility that came before lzhuf compression and zip; IIRC it used a huffman-like algorithm that created a document-specific decode table stored with the compressed output, and it wasn't TOO big.

system · October 10, 2009, 3:37pm

If your sample data normally looks like a square wave, you can use Run-Length-Encoding, compressing the data without a lot of memory overhead whatsoever.

If you send sine-wave-like data, you can compress the data into deltas using delta encoding(Delta encoding - Wikipedia), sending two or three samples per byte. Wikipedia says it isn't useful by itself, but if your data is normally +-3, you can pack 3 samples per byte. If it is normally +-7, you can pack 2 samples per byte. You just need to send out flag bytes in your data stream to indicate which packing method you are using for the following bytes, none, 2bit, 4bit, etc.

If it looks more like an audio waveform with many frequencies and quick transistions from high to low, you may still be able to benefit from delta compression, but the compression won't occur as often.

Both are trivial to implement with an Arduino's space and speed limitations.

system · October 12, 2009, 3:53pm

I am collecting weather station data. It samples once a min. Looks something like this, once it makes it into the MySQL DB.

mysql> select * from weather_stations_data limit 5;
+--------------------------+-------+-------+-------+-------+---------------+----------------+----------------+------------+------------------+---------------------+
| weather_stations_data_id | temp1 | temp2 | temp3 | temp4 | wireless_link | wireless_level | wireless_noise | link_to_ap | link_to_internet | date_code           |
+--------------------------+-------+-------+-------+-------+---------------+----------------+----------------+------------+------------------+---------------------+
|                        1 |  7914 |  7092 |  3729 |  5551 |            48 |            -72 |            -89 |          1 |                1 | 2008-01-01 00:00:00 | 
|                        1 |  7925 |  7092 |  3673 |  5540 |            48 |            -72 |            -89 |          1 |                1 | 2008-01-01 00:04:00 | 
|                        1 |  7925 |  7092 |  3706 |  5551 |            48 |            -72 |            -89 |          1 |                1 | 2008-01-01 00:01:00 | 
|                        1 |  7925 |  7092 |  3706 |  5551 |            48 |            -72 |            -89 |          1 |                1 | 2008-01-01 00:02:00 | 
|                        1 |  7914 |  7092 |  3706 |  5540 |            48 |            -72 |            -89 |          1 |                1 | 2008-01-01 00:09:00 | 
+--------------------------+-------+-------+-------+-------+---------------+----------------+----------------+------------+------------------+---------------------+
5 rows in set (0.00 sec)

How about the Mega? http://arduino.cc/en/Main/ArduinoBoardMega

Is it possible to add more RAM and run from that memory space?

Speed does not matter, I collect data once a min, and would like to transmit once an hour. The connection is done with a GSM (Cell) modem, so I pay by the byte, and it adds up FAST!

system · October 12, 2009, 4:14pm

You can send just the changes (deltas) in that case:

Excluding Date (you can apply the date at the receiver's end when storing to the DB, for example) you could have 6 bytes the second minute, then 4 bytes, etc. You will have to mark which data is changed, then send the actual change, like:
[1][11][3][56][4][-9]

The first transmission should be the actual data, everything after that can be only the changes. In the example above, [1],[3], and [4] are the marker bytes indicating the values that changed, and the values follow.

If the changes are larger than 127, send a different mark, like
[11][01][56]

[11] marks a double-byte change in temp1 in my sample that increases 156 (two bytes reassembled) in the above line.

I think the overhead of alternate types of compression certainly won't show massive improvement on something simple like this...

system · October 12, 2009, 4:30pm

Thanks Spinlock. I will give that a shot.

system · November 18, 2009, 5:33am

Thus far, I have had no luck writing a compression algorithm. I found one based on LZ that seems to fit both the memory and storage requirements. I just cant seem to get it to run. Found it from the open Solaris kernel source tree. http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/compress.c

I have done all my testing in CodeBlocks, and will move the running code to my arduino. If I can get it running. Any help would be great!

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

#define MATCH_BITS      6
#define MATCH_MIN       3
#define MATCH_MAX       ((1 << MATCH_BITS) + (MATCH_MIN - 1))
#define OFFSET_MASK     ((1 << (16 - MATCH_BITS)) - 1)
#define LEMPEL_SIZE     256
#define      NBBY                8
typedef      unsigned char      uchar_t;
typedef unsigned short uint16_t;
typedef      int                  intptr_t;
typedef      unsigned int            uintptr_t;

size_t
compress(char *s_start, char *d_start, size_t s_len)
{
        uchar_t *src = s_start;
        uchar_t *dst = d_start;
        uchar_t *cpy, *copymap;
        int copymask = 1 << (NBBY - 1);
        int mlen, offset;
        uint16_t *hp;
        uint16_t lempel[LEMPEL_SIZE];   /* uninitialized; see above */

        while (src < (uchar_t *)s_start + s_len) {
                if ((copymask <<= 1) == (1 << NBBY)) {
                        if (dst >= (uchar_t *)d_start + s_len - 1 - 2 * NBBY) {
                                mlen = s_len;
                                for (src = s_start, dst = d_start; mlen; mlen--)
                                        *dst++ = *src++;
                                return (s_len);
                        }
                        copymask = 1;
                        copymap = dst;
                        *dst++ = 0;
                }
                if (src > (uchar_t *)s_start + s_len - MATCH_MAX) {
                        *dst++ = *src++;
                        continue;
                }
                hp = &lempel[((src[0] + 13) ^ (src[1] - 13) ^ src[2]) &
                    (LEMPEL_SIZE - 1)];
                offset = (intptr_t)(src - *hp) & OFFSET_MASK;
                *hp = (uint16_t)(uintptr_t)src;
                cpy = src - offset;
                if (cpy >= (uchar_t *)s_start && cpy != src &&
                    src[0] == cpy[0] && src[1] == cpy[1] && src[2] == cpy[2]) {
                        *copymap |= copymask;
                        for (mlen = MATCH_MIN; mlen < MATCH_MAX; mlen++)
                                if (src[mlen] != cpy[mlen])
                                        break;
                        *dst++ = ((mlen - MATCH_MIN) << (NBBY - MATCH_BITS)) |
                            (offset >> NBBY);
                        *dst++ = (uchar_t)offset;
                        src += mlen;
                } else {
                        *dst++ = *src++;
                }
        }
        return (dst - (uchar_t *)d_start);
}

size_t
decompress(void *s_start, void *d_start, size_t s_len, size_t d_len)
{
        uchar_t *src = s_start;
        uchar_t *dst = d_start;
        uchar_t *s_end = (uchar_t *)s_start + s_len;
        uchar_t *d_end = (uchar_t *)d_start + d_len;
        uchar_t *cpy, copymap;
        int copymask = 1 << (NBBY - 1);

        if (s_len >= d_len) {
                size_t d_rem = d_len;
                while (d_rem-- != 0)
                        *dst++ = *src++;
                return (d_len);
        }

        while (src < s_end && dst < d_end) {
                if ((copymask <<= 1) == (1 << NBBY)) {
                        copymask = 1;
                        copymap = *src++;
                }
                if (copymap & copymask) {
                        int mlen = (src[0] >> (NBBY - MATCH_BITS)) + MATCH_MIN;
                        int offset = ((src[0] << NBBY) | src[1]) & OFFSET_MASK;
                        src += 2;
                        if ((cpy = dst - offset) >= (uchar_t *)d_start)
                                while (--mlen >= 0 && dst < d_end)
                                        *dst++ = *cpy++;
                        else
                                /*
                                 * offset before start of destination buffer
                                 * indicates corrupt source data
                                 */
                                return (dst - (uchar_t *)d_start);
                } else {
                        *dst++ = *src++;
                }
        }
        return (dst - (uchar_t *)d_start);
}


int main()
{
  char foo[80] = { 0 };
  char target[80] = { 0 };

  printf("String to compress: ");
  gets(foo);

  compress(foo, target, 80);

  //newline string, WTF, newline, from foo
  printf("\n%s,was inputed.\n", foo);
  printf("\n%s,was compressed.\n", target);
  return 0;
}

ChrisS · March 12, 2010, 9:12am

Pushed

http://code.google.com/p/fastlz/

I am not able to check this... but possibly a hint...
..have good luck... and please tell us, if you got it work...

Greetings
ChrisS

udoklein · March 14, 2010, 10:39am

I would solve this as follows:

The data is arranged such that all values of the same colum will be in successive order.
The first value is the measure value
Every following value is a delta

This will result in sequences with long runs of small values. Especially if the weather conditions do not change to fast.

For the delta values analyze how many bits they actually use up (you may need to renormalize to make negative numbers like -1 not use all bits)

Protocoll will then be:

Start value, number of bits required per delta, delta bits

If this is still not sufficient then I would push this through huffman or something similar.

You may also want to consider

http://en.wikipedia.org/wiki/Burrows-Wheeler_transform

for preprocessing.

Udo

Topic		Replies	Views
LZ type compressor/decompressor article Libraries	6	9748	May 6, 2021
static html compression Networking, Protocols, and Devices	3	1609	May 6, 2021
Compress text file in SD card Storage	2	3374	May 6, 2021
How to use LZ77 compression algorithm in Arduino Programming Questions	2	792	July 19, 2022
Compressing a short text message. Project Guidance	16	1872	May 5, 2021

Compression Libraries

Related Topics