I’m glad my project sparked some interest. Since I started that thread I have implemented several compression schemes:
Compressed 888 + Alpha, 777 + Alpha, 666 + Alpha, 555
Palettized & compressed 256, 128, 64, 32, 16, 8, 4 and 2 colors
There is also uncompressed 565, which is basically the “raw” 565 data. As an example of how the compressed formats work: in compressed 555, I use the additional (16th) bit to flag that a repetition count follows, a kind of RLE algorithm.
I compared the results with some other RLE implementations in the Arduino world, and it seems I do a bit better. The process is not fully automatic, though, because depending on the image, one compression scheme sometimes works better than another. So I wrote a program to pick the best result for a given usage. Here is, for example, the Arduino Community logo, which has a transparent background and a size of 500x251, encoded as a PNG in 51491 bytes.
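To illustrate the idea of a flag bit announcing a repetition count, here is a minimal C++ sketch. The exact byte layout is an assumption made for the example (a 16-bit word with the color in bits 0..14, bit 15 as the flag, and the count stored in the following word); the real format may differ, and the name `decodeRle555` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMED layout, for illustration only (not the actual format):
//   bits 0..14 : 555 color
//   bit 15     : if set, the NEXT 16-bit word holds the number of
//                extra repetitions of this color
std::vector<uint16_t> decodeRle555(const std::vector<uint16_t>& in) {
    std::vector<uint16_t> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const uint16_t color = in[i] & 0x7FFF;   // strip the flag bit
        std::size_t count = 1;                   // the pixel itself
        if ((in[i] & 0x8000) && i + 1 < in.size()) {
            count += in[++i];                    // consume the count word
        }
        out.insert(out.end(), count, color);
    }
    return out;
}
```

With this layout, a single pixel costs 2 bytes and a run of any length up to 65536 pixels costs 4 bytes, which is why flat areas compress so well.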
https://support.arduino.cc/hc/article_attachments/12416033021852
You can see that this particular image compresses well as an 8-color palettized image, in 6759 bytes, while preserving the transparency.
Now a small monochrome image on a white background, like this 150x150 JPEG image of 3362 bytes:
https://support.arduino.cc/hc/article_attachments/12415993577116
Again, 8 colors work best for this image, and the palettization even removes the quite heavy JPEG compression artefacts at the same time.
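Why palettization cleans up JPEG artefacts can be shown with a small sketch: every pixel is snapped to its closest palette entry, so the faint noise around edges collapses back onto the nearest "real" color. The code below assumes a plain nearest-color match using squared RGB distance; that choice, and the names `Rgb` and `nearestPaletteIndex`, are assumptions for the example and not necessarily what the compressor does.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { uint8_t r, g, b; };

// Returns the index of the palette entry closest to p
// (squared Euclidean distance in RGB; an assumed metric).
std::size_t nearestPaletteIndex(const Rgb& p, const std::vector<Rgb>& palette) {
    std::size_t best = 0;
    long bestDist = -1;
    for (std::size_t i = 0; i < palette.size(); ++i) {
        const long dr = long(p.r) - long(palette[i].r);
        const long dg = long(p.g) - long(palette[i].g);
        const long db = long(p.b) - long(palette[i].b);
        const long d = dr * dr + dg * dg + db * db;
        if (bestDist < 0 || d < bestDist) {
            bestDist = d;
            best = i;
        }
    }
    return best;
}
```

A slightly off-white pixel such as (250, 250, 245) maps to the same index as pure white, so the artefact disappears and the run of identical indices gets longer, which also helps the RLE stage.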
I also compared against an existing Arduino library, https://github.com/MHotchin/RLEBitmap
Taking the 2 examples that are given with a resulting size, the “chanceflurries” icon compresses to 755 bytes according to the author. I get 512 bytes, a bit better…
Also, the moon example, 128x128 pixels in 16 colors, compresses to 8400 bytes, and again I do a bit better with 7731 bytes.
I can even compress it in 8 colors, with a result of 4939 bytes, losing a little of the pixel information, but not much.
The current compression program is quite a heavy Windows executable and is intended to be used manually, picking the best algorithm by hand. But the resulting byte array is read by a very simple, very light algorithm that only needs one callback function to be implemented, receiving a coordinate, a line length, a color and a direction (horizontal or vertical). So it would be straightforward to use with any graphics library, or even to do it manually, for example lighting up an array of LEDs with a simple loop. So far, I have implemented the decoding part in C#, but in a way that is extremely close to C++, so it will be no problem to translate it to the Arduino language later, and I’m pretty sure it will be very lightweight. The decoding algorithm also has no dependencies at all.
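As a rough idea of what such a callback could look like in C++, here is a sketch that draws into a plain in-memory buffer. Only the four inputs (coordinate, line length, color, direction) come from the description above; the exact signature, the extra `user` pointer, and the names `Direction`, `LineCallback`, `Framebuffer` and `drawLineToBuffer` are assumptions for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Direction { Horizontal, Vertical };

// Hypothetical callback type: the decoder would call this once per run.
using LineCallback = void (*)(int x, int y, int length, uint16_t color,
                              Direction dir, void* user);

struct Framebuffer {
    int width;
    int height;
    std::vector<uint16_t> pixels;  // row-major, width * height entries
};

// One possible implementation: write the run into a framebuffer,
// clipping anything that falls outside of it.
void drawLineToBuffer(int x, int y, int length, uint16_t color,
                      Direction dir, void* user) {
    Framebuffer* fb = static_cast<Framebuffer*>(user);
    for (int i = 0; i < length; ++i) {
        const int px = (dir == Direction::Horizontal) ? x + i : x;
        const int py = (dir == Direction::Vertical) ? y + i : y;
        if (px >= 0 && px < fb->width && py >= 0 && py < fb->height) {
            fb->pixels[static_cast<std::size_t>(py) * fb->width + px] = color;
        }
    }
}
```

With a graphics library, the loop would typically be replaced by that library's own fast horizontal or vertical line call, and for an LED array by a loop that sets each LED.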
As my implementation can do sprites with some transparency, I thought it would be nice to treat the sprites as letters, thus having a way to write with any kind of font that Windows can display. As an encoding scheme I used CP437 (extended ASCII), so many Western European languages can be written, plus a few additional characters, unlike some other implementations that are limited to unaccented characters. See https://www.lookuptables.com/text/extended-ascii-table
Here’s the result using a fancy font, only 16 pixels high and of course using antialiasing so the letters look nice, with only 4 colors (3 + transparency):

You can use even fancier fonts with complex ligatures, like this one

Enlarge for details:
And showing the various letters:
As you can see, in the end it is just a juxtaposition of sprites, taking into account the various parameters of a font (leading, kerning, baseline…). Each letter is a highly compressed buffer of bytes, in a 2, 4 or 8 color gradient.
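The "juxtaposition of sprites" can be sketched as a pen that moves along the baseline. This is a simplified illustration under assumptions: each glyph only carries an advance width and an ascent (its height above the baseline), kerning and leading are left out, and the names `GlyphMetrics`, `Placement` and `layoutText` are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-glyph metrics, one entry per CP437 code (0..255).
struct GlyphMetrics {
    int advance;  // how far the pen moves after this glyph
    int ascent;   // sprite height above the baseline
};

struct Placement {
    uint8_t code;  // CP437 code of the glyph
    int x;         // left edge of the sprite
    int y;         // top edge of the sprite
};

// Computes where each letter sprite goes; drawing them is a separate step.
std::vector<Placement> layoutText(const std::string& text,
                                  const GlyphMetrics* font,  // 256 entries
                                  int originX, int baselineY) {
    std::vector<Placement> out;
    int penX = originX;
    for (unsigned char c : text) {
        const GlyphMetrics& g = font[c];
        out.push_back({c, penX, baselineY - g.ascent});
        penX += g.advance;  // kerning would adjust this per glyph pair
    }
    return out;
}
```

Each resulting placement would then be handed to the sprite decoder, with transparent pixels simply skipped so that overlapping letters and ligatures blend correctly.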
As a bonus, my decoding algorithm can read directly from external media, so reading an image straight from a microSD card is possible, without having to instantiate more than a single line of pixels at a time.
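The one-line-at-a-time idea can be sketched for the simplest case, the uncompressed 565 format. The reader is passed in as a function pointer so the same loop works with a file on an SD card or a buffer in memory. Everything here is an assumption for the example: the little-endian byte order, the signatures, and the names `ReadFn`, `RowFn` and `streamRaw565`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reader: copies up to n bytes into dst, returns bytes read.
using ReadFn = std::size_t (*)(uint8_t* dst, std::size_t n, void* ctx);
// Hypothetical sink: receives one decoded row of 565 pixels.
using RowFn = void (*)(int y, const uint16_t* row, int width, void* ctx);

// Streams a raw 565 image row by row; only one row is ever held in RAM.
bool streamRaw565(ReadFn read, void* readCtx, int width, int height,
                  RowFn onRow, void* rowCtx) {
    const std::size_t w = static_cast<std::size_t>(width);
    std::vector<uint8_t> raw(w * 2);
    std::vector<uint16_t> row(w);
    for (int y = 0; y < height; ++y) {
        if (read(raw.data(), raw.size(), readCtx) != raw.size()) {
            return false;  // source ended early
        }
        for (std::size_t x = 0; x < w; ++x) {
            // ASSUMPTION: low byte first (little-endian)
            row[x] = static_cast<uint16_t>(raw[2 * x] | (raw[2 * x + 1] << 8));
        }
        onRow(y, row.data(), width, rowCtx);
    }
    return true;
}
```

For a 500-pixel-wide image this needs about 2 KB of working memory regardless of the image height, instead of a full frame.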
Hope you enjoyed the ride 