Converting/Compressing numbers to bytes

Hi, I have some numbers and I want to compress them as much as possible to send them over the internet.
I searched a lot but there was no library for that.

What I have in mind:
A byte holds values from 0 to 255. I want to split the digits into groups whose value is at most 255 and convert each group to one byte.
Consider this number:

1214730647285

It should be separated like this:

121, 47, 30, 64, 72, 85

Starting from the left, 1214... is greater than 255, so we take only its first three digits, 121, and so on.
Hex representation is:

79, 2F, 1E, 40, 48, 55

So that big number (1214730647285), 13 digits or 13 bytes as a string, was converted to only 6 bytes.
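A minimal C++ sketch of the splitting rule described above, for illustration only. `greedySplit` is a hypothetical helper, and it assumes the input contains only the characters '0' to '9'. It takes the longest run of up to three digits whose value is at most 255 and emits that value as one byte.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper illustrating the proposed scheme: scan the digit
// string left to right and take the longest prefix (at most 3 digits)
// whose value is <= 255, emitting that value as one byte.
// Assumes the input contains only '0'..'9'.
std::vector<uint8_t> greedySplit(const std::string& digits) {
    std::vector<uint8_t> out;
    size_t i = 0;
    while (i < digits.size()) {
        unsigned value = 0;
        size_t len = 0;
        // Extend the current group while it still fits in a byte.
        while (i + len < digits.size() && len < 3) {
            unsigned next = value * 10 + (digits[i + len] - '0');
            if (next > 255) break;
            value = next;
            ++len;
        }
        out.push_back(static_cast<uint8_t>(value));  // len >= 1 always
        i += len;
    }
    return out;
}
```

The bytes do not record how many digits each group had, so leading zeros inside a group (as in "047") are lost when decoding.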

Is there any library that does this, or a similar technique for compressing numbers?
Thanks

That number should fit into 64 bits, which means it can be represented in at most 8 bytes. Use the stoull standard library function to convert it. Note that this won't work on an Uno, since it doesn't have the resources to include the std namespace. An ESP32 or RPi Pico should be fine, though.

    #include <cstdint>
    #include <string>
    std::string str = "1214730647285";
    uint64_t num = std::stoull(str);  // parse the decimal string into a 64-bit integer
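Building on the snippet above, a sketch of how the parsed value could be sent as bytes: `toMinimalBytes` (a hypothetical name) drops leading zero bytes and emits the rest most significant first. It assumes a board with a full C++ standard library, as noted above. For 1214730647285 it yields the 6 bytes 01 1A D3 96 B6 F5, since the value needs 41 bits.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Sketch (assumes a full C++ standard library, e.g. ESP32 or a PC):
// parse the decimal string once, then keep only the significant bytes
// of the 64-bit value, most significant byte first.
std::vector<uint8_t> toMinimalBytes(const std::string& str) {
    uint64_t num = std::stoull(str);  // throws std::invalid_argument on non-numeric input
    std::vector<uint8_t> out;
    bool started = false;
    for (int shift = 56; shift >= 0; shift -= 8) {
        uint8_t b = static_cast<uint8_t>(num >> shift);
        if (b != 0 || started || shift == 0) {  // always keep at least one byte
            out.push_back(b);
            started = true;
        }
    }
    return out;
}
```

The receiver still has to know how many bytes to expect, for example by sending a length byte first or by always sending all 8 bytes.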

12147 translates to 121, 47.

But what does 121047 translate to?


Someone showed you a "number trick"? Now try 256 (base 10): "256 is greater than 255, so we take the 25, then the 6, and get two bytes..." I prefer "old math" 100 (base 16) over your 19 06 (goofy).

An Arduino Uno can use unsigned long long variables.

Perhaps the OP would be better served by BCD (binary-coded decimal), where each byte represents 2 digits with no ambiguity and which can support very large values. That's how 4-bit calculators worked, with exponents as well as mantissas.

"1214730647285";

01 21 47 30 64 72 85

Note that the digits must be parsed from low order to high order (right to left), which is why the leading digit ends up alone as 01.
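A sketch of packing digits as BCD from the low-order end, assuming the input is digits only (`toPackedBcd` is a hypothetical name). Working right to left puts an odd leftover digit in a byte of its own with a 0 high nybble, which reproduces the 01 21 47 30 64 72 85 layout above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Packed BCD sketch: two decimal digits per byte, high nybble first.
// Digits are consumed from the rightmost end, so an odd digit count
// ends up with a leading 0 nybble. Assumes input is only '0'..'9'.
std::vector<uint8_t> toPackedBcd(const std::string& digits) {
    std::vector<uint8_t> out((digits.size() + 1) / 2, 0);
    size_t byteIndex = out.size();
    for (size_t n = 0; n < digits.size(); ++n) {
        // n counts digits from the right: even n starts a new byte (low nybble).
        uint8_t d = static_cast<uint8_t>(digits[digits.size() - 1 - n] - '0');
        if (n % 2 == 0) {
            --byteIndex;
            out[byteIndex] = d;
        } else {
            out[byteIndex] |= static_cast<uint8_t>(d << 4);  // high nybble
        }
    }
    return out;
}
```

Unlike the greedy split, "121047" and "12147" pack differently (12 10 47 versus 01 21 47), because every byte always holds exactly two digits.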


Yeah, that example was just to show what I had in mind for numeric string compression.

Using this algorithm, if I have a number like "132.04", I can separate it into
"132": 01 32
"04": 04
How can I know whether 04 means .4 or .04? Is there any standard library/function that handles these cases?

miniz is an alternative to zlib for embedded systems.

It is not made for Arduino, but there is an article about running it on an ESP32.
(The article is in Japanese, so use Google Translate.)

Original: 組み込みファームウェアで ZIP ファイルを扱う 〜 miniz の紹介 #C - Qiita ("Handling ZIP files in embedded firmware: an introduction to miniz")

Google Translate: Handling ZIP files in embedded firmware: an introduction to miniz #C - Qiita

At the end of the article, you will find a Github link to the code the author tested.

If you want to send values with decimal digits as integers instead of floats, you can scale them up by a fixed factor (10, 100, 1000) when transmitting and scale them back down after receiving.

132.04 becomes 13204, which is 0x33, 0x94. But it seems you still need to either specify the number of bytes being transmitted or transmit a fixed number of bytes for each value (e.g. 0x00, 0x00, 0x33, 0x94).
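A sketch of the scale-and-send-fixed-width approach, assuming both ends agree on a factor of 100 and on 4 bytes sent most significant first (`SCALE`, `encodeScaled` and `decodeScaled` are hypothetical names). Rounding with `std::lround` matters, because 132.04 has no exact binary representation and plain truncation could yield 13203.

```cpp
#include <cmath>
#include <cstdint>

// Fixed-point sketch: scale by an agreed factor (100 = two decimal places),
// round to the nearest integer, and always send 4 bytes, MSB first, so the
// receiver needs no length information. SCALE is an assumed shared constant.
const int32_t SCALE = 100;

void encodeScaled(double value, uint8_t out[4]) {
    int32_t scaled = static_cast<int32_t>(std::lround(value * SCALE));
    uint32_t u = static_cast<uint32_t>(scaled);  // keeps negative values as two's complement
    out[0] = static_cast<uint8_t>(u >> 24);
    out[1] = static_cast<uint8_t>(u >> 16);
    out[2] = static_cast<uint8_t>(u >> 8);
    out[3] = static_cast<uint8_t>(u);
}

double decodeScaled(const uint8_t in[4]) {
    uint32_t u = (static_cast<uint32_t>(in[0]) << 24) | (static_cast<uint32_t>(in[1]) << 16) |
                 (static_cast<uint32_t>(in[2]) << 8) | static_cast<uint32_t>(in[3]);
    return static_cast<int32_t>(u) / static_cast<double>(SCALE);
}
```

With a 32-bit integer and a factor of 100, values up to about ±21 million fit.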

Of course, you can always send it as an ASCII string, "132.04", of varying length (e.g. ".1").


"How can I know 04 is .4 or .04?"

Because 04 in binary is 0000 0100:
the first nybble is literally 0, the second literally 4 (.4 would be 0100 0000).
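A small decoding sketch (`fromPackedBcd` is a hypothetical name, assuming every nybble holds a digit 0 to 9) shows why there is no ambiguity: each byte always yields exactly two characters, so 0x04 decodes to "04" and 0x40 to "40".

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Decode packed BCD to a digit string: each byte yields exactly two
// characters (high nybble, then low nybble), so digit positions are
// never lost. Assumes every nybble is a valid BCD digit (0..9).
std::string fromPackedBcd(const std::vector<uint8_t>& bytes) {
    std::string digits;
    for (uint8_t b : bytes) {
        digits += static_cast<char>('0' + (b >> 4));
        digits += static_cast<char>('0' + (b & 0x0F));
    }
    return digits;
}
```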

I dunno about a standard library, but this goes back to the '60s if not before. It is how '70s hand calculators worked.

If you want a system with software and a complete explanation/tutorial:
Nick Gammon on Big Numbers
Nick was a very active member here when I arrived in 2011; he had many thousands of Karma points even then, for good reason.
One level up from that link you get the whole set; it is a complete explanation with code, schematics and illustrations, beyond what forum posts will getcha.

Your example deals with decimals, in which case the format could be fixed point, where the last digit is always tenths, or it could carry the number of decimal places explicitly, like
0x01 0x32 0x04 0x02, with the last byte saying there are 2 BCD places after the decimal point.
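One way to read that format, sketched under stated assumptions: packed BCD of the digits with the decimal point removed (padded with a leading 0 to an even count), followed by one plain binary byte holding the number of decimal places. `encodeDecimalBcd` is a hypothetical name and assumes input such as "132.04" with at most one '.' and no sign.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical format: packed BCD digits (decimal point removed, padded on
// the left to an even count) followed by one binary byte giving the number
// of digits after the decimal point. Assumes digits and at most one '.'.
std::vector<uint8_t> encodeDecimalBcd(const std::string& text) {
    std::string digits;
    uint8_t places = 0;
    bool afterPoint = false;
    for (char c : text) {
        if (c == '.') { afterPoint = true; continue; }
        digits += c;
        if (afterPoint) ++places;
    }
    if (digits.size() % 2 != 0) digits.insert(digits.begin(), '0');  // pad high end
    std::vector<uint8_t> out;
    for (size_t i = 0; i < digits.size(); i += 2) {
        out.push_back(static_cast<uint8_t>(((digits[i] - '0') << 4) | (digits[i + 1] - '0')));
    }
    out.push_back(places);  // trailing byte: count of decimal places
    return out;
}
```

With this layout "132.04" gives 01 32 04 02 while "132.4" gives 13 24 01, so .04 and .4 no longer collide.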

How I deal with decimals in binary is to choose my working units carefully, a version of fixed point. If I want to deal with meters to 3 places, I work in micrometers (long or unsigned long), and then when divisions occur I have 3 places of precision to lose and still get correct results to the mm. The only time a decimal point turns up is in the displayed result, which has to be text anyway, and I code for that; I have since the early '80s.
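A sketch of the working-units idea, assuming lengths are kept as non-negative integer micrometers in an `int32_t` (the same width as `long` on an Uno). `formatMeters` is a hypothetical helper that produces text with a decimal point only at display time, rounded to the nearest millimeter.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Working-units sketch: lengths are stored as integer micrometers, so
// integer division has three spare digits to lose before millimeters are
// affected. A decimal point only appears in the display text.
// Assumes non-negative values well below INT32_MAX (about 2147 m).
std::string formatMeters(int32_t micrometers) {
    int32_t mm = (micrometers + 500) / 1000;  // round to the nearest mm
    char buf[24];
    std::snprintf(buf, sizeof buf, "%ld.%03ld m",
                  static_cast<long>(mm / 1000), static_cast<long>(mm % 1000));
    return std::string(buf);
}
```

For example, splitting 1 m three ways gives 1000000 / 3 = 333333 micrometers, which displays as "0.333 m".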


How would you generate these BCD hex values from the floating-point number?

Why not scale up the value by whatever factor (10, 4, 7, yeah!?) and use the resulting binary (not BCD) bytes representing the integer value?

No, it was converted to 17 bytes if you're talking about the ASCII hex representation (counting the commas and not counting a null terminator).

It seems that you (like countless other newbies) are wrapped around the axle when it comes to the concepts of binary numbers versus their representation in human-readable form.

The most efficient way to transmit numbers is in their raw binary form. Then, n bits of data occupy n bits on the "wire". This however presents problems with framing and ASCII transport mechanisms that don't work well with the non-printable control characters that binary data inevitably contains.

The standard method for transporting binary data (images, etc) over the internet with http and similar protocols is to use Base-64 encoding. With this, every 6 bits of data occupy 8 bits on the wire. So, it takes 4 ASCII bytes to transmit 3 bytes of binary data. This is a 4/3 X expansion, but it has the advantage that the data stream is all printable ASCII (though not human-readable). It's also more efficient than the 2 X expansion you'd get from using ASCII Hex. And, it's vastly more efficient than your proposed scheme.
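To make the 6-bits-per-character idea concrete, a minimal Base64 encoder sketch using the standard alphabet and '=' padding from RFC 4648 (`base64Encode` is a hypothetical name; real projects would normally use an existing library). Each group of 3 input bytes becomes 4 output characters.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Minimal Base64 encoder sketch (RFC 4648 standard alphabet, '=' padding):
// every 3 input bytes (24 bits) become 4 printable characters of 6 bits each.
std::string base64Encode(const std::vector<uint8_t>& data) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    size_t i = 0;
    for (; i + 2 < data.size(); i += 3) {
        uint32_t n = (uint32_t(data[i]) << 16) | (uint32_t(data[i + 1]) << 8) | data[i + 2];
        out += alphabet[(n >> 18) & 0x3F];
        out += alphabet[(n >> 12) & 0x3F];
        out += alphabet[(n >> 6) & 0x3F];
        out += alphabet[n & 0x3F];
    }
    size_t rest = data.size() - i;  // 0, 1 or 2 leftover bytes
    if (rest > 0) {
        uint32_t n = uint32_t(data[i]) << 16;
        if (rest == 2) n |= uint32_t(data[i + 1]) << 8;
        out += alphabet[(n >> 18) & 0x3F];
        out += alphabet[(n >> 12) & 0x3F];
        out += (rest == 2) ? alphabet[(n >> 6) & 0x3F] : '=';
        out += '=';
    }
    return out;
}
```

For example, the 3 bytes "Man" encode to "TWFu", and a 6-byte binary value encodes to 8 characters.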

So if you had the number 3678546345 you'd break it up into 36, 78, 54, 63, and 45. So you're sending 5 bytes.

But that number is already represented in your Arduino as 4 bytes in binary. Wouldn't it be more compact to just send the 4 binary bytes and reassemble them on the other side?

That is stupid simple to do and doesn't involve any ambiguity like your method does.
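A sketch of sending the 4 bytes directly, assuming both sides agree on most-significant-byte-first order (`packU32` and `unpackU32` are hypothetical names). Using shifts rather than copying memory keeps the result independent of each CPU's native byte order. For 3678546345 the bytes are DB 42 29 A9.

```cpp
#include <cstdint>

// Split a 32-bit value into 4 bytes, most significant first
// (an assumed convention both ends must share).
void packU32(uint32_t value, uint8_t out[4]) {
    out[0] = static_cast<uint8_t>(value >> 24);
    out[1] = static_cast<uint8_t>(value >> 16);
    out[2] = static_cast<uint8_t>(value >> 8);
    out[3] = static_cast<uint8_t>(value);
}

// Reassemble on the receiving side. Casting to uint32_t before shifting
// matters on an Uno, where int is only 16 bits.
uint32_t unpackU32(const uint8_t in[4]) {
    return (static_cast<uint32_t>(in[0]) << 24) |
           (static_cast<uint32_t>(in[1]) << 16) |
           (static_cast<uint32_t>(in[2]) << 8) |
           static_cast<uint32_t>(in[3]);
}
```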

I would be building them up from text for one, likely from buffered serial input.
What is binary is already compressed.

Possibly to send serial bytes, possibly to use arbitrary-length values. You have to know what it's for before calling it stupid.

Translating to BCD requires first translating the float to an ASCII string, then converting each char to BCD and combining 2 BCD digits into a single byte.

Seems like a lot of work compared to just sending the binary bytes the integer is represented in.

Agreed, as long as the transport mechanism can deal with raw binary data and framing so the received bytes can be reassembled properly. Otherwise, if transport in printable ASCII is required, Base-64 is probably as efficient as you can get (Post #14).

What transmission mechanism cannot support binary? Doesn't ASCII need to be transmitted as binary?

Obviously, various internet mechanisms. Otherwise, Base-64 wouldn't be required for transport of things like JPG images or security certificates:

const char * const rootCa =           R"EOF(
-----BEGIN CERTIFICATE-----
MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
-----END CERTIFICATE-----
)EOF";