BitArray, Class to compactly store ints of (almost) any size,

Some time ago I was doing some experiments and wanted to store many small values in an Arduino UNO.
Lets for simplicity sake state it were 1000 analogRead()'s Declaring an array of 1000 int’s would blow the memory and I didn’t want to use a MEGA.

As an analogRead = 0…1023, it uses only 10 bits of every int. So If I could compact the values then I would only need 10/16 * 1000 = 1250 bytes to store the 1000 reads. A simple prototype worked quite well however with a fixed int ar[1250] I lost quite some valuable RAM.

So I decided it should use dynamic memory, so I could temporary allocate it and free at will. It should also work for any element size, 2,3,5, 8, 13, 21 bits whatever. Sounded not too difficult so I went for malloc() and free() but found out that on an UNO you can only allocate about 250-255 bytes at one time. That meant non continuous memory and a mapping function.

Long story short, it resulted in a class I named BitArray which allows compact int arrays of any size.
The library is still in beta, but stable enough to share. There are currently 2 demo’s

demo1: 4000 dice rolls stored in an UNO

demo2: 1000 analogReads stored in an UNO with different bit depths (10, 6, 4) after another.

the demo’s show diagnostic info like

Start sketch_nov22e.ino
LIB VERSION:	0.1.03
CAPACITY:	4000
  MEMORY:	1500
    BITS:	3
SEGMENTS:	8

CAPACITY is the number of elements,
BITS is the size of one element in bits,
MEMORY is memory allocated,
SEGMENTS is the number of allocated memory chunks.

As always comments and remarks are welcome.


Last version - Arduino/libraries/BitArray at master · RobTillaart/Arduino · GitHub

BitArray.zip (3.89 KB)

Test sketch to measure performance reading 1000 bits.

//
//    FILE: bitArrayDemo2.ino
//  AUTHOR: Rob Tillaart
// VERSION: 0.1.00
// PURPOSE: demo performance reading boolean array
//    DATE: 2015-12-06
//     URL: https://forum.arduino.cc/index.php?topic=361167.0
//
// Released to the public domain
//

#include "BitArray.h"

BitArray b;

uint32_t start;
uint32_t stop;
volatile long x = 0;

void setup()
{
  Serial.begin(115200);
  Serial.print("Start ");
  Serial.println(__FILE__);
  Serial.print("LIB VERSION:\t");
  Serial.println(BITARRAY_LIB_VERSION);

  b.begin(1, 1000);
  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    x += b.get(i);
  }
  stop = micros();

  Serial.print("CAPACITY:\t");
  Serial.println(b.capacity());
  Serial.print("  MEMORY:\t");
  Serial.println(b.memory());
  Serial.print("    BITS:\t");
  Serial.println(b.bits());
  Serial.print("SEGMENTS:\t");
  Serial.println(b.segments());

  Serial.print("DURATION:\t");
  Serial.println(stop - start);

  Serial.println("Done...");
}

void loop()
{
}

output:

Start bitArrayDemo2.ino
LIB VERSION:	0.1.03
CAPACITY:	1000
  MEMORY:	125
    BITS:	1
SEGMENTS:	1
DURATION:	27880
Done...

28 uSec to read a bit (includes loop overhead ~1uSec)

(I’ll add this demo in the next version)

Updated version 0.1.04 on Github

  • improved performance (almost factor 2)
  • added perfomance demo
  • some refactoring

As always remarks and comments are welcome

As remarks are welcome ;)

I would suggest the use of arduino predefined constants to afford a greater value to BA_MAX_SEGMENTS on arduino mega

and an other point in _bitset() and _bitget(), but i'm not quite sure of it :

    uint8_t by = re >> 3; //instead of by = re / 8;

This may improve performance slightly, assuming that shift a byte is faster than divide it, and also that gcc is not clever enough to optimize it on its own.

Other suggest : refactoring _bitset() and _bitget(), introducing _bitpos()

void BitArray::_bitpos(uint16_t pos, uint8_t *se, uint8_t *by, uint8_t *bi)
{
    uint16_t re = pos;
    *se = 0;
    while (re >= (BA_SEGMENT_SIZE * 8))  // 8 == #bits in byte 
    { 
      (*se)++; 
      re -= (BA_SEGMENT_SIZE * 8); 
    }
    uint8_t re8 = re; //in order to avoid 16 bits calculations
    *by = re8 >> 3; 
    *bi = re8 & 7;
}

byte BitArray::_bitget(uint16_t pos) 
{ 
    uint8_t se, by, bi;
    _bitpos(pos, &se, &by, &bi);
    byte * p = _ar[se]; 
    return (p[by] >> bi) & 0x01; // bitRead(p[by], bi); 
} 

void BitArray::_bitset(uint16_t pos, byte value) 
{ 
    uint8_t se, by, bi;
    _bitpos(pos, &se, &by, &bi);
    byte * p = _ar[se]; 
    if (value == 0) p[by] &= ~(1 << bi); // bitClear(p[by], bi); 
    else p[by] |= (1 << bi);             // bitSet(p[by], bi); 
}

And then, optimize _bitpos() for high speed sequential access

void BitArray::_bitpos(uint16_t pos, uint8_t *se, uint8_t *by, uint8_t *bi)
{
    static uint16_t lastpos = 0;
    static uint8_t  lastse = 0;
    static uint8_t  lastby = 0;
    static uint8_t  lastbi = 0; //rem : lastse and lastbi could be merged in only one static byte

    if (pos != lastpos)
    {
      lastpos++;
      if (pos == lastpos)
      {
        lastbi++;
        if (lastbi >= 8)
        {
          lastbi = 0;
	  lastby++;
	  if (lastby >= (BA_SEGMENT_SIZE * 8))
	  {
	    lastby = 0;
	    lastse++;
	  }
        }
      }
      else
      {
        lastpos = pos;
        uint16_t re = pos; 
        lastse = 0;
        while (re >= (BA_SEGMENT_SIZE * 8))  // 8 == #bits in byte 
        { 
          lastse++; 
          re -= (BA_SEGMENT_SIZE * 8); 
        }
        uint8_t re8 = re;	  
        lastby = re8 >> 3; 
        lastbi = re8 & 7;
      }		
    }
    *se = lastse;
    *by = lastby;
    *bi = lastbi;
}

Note : this optimized _bitpos() should be fine with set() but not with get() because of i--

Modify _bitpos() to add sequential access down, or modify get() with a i++

This kind of get() might be fine

uint32_t BitArray::get(const uint16_t idx) 
{
    if (_bits == 1) return _bitget(idx); //slight optim
	  
    uint32_t v = 0;
    uint32_t mask = 1;
    uint16_t pos = idx * _bits; 
    for (uint8_t i = 0; i < _bits; i++) 
    { 
        if (_bitget(pos++)) v |= mask;
        mask <<= 1;
    } 
    return v; 
}

None of this has been tested, not even compiled.
Hope it will be usefull

For the fun : using 3 bits to store a throw of six sided dice is improvable.

You could save 11% ram (8 bits instead of 9) by using 1 byte for 3 dice (value 0 => 6*6*6-1=215)

:D

First thanks for your comments, appreciated!

I’ll take some time to go through them. As you implemented them do you have performance numbers to share?

bricoleau:
For the fun : using 3 bits to store a throw of six sided dice is improvable.

You could save 11% ram (8 bits instead of 9) by using 1 byte for 3 dice (value 0 => 666-1=215)

:smiley:

Good thinking, however it looks like an optimized version for storing dices (valuable in its own way).
Converting from base 10 to base 6 and back would be needed (mod & mul) is 100% doable.

The venom is as usual in the tail. At the end of the array you need a single or double digit instead of a triplet. That means you must save room for a special “endtriplets”. You need 6 + 36 = 42 of them.
215 + 42 => 257 just doesn’t fit in a byte any more.

Posted a BoolArray Class in - http://forum.arduino.cc/index.php?topic=364131.msg2511620#msg2511620

optimized for single bits (max 2000 booleans for now 0.1.01 version).

reaction on #3 - good idea!

Updated version 0.1.05 on Github - https://github.com/RobTillaart/Arduino/tree/master/libraries/BitArray

  • board dependant max # segments (thanks bricoleau)

Note: Numbers are educated guesses for the maximum number of segments. People can always adjust these numbers if they want, after all it is open source :) The code might need some fixes when the total amount of bits > 65535 some 16 bits vars might need to be 32 bits.

@bricoleau - do the numbers make sense to you?

As always remarks and comments are welcome

reaction on post #4

uint8_t re8 = re; //in order to avoid 16 bits calculations

This will probably not work as the re(mainder) can maximal be BA_SEGMENT_SIZE * 8 -1; which does not always fit in an uint8_t.

Still the refactoring of the common parts from get() and set() is a good point.
(was already on the list, thanks anyway)


reaction post #5 / #6 / #7

I was thinking of another way to optimize bit_pos(). Sort of binary search. This will always result in 3 compares for 8 segments instead of 1…7 loops.

void BitArray::_bitpos(uint16_t pos, uint8_t *se, uint8_t *by, uint8_t *bi)
{
    uint16_t re = pos;
    *se = 0;
    if (re >= (BA_SEGMENT_SIZE * 8) * 4 ) 
    { 
      (*se) += 4; 
      re -= (BA_SEGMENT_SIZE * 8) * 4; 
    }
    if (re >= (BA_SEGMENT_SIZE * 8) * 2)  // 8 == #bits in byte 
    { 
      (*se)+= 2; 
      re -= (BA_SEGMENT_SIZE * 8) * 2; 
    }
    if (re >= (BA_SEGMENT_SIZE * 8))  // 8 == #bits in byte 
    { 
      (*se)++; 
      re -= (BA_SEGMENT_SIZE * 8); 
    }
    *by = re >> 3; 
    *bi = re & 7;
}

In practice this might be decreasing performance as it is only optimal if all 8 segments are used.


What I really would like is to fetch all the bits that are in one byte in one go. So assume we have a 10 bit analogRead in the bitArray then all values would be divided over 2 bytes (in theory 3 is possible).

The lowerlevel bit_get() would be split in 2 functions:

  • bit_get_left(pos, n);
  • bit_get_right(pos, n);
    // n = 1…8

then you would get something like

value = bit_get_left(pos, a) << b | bit_get_right(pos+a, b);

however for larger than 16 bit elements (e.g. 32 bit) there would be up to 5 bit_get_left/right calls. Keeping it generic comes with a price.

In short, enough to investigate and analyse

I’m very sorry

I made a couple of tests, and a very few i told is good idea

Basicly it looks like functions are too simple to be greatly optimized.
Add a function call (like _bitpos) and it costs ticks
Even my “wonderfull” sequential read uses more ticks because of use of static vars.

The only thing i found with interesting result is to add one line in both get() and set().

uint32_t BitArray::get(const uint16_t idx)
{
    // if (_error != BA_OK) return BA_ERR;
    // if (idx >= _size) return BA_IDX_RANGE;
    if (_bits == 1) return _bitget(idx); //bricoleau
    uint32_t v = 0;
    uint16_t pos = idx * _bits;
    for (byte i = _bits; i-- > 0;)
    {
        v <<= 1;
        v += _bitget(pos + i);
    }
    return v;
}

uint32_t BitArray::set(const uint16_t idx, uint32_t value)
{
    // if (_error != BA_OK) return BA_ERR;
    // if (idx >= _size) return BA_IDX_RANGE;
    if (_bits == 1) {_bitset(idx, value); return value;} //bricoleau
    uint16_t pos = idx * _bits;
    uint32_t mask = 1UL;
    for (byte i = 0; i < _bits; i++)
    {
        byte v = (value & mask) > 0 ? 1 : 0;
        _bitset(pos + i, v);
        mask <<= 1;
    }
    return value;
}

maybe you could clean up your data class : _size seems useless and i guess _bits could be uint8_t

_bits is the element size and yes 64bit is the max I foresee so making it an uint8_t makes sense
_size can be removed indeed. (need to double check but still)

don’t know if I can refactor those tonight as I am about halfway my analysis how to optimize get/set. Your ideas although they didn’t work were provocative enough to explore some new ways.

Thanks,
Rob

Updated version 0.1.06 on Github - Arduino/libraries/BitArray at master · RobTillaart/Arduino · GitHub

  • refactoring
  • size checking

remarks and comments are still welcome

Updated version 0.1.07 on Github - https://github.com/RobTillaart/Arduino/tree/master/libraries/BitArray

  • made private functions inline to improve performance and decrease footprint.
  • refactored clear() to be faster.
  • updated examples.

remarks and comments are still welcome

Updated version 0.1.08 on Github - Arduino/libraries/BitArray at master · RobTillaart/Arduino · GitHub

  • added toggle()

remarks and comments are welcome