Topic: A sane way to store timestamp & redundant values in datalogging?  (Read 1124 times)
Newbie
Karma: 0
Posts: 40

I'm doing the same kind of temperature & humidity datalogger project that 7 billion other people in the Arduino community are also making. I have multiple RF nodes transmitting data about once a minute to one receiver unit, which processes the readings and sends them to the internet through an ENC28J60.

I've been trying to figure out the best way to store a timestamp with each value. The values will be stored in EEPROM, because I seriously can't get SD cards to work in conjunction with the Ethernet module. I already killed two 2 GB cards and I'm pretty much done with that. I don't use combined shields, so I have to combine separate modules and code examples myself.

This is what I came up with:
1. In the first 4 bytes, store the absolute UNIX time at which logging was first started

2. Store the temp value (-128..127, one signed byte)

3. Store the number of consecutive readings with that value

4. Repeat 2-3, and get the time for each value by adding up the repeat counts.

24 x 6 = 24 °C for 0-6 minutes
25 x 4 = 25 °C for 6-10 minutes
27 x 3 = 27 °C for 10-13 minutes
29 x 1 = 29 °C for 13-14 minutes
etc...

Is this in any way a sane method to store timestamps for values that will most likely repeat a lot? I need to keep every repeat count to know the time for each temperature value, and I need to read a lot from the EEPROM if I want to know when the 6597th temperature value was taken.
Another way would be to store a 4-byte time value after each temperature, but that would take a lot of memory.

When the memory is filled, everything will be reset.
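
For readers following along, the four steps above can be sketched roughly as follows. This is only an illustration under assumptions that are not in the post: the EEPROM is replaced by a plain RAM array, and `RunLog`, `logBegin`, `logAppend`, `logLookup`, the 1024-byte size, the big-endian start time and the 60-second period are all made-up names and values.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical in-RAM stand-in for the EEPROM; size and names are assumptions.
const std::size_t LOG_SIZE = 1024;
const uint32_t SAMPLE_PERIOD_S = 60;   // one reading per minute

struct RunLog {
    uint8_t bytes[LOG_SIZE];   // [0..3] start time, then (temp, count) pairs
    std::size_t used;          // number of bytes written so far
};

// Step 1: store the UNIX start time in the first 4 bytes (big-endian here).
void logBegin(RunLog &log, uint32_t startTime) {
    for (int i = 0; i < 4; i++)
        log.bytes[i] = (uint8_t)(startTime >> (8 * (3 - i)));
    log.used = 4;
}

// Steps 2-4: extend the last run if the value repeats, else start a new pair.
// Returns false when the log is full.
bool logAppend(RunLog &log, int8_t temp) {
    if (log.used > 4) {
        int8_t lastTemp = (int8_t)log.bytes[log.used - 2];
        uint8_t &lastCount = log.bytes[log.used - 1];
        if (lastTemp == temp && lastCount < 255) { lastCount++; return true; }
    }
    if (log.used + 2 > LOG_SIZE) return false;
    log.bytes[log.used++] = (uint8_t)temp;
    log.bytes[log.used++] = 1;
    return true;
}

// Walk the pairs, summing counts, to find sample n (0-based).
// On success writes the temperature and its UNIX timestamp.
bool logLookup(const RunLog &log, uint32_t n, int8_t &temp, uint32_t &when) {
    uint32_t start = 0;
    for (int i = 0; i < 4; i++) start = (start << 8) | log.bytes[i];
    uint32_t firstInRun = 0;
    for (std::size_t pos = 4; pos + 1 < log.used; pos += 2) {
        uint8_t count = log.bytes[pos + 1];
        if (n < firstInRun + count) {
            temp = (int8_t)log.bytes[pos];
            when = start + n * SAMPLE_PERIOD_S;
            return true;
        }
        firstInRun += count;
    }
    return false;
}
```

With the four example runs (24 x 6, 25 x 4, 27 x 3, 29 x 1) the log takes 4 + 8 = 12 bytes, and sample 7 resolves to 25 at start time + 420 s. The lookup cost grows with the number of pairs, which is the read overhead the question worries about.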
« Last Edit: December 18, 2013, 02:47:48 pm by Tuppe »

Global Moderator
Netherlands
Shannon Member
Karma: 217
Posts: 13717
In theory there is no difference between theory and practice, however in practice there are many...

You only need to add an index to your records; that index is the cumulative measurement count at the start of each run.
Memory could look like:

{unixtime} // starttime,
{ index , temp, freq } { index , temp, freq } { index , temp, freq } { index , temp, freq } .... { index , temp, freq }  (10x)
{ index , temp, freq } { index , temp, freq } { index , temp, freq } { index , temp, freq } .... { index , temp, freq }  (10x)
{ index , temp, freq } { index , temp, freq } { index , temp, freq } { index , temp, freq } .... { index , temp, freq }  (10x)
{ index , temp, freq } { index , temp, freq } { index , temp, freq } { index , temp, freq } .... { index , temp, freq }  (10x)

index - 2bytes ,
temp - 1 byte
freq - 1 byte
-----------------
total - 4 bytes / record

Your sample looks like
0, 24, 6
6, 25, 4
10, 27, 3
13, 29, 1
14, 28, 5
19, 27, 8
...
If I look for sample 15, I can do a (binary) search on the indexes, which all sit at addresses that are easy to calculate. Sample 15 falls in the record with index 14, because 14 <= 15 < 14 + 5.

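Code: (linear search)
// readEEPROM(addr) stands for whatever single-byte read your EEPROM offers.
// Record i starts at address 4 + i*4: index (2 bytes, high byte first is assumed), temp, freq.
int8_t find(uint16_t searchIndex)
{
  for (uint16_t i = 0; i < 1000; i++)
  {
    uint16_t addr  = 4 + i * 4;
    uint16_t index = ((uint16_t)readEEPROM(addr) << 8) | readEEPROM(addr + 1); // index has 2 bytes
    uint8_t  freq  = readEEPROM(addr + 3);
    // a sample belongs to this record if it lies in index .. index + freq - 1
    if (searchIndex >= index && searchIndex - index < freq)
      return (int8_t)readEEPROM(addr + 2); // location of temperature
  }
  return -128; // not found (assumes -128 is never logged)
}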

If you want, you can add humidity too, but the compression will be a bit less effective.
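
The binary search mentioned above could look roughly like this. It is a sketch, not tested on hardware: the `eeprom` array and `readEEPROM` are stand-ins for the real device, the 2-byte index is assumed to be stored high byte first, and `recordIndex`, `findRecord`, `temperatureAt` and `RECORD_COUNT` are names made up for the example. The first sample temperature is taken as 24, as in the original question.

```cpp
#include <cstdint>

// Hypothetical stand-in for the EEPROM, preloaded with the sample records.
// Assumed layout: 4-byte start time, then {index hi, index lo, temp, freq}.
static const uint8_t eeprom[] = {
    0, 0, 0, 0,      // start time placeholder
    0, 0, 24, 6,
    0, 6, 25, 4,
    0, 10, 27, 3,
    0, 13, 29, 1,
    0, 14, 28, 5,
    0, 19, 27, 8,
};
const uint16_t RECORD_COUNT = 6;

uint8_t readEEPROM(uint16_t addr) { return eeprom[addr]; }

// Cumulative sample index stored in record i.
uint16_t recordIndex(uint16_t i) {
    uint16_t addr = 4 + i * 4;
    return (uint16_t)((readEEPROM(addr) << 8) | readEEPROM(addr + 1));
}

// Returns the record number whose run contains sample searchIndex,
// or -1 if the sample lies beyond the last run.
int findRecord(uint16_t searchIndex) {
    int lo = 0, hi = RECORD_COUNT - 1, found = -1;
    while (lo <= hi) {   // find the last record with index <= searchIndex
        int mid = lo + (hi - lo) / 2;
        if (recordIndex(mid) <= searchIndex) { found = mid; lo = mid + 1; }
        else                                 { hi = mid - 1; }
    }
    if (found < 0) return -1;
    uint8_t freq = readEEPROM(4 + found * 4 + 3);
    return (searchIndex - recordIndex(found) < freq) ? found : -1;
}

// Temperature of sample searchIndex, or notFound if it was never logged.
int8_t temperatureAt(uint16_t searchIndex, int8_t notFound = -128) {
    int rec = findRecord(searchIndex);
    return rec < 0 ? notFound : (int8_t)readEEPROM(4 + rec * 4 + 2);
}
```

Each lookup touches about log2(record count) records instead of all of them, at the price of the 2 extra index bytes per record.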

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Newbie

But the index could easily be calculated by summing the frequency values. If I use EEPROM, memory space is far more crucial than search time.

Global Moderator

What you can do to minimize space is "glitch reduction": outliers with freq 1 can be "smoothed in",
and the run lengths that become adjacent after glitch reduction are then merged.

24 x 6
25 x 1   <<< absorb the single 25 into the neighbouring 24s (this gives an error of -1)
24 x 2
29 x 1   <<< this can become either 24 or 25 (setting it to 25 gives the least error: -4)
25 x 4
27 x 3
26 x 1   <<< next candidate, an outlier on the low side (becomes 27, error +1)
27 x 4
26 x 1   <<< as the cumulative error is now -4, we better smooth it to 27 so the cumulative error becomes -3
25 x 3
(20 bytes)
....
==>

24 x 9  
25 x 5
27 x 9
25 x 3
(8 bytes)
...
The algorithm is not very difficult and there are several possible strategies;
you need to hold at most 3 records in RAM.
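
One possible strategy, sketched under assumptions: a run of length 1 that has a neighbour on both sides is replaced by the nearer neighbour value, ties are broken towards whichever choice keeps the cumulative error closest to zero, and equal neighbours are merged as they appear. The names `Run`, `pushMerged` and `smoothGlitches` are made up for the example, and it works on whole vectors for readability; a streaming version would only need the previous, current and next run in RAM.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Run { int8_t temp; uint8_t count; };

// Append a run, merging it with the previous one when the temperature matches.
// (Sketch only: assumes merged counts stay within 255.)
static void pushMerged(std::vector<Run> &out, Run r) {
    if (!out.empty() && out.back().temp == r.temp)
        out.back().count += r.count;
    else
        out.push_back(r);
}

// Lossy smoothing of single-sample runs; cumulativeError receives the sum of
// (smoothed value - original value) over all changed samples.
std::vector<Run> smoothGlitches(const std::vector<Run> &in, int &cumulativeError) {
    std::vector<Run> out;
    cumulativeError = 0;
    for (std::size_t i = 0; i < in.size(); i++) {
        Run r = in[i];
        bool interior = !out.empty() && i + 1 < in.size();
        if (r.count == 1 && interior) {
            int errPrev = out.back().temp - r.temp;   // error if absorbed backwards
            int errNext = in[i + 1].temp - r.temp;    // error if absorbed forwards
            bool usePrev;
            if (std::abs(errPrev) != std::abs(errNext))
                usePrev = std::abs(errPrev) < std::abs(errNext);
            else   // tie: keep the running error as close to zero as possible
                usePrev = std::abs(cumulativeError + errPrev)
                       <= std::abs(cumulativeError + errNext);
            int err = usePrev ? errPrev : errNext;
            cumulativeError += err;
            r.temp = (int8_t)(r.temp + err);
        }
        pushMerged(out, r);
    }
    return out;
}
```

Fed the ten runs of the example, this yields 24 x 9, 25 x 5, 27 x 9, 25 x 3 with a cumulative error of -3. Note that it treats every length-1 run as a glitch, so a steady ramp would be flattened as well.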

Rob Tillaart

Newbie

Cool idea, but this would render rising and falling sections (where the value changes continuously, so every minute gives a different reading) invisible.

Global Moderator

Yes, it is the difference between lossless compression and lossy compression.

There are other ways to improve run-length compression without data loss, but they involve knowing a bit about the data.
 

Method 1:
Assumption: the temperature is always between 10 and 40. Subtract 10 and it fits in 5 bits (0..31); this leaves 3 bits for the run length.
The run-length bits 0..7 can represent 1..8 (as 0 is not needed).

[010 01100] = 3x (12+10) = 3x 22C

This way you can encode temperature and run length in one byte. However, if the run length is larger than 8 you have a problem.
But that can be solved in the following way.
We give up run length 8 = [111 xxxxx] and give it the meaning: "Hey, I am not a run length of 8; the next byte holds the real run length" (as in the original model). Direct run lengths are then 1..7.

Example:
[010 01100] [001 01101] [111 01100] [0001 0000] [010 01101]
   3 x 22      2 x 23      !!  22    (16 x)       3 x 23


Get the idea?
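
A rough sketch of Method 1 as described: the top 3 bits hold (run - 1) for runs 1..7, the value 111 acts as the escape, and the low 5 bits hold (temp - 10). The function names `encodeRun` and `decodeRun` and the constants are made up for the illustration, and a `std::vector` stands in for the EEPROM.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

const int TEMP_OFFSET = 10;   // assumed lower bound; 5 bits then cover 10..41 C
const uint8_t ESCAPE = 7;     // run bits 111: the next byte holds the run length

// Encode one run. temp must lie in TEMP_OFFSET..TEMP_OFFSET+31, run in 1..255.
void encodeRun(std::vector<uint8_t> &out, int temp, uint8_t run) {
    uint8_t t = (uint8_t)(temp - TEMP_OFFSET) & 0x1F;
    if (run <= 7) {
        out.push_back((uint8_t)(((run - 1) << 5) | t));   // codes 000..110 = 1..7
    } else {
        out.push_back((uint8_t)((ESCAPE << 5) | t));      // marker byte
        out.push_back(run);                               // real run length
    }
}

// Decode the run that starts at pos and advance pos past it.
void decodeRun(const std::vector<uint8_t> &in, std::size_t &pos, int &temp, uint8_t &run) {
    uint8_t b = in[pos++];
    temp = (b & 0x1F) + TEMP_OFFSET;
    uint8_t code = b >> 5;
    run = (code == ESCAPE) ? in[pos++] : (uint8_t)(code + 1);
}
```

Encoding 3 x 22, 2 x 23, 16 x 22, 3 x 23 this way gives the five bytes 0x4C 0x2D 0xEC 0x10 0x4D, against eight bytes for plain (temp, count) pairs.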


Method 2:
If the range of temperatures is larger than 32 C you can do delta encoding.
Do not store the temperature, but the difference between the current and the previous value. With 4 bits you can encode jumps from -7 to +8 (OK, zero would not be needed). This will be acceptable in most applications (otherwise take 5 bits for the delta, like above).
This leaves 4 bits for the run length 1..14:
- 0 has the special meaning that the next byte is an absolute temperature
- 15 has the special meaning that the next byte is the actual run length

Written as [run delta]:
[ 0 5 ] [32]  abs temp = 32 (5 x) - the field that normally holds the delta now holds the run length
[ 2 1 ]  2x 33
[ 4 2 ]  4x 35
[15 -1 ] [120] NEXT BYTE = RL. ==> 120 x 34
[ 3 -1 ] 3x 33
[ 0 4 ] [24] new abs temp = 24 (4x)

The compression and decompression become a bit more complex, but very doable.
By placing an absolute marker every 20 readings you can even recover most of the data in a corrupted file,
as these absolute temperatures give a new reference point.
If only one in 20 records needs a second byte, this compresses the RLE by another ~45% (at the expense of bigger code!).
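
Method 2 might be decoded along these lines. The post does not say how a delta of -7..+8 is packed into 4 bits, so the bias of 7 used here (nibble 0..15 stands for -7..+8) is an assumption, as are the names `putAbsolute`, `putDelta` and `decode` and the choice of run in the high nibble.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Run { int temp; uint8_t count; };

const int DELTA_BIAS = 7;   // assumption: low nibble 0..15 stands for delta -7..+8

// Absolute record: high nibble 0, low nibble = run (1..15), next byte = temp.
void putAbsolute(std::vector<uint8_t> &out, int8_t temp, uint8_t run) {
    out.push_back(run & 0x0F);
    out.push_back((uint8_t)temp);
}

// Delta record: high nibble = run 1..14, or 15 with the real run in the next byte.
void putDelta(std::vector<uint8_t> &out, int delta, uint8_t run) {
    uint8_t d = (uint8_t)(delta + DELTA_BIAS);
    if (run <= 14) {
        out.push_back((uint8_t)((run << 4) | d));
    } else {
        out.push_back((uint8_t)(0xF0 | d));
        out.push_back(run);
    }
}

// Expand the byte stream back into (temperature, count) runs.
std::vector<Run> decode(const std::vector<uint8_t> &in) {
    std::vector<Run> runs;
    int temp = 0;
    std::size_t pos = 0;
    while (pos < in.size()) {
        uint8_t b = in[pos++];
        uint8_t hi = b >> 4, lo = b & 0x0F;
        uint8_t count;
        if (hi == 0) {                        // absolute temperature follows
            if (pos >= in.size()) break;      // truncated stream
            temp = (int8_t)in[pos++];
            count = lo;
        } else {
            temp += (int)lo - DELTA_BIAS;     // apply the delta
            if (hi == 15) {                   // real run length follows
                if (pos >= in.size()) break;
                count = in[pos++];
            } else {
                count = hi;
            }
        }
        runs.push_back({temp, count});
    }
    return runs;
}
```

The six records of the example (32 x 5, 33 x 2, 35 x 4, 34 x 120, 33 x 3, 24 x 4) take 9 bytes in this layout. A decoder that loses sync can skip ahead to the next absolute record, which is the recovery point mentioned above.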

Your turn to implement the above ideas ;)

Rob Tillaart

Newbie

I like the delta encoding idea, but both methods could cause a serious problem if the system gets out of sync or misinterprets a byte. I reckon that in practice such an asymmetric system would be very hard to debug. I would also like to keep the system flexible, so I could reuse the same code in a wide range of different conditions.

By the actual run length, did you mean that I'd store the actual accumulated value at specific intervals? E.g. check memory address 100, 200, 300... for run length values, until I find the last one.
This could be used to improve the search times, if that becomes a problem when accumulating run lengths through 250 000 values.
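
To make the interval idea concrete, here is one way it might look. This is my reading of the question, not something confirmed in the thread: every `STRIDE` pairs, the number of samples stored so far is written to a separate checkpoint table. `Pair`, `STRIDE`, `buildCheckpoints` and `findPair` are hypothetical names, and vectors stand in for the EEPROM.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pair { int8_t temp; uint8_t count; };

const std::size_t STRIDE = 4;   // hypothetical: one checkpoint every 4 pairs

// checkpoints[k] = number of samples stored before pair k * STRIDE.
std::vector<uint32_t> buildCheckpoints(const std::vector<Pair> &pairs) {
    std::vector<uint32_t> cp;
    uint32_t total = 0;
    for (std::size_t i = 0; i < pairs.size(); i++) {
        if (i % STRIDE == 0) cp.push_back(total);
        total += pairs[i].count;
    }
    return cp;
}

// Find the pair holding sample n: skip whole blocks via the checkpoints,
// then sum the counts inside one block. Returns -1 if n is past the end.
int findPair(const std::vector<Pair> &pairs, const std::vector<uint32_t> &cp, uint32_t n) {
    if (cp.empty()) return -1;
    std::size_t k = 0;
    while (k + 1 < cp.size() && cp[k + 1] <= n) k++;   // last checkpoint <= n
    uint32_t first = cp[k];
    for (std::size_t i = k * STRIDE; i < pairs.size(); i++) {
        if (n < first + pairs[i].count) return (int)i;
        first += pairs[i].count;
    }
    return -1;
}
```

The checkpoint table costs 4 bytes per `STRIDE` pairs, so a larger stride trades search time for space. The checkpoints are sorted, so they could also be binary-searched.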

Global Moderator

Quote
I like the delta encoding idea, but both methods could cause a serious problem if the system gets out of sync or misinterprets a byte. I reckon that in practice such an asymmetric system would be very hard to debug. I would also like to keep the system flexible, so I could reuse the same code in a wide range of different conditions.
That is the price of compression (and it is solved by storing a reference value every 10, 20 or 60 measurements).

Quote
By the actual run length, did you mean that I'd store the actual accumulated value at specific intervals? E.g. check memory address 100, 200, 300... for run length values, until I find the last one.
This could be used to improve the search times, if that becomes a problem when accumulating run lengths through 250 000 values.
I think you are inspired enough to start some prototyping ;)

Rob Tillaart
