LowLatencyLogger binToCsv conversion using Python

Hi!
I have been using the excellent LowLatencyLogger from the SdFat library to log some sensor data. So far it has worked great, but converting the BIN file to CSV on the Arduino with the binToCsv() function takes a long time: about two hours for the 3 million+ samples I gather each time.
So I am trying to come up with a Python script that does the same conversion on the computer. This would be nice because I could integrate it into a larger script that stores the CSV data in a time-series database (InfluxDB). However, I have not had much luck decoding the file, and maybe you can point me in the right direction:

I am using Python 3.4.0 on a Mac running OS X 10.11.6. Python newbie here. So far I am able to open the file and print the first line (there's no point printing the whole file because it's huge) just to see what it looks like, but I can't figure out the encoding.

with open("filename", "rb") as binary_file:
	byte = binary_file.readline()
	print(byte)

prints:

b'\x07\x00\x00\x00\xc3\x84SU\x00\x1a\x00\x00\x00\x03\x00\x00\x00\xc3\xa1\x07\x00\x00\x11\x00\x00\x00"\x00\x00\x000\x00\x00\x00\xc3\xa6\xc3\xbe\xc3\xbf\xc3\xbfH\xc3\xb7\xc3\xbf\xc3\xbf\xe2\x82\xac=\x00\x00\xc2\x99\x13\x00\x00\xc3\xa7\xc3\xb2\xc3\xbf\xc3\xbf=\x14\x00\x00h\xc3\xbe\xc3\xbf\xc3\xbf{\xc3\xbf\xc3\xbf\xc3\xbf#\x00\x00\x00\xc3\xb8\xc3\xbf\xc3\xbf\xc3\xbf\xc3\x94zU\x00\x1a\x00\x00\x00\x03\x00\x00\x00\xc3\xa1\x07\x00\x00\x11\x00\x00\x00"\x00\x00\x000\x00\x00\x00\xc3\x85\xc3\xbe\xc3\xbf\xc3\xbf\x1f\xc3\xb7\xc3\xbf\xc3\xbf\xc2\x82=\x00\x00\xc2\x99\x13\x00\x00\xc3\xa7\xc3\xb2\xc3\xbf\xc3\xbf=\x14\x00\x00J\xc3\xbe\xc3\xbf\xc3\xbf\xc2\x8d\xc3\xbf\xc3\xbf\xc3\xbf#\x00\x00\x00\xc3\xb8\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xa4\xc2\xa1U\x00\x1a\x00\x00\x00\x03\x00\x00\x00\xc3\xa1\x07\x00\x00\x11\x00\x00\x00"\x00\x00\x000\x00\x00\x00\xc3\x80\xc3\xbe\xc3\xbf\xc3\xbf:\xc3\xb7\xc3\xbf\xc3\xbf==\x00\x00\xc2\x9a\x13\x00\x00\xc3\xab\xc3\xb2\xc3\xbf\xc3\xbfE\x14\x00\x00K\xc3\xbe\xc3\xbf\xc3\xbf\xc2\x96\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xb9\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xb8\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xb6\xc3\x88U\x00\x1a\x00\x00\x00\x03\x00\x00\x00\xc3\xa1\x07\x00\x00\x11\x00\x00\x00"\x00\x00\x000\x00\x00\x00\xc3\xb0\xc3\xbe\xc3\xbf\xc3\xbf\x1d\xc3\xb7\xc3\xbf\xc3\xbf\xc2\x81=\x00\x00\xc2\x9a\x13\x00\x00\xc3\xab\xc3\xb2\xc3\xbf\xc3\xbfE\x14\x00\x00p\xc3\xbe\xc3\xbf\xc3\xbf}\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xa7\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xb8\xc3\xbf\xc3\xbf\xc3\xbf\x06\xc3\xb0U\x00\x1a\x00\x00\x00\x03\x00\x00\x00\xc3\xa1\x07\x00\x00\x11\x00\x00\x00"\x00\x00\x000\x00\x00\x00\xc3\xa4\xc3\xbe\xc3\xbf\xc3\xbf\x13\xc3\xb7\xc3\xbf\xc3\xbf\xc2\x99=\x00\x00\xc2\x9f\x13\x00\x00\xc3\x9c\xc3\xb2\xc3\xbf\xc3\xbfQ\x14\x00\x00\\\xc3\xbe\xc3\xbf\xc3\xbfi\xc3\xbf\xc3\xbf\xc3\xbf\xc3\x87\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xb9\xc3\xbf\xc3\xbf\xc3\xbf\x16\x17V\x00\x1a\x00\x00\x00\x03\x00\x00\x00\xc3\xa1\x07\x00\x00\x11\x00\x00\x00"\x00\x00\x000\x00\x00\x00\xc3\x9d\xc3\xbe\xc3\xbf\xc3\xbf5\xc3\xb7\xc3\xbf\xc3\xbf\xc2\x94=\x00
\x00\xc2\x9f\x13\x00\x00\xc3\x9c\xc3\xb2\xc3\xbf\xc3\xbfQ\x14\x00\x00s\xc3\xbe\xc3\xbf\xc3\xbf{\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xa5\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xb9\xc3\xbf\xc3\xbf\xc3\xbf&>V\x00\x1a\x00\x00\x00\x03\x00\x00\x00\xc3\xa1\x07\x00\x00\x11\x00\x00\x00"\x00\x00\x000\x00\x00\x00\xc3\x80\xc3\xbe\xc3\xbf\xc3\xbf\x19\xc3\xb7\xc3\xbf\xc3\xbfa=\x00\x00\xc2\x97\x13\x00\x00\xc3\x9e\xc3\xb2\xc3\xbf\xc3\xbf^\x14\x00\x00f\xc3\xbe\xc3\xbf\xc3\xbf\xc2\x91\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xb2\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xba\xc3\xbf\xc3\xbf\xc3\xbfDATA03  BIN\x00\x18\x00\x00\x08!(!(\x00\x00\x00\x08!(\xc2\xbb\x03\x00\n'

The first line of the CSV file, however, should be:

Sampling interval: 10000 microseconds

Using codecs.decode with the ASCII codec only returns garbled text.

I am attaching sample files here in .bin and .csv format (the CSV was converted on the microcontroller).
Any clues?

Thanks!
(and thanks fat16lib for the incredibly useful library!)

sample_data.zip (9.8 KB)

	byte = binary_file.readline()
	print(byte)

I am certainly not a Python expert, but that appears to be rubbish code, to me.

I can't imagine that the function to read one byte from a file would be called readline(). I'd expect a function called readline() to read a complete line. Of course, the definition of "a complete line" is rather nebulous when it comes to reading data from a binary file.

I cannot tell what type byte is, but it certainly does not LOOK like something that can hold more than one byte.

There is NOTHING that that code does that converts the byte(s) to (a) character(s).
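To illustrate the point about readline() on binary data: it returns everything up to and including the next 0x0A byte, and in a binary file that byte is just as likely to be part of a number as a line ending. A small sketch with made-up values packed as 32-bit little-endian integers, using io.BytesIO as a stand-in for a file opened with "rb":

```python
import io
import struct

# Made-up example: three 32-bit little-endian integers. The value 10 is
# stored as b'\x0a\x00\x00\x00', and 0x0a is the same byte as b'\n'.
data = struct.pack('<3i', 7, 10, 2017)
stream = io.BytesIO(data)   # behaves like a file opened with "rb"

first = stream.readline()   # stops right after the first 0x0a byte
print(first)                # b'\x07\x00\x00\x00\n'
print(len(first))           # 5: one whole integer plus one byte of the next
```

So a "line" in a binary file is just an arbitrary slice of bytes, not a record.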

I am certainly not a Python expert, but that appears to be rubbish code, to me.

Probably! :)

There is NOTHING that that code does that converts the byte(s) to (a) character(s).

You are right. The code above only reads the first line of the file (up to the first '\n') and prints it. That variable shouldn't have been called byte, my bad. I just tried using the chardet module to identify the encoding with the following code:

import chardet

with open("filename", "rb") as binary_file:
    line = binary_file.readline()
    print(chardet.detect(line))

Which returned:

{'encoding': 'utf-8', 'confidence': 0.99}

Not quite sure where to go from here. Will try using codecs to see if I can convert each line.
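A note on why chardet's answer doesn't help much here: it only reports which text encoding the bytes would be valid in, and plenty of binary data happens to be valid text. For example, two small integers packed as 32-bit little-endian values (made-up numbers) decode as UTF-8 without any error, but the result is just control characters, while struct recovers the actual numbers:

```python
import struct

# Two made-up integers stored as 32-bit little-endian binary values.
raw = struct.pack('<2i', 26, 3)

text = raw.decode('utf-8')        # succeeds: every byte is below 0x80
print(repr(text))                 # '\x1a\x00\x00\x00\x03\x00\x00\x00'

numbers = struct.unpack('<2i', raw)
print(numbers)                    # (26, 3)
```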

Are you using Python to open a file on an SD Card that has been written by an Arduino?

If so, you must know the precise format in which the data was written to the file - please tell us.

Python should be able to do what you want easily - but it is essential to have the data spec.

...R

Are you using Python to open a file on an SD Card that has been written by an Arduino?

The file has already been copied to the computer. But yes, it was written by an Arduino using the SdFat library (LowLatencyLogger.ino).

If so, you must know the precise format in which the data was written to the file - please tell us.

Once converted to CSV, each data point should look like this:

1550004,26,3,2017,17,34,50,-332,-2248,15765,5014,-3359,5201,-395,-146,12,-7

That's 17 columns in total. As for the encoding, I'm not entirely sure yet. Unfortunately, the code that converts the binary file to CSV in LowLatencyLogger is way above my coding level.
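As a quick check that the file holds raw binary integers rather than encoded text: a few 4-byte groups copied verbatim from the readline() printout in the first post (b'\x1a\x00\x00\x00', b'\x03\x00\x00\x00', b'\x11\x00\x00\x00' and b'"\x00\x00\x00') decode as little-endian 32-bit signed integers to 26, 3, 17 and 34, the same numbers as the 2nd, 3rd, 5th and 6th values of the CSV row above. Little-endian order is an assumption here, based on AVR and ARM Arduino boards being little-endian.

```python
import struct

# 4-byte groups copied from the readline() printout in the first post.
groups = [b'\x1a\x00\x00\x00', b'\x03\x00\x00\x00',
          b'\x11\x00\x00\x00', b'"\x00\x00\x00']

# '<i' = little-endian signed 32-bit integer (assumed to be an Arduino long).
values = [struct.unpack('<i', g)[0] for g in groups]
print(values)  # [26, 3, 17, 34]
```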

I know fat16lib provides a bintocsv.exe file, but it only seems to work with files generated by AnalogBinLogger.ino.

Thank you all!

Sorry. I'm lazy. It would take me a long time to figure out how that logger program saves data. It looks like it saves the value of micros() followed by 4 ADC readings. But your example above has far more than 4 readings, and it has negative values, which the ADC won't produce.

If it helps any, the value of micros() will take up 4 bytes and the value of each ADC reading seems to be stored as a 2-byte value.

I can't see any evidence that the logger uses a linefeed character to mark the end of anything. My guess is that you need to know how many data entries there are and just count from the start of the file. Suppose there is a 4-byte entry for micros() followed by four 2-byte data values; then the next micros() entry would start at byte number 12 (counting from 0).
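Robin's fixed-size-record idea can be sketched with struct.iter_unpack (available since Python 3.4), which walks a buffer exactly one record at a time. The layout is his hypothetical one (a 4-byte micros() value followed by four 2-byte readings, 12 bytes per record) and the data is made up; it is not necessarily the layout this logger uses.

```python
import struct

# Hypothetical record layout: '<' = little-endian, no padding;
# 'I' = unsigned 32-bit micros(), '4H' = four unsigned 16-bit ADC readings.
RECORD = '<I4H'
print(struct.calcsize(RECORD))  # 12 bytes, so record n starts at byte 12 * n

# Made-up data for two records, packed the way such a logger might write them.
data = struct.pack(RECORD, 1000, 1, 2, 3, 4) + struct.pack(RECORD, 2000, 5, 6, 7, 8)

# iter_unpack requires len(data) to be an exact multiple of the record size.
for micros, a0, a1, a2, a3 in struct.iter_unpack(RECORD, data):
    print(micros, a0, a1, a2, a3)
```

No separator is needed between records: the position of each value follows purely from the record size.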

...R

I have made some progress. Thanks Robin for the pointers.

Take this sample row of data (I modified it a little from the previous example).
Internally, the Arduino software stores each of these variables as a long:

30,3,2017,14,0,1,-1332,-379,15886,6908,-1023,4724,-345,-73,-101,45

So far the only method that has given me any results is struct.unpack:

import struct
with open("filename", "rb") as binary_file:
    binary_file.seek(0)
    bytes = binary_file.read(116)
    h = struct.unpack('icicicicicicicicicicicicicici', bytes)
    print(h)

where 'icicicicicicicicicicicicicici' is my guess at the format of the data. This yields:

(7, b'\x00', 30, b'\x03', 2017, b'\x0e', 0, b'\x01', -1332, b'\x85', 15886, b'\xfc', -1023, b't', -345, b'\xb7', -101, b'-', 536871701, b'\x1e', 3, b'\xe1', 14, b'\x00', 1, b'\x7f', -338, b'/', 6908)

That's a start: most (not all) of the numbers I need are there, but in a different order.
I tried using "l" (for long) instead of "i" (for integer) in the format string for struct.unpack, but it does not return any usable numbers.
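Two details of the struct module probably explain this output. Without a prefix, struct uses native alignment: after each 'c' (1 byte), the next 'i' is aligned to a 4-byte boundary, so the 'c' reads the low byte of one value and the following 3 bytes are skipped as padding. That is why only every other value comes through with a stray byte in between (b'\x85' is the low byte of -379, b'\xfc' the low byte of 6908), and why this format needs exactly the 116 bytes being read. Separately, a native 'l' is 8 bytes on 64-bit macOS, whereas with a '<' prefix both 'l' and 'i' use the standard 4-byte size. The sketch below packs the 16 values of the sample row as 32-bit little-endian ints (an assumption about how the logger stores them) and reads them back with a single '<16i' format:

```python
import struct

# The 16 values of the sample row, packed as 32-bit little-endian signed ints.
row = (30, 3, 2017, 14, 0, 1, -1332, -379, 15886, 6908,
       -1023, 4724, -345, -73, -101, 45)
raw = struct.pack('<16i', *row)
print(len(raw))                            # 64 bytes: 16 values x 4 bytes

# '<' fixes the byte order and uses standard sizes with no alignment padding,
# so one 'i' per value reads the row back unchanged.
print(struct.unpack('<16i', raw) == row)   # True

# With '<', 'l' is also 4 bytes; the native size of 'l' depends on the
# platform (8 bytes on 64-bit macOS), which breaks a plain 'l' format.
print(struct.calcsize('<l'), struct.calcsize('l'))
```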

Will continue my search for binary truth...
Please forgive my n00bness.

I had forgotten about unpack. It makes life very easy. But why are you using 'c' when you say all the data is stored as longs? Just try 'i' or 'l'.

I doubt very much that the logger puts commas (or anything else) between the values.

...R

Yes, unpack is very friendly. I thought the commas were stored as chars in the binary file; as you pointed out, that turns out not to be the case. Anyway, I have a working version of the code. It took me a while to figure out.
The script below gets the file size and the number of 512-byte blocks to process, then iterates over each block, data row and individual value. I have tested it with some larger files (150 MB) and it is definitely faster than doing the conversion on the Arduino. Hope it is useful for someone. It could definitely be improved...

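# Modify these variables to suit your particular CSV structure

import struct

in_file = "example00.bin"
out_file = "OUT00.csv"

block_len = 512      # Block length in bytes
lines_per_block = 7  # Data rows stored in each block
line_len = 68        # Length of each data row in bytes
h = 8                # Offset from the start of a block to the first value of its first row
col_per_line = 16    # No. of columns to write
fmt = 'i'            # Datatype of each value (32-bit int, i.e. an Arduino long)
row_fmt = '<' + str(col_per_line) + fmt  # a whole row at once, little-endian, e.g. '<16i'
row_size = struct.calcsize(row_fmt)

# The output file is overwritten on each run ("w"), not appended to
with open(in_file, "rb") as binary_file, open(out_file, "w") as f:
    binary_file.seek(0, 2)           # jump to the end of the file...
    fsize = binary_file.tell()       # ...to get its size in bytes
    blocks = fsize // block_len      # number of complete blocks
    for x in range(blocks):
        for y in range(lines_per_block):
            # Jump to the first value of row y in block x
            binary_file.seek((y * line_len) + (x * block_len) + h)
            values = struct.unpack(row_fmt, binary_file.read(row_size))
            f.write(",".join(str(v) for v in values) + "\n")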

cheers
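A possible variation on the script above, sketched under assumptions not confirmed in this thread: each 512-byte block is assumed to start with two 16-bit fields, a record count followed by an overrun count (the leading bytes b'\x07\x00\x00\x00', i.e. 7, are consistent with this), and each 68-byte record is assumed to be an unsigned 32-bit micros() value followed by 16 longs. Using the count would stop partly filled or empty blocks at the end of the file from turning into garbage rows, and reading one block at a time keeps memory use flat for large files. It writes all 17 values per record, like the CSV row quoted earlier, rather than the 16 columns the script above writes. BLOCK_LEN, HEADER, RECORD, iter_rows and bin_to_csv are illustrative names, not part of SdFat.

```python
import struct

BLOCK_LEN = 512
# Assumed block header: record count and overrun count, 16-bit little-endian.
HEADER = struct.Struct('<HH')
# Assumed record: unsigned 32-bit micros() followed by 16 signed longs (68 bytes).
RECORD = struct.Struct('<I16i')
MAX_RECORDS = (BLOCK_LEN - HEADER.size) // RECORD.size   # 7 for these sizes


def iter_rows(binary_file):
    """Yield one tuple per record until a short read or an empty block."""
    while True:
        block = binary_file.read(BLOCK_LEN)
        if len(block) < BLOCK_LEN:
            return
        count, overrun = HEADER.unpack_from(block, 0)
        if count == 0:
            return
        # Guard against a corrupt count pointing past the end of the block.
        for i in range(min(count, MAX_RECORDS)):
            yield RECORD.unpack_from(block, HEADER.size + i * RECORD.size)


def bin_to_csv(in_path, out_path):
    """Convert a logger .bin file to CSV, one line per record."""
    with open(in_path, 'rb') as src, open(out_path, 'w') as dst:
        for row in iter_rows(src):
            dst.write(','.join(str(v) for v in row) + '\n')
```

Usage would be bin_to_csv("example00.bin", "OUT00.csv"). If the header assumption turns out to be wrong for your files, the fixed lines_per_block approach above is the safer choice.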

Thanks for the feedback. Good to hear it is working.

...R

I also needed something like this, and I'm sure I've seen some Python code on GitHub somewhere to do what you were after. When looking, I found this instead:

https://github.com/greiman/SdFat/tree/master/AnalogBinLoggerExtras/bintocsv

If you're on Windows, you can just download the bintocsv.exe and run it from a command prompt to convert .bin to .csv quickly. I know you've already solved the problem, but hopefully this will be useful to others.

Thanks, MarkHanlon. I didn't try that .exe because I'm not on Windows, and I read that it only works for converting .bin files made by AnalogBinLogger.ino.
cheers