This is a horribly broad question, but I'm stuck.
I'm successfully logging a ton (10K+ samples per session)
of temperature data, and it's hard to manage.
Manual observation of the numbers seems to indicate that the data is all valid,
but grinding through it byte by byte isn't time-effective at all!
For background: http://forum.arduino.cc//index.php?topic=89926.0
I'm trying to find a PC-based program that will let me window,
average, smooth, build comparison channels, etc. for 'large' volumes
of data. Excel can do it, but it really chokes on bigger chunks, and it
can't do any useful condensation or hiding of 'uninteresting' data.
Statistical analysis is what I'm really interested in, as in, 'What's the correlation
between tire temperature and cornering loads?' for example.
Examples of operations would be correlating temperature data to
lateral load data, smoothing load data, windowing out (or highlighting)
out-of-range events and data, and hiding long, 'boring' straights...
I'm not picky; I'm looking for direction as much as specific answers.
I'm a novice coder, so doing this myself would be... painful. Existing
commercial products all seem to be tied to hardware, and hardware that
doesn't do what I need it to do (thus, the Arduino logger).
You know what the odd thing is? I've used computers for, like, 100 years now (well, maybe not quite), but if anything the software, like spreadsheets, is getting worse, not better.
I loaded a few thousand lines of temperature data into a spreadsheet program and it just froze for a minute or so. Things should work better than this in 2013. You could do better than that in 1980.
All I can suggest is to learn Lua. With that you can process text files very quickly. You should be able to handle your data, average it, total it, whatever, in half a minute.
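For example, a quick pass to pull the sample count, average, min, and max out of a raw log is only a handful of lines in any scripting language. Here's the idea sketched in Python (the file name and column position are made up; the Lua version comes out about the same length):

# Quick pass over a raw temperature log: count, average, min, max.
# Assumes a plain CSV file with the temperature in the second column.
total = 0.0
count = 0
lo = float("inf")
hi = float("-inf")

with open("templog.csv") as f:
    for line in f:
        fields = line.strip().split(",")
        try:
            temp = float(fields[1])
        except (IndexError, ValueError):
            continue  # skip headers or malformed lines
        total += temp
        count += 1
        lo = min(lo, temp)
        hi = max(hi, temp)

print("samples:", count, "avg:", total / count, "min:", lo, "max:", hi)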
A spreadsheet would be the easiest way, but all the ones I've seen scale badly and choke at 65k lines. I don't understand why a system with tens of gigabytes of memory and multiple cores running at gigahertz would have any problem dealing with a couple of meg of data, but it seems to me that my computer struggles just putting menus and windows on the screen, let alone doing anything useful. Could all these impressive hardware specs be one big scam perpetuated by the manufacturers?
My solution has been to dump the data into a database and run a query to select the result set of interest based on whatever complex criteria you end up with, and then summarize that (in the database query) to return a result set small enough for the spreadsheet to cope with. In other words, the spreadsheet becomes just a tool to put a few thousand dots on a chart and the analysis/mining is all done in your head and implemented as SQL queries.
There must be a better way to do it, I'm sure, but there never seems to be time to go look for it.
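If it helps, a bare-bones version of that pipeline looks something like this in Python with SQLite (the file, table, and column names are just placeholders, and the threshold is made up):

# Load the raw log into SQLite once, then let queries do the reduction
# so the spreadsheet only ever sees a small, summarized result set.
import csv
import sqlite3

conn = sqlite3.connect("session.db")
conn.execute("CREATE TABLE IF NOT EXISTS samples (t REAL, temp REAL, lat_g REAL)")

# One-time import of the logger's CSV dump (assumes three numeric columns, no header)
with open("session.csv") as f:
    rows = ((float(r[0]), float(r[1]), float(r[2])) for r in csv.reader(f))
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
conn.commit()

# Summarize in the database: one row per second instead of hundreds,
# keeping only the out-of-range stretches of interest.
query = """
    SELECT CAST(t AS INTEGER) AS second,
           AVG(temp)          AS avg_temp,
           MAX(lat_g)         AS peak_lat_g
    FROM samples
    WHERE temp > 80.0
    GROUP BY CAST(t AS INTEGER)
    ORDER BY second
"""
for second, avg_temp, peak_lat_g in conn.execute(query):
    print(second, avg_temp, peak_lat_g)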
Matlab is absolutely terrific for analyzing large data sets and the authors have built in just about every analysis function that you can imagine. I've used it for years and find both the help function and the Matlab user forums to be extremely helpful. It is expensive, though.
The free alternative, Scilab, seems perfectly functional but there are minor differences in notation, so routines written for Matlab generally need some editing before they will run.
The spreadsheet route couldn't do more than one lap's worth of data in a timely fashion,
and one of the major things I was hoping to find was lap-to-lap variances in the same
parts of the track. And trying to paste large quantities of functions? Yech...
Thanks for all the suggestions; I'll go digging now.
I'd come across Matlab, but not the free version,
and that Python book looks interesting, too.
Lua? Off I go!
I haven't worked with Scilab much, but it does perform most of the basic operations of Matlab in the same way. It does take a bit to get used to everything being treated as a matrix. The basic data operations (statistical analysis, transforms, plotting, etc. of extremely large data sets) are incredibly simple to perform.
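To give a flavour of that matrix style (this is Python/NumPy rather than Matlab or Scilab syntax, with made-up channel names and window size, but the operations map across almost one-to-one): smoothing a channel, correlating it against another, and masking the boring bits are each a line or two on whole arrays at once.

# Whole channels at once, no explicit loops.
import numpy as np

# Two channels of ~10k samples each (random walks as stand-in data).
temp = np.random.randn(10000).cumsum()    # e.g. tire temperature
lat_g = np.random.randn(10000).cumsum()   # e.g. lateral load

# Moving-average smoothing with a 50-sample window
window = np.ones(50) / 50
temp_smooth = np.convolve(temp, window, mode="same")

# Correlation coefficient between the two channels
r = np.corrcoef(temp_smooth, lat_g)[0, 1]

# Mask off the 'boring' samples below the mean
interesting = temp_smooth > temp_smooth.mean()
print("correlation:", r, "interesting samples:", int(interesting.sum()))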