I'm using a Yun as a simple controller to run low voltage AC lighting on a schedule. It also monitors the current consumption of the lights to verify proper control, and determine whether any bulbs are burned out. The sketch is a dumb I/O controller that receives on/off commands from a Python script, and sends analog readings to the script. All of the intelligence and control is in the Python code. There is also a website to control and monitor the system using the Bottle framework.
The Python code manages the timing and the on/off schedule, converts the analog readings to Amps, and condenses them into average values over a five second period. The results are written to an SQLite database, which is growing rapidly due to the high data volume.
I want to be able to see that data at high resolution while I'm working on the system and tracking down wiring faults or burned out lights. But I also want to look at the data over longer periods to determine average bulb life and overall system reliability. For short term analysis I need the data at high resolution, but for longer term views the data can be averaged over longer intervals. (When looking at a year's worth of data, resolution down to 5 seconds is serious overkill!) In short, I needed high resolution for the short term, yet coverage of a long period of time without storing excessive data.
As it turns out, there is a perfect solution: RRDtool. The "RRD" stands for Round Robin Database, which uses a scheme where data is logically organized as fixed size circular buffers: as new data comes in, the oldest data is overwritten. What makes it interesting is that you can have several different archives of the same data, all using different resolutions. So, by storing a single data value every five seconds, RRDtool automatically manages several circular buffers of data, for example:
- Every 5 seconds for a day
- Every minute for a week
- Every hour for a year
- Every day for 20 years
Of course, the resolution and time spans are completely configurable; those are just the values I'm using. It's still probably a lot more data than I really need, but it's far better than storing every sample forever.
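To make the round robin idea concrete, here is a conceptual sketch (not PyRRD, and the `Archive` class is purely illustrative): each archive is a fixed-size ring buffer, and the coarser archives consolidate groups of primary samples into single points.

```python
from collections import deque

# Conceptual sketch of RRD storage: each archive is a fixed-size ring buffer,
# and coarser archives consolidate groups of primary samples into one point.
class Archive:
    def __init__(self, samples_per_point, size):
        self.samples_per_point = samples_per_point  # e.g. 12 for 1-minute points
        self.points = deque(maxlen=size)            # oldest point drops off automatically
        self._pending = []

    def add_sample(self, value):
        self._pending.append(value)
        if len(self._pending) == self.samples_per_point:
            # Consolidate with an average; MIN/MAX archives would use min()/max()
            self.points.append(sum(self._pending) / len(self._pending))
            self._pending = []

# A day of 5-second samples, and a week of 1-minute averages
high_res = Archive(samples_per_point=1, size=17280)
per_minute = Archive(samples_per_point=12, size=10080)

for sample in [1.0] * 6 + [2.0] * 6:   # one minute of fake 5-second readings
    high_res.add_sample(sample)
    per_minute.add_sample(sample)

print(len(high_res.points))    # 12 raw points stored
print(per_minute.points[0])    # 1.5 -- the consolidated 1-minute average
```

This is exactly the bookkeeping RRDtool does for you, for every archive, automatically.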
I had used RRD in the past, in the context of collectd. That's a nice system, and it lets you collect and merge data from many machines across a network, but it seemed to be overkill for this case. I wanted to write to RRD directly without going through collectd. Documentation is out there, but it took me a while to work out a streamlined method, so I figured I would share the results of my research here.
To get started, install the required software into OpenWRT:
opkg update
opkg install rrdtool pyrrd
This takes a while, as it installs lots of dependencies. It ends up installing:
- rrdtool
- librrd
- libart
- libfreetype
- libbz2
- libpng
- pyrrd
The next step is to create the database. In this case, I have three data sources: an "On" flag (0=off or 1=on) to say whether the lights are on, the current consumption in Amps, and a "Status" flag (0=OK, 1=Warning, 2=Failure) to indicate whether the current consumption is less than expected. For each value, I want to track the minimum, maximum, and average values over the various time periods.
I won't go into the details of the commands to create the database; they are documented at http://www.rrdtool.org. I found the RRDtool Wizard helpful in working out the create command, which I put in a shell script file:
#!/bin/ash
# Create an RRD database
# 3 data sources:
# On: GAUGE, min=0, max=1
# Current: GAUGE, min=0, max=unknown
# Status: GAUGE, min=0, max=2
# 12 total RRAs:
# 4 RRA periods:
# Every 5 seconds for a day = 17280 points, taken every sample
# Every minute for a week = 10080 points, taken every 12 samples
# Every hour for a year = 8760 points, taken every 720 samples
# Every day for 2 decades = 7300 points, taken every 17280 samples
# 3 RRA CFs:
# MIN
# MAX
# AVERAGE
rrdtool create /mnt/sda1/lightingController/data/statistics.rrd \
--step '5' \
'DS:On:GAUGE:15:0:1' \
'DS:Current:GAUGE:15:0:U' \
'DS:Status:GAUGE:15:0:2' \
'RRA:AVERAGE:0.5:1:17280' \
'RRA:MIN:0.5:1:17280' \
'RRA:MAX:0.5:1:17280' \
'RRA:AVERAGE:0.5:12:10080' \
'RRA:MIN:0.5:12:10080' \
'RRA:MAX:0.5:12:10080' \
'RRA:AVERAGE:0.5:720:8760' \
'RRA:MIN:0.5:720:8760' \
'RRA:MAX:0.5:720:8760' \
'RRA:AVERAGE:0.5:17280:7300' \
'RRA:MIN:0.5:17280:7300' \
'RRA:MAX:0.5:17280:7300'
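The row counts in the script above aren't arbitrary: for each RRA, rows = retention period / (step × samples per row). A quick sanity check of the arithmetic:

```python
# Sanity-check the RRA row counts used in the create script:
# rows = retention_seconds / (step * samples_per_row)
STEP = 5  # seconds, matching --step '5'

def rra_rows(retention_seconds, samples_per_row):
    return retention_seconds // (STEP * samples_per_row)

DAY = 86400
WEEK = 7 * DAY
YEAR = 365 * DAY

print(rra_rows(DAY, 1))            # 17280 -- every 5 seconds for a day
print(rra_rows(WEEK, 12))          # 10080 -- every minute for a week
print(rra_rows(YEAR, 720))         # 8760  -- every hour for a year
print(rra_rows(20 * YEAR, 17280))  # 7300  -- every day for 20 years
```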
With the database created, it's time to start adding data. As mentioned at the beginning, I already had a Python script that was getting data from the sketch and averaging it into five second chunks. All that's needed at this point is to feed the data to RRD instead of SQLite.
I'm using PyRRD to neatly wrap up the interface to the RRD file. Most of the examples I found are self-contained: they create the database, stuff some data into it, and then plot it. That's nice to have it all in one place, but I don't want to re-create the database every time the script starts - I want it to keep appending data to the existing database. Since I already had the database created by the RRDtool commands above, I didn't bother re-creating that setup using the PyRRD methods. So I needed to figure out how to simply append data to an existing database. As it turns out, it's quite simple, but you have to read the source code to find the answers. "Use the Source, Luke!"
The first step is to simply open the existing database:
from pyrrd.rrd import RRD
rrd = RRD('/mnt/sda1/lightingController/data/statistics.rrd')
The PyRRD object is smart enough to read the structure from the database file and figure out everything it needs to know, without having to mirror the structure in the Python code. Very nice. With the database open, all that's needed is to write data to it. A series of values can be buffered and then committed to the database at once, which is handy if you're reading in a batch of samples. But in this case, with the data being generated live, I just wanted to append one sample at a time and commit it immediately. I created a simple function to do so:
# Write a sample to the RRD database
#
# Parameters:
# rrd - The RRD object to receive the data, assumed to already be open
# on - Boolean flag indicating whether the light is on
# current - Float value representing the current consumption, in Amps
# status - Flag to indicate light status, 0=OK, 1=Warning, 2=Failure
def updateRRD(rrd, on, current, status):
    # The bufferValue() parameter order is the same as in the RRD file definition
    rrd.bufferValue(time.time(), on, current, status)
    rrd.update()
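For comparison, the same update can be issued through the rrdtool command line instead of PyRRD. The `rrdtool update` argument is `timestamp:value:value:...` with the values in the order the data sources were defined (`N` means "use the current time"). A small sketch of building that command (the helper name is my own, and the final subprocess call is commented out since it only makes sense on the Yun):

```python
RRD_PATH = '/mnt/sda1/lightingController/data/statistics.rrd'

# Build the equivalent `rrdtool update` command: values are joined with
# colons in data-source definition order; 'N' means "use the current time".
def build_update_cmd(on, current, status, timestamp='N'):
    value_string = '%s:%s:%s:%s' % (timestamp, on, current, status)
    return ['rrdtool', 'update', RRD_PATH, value_string]

cmd = build_update_cmd(1, 2.34, 0)
print(cmd[-1])   # N:1:2.34:0
# import subprocess
# subprocess.run(cmd, check=True)   # run on the Yun to perform the update
```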
Putting it all together, the general form of the Python script is:
#!/usr/bin/python
import time
from pyrrd.rrd import RRD
def updateRRD(rrd, on, current, status):
    rrd.bufferValue(time.time(), on, current, status)
    rrd.update()
# Open the existing RRD file
rrd = RRD('/mnt/sda1/lightingController/data/statistics.rrd')
# Repeat forever
while True:
    # Collect data for a period of five seconds, and update the variables.
    # Just using dummy assignments here (with a sleep standing in for the
    # five-second collection period), since the actual code is project
    # specific and out of scope of this post
    time.sleep(5)
    on = 1
    current = 2.34
    status = 0

    # Update the RRD file
    updateRRD(rrd, on, current, status)
This is now getting the data into my RRD file. A simple way to verify that data is being recorded is to use the lastupdate command from RRDtool, which prints the last update time (in seconds since the Unix epoch, 1-Jan-1970) plus the last stored values:
rrdtool lastupdate /mnt/sda1/lightingController/data/statistics.rrd
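If you want to check the last update from Python rather than eyeballing the console, the output is easy to parse. This is a hypothetical helper of my own, assuming output roughly like the sample string below (a header line of data source names, a blank line, then `timestamp: values`):

```python
# Hypothetical helper: parse `rrdtool lastupdate` output, assumed to be a
# header line of DS names, a blank line, then "timestamp: value value ..."
def parse_lastupdate(output):
    lines = [line for line in output.splitlines() if line.strip()]
    names = lines[0].split()
    timestamp, _, values = lines[-1].partition(':')
    return int(timestamp), dict(zip(names, values.split()))

sample = """ On Current Status

1464568800: 1 2.34 0
"""
ts, values = parse_lastupdate(sample)
print(ts)                  # 1464568800
print(values['Current'])   # 2.34
```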
The next step is to let this run for a while and collect some data, and then experiment with ways to plot the data by showing the graphs on the website managed by the Bottle framework application. This is still a work in progress, more to come...