Database advice?

I have a Yun based project that has been working well for me, but I want to take it to the next step.

The current version uses a barcode scanner, and a network port connection to a PC running a C# app. When a code is scanned, the Yun sends that code to the PC, and the C# app looks up the data from a simple database, displays it on the PC's screen, and does some simple updates to the database. The Yun is basically acting as a semi-intelligent networked barcode scanner.

In the new iteration, I'm adding an LCD screen to the Yun, and I would like to move the database onto the Linux side of the Yun. I will still use the C# application to generate the data for the database, then will download the data to the Yun, and then the Yun will continue to operate on a stand-alone basis: when a barcode is scanned, a record is retrieved from the database and displayed on the LCD, and some simple updates are made to the database.

I've got a good grasp on the hardware, and have no issues with the general Yun programming, displays, scanner, etc. But I've not done anything database related on the Yun, and my C# database experience is limited to using the C# data-aware controls and simple database/datatable objects that process the data in memory and store the data in an XML file between runs.

I guess I need some sort of database manager on the Linux side of the Yun, and I will most likely access it from Python. I envision updates being triggered by the sketch when a barcode is scanned: it will call a Process object with the barcode data as a parameter. The Process will run a Python script to search the database, fetch, test, and update the data, and then return display data back to the sketch which will display it on the LCD. I envision all of the heavy lifting to be done in Python.

I think the database requirements are rather minor:

  • Less than 500 records
  • About a dozen fields per record
  • Updates will involve setting time stamps in a couple fields
  • The most complicated process will probably be getting the sum of a numeric field across all records that do not have a timestamp in a certain field
  • Most access to be from local Python, but will also need to be accessible from the C# app over the network (limited to uploading/downloading the entire database.)
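To give a sense of scale, the trickiest requirement above (summing a numeric field over records that lack a timestamp) is a single aggregate in SQL. Here's a hedged sketch using Python's built-in sqlite3 module; the table and column names are invented for illustration:

```python
import sqlite3

# In-memory database standing in for the on-disk file the Yun would use.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE badges (id INTEGER PRIMARY KEY, qty INTEGER, picked_up_at TEXT)")
conn.executemany(
    "INSERT INTO badges (id, qty, picked_up_at) VALUES (?, ?, ?)",
    [(1, 2, None), (2, 5, "2015-10-03 09:12"), (3, 4, None)])

# "Sum of a numeric field across all records that do not have a timestamp."
total = conn.execute(
    "SELECT COALESCE(SUM(qty), 0) FROM badges WHERE picked_up_at IS NULL"
).fetchone()[0]
print(total)  # 6
```

COALESCE turns the NULL that SUM returns on an empty match into a 0, which is friendlier for display code.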

So, the questions I throw out to the group:

  • What would be an appropriate database/framework for such an application?
  • What is a good source of beginner's guide to that database?
  • What is the easiest way to transfer mass data to/from the database and C# application?

I am an experienced professional embedded programmer and software developer, but I have little to no database experience (especially on limited embedded platforms like the Yun). I'm sure I can figure this out, but I need a little guidance on the best direction to head. There are so many choices out there that I'm overwhelmed, and I want to do this right the first time: not try out half a dozen technologies until I find the one that works best.

Upon further investigation, it looks like SQLite and the Python interface should work nicely for me. If anyone knows of a better solution, please speak up.

There seems to be reasonable introductory material at the SQLite and Python sqlite3 sites. If you know of a great SQLite/Python tutorial, please give a link. Getting the local Yun side of things to work shouldn't be too difficult.

I'm envisioning a small set of "operations", each of which is implemented in its own Python script. Most will take one parameter (a scanned barcode value) but some will take none. They will perform the simple query/update operation, and output a character stream to display on the LCD to show the result. They will be called by the sketch using a Process object with the name of the script (the operation) and the parameter (if needed). The sketch will then read the output of the Process object and send that output to the LCD.
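A minimal sketch of what one of those operation scripts might look like. Everything here is an assumption for illustration: the table, columns, display format, and database path would all differ in the real system.

```python
import sqlite3
import sys

def lookup(conn, barcode):
    """Fetch the record for a scanned code and format it for the LCD."""
    row = conn.execute(
        "SELECT name, qty FROM badges WHERE barcode = ?", (barcode,)).fetchone()
    if row is None:
        return "UNKNOWN CODE\n" + barcode
    # Record when this badge was scanned.
    conn.execute(
        "UPDATE badges SET scanned_at = datetime('now') WHERE barcode = ?",
        (barcode,))
    conn.commit()
    return "{0}\nqty: {1}".format(row[0], row[1])

if __name__ == "__main__" and len(sys.argv) > 1:
    # The sketch's Process object passes the scanned code as argv[1];
    # whatever we write to stdout is read back by the sketch for the LCD.
    conn = sqlite3.connect("badges.db")  # hypothetical database path
    sys.stdout.write(lookup(conn, sys.argv[1]))
```

Using `?` placeholders instead of string formatting keeps a malformed or malicious barcode from being interpreted as SQL.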

Does that sound like a reasonable architecture? Got a better suggestion?

For the occasional access from the C# application, most of it will be the exact same set of queries/updates. So I figure it can be done with some sort of remote call from the app to the Yun that ends up calling the same Python scripts. Perhaps some sort of REST API? Got a suggested architecture?

That leaves just the total database update, involving downloading a complete new database from the C# app to the Yun. It will be trivial for the C# app to write the required data in CSV format. Do I then run sqlite3 on the PC to create the DB and import the CSV, then move that file to the Yun? Or send the CSV file to the Yun, and run sqlite3 there to import the data? Or something else? Suggestions, anyone?

Or do I look into the system.data.sqlite.org ADO interface and have the C# app query the data directly? Anybody use this? (Remotely calling the same Python scripts sounds easier and safer - and no need to write the same query twice...)
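For the "send the CSV to the Yun and import it there" option, the Python side is small. A sketch under assumed column names, demonstrated with an in-memory database and an in-memory stand-in for the uploaded file:

```python
import csv
import io
import sqlite3

def import_csv(conn, csv_file):
    """Rebuild the table from a CSV export; csv_file is any file-like object."""
    conn.execute("DROP TABLE IF EXISTS badges")
    conn.execute(
        "CREATE TABLE badges (barcode TEXT PRIMARY KEY, name TEXT, qty INTEGER)")
    rows = [(r["barcode"], r["name"], int(r["qty"]))
            for r in csv.DictReader(csv_file)]
    conn.executemany("INSERT INTO badges VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
uploaded = io.StringIO("barcode,name,qty\n123,Alice,3\n456,Bob,1\n")
count = import_csv(conn, uploaded)
print(count)  # 2
```

On the Yun, `csv_file` would be the file saved from the network transfer, and `conn` would point at the real database file.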

I agree with Python and SQLite. The Peewee ORM makes life very easy.

...R

Robin2:
I agree with Python and SQLite. The Peewee ORM makes life very easy.

...R

I'm some hours late. I was going to suggest the same: Peewee is a very good option for using SQLite in Python.

Thanks, I had not seen Peewee before. It looks interesting, and I can see where it could be powerful with object-oriented data and complex relations. But I think my database will be very simple: a single table, with a simple unique integer key, and very few fields. I won't need any relationships to other tables or records, and the updates I do will be very simple (add a couple of timestamps to track when certain updates were made.)

I wonder if Peewee is necessary? For such a simple application, it seems to be adding an extra layer of complexity. What am I missing?

I don't know why people always need to throw SQL at such a simple/small database requirement. I guess "if the only tool you know is a hammer, everything looks like a nail"...

I avoid programming in Python myself, but have a look at http://liw.fi/larch/ for what looks like a simple Btree database in Python.

Ralf

ShapeShifter:
I wonder if PeeWee is necessary? For such a simple application, it seems to be adding an extra layer of complexity? What am I missing?

I find it tedious to create SQL queries by hand. I like the fact that Peewee does that stuff for me.
(And I am well aware that this is diametrically opposite to my frequently stated aversion to the use of libraries :slight_smile:

I think my interest in Peewee and Bottle stems from the fact that I first encountered web programming via Ruby on Rails.

As PCWorxLA has said, it may not be necessary even to use SQLite - but, again, I find that it makes life easier for lazy me.

...R

PCWorxLA:
I don't know why people always need to throw SQL at such a simple/small database requirement. I guess "if the only tool you know is a hammer, everything looks like a nail"...

I'm open to suggestions, which is the reason for the thread. Your quote really doesn't apply here, since I have virtually no practical experience with SQL. I also have very little Python experience, but it seems like a popular thing to learn (90% of what I do is C, with a smattering of C++ and C# thrown in, so Arduino comes naturally, which is good for a pastime activity. While I used them a lot in the past, I haven't done any Ada, Jovial, FORTRAN, APL, Pascal, or the various assemblers for years now. I've done a smattering of HTML and PHP, but no real web page programming.) For data storage, I usually write my own "file" managers that access serial EEPROM chips directly: having an SD card with a real file system on it, and even standard fopen()/fread()/fwrite() calls, is a supreme luxury that I've only recently been able to include in my embedded systems.

I'm not opposed to doing things in a minimalist way, or doing things the hard but efficient way. I always want to learn and be exposed to new things, but in this case, I'm doing it for fun, and I don't want to make a career out of it, I'd rather get it working and spend more time making money. To that end, one of the statements on the larch home page concerns me: "Documentation is sparse. Docstrings and reading the code are your best hope." Taking a very quick look at the code, it appears to be something that you have to sit down and analyze, rather than lightly read (comments and explanation appear to be as sparse as the documentation.) I've had to resort to "Use the source, Luke" more times than I can remember, and while it does usually provide the definitive answer to a question, I find it tedious if it's the primary documentation source (poor pun intended...)

SQLite may very well be overkill for this application, but there appears to be a lot of documentation and examples out there. That's worth more to me than elegance or efficiency.

One thing I'm not finding quickly in the larch pages is how it handles concurrent access: I will have up to three independent processes accessing/updating the data simultaneously. While the update rate is low, and the odds of simultaneous read/write access are small, they are not zero, so it will probably happen eventually. Does larch provide any protection for this? Or is it something I will have to work out on my own?

Yes, I know I'm adding more requirements and restrictions than I originally stated. I'm in the initial brainstorming phase of what will be a major update to my system. I haven't hashed out all of the details yet, and I'm still working on the system requirements.

Robin2:
As PCWorxLA has said, it may not be necessary even to use SQLite - but, again, I find that it makes life easier for lazy me.

I admit it, I'm lazy, and I like easy. That's what is drawing me toward SQL: the basic select/update syntax is clear and simple, and that's all I need in this case. I can see the benefit of peewee in more complex situations where joins and sophisticated updates are needed. But in this case, I think my queries are going to be on the order of "select * where ID is 123" and "update Time to now where ID is 123" (yes, I know that's not real syntax, consider it pseudoSQL.)
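Translated into real, parameterized SQLite (again with invented table/column names), that pseudoSQL is nearly as short as it looks:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE badges (id INTEGER PRIMARY KEY, name TEXT, picked_up_at TEXT)")
conn.execute("INSERT INTO badges (id, name) VALUES (123, 'Alice')")

# "select * where ID is 123"
row = conn.execute("SELECT * FROM badges WHERE id = ?", (123,)).fetchone()

# "update Time to now where ID is 123"
conn.execute(
    "UPDATE badges SET picked_up_at = datetime('now') WHERE id = ?", (123,))
conn.commit()

stamp = conn.execute(
    "SELECT picked_up_at FROM badges WHERE id = 123").fetchone()[0]
```

So for queries of this shape, the raw sqlite3 module is already close to the pseudocode, with or without an ORM on top.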

So I guess my overriding concerns are simplicity, documentation, and examples rather than elegance and efficiency. Most of this is new to me (OpenWRT, Python, SQL, etc.) so I'm thinking the fewer new layers to add, and the more documentation/examples available, the better.

While it seems like I'm shooting down anything that isn't one of my pre-conceived selections, that isn't the case: I truly appreciate all comments and suggestions. But while I'm looking at this as a learning experience, I want to make it easy and spend a minimum amount of time learning. I'm open to the simplest solutions that involve minimum time spent researching. I'm already spending too much time on this when I should be working and making money (I bill by the hour, and every hour I spend on this is an hour where I can't bill my time to a customer...)

I think with anything new there will be time "wasted" on the learning curve. The most frustrating thing is the time taken to figure out what you should learn.

My view (for what it's worth) is that learning SQLite will give you a significant resource for the future. Unlike some of the Arduino libraries it does work and is well documented.

And, because SQLite is not restricted to Python, it can act as a very convenient interface between different programming languages, and between different programs written in different languages.

I also suggest you spend 30 to 60 minutes looking at Peewee. I think it would be a quicker way into SQLite with Python.

I also use a GUI program called Sqliteman which I find very useful for checking the content or structure of a database without writing code.

...R

Yes, there is always a learning curve. Sometimes, I take on a project just to learn a new technique, and the learning curve IS the focus of the project. This is not one of those times, so I don't want to have to "learn" more than necessary on this one. :wink:

Thanks Robin, your thoughts/suggestions are very close to mine, particularly the applicability of SQLite for other languages/projects and as a valuable tool to know for future projects.

I will take a look a Peewee, and give it a read.

ShapeShifter:
I'm open to suggestions, which is the reason for the thread. Your quote really doesn't apply here, since I have virtually no practical experience with SQL. I also have very little Python experience, but it seems like a popular thing to learn (90% of what I do is C, with a smattering of C++ and C# thrown in, so Arduino comes naturally, which is good for a pastime activity).

Well, what your choice of languages is is kind of irrelevant, and the basic issue of my reply still stands, throwing SQL at an issue that you describe as so simple is overkill IMHO, regardless of the language used.
I personally loathe Python, for various reasons, so I did not know of a better recommendation (in order to avoid using SQL) than the one I posted.

Ralf

PCWorxLA:
I personally loathe Python, for various reasons

I don't have a lot of exposure to it so far, and from the little I've done it doesn't seem like it will ever become my go-to language. But, it is widely available, and widely used, and I can see some benefit in having at least a passing familiarity with it, so it's worth at least a little learning effort. (Yes, I know, I did just say that the point of this project wasn't about the learning...)

I will keep a note about larch, and look into it when I have a bit more "learning curve" time to apply to a project. Thank you for the tip. (This forum needs a thumbs-up smiley...)

PCWorxLA:
I personally loathe Python,

Just to put things in context, what is your preferred language?

ShapeShifter:
But, it is widely available, and widely used,

I prefer Ruby but I abandoned it in favour of Python precisely for these reasons.

...R

Robin2:
Just to put things in context, what is your preferred language?

Preferred? Pascal.
And using ANSI C if (Free)Pascal is not available on a given platform...

I prefer Ruby but I abandoned it in favour of Python precisely for these reasons.

Can't stand Ruby either, another one of those "look I am a cool guy" kind of programming languages, that are more self-serving than providing any advantage over anything already existing.

Python is a bit better, but a programming language in which whitespace/indentation is significant misses the mark, IMHO, in terms of making it easy to use.
That it is, in its basic implementation, an interpreter is one thing, but what really irks me is the dynamic typing (something that van Rossum supposedly wants to change in 3.x). Which also leads me to the problem that Python 3.x is not backwards compatible, leaving you to rely on libraries/frameworks being available for both major versions, which can be a major PITA if you are working on more than one platform...

Ralf

ShapeShifter:
I have a Yun based project that has been working well for me, but I want to take it to the next step.

The current version uses a barcode scanner, and a network port connection to a PC running a C# app. When a code is scanned, the Yun sends that code to the PC, and the C# app looks up the data from a simple database, displays it on the PC's screen, and does some simple updates to the database. The Yun is basically acting as a semi-intelligent networked barcode scanner.

::::SNIP::::

I think the database requirements are rather minor:

  • Less than 500 records
  • About a dozen fields per record
  • Updates will involve setting time stamps in a couple fields
  • The most complicated process will probably be getting the sum of a numeric field across all records that do not have a timestamp in a certain field
  • Most access to be from local Python, but will also need to be accessible from the C# app over the network (limited to uploading/downloading the entire database.)

So, the questions I throw out to the group:

  • What would be an appropriate database/framework for such an application?
  • What is a good source of beginner's guide to that database?
  • What is the easiest way to transfer mass data to/from the database and C# application?

::::SNIP::::

@ShapeShifter,
I was going to say SQLite, because it is my favorite, but I have to agree with PCWorxLA.
One true factor, not listed, is the future direction of this. If the extensions are minor, then I would go with a simpler solution.

I would go with Sleepycat DB. It is a fairly independent group that is now owned by Oracle. In years past, the group supported Berkeley DB, which is a key/value pair system. Over the last 15 years, they have turned an open-source project into a very reliable system. The software is fairly solid. And as you can see, the driver layer is well supported with C, C++, C#, Python, Ruby, Tcl, etc.

Berkeley DB has lots of tutorials, and if you look, it is widely used to administer Unix and Linux boxes.

As far as transferring data, CSV is the way to go. Most databases have import and export tools.

I imagine you might have questions. Feel Free.
Jesse

jessemonroy650:
One true factor, not listed, is the future direction of this. If the extensions are minor, then I would go with a simpler solution.

Who knows? This started out several years ago as a simple C# Windows program to grab registration data from a website as XML data and format/print custom ID badges for an annual event. It has since grown into a generic report formatter and data manager. Last year I added the barcode reader interface to allow faster/easier on-site management, tracking what "premiums" they ordered and whether they have been picked up. This year I'm trying to separate that last function from the C# app so I don't have to have my computer dedicated full time to handling this phase. In subsequent years, who knows what direction it will take?

Thanks for the tip on SleepyCat, I will look into it.

As far as transferring data, CSV is the way to go. Most databases have import and export tools.

I did a little playing with that last night. Robin has mentioned Bottle a few times in other threads, so I took a look, and I have to say my initial impression is positive. (Flask also looked interesting, but Bottle won out due to ease of installation.) In short order I had it serving up a simple form that allowed me to select and upload a CSV file, then accept the POST to save the file and import it into an SQLite database. So I can manually send data to the Yun that way, or I can have the C# app make the POST web request directly.

While the explicit goal of the project is to get this done and not be a purely research/learning task, that is a part of it: I will no doubt be doing more projects like this in the future, so it is not a bad idea to learn a technology that has some room for growth, as long as it doesn't overwhelm the Yun. So far, Bottle/SQLite seems to have what I need, and has extra capacity to handle more in the future. But I'm still researching and open to suggestions.

I'm trying to settle on a general architecture. At this point, I'm thinking of putting each major operation in a Python file. That file includes a function that performs the actual operation, and then a block of code at the end that is run if the file is called from the command processor: this code fetches arguments from the command line, and passes them to the function. At the same time, the Bottle app is fielding web requests, and if it gets one for the operation it imports the function from that module file and calls it. That would allow four ways to call the operation's core function:

  • Locally from the SSH command line for testing
  • Locally from a Process object in the sketch when a barcode is scanned
  • Remotely through a simple web interface
  • Remotely as a web request from the main C# app

Anybody have thoughts on this? It basically gives you a central place for the business logic, plus a couple of thin wrappers to call it from different interfaces.
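As a rough sketch of that one-core-function, many-thin-wrappers idea. The operation name, schema, database path, and the commented-out Bottle route are all hypothetical:

```python
import sqlite3
import sys

def checkin(conn, barcode):
    """Core business logic, shared by every entry point."""
    cur = conn.execute(
        "UPDATE badges SET picked_up_at = datetime('now') "
        "WHERE barcode = ? AND picked_up_at IS NULL", (barcode,))
    conn.commit()
    return "PICKED UP" if cur.rowcount else "NO CHANGE"

# Thin wrapper for the SSH command line and the sketch's Process object.
if __name__ == "__main__" and len(sys.argv) > 1:
    conn = sqlite3.connect("badges.db")  # hypothetical database path
    sys.stdout.write(checkin(conn, sys.argv[1]))

# The Bottle app would import the same function rather than duplicate it:
#   from checkin import checkin
#   @route('/checkin/<barcode>')
#   def web_checkin(barcode):
#       return checkin(sqlite3.connect("badges.db"), barcode)
```

The command line, the Process object, the web form, and the C# REST call all funnel into `checkin()`, so the query logic exists exactly once.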

ShapeShifter:
::::SNIP::::

I'm trying to settle on a general architecture. At this point, I'm thinking of putting each major operation in a Python file. That file includes a function that performs the actual operation, and then a block of code at the end that is run if the file is called from the command processor: this code fetches arguments from the command line, and passes them to the function. At the same time, the Bottle app is fielding web requests, and if it gets one for the operation it imports the function from that module file and calls it. That would allow four ways to call the operation's core function:

  • Locally from the SSH command line for testing
  • Locally from a Process object in the sketch when a barcode is scanned
  • Remotely through a simple web interface
  • Remotely as a web request from the main C# app

Anybody have thoughts on this? It basically gives you a central place for the business logic, plus a couple of thin wrappers to call it from different interfaces.

Hmm.. After your response I would go with SQLite. It is better at business processes than Sleepycat. SC is better at embedded systems and extensions, but if you might extend this as it appears, then SQLite will give you more mileage.

SQLite has one distinct advantage over most systems. If you use THE most generic read/write/purge functions, then you can extend it well. Too often I have seen system designers depend on things like SQL JOIN and be trapped into a path with few choices.

I've done more than one SQL system, and I learned fairly quickly that not knowing the generic SQL command set would cost me, especially if I used a proprietary (or version specific) command.

SQLite passes the ACID (atomicity, consistency, isolation, durability) test because it uses Berkeley DB underneath. (A fact not often repeated.)

SQLite is a subset of SQL-92. This means you can find a good printed reference for a few bucks. Be careful, there are quite a few crappy books. Stay clear of the one with the rabbit on the cover.

Here are some good choices you might find at the thrift store:

SQL

SQL

SQLITE

This is the better book. It is similar to what is online, except it is in print.

Best of Luck
Jesse

Thanks for the list of books.

You have the right idea - while I didn't state it explicitly at first (my thoughts on the task are still fluid) and this isn't strictly a learning task, learning is a part of it. While this particular task may be able to be accomplished with a smaller set of tools, the task could grow in the future, and future tasks are certainly going to be more complex - it seems that each new project idea is always a bit more ambitious than the last. So if a new technology or technique needs to be learned, might as well choose one that will have some room to grow and handle future projects. Of course, there's no need to take it to extremes...

Speaking of learning, I may put this project on hold temporarily. My biggest customer has mentioned that I would be even more marketable to him if I knew VHDL. That was always an interest of mine, and to that end I bought an FPGA evaluation/development board a while ago, but never got around to opening it. I think it's time to open up that box and start playing with it - if I can get comfortable with that technology, it could mean more income potential, and that's never a bad thing!

ShapeShifter:
Thanks for the list of books.

::::SNIP::::

Speaking of learning, I may put this project on hold temporarily. My biggest customer has mentioned that I would be even more marketable to him if I knew VHDL. That was always an interest of mine, and to that end I bought an FPGA evaluation/development board a while ago, but never got around to opening it. I think it's time to open up that box and start playing with it - if I can get comfortable with that technology, it could mean more income potential, and that's never a bad thing!

Yep, I agree. It's more money. About 1995 I was contracted to Sony; they were still trying to get their MPEG chip out the door. I was contracted to write the drivers, but the project was stalled. I found out a bit later that the senior engineer on the project was secretly talking to an FPGA maker. They hired him about 2-3 months after I started. He got in at the ground level of a startup and made a ton of money.

Meanwhile, FPGAs became hot, then died about 5-6 years ago, when more things went overseas. Hardware is cool again in Silicon Valley. I'm hoping to return for the robot revolution. Which at this point is put off another 3-8 years.

FWIW, we are in an implosion for startups and tech. A tweet this morning confirmed the bubble and stupidity. See attached image. FYI, Quirky filed for bankruptcy. See second image.

Jesse

unicorn,pegasus,BS.png

Just a quick update: I have the project mostly working, but of course I'm sure I will still make little tweaks to it. Most of the work is done on the Linux side using Python, and I have three major modules:

  • Database module: a wrapper around an SQLite database that implements my business logic. It defines a class that does all of the actual operations on the database, and has methods that do all of the various high-level operations, including importing/exporting data and performing queries and updates. All of the database details and SQL are in this module; the rest of the system doesn't really have to know how the work is being accomplished.
  • Local Interface module: this is the process that is called from the sketch when barcodes are scanned. This module takes the scanned code, figures out what to do with it, calls the Database module to actually do the work, then takes the results from the database module, formats the LCD screen output, and sends it back to the sketch for display.
  • Remote Interface module: this is a Bottle framework application to allow remote control of the system. It implements a set of web pages to perform certain operations such as configuring the system, viewing data, and importing/exporting data. It also implements a RESTful API to allow remote access from the C# program, letting it get at certain data and remotely perform some of the same actions that are done by a local scan operation.
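A skeleton of what such a Database module might look like. The class, method, and column names below are my guesses for illustration, not the actual code:

```python
import sqlite3

class BadgeDB(object):
    """All SQL and schema details live here; callers see only
    high-level operations."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS badges (barcode TEXT PRIMARY KEY, "
            "name TEXT, qty INTEGER, picked_up_at TEXT)")

    def import_rows(self, rows):
        """Bulk-load (barcode, name, qty) tuples, e.g. from a CSV upload."""
        self.conn.executemany(
            "INSERT OR REPLACE INTO badges (barcode, name, qty) "
            "VALUES (?, ?, ?)", rows)
        self.conn.commit()

    def lookup(self, barcode):
        """Return (name, qty) for a scanned code, or None if unknown."""
        return self.conn.execute(
            "SELECT name, qty FROM badges WHERE barcode = ?",
            (barcode,)).fetchone()

    def outstanding_total(self):
        """Sum of qty over records not yet timestamped."""
        return self.conn.execute(
            "SELECT COALESCE(SUM(qty), 0) FROM badges "
            "WHERE picked_up_at IS NULL").fetchone()[0]

db = BadgeDB()
db.import_rows([("123", "Alice", 3), ("456", "Bob", 1)])
```

Both the Local Interface module and the Bottle routes would instantiate this class, keeping the SQL out of the interface code entirely.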

This is all tied together by a relatively simple sketch running on the Yun's '32U4 processor that takes data from the scanner, passes it to the Local Interface module, receives data from that module, and displays it. It doesn't perform any of the business logic, it's pretty much just a smart I/O controller. It does kick off the system by using Process objects to start the bottle application, and to make calls to the Local Interface module.

A few details are different than I had originally planned. First, by using the Bottle framework to create a web application, a lot of the things I was going to do with the C# app can now be done with simple web pages, greatly simplifying the C# app's duties.

Another significant change is my plan to use a Process object to call the Local Interface module, passing the scanned code as a parameter: that worked, but was too slow, taking a couple of seconds to spawn the process and get the data back. Instead, I start the Process in setup(), and keep the process open. The Python module loops forever, reading each scanned code from stdin. When a code is scanned, the sketch writes to the Process object, the Python script picks it up, processes it, and uses sys.stdout.write() to send data back, which the sketch reads from the Process object. I could've used simple Python print statements, but sys.stdout.write() has the advantage of not inserting extra spaces and newlines in the output. By keeping the Process open all of the time, the response time is shortened to about 400 milliseconds - quite acceptable, as it takes about that long to move your eyes from the code being scanned to the display.
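That long-lived loop can be sketched roughly as follows. The schema, database path, and the `--serve` start-up flag are assumptions; the key points are opening the database once at startup and flushing stdout after each reply:

```python
import sqlite3
import sys

def handle(conn, barcode):
    """Look up one scanned code; return the text to show on the LCD."""
    row = conn.execute(
        "SELECT name FROM badges WHERE barcode = ?", (barcode,)).fetchone()
    return (row[0] if row else "UNKNOWN") + "\n"

def main():
    # Open the database once; the process then stays alive for the whole
    # session, avoiding the per-scan cost of spawning the interpreter.
    conn = sqlite3.connect("badges.db")  # hypothetical database path
    while True:
        line = sys.stdin.readline()  # blocks until the sketch writes a code
        if not line:
            break                    # the sketch closed the Process: exit
        sys.stdout.write(handle(conn, line.strip()))
        sys.stdout.flush()           # push the reply back through the Bridge

# The sketch would launch this once from setup() with the (hypothetical)
# --serve flag, e.g.: python local_interface.py --serve
if __name__ == "__main__" and "--serve" in sys.argv:
    main()
```

The explicit flush matters: without it, replies sit in Python's output buffer and the sketch never sees them.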

At this point, Robin is probably wondering why I didn't just bypass the Bridge and use the serial port directly? Well, a few reasons: The first is that it was simple enough to do almost the same thing using the Process object -- I think it's even simpler than going into Linux to disable the login on the tty port, using PySerial to open up the serial port, and configuring Linux to automatically start the required scripts. But the bigger reason is that I actually have two scanners and two displays attached to the Yun: if I bypassed the Bridge, I would have to write some steering logic on both sides of the serial port to figure out which scanner and which display is associated with the data. But by using two Process objects, I can have one for each data stream: not only are the data streams automatically kept separate, but each process has its own set of local data to keep state information between scans - no extra work is needed to implement a second scanner/display, just instantiate a second Process object. A final reason is that leaving the Bridge in place allowed me to add the feature described in the next paragraph:

The last change is a new requirement to be able to send some asynchronous events from the Bottle web application to the sketch. I could've come up with some way to send that information to the process running the Local Interface module, and have that send it to the sketch through the Process object, but I took the easy way out: I just implemented a very simple REST API in the sketch, similar to the basic Bridge Library example. When certain buttons are pressed in the web application, it causes a call directly to the sketch. This seemed like the easiest way to go without inventing a new communications method. (With all of the business logic in Linux, and the sketch being basically a dumb I/O processor, there was plenty of memory left to implement the rudimentary API.)

PCWorxLA:
I personally loathe Python, for various reasons.

This was my first significant exposure to Python, and while the list/dictionary/string processing abilities made some facets of it simpler (or at least needed fewer statements), I don't find it very good as a formal development environment. The flexibility in data typing has some appeal at times, but in general I found that it allowed problems that were not detected until run-time, and sometimes not until much later when a strange data situation arose. Many of the issues I tracked down would've been caught at compile time with a more formal and strongly typed language. I can see why developing strong unit testing scripts is such a big thing with the more serious Python developers, as some bugs aren't exposed until you've called your function dozens or hundreds of times with every conceivable combination of data. Yes, you still need to do unit testing with a strongly typed language, but many of the bugs (like assuming a value is an integer when it turns out to be a string) would've been caught at compile time.

I'm glad I took some time to do some real work with Python: Python will likely stay in my box of tricks for some future tasks, but it is unlikely to take over as my go-to language for a serious development effort.

Anyway, if you're still reading this, thanks to all for your advice and guidance. The project is a success. I offer this recap of the project in case there is anyone out there looking for some architecture ideas for building a somewhat larger project than the typical small sketch.