Rapid MySQL Updates

For the project I'm working on right now, I need to update my MySQL database with between 0 and 504 pairs of datapoints as quickly as possible.

I wanted to share the methodology I'm using both hopefully to help other people if they have similar needs and to see if anyone sees opportunities for improvement.

My normal method for sending information to MySQL is to call a Process each time that I need to add a certain line. This is very easy to implement, but the downside is that if you have to repeat the operation many times, the overhead time of establishing the MySQL connection quickly stacks up.

For the particular query that I'm running, establishing the connection and inserting 1 row takes about 0.7 seconds. If I have to perform the maximum update of 504 rows (which happens), sequentially executing the Process would mean almost 6 minutes of wait time.

This is how I ended up holding open a single MySQL connection and feeding data into it:

  • Initiate a Process which starts a php script
  • Run the php script asynchronously
  • Immediately after starting, the php script opens a database connection and then waits for data to be fed into STDIN
  • The arduino script now feeds each pair of data into STDIN
  • When the last datapoints have been sent, the arduino sends two blank rows to STDIN
  • Once the php script reaches the two blank rows, it closes the connection and sends a completion character back to the arduino
  • The arduino waits on the completion character, otherwise the Process could be ended before php had made it all the way through the data

The result is a significantly faster query speed for an unknown number of queries.
1 query still takes 655ms
10 queries take 1,076ms
504 queries take 24,000ms (instead of 330,120ms)

If my possible number of queries had been smaller or more predictable, I could have say loaded 10 queries at once into my Process and then executed that as often as needed, but I went with this method because I would run out of memory trying to pass all 504 queries at once and there isn't a reason to use all that execution time when I might only be sending 1 data pair that cycle.

<?php
$runid = $argv[1];  //runid
$temp = $argv[2];  //temperature
$displayonly = $argv[3]; //displayonly

$DBServer = '*******'; 
$DBUser   = '*********';
$DBPass   = '***********';
$DBName   = '**********';   
$conn = new mysqli($DBServer, $DBUser, $DBPass, $DBName);
// check connection
if ($conn->connect_error) {  trigger_error('Database connection failed: '  . $conn->connect_error, E_USER_ERROR);}

//sends an "R" after the connection is ready
printf ("%s \n", "R");

$thermostat = "999";

while ( $thermostat != '' ) {

	//retrieves data send from the arduino to STDIN
	$thermostat = trim(fgets(STDIN)); // reads one line from STDIN
	$cont = trim(fgets(STDIN)); // reads one line from STDIN
	$tray = (FLOOR($thermostat/85) + 1); //calculates which tray the thermostat belongs to
	$traythermostat = $thermostat - (($tray - 1)*84);

	if ( $thermostat != '' ) {	
		//uploads the data to MySQL
		$sql="INSERT INTO ovenshifter_data (runid, thermostat, continuity, temperature, displayonly, tray, traythermostat) VALUES ($runid, $thermostat, $cont, $temp, $displayonly, $tray, $traythermostat)";
		if($conn->multi_query($sql) === false) {trigger_error('Wrong SQL: ' . $sql . ' Error: ' . $conn->error, E_USER_ERROR);} 
	}
}
printf ("%s \n", "F");  //sends a "F" after the upload is finished
?>
//this function opens a php script and sends data to a remote MySQL server
void LinuxDataTrayInternal() {
  String result;
  if (traycontk > 0) { //only proceeds if a data pair is present
    unsigned long speedtime = millis();  //timer to track processing speed

    Process TrayWrite; //this sends information to the db
    TrayWrite.begin("php-cli"); //calls php
    TrayWrite.addParameter("/mnt/sda1/db_traywrite.php");  //adds the script path
    TrayWrite.addParameter(runid); //insert the runid, which is the unix timestamp when the run began
    TrayWrite.addParameter(temperature); //insert the temperature that the thermostats tripped at
    TrayWrite.addParameter(String(displayonly)); //insert whether the readings should be used only for display purposes or used as data
    TrayWrite.runAsynchronously(); //runs the process asynchronously, since data will be fed into it as it runs

    while (TrayWrite.available() == 0) {}  //waits until the database connection has been established

    while (TrayWrite.available() > 0) { //clears out the completion characters
      char c = TrayWrite.read();
      if (c != '\n')
        result += c;
    }
    result = "";

    for (int kt = 0; kt < (traycontk); kt++) {  //the data is in pairs in the array (thermostat and continuity) so it increments by 2 each time
      String tempcont = traycontc[kt];
      int temptherm = int(traycont[kt]);
      TrayWrite.println(String(temptherm)); //insert the thermostat #
      TrayWrite.println(tempcont); //insert the continuity reading
    }

    TrayWrite.println();  //prints an empty thermostat to indicate input is over
    TrayWrite.println();  //prints an empty continuity to indicate input is over

    while (TrayWrite.available() == 0) {}  //waits until the script finishes (it will send an "F" back

    while (TrayWrite.available() > 0) { //clears out the completion characters
      char c = TrayWrite.read();
      if (c != '\n')
        result += c;
    }

    Console.println(result);
    result = "";

    //after all of the data pairs have been sent
    TrayWrite.close(); //closes the php script

    Console.println(String(String(traycontk) + "," + String(millis() - speedtime)));
  }
}

I do something very similar with Python scripts that write to SQLite. The big difference in my case is that I keep the script on the Linux side running all of the time. Then, there is no need to start the script to write a new batch of data: just write to the Process object any time.

Your idea of writing an "F" at the end is interesting, but you aren't actually looking for it. You're just looking for no more characters to be available. You're banking on there not being a gap in the output data log enough to fall out of your while loop. Also, in reality, it's probably good that you're not looking for the "F \n" terminator because your sketch would hang if the PHP script failed and terminated early before it sent the terminator string.

A more robust way of dealing with it might be to just let your PHP process terminate when its done. Then, use the Process.running() function to check the status. In that way, you definitively know that the script has terminated, for whatever reason - completion or failure.

   while (TrayWrite.running())
   {
      if (TrayWrite.Available())
         console.print((char)TrayWrite.read());
   }
   TrayWrite.close();

This also eliminates the need to collect the characters in a result String object, which is not a very efficient way of managing character data. It's easy to code concatenation with a String object, but it results in poor memory management. You can make a similar optimization to the first loop where you're just throwing away any additional data: why concatenate the data into the result string, if the very first thing you do after the loop is clear the string?

ShapeShifter,

Thanks for the feedback!

Do you start your Python script from the Sketch or do you start it automatically when Linux boots up?
I assumed that my SQL session would expire if I left it open without activity for some period of time. Do you have any such trouble?

Letting the PHP script end itself and continuing on with the sketch based on Process.running() is definitely a good idea. I'll implement that tomorrow; I have had issues with the script hanging due to wireless connectivity issues and that should keep me running.

Interestingly, falling out of the while loop isn't a problem at least in the default php implementation on the yun. The php stream functions can be set to Blocking or Non-Blocking. trim(fgets(STDIN)) is blocking and doesn't proceed until a line is present to retrieve. I didn't know this originally, so I had previously had two-way confirmation. After finding out that it is blocking through experimentation, I eliminated the confirmation from sketch --> php.

I'm sorry, but I forgot to mention what a nice writeup your first post was, thank you for this thread.

I start the Python script from the sketch using a Process object. If it was auto-started by Linux, it would make communications much more difficult - the Bridge library would have to be bypassed, and the script would have to access the TTY port directly. It would also allow only one process to communicate without having to write some kind of communications manager.

By using the Process object, the script just needs to read/write STDIN/STDOUT, and it automatically gets connected to the I/O from the Process object in the sketch. There can also be multiple Process objects, each one running an independent Linux process, each with their own independent communications. I have one project where I run three processes. I don't use most of the features of the Bridge library (there are better and more efficient ways to do most of it) but I use the Process class a lot. I think it's the most useful feature in the library.

Because I keep the processes running all of the time, I use the Process.running() function a little differently. At the bottom of the loop() function, I check Process.running(), and if it returns false, I start the process again. This way, if the process should unexpectedly quit, it gets restarted automatically.

I use SQLite, not MySQL, so our situations may be different. And even though the process is running the whole time, I don't always have a connection to the database. When the process gets a new data record from the sketch, it connects to the database and appends the record as a complete transaction. My data rate is low enough that I don't need to keep a single connection open all of the time. Also, SQLite has low overhead so there isn't as much of a time penalty making the DB connection. But with a higher data rate, and MySQL, your situation could be different.

ShapeShifter,

Thanks!

I'd never thought of declaring the Process globally and then running it elsewhere. By declaring it in the functions, I'd always been limited by destructor's imperative to clean up.

Running the php script continuously saves a significant amount time when I only send in a few data pairs at a time.

Before: 1 query took 655ms
Now: 1 query only takes 264ms

It also means that my Sketch doesn't actually have to wait for the global Asynchronous TrayWrite process to finish. Once all of the data-points have been loaded into STDIN followed by two blanks, I can move on in the sketch without worrying about the script ending. New data can be appended to STDIN before PHP even finishes with the last set if needed, since they are delineated by the two blank spaces. The only real limit is my Sketch execution speed.

I think I'm going to try to add some php to safeguard against the MySQL connection timing out (it sounds like the default time-out might be 8 hours), try to speed up my Sketch function, and then I'll post the revised Sketch and PHP.

That is a good speed improvement by getting rid of all the startup overhead!

Yes, I while I didn't specifically say an global Process instance, of course it would need to be global for that scheme to work. From your numbers, it seems like the slowest part of the 1 query execution time was actually starting the PHP process. That's consistent with my observations, and a good incentive to keeping the script running constantly, even though it's a little more work. Being able to "fire and forget" the data and not wait for completion is also a nice side benefit.

I look forward to seeing the updated code.

Before when I though I was restricted to running the Process in a function, I was accumulating all of my data during Loop into a large array, which I then iterated through in my function. Because now I can send data out at any point in my Sketch, the function is gone and data is uploaded live during the Loop, which saves both complexity and a lot of memory (no array).

Process TrayWrite;
^declared globally

In my Loop:

if (TrayWrite.running() == 0) LinuxDataStart(); //starts the php processing script if it is not already running
.....
TrayWrite.println(String("R" + runid + "Z")); //passes the runid
TrayWrite.println(String("T" + temperature + "Z")); //passes the temperature
TrayWrite.println(String("D" + String(displayonly) + "Z")); //passes the displayonly flag
.....
TrayWrite.println(String("S" + thermostat + "Z")); //insert the thermostat #
TrayWrite.println(String("C" + String(newbit) + "Z")); //insert the continuity reading
.....
TrayWrite.println("KKZ") //keepalive string

During the loop I send values out to TrayWrite. Each value is prefixed by an unique identifying character and suffixed by the character "Z".

The function to start the process

void LinuxDataStart() {
  Console.println(9);
  TrayWrite.begin("php-cli"); //calls php
    TrayWrite.addParameter("/mnt/sda1/db_traywrite.php");  //adds the script path
  TrayWrite.runAsynchronously(); //runs the process asynchronously, since data will be fed into it as it runs
}

My php isn't terribly refined, but it seems to be working alright. I made it non-blocking now to prevent any hang-ups.

It basically:
-Connects
-Pulls a string from STDIN
-Checks if the string is complete (has Z in it...I'm only sending #'s so I'm not worried about false positives)
-If the string is complete, assign it to the correct variable based on its prefix
-When the last variable is received, initial the upload to MySQL

<?php
$DBServer = '***********'; 
$DBUser   = '***********';
$DBPass   = '***********';
$DBName   = '***********';   

$runid;
$temperature;
$displayonly;
$thermostat;
$continuity;
$complete = 0;
$datadone = 0;
$varlen;
$getvar = "";
$gotvar = "";
$idcheck;
$contimer = time();
$timer = time();
$connected = 0;
$t2 = time();

//set up connection
$conn = new mysqli($DBServer, $DBUser, $DBPass, $DBName);
// check connection
if ($conn->connect_error) {  trigger_error('Database connection failed: '  . $conn->connect_error, E_USER_ERROR);}

//sends an "R" after the connection is ready
printf ("%s \n", "R");

//set stdin blocking to off
stream_set_blocking(STDIN,FALSE);

//loops perpetually unless no keepalive is recieved for 60 seconds
while (time() - $timer < 60) {
	
	//check connection
	if (time() - $contimer > 5) { //keeps script from pinging the server more often than every 5 seconds
		if ($conn->ping()) { //pings the connection TRUE is connected, FALSE is not connected
			$connected = 1;
			//printf("%s\n","still connected");
		}
		else {  //if the connection ping failed
			$connected = 0;
			printf( "%s\n","not connected");
			printf ("Error: %s\n", $mysqli->error);
			if ($conn->connect_error) {  trigger_error('Database connection failed: '  . $conn->connect_error, E_USER_ERROR);}		
		}
		$contimer = time();
	}
	
	
	if ($connected == 1) {  //if the connection is available
	
		$rest = trim(fgets(STDIN));  //trim any control characters or spaces out
		$getvar = $getvar . $rest;  //takes the value from STDIN and adds it onto getvar
				
		//look for the Z end flag	 
		if (strstr($getvar,"Z") !== FALSE) {  //if the control character for newline is found
			echo $getvar;
			$varcomplete = 1;  //flag that a string is complete
			$gotvar = $getvar; //move the string to gotvar
			$gotvar = trim($gotvar);  //trim the control characters off of gotvar
			$varlen = strlen($gotvar);  //store the length of gotvar in a variable
			$getvar = "";  //reset the value of getvar to be empty
		}
		else $varcomplete = 0;   //if the control character for newline is not found

		 
		 
		if ($varcomplete == 1) {  //if the complete string flag is on
			$idcheck = substr($gotvar,0,1);  //pull the first character off the string into a new variable
			$gotvar = substr($gotvar,1,$varlen-1); //set gotvar equal to the string minus that first character
			$gotvar = substr($gotvar,0,$varlen-2); //set gotvar equal to the string minus that last character
			
			
			//checks what the first character is and stores gotvar into the appropriate variable
			if 		($idcheck == 'R') $runid = $gotvar;
			else if ($idcheck == 'T') $temperature = $gotvar;
			else if ($idcheck == 'D') $displayonly = $gotvar;
			else if ($idcheck == 'S') $thermostat = $gotvar;
			else if ($idcheck == 'C') {
				$continuity = $gotvar;
				$tray = (FLOOR($thermostat/85) + 1); //calculates which tray the thermostat belongs to
				$traythermostat = $thermostat - (($tray - 1)*84);
				$sql="INSERT INTO ovenshifter_data (runid, thermostat, continuity, temperature, displayonly, tray, traythermostat) VALUES ($runid, $thermostat, $continuity, $temperature, $displayonly, $tray, $traythermostat)";
				if($conn->multi_query($sql) === false) {trigger_error('Wrong SQL: ' . $sql . ' Error: ' . $conn->error, E_USER_ERROR);}
			}  //the continuity is the last item to be received before an upload, so receiving it triggers the upload
			else if ($idcheck == 'K') $timer = time();
			
			
			$varcomplete = 0;  //changes the complete string flag back to 0
		}
	
	}  //end of connection available if
} //end of while loop

Including a timeout in the script was very important because the php process wasn't closing when the arduino side reset (due to things like new sketch uploads, where Linux doesn't reboot). The stackup of php processes was eating all of the available resources on the linux side.

ShapeShifter, when you are running more than 1 process in the background, how do you keep one from receiving the data from the other?
At least in my implementation, regardless of which Process I println to, it all goes to the same STDIN buffer and one php script would starting parsing lines mean for the other script. I had thought each Process would have a separate STDIN, but what I observed didn't support that. I suppose I could write lines to a particular file for each process...

Your code looks much cleaner simpler now!

DarkSabre:
Including a timeout in the script was very important because the php process wasn't closing when the arduino side reset (due to things like new sketch uploads, where Linux doesn't reboot). The stackup of php processes was eating all of the available resources on the linux side.

That's a good idea, I like that. I don't recall dead processes sticking around when the sketch resets, but I will keep an eye out for it.

ShapeShifter, when you are running more than 1 process in the background, how do you keep one from receiving the data from the other?

Each Process object communicates only with the process it created. Anything I print() or println() to a Process object goes only to the STDIN of the associated Linux process, and anything that the process prints to STDOUT is available to be read only by the associated Process object in the sketch.

At least in my implementation, regardless of which Process I println to, it all goes to the same STDIN buffer and one php script would starting parsing lines mean for the other script. I had thought each Process would have a separate STDIN, but what I observed didn't support that. I suppose I could write lines to a particular file for each process...

That's very curious. With my Python scripts, it works exactly as you would've thought it should work. I wonder if it's something about PHP that's causing it?

I have one system that runs three independent Process objects. One of them runs a Python script that is a Bottle web application, and the other two run two independent instances of the same Python script. There are no problems keeping the two instances distinct and maintaining their own I/O streams, data, and internal state, even though they are instances of the same script. Those two scripts (and the Bottle application script) all import another Python script module that provides the common SQLlite database access for the system. It works just as you would expect it to work.

That's how I thought that it should work...

What happened was values from my TrayWrite Process were being inserted into the database table used by my TempTrack Process. That table isn't used at all by TrayWrite. My assumption is that TempTrack was somehow pulling them from my STDIN for TrayWrite.

So that remains a mystery, but it wasn't a critical function, so I stopped running TempTrack perpetually.

An issue I encountered is that, after a period of several hours, my Sketch would stop executing when it encountered any line that called for it to perform TrayWrite.println();.

After adding some logging code to the php script, I found that the php script would eventually hang when it reached my "echo" statement. After this happened, the Sketch would run for a while longer (but queries no longer uploaded to the database). As far as the Sketch was concerned, the Process was still Running. After a while running in that state, the Sketch would then hang when it tried to println to the Process.

I think some memory limit or buffer must have been exceeded, but removing the echo statement seems to have fixed things, so I've not tried to track it down further than that.

After removing the echo statement, the script has been running a test loop continuously overnight and has uploaded 130,000 queries to the MySQL database without any issue.

<?php
$DBServer = '***********'; 
$DBUser   = '***********';
$DBPass   = '***********';
$DBName   = '***********';   

$runid;
$temperature;
$displayonly;
$thermostat;
$continuity;
$complete = 0;
$datadone = 0;
$varlen;
$getvar = "";
$gotvar = "";
$idcheck;
$contimer = time();
$timer = time();
$connected = 0;
$t2 = time();

//set up connection
$conn = new mysqli($DBServer, $DBUser, $DBPass, $DBName);
// check connection
if ($conn->connect_error) {  trigger_error('Database connection failed: '  . $conn->connect_error, E_USER_ERROR);}

//sends an "R" after the connection is ready
printf ("%s \n", "R");

//set stdin blocking to off
stream_set_blocking(STDIN,FALSE);

//loops perpetually unless no keepalive is recieved for 60 seconds
while (time() - $timer < 60) {
	
	//check connection
	if (time() - $contimer > 5) { //keeps script from pinging the server more often than every 5 seconds
		if ($conn->ping()) { //pings the connection TRUE is connected, FALSE is not connected
			$connected = 1;
			//printf("%s\n","still connected");
		}
		else {  //if the connection ping failed
			$connected = 0;
			printf( "%s\n","not connected");
			printf ("Error: %s\n", $mysqli->error);
			if ($conn->connect_error) {  trigger_error('Database connection failed: '  . $conn->connect_error, E_USER_ERROR);}		
		}
		$contimer = time();
	}
	
	
	if ($connected == 1) {  //if the connection is available
	
		$rest = trim(fgets(STDIN));  //trim any control characters or spaces out
		$getvar = $getvar . $rest;  //takes the value from STDIN and adds it onto getvar
				
		//look for the Z end flag	 
		if (strstr($getvar,"Z") !== FALSE) {  //if the control character for newline is found
			$varcomplete = 1;  //flag that a string is complete
			$gotvar = $getvar; //move the string to gotvar
			$gotvar = trim($gotvar);  //trim the control characters off of gotvar
			$varlen = strlen($gotvar);  //store the length of gotvar in a variable
			$getvar = "";  //reset the value of getvar to be empty
		}
		else $varcomplete = 0;   //if the control character for newline is not found

		 
		 
		if ($varcomplete == 1) {  //if the complete string flag is on
			$idcheck = substr($gotvar,0,1);  //pull the first character off the string into a new variable
			$gotvar = substr($gotvar,1,$varlen-1); //set gotvar equal to the string minus that first character
			$gotvar = substr($gotvar,0,$varlen-2); //set gotvar equal to the string minus that last character
			
			
			//checks what the first character is and stores gotvar into the appropriate variable
			if 		($idcheck == 'R') $runid = $gotvar;
			else if ($idcheck == 'T') $temperature = $gotvar;
			else if ($idcheck == 'D') $displayonly = $gotvar;
			else if ($idcheck == 'S') $thermostat = $gotvar;
			else if ($idcheck == 'C') {
				$continuity = $gotvar;
				$tray = (FLOOR($thermostat/85) + 1); //calculates which tray the thermostat belongs to
				$traythermostat = $thermostat - (($tray - 1)*84);
				$sql="INSERT INTO ovenshifter_data (runid, thermostat, continuity, temperature, displayonly, tray, traythermostat) VALUES ($runid, $thermostat, $continuity, $temperature, $displayonly, $tray, $traythermostat)";
				if($conn->multi_query($sql) === false) {trigger_error('Wrong SQL: ' . $sql . ' Error: ' . $conn->error, E_USER_ERROR);}
			}  //the continuity is the last item to be received before an upload, so receiving it triggers the upload
			else if ($idcheck == 'K') $timer = time();
			
			
			$varcomplete = 0;  //changes the complete string flag back to 0
		}
	
	}  //end of connection available if
} //end of while loop

FYI, to track down the points where the php code was stopping, I made a log file like this:

$log = time().":Connection Failed".PHP_EOL; //generates a string with the time, my message, and end of line characters
error_log($log,3,"/mnt/sda1/phplog.txt"); //sends the string to a file