Transfer data from Linux to the microprocessor

Hello,

How can I share data between the two processors of the Yun?

More specifically, the linux part runs a Python script which returns an array of integers.
How can I retrieve and then use this array in my Arduino Sketch?

Thanks.

There are many ways to do it. Some of the more efficient and practical methods are:

Process: Use a Process object (example) This will actually launch your Python script, so you don't need another method to run the script. The Process object can either run the script synchronously such that the Arduino sketch waits until the Python process finishes, or it can run it asynchronously and do other things while the Python process runs. Either way, anything that the Python script prints out can be read using Process.available() and Process.read() in exactly the same manner as would be used with a serial port. Anything sent with Process.write() can be read by the Python script as standard input. The Process class derives from the Stream class, so any of those reading/writing functions can also be used. Running the process synchronously makes things a little simpler if the Python script runs, generates one set of output, and then exits. Running it asynchronously works well if the Python script runs in an infinite loop, sending periodic sets of data (this is the way I normally do it.)

Bridge: Use the Bridge class's data store. The Python script (which would have to be started separately) uses Bridge.put() to write string values to the data store, and the sketch uses Bridge.get() to read them. An advantage here is that you can also set or read these values using a web browser (useful for debugging.) A disadvantage is that the sketch does not know when the Python side has written a value; it must keep reading the value and looking for changes. This means that if the Python side writes the same value twice, the sketch will not notice the second write. This is not a problem if the sketch just needs to know the last written value (for example, the value is a current set point.) But if the value is some sort of command, and it's necessary for the sketch to see every write, even if it is a duplicate, then either some handshaking method needs to be worked out (perhaps use Bridge.put() to clear the value after reading) or a different communications method should be used.

Mailbox: Use the Mailbox class. The Python script (which would have to be started separately) uses BridgeClient.mailbox() to send strings to the sketch, while the sketch uses Mailbox.readMessage() to read the message. This has the advantage of automatically queuing up duplicate messages, although I think there is more communications overhead than the other methods. Like the Bridge.get() techniques, you can also use a web browser to send messages to (but not see) the sketch for testing purposes (example.)

Roll your own: Bypass the bridge altogether and use direct communications between Serial1 on the sketch side, and /dev/ttyATH0 on the Linux side. There are a bunch of threads on how to do that; a forum search should turn up some examples. This can have the least amount of overhead, but involves a fair amount of work and Linux configuration changes, and also eliminates the possibility of using any of the other Bridge library capabilities.

Personally, I use the Process class 98% of the time. The sketch starts the process asynchronously in setup(), and keeps checking for data from it in loop(). The Python script loops forever and periodically sends data by either print() or sys.stdout.write(). In either case, it's a good idea to run the Python script in unbuffered mode: either launch it using python -u scriptname.py, or add the shebang #!/usr/bin/python -u to the beginning of the script if running it like an executable file (just using the script name as the command without explicitly calling Python.) If you don't explicitly enable unbuffered mode (using the -u when starting Python), then the output will be buffered and the sketch may not see any output from the script until the script exits or a buffer fills up.

Thank you for your detailed answer. I am still struggling with this.

My Python script, once executed, returns (print()) an array of integers like this

[10, 20, 30, 40]

and then it stops.

Using Process as you suggested, it prints chars that seem random, not related to the expected output.

Do you think that Process is the best approach for my case?

mridolfi:
Do you think that Process is the best approach for my case?

It should work well for your needs. Please post your code, as there is apparently something not right. What's wrong? Without the code it would just be random guessing...

I just ran a very simple test.

Sketch:

#include <Bridge.h>
#include <Process.h>

void setup()
{
   Bridge.begin();
   Serial.begin(19200);
}

void loop()
{
   Process proc;

   Serial.print("Running process...");
   proc.begin("/mnt/sda1/processTest.py");
   proc.run();    // Run process synchronously, waits until process completes.
   Serial.println("done!");

   // Echo all process output to the serial port
   while (proc.available())
      Serial.print((char)proc.read());

   // Print out a couple blank lines to make the output visually distinct
   Serial.println();
   Serial.println();

   // Give a delay so the output doesn't scroll by so quickly
   delay(1000);
}

Python code:

#!/usr/bin/python -u

array = [10, 20, 30, 40]

print array

Output:

Running process...done!
[10, 20, 30, 40]


Running process...done!
[10, 20, 30, 40]


Running process...done!
[10, 20, 30, 40]


Running process...done!
[10, 20, 30, 40]


Running process...done!
[10, 20, 30, 40]


Running process...done!
[10, 20, 30, 40]


Running process...done!
[10, 20, 30, 40]

Not having seen your code, I'm going to go out on a limb and guess that the problem is the print statement that echoes the data to the serial port (I assume you're doing something like that, since you say you get seemingly random characters.)

You will note that after I read a character from the process, I cast it to a char before I print it to the serial port. This is necessary because while read() returns one character at a time, it returns that character as an int. If you just call Serial.print(proc.read()) it will print that character as an integer, and display the ASCII encoding of that character.

For example, if I take the cast to char out of the echo loop so that it looks like this:

   while (proc.available())
      Serial.print(proc.read());

then the output becomes this:

Running process...done!
9149484432504844325148443252489310

Running process...done!
9149484432504844325148443252489310

Running process...done!
9149484432504844325148443252489310

Running process...done!
9149484432504844325148443252489310

Where the ASCII encoding of '[' is 91, '1' is 49, '0' is 48, ',' is 44, ' ' is 32, and so on...
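The effect can be reproduced away from the board. The following is a desktop C++ stand-in, not sketch code: the two functions (names invented for illustration) mimic what Serial.print() would show with and without the cast, given the same bytes the script prints.

```cpp
#include <string>

// Mimics Serial.print(proc.read()): each byte is handed over as an int,
// so it is rendered as a decimal number.
std::string printedWithoutCast(const std::string& processOutput) {
    std::string shown;
    for (unsigned char c : processOutput) {
        int value = c;                      // read() returns the byte as an int
        shown += std::to_string(value);
    }
    return shown;
}

// Mimics Serial.print((char)proc.read()): the int is converted back to a
// char first, so the original character is rendered.
std::string printedWithCast(const std::string& processOutput) {
    std::string shown;
    for (unsigned char c : processOutput) {
        int value = c;
        shown += static_cast<char>(value);
    }
    return shown;
}
```

Feeding "[10, 20, 30, 40]\n" to printedWithoutCast() gives the same run of digits as the output above, and printedWithCast() gives the text back unchanged.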

Thank you very much!

I tried your code and it was not working properly, I got this output:

Running process...done!

Running process...done!

without the array!
Then I tried replacing

proc.begin("/tcpclient.py");
proc.run();

with proc.runShellCommand("python /tcpclient.py");

and it's working properly.
But I don't know why it was not working with your version.

mridolfi:
But I don't know why it was not working with your version.

Do you have this exact line as the first line of your Python script?

#!/usr/bin/python -u

It's called a shebang and it's very important. When you try to run the script by just typing the filename (what my version of the code is doing) the system reads that first line of the file. If it starts with "#!" then it takes the rest of the line as the program it must run in order to run the script. So assuming the script name is "/tcpclient.py" as in your example, it reads the first line, finds the shebang, and turns it into the command "/usr/bin/python -u /tcpclient.py" and runs that. This starts Python, and passes it the name of the script to run.

Your code is doing that explicitly - running python and passing it the name of the script.

Not having a shebang means that it can only be run by explicitly running Python and passing it the script name. Having the shebang allows running it manually like that, but also allows running it by simply typing the name of the script. In addition, using the shebang allows adding arguments such as the "-u" which turns on unbuffered mode - not a problem in this simple example, since the buffer is automatically flushed when the process finishes, but can be an issue for a process which continues to run and only occasionally sends output.

On the other hand, explicitly running Python and passing the script name as you are doing is just a bit more efficient, as the system doesn't have to read that first line of the file to figure out how to launch it. You can still have the shebang, however, to give you the flexibility: Python will just treat it like a comment and ignore it.

Very clear.

One last question: given the array [10, 20, 30, 40], does (char)proc.read() return every single character of what the script printed, one at a time?
I mean, first [ then 1, then 0, and so on?

Yes, you will receive each character one by one by calling proc.read(). This includes the newline character that Python's print statement automatically adds to the end of the output.

However, the Process class derives from the Stream class, just like the Serial port classes. Therefore, anything that is possible with a Stream class instance can also be used with the Process object. For example, if you are used to reading data from a Serial port using readStringUntil() and parseInt(), that will also work with a Process object. Any technique you currently use to work with something derived from Stream should also work here.

I am now using parseInt() to save only the integer numbers in my array, and it does its job, except that I find an extra '0' as the last value in my array. Reading about the Stream class and the function I am using, I've found this:

If no valid digits were read when the time-out (see Stream.setTimeout()) occurs, 0 is returned;

and I think that the extra 0 is because of that.

What do you think? How can I avoid it?

mridolfi:
What do you think?

Sure, it's possible.

How can I avoid it?

Don't call parseInt() more times than you have valid values?

As was said before, if you post your code, we can give better answers.

This is the final and working code:

#include <Bridge.h>
#include <Process.h>

void setup()
{
   Bridge.begin();
   Serial.begin(19200);
}

void loop()
{
   Process proc;
   int i=0;
   size_t dim = 1;
   int* array = (int*)malloc(dim);

   
   Serial.print("Running process...");
   Serial.println("done!");
   proc.runShellCommand("python /tcpclient.py");
    
   while (proc.available()){
     if((char)proc.read() != ']'){
        array[i] = proc.parseInt();
        dim++;
        array = (int*)realloc(array,dim);
        Serial.println(array[i]);
        i++;
      }
      else
        break;
   }
   
   Serial.println();
   delay(1000);
}

Thanks for your help.

mridolfi:
This is the final and working code:

By working, do you mean you have solved your issues? Or are you still getting the extra zero value?

I know it's just a piece of test code, but may I make a couple observations?

Minor detail: You've rearranged the print statements, and now you're printing out "done" before you even call the process. :wink:

Much more serious: I question your use of malloc()/realloc(), as I think the way you are doing things will lead to serious problems. Firstly, you are allocating an array of ints using the number of elements as the argument. It should be the number of elements multiplied by the size of an element in bytes, which for an int on this processor is 2. Either initialize dim to sizeof(int) and add sizeof(int) on each pass, or call malloc() with dim * sizeof(int). For one element, your code will allocate one byte, but an array of one int needs two, meaning you will overrun the allocated memory space by one byte. Two ints will get two bytes, but need four, so you are now overrunning by two bytes. By the time you've read all four elements of the test array, you've allocated four bytes, but are actually using eight. This isn't a problem in this test code, because you never allocate another variable after this one. But if you did malloc() more data, writing to your int array will corrupt this next variable, just as writing to that variable will corrupt your int array. Very dangerous.

In addition, your code never frees the allocated memory. So, assuming your process is still returning a four element array, each time you pass through the loop() function you are using up six bytes of heap (four for the too-small array block, and two for the memory block header - and if you fix the code, you will be using up ten bytes on each pass.) This will eventually use up all of your heap, and malloc() or realloc() will fail. You are not checking the return value of these functions, so when malloc()/realloc() eventually fails and returns a NULL pointer, you will not know it and will run into problems when you try to dereference the NULL pointer. But your program may crash before that happens, since you are overrunning the allocated memory, and will likely start to corrupt stack values before the heap and the stack actually crash into each other.
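If dynamic allocation really is needed, the three points above (size in bytes, checking for NULL, freeing) could look roughly like this. It is a sketch under assumptions: appendInt() is a made-up helper name, and it relies on the standard behavior that realloc() with a NULL pointer acts like malloc().

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: appends `value` to a heap array holding *count ints.
// Returns false (leaving the array untouched) if memory runs out.
bool appendInt(int** array, size_t* count, int value) {
    size_t newCount = *count + 1;
    // Size is given in bytes: number of elements times sizeof(int).
    int* grown = static_cast<int*>(realloc(*array, newCount * sizeof(int)));
    if (grown == NULL) {
        return false;             // the old block is still valid and still owned
    }
    grown[*count] = value;        // the slot exists now, so no overrun
    *array = grown;
    *count = newCount;
    return true;
}

// Intended use, once per pass through loop():
//
//   int* values = NULL;
//   size_t count = 0;
//   ... call appendInt(&values, &count, proc.parseInt()) for each value ...
//   free(values);                // give the block back before loop() returns
```

Growing the block one element at a time keeps the example short; it does not address the heap fragmentation concerns of a small processor.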

In general, I strongly suggest people avoid malloc() on such a small processor as this, unless there is no other way around it. In this case, unless the number of array elements returned from your process is highly variable and covers a wide range of element counts, I would just statically allocate an array large enough to hold the worst case. While this uses up more memory all the time, it uses less memory than malloc() (no need for an allocation block header or free list) and is static memory where the size is calculated properly by the compiler - there is no chance of memory leaks, and less chance of overrunning the array.

As an alternative to using a statically or dynamically allocated array, consider whether you can process the values serially. Rather than read all elements into the array, and then work with the values, can you read in one element, do whatever processing you need with it, then read in the next element and process it? That would allow you to use a single int variable and not need an array at all. In that case, it wouldn't matter how many elements the process returned, you would never have memory issues.
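As a rough illustration of processing values serially, the following keeps only a running sum, count and maximum, and never stores the list. RunningStats is a hypothetical name, the choice of statistics is just an example, and the code assumes non-negative values that fit in an int.

```cpp
// Hypothetical streaming consumer: digests the list one character at a time.
struct RunningStats {
    long sum = 0;
    int count = 0;
    int maximum = 0;         // meaningful only when count > 0
    int current = 0;         // value being assembled from digits
    bool inNumber = false;   // true while digits of one value are arriving

    // Feed one character of the process output, e.g. (char)proc.read().
    void feed(char c) {
        if (c >= '0' && c <= '9') {
            current = current * 10 + (c - '0');
            inNumber = true;
        } else if (inNumber) {
            // Any non-digit (',', ' ', ']' or newline) ends the number.
            if (count == 0 || current > maximum) {
                maximum = current;
            }
            sum += current;
            count++;
            current = 0;
            inNumber = false;
        }
    }
};
```

Memory use stays the same whether the script prints four values or five hundred.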

On the other hand, if the process always returns the same number of elements, then you would be much better off using a fixed size array and skipping the malloc() operations. Even if it always returns the same number of elements, there is the concern that future modifications to the Python code and sketch code might get them out of sync with each other, where the sketch is assuming one number of elements and the Python code is sending a different amount. In this case, rather than going with code that can dynamically handle any number of elements, it may be more efficient (and safer) to code for the assumed number of elements, but then have a sanity check where after you convert the last value, you verify that the next character is ']'. If it is not, then you know the Python and sketch code is out of sync and you need to fix something.
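A minimal sketch of the fixed-size approach with a sanity check, assuming the whole line has already been read into a NUL-terminated buffer. parseIntList() is a made-up name; it uses the standard strtol() rather than parseInt() so that it can be tried on a desktop too.

```cpp
#include <cstddef>
#include <cstdlib>

// Parses text such as "[10, 20, 30, 40]" into out[0 .. capacity-1].
// Returns the number of values stored, or -1 if the text is malformed,
// has no closing ']', or holds more values than the array can take.
int parseIntList(const char* text, int* out, size_t capacity) {
    if (*text != '[') {
        return -1;
    }
    const char* p = text + 1;
    size_t count = 0;
    while (true) {
        char* end;
        long value = strtol(p, &end, 10);   // skips leading spaces itself
        if (end == p) {
            break;                          // no digits at this position
        }
        if (count == capacity) {
            return -1;                      // refuse to overrun the array
        }
        out[count++] = static_cast<int>(value);
        p = end;
        if (*p != ',') {
            break;                          // no separator, so the list should end
        }
        p++;                                // step over the ','
    }
    // Sanity check: the character after the last value must be ']'.
    return (*p == ']') ? static_cast<int>(count) : -1;
}
```

If the sketch expects exactly four values, comparing the return value against 4 catches the case where the Python code and the sketch have drifted out of sync.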

The problem is that I don't know how many elements I will receive each time and it can be highly variable but not bigger than 500.
Is it better to allocate an array of dim 500?

mridolfi:
Is it better to allocate an array of dim 500?

If you need to keep them all in memory at once, then yes, I would declare the array to be a fixed size of 500 if I were writing the code.

My logic is that it doesn't really matter how much memory you have free, as long as you have enough memory to get the things done that you need to do. Yes, using dynamic memory will give you more free memory when the process returns fewer than 500 elements, but so what? If you don't need that memory for something else, then it doesn't matter. But if you do need that memory to hold 500 elements sometimes, what does it hurt having that space allocated all of the time?

If having that array statically defined and always around causes you problems because sometimes you need that memory when you aren't running the process, then make the 500 element fixed size array be on the stack by declaring it locally inside the function. Then, the stack will grow by 1000 bytes when you enter the function, but that memory will be automatically and safely restored when the function returns. But of course, be very sure you never overrun the array, because if you do, you will be corrupting the stack, and that will almost certainly cause your sketch to crash in unpredictable ways.

Now, if you run into problems because the fixed 500 element array is too big, you may be tempted to go back to the dynamic allocation method. But the flaw in that logic is that if there isn't enough memory to hold the full array, then there won't be enough memory to hold the dynamically allocated array if the process should return 500 elements. So you don't really gain anything by going with dynamic allocation.

I've been doing embedded programming for about 35 years, and in that time I've never had to resort to dynamic memory allocation in an embedded microprocessor - I've always been able to work out a way to avoid it. On a big machine with many megabytes of virtual memory, dynamic data structures using dynamic memory allocation can be very useful. But on a small microprocessor with only a few kilobytes of RAM, I stay away from it.

Note that using the String class (as opposed to a character array string) uses lots of dynamically allocated memory behind the scenes. Because of this, there are a lot of people who will also avoid using Strings on an Arduino, but instead will use more difficult but more efficient direct character array manipulation techniques.

With this code

...
 proc.runShellCommand("python /tcpclient.py");
    
   while (proc.available()){
     if((char)proc.read() != ']'){
        array[i] = proc.parseInt();
...

is there any way to measure how fast the Arduino is reading from the Python script?
What would be the fastest solution?

That is likely a rather efficient solution. You can time it by calling micros() before the loop and saving the value, then calling it again after the loop and subtracting that value from the starting value. That will tell you how many microseconds it took.
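One detail worth getting right when subtracting the two readings: micros() is a 32-bit counter on this board and wraps back to zero after roughly 70 minutes. Plain unsigned subtraction still gives the right answer across a single wrap. The helper below (elapsedMicros() is a name chosen here for illustration) uses uint32_t so that it behaves the same on a desktop as on the AVR.

```cpp
#include <stdint.h>

// Elapsed time between two readings of a 32-bit microsecond counter.
// Unsigned arithmetic wraps modulo 2^32, so the result stays correct
// even if the counter rolled over once between the two readings.
uint32_t elapsedMicros(uint32_t startTime, uint32_t endTime) {
    return endTime - startTime;
}

// Intended use in the sketch:
//
//   unsigned long startTime = micros();
//   ... the while (proc.available()) loop ...
//   unsigned long took = elapsedMicros(startTime, micros());
//   Serial.println(took);
```

Keep Serial.print() calls out of the timed section, since printing takes time of its own and would be counted in the result.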

You could try other methods as well, and time them. Some other methods may or may not be faster, but it will likely be only a small difference, and any other method will likely take more code. Some possible other methods:

  • Read the data into a character array, and when you see a comma or bracket, call atoi() on the buffer of data, then reset the buffer.
  • Read the whole string into a character buffer, up through the closing bracket, then use a loop to call strtok() to pull out token strings and call atoi() for each found token
  • Read the whole string into a character buffer, up through the closing bracket, then use sscanf() and an appropriate format string to convert all values at once (won't work with a variable number of elements.)
  • Convert the value while reading: read the first character, subtract '0' from it to get a value from 0 to 9, and save that as the value. Read the next character, subtract '0' from it, add it to ten times the saved value, and store the new value. Repeat for each digit until you get a comma or bracket.

Then there are several variations on the scheme using the String object based functions to read a String from the Process object and break it apart/parse it.

But like I said, I doubt you will get significantly faster times from these methods, and you will write a lot more code.
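To give an idea of the extra code involved, here is a rough version of the strtok()/atoi() variant from the list above. It assumes the whole line is already sitting in a writable, NUL-terminated character buffer; tokenizeInts() is a made-up name.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Splits a writable buffer such as "[10, 20, 30, 40]\n" in place and
// converts each token. Returns how many values were stored in `out`.
// Values beyond `capacity` are ignored rather than written.
size_t tokenizeInts(char* buffer, int* out, size_t capacity) {
    const char* delimiters = "[], \r\n";   // brackets, commas and whitespace
    size_t count = 0;
    for (char* token = strtok(buffer, delimiters);
         token != NULL && count < capacity;
         token = strtok(NULL, delimiters)) {
        out[count++] = atoi(token);
    }
    return count;
}
```

Note that strtok() overwrites the delimiters in the buffer with NUL characters, so the original text is destroyed by the call. Unlike the parseInt() loop, this version also needs enough RAM to hold the entire line at once.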

If speed is the most important factor, the fastest way to transfer the data will be to bypass the Bridge, disable the Linux console on /dev/ttyATH0, and do the communications yourself: have your Linux side script write to /dev/ttyATH0, and your sketch read from Serial1. You will have to start the Linux script through some other method than from the sketch, but it will give you the lowest communications overhead.

Thanks for your detailed reply.