[GSoC 2020] Arduino to MicroPython Transcompiler

Hello Arduino Community,

I am Ashutosh Pandey, a 3rd year computer science student from India. I am one of the two students selected by Arduino for this year's Google Summer of Code.

My project is centered around developing a 'source-to-source compiler', or transcompiler as they are commonly called, that converts Arduino code to MicroPython in order to speed up the porting process for the Portenta board.

My proposal can be found here:

On the advice of my mentor(s) I've started this thread and will regularly update it with my progress. I've already begun working on the coding part (officially it starts on June 1st) and I'll be posting my updates here.

I look forward to hearing your suggestions, feedback and criticism if any.

Thank you for this opportunity :slight_smile:

I imagine that MicroPython (or any form of Python) to Arduino C++ would be more useful.

It's hard to see why anyone would want to convert from a working high-performance program to a working low-performance program. I use Python for all my PC programming, so I am not saying that because I am anti Python.

...R

Thank you for your comment.

This project does involve a specific use case though. There's a new board (Portenta) coming out and it supports MicroPython, among other things. Previously we've seen Arduino try to introduce new things (like FPGAs) into boards, and it was difficult to get traction because at launch there were insufficient examples to get by. The idea is to have more MicroPython code available to get better adoption.

Also, from what I can see from online research (I may well be wrong), converting Python to C++ seems to be somewhat harder than the converse.

The new Nano boards and Portenta have a lot more processing power than the AVR Arduinos, so I guess the loss in performance should be acceptable.

ashutosh_pandey:
Also from what I can see from online research (may as well be wrong) converting python to c++ seems to be somewhat harder than the converse.

Agreed. That is why it would be useful.

The new Nano boards and Portenta have a lot more processing power than the AVR Arduino's, so I guess the loss in performance should be acceptable.

But if the board can run compiled C++ code and if the C++ source code already exists why would anyone want to take the performance hit?

It would be a bit like taking the engine out of Lewis Hamilton's race car and replacing it with the diesel engine from a Mercedes taxi.

To my mind the sorts of things you would use MicroPython for are very different from the sorts of things you would do with an Uno or Mega. I think if I was bringing out a new product I would write a new set of examples that reflect the extra capabilities of the more powerful MCU.

Just my 3 cents.

...R

Robin2:
But if the board can run compiled C++ code and if the C++ source code already exists why would anyone want to take the performance hit?

Because they are either already familiar with Python, or find it easier to learn or accomplish their goal with Python than with C++.

Often the user can't (or doesn't want to) find a complete program that does what they want for their project. They use libraries to make it easier, but they are still writing the sketch themselves. If some of the many Arduino libraries were also available in MicroPython, this would be extremely useful to people wanting to program Arduino boards using MicroPython.

Robin2:
To my mind the sorts of things you would use MicroPython for are very different from the sorts of things you would do with an Uno or Mega.

Good thing, since you have no hope of using MicroPython on those boards.

Robin2:
I think if I was bringing out a new product I would write a new set of examples that reflect the extra capabilities of the more powerful MCU.

Arduino has certainly done that with the SAMD boards. Many of the new Arduino libraries use SAMD-specific features.

In addition to several new libraries, they wrote a couple of tutorials showing examples of machine learning on the Nano 33 BLE, which certainly showcases its capabilities:

MicroPython support will be one thing to showcase the extra capabilities of the Portenta H7.

pert:
If some of the many Arduino libraries were available also in MicroPython, this would be extremely useful to people wanting to program Arduino boards using MicroPython.

I can see the value of that, but I suspect that automatically creating a library that works with MicroPython from an existing Arduino library would be more challenging than simply converting a complete Arduino program to a complete MicroPython program.

However if the purpose of the GSOC project is to create a library-converter then I'm all for it. That was not the impression I got from the OP's posts.

...R

Robin2:
I can see the value of that, but I suspect that automatically creating a library that works with MicroPython from an existing Arduino library would be more challenging than simply converting a complete Arduino program to a complete MicroPython program.

However if the purpose of the GSOC project is to create a library-converter then I'm all for it. That was not the impression I got from the OP's posts.

...R

I apologize for that. The end goal is indeed to translate libraries (and the examples included in the libraries). As you correctly note, making a 1:1 conversion between Arduino and MicroPython automatically is probably not feasible. There will be some manual intervention involved.

However writing all the libraries by hand is also tedious (given there are some 2700+ libraries) and if a large fraction of the process can be automated, this process could be sped up and made less tedious. That is the purpose of the Transcompiler.

The goal is to get something as close to compilable and readable code as possible.

Weekly Update 1:

As specified in the guidelines, I have to post weekly updates on what I'm doing. This is the first weekly update.

Lexical Analyser: The first part of building a compiler (or transpiler) is to build a lexical analyser that takes the input language (in our case, Arduino) and splits it into tokens.

The Arduino language is a subset of C/C++, so first we have to define the various essential parts of the language, such as:

1.) Operators: + - * / & ++ -- == < > etc

2.) Punctuation: ( ), { } , [ ], : , ; etc

3.) ConstantLiterals: Such as decimal numbers, hexadecimal numbers, floating point numbers.

4.) Identifiers: Names of variables and functions, consisting of lowercase and uppercase letters, digits and underscores.

5.) Blank spaces and newlines: \n \t

6.) Keywords: The Arduino Language Reference contains some 200+ keywords which have special meaning to the compiler. I used it to define the keywords. It can be found at: Arduino Reference

7.) Single Line/ Multiline Comments: These need to be recognised by the compiler so that they can be ignored.

8.) Preprocessor directives: Right now we are ignoring these as they are complex to handle and language specific (#include and #define come under this).

I used standard textbook implementations of regular expressions for everything except the Arduino-specific keywords. Currently the scanner.h and scanner.l files are done; I still have to work on the scanner.c file to make the lexical analyser work. The .l and .h files can be found on my GitHub repo here
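To make the token categories above concrete, here is a minimal sketch of a regular-expression tokenizer. It is written in Python purely for illustration and is not the scanner.l file from the project; the keyword set is a tiny hand-picked subset, and the token names and the `tokenize` helper are assumptions of this example.

```python
import re

# Small illustrative subset of keywords (the real reference lists far more).
KEYWORDS = {"void", "int", "if", "else", "for", "while", "return",
            "pinMode", "digitalWrite", "digitalRead", "delay",
            "HIGH", "LOW", "OUTPUT", "INPUT"}

# Order matters: comments before operators, hex before plain integers.
TOKEN_SPEC = [
    ("COMMENT",      r"//[^\n]*|/\*.*?\*/"),
    ("PREPROCESSOR", r"#[^\n]*"),
    ("FLOAT",        r"\d+\.\d+"),
    ("HEX",          r"0[xX][0-9a-fA-F]+"),
    ("INT",          r"\d+"),
    ("IDENTIFIER",   r"[A-Za-z_]\w*"),
    ("OPERATOR",     r"\+\+|--|==|!=|<=|>=|&&|\|\||[-+*/%&|^!<>=]"),
    ("PUNCTUATION",  r"[(){}\[\],;:.]"),
    ("WHITESPACE",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
                    re.DOTALL)


def tokenize(source):
    """Split source text into (kind, text) pairs, skipping blanks, comments and directives."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("WHITESPACE", "COMMENT", "PREPROCESSOR"):
            continue  # ignored, as described in points 5, 7 and 8
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


print(tokenize("digitalWrite(13, HIGH); // LED on"))
```

Keywords are recognised by first matching a generic identifier and then looking it up in a set, which keeps the regular expressions short even when the keyword list grows.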

Thank you for your time and I will continue to update this thread (hopefully more frequently) as I make progress.

Have you considered what happens when you simply change the name of a .ino file to .py and try to run it using Python? I wonder how few changes would actually be needed before it would work?

Might that be a simpler approach?

...R

Apologies for the late reply, got caught up with work.

There are a couple of difficulties in the approach you're describing. The ESP32 does support both C/C++ (Arduino) and MicroPython, and the MicroPython code looks very different from the equivalent Arduino code. Python handles everything from datatypes to indentation very differently.
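As a rough illustration of the point above, the sketch below feeds the standard Blink sketch to Python's own parser and shows that it is rejected before any code could run. The MicroPython version next to it is a hand-written equivalent for comparison, not transpiler output, and the pin number 25 is only a placeholder assumption.

```python
import ast

ARDUINO_BLINK = """
void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
}

void loop() {
  digitalWrite(LED_BUILTIN, HIGH);
  delay(1000);
  digitalWrite(LED_BUILTIN, LOW);
  delay(1000);
}
"""

# Hand-written equivalent; the pin number is a placeholder, not a real board mapping.
MICROPYTHON_BLINK = """
from machine import Pin
import time

led = Pin(25, Pin.OUT)
while True:
    led.value(1)
    time.sleep_ms(1000)
    led.value(0)
    time.sleep_ms(1000)
"""


def parses_as_python(source):
    """Return True if the text is syntactically valid Python (nothing is executed)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(parses_as_python(ARDUINO_BLINK))      # False: type names and braces are not Python syntax
print(parses_as_python(MICROPYTHON_BLINK))  # True
```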

Also there's the question of making the code Pythonic. C/C++ code is written in a different style than how Python is usually written, and Python has a certain accepted coding style that should be followed. Obviously doing this automatically is not trivial, but that's why the project is 3 months long :slight_smile: to overcome such difficulties.

Update: 03/06/2020

I've hit a few snags with the symbol table development, and also while trying to make changes in AVR-GCC. It's mostly due to gaps in my knowledge. Rather than rushing out even more code, I'm trying to restructure everything and make it well organised.

By 10/06/2020 I'll post a more comprehensive comment with hopefully a barebones working transpiler.

Regards,

Ashutosh

ashutosh_pandey:
Also there's the question of making the code Pythonic.

That certainly wouldn't bother me if it worked. There is an expression: "the best is the enemy of the good".

...R

Weekly Update #3

After reading this thread and consulting with the mentors, I have decided to alter my approach and use the existing Clang compiler instead of building everything from scratch. This will lead to a slight change of plans, but should eventually end up being faster and more complete, and allow me to focus on the difficult part of the project.

  • Generating an accurate AST requires significant work and, more importantly, there are tools out there that make the exercise redundant. Clang/LLVM produces a better AST than GCC from what I'm reading, and the documentation is a lot easier to work with. Ref: https://clang.llvm.org/hacking.html

Tasks accomplished -

Task for this week -

  • Generate an accurate AST from Clang, taking care to avoid optimisations that make the output code less readable.

  • Figure out corresponding MicroPython keywords for the Arduino Language ones.
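One way to picture the keyword-mapping task is as a lookup table from Arduino calls to MicroPython expressions. The sketch below is only an illustration under stated assumptions: the `translate_statement` helper, the `pins` dictionary in the generated code and the table contents are hypothetical choices for this example, while `Pin`, `Pin.OUT`, `Pin.IN`, `Pin.value()` and `time.sleep_ms()` are existing MicroPython names.

```python
import re

# Arduino constants and their assumed MicroPython counterparts.
CONSTANTS = {"HIGH": "1", "LOW": "0", "OUTPUT": "Pin.OUT", "INPUT": "Pin.IN"}

# Hypothetical templates; the generated code assumes a dict named `pins`.
CALL_TEMPLATES = {
    "pinMode":      "pins[{0}] = Pin({0}, {1})",
    "digitalWrite": "pins[{0}].value({1})",
    "digitalRead":  "pins[{0}].value()",
    "delay":        "time.sleep_ms({0})",
}

# Matches a single plain call statement such as `delay(1000);`.
CALL_RE = re.compile(r"^\s*(\w+)\s*\(([^()]*)\)\s*;\s*$")


def translate_statement(line):
    """Translate one simple Arduino call statement, or return None if unsupported."""
    match = CALL_RE.match(line)
    if match is None or match.group(1) not in CALL_TEMPLATES:
        return None
    args = [CONSTANTS.get(arg.strip(), arg.strip()) for arg in match.group(2).split(",")]
    try:
        return CALL_TEMPLATES[match.group(1)].format(*args)
    except IndexError:
        return None  # too few arguments for the template


for statement in ["pinMode(13, OUTPUT);", "digitalWrite(13, HIGH);", "delay(1000);"]:
    print(translate_statement(statement))
```

A real translator would work on the AST rather than on text lines, but the table itself could stay much the same.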

Weekly Update #4

Tasks Accomplished:

1.) Set up Clang to generate the AST for standard C++ (without Arduino specific keywords).

2.) Compared the output from AVR GCC AST to Clang AST.

3.) Extracted the output from the Arduino Preprocessor, both by using the Arduino CLI (arduino-cli compile --fqbn arduino:avr:mega MyFirstSketch.ino --preprocess) and by using verbose output in the Arduino IDE.

Tasks pending:

1.) Integrate the Arduino Core API (AVR and SAMD) into the AST using LibTooling. This will enable us to get a complete and correct AST output.

I am maintaining a Google Doc with details of my work, and I'll organise the GitHub repo once I make more progress. The Google Doc can be found here: GSoC Notes

Hi Ashutosh,
I'm the lead on CircuitPython and would be happy to advise on any MicroPython/CircuitPython questions. I think your main challenges will be 1) adapting the C++ API to Python's API style and 2) testing and fixing converted drivers. The Arduino libraries I've seen tend to rely on code execution timing that can't always be relied upon in MicroPython or CircuitPython.

Do you have a prioritized list of libraries? Have you thought about using the existing CircuitPython libraries as a starting point?

~Scott

Hi Scott,

Thank you for your interest in my project! Regarding your questions, it's something like this:

1.) My first goal is to integrate the Arduino Core API with Clang (which parses C++). This is proving to be a little challenging, mainly because what goes on in the Arduino compile process is not very simple to break down, but I'm working on it.

The goal is to first be able to read and map very simple 'Arduino Language' commands like digitalRead(), digitalWrite(), ledPin, HIGH, LOW etc.

Basically all the core keywords defined in the Arduino Language reference. That is my starting point. I want to first be able to translate something like 'Blink' sketch and build my way up from there.

There is a list of Arduino libraries at https://www.arduinolibraries.info and I'll try to start with the more common ones. I haven't made a prioritized list yet, as my focus right now is on proving correctness of output first.

2.) I actually did go through the CircuitPython GitHub repo while drafting my proposal. I'll take a closer look at its libraries now that you've mentioned they can be used as a starting point, because I'll need examples of Arduino code and equivalent Python code to compare outputs once I get the MicroPython output working.

I'll be updating this thread weekly. Hopefully within a few weeks I'll have a rudimentary working version, and from there I can work on optimising it :slight_smile:

Regards,

Ashutosh

Weekly Update #5

1.) Building Makefile for Arduino Preprocessor.

2.) Learning how to use LibClang.

3.) Working on MicroPython Symbol table.

4.) Studying the Arduino Core API

Miscellaneous:

My project mentors asked me to open an issue or make a PR if I saw any discrepancy in the documentation/code in the course of my project work. Keeping that in mind, I opened the following issues and made a few PRs.

1.) Arduino AVR core README: Unlike the Arduino SAMD core, the AVR core did not have a README file with some basic details about the content. So I opened an issue: Issue #349 and made the following PR: PR #350

2.) Fixed a small typo in the Arduino SAMD README: Pull Request #533 (arduino/ArduinoCore-samd)

3.) Arduino Language Reference: The 'Bits and Bytes' section did not have code examples and notes, and in one case the return value was incorrect. I opened the issue: Improving Documentation in Bits and Bytes section of the reference, Issue #748 (arduino/reference-en)

and made the following PRs:

bit()

bitClear()

bitRead()

I will surely be doing more of this alongside my project work if the PRs meet Arduino's standards.
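For readers coming from the Python side, the three helpers named in those PRs can be approximated as below. This is a sketch under assumptions: the snake_case function names are invented for this example, and unlike Arduino's bitClear(), which modifies its argument in place, the Python version returns a new value.

```python
def bit(n):
    """Value of bit n, mirroring Arduino's bit(): 1 shifted left by n."""
    return 1 << n


def bit_read(value, n):
    """Return bit n of value as 0 or 1, mirroring bitRead()."""
    return (value >> n) & 1


def bit_clear(value, n):
    """Return value with bit n cleared (bitClear() changes its argument instead)."""
    return value & ~(1 << n)


print(bit(3), bit_read(0b1010, 1), bit_clear(0b1010, 1))  # 8 1 8
```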

Weekly Update #6

Tasks accomplished:

1.) Built a basic program to parse Clang's textual output. It identifies various programming constructs like punctuators, variables, literals etc.

2.) Compiled a program with the Arduino Core (hardware-independent version). Things like LED_BUILTIN are still broken. I have to come up with replacements for Arduino-specific keywords that Clang can parse, and then translate them to MicroPython.
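To give an idea of what parsing Clang's textual output can look like, here is a minimal sketch that recovers the nesting depth and node kind from each line of an AST dump. The sample text is hand-written and simplified in the style of `clang -Xclang -ast-dump -fsyntax-only`; real dumps carry real addresses and more fields. The `parse_dump` helper is an assumption of this example, not the program from the project.

```python
import re

# Hand-written, simplified sample; addresses and locations are placeholders.
SAMPLE_DUMP = """\
TranslationUnitDecl 0x1 <<invalid sloc>> <invalid sloc>
|-FunctionDecl 0x2 <line:1:1, line:3:1> line:1:6 setup 'void ()'
| `-CompoundStmt 0x3 <col:14, line:3:1>
|   `-CallExpr 0x4 <line:2:3, col:17> 'void'
`-FunctionDecl 0x5 <line:5:1, line:7:1> line:5:6 loop 'void ()'
  `-CompoundStmt 0x6 <col:13, line:7:1>
"""

# The tree-drawing prefix uses two characters per nesting level.
NODE_RE = re.compile(r"^(?P<prefix>[ |`-]*)(?P<kind>[A-Za-z]+)")


def parse_dump(text):
    """Return a list of (depth, node_kind) pairs, one per recognised dump line."""
    nodes = []
    for line in text.splitlines():
        match = NODE_RE.match(line)
        if match is None:
            continue  # blank or unrecognised line
        depth = len(match.group("prefix")) // 2
        nodes.append((depth, match.group("kind")))
    return nodes


for depth, kind in parse_dump(SAMPLE_DUMP):
    print("  " * depth + kind)
```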

Tasks ongoing:

1.) Building a program that contains references to Arduino-specific keywords, to get a proper parse tree.

2.) Building a simple preprocessor tool that removes #include directives, because those are giving me problems.
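A tool of that kind can be very small. The following is a minimal sketch, assuming the directives sit on their own lines; the `strip_includes` name is hypothetical, and replacing each directive with an empty line (rather than deleting it) is a design choice of this example so that line numbers in later diagnostics still match the original sketch.

```python
import re

# Matches `#include ...` with optional spaces before and after the `#`.
INCLUDE_RE = re.compile(r"^\s*#\s*include\b")


def strip_includes(source):
    """Blank out #include lines while leaving every other line untouched."""
    cleaned = ["" if INCLUDE_RE.match(line) else line for line in source.splitlines()]
    return "\n".join(cleaned)


sketch = '#include <Servo.h>\n#define LED 13\nvoid setup() {}'
print(strip_includes(sketch))
```

Other directives such as #define are deliberately left in place here, since only #include is mentioned as the problem.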

Blockers:

None

Weekly Update #7

Tasks accomplished:

1.) Front end of compiler almost done, will upload code as soon as testing is finished and bugs are ironed out. Probably early next week.

2.) Testing of the AST output is underway with programs of different lengths.

Tasks ongoing:

1.) Identifying the hardware specific features of the Portenta H7 for MicroPython.

Blockers:

Details on Portenta MicroPython port are scarce, must ask mentors about this.

Weekly Update #8

Tasks accomplished:

1.) Front end done

Tasks ongoing:

Working on MicroPython commands

Identified hardware specific Portenta features

Blockers:

None

Weekly Update #9

Tasks accomplished:

1.) Built a part of the code generator

2.) Started testing the output

Blockers:

1.) The Clang/LLVM build was taking too long because my laptop doesn't have enough RAM or a powerful enough processor, so I decided to use an AWS EC2 instance for the time being to speed up my work.

2.) Need to procure a Portenta board to test out the code.