Designing a new programming language for Arduino

I think that however the Arduino is programmed there is a problem with the mixed background of its users. I came to it with no experience of C let alone C++ but I had used other languages in the past, mostly various flavours of BASIC including VBA, Z80 assembler and monstrosities such as Mumps. As a result I needed to learn the C/C++ syntax for constructs that I expected to be there but logic is logic whatever language is used.

It is difficult to be certain, but were I in the position of a complete novice then from what I have seen of the proposed new language I don't think that they will be any better off. One problem will always be that, by definition, computer languages are created by experienced computer programmers. What sounds simple and logical to them is gobbledegook to outsiders.

UKHeliBob: I think that however the Arduino is programmed there is a problem with the mixed background of its users. I came to it with no experience of C let alone C++ but I had used other languages in the past, mostly various flavours of BASIC including VBA, Z80 assembler and monstrosities such as Mumps. As a result I needed to learn the C/C++ syntax for constructs that I expected to be there but logic is logic whatever language is used.

I think your situation is slightly different because you were already familiar with programming logic and concepts before you started with C/C++. For many people here, C++ is the first language that they learn, and I think it is a bad choice for a first language (even a bit of prior experience with Basic would help a lot, in my opinion).

It is difficult to be certain, but were I in the position of a complete novice then from what I have seen of the proposed new language I don't think that they will be any better off. One problem will always be that, by definition, computer languages are created by experienced computer programmers. What sounds simple and logical to them is gobbledegook to outsiders.

This is also a fair point. Hopefully I can make it more useful and easy to understand, which is why I really appreciate any feedback.

YemSalat: I will probably have a go at writing a very basic transpiler in the next few days,

I forgot to say that you also need to write the book that explains how to use your language.

Actually I think you should write the book first and then make a language that complies with the book. As you mentioned Ruby earlier you may be familiar with the "pickaxe" book - Programming Ruby by Dave Thomas. That is the sort of thing I have in mind - but maybe yours would not need 800 pages for Version 0.1

IMHO without proper documentation there is no point writing a language aimed at making things easier. And by proper documentation I mean explanations of why you should do things and what is the purpose of each thing, not just a few examples that each show a single "how" to use a feature with no attempt to show alternatives or exceptions.

...R

I really appreciate any feedback.

Let me know when you have something to test and I will assume my alternative identity of "British Standard Idiot" and put it to the test.

@Robin2 We'll see how it goes, I think version 0.1 will fit on a couple pages :)

@UKHeliBob hehe, that kind of attitude would certainly be helpful! I will post a separate topic in the "Projects" section once I have something to show.

A small update. So far I have settled on the tools: I will be using Jison (http://jison.org) to generate the parser. Jison is a JavaScript port of the Bison compiler-compiler. I have been working mostly with nodejs in the past couple of years, so I feel most comfortable writing the parser in JS. It also solves the cross-platform compatibility issue, as it will work across all operating systems out of the box.

The parser will build a syntax tree, then I will convert it to C/C++ code which can be just copy-pasted into the Arduino IDE.

For the first "milestone" I am planning to accomplish the following:

  • optional (inferred) types for variable declarations
  • some default types - int, float, long (Strings might have to wait until the next one)
  • if / else statements
  • for loops
  • functions
  • some basic built-in functions for pin manipulation

So, this would be just a 'proof of concept' version, nothing too serious. I will upload it to github once I have something to upload :)

No timeframe for now, but hopefully I will present something by the end of the week.

YemSalat: @Robin2 We'll see how it goes, I think version 0.1 will fit on a couple pages :)

Within reason, the more pages in the manual the better. Most open source software lacks adequate documentation.

...R

@Robin2, if the project goes well - I will definitely make sure to write proper documentation and prepare some starter resources.

A quick update.

I switched from Jison to pegjs (http://pegjs.org) for the compiler, as it supports PEG notation which is much simpler to write. There are also some really useful PEG grammar examples here: https://github.com/PhilippeSigaud/Pegged/wiki/Grammar-Examples - even a complete grammar for C!

So far I added the following features to the language parser:

  • declaring variables (6 types so far: byte, float, int, long, str, bool)
  • 'optional' types
  • optional semicolons
  • assignment / comparison / math operations
  • if and switch statements
  • for loops
  • literals for time (like: 1m 15s - converts to an integer containing the number of milliseconds)
  • wait and every constructs for basic timing (like: every 5s { /* ... */ })

I am yet to build the compiler itself (I am also planning a separate evaluator), but the parser seems to work pretty well so far (couple small issues, but I know how to fix them).

Here is a simple example, this expression:

a = 5
b = (a + 2) * 3

Will generate the following syntax tree (in JSON format):

{
   "$": "COMPILE",
   "body": [
      {
         "$": "EXPRESSION",
         "expression": {
            "$": "ASSIGNMENT",
            "operator": "=",
            "left": {
               "$": "IDENTIFIER",
               "name": "a"
            },
            "right": {
               "$": "LITERAL",
               "value": 5,
               "type": "int"
            }
         }
      },
      {
         "$": "EXPRESSION",
         "expression": {
            "$": "ASSIGNMENT",
            "operator": "=",
            "left": {
               "$": "IDENTIFIER",
               "name": "b"
            },
            "right": {
               "$": "MATH_EXPRESSION",
               "operator": "*",
               "left": {
                  "$": "MATH_EXPRESSION",
                  "operator": "+",
                  "left": {
                     "$": "IDENTIFIER",
                     "name": "a"
                  },
                  "right": {
                     "$": "LITERAL",
                     "value": 2,
                     "type": "int"
                  }
               },
               "right": {
                  "$": "LITERAL",
                  "value": 3,
                  "type": "int"
               }
            }
         }
      }
   ]
}
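Since the tree maps almost one-to-one onto C/C++, a code generator for it can be quite small. The following is only a rough Python sketch (the real generator is planned in JavaScript and is not written yet). The node names come from the tree above; the function names and the choice to parenthesise every math expression are assumptions, and variable declarations are left out.

```python
def emit_expr(node: dict) -> str:
    """Turn one expression node into a C/C++ expression string."""
    kind = node["$"]
    if kind == "IDENTIFIER":
        return node["name"]
    if kind == "LITERAL":
        return str(node["value"])  # only numeric literals are handled here
    if kind == "MATH_EXPRESSION":
        # Always parenthesise so operator precedence never has to be tracked.
        return f"({emit_expr(node['left'])} {node['operator']} {emit_expr(node['right'])})"
    if kind == "ASSIGNMENT":
        return f"{emit_expr(node['left'])} {node['operator']} {emit_expr(node['right'])}"
    raise ValueError(f"unsupported node type: {kind}")


def emit_program(tree: dict) -> str:
    """Emit one C/C++ statement per top-level EXPRESSION node."""
    if tree["$"] != "COMPILE":
        raise ValueError("expected a COMPILE root node")
    return "\n".join(emit_expr(s["expression"]) + ";" for s in tree["body"])


# Helpers that rebuild the tree shown above without the JSON noise.
def ident(name):
    return {"$": "IDENTIFIER", "name": name}

def lit(value):
    return {"$": "LITERAL", "value": value, "type": "int"}

def binop(op, left, right):
    return {"$": "MATH_EXPRESSION", "operator": op, "left": left, "right": right}

def assign(name, right):
    return {"$": "EXPRESSION",
            "expression": {"$": "ASSIGNMENT", "operator": "=",
                           "left": ident(name), "right": right}}

tree = {"$": "COMPILE",
        "body": [assign("a", lit(5)),
                 assign("b", binop("*", binop("+", ident("a"), lit(2)), lit(3)))]}

print(emit_program(tree))
# a = 5;
# b = ((a + 2) * 3);
```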

The next thing that I am quite excited about is that I started creating an IDE for the language, and I just keep realizing how useful it can be for beginners. I am using Ace (https://ace.c9.io/) for the text editor and it already has a ton of useful features built in (like autocomplete!), but what is even more important - I can evaluate all the code in 'live' mode and detect errors right away, without having to 'compile'. Ace has some nice APIs for code highlighting, etc.

Granted, the parser is very simple so far, but I did a few tests and I am able to run it in a web worker (which runs a script on a background thread, as JS is otherwise single-threaded by nature) - and it parses 1000 lines of code instantly as I type (I will share a demo soon).

I am not going to spend too much time on the IDE right now, just gonna build some basic stuff around the text editor so it's easy to test the language when the first version is out.

Again, no time frame so far, but aiming to release something by the end of the weekend or early next week.

I will also definitely need help with optimizing the resulting C/C++ code, hopefully there are enough people around here who can help with that :)

If those two lines require a tree like that the world's forests will soon be devastated. :)

...R

Robin2: If those two lines require a tree like that the world's forests will soon be devastated. :)

...R

Hehe :) that's just because the language is low level and not dynamic, so all the instructions are quite explicit and can almost always be converted straight to C/C++ without much modification. Anything more 'sophisticated' would require a more complex runtime which we just can't afford on the Arduino.
(Also the JSON format makes it look bigger than it is)

Just for the heck of it, I generated parse trees for the same expression in Python and C.

Original expression:

a = 5
b = (a + 2) * 3

Python AST (python has a built-in AST generator)

Module(
    None,
    Stmt(
      Assign(
        AssName('a', 'OP_ASSIGN'),
        Const(5)
      ),
      Assign(
        AssName('b', 'OP_ASSIGN'),
        Mul(
          Add(
            Name('a'),
            Const(2)
          ),
          Const(3)
        )
      )
    )
  )

C (generated with clang)

TranslationUnitDecl 0x10302d6c0 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x10302dc00 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
|-TypedefDecl 0x10302dc60 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
|-TypedefDecl 0x10302e020 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list '__va_list_tag [1]'
`-FunctionDecl 0x10302e0c0 <c1.cpp:1:1, line:6:1> line:1:5 main 'int (void)'
  `-CompoundStmt 0x10302e3d0 <line:2:1, line:6:1>
    |-DeclStmt 0x10302e248 <line:3:3, col:12>
    | `-VarDecl 0x10302e1d0 <col:3, col:11> col:7 used a 'int' cinit
    |   `-IntegerLiteral 0x10302e228 <col:11> 'int' 5
    `-DeclStmt 0x10302e3b8 <line:4:3, col:22>
      `-VarDecl 0x10302e270 <col:3, col:21> col:7 b 'int' cinit
        `-BinaryOperator 0x10302e390 <col:11, col:21> 'int' '*'
          |-ParenExpr 0x10302e350 <col:11, col:17> 'int'
          | `-BinaryOperator 0x10302e328 <col:12, col:16> 'int' '+'
          |   |-ImplicitCastExpr 0x10302e310 <col:12> 'int' <LValueToRValue>
          |   | `-DeclRefExpr 0x10302e2c8 <col:12> 'int' lvalue Var 0x10302e1d0 'a' 'int'
          |   `-IntegerLiteral 0x10302e2f0 <col:16> 'int' 2
          `-IntegerLiteral 0x10302e370 <col:21> 'int' 3

As you can see, C generates a more ‘verbose’ AST, but it can be converted almost directly into assembly. While Python’s AST is more concise - it requires a massive runtime to support all of the entities.

A small sneak peek of what the error highlighting looks like in the IDE I am making:

Yet another update:

So far I am happy with how the project is progressing. I think it will take a little bit more time until the first release, as now I am not only building the compiler but also the IDE which comes with it.

Compiler progress: I have built an evaluator (code analyzer), which can check whether the types are correct, whether all variables have been declared, etc. It can also put additional info into the parse tree (e.g. types for inferred declarations). It still needs a few tweaks, but overall its initial version is complete.

Right now the compiler consists of three main parts:

  • The Parser - Checks the code syntax and generates the first parse tree

  • The Evaluator - Analyzes code semantics and generates the second parse tree

  • Code Generator - (I am yet to write this one) - it will produce C++ code for Arduino; As I said before I may need help from the community in the future in order to optimize it properly.
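As an illustration of the middle stage, here is a hedged Python sketch of what such an evaluator pass could look like on the JSON tree format posted earlier. It is not the actual implementation: the widening order of the numeric types, the "inferred_type" annotation, and the error messages are all assumptions, and it only understands top-level assignments to plain identifiers.

```python
# Assumed widening order for numeric types; not taken from the post.
NUMERIC_RANK = {"byte": 0, "int": 1, "long": 2, "float": 3}


def infer_type(node, symbols, errors):
    """Return the type of an expression node, or None if it cannot be typed."""
    kind = node["$"]
    if kind == "LITERAL":
        return node["type"]
    if kind == "IDENTIFIER":
        if node["name"] not in symbols:
            errors.append(f"undefined variable '{node['name']}'")
            return None
        return symbols[node["name"]]
    if kind == "MATH_EXPRESSION":
        left = infer_type(node["left"], symbols, errors)
        right = infer_type(node["right"], symbols, errors)
        if left is None or right is None:
            return None
        if left in NUMERIC_RANK and right in NUMERIC_RANK:
            return max(left, right, key=NUMERIC_RANK.__getitem__)
        errors.append(f"cannot apply '{node['operator']}' to {left} and {right}")
        return None
    raise ValueError(f"unsupported node type: {kind}")


def analyze(tree):
    """Check every assignment and record inferred types on the tree."""
    symbols, errors = {}, []
    for statement in tree["body"]:
        assignment = statement["expression"]
        name = assignment["left"]["name"]
        value_type = infer_type(assignment["right"], symbols, errors)
        if value_type is None:
            continue
        if name not in symbols:
            symbols[name] = value_type
            assignment["inferred_type"] = value_type  # hypothetical annotation
        elif symbols[name] != value_type:
            errors.append(f"'{name}' is {symbols[name]}, cannot assign {value_type}")
    return symbols, errors


# Small helpers to build trees in the format shown earlier in the thread.
def ident(name):
    return {"$": "IDENTIFIER", "name": name}

def lit(value, type_):
    return {"$": "LITERAL", "value": value, "type": type_}

def binop(op, left, right):
    return {"$": "MATH_EXPRESSION", "operator": op, "left": left, "right": right}

def assign(name, right):
    return {"$": "EXPRESSION",
            "expression": {"$": "ASSIGNMENT", "operator": "=",
                           "left": ident(name), "right": right}}

good = {"$": "COMPILE", "body": [
    assign("a", lit(5, "int")),
    assign("b", binop("*", binop("+", ident("a"), lit(2, "int")), lit(3, "int")))]}
bad = {"$": "COMPILE", "body": [assign("b", binop("+", ident("c"), lit(1, "int")))]}

print(analyze(good))  # ({'a': 'int', 'b': 'int'}, [])
print(analyze(bad))   # ({}, ["undefined variable 'c'"])
```

The errors list is what an IDE could use for 'live' highlighting, while the annotated tree is what a code generator would consume.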

IDE progress: The IDE is going well, especially since I added the analyzer - it can now point out errors like undefined variables, incorrect types, etc. It's really nice to have a compiler 'built-in' because we can check not only the syntax, but also the semantics of the code, and all that can be done 'live'.

Besides that it allows for some other great functionality in the future, for example, we could develop an Arduino emulator which can be used for testing code before uploading to the actual device.

I also checked out some other similar projects like https://codebender.cc/, but I think my approach has some benefits over what they are doing.

Other stuff: Putting aside the compiler implementation details, here is a somewhat interesting 'quirk' that I stumbled upon while developing the language.

As I mentioned before I wanted to add 'optional' types to the language, meaning that the type of the variable can be inferred from its value during assignment. E.g. ( b = 4 ) - here b is obviously an int. I also made semicolons optional as I think it can make the code look more 'approachable' to beginners.

But take a look at this situation:

a =     // accidentally forgot to assign value
        // (real life example, I did this while testing the IDE)
b = 3

in a system with no declaration types and no semicolons this converts into:

int a = b = 3

Which can lead to some weird errors, and the compiler also would not be able to catch such an error.

I decided to fix this by not allowing multi-line assignments, so both left and right parts of the assignment must be on the same line. I think that this is a good enough fix, but this made me think about whether optional types will bring more benefits than problems. I will leave them in for now, but this is something that we can discuss in the future.
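The same-line rule can be illustrated with a rough line-based check. This is only a Python sketch under simplifying assumptions: the real rule would sit in the PEG grammar rather than in a regular expression, compound operators such as += are ignored, and the function name is made up for the example.

```python
import re

# An optional type, an identifier, '=', then nothing but whitespace or a
# '//' comment up to the end of the line. (?!=) keeps '==' from matching.
_DANGLING = re.compile(r"^\s*(?:\w+\s+)?\w+\s*=(?!=)\s*(?://.*)?$")


def find_dangling_assignments(source: str) -> list:
    """Return 1-based line numbers where an assignment has no value on its line."""
    return [number
            for number, line in enumerate(source.splitlines(), start=1)
            if _DANGLING.match(line)]


source = """a =     // accidentally forgot to assign value
        // (real life example, I did this while testing the IDE)
b = 3
"""

print(find_dangling_assignments(source))  # [1]
```

Reporting line 1 as an error here is what stops the two statements from silently merging into a single chained assignment.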

Ideas: I also have a couple ideas that I want to add to the language.

For example, 'short-hand functions' (or 'dynamic variables') which could be declared with -> operator, and are really just a shorter way to express a function with no arguments

So you can write something like this:

a = 1
b = 2
c -> a + b
serial_println( "Sum is " + c ) // Sum is 3

a = 3
serial_println( "Sum is " + c ) // Sum is 5
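One way to read the -> operator is as sugar for a zero-argument function that is called on every use. The Python analogy below is only meant to show that reading; it says nothing about how the compiler would actually lower it to C++.

```python
a = 1
b = 2

# 'c -> a + b' read as: c is a zero-argument function, not a stored value,
# so it is re-evaluated each time it is used.
def c() -> int:
    return a + b

first = f"Sum is {c()}"
a = 3
second = f"Sum is {c()}"

print(first)   # Sum is 3
print(second)  # Sum is 5
```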

Also sub-string expressions, which are kind of like string 'templates' and act as an alias for sprintf() :

foo = "Hello"
serial_println( "{foo} World!" ) // Hello World!
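A possible lowering of such a template to a sprintf() call is sketched below in Python. The type-to-conversion table is an assumption (it treats str as a C string), and float is left out on purpose because the default printf family in avr-libc does not format floats. An unknown variable name simply raises KeyError in this sketch.

```python
import re

# printf conversion for each language type; this mapping is an assumption.
FORMAT_FOR_TYPE = {"byte": "%d", "int": "%d", "long": "%ld", "str": "%s"}

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def lower_template(template: str, symbol_types: dict):
    """Split '{name}' placeholders into a printf format string and an argument list."""
    args = []

    def replace(match):
        name = match.group(1)
        args.append(name)
        return FORMAT_FOR_TYPE[symbol_types[name]]

    # Escape literal '%' first so it survives the printf call unchanged.
    fmt = _PLACEHOLDER.sub(replace, template.replace("%", "%%"))
    return fmt, args


print(lower_template("{foo} World!", {"foo": "str"}))
# ('%s World!', ['foo'])
```

The returned pair is enough to emit something like a sprintf() call with the format string followed by the named variables.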

Anyways, it's important to focus on getting the basics right for now :) I will post a new update soon.

PS I updated the topic title since it has become more of a 'developer notes' kind of post.

Why not use the Pascal trick (inherited from ALGOL) of distinguishing an assignment ( x := y ) from a comparison ( if ( x = y) )?

Groove: Why not use the Pascal trick (inherited from ALGOL) of distinguishing an assignment ( x := y ) from a comparison ( if ( x = y) )?

or the C trick of distinguishing an assignment ( x = y ) from a comparison ( if ( x == y) )?

I'm sorry, but if a beginner cannot use = and == properly then they are unlikely to use := and = properly.

The thing is, at school you're taught that = is part of an equation ( left side of = sign is identical to right hand side), so in C, "x = x + 1;" is clearly arithmetically nonsensical.

in C, "x = x + 1;" is clearly arithmetically nonsensical.

True, but I am not convinced that beginners worry about such niceties. Personally I have never had any problem understanding the x = x + 1 construction and have never given a single thought to the fact that both sides of the = are normally identical. My brain simply says that as a result of the operation x will now equal its original value plus 1.

at school you're taught that = is part of an equation ( left side of = sign is identical to right hand side)

( if ( x = y) )

Here both sides of the = may or may not be identical. Where does that leave the notion that both sides must be identical ?

Here both sides of the = may or may not be identical.

You've just asked the question "is the left side equal to the right?".

Personally I have never had any problem understanding the x = x + 1 construction

It was one of the hardest things I had to overcome, converting from ALGOL and Pascal.

in a system with no declaration types and no semicolons this converts into:

Why wouldn't you treat the end-of-line as the end of a statement if there is no semi-colon?

It is possible to have continuation character for the odd occasions when code needs to run on to a second line.

Indeed, if semi-colons don't matter why not just run over the code and delete them and always use the EOL as the delimiter.

...R

Groove: Why not use the Pascal trick (inherited from ALGOL) of distinguishing an assignment ( x := y ) from a comparison ( if ( x = y) )?

I have been thinking about this myself. I am not entirely convinced this is a good solution, as I think it makes the language a little bit more 'cryptic'. I was playing with the Go language a couple of months back, it is similar to Pascal in this respect, and guess which question comes up every so often in the Go community ( http://stackoverflow.com/questions/16521472/assignment-operator-in-go-language )

UKHeliBob: or the C trick of distinguishing an assignment ( x = y ) from a comparison ( if ( x == y) )?

This is what I am doing now. Overall I would prefer to keep the syntax more similar to C (since it is so popular among embedded systems programmers)

I remember I was using some program back in the day to generate simple animations, I can't remember what was the name of the program, but it had a simple scripting language built into it, and in that language the '=' sign was treated differently depending on the context.

So for example, this is assignment: a = 23. But this is an equality check: if ( a = 17 ), since you rarely want to actually assign something inside an if statement.

This makes sense when you translate the code into English and replace '=' with 'is', so you get:

a is 23
if a is 17

I remember that I never had problems with this concept, so perhaps I could do something similar? I am not a big fan of context-dependent behaviour though.
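For the sake of discussion, here is a small Python sketch of what the context-dependent reading would mean in a code generator: the same '=' node is emitted as == when it sits inside a condition. The IF node and its "test" field are hypothetical; the other node names follow the tree format posted earlier.

```python
def emit(node: dict, in_condition: bool = False) -> str:
    """Emit C/C++ text, reading '=' as comparison only inside a condition."""
    kind = node["$"]
    if kind == "IDENTIFIER":
        return node["name"]
    if kind == "LITERAL":
        return str(node["value"])
    if kind == "ASSIGNMENT":
        operator = "==" if in_condition else "="
        left = emit(node["left"], in_condition)
        right = emit(node["right"], in_condition)
        return f"{left} {operator} {right}"
    if kind == "IF":  # hypothetical node type, not from the post
        return f"if ({emit(node['test'], in_condition=True)})"
    raise ValueError(f"unsupported node type: {kind}")


def equals(name, value):
    """Build the node the parser would produce for '<name> = <value>'."""
    return {"$": "ASSIGNMENT", "operator": "=",
            "left": {"$": "IDENTIFIER", "name": name},
            "right": {"$": "LITERAL", "value": value, "type": "int"}}


print(emit(equals("a", 23)))                          # a = 23
print(emit({"$": "IF", "test": equals("a", 17)}))     # if (a == 17)
```

The cost is exactly the context dependence mentioned above: the meaning of a node can no longer be decided by looking at the node alone.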

Robin2: Why wouldn't you treat the end-of-line as the end of a statement if there is no semi-colon?

It is possible to have continuation character for the odd occasions when code needs to run on to a second line.

Indeed, if semi-colons don't matter why not just run over the code and delete them and always use the EOL as the delimiter.

...R

That's pretty much what is happening now. Assignment expressions were just one of the edge cases where this can actually break things. All other expressions seem to be 'multi-line safe'. E.g. you can also define multiple variables like so:

int a = 1,
    b = 2,
    c = 3

Why not use the Pascal trick (inherited from ALGOL) of distinguishing an assignment ( x := y ) from a comparison ( if ( x = y) )?

Two symbols are two symbols, I have no problem with ==. At the same time, context within an if( ) makes sense too, I've never written anything where I assigned a value within an if ( ) statement, that is just too convoluted for me. Write it as two lines, let the compiler deal with it.

Especially where : needs a Shift press too.

CrossRoads: Especially where : needs a Shift press too.

Agree 100%. That's also why I hate underscore characters in code.

...R