High level language -> C++ compiler

Hi all,

If you want to use a high level language on a micro-controller, there aren't a lot of options. There is MicroPython and Squirrel, but they are both interpreted and relatively slow. They also make heavy use of heap memory. C++ is perhaps the best option if you want a highish-level object-orientated language, but it is a relatively verbose and difficult language (compared to Python and so on). Header guards, maintaining both source & header files . . . even class access modifiers. After working with a high level language, these all seem so unnecessary.

I've been using Vala lately for some desktop programming tasks. It's a nice language to use. It's a language that has C# inspired syntax and it compiles down to C code. It's fast to develop with, safer than C, and the executables it produces are fast! Wouldn't it be nice to have an equivalent for Arduino?

What I'd like to do is create a language that contains all the niceties and built-in safety of Python and other high level languages. It must compile down to a subset of C++ and it must be fully (mostly?) compatible with the mountains of other C/C++ code out there for Arduino.

To make it compatible with C++, the source-to-source compiler wouldn't be able to do any type checking - that'd be left to the C++ compiler. It probably wouldn't be that difficult really. C++ already has all the features, it's just a matter of building a parser & code generator that can convert this hypothetical syntactic sugar-code into C++.

Does this sound like a good idea, or a waste of time? Maybe something like this exists already?

Writing programs for a microprocessor is very different from writing programs for a PC which is many times faster, has many times more memory and has an operating system.

High level languages such as Python take advantage of all those features.

A program for a microprocessor must be lean and mean.

I only use C/C++ when I have no choice. I use Python or Ruby for PC programming. I am not aware of anything apart from C/C++ that allows sufficient low level control of the Arduino's features. Assembler would, but that is just too tedious.

IMHO if you want to improve the life of Arduino programmers you need to start with C/C++ and figure out what syntactic sugar would be helpful - rather than force another language into the small shoes of an Arduino.

...R

Yeah, you're right. The problem I'm trying to solve is partly syntactic sugar, but another part is the process of developing C++ code. C++ pretty much requires that you maintain a header and a source file for each class or logical module. Granted some people do just write HPP files and stick it all in the header, but that massively bogs down compile time if you ever need to tweak something. All other languages that I've used only have a single source file, with definitions and implementations all lumped together - and I like that. Simply removing the need to include headers and write header guards would be a small, but welcome convenience for me.

Here's an example of a feature that I love from Vala. Enums can be converted to a human readable string just by calling .to_string()

enum COLOR
{
RED,
GREEN,
BLUE
};

...

COLOR color = COLOR.RED;
printf(color.to_string());

This would print out 'COLOR_RED'. You can override the to_string function to make it produce your own text. This is something that I can see being extremely useful in an embedded environment.

Camel:
but another part is the process of developing C++ code. C++ pretty much requires that you maintain a header and a source file for each class or logical module.

That is generally not needed with an Arduino program.

I may easily be wrong, but I get the feeling your thinking is running in PC programming grooves rather than being adapted to 2K of SRAM.

...R

Not at all! This whole idea is geared towards micro-controllers. I keep mentioning Vala because it's high-level and compiles down into C with hardly any bloat at all. It's all about reducing verbosity and programmer workload without blowing out in generated code size. Vala proves that it's possible and that using C/C++ as a stepping stone is a good way to do that.

I could write a thousand lines of C++ in several files, or with this tool I write 500 lines of code in a couple of files and it generates the equivalent C++ for me. That's the idea.

Here are my efforts so far.

I've called it CoffeeCat - named after the Asian palm civet that poops out Kopi Luwak coffee. Beans go in, coffee-plus-poo (CPP) comes out :) Also to raise awareness of the terrible mistreatment of these civets.

Here are some of the key features that I'm proposing.

  • No pointers, no explicit references, no new/delete or malloc/free.
  • No arrays. Not C-styled arrays anyway. I propose the use of lists where the maximum length is specified and pre-allocated on the stack or in global space. This can be achieved in C++ using templates (see etk::List & example). etk::List provides bounds checking and gives the illusion of dynamic growth & shrinkage.
  • No implicit deep-copying, except for atomic types (chars, ints, bools). All object instances are references. To create a copy of an object you must use a copy keyword. I think there will be a performance benefit from this, since people often deep-copy willy-nilly without thinking about it.
  • In the generated C++, all objects passed as parameters to functions are passed as const references by default so functions cannot sneakily modify objects passed to them. To make CoffeeCat pass a non-const modifiable reference you must explicitly use the 'out' keyword. This makes it crystal clear in the source code that the function could change the object you are passing it. To pass a copy of an object, you would use the copy keyword.
  • No access modifiers. Public, protected and private are gone. We're after simplicity, and access modifiers introduce a whole new level of complexity for what I think isn't much benefit. If someone shouldn't touch a member variable, then document it.
  • No exceptions. No exceptions.
  • There will be a string type and it will be easy to convert atomic types to and from strings. strings will not use dynamic memory and will probably use etk::StaticString under the hood.
  • Enums can have member functions. This is surprisingly useful in C#/Vala and saves a lot of time.
  • Possibly getting a bit ambitious now, but I'd like to have tuples and dictionaries. C++14 has these already, but they are implemented in a way that isn't friendly towards tiny micro-controllers. It should be possible for the CoffeeCat code generator to build some micro-controller friendly code that does this.
  • C++ inlining. Sometimes it's necessary to write to a specific address (pretty much all low level hardware stuff comes to mind). That would be impossible in CoffeeCat due to lack of pointers, so CoffeeCat will use the extern keyword to pass C++ code through. Or you could implement driver level code in C++ and call it from CoffeeCat.
  • I'm thinking no templates either . . . not sure about this yet. There will probably be no templates for a while anyway simply due to the work involved in parsing them.
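To make the "lists with a pre-allocated maximum length" idea concrete, here is a rough sketch of a fixed-capacity list template. StaticList and its member names are assumptions for illustration only; this is not the etk::List API, just the same general technique.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity list; NOT the etk::List API.
// Storage lives inside the object, so it sits on the stack or in global space.
template <typename T, std::size_t Capacity>
class StaticList
{
public:
    // Appends a value; returns false instead of overflowing when full.
    bool append(const T& value)
    {
        if (count_ >= Capacity)
            return false;
        items_[count_++] = value;
        return true;
    }

    // Bounds-checked read: copies the element into 'result' and returns true,
    // or returns false if the index is out of range.
    bool get(std::size_t index, T& result) const
    {
        if (index >= count_)
            return false;
        result = items_[index];
        return true;
    }

    std::size_t size() const { return count_; }
    std::size_t capacity() const { return Capacity; }

private:
    T items_[Capacity] = {};
    std::size_t count_ = 0;
};
```

Because failures are reported through return values, nothing here needs exceptions or the heap, which fits the rules in the list above.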

All other C++ features will be preserved, provided the CoffeeCat parser can parse it. All the normal C++ control statements such as for loops, switch case, while, do while, if else if else will all be in there. goto won't exist.

Does this all sound reasonable??

Camel:
Does this all sound reasonable??

I don't know. I think you may have a marketing problem.

I know enough C/C++ to program my Arduinos. Why should I even spend time looking at your stuff to see if I would like it?

How would someone who does not yet know C/C++ find your program so that s/he could have an easier intro to Arduino programming?

To attract the attention of existing Arduino users maybe you could take the 3rd example in Serial Input Basics and reproduce it in your new code so we can see the two side by side.

Or make a better pair of examples if you prefer.

...R

Robin2:
I don't know. I think you may have a marketing problem.

That's OK :) This is a hobby project and admittedly a bit of a pipe dream, but for what it's worth . . .

Robin2:
I know enough C/C++ to program my Arduinos. Why should I even spend time looking at your stuff to see if I would like it?

Perhaps you prefer the syntax, or maybe it saves you time? I hope it's feature rich enough to give you the tools you need without causing clutter and confusion as C++ often does.

As much as I love C++, here's an interesting page. Unlike the author of that article, I don't believe C++ is bad, but I do think it's easy to write bad code with. I expect CoffeeCat will allow users to focus on solving their problem rather than distract them with language quirks. It's easier and safer than C/C++, but just as small and fast.

Robin2:
How would someone who does not yet know C/C++ find your program so that s/he could have an easier intro to Arduino programming?

Yep, that's going to be a problem :) If it works and fills a void, then it will be adopted as people discover and share. That's partly why I started this thread - to get some feedback to see if the idea resonates.

I think it's going to take a while just to get it working, so I'm not too worried about spreading the word just yet. Examples will come in due course. I haven't gotten it parsing classes or designator lists yet, so still a long way to go before basic functionality is achieved.

Oh! And variables must be initialised when they are declared. That's another mandatory rule in CoffeeCat. I've been stung countless times by bugs due to variables not being initialised. It's so easy to do, especially before C++11 when you could only initialise members in the constructor.
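As a small illustration of what the generated C++ could lean on, C++11 default member initialisers give every member a value at the point of declaration. SensorState is a made-up type for this sketch.

```cpp
// Hypothetical type: every member has a default member initialiser (C++11),
// so no constructor can forget to set one.
struct SensorState
{
    int  reading   = 0;
    bool valid     = false;
    long timestamp = 0;
};
```

Before C++11 the same guarantee needed a hand-written constructor that listed every member, which is exactly the step that gets forgotten.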

There's at least a million and one ways to chase your tail with C/C++. I love the language, but you've gotta be careful!

I don't think C/C++ is "bad" - just a PITA.

But you have not yet convinced me that I would get any net value for the time spent learning your system starting from my existing knowledge base.

Post an example.

...R

Ok, well, none of this has actually been implemented so this is all hypothetical. But here's a string example

In C/C++

int a = 50;
char buffer[20];
sprintf(buffer, "%i", a);
Serial1.println(buffer);

In CoffeeCat

int a = 50
Serial1.println(a.to_string<20>())

Enums in C

enum STATES
{
    LOW,
    MID,
    HIGH
};

...
char buffer[128];
switch(state)
{
    case LOW:
        sprintf(buffer, "State is LOW");
        break;
    case MID:
        sprintf(buffer, "State is MID");
        break;
    case HIGH:
        sprintf(buffer, "State is HIGH");
        break;
}

Serial1.println(buffer);

In CoffeeCat

enum STATE:
    LOW,
    MID,
    HIGH

Serial1.print("State is ")
Serial1.println(state.to_string<20>())

In C++

void foo(Object& o)
{
    o.member = 50;
}
...
foo(myobj);

Looking only at the call site, how do you know if foo has changed myobj? You don't!

In CoffeeCat

def void foo(out var o):
    o.member = 50

foo(out myobj)

By enforcing the use of the out keyword at the call site, it's clear that myobj may be modified by foo.
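One way the same call-site visibility could be expressed in plain C++ is a thin wrapper type. This is only a sketch of a possible lowering; Out, out() and read_member are invented names, not part of CoffeeCat or the Arduino core.

```cpp
// Hypothetical lowering of the 'out' keyword; all names are illustrative.
struct Object
{
    int member = 0;
};

// Thin wrapper that marks an argument as modifiable at the call site.
template <typename T>
struct Out
{
    explicit Out(T& target) : ref(target) {}
    T& ref;
};

// Helper so callers can write out(x) instead of Out<Object>(x).
template <typename T>
Out<T> out(T& target)
{
    return Out<T>(target);
}

// Default parameter passing: const reference, cannot modify the caller's object.
int read_member(const Object& o)
{
    return o.member;
}

// 'out' parameter: the explicit constructor means foo(myobj) will not compile,
// so the caller has to write foo(out(myobj)).
void foo(Out<Object> o)
{
    o.ref.member = 50;
}
```

With this, `foo(out(myobj));` compiles and sets `myobj.member` to 50, while a bare `foo(myobj);` is rejected by the compiler.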

hmm, the enum thing at second glance doesn't seem overly helpful to beginners (just my opinion).

enum STATE:
    LOW,
    MID,
    HIGH

Serial1.print("State is ")
Serial1.println(state.to_string<20>())

This implies the user must know what length to provide. Still easy to make mistakes here. It's really just swapping one problem for another. If this is to make things easier, it should be hiding these details in the background, say: state.to_string()

At least in C++ if the user creates say a switch to decide what string they want to print, they are wholly responsible for providing correct storage. And if using a pointer, there is really little need to even care about the length.

You also may have overlooked the basics already provided.

int a = 50

Serial1.println(a.to_string<20>())

Using Arduino's API:

int a = 50;
Serial1.println(a);

I know there are cases where you may want the text, but not printed out. This is where the String library is actually useful for beginners. I know the whole dynamic memory rant, but free() was fixed long ago, and if used inside a function, making it short lived, it's quite safe.

String num = 50;

Then there is itoa() which is probably a bit simpler than using sprintf() for it. Not to mention GString found in my PrintEx library. I can't remember but I think your etk lib had a similar feature.

Also, if you plan to not include pointers, I would follow C# and use '&' instead of 'out' as it is keeping one symbol for one thing (references).

pYro_65:
hmm, the enum thing at second glance doesn't seem overly helpful to beginners (just my opinion).

enum STATE:
    LOW,
    MID,
    HIGH

Serial1.print("State is ")
Serial1.println(state.to_string<20>())

This implies the user must know what length to provide. Still easy to make mistakes here. It's really just swapping one problem for another. If this is to make things easier, it should be hiding these details in the background, say: state.to_string()

That's a good point. I suppose the compiler will know how long the string needs to be, so manually specifying the length is actually unnecessary. I should mention that there will be bounds checking on strings, so if you were to run out of space it wouldn't buffer overrun and corrupt memory.

pYro_65:
You also may have overlooked the basics already provided.

Using Arduino's API:

int a = 50;
Serial1.println(a);

Haha! Of course. facepalm That wasn't exactly a good example to pick. But I think it illustrates a point still. Imagine if println was some other function that wasn't overloaded for ints. Having a simple, in-expression way to convert ints to strings is very nifty. I know operator overloading, templates and other C++ trickery can overcome a lot of these types of issues, but along with cool C++ tricks comes a massive learning curve. And I think that puts a lot of people off, so they continue to use C functions that are verbose and potentially unsafe.

pYro_65:
Also, if you plan to not include pointers, I would follow C# and use '&' instead of 'out' as it is keeping one symbol for one thing (references).

Thanks for the suggestion. I'm kind of backpedalling on this concept a little. I actually have no experience with C#, but Vala (which is apparently very similar) uses heap memory and reference counting under the hood, so objects don't go out of scope; they get cleaned up when there's no need for them anymore. If I were to make something that uses references to objects on the stack, they will go out of scope and that will lead to all kinds of issues. Not really sure what the best approach is from here. Maybe it'll just have to use heap memory? Or maybe I'll have to make my compiler clever enough to recognise this and generate an error.

Robin2:
To attract the attention of existing Arduino users maybe you could take the 3rd example in Serial Input Basics and reproduce it in your new code so we can see the two side by side.

Alright! Again, this is purely hypothetical. This language doesn't exist yet. After reading your 3rd example, I figured I could do something to the same effect by extracting substrings.

global int numChars = 32
global var receivedChars = string<numChars>()
global bool newData = false

def void setup():
    Serial.begin(9600)
    Serial.println("<Arduino is ready>")

def void loop():
    recvStuff()
    showNewData()

def void recvStuff():
    receivedChars.clear()
    var raw_chars = string<numChars>()
    while(Serial.available()):
        raw_chars += Serial.read()
    
    int start_pos = 0
    int count = 0
    while(count < raw_chars.length()):
        if(raw_chars.get_char(count) == '<'):
            start_pos = count
        else if(raw_chars.get_char(count) == '>'):
            raw_chars.sub_string(out receivedChars, start_pos, count)
            newData = true
            break
        count += 1
    
def void showNewData():
    if(newData):
        Serial.print("This just in ... ")
        Serial.println(receivedChars)
        newData = false

Admittedly, your original algorithm could be faster than mine, but probably not by much, and there's no reason you couldn't rewrite your algorithm in CoffeeCat. Just thought I'd show off the proposed syntax & a string feature.

Juniper, a functional reactive programming language for the Arduino, is now available.

http://www.juniper-lang.org/

Ha, cool!! That's really great :)

I do have some questions though. After looking through the source, it seems heap memory is used a fair bit. Is it possible to write code that does not use the heap?

Does Juniper use exceptions? I see that shared_ptr uses 'new' which will throw an exception if it fails to allocate more memory. Can that exception be caught? The alternative, of course, is to use malloc and placement new/delete and then have an operator overload for casts to 'bool' so the pointer can be checked for null.
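To show what that alternative looks like, here is a minimal sketch of malloc plus placement new with a null check. Sample, create_sample and destroy_sample are invented names and this is not taken from the Juniper source.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical payload type for the sketch.
struct Sample
{
    explicit Sample(int v) : value(v) {}
    int value;
};

// Allocates with malloc and constructs in place.
// Returns nullptr on allocation failure instead of throwing.
inline Sample* create_sample(int v)
{
    void* raw = std::malloc(sizeof(Sample));
    if (raw == nullptr)
        return nullptr;
    return new (raw) Sample(v);
}

// Runs the destructor explicitly, then releases the malloc'd block.
inline void destroy_sample(Sample* s)
{
    if (s == nullptr)
        return;
    s->~Sample();
    std::free(s);
}
```

The caller checks the returned pointer against nullptr, which plays the role of the 'bool' cast mentioned above.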

As far as I know, the AVR compiler doesn't have support for C++ exceptions, so even if you wanted to use C++ exceptions you wouldn't be able to. Right now there is no exception handling in Juniper. I'm planning on maybe adding something like exceptions4c lite to Juniper.

Juniper doesn't use any heap memory other than for refs and function closures. In most of the Juniper programs that I've written, I've found refs to be completely unnecessary except for their use as global variables. If refs are only used in global variables, the memory is allocated immediately when the program starts, and is never freed. Also, tracing garbage collectors are the ones that stop the world while collecting and fail to respond. Juniper uses a reference counting system, which does not pause the program at random times for collection. The downside is that references that are cyclic will never be collected.
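The cyclic-reference downside can be demonstrated with std::shared_ptr as a stand-in for any reference counting scheme. This is a desktop C++ sketch with made-up names (Node, live_nodes), not Juniper's runtime.

```cpp
#include <memory>

// Counts live Node objects so a leak is observable.
static int live_nodes = 0;

struct Node
{
    Node()  { ++live_nodes; }
    ~Node() { --live_nodes; }
    std::shared_ptr<Node> strong_next; // owning link: keeps the target alive
    std::weak_ptr<Node>   weak_next;   // non-owning link: does not
};

// Two nodes that own each other: their counts never reach zero on their own.
// Returns a weak handle so the caller can inspect (and break) the cycle.
inline std::weak_ptr<Node> make_strong_cycle()
{
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->strong_next = b;
    b->strong_next = a;
    return a;
}

// Same shape, but one direction is weak, so both nodes are destroyed
// as soon as the local handles go out of scope.
inline void make_weak_cycle()
{
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->strong_next = b;
    b->weak_next = a;
}
```

After make_strong_cycle() returns, both nodes are still alive even though nothing in the program owns them; they are only released once one of the strong links is reset by hand.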

Closures are what worries me most about memory usage. Unfortunately due to the funarg problem we cannot avoid using the heap.

Everything else is stack based, including arrays and lists. Theoretically it should be possible to determine the maximum size of the stack at compile time. This is probably a feature that's quite a ways away since there are more critical things to work on right now (unless you want to contribute!)

Edit: I looked through the CoffeeCat source code and found it very pleasant to read. Unfortunately I think your language of choice for writing the compiler will really hold you back. Why not use an ML-family language? They are designed for writing compilers.

I've basically given up on getting this to work for Arduino since it requires C++14 and Arduino is stuck with an old compiler. I'm just chipping away at the parser/code generator and getting a feel for what I'm trying to achieve here. So far I've been able to get it to build code for the STM32F103.

The re-make of Robin's Serial Input example now compiles and works on the STM32.

Some examples here

There's still a lot of stuff that doesn't work yet - like enums. Thoughts anyone?

What can't you do that requires C++14?

It's mostly because the ETK dependency uses auto return types seemingly everywhere. It probably wouldn't take much effort to port it down to C++11.
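For anyone wondering what such a port involves, here is a small generic sketch of the difference. add14 and add11 are made-up functions, not code from ETK.

```cpp
// C++14: the return type is deduced from the return statement.
template <typename A, typename B>
auto add14(A a, B b)
{
    return a + b;
}

// C++11 equivalent: the same type spelled out with a trailing return type.
template <typename A, typename B>
auto add11(A a, B b) -> decltype(a + b)
{
    return a + b;
}
```

Each deduced-return function needs its return expression repeated inside decltype, which is mechanical but tedious when it is used everywhere.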