Weekly Update 1:
As specified in the guidelines, I have to post weekly updates on what I'm doing. This is the update for the first week.
Lexical Analyser: The first part of building a compiler (or transpiler) is to build a lexical analyser that takes the input language (in our case, Arduino) and splits it into tokens.
The Arduino language is based on C/C++, so first we have to define the various essential parts of the language, such as:
1.) Operators: + - * / & ++ -- == < > etc
2.) Punctuation: ( ), { }, [ ], :, ; etc.
3.) Constant literals: Decimal numbers, hexadecimal numbers and floating point numbers.
4.) Identifiers: Variable names that consist of lowercase and uppercase letters, digits and underscores, and that do not start with a digit.
5.) Blank spaces and newlines: spaces, \t and \n
6.) Keywords: The Arduino Language Reference contains 200+ keywords that have special meaning to the compiler, and I used it to define the keyword list. It can be found at: Arduino Reference
7.) Single-line/multi-line comments: These need to be recognised by the compiler so that they can be ignored.
8.) Preprocessor directives: Right now we are ignoring these as they are complex to handle and language-specific (#include and #define come under this).
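To make the categories above a bit more concrete, here is a minimal hand-written sketch in C of how a scanner could classify them. This is not the flex scanner from my repo: the names TokenKind, Token and next_token are hypothetical, the keyword table is a tiny sample rather than the full Arduino list, and only a handful of operators are covered. Preprocessor directives are not handled, in line with point 8.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token categories, mirroring the list above. */
typedef enum {
    TOK_EOF, TOK_OPERATOR, TOK_PUNCTUATION, TOK_LITERAL,
    TOK_IDENTIFIER, TOK_KEYWORD, TOK_UNKNOWN
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* points into the source buffer */
    size_t length;      /* number of characters in the token */
} Token;

/* Tiny sample only; the real list would come from the Arduino Reference. */
static const char *const KEYWORDS[] = {
    "if", "else", "while", "for", "int", "void", "return"
};

static int is_keyword(const char *s, size_t n) {
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++)
        if (strlen(KEYWORDS[i]) == n && strncmp(KEYWORDS[i], s, n) == 0)
            return 1;
    return 0;
}

/* Skip whitespace, single-line comments and multi-line comments. */
static const char *skip_ignored(const char *p) {
    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        if (p[0] == '/' && p[1] == '/') {
            while (*p && *p != '\n') p++;
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;
            while (*p && !(p[0] == '*' && p[1] == '/')) p++;
            if (*p) p += 2;  /* step over the closing marker */
        } else {
            return p;
        }
    }
}

/* Scan one token starting at *cursor and advance the cursor past it. */
Token next_token(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    const char *p = skip_ignored(*cursor);
    Token t = { TOK_EOF, p, 0 };
    if (*p == '\0') { *cursor = p; return t; }

    if (isalpha((unsigned char)*p) || *p == '_') {
        /* identifier or keyword */
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        t.kind = is_keyword(t.start, (size_t)(p - t.start))
                     ? TOK_KEYWORD : TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        /* hexadecimal, decimal or simple floating point literal */
        if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
            p += 2;
            while (isxdigit((unsigned char)*p)) p++;
        } else {
            while (isdigit((unsigned char)*p)) p++;
            if (*p == '.') { p++; while (isdigit((unsigned char)*p)) p++; }
        }
        t.kind = TOK_LITERAL;
    } else if (strchr("(){}[]:;,", *p)) {
        p++;
        t.kind = TOK_PUNCTUATION;
    } else if (strchr("+-*/&<>=!|", *p)) {
        char first = *p++;
        /* two-character operators: ++ -- && || == <= >= != */
        if ((*p == first && strchr("+-&|=", first)) ||
            (*p == '=' && strchr("<>!", first)))
            p++;
        t.kind = TOK_OPERATOR;
    } else {
        p++;
        t.kind = TOK_UNKNOWN;
    }
    t.length = (size_t)(p - t.start);
    *cursor = p;
    return t;
}
```

A caller would keep a `const char *` cursor into the sketch source and call `next_token(&cursor)` in a loop until it returns a token of kind TOK_EOF. A flex-generated scanner does the same job from regular-expression rules instead of hand-written branches.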
I used standard textbook regular expressions for everything except the Arduino-specific keywords. The scanner.h and scanner.l files are currently done; I still have to work on the scanner.c file to make the lexical analyser work. The .l and .h files can be found on my GitHub repo here
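Since the keywords are the one part not covered by a textbook regular expression, one common approach (an assumption here, not necessarily what scanner.l does) is to match every word with the identifier rule and then look it up in a sorted keyword table. The sketch below shows that lookup with the standard bsearch function. The function name is_arduino_keyword and the table contents are hypothetical, and the table is only a small subset of the names in the Arduino Reference.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset only. The entries must stay sorted in strcmp order
 * (uppercase letters sort before lowercase ones) for bsearch to work. */
static const char *const ARDUINO_KEYWORDS[] = {
    "HIGH", "INPUT", "LOW", "OUTPUT",
    "analogRead", "delay", "digitalWrite", "loop", "pinMode", "setup"
};

/* bsearch passes the key first and a pointer to a table element second. */
static int compare_keyword(const void *key, const void *element) {
    return strcmp((const char *)key, *(const char *const *)element);
}

/* Returns 1 if word is in the table, 0 otherwise. */
int is_arduino_keyword(const char *word) {
    assert(word != NULL);
    size_t count = sizeof ARDUINO_KEYWORDS / sizeof ARDUINO_KEYWORDS[0];
    return bsearch(word, ARDUINO_KEYWORDS, count,
                   sizeof ARDUINO_KEYWORDS[0], compare_keyword) != NULL;
}
```

With 200+ keywords, a binary search over a sorted table keeps each lookup to a handful of string comparisons, and the table stays easy to regenerate from the reference. The lookup is case-sensitive, so pinMode matches but pinmode does not.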
Thank you for your time. I will continue to update this thread (hopefully more frequently) as I make progress.