What languages is the code translated into by the compiler before it reaches the ESP32? I need a reference

I am writing my thesis and am looking for references.

One day I saw an explanation of how our code, written in the Arduino IDE, is translated into something else so that it can be sent over serial and finally be recognized by the microprocessor (which is what a compiler does after checking for errors), but I can't find it anymore. I've been looking for days with no luck. When I found it months ago, I didn't pay attention because I was searching for something else...

I am sure this kind of thing happens, because machines can't understand human language, only binary. For example, in general programming, code is translated into assembly and then into machine language so that the CPU can process it and do its magic, so the same thing must happen with Arduino.

I tried contacting the Arduino.cc team to ask for such material, but had no success. I also couldn't find anything on Espressif's pages that explains this, and I didn't contact them because this is something the compiler does, so I figured the information wouldn't be there.

Does anyone know of a reference (article, document...) explaining this flow of languages?

I will be infinitely grateful.

Have a look at this. If it's not detailed enough, then please ask for more details

Not even very close. Serial has nothing to do with how the code is processed. You can use a bucket or a glass or... to water a plant; in the same way, data can be transferred serially, in parallel, etc., depending on the hardware requirements. The processor does not recognize the code but tries to execute it; it is not an interpreter.

Check out speech recognition: machines can do this. Look at AI (Artificial Intelligence) for a big surprise. I have written code by speaking into my iPhone.

You enter the code in a plain ASCII text file; in the US it is normally written in English. It is then run through a preprocessor, which expands the #include directives and macros and puts everything together into one big file, including the libraries etc. Arduino uses an optimizing compiler that will remove unused code. Once the preprocessor is finished, the compiler converts its output into assembly language, which is still basically ASCII text. The assembler then converts this into 1s and 0s (object files) in a format the linker can understand. The linker then takes all the modules and combines them. Finally, the locator assigns addresses and puts the result into a loader format such as Intel HEX records; there are many of these formats, both binary and ASCII.
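
To make the "loader format" step more concrete, here is a minimal C++ sketch that formats a single Intel HEX record. It assumes only the basic record layout (start code, byte count, 16-bit address, record type, data, checksum); `intelHexRecord` is a hypothetical helper, not part of any Arduino tool, since the toolchain normally generates these records for you.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Formats one Intel HEX record as ":LLAAAATT<data>CC", where
// LL = number of data bytes, AAAA = 16-bit load address,
// TT = record type (00 = data, 01 = end of file), CC = checksum.
std::string intelHexRecord(std::uint16_t address, std::uint8_t type,
                           const std::vector<std::uint8_t>& data) {
    std::string out = ":";
    std::uint8_t sum = 0;  // running sum of every byte, modulo 256
    char buf[3];

    // Appends one byte as two uppercase hex digits and adds it to the sum.
    auto put = [&](std::uint8_t b) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(b));
        out += buf;
        sum = static_cast<std::uint8_t>(sum + b);
    };

    put(static_cast<std::uint8_t>(data.size()));   // assumes at most 255 data bytes
    put(static_cast<std::uint8_t>(address >> 8));  // address, high byte first
    put(static_cast<std::uint8_t>(address & 0xFF));
    put(type);
    for (std::uint8_t b : data) put(b);

    // Checksum: two's complement of the byte sum, so all bytes add up to 0.
    put(static_cast<std::uint8_t>(0x100 - sum));
    return out;
}
```

The data bytes in the first test record are the ASCII text "address gap". On a real build, GNU `objcopy -O ihex` is one tool that converts a linked ELF file into this format.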

Now your fun begins: not everybody follows these exact steps, and, for example, the linker and locator can be combined into one package. Sometimes the code is relocatable and is only assigned its final address as it is loaded into the processor's memory (or memories). Compare Motorola "S" records to Intel "HEX" records: they are close but not the same.
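
For comparison, a similarly hedged sketch of a Motorola S1 record shows how "close but not the same" plays out: the fields are nearly identical, but the byte count and the checksum are defined differently. `s1Record` is again a hypothetical helper.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Formats one Motorola S1 record (data, 16-bit address) as "S1CCAAAA<data>KK".
// Differences from Intel HEX: the record type is the digit after 'S',
// the count CC also includes the address and checksum bytes, and the
// checksum KK is the ones' complement (not the two's complement) of the sum.
std::string s1Record(std::uint16_t address, const std::vector<std::uint8_t>& data) {
    std::string out = "S1";
    std::uint8_t sum = 0;  // sum of count, address and data bytes, modulo 256
    char buf[3];

    auto put = [&](std::uint8_t b) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(b));
        out += buf;
        sum = static_cast<std::uint8_t>(sum + b);
    };

    put(static_cast<std::uint8_t>(data.size() + 3));  // 2 address bytes + data + 1 checksum byte
    put(static_cast<std::uint8_t>(address >> 8));     // address, high byte first
    put(static_cast<std::uint8_t>(address & 0xFF));
    for (std::uint8_t b : data) put(b);

    put(static_cast<std::uint8_t>(~sum));             // ones' complement checksum
    return out;
}
```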

Now it gets even more interesting: which memory/segment does the code go to, and is it code or data? Go back to the Intel 4004 and follow the progression of controllers and the tools they used.

I recommend you take a course or two on basic computing at your university before you get too far into your thesis. There are many online tutorials available as well.

Follow this link and go through all of the segments: https://www.youtube.com/c/CodeBeauty There is a lot of material in small bytes (bites).

Good luck!

Well said, Gil.

But the code stored in the processor's memory is not the final step. In many processors, those instructions are used to reference microcode inside the processor that controls the actual processor steps, and that is all part of the microprocessor built by the engineers who designed the device.

The ESP32 toolchain is supplied by Espressif and, in overview, it converts the Arduino C/C++ source code into ESP32-specific machine-language instructions, which are loaded onto the ESP32 processor to be executed.

Links to the Espressif documentation can be found in the getting started guide.

WOW, so you admit to not doing your own thesis! Your first few 'facts' are way off.
MUTE!!!!

Okay for basic instructions on a multi-core controller. But how are the ports of the other core addressed when the code can run on either core?

Don't rush.
I think the OP means the process of loading the firmware via the bootloader; in that case, what the author is trying to say makes sense:

Indeed, the Arduino IDE "converts" the source text into machine code and then transmits it via the serial port to the controller.

Maybe if one of the more advanced members could post the AVR assembly listing of the "blink" sketch, the water might be a little less muddy?

It's fair to say that it won't help, because Arduino on the ESP32 runs the code on top of an RTOS (FreeRTOS). Picking out the Arduino code, and then the user-defined code, from this program is not a task for the faint of heart.

I compiled a simple hello world example, and the resulting assembly code was 200 times the maximum character limit for this forum.

It's not precise enough for compilation theory or a thesis, but in a nutshell, from a helicopter view, this is close to what happens.

In more detail (still simplified), I would describe how the Arduino IDE compiles an ESP32 sketch, transforming your C++ code into a machine-readable binary, as follows:

First, the Arduino IDE preprocesses the sketch by adding the necessary include directives (such as the one for Arduino.h) and generating function prototypes. This ensures that the C++ code has the structure the compiler expects.
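
Roughly, this preprocessing turns a sketch into an ordinary C++ file. The example below is a hypothetical host-side imitation, not the exact generated file: on a real build the file would begin with `#include <Arduino.h>` (and typically `#line` directives so that errors point back to the .ino), while `fakeDigitalWrite()` and `runOnHost()` are made-up stand-ins so the example compiles on a PC. The point is the prototype block, which lets `setup()` call `toggleLed()` even though it is defined further down.

```cpp
// --- What the IDE prepends on a real build ---
// #include <Arduino.h>      // left out here: not available on a PC

// Hypothetical stand-in for the Arduino API, so this compiles on a PC.
static int ledLevel = 0;
void fakeDigitalWrite(int level) { ledLevel = level; }

// Prototypes generated from the function definitions found in the sketch.
void setup();
void loop();
void toggleLed();

// --- The user's sketch, unchanged ---
void setup() {
    toggleLed();  // legal because toggleLed() was declared above
}

void loop() {
    toggleLed();
}

void toggleLed() {
    fakeDigitalWrite(ledLevel == 0 ? 1 : 0);
}

// --- Hypothetical stand-in for the core's entry point ---
// The real core calls setup() once and then loop() forever (inside a
// FreeRTOS task on the ESP32). Here loop() runs a fixed number of times.
int runOnHost(int loopCount) {
    ledLevel = 0;
    setup();
    for (int i = 0; i < loopCount; ++i) loop();
    return ledLevel;
}
```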

The C++ code is then passed to the compiler (GCC, via the ESP32 toolchain), which parses it and translates it into an intermediate representation (IR in the rest of this text). This IR is a low-level, largely platform-independent code that represents your program's logic, and during this step several optimization passes are applied to remove redundant code and improve performance.

The IR is then compiled into assembly language specific to the ESP32's Xtensa architecture, a human-readable form of the machine instructions that the ESP32's CPU can execute directly.

This assembly code is then translated (assembled) into machine code, resulting in object files that contain the binary instructions the ESP32 understands, although these files are not yet linked together into a single program.
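
One way to see these intermediate forms for yourself is to stop GCC after each stage. A hedged sketch: save the function below as `stages.cpp` (a hypothetical file name) and compile it with the ESP32 toolchain's `g++` (named `xtensa-esp32-elf-g++` in many toolchain releases; the Arduino IDE shows the exact path when "Show verbose output during compilation" is enabled), together with `-Os`, which Arduino builds typically use. Standard GCC options then select the stage: `-E` stops after preprocessing, `-fdump-tree-optimized` writes the optimized GIMPLE IR to a dump file, `-S` stops after producing assembly (`stages.s`), and `-c` stops after assembling an object file (`stages.o`) that is not yet linked. Running the toolchain's `objdump -d` on the object file disassembles the machine code back into readable mnemonics.

```cpp
// A deliberately small function to follow through each compiler stage.
int scaledReading(int raw) {
    const int offset = 10;  // compile-time constant: folded into the arithmetic
    int unused = raw * 7;   // result never used: an optimizing build drops it
    (void)unused;           // only silences the "unused variable" warning
    return (raw - offset) * 2;
}
```

With optimization enabled, the dump and the assembly will most likely contain no multiplication by 7 at all.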

Next, the object files, which include your sketch, the ESP32 core libraries, and the FreeRTOS components, are linked together by the linker. During this step, all the necessary components are combined into a single executable binary file, with the linker resolving addresses, function calls, and references between the various object files to create a cohesive program.

For this kind of target, though, the ESP32 requires a bootloader and a partition table in addition to your compiled program. The bootloader initializes the hardware and prepares the system to execute your program, while the partition table defines how the ESP32's flash memory is organized. These components are written to flash alongside your program, either as separate binary files at fixed offsets or merged into a single image.
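
For the original ESP32, these pieces typically land at fixed flash offsets: the second-stage bootloader at 0x1000, the partition table at 0x8000, and the first application partition at 0x10000. The remaining values below are modeled on the Arduino-ESP32 default 4 MB partition scheme and are illustrative only; check the partitions CSV in your installed core, and note that some newer chips place the bootloader at 0x0. The check itself (no overlaps, app partitions starting on a 64 KB boundary) is a simplified version of the rules ESP-IDF enforces, and `FlashRegion`, `layoutIsValid` and `exampleLayout` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One region of SPI flash (illustrative values, see the lead-in).
struct FlashRegion {
    std::string name;
    std::uint32_t offset;  // start address in flash
    std::uint32_t size;    // bytes reserved
    bool isApp;            // application partitions need 64 KB alignment
};

// Returns true if the regions are sorted by offset, do not overlap,
// and every app region starts on a 0x10000 (64 KB) boundary.
bool layoutIsValid(const std::vector<FlashRegion>& regions) {
    for (std::size_t i = 0; i < regions.size(); ++i) {
        const FlashRegion& r = regions[i];
        if (r.isApp && r.offset % 0x10000 != 0) return false;
        if (i > 0) {
            const FlashRegion& prev = regions[i - 1];
            if (prev.offset + prev.size > r.offset) return false;  // overlap or out of order
        }
    }
    return true;
}

// Layout resembling an original-ESP32 Arduino build with the default scheme.
std::vector<FlashRegion> exampleLayout() {
    return {
        {"bootloader",      0x1000,  0x7000,   false},
        {"partition table", 0x8000,  0x1000,   false},
        {"nvs",             0x9000,  0x5000,   false},
        {"otadata",         0xE000,  0x2000,   false},
        {"app0 (sketch)",   0x10000, 0x140000, true},
    };
}
```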

Finally, the Arduino IDE uses a flashing tool (esptool for the ESP32, most likely over the serial line you mentioned) to upload these binary files to the ESP32's flash memory. On reset, the ESP32 reads the bootloader, which then loads and runs your sketch, now fully integrated with FreeRTOS and the other necessary components.

This differs if you pick a simpler target architecture, such as the famous old Uno, which does not require an OS.

When compiling for an Arduino Uno, the process starts the same way: the compiler translates the C++ sketch into IR, which is then optimized and converted into AVR-specific assembly language. The final output is an Intel HEX file containing only the application code and the necessary libraries, without the bootloader.

The Uno typically comes with a bootloader pre-installed in a separate section of its flash memory. The bootloader's role is to receive the compiled program over the chip's serial port (on the Uno, the USB connection goes through an on-board USB-to-serial converter) and write it to the main program memory. When the Uno is powered on or reset, the bootloader briefly checks for incoming code to upload; if none is found, it hands control over to the main program stored in memory, which then begins executing. If the bootloader is not present or has been removed, you must use an external programmer connected to the ICSP (In-Circuit Serial Programming) header to write the code directly to the microcontroller, bypassing the bootloader entirely.
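
The "separate section" is easy to put numbers on. A minimal sketch, assuming an ATmega328P with 32 KB of flash and the 512-byte Optiboot bootloader that current Uno boards ship with: the bootloader sits at the top of flash, which leaves 32,256 bytes for the sketch (the maximum the IDE reports for an Uno). The function names are hypothetical.

```cpp
#include <cstdint>

// ATmega328P as used on the Uno (assumed values, see the lead-in).
constexpr std::uint32_t kFlashBytes      = 32u * 1024u;  // 32768 bytes of flash
constexpr std::uint32_t kBootloaderBytes = 512u;         // Optiboot
// The bootloader occupies the top of flash; the sketch starts at address 0.
constexpr std::uint32_t kBootloaderStart = kFlashBytes - kBootloaderBytes;
constexpr std::uint32_t kMaxSketchBytes  = kBootloaderStart;

// True if a compiled sketch of the given size fits below the bootloader.
constexpr bool sketchFits(std::uint32_t sketchBytes) {
    return sketchBytes <= kMaxSketchBytes;
}

// Percentage of the sketch area used, rounded down.
constexpr std::uint32_t percentUsed(std::uint32_t sketchBytes) {
    return sketchBytes * 100u / kMaxSketchBytes;
}
```

Writing a sketch through an external ICSP programmer can erase this region, which is why the bootloader then has to be burned again to use serial uploads.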

Look at the full datasheet of any Arduino processor. Towards the back it will list the "native" instructions the processor expects. The compiler converts the C++ statements into as many of these "native" instructions as required. Then there is error checking, optimization, etc.

The Arduino IDE (through its core libraries) also has a large number of "macros". These are short snippets of code, defined in the core's header files, that the preprocessor substitutes for single "convenience" statements.
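
As an illustration, the definitions below are modeled on a few of the convenience macros in the AVR core's Arduino.h (`bitRead`, `bitSet`, `bitClear`, `constrain`); exact definitions vary between cores, so treat them as a sketch, and `setThenClear` is a hypothetical function. The preprocessor replaces each use with the macro body as plain text before the compiler proper ever sees it.

```cpp
// Function-like macros in the style of Arduino.h (see the lead-in).
#define bitRead(value, bit)  (((value) >> (bit)) & 0x01)
#define bitSet(value, bit)   ((value) |= (1UL << (bit)))
#define bitClear(value, bit) ((value) &= ~(1UL << (bit)))
#define constrain(amt, low, high) \
    ((amt) < (low) ? (low) : ((amt) > (high) ? (high) : (amt)))

// After preprocessing, the bitSet line below reads:
//   ((flags) |= (1UL << (3)));
unsigned long setThenClear(unsigned long flags) {
    bitSet(flags, 3);
    bitClear(flags, 0);
    return flags;
}
```

Because a macro is pure text substitution, an argument with side effects (such as `constrain(x++, 0, 100)`) may be evaluated more than once, which is one practical difference from a real function.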

OK, I will look at this link. Thanks! :handshake:

PS: it doesn't seem to be enough, but at least it gives me some keywords for finding better references, for example AVRDUDE.

Well, I could just say in my thesis "the compiler converts the code into something else", but I am looking for a source.

What's wrong with that?

I was about to say your link wasn't useful, but then I found the ESP32 section of the forums. I will look there. Thanks! :handshake:

Yeah, assembly code tends to be huge. In one of the courses at my college I had exercises where I had to convert C code into assembly, and it was painful: handling conditional jumps, functions... After that, there was even a silly exercise asking us to convert "simple" assembly code into binary! :fearful:

Does it have to be based on the ESP32? I ask because it's a convoluted example of how Arduino does it. As others have suggested, an ATmega uC (328P or 2560) would provide a much purer example of the process. It would make for a cleaner process and an easier-to-follow thesis.

Here's the thing that I'm not understanding. If this is your capstone, where is your original research that presents original ideas and new findings?

Man, thank you very much for all the effort in writing this. :handshake: Even though I'm Brazilian and not using a translator, I could understand 90% of it. There were some terms I had never heard of, but that's OK.

Your explanation makes sense, and it matches many things I learned in college. I wish this were explained in a reference or documentation... That's what I'm looking for.

In addition, I am thinking of using some passages of your text as inspiration for mine, or of citing it and linking to this post. I will think about it, if you allow it, of course.

Again, thank you.

I guess so, because this will be mentioned in a passage that explains how the Arduino IDE processes the code I wrote. Many articles using the ESP32 explain how to install the Arduino IDE, and I want to do something different.
But of course, I will set aside some time to look for ATmega references; if things work very similarly, that would be enough.

My original research involves current measurement using current transformers, and voltage measurement using a voltage sensor (in my case, a ZMPT101B 250V/5V).

I use both sensors to calculate power and then the energy consumed.
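
The usual way to get power from sampled voltage and current is to average the instantaneous product over whole mains cycles. A minimal sketch, assuming the samples are already converted to volts and amperes, taken simultaneously at a fixed interval; `PowerReading`, `computePower`, `accumulateWh` and the `sineSamples` test helper are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Results for one block of simultaneous voltage/current samples.
struct PowerReading {
    double vRms;           // volts
    double iRms;           // amperes
    double realPower;      // watts: average of v(t) * i(t)
    double apparentPower;  // volt-amperes: vRms * iRms
    double powerFactor;    // realPower / apparentPower
};

// v[k] and i[k] must be taken at the same instant; the block should
// cover a whole number of mains cycles.
PowerReading computePower(const std::vector<double>& v, const std::vector<double>& i) {
    PowerReading r{};
    const std::size_t n = v.size();
    if (n == 0 || i.size() != n) return r;  // nothing sensible to compute

    double sumV2 = 0.0, sumI2 = 0.0, sumVI = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        sumV2 += v[k] * v[k];
        sumI2 += i[k] * i[k];
        sumVI += v[k] * i[k];
    }
    r.vRms = std::sqrt(sumV2 / n);
    r.iRms = std::sqrt(sumI2 / n);
    r.realPower = sumVI / n;
    r.apparentPower = r.vRms * r.iRms;
    r.powerFactor = r.apparentPower > 0.0 ? r.realPower / r.apparentPower : 0.0;
    return r;
}

// Energy is real power integrated over time: E [Wh] = P [W] * t [s] / 3600.
double accumulateWh(double energyWh, double realPowerW, double intervalSeconds) {
    return energyWh + realPowerW * intervalSeconds / 3600.0;
}

// Hypothetical test-signal helper: whole cycles of a sine wave.
std::vector<double> sineSamples(double amplitude, double phaseRad, int samplesPerCycle, int cycles) {
    const double pi = std::acos(-1.0);
    std::vector<double> out;
    for (int k = 0; k < samplesPerCycle * cycles; ++k)
        out.push_back(amplitude * std::sin(2.0 * pi * k / samplesPerCycle + phaseRad));
    return out;
}
```

For example, 127 V RMS and 10 A RMS in phase give 1270 W, while shifting the current by 60 degrees drops the power factor to 0.5. Calling `accumulateWh` once per block with that block's duration keeps a running energy total.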

My work is supposed to include sending data to an IoT platform so the user can see it on a phone, but I'm running out of time, and my advisor is suggesting that I simply use a display. It should also assess the power quality of the mains (via harmonics). Basically, if there are any harmonics beyond 60 Hz, it means something is wrong.
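
One common way to look for harmonics in sampled data is to measure the amplitude at each multiple of 60 Hz with a single-bin DFT and summarize them as total harmonic distortion (THD). A minimal sketch, assuming the buffer holds a whole number of fundamental cycles sampled at a rate that is an exact multiple of 60 Hz (otherwise spectral leakage blurs the bins; a windowed FFT or the Goertzel algorithm are common alternatives). All function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Amplitude of the component at freqHz (a single DFT bin).
// Exact only if the buffer spans a whole number of cycles of freqHz.
double harmonicAmplitude(const std::vector<double>& x, double sampleRateHz, double freqHz) {
    if (x.empty()) return 0.0;
    const double pi = std::acos(-1.0);
    double re = 0.0, im = 0.0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        const double angle = 2.0 * pi * freqHz * n / sampleRateHz;
        re += x[n] * std::cos(angle);
        im -= x[n] * std::sin(angle);
    }
    return 2.0 * std::sqrt(re * re + im * im) / x.size();
}

// THD = sqrt(A2^2 + ... + Amax^2) / A1, where A1 is the fundamental's amplitude.
double totalHarmonicDistortion(const std::vector<double>& x, double sampleRateHz,
                               double fundamentalHz, int maxHarmonic) {
    const double a1 = harmonicAmplitude(x, sampleRateHz, fundamentalHz);
    double sumSq = 0.0;
    for (int h = 2; h <= maxHarmonic; ++h) {
        const double ah = harmonicAmplitude(x, sampleRateHz, h * fundamentalHz);
        sumSq += ah * ah;
    }
    return a1 > 0.0 ? std::sqrt(sumSq) / a1 : 0.0;
}

// Hypothetical test-signal helper: a unit-amplitude fundamental plus one harmonic.
std::vector<double> makeSignal(double fundamentalHz, double sampleRateHz, std::size_t samples,
                               int harmonic, double harmonicAmp) {
    const double pi = std::acos(-1.0);
    std::vector<double> x(samples);
    for (std::size_t n = 0; n < samples; ++n) {
        const double t = n / sampleRateHz;
        x[n] = std::sin(2.0 * pi * fundamentalHz * t) +
               harmonicAmp * std::sin(2.0 * pi * harmonic * fundamentalHz * t);
    }
    return x;
}
```

With 3840 samples per second (64 per 60 Hz cycle) and 256 samples (4 cycles), a 10% third harmonic shows up as a THD of 0.1.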

I had to build a signal-conditioning circuit to convert the output of the current sensor into something within the input range of the WROOM-32S, because that output has a low amplitude and goes negative. I learned a lot from that.
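
A conditioning circuit of this kind usually adds a DC bias (often half the ADC reference) so the AC signal never goes negative, and the firmware then undoes that scaling. A minimal sketch, assuming a 12-bit reading treated as ideally linear from 0 to 3.3 V (the real ESP32 ADC is noticeably nonlinear near its ends, so calibration is advisable); the struct, the function names, and the burden resistor, amplifier gain and CT turns-ratio parameters are hypothetical, not values from this project.

```cpp
#include <cstdint>

// Hypothetical description of the conditioning chain.
struct CtFrontEnd {
    double adcRefVolts;    // full-scale input voltage (assumed ideal)
    double biasVolts;      // DC offset added by the circuit (e.g. mid-scale)
    double amplifierGain;  // voltage gain applied after the burden resistor
    double burdenOhms;     // resistor that turns CT secondary current into voltage
    double turnsRatio;     // primary current / secondary current
};

// Converts a raw 12-bit reading (0..4095) to volts at the ADC pin.
double adcToVolts(std::uint16_t raw, double adcRefVolts) {
    return raw * adcRefVolts / 4095.0;
}

// Undoes the conditioning: remove the bias, remove the gain,
// then volts -> secondary amperes -> primary amperes.
double primaryAmps(std::uint16_t raw, const CtFrontEnd& fe) {
    const double pinVolts = adcToVolts(raw, fe.adcRefVolts);
    const double burdenVolts = (pinVolts - fe.biasVolts) / fe.amplifierGain;
    const double secondaryAmps = burdenVolts / fe.burdenOhms;
    return secondaryAmps * fe.turnsRatio;
}
```

Many designs also subtract the average of the samples in software instead of trusting the nominal bias value, since the divider that sets the bias is never exact.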

Does this answer your question? When I finish my work, if you are interested, I can share the document with you.

No, it doesn't! It just reflects how physical devices react: neither right nor wrong, just there.