What's the best way to profile code and see which areas are slowing things up?
I'm hoping for something a little more in-depth than pasting a whole bunch of Serial.println(millis());
everywhere if possible.
What's the best way to profile code and see which areas are slowing things up?
I'm hoping for something a little more in-depth than pasting a whole bunch of Serial.println(millis());
everywhere if possible.
You could use a pair of single pin toggle instructions around key sections, and use a 'scope or another Arduino to measure the pulse width.
Overhead is two instruction cycles.
printing time would certainly have an impact on the overall timing of the code, but those times can be captured in a "trace buffer" and printed after some number of iterations of the code.
I was hoping for some clever kind of profiling software I didn't know about that could run the code and give info on the average runtime of each function.
So if there's no way around 'time before and after each function'
Maybe I could litter my code with a buffered version of something like this?
#define TIMED(X) { TimedObject t(#X); X }
class TimedObject{
public:
uint32_t startTime;
const char* name;
TimedObject(const char* n) : startTime(millis()), name(n) {}
~TimedObject() {
Serial.print(name);
Serial.print(" took: ");
Serial.print(millis() - startTime);
Serial.println(" ms");
}
};
void doStuff(){ delay(750); }
void doOtherStuff(){ delay(1350); }
void doOtherStuff2(){ delay(250); }
void setup() { Serial.begin(9600); }
void loop() {
TIMED( doStuff(); )
TIMED( doOtherStuff(); )
TIMED( doOtherStuff2(); )
}
But one that perhaps prints an averaged run time every 30s or so?
The "best way" differs by application. Usually, when this question surfaces, it is to troubleshoot a more identifiable problem. Generally, as AWOL said, toggling a pin state on an oscilloscope reveals things serial print may not. Proofreading one's code may solve the problem.
If this is a general question, not related to a specific problem, you have 3 good first steps.
If you have a specific problem, I would suggest you explain that here, as it may lead to more targeted answers.
It's just a general question really.
It comes because yes, I do currently have code that seems to be running slower than I'd hope... (which for now at least I'm happy to tackle myself) but also in general I've been curious about the performance of different approaches to certain things and benchmarking them out of curiosity.
Sorry, if you already know these sources!
There is a library for profiling Arduino code
https://www.arduino.cc/reference/en/libraries/timeprofiler/
and there was a thread from 2014 about profiling arduino code ...
https://forum.arduino.cc/t/a-simple-method-for-profiling-arduino-code/203958
with a link to
The Atmel studios (now Microchip studios) IDE has fancier debugging tools I believe. I'm unclear whether there is a need for additional hardware to use probes & such though.
Yes, it requires the use of a debugging tool. One used to be called "Atmel ICE" tho there are other cheaper ones available.
instead of sprinkling Serial.println(millis());
all over, placing targeted print statements in the various branches of code may help to confirm some things we assume is happening, for example, by placing Serial.write('0')
and Serial.write('1')
at points that seemed sequential, I was able to determine that a division by 0 was causing my program to crash. Printing the value of a variable as it is operated on can help determine if the operation performed achieves the desired results. For speed measurements, a scope or as gcjr suggested, storing to a buffer and printing later is the typical approach.
The traditional method is to have a timer interrupt that periodically (every mSec or so) outputs the current PC value to a serial port or some other output channel. Post-process that data, along with the linker .map file, to figure out what function is executing at each of those data points, and output the % of total run time is spent in each function. Higher-end processors have JTAG ports that allow the data capture to be controlled by an external device, without interrupting the program flow.
This sounds very interesting! And probably way over my head. Which you'll realise when I ask you the following question... What is 'PC value' in this case?
If this is something that the greater community could also use, maybe the avengers can assemble and this can be collaboratively created?
The program counter.
the problem with "littering" the code with this stuff is that it can start impacting performance or require significant amounts of memory.
when i did tracing in an ethernet drive, we captured a 16-bit ID and 16-bit timestamp
if you only think there is a problem, the first thing to do is verify that this one. if there is, identify which half of the code consumes more than you think. then divide that half again, ... until you find what consumes the more time than you suspect.
So after a bit of messing about and reading stuff... I found this video and modified his code to print to the serial monitor. (Link to his used code here)
Three caveats ...
Yes this is a bit messy still... I/we can probably clean it up a lot.
Yes, doing a whole bunch of serial prints will slow down the program as discussed, but if the profiling of each function was consistent, then I think the relative time relationships should still give the insight required.
It did involve me declaring a profiling duration and wrapping the loop code.
"Profiling.h"
struct ProfileResult
{
const char* Name;
uint32_t Start, End;
uint32_t ThreadID;
};
struct InstrumentationSession
{
const char* Name;
};
class Instrumentor
{
private:
InstrumentationSession* m_CurrentSession;
int m_ProfileCount;
public:
Instrumentor()
: m_CurrentSession(nullptr), m_ProfileCount(0)
{
}
void BeginSession(const char* name = "Session", const char* filepath = "results.json")
{
WriteHeader();
m_CurrentSession = new InstrumentationSession{ name };
}
void EndSession()
{
WriteFooter();
delete m_CurrentSession;
m_CurrentSession = nullptr;
m_ProfileCount = 0;
}
void WriteProfile(const ProfileResult& result)
{
if (m_ProfileCount++ > 0)
Serial.print(",");
String name = result.Name;
name.replace('"', '\'');
Serial.println("{");
Serial.println("\"cat\":\"function\",");
Serial.print("\"dur\":");
Serial.print(result.End - result.Start);
Serial.println(",");
Serial.println("\"name\":\"" + name + "\",");
Serial.println("\"ph\":\"X\",");
Serial.println("\"pid\":0,");
Serial.println("\"tid\":0,");
Serial.print("\"ts\":");
Serial.println(result.Start);
Serial.println( "}");
}
void WriteHeader() {
Serial.println("{\"otherData\": {},\"traceEvents\":[");
}
void WriteFooter() {
Serial.println( "]}");
}
static Instrumentor& Get() {
static Instrumentor instance;
return instance;
}
};
class InstrumentationTimer
{
public:
InstrumentationTimer(const char* name)
: m_Name(name), m_Stopped(false) {
m_StartTimepoint = micros();
}
~InstrumentationTimer() {
if (!m_Stopped)
Stop();
}
void Stop() {
Instrumentor::Get().WriteProfile({ m_Name, m_StartTimepoint, micros(), 0 });
m_Stopped = true;
}
private:
const char* m_Name;
bool m_Stopped;
uint32_t m_StartTimepoint;
};
#if PROFILING
#define PROFILE_BEGIN() Instrumentor::Get().BeginSession();
#define PROFILE_FUNCTION() InstrumentationTimer timer(__PRETTY_FUNCTION__);
#define PROFILE_END() Instrumentor::Get().EndSession();
#else
#define PROFILE_BEGIN()
#define PROFILE_FUNCTION()
#define PROFILE_END()
#endif
Example code
#define PROFILING 1
#include "Profiling.h"
#define SECONDS 1000
#define PROFILING_DURATION 10 * SECONDS
void foo() {
PROFILE_FUNCTION()
delay(1250);
}
void foo2() {
PROFILE_FUNCTION()
delay(150);
foo3();
}
void foo3() {
PROFILE_FUNCTION()
delay(350);
}
void setup() {
Serial.begin(115200);
}
void loop() {
#if PROFILING
PROFILE_BEGIN();
while (millis() < PROFILING_DURATION) {
#endif
// Do Loop Code Here
foo();
foo2();
// End Loop Code Here
#if PROFILING
}
PROFILE_END();
while (1);
#endif
}
This then serial.prints output you can paste into a JSON file, and then load it into chrome://tracing/ to get a nice visual of whats happening and when like this:
Again.... The caveats above... Messy still, printing slows things down etc.
I was wondering if maybe instead of printing while running, we can just 'queue up' the captures, and then iterate over them at the end of the profiling to print the JSON out?
As gcjr said above, get it working, then make it right. Well... it seems to work Can we make it right?
so you have no prior experience doing this? you've never had to profile your code before?
No sir. Outside of printing millis() before and after a function.
Using the chrome::tracing and the JSON is actually the type of thing I was hoping existed when I first posted. So it would be good to get a version of this working that people don't sneer at when they see it.
Arduino programs tend to be small. I'd think that the occasions where you have to dig into profiling by hardware would be vanishingly small. I have had to use a scope once or twice and a logic analyzer, but even then, that was more for convenience than because I absolutely couldn't progress without it.
The easiest way to sort your code out is to post it here.
I definitely appreciate it, but as I spoke on in post #6 I'm more looking for a general tool that can be used, rather than a fix to a specific problem I'm having.
there are many ways to do profiling that often depends on the real-time aspects of what is being profiled