Tracking down an elusive memory leak

As the title implies, I am trying to hunt down a memory leak in one of my programs. It runs on an ESP8266, and I check every 10 seconds to see how much free heap I have left. What makes the problem hard to find is that the leak starts sporadically, so I might have to wait 30-40 minutes to see whether I found it, and even then it could just be waiting to start. For example, I have a graph below of my heap over time. This one waited ~1500 seconds to start leaking, then it was all downhill from there. The program logs data and transmits it to a server. I have stripped out all the Strings from the program, as I have heard they can be problematic, but heap fragmentation is usually less than 5%.

I also looked at the stats from every cycle of my loop, and the free heap doesn't change on every cycle while it is dropping. I've tried probing at different points in my loop, but again it's hard to tell anything because the leak starts sporadically. It is usually pretty slow and steady, but as you can see from the second cycle below, sometimes it just blasts through the memory.

I'm pretty much out of ideas, and am very open to others. Sorry I'm not sharing my code here, it is pretty involved. I'll include my loop if anyone wants it.

Google Photos

Scott Meyers wrote Effective C++, which focuses a lot on the proper use of new and delete to avoid memory leaks.

The problem is I am not using any new() or delete() or free() commands. I try to let the compiler take care of all the memory allocations and clean up. Of course I have not read through all of the code for the libraries I am using, which are mostly the Serial and client streams.

Thoughts and prayers.

Yeah, asking for help with an unknown program of substantial size (according to you), is really asking for a lot. Put yourself on the other side of the fence and think about it. There are billions of things (literally) that could cause your problem, so guessing it would require amazing luck.

Suppose you have a NodeMCU ESP8266 with 128kbyte of ram and someone is eating 22kbyte of ram. Is that a problem ?
There is a lot going on for the Wifi functions. It is not just your sketch that is running.

A memory leak is when the memory gets completely filled by a bug until the board crashes. It seems that the 22kbyte of ram is released. That is not a leak. That is something unknown that is using the memory.

Have you tested the memory with an existing small example sketch? It must use Wifi functions of course.

Some don't want to show a sketch because it is so enormously big. Then it turns out it is only one file with only a few thousand lines.

You can try commenting out various function calls to track down which function causes the leak, then you will know where to look.
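One way to make the comment-out approach less tedious is to measure the free heap around each suspect call instead of removing it. The following is only a sketch under assumptions: `heapLostAcross` is a hypothetical helper, and the heap probe is passed in as a parameter so the code also compiles on a desktop. On the ESP8266 core the probe would wrap ESP.getFreeHeap().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper (not from the original sketch): returns how many bytes
// of free heap were lost across one call to `step`. A negative result means
// the call released memory. `freeHeap` is injected; on an ESP8266 it could be
// a lambda that returns ESP.getFreeHeap().
long heapLostAcross(const std::function<uint32_t()>& freeHeap,
                    const std::function<void()>& step) {
    assert(freeHeap && step);  // both callables must be set
    const uint32_t before = freeHeap();
    step();
    const uint32_t after = freeHeap();
    return static_cast<long>(before) - static_cast<long>(after);
}
```

A single call that loses memory is not proof of a leak, since buffers may be released later. A function whose losses keep adding up over many loop iterations is the one to read closely.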

For serious help we need to see code.

Maybe you could debug it with gdb:
https://arduino-esp8266.readthedocs.io/en/latest/gdb.html

@Koepel If you look at the image I linked, you'll see that I have about 25KB of free ram, and once the leak starts it drops by about 100B to 1000B every 10 seconds, which quickly causes the device to crash. Definitely not standard changes in memory caused by other code running in the background. I expect the bumps, but something is not being released.
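The distinction drawn here, between expected bumps and memory that never comes back, can be checked mechanically on the logged samples. The sketch below is one possible heuristic, not something from the original program: `heapTrendingDown` and its tolerance parameter are hypothetical. It compares the low-water mark of the older half of a sample window against that of the newer half.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical heuristic: report a downward trend when the lowest free-heap
// reading in the newer half of the window is more than `toleranceBytes`
// below the lowest reading in the older half. Temporary dips that recover
// leave both low-water marks close together.
bool heapTrendingDown(const std::vector<uint32_t>& samples,
                      uint32_t toleranceBytes) {
    assert(samples.size() >= 2);  // need at least one sample per half
    const std::size_t half = samples.size() / 2;
    const uint32_t olderMin =
        *std::min_element(samples.begin(), samples.begin() + half);
    const uint32_t newerMin =
        *std::min_element(samples.begin() + half, samples.end());
    return newerMin + toleranceBytes < olderMin;
}
```

With one sample every 10 seconds and drops of 100B to 1000B per sample, a window of a few dozen samples and a tolerance of around 1KB would be a reasonable starting point, though both numbers are guesses to tune against the actual graph.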

@PerryBebbington I have been doing this. I got rid of a leak that would start right away, coming from restarting a client connection in a function, but there still seems to be something else. I put my code up on my github. GitHub - jeffpkamp/Esp8266-thermostat

@aarg Thanks for the pointer to the debugger. I'll take a look at that.

I thought it was the amount of used memory, but it is the other way around ! Well, then it is indeed a typical memory leak.

Maybe not much use, but I had a circuit with an 8266 wirelessly controlling an outside light - would run fine for several weeks, then would keep failing. If I switched it off/on it would be fine again for a few more weeks.
I never bottomed it out and junked it in the end. Do feel your pain, but wondered if it was something to do with the comms side rather than what was, in my case, pretty simple short code.

I guess you could cheat and add a watchdog timer ....*

  • ...,I’ll get my coat

problem may be in a library, not your code

jeffpkamp:
I have stripped out all the Strings from the program as I have heard they can be problematic

I don't think you have. Consider this little fragment for instance:

  if (piServer.connected() && piServer.readStringUntil(char(5)) == "Success") {

Or more blatantly:

      piServer1.print("{\"name\":\"" + String(data.myName) + "\"}");

You can often get away with using Strings on something with more RAM like an ESPxxx, but perhaps not in this case.
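For illustration, both fragments can be written without String. This is a sketch under assumptions, not the thread author's code: `formatNameJson` and `isSuccessReply` are hypothetical helpers, and the reply is assumed to have been read into a char buffer already, for example with Stream::readBytesUntil, which returns the number of bytes stored and does not null-terminate the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Builds {"name":"<name>"} in a caller-supplied buffer instead of
// concatenating String objects. Returns true only if the whole text fit.
// Note: the name is copied as-is, with no JSON escaping.
bool formatNameJson(char* out, std::size_t outSize, const char* name) {
    assert(out != nullptr && name != nullptr && outSize > 0);
    const int n = std::snprintf(out, outSize, "{\"name\":\"%s\"}", name);
    return n >= 0 && static_cast<std::size_t>(n) < outSize;
}

// Compares a received reply of `length` bytes against "Success" without
// requiring a terminating NUL in the reply buffer.
bool isSuccessReply(const char* reply, std::size_t length) {
    static const char expected[] = "Success";
    return length == std::strlen(expected) &&
           std::memcmp(reply, expected, length) == 0;
}
```

Neither function allocates from the heap, so whatever they replace can be ruled out as a source of fragmentation.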

@hammy I think I've had a few in the past that might have done this over very long periods too... This is the first time I have ever truly looked at the memory, because I discovered the debug feature and started seeing OOM errors. That was when I was really taxing this thing by making it do SSL, but the problem appears to have followed me :(. I have thought about adding an if statement so that if the memory gets below 12K it just resets. I'd be okay if this happened every week or two, but I'm keen to figure out what is making it do this every 5-30 minutes.

@gcjr That's my fear, but I've used nearly all of these libraries for a few years with minimal reliability issues.

@wildbill I did get rid of the bottom one, or rather had it commented out, and still had the problem. I guess I can replace the "readStringUntil" with "readBytesUntil". I don't think that's it, but then again I have no clue at this point, so why not try it all :).

@PerryBebbington I have been doing this.

OK, well, my suggestion was based on the limited information supplied at the time. I have nothing useful to add so I'll leave you to cleverer people than me.

Good luck, I know these kinds of problems can be a right pain.

Depending on what that readStringUntil gets, it does have the potential to make holes in your heap. I haven't looked at what the library does, but if it reads something long, I would expect that the String library would need to allocate memory more than once, leaving orphan chunks of memory behind.

That should fix itself as those chunks can be reused but if there are other String operations happening too, then between them, they can cause fragmentation.

I would go through that code and remove Strings until your leak is stopped.

hammy:
Maybe not much use, but I had a circuit with an 8266 wirelessly controlling an outside light - would run fine for several weeks, then would keep failing. If I switched it off/on it would be fine again for a few more weeks.
I never bottomed it out and junked it in the end. Do feel your pain, but wondered if it was something to do with the comms side rather than what was, in my case, pretty simple short code.

Sounds like a millis rollover.
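For context, the 32-bit millis() counter wraps back to zero after roughly 49.7 days, and timing code that compares absolute timestamps can stop working at that point. The usual rollover-safe pattern is to subtract first and compare the elapsed time. A minimal sketch, where `intervalElapsed` is a hypothetical name and the timestamps are assumed to come from millis():

```cpp
#include <cstdint>

// Rollover-safe interval check. Unsigned subtraction wraps modulo 2^32, so
// (now - start) is the true elapsed time even when `now` has wrapped past
// zero, as long as the real elapsed time is under about 49.7 days.
bool intervalElapsed(uint32_t now, uint32_t start, uint32_t interval) {
    return static_cast<uint32_t>(now - start) >= interval;
}

// The fragile form, shown for contrast: `start + interval` can itself wrap,
// which makes the comparison fire immediately or not for a very long time.
bool naiveDeadlinePassed(uint32_t now, uint32_t start, uint32_t interval) {
    return now >= start + interval;
}
```

Whether this was the cause of the outside-light failures cannot be confirmed from the thread; it is simply the pattern to check for in code that stops working after weeks.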

hunt down a memory leak in a program I have.

Back when I took such things seriously, we heavily instrumented malloc() and free() so that a user could do "show memory" and see not only how much was allocated and how many blocks of each size there were, but also who had allocated each block. Naturally, this made malloc() more expensive and higher-overhead, but it was worth it. (sigh, p = malloc_named(size, "packetBuffer"); only cost one extra pointer...)
Most of the libc malloc() code out there these days doesn't have any visibility into its internals, which is really depressing. (Just being able to see "oh look, there are 100 instances of 40-byte malloc()ed blocks" is incredibly useful, and I think is even obtainable from the existing data structures. But not easily.)
It does help to have a multiprocessing, multi-user kernel...
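A rough idea of what such instrumentation can look like is sketched below. It is an illustration only, not the implementation described above: the header layout, `free_named`, and `bytesOwnedBy` are all assumptions, and this version spends more than one extra pointer per block so that live blocks can be walked.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Bookkeeping stored in front of every payload. The alignas keeps the
// payload that follows the header suitably aligned for any type.
struct alignas(std::max_align_t) BlockHeader {
    const char* owner;   // label supplied by the caller, e.g. "packetBuffer"
    std::size_t size;    // payload size in bytes
    BlockHeader* prev;   // doubly linked list of live blocks
    BlockHeader* next;
};

static BlockHeader* g_liveBlocks = nullptr;  // head of the live-block list

// Allocates `size` bytes tagged with `owner`; returns nullptr on failure.
void* malloc_named(std::size_t size, const char* owner) {
    auto* h = static_cast<BlockHeader*>(std::malloc(sizeof(BlockHeader) + size));
    if (h == nullptr) return nullptr;
    h->owner = owner;
    h->size = size;
    h->prev = nullptr;
    h->next = g_liveBlocks;
    if (g_liveBlocks != nullptr) g_liveBlocks->prev = h;
    g_liveBlocks = h;
    return h + 1;  // payload starts right after the header
}

// Releases a block obtained from malloc_named and unlinks it from the list.
void free_named(void* p) {
    if (p == nullptr) return;
    BlockHeader* h = static_cast<BlockHeader*>(p) - 1;
    if (h->prev != nullptr) h->prev->next = h->next; else g_liveBlocks = h->next;
    if (h->next != nullptr) h->next->prev = h->prev;
    std::free(h);
}

// "show memory" in miniature: total live payload bytes for one owner label.
std::size_t bytesOwnedBy(const char* owner) {
    std::size_t total = 0;
    for (const BlockHeader* h = g_liveBlocks; h != nullptr; h = h->next)
        if (std::strcmp(h->owner, owner) == 0) total += h->size;
    return total;
}
```

This only sees allocations routed through malloc_named, so memory allocated inside libraries stays invisible unless they are changed to use it too.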

So I got tired of blocking off one chunk at a time and having everything leak memory, so I blocked everything off except the part of the loop that reports memory every 5 seconds. This did not leak memory over 2 hours. I then unblocked a small portion of the function around it, and it started leaking memory. So I went through the small portion to figure out who the culprit was, and to my surprise it is a sprintf function.

snprintf(bigbuff, 9500, "{\"Temp\":%0.2f,\"State\":%d,\"Heat\":%d,\"Cool\":%d,\"Memory\":%d,\"Source\":\"%s\",\"Humidity\":%0.2f}", Temperature, Status, Heat, Cool, ESP.getFreeHeap(), Source, Humidity);

bigbuff is a global variable I use to hold anything that I was using String for.

char bigbuff[9500];

Anyone have any idea why sprintf (or in this case snprintf ) is leaking memory? From my understanding (and from looking at the results from the compiler) bigbuff (and the other global variables) should be allocated at the start of the program and never be removed from the heap, and not lead to memory leaks. I guess the sprintf function must have to do something to convert the numbers to strings?

Sprintf can act badly if the parameters you pass don't match the data types specified in the format string. Also, the Arduino version doesn't handle float. Try printing the result.