
Topic: SOLVED - IDE-Dependent RAM Problems and Different Global Variable Calculations

LukeZ

Feb 07, 2017, 12:33 am Last Edit: Feb 27, 2017, 02:28 am by LukeZ Reason: Issue resolved
I've been having some odd issues with a program that, to me, imply RAM corruption (which, given the size of the project, is entirely possible). However, the problem only occurs when the sketch is compiled with Arduino IDE 1.6.10 or later (I've tested every version up to 1.8.1); the same sketch has no issues whatsoever with 1.6.9 or earlier.

Regardless of which version of the IDE I use, the code compiles just fine so there is no glaring code mistake, unless the compiler is missing it.

A curious difference that I have noticed involves the RAM calculation message returned by the IDE. When I compile in 1.6.9 the message at the bottom of the screen says:

Global variables use 4,910 bytes (59%) of dynamic memory, leaving 3,282 bytes for local variables. Maximum is 8,192 bytes.

But when I compile the exact same sketch in 1.6.10 or later, the message changes and now it calculates Global variable use at only 4,372 bytes (53%).
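For reference, the numbers in these two messages can be reproduced with simple integer arithmetic. This is a minimal sketch, not the IDE's actual code: ramReport() is a hypothetical helper, and it assumes the IDE truncates the percentage rather than rounding it, which matches every figure quoted in this thread.

```cpp
#include <cassert>

// Figures behind the IDE's "Global variables use ..." message.
struct RamReport {
    unsigned used;       // bytes of global/static data (roughly .data + .bss)
    unsigned percent;    // integer-truncated percentage (assumed)
    unsigned remaining;  // bytes "left for local variables"
};

// Hypothetical helper: derive the three reported numbers from the
// global-variable size and the chip's SRAM size.
RamReport ramReport(unsigned globalsBytes, unsigned maxBytes) {
    assert(maxBytes > 0 && globalsBytes <= maxBytes);
    return RamReport{globalsBytes,
                     globalsBytes * 100u / maxBytes,
                     maxBytes - globalsBytes};
}
```

With an 8,192-byte maximum, ramReport(4910, 8192) gives 59% and 3,282 bytes remaining, while ramReport(4372, 8192) gives 53% and 3,820 bytes, so the second build reports 538 fewer bytes of global variables.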

Using the MemoryFree utility at runtime also shows differences in the same sketch compiled under different IDEs. In the loop where the crashes normally occur, printing freeMemory() when compiled from 1.6.9 shows 3,022 bytes free, but when compiled with 1.6.10 it returns 3,534 bytes free.
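freeMemory() measures something different from the IDE's static figure. The widely circulated implementation takes the address of a local variable (which sits near the top of the stack) and subtracts the end of the heap: __brkval if malloc() has been used, otherwise __heap_start; some versions also add the size of malloc's free list, which is ignored here. The host-side model below is a hypothetical sketch of that subtraction with the addresses passed in as plain numbers, so its result depends both on the size of the static data (which sets where the heap starts) and on how deep the stack is at the moment of the call.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of a MemoryFree-style freeMemory() (hypothetical).
// On the AVR the arguments would be the address of a local variable,
// __brkval and &__heap_start; here they are plain numbers.
std::uint16_t freeMemoryModel(std::uint16_t stackAddr,
                              std::uint16_t brkval,
                              std::uint16_t heapStart) {
    // brkval == 0 means malloc() has never grown the heap, so the heap
    // still ends where it starts: right after the static data.
    const std::uint16_t heapEnd = (brkval == 0) ? heapStart : brkval;
    assert(stackAddr >= heapEnd && "stack has collided with the heap");
    return static_cast<std::uint16_t>(stackAddr - heapEnd);
}
```

On an ATmega2560 the internal SRAM spans 0x0200 to 0x21FF, so with 4,910 bytes of static data and nothing on the heap, a hypothetical local variable at the very top of RAM would give freeMemoryModel(0x21FF, 0, 0x0200 + 4910), which is 3,281 bytes. Any stack in use at the point of the call lowers the reading further.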

Although I was of course originally ecstatic to think I had instantly gained half a K of RAM, with no effort on my part, simply by upgrading the IDE, the resulting problems remind me there is no such thing as a free lunch. I was actually happy with our 1.6.9 RAM usage, but it would be nice to track down what is happening and whether I can correct something in my program so that it works reliably with later versions of the IDE.


Information about my project
Naturally to even encounter these sorts of issues we have gone a bit further than the average user. Our board is our own hardware but uses an ATmega2560 processor. We are using a custom bootloader and a custom entry in Arduino's boards.txt file. The sketch and related C++ libraries consist of about 40,000 lines of code, so I won't post it all here.

You can however see or download the firmware from our GitHub here: Open Panzer Project - TCB.

If you scroll down on that page and read the help document, you can see the custom boards.txt entry. Even without that entry or our custom bootloader, though, the code should compile just fine for a stock Arduino Mega.

All my testing is being done on Windows 7 Professional 64 bit.


More detailed information about the problem
The problem that I encounter with 1.6.10 and later is that the board either locks up, or reboots several times and eventually locks up. The time between lock-up/reboot seems random but is usually within the first 5-10 seconds or so. It always occurs within this do-loop: void Setup() lines 411-438.

This loop runs indefinitely while the board attempts to detect a signal from an RC receiver. Oddly, if a receiver is detected and the sketch proceeds to the main loop, the rest of the program seems to run just fine without lockups, regardless of which IDE the sketch was compiled with. But if no receiver is connected and the sketch sits waiting in that loop, a build from IDE 1.6.9 or earlier can stay there for hours with no problem, while a build from IDE 1.6.10 or later locks up or reboots within 5-10 seconds.

The specificity of this behavior also suggests there could very well be a code issue. But since no IDE reports an error, since the same receiver detection process can run later in the main loop without problems, and since the observed behavior is a random board crash rather than an identifiable code bug, I am leaning towards something to do with RAM.



pert

The reason for the change in memory usage is that LTO (Link Time Optimization) was added in Arduino AVR Boards 1.6.12, the version included with Arduino IDE 1.6.10. This is not really specific to the IDE version, because you could install Arduino AVR Boards 1.6.12 or newer in previous Arduino IDE versions.

LTO is a pretty aggressive optimization, and I have seen reports of it causing problems, but always at compile time rather than at run time as with your problem. To determine whether the issue is caused by LTO, you can disable LTO by reverting the following changes in Arduino_Dir\hardware\arduino\avr\platform.txt:
https://github.com/arduino/Arduino/commit/4cf3278ee0629151cedef6445e02722da1879737
If the issue no longer occurs then that will indicate it's caused by LTO. If it still occurs it indicates the problem was caused by another change made between Arduino IDE 1.6.9 and 1.6.10.

On an unrelated note, I recommend that you create a hardware package for your custom board and bootloader. The reason is that the changes to the Arduino AVR Boards package described in the instructions in your repository will be lost any time the user updates to a new version of the Arduino IDE or of Arduino AVR Boards. The instructions also won't work if the user has previously updated Arduino AVR Boards via Boards Manager, because the active version of Arduino AVR Boards will then no longer be located at Arduino_Dir\hardware\arduino\avr\. A package will actually make installation much easier, especially if you add Boards Manager installation support. I have a good deal of experience with this process and would be happy to help if you like.

LukeZ

Pert, thanks for the fast and informative response. I think we are on to something. I uninstalled 1.6.9, installed 1.8.1, and reverted platform.txt as shown in the GitHub link. The RAM calculation is now very close (within 2 bytes) to what it was in 1.6.9:

Global variables use 4908 bytes (59%) of dynamic memory, leaving 3284 bytes for local variables

At first I thought the problem had been solved: the board boots up, enters the loop waiting for a receiver, and sits there without error for quite some time (there are some blinking LEDs in this loop, so it is easy to tell that's where the code is). However, with some more patience I noticed that the board still reboots fairly reliably, anywhere between the 1 and 3 minute mark. After a random number of reboots, anywhere from 3 to 5, it will often lock up completely. I could keep staring at it longer, but over the last half hour this has been the pattern.

Perhaps I should have tried only 1.6.10 rather than jumping straight to 1.8.1, since there could have been other changes in the later IDE versions.

Clearly eliminating LTO makes a significant difference and therefore must be related, but ultimately that alone does not solve the problem.

On your other note I suspected such a task was in my future, but the times I have searched about it I found no real tutorial or How-To, other than the ones describing the old method using boards.txt. I certainly would be grateful for any guidance on that topic you would be willing to share, but perhaps it deserves its own thread for the benefit of others. And perhaps I should get to the bottom of this IDE thing first...

LukeZ

Ok, I went back and installed 1.6.10 and removed LTO. This time it compiles with the exact same number of bytes of global variables used as in 1.6.9 (4,910 bytes).

Behavior, however, is similar to what I described with 1.8.1 without LTO. I just sat here and watched it for 10 minutes. It rebooted after 2 minutes, again after 1 minute, and then ran a full 7 minutes unfazed before it finally locked up for good.

Obviously this is better than 5-10 seconds with LTO, but something else is going on since ultimately corruption still occurs.

pert

OK, so we can rule out LTO. There are two more simple tests you can do to narrow down the cause further. Do one at a time and check whether the reboot still occurs after each:

Using Arduino IDE 1.6.10, do this:
  • Tools > Board > Boards Manager
  • Wait for downloads to complete
  • Click on Arduino AVR Boards
  • Select 1.6.11 from the drop-down menu (this is the Arduino AVR Boards version included with Arduino IDE 1.6.9).
  • Click "Install"
  • Wait for installation to complete
  • Click "Close"
  • Upload your sketch

The above will determine if the problem is caused by the Arduino IDE itself (arduino-builder) or one of the libraries bundled with the Arduino IDE.


  • Install Arduino IDE 1.6.9
  • Tools > Board > Boards Manager
  • Wait for downloads to complete
  • Click on Arduino AVR Boards
  • Select 1.6.12 from the drop-down menu (this is the Arduino AVR Boards version included with Arduino IDE 1.6.10).
  • Click "Install"
  • Wait for installation to complete
  • Click "Close"
  • Upload your sketch

The above will determine whether the issue is caused by the Arduino AVR Boards 1.6.12 core libraries or bundled libraries.

Note that hardware packages are installed to the Arduino15 folder on your computer. Thus the files at Arduino_Dir\hardware\arduino\avr\ will no longer be used. This can be very confusing when you're editing files in Arduino_Dir\hardware\arduino\avr\ and seeing no effect.

Quote from LukeZ: "On your other note I suspected such a task was in my future, but the times I have searched about it I found no real tutorial or How-To, other than the ones describing the old method using boards.txt. I certainly would be grateful for any guidance on that topic you would be willing to share, but perhaps it deserves its own thread for the benefit of others. And perhaps I should get to the bottom of this IDE thing first..."
Sounds good, we'll work on one thing at a time.

LukeZ

Thanks again, this was a productive test. And I'd have had no idea about the Arduino15 folder so I'm glad you mentioned that.

Arduino IDE 1.6.10 with AVR Boards 1.6.11
- Global variables use 4,910 bytes (59%)
- No reboots or lockups detected after running for half an hour

Arduino IDE 1.6.9 with AVR Boards 1.6.12
- Global variables use 4,372 bytes (53%)
- Constant rebooting every 5-10 seconds with inevitable lock-up after several cycles

So it would seem the IDE itself is not the problem, since we can get the program to run with 1.6.10 and we can get it to fail with 1.6.9.

Our project doesn't reference any Arduino libraries, only ones we've written ourselves (plus one third-party library, EEPROMex). I'm blissfully ignorant of most things to do with the core libraries. We're definitely using Hardware Serial, but perhaps other parts of the core as well.

pert

Feb 09, 2017, 12:51 am Last Edit: Feb 09, 2017, 09:16 am by pert Reason: Correct compiler versions
OK, so now we know the problem is with Arduino AVR Boards, not the IDE, and is caused by something that changed between Arduino AVR Boards 1.6.11 and Arduino AVR Boards 1.6.12. This narrows down the possible causes to:

There are a couple other things that changed but I don't think they could be the cause:
  • AVRDUDE version - this should only affect success of upload/burn bootloader, not the function of the program
  • Arduino AVR Boards bundled libraries (Arduino_Dir\hardware\arduino\avr\libraries) - I don't see any of these libraries used in your code

LukeZ

Thanks for the response Pert. It looks like part of your post got cut off, and I'd be interested to hear the rest of this sentence: "The possibly relevant ones are: ... "

But I did go through the changes on that page since May 10. I see some USB changes, which, as you say, I presume are irrelevant. There are also some changes related to String objects, and we do use a String object in two places. As a test I commented out all references to the String object and recompiled (with 1.6.10 and Boards 1.6.12), and I still get lockups. They now occur at a different location, but the program still crashes.

If there's something else useful on that GitHub page it's eluding me.

You mentioned two things you didn't think should be the cause, one of which is bundled libraries. In fact our project does reference the twi library, although we don't really need it. Here again I did a test with the library removed, but still got lockups.

I'm not sure what the ramifications would be of the avr-gcc change, or how I would go about troubleshooting what in our code could be causing problems.




LukeZ

I didn't think our custom bootloader should make a difference either but just to be sure I loaded the stock Arduino Mega bootloader on my board and verified that my code still locks up with Boards 1.6.12 and still works just fine with Boards 1.6.11. So I'm crossing that off the list.

pert

Quote from LukeZ: "Thanks for the response Pert. It looks like part of your post got cut off, and I'd be interested to hear the rest of this sentence: "The possibly relevant ones are: ... ""
Well that's frustrating. I had a nested list of all the commits and did a preview to be sure they would show up correctly but the forum must have eaten the nested list when I posted. However, it was just all the non-USB related commits after May 10.

I have two tests for you to figure out whether the bug is caused by the new compiler version or by changes in the core libraries:

Test 1 (core libraries): Replace the core library files from Arduino AVR Boards 1.6.11 with the ones from Arduino AVR Boards 1.6.12, then test with the modified Arduino AVR Boards 1.6.11. The core library files are located at {Arduino IDE installation folder}\hardware\arduino\avr\cores\arduino if you're using the Arduino AVR Boards included with the Arduino IDE, or at Arduino15\packages\arduino\hardware\avr\{version number}\cores\arduino if you're using Arduino AVR Boards installed via Boards Manager. The Arduino15 folder location is shown in File > Preferences, on the line after "More preferences can be edited directly in the file".

Test 2 (compiler):
  • Use Arduino IDE 1.6.10 with Arduino AVR Boards 1.6.11 installed via Boards Manager, because having the package located in the Arduino15 folder makes this a little easier.
  • Download the correct version of avr-gcc for your OS:
    - ARM Linux: http://downloads.arduino.cc/tools/avr-gcc-4.9.2-atmel3.5.3-arduino2-armhf-pc-linux-gnu.tar.bz2
    - Mac: http://downloads.arduino.cc/tools/avr-gcc-4.9.2-atmel3.5.3-arduino2-i386-apple-darwin11.tar.bz2
    - Windows: http://downloads.arduino.cc/tools/avr-gcc-4.9.2-atmel3.5.3-arduino2-i686-mingw32.zip
    - Linux 32 bit: http://downloads.arduino.cc/tools/avr-gcc-4.9.2-atmel3.5.3-arduino2-i686-pc-linux-gnu.tar.bz2
    - Linux 64 bit: http://downloads.arduino.cc/tools/avr-gcc-4.9.2-atmel3.5.3-arduino2-x86_64-pc-linux-gnu.tar.bz2
  • Delete the files in Arduino15\packages\arduino\tools\avr-gcc\4.8.1-arduino5
  • Copy the files from the top avr folder of the downloaded file to Arduino15\packages\arduino\tools\avr-gcc\4.8.1-arduino5

LukeZ

Ok, thanks Pert for your willingness to stick with this. I went through your suggestions:

First, install Arduino 1.6.10 and revert to boards 1.6.11. No problems with this setup as previously confirmed.

Now overwrite the Boards 1.6.11 cores with the Boards 1.6.12 cores, but leave everything else the same. For what it's worth, program memory usage in this configuration came out 8 bytes higher; calculated global variables remained precisely the same. Flash to board, let it run 10 minutes: absolutely no reboots or lockups.

Revert back to Boards 1.6.11 cores just for the fun of it. So we are now back to our starting point for this test.

Download avr-gcc 4.9.2. Delete avr-gcc 4.8.1 from the Arduino tools folder and replace it with 4.9.2.

Compile. Interestingly, program memory is now calculated to take almost 700 bytes more, but global variables remain unchanged. Flash to board. It reboots several times (less frequently now, but it still happens) before the eventual and inevitable lockup.

It would seem then you have enabled me to narrow this down to the compiler. Somehow I suspect that is going to be the most difficult to pinpoint... but we're making progress! (I hope!)
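The elimination logic behind these tests (change one component, keep everything else fixed, and compare a passing build with a failing one) can be sketched as below. The Trial struct and implicated() helper are hypothetical names for illustration, and the sample data is just the three configurations reported in the post above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One build configuration tried on the board, and whether it ran reliably.
struct Trial {
    std::string ide;       // Arduino IDE version
    std::string core;      // Arduino AVR Boards core files
    std::string compiler;  // avr-gcc version
    bool passed;
};

// Names of the components that differ between two trials.
std::vector<std::string> differences(const Trial& a, const Trial& b) {
    std::vector<std::string> diff;
    if (a.ide != b.ide) diff.push_back("ide");
    if (a.core != b.core) diff.push_back("core");
    if (a.compiler != b.compiler) diff.push_back("compiler");
    return diff;
}

// A component is implicated when a passing and a failing trial differ in
// that component and nothing else; pairs differing in several components
// prove nothing on their own and are skipped.
std::set<std::string> implicated(const std::vector<Trial>& trials) {
    std::set<std::string> result;
    for (const Trial& pass : trials) {
        if (!pass.passed) continue;
        for (const Trial& fail : trials) {
            if (fail.passed) continue;
            const std::vector<std::string> diff = differences(pass, fail);
            if (diff.size() == 1) result.insert(diff.front());
        }
    }
    return result;
}
```

Feeding in the 1.6.11 core with avr-gcc 4.8.1-arduino5 (pass), the 1.6.12 core with avr-gcc 4.8.1-arduino5 (pass), and the 1.6.11 core with avr-gcc 4.9.2-atmel3.5.3-arduino2 (fail), all under IDE 1.6.10, leaves only "compiler" implicated.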

Again I really appreciate your help. Lots of places I wouldn't even get a response to this sort of thing.

pert

I agree that figuring out why the new avr-gcc version causes a problem is tricky, and I don't really know how to proceed. However, you have at least significantly narrowed down the cause, which is a big step towards a solution. You could work around the issue by creating a Boards Manager package that specifies a tool dependency on avr-gcc 4.8.1. When the user installs that package, avr-gcc 4.8.1 will be installed and used to compile for the TCB, even though avr-gcc 4.9.2 may be in use for the installed version of Arduino AVR Boards. I think it would be much better, though, to find out how to make the code compatible with the modern avr-gcc version.

Maybe someone else here can chime in with some advice.

I think the next step would be to determine the minimum code that will still cause the issue.

LukeZ

Quote from pert: "I think the next step would be to determine the minimum code that will still cause the issue."
I'm willing to try this, but it would need a strategic approach rather than going in blind. Otherwise, given the size of our project and the randomness of the errors, I think one could easily waste vast amounts of time without arriving at any useful conclusions. Even small changes to the code seem to shift the location of the reboots, which themselves may not occur for several minutes. I am fortunate that right now the location where they occur is very noticeable, but elsewhere it would not be. Naturally, removing portions of code itself breaks the program, so those breakages have to be worked out and distinguished from the compiler-specific errors. It all becomes a very big mess quickly.

And who is to say that the very size of our code base isn't the problem, rather than the code itself? How many Arduino projects compile to over 100k bytes?

What I would really need are likely culprits to start looking at, or some kind of hunch that might lead me to focus my attention in some place more likely to be problematic. Otherwise I am flailing about blind.

Would it be useful to start a new thread in this forum with a better title related to the compiler? Or perhaps over at avrfreaks.net? From browsing over there in the past my impression has been they don't like Arduino questions but I suppose if it could be shown to be a compiler issue they might not react as badly... I really don't know!


LukeZ

I am somewhat hopeless about finding a needle in a 40k-line haystack when the failure appears randomly at different places and times, with no indication from the compiler that anything is wrong.

In the meantime and as a work-around I would like to pursue creating a custom board that will specify the earlier compiler even if the user is working on the new IDE, for which I've started a new thread.

LukeZ

Ok, after much weeping and gnashing of teeth and countless hours of despair, I did discover the needle in the haystack.

In one library I had an interrupt enabled (Timer 1 Output Compare C) but no ISR defined; that would have been "ISR(TIMER1_COMPC_vect)". This was of course a mistake. The line that enabled the interrupt was a holdover from earlier testing, when the interrupt was required. Later the code changed and it was no longer needed; the ISR was removed, but the interrupt was still enabled in the class begin() function:
TIMSK1 |= (1 << OCIE1C);
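The line above is a read-modify-write that sets the OCIE1C bit (Timer 1 Output Compare C interrupt enable) in TIMSK1 without touching the other bits. On the board itself the matching fix is to delete that line, to clear the bit with TIMSK1 &= ~(1 << OCIE1C);, or, if the interrupt really should stay enabled, to give it a do-nothing handler with avr-libc's EMPTY_INTERRUPT(TIMER1_COMPC_vect); from <avr/interrupt.h>. The sketch below only models the bit manipulation on a host computer: timsk1 is a plain variable standing in for the register, and the bit positions are assumed to match the ATmega2560 datasheet.

```cpp
#include <cassert>
#include <cstdint>

// Host-side stand-in for the TIMSK1 register (hypothetical). On the real
// ATmega2560, TIMSK1 and the bit names come from <avr/io.h>.
std::uint8_t timsk1 = 0;

// Bit positions assumed to match the ATmega2560 datasheet.
constexpr unsigned TOIE1 = 0, OCIE1A = 1, OCIE1B = 2, OCIE1C = 3, ICIE1 = 5;

// Set one enable bit, leaving the others alone (same idea as |= (1 << bit)).
void enableBit(unsigned bit) { timsk1 |= static_cast<std::uint8_t>(1u << bit); }

// Clear one enable bit, leaving the others alone (same idea as &= ~(1 << bit)).
void disableBit(unsigned bit) { timsk1 &= static_cast<std::uint8_t>(~(1u << bit)); }

bool bitEnabled(unsigned bit) { return (timsk1 >> bit) & 1u; }
```

Enabling OCIE1A and OCIE1C gives 0x0A; clearing OCIE1C afterwards leaves 0x02, so the Compare A interrupt is unaffected.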

Now I don't know whether the compiler is supposed to warn you about this, but apparently it typically does not. That made sense to me: if you don't define the ISR, I would think it is just auto-created in the background as a blank function that does nothing.
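According to the avr-libc documentation, an enabled interrupt with no handler is not turned into a blank function: the vector falls through to __bad_interrupt, whose default action is to jump to the reset vector. That is consistent with the reboots described earlier in the thread, and because a jump to address 0 restarts the program without restoring the peripheral registers, it can also leave the board in a bad state. avr-libc lets you replace that catch-all with ISR(BADISR_vect). The vector entries in avr-libc's startup code are weak aliases, so a strong ISR definition in any object file the linker pulls in replaces the default, which could be one way a handler from another library ends up being used. The host-side model below is a hypothetical illustration of that dispatch, not AVR code: the namespace, the table size, and all function names are made up.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Host-side model of an interrupt vector table (hypothetical).
namespace model {

using Handler = void (*)();

int resets = 0;        // times the model "jumped to the reset vector"
int compareCHits = 0;  // times a real handler ran

// Stand-in for avr-libc's __bad_interrupt, whose default action is to
// jump to the reset vector instead of returning.
void badInterrupt() { ++resets; }

constexpr std::size_t kVectorCount = 8;  // tiny table, not the real count

// Every slot starts out pointing at the default handler, mirroring the
// weak aliases in avr-libc's startup code.
std::array<Handler, kVectorCount> makeDefaultTable() {
    std::array<Handler, kVectorCount> table{};
    table.fill(&badInterrupt);
    return table;
}
std::array<Handler, kVectorCount> vectors = makeDefaultTable();

// Plays the role of the linker replacing a weak alias with a strong ISR().
void installHandler(std::size_t vec, Handler h) {
    assert(vec < kVectorCount);
    vectors[vec] = h;
}

// The hardware only dispatches an interrupt whose enable bit is set.
void trigger(std::size_t vec, bool enabled) {
    assert(vec < kVectorCount);
    if (enabled) vectors[vec]();
}

void compareCHandler() { ++compareCHits; }

}  // namespace model
```

Triggering an enabled vector with no installed handler increments resets; once installHandler() has supplied compareCHandler, the same trigger runs the handler instead, and a disabled vector does nothing either way.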

What I didn't expect was that the compiler (or IDE?) would go off and search other libraries in my libraries folder, even ones that weren't #include-ed in my sketch, find a different library that did define that ISR vector, and use the code within it instead. But that is what seems to have happened, and as far as I can tell both versions of the compiler did the same thing, so perhaps the issue was not the compiler after all, but the IDE or something else. Runtime behavior was obviously going to be undefined whenever that interrupt fired, but for some reason the undefined behavior worked out OK with one compiler and sent code execution wandering off into oblivion with the other. Again, I don't think that's the compiler's fault, but it made it appear as if one version of the compiler worked while the other did not.

How did I discover that this unrelated library was being roped in? When I removed that library from my sketchbook\libraries folder, the compiler complained that the ISR was undefined, even though that library wasn't included! When I returned the library to the libraries folder, the compiler went quiet (again, even though nowhere in my sketch was this library #include-ed).

Now, to be completely honest, at one point in the past this other library had been included, but it no longer was. So perhaps the IDE had kept a memory of it even though the #include statement was removed. There are all kinds of details about linkers, cmake, .o files and such that I don't know about and don't really want to know about, but perhaps the IDE was reusing one of these files rather than rebuilding the project, and was therefore keeping a reference to this other library around even though I had removed the #include.

Another thing I've noticed is that when you un-install the IDE nothing from the C:\Users\username\AppData\Local\Arduino15\ folder is deleted. It all just remains there junking up your hard-drive for eternity, and will be picked-up by the next IDE should you install a new one. Whether that has anything to do with this problem or not, I don't know.

Although I can reproduce this error reliably and repeatedly in my large project, when I try to create a simple test sketch to post here, A) the IDE never warns about a missing ISR function if you leave it out, but B) the sketch also doesn't break; the ISR just does nothing (which is what I would expect). Nor does the IDE seem inclined to adopt the ISR from an unrelated library, even if I tempt it by creating such a library, or even if I first include that tempting library and later un-include it.

So perhaps my IDE got itself goobered at some point, but I did install and uninstall every version under the sun, so you'd think it would have cleared itself up.

Anyway, I've resolved the issue on my end. In fairness it was a code oversight on my part, though not technically an error, I would think. And I can't definitively point to any error that someone else should fix or feel bad about, so I'm going to call this resolved.

But by golly do I feel sorry for you if you get yourself into a situation like this without any warning from the IDE that it's off doing naughty business or from the compiler that there might be an issue, and you have 40,000 lines of code to go digging through... 

