Memory overrun blows Arduino chip?

Ref: This older forum topic http://arduino.cc/forum/index.php/topic,25235.0.html - "Large sketch == hozed chip?"

In this question, the individual had an ATmega168 chip and had created a program that caused one of the internal fuses (the extended fuse, 0x7) to blow, leaving the chip inoperative until re-flashed.

He discovered that he was causing the issue by overrunning the available RAM:

The culprit here turned out to be my ignorance. I was trying to use way more RAM than the ATmega168 provides.

Specifically I was trying to use some relatively large arrays, which I've now learned will get pulled into RAM by default. The RAM overrun was then causing the chip to basically become unresponsive.

This was the end of that particular message thread, but it left one very important question unanswered: Why?

Why should trying to write beyond available RAM cause a hard fault in the chip? I would expect a bizarre program failure, or an error message at compile time ("No more RAM, mon!"), but a blown chip? That's weird and scary. Does that mean that if someone, perhaps even myself, accidentally attempts to use RAM past the end of physical RAM, they're risking their chip? (I.e., if I screw up a calculation and accidentally try to use more RAM than exists, I'm hosed?)

I've messed with micro-controllers before and seen some weird things happen, but this takes the cake!

Any ideas?

Thanks!

jharris1993:
In this question, the individual had an ATmega168 chip and had created a program that caused one of the internal fuses (the extended fuse, 0x7) to blow, leaving the chip inoperative until re-flashed.

He discovered that he was causing the issue by overrunning the available RAM:

No, that's impossible. The only way to change a fuse is with an ISP.

You can write to the flash memory using special instructions but it's much more complicated than writing to RAM and the bootloader section is usually protected from overwriting.

What that posting refers to is the uploaded program not working once it gets too big to fit in memory. Nothing more.
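
As a rough illustration of what "too big to fit in memory" means here: the ATmega168 has 1 KB of SRAM, and an int on AVR is 2 bytes, so an int array of a few hundred elements already overruns it. The snippet below is plain C++ meant to run on a PC, not Arduino code; the helper names and the 128-byte stack allowance are made up for the example.

```cpp
#include <cstddef>

// ATmega168: 1 KB of SRAM, shared by globals, heap and stack.
constexpr std::size_t kAtmega168SramBytes = 1024;

// Bytes an int array occupies on AVR, where int is 2 bytes wide.
constexpr std::size_t avrIntArrayBytes(std::size_t count) {
  return count * 2;
}

// True if the static data plus a (hypothetical) stack allowance fits in SRAM.
constexpr bool fitsInSram(std::size_t staticBytes, std::size_t stackAllowance) {
  return staticBytes + stackAllowance <= kAtmega168SramBytes;
}
```

With these numbers a 1000-element int array needs 2000 bytes, about twice what the chip has. As far as I know, the IDE of that era reported only the flash size after compiling, so nothing warned about this.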

Sir,

Perhaps I am mistaken, and if I am, please show me how I have misread this message.

Forgive the size of this posting, but I am going to quote the original posting at length:

I'm using a Diecimila on Windows XP, using Arduino software 0011.

TLDR version: When I upload a sketch over about 4K, the chip seems to stop responding and I have to use avrdude directly to re-burn the bootloader or upload a different hex.

Long version:
After playing with the example sketches I started fiddling on my own and have run into a problem: when I upload a program over roughly 4K, everything goes south.
What I mean by that is that the sketch uploads, but then it's as if the chip stops responding, and from that point on nothing works - I can't even upload new sketches (I get the ubiquitous 'not in sync' error).

It first happened a few days ago - I uploaded and nothing happened. I figured I'd made a programming error, made some changes to the sketch and tried to re-upload - 'not in sync' error. Some reading on this forum gave me lots of things to try in response to that error, but none seemed to work.

Luckily I had another ATmega168, so I cobbled together a parallel programmer and tried burning the bootloader to it via the Arduino IDE. I got a verification error, but lo and behold the chip functioned and I was able to upload the Blink example, and it worked.

Back to the first chip: I figured it must be hosed somehow, so I bumbled with avrdude until I figured out how to read the chip's memory and fuses. I compared the broken and working chip, and the only difference was the extended fuse (0x7 on the broken one). I changed it to match, but still no luck.

After more bumbling I learned that even though the Arduino IDE gave 'not in sync' errors, I could upload hex files directly to the broken chip using avrdude. The next step was of course to burn the bootloader via avrdude directly, which worked! Now the 'broken' chip seemed to be working again! I hopped back into the Arduino IDE and uploaded Blink, and it worked like a charm.

Armed with a way to 'un-break' the chip I set about experimenting and the conclusion is that once the sketch gets to be a certain size, the chip stops responding. That 'certain size' doesn't seem to be constant though, so I'm fairly sure the root cause is something else. The time I actually fiddled long enough to nail it down, it was 4828 bytes. 4826 worked as expected, 4828 equaled a seemingly hosed chip - LED on pin 13 wouldn't light and the IDE could no longer upload to it.

As a test I took the Blink example and added a big integer array (referenced to keep it from being optimized away) and got the same behavior - after roughly 4K the chip goes south. Oh, and I tried using both ATmega168s, and both had the problem.

So, any ideas on what could be going wrong, or anything I could do to troubleshoot the problem? I'm still really new to all this, so any help is much appreciated.

His subsequent reply:

The culprit here turned out to be my ignorance. I was trying to use way more RAM than the ATmega168 provides.

Specifically I was trying to use some relatively large arrays, which I've now learned will get pulled into RAM by default. The RAM overrun was then causing the chip to basically become unresponsive.

By using the PROGMEM directive to keep some data in program memory only and out of RAM, things are running as they should again.

(Emphasis in both quotes was provided by me.)

Correct me if I am wrong, but as I read it, exceeding memory bounds caused the chip to go totally unresponsive (i.e. "bricked"), which he could only resolve by using another board as a PIC/JTAG-type programmer to re-flash the chip.

This does not sound like a simple memory bounds issue to me.

As I see it, there are a few possibilities:

  • There are certain reserved addresses, located beyond the end of physical RAM, that if written to can cause special and wonderful things to happen.

  • There is/was a firmware issue that caused this when large programs were written.

  • There was a defect in the chip, or in the IDE, that caused this problem.

Or, maybe even something else.

Which brings me back to that musical question: Why? Is there some issue that we, as programmers, need to know about to avoid bricking our own boards?

Thanks again for all your help.

There is of course another option: the issue was caused by another factor entirely. Running out of RAM isn't hard to do, and I think you'd find many who have done so (myself included, many times) without harming a chip. I have not tried to reproduce the claim as documented, but since his proposition is trivial to test, why don't you experiment for yourself and post your findings and test code back here?

I think it will set your mind at ease.

Cheers! Geoff

When I first joined this forum a little over three years ago, there were always a lot of posts from people whose Arduino boards could no longer upload sketches. In lots of cases reburning the bootloader got them functional again. But there never seemed to be any progress on identifying the root cause (or causes) of this kind of failure. At the time I put a question to the software gurus: could anyone create a sketch in the IDE that, once compiled and uploaded to a board, made it impossible to upload any other sketch to that board again? I don't believe anyone came up with an example, but that's not saying it's impossible, I guess. As I said, many have recovered from their fault, whatever the cause, by just reburning the bootloader. Burning a bootloader also sets the fuses to their proper values, in addition to installing the bootloader code.

So while the symptom goes back a long time, I don't know if we have ever identified the source of, or reasons for, bootloader code or fuse values being corrupted once loaded correctly.

Lefty

Lefty,

Ahhh! So this has history, eh? Glad I re-opened that can of worms.

The source of the original post I mentioned claimed to have modified the "Blink" code to cause this problem repeatably. Maybe I should give him a shout, grab his code, post it here, and let's all have a crack at it. If we get reasonably consistent repeatability, then we can toss this at the developers / hardware gurus and let them chew on it.

What say ye?

Jim

The method that I've run into involves an interaction between the bootloader, the sketch, and
WDT reset/auto-reset, but it didn't modify any of the fuses.
For example, if the code enables the watchdog and allows a watchdog reset to occur, the older
bootloaders would end up in a hopeless continuous reset loop, because they didn't properly handle
clearing the watchdog registers. So once a WDT reset occurs, it occurs over and over again.
Depending on the bootloader, even an external reset (auto-reset) would not get out of this situation.
(A power-up reset would, but then, depending on the bootloader and the sketch, the board might
immediately fall back into the WDT situation if avrdude was not run right away, before the sketch
started after power-up.)

From the Atmel ATmega328 datasheet:

Note: If the Watchdog is accidentally enabled, for example by a runaway pointer or brown-out
condition, the device will be reset and the Watchdog Timer will stay enabled. If the code is not
set up to handle the Watchdog, this might lead to an eternal loop of time-out resets.

When I wrote a sketch that immediately set the watchdog timer to the shortest possible value in setup(),
it became very difficult to get another sketch uploaded. While it was possible, by powering up the
board and immediately running avrdude from the command line, I found it easier to simply upload a
new bootloader using the ISP interface. While a new bootloader was not needed, rewriting the
bootloader had the nice side effect of removing the offending sketch.

At one point, I offered a patch to the bootloader to fix this "bug", as well as another source-code-only
patch needed to allow the bootloader to actually be built with the new AVR toolset that was shipping
at the time, a couple of years ago. It was rejected for fear that it would break the bootloader in some other way.
(I ran it for more than a year before I switched my boards to optiboot.)
The newer optiboot bootloader does not have this issue.
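
For illustration only (this is not the patch mentioned above, whose contents are not shown in this thread), the usual way to break a watchdog reset loop on these AVRs is to clear the reset flags in MCUSR and disable the watchdog as early as possible after reset. A minimal sketch, assuming avr-libc's wdt_disable(); the function and variable names, and the non-AVR stand-ins, are hypothetical and exist only so the logic can be compiled on a PC:

```cpp
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/wdt.h>
#else
// Host stand-ins (assumptions, only for building and checking off-target).
static uint8_t MCUSR = 0x08;         // pretend WDRF (bit 3) is set after a watchdog reset
static bool hostWdtEnabled = true;   // pretend the watchdog survived the reset
static void wdt_disable() { hostWdtEnabled = false; }
#endif

// Copy of the reset flags, so later code can still see why the chip reset.
uint8_t savedResetFlags;

// Intended to run as early as possible after reset, before anything slow.
void recoverFromWatchdogReset() {
  savedResetFlags = MCUSR;  // remember the reset cause
  MCUSR = 0;                // WDRF must be cleared before WDE can be cleared
  wdt_disable();            // stop the watchdog so it cannot fire again
}
```

On a real chip this would sit at the very start of the bootloader (or in an early init section), before anything that takes longer than the shortest watchdog timeout.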

Now in terms of actually bricking a chip, there was a case where a certain load on the RESET pin caused
ringing on the reset line, and the voltage was high enough that it tripped a high-voltage erase and clobbered
part of the fuses. But this did not depend on the sketch; it depended on what was on the reset line, like a
particular shield, etc.

--- bill

jharris1993:
His subsequent reply:

The culprit here turned out to be my ignorance. I was trying to use way more RAM than the ATmega168 provides.

Specifically I was trying to use some relatively large arrays, which I've now learned will get pulled into RAM by default. The RAM overrun was then causing the chip to basically become unresponsive.

By using the PROGMEM directive to keep some data in program memory only and out of RAM, things are running as they should again.

(Emphasis in both quotes was provided by me.)

Correct me if I am wrong, but as I read it, exceeding memory bounds caused the chip to go totally unresponsive (i.e. "bricked"), which he could only resolve by using another board as a PIC/JTAG-type programmer to re-flash the chip.

Where does it say that? All I see him say is that he moved some stuff into progmem and it started running again.

I think you're getting confused by his use of the word "unresponsive". You can definitely upload a program and it won't run but that doesn't mean the chip is bricked.
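
For reference, the PROGMEM fix described in the quoted reply might look roughly like this. It is a minimal sketch with a made-up table and function name, assuming avr-libc's <avr/pgmspace.h>; the #else branch only provides stand-ins so the snippet also compiles off-target.

```cpp
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
// Stand-ins for building on a PC (assumption: no separate program memory there).
#define PROGMEM
#define pgm_read_word(addr) (*(const uint16_t *)(addr))
#endif

// Hypothetical lookup table. With PROGMEM it stays in flash instead of
// being copied into SRAM at startup.
const uint16_t kSquares[8] PROGMEM = {0, 1, 4, 9, 16, 25, 36, 49};

// On AVR, flash data cannot be read with a plain array access;
// pgm_read_word() fetches one 16-bit value from program memory.
uint16_t squareOf(uint8_t i) {
  return pgm_read_word(&kSquares[i]);
}
```

Without PROGMEM the same table would cost 16 bytes of the 1 KB of SRAM; the arrays in the original post were evidently far larger.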

retrolefty:
When I first joined this forum a little over three years ago, there were always a lot of posts from people whose Arduino boards could no longer upload sketches. In lots of cases reburning the bootloader got them functional again.

That could happen if the memory protection fuses aren't set correctly.

(When you use a bootloader you normally set fuses to protect the area of flash where the bootloader lives).

retrolefty:
But there never seemed to be any progress on identifying the root cause (or causes) of this kind of failure. At the time I put a question to the software gurus: could anyone create a sketch in the IDE that, once compiled and uploaded to a board, made it impossible to upload any other sketch to that board again? I don't believe anyone came up with an example, but that's not saying it's impossible, I guess. As I said, many have recovered from their fault, whatever the cause, by just reburning the bootloader. Burning a bootloader also sets the fuses to their proper values, in addition to installing the bootloader code.

The only thing I can think of is a batch of chips got sent out with incorrect fuse settings. The setting allowed people to overwrite their bootloaders. Rewriting the bootloader would set the fuses correctly, preventing them from being able to reproduce the problem.

fungus:
The only thing I can think of is a batch of chips got sent out with incorrect fuse settings. The setting allowed people to overwrite their bootloaders. Rewriting the bootloader would set the fuses correctly, preventing them from being able to reproduce the problem.

But like I said earlier with the older bootloader,
it was possible to enable WDT and create a situation where it was not possible
to upload a new sketch using "normal" means.
When the bootloader was re-written it also removed the sketch and things worked as "normal" again.

-- bill

Well, the WDT symptom with older bootloaders was well known even back then. And yes, reburning the bootloader would 'fix' the problem, but unless the time-out value was very short, one could also recover without reburning the bootloader: hold down the reset button, start uploading a different sketch that doesn't use WDT, and release the reset button only when the compiler is done and the IDE calls AVRDUDE to upload the new code. And even back then there was a bootloader modified by Adafruit that was available and worked with WDT, if one wanted to use it.

The problem I was writing about was people reporting that they could no longer upload at all (without using any WDT commands), and that just reburning the bootloader got them working again. It was, and still is, hard to do good troubleshooting analysis going only by user reports rather than first-hand experience, but again, it was a pretty common report that users had to reburn their bootloader to return functionality to their chip, and the root cause was never really found, at least to my satisfaction. I personally have never had to reburn a bootloader to solve a problem, only to upgrade for better performance, as with the Adafruit or Uno bootloaders.

So again, WDT problems aside, is it possible to write and upload a sketch via the IDE that will prevent ever being able to upload a different sketch, as was reported by users of pre-Uno boards? This is also not the problem seen with the very first release of the Uno board, which uses an AVR chip as its USB-to-serial converter: if the running sketch started sending serial data right away, the AVRDUDE upload would fail, choking on the incoming data stream. That was fixed by a quick update of the Uno's bootloader code, as I recall.

Lefty

I see at least 3 scenarios that could create the "I bricked my AVR from a sketch" situation.

  1. WDT issues
  2. Sketch spewing output on serial port
  3. Arduino hardware issue on some boards that allows ringing on reset line to trigger HVP erase cycle

Of these, #3 is the only true "brick" scenario. It can be caused by an interaction
of the auto-reset circuit and certain loading on the reset line that was seen to happen
with certain shields. If this happens, the AVR must be reprogrammed, including fuses.

#1 and #2 are/were software issues that have since both been resolved.

Unfortunately, because things were never fully tracked down,
it isn't known what the original issue was. Given that we are not seeing widespread
reports of this still happening, I'm guessing that some of it was user error, and some of
it was attributable to the above 3 issues, which are no longer an issue on the newer
hardware and software.

So the good news is that today, with updated hardware, bootloaders, firmware, and IDE
these 3 issues should no longer be a problem.


As far as WDT goes:
While the WDT issue was known among some technical folks, I doubt it was known among many non-technical users.
(Even some technical folks were unaware of it when I posted about it a few years back.)
And while a sketch may not intentionally enable WDT, all it takes is writing 0x8 to location 0x60.
(Enabling WDT is not a protected operation.) So runaway code could potentially turn on WDT by accident,
and by default that would enable WDT RESET and use the shortest possible timeout.
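
To see why a single stray write is enough, here is a small PC-side decoder for a value written to the watchdog control register (WDTCSR, the register at 0x60 on the ATmega168/328). It is a sketch based on the datasheet's bit layout as I read it: WDE is bit 3, WDP3 is bit 5, WDP2..0 are bits 2..0, and the timeout is 2048 watchdog-oscillator cycles shifted left by WDP. The struct and function names are invented for the example.

```cpp
#include <cstdint>

struct WdtConfig {
  bool resetEnabled;       // WDE: a time-out resets the chip
  uint32_t timeoutCycles;  // time-out in cycles of the ~128 kHz watchdog oscillator
};

// Decode the fields of a byte written to WDTCSR.
// WDP values above 9 are reserved on the real part; they are not validated here.
WdtConfig decodeWdtcsr(uint8_t value) {
  const bool wde = (value & 0x08) != 0;  // bit 3
  const uint8_t wdp =
      static_cast<uint8_t>((((value >> 5) & 0x01) << 3) | (value & 0x07));  // WDP3..0
  return WdtConfig{wde, UINT32_C(2048) << wdp};
}
```

Writing 0x08 decodes to reset mode with a 2048-cycle timeout, which is roughly 16 ms at the nominal 128 kHz, well inside the time any bootloader needs to accept an upload.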

I believe that many users of the IDE are fairly naive about such technical details
and would not think of holding down the reset button until just the right time.
And even then, actually doing this is not that easy, depending on which bootloader you have.
There is a short timing window to get this to work. Compounding things, there were/are several
different bootloaders floating around that worked slightly differently with respect to reset and
bootloader-to-application (sketch) timing, particularly when taking the clones out there into consideration.
Also recall that on the older IDEs EVERYTHING was built every single time.
So timing the release of the reset button to get the board up and going at just the right moment
is difficult, especially if IDE diagnostic messages are not enabled.
My assumption is that many of the users would simply say that the sketch "bricked my AVR",
when in fact all that happened was that the AVR was crashing in a way that made the IDE auto-reset mechanism
not work in the normal usage case.

Look at it from their point of view: if you power-cycle the board and try to do a normal sketch upload,
it doesn't work. If you push the reset button and then try to do a normal sketch upload, it doesn't work.
But if you rewrite the bootloader, it works every time.

Otherwise it only works if you push and hold down the reset button until just the right time.

--- bill

So the good news is that today, with updated hardware, bootloaders, firmware, and IDE
these 3 issues should no longer be a problem.

That indeed may be the case. Also, many (myself included) now enjoy the ability to operate without a bootloader at all, using the "Upload Using Programmer" option that IDE version 1.0.x now provides. I find myself using it more and more for various reasons. So anyone who has the means to reburn a bootloader in the first place also has the ability to upload sketches without needing a bootloader at all.

Lefty

retrolefty:
Also, many (myself included) now enjoy the ability to operate without a bootloader at all, using the "Upload Using Programmer" option that IDE version 1.0.x now provides. I find myself using it more and more for various reasons. So anyone who has the means to reburn a bootloader in the first place also has the ability to upload sketches without needing a bootloader at all.

You also could do this pre 1.x
I do it with 0022. In the older IDEs you had to define new board types for each programmer.
But even though the "Upload Using Programmer" option in 1.0.x doesn't require a new
board type you still may need one because the IDE doesn't give you back the bootloader space
when you use ISP (It doesn't know you are using ISP vs a serial interface with a bootloader).
So while you can use ISP with the new option without having to define a new board type,
you don't get all the benefits of using ISP like being able to use the full flash space
unless you create your own board type. For most people this probably isn't an issue given
the new smaller size of the bootloader.

--- bill

Are you sure of that? After using "Upload Using Programmer" to load a sketch, I've read back the flash memory with a standalone AVRDUDE GUI program and found that the chip had been completely erased first (if it had a bootloader before, it was gone after the upload). That is, it didn't save or protect any bootloader the chip contained prior to the upload. I'll have to check the fuse values after the next such upload to see if they are still set to protect a bootloader section, but again, no bootloader code survived the upload.

Lefty

Oh, the bootloader is gone.
It's been a while since I've tested it, and I haven't tested it on 1.0.1, but the IDE used to check
the code size, and if it was larger than the size specified in the boards.txt file (upload.maximum_size), it wouldn't upload
the image.
The issue (at least it used to work this way) was that if you used ISP, you could gain the bootloader
space on the chip, but the space was still "reserved" by the IDE, as it always checks the image size
against upload.maximum_size before uploading,
because it doesn't know whether the programmer being used is ISP or a bootloader.
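
The check described above can be sketched in a few lines. This is not the IDE's actual code, just a PC-side illustration; the function name is invented, and the example numbers assume a 32 KB part whose board entry reserves 512 bytes for the bootloader.

```cpp
#include <cstddef>

// Mirrors the described behaviour: the limit comes from the board's
// boards.txt entry and does not depend on how the image will be uploaded.
bool uploadAllowed(std::size_t imageBytes, std::size_t boardMaxBytes) {
  return imageBytes <= boardMaxBytes;
}

constexpr std::size_t kFlashBytes = 32768;           // example: total flash on the chip
constexpr std::size_t kBoardMaxBytes = 32768 - 512;  // example: limit listed for the board
```

With these example values, a 32500-byte image would fit in the chip over ISP but is still rejected because it exceeds the board's limit. A custom board entry with a larger limit is the workaround described earlier.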

--- bill

Are the fuses normally set to protect the bootloader?

All ISP programming starts with a "Chip Erase" command that will wipe out the bootloader as well as any previous sketch.

I had a chip go dead that was recovered by re-installing the bootloader with a real STK-500. I don't recall what it had been running...

So I think we're still at the stage where "people think that this has happened", "it shouldn't be possible for it to happen", and "no one has identified a probable cause or demonstration sketch."

It is interesting that it doesn't seem to happen as often as it used to. All that's changed is the chip, the bootloader, the core libraries, and the IDE... :)

westfw:
All that's changed is the chip, the bootloader, the core libraries, and the IDE... :)

That cracked me up, thanks :)

Conversely to the apparent frequency of reports, the user base would have expanded significantly at the same time too...

Geoff