Memory overrun blows Arduino chip?

fungus:

jharris1993:
In this question, the individual had an ATmega168 chip and had created a program that caused one of the internal fuses (the extended fuse, 0x7) to blow, leaving the chip inoperative until it was re-flashed.

He discovered that he was causing the issue by overrunning the available RAM:

No, that's impossible. Fuses can only be changed with an external programmer such as an ISP; code running on the chip cannot write them.

You can write to the flash memory using special instructions but it's much more complicated than writing to RAM and the bootloader section is usually protected from overwriting.

What that posting refers to is the uploaded program not working once it gets too big to fit in memory. Nothing more.

Sir,

Perhaps I am mistaken, and if I am, please show me how I have misread this message.

Forgive the size of this posting, but I am going to quote the original posting at length:

I'm using a Diecimila on Windows XP, using Arduino software 0011.

TLDR version: When I upload a sketch over about 4K, the chip seems to stop responding and I have to use avrdude directly to re-burn the bootloader or upload a different hex.

Long version:
After playing with the example sketches I started fiddling on my own and have run into a problem: when I upload a program over roughly 4K, everything goes south.
What I mean by that is that the sketch uploads, but then it's as if the chip stops responding, and from that point on nothing works - I can't even upload new sketches (I get the ubiquitous 'not in sync' error).

It first happened a few days ago - I uploaded and nothing happened. I figured I'd made a programming error, made some changes to the sketch and tried to re-upload - 'not in sync' error. Some reading on this forum gave me lots of things to try in response to that error, but none seemed to work.

Luckily I had another ATmega168, so I cobbled together a parallel programmer and tried burning the bootloader to it via the Arduino IDE. I got a verification error, but lo and behold the chip functioned and I was able to upload the Blink example, and it worked.

Back to the first chip: I figured it must be hosed somehow, so I bumbled with avrdude until I figured out how to read the chips' memory and fuses. I compared the broken and working chip, and the only difference was the extended fuse (0x7 on the broken one). I changed it to match, but still no luck.

After more bumbling I learned that even though the Arduino IDE gave 'not in sync' errors, I could upload hex files directly to the broken chip using avrdude. The next step was of course to burn the bootloader via avrdude directly, which worked! Now the 'broken' chip seemed to be working again! I hopped back into the Arduino IDE and uploaded Blink, and it worked like a charm.

Armed with a way to 'un-break' the chip I set about experimenting and the conclusion is that once the sketch gets to be a certain size, the chip stops responding. That 'certain size' doesn't seem to be constant though, so I'm fairly sure the root cause is something else. The time I actually fiddled long enough to nail it down, it was 4828 bytes. 4826 worked as expected, 4828 equaled a seemingly hosed chip - LED on pin 13 wouldn't light and the IDE could no longer upload to it.

As a test I took the Blink example and added a big integer array (referenced to keep it from being optimized away) and got the same behavior - after roughly 4K the chip goes south. Oh, and I tried using both ATmega168s, and both had the problem.

So, any ideas on what could be going wrong, or anything I could do to troubleshoot the problem? I'm still really new to all this, so any help is much appreciated.

His subsequent reply:

The culprit here turned out to be my ignorance. I was trying to use way more RAM than the ATmega168 provides.

Specifically I was trying to use some relatively large arrays, which I've now learned will get pulled into RAM by default. The RAM overrun was then causing the chip to basically become unresponsive.

By using the PROGMEM directive to keep some data in program memory only and out of RAM, things are running as they should again.

(Emphasis in both quotes was provided by me.)
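
For anyone who has not run into PROGMEM before, here is a minimal sketch of the kind of fix he describes. His actual code is not shown in the thread, so the table name, its length, and its contents are hypothetical. The 1 KiB SRAM figure is the ATmega168 datasheet value, and the host fallback macros are only there so the example also compiles off-target.

```cpp
// Minimal sketch: keep a large table in flash (program memory) instead of SRAM.
// Assumes avr-libc's <avr/pgmspace.h>; kTable and its contents are hypothetical.
#include <stdint.h>
#include <stddef.h>

#if defined(__AVR__)
#include <avr/pgmspace.h>
#else
// Host fallback: there is no separate flash address space, so PROGMEM is a
// no-op and a "flash read" is a plain dereference.
#define PROGMEM
#define pgm_read_word(addr) (*(const uint16_t *)(addr))
#endif

// ATmega168 SRAM size in bytes (1 KiB per the datasheet).
const size_t kSramBytes = 1024;

// 600 entries of 16 bits each = 1200 bytes, more than the entire SRAM
// if the table were left in RAM.
const size_t kTableLen = 600;
const uint16_t kTable[kTableLen] PROGMEM = {1, 2, 3};  // remaining entries are zero

// Read one entry. On AVR, flash data must be fetched with pgm_read_word
// rather than an ordinary array access.
uint16_t tableAt(size_t i) {
  return pgm_read_word(&kTable[i]);
}

// True if a buffer of the given size could not fit in SRAM even if nothing
// else (stack, globals, heap) were using it.
bool exceedsSram(size_t bytes) {
  return bytes > kSramBytes;
}
```

Here `exceedsSram(sizeof(kTable))` is true, which is roughly the situation he describes: the array alone is larger than the RAM on the chip, so it has to live in flash and be read back through `pgm_read_word`.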

Correct me if I am wrong, but as I read it, exceeding memory bounds caused the chip to go totally unresponsive (i.e. "bricked"), which he could only resolve by re-flashing the chip with a homemade parallel programmer and avrdude.

This does not sound like a simple memory bounds issue to me.

As I see it, there are a few possibilities:

  • There are certain reserved addresses, located beyond the end of physical RAM, that, if written to, can cause special and wonderful things to happen.

  • There is/was a firmware issue that caused this when large programs were written.

  • There was a defect in the chip, or in the IDE, that caused this problem.

Or, maybe even something else.

Which brings me back to that musical question: Why? Is there some issue that we, as programmers, need to know about to avoid bricking our own boards?

Thanks again for all your help.