Spontaneous un-programming of Arduino USB board

Hi. I am having the problem that Arduino USB boards keep “spontaneously” unprogramming themselves (i.e., the sketch stops working, and reprogramming it fixes the problem). This is happening on several different boards about once every couple of months.

Initially, I thought maybe they were particularly static sensitive (much more so than other uProcessor boards I have used), but we just had one do the same thing untouched, sealed in a box.

I am uncertain at this point whether the problem has been introduced by a circuit design error on our side (we are using them to read an inductive sensor and drive some LEDs, where boards are powered by a surge-protected feed to the round power jack), or results from a design problem with the Arduino board itself (or Atmel chip).

My goal in submitting this post is to see if:
a) Anyone else has experienced this and, if so, whether they discovered the cause.
b) Anyone has suggestions for what we might try in order to determine the cause.
c) Anyone has suggestions for what we might do to alleviate this problem, even if we don’t know the cause (e.g., adding Schottky protection diodes in various places).

Thanks,
David

hi

what happens when they “unprogram”? Can you tell us what the symptoms are? There is a known startup bug with the Arduino NG that makes it look like it’s lost its program. This happens when the RX pin floats and the bootloader thinks it’s about to receive a program. The solution is a 10K or 4.7K resistor from the RX pin to ground.

D

Hi Daniel:

Thanks for the quick response. I am unsure I can tell you anything immediately that will be of use… Normal functionality is (in slightly simplified form):
a) 8 LEDs are normally off.
b) The board monitors the state of the inductive sensor.
c) When inductive sensor pulses on, LEDs perform a motion pattern and then go off.

In the problem state, on board bootup, the two “end” LEDs (they are linearly arranged) are on (they should be off), the rest are off, and (I am 90% sure) signals from the inductive sensor are ignored. [The current problem board is offsite in a nature center, so I can’t test it, which is why I am sounding vague. I will be getting it back and thereafter can perform more tests.]

Can you suggest the kinds of things I might measure/try? I have a digital storage oscilloscope, multimeter, spark generator, variable voltage power supply, STK500 programmer, etc., and so can try whatever tests might be of use. Also, would it be worth trying some kind of “program lock” after we program things (via the fuses maybe ? I am only just getting familiar with your system) that would reduce the chance of this kind of thing in the field?

And finally, the problem has been sporadic enough that I don’t have a complete pattern on any front (e.g., were the boards partially damaged somehow before installation, are the symptoms really always identical, etc.). I wanted to post now, even without all this information, because without some idea of what to look for, given how intermittent the problem is, it may take me years to debug…

David

Daniel:

Apology - for some weird reason my first viewing of your reply did not show your second sentence??

On all boards, we added a 10k resistor between RX and ground.

David

hi David:

I think it is really unlikely that the boards are losing their program (edit: if the fuses are right). That just doesn’t happen much. I’ve never heard it reliably reported as a problem in this forum, and on top of that I’ve taught about a hundred students to use it, with no spontaneous reprogramming issues. Which is not to say it isn’t happening, it’s just really unlikely.

A few thoughts:

  • If you programmed the Atmega bootloader chips yourself, there is the strong possibility that the fuses aren’t right.
  • What does the power supply line look like (on your scope) when that inductive sensor is running? Have you added significant filter caps in the appropriate places?

D

PS: yes that second line wasn’t there at first; I am a terrible typist and I often revise my errors… and add things as I go. :slight_smile:

Hi Daniel:

  1. I am really glad you told me about adding your second sentence, because I thought I was losing my mind. :slight_smile:

  2. We did not program the bootloader ourselves on the problem boards in question; we purchased them assembled and programmed from Sparkfun. (Since then we have also played a bit with bootloader programming, but on other boards. I certainly agree that when we do program them ourselves, we could easily get the fuse settings wrong…)

  3. I will look at various things, including power supply voltage coming in to the Arduino, with the scope.

  4. Thanks for letting me know your feeling about things and what might be (or is unlikely to be) the problem - that is very useful.

  5. To clarify your thoughts, are you saying that:
    a) It is unlikely the chip has lost its programming, period? (Which is odd since reprogramming fixes at least some manifestations of the problem…)
    or
    b) If it has lost its programming, it was not spontaneous but rather, due to some circuit design error on our end?

  6. If you have any other thoughts about what I might check, of course please shoot them my way. And if I have any further info to share (either for my benefit by asking you questions, or for other’s benefit in the future), I certainly will.

Regards,
David

Hi David

answering questions in the forum is always a bit of a guessing game without much certainty, as one can’t see the circuit at hand. In general the Arduino is extremely reliable, and what you are describing should not ever happen. Atmel would be out of business if their chips spontaneously reprogrammed themselves! While they do have a little stock-options problem at the moment, their chips are working fine!

I would poke around with your scope set to AC input, and see if there are excessive transients on the power supply lines. My guess is that the program is not “erasing” itself, but rather it isn’t being allowed to start…

D

Hi Daniel.

  1. Your comments about what to look for are invaluable, as are your “what to expect from the Arduino board generally, reliability-wise”. I am delighted to learn my experience is not the norm, for I would very much like to be able to use them in our projects.
    And I appreciate the difficulty of debugging in this very indirect manner…

  2. I understand your comments about Atmel (stock options aside :slight_smile:), etc., but let me note:

a) We have had a certain MAJOR manufacturer’s uPs (which I will leave unnamed, but not Atmel…) lose their programming periodically - we still have not figured out why. (And we have never had this happen with select other manufacturers’ uPs, so it’s not simply that we are idiots…) And we have found design/functionality errors in the silicon itself from that same (major) manufacturer, a truth which they reluctantly confirmed when we called…

b) One ARM chip my colleague used could be directly sparked onto the chip itself with a 1/8" piezo-generated static discharge and be fine (both physically, and even continue running!), while another manufacturer’s ARM chip would reliably hang during operation if the spark generator was discharged two inches above the PCB in open air…

So I have stopped assuming anything until I have personal experience, and/or feedback from someone with personal experience such as yourself.

I will let you know what I learn (if I can eventually figure anything out…).

Thank you again for all your prompt, friendly advice,
David

Daniel, I’d like to add my thanks for this one too. We’ve had a few people seemingly experience this problem, and it’s definitely out of my area of expertise. If you can manage to figure out what’s happening, it’d be a great help.

hey guys

yeah I have never heard of this happening… which leads me to believe that it has to be something design-related with the outboard circuitry. Not sure I can help much, other than to say that you should have a good look at the power supply. Perhaps have a look at your code as well and see if there is anything that would make it hang unexpectedly.

D

PS: I submitted a question to Atmel engineering to see if flash memory loss is possible under some unknown condition… Will let you know what they say.

wow!

Atmel emailed me a user password for their support site in about two seconds, and I immediately found this document on how to corrupt your EEPROMs! (Although I am not completely sure whether by EEPROM they mean the small EEPROM or the larger flash memory.)

I have to run at the moment, but I leave you this juicy tidbit for your perusal:

EEPROM corruption
Question
My EEPROM is sometimes corrupted, what can I do to prevent this?
Answer
During periods of low VCC, the EEPROM data can be corrupted because the
supply voltage is too low for the CPU and the EEPROM to operate properly.
These issues are the same as for board level systems using EEPROM, and the
same design solutions should be applied.
An EEPROM data corruption can be caused by two situations when the voltage
is too low. First, a regular write sequence to the EEPROM requires a minimum
voltage to operate correctly. Second, the CPU itself can execute
instructions incorrectly, if the supply voltage is too low.
EEPROM data corruption can easily be avoided by following this design
recommendation:
Keep the AVR RESET active (low) during periods of insufficient power supply
voltage. This can be done by enabling the internal Brown-out Detector (BOD).
If the detection level of the internal BOD does not match the needed
detection level, an external low VCC Reset Protection circuit can be used.
If a reset occurs while a write operation is in progress, the write
operation will be completed provided that the power supply voltage is
sufficient.
What all this means is: If you can’t guarantee power, you have to make sure
that the part is kept in RESET when it is outside of spec. You can do this
using the internal BOD, but this will not take care of the case when an
EEPROM write has already begun when the part loses power. Thus you must also
make sure to write to the EEPROM only when you’re sure to have power.
It is not enough to write to the EEPROM during “safe periods” and leave the
BOD disabled, though: If the part gets outside spec it can begin executing
erratically, and the program counter could conceivably jump to the part in
the code in which the EEPROM is written.
These are not bugs but intrinsic demands of the EEPROM.
Interrupts are not disabled automatically, but the customer is urged to take
care of the following during EEPROM write (the order of steps 3 and 4 is not
essential):
1. Wait until EEWE becomes zero.
2. Wait until SPMEN in SPMCR becomes zero.
3. Write new EEPROM address to EEAR (optional).
4. Write new EEPROM data to EEDR (optional).
5. Write a logical one to the EEMWE bit while writing a zero to EEWE in
EECR.
6. Within four clock cycles after setting EEMWE, write a logical one to
EEWE.
Caution: An interrupt between step 5 and step 6 will make the write cycle
fail, since the EEPROM Master Write Enable will time-out. If an interrupt
routine accessing the EEPROM is interrupting another EEPROM access, the EEAR
or EEDR Register will be modified, causing the interrupted EEPROM access to
fail. It is recommended to have the Global Interrupt Flag cleared during all
the steps to avoid these problems.
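
To make the timing caution in steps 5 and 6 concrete, here is a toy model in Python. It is not AVR code and is only a sketch under stated assumptions: the class `ToyEeprom`, the function `write_byte`, and the idea of counting an interrupt as a number of extra clock cycles are all hypothetical, made up for illustration. The only thing taken from the text above is that EEWE must be set within four clock cycles after EEMWE.

```python
class ToyEeprom:
    """Toy model of the EEMWE/EEWE handshake (hypothetical, not real AVR code)."""

    MASTER_WINDOW_CYCLES = 4  # from the text: EEWE must follow within four cycles

    def __init__(self):
        self.cells = {}
        self.window = 0  # clock cycles left before the master write enable times out

    def tick(self, cycles=1):
        # Let some clock cycles pass; the master write enable window shrinks.
        self.window = max(0, self.window - cycles)

    def set_eemwe(self):
        # Step 5: open the master write enable window.
        self.window = self.MASTER_WINDOW_CYCLES

    def set_eewe(self, address, data):
        # Step 6: the write only starts if the window is still open.
        if self.window == 0:
            return False
        self.cells[address] = data
        self.window = 0
        return True


def write_byte(eeprom, address, data, interrupt_cycles=0):
    """Run steps 5 and 6, optionally with an interrupt landing between them."""
    eeprom.set_eemwe()
    eeprom.tick(1 + interrupt_cycles)  # next instruction, plus any interrupt routine
    return eeprom.set_eewe(address, data)


eeprom = ToyEeprom()
print(write_byte(eeprom, 0x10, 0xAB))                       # True
print(write_byte(eeprom, 0x11, 0xCD, interrupt_cycles=50))  # False
```

In the model, the second write is silently dropped because the interrupt used up the four-cycle window, which is why the text recommends clearing the Global Interrupt Flag around the sequence.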

Edit: I just checked the bootloader tutorial in the Playground, and it says to use the following settings for the Atmega168 (this needs to be confirmed with Massimo or Gianluca):

Brown-out detection disabled; [BODLEVEL=111]

This certainly sounds like a likely answer to the problems you’re experiencing, if the BOD fuse settings above are the ones in use.
Also, you said above that you are using some kind of surge-suppressor on the DC line. If this is on the DC input side of the Arduino, these contain inductors that slow the voltage rise; might be worth checking out.
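
For anyone wanting to work out what a changed fuse byte would look like, here is a small Python sketch of the bit arithmetic for the BODLEVEL setting mentioned above. It rests on assumptions that should be checked against the ATmega48/88/168 datasheet before writing any fuses: that on the ATmega168 the BODLEVEL2..0 bits are bits 2..0 of the high fuse byte, and that the typical trip voltages are as listed. The function names and the starting value 0xDF are hypothetical examples, not the values on any particular board.

```python
# Assumption (verify in the datasheet): BODLEVEL2..0 are bits 2..0 of the
# ATmega168 HIGH fuse byte, and the typical trip voltages are as listed here.
BODLEVEL_TYPICAL_TRIP_VOLTS = {
    0b111: None,  # brown-out detection disabled
    0b110: 1.8,
    0b101: 2.7,
    0b100: 4.3,
}


def describe_bod(high_fuse):
    """Describe the brown-out setting encoded in a high fuse byte."""
    level = high_fuse & 0b111
    if level not in BODLEVEL_TYPICAL_TRIP_VOLTS:
        return "reserved BODLEVEL setting"
    volts = BODLEVEL_TYPICAL_TRIP_VOLTS[level]
    if volts is None:
        return "BOD disabled"
    return f"BOD enabled, trips near {volts} V"


def with_bodlevel(high_fuse, level):
    """Return the high fuse byte with only the BODLEVEL bits replaced."""
    if level not in BODLEVEL_TYPICAL_TRIP_VOLTS:
        raise ValueError("unknown BODLEVEL setting")
    return (high_fuse & 0xF8) | level


example = 0xDF  # hypothetical starting value with BODLEVEL=111
print(describe_bod(example))               # BOD disabled
print(hex(with_bodlevel(example, 0b101)))  # 0xdd
```

The point of masking with 0xF8 is that the other bits in the byte control unrelated features, so they should be carried over unchanged.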

D

Daniel:

  1. Very interesting, thanks! Re surge protection, I have both transorb and (small) cap in series - so certainly rise time is indeed limited.
    However, we just connected a variable voltage power supply and ramped it up/down dozens of times to no negative effect… Maybe I will have to build a signal generator to do it 10^10 times :-).

  2. Because I am actually quite novice to the whole Atmel / Arduino environment, may I ask (in order to be able to think about Atmel’s document):

a) Once a program has been written to the EEPROM, and assuming my sketch does nothing other than standard variable writes, does running a sketch (or the Arduino boot loader) involve any writes to EEPROM? I would think not, but don’t in fact know…

b) I gather from the Atmel doc that even if normally EEPROM would not be written, it could be inadvertently written simply by low voltage on the power supply due to corrupted program counter (and thus CPU inadvertently getting into EEPROM write code), yes?

c) I also gather from the end of your post that currently the standard fuse settings (as is the case, for example, in boards received from Sparkfun) have brownout disabled, and thus run counter to Atmel’s recommendation, and that we should re-program them to rectify that, yes?

More if/as it develops,
David

(Irrelevant to current problem, but) Make that “transorb and (small) cap in >parallel<” not “series” ! :slight_smile:

Hi Daniel and All:

Important update:

  1. I set up an Arduino that I believe (I will need to 100% confirm early next week when my colleague returns) is programmed with fuse settings as per standard (re Brownout being disabled, assuming that is indeed the standard setting as Daniel suggested yesterday), and with our external circuit in place.

  2. I then connected a signal generator with sine wave output in about the +/- 10V range, and a series diode to give only positive voltage, and connected it to the Arduino round jack power inlet.

  3. I then varied the frequency over an approximately 5 minute period between about 0.1Hz and 1000Hz…

  4. And Voila - I erased the programming in the board in a manner similar to what happened before!

  5. Note that in the past, one time it was the boot loader that got erased (I initially thought I had physically destroyed the USB chip, but reprogramming the boot loader fixed things), and most times it was our program that got lost. This ratio would make perfect sense, since I imagine even a short program is likely longer than the boot loader, and so the chances that the user program rather than the boot loader got corrupted would be high - assuming random corruption.

  6. Next week I will try, with the help of my associate, to enable the Brownout Fuse and see if that fixes things. My interpretation, admittedly a bit shaky, is that if neither the bootloader nor our program is writing EEPROM normally (I still hope to have this confirmed by the Arduino Powers That Be…), then simply enabling the Brownout Fuse is enough to (theoretically) fix things?
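
As a rough way to think about why the slow end of the sweep in step 3 is the harsh part, the sketch below estimates how long per cycle a half-wave rectified sine spends inside a given voltage band. It is only an idealised calculation: it ignores the diode drop, the board's capacitors and the regulator's behaviour, and the 3 V to 7 V "marginal" band is a hypothetical example, not a measured figure. The function name is made up for illustration.

```python
import math


def dwell_time_per_cycle(freq_hz, peak_v, v_low, v_high):
    """Seconds per cycle that v = peak_v * sin(2*pi*f*t) spends between
    v_low and v_high, counting both the rising and the falling edge."""
    if freq_hz <= 0 or not (0 <= v_low < v_high <= peak_v):
        raise ValueError("need freq_hz > 0 and 0 <= v_low < v_high <= peak_v")
    # Phase angle swept while crossing the band on one edge.
    angle = math.asin(v_high / peak_v) - math.asin(v_low / peak_v)
    # Two edges per cycle; one full cycle is 2*pi radians lasting 1/f seconds.
    return 2 * angle / (2 * math.pi * freq_hz)


# Hypothetical band: input too low for a solid 5 V, yet high enough to run.
for freq in (0.1, 1.0, 1000.0):
    seconds = dwell_time_per_cycle(freq, peak_v=10.0, v_low=3.0, v_high=7.0)
    print(f"{freq:g} Hz: {seconds:.6f} s per cycle in the marginal band")
```

Under these assumptions the supply lingers in the marginal band for about 1.5 s per cycle at 0.1 Hz, and ten thousand times less at 1000 Hz, so the lowest frequencies give the chip the longest time to run out of spec.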

Regards,
David

hi

yes that sounds like it might be worth investigating… although I have to say that the things you have mentioned (spark generators and five-minute sine wave power supplies) are things one tries to avoid in a design. I understand the torture-test part, but at the extremes, like a +/- 10V sine wave at 1 Hz or less, things are bound to fail. :slight_smile:

D

Hi Daniel:

Just as an FYI and for context:

  1. Re our general use, we make museum and trade show exhibits that have to work (and generally do :slight_smile:) 24x7x365 for years on end.

  2. Re spark tests, we ground everything, put parallel caps and series resistors where needed on the PCB and elsewhere, etc. But if, even after that, a spark to a (grounded) metal plate in the user interface causes a hang, we simply cannot use such sensitive components, because carpeted trade shows (and many museum environments) generate 1/8" sparks to any metal surface every few minutes from tens of thousands of visitors’ hands…

  3. Re sine wave generator:

a) I hope you noticed my diode comment, so only 0 to +10V actually going to board (well within Arduino voltage reg spec).

b) I hope it is clear that I am not expecting the system to reliably boot with 1Hz or less power-on rise time! :slight_smile: BUT, if such a situation reliably erases program memory (which it appears to do) so that the chip never again boots with any power-on rise time until it is reprogrammed, that is not OK, as that implies the chips will almost certainly become spontaneously de-programmed after just a few hundred to a few thousand power cycles in the client environment. And as you correctly pointed out in your previous post, there must be some solution because Atmel could not possibly manufacture such a chip - at least not for non-hobby use - imagine if every few thousand starts of your microwave oven the internal uP became deprogrammed! :slight_smile:

More as I learn it,
David

Hi David,

yes I understand completely… I just meant that spark generators = problems!

I understand the test, it is intriguing. Let us know if enabling the BOD fixes things in the sine-wave test.

D

edit: it would be interesting to try the Atmega168V, the low-voltage (1.8-5.5V) version, to see if the same thing happens.

Will do!

David

hey

Atmel has an application note here about corrupted flash memory. It is for the C51 series of microcontrollers, but it seems to say the same thing the EEPROM corruption support doc posted above says: brown-out detection.

D

PS if you want 24-7-365 reliability, you might consider designing an industrial-style Arduino, with optically isolated inputs and outputs, and a guaranteed fast rise-time power supply. There might even be a market in selling it :slight_smile: If rise-time is indeed the issue, what about using 4700 µF of supply capacitors (or even 1 farad?) and then using a long rise-time R/C combination on the reset pin.
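
To get a feel for the sizes involved in an R/C on the reset pin, here is a rough Python sketch of the standard RC charging formula, t = -RC ln(1 - Vth/Vcc). It assumes an ideal resistor from Vcc to RESET with a capacitor from RESET to ground, and a supply that steps instantly to its final value; with a slowly rising supply it is only an approximation. The 10K, 10 µF and 0.9 x Vcc release threshold are hypothetical example values (the real threshold should come from the datasheet), and the function name is made up for illustration.

```python
import math


def reset_release_delay(r_ohms, c_farads, threshold_fraction):
    """Seconds for an ideal RC node to charge from 0 V up to
    threshold_fraction * Vcc after Vcc is applied as a step."""
    if r_ohms <= 0 or c_farads <= 0:
        raise ValueError("R and C must be positive")
    if not 0.0 < threshold_fraction < 1.0:
        raise ValueError("threshold_fraction must be between 0 and 1")
    return -r_ohms * c_farads * math.log(1.0 - threshold_fraction)


# Hypothetical values: 10K resistor, 10 uF capacitor, release at 0.9 * Vcc.
delay = reset_release_delay(10_000, 10e-6, 0.9)
print(f"reset held for roughly {delay * 1000:.0f} ms")  # roughly 230 ms
```

The idea is to pick R and C so that this delay comfortably exceeds the time the supply needs to settle, so the chip stays in reset while the voltage is still out of spec.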

Hi Daniel and All:

  1. Thanks for the additional reference.

  2. I am now having trouble reliably reproducing the problem. After that first (I thought) successful replication of it, I have cycled the input power for over 16 hours at different frequencies and not corrupted the memory.

  3. So, until I can reliably replicate the problem, “fixing” it will not be detectable as such…

  4. In conclusion, I may be out of touch for an extended period until I learn more.

David