I'm an intermediate-to-advanced Arduino user (not an expert programmer; I'm an M.E.) and, as far as I can tell, have exhausted what the internet has to offer on this topic. I hope not to bore you with irrelevant project specifics, and I'd greatly appreciate any help on this. I really don't think I'm the only one with this question! I understand what the BOD and WDT are, and my application needs to draw under 10 microamps directly from a battery. I will run the AVR directly from the battery, and I have a sketch running at 6 microamps in deep sleep mode with the BOD off. Brown-out voltages will occur, and though no one will die, it would result in returned products if the uP locks up or is bricked. For this application, the uP needs to run for 20 years without corruption and hundreds of these uP's will be used per device...so in other words I'm looking for rock solid performance for this program and hardware.
So my specific question is: Without BOD, can the Arduino (within any practical probability) corrupt its own flash or EEPROM, assuming I am not actively writing to flash or EEPROM when the brown-out voltages occur? I read that during brown-out the uP cannot be counted on to perform properly and may lock up or give random outputs. My WDT would reset the lockup or corrupt code in 8 seconds, so I'm cool with that...but could permanent damage to the chip or code occur? How likely is it that the haywire uP during brown-out could rewrite some of its own code and be permanently malfunctioning or bricked?
Thank you in advance; you guys do amazing work here!
Best,
Jared Brandt
My understanding from Atmel documentation is that it is theoretically possible (if the voltage gets so low that the processor registers are corrupted, nearly anything is possible) but extremely unlikely. It probably helps if the chip is sleeping when the brownout occurs, and is power-cycled (e.g., by removing the dead battery to replace it) when power is restored.
I don't think I've seen any posts here from people whom that has actually happened to though, and we see a lot of people with weird problems posting here.
Although there are engineers here who might be able to give an informed opinion on this topic, I think the forums at https://www.avrfreaks.net/ would have more engineers familiar with industrial uses of the ATmega328p chip. Your technical questions might get better responses on those forums.
You need to do some serious development work looking at failure rates and get professional advice - hoping to have numerous devices out there for 20 yrs is a tall order.
I would be seeking advice from Atmel about failure rates. I presume you are not using an Arduino board, which is just not suitable.
jaredabrandt001:
For this application, the uP needs to run for 20 years without corruption and hundreds of these uP's will be used per device...so in other words I'm looking for rock solid performance for this program and hardware.
Your logic is fundamentally flawed.
It's very unlikely indeed that you will get 'rock solid' performance from 'hundreds of these uP's' and 'without corruption' and 'for 20 years'.
You need to plan for failure and corruption.
On a scale of 1 to 10 of achieving what you seem to want, I would put your chances at -10 or lower.
srnet:
On a scale of 1 to 10 of achieving what you seem to want, I would put your chances at -10 or lower.
Why do you think so?
@OP: when you do not enable self-programming via the fuses, the MCU is unable to write to flash, so I think corruption cannot happen. If your application never writes to EEPROM you may move stored data to flash, which can be protected. But the watchdog is not almighty: in theory it is possible for your program to enter some wrong endless loop which does nothing except reset the watchdog. AFAIK it is nearly impossible to prove that the code does not contain the possibility of such a failure. OTOH such a lockup should be very, very rare.
If you are using the onboard WDT, it consumes ~4uA. That is a lot of current from your ~6uA budget. Did you consider using the BOD but disabling it for sleep? It should greatly reduce the average consumption of the BOD circuit.
I just reread the OP:
jaredabrandt001: Without BOD, can the Arduino (within any practical probability) corrupt its own flash or EEPROM, assuming I am not actively writing to flash or EEPROM when the brown-out voltages occur?
If the risk of brownout exists only for a limited time and you are able to predict that time, you may sleep through the period. In sleep the MCU should be safe from any corruption down to the POR threshold.
'rock solid' performance from 'hundreds of these uP's' and 'without corruption' and 'for 20 years'
That is unreasonable and unachievable.
I worked for a few years designing set top boxes that had to have as low a failure rate as possible. At the time the firm were aiming for just a 1% failure rate over 5 years. An analysis of my box showed it had a 0.1% failure rate which was amazing. And we spent a lot of time analysing returned failures.
Having a 0% failure rate over 20 years is simply impossible; even military equipment does not have specifications like that.
Read up on the bath tub curve for component failure.
BBC micro had an AVR uC as processor? Or set top boxes?
From the datasheet, the memory retention failure rate should be "much less than 1PPM for 20 years @85°C". Or what else do you think may "wear out" in the AVR?
Anyway, OP asks about processor failure due to brownout. Having BOD enabled I think there will be less than 0.0001% failure rate due to BOD. He asks an interesting question: is the WDT able to provide similar protection?
I agree it seems the system is quite complicated. And it may fail due to many reasons in 20 years. But I doubt the AVRs will be the leading cause unless damaged externally such as via an ESD event. A mechanical failure of some sort or failure of capacitors is much more likely IMHO.
BBC micro had an AVR uC as processor? Or set top boxes?
No, but the principles of reliability engineering transcend the specific processor being used.
Or what else do you think may "wear out" in the AVR?
Nothing, but that does not mean it will work forever; that is what the bath tub curve tells us. Any component has a probability of failing at any time. The chance of getting thousands of processors, no matter which type, to have 0% failure in 20 years is in itself also 0%; it ain't going to happen. Each failure might have a different cause, but fail they will.
Having BOD enabled I think there will be less than 0.0001% failure rate due to BOD.
Atmel data sheets are not tremendously accurate when it comes to memory retention, so I think your estimate is flawed.
Even so, it is totally unrealistic to expect a failure rate this low.
But I doubt the AVRs will be the leading cause
I doubt they will be the leading cause either, but there will be a failure rate of the AVRs when you examine enough returns. At the end of the day if you have a failure to the customer it is a failure no matter what the cause.
If you were to complain to Atmel that their chip failed, they would take it back (if you are a big enough customer), decapsulate the chip, examine it under an electron microscope, and declare that it suffered static damage during your manufacturing process, which you can't prove one way or the other.
To get better longevity and reliability, put three devices in each location and use majority rules on the results. Ideally, the three devices should use different processor designs and different supporting circuitry to avoid a design error taking out all three at the same time.
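The "majority rules" idea can be sketched in a few lines. This is only an illustration of 2-of-3 voting on a single reported byte; the names `majority3` and `unanimous` are hypothetical, and how the three readings are collected from the three devices is left out entirely.

```cpp
#include <cstdint>

// Bitwise 2-of-3 majority vote: each output bit takes the value that at
// least two of the three inputs agree on, so one faulty unit is outvoted.
constexpr std::uint8_t majority3(std::uint8_t a, std::uint8_t b, std::uint8_t c) {
    return static_cast<std::uint8_t>((a & b) | (a & c) | (b & c));
}

// True when all three units reported the same value. A caller could use a
// false result to flag a unit for service even though the vote succeeded.
constexpr bool unanimous(std::uint8_t a, std::uint8_t b, std::uint8_t c) {
    return a == b && b == c;
}
```

For example, `majority3(0x5A, 0x5A, 0xFF)` yields `0x5A`: the single disagreeing unit is ignored, while `unanimous` reports that a disagreement occurred.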
Wow, thanks for the responses all! I guessed there might be some varying opinions out there on this topic. I see some valuable input here!
So I need to minimize the possibility of failures. Things I see here that help do this:
disabling self-programming via the fuse should prevent flash from being overwritten. I'd love to hear more opinions on this. @DrAzzy, your comment: "I don't think I've seen any posts here from people whom that has actually happened to though, and we see a lot of people with weird problems posting here." is the kind of input I'm looking for, along with your preceding info. I acknowledge the possibility of random I/O's flipping bits during brown-out, but statistically it seems (I have not done the math) nearly impossible. As long as the WDT kicks in and resets me, I'm OK, provided flash is unchanged and the uP is not damaged.
enable brown out when I can. I need to have brown out disabled during sleep for power consumption reasons, but for this application I can enable it when not asleep, which is 95% of the time it would occur I'd guess.
I think the original question still remains: can a WDT reset adequately substitute for brown-out detection, given that self-programming is disabled by the fuse?
@Smajdalf:
Exactly! Thank you for such an insightful response (and that's not just because I liked your answer)! This ATtiny13A consumes 6uA with watchdog enabled at the moment, and a 10mA surge once every 8sec for WDT reset adds another 1uA on average. Brown out takes 15uA on its own, so I'm hoping to nix it when asleep.
@srnet:
"Your logic is fundamentally flawed", followed by "On a scale of 1 to 10 of achieving what you seem to want, I would put your chances at -10 or lower."
I'm not too sure I can take you seriously, but as @johnwasser highlights, that's what happens on these forums:P
I suppose I need to clarify a couple of things about the reliability:
The WDT is the reset type, and no harm is done if the program resets every 8 seconds...harm is done when the uP flash is corrupted or if it is unresponsive. I've seen several others post questions about this, as the datasheet basically acknowledges the possibility of random events during brown-out, but doesn't sufficiently explain the risk of permanent damage to the hardware and software, IMO.
All devices will be tested before they ship. Warranty will be provided for 10yrs but I want reliability to 20yrs in design.
uP will sleep 99% of its 20yr service, and when awake it's not hyperactive. Transistors have a limited # of offs and ons and we won't come close to these limits.
uP is used for communication and monitoring. System will be encrypted and errors will be checked. We need under .01% failure rate due to uP error in the field in 10yrs of service though, because around 1k processors are needed in many single applications. And no, larger uP's will not assist.
...So the above conditions pose the separate question for another forum topic: "Is this reasonable, do I need to redefine my conditions, or redesign?"
jaredabrandt001: @Smajdalf:
Exactly! Thank you for such an insightful response (and that's not just because I liked your answer)! This ATtiny13A consumes 6uA with watchdog enabled at the moment, and a 10mA surge once every 8sec for WDT reset adds another 1uA on average. Brown out takes 15uA on its own, so I'm hoping to nix it when asleep.
Here's the problem: to improve reliability you want to keep the temperature stable, because temperature cycling is one of the failure mechanisms.
Although the amount of heating caused by a 10mA surge is small, there will be a lot of micro temperature cycling if it's every 8 seconds.
No, it had either a Synertek or a Rockwell made 6502. Both are examples of silicon-based microprocessors, and the AVRs are another example.
Any complex silicon device has a chance of failure, and this is well known to increase with age.
I appreciate it sounds a bit weird that processors would 'wear out' but there are practical reasons for this, the internal stresses caused by temperature cycling being one of them.
OP needs to define what a 'failure' is in his product.
Many devices can run for years without intervention - but... either by design or as a defensive strategy - they may need to perform a 'soft' restart on themselves to be confident of operating parameters.
This doesn't mean corruption or damage - but good programming in this scenario might include checksumming memories and other strategies to validate the operating environment - before 'launching the rocket'.
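As a rough illustration of "checksumming memories" before doing real work, the sketch below computes a CRC-16 over a memory image and compares it with a stored reference value. It is written as plain host-side C++ over a byte buffer; on a real AVR the bytes would have to be read out of flash or EEPROM and the reference CRC stored at build time, neither of which is shown. The CRC variant (CRC-16/CCITT-FALSE) and the names `crc16_ccitt` and `image_is_intact` are choices made for this example, not anything prescribed in the thread.

```cpp
#include <cstddef>
#include <cstdint>

// CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
// no bit reflection, no final XOR. Processes one bit at a time.
std::uint16_t crc16_ccitt(const std::uint8_t* data, std::size_t len) {
    std::uint16_t crc = 0xFFFF;
    for (std::size_t i = 0; i < len; ++i) {
        // Bring the next byte into the top of the CRC register.
        crc = static_cast<std::uint16_t>(crc ^ (data[i] << 8));
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
            } else {
                crc = static_cast<std::uint16_t>(crc << 1);
            }
        }
    }
    return crc;
}

// Startup check: true when the image still matches its reference CRC.
bool image_is_intact(const std::uint8_t* image, std::size_t len,
                     std::uint16_t expected_crc) {
    return crc16_ccitt(image, len) == expected_crc;
}
```

A startup routine could call `image_is_intact` and refuse to proceed (or fall back to a safe mode) when it returns false. The bit-at-a-time loop trades speed for code size, which seemed the more natural choice for a small MCU.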
Guys, some of you are calling an experienced engineer inexperienced; with next to zero information. If you don't have substantial knowledge or explanation, your comments litter forums. You're simply not giving good information here...save your trash talk for youtube please. If you have something informative to add, I'd be glad to hear it, as long as it might be based on fact.
As we all should understand, far more complex electronic devices work around us all the time, and statistics explain failure rates quite adequately. We're all nerds here; meaning there are no hot cheerleaders, so there's no need to try to make yourself feel bigger than you are. Please leave the 20yr hardware to me; I've got this.
The uP's will basically be daisy-chained and used for sensor reading and comm relay. Nothing critical here, but yes 400 of these uP's need to talk for 10 years at least. If this is impossible I must have far too much faith in electronic hardware. I know to avoid electrolytic caps, transformers, moisture, and high heat devices...these are problems for 20yr hardware. I am not aware that AVR uP's are not up to this task, and I see no data to show otherwise.
As for the 20yr-lasting software side...the only relevant questions remaining here I think are:
With or without brown-out protection, how likely is it that the registers or flash might be corrupted (with the self-programming fuse off)? Statistically impossible has been suggested...is it true?
Is the watchdog timer reliable enough to reset the uP despite brown-out or other lock-ups, for let's say over 10 years?
*Devices will fail...rock solid reliability does not mean zero errors. .01% of uP failures in 10yrs is acceptable.
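To put the .01% figure next to the 400-processor chain, here is a small worked calculation. It assumes each uP fails independently with the same probability over the 10 years, which is an idealization (a shared power or environmental fault would break it), and the function names are hypothetical.

```cpp
#include <cmath>

// Probability that at least one of n units fails, assuming each fails
// independently with probability p: 1 - (1 - p)^n.
// Written with log1p/expm1 to stay accurate when p is very small.
double prob_any_failure(double p, unsigned n) {
    return -std::expm1(n * std::log1p(-p));
}

// Expected number of failed units among n under the same assumption.
double expected_failures(double p, unsigned n) {
    return p * n;
}
```

Under these assumptions, `prob_any_failure(0.0001, 400)` is roughly 0.039, so about 1 in 25 installations of 400 uP's would see at least one uP failure in 10 years, and with 1000 uP's the figure rises to roughly 0.095. Whether that counts as acceptable depends on whether a single uP failure takes down the whole chain.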
As an expert, you must know there are no ‘absolutes’ when considered in isolation.
The uP is possibly the last component that will fail.
Many years ago in formal diagnostic training, we learned that the computer itself is the least likely component to fail in an equal world.
The causes of those other failures are as diverse as nature itself:
Connectors, environment, power, design and operational errors etc.
You can minimise your chances of failure with aggressive, accelerated stress testing and compliance monitoring (not my areas), but that magical .01% can be blown away by a drop of fly poop in the wrong place.