Has anyone seen this type of failure? I'm kind of screwed and need some help...

Hi everyone...

I have a custom-built Arduino-compatible board based on the ATmega1284P-AU (surface mount) with an external 16 MHz crystal. I have a total of six boards: three that I built myself and three done by a professional assembler.

Three of the six failed, all with a similar failure. What happens is that I hook up my AVRISP mkII programmer and I can read the device ID, but generally I cannot write to the MPU. I don't think I set the fuses wrong and locked myself out, as I was pretty careful not to do that. I think that I may have burned a hex file into one of them, but I'm not sure. When I try to set fuses and/or lock bits, the verification fails. After several tries, I am no longer able to read the MPU's device ID.

I'm really screwed with this and would appreciate any help or suggestions. I don't know if maybe I got a bad batch of MPUs, or if it's an assembly error or a subtle design flaw or what. I doubt it's the latter, as I have 5 total boards with this or a similar design working.

Thanks...

Jim

Well, the nasty question is which 3 of the 6 boards are failing, yours or the assembler's :wink:

Can you check the crystal to see if it's oscillating? Maybe you have the wrong load caps on this time around?

Are the voltage levels right? Can you probe them with an oscilloscope to check for signal quality? (though I'd spend more time on the oscillator question)

Are there any tantalum caps in your design (and if so, are they backwards)?

Thanks for the reply!

RuggedCircuits:
Well, the nasty question is which 3 of the 6 boards are failing, yours or the assembler's :wink:

Actually, of the revision B boards, 2 out of 3 made by the assembler failed. Two of the three I made in my toaster oven worked. I made two of an earlier revision and both worked, I think; that was months ago now. I checked the work of the assembler, and the chips and other parts all seem to be in the correct spots, with the correct orientation. I can't see any shorts even with my magnifier (not that it's a great magnifier) and none of the chips get particularly hot.

Ironically, the last board I made failed in this same way, which is what motivated me to hire the boards out.

Can you check the crystal to see if it's oscillating? Maybe you have the wrong load caps on this time around?

Caps all look the same so they could be wrong. If the caps were wrong, would it have worked for a little while?

Crossroads suggested I use an oscilloscope with a low-capacitance probe to see if it is oscillating. I don't have one, so I am going to ask around.

Are the voltage levels right? Can you probe them with an oscilloscope to check for signal quality? (though I'd spend more time on the oscillator question)

The Vcc of the MPU reads 4.98 V according to my voltmeter, but of course it's too slow to see if that voltage dips when the chip fires up. Again, I need an oscilloscope, I guess.

Are there any tantalum caps in your design (and if so, are they backwards)?

No there are none.

The failing board I just tested again draws 142.6 mA, which is less than a working board that draws 208 mA, but has LEDs and other components. But it seems that the MPU must be doing something with all that current. Nothing gets hot though.

WOW! The most wildly improbable thing just happened!

I was measuring the current one of these boards draws, and after I did that I didn't bother to unhook it or even turn off the power supply. I got preoccupied with something else. Suddenly, I noticed that the LEDs on my board were blinking erratically. I think this is the board I tried to burn a test app into that makes those LEDs blink.

That gave me a hint the crystal was an issue. So, I did what any experimenter would do. I poked at it. That made things change. The LEDs would stop blinking, or change their rate. I hooked up my programmer and I could read the device ID!

So I whipped out my soldering iron and tried to reheat the crystal (it's through-hole). No joy. So then I replaced it with a 20 MHz crystal (I don't have any 16 MHz on hand) and voilà! It seems my board is repaired! Later today I'll go out and get some 16 MHz crystals and see if that's the issue with the other board I have here as well.

So then, that brings me to the next question. Why did this happen? The crystal I am using is through-hole. The one I installed on the board I made that failed, I installed with a soldering iron. The assembler used a wave soldering machine for all his. It is DigiKey part number 887-1244-ND, which is a 16 MHz, 18 pF through-hole crystal made by TXC Corporation. Are these known to be easily damaged by heat, or to be particularly fragile?

Did you include the caps for the crystal? You don't mention them, but I am wondering if maybe (if you didn't) the lack of caps was causing it to not oscillate - but the 20 MHz crystal somehow had the "right" value and would oscillate...? This is all just a wild guess...

Well, the 16 MHz crystal has a load capacitance of 18 pF and there are two load caps, each 22 pF.
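For anyone checking their own load caps, here is a rough sketch of the usual rule of thumb: the crystal sees the two caps in series plus whatever stray capacitance the board and pins add. The 5 pF stray figure is only an assumption (it is not measured on this board), and the function names are made up for illustration.

```python
def load_capacitance_pf(c1_pf, c2_pf, stray_pf):
    """Effective load seen by the crystal: C1 and C2 in series, plus stray."""
    return (c1_pf * c2_pf) / (c1_pf + c2_pf) + stray_pf


def matched_cap_pf(crystal_cl_pf, stray_pf):
    """Value for two equal load caps that hits the crystal's rated load."""
    return 2 * (crystal_cl_pf - stray_pf)


if __name__ == "__main__":
    STRAY_PF = 5  # assumed board + pin capacitance, not a measured value
    print(load_capacitance_pf(22, 22, STRAY_PF))  # effective load with 22 pF caps
    print(matched_cap_pf(18, STRAY_PF))           # ideal caps for an 18 pF crystal
```

With that assumed stray value, two 22 pF caps give about 16 pF against the crystal's rated 18 pF, which is close enough that the caps by themselves are unlikely to stop the oscillator.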

I just ran out to Modern Device in Providence, grabbed some 16 MHz crystals, and now all the boards are working! My guess is that the crystal I selected does not like heat, so my soldering iron and the assembler's wave solder machine damaged it. I am looking for a replacement from a different manufacturer.

Good to hear they are working again!

Maybe the crystals got set too close to the board, so the two leads and solder pads got shorted out across the case (assuming it's a metal case).
I've seen that. They make mica insulators in that shape for that reason.

There are a bunch of options for the clock oscillator on the AVRs. Most of the Arduinos set them for "low power crystal oscillator", which is supposed to be good for "up to 16MHz." There's also a "Full swing crystal oscillator", and a bunch of "startup time" bits that might be worth investigating (or necessary, depending on bits of crystal specification that I'm not particularly familiar with.)
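To see which of those options a board is actually set to, you can decode the low fuse byte that the programmer reads back. The sketch below assumes the usual ATmega low fuse layout (CKDIV8, CKOUT, SUT1:0, CKSEL3:0, with a 0 bit meaning "programmed") and CKSEL ranges as I remember them from the datasheet, so please check the table against the ATmega1284P datasheet before relying on it. The function name is hypothetical.

```python
def decode_low_fuse(lfuse):
    """Decode an AVR low fuse byte (assumed layout: CKDIV8 CKOUT SUT1:0 CKSEL3:0)."""
    cksel = lfuse & 0x0F          # clock source select, bits 3..0
    sut = (lfuse >> 4) & 0x03     # start-up time select, bits 5..4

    # CKSEL ranges below are an assumption to verify against the datasheet.
    if cksel >= 0x8:
        source = "low power crystal oscillator"
    elif cksel >= 0x6:
        source = "full swing crystal oscillator"
    elif cksel >= 0x4:
        source = "low frequency crystal oscillator"
    elif cksel == 0x3:
        source = "internal 128 kHz RC oscillator"
    elif cksel == 0x2:
        source = "calibrated internal RC oscillator"
    elif cksel == 0x1:
        source = "reserved"
    else:
        source = "external clock"

    return {
        "clock_source": source,
        "cksel": cksel,
        "sut": sut,
        "ckdiv8": (lfuse & 0x80) == 0,  # fuse bits are active low
        "ckout": (lfuse & 0x40) == 0,
    }


if __name__ == "__main__":
    for value in (0xFF, 0xF7, 0x62):
        print(hex(value), decode_low_fuse(value))
```

Under those assumptions, 0xFF decodes as the low power crystal oscillator and 0xF7 as the full swing one, so trying the full swing setting on a marginal crystal only means changing the CKSEL bits in the low fuse.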

Maybe the crystals were damaged during transport? I've dropped a circuit before, and noticed the code would not execute, and then replaced the crystal to get it working again.