Help me troubleshoot dead boards

I’m soldering a bunch of custom ATmega1284P boards. It’s a good design and I’ve had it working for months, but in the last batch about 50% of the boards are not working. On most of them I cannot upload a sketch via FTDI; however, if I plug in a chip that already has firmware on it, everything works.
Can someone suggest what part I should be checking, and what is the most common component that will prevent data from being uploaded?
I kind of suspect I got either a bunch of bad 16 MHz crystals or maybe bad 22 pF caps. Could this prevent me from uploading?
Also, on 2 boards I’m able to upload a sketch partially (e.g. to 19%) before it stops. Again, 50% of the boards have no issues, and I checked and triple-checked that everything is soldered correctly. Any suggestions would be greatly appreciated! :slight_smile:

Could you post a little bigger file? It doesn't quite fill my 50" monitor ;)

Are you being sarcastic? :slight_smile: It’s pretty big :slight_smile:
I’ll attach the Eagle schematic…
Here’s the error I’m getting:

Reading | ################################################## | 100% 0.02s

avrdude.exe: Device signature = 0x1e9705
avrdude.exe: reading input file "D:\projects\DIY\XRONOS\firmware\V2\xronos2_04_03_1284.hex"
avrdude.exe: writing flash (51554 bytes):

Writing |                                                    | 0% 0.00s
avrdude.exe: stk500_paged_write(): (a) protocol error, expect=0x14, resp=0x64
avrdude.exe: stk500_cmd(): programmer is out of sync
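
For anyone reading the log above: the `expect`/`resp` values are raw STK500 (v1) protocol bytes. Here is a minimal sketch that names them, using a small subset of the constants from Atmel's STK500 command set. The helper `describe_byte` is hypothetical, written only for illustration. Note that 0x64 happens to be the code of the "program page" command itself, but the log alone cannot tell us whether that is an echo or just corrupted serial data.

```python
# Subset of STK500 v1 protocol bytes (Atmel STK500 command set).
STK500_RESPONSES = {
    0x10: "Resp_STK_OK",
    0x11: "Resp_STK_FAILED",
    0x12: "Resp_STK_UNKNOWN",
    0x13: "Resp_STK_NODEVICE",
    0x14: "Resp_STK_INSYNC",
    0x15: "Resp_STK_NOSYNC",
}
STK500_COMMANDS = {
    0x30: "Cmnd_STK_GET_SYNC",
    0x55: "Cmnd_STK_LOAD_ADDRESS",
    0x64: "Cmnd_STK_PROG_PAGE",
    0x74: "Cmnd_STK_READ_PAGE",
    0x75: "Cmnd_STK_READ_SIGN",
}


def describe_byte(value):
    """Return a readable name for a byte seen in an avrdude sync error (hypothetical helper)."""
    if value in STK500_RESPONSES:
        return f"0x{value:02x} = {STK500_RESPONSES[value]} (response)"
    if value in STK500_COMMANDS:
        return f"0x{value:02x} = {STK500_COMMANDS[value]} (command code, not a valid reply)"
    return f"0x{value:02x} = not in this subset of STK500 bytes"


if __name__ == "__main__":
    print("expected:", describe_byte(0x14))
    print("received:", describe_byte(0x64))
```

So avrdude wanted the bootloader's "in sync" byte and got something that is not a valid reply at all, which fits data being garbled on the serial link during the page write.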

XRONOS_CLOCK_V2.1.sch (1020 KB)

Yeah, 5112 x 3312 is way too big to scroll around.

I don't see anything unusual. What do you have set for fuses?

mighty_opt.bootloader.low_fuses=0xff mighty_opt.bootloader.high_fuses=0xde mighty_opt.bootloader.extended_fuses=0xfd
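
A minimal sketch of what those three bytes mean, assuming the bit layout from the ATmega164/324/644/1284 datasheet (AVR fuse bits are active low: 0 means "programmed"). The function name `decode_fuses` and the dictionary keys are made up for illustration.

```python
# CKSEL3..0 ranges and their clock sources (ATmega1284P datasheet, clock options table).
CLOCK_SOURCES = [
    (0b1000, 0b1111, "low-power crystal oscillator"),
    (0b0110, 0b0111, "full-swing crystal oscillator"),
    (0b0100, 0b0101, "low-frequency crystal oscillator"),
    (0b0011, 0b0011, "internal 128 kHz RC oscillator"),
    (0b0010, 0b0010, "calibrated internal RC oscillator"),
    (0b0000, 0b0000, "external clock"),
]

# BODLEVEL2..0 values and their brown-out thresholds.
BOD_LEVELS = {0b111: "disabled", 0b110: "1.8 V", 0b101: "2.7 V", 0b100: "4.3 V"}


def decode_fuses(low, high, extended):
    """Decode a few ATmega1284P fuse fields (hypothetical helper, not a full decoder)."""
    cksel = low & 0x0F
    clock = next(
        (name for lo, hi, name in CLOCK_SOURCES if lo <= cksel <= hi), "reserved"
    )
    return {
        "clock_source": clock,
        "ckdiv8": not (low & 0x80),             # bit 7 of the low fuse
        "ckout": not (low & 0x40),              # bit 6 of the low fuse
        "jtag": not (high & 0x40),              # JTAGEN, bit 6 of the high fuse
        "spi_programming": not (high & 0x20),   # SPIEN, bit 5 of the high fuse
        "boot_reset_vector": not (high & 0x01), # BOOTRST, bit 0 of the high fuse
        "brown_out": BOD_LEVELS.get(extended & 0x07, "reserved"),
    }


if __name__ == "__main__":
    for key, value in decode_fuses(0xFF, 0xDE, 0xFD).items():
        print(f"{key}: {value}")
```

With the values above this reports the low-power crystal oscillator, no clock divide-by-8, JTAG off, reset into the bootloader, and brown-out detection at 2.7 V.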

Yeah, I don't understand why it's not working :( I uploaded the sketch via a good board, swapped the chip over, and everything works. But when I try to upload on a bad board I get the error above, and the memory gets corrupted at that point... It has to be a bad trace, component or solder joint somewhere...

You mentioned you suspected the 16 MHz crystals or maybe the 22 pF caps, so the frequency would not be correct? Have you done a frequency test on the bad boards (see if 10 minutes registers as 10 minutes, etc.)?
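
To turn such a test into a number, here is a minimal sketch. The function name and the sample readings are hypothetical; the idea is just to compare the time the board reports against a trusted reference such as a stopwatch or a PC clock.

```python
def clock_error_ppm(reference_seconds, board_seconds):
    """Clock error in parts per million; positive means the board runs fast."""
    return (board_seconds - reference_seconds) / reference_seconds * 1e6


if __name__ == "__main__":
    # Hypothetical reading: the board counts 600.6 s during 600 s of real time.
    error = clock_error_ppm(600.0, 600.6)
    print(f"{error:.0f} ppm ({error / 1e4:.2f} %)")
```

A typical crystal is specified in tens of ppm, so an error in the thousands of ppm (tenths of a percent) or more would point at the crystal or its load caps.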

This sounds depressingly like the Serial0/Reset (apparent) electrical interference that some people have experienced with the 1284. :-( Although (if so) you're the first person I've heard of who has both working and non-working boards; usually it's been "mine works fine" and "mine doesn't work at all."

Like Bill, the first thing I thought about was the infamous RX0 problem. On one of the bad boards, you might try cutting the trace to RX0 from the FTDI header, inserting a 1K-5K series-R, and tacking a 50-100pF cap onto the RX0 pin.
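
To get a feel for those values, here is a rough sketch of the numbers, assuming the cap goes from the RX0 pin to ground so that, together with the series resistor, it forms a simple RC low-pass. The 115200 baud rate is also an assumption, since the bootloader speed isn't stated here.

```python
import math


def rc_cutoff_hz(r_ohms, c_farads):
    """-3 dB corner frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def rc_time_constant_s(r_ohms, c_farads):
    """RC time constant in seconds."""
    return r_ohms * c_farads


if __name__ == "__main__":
    baud = 115200                 # assumed upload speed
    bit_period = 1.0 / baud       # about 8.7 microseconds per bit
    for r, c in [(1e3, 50e-12), (1e3, 100e-12), (5e3, 100e-12)]:
        tau = rc_time_constant_s(r, c)
        print(
            f"R={r:.0f} ohm, C={c * 1e12:.0f} pF: "
            f"fc={rc_cutoff_hz(r, c) / 1e6:.2f} MHz, "
            f"tau={tau * 1e9:.0f} ns "
            f"({tau / bit_period * 100:.1f}% of one bit)"
        )
```

Even the heaviest combination (5K with 100pF) has a time constant of about 0.5 µs, a small fraction of one bit at that speed, so the filter slows the edges without distorting the serial data.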

51K is a long sketch to upload via RS232, but I’ve been doing similarly [a lot] with my 1284 boards, and have had no trouble.

Also, attached a resized figure that’s easier to read. I like the features on the board, BTW :-).

Schematic looks fine - how about the board layout? That might tell us more. Are there traces near the crystal traces? Are there decoupling caps near the uC power pins? I don’t see any on the schematic now that I look again.

Attaching board. I think I have decoupling caps everywhere :slight_smile: Crystal is very close to the chip…

Wow, I never heard about this issue… Thanks for letting me know! I somehow got 2 boards working again (not sure exactly how I did it), but 2 still won’t upload at all. I’ll try cutting the RX line and adding the resistor and cap (should the other end of the cap be connected to ground?)

XRONOS_CLOCK_V2.1.brd (178 KB)

You've got SCK running right under the crystal traces, and MISO passing quite close to one pin. That can't be good during a download.

Here are links to a couple of threads - may help to start at the end [where solutions were basically had], and read towards the beginnings, lol. http://forum.arduino.cc/index.php?topic=146773.0 http://forum.arduino.cc/index.php?topic=139671.0

You might also try changing fuse setting to "full power oscillator", I believe this is it, http://forum.arduino.cc/index.php?topic=146773.msg1109658#msg1109658
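
In fuse terms, a minimal sketch of that change, assuming the ATmega1284P datasheet encoding where CKSEL3..1 = 011 selects the full-swing crystal oscillator and the other low-fuse bits (start-up time, CKOUT, CKDIV8) are left alone. The helper name `with_cksel` is made up, and the exact value recommended in the linked post isn't reproduced here, so treat the result as one plausible setting to check against the datasheet.

```python
FULL_SWING_CKSEL = 0b0111  # CKSEL3..1 = 011: full-swing crystal oscillator


def with_cksel(low_fuse, cksel):
    """Replace the CKSEL3..0 field (low nibble) of the low fuse byte."""
    return (low_fuse & 0xF0) | (cksel & 0x0F)


if __name__ == "__main__":
    old_low = 0xFF  # low-power crystal oscillator
    new_low = with_cksel(old_low, FULL_SWING_CKSEL)
    print(f"low fuse: 0x{old_low:02x} -> 0x{new_low:02x}")
```

Starting from a low fuse of 0xff this gives 0xf7; the high and extended fuses are unaffected.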

I found 5 or 6... one on the DS1307 (the DS3231 is much more accurate), one on the output of the regulator and one on the USB bus... Missing at least 4 more... two on the controller Vcc and AVcc, and pin 10 is open (the control signal for the 3V3 level shifter). A very safe rule of thumb is to place a 0.1 uF cap at every point where you have a Vcc net name... You don't by any means have to stuff them all... And pads or holes are nearly free anyway, so put them on the schematic... The wonder is that it works at all, missing those parts. BTW, us old geezers harp a LOT about bypassing... Didja ever wonder why? Do you really think the error is unique to you?

Doc

Can someone suggest what part I should be checking and what is the most common component that will prevent data from being uploaded?

My guess is that, since numerous boards have been completely satisfactory, design issues (while improvements may be made) are not at fault. IMO, there is a fair chance of board contamination by the board manufacturer. You may be able to salvage the boards by thoroughly cleaning and oven-drying them.

Yes, a batch of poor crystals or crystal load caps may cause frequency shift and subsequent serial timing issues. But board contamination can cause resistances between conductors and these cross-currents can cause havoc.
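
For a sense of scale, here is an idealized sketch of the serial timing budget. It assumes 8N1 framing, mid-bit sampling, 115200 baud in double-speed mode, and a 16 MHz clock; the baud rate and both function names are assumptions, not something stated in this thread.

```python
def max_clock_mismatch(bits_per_frame=10, sample_margin_bits=0.5):
    """Idealized limit on the total TX/RX clock mismatch for one UART frame.

    The last bit is sampled (bits_per_frame - 0.5) bit times after the start
    edge, and the sample point may drift by at most half a bit by then.
    """
    return sample_margin_bits / (bits_per_frame - 0.5)


def baud_error(f_cpu_hz, baud, double_speed=True):
    """Relative baud error caused by the AVR's integer UBRR divider."""
    divisor = 8 if double_speed else 16
    ubrr = round(f_cpu_hz / (divisor * baud)) - 1
    actual = f_cpu_hz / (divisor * (ubrr + 1))
    return actual / baud - 1.0


if __name__ == "__main__":
    print(f"frame budget: +/-{max_clock_mismatch() * 100:.2f} %")
    print(f"divider error at 16 MHz: {baud_error(16_000_000, 115200) * 100:+.2f} %")
```

Under these assumptions the divider alone uses up about 2% of a roughly 5% ideal budget (practical limits are tighter), so a crystal would need to be off by whole percent, not tens of ppm, to break the upload through frequency shift alone.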

Good luck,

Ray

Dan, that was it. I'm so happy, I can't even express it! As soon as I changed fuses it started to upload to "bad" boards. I've yet to verify all of them, but so far at least one is "resurrected"!!! :) Thank you!!!

Good to hear. I’ve never had any problem with my own boards, and neither has Bob(uino), but for some reason some boards just “break bad” [like the TV series ;-)]. Something to do with the layout, and crosstalk from RX0 to the adjacent oscillator input pin. Using full-power oscillator apparently minimizes cross-talk effects.

Full power means more drive, means a lower impedance seen @ the oscillator and thus makes it less susceptible to stray electrical fields from the SPI bus.
Kinda like reducing the value of a pull-up resistor to increase current drive for long or noisy two wire comms…

Doc

All boards are now operational! You guys rock!!! Yeah, it must be either a manufacturer defect with the PCBs or something else, but it does look like a crosstalk issue. The full-power fuse totally cured it tho! :slight_smile: I couldn’t be happier. BTW, this is the second time fuses saved my bacon; the first time it was BOD with the 644p (which would randomly lose the Flash program on chips during power loss)…
Also, this XTAL crosstalk issue now completely explains why I couldn’t upload any sketches via my “Rapid Bootloader” shield. It worked with 644p chips, but not with 1284p chips. Now, with the new fuse, I can easily burn sketches using a ZIF socket instead of prying chips out of a regular IC socket on a spare Xronos board with a special tool…
Now I’ll probably have to disassemble all 15 clocks I’ve made so far and redo the bootloaders/firmware on all of them, just in case. I just hope these new fuse values won’t affect other functionality :slight_smile: Anyone coming to Maker Faire in NY this weekend? I’ll be there with these Xronos clocks and would love to meet you guys in person!

I doubt it's a manufacturer pcb defect, rather more likely to do with trace routing. Also, it seemed to be the case that the more recent 1284 chip runs were less prone to the problem, but who knows. In my case, I "always" use guard rings around the crystal pins, and believe Bob(uino) does likewise. Good luck at the faire.

Can you tell me more about those guard rings? I'd like to implement them in the future. BTW, the problem has manifested elsewhere. There's still something strange with the SPI bus on those boards, and I can't use them... For example, I use SPI both for reading the SD card (for sounds) and for receiving packets from the RFM12B module. My program puts the radio to sleep when it reads files, otherwise there's a conflict. It works great on the other boards, but the ones that had upload issues fail to read the first file after the radio is supposedly asleep. For example, if I play back "file1.wav", "file2.wav", "file3.wav", it is always unable to read "file1.wav". If I completely disable the radio chip tho, it works fine, so for some strange reason I can't use both... It's bizarre...