Is Timer0 physically broken?

The following code produces completely unexpected results on an Uno clone that uses the CH340C USB chip, but works as expected on an Elegoo Uno R3 that uses the ATmega16U2 chip.

unsigned long Now;

void setup()
{
  Serial.begin(250000);
}

void loop()
{
  delay(100);
  Now = millis();
  Serial.println(Now);
}

The expected result is that it just keeps printing the updated time to the serial monitor. Again, no problem with the Elegoo. It's approaching 3 million as I write this.

However, on the clone, it eventually crashes. I've seen two variations of crashing:

  1. It gets to a number, and then keeps printing the same number indefinitely. Once it gets to that number, it's obvious that the delay() function is no longer working. (The output starts rapidly scrolling up the screen.) I've seen this happen at 21550 (twice), 31174, 34382, and 366960, as well as several other numbers that I didn't record.

  2. One time it got to 451822 and then just stopped completely. Several seconds later, the Uno reset and started over. (I assume a watchdog timer timed out.)

What's going on here? Is Timer0 physically broken? If so, then why does it work as expected for a while? Could there be a design problem with the clone, perhaps related to the CH340C chip? Again, if so, then why does it "kind of" work?

If the internal timers are in some sort of unexpected mode, don't they go back to whatever the default mode for the Arduino IDE is when you upload the sketch? To be clear, I uploaded the above sketch to both boards from the same computer, with the same IDE, within minutes of each other.

Any ideas?

It might be your non-standard baud rate. Try one that is a multiple of 300, like 115200.
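
For anyone who wants to check the arithmetic behind that suggestion, here is a rough sketch of how far the AVR's actual baud rate lands from the requested one. It assumes a 16 MHz clock and the USART's double-speed (U2X) divisor formula from the ATmega328P datasheet. The function names are made up for illustration and are not part of the Arduino core.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Divisor register value in double-speed (U2X) mode, where
// baud = F_CPU / (8 * (UBRR + 1)). The divisor is rounded to nearest.
inline std::uint32_t ubrrFor(std::uint32_t fCpuHz, std::uint32_t baud) {
    assert(baud > 0);
    std::uint32_t div = (fCpuHz + 4UL * baud) / (8UL * baud);  // rounded F_CPU / (8 * baud)
    assert(div >= 1);  // baud too high for this clock otherwise
    return div - 1;
}

// Baud rate the hardware actually generates with that divisor.
inline double actualBaud(std::uint32_t fCpuHz, std::uint32_t baud) {
    return static_cast<double>(fCpuHz) / (8.0 * (ubrrFor(fCpuHz, baud) + 1));
}

// Signed error of the generated rate relative to the requested rate, in percent.
inline double baudErrorPercent(std::uint32_t fCpuHz, std::uint32_t baud) {
    return (actualBaud(fCpuHz, baud) - baud) / baud * 100.0;
}
```

Under those assumptions, 250000 baud divides exactly at 16 MHz (UBRR = 7, zero error), while 115200 comes out roughly 2% fast (UBRR = 16), so the "non-standard" rate is the cleaner of the two on the AVR side.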

I tried five times at 115200 baud. The problem remains. The first four runs got to 39592, 26761, and 21347. The fifth run was somewhat different. It got all the way to 217392, started to print a new line, printed "21" and then stopped completely. I waited for several minutes, and it never reset.

Compile with warnings enabled. Do you see anything like, "redefinition of 'Now' "? Well, a shot in the dark...

These chips are known for poor quality. Try another clone.
Isn't the current version CH340G?

Is any additional hardware connected to the Uno?

Try testing the processor independently. Use the other working Uno as a serial-USB converter by tying reset to ground and connecting the RX/TX pins to the problem board RX/TX pins. Then the program will be talking to a different hardware serial reader.

Or, you could configure the problem board the same way, ground RESET and connect RX to TX and run loopback tests from the PC.

I haven't heard this before.
CH340C and CH340G are similar, but the C variant has a built-in crystal.

I'm also more inclined to expect a problem in the USB chip and/or driver than a problem with the AVR's Timer. Is the PC doing anything like going into some power-saving mode during these runs? Are you using the same PC USB port for both tests?

I implemented aarg's suggestion and have some interesting results to report, but first will quickly address some of the other comments.

Compiling with warnings fully enabled didn't yield any error similar to "redefinition of Now."

No other hardware is connected to the Uno other than a desktop computer. The computer isn't going into any power saving modes or screen blanking. The same USB port was used for both Unos, but the Elegoo shows up as "COM5 (Arduino Uno)" and the clone shows up as "COM3."

From the manufacturer's website (USB to Serial Port Chip CH340 - NanjingQinhengMicroelectronics), "CH340C/N/K/E and CH340B integrate 12MHz clock, no external crystal required." However, my board uses an external crystal. It's marked as "SAY 12.0114."

I pulled another identical (almost) clone out of service to test it. It worked for over 7 minutes, and then crashed at 438973. I said it was "almost" identical because I noticed that the USB crystal is marked "b 12.00 PDI."

Implementing aarg's suggestion yielded results that surprised me. When I held the Elegoo in reset mode and used it as the serial interface to the clone, the problem remained, crashing at 28266 and 7915. When I switched, and held the clone in reset mode and used it as the serial interface to the Elegoo, it worked beautifully! I ran it up to 3 million.

Since the problem is on two nearly identical boards, it appears to be a design flaw rather than a defective part. I'm still thinking that somehow Timer0 is getting corrupted.

It shouldn't be relevant, but the ATmega328P in the Elegoo is a Dual in-line Package and it's a Quad-flat Package in the clone.

You may then have got boards with fake or clone ATMEGA328P chips. I have seen reports of such fakes but only in the context of excessive current consumption in sleep modes. The ones you have seem especially poor. Can you supply a picture where the markings are readable?

Fake: Deep-Sleep Problems Lead To Forensic Investigation Of Troublesome Chip | Hackaday

If it is a fake chip, it is not from exactly the same batch as those referred to in the link in post #10 because those had a KR country code and yours has a TH code.

I suppose I would run the signature print test here (Arduino atmega328p unique id/serial number · GitHub) to see how it compares with the fakes, which appear to be mostly 0xFF filled.

Maybe another possibility is a crystal mismatch. Later dies of the ATmega328P are supposedly stricter. Seeing if you get the same results with the internal oscillator might show something.

If you suspect it's Timer0 related, an interesting test would be to abandon Serial testing and just load the blink sketch and see how many days/hours it runs...

Running the signature test doesn't cause the chip to heat up and produces the following:

boot sig dump
1E 97 95 11 16 2 A5 A5
FF 96 FF EB FF 89 42 30
31 30 45 51 69 F 13 D
17 1 12 6 13 6 FF FF
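
As a rough cross-check of that dump, the sketch below pulls the three device signature bytes out of the first row and compares them with the 1E 95 0F listed for the ATmega328P. It assumes the dump is in address order starting at 0, and that the signature bytes sit at offsets 0, 2 and 4 (with OSCCAL at offset 1), as in the datasheet's signature row table. The type and function names are invented for illustration.

```cpp
#include <array>
#include <cstdint>

// First row of the dump above, copied verbatim.
const std::array<std::uint8_t, 8> kRow0 = {{0x1E, 0x97, 0x95, 0x11, 0x16, 0x02, 0xA5, 0xA5}};

struct DeviceSignature {
    std::uint8_t b0, b1, b2;
};

// Assumption: offsets 0, 2 and 4 of the signature row hold the device
// signature; offset 1 holds the RC oscillator calibration byte (OSCCAL).
inline DeviceSignature decodeSignature(const std::array<std::uint8_t, 8>& row) {
    return DeviceSignature{row[0], row[2], row[4]};
}

// True only for the signature documented for the ATmega328P (1E 95 0F).
inline bool isAtmega328P(const DeviceSignature& s) {
    return s.b0 == 0x1E && s.b1 == 0x95 && s.b2 == 0x0F;
}
```

Read that way, the row decodes to 1E 95 16, which does not match the 328P. As far as I know, 16 in the third position is the value documented for the ATmega328PB, but that conclusion depends entirely on the offset assumption above.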

So, I tried several more things, but I think 6v6gt nailed it. Switching to the internal oscillator absolutely eliminated the problem. I ran it overnight with no glitches. (I doubled the numerical value of the baud rate and halved the numerical value of the delay time in the code to somewhat compensate for the lower 8 MHz clock speed. That way the serial monitor could remain at the same baud and the frequency of printed outputs remained the same, but of course the values printed out were half the actual values of time.)
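
The compensation described above can be sketched as plain arithmetic. This assumes the sketch was still compiled for a 16 MHz board definition while the chip actually ran from the 8 MHz internal oscillator, so every clock-derived quantity is off by the ratio of the two frequencies. The function names and the 500000 / 50 values (double the baud, half the delay of the original sketch) are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Baud rate actually produced when the divisor was computed for
// assumedHz but the chip really runs at actualHz.
inline std::uint32_t realBaud(std::uint32_t requestedBaud,
                              std::uint32_t assumedHz, std::uint32_t actualHz) {
    assert(assumedHz > 0);
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(requestedBaud) * actualHz / assumedHz);
}

// Wall-clock milliseconds that pass while the sketch counts reportedMs,
// e.g. for delay(reportedMs) or a value returned by millis().
inline std::uint32_t realElapsedMs(std::uint32_t reportedMs,
                                   std::uint32_t assumedHz, std::uint32_t actualHz) {
    assert(actualHz > 0);
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(reportedMs) * assumedHz / actualHz);
}
```

With these numbers, Serial.begin(500000) yields 250000 baud on the wire, delay(50) takes 100 ms of real time, and a printed millis() value of 1000 corresponds to 2000 ms elapsed, which matches the "half the actual values" remark.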

Switched back to the 16 MHz crystal, and the problem returned. (By the way, the clone uses a quartz crystal, not a ceramic resonator.)

As mentioned above, I have two clones (that I bought at the same time) with the same problem, so it might be a bad batch of crystals, or it might be bad (or inappropriate) capacitors or bad capacitor values. Even though they appear to be "simple," it's not hard to screw up a crystal oscillator design.

I might buy a few crystals and some NP0 capacitors and experiment a bit.

I've got to believe that there are a lot of these clones out there that have this problem and people either haven't noticed, or just assumed that they have some kind of glitch in their code. Also, sometimes it runs for quite a long time without glitching.

My guess is that there are states of the micro-controller that are more (or less) vulnerable to clock glitches than other states, so some clock glitches might not result in anything noticeable.

Two things I noticed, though they could simply be down to picture quality:
A) The soldering on the crystal appears clumped, although that could be the result of applying test probes.
B) The crystal load capacitors C5 and C6 appear quite chunky compared with the CH340's C20 and C21. They should be around 22 pF.

Otherwise the general finish on the board looks good.

The angle of the picture makes them look chunkier, but they're not.

For good measure, I touched up the solder joints, but it didn't help. Again, I have another board with the same problem (although it doesn't happen as often) which makes me think there's something wrong with the parts.

I looked at the voltages on both terminals of the crystal, which are connected to pins 7 and 8 of the TQFP Mega328P. I don't know what's normal, but measuring with a 10X (10 M-ohm) oscilloscope probe shows pin 7 has 0.72 Vpp with a 0.64 Vdc offset, and pin 8 has 0.37 Vpp with a 0.52 Vdc offset. Both sides appear as sine waves. (The scope and probes are good for much higher frequencies.)

Even though the Elegoo (which is the one that works flawlessly) uses a ceramic resonator, I thought it might be worth measuring the corresponding points in its circuit. The resonator is connected to pins 9 and 10 of the DIP Mega328P. Pin 9 has 0.94 Vpp with a 0.70 Vdc offset, and pin 10 has 0.54 Vpp with a 0.38 Vdc offset. Both sides appear as sine waves.

So the signal is larger on the good unit, which might or might not be significant.

To finish out this topic, I think I simply have two bad ATmega328P chips. I've tried replacing the crystal and the capacitors (even trimming them for precise frequency at room temperature), on one of the two boards. I can improve the length of time between glitches - getting up to several hours without a glitch - but can't ever make the problem entirely disappear. I've tried several other Uno clones, and they all seem to work well. The two bad boards have identical code numbers on the chips. My conclusion is that it was either a bad batch, or more likely that they both got hit with a static charge somewhere along the way - probably since I've had them. One of them has bad signature bits, so the latter seems more likely.

Or, they are fakes.