I've been making weather stations based on the Arduino. These weather stations take multiple weather measurements and store them to an SD card at a sampling rate of approximately 0.1 Hz (one sample every 10 seconds). Unfortunately, I have been experiencing a strange bug with the DS1307 real-time clock that only manifests itself after a few days.
This is what happens: approximately 3 to 5 days after the sketch first starts (it's not consistent), the RTC fails at random in one of two ways. It either gets stuck on one time, so the station continues to take measurements but logs the same time over and over again, or it changes the date value to something meaningless and logs that over and over again.
Normal operation can be seen here (the dates are highlighted):
Getting stuck on one time can be seen here:
Meaningless values getting stuck can be seen here:
I've already tried replacing the chip but it did not change anything. It's on a custom circuit board but I don't think that is the problem because the chip works fine initially, then fails after a few days. I thought it might be a battery issue but the DS1307 has its own battery backup. Finally, since all the failures seem to happen in the middle of the day, I thought the problem might stem from a heat issue. However, the maximum temperature seen inside the box is approximately 55C, while the maximum operational temperature for the DS1307 is 85C.
I understand my code is very long but even if you don't read it, are there any ideas on what might be going on? Is it a memory or overflow problem? I can produce more details about the code/hardware upon request. Thanks for any and all help!
Check the connections from the DS1307 to the Arduino very carefully. I suspect that you've got a bad connection somewhere (the most likely culprits are either the SDA or SCL connections) which is eventually stressed by the heat during the day.
I do not have the circuit board in front of me right now, but I will check the connections tomorrow when I do. If it is the connections, when the board cools down, shouldn't they start transmitting properly again? Also, I have other devices on the same I2C bus that do not seem to be affected.
If the other I2C devices aren't affected then the problem is much closer to the chip itself - maybe even the IC socket?
Another possibility is that a glitch of some sort causes the clock to stop, optionally modifying the date/time as well. You would then read the same, possibly garbage, date/time after that.
Do you have a capacitor across the 5V supply to the clock?
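The stopped-clock possibility can be checked in software, because the DS1307 exposes a clock halt (CH) flag as bit 7 of its seconds register (address 0x00), with the seconds themselves stored as packed BCD in the lower seven bits. Here is a minimal sketch of decoding that byte. It assumes the raw register value has already been read over I2C (that part is left out), and the function names are made up for illustration, not taken from any library.

```cpp
#include <stdint.h>

// DS1307 seconds register (address 0x00):
// bit 7 = CH (clock halt), bits 6..0 = seconds in packed BCD.
const uint8_t DS1307_CH_MASK = 0x80;

// Hypothetical helper: true when the CH bit says the oscillator is stopped.
bool clockHalted(uint8_t secondsReg) {
    return (secondsReg & DS1307_CH_MASK) != 0;
}

// Convert one packed BCD byte (e.g. 0x59) to its binary value (59).
uint8_t bcdToBin(uint8_t bcd) {
    return static_cast<uint8_t>((bcd >> 4) * 10 + (bcd & 0x0F));
}

// Hypothetical helper: seconds value with the CH bit masked off.
uint8_t secondsFromReg(uint8_t secondsReg) {
    return bcdToBin(secondsReg & 0x7F);
}
```

If `clockHalted()` returns true on a board that is logging a frozen time, that would point towards a glitch having stopped the oscillator rather than a bus problem.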
The temperature rating is 85C for the industrial version but 70C for the commercial.
If you are only at 55C at the DS1307 you are probably OK.
What is happening with your power supply?
Is there a derating on the battery at 55C?
How is your crystal connected? Did you use a guard plane or ring as called out in the datasheet? Did you use a crystal with a maximum ESR of 45 kOhms and a CL of 12.5 pF? There are a lot of watch crystals that will not work with the DS1307. Also the DS1307 and DS1337 have different crystal requirements.
Make sure the crystal you're using meets exactly the specs required by Dallas (or Maxim).
From the datasheet for the DS1337 (which uses a different crystal than the DS1307):
The internal oscillator circuitry is designed for operation with a crystal having a specified load capacitance (CL) of 6pF. For more information about crystal selection and crystal layout considerations, refer to Application Note 58: Crystal Considerations with Dallas Real-Time Clocks. An external 32.768kHz oscillator can also drive the DS1337. In this configuration, the X1 pin is connected to the external oscillator signal and the X2 pin is floated.
Corrected, but I just wanted to point out the importance of having a proper xtal.
@el_supremo: I didn't consider the IC socket, I soldered it in but maybe it's a bad joint? I do not have a capacitor across the 5V pin but I'm making a new board that does have it.
@jluciani: I am running everything off of a 6V sealed lead acid battery that is running through a low dropout regulator. I do not think that there is battery derating because the rest of the components are ok. This is the battery I'm using: http://www.batterysharks.com/MK-Battery-ES4-6-p/es4-6_b6-4.5.htm. As for the crystal, I'm using this one: http://www.mouser.com/Search/ProductDetail.aspx?R=AB38T-32.768KHZvirtualkey52750000virtualkey815-AB38T-32.768KHZ. It's 32.768 kHz and 12.5 pF, just as specified. I do not, however, have a ground fill under the chip as seen in this picture:
Would that be a possible source of the problem? I thought the ground plane/pour was just there to reduce noise on any other lines.
Sounds like it's time for a little divide and conquer: two tests you could do to try and separate code from temp issues. Does the system as is run ok if you keep it in a cooler environment? What happens if you keep the equipment in place but just run a simpler program that reads the clock and writes it to the SD?
It sounds like your error is reasonably predictable and occurs after
a long period of time so it is hard to explain this away as noise.
It is good design practice to guard high impedance circuits.
Have you tried running your circuit under more controlled conditions?
Indoors (25C) and off of a bench power supply?
My instinct is that you have a software problem. At minimum, the software should be able to detect that the RTC is not reachable, rather than using the wrong timestamp info. But my guess would be that something in RAM is getting clobbered and causing the problem.
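As a rough sketch of what such a check could look like: since the logger samples about every 10 seconds, two consecutive readings with an identical timestamp already suggest a stuck clock, and out-of-range fields suggest garbage. The `RtcReading` struct, `fieldsInRange()` and `StuckDetector` below are hypothetical names, not part of any RTC library, and the 2000 to 2099 year window is an assumption based on the DS1307 storing only a two-digit year.

```cpp
#include <stdint.h>

// Hypothetical plain struct holding one decoded RTC reading.
struct RtcReading {
    uint16_t year;    // full year, e.g. 2013
    uint8_t  month;   // 1..12
    uint8_t  day;     // 1..31
    uint8_t  hour;    // 0..23
    uint8_t  minute;  // 0..59
    uint8_t  second;  // 0..59
};

// Reject obviously meaningless dates and times.
bool fieldsInRange(const RtcReading& t) {
    return t.year >= 2000 && t.year <= 2099 &&
           t.month >= 1 && t.month <= 12 &&
           t.day >= 1 && t.day <= 31 &&
           t.hour <= 23 && t.minute <= 59 && t.second <= 59;
}

bool sameInstant(const RtcReading& a, const RtcReading& b) {
    return a.year == b.year && a.month == b.month && a.day == b.day &&
           a.hour == b.hour && a.minute == b.minute && a.second == b.second;
}

// Counts consecutive identical readings; 'limit' repeats marks the clock as stuck.
class StuckDetector {
public:
    explicit StuckDetector(uint8_t limit)
        : limit_(limit), repeats_(0), havePrev_(false), prev_() {}

    // Returns true when the reading looks trustworthy enough to log.
    bool accept(const RtcReading& t) {
        if (!fieldsInRange(t)) return false;   // garbage date/time
        if (havePrev_ && sameInstant(prev_, t)) {
            if (repeats_ < 255) ++repeats_;    // saturate instead of wrapping
        } else {
            repeats_ = 0;
        }
        prev_ = t;
        havePrev_ = true;
        return repeats_ < limit_;
    }

private:
    uint8_t    limit_;
    uint8_t    repeats_;
    bool       havePrev_;
    RtcReading prev_;
};
```

When `accept()` returns false, the sketch could log an explicit error marker, or fall back to an elapsed-time count, instead of writing the bad timestamp to the SD card.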
I will be moving it inside today and running it for a while to see if I can duplicate the problems in a more temperature normalized environment.
@tasosstr. My data lines are pretty short already. The chip lies next to pins 20/21 on the shield (I'm using a Mega). My shield's EAGLE schematic can be found here: GitHub - madvoid/LEMS_PCB They are the .brd and .sch files, though I can also upload PDFs if anyone wants them as well.
@gardner. Can you expound on that? I just finished an email conversation with someone from Maxim and they said something along the same lines as you did, though I did not completely understand it. I am using an established library, so maybe some other people have had this problem? If not, I am going through the library right now, but since I don't really know what I'm looking for, I'm hoping the problem will jump out at me.
if (tn9_pos == tn9_len && tn9_rawval[0] == 0x53) { // If sensor has sent junk packet...
  digitalWrite(tn9_action, LOW); // Make sensor start sending data
}
This is not a junk packet but an ECHO of data written to the TN9 to adjust 'Emissivity'. Not sure if this would have some bearing on your problem though. The TN9 datasheet also says 'Operating Range -10~50°C / 14~122°F' so maybe the RTC is fine but the TN9 is causing the problem. Compiling with '#define INFRARED 0' and seeing whether the device still fails after a few days might confirm this.
@Riva. Riva, thank you for telling me about that! I was wondering what the emissivity packet was but I never did find out. The difference between the TN9 and the DS1307 is that the DS1307 is inside an enclosure, which gets up to the 55C I was talking about, but the TN9 is left in a separate more ventilated enclosure. According to the TN9's onboard ambient temperature sensor, the TN9 enclosure never goes above ~37C.