Hi Arduino enthusiasts,
I imagine my problem/question will be difficult to comment on without poring over my wiring and code, but in case there is something fundamental I am missing, I thought I would ask. I am very comfortable with programming, having written code for the better part of 30 years, but electronics is not my strong suit at all.
First I should summarize my project. I am using my Arduino Mega2560 R3 to read from one port of a Dual Port SRAM chip - in particular an AM2130 (I have tried an IDT7132 with the same result). The other port is written to by an old 6801 based car computer at the command of an interrupt triggered by the Arduino - I modified the car computer code to add the interrupt routine. The interrupt routine simply copies the car computer RAM to the SRAM chip so the Arduino can read the data and process it further. The car computer adds a checksum byte and a request counter to the end of the RAM values it copies so the Arduino can test the data for validity. Here's the data sheet for the SRAM chip:
Five times a second, the Arduino toggles a pin connected to the IRQ of the car computer, waits for the car computer to copy the data (10us; I've tried longer values, and they don't help my issue), then uses an adaptation of this sketch to read the values back (224 bytes total) from the SRAM:
Let me refer to the 224 bytes as a Packet. My issue is that about 0.5% of my packets contain at least one "bad" byte. I know it must be the Arduino side of the equation, because I wrote a retry routine and, sure enough, on the second pass the Arduino gets the correct value. Here's an example of my Serial output:
Error - Checksum mismatch!
Checksum error - 3C vs 7C
Retrying and dumping differences:
0 : EE/EE 1/1 76/76 80/80 80/80 66/66 66/66 0/0 0/0 0/0 1/1 CD/CD 4/4 74/74 DE/DE 0/0
10 : 41/41 C0/C0 91/91 1E/1E 2/2 0/0 60/60 0/0 0/0 5F/5F 3F/3F FD/FD 2/2 6D/6D 3F/3F A2/A2
20 : 83/83 59/59 8B/8B 8E/8E 8E/8E 0/0 ED/ED 1/1 14/14 88/88 8E/8E 4B/4B 4B/4B 20/20 7D/7D 47/47
30 : 4B/4B 4B/4B 4B/4B 4B/4B 8C/8C [b]96/D6[/b] 96/96 76/76 86/86 ED/ED 2F/2F 14/14 14/14 0/0 14/14 9D/9D
40 : 46/46 85/85 42/42 0/0 2D/2D 2A/2A 3D/3D AE/AE 4E/4E 63/63 1A/1A D6/D6 55/55 1/1 45/45 0/0
50 : 15/15 83/83 6E/6E 30/30 0/0 6E/6E 1/1 BE/BE 0/0 3/3 0/0 0/0 0/0 0/0 2/2 6D/6D
60 : 0/0 5/5 1/1 43/43 50/50 0/0 39/39 0/0 3F/3F 0/0 0/0 50/50 0/0 39/39 3D/3D FF/FF
70 : 58/58 0/0 0/0 50/50 0/0 0/0 3D/3D 0/0 5/5 FF/FF 58/58 1/1 43/43 0/0 0/0 3/3
80 : 2D/2D 3/3 C/C 0/0 32/32 0/0 0/0 A/A 0/0 0/0 0/0 18/18 0/0 0/0 44/44 6/6
90 : 0/0 0/0 B2/B2 A5/A5 80/80 80/80 40/40 9C/9C 4D/4D 4D/4D 40/40 40/40 40/40 40/40 40/40 40/40
A0 : 40/40 40/40 9A/9A EF/EF 9A/9A 5/5 0/0 2/2 4/4 0/0 0/0 9C/9C 98/98 0/0 29/29 91/91
B0 : 20/20 FF/FF FF/FF FF/FF 0/0 0/0 0/0 88/88 0/0 0/0 0/0 FF/FF 0/0 2/2 0/0 0/0
C0 : 3/3 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0
D0 : 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 C4/C4 43/43 6D/6D 3C/3C
E0 : EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE
F0 : EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE
Differences at bytes: 35
This dump shows each byte read in the first (error) pass, followed by a slash, followed by the byte read in the retry pass. The byte at offset 0x35 differs by 0x40 (96 vs D6), which is exactly the difference between the two checksum values (3C vs 7C), so the second pass got the right answer. My request counter is at offset 0xDD/0xDE (16 bits so it doesn't wrap around too fast). It didn't change between the two reads the Arduino made from the SRAM, so I know the car computer didn't rewrite the data in between because of the interrupt line floating around or something else strange.
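To make the check described above concrete, here is a minimal host-side C++ sketch of validating a packet and comparing two passes. It is not the actual Arduino code. The names (`computeChecksum`, `diffOffsets`, `sameRequest`) are hypothetical, and the sketch assumes a simple 8-bit additive checksum stored in the last byte (offset 0xDF), which is consistent with the dump but not confirmed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout taken from the dumps above: 224-byte packet, offsets 0x00-0xDF.
constexpr std::size_t kPacketSize = 224;
constexpr std::size_t kCounterOffset = 0xDD;   // 16-bit request counter at 0xDD/0xDE
constexpr std::size_t kChecksumOffset = 0xDF;  // ASSUMPTION: checksum is the last byte

using Packet = std::array<std::uint8_t, kPacketSize>;

// ASSUMPTION: 8-bit additive checksum over every byte before the checksum byte.
std::uint8_t computeChecksum(const Packet& p) {
    std::uint8_t sum = 0;
    for (std::size_t i = 0; i < kChecksumOffset; ++i) {
        sum = static_cast<std::uint8_t>(sum + p[i]);
    }
    return sum;
}

bool checksumOk(const Packet& p) {
    return computeChecksum(p) == p[kChecksumOffset];
}

// Offsets at which the first pass and the retry pass disagree.
std::vector<std::size_t> diffOffsets(const Packet& first, const Packet& retry) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < kPacketSize; ++i) {
        if (first[i] != retry[i]) out.push_back(i);
    }
    return out;
}

// True when both passes carry the same request counter, i.e. the car
// computer did not rewrite the buffer between the two reads.
bool sameRequest(const Packet& a, const Packet& b) {
    return a[kCounterOffset] == b[kCounterOffset] &&
           a[kCounterOffset + 1] == b[kCounterOffset + 1];
}
```

With an additive checksum, a single byte that reads 0x40 too low shifts the computed checksum by the same 0x40, which matches the 96/D6 and 3C/7C pair in the first dump.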
What I find really confusing is that, even though 99.5% of the packets are clean, when it does error out, it's just as frequent to have many bad bytes as just one. Here's another packet with lots of differences:
Checksum error - 78 vs FC
Retrying and dumping differences:
0 : EE/EE 1/1 0/0 80/80 80/80 66/66 66/66 0/0 0/0 0/0 1/1 CD/CD 0/0 60/60 DE/DE 0/0
10 : 41/41 60/60 45/45 2E/2E 2/2 0/0 20/20 0/0 0/0 49/48 34/34 D1/D0 2/2 EF/E9 34/34 0/1
20 : 78/78 D8/5C 8B/8B 9B/9B 9B/9B 0/0 ED/ED 1/1 14/14 95/95 9B/9B 2E/2E 2E/2E 2E/2E 7D/7D 46/46
30 : 2E/2E 2E/2E 2E/2E 2E/2E 89/89 DE/DE DE/DE 37/37 47/47 ED/ED 2F/2F D7/D6 D8/D5 0/2 0/86 C6/C6
40 : 61/61 D7/D7 9D/A7 0/1F 92/54 8D/CC BB/92 3C/8D F5/BB 84/3C 7/7 99/99 18/18 0/0 66/66 0/0
50 : 0/26 34/78 0/36 80/30 0/0 0/CA 0/8 0/9A 0/0 0/2 0/0 0/0 0/0 0/0 2/2 ED/F2
60 : 0/0 8/4 1/1 50/4A 50/50 0/0 39/39 0/0 3F/3F 0/0 0/0 50/50 0/0 39/39 3D/3D FF/FF
70 : 3D/3D 0/0 0/0 50/50 0/0 0/0 3D/3D 0/0 5/5 FF/FF 3D/3D 1/1 50/4A 0/0 0/0 3/3
80 : 2D/2D 3/3 E/E 0/0 32/32 0/0 0/0 A/A 0/0 0/0 0/0 18/18 0/0 0/0 3/3 6/6
90 : 0/0 0/0 67/67 80/80 80/80 80/80 40/40 4D/4D 4D/4D 4D/4D 40/40 40/40 40/40 40/40 40/40 40/40
A0 : 40/40 40/40 81/81 0/0 81/81 1/1 0/0 0/0 4/4 0/0 0/0 80/80 0/0 0/0 0/0 2F/2F
B0 : 20/20 FF/FF 0/0 0/0 1E/1E 0/0 0/0 95/95 0/0 9/9 B4/B4 0/0 0/0 2/2 0/0 0/0
C0 : 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0
D0 : 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 B/B A5/A5 78/78
E0 : EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE
F0 : EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE EE/EE
[b]Differences at bytes: 19 1B 1D 1F 21 3B 3C 3D 3E 42 43 44 45 46 47 48 49 50 51 52 53 55 56 57 59 5F 61 63 7C[/b]
So 29 bytes differ between the two reads there. What's funky about this one is that many, but not all, of the bad bytes read as 0: not just a single bad bit, but the entire byte coming back low. Regardless of how "severe" the error is in terms of the number of bytes trashed, a fraction of a second later it goes back to getting clean packets for a while until the next error. :o
I can and will code around the issue by enhancing this retry logic to just try reading a few times before giving up on a particular iteration. That said, there is still a measurable chance that an 8-bit checksum is "fooled" by two or more offsetting bad bytes so I'm still wishing I could find a more robust answer.
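One possible shape for that enhanced retry logic, sketched in host-side C++ rather than real Arduino code: keep re-reading until two consecutive passes are byte-for-byte identical and the checksum is valid, which leans less on the 8-bit checksum alone. `ReadPass` is a hypothetical stand-in for the routine that reads all 224 bytes from the SRAM, `readStablePacket` is an illustrative name, and the additive checksum in the last byte is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

constexpr std::size_t kPacketSize = 224;
using Packet = std::array<std::uint8_t, kPacketSize>;

// ASSUMPTION: 8-bit additive checksum stored in the last byte of the packet.
bool checksumOk(const Packet& p) {
    std::uint8_t sum = 0;
    for (std::size_t i = 0; i + 1 < kPacketSize; ++i) {
        sum = static_cast<std::uint8_t>(sum + p[i]);
    }
    return sum == p[kPacketSize - 1];
}

// Hypothetical stand-in for one full 224-byte read pass over the SRAM.
using ReadPass = std::function<Packet()>;

// Re-read until two consecutive passes match exactly and the checksum is
// valid. Returns false after maxPasses passes without a stable result, so
// the caller can skip this iteration and wait for the next request.
bool readStablePacket(const ReadPass& readPass, int maxPasses, Packet& out) {
    assert(maxPasses >= 2);  // at least two passes are needed to compare
    Packet previous = readPass();
    for (int pass = 1; pass < maxPasses; ++pass) {
        Packet current = readPass();
        if (current == previous && checksumOk(current)) {
            out = current;
            return true;
        }
        previous = current;
    }
    return false;
}
```

The trade-off is that every packet costs at least two passes, even when the first one is clean. This only makes sense if the car computer does not rewrite the buffer between passes, which the unchanged request counter suggests.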
I used oshpark.com to make a really nice circuit board for this interface. I'm using ribbon cables with crimped on connectors (like the old style "IDE hard drive cables"). It looks like this (with the IDT7132 chip installed rather than the AM2130, but like I said, both behave the same):
I was careful to use pull-down resistors on both ports for the address lines I don't use (I'm only addressing the lower 256 bytes of the SRAM chip), so the extra address lines aren't floating around. I can't see anything sketchy about my wiring or design. It's all put together quite solidly compared to my breadboard version, which had the same issue (I thought soldering it all down would help - Argh!)
The one piece that's a little suspicious is that I had to make the ribbon cable that goes between the Arduino and SRAM board about 2.5 feet long because the Arduino can't live very close to the car computer just due to space issues. I figured that the Arduino isn't going very fast though relatively speaking, and comparing this to running an old IDE hard drive with the same style cable, I remember cables of that length working fine in those applications (did they have to cope with high amounts of error checking/correction in those applications too?) My breadboard version used a much shorter cable when I had the car computer, breadboard, and Arduino laid side-by-side on the car floor while I was designing this. I had the same problem then with the occasional errors as I do now.
I guess that sums it up. Thanks for reading, and any thoughts you might have!