sudden change after hours - because of interrupt?

Hi,

I made a GPS/GSM program on a 1284p. It runs fine for hours and then suddenly I get strange behavior like:

sprintf(caCmdBuf, "AT+CIPSEND=%i\r", bLength);

suddenly generates a string AT+CIPS<TAB>ND=39 ... (the E is changed to a TAB character)

I call this every 5 seconds, and once it goes wrong it is wrong every time.

For hours I run the same code, buffers are large enough, SRAM is <3KB of 16KB available, ...

I've been looking for days now, and the only thing I can think of is that a Timer3 interrupt (a 100 ms timer interrupt) interferes with my main code.
Is it possible that this interrupt causes problems for C functions like strcpy, strlen, sprintf, ... or interferes with the CPU registers? (gcc bug?)

I've started putting all C function calls inside an atomic block, like this:

ATOMIC_BLOCK(ATOMIC_FORCEON)
{
    sprintf(caCmdBuf, "AT+CIPSEND=%i\r", bLength);
}

Will this help?
Any ideas where to start looking?

I use Arduino 1.0.5, no bootloader, 1284p CPU.

PS: I've made a smaller version on a UNO R3 on 1.0.4 and there it ran for weeks without any problem...

Thanks for helping!

Korstiaan


This is a programming question, so where is your code?

Memory corruption caused by array overflow would be my first guess. You have quite a lot of char arrays that are sized by magic numbers, which are presumably repeated elsewhere in your code. I recommend that you define a constant for each array giving the size of the array. Where the array is used to hold a null-terminated string, make sure the constant is one less than the actual size of the array (to ensure there is space for the null without requiring -1 offsets in all tests). When accessing the array using an index value, check that the index value is less than the array size. When using sprintf etc to write to the array, replace all those calls with snprintf. (Do this for the local arrays too - not just the global ones.)
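To make that concrete, here is a minimal sketch of the size-constant plus snprintf approach. The names CMD_MAX_LEN and buildSendCmd and the value 31 are made up for illustration; only caCmdBuf and the format string come from the original post.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical size: CMD_MAX_LEN is the longest string we allow.
   The array itself gets one extra byte for the terminating null. */
#define CMD_MAX_LEN 31
static char caCmdBuf[CMD_MAX_LEN + 1];

/* Builds the AT+CIPSEND command into buf.
   Returns 1 on success, 0 if the text did not fit (snprintf
   truncated it) or formatting failed. */
static int buildSendCmd(char *buf, size_t bufSize, int length)
{
    int n = snprintf(buf, bufSize, "AT+CIPSEND=%i\r", length);
    return n >= 0 && (size_t)n < bufSize;
}
```

Called as buildSendCmd(caCmdBuf, sizeof caCmdBuf, bLength), it can never write past the end of the array, and the return value tells you when a string was cut short instead of silently overwriting whatever sits next in memory.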

Have you checked how much free memory you have? You have some pretty big arrays there and you might actually be running out.

Hi,

The arrays are globals and have a constant to define size (more than 2x bigger than needed...).
I will change to snprintf to be sure...
I also checked a thousand times to see if the array is large enough:

else if ((pos < SIZEREADBUF-1) ...

I can't find anything! I execute the same code for almost 2 hours without any problem (always the same strings, ...) and then suddenly something has changed? It is as if the code gets corrupted because it was interrupted?

Memory can't be the problem, since I use only 2596 bytes of the 16 KB SRAM (1284p).

Korstiaan

korstiaan:
The arrays are globals and have a constant to define size (more than 2x bigger than needed...).

Not all of them are globals, and not all of them are sized with a constant (and none of them explicitly allow space for a null terminator).

You only need one bug to cause the problem so you need to systematically eliminate potential bugs from all of your code. The techniques I described are intended to do that by addressing the risk of array overflow.
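As an illustration of the index check, a receive buffer can be filled through one small helper so that the bounds test lives in exactly one place. This is only a sketch: SIZEREADBUF is the constant name from the post above, but its value here, the buffer name readBuf and the helper appendChar are assumptions.

```c
#include <stddef.h>

#define SIZEREADBUF 64            /* hypothetical value */

static char   readBuf[SIZEREADBUF];
static size_t pos = 0;

/* Appends one character if there is still room for it plus the
   terminating null. Returns 1 if stored, 0 if it was dropped. */
static int appendChar(char c)
{
    if (pos < SIZEREADBUF - 1) {
        readBuf[pos++] = c;
        readBuf[pos] = '\0';      /* keep the buffer terminated at all times */
        return 1;
    }
    return 0;                     /* buffer full: drop instead of overflowing */
}
```

If every write to the buffer goes through a helper like this, an overflow of that particular array can be ruled out and you can move on to the next one.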

You say that a different version of the code works perfectly on a Uno. The problem must surely lie in the differences from that version?

...R

void setServerIPinEEPROM(byte ip1, byte ip2, byte ip3, byte ip4)

{
    ATOMIC_BLOCK(ATOMIC_FORCEON)
    {
        EEPROM.write(IP_ADDRESS,  ip1);
        EEPROM.write(IP_ADDRESS+1, ip2);
        EEPROM.write(IP_ADDRESS+2, ip3);
        EEPROM.write(IP_ADDRESS+3, ip4); 
    } 
}

Why all the atomic blocks? How is anything going to change here?

Hi Nick,

I know, it is not necessary.
But I suspected interference between main code and interrupts handlers so I put as much as possible in atomic blocks.
I also read somewhere that ALL c-functions are NOT safe from interrupts so yesterday I put every c-function inside such a block. (huge work...)
Last evening however, after some testing, I again saw some const char arrays that were suddenly changed (a character in the middle changed)...
Conclusion: the atomic blocks didn't help with that.
I checked, re-checked, changed all to snprintf, ... NOTHING

I have almost the same program running on a 328p. It runs fine without any problems.
During months of development I also encountered strange behavior due to arrays that were too small or other stuff (bad pointers), but each time that was found very quickly. (reset of chip, complete garbage output, ...)
This behavior is also different because now it is almost always 1 character that is changed inside a const char array?? I never saw this during my 328p development.

Last guesses:

  • compiler/arduino problem with combination 1284p (1284p not supported...)
  • faulty chip?
  • too much nesting? How deep can I go?
  • ...

If I can't find it I have to stop and start all over on another platform.

Korstiaan

korstiaan:
I also read somewhere that ALL c-functions are NOT safe from interrupts so yesterday I put every c-function inside such a block. (huge work...)

Where did you read that? Please ask here before doing such a massive thing. If interrupts corrupted functions so much that you had to turn them off inside each function, there wouldn't be much point in having them, would there?

... ALL c-functions are NOT safe from interrupts ...

Just to be explicit, that is complete nonsense.

Your code is almost impossible to read with all those ATOMIC_BLOCKs in it. They obscure what you are trying to do, and achieve nothing. Sorry, but that's the truth.

If memory is getting corrupted, it is not interrupts, it is almost certainly a buffer overflow. It may be working on one processor but not another out of sheer luck.

Here, for example:

        strcpy(caCmdBuf, cmd);

There is no test there that the string you are copying in is small enough to fit into caCmdBuf. You should look up strncpy which imposes a limit.
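One thing to watch with strncpy: if the source is too long it fills the destination and does not add the terminating null, so you have to add it yourself. A minimal sketch, where the helper name copyBounded is made up for illustration:

```c
#include <string.h>
#include <stddef.h>

/* Copies src into dst, which holds dstSize bytes including the null.
   Returns 1 if the whole string fit, 0 if it had to be truncated. */
static int copyBounded(char *dst, size_t dstSize, const char *src)
{
    if (dstSize == 0)
        return 0;
    strncpy(dst, src, dstSize - 1);
    dst[dstSize - 1] = '\0';      /* strncpy does not terminate on truncation */
    return strlen(src) < dstSize;
}
```

In the line above that would become something like copyBounded(caCmdBuf, sizeof caCmdBuf, cmd), assuming caCmdBuf is a real array at that point and not a pointer (sizeof on a pointer gives the pointer size, not the buffer size).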


        // clear whatever is waiting before sending a command
        while (Serial.available() > 0)
            Serial.read();

Read data and throw it away? What is the point? Better read this about doing serial comms:

Hi,

strcpy(caCmdBuf, cmd);

I know that cmd never can be larger.

Serial.read(): that is my choice and it is needed. It has nothing to do with the problem...

Korstiaan

korstiaan:
I assume (not "know") that cmd never can be larger.

My correction.

Also, the differences between the working 328p version and the 1284p version:

On the 328p I used Timer1. On the 1284p I can't? If I use Timer1, the Timer1 interrupt never fires?? Therefore I switched to Timer3. Strange, however, that Timer1 won't work.
On the 328p I used SoftwareSerial for the communication with the GPS module. On the 1284p I use the 2 hardware USARTs.

Korstiaan

Please, I'm just starting to work with the ATMEGA1284P and I can't get a GSM shield connected.
Can you tell me how you connected the TX and RX pins?
Was it SoftwareSerial?
Can you show the code?

thank you