Help understanding what may cause a pro mini to become unresponsive

I know I'm going to get flamed for this, but I cannot post my code, so please don't immediately ask for it.

I'm hoping to understand under what circumstances a Pro Mini can be rendered totally dead until it is re-flashed.
I have some code which essentially stores a user-programmable variable.

I have two Pro Minis connected via IIC. One runs a user interface and sends some data to the other upon a long button press; the second then saves it to its EEPROM for later use after a power cycle.

The reason for this is long and complicated.

In a nutshell, the system detects an event, requests the appropriate variable from the slave, then uses it from that point on. When saving the new user-modified variable, a long button press sends the data via IIC and the slave checks it and updates the EEPROM. This is something that shouldn't happen more than a handful of times over the lifespan of the Pro Mini, so I'm not worried about wear leveling on the EEPROM at the moment.

The problem arises when the "event" changes unexpectedly during the save of the new data, i.e. an analogRead is not what is expected, or the external process that changes the measured "event" happens when it shouldn't (in this case, once in the menu to update the user variable). Everything actually works: the slave saves the new variable to the EEPROM and the master continues to function as expected, until a power cycle.

After a power cycle it is completely unresponsive: no LED flashes on pin 13 from the bootloader, and no code being executed (the first thing done is to report a version number via serial).
It just sits there and shows no sign of doing anything.

The code can be reflashed; then on bootup the slave sends over the value that was saved in EEPROM and everything works again.

I'm at a loss. What can happen in code to leave a Pro Mini totally unresponsive after a power cycle, but still working until it is power cycled?

The IDE reports the code uses 56% of program storage space and 71% of dynamic memory.

If anyone has any insight into how to go about finding out what is happening, I'd be very grateful. Possibly PayPal grateful for someone who can give me the right info.

thanks in advance
m00se

Then why post? Without seeing any code or wiring diagram, we would all be guessing what the problem(s) might be.

I'd advise you to put your flame proof suit on, and quickly....


Just to confirm, it's the master that freezes until its code is reloaded? Can you post a BOM and schematics?

That is correct. The master freezes.
I think I may be facing something related to "Make sure you’re not assigning values to an array outside its declared length."
from here:

There are two variables and a fixed integer. The user can change the variables at will and save them, but the fixed integer is not offered as a user-changeable number. However, when the external "event" is removed while the user's variable is being programmed, there is some chance the code is trying to save the new number to where the fixed integer should be, which is what gets loaded into the code if the external "event" isn't detected.
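
If that is what's happening, one way to rule it out is to guard the save so it can only ever land in a user slot. This is only a rough sketch, assuming the three values live in an array; `settings`, `SLOT_COUNT`, `FIXED_SLOT` and `saveUserValue` are made-up names for illustration, not anything from the actual sketch.

```cpp
#include <stdint.h>

// Hypothetical layout: two user-changeable values plus one fixed integer.
const uint8_t SLOT_COUNT = 3;
const uint8_t FIXED_SLOT = 2;   // assumed position of the fixed integer

int settings[SLOT_COUNT] = {10, 20, 99};

// Writes `value` into a user slot. Returns false (and writes nothing)
// if the slot is out of range or is the fixed, non-user-changeable one.
bool saveUserValue(uint8_t slot, int value) {
  if (slot >= SLOT_COUNT) return false;  // would be outside the array
  if (slot == FIXED_SLOT) return false;  // protect the fixed integer
  settings[slot] = value;
  return true;
}
```

If the save path ever returns false while the "event" is being removed mid-save, that would at least confirm the wrong slot is being selected.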

Writing outside array boundaries may cause a crash. But usually the board should recover after power down / power up.

Do you save a c-string like variable to eeprom?
The '\0' might be missing...

I just checked and all three variables are declared as integers.
It's got me flummoxed. The EEPROM is on the slave, and is being written and read correctly, and after reflashing the master, the slave reports the previously stored integer OK.

Are there any buffers etc. that aren't flushed on a reboot?

That was my thought too. I'm trying to think what could be in a setup routine that might get flushed/fixed by uploading a new sketch.

Can you add serial debug statements to check at which part it's failing?

if (eepromWriteAuth == 1) {
  EEPROM.write(8, ny1);
  delay(10);
  EEPROM.write(28, ny1);
  delay(10);
  EEPROM.write(38, ny1);
  delay(10);
  eepromOk = 1;
}

I'm just storing single digits; this process is repeated, then the two digits are combined to make a two-digit integer.
The digits are stored in three EEPROM locations for error checking on bootup. If all three are the same, I assume they were correctly written and read. If one varies, then I know something went wrong.
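
For what it's worth, the three-copy check described above could look something like the sketch below. It goes one step further than "all three must match" by accepting the value two copies agree on, and it rejects anything that is not a single digit (erased EEPROM cells read 0xFF). `readRedundantDigit` is a hypothetical helper, not something from the original code; on the board the three arguments would come from EEPROM.read(8), EEPROM.read(28) and EEPROM.read(38).

```cpp
#include <stdint.h>

// Hypothetical helper for the triple-copy check.
// Returns true and sets `out` when at least two of the three copies agree
// and the agreed value is a single decimal digit (0..9).
// Returns false (leaving `out` untouched) otherwise.
bool readRedundantDigit(uint8_t a, uint8_t b, uint8_t c, uint8_t &out) {
  uint8_t v;
  if (a == b || a == c) {
    v = a;          // a matches at least one other copy
  } else if (b == c) {
    v = b;          // a is the odd one out
  } else {
    return false;   // all three differ, nothing to trust
  }
  if (v > 9) return false;  // not a digit, e.g. 0xFF from an erased cell
  out = v;
  return true;
}
```

On a false return the caller could fall back to a known-safe default rather than using whatever happened to be read.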

I have serial debugging statements everywhere, to the point that I have to comment some out or it won't fit.

void setup() {
  Serial.begin(115200);
  Serial.println("Booting");
  digitalWrite(Pin, LOW);
  pinMode(Pin, OUTPUT);
  digitalWrite(Pin, LOW);

It literally doesn't even get this far in the code after a reboot; nothing comes out of the serial port.

OT, but use Serial.println(F("Stuff")); to keep the string literals in flash instead of SRAM. There are other recommendations too, such as using char arrays instead of String, which helps avoid memory fragmentation.

Is the master soldered into a PCB or can you remove it? Maybe try to reset it on its own without anything attached to it?

Do you have anything connected to tx or rx?

Here they talk about resetting the Arduino with software.
I wonder if a similar thing is happening and keeping it in a reset loop?
I omitted to mention that one other pin on the Arduino, the one referred to as "Pin" in the code snippet above, briefly flashes each time the code starts after a reflash. It's not HIGH before the power cycle that causes the problem, nor is it HIGH at any point during the flash, but for a second on first boot after the flash it is HIGH.

Could some behind-the-scenes action be controlling the pins in an unexpected fashion?

The master is soldered to a PCB and not accessible at the moment. It could be in a pinch.

The Tx Rx of the pro mini? Yes I have an FTDI adaptor connected to it for debugging purposes. I also have the GRN pin connected to the FTDI for reset during flashing.

Do you think it's worth trying it again with the FTDI removed?

Yes. Eliminate as many variables as you can.

There was a slight variance in the result.
With the FTDI 100% disconnected, after the power reset the code worked for approx the first 20 seconds,
then the same frozen appearance, while the slave happily boots and blinks its heartbeat on pin13 to confirm its alive.
I am flashing and will retry this to see if this is also repeatable.

With the FTDI removed, it will boot once more, then crash as soon as it reports the variable being used, and stop responding to anything. So it's not looking like the time before the crash is important, but rather the point in the code.

And upon reflashing, several of the pins go HIGH which normally don't when flashing from a non-crashed state.