System hang using 'INT2' and 'timer1' interrupt

Hello,
I have a project to manage the windows (open / close) of my greenhouse. I use GPS time, temperature , rain , wind sensor, the day in the season, the time , logging info to display, SD card and Ethernet.
That's a big sketch with more then 1000 lines.
I start this project three years ago and upgrade it during my spare time during winter.
Now I have intermittent hang. I added some trace to try to approach the code which was giving this hang.
I narrowed my code to around 50 lines, which could reproduce the problem.
The purpose was to have a 1 sec interrupt to display ( on LCD 16*2) the time at the second level, and also each 2,5 sec read my anemometer counts to calculate the wind speed.
Actually the clock interrupt subroutine is just "enable interrupt" and read the RTC.
For the LCD display it's getting an interrupt each 2,5 second via the timer and display the wind value.
In case of gust I want to be informed directly , so not possible to put in main loop.
Here the timer is set to 1 sec and the display to some text. ( need more then one lcd.print to get the hang)
When I check the I2C bus on a logic analyzer, I can see the packet from the RTC and to the display, and also decode the values. (addresses , data)
Due to the fact the two interrupt are not synchronized, the RTC packet can cross sometimes the LCD packet, and it works.
I put timer1 a little bit less then 1 sec to try get this shifting of packets, and also to reproduce the hang. (if timer1 more than one second the hang seems not there)
When the hang occurs, the CPU freeze and also the I2C bus. The clock interrupt stops also, but NOT timer1. It continues to execute the isr_timer subroutine, but hang at the first instruction which access the I2C bus. Timer1 continue his job each second , but the CPU is down so the stack is not pop down, the freeRam decrease at each pass to finally reach zero.

Viewing several capture on the logic analyzer, it seems that the hang occurs when the clock packet is going out of the LCD packet (but the hang is not always there)

Readings different topics I start to understand the heart of the arduino but couldn't find an explanation to my problem.
That's very similar with the old CPU I used to work before to be retired.

(Sorry but english is not my mother language)

Thanks for your help.

Pierre

Troubleshoot_hang_intr_01.ino (1.55 KB)

Hi PHW - please read through: How to use this forum

I got lost trying to read through your issue and got stuck looking at your code... You would get more meaningful replies if you posted the code using BBL tags.

You're abusing interrupts for stuff that is much better done in the loop using millis() to get the timing.

If you have to use interrupts (which is not the case in your project), the ISR must be kept as short as ever possible and it must not use stuff that depends on interrupts to work (as I2C). You seem to know that as the first instruction in your interrupt handlers is re-enabling interrupts, which unfortunately is not very clever although it might seem at first glance. The problem is that your LCDtimeddisplay() may also interrupt isr_timer() and vice-versa. If LCDtimeddisplay() is asking the RTC for the current time while it's interrupted by isr_timer() the I2C hardware might be in the middle of transferring a byte to the RTC. But now it is instructed to send a byte to the LCD but the hardware is not in a state to start transmitting a byte. The corresponding code depends on going through these states in a fixed order, otherwise it might be kept in an endless while loop.

Thank you for your answers.

In fact the code I added in my first post is just a summary of my big project code.
This mini code just demonstrate the hang which occurs.

In the mean time I modify this mini code and move the LCD print code from the ISR routine to main loop.
With this modification it's also possible to demonstrate the hang problem.

#include <Wire.h>
#include <RTClib.h>
#include <LiquidCrystal_I2C.h>
#include <Arduino.h>
#include <TimerOneThree.h>

int freeRam ()
{
  extern int __heap_start, *__brkval;
  int v;
  return (int) &v - (__brkval == 0 ? (int) &__heap_start : (int) __brkval);
}

RTC_DS1307 rtc;   // define 'rtc' for time adjustment
LiquidCrystal_I2C lcd1(0x26, 16, 2);

const byte interruptPin = 2;
long Count = 0 ;

void setup ()
{
  Serial.begin(9600);  //115200
  rtc.begin();
  Wire.begin();
  lcd1.init();  // set up the LCD's number of columns and rows:
  lcd1.backlight();
  Wire.beginTransmission(0x68); // RTC I2C address
  Wire.write(7);
  Wire.write(0x10);   // set bit 4 to start SQWE bit (bit 0,1 = 0 for 1Hz)
  Wire.endTransmission();
  rtc.adjust(DateTime(__DATE__, __TIME__));//to the date & time this sketch was compiled
  attachInterrupt(digitalPinToInterrupt(interruptPin), LCDtimedisplay, FALLING) ;
}
void loop()
{
  long StartLoopCnt = millis();
  Count ++ ;

  Serial.print ("1,");   // following codes done at 'timer1 ' interrupt (level 14?)
  Serial.print(freeRam());
  Serial.print (",");
  lcd1.setCursor(0,0);
  lcd1.print("hello");
  lcd1.setCursor(10,0);
  lcd1.print("world");
  lcd1.setCursor(0,1);
  lcd1.print("nice day");
  Serial.print ("2,");
  
  Serial.println (Count); //count number of loop
  while (millis() < (StartLoopCnt + 990)) { // loop here to reach 10 sec
  }
}
void LCDtimedisplay()  // Display time at each second interrupt
{
  interrupts () ;  // enable interrupt
  DateTime now = rtc.now();  // core done at 'INT2' level (2)
}

In my real project I need interrupt to take immediate action when necessary. My real main loop is a one minute scan of all my sensors, recording information and driving the motor for the windows.
If a gust occurs I want to close my windows without waiting a one minute scan.
Also to display the on second clock it must be done in the interrupt routine . That's why I ' interrupts() ' (disable) in the beginning of the routine and so don't block the rest of the system. I'm aware that this is done at the 'level' of the interrupt but for example the I2C bus is running (fetch time).

I'm using for the moment a Mega2560 , a DS1307 for the clock and a LCD 16*2.
I already used other CPU,clock and display with the same results.

If I let the hang state, sometimes it restart itself from the beginning .
Actually, the console stop to output serial trace, BUT the I2C continue to run. With the logic analyzer I can see the packets on the bus . Each second their is a access 'write' with 68 as I2C slave, followed by a 'Read' were I can decode the bytes which represent the time incremented.

I think that could be reproduce.

Pierre

void LCDtimedisplay()  // Display time at each second interrupt
{
  interrupts () ;  // enable interrupt
  DateTime now = rtc.now();  // core done at 'INT2' level (2)
}

You still do I2C in an interrupt handler. It seems you didn't understand what I wrote in my previous post. The code in the interrupt is executed while some other code is interrupted which can occur at any part of the code even in the midst of an I2C transaction.

So I formulate it easily: Don't do any I2C inside an interrupt handler!
And don't re-enable interrupts inside an interrupt handler if don't know exactly what you're doing!

In my real project I need interrupt to take immediate action when necessary. My real main loop is a one minute scan of all my sensors, recording information and driving the motor for the windows.

That's no need for interrupts. The code can be refactored to check for periodical tasks every few milliseconds or so.

Also to display the on second clock it must be done in the interrupt routine .

No, that's only necessary if the code is badly organized. Refactor it to a state machine and you get everything done without an interrupt.

Agreed, in this case there is absolutely no need for interrupts, and your incorrect usage of them is causing the hangs.

This line, equivalent to using delay() is causing the rest of the problems:

 while (millis() < (StartLoopCnt + 10000)) { // loop here for the rest of the time
  }

My real main loop is a one minute scan of all my sensors, recording information and driving the motor for the windows.

Your "real" main loop, the loop function, can scan hundreds of thousands of times per minute. You simply have to use better programming practices.

Incidentally, you are not using millis() correctly. Use subtraction rather than addition to avoid roll over, as explained here.

It take me some time to change my mind.
I was thinking real time system but as you said Arduino in a State Machine.
I modify my test sketch with the remarks you gave me.
But I keep the interrupts because I needed in my real project main loop.

#include <Wire.h>
#include <RTClib.h>
#include <LiquidCrystal_I2C.h>
#include <Arduino.h>
#include <TimerOneThree.h>

int freeRam ()
{
  extern int __heap_start, *__brkval;
  int v;
  return (int) &v - (__brkval == 0 ? (int) &__heap_start : (int) __brkval);
}

RTC_DS1307 rtc;   // define 'rtc' for time adjustment
LiquidCrystal_I2C lcd1(0x26, 16, 2);
LiquidCrystal_I2C lcd2(0x27, 20, 4);


const byte interruptPin = 2;
long Gust = 0 ;
bool timflag = false ;
bool Clkflag = false ;

void setup ()
{
  Serial.begin(9600);  //115200
  //Serial.println ( "Starting Hang troubleshooting test ");
  rtc.begin();
  //rtc.adjust(DateTime(__DATE__, __TIME__));
  DateTime now = rtc.now();
  Serial.print (now.hour()); Serial.print (":");
  Serial.print (now.minute()); Serial.println (":");
  Wire.begin();
  lcd1.init();  // set up the LCD's number of columns and rows:
  lcd1.backlight();
  lcd2.init();  // set up the LCD's number of columns and rows:
  lcd2.backlight();
  Wire.beginTransmission(0x68); // RTC I2C address
  Wire.write(7);
  Wire.write(0x10);   // set bit 4 to start SQWE bit (bit 0,1 = 0 for 1Hz)
  Wire.endTransmission();
  Timer1.initialize(2500000);// Timer interrupt every +/- 1 seconds
  Timer1.attachInterrupt(isr_timer);
  attachInterrupt(digitalPinToInterrupt(interruptPin), LCDtimedisplay, FALLING) ;
}
void loop()
{

  if ( timflag ) {
    lcd2.setCursor(0, 0);
    lcd2.print("hello");
    lcd2.setCursor(10, 0);
    lcd2.print("world");
    lcd2.setCursor(0, 1);
    lcd2.print("nice day");
    lcd2.setCursor(10, 3);
    lcd2.print(Gust);
    timflag = false ;
  }
  if ( Clkflag ) {
    DateTime now = rtc.now();
    lcd1.setCursor(8, 0);
    lcd1printDigits(now.hour ());
    lcd1.print(":");
    lcd1printDigits(now.minute ());
    lcd1.print(":");
    lcd1printDigits(now.second ());
    Clkflag = false ;
  }
}

void LCDtimedisplay()  // Display time at each second interrupt
{
  Clkflag = true ;
}

void isr_timer()
{
  timflag = true;
  Gust ++ ;
}

void lcd1printDigits(int digits) {
  if (digits < 10) {
    lcd1.print('0');
  }
  lcd1.print(digits);
}

Then I apply those principles to my main program. I don't use anymore the 'while (millis) ' .
I record via interrupt each turn of my wind anemometer, and also a one 'second' interrupt from DS1307 e on LCDto display the time but now use the main loop to display the time on LCD.
After several adjustment, my main loop is scanning all my sensors, and actions are taken.
Sometimes , like when the window motor is turning , my one second clock display freeze, but recover shortly after.
I also move it from a ATmega256 (use as test CPU) to my real CPU ATMega1284. (easier to include in a printed circuit board).
Many thanks for your help.
Pierre

PHW_DEC:
In my real project I need interrupt to take immediate action when necessary. My real main loop is a one minute scan of all my sensors, recording information and driving the motor for the windows.
If a gust occurs I want to close my windows without waiting a one minute scan.

You do not understand what an "interrupt is", or how to program microcontrollers.

A common "newbie" misunderstanding is that an interrupt is a mechanism for altering the flow of a program - to execute an alternate function. Nothing could be further from the truth.

The main loop() needs to run as a "state machine" - it waits for nothing. It looks at a number of situations or tasks one by one, if it needs to do something for one and can do so immediately, it does it, if not, it goes on to the next task and treats it in the same fashion. Checking for a wind gust is simply one of those tasks as is counting the anemometer impulses. To you, they may seem rapid, but to a microprocessor implementing the loop() many thousand times per second, this is no problem, so the task of determining the anemometer speed is one task, and if it exceeds your threshold, you set a marker or "flag" which is then seen by another task in order to make the decision to open or close the windows.

Your "one minute scan" is implemented by reading the millis() value and checking whether on any particular pass of the loop(), it has advanced by at least 1000 since the last such action. If so, set a flag which will prompt the first sensor reading. If that reading does not have an instant result, then you pass on to the next task and complete the reading on a subsequent pass. Once that is complete, you set a flag so that on the next pass, the next reading in the list can be made.

This flag can be a single counter, called a "state variable". On each pass of the loop(), it determines which of the various steps in the data collection will be performed. Since any one step will take only a few microseconds, the loop() will easily be able to promptly check the other major tasks such as keeping an eye on that anemometer.

You have a very fast microcontroller, much faster than the first "Personal computers". It would be poor coding indeed to have it wait for anything, "twiddling its thumbs", so unless you have some action which can be executed in "no time at all" with an urgency that it must be performed within microseconds or else data - information - will be lost or some harm will occur, then you neither need nor want interrupts. Nothing you have described fits that description.