ESP8266 Exception (0)

Hi all! I have an ESP8266 running a long sketch and receiving measurements from 5 sensors (CO2, TVOC, Pressure Difference, Temperature, Humidity and Particulate Matter). It works fine for many hours until I get an Exception (0): epc1=0x4021a614 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000.

Facts:

  1. The ESP8266 time of failure is unpredictable; it can be running for 3 hours, for 15 hours or for 24 hours without throwing the Exception, but it finally does.

  2. When the ESP8266 gets the Exception, I guess it resets itself (I am not there for hours to see the exact moment the Exception occurs) and tries to run setup(). However, the board gets stuck in the autoConnect() function, reporting *wm:[2] Connection result: WL_NO_SSID_AVAIL. Immediately after I press the Reset button on the physical PCB, the PCB connects correctly to the stored WiFi and works until another Exception (0) occurs.

  3. The physical Reset button on the PCB, after the Exception, only works when the FTDI cables are connected to the PCB.

Exception decoder result

**Exception 0: Illegal instruction**
PC: 0x4021a614
EXCVADDR: 0x00000000

*Decoding stack results*
0x40100c4c: **interrupt_handler(void*, void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_wiring_digital.cpp** line **167**
0x4022d0b8: **ip4_output_if_opt_src** at core/ipv4/**ip4.c** line **1764**
0x40100b88: **interrupt_handler(void*, void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_wiring_digital.cpp** line **138**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x40100b88: **interrupt_handler(void*, void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_wiring_digital.cpp** line **138**
0x40229248: **sys_timeout_LWIP2** at core/**timeouts.c** line **304**
0x401000ab: **app_entry_redefinable()** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **386**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x402308f9: **br_sha2small_round** at src/hash/**sha2small.c** line **87**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x40223a6d: **glue2esp_linkoutput** at glue-esp/**lwip-esp.c** line **301**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x40223c9b: **new_linkoutput** at glue-lwip/**lwip-git.c** line **272**
0x402240fa: **ethernet_output** at netif/**ethernet.c** line **312**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x40223a6d: **glue2esp_linkoutput** at glue-esp/**lwip-esp.c** line **301**
0x40223c9b: **new_linkoutput** at glue-lwip/**lwip-git.c** line **272**
0x402240fa: **ethernet_output** at netif/**ethernet.c** line **312**
0x4022b674: **etharp_output_to_arp_index** at core/ipv4/**etharp.c** line **769**
0x40214187: **String::copy(__FlashStringHelper const*, unsigned int)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/**WString.h** line **343**
0x4022b748: **etharp_output_LWIP2** at core/ipv4/**etharp.c** line **885**
0x40215696: **__yield()** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/**core_esp8266_features.h** line **64**
0x4022d0b8: **ip4_output_if_opt_src** at core/ipv4/**ip4.c** line **1764**
0x40101448: **malloc(size_t)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\umm_malloc\**umm_malloc.cpp** line **912**
0x4022d120: **ip4_output_if_opt** at core/ipv4/**ip4.c** line **1572**
0x40224a38: **memp_malloc** at core/**memp.c** line **355**
0x4022d146: **ip4_output_if** at core/ipv4/**ip4.c** line **1549**
0x4022e083: **ip_chksum_pseudo** at core/**inet_chksum.c** line **392**
0x40228de2: **tcp_output** at core/**tcp_out.c** line **1621**
0x402287ad: **tcp_enqueue_flags** at core/**tcp_out.c** line **1086**
0x401010c6: **umm_free_core(umm_heap_context_t*, void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\umm_malloc\**umm_malloc.cpp** line **642**
0x401010c6: **umm_free_core(umm_heap_context_t*, void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\umm_malloc\**umm_malloc.cpp** line **642**
0x40101414: **free(void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\umm_malloc\**umm_malloc.cpp** line **688**
0x401010c6: **umm_free_core(umm_heap_context_t*, void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\umm_malloc\**umm_malloc.cpp** line **642**
0x40101414: **free(void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\umm_malloc\**umm_malloc.cpp** line **688**
0x40212d21: **stack_thunk_del_ref()** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**StackThunk.cpp** line **82**
0x401010c6: **umm_free_core(umm_heap_context_t*, void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\umm_malloc\**umm_malloc.cpp** line **642**
0x401010c6: **umm_free_core(umm_heap_context_t*, void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\umm_malloc\**umm_malloc.cpp** line **642**
0x40101414: **free(void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\umm_malloc\**umm_malloc.cpp** line **688**
0x40201e95: **BearSSL::WiFiClientSecure::~WiFiClientSecure()** at c:\users\alberto\appdata\local\arduino15\packages\esp8266\tools\xtensa-lx106-elf-gcc\3.1.0-gcc10.3-e5f9fec\xtensa-lx106-elf\include\c++\10.3.0\bits/**shared_ptr_base.h** line **1183**
0x40202874: **send_data()** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/**WString.h** line **115**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x40100690: **ets_post(uint8, ETSSignal, ETSParam)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **238**
0x40100c4c: **interrupt_handler(void*, void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_wiring_digital.cpp** line **167**
0x40100b88: **interrupt_handler(void*, void*)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_wiring_digital.cpp** line **138**
0x4020f9f7: **SD_ZH03B::readData()** at C:\Users\Alberto\Documents\Arduino\libraries\SD_ZH03B-master\**SD_ZH03B.cpp** line **73**
0x4020f9f5: **SD_ZH03B::readData()** at C:\Users\Alberto\Documents\Arduino\libraries\SD_ZH03B-master\**SD_ZH03B.cpp** line **73**
0x401001f0: **SoftwareSerial::writePeriod(unsigned int, unsigned int, bool)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\libraries\SoftwareSerial\src\**SoftwareSerial.cpp** line **382**
0x402155e8: **__esp_suspend()** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/**core_esp8266_features.h** line **64**
0x40215739: **__esp_delay(unsigned long)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **161**
0x402157ae: **esp_try_delay(unsigned int, unsigned int, unsigned int)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266\**core_esp8266_main.cpp** line **182**
0x40216718: **__delay(unsigned long)** at C:\Users\Alberto\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\3.1.1\cores\esp8266/**coredecls.h** line **69**
0x4020f9fa: **SD_ZH03B::readData()** at C:\Users\Alberto\Documents\Arduino\libraries\SD_ZH03B-master\**SD_ZH03B.cpp** line **73**
0x40201c38: **readSensorData()** at C:\Users\Alberto\Downloads\v1.0.0_MedicinePlus/**v1.0.0_MedicinePlus.ino** line **557**
0x4020290e: **loop()** at C:\Users\Alberto\Downloads\v1.0.0_MedicinePlus/**v1.0.0_MedicinePlus.ino** line **626**

After compilation results

. Variables and constants in RAM (global, static), used 34888 / 80192 bytes (43%)
║ SEGMENT BYTES DESCRIPTION
╠══ DATA 1572 initialized variables
╠══ RODATA 3916 constants
╚══ BSS 29400 zeroed variables
. Instruction RAM (IRAM_ATTR, ICACHE_RAM_ATTR), used 63103 / 65536 bytes (96%)
║ SEGMENT BYTES DESCRIPTION
╠══ ICACHE 32768 reserved space for flash instruction cache
╚══ IRAM 30335 code in IRAM
. Code in flash (default, ICACHE_FLASH_ATTR), used 454992 / 1048576 bytes (43%)
║ SEGMENT BYTES DESCRIPTION
╚══ IROM 454992 code in flash

Interpretation of the results

  1. I can see in the Exception Decoder results that something seems to be wrong with the particulate matter sensor (ZH03B). However, I have had the same model running without a single error for many months on another PCB version.

  2. Could it also be related to the high Instruction RAM usage (96%)?

Please, let me know what your thoughts are and/or if you need any more information. Thank you!

96% of the Instruction RAM is very tight
note you are using the String class, which is normally ok on an ESP8266, but it can fragment memory and, if you are low on SRAM, cause intermittent problems and crash the program
possibly move to an ESP32

Thank you for your reply, @horace. Moving to ESP32 would be impossible in this project. Could you advise what maximum percentage of Instruction RAM would be acceptable? The other ESP8266 that I mentioned in my original post is 93% full.

Actually, that is the amount of RAM used by ISRs and other functions that are placed in IRAM for the sake of speed. That is the actual code, not the variables; those are placed in normal RAM. It is independent of the other RAM and fixed at compile time. One of the first things the program does is load those functions into that specific part of RAM. As long as it is less than or equal to 100%, it is all good.

My bet is on an accidental division by zero somewhere here

avg_pm1 = avg_pm1/(sampletime_ms/inst_sampletime_ms);
avg_pm25 = avg_pm25/(sampletime_ms/inst_sampletime_ms);
avg_pm10 = avg_pm10/(sampletime_ms/inst_sampletime_ms);
avg_temp = avg_temp/(sampletime_ms/inst_sampletime_ms);
avg_humid = avg_humid/(sampletime_ms/inst_sampletime_ms);
avg_tvoc = avg_tvoc/(sampletime_ms/inst_sampletime_ms);
avg_co2 = avg_co2/(sampletime_ms/inst_sampletime_ms);
avg_pdif = avg_pdif/(sampletime_ms/inst_sampletime_ms);

Or somewhere else. I don't think those few Strings will cause an issue.

Thank you for the answer, @Deva_Rishi. Actually, sampletime_ms and inst_sampletime_ms have constant values defined at the beginning of the code: 5000 and 300000 respectively. This would mean that at some point they become corrupted and get a value of 0. Is this possible? Under what conditions could this happen?

Thank you!

No, it isn't, but are there any other possible divisions by zero?

No, there are no other possible divisions by zero in my code. All the possible ones are related to sampletime_ms and inst_sampletime_ms. Any other ideas?

Any chance there is insufficient power available?
How have you wired everything up? Can you show a schematic (not Fritzing)?

Seems to refer to the flash, it could even be a faulty unit.

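one can check if it was a brownout or other reset by checking esp_reset_reason_t (this is the ESP32 API; the ESP8266 Arduino core reports the reset cause through ESP.getResetReason() and ESP.getResetInfoPtr() instead)
on powerup/reset I call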

// print and return reason for system reset (ESP32, ESP-IDF API)
// see https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/misc_system_api.html#_CPPv418esp_reset_reason_t
int print_reset_reason(void) {
  int reason = esp_reset_reason();
  Serial.printf("CPU reset reason: %d ", reason);
  switch (reason) {
    case ESP_RST_UNKNOWN: Serial.println("unknown_RESET"); break;
    case ESP_RST_POWERON: Serial.println("POWERON_RESET"); break;
    case ESP_RST_EXT: Serial.println("EXTERNAL_PIN_RESET"); break;
    case ESP_RST_SW: Serial.println("SW_RESET"); break;
    case ESP_RST_PANIC: Serial.println("PANIC_RESET"); break;
    case ESP_RST_INT_WDT: Serial.println("INTERRUPT_WDT_RESET"); break;
    case ESP_RST_TASK_WDT: Serial.println("TASK_WDT_RESET"); break;
    case ESP_RST_WDT: Serial.println("OTHER_WDT_RESET"); break;
    case ESP_RST_DEEPSLEEP: Serial.println("EXIT_DEEP_SLEEP_RESET"); break;
    case ESP_RST_BROWNOUT: Serial.println("BROWNOUT_RESET"); break;
    case ESP_RST_SDIO: Serial.println("SDIO_RESET"); break;
    //case ESP_RST_USB: Serial.println("USB_RESET"); break;
    default: Serial.println("unlisted reset reason"); break;
  }
  return reason;
}

so the expression

avg_pdif = avg_pdif/(sampletime_ms/inst_sampletime_ms);

can be replaced with

avg_pdif = avg_pdif/(5000/300000);

did I understand correctly?

don't you see that in this expression you are dividing by zero? With integer operands, 5000/300000 truncates to 0.
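
To make the truncation concrete, here is a small sketch using the two values quoted in the thread. The helper name `samplesPerWindow` is hypothetical. Whether dividing by the resulting 0 actually traps depends on the type of avg_pdif, which the thread does not show: an integer division by zero can trap, while a floating-point one typically yields inf instead.

```cpp
// Number of instantaneous readings in one averaging window.
// Both operands are integers, so the quotient is truncated toward zero.
// intervalMs must be non-zero; this helper does not check it.
unsigned long samplesPerWindow(unsigned long windowMs, unsigned long intervalMs) {
  return windowMs / intervalMs;
}
```

With the values swapped, samplesPerWindow(5000, 300000) is 0; with the intended order, samplesPerWindow(300000, 5000) is 60.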

from post 1

unsigned long inst_sampletime_ms = 5000;
unsigned long sampletime_ms = 300000;

therefore

avg_pdif = avg_pdif/(sampletime_ms/inst_sampletime_ms);

is

avg_pdif = avg_pdif/(300000/5000);

or have I read it wrong?

I read OP's statement in the post #5:

We could fill the interweb with reasons you could have one of your variables with a value of zero.

physical reasons based on the quality of memory
software reasons based on your math
software reasons based on your understanding of separation of variables into different spaces
(notice I did not say that the processor or the compiler made a mistake that caused the fault, does not happen)

or, you could do what I do whenever I'm about to do math: check the operands first. A hard fault is called hard because you brick.
You'll never find the reason until you can stop it from occurring, one possible cause at a time.

If your understanding of the processor doesn't include what type of fault it is (the ESP8266 is an Xtensa core, not an ARM, and reports an exception cause rather than an ARM-style hard fault), then it is
doubtful that you would be able to write an exception handler that fixes the problem and returns to running.

if (value used as denominator not valid as denominator)       // do this for every value
{
    fix value;
}

// now you can perform a math operation which is 5-50 times longer than the time spent checking values.

If you want, during development, count the number of times a check caught a bad value, that is, the number of times you would have bricked; you'll have the free time, having not bricked.
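
The check-before-dividing advice above could look roughly like this in C++. This is a sketch under assumptions: the averages are taken to be floats, and the names `safeAverage` and `g_zeroDivisorCount` are hypothetical rather than part of the original sketch.

```cpp
#include <cstdint>

// Hypothetical diagnostic counter: how many times a zero divisor was caught.
static uint32_t g_zeroDivisorCount = 0;

// Divides sum by count. If count is 0, records the event and returns
// `fallback` instead of performing the division.
float safeAverage(float sum, unsigned long count, float fallback = 0.0f) {
  if (count == 0) {
    ++g_zeroDivisorCount;
    return fallback;
  }
  return sum / static_cast<float>(count);
}
```

A call such as safeAverage(avg_pdif, sampletime_ms / inst_sampletime_ms) would then replace the bare division, and printing g_zeroDivisorCount now and then shows whether the divisor ever really becomes zero.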

I do not think the problem is in the wiring or due to insufficient power (it can run for many hours without an issue). However, I attach images of the schematic:

Regarding the faulty unit, I will try with two more, which are identical to the one I am testing

I will add this to the code and let you know

No, sorry, I made a mistake in my previous post. sampletime_ms = 300000 and inst_sampletime_ms = 5000

This is right

Why such a complicated condition?
Why not just
i < (sampletime_ms/inst_sampletime_ms)

So you suggest checking the denominator before every operation that divides by it, on every pass through the loop? I can do this.

a hard fault results from math or memory operations, and happens to prevent the illegal operation from occurring.
It happens before the instruction is allowed to execute, so you can't save off something to leave a clue.

300000/5000 is a constant computed at compile time and stored away by the compiler, probably in code space.
note this number is not recomputed each time the comparison is made
a true constant should be #defined to make it obvious where it is being used (to the human)

var1/var2 is a read of two variables and a computed result:
it is unlikely that var1, or the place where var1 lives in memory, has moved, as you are not using pointers,
note this number is probably recomputed from the variables on each check for loop completion
unless the compiler just loads and computes it once because you never change it in the code

so if you are having a hard fault due to math, one of the variables was crunched
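
If the two sample times really never change, one way to rule out a crunched variable is to make the divisor a compile-time constant. The sketch below uses constexpr rather than the #define suggested above; the constant names are hypothetical, and the values are the ones quoted in the thread.

```cpp
// Values taken from the thread; the constant names are hypothetical.
constexpr unsigned long kSampleTimeMs = 300000UL;    // averaging window
constexpr unsigned long kInstSampleTimeMs = 5000UL;  // interval between readings

// Evaluated by the compiler. Used only as a value, it is normally folded into
// the code, leaving no RAM variable for a stray write to overwrite.
constexpr unsigned long kSamplesPerWindow = kSampleTimeMs / kInstSampleTimeMs;

// Fails the build, rather than the device, if the two values are ever swapped.
static_assert(kSamplesPerWindow > 0,
              "averaging window must be at least one sampling interval long");
```

If the crash still occurs with the divisor fixed at compile time, the division can be dropped from the list of suspects.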

ways your C global/static variables can get crunched by code: the stack or heap can grow to the
point that it overlaps your variables allocated in the same general region of memory.

ways your C g/s variables can get crunched by devices: if you point a device at memory to store
a block of memory in your memory, the processor cannot detect that you have had that done

ways your local stack-based variables can get crunched: unlikely by code, the stack is king.
devices that write into memory don't care that you point them at memory on what you call a stack.
