RadioHead library and RFM95 weird init issues

Hi all,
I'm using the sketch below with an RFM95 and I get really weird results.
I have an ATmega328 running on its internal 8 MHz oscillator, connected to the RFM95 as follows:
D9 to RESET pin
MISO/MOSI to MISO/MOSI (default ATmega328 pins)
D2 to NSS
SCK to SCK (default ATmega328 pin)

Sometimes the init works and sometimes it doesn't. I would obviously suspect connectivity, but here's the catch: the RFM95 and the ATmega328 are soldered to a custom-made PCB (ordered from China). Sometimes, when it has refused to init for a long time, I reheat a random RFM95 pin and it inits OK; sometimes I just touch it and it comes back to life (and sometimes it doesn't). Sometimes it takes 4-5 failed attempts and only then does it init OK.

I checked with a multimeter and there is definitely a good connection between the PCB and the ATmega.

Any suggestions would be great, thanks.

// LoRa 9x_TX
// -*- mode: C++ -*-
// Example sketch showing how to create a simple messaging client (transmitter)
// with the RH_RF95 class. RH_RF95 class does not provide for addressing or
// reliability, so you should only use RH_RF95 if you do not need the higher
// level messaging abilities.
// It is designed to work with the other example LoRa9x_RX
 
#include <SPI.h>
#include <RH_RF95.h>
 
#define RFM95_CS 10
#define RFM95_RST 9
#define RFM95_INT 2
 
// Change to 434.0 or other frequency, must match RX's freq!
#define RF95_FREQ 434.0
 
// Singleton instance of the radio driver
RH_RF95 rf95(RFM95_CS, RFM95_INT);
 
void setup() 
{
  pinMode(RFM95_RST, OUTPUT);
  digitalWrite(RFM95_RST, HIGH);

  while (!Serial);
  Serial.begin(19200);
  delay(100);
 
  Serial.println("Arduino LoRa TX Test!");
 
  // manual reset
  digitalWrite(RFM95_RST, LOW);
  delay(20);
  digitalWrite(RFM95_RST, HIGH);
  delay(20);
 
  while (!rf95.init()) {
    Serial.println("LoRas radio init failed");
    delay(100);
    digitalWrite(RFM95_RST, LOW);
    delay(20);
    digitalWrite(RFM95_RST, HIGH);
    delay(20);
  }
  Serial.println("LoRa radio init OK!");
  rf95.setModemConfig(3);

  rf95.setPreambleLength(8);
  // Defaults after init are 434.0MHz, modulation GFSK_Rb250Fd250, +13dbM
  if (!rf95.setFrequency(RF95_FREQ)) {
    Serial.println("setFrequency failed");
    while (1);
  }
  Serial.print("Set Freq to: "); Serial.println(RF95_FREQ);
  rf95.setModemConfig(3);

  // Defaults after init are 434.0MHz, 13dBm, Bw = 125 kHz, Cr = 4/5, Sf = 128chips/symbol, CRC on
 
  // The default transmitter power is 13dBm, using PA_BOOST.
  // If you are using RFM95/96/97/98 modules which uses the PA_BOOST transmitter pin, then 
  // you can set transmitter powers from 5 to 23 dBm:
  rf95.setTxPower((int16_t)16);
}

int16_t packetnum = 0;  // packet counter, we increment per xmission
void loop()
{
  rf95.printRegisters();


  Serial.println("Sending to rf95_server");
  // Send a message to rf95_server
  
  char radiopacket[20] = "Hello World #      ";
  itoa(packetnum++, radiopacket+13, 10);
  Serial.print("Sending "); Serial.println(radiopacket);
  radiopacket[19] = 0;
  
  Serial.println("Sending..."); delay(10);
  rf95.send((uint8_t *)radiopacket, 20);
   rf95.printRegisters();

  Serial.println("Waiting for packet to complete..."); delay(10);
  rf95.waitPacketSent();
  // Now wait for a reply
  uint8_t buf[RH_RF95_MAX_MESSAGE_LEN];
  uint8_t len = sizeof(buf);
 
  Serial.println("Waiting for reply..."); delay(10);
  if (rf95.waitAvailableTimeout(1000))
  { 
    // Should be a reply message for us now   
    if (rf95.recv(buf, &len))
   {
      Serial.print("Got reply: ");
      Serial.println((char*)buf);
      Serial.print("RSSI: ");
      Serial.println(rf95.lastRssi(), DEC);    
    }
    else
    {
      Serial.println("Receive failed");
    }
  }
  else
  {
    Serial.println("No reply, is there a listener around?");
  }
  delay(1000);
}

If you read the datasheet it appears the NRESET (pin 6) is an OUTPUT and should be left floating except for being pulled LOW for 100 microseconds to cause a reset. Perhaps keeping it HIGH except for a 20 millisecond LOW pulse is causing problems.

void setup() 
{
  pinMode(RFM95_RST, INPUT);  // Let the pin float

  while (!Serial);
  Serial.begin(19200);
  delay(100);
 
  Serial.println("Arduino LoRa TX Test!");
 
  // manual reset
  digitalWrite(RFM95_RST, LOW);
  pinMode(RFM95_RST, OUTPUT);
  delayMicroseconds(100);  // Pull low for 100 microseconds to force reset
  pinMode(RFM95_RST, INPUT);
  delay(5);  // Chip should be ready 5ms after low pulse
 
.
.
.

You are correct about the HIGH-Z, I must have missed that.

Anyway, it still acts the same way.

The updated code:

// LoRa 9x_TX
// -*- mode: C++ -*-
// Example sketch showing how to create a simple messaging client (transmitter)
// with the RH_RF95 class. RH_RF95 class does not provide for addressing or
// reliability, so you should only use RH_RF95 if you do not need the higher
// level messaging abilities.
// It is designed to work with the other example LoRa9x_RX
 
#include <SPI.h>
#include <RH_RF95.h>
 
#define RFM95_CS 10
#define RFM95_RST 9
#define RFM95_INT 2
 
// Change to 434.0 or other frequency, must match RX's freq!
#define RF95_FREQ 434.0
 
// Singleton instance of the radio driver
RH_RF95 rf95(RFM95_CS, RFM95_INT);
 
void setup() 
{
  pinMode(RFM95_RST, INPUT);

  while (!Serial);
  Serial.begin(19200);
  delay(100);
 
  Serial.println("Arduino LoRa TX Test!");
 
  // manual reset
  digitalWrite(RFM95_RST, LOW);
  pinMode(RFM95_RST, OUTPUT);
  delayMicroseconds(100);
  pinMode(RFM95_RST, INPUT);
  delay(20);
 
  while (!rf95.init()) {
    Serial.println("LoRas radio init failed");
    delay(100);
  }
  Serial.println("LoRa radio init OK!");
  rf95.setModemConfig(3);

  rf95.setPreambleLength(8);
  // Defaults after init are 434.0MHz, modulation GFSK_Rb250Fd250, +13dbM
  if (!rf95.setFrequency(RF95_FREQ)) {
    Serial.println("setFrequency failed");
    while (1);
  }
  Serial.print("Set Freq to: "); Serial.println(RF95_FREQ);
  rf95.setModemConfig(3);

  // Defaults after init are 434.0MHz, 13dBm, Bw = 125 kHz, Cr = 4/5, Sf = 128chips/symbol, CRC on
 
  // The default transmitter power is 13dBm, using PA_BOOST.
  // If you are using RFM95/96/97/98 modules which uses the PA_BOOST transmitter pin, then 
  // you can set transmitter powers from 5 to 23 dBm:
  rf95.setTxPower((int16_t)16);
}

int16_t packetnum = 0;  // packet counter, we increment per xmission
void loop()
{
  rf95.printRegisters();


  Serial.println("Sending to rf95_server");
  // Send a message to rf95_server
  
  char radiopacket[20] = "Hello World #      ";
  itoa(packetnum++, radiopacket+13, 10);
  Serial.print("Sending "); Serial.println(radiopacket);
  radiopacket[19] = 0;
  
  Serial.println("Sending..."); delay(10);
  rf95.send((uint8_t *)radiopacket, 20);
   rf95.printRegisters();

  Serial.println("Waiting for packet to complete..."); delay(10);
  rf95.waitPacketSent();
  // Now wait for a reply
  uint8_t buf[RH_RF95_MAX_MESSAGE_LEN];
  uint8_t len = sizeof(buf);
 
  Serial.println("Waiting for reply..."); delay(10);
  if (rf95.waitAvailableTimeout(1000))
  { 
    // Should be a reply message for us now   
    if (rf95.recv(buf, &len))
   {
      Serial.print("Got reply: ");
      Serial.println((char*)buf);
      Serial.print("RSSI: ");
      Serial.println(rf95.lastRssi(), DEC);    
    }
    else
    {
      Serial.println("Receive failed");
    }
  }
  else
  {
    Serial.println("No reply, is there a listener around?");
  }
  delay(1000);
}

So you changed it to not do the reset after each init failure. Are you still getting multiple init failures before success?

That is correct, sometimes.

Just now it worked for about 5 minutes (I plugged the power in and out many times to check robustness) and then it stopped working. (It will probably come back to life in 2-30 minutes.)

** It just came back to life.

It is still behaving erratically. I tried changing the VDD capacitor, but that didn't help.

I found out that sometimes it does init but later gets stuck at "Waiting for packet to complete...".

I would guess that there is some problem with the interrupt (the one that goes to D2)?

I have 2 units that act exactly the same way, so I don't think it's a connectivity issue. (Also, everything is soldered on a PCB.)

Do you have any idea what could cause this?

Maybe because I'm running at 8 MHz the ATmega misses the interrupt from the RF95?

Does anyone have any idea how to approach this?
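
One way to approach the hang at "Waiting for packet to complete..." is to stop waiting forever, so that a missed interrupt shows up as a timeout instead of a frozen sketch. Below is a minimal sketch of a bounded wait in plain C++. The names waitWithTimeout, DoneFn and MillisFn are made up for illustration and are not part of RadioHead; on an Arduino the clock would be millis() and the done check would be whatever tells you the radio has left TX mode. RadioHead's generic driver also seems to provide a waitPacketSent(timeout) overload, which is worth checking in your library version before rolling your own.

```cpp
#include <stdint.h>
#include <cassert>

// Hypothetical callback: returns true once the transmission has finished.
typedef bool (*DoneFn)();
// Hypothetical callback: millisecond clock, e.g. Arduino's millis().
typedef uint32_t (*MillisFn)();

// Polls done() until it returns true or timeoutMs has elapsed.
// Returns true if done() succeeded in time, false on timeout.
bool waitWithTimeout(DoneFn done, MillisFn now, uint32_t timeoutMs) {
  const uint32_t start = now();
  while (!done()) {
    // Unsigned subtraction stays correct across a clock rollover.
    if ((uint32_t)(now() - start) >= timeoutMs) {
      return false;
    }
  }
  return true;
}
```

If the wait times out, that would suggest the TX-done interrupt never arrived, which narrows the search to the interrupt line and pin rather than the radio configuration.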

I tried to verify the SPI data coming back from the module: during init, spiRead(RH_RF95_REG_01_OP_MODE) returns 0x8 instead of 0x80. It looks like some sort of sync issue. Any ideas?

That looks bad. As far as I can see 0x08 isn't even a valid operating mode!

// RH_RF95_REG_01_OP_MODE 0x01
#define RH_RF95_LONG_RANGE_MODE 0x80
#define RH_RF95_ACCESS_SHARED_REG 0x40
#define RH_RF95_MODE 0x07
#define RH_RF95_MODE_SLEEP 0x00
#define RH_RF95_MODE_STDBY 0x01
#define RH_RF95_MODE_FSTX 0x02
#define RH_RF95_MODE_TX 0x03
#define RH_RF95_MODE_FSRX 0x04
#define RH_RF95_MODE_RXCONTINUOUS 0x05
#define RH_RF95_MODE_RXSINGLE 0x06
#define RH_RF95_MODE_CAD 0x07
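
Splitting the byte with those masks makes the odd readings easier to interpret. This is a minimal sketch in plain C++, using the mask values from the defines above. The struct and function names are hypothetical. The bit-3 mask is an assumption taken from the SX1276 datasheet (LowFrequencyModeOn) and is not among the defines quoted here.

```cpp
#include <stdint.h>
#include <cassert>

// Mask values copied from the RH_RF95.h defines quoted above.
const uint8_t LONG_RANGE_MODE   = 0x80;  // set = LoRa mode, clear = FSK/OOK
const uint8_t ACCESS_SHARED_REG = 0x40;
const uint8_t MODE_MASK         = 0x07;
// Assumption: bit 3 is LowFrequencyModeOn per the SX1276 datasheet.
const uint8_t LOW_FREQ_MODE_ON  = 0x08;

struct OpMode {
  bool loRa;
  bool accessSharedReg;
  bool lowFreqModeOn;
  uint8_t mode;  // 0..7, see modeName()
};

// Breaks a raw RegOpMode byte into its fields.
OpMode decodeOpMode(uint8_t reg) {
  OpMode m;
  m.loRa = (reg & LONG_RANGE_MODE) != 0;
  m.accessSharedReg = (reg & ACCESS_SHARED_REG) != 0;
  m.lowFreqModeOn = (reg & LOW_FREQ_MODE_ON) != 0;
  m.mode = reg & MODE_MASK;
  return m;
}

// Names follow the RH_RF95_MODE_* defines (LoRa mode naming).
const char* modeName(uint8_t mode) {
  static const char* const names[] = {
    "SLEEP", "STDBY", "FSTX", "TX", "FSRX", "RXCONTINUOUS", "RXSINGLE", "CAD"
  };
  return names[mode & MODE_MASK];
}
```

Read this way, 0x80 is LoRa mode plus SLEEP, while 0x08 and 0x00 are both SLEEP with the LoRa bit clear. If the bit-3 assumption holds, 0x08 may be a legal value after all, just not the LoRa one the driver asked for.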

I know, I've read through the RadioHead library and most of the RF95 datasheet.

Sometimes I get 0x8, sometimes 0x0, sometimes 0x80; that's why I now suspect the SPI communication.
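
A quick way to test the SPI suspicion is to repeatedly read a register whose value should never change. This is a minimal sketch, assuming the module carries an SX1276-family chip whose RegVersion (address 0x42) reads 0x12 according to the Semtech datasheet. ReadRegFn and countBadVersionReads are hypothetical names; the callback stands in for something like a small wrapper around rf95.spiRead(reg).

```cpp
#include <stdint.h>
#include <cassert>

// Hypothetical callback: reads one register over SPI and returns its value.
typedef uint8_t (*ReadRegFn)(uint8_t reg);

// Assumption: values taken from the SX1276 datasheet.
const uint8_t REG_VERSION      = 0x42;
const uint8_t EXPECTED_VERSION = 0x12;

// Reads RegVersion `attempts` times and returns how many reads
// came back with anything other than the expected value.
uint16_t countBadVersionReads(ReadRegFn readReg, uint16_t attempts) {
  uint16_t bad = 0;
  for (uint16_t i = 0; i < attempts; ++i) {
    if (readReg(REG_VERSION) != EXPECTED_VERSION) {
      ++bad;
    }
  }
  return bad;
}
```

A non-zero count with the radio sitting idle would point at the bus itself (wiring, NSS, supply or clock) rather than at anything RadioHead does during init.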

Weird: I left the controller alone for 9 days, came back, and it all works perfectly. No crashes, nothing.

I guess no one has any clue what the problem could be, or else they would have commented.

For the readers: I suspect there was some EM noise, or some other external noise, that f***ed things up.