How to write reliably from an ISR to a digital IO

Hello,

I am trying to manage the RS485 drive enable / receive enable from an interrupt handler that captures the tx – empty interrupt. I am using an Controllino maxi-power.
The write jobs to D1 are to witness what is happening on my PicoScope5000.

void NeoHWSerialSylvester485::_tx_empty_irq(void)

{
  cbi(*_ucsrb, TXCIE0);
  digitalWrite(CONTROLLINO_D1,1);
  digitalWrite(CONTROLLINO_RS485_DE, LOW);
  digitalWrite(CONTROLLINO_RS485_nRE, LOW);
  digitalWrite(CONTROLLINO_D1,0);
}

The strange thing is that on rare moments, the

digitalWrite(CONTROLLINO_RS485_DE, LOW);

digitalWrite(CONTROLLINO_RS485_nRE, LOW);

Seems to fail:

I tried the same with bitClear,

void NeoHWSerialSylvester485::_tx_empty_irq(void)
{
  // disable interrupt:
  cbi(*_ucsrb, TXCIE0);
  digitalWrite(CONTROLLINO_D1,1);
  bitClear(PORTJ,PJ5);
  bitClear(PORTJ,PJ6);
  digitalWrite(CONTROLLINO_D1,0);
}

With the same results.

Fig 1. shows the failing operation:

t = -9.739 the RS485 bus is in a “rest” state

t = -8.735 the SN65HVD08 receives DE/nRE, and the bytes are transferred

t = 0 the interrupt hander is called (brow signal = D1 witnessing activation of ISR)

What I do see is that the SN65HVD08 seems to go back to its “receiving” state at t=0 and then jumps back to its “transmission” state. (1 time out of 100 or more)

As there are no digital writes in my code (except at 1 place) and as this place is “witnessed” by the green signal on my scope, I have no clue why, from time to time, the write job seems to fail.

Note: For the moment I trying to make a demo sketch that is as small as possible, and I will post it if the problem does not get solved. The rest of the code should not influence the operaton of the ISR and its I/O but one never knows.

Note: if I read back the DE and RE signals from the main loop, indeed, if the write fails, they both are at level 1 and not at 0. Therefore, I assume that there must be something going on in the MCU iself but I do not know what.

Fig 2. OK example:
afbeelding

Question: 1) How to reliably write to an IO from an interrupt handler.

Question: 2) Other idea’s why this is failing?

Fig 3. Failure detail:

There is no unreliable writing to digital outputs.

Your D1 pulse may be too short, see the spikes in your other signals.

Fig: 2: how did you obtain the OK example?

Fig. 3 indicates problems with your probes, probably poor ground.

Can you present a circuit diagram?

@DrDiettrich,
Thank you for your reaction.

Could be, but what happens if an ISR interrupts the code of an ongoing digitalwrite and executes another digitalwrite at the same time, are interrupts blocked during digitalwrite? I have not checked the code yet, have you checked the internals? I have not done this yet.

IMHO the spikes are secondary to the topic, the TTL/RS485 converter resides on the print of the Controllino. What is important is that in most of the times, the RS485 bus goes to idle state, but sometimes, on very rare occasions it does not. The blue channel and the red one illustrate this clear and the purple channel = blue - red also. The main objective of the signals D1 is to show that the interrupt handler is executed on the OK and the NOT ok cases. Could be that I need to add some resistors to load the outputs D0 an D1 so the signals are cleaner. But the proof of execution of the ISR is there as is the proof of the non-execution of the code that sets the DE and the nRNE at the end of the send job (channel C).
Pulse too short; even with a for lus executing the digital write multiple times, same issue.

The same way as the other ones, no difference whatsoever.

Only one of the probes is earthed, see the schematic. But as said before and in the initial post, on those occasions where things go wrong, if I read back the status of the digital outputs that drive not-receive-enable and drive enable, they remain 1 in spite of the fact that the ISR was executed writing them to 0. Therefore the source of my problem resides within the Atmega 2560. The outer hardware part can be ruled out but is essential to narrow down the problem.

A single instruction, such as one that writes to port, cannot be interrupted. The function invoked by a digitalWrite() call in the main program can be interrupted, which would insert a (hopefully very short) delay.

I agree with Dr. Diettrich -- digital port writes are reliable, so your problem lies elsewhere. And those scope traces show connection or probe problems.

@jremington

Thank you for your post.

The issue I was afraid of is documented in.

By @Grumpy Bear.

But as it is posted solved I will look further and let you know if I find the solution to my problem.
For the moment I am going to focus on the rest of my code, removing all ballast and see when the problem goes away. If I have someting relevant to share, I will add this to my post.

The issue mentioned above is backwards. A digitaWrite running in main program can be overwritten by digitalWrite in an ISR, but not the other way around.

@hzrnbgy
You have a point! In line with my previous post: I will dive into my own code and let you know the result. Thanks for your reaction.
As for the remark regarding the scope image:
Earthing all probes instead of 1 => no difference.
But activating 20Mhz filter resulted in the following graph:


So if there are any additional things I can improve on the image, I am open to your suggestions.

The complaint refers to the code for the digitalWrite() function, which in 2009 may not have been interrupt-safe. It appears that this problem was fixed long ago.

Code should NEVER be structured in a way that allows that to happen. EVER. If you write it in the ISR, then you must NOT write it in any other place. OR, you must implement interlocks or access controls to enforce who can write it, and when, so there can NEVER be such contention.

Yup

void digitalWrite(uint8_t pin, uint8_t val)
{
	uint8_t timer = digitalPinToTimer(pin);
	uint8_t bit = digitalPinToBitMask(pin);
	uint8_t port = digitalPinToPort(pin);
	volatile uint8_t *out;

	if (port == NOT_A_PIN) return;

	// If the pin that support PWM output, we need to turn it off
	// before doing a digital write.
	if (timer != NOT_ON_TIMER) turnOffPWM(timer);

	out = portOutputRegister(port);

	uint8_t oldSREG = SREG;
	cli();

	if (val == LOW) {
		*out &= ~bit;
	} else {
		*out |= bit;
	}

	SREG = oldSREG;
}

It may not be solved in the Controllino firmware? Check yourself the library version you are actually using.

You are correct 100%, so my main loop code fills the buffer enables the RS485 drive enable pin, and then I await until the characters are sent and the last character has left the UART, and then and only then I reset the drive enable pin of the RS485 driver so the other side can answer.
Now, for some reason and I have no clue at the moment, the reset job of the driver enable pin fails, on very rare occasions. I made my scope witness the first event (the set of the enable) and the reset event (from the ISR).

Update: Instead of running the program on Controllino, I ran it on Mega2560, scope signals are much better now. It could be that the poor signal forms were caused by the Controllino 24V output driver electronics.

Then, regarding my further investigation: the problem does not appear if I eliminate the lib I am writing and Ethernet. The next step is to see where the issue comes from.

For the "safe" write I found a possible problem with alternative functions of certain pins, which can be controlled by both hardware (PWM) or software. For ATmega controllers this problem is solved by a multiplexer that outputs either the port register bit state (controlled by software) or the alternative state (by hardware). This way the PORT register holds the software state only, while PIN reads back the mix of PORT and alternate function bits. As long as a port modification is protected, by interrupts temporarily disabled, it's safe to read PORT, modify and write it back in non-atomic instruction sequences. Other controllers may implement different (vulnerable) protocols.

Hello @DrDiettrich

I do agree with this statement, but the experiments and the investigations I did point me in an other (?) direction or at least do not confirm my analysis.
So, first the ISR (HardwareSerial derivative, that I modified a bit to detect the transmitter empty interrupt).

void NeoHWSerialSylvester485::_tx_udr_empty_irq(void)
{
  /*  ...  */
  if (_tx_buffer_head == _tx_buffer_tail) 
  {
    // Buffer empty, so disable interrupts
    cbi(*_ucsrb, UDRIE0);
    // enable transmitter empty interrupt:
		if (m_pinDe!=-1)
			sbi(*_ucsrb, TXCIE0);
  }
}

Then the code to reset the DE of the RS485:

**void NeoHWSerialSylvester485::_tx_empty_irq(void)**
{
  // disable transmitter empty interrupt:
  cbi(*_ucsrb, TXCIE0);
  // reset DE IO:
  uint8_t oldSREG = SREG;
  cli();
  digitalWrite(m_pinDe, 0);
  SREG = oldSREG;
}

DE is set when writing to the fifo starts:

size_t NeoHWSerialSylvester485::write(uint8_t c)
{
	digitalWrite(m_pinDe,1);
        /*...*/

And from my main sketch i trigger the write job each 250ms, so overlap is not posible as the telegram is only 10 bytes for test.
Observation: As long as DE is part of a port that is not used by any other part of the code => works ok, but as soon as I use 13 (IO LED PB7, the problems start).

Test code:

void loop()
{
  de = bitRead(PORTB, 7);
  if (millis() - millisPrev > 250)
  {
    millisPrev = millis();
    while (de == 1)    
    {
      Serial.println("Error, de is not zero.");
      delay(10000);
    }
    buf[0]++;
    NeoSerial3.write(buf,sizeof(buf));
    Serial.println("+");
  }
  int UDPacketSize = UDPServer.parsePacket();
}

Execution of UDPServer.parsePacket() results in DE (read back with bitRead() resulting in 1 and not 0).

Code of UDPServer.parsePacket() narrows down to (UDP is not used in the app, nodata sent)

int EthernetUDP::parsePacket()
{
/*...*/
	if (Ethernet.socketRecvAvailable(sockindex) > 0) 
  {
/*...*/
	}
	return 0;
}

socketRecvAvailable narrows down to:

uint16_t EthernetClass::socketRecvAvailable(uint8_t s)
{
	uint16_t ret = state[s].RX_RSR;
	if (ret == 0) 
  {
    	SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
		uint16_t rsr = getSnRX_RSR(s);
        SPI.endTransaction();
	ret = rsr - state[s].RX_inc;
	state[s].RX_RSR = ret;
    }
   return ret;
}

If i comment out the part SPI.beginTransaction to SPI.endTransaction() -> No problem.

Now i make my point: i do not see any use of ports in begin and endtransaction. So for the moment I do not know which code is altering D13 at that is part of port B?

 inline static void beginTransaction(SPISettings settings) 
  {
    if (interruptMode > 0) 
    {
      uint8_t sreg = SREG;
      noInterrupts();

      #ifdef SPI_AVR_EIMSK
      if (interruptMode == 1) 
      {
        interruptSave = SPI_AVR_EIMSK;
        SPI_AVR_EIMSK &= ~interruptMask;
        SREG = sreg;
      } else
      #endif
      {
        interruptSave = sreg;
      }
    }

    #ifdef SPI_TRANSACTION_MISMATCH_LED
    if (inTransactionFlag) {
      pinMode(SPI_TRANSACTION_MISMATCH_LED, OUTPUT);
      digitalWrite(SPI_TRANSACTION_MISMATCH_LED, HIGH);
    }
    inTransactionFlag = 1; 
    #endif

    SPCR = settings.spcr;
    SPSR = settings.spsr;
  }
and
 inline static void endTransaction(void) {
    #ifdef SPI_TRANSACTION_MISMATCH_LED
    if (!inTransactionFlag) {
      pinMode(SPI_TRANSACTION_MISMATCH_LED, OUTPUT);
      digitalWrite(SPI_TRANSACTION_MISMATCH_LED, HIGH);
    }
    inTransactionFlag = 0;
    #endif

    if (interruptMode > 0) {
      #ifdef SPI_AVR_EIMSK
      uint8_t sreg = SREG;
      #endif
      noInterrupts();
      #ifdef SPI_AVR_EIMSK
      if (interruptMode == 1) {
        SPI_AVR_EIMSK = interruptSave;
        SREG = sreg;
      } else
      #endif
      {
        SREG = interruptSave;
      }
    }
  }

So if any member of the forum can shed some light on this, I would be happy to learn what I going on here.

But the conclusion that the problem is related to synchronisation, where as far as I can see CLI is used as a synchronisation mechanism, is true. Possibly some direct writes to the PORT register based on previous saving of the PORT register without protection by CLI is the bases of my problem, but for the moment I have not found the place.
Second: If you could share code that makes it possible to toggle an individual DO reliably, i would be happy to learn something.

That looks like the firmware solution still should be improved. If digitalWrite() disables and finally enables interrupts again, this means that afterwards interrupts are enabled even in an ISR. Instead the Status Register should be saved before disabling interrupts and should be restored instead of re-enabling interrupts.

I was a bit too fast with this conclusion, further testing revealed that the use of a completely free ports works ok but disabling interrupts during my call to digitalWrite() was also required.

@DrDiettrich @jremington @gfvalvo @hzrnbgy

I narrowed down my issue to: the execution of W5100.setSS() to DO at PB4
Interfering with my ISR resetting PB6.

What I do not understand is, that the code lines in digitalWrite() where there could be interference are protected by cli() used for synchronization. cli() = disabling all interrupts() ?

	uint8_t oldSREG = SREG;
	cli();

	if (val == LOW) {
		*out &= ~bit;
	} else {
		*out |= bit;
	}

	SREG = oldSREG;

So cli(), as I understand it should act as a synchronization object whereas, I see that in fact the only conclusion from my observations (IMHO) is that the reset by my ISR is somehow overwritten by an "old value" of PB originating from another function call. Hence there is no mutex / critical section effect?

So, if you all could help me to understand what I am missing here, I would really appreciate it as for the moment I can only conclude that *out &= (called from my main loop) is interrupted by my ISR doing the same and then upon return *out = overwrites the effect of my ISR. This in spite of the fact that this should not happen because of cli(); ?

I even stored the results of digitalPinToBitMask and digitalPinToPort in helper variables so only the part where the IO is written was cyclically executed in my ISR, but also this did not help.

Several machine instructions are required to implement the two "*out" lines of C/C++ code, so each line can be interrupted, and the results of the operation corrupted by actions of an interrupt routine. Look up "atomic operations".

	if (val == LOW) {
		*out &= ~bit;
	} else {
		*out |= bit;
	}

The cli() call prevents those interruptions.

I prefer noInterrupts() for clearness. And aren't there specific (atomic?) instructions for setting or clearing register bits?

I think that the (yet unknown) controller data sheet should be consulted. Usual controllers disable interrupts on entry of an ISR, so that it should not be required to disable interrupts inside the ISR again. If the controller allows for nested interrupts (by priority) other protection/synchronization is required.