Debugging state machine timing discontinuity

TLDR: UART communication causes slightly irregular state machine timing, not sure how to approach debugging.

I am updating my company's Arduino stack software from the Arduino Uno to the Arduino Due. It utilizes a software state machine that gets data from sensors, stores the data, and sends it off the board through UART. The state machine must run at 45 Hertz, or 22.2 ms per cycle.

The problem is this: every 404 cycles (exactly 404 cycles, every time), the logic that maintains timing regularity fails, and for 13 cycles (exactly 13 cycles, every time), the state machine goes directly into the next cycle as soon as the last step is complete.

The program maintains an internal timer by triggering an RC compare interrupt, as follows:

void configureTimerInterrupt(){
  // Enable the clock to the TC0 peripheral
  pmc_enable_periph_clk(ID_TC0);

  /*Configures the timer:
    - First two parameters set it to RC compare waveform mode. This means the timer resets when it reaches the value in RC.
    - The third parameter sets the clock source to MCK/128. MCK is at 84 MHz, so this sets the clock to 656.25 kHz.
  */
  TC_Configure(TC0, 0, TC_CMR_WAVE | TC_CMR_WAVSEL_UP_RC | TC_CMR_TCCLKS_TIMER_CLOCK4);
  //RC sets the value that the counter reaches before triggering the interrupt
  //This sets it to 10kHz
  TC_SetRC(TC0, 0, 64.625); // 

  // Enable the interrupt RC compare interrupt
  TC0->TC_CHANNEL[0].TC_IER = TC_IER_CPCS;
  // Disable all other TC0 interrupts
  TC0->TC_CHANNEL[0].TC_IDR = ~TC_IER_CPCS;
  NVIC_EnableIRQ(TC0_IRQn);
  NVIC_SetPriority(TC0_IRQn, 0);

  TC_Start(TC0, 0);
}

void TC0_Handler(){
  // Clear the status register. This is necessary to prevent the interrupt from being called repeatedly.
  TC_GetStatus(TC0, 0);
  if (micros() - timer >= SAMPLE_PERIOD) {
    newCycle = true;
    timer = micros();
  }
}
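
As a quick sanity check on the comments in the timer setup above, the arithmetic can be worked through on a PC. This is only a sketch: it assumes TC_SetRC stores RC as an unsigned 32-bit integer (so the fractional part of 64.625 is dropped) and that the compare interrupt fires once every RC timer ticks. The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// MCK on the Due is 84 MHz; TIMER_CLOCK4 divides it by 128.
const double kMckHz = 84000000.0;
const double kTimerClockHz = kMckHz / 128.0;  // 656250 Hz

// Assumption: the RC register holds an integer, so a fractional
// request is truncated when it is converted to uint32_t.
inline uint32_t rcAsStored(double requested) {
    assert(requested >= 1.0);
    return static_cast<uint32_t>(requested);
}

// Assumption: one compare interrupt every `rc` timer ticks.
inline double interruptRateHz(uint32_t rc) {
    assert(rc > 0);
    return kTimerClockHz / rc;
}

// RC value that would give exactly the requested rate (may be fractional).
inline double idealRc(double targetHz) {
    assert(targetHz > 0.0);
    return kTimerClockHz / targetHz;
}
```

Under those assumptions, rcAsStored(64.625) is 64, interruptRateHz(64) is roughly 10.25 kHz rather than 10 kHz, and idealRc(10000.0) is 65.625, so an exact 10 kHz tick is not reachable with this prescaler. Whether that matters for the glitch is not established here.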

Then, at the end of each cycle, to maintain timing regularity, the state machine waits until the internal timer reaches 22.2 ms, as follows:

case idle: {
        volatile uint32_t irq_state = __get_PRIMASK();  
        __disable_irq();                       
        if (micros() - timer > SWEEP_OFFSET) {
            currentState = startSweep;
        }
        __set_PRIMASK(irq_state);  
        break;
    }

Interrupts are disabled during this stage due to a problem that arose before I joined the company, and removing this logic does not resolve the problem.

This glitch is caused by the function that manages UART communication (removing this function removes the glitch). Because comms have to happen simultaneously with another processor function (due to timing constraints), the program copies all data into a contiguous memory block and has the UART's peripheral DMA controller manage communication:

void sendData(){
        p_memory_block = memory_block;
        memcpy(p_memory_block, sweepSentinel, sizeof(sweepSentinel));
        p_memory_block += sizeof(sweepSentinel);
        memcpy(p_memory_block, p_sweepTimeStamp, sizeof(sweepTimeStamp));
        p_memory_block += sizeof(sweepTimeStamp);
        memcpy(p_memory_block, sweep_buffer, sizeof(sweep_buffer));
        p_memory_block += sizeof(sweep_buffer);
        memcpy(p_memory_block, imuSentinel, sizeof(imuSentinel));
        p_memory_block += sizeof(imuSentinel);
        memcpy(p_memory_block, p_IMUTimeStamp, sizeof(IMUTimeStamp));
        p_memory_block += sizeof(IMUTimeStamp);
        memcpy(p_memory_block, IMUData, sizeof(IMUData));
        p_memory_block += sizeof(IMUData);
        memcpy(p_memory_block, imuSentinelBuf, sizeof(imuSentinelBuf));
        p_memory_block += sizeof(imuSentinelBuf);
        memcpy(p_memory_block, ramBuf + IMU_TIMESTAMP_OFFSET, sizeof(IMUTimeStamp));
        p_memory_block += sizeof(IMUTimeStamp);
        memcpy(p_memory_block, ramBuf + IMU_DATA_OFFSET, sizeof(IMUData));
        p_memory_block += sizeof(IMUData);
        memcpy(p_memory_block, sweepSentinelBuf, sizeof(sweepSentinelBuf));
        p_memory_block += sizeof(sweepSentinelBuf);
        memcpy(p_memory_block, ramBuf + SWEEP_TIMESTAMP_OFFSET, sizeof(sweepTimeStamp));
        p_memory_block += sizeof(sweepTimeStamp);
        memcpy(p_memory_block, ramBuf + SWEEP_DATA_OFFSET, sizeof(sweep_buffer));
        p_memory_block = memory_block;
        pdc.send(memory_block, totalSize);    
}
    template <typename T>
    void send(T* buffer, int size){
        //check if UART is ready for transmit
        if(*p_UART_SR & TXBUFE){
                //set buffer and size
                *(volatile uint32_t*)p_UART_TPR = (uint32_t)buffer;
                *p_UART_TCR = size;
           
        } else{
            //wait until ready
            while(!(*p_UART_SR & TXBUFE)){
                ;
            }
            //same as above
                *(volatile uint32_t*)p_UART_TPR = (uint32_t)buffer;
                *p_UART_TCR = size;
        }
        
    }

I am not sure where to begin with resolving this. Both our oscilloscope readings and the data we get suggest that absolutely nothing changes in the UART communication from the 403rd to the 404th cycle. But it is mission critical that the cycles have regular timing.

I attempted to be thorough, but please let me know if I have left anything important out of this post.

Is it possible that you're blocking in here and this is causing the timing problem? If so, you should investigate having a TxBuffer empty interrupt trigger sending the data instead of a busy loop.

I assume you are sharing a small part of a large system. Issues like this can be fiendishly difficult to diagnose, and sometimes equally hard to patch.

I agree @cedarlakeinstruments - there is no place for busy-waiting in code like this.

You could also add digital outputs in key places to raise signals showing when, or whether, you reach certain points in the code, and whether you might be getting stuck there.

The wait for transmitter ready might be one such place.

// raise busy pin however

while(!(*p_UART_SR & TXBUFE)){
                ;
            }

// lower busy pin

Believe it or not, as much as this sounds like chewing lemon peels, I envy you. I can only get a vicarious thrill out of this in my miserable life under the umbrella wasting what's left of it.

a7


I don't see a critical section for all those copies.


Why is there a need to disable interrupts in the idle state while you wait for the internal timer to reach 22.2 ms?

Disabling interrupts might affect the sending process.

The PDC does communication by taking a start byte and a number of bytes to iterate over, so they have to be adjacent in memory - is there a better way to manage this without interrupts? It is preferable that the process during which communication happens is interrupted as minimally as possible.

Thanks everybody for the help. I agree that busy waiting should be removed, and I will work on implementing that, but I do not believe it is the issue here. Setting aside that I'm not sure why it would affect the internal timer's ability to regulate the state machine, the following check:

digitalWrite(7, HIGH);
while(!(*p_UART_SR & TXBUFE)){
     ;
}
digitalWrite(7, LOW);

gets nothing on the oscilloscope.

Given that the problem is in the communication function, I am wondering if there is a better way to handle non-blocking communication while trying to avoid multiple UART empty interrupts than the memory copying strategy in my original post?

You use micros() for the wait (which is not precise to the microsecond).

  • How do you update the timer variable?
  • Should the comparison be >= rather than >?
  • Blocking interrupts might slightly delay the update of the underlying micros() function. Maybe you need a better way to access the timer for this?

Can you share, or draw if you haven't one, a timing diagram of activities that comprise the 22.2 ms period, some of which period it seems is simply waiting?

That's what makes this so odd, to me, at this point. I don't see what would carry over and mess up the timing.

What baud rate and how many characters are being transmitted during the interval?

I'm waiting for someone to see that 405 * 22.2 is 32768, or something like that. It isn't, just sayin'. The regularity of the glitch is odd but an important clue.

More spaghetti: are all variables involved in interrupts volatile? Is all access to them outside the context of interrupt service done in a protected section?

Is it possible some innocent counter is rolling over quietly forking you every so often?

Could you change your build order to shake out things that may be invoking undefined behaviour, so far harmlessly?

Can you build the system with every conceivable warning, way beyond what the IDE turns on, and see if any warnings you haven't seen pop up?

And lastly, a huge task I imagine, can you replace the real activities of the system with proxies that just waste the same amount of time, switch the serial comms to shipping out a same-sized buffer of fake data, with the aim being the creation of a much smaller complete minimal example that reproduces the flaw, which might be easier to share here and would certainly prove you aren't looking in the wrong place, despite the assertion that the problem goes away when you remove the part you think is the problem?

Clutching at straws: what all is attached here? Can you share a block diagram of the system and identify the various things hanging off the microprocessor? Is it possible one of them is messing with you, either by design or because it has changed or broken in a way that would do so?

I know, I know, a thousand questions…

a7
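
On the rollover question raised above, one thing that can be checked in isolation is the elapsed-time comparison itself. With unsigned 32-bit arithmetic, `now - start` stays correct across a single wrap of the counter, which is the pattern the original `micros() - timer` checks use. The sketch below runs on a PC; `hasElapsed` and `rolloverExample` are made-up names, and it says nothing about other counters in the system.

```cpp
#include <cassert>
#include <cstdint>

// True once at least `period` ticks have passed since `start`.
// The subtraction is done modulo 2^32, so one wrap of the counter is harmless.
inline bool hasElapsed(uint32_t now, uint32_t start, uint32_t period) {
    return static_cast<uint32_t>(now - start) >= period;
}

// Example values: start shortly before the wrap, now shortly after it.
inline void rolloverExample() {
    const uint32_t start = 0xFFFFFF00u;  // 256 ticks before the wrap
    const uint32_t now = 100u;           // 100 ticks after the wrap
    assert(hasElapsed(now, start, 300u));   // 356 ticks have passed
    assert(!hasElapsed(now, start, 400u));  // 400 ticks have not
}
```

A comparison written the other way round, such as `now >= start + period`, does not have this property, so it is worth checking that no other timing test in the code uses that form.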

BTW, given you do the test anyway, this could be simplified into

template <typename T>
void send(T* buffer, int size){
  while (!(*p_UART_SR & TXBUFE)) ;   //wait until UART is ready for transmit
  // initialise data to send (should that be in a critical section?)
  *(volatile uint32_t*)p_UART_TPR = (uint32_t)buffer; // this is not really type correct
  *p_UART_TCR = size;
}

Thank you all so much again for your time and help!!

Before answering all of @alto777 's questions, I will note another oddity - sometimes, this glitch just disappears when I power up the board. It reappears and disappears when I least expect it, like a phantom haunting my prospects of maintaining employment.

Here's two quick visualizations. First, an oscilloscope reading:

Blue is a DAC pin - it runs a 28 step voltage sweep, each step about 500 us. The start of the sweep is the start of the state machine. Yellow is a 2.5 ms I2C transaction with an IMU. Green is a 0.5-1 ms SPI transaction with an EEPROM chip - it alternates between those two times because every other cycle it does an extra transaction. Purple is UART communication, which starts at the beginning of the sweep. Not shown (because I ran out of scope channels) is the other SPI transaction - on each voltage step, there are 8 SPI transactions with an ADC. The "waiting" period occurs after the EEPROM SPI transaction. See the rough timing diagram:

UART is at 230400 bps. On each cycle, it transmits 294 bytes - one start bit, one stop bit, no parity, so it should come out to 2940 bits per cycle.
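
For reference, the time on the wire for one frame follows directly from those figures. A minimal sketch that runs on a PC, assuming 10 bits per byte (8 data bits plus the start and stop bits mentioned above); `frameTimeMs` is a made-up helper name.

```cpp
#include <cassert>

// Time in milliseconds to shift out `bytes` bytes at `baud` bits per second,
// with `bitsPerByte` bits on the wire for each byte.
inline double frameTimeMs(unsigned bytes, unsigned bitsPerByte, unsigned baud) {
    assert(baud > 0);
    return 1000.0 * bytes * bitsPerByte / baud;
}
```

With the figures above, frameTimeMs(294, 10, 230400) comes to about 12.76 ms, a bit over half of the 22.2 ms cycle, so each transfer should finish well before the next one is started.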

The only interrupt currently in action is the timer interrupt that manages the internal timer - the code for that is in the original post. All involved variables are volatile. What do you mean by "protected section?"

I will work on the proxy system and see if it reproduces the flaw. I will have to do that tomorrow, as the error disappeared temporarily when I powered on the board this morning.

The system is actually quite simple. Here's a 10 second diagram:

The peripheral DACs and UART on the microprocessor are used for the voltage sweep and communication.

Observation: even at that baud rate, the 294 bytes will result in blocking until the last n are in the TX buffer. n may be 64, depending on which Arduino. That blocking may have unintended side effects.


For pure digital signals you could use an 8-channel logic analyser and the free sigrok PulseView software.

How about using a much more powerful microcontroller that can also be programmed with the Arduino IDE?
A Teensy 4.0 has a clock frequency of 600 MHz and 1 MB of RAM.

How about increasing that buffer to a size that all bytes fit at once?

@camsysca could you explain what you mean by blocking in this context? To my understanding, since I am using the peripheral DMA controller, not Arduino's Serial.write(), it should be non-blocking?

@StefanL38 I may have misunderstood your comment, but I believe the buffer I am currently using already stores all 294 bytes - the section of my original post with all the memory copies copies all the data into a 294 byte buffer

const size_t totalSize = 294;
uint8_t memory_block[totalSize];
uint8_t* p_memory_block = memory_block;

WRT your hardware recommendations - it would be quite a hassle to adjust hardware at this point in the design process, and we would like to try and make the Due work for now. There are various reasons that the Due/SAM3x8e is preferable for our project - but I will note the recommendations.

Oops. Apologies, you may be correct. No direct experience with that mode of transmission - but I would dig deep, to ensure there's no length limit or other subtlety.

No, I guess I do not understand how your code works.
So it is very likely that I misunderstood it.

Seems like DMA needs no function calls at all.

@J-M-L raised the same question using a different name.

When there are interrupts and regular code using the same variables, they must be declared volatile, and in the case of multi-byte variables, their use in non-interrupt code must be done with interrupts disabled briefly so there is no chance that the ISR fires during the regular code getting or putting the bytes that comprise the variable.

So you'll see

   noInterrupts();
   long myCopy = someVolatileLong;
   interrupts();

and use of myCopy subsequently. There's a header atomic.h that is just a bit more clever and flexible, but you get the idea. Non-atomic access issues can be very hard to see, and may lurk without symptoms only to spring up when you least need it to.

THX for the 'scope diagram and notes. How long is the period of waiting for the next 22.2 ms frame to begin? I return to wondering why this isn't basically eliminating any effect of one interval on the next.

This may be more important than the 404-cycle regularity of the glitch, and is truly the stuff of nightmares.

I repeat my suggestion to get the compiler to tell you as much as possible about the code, it feels like something that should be isn't getting set up right from the start.

a7

On the Arduino Due, Direct Memory Access (DMA) allows efficient data transfer between memory and peripherals like UART without CPU intervention.

For this to work it involves configuring the DMA controller and UART settings, setting up a DMA channel to handle data transfers, enabling UART to generate DMA requests, starting the DMA transfer, and optionally handling completion through interrupts. I’m assuming this is done correctly since the OP hints this part is working.

This process is not the one we typically see with the Serial class

@J-M-L this is my understanding as well. Solidify my confidence by wondering with me if the check for ready is never done, that is to say that when the buffer send lever is pulled, it is always ready, like if you waited 10 milliseconds between characters going out at 115200.

Here is the snippet you eliminated with your simpler logical equivalent:

        } else{
            //wait until ready
            while(!(*p_UART_SR & TXBUFE)){
                ;
            }

This is consistent with @swallace23's assertion about the diagnostic pin wiggle surrounding the busy-wait

gets nothing on the oscilloscope.

which at first I took to mean the result was unremarkable, but now I see as literal: it just doesn't happen, no positive pulse, it never busy waits.

a7

I think testing (*p_UART_SR & TXBUFE) is not appropriate for DMA controlled transfers.

If I remember correctly you would typically need to check the status of the DMA controller rather than the UART status registers (probably the Channel Status Register - DMAC_CHSR)

I concur. I haven't used DMA on the AVR family but I have on others. It's typically done by configuring DMA, enabling a "done" interrupt and then dispatching the DMA operation and getting on with life. When the DMA operation is done, the interrupt is asserted. No busy waiting should ever be needed.
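
To make the "dispatch and get on with life" pattern concrete, here is a PC-runnable sketch with a fake register block standing in for the hardware. Everything in it (`FakePdc`, `trySend`, `onTransferDone`, the flag bit position) is hypothetical and not taken from the OP's code or the SAM3X headers; on real hardware the "done" handler would be an interrupt routine and the flag would be changed by the peripheral itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fake stand-in for a transmit-DMA register block (not the real layout).
struct FakePdc {
    uint32_t status;     // holds the "buffer empty" flag
    const uint8_t* ptr;  // transmit pointer
    size_t count;        // bytes left to send
};

const uint32_t kBufferEmpty = 1u << 0;  // made-up bit position

// Start a transfer only if the previous one has finished.
// Returns false instead of spinning when the channel is busy.
inline bool trySend(FakePdc& pdc, const uint8_t* buffer, size_t size) {
    assert(buffer != nullptr && size > 0);
    if (!(pdc.status & kBufferEmpty)) {
        return false;  // still busy: caller can retry later or count a drop
    }
    pdc.ptr = buffer;
    pdc.count = size;
    pdc.status &= ~kBufferEmpty;  // model: flag is clear while sending
    return true;
}

// Stand-in for the "done" interrupt: mark the channel free again.
inline void onTransferDone(FakePdc& pdc) {
    pdc.count = 0;
    pdc.status |= kBufferEmpty;
}
```

In this model a second `trySend` issued before `onTransferDone` returns false, which gives the state machine something to log or count rather than a hidden stall.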