AVR-GCC for Linux doesn't handle well 16 bit words

Weeks ago I found that avr-gcc (5.0.4, which is the "standard") couldn't left-shift bits in a 16bit words; in that case I had to multiply the variable by a power of two. ("couldn't" means wrong results.)

Now I've found an even stranger bug: In a software delay function of mine, when a 16bit variable called now is less than 32767 (2^15-1) a comparison is wrong; while when the now variable is greater than such value, the comparison is performed as expected. Let me show you the function and then the results.

The function is:

void software_delay( TickType_t ticks )
{
   volatile TickType_t now = xTaskGetTickCount();

   Serial.print( "\nBefore: " ); Serial.println( now );

   volatile TickType_t next_wakeup = now + ticks;
   Serial.print( "Next: " ); Serial.println( next_wakeup );

   while( 1 )
   {
      if( now > next_wakeup ) 
         break;
      now = xTaskGetTickCount();
   }

   Serial.print( "After: " ); Serial.println( now );
}

The loop seems too weird, but I was testing different ways to catch the bug. TickType_t is a 16bit integer. Pay attention to the conditional if:

if( now > next_wakeup ) 
         break;

Whenever the now variable is less than 32767 the loop isn't executed, as shown by the output:

Before: 2598
Next: 4598
After: 2602

Before: 6603
Next: 8603
After: 6612

Before: 6603
Next: 8603
After: 6612

...

Before: 30680
Next: 32680
After: 30722

The After value should be a little bit greater than Next. This output shows that the loop is not being executed. However, when now is greater than 32767, the loop is executed:

Before: 34723
Next: 36723
After: 36724

Before: 40724
Next: 42724
After: 42725

...

Before: 58682
Next: 60682
After: 60683

What the heck!

The assemby output seems to be right (although I'm not that kind of expert). Let me show the the section of the loop:

   while( 1 )
   {
      if( now > next_wakeup ) 
     1e2:	2b 81       	ldd	r18, Y+3	; 0x03
     1e4:	3c 81       	ldd	r19, Y+4	; 0x04
     1e6:	89 81       	ldd	r24, Y+1	; 0x01
     1e8:	9a 81       	ldd	r25, Y+2	; 0x02
     1ea:	82 17       	cp	r24, r18
     1ec:	93 07       	cpc	r25, r19
     1ee:	28 f0       	brcs	.+10     	; 0x1fa <_Z14software_delayj+0x76>
         break;
      now = xTaskGetTickCount();
     1f0:	0e 94 8c 08 	call	0x1118	; 0x1118 <xTaskGetTickCount>
     1f4:	9c 83       	std	Y+4, r25	; 0x04
     1f6:	8b 83       	std	Y+3, r24	; 0x03
   volatile TickType_t next_wakeup = now + ticks;
   Serial.print( "Next: " ); Serial.println( next_wakeup );

   while( 1 )
   {
      if( now > next_wakeup ) 
     1f8:	f4 cf       	rjmp	.-24     	; 0x1e2 <_Z14software_delayj+0x5e>
         break;
      now = xTaskGetTickCount();
   }

The problem persists even if I surround the conditional if in a critical section (this project uses a real time kernel).

This kind of missbehaviour isn't acceptable by any means for a serious embedded systems.

Any idea? Some hints? Thank you in advanced!

Data types, (casting), type assumptions…

The snippet doesn’t tell use everything.

One thing I noticed is that you calculate and use future time stamps. I'm not sure what your type 'Ticktype_t' is, but when numbers get large, arithmetic rollover becomes an issue. In many time comparison situations, the way to avoid it is to use unsigned arithmetic and compare previous time stamps with the current time. This method is not affected by byte or word overflow.

Actually, back there you did say that you are using 16 bit integers, those are normally signed, which would not produce the right result.

Another muse - have you tested this on a different compiler? By implication from the subject line, on another systems besides Linux?

Hi,

I wanted to narrow the problem to the core, and do not overwhelm you with unimportant details. But in order to get as many help as possible, here you can see the complete code here.

You'll see new instructions that I added after my post in order to track the bug.

Hi,

Future timestamps are handled correctly even with rollovers (it took me too long to figure it out in a previous project).

TickType_t can be uint16_t or uint32_t. Per my project it's 16bit:

#if( configUSE_16_BIT_TICKS == 1 )
	typedef uint16_t TickType_t;
	#define portMAX_DELAY ( TickType_t ) 0xffff
#else
	typedef uint32_t TickType_t;
	#define portMAX_DELAY ( TickType_t ) 0xffffffffUL
#endif

//in another file:
#define configUSE_16_BIT_TICKS 1

Now, the problem isn't when rolling over 65531 to 0 (actually it does well), but when the unsigned 16bit variable now passes the 32767 magic number as you saw in the output.

No, I don't have access to a compiler other than avr-gcc. However, I tested it with the release 11 of such compiler and the problem is still there, so I came back to the "standard" 5.0.4 release.

And no, I'm a heavy Linux user, so I don't have a box with other OS =(.

Why don't you follow the well known procedures for comparing time values? The right expression provides correct results. Pattern:
if (now - last > interval) ...

Why volatile variables for fixed times?

Hi,

Does it handle correctly rollovers? Imagine that now is 100 and last 55000. Anyway, it worths to try it.

The compiler agressively optimizes the code, so volatile is a manner of telling it: "don't optimize this variable". And why? Because I'm tracing the bug so I need to get as many information as possible.

BTW, leaving out any output code optimization (-O0) makes the code to crash. The code also crashes when using Serial.print() with 32bit variables. But those are problems for another time.

To claim an actual compiler bug, please create a "minimal program" that exhibits the bad behavior. In this case, you haven't shown the definition of TickType_t or xTaskGetTickCount...

Hi,

From a previous post:

TickType_t can be uint16_t or uint32_t. Per my project it's 16bit:

#if( configUSE_16_BIT_TICKS == 1 )
	typedef uint16_t TickType_t;
	#define portMAX_DELAY ( TickType_t ) 0xffff
#else
	typedef uint32_t TickType_t;
	#define portMAX_DELAY ( TickType_t ) 0xffffffffUL
#endif

//in another file:
#define configUSE_16_BIT_TICKS 1

The function xTaskGetTickCount() is part of a real time kernel, that does a similar thing as the Arduino's millis() function:

TickType_t xTaskGetTickCount( void )
{
    TickType_t xTicks;

    /* Critical section required if running on a 16 bit processor. */
    portTICK_TYPE_ENTER_CRITICAL();
    {
        xTicks = xTickCount;
    }
    portTICK_TYPE_EXIT_CRITICAL();

    return xTicks;
}

The counting is ok, the comparison is wrong.

Why am I claiming it as bug? As I mentioned in the opening post, this is the second time I find this kind of strange behaviour. If you look at the assembly output you might say it's correct, but when the program is running, then it's not ok.

I would like to use the DBG debugger, but it's a pain in the back to debug an Arduino program (using the atmega328 chip) in a Linux box.

Yes.

And, for both of you, the correct comparison is greater than or equal.

1 Like

It get worst!

The code I tried is:

void software_delay( TickType_t ticks )
{
   volatile TickType_t now = xTaskGetTickCount();

   Serial.print( "\nBefore: " ); 
   Serial.println( now );

   volatile TickType_t next_wakeup = now + ticks;
   Serial.print( "Next: " ); Serial.println( next_wakeup );

   volatile TickType_t last = now;

   while( 1 )
   {
      if( now - last >= ticks )
         break;

      now = xTaskGetTickCount();
   }

   Serial.print( "After: " ); 
   Serial.println( now );
}

with assembly output (for the loop):

volatile TickType_t last = now;
     1e4:	8d 81       	ldd	r24, Y+5	; 0x05
     1e6:	9e 81       	ldd	r25, Y+6	; 0x06
     1e8:	9a 83       	std	Y+2, r25	; 0x02
     1ea:	89 83       	std	Y+1, r24	; 0x01

   while( 1 )
   {
      if( now - last >= ticks )
     1ec:	8d 81       	ldd	r24, Y+5	; 0x05
     1ee:	9e 81       	ldd	r25, Y+6	; 0x06
     1f0:	29 81       	ldd	r18, Y+1	; 0x01
     1f2:	3a 81       	ldd	r19, Y+2	; 0x02
     1f4:	82 1b       	sub	r24, r18
     1f6:	93 0b       	sbc	r25, r19
     1f8:	80 17       	cp	r24, r16
     1fa:	91 07       	cpc	r25, r17
     1fc:	28 f4       	brcc	.+10     	; 0x208 <_Z14software_delayj+0x84>
         {
            Serial.println( now  );
         }
      }

      now = xTaskGetTickCount();
     1fe:	0e 94 95 08 	call	0x112a	; 0x112a <xTaskGetTickCount>
     202:	9e 83       	std	Y+6, r25	; 0x06
     204:	8d 83       	std	Y+5, r24	; 0x05

   volatile TickType_t last = now;

   while( 1 )
   {
      if( now - last >= ticks )
     206:	f2 cf       	rjmp	.-28     	; 0x1ec <_Z14software_delayj+0x68>
      }

      now = xTaskGetTickCount();
   }

and the actual output is:

Before: 8604
Next: 10604
After: 8606

Before: 12607
Next: 14607
After: 12609

...

Before: 36631
Next: 38631
After: 36631

Before: 40632
Next: 42632
After: 40637

It doesn't even work passing the magic number 32767!

You presume print and println execute in zero time.

What are you expecting the if() to execute

Try to replace the mysterious
now = xTaskGetTickCount();
with
now++;

Why are the local variables 'now' and 'next_wakeup' declared 'volatile'?!? That seems to be the problem because when I change the function to use uint16_t it works fine across 32767-32768 and fails near 65536, just as expected for calculating future times:

void software_delay(uint16_t ticks)
{
  uint16_t now = millis();

  Serial.print( "Before: " ); 
  Serial.print(now);

  uint16_t next_wakeup = now + ticks;
  Serial.print( "\tNext: " ); 
  Serial.print( next_wakeup );

  while ( 1 )
  {
    if (now > next_wakeup)
      break;
    now = millis();
  }

  Serial.print( "\tAfter: " ); 
  Serial.println(now);
}
Before: 26225	Next: 27225	After: 27226
Before: 27226	Next: 28226	After: 28227
Before: 28227	Next: 29227	After: 29228
Before: 29228	Next: 30228	After: 30229
Before: 30229	Next: 31229	After: 31230
Before: 31230	Next: 32230	After: 32231
Before: 32231	Next: 33231	After: 33232
Before: 33232	Next: 34232	After: 34233
Before: 34233	Next: 35233	After: 35234
.
.
.
Before: 60259	Next: 61259	After: 61260
Before: 61260	Next: 62260	After: 62261
Before: 62261	Next: 63261	After: 63262
Before: 63262	Next: 64262	After: 64263
Before: 64263	Next: 65263	After: 65264
Before: 65264	Next: 728	After: 65264
Before: 65265	Next: 729	After: 65265
Before: 65266	Next: 730	After: 65266
Before: 65269	Next: 733	After: 65269
Before: 65272	Next: 736	After: 65272

To correct the rollover error:

  uint16_t startTime = now;

  do
  {
    now = millis();
  }
  while (now - startTime <= ticks);

I can't, and there isn't any mistery. xTaskGetTickCount() returns the number of ticks since the system has been running. It's similar to millis(). now++ won't work because the function isn't called every tick.

No, I don't presume anything. Under certain conditions the break is executed almost unconditionally.

And those conditions are. . . ?

Because I don't want the compiler to optimize them. Right now I'm tracing the bug so I need all information I can gather.

millis(), which is part of Arduino, returns an unsigned long (32bits), while xTaskGetTickCount() returns an unsigned int (16bits). xTaskGetTickCount() is part of a real time kernel, and its result isn't the problem, but something into the comparison.

You might ask that I should use 32bit instead of 16bit, but the problem is that Serial.print() doesn't handle them well (my system crashes, because Arduino isn't mean for real time).

I mentioned it before in the post, but I'll repeat it here for the sake of clarity:

Whenever the now variable is less than 32767 the loop isn't executed, as shown by the output [...] However, when now is greater than 32767, the loop is executed [...]

Do we now all have the means to replicate your results?