[Probably solved] Inexplicable behavior (NAN values)

Hi all,

I am currently developing software for my own home built quad copter. It is powered by an Arduino Mega 2560, and has a Chinese IMU (GY-85) wired via I2C using the arduino "Wire" library. Furthermore the motors are controlled via the "Servo" Arduino library. The orientation of the IMU is calculated by the Sebastian Madgwick AHRS function. (code below)
So far it has gone well, but now I have stumbled upon a (for me) perplexing problem. I am hoping someone will be able to give me new insights about how to tackle this problem.

The problem:
Succinctly put the AHRS algotithm returns a quaternion of which all 4 values are NAN (Not A Number).
The weird part is that I can fix this (i.e. returning usable values) by adding the line:

delayMicroseconds(1);

In a specific part of the algorithm:

Quaternion IMU_MADGWICK::MadgwickAHRSupdate(float gx, float gy, float gz, float ax, float ay, float az, float mx, float my, float mz)
{
	// Use IMU algorithm if magnetometer measurement invalid (avoids NaN in magnetometer normalisation)
	if((mx == 0.0f) && (my == 0.0f) && (mz == 0.0f)) {
		MadgwickAHRSupdateIMU(gx, gy, gz, ax, ay, az);
	}

	// Rate of change of quaternion from gyroscope
	qDot1 = 0.5f * (-quaternion.q1 * gx - quaternion.q2 * gy - quaternion.q3 * gz);
	qDot2 = 0.5f * (quaternion.q0 * gx + quaternion.q2 * gz - quaternion.q3 * gy);
	qDot3 = 0.5f * (quaternion.q0 * gy - quaternion.q1 * gz + quaternion.q3 * gx);
	qDot4 = 0.5f * (quaternion.q0 * gz + quaternion.q1 * gy - quaternion.q2 * gx);

	// Compute feedback only if accelerometer measurement valid (avoids NaN in accelerometer normalisation)
	if(!((ax == 0.0f) && (ay == 0.0f) && (az == 0.0f))) {

		// Normalise accelerometer measurement
		recipNorm = invSqrt2(ax * ax + ay * ay + az * az);
		ax *= recipNorm;
		ay *= recipNorm;
		az *= recipNorm;   

		// Normalise magnetometer measurement
		recipNorm = invSqrt2(mx * mx + my * my + mz * mz);
		mx *= recipNorm;
		my *= recipNorm;
		mz *= recipNorm;

		// Auxiliary variables to avoid repeated arithmetic
		_2q0mx = 2.0f * quaternion.q0 * mx;
		_2q0my = 2.0f * quaternion.q0 * my;
		_2q0mz = 2.0f * quaternion.q0 * mz;
		_2q1mx = 2.0f * quaternion.q1 * mx;
		_2q0 = 2.0f * quaternion.q0;
		_2q1 = 2.0f * quaternion.q1;
		_2q2 = 2.0f * quaternion.q2;
		_2q3 = 2.0f * quaternion.q3;
		_2q0q2 = 2.0f * quaternion.q0 * quaternion.q2;
		_2q2q3 = 2.0f * quaternion.q2 * quaternion.q3;
		q0q0 = quaternion.q0 * quaternion.q0;
		q0q1 = quaternion.q0 * quaternion.q1;
		q0q2 = quaternion.q0 * quaternion.q2;
		q0q3 = quaternion.q0 * quaternion.q3;
		q1q1 = quaternion.q1 * quaternion.q1;
		q1q2 = quaternion.q1 * quaternion.q2;
		q1q3 = quaternion.q1 * quaternion.q3;
		q2q2 = quaternion.q2 * quaternion.q2;
		q2q3 = quaternion.q2 * quaternion.q3;
		q3q3 = quaternion.q3 * quaternion.q3;

		
==================>>		//delayMicroseconds(1); //<==========================================================================
	
		// Reference direction of Earth's magnetic field
		hx = mx * q0q0 - _2q0my * quaternion.q3 + _2q0mz * quaternion.q2 + mx * q1q1 + _2q1 * my * quaternion.q2 + _2q1 * mz * quaternion.q3 - mx * q2q2 - mx * q3q3;
		hy = _2q0mx * quaternion.q3 + my * q0q0 - _2q0mz * quaternion.q1 + _2q1mx * quaternion.q2 - my * q1q1 + my * q2q2 + _2q2 * mz * quaternion.q3 - my * q3q3;
		_2bx = sqrt(hx * hx + hy * hy);
		_2bz = -_2q0mx * quaternion.q2 + _2q0my * quaternion.q1 + mz * q0q0 + _2q1mx * quaternion.q3 - mz * q1q1 + _2q2 * my * quaternion.q3 - mz * q2q2 + mz * q3q3;
		_4bx = 2.0f * _2bx;
		_4bz = 2.0f * _2bz;

		// Gradient decent algorithm corrective step
		s0 = -_2q2 * (2.0f * q1q3 - _2q0q2 - ax) + _2q1 * (2.0f * q0q1 + _2q2q3 - ay) - _2bz * quaternion.q2 * (_2bx * (0.5f - q2q2 - q3q3) + _2bz * (q1q3 - q0q2) - mx) + (-_2bx * quaternion.q3 + _2bz * quaternion.q1) * (_2bx * (q1q2 - q0q3) + _2bz * (q0q1 + q2q3) - my) + _2bx * quaternion.q2 * (_2bx * (q0q2 + q1q3) + _2bz * (0.5f - q1q1 - q2q2) - mz);
		s1 = _2q3 * (2.0f * q1q3 - _2q0q2 - ax) + _2q0 * (2.0f * q0q1 + _2q2q3 - ay) - 4.0f * quaternion.q1 * (1 - 2.0f * q1q1 - 2.0f * q2q2 - az) + _2bz * quaternion.q3 * (_2bx * (0.5f - q2q2 - q3q3) + _2bz * (q1q3 - q0q2) - mx) + (_2bx *quaternion.q2 + _2bz * quaternion.q0) * (_2bx * (q1q2 - q0q3) + _2bz * (q0q1 + q2q3) - my) + (_2bx * quaternion.q3 - _4bz * quaternion.q1) * (_2bx * (q0q2 + q1q3) + _2bz * (0.5f - q1q1 - q2q2) - mz);
		s2 = -_2q0 * (2.0f * q1q3 - _2q0q2 - ax) + _2q3 * (2.0f * q0q1 + _2q2q3 - ay) - 4.0f *quaternion.q2 * (1 - 2.0f * q1q1 - 2.0f * q2q2 - az) + (-_4bx *quaternion.q2 - _2bz * quaternion.q0) * (_2bx * (0.5f - q2q2 - q3q3) + _2bz * (q1q3 - q0q2) - mx) + (_2bx * quaternion.q1 + _2bz * quaternion.q3) * (_2bx * (q1q2 - q0q3) + _2bz * (q0q1 + q2q3) - my) + (_2bx * quaternion.q0 - _4bz *quaternion.q2) * (_2bx * (q0q2 + q1q3) + _2bz * (0.5f - q1q1 - q2q2) - mz);
		s3 = _2q1 * (2.0f * q1q3 - _2q0q2 - ax) + _2q2 * (2.0f * q0q1 + _2q2q3 - ay) + (-_4bx *quaternion.q3 + _2bz * quaternion.q1) * (_2bx * (0.5f - q2q2 - q3q3) + _2bz * (q1q3 - q0q2) - mx) + (-_2bx * quaternion.q0 + _2bz *quaternion.q2) * (_2bx * (q1q2 - q0q3) + _2bz * (q0q1 + q2q3) - my) + _2bx * quaternion.q1 * (_2bx * (q0q2 + q1q3) + _2bz * (0.5f - q1q1 - q2q2) - mz);
		recipNorm = invSqrt2(s0 * s0 + s1 * s1 + s2 * s2 + s3 * s3); // normalise step magnitude
		s0 *= recipNorm;
		s1 *= recipNorm;
		s2 *= recipNorm;
		s3 *= recipNorm;
		
		// Apply feedback step
		qDot1 -= beta * s0;
		qDot2 -= beta * s1;
		qDot3 -= beta * s2;
		qDot4 -= beta * s3;
	}

	// Integrate rate of change of quaternion to yield quaternion
	quaternion.q0 += qDot1 * (1.0f / sampleFrequency);
	quaternion.q1 += qDot2 * (1.0f / sampleFrequency);
	quaternion.q2 += qDot3 * (1.0f / sampleFrequency);
	quaternion.q3 += qDot4 * (1.0f / sampleFrequency);

	// Normalise quaternion
	recipNorm = invSqrt2(quaternion.q0 * quaternion.q0 + quaternion.q1 * quaternion.q1 + quaternion.q2 * quaternion.q2 + quaternion.q3 * quaternion.q3);
	quaternion.q0 *= recipNorm;
	quaternion.q1 *= recipNorm;
	quaternion.q2 *= recipNorm;
	quaternion.q3 *= recipNorm;

	return quaternion;
}

To make it even weirder, replacing the delay line by:

if(isnan(quaternion.q0)
{
}

Will also result in the algorithm working without NAN values.

Some additional information about how the method is called:

The algorithm calculates an orientation that is returned as a quaternion (struct that consists of floats q0,q1,q2,q3)
This algorithm is called 100 times per second.
This is done via hardware timer 2 to generate an interrupt:

/*Interrupt service routines (ISR)*/
ISR(TIMER2_COMPB_vect)
{
	TCNT2 = 0;
	isr_timer_fired = true;
}

This interrupt sets a flag upon which a task will be executed:

	if(isr_timer_fired)
	{
		isr_timer_fired = false;
		Process100HzTask();
	}

This task then updates its orientation via the Madwick algotitm:

void Process100HzTask(void)
{
	{....} Filter incoming sensor data noise
	
	if(isnan(sensorData.compassX) || isnan(sensorData.compassY) || isnan(sensorData.compassZ) || isnan(sensorData.accelX) || isnan(sensorData.accelY) || isnan(sensorData.accelZ) || isnan(sensorData.gyroX)  || isnan(sensorData.gyroY) || isnan(sensorData.gyroZ))
	{
		Serial.println("found error\n\n\n\n\n\n");
	}

	orientationQuat = imu.MadgwickAHRSupdate(sensorData.gyroX,sensorData.gyroY,sensorData.gyroZ,
                                                                    -sensorData.accelX,-sensorData.accelY,sensorData.accelZ,
                                                                     sensorData.compassX,-sensorData.compassY,sensorData.compassZ);

	if(isnan(orientationQuat.q0) && isnan(orientationQuat.q1) && isnan(orientationQuat.q2) && isnan(orientationQuat.q3))
	{
		Serial.println("q0 t/m q3 are nan");
	}

       {...} stabilize quad copter
}

End part 1.

MadgwickAHRS.cpp (10.7 KB)

MadgwickAHRS.h (1.51 KB)

Part 2:

Summary:
"found error\n\n\n\n\n\n" is never printed
"q0 t/m q3 are nan" is printed default
"q0 t/m q3 are nan" is not printed when debug code (i.e. "isnan" or "delay") is added to the MadgwickAHRSupdate() method

What I have tried:
Checked the usage of SRAM, it was not even close to full.
Molded the IMU algotithm in a class with private variables instead of the standard volatile public floats and publicly accessible methods.

typedef struct Quaternion
{
	float q0;
	float q1;
	float q2;
	float q3;
} Quaternion;

class IMU_MADGWICK
{
private:
	float beta;				// algorithm gain
	float sampleFrequency;
	Quaternion quaternion;	// quaternion of sensor frame relative to auxiliary frame
	float recipNorm;
	float s0, s1, s2, s3;
	float qDot1, qDot2, qDot3, qDot4;
	float hx, hy;
	float _2q0mx, _2q0my, _2q0mz, _2q1mx, _2bx, _2bz, _4bx, _4bz, _2q0, _2q1, _2q2, _2q3, _2q0q2, _2q2q3, q0q0, q0q1, q0q2, q0q3, q1q1, q1q2, q1q3, q2q2, q2q3, q3q3;

public:
	void init(float,float);
	Quaternion MadgwickAHRSupdate(float, float, float, float, float, float, float, float, float);
	Quaternion MadgwickAHRSupdateIMU(float, float, float, float, float, float);
	float invSqrt2(float);

};

If I left aspects of my question unclear please let me now, I tried to describe the problem as best as I can.

Hoping someone will point out a silly mistake,

Ron

The fact that adding the call to delayMicroseconds() makes a difference suggests either
an uninitialized float variable or a compiler bug. I'd first check to see if any of the routes
though that routine can use an uninitialized float (which could easily cause a NaN).

I haven't followed through all the code, and without having the actual sketch as well I'm not inclined to try, but I wonder whether you may be running into a memory exhaustion or memory corruption problem.

Is it practical for you to produce a minimal sketch that produces the same symptoms without requiring any external devices to be connected - I mean by hardcoding some sample data values instead of receiving them from the device? If this is related to memory corruption/exhaustion then the behaviour may vary in obscure ways due to seemingly unrelated changes in the code, in which case it may not be feasible to reproduce in a minimal sketch, but just knowing that would be a useful clue.

Thanks for the quick replies, I have tried your suggestions:

MarkT:
The fact that adding the call to delayMicroseconds() makes a difference suggests either
an uninitialized float variable or a compiler bug. I'd first check to see if any of the routes
though that routine can use an uninitialized float (which could easily cause a NaN).

I have added all used variables in the init() function of the IMU. They are all initialized at "0.0f" now.
Unfortunately this did not solve the problem. The function uses no external variables so I think we can rule out uninitialized variables.

PeterH:
I haven't followed through all the code, and without having the actual sketch as well I'm not inclined to try, but I wonder whether you may be running into a memory exhaustion or memory corruption problem.

Is it practical for you to produce a minimal sketch that produces the same symptoms without requiring any external devices to be connected - I mean by hardcoding some sample data values instead of receiving them from the device? If this is related to memory corruption/exhaustion then the behavior may vary in obscure ways due to seemingly unrelated changes in the code, in which case it may not be feasible to reproduce in a minimal sketch, but just knowing that would be a useful clue.

I hadn't tried a minimal sketch yet and I agree with your suspicion that it is a memory bug/corruption.
So, I made a minimal sketch (see link at bottom, it was to large for standard upload), I entered some hard-coded data for the IMU and everything worked fine.
I was not able to reproduce the problem/symptoms.

So I suspect it is a memory problem, do you also agree or am I jumping to conclusions?
And how do I solve that problem? Where to start looking?

P.S. The sketch is made with the Visual Micro plugin for Microsoft Visual Studio, I don't know if this gives issues with the default Arduino IDE.

https://dl.dropboxusercontent.com/u/6492315/Nan_problem.zip

Try adding the F() macro to your Serial prints to reduce your RAM usage.

just for sake of readability, replace

		s0 = -_2q2 * (2.0f * q1q3 - _2q0q2 - ax) + _2q1 * (2.0f * q0q1 + _2q2q3 - ay) - _2bz * quaternion.q2 * (_2bx * (0.5f - q2q2 - q3q3) + _2bz * (q1q3 - q0q2) - mx) + (-_2bx * quaternion.q3 + _2bz * quaternion.q1) * (_2bx * (q1q2 - q0q3) + _2bz * (q0q1 + q2q3) - my) + _2bx * quaternion.q2 * (_2bx * (q0q2 + q1q3) + _2bz * (0.5f - q1q1 - q2q2) - mz);
		s1 = _2q3 * (2.0f * q1q3 - _2q0q2 - ax) + _2q0 * (2.0f * q0q1 + _2q2q3 - ay) - 4.0f * quaternion.q1 * (1 - 2.0f * q1q1 - 2.0f * q2q2 - az) + _2bz * quaternion.q3 * (_2bx * (0.5f - q2q2 - q3q3) + _2bz * (q1q3 - q0q2) - mx) + (_2bx *quaternion.q2 + _2bz * quaternion.q0) * (_2bx * (q1q2 - q0q3) + _2bz * (q0q1 + q2q3) - my) + (_2bx * quaternion.q3 - _4bz * quaternion.q1) * (_2bx * (q0q2 + q1q3) + _2bz * (0.5f - q1q1 - q2q2) - mz);
		s2 = -_2q0 * (2.0f * q1q3 - _2q0q2 - ax) + _2q3 * (2.0f * q0q1 + _2q2q3 - ay) - 4.0f *quaternion.q2 * (1 - 2.0f * q1q1 - 2.0f * q2q2 - az) + (-_4bx *quaternion.q2 - _2bz * quaternion.q0) * (_2bx * (0.5f - q2q2 - q3q3) + _2bz * (q1q3 - q0q2) - mx) + (_2bx * quaternion.q1 + _2bz * quaternion.q3) * (_2bx * (q1q2 - q0q3) + _2bz * (q0q1 + q2q3) - my) + (_2bx * quaternion.q0 - _4bz *quaternion.q2) * (_2bx * (q0q2 + q1q3) + _2bz * (0.5f - q1q1 - q2q2) - mz);
		s3 = _2q1 * (2.0f * q1q3 - _2q0q2 - ax) + _2q2 * (2.0f * q0q1 + _2q2q3 - ay) + (-_4bx *quaternion.q3 + _2bz * quaternion.q1) * (_2bx * (0.5f - q2q2 - q3q3) + _2bz * (q1q3 - q0q2) - mx) + (-_2bx * quaternion.q0 + _2bz *quaternion.q2) * (_2bx * (q1q2 - q0q3) + _2bz * (q0q1 + q2q3) - my) + _2bx * quaternion.q1 * (_2bx * (q0q2 + q1q3) + _2bz * (0.5f - q1q1 - q2q2) - mz);

with code that has extracted common parts like this - 2 minutes in an editor -

float c1 = (2.0f * q1q3 - _2q0q2 - ax);
float c2 = (2.0f * q0q1 + _2q2q3 - ay);
float c3 = (_2bx * (0.5f - q2q2 - q3q3) + _2bz * (q1q3 - q0q2) - mx);
float c4 = (_2bx * (q1q2 - q0q3) + _2bz * (q0q1 + q2q3) - my);
float c5 = (_2bx * (q0q2 + q1q3) + _2bz * (0.5f - q1q1 - q2q2) - mz);
float c6 = (1 - 2.0f * q1q1 - 2.0f * q2q2 - az);

s0 = -_2q2 * c1 + _2q1 * c2 - _2bz * quaternion.q2 * c3 + (-_2bx * quaternion.q3 + _2bz * quaternion.q1) * c4 + _2bx * quaternion.q2 * c5;
s1 = _2q3 * c1 + _2q0 * c2 - 4.0f * quaternion.q1 * c6 + _2bz * quaternion.q3 * c3 + (_2bx *quaternion.q2 + _2bz * quaternion.q0) * c4 + (_2bx * quaternion.q3 - _4bz * quaternion.q1) * c5;
s2 = -_2q0 * c1 + _2q3 * c2 - 4.0f *quaternion.q2 * c6 + (-_4bx *quaternion.q2 - _2bz * quaternion.q0) * c3 + (_2bx * quaternion.q1 + _2bz * quaternion.q3) * c4 + (_2bx * quaternion.q0 - _4bz *quaternion.q2) * c5;
s3 = _2q1 * c1 + _2q2 * c2 + (-_4bx *quaternion.q3 + _2bz * quaternion.q1) * c3 + (-_2bx * quaternion.q0 + _2bz *quaternion.q2) * c4 + _2bx * quaternion.q1 * c5;

This should speed up the floating point math as these parts are only calculates once.
If you want to do this 100x per second you should keep an eye open for (obvious) optimizations.

further more it allows you to debug these quite complex formulas a bit easier,

my 2 cents,

AWOL:
Try adding the F() macro to your Serial prints to reduce your RAM usage.

I have checked my SRAM usage and it reported that 5576 bytes were still available.
For this I used code from Arduino Playground - HomePage

int freeRam () {
  extern int __heap_start, *__brkval; 
  int v; 
  return (int) &v - (__brkval == 0 ? (int) &__heap_start : (int) __brkval); 
}

So I don´t believe RAM usage is the issue here. And I don´t print large strings that are known compile-time so I doubt using the F() macro will make a dent in the RAM usage.
But thanks for the tip, maybe I will still use it for optimizations.

robtillaart:
just for sake of readability, replace {...}

This should speed up the floating point math as these parts are only calculates once.
If you want to do this 100x per second you should keep an eye open for (obvious) optimizations.

further more it allows you to debug these quite complex formulas a bit easier,

my 2 cents,

Thanks for that free speed increase Rob, I did not write the algorithm myself so I assumed it would be optimized. Clearly I was wrong.

Unfortunately I still have no clue how to solve my current problem....

Try the optimization anyway, perhaps the compiler got confused with the large expressions there.

If you put another statement iso the delay or the isnan() call e.g. a digitalRead() does it work then too?

A call to a different 'context' (e.g. function / group of vars) is an signal to the compiler to stop adding more lines of code in the block it is trying to optimize.

as far as I checked and understood the code I see nothing that explains the effect you see.

You are running MEGA or DUE?

Are you reading analog input? If so, it sometimes needs a delay, or two reads to get the correct value.
just my 2cents. Jack

It seems you have ample RAM and I don't see any pointers or arrays, which would be the most obvious opportunity for memory corruption.

Just for the sake of argument, could you try explicitly initialising every automatic local variable at the point it is defined? If you have any variables which are used before being assigned then their initial values would depend on whatever was left in that memory location, which could be affected by seemingly unrelated code changes.

some post back you stated

The algorithm calculates an orientation that is returned as a quaternion (struct that consists of floats q0,q1,q2,q3)
This algorithm is called 100 times per second.
This is done via hardware timer 2 to generate an interrupt:

/*Interrupt service routines (ISR)*/

ISR(TIMER2_COMPB_vect)
{
TCNT2 = 0;
isr_timer_fired = true;
}




This interrupt sets a flag upon which a task will be executed:


if(isr_timer_fired)
{
isr_timer_fired = false;
Process100HzTask();
}

assuming the interrupt (upper code) is called 100x per second that means that the flag isr_timer_fired is set 100x per second.
This does not imply that the second snippet of code is guaranteed to be checked that often. If it is in a tight loop it will checked as much as possible.
Being called at 100Hz means you have in theory 10 milliseconds to execute, in practice much less depending on the rest of the code.
Seeing print statements like Serial.println("q0 t/m q3 are nan"); in the Process100HzTask() that take approx 2-3 millis there is little time to do something meaningfull.

other optimization opportunity I saw: (float division is quite expensive)

	quaternion.q0 += qDot1 * (1.0f / sampleFrequency);
	quaternion.q1 += qDot2 * (1.0f / sampleFrequency);
	quaternion.q2 += qDot3 * (1.0f / sampleFrequency);
	quaternion.q3 += qDot4 * (1.0f / sampleFrequency);

furthermore I see 2 normalize steps - InvSqrt2() - which might be merged reducing the floating point operations a bit further. (depends if the intermediate results are used or not)

Can you make a measurement to see how long the 100HzTask() takes ?

@Nick Gammon:
I have implemented the optimization suggested by robtillaart. It worked but didn't fix the problem.

@jackwp
I do not read analog values. Only I2C.

@PeterH
I have added the following code at the top of the method:

float recipNorm = 0.0f;

	float s0 = 0.0f;
	float s1 = 0.0f;
	float s2 = 0.0f;
	float s3 = 0.0f;

	float qDot1 = 0.0f;
	float qDot2 = 0.0f;
	float qDot3 = 0.0f;
	float qDot4 = 0.0f;

	float hx = 0.0f;
	float hy = 0.0f;

	float _2q0mx = 0.0f;
	float _2q0my = 0.0f;
	float _2q0mz = 0.0f;
	float _2q1mx = 0.0f;
	float _2bx = 0.0f;
	float _2bz = 0.0f;
	float _4bx = 0.0f;
	float _4bz = 0.0f;
	float _2q0 = 0.0f;
	float _2q1 = 0.0f;
	float _2q2 = 0.0f;
	float _2q3 = 0.0f;
	float _2q0q2 = 0.0f;
	float _2q2q3 = 0.0f;
	float q0q0 = 0.0f;
	float q0q1 = 0.0f;
	float q0q2 = 0.0f;
	float q0q3 = 0.0f;
	float q1q1 = 0.0f;
	float q1q2 = 0.0f;
	float q1q3 = 0.0f;
	float q2q2 = 0.0f;
	float q2q3 = 0.0f;
	float q3q3 = 0.0f;

This didn't fix the problem unfortunately.

@robtillaart

robtillaart:
You are running MEGA or DUE?

I am using a MEGA.

robtillaart:
assuming the interrupt (upper code) is called 100x per second that means that the flag isr_timer_fired is set 100x per second.
This does not imply that the second snippet of code is guaranteed to be checked that often. If it is in a tight loop it will checked as much as possible.
Being called at 100Hz means you have in theory 10 milliseconds to execute, in practice much less depending on the rest of the code.
Seeing print statements like Serial.println("q0 t/m q3 are nan"); in the Process100HzTask() that take approx 2-3 millis there is little time to do something meaningfull.

I am aware of the fact that the timing is not absolutely guaranteed. I verify this once in a while with micros() measurements.
Also I do not use Serial.println() in my 100Hz task when I use the software to fly the quadcopter. It is there now purely for debugging purposes. (prints take a long (variable) time I know)

I did a measurement of the time it takes for the 100 Hz task to execute. Mind you this task does more then only calculate its orientation (the method where the problem exists).
The execution time measured by micros() was around 7500 usec, which is 7.5 milli seconds.
I consider this to be the maximum acceptable time.

robtillaart:
If you put another statement iso the delay or the isnan() call e.g. a digitalRead() does it work then too?

Yes the problem disappeared when I added digitalRead().

Does this suggest that the compiler is optimizing something it shouldn't?

If you put another statement iso the delay or the isnan() call e.g. a digitalRead() does it work then too?
Yes the problem disappeared when I added digitalRead().

Does this suggest that the compiler is optimizing something it shouldn't?

No, it only suggests that the code is somehow critical, and 'breaking' it's flow seems to circumvent the root cause.
Three completely different ways of breaking the flow have the same effect makes the problem at least repeatable and independent of the 'inserted code'.

possible causes:

  • compiler optimizations ==> Can you post the output of the compiler ? (change build.verbose=true in the preferences.txt )

  • stack/heap growth/overflow corrupts local variables ==> Can you post the complete code as there may be other clues in it?

  • timing ==> how many interrupt related code exists?

  • variables overwritten e.g. due faulty pointer addressing - do you use pointer math?

  • did you use the last version of the libraries? ==> are there release notes or blogs whatever?

As said before, try to minimize the sketch that still shows this behaviour as that makes it easier to debug (especially remotely :wink:

You posted the .cpp and .h code for the class where the problem manifests but I don't think you ever posted your sketch. Can you post it?

robtillaart:

  • compiler optimizations ==> Can you post the output of the compiler ? (change build.verbose=true in the preferences.txt )

I use Visual Micro, a plugin for Microsoft Visual Studio 2010 to build and upload my Arduino code. I don't know how I can set compiler flags. I have searched briefly for this with no results so far.

robtillaart:

  • stack/heap growth/overflow corrupts local variables ==> Can you post the complete code as there may be other clues in it?

The entire sketch can be found here:
https://dl.dropboxusercontent.com/u/6492315/FlightControllerForumUpload.zip

robtillaart:

  • timing ==> how many interrupt related code exists?

Only the timer interrupt that I showed in my opening post.
And perhaps the default Arduino "Wire" and "Servo" uses some interrupt based code.

robtillaart:

  • variables overwritten e.g. due faulty pointer addressing - do you use pointer math?

I used pointers on a few locations, but nothing complicated I think. I don't know were to start looking.

robtillaart:

  • did you use the last version of the libraries? ==> are there release notes or blogs whatever?

The included libraries from the arduino library "arduino-1.0.1" I use the standard Servo and Wire library.

Edit:
The sketch/project was made in Visual Studio 2010 with the Visual Micro plugin

There's a lot of code there and I have only skimmed the main sketch file so far.

I don't understand why you're allocating inputString on the heap instead of just declaring it as a global/static char array.

You serial buffering does not seem very defensive - there is no protection against overrunning the buffer either when you append input chars or when you clear the buffer subsequently. If you handle the input correctly you don't need to zero the buffer, by the way.

Since you don't explicitly zero inputString before receiving the first message, and don't explicitly terminate the received string as you buffer it, the first message you parse will not be terminated. Fortunately you use atoi(), which can be expected to stop processing the buffer when it reaches the first non-numeric character, so it might not cause you any problem, but this is still wrong. I suggest declaring inputString as a global/static array. Make sure the size of inputString is one greater than charBufferSize so that charBufferSize is the number of chars that the buffer can hold and the buffer has space for the trailing null terminator. After reading inChar test that serialCnt is less than charBufferSize before appending the char to the inputString. Immediately after appending a char to inputString, write a null to the following element.

char inChar = (char)Serial.read(); 

if(inChar == '\n')
{
    if(serialCnt > 0)
    {
        // process the received message
    }
    serialCnt = 0;
    inputString[serialCnt] = '\0';
}
else
{
    if(serialCnt < charBufferSize)
    {
        inputString[serialCnt++] = inChar;
        inputString[serialCnt] = '\0';
    }
}

Lose the call to Serial.flush().

I still don't see what the problem is. There is a lot of code I haven't looked at, and I already see use of malloc() and pointers, so there is potential for bugs leading to memory corruption, but I haven't seen any definite bugs yet. It also occurs to me that you have a lot of expensive calculations going on and I wonder whether there may be a race condition between completing these calculations and receiving further input from one of your sensors, or the next timed interrupt occurring, which is affecting the logic of the subsequent iteration.

@PeterH
I am sorry, I apparently left some old 'dead' code in the sketch.
The methods "int ParseMessage(void)" and "void ReadSerialChar(void)" are not used anymore.
These functions have been extended and moved to the "SerialHandler" class.
That being said you are right, the buffer is not programmed very defensively.
The new serial code works differently but still has no protection against buffer overrun.
And that malloc is gone now, it wasn't used either :S

I will incorporate your advise in a new version of my software but first I want to solve this weird Nan bug.
I have tried to isolate the problem.
I disabled the 50,25,10,1 Hz tasks. -> problem was still there
I commented the fillSensorData() method -> problem was gone
I commented various parts from the fillSensorData() method -> the problem was gone, was there and sometimes or the problem only occurred after a few dozen calculations/iterations....
I really could not find a pattern.

Maybe you the bug is somewhere in here? (I have added some explicit float initialization)

void fillSensorData(SensorData* sd,Sensors s)
{
	float ax,ay,az;
	float gx,gy,gz;
	float cx,cy,cz;
	ax = 0.0f;
	ay = 0.0f;
	az = 0.0f;
	gx = 0.0f;
	gy = 0.0f;
	gz = 0.0f;
	cx = 0.0f;
	cy = 0.0f;
	cz = 0.0f;

	s.compass.getXYZ(&cx,&cy,&cz);
	s.accel.getXYZ_ms2(&ax,&ay,&az);
	s.gyro.getXYZ_degPerSec(&gx,&gy,&gz);
	gx *= (M_PI/180.0);
	gy *= (M_PI/180.0);
	gz *= (M_PI/180.0);
	sd->accelX = ax;
	sd->accelY = ay;
	sd->accelZ = -az;
	sd->gyroX = -gx;
	sd->gyroY = -gy;
	sd->gyroZ = gz;
	sd->compassX = cx;
	sd->compassY = cy;
	sd->compassZ = cz;
	batterySum += analogRead(A0);
	batterySum -= batterySum/batteryFilterLength;
	sd->battVoltage = (float)(batteryScaleFactor * (float) (batterySum/batteryFilterLength));

}

what is the value of batteryFilterLength ?

If it is zero you get a division by zero which may cause NaN... (I assume it is > 0)