Maybe this was not really clear: the filtering code was written by scjurgen, I just reused it.
I wrote a small C program to run some tests and to understand the calculations better.
Data multiplied by 0.08 is always less than 82 (1023 * 0.08 = 81.84), but the intermediate "long tmp" value is much bigger in binary representation (18 up to 30 bits in my tests).
I have split the calculation up into four parts and I print the intermediate results in decimal and binary.
The program prints the tmp value for three possible incoming samples: 1, 512 and 1023 (using 0,0,0 or 1023,1023,1023 as the 'history'):
tmp1 = (data * 662828L);
--> print tmp1: tmp1: 662828
tmp2 = ((data * 662828L) >> 4);
--> print tmp2...
tmp3 = ((((data * 662828L) >> 4) + ((_v * -540791L) >> 1) + (_v * 628977L))+262144);
--> print tmp3...
tmp4 = ((((data * 662828L) >> 4) + ((_v * -540791L) >> 1) + (_v * 628977L))+262144) >> 19;
--> print tmp4...
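
For anyone who wants to reproduce the four steps without downloading the test program, here is a rough sketch of them as one C function. This is not scjurgen's code or my test program. The names FilterSteps, filter_steps, bit_length and print_steps are made up for illustration, and v0/v1 are assumed names for the two history values (the lines above only show "_v"). I use int32_t to mimic the 32-bit long on the Arduino. The two shifts (>> 4 and >> 19) together divide by 2^23, and 662828 / 2^23 is about 0.079, which is where the "0.08" comes from; 262144 is 2^18, half of 2^19, so it rounds the final shift.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Intermediate results of one filter step, split as in the post. */
typedef struct {
    int32_t tmp1; /* data * 662828 */
    int32_t tmp2; /* tmp1 >> 4 */
    int32_t tmp3; /* tmp2 + history terms + rounding constant */
    int32_t tmp4; /* tmp3 >> 19, the final result */
} FilterSteps;

/* Number of bits needed to hold the magnitude of value (0 for value 0). */
static int bit_length(int32_t value)
{
    uint32_t magnitude = value < 0 ? 0u - (uint32_t)value : (uint32_t)value;
    int bits = 0;
    while (magnitude != 0) {
        bits++;
        magnitude >>= 1;
    }
    return bits;
}

/* v0 and v1 are assumed names for the two history values. */
static FilterSteps filter_steps(int32_t data, int32_t v0, int32_t v1)
{
    FilterSteps s;

    /* Values in the 10-bit range 0..1023 keep every product inside 32 bits. */
    assert(data >= 0 && data <= 1023);
    assert(v0 >= 0 && v0 <= 1023 && v1 >= 0 && v1 <= 1023);

    s.tmp1 = data * INT32_C(662828);
    s.tmp2 = s.tmp1 >> 4;
    /* Right-shifting a negative value is implementation-defined in C;
       gcc shifts arithmetically (sign extension), which is assumed here. */
    s.tmp3 = s.tmp2
           + ((v0 * INT32_C(-540791)) >> 1)
           + (v1 * INT32_C(628977))
           + INT32_C(262144);
    s.tmp4 = s.tmp3 >> 19;
    return s;
}

/* Print the four intermediate values with their bit lengths. */
static void print_steps(int32_t data, int32_t v0, int32_t v1)
{
    FilterSteps s = filter_steps(data, v0, v1);
    printf("data=%ld history=%ld,%ld\n", (long)data, (long)v0, (long)v1);
    printf("tmp1: %ld (%d bits)\n", (long)s.tmp1, bit_length(s.tmp1));
    printf("tmp2: %ld (%d bits)\n", (long)s.tmp2, bit_length(s.tmp2));
    printf("tmp3: %ld (%d bits)\n", (long)s.tmp3, bit_length(s.tmp3));
    printf("tmp4: %ld (%d bits)\n", (long)s.tmp4, bit_length(s.tmp4));
}
```

Calling print_steps(1, 0, 0) and print_steps(1023, 1023, 1023) from main() covers the two extremes: for data = 1 with an all-zero history, tmp1 is 662828 (20 bits) and tmp4 is 0, while for data = 1023 tmp1 already needs 30 bits.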
Total output: http://arduino-signal-filtering-library.googlecode.com/hg-history/2770a5e0dae02e0ef56ace52d35fc6c350648b25/TestCode/FilterShiftingTest/debug.txt
The code (plus compiled Linux and win32 binaries): http://code.google.com/p/arduino-signal-filtering-library/source/browse/TestCode#TestCode/FilterShiftingTest