filter library

Your test just confirmed what I'm trying to explain. tmp4 is the value that goes into the history, and because tmp3 is shifted right by 19 bits, tmp4 never exceeds 10 bits. So the question is: why shift right by 19 bits when there is room for a 16-bit variable? Why not shift right by 13 bits
and keep 6 more bits, which is 64 times finer resolution? You can always scale it down later, when you use it again, if it's too big. This is about integer math, not the filter code.
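To make the point concrete, here is a minimal sketch in C. The accumulator values are made up for illustration and are not taken from the filter; the helper name scale_down is hypothetical as well. It only shows what a right shift throws away.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a non-negative 32-bit accumulator down by `shift` bits.
   Everything below bit `shift` is discarded for good. */
static int32_t scale_down(int32_t acc, unsigned shift)
{
    /* Keep the sketch free of sign and undefined-shift questions. */
    assert(acc >= 0 && shift < 31);
    return acc >> shift;
}

/* Hypothetical accumulator values (not from the filter).
   They differ by 300000, which is less than 2^19 = 524288. */
static const int32_t ACC_LOW  = 100L * 524288L;
static const int32_t ACC_HIGH = 100L * 524288L + 300000L;
```

With these values, scale_down(ACC_LOW, 19) and scale_down(ACC_HIGH, 19) both give 100, so the difference is gone. With a shift of 13 they give 6400 and 6436, so the difference survives and can still be scaled down later.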
In integer math, IMHO, you always try to keep the value as big as possible, just short of overflow,
simply because once a value is cast or shifted down, that precision is lost forever. In this particular case,
you would get the best results if each term in the formula were about 29 bits wide, so that the sum of the three does not overflow a 32-bit long.
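The 29-bit figure can be checked with a few lines of C. This is only a sketch of the headroom arithmetic; the helper names bit_width and three_terms_fit are made up for the example.

```c
#include <stdint.h>

/* Number of significant bits in v (0 for v == 0). */
static unsigned bit_width(uint32_t v)
{
    unsigned n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Returns 1 if the sum of three terms, each with magnitude at most
   `max_term`, is guaranteed to fit in a signed 32-bit long. */
static int three_terms_fit(uint32_t max_term)
{
    /* Worst case: all three terms have the same sign. */
    uint64_t worst = 3ull * max_term;
    return worst <= (uint64_t)INT32_MAX;
}
```

Three terms of at most 29 bits each always fit, while three full 30-bit terms are not guaranteed to. bit_width is handy for checking how many bits a given coefficient really needs.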
You are doing it about right, multiplying 10-bit data by 20-bit coefficients:

tmp = (A(20-bit) x Data(10-bit) + B(20-bit) x Hist1(10-bit) + C(20-bit) x Hist2(10-bit))>>19.

A = 662828 (1010 0001 1101 0010 1100 = 20 bits)
B = -540791 (magnitude 1000 0100 0000 0111 0111 = 20 bits)
C = 628977 (1001 1001 1000 1111 0001 = 20 bits)

Only it would be more accurate as:

tmp = (A(20-bit) x Data(10-bit) + B(14-bit) x Hist1(16-bit) + C(14-bit) x Hist2(16-bit))>>13.
Wouldn't it?
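Here is a rough sketch of the two forms side by side, for a single step only. It makes some assumptions that are not in the code under discussion: the 14-bit B and C are simply the original values divided by 64 and truncated toward zero, the 16-bit history holds the value multiplied by 64, and the function names are made up. Note that cutting B and C to 14 bits costs some coefficient precision, so this is a trade, not a free gain.

```c
#include <assert.h>
#include <stdint.h>

/* Coefficients from the post, scaled by 2^19. */
#define COEF_A   662828L
#define COEF_B   (-540791L)
#define COEF_C   628977L

/* Assumed 14-bit versions: B and C divided by 64, truncated toward zero. */
#define COEF_B14 (-8449L)
#define COEF_C14 9827L

/* Original form: 10-bit history, result has no fractional bits. */
static int32_t step_shift19(int32_t data, int32_t hist1, int32_t hist2)
{
    int32_t acc = COEF_A * data + COEF_B * hist1 + COEF_C * hist2;
    assert(acc >= 0);  /* >> on a negative value is implementation-defined */
    return acc >> 19;
}

/* Proposed form: history is stored times 64 (6 fractional bits),
   so the result also comes out times 64. */
static int32_t step_shift13(int32_t data, int32_t hist1_q6, int32_t hist2_q6)
{
    int32_t acc = COEF_A * data + COEF_B14 * hist1_q6 + COEF_C14 * hist2_q6;
    assert(acc >= 0);  /* same caveat as above */
    return acc >> 13;
}
```

For example, with Data = 512, Hist1 = 300 and Hist2 = 400, step_shift19(512, 300, 400) gives 817, while step_shift13(512, 300 * 64, 400 * 64) gives 52333, which is about 817.7 in units of 1/64. Shifting that result right by 6 gives 817 again, so nothing is lost by keeping the extra bits until they are actually needed.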