When they say to "integrate the rate", that is just a fancy way of saying "add up all the changes". Say you know your starting angle is S. If, M microseconds after your last reading, you check your gyro and it says the angle is changing at a rate of R degrees/second, then you update S by adding R * M / 1000000. (The constant converts microseconds to seconds so they match the units of R.) Then you keep doing this, hopefully fast enough to catch all the little bumps and valleys in R.
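As a minimal sketch of that update in C: the struct and function names are hypothetical, the rate is assumed to be already converted to degrees/second, and the elapsed time is assumed to come from whatever microsecond timer your board provides.

```c
#include <stdint.h>

/* Hypothetical state holding the current angle estimate S, in degrees. */
typedef struct {
    float angle_deg;
} gyro_integrator;

/* Add one gyro reading to the estimate.
   rate_dps   = R, the gyro rate in degrees/second
   elapsed_us = M, microseconds since the previous reading */
static void integrate_rate(gyro_integrator *g, float rate_dps, uint32_t elapsed_us)
{
    /* Divide by 1000000 to turn microseconds into seconds, then
       degrees/second * seconds = degrees of change. */
    g->angle_deg += rate_dps * ((float)elapsed_us / 1000000.0f);
}
```

Call it once per gyro sample with the time since the previous sample; for example, starting at 10 degrees, a reading of 50 degrees/second after 20000 microseconds moves S to 11 degrees. Floats are used here only for readability; on a small microcontroller a fixed-point version may be cheaper.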

As you say, S will slowly drift off and your robot will fall over. In certain circumstances, for example when the robot is not accelerating much so the accelerometers mostly see gravity, you can probably also use your accelerometers to compute an angle and use that to correct S.
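One common way to do that correction is a complementary filter (my substitution, not something stated above): trust the integrated gyro angle over short time scales and nudge it toward the accelerometer angle over long ones. In this sketch the accelerometer angle and the accel_valid flag are assumed inputs computed elsewhere (a typical validity test is that the measured acceleration magnitude is close to 1 g), and alpha is a hypothetical tuning constant slightly below 1.

```c
/* One step of a complementary filter.
   angle_deg       current estimate S (degrees)
   rate_dps        gyro rate R (degrees/second)
   dt_s            time since the last step (seconds)
   accel_angle_deg angle derived from the accelerometers (assumed input)
   accel_valid     nonzero when the accelerometer angle can be trusted
   alpha           weight on the gyro path, e.g. 0.98 (assumed value) */
static float complementary_update(float angle_deg, float rate_dps, float dt_s,
                                  float accel_angle_deg, int accel_valid,
                                  float alpha)
{
    /* Integrate the rate exactly as before. */
    float gyro_angle = angle_deg + rate_dps * dt_s;

    /* Without a trustworthy accelerometer reading, just keep integrating. */
    if (!accel_valid)
        return gyro_angle;

    /* Blend: mostly gyro, with a small pull toward the accelerometer angle,
       which stops the slow drift from accumulating forever. */
    return alpha * gyro_angle + (1.0f - alpha) * accel_angle_deg;
}
```

With alpha = 0.98 and steps of 0.01 s, a gyro that constantly reports a false 1 degree/second while the accelerometers read 0 degrees settles near 0.49 degrees instead of drifting without bound.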

Floating-point math and trig routines can take a large portion of the available memory on a small microcontroller. Fortunately, you don't really want to know the tangent or arctangent. What you want to know is the component of one vector in the direction of another vector. For that, learn about "dot products", which use only a little multiplication and addition and work very nicely with fixed-point numbers. Here is a quick tutorial on 3D math (but maybe follow the 2D link at the top to get started):

http://www.geocities.com/SiliconValley/2151/math3d.html

I haven't been through this one yet, but http://chortle.ccsu.edu/VectorLessons/vectorIndex.html looks very thorough, and I plan to read it myself before I build a motor controller.
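A minimal fixed-point dot product, assuming components stored as int16_t with 256 representing 1.0 (a Q8 format chosen here for illustration) and magnitudes small enough that the 32-bit sum cannot overflow. Because a·b = |a||b|cos θ, when b has unit length the result is the component of a in the direction of b, with no trig call at all. The type and function names are hypothetical.

```c
#include <stdint.h>

#define Q8_ONE 256  /* fixed-point scale: 256 means 1.0 (assumed format) */

typedef struct {
    int16_t x, y, z;  /* components in Q8 */
} vec3_q8;

/* Dot product of two Q8 vectors, returned in Q8.
   Each product of two Q8 numbers is Q16, so the sum is divided by 256
   to get back to Q8. Products are widened to 32 bits first because
   int may only be 16 bits on small CPUs. */
static int32_t dot_q8(vec3_q8 a, vec3_q8 b)
{
    int32_t sum = (int32_t)a.x * b.x
                + (int32_t)a.y * b.y
                + (int32_t)a.z * b.z;
    return sum / Q8_ONE;  /* truncates toward zero */
}
```

For example, with a = (3, 4, -2) and b the unit x axis (256, 0, 0), the result is 768, i.e. 3.0. The division truncates toward zero; an arithmetic right shift by 8 is often faster on small CPUs but rounds negative values downward instead.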

If you decide you must do trig, make yourself a constant array of 256 bytes holding the first quarter cycle of a sine wave, scaled from 0 to 255. With that and a little flipping (the rest of the cycle is just mirrored and/or negated copies of the first quarter) you can compute sine and cosine. If needed, you can linearly interpolate between entries for better resolution, and you can search the table with a binary search to get your inverse sine and cosine functions. I posted such a table somewhere in the forums, though only the array of bytes, without any of the supporting functions.
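A sketch of the quarter-wave table approach in C, under these assumptions: angles are measured in 1/1024ths of a turn so each quadrant is exactly 256 table steps, results are scaled to -255..255, and the table is filled at startup from a short Taylor series only so the example is self-contained. On the target you would paste in a precomputed const array instead. All names are hypothetical, and interpolation is left out for brevity.

```c
#include <stdint.h>

#define TABLE_SIZE  256   /* entries covering 0 <= angle < 90 degrees */
#define ANGLE_STEPS 1024  /* assumed angle unit: 1024 steps per full turn */
#define PI_D 3.14159265358979323846

static uint8_t sine_q[TABLE_SIZE];  /* on the target: a precomputed const array */

/* Taylor series for sin(x), only used to build the table without libm. */
static double taylor_sin(double x)
{
    double term = x, sum = x;
    for (int n = 1; n < 8; n++) {
        term *= -x * x / ((2 * n) * (2 * n + 1));
        sum += term;
    }
    return sum;
}

/* Entry i holds round(255 * sin(i * 90 degrees / 256)). */
static void build_sine_table(void)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        sine_q[i] = (uint8_t)(255.0 * taylor_sin(i * PI_D / (2.0 * TABLE_SIZE)) + 0.5);
}

/* Lookup for indexes 0..256; 256 (exactly 90 degrees) is the peak. */
static int quarter(int i)
{
    return i >= TABLE_SIZE ? 255 : sine_q[i];
}

/* sin(angle) scaled to -255..255, angle in 1/1024ths of a turn. */
static int sin_q(int angle)
{
    angle &= ANGLE_STEPS - 1;            /* wrap into one turn */
    int idx = angle & (TABLE_SIZE - 1);  /* position within the quadrant */
    switch (angle / TABLE_SIZE) {        /* quadrant 0..3 */
    case 0:  return  quarter(idx);               /* rising part as stored */
    case 1:  return  quarter(TABLE_SIZE - idx);  /* mirrored */
    case 2:  return -quarter(idx);               /* negated */
    default: return -quarter(TABLE_SIZE - idx);  /* mirrored and negated */
    }
}

/* cos(a) = sin(a + 90 degrees) */
static int cos_q(int angle)
{
    return sin_q(angle + TABLE_SIZE);
}

/* Inverse sine by binary search: smallest angle in 0..256 whose
   sine is >= v. Input -255..255; result -256..256. */
static int asin_q(int v)
{
    if (v < 0)
        return -asin_q(-v);  /* asin is an odd function */
    int lo = 0, hi = TABLE_SIZE;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (quarter(mid) < v)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* acos(v) = 90 degrees - asin(v); result 0..512. */
static int acos_q(int v)
{
    return TABLE_SIZE - asin_q(v);
}
```

Call build_sine_table() once at startup. After that, sin_q(128) (45 degrees) gives 180, about 0.706 after dividing by 255, and asin_q(180) gives back 128. The binary search works because the stored quarter wave never decreases.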