Bear in mind that "rail-to-rail" outputs don't actually reach the rails once you load them: the more current the output has to source or sink, the further from the rail it ends up. I also believe bandwidth and slew rate are greatly reduced when driving close to the rails...
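Some parts are very good (IIRC they use both FETs and BJTs in the output stage), but in general you have to be wary of this limitation. Going by the datasheet graphs, the LM258 family looks pretty good when lightly loaded, at least towards the negative rail; its output high level still stays roughly 1.5 V below the positive supply.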
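
To get a rough feel for how load current eats into the output swing, here is a minimal sketch that treats the output stage as a fixed resistance to each rail. That resistive model, the function name `output_swing`, and the 50 Ω figure are all assumptions for illustration, not values from any datasheet; real parts are nonlinear, so use the "output voltage vs. load current" graphs for actual design work.

```python
def output_swing(v_pos, v_neg, i_load, r_high, r_low):
    """Estimate the reachable output range under load.

    Assumed model (illustrative only): the output stage behaves like a
    fixed resistance r_high to the positive rail when sourcing and
    r_low to the negative rail when sinking. i_load is the magnitude of
    the load current in amps.
    """
    v_high_max = v_pos - i_load * r_high  # highest output while sourcing i_load
    v_low_min = v_neg + i_load * r_low    # lowest output while sinking i_load
    return v_high_max, v_low_min


# Hypothetical 5 V single-supply part with 50 ohm to each rail.
for i_load in (10e-6, 1e-3, 10e-3):
    v_high, v_low = output_swing(5.0, 0.0, i_load, 50.0, 50.0)
    print(f"{i_load * 1e3:6.2f} mA load: output range {v_low:.4f} V to {v_high:.4f} V")
```

With these assumed numbers a 10 µA load costs well under a millivolt of swing, while a 10 mA load costs about 0.5 V at each rail, which is why "lightly loaded" matters so much.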