I guess I'd like to have a better idea of what the needs are in the first place.
How much data has to be pushed? That in turn will determine how many cycles each Arduino will spend transmitting and receiving the stuff. 1 Mbit/s over serial is easy to implement natively. That's a lot of data.
Allegedly, I2C gets up to 400 kbit/s in fast mode (100 kbit/s in standard mode). Sooner rather than later you'd run out of RAM at these speeds, so a 'bigger' processor may be needed not because of the bus speeds but because the Arduinos are presently not blessed with a lot of RAM.
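
To put rough numbers on the RAM point, here's a back-of-the-envelope sketch in Python (just arithmetic, nothing Arduino-specific). The assumptions are mine, not measurements: 2048 bytes of SRAM as on an ATmega328P-based board like the Uno, 8N1 serial framing (10 bits on the wire per byte), and I2C fast mode at 400 kbit/s with 9 clocks per byte (8 data bits plus ACK), ignoring addressing and start/stop overhead. The function names are made up for illustration.

```python
# Rough estimate: how long until incoming data would fill all of SRAM
# if nothing were consumed. All figures below are assumptions.

SRAM_BYTES = 2048  # assumption: ATmega328P-class board (e.g. Uno)


def payload_bytes_per_second(bit_rate, bits_per_byte):
    """Payload throughput given the raw bit rate and bits on the wire per byte."""
    return bit_rate / bits_per_byte


def ms_to_fill(buffer_bytes, bit_rate, bits_per_byte):
    """Milliseconds until buffer_bytes are received at the given rate."""
    return 1000.0 * buffer_bytes / payload_bytes_per_second(bit_rate, bits_per_byte)


# (raw bit rate, bits on the wire per payload byte)
LINKS = {
    "serial, 1 Mbit/s, 8N1": (1_000_000, 10),   # start + 8 data + stop
    "I2C fast mode, 400 kbit/s": (400_000, 9),  # 8 data + ACK, best case
}

if __name__ == "__main__":
    for name, (rate, bits) in LINKS.items():
        bps = payload_bytes_per_second(rate, bits)
        ms = ms_to_fill(SRAM_BYTES, rate, bits)
        print(f"{name}: {bps:.0f} B/s, fills {SRAM_BYTES} B in {ms:.2f} ms")
```

Under those assumptions the serial link would fill the whole 2 KB in roughly 20 ms, and in practice you can't dedicate all of SRAM to a receive buffer anyway. So unless the data is consumed about as fast as it arrives, the RAM is the limit long before the bus is.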
Very powerful alternatives that attempt to marry a 32-bit microprocessor with shield compatibility and fairly successful IDE compatibility include the Maple system from LeafLabs and the Digilent chipKIT boards. Neither is a drop-in replacement, and Arduino code may not work as well as intended, but these are powerful chips...