Why would you need all of that stuff? I looked up Ethernet voltages, and the thread I found said less than four volts...
It's not a matter of voltage. Besides, the signals are differential.
If you take a closer look at most Ethernet devices you own, one component stands out because of its size: the transformers. The Arduino Ethernet shield uses a non-isolated approach instead: a resistor divider. This is actually very unusual; compare it with the reference design in the W5100 datasheet. Dropping the isolation can cause electrical problems, since nothing blocks a ground-potential difference between the two ends of the cable, and it might kill your controller or the switch.
Even if you could get rid of the MAX, you still need the Ethernet PHY. This chip is what actually transmits and receives bits over the medium:
http://en.wikipedia.org/wiki/Fast_Ethernet
"With 100BASE-TX hardware, the raw bits (4 bits wide clocked at 25 MHz at the MII) go through 4B5B binary encoding to generate a series of 0 and 1 symbols clocked at 125 MHz symbol rate. The 4B5B encoding provides DC equalization and spectrum shaping (see the standard for details). Just as in the 100BASE-FX case, the bits are then transferred to the physical medium attachment layer using NRZI encoding. However, 100BASE-TX introduces an additional, medium dependent sublayer, which employs MLT-3 as a final encoding of the data stream before transmission, resulting in a maximum 'fundamental frequency' of 31.25 MHz. The procedure is borrowed from the ANSI X3.263 FDDI specifications, with minor discrepancies."
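To make that encoding chain a bit more concrete, here is a minimal Python sketch of the 4B5B, NRZI and MLT-3 steps plus the rate arithmetic from the quote. It is only an illustration under some assumptions: nibbles are taken low-nibble-first from each byte, code-group bits are emitted left to right as written in the table, NRZI and MLT-3 are shown as standalone line codes applied to the 4B5B bits (not the exact layering in the standard), and the scrambler that real 100BASE-TX PHYs apply before MLT-3 is skipped. The function names are made up for the example.

```python
"""Sketch of the 100BASE-X encoding steps (4B5B, NRZI, MLT-3).

Illustrative only: no scrambler, no control code-groups (J/K, T/R, idle).
"""

# 4B5B data code-groups, indexed by nibble value 0x0..0xF.
FOUR_B_FIVE_B = [
    "11110", "01001", "10100", "10101",
    "01010", "01011", "01110", "01111",
    "10010", "10011", "10110", "10111",
    "11010", "11011", "11100", "11101",
]


def encode_4b5b(data: bytes) -> list[int]:
    """Replace every 4-bit nibble with its 5-bit code-group.

    Assumption: low nibble first (as on the MII), code-group bits
    emitted left to right as written in the table above.
    """
    bits: list[int] = []
    for byte in data:
        for nibble in (byte & 0x0F, byte >> 4):
            bits.extend(int(b) for b in FOUR_B_FIVE_B[nibble])
    return bits


def nrzi(bits: list[int], level: int = 0) -> list[int]:
    """NRZI: a 1 toggles the line level, a 0 keeps it."""
    out = []
    for b in bits:
        if b:
            level ^= 1
        out.append(level)
    return out


def mlt3(bits: list[int]) -> list[int]:
    """MLT-3: a 1 steps to the next level in the cycle 0, +1, 0, -1; a 0 holds."""
    cycle = (0, 1, 0, -1)
    state = 0
    out = []
    for b in bits:
        if b:
            state = (state + 1) % 4
        out.append(cycle[state])
    return out


# Rate arithmetic from the quote (integers, in Hz / baud).
MII_CLOCK_HZ = 25_000_000
DATA_RATE = 4 * MII_CLOCK_HZ           # 4 bits per MII clock -> 100 Mb/s
SYMBOL_RATE = DATA_RATE * 5 // 4       # 4B5B adds 25 % -> 125 Mbaud
# Worst case for MLT-3 is a run of 1s: one full 0,+1,0,-1 cycle every
# 4 symbols, i.e. a fundamental of SYMBOL_RATE / 4.
MLT3_MAX_FUNDAMENTAL_HZ = SYMBOL_RATE // 4   # 31.25 MHz

if __name__ == "__main__":
    code_bits = encode_4b5b(b"\x5a")
    print("4B5B :", code_bits)
    print("NRZI :", nrzi(code_bits))
    print("MLT-3:", mlt3(code_bits))
    print("MLT-3 max fundamental:", MLT3_MAX_FUNDAMENTAL_HZ / 1e6, "MHz")
```

Feeding a run of 1s to `mlt3` shows the four-symbol cycle behind the 31.25 MHz figure, and that is still a signal a microcontroller pin can't produce or sample on its own.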
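Even at 10 Mbps (10BASE-T) you'd run into trouble, even with its comparatively simple Manchester encoding: every bit cell carries a mid-cell transition, so the line runs at 20 Mbaud.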
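For comparison, a minimal Python sketch of Manchester coding at 10 Mbps. It assumes the IEEE 802.3 convention (0 = high-to-low, 1 = low-to-high transition in the middle of the bit cell) and models the line as 0/1 half-bit cells; the function names are only illustrative. It shows why even this "simple" code needs 20 Mbaud and up to a 10 MHz fundamental on the wire.

```python
"""Manchester coding as used by 10BASE-T (sketch, IEEE 802.3 convention)."""

BIT_RATE = 10_000_000          # 10 Mb/s
HALF_CELL_RATE = 2 * BIT_RATE  # two half-bit cells per bit -> 20 Mbaud


def manchester(bits: list[int]) -> list[int]:
    """Each bit becomes two half-cell levels: 0 -> (1, 0), 1 -> (0, 1)."""
    out: list[int] = []
    for b in bits:
        out.extend((0, 1) if b else (1, 0))
    return out


def transitions(levels: list[int]) -> int:
    """Count level changes between adjacent half-cells."""
    return sum(1 for a, b in zip(levels, levels[1:]) if a != b)


if __name__ == "__main__":
    ones = manchester([1] * 8)       # 0,1,0,1,...: square wave at BIT_RATE (10 MHz)
    alt = manchester([1, 0] * 4)     # 0,1,1,0,...: square wave at BIT_RATE / 2 (5 MHz)
    print("all ones   :", ones, transitions(ones), "transitions")
    print("alternating:", alt, transitions(alt), "transitions")
```

A constant stream of 1s (or 0s) toggles the line every half cell, which is where the 10 MHz worst case comes from; alternating bits halve that.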
So no, there's no way you can avoid a PHY here.