Just responding as I'm the guy who did the speed testing on the W5100, W5200, and W5500. I've had some conversations with WIZnet about these devices too, so a few not-so-obvious things have been made clear to me.
There are a couple of pretty significant differences between these devices that make me wonder why the W5100 is still the default standard for Arduino. The W5500 is the best chip to use today, and I'll try to explain why.
The W5200 and W5500 both have 8 sockets (the W5100 has 4), and double the Rx and Tx RAM of the W5100 (32 kB in total against 16 kB). This is really useful, as the extra buffering can save you from losing frames while the Arduino, which responds quite slowly, catches up.
The W5200 and W5500 both have streaming (burst) capability on the SPI interface. It is implemented a little differently between the two, but essentially they are much much (much) faster to drive than the W5100, IF you have a streaming SPI interface. The Arduino SPI library is (very) basic, and doesn't have any kind of interrupt-driven ring buffer or similar, which makes me wonder how much of the difference would really show up using the Arduino libraries. However, the fact that frame data can be read and written continuously, without re-transmitting the three header bytes (an opcode and two address bytes on the W5100) for each data byte, still results in a 4x speed-up over the W5100.
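To put a number on that overhead, here is a minimal sketch that just counts the bytes clocked over SPI for one buffer transfer. It assumes the frame layouts as I read them in the datasheets: the W5100 sends an opcode and two address bytes with every data byte, the W5200 sends a four byte header (address plus opcode/length) per burst, and the W5500 sends a three byte header (address plus control byte) per burst. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bytes clocked over SPI to move `payload` data bytes.
// Frame layouts are assumed from the datasheets; names are illustrative only.

// W5100: opcode + 2 address bytes + 1 data byte, repeated for every data byte.
uint32_t w5100_wire_bytes(uint32_t payload) {
    return 4u * payload;
}

// W5200: 2 address bytes + 2 opcode/length bytes, then the whole burst.
// The length field is 15 bits wide, so one burst carries at most 32767 bytes.
uint32_t w5200_wire_bytes(uint32_t payload) {
    assert(payload <= 0x7FFFu);
    return 4u + payload;
}

// W5500: 2 address bytes + 1 control byte, then the whole burst
// (variable length data mode).
uint32_t w5500_wire_bytes(uint32_t payload) {
    return 3u + payload;
}
```

For a 1024 byte transfer this gives 4096 bytes on the wire for the W5100, 1028 for the W5200 and 1027 for the W5500, which is where the roughly 4x figure comes from.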
The W5200 and W5500 both have a much faster SPI interface than the W5100. The W5100 has only a 4 MHz SPI interface, which means that it can only run at half the maximum SPI clock available on a 16 MHz Arduino board (8 MHz). You might be lucky and have one that exceeds specification, but don't count on it. The W5200 and W5500 are made to run at over 30 MHz SPI in real life, so the fastest clock an AVR ATmega can generate is no issue for them.
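As a rough illustration of how clock rate and framing overhead combine, the sketch below computes an idealised ceiling on payload throughput. It is only arithmetic: it ignores chip select handling and the time the host spends between bytes, so real Arduino numbers will be lower. The function name and parameters are my own, not from any library.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on payload bytes per second for a given SPI clock.
//   spi_hz        : SPI clock in Hz (one bit per clock)
//   wire_bytes    : total bytes clocked per transfer, headers included
//   payload_bytes : data bytes among them
// Idealised: no gaps between bytes, no chip select overhead.
uint64_t payload_ceiling(uint64_t spi_hz, uint64_t wire_bytes, uint64_t payload_bytes) {
    assert(wire_bytes != 0 && payload_bytes <= wire_bytes);
    return (spi_hz / 8u) * payload_bytes / wire_bytes;
}
```

With the numbers from this post, `payload_ceiling(4000000, 4, 1)` is 125000 bytes/s for a W5100 at 4 MHz (one data byte per four byte frame), while `payload_ceiling(8000000, 1027, 1024)` is 997078 bytes/s for a W5500 at 8 MHz moving 1 kB bursts behind a three byte header.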
So much for speed. Between the W5200 and the W5500 there are some further significant differences.
The packet RAM on the W5500 has been made available as general storage for the host MCU. Both Tx and Rx RAM are available for use as required. This means that it is possible to augment the RAM on an Arduino Uno by 16 kB (8 kB Tx and 8 kB Rx), which is 8x more than the 2 kB the ATmega328P has in total, and still keep the same 16 kB of packet buffers that the W5100 offers, for example.
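For anyone wanting to try this, the buffer RAM is reached with the same SPI frame as everything else on the W5500. Below is a minimal sketch of building the three byte header for a burst, assuming the control byte layout I remember from the datasheet (block select in the top five bits, then the read/write bit, then two mode bits, with 00 meaning variable length). The enum and function names are hypothetical, and the actual SPI transfer is left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Per-socket block types; block select = socket * 4 + type (assumed layout).
enum class Block : uint8_t { Register = 1, TxBuffer = 2, RxBuffer = 3 };

// Build the 3 byte header that precedes a variable length burst:
//   byte 0: offset address, high byte
//   byte 1: offset address, low byte
//   byte 2: control = block select << 3 | write bit << 2 | mode (00)
std::array<uint8_t, 3> w5500_header(uint8_t socket, Block block,
                                    uint16_t offset, bool write) {
    assert(socket < 8);  // the W5500 has 8 sockets
    uint8_t bsb = static_cast<uint8_t>(socket * 4 + static_cast<uint8_t>(block));
    uint8_t control = static_cast<uint8_t>((bsb << 3) | (write ? 0x04 : 0x00));
    return { static_cast<uint8_t>(offset >> 8),
             static_cast<uint8_t>(offset & 0xFF),
             control };
}
```

A write to the start of socket 0's Tx buffer comes out as `00 00 14`, and a read from offset 0x1234 of socket 1's Rx buffer as `12 34 38`. After the header you just keep clocking data bytes for as long as chip select is held low.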
The Tx and Rx RAM is arranged in blocks, one pair per socket, and the entire 16-bit address space of each block is mapped repeatedly onto the RAM configured for that socket. This means that when writing or reading the W5500 Tx and Rx RAM the user doesn’t need to be concerned with masking against the physical RAM size, and address roll-over is handled gracefully by the chip. This is unlike the W5100 and W5200, where the read and write pointers have to be masked against the configured physical RAM and a transfer that crosses the end of the buffer has to be split. If this sounds complicated, just check the datasheet, where it is explained in a nice diagram.
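To show what the masking amounts to on the older chips, here is a minimal sketch of the address calculation for one transfer into a W5100 style socket buffer. The base address and size in the example (0x4000 and 2 kB for socket 0 Tx) are the W5100 defaults as I remember them; the struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct Chunk { uint16_t addr; uint16_t len; };
struct Split { Chunk first; Chunk second; };  // second.len == 0 if no wrap

// Map a free-running 16-bit pointer onto a socket buffer of `size` bytes
// starting at physical address `base`, splitting the transfer if it wraps.
Split masked_transfer(uint16_t base, uint16_t size, uint16_t ptr, uint16_t len) {
    assert(size != 0 && (size & (size - 1)) == 0);  // size must be a power of two
    assert(len <= size);
    uint16_t mask = size - 1;
    uint16_t offset = ptr & mask;        // position inside the buffer
    uint16_t room = size - offset;       // bytes left before the end
    uint16_t first_len = len < room ? len : room;
    return { { static_cast<uint16_t>(base + offset), first_len },
             { base, static_cast<uint16_t>(len - first_len) } };
}
```

With base 0x4000, size 0x0800, pointer 0x17F0 and 32 bytes to write, this gives 16 bytes at 0x47F0 followed by 16 bytes at 0x4000. On the W5500 none of this is needed: the pointer value is sent as the offset address as-is and the chip wraps it inside the socket's block.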
And the final issue that I want to cover is that the W5500 finally resolves the long-standing ARP erratum that required holding the subnet mask register at 0.0.0.0 and storing the real mask off device. This erratum is present in the W5200 and W5100.
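For those stuck with a W5100 or W5200, the workaround looks roughly like the sketch below. It is an assumption on my part that the real mask only needs to be in the register around commands that trigger ARP (a TCP connect or a UDP send); check the erratum sheet for your chip. `FakeChip` stands in for the real subnet mask register so the idea can be run on a PC, and every name here is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Ip4 = std::array<uint8_t, 4>;
const Ip4 kZeroMask{{0, 0, 0, 0}};

// Stand-in for the chip; on real hardware `subr` would be the subnet mask
// register, written over SPI. Hypothetical scaffolding only.
struct FakeChip {
    Ip4 subr = kZeroMask;
};

// Keep the real mask in host RAM, load it only while a command that needs
// ARP is issued, then put the register back to 0.0.0.0.
template <typename Command>
void with_real_subnet(FakeChip& chip, const Ip4& saved_mask, Command issue_command) {
    assert(chip.subr == kZeroMask);  // register should be resting at 0.0.0.0
    chip.subr = saved_mask;          // apply the real mask
    issue_command();                 // e.g. socket connect or UDP send
    chip.subr = kZeroMask;           // and clear it again
}
```

On the W5500 you can drop all of this and simply write the subnet mask once at start-up.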