High Speed ADC

Has anybody had any success in pushing the ADC on an ATmega32U4 chip to faster conversions?

I know the datasheet specifies an ADC clock of 50 kHz to 200 kHz, but various people have successfully pushed a standard Uno up to 1 MHz with a prescaler of 16.

So far I've got my ATmega32U4 up to a 500 kHz ADC clock with a prescaler of 32, which gives reasonable results, occasionally losing the LSB. But when I try to push to 1 MHz, all the ADC returns is 1023 (dec) on every input. I've tried setting the DIDRx registers and the ADHSM bit, as below.

// Turn off digital inputs on analog lines and turn on high-speed ADC mode
DIDR0 |= 0xF3;            // bits 2 & 3 are not used
DIDR1 |= 0x1E;            // bits 0 & 5 used by digital lines for Ethernet shield
bitSet(ADCSRB, ADHSM);

// Set up ADC to 1 MHz clock so ADC samples take about 13 microseconds
// This reduces dead time when the processor can't do anything else
bitClear(ADCSRA, ADPS0);  // remove standard Arduino library prescaler bits
bitClear(ADCSRA, ADPS1);  // & set prescaler to 16
bitSet(ADCSRA, ADPS2);
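As a rough sanity check on the prescaler arithmetic above, here is a minimal host-side sketch in plain C (no AVR registers touched). It assumes a 16 MHz system clock, which is what the 500 kHz / prescaler 32 figure implies, and a normal conversion length of 13 ADC clock cycles. The function names are made up for illustration and are not part of any Arduino or avr-libc API.

```c
#include <assert.h>
#include <stdint.h>

/* ADC clock in Hz for a given CPU clock and prescaler (2, 4, ..., 128). */
uint32_t adc_clock_hz(uint32_t f_cpu_hz, uint8_t prescaler)
{
    assert(prescaler != 0);
    return f_cpu_hz / prescaler;
}

/* Duration of one normal conversion in microseconds,
 * assuming 13 ADC clock cycles per conversion. */
double conversion_time_us(uint32_t adc_hz)
{
    return 13.0 * 1e6 / (double)adc_hz;
}

/* Value of the ADPS2:0 field in ADCSRA for a power-of-two prescaler.
 * Returns 0xFF for a prescaler the hardware does not offer. */
uint8_t adps_bits(uint8_t prescaler)
{
    switch (prescaler) {
    case 2:   return 1;  /* 0b001 (0b000 also selects 2) */
    case 4:   return 2;  /* 0b010 */
    case 8:   return 3;  /* 0b011 */
    case 16:  return 4;  /* 0b100 */
    case 32:  return 5;  /* 0b101 */
    case 64:  return 6;  /* 0b110 */
    case 128: return 7;  /* 0b111 */
    default:  return 0xFF;
    }
}
```

With these assumptions, `adc_clock_hz(16000000UL, 16)` gives 1 MHz and `adps_bits(16)` gives 4, i.e. only ADPS2 set, which matches the three bitClear/bitSet calls above.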

The external AREF has a 0.1 µF decoupling capacitor, as do all the analog lines, which are driven from a slowly changing source with a nominal 1 kΩ impedance. So noise should not be a problem.

Are there any other suggestions out there? Ultimately I want to oversample differential inputs, but I'm stumped at the moment.
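For context, the kind of oversampling meant here is the usual "sum and decimate" approach: to gain n extra bits, add up 4^n raw readings and shift the sum right by n. Below is a minimal sketch in plain C, assuming unsigned single-ended 10-bit readings already collected into a buffer. Signed differential results would need a signed accumulator, and the technique only helps if the input carries roughly an LSB of noise. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine 4^extra_bits raw 10-bit readings into one result with
 * (10 + extra_bits) bits of resolution. extra_bits is capped at 6
 * so the result still fits in 16 bits. */
uint16_t oversample_decimate(const uint16_t *samples, size_t count,
                             uint8_t extra_bits)
{
    assert(extra_bits <= 6);
    assert(count == ((size_t)1 << (2 * extra_bits)));

    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += samples[i];           /* accumulate raw readings */
    }
    return (uint16_t)(sum >> extra_bits);  /* decimate */
}
```

For example, four readings alternating between 512 and 513 sum to 2050, and shifting right by one bit gives an 11-bit result of 1025, which is halfway between the two 10-bit codes.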