If the Arduino is a 16 MHz chip, wouldn't that mean 16,000,000 cycles / 100 microseconds (100 cycles) for the analogRead(), thus giving 160,000 reads a second and not 10,000 reads a second? I'm missing something big, but I'm not sure what it is!

The Atmega datasheet also cautions against switching analog pins in close temporal proximity to making A/D readings (analogRead) on other analog pins. This can cause electrical noise and introduce jitter in the analog system. It may be desirable, after manipulating analog pins (in digital mode), to add a short delay before using analogRead() to read other analog pins.

What does short delay mean? 1 microsecond, 10 microseconds, 10 milliseconds, 100 milliseconds?

Let's say the delay is 100 milliseconds, a pin takes 100 microseconds to read, and we can read 1 pin 10,000 times a second. How do we calculate how much time it would take to read all 6 analog inputs? Using my formula above, I would have guessed 16,000,000 / 600 (100 microseconds for each pin) = 26,666 reads a second. Obviously, that's wrong!

Now, let's say I use the CD74HC4067 to add more analog inputs. This chip provides 16 inputs for 1 input on the Arduino, so the total could be 96 analog inputs. How much slower would this be? Would it slow down linearly from one CD74HC4067 to two, to three, and so on, or would the slowdown be exponential?

Finally, let's say I multiplex 3 inputs for a total of 48 analog inputs, what type of sampling rate can I expect? What about multiplexing to the full 96 analog inputs?

wouldn't that means 16,000,000 cycles / 100 microseconds (100 cycles) for the analogRead(),

Obviously this doesn't work since the manual says there's a maximum of 10,000 readings per second and not 160,000... Why?

No, I still don't understand your arithmetic.

1 / 100 microseconds = 10,000 per second.

The processor clock speed doesn't enter the calculation like that.

The ADC is a successive approximation device, and each iteration is clocked by a divided-down processor clock.

The ADC takes 13 ADC clocks to convert. The standard prescaler divides the system clock (16 MHz) by 128 to give a 125 kHz ADC clock. Thus the conversion time is 13 × 8 µs = 104 µs.

For 10-bit resolution it is recommended that the ADC clock be set between 50 kHz and 200 kHz. Faster clocking means more noise / less accuracy.

The prescaler can be set to any power-of-two factor from 2 to 128. The ADC clock should not exceed 1 MHz. Setting the ADC prescaler to divide-by-16 would give a conversion time of 13 µs, but perhaps only 6 bits of accuracy, and a low-impedance source (1 kΩ or so) might be needed.

It takes 100 µs to do an A/D conversion. This figure already takes into account the running speed of the processor, so it is a mistake to include it in any calculations again.

100 microseconds (100 cycles) for the analogRead(),

No, 100 µs is only 100 cycles if the clock speed is 1 MHz. At a 16 MHz clock rate you get a 0.0625 µs cycle time, so it will take 1600 clock cycles to take a reading.

I think what is missing here is the time/frequency relationship:

T = 1/F and F = 1/T (T in seconds, F in Hz)

Each CPU clock = 62.5 ns; 1 / 0.0625 µs = 16 MHz

Doc

While I realize that this isn’t the question asked, bear with me and I will tie all this together…