Assuming the 8-bit (256 counts) will give me 5V/256 = 19mV resolution,

Not quite, 128 counts only, as signal AC you have to offset it , to pass both negative and positive half-wave. Problem better to consider as dynamic range limits in dB, not millivolts. Fix_fft outputs 64 counts max value, it's how algorithm works to prevent overflow, limiting dynamics to 36 dB , which is a too low for musical performance.

Actually, a 9kHz sample rate may be fine. That gives me bins to around 4kHz (9k/2), right?

No. I'll try explain better this time:

1 . Sampling rate define upper freq. range by Nyquist theorem: Fmax = Fsamp / 2.

Sampling with 9 kHz you can get 4.5 kHz as upper Fmax.

2. Bins width or spectral resolution depends on how long you do a sampling. Fmin = 1 / Tsampl.

If you taking samples 1 second, you getting 1 Hz resolution and each bin width.

But there is a problem, reading with 9 kHz gives 9 kB array of data. There is no memory and no CPU to calculate big volume of data. So in your situation, you should decrease sampling rate in order to limit size of data to less than 512 bytes on Uno(2K).