You may be using the wrong microcontroller for the task.
My suggestion would be to graduate to a Maple, ChipKit, or similar board that offers oodles of RAM, high clock speeds, etc. at a similar hardware price and with pretty good IDEs as well. That should make sampling the amount of data you're envisioning a lot simpler than trying to make a Arduino do what it likely cannot (at least with the constraints you're envisioning).
But that's just me. FWIW, I recall reading that the smallest time slice a Arduino can offer is on the order of 65ns at 16MHz. That's for a single instruction. So, short, bursty signals may be better sampled on a higher-speed (i.e. SPI based) ADC that offers speed and resolution.