Timer in CTC mode to generate 1 MHz pulse

Your problem is that interrupt service routines take a finite time to execute: maybe 2.5 µs in overhead alone (about 40 clock cycles at 16 MHz), plus your digitalWrite and whatever else the ISR does.

So you can't execute an ISR every 3 or 4 µs and hope to get a 1 µs pulse out of it.

However, this sketch (which uses the hardware timer output rather than an ISR) reliably outputs 1 MHz on pin 9:

#define myOutputPin 9

void setup ()
{
  pinMode (myOutputPin, OUTPUT);   // pin 9 is the OC1A output on an Uno
  TCCR1A = 0;
  TCCR1B = 0;
  TCNT1  = 0;
  OCR1A = 7;   // toggle after counting to 8
  TCCR1A |= (1 << COM1A0);   // Toggle OC1A on Compare Match.
  TCCR1B |= (1 << WGM12);    // CTC mode
  TCCR1B |= (1 << CS10);     // clock on, no pre-scaler
}
void loop () { }

Note that OCR1A is 7, not 16. For one thing, the register is zero-relative (so to count 16 clock cycles you should have used 15, not 16). For another, it takes two toggles to make one full cycle, so we really want to toggle every 8 clock cycles of the processor (i.e. every 500 ns).
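
If you want a different frequency, the same arithmetic can be wrapped in a small helper. This is only a sketch: the function name ctc_toggle_top is made up for illustration (it is not part of the Arduino core or avr-libc), and it assumes the timer is in CTC mode with the pin toggling on each compare match, as in the sketch above.

```c
#include <stdint.h>

/* Hypothetical helper, not from the original sketch.
   In CTC mode with "toggle on compare match", the output frequency is
       f_out = f_cpu / (2 * prescaler * (top + 1))
   so the compare value (top) for a wanted frequency is found by
   rearranging that. Integer division truncates, so frequencies that
   do not divide evenly are only approximated. f_out and prescaler
   must be non-zero, and f_out must not exceed f_cpu / (2 * prescaler). */
uint32_t ctc_toggle_top (uint32_t f_cpu, uint32_t prescaler, uint32_t f_out)
{
  return f_cpu / ((uint32_t) 2 * prescaler * f_out) - 1;
}
```

For example, OCR1A = ctc_toggle_top (F_CPU, 1, 1000000UL); gives 7 on a 16 MHz board, matching the value used above. Timer 1 is a 16-bit timer, so the result has to fit in 65535; if it doesn't, choose a larger prescaler.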