Missing pulse detector misses pulse!

You probably don't need assembler to get the performance you need. You can cut a lot of unnecessary cycles by making your math routines more efficient. Avoiding floating-point calculations should bring a significant improvement. Try:

unsigned long new1L = new1 * 10UL;
if (new1L > (29UL * old) && new1L < (31UL * old))
instead of
if (new1 > 2.9*old && new1 < 3.1*old) //if within 10% of 3xoldgap we must be at large gap

and
if (newtime > (savedtime + (savedtime / 2))) {
instead of
if (newtime > 1.5*savedtime) {
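
To show that the integer versions give the same answers as the floating-point ones, here is a minimal sketch in plain C. The function names is_large_gap and is_overdue are made up for illustration, and I'm assuming the times are 32-bit unsigned values (as micros() returns on Arduino); your variable types may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when new_gap lies strictly between 2.9x and
   3.1x old_gap. Both sides are scaled by 10 so no floating point is needed. */
bool is_large_gap(uint32_t new_gap, uint32_t old_gap)
{
    /* 64-bit intermediate so the multiplications cannot overflow */
    uint64_t scaled_new = 10ULL * new_gap;
    return scaled_new > 29ULL * old_gap && scaled_new < 31ULL * old_gap;
}

/* Hypothetical helper: true when new_time exceeds 1.5x saved_time,
   using only an add and an integer divide. */
bool is_overdue(uint32_t new_time, uint32_t saved_time)
{
    return new_time > (uint64_t)saved_time + saved_time / 2;
}
```

The 64-bit intermediates are only there to make the sketch safe for any input. On an 8-bit AVR you would probably stay with unsigned long as above, which is fine as long as 31 * old fits in 32 bits (old below roughly 138 million).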

Also, I'm not sure how long your loop takes to repeat, but you may want to increase the baud rate so that you don't overflow the hardware serial port buffer.
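
To get a feel for whether the serial output can keep up with the loop, here is a rough sketch that estimates how long a message takes to go out. It assumes the usual 8N1 framing (10 bits on the wire per byte); tx_time_us is a made-up name, not an Arduino function.

```c
#include <stdint.h>

/* Hypothetical helper: microseconds needed to shift n_bytes out of a UART
   at the given baud rate, assuming 8N1 framing
   (1 start + 8 data + 1 stop = 10 bits per byte). */
uint32_t tx_time_us(uint32_t n_bytes, uint32_t baud)
{
    /* 64-bit intermediate so n_bytes * 10 * 1000000 cannot overflow */
    return (uint32_t)((uint64_t)n_bytes * 10u * 1000000u / baud);
}
```

For example, a 20-byte message (an arbitrary size picked for illustration) takes about 20.8 ms at 9600 baud but only about 1.7 ms at 115200. If the loop repeats faster than the message can be sent, the buffer will fill up.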