How to generate a delay of less than a microsecond?

Can someone let me know how to generate pulses shorter than 1 microsecond?

I believe this generates a delay of 62.5 ns, but it is assembly code. Can I include this statement directly in my C code? Is it possible?
asm("nop\n\t");

Yes and yes.

Check the microcontroller data sheet. For the Arduino Uno with the ATmega328, www.atmel.com/Images/doc8161.pdf#page=429 the PDF says the NOP instruction takes 1 clock cycle.

For a 16 MHz clock, 1 NOP takes 1/(16 MHz) = 1/(16*10^6) seconds = 62.5 ns to complete.
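
If it helps, the cycle-time arithmetic can be sketched in plain C that runs on a PC. The function names cycle_ps and nops_for_delay are made up for this example, and it assumes every NOP costs exactly one clock cycle, as the data sheet says for the ATmega328.

```c
#include <stdint.h>

/* Duration of one CPU clock cycle in picoseconds.
   Picoseconds are used so that 62.5 ns stays an integer (62500 ps). */
uint32_t cycle_ps(uint32_t f_cpu_hz) {
    return (uint32_t)(1000000000000ULL / f_cpu_hz);
}

/* Smallest number of 1-cycle NOPs whose total time is at least delay_ns.
   Rounds up, so the delay is never shorter than requested. */
uint32_t nops_for_delay(uint32_t f_cpu_hz, uint32_t delay_ns) {
    uint64_t cycles = ((uint64_t)delay_ns * f_cpu_hz + 999999999ULL) / 1000000000ULL;
    return (uint32_t)cycles;
}
```

At 16 MHz, cycle_ps(16000000) gives 62500 ps, and nops_for_delay(16000000, 500) gives 8 NOPs for a 500 ns delay.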

Nothing discussed so far actually creates a pulse, though.
To do that, you need to make an output pin go high and then low (or low and back high).

The usual fast way is direct port manipulation, instead of digitalWrite(pinX, HIGH); and digitalWrite(pinX, LOW);, because the Arduino core adds some smarts to those calls to protect the user, which makes them slower.

After setting the pins as outputs with pinMode statements in setup():

PORTC = PORTC | B00000001; // sets bit 0, leave rest alone
// add your NOPs here if needed
PORTC = PORTC & B11111110; // clears bit 0, leave rest alone
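
Here is a rough sketch of the same set / NOP / clear sequence wrapped in small helpers. The names port_set, port_clear and pulse_high are hypothetical, and the two NOPs are only a placeholder count. The helpers take a pointer to the register so the bit logic can also be checked on a PC with an ordinary variable. On an Uno you would presumably call pulse_high(&PORTC, 0x01) after pinMode(A0, OUTPUT).

```c
#include <stdint.h>

/* Set the bits in mask on the register, leaving the other bits alone. */
static inline void port_set(volatile uint8_t *port, uint8_t mask) {
    *port |= mask;
}

/* Clear the bits in mask on the register, leaving the other bits alone. */
static inline void port_clear(volatile uint8_t *port, uint8_t mask) {
    *port &= (uint8_t)~mask;
}

/* Drive the masked bits high, wait two NOPs, then drive them low again.
   The NOP count is an assumption; add or remove NOPs to tune the width. */
static inline void pulse_high(volatile uint8_t *port, uint8_t mask) {
    port_set(port, mask);
    __asm__ __volatile__("nop\n\t");
    __asm__ __volatile__("nop\n\t");
    port_clear(port, mask);
}
```

The port writes themselves also take time, so the real pulse will be somewhat longer than the NOP count times 62.5 ns. Measure it with a scope rather than trusting the count.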

I usually use it like this to load up shift registers, which is way quicker than shiftOut():

PORTC = PORTC & B11111110; // clears bit 0, leave rest alone -- latchClock low
SPI.transfer(upperDataByte); // send byte to shift register
SPI.transfer(lowerDataByte); // send byte to shift register
PORTC = PORTC | B00000001; // sets bit 0, leave rest alone -- latchClock high