"nop" stands for "no operation": the CPU does nothing while that instruction is executed. The "nop" instruction takes one cycle to execute (62.5 ns on a 16 MHz Arduino), so it is often used to achieve very short delays. For example, a device you are interfacing with might require a 500 ns delay between when you, say, set its enable line and check the data line. You could accomplish this with a string of 8 nops (8 × 62.5 ns = 500 ns), plus one or two more if you want some margin, or you could just call delayMicroseconds() if you are OK with waiting a little longer than is strictly required.
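The cycle arithmetic above can be sketched as follows. This is only an illustration: the names nops_for_delay, NOP and delay_500ns_at_16mhz are made up for this example, and the delay function assumes a 16 MHz clock and that the compiler inlines it (a real call/return would add a few extra cycles).

```cpp
#include <cstdint>

// Number of one-cycle nops needed to wait at least delay_ns at a CPU clock
// of f_cpu_hz. Rounds up so the delay is never shorter than requested.
constexpr uint32_t nops_for_delay(uint32_t delay_ns, uint32_t f_cpu_hz) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(delay_ns) * f_cpu_hz + 999999999ULL) / 1000000000ULL);
}

// 500 ns at 16 MHz (62.5 ns per cycle) works out to 8 cycles.
static_assert(nops_for_delay(500, 16000000UL) == 8, "expected 8 nops");

// A single nop via GCC-style inline assembly (hypothetical helper macro).
// "volatile" keeps the compiler from optimizing the instruction away.
#define NOP() __asm__ __volatile__("nop")

// Eight nops in a row: about 500 ns, assuming a 16 MHz clock.
inline void delay_500ns_at_16mhz() {
    NOP(); NOP(); NOP(); NOP();
    NOP(); NOP(); NOP(); NOP();
}
```

You would call delay_500ns_at_16mhz() between setting the enable line and reading the data line. Keep in mind that the instructions that write and read the pins take cycles too, so the real gap will be slightly longer than the nops alone.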
Starting on page 347 of the mega168 datasheet you can find a list of all of the assembly instructions it recognizes, along with how many cycles each instruction requires. You can read an I/O register into a CPU register in one cycle using the "in" instruction and write an I/O register from a CPU register using the "out" instruction, but this only applies to the first 64 I/O memory addresses. If your I/O register is outside this address space, you must load it with an LD (or LDS) instruction and store it with an ST (or STS) instruction, each of which takes two cycles. The datasheet contains a list of all of the I/O registers and their addresses.
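As a rough sketch of that rule, the check below says whether a register can be reached with in/out and how many cycles a single read or write costs. The function names are invented for this example. It assumes you pass the data-memory address shown in parentheses in the datasheet's register summary, where the 64 in/out-reachable registers sit at 0x20 through 0x5F.

```cpp
#include <cstdint>

// True if the register at this data-memory address is one of the first
// 64 I/O registers (data addresses 0x20..0x5F) and so reachable by in/out.
constexpr bool reachable_by_in_out(uint16_t data_addr) {
    return data_addr >= 0x20 && data_addr <= 0x5F;
}

// Cycles for one read or write: 1 with in/out, 2 with a load/store
// instruction. Only meaningful for I/O register addresses.
constexpr unsigned access_cycles(uint16_t data_addr) {
    return reachable_by_in_out(data_addr) ? 1u : 2u;
}

// PORTB on the mega168 is at 0x25, so in/out works (1 cycle).
static_assert(access_cycles(0x25) == 1, "PORTB uses in/out");
// TCCR1A is at 0x80, outside the in/out range (2 cycles).
static_assert(access_cycles(0x80) == 2, "TCCR1A needs load/store");
```

In practice the compiler makes this choice for you when you write something like PORTB = value; the sketch is just a way to predict the timing of the code it generates.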
The first 32 I/O registers are directly bit-accessible using the SBI (set bit), CBI (clear bit), SBIS (skip next instruction if bit is set), and SBIC (skip next instruction if bit is clear) instructions. SBI and CBI take two cycles. SBIS and SBIC take 1 cycle if the skip is not taken, 2 cycles if the skipped instruction is one word long, and 3 cycles if it is two words long.
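Here is a small sketch of those two rules, handy when counting cycles by hand. The function names are made up for this example, and as an assumption the addresses are the data-memory addresses from the datasheet's register summary, where the first 32 I/O registers occupy 0x20 through 0x3F.

```cpp
#include <cstdint>

// True if SBI/CBI/SBIS/SBIC can address this register, i.e. it is one of
// the first 32 I/O registers (data addresses 0x20..0x3F).
constexpr bool bit_addressable(uint16_t data_addr) {
    return data_addr >= 0x20 && data_addr <= 0x3F;
}

// Cycles taken by SBIS/SBIC: 1 if the next instruction is not skipped,
// 2 if a one-word instruction is skipped, 3 if a two-word one is skipped.
constexpr unsigned skip_cycles(bool skip_taken, unsigned skipped_words) {
    return !skip_taken ? 1u : (skipped_words == 2 ? 3u : 2u);
}

// PORTB (0x25) is bit-addressable; SREG (0x5F) is reachable by in/out
// but lies outside the SBI/CBI range.
static_assert(bit_addressable(0x25), "PORTB is in range");
static_assert(!bit_addressable(0x5F), "SREG is out of range");
static_assert(skip_cycles(true, 2) == 3, "skipping a two-word instruction");
```

So a polling loop built from SBIS followed by a one-word RJMP costs 1 cycle for the SBIS plus the jump while the bit is clear, and 2 cycles for the SBIS on the pass where the bit is finally set and the jump is skipped.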