Strange result when repeatedly writing to a port with/without NOP

In a loop, I write to PORTD to generate two pulses via an R-2R ladder.

Code:
#define NOP asm volatile ("nop\n\t")

 1  Table720Ptr = 4 * DAC;
 2  for (byte ii = 0; ii < 26; ii++)
 3  {
 4      PIOD->PIO_ODSR = Sin2table[Table720Ptr][ii];
 5      // NOP; // delay 133 ns in this loop <============
 6  }
 7  for (byte ii = 0; ii < 50; ii++)
 8  {
 9      PIOD->PIO_ODSR = 0; // 45 ns
10  }
11  for (byte ii = 0; ii < 26; ii++)
12  {
13      PIOD->PIO_ODSR = Sin2table[Table720Ptr][ii];
14      NOP; // delay 133 ns in this loop
15  }
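
To make the intended output sequence explicit, here is a minimal PC-side sketch of the three loops. It is not the code that runs on the board: the returned vector stands in for the writes to PIOD->PIO_ODSR, and the table size (kRows), the type alias SampleTable and the function name emitTwoPulses are assumptions for illustration only, since the real Sin2table is not shown here. It says nothing about timing; it only shows the order and number of port writes I expect (26 samples, 50 zeros, 26 samples).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins: the real sketch writes to PIOD->PIO_ODSR and reads
// from Sin2table, whose dimensions and contents are not shown in the post.
constexpr std::size_t kRows = 4;      // assumed number of table rows
constexpr std::size_t kSamples = 26;  // samples per pulse (loop bound in the post)
constexpr std::size_t kGap = 50;      // zero writes between the two pulses

using SampleTable = std::array<std::array<std::uint32_t, kSamples>, kRows>;

// Returns every value that would be written to the port, in order.
std::vector<std::uint32_t> emitTwoPulses(const SampleTable& table, std::size_t row) {
    assert(row < kRows);
    std::vector<std::uint32_t> writes;
    writes.reserve(2 * kSamples + kGap);
    for (std::size_t i = 0; i < kSamples; ++i) {
        writes.push_back(table[row][i]);  // first pulse
    }
    for (std::size_t i = 0; i < kGap; ++i) {
        writes.push_back(0);              // gap: port held at 0
    }
    for (std::size_t i = 0; i < kSamples; ++i) {
        writes.push_back(table[row][i]);  // second pulse, same samples
    }
    return writes;
}
```

Both pulses use the same 26 table entries, so on the scope I would expect two pulses of identical shape and width if each loop iteration took the same time.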

When line 5 is commented out, I get the pulses shown in "WithNOP.jpg"; otherwise I get the pulses in "WithoutNOP.jpg".
Why do the two pulses in "WithNOP.jpg" have different widths?
Can anyone tell me why the timing is slower without the NOP?
Is the compiler doing something special with the code?
Can I write the loops in assembler?