Preventing optimization of some code lines

To fine-tune the delay between pulses, I write '0' to PORTD several times:

Table720Ptr++;
for (byte i = 0; i < 3; i++)
{
  NB_pulspar();            // Send pulse pair
  delayMicroseconds(10);
  PIOD->PIO_ODSR = 0;      // Write to make delay.
  PIOD->PIO_ODSR = 0;      // Write to make delay.
  PIOD->PIO_ODSR = 0;      // Write to make delay.
  PIOD->PIO_ODSR = 0;      // Write to make delay.
}
Table720Ptr++;

But adding more "PIOD->PIO_ODSR = 0;" lines doesn't help, so I need to tell the compiler not to optimize those lines!
Is that possible?
And how do I do it?

Kurt

The Due executes roughly 84 million instructions per second, so adding a couple of extra writes is unlikely to make a difference that's easy to see. Register writes are never optimized away: the registers are declared volatile, so the compiler will not remove or merge those accesses. I don't know of any reason why a write couldn't execute at 84 MHz like all the other instructions. Of course you can't really toggle the pins that fast, but I think you can write the register that quickly; the pin just won't follow at that speed. Four writes in a row likely take only about four instruction times (depending on register access speed), which is about 47 nanoseconds at 11.9 ns per cycle. That's a very small extra delay: four instructions add only about 1/210 of the 10 µs from "delayMicroseconds(10);". What happens if you add 10 or 20 of those register writes? Or run a loop of 100 and see what happens. I suspect the writes are having an effect, just a much smaller one than you expected.

At any rate, my answer is that register accesses aren't optimized away, so that shouldn't be your problem.
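To try the loop of 100 writes suggested above, a sketch along these lines could be used. `pad_with_writes` is a hypothetical helper, not an Arduino or CMSIS function. It relies only on the register being volatile, which is what forces the compiler to emit every store; on the Due the `PIO_ODSR` field should already have a volatile type, so its address can be passed directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store 0 to a memory-mapped register `n` times.
 * Because `reg` points to volatile storage, the compiler must perform
 * every one of the n stores; it may neither drop nor merge them.
 * Unless the loop is fully unrolled, each pass also costs some loop
 * overhead (typically a compare and a branch), so one iteration takes
 * longer than a single store. */
static void pad_with_writes(volatile uint32_t *reg, unsigned n)
{
    assert(reg != 0);
    for (unsigned i = 0; i < n; i++) {
        *reg = 0u;
    }
}

/* On the Due (not compiled here), the call would presumably be:
 *   pad_with_writes(&PIOD->PIO_ODSR, 100);
 */
```

Because of the loop overhead, the time per write in this form is larger than in the four unrolled lines, so the step size has to be measured rather than assumed.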

It seems to work OK for me; all four stores are there in the disassembly:

   8016e:       f47f affd       bne.w   8016c <L_36_delayMicroseconds>
   80172:       4b09            ldr     r3, [pc, #36]   ; (80198 <L_36_delayMicroseconds+0x2c>)
   80174:       2200            movs    r2, #0
   80176:       639a            str     r2, [r3, #56]   ; 0x38
   80178:       639a            str     r2, [r3, #56]   ; 0x38
   8017a:       639a            str     r2, [r3, #56]   ; 0x38
   8017c:       639a            str     r2, [r3, #56]   ; 0x38

Note that the 4 stores will take MUCH less time than your 10-microsecond delay.

Thanks for the answer.

I need the tuning to produce two pulses, each 3.5 µs wide, spaced 12 µs ± 0.1 µs apart!
Each pulse is generated by writing 15 points from a table, and the spacing comes from a 10 µs delay plus writes to PORTD to adjust it. These pulses are sent 12 times, spaced 30 µs apart.
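It can help to restate this requirement in CPU cycles. Assuming the Due runs at 84 MHz (about 11.9 ns per cycle), the sketch below converts between nanoseconds and cycles with integer arithmetic. The helper names are hypothetical; the values in the comment follow directly from the numbers in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Whole CPU cycles in a duration of `ns` nanoseconds (rounded down). */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t f_cpu_hz)
{
    assert(f_cpu_hz > 0u);
    return (uint32_t)(((uint64_t)ns * f_cpu_hz) / 1000000000u);
}

/* Nanoseconds taken by `cycles` CPU cycles (rounded down). */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t f_cpu_hz)
{
    assert(f_cpu_hz > 0u);
    return (uint32_t)(((uint64_t)cycles * 1000000000u) / f_cpu_hz);
}

/* At an assumed 84 MHz clock:
 *   ns_to_cycles(12000, 84000000u) -> 1008  (12 us spacing)
 *   ns_to_cycles(3500,  84000000u) -> 294   (3.5 us pulse width)
 *   ns_to_cycles(100,   84000000u) -> 8     (+/-0.1 us, exactly 8.4)
 *   cycles_to_ns(4,     84000000u) -> 47    (four single-cycle writes)
 */
```

So the ±0.1 µs tolerance is only about 8 cycles: the padding, the loop and function-call overhead all matter at the level of individual instructions.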

Then I will look for the error elsewhere.

Kurt

Whenever you need a short delay (a few ns up to 1 or 2 µs), it is advisable to insert some NOPs.

A NOP is an assembler instruction that does... nothing, but takes 1 clock cycle. However, an ARM uc inserts its own wait states here and there between instructions, so if you add (e.g.) 50 NOPs, you can be sure the core will add a few wait states. AFAICT, a wait state lasts as long as a NOP.

If you clock your Due at 84 MHz, 1 NOP = 1/84 MHz ≈ 11.9 ns.

Once you have inserted some NOPs (plus the core's own wait states), you can check and fine-tune the actual duration precisely using SysTick->VAL.
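Reading SysTick->VAL before and after the code under test gives its duration in cycles, assuming SysTick is clocked from the 84 MHz core clock. The counter counts down and reloads to SysTick->LOAD after reaching 0, so a wrap between the two reads must be handled. `systick_elapsed` is a hypothetical helper; the LOAD value used in the test (83999, a 1 ms tick) is an assumption, and on real hardware it is safer to read SysTick->LOAD.

```c
#include <assert.h>
#include <stdint.h>

/* SysTick->VAL counts DOWN from LOAD to 0, then reloads to LOAD on the
 * next tick. Given two readings taken less than one full period
 * (LOAD + 1 ticks) apart, return the number of ticks between them. */
static uint32_t systick_elapsed(uint32_t start, uint32_t end, uint32_t load)
{
    assert(start <= load && end <= load);
    if (start >= end) {
        return start - end;            /* no reload in between */
    }
    return start + (load + 1u) - end;  /* counter wrapped through 0 once */
}

/* Usage on the Due (not compiled here):
 *   uint32_t t0 = SysTick->VAL;
 *   __asm__ __volatile__("NOPX 50");
 *   uint32_t t1 = SysTick->VAL;
 *   uint32_t cycles = systick_elapsed(t0, t1, SysTick->LOAD);
 */
```

The two reads of VAL cost a few cycles themselves, so it is worth measuring once with nothing between them and subtracting that baseline. An interrupt that fires between the reads will also inflate the result.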

Here is an assembler macro I use to insert some NOPs:

__asm__ __volatile__(
  ".macro NOPX  P               \n\t"
  ".rept &P                     \n\t"   
  " NOP                         \n\t"
  ".endr                        \n\t"   // End of Repeat
  ".endm                        \n\t"   // End of macro
  );

void setup() {

}

void loop() {
  // Insert 50 NOP: 
  // The uc will insert its own wait states between the NOPs
  // resulting in a bit more than 50 NOP

  __asm__ __volatile__("NOPX 50");
}
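As an alternative to defining NOPX at file scope, the repetition can be written directly inside an inline asm statement using the GNU assembler's `.rept`/`.endr` directives. This is a different technique from the macro above, shown only as a sketch: `NOPS` and `pad_50_nops` are hypothetical names, the count must be an integer literal because it is stringized into the assembler text, and the caveat about wait states still applies.

```c
/* Emit exactly `n` NOP instructions inline with the GNU assembler's
 * .rept/.endr directive. `n` must be an integer literal, because the
 * preprocessor pastes it into the assembler source as text. */
#define NOPS(n) __asm__ __volatile__(".rept " #n "\n\tnop\n\t.endr\n\t")

static void pad_50_nops(void)
{
    /* 50 NOPs: nominally 50 x 11.9 ns at 84 MHz, plus any wait states */
    NOPS(50);
}
```

Since the expansion happens in the assembler, the sketch does not depend on a macro defined elsewhere in the file.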

Thank you.

If I want to use the assembler code in my own code, I have to write:
asm volatile("NOPX 50");

However, an ARM uc inserts its own wait states here and there between instructions

Heh. The ARM architecture also permits NOPs to be removed from the pipeline without actually being executed, so it's also possible for a NOP to take LESS than a single cycle. I don't think any of the Cortex-M processors actually DO that, but ... it is allowed. Trying to write cycle-deterministic code on most ARM chips is very annoying :frowning:
On the bright side, there's the SysTick timer, which counts at the CPU frequency. You can probably use it for fairly accurate microsecond-level timing if you add a bit of complexity to handle a possible timer reload, and if the reload value is "large" compared to the number of ticks you want to delay.
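The reload handling described above can be sketched as a busy-wait that accumulates elapsed ticks poll by poll, so the wait may even span several reloads as long as each poll comes within one reload period. This is a sketch under assumptions: `delay_ticks`, `tick_source` and the `sim_counter` simulation are hypothetical names. On the Due, the tick source would simply return SysTick->VAL and the reload value would be SysTick->LOAD (a 24-bit register).

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed on a down-counter that reloads to `load` after 0.
 * Valid only if less than one full period (load + 1 ticks) passed. */
static uint32_t down_elapsed(uint32_t prev, uint32_t now, uint32_t load)
{
    return (prev >= now) ? prev - now : prev + (load + 1u) - now;
}

/* Hypothetical counter source; on the Due it would return SysTick->VAL. */
typedef uint32_t (*tick_source)(void *ctx);

/* Spin until at least `ticks` counter ticks have elapsed and return the
 * number actually waited (it can overshoot by up to one poll interval). */
static uint32_t delay_ticks(tick_source read_ticks, void *ctx,
                            uint32_t load, uint32_t ticks)
{
    assert(load > 0u && load <= 0xFFFFFFu);  /* SysTick LOAD is 24 bits */
    uint32_t last = read_ticks(ctx);
    uint32_t waited = 0;
    while (waited < ticks) {
        uint32_t now = read_ticks(ctx);
        waited += down_elapsed(last, now, load);
        last = now;
    }
    return waited;
}

/* Host-side simulation of a down-counter, for testing only. */
typedef struct {
    uint32_t val;   /* current count */
    uint32_t load;  /* reload value */
    uint32_t step;  /* ticks that pass between two reads */
} sim_counter;

static uint32_t sim_read(void *ctx)
{
    sim_counter *c = (sim_counter *)ctx;
    uint32_t v = c->val;
    /* advance by `step` ticks, wrapping through 0 -> load */
    c->val = (c->val >= c->step) ? c->val - c->step
                                 : c->val + (c->load + 1u) - c->step;
    return v;
}
```

The overshoot of up to one poll interval is the price of polling; for sub-microsecond accuracy that interval (the loop body) has to be short compared with the tolerance.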