No optimization on some codelines

Seems like it works OK to me. All four stores are still there after the delay loop:

   8016e:       f47f affd       bne.w   8016c <L_36_delayMicroseconds>
   80172:       4b09            ldr     r3, [pc, #36]   ; (80198 <L_36_delayMicroseconds+0x2c>)
   80174:       2200            movs    r2, #0
   80176:       639a            str     r2, [r3, #56]   ; 0x38
   80178:       639a            str     r2, [r3, #56]   ; 0x38
   8017a:       639a            str     r2, [r3, #56]   ; 0x38
   8017c:       639a            str     r2, [r3, #56]   ; 0x38

Note that the 4 stores are going to take MUCH less time than your 10-microsecond delay.
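To put rough numbers on that, here is a minimal sketch in C. Everything in it is an assumption for illustration: `fake_port_t`, `write_four_times` and `stores_ns` are made-up names, the struct only imitates "some register at offset 0x38" from the listing, and the 84 MHz clock and 2 cycles per store are guesses you should replace with the figures for your own chip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block. Only the word at byte offset 0x38 matters,
   matching the "str r2, [r3, #56]" lines in the listing. */
typedef struct {
    uint32_t reserved[14];      /* 14 * 4 = 56 = 0x38 bytes of padding */
    volatile uint32_t reg_38;   /* stand-in for the register being written */
} fake_port_t;

/* Four back-to-back writes of the same value. Because reg_38 is volatile,
   the compiler has to emit one store per statement and may not merge them. */
void write_four_times(fake_port_t *port)
{
    port->reg_38 = 0;
    port->reg_38 = 0;
    port->reg_38 = 0;
    port->reg_38 = 0;
}

/* Rough time for n stores in nanoseconds.
   cycles_per_store and cpu_mhz are assumptions supplied by the caller. */
uint32_t stores_ns(uint32_t n, uint32_t cycles_per_store, uint32_t cpu_mhz)
{
    assert(cpu_mhz > 0);
    return (n * cycles_per_store * 1000u) / cpu_mhz;
}
```

With those assumed figures, `stores_ns(4, 2, 84)` comes out well under 100 ns, while the delay is 10000 ns. So on a scope the four writes look like a single edge next to the delay, even though none of them was optimized away.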