Seems like it works OK to me...
8016e: f47f affd bne.w 8016c <L_36_delayMicroseconds>
80172: 4b09 ldr r3, [pc, #36] ; (80198 <L_36_delayMicroseconds+0x2c>)
80174: 2200 movs r2, #0
80176: 639a str r2, [r3, #56] ; 0x38
80178: 639a str r2, [r3, #56] ; 0x38
8017a: 639a str r2, [r3, #56] ; 0x38
8017c: 639a str r2, [r3, #56] ; 0x38
Note that the 4 stores are going to take MUCH less time than your 10-microsecond delay.