I have some code that measures the time it takes a microcontroller to perform 1M integer adds. I record micros() before and after. It works with all the microcontrollers I have, including ESP32, ESP32-S2 and S3, STM32F1, STM32F4 and Milk-V Duo.
On the Raspberry Pi Pico, subtracting the first micros() value from the last one always gives 1 microsecond, which is wrong. Is micros() broken in the Arduino core for the Raspberry Pi? I have tried both the official Arduino core and Earle's version.
The results are used elsewhere, so it can't be that the compiler ignores them. For example, the results are printed via serial. AInt is also modified after the loop, just after the second micros() call.
I could use random instead of Num0, but then it would be measuring the performance of both random and the integer add. When I replaced it with random(0,999999); it gave 1.14 MIops with the Raspberry at 200 MHz.
Well, I found the problem. It was the Raspberry Pi's compiler optimization, which is on by default. It was pre-calculating the result. To avoid that you have to put asm volatile("" : : : "memory") inside the loop to tell the compiler not to reorder or optimize away the code.
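Here is a minimal sketch of that fix, written as plain C++ so it can be tried on a desktop first. The micros() below is only a stand-in built on std::chrono (on the board you would use the Arduino one), timed_adds is a hypothetical helper, and AInt and Num0 are just borrowed names, not the original benchmark code. The empty asm statement is GCC/Clang syntax.

```cpp
#include <chrono>
#include <cstdint>

// Stand-in for Arduino's micros() so the sketch builds on a desktop (assumption).
static uint32_t micros() {
    using namespace std::chrono;
    return static_cast<uint32_t>(
        duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count());
}

// Globals, so the "memory" clobber below applies to them.
uint32_t AInt = 0;
uint32_t Num0 = 0;

struct AddResult {
    uint32_t sum;         // final value of AInt
    uint32_t elapsed_us;  // micros() after the loop minus micros() before it
};

// Hypothetical helper: adds num0 to AInt `count` times and times the loop.
AddResult timed_adds(uint32_t count, uint32_t num0) {
    AInt = 0;
    Num0 = num0;
    uint32_t start = micros();
    for (uint32_t i = 0; i < count; ++i) {
        AInt += Num0;
        // Compiler barrier: emits no instructions, but the compiler must assume
        // the asm may read or write any memory, so the global AInt has to be
        // up to date here on every iteration instead of being pre-calculated.
        asm volatile("" : : : "memory");
    }
    uint32_t end = micros();
    return {AInt, end - start};  // unsigned subtraction survives one counter wrap
}
```

Calling timed_adds(1000000, 3) should come back with a sum of 3000000 and an elapsed time that actually scales with the loop count. The barrier does not change the arithmetic, only what the optimizer is allowed to assume.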
micros() on the Arduino core for rp2040 takes close to 4us to execute.
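A rough way to check a figure like that yourself is to call the timer many times back to back and average. This is only a sketch: average_call_cost_us is a hypothetical helper, and the clock is passed in as a parameter so the same code can be tried with a fake clock on a desktop.

```cpp
#include <cstdint>

// Hypothetical helper: average cost of one timer call, in microseconds.
// `now` is any callable returning a microsecond count (e.g. micros on a board).
template <typename Clock>
double average_call_cost_us(Clock now, uint32_t calls) {
    if (calls == 0) return 0.0;
    uint32_t start = now();
    uint32_t last = start;
    for (uint32_t i = 0; i < calls; ++i) {
        last = now();  // each call pays the overhead we are trying to measure
    }
    // Unsigned subtraction stays correct across one wrap of the counter.
    return static_cast<double>(last - start) / calls;
}
```

On a board something like average_call_cost_us(micros, 1000) would give an estimate. The result is limited by the timer's own resolution, so treat it as a ballpark number.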
I think your numbers seem suspect. Doesn't the ESP32-S3 have hardware floating point? I'd think that should be more than 2x faster than the S2 or rp2040 (which don't). Are you sure that the calculations got done in single precision?
rp2040 doesn't have a division instruction; I'm surprised that it apparently matched ESP32. (OTOH, the RP2040 has a single-cycle multiply, while the ESP doesn't.)
The ESP32-S3 does have one, but it won't make a major difference in this case. For example, the ESP32-S2 doesn't have an FPU, but it can still do the calculation on the ALU, whereas on the S3 the ALU would be free because the FPU would do the floating-point work. So yeah, the S3 is much faster than the S2, but the benchmark tests the raw FLOPS. The compiler for the ESP32-S3 just uses the FPU, while the S2 runs it on the ALU. The flash on all the ESP32-S3 and ESP32-S2 boards is clocked at 80 MHz. The ESP32-S3 flash can go up to 120 MHz, but that wouldn't really be fair.
Don't forget that ESP32 and Arm are different architectures. Arm has way better memory optimizations and features pertaining to memory and cache. I've read that the STM32 parts have something called the ART Accelerator, which allows better flash access, etc.
And for the rp2040 I'm using Earle's board library because only his version has overclocking. And yes, it's 240 MHz (overclocked). It can go up to 250 MHz stable, but for comparison I left it at 240.
Things like memory access, cache and the speed of the SPI flash can affect performance. This is the best-case scenario for them. If, for example, I were to have the rp2040 do operations from memory directly, the performance would be horrible, almost 2 MIops, because memory access is costly.
The benchmark code I wrote prevents the compiler from tossing out the loops or precalculating the results, but it also allows the compiler to use registers rather than reloading the values every time, which is costly.
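One common way to get that combination with GCC or Clang is an empty asm statement with a "+r" operand. This is a sketch of the general technique and not necessarily what the benchmark here uses. add_in_register is a hypothetical helper.

```cpp
#include <cstdint>

// Hypothetical helper: adds num0 to an accumulator `count` times.
uint32_t add_in_register(uint32_t count, uint32_t num0) {
    uint32_t acc = 0;
    for (uint32_t i = 0; i < count; ++i) {
        acc += num0;
        // "+r"(acc) tells the compiler the asm reads and may change acc in a
        // register. The asm body is empty, so acc keeps its value, but the
        // compiler can no longer fold the loop into a single multiplication,
        // and nothing forces acc out to memory.
        asm volatile("" : "+r"(acc));
    }
    return acc;
}
```

add_in_register(1000000, 7) still returns 7000000. The difference from the "memory" clobber is that only this one variable is pinned, so the rest of the loop can stay in registers.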
Operations like Mul and Divide are demanding so it makes sense that they are slightly slower than Add or Sub.
And to be honest, if micros() really were taking longer on the rp2040, that overhead would be included in the measured time, so its real MFLOPS and MIOPS values would be higher than the ones I measured, making it beat the other MCUs.
The benchmark code measures how much time each MCU takes to do 1M float operations and 1M integer operations (individually, of course). Then we work out how many of these operations it can do in 1 second by dividing 1 second by the time it takes per 1M ops.
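As a small sketch of that last step: operations per microsecond is the same number as millions of operations per second, so the conversion is one division. mega_ops_per_second is a hypothetical name, not taken from the benchmark.

```cpp
#include <cstdint>

// Hypothetical helper: converts an operation count and an elapsed time in
// microseconds into millions of operations per second (MIops or MFLOPS).
double mega_ops_per_second(uint32_t ops, uint32_t elapsed_us) {
    if (elapsed_us == 0) return 0.0;  // no measurable time, no meaningful rate
    // ops per microsecond == millions of ops per second
    return static_cast<double>(ops) / static_cast<double>(elapsed_us);
}
```

For example, 1M ops in 500000 us works out to 2 MIops. A bogus 1 us reading for 1M ops would come out as 1000000 MIops, which is an easy way to spot that the loop was optimized away.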