Dividing by 128 would be better as it can be replaced by a bitshift. You would need to adjust 270 and 50 by a factor 128/100..
Bitshifts are far faster than division (and also more predictable).
With AVR processors, that is worth testing before making such assertions.
Maybe so, but the complexity of the code required to perform the bitshift division will gobble up whatever is gained by the bitshift. The simplicity and transparency of the other suggestions provided in this thread is worth a couple of "lost" (or well spend?) clocks - if any are lost at all.
Responding on an Arduino forum to a question about a modest AVR processor with...
mov edx, -1840700269
mov eax, edi
imul edx
lea eax, [rdx+rdi]
sar eax, 2
sar edi, 31
sub eax, edi
...Intel assembly is ... well ... many things. Let's just go with "a fallacy".
On the one hand you tell us to trust the compiler.
Then tell us not to trust the compiler.
At this point your posts are "noise". I'm splitting the thread and moving the remnants to Bar Sport.
A petty excuse.
The thread stays here until there is more than "I so say" as evidence.
I deleted all my posts on this as you seem not to be able to take my inputs seriously.
The correct choice is to ask a moderator to remove the thread. Someone other than me will decide what to do.
I am taking you seriously. i hope you take the time to run the test. Everyone using an AVR processor will benefit.
What you fail to realize is that AVR processors do not have efficient barrel shifters. Subtracting two numbers has the same cost has a single bit shift. It is possible that dividing by a constant like 100 is about the same cost as shifting by 7 bits.
Here's some AVR code demonstrating what actually happens...
uint8_t x8;
uint16_t x16;
uint32_t x32;
;; First, lets try 8bit values.
x8 = PORTB ;; Get a fresh value to prevent optimization
80: 85 b1 in r24, 0x05
PORTD = x8 / 128;
;; Not only is the compiler smart enough to turn the division into a shift,
;; it's also smart enough to know that shifting an 8bit number right 7 bits
;; is the same as just shifting the high bit left into a new register.
;; (Pretty impressive.)
82: 88 1f adc r24, r24 ; shift high bit into carry
84: 88 27 eor r24, r24 ; zero the reg
86: 88 1f adc r24, r24 ; shift the carry into reg
88: 8b b9 out 0x0b, r24
x8 = PORTB;
8a: 85 b1 in r24, 0x05
;; Division by 100 is apparently too difficult to optimize, so a standard
;; division function is called. __udivmodqi4 is 24 bytes of code.
PORTD = x8 / 100;
8c: 64 e6 ldi r22, 0x64 ; 100
8e: 0e 94 83 00 call 0x106 ; 0x106 <__udivmodqi4>
92: 8b b9 out 0x0b, r24
;; Now for 16 bits.
x16 = PINB * PIND; // create an unknown 16bit value.
94: 83 b1 in r24, 0x03
96: 99 b1 in r25, 0x09
98: 89 9f mul r24, r25
9a: c0 01 movw r24, r0
9c: 11 24 eor r1, r1
PORTD = x16 / 128;
;; again, we do some optimized shifting in the "opposite" direction.
9e: 88 0f add r24, r24
a0: 89 2f mov r24, r25
a2: 88 1f adc r24, r24
a4: 99 0b sbc r25, r25
a6: 91 95 neg r25
a8: 8b b9 out 0x0b, r24
x16 = PINB * PIND;
aa: 83 b1 in r24, 0x03
ac: 99 b1 in r25, 0x09
ae: 89 9f mul r24, r25
b0: c0 01 movw r24, r0
b2: 11 24 eor r1, r1
PORTD = x16 / 100;
b4: 64 e6 ldi r22, 0x64 ; 100
b6: 70 e0 ldi r23, 0x00
;; But divide by 100 is still generic divide. __udivmodhi4 is 40 bytes of code.
b8: 0e 94 8f 00 call 0x11e ; 0x11e <__udivmodhi4>
bc: 6b b9 out 0x0b, r22 ; 11
;; Now for 32bit
x32 = PINB * PIND;
be: 83 b1 in r24, 0x03
c0: 99 b1 in r25, 0x09
c2: 89 9f mul r24, r25
c4: c0 01 movw r24, r0
c6: 11 24 eor r1, r1
c8: 09 2e mov r0, r25
ca: 00 0c add r0, r0
cc: aa 0b sbc r26, r26
ce: bb 0b sbc r27, r27
;; divide by 128 is still optimized to a shift, but the shift is no longer
;; "clever." Just a brute-force 32bit shift, done 7 times in a loop :-(
;; (The loop is 14 bytes of code.)
PORTD = x32 / 128;
d0: 37 e0 ldi r19, 0x07
d2: b6 95 lsr r27
d4: a7 95 ror r26
d6: 97 95 ror r25
d8: 87 95 ror r24
da: 3a 95 dec r19
dc: d1 f7 brne .-12 ; 0xd2 <main+0x52>
de: 8b b9 out 0x0b, r24
x32 = PINB * PIND;
e0: 63 b1 in r22, 0x03
e2: 89 b1 in r24, 0x09
e4: 68 9f mul r22, r24
e6: b0 01 movw r22, r0
e8: 11 24 eor r1, r1
ea: 07 2e mov r0, r23
ec: 00 0c add r0, r0
ee: 88 0b sbc r24, r24
f0: 99 0b sbc r25, r25
PORTD = x32 / 100;
;; And divide by 100 is still a generic divide function. divmodsi4 is 68 bytes
f2: 24 e6 ldi r18, 0x64 ; 100
f4: 30 e0 ldi r19, 0x00
f6: 40 e0 ldi r20, 0x00
f8: 50 e0 ldi r21, 0x00
fa: 0e 94 a3 00 call 0x146 ; 0x146 <__udivmodsi4>
fe: 2b b9 out 0x0b, r18
}
100: 90 e0 ldi r25, 0x00 ;; (return 0)
102: 80 e0 ldi r24, 0x00
104: 08 95 ret
FWIW, ARM CM0 (which has a barrel shifter but no divide instruction) still uses a division function for divide by 100. Since ARM doesn't "DO" 8bit stuff, it uses the same 266byte 32bit division routine for all three variable lengths. (wouldn't THAT be an interesting thing to optimize!)
ARM CM4 has a divide instruction, and still uses shifting for /128 and the division instruction for /100.
Here a small program to prove that division by 100 takes far more time than division by 128 on Uno. Division by 100 takes 13.80 micro seconds. Division by 128 takes 1.32 micro seconds.
7 bit shifts to the right take 1.32 micro seconds as well.
I must admit, this does not prove that it is implemented with bit shift, you would have to look into the assembly code like @westfw did.
@Coding_Badly: Point stands that division by 128 is far faster than division by 100.
It might be that division by 128 is actually implemented like:
Take high byte.
Copy into int.
Bitshift left.
Add most significant byte from low byte.
This is just a theory that can explain that division by 16 takes more time.... (2.36 micro seconds) even though only 4 right shifts instead of 7 would be needed. Division by 256 is even faster (0.84). That would be: take high byte.
I see now that this is exactly what @westfw has shown...
const int inputPin = A3;
const int denom = 128; //change to 100 if interested
const bool bitshift = true;
void setup() {
Serial.begin(9600);
}
void loop() {
uint16_t x[100];
randomSeed(analogRead(inputPin));
for (int i=0; i<100; i++) {
x[i] = random(65535);
}
uint32_t t_start = micros();
if (bitshift) {
for (int i=0; i<100; i++) {
x[i] = x[i] >> 7;
}
}
else {
for (int i=0; i<100; i++) {
x[i] = x[i]/denom;
}
}
uint32_t t_end = micros();
for (int i=0; i<100; i++) {
Serial.println(x[i]);
}
Serial.println(t_end - t_start);
delay(500);
}
This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.