Hello,
I'll be using a Mega2560 on a project. I need the 2560 because of the number of I/O pins.
I need to output the 12 MSBs of a calculated uint32 to ports. For example, bits 31-24 go to PORTA and bits 23-20 go to the low half (LSB side) of PORTC.
So I have
unsigned long product=0;
...
product=0x1A2B3C4D; //for test purposes
...
PORTA = product >> 24;          // bits 31-24 (the 8 MSBs)
PORTC = (product >> 20) & 0x0F; // bits 23-20 in the low nibble; the mask zeroes out bits 27-24
There are 3 such cases. 24 + 20 = 44 shifts, times 3 = 132 clock ticks = (if I'm correct) 8.25 usec. That's a long time for this application.
So I wonder, is there anything faster?
AVR-GCC packs the struct LSB to MSB so byte4 would be [31:24] down to byte1 [7:0].
If you implement this union and then assign sample.longint = 2882343476 which is 0xABCD1234,
printing sample.byte4 ... sample.byte1 gives 171(AB), 205(CD), 18(12) and 52(34).
Your 32-bit number is broken down into its 4 byte-wide constituents with one assignment.
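For reference, a union along the lines described above might look like the following minimal sketch. The type name split32_t and the helper split_bytes are hypothetical (the original declaration is not shown in this thread); only the member names longint and byte1..byte4 come from the post. The byte order assumes a little-endian target, which AVR-GCC is.

```c
#include <stdint.h>

/* Hypothetical sketch: overlays a 32-bit value with four named bytes.
   Assumes a little-endian target (such as AVR-GCC), so byte1 overlays
   bits 7:0 and byte4 overlays bits 31:24. The anonymous struct needs
   C11 or the GNU extension. */
typedef union {
    uint32_t longint;
    struct {
        uint8_t byte1;  /* bits  7:0  */
        uint8_t byte2;  /* bits 15:8  */
        uint8_t byte3;  /* bits 23:16 */
        uint8_t byte4;  /* bits 31:24 */
    };
} split32_t;

/* One assignment to longint makes all four bytes available. */
static split32_t split_bytes(uint32_t value)
{
    split32_t sample;
    sample.longint = value;
    return sample;
}
```

With sample = split_bytes(2882343476UL), the members byte4 down to byte1 read 171, 205, 18 and 52 on a little-endian machine, matching the values quoted above.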
My apologies. The 1.8.3 compiler is generating amazingly bad code. It is not even using the SWAP instruction to isolate the nibble. That's unexpected and disappointing.
LTO generated this for an unconditional local jump...
FYI, the calculation is a product of two trigonometric values (the sines of two angles). I don't use float because it is slow. At first I mapped 0..1 to 0..255 using int(255*sin(x)), so the product fits in a uint16 (at most 255*255 = 65025), and stored the values for all angles of interest in an array. Working this way, the whole main calculation runs in about 16 usec, but there is some error because of the low resolution of the mapping. Then I switched to uint16 values, so the product becomes a uint32 and... you know what happens. I think losing 8 usec to shifting is a disaster relative to the time for the rest of the calculation. That's the story (and of course "those are the limits" is an accepted answer).
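To make the two scalings concrete, here is a minimal sketch of the table-lookup multiply. The table contents (entries for 0, 30 and 90 degrees only) and the names sin8, sin16, product8 and product16 are assumptions for illustration; the real tables and indexing are not shown in this thread.

```c
#include <stdint.h>

/* Hypothetical three-entry tables for 0, 30 and 90 degrees. */
static const uint8_t  sin8[3]  = { 0, 127, 255 };     /* int(255 * sin(x))   */
static const uint16_t sin16[3] = { 0, 32767, 65535 }; /* int(65535 * sin(x)) */

/* 8-bit scaling: the product is at most 255 * 255 = 65025, so it fits
   in a uint16. */
static uint16_t product8(uint8_t i, uint8_t j)
{
    return (uint16_t)((uint16_t)sin8[i] * sin8[j]);
}

/* 16-bit scaling: widen one operand to 32 bits BEFORE multiplying,
   otherwise the multiply is done at int width and can overflow. */
static uint32_t product16(uint8_t i, uint8_t j)
{
    return (uint32_t)sin16[i] * sin16[j];
}
```

For 30 degrees times 30 degrees the exact answer is 0.25. The 8-bit version gives 16129 out of 65025, while the 16-bit version gives 1073676289, which is where the extra resolution (and the uint32 that then has to be split across ports) comes from.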
@DKWatson: I'll give your proposal a try and measure.
Now I'm at the lab and have done all the respective measurements.
The main procedure, just the calculation as it stood yesterday, runs in exactly 19.65 usec.
Adding only ONE output:
a. 24+20 shifts: total time rises to 29.98 usec.
b. First union approach, just splitting into bytes and outputting 2 whole ports: total 20.03 usec.
c. Second union approach, getting and outputting 8+4 bits: total 20.16 usec.
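The exact code behind these measurements is not shown in the thread, so the following is only a sketch of what cases a and c might look like side by side. The names product_t, top12_shift and top12_union are hypothetical, and the two uint8_t pointers stand in for PORTA and PORTC so that the logic can also be checked off-target. A little-endian byte order (as on AVR-GCC) is assumed.

```c
#include <stdint.h>

/* Hypothetical overlay; on a little-endian target bytes[3] holds
   bits 31:24 and bytes[2] holds bits 23:16. */
typedef union {
    uint32_t value;
    uint8_t  bytes[4];
} product_t;

/* Case a: shifts on the full 32-bit value, as in the first post. */
static void top12_shift(uint32_t product, uint8_t *port_a, uint8_t *port_c)
{
    *port_a = (uint8_t)(product >> 24);          /* bits 31-24 */
    *port_c = (uint8_t)((product >> 20) & 0x0F); /* bits 23-20 */
}

/* Case c (assumed form): pick the bytes directly, then shift only
   one 8-bit value by 4 to isolate the high nibble. */
static void top12_union(uint32_t product, uint8_t *port_a, uint8_t *port_c)
{
    product_t p;
    p.value = product;
    *port_a = p.bytes[3];                   /* bits 31-24 */
    *port_c = (uint8_t)(p.bytes[2] >> 4);   /* bits 23-20 */
}
```

Both versions produce the same port values; the difference is that the union version never shifts a 32-bit quantity, which is consistent with the much smaller overhead measured for cases b and c.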
@Coding Badly: I had almost finished writing this reply when you last posted. I did not check its validity then, but I can look into it now.