aarg:
If you just want the top 32 bits of it, why shift it at all? Just make a union of the 32 bit value and the 64 bit value, and read it in place. Zero microseconds.
Read the thread. I've already answered that.
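For reference, a minimal sketch of the union idea being proposed, assuming a little-endian target such as the AVR (the names are illustrative, not from the thread, and it relies on gcc tolerating a read through the inactive union member):

#include <stdint.h>

union Split64 {
  uint64_t u64;     // the full 64-bit value
  uint32_t u32[2];  // on a little-endian AVR, u32[1] overlays the top 32 bits
};

uint32_t top32_inplace(uint64_t value) {
  union Split64 s;
  s.u64 = value;
  return s.u32[1];  // read the high half in place, no shift
}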
cjheath:
Read the thread. I've already answered that.
You rejected an efficient solution because, "it's the compiler's job". I didn't see any other answer from you.
But if you look closely at my suggestion and reply #8, you will see that they are not the same, as reply #8 involves memory moves because of the assignment.
double Result = freq * pow(2, 32) / 5e8;
Could change that manually to
double Result = freq * 8.589934592; // double Result = freq * pow(2, 32) / 5e8;
Keeping the original as a comment helps to explain the magic number 8.5899...
Why are you using volatile in the AVR example?
You didn't use it in the example code you showed for the Due.
That might be clobbering the compiler's ability to optimize the calculation.
--- bill
bperrybap:
Why are you using volatile...
Only the assignment to u64 is volatile, not the method used to calculate the value.
Otherwise gcc would figure that the value doesn't need to be calculated at all.
If you want to temporarily change the subject to "needing the top 32 bits only", then the main problem is not the slow shift, but the excessive partial multiplication. I've got rid of the frequent 64-bit divide, though I still have a 64-bit multiply, when I could achieve adequate accuracy (slightly less, though) with a 32-bit multiply yielding just the top 32 bits. I'm looking at that, but there's no quick, easy, and portable solution. It was not the subject under discussion, however.
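A minimal sketch of the volatile-sink arrangement described here (hypothetical names; the calculation is only a stand-in for the one actually being timed):

#include <stdint.h>

volatile uint64_t u64;              // only the destination is volatile

void timeIt(uint32_t freq) {
  // gcc remains free to optimise how the value is computed, but because
  // the result is stored into a volatile object it cannot conclude the
  // value is unused and drop the calculation altogether.
  u64 = (uint64_t)freq * 8590UL;    // illustrative numbers only
}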
aarg:
You rejected an efficient solution because, "it's the compiler's job". I didn't see any other answer from you.
Yes, because the entire motivation for my OP was the fact that the compiler is not doing that job. I'm well aware of other ways to do it and wasn't looking for them (which is why I said "a statement, not a question"), but it's a bit crap that the compiler cannot be relied on. I cannot tell if you have reading difficulties, or are just deliberately trying to make trouble, but it's becoming tiresome.
aarg:
But if you look closely at my suggestion and reply #8, you will see that they are not the same, as reply #8 involves memory moves because of the assignment.
The union solution you propose is not actually quicker than PaulMurrayCbr's after the peephole optimiser removes the unnecessary moves. Also, your solution is still non-portable, because it relies on i32-union-i64 being aligned so as to get the MSB. I have a rule of not deliberately writing non-portable code.
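For contrast, a sketch of the portable shift form, whose generated code is what the thread is arguing about:

#include <stdint.h>

// Works regardless of byte order or how the union halves happen to line up.
// Whether gcc reduces the 32-place shift to plain register moves, rather
// than emitting a shift loop, is exactly the optimisation in question.
uint32_t top32(uint64_t x) {
  return (uint32_t)(x >> 32);
}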
That's interesting, and possibly an answer. I don't know if I have the energy to test it however, because it requires a bunch of mucking around trying to force things into/not into registers.
robtillaart:
double Result = freq * pow(2, 32) / 5e8;
Could change that manually to
double Result = freq * 8.589934592; // double Result = freq * pow(2, 32) / 5e8;
keeping the original as comment helps to explain magic number 8.5899...
Rob, you again answered a question I didn't ask. Using "double" makes the performance problem much worse, and drags in kilobytes of code I don't need. But if I wanted to avoid the call to pow(), I wouldn't do it that way. Instead I'd write "(1LL<<32)/5e8". The compiler will still constant-fold that to the right value, and I can see what I meant without a comment.
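For illustration, that rewritten line would look something like this (still a double calculation, just without the pow() call or an unexplained magic number):

double Result = freq * ((1LL << 32) / 5e8);  // constant-folds to freq * 8.589934592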
You also tried to educate me on how to calibrate a timing loop, unnecessarily.
I'm all for free-flowing discussions, but seriously, is this kind of behaviour considered normal in this community?
cjheath:
...but it's a bit crap that the compiler cannot be relied on...
The generated code correctly performs the operation. That the generated code does not meet your performance expectations does not mean the compiler is "a bit crap", and it certainly does not mean the compiler "cannot be relied on".
Correct is the compiler's responsibility. Performance is your responsibility.
cjheath:
I'm all for free-flowing discussions, but seriously, is this kind of behaviour considered normal in this community?
Volunteers trying to help people in need. Yes, that is normal in this community. If you don't like people trying to help you then you are welcome to find your way to the door.
I've seen cases where optimizations are different depending on whether the variable is static or automatic.
I fought a war and lost on the AVR Freaks site about what I considered some 8-bit vs 16-bit sign and rollover issues (bugs) with 8-bit loop variables. (Sometimes the optimizer treats them as 16-bit and screws up the loop.)
I found what I considered a bug, but the compiler guys came back and claimed that the C standard said this particular behavior was "implementation dependent". My argument was that that's fine, but it was inconsistent between different types of variables; while it may be implementation dependent, it should at least be consistent regardless of the declaration.
I lost.
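To give the flavour of that kind of corner case (an illustration only, not the actual code from that dispute):

void example(void) {
  // Plain 'char' may be signed or unsigned (implementation-defined), 8-bit
  // operands are promoted to int (16-bit on the AVR) before comparison, and
  // converting an out-of-range result back into a signed 8-bit variable is
  // also implementation-defined. If char is unsigned this loop terminates;
  // if char is signed, i typically wraps at 127 and the loop never ends.
  for (char i = 100; i < 200; i++) {
    // loop body
  }
}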
There were other cases where the type dramatically affected optimization and the compiler guys came back and said it wasn't a bug as the result was still correct. They claimed it was merely a case of missed optimization.
While that is correct, I still had the suspicion that the missed optimization was actually due to a bug in the code.
In these corner cases, using a global tended to get the best optimization from the avr-gcc compiler.
Some of that is because of how the AVR addresses memory, and some of it is cases of missed optimizations for other types.
--- bill
To a large extent this is true. However, I have seen cases with avr-gcc where I was very surprised that the compiler effectively disabled certain optimizations or created completely different behaviors, as I mentioned above, depending on the declaration.
This was very unexpected, and there is little the application can do other than try to re-write the code a different way to work around the "crappy" code the compiler happens to be generating for that source sequence.
I've seen this happen from time to time with gcc on many different processors over the past few decades. I've just been surprised more times on the avr, and some of it is related to it being 8 bit and how the compiler writers chose to deal with some of the implementation dependent corner cases.
IMHO, some of their decisions seem a bit odd and definitely don't seem to offer the solution of least surprise.
--- bill
That's not how I would describe it. I try to show respect to the person I'm responding to by investing the effort to comprehend what they're asking, by not feeding them irrelevant facts, by not answering questions they didn't ask, and by not treating them like an idiot when they clearly aren't. I haven't seen that behaviour very much here. It won't stop me trying to be polite, but it makes me less likely to contribute, that's all.