aarg:
It would be faster as a preprocessor macro, instead of a function call.
That is not true, and it's actually harmful advice in my opinion. Using macros instead of functions is just asking for problems.
Modern compilers will inline functions if the implementation is available, you don't need a macro for that.
If you enable link-time optimizations (using GCC's -flto, which is enabled by default for AVR in the Arduino IDE), it doesn't even matter whether they're in the same source file/translation unit, it'll inline functions globally.
Proof: Compiler Explorer
Both the function and the macro code compile to the exact same assembly.
If the compiler insists on not inlining the function call, you could add the inline keyword to try and persuade it, or even stronger compiler-specific attributes if you're into that.
I'm sure you could find edge cases where a macro turns out to be marginally faster (probably at a high memory penalty), but in general, this is not the case, and "use a macro, it's faster" is terrible general advice.
If all else fails, in very rare cases and if you have measured it, you might consider it. Otherwise, use a function, and save yourself trouble in the long term.
It's just 4 left bit rotates, each rotate uses a logical shift left (lsl) of the low byte, which sets the carry flag to original bit 7, then a rotate left through carry (rol) of the high byte, which sets new bit 8 to the carry flag, and sets the carry flag to original bit 15. Finally, the carry flag is added (adc) to the low byte to set the new bit 0 if original bit 15 was set.
The carry bit/flag is a single bit in the CPU status register. A bit shift instruction places the bit that was shifted out into the carry flag. Similarly for other arithmetic instructions. You can branch on the carry flag, or you can use instructions that shift back in the carry flag, or add the carry flag to a register, etc.
There's also a signed flag, that keeps track of whether the result of a subtraction was negative, for instance.
I don't think AVR has any instructions that operate on the most significant bit explicitly (BST etc. can operate on any bit). When dealing with unsigned numbers "sign bit" is pretty much meaningless.
But surely the compiler/optimizer should know about carry bits, right?
Clearly there's knowledge deep down in the code generation. But no C-level accessibility of the sort that would make doing the end-around carry for an IP checksum "easy" (for example.)
Also on AVR the LSL Rd (Logical Shift Left) instruction doesn't really exist.
Neither does ROL (it's "adc Rn, Rn")
(And it doesn't actually rotate in the way you might expect coming from a different CPU architecture.)
(that also means that your quicker code sequence could become this "prettier" version:)
I don't think AVR has any instructions that operate on the most significant bit explicitly
BRMI and BRPL test the "N" bit in the status register, which is set to the MSB of the most-recent nbit-modifying instruction. A "shift left" instruction gives you "access" to three bits via conditional branches - the former MSB (C), the new MSB (N), and the former bit 3 (H)
(brownie points for anyone who manages to make use of that for ... anything.)
pcbbc:
Edit: Also on AVR the LSL Rd (Logical Shift Left) instruction doesn't really exist. The assembler implements it using the ADD Rd, Rd instruction.
westfw:
Neither does ROL (it's "adc Rn, Rn")
What would be the correct term that you'd use in a normal conversation? In our MIPS course they referred to it as "pseudoinstructions", but the AVR Instruction Set Manual lists LSL and ROL under "Bit and Bit-test Instructions". Is it common to just call everything that has its own mnemonic an "instruction"?
westfw:
BRMI and BRPL test the "N" bit in the status register, which is set to the MSB of the most-recent nbit-modifying instruction. A "shift left" instruction gives you "access" to three bits via conditional branches - the former MSB (C), the new MSB (N), and the former bit 3 (H)
(brownie points for anyone who manages to make use of that for ... anything.)
Clearly, someone has spent some time doing serious AVR programming
What would be the correct term that you'd use in a normal conversation? In our MIPS course they referred to it as "pseudoinstructions"
I like "pseudo-instruction" or "alias." TI calls them "emulated" instructions (on MSP430.)
Whether the manufacturer claims them as separate instructions or not seems to be random.
I think back in the day when RISC was "new", they'd be more inclined to make sure that the "instruction set" contained all the mnemonics from prior architectures that they were competing against (the early AVRs were positioned against 8051 (the AT90S8515 was pin-compatible!)), so it needed to document all the old 8080-like instructions.
Now that RISC is more of a "feature", they're more likely to list the pseudo-instructions separately, to make the instruction set look more "reduced." (like TI. The MSP430 has something like 27 opcodes, and 24 additional "emulated" instructions.
On AVR, there are several other pseudo-instructions (Heh. Homework for the assembly language class, to identify them!)
On ARM, "nop" was "mov r8, r8" for a long time; I think the compiler still uses that. Current thumb/thumb2 has both 16bit and 32bit "nop" instructions.
someone has spent some time doing serious AVR programming
Actually, that observation comes more from studying ARM code. ARM doesn't have a 16bit "test bits" or "and immediate" instruction (which means that M0 doesn't have them at all), but it does have a single-cycle "shift by any amount", so single-bit tests ("if (reg & (1<<13))") get implemented by shifting the value until the bit-of-interest hits something that sets the carry or sign bit.
The term, "pseudo instruction" has a solid history. But sometimes it refers to symbols that are for assembler control only and don't produce any processor instructions, like ORG, EQU, RMB, END... and many more.
Heh. This is cute. It cuts off a couple of cycles, and is shorter. Uses the multiply instruction, which is not present on all of the AVRs.
It trashes 4 registers, which is probably bad. If you're willing to trash a fifth register, you could get rid of another instruction...
Maybe if you had a long list of values that you wanted to rotate, the overhead that could be shared would factor out...
dorot4: ;; enter with x in r25:r24
ldi r17, 16
mul r25, r17
mov r18, r1
mov r25, r0
mul r24, r17
mov r24, r18
add r24, r0
add r25, r1 ;; exit with rotated value in r25:r24
(Comments stripped, in case you want to puzzle it out. It's based on a 16x8-->24 multiply I found, taking into account the special properties of multiplying by 16. I can't quite intuit whether it works for other shift factors (and it's tougher to see if it's working. I think... it might have to be more careful about carry bits.))