I was thinking about an approach that fits the following:
- hardware / software (well, that doesn't exclude much);
- 2 instructions in C (actually, it can be done in 1, depending on how you code it);
- does not rely on a hardware multiplier.
Whether it is obscure or not is subjective, so I would stay away from that criterion.