divu5() and divu3(). Fast replacements for unsigned division by 3, 5, 6, 10, 15

There was a minor glitch in all of the functions, which is now corrected. The fix costs only 2 clock cycles, so they are still very fast. I have updated the attachment in my previous post with the new version.

I have also added divu7() for division by 7. This takes slightly longer: 40.5 clock cycles on average (40 for numbers <= 0x7FFF, 41 for larger ones).
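For anyone who wants to sanity-check a routine like this over the whole input range, here is a rough C sketch. It is not the code from the attachment: the ref_* functions are hypothetical stand-ins that use the well-known multiply-and-shift technique, and I am assuming 16-bit unsigned operands (suggested by the 0x7FFF boundary above). It also shows one way the other divisors in the title can be built from the 3 and 5 routines: 6 and 10 with an extra right shift, 15 by chaining the two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference versions, NOT the attached routines.
   Each multiplies by a rounded-up reciprocal and shifts right. */
static uint16_t ref_divu3(uint16_t n)
{
    return (uint16_t)(((uint32_t)n * 0xAAABu) >> 17); /* 0xAAAB = ceil(2^17 / 3) */
}

static uint16_t ref_divu5(uint16_t n)
{
    return (uint16_t)(((uint32_t)n * 0xCCCDu) >> 18); /* 0xCCCD = ceil(2^18 / 5) */
}

static uint16_t ref_divu7(uint16_t n)
{
    /* The reciprocal of 7 needs 17 bits (0x12493 = ceil(2^19 / 7)),
       so the top bit is handled with an add and an extra shift. */
    uint32_t q = ((uint32_t)n * 0x2493u) >> 16;
    uint32_t t = (((uint32_t)n - q) >> 1) + q;
    return (uint16_t)(t >> 2);
}

/* Compare every 16-bit input against ordinary division.
   Returns the number of mismatches found. */
static uint32_t count_mismatches(void)
{
    uint32_t bad = 0;
    for (uint32_t i = 0; i <= 0xFFFFu; i++) {
        uint16_t n = (uint16_t)i;
        bad += ref_divu3(n) != n / 3;
        bad += ref_divu5(n) != n / 5;
        bad += ref_divu7(n) != n / 7;
        bad += (ref_divu3(n) >> 1) != n / 6;         /* 6  = 3 * 2 */
        bad += (ref_divu5(n) >> 1) != n / 10;        /* 10 = 5 * 2 */
        bad += ref_divu5(ref_divu3(n)) != n / 15;    /* 15 = 3 * 5 */
    }
    return bad;
}

/* Aborts if any input gives a wrong quotient. */
static void self_check(void)
{
    assert(count_mismatches() == 0);
}
```

Call self_check() from main() on a PC build. To test the real routines instead, swap the ref_* calls for divu3(), divu5() and divu7(), assuming they take and return a 16-bit unsigned value.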