or PORTB, R16 ; HI -> (clk, dio)
:
and PORTB, R16 ; LOW -> (clk, dio)
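You can't do either of those on an AVR: OR and AND only operate between two registers. The instructions that take a PORT as an argument are very limited: IN/OUT, SBI/CBI, SBIS/SBIC...
You probably want something like this (with sbi in place of cbi for the HI case):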
cbi PORTB, CLK ; clock low
cbi PORTB, DIO ; data low
At two instructions, 4 clocks and no registers used, that's about as small as it gets. It's also probably what the compiler produces.
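If you really do want to change both pins in one go through a register, as the or/and lines were trying to do, a rough sketch looks like the following. It assumes CLK and DIO are bit numbers (0..7) as in the cbi lines above, that R16 is free to clobber, and that PORTB sits in the I/O space reachable by IN/OUT (true for the classic ATmega parts, but check yours).

```
; HI -> (clk, dio): read the port, set both bits, write it back
in   r16, PORTB                  ; copy the port into a register
ori  r16, (1<<CLK)|(1<<DIO)      ; set the two pin bits
out  PORTB, r16                  ; write the result back

; LOW -> (clk, dio): same idea, clearing the bits
in   r16, PORTB
cbr  r16, (1<<CLK)|(1<<DIO)      ; cbr is andi with the complemented mask
out  PORTB, r16
```

That is three instructions and a scratch register per change, and the read-modify-write can be interrupted between the in and the out, which sbi/cbi avoid. It does switch both pins on the same clock edge, which may or may not matter for your timing.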
rcall TM1637_DELAY_US + 1
What are you expecting that "+ 1" to do?
Have you looked at the object code produced by the compiler? That's frequently a good idea in cases like this:
- you can focus attention on areas that are "big"
- you get some concrete examples of what sorts of instructions are available.