Trying to dive into ATtiny assembly

The usual way to save "a lot" of space using assembly language is to define your own register usage scheme, rather than following C's "well-structured ABI."

For example, in the C code you posted, write() is relatively large, because the C ABI specifies that called functions can modify the registers used for argument passing. Since write() calls other functions before it has used all four of its arguments, it first has to copy them into call-saved registers (r16, r17, r28, r29), and it has to push and pop those registers itself.

void write(uint8_t first, uint8_t second, uint8_t third, uint8_t fourth) {
  aa:   0f 93           push    r16
  ac:   1f 93           push    r17
  ae:   cf 93           push    r28
  b0:   df 93           push    r29
  b2:   08 2f           mov     r16, r24
  b4:   16 2f           mov     r17, r22
  b6:   d4 2f           mov     r29, r20
  b8:   c2 2f           mov     r28, r18
  ba:   c3 df           rcall   TM1637_start
  bc:   80 e4           ldi     r24, 0x40
  be:   cd df           rcall   TM1637_write_byte
  c0:   c6 df           rcall   TM1637_stop
  c2:   bf df           rcall   TM1637_start
  c4:   80 ec           ldi     r24, 0xC0
  c6:   c9 df           rcall   TM1637_write_byte
  c8:   80 2f           mov     r24, r16
  ca:   c7 df           rcall   TM1637_write_byte
  cc:   81 2f           mov     r24, r17
  ce:   c5 df           rcall   TM1637_write_byte
  d0:   8d 2f           mov     r24, r29
  d2:   c3 df           rcall   TM1637_write_byte
  d4:   8c 2f           mov     r24, r28
  d6:   c1 df           rcall   TM1637_write_byte
  d8:   df 91           pop     r29
  da:   cf 91           pop     r28
  dc:   1f 91           pop     r17
  de:   0f 91           pop     r16
  e0:   b6 cf           rjmp    TM1637_stop
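For reference, the listing above corresponds to C source roughly like the following. This is a reconstruction from the disassembly, not your original code. The logging stubs are hypothetical stand-ins so the call sequence can be checked on a PC, and the function is named display_write here only to avoid clashing with POSIX write() in a host build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins: the real routines bit-bang the TM1637
   pins, these only record the order of calls. */
enum { EV_START = 0x100, EV_STOP = 0x200 };  /* markers outside the 0..255 byte range */

static uint16_t bus_log[16];
static size_t bus_len = 0;

static void log_event(uint16_t ev) {
    if (bus_len < sizeof bus_log / sizeof bus_log[0]) {
        bus_log[bus_len++] = ev;
    }
}

void TM1637_start(void)           { log_event(EV_START); }
void TM1637_stop(void)            { log_event(EV_STOP); }
void TM1637_write_byte(uint8_t b) { log_event(b); }

/* Called write() in the post. */
void display_write(uint8_t first, uint8_t second, uint8_t third, uint8_t fourth) {
    TM1637_start();
    TM1637_write_byte(0x40);    /* ldi r24, 0x40 in the listing */
    TM1637_stop();

    TM1637_start();
    TM1637_write_byte(0xC0);    /* ldi r24, 0xC0 in the listing */
    TM1637_write_byte(first);   /* every argument has to survive all earlier calls, */
    TM1637_write_byte(second);  /* which is why the compiler parks them in          */
    TM1637_write_byte(third);   /* r16, r17, r29 and r28                            */
    TM1637_write_byte(fourth);
    TM1637_stop();              /* tail call, compiled as rjmp */
}
```

Calling display_write(1, 2, 3, 4) with these stubs logs ten events: start, 0x40, stop, start, 0xC0, the four data bytes, stop. None of the saving and restoring is visible in the C source; it comes purely from the ABI.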

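If you re-write those sub-functions (write_byte, start, stop) to NOT modify the argument registers, and have write_byte take its byte in a scratch register instead of r24, write could become much shorter: 16 instructions (32 bytes) instead of 28 (56 bytes), a saving of a bit over 40%. The scratch register has to be one of r16..r31, because ldi cannot load r0..r15. r25 is used here because avr-gcc treats it as call-clobbered and it is not occupied when the first argument is a uint8_t:

     rcall   TM1637_start
     ldi     r25, 0x40       ; byte to send now travels in r25
     rcall   TM1637_write_byte
     rcall   TM1637_stop
     rcall   TM1637_start
     ldi     r25, 0xC0
     rcall   TM1637_write_byte
     mov     r25, r24        ; first
     rcall   TM1637_write_byte
     mov     r25, r22        ; second
     rcall   TM1637_write_byte
     mov     r25, r20        ; third
     rcall   TM1637_write_byte
     mov     r25, r18        ; fourth
     rcall   TM1637_write_byte
     rjmp    TM1637_stop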

Of course, it may not be easy to write those sub-functions with fewer registers.

You can also keep "commonly used constants" (port bit patterns, for example) permanently in particular registers, so that an in/out sequence doesn't need an ldi to set them up each time.