Trying to dive into ATtiny assembly

I am trying to port a sketch I previously wrote for the Uno to an ATtiny13. Fitting the sketch into 1 KB of flash turned out to be a challenge. Having some experience with assembly (x86 assembly, that is), I thought it would be easy, but I am struggling with almost every line of code. My current problem: I am trying to write the TM1637 "stop" routine in assembly.

C++ code that works:

void TM1637_stop(void)
{
	TM1637_CLK_LOW();
	TM1637_DIO_LOW();
	TM1637_DELAY_US();
	TM1637_CLK_HIGH();
	TM1637_DIO_HIGH();
	TM1637_DELAY_US();
}

ASM code that produces garbage on display:

TM1637_stop:
	ldi R16, 0b11111100
	and PORTB, R16 ; LO -> (clk, dio)
	rcall TM1637_DELAY_US + 1 ; wait 50 us
	com R16
	out PORTB, R16 ; HI -> (clk, dio)
	rcall TM1637_DELAY_US + 1 ; wait 50 us
	ret

A similar "START" routine works fine in assembly, but this STOP routine is problematic and I don't understand why.

The GCC compiler produces very compact code. I would be surprised if you could make a major reduction in the size of the code using assembler.

Are you sure you are using the smallest possible types for all variables in the Uno program?

Post the Uno program.

...R

Almost all variables are uint8_t, except for the array holding reference values for the analog read (a one-pin membrane keypad with values up to 1023, so it had to be of type int). Sorry, I don't have the code at the moment, but it is about 1200 bytes and I am sure it can be squeezed down to 1024 bytes... But yeah, my noobish assembly always produces at least a few bytes more than the GCC compiler :)

The sketch is for an electronic lock, with a 7-segment display, keypad, servo, a few capacitors and resistors, a switch, and one MOSFET. It shuts down completely after a few seconds of inactivity in order to save battery.

Now, this int array of 17 elements takes up too much valuable dynamic memory: 34 of the ATtiny13's 64 bytes of SRAM.
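
One way to shrink that table, sketched here under some assumptions: if the keypad steps are far enough apart, the thresholds can be compared at 8-bit resolution, so each entry needs one byte instead of two (17 bytes rather than 34). The threshold values, the four-key table size, and the names `kUpperBound`, `scale_adc`, and `key_from_adc` are made up for illustration; the real values would come from your resistor ladder. On the AVR the table could additionally live in flash via `PROGMEM` and `pgm_read_byte()`, which this host-side sketch leaves out.

```cpp
#include <stdint.h>

// Hypothetical table: upper ADC bound for each key, already scaled to 8 bits.
constexpr uint8_t kKeyCount = 4;
constexpr uint8_t kUpperBound[kKeyCount] = {40, 100, 170, 230};

// Drop the two least significant bits of a 10-bit ADC reading (0..1023 -> 0..255).
constexpr uint8_t scale_adc(uint16_t raw10) {
    return static_cast<uint8_t>(raw10 >> 2);
}

// Returns the index of the first key whose upper bound covers the reading,
// or kKeyCount when the reading is above every bound (no key pressed).
uint8_t key_from_adc(uint16_t raw10) {
    const uint8_t v = scale_adc(raw10);
    for (uint8_t i = 0; i < kKeyCount; ++i) {
        if (v <= kUpperBound[i]) {
            return i;
        }
    }
    return kKeyCount;
}
```

With these placeholder thresholds, `key_from_adc(400)` scales the reading to 100 and lands on key index 1.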

Post. Your. Code.

Okay, since my initial question was about the correctness of the assembly code, and I want to learn some AVR assembly, I'll skip my original electronic lock sketch and focus on the display part, which I'm trying to get working without a library so I can shave off every byte I can. Here is the .ino code:

// TM 1637 7-seg display test code, no library

#define TM1637_DIO_PIN PB0
#define TM1637_CLK_PIN PB1

#define TM1637_DIO_OUTPUT() (DDRB |= _BV(TM1637_DIO_PIN))
#define TM1637_DIO_INPUT() (DDRB &= ~_BV(TM1637_DIO_PIN))
#define TM1637_DIO_READ() (((PINB & _BV(TM1637_DIO_PIN)) > 0) ? 1 : 0)
#define TM1637_CLK_HIGH() (PORTB |= _BV(TM1637_CLK_PIN))
#define TM1637_CLK_LOW() (PORTB &= ~_BV(TM1637_CLK_PIN))
#define TM1637_DIO_HIGH() (PORTB |= _BV(TM1637_DIO_PIN))
#define TM1637_DIO_LOW() (PORTB &= ~_BV(TM1637_DIO_PIN))

                     /*0*/ /*1*/ /*2*/ /*3*/ /*4*/ /*5*/ /*6*/ /*7*/ /*8*/ /*9*/
uint8_t digits[] = { 0x3f, 0x06, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 0x07, 0x7f, 0x6f };
 
extern "C" {
 // function prototypes
 void TM1637_DELAY_US();
 void TM1637_start();
 //void TM1637_stop();
}

void setup()
{
 DDRB |= (_BV(TM1637_DIO_PIN)|_BV(TM1637_CLK_PIN));
 
 // initialise display and set brightness
 // 0x88 is the dimmest setting; values up to 0x8F increase brightness
 TM1637_start();
 TM1637_write_byte(0x8c);
 TM1637_stop();

 // light every segment (0xff turns all segments on; 0x00 would blank the display)
 write(0xff, 0xff, 0xff, 0xff);
}

void loop()
{
 // display some numbers in order to check if code works
 write( digits[1], digits[2], digits[2], digits[0] );
}

void write(uint8_t first, uint8_t second, uint8_t third, uint8_t fourth)
{
 TM1637_start();
 TM1637_write_byte(0x40);
 TM1637_stop();

 TM1637_start();
 TM1637_write_byte(0xc0);
 TM1637_write_byte(first);
 TM1637_write_byte(second);
 TM1637_write_byte(third);
 TM1637_write_byte(fourth);
 TM1637_stop();
}

/*
void TM1637_start(void)
{
 TM1637_CLK_HIGH();//send start signal to TM1637
 TM1637_DIO_HIGH();
 TM1637_DELAY_US();
 TM1637_DIO_LOW();
 TM1637_CLK_LOW();
 TM1637_DELAY_US();
}
*/

void TM1637_stop(void)
{
 TM1637_CLK_LOW();
 TM1637_DIO_LOW();
 TM1637_DELAY_US();
 TM1637_CLK_HIGH();
 TM1637_DIO_HIGH();
 TM1637_DELAY_US();
}

uint8_t TM1637_write_byte(uint8_t value)
{
 uint8_t i, ack;

 for (i = 0; i < 8; ++i, value >>= 1) {
 TM1637_CLK_LOW();
 TM1637_DELAY_US();
 if (value & 0x01) {
 TM1637_DIO_HIGH();
 } else {
 TM1637_DIO_LOW();
 }
 TM1637_CLK_HIGH();
 TM1637_DELAY_US();
 }

 TM1637_CLK_LOW();
 TM1637_DIO_INPUT();
 TM1637_DIO_HIGH();
 TM1637_DELAY_US();
 ack = TM1637_DIO_READ();
 if (ack) {
 TM1637_DIO_OUTPUT();
 TM1637_DIO_LOW();
 }
 TM1637_DELAY_US();
 TM1637_CLK_HIGH();
 TM1637_DELAY_US();
 TM1637_CLK_LOW();
 TM1637_DELAY_US();
 TM1637_DIO_OUTPUT();
 return ack;
}

And here goes .S assembly file:

#define TM1637_DIO_PIN 0b00000001 ; PB0
#define TM1637_CLK_PIN 0b00000010 ; PB1
#define DDRB 0x17
#define PORTB 0x18
#define R16 0x10

.global TM1637_DELAY_US
.global TM1637_start
;.global TM1637_stop

TM1637_DELAY_US: ; 50 us at 1.2 mhz
 ; rcall takes 3 cycles
 ; ret takes 4 cycles
 ; we need another 53 cycles
    ldi  r18, 25 ; 1 cycle
L1: dec  r18 
    breq L1 ; 1 cycle for true and 2 for false
 ret
 
TM1637_start:
 ldi R16, TM1637_CLK_PIN | TM1637_DIO_PIN
 or PORTB, R16 ; HI -> (clk, dio)
 rcall TM1637_DELAY_US + 1 ; wait 50 us
 com R16 
 and PORTB, R16 ; LOW -> (clk, dio)
 rcall TM1637_DELAY_US + 1 ; wait 50 us
 ret

TM1637_stop:
 ldi R16, 0b11111100
 and PORTB, R16 ; LOW -> (clk, dio)
 rcall TM1637_DELAY_US + 1 ; wait 50 us
 com R16
 out PORTB, R16 ; HI -> (clk, dio)
 rcall TM1637_DELAY_US + 1 ; wait 50 us
 ret

Why is this last routine, TM1637_stop, flawed?

   or PORTB, R16 ; HI -> (clk, dio)
      :
   and PORTB, R16 ; LOW -> (clk, dio)

Those are not something you can do on an AVR: AND and OR only work between general-purpose registers. The instructions that take an I/O port as an operand are very limited: IN/OUT, SBI/CBI, SBIS/SBIC...
You probably want

  cbi PORTB, CLK  ; clock low
  cbi PORTB, DIO  ; data low

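At two instructions, four clock cycles, and no registers used, that's the shortest. It's also probably what the compiler produces. Note that CBI/SBI take a bit number (0 to 7), not a bit mask like the TM1637_CLK_PIN define in your .S file.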
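
As for why the assembler did not complain: the .S file defines `PORTB` as 0x18 and `R16` as 0x10, and the GNU assembler accepts plain numbers as register operands, so `and PORTB, R16` is most likely assembled as `and r24, r16` and never touches the port. The toy model below illustrates the difference between that and `cbi`. The `Avr` struct and the `op_and`/`op_cbi` helpers are invented for this sketch and are not part of any library.

```cpp
#include <stdint.h>

// Toy model of the two address spaces involved (not a real simulator).
struct Avr {
    uint8_t reg[32] = {};  // general-purpose registers r0..r31
    uint8_t io[64]  = {};  // I/O space 0x00..0x3F (PORTB is 0x18 on the ATtiny13)
};

constexpr uint8_t PORTB_ADDR = 0x18;  // same number the .S file gives PORTB
constexpr uint8_t R16_NUM    = 0x10;  // same number the .S file gives R16

// and Rd, Rr: both operands are register numbers, so 0x18 means r24.
inline void op_and(Avr& m, uint8_t d, uint8_t r) {
    m.reg[d & 31] = static_cast<uint8_t>(m.reg[d & 31] & m.reg[r & 31]);
}

// cbi A, b: clears bit number b (0..7) in I/O register A (0x00..0x1F).
inline void op_cbi(Avr& m, uint8_t a, uint8_t b) {
    m.io[a & 31] = static_cast<uint8_t>(m.io[a & 31] & ~(1u << (b & 7)));
}
```

In the model, `op_and(m, PORTB_ADDR, R16_NUM)` changes `m.reg[24]` and leaves `m.io[0x18]` alone, whereas two `op_cbi` calls on bits 1 and 0 clear CLK and DIO.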

   rcall TM1637_DELAY_US + 1

What are you expecting that "+ 1" to do?
Have you looked at the object code produced by the compiler? That's frequently a good idea in cases like this:

  • you can focus attention on areas that are "big"
  • you get some concrete examples of what sorts of instructions are available.
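
The delay loop in the posted .S file is worth a second look too. `dec r18` followed by `breq L1` only loops while the result is zero, so with 25 loaded it falls straight through to `ret`; `brne` is presumably what was intended. Also, on AVR a conditional branch costs 2 cycles when taken and 1 when not, the reverse of the comment in the file. Here is a small cycle-count sketch, assuming the timings already quoted in the file (`rcall` 3, `ret` 4) plus 1 cycle each for `ldi` and `dec`. The function names are hypothetical.

```cpp
#include <stdint.h>

// Total cycles for: rcall, ldi r18,n, n x (dec + brne), ret.  Requires n >= 1.
constexpr uint32_t delay_cycles_brne(uint8_t n) {
    return 3u                // rcall
         + 1u                // ldi
         + 3u * (n - 1u)     // dec + brne taken, for all but the last pass
         + 2u                // last pass: dec + brne not taken
         + 4u;               // ret
}

// Cycles for the routine as posted: rcall, ldi, dec, breq (not taken), ret.
constexpr uint32_t delay_cycles_as_posted() {
    return 3u + 1u + 1u + 1u + 4u;
}

// Convert a cycle count to microseconds for a given clock frequency in Hz.
constexpr double cycles_to_us(uint32_t cycles, double f_cpu_hz) {
    return cycles * 1.0e6 / f_cpu_hz;
}
```

With these formulas, a count of 18 gives 61 cycles (about 50.8 us at 1.2 MHz), 25 gives 82 cycles (about 68 us), and the routine as posted takes 10 cycles (about 8 us).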

Thank you very much, westfw!

That "+1" appendage was a desperate attempt to make the code work, and it did the trick in the START routine, right before I changed OUT to the illegal AND and OR :)

I have no idea why +1 works. It was inspired by the AVR instruction set manual, which says that RCALL goes to PC + k + 1...

losmi:
The sketch is for an electronic lock, with a 7-segment display, keypad, servo, a few capacitors and resistors, a switch, and one MOSFET. It shuts down completely after a few seconds of inactivity in order to save battery.

Just out of curiosity, how many hundreds of these things are you planning to make so that the difference in price between an Attiny and an Atmega328 matters?

...R

The usual way to save "a lot" of space using assembly language is to define your own custom register usage scheme, rather than C's "well-structured ABI."

For example, in the C code you posted, write() is relatively large, because the C ABI specifies that called functions can modify the registers used for argument passing, and since write() calls other functions, it has to save the four arguments that were passed to it.

void write(uint8_t first, uint8_t second, uint8_t third, uint8_t fourth) {
  aa:   0f 93           push    r16
  ac:   1f 93           push    r17
  ae:   cf 93           push    r28
  b0:   df 93           push    r29
  b2:   08 2f           mov     r16, r24
  b4:   16 2f           mov     r17, r22
  b6:   d4 2f           mov     r29, r20
  b8:   c2 2f           mov     r28, r18
  ba:   c3 df           rcall   TM1637_start
  bc:   80 e4           ldi     r24, 0x40
  be:   cd df           rcall   TM1637_write_byte
  c0:   c6 df           rcall   TM1637_stop
  c2:   bf df           rcall   TM1637_start
  c4:   80 ec           ldi     r24, 0xC0
  c6:   c9 df           rcall   TM1637_write_byte
  c8:   80 2f           mov     r24, r16
  ca:   c7 df           rcall   TM1637_write_byte
  cc:   81 2f           mov     r24, r17
  ce:   c5 df           rcall   TM1637_write_byte
  d0:   8d 2f           mov     r24, r29
  d2:   c3 df           rcall   TM1637_write_byte
  d4:   8c 2f           mov     r24, r28
  d6:   c1 df           rcall   TM1637_write_byte
  d8:   df 91           pop     r29
  da:   cf 91           pop     r28
  dc:   1f 91           pop     r17
  de:   0f 91           pop     r16
  e0:   b6 cf           rjmp    TM1637_stop

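If you re-write those sub-functions (write_byte, start, stop) so that they do NOT modify those registers and take their argument in a different register, write() could become much shorter (16 instructions instead of 28, a saving of a little over 40%):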

     rcall   TM1637_start
     ldi     r25, 0x40          ; argument register must be r16..r31 for ldi
     rcall   TM1637_write_byte
     rcall   TM1637_stop
     rcall   TM1637_start
     ldi     r25, 0xC0
     rcall   TM1637_write_byte
     mov     r25, r24           ; first
     rcall   TM1637_write_byte
     mov     r25, r22           ; second
     rcall   TM1637_write_byte
     mov     r25, r20           ; third
     rcall   TM1637_write_byte
     mov     r25, r18           ; fourth
     rcall   TM1637_write_byte
     rjmp    TM1637_stop

Of course, it may not be easy to write those sub-functions with fewer registers.

You can also put "commonly used constants" into particular registers for use by in/out.
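
Incidentally, the listing above also sheds some light on the earlier "+ 1" question. The "PC + k + 1" in the instruction set manual describes how the CPU turns the encoded offset k into a destination, and the assembler already accounts for it when it computes k from a label, so a plain `rcall TM1637_DELAY_US` should be all that is needed. As a rough check, the sketch below decodes the offset fields of two calls from the listing (`c3 df` at 0xba and `bf df` at 0xc2, both `rcall TM1637_start`). The helper names `rel12` and `branch_target` are made up for the example.

```cpp
#include <stdint.h>

// Sign-extend the 12-bit offset field k of an RCALL/RJMP opcode word.
constexpr int16_t rel12(uint16_t opcode) {
    return static_cast<int16_t>(((opcode & 0x0FFF) ^ 0x0800) - 0x0800);
}

// Byte address reached by an RCALL/RJMP stored at byte address `at`.
// The CPU works in word addresses: target = PC + k + 1.
constexpr uint16_t branch_target(uint16_t at, uint16_t opcode) {
    return static_cast<uint16_t>((at / 2 + rel12(opcode) + 1) * 2);
}

// Little-endian opcode words from the listing: "c3 df" -> 0xDFC3, "bf df" -> 0xDFBF.
static_assert(branch_target(0x00BA, 0xDFC3) == branch_target(0x00C2, 0xDFBF),
              "both rcalls should reach the same TM1637_start");
```

Both calls decode to byte address 0x42 even though their k fields differ (-61 and -65), which is the assembler doing the "+ 1" bookkeeping for you.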

Robin2:
Just out of curiosity, how many hundreds of these things are you planning to make so that the difference in price between an Attiny and an Atmega328 matters?

...R

Well, it's not about saving money. I just have an urge to optimize the things I make, and it is a kind of challenge that pushes me to learn new stuff. And I'm only making one, for my son's drawer :)

westfw, you gave me quite a lot to chew on, thank you very much! I'm arduinoing whenever I can find spare time, so it will take a few days before I try out all the stuff you suggested :)

losmi:
it is a kind of challenge that pushes me to learn new stuff.

Learning is good.

...R