Getting serial garbage when using registers 24-30 for mul - inline assembly

For an assignment, I am working on a matrix multiplication that we have to write in inline assembly. I am new to AVR assembly.

For some reason, when I use registers r24-r30 as operands of the 'muls' or 'mul' instruction, Serial starts printing garbage.

#include <stdint.h>

#define DEBUG

volatile int8_t mat_a[4] = {28, 122, 80, 42};//, 54, 122, 98, 42, 99, 58, 124, 29, 21};
volatile int8_t mat_b[4] = {102, 106, 65, 114};//, 25, 45, 39, 58, 119, 121, 29, 70, 123};

volatile uint32_t mat_c = 0;


void setup(){
    Serial.begin(9600);
}

void loop(){

    asm(

        "lds r16, (mat_a+0) \n"
        "lds r17, (mat_a+1) \n"
        "lds r18, (mat_a+2) \n"
        "lds r19, (mat_a+3) \n"

        "lds r24, (mat_b+0) \n"
        "lds r25, (mat_b+1) \n"
        "lds r26, (mat_b+2) \n"
        "lds r27, (mat_b+3) \n"
    );

    asm(
        // ------- gives garbage ------
        "muls r16, r24 \n"
        // "muls r16, r25 \n"
        // "muls r16, r26 \n"
        // "muls r16, r27 \n"
        // "muls r16, r28 \n"
        // "muls r16, r29 \n"
        // "muls r16, r30 \n"

        // ------- works fine ------
        // "muls r16, r23 \n"
        // "muls r16, r31 \n"
    );

    asm(
        // // store to result
        "sts (mat_c + 0), r16 \n"
        "sts (mat_c + 1), r17 \n"
        "sts (mat_c + 2), r18 \n"
        "sts (mat_c + 3), r19 \n"
    );

    #ifdef DEBUG
        Serial.print("value: ");
        Serial.println(mat_c);
    #endif

}

(Please note that the code above is just a work in progress, meant to demonstrate the problem.)

How can I avoid this? It really limits the number of registers I can use (7 out of 16).
The device is an ATmega328P (Arduino Uno).

Actually, the issue seems to occur when the multiplication result is greater than 255.
If the operands are reduced so that the product stays within this range, the Serial output is fine.

How many bits can each register hold?

Sorry, I forgot to mention the microcontroller. It's an ATmega328P (Arduino Uno),
so the registers are 8-bit.

According to the instruction set manual, the 16-bit result of an 8x8 multiplication goes to registers r0 and r1.
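
To make the r1:r0 split concrete, here is a small host-side model in plain C. It is only an illustration, not AVR code: the names `mul_regs` and `model_muls` are made up for this sketch. It shows which byte of a signed 8x8 product would end up in each register.

```c
#include <stdint.h>

/* Hypothetical model of the register pair written by muls. */
typedef struct {
    uint8_t r0_low;   /* low byte of the 16-bit product  */
    uint8_t r1_high;  /* high byte of the 16-bit product */
} mul_regs;

/* Computes the signed product and splits it the way r1:r0 would hold it. */
static mul_regs model_muls(int8_t a, int8_t b)
{
    int16_t product = (int16_t)((int16_t)a * (int16_t)b);
    uint16_t bits = (uint16_t)product;  /* reinterpret as raw 16 bits */
    mul_regs regs;
    regs.r0_low = (uint8_t)(bits & 0xFF);
    regs.r1_high = (uint8_t)(bits >> 8);
    return regs;
}
```

For the first pair of operands in the sketch above, 28 * 102 = 2856 = 0x0B28, so the model puts 0x28 in r0 and 0x0B in r1. For a product in the range 0 to 255 the high byte, and therefore r1, is zero.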

Big end first or little end first?

The high byte (MSB) is in r1 and the low byte (LSB) is in r0.

For some reason, clearing r1 and r0 at the end, before calling the Serial print, seems to fix it.

The code below gives the correct answer, so I can move on from here. But if someone knows the exact reason for this, I would like to know, to understand what is really happening.

        asm(
            "clr r10 \n"
            "clr r11 \n"

            "lds r16, (mat_a + 0) \n"
            "lds r17, (mat_a + 1) \n"
            "lds r18, (mat_a + 2) \n"
            "lds r19, (mat_a + 3) \n"

            "lds r24, (mat_b + 0) \n"
            "lds r25, (mat_b + 1) \n"
            "lds r26, (mat_b + 2) \n"
            "lds r27, (mat_b + 3) \n"

            "muls r16, r24 \n"
            "mov r08, r00 \n"
            "mov r09, r01 \n"

            "muls r17, r25 \n"
            "add r08, r00 \n"
            "adc r09, r01 \n"
            "adc r10, r11 \n"

            "muls r18, r26 \n"
            "add r08, r00 \n"
            "adc r09, r01 \n"
            "adc r10, r11 \n"

            "muls r19, r27 \n"
            "add r08, r00 \n"
            "adc r09, r01 \n"
            "adc r10, r11 \n"


            // store to result
            "sts (mat_c + 0), r08 \n"
            "sts (mat_c + 1), r09 \n"
            "sts (mat_c + 2), r10 \n"
            // "sts (mat_c + 1), r31 \n"
        );

        asm(
            "clr r00 \n"
            "clr r01 \n"
        );
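
One way to check the value that the assembly stores in mat_c is to compare it against a plain C reference. The following is a minimal sketch; the function name `dot4_reference` is an assumption for illustration and is not part of the sketch above.

```c
#include <stdint.h>

/* Reference dot product of two 4-element signed 8-bit vectors. */
static int32_t dot4_reference(const int8_t a[4], const int8_t b[4])
{
    int32_t sum = 0;
    for (uint8_t i = 0; i < 4; i++) {
        /* Each product fits in 16 bits; the running sum is kept in 32 bits. */
        sum += (int16_t)a[i] * (int16_t)b[i];
    }
    return sum;
}
```

With the values from mat_a and mat_b this gives 2856 + 12932 + 5200 + 4788 = 25776, which is the number Serial should print for mat_c.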

You must set r1 to 0 after you are done.

https://www.microchip.com/webdoc/AVRLibcReferenceManual/FAQ_1faq_reg_usage.html

What registers are used by the C compiler?

...

r1 - assumed to be always zero in any C code, may be used to remember something for a while within one piece of assembler code, but must then be cleared after use (clr r1). This includes any use of the [f]mul[s[u]] instructions, which return their result in r1:r0. Interrupt handlers save and clear r1 on entry, and restore r1 on exit (in case it was non-zero).
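
A minimal sketch of how the rule above could be applied with GCC extended inline assembly, so that the compiler picks the registers and r1 is zero again before any C code runs. The helper name `mul_s8` is hypothetical. The AVR branch is an untested illustration; it assumes the "d" constraint (r16 to r31, as muls requires) and the `__zero_reg__` alias that avr-gcc defines for r1. On a non-AVR host the plain C fallback is used instead.

```c
#include <stdint.h>

/* Signed 8x8 -> 16-bit multiply that leaves r1 cleared on AVR. */
static int16_t mul_s8(int8_t a, int8_t b)
{
#if defined(__AVR__)
    int16_t product;
    __asm__ volatile(
        "muls %1, %2        \n\t"  /* result lands in r1:r0            */
        "movw %A0, r0       \n\t"  /* copy r1:r0 into the output pair  */
        "clr  __zero_reg__  \n\t"  /* r1 must be zero again for C code */
        : "=r"(product)
        : "d"(a), "d"(b)
        : "r0"
    );
    return product;
#else
    /* Portable fallback so the helper can be checked off-target. */
    return (int16_t)((int16_t)a * (int16_t)b);
#endif
}
```

Letting the compiler allocate the operand registers also avoids overwriting registers such as r24 to r27 that it may be using for its own values around the asm block.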

Thank you so much for the link! That explains why it was behaving strangely when I didn't clear these registers at the end.