Author Topic: Arduino inline assembly: 16 bit x 8 bit multiplication!  (Read 2840 times)
Jr. Member
**
Karma: 0
Posts: 55

Hi all,
How do I multiply an 8-bit unsigned int by a 16-bit int in Arduino inline assembly?

I'm trying to do the very basic task of a*5 (where a is a 16-bit unsigned int).
The code is as follows, but it doesn't work:

Code:
.
unsigned int a = 511;

void setup() { 
 asm volatile( 
              "mov r24, %1"       "\n\t"
              "ldi   r26, 5"         "\n\t"
              "mul r24, r26"      "\n\t"    // doesn't work... how to split r24 and r25 as HIGH and LOW byte ?
             
              "mov %0, r0"        "\n\t"
              :"+r"(a)       
               );

  Serial.begin(19200);
  Serial.print("a = ");
  Serial.println(a);
}

Thanks in advance
Logged

Seattle, WA USA
Brattain Member
*****
Karma: 548
Posts: 46042
Seattle, WA USA

Why don't you look at the code generated when you perform the multiplication the usual way? Why are you trying to do this in assembler?
Logged

Jr. Member
**
Karma: 0
Posts: 55

Quote
Why don't you look at the code generated when you perform the multiplication the usual way? Why are you trying to do this in assembler?

It's much faster to do in assembly. :D

Where can I find a reference for inline assembly in GCC? The only resource I can find is:
http://www.nongnu.org/avr-libc/user-manual/inline_asm.html
which is not sufficient. :(
« Last Edit: March 21, 2012, 07:12:16 am by DirtyBits » Logged

Seattle, WA USA
Brattain Member
*****
Karma: 548
Posts: 46042
Seattle, WA USA

Quote
It's much faster to do in assembly.
Lots of things that don't work are faster than those that do work. So, I'm not buying this argument (yet). If you look at the code generated by the 8 bit * 16 bit multiplication operation, figure out what is wrong with your attempt, fix your attempt, and then find that it is still faster, then I'll listen.
Logged

Montreal
Edison Member
*
Karma: 23
Posts: 2486
Per aspera ad astra.

Have you read AVR201? It includes a 16 x 16 = 24 bit multiplication, which is close to what you want to accomplish; you can probably skip a couple of lines:
Code:
;******************************************************************************
;*
;* FUNCTION
;* mul16x16_24
;* DESCRIPTION
;* Unsigned multiply of two 16bits numbers with 24bits result.
;* USAGE
;* r18:r17:r16 = r23:r22 * r21:r20
;* STATISTICS
;* Cycles : 14 + ret
;* Words : 10 + ret
;* Register usage: r0 to r1, r16 to r18 and r20 to r23 (9 registers)
;* NOTE
;* Full orthogonality i.e. any register pair can be used as long as
;* the 24bit result and the two operands do not share register pairs.
;* The routine is non-destructive to the operands.
;*
;******************************************************************************

mul16x16_24:
mul r23, r21 ; ah * bh
mov r18, r0
mul r22, r20 ; al * bl
movw r17:r16, r1:r0
mul r23, r20 ; ah * bl
add r17, r0
adc r18, r1
mul r21, r22 ; bh * al
add r17, r0
adc r18, r1
ret
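
To see which lines can be skipped for a 16 x 8 multiply, it may help to model the routine in portable C. This is only a sketch: the function names are made up for illustration, and it mirrors the four partial products of the listing above rather than being AVR code. When the second operand fits in 8 bits, its high byte is zero, so the two products involving bh drop out.

```c
#include <stdint.h>

/* Model of the listing: low 24 bits of a * b built from 8x8 partial products. */
static uint32_t mul16x16_24_model(uint16_t a, uint16_t b)
{
    uint8_t al = (uint8_t)(a & 0xFF), ah = (uint8_t)(a >> 8);
    uint8_t bl = (uint8_t)(b & 0xFF), bh = (uint8_t)(b >> 8);

    /* mul r23, r21 / mov r18, r0: only the low byte of ah * bh is kept */
    uint32_t r = (uint32_t)(uint8_t)(ah * bh) << 16;
    r += (uint32_t)al * bl;          /* mul r22, r20 / movw r17:r16, r1:r0 */
    r += ((uint32_t)ah * bl) << 8;   /* mul r23, r20 / add r17, r0 / adc r18, r1 */
    r += ((uint32_t)bh * al) << 8;   /* mul r21, r22 / add r17, r0 / adc r18, r1 */
    return r & 0xFFFFFFu;            /* the result lives in three registers */
}

/* Same idea with an 8-bit second operand: bh is 0, so two products vanish. */
static uint32_t mul16x8_24_model(uint16_t a, uint8_t b)
{
    uint8_t al = (uint8_t)(a & 0xFF), ah = (uint8_t)(a >> 8);

    uint32_t r = (uint32_t)al * b;   /* al * b */
    r += ((uint32_t)ah * b) << 8;    /* ah * b, shifted up one byte */
    return r;                        /* at most 24 bits, no masking needed */
}
```

Both models agree whenever b is below 256, which is the case the original question asks about.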

Logged

Jr. Member
**
Karma: 0
Posts: 55

Quote
Have you read AVR201? It includes a 16 x 16 = 24 bit multiplication, which is close to what you want to accomplish; you can probably skip a couple of lines:
Code:
mul16x16_24:
mul r23, r21 ; ah * bh

Thanks a lot for your help.
This is exactly what I'm trying to do, but in inline assembly.
The main hurdle is that I can't find a way to load the 16-bit value of an integer (a) into two different registers.
In AVR assembly it would be:

.EQU  a = 511
ldi r25, HIGH(a) ; loads the upper 8 bits of integer 'a'
ldi r24, LOW(a)  ; loads the lower 8 bits of integer 'a'

But how do I do this in inline assembly?
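
One way to reach the two bytes of a 16-bit operand from avr-gcc inline assembly is the operand modifiers described in the avr-libc inline assembler page: %A1 names the low byte of operand 1 and %B1 its high byte. Below is a minimal sketch under some assumptions: the helper name mul16x8_lo16 is made up for illustration, the assembly branch is only compiled when avr-gcc defines __AVR_HAVE_MUL__ (a core with the hardware multiplier), and any other target falls back to plain C. It keeps only the low 16 bits of the product, as a 16-bit a * 5 in C would.

```c
#include <stdint.h>

/* Hypothetical helper: low 16 bits of (16-bit a) * (8-bit b). */
static uint16_t mul16x8_lo16(uint16_t a, uint8_t b)
{
#if defined(__AVR_HAVE_MUL__)
    uint16_t result;
    __asm__ (
        "mul  %A1, %2"  "\n\t"   /* r1:r0 = low byte of a * b              */
        "movw %A0, r0"  "\n\t"   /* result = that 16-bit partial product   */
        "mul  %B1, %2"  "\n\t"   /* r1:r0 = high byte of a * b             */
        "add  %B0, r0"  "\n\t"   /* add its low byte to the result's high  */
        "clr  r1"       "\n\t"   /* avr-gcc expects r1 to be zero again    */
        : "=&r" (result)         /* early clobber: written before a, b are done */
        : "r" (a), "r" (b)
    );
    return result;
#else
    /* Portable fallback so the sketch also builds off-target. */
    return (uint16_t)((uint32_t)a * b);
#endif
}
```

With a = 511, mul16x8_lo16(a, 5) should give 2555. The compiler picks the registers itself, so there is no need to hard-code r24, r25 or r26 as in the first attempt.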

Logged

Montreal
Edison Member
*
Karma: 23
Posts: 2486
Per aspera ad astra.

I don't know assembler, but a while ago I came across this blog
http://mekonik.wordpress.com/2009/03/18/arduino-avr-gcc-multiplication/ and here is the macro implementation:
Code:
#define MultiU16X16to32(longRes, intIn1, intIn2) \
asm volatile ( \
"clr r26 \n\t" \
                      // << removed for this post; see the blog for the full version
: \
"=&r" (longRes) \
: \
"a" (intIn1),    \   << Is this what you are looking for?
"a" (intIn2) \
: \
"r26" \
)
Logged

Jr. Member
**
Karma: 0
Posts: 55

Quote
I don't know assembler, but a while ago I came across this blog
http://mekonik.wordpress.com/2009/03/18/arduino-avr-gcc-multiplication/ and here is the macro implementation:
Code:
#define MultiU16X16to32(longRes, intIn1, intIn2) \
asm volatile ( \
"clr r26 \n\t" \
                      // << removed for this post; see the blog for the full version
: \
"=&r" (longRes) \
: \
"a" (intIn1),    \   << Is this what you are looking for?
"a" (intIn2) \
: \
"r26" \
)

Thanks for the link! I'm going through it.
Logged

SF Bay Area (USA)
Tesla Member
***
Karma: 106
Posts: 6373
Strongly opinionated, but not official!

Quote
It's much faster to do in assembly.

Are you sure now?  i *= 5 compiles to:
 
Code:
 12e:   9c 01           movw    r18, r24
 130:   22 0f           add     r18, r18
 132:   33 1f           adc     r19, r19
 134:   22 0f           add     r18, r18
 136:   33 1f           adc     r19, r19
 138:   28 0f           add     r18, r24
 13a:   39 1f           adc     r19, r25
(that's (i+i)+(i+i)+i )
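
The listing can be followed step by step in C. This is only a model of the seven instructions (the function name is made up for illustration), with each add/adc pair written as one 16-bit addition that wraps around like the registers do:

```c
#include <stdint.h>

/* Mirrors the compiler output: copy, double, double, add the original. */
static uint16_t times5_add_chain(uint16_t i)
{
    uint16_t t = i;            /* movw r18, r24                        */
    t = (uint16_t)(t + t);     /* add r18, r18 / adc r19, r19  -> 2*i  */
    t = (uint16_t)(t + t);     /* add r18, r18 / adc r19, r19  -> 4*i  */
    t = (uint16_t)(t + i);     /* add r18, r24 / adc r19, r25  -> 5*i  */
    return t;
}
```

For i = 511 this gives 2555, the same as i * 5 truncated to 16 bits.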

Quote
Where can I find a reference for inline assembly in GCC? The only resource I can find is:
http://www.nongnu.org/avr-libc/user-manual/inline_asm.html
which is not sufficient.

You probably need the GNU assembler manual (to cover all the non-CPU-specific features of the assembler), the Atmel AVR instruction set manual (which you have to take with a grain of salt, since it uses a different syntax for some things than the GNU assembler), and the device data sheet to tell you which instructions are present on the specific chip you are using. (For example, there is an 8x8 multiply instruction, but it is not present on ATtiny CPUs.)
Logged

Switzerland
Sr. Member
****
Karma: 6
Posts: 375

There's quite a bit of stuff on AVR assembler at http://www.avr-asm-tutorial.net/avr_en/

Quote
Are you sure now?  i *= 5 compiles to:

Code:
 12e:   9c 01           movw    r18, r24
 130:   22 0f           add     r18, r18
 132:   33 1f           adc     r19, r19
 134:   22 0f           add     r18, r18
 136:   33 1f           adc     r19, r19
 138:   28 0f           add     r18, r24
 13a:   39 1f           adc     r19, r25
(that's (i+i)+(i+i)+i )

I can't see a way of beating that.  Just for fun I tried a few other factors too, and GCC always comes up with extremely efficient code.

Ah, how I don't miss the days of writing assembler. :)
Logged

SF Bay Area (USA)
Tesla Member
***
Karma: 106
Posts: 6373
Strongly opinionated, but not official!

(OTOH, for an 8-bit variable times a 16-bit variable, writing assembler might very well be beneficial, because that's the sort of thing that C is defined NOT to do. It will almost certainly convert the 8-bit number to a 16-bit number and then do a 16x16 multiply.)

(On the third hand, the AVR multiply instruction is rather inconvenient to use from C, with more than the usual number of restrictions on which registers are used...)
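
The widening described above comes from C's integer promotions, and it can be checked without looking at any assembly. A small sketch (the function names are made up for illustration): the type of the product is never narrower than int, even when both operands are 8 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the type C gives to (16-bit value) * (8-bit value). */
static size_t size_of_u16_times_u8(void)
{
    return sizeof((uint16_t)42 * (uint8_t)16);
}

/* Even (8-bit value) * (8-bit value) is carried out in int. */
static size_t size_of_u8_times_u8(void)
{
    return sizeof((uint8_t)3 * (uint8_t)4);
}
```

Both functions return sizeof(int), which is 2 on AVR. That is why the compiler starts from a 16x16 multiply rather than a 16x8 one.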
Logged

Global Moderator
Netherlands
Shannon Member
*****
Karma: 168
Posts: 12433
In theory there is no difference between theory and practice, however in practice there are many...


I would code x = x * 5 as x = x + (x << 2); if I wanted to optimize.

I don't know the assembly for that, but it uses no multiply at all.
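
A small sketch of the shift-and-add form in C (function names are made up for illustration). One thing to watch: + binds tighter than <<, so x + x << 2 without parentheses is parsed as (x + x) << 2, which is x * 8. The parentheses around the shift are what make it x * 5.

```c
#include <stdint.h>

/* x * 5 as x + 4*x; the shift needs its own parentheses. */
static uint16_t times5_shift_add(uint16_t x)
{
    return (uint16_t)(x + (x << 2));
}

/* How "x + x << 2" is actually parsed: (x + x) << 2, i.e. x * 8. */
static uint16_t as_parsed_without_parens(uint16_t x)
{
    return (uint16_t)((x + x) << 2);
}
```

For x = 3 the first function returns 15 and the second returns 24.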
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Global Moderator
Brattain Member
*****
Karma: 452
Posts: 18694
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
It's much faster to do in assembly. :D

That is very unlikely to be true. The compiler writers know the processor backwards. As others have pointed out earlier the compiler already optimizes your requirements into six adds and a move. That's only 7 clock cycles. So even looking up "multiplication" in the assembler manual is going to lead you down the path of something that is potentially "much slower".

Quote
I would code x = x * 5 as x = x + (x << 2); if I wanted to optimize

Rob's idea is what I would have suggested myself. Bit shift by 2 to multiply by 4, and then add in the last one.

The other problem with assembly is that you will probably generate code to load the variables from memory, into registers. What else could you do? But the compiler may know the variables are already in certain registers and can skip that step.

This thread is starting to sound like a lot of other ones, like "how do I break out of an interrupt?". Let's step back ... WHY do you need to multiply something by 5 "much faster" than can be done in C?
Logged

Global Moderator
Netherlands
Shannon Member
*****
Karma: 168
Posts: 12433
In theory there is no difference between theory and practice, however in practice there are many...

Quote
WHY do you need to multiply something by 5 "much faster" than can be done in C?
Because it can be done?
Logged

Rob Tillaart

Global Moderator
Brattain Member
*****
Karma: 452
Posts: 18694

The book "Beginners Introduction to the Assembly Language of ATMEL-AVR-Microprocessors" discusses multiplication.

http://www.avr-asm-tutorial.net/

He gives an example of 16 bit x 8 bit multiplication. He says it takes 10 clock cycles. However this is for two variables, not a variable and a constant. So I suppose 10 cycles isn't too bad. But that is more than the 7 cycles shown above for multiplying by a constant 5.

The code generated by C seems to me to be more than 10 cycles, but there is a bit of a crossover from the example on that web page as to what he is counting (in other words, is he counting loading and storing all variables?). In fact glancing at it, it seems to me that the code he is showing takes more than 10 cycles.

FWIW this is what I got from the C compiler:

Code:
c = a * b;
  d6: 20 91 00 01 lds r18, 0x0100
  da: 30 91 01 01 lds r19, 0x0101
  de: 80 91 02 01 lds r24, 0x0102
  e2: 90 e0        ldi r25, 0x00 ; 0
  e4: ac 01        movw r20, r24
  e6: 42 9f        mul r20, r18
  e8: c0 01        movw r24, r0
  ea: 43 9f        mul r20, r19
  ec: 90 0d        add r25, r0
  ee: 52 9f        mul r21, r18
  f0: 90 0d        add r25, r0
  f2: 11 24        eor r1, r1
  f4: 90 93 17 01 sts 0x0117, r25
  f8: 80 93 16 01 sts 0x0116, r24

LDS, STS and MUL are 2 cycles. LDI, MOVW, ADD and EOR are 1 cycle. So I count 22 cycles there. But again, if you let the compiler do it, it may not need to do some of those loads and stores, if it knows it has the variable in a register already.
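
The count can be reproduced by tallying the listing mechanically. A small sketch, using the per-instruction cycle counts stated above (the struct and function names are made up for illustration):

```c
#include <stddef.h>

struct op {
    const char *mnemonic;
    unsigned    cycles;
};

/* The fourteen instructions of the listing, in order. */
static const struct op listing[] = {
    {"lds", 2}, {"lds", 2}, {"lds", 2},   /* load a (2 bytes) and b     */
    {"ldi", 1}, {"movw", 1},              /* widen b to 16 bits         */
    {"mul", 2}, {"movw", 1},              /* low * low                  */
    {"mul", 2}, {"add", 1},               /* cross product 1            */
    {"mul", 2}, {"add", 1},               /* cross product 2            */
    {"eor", 1},                           /* clear r1 again             */
    {"sts", 2}, {"sts", 2},               /* store c (2 bytes)          */
};

static unsigned total_cycles(void)
{
    unsigned total = 0;
    for (size_t i = 0; i < sizeof listing / sizeof listing[0]; i++)
        total += listing[i].cycles;
    return total;
}
```

total_cycles() returns 22. Of those, 10 cycles are the three loads and two stores, which is the part the compiler may be able to skip when the values are already in registers.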

It looks to me from the LDI of zero, that the compiler is extending the byte variable to an int, which is probably why the code above is a bit longer than it needs to be.

Test sketch:

Code:
volatile int a = 42;
volatile byte b = 16;
volatile int c;
void setup ()
 {
 Serial.begin (115200);
 c = a * b;
 Serial.println (c);
 }
void loop () {}
Logged
