64-bit signed multiplication and division

Does anyone know where to find code to do signed 64-bit multiplication and division with an Arduino? I only have a handful of operations to perform so it doesn't have to be elaborate. Just some functions to do the dirty work.

avrfreaks.net is the place to ask.

Do you mean 64x64 to get a 128 bit result, or 32x32 to get a 64 bit result?

Neither. I mean 64x64 to get a 64 bit result.

The 64 bit analog to what is supported in Arduino:

int32_t x, y, z;
z = x * x / y;

But I'm not looking for a 64-bit data type, just code to perform the operations on existing data types. For example, a 64-bit number could be encoded into two 32 bit integers. Whatever works.

But I'm not looking for a 64-bit data type

Why not?
What's wrong with "long long" and unsigned long long"?

jboyton:
Neither. I mean 64x64 to get a 64 bit result.

If you multiply two binary numbers, the result is twice as long as the input. Take two bytes, for example: a byte has 256 values and 256*256=65536, which you should recognise as a 16-bit number.

Usually you would expect the output to be truncated to the same length as the inputs but it's easy to exceed the maximum value - multiplication gets very big very quickly. Hardware multipliers (not on the Arduino) usually have an output register twice as large as the input size.

(Long long works for me, tested on the nearest Arduino I could reach without getting out of my chair. It's 64 bit.)

AWOL:
Why not?
What's wrong with "long long" and unsigned long long"?

What? I didn't know it existed.
That'll save me a lot of time reinventing the wheel.

Is it mentioned in the Arduino online documentation?
Does the type have any particular limitations I should know about?

The Arduino documentation only gives you a small window into the universe of C code. The whole lot of it is still there underneath but it's not relevant to the beginner Arduino user.

Find yourself a good online reference for C++ and start from there.

What I find is more difficult is the AVR code that's also in the Arduino compiler. There's enormous amounts of stuff set up for you but working out what's the correct #DEFINE macro for something like adjusting the analog conversion speed can take a lot of digging.

I realize that Arduino documentation doesn't cover all of C++. But does that mean everything in C++ is ported by the Arduino compiler? For example, the documentation states that on the Uno a double is the same as a float. So I just sort of assumed that 32-bits were all there were to play with, whether floating point or fixed point. I'm tickled to learn this isn't the case.

Be warned that long long generates a lot of code.

You could also consider big numbers.

jboyton:
I realize that Arduino documentation doesn't cover all of C++. But does that mean everything in C++ is ported by the Arduino compiler? For example, the documentation states that on the Uno a double is the same as a float. So I just sort of assumed that 32-bits were all there were to play with, whether floating point or fixed point. I'm tickled to learn this isn't the case.

The main library supported by the Arduino IDE is the "AVR LIBC" library.

If you have installed the Arduino software, you have a copy of the AVR LIBC library manual on your harddisk.
The path on your harddisk to the AVR LIBC manual is:
[Path to Arduino]/hardware/tools/avr/doc/avr-libc/avr-libc-user-manual/index.html

The same is available online here.

All the standard integer types are documented here.

jurs -- thanks for that.

Nick -- how much code? Is the big numbers library smaller? Does big numbers have a signed 64 bit type? When I've looked briefly at it in the past I couldn't quite figure out how the data is represented. It seemed like you specified the number of decimal places or something like that, not the number of bits.

Rule of thumb; 103 is roughly 210, so figure your number of bit from that.
Or, simply take log2 of your biggest number, round up to the nearest integer, and that'll give you your number of bits needed.

jboyton:
Nick -- how much code? Is the big numbers library smaller? Does big numbers have a signed 64 bit type? When I've looked briefly at it in the past I couldn't quite figure out how the data is represented. It seemed like you specified the number of decimal places or something like that, not the number of bits.

It's hard to give a specific answer. If you use long long your code will quickly increase in size, simply in relation to the number of instructions involving it.

So:

Empty sketch: 472 bytes.

long long foo;
long long bar;
void setup () { }
void loop ()
  {
  foo++;
  }  // end of loop

626 bytes.

long long foo;
long long bar;
void setup () { }
void loop ()
  {
  foo++;
  foo += bar;  
  }  // end of loop

854 bytes.

long long foo;
long long bar;
void setup () { }
void loop ()
  {
  foo++;
  foo += bar;  
  foo *= bar;
  }  // end of loop

1532 bytes.


Change the two "long long" to int and we get:

490 / 502 / 520 bytes.


So with "long long" we are up to over 1500 bytes with a bit of simple arithmetic.

And if you look at the generated code to add 1 to foo we can see why:

000000a8 <loop>:
  a8:	1f 93       	push	r17
  aa:	80 91 00 01 	lds	r24, 0x0100
  ae:	20 91 01 01 	lds	r18, 0x0101
  b2:	30 91 02 01 	lds	r19, 0x0102
  b6:	40 91 03 01 	lds	r20, 0x0103
  ba:	60 91 04 01 	lds	r22, 0x0104
  be:	e0 91 05 01 	lds	r30, 0x0105
  c2:	a0 91 06 01 	lds	r26, 0x0106
  c6:	10 91 07 01 	lds	r17, 0x0107
  ca:	b8 2f       	mov	r27, r24
  cc:	bf 5f       	subi	r27, 0xFF	; 255
  ce:	91 e0       	ldi	r25, 0x01	; 1
  d0:	b8 17       	cp	r27, r24
  d2:	08 f0       	brcs	.+2      	; 0xd6 <loop+0x2e>
  d4:	90 e0       	ldi	r25, 0x00	; 0
  d6:	f9 2f       	mov	r31, r25
  d8:	f2 0f       	add	r31, r18
  da:	81 e0       	ldi	r24, 0x01	; 1
  dc:	f2 17       	cp	r31, r18
  de:	08 f0       	brcs	.+2      	; 0xe2 <loop+0x3a>
  e0:	80 e0       	ldi	r24, 0x00	; 0
  e2:	78 2f       	mov	r23, r24
  e4:	73 0f       	add	r23, r19
  e6:	81 e0       	ldi	r24, 0x01	; 1
  e8:	73 17       	cp	r23, r19
  ea:	08 f0       	brcs	.+2      	; 0xee <loop+0x46>
  ec:	80 e0       	ldi	r24, 0x00	; 0
  ee:	58 2f       	mov	r21, r24
  f0:	54 0f       	add	r21, r20
  f2:	81 e0       	ldi	r24, 0x01	; 1
  f4:	54 17       	cp	r21, r20
  f6:	08 f0       	brcs	.+2      	; 0xfa <loop+0x52>
  f8:	80 e0       	ldi	r24, 0x00	; 0
  fa:	38 2f       	mov	r19, r24
  fc:	36 0f       	add	r19, r22
  fe:	81 e0       	ldi	r24, 0x01	; 1
 100:	36 17       	cp	r19, r22
 102:	08 f0       	brcs	.+2      	; 0x106 <loop+0x5e>
 104:	80 e0       	ldi	r24, 0x00	; 0
 106:	28 2f       	mov	r18, r24
 108:	2e 0f       	add	r18, r30
 10a:	81 e0       	ldi	r24, 0x01	; 1
 10c:	2e 17       	cp	r18, r30
 10e:	08 f0       	brcs	.+2      	; 0x112 <loop+0x6a>
 110:	80 e0       	ldi	r24, 0x00	; 0
 112:	98 2f       	mov	r25, r24
 114:	9a 0f       	add	r25, r26
 116:	81 e0       	ldi	r24, 0x01	; 1
 118:	9a 17       	cp	r25, r26
 11a:	08 f0       	brcs	.+2      	; 0x11e <loop+0x76>
 11c:	80 e0       	ldi	r24, 0x00	; 0
 11e:	81 0f       	add	r24, r17
 120:	b0 93 00 01 	sts	0x0100, r27
 124:	f0 93 01 01 	sts	0x0101, r31
 128:	70 93 02 01 	sts	0x0102, r23
 12c:	50 93 03 01 	sts	0x0103, r21
 130:	30 93 04 01 	sts	0x0104, r19
 134:	20 93 05 01 	sts	0x0105, r18
 138:	90 93 06 01 	sts	0x0106, r25
 13c:	80 93 07 01 	sts	0x0107, r24
 140:	1f 91       	pop	r17
 142:	08 95       	ret

OK, so far so good. The bignumber library has an initial overhead of the code for the library. That won't change much as you do more arithmetic, except for the code for function calls. For example, to calculate "e" takes 11228 bytes.

The main overhead in big numbers is RAM, because each digit is stored as a byte. So you can see that you will run out of RAM if you use lots of digits, or lots of variables with lots of digits. But a 64-bit number is really only about 20 decimal digits. So you would only need (roughly) 20 bytes of RAM per variable. This is probably OK.

Does big numbers have a signed 64 bit type?

No, it just stores decimal numbers as I described. They are signed. You specify how many decimal places, if any, that you want.

The example code to calculate factorials, for example only uses 8740 bytes, and gives results like this:

2! = 2
3! = 6
4! = 24
5! = 120
6! = 720
7! = 5040
8! = 40320
9! = 362880
10! = 3628800
11! = 39916800
12! = 479001600
13! = 6227020800
14! = 87178291200
15! = 1307674368000
16! = 20922789888000
17! = 355687428096000
18! = 6402373705728000
19! = 121645100408832000
20! = 2432902008176640000
21! = 51090942171709440000
22! = 1124000727777607680000
23! = 25852016738884976640000
24! = 620448401733239439360000
25! = 15511210043330985984000000
26! = 403291461126605635584000000
27! = 10888869450418352160768000000
28! = 304888344611713860501504000000
29! = 8841761993739701954543616000000
30! = 265252859812191058636308480000000
31! = 8222838654177922817725562880000000
32! = 263130836933693530167218012160000000
33! = 8683317618811886495518194401280000000
34! = 295232799039604140847618609643520000000
35! = 10333147966386144929666651337523200000000
36! = 371993326789901217467999448150835200000000
37! = 13763753091226345046315979581580902400000000
38! = 523022617466601111760007224100074291200000000
39! = 20397882081197443358640281739902897356800000000
40! = 815915283247897734345611269596115894272000000000
...
194! = 13291789929008494930671731515822733014985079866423409165175226140878049646258859332955000974414316999457420949006882042582450342442734412047662165693973895355712948808511029850588270220388904130305896013007257787995674698685240384659920882080028170533221115412350068243944971300227595295350228888376232520868975083520000000000000000000000000000000000000000000000
195! = 2591899036156656511480987645585432937922090573952564787209169097471219681020477569926225190010791814894197085056341998303577816776333210349294122310324909594364025017659650820864712692975836305409649722536415268659156566243621875008684572005605493253978117505408263307569269403544381082593294633233365341569450141286400000000000000000000000000000000000000000000000
196! = 508012211086704676250273578534744855832729752494702698292997143104359057480013603705540137242115195719262628671043031667501252088161309228461647972823682280495348903461291560889483687823263915860291345617137392657194686983749887501702176113098676677779711031060019608283576803094698692188285748113739606947612227692134400000000000000000000000000000000000000000000000
197! = 100078405584080821221303894971344736599047761241456431563720437191558734323562679929991407036696693556694737848195477238497746661367777918006944650646265409257583733981874437495228286501182991424477395086576066353467353335798727837835328694280439305522603073118823862831864630209655642361092292378406702568679608855350476800000000000000000000000000000000000000000000000
198! = 19815524305648002601818171204326257846611456725808373449616646563928629396065410626138298593265945324225558093942704493222553838950820027765375040827960551033001579328411138624055200727234232302046524227142061137986535960488148111891395081467526982493475408477527124840709196781511817187496273890924527108598562553359394406400000000000000000000000000000000000000000000000
199! = 3943289336823952517761816069660925311475679888435866316473712666221797249817016714601521420059923119520886060694598194151288213951213185525309633124764149655567314286353816586186984944719612228107258321201270166459320656137141474266387621212037869516201606287027897843301130159520851620311758504293980894611113948118519486873600000000000000000000000000000000000000000000000
200! = 788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000

You can see they are a lot more than 20 digits.

Thanks, Nick.

The code I have does a lot of bit shifting and casting and I'm worried that it won't be a seamless translation from int64_t to BigNumber with 20 decimal places. Am I worrying for no reason?
This is the code:

uint32_t bmp280_compensate_P_int64(int32_t adc_P)
{
  int64_t var1, var2, p;
  var1 = ((int64_t)t_fine) – 128000;
  var2 = var1 * var1 * (int64_t)dig.P6;
  var2 = var2 + ((var1*(int64_t)dig.P5)<<17);
  var2 = var2 + (((int64_t)dig.P4)<<35);
  var1 = ((var1 * var1 * (int64_t)dig.P3)>>8) + ((var1 * (int64_t)dig.P2)<<12);

  var1 = (((((int64_t)1)<<47)+var1))*((int64_t)dig.P1)>>33;
  if (var1 == 0) return 0;

  p = 1048576-adc_P;
  p = (((p<<31)-var2)*3125)/var1;
  var1 = (((int64_t)dig.P9) * (p>>13) * (p>>13)) >> 25;

  var2 = (((int64_t)dig.P8) * p) >> 19;
  p = ((p + var1 + var2) >> 8) + (((int64_t)dig.P7)<<4);

  return (uint32_t)p;

}

I compiled it with int64_t and it added 8KB. Unfortunately that's too big (no pun intended).
Maybe I will experiment with BigNumber.

BigNumber isn't really designed for bit shifting, although it might work OK with divide or multiply by 2. Also it will tend to be slow.

Maybe you should describe your project? There might be other ways of doing it.

Neither. I mean 64x64 to get a 64 bit result.

That suggests you don't know enough about multiplication to be using it.

Barometric pressure sensor conversion. You can deduce that, even before you notice "bmp280" in the function name.

There have been a lot of interesting comments on how best to do this, using integers and floats of various sizes and optimising the calculation.

A lot of those bit shifted things are constants. Calculate the shifted value once, in "setup", and use the result in the real-time calculation. This makes more sense than trying to do the same calculation billions of times.

michinyon:
A lot of those bit shifted things are constants. Calculate the shifted value once, in "setup", and use the result in the real-time calculation. This makes more sense than trying to do the same calculation billions of times.

It would if what you are saying were true.

I tried the BigNumber approach but so far it isn't working correctly. And the code I wrote is 3KB larger than when I was using int64_t to do it. So I'm not inclined to spend the time debugging it.

As for my project, there is a 32-bit version of the code which works. It has less precision so the signal is a little noisier. I also have the option of post-processing the raw data on a computer, assuming it's worth it.