inline vs non-inline functions: something seems wrong here

I was playing around with a binary-coded decimal arithmetic library I made.
I was trying to speed up the multiplication function by making it inline, but I found that I got weird behavior when I did that.

Please see the sketches in the attached ZIP file. The error appears in sketches #4 and #6.

In case it matters, I’m using Arduino 1.0.5, and I have it set to compile to Arduino Duemilanove w/ATmega328.

bcd_compiler_tests.zip (8.09 KB)

What "error" occurs? What do you consider "weird behaviour"?

Can you reduce and isolate the code enough to post a concise example of the problem?

int2str:
What “error” occurs?
What do you consider “weird behaviour”?

The error is an obvious arithmetic error in the results. For me, #5 runs fine, but #6 acts crazy.
I am aware that the arithmetic is supposed to be correct modulo 160 million. That is because the encoding I am using has a hexadecimal most-significant digit, and the remaining seven digits are 4-bit binary-coded decimal.

[/quote]
Can you reduce and isolate the code enough to post a concise example of the problem?
[/quote]
That I did, in sketches #5 and #6.

odometer:
Can you reduce and isolate the code enough to post a concise example of the problem?

I think it’s being requested that you post the code and the incorrect output in a post, instead of requiring people to download and open a zip file, and then dig through it.

odometer:

6 acts crazy.

I don't know what you consider crazy. Can you post a sketch which you believe produces the wrong result, and say what result you expected and what result it actually produced? Looking at multiple sketches and trying to guess what you expect them to do is a waste of time.

Well, here’s my library:

#include "Arduino.h"

inline boolean bcdBadL (unsigned long x) {
  // check all EXCEPT high nibble
  return (((x ^ (x+0x06666666)) & 0x11111110) != 0);
}

inline unsigned long bcdAddL (unsigned long x, unsigned long y) {
  // no sanity checking of input
  // "out of range" results are e.g. A0000000 for 100 million
  unsigned long binsum = x + y;
  unsigned long carry = ((binsum + 0x06666666) ^ x ^ y) & 0x11111110;
  return (binsum + ((carry>>2) | (carry>>3)));
}

inline unsigned long bcdSubL (unsigned long x, unsigned long y) {
  // no sanity checking of input
  // "negative" results are e.g. F9999999 for -1
  unsigned long bindiff = x - y;
  unsigned long borrow = (bindiff ^ x ^ y) & 0x11111110;
  return (bindiff - ((borrow>>2) | (borrow>>3)));
}

inline unsigned long bcdTwiceL (unsigned long x) {
  // no sanity checking of input
  unsigned long adj = (x + 0x03333333) & 0x08888888;
  return (x << 1) + adj - (adj >> 2); // magic here
}

inline unsigned long bcdTenfoldL (unsigned long x) {
  // because of the way we use the high nibble, this is nontrivial
  unsigned long highpart = x & 0xF0000000;
  return ((highpart<<3) + (highpart<<1)) + (x<<4);
}

unsigned long bcdMultL(unsigned long x, unsigned long y) {
  // this function multiplies two BCD numbers by means of
  // BCD addition, subtraction, doubling, multiplication by ten,
  // and a little extra magic
  unsigned long total = 0;
  byte d = 0;
  unsigned long twice = 0;
  while (true) {
    d = (15&(byte)y);
    if (d&1) total = bcdAddL(total,x);
    if (d>1) {
      twice = bcdTwiceL(x);
      if (d&2) total = bcdAddL(total,twice);
      if (d&4) total = bcdAddL(total,bcdTwiceL(twice));
      if (d&8) {
        total = bcdSubL(total,twice);
        y += 8; // magic
        if (!(0xF0&(byte)y)) y |= 0x60; // more magic
      }
    }
    y >>= 4;
    if (!y) return total;
    x = bcdTenfoldL(x); 
  }
}

And here’s what I was trying to run:

#include "bcd.h"


unsigned long index = 0x0;
unsigned long count;
unsigned long result;

void setup() {
  Serial.begin(9600);
  Serial.println("Calculation of pyramidal numbers");
  Serial.println("I define the nth pyramidal number as");
  Serial.println("1*1 + 2*2 + 3*3 + ... n*n");
  Serial.println();
}

void loop () {
  index = bcdAddL(index, 0x1);
  result = 0x0;
  for (count = 0x1; count <= index; count = bcdAddL(count, 0x1)) {
    result =bcdAddL(result, bcdMultL(count, count));
  }
  Serial.print(index,HEX);
  Serial.print("  ");
  Serial.println(result,HEX);
  delay(800);
}

And as for “crazy”:
Sketch #5 works all right.
Sketch #6 goes crazy.
And by “crazy” I mean “produces wildly wrong answers”. I’m not at home now, so I’ll have to post the crazy output some other time.

in test app 6:

The problem is in - inline unsigned long bcdTenfoldL (unsigned long x) {

if you only make that one not inline than the results are as expected.

not found the root cause/remedy yet.

In the spirit of trying to help...

I work in a technical field where troubleshooting makes up a considerable portion of my daily bread. Nothing is more frustrating than talking with other techs whose vocabulary is limited to phrases like "it doesn't work" or "it's slow". Some of these folks get paid good money to be a subject matter authority.

In this case... "I tried to multiply 3 times 2 and got 87 and a half" would be a good descriptive answer. "I get crazy results" isn't so much. What does that actually mean? "3 * 2 = leather pants on megalomaniacal rodents?"

This isn't meant to insult anyone, just a plea for everyone to always endeavor to provide useful information in any problem report. You'll make your reader's day, and you're very likely to get more and better answers.

Also, the test cases are great, don't get me wrong... but I'll admit, if I have to go through "all the extra effort" :roll_eyes: to unzip a file, I will probably not bother unless I'm really interested in the description.

odometer: Sketch #6 goes crazy. And by "crazy" I mean "produces wildly wrong answers".

Which is about as pointless a clarification as you could imagine.

Write a sketch which performs ONE calculation which returns a result which you believe is wrong, and tell us what result you expected and what result it actually returned.

For bonus points, put the whole sketch in one file to make it easier for us to run it for ourselves.

odometer:
And as for “crazy”:
Sketch #5 works all right.
Sketch #6 goes crazy.
And by “crazy” I mean “produces wildly wrong answers”. I’m not at home now, so I’ll have to post the crazy output some other time.

I ran the code you posted in reply #5.

Here is the output:

Calculation of pyramidal numbers
I define the nth pyramidal number as
1*1 + 2*2 + 3*3 + ... n*n

1  1
2  5
3  14
4  30
5  55
6  91
7  140
8  204
9  285
10  385
11  506
12  650
13  819
14  1015
15  1240
16  1496
17  1785
18  2109
19  2470
20  2870
21  3311
22  3795
23  4324
24  4900
25  5525
26  6201
27  6930
28  7714
29  8555
30  9455
31  10416
32  11440
33  12529
34  13685
35  14910
36  16206
37  17575
38  19019
39  20540
40  22140
41  23821
42  25585
43  27434
44  29370
45  31395
46  33511
47  35720
48  38024
49  40425
50  42925
51  45526
52  48230
53  51039
54  53955
55  56980
56  60116
57  63365
58  66729
59  70210
60  73810
61  77531

Is that “crazy” or “not crazy”? Can you specify at what point the inlined version (which this appears to be) fails? Which index? What is the number you get compared to the one you expect?

Is this compiled as a library literally, or as another tab in the IDE?

It’s unusual to have code in a .h file, usually it goes into a .cpp file.

I reproduced your test in Lua with the same results, so I’m guessing this is the “not crazy” version. Arrgh! This terminology is driving me nuts!

for i = 1, 61 do
  local result = 0
  for j = 1, i do
    result = result + (j * j)
  end -- for
  print (i, result)
end -- for

Here’s something odd:

  Serial.print(index,HEX);
  Serial.print("  ");
  Serial.println(result,HEX);

The output I got was not hex. I compared it to the Lua output which was in decimal.

1 1
2 5
3 14
4 30
5 55
6 91
7 140
8 204
9 285
10 385
11 506
12 650
13 819
14 1015
15 1240
16 1496
17 1785
18 2109
19 2470
20 2870
21 3311
22 3795
23 4324
24 4900
25 5525
26 6201
27 6930
28 7714
29 8555
30 9455
31 10416
32 11440
33 12529
34 13685
35 14910
36 16206
37 17575
38 19019
39 20540
40 22140
41 23821
42 25585
43 27434
44 29370
45 31395
46 33511
47 35720
48 38024
49 40425
50 42925
51 45526
52 48230
53 51039
54 53955
55 56980
56 60116
57 63365
58 66729
59 70210
60 73810
61 77531

Here's the output from sketch #6:

Calculation of pyramidal numbers
I define the nth pyramidal number as
1*1 + 2*2 + 3*3 + ... n*n

1  1
2  600005
3  1200014
4  2400030
5  3600055
6  5400091
7  7200140
8  9600204
9  12000285
10  12000385
11  12000506
12  15600650
13  19200819
14  23401015
15  27601240
16  32401496
17  37201785
18  43202109
19  49202470
20  49802870
21  50403311
22  57603795
23  64804324
24  72604900
25  80405525
26  88806201
27  97206930
28  A6207714
29  B5208555
30  B5809455
31  B6410416
32  C6611440
33  D6812529
34  E7613685
35  F8414910
36  9816206
37  21217575
38  33819019
39  46420540
40  47622140
41  48823821
42  62625585
43  76427434
44  90829370
45  A5231395
46  C0233511
47  D5235720
48  F0838024
49  6440425
50  7642925
51  8845526
52  25648230
53  42451039
54  59853955
55  77256980
56  95260116
57  B3263365
58  D2466729
59  F1670210
60  F3473810
61  F5277531

It seems to be a bizarre amalgam of the correct answers on the right (in decimal) with some other stuff in hex, on the left.

I withdraw what I said about it being "not hex" because clearly you are using BCD arithmetic.

It looks on the surface to be a compiler optimization bug to me.

I note that if I remove "inline" totally I get this:

Binary sketch size: 3,950 bytes (of a 32,256 byte maximum)

And the buggy sketch #6 with everything inlined gives:

Binary sketch size: 3,950 bytes (of a 32,256 byte maximum)

This is IDE 1.0.5 compiled for the Uno.

So honestly, I don't think inlining is doing much except confusing the compiler.

If anyone has a more up-to-date compiler and want to test it, it would be interesting to see if it works better.

This is the “buggy” version of bcd.h:

#include "Arduino.h"

inline boolean bcdBadL (unsigned long x) {
  // check all EXCEPT high nibble
  return (((x ^ (x+0x06666666)) & 0x11111110) != 0);
}

inline unsigned long bcdAddL (unsigned long x, unsigned long y) {
  // no sanity checking of input
  // "out of range" results are e.g. A0000000 for 100 million
  unsigned long binsum = x + y;
  unsigned long carry = ((binsum + 0x06666666) ^ x ^ y) & 0x11111110;
  return (binsum + ((carry>>2) | (carry>>3)));
}

inline unsigned long bcdSubL (unsigned long x, unsigned long y) {
  // no sanity checking of input
  // "negative" results are e.g. F9999999 for -1
  unsigned long bindiff = x - y;
  unsigned long borrow = (bindiff ^ x ^ y) & 0x11111110;
  return (bindiff - ((borrow>>2) | (borrow>>3)));
}

inline unsigned long bcdTwiceL (unsigned long x) {
  // no sanity checking of input
  unsigned long adj = (x + 0x03333333) & 0x08888888;
  return (x << 1) + adj - (adj >> 2); // magic here
}

inline unsigned long bcdTenfoldL (unsigned long x) {
  // because of the way we use the high nibble, this is nontrivial
  unsigned long highpart = x & 0xF0000000;
  return ((highpart<<3) + (highpart<<1)) + (x<<4);
}

inline unsigned long bcdMultL(unsigned long x, unsigned long y) {
  // this function multiplies two BCD numbers by means of
  // BCD addition, subtraction, doubling, multiplication by ten,
  // and a little extra magic
  unsigned long total = 0;
  byte d = 0;
  unsigned long twice = 0;
  while (true) {
    d = (15&(byte)y);
    if (d&1) total = bcdAddL(total,x);
    if (d>1) {
      twice = bcdTwiceL(x);
      if (d&2) total = bcdAddL(total,twice);
      if (d&4) total = bcdAddL(total,bcdTwiceL(twice));
      if (d&8) {
        total = bcdSubL(total,twice);
        y += 8; // magic
        if (!(0xF0&(byte)y)) y |= 0x60; // more magic
      }
    }
    y >>= 4;
    if (!y) return total;
    x = bcdTenfoldL(x); 
  }
}

An 100% inline version that does “normal” . I removed some magic lines

my first assumption that it was the tenfold was apparently wrong.

#include "Arduino.h"

inline boolean bcdBadL (unsigned long x) {
  // check all EXCEPT high nibble
  return (((x ^ (x+0x06666666)) & 0x11111110) != 0);
}

inline unsigned long bcdAddL (unsigned long x, unsigned long y) {
  // no sanity checking of input
  // "out of range" results are e.g. A0000000 for 100 million
  unsigned long binsum = x + y;
  unsigned long carry = ((binsum + 0x06666666) ^ x ^ y) & 0x11111110;
  return (binsum + ((carry>>2) | (carry>>3)));
}

inline unsigned long bcdSubL (unsigned long x, unsigned long y) {
  // no sanity checking of input
  // "negative" results are e.g. F9999999 for -1
  unsigned long bindiff = x - y;
  unsigned long borrow = (bindiff ^ x ^ y) & 0x11111110;
  return (bindiff - ((borrow>>2) | (borrow>>3)));
}

inline unsigned long bcdTwiceL (unsigned long x) {
  // no sanity checking of input
  unsigned long adj = (x + 0x03333333) & 0x08888888;
  return (x << 1) + adj - (adj >> 2); // magic here
}

inline unsigned long bcdTenfoldL (unsigned long x) {
  // because of the way we use the high nibble, this is nontrivial
  unsigned long highpart = x & 0xF0000000;
  return ((highpart<<3) + (highpart<<1)) + (x<<4);
}

inline unsigned long bcdMultL(unsigned long x, unsigned long y) {
  // this function multiplies two BCD numbers by means of
  // BCD addition, subtraction, doubling, multiplication by ten,
  // and a little extra magic
  unsigned long total = 0;
  byte d = 0;
  unsigned long twice = 0;
  while (true) {
    d = (15&(byte)y);
    if (d&1) total = bcdAddL(total, x);
    if (d>1) {
      twice = bcdTwiceL(x);
      if (d&2) total = bcdAddL(total, twice);
      twice = bcdTwiceL(twice);  ///////////////////////// from here some changes
      if (d&4) total = bcdAddL(total, twice);
      twice = bcdTwiceL(twice);
      if (d&8)  total = bcdAddL(total, twice);
    }
    y >>= 4;
    if (!y) return total;
    x = bcdTenfoldL(x); 
  }
}

(lesson in magic)
the lines removed state: total = total + x8 <==> total = total - x2 + x*10;

update: removed a proposal that had failures of its own :frowning:

[quote author=Nick Gammon link=topic=191390.msg1415316#msg1415316 date=1380864740] It looks on the surface to be a compiler optimization bug to me.

*** snip, snip ***

This is IDE 1.0.5 compiled for the Uno.

So honestly, I don't think inlining is doing much except confusing the compiler.

If anyone has a more up-to-date compiler and want to test it, it would be interesting to see if it works better. [/quote]

Is it just me, or does this compiler stink, especially at optimization?

Optimization is always tricky and automated optimization is no difference. But the optimizations of compilers are tested quite a lot as it is the core of their job: Making fast code fast.

As the number of scenarios to be optimized is infinite it is impossible to test all of them. So there is also

Optimization becomes especially difficult if code includes "magic", incl casting and code near the edge of the semantic edges of the language. In those cases the compiler can only hope that the programmer knows what (s)he is doing. IMHO in the end it is the programmers responsibility to verify the code does what it was meant to do.

odometer:
Is it just me, or does this compiler stink, especially at optimization?

Looking at the generated code, even without the “inline” keyword, it looks to me like it was inlining stuff. So I wouldn’t agree that it stank/stunk or whatever the word is.

What seems more likely is that the case of people using inline multiple times (as you did) when not necessary (as the compiler was already inlining) has kicked it into some sort of bug that has gone unrecognized so far. Mainly because so few people are doing that, that the bug hasn’t surfaced often.

Every now and then people produce test cases which demonstrate that the compiler has not perfectly optimized (I think integer multiplication might be one of them) but for most code I have seen the code generated is pretty efficient.

If you have the energy, you could go through the gcc bug tracker, try to work out if this bug has been reported before, and if not, report it.