Pages: [1]   Go Down
Author Topic: Obscure missed compiling optimisation...  (Read 429 times)
0 Members and 1 Guest are viewing this topic.
Leeds, UK
Offline Offline
Edison Member
*
Karma: 71
Posts: 1641
Once the magic blue smoke is released, it won't go back in!
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

I am messing around with optiboot to make it smaller for an ATTiny, and have found one place where I can save 8 byte (ends up being 64 as it allows the bootloader to be moved one page closer to the end)

To save space, i first declared this:
Code:
typedef union {
uint16_t integer;
uint8_t array[2];
}twoByte;
It replaces things like:
    someUint16 = (someUint8 << 8 ) | someOtherUint8;
To be:
    someTwoByte.array[1] = someUint8;
    someTwoByte.array[0] = someOtherUint8;
And can then be used directly as a int, massively reducing the code.

There is one bit where I am trying to use it however where the compiler could make an easy optimisation, but doesn't:
Code:
     bufPtr = buff;
      addrPtr = (uint16_t)(void*)address;
      ch = SPM_PAGESIZE / 2;
      do {
        twoByte a; //Again by using a union, code length is slashed, this time by 16 bytes.
        a.array[0] = *bufPtr++;
        a.array[1] = *bufPtr++;
        __boot_page_fill_short((uint16_t)(void*)addrPtr,a.integer);
        addrPtr += 2;
      } while (--ch);
Becomes:
Code:
   1e8e: a0 e0       ldi r26, 0x00 ; 0
    1e90: b1 e0       ldi r27, 0x01 ; 1
//start of the do while loop here
twoByte a; //Again by using a union, code length is slashed, this time by 16 bytes.
a.array[0] = *bufPtr++;
    1e92: 6c 90       ld r6, X ; brne jumps to here
a.array[1] = *bufPtr++;
    1e94: 11 96       adiw r26, 0x01 ; 1
    1e96: 7c 90       ld r7, X
    1e98: 11 97       sbiw r26, 0x01 ; 1 - Subtract one
    1e9a: 12 96       adiw r26, 0x02 ; 2 - Then add two?!?!?!?!?!
...
    1eae: 89 f7       brne .-30     ; //end of the do-while loop

For some reason it not only decides to use the "adiw" instruction, but also decides to essentially do "*bufPtr -1 +2", rather than just "*bufPtr + 1".

The amusing thing is that if I modify the code above to be this:
Code:
       twoByte a; //Again by using a union, code length is slashed, this time by 16 bytes.
        a.array[0] = *bufPtr++;
        a.array[1] = *bufPtr;//++;
It does what it is supposed to do, and uses this:
Code:
   1e8e: a0 e0       ldi r26, 0x00 ; 0
    1e90: b1 e0       ldi r27, 0x01 ; 1
//start of the do while loop here
twoByte a; //Again by using a union, code length is slashed, this time by 16 bytes.

a.array[0] = *bufPtr++;
    1e92: 6d 90       ld r6, X+
a.array[1] = *bufPtr;//++;
    1e94: 7c 90       ld r7, X
...
    1eae: 89 f7       brne .-30     ; //end of the do-while loop

Notice the use of X+ instead of adiw, which is a great optimisation but doesn't help me unless I can find a way to make it generate this:
Code:
   1e8e: a0 e0       ldi r26, 0x00 ; 0
    1e90: b1 e0       ldi r27, 0x01 ; 1
//start of the do while loop here
twoByte a; //Again by using a union, code length is slashed, this time by 16 bytes.

a.array[0] = *bufPtr++;
    1e92: 6d 90       ld r6, X+
a.array[1] = *bufPtr++;
    1e94: 7c 90       ld r7, X+
...
    1eae: 89 f7       brne .-30     ; //end of the do-while loop

Using X+ for both.

Could someone explain this one to me? as I can't make heads or tails of it, let alone fix it.
« Last Edit: August 08, 2012, 09:39:33 am by Tom Carpenter » Logged

~Tom~

Leeds, UK
Offline Offline
Edison Member
*
Karma: 71
Posts: 1641
Once the magic blue smoke is released, it won't go back in!
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Well, i managed to find an optimisation elsewhere which saved the bytes, so fixing this isn't needed, though would be useful.
Logged

~Tom~

Ayer, Massachusetts, USA
Offline Offline
Edison Member
*
Karma: 50
Posts: 1766
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Well it would help if the Arduino project used more modern compilers.  I downloaded the 1.0.1 IDE for Linux, and the compiler that they include in the tar ball is 4.3.2 which was released in August 27, 2008.

The avr-gcc that is in the Fedora 14 distribution that I normally run is 4.5.3, which was released on April 28th, 2011.

The avr-gcc that is in the Fedora 17 distribution that I've been trying to run is 4.6.3, which was released on March 1st, 2012.

GCC 4.7.1 was released on June 14th, 2012.  I don't have a 4.7 built off hand.

Just for fun, I tried some small code on the 3 compilers, and the 4.3.2 code was 124 bytes, the 4.5.3 code was 104 bytes, and the 4.6.3 code  was 84 bytes.  Now obviously, it depends on what is being optimized.
Logged

North Queensland, Australia
Online Online
Edison Member
*
Karma: 53
Posts: 1785
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

I have some 85's I'm keen on using, might be useful for me too, can you post your current code?
Logged


Global Moderator
Dallas
Offline Offline
Shannon Member
*****
Karma: 176
Posts: 12283
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Could someone explain this one to me?

Typically, expressions are represented as a tree and the optimizer works through the tree trying to flatten and prune.  My gut tells me the optimizer is limited to 2 passes through the tree and only "sees" a relatively short distance from any given node.  There also appear to be a few case specific deficiencies in the optimizer.  In other words, the optimizer is good but not great and occasionally does rather dumb things.

Quote
There is one bit where I am trying to use it however where the compiler could make an easy optimisation, but doesn't:

I strongly suggest you not bother with it.  In my experience, any tweaks you apply to get the optimizer to behave a certain way adversely affect other nearby code.  In other words, what you are trying to do is annoyingly similar to Whack-A-Mole; smack down one bit of "bad" code only to have another pop up someplace else.
Logged

Leeds, UK
Offline Offline
Edison Member
*
Karma: 71
Posts: 1641
Once the magic blue smoke is released, it won't go back in!
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset


I strongly suggest you not bother with it.  In my experience, any tweaks you apply to get the optimizer to behave a certain way adversely affect other nearby code.  In other words, what you are trying to do is annoyingly similar to Whack-A-Mole; smack down one bit of "bad" code only to have another pop up someplace else.

Yeah, I am going to leave this one be. I managed to find something unneeded elsewhere which saved an equivalent amount of space allowing me to move the bootloader up one page (64bytes of extra space!).

I have some 85's I'm keen on using, might be useful for me too, can you post your current code?
I will be posting my code after a little more testing to make sure it works. It was built for the Tiny84, but should with a couple of extra #ifdef statements support the tiny 85 as I believe both work in an almost identical fashion.

Edit:
Timer 1 is different between the tiny84 and tiny85, but with a tiny modification, I have managed to compile an identical bootloader for the Tiny85 - totally untested though.
« Last Edit: August 08, 2012, 06:14:24 am by Tom Carpenter » Logged

~Tom~

Ayer, Massachusetts, USA
Offline Offline
Edison Member
*
Karma: 50
Posts: 1766
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Typically, expressions are represented as a tree and the optimizer works through the tree trying to flatten and prune.  My gut tells me the optimizer is limited to 2 passes through the tree and only "sees" a relatively short distance from any given node.  There also appear to be a few case specific deficiencies in the optimizer.  In other words, the optimizer is good but not great and occasionally does rather dumb things.
Believe me, GCC has many, many passes.   smiley

I suspect it may be aliasing issues, where the compiler has to be real careful so as not to mess up calculations.  Improving the alias detection has been one of the ongoing projects.

It may be as I mentioned before, it is fixed in newer releases.  However, if it is not fixed, you need to report the bug via GCC's bugzilla process in order to get it looked at.  This includes having a complete source file so that the compiler maintainers can reproduce it.  Unfortunately, in order to simplify things for newbies, the Arduino IDE hides a lot of the details from you, and it makes it harder.

]There is one bit where I am trying to use it however where the compiler could make an easy optimisation, but doesn't:

It really depends on the specifics.  Without a complete source to a function in question that shows the bug, it could be anything.  Believe me, what people think are easy optimizations, often aren't.  Sometimes it is a case, the maintainer didn't think of, other times it may be that optimizing that case is priority #565, and the developer has only gotten to #267.
Logged

Dubai, UAE
Offline Offline
Edison Member
*
Karma: 21
Posts: 1670
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Just a quick thank you - for introducing me to the union trick, it has been immediatley useful to me.

Duane B
rcarduino.blogspot.com




Logged


Ayer, Massachusetts, USA
Offline Offline
Edison Member
*
Karma: 50
Posts: 1766
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Just a quick thank you - for introducing me to the union trick, it has been immediatley useful to me.

Duane B
I should mention that using unions is the preferred way.  If you take random pointers and convert them to char * pointers to get to the individual bytes without using a union, you will get yourself in trouble, particularly with newer compilers.  This is non-standard behaviour and a number of people find that it just sort of happened to work in a previous revision, and it doesn't in the newer revision, and their code was broken in the first place.
Logged

Dubai, UAE
Offline Offline
Edison Member
*
Karma: 21
Posts: 1670
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Hi,
   I have gone for this -

Code:
union time16_t{
uint16_t theTime;
struct {
 uint8_t theLowByte; // little endian ordering
 uint8_t theHighByte;
 };
};

Results are as expected -

Code:
// nonsensical snippet to test compiler output using union to access individual
// bytes of 2 byte int.

// result = single load, compare and store, Yay !

 if(currentTime.theHighByte > 100)
     1c4: 80 91 3a 01 lds r24, 0x013A
     1c8: 85 36        cpi r24, 0x65 ; 101
     1ca: 38 f0        brcs .+14      ; 0x1da <loop+0xb6>
  {
    if(currentTime.theLowByte < 20)
     1cc: 80 91 39 01 lds r24, 0x0139
     1d0: 84 31        cpi r24, 0x14 ; 20
     1d2: 18 f4        brcc .+6      ; 0x1da <loop+0xb6>
    {
     currentTime.theHighByte = 1;   
     1d4: 81 e0        ldi r24, 0x01 ; 1
     1d6: 80 93 3a 01 sts 0x013A, r24
    }
}


It should cut down quite a bit of code where I do not need the resolution offered by the least significant byte in comparisons and range checks but still need maximum accuracy in later operations.

Great, Thanks

Duane B

rcarduino.blogspot.com
Logged


Ayer, Massachusetts, USA
Offline Offline
Edison Member
*
Karma: 50
Posts: 1766
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Hi,
   I have gone for this -

Code:
union time16_t{
uint16_t theTime;
struct {
 uint8_t theLowByte; // little endian ordering
 uint8_t theHighByte;
 };
};

Just bear in mind that this may have to be adjusted if you ever move your code off Arduinos.  Some processors are big endian (or bi-endian, and the RTOS/OS you are using might be running in big-endian mode).  Also using unions will make your code slower if you are running on a machine with 16/32/64-bit native integers.
Logged

Pages: [1]   Go Up
Jump to: