I am messing around with Optiboot to make it smaller for an ATtiny, and have found one place where I can save 8 bytes (which ends up being 64, as it allows the bootloader to be moved one page closer to the end).
To save space, I first declared this:
typedef union {
uint16_t integer;
uint8_t array[2];
}twoByte;
It replaces things like:
someUint16 = (someUint8 << 8 ) | someOtherUint8;
with:
someTwoByte.array[1] = someUint8;
someTwoByte.array[0] = someOtherUint8;
The result can then be used directly as an int, massively reducing the code size.
There is one place where I am trying to use it, however, where the compiler could make an easy optimisation but doesn't:
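To make the equivalence concrete, here is a minimal sketch that puts the two forms side by side. The helper names `combine_shift` and `combine_union` are made up for illustration and are not part of optiboot. The union version assumes a little-endian target (`array[0]` is the low byte), which holds for AVR but is not guaranteed elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Same union as above: two bytes overlaid on one 16-bit integer. */
typedef union {
    uint16_t integer;
    uint8_t  array[2];
} twoByte;

static_assert(sizeof(twoByte) == 2, "twoByte must be exactly two bytes");

/* Hypothetical helper: the original shift-and-OR form.
 * Works the same on any byte order. */
uint16_t combine_shift(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* Hypothetical helper: the union form.
 * ASSUMPTION: little-endian layout, so array[0] is the low byte. */
uint16_t combine_union(uint8_t hi, uint8_t lo)
{
    twoByte t;
    t.array[1] = hi;   /* high byte */
    t.array[0] = lo;   /* low byte  */
    return t.integer;
}
```

On a little-endian machine both helpers should return the same value for the same inputs; on a big-endian one the union form would swap the bytes.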
bufPtr = buff;
addrPtr = (uint16_t)(void*)address;
ch = SPM_PAGESIZE / 2;
do {
twoByte a; //Again by using a union, code length is slashed, this time by 16 bytes.
a.array[0] = *bufPtr++;
a.array[1] = *bufPtr++;
__boot_page_fill_short((uint16_t)(void*)addrPtr,a.integer);
addrPtr += 2;
} while (--ch);
Becomes:
1e8e: a0 e0 ldi r26, 0x00 ; 0
1e90: b1 e0 ldi r27, 0x01 ; 1
//start of the do while loop here
twoByte a; //Again by using a union, code length is slashed, this time by 16 bytes.
a.array[0] = *bufPtr++;
1e92: 6c 90 ld r6, X ; brne jumps to here
a.array[1] = *bufPtr++;
1e94: 11 96 adiw r26, 0x01 ; 1
1e96: 7c 90 ld r7, X
1e98: 11 97 sbiw r26, 0x01 ; 1 - Subtract one
1e9a: 12 96 adiw r26, 0x02 ; 2 - Then add two?!?!?!?!?!
...
1eae: 89 f7 brne .-30 ; //end of the do-while loop
For some reason it not only decides to use the "adiw" instruction, but also decides to essentially do "bufPtr - 1 + 2" on the pointer, rather than just "bufPtr + 1".
The amusing thing is that if I modify the code above to be this:
twoByte a; //Again by using a union, code length is slashed, this time by 16 bytes.
a.array[0] = *bufPtr++;
a.array[1] = *bufPtr;//++;
It does what it is supposed to do, and uses this:
1e8e: a0 e0 ldi r26, 0x00 ; 0
1e90: b1 e0 ldi r27, 0x01 ; 1
//start of the do while loop here
twoByte a; //Again by using a union, code length is slashed, this time by 16 bytes.
a.array[0] = *bufPtr++;
1e92: 6d 90 ld r6, X+
a.array[1] = *bufPtr;//++;
1e94: 7c 90 ld r7, X
...
1eae: 89 f7 brne .-30 ; //end of the do-while loop
Notice the use of X+ instead of adiw, which is a great optimisation but doesn't help me unless I can find a way to make it generate this:
1e8e: a0 e0 ldi r26, 0x00 ; 0
1e90: b1 e0 ldi r27, 0x01 ; 1
//start of the do while loop here
twoByte a; //Again by using a union, code length is slashed, this time by 16 bytes.
a.array[0] = *bufPtr++;
1e92: 6d 90 ld r6, X+
a.array[1] = *bufPtr++;
1e94: 7c 90 ld r7, X+
...
1eae: 89 f7 brne .-30 ; //end of the do-while loop
That is, using X+ for both loads.
Could someone explain this one to me? I can't make heads or tails of it, let alone fix it.
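For anyone who wants to check what the loop is meant to do without flashing a chip, here is a minimal host-side sketch of it. Everything outside the loop body is a stand-in I made up for illustration: `PAGE_SIZE` replaces `SPM_PAGESIZE` (64 is just an example value), `fake_page_fill_short` is a stub in place of `__boot_page_fill_short`, and `page_words` plays the role of the flash page buffer. It also assumes a little-endian host, like the AVR itself. It says nothing about the code the AVR compiler generates; it only pins down the intended result.

```c
#include <stdint.h>

#define PAGE_SIZE 64u  /* hypothetical stand-in for SPM_PAGESIZE */

typedef union {
    uint16_t integer;
    uint8_t  array[2];
} twoByte;

/* Hypothetical stand-in for the temporary flash page buffer. */
uint16_t page_words[PAGE_SIZE / 2];

/* Hypothetical stub replacing __boot_page_fill_short:
 * stores one 16-bit word at the given byte address within the page. */
void fake_page_fill_short(uint16_t addr, uint16_t data)
{
    page_words[(addr % PAGE_SIZE) / 2] = data;
}

/* Same loop shape as in the question, with the stub swapped in. */
void fill_page(const uint8_t *buff, uint16_t address)
{
    const uint8_t *bufPtr = buff;
    uint16_t addrPtr = address;
    uint8_t ch = PAGE_SIZE / 2;   /* number of 16-bit words per page */
    do {
        twoByte a;
        a.array[0] = *bufPtr++;   /* low byte  (little-endian assumption) */
        a.array[1] = *bufPtr++;   /* high byte */
        fake_page_fill_short(addrPtr, a.integer);
        addrPtr += 2;             /* advance by one word */
    } while (--ch);
}
```

Calling `fill_page` on a buffer holding the bytes 0, 1, 2, ... should leave each entry of `page_words` holding two consecutive buffer bytes, low byte first.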