It's clear the compilers are different. This is the top of loop() from the NOT working build:
000009ca <loop>:
9ca: 80 91 30 02 lds r24, 0x0230
9ce: 90 91 31 02 lds r25, 0x0231
9d2: 00 97 sbiw r24, 0x00 ; 0
From the WORKING build:
00000956 <loop>:
956: 80 91 1e 01 lds r24, 0x011E
95a: 90 91 1f 01 lds r25, 0x011F
95e: 89 2b or r24, r25
As they are used here, "sbiw r24, 0" and "or r24, r25" are interchangeable: both set the Z flag exactly when the 16-bit value in r25:r24 is zero, so the code is fine. But the code is different.
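The linked output is also different: data and code are placed in a different order (the variable loaded at the top of loop() is at 0x0230 in the NOT working build and at 0x011E in the WORKING one).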
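The NOT working spi_transaction also sets up a stack frame for local variables. "rcall .+0" pushes a 2-byte return address and simply continues at the next instruction, so together with "push r0" it reserves 3 bytes of stack, and the three "pop r0" at the end release them. r28/r29 (the Y pointer) are saved because they are presumably used as the frame pointer...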
00000244 <_Z15spi_transactionhhhh>:
244: df 93 push r29
246: cf 93 push r28
248: 00 d0 rcall .+0 ; 0x24a <_Z15spi_transactionhhhh+0x6>
24a: 0f 92 push r0
...
27c: 0f 90 pop r0
27e: 0f 90 pop r0
280: 0f 90 pop r0
282: cf 91 pop r28
284: df 91 pop r29
286: 08 95 ret
I wonder if some default compiler option, such as the optimization level, changed.
write_flash in the NOT working version is missing a few machine instructions that are present in the WORKING version, and at least one of them is significant. I wonder if the NOT working version was built by a NOT working compiler.