Gcc can only generate code up to 128k, so using an atmega2560 brings you 0 advantages over the 128, only IAR can generate code for all the atmega2560.
I believe that is an incorrect statement. You're getting the flash word count mixed up with the AVR's byte capacity rating. Gcc can indeed fill the atmega2560's 256k bytes of flash with code. Recall that a Harvard architecture CPU has separate code memory (flash) and data memory (SRAM), and in the AVR series the two are not the same width: flash is organized in 16-bit words, while SRAM is 8 bits wide.
Most AVR instructions are a single 16-bit word. Gcc can generate code up to 128k WORDS (= 256k bytes), so a mega2560 does contain twice the program memory of a mega1280, and Gcc can take advantage of all of it.
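To make the word/byte arithmetic concrete, here is a minimal sketch in plain C. The helper names `flash_words_to_bytes` and `flash_bytes_to_words` are made up for illustration and are not part of avr-libc or any toolchain; the only fact assumed is that an AVR flash word is 16 bits (2 bytes).

```c
#include <stdint.h>

/* Each AVR flash word is 16 bits wide, i.e. 2 bytes. */
#define AVR_FLASH_WORD_BYTES 2u

/* Hypothetical helper: convert a program-memory size in 16-bit words to bytes. */
static uint32_t flash_words_to_bytes(uint32_t words)
{
    return words * AVR_FLASH_WORD_BYTES;
}

/* Hypothetical helper: convert a flash size in bytes to 16-bit words. */
static uint32_t flash_bytes_to_words(uint32_t bytes)
{
    return bytes / AVR_FLASH_WORD_BYTES;
}
```

Calling `flash_words_to_bytes(128u * 1024u)` gives 262144, which is the mega2560's 256k bytes, while `flash_bytes_to_words(128u * 1024u)` gives 65536, i.e. the mega1280's 128k bytes hold 64k words.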