I made good progress this week! Much better than I had expected.
I was able to make a batch of flash copies and install them on the various LCDs I have on hand. So I now have a 3.2" unit, several 4.3" units (in addition to the original 4.3" CTE screen that I purchased) and finally a nice 5" unit, all of which display proportional fonts and work fine with the CTE extensions to UTFT.
My approach was to leave the flash IC in place on the CTE screen that I purchased. Using a short piece of 6-conductor "DuPont" cable, I connected to the standard 40-pin male header of the LCD. Connections were:
Pin 1 GND
Pin 2 3.3 volt vcc
Pin 16 F_CS
Pin 35 SD_DO
Pin 36 SD_CLK
Pin 37 SD_DIN
Pin 38 SD_CS
I then wired this to a 3.3 V <-> 5 V bi-directional level translator IC (Adafruit TXB0108) and ultimately to the SPI pins of an Uno. I then used a nice utility I found here: http://www.instructables.com/id/How-to-Design-with-Discrete-SPI-Flash-Memory/ to exercise the connected font IC. I hacked the utility to add a second CS line and a page-by-page copy routine. Then, with a blank device on the second CS and the other lines wired in parallel, it was a simple task to create duplicates.
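For anyone who wants to see the shape of the page-by-page copy, here is a minimal host-side sketch of the idea. It is not the code from the hacked utility. FlashModel and copyFlash are names I made up for illustration, and the two chips are modelled as plain byte arrays rather than real SPI devices on separate CS lines. The only hardware facts it leans on are the 256-byte page size of the W25Q parts and the usual NOR flash rule that programming can only clear bits, so the target has to be blank (all 0xFF) for the copy to verify.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kPageSize = 256;  // W25Q-series page size in bytes
using Page = std::array<std::uint8_t, kPageSize>;

// Hypothetical stand-in for one flash chip on its own CS line.
struct FlashModel {
    std::vector<std::uint8_t> cells;

    // A blank (erased) chip reads back 0xFF everywhere.
    explicit FlashModel(std::size_t bytes) : cells(bytes, 0xFF) {}

    void readPage(std::uint32_t addr, Page& buf) const {
        assert(addr % kPageSize == 0 && addr + kPageSize <= cells.size());
        for (std::size_t i = 0; i < kPageSize; ++i) buf[i] = cells[addr + i];
    }

    // Programming can only change 1 bits to 0, hence the AND.
    void programPage(std::uint32_t addr, const Page& buf) {
        assert(addr % kPageSize == 0 && addr + kPageSize <= cells.size());
        for (std::size_t i = 0; i < kPageSize; ++i) cells[addr + i] &= buf[i];
    }
};

// Copies `bytes` from src to dst one page at a time, reading each page
// back from dst to verify it. Returns the number of pages that failed.
std::size_t copyFlash(const FlashModel& src, FlashModel& dst, std::size_t bytes) {
    assert(bytes % kPageSize == 0);
    std::size_t badPages = 0;
    Page page{};
    Page check{};
    for (std::size_t addr = 0; addr < bytes; addr += kPageSize) {
        const auto a = static_cast<std::uint32_t>(addr);
        src.readPage(a, page);      // on hardware: select source CS, read
        dst.programPage(a, page);   // on hardware: select target CS, program
        dst.readPage(a, check);     // read back and compare
        if (check != page) ++badPages;
    }
    return badPages;
}
```

Only one page is buffered at a time, which is why this kind of copy fits in the Uno's small RAM. On real hardware each step would also need the write-enable and busy-wait handling that the utility already provides.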
During all of the above, I wasted lots of time trying first to copy the flash to an SD card. Failing that, I tried reading from an SD card and writing the contents to flash. That failed too. The issue was that the SD module I had always kept the MOSI line active regardless of the device's CS status, thus interfering with proper operation of the flash. I modified the SD interface so that it hi-Z'ed the MOSI line, so that multiple SPI devices would play well together. Still, I had no joy. I also battled a lack of RAM due to the buffer requirements of the SD library. In the end, I have not been able to get SD and flash to co-exist: two Winbond flash devices play well together, but I could not get an SD card to work in parallel with flash.
So I'm wondering how we jump start your efforts, since you do not have a flash IC to copy. I have a binary image that I can post, but without a path to get that image onto a flash chip, I'm wondering what purpose it would serve.
I did study the memory image a bit and it has not been optimized in any way to conserve space. Further, even though it is a 4 MB part (25Q32), only 2 MB is used. The first 32 KB contains a simple ~200 byte text message stating that all images on the chip are public domain. I did copy it onto 8 MB parts, and since they too are Winbond parts, no changes were needed to the software. Even your 16 MB parts will not require any code changes, other than perhaps to address the extra fonts and images I'm sure we'll load into them! Ultimately, my attachment method proved that future changes can be done in-situ, which really makes this a very useful extension. You'll be able to take the same approach when you have the flash on the shield rather than the LCD panel, since everything terminates in the same male pins, although the connections will be on the Arduino board side rather than the LCD.
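My understanding of why the bigger parts need no code changes is that these chips take a 3-byte (24-bit) address after the standard Read Data opcode 0x03, and 24 bits reaches exactly 16 MB. A rough sketch of that arithmetic follows. readCommand and fitsIn24BitAddressing are hypothetical helper names of mine, not functions from UTFT or the utility.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint8_t kReadData = 0x03;              // standard SPI flash Read Data opcode
constexpr std::uint32_t kAddressSpace24 = 1u << 24;   // 16,777,216 bytes = 16 MB

// Builds the 4 bytes clocked out before data starts coming back:
// opcode, then the address most significant byte first.
std::array<std::uint8_t, 4> readCommand(std::uint32_t addr) {
    assert(addr < kAddressSpace24);
    return {kReadData,
            static_cast<std::uint8_t>(addr >> 16),
            static_cast<std::uint8_t>(addr >> 8),
            static_cast<std::uint8_t>(addr)};
}

// True if every byte of a part this size is reachable with a 3-byte address.
constexpr bool fitsIn24BitAddressing(std::uint64_t capacityBytes) {
    return capacityBytes <= kAddressSpace24;
}
```

So the 4 MB, 8 MB and 16 MB parts all see the same command frame. The last byte of the 2 MB actually used in the image sits at address 0x1FFFFF, which goes out as 03 1F FF FF. Anything larger than 16 MB would need a different addressing scheme, but that is beyond what we are dealing with here.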
Give this some thought and let me know if you have any experience with multiple SPI devices working together. Perhaps a Due can support multiple SPI channels, which would ease your path forward. Failing that approach, I could always post you a programmed W25Q64. That's probably the lowest effort path!