I've got an stm32duino setup (STM32F103ZET6 board) with 64 kB of internal SRAM and 512 kB of external SRAM (wired via FSMC). The 512 kB of external SRAM could be used for the heap, so I can easily allocate the 64 kB of CP/M's RAM with malloc() in that space.
So I may try to run your code on it when it's available.
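Roughly what I have in mind for the malloc() part, as a minimal sketch. It assumes the FSMC is already initialised and the heap has been placed in the external SRAM (e.g. via the linker script), which this snippet does not do itself. All names here (CPM_RAM_SIZE, cpm_ram_init, cpm_read, cpm_write, cpm_ram_free) are made up for illustration and are not taken from your code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 64 kB: the full 16-bit address space the emulated CP/M machine sees. */
#define CPM_RAM_SIZE 0x10000u

/* Hypothetical buffer for the emulated RAM; lives wherever the heap lives
 * (assumed here to be the external SRAM). */
static uint8_t *cpm_ram = NULL;

/* Allocate and zero the emulated RAM. Returns 0 on success, -1 on failure. */
int cpm_ram_init(void)
{
    if (cpm_ram != NULL) {
        return 0;               /* already allocated */
    }
    cpm_ram = malloc(CPM_RAM_SIZE);
    if (cpm_ram == NULL) {
        return -1;              /* heap too small or not set up */
    }
    memset(cpm_ram, 0, CPM_RAM_SIZE);
    return 0;
}

/* Read one byte; a uint16_t address can never index outside the 64 kB buffer. */
uint8_t cpm_read(uint16_t addr)
{
    assert(cpm_ram != NULL);
    return cpm_ram[addr];
}

/* Write one byte to the emulated RAM. */
void cpm_write(uint16_t addr, uint8_t value)
{
    assert(cpm_ram != NULL);
    cpm_ram[addr] = value;
}

/* Release the emulated RAM again. */
void cpm_ram_free(void)
{
    free(cpm_ram);
    cpm_ram = NULL;
}
```

On the board, cpm_ram_init() would be called once from setup(), and the emulator core would go through cpm_read()/cpm_write() for every memory access. If malloc() fails there, the heap is most likely still sitting in the 64 kB of internal SRAM.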
FYI:
External sram on stm32duino