NardJ:
AmForth, Tinybasic are good examples of other interpreted languages which have the same usecase.
AmForth is a particularly slow implementation of Forth and my understanding is that this is because AVR's Harvard architecture doesn't easily allow execution of opcodes from RAM.
I've recently been playing with with eForth on STM8 and Mecrisp-Stellaris on STM32 (ARM M3). They are worlds faster than AmForth on AtMega328. Even though my STM8 variant is essentially the same class processor as the AVR (8k flash, 1k RAM, 16 MHz core) it has a flat memory model which allows opcode execution from RAM essentially the same as Flash. On can, in fact insert machine code directly into an STM8 eForth word.
If you're doing this purely as an academic exercise, I don't want to discourage you, but wringing optimum software efficiency from a poorly chosen hardware platform seems a bit of a waste.
For what it's worth, I've been using the STM8 eForth in the use case you describe, that is, a bluetooth over-the-air interactive/re-programmable node controlled from an Android tablet.