Concurring in part and dissenting in part...

I think the current object implementation for the serial port might actually make it easier to implement multiple USB endpoints. At least for those who stick to the simplest stream-like methods.
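To make the stream point concrete, here is a rough sketch, not how any actual core does it. ByteStream, LoopbackEndpoint, sendText() and drain() are all made-up names for illustration, and the "endpoint" is just an in-memory FIFO so the code runs on a PC. The idea is that user code which only touches write()/read()/available() doesn't need to know which endpoint it is talking to:

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical minimal stream interface, loosely modeled on the
// write()/read()/available() style of the Arduino serial object.
class ByteStream {
public:
    virtual ~ByteStream() {}
    virtual std::size_t write(std::uint8_t b) = 0;
    virtual int available() = 0;
    virtual int read() = 0;  // returns -1 when nothing is buffered
};

// Stand-in for one endpoint: a loopback FIFO (an assumption made so the
// sketch runs on a PC; a real endpoint would talk to USB hardware).
class LoopbackEndpoint : public ByteStream {
public:
    std::size_t write(std::uint8_t b) override {
        fifo_.push_back(b);
        return 1;
    }
    int available() override { return static_cast<int>(fifo_.size()); }
    int read() override {
        if (fifo_.empty()) return -1;
        std::uint8_t b = fifo_.front();
        fifo_.pop_front();
        return b;
    }
private:
    std::deque<std::uint8_t> fifo_;
};

// User code written only against the stream methods; it does not care
// which endpoint it is handed.
void sendText(ByteStream& out, const std::string& text) {
    for (char c : text) out.write(static_cast<std::uint8_t>(c));
}

// Read everything currently buffered on a stream into a string.
std::string drain(ByteStream& in) {
    std::string result;
    while (in.available() > 0) result.push_back(static_cast<char>(in.read()));
    return result;
}
```

With two LoopbackEndpoint objects, sendText(ep0, "hello") and sendText(ep1, "world") keep their data separate, which is all "multiple endpoints" would need to look like from the sketch writer's side.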
DMA would certainly be difficult/impossible to do in a way that preserved downward compatibility with code written to do the kind of port I/O, and even bit-banging, that's common now. But I think it could be done in a way that fits within the Arduino paradigm, so it would be workable and less "alien" than switching to a different "OS" model.
Debug features are pretty much doomed to being very CPU-dependent, with little chance of encapsulating them in a way that makes them consistent from target to target.
OTOH, features like pulseIn() and the MsTimer2 library should be portable to almost any CPU, and it seems feasible to add some more sophisticated capabilities that would also be portable. When you add targets with unique timer features, you can use ifdefs to enable them, much the way that the existing hardware serial library handles the multiple ports on a Mega.
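A rough sketch of the ifdef approach, under some assumptions: SKETCH_HAS_EXTRA_TIMERS, TimerId, timerCount() and timerAvailable() are names I made up, and I'm keying off the avr-gcc device macros for the Mega-class chips. A real library might test for specific registers instead. On anything that isn't a Mega (including a PC compiler) it falls back to the three timers of the smaller ATmega parts:

```cpp
// Hypothetical feature switch, chosen at compile time per target.
// __AVR_ATmega1280__ / __AVR_ATmega2560__ are the avr-gcc device macros
// for the Mega-class chips, which have Timer0 through Timer5.
#if defined(__AVR_ATmega1280__) || defined(__AVR_ATmega2560__)
  #define SKETCH_HAS_EXTRA_TIMERS 1
#else
  #define SKETCH_HAS_EXTRA_TIMERS 0   // assume only Timer0..Timer2 exist
#endif

// Timer identifiers; the extra ones only exist on targets that have them,
// so code that names TIMER_3 fails to compile on a smaller chip.
enum TimerId {
    TIMER_0 = 0, TIMER_1, TIMER_2
#if SKETCH_HAS_EXTRA_TIMERS
    , TIMER_3, TIMER_4, TIMER_5
#endif
};

// Number of hardware timers this build knows about.
inline int timerCount() { return SKETCH_HAS_EXTRA_TIMERS ? 6 : 3; }

// Portable code can ask at run time instead of hard-coding a chip.
inline bool timerAvailable(int id) { return id >= 0 && id < timerCount(); }
```

The portable functions stay the same on every target; only the table of what exists changes, so a sketch that sticks to TIMER_0..TIMER_2 builds everywhere.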
Overall, I think the Arduino community would be better served by extending the platform to more-powerful CPUs, even at the expense of sacrificing easy access to some of their features, than by telling people "Once you exceed the limits of the ATMega family, you must switch to a very different environment". Certainly some people will want/need to, but most, especially artists and others who just want a flexible component to use in their "non-computer" projects, will be happier to have a range of choices that work "the Arduino Way".