There is no question that 32-bit microcontrollers are considerably more capable. The Atmel SAM4E would do everything that is needed for my applications. But I always get a little suspicious about libraries and code bloat whenever a micro with a wider bus and more memory comes into play.
For example, Ethernet is a must-have interface for me. So I did a bit of poking around, and after spending several hours researching how that would be done with the SAM4E, I came across an example implementation. Ethernet is implemented using a port of lwIP. Although I couldn't find exact references, it looks like it uses around 40K of code space and 10K or so of RAM. That may not sound like much, but the file dependency list is more than a page long, and the code is full of things that I don't need. So that makes me wonder just how bloated the remaining hardware support libraries are.
Sure, I could port a driver for the W5500 Ethernet chip and eliminate a software TCP/IP stack completely, and that would be fine. But when you have an application that consists of several different hardware modules interacting, the prospect of trying to debug code with such a large number of dependent files can be a real killer.
At a former company we needed to add Ethernet support to a couple of different product lines. The solution was to use a PowerPC processor and a Linux kernel. It worked, but the interface took several months to implement. It functions fine and does the job, but the time it took to get things working, combined with the complexity of the code, is an example of why trying to do everything in software isn't always the best approach.
A while back I needed to implement division on a microcontroller without a lot of horsepower. There was no divide instruction at the assembly level. After a bit of research I stumbled across a different division method called the Kenyan double-and-halve method. No, it isn't Kenyan in origin, so I have no clue where the name came from. Anyway, it uses shift and add instructions and is considerably faster than successive subtraction. For this particular project, implementing that form of division was about 120x faster than successive subtraction in situations where the divisor was considerably smaller in magnitude than the number to be divided.
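The idea can be sketched in C. This is not the original project's code, just a minimal illustration of the double-and-halve technique: double the divisor until it overtakes the dividend, then halve back down, subtracting whenever it fits. Only shifts, compares, and subtraction are used, so it maps directly onto an 8-bit instruction set with no divide instruction.

```c
#include <stdint.h>

/* Double-and-halve division: returns n / d using only shifts,
   compares, and subtraction. The remainder is left in *rem.
   Assumes d != 0. */
static uint32_t div_double_halve(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint32_t bit = 1;

    /* Double the divisor until it exceeds the dividend
       (stopping before the top bit would shift out). */
    while (d <= n && !(d & 0x80000000u)) {
        d <<= 1;
        bit <<= 1;
    }

    /* Halve back down, subtracting whenever the divisor fits
       and setting the corresponding quotient bit. */
    while (bit) {
        if (n >= d) {
            n -= d;
            q |= bit;
        }
        d >>= 1;
        bit >>= 1;
    }

    if (rem)
        *rem = n;
    return q;
}
```

Note that the loop count scales with the bit difference between dividend and divisor, which is why the speedup over successive subtraction is largest when the divisor is much smaller than the dividend.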
At the end of the day, the method worked well and allowed a slow 8-bit processor to do the job without having to do a slow, floating point implementation.
A long time ago (showing my age a little), in the days of older Windows, floating point was needed for calibration routines. However, floating point was not available at the device-driver level. The solution was to use large integers. The raw data was already in integer format. All of the numbers were scaled up with a multiply instruction to tack on a bunch of trailing zeros, integer division and multiplication were performed, and once the math was complete the result was scaled back down. No digits of precision were lost, and the device driver worked just fine.
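A minimal sketch of that scaled-integer trick, with illustrative values rather than the original driver's: scale the numerator up by a power of ten before the integer divide so the ratio keeps extra decimal digits, then divide the scale factor back out at the end (with rounding). The function name and scale factor are my own, chosen so the intermediates fit in 32 bits for modest inputs.

```c
#include <stdint.h>

#define SCALE 10000L  /* four extra decimal digits of headroom */

/* Compute (a / b) * c using only integer math. Scaling a up
   before the divide preserves four fractional digits that a
   plain integer divide would throw away. Assumes a * SCALE
   and ratio * c fit in a 32-bit long. */
static long scaled_ratio(long a, long b, long c)
{
    long ratio = (a * SCALE) / b;           /* ratio in 1/10000 units */
    return (ratio * c + SCALE / 2) / SCALE; /* apply c, round, scale down */
}
```

For example, a naive integer evaluation of (1 / 3) * 300 yields 0 because the divide truncates first, while scaled_ratio(1, 3, 300) returns 100.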
Those two examples simply illustrate that there are ways of getting more performance out of slower processors, or processors with more limited capabilities. Although it may seem easier to throw more horsepower at a problem, that makes it easy to fall into the habit of lazy coding, because it seems like there will always be more than enough performance for everything. So sometimes doing things a little differently, and combining a little custom hardware where needed, can still do the same job and be more manageable. It all depends on the situation, of course.