That kind of comparison makes sense if all you do with your MCU is add 32-bit numbers.
Even in that comparison, what if you wanted to process char or short?
You're wrong. Even if your app is not about 32-bit math, remember that most ADC conversions are 10 bits or more. Remember too that you don't take a single conversion, but four, eight, or more, and then average the results. On a 32-bit core like the Cortex-M0 this is a piece of cake:
/* accumulate 16 conversions, then average */
for (sum = 0, i = 0; i < 16; i++) {
    sum += ADC(channel0);
}
result = sum / 16;
In fact, no division is performed at all: since 16 is a power of two, the compiler simply shifts 'sum' right by 4 bits (assuming 'sum' is unsigned). Overall result? Faster and smaller code.
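As a self-contained sketch of the averaging loop above: `read_adc` is a hypothetical stand-in for the real ADC driver call (here it just returns a fixed 10-bit value), and `average16` is an illustrative name, not part of any vendor library. With an unsigned accumulator, dividing by 16 and shifting right by 4 give the same result, so the compiler is free to emit the shift.

```c
#include <stdint.h>

#define NUM_SAMPLES 16u   /* power of two, so the divide reduces to a shift */

/* Hypothetical stand-in for the real ADC read; returns a fixed 10-bit value. */
static uint32_t read_adc(unsigned channel)
{
    (void)channel;
    return 1023u;         /* full-scale 10-bit reading */
}

/* Average NUM_SAMPLES conversions from one channel. */
static uint32_t average16(unsigned channel)
{
    uint32_t sum = 0;
    for (unsigned i = 0; i < NUM_SAMPLES; i++) {
        sum += read_adc(channel);
    }
    return sum >> 4;      /* same as sum / 16 for unsigned values */
}
```

On a 32-bit core the accumulator lives in a single register, so each pass through the loop is one read and one add.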
An 8-bit core needs a lot more code just to do the 10-bit accumulation inside the loop, because every addition has to be split into byte-sized steps with carry. Even if you think you are not going to use 32-bit math, it will show up quickly as soon as you want to implement sensor fusion (accelerometer + gyro + compass) on your Arduino.
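To give a rough idea of the extra work, the sketch below spells out in C what an 8-bit core has to do for each `sum += sample`: add the low bytes, detect the carry, then add the high bytes plus the carry. The name `add16_bytewise` is illustrative only; a compiler for an 8-bit target would typically emit an add / add-with-carry instruction pair rather than this C code.

```c
#include <stdint.h>

/* Add a 16-bit sample to a 16-bit sum using only 8-bit operations. */
static void add16_bytewise(uint8_t *sum_lo, uint8_t *sum_hi,
                           uint8_t smp_lo, uint8_t smp_hi)
{
    uint8_t lo = (uint8_t)(*sum_lo + smp_lo);
    uint8_t carry = (lo < smp_lo) ? 1u : 0u;   /* low byte wrapped => carry out */
    *sum_lo = lo;
    *sum_hi = (uint8_t)(*sum_hi + smp_hi + carry);
}
```

A 32-bit accumulator needs four such byte steps per addition, which is where the code size and cycle count go.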
Another important point is the 32-bit timers inside the LPC1114. What are they good for? In a servo controller, for example, they give you better granularity (finer steps).
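A rough worked example of the granularity argument, under stated assumptions: a 48 MHz timer clock and a 20 ms servo frame with a 1 ms to 2 ms pulse (typical hobby-servo figures, not numbers from the post above). The helper names are made up for illustration.

```c
#include <stdint.h>

#define TIMER_CLOCK_HZ 48000000u  /* assumed timer clock */
#define FRAME_US       20000u     /* assumed servo frame: 20 ms */
#define PULSE_SPAN_US  1000u      /* assumed pulse range: 1 ms .. 2 ms */

/* Smallest clock divider that lets one frame fit in a counter `bits` wide. */
static uint32_t min_prescaler(unsigned bits)
{
    uint64_t frame_ticks = (uint64_t)TIMER_CLOCK_HZ / 1000000u * FRAME_US;
    uint64_t max_count = (bits >= 32) ? 0xFFFFFFFFull : ((1ull << bits) - 1);
    return (uint32_t)((frame_ticks + max_count - 1) / max_count); /* ceiling divide */
}

/* Number of distinct pulse widths (steps) across the 1 ms pulse span. */
static uint32_t servo_steps(unsigned bits)
{
    return TIMER_CLOCK_HZ / 1000000u * PULSE_SPAN_US / min_prescaler(bits);
}
```

Under these assumptions a 16-bit timer has to divide its clock by 15 to cover the whole frame, leaving 3200 steps across the pulse range, while a 32-bit timer runs undivided and gets 48000.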