maximum delay() is 32767ms ?

If the Arduino distribution declared delay as a macro (with the actual function renamed to _delay), the macro could do the cast without the user knowing anything about it. It would look exactly the same as now but would operate on a long.

Again, the problem is with the arithmetic, not with the delay() function. When you multiply the 16-bit signed integers 60 and 1000, you don't get 60000, but rather -5536. (The biggest number you can represent as a 16-bit signed integer is 32767.) Casting that result to an unsigned long doesn't get you 60000 back. You need to do something to make sure the arithmetic is carried out using unsigned longs (32-bit integers) in the first place. It's not clear there's any convenient way to do that automatically, however.
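To make the arithmetic concrete, here is a small sketch that can be compiled on a desktop machine. It simulates the 16-bit wraparound rather than relying on it, since signed overflow is formally undefined in C and -5536 is simply what you typically see on a 16-bit-int target. The function names are made up for illustration, and uint32_t stands in for the Arduino's 32-bit unsigned long.

```c
#include <stdint.h>

/* Simulate multiplying two 16-bit signed ints on a platform where int is
   16 bits wide: keep only the low 16 bits of the product and reinterpret
   them as a signed (two's complement) value. */
int16_t mul_as_int16(int16_t a, int16_t b)
{
    uint16_t low = (uint16_t)((uint32_t)a * (uint32_t)b);
    if (low >= 0x8000u) {
        return (int16_t)((int32_t)low - 65536);
    }
    return (int16_t)low;
}

/* Cast applied AFTER the 16-bit multiply: the damage is already done. */
uint32_t cast_after_multiply(void)
{
    return (uint32_t)mul_as_int16(60, 1000);
}

/* One operand widened BEFORE the multiply: the product is computed in
   32 bits, so nothing is lost. */
uint32_t widen_before_multiply(void)
{
    return (uint32_t)60 * 1000;
}
```

Here mul_as_int16(60, 1000) gives -5536, cast_after_multiply() gives 4294961760, and widen_before_multiply() gives 60000. In an actual sketch the equivalent of the last one would be writing delay(60UL * 1000) or delay(60 * 1000L).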

A way of helping people avoid this mistake in the first place might be to provide special delay_*() functions for longer time units, and then document clearly what the maximum argument for each is. If there were a delay_minutes() or delay_seconds() function, for instance, you wouldn't need to multiply 60 * 1000 to get the delay. These special variants could easily be implemented as macros, e.g.,

#define delay_seconds(_val) delay(1000L * (_val))

It doesn't fix the problem, of course, but ought to make it less likely that people get bitten by it.
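As a rough sketch of the "document the maximum argument" idea, the conversion could also be written as plain functions instead of macros. Nothing below is part of the Arduino distribution: seconds_to_ms(), minutes_to_ms() and the two MAX_ constants are hypothetical names, and clamping an oversized argument (rather than letting it wrap) is just one possible design choice.

```c
#include <stdint.h>

/* Largest arguments whose millisecond value still fits in a 32-bit
   unsigned long (maximum 4294967295 ms, a little under 50 days). */
#define MAX_DELAY_SECONDS UINT32_C(4294967) /* 4294967 * 1000 = 4294967000 */
#define MAX_DELAY_MINUTES UINT32_C(71582)   /* 71582 * 60000 = 4294920000 */

/* Convert seconds to milliseconds using 32-bit arithmetic throughout.
   Assumption: arguments above the maximum are clamped instead of wrapping. */
uint32_t seconds_to_ms(uint32_t seconds)
{
    if (seconds > MAX_DELAY_SECONDS) {
        seconds = MAX_DELAY_SECONDS;
    }
    return seconds * UINT32_C(1000);
}

/* Same idea for minutes. */
uint32_t minutes_to_ms(uint32_t minutes)
{
    if (minutes > MAX_DELAY_MINUTES) {
        minutes = MAX_DELAY_MINUTES;
    }
    return minutes * UINT32_C(60000);
}
```

In a sketch you would then write something like delay(seconds_to_ms(60)) or delay(minutes_to_ms(1)), and the 16-bit multiplication never appears in user code.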