Fast Fourier Transform in realtime

paulo999:
... sometimes for outright speed, or guaranteed timing (e.g. generating video signals) assembly is the only way.

I'll have to challenge you on that statement. See this thread:

I have jitter-free VGA signal generation working in C++. There is no assembler, apart from perhaps a single NOP in one spot, used purely for timing in one of the examples.

I used a disassembly to check what was generated, and then tweaked the C code (e.g. changing an array so that it is addressed row-first rather than column-first, or vice versa). The disassembly showed that the generated code was as good as "hand-coded" assembler would be, so actually doing the hand-coding would achieve nothing except obfuscation.
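The row-first versus column-first point can be sketched as follows. This is a minimal illustration, not the VGA code from the linked thread: the array `frame` and the functions `sumRowFirst` and `sumColumnFirst` are hypothetical. Both functions compute the same result; they differ only in the order they touch memory, and that order is what shows up in the disassembly (for an AVR build, `avr-objdump -S` on the .elf file interleaves source and generated instructions).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical frame buffer: ROWS scan lines of COLS bytes each.
// C and C++ store 2-D arrays row-major, so frame[r][c] and frame[r][c + 1]
// are adjacent in memory.
constexpr std::size_t ROWS = 4;
constexpr std::size_t COLS = 8;
uint8_t frame[ROWS][COLS];

// Row-first: the inner loop walks consecutive addresses, so the compiler
// can often use a single incrementing pointer for the loads.
unsigned sumRowFirst() {
  unsigned total = 0;
  for (std::size_t r = 0; r < ROWS; ++r)
    for (std::size_t c = 0; c < COLS; ++c)
      total += frame[r][c];
  return total;
}

// Column-first: the inner loop jumps COLS bytes on every step, so the
// address has to be advanced by COLS (or recomputed) for each access.
unsigned sumColumnFirst() {
  unsigned total = 0;
  for (std::size_t c = 0; c < COLS; ++c)
    for (std::size_t r = 0; r < ROWS; ++r)
      total += frame[r][c];
  return total;
}
```

Which variant produces tighter code depends on the compiler and target, which is exactly why checking the disassembly, rather than assuming, is the useful step.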