Do some simple arithmetic.
640x480 pixels at, let's say, 30Hz.
That's a bit more than 9 million pixels per second, i.e. a pixel rate of about 9MHz. (It's worse than that, because you've got horizontal and vertical blanking, but let's keep it simple.)
And you've got three channels of that.
So, how are you going to convert, input and store 27 million analogue samples per second?
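If you want to check the numbers yourself, here is a rough sketch of the same sum in Python. The function and variable names are just made up for illustration, and like the figures above it ignores blanking entirely, so treat the result as a lower bound rather than a real timing figure.

```python
# Back-of-envelope sample rate for capturing analogue VGA.
# Assumption: blanking intervals are ignored, as in the post above.

WIDTH = 640          # visible pixels per line
HEIGHT = 480         # visible lines per frame
FRAME_RATE_HZ = 30   # frames per second ("let's say 30Hz")
CHANNELS = 3         # red, green and blue


def pixel_rate(width, height, frame_rate_hz):
    """Visible pixels per second for one colour channel."""
    return width * height * frame_rate_hz


def sample_rate(width, height, frame_rate_hz, channels):
    """Analogue samples per second across all colour channels."""
    return pixel_rate(width, height, frame_rate_hz) * channels


pixels_per_second = pixel_rate(WIDTH, HEIGHT, FRAME_RATE_HZ)
samples_per_second = sample_rate(WIDTH, HEIGHT, FRAME_RATE_HZ, CHANNELS)

print(f"{pixels_per_second / 1e6:.2f} million pixels per second per channel")
print(f"{samples_per_second / 1e6:.2f} million samples per second in total")
```

Change the constants at the top to try other resolutions or frame rates; the total only gets bigger once blanking is counted.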
RobSqui:
Why does the esp8266 have enough processing to do an output but not an input? What's the difference?
It has sufficient processing power to do an extremely crude VGA output of perhaps a few basic colours at a very low resolution.
You seem to misunderstand. Video processing of the sort you have in mind is always performed by special-purpose chips, called video processors, specifically designed for the job. That includes both video output and video input. That a microprocessor can do a party trick and generate a "blocky" image means nothing.
Abusing respondents who simply state facts is unlikely to be productive, all the more so if they are moderators. If you cross-post the same question to different forums, the threads would usually be merged; you have been spared only because they are in different languages.
AWOL:
Technically, it's not arrogance if you actually are superior.