I know this is a hell of a stretch, but is there any A) shield or B) manual wiring method of interpreting video data as input?
The pixel clock on 640x480 VGA is ~25 MHz, so no, you're not going to be able to keep up with it even if you had the RAM. On the other hand, someone just did "human tetris" (not on an Arduino, but on a chip from the same AVR family) where they were sampling a video camera at something super low-res, like 39x39 or something, to control the game.
So I guess it depends on your determination and the scale you have in mind.
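To put rough numbers on the pixel-clock point above, here is a back-of-the-envelope sketch. The 16 MHz CPU clock, 2 KB of SRAM (typical for an ATmega328-based Arduino), the 25.175 MHz standard VGA pixel clock, and one byte per pixel are all assumptions for illustration, not figures from this thread.

```python
# Back-of-the-envelope check: can a 16 MHz AVR keep up with 640x480 VGA?
# Assumptions (not from the thread): 16 MHz CPU clock and 2 KB SRAM
# (typical ATmega328 Arduino), 25.175 MHz standard VGA pixel clock,
# and 1 byte stored per pixel.

PIXEL_CLOCK_HZ = 25_175_000
CPU_CLOCK_HZ = 16_000_000
SRAM_BYTES = 2 * 1024


def cycles_per_pixel(cpu_hz, pixel_hz):
    """CPU cycles available for each incoming pixel."""
    return cpu_hz / pixel_hz


def frame_bytes(width, height, bytes_per_pixel=1):
    """Bytes needed to hold one full frame."""
    return width * height * bytes_per_pixel


cpp = cycles_per_pixel(CPU_CLOCK_HZ, PIXEL_CLOCK_HZ)
need = frame_bytes(640, 480)

print(f"CPU cycles available per pixel: {cpp:.2f}")
print(f"Frame buffer needed: {need} bytes, SRAM available: {SRAM_BYTES} bytes")
```

Under those assumptions there is less than one CPU cycle per pixel, and a full frame (307200 bytes) is about 150 times larger than the available SRAM, which is why only heavily subsampled video is realistic.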
Could the Xduino handle it?
The 'human tetris' project (featured on Hackaday) uses an ATmega644 and achieves 39x60 at 30 fps. With a chip that has more RAM and a higher clock speed, you could push the resolution higher. http://instruct1.cit.cornell.edu/courses/ee476/FinalProjects/s2010/aip23_kaf42/aip23_kaf42/index.html
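For a sense of why 39x60 at 30 fps is within reach, here is a rough budget sketch. The 16 MHz clock and the choice of 1 or 8 bits per pixel are assumptions for illustration (the linked project may store pixels differently); the SRAM sizes are the datasheet values for the ATmega328 (2 KB) and ATmega644 (4 KB).

```python
# Rough budget for a low-resolution capture such as 39x60 at 30 fps.
# Assumptions (not from the thread): 16 MHz CPU clock, and either
# 1 bit or 8 bits stored per pixel. SRAM sizes are datasheet values:
# ATmega328 = 2 KB, ATmega644 = 4 KB.

CPU_CLOCK_HZ = 16_000_000
SRAM_BYTES = {"ATmega328": 2048, "ATmega644": 4096}


def frame_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one frame, rounded up to a whole byte."""
    return (width * height * bits_per_pixel + 7) // 8


def cycles_per_pixel(width, height, fps, cpu_hz=CPU_CLOCK_HZ):
    """Average CPU cycles available per stored pixel."""
    return cpu_hz / (width * height * fps)


W, H, FPS = 39, 60, 30

for bpp in (1, 8):
    size = frame_bytes(W, H, bpp)
    # "Fits" here only compares against total SRAM; it ignores the
    # stack and every other variable the program needs.
    fits = [chip for chip, ram in SRAM_BYTES.items() if size < ram]
    print(f"{bpp} bit(s)/pixel: {size} bytes per frame, fits in: {fits}")

print(f"Average cycles per pixel: {cycles_per_pixel(W, H, FPS):.0f}")
```

With these assumptions a 1-bit frame takes 293 bytes and an 8-bit frame takes 2340 bytes, so the 8-bit case already exceeds the 328's 2 KB but fits in the 644's 4 KB. The cycles-per-pixel figure is an average over the whole frame; the real budget is tighter because samples have to be taken during the active part of each video line.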
There aren't any shields for it that I know of, as it would be pretty much impossible with the processing power of a 328. You would need an extra (and more powerful) chip on the shield, which would remove the need for the Arduino...