Video streams and servo control to and from a large number of controllers

I have been doing a bit of research at night for the past few days on the following project:

I am helping the wife build a Halloween-themed yard for trick-or-treaters. We're going with a graveyard theme, where she is hand crafting tombstones from foam (idk what kind atm) and a coffin or two right now. My job is to create a small to medium sized collection of animated skulls. The idea is to have these skulls all around the yard, plus a couple in the trees, that look around, make some noise with their teeth, and possibly have a few, if not all, talk in some way (possibly to each other while no one is nearby, or intentionally while someone or a group is coming to get some candy). Once someone or a group comes up, the skulls are to watch them walk up to the house to pick up candy, basically staring at them until they leave (tracking would be activated / deactivated at about 5 to 7 feet away).

We have a couple thousand feet of CAT-5e, RJ-45 connectors, a small Cisco Catalyst 3524-PWR XL (PoE for some IP phones we have in the house), and several servers in the house that are all running some form of VMware ESXi (either nested or bare metal). Each skull should contain 3 servos: head tilt, head turning, and jaw control. We can get our hands on some cheap 1.3 to 2 MP USB webcams for a few dollars apiece, which we would put into an eye socket. From what I understand, I could use 2 of the 8 wires in each cable to carry 5 V to the servos, use 3 for control, and still have 3 more for the camera.

I am used to programming in Python within small virtual machines and have some Ubuntu and CentOS images that are templated for quick deployment. Though the Arduino could process the images, I would like to reduce costs by using minimal controllers or an ATmega328 for processing. Is there a way to have a single Uno or ATmega328 take the raw or MJPEG output of three cameras and transmit it over the network to the servers, so they can do all the processing as an offload and then make a decision as to which way to move the skulls?

If I can get 3 to 6 of these working, I will probably move from a breadboard to printed circuit boards and build a batch of 20 to 50 skulls. The ultimate goal is to get up to 100 to 150 skulls that basically litter the yard and watch anyone who comes by to pick up candy. The 3 servers are LACP enabled and can each handle up to 2 Gb/s of data, with a total combined CPU power of 108.48 GHz and 144 GB of memory, so bandwidth and processing power shouldn't be an issue: each stream should need around 12 Mb/s, which gives about 81 streams per 1 Gb/s, or around 500 MJPEG streams in total with the current bandwidth.
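
For anyone who wants to check that arithmetic, here is a rough sketch in Python. The 12 Mb/s per stream, the 81 streams per Gb/s, and the 3 servers at 2 Gb/s each are the figures from the post; the function names are just made up for illustration, and it assumes every server can really take its full LACP rate.

```python
# Back-of-envelope stream budget using the figures quoted in the post.
STREAM_MBPS = 12               # estimated MJPEG bitrate per camera (from the post)
SERVERS = 3                    # number of servers (from the post)
GBPS_PER_SERVER = 2            # LACP capacity per server (from the post)
STREAMS_PER_GBPS_QUOTED = 81   # per-Gb/s figure quoted in the post

def ideal_streams_per_gbps(stream_mbps):
    """Streams that fit in 1 Gb/s if there were no overhead at all."""
    return int(1000 // stream_mbps)

def total_streams(streams_per_gbps, servers, gbps_per_server):
    """Total streams across all servers (assumes full rate on every link)."""
    return streams_per_gbps * servers * gbps_per_server

if __name__ == "__main__":
    ideal = ideal_streams_per_gbps(STREAM_MBPS)
    print("ideal streams per Gb/s:", ideal)
    print("total with quoted figure:",
          total_streams(STREAMS_PER_GBPS_QUOTED, SERVERS, GBPS_PER_SERVER))
    print("total with ideal figure:",
          total_streams(ideal, SERVERS, GBPS_PER_SERVER))
```

The quoted 81 per Gb/s sits a little under the no-overhead figure, so "around 500" is a slightly rounded-up total for 6 Gb/s of aggregate capacity.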

Long story short, can I do this?

Short answer: probably not, due to the limited bandwidth.

Like @raschemmel says: basically no, not with the Arduino.
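
To put a rough number on it, here is a quick sketch in Python. It assumes the Uno's 16 MHz clock and the 2 KB of SRAM on the ATmega328P, and takes the 12 Mb/s per stream from the question; the function names are only for illustration.

```python
# Rough check of what a 12 Mb/s stream means for a 16 MHz ATmega328.
CLOCK_HZ = 16_000_000     # Uno clock speed (assumed stock board)
SRAM_BYTES = 2048         # ATmega328P SRAM (assumed from the datasheet)
STREAM_BPS = 12_000_000   # per-camera bitrate quoted in the question

def cycles_per_byte(clock_hz, stream_bps, streams=1):
    """CPU cycles available for each incoming byte."""
    bytes_per_second = stream_bps * streams / 8
    return clock_hz / bytes_per_second

def buffer_milliseconds(sram_bytes, stream_bps):
    """How much of one stream fits in SRAM if all of it were a buffer."""
    return sram_bytes / (stream_bps / 8) * 1000

if __name__ == "__main__":
    print("cycles per byte, 1 camera: ", round(cycles_per_byte(CLOCK_HZ, STREAM_BPS), 2))
    print("cycles per byte, 3 cameras:", round(cycles_per_byte(CLOCK_HZ, STREAM_BPS, 3), 2))
    print("ms of video held in SRAM:  ", round(buffer_milliseconds(SRAM_BYTES, STREAM_BPS), 3))
```

That works out to roughly 10 clock cycles per byte for one camera and under 4 for three, with barely over a millisecond of buffering, before the chip does anything else like driving servos.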

Consider using a Raspberry Pi; it may be fast enough (you'd need to ask on the Raspberry Pi forum).