Image Processing on Arduino

Hi,
I am working on a project to make an obstacle-sensing robot using image processing.
I have an Arduino Mega 2560 board with a 16 MHz clock speed and 256 KB of flash memory.
I plan to take images using a webcam at discrete time intervals (not real-time processing) and convert them to a 2-D array. Please advise me on what camera to use and how to get the image onto the Arduino.

I have an Arduino Mega 2560 board with a 16 MHz clock speed and 256 KB of flash memory.

...and 8kB RAM.
Not a very big image, I'm thinking.

...and 8kB RAM.
Not a very big image, I'm thinking.

Add our QuadRAM shield, and now you're up to 512kB :wink: A 640x480 image at 8bpp grayscale will fit into 307.2kB. Put this together with a CMOS camera and you could be in business.
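
For the curious, enabling that external RAM is only a couple of register writes. A minimal sketch - the register and bit names (XMCRA, SRE) are from the ATmega2560 datasheet, and the bank-switching details are left to the QuadRAM manual:

```cpp
// Minimal sketch: enable the ATmega2560's external memory (XMEM) interface
// so a RAM expansion like the QuadRAM appears in the address space.
// The usable window is roughly 0x2200-0xFFFF (~56 KB per bank), so a full
// 640x480 frame still has to be handled bank by bank.

void setup() {
  XMCRA = _BV(SRE);   // enable the external memory interface, no wait states
  XMCRB = 0;          // full address space, bus keeper off

  // External RAM starts just above the internal 8 KB of SRAM.
  volatile uint8_t *frame = (volatile uint8_t *)0x2200;
  frame[0] = 0x55;    // simple read/write test
  Serial.begin(9600);
  Serial.println(frame[0] == 0x55 ? "external RAM OK" : "no external RAM");
}

void loop() {}
```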

--
The Ruggeduino: compatible with Arduino UNO, 24V operation, all I/O's fused and protected

I would advise doing some simple sums on those figures to see how quickly you can get an image into that memory. Then look at what sort of image processing you would like to do. This is the sort of pie-in-the-sky project that gets asked about here sometimes. It is much harder than you think, and unless you are an expert coder it is bound to fail. Do a search for how many successful Arduino image-processing projects there are. That has to tell you something.

I plan to take images using a webcam

So how are you thinking of attaching the webcam? The Arduino doesn't do USB host, and an external host shield requires you to write the drivers.

That is what I am not sure about - how to get the image from the camera to the Arduino. Can you give any suggestions?

A 16 MHz, 8-bit MCU with no floating-point hardware and, without the QuadRAM, just 8 KB of RAM... for a task like image processing, "stretch" would be an understatement. The only thing going for you is that the Mega has a hardware multiplier.

It is not impossible - for example, if you are happy processing one image every few minutes and don't need to react in real time.

how to get the image from the camera to the Arduino. Can you give any suggestions?

That's the easiest part: look for a cam with a serial link (SPI or UART), e.g. http://www.linksprite.com/product/product.php?lang=en&class2=96 - you can also find some example code for the Arduino (a capture sketch is at the end of this post).
Then check this link:
A Closer Look at Image Convolution

They say " On a 100 MHz Pentium personal computer, a 128×128 image can be convolved in about 15 seconds using FFT convolution, while a 512×512 image requires more than 4 minutes. "

With an ATmega I'd expect a 64x64 image (about the size of the smileys :D) in half a minute or so.
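
As for the capture itself, something like this is the usual pattern for those serial JPEG cameras. The command bytes are per the LinkSprite LS-Y201 manual as I recall them - verify against your camera's datasheet before trusting them:

```cpp
// Rough sketch of grabbing a JPEG from a LinkSprite-style serial camera
// on a Mega (camera wired to Serial1).

void sendCmd(const uint8_t *cmd, size_t len) {
  Serial1.write(cmd, len);
}

void setup() {
  Serial.begin(115200);
  Serial1.begin(38400);            // LS-Y201 default baud rate

  const uint8_t reset[]   = {0x56, 0x00, 0x26, 0x00};
  const uint8_t capture[] = {0x56, 0x00, 0x36, 0x01, 0x00};

  sendCmd(reset, sizeof(reset));
  delay(3000);                     // camera needs time after reset
  while (Serial1.available()) Serial1.read();  // flush the reset banner

  sendCmd(capture, sizeof(capture));
  delay(100);

  // Echo whatever the camera returns so you can watch the protocol;
  // a real sketch would parse the ACK, request the JPEG length, then
  // read the file in chunks it can actually buffer.
  while (Serial1.available()) Serial.write(Serial1.read());
}

void loop() {}
```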

But why on earth would you need to use an FFT for an obstacle avoiding robot?

With an ATmega I'd expect a 64x64 image (about the size of the smileys :D) in half a minute or so.

I think you are being a bit optimistic with this. You need 4096 separate 64-point FFT operations.

More likely, you'd do an edge detection, so you'd need two 64x64 images, one input and one output.
No, wait... two 64x64 buffers are 8192 bytes - the Mega's entire SRAM.

OP is asking

obstacle sensing robot

Obviously, for obstacle avoiding he doesn't even need a cam. FFT convolution is the one pattern-recognition method I know of; there could be others.

You need 4096 separate 64-point FFT operations.

Let's estimate: 12 ms x 4096 is about 49 sec with 16-bit math, and I think 8-bit would run at least twice as fast. The project is feasible, and the resolution could be lower than 64x64 (which is already pretty good for face recognition, as it is capable of spotting whether Mister Smiley is SAD or HAPPY, not just whether he is present in the frame).

Forget doing an FFT on a 64x64 image, for the same memory reason as the edge detection and then some: the FFT also needs wider intermediate buffers on top of the image itself.

This might do it, with added RAM, power, etc.

There are lots of flights of fancy here. Basically the problem is twofold:-

  1. Implement the hardware to get over the Arduino's physical limitations of slow speed and lack of memory, so you can get an image into the Arduino.
  2. Do something useful with that image information.

The first takes money; the second takes time and smarts, and is likely where the project will fail. Therefore do the second part first.
Get a copy of Processing on your computer. Attach a cheap webcam and acquire an image. Then work out how you are going to translate that image into useful information. When you have done that you will have an algorithm, which you can then battle to get implemented on an Arduino.
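
The guts of such an algorithm might be no more than something like this, in plain C++ so it ports to the Arduino later. The heuristic - "dark pixels low in the frame are a near obstacle, steer toward the lighter half" - is only an illustration, not a recommendation:

```cpp
// Toy example of the "work out the algorithm on the PC first" step.

#include <cstdint>
#include <cstdio>

const int W = 64, H = 64;

// Returns -1 to steer left, +1 to steer right, 0 to go straight.
int steerFrom(const uint8_t frame[H][W], uint8_t darkThresh = 60) {
  long leftDark = 0, rightDark = 0;
  for (int y = H / 2; y < H; y++) {        // lower half only: nearby ground
    for (int x = 0; x < W; x++) {
      if (frame[y][x] < darkThresh)
        (x < W / 2 ? leftDark : rightDark)++;
    }
  }
  if (leftDark + rightDark < 50) return 0; // not enough dark area: path clear
  return (leftDark > rightDark) ? +1 : -1; // turn away from the darker side
}

int main() {
  static uint8_t frame[H][W];              // fill this from your webcam grab
  printf("steer: %d\n", steerFrom(frame));
}
```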

Just to give you a heads-up on the problems you will face:-
a) How do you distinguish between an obstacle and the background?
b) How do you find out the distance to the obstacle so you know when to turn away?
c) How often do you have to do this in order for the robot to move at a reasonable speed?

You could even implement a tethered robot with a lead going back to Processing while you work all this lot out.

Grumpy_Mike:
Just to give you a heads-up on the problems you will face:-
a) How do you distinguish between an obstacle and the background?
b) How do you find out the distance to the obstacle so you know when to turn away?
c) How often do you have to do this in order for the robot to move at a reasonable speed?

I think if I was doing this, I'd approach it in a somewhat interesting manner:

a) Distinguishing between an obstacle and the background: I'd use a neural network
b) Distance to obstacle: I'd use a neural network
c) Doing this fast enough: Training done on a PC, algo implemented as a hard-coded set of weights in code in the Arduino

Now - how to do (a) and (b)?

Well - first off you need a series of images - the more the better; say, 10,000 images - each - of "obstacle images" and "non-obstacle images", with the "obstacles" at the distance you want to turn at.

Your "camera" would be whatever low-resolution image sensor you can find; maybe an optical mouse sensor IC running in raw mode, or if you can manage to get a low-res image in some other manner from another camera, that's fine too (maybe a small composite video output camera, coupled to an LM1881?). Shine a high-brightness IR LED on the "scene", add an IR filter to the camera. Basically, you want to try to get a pure "black and white" image; scan only an 8 x 8 pixel image, 1 bit per pixel - 64 bits, 8 bytes of RAM. You might even be able to build a "camera" eye using SMT phototransistors and a lens (plus matrix scanning). Yeah, this thing's going to fairly "blind"...

So - your input layer would be 64 bits, feeding into a hidden layer of n nodes (try n=64 at first; you may need to reduce or increase this), then feed these into either another hidden layer of fewer nodes, or straight into a single output node that outputs a "1" or a "0" depending on whether it decides there is an obstacle (1) or not (0). If you added extra output nodes, you could even train for multiple distances, perhaps (close, medium, far?)...
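
A rough sketch of that forward pass - the weights are placeholders, and the real values would come out of the PC-side training:

```cpp
// Forward pass: 64 one-bit inputs (the packed 8x8 frame), one hidden
// layer, a single obstacle/no-obstacle output.

#include <stdint.h>
#include <math.h>

const uint8_t N_IN = 64;
const uint8_t N_HID = 16;     // 16 hidden nodes keeps the weight table small

// Placeholder weights -- PC-side backprop would produce the real values,
// which you'd paste in as constants (ideally into PROGMEM, as discussed
// further down the thread; in SRAM these floats eat about 4 KB).
float wHid[N_HID][N_IN + 1];  // +1 for each node's bias
float wOut[N_HID + 1];        // +1 for the output bias

float sigmoid(float z) { return 1.0f / (1.0f + expf(-z)); }

// frame: 8 bytes, one bit per pixel of the 8x8 image
uint8_t obstacle(const uint8_t frame[8]) {
  float hidden[N_HID];
  for (uint8_t h = 0; h < N_HID; h++) {
    float z = wHid[h][N_IN];                 // bias term
    for (uint8_t i = 0; i < N_IN; i++)
      if ((frame[i >> 3] >> (i & 7)) & 1)    // input is 0/1...
        z += wHid[h][i];                     // ...so just add or skip
    hidden[h] = sigmoid(z);
  }
  float z = wOut[N_HID];                     // output bias
  for (uint8_t h = 0; h < N_HID; h++)
    z += wOut[h] * hidden[h];
  return sigmoid(z) > 0.5f;                  // 1 = obstacle, 0 = clear
}
```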

Of course, you would have to run the training algorithm on a PC, taking your training images (break it up into training, cross-validation, and test sets first), running them through, computing back-prop, etc - then when you have your numbers, they are encoded directly into the arduino like so:

http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1295812656/5#5

Of course, the version of the neural net for this vision experiment will be a -lot- longer and more convoluted. I would think the majority of code space would be taken up by the neural network, plus a little extra for the camera reading routine; you'd have to interface a second Arduino to this one, likely (you probably wouldn't have code space leftover for much else). Now, I'm imagining this being a bare-bones ATMega328 type Arduino - maybe with a Mega, or something with a bit more RAM (like those RuggedCircuits expansions), you could get a bit more fancy (higher resolution, and gray scales would help).

I'm not sure this would really work (I'm only basing this on what's been percolating in my head from my machine learning class), but I bet it would be fun to try...

:slight_smile:

get a pure "black and white" image; scan only an 8 x 8 pixel image

This likely isn't going to be the resolution you need for object identification. An LCD character is 8x5, so an 8x8 object is only going to be slightly bigger. You won't be able to differentiate between a house, tree, person, or cookie.

A more manageable project might be a letter recognition system. The robot drives and when it sees an "L" it turns left, or "R" and it turns right.

I'd suggest getting a wireless webcam, doing the hard image processing on a PC, and using the Arduino just to actuate the robot. Perhaps you could do all that on an Arduino with enough extra hardware, but it's the wrong tool for the job. If you really, really want the whole thing to be autonomous, get an embedded PC to do the heavy image processing. But you may find that you have a much thinner runtime environment, so you'd have to provide more of the functionality yourself.
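
The Arduino end of that split can be almost trivially dumb - the PC does the vision and sends one-character commands over serial. The pin numbers and the L/R/F/S command letters here are made up for the example:

```cpp
// "PC does the vision, Arduino drives the motors" -- the actuation side.

const uint8_t LEFT_MOTOR = 5, RIGHT_MOTOR = 6;   // hypothetical PWM pins

void setup() {
  Serial.begin(115200);
  pinMode(LEFT_MOTOR, OUTPUT);
  pinMode(RIGHT_MOTOR, OUTPUT);
}

void loop() {
  if (Serial.available()) {
    switch (Serial.read()) {
      case 'F': analogWrite(LEFT_MOTOR, 200); analogWrite(RIGHT_MOTOR, 200); break;
      case 'L': analogWrite(LEFT_MOTOR,  80); analogWrite(RIGHT_MOTOR, 200); break;
      case 'R': analogWrite(LEFT_MOTOR, 200); analogWrite(RIGHT_MOTOR,  80); break;
      case 'S': analogWrite(LEFT_MOTOR,   0); analogWrite(RIGHT_MOTOR,   0); break;
    }
  }
}
```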

John_S:

get a pure "black and white" image; scan only an 8 x 8 pixel image

This likely isn't going to be the resolution you need for object identification. An LCD character is 8x5, so an 8x8 object is only going to be slightly bigger. You won't be able to differentiate between a house, tree, person, or cookie.

There are (and have been) various research projects out there studying the feasibility of building and using "low vision" systems for robotics control; maybe 8 x 8 is a little low, maybe 16 x 16 would be better (though 256+ nodes for a neural net's hidden layer(s) might be pushing the limits of an Arduino - even a Mega - maybe). It may not seem like much, but it would be interesting to try - the results might be surprising.

How many nodes you can run depends a lot on the SRAM used per node and number of kinds of nodes. With objects you only keep member data in SRAM, constants can go to PROGMEM. If you only keep maybe 4 bytes storage per node, 256 nodes might at least not crash the MCU... right away.

But a 16 MHz 8-bit MCU with cramped memory should not do a task suited to a 1.6+ GHz 64-bit CPU with plenty of memory. For one thing, the response times would differ so enormously that the cost difference would be a bargain.

With a PC you can use your video card to process images, like making and using cookie-cutters (masks). It runs at the speed of video; the Arduino would take forever to do what that card does mostly in hardware. Even low-end motherboard video is light years ahead of the Arduino. It can use loads of megabytes of RAM instead of worrying about using more than a kilobyte, how deep the stack will go, and how many local variables are in scope at once.

GoForSmoke:
How many nodes you can run depends a lot on the SRAM used per node and number of kinds of nodes. With objects you only keep member data in SRAM, constants can go to PROGMEM. If you only keep maybe 4 bytes storage per node, 256 nodes might at least not crash the MCU... right away.

Maybe some other kind of learning algorithm could be used instead? Perhaps a logistic regression classifier? Whatever it is, it would need to be trained on the PC, with as much as possible on the ATmega stored as constants or other non-SRAM-using constructs; keep as much in flash as possible.
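
Something along these lines - the 8-bit fixed-point weights (trained floats scaled by 64 on the PC) are an assumption for the example, and the table contents would come from the training step:

```cpp
// Keep trained weights out of SRAM entirely: store them in flash with
// PROGMEM and read them back with pgm_read_byte().

#include <avr/pgmspace.h>

const uint8_t N_IN = 64, N_HID = 16;

// Placeholder table; the PC training step would generate this array.
const int8_t hiddenWeights[N_HID][N_IN] PROGMEM = { /* ... */ };

int16_t hiddenSum(uint8_t node, const uint8_t frame[8]) {
  int16_t z = 0;
  for (uint8_t i = 0; i < N_IN; i++) {
    if ((frame[i >> 3] >> (i & 7)) & 1) {
      // read the weight from flash: a few cycles, zero SRAM
      z += (int8_t)pgm_read_byte(&hiddenWeights[node][i]);
    }
  }
  return z;   // fixed-point activation: divide by 64 when interpreting
}
```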

GoForSmoke:
But a 16 MHz 8-bit MCU with cramped memory should not do a task suited to a 1.6+ GHz 64-bit CPU with plenty of memory. For one thing, the response times would differ so enormously that the cost difference would be a bargain.

The thing is, I like self-imposed challenges; if this were a project I was doing, attempting to do it with a 16 MHz 8-bit MCU may seem like madness and folly to you, but then I see things like this:

http://www.linusakesson.net/scene/craft/

...which was done on an ATMega88 (arguably worse than a 328 memory-wise, though it was clocked at 20 MHz), and I wonder what else could be done with such a "lowly" platform. Certainly a machine learning problem could be done on your typical PC; everyone knows that, it's done every day - what isn't done every day is trying to cram a similar problem/solution into something seemingly too small to fit it. That's the challenge. I like to strive toward such challenges on occasion.

GoForSmoke:
With a PC you can use your video card to process images, like making and using cookie-cutters (masks). It runs at the speed of video; the Arduino would take forever to do what that card does mostly in hardware. Even low-end motherboard video is light years ahead of the Arduino. It can use loads of megabytes of RAM instead of worrying about using more than a kilobyte, how deep the stack will go, and how many local variables are in scope at once.

If you're going to use your video card for anything relating to machine learning, you use its vector processor to process the parallelized linear algebra of your algorithm (assuming you wrote your code properly and are using a library to take advantage of the vector capabilities of the GPU). Still, I wasn't pushing for full video processing; I was pushing for the concept of "low vision" processing - i.e., can you create a simple vision system with very low resolution (16 x 16 1-bit pixels) that can discern patterns and use that information to take intelligent action?

I don't know the answer to that, but I would think trying to find out the answer would be a fun diversion which may lead to all sorts of interesting challenges, solutions, and learning opportunities.

:slight_smile:

Graphics like that were done on 1 MHz 6502s and the like, and that's about what they looked like too. Those machines had video RAM to work in, read as well as write; that makes the task easier.

Spewing out patterns is a long way from object recognition.