Web scraping w/out PHP?

Hi - I’m new here. Just got an Arduino and read about scraping a web page with an Xport and a PHP intermediary. What if I want to access a web page directly and search the returned HTML on the Arduino itself? Can someone sketch a recipe for this? (Wait, "sketch" is overloaded here - I mean outline.) Given the following: Arduino, Xport (Direct?), and a cable modem to Charter Internet (which doesn’t allow anything but static member webpages - that’s why no PHP).


The problem is that the Arduino has very limited memory, so the HTML of the page somehow has to reach it in small chunks. I don't know if the Xport can do that.

I think I get it. The page comes back as a big packet of text, rather than a serial stream that can be dealt with a line at a time?

The Ethernet libraries receive the stream one character at a time, so it's a matter of caching characters until you get a match. Someone gave an example a little while ago that was better than what I had managed to knock together.

http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1231812230 Here he's searching XML, but there's no reason not to adapt the technique to look for HTML tags instead.

Should give you a start.
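
For what it's worth, here is a rough sketch of the "cache it until you get a match" idea in plain C++. It is not the code from the linked thread. StreamMatcher and extractAfter are made-up names for illustration, not part of any Arduino or Xport library, and the 16-byte window size is an arbitrary assumption. It keeps only the last few characters in a small fixed buffer, so the whole page never has to fit in RAM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: remembers only the last N characters seen,
// where N is the length of the pattern being searched for.
class StreamMatcher {
public:
    static const std::size_t kMaxPattern = 16;  // assumed upper bound on tag length

    explicit StreamMatcher(const char* pattern)
        : pattern_(pattern), len_(std::strlen(pattern)), filled_(0) {
        assert(len_ > 0 && len_ <= kMaxPattern);
    }

    // Feed one received character. Returns true when the most recent
    // len_ characters are exactly the pattern.
    bool feed(char c) {
        if (filled_ < len_) {
            window_[filled_++] = c;
        } else {
            // Window is full: drop the oldest character, append the new one.
            std::memmove(window_, window_ + 1, len_ - 1);
            window_[len_ - 1] = c;
        }
        return filled_ == len_ && std::memcmp(window_, pattern_, len_) == 0;
    }

private:
    const char* pattern_;
    std::size_t len_;
    std::size_t filled_;
    char window_[kMaxPattern];
};

// Walks `stream` one character at a time (standing in for characters read
// from the network), waits for `startTag`, then copies what follows into
// `out` until the next '<'. Text that does not fit in `out` is dropped.
// Returns true if the tag was found and a closing '<' was reached.
bool extractAfter(const char* stream, const char* startTag,
                  char* out, std::size_t outSize) {
    assert(outSize > 0);
    StreamMatcher matcher(startTag);
    bool capturing = false;
    std::size_t n = 0;
    for (const char* p = stream; *p != '\0'; ++p) {
        if (capturing) {
            if (*p == '<') {
                out[n] = '\0';
                return true;
            }
            if (n + 1 < outSize) {
                out[n++] = *p;
            }
        } else if (matcher.feed(*p)) {
            capturing = true;
        }
    }
    out[n] = '\0';
    return false;
}
```

On the Arduino you would call feed() from your main loop with each character as it arrives from the Xport or Ethernet client, instead of looping over a string. The tag match is case-sensitive and exact, so a tag with attributes (for example a title tag with an id) would need a shorter pattern.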