Help with string replace()

Let me start off by saying that I have no formal programming background... I'm more at home with solder. :smiley:
My project uses an ethernet shield and a variation on the "Web Client" sketch that comes with Arduino 0022.

I'm able to pull a web page, and I'm able to store the relatively small page of html code into a string, called theString, which was initialized using

String theString;

I can print this string using Serial.println().

My problem is that I would like to strip out everything except the text between the tags. I'm attempting to do this using string replace(), as in:

    theString = theString.replace("<html><head></head><body>", " ");
    Serial.println(theString);

which I am expecting will clear out the start of the string, but when I try printing it, it returns empty.

I've consulted Tom Igoe's tutorial at replace() - Arduino Reference, but I'm confused by the examples, which use double quotes in the first instance and single quotes in the second. I've tried both, but it's not working for me at all, which makes me think I'm somehow barking up the wrong tree. Any ideas?

It is looking for that exact string to replace. If that exact string does not exist (and it is unlikely to), no replacement is made.

You could do TWO replaces, getting rid of "<html><head></head><body>" and "</body></html>". Replace them with "", not " ".
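For illustration, here is a minimal sketch of the "two replaces" idea using plain C strings instead of the String class. `removeAll` is a hypothetical helper name, not part of any Arduino library, and the tag strings are taken from the page posted in this thread. It only removes text that matches exactly, character for character, which is the point being made above.

```cpp
#include <string.h>

// Hypothetical helper: delete every occurrence of `match` from `text`, in place.
// Nothing is removed unless `match` appears exactly, character for character.
void removeAll(char *text, const char *match)
{
  size_t matchLen = strlen(match);
  if (matchLen == 0) return;            // an empty match would never advance
  char *hit = text;
  while ((hit = strstr(hit, match)) != NULL)
  {
    // slide the tail (including the terminating '\0') over the match
    memmove(hit, hit + matchLen, strlen(hit + matchLen) + 1);
  }
}
```

Called once with the opening tags and once with the closing tags, this should leave only the body text behind. If the received data has a stray space or line break inside the tags, neither call will match and the text is left untouched.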

That is the exact string to be replaced. But you're correct about the space character in the double quotes. I've deleted it, but the sketch doesn't behave any differently.

The body of the message generally goes inside the <body></body> tags, which follow the <head> tag, inside the <html></html> tags.

If this is not the case for you, you need to post all of your code, and some sample data returned by the client.

The html code that is received by the sketch is below. I've stripped out a lot of the extra stuff that is normally part of a standards-compliant site, to make it simpler to deal with once it is captured by the Arduino:

<html><head></head><body>. . . . . . body text goes here, less than 1200 characters total . . . . . . </body></html>

Here is the sketch:

/*

Media Circus 005
by Michael B. LeBlanc, NSCAD University

This sketch connects to a website using an Arduino Wiznet Ethernet shield.

Circuit:

  * Ethernet shield attached to pins 10, 11, 12, 13

code from "Web client", created 18 Dec 2009 by David A. Mellis

*/

#include <SPI.h>
#include <Ethernet.h>
String theString;

// Enter a MAC address and IP address for your controller below.
// The IP address will be dependent on your local network:
byte mac[] = {
  0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED };
byte ip[] = {
  192,168,1,45 };
byte server[] = {
  24,222,117,110 }; // nscad.dyndns.org

// Initialize the Ethernet client library
// with the IP address and port of the server
// that you want to connect to (port 80 is default for HTTP):
Client client(server, 8888);
//Client client(server, 80);
//int flag = 0;

void setup() {
  // start the Ethernet connection:
  Ethernet.begin(mac, ip);
  // start the serial library:
  Serial.begin(57600);
  // give the Ethernet shield a second to initialize:
  delay(1500);
  Serial.println("connecting...");

  // if you get a connection, report back via serial:
  if (client.connect()) {
    Serial.println("connected");
    // Make a HTTP request:
    client.println("GET /index.php?");
    client.println();
  }
  else {
    // if you didn't get a connection to the server:
    Serial.println("connection failed");
  }
}

void loop()
{
  // if there are incoming bytes available
  // from the server, read them and store them:
  if (client.available()) {
    char c = client.read();
    theString += c;
  }

  // if the server's disconnected, stop the client:
  if (!client.connected()) {
    Serial.println();
    Serial.println("disconnecting.");
    client.stop();

    // do nothing forevermore:
    //for(;;)
    //  ;

    theString = theString.replace("<html><head></head><body>", "");
    Serial.println(theString);

    delay(10000); //wait 10 secs
  }
}

One final note regarding memory: I'm running this on a Duemilanove. The binary sketch size is around 8058 bytes (with 30k available). Although it seems to me that there is plenty of headroom in terms of storage on the chip, could I be running into memory problems with the string function? Are there limitations on the size of the string?

On the memory bit only: There may be 30K of program memory, but you only have 2K of RAM for all your variables and stack. And the Ethernet library is quite greedy, I hear. (Note to self: why don't you do an experiment with the Ethernet shield you bought two months ago?)

When you use text strings, they are stored in RAM (despite the fact that they will not be changed) so that they can be passed as arguments to other routines. Search around for PROGMEM. This is a way to store things in the 30K area and (with some hard work in your own code) copy the bits needed into a small, reused RAM buffer.
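As a rough sketch of the PROGMEM idea: the constant text lives in flash, and a small RAM buffer is reused each time a copy is needed. `fromFlash`, `scratch`, and `openTags` are made-up names for this example, and the `#else` branch is only a desktop stand-in so the code can be tried off-board. On the Arduino itself, `PROGMEM` and `strncpy_P` come from avr-libc's `<avr/pgmspace.h>`.

```cpp
#include <string.h>
#ifdef __AVR__
#include <avr/pgmspace.h>
#else
// Desktop stand-ins (an assumption for off-board testing, not part of avr-libc)
#define PROGMEM
#define strncpy_P strncpy
#endif

// Kept in program memory (flash) on AVR rather than in RAM
const char openTags[] PROGMEM = "<html><head></head><body>";

char scratch[32];   // small RAM buffer, reused for each flash string

// Copy a flash string into the scratch buffer and return a pointer to it.
const char *fromFlash(const char *flashText)
{
  strncpy_P(scratch, flashText, sizeof(scratch) - 1);
  scratch[sizeof(scratch) - 1] = '\0';   // always terminated, even if truncated
  return scratch;
}
```

Each call overwrites the previous contents of `scratch`, so the result has to be used before the next call. That is the price of keeping only one small buffer in RAM.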

When you use text strings, they are stored in RAM (despite the fact that they will not be changed) so that they can be passed as arguments to other routines.

This is not the case for OP. OP's String is changed every time a character is received.

OP: Have a look at how the += operator for the String class works. It measures the length of the existing String and the length of the new String to append. It then allocates memory to hold the concatenated char arrays. Next, it copies the data from the existing String to the new space, followed by the data from the String being appended. Then it frees the memory used by the old String and assigns the new space to the existing String.

This means that if the old String contains 500 chars and 1 is to be appended, 501 bytes are allocated, the data is copied, and the old 500 bytes are freed. So to append one character to 500 characters, 1001 bytes need to be available.
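The arithmetic can be written down as a tiny function. This only models the description given here (old and new buffers both allocated during the copy) and ignores terminators and allocator overhead. `peakBytesDuringAppend` is a made-up name, not anything in the String class.

```cpp
// Bytes that must be free at the same moment while appending, per the
// description above: the old buffer is not freed until the new one is filled.
unsigned long peakBytesDuringAppend(unsigned long oldLen, unsigned long addLen)
{
  unsigned long newLen = oldLen + addLen;   // size of the freshly allocated buffer
  return oldLen + newLen;                   // old and new coexist during the copy
}
```

With `oldLen` of 500 and `addLen` of 1 this gives the 1001 bytes mentioned above, which is already about half of the 2K of RAM on the board.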

If you limit the amount of data collected, so that you can collect all of it in a fixed size character array, you can actually collect more data.

Parsing it on the fly, though, is likely going to be necessary.

Ah, yes, indeed. There is the "string" and the "String". Overloaded words always cause confusion, even when distinguished by a small/large S :). I've never used the fancy object stuff on the Arduino, because of its memory (RAM) requirements. (Well, I do define my own classes.)

Thanks everyone for your advice. I've been working on ways to simplify this process at the server end, so that the Arduino grabs a chunk of data at a time and moves it into an array for later use.

If you work directly with a text buffer, and ignore the HTML commands, then you'll need 1200 bytes, which I'm not sure is available on your setup. If it is, however, then you can simply set a flag to ignore text while you're within an HTML command (past a '<') and save it while you're outside one (past a '>'):

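// 'eg1' is example data only - you'd get data from client directly...
char eg1[] = "<html><head></head><body>. . . . . . body text goes here, less than 1200 characters total . . . . . . </body></html>";
char *ptr = eg1;

int validData = 1;   // 1 while outside a tag, 0 while inside one
int i = 0;           // next free position in buff
char buff[1201] = "";

void setup()
{
}

void loop()
{
  // stop at the terminator without stepping past it,
  // so repeated calls to loop() never read beyond eg1
  while (*ptr != '\0')
  {
    char c = *ptr++;  // in your code you'd get c directly from client...
    if ('\n' == c)
      break;
    if ('>' == c)
    {
      validData = 1;
    }
    else if ('<' == c)
    {
      validData = 0;
    }
    else if (validData && i < (int)sizeof(buff) - 1)
    {
      buff[i++] = c;  // keep body text only while there is room
    }
  }
  buff[i] = '\0';
  // non-HTML in buff now...
}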

By using a char buffer instead of String, and ignoring the HTML on entry, you'll minimize your memory usage - whether it's enough, however...