Extract data from HTML

Hi everyone,
I hope you can help me with my little problem. I'm trying to extract a float value from a web page. This is the HTML with the float I need:

<div id="price" class="tab_box_val_big">92,92</div>

This is the part of my code I'm using right now:

  client.print(String("GET ") + url + " HTTP/1.1\r\n" +
               "Host: " + host + "\r\n" +
               "User-Agent: ESP8266\r\n" +
               "Connection: close\r\n\r\n");

  while (!client.available()){
    delay(50);
  }

  if (client.find("tab_box_val_big")){
    Serial.println("Found the specific point!");
    while (client.available()){
      String line = client.readStringUntil('<');
      Serial.println(line);
    } 
  } else{ 
    Serial.println("Specific point not found"); 
  }

When I run this code, the serial monitor shows everything after "tab_box_val_big" up to the end of the page, because the '<' parameter does not seem to be recognized as a stop character (as you can see, the < is missing from the output). Here is the output:

Found the specific point!
">92,92
/div>
                
/div>
...

I think I'm close to the solution and just need a little push.
Thanks in advance for your time!

It is expecting a char... so maybe...

Try escaping:

readStringUntil('\<')

Try the decimal value:

readStringUntil(60)

I tried your suggestions. With '\<' I get a compiler error, and with 60 I get the same result as with '<'.

Try double quotes around the angle bracket.

You are on the right track.

Have a look at the Serial Input Basics tutorial, which shows you how to extract data surrounded by start and end markers, which in your example could be (if you take some care) '>' and '<'.

Obviously you will have to ignore irrelevant occurrences of those markers, so the solution will be highly specific to this type of message.

With Arduino it is best to avoid the problems caused by use of Strings and instead use the C-string processing functions in the <string.h> standard library.

I think that <div> and </div> are block level tags in html and are not seen as a group of individual characters.

Are you always trying to read xx,xx, i.e. 5 characters before the closing </div>, or is the content variable?

It was recognized: readStringUntil will stop at the requested character and then discard it.

If it is not found, then it will behave like readString and timeout, typically after one second of no-more-characters.
