
Topic: Parsing PHP

JanD

Hello,

I'm thinking about making a project that takes data from the Arduino Forum (PHP), parses it in a C++ application on my computer, and then sends the relevant data to my Arduino. Basically, this should notify me when I have gotten any new replies, but it could also be used to keep track of the total number of posts on the site, etc.

I have started looking at the main page of the Arduino forum. It contains things like this:
Code: [Select]
<tr>
<td  class="windowbg-l" width="6%" align="center" valign="top"><a href="http://arduino.cc/forum/index.php?action=unread;board=5.0"><img src="http://arduino.cc/forum/Themes/arduinoWide/images/on.gif" alt="New Posts" title="New Posts" /></a>
</td>

<td class="windowbg2">
<a class="boardName" href="http://arduino.cc/forum/index.php/board,5.0.html" name="b5">General Electronics</a><br />
resistors, capacitors, breadboards, soldering, etc.
<br /><span class="smalltext">
<b>Last post:</b> <a href="http://arduino.cc/forum/index.php/topic,52723.msg372450/boardseen.html#new" title="Re: Read voltage across a resistor and use it as an input to an anlog pin.">Re: Read voltage across ...</a>
by <a href="http://arduino.cc/forum/index.php?action=profile;u=15067">swetrack</a>
on <b>Today</b> at 10:11:20 PM
</span>

</td>
<td class="windowbg" valign="middle" align="center" style="width: 12ex;">
<span class="largetext">1004</span><br /><span class="smalltext">Posts</span>
</td>
<td class="windowbg" valign="middle" align="center" style="width: 12ex;">
<span class="largetext">137</span><br /><span class="smalltext">Topics</span>
</td>

</tr>


There you can see the string "General Electronics" and the numbers 1004 (posts) and 137 (topics). To begin with, I want to place these values (one string, two ints) into a class (or struct) and then put that into a QList. Sending to the Arduino can come later.

So, how do I parse PHP in C++?


This is the first time I'm unsure which forum to post in, so if this is the wrong one, please delete or move it.

JanD

PaulS

Quote
So, how do I parse PHP in C++?

The first thing you should do is post in the appropriate forum.
The second thing you should do is learn to recognize what it is you want to parse. The stuff you showed is HTML, not PHP.

Parsing data requires that it be in memory. You need to describe or, even better, post some code that shows how that data is stored in memory.

Each mechanism for storing the data provides some means for reading bits and pieces of the data. For example, if the data is stored in a char array, and is properly NULL terminated, the strtok function can return tokens, with each token delimited by a different (or the same) delimiter.

While this is useful for extracting the individual tokens, it is not of that much help in parsing well defined structures like HTML.
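To make the strtok point concrete, here is a minimal sketch. It assumes the data is held in a std::string (not the QString used later in this thread), and the helper name tokenize is made up for illustration. strtok writes into the buffer it scans, so the sketch works on a NUL-terminated copy.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: split `text` into tokens at any character in `delimiters`.
// strtok modifies the buffer it scans, so we tokenize a private copy.
std::vector<std::string> tokenize(const std::string& text, const char* delimiters) {
    assert(delimiters != nullptr);

    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');  // strtok requires a NUL-terminated char array

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delimiters);
         tok != nullptr;
         tok = std::strtok(nullptr, delimiters)) {  // nullptr = continue in same buffer
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Calling tokenize on the fragment <span class="largetext">1004</span> with "<>" as delimiters gives three tokens, the middle one being 1004. The number comes out, but nothing says which tag it belonged to, which is the limitation described above.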

Googling "Parsing HTML" should provide plenty of information on how to parse HTML. Every browser in the world is capable, so it can't be that difficult.

JanD

Quote
The first thing you should do is post in the appropriate forum.
Do you mean in a C++ forum? Or a different part of this forum?
Quote
The second thing you should do is learn to recognize what it is you want to parse. The stuff you showed is HTML, not PHP.
I thought it looked like HTML, but the main page of the Arduino forum has the name arduino.cc/forum/index.php. That's why I thought it was PHP.
Quote
Parsing data requires that it be in memory. You need to describe or, even better, post some code that shows how that data is stored in memory.
To begin with, just in a QString. Later it's going to be read from the internet with either QtWebKit or QtNetwork (I have to look further into them, but I think I need QtWebKit) and then stored in a QString.
Quote
Each mechanism for storing the data provides some means for reading bits and pieces of the data. For example, if the data is stored in a char array, and is properly NULL terminated, the strtok function can return tokens, with each token delimited by a different (or the same) delimiter.
As you wrote in the next line, this wouldn't help too much. It could be useful, though, for extracting the "http://arduino.cc/forum/index.php/board,5.0.html" from the HTML (or PHP) document, etc.
Quote
Googling "Parsing HTML" should provide plenty of information on how to parse HTML. Every browser in the world is capable, so it can't be that difficult.
That was my thought too. If you couldn't parse HTML, I couldn't write this :)

I've been googling parsing, C++, Qt etc. for some time now, and I will continue.

JanD

Graynomad

Quote
I thought it looked like HTML, but the main page of the Arduino forum has the name arduino.cc/forum/index.php. That's why I thought it was PHP.

"PHP" pages look like HTML because all you see is the output of the PHP program, not the PHP code itself.

______
Rob
Rob Gray aka the GRAYnomad www.robgray.com

robtillaart

I think that writing a parser for HTML is overkill just to get those numbers: you know the board names and the order in which they appear, so a simple search would be sufficient for them.

To be more robust, search for class="boardName"; if found, read further until a > and then the board name can be read until the next <.

Then, for each board name found, search for class="largetext"> and read the int behind it.

In pseudocode:
Code: [Select]

while(! end of stream)
{
  if (skipUntil("boardName")  == true)
  {
    skipUntil(">");
    char* s = readUntil("<");
    LCD.print(s);

    skipUntil("largetext");
    skipUntil(">");
    s = readUntil("<");
    LCD.print(s);

    skipUntil("largetext");
    skipUntil(">");
    s = readUntil("<");
    LCD.print(s);
  }
}


Two quite similar functions could do the trick.
boolean skipUntil(string s);    // returns true if the string is found and false otherwise; if false, the stream is empty
char * readUntil(string s, int size);   // returns all the chars from the stream until it encounters the string; the optional parameter size could take care of overflow conditions.

My 2 cents,
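For anyone who wants to try the idea on a PC first, the pseudocode could be sketched roughly like this. It is only one possible reading: it assumes the whole page is already in memory as a std::string (not a stream, and not a QString), that skipUntil leaves the cursor just past the match, and that every board row looks like the HTML posted above. The names Scanner, BoardInfo and parseBoards are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One row of the board index: one string, two ints.
struct BoardInfo {
    std::string name;
    int posts = 0;
    int topics = 0;
};

// Hypothetical cursor over an in-memory copy of the page.
class Scanner {
public:
    explicit Scanner(std::string text) : text_(std::move(text)) {}

    // Move the cursor just past the next occurrence of `needle`.
    // Returns false, with the cursor at the end, if there is none.
    bool skipUntil(const std::string& needle) {
        assert(!needle.empty());
        const std::size_t found = text_.find(needle, pos_);
        if (found == std::string::npos) {
            pos_ = text_.size();
            return false;
        }
        pos_ = found + needle.size();
        return true;
    }

    // Return the text up to (not including) the next `needle` and leave the
    // cursor on it. If `needle` never occurs, return the rest of the text.
    std::string readUntil(const std::string& needle) {
        assert(!needle.empty());
        std::size_t found = text_.find(needle, pos_);
        if (found == std::string::npos) found = text_.size();
        std::string out = text_.substr(pos_, found - pos_);
        pos_ = found;
        return out;
    }

private:
    std::string text_;
    std::size_t pos_ = 0;
};

// Collect name, post count and topic count for every board on the page.
std::vector<BoardInfo> parseBoards(const std::string& html) {
    Scanner in(html);
    std::vector<BoardInfo> boards;
    while (in.skipUntil("class=\"boardName\"")) {
        BoardInfo b;
        if (!in.skipUntil(">")) break;              // end of the <a ...> tag
        b.name = in.readUntil("<");
        if (!in.skipUntil("class=\"largetext\">")) break;
        b.posts = std::stoi(in.readUntil("<"));     // throws if not a number
        if (!in.skipUntil("class=\"largetext\">")) break;
        b.topics = std::stoi(in.readUntil("<"));
        boards.push_back(b);
    }
    return boards;
}
```

std::stoi throws if the text between the tags is not a number, so real code would probably want to catch that rather than trust the page layout.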


Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

JanD

That's a good beginning, thanks. I guess I have to write skipUntil and readUntil by myself.

JanD

robtillaart

Quote
I guess I have to write skipUntil and readUntil by myself

You are a very good guesser ;)

But they are quite similar ... try them with paper and pencil first  ...
Rob Tillaart


JanD


Quote
I guess I have to write skipUntil and readUntil by myself

You are a very good guesser ;)

But they are quite similar ... try them with paper and pencil first  ...


No need to, already done:

Code: [Select]
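// Returns the text from the current position up to (not including) `end`
// and leaves possition on `end`. If `end` is not found, returns the rest.
QString GetData::readUntil(QString end){
    unsigned int lastPossition = possition;
    int newPossition = stringFromForum.indexOf(end, possition, Qt::CaseSensitive);
    if(newPossition == -1) newPossition = stringFromForum.length();
    unsigned int toMove = newPossition - lastPossition;
    QString toReturn = stringFromForum.mid(possition, toMove);
    possition = newPossition;
    return toReturn;
}

// Moves possition to the start of the next `end`; returns false if there is none.
// Note: possition ends up on the match itself, not behind it.
bool GetData::skipUntil(QString end){
    int place = stringFromForum.indexOf(end, possition, Qt::CaseSensitive);
    if(place == -1) return false;
    else{
        possition = place;
        return true;
    }
}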


I'm almost done with the part of the code that sends data to the Arduino, and I'm halfway done with the parsing code, so at the moment I only need the code for getting things from the internet and the code that is going to run on the Arduino.

JanD

robtillaart

How much RAM do you have? The source of the forum homepage is on the order of 40 KB, which is big!

...
Rob Tillaart


JanD

I know, I know. I'm adding
Code: [Select]
QString::~QString();
wherever I can.

But I have a fairly new Win7 computer with enough RAM for most things. Later this is going to be run on another computer that does only this.

JanD

robtillaart

I thought you were going to parse it on your Arduino 
Rob Tillaart


PaulS

Quote
I know, I know. I'm adding

Explicit calls to the destructor are not going to free memory. You need to delete instances of the class.

Code: [Select]
QString *someJunk = new QString("Not needed anymore");
delete someJunk;

This will invoke the destructor, and free the memory used by someJunk.
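As a rough illustration of when the destructor runs, here is a small sketch. It uses a made-up type, Tracked, instead of QString so that destructor calls can be counted; the assumption is only that QString follows the normal C++ lifetime rules. An object declared as a local variable is destroyed automatically at the end of its scope, while delete is needed only for objects created with new.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper type: counts how many instances have been destroyed.
struct Tracked {
    static int destroyed;
    std::string text;
    explicit Tracked(std::string t) : text(std::move(t)) {}
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Returns the number of destructor calls seen during the demo.
int demoLifetimes() {
    Tracked::destroyed = 0;
    {
        Tracked onStack("local variable");   // automatic storage
    }                                        // destructor runs here, no delete needed
    assert(Tracked::destroyed == 1);

    Tracked* onHeap = new Tracked("Not needed anymore");
    delete onHeap;                           // runs the destructor, then frees the memory
    return Tracked::destroyed;
}
```

So a QString that is a plain local variable or a class member needs neither an explicit destructor call nor a delete.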

JanD

Feb 20, 2011, 07:35 pm (last edit: Feb 20, 2011, 08:03 pm by JanD)
Thanks for the help, I will use that.

JanD

EDIT: I have a (half) working program available here. At the moment I'm waiting for someone to make a new post so I can see whether my program recognizes it. (C'mon, please post!  :) )

robtillaart

Rob Tillaart


JanD

Feb 20, 2011, 08:43 pm (last edit: Feb 20, 2011, 09:49 pm by JanD)
Thanks, but... Nothing happened  =(

So, next stop: debugging!

JanD

EDIT: I know what the problem is now. I never arrive in replyFinished (internet.cpp). I get right back to the while loop in main after calling fetch(). I wonder what the error could be?
EDIT2: I read the documentation a bit more, and it seems that replyFinished is a slot called by a signal. I will look into that tomorrow.
