Topic: Parsing PHP
Sweden
Full Member
***
Karma: 11
Posts: 237

Hello,

I'm thinking about making a project that takes data from the Arduino Forum (PHP), parses it in a C++ application on my computer, and then sends the relevant data to my Arduino. Basically, this would be used to notify me when I get new replies, but it could also be used to keep track of the total number of posts on the site, etc.

I have started looking at the main page of the Arduino forum. It contains markup like this:
Code:
<tr>
<td  class="windowbg-l" width="6%" align="center" valign="top"><a href="http://arduino.cc/forum/index.php?action=unread;board=5.0"><img src="http://arduino.cc/forum/Themes/arduinoWide/images/on.gif" alt="New Posts" title="New Posts" /></a>
</td>

<td class="windowbg2">
<a class="boardName" href="http://arduino.cc/forum/index.php/board,5.0.html" name="b5">General Electronics</a><br />
resistors, capacitors, breadboards, soldering, etc.
<br /><span class="smalltext">
<b>Last post:</b> <a href="http://arduino.cc/forum/index.php/topic,52723.msg372450/boardseen.html#new" title="Re: Read voltage across a resistor and use it as an input to an anlog pin.">Re: Read voltage across ...</a>
by <a href="http://arduino.cc/forum/index.php?action=profile;u=15067">swetrack</a>
on <b>Today</b> at 10:11:20 PM
</span>

</td>
<td class="windowbg" valign="middle" align="center" style="width: 12ex;">
<span class="largetext">1004</span><br /><span class="smalltext">Posts</span>
</td>
<td class="windowbg" valign="middle" align="center" style="width: 12ex;">
<span class="largetext">137</span><br /><span class="smalltext">Topics</span>
</td>

</tr>

There you can see the string "General Electronics" and the numbers 1004 (posts) and 137 (topics). To begin with, I want to place these values (one string, two ints) into a class (or struct) and then put that into a QList. Sending to the Arduino can come later.
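
The record described above could look roughly like the sketch below. `BoardInfo` and `addBoard` are hypothetical names, and `std::string` and `std::vector` stand in for `QString` and `QList` so that the snippet builds without Qt; with Qt, the same struct should fit into a `QList<BoardInfo>`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one forum board: one string and two ints.
struct BoardInfo {
    std::string name;  // e.g. "General Electronics"
    int posts;         // e.g. 1004
    int topics;        // e.g. 137
};

// Appends one board to the list (std::vector used in place of QList).
void addBoard(std::vector<BoardInfo>& boards, const std::string& name,
              int posts, int topics) {
    assert(posts >= 0 && topics >= 0);  // counts cannot be negative
    boards.push_back(BoardInfo{name, posts, topics});
}
```

Calling `addBoard(boards, "General Electronics", 1004, 137)` on an empty list would leave one entry holding the values from the HTML sample.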

So, how do I parse PHP in C++?


This is the first time I'm unsure about which forum to post this in, so if it's the wrong one, please delete or move it.

JanD
Logged

Seattle, WA USA
Brattain Member
*****
Karma: 601
Posts: 48543

Quote
So, how do I parse PHP in C++?
The first thing you should do is post in the appropriate forum.
The second thing you should do is learn to recognize what it is you want to parse. The stuff you showed is HTML, not PHP.

Parsing data requires that it be in memory. You need to describe, or, even better, post some code that shows, how that data is stored in memory.

Each mechanism for storing the data provides some means for reading bits and pieces of the data. For example, if the data is stored in a char array, and is properly NULL terminated, the strtok function can return tokens, with each token delimited by a different (or the same) delimiter.

While this is useful for extracting the individual tokens, it is not of that much help in parsing well defined structures like HTML.
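
As a small illustration of the strtok approach mentioned above, here is a minimal sketch. The helper name `tokenize` is made up for this example, and the input is copied into a private buffer because strtok modifies the char array it scans.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Splits a copy of `text` at any character listed in `delims`.
std::vector<std::string> tokenize(const std::string& text, const char* delims) {
    assert(delims != nullptr);
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');  // strtok needs a NUL-terminated char array

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok);  // copy the token out of the buffer
    }
    return tokens;
}
```

Splitting `<span class="largetext">1004</span>` on the delimiters `<>` gives the three tokens `span class="largetext"`, `1004` and `/span`. The number is there, but nothing in the token list says which tag it belonged to, which is why this is of limited help for HTML.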

Googling "Parsing HTML" should provide plenty of information on how to parse HTML. Every browser in the world is capable, so it can't be that difficult.
Logged

Sweden
Full Member
***
Karma: 11
Posts: 237

Quote
The first thing you should do is post in the appropriate forum.
Do you mean in a C++ forum? Or a different part of this forum?
Quote
The second thing you should do is learn to recognize what it is you want to parse. The stuff you showed is HTML, not PHP.
I thought it looked like HTML, but the main page of the Arduino forum is named arduino.cc/forum/index.php. That's why I think it's PHP.
Quote
Parsing data requires that it be in memory. You need to describe, or, even better, post some code, that shows how that data is stored in memory.
To begin with, just in a QString. Later it's going to be read from the internet in some way with either QtWebKit or QtNetwork (I have to look further into them, but I think I need QtWebKit) and then stored in a QString.
Quote
Each mechanism for storing the data provides some means for reading bits and pieces of the data. For example, if the data is stored in a char array, and is properly NULL terminated, the strtok function can return tokens, with each token delimited by a different (or the same) delimiter.
As you wrote in the next line, this wouldn't help too much. It could be useful, though, for extracting "http://arduino.cc/forum/index.php/board,5.0.html" from the HTML (or PHP) document, etc.
Quote
Googling "Parsing HTML" should provide plenty of information on how to parse HTML. Every browser in the world is capable, so it can't be that difficult.
That was my thought too. If HTML couldn't be parsed, I couldn't write this :)

I've been googling parsing, C++, Qt, etc. for some time now, and I will continue.

JanD
Logged

nr Bundaberg, Australia
Tesla Member
***
Karma: 126
Posts: 8471
Scattered showers my arse -- Noah, 2348BC.

Quote
I thought it looked like HTML, but the main page of the Arduino forum is named arduino.cc/forum/index.php. That's why I think it's PHP.
"PHP" pages look like HTML because all you see is the output of the PHP program, not the PHP code itself.

______
Rob
Logged

Rob Gray aka the GRAYnomad www.robgray.com

Global Moderator
Netherlands
Shannon Member
*****
Karma: 211
Posts: 13478
In theory there is no difference between theory and practice, however in practice there are many...

I think that writing a full HTML parser is overkill just to get those numbers. You know the board names and the order in which they appear, so a simple search for them would be sufficient.

To be more robust, search for class="boardName"; if found, read on until a > and then the board name can be read up to the next <.

Then, for each board name found, search for class="largetext"> and read the int that follows it (once for posts, once for topics).

In pseudocode:
Code:
while(! end of stream)
{
  if (skipUntil("boardName")  == true)
  {
    skipUntil(">");
    char* s = readUntil("<");
    LCD.print(s);

    skipUntil("largetext");
    skipUntil(">");
    s = readUntil("<");
    LCD.print(s);

    skipUntil("largetext");
    skipUntil(">");
    s = readUntil("<");
    LCD.print(s);
  }
}

Two quite similar functions could do the trick:
boolean skipUntil(string s);    // returns true if the string is found and false otherwise; if false, the stream has been read to its end
char * readUntil(string s, int size );   // returns all the chars from the stream until it encounters the string; the optional parameter size could take care of overflow conditions.
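
One possible way to fill in these two functions is sketched below in plain C++. It assumes the page arrives through a `std::istream` rather than an Arduino stream, returns `std::string` instead of `char *`, and picks 64 as an arbitrary default for the size limit. `firstBoardSummary` is a made-up helper that runs the pseudocode once.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Consumes characters until `needle` has just been read.
// Returns false if the stream ended first.
bool skipUntil(std::istream& in, const std::string& needle) {
    assert(!needle.empty());
    std::string window;  // the last needle.size() characters read
    char c;
    while (in.get(c)) {
        window.push_back(c);
        if (window.size() > needle.size()) window.erase(0, 1);
        if (window == needle) return true;
    }
    return false;
}

// Returns the characters read before `needle`; the needle is consumed
// but not returned. Stops early after `maxSize` characters have been
// read, and returns whatever was read if the stream ends first.
std::string readUntil(std::istream& in, const std::string& needle,
                      std::size_t maxSize = 64) {
    assert(!needle.empty());
    const std::size_t n = needle.size();
    std::string out;
    char c;
    while (in.get(c)) {
        out.push_back(c);
        if (out.size() >= n && out.compare(out.size() - n, n, needle) == 0) {
            out.erase(out.size() - n);  // drop the needle itself
            break;
        }
        if (out.size() >= maxSize) break;  // overflow guard
    }
    return out;
}

// Runs the pseudocode once: board name, then the two "largetext" numbers.
std::string firstBoardSummary(const std::string& html) {
    std::istringstream in(html);
    if (!skipUntil(in, "boardName")) return "";
    skipUntil(in, ">");
    std::string summary = readUntil(in, "<");
    for (int i = 0; i < 2; ++i) {
        skipUntil(in, "largetext");
        skipUntil(in, ">");
        summary += "|" + readUntil(in, "<");
    }
    return summary;
}
```

Because both functions only ever look at a window as long as the search string, the whole page never has to be held in memory at once.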

My 2 cents,


Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Sweden
Full Member
***
Karma: 11
Posts: 237

That's a good start, thanks. I guess I have to write skipUntil and readUntil by myself.

JanD
Logged

Global Moderator
Netherlands
Shannon Member
*****
Karma: 211
Posts: 13478

Quote
I guess I have to write skipUntil and readUntil by myself
You are a very good guesser ;)

But they are quite similar ... try them with paper and pencil first  ...
Logged

Rob Tillaart


Sweden
Full Member
***
Karma: 11
Posts: 237

Quote
I guess I have to write skipUntil and readUntil by myself
You are a very good guesser ;)

But they are quite similar ... try them with paper and pencil first  ...

No need to, it's already done:

Code:
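// Both functions assume that possition is an int member of GetData.

// Returns the text between the current position and the next occurrence
// of end, and moves the position to the start of that occurrence.
QString GetData::readUntil(QString end){
    int newPossition = stringFromForum.indexOf(end, possition, Qt::CaseSensitive);
    if(newPossition == -1) newPossition = stringFromForum.length(); // not found: read to the end
    QString toReturn = stringFromForum.mid(possition, newPossition - possition);
    possition = newPossition;
    return toReturn;
}

// Moves the position just past the next occurrence of end.
// Returns false and leaves the position unchanged if end is not found.
bool GetData::skipUntil(QString end){
    int place = stringFromForum.indexOf(end, possition, Qt::CaseSensitive);
    if(place == -1) return false;
    possition = place + end.length(); // skip past the match, not just up to it
    return true;
}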

I'm almost done with the part of the code that sends data to the Arduino, and I'm halfway done with the parsing code, so at the moment I only need the code for fetching things from the internet and the code that is going to run on the Arduino.

JanD
Logged

Global Moderator
Netherlands
Shannon Member
*****
Karma: 211
Posts: 13478

How much RAM do you have? The source of the forum homepage is on the order of 40 KB, which is big!

...
Logged

Rob Tillaart


Sweden
Full Member
***
Karma: 11
Posts: 237

I know, I know. I'm adding
Code:
QString::~QString();
wherever I can.

But I have a fairly new Win7 computer with enough RAM for most things. Later this is going to run on another computer that only does this.

JanD
Logged

Global Moderator
Netherlands
Shannon Member
*****
Karma: 211
Posts: 13478

I thought you were going to parse it on your Arduino 
Logged

Rob Tillaart


Seattle, WA USA
Brattain Member
*****
Karma: 601
Posts: 48543

Quote
I know, I know. I'm adding
Explicit calls to the destructor are not going to free memory. You need to delete instances of the class that were allocated with new.

Code:
QString *someJunk = new QString("Not needed anymore");
delete someJunk;
This will invoke the destructor, and free the memory used by someJunk.
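
To make the difference concrete, here is a small sketch with a made-up class `Tracked` that counts its live instances (it is not a Qt class). It shows that a local object is destroyed automatically at the end of its scope, while an object created with new lives until it is deleted. The same should hold for QString: a local QString needs neither an explicit destructor call nor delete.

```cpp
#include <cassert>

// Hypothetical helper class that counts how many instances are alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Returns the number of live instances after a local object went out of scope.
int aliveAfterScope() {
    {
        Tracked local;                 // stack object
        assert(Tracked::alive == 1);
    }                                  // destructor runs here automatically
    return Tracked::alive;
}

// Returns the number of live instances after new followed by delete.
int aliveAfterDelete() {
    Tracked* heap = new Tracked;       // heap object: must be deleted by hand
    delete heap;                       // runs the destructor and frees the memory
    return Tracked::alive;
}
```

Both functions return 0, so in both cases every object has been cleaned up.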
Logged

Sweden
Full Member
***
Karma: 11
Posts: 237

Thanks for the help, I will use that.

JanD

EDIT: I have a (half) working program available here. At the moment I'm waiting for someone to post a new post so I can see if my program recognizes it. (C'mon, please post! :) )
« Last Edit: February 20, 2011, 02:03:11 pm by JanD » Logged

Global Moderator
Netherlands
Shannon Member
*****
Karma: 211
Posts: 13478

Post for Jan :)
Logged

Rob Tillaart


Sweden
Full Member
***
Karma: 11
Posts: 237

Thanks, but... nothing happened :'(

So, next stop: debugging!

JanD

EDIT: I know what the problem is now. I never arrive in replyFinished (internet.cpp). I get right back to the while loop in main after calling fetch(). I wonder what the error could be?
EDIT2: I read the documentation a bit more, and it seems that replyFinished is a slot called by a signal. I will look into that tomorrow.
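
A likely explanation, offered as an assumption since internet.cpp is not shown: in Qt, the signal that ends an asynchronous network request is delivered by the event loop, so a while loop in main that never hands control back to the event loop can keep the slot from ever being called. The sketch below is a toy analogy in plain C++, not Qt's API; `EventLoop`, `post`, `processEvents` and this `fetch` are all made-up names.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Toy stand-in for an event loop (NOT Qt code).
class EventLoop {
public:
    // Queues a callback, like a signal waiting to be delivered.
    void post(std::function<void()> callback) {
        assert(callback);  // must be callable
        pending_.push(std::move(callback));
    }

    // Delivers all queued callbacks and returns how many ran.
    int processEvents() {
        int delivered = 0;
        while (!pending_.empty()) {
            std::function<void()> cb = std::move(pending_.front());
            pending_.pop();
            cb();
            ++delivered;
        }
        return delivered;
    }

private:
    std::queue<std::function<void()>> pending_;
};

// Hypothetical fetch(): starts a "request" and returns immediately.
// The reply handler only runs once the loop processes its events.
void fetch(EventLoop& loop, std::string& result) {
    loop.post([&result] { result = "reply"; });
}
```

Right after `fetch()` returns, `result` is still empty; it is only filled in once `processEvents()` runs. That matches the behavior described above, where control returns to main straight away and replyFinished is never reached.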
« Last Edit: February 20, 2011, 03:49:09 pm by JanD » Logged
