I'm thinking about making a project that takes data from the Arduino Forum (PHP), parses it in a C++ application on my computer and then sends the relevant data to my Arduino. Basically, it should notify me when I get new replies, but it could also be used to keep track of the total number of posts on the site, etc.
I have started looking at the "main" page of the Arduino forum. It contains blocks like this:
<tr>
<td class="windowbg-l" width="6%" align="center" valign="top"><a href="http://arduino.cc/forum/index.php?action=unread;board=5.0"><img src="http://arduino.cc/forum/Themes/arduinoWide/images/on.gif" alt="New Posts" title="New Posts" /></a>
</td>
<td class="windowbg2">
<a class="boardName" href="http://arduino.cc/forum/index.php/board,5.0.html" name="b5">General Electronics</a>
resistors, capacitors, breadboards, soldering, etc.
<span class="smalltext">
<b>Last post:</b> <a href="http://arduino.cc/forum/index.php/topic,52723.msg372450/boardseen.html#new" title="Re: Read voltage across a resistor and use it as an input to an anlog pin.">Re: Read voltage across ...</a>
by <a href="http://arduino.cc/forum/index.php?action=profile;u=15067">swetrack</a>
on <b>Today</b> at 10:11:20 PM
</span>
</td>
<td class="windowbg" valign="middle" align="center" style="width: 12ex;">
<span class="largetext">1004</span>
<span class="smalltext">Posts</span>
</td>
<td class="windowbg" valign="middle" align="center" style="width: 12ex;">
<span class="largetext">137</span>
<span class="smalltext">Topics</span>
</td>
</tr>
There you can see the string "General Electronics" and the numbers 1004 (posts) and 137 (topics). As a start, I want to put these values (one string, two ints) into a class (or struct) and then put that into a QList. Sending the data to the Arduino can come later.
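A minimal sketch of such a struct, assuming plain standard C++ (std::string and std::vector stand in for QString and QList here; with Qt you would use QList<BoardInfo> the same way, and QString::toInt() instead of std::stoi). The names BoardInfo and makeBoardInfo are hypothetical.

```cpp
#include <string>
#include <vector>

// One board row from the forum index (hypothetical type name).
struct BoardInfo {
    std::string name;  // e.g. "General Electronics"
    int posts = 0;     // the first "largetext" number
    int topics = 0;    // the second "largetext" number
};

// Builds a BoardInfo from the raw text pieces found in the HTML.
// std::stoi throws std::invalid_argument if the text is not a number.
BoardInfo makeBoardInfo(const std::string& name,
                        const std::string& posts,
                        const std::string& topics) {
    BoardInfo info;
    info.name = name;
    info.posts = std::stoi(posts);
    info.topics = std::stoi(topics);
    return info;
}
```

Usage: `std::vector<BoardInfo> boards; boards.push_back(makeBoardInfo("General Electronics", "1004", "137"));`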
So, how do I parse PHP in C++?
This is the first time I'm unsure which forum to post this in, so if it's the wrong one, please delete or move it.
The first thing you should do is post in the appropriate forum.
The second thing you should do is learn to recognize what it is you want to parse. The stuff you showed is HTML, not PHP.
Parsing data requires that it be in memory. You need to describe, or, even better, post some code that shows how that data is stored in memory.
Each mechanism for storing the data provides some means for reading bits and pieces of the data. For example, if the data is stored in a char array, and is properly NULL terminated, the strtok function can return tokens, with each token delimited by a different (or the same) delimiter.
While this is useful for extracting the individual tokens, it is not much help in parsing well-defined structures like HTML.
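To illustrate the strtok point, here is a small sketch (the helper name tokenize is hypothetical) that splits a null-terminated char array on '<' and '>'. It does produce the tokens, but you still have to know which token holds the number you want, which is why it doesn't help much with HTML structure.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Splits a null-terminated char array into tokens using strtok.
// Note: strtok writes '\0' into the buffer and keeps hidden state between
// calls, so the buffer must be writable and the function is not reentrant.
std::vector<std::string> tokenize(char* buffer, const char* delimiters) {
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer, delimiters); tok != nullptr;
         tok = std::strtok(nullptr, delimiters)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

For `<span class="largetext">1004</span>` with delimiters "<>" this yields three tokens: `span class="largetext"`, `1004` and `/span`.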
Googling "Parsing HTML" should provide plenty of information on how to parse HTML. Every browser in the world is capable, so it can't be that difficult.
The first thing you should do is post in the appropriate forum.
Do you mean in a C++ forum? Or a different part of this forum?
The second thing you should do is learn to recognize what it is you want to parse. The stuff you showed is HTML, not PHP.
I thought it looked like HTML, but the main page of the Arduino forum is at arduino.cc/forum/index.php. That's why I thought it was PHP.
Parsing data requires that it be in memory. You need to describe, or, even better, post some code that shows how that data is stored in memory.
For now, just in a QString. Later it will be read from the internet in some way, with either QtWebKit or QtNetwork (I have to look further into them, but I think I need QtWebKit), and then stored in a QString.
Each mechanism for storing the data provides some means for reading bits and pieces of the data. For example, if the data is stored in a char array, and is properly NULL terminated, the strtok function can return tokens, with each token delimited by a different (or the same) delimiter.
Googling "Parsing HTML" should provide plenty of information on how to parse HTML. Every browser in the world is capable, so it can't be that difficult.
That was my thought too. If HTML couldn't be parsed, I couldn't be writing this.
I've been googling for parsing, C++, Qt etc. for some time now, and I will continue.
I think writing a full HTML parser is overkill just to get those numbers.
You know the board names and the order in which they appear, so a simple search for them would be sufficient.
To be more robust, though, search for class="boardName"; if it is found, skip ahead to the next >, and the board name can then be read up to the next <.
Then, for each board name found, search for class="largetext"> and read the integer after it.
In pseudocode:
while(! end of stream)
{
if (skipUntil("boardName") == true)
{
skipUntil(">");
char* s = readUntil("<");
LCD.print(s);
skipUntil("largetext");
skipUntil(">");
s = readUntil("<");
LCD.print(s);
skipUntil("largetext");
skipUntil(">");
s = readUntil("<");
LCD.print(s);
}
}
Two quite similar functions could do the trick:
boolean skipUntil(string s); // returns true if s is found, false otherwise; if false, the end of the stream has been reached
char * readUntil(string s, int size); // returns all the chars from the stream until it encounters s; an optional size parameter could take care of overflow conditions
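A runnable sketch of those two functions and the pseudocode loop, assuming the page is already in memory as a std::string (class TextStream and parseBoards are hypothetical names). It returns std::string instead of char* so nobody has to manage a buffer, and maxLen plays the role of the optional size parameter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Forward-only reader over text already in memory (hypothetical helper).
class TextStream {
public:
    explicit TextStream(const std::string& text) : text_(text), pos_(0) {}

    bool atEnd() const { return pos_ >= text_.size(); }

    // Moves past the next occurrence of s. Returns false (and moves to the
    // end) if s does not occur again.
    bool skipUntil(const std::string& s) {
        std::size_t found = text_.find(s, pos_);
        if (found == std::string::npos) {
            pos_ = text_.size();
            return false;
        }
        pos_ = found + s.size();
        return true;
    }

    // Returns the text up to (not including) the next occurrence of s and
    // moves past s. At most maxLen characters are returned.
    std::string readUntil(const std::string& s, std::size_t maxLen = 256) {
        std::size_t found = text_.find(s, pos_);
        std::size_t end = (found == std::string::npos) ? text_.size() : found;
        std::string result = text_.substr(pos_, end - pos_);
        if (result.size() > maxLen) result.resize(maxLen);
        pos_ = (found == std::string::npos) ? text_.size() : found + s.size();
        return result;
    }

private:
    std::string text_;
    std::size_t pos_;
};

struct BoardInfo {
    std::string name;
    int posts = 0;
    int topics = 0;
};

// Same walk as the pseudocode: board name, then the first "largetext"
// number (posts), then the second one (topics).
// std::stoi throws if a number field turns out to be empty or non-numeric.
std::vector<BoardInfo> parseBoards(const std::string& html) {
    std::vector<BoardInfo> boards;
    TextStream in(html);
    while (!in.atEnd()) {
        if (!in.skipUntil("class=\"boardName\"")) break;
        BoardInfo b;
        in.skipUntil(">");
        b.name = in.readUntil("<");
        in.skipUntil("class=\"largetext\"");
        in.skipUntil(">");
        b.posts = std::stoi(in.readUntil("<"));
        in.skipUntil("class=\"largetext\"");
        in.skipUntil(">");
        b.topics = std::stoi(in.readUntil("<"));
        boards.push_back(b);
    }
    return boards;
}
```

Run on the table row shown at the top of the thread, this should give one entry: "General Electronics", 1004 posts, 137 topics. On the Arduino side you would do the same with char buffers, but the search logic stays the same.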
I'm almost done with the part of the code that sends data to the Arduino, and I'm halfway done with the parsing code, so at the moment I only need the code for getting data from the internet and the code that is going to run on the Arduino.
I know, I know. I'm adding QString::~QString(); wherever I can.
But I have a pretty new Win7 computer with enough RAM for most things. Later this is going to run on another computer that does only this.
EDIT: I have a (half-)working program available here. At the moment I'm waiting for someone to post something new so I can see if my program recognizes it. (C'mon, please post!)
EDIT: I know where the problem is now. I never arrive in replyFinished (internet.cpp); I go right back to the while loop in main after calling fetch(). I wonder what the error could be.
EDIT2: I read the documentation a bit more, and it seems that replyFinished is a slot called by a signal. I will look into that tomorrow.
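One likely explanation, assuming main() spins in its own while loop and never enters the Qt event loop: in Qt the reply is reported through a signal such as QNetworkAccessManager::finished(QNetworkReply*), which you connect to your replyFinished slot, and network signals are only delivered while the event loop runs (QCoreApplication::exec() or processEvents()). The sketch below is a toy model in plain C++, not Qt code; ToyEventLoop and this fetch() are hypothetical stand-ins that show why the callback never fires until someone processes the queued events.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Toy event loop (NOT Qt): events are queued and only run when
// processEvents() is called, similar in spirit to what exec() does.
class ToyEventLoop {
public:
    void post(std::function<void()> event) { queue_.push_back(std::move(event)); }

    // Runs every pending event; returns how many were delivered.
    int processEvents() {
        int delivered = 0;
        while (!queue_.empty()) {
            std::function<void()> ev = std::move(queue_.front());
            queue_.pop_front();
            ev();
            ++delivered;
        }
        return delivered;
    }

private:
    std::deque<std::function<void()>> queue_;
};

// Hypothetical stand-in for fetch(): it only *requests* the page and
// returns immediately; the "reply" arrives later as a queued event.
void fetch(ToyEventLoop& loop,
           std::function<void(const std::string&)> replyFinished) {
    loop.post([replyFinished] { replyFinished("<html>...</html>"); });
}
```

In this model, calling fetch() and then looping without calling processEvents() leaves replyFinished unrun forever, which matches the symptom described above. In a real Qt program the usual fix is to let QCoreApplication::exec() drive the program and react to the signal, instead of polling in a while loop.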