Searching through your own post history and backing up your posts

Like most active members of this forum, I often come across questions that have been asked many, many times and that I have already answered in the past.
In such cases, it would be useful to be able to search through your own posts, and link to the answer in question.

Sadly, the advanced search option of the forum does not work, and most topics on the site are not well indexed by Google. That makes it really hard to find the posts you're looking for, even if you know exact keywords or phrases to search for.

That inspired me to create a simple Python script that reads all of your posts into a database. Everything is indexed, and it distinguishes between normal text, code and quotes.
By using SQL, you can search for specific keywords, exclude keywords, search for code only, only search for posts within a given time frame, only search a given board, etc.
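To give an idea of what such queries look like, here is a minimal sketch using Python's built-in sqlite3 module. The `posts` table, its columns (`board`, `posted`, `body`, `code`) and the `search_posts` helper are hypothetical, not the project's actual schema (which targets MySQL), and it uses plain `LIKE` matching instead of a full-text index to keep it short.

```python
import sqlite3

def search_posts(conn, include, exclude=None, code_only=False,
                 since=None, until=None, board=None):
    """Return the ids of matching posts, oldest first (hypothetical schema)."""
    # Search either the prose or the extracted code. Only these two fixed
    # column names are ever interpolated, so building the SQL this way is safe.
    column = "code" if code_only else "body"
    clauses = [f"{column} LIKE ?"]
    params = [f"%{include}%"]
    if exclude:
        clauses.append(f"{column} NOT LIKE ?")
        params.append(f"%{exclude}%")
    if since:  # ISO dates (YYYY-MM-DD) compare correctly as strings
        clauses.append("posted >= ?")
        params.append(since)
    if until:
        clauses.append("posted < ?")
        params.append(until)
    if board:
        clauses.append("board = ?")
        params.append(board)
    sql = "SELECT id FROM posts WHERE " + " AND ".join(clauses) + " ORDER BY posted"
    return [row[0] for row in conn.execute(sql, params)]


# Tiny in-memory example database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, board TEXT, "
             "posted TEXT, body TEXT, code TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?, ?)", [
    (1, "Programming Questions", "2019-03-01",
     "Use millis() instead of delay()", "unsigned long t = millis();"),
    (2, "Programming Questions", "2019-06-15",
     "Avoid delay() in loop()", "delay(1000);"),
    (3, "General Electronics", "2019-07-02", "Add a pull-up resistor", ""),
])
print(search_posts(conn, "delay", exclude="millis"))  # [2]
```

Each filter simply adds one `AND` clause, which is why combining keywords, exclusions, dates and boards stays easy in SQL.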

I also created a simple web interface that allows you to easily search the database.

I thought that there might be other forum enthusiasts who would be interested in this project, so I shared all the code on GitHub (Python scripts, database models and SQL scripts, advanced example SQL queries, PHP scripts and HTML/CSS files).

https://github.com/tttapa/Arduino-Forum-Search

Enjoy!

Pieter

Interesting idea - thanks for sharing.

The first time you use it does it trawl through the entire Forum website to find every Reply that you posted?

Does it then copy the text of each Reply to a database on your own laptop?

And is all the subsequent searching done on your PC without actually needing access to the internet?

...R

PS ... maybe this Thread would be more appropriate in the Website and Forum section? If you think so then click Report to Moderator and ask for it to be moved.

I've been semi-manually scraping my posts from the forums for a while now. This sounds much nicer, but setting up a database server and web server to do it is a lot more effort than I want to go to :frowning:
(Even with 10k+ posts in 2000+ separate html files, something like grep takes mere seconds to go through it all, so I'm not sure a "real" database is worth it. Maybe I'll see if I can get the "scraping" more separated from the db...)
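For comparison, the flat-file approach needs no database at all. A minimal Python equivalent of grep over a folder of saved pages might look like this; the folder layout (one `.html` file per saved page) and the `grep_posts` name are assumptions for illustration.

```python
import re
from pathlib import Path

def grep_posts(folder, pattern):
    """Yield (file name, line number, line) for every matching line.

    ASSUMPTION: posts are saved as *.html files directly inside `folder`.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for path in sorted(Path(folder).glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path.name, lineno, line.strip()


for name, lineno, line in grep_posts("saved_posts", r"millis\(\)"):
    print(f"{name}:{lineno}: {line}")
```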

I bookmark any Thread that I start. And I also bookmark the occasional Post (by me or A N Other) that I think may be useful later. Between the two they represent a very tiny percentage of all the drivel that I post. And I rarely feel the need to find other stuff I have written.

...R

Robin2:
The first time you use it does it trawl through the entire Forum website to find every Reply that you posted?

It uses the "Show posts" section of your profile: https://forum.arduino.cc/index.php?action=profile;area=showposts;
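Roughly, the scraper only has to walk through the paginated list of your posts. A minimal sketch of generating the page URLs is below; the `;start=N` offset parameter and the 15-posts-per-page default are assumptions about the forum's pagination (check them against the real page), and `show_posts_pages` is a hypothetical helper. Each URL would then be downloaded (for example with `urllib.request.urlopen`) and parsed.

```python
def show_posts_pages(base_url, total_posts, per_page=15):
    """Return the URL of every page of the 'Show posts' list.

    ASSUMPTIONS: the forum paginates this list with a ';start=N' offset
    and shows `per_page` posts per page; verify both on the real site.
    """
    sep = "" if base_url.endswith(";") else ";"
    return [f"{base_url}{sep}start={offset}"
            for offset in range(0, total_posts, per_page)]


pages = show_posts_pages(
    "https://forum.arduino.cc/index.php?action=profile;area=showposts;", 40)
for url in pages:
    print(url)  # each page would then be downloaded and parsed
```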

Robin2:
Does it then copy the text of each Reply to a database on your own laptop?

Yes, it splits up the post into actual post text, code snippets and quotes. It indexes these three fields for really fast text search, and it also saves the post's full HTML for nice formatting in the search interface.
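The splitting step can be sketched with Python's standard `html.parser` module. This assumes code sits inside `<pre>`/`<code>` elements and quotes inside `<blockquote>`; the real forum markup may use different tags or CSS classes, and `PostSplitter`/`split_post` are hypothetical names. Text inside a quote is filed under quotes even if it is code, which is just one possible design choice.

```python
from html.parser import HTMLParser

class PostSplitter(HTMLParser):
    """Sort the text of one post into plain text, code and quotes.

    ASSUMPTION: code is wrapped in <pre>/<code>, quotes in <blockquote>.
    """
    CODE_TAGS = {"pre", "code"}
    QUOTE_TAGS = {"blockquote"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &lt; into <
        self.code_depth = 0
        self.quote_depth = 0
        self.parts = {"text": [], "code": [], "quotes": []}

    def handle_starttag(self, tag, attrs):
        if tag in self.CODE_TAGS:
            self.code_depth += 1
        elif tag in self.QUOTE_TAGS:
            self.quote_depth += 1

    def handle_endtag(self, tag):
        if tag in self.CODE_TAGS and self.code_depth:
            self.code_depth -= 1
        elif tag in self.QUOTE_TAGS and self.quote_depth:
            self.quote_depth -= 1

    def handle_data(self, data):
        if not data.strip():
            return
        # Quotes win over code, code wins over plain text.
        if self.quote_depth:
            key = "quotes"
        elif self.code_depth:
            key = "code"
        else:
            key = "text"
        self.parts[key].append(data.strip())


def split_post(html):
    splitter = PostSplitter()
    splitter.feed(html)
    splitter.close()
    return {key: "\n".join(chunks) for key, chunks in splitter.parts.items()}
```

Each of the three resulting strings can then go into its own indexed column.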
It also saves some metadata (title, date, username, board ID and title, topic ID, message ID). The topic and message IDs are used to link you back to the forum once you've found the post you were looking for.
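A sketch of that metadata as a Python dataclass, with a helper that turns the two IDs back into a link. The `PostMeta` name and field names are hypothetical, and the link format (`index.php?topic=<topic>.msg<message>#msg<message>`) is an assumption based on the usual SMF permalink style, not something taken from the project.

```python
from dataclasses import dataclass

@dataclass
class PostMeta:
    """Metadata stored next to each post (field names are hypothetical)."""
    title: str
    date: str
    username: str
    board_id: int
    board_title: str
    topic_id: int
    message_id: int

    def url(self, base="https://forum.arduino.cc/index.php"):
        # ASSUMPTION: SMF-style permalink topic=<topic>.msg<msg>#msg<msg>
        return (f"{base}?topic={self.topic_id}.msg{self.message_id}"
                f"#msg{self.message_id}")


meta = PostMeta("Blink without delay", "2019-03-01", "some_user",
                4, "Programming Questions", 123, 456)
print(meta.url())  # https://forum.arduino.cc/index.php?topic=123.msg456#msg456
```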

Robin2:
And is all the subsequent searching done on your PC without actually needing access to the internet?

Exactly. Everything happens locally, no internet required.

Robin2:
PS ... maybe this Thread would be more appropriate in the Website and Forum section? If you think so then click Report to Moderator and ask for it to be moved.

You're probably right. I posted it here because it gets way more traffic than the Website and Forum section; I suspect that many people just don't check that section.

Robin2:
I bookmark any Thread that I start. And I also bookmark the occasional Post (by me or A N Other) that I think may be useful later.

I have the bad habit of never cleaning up my bookmarks, and bookmarking many, many pages (from sites other than the forum as well). I was in need of something a bit more structured :slight_smile:

westfw:
I've been semi-manually scraping my posts from the forums for a while now. This sounds much nicer, but setting up a database server and web server to do it is a lot more effort than I want to go to :frowning:
(Even with 10k+ posts in 2000+ separate html files, something like grep takes mere seconds to go through it all, so I'm not sure a "real" database is worth it. Maybe I'll see if I can get the "scraping" more separated from the db...)

Currently, I just have everything running in an Ubuntu Server VM in VirtualBox. The Ubuntu installer installs all the necessary software for you (Apache, MySQL, PHP, Python), so all you have to do is install some Python modules (through pip or conda), run the provided SQL file to set up the database (mysql -u root -p < Database.sql), and optionally move the PHP files into your hosting folder. All that's left then is to forward port 80 from the VM to the host machine (in the VirtualBox settings).
It's a matter of minutes if you're familiar with those kinds of things, especially if you already have a server VM you can just clone, but I understand that it's not exactly user-friendly.

I should probably look into SQLite for the database instead of MySQL: it doesn't require a DBMS server and saves everything into a single file. Like MySQL, it also supports full-text indices.
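A minimal sketch of what that could look like with Python's built-in `sqlite3` module and SQLite's FTS5 full-text extension (included in most SQLite builds, but not guaranteed). The table layout is an assumption modelled on the three fields described earlier in the thread, not the project's schema, and `fts_search` is a hypothetical helper. FTS5's own query syntax covers exclusion (`NOT`) and restricting a search to one column (`code : term`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file such as "posts.db" in practice
# One FTS5 virtual table indexes the three searchable fields; message_id is
# stored but not indexed (UNINDEXED) so results can link back to the forum.
conn.execute("""
    CREATE VIRTUAL TABLE posts_fts USING fts5(
        body, code, quotes, message_id UNINDEXED
    )
""")
conn.executemany(
    "INSERT INTO posts_fts (body, code, quotes, message_id) VALUES (?, ?, ?, ?)",
    [
        ("Use millis() instead of delay()", "unsigned long t = millis();", "", 101),
        ("Avoid delay() in loop()", "delay(1000);", "Why is my sketch slow?", 102),
        ("Add a pull-up resistor", "", "", 103),
    ],
)

def fts_search(query):
    """Return message ids for an FTS5 query string, best match first."""
    rows = conn.execute(
        "SELECT message_id FROM posts_fts WHERE posts_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


print(fts_search("delay NOT millis"))  # posts mentioning delay but not millis
print(fts_search("code : millis"))     # only look inside code snippets
```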

I like SQLite myself - it's a lot less trouble for a simple project and there is a nice program called SQLiteman that allows you to view and edit any database (great for when you screw things up).

Also, making a backup of the database just requires making a copy of one file.
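Copying the file is fine as long as nothing is writing to the database at that moment. If the scraper might be running, Python's `sqlite3` module also offers `Connection.backup()` (Python 3.7+), which makes a consistent copy of a live database. A minimal sketch; `backup_database` and the file names are placeholders.

```python
import sqlite3

def backup_database(src_path, dest_path):
    """Copy the SQLite database at src_path to dest_path.

    Uses SQLite's online backup API, so the copy is consistent even if
    another connection is using the source database.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        with dest:
            src.backup(dest)
    finally:
        dest.close()
        src.close()


backup_database("posts.db", "posts-backup.db")
```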

...R

westfw:
Maybe I'll see if I can get the "scraping" more separated from the db...

+1

Having a text file of all my posts, or of a range of them, would be great. Setting up a web server and database is way overkill for finding a snippet of a conversation or a code suggestion. I agree that the forum's Google search is... lacking.