I have a project where I am parsing HTML/XML, and displaying some strings. Sometimes, the HTML has characters encoded as HTML entities, and I'd like to either strip them out, or better yet, transform the easy ones to their ASCII equivalent (where there is one).
This isn't rocket science, and I can certainly write the code to do it, but I dislike reinventing wheels. Can anyone point to a handy-dandy wheel I might use?
Hi Nick, thanks for the suggestion. I saw your notice about your regexp the other day. Not being much of a regexp jockey (whenever I use one, I have to look up the docs...), it didn't occur to me as a possible solution, but it seems like it should work.
That said, I think the code you've posted is for HTML tags. Entities are those things like:
&lt;
-or-
&#367;
-or-
&#x12A5;
Which should be simple enough to write a regexp for. Thanks for the idea.
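For what it's worth, here is a rough sketch of what such a regexp might look like, written in Python only because the thread doesn't name a language. The names `ENTITY_RE` and `decode_entities` are made up for illustration. It assumes well-formed entities that end in a semicolon, covers the named, decimal, and hex forms, and leaves anything it doesn't recognize untouched.

```python
import re
from html.entities import name2codepoint

# One alternative per entity form:
#   group 1: hex digits of a &#x...; reference
#   group 2: decimal digits of a &#...; reference
#   group 3: the name of a named entity such as lt or amp
ENTITY_RE = re.compile(
    r"&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));"
)


def decode_entities(text):
    """Replace entities in text with the characters they stand for."""

    def replace(match):
        hex_digits, dec_digits, name = match.groups()
        try:
            if hex_digits:
                return chr(int(hex_digits, 16))
            if dec_digits:
                return chr(int(dec_digits))
            return chr(name2codepoint[name])
        except (KeyError, ValueError, OverflowError):
            # Unknown name or out-of-range code point: keep the original text.
            return match.group(0)

    return ENTITY_RE.sub(replace, text)
```

If Python happens to be an option, the standard library's `html.unescape` is the ready-made wheel and handles more edge cases than this sketch, such as entities with a missing semicolon.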