I wrote a simple HTML parser that worked with an older version of the Arduino IDE, but with my current version (1.6.3) the IDE's character set seems to have changed.
I wrote the following code to show you what I mean:
If the source files are saved as UTF-8, string literals containing characters with codes above 127 will break. UTF-8 stores each such character as two or more bytes, but whatever is listening on the serial port probably interprets each byte as a separate character (ASCII or a single-byte extension of it), so every such character is displayed as several other characters.
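To make the byte counts concrete, here is a minimal desktop C++ sketch (not Arduino code). It writes `ä` (U+00E4) as explicit byte escapes, so the result does not depend on how the file itself is saved, and shows that UTF-8 needs two bytes where ISO-8859-1 (Latin-1) needs one. The constant names and the `utf8ToLatin1` helper are hypothetical, and converting to Latin-1 only helps if the receiving terminal actually decodes bytes as Latin-1, which is an assumption here.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// "ä" (U+00E4) as a UTF-8 encoded source file stores it: two bytes, 0xC3 0xA4.
// Explicit escapes keep this independent of the file's own encoding.
const char kUtf8AUmlaut[] = "\xC3\xA4";

// The same character in ISO-8859-1 (Latin-1) is the single byte 0xE4.
const char kLatin1AUmlaut[] = "\xE4";

// Print each byte in hex, i.e. what a byte-per-character receiver gets.
void dumpBytes(const char* label, const char* s) {
    std::printf("%s: %zu byte(s):", label, std::strlen(s));
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(s); *p; ++p) {
        std::printf(" %02X", static_cast<unsigned>(*p));
    }
    std::printf("\n");
}

// Hypothetical helper: convert UTF-8 text to Latin-1.
// U+0080..U+00FF are encoded as lead byte 0xC2 or 0xC3 plus one continuation
// byte (10xxxxxx); every other non-ASCII sequence becomes a single '?'.
std::string utf8ToLatin1(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);  // plain ASCII passes through unchanged
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size() &&
                   (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80) {
            unsigned char next = static_cast<unsigned char>(in[i + 1]);
            // 110000xx 10yyyyyy -> xxyyyyyy
            out += static_cast<char>(((c & 0x03) << 6) | (next & 0x3F));
            ++i;  // consumed the continuation byte
        } else {
            out += '?';  // not representable in Latin-1 (or malformed)
            // Skip the continuation bytes that belong to this sequence.
            while (i + 1 < in.size() &&
                   (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return out;
}
```

For example, `dumpBytes("UTF-8", kUtf8AUmlaut)` prints `UTF-8: 2 byte(s): C3 A4`, which a terminal treating each byte as one character shows as two characters. The AVR Arduino core does not ship the C++ standard library, so on the board itself the helper would need porting to plain `char` buffers.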