Want to receive ASCII, not UTF-8, from browser (Arduino + ESP8266)

I am using the following HTML code for an Arduino Due and ESP8266 based web server:

<form action="192.168.4.1" method="GET" accept-charset="ascii">
.
.
.
<input type="url" name="FEEDURL1">
<input type="submit" value="Submit">
.
.
.

The result received from the client's browser and printed over serial is, for example, "FEEDURL1=http%3A%2F%2Fwww.tagesschau.de%2Fxml%2Frss2".
So it seems to be UTF-8, although I specified accept-charset="ascii"!
Is there a way to make the browser send ASCII, or do I have to parse the received data?
Thanks in advance for your answers!

What are you expecting to get? It's not that difficult to change %3A to :, to change %2F to /, etc.

So seems it is UTF-8 although I stated “accept-charset=“ascii””!

That’s not UTF-8. It’s called “URL encoding”.

http://www.w3schools.com/tags/ref_urlencode.asp

Like PaulS says, if you hit a %, take the next two (hex) characters, then convert that group (of 3) into one ASCII byte. (eg. %20 is space, etc.).
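As a rough illustration of that rule, the sketch below converts one three-character group such as %3A into its byte. The name `decodeTriple` is made up for this example, and it relies on `strtol` from the standard C library rather than on anything Arduino-specific.

```cpp
#include <cctype>
#include <cstdlib>

// Hypothetical helper: turns a "%XX" group into the byte it stands for.
// Returns -1 if p does not point at '%' followed by two hex digits.
int decodeTriple(const char *p) {
  if (p[0] != '%' ||
      !std::isxdigit(static_cast<unsigned char>(p[1])) ||
      !std::isxdigit(static_cast<unsigned char>(p[2]))) {
    return -1;
  }
  char hex[3] = { p[1], p[2], '\0' };                    // copy the two hex digits
  return static_cast<int>(std::strtol(hex, nullptr, 16)); // base 16, accepts A-F and a-f
}
```

With this, `decodeTriple("%3A")` gives ':' and `decodeTriple("%20")` gives a space, while a malformed group such as "%G1" is reported as -1 instead of being guessed at.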

...good to know. At the moment I am using

string->replace("%3A", ":");
string->replace("%2F", "/");

, but now I know that there could be other codes, too, so I will convert the hex-value.

THANKS!

Absolutely, unless you want to write 256 "replace" function calls.

Although if you look here: http://en.wikipedia.org/wiki/Percent-encoding

... there might only be 18 of them. Still, converting the hex code is the simple way to go.

Feel free to use or to provide a better solution:

String hex2ascii(String string)
{
  string.replace("%21", "!");
  string.replace("%23", "#");
  string.replace("%24", "$");
  string.replace("%25", "%");
  string.replace("%26", "&");
  string.replace("%27", "'");
  string.replace("%28", "(");
  string.replace("%29", ")");
  string.replace("%2A", "*");
  string.replace("%2B", "+");
  string.replace("%2C", ",");
  string.replace("%2F", "/");
  string.replace("%3A", ":");
  string.replace("%3B", ";");
  string.replace("%3D", "=");
  string.replace("%3F", "?");
  string.replace("%40", "@");
  string.replace("%5B", "[");
  string.replace("%5D", "]");

  string.replace("%20", " ");
  string.replace("%22", """");
  string.replace("%2D", "-");
  string.replace("%2E", ".");
  string.replace("%3C", "<");
  string.replace("%3E", ">");
  string.replace("%5C", "\\");
  string.replace("%5E", "^");
  string.replace("%5F", "_");
  string.replace("%60", "`");
  string.replace("%7B", "{");
  string.replace("%7C", "|");
  string.replace("%7D", "}");
  string.replace("%7E", "~");
  string.replace("%C2%A3", "£");
  
  return string;
}

Regards

I found a few issues with your function:

  • This will give the wrong output for something like “%2526” (it should be “%26”, but it incorrectly gives “&”).
  • “%22” is replaced with nothing.
  • None of the characters 01 through 1F are handled.
  • It cannot handle lower-case digits a-f.
  • It creates and destroys a lot of String objects (very wasteful).
  • “%C2%A3” is UTF-8, not ASCII.

I would write a function that follows PaulS’s and nickgammon’s advice from 4 years ago:

nickgammon:
Like PaulS says, if you hit a %, take the next two (hex) characters, then convert that group (of 3) into one ASCII byte. (eg. %20 is space, etc.).

Edit: I was wrong on this point:

  • It creates and destroys a lot of String objects (very wasteful).

There’s only one String being modified in-place, not a lot of String objects created and destroyed.
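For comparison, here is a minimal single-pass sketch of that advice. It uses std::string as a stand-in for the Arduino String class so that it can be compiled on a PC, and the names `urlDecode` and `hexDigit` are invented for this example. Because every input character is looked at exactly once, "%2526" comes out as "%26" rather than "&", and lower-case hex digits work too.

```cpp
#include <cctype>
#include <string>

// Value of one hex digit; the caller has already checked isxdigit().
static int hexDigit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Hypothetical single-pass decoder (std::string stands in for String).
std::string urlDecode(const std::string &in) {
  std::string out;
  out.reserve(in.size());                 // decoded text is never longer
  for (std::size_t i = 0; i < in.size(); ++i) {
    char c = in[i];
    if (c == '+') {
      out += ' ';                         // forms encode a space as '+'
    } else if (c == '%' && i + 2 < in.size() &&
               std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
               std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
      int value = hexDigit(in[i + 1]) * 16 + hexDigit(in[i + 2]);
      out += static_cast<char>(value);    // one byte replaces three characters
      i += 2;                             // skip the two hex digits
    } else {
      out += c;                           // plain character, or a malformed '%'
    }
  }
  return out;
}
```

Copying a malformed '%' through unchanged is only one possible choice; rejecting the whole input would be just as defensible.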

That is not UTF-8. It is www-form-encoding (application/x-www-form-urlencoded), which is what HTML forms send by default, and always for method="GET". It's part of the definition of HTML.

www-form-encoding is described in "Forms in HTML documents". That document isn't just some web article about the encoding; it is the source document that specifies the encoding. Like the datasheet of an IC, it's fully authoritative.
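To see where the string in the original post comes from, here is a rough sketch of the encoding side. It is not copied from the specification; it assumes the behaviour that the received data suggests: letters, digits and a few punctuation marks (assumed here to be `*`, `-`, `.` and `_`) pass through, a space becomes `+`, and every other byte becomes `%` followed by two hex digits. `formEncode` is a hypothetical name, and std::string again stands in for the Arduino String class.

```cpp
#include <string>

// Hypothetical encoder showing how a browser could arrive at the received text.
std::string formEncode(const std::string &in) {
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : in) {
    bool keep = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
                (c >= 'a' && c <= 'z') ||
                c == '*' || c == '-' || c == '.' || c == '_'; // assumed safe set
    if (keep) {
      out += static_cast<char>(c);   // passes through unchanged
    } else if (c == ' ') {
      out += '+';                    // space becomes '+'
    } else {
      out += '%';                    // everything else becomes %HH
      out += hex[c >> 4];
      out += hex[c & 0x0F];
    }
  }
  return out;
}
```

Feeding it "http://www.tagesschau.de/xml/rss2" reproduces the value from the first post. It also shows where "%C2%A3" comes from: in UTF-8 the pound sign is the two bytes C2 A3, and each byte is percent-encoded separately, so the character set only decides which bytes get encoded, not whether percent encoding happens.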

You are just going to have to decode the data, and that’s all there is to it I’m afraid. Happily, because percent encoding always expands a single character into 3, it can easily be decoded in-place with a pair of pointers.

To decode the www-form-encoding in place:

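// Convert one hex digit to its value; any other character counts as 0.
// Defined first so that no forward declaration is needed.
static inline int hexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return 0;
}

// Decode www-form-encoded text in place.
// s is the read pointer, d the write pointer; d never overtakes s.
void decode_in_place(char *s) {
  char *d = s;

  while (*s) {
    switch (*s) {
    case '+': // '+' stands for a space
      *d++ = ' '; s++; break;
    case '%': // "%XX" stands for the byte with hex value XX
      s++; if (!*s) break; // handle malformed input: string ends after '%'
      *d = hexValue(*s++) << 4;
      if (!*s) break; // handle malformed input: only one hex digit
      *d |= hexValue(*s++);
      d++;
      break;
    default: // everything else is copied unchanged
      *d++ = *s++; break;
    }
  }

  *d = '\0'; // always add the terminator
}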