I made an array with font-definitions for a LED-matrix.
That works fine and all letters and numbers show well.
BUT: in some other languages they want to use characters with a higher ASCII value, like the ü, which has a value of 129.
Strangely enough, when I ask for the ASCII value of 'ü' it returns 135 instead.
And when I ask for the values of all characters between 128 and 150, it returns all kinds of values, but most of them are 195...
This is done on an ESP32-C3. Example code:
scrollText = "ÇüéâäàåçêëèïîìÄÅÉæÆôöòû"; // all characters from 128-150
for(int i=0; i<23;i++){Serial.print(scrollText[i],DEC);Serial.print(" ");Serial.write(scrollText[i]);}
this is the output:
195 �135 �195 �188 �195 �169 �195 �162 �195 �164 �195 �160 �195 �165 �195 �167 �195 �170 �195 �171 �195 �168 �195 �
ü does have the value 129 in the old DOS code page 437, but modern compilers use Unicode, where its code point is 252. Furthermore, strings are encoded as UTF-8, which is what the Serial Monitor expects.
If you check strlen(scrollText), it's longer than 23.
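A quick way to see the difference, as a rough sketch: strlen counts bytes, while the number of characters can be found by skipping the UTF-8 continuation bytes (the ones of the form 10xxxxxx). The helper name utf8Length is made up for this example, and it assumes the string is valid UTF-8.

```cpp
#include <cstddef>

// Hypothetical helper: counts the characters (code points) in a valid
// UTF-8 string by counting every byte that is NOT a continuation byte.
size_t utf8Length(const char *s) {
  size_t count = 0;
  for (; *s != '\0'; ++s) {
    // Continuation bytes look like 10xxxxxx, i.e. (byte & 0xC0) == 0x80.
    if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}
```

For the string in the question, every character takes two bytes, so strlen should give 46 while utf8Length gives 23.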
The first byte has the 110xxxyy pattern, indicating a two-byte encoding. Do the bit math on 195 (binary 11000011), and xxx is 000 and yy is 11: 192 + 3 = 195. That means that each of these code points is at least 192, since the two high bits of yyyy are 11. 252 >= 192.
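The second/last byte pattern is 10yyzzzz. Its low six bits are what is left of the code point after taking off the 192 carried by the first byte, and the leading 10 adds 128:
252 - 192 = 60
60 + 128 = 188
which is the fourth decimal value printed: the second byte for the second character, ü.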
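The same arithmetic can be written with shifts and masks. This is only a sketch for code points in the two-byte range (0x80 to 0x7FF); the function names leadByte and trailByte are invented for the example.

```cpp
#include <cstdint>

// Hypothetical helpers for the two-byte UTF-8 form only
// (code points 0x80..0x7FF); other ranges are not handled here.

// First byte: 110xxxyy, i.e. 0xC0 plus the top five bits of the code point.
uint8_t leadByte(uint16_t codePoint) {
  return static_cast<uint8_t>(0xC0 | (codePoint >> 6));
}

// Second byte: 10yyzzzz, i.e. 0x80 plus the low six bits of the code point.
uint8_t trailByte(uint16_t codePoint) {
  return static_cast<uint8_t>(0x80 | (codePoint & 0x3F));
}
```

With these, leadByte(252) is 195 and trailByte(252) is 188 for ü, and Ç (code point 199) gives 195 and 135, which matches the first four numbers in the output above.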
So to do this correctly, you need to decode UTF-8, and have your font table be indexed by Unicode, or its single-byte subset ISO-8859-1, if it has all the characters you want to support.
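As a minimal sketch of that idea, the decoder below handles only 1-byte (ASCII) and 2-byte sequences, which is enough for ISO-8859-1. The names nextCodePoint and fontIndex are made up, and the font table is assumed to start at space (32) and run up to 255; adjust to your own table. It does not reject overlong encodings, and the caller is expected to stop at the terminating zero.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical decoder: returns the code point starting at s[pos] and
// advances pos past it. Anything other than a 1- or 2-byte sequence
// yields '?' and consumes a single byte.
uint16_t nextCodePoint(const char *s, size_t &pos) {
  uint8_t b0 = static_cast<uint8_t>(s[pos++]);
  if (b0 < 0x80) return b0;                      // plain ASCII
  if ((b0 & 0xE0) == 0xC0) {                     // 110xxxyy lead byte
    uint8_t b1 = static_cast<uint8_t>(s[pos]);
    if ((b1 & 0xC0) == 0x80) {                   // 10yyzzzz continuation
      ++pos;
      return static_cast<uint16_t>(((b0 & 0x1F) << 6) | (b1 & 0x3F));
    }
  }
  return '?';                                    // unsupported or malformed
}

// Hypothetical mapping to a font table ASSUMED to cover 32..255.
int fontIndex(uint16_t codePoint) {
  if (codePoint < 32 || codePoint > 255) codePoint = '?';
  return codePoint - 32;
}
```

In a scroll loop you would call nextCodePoint while s[pos] is not zero and pass each result to fontIndex; ü then comes out as code point 252, font index 220.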
The 3-byte codes cover a larger range of characters than the 2-byte codes. The length depends only on the code point value: up to U+007F takes 1 byte, up to U+07FF takes 2 bytes, up to U+FFFF takes 3 bytes, and everything above that takes 4. The shorter codes tend to be the more commonly used characters, since those ranges were allocated first. Some obscure emoji is likely to be a 4-byte code.
Think of the terminal as dumb. There is no control capability. It uses the 7-bit ASCII character set. If you want more than that, you can use a terminal emulator on your PC.
It only appears to support the lower 7 bits and a few control characters in my IDE. No cursor control or anything like that. Mainly tab, CR, LF. What is the trick to turn on the rest of it?
See UTF-8 on Wikipedia: variable-length encoding, 1-4 bytes per code point. ASCII values are encoded as ASCII using 1 byte.
How do I enable the rest of it?
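int acar = int(scrollText[letter] - 0x20);                  // font table index; the table starts at space (0x20)
if (acar == 163) acar = int(scrollText[++letter] - 0x20);   // 163 = 195 - 32: UTF-8 lead byte, so use the byte after it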
Simply said: read the character at index "letter" from the string, and if it is 195, increment "letter" and read the next character instead.
(As my font table starts with space = 32, I subtract 32 = 0x20 from the value, so 195 becomes 163.)
Look up Extended ASCII, the chars from 128 to 255 (0x80 to 0xFF), as shown in the backs or fronts of programming books since the late 70's. All single 8-bit chars.
They don't print without a FONT that goes that far and they do if it does.
They are great if you want to make linework boxes with text, I used them to make a POV 25x25 maze game.
I think that the 2 byte chars came along with LIMMS.
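If the font table is laid out in the code page 437 order from the question (128 = Ç up to 150 = û), a small lookup can translate a decoded Unicode code point back to that 8-bit value. This is only a sketch: the names kCp437High and toCp437 are invented, and it covers just those 23 characters, not the rest of the code page.

```cpp
#include <cstdint>

// Unicode code points of the 23 characters that code page 437 places at
// positions 128..150, in that order:
// Ç ü é â ä à å ç ê ë è ï î ì Ä Å É æ Æ ô ö ò û
static const uint8_t kCp437High[23] = {
  0xC7, 0xFC, 0xE9, 0xE2, 0xE4, 0xE0, 0xE5, 0xE7,
  0xEA, 0xEB, 0xE8, 0xEF, 0xEE, 0xEC, 0xC4, 0xC5,
  0xC9, 0xE6, 0xC6, 0xF4, 0xF6, 0xF2, 0xFB
};

// Hypothetical helper: returns the code page 437 value for a code point,
// or '?' when the character is not in the small table above.
uint8_t toCp437(uint16_t codePoint) {
  if (codePoint < 128) return static_cast<uint8_t>(codePoint);  // ASCII is shared
  for (uint8_t i = 0; i < 23; ++i) {
    if (kCp437High[i] == codePoint) return static_cast<uint8_t>(128 + i);
  }
  return '?';
}
```

So toCp437(252) gives 129 for ü, the value the original font table expected.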
Since 0x20 is just the space character, perhaps ' ' is a better representation.
What is 'magical' about using hex?
Easily explained.
Arthur C. Clarke famously stated that "any sufficiently advanced technology is indistinguishable from magic".
0x20 is sufficiently advanced to appear magical then.
I have an attitude after being crapped on for using pre-calculated data in my on-the-fly word match algorithm by someone who played superiority at (not on) me about my "spaghetti" code who was unable to figure out how it works if his failed attempt to clean it up is any indication. What can I say? He left the bits out that make it work for wrong words so it passed a test that fed only correct words in to get worst-case match speed and when I pointed that out, did not return with corrected code after claiming he could make it smaller and faster with his superior yadda-yadda. What a shame, he made it a few bytes smaller by making it fail!
Magic numbers... when I read that, it triggers me. People who can't read hex have never cracked hex dumps.