Figuring out the mysterious hidden character in serial

Hi All,

I'm reading data coming from an HM-10 Bluetooth LE module via software serial. Everything works OK, and a typical scan data result (i.e. message coming in from the HM-10 over the software serial) would be:

OK+DISISOK+DISC:4C000215:FDA50693A4E24FB1AFCFC6EB07647825:0005000664:E6F4357FEB3E:-091OK+DISC:4C000215:FDA50693A4E24FB1AFCFC6EB07647825:2714D4D0C0:A0E6F869A527:-082OK+DISC:00000000:00000000000000000000000000000000:0000000000:7A2BC9341733:-081OK+DISC:4C000215:00001803494C4F474943544543480000:0001000200:A0E6F8474ED0:-095OK+DISCE

One long string of data with no line breaks or nothin'. You will need to scroll the above window to the right to see it all.

The sketch is very basic, but for the record, here it is:

#include <SoftwareSerial.h>

SoftwareSerial HM10(2, 3); //HM10(Receive Pin, Transmit Pin)

void setup()
{
  Serial.begin(57600);  // Begin the Serial Monitor connection at 57600bps
  HM10.begin(57600);  // Begin the HM-10 connection at 57600bps
}

void loop()
{
  if (HM10.available()) // Read from HM-10 and send to Serial Monitor
    Serial.write(HM10.read());
  
  if (Serial.available()) // Read from Serial Monitor and send to HM-10
    HM10.write(Serial.read());
}

Now here's the weird part: When I cut and pasted this data from the serial monitor window into TextEdit on my iMac, it suddenly looked like this:

OK+DISISOK+DISC:4C000215:FDA50693A4E24FB1AFCFC6EB07647825:0005000664:E6F4357FEB3E:-091OK+DISC:
4C000215:FDA50693A4E24FB1AFCFC6EB07647825:2714D4D0C0:A0E6F869A527:-082OK+DISC:
00000000:00000000000000000000000000000000:0000000000:7A2BC9341733:-081OK+DISC:
4C000215:00001803494C4F474943544543480000:0001000200:A0E6F8474ED0:-095OK+DISCE

Nothing special about the first "DISC:", but every subsequent "DISC:" appears to have a hidden "new line" or maybe "carriage return" or something after it. It's not visible, but you can tell it's there from how the text appears in the TextEdit window. So my question is: What is this mysterious hidden character? I would really like to know since I want to use it as the terminating character when reading data into a char array.

I took the same text and pasted it into BBEdit (with Show Invisibles turned on) to see if the hidden character would show up there, but nothing.

Any ideas what it might be or how I can figure it out?

Thanks!

Update:
Just realized that there is a hidden mystery character after the first DISC: also. If I narrow the window of TextEdit, the text suddenly looks like this:

OK+DISISOK+DISC:
4C000215:FDA50693A4E24FB1AFCFC6EB07647825:0005000664:E6F4357FEB3E:-091OK+DISC:
4C000215:FDA50693A4E24FB1AFCFC6EB07647825:2714D4D0C0:A0E6F869A527:-082OK+DISC:
00000000:00000000000000000000000000000000:0000000000:7A2BC9341733:-081OK+DISC:
4C000215:00001803494C4F474943544543480000:0001000200:A0E6F8474ED0:-095OK+DISCE

Try this:

  if (HM10.available()) { // Read from HM-10 and send to Serial Monitor
    char c = HM10.read();
    if (c < 0x20) {  // If it's a control character
      Serial.write('\\');  // Put in a marker...
      Serial.write(c+ 0x20);  // ... and shift it up to the printable characters
      Serial.write('\\');  // Put in another marker.

      Serial.println();  // New Line
    } else
      Serial.write(c);
  }

You'll get a character between backslashes at the end of the line if the character is not printable.

johnwasser:
Try this:
...

You'll get a character between backslashes at the end of the line if the character is not printable.

Nope, nothing appeared. Message back from HM-10 is still one long line:

OK+DISISOK+DISC:4C000215:FDA50693A4E24FB1AFCFC6EB07647825:0005000664:E6F4357FEB3E:-091OK+DISC:4C000215:00001803494C4F474943544543480000:0001000200:A0E6F8474ED0:-094OK+DISC:4C000215:FDA50693A4E24FB1AFCFC6EB07647825:2714D4D0C0:A0E6F869A527:-076OK+DISC:4C000215:FDA50693A4E24FB1AFCFC6EB07647825:0003900039:C0F91C1771A8:-062OK+DISCE

But when pasted into TextEdit, it still appears as if it were formatted with something after each "DISC:".

My guess is that the "mysterious character" is just TextEdit deciding that a colon is a good place to break a line that is too long to fit in the window. Try making the window even narrower to see if TextEdit decides to break at other colons.

johnwasser:
My guess is that the "mysterious character" is just TextEdit deciding that a colon is a good place to break a line that is too long to fit in the window. Try making the window even narrower to see if TextEdit decides to break at other colons.

Yeah, that was my guess too, but when I resize the window to different widths, it always seems to give special treatment to "DISC:". Examples:

[Attached screenshots of the TextEdit window at different widths: HM-10 data1.png, HM-10 Data2.png, HM-10 Data3.png]

If someone has a Mac with TextEdit, try to copy one of my lines of seemingly unformatted text and paste it into TextEdit. I think you will also find that it is behaving mysteriously, as if there is some hidden character after "DISC:". Let me know what you find.

OK+DISISOK+DISC:4C000215:FDA50693A4E24FB1AFCFC6EB07647825:0005000664:E6F4357FEB3E:-091OK+DISC:4C000215:FDA50693A4E24FB1AFCFC6EB07647825:2714D4D0C0:A0E6F869A527:-082OK+DISC:00000000:00000000000000000000000000000000:0000000000:7A2BC9341733:-081OK+DISC:4C000215:00001803494C4F474943544543480000:0001000200:A0E6F8474ED0:-095OK+DISCE

Another failed attempt to understand: I downloaded a Hex Editor and opened up the TextEdit file to see if I could find anything. But there was nothing mysterious between the first "DISC:" and the "4C000215" following it.

The text showed as:
DISC:4C000215

And the hex showed as:
444953433A3443303030323135

The 3A hex corresponds to a regular colon mark (":"). Then the next hex, 34, corresponds to the 4 at the beginning of 4C000215.
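For anyone who wants to repeat that check without reading the hex by eye, here is a minimal sketch in plain desktop C++ (not Arduino code). The names hexDigitValue and decodeHexDump are made up for this illustration. It turns a hex dump back into text and shows any non-printable byte as <XX>, so a stray CR or LF would stand out as <0D> or <0A>.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if it is not a hex digit.
static int hexDigitValue(char ch) {
  if (ch >= '0' && ch <= '9') return ch - '0';
  if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
  if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
  return -1;
}

// Hypothetical helper: decode a hex dump two digits at a time.
// Printable ASCII (0x20..0x7E) is copied through unchanged; any other
// byte is written as <XX>. A pair that is not valid hex becomes <??>.
// A trailing single digit (odd-length input) is ignored.
std::string decodeHexDump(const std::string& hex) {
  static const char kDigits[] = "0123456789ABCDEF";
  std::string out;
  for (std::size_t i = 0; i + 1 < hex.size(); i += 2) {
    int hi = hexDigitValue(hex[i]);
    int lo = hexDigitValue(hex[i + 1]);
    if (hi < 0 || lo < 0) {
      out += "<??>";
      continue;
    }
    int value = hi * 16 + lo;
    if (value >= 0x20 && value <= 0x7E) {
      out += static_cast<char>(value);
    } else {
      out += '<';
      out += kDigits[hi];
      out += kDigits[lo];
      out += '>';
    }
  }
  return out;
}
```

Calling decodeHexDump("444953433A3443303030323135") gives back DISC:4C000215 with nothing in angle brackets, which matches what the hex editor showed: the colon is followed directly by the 4.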

I guess the only conclusion is that Apple's TextEdit can read my mind and knows how I want my data chopped up. :confused:

After playing with this string, it appears that TextEdit prefers to break lines after the ':' of a Letter Colon Digit sequence. It doesn't break at Digit Colon Letter, Letter Colon Letter, or Digit Colon Digit. This means that there is no "hidden" character in your string, just a pattern that TextEdit prefers for inserting line breaks when soft wrapping text.

AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:AAAAAAAAAAA1:BBBBBBBBB:2CCCCCCCCCCC:DDDDDDDDDDDD4:4EEEEEEEEEEE:
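To make the observed pattern concrete, here is a rough sketch in plain desktop C++ that lists where a Letter Colon Digit sequence occurs in a string. This is only a model of what was seen in the experiment above, not TextEdit's actual wrapping algorithm, and the function name letterColonDigitBreaks is invented for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the index just past every ':' that has a
// letter immediately before it and a digit immediately after it.
// These are the spots where a soft line break was observed.
std::vector<std::size_t> letterColonDigitBreaks(const std::string& text) {
  std::vector<std::size_t> breaks;
  for (std::size_t i = 1; i + 1 < text.size(); ++i) {
    unsigned char before = static_cast<unsigned char>(text[i - 1]);
    unsigned char after = static_cast<unsigned char>(text[i + 1]);
    if (text[i] == ':' && std::isalpha(before) && std::isdigit(after)) {
      breaks.push_back(i + 1);  // a break would fall right after the colon
    }
  }
  return breaks;
}
```

On the HM-10 output this picks out only the colon in "DISC:4C..." and "DISC:00...", because every other colon there has a digit before it or a letter or minus sign after it.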

The OP could easily check for "funny" characters by iterating over the string and printing each character and its ascii value.

...R
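A minimal sketch of that idea in plain desktop C++, assuming the captured text is available as a string. The helper name dumpBytes is made up for this example. It writes one line per byte with the index, the hex value, and the character itself, using '.' for anything outside printable ASCII.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: build a dump with one line per byte:
// index, hex value, and the character ('.' if it is not printable ASCII).
std::string dumpBytes(const std::string& text) {
  std::string out;
  char line[32];
  for (std::size_t i = 0; i < text.size(); ++i) {
    unsigned char value = static_cast<unsigned char>(text[i]);
    char shown = (value >= 0x20 && value <= 0x7E) ? static_cast<char>(value) : '.';
    std::snprintf(line, sizeof line, "%3u 0x%02X %c\n",
                  static_cast<unsigned>(i), static_cast<unsigned>(value), shown);
    out += line;
  }
  return out;
}
```

Any byte below 0x20 would show up with its value next to a '.', so a hidden CR (0x0D) or LF (0x0A) could not go unnoticed. On the Arduino side the same loop could presumably print each byte with Serial.print(value, HEX) instead of building a string.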

johnwasser:
After playing with this string, it appears that TextEdit prefers to break lines after the ':' of a Letter Colon Digit sequence. It doesn't break at Digit Colon Letter, Letter Colon Letter, or Digit Colon Digit. This means that there is no "hidden" character in your string, just a pattern that TextEdit prefers for inserting line breaks when soft wrapping text.

That's it; You're a genius!

I played around with this question last night until around midnight...googling every possible explanation...I even typed in several strings by hand to see if it had something to do with pasting it over from the serial window. No difference.

I guess I already knew there were no hidden characters after looking at the hex of the file contents. But it was still kinda driving me crazy.

I then downloaded about 5 different text editors to see if they would do the same thing. The first four editors did not insert virtual line breaks the way TextEdit does. Finally, the fifth editor I tried, un-creatively named Tex-Edit, also added virtual line breaks...but it added them BEFORE the "DISC:" instead of after! Actually, nicer looking, but pretty clear indirect evidence that there are/were no hidden characters.

Thanks for working on this puzzle with me. Next we should tackle peace in the Middle East.

Zimbu:
Next we should tackle peace in the Middle East.

I would, but I wouldn't want to put Jared Kushner out of a job (even though he has so many). Besides, he looks much better than I would touring a war zone in a flak jacket.

johnwasser:
I would, but I wouldn't want to put Jared Kushner out of a job (even though he has so many). Besides, he looks much better than I would touring a war zone in a flak jacket.

Plus, I'm already too tired from all this "winning"...