Printing Strings drops some characters

I’ve got this code to receive some UDP data from TouchOSC on an iPad:

    // read the packet into packetBuffer and get the sender's IP address and port number
    Udp.readPacket(packetBuffer,UDP_TX_PACKET_MAX_SIZE, remoteIp, remotePort);

    //I'm unsure why I get different results from code below:

    // Example 1 failing
    Serial.println(packetBuffer);

    // Produces : /1/fader1

    // Whereas

    // Example 2 works
    for (int_index_of_string = 0; int_index_of_string < sizeof(packetBuffer)-1; int_index_of_string++)
    {
        Serial.print(char(packetBuffer[int_index_of_string]));     
    }

    // Produces : /1/fader1000,f00?XouA"0

I’ve substituted the missing data with characters that you can see in a webpage without having to use escape characters.
Why does the first lot of code drop them? I’ve spent some time chasing a way to discover if they were there.

The dropped characters are important as they are the Data of an OSC packet received on the UDP interface.
The size of the array is correct as the second code example proves.
It seems that some code within Serial.print is scanning the array and looking for Null so that it can end the print process early.

Is this the only byte value scanned for, for the early termination of the print process?

It dies on the first 0, that should tell you the problem I think.

You must have binary data (including 0s) in packetBuffer. println will terminate on the first 0 because it prints a string, and by definition a 0 byte marks the end of a string.
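To make that concrete, here is a minimal sketch in plain C++ (run on a PC rather than as an Arduino sketch). The sample bytes are modelled on the output shown in the question, and the names `kSample`, `visibleAsCString` and `actualSize` are made up for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sample data modelled on the question's output:
// "/1/fader1", NUL padding, then the ",f" type tag and more padding.
static const char kSample[] = {'/', '1', '/', 'f', 'a', 'd', 'e', 'r', '1',
                               '\0', '\0', '\0', ',', 'f', '\0', '\0'};

// What any C-string routine sees: everything up to the first zero byte.
std::size_t visibleAsCString(const char* data) {
    return std::strlen(data);
}

// What is really in the buffer. This has to be tracked separately,
// because the bytes themselves do not record it.
std::size_t actualSize() {
    return sizeof(kSample);
}
```

Here `visibleAsCString(kSample)` gives 9 while `actualSize()` gives 16, which matches the symptom: the string-based print stops after `/1/fader1`.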


Rob

This is one of the nastinesses of the C programming language on which the Arduino system is based I'm afraid - strings and arrays carry no length information at runtime.

Also printing a binary buffer as a string is fraught anyway since many binary char values are not printable characters. You should print it in hexadecimal perhaps?

This is one of the nastinesses of the C programming language

I think that that is a pretty harsh way of describing a design decision that was made when the C language was defined. It is simply a reality, not a “nastiness” designed into the language for the sole purpose of annoying you.

OP: The comments about binary data in the packets are true, because OSC packets are binary packets. That some of the bytes in the packet are human-readable is just lucky for you. You should be using the OSC library to print OSC packet data. You need to know something about the packets being sent in order to know how to extract the int, char, and float data embedded in the packets.

Well I note languages based on string processing don't do things this way, and 0 is a valid ASCII and Unicode code point, so C technically can't actually handle ASCII or Unicode strings - that's why they are usually called "C-style strings". Also a very large number of bugs and remote code execution vulnerabilities are due to this 'decision', so it never impressed me.

As soon as a C program tries to read in string-like data from files or databases or network packets, this problem rears its head and bites the unwary.

This is one of the nastinesses of the C programming language on which the Arduino system is based I'm afraid - strings and arrays carry no length information at runtime.

I understand that string length information is not available at runtime, but doesn't sizeof(array1) work with arrays at run time?

Lefty

sizeof only reports the compile-time size of the char array, so it always reports, say, 32 if you declare char msg[32]; regardless of the content of msg.

Unfortunately there is no intrinsic run-time method of finding a string length except scanning for the null terminator.
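A small sketch of the difference, in plain C++. The function names are invented for the illustration; `sizeof` and `strlen` are standard.

```cpp
#include <cstddef>
#include <cstring>

// sizeof is evaluated by the compiler: it reports the declared capacity.
std::size_t declaredCapacity() {
    char msg[32] = "hi";
    return sizeof(msg);       // 32, whatever msg currently holds
}

// strlen runs at run time: it counts bytes until the first '\0'.
std::size_t currentLength() {
    char msg[32] = "hi";
    return std::strlen(msg);  // 2
}

// Once an array is passed to a function it decays to a pointer, so sizeof
// only sees the pointer itself, not the array it points into.
std::size_t sizeSeenInsideFunction(const char* msg) {
    return sizeof(msg);
}
```

Neither number says how many bytes of a received packet are valid. That count has to come from whatever filled the buffer.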


Rob

Unfortunately there is no intrinsic run-time method of finding a string length except scanning for the null terminator.

Which is, of course, what the strlen() function does.

also strlen_P for PROGMEM string length :)

Since your data can apparently have null bytes, you'll need some insider info as to the length of the data - perhaps the data has a length byte, or the standard you are using specifies the length? Does Udp.readPacket() provide any information on the read size?

Given the size information, you'll have to use example 2 to output, since example 1 will always choke on null bytes in the string.
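As a rough sketch of the length-driven approach, in plain C++ so it can be tried on a PC: the helper name `printableView` is hypothetical, and it assumes the caller already knows how many bytes were received. In a sketch, the same loop could send each byte to the serial port instead of appending it to a string.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Build a printable view of `length` bytes, replacing anything that is not a
// printable ASCII character (including NUL) with '.'.
std::string printableView(const char* data, std::size_t length) {
    std::string out;
    out.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        // Go through unsigned char so bytes above 0x7F are safe for isprint.
        unsigned char byte = static_cast<unsigned char>(data[i]);
        out += std::isprint(byte) ? static_cast<char>(byte) : '.';
    }
    return out;
}
```

The loop is bounded by the length passed in, never by a search for a terminator, so embedded zero bytes are shown rather than ending the output.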

PaulS:
OP: The comments about binary data in the packets are true, because OSC packets are binary packets.

Yeah I know, this is why I’m trying to pull them apart as I work on the protocol. I’ve got no issues with pulling apart the protocol; it’s just wrangling with the language that I’m struggling with (first time programming).

I get this output:

/1/push4

2F:31:2F:70:75:73:68:34:0:0:0:0:2C:66:0:0:3F:FFFFFF80:0:0:FFFFFFC0:FFFFFFA8:0:

From this code

    Serial.println(packetBuffer);

    for (int_index_of_string = 0; int_index_of_string < sizeof(packetBuffer)-1; int_index_of_string++)
    {
      Serial.print(packetBuffer[int_index_of_string],HEX);
      Serial.print(":"); // delimit each byte with a colon for standard Hex notation eg MAC addresses.
    }

I thought, great, I’ll just look at the hex value of each byte. But wait a minute, what are the long hex values such as FFFFFF80 (4 bytes), and where are they coming from? I thought my string was an array of bytes, but it looks like a string is an array of strings. What’s going on?

Yeah I know, this is why I'm trying to pull them apart as I work on the protocol.

Treat the packets as the black boxes they are supposed to be. Use the various OSC library functions to get the nth integer or nth float from the packet. Do NOT try to parse (or print) the packet itself. You have no idea how the packet was constructed, nor do you need to know how the packet was actually constructed.

PaulS: Treat the packets as the black boxes they are supposed to be. Use the various OSC library functions to get the nth integer or nth float from the packet. Do NOT try to parse (or print) the packet itself. You have no idea how the packet was constructed, nor do you need to know how the packet was actually constructed.

Apologies if I've taken this the wrong way, but your post has come across as a very arrogant response. OSC is an OPEN protocol, so I have access to the protocol description. I need to know so I can learn, which is why I came to this forum. With the exception of your post telling me what I need, I've learnt from the discussion.

If you're reacting to something I've posted then I'm sorry if I've offended you, it was completely unintentional.

If this is not the way to reverse engineer a protocol then I'd be happy to accept pointers on where to go.

I didn't mean to be offensive, but there is a big difference between open and transparent.

By telling you what type of message is being sent, the sending application is defining the protocol for the message - the order that the ints and floats are arranged in the message.

If your sending application does not define the order for the various message types, then that application is not being open.

If you know the order of the ints and floats in the packet, for a given packet type, then you can extract the ints and floats from the arrangement of bytes in the packet - either the hard way or the easy way.

However, it appeared to me that you were complaining that you couldn't just print the packet and see something like: 1/push4/14,67.8,121

Beyond the message number and message type, the rest of the data is binary data. You need to know the type of data that the bytes represent so you extract the correct number of bytes and reassemble them in the correct order to recreate the value that the sender embedded in the packet.

If you don't know the type(s) of the value(s) embedded in the packet, you won't be successful at extracting the data, and the open source nature of the messaging system is not being adhered to.
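For the "hard way", here is a minimal sketch in plain C++ that handles only one message shape: an address, the type tag string ",f", and a single float. It relies on two rules as I understand them from the OSC 1.0 specification: strings are NUL-terminated and padded to a multiple of 4 bytes, and float32 arguments are big-endian. The function names are made up, and a real parser (or an OSC library) would cover far more cases.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns the offset of the first byte after the padded OSC string that
// starts at `start`, or 0 if no terminator is found inside the packet.
std::size_t skipOscString(const unsigned char* packet, std::size_t size,
                          std::size_t start) {
    std::size_t i = start;
    while (i < size && packet[i] != 0) ++i;
    if (i >= size) return 0;
    // Step past the NUL and round up to the next 4-byte boundary.
    return (i + 4) & ~static_cast<std::size_t>(3);
}

// Reads a message of the form <address> ",f" <float32>. Returns true on success.
bool readSingleFloat(const unsigned char* packet, std::size_t size, float* value) {
    std::size_t tags = skipOscString(packet, size, 0);
    if (tags == 0 || tags + 3 > size) return false;
    if (packet[tags] != ',' || packet[tags + 1] != 'f' || packet[tags + 2] != 0) {
        return false;  // not exactly one float argument
    }
    std::size_t args = skipOscString(packet, size, tags);
    if (args == 0 || args + 4 > size) return false;
    // Assemble the big-endian bytes explicitly, independent of host byte order.
    std::uint32_t bits = (std::uint32_t(packet[args]) << 24) |
                         (std::uint32_t(packet[args + 1]) << 16) |
                         (std::uint32_t(packet[args + 2]) << 8) |
                         std::uint32_t(packet[args + 3]);
    static_assert(sizeof(float) == sizeof(bits), "assumes a 32-bit IEEE 754 float");
    std::memcpy(value, &bits, sizeof(bits));
    return true;
}
```

Fed the first 20 bytes of the hex dump posted earlier (2F 31 2F 70 75 73 68 34, four zeros, 2C 66, two zeros, 3F 80 00 00), this yields the address "/1/push4", the tag ",f" and the value 1.0.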

MarkT: ... so C technically can't actually handle ASCII or Unicode strings - that's why they are usually called "C-style strings".

C can, but the string libraries which expect null-terminated strings obviously cannot handle imbedded nulls, by a design decision. There have historically been lots of ways of storing string lengths, all with advantages and disadvantages, some I recall are:

  • Fixed length (eg. Cobol) with padding of spaces to the right
  • Some kind of trailing delimiter (eg. 0x00) - delimiters could be one or more bytes
  • A leading length byte (eg. Pascal)
  • A leading length word (which can handle string lengths up to 65535)
  • A separate string length (eg. a long) stored not adjacent to the string
  • Leading and trailing delimiters (like C source, eg. "swordfish")

The various methods have design trade-offs relating to storage space, speed of access, speed of determining the string length, maximum string size that can be handled, ease of handling imbedded delimiters, etc. For example, the "source-code" strings can handle imbedded delimiters by preceding them by a backslash.

Unicode is a whole new ball game, and of course Unicode can be managed in C (but not necessarily with the C string libraries like strlen).

To be fair to the C string library, it handles "printable strings" OK. And the code point 0 is not printable.

Quite an interesting discussion here:

http://en.wikipedia.org/wiki/String_%28computer_science%29

As for the original question, you can't print what amounts to "binary" data using the C string libraries. The approach used in earlier posts of iterating through the string, byte by byte, and printing each one individually until you reach the "length" (however you know that) is one way of doing it.

Without knowing more about the OSC protocol I can't say more, but perhaps the data can be split apart at the 0x00 bytes? In which case they have a useful purpose.

Thanks, I do appreciate the time people take to respond. I just struggle with the logic of stepping through the bytes of a string but getting 32-bit data returned.

My bad for not explaining what I'm doing clearly. I really am just playing with the language. I've done some programming (for fun, about 15 years ago; it was low-level stuff where you knew exactly what was going on, not as simple to use but inherently logical). I'm catching up on about 15 years of advancement at the moment, so I suppose I will struggle.

It looks as though the index of the string is indeed a pointer to the next segment of data as opposed to the next byte and that the String is aware of the data type and size of each segment, this might just be the case for strings returned by the OSC library, or it may be universal behaviour.

As frustrated as I get, at least I'm not dealing with page boundaries and DMA transfers.

--_Adam: I get this output:

/1/push4

2F:31:2F:70:75:73:68:34:0:0:0:0:2C:66:0:0:3F:FFFFFF80:0:0:FFFFFFC0:FFFFFFA8:0:




...

I thought great I'll just look at the Hex value of each byte, but wait a minute what is the long hex values of FFFFFF80 (4 bytes) where are they coming from, I thought my string was an array of bytes but it looks like a string is an array of strings? what's going on?

...

--_Adam: Thanks, I do appreciate the time people take to respond, I just struggle with the logic of stepping through the bytes of a string but getting 32bit data returned.

You are getting bytes. What you are seeing is a manifestation of "sign extension", that is all. The string is a string of bytes, and the string library is not aware of the "type and size of each segment" - it's much simpler than that.

Say the string contains 0x80 (a byte). And say you copy that into a signed long. Well, a char is signed (with the Arduino's AVR compiler, at least), so 0x80 technically is considered to be -128 (in decimal). In a long it is also -128, but since that is a two's-complement number it becomes 0xFFFFFF80. Same number, just represented with extra bits in a 4-byte field. And that is because, in a long, the number 0x00000080 would simply be 128 (not -128).

So you are being caught by the way you printed the numbers, nothing more. You would find that every number (using that printing technique) that had the eighth bit (the most significant bit) set would have the extra FFFFFF prepended to it.
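The effect can be reproduced in a few lines of plain C++. This is only a sketch: `int32_t` stands in for the 32-bit long on the Arduino, and the function names are invented for the example.

```cpp
#include <cstdint>

// A signed 8-bit value widened to 32 bits keeps its numeric value (-128 for
// 0x80), so the upper 24 bits are filled with copies of the sign bit.
std::uint32_t widenSigned(std::uint8_t raw) {
    std::int8_t asSigned = static_cast<std::int8_t>(raw);
    std::int32_t widened = asSigned;
    return static_cast<std::uint32_t>(widened);
}

// Going through an unsigned 8-bit type leaves the upper bits zero.
std::uint32_t widenUnsigned(std::uint8_t raw) {
    return static_cast<std::uint32_t>(raw);
}
```

With the byte 0x80, `widenSigned` gives 0xFFFFFF80 and `widenUnsigned` gives 0x80, while a byte such as 0x3F comes out the same either way. That suggests casting each element to an unsigned byte type before printing it should avoid the FFFFFF prefix, although that is an inference from the behaviour described here, not something tested on hardware.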

OK, thanks, I understand. I should have thought a bit more about the FF values I was seeing.

It's a strange behaviour though, I don't have any declared variable of type long in my sketch. And I had specifically asked the string to return the HEX value of a byte, not load the byte into a long and return the hex value of the 2's complement.

If I had a string of length 255 where the first character was 01h and the last was FFh, then using Serial.print(packetBuffer[int_index_of_string], HEX); would return hex values of at most 2 characters for the first 127 results, then 8-character hex values for the rest, e.g. FFFFFFB7h.

How would I go about displaying the actual byte, as hex or binary, with 2 hex characters or 8 bits, regardless of whether the most significant bit is set to 1?

You may not have defined any long types but the underlying code "helpfully" coerced it into a long for you:

    void Print::print(char c, int base)
    {
      print((long) c, base);
    }

(I know it's Print not Serial, but just take my word for it that after a bit of subclassing this is where we end up).

As for working around it, sprintf perhaps? You can specify hex types, and lengths. Something like:

    char buf [3];
    // cast first, so a char with the top bit set is not sign-extended
    sprintf (buf, "%02X", (unsigned char) c);

(untested, uncompiled code).

Then send "buf" instead of the original character.
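Extending that idea to a whole buffer, a sketch in plain C++ might look like the following. The helper name `hexDump` is hypothetical, `snprintf` is used so the write can never run past the 3-byte scratch buffer, and the caller is assumed to know the number of valid bytes.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Format `length` bytes as colon-separated, two-digit upper-case hex.
std::string hexDump(const char* data, std::size_t length) {
    std::string out;
    char buf[3];  // two hex digits plus the terminating NUL
    for (std::size_t i = 0; i < length; ++i) {
        // unsigned char first, so 0x80 formats as "80" and not "FFFFFF80"
        unsigned int value = static_cast<unsigned char>(data[i]);
        std::snprintf(buf, sizeof(buf), "%02X", value);
        if (i != 0) out += ':';
        out += buf;
    }
    return out;
}
```

The `%02X` format pads single-digit values with a leading zero, so every byte takes exactly two characters whether or not its most significant bit is set.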