ASCII(?) to binary. PDU conversion

I have a bunch of ASCII characters that I want to convert to binary, storing the result in a variable so I can manipulate it later on.

I know I can easily print them out the serial port as binary.

But I am not sure how to store it in a variable.

See if "atoi" will do the trick.

http://en.wikipedia.org/wiki/Atoi

I'm not too sure how to use 'atoi', but after a quick Google search I came up with this:

void setup() {
  Serial.begin(9600);
}

void loop() {
  const char *thisChar = "a";   // string literal, so point to it as const
  int a = atoi(thisChar);       // parse the string as a decimal number
  Serial.println(a);

  delay(1000);
}

It returns 0. :S

That’s because “a” isn’t a decimal digit.
Having re-read your OP, I think you’re not talking about numbers.
Can you explain some more?
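To make the difference concrete, here is a minimal sketch in plain C++ (the helper names `digit_string_value` and `char_code` are made up for illustration). It assumes only the standard `atoi` from `<stdlib.h>`, which parses decimal digits and returns 0 when the string contains none, whereas the character itself already holds its ASCII code.

```cpp
#include <stdlib.h>

// atoi() converts a string of decimal digits to an int.
// A string with no leading digits, such as "a", yields 0.
int digit_string_value(const char *text) {
    return atoi(text);
}

// The numeric (ASCII) code of a character is simply the character's value.
int char_code(char c) {
    return (unsigned char)c;
}
```

So `digit_string_value("42")` gives 42 and `digit_string_value("a")` gives 0, while `char_code('a')` gives 97 (0x61), which is probably the number that was wanted.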

Alright, I want to convert the characters a-z, 0-9, etc. to their binary equivalents and then convert them into octets. If it's possible to go straight from an alphanumeric character to an octet, then that is what I would like to do.

I want to create PDU strings for sending SMS messages.

I want to convert characters a-z, 0-9 etc to their binary equivalent.

You don't have to convert anything to anything.

All variables are stored as binary. So to set a variable called v to be an ASCII A character you can say a number of things:

v = 0x41; // using a hex description of the letter
v = 65;   // using a decimal description of the letter
v = 'A';  // using a character description of the letter

All three of those methods end up with v containing a binary value of 0100 0001.
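A small sketch of that point, in plain C++ so it can be tried off the board. The helper names `to_bits` and `same_byte` are hypothetical; the only assumption is an 8-bit byte.

```cpp
#include <stdint.h>

// Write the 8 bits of `value` into `out` as '0'/'1' characters, most
// significant bit first. `out` must have room for 9 chars (8 bits + NUL).
void to_bits(uint8_t value, char out[9]) {
    for (int bit = 7; bit >= 0; --bit) {
        out[7 - bit] = ((value >> bit) & 1) ? '1' : '0';
    }
    out[8] = '\0';
}

// The three spellings from the post assign the same byte.
bool same_byte() {
    uint8_t from_hex = 0x41;
    uint8_t from_dec = 65;
    uint8_t from_chr = 'A';
    return from_hex == from_dec && from_dec == from_chr;
}
```

Calling `to_bits('A', buffer)` fills `buffer` with "01000001", whichever of the three spellings was used to set the value.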

What you describe as a septet is simply this bit pattern displayed as a hex value, so:

Serial.print(v, HEX); // would print it as the septet
Serial.print(v, BIN); // would print it as binary

and so on; see http://www.arduino.cc/en/Serial/Print

What the above table is talking about is squashing 7-bit ASCII values into 8 bits. Where it talks of adding, you should be bitwise ORing.
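As an illustration of ORing rather than adding, here is a minimal sketch of how the first packed octet could be formed from the first two characters. The function name `first_octet` is hypothetical, and this covers only the first octet, not the whole message.

```cpp
#include <stdint.h>

// First packed octet: all 7 bits of the first character, with the least
// significant bit of the second character placed in bit 7.
uint8_t first_octet(char first, char second) {
    uint8_t low7 = (uint8_t)first & 0x7F;
    uint8_t top  = (uint8_t)(((uint8_t)second & 0x01) << 7);
    return (uint8_t)(low7 | top);   // bitwise OR, not arithmetic addition
}
```

For 'h' (0x68) and 'e' (0x65) this gives 0x68 | 0x80 = 0xE8.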

What Mike says about all variables being stored as binary is correct, but for this particular application some conversion is necessary.

Orac - If you read a little further in the document from which you got 'Table 2-10' (www.atmel.com/dyn/resources/prod_documents/doc8016.pdf) you will find a flowchart that outlines the procedure that you need to follow.

Don

Ok I have been experimenting, at the moment I have this:

void setup() {
  Serial.begin(9600);
}

void loop() {
  char sms[] = "hello";
  int i;
  for (i = 0; i < 5; i = i + 1) {
    Serial.println(sms[i], BIN);
  }
  delay(1000);
}

It prints out the binary (septet) equivalent of each character in the array. Now I need to figure out how to convert a septet to an 8-bit representation before I send it out the serial port.

Here I already have an 8-bit representation of the letter 'h', and I print it out as HEX, which gives E8. So now I know that I just need to create the 8-bit representation and then print it out to the modem as HEX, and it should be the octet that the modem is expecting.

void setup() {
  Serial.begin(9600);
}

void loop() {
  int b = B11101000;
  Serial.println(b, HEX);
 
  delay(1000);
}

Following the flowchart diagram in the Atmel document, this is what I understand from it:

1.) Have an array with the data I want to send in it.

2.) Check to see if we have read the last character in the array. If not, continue to the next character in the array; if we have, continue with the rest of the program.

3.) Read 2 characters from the array and convert them into an 8-bit representation by taking the rightmost bit of the second character and adding it to the first character. I'm not sure if we add it or just stick it in the leftmost position of the first character. And how do I do this?

4.) Make an octet out of the read character pair and store the octet. I'm not sure I would have to do this. I could just save the 8-bit representation from step 3 into an array, and when it comes to sending it out the serial port I will just send it as hex.

5.) array pointer % 8 == 0. I don't quite understand this, but looking at the Arduino reference, modulo (%) can be used for updating an array.

Is it possible to go through an array of characters, updating each one in the array with its octet value?

Example array:

char smsMsg[] = "hello";

The array would be updated with its octet values, such as:

E8 32 9B FD 06
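If the packed octets end up in a byte array, turning them into the hex text that is sent to the modem is a separate, simple step. The following is a minimal sketch under that assumption; `octets_to_hex` is a hypothetical helper, not part of the Arduino API.

```cpp
#include <stddef.h>
#include <stdint.h>

// Write each octet as two uppercase hex digits, followed by a NUL.
// `out` must have room for 2 * len + 1 chars.
void octets_to_hex(const uint8_t *octets, size_t len, char *out) {
    static const char digits[] = "0123456789ABCDEF";
    for (size_t i = 0; i < len; ++i) {
        out[2 * i]     = digits[octets[i] >> 4];     // high nibble
        out[2 * i + 1] = digits[octets[i] & 0x0F];   // low nibble
    }
    out[2 * len] = '\0';
}
```

Given the five octets 0xE8, 0x32, 0x9B, 0xFD, 0x06, this produces the string "E8329BFD06", which could then be passed to Serial.print() in one call.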

I could probably be more confused if you tried just a little-bit harder.

Yeah, it probably doesn't help that I don't know what I'm doing and suck at explaining what I'm trying to do.

You do realize that 'ASCII' values are only 7 bits in length?
That in memory they are 8 bits wide, with the high bit being 0?
That in memory the 'string' "hello" would be (shown as ASCII character, hex byte and binary representation):

h 0x68 0110 1000
e 0x65 0110 0101
l 0x6C 0110 1100
l 0x6C 0110 1100
o 0x6F 0110 1111
\0 0x00 0000 0000

or as an array of bytes, char array[] = { 'h', 'e', 'l', 'l', 'o' };

h 0x68 0110 1000
e 0x65 0110 0101
l 0x6C 0110 1100
l 0x6C 0110 1100
o 0x6F 0110 1111

Yep, I understand that now.

But to convert it into a PDU octet, I need to take the rightmost bit of the second character and append it to the start of the first character.

According to this diagram:

http://i25.tinypic.com/5eaan7.png

What the perpetrators of Table 2-10 have done is chosen a poor example to illustrate what is being done. They have shown how to convert 5 bytes of data into 5 different bytes of data. Had they started with an example word with eight letters they could have shown how to compress the 8 bytes into 7.

What they are doing is using the fact (mentioned in reply #10) that ASCII codes are only seven bits in length. Instead of padding each seven-bit ASCII code with a ‘0’ they are concatenating the ASCII codes.

Here’s my example for the eight letter word ‘helloabc’.

h --> 1101000
e --> 1100101
l --> 1101100
l --> 1101100
o --> 1101111
a --> 1100001
b --> 1100010
c --> 1100011

Now show these on one line in the order that they would be transmitted, with the least significant bit of the first letter first (like squeezing toothpaste out of a tube).

   c       b       a       o       l       l       e       h
1100011 1100010 1100001 1101111 1101100 1101100 1100101 1101000

Now remove the spaces, keeping the order the same.

11000111100010110000111011111101100110110011001011101000

Now regroup these in bytes, keeping the order the same.

11000111 10001011 00001110 11111101 10011011 00110010 11101000

We have converted the 8 ASCII characters, which normally would require 8 bytes, into 7 bytes of data.
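The regrouping above can be reproduced literally with strings of '0' and '1' characters. This is only a sketch to check the example by machine (desktop C++ using `std::string`, with hypothetical names `septet_bits` and `regroup`); it is far too wasteful for real use on a small microcontroller.

```cpp
#include <stddef.h>
#include <string>

// Seven-bit pattern of one character, most significant bit first.
std::string septet_bits(char c) {
    std::string bits;
    for (int bit = 6; bit >= 0; --bit) {
        bits += ((c >> bit) & 1) ? '1' : '0';
    }
    return bits;
}

// Concatenate the septets with the LAST character leftmost (as in the post),
// pad on the left with zeros to a multiple of 8 bits, then cut the run into
// groups of 8 starting from the right-hand end.
// The groups are returned in transmission order: rightmost group first.
std::string regroup(const std::string &msg) {
    std::string run;
    for (size_t i = msg.size(); i-- > 0; ) {
        run += septet_bits(msg[i]);
    }
    while (run.size() % 8 != 0) {
        run = "0" + run;
    }
    std::string out;
    for (size_t end = run.size(); end > 0; end -= 8) {
        if (!out.empty()) out += ' ';
        out += run.substr(end - 8, 8);
    }
    return out;
}
```

`regroup("helloabc")` returns the seven groups from the line above in reverse (transmission) order, starting with 11101000, which is 0xE8. `regroup("hello")` returns five groups, the last of which is 00000110 (0x06) because of the zero padding.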

Don

Orac:

... and append it to the start of the first character.

You are overlooking the fact that the data is sent out lsb (least significant bit) first so you are appending it to the end of the first character.

Don

OK, I see I had accidentally "paged" past that chart you'd already posted. My mistake, and now a little thought ...

... although it looks like someone will post something useful before I get back to it!

I find the Atmel App Note flow chart singularly uninformative as a help to program design.

Here’s my take on it.

A naive approach to the scheme outlined in the GSM standard is to handle things one bit at a time.

For each ASCII char:
Shift seven bits out of the right end of the ASCII char into the MSB of an output byte, shifting the input and output bytes (to the right) each time. After seven shifts, get the next input character. Whenever there are eight bits in the output byte, print it and reset the shift counter.

After all ASCII chars have been serviced, if there are any bits left in the output byte, shift them down and print it.

//
// Encode ASCII message to GSM PDU Octets according to
// Section 6.1.2 of GSM 03.38
//
// davekw7x
//

void setup()
{
  Serial.begin(9600);
}

void loop()
{
  /* change #if 1 to #if 0 for longer message */
#if 1
    char asc_msg[] = "hellohello";
#else
    char asc_msg[] = "WAVECOM ASIA PACIFIC LIMITED\r\n"
                     "2nd Floor, Shui On Centre\r\n"
                     "6-8 Harbour Road\r\n"
                     "Wan Chai, HONG KONG";
#endif
    int asc_len = strlen(asc_msg);
    int i, j;
    int pdu_len;
    byte outchar;
    char inchar;
    int outbit;
    /*
       I use sprintf for consistent formatting
       of output hex bytes and for convenience in
       certain other situations. Here's a buffer
       that is certainly large enough to handle
       everything that I use here.
    */
    char buffer[200];
    sprintf(buffer, "ASCII message: %s\n\n", asc_msg);
    Serial.print(buffer);
    Serial.println("ASCII chars:");

    for (i = 0; i < asc_len; i++) {
        sprintf(buffer, "0x%02x", asc_msg[i]);
        Serial.print(buffer);
        if (i % 16 == 15) {
            Serial.println();
        }
        else {
            Serial.print(" ");
        }
    }
    sprintf(buffer,"\nNumber of ASCII chars = %d\n\n", asc_len);
    Serial.print(buffer);
    
    Serial.println("PDU Octets:");
    pdu_len = 0;
    outbit = 0;
    outchar = 0;
    for (i = 0; i < asc_len; i++) {
        inchar = asc_msg[i];
        /* 
           Take lsb of inchar and shift it into msb of outchar
        */

        /*
           Shift seven bits of inchar 
        */
        for (j = 0; j < 7; j++) {
            outchar >>= 1;
            outchar |= (inchar & 1) << 7;
            inchar >>= 1;
            if (++outbit == 8) {
                sprintf(buffer,"0x%02x", outchar);
                Serial.print(buffer);
                if (pdu_len++ % 16 == 15) {
                    Serial.println();
                }
                else {
                    Serial.print(" ");
                }
                outbit = 0;
                outchar = 0;
            }
        }
    }

    /*
       After shifting all input chars, if there are any bits
       left in the output octet, shift them down to the lsb
       and print it out. (outbit is reset to 0 each time a
       full octet is printed, so leftover bits mean outbit > 0.)
    */
    if (outbit > 0) {
        ++pdu_len;
        while (outbit++ < 8) {
            outchar >>= 1;
        }
        sprintf(buffer,"0x%02x", outchar);
        Serial.print(buffer);
    }

    Serial.println();
    sprintf(buffer, "Number of PDU octets = %d\n\n", pdu_len);
    Serial.print(buffer);
    while(1)
    ;
}

Output:

ASCII message: hellohello

ASCII chars:
0x68 0x65 0x6c 0x6c 0x6f 0x68 0x65 0x6c 0x6c 0x6f 
Number of ASCII chars = 10

PDU Octets:
0xe8 0x32 0x9b 0xfd 0x46 0x97 0xd9 0xec 0x37
Number of PDU octets = 9

Regards,

Dave

[edit]Hmmm...This was posted in response to another Poster who has apparently withdrawn it. I meant no disrespect, and I hope nothing that I said was taken the wrong way. I'll leave this in place, since I do mention the sources of my examples. If it's too long, well, I hate to repeat myself, but sometimes I just can't help it.[/edit]

but it works

Matters of fact about your functions:

1. Wrong count for output octets. See Footnote.

2. Does not comply with the algorithm or example in the Original Poster's table (based on GSM 03.38 and GSM 03.40). See Footnote (again).

Matter of style (opinion):

3. With embedded processors such as those used in the Arduino project (the topic of this forum), system resources (RAM, for example) are so very limited that using run-time time-bombs such as the alloc functions gets you booted out of my design review. You will be disparaged. (Not by me---I'm gentle; by my boss---she's tough. Don't get me started...)

Regards,

Dave

Footnote: With the "hello" message, your output has four octets instead of five. Also: Apparently you are shifting the bits MSB first, but the table in the Original Poster's problem (and the GSM PDU packing scheme) shifts them LSB first. (A couple of other questions came up as I scanned your code, but I really didn't try to figure them out.) Bottom line: It does not agree with the simple example in the table that the Original Poster supplied.

My output does, although I didn't show the results of that short example, since I figured people could put it in my sketch on their own and compare with the published results.

Furthermore...

Run your code with "hellohello", as my program showed and as illustrated in many documentation examples on the web (see, for example, http://www.gsm-modem.de/sms-pdu-mode.html). For those 10 input chars your code gives eight output octets, not the nine shown in the link above and in my sample. (And of course your octets are still wrong.)
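The expected octet counts can be checked with simple arithmetic: n characters carry 7 * n bits, which need that many bits divided by 8, rounded up. A minimal sketch (the name `packed_octet_count` is made up for illustration):

```cpp
#include <stddef.h>

// Octets needed to hold n septets: ceil(7 * n / 8), in integer arithmetic.
size_t packed_octet_count(size_t n_septets) {
    return (7 * n_septets + 7) / 8;
}
```

This gives 5 octets for the 5 characters of "hello", 9 for the 10 characters of "hellohello", and 83 for the 94-character long message in the sketch.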

My example was not intended to be production code, and it really couldn't be, since the Original Poster didn't tell us exactly what was going to be done with the output octets. Instead of using Serial.print() a "real" program (probably) would have called some function (maybe named "spew()") to take care of each output octet as it is generated.
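One way such a per-octet function could be wired in is sketched below: the same bit-at-a-time packing, but each finished octet is handed to a caller-supplied function instead of being printed. Everything named here (`OctetSink`, `pack_septets_to_sink`, `OctetBuffer`, `store_octet`) is hypothetical, and the code assumes the characters are already valid 7-bit codes.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical sink type: called once for each finished octet.
typedef void (*OctetSink)(uint8_t octet, void *context);

// Bit-at-a-time packing, LSB first. Returns the number of octets emitted.
size_t pack_septets_to_sink(const char *msg, size_t len,
                            OctetSink sink, void *context) {
    uint8_t outchar = 0;   // octet being assembled
    int outbit = 0;        // number of bits collected in outchar
    size_t count = 0;
    for (size_t i = 0; i < len; ++i) {
        uint8_t inchar = (uint8_t)msg[i] & 0x7F;
        for (int j = 0; j < 7; ++j) {
            // Move the lsb of inchar into the msb of outchar.
            outchar = (uint8_t)((outchar >> 1) | ((inchar & 1) << 7));
            inchar >>= 1;
            if (++outbit == 8) {
                sink(outchar, context);
                ++count;
                outbit = 0;
                outchar = 0;
            }
        }
    }
    if (outbit > 0) {                 // partly filled final octet
        outchar >>= (8 - outbit);     // shift the collected bits down to the lsb
        sink(outchar, context);
        ++count;
    }
    return count;
}

// Example sink: collect the octets in a fixed-size buffer.
struct OctetBuffer {
    uint8_t data[160];
    size_t count;
};

void store_octet(uint8_t octet, void *context) {
    OctetBuffer *buffer = (OctetBuffer *)context;
    if (buffer->count < sizeof(buffer->data)) {
        buffer->data[buffer->count++] = octet;
    }
}
```

Passing `store_octet` with an `OctetBuffer` keeps the result in a variable for later use; a sink that calls Serial.print() would reproduce the behaviour of the original sketch.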

My point was to try to illustrate a way to get the desired compressed output. Not necessarily the best way, but a way.

Furthermore, a way that might actually be used in a "real" Atmega328p project and can be demonstrated on an Arduino board: A complete sketch that gives output verbose enough to let users debug and compare with their more complete programs.

If I use the more lengthy message in my example (change #if 1 to #if 0 at the place indicated in the comments), I get the 83 output octets shown in the example in An introduction to the SMS in PDU mode...

Your program gives 82 bytes, not 83. Furthermore, it's not just a matter of incorrect octets due to using MSB first instead of LSB first. The last eleven output octets of your program are 0x00. Something is drastically wrong.

The reason I give complete programs that people can run on their Arduino systems is so that they can compare the results of my examples (and the examples from other resources) with whatever method they come up with. (And there are a number of "improvements" that might be worth exploring.)

There are (obviously) other ways of doing it (I have a way that works on multiple bits at a time and is somewhat faster but a little more involved---not very much more code), but I just wanted to show something concrete that, I believe, will illustrate a correct application of the principles.
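For comparison, here is one possible multiple-bits-at-a-time version. It is not the method referred to above (that code was never posted), just a sketch of the general idea: each output octet is built from the unused high bits of one septet ORed with the low bits of the next. The name `pack_septets` is made up, and the code assumes each character already has the same 7-bit code in the GSM default alphabet as in ASCII, which holds for a-z, A-Z and 0-9 but not for every symbol.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Pack `len` 7-bit characters into `out`, LSB first.
// Returns the number of octets written, which is ceil(7 * len / 8).
size_t pack_septets(const char *msg, size_t len, uint8_t *out, size_t out_cap) {
    assert(out_cap >= (7 * len + 7) / 8);   // caller must supply enough room
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) {
        unsigned shift = i % 8;   // bits of this septet already used
        if (shift == 7) {
            continue;             // all 7 bits went into the previous octet
        }
        uint8_t cur  = (uint8_t)msg[i] & 0x7F;
        uint8_t next = (i + 1 < len) ? ((uint8_t)msg[i + 1] & 0x7F) : 0;
        // Remaining high bits of cur fill the low end of the octet;
        // the low bits of next fill the rest.
        out[n++] = (uint8_t)((cur >> shift) | (next << (7 - shift)));
    }
    return n;
}
```

With "hellohello" this writes the nine octets e8 32 9b fd 46 97 d9 ec 37, and with an eight-character message it writes exactly seven octets, since every eighth character is absorbed entirely by the octet before it.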

Note that something like six lines of code in my sketch actually do the compression. Everything else is to give enough reasonably formatted output to make running (and debugging) not terribly distasteful. Some people have told me that my posts are too wordy, and maybe the code is too verbose also, but that's how it goes. Sometimes I just can't help myself.

If there are flaws in the substance of my example I sincerely hope that someone will point them out. I come here to learn, and I learn the most by seeing other people's approach to problem solving. That's why I sometimes go into such great detail: The devil is in the details. Like getting the correct answer.

Thanks for your contribution, Dave; it has been a great help.

Unfortunately I missed the post that had been removed.

I hope to make a function that can be used to send an SMS with a cellphone or modem. I am currently using a Sony Ericsson T280 which has an external serial port.

Something like this: SendSMS(phonenumber, message to be sent);

send an SMS

If you have read the references you know that there is more (a lot more) to sending an SMS than just encoding the text part. Maybe the few lines of code in my sketch that actually do the compression can lead to a starting point for this little piece of the action.

Regards,

Dave

Yep, I'm aware that there is much more to sending an SMS than just encoding the message. But those are things that will not change: I will be sending to the same number all the time, so the only other thing to work out is the user message length, IIRC.