Need suggestions how to handle occasional ANSI control codes

Hi all,

I'm working on a user interface for a project that will use an Arduino and a serial terminal to display menus, allow the user to select options and finally run the device.

My problem is that in some cases, keyboard control keys send ANSI control sequences. For example, a user may enter some data into the menu and then attempt to edit it by backspacing over some characters and re-typing.

But if the user uses a LEFT ARROW key to backspace, it sends this sequence:

0x1B, 0x5B, 0x44 (which is the proper sequence for ANSI "cursor left").

Unfortunately, the menu code doesn't (yet) recognize these sequences and ends up displaying garbage such as "~^[D" when that key is pressed.

So, upon searching the web, I found that most of the control codes start with ESCAPE (0x1B) followed by a left bracket (0x5B) and then various character(s) depending on the key's function.

I tried to catch the 0x1B, then set a flag that meant "control codes are probably following" and then looked for the left bracket, etc...

But I just can't get it to work right. Some codes are "1B 5B 5B xx yy 7E", others are "1B 5B xx" and even more combinations.

My code "almost works", but sometimes a stray character from the sequence gets through, or (worse) it "eats" a valid key that's supposed to get through.

Does anyone know of the "standard" and "accepted" way of detecting VT-100/ANSI control codes?

I don't want to DO anything with them (other than the left arrow which will be "backspace"), but I do need to detect them and direct them to the bit bucket so that they don't garbage-up the display.

I've been at this for almost 3 days now and I'm just not seeing it, and I KNOW it can't be that difficult, so if someone could give me a shove in the right direction, I would REALLY appreciate it!

Thanks!

-- Roger

Does anyone know of the "standard" and "accepted" way of detecting VT-100/ANSI control codes?

An ideal application for a "state machine."
I'm a little surprised that I can't find source code for this published as an example of implementing state machines. :(

If you really just want to ignore them, you can probably flush characters after ESCAPE up to and including the first uppercase alphabetic after that, something like:

int c = Serial.read();
if (c == 27) { // Escape?
  do {
    c = Serial.read();   // Read up until the end of the sequence.
  } while (c < 'A' || c > 'Z');   // Serial.read() returns -1 when nothing is waiting, so this also waits for data.
}

Is it possible to do some rejection at the sending end? Most programming languages have something equivalent to isprint(), which could likely be called to check the scan code (as a whole unit) before sending it.

westfw:
An ideal application for a "state machine."
I'm a little surprised that I can't find source code for this published as an example of implementing state machines. :(

If you really just want to ignore them, you can probably flush characters after ESCAPE up to and including the first uppercase alphabetic after that, something like:

int c = Serial.read();

if (c == 27) { // Escape?
  do {
    c = Serial.read();  // Read up until the end of the sequence.
  } while (c < 'A' || c > 'Z');   // Serial.read() returns -1 when nothing is waiting, so this also waits for data.
}

Yes indeed a state machine would be the ideal solution. In fact, that's about the only thing I've run across... a piece of code called "vtparse" that implements a state machine to handle ANSI/VT100 control codes.

Unfortunately:

(1) It's WAY too complicated for what I want
(2) It won't compile for me (and I have all the requisites)

As far as catching the escape then throwing away things until I get a certain character, that won't work because valid ANSI control codes have letters and/or numbers in them.

I'm beginning to fear that I'll have to use a large table and try to match ANY control code that comes through and only allow plain text through if I get no match. Ugh.

KenF:
Is it possible to do some rejection at the sending end? Most programming languages have something equivalent to isprint(), which could likely be called to check the scan code (as a whole unit) before sending it.

Nope. The sending end is a keyboard, and the whole point of doing the serial terminal is so that the program works on any computer (only needs a terminal emulator). I can't have users also hacking their keyboard driver...

The Serial.parseInt() function watches the serial buffer until it sees a numerical digit or minus sign; the first non-digit byte after that counts as the end of the int (typically LF, I assume). This gives you a simple way of receiving numbers. I am currently working on a semi-generic Arduino serial library intended to communicate with a machine. I am specifically using Python on the other end.

You can wrap your call to parseInt() with a condition. For me, a DLE char has to be sent first, followed by a single char that tells the system what it is supposed to do with the incoming value, i.e. apply a motor speed, re-calibrate a sensor, read a sensor, etc. Then I call parseInt(). When parseInt() terminates, my code checks that the next symbol in the stream is a US or LF, meaning the data was terminated in a way I deem correct. Then my input handler passes what it collected from parseInt() to whatever function needs to deal with it. The only issue I have seen so far (and there could be more!) is that parseInt() returns 0 if no good data is received, so I can't tell the difference between bad data and an actual 0 request. I also suspect it is a blocking function call, but I don't know that for sure; documentation is sparse and I am dumb.
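One way around the "0 or bad data?" ambiguity is to report success separately from the value. The following is only a sketch in plain C++ so it can be tried off the board; parseIntChecked is a made-up name, not part of the Arduino core, and it works on a string that has already been collected rather than on the live serial stream. It assumes the US (0x1F) or LF terminator rule described above.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not an Arduino function): parses an optional '-' and
// digits from the start of text. Returns true and stores the number in value
// only if at least one digit was found AND the number is followed by
// LF or US (0x1F). On failure, value is left untouched.
bool parseIntChecked(const std::string& text, long& value) {
    std::size_t i = 0;
    bool negative = false;
    if (i < text.size() && text[i] == '-') {
        negative = true;
        ++i;
    }
    const std::size_t firstDigit = i;
    long result = 0;
    while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
        result = result * 10 + (text[i] - '0');
        ++i;
    }
    if (i == firstDigit) {
        return false;                      // no digits at all: bad data, not a zero
    }
    if (i >= text.size() || (text[i] != '\n' && text[i] != 0x1F)) {
        return false;                      // missing or wrong terminator
    }
    value = negative ? -result : result;
    return true;
}
```

With this shape, a real request for 0 ("0" followed by LF) returns true with value 0, while garbage returns false.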

I wish I had the time to polish my serial "protocol", as it would probably save a lot of headaches AND I'd love other ppl to improve it, but it's just not easy to write it super generic. It's not so much a library as a lot of rules and a few funcs/vars that I copy-paste between projects, mostly designed for dealing w/ sensors and motors. I also use control chars judiciously. It can be a pain to make sure all serial tools can send them correctly, so I sometimes replace them with capital letters, and then avoid sending capital letters as data, although they are typically handled fine. The nice thing about using non-printing chars to tell the system to do stuff is that the user is unlikely to send them by accident or include them inside a chunk of data.

State machines are very nice, but a generic one is still tricky, as you have to decide just how many and what kind of states are needed.

hope this is of some help.

As far as catching the escape then throwing away things until I get a certain character, that won't work because valid ANSI control codes have letters and/or numbers in them.

I'm pretty sure that all of the common sequences that you are likely to see are of the form "ESC [ <parameters> <letter>". So you can probably get a long way just by checking for termination by the <letter>, even if a fully correct parser would have to handle quoted strings and worse...

westfw:
I'm pretty sure that all of the common sequences that you are likely to see are of the form "ESC [ <parameters> <letter>". So you can probably get a long way just by checking for termination by the <letter>, even if a fully correct parser would have to handle quoted strings and worse...

A lot of sequences are simply ESC, [, "X", but others are ESC, [, [ n, n, x (where "x" is 0x7E).

I've been doing a lot of reading about ANSI/VT100 parsers and I'm beginning to see how they work (that is, what they look for).

I've also been kicking around the idea of trapping the codes via timing. If, for example, I hit the left arrow key, it instantly spits out 3 characters (0x1B, 0x5B, 0x44). So I could say "If the current character is 0x1B and "N" more arrive within "X" milliseconds, then the whole thing is one ANSI control code".

However, this seems to be a "kludge" to get it working as opposed to doing a proper decode. I've got a lot more to learn.

Time may be your friend. When the keyboard sends an ANSI string, it will send all the characters back-to-back. It will probably take the user at least 100 ms to hit the next key. So, when you see ESC, throw away everything you receive until there is no data for 100 ms, then start accepting characters again.

Not ideal, but it will probably work for this application.
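A rough sketch of that idea, written as plain C++ so it can be tested off the board. TimedEscapeFilter and its 100 ms default are made up for illustration; on an Arduino the nowMs argument would presumably come from millis().

```cpp
#include <stdint.h>

// Hypothetical helper: after an ESC, drop every byte until the line has been
// quiet for gapMs milliseconds.
class TimedEscapeFilter {
public:
    explicit TimedEscapeFilter(uint32_t gapMs = 100) : gapMs_(gapMs) {}

    // c: the received byte. nowMs: its arrival time in milliseconds.
    // Returns true if the byte should be passed on to the menu code.
    bool accept(uint8_t c, uint32_t nowMs) {
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        const bool quiet = (nowMs - lastByteMs_) >= gapMs_;
        lastByteMs_ = nowMs;
        if (discarding_ && !quiet) {
            return false;                  // still inside the burst that followed ESC
        }
        discarding_ = (c == 0x1B);         // an ESC starts a new burst to discard
        return !discarding_;
    }

private:
    uint32_t gapMs_;
    uint32_t lastByteMs_ = 0;
    bool discarding_ = false;
};
```

The weak spot is the one already mentioned: a key typed less than 100 ms after an arrow key gets eaten, and anything that delays the bytes of one sequence (a slow USB adapter, pasted text) can let part of it through.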

Regards,
Ray L.

Krupski:
Does anyone know of the "standard" and "accepted" way of detecting VT-100/ANSI control codes?

I think that you can filter out ESC control codes with this logic:

  1. When the ESC code is detected ==> start filtering
  2. If next character is '[' ==> go on with filtering, otherwise stop filtering
  3. If any letter character from 'a' to 'z' or 'A' to 'Z' is detected ==> stop filtering
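The three steps above could be sketched roughly like this, in plain C++ so it can be tried off the board. EscapeFilter and stripEscapes are made-up names. One assumption on my part: in step 2, the character that follows ESC is swallowed even when it is not '[', since it is most likely the second byte of a two-byte escape.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: feed it one received byte at a time.
// accept() returns true for ordinary input, false for swallowed bytes.
class EscapeFilter {
    enum class State { Idle, AfterEsc, InSequence };
    State state_ = State::Idle;

public:
    bool accept(unsigned char c) {
        switch (state_) {
        case State::Idle:
            if (c == 0x1B) {               // step 1: ESC seen, start filtering
                state_ = State::AfterEsc;
                return false;
            }
            return true;
        case State::AfterEsc:
            // step 2: only "ESC [" keeps filtering; anything else ends it
            state_ = (c == '[') ? State::InSequence : State::Idle;
            return false;
        case State::InSequence:
            if (std::isalpha(c)) {         // step 3: a letter ends the sequence
                state_ = State::Idle;
            }
            return false;
        }
        return true;                       // not reached; keeps compilers quiet
    }
};

// Convenience wrapper for trying the filter on a whole string.
std::string stripEscapes(const std::string& in) {
    EscapeFilter filter;
    std::string out;
    for (unsigned char c : in) {
        if (filter.accept(c)) {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

Note that a sequence ending in '~' (0x7E) has no terminating letter, so with these rules the filter keeps swallowing until the next letter arrives.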

OMG Have you still not cracked this?

Just for the hell of it I've just knocked this up and it appears to work. Obviously it's unlikely you'll have an LCD with the same pinout as mine, but hopefully it'll give you the general idea.

#include <LiquidCrystal.h>
LiquidCrystal lcd(8, 9, 4, 5, 6, 7);

void setup()
{
Serial.begin(115200);
lcd.begin(16,2);
}


byte scanCode[6];   //holds all the bytes of the last scanCode sequence
byte scanCodeSize=0;  //Number of bytes in last scanCode sequence read

void loop()
{
if ( getKeystroke() )
  {
  lcd.setCursor(0,1);   //rows are numbered from 0, so the bottom line of a 16x2 display is row 1

  //This will show any printable character on the bottom of the lcd
  if ( ( scanCodeSize == 1 ) && (scanCode[0] >= 32 ) )
      lcd.print( (char) scanCode[0] );
   else
     lcd.print ( " " );
 
   //This will show all the bytes of the scan code on the top line of the lcd
     lcd.setCursor(0,0);
     lcd.print("                ");
     lcd.setCursor(0,0);
     for (int n=0; n<scanCodeSize; n++)
        {
         lcd.print(scanCode[n]);
         lcd.print(" ");
        }
   }
}



byte scanIndex=0;   //used by getKeystroke to keep track of which byte is being read

//This function will read the next byte from the Serial port (if there's one available)
//If it returns true, your next key is now available in the scanCode array;
bool getKeystroke()
{
if(!Serial.available())
  return false;

bool doneYet=false;
byte k = Serial.read();   //byte rather than char: a signed char can never equal 155

scanCode[scanIndex++] = k;

  switch (scanIndex)
    {case 1:
           //Check for single byte CSI
           if(k == 155)
             {
              scanCode[0]=27;
              scanCode[scanIndex++]='[';
              break;
             }
             
           if(k != 27)
              doneYet=true;            
             break;          
           
     case 2://first char after escape;
          if(k == '[')
              break;
              
         if ( ( k >= 64 ) && (k <= 95) )
             doneYet = true;
          break;
          
     default:
         if ( ( k >= 64 ) && (k <= 126) )
            doneYet=true; 

        //just in case we missed the terminating byte
        //(only reset when no complete sequence was found, otherwise its size is lost)
        if( !doneYet && scanIndex > 4 )
          scanIndex = 0;
    }  

if (doneYet)
  {
   scanCodeSize=scanIndex;
   scanIndex=0;
  }
return doneYet;
}

byte scanCode[6]; //holds all the bytes of the last scanCode sequence

Not big enough. Even simple cursor movement is something like: "ESC [ 1 2 ; 7 5 G"
Setting display mode could look like "ESC [ 0 ; 1 ; 5 ; 3 5 ; 4 0 m" (Blinking bold magenta with black background!)

others are ESC, [, [ n, n, x (where "x" is 0x7E).

Where are you looking? The documentation I'm reading says something quite different.
An ANSI sequence could be either
ESC <final>
or ESC [ <parameters> <final>
<final> (for two-byte escape sequences) has to be a character between '@' and '_' (64 to 95).
<final> (for many-byte escape sequences) is always between '@' and '~' (64 to 126).
<parameters> includes numeric digits and punctuation, but is only made up of characters that are NOT between 64 and 126. ('0' == 48, ';' == 59)

The sequences that you can expect the cursor keys on a keyboard to emit are more limited.
(except for function keys, which are sometimes programmable to emit ANY string.)
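Those rules need no buffer at all if each byte is classified as it arrives. Here is a rough sketch in plain C++ (so it can be tested off the board); AnsiKeyDecoder, KeyEvent and decode are made-up names. Two things are assumptions rather than part of the rules above: the single-byte CSI (155) is accepted the same way as ESC [, and "ESC [ [" followed by one more byte is treated as a complete sequence, which as far as I know is what the Linux console sends for F1 to F5. Left arrow (ESC [ D with no parameters) is reported as backspace, as requested in the original post.

```cpp
#include <stdint.h>
#include <string>

enum KeyEvent {
    KEY_NONE = -1,   // byte consumed, nothing to report
    KEY_BKSP = 8     // reported for ESC [ D (cursor left)
};

class AnsiKeyDecoder {
    enum State { GROUND, ESCAPE, CSI, LINUX_FKEY };
    State state_ = GROUND;
    bool sawParam_ = false;   // true once a parameter byte followed "ESC ["

public:
    // Returns the character to hand to the menu code, or KEY_NONE.
    int feed(uint8_t c) {
        switch (state_) {
        case GROUND:
            if (c == 0x1B) { state_ = ESCAPE; return KEY_NONE; }
            if (c == 0x9B) { state_ = CSI; sawParam_ = false; return KEY_NONE; }
            return c;                          // ordinary character
        case ESCAPE:
            if (c == 0x1B) return KEY_NONE;    // repeated ESC: keep waiting
            if (c == '[') { state_ = CSI; sawParam_ = false; return KEY_NONE; }
            state_ = GROUND;                   // two-byte sequence ends here
            return KEY_NONE;
        case CSI:
            if (!sawParam_ && c == '[') {      // assumed Linux console quirk
                state_ = LINUX_FKEY;
                return KEY_NONE;
            }
            if (c >= 64 && c <= 126) {         // final byte ends the sequence
                state_ = GROUND;
                return (c == 'D' && !sawParam_) ? KEY_BKSP : KEY_NONE;
            }
            sawParam_ = true;                  // digits, ';' and other parameters
            return KEY_NONE;
        case LINUX_FKEY:
            state_ = GROUND;                   // swallow the one trailing byte
            return KEY_NONE;
        }
        return KEY_NONE;                       // not reached
    }
};

// Convenience wrapper for trying the decoder on a whole string.
std::string decode(const std::string& in) {
    AnsiKeyDecoder decoder;
    std::string out;
    for (unsigned char c : in) {
        int k = decoder.feed(c);
        if (k != KEY_NONE) out += static_cast<char>(k);
    }
    return out;
}
```

Because nothing is stored, a long sequence such as "ESC [ 0 ; 1 ; 5 ; 3 5 ; 4 0 m" costs no more memory than a three-byte arrow key.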

westfw:
Not big enough. Even simple cursor movement is something like: "ESC [ 1 2 ; 7 5 G"
Setting display mode could look like "ESC [ 0 ; 1 ; 5 ; 3 5 ; 4 0 m" (Blinking bold magenta with black background!)
Where are you looking? The documentation I'm reading says something quite different.
An ANSI sequence could be either
ESC <final>
or ESC [ <parameters> <final>
<final> (for two-byte escape sequences) has to be a character between '@' and '_' (64 to 95).
<final> (for many-byte escape sequences) is always between '@' and '~' (64 to 126).
<parameters> includes numeric digits and punctuation, but is only made up of characters that are NOT between 64 and 126. ('0' == 48, ';' == 59)

The sequences that you can expect the cursor keys on a keyboard to emit are more limited.
(except for function keys, which are sometimes programmable to emit ANY string.)

Well rewrite it then.