New regular expression library released

Well you previously said the input was:

Got it working as you describe below. OK, so I would appreciate your thoughts on the best way to parse this data format:

100,100,150,40 255,154,246,124

So:

R:234,G:342,B:342,T:23

Is hardly in that format.
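For the plain numeric format shown above (space-separated groups of four comma-separated integers), a minimal sketch without the regexp library might look like the following. `parseGroups` is a hypothetical helper, not something from the library, and it assumes the platform's `sscanf` supports `%d` and `%n`.

```cpp
#include <cstdio>

// Hypothetical helper (not part of the Regexp library): reads up to maxGroups
// whitespace-separated groups of four comma-separated integers from text.
// Returns the number of complete groups stored in out.
int parseGroups (const char * text, int out [][4], int maxGroups)
{
  int count = 0;
  int consumed = 0;
  while (count < maxGroups &&
         sscanf (text, " %d,%d,%d,%d%n",
                 &out [count][0], &out [count][1],
                 &out [count][2], &out [count][3], &consumed) == 4)
    {
    text += consumed;   // advance past the group just read
    count++;
    }
  return count;
}
```

Given the labelled form (R:234,G:342,...) this returns 0 groups, so the two formats would need different handling.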

ahem :blush:

:)

SirNickity: Amazing, Nick. It never even occurred to me to try and pull this off on a micro. Heck of a job there.

Almost all of the credit goes to Roberto Ierusalimschy, the lead programmer of Lua. I just adapted it for the Arduino.

To start, this library is great. It really extends the list of projects now possible with Arduino.

One problem for me is the lack of 'or' pattern matching. Say I am looking for a 'yes' or 'no' answer in the following strings.

"I apologize, but my answer is no" "Yes, you have been approved" "No, that is not allowed"

I would use the pattern "[yes|no]" to match that. The '|' seems to be missing.

Interested in extending your library?

When using

char buf [100] = "+254.55E";

Why does this work

([%+,%-]?%d+%.?%d*)([e,E])

but not this?

([%+,%-]?%d+%.?%d*)([e,E])?

The only difference is that the first requires there to be one ‘e’ or ‘E’ at the end, while the second makes it optional to have one ‘e’ or ‘E’ at the end. The ‘buf’ should work in both, but it doesn’t…why?

#include <Regexp.h>

// called for each match
void match_callback  (const char * match,          // matching string (not null-terminated)
                      const unsigned int length,   // length of matching string
                      const MatchState & ms)      // MatchState in use (to get captures)
{
  char cap [10];   // must be large enough to hold captures
  
  Serial.print ("Matched: ");
  Serial.write ((byte *) match, length);
  Serial.println ();
  
  for (byte i = 0; i < ms.level; i++)
    {
    Serial.print ("Capture "); 
    Serial.print (i, DEC);
    Serial.print (" = ");
    ms.GetCapture (cap, i);
    Serial.println (cap); 
    }  // end of for each capture

}  // end of match_callback 


void setup ()
{
  Serial.begin (115200);
  Serial.println ();
  unsigned long count;

  // what we are searching (the target)
  char buf [100] = "+254.55E";

  // match state object
  MatchState ms (buf);

  // original buffer
  Serial.println (buf);

  // search for an optionally signed number with an optional exponent letter (two captures)
  count = ms.GlobalMatch ("([%+,%-]?%d+%.?%d*)([e,E])?", match_callback);
  // show results
  Serial.print ("Found ");
  Serial.print (count);            // number of matches found
  Serial.println (" matches.");
 

}  // end of setup  

void loop () {}

[url=http://www.gammon.com.au/scripts/doc.php?lua=string.find]string.find[/url]: The repetition characters, which can follow a character, class or set, are:

Which implies repetition characters cannot follow a capture group.

Move the question mark inside...

([%+,%-]?%d+%.?%d*)([e,E]?)

Michael75: I would use the pattern "[yes|no]" to match that. The '|' seems to be missing.

Interested in extending your library?

The library is based on Lua patterns (which don't support the "or" functionality). To extend that would basically be a rewrite, since doing an "or" involves backtracking. I don't think it is warranted on this platform. For simple alternatives like you posted, it would be more sensible to just have a series of "if" tests.
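As a rough illustration of the "series of if tests" idea, here is a minimal sketch in plain C++ that does not use the library at all. `containsWord` and `classifyAnswer` are hypothetical names made up for this example; the whole-word check is an assumption so that "no" is not found inside "not" or "know".

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical helper: case-insensitive search for a whole word in text.
bool containsWord (const char * text, const char * word)
{
  for (const char * p = text; *p; p++)
    {
    // a word may only start at the beginning or after a non-letter
    if (p != text && isalpha ((unsigned char) p [-1]))
      continue;
    size_t i = 0;
    while (word [i] &&
           tolower ((unsigned char) p [i]) == tolower ((unsigned char) word [i]))
      i++;
    // all of word matched, and the next character does not continue the word
    if (word [i] == '\0' && !isalpha ((unsigned char) p [i]))
      return true;
    }
  return false;
}

// Returns 'Y', 'N' or '?' using a series of plain "if" tests.
char classifyAnswer (const char * text)
{
  if (containsWord (text, "yes"))
    return 'Y';
  if (containsWord (text, "no"))
    return 'N';
  return '?';
}
```

Calling `classifyAnswer` on each of the three example strings from the question should give 'N', 'Y' and 'N' respectively.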

Ps991: When using

char buf [100] = "+254.55E";

Why does this work

([%+,%-]?%d+%.?%d*)([e,E])

but not this?

([%+,%-]?%d+%.?%d*)([e,E])?

The Lua patterns do not support repetition operators on groups (captures), as Coding Badly has guessed.

The simple solution is to get rid of the group (and the comma, I'm not sure what that is doing there).

([%+,%-]?%d+%.?%d*)[eE]?

Coding Badly: Which implies repetition characters cannot follow a capture group.

Move the question mark inside...

([%+,%-]?%d+%.?%d*)([e,E]?)

But what if I wanted

([eE]%d+)?

I thought commas were needed inside brackets; I guess not, it was just an assumption. Basically, what if I want the E# to be optional? I can't put the "?" on the inside because I need the group to be optional, not its components...

According to reply #26 there is no backtracking so what you want is not available.

But, if you use greedy, this should work...

([%+,%-]?%d+%.?%d*)([eE]?%d*)

You will have to write a bit of code to ensure a number actually follows the "E".
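The "bit of code" could be as small as the following sketch, which checks the text of the second capture after a match. `exponentIsValid` is a hypothetical helper name; it assumes the capture has already been copied into a null-terminated buffer (for example with `GetCapture`, as in the sketch earlier in the thread).

```cpp
#include <cctype>

// Hypothetical check for the second capture, e.g. "", "E", "E5" or "e12".
// Returns true when the capture is empty (no exponent) or is an 'e'/'E'
// followed by at least one digit and nothing else.
bool exponentIsValid (const char * cap)
{
  if (cap [0] == '\0')
    return true;                 // exponent part absent: fine
  if (cap [0] != 'e' && cap [0] != 'E')
    return false;                // digits without a leading E
  if (cap [1] == '\0')
    return false;                // "E" with no number after it
  for (const char * p = cap + 1; *p; p++)
    if (!isdigit ((unsigned char) *p))
      return false;              // something other than a digit after the E
  return true;
}
```

With this, an input such as "+254.55E" would match the pattern but be rejected afterwards because the capture "E" has no digits.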

I guess that will work...

I find that pretty silly that Lua would not include that when it can be very useful...

Regardless of what Lua does or does not do, a backtracking regular expression evaluator on a memory limited device is a bad idea.

It's useful alright. But is it fast? Is it compact? These are design trade-offs.

I have just come across this awesome library - being an ardent fan of RegEx this is very much what I like :)

I have a question, though: is it possible to re-use a MatchState object by assigning a different target to it? So just do a "MatchState ms;" once and then do repeated "ms.Target()" calls?

Yes, absolutely.

I don't know how to thank you, Nick Gammon, for your very good idea of bringing regex to Arduino.

I use regex often, especially for my site.

I need a little help.

If I have:

char buf [100] = "Date: Tue, 12 Feb 2019 18:28:38 +0100";

I want to be sure to find it:

char result = ms.Match ("^Date+.18:28");

But I don't remember the right regex syntax, or whether it is the same here...

oh well...

I think it can work:

char result = ms.Match ("^Date.+ [0-9][0-9]:[0-9][0-9]");

But... how can I allow characters other than ":"? I mean, sometimes the time is written "00.00.00" instead of "00:00:00", or in some other format...

Thanks

Character set should work...

  [:.]

However, that appears to be an RFC header in which case dot separators indicate a non-compliant system. In my experience such things are best ignored.
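Dropping that set into the earlier pattern would presumably give something like "^Date.+ [0-9][0-9][:.][0-9][0-9]". To check the idea off-device, here is a plain C++ sketch that mirrors only the time part (two digits, a ':' or '.', two digits). `hasTime` is a hypothetical helper written for this example and does not use the library.

```cpp
#include <cctype>

// Hypothetical helper: true if text contains two digits, a ':' or '.'
// separator, then two more digits (the shape "[0-9][0-9][:.][0-9][0-9]").
bool hasTime (const char * text)
{
  for (const char * p = text; *p; p++)
    {
    // short-circuit evaluation stops at the terminator, so p[1]..p[4]
    // are only read while the preceding characters were non-zero
    if (isdigit ((unsigned char) p [0]) &&
        isdigit ((unsigned char) p [1]) &&
        (p [2] == ':' || p [2] == '.') &&
        isdigit ((unsigned char) p [3]) &&
        isdigit ((unsigned char) p [4]))
      return true;
    }
  return false;
}
```

Both "18:28:38" and "18.28.38" are accepted by this check, while a date with no time part is not.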

Hi Coding Badly.

Thank you.

And is the best way to match any character other than letters and digits [^a-zA-Z0-9]?

Or is there a shorthand for that, something like /WN?

Thank you for this very useful library. On Arduino I want to parse a received SMS and return the value at the end of the message. For example:

sms = dfghhhcg ghhj ghhh hjjj relay_on_10

Here I want to return 10 (Value = 10), because I want to use it in another function. Thank you very much for your help.
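One possible approach, sketched in plain C++ without the library, is to walk back from the end of the message over the trailing digits. `trailingNumber` is a hypothetical helper name. With the library, a Lua-style pattern anchored at the end such as "(%d+)$" would be the analogous idea, though that is untested here.

```cpp
#include <cctype>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns the number formed by the digits at the very
// end of text, or -1 if text does not end in a digit.
long trailingNumber (const char * text)
{
  size_t end = strlen (text);
  size_t start = end;
  while (start > 0 && isdigit ((unsigned char) text [start - 1]))
    start--;                     // walk back over the trailing digits
  if (start == end)
    return -1;                   // no digits at the end
  return atol (text + start);    // convert just the trailing digits
}
```

Note that a message ending in a space or newline would return -1 with this sketch, so any trailing whitespace would need to be trimmed first.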