« 12 more Gecode models. e.g. circuit, clique, and Hidato puzzle | Main | Comet: version 1.2-rev2 released and a facelift of the site »

Regular expressions in Gecode

Using regular expressions for constraint programming is not new. The following examples from the Gecode distribution uses regular expressions in some way (the links are to the Gecode site): I have now started to learn more about Gecode's support for regular expression. The following operations are supported (from Gecode::REG Class Reference:
  • =: Assign to regular expression r.
  • +: Return expression for: this expression followed by r.
  • +=: This expression is followed by r.
  • |: Return expression for: this expression or r.
  • |=: This expression or r.
  • *: Return expression for: this expression arbitrarily often (Kleene star).
  • +: Return expression for: this expression at least once.
  • (unsigned int n, unsigned int m): Return expression for: this expression at least n and at most m times.
  • (unsigned int n): Return expression for: this expression at least n times.
See below for some examples.

Global contiguity

For a simple use of regular expressions to implement a global constraint is global contiguity which is explained is Global constraint catalog as:
Enforce all variables of the VARIABLES collection to be assigned to 0 or 1. In addition, all variables assigned to value 1 appear contiguously.
The model global_contiguity.cpp implements this by the simple regular expression: 0*1*0*, i.e.
REG r0(0), r1(1);
extensional(*this, x, *r0 + *r1 + *r0);
Note that the operators are before the REG element.

(This model also implements global contiguity by a variant of the slide
constraint.)

Generating names via regular expressions

In April 2004 Geoffrey K. Pullum (at Language Log) wondered about the many spellings of the swedish author Henning Mankell in The mysteries of... what's his name?.

I then wrote an Icon program for generating all the different versions of the names, and presented the program and solutions in the (swedish) blog posts: It was a fun problem and I have now implemented a version in Gecode: all_regexp.cpp.

There are two names (regular expressions) which are generated.

Henning Mankell
This uses the regular expression [hm][ea](nk|n|nn)(ing|ell|all), which in Gecode is stated as
extensional(*this, X, 
            (h|m) +                            // [hm]
            (e|a) +                            // [ea]
            ((n+k) | n | (n+n) ) +             // (nk|n|nn)
            ( (i+n+g) | (e+l+l) | (a+l+l))     // (ing|ell|all)
            ); 
In the original both the first and last name was generated; here we just generate the first name (or the last name).

Here are all 36 solutions:
Total: 36 solutions:
hanall
hanell
haning
henall
henell
hening
manall
manell
maning
menall
menell
mening
hankall
hankell
hanking
hannall
hannell
hanning
henkall
henkell
henking
hennall
hennell
henning
mankall
mankell
manking
mannall
mannell
manning
menkall
menkell
menking
mennall
mennell
menning
kjellerstrand
This is my last name and I have seen a lot of misspellings of this name. The regular expression for some of the variants is k(je|ä)ll(er|ar)?(st|b)r?an?d which in Gecode is implemented as:
extensional(*this, X, 
            k +                  // k
            (j+e|auml) +         // (je|ä)
            l+l +                // ll
            ((e+r)|(a+r))(0,1) + // (er|ar)?
            ((s + t)|b) +        // (st|b)
            r(0,1) +             // r?
            a+                   // a
            n(0,1) +             // n?
            d                    // d
            );
Different sizes, collecting all solutions
The model actually loops through many models with different sizes (length of the array) and collects all the solutions in a vector that is printed at the end of the program. Here is a the main function:
int
main(int argc, char* argv[]) {
  SizeOptions opt("AllRegexp");
  opt.solutions(0);
  opt.model(AllRegexp::MODEL_MANKELL);
  opt.model(AllRegexp::MODEL_MANKELL, "mankell");
  opt.model(AllRegexp::MODEL_KJELLERSTRAND, "kjellerstrand");
  opt.parse(argc,argv);
  // default for the mankell model
  int min = 6;
  int max = 7;
  if (opt.model() == AllRegexp::MODEL_KJELLERSTRAND) {
    min = 7;
    max = 13;
  }

  // loop through all the different sizes
  for(int s = min; s <= max ; s++) {
    opt.size(s);
    Example::run(opt);    
  }

  std::cout << "Total: " << all_solutions.size() << " solutions: " << std::endl;
  for(unsigned int i = 0; i < all_solutions.size(); i++) {
    std::cout << all_solutions[i] << std::endl;
  }

  return 0;