Regular expressions in Gecode
Using regular expressions for constraint programming is not new. The following examples from the Gecode distribution uses regular expressions in some way (the links are to the Gecode site):
I have now started to learn more about Gecode's support for regular expression. The following operations are supported (from Gecode::REG Class Reference:
I then wrote an Icon program for generating all the different versions of the names, and presented the program and solutions in the (swedish) blog posts:
There are two names (regular expressions) which are generated.
Henning Mankell
This uses the regular expression
Here are all 36 solutions:
This is my last name and I have seen a lot of misspellings of this name. The regular expression for some of the variants is
The model actually loops through many models with different sizes (length of the array) and collects all the solutions in a vector that is printed at the end of the program. Here is a the
-
=
: Assign to regular expression r. +
: Return expression for: this expression followed by r.-
+=
: This expression is followed by r. -
|
: Return expression for: this expression or r. -
|=
: This expression or r. -
*
: Return expression for: this expression arbitrarily often (Kleene star). -
+
: Return expression for: this expression at least once. -
(unsigned int n, unsigned int m)
: Return expression for: this expression at least n and at most m times. -
(unsigned int n)
: Return expression for: this expression at least n times.
Global contiguity
For a simple use of regular expressions to implement a global constraint isglobal contiguity which is explained is Global constraint catalog as:
Enforce all variables of the VARIABLES collection to be assigned to
0 or 1. In addition, all variables assigned to value 1 appear contiguously.
The model global_contiguity.cpp implements this by the simple regular expression: 0*1*0*
, i.e.
REG r0(0), r1(1);
extensional(*this, x, *r0 + *r1 + *r0);
Note that the operators are before the REG element.
(This model also implements global contiguity
by a variant of the slide
constraint.)
Generating names via regular expressions
In April 2004 Geoffrey K. Pullum (at Language Log) wondered about the many spellings of the swedish author Henning Mankell in The mysteries of... what's his name?.I then wrote an Icon program for generating all the different versions of the names, and presented the program and solutions in the (swedish) blog posts:
- Programspråket Icon och stavningen av "Henning Mankell" (Google translated The programming language Icon, and the spelling of "Henning Mankell")
- Fortsatt stavning av "Henning Mankell". Samt lite om agrep (Translated: Continued spelling of "Henning Mankell". Samt lite om agrep And a little about AGREP)
- Skapa strängar från reguljära uttryck - eller: Tystnaden de senaste dagarna (Translated: Create strings from regular expressions - or: The silence in recent days)
There are two names (regular expressions) which are generated.
Henning Mankell
This uses the regular expression
[hm][ea](nk|n|nn)(ing|ell|all)
, which in Gecode is stated as
extensional(*this, X, (h|m) + // [hm] (e|a) + // [ea] ((n+k) | n | (n+n) ) + // (nk|n|nn) ( (i+n+g) | (e+l+l) | (a+l+l)) // (ing|ell|all) );In the original both the first and last name was generated; here we just generate the first name (or the last name).
Here are all 36 solutions:
Total: 36 solutions: hanall hanell haning henall henell hening manall manell maning menall menell mening hankall hankell hanking hannall hannell hanning henkall henkell henking hennall hennell henning mankall mankell manking mannall mannell manning menkall menkell menking mennall mennell menningkjellerstrand
This is my last name and I have seen a lot of misspellings of this name. The regular expression for some of the variants is
k(je|ä)ll(er|ar)?(st|b)r?an?d
which in Gecode is implemented as:
extensional(*this, X, k + // k (j+e|auml) + // (je|ä) l+l + // ll ((e+r)|(a+r))(0,1) + // (er|ar)? ((s + t)|b) + // (st|b) r(0,1) + // r? a+ // a n(0,1) + // n? d // d );Different sizes, collecting all solutions
The model actually loops through many models with different sizes (length of the array) and collects all the solutions in a vector that is printed at the end of the program. Here is a the
main
function:
int main(int argc, char* argv[]) { SizeOptions opt("AllRegexp"); opt.solutions(0); opt.model(AllRegexp::MODEL_MANKELL); opt.model(AllRegexp::MODEL_MANKELL, "mankell"); opt.model(AllRegexp::MODEL_KJELLERSTRAND, "kjellerstrand"); opt.parse(argc,argv); // default for the mankell model int min = 6; int max = 7; if (opt.model() == AllRegexp::MODEL_KJELLERSTRAND) { min = 7; max = 13; } // loop through all the different sizes for(int s = min; s <= max ; s++) { opt.size(s); Example::run(opt); } std::cout << "Total: " << all_solutions.size() << " solutions: " << std::endl; for(unsigned int i = 0; i < all_solutions.size(); i++) { std::cout << all_solutions[i] << std::endl; } return 0;