
Crossword construction in MiniZinc using table constraints - a small benchmark on "72 Gecode problems"

Almost all models, wordlists, and files mentioned in this blog post are available at my MiniZinc Crossword page.

Abstract

The method presented here for constructing (solving, generating) crosswords is based on "just a bunch" of table constraints representing the words, together with a rather simple unicity constraint. After an introduction to the problem and a description of the general approach, a benchmark on 72 crossword problem instances is reported, comparing two FlatZinc solvers (Gecode/fz and Chuffed) with the crossword solver distributed with Gecode (written in C++), which uses element constraints for the intersections and an alldifferent constraint for the unicity requirement. We will find that this MiniZinc model and the FlatZinc solvers are competitive and in some cases even faster than the Gecode (C++) model.

Introduction

Some weeks ago I started to read Ivan Bratko's "Prolog Programming for Artificial Intelligence" (the very new 4th edition, ISBN: 9780321417466). On page 27f there was a very simple crossword problem:
The problem is to fill this crossword with words:
    
    L1   L2    L3   L4    L5   XXX
    L6   XXX   L7   XXX   L8   XXX
    L9   L10   L11  L12   L13  L14
    L15  XXX   XXX  XXX   L16  XXX

Where the L* are letters to be identified.
and also a couple of words to use:
dog, run, top
five, four, lost, mess, unit
baker, forum, green, super
prolog, vanish, wonder, yellow
One common approach to solving/generating crosswords is to identify the intersections of the words and then use a wordlist to match these intersections (e.g. the very simple crossword.mzn). This is usually done with element constraints for the intersections and an alldifferent constraint to ensure unicity of the selected words.
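
A minimal sketch of what such an element/alldifferent model might look like is shown below. This is my own illustration, not the actual code of crossword.mzn or Gecode's crossword.cpp, and all names (wordlist, isect, W, etc.) are hypothetical:

include "globals.mzn";            % provides alldifferent

int: num_words;                   % words in the list, all padded to max_len
int: max_len;
int: num_slots;                   % word slots (segments) in the grid
int: num_isect;                   % number of crossing cells
array[1..num_words, 1..max_len] of int: wordlist;  % letters coded 1..26
array[1..num_isect, 1..4] of int: isect;           % (slot1,pos1,slot2,pos2)

array[1..num_slots] of var 1..num_words: W;        % which word fills each slot

% each crossing cell: letter pos1 of slot1 equals letter pos2 of slot2
constraint forall(k in 1..num_isect) (
   wordlist[W[isect[k,1]], isect[k,2]] = wordlist[W[isect[k,3]], isect[k,4]]
);

% unicity of the selected words
constraint alldifferent(W);

solve satisfy;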

Instead of this approach I got the idea (not very revolutionary, but still) to try using only "a bunch of" table constraints representing the words (called "segments" in the model), which would then handle the intersections via the naming of the variables. This was implemented in the MiniZinc model crossword_bratko.mzn. The table constraints were manually created by just using the numbers of the "free" letters in the problem (L1, L2, etc.). The array of decision variables (L) has the domain 1..num_letters (1..26, for "a".."z"). The dummy variable L[0] was later added to handle the fill-outs (and is hard-coded to a single value).

Here is the kernel of the Bratko problem, which consists of the two row (across) words and the three column (down) words:
array[0..N] of var 1..num_letters: L;
constraint
   % across
   table([L[1],L[2],L[3],L[4],L[5]], words5)
   /\
   table([L[9],L[10],L[11],L[12],L[13],L[14]], words6)
   /\
   % down
   table([L[1],L[6],L[9],L[15]], words4)
   /\
   table([L[3],L[7],L[11]], words3)
   /\
   table([L[5],L[8],L[13],L[16]], words4);
The second argument of table is a matrix where all words of the same size are collected; e.g. words5 contains all words of length 5. In earlier crossword models I tended to collect all words in a single - and often huge - matrix with a lot of "0" as fillers to make it a proper matrix.
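
For the Bratko wordlist above, words5 could look like this (letters coded 1..26 with "a" = 1, as in the model):

words5 = array2d(1..4, 1..5,
   [  2,  1, 11,  5, 18,   % baker
      6, 15, 18, 21, 13,   % forum
      7, 18,  5,  5, 14,   % green
     19, 21, 16,  5, 18    % super
   ]);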

As mentioned above, the intersections are not explicitly identified in the model; instead they are just a consequence of the same letter identifier "happening" to occur in both an across word and a down word. E.g. L[3] represents the third letter of the first across word and the first letter of the second down word. The index (3) is just a counter of the "free" letters. In this problem there are 16 free letters, represented by L[1]..L[16].

Also note that in this problem we don't have to care about 1-letter words. In the general model presented below, we will see that 1-letter words must be handled in a special way.

The constraint that all words should be distinct is implemented with the constraint shown below. The matrix segments holds the segments (i.e. the words) in the crossword grid, where each letter is identified by a unique integer (its index in L), and "0" (zero) is used as a filler so that all segments have the same length and can be represented as a matrix. It has two parts - the segments across (rows) and the segments down (columns) - and has the same structure as the table constraints.
segments = array2d(1..num_segments, 1..max_length, 
[
   % across
   1, 2, 3, 4, 5, 0,
   9,10,11,12,13,14,

   % down
   1, 6, 9,15, 0, 0,
   3, 7,11, 0, 0, 0,
   5, 8,13,16, 0, 0
]);

% the segments/words should be all distinct:
% any two segments must differ in at least one position. The filler
% positions (0) all index the fixed dummy L[0] and thus never
% contribute a difference.
constraint
   forall(I,J in 1..num_segments where I < J) (
      not(forall(K in 1..max_length) (
          L[segments[I,K]] = L[segments[J,K]]
      ))
   )
;
(The structure of the table constraints and the segments is much the same, and I have tried to represent this in a single matrix, but stumbled on the problem of how to represent an appropriate wordlist structure. This may very well be fixed in a later version.)

This model has a single solution (given the wordlist shown above):
L = [6, 15, 18, 21, 13, 9, 21, 5, 22, 1, 14, 9, 19, 8, 5, 19]

f o r u m *
i * u * e *
v a n i s h
e * * * s *
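
For completeness, an output item along the following lines decodes L back to letters (printed in sequence, without the grid layout). This is a sketch in current MiniZinc syntax; the helper array alpha is my own, not part of the model:

array[1..26] of string: alpha =
   ["a","b","c","d","e","f","g","h","i","j","k","l","m",
    "n","o","p","q","r","s","t","u","v","w","x","y","z"];
output [ alpha[fix(L[I])] | I in 1..16 ];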

Generalization of the model and Gecode's 72 crossword problem

Since Bratko's problem was so small, I then wondered how well MiniZinc would do on a variety of larger crossword problems using the same basic approach, for example on the 72 crossword instances in Gecode's crossword.cpp (it is a large file but most of it consists of the definitions of these 72 problem instances). This model uses element constraints by projecting words onto their individual letters. In contrast, the MiniZinc model uses table constraints encoding the entire words. One of the objectives of the benchmark is to compare these two approaches.

The 72 problem instances are of different sizes, ranging from small problems with grids of 5x5 cells to larger 23x23 grids. See the comments in the Gecode model for more details. The problems have been collected from different sources, e.g. Herald Tribune Crosswords, and from early studies of backtracking, e.g. M.L. Ginsberg Dynamic Backtracking and M.L. Ginsberg et al. Search Lessons Learned from Crossword Puzzles. Some of them have also been used in later studies, e.g. the paper by Anbulagan and Adi Botea on phase transitions in crossword problems: Crossword Puzzles as a Constraint Problem (PDF).

The problems are represented as a grid in text form. For example, the Bratko problem mentioned above would be represented as the following grid, where "_" represents a letter to be identified and "*" is a blocked cell.
_ _ _ _ _ *
_ * _ * _ *
_ _ _ _ _ _
_ * * * _ *
A larger example is the 15x15 problem #10 (from Gecode's crossword.cpp), which is also identified as "15.01, 15 x 15".
_ _ _ _ * _ _ _ _ _ * _ _ _ _
_ _ _ _ * _ _ _ _ _ * _ _ _ _
_ _ _ _ _ _ _ _ _ _ * _ _ _ _
_ _ _ _ _ _ _ * * _ _ _ _ _ _
* * * _ _ _ * _ _ _ _ _ _ * *
_ _ _ _ _ * _ _ _ * _ _ _ _ _
_ _ _ * _ _ _ _ _ _ * _ _ _ _
_ _ _ * _ _ _ _ _ _ _ * _ _ _
_ _ _ _ * _ _ _ _ _ _ * _ _ _
_ _ _ _ _ * _ _ _ * _ _ _ _ _
* * _ _ _ _ _ _ * _ _ _ * * *
_ _ _ _ _ _ * * _ _ _ _ _ _ _
_ _ _ _ * _ _ _ _ _ _ _ _ _ _
_ _ _ _ * _ _ _ _ _ * _ _ _ _
_ _ _ _ * _ _ _ _ _ * _ _ _ _
However, since MiniZinc doesn't have the facility to parse this kind of text representation, I wrote some Perl programs to convert a problem instance to a MiniZinc model. The MiniZinc model for problem #10 is crossword3_10.mzn, which contains the problem-specific table constraints and segments definitions, much in the same way as in the Bratko model.

Here is one solution (using Gecode's SCOWL wordlist, see below for more about the wordlists):
roar*breed*area
urge*lager*weal
laughingly*also
elegant**aerate
***and*gadget**
acmes*err*osier
bra*ornate*tore
aah*magneto*non
snap*madras*add
heron*gay*idles
**imaged*pea***
vesper**surmise
echo*ambassador
trim*beach*gear
suss*snaky*ears


Note: By construction this approach requires that all 1-letter segments (words) are distinct. This means that 1-letter segments must be handled with care, since such a segment is the same whether seen across or down. More specifically, the unicity constraint has a special check for 1-letter words:
forall(I,J in 1..num_segments where I < J) (
  % if segment I consists of exactly one (non-filler) cell and
  % starts in the same cell as segment J, then I and J are the same
  % 1-letter word seen both across and down: skip the disequality
  if sum(K in 1..max_length) (
     bool2int(segments[I,K] > 0 ) ) = 1
     /\
     segments[I,1] = segments[J,1]
  then
     true 
  else 
    not(forall(K in 1..max_length) (
      L[segments[I,K]] = L[segments[J,K]]
    ))
  endif
)
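
To see why the special case is needed, assume (hypothetically) that cell 4 of some grid is a 1-letter word both across and down. Then segments contains the same row twice:

   % 4, 0, 0, 0, 0, 0,   % the 1-letter segment, across
   % 4, 0, 0, 0, 0, 0,   % the same cell as a down segment

Without the special case, the plain unicity constraint would require these two "words" to differ, i.e. L[4] != L[4], which is unsatisfiable.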

Structure of the MiniZinc files

The structure of the model is as follows (a sketch of an instance file is shown below the list):
  • the general model crossword3.mzn, which includes the wordlist and the unicity constraint
  • a wordlist, which is included in crossword3.mzn
  • a problem instance, e.g. crossword3_10.mzn, which contains the specific data and table constraints. This instance file includes crossword3.mzn
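
A sketch of how such an instance file is put together, here filled with the Bratko data from above (the actual generated instance files may differ in details):

% sketch of an instance file in the style of crossword3_N.mzn
include "crossword3.mzn";   % the general model: wordlist + unicity constraint

N = 16;                     % number of free letters
num_segments = 5;
max_length = 6;

% the problem-specific table constraints (across and down words)
constraint
   table([L[1],L[2],L[3],L[4],L[5]], words5) /\
   table([L[9],L[10],L[11],L[12],L[13],L[14]], words6) /\
   table([L[1],L[6],L[9],L[15]], words4) /\
   table([L[3],L[7],L[11]], words3) /\
   table([L[5],L[8],L[13],L[16]], words4);

% the same structure as the table constraints, with 0 as filler
segments = array2d(1..num_segments, 1..max_length,
   [1, 2, 3, 4, 5, 0,
    9,10,11,12,13,14,
    1, 6, 9,15, 0, 0,
    3, 7,11, 0, 0, 0,
    5, 8,13,16, 0, 0]);
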
The instances can be run as follows:
# Using G12/MiniZinc default solver
$ minizinc -v --no-optimize -s crossword3_0.mzn

# Using Gecode/fz
$ minizinc -v --no-optimize -G gecode -s crossword3_0.mzn -f "fz -mode stat -n 1 "

For larger problems the parameter --no-optimize might be a good choice; otherwise mzn2fzn (the converter to FlatZinc) can take a very long time.

More about the plain Gecode model

The (plain) Gecode model crossword.cpp is described in detail in Chapter 19 of Modeling and Programming with Gecode (PDF), which also includes some benchmark results.

Benchmarking

One of the objectives of the benchmark was to see how well the MiniZinc model (together with a FlatZinc solver) would compete with Gecode's crossword model (crossword.cpp, available in the Gecode distribution). This model will be referred to as the "plain Gecode model" to distinguish it from the Gecode/fz FlatZinc solver.

After doing some preliminary tests of several FlatZinc solvers using different labelings, I selected these two solvers to continue with the full benchmarks of all 72 instances:
  • Gecode/fz: SVN version (based on version 3.7.0) as of 2011-09-21.
  • Chuffed: Version compiled 2011-04-17 (no explicit version). Chuffed is a lazy clause solver written by Geoffrey Chu. As of writing it is not yet publicly available, but it has been mentioned several times in connection with the MiniZinc Challenge, where it has shown very impressive results; see the MiniZinc Challenge 2011 results. For more information see this description.
After systematically testing all labelings on some problem instances (#20, #30, and with some wordlists #40), some labelings were selected for the full tests. Note: I looked for a problem instance that could be used as a "proxy" for the complete problem set in order to pick the best labelings, but found no instance that was representative enough.

Benchmark

The benchmark consists of testing all 72 problem instances with the following wordlists:
  • Gecode's SCOWL wordlist (66400 English words), available from my MiniZinc Crossword page
  • a Swedish wordlist based on an earlier version of Den stora svenska ordlistan ("The Big Swedish Wordlist") (388493 Swedish words), available from my MiniZinc Crossword page. Note that crossword3.mzn must be changed to handle the national characters "å", "ä", and "ö" (this is commented out in the current model).
  • 73871 English words from /usr/share/dict/words (some words were filtered out, e.g. those not matching ^[A-Za-z]+$), available from my MiniZinc Crossword page
  • 80.txt (242606 English words)
Notes:
  • The time reported is for all 72 problem instances, with a timeout of 10 minutes (600 seconds) per problem instance.
  • The plain Gecode model was run with default parameters (except for the timeout of 10 minutes and statistics).
  • The solver stated as chuffed4 below is Chuffed using the single parameter --toggle-vsids (and the timeout).
  • The solver Gecode/fz was run with the parameters for timeout and statistics.
The reported runs are not all the tests that were made; only the top results are shown.

Which time to compare: Total time, runtime, or solvetime?

The time for running the plain Gecode executable was measured via the Unix time command. This includes everything: parsing of the problem (compiled in the model), setting up the constraints, and then solving the model.

However, the way MiniZinc works is different: the MiniZinc model (.mzn) is first parsed and then translated (flattened) into FlatZinc format (.fzn), which is the file used by the solvers. This flattening process can take some time: as the number of words in the wordlist increases, so does the flattening time, from a couple of seconds for the smaller wordlists to about 30 seconds for the largest (Swedish) wordlist. Also, it takes some seconds to generate the "fancy" output where the resulting grid is presented (see the output section in the MiniZinc model). This output is used to check that the solution is correct and doesn't include any duplicate words (this is done via a separate Perl program).

Example: The time for running problem #33 (a grid of size 21x21) using the plain Gecode crossword model with the SCOWL wordlist is 44.9 seconds. The total time for the Gecode/fz solver using size_afc_min/indomain_min on the same problem is 59.6 seconds. However, Gecode/fz reports two times in its output: runtime: 49.1s and solvetime: 47.6s. This means that - for this problem instance - there is an overhead of about 10 seconds for generating the FlatZinc file and then producing the nice output grid. In comparison, Chuffed using first_fail/indomain_max took 13.22s total time, with a reported 3.41s runtime and 1.56s solvetime.

In the benchmark below all three times for the FlatZinc solvers are reported: total time, runtime, and solvetime (summed over all 72 instances). One way of comparing the performance of the FlatZinc solvers with the plain Gecode model is to use the runtime, which excludes the overhead of flattening to FlatZinc and generating the output. On the other hand, comparing runtime values is not really fair to plain Gecode, since the flattening process might do some relevant optimization of the model. As we will see, some of the solvers in fact have a total time that is better than the plain Gecode model's.

The benchmarks below are grouped by wordlist, and the ordering is on runtime. We will see that there is no single FlatZinc solver + labeling that has the best runtime (or total time) for all wordlists.

Wordlist Gecode SCOWL

This is the wordlist used in the "plain" Gecode model containing 66400 English words.

The total time for the plain Gecode crossword model (with default parameters) on all 72 problem instances is 57:16.25 minutes. The timeouts/failures are for problems #15, #30, #39, #40, #45, and #49.

Problem #40 is not possible to solve with this wordlist, since the problem requires two words of length 23 and the wordlist contains no word of that length.

It is interesting that Chuffed's total time (36 minutes and 37 seconds) is significantly less than plain Gecode's total time (and the runtime is of course even faster). We also see that the overhead of flattening to FlatZinc format and then handling the output is about 7.5 minutes.
Solver    var select        val select       #solved  Total time             Runtime                Solvetime
chuffed4  first_fail        indomain_max     69       2197.65s (36m 37s)     1744.44s (29m 4s)      1681.32s (28m 1s)
chuffed4  most_constrained  indomain_max     69       2198.98s (36m 38s)     1746.56s (29m 6s)      1683.45s (28m 3s)
chuffed4  max_regret        indomain_max     67       3368.83s (56m 8s)      2917.64s (48m 37s)     2854.8s (47m 34s)
fz        size_afc_min      indomain_min     66       4386.15s (1h 13m 6s)   3931.794s (1h 5m 31s)  3883.713s (1h 4m 43s)
fz        size_afc_min      indomain_split   66       4564.53s (1h 16m 4s)   4114.111s (1h 8m 34s)  4066.457s (1h 7m 46s)

Wordlist 80.txt

The 80.txt wordlist contains 242606 English words.

Total time for plain Gecode: 21:48.28 minutes with 70 instances solved.

Here we see that the best FlatZinc solver solves all 72 problems in about the same total time as the plain Gecode model needs for 70. The reported runtime is just about 3 minutes, which is kind of weird. (The reported solvetime of 17 seconds is even weirder.)
Solver    var select        val select       #solved  Total time           Runtime            Solvetime
chuffed4  first_fail        indomain_min     72       1324.96s (22m 4s)    157.42s (2m 37s)   17.71s (17s)
fz        size_afc_min      indomain_max     72       1375.62s (22m 55s)   214.765s (3m 34s)  103.848s (1m 43s)
fz        size_afc_min      indomain_median  72       1370.48s (22m 50s)   214.772s (3m 34s)  103.495s (1m 43s)
chuffed4  max_regret        indomain_split   72       1410.68s (23m 30s)   266.42s (4m 26s)   126.43s (2m 6s)

/usr/share/dict/words

This wordlist contains 73871 English words from /usr/share/dict/words. Some words were filtered out from the original file: those not matching ^[A-Za-z]+$.

Total time for plain Gecode: 33:43.19 minutes, 69 instances solved.

This is another benchmark where Chuffed is the best FlatZinc solver. The total time (34 minutes and 1 second) is almost the same as plain Gecode's, while the runtime (25 minutes and 56 seconds) is significantly faster.
Solver    var select        val select      #solved  Total time          Runtime              Solvetime
chuffed4  first_fail        indomain_max    69       2041.68s (34m 1s)   1556.97s (25m 56s)   1484.82s (24m 44s)
chuffed4  most_constrained  indomain_max    69       2048.91s (34m 8s)   1563.16s (26m 3s)    1491.13s (24m 51s)
chuffed4  first_fail        indomain_split  69       2196.82s (36m 36s)  1712.9s (28m 32s)    1639.85s (27m 19s)
chuffed4  first_fail        indomain_min    68       2522.89s (42m 2s)   2045.56s (34m 5s)    1974.13s (32m 54s)
fz        size_afc_min      indomain_min    68       2563.63s (42m 43s)  2085.042s (34m 45s)  2029.719s (33m 49s)

Swedish wordlist

This wordlist contains 388493 Swedish words.

There is no plain Gecode time for this wordlist; instead this is just a comparison between the FlatZinc solvers. I wanted to include it in the benchmark for two reasons: to see how/if MiniZinc could handle this large wordlist, and also because I'm quite curious about Swedish solutions to the problems (much because I am a Swede).

The best solver this time is Gecode/fz (using size_afc_min/indomain_min) with a runtime of 3 minutes and a solvetime of 42 seconds. The total time is much larger though, almost 39 minutes, which means that there is a massive overhead of flattening to FlatZinc and presenting the output.
Solver    var select        val select      #solved  Total time          Runtime            Solvetime
fz        size_afc_min      indomain_min    72       2330.14s (38m 50s)  181.685s (3m 1s)   42.129s (42s)
fz        size_afc_min      indomain_split  72       2349.9s (39m 9s)    197.006s (3m 17s)  57.683s (57s)
fz        size_afc_min      indomain_max    72       2393.97s (39m 53s)  255.495s (4m 15s)  116.09s (1m 56s)
chuffed4  most_constrained  indomain_split  72       2415.61s (40m 15s)  258.14s (4m 18s)   89.89s (1m 29s)
chuffed4  first_fail        indomain_split  72       2413.29s (40m 13s)  258.3s (4m 18s)    89.9s (1m 29s)

Summary and conclusions

The objective of the benchmark was to see how well the MiniZinc model would compete with the plain Gecode model. For all wordlists there is at least one FlatZinc solver with a total time near plain Gecode's, and for one wordlist (SCOWL) there is a solver that is much faster. Comparing the reported runtimes, every wordlist with a plain Gecode measurement has at least one FlatZinc solver that is faster. For one wordlist (the Swedish words) there was no run of the plain Gecode model.

As mentioned above, there is no single FlatZinc solver/labeling that is best for all wordlists. Comparing just the FlatZinc solvers, we see that Chuffed (with some labeling) was the best for SCOWL, 80.txt, and /usr/share/dict/words, whereas Gecode/fz was the best for the Swedish wordlist.

In the preliminary tests the best variable selection for Gecode/fz was size_afc_min, though the best value selection is not that clear. For Chuffed there is no single variable/value selection combination that dominates, though both first_fail and most_constrained often gave quite good results.

As noted several times before in the CP field, this kind of benchmark might be misleading and of limited value. The comparison between the plain Gecode model and the MiniZinc model (+ FlatZinc solvers) may be even more problematic, since it compares at the same time:
  • two different CP systems: compiled C++ code vs. the MiniZinc ecosystem
  • two different approaches: element constraints on intersections vs. table constraints on words
  • different time measurements
Still, it is probably not unfair to conclude that the MiniZinc model and the two tested FlatZinc solvers at least gave the plain Gecode model a good match in generating/solving the selected problem instances.

Also, there might be some parameter or labeling that is much better than those tested. This includes - of course - parameters of the plain Gecode model.

Further research

It would be interesting to study how well the table approach would do as a plain Gecode model. It would also be interesting to write a MiniZinc model using an element/alldifferent approach.

System

Here is the system and version used in the benchmark:
  • Linux Ubuntu 11.04, 64-bit, 8 cores (i7 930, 2.80GHz) with 12GB RAM
  • Gecode: SVN version (based on version 3.7.0) as of 2011-09-21.
  • MiniZinc version: Version 1.4
  • Chuffed version: Version compiled 2011-04-17 (no explicit version).

Thanks

Thanks to Christian Schulte and Guido Tack for some explanations and intuitions. Also, thanks to Geoffrey Chu and Peter Stuckey for the opportunity to test Chuffed.