Almost all models, wordlists, and files mentioned in this blog post are available at my
MiniZinc Crossword page.
Abstract
The method presented here for constructing (solving, generating) crosswords is based on "just a bunch" of table constraints representing the words, together with a rather simple unicity constraint. After an introduction to the problem and a description of the general approach, a benchmark on 72 crossword problem instances is reported, comparing two FlatZinc solvers (Gecode/fz and Chuffed) with the crossword solver distributed with Gecode (written in C++), which uses element constraints for the intersections and an alldifferent constraint for the unicity requirement. We will find that this MiniZinc model and the FlatZinc solvers are competitive and in some cases even faster than the Gecode (C++) model.
Introduction
Some weeks ago I started to read Ivan Bratko's "Prolog Programming for Artificial Intelligence"
(the very new 4th edition, ISBN: 9780321417466). On page 27f there was a very simple crossword problem:
The problem is to fill this crossword with words:
L1 L2 L3 L4 L5 XXX
L6 XXX L7 XXX L8 XXX
L9 L10 L11 L12 L13 L14
L15 XXX XXX XXX L16 XXX
Where the L* are letters to be identified.
and also a list of words to use:
dog, run, top
five, four, lost, mess, unit
baker, forum, green, super
prolog, vanish, wonder, yellow
One common approach to solving/generating crosswords is to identify the intersections of the words and then use a wordlist to match these intersections (e.g. the very simple crossword.mzn). This is usually done with element constraints for the intersections and an alldifferent constraint to ensure unicity of the selected words; a minimal sketch of this encoding is shown below.
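To make this concrete, here is a minimal, self-contained sketch of the element/alldifferent encoding for two intersecting 3-letter slots. The slot layout is invented for illustration, and a hypothetical word "tin" is added to the 3-letter words above just so this toy instance has a solution:

include "globals.mzn";

% Tiny illustrative wordlist (letters encoded as 1..26 for "a".."z"):
% dog, run, top from the Bratko problem, plus the hypothetical "tin".
array[1..4, 1..3] of int: words3 =
  [|  4,15, 7     % dog
   | 18,21,14     % run
   | 20,15,16     % top
   | 20, 9,14     % tin (hypothetical)
   |];

var 1..4: A;  % index of the word chosen for the across slot
var 1..4: D;  % index of the word chosen for the down slot

% The two slots intersect in their first cell. Indexing the wordlist
% with a decision variable compiles to an element constraint.
constraint words3[A,1] = words3[D,1];

% unicity: the two slots must use different words
constraint all_different([A, D]);

solve satisfy;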
Instead of this approach I got the idea (not very revolutionary, but still) to try using only "a bunch of" table constraints representing the words (called "segments" in the model), which would then handle the intersections via the naming of the variables. This was implemented in the MiniZinc model crossword_bratko.mzn. The table constraints were manually created by just using the numbers of the "free" letters in the problem (L1, L2, etc.). The array of decision variables (L) has the domain 1..num_letters (1..26, for "a".."z"). The dummy variable L[0] was later added to handle the fill-outs (and is hard-coded to a single value).
Here is the kernel of the Bratko problem, which consists of the two row (across) words and the three column (down) words:
array[0..N] of var 1..num_letters: L;  % L[0] is the dummy variable for the fill-outs

constraint
   % across
   table([L[1],L[2],L[3],L[4],L[5]], words5) /\
   table([L[9],L[10],L[11],L[12],L[13],L[14]], words6) /\
   % down
   table([L[1],L[6],L[9],L[15]], words4) /\
   table([L[3],L[7],L[11]], words3) /\
   table([L[5],L[8],L[13],L[16]], words4);
The second argument of table is a matrix where all words of the same size are collected, e.g. words5 contains all words of length 5. In earlier crossword models I have tended to collect all words in a single - and often huge - matrix with a lot of "0" as fillers to make it a proper matrix.
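For illustration, here is what such a matrix could look like for the four 5-letter words of the Bratko problem (the exact declarations in the model may differ; the 1..26 letter encoding follows the domain of L):

int: num_words5 = 4;
array[1..num_words5, 1..5] of int: words5 =
  [|  2, 1,11, 5,18   % baker
   |  6,15,18,21,13   % forum
   |  7,18, 5, 5,14   % green
   | 19,21,16, 5,18   % super
   |];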
As mentioned above, the intersections are not explicitly identified in the model; instead they are just a consequence of the same letter identifier "happening" to be in an across word and a down word. E.g. L[3] represents the third letter of the first across word, and the first letter of the first down word. The index (3) is just a counter of the "free" letters. In this problem there are 16 free letters, represented by L[1]..L[16]. Also note that in this problem we don't have to care about 1-letter words. In the general model presented below, we will see that 1-letter words must be handled in a special way.
The constraint that all words should be distinct is implemented with the constraint shown below. The matrix segments holds the segments (i.e. the words) in the crossword grid, where each letter is identified by a unique integer (its index in L), and "0" (zero) is used as a filler so that all segments have the same length and can be represented as a matrix. It has two parts: the segments across (rows) and the segments down (columns), and it has the same structure as the table constraints.
segments = array2d(1..num_segments, 1..max_length,
[
% across
1, 2, 3, 4, 5, 0,
9,10,11,12,13,14,
% down
1, 6, 9,15, 0, 0,
3, 7,11, 0, 0, 0,
5, 8,13,16, 0, 0
]);
% the segments/words should all be distinct:
% two segments may not agree on every position
% (the dummy L[0] makes the "0"-padded positions comparable)
constraint
   forall(I,J in 1..num_segments where I < J) (
      not(forall(K in 1..max_length) (
         L[segments[I,K]] = L[segments[J,K]]
      ))
   );
(The structure of the table constraints and the segments matrix is much the same, and I have tried to represent this in a single matrix, but stumbled on the problem of how to represent an appropriate wordlist structure. This may very well be fixed in a later version.)
This model has a single solution (given the wordlist shown above):
L = [6, 15, 18, 21, 13, 9, 21, 5, 22, 1, 14, 9, 19, 8, 5, 19]
f o r u m *
i * u * e *
v a n i s h
e * * * s *
Generalization of the model and Gecode's 72 crossword problem
Since Bratko's problem was so small, I then wondered how well MiniZinc would do on a variety of larger crossword problems using the same basic approach, for example on the 72 crossword instances in Gecode's crossword.cpp (it is a large file, but most of it consists of the definitions of these 72 problem instances). This model uses element constraints, projecting words onto their individual letters. In contrast, the MiniZinc model uses table constraints encoding the entire words. One of the objectives of the benchmark is to compare these two approaches.
The 72 problem instances are of different sizes, ranging from small problems with grids of 5x5 cells to larger 23x23 grids. See the comments in the Gecode model for more details. The problems have been collected from different sources, e.g. Herald Tribune Crosswords, and from early studies of backtracking, e.g. M.L. Ginsberg's Dynamic Backtracking and M.L. Ginsberg et al.'s Search Lessons Learned from Crossword Puzzles. Some of them have also been used in later studies, e.g. the paper by Anbulagan and Adi Botea on phase transitions in crossword problems: Crossword Puzzles as a Constraint Problem (PDF).
The problems are represented as a grid in text form. For example, the Bratko problem mentioned above would be represented as the following grid, where "_" represents a letter to be identified and "*" is a blocked cell.
_ _ _ _ _ *
_ * _ * _ *
_ _ _ _ _ _
_ * * * _ *
A larger example is the 15x15 problem #10 (from Gecode's crossword.cpp), which is also identified as "15.01, 15 x 15".
_ _ _ _ * _ _ _ _ _ * _ _ _ _
_ _ _ _ * _ _ _ _ _ * _ _ _ _
_ _ _ _ _ _ _ _ _ _ * _ _ _ _
_ _ _ _ _ _ _ * * _ _ _ _ _ _
* * * _ _ _ * _ _ _ _ _ _ * *
_ _ _ _ _ * _ _ _ * _ _ _ _ _
_ _ _ * _ _ _ _ _ _ * _ _ _ _
_ _ _ * _ _ _ _ _ _ _ * _ _ _
_ _ _ _ * _ _ _ _ _ _ * _ _ _
_ _ _ _ _ * _ _ _ * _ _ _ _ _
* * _ _ _ _ _ _ * _ _ _ * * *
_ _ _ _ _ _ * * _ _ _ _ _ _ _
_ _ _ _ * _ _ _ _ _ _ _ _ _ _
_ _ _ _ * _ _ _ _ _ * _ _ _ _
_ _ _ _ * _ _ _ _ _ * _ _ _ _
However, since MiniZinc doesn't have the facility of parsing this kind of text representation, I wrote some Perl programs to convert a problem instance to a MiniZinc model. The MiniZinc model for problem #10 is crossword3_10.mzn, which contains the problem-specific table constraints and segments definitions, much in the same way as in the Bratko model.
Here is one solution (using Gecode's SCOWL wordlist, see below for more about the wordlists):
roar*breed*area
urge*lager*weal
laughingly*also
elegant**aerate
***and*gadget**
acmes*err*osier
bra*ornate*tore
aah*magneto*non
snap*madras*add
heron*gay*idles
**imaged*pea***
vesper**surmise
echo*ambassador
trim*beach*gear
suss*snaky*ears
Note: By construction, this approach requires that all segments (words) are distinct. This means that 1-letter segments must be handled with care, since such a segment is the same word whether it is read across or down. More specifically, the unicity constraint has a special check for 1-letter words:
forall(I,J in 1..num_segments where I < J) (
   % a 1-letter segment is the same word read across and down,
   % so it must not be compared against itself
   if sum(K in 1..max_length) (bool2int(segments[I,K] > 0)) = 1
      /\ segments[I,1] = segments[J,1]
   then
      true
   else
      not(forall(K in 1..max_length) (
         L[segments[I,K]] = L[segments[J,K]]
      ))
   endif
)
Structure of the MiniZinc files
The structure of the model is as follows (a sketch of how these pieces fit together is shown below):
- the general model crossword3.mzn, which includes the wordlist and the unicity constraint
- a wordlist, which is included in crossword3.mzn
- a problem instance, e.g. crossword3_10.mzn, which includes the specific data instance and the table constraints. This instance file includes crossword3.mzn
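For concreteness, here is a hypothetical sketch of how an instance file is laid out, using the Bratko data from above (the file name and exact layout are assumed; the generated instance files may differ in detail):

% crossword3_bratko.mzn: a problem instance (hypothetical file name)
include "crossword3.mzn";   % the general model, which in turn includes the wordlist

num_segments = 5;           % instance data
max_length = 6;

% the instance-specific table constraints
constraint
   table([L[1],L[2],L[3],L[4],L[5]], words5) /\
   table([L[9],L[10],L[11],L[12],L[13],L[14]], words6) /\
   table([L[1],L[6],L[9],L[15]], words4) /\
   table([L[3],L[7],L[11]], words3) /\
   table([L[5],L[8],L[13],L[16]], words4);

% the instance-specific segments
segments = array2d(1..num_segments, 1..max_length,
   [1, 2, 3, 4, 5, 0,
    9,10,11,12,13,14,
    1, 6, 9,15, 0, 0,
    3, 7,11, 0, 0, 0,
    5, 8,13,16, 0, 0]);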
The instances can be run as follows:
# Using G12/MiniZinc default solver
$ minizinc -v --no-optimize -s crossword3_0.mzn
# Using Gecode/fz
$ minizinc -v --no-optimize -G gecode -s crossword3_0.mzn -f "fz -mode stat -n 1 "
For larger problems, the parameter --no-optimize might be a good choice; otherwise mzn2fzn (the converter to FlatZinc) can take a very long time.
More about the plain Gecode model
The (plain) Gecode model crossword.cpp is described in detail in chapter 19 of Modeling and Programming with Gecode (PDF), which also includes some benchmark results.
Benchmarking
One of the objectives of the benchmark was to see how well the MiniZinc model (together with a FlatZinc solver) would compete with Gecode's crossword model (crossword.cpp, available in the Gecode distribution). This model will be called the "plain Gecode model" to distinguish it from the Gecode/fz FlatZinc solver.
After doing some preliminary tests of several FlatZinc solvers using different labelings, I selected these two solvers to continue with the full benchmarks of all 72 instances:
- Gecode/fz: SVN version (based on version 3.7.0) as of 2011-09-21.
- Chuffed: Version compiled 2011-04-17 (no explicit version). Chuffed is a lazy clause solver written by Geoffrey Chu. As of this writing it is not yet publicly available, but it has been mentioned several times in connection with the MiniZinc Challenge, where it has shown very impressive results; see the MiniZinc Challenge 2011 results. For more information, see this description.
After systematically testing all labelings on some problem instances (#20, #30, and with some wordlists #40), some labelings were selected for the full tests. Note: I looked for a problem instance that could be used as a "proxy" for the complete problem set in order to find the best labelings, but found no instance that was representative enough.
Benchmark
The benchmarks consists of testing all 72 problem instances with the following wordlists:
- Gecode's SCOWL wordlist (66400 English words), available from the MiniZinc Crossword page
- A Swedish wordlist based on an earlier version of Den stora svenska ordlistan ("The Big Swedish Wordlist") (388493 Swedish words), available from the MiniZinc Crossword page. Note that crossword3.mzn must be changed to handle the national characters "å", "ä", and "ö" (this is commented out in the current model).
- 73871 English words from /usr/share/dict/words (some words were filtered out, e.g. those not matching ^[A-Za-z]+$), available from the MiniZinc Crossword page
- 80.txt (242606 English words)
Notes:
- The time reported is for all 72 problem instances, with a timeout of 10 minutes (600 seconds) per problem instance.
- The plain Gecode model was run with default parameters (except for the timeout of 10 minutes and statistics)
- The solver denoted chuffed4 below is Chuffed with the single parameter --toggle-vsids (and the timeout)
- The solver Gecode/fz was run with the parameters for timeout and statistics.
The reported runs are not all the tests that were made; only the top results are shown.
Which time to compare: Total time, runtime, or solvetime?
The time for running the plain Gecode executable was measured via the Unix time command. This includes everything: parsing the problem (the instances are compiled into the model), setting up the constraints, and then solving the model.
However, the way MiniZinc works is different: the MiniZinc model (.mzn) is first parsed and then translated (flattened) into the FlatZinc format (.fzn), which is the file used by the solvers. This flattening process can take some time: as the number of words in the wordlist increases, so does the flattening time, from a couple of seconds for the smaller wordlists to about 30 seconds for the largest (the Swedish wordlist). It also takes some seconds to generate the "fancy" output where the resulting grid is presented (see the output section in the MiniZinc model). This output is used to check that the solution is correct and doesn't include any duplicate words (this is done via a separate Perl program).
Example: The time for running problem #33 (a grid of size 21x21) with the plain Gecode crossword model and the SCOWL wordlist is 44.9 seconds. The total time for the Gecode/fz solver using size_afc_min/indomain_min on the same problem is 59.6 seconds. However, Gecode/fz reports two times in its output: a runtime of 49.1s and a solvetime of 47.6s. This means that - for this problem instance - there is an overhead of about 10 seconds for generating the FlatZinc file and then presenting the nice output grid. In comparison, Chuffed using first_fail/indomain_max took 13.22s total time, with a reported runtime of 3.41s and a solvetime of 1.56s.
In the benchmark below, all three times are reported for the FlatZinc solvers: total time, runtime, and solvetime (summed over all 72 instances). One way of comparing the performance of the FlatZinc solvers with the plain Gecode model is to use the runtime, which excludes the overhead of flattening to FlatZinc and generating the output. On the other hand, comparing runtimes is not really fair to plain Gecode, since the flattening process might do some relevant optimization of the model. As we will see, some of the solvers in fact have a total time that is better than the plain Gecode model's.
The benchmarks below are grouped by wordlist, and the rows are ordered by runtime. We will see that there is no single FlatZinc solver + labeling that has the best runtime (or total time) for all wordlists.
Wordlist Gecode SCOWL
This is the wordlist used in the plain Gecode model, containing 66400 English words.
The total time for the plain Gecode crossword model (with default parameters) on all 72 problem instances is 57:16.25 minutes. The timeouts/failures are for problems #15, #30, #39, #40, #45, and #49. Problem #40 is not possible to solve with this wordlist, since the problem requires two words of length 23, but the wordlist contains no word of this length.
It is interesting that Chuffed's total time (36 minutes and 37 seconds) is significantly less than plain Gecode's total time (and the runtime is of course even faster). We also see that the overhead of flattening to FlatZinc and handling the output is about 7.5 minutes.
Solver | Var select / Val select | #Solved | Total time | Runtime | Solvetime |
chuffed4 | first_fail indomain_max | 69 | 2197.65s (36 minutes and 37 seconds) | 1744.44s (29 minutes and 4 seconds) | 1681.32s (28 minutes and 1 second) |
chuffed4 | most_constrained indomain_max | 69 | 2198.98s (36 minutes and 38 seconds) | 1746.56s (29 minutes and 6 seconds) | 1683.45s (28 minutes and 3 seconds) |
chuffed4 | max_regret indomain_max | 67 | 3368.83s (56 minutes and 8 seconds) | 2917.64s (48 minutes and 37 seconds) | 2854.8s (47 minutes and 34 seconds) |
fz | size_afc_min indomain_min | 66 | 4386.15s (1 hour, 13 minutes, and 6 seconds) | 3931.794s (1 hour, 5 minutes, and 31 seconds) | 3883.713s (1 hour, 4 minutes, and 43 seconds) |
fz | size_afc_min indomain_split | 66 | 4564.53s (1 hour, 16 minutes, and 4 seconds) | 4114.111s (1 hour, 8 minutes, and 34 seconds) | 4066.457s (1 hour, 7 minutes, and 46 seconds) |
Wordlist 80.txt
The 80.txt wordlist contains 242606 English words.
The total time for plain Gecode is 21:48.28 minutes, with 70 instances solved.
Here we see that the best FlatZinc solver solves all 72 problems in about the same total time as the plain Gecode model (which solved 70). The reported runtime is just about 3 minutes, which is kind of weird. (The reported solvetime of 17 seconds is even weirder.)
Solver | Var select / Val select | #Solved | Total time | Runtime | Solvetime |
chuffed4 | first_fail indomain_min | 72 | 1324.96s (22 minutes and 4 seconds) | 157.42s (2 minutes and 37 seconds) | 17.71s (17 seconds) |
fz | size_afc_min indomain_max | 72 | 1375.62s (22 minutes and 55 seconds) | 214.765s (3 minutes and 34 seconds) | 103.848s (1 minute and 43 seconds) |
fz | size_afc_min indomain_median | 72 | 1370.48s (22 minutes and 50 seconds) | 214.772s (3 minutes and 34 seconds) | 103.495s (1 minute and 43 seconds) |
chuffed4 | max_regret indomain_split | 72 | 1410.68s (23 minutes and 30 seconds) | 266.42s (4 minutes and 26 seconds) | 126.43s (2 minutes and 6 seconds) |
/usr/share/dict/words
This wordlist contains 73871 English words from /usr/share/dict/words. Some words were filtered out from the original file: those not matching ^[A-Za-z]+$.
The total time for plain Gecode is 33:43.19 minutes, with 69 instances solved.
This is another benchmark where Chuffed is the best FlatZinc solver. The total time (34 minutes and 1 second) is almost the same as plain Gecode's, while the runtime (25 minutes and 56 seconds) is significantly faster.
Solver | Var select / Val select | #Solved | Total time | Runtime | Solvetime |
chuffed4 | first_fail indomain_max | 69 | 2041.68s (34 minutes and 1 second) | 1556.97s (25 minutes and 56 seconds) | 1484.82s (24 minutes and 44 seconds) |
chuffed4 | most_constrained indomain_max | 69 | 2048.91s (34 minutes and 8 seconds) | 1563.16s (26 minutes and 3 seconds) | 1491.13s (24 minutes and 51 seconds) |
chuffed4 | first_fail indomain_split | 69 | 2196.82s (36 minutes and 36 seconds) | 1712.9s (28 minutes and 32 seconds) | 1639.85s (27 minutes and 19 seconds) |
chuffed4 | first_fail indomain_min | 68 | 2522.89s (42 minutes and 2 seconds) | 2045.56s (34 minutes and 5 seconds) | 1974.13s (32 minutes and 54 seconds) |
fz | size_afc_min indomain_min | 68 | 2563.63s (42 minutes and 43 seconds) | 2085.042s (34 minutes and 45 seconds) | 2029.719s (33 minutes and 49 seconds) |
Swedish wordlist
This wordlist contains 388493 Swedish words.
There is no plain Gecode run for this wordlist; instead, this is just a comparison of the FlatZinc solvers. I wanted to include it in the benchmark for two reasons: to see how (and if) MiniZinc could handle this large wordlist, and because I'm quite curious about Swedish solutions to the problems (much because I am a Swede).
The best solver this time is Gecode/fz (using size_afc_min/indomain_min), with a runtime of 3 minutes and a solvetime of 42 seconds. The total time is much larger, though: almost 39 minutes. This means that there is a massive overhead for flattening to FlatZinc and presenting the output.
Solver | Var select / Val select | #Solved | Total time | Runtime | Solvetime |
fz | size_afc_min indomain_min | 72 | 2330.14s (38 minutes and 50 seconds) | 181.685s (3 minutes and 1 second) | 42.129s (42 seconds) |
fz | size_afc_min indomain_split | 72 | 2349.9s (39 minutes and 9 seconds) | 197.006s (3 minutes and 17 seconds) | 57.683s (57 seconds) |
fz | size_afc_min indomain_max | 72 | 2393.97s (39 minutes and 53 seconds) | 255.495s (4 minutes and 15 seconds) | 116.09s (1 minute and 56 seconds) |
chuffed4 | most_constrained indomain_split | 72 | 2415.61s (40 minutes and 15 seconds) | 258.14s (4 minutes and 18 seconds) | 89.89s (1 minute and 29 seconds) |
chuffed4 | first_fail indomain_split | 72 | 2413.29s (40 minutes and 13 seconds) | 258.3s (4 minutes and 18 seconds) | 89.9s (1 minute and 29 seconds) |
Summary and conclusions
The objective of the benchmark was to see how well the MiniZinc model would compete with the plain Gecode model. For all wordlists there is at least one FlatZinc solver with a total time that is near plain Gecode's, and for one wordlist (SCOWL) there is a solver that is much faster. Comparing the reported runtimes, every measurable wordlist has a FlatZinc solver that is faster. For one wordlist (the Swedish words) there was no run of the plain Gecode model.
As mentioned above, there is no single FlatZinc solver/labeling that is the best for all wordlists. Comparing just the FlatZinc solvers, we see that Chuffed (with some labeling) was the best for SCOWL, 80.txt, and /usr/share/dict/words, whereas Gecode/fz was the best for the Swedish wordlist.
In the preliminary tests, the best variable selection for Gecode/fz was size_afc_min, though the best value selection is not that clear. For Chuffed there is no single variable/value selection combination that dominates, though both first_fail and most_constrained often gave quite good results.
As noted many times before in the CP field, these kinds of benchmarks can be misleading and of limited value. The comparison between the plain Gecode model and the MiniZinc model (+ FlatZinc solvers) might be even more problematic, since it compares at the same time:
- two different CP systems: compiled C++ code vs. the MiniZinc ecosystem
- two different approaches: element constraints on intersections vs. table constraints on words
- different time measurements
Still, it is probably not unfair to conclude that the MiniZinc model and the two tested FlatZinc solvers at least gave the plain Gecode model a good match in generating/solving the selected problem instances.
Also, there might be some parameter or labeling that is much better than those tested. This includes - of course - parameters of the plain Gecode model.
Further research
It would be interesting to study how well the table approach would do as a plain Gecode model. It would also be interesting to write a MiniZinc model using an element/alldifferent approach.
System
Here is the system and version used in the benchmark:
- Linux Ubuntu 11.04, 64-bit 8 cores (i7 930, 2.80GHz) with 12Gb RAM
- Gecode: SVN version (based on version 3.7.0) as of 2011-09-21.
- MiniZinc version: Version 1.4
- Chuffed version: Version compiled 2011-04-17 (no explicit version).
Thanks
Thanks to Christian Schulte and Guido Tack for some explanations and intuitions. Also, thanks to Geoffrey Chu and Peter Stuckey for the opportunity to test Chuffed.