Main

October 04, 2011

Crossword construction in MiniZinc using table constraints - a small benchmark on "72 Gecode problems"

Almost all models, wordlists, and files mentioned in this blog post are available at my MiniZinc Crossword page.

Abstract

The method presented here for constructing (solving, generating) crosswords is based on "just a bunch" of table constraints representing the words, together with a rather simple unicity constraint. After an introduction of the problem and a description of the general approach, a benchmark on 72 crossword problem instances is reported, comparing two FlatZinc solvers (Gecode/fz and Chuffed) with the crossword solver distributed with Gecode (written in C++), which uses element constraints for the intersections and an alldifferent constraint for the unicity requirement. We will find that this MiniZinc model and the FlatZinc solvers are competitive and in some cases even faster than the Gecode (C++) model.

Introduction

Some weeks ago I started to read Ivan Bratko's "Prolog Programming for Artificial Intelligence" (the very new 4th edition, ISBN: 9780321417466). On page 27f there was a very simple crossword problem:
The problem is to fill this crossword with words:
    
    L1   L2    L3   L4    L5   XXX
    L6   XXX   L7   XXX   L8   XXX
    L9   L10   L11  L12   L13  L14
    L15  XXX   XXX  XXX   L16  XXX

Where the L* are letters to be identified.
and also a couple of words to use:
dog, run, top
five, four, lost, mess, unit
baker, forum, green, super
prolog, vanish, wonder, yellow
One common approach to solving/generating crosswords is to identify the intersections of the words and then use a wordlist for matching these intersections (e.g. the very simple crossword.mzn). This is usually done with element constraints for the intersections and an alldifferent constraint to ensure unicity of the selected words.
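To make the contrast with the approach used here concrete, below is a minimal MiniZinc sketch of the element/alldifferent approach for a single intersection. The names (words, word_len, W1, W2) are assumptions for illustration only; the real crossword.mzn is more general, and the wordlist data would come from a separate data file:
include "globals.mzn";

int: num_words;
int: word_len;
% words[w,p] is the p'th letter (coded 1..26) of word w
array[1..num_words, 1..word_len] of int: words;

% the words selected for one across segment and one down segment
var 1..num_words: W1;
var 1..num_words: W2;

constraint
   % element constraint on the intersection: the 3rd letter of the
   % across word must equal the 1st letter of the down word
   words[W1, 3] = words[W2, 1]
   /\
   % unicity: the two selected words must be distinct
   alldifferent([W1, W2]);

solve satisfy;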

Instead of this approach I got the idea (not very revolutionary, but still) to try using only "a bunch of" table constraints representing the words (called "segments" in the model), which would then handle the intersections via the naming of the variables. This was implemented in the MiniZinc model crossword_bratko.mzn. The table constraints were manually created by just using the numbers of the "free" letters in the problem (L1, L2, etc). The array of decision variables (L) has the domain 1..num_letters (1..26, for "a".."z"). The dummy variable L[0] was later added to handle the fill-outs (and is hard-coded to a single value).

Here is the kernel of the Bratko problem, which consists of the two row (across) words and the three column (down) words:
array[0..N] of var 1..num_letters: L;
constraint
   % across
   table([L[1],L[2],L[3],L[4],L[5]], words5)
   /\
   table([L[9],L[10],L[11],L[12],L[13],L[14]], words6)


   /\ % down
   table([L[1],L[6],L[9],L[15]], words4)
   /\
   table([L[3],L[7],L[11]], words3)
   /\
   table([L[5],L[8],L[13],L[16]], words4);
The second argument of table is a matrix where all words of the same size are collected, e.g. words5 contains all words of length 5. In earlier crossword models I tended to collect all words in a single - and often huge - matrix with a lot of "0" as fillers to make it a proper matrix.
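For the small Bratko wordlist above, words5 would look something like this, with the letters coded as integers 1..26 (a sketch of what the generated data looks like):
int: num_words5 = 4;
array[1..num_words5, 1..5] of int: words5 =
   array2d(1..num_words5, 1..5,
   [
     2, 1,11, 5,18, % baker
     6,15,18,21,13, % forum
     7,18, 5, 5,14, % green
    19,21,16, 5,18  % super
   ]);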

As mentioned above, the intersections are not explicitly identified in the model; instead they are just a consequence of the fact that the same letter identifier "happens" to be in both an across word and a down word. E.g. L[3] represents the third letter of the first across word and the first letter of the second down word. The index (3) is just a counter of the "free" letters. In this problem there are 16 free letters, represented by L[1]..L[16].

Also note that in this problem we don't have to care about 1-letter words. In the general model presented below, we will see that 1-letter words must be handled in a special way.

The constraint that all words should be distinct is implemented with the constraint shown below. The matrix segments contains the segments (i.e. the words) in the crossword grid, where each letter is identified by a unique integer (its index in L), and "0" (zero) is used as a filler so that all segments have the same length and can be represented as a matrix. Note that the filler positions index the dummy variable L[0], which is fixed to a single value. The matrix has two parts, the across segments (rows) and the down segments (columns), and has the same structure as the table constraints.
segments = array2d(1..num_segments, 1..max_length, 
[
   % across
   1, 2, 3, 4, 5, 0,
   9,10,11,12,13,14,

   % down
   1, 6, 9,15, 0, 0,
   3, 7,11, 0, 0, 0,
   5, 8,13,16, 0, 0
]);

% the segments/words should be all distinct
constraint
   forall(I,J in 1..num_segments where I < J) (
      not(forall(K in 1..max_length) (
          L[segments[I,K]] = L[segments[J,K]]
      ))
   )
;
(The structure of the table constraints and the segments matrix is much the same, and I have tried to represent both in a single matrix, but stumbled on the problem of how to represent an appropriate wordlist structure. This may very well be fixed in a later version.)

This model has a single solution (given the wordlist shown above):
L = [6, 15, 18, 21, 13, 9, 21, 5, 22, 1, 14, 9, 19, 8, 5, 19]

f o r u m *
i * u * e *
v a n i s h
e * * * s *

Generalization of the model and Gecode's 72 crossword problem

Since Bratko's problem was so small, I then wondered how well MiniZinc would do on a variety of larger crossword problems using the same basic approach, for example on the 72 crossword instances in Gecode's crossword.cpp (it is a large file, but most of it consists of the definitions of these 72 problem instances). The Gecode model uses element constraints by projecting words onto their individual letters. In contrast, the MiniZinc model uses table constraints to encode entire words. One of the objectives of the benchmark is to compare these two approaches.

The 72 problem instances are of different sizes, ranging from small problems with 5x5 grids to larger 23x23 grids. See the comments in the Gecode model for more details. The problems have been collected from different sources, e.g. Herald Tribune Crosswords, and from early studies of backtracking, e.g. M. L. Ginsberg's Dynamic Backtracking and M. L. Ginsberg et al.'s Search Lessons Learned from Crossword Puzzles. Some of them have also been used in later studies, e.g. the paper by Anbulagan and Adi Botea on phase transitions in crossword problems: Crossword Puzzles as a Constraint Problem (PDF).

The problems are represented as a grid in text form. For example, the Bratko problem mentioned above would be represented as the following grid, where "_" represents a letter to be identified and "*" is a blocked cell.
_ _ _ _ _ *
_ * _ * _ *
_ _ _ _ _ _
_ * * * _ *
A larger example is the 15x15 problem #10 (from Gecode's crossword.cpp), which is also identified as "15.01, 15 x 15".
_ _ _ _ * _ _ _ _ _ * _ _ _ _
_ _ _ _ * _ _ _ _ _ * _ _ _ _
_ _ _ _ _ _ _ _ _ _ * _ _ _ _
_ _ _ _ _ _ _ * * _ _ _ _ _ _
* * * _ _ _ * _ _ _ _ _ _ * *
_ _ _ _ _ * _ _ _ * _ _ _ _ _
_ _ _ * _ _ _ _ _ _ * _ _ _ _
_ _ _ * _ _ _ _ _ _ _ * _ _ _
_ _ _ _ * _ _ _ _ _ _ * _ _ _
_ _ _ _ _ * _ _ _ * _ _ _ _ _
* * _ _ _ _ _ _ * _ _ _ * * *
_ _ _ _ _ _ * * _ _ _ _ _ _ _
_ _ _ _ * _ _ _ _ _ _ _ _ _ _
_ _ _ _ * _ _ _ _ _ * _ _ _ _
_ _ _ _ * _ _ _ _ _ * _ _ _ _
However, since MiniZinc doesn't have facilities for parsing this kind of text representation, I wrote some Perl programs to convert a problem instance to a MiniZinc model. The MiniZinc model for problem #10 is crossword3_10.mzn, which contains the problem specific table constraints and segments definitions, much in the same way as in the Bratko model.

Here is one solution (using Gecode's SCOWL wordlist, see below for more about the wordlists):
roar*breed*area
urge*lager*weal
laughingly*also
elegant**aerate
***and*gadget**
acmes*err*osier
bra*ornate*tore
aah*magneto*non
snap*madras*add
heron*gay*idles
**imaged*pea***
vesper**surmise
echo*ambassador
trim*beach*gear
suss*snaky*ears


Note: By construction, this approach requires that all segments (words) are distinct, including 1-letter segments. These must be handled with care since a 1-letter segment is the same whether it is viewed across or down. More specifically, the unicity constraint has a special check for 1-letter words:
forall(I,J in 1..num_segments where I < J) (
  if sum(K in 1..max_length) (
     bool2int(segments[I,K] > 0 ) ) = 1
     /\
     segments[I,1] = segments[J,1]
  then
     true 
  else 
    not(forall(K in 1..max_length) (
      L[segments[I,K]] = L[segments[J,K]]
    ))
  endif
)

Structure of the MiniZinc files

The structure of the model is as follows:
  • the general model, crossword3.mzn, which includes the wordlist and the unicity constraint
  • a wordlist, which is included in crossword3.mzn
  • a problem instance, e.g. crossword3_10.mzn, which includes the specific data instance and table constraints. The instance file includes crossword3.mzn.
The instances can be run as follows:
# Using G12/MiniZinc default solver
$ minizinc -v --no-optimize -s crossword3_0.mzn

# Using Gecode/fz
$ minizinc -v --no-optimize -G gecode -s crossword3_0.mzn -f "fz -mode stat -n 1 "

For larger problems the parameter --no-optimize might be a good choice; otherwise mzn2fzn (the converter to FlatZinc) can take a very long time.
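As a sketch, an instance file for the Bratko problem would look roughly as follows, assuming that crossword3.mzn declares N, num_segments, max_length, the array L, and the segments matrix (the generated instance files are of course much larger):
% crossword3_bratko.mzn (a sketch)
include "crossword3.mzn"; % the general model: wordlist + unicity constraint

N = 16;
num_segments = 5;
max_length = 6;

% the problem specific table constraints
constraint
   table([L[1],L[2],L[3],L[4],L[5]], words5) /\
   table([L[9],L[10],L[11],L[12],L[13],L[14]], words6) /\
   table([L[1],L[6],L[9],L[15]], words4) /\
   table([L[3],L[7],L[11]], words3) /\
   table([L[5],L[8],L[13],L[16]], words4);

segments = array2d(1..num_segments, 1..max_length,
   [1, 2, 3, 4, 5, 0,
    9,10,11,12,13,14,
    1, 6, 9,15, 0, 0,
    3, 7,11, 0, 0, 0,
    5, 8,13,16, 0, 0]);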

More about the plain Gecode model

The (plain) Gecode model crossword.cpp is described in detail in Chapter 19 of Modeling and Programming with Gecode (PDF), which also includes some benchmark results.

Benchmarking

One of the objectives of the benchmark was to see how well the MiniZinc model (together with a FlatZinc solver) would compete with Gecode's crossword model (crossword.cpp, available in the Gecode distribution). This model will be called the "plain Gecode model" to distinguish it from the Gecode/fz FlatZinc solver.

After doing some preliminary tests of several FlatZinc solvers using different labelings, I selected these two solvers to continue with the full benchmarks of all 72 instances:
  • Gecode/fz: SVN version (based on the version 3.7.0) per 2011-09-21.
  • Chuffed: Version compiled 2011-04-17 (no explicit version). Chuffed is a lazy clause generation solver written by Geoffrey Chu. As of this writing it is not yet publicly available, but it has been mentioned several times in connection with the MiniZinc Challenge, where it has shown very impressive results; see the MiniZinc Challenge 2010 and 2011 results. For more information see this description.
After systematically testing all labelings on some problem instances (#20, #30, and with some wordlists #40), some labelings were selected for the full tests. Note: I looked for a problem instance that could be used as a "proxy" for the complete problem set in order to get the best labelings, but found no instance that was representative enough.

Benchmark

The benchmarks consists of testing all 72 problem instances with the following wordlists:
  • Gecode's SCOWL wordlist (66400 English words), available from MiniZinc Crossword page
  • Swedish wordlist based on an earlier version of Den stora svenska ordlistan ("The Big Swedish Wordlist"), (388493 Swedish words), available from MiniZinc Crossword page. Note that crossword3.mzn must be changed to handle the national characters "å", "ä", and "ö" (this is commented out in the current model).
  • 73871 English words from /usr/share/dict/words (some words were filtered out, e.g. the ones not matching ^[A-Za-z]+$), available from the MiniZinc Crossword page
  • 80.txt (242606 English words)
Notes:
  • The time reported is for all 72 problem instances, with a timeout of 10 minutes (600 seconds) per problem instance.
  • The plain Gecode model was run with default parameters (except for the timeout of 10 minutes and statistics)
  • The solver denoted chuffed4 below is Chuffed using the single parameter --toggle-vsids (and the timeout)
  • The solver Gecode/fz was run with the parameters for timeout and statistics.
The reported runs are not all the tests that were made; instead, just the top results are shown.

Which time to compare: Total time, runtime, or solvetime?

The time for running the plain Gecode executable was measured via the Unix time command. This includes everything: parsing of the problem (compiled into the model), setting up the constraints, and then solving the model.

However, the way MiniZinc works is different: the MiniZinc model (.mzn) is first parsed and then translated (flattened) into the FlatZinc format (.fzn), which is the file used by the solvers. This flattening process can take some time: as the number of words in the wordlist increases, so does the time for flattening, from a couple of seconds for the smaller wordlists to about 30 seconds for the largest (Swedish) wordlist. Also, it takes some seconds to generate the "fancy" output where the resulting grid is presented (see the output section in the MiniZinc model). This output is used to check that the solution is correct and doesn't include any duplicate words (this is done via a separate Perl program).

Example: The time for running problem #33 (a grid of size 21x21) using the plain Gecode crossword model with the SCOWL wordlist is 44.9 seconds. The total time for the Gecode/fz solver using size_afc_min/indomain_min on the same problem is 59.6 seconds. However, Gecode/fz reports two times in its output: runtime: 49.1s and solvetime: 47.6s. This means that - for this problem instance - there is an overhead of about 10 seconds to generate the FlatZinc file and then print the nice output grid. In comparison, Chuffed using first_fail/indomain_max took 13.22s total time, with a reported 3.41s runtime and 1.56s solvetime.

In the benchmark below, all three times for the FlatZinc solvers are reported: total time, runtime, and solvetime (summed over all 72 instances). One way of comparing the performance of the FlatZinc solvers with the plain Gecode model is to use the runtime, which excludes the overhead of flattening to FlatZinc and generating the output. On the other hand, comparing runtime values is not really fair to plain Gecode, since the flattening process might do some relevant optimization of the model. As we will see, some of the solvers in fact have a total time that is better than the plain Gecode model's.

The benchmarks below are grouped by wordlist, and the ordering is on runtime. We will see that there is no single FlatZinc solver + labeling that has the best runtime (or total time) for all wordlists.

Wordlist Gecode SCOWL

This is the wordlist used in the "plain" Gecode model containing 66400 English words.

The total time for the plain Gecode crossword model (with default parameters) on all 72 problem instances is 57:16.25 minutes. The timeouts/failures are for problems #15, #30, #39, #40, #45, and #49.

Problem #40 is not possible to solve with this wordlist since the problem requires two words of length 23, and the wordlist contains no word of this length.

It is interesting that Chuffed's total time (36 minutes and 37 seconds) is significantly less than plain Gecode's total time (and the runtime is of course even faster). We also see that the overhead of flattening to FlatZinc format and then handling the output is about 7.5 minutes.
Solver    Var select        Val select       #solved  Total time           Runtime               Solvetime
chuffed4  first_fail        indomain_max     69       2197.65s (36m37s)    1744.44s (29m04s)     1681.32s (28m01s)
chuffed4  most_constrained  indomain_max     69       2198.98s (36m38s)    1746.56s (29m06s)     1683.45s (28m03s)
chuffed4  max_regret        indomain_max     67       3368.83s (56m08s)    2917.64s (48m37s)     2854.8s (47m34s)
fz        size_afc_min      indomain_min     66       4386.15s (1h13m06s)  3931.794s (1h05m31s)  3883.713s (1h04m43s)
fz        size_afc_min      indomain_split   66       4564.53s (1h16m04s)  4114.111s (1h08m34s)  4066.457s (1h07m46s)

Wordlist 80.txt

The 80.txt wordlist contains 242606 English words.

Total time for plain Gecode: 21:48.28 minutes with 70 instances solved.

Here we see that the FlatZinc solver with the best total time solves all 72 problems in about the same time as the plain Gecode model took for 70. The runtime reported is just about 3 minutes, which is kind of weird. (The reported solvetime of 17 seconds is even weirder.)
Solver    Var select        Val select       #solved  Total time           Runtime               Solvetime
chuffed4  first_fail        indomain_min     72       1324.96s (22m04s)    157.42s (2m37s)       17.71s (17s)
fz        size_afc_min      indomain_max     72       1375.62s (22m55s)    214.765s (3m34s)      103.848s (1m43s)
fz        size_afc_min      indomain_median  72       1370.48s (22m50s)    214.772s (3m34s)      103.495s (1m43s)
chuffed4  max_regret        indomain_split   72       1410.68s (23m30s)    266.42s (4m26s)       126.43s (2m06s)

/usr/share/dict/words

This wordlist contains 73871 English words from /usr/share/dict/words. Some words were filtered out from the original file: the ones not matching ^[A-Za-z]+$.

Total time for plain Gecode: 33:43.19 minutes, 69 instances solved.

This is another benchmark where Chuffed is the best FlatZinc solver. The total time (34 minutes and 1 second) is almost the same as plain Gecode's, while the runtime (25 minutes and 56 seconds) is significantly faster.
Solver    Var select        Val select       #solved  Total time           Runtime               Solvetime
chuffed4  first_fail        indomain_max     69       2041.68s (34m01s)    1556.97s (25m56s)     1484.82s (24m44s)
chuffed4  most_constrained  indomain_max     69       2048.91s (34m08s)    1563.16s (26m03s)     1491.13s (24m51s)
chuffed4  first_fail        indomain_split   69       2196.82s (36m36s)    1712.9s (28m32s)      1639.85s (27m19s)
chuffed4  first_fail        indomain_min     68       2522.89s (42m02s)    2045.56s (34m05s)     1974.13s (32m54s)
fz        size_afc_min      indomain_min     68       2563.63s (42m43s)    2085.042s (34m45s)    2029.719s (33m49s)

Swedish wordlist

This wordlist contains 388493 Swedish words.

There is no plain Gecode time for this wordlist; instead this is just a comparison of the FlatZinc solvers. I wanted to include it in the benchmark for two reasons: to see how/if MiniZinc could handle this large wordlist, and also because I'm quite curious about Swedish solutions to the problems (much because I am a Swede).

The best solver this time is Gecode/fz (using size_afc_min/indomain_min) with a runtime of 3 minutes and a solvetime of 42 seconds. The total time is much larger though: almost 39 minutes. This means that there is a massive overhead of flattening to FlatZinc and presenting the output.
Solver    Var select        Val select       #solved  Total time           Runtime               Solvetime
fz        size_afc_min      indomain_min     72       2330.14s (38m50s)    181.685s (3m01s)      42.129s (42s)
fz        size_afc_min      indomain_split   72       2349.9s (39m09s)     197.006s (3m17s)      57.683s (57s)
fz        size_afc_min      indomain_max     72       2393.97s (39m53s)    255.495s (4m15s)      116.09s (1m56s)
chuffed4  most_constrained  indomain_split   72       2415.61s (40m15s)    258.14s (4m18s)       89.89s (1m29s)
chuffed4  first_fail        indomain_split   72       2413.29s (40m13s)    258.3s (4m18s)        89.9s (1m29s)

Summary and conclusions

The objective of the benchmark was to see how well the MiniZinc model would compete with the plain Gecode model. For every wordlist there is at least one FlatZinc solver with a total time that is near plain Gecode's total time, and for one wordlist (SCOWL) there is a solver that is much faster. Comparing the reported runtimes, every measurable wordlist has a FlatZinc solver that is faster. For one wordlist (the Swedish one) there was no run of the plain Gecode model.

As mentioned above, there is no single FlatZinc solver/labeling that is best for all wordlists. Comparing just the FlatZinc solvers, we see that Chuffed (with some labeling) was the best for SCOWL, 80.txt, and /usr/share/dict/words, whereas Gecode/fz was the best for the Swedish wordlist.

In the preliminary tests the best variable selection for Gecode/fz was size_afc_min, though the best value selection is not that clear. For Chuffed there is no single variable/value selection combination that dominates, though both first_fail and most_constrained often gave quite good results.
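For reference, a labeling is selected in the MiniZinc model via a search annotation on the solve item. A sketch using the decision array L from the model, with the combination that worked best for Gecode/fz (size_afc_min is a Gecode-specific annotation):
% a sketch of how a labeling is stated in the MiniZinc model
solve :: int_search(L, size_afc_min, indomain_min, complete) satisfy;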

As noted several times before in the CP field, these kinds of benchmarks can be misleading and of limited value. The comparison between the plain Gecode model and the MiniZinc model (+ FlatZinc solvers) might be even worse since it compares, at the same time:
  • two different CP systems: compiled C++ code vs MiniZinc eco system
  • two different approaches: element constraint on intersections vs. table constraint on words.
  • different time measurements
Still, it is probably not unfair to conclude that the MiniZinc model and the two tested FlatZinc solvers at least gave the plain Gecode model a good match in generating/solving the selected problem instances.

Also, there might be some parameter or labeling that is much better than the ones tested here. This includes - of course - parameters of the plain Gecode model.

Further research

It would be interesting to study how well the table approach would do as a plain Gecode model. It would also be interesting to write a MiniZinc model using an element/alldifferent approach.

System

Here are the system and versions used in the benchmark:
  • Linux Ubuntu 11.04, 64-bit, 8 cores (i7 930, 2.80GHz), 12Gb RAM
  • Gecode: SVN version (based on the version 3.7.0) per 2011-09-21.
  • MiniZinc version: Version 1.4
  • Chuffed version: Version compiled 2011-04-17 (no explicit version).

Thanks

Thanks to Christian Schulte and Guido Tack for some explanations and intuitions. Also, thanks to Geoffrey Chu and Peter Stuckey for the opportunity to test Chuffed.

September 13, 2011

MiniZinc Challenge 2011 Results

The results of the MiniZinc Challenge 2011 have been published.

From the result page:
The entrants for this year (with their descriptions, when provided):

BProlog. A CLP(FD) solver.
Bumblebee. Translates to SAT, uses CryptoMiniSAT.
Gecode. A C++ FD solver.
JaCoP. A Java FD solver.
Fzn2smt. Translates to SMT, uses Yices.
SCIP. A CP/MIP solver.

In addition, the challenge organisers entered the following FlatZinc implementations:

Chuffed. A C++ FD solver using Lazy clause generation.
CPX. A C++ FD Solver using Lazy Clause Generation.
G12/FD. A Mercury FD solver (the G12 FlatZinc interpreter's default solver).
G12/LazyFD. A Mercury FD solver using Lazy Clause Generation.
G12/CBC. Translates to MIP, uses Cbc version 2.6.2.
G12/CPLEX. Translates to MIP, uses CPLEX version 12.1.
G12/Gurobi. Translates to MIP, uses Gurobi Optimizer version 4.5.

As per the challenge rules, these entries are not eligible for prizes, but do modify the scoring results. Furthermore, entries in the FD search category (BProlog, Gecode, JaCoP, Chuffed, CPX and G12/FD) were automatically included in the free search category, while entries in the free search category (Bumblebee, Fzn2smt, SCIP, CBC, CPLEX, Gurobi and promoted FD entries) were automatically included in the parallel search category.
The slides for the presentation of the results at CP2011 are here (PDF).

Summary of results

The results for the MiniZinc Challenge 2011 were

Fixed search:
* Gold medal: Gecode
* Silver medal: JaCoP
* Bronze medal: BProlog

Free search:
* Gold medal: Gecode
* Silver medal: fzn2smt
* Bronze medal: JaCoP

Parallel search:
* Gold medal: Gecode
* Silver medal: fzn2smt
* Bronze medal: JaCoP

Congratulations to all!

As in the MiniZinc Challenge 2010, the Chuffed solver did very well on the benchmarks (though it was not eligible for a prize). More details are in the presentation and on the result page (where the models and data instances are shown).

January 09, 2011

Rogo grid puzzle in Answer Set Programming (Clingo) and MiniZinc

Encoding/models:
ASP (Clingo): rogo.lp, rogo2.lp
MiniZinc: rogo.mzn, rogo2.mzn
(See below for some problem instances.)

Later update: I have now also implemented versions with symmetry breaking constraints in the two encodings: rogo2.lp and rogo2.mzn. See below for more info.

In Operations Research, Sudoko, Rogo, and Puzzles, Mike Trick presented the Rogo puzzle, a grid puzzle where the object is to find a loop of a given number of steps that collects as many points as possible. He writes:
The loop must be a real loop: it must return where it started and can’t cross itself or double back. Steps can be either horizontal or vertical: they cannot be diagonal. The loop cannot include any of the black squares. ... Rogo was created by two faculty members (Nicola Petty and Shane Dye)  at the University of Canterbury in Christchurch, New Zealand.  Not surprisingly, Nicola and Shane teach management science there:  the problem has a very strong operations research flavor.
From Creative Heuristics Ltd (the Rogo puzzle site): Rogo is an entirely new type of puzzle. The object is to collect the biggest score possible using a given number of steps in a loop around a grid. The best possible score for a puzzle is given with it, so you can easily check that you have solved the puzzle. Rogo puzzles can also include forbidden squares, which must be avoided in your loop..

Below I have assumed that the path must be exactly the given number of steps, and the programs are written with this assumption in mind (further reading of the problem descriptions supports this interpretation). However, my first approach - probably caused by sloppy reading - was that the optimal path could possibly be shorter. I spent several hours trying to implement a program supporting this, but abandoned it after re-reading the problem descriptions. This seems to be a harder problem; maybe it could be used as a variant of the original problem?

I was inspired to solve these puzzles in part because of Mike's last words: Creating a solver would make a nice undergraduate project (and I suspect there are at least a few master's theses and perhaps a doctoral dissertation on algorithmic aspects of creating and solving these). One other reason was to see how to do this with Answer Set Programming - here using Clingo (a Potassco tool) - and to compare it with a Constraint Programming system, MiniZinc.

Some more links about Rogo:
  • Instructions
  • Rogo blog
  • YouTube clip.
  • Nicola Petty, Shane Dye: Determining Degree Of Difficulty In Rogo, A TSP-based Paper Puzzle (PDF)
    From the Conclusions: The Rogo puzzle format has a number of aspects that can be controlled to potentially affect degree of difficulty of solving. As a pilot, this study showed that there are many aspects of puzzle-solving related to the nature of the puzzle that can be explored, and there appear to be some general effects, though there are still marked individual differences between people solving the puzzles. This research has the potential to provide interesting insights into both human behaviour, and the nature of puzzles.

Note: I didn't notice this paper until I was almost finished with this blog post (and have just glanced through it).

Example

Here is the Rogo example from Mike Trick's site (pictures borrowed from his site; click on them for larger versions).

Problem:

Rogo puzzle, problem.

One solution:

Rogo puzzle, solution

Note that there is not a unique solution to these puzzles. All three problem instances I tested have more than one solution. For example, the Mike Trick problem has 48 solutions including path symmetries. Since there are 12 steps there are (removing the path symmetry) 48 / 12 = 4 distinct paths. These different paths are shown below as MiniZinc solutions, where the first step has been fixed to (2,2), i.e. x[1]=2 and y[1]=2, and it has 3 points (points[1]):

points = array1d(1..12, [3, 0, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0]);
sum_points = 8;
x = array1d(1..12, [2, 3, 4, 4, 5, 5, 5, 4, 3, 2, 2, 2]);
y = array1d(1..12, [2, 2, 2, 3, 3, 4, 5, 5, 5, 5, 4, 3]);
----------
points = array1d(1..12, [3, 0, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0]);
sum_points = 8;
x = array1d(1..12, [2, 3, 3, 4, 5, 5, 5, 4, 3, 2, 2, 2]);
y = array1d(1..12, [2, 2, 3, 3, 3, 4, 5, 5, 5, 5, 4, 3]);
----------
points = array1d(1..12, [3, 0, 0, 1, 0, 0, 2, 0, 0, 2, 0, 0]);
sum_points = 8;
x = array1d(1..12, [2, 2, 2, 2, 3, 4, 5, 5, 5, 4, 3, 3]);
y = array1d(1..12, [2, 3, 4, 5, 5, 5, 5, 4, 3, 3, 3, 2]);
----------
points = array1d(1..12, [3, 0, 0, 1, 0, 0, 2, 0, 0, 2, 0, 0]);
sum_points = 8;
x = array1d(1..12, [2, 2, 2, 2, 3, 4, 5, 5, 5, 4, 4, 3]);
y = array1d(1..12, [2, 3, 4, 5, 5, 5, 5, 4, 3, 3, 2, 2]);
----------

Answer Set Programming, Clingo

Here is the full ASP (Clingo) encoding, without the data:

% domains
rows(1..r).
cols(1..c).

% max number of steps
steps(1..max_steps).

% define adjacency between cells
adj(R,C, R1,C1) :- rows(R;R1), cols(C;C1), |R-R1| + |C-C1|==1.

% the path: unique index
0 { path(I, Row, Col) : steps(I) } 1 :- rows(Row), cols(Col).
1 { path(I, Row, Col) : rows(Row) : cols(Col) } 1 :- steps(I).

% close the circuit: ensure that the first and last cells
% in the path are connected.
:- path(1, R1, C1), path(max_steps, R2, C2), not adj(R1,C1,R2,C2).

% remove bad paths
:- path(I-1,R1,C1), path(I,R2,C2), not adj(R1,C1, R2,C2).

% no black cells in the path
:- path(I, R,C), black(R,C).

% total points, needed since
% "Optimization:" don't show the proper value.
total(Total) :- Total = #sum[got_points(R,C,Value) = Value].

% list the cells in path with points
got_points(R,C, Value) :- point(R,C,Value), path(I, R, C).

% maximize the number of points
% #maximize [ path(I,R,C) : steps(I) : point(R,C,P) = P ].

% alternative: we can add an second objective to
% start with the cell with lowest indices
#maximize [ path(I,R,C) : steps(I) : point(R,C,P) = P@2 ].
#minimize [ path(1,R,C) = R*c+C@1].

#hide.
#show path(I, Row, Col).
#show total(Total).
#show got_points(R,C,Value).

Here is the encoding for Mike Trick's problem instance:
#const max_steps = 12.
#const r = 5.
#const c = 9.

% the point cells
point(1,1,2).
point(2,2,3).
point(2,5,1).
point(2,8,2).
point(3,9,3).
point(4,3,2).
point(5,5,2).
point(5,8,1).

% black cells (to avoid)
black(3,7).
black(4,4).
The solution of Mike Trick's problem (edited), using the following command line:
clingo --heuristic=Vmtf --stat rogo_data_mike_trick.lp rogo.lp
total: 8
got_points:
  5,5,2
  4,3,2
  2,5,1
  2,2,3

path:
  1,2,2
  2,2,3
  3,2,4
  4,2,5
  5,3,5
  6,4,5
  7,5,5
  8,5,4
  9,5,3
  10,4,3
  11,4,2
  12,3,2
Statistics for this solution:
Models      : 1     
  Enumerated: 6
  Optimum   : yes
Optimization: 184 20 
Time        : 0.960
  Prepare   : 0.060
  Prepro.   : 0.020
  Solving   : 0.880
Choices     : 19826
Conflicts   : 16539
Restarts    : 1

Atoms       : 912   
Bodies      : 22839 
Tight       : Yes

  Deleted   : 10406 
Update With the following symmetry breaking added, the problem is solved in 0.58 seconds.

% symmetry breaking: the cell with the lowest coordinates
% should be in the first step
:- path(1, R, C), steps(Step), Step > 1, path(Step, R2, C2),
R*c+C > R2*c+C2.


The statistics for this variant:

Time        : 0.580
  Prepare   : 0.080
  Prepro.   : 0.030
  Solving   : 0.470
Choices     : 8727
Conflicts   : 6914
Restarts    : 2
End of update

Some notes:
One nice feature in Clingo (and lparse) is that it is possible to have many optimization objectives. Here we first maximize the number of points (#maximize [ path(I,R,C) : steps(I) : point(R,C,P) = P@2 ]), and as a second objective (with lower priority @1) we minimize the start cell, so that we start with the cell with the lowest coordinates: #minimize [ path(1,R,C) = R*c+C@1 ]. Sometimes this is faster than the plain #maximize objective, sometimes not.

The size of "core" of the encoding is quite small. Here is the code with the comments and the helper predicates (for outputs) removed.

rows(1..r).
cols(1..c).
steps(1..max_steps).
adj(R,C, R1,C1) :- rows(R;R1), cols(C;C1), |R-R1| + |C-C1|==1.
0 { path(I, Row, Col) : steps(I) } 1 :- rows(Row), cols(Col).
1 { path(I, Row, Col) : rows(Row) : cols(Col) } 1 :- steps(I).
:- path(1, R1, C1), path(max_steps, R2, C2), not adj(R1,C1,R2,C2).
:- path(I-1,R1,C1), path(I,R2,C2), not adj(R1,C1, R2,C2).
:- path(I, R,C), black(R,C).
#maximize [ path(I,R,C) : steps(I) : point(R,C,P) = P ].

The corresponding "core" of the MiniZinc program (see below for the full code) is larger.

Constraint Programming, MiniZinc

Here is the full MiniZinc code (without data). Compared with the ASP approach, the decision variables are represented in another way: the path is represented by two arrays, x and y, and the points are collected in a separate array (points) so we can simply sum over it for the optimization.

include "globals.mzn";

int: W = 0; % white (empty) cells
int: B = -1; % black cells
int: max_val = max([problem[i,j] | i in 1..rows, j in 1..cols]);

% define the problem
int: rows;
int: cols;
int: max_steps; % max length of the loop
array[1..rows, 1..cols] of int: problem;

% the coordinates in the path
array[1..max_steps] of var 1..rows: x :: is_output;
array[1..max_steps] of var 1..cols: y :: is_output;

% the collected points
int: max_point = max([problem[i,j] | i in 1..rows, j in 1..cols]);
array[1..max_steps] of var 0..max_point : points :: is_output;

% objective: sum of points in the path
int: max_sum = sum([problem[i,j] | i in 1..rows, j in 1..cols where problem[i,j] > 0]);
var 0..max_sum: sum_points :: is_output;

% solve satisfy;
solve maximize sum_points;
% solve :: int_search(x ++ y, first_fail, indomain_min, complete) maximize sum_points;

% all coordinates must be unique
constraint forall(s in 1..max_steps, t in s+1..max_steps) (
x[s] != x[t] \/ y[s] != y[t]
);

% calculate the points (to maximize)
constraint forall(s in 1..max_steps) (
points[s] = problem[x[s], y[s]]
)
/\
sum_points = sum(points);

% ensure that there are no black cells
% in the path
constraint forall(s in 1..max_steps) (
problem[x[s],y[s]] != B
);

% get the path
constraint forall(s in 1..max_steps-1) (
abs(x[s] - x[s+1]) + abs(y[s] - y[s+1]) = 1
)
/\ % close the path around the corner
abs(x[max_steps] - x[1]) + abs(y[max_steps] - y[1]) = 1;


Except for the declarations of the arrays and decision variables, this code doesn't contain more logic than the ASP encoding. However, it is more verbose.
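As a side note, one way to make the model slightly more compact: the pairwise uniqueness constraint over the coordinates could be stated as a single alldifferent over an integer encoding of the visited cells, the same x*cols+y encoding that is used for the symmetry breaking below. A sketch (not benchmarked here):
% each visited cell, encoded as a single integer, must be distinct
constraint alldifferent([ x[s]*cols + y[s] | s in 1..max_steps ]);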

The solution for Mike Trick's problem, using LazyFD, takes 1.1 seconds, slightly slower than Clingo (see below for more comparisons of times):

points = array1d(1..12, [0, 0, 3, 0, 0, 2, 0, 0, 2, 0, 0, 1]);
sum_points = 8;
x = array1d(1..12, [2, 2, 2, 3, 3, 4, 5, 5, 5, 4, 3, 2]);
y = array1d(1..12, [4, 3, 2, 2, 3, 3, 3, 4, 5, 5, 5, 5]);


Update
After some thought I decided to try the same symmetry breaking that was used as an option in the ASP encoding. It is implemented in rogo2.mzn and uses the following extra constraint, which ensures that the cell with the lowest coordinates is in the first step:

% symmetry breaking: the cell with lowest coordinates
% should be in the first step
constraint forall(i in 2..max_steps) (
   x[1]*cols+y[1] < x[i]*cols+y[i]
);

With this model, LazyFD solves Mike Trick's problem in 0.627 seconds. Also see under "Comparison" below.

End of update

Comparison

I was curious how well the systems would do, so here is a "recreational" comparison. Please don't read too much into it:
  • it is just 3 problems.
  • there are probably better heuristics for both Clingo and Gecode/fz
The following 3 problem instances were used in the test. Unfortunately, I have not found any direct links for the two latter instances (see below for links to my encodings). Instead, a variant of the encoding used in MiniZinc is shown, where "_" ("W" in the MiniZinc code) is white/blank, "B" is a black cell to avoid, and a number represents the points of the cell.
  • Mike Trick's example. 5 rows, 9 columns, 12 steps; good: 6, best: 8. (This instance is also shown as MiniZinc data in the sketch after this list.)
    %1 2 3 4 5 6 7 8 9  
     2,_,_,_,_,_,_,_,_, % 1
     _,3,_,_,1,_,_,2,_, % 2
     _,_,_,_,_,_,B,_,2, % 3
     _,_,2,B,_,_,_,_,_, % 4
     _,_,_,_,2,_,_,1,_, % 5
    
  • The Paper Rogo puzzle from Creative Heuristics Ltd for 20110106. 9 rows, 7 columns, 16 steps; good: 28, best: 31 points.
     %1 2 3 4 5 6 7
      B,_,6,_,_,3,B, % 1
      2,_,3,_,_,6,_, % 2
      6,_,_,2,_,_,2, % 3
      _,3,_,_,B,B,B, % 4
      _,_,_,2,_,2,B, % 5
      _,_,_,3,_,_,_, % 6
      6,_,6,B,_,_,3, % 7
      3,_,_,_,_,_,6, % 8
      B,2,_,6,_,2,B, % 9
    
  • The Paper Rogo puzzle from Creative Heuristics Ltd for 20110107. 12 rows, 7 columns, 16 steps; good: 34 points, best: 36 points.
     %1 2 3 4 5 6 7
      4,7,_,_,_,_,3, % 1
      _,_,_,_,3,_,4, % 2
      _,_,4,_,7,_,_, % 3
      7,_,3,_,_,_,_, % 4
      B,B,B,_,3,_,_, % 5
      B,B,_,7,_,_,7, % 6
      B,B,_,_,_,4,B, % 7
      B,4,4,_,_,_,B, % 8
      B,_,_,_,_,3,B, % 9
      _,_,3,_,4,B,B, % 10
      3,_,_,_,_,B,B, % 11
      7,_,7,4,B,B,B  % 12
    
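For reference, the Mike Trick instance above corresponds to MiniZinc data along the following lines. This is a sketch that uses the W and B constants from the model and assumes the data is included as a .mzn file:
rows = 5;
cols = 9;
max_steps = 12;
problem = array2d(1..rows, 1..cols,
   [
    2,W,W,W,W,W,W,W,W, % 1
    W,3,W,W,1,W,W,2,W, % 2
    W,W,W,W,W,W,B,W,2, % 3
    W,W,2,B,W,W,W,W,W, % 4
    W,W,W,W,2,W,W,1,W  % 5
   ]);
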
For the ASP encoding I used clingo (a combined grounder and solver) with the parameter --heuristic=Vmtf after some minimal testing with different parameters. For MiniZinc, both the LazyFD and Gecode/fz solvers were used, and I settled on the plain solve maximize sum_points, which seems to be fast enough for this experiment. Note that LazyFD tends to behave best without any explicit search heuristic. (Also: there is always a better search heuristic than the one you settle with.)

The time reported is the total time, i.e. including the grounding / conversion to FlatZinc.

Update I have added the second version of the MiniZinc model, with the added symmetry breaking constraint, as a separate entry for each problem. End of update

Mike Trick problem

Clingo: 0.96 seconds, 19826 choices, 16539 conflicts, 1 restart.
Clingo with symmetry breaking: 0.58 seconds, 8727 choices, 6914 conflicts, 2 restarts.
LazyFD: 1.1 seconds (failures are not reported)
LazyFD with symmetry breaking: 0.6 seconds (failures are not reported)
Gecode/fz: 2.62 seconds, 92113 failures
Gecode/fz with symmetry breaking: 0.4 seconds, 9418 failures

20110106 problem

Clingo: 1:57.07 minutes, 1155290 choices, 1044814 conflicts, 1 restart
Clingo with symmetry breaking: 20.4 seconds, 157146 choices, 135178 conflicts, 3 restarts
LazyFD: 2:58 minutes
LazyFD with symmetry breaking: 19.9 seconds (failures are not reported)
Gecode/fz: 58.6 seconds, 1380512 failures
Gecode/fz with symmetry breaking: 7.8 seconds

20110107 problem

Clingo: 3:13.72 minutes, 1541808 choices, 1389396 conflicts, 1 restart
Clingo with symmetry breaking: 31.6 seconds, 178301 choices, 151439 conflicts, 1 restart
LazyFD: 2:55.18 minutes
LazyFD with symmetry breaking: 44.5 seconds (failures are not reported)
Gecode/fz: 1:54.50 minutes, 2577853 failures
Gecode/fz with symmetry breaking: 11.3 seconds

Here we see that Gecode/fz without symmetry breaking is the fastest on the two larger problems (but the slowest on the first), and Clingo and LazyFD each placed second on one of the harder problems. So, I'm not really sure we can draw any real conclusions from this small test.

Update With symmetry breaking added the result is more unanimous. All three solvers benefit much from it. Gecode/fz is the fastest on all three problems, and the other two solvers take one second place each. We also see how much symmetry breaking can do. End of update

Some comments/conclusions

Here are some general comments about the problem and the coding.

ASP

As mentioned above, my initial assumption about the problem was that the given number of steps was not always the best path length, and trying to encode this was not easy. After a long time sitting with this approach (say 6 hours of coding time?), I re-read the description and couldn't find any real support for this assumption, so I skipped it in favor of the "fixed length" approach.

To get to the final version it took several hours, say 9-10 hours of total coding (in smaller sessions over several days). This includes the time for coding the initial assumption/interpretation. It also includes the time spent trying to get the first approach to work with the incremental solver iclingo, which - I hoped - would first try the longest length and, if that gave no solution, try shorter lengths in decrements of 1; but I didn't get this to work.

As usual I realized that several of the predicates I first thought were needed could just be thrown away. An example is this snippet that ensures "positively" that there is a path between cell R1,C1 and cell R2,C2.

{ x(I-1, R1, C1) : adj(R1,C1, R2,C2) : rows(R1;R2) : cols(C1;C2) : not black(R1,C1) : not black(R2,C2) } max_steps :- x(I, R2, C2), steps(I).

Instead it could be replaced with:

:- x(I, R,C), black(R,C).
:- x(I-1,R1,C1), x(I,R2,C2), not adj(R1,C1, R2,C2).


Well, I hope that with time I'll recognize these things faster.

MiniZinc

It took about 30 minutes to code the MiniZinc model (after an unnecessarily long detour of debugging bad thinking and a bug in the data). The reason for this much shorter time is two-fold:
  • It's much easier to code something when the general strategy (approach) to the problem is known and has been coded in another system. All the (de)tours made when writing the ASP version made me much more comfortable with the given problem.
  • I'm much more comfortable coding in MiniZinc than in ASP, for two reasons: 1) I have programmed in MiniZinc much longer than in ASP. 2) I am also more used to the high-level CP approach of stating the requirements/constraints with traditional loops and arrays/matrices.
Programs and data files

Here are all the files used, both programs and data files, for the two systems:
ASP (Clingo)
MiniZinc

November 11, 2010

Google CP Solver: A much faster Nonogram solver using DefaultSearch

This is an extension of the comparison done in Comparison of some Nonogram solvers: Google CP Solver vs Gecode and two MiniZinc solvers, and includes a new and much faster variant using Google CP Solver.

The day after the above blog post was written, Laurent Perron and I discussed alternative ways of solving these Nonogram problems. Laurent suggested another solving strategy: DefaultSearch (solver.DefaultPhase), which gave excellent overall results. Then I did some more experimenting and tweaking, which gave the results shown in the table below.

The new Nonogram model

Model: nonogram_default_search.py

The changes compared to the previous version, nonogram_table2.py, were very small in the Python program (there were also some changes in the C++ code). Here are the changes:
parameters = pywrapcp.DefaultPhaseParameters()
parameters.heuristic_period = 200000
db = solver.DefaultPhase(board_label, parameters)  

DefaultSearch

To be honest, I am not exactly sure why DefaultSearch is so fast for the Nonogram problems. It is a variant of Impact-Based Search (see Philippe Refalo: Impact-Based Search Strategies for Constraint Programming, 2004) combined with periodic heuristics. The source code is here: constraint_solver/default_search.cc.

heuristic_period

The value of 200000 for parameters.heuristic_period was found by experimenting. For quite a long time I used the value 11000, and later found that even though increasing this value didn't make much difference for the simpler problems (same time and same number of failures), for the harder problems - such as Lion, Marley, and Nature - the number of failures decreased and the time improved a bit. Larger values, such as 2000000, decreased the number of failures somewhat for the Nature problem, but the time seemed to be slightly worse. So I settled on 200000.

This may or may not be a good value for general Nonogram problems. Doing this kind of testing on a limited - albeit mixed - problem set may have biased the search in some unfortunate way.

After deciding on the above heuristics, I finally tested (as a verification set) some other problem instances and found that they were still easily solved:
  • P200: 0.25s / 1637 failures
  • P199: 0.06s / 242 failures
  • Dragonfly: 0.03s / 0 failures
P200 was slightly worse in this version: the previous version solved it in 0.14s with 520 failures. The other two problems were better or about the same with the current version.

As an experiment, I changed the order of the variables to board_flat (instead of board_label); then Lion was solved in 2 seconds (compared to 4.45 seconds) and Marley in 11 seconds (compared to 7:18 minutes). On the down side, 9-Dom then took 9 seconds (compared to 3.5 seconds) and Nature timed out (compared to 20:25 minutes), and other problems took slightly longer. So this is probably not a good way to go.

I also experimented with restarts (which is reported to work well with Impact-Based Search), and found some initial promising results. I may come back to this later.

I should also note that the current model is quite different from my first model, nonogram_regular.py, presented in Google CP Solver: Regular constraint, Nonogram solver, nurse rostering, etc. Laurent has done a great job of optimizing all sorts of things, both in the Python code and in the C++ code. For example, he added the "regular like" constraint solver.TransitionConstraint, which obsoleted my decomposition of regular for this problem.

A comment on the results

None of the Nonogram solvers that Jan Wolter has tested in his excellent Survey of Paint-by-Number Puzzle Solvers has solved all the problem instances within the 30 minute time limit. The table below shows that the DefaultSearch variant solves all the problems - at least on my computer, and hopefully also on Wolter's benchmark computer if he wants to test this solver. All problems except Marley and Nature were solved in well under 10 seconds, including the infamous Lion problem (4.46s).

In the previous version of the Nonogram solver a couple of the problems timed out, marked + in the table. Some of these, such as Hot and Light, are now solved in under 2 seconds. All these "remarkable" improvements are found in the DefaultSearch column.

Test machine

The machine used for the test is the same as in the former test: Linux Ubuntu 10.4 LTS with 12Gb RAM, 8 cores (Intel i7/930, 2.80GHz). All programs ran on a single processor. Python version: 2.6. The Google CP Solver used was revision 276.

Comparison - table

Here is the full table with the new version (DefaultSearch) in the last column; the Gecode/fz and LazyFD columns are the two MiniZinc solvers. In each cell the number in parentheses is Wolter's time, followed by my own time and the number of failures where applicable (written as time/failures); LazyFD always returned 0 choice points, so failures are skipped for it. N/A means the instance was not tested on my machine; a + indicates a timeout (30 minutes).

The puzzle links go to Wolter's puzzle pages.
Puzzle            Size     Notes              Gecode                Gecode/fz              LazyFD           Google CP Solver  DefaultSearch
#1: Dancer*       5 x 10   Line               (0.01s) 0.01s/0       (0.01s) 0.08s/0        (0.04s) 0.1s     0.02s/0           0.02s/0
#6: Cat*          20 x 20  Line               (0.01s) 0.1s/0        (0.02s) 0.1s/0         (0.44s) 0.95s    0.12s/0           0.03s/0
#21: Skid*        14 x 25  Line, Blanks       (0.01s) 0.01s/0       (0.02s) 0.09s/13       (0.64s) 0.9s     0.03s/0           0.03s/0
#27: Bucks*       27 x 23  Blanks             (0.01s) 0.2s/2        (0.02s) 0.1s/3         (1.1s) 1.6s      0.05s/3           0.04s/6
#23: Edge*        10 x 11                     (0.01s) 0.01s/15      (0.01s) 0.09s/25       (0.08s) 0.16s    0.03s/25          0.03s/58
#2413: Smoke      20 x 20                     (0.01s) N/A           (0.02s) 0.11s/5        (0.60s) 1.18s    0.05s/8           0.04s/20
#16: Knot*        34 x 34  Line               (0.01s) 0.01s/0       (0.02s) 0.14s/0        (5.5s) 5.6s      0.12s/0           0.06s/0
#529: Swing*      45 x 45  Line               (0.02s) 0.02s/0       (0.04s) 0.24s/0        (13.2s) 22.3s    0.3s/0            0.11s/0
#65: Mum*         34 x 40                     (0.02s) 0.02s/17      (0.04s) 0.18s/20       (11.0s) 12.4s    0.15s/22          0.10s/46
#7604: DiCap      52 x 63                     (0.02s) N/A           (0.06s) 0.29s/0        (+) 1:48m        0.45s/0           0.18s/0
#1694: Tragic     45 x 50                     (0.14s) N/A           (3.6m) 2:14m/394841    (32.1s) 34.57s   2:33m/394841      0.35s/1862
#1611: Merka*     55 x 60  Blanks             (0.03s) N/A           (+) +                  (1.1m) 1:10m     0.47s/27          0.20s/96
#436: Petro*      40 x 35                     (1.4m[s?]) 0.05s/48   (1.6s) 1.15s/1738      (6.2s) 9.3s      1.19s/1738        0.245s/770
#4645: M&M        50 x 70  Blanks             (0.93s) N/A           (0.28s) 0.41s/89       (48.1s) 1:14m    0.48s/82          0.46s/376
#3541: Signed     60 x 50                     (0.57s) N/A           (0.56s) 0.61s/929      (1.0m) 1:02m     11.7s/6484        0.92s/3821
#803: Light*      50 x 45  Blanks             (+) N/A               (+) +                  (4.7s) 18.32s    +                 1.5s/4052
#6574: Forever*   25 x 25                     (4.7s) N/A            (2.3s) 1.5s/17143      (1.7s) 1.99s     1.6s/17143        0.26s/2421
#2040: Hot        55 x 60                     (+) N/A               (+) +                  (2.6m) 1:05m     +                 0.62s/2394
#6739: Karate     40 x 40  Multiple           (56.0s) N/A           (57.8s) 38.0s/215541   (13.6s) 22.52s   56.1s/215541      0.2s/1361
#8098: 9-Dom*     19 x 19                     (12.6m) N/A           (3.8s) 2.59s/45226     (1.8s) 3.11s     3.9s/45226        3.54s/27650
#2556: Flag       65 x 45  Multiple, Blanks   (3.4s) N/A            (+) +                  (4.2s) 16.18s    1.6s/14859        0.15s/442
#2712: Lion       47 x 47  Multiple           (+) N/A               (+) +                  (+) 8:54m        +                 4.34s/15429
#10088: Marley    52 x 63  Multiple           (+) N/A               (+) +                  (2.9m) +         +                 7:18m/309293
#9892: Nature     50 x 40  Multiple           (+) N/A               (+) +                  (2.4m) 25.29s    +                 20:25m/7743891

November 08, 2010

Comparison of some Nonogram solvers: Google CP Solver vs Gecode and two MiniZinc solvers

After writing Google CP Solver: Regular constraint, Nonogram solver, nurse rostering, etc yesterday, I thought it would be interesting to benchmark the new Nonogram solver written in Google CP Solver against some other solvers. The benchmark is of course from Jan Wolter's great Survey of Paint-by-Number Puzzle Solvers, though I compare only with Gecode and two MiniZinc solvers, Gecode/fz (Gecode's FlatZinc solver) and MiniZinc's LazyFD, since I know them best and can test them myself.

In the table below, I have kept Wolter's original values in parentheses, to get a feeling for the differences between our machines and to fill in for the instances I could not test with Gecode.

System notes:
  • The timeout is 30 minutes as in Wolter's test.
  • My machine is a Linux Ubuntu 10.4 LTS with 12Gb RAM, 8 cores (Intel i7/930, 2.80GHz), though all programs ran on a single processor.
  • Also, I was "working" (Spotify, web surfing, running other tests in parallel, etc.) on the machine as the tests ran.
  • Due to copyright issues, I cannot distribute the examples. They can be downloaded from Wolter's Sample Paint-by-Number Puzzles Download.
And here are some comments about the different solvers.

Google CP Solver

Google CP Solver revision 259, with revision 259 of the Python solver nonogram_table2.py, which includes a lot of nice improvements, thanks to Laurent Perron.

Gecode

Gecode and Gecode/fz are version 3.4.2 (latest svn revision: 11517).
Please note that I have only tested the problem instances in the existing files for Gecode. Hence a lot of instances were not tested on my machine (they are marked N/A).

Also, I think that Wolter's time for Petro should be 1.4s (not 1.4m).

MiniZinc

MiniZinc version: 64bit Linux, snapshot version per 2010-11-05.

Note: In contrast to Wolter's results, the times for Gecode/fz and LazyFD include generating the FlatZinc file, which may be considerable for large problem instances. Hence some of my results are larger for these solvers.

Some comments about Google CP Solver / Python solver

Wolter has done a great job analyzing the three other solvers (Gecode, Gecode/fz, and LazyFD), so I will just comment on the Google CP Solver results.

It looks like the Google CP Solver/Python solver is quite similar to the Gecode/fz solver, and in some cases (e.g. 9-Dom, Bucks, Tragic, Petro, etc.) it has exactly the same number of failures. There are some exceptions, though:
  • It solved Merka and DiCap quite fast; Gecode/fz timed out on both of these problems.
  • It also solved Flag fast, where Gecode/fz timed out. Here it is also faster than LazyFD and Gecode (note: Wolter's time).
  • It is slower than Gecode/fz on Karate and Signed.
  • It is slightly slower than Gecode/fz on Tragic, with the same number of failures.

Comparison - table

Here is the full table. In each cell the number in parentheses is Wolter's time, followed by my own time and the number of failures where applicable (written as time/failures); LazyFD always returned 0 choice points, so I skipped that. N/A means the instance was not tested on my machine; a + indicates a timeout (30 minutes).

The puzzle links go to Wolter's puzzle pages.
Puzzle            Size     Notes              Gecode                Gecode/fz              LazyFD           Google CP Solver
#1: Dancer*       5 x 10   Line               (0.01s) 0.01s/0       (0.01s) 0.08s/0        (0.04s) 0.1s     0.02s/0
#6: Cat*          20 x 20  Line               (0.01s) 0.1s/0        (0.02s) 0.1s/0         (0.44s) 0.95s    0.12s/0
#21: Skid*        14 x 25  Line, Blanks       (0.01s) 0.01s/0       (0.02s) 0.09s/13       (0.64s) 0.9s     0.03s/0
#27: Bucks*       27 x 23  Blanks             (0.01s) 0.2s/2        (0.02s) 0.1s/3         (1.1s) 1.6s      0.05s/3
#23: Edge*        10 x 11                     (0.01s) 0.01s/15      (0.01s) 0.09s/25       (0.08s) 0.16s    0.03s/25
#2413: Smoke      20 x 20                     (0.01s) N/A           (0.02s) 0.11s/5        (0.60s) 1.18s    0.05s/8
#16: Knot*        34 x 34  Line               (0.01s) 0.01s/0       (0.02s) 0.14s/0        (5.5s) 5.6s      0.12s/0
#529: Swing*      45 x 45  Line               (0.02s) 0.02s/0       (0.04s) 0.24s/0        (13.2s) 22.3s    0.3s/0
#65: Mum*         34 x 40                     (0.02s) 0.02s/17      (0.04s) 0.18s/20       (11.0s) 12.4s    0.15s/22
#7604: DiCap      52 x 63                     (0.02s) N/A           (0.06s) 0.29s/0        (+) 1:48m        0.45s/0
#1694: Tragic     45 x 50                     (0.14s) N/A           (3.6m) 2:14m/394841    (32.1s) 34.57s   2:33m/394841
#1611: Merka*     55 x 60  Blanks             (0.03s) N/A           (+) +                  (1.1m) 1:10m     0.47s/27
#436: Petro*      40 x 35                     (1.4m[s?]) 0.05s/48   (1.6s) 1.15s/1738      (6.2s) 9.3s      1.19s/1738
#4645: M&M        50 x 70  Blanks             (0.93s) N/A           (0.28s) 0.41s/89       (48.1s) 1:14m    0.48s/82
#3541: Signed     60 x 50                     (0.57s) N/A           (0.56s) 0.61s/929      (1.0m) 1:02m     11.7s/6484
#803: Light*      50 x 45  Blanks             (+) N/A               (+) +                  (4.7s) 18.32s    +
#6574: Forever*   25 x 25                     (4.7s) N/A            (2.3s) 1.5s/17143      (1.7s) 1.99s     1.6s/17143
#2040: Hot        55 x 60                     (+) N/A               (+) +                  (2.6m) 1:05m     +
#6739: Karate     40 x 40  Multiple           (56.0s) N/A           (57.8s) 38.0s/215541   (13.6s) 22.52s   56.1s/215541
#8098: 9-Dom*     19 x 19                     (12.6m) N/A           (3.8s) 2.59s/45226     (1.8s) 3.11s     3.9s/45226
#2556: Flag       65 x 45  Multiple, Blanks   (3.4s) N/A            (+) +                  (4.2s) 16.18s    1.6s/14859
#2712: Lion       47 x 47  Multiple           (+) N/A               (+) +                  (+) 8:54m        +
#10088: Marley    52 x 63  Multiple           (+) N/A               (+) +                  (2.9m) +         +
#9892: Nature     50 x 40  Multiple           (+) N/A               (+) +                  (2.4m) 25.29s    +
+

November 08, 2009

Update on Nonogram: Jan Wolter's Survey and my own new benchmark

Survey of Paint-by-Number Puzzle Solvers

In Some new models, etc I mentioned the great Survey of Paint-by-Number Puzzle Solvers, created by Jan Wolter (also author of the Nonogram solver pbnsolve).

In this survey he included both Gecode's Nonogram solver written by Mikael Lagerkvist as well as my own Nonogram model (with Gecode/FlatZinc).

Since the last update on the former blog post the following has happened:
  • both our solvers now have the assessment "An amazingly good solver, especially for a simple demo program", and are placed 4, 5, and 6 of the 10 tested systems
  • my Gecode/FlatZinc model has been tested for "Full results"; it came 4th out of 5.
  • my Nonogram model with the Lazy FD solver is now included in the "Sample result", in 6th place
It seems that Wolter appreciates constraint programming as a general tool for solving these kinds of combinatorial problems, much for its ease of experimentation, e.g. with labeling strategies and (for the MiniZinc models) changing solvers:

From the analysis of Lagerkvist's Gecode model:
This is especially impressive because the faster solvers are large, complex programs custom designed to solve paint-by-number puzzles. This one is a general purpose solver with just a couple hundred custom lines of code to generate the constraints, run the solver, and print the result. Considering that this is a simple application of a general purpose solving tool rather than hugely complex and specialized special purpose solving tool, this is an astonishingly good result.

Getting really first class search performance usually requires a lot of experimentation with different search strategies. This is awkward and slow to do if you have to implement each new strategies from scratch. I suspect that a tool like Gecode lets you try out lots of different strategies with relatively little coding to implement each one. That probably contributes a lot to getting to better solvers faster.
From the analysis of my MiniZinc model:
If you tried turning this into a fully useful tool rather than a technology demo, with input file parsers and such, it would get a lot bigger, but clearly the constraint programming approach has big advantages, achieving good search results with low development cost.

...

These two results [Gecode/FlatZinc and LazyFD] highlight the advantage of a solver-independent modeling language like MiniZinc. You can describe your problem once, and then try out a wide variety of different solvers and heuristics without having to code every single one from scratch. You can benefit from the work of the best and the brightest solver designers. It's hard to imagine that this isn't where the future lies for the development of solvers for applications like this.
And later in the Conclusions:
The two constraint-based systems by Lagerkvist and Kjellerstrand come quite close in performance to the dedicated solvers, although both are more in the category of demonstrations of constraint programming than fully developed solving applications. The power of the underlaying search libraries and the ease of experimentation with alternative search heuristics obviously serves them well. I think it very likely that approaches based on these kinds of methods will ultimately prove the most effective.
I think this is an important lesson: before starting to write very specialized tools, first try a general tool like a constraint programming system and see how well it performs.

The Lazy FD solver and the Lion problem

Most of the problems in the Sample Results were solved by some solver within the time limit of 30 minutes. However, one problem stands out as extra hard: the Lion problem. When I tested MiniZinc's Lazy FD solver on my machine I was very excited that it took only about 2 minutes, and mentioned this to Wolter. He also tested this, but on his 64-bit machine it took 80 minutes to solve (and since that is above the time limit it is not in the result table). This is how he describes the Lazy FD solver:
But the remarkable thing is that [the Lazy FD solver] solves almost everything. Actually, it even solved the Lion puzzle that no other solver was able to solve, though, on my computer, it took 80 minutes to do it. Now, I didn't allow a lot of other solvers 80 minutes to run, but that's still pretty impressive. (Note that Kjellerstrand got much faster solving times for the Lion than I did. Lagerkvist also reported that his solver could solve it, but I wasn't able to reproduce that result even after 30 CPU hours. I don't know why.)
After some discussion, we came to the conclusion that the difference was probably due to the fact that I use a 32-bit machine (and the 32-bit version of MiniZinc) with 2 GB of memory, while Wolter uses a 64-bit machine with 1 GB of memory.

One should also note that all the other solvers were compiled without optimization, including Gecode/FlatZinc; however, LazyFD does not come with source, so it runs optimized. This may give an unfair advantage to the LazyFD solver.

My own benchmark of the Sample Results

The times in the Sample Results are, as mentioned above, for solvers compiled without optimization. I have now run the same problems on my machine (Linux Mandriva, Intel Dual 3.40 GHz, with 2 GB memory), but with the solvers compiled with standard optimization. All problems were run with a time limit of 10 minutes (compared to Wolter's 30 minutes), searching for 2 solutions, which checks for unique solutions. The last three problems (Karate, Flag, Lion) have multiple solutions, and it is counted as a failure if two solutions were not found within the time limit. I should also note that during the benchmark I was using the machine for other things, such as surfing the web.

The problems
I downloaded the problems from Wolter's Webpbn: Puzzle Export. For copyright reasons I cannot republish these models, but it is easy to download each problem. Select ".DZN" for the MiniZinc files, and "A compiled in C++ format" for Gecode. There is no support for Comet's format, but it is quite easy to convert a .dzn file to Comet's format.
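For readers who haven't seen the .dzn format: an instance is essentially just the grid dimensions and the row/column hints. Below is a made-up miniature instance (a hypothetical 3 x 3 puzzle, not one of Wolter's). The parameters rows, row_rule_len, cols and col_rule_len appear in the labeling code further down; the names row_rules/col_rules and the zero-padding convention are assumptions about my models' data format and should be checked against the actual .mzn files:
    % hypothetical 3 x 3 instance in .dzn format (not from webpbn)
    rows = 3;
    row_rule_len = 2;     % the longest row hint has 2 blocks
    row_rules = array2d(1..rows, 1..row_rule_len,
       [1, 1,     % row 1: two blocks of length 1
        0, 3,     % row 2: one block of length 3 (left-padded with 0)
        0, 1]);   % row 3: one block of length 1
    
    cols = 3;
    col_rule_len = 1;     % the longest column hint has 1 block
    col_rules = array2d(1..cols, 1..col_rule_len, [2, 2, 2]);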

The solvers + labeling strategies
Here is a description of each solver and its labeling strategy:
  • fz, "normal" (column_row)
    MiniZinc model with Gecode/FlatZinc. The usual labeling in nonogram_create_automaton2.mzn, i.e. where the columns are labeled before rows:
    solve :: int_search(
          [x[i,j] | j in 1..cols, i in 1..rows], 
          first_fail, 
          indomain_min, 
          complete) 
    satisfy;
    
  • fz, "row_column"
MiniZinc model with Gecode/FlatZinc. Here the order of labeling is reversed: rows are labeled before columns. The model is nonogram_create_automaton2_row_column.mzn:
    solve :: int_search(
          [x[i,j] | i in 1..rows, j in 1..cols], 
          first_fail, 
          indomain_min, 
          complete) 
    satisfy;
    
  • fz, "mixed"
    MiniZinc model with Gecode/FlatZinc: nonogram_create_automaton2_mixed.mzn.
I have long been satisfied with the "normal" labeling in the MiniZinc model because P200 (the hardest problem I had tested until then) was solved so fast. However, the labeling used in the Comet Nonogram model described in Comet: Nonogram improved: solving problem P200 from 1:30 minutes to about 1 second, and which is also used in the Gecode model, is somewhat more complicated, since it bases the exact labeling on a comparison of the hints for the rows and the columns. (A consolidated sketch of this approach is shown after this list.)

    I decided to try this labeling in MiniZinc as well. However, labeling in MiniZinc is not as flexible as in Comet and Gecode. Instead, we have to add a dedicated array for the labeling (called labeling):
    array[1..rows*cols] of var 1..2: labeling;
    
    and then copy the elements of the grid into that array, based on the relation between the row and column hints:
    constraint
          % prepare for the labeling
          if rows*row_rule_len < cols*col_rule_len then
               % label on rows first
               labeling = [x[i,j] | i in 1..rows, j in 1..cols]
          else 
               % label on columns first
               labeling = [x[i,j] | j in 1..cols, i in 1..rows]
          endif
          /\
          % .... 
    
    and finally, the search is based on just this labeling array:
    solve :: int_search(
            labeling,
            first_fail, 
            indomain_min, 
            complete)
    satisfy;
    
  • jacop, "normal"
    MiniZinc model with JaCoP/FlatZinc, using the same model as fz "normal".
  • lazy, satisfy
    Model: nonogram_create_automaton2_mixed.mzn. This uses the MiniZinc LazyFD solver with the search strategy:
    solve satisfy;
    
    This labeling is recommended by the authors of LazyFD. See MiniZinc: the lazy clause generation solver for more information about this solver.

    Note: The solver in MiniZinc's latest official version (1.0.3) doesn't support set vars. Instead I (and also Jan Wolter) used the latest "Release Of The Day" version (as of 2009-11-02).
  • Comet, normal
    Model: nonogram_regular.co. This is the Comet model I described in Comet: Nonogram improved: solving problem P200 from 1:30 minutes to about 1 second. No changes have been made.
  • Gecode, normal
    This is the Nonogram model distributed with Gecode version 3.2.1. The labeling is much like the one used in the Comet model, as well as in fz "mixed". (In fact, the labeling in the Gecode model was inspired by the labeling in the Comet model.)
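Here is the consolidated sketch of the "mixed" labeling promised above. It is a minimal skeleton, not the full model: the actual nonogram constraints over the rows and columns are elided, and the declarations assume the parameter names used in the bullet list above:
    % skeleton of the "mixed" labeling, assuming the names from
    % nonogram_create_automaton2_mixed.mzn; the nonogram constraints
    % themselves are omitted
    int: rows;  int: cols;
    int: row_rule_len;  int: col_rule_len;
    array[1..rows, 1..cols] of var 1..2: x;   % 1 = blank, 2 = filled
    
    % dedicated labeling array
    array[1..rows*cols] of var 1..2: labeling;
    
    constraint
       % channel the grid into the labeling array, choosing the
       % (heuristically) cheaper dimension to label first
       if rows*row_rule_len < cols*col_rule_len then
          labeling = [x[i,j] | i in 1..rows, j in 1..cols]  % rows first
       else
          labeling = [x[i,j] | j in 1..cols, i in 1..rows]  % columns first
       endif;
    
    solve :: int_search(
          labeling,
          first_fail,
          indomain_min,
          complete)
    satisfy;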


Here are the results. For each model (+ labeling strategy) two values are presented:
  • time (in seconds)
  • number of failures, if applicable (the LazyFD solver always reports 0 here)
The result
Problem | fz normal | fz row_column | fz mixed | jacop normal | lazy satisfy | Comet normal | Gecode normal
Dancer (#1) | 0.48s (0) | 0.31s (0) | 1.00s (0) | 3.64s (0) | 0.91s (0) | 0.691s (0) | 0.199s (0)
Cat (#6) | 0.24s (0) | 0.24s (0) | 0.25s (0) | 1.20s (0) | 1.13s (0) | 0.6s (0) | 0.025s (0)
Skid (#21) | 0.24s (13) | 0.23s (3) | 0.28s (13) | 0.78s (13) | 1.37s (0) | 0.586s (0) | 0.217s (0)
Bucks (#27) | 0.32s (3) | 0.32s (9) | 0.37s (3) | 0.96s (3) | 2.37s (0) | 1.366s (5) | 0.026s (2)
Edge (#23) | 0.16s (25) | 0.17s (25) | 0.18s (25) | 0.59s (25) | 0.31s (0) | 0.521s (43) | 0.175s (25)
Smoke (#2413) | 0.27s (5) | 0.27s (8) | 0.28s (8) | 0.83s (5) | 1.44s (0) | 0.616s (14) | 0.275s (5)
Knot (#16) | 0.42s (0) | 0.43s (0) | 0.48s (0) | 1.19s (0) | 8.15s (0) | 1.307s (0) | 0.329s (0)
Swing (#529) | 0.95s (0) | 0.94s (0) | 0.96s (0) | 2.19s (0) | 21.94s (0) | 1.782s (0) | 0.431s (0)
Mum (#65) | 0.64s (20) | 0.64s (22) | 0.66s (22) | 1.68s (20) | 16.34s (0) | 1.268s (39) | 0.491s (22)
Tragic (#1694) | 340.32s (394841) | 1.02s (255) | 436.92s (394841) | -- (198329) | 45.97s (0) | 477.39s (702525) | 1.139s (255)
Merka (#1611) | -- (361587) | 1.44s (79) | -- (294260) | -- (136351) | 80.92s (0) | 1.654s (46) | 0.645s (13)
Petro (#436) | 2.97s (1738) | 143.09s (106919) | 3.42s (1738) | 7.27s (1738) | 9.86s (0) | 3.103s (3183) | 151.09s (106919)
M_and_m (#4645) | 1.41s (89) | 601.27s (122090) | 1.59s (89) | 3.43s (89) | 66.98s (0) | 2.215s (162) | 2.797s (428)
Signed (#3541) | 1.87s (929) | 23.12s (6484) | 28.23s (6484) | 5.75s (929) | 73.02s (0) | 20.369s (12231) | 1.648s (929)
Light (#803) | 600.47s-- (400660) | -- (621547) | -- (485056) | 601.53s-- (171305) | 8.64s (0) | -- (0) | -- (538711)
Forever (#6574) | 4.14s (17143) | 7.86s (30900) | 6.22s (17143) | 12.21s (17143) | 3.27s (0) | 7.5s (27199) | 8.077s (30900)
Hot (#2040) | -- (303306) | -- (330461) | -- (248307) | -- (119817) | 165.72s (0) | -- (0) | -- (312532)
Karate (#6739) | 95.78s (215541) | 67.27s (130934) | 133.43s (215541) | 373.02s (215541) | 19.32s (0) | 120.02s (272706) | 80.56s (170355)
Flag (#2556) | -- (1686545) | 5.69s (14915) | 7.93s (14915) | -- (243222) | 9.29s (0) | 7.28s (24678) | 3.998s (16531)
Lion (#2712) | -- (542373) | -- (1124697) | -- (420216) | -- (187215) | 115.56s (0) | -- (0) | -- (869513)
("--" means the problem was not solved within the 10 minute time limit.)

Some conclusions, or rather notes

Here are some conclusions (or notes) about the benchmark.
  • The same solver, Gecode/FlatZinc, is here compared with three different labelings. No single labeling is better than the others. I initially had some hopes that the "mixed" labeling would pick up the best of the two simpler row/column labelings, but this is not really the case. For example, for Tragic the "row_column" strategy is better than both "normal" and "mixed". I am somewhat tempted to use the "row_column" labeling, but the drawback is that the P200 problem (not included in Wolter's sample problems) takes much longer with this labeling.
  • The same model and labeling is compared with different solvers: Gecode/FlatZinc is faster than JaCoP/FlatZinc on all the problems. For the easier problems this could be explained by the extra startup time of Java for JaCoP, but that is not the complete explanation for the harder problems. Note: both Gecode/FlatZinc and JaCoP/FlatZinc have dedicated and fast regular constraints, whereas the LazyFD and Comet solvers use a decomposition (a sketch of such a regular constraint is shown below, after these notes).
  • The LazyFD solver is the only one that solves all the problems (including Lion), but it is somewhat slower on the medium problems than most of the others. This only underlines that it is a very interesting solver.
  • It is also interesting to compare the results of the Comet model and Gecode/FlatZinc "mixed", since they use the same labeling principle. There are some differences, however: the MiniZinc model with Gecode/FlatZinc uses a dedicated regular constraint, while Comet uses my own decomposition of the constraint. For the Merka problem the Comet version outperforms the Gecode/FlatZinc version; otherwise they take about the same time (and number of failures).
  • The Light problem: It is weird that this problem was "solved" in almost exactly 10 minutes (the timeout is 10 minutes) by both Gecode/FlatZinc and JaCoP/FlatZinc. The solutions seemed correct, but I was suspicious of this. Update: Christian Schulte got me on the right track. Here is what happened: the first (unique) solution was found pretty quickly and was printed, but the solvers could not prove that it was unique, so they timed out. JaCoP/FlatZinc actually printed "TIME-OUT", but I didn't notice that. Case closed: they both FAILED on this test. Thanks, Christian. End update
As said above, I can only agree with Jan Wolter's comment that the ease of experimenting, for example changing the labeling or the solver for the FlatZinc models, is a very nice feature.
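To illustrate the note above about dedicated regular constraints: here is a minimal hand-built sketch of what a regular constraint over a single nonogram row looks like, using the 1 = blank / 2 = filled encoding from the models above. The rule [2,1] and the row length are made up for the example; the real models construct these automata automatically from the hints:
    include "globals.mzn";
    
    int: n = 10;                       % made-up row length
    array[1..n] of var 1..2: cells;    % 1 = blank, 2 = filled
    
    % hand-built DFA for the rule [2,1], i.e. the pattern
    % blank* filled filled blank+ filled blank*
    % state 1: before the first block   state 4: in the gap
    % state 2: one filled cell seen     state 5: second block done
    % state 3: first block done         (0 = no transition)
    array[1..5, 1..2] of int: transitions =
       array2d(1..5, 1..2,
         [1, 2,     % state 1: blank -> 1, filled -> 2
          0, 3,     % state 2: a second filled cell is required
          4, 0,     % state 3: a blank (the gap) is required
          4, 5,     % state 4: blank -> stay, filled -> 5
          5, 0]);   % state 5: only trailing blanks allowed
    
    constraint regular(cells, 5, 2, transitions, 1, {5});
    
    solve satisfy;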

Last word

No benchmark or comparison of (constraint programming) models is really complete without a reference to the article On Benchmarking Constraint Logic Programming Platforms. Response to Fernandez and Hill's "A Comparative Study of Eight Constraint Programming Languages over the Boolean and Finite Domains" by Mark Wallace, Joachim Schimpf, Kish Shen, and Warwick Harvey. (The link is to ACM.)

From the Abstract:
... The article analyses some pitfalls in benchmarking, recalling previous published results from benchmarking different kinds of software, and explores some issues in comparative benchmarking of CLP systems.