
Update on Nonogram: Jan Wolter's Survey and my own new benchmark

Survey of Paint-by-Number Puzzle Solvers

In Some new models, etc. I mentioned the great Survey of Paint-by-Number Puzzle Solvers, created by Jan Wolter (also the author of the Nonogram solver pbnsolve).

In this survey he included both Gecode's Nonogram solver written by Mikael Lagerkvist and my own Nonogram model (with Gecode/FlatZinc).

Since the last update in that blog post, the following has happened:
  • our solvers now have the assessment "An amazingly good solver, especially for a simple demo program", and are placed 4th, 5th, and 6th of the 10 tested systems
  • my Gecode/FlatZinc model has been tested for "Full results"; it placed 4th out of 5
  • my Nonogram model with the LazyFD solver is now included in the "Sample Results", in 6th place
It seems that Wolter has come to appreciate constraint programming as a general tool for solving these kinds of combinatorial problems, not least for its ease of experimentation, e.g. with labeling strategies and (for the MiniZinc models) with changing solvers:

From the analysis of Lagerkvist's Gecode model:
This is especially impressive because the faster solvers are large, complex programs custom designed to solve paint-by-number puzzles. This one is a general purpose solver with just a couple hundred custom lines of code to generate the constraints, run the solver, and print the result. Considering that this is a simple application of a general purpose solving tool rather than hugely complex and specialized special purpose solving tool, this is an astonishingly good result.

Getting really first class search performance usually requires a lot of experimentation with different search strategies. This is awkward and slow to do if you have to implement each new strategy from scratch. I suspect that a tool like Gecode lets you try out lots of different strategies with relatively little coding to implement each one. That probably contributes a lot to getting to better solvers faster.
From the analysis of my MiniZinc model:
If you tried turning this into a fully useful tool rather than a technology demo, with input file parsers and such, it would get a lot bigger, but clearly the constraint programming approach has big advantages, achieving good search results with low development cost.

...

These two results [Gecode/FlatZinc and LazyFD] highlight the advantage of a solver-independent modeling language like MiniZinc. You can describe your problem once, and then try out a wide variety of different solvers and heuristics without having to code every single one from scratch. You can benefit from the work of the best and the brightest solver designers. It's hard to imagine that this isn't where the future lies for the development of solvers for applications like this.
And later in the Conclusions:
The two constraint-based systems by Lagerkvist and Kjellerstrand come quite close in performance to the dedicated solvers, although both are more in the category of demonstrations of constraint programming than fully developed solving applications. The power of the underlying search libraries and the ease of experimentation with alternative search heuristics obviously serves them well. I think it very likely that approaches based on these kinds of methods will ultimately prove the most effective.
I think this is an important lesson: before starting to write a highly specialized tool, first try a general tool like a constraint programming system and see how well it performs.

The LazyFD solver and the Lion problem

Most of the problems in the Sample Results were solved by some solver within the time limit of 30 minutes. However, one problem stands out as extra hard: the Lion problem. When I tested MiniZinc's LazyFD solver on my machine I was very excited that it took just about 2 minutes, and mentioned this to Wolter. He also tested this, but on his 64-bit machine it took 80 minutes to solve (and since this was above the time limit it is not in the result table). This is how he describes the LazyFD solver:
But the remarkable thing is that [the Lazy FD solver] solves almost everything. Actually, it even solved the Lion puzzle that no other solver was able to solve, though, on my computer, it took 80 minutes to do it. Now, I didn't allow a lot of other solvers 80 minutes to run, but that's still pretty impressive. (Note that Kjellerstrand got much faster solving times for the Lion than I did. Lagerkvist also reported that his solver could solve it, but I wasn't able to reproduce that result even after 30 CPU hours. I don't know why.)
After some discussion, we came to the conclusion that the difference was probably due to the fact that I use a 32-bit machine (and the 32-bit version of MiniZinc) with 2 GB of memory, while Wolter uses a 64-bit machine with 1 GB of memory.

One should also note that all the other solvers were compiled without optimization, including Gecode/FlatZinc; LazyFD, however, is not distributed as source, so it runs optimized. This may give the LazyFD solver an unfair advantage.

My own benchmark of the Sample Results

The times in the Sample Results are, as mentioned above, for solvers compiled without optimization. I have now run the same problems on my machine (Linux Mandriva, Intel Dual 3.40 GHz, with 2 GB of memory), but with the solvers built with standard optimization. All problems were run with a time limit of 10 minutes (compared to Wolter's 30 minutes) and searched for 2 solutions, which checks for unique solutions. The last three problems (Karate, Flag, Lion) have multiple solutions, and a run is considered a failure if two solutions were not found within the time limit. I should also note that during the benchmark I was using the machine for other things, such as surfing the web.

The problems
I downloaded the problems from Wolter's Webpbn: Puzzle Export. For copyright reasons I cannot republish these models, but it is easy to download each problem. Select ".DZN" for the MiniZinc files, and "A compiled in C++ format" for Gecode. There is no support for Comet's format, but it is quite easy to convert a .dzn file to Comet's format.
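To give an idea of the data format, here is a hypothetical tiny puzzle (a plus sign) in .dzn form; it is not one of Wolter's puzzles, and the parameter names follow those used in nonogram_create_automaton2.mzn, so the files exported from Webpbn may differ in details:

    % Hypothetical 3x3 "plus" puzzle in .dzn form (assumed parameter names).
    rows = 3;
    row_rule_len = 1;
    row_rules = array2d(1..3, 1..1, [1, 3, 1]);  % hints for each row
    cols = 3;
    col_rule_len = 1;
    col_rules = array2d(1..3, 1..1, [1, 3, 1]);  % hints for each column

The Comet version is essentially the same data restated in Comet syntax, e.g. something like int row_rules[1..3,1..1] = [[1],[3],[1]];.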

The solvers + labeling strategies
Here is a description of each solver and its labeling strategy:
  • fz, "normal" (column_row)
    MiniZinc model with Gecode/FlatZinc. The usual labeling in nonogram_create_automaton2.mzn, i.e. where the columns are labeled before rows:
    solve :: int_search(
          [x[i,j] | j in 1..cols, i in 1..rows], 
          first_fail, 
          indomain_min, 
          complete) 
    satisfy;
    
  • fz, "row_column"
MiniZinc model with Gecode/FlatZinc. Here the order of labeling is reversed: the rows are labeled before the columns. The model is nonogram_create_automaton2_row_column.mzn:
    solve :: int_search(
          [x[i,j] | i in 1..rows, j in 1..cols], 
          first_fail, 
          indomain_min, 
          complete) 
    satisfy;
    
  • fz, "mixed"
    MiniZinc model with Gecode/FlatZinc: nonogram_create_automaton2_mixed.mzn.
I have long been satisfied with the "normal" labeling in the MiniZinc model because P200 (the hardest problem I had tested until now) was solved so fast. However, the labeling used in the Comet Nonogram model described in Comet: Nonogram improved: solving problem P200 from 1:30 minutes to about 1 second, and which is also used in the Gecode model, is somewhat more complicated, since it bases the exact labeling on a comparison of the hints for the rows and the columns.

I decided to try this labeling in MiniZinc as well. However, labeling in MiniZinc is not as flexible as in Comet and Gecode. Instead, we have to add a dedicated array for the labeling (called labeling):
    array[1..rows*cols] of var 1..2: labeling;
    
and then copy the elements of the grid to that array based on the relation between the row and column hints:
    constraint
          % prepare for the labeling
          if rows*row_rule_len < cols*col_rule_len then
               % label on rows first
               labeling = [x[i,j] | i in 1..rows, j in 1..cols]
          else 
               % label on columns first
               labeling = [x[i,j] | j in 1..cols, i in 1..rows]
          endif
          /\
          % .... 
    
and finally, the search is based just on this labeling array:
    solve :: int_search(
            labeling,
            first_fail, 
            indomain_min, 
            complete)
    satisfy;
    
  • jacop, "normal"
The same MiniZinc model as fz "normal", but run with JaCoP/FlatZinc.
  • lazy, satisfy
Model: nonogram_create_automaton2_mixed.mzn. This uses the MiniZinc LazyFD solver with the search strategy:
    solve satisfy;
    
    This labeling is recommended by the authors of LazyFD. See MiniZinc: the lazy clause generation solver for more information about this solver.

Note: The solver in MiniZinc's latest official version (1.0.3) does not support set vars. Instead I (and also Jan Wolter) used the latest "Release Of The Day" version (as of 2009-11-02).
  • Comet, normal
Model: nonogram_regular.co. This is the Comet model I described in Comet: Nonogram improved: solving problem P200 from 1:30 minutes to about 1 second. No changes have been made.
  • Gecode, normal
This is the Nonogram model distributed with Gecode version 3.2.1. The labeling is much like the one used in the Comet model and in fz "mixed". (In fact, the labeling in the Gecode model was inspired by the labeling in the Comet model.)


Here are the results. For each model (+ labeling strategy) two values are presented:
  • time (in seconds)
  • number of failures if applicable (the LazyFD solver always returns 0 here).
The results

Problem         | fz normal          | fz row_column    | fz mixed         | jacop normal       | lazy satisfy | Comet normal     | Gecode normal
----------------|--------------------|------------------|------------------|--------------------|--------------|------------------|-----------------
Dancer (#1)     | 0.48s (0)          | 0.31s (0)        | 1.00s (0)        | 3.64s (0)          | 0.91s (0)    | 0.691s (0)       | 0.199s (0)
Cat (#6)        | 0.24s (0)          | 0.24s (0)        | 0.25s (0)        | 1.20s (0)          | 1.13s (0)    | 0.6s (0)         | 0.025s (0)
Skid (#21)      | 0.24s (13)         | 0.23s (3)        | 0.28s (13)       | 0.78s (13)         | 1.37s (0)    | 0.586s (0)       | 0.217s (0)
Bucks (#27)     | 0.32s (3)          | 0.32s (9)        | 0.37s (3)        | 0.96s (3)          | 2.37s (0)    | 1.366s (5)       | 0.026s (2)
Edge (#23)      | 0.16s (25)         | 0.17s (25)       | 0.18s (25)       | 0.59s (25)         | 0.31s (0)    | 0.521s (43)      | 0.175s (25)
Smoke (#2413)   | 0.27s (5)          | 0.27s (8)        | 0.28s (8)        | 0.83s (5)          | 1.44s (0)    | 0.616s (14)      | 0.275s (5)
Knot (#16)      | 0.42s (0)          | 0.43s (0)        | 0.48s (0)        | 1.19s (0)          | 8.15s (0)    | 1.307s (0)       | 0.329s (0)
Swing (#529)    | 0.95s (0)          | 0.94s (0)        | 0.96s (0)        | 2.19s (0)          | 21.94s (0)   | 1.782s (0)       | 0.431s (0)
Mum (#65)       | 0.64s (20)         | 0.64s (22)       | 0.66s (22)       | 1.68s (20)         | 16.34s (0)   | 1.268s (39)      | 0.491s (22)
Tragic (#1694)  | 340.32s (394841)   | 1.02s (255)      | 436.92s (394841) | -- (198329)        | 45.97s (0)   | 477.39s (702525) | 1.139s (255)
Merka (#1611)   | -- (361587)        | 1.44s (79)       | -- (294260)      | -- (136351)        | 80.92s (0)   | 1.654s (46)      | 0.645s (13)
Petro (#436)    | 2.97s (1738)       | 143.09s (106919) | 3.42s (1738)     | 7.27s (1738)       | 9.86s (0)    | 3.103s (3183)    | 151.09s (106919)
M_and_m (#4645) | 1.41s (89)         | 601.27s (122090) | 1.59s (89)       | 3.43s (89)         | 66.98s (0)   | 2.215s (162)     | 2.797s (428)
Signed (#3541)  | 1.87s (929)        | 23.12s (6484)    | 28.23s (6484)    | 5.75s (929)        | 73.02s (0)   | 20.369s (12231)  | 1.648s (929)
Light (#803)    | 600.47s-- (400660) | -- (621547)      | -- (485056)      | 601.53s-- (171305) | 8.64s (0)    | -- (0)           | -- (538711)
Forever (#6574) | 4.14s (17143)      | 7.86s (30900)    | 6.22s (17143)    | 12.21s (17143)     | 3.27s (0)    | 7.5s (27199)     | 8.077s (30900)
Hot (#2040)     | -- (303306)        | -- (330461)      | -- (248307)      | -- (119817)        | 165.72s (0)  | -- (0)           | -- (312532)
Karate (#6739)  | 95.78s (215541)    | 67.27s (130934)  | 133.43s (215541) | 373.02s (215541)   | 19.32s (0)   | 120.02s (272706) | 80.56s (170355)
Flag (#2556)    | -- (1686545)       | 5.69s (14915)    | 7.93s (14915)    | -- (243222)        | 9.29s (0)    | 7.28s (24678)    | 3.998s (16531)
Lion (#2712)    | -- (542373)        | -- (1124697)     | -- (420216)      | -- (187215)        | 115.56s (0)  | -- (0)           | -- (869513)

Each cell shows the time in seconds and, in parentheses, the number of failures; "--" means the problem was not solved within the 10-minute time limit. (The "600.47s--" and "601.53s--" entries for Light are explained in the update in the notes below.)

Some conclusions, or rather notes

Here are some conclusions (or notes) about the benchmark.
  • The same solver, Gecode/FlatZinc, is here compared with three different labelings. No single labeling is better than the others. I initially had some hopes that the "mixed" labeling would take the best from the two simpler row/column labelings, but this is not really the case. For example, for Tragic the "row_column" strategy is better than both "normal" and "mixed". I am, however, somewhat tempted to use the "row_column" labeling, but the drawback is that the P200 problem (not included in Wolter's sample problems) takes much longer with this labeling.
  • The same model and labeling is compared across different solvers: Gecode/FlatZinc is faster than JaCoP/FlatZinc on all the problems. For the easier problems this could be explained by the extra startup time of Java for JaCoP, but that is not the complete explanation for the harder problems. Note: Both Gecode/FlatZinc and JaCoP/FlatZinc have dedicated and fast regular constraints, whereas the LazyFD and Comet solvers use a decomposition (a sketch of such a decomposition is shown after this list).
  • The LazyFD solver is the only one that solves all the problems (including Lion), but it is somewhat slower on the middle problems than most of the others. All in all, a very interesting solver.
  • It is also interesting to compare the results of the Comet model and Gecode/FlatZinc "mixed", since they use the same principle of labeling. There are some differences, though: the MiniZinc model with Gecode/FlatZinc uses a dedicated regular constraint, while Comet uses my own decomposition of the constraint. For the Merka problem the Comet version outperforms the Gecode/FlatZinc version; otherwise they take about the same time (and number of failures).
  • The Light problem: It is weird that this problem was solved in almost exactly 10 minutes (the timeout is 10 minutes) by Gecode/FlatZinc and JaCoP/FlatZinc. The solutions seemed correct, but I was suspicious of this. Update: Christian Schulte got me on the right track. Here is what happened: the first (unique) solution was found pretty quickly and was printed, but the solvers could not prove that it was unique, so they timed out. JaCoP/FlatZinc actually printed "TIME-OUT", but I didn't notice that. Case closed: they both FAILED on this test. Thanks, Christian. [End update]
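For reference, here is a minimal MiniZinc sketch of such a regular decomposition, along the lines of the decomposition in the MiniZinc library (which my Comet model follows). It simply unrolls the DFA: an auxiliary array holds the state after each position, and each step is an element lookup in the transition table. The predicate name and the assumption that x is indexed from 1 are mine:

    % Sketch of a regular constraint decomposition (unrolled DFA).
    % Q: number of states, S: alphabet size (values 1..S),
    % d: transition table, where 0 denotes the failure state,
    % q0: start state, F: accepting states. Assumes x is indexed 1..n.
    predicate regular_decomp(array[int] of var int: x,
                             int: Q, int: S,
                             array[int,int] of int: d,
                             int: q0, set of int: F) =
       let {
          int: n = length(x),
          % a[i] is the DFA state after reading x[1..i];
          % the domain 1..Q rules out the failure state 0
          array[0..n] of var 1..Q: a
       } in
       a[0] = q0
       /\
       forall(i in 1..n) (
          x[i] in 1..S
          /\
          a[i] = d[a[i-1], x[i]]   % element lookup in the transition table
       )
       /\
       a[n] in F;

A dedicated regular propagator, as in Gecode and JaCoP, avoids the auxiliary state variables and can propagate more strongly, which is one plausible reason for the speed difference noted above.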
As noted above, I can only agree with Jan Wolter that the ease of experimenting, for example changing the labeling or (for the FlatZinc models) the solver, is a very nice feature.

Last word

No benchmark or comparison of (constraint programming) models is really complete without the reference to the article On Benchmarking Constraint Logic Programming Platforms. Response to Fernandez and Hill's "A Comparative Study of Eight Constraint Programming Languages over the Boolean and Finite Domains" by Mark Wallace, Joachim Schimpf, Kish Shen, Warwick Harvey. (Link is to ACM.)

From the Abstract:
... The article analyses some pitfalls in benchmarking, recalling previous published results from benchmarking different kinds of software, and explores some issues in comparative benchmarking of CLP systems.

Comments

Cool that you did this. I've added a link to it in my page.


Note that the order of columns in my sample table shouldn't really be taken as a ranking of solvers. Though generally faster things are at the left and slower things are at the right, I've also clustered together the CP solvers for easier comparison.


I wouldn't know how to rank them anyway. The lazy solver solves the most puzzles, but it is substantially slower than almost anything else on 90% of all real world puzzles. It would be brilliant in some applications, terrible in others.

Jan: Thanks for your comment and the link. And for your great Survey.

I see what you mean about the ranks, and I may have to change the wording so it doesn't indicate a "winning rank".

One idea for a (simple) ranking is to just sum the times and punish failures by counting each as something like 2*(time limit).
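To make that concrete, here is a small MiniZinc sketch of the scoring idea (hypothetical numbers and names; failed runs are stored as -1.0 and counted as 2 * the time limit):

    % Sketch of a simple scoring scheme: sum the times, counting a
    % failure (stored as -1.0) as 2 * time_limit seconds.
    int: time_limit = 600;  % seconds
    int: num_problems = 3;
    array[1..num_problems] of float: times = [0.48, -1.0, 4.14];
    float: score = sum(p in 1..num_problems) (
                      if times[p] < 0.0
                      then 2.0 * int2float(time_limit)
                      else times[p]
                      endif
                   );
    solve satisfy;
    output [ show(score), "\n" ];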

Really interesting comparisons from both of you! When it comes to ranking results, one possibility would be to use a point system based on the idea of a purse that is distributed according to time and whether a solution has been found at all.

For an example of a purse-based evaluation, you might want to check the rules for the 2009 MiniZinc challenge.

Christian: Thanks for your kind words.

Your suggestion of ranking is great. I hope Jan is interested in that as well.
