The
INFORMS blog challenge the last month (February) was
O.R. and Love. The entry by Capgemini O.R. Blog Team was
Cupids with Computers where a bunch of OR'ers (including me) was matched according to some - not exactly stated - compatibility scores. It is both interesting and fun to read. Also, see their companion blog post
Two Deadly Sins that discuss some of the problems with matching.
I was curious how well a Constraint Programming system like MiniZinc would do on this matching problem so I wrote the model
or_matching.mzn.
And it did quite well: it (Gecode/fz) solves the problem in 0.1 seconds (with 191 failures). Some comments about the model:
- Since few CP system have good support for floats I converted the scores to integer (x 10), i.e. 3.7 is represented as 37 etc. However, this can lead to some errors since I assume that the scores are rounded.
- it also identify the left out person which is just for fun.
The result is the following matchings (slightly edited):
Total score: 519
P1 P2 Score
--------------
1 9: 92
5 14: 69 (tie)
10 18: 69 (tie)
8 16: 59
2 19: 55
3 12: 52
6 11: 44
13 15: 42
4 7: 37
Left out: 17
One reason that I modeled the problem was to see if it was possible to identify the OR'ers, or at least myself, and with some assumptions I have a some guesses. The assumptions is that the matchings in the blog post are presented from best match (score) and then ordered in score order.
My model includes a symmetry breaking which place the person with the lower index in the left position of the pair which means that I can just identify a pair. Also the second and third pair has a tie so it is even harder to guess these.
Ranking My guess (pair)
--------------------------------------------------------------------
- Anna Nagurney and Laura McLay: 1,9
- Ian Frommer and Paul Rubin: 5,14 or 10,18 (the score is a tie)
- Michael Trick and Tallys Yunes: 5,14 or 10,18 (ibid.)
- Larry D'Agostino and Robert Randall: 8,16
- David Smith and Thiago Serra: 2,19
- Francisco Marco-Serrano and Niclola Petty: 3,12
- John Angelis and Michale Watson: 6,11
- John Poppelaars and Patricia Randall: 13,15
- Hakan Kjellerstrand and Phillip Mah: 4,7
- Shiva Subramanian (the leftout): 17
So, based on this, I guess that I am either #4 or #7.
Let us further investigate this. In
Two Deadly Sins there is some smaller score matrices where I happen to be named:
Hakan John Patricia Phillip Shiva
- 3.7 2.2 2.2 1.3 Hakan
- - 4.4 2.2 1.8 John
- - - 4.2 2.7 Patricia
- - - - 2.2 Philip
- - - - - Shiva
Now assume that this is the same score and names as in the main matrix. My (Hakan's) scores for
John, Patricia, Phillip, and Shiva, are 3.7, 2.2, 2.2, and 1.3 respectively,
and these value can be found for #4 but not for #7. There is just one score 1.3 and it is
for #4 and #17 so we might draw the conclusion that #17 is Shiva.
However, since Hakan is #4 and is connected to #7, this would mean that Philip should be #7 if my matchings
are correct. But it can't be since there is no 4.4 score for Hakan and Philip. Hmmm.
OK. If we skip my ranking and instead look at the score matrices we can see that there is one 1.8 score,
between John and Shiva (in the small score matrix), and #17 and #7. So this means that #7 is John.
And if #7 is John it makes more sense since there is a score of 3.7 between Hakan (#4) and John (#7)
in the big score matrix. John and Patricia has a score of 4.4, and the only 4.4 for #7 is #13 so
Patricia is #13. And the last one to find is Philip which is #11 given by the score 4.2 with Patricia.
So now we have found the following identifies from the small scores:
Hakan: #4
John: #7
Patricia: #13
Philip: #11
Shiva: #17
And let us compare these findings with my matchings.
- Hakan: #4. This seems to be correct. But the matching to #7 is not Philip (he is #11).
- John: #7. Now, there are two Johns but neither of them has been attributed to #7 in my matching, so this is plain #fail.
- Patricia: #13. My matching matched Patricia with John Poppelaars and is could be correct. This means that the "other John" is #15 (according to my matching)
- Philip: #11. Well, this is the same #fail as for Hakan.
- Shiva: #17. This seems to be correct.
Well, my method of identifying the persons with the matching seems to be quite wrong, but it was fun trying.
Back to the model and some more details. The kernel of the model is really just the following code, including some
symmetry breaking constraints.
% decision variables:
% the matchings
array[1..m, 1..2] of var 1..n: x;
% the scores for the found matching
array[1..m] of var 0..max_score: s;
constraint
alldifferent([x[i,j] | i in 1..m, j in 1..2]) :: domain
;
constraint
forall(i in 1..m) (
s[i] = scores[x[i,1], x[i,2]]
% symmetry breaking
/\ x[i,1] <= x[i,2]
)
;
% another symmetry breaking
constraint decreasing(s);
I optimized this model for Gecode/fz using these two things found by some experimentation:
- branching is on the score array
s
instead of the seating array x
- alldifference has the
:: domain
annotation. Without this, it takes much longer.
These two makes Gecode solve the problem quite fast:
Gecode/fz: 0.16s, 191 failures
However, for other solvers the branching on the score array
s
is not so good so here are
the statistics for branching on the array
x
instead, and using instead
the variable/value heuristics:
first_fail/indomain_median
which seems to give the best overall result for most solvers.
- Gecode/fz: 4.5s, 98469 failures
- G12/LazyFD: 2.16s
- fzn2smt: 6.0s
- fzntini: 16:02 minutes (I'm not sure why this is so hard for fzntini)
- G12/fd: 10.127s, 189346 choice points
- G12/minizinc: 10.465s, 189346 choice points
- G12/fdmip: 10.226s, 189346 choice points
- JaCoP/fz: 10.358s, 192957 wrong decisions
- SICStus/fz: 18.419s, 362122 backtracks
- ECLiPSe/ic: 48.313s
- ECLiPSe/fd: 19.557s
Update 2011-03-08
Julien Fischer and Sebastian Brand (of G12 team) was kind to point
out that there is a better way of getting the identification of
the leftout:
constraint
if n mod 2 != 0 then
% This version was suggested by
% Sebastian Brand and Julien Fischer.
leftout > 0 /\
forall(j in 1..m, k in 1..2) (
leftout != x[j,k]
)
else
leftout = 0
endif;
Below is shown the statistics using this version instead. Julien also pointed out that G12/fd
and G12/minizinc is really the same solver and - for this problem -
G12/fdmip doesn't make use of the MIP component so it's the same
as G12/fd. So I skipped these two latter in this test.
The branching is (still)
first_fail/indomain_median
* Gecode/fz: 3.3s, 98469 failures
* G12/LazyFD: 1.5s
* fzn2smt: (I stopped after 10 minutes.)
* fzntini: 5:52 minutes
* G12/fd: 6.4s, 107178 choice points
* JaCoP/fz: 5.4s, 192975 wrong decisions
* SICStus/fz: 7.5s, 362122 backtracks
* ECLiPSe/ic: 35.5s
* ECLiPSe/fd: 11.3s
This change boosted the times for most solvers, quite much for some.
It's strange that fzn2smt was much slower, and I don't understand why.
For fzn2smt I also tested different things, e.g. to removed the
:: domain
annotation on
alldifferent
but it made no difference.
Testing indomain_random
I was also curious how some solvers would do with the heuristic
indomain_random
, so I ran each solver three times and
took the best result. Some solvers was "stable" and
got exactly the same statistics (Gecode, G12/fd, SICStus, ECLiPSe/ic), but some of them
got quite varying result and all three results are shown (JaCoP and ECLiPSe/fd).
* Gecode/fz: 1.7, 27422 failures (stable)
* G12/fd: 19.0s, 1401656 choice points (stable)
* JaCoP/fz: 2.3s, 22178 wrong decisions
8.5s 164599 wrong decisions
4.1s 50575 wrong decisions
2.3s 22178 wrong decisions
* SICStus/fz: 3.3s, 116386 backtracks (stable)
* ECLiPSe/ic: 34.5s (stable)
* ECLiPSe/fd: 5.6s
7.41s
5.6s
26.7s
Almost all of the solver got at least one result better than before, except G12/fd which
is surprising.
End of Update 2011-03-08
2011-03-12 Yet another update
After some interesting discussions with Sebastian Brand (thank you, Sebastian!) I have included some improvements.
1) Since the calculation of the leftout don't contribute much to the constraint store, it has been removed as a decision variable and just calculated in the output section:
output [
let {
int: leftout = max(0..n diff { fix(x[j,k]) | j in 1..m, k in 1..2 })
}
in
% ...
2) The symmetry breaking constraint to sort the matchings in score order
decreasing(s)
has been replaced with another symmetry breaking constraint which seems to be faster.
Unfortunately, there is no sorting anymore but since it is faster for all solvers I prefer it.
forall(i in 2..m)( x[i-1,1] < x[i,1] )
3) The branching has been changed to
first_fail/indomain_split
With these three changes there are now improvements of most solvers; for ECLiPSe's solvers a radical improvement (compare with the listing above):
-
G12/lazy: 0.89s
-
Gecode: 0.10s and 649 failures
Note: without the
::domain annotation on alldifferent it's slighly worse: 0.16s and 1898
That difference has been much larger in earlier versions.
-
JaCoP: 1.4s: 3253 wrong search decisions
-
SICStus: 0.29s 3291 backtracks
-
Choco: 1.8s 5928 backtracks
-
ECLiPSe/ic: 1.6s
-
ECLiPSe/fd: 0.94s
-
smt2fzn is still very slow (I contacted the fzn2smt developers about this)
-
tini: 4:44min (slightly better, but still very slow)
4) Sebastian also suggested that I should test a
table
constraint. That is,
instead of
s[i] = scores[x[i,1], x[i,2]]
we use this:
table([x[i,1], x[i,2], s[i]], scores_table)
This also requires another representation of the score matrix:
scores_table = array2d(1..n*n, 1..3,
[
1, 1, 00,
1, 2, 30,
1, 3, 10,
1, 4, 33,
1, 5, 49,
% ....
Here is the result of using this table approach (together with the previous changes):
-
G12/fd: 2.0s, 708 choice points (slower)
-
G12/lazy: 2.442s (slower)
-
Gecode: 0.11s 686 failures (about the same)
-
JaCoP: 1.2s 785 wrong search decisions (slightly better)
-
SICStus: 0.43 746 backtracks (better in backtracks, but slightly slower)
-
Choco: 1.9s: 2299 backtracks (better in backtracks, but slightly slower)
-
ECLiPSe/ic: 2.8s (slower)
-
ECLiPSe/fd: 1.4s (slower)
-
fzn2smt: 11.1s (and SMT is back in the game again!)
Note: Using the
decreasing/1
constraint instead, it takes 7.0s.
-
tini: stopped after 10 minutes, so it's slower than the previous version.
To sum: Most solvers where a tiny bit slower with this table approach, and some have slightly less choice points/failures/etc. Hmmm, maybe this (slight) increment of time for some solvers is caused by the slightly larger file to process by mzn2fzn?
The big winner here is SMT, but for this problem it's still slower than most other solvers.
I've put the improvements in the same model as before
or_matching.mzn (the older model is now in
or_matching_orig.mzn).
Also, thanks to Krzysztof Kuchcinski for an interesting discussion about this.
End of update 2011-03-12
A related seating problem
This matching problem inspired me also to model a related problem using the score matrix from the matching model:
When all these persons meet, what is the best way to place them around a (topological circular) table if the
objective is to get the highest possible compatibility score for the adjacent seated persons.
Another MiniZinc model (
or_seating.mzn) gives an optimal value of 996 score points, in about 1.5 seconds (with 17629 failures) using Gecode.
There are some possible way of approaching this problem, and I took the TSP way using a global constraint
circuit
. This model use my decomposition of a
circuit
constraint which, given the score matrix, gives an optimal circuit. However, what we really want is the
path (the
seating), so we have do extract the path. After a while I realized that the temporary array
z
used in the
circuit_me
decomposition is exactly this path, so it's just to use it in the parameter of the predicate. A consequence of this is that all seatings ends with person #1, which shouldn't really be a problem since the path is a permutation.
predicate circuit_me(array[int] of var int: x, array[int] of var int: z) =
let {
int: lbx = min(index_set(x)),
int: ubx = max(index_set(x))
}
in
all_different(x) :: domain /\
all_different(z) :: domain /\
% put the orbit of x[1] in in z[1..n]
z[lbx] = x[lbx] /\
forall(i in lbx+1..ubx) (
z[i] = x[z[i-1]]
)
/\ % may not be 1 for i < n
forall(i in lbx..ubx-1) (
z[i] != lbx
)
/\ % when i = n it must be 1
z[n] = lbx
;
Here are the two optimal solutions found by this model (with the implicit symmetry breaking that #1 is always the last position). However, the second solution is just the first reversed.
9 10 18 8 16 13 15 12 3 17 19 2 11 6 14 5 7 4 1
4 7 5 14 6 11 2 19 17 3 12 15 13 16 8 18 10 9 1
For the systems that support the
circuit/1
constraint (right now only Gecode and JaCoP support this), there is a variant: one can use
circuit
to get the circuit and then extract the path (seating) with this "extractor" predicate where the solution always start with #1.
predicate circuit_path(array[int] of var int: x,
array[int] of var int: p) =
% we always starts the path at 1
p[1] = 1
/\
forall(i in 2..n) (
p[i] = x[p[i-1]]
);
Then we got these two optimal solutions, which is just permutations of the two above.
1 9 10 18 8 16 13 15 12 3 17 19 2 11 6 14 5 7 4
1 4 7 5 14 6 11 2 19 17 3 12 15 13 16 8 18 10 9
Connection to the matching problem:
Comparing these seating placements with the matching above, it seems to be correct, e.g. #9 and #1 sit together, as do #18 and #6, #2 and #19, etc.
Some other seating problems
For some reason I am fascinated by seating problems and I'm not really alone. John Toczek's PuzzlOr problem
Best Host this February was a seating problem (the object is to minimize seating conflicts given that some persons would like to sit together) that I solved using a rather simple MiniZinc model. However, I will not link to the model since the contest (to win a “O.R. The Science of Better” T-shirt) is still open. And let us not forget Karen Petrie's Wedding planning problem:
How a computer scientist organised her wedding.
Here are two (classical) combinatorial seating problems, the formulation taken from Fei Dai, Mohit Dhawan, Changfu Wu and Jie Wu:
On Solutions to a Seating Problem, page 1:
Seating, row
MiniZinc model
seating_row.mzn
n couples are seated in one row. Each person, except two seated at the two ends, has two neighbors. It is required that each neighbor of person k should either have the same gender or be the spouse of k.
Seating, table:
MiniZinc model
seating_table.mzn
A classical [...] seating problem is the following: n couples are seated around a table in such a way that two neighbors of each person k are the different gender of k and they are not the spouse of k.
Both these problems is about constraints on
couples and
gender. There are different way of representing who is in a couple with "anonymous persons", i.e. using just integers to represent the couples and individuals. Here are two variants (I included both of these in the seating_row model):
1) Using math:
predicate is_couple1(int: n, var int: a, var int: b) =
% exists(i in 1..n) (
exists(i in max(lb(a), lb(b))..min(ub(a), ub(b))) (
(2*(i-1)+1 = a /\ 2*(i-1)+2 = b)
\/
(2*(i-1)+1 = b /\ 2*(i-1)+2 = a)
);
2) Using set representation, i.e. an array of the pairs:
% define the couple
array[1..n] of set of int: couples = [{2*(i-1)+1, 2*(i-1)+2} | i in 1..n];
predicate is_couple(int: n, var int: a, var int: b, array[int] of set of int: c) =
exists(j in 1..n) (
a in c[j] /\ b in c[j]
);
The second approach of using sets, is in my opinion the nicest, and it's also easier to use if one has a concrete list of couples.
The gender is simply represented by modulo 2.
The paper "On Solutions to a Seating Problem" (cited above) studies "reasonable patterns":
A reasonable pattern is an assignment of n couples without indices that meet the above requirement. For example, when n = 3 all reasonable patterns are as follows:
wwwhhh whhwwh whhhww hhwwwh
hhhwww hwwhhw hwwwhh wwhhhw
However, I am more interested in the concrete seating placements so it's not quite the same problems. For n=3 there is a total of 60 patterns. Here are all 8 different patterns generated for 3 pairs (n=3). The first number indicates the occurrences of each pattern.
12 gender2:hhhwww
6 gender2:hhwwwh
6 gender2:hwwhhw
6 gender2:hwwwhh
6 gender2:whhhww
6 gender2:whhwwh
6 gender2:wwhhhw
12 gender2:wwwhhh
I tried to generate exactly one of each pattern but had no luck. A model with the following simple symmetry breaking generates 20 solutions, less than 60 but more than exactly 8 patterns.
constraint (x[1] = 1 \/ x[1] = 2);
Statistics for Seating, row problem
Here are the number of solutions (using no symmetry breaking) for some different number of couples (
n
). The solver statistics below is for Gecode/fz where just the last solutions is printed (it took about 2.5 minutes to print the solutions for n=6, without printing it took just 15.7 seconds).
n #solutions
-------------
1 2 (0.001s, 0 failures)
2 8 (0.001s, 5 failures)
3 60 (0.01s, 73 failures)
4 816 (0.02s, 1181 failures)
5 17520 (0.45s, 29872 failures)
6 550080 (15.7s, 1051417 failures)
7 23839200 (12:36min, 49685576 failures)
The sequence of these numbers seems to correspond to the OEIS Integer Sequence
A096121 which is described as
A rook must land on each square exactly once, but may start and end anywhere and may intersect its own path. Inspired by Leroy Quet in a Jul 05 2004 posting to the Seqfan mailing list.
Well, I'm not sure if there is actually a deep connection between these two problems or just a freaky coincidence...
Statistics for Seating, table problem
For the seating table problem with 3 couples (n=3) there are 12 solutions (two different patterns). Here are the
number of solutions for different
n
.
n #solutions
-------------
1 0 (0s, 0 failures)
2 0 (0s, 4 failures)
3 12 (0s, 0 failures)
4 96 (0.01s, 36 failures)
5 3120 (0.04s, 340 failures)
6 115200 (0.9s, 4436 failures)
7 5836320 (43.4s, 146938 failures)
8 382072320 (47:28minutes, 6593012)
The related OEIS sequence is
A059375 (
Total menage numbers 2*n!*A000179(n)) (the sequence starts with 1 which is the value for n=0, but this is weird in our current setting). This problem is also known as the
Menage problem (Wikipedia) so the correspondence between these numbers is not surprising at all. Also see Wolfram MathWorld:
Married Couples Problem.