de Bruijn sequences in Gecode (and other systems)
First, the Gecode model of de Bruijn sequences that is refered below: debruijn.cpp.
Given:
Here is a simple run of the program with the following command line (see below for a discussion of the options):
Result:
However, calculating the same*
One of things that took the longest time to do what the "channeling" from the integer array
For accessing the matrix, we - of course - use the same order:
Here is the result of running the program with the
Some notes and other findings about this:
Introduction
I have been fascinated by de Bruijn sequences (Wikipedia) for years, and made some web based programs:- de Bruijn sequence, "classic" version, CGI version
- de Bruijn sequence, "classic" version, Java version
- de Bruijn arbitrary sequences, "arbitrary" version, CGI (not using constraint programming approach)
- "classic" de Bruijn sequence: the sequence length is base^n (n is the number of bits),
- "arbitrary" sequence: where the sequence length is arbitrary.
Principle used
The basic principle in generating de Bruijn sequences used in this model is the following. Note: The names for the parametersbase
, n
, m
, and bin_code
are perhaps unfortunate and confusing, but are kept since they are used in all my other implementations (see below).
Given:
- a base (parameter
base
) - number of bits (
n
) - length of sequence (
m
)
- make a list of distinct integers in the range 0..(base^n)-1. This array is called
x
in the model. These are the nodes in a de Bruijn graph. The goal of this model is to find nodes that really are "de Bruijn nodes". - calculate the "bit sequence" (in base
base
) for each integer. This is a matrix with m rows and n columns, here calledbinary
. - apply the de Bruijn condition for each consecutive integers, i.e. the first elements in binary[r] is the same as the last elements in binary[r-1], and also "around the corner".
- the de Bruijn sequence is then the first element in each row, here called
bin_code
.
Here is a simple run of the program with the following command line (see below for a discussion of the options):
debruijn.exe -solutions 1 -base 13 -n 4 -m 52 -print-matrix 1 -int-var smallest -int-val indomain-min
Result:
DeBruijn base: 13 number of bits (n): 4 length of sequence (m): 52 x:{0, 1, 13, 169, 2197, 2, 26, 338, 4394, 3, 39, 507, 6591, 4, 52, 676, 8788, 5, 65, 845, 10985, 6, 78, 1014, 13182, 7, 91, 1183, 15379, 8, 104, 1352, 17576, 9, 117, 1521, 19773, 10, 130, 1690, 21970, 11, 143, 1859, 24167, 12, 156, 2029, 26389, 325, 4225, 26364} de Bruijn sequence{0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 4, 0, 0, 0, 5, 0, 0, 0, 6, 0, 0, 0, 7, 0, 0, 0, 8, 0, 0, 0, 9, 0, 0, 0, 10, 0, 0, 0, 11, 0, 0, 0, 12, 0, 1, 12} binary matrix: 0 0 0 0 0 0 0 1 0 0 1 0 ... 0 1 12 0 1 12 0 0 12 0 0 0 Initial propagators: 312 branchings: 3 Summary runtime: 0.020 (20.000000 ms) solutions: 1 propagations: 1119 nodes: 16 failures: 0 peak depth: 15 peak memory: 100 KBFor calculating the standard "keypad problem" (i.e.
base
= 10, n
= 4, and m
= 10000), this model takes too much memory for my computer (2 Gb RAM, a Linux 3.4MHz dual core). There are probably some things to make this model more efficient.
However, calculating the same
base
and n
with a length of 1000 is fast:
Initial propagators: 6000 branchings: 3 Summary runtime: 1.660 (1660.000000 ms) solutions: 1 propagations: 138390 nodes: 253 failures: 0 peak depth: 252 peak memory: 42651 KB
Findings
Here are some findings when modeling this Gecode model.* Matrix<IntVarArray>
One of things that took the longest time to do what the "channeling" from the integer array x
and the matrix binary
. The reason was simply that I haven't read the documentation how to use the Matrix
wrapper. The order in the call is columns, rows
, not the other way around which I assumed. Here is how to use it (a simple example):
...
IntVarArray some_array(*this, number_of_columns*number_of_rows, 0, 100);
Matrix matrix_version(some_array, number_of_columns, number_of_rows);
...
For accessing the matrix, we - of course - use the same order:
matrix_version(column, row);
.
* Options
This model has a lot of different parameters to play with:-
base
-
n
-
m
-
print_matrix
(printing the binary matrix) -
int-var
IntVarBranch -
int-val
IntValBranch
Option
for this, with initial help from Mikael Zayenz Lagerkvist.
The most interesting part is handling different branching options (see the model debruijn.cpp how it is implemented). The default option branching
is not enough since both IntVarBranch and IntValBranch should be possible to change via the command line. (A simpler version using the branching
option is in donald_gerald_robert.cpp.)
Here is the result of running the program with the
-help
option showing the different options (the standard options is removed):
... -base (unsigned int) default: 2 base to use -n (unsigned int) default: 3 number of bits to use. -m (unsigned int) default: 0 length of the sequence. -print-matrix (unsigned int) default: 0 1 prints the binary matrix. -int-var (input-order, first-fail, anti-first-fail, smallest, largest, occurrence, max-regret) default: smallest options for IntVarBranch input-order: use VAR_NONE first-fail: use VAR_SIZE_MIN anti-first-fail: use VAR_SIZE_MAX smallest: use VAR_MIN_MIN largest: use VAR_MAX_MAX occurrence: use VAR_DEGREE_MAX max-regret: use VAR_REGRET_MIN_MAX -int-val (indomain-min, indomain-max, indomain-median, indomain-split) default: indomain-min options for IntValBranch indomain-min: use VAL_MIN indomain-max: use VAL_MAX indomain-median: use VAL_MED indomain-split: use VAL_SPLIT_MIN
Some notes and other findings about this:
- If no
-m
(the sequence length) is stated, it is defaulted tobase^n
. This is done in the classDeBruijnOptions
. - The names used in
-int-var
and-int-val
is taken from the mappings that Gecode/FlatZinc use for the MiniZinc branching options. - It seems that these extra options must be stated after all of the builtin options from the (standard) Options class.
- This is another finding: When adding an option and you want to use it (say) in the print function, don't forget to add it to the clone constructor. Here is the constructor:
// // Constructor for cloning s // DeBruijn(bool share, DeBruijn& s) : Example(share,s), n(s.n), m(s.m), base(s.base), print_matrix(s.print_matrix), int_var(s.int_var), int_val(s.int_val) { x.update(*this, share, s.x); binary.update(*this, share, s.binary); bin_code.update(*this, share, s.bin_code); // gcc.update(*this, share, s.gcc); }
- In the model, there are a (commented) extra constraint which states that the occurrences of the different values in the de Bruijn sequence should be the same, using the global cardinality constraint
count
. I will investigate how to make this constraint optional in the way I want.
Other implementations
Here are my other constraint programming implementations of de Bruijn sequences:- MiniZinc: debruijn_binary.mzn
- Comet : debruijn.co
- Choco : DeBruijn.java
- JaCoP : DeBruijn.java
- Gecode/R: debruijn_binary.rb
See also
- My swedish blog post about arbitrary length sequences de Bruijn-sekvenser av godtycklig längd (google-translated: de Bruijn sequences of arbitrary length).
- My swedish blog post presenting the classic de Bruijn sequences programs: de Bruijn-sekvenser av godtycklig längd (translated: de Bruijn sequences (portkod problem))
- Stefan Geens discuss de Bruijn sequences further: The de Bruijn Code.