My Constraint Programming Blog: de Bruijn sequences in Gecode (and other systems)

First, the Gecode model of de Bruijn sequences that is refered below: debruijn.cpp.

Update 2009-03-30: Thanks to Mikael Zayenz Lagerkvist, I have fixed/updated/added some things. These comments are last in this post.

Introduction

I have been fascinated by de Bruijn sequences (Wikipedia) for years, and made some web based programs:

de Bruijn sequence, "classic" version, CGI version
de Bruijn sequence, "classic" version, Java version
de Bruijn arbitrary sequences, "arbitrary" version, CGI (not using constraint programming approach)

Explanation of the terms "classic" and "arbitrary" (which is not standard vocabulary):

"classic" de Bruijn sequence: the sequence length is base^n (n is the number of bits),
"arbitrary" sequence: where the sequence length is arbitrary.

Modeling de Bruijn sequences is also one of the "standard" problems I always model when learning a new constraint programming system. See below for the implementation in other systems.

Principle used

The basic principle in generating de Bruijn sequences used in this model is the following. Note: The names for the parameters base, n, m, and bin_code are perhaps unfortunate and confusing, but are kept since they are used in all my other implementations (see below).

Given:

a base (parameter base)
number of bits (n)
length of sequence (m)

Then:

make a list of distinct integers in the range 0..(base^n)-1. This array is called x in the model. These are the nodes in a de Bruijn graph. The goal of this model is to find nodes that really are "de Bruijn nodes".
calculate the "bit sequence" (in base base) for each integer. This is a matrix with m rows and n columns, here called binary.
apply the de Bruijn condition for each consecutive integers, i.e. the first elements in binary[r] is the same as the last elements in binary[r-1], and also "around the corner".
the de Bruijn sequence is then the first element in each row, here called bin_code.

For "classic" de Bruin sequences this is really overkill, but it makes it possible to calculate sequences of arbitrary lengths.

Here is a simple run of the program with the following command line (see below for a discussion of the options):

debruijn.exe -solutions 1 -base 13 -n 4 -m 52 -print-matrix 1 -int-var smallest -int-val indomain-min

Result:

DeBruijn
base: 13
number of bits (n): 4
length of sequence (m): 52

x:{0, 1, 13, 169, 2197, 2, 26, 338, 4394, 3, 39, 507, 6591, 4, 52, 676, 8788, 5, 65, 845, 10985, 6, 78, 1014, 13182, 7, 91, 1183, 15379, 8, 104, 1352, 17576, 9, 117, 1521, 19773, 10, 130, 1690, 21970, 11, 143, 1859, 24167, 12, 156, 2029, 26389, 325, 4225, 26364}
de Bruijn sequence{0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 4, 0, 0, 0, 5, 0, 0, 0, 6, 0, 0, 0, 7, 0, 0, 0, 8, 0, 0, 0, 9, 0, 0, 0, 10, 0, 0, 0, 11, 0, 0, 0, 12, 0, 1, 12}
binary matrix:
0 0 0 0
0 0 0 1
0 0 1 0
...
0 1 12 0
1 12 0 0
12 0 0 0


Initial
        propagators:  312
        branchings:   3

Summary
        runtime:      0.020 (20.000000 ms)
        solutions:    1
        propagations: 1119
        nodes:        16
        failures:     0
        peak depth:   15
        peak memory:  100 KB

For calculating the standard "keypad problem" (i.e. base = 10, n = 4, and m = 10000), this model takes too much memory for my computer (2 Gb RAM, a Linux 3.4MHz dual core). There are probably some things to make this model more efficient. Update: See below how to handle this.

However, calculating the same base and n with a length of 1000 is fast:

Initial
        propagators:  6000
        branchings:   3

Summary
        runtime:      1.660 (1660.000000 ms)
        solutions:    1
        propagations: 138390
        nodes:        253
        failures:     0
        peak depth:   252
        peak memory:  42651 KB

Findings

Here are some findings when modeling this Gecode model.

* `Matrix<IntVarArray>`

One of things that took the longest time to do what the "channeling" from the integer array x and the matrix binary. The reason was simply that I haven't read the documentation how to use the Matrix wrapper. The order in the call is columns, rows, not the other way around which I assumed. Here is how to use it (a simple example):


  ...


  IntVarArray some_array(*this, number_of_columns*number_of_rows, 0, 100); 

  Matrix matrix_version(some_array, number_of_columns, number_of_rows);



  ...

For accessing the matrix, we - of course - use the same order:
matrix_version(column, row);.

* Options

This model has a lot of different parameters to play with:

base
n
m
print_matrix (printing the binary matrix)
int-var IntVarBranch
int-val IntValBranch

Testing different variants via re-compilation is simply not feasible. Hence I wrote a subclass of Option for this, with initial help from Mikael Zayenz Lagerkvist. The most interesting part is handling different branching options (see the model debruijn.cpp how it is implemented). The default option branching is not enough since both IntVarBranch and IntValBranch should be possible to change via the command line. (A simpler version using the branching option is in donald_gerald_robert.cpp.)

Here is the result of running the program with the -help option showing the different options (the standard options is removed):

...
       -base (unsigned int) default: 2
                base to use
        -n (unsigned int) default: 3
                number of bits to use.
        -m (unsigned int) default: 0
                length of the sequence.
        -print-matrix (unsigned int) default: 0
                1 prints the binary matrix.
        -int-var (input-order, first-fail, anti-first-fail, smallest, largest, occurrence, max-regret) default: smallest
                options for IntVarBranch
                  input-order: use VAR_NONE
                  first-fail: use VAR_SIZE_MIN
                  anti-first-fail: use VAR_SIZE_MAX
                  smallest: use VAR_MIN_MIN
                  largest: use VAR_MAX_MAX
                  occurrence: use VAR_DEGREE_MAX
                  max-regret: use VAR_REGRET_MIN_MAX
        -int-val (indomain-min, indomain-max, indomain-median, indomain-split) default: indomain-min
                options for IntValBranch
                  indomain-min: use VAL_MIN
                  indomain-max: use VAL_MAX
                  indomain-median: use VAL_MED
                  indomain-split: use VAL_SPLIT_MIN

Some notes and other findings about this:

If no -m (the sequence length) is stated, it is defaulted to base^n. This is done in the class DeBruijnOptions.
The names used in -int-var and -int-val is taken from the mappings that Gecode/FlatZinc use for the MiniZinc branching options.
It seems that these extra options must be stated after all of the builtin options from the (standard) Options class.

This is another finding: When adding an option and you want to use it (say) in the print function, don't forget to add it to the clone constructor. Here is the constructor:

  //
  // Constructor for cloning s
  //
  DeBruijn(bool share, DeBruijn& s) : Example(share,s),
                                      n(s.n), 
                                      m(s.m),
                                      base(s.base),
                                      print_matrix(s.print_matrix),
                                      int_var(s.int_var),
                                      int_val(s.int_val)
  {
    x.update(*this, share, s.x);
    binary.update(*this, share, s.binary);
    bin_code.update(*this, share, s.bin_code);
    // gcc.update(*this, share, s.gcc);
  }

In the model, there are a (commented) extra constraint which states that the occurrences of the different values in the de Bruijn sequence should be the same, using the global cardinality constraint count. I will investigate how to make this constraint optional in the way I want. Update:: See below for more about this.

Other implementations

Here are my other constraint programming implementations of de Bruijn sequences:

My swedish blog post about arbitrary length sequences de Bruijn-sekvenser av godtycklig längd (google-translated: de Bruijn sequences of arbitrary length).
My swedish blog post presenting the classic de Bruijn sequences programs: de Bruijn-sekvenser av godtycklig längd (translated: de Bruijn sequences (portkod problem))
Stefan Geens discuss de Bruijn sequences further: The de Bruijn Code.

Update 2009-03-30
Thanks to Mikael Zayenz Lagerkvist there is a new version of debruijn.cpp. The improvements are:

It now compiles clean with gcc -Wall. For some reason I have forgotten to include that option in my Makefile.
There is a a new option -print_x which print x, i.e. the array of integers in the de Bruijn graph. The default it is now off (0).

This version always calculates the occurrences of elements in the de Bruijn sequence (the bin_code array), which is handled by a new IntVarArray variable gcc (for "global cardinality constraint").

There is also a new option, -same-occurrences, which - when set to 1 - requires that all elements in the de Bruijn sequence should be the same.

An example: Here is a problem where base=13, n=4 and m=52 and -same-occurrence 1

de Bruijn sequence: {0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 12, 12, 12, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1}
gcc: {4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4}


Initial
        propagators:  314
        branchings:   4

Summary
        runtime:      0.040 (40.000000 ms)
        solutions:    1
        propagations: 2516
        nodes:        46
        failures:     0
        peak depth:   45
        peak memory:  254 KB

Exhaused memory

I wrote above that this problem:
base=10
n=4 (number of bits) and
m = 10000 (length)
required too much memory.

This was fixed by incrementing the options -c-d ("recomputation commit distance") and -a-d ("recomputation adaption distance"). Mikael's suggestion was to use the following values:
-c-d 256 -a-d 8

With these values set, the problem took about 5:14 minutes on my computer (3.4MHz dual core, Mandriva Linux and 2Gb RAM) and required only about 400 Mb (before I had to abort since the memory was exhausted). Here is a result of running debruijn.exe -base 10 -n 4 -c-d 256 -a-d 8
```
Initial
        propagators:  60002
        branchings:   4

Summary
        runtime:      5:12.300 (312300.000000 ms)
        solutions:    1
        propagations: 12793015
        nodes:        2529
        failures:     0
        peak depth:   2528
        peak memory:  346288 KB
debruijn.exe -base 10 -n 4 -c-d 256 -a-d 8  310,60s user 1,92s system 99% cpu 5:13,53 total
```
I also tested with debruijn.exe -base 10 -n 4 -c-d 1256 -a-d 18, i.e. by just adding a "1" before each of the -c-d and -a-d values. The result was:
```
Initial
        propagators:  60002
        branchings:   4

Summary
        runtime:      5:05.150 (305150.000000 ms)
        solutions:    1
        propagations: 12793015
        nodes:        2529
        failures:     0
        peak depth:   2528
        peak memory:  61039 KB
debruijn.exe -base 10 -n 4 -c-d 1256 -a-d 18  303,65s user 1,66s system 99% cpu 5:05,62 total
```
The interesting part is that it required just about 20% of the memory. I have to study these things more.
The two options -c-d, and -a-d 8are documented here.

Later note: See more about these options in Mikael's comment below (which I didn't see when writing the update).

As usual, thanks Mikael!

Comments

To manage the memory-consumption of a Gecode-model, the -c-d and -a-d parameters are very useful. Their meaning are explained in Section 6.2.6 of Modeling with Gecode.

For the -base 10 -n 4 -m 1000 problem on a 64-bit machine, using -c-d 64 brought down memory consumption from 60 MB to 8.5 MB without increasing the run-time. The large depth of the search-tree (252 peak depth) is an indication that the commit-distance can be increased. Since no failures occur, the commit-distance can be safely increased to an arbitrary amount if only one solution is required.

For the real problem (m=10000) -c-d 256 gives the solution in about 8 minutes using around 500 MB of memory.

Posted by: Mikael Zayenz Lagerkvist | March 30, 2009 11:35 AM

Not that I really think that it would be of very much interest to you but I've translated the C code at http://www.theory.csc.uvic.ca/~cos/inf/neck/NecklaceInfo.html to Ruby at http://alicebobandmallory.com/articles/2009/09/23/why-you-should-use-four-different-digits-for-keypad-locks

Posted by: Jonas Elfström | September 23, 2009 09:30 PM

My Constraint Programming Blog

de Bruijn sequences in Gecode (and other systems)

Introduction

Principle used

Findings

* `Matrix<IntVarArray>`

* Options

Other implementations

See also

Comments

Post a comment

My Constraint Programming Blog

de Bruijn sequences in Gecode (and other systems)

Introduction

Principle used

Findings

* Matrix<IntVarArray>

* Options

Other implementations

See also

Comments

Post a comment

* `Matrix<IntVarArray>`