Arrays in Flux: Experimenting with Eureqa's API

In my earlier post, Eureqa version 0.78beta released, I mentioned that there is an API for connecting to the Eureqa server. I have now tested it, and it is really nice.

Installation

I followed the steps in Getting Started on Linux or Mac (the Windows variant is here). Here are some comments and findings during this installation and preparation step.

Before starting anything Eureqa related, I had to install a newer version of the Boost library, since Eureqa requires version 1.42.0. It took about half an hour, but there were no problems during this step.

The Eureqa API archive must then be downloaded and unpacked.

After these preliminaries, I first tested the simplest example: minimal_client. Unfortunately it didn't work out of the box on my Mandriva Linux machine, and I had to add two things to its rule in the Makefile:

minimal_client: minimal_client.o
	g++ minimal_client.o \
	$(BOOST_LIBRARY_PATH)libboost_thread.a \
	$(BOOST_LIBRARY_PATH)libboost_system.a \
	$(BOOST_LIBRARY_PATH)libboost_serialization.a \
	-o minimal_client -lpthread

The Makefile for the other example, basic_client, already has these lines and worked without any problems.

Before running the program, a running Eureqa standalone server is needed. It can be downloaded from Eureqa's download page, or taken from the directory ./server in the installed API archive. The real work is done in the Eureqa server. The client program first tells the server the conditions of the run (which data, variables, and functions to use), and later asks the server for new/better solutions, which are then presented by the client program.

To start the server:
./eureqa_server &

Now we are ready to start the minimal_client program. This example reads the data file ../data_sets/default_data.txt (it seems to be the same as the default data set in the Eureqa GUI).

./minimal_client

Here are the first lines of output from the program. If you have run the GUI version of Eureqa (which is really recommended), you will recognize most of this output.

Data: 100 data points, 3 variables
Options: "y = f(x)", 8 building-block types, Absolute Error fitness
Connection: Connected to 127.0.0.1
Server: xxxxxxxx, Eureqa 0.78 (linux), 2 CPU cores

0 generations, 1864 evaluations
Size:   Fitness:        Equation:
-----   --------        ---------
7       -1.4854         f(x) = -1.50204e-07 + sin(-1.50204e-07 + x)

39 generations, 764432 evaluations
Size:   Fitness:        Equation:
-----   --------        ---------
7       -1.4854         f(x) = -1.50204e-07 + sin(-1.50204e-07 + x)
1       -1.73044        f(x) = x

173 generations, 4.04115e+06 evaluations
304 generations, 7.28129e+06 evaluations
458 generations, 1.04966e+07 evaluations
Size:   Fitness:        Equation:
-----   --------        ---------
7       -1.4854         f(x) = -1.50204e-07 + sin(-1.50204e-07 + x)
1       -1.73044        f(x) = x
5       -1.61304        f(x) = sin(x/x)
...

A small issue: I don't understand why the fitness is negative here; an absolute error should never be negative. Maybe it's just a tiny presentation bug, with a misplaced "-"?

Example: Closed form of Fibonacci number

In order to test the API more, I tried one of the problems from Eureqa: Equation discovery with genetic programming, namely trying to find a closed form of the Fibonacci numbers.

The program eureqa_apitest1.cpp is based on the example eureqa_api_1_00_0/examples/minimal_client/minimal_client.cpp mentioned above. The changes are small, but some common options have been made explicit:

building_blocks
All the building blocks that are in the GUI client seem to be supported via the API; see building blocks for a full list. Instead of relying on the default building blocks, I stated them explicitly, added the functions power (a^b) and square root (sqrt(a)), and removed the sine and cosine functions.
options.building_blocks_.clear();
options.building_blocks_.push_back("a");       // variables
options.building_blocks_.push_back("a+b");     // adds
options.building_blocks_.push_back("a-b");     // subtracts
options.building_blocks_.push_back("a*b");     // multiplies
options.building_blocks_.push_back("a/b");     // divides
options.building_blocks_.push_back("a^b");     // power
options.building_blocks_.push_back("sqrt(a)"); // sqrt

Note that the names in the building blocks don't have to match the variable names in the data file.
search_relationship
The relationship, i.e. the formula we want to find, is stated in the same way as in the GUI: t1 = f(ix):
```
options.search_relationship_ = "t1 = f(ix)";
```
fitness_metric
Also, I stated the fitness metric (which happens to be the default):
options.fitness_metric_ = eureqa::fitness_types::absolute_error;

There are more fitness metrics to use, see Fitness Metric Identifiers.

Well, that's about it.

The program reads the file fib_38_ix.txt, which consists of the first 38 Fibonacci numbers together with their index (1..38). Note: in this problem we use just the first two variables in the file, ix and t1. The instances for 39..50 have been commented out to keep it simple.
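For reference, a data set like this could be generated with a few lines of C++. This is only a sketch: the exact layout of fib_38_ix.txt is not shown in this post, so the two whitespace-separated columns (index, then value) and the helper names `fibonacci_rows` and `to_text` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Returns the pairs (ix, t1) for ix = 1..count, where t1 is the ix-th
// Fibonacci number with F(1) = F(2) = 1.
std::vector<std::pair<int, std::uint64_t>> fibonacci_rows(int count) {
    // F(93) is the largest Fibonacci number that fits in 64 unsigned bits.
    assert(count >= 0 && count <= 93);
    std::vector<std::pair<int, std::uint64_t>> rows;
    std::uint64_t prev = 0, curr = 1;  // F(0), F(1)
    for (int ix = 1; ix <= count; ++ix) {
        rows.emplace_back(ix, curr);
        const std::uint64_t next = prev + curr;
        prev = curr;
        curr = next;
    }
    return rows;
}

// Formats the rows as "ix t1" lines (assumed layout, not the real file format).
std::string to_text(const std::vector<std::pair<int, std::uint64_t>>& rows) {
    std::ostringstream out;
    for (const auto& row : rows) {
        out << row.first << ' ' << row.second << '\n';
    }
    return out.str();
}
```

Calling `to_text(fibonacci_rows(38))` would give the 38 rows used in this experiment, ending with the value 39088169 for ix = 38.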

The objective is to find the closed form of the Fibonacci numbers, which is usually stated as:
(phi^n - (1-phi)^n)/sqrt(5)
where phi = (1+sqrt(5))/2 ~ 1.61803 (the golden ratio) and sqrt(5) ~ 2.2361.

See Fibonacci_number#Closed_form_expression (Wikipedia) for more about this.
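As a quick sanity check of the closed form, here is a minimal C++ sketch that evaluates it in double precision and compares it with a plain iterative computation. The function names `fib_closed_form` and `fib_iterative` are my own and have nothing to do with the Eureqa API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Closed form: F(n) = (phi^n - (1 - phi)^n) / sqrt(5).
double fib_closed_form(int n) {
    const double sqrt5 = std::sqrt(5.0);
    const double phi = (1.0 + sqrt5) / 2.0;
    return (std::pow(phi, n) - std::pow(1.0 - phi, n)) / sqrt5;
}

// Iterative reference implementation with F(0) = 0, F(1) = 1.
std::uint64_t fib_iterative(int n) {
    assert(n >= 0 && n <= 93);  // F(93) still fits in 64 unsigned bits
    std::uint64_t prev = 0, curr = 1;
    for (int i = 0; i < n; ++i) {
        const std::uint64_t next = prev + curr;
        prev = curr;
        curr = next;
    }
    return prev;  // after n steps, prev holds F(n)
}
```

For the 38 indices used here, double precision is more than enough: rounding `fib_closed_form(n)` with `std::llround` reproduces the exact values, for example 55 for n = 10.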

Here is one result (the 6 best solutions) from running the program for a couple of minutes. Since the program doesn't have any stop criterion, it will run forever unless it is stopped manually.

    Size:   Fitness:        Equation:
    -----   --------        ---------
    7       -104.178        f(ix) = 1.61808^(ix - 1.67436)
    9       -103.999        f(ix) = 1.61808^(ix - 1.67436) + 1.61808
    11      -101.371        f(ix) = 1.61808^(ix - 1.67436) + ix - 1.67436
    5       -79382.2        f(ix) = 1.58323^ix
    1       -2.55834e+06    f(ix) = ix
    3       -2.53729e+06    f(ix) = ix/0.00018853

The first solution in the list has a fitness error of about 104: 1.61808^(ix - 1.67436).
Note the constant 1.61808, which is quite close to phi (1.61803).
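The exponent offset 1.67436 may have a similar interpretation. For larger n the term (1-phi)^n in the closed form becomes negligible, so F(n) is roughly phi^n/sqrt(5), which can be rewritten as phi^(n - k) with k = log_phi(sqrt(5)). The small sketch below computes that k; the helper names `log_base` and `binet_exponent_offset` are mine, and reading the evolved offset this way is my own interpretation, not something Eureqa reports.

```cpp
#include <cassert>
#include <cmath>

// Logarithm of x in the given base, via the change-of-base rule.
double log_base(double base, double x) {
    assert(base > 0.0 && base != 1.0 && x > 0.0);
    return std::log(x) / std::log(base);
}

// Offset k such that phi^(n - k) == phi^n / sqrt(5), i.e. k = log_phi(sqrt(5)).
double binet_exponent_offset() {
    const double sqrt5 = std::sqrt(5.0);
    const double phi = (1.0 + sqrt5) / 2.0;
    return log_base(phi, sqrt5);
}
```

The result is roughly 1.6723, which is close to the evolved offset 1.67436. So the solution looks like an approximation of the dominant term of the closed form.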

When rounded, this solution gives the following results for ix = 1..38. It is correct for the first 15 numbers (1..15) but then deviates; for some of the entries the correct value is shown in parentheses.
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 986(987), 1596(1597), 2583(2584), 4179(4181), 6762(6765), 10941(10946), 17703(...), 28646, 46351, 75000, 121355, 196362, 317730, 514113, 831876, 1346042, 2178003, 3524183, 5702410, 9226955, 14929952, 24157857, 39089345

The correct sequence is:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169

Here is the deviation from the correct sequence:
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, -2, -3, -5, -8, -11, -17, -25, -38, -56, -81, -116, -164, -227, -306, -395, -477, -510, -400, 40, 1176
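The comparison above can be reproduced with a short C++ sketch. It uses the constants exactly as printed in the solution list, so the later values may differ slightly from the ones listed if the run used more digits internally. All function names here are my own.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// The evolved expression, with the constants as printed in the output.
double evolved_solution(int ix) {
    return std::pow(1.61808, ix - 1.67436);
}

// Exact Fibonacci numbers F(1)..F(count), with F(1) = F(2) = 1.
std::vector<long long> fibonacci(int count) {
    assert(count >= 0 && count <= 90);  // stays within the range of long long
    std::vector<long long> values;
    long long prev = 0, curr = 1;
    for (int ix = 1; ix <= count; ++ix) {
        values.push_back(curr);
        const long long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return values;
}

// Rounded evolved solution minus the exact value, for ix = 1..count.
std::vector<long long> deviations(int count) {
    const std::vector<long long> exact = fibonacci(count);
    std::vector<long long> dev;
    for (int ix = 1; ix <= count; ++ix) {
        dev.push_back(std::llround(evolved_solution(ix)) - exact[ix - 1]);
    }
    return dev;
}

// Mean of the absolute deviations.
double mean_abs(const std::vector<long long>& dev) {
    assert(!dev.empty());
    double sum = 0.0;
    for (long long d : dev) {
        sum += static_cast<double>(std::llabs(d));
    }
    return sum / static_cast<double>(dev.size());
}
```

Feeding the 38 deviations listed above into `mean_abs` gives 4060/38, or about 106.8, which is of the same order as the reported fitness of about 104. This suggests, though it is only a guess, that the fitness column shows the negated mean absolute error; the small difference could stem from the rounding of the values and of the printed constants.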

Maybe this is the wrong track, but it is nice to see the solutions evolve, which is one advantage of symbolic regression and of genetic programming in general.

Documentation

The API documentation is well structured, and all pages have small examples, which makes it easy to start programming. Later experiments may require some reading in the included C++ header files.

Some useful pages:

Other comments

I will continue experimenting with Eureqa and its API by writing a more general program, etc. However, it will probably not be as general as my JGAP symbolic regression program.

Also, see my Eureqa page.