Symbolic Regression with JGAP - some improvements
The SymbolicRegression program (using JGAP in Java) has been updated with some improvements.
New configuration options
Some of these new options are explained in the examples below.
- show_similar: Alternative name of show_similiar.
- similiar_sort_method: Method of sorting the similar solutions when using show_similiar, which shows all solutions that have the same fitness value as the best solution found. Alternative name: similar_sort_method. Valid options are:
  - occurrence: descending number of occurrences (default)
  - length: length of solutions (ascending)
- error_method: Error method to use. Valid options are:
  - totalError: sum of (absolute) errors (default)
  - minError: minimum error
  - meanError: mean error
  - medianError: median error
  - maxError: max error
- no_terminals: If true, no Terminal is used, i.e. no numbers, just variables. Default: false.
- make_time_series: Make a time series of the first line of data. The value of num_input_variables determines the number of lags (+1 for the output variable). See below for some examples.
- make_time_series_with_index: As make_time_series, with an extra input variable for the index of the series. (Somewhat experimental.)
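The five error_method values can be illustrated with a small Java sketch. It is not taken from the SymbolicRegression source: the class name ErrorMethods and its two methods are hypothetical, and averaging the two middle values for an even number of fitness cases is an assumption about how the median might be computed.

```java
import java.util.Arrays;

/** Hypothetical helper: aggregates per-case absolute errors in the five listed ways. */
final class ErrorMethods {

    /** Absolute error |expected - predicted| for each fitness case. */
    static double[] absoluteErrors(double[] expected, double[] predicted) {
        double[] errors = new double[expected.length];
        for (int i = 0; i < expected.length; i++) {
            errors[i] = Math.abs(expected[i] - predicted[i]);
        }
        return errors;
    }

    /** Aggregates the errors; the method names mirror the error_method option values. */
    static double aggregate(String method, double[] errors) {
        if (errors.length == 0) {
            throw new IllegalArgumentException("no fitness cases");
        }
        switch (method) {
            case "totalError":
                return Arrays.stream(errors).sum();
            case "minError":
                return Arrays.stream(errors).min().getAsDouble();
            case "meanError":
                return Arrays.stream(errors).average().getAsDouble();
            case "maxError":
                return Arrays.stream(errors).max().getAsDouble();
            case "medianError": {
                double[] sorted = errors.clone();
                Arrays.sort(sorted);
                int mid = sorted.length / 2;
                // Assumption: for an even count, average the two middle values.
                return sorted.length % 2 == 1
                        ? sorted[mid]
                        : (sorted[mid - 1] + sorted[mid]) / 2.0;
            }
            default:
                throw new IllegalArgumentException("unknown error_method: " + method);
        }
    }

    public static void main(String[] args) {
        double[] expected = {1.0, 2.0, 3.0, 4.0};
        double[] predicted = {1.0, 2.5, 2.0, 6.0};
        double[] errors = absoluteErrors(expected, predicted); // {0.0, 0.5, 1.0, 2.0}
        String[] methods = {"totalError", "minError", "meanError", "medianError", "maxError"};
        for (String method : methods) {
            System.out.println(method + " = " + aggregate(method, errors));
        }
    }
}
```

A single large outlier dominates totalError and maxError, while medianError largely ignores it, which is one reason to try different settings on noisy data.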
New examples
Some new examples have been published as well.
- leap_years.conf
This example tries to figure out how to calculate the leap years. See Leap year (Wikipedia) for more on leap years.
The fitness cases consist of all years 1890..2030, and 1200, 1300, 1400, 1500, 1600, 1700, and 1800.
The functions used are:
Multiply,Divide,Add,Subtract,ModuloD,IfElseD
where IfElseD may be replaced with IfLessThanOrEqualD, or removed completely.
ModuloD is not the normal modulo operator. Instead it is a "protected modulo", where the arguments are first converted to integers and then taken modulo. However, if the second argument is 0 (zero), the result is 0 (zero). This function is represented as either modp or % below.
The program found a lot of solutions with error 1 (for year 1900).
Using IfLessThanOrEqualD:
if(y <= ((modp(y,(y / 471.0))) * (296.0 * y))) { (y - y) } else { (327.0 / 327.0) }
Without IfElseD:
(326.0 / (((((y - 536.0) % 536.0) + y) % (y / 226.0)) + 326.0)) % (283.0 % y)
(y / (((y * 654.0) % (24.0 % y)) + y)) % y
(y / (((y * (330.0 % y)) % (24.0 % y)) + y)) % y
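To make the "protected modulo" concrete, here is a minimal Java sketch written from the description above, not from the program's ModuloD source. The names ProtectedModulo, modp and candidate are hypothetical, and converting to integers by truncation (an int cast) is an assumption, since the text only says the arguments are converted to integers. The candidate method is the second solution found without IfElseD.

```java
/** Sketch of the protected modulo described above; not the program's actual ModuloD source. */
final class ProtectedModulo {

    /** Truncates both arguments to int; returns 0 if the divisor is 0, else the remainder. */
    static double modp(double a, double b) {
        int x = (int) a; // assumption: conversion by truncation
        int y = (int) b;
        if (y == 0) {
            return 0.0; // the "protected" case: no division by zero
        }
        return x % y;
    }

    /** The solution (y / (((y * 654.0) % (24.0 % y)) + y)) % y, with % read as modp. */
    static double candidate(double y) {
        return modp(y / (modp(y * 654.0, modp(24.0, y)) + y), y);
    }

    public static void main(String[] args) {
        for (int year : new int[] {1900, 1996, 2000, 2001}) {
            System.out.println(year + " -> " + candidate(year));
        }
    }
}
```

Under these assumptions the expression yields 1.0 for 1996 and 2000 and 0.0 for 2001. It also yields 1.0 for 1900, which is not a leap year, so if an output of 1 encodes "leap year" this is the error for 1900 noted above.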
- number_puzzle4.conf
Number puzzle inspired by Richard Wiseman's It's the Friday Puzzle (2010-02-26). The problem is to find the result 24 from the numbers 5,5,5,1 and the operators +,-,*,/. However, the requirement that the numbers should be used exactly once is not enforced here. (It would be quite useful to have this kind of "global function", requiring that all variables be different, or used exactly once, etc. Compare with "global constraints" in constraint programming.)
Note also that this configuration uses only one fitness case and lets the program find any solution that satisfies the equation. It also uses the new option no_terminals for using just variables (no Terminal numbers), which was implemented for this example.
Here is a result from a sample run. The number in [] is the number of occurrences of the specific program. In this example we also see the new option similiar_sort_method: length at work, which sorts the similar solutions according to length (normally they are sorted on the number of occurrences). The variables in the solutions mean: a = 5, b = 5, c = 5 and d = 1.
All solutions with the best fitness (0.0):
Sort method: length
(b * c) - d [5]
(a * c) - d [4162]
(b * b) - d [4]
(c * a) - d [251]
(a * a) - d [10]
(c * c) - d [424]
(c * b) - d [1]
(b * a) - d [36]
(c - d) * (a + d) [1]
(b * a) - (b / c) [121]
(b * a) - (a / c) [2]
(c * b) - (c / c) [5]
(b * b) - (a / a) [3]
(c * a) - (b / b) [2]
(a * c) - (d * d) [633]
(a - d) * (d + b) [4]
(c * b) - (a / c) [1]
(a * b) - (c / b) [2]
(c * c) - (b / b) [1]
It was 19 different solutions with fitness 0.0
None of these are a solution to Wiseman's puzzle.
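The two sort methods used for the listing above can be sketched in Java as follows. This is an illustration only: the class SimilarSort is hypothetical, and taking "length" to mean the length of the printed expression is an assumption (the program may well count nodes instead). Ties keep their original order because List.sort is stable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the two sort methods; the program's own implementation may differ. */
final class SimilarSort {

    /** Returns the solutions ordered by the given sort method ("occurrence" or "length"). */
    static List<String> sorted(Map<String, Integer> occurrences, String method) {
        List<String> solutions = new ArrayList<>(occurrences.keySet());
        if ("length".equals(method)) {
            // Assumption: "length" is the length of the printed expression, ascending.
            solutions.sort(Comparator.comparingInt(String::length));
        } else {
            // Default ("occurrence"): most frequent solution first.
            solutions.sort(
                    Comparator.comparingInt((String s) -> occurrences.get(s)).reversed());
        }
        return solutions;
    }

    public static void main(String[] args) {
        // Three solutions and their occurrence counts, taken from the sample run above.
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        occurrences.put("(a * c) - (d * d)", 633);
        occurrences.put("(b * c) - d", 5);
        occurrences.put("(a * c) - d", 4162);

        System.out.println("length:     " + sorted(occurrences, "length"));
        System.out.println("occurrence: " + sorted(occurrences, "occurrence"));
    }
}
```

Sorting on length puts the shortest expressions first regardless of how often they were found, which is why (b * c) - d with 5 occurrences can precede (a * c) - d with 4162.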
Here we have limited the number of nodes with max_nodes: 7 (4 variables + 3 functions), but there is no standard option in JGAP to state the minimum number of nodes. However, with a "node validator" this could probably be done. I plan to experiment more with node validators for this kind of constraint and the "global functions" mentioned above.
- sunspots_timeseries.conf
Two versions of sunspots data using make_time_series. See below for more about this option.
- timeseries_test1.conf
Some other examples of make_time_series. See below.
- timeseries_dailyisbn.conf
Another time series example: the classic time series "Daily closing price of IBM stock, Jan 1, 1980 to Oct. 8, 1992", DAILYIBM.DAT from Rob J Hyndman's TSDL (Time Series Data Library).
make_time_series
The option make_time_series may require some explanation.
The following configuration file is all that is needed for the Fibonacci problem (in time series representation). Actually, the two lines in bold are the only ones needed, since the other options have defaults that would work well here.
make_time_series: true
num_input_variables: 4
terminal_range: -10 10
functions: Multiply,Divide,Add,Subtract
max_init_depth: 4
population_size: 100
num_evolutions: 100
max_crossover_depth: 8
max_nodes: 21
data
1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946,17711,28657,46368
The option make_time_series will then transform the data into a data set and then proceed as if the data set had been stated explicitly. Note: the SymbolicRegression program works with double, hence the somewhat unusual presentation.
The number of time lags is the number of input variables (num_input_variables) + 1 for the output variable; here 4 + 1 = 5 time lags. The program prints the transformed data first, i.e.:
Making timeseries, #elements: 24
1.0 1.0 2.0 3.0 5.0
1.0 2.0 3.0 5.0 8.0
2.0 3.0 5.0 8.0 13.0
3.0 5.0 8.0 13.0 21.0
5.0 8.0 13.0 21.0 34.0
8.0 13.0 21.0 34.0 55.0
13.0 21.0 34.0 55.0 89.0
21.0 34.0 55.0 89.0 144.0
34.0 55.0 89.0 144.0 233.0
55.0 89.0 144.0 233.0 377.0
89.0 144.0 233.0 377.0 610.0
144.0 233.0 377.0 610.0 987.0
233.0 377.0 610.0 987.0 1597.0
377.0 610.0 987.0 1597.0 2584.0
610.0 987.0 1597.0 2584.0 4181.0
987.0 1597.0 2584.0 4181.0 6765.0
1597.0 2584.0 4181.0 6765.0 10946.0
2584.0 4181.0 6765.0 10946.0 17711.0
4181.0 6765.0 10946.0 17711.0 28657.0
It was 19 data rows
And then, as mentioned above, the program proceeds as usual. See Symbolic regression (using genetic programming) with JGAP
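The transformation itself is a sliding window over the series. The Java sketch below shows the idea under stated assumptions: TimeSeriesWindows and makeTimeSeries are hypothetical names, not the program's own, and the loop keeps every full window.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of a sliding-window transform; an assumption about what make_time_series does. */
final class TimeSeriesWindows {

    /** Each row holds numInputVariables lagged values followed by the output value. */
    static List<double[]> makeTimeSeries(double[] series, int numInputVariables) {
        int width = numInputVariables + 1; // input variables + 1 output variable
        List<double[]> rows = new ArrayList<>();
        // Keep every window that fits completely inside the series.
        for (int start = 0; start + width <= series.length; start++) {
            rows.add(Arrays.copyOfRange(series, start, start + width));
        }
        return rows;
    }

    public static void main(String[] args) {
        double[] fibonacci = {
            1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987,
            1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368
        };
        List<double[]> rows = makeTimeSeries(fibonacci, 4);
        for (double[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
        System.out.println("rows: " + rows.size());
    }
}
```

Note that this plain version produces n - width + 1 rows, i.e. 20 rows for the 24 Fibonacci numbers, whereas the program output above lists 19 rows (the window ending in 46368 is not shown), so the program apparently stops one window earlier.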