### Symbolic Regression with JGAP - further improvements: minNodes, alldifferent, ForLoopD

In Symbolic Regression with JGAP - some improvements I mentioned some small improvements that would be nice to have in my Symbolic Regression program:

- constraint that a program has at least
`minNodes`

nodes (akin to the existing`maxNodes`

This have been implemented with the option`minNodes`

.

- constraint that the variables in a program should be unique.

This have been implemented with the option`alldifferentVariables`

.

I talked there about building a node validator that restricted the programs with these constraints. However, a better way - and more genetic programming-ish - is to "penalty" programs that do not satisfy these restrictions. And this is the way I have taken.

The new options are both used in the recreational problem by Richard Wiseman (Friday puzzle 2010-02-26) of finding an equation with result 24 using the numbers `5, 5, 5, 1`

exactly once, and the four arithmetic operators (+,-,*,/). Richard Wiseman's solution can be read Answer to the Friday puzzle…. (2010-03-01).

The problem is modeled in number_puzzle4.conf. Here is the configuration file, where the new options are marked in bold (see below for the `ForLoopD`

option).

`presentation: Puzzle`

return_type: DoubleClass

num_input_variables: 4

variable_names: a b c d e

# With ForLoopD

# functions: Multiply,Divide,Add,Subtract,**ForLoopD**

functions: Multiply,Divide,Add,Subtract

# We don't use any numeric terminals

no_terminals: true

max_init_depth: 4

population_size: 1000

max_crossover_depth: 4

num_evolutions: 400

max_nodes: 7

**min_nodes: 7 100**

**alldifferent_variables: true 100**

show_similiar: true

similiar_sort_method: length

data

5 5 5 1 24

Here we require that there ought to be minimum of 7 nodes (as well as maximum number of nodes), i.e. the 4 variables (`a, b, c, d`

), and 3 operators between them. If a program has less number of nodes, then we "penalty" the program with 100 (the second value) points (errors). Note that there is no guarantee that the constraint is held, just quite probable with this large penalty.

The other option, `alldifferentVariables`

, is used in the same way: If there is an variable in the program that has already been use, we penalty it by 100 (the second value) points.

Also, I had increased the number of evolution to 400 (from 100) because of these constraints.

With these new options the required solution to the problem is found rather easy, though maybe not on each run. Remember that a=5, b=5, c=5, and d = 1, and the target is the number 24.

(c - (d / a)) * b

b * (c - (d / a))

The numeric solution of the problem is `5*(5-1/5)`

, and the two programs are just permutations of this solutions.

### Further experiments

As an experiment I also set both the`minNodes`

and `maxNodes`

to 8 and ran again.
```
min_nodes: 8
max_nodes: 8
```

Since there can be no solution with 8 nodes there must be some penalty. Then the following solutions came, all with an error of 100, the penalty for not been minimum 8 nodes. The variables are, however, all different as they should so there is no penalty.

`All solutions with the best fitness (100.0):`

Sort method: occurrence

(a - (d / c)) * b [42831]

b * (a - (d / c)) [28531]

(b - (d / c)) * a [72]

a * (b - (d / c)) [9]

b * (c - (d / a)) [9]

(a - (d / b)) * c [2]

It was 6 different solutions with fitness 100.0

It is interesting that there are more solutions with the constraint of 8 nodes than with 7.

Increasing the the min, and max number of nodes to 9 and 9, respectively, then there are solutions with the stated number of nodes. But now there is a penalty of 100 for not been all different, and they are - of course - not a real solution to the problem.

All solutions with the best fitness (100.0):

Sort method: occurrence

(c * b) - ((c / d) / a) [17588]

(c - (d - (b * b))) - a [2]

It was 2 different solutions with fitness 100.0

### Another example: 1 2 3 4 5

Yet another example using the same configuration file is the following problem, i.e. the result should be 5 using the numbers 1, 2, 3, and 4 and the four operators.`data`

1 2 3 4 5

One run give the following 46 solutions with 0 errors. The number in [] is the number of found occurrences of the specific solution.

`All solutions with the best fitness (0.0):`

Sort method: occurrence

d - (a / (b - c)) [70602]

(c + d) - (a * b) [22871]

(c - (b * a)) + d [8724]

c + (d / (b * a)) [3109]

d - (b - (c * a)) [116]

((d - b) * a) + c [107]

(d - b) + (a * c) [58]

(c - b) + (a * d) [40]

(d * b) - (c * a) [35]

((c / b) * d) - a [24]

(c - b) * (d + a) [24]

c + ((a / b) * d) [19]

(b * d) - (c / a) [16]

d + ((c - a) / b) [15]

(d + a) / (c - b) [15]

(d + c) - (b / a) [13]

(c - b) + (d / a) [12]

(c / a) + (d - b) [8]

(b * d) - (a * c) [8]

(d * a) - (b - c) [8]

(d / (a * b)) + c [8]

(a * d) - (b - c) [7]

(a * d) + (c - b) [7]

(a + d) * (c - b) [6]

(a * c) + (d / b) [6]

(c / a) + (d / b) [5]

(c + d) - (b * a) [4]

(d + c) - (a * b) [4]

(c + (d / b)) * a [4]

(a + d) / (c - b) [4]

(d + a) * (c - b) [3]

(d / a) - (b - c) [3]

(b * d) - (c * a) [2]

((d / b) * c) - a [2]

(d / b) + (c / a) [2]

(d * a) + (c - b) [1]

((c - b) / a) + d [1]

c + (d - (a * b)) [1]

(c * a) - (b - d) [1]

(a * c) - (b - d) [1]

((d * a) + c) - b [1]

(d * b) - (a * c) [1]

(d + c) - (b * a) [1]

(d / a) + (c - b) [1]

a + ((c - b) * d) [1]

((c + d) - b) / a [1]

It was 46 different solutions with fitness 0.0

Here is much more solution, which indicates that it is a simpler program than the above.

## ForLoopD

I have also implemented a double version of JGAP's existing`ForLoop`

, which also can be used in this program. (This was done by copying the code in the JGAP distribution, org.jgap.gp.function.ForLoop, and then do some small changes.)
The logic of this function is to create a for loop and for each loop add the result of the code in the body of the loop ("some code") to the final result which is then returned as a value of the loop. In a normal programming language this should be coded like this. The number of loops (the variable `a`

) is dynamic selected.

` double x = 0.0d;`

for(int i=0;i<a;i++) { x += some code }

return x;

As a test, I added `ForLoopD`

to the function list in the `5 5 5 1 24`

problem (see the configuration above):

functions: Multiply,Divide,Add,Subtract,**ForLoopD**

One solution is the following with 0 errors:

for(int i=0;i<b;i++) { (c - (d / a)) }

Which is - of course - just another way of stating the following solution:

b*(c - (d / a))

Well, I have to see if this function is of any real use...