happy-algorithms-league / hal-cgp
Cartesian genetic programming (CGP) in pure Python.
License: GNU General Public License v3.0
Test cases should cover at least:
Add type hints to all classes and functions and enable static type checking via mypy in CI.
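For illustration, the kind of annotations this asks for (the function below is made up, not from the codebase):

import typing

def mean_fitness(fitness_values: typing.List[float]) -> float:
    # mypy can now statically verify that callers pass a list of floats
    return sum(fitness_values) / len(fitness_values)

In CI, checking would then amount to running mypy on the package directory.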
For example, in test_node.py::test_add we use two inputs but only one row. pretty_str assumes that we have more rows than inputs and hence does not print the second input node. This should be fixed.
As reported by @mschmidt87, output is only printed right before exiting. Most likely we need to flush stdout to continuously see the progress (e.g., https://stackoverflow.com/questions/230751/how-to-flush-output-of-print-function#230774).
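A minimal fix, sketched with placeholder variable names:

# flush=True forces the output to appear immediately instead of only at exit
print(f"generation {generation}: best fitness {fitness}", flush=True)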
As an extension to the nodes currently implemented, add nodes that perform less-than (<) and greater-than (>) comparisons.
The figure used in the README is from our manuscript; we should hence add a reference as soon as we can cite it.
Currently, the caching decorator does not check whether the function to be cached is actually the same as the one that the cache was created for. Thus, it can happen that the decorator mistakes a different function for the cached function.
This is a minimal reproducer:
import gp

@gp.utils.disk_cache("example_caching_cache.pkl")
def f1(x):
    return x

@gp.utils.disk_cache("example_caching_cache.pkl")
def f2(x):
    return x**2

for i in range(10):
    print(f"x = {f1(i)}, x^2 = {f2(i)}")
yields
x = 0, x^2 = 0
x = 1, x^2 = 1
x = 2, x^2 = 2
x = 3, x^2 = 3
x = 4, x^2 = 4
x = 5, x^2 = 5
x = 6, x^2 = 6
x = 7, x^2 = 7
x = 8, x^2 = 8
x = 9, x^2 = 9
Proposed solution: let the decorator verify, for a cached function, that calling the function with the first cached argument yields the same result as the cached result.
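A minimal sketch of this check, assuming the cache is a pickled dict mapping a single argument to its result (the real decorator's internals may differ):

import os
import pickle

def disk_cache_with_check(fn):
    def decorator(func):
        cache = {}
        if os.path.exists(fn):
            with open(fn, "rb") as f:
                cache = pickle.load(f)
            if cache:
                # recompute the result for one cached argument; a mismatch
                # means the cache was created for a different function
                first_arg = next(iter(cache))
                if func(first_arg) != cache[first_arg]:
                    raise RuntimeError(f"{fn} was created for a different function")

        def wrapper(x):
            if x not in cache:
                cache[x] = func(x)
                with open(fn, "wb") as f:
                    pickle.dump(cache, f)
            return cache[x]

        return wrapper

    return decorator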
What to do with this class? Remove? Rename?
When a genome represents a constant function, e.g., f(x) = 1, the torch class generated by CGPGraph does not return the correct shape. I created a test for this on the bugfix/torch_constant_output_shape branch: https://github.com/jakobj/python-gp/blob/5a50c07876adf9e8a263b4e10c0e01c5b582fb68/test/test_cgp_graph.py#L161
This can probably be fixed by modifying the CGPConstantFloat.format_output_str_torch() method: https://github.com/jakobj/python-gp/blob/3bdff24be10ef503652b44b0be1606a639555627/gp/cgp_node.py#L191
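For reference, a stand-in module illustrating the required broadcast (this is not the code CGPGraph generates, only the behavior the fix should produce):

import torch

class ConstantModule(torch.nn.Module):
    def forward(self, x):
        # expand the scalar constant to one output per batch entry so the
        # output shape matches that of non-constant expressions
        return torch.ones(x.shape[0], 1, dtype=x.dtype)

f = ConstantModule()
assert f(torch.zeros(10, 1)).shape == torch.Size([10, 1])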
Currently the only way to introduce constants into the expressions is via the ConstantFloat or Parameter nodes. The values of the parameter nodes can be tuned by local search to specific values in order to increase the fitness of individuals.
However, there is a fundamental problem. As nodes can be used as inputs by more than one subsequent node, it can happen that a particular Parameter node feeds into two subsequent nodes. If one of them requires the parameter to be large and the other requires it to be small, the gradients point in opposite directions, thereby bringing the local search for this particular parameter to a halt, or at least slowing it down significantly. This can lead to difficult-to-escape local minima in the search landscape.
This can be avoided by additionally introducing parameters private to each node (e.g., Izzo et al., 2017). These private parameters are by definition not used by any other node and hence are never subject to "opposing" gradient information. For simplicity, a first implementation would introduce two parameters: one that scales the output of each node and one that shifts it. By default these would have the values 1.0 and 0.0, respectively, and hence not change the behavior of the node from the current implementation; see the sketch after the references below. I guess it could be helpful to introduce a mechanism that allows users to choose between updating only values of Parameter nodes, only private node parameters, or both. As a side remark, this would allow the evolution of simple ANNs by introducing nodes that only apply a non-linearity to their input (e.g., CGPANNs; Khan et al., 2013).
Izzo, D., Biscani, F., & Mereta, A. (2017). Differentiable Genetic Programming. In European Conference on Genetic Programming (pp. 35-51). Springer, Cham.
Khan, M. M., Ahmad, A. M., Khan, G. M., & Miller, J. F. (2013). Fast Learning Neural Networks using Cartesian Genetic Programming. Neurocomputing, 121, 274-289.
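A minimal sketch of the idea (names are illustrative, not the library's API):

class NodeWithPrivateParameters:
    # y = scale * f(inputs) + shift; the defaults leave the node's
    # behavior unchanged relative to the current implementation
    def __init__(self):
        self.scale = 1.0
        self.shift = 0.0

    def apply(self, raw_output):
        # scale and shift are owned by this node alone, so gradient
        # information from different consumers can never "oppose" itself
        return self.scale * raw_output + self.shift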
We need documentation for users, using some automated generation tool, e.g., Sphinx.
The to_func, to_torch, and to_sympy methods of the CGPGraph class simply call the respective compile_... methods and do not add any functionality. I therefore suggest removing them or renaming the compile_... methods.
Can be set up only after making the repository public.
Some of the tests use constructions like assert abs(1.0 - y[0]) < 1e-15. We should probably use pytest.approx in these for consistent error reporting.
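For example (pytest.approx accepts an absolute tolerance):

import pytest

# before
assert abs(1.0 - y[0]) < 1e-15
# after: on failure pytest reports the expected and actual values
assert y[0] == pytest.approx(1.0, abs=1e-15)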
Remove the history recording logic from hl_api.py and replace it with a mechanism similar to the one scipy uses, e.g., https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html#scipy.optimize.minimize
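A hypothetical sketch of such an interface (the callback keyword and the attribute names are made up, mirroring scipy's callback argument):

history = []

def record_history(population):
    # called once per generation, like the callback in scipy.optimize.minimize
    history.append(population.champion.fitness)  # illustrative attribute

gp.evolve(objective, callback=record_history)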
Sometimes, I guess depending on the seed, the test for evolve fails as it does not converge to the solution. To avoid frequently restarting Travis we should fix this by adapting the parameters of the search, e.g., allowing for more generations.
Currently, the example scripts are broken due to changes in the API.
The pattern
graph = gp.CartesianGraph(individual.genome)
f = graph.to_torch()
is used often enough to provide a convenience function, similar to those for to_func and to_sympy in the Individual class. This would shorten the above to
f = individual.to_torch()
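A sketch of the proposed method, assuming the Individual class stores its genome in self.genome:

class Individual:
    # ... existing attributes and methods ...

    def to_torch(self):
        # mirrors the existing to_func/to_sympy convenience methods
        return gp.CartesianGraph(self.genome).to_torch()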
As discussed in #3, we should use a formatter (black) to ensure a consistent code style.
The factory seems pretty pointless (https://github.com/jakobj/python-gp/blob/master/gp/cgp_node.py#L236) and should be removed: it takes only three lines to define your own constant float node (and avoids pickling issues):
class CustomCGPConstantFloat(CGPConstantFloat):
    def __init__(self, idx, inputs):
        super().__init__(idx, inputs)
        self._output = 2.0  # whatever constant value you need
Isn't that short enough? @mschmidt87
The CGPPrimitives class, which collects primitives into one class, seems unused at the moment. Can it be removed?
The output values of Parameter nodes are currently stored in each individual (https://github.com/Happy-Algorithms-League/python-gp/blob/master/gp/individual.py#L33). I would argue this is rather surprising, since they define the values of nodes encoded in the Genome and, like it, are passed on to the next generation.
I think it would hence be more natural to store these values within the Genome. This would also remove the need to pass around the dictionary from the Individual to all to_... functions (e.g., https://github.com/Happy-Algorithms-League/python-gp/blob/master/gp/individual.py#L123).
In order to simplify extending the library with more evolutionary algorithms, we should create a base class for algorithms that particular algorithms inherit from; a sketch is given below.
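A minimal sketch of what such a base class could look like (names are illustrative, not a finished design):

from abc import ABC, abstractmethod

class EvolutionaryAlgorithm(ABC):
    # concrete algorithms, e.g., mu + lambda evolution strategies or
    # NSGA-II, would subclass this and implement the two hooks below

    @abstractmethod
    def initialize_fitness(self, population, objective):
        pass  # evaluate the initial population

    @abstractmethod
    def step(self, population, objective):
        pass  # create offspring, evaluate them, select the next generation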
As far as I can see, to_func produces a Python function that can only operate on scalar values. We should provide a to_numpy function that returns a Python function able to digest vectors. What do you think @mschmidt87?
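A usage sketch of the proposed function (to_numpy does not exist yet):

import numpy as np

graph = gp.CartesianGraph(individual.genome)
f = graph.to_func()      # existing: scalar inputs only, e.g., f(2.0)
f_np = graph.to_numpy()  # proposed: vectorized evaluation
y = f_np(np.linspace(-1.0, 1.0, 100))  # one call on a whole input vector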
Once #19 has been merged, we can remove the lines that add the location of the package to the system path explicitly from all example and test scripts.
I think we don't require those packages for basic functionality; they should hence be marked as optional and the package should work even if they are not available on the system.
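The standard pattern for this would be a guarded import, sketched here:

try:
    import torch  # optional dependency
except ImportError:
    torch = None

def to_torch(self):
    if torch is None:
        raise ModuleNotFoundError("to_torch requires the optional dependency torch")
    ...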
We are using a lot of unnecessary prefixes, e.g., CGPGraph, CGPInputNode, etc., which I guess was motivated by having not only a CGPPopulation but also a BinaryPopulation class. Since we decided to remove the latter, we should also go ahead and remove these prefixes to make the code more readable and to avoid unnecessary typing.
The example in the README should correspond to our convention regarding the definition of parameters (four dicts: population, genome, ea, evolve) that is used in tests/examples.
Currently all Parameter nodes are initialized to the same value, 1.0 (https://github.com/Happy-Algorithms-League/python-gp/blob/master/gp/individual.py#L33). We should provide a convenient way of choosing this default value.
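One possible interface, purely hypothetical (neither the keyword argument nor the exact call exists yet):

import numpy as np

# hypothetical keyword argument replacing the hard-coded 1.0; a callable
# would allow, e.g., random initialization
genome = gp.Genome(genome_params, parameter_init=lambda: np.random.normal(0.0, 1.0))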
We should add an example of an expensive fitness evaluation, e.g., just using sleep, where we demonstrate how to use the caching decorator.
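For example, something along these lines (objective and its return value are illustrative):

import time

@gp.utils.disk_cache("expensive_objective_cache.pkl")
def objective(x):
    time.sleep(1.0)  # stand-in for an expensive simulation
    return (x - 2.0) ** 2

The first run is slow; repeated runs with the same arguments are served from the cache.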
Minor refactoring: the code will become more readable if we use f-strings instead of the .format() syntax in the format_output_str methods of CGPNode.
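For example (variable names are illustrative):

# before
output_str = "{}({}, {})".format(self._name, input_0, input_1)
# after
output_str = f"{self._name}({input_0}, {input_1})"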
The utils.primitives_from_class_names function is currently not being used. Can it be removed?
Add docstrings to all functions.
As shown in #86, the decorator should not be used on nested functions, as the consistency check (one execution of the objective on cached input) is applied on the first call after decoration. In a multiprocessing scenario (n_processes > 1), the decorator is applied on each evaluation of the decorated function, even if the function is defined globally, due to the pickling/unpickling within multiprocessing, thus doubling the runtime.
One way to avoid this problem is suggested in the discussion of #86: the user can provide an optional keyword argument, e.g., check_consistency, to the decorator, which by default is True. If check_consistency is true, we use the currently implemented checks that work for single processes. In addition, we provide a function check_cache_consistency(<fn>, <objective>) that the user can call manually. So:
- single process: decorate with disk_cache(<fn>)
- multiple processes: decorate with disk_cache(<fn>, check_consistency=False) and call check_cache_consistency(<fn>, <objective>) manually once at the beginning, e.g., before evolve
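In code, the proposed usage would look roughly like this (check_consistency and check_cache_consistency do not exist yet):

# single process
@gp.utils.disk_cache("cache.pkl")
def objective(individual):
    ...

# multiple processes: disable the automatic check ...
@gp.utils.disk_cache("cache.pkl", check_consistency=False)
def objective(individual):
    ...

# ... and run it once manually before calling evolve
gp.utils.check_cache_consistency("cache.pkl", objective)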
This does not seem to provide any relevant information to the user, or am I wrong? It would be cool to print how many times the objective has been called, but I don't think it's this number?! @mschmidt87
The InvalidSympyExpression exception currently just passes and does not raise any error or print any information. Is this the intended, final behavior?
In particular, a mechanism for updating node values from pytorch classes in which parameters have been optimized via gradient descent would be desirable. #57 provides a first stab at this.
The installation in Travis via pip install . does not install extra requirements, and hence all tests for sympy and torch are skipped. We should perform an installation with extras on Travis to be able to also test those functions.
Currently the example for caching uses SymPy expressions. This makes sense, since SymPy can simplify expressions, thereby identifying functionally equivalent individuals even when their genotypes and even (unsimplified) phenotypes differ. However, one might want to avoid converting the Cartesian graph to a SymPy expression and rather work with Python functions or pytorch classes. In these cases caching would still make sense for individuals that only differ in inactive genes. How could we provide such functionality most easily?
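One possible approach, sketched here under the assumption that the genome exposes some accessor for its active genes (the name below is made up): derive the cache key from the active genes only, so that individuals differing only in inactive genes share a cache entry.

def cache_key(individual):
    # individuals that differ only in inactive genes map to the same key
    # and therefore reuse the cached fitness
    return tuple(individual.genome.active_genes())  # illustrative accessor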
For some graphs pretty_print cuts off the last letter of "Output":
00 Input 01 * Const 03 Mul 05 * Outpu (01)
I guess that's a bug.
Also, this function should not have "print" in the name, as it only returns a string but does not actually print anything.
In principle each node has the same number of input genes, independent of its currently selected operation; e.g., a node with arity 1 still has 2 input genes if the maximal arity in the graph is 2. Does the current implementation allow mutation of these silent input genes, or are they excluded? Since silent mutations generally seem to be beneficial, we should make sure this is the case.
Currently we only state the package names without specifying minimal versions. Should we just use the current version from pip for all packages?
Currently, the evolutionary algorithm is implemented as methods of the Population class (generate_new_parent and generate_offspring). We should investigate whether it is possible to disentangle those, to separate the representation of genome and phenotype (CGP) from the optimization algorithm (NSGA-II).