happy-algorithms-league / hal-cgp
Cartesian genetic programming (CGP) in pure Python.
License: GNU General Public License v3.0
Test cases should cover at least:
Add type hints to all classes and functions and enable static type checking via mypy in CI.
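For illustration, the kind of annotations this asks for (the function below is made up, not from the codebase):

import typing

def mean_fitness(fitness_values: typing.List[float]) -> float:
    # mypy can now statically verify that callers pass a list of floats
    return sum(fitness_values) / len(fitness_values)

In CI, checking would then amount to running mypy on the package directory.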
For example, in test_node.py::test_add we use two inputs but only one row. pretty_str assumes that we have more rows than inputs and hence does not print the second input node. This should be fixed.
As reported by @mschmidt87, output is only printed right before exiting. Most likely we need to flush stdout to continuously see the progress (e.g., https://stackoverflow.com/questions/230751/how-to-flush-output-of-print-function#230774).
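A minimal fix, sketched with placeholder variable names:

# flush=True forces the output to appear immediately instead of only at exit
print(f"generation {generation}: best fitness {fitness}", flush=True)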
As an extension to the nodes currently implemented, add nodes that perform less-than (<) and greater-than (>) comparisons.
The figure used in the README is from our manuscript; we should hence add a reference as soon as we can cite it.
Currently, the caching decorator does not check whether the function to be cached is actually the same as the one that the cache was created for. Thus, it can happen that the decorator mistakes a different function for the cached function.
This is a minimal reproducer:
import gp

@gp.utils.disk_cache("example_caching_cache.pkl")
def f1(x):
    return x

@gp.utils.disk_cache("example_caching_cache.pkl")
def f2(x):
    return x**2

for i in range(10):
    print(f"x = {f1(i)}, x^2 = {f2(i)}")
yields
x = 0, x^2 = 0
x = 1, x^2 = 1
x = 2, x^2 = 2
x = 3, x^2 = 3
x = 4, x^2 = 4
x = 5, x^2 = 5
x = 6, x^2 = 6
x = 7, x^2 = 7
x = 8, x^2 = 8
x = 9, x^2 = 9
Proposed solution: let the decorator verify, for a cached function, that calling the function with the first cached argument yields the same result as the cached result.
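A minimal sketch of this check, assuming the cache is a pickled dict mapping a single argument to its result (the real decorator's internals may differ):

import os
import pickle

def disk_cache_with_check(fn):
    def decorator(func):
        cache = {}
        if os.path.exists(fn):
            with open(fn, "rb") as f:
                cache = pickle.load(f)
            if cache:
                # recompute the result for one cached argument; a mismatch
                # means the cache was created for a different function
                first_arg = next(iter(cache))
                if func(first_arg) != cache[first_arg]:
                    raise RuntimeError(f"{fn} was created for a different function")

        def wrapper(x):
            if x not in cache:
                cache[x] = func(x)
                with open(fn, "wb") as f:
                    pickle.dump(cache, f)
            return cache[x]

        return wrapper

    return decorator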
What to do with this class? Remove? Rename?
When a genome represents a constant function, e.g., f(x) = 1, the torch class generated by CGPGraph does not return the correct shape. I created a test for this on the bugfix/torch_constant_output_shape branch: https://github.com/jakobj/python-gp/blob/5a50c07876adf9e8a263b4e10c0e01c5b582fb68/test/test_cgp_graph.py#L161
This can probably be fixed by modifying the CGPConstantFloat.format_output_str_torch() method: https://github.com/jakobj/python-gp/blob/3bdff24be10ef503652b44b0be1606a639555627/gp/cgp_node.py#L191
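For reference, a stand-in module illustrating the required broadcast (this is not the code CGPGraph generates, only the behavior the fix should produce):

import torch

class ConstantModule(torch.nn.Module):
    def forward(self, x):
        # expand the scalar constant to one output per batch entry so the
        # output shape matches that of non-constant expressions
        return torch.ones(x.shape[0], 1, dtype=x.dtype)

f = ConstantModule()
assert f(torch.zeros(10, 1)).shape == torch.Size([10, 1])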
Currently the only way to introduce constants into the expressions is via the ConstantFloat or Parameter nodes. The values of the parameter nodes can be tuned by local search to specific values in order to increase the fitness of individuals.
However, there is a fundamental problem. As nodes can be used as inputs by more than one subsequent node, it can happen that a particular Parameter node feeds into two subsequent nodes. If one of them requires the parameter to be large and the other requires it to be small, the gradients point in opposite directions, thereby bringing the local search for this particular parameter to a halt, or at least slowing it down significantly. This can lead to difficult-to-escape local minima in the search landscape.
This can be avoided by additionally introducing parameters private to each node (e.g., Izzo et al., 2017). These private parameters are by definition not used by any other node and hence are never subject to "opposing" gradient information. For simplicity, a first implementation would introduce two parameters: one that scales the output of each node and one that shifts it. By default these would have the values 1.0 and 0.0, respectively, and hence not change the behavior of the node from the current implementation; see the sketch after the references below. I guess it could be helpful to introduce a mechanism that allows users to choose between updating only values of Parameter nodes, only private node parameters, or both. As a side remark, this would allow the evolution of simple ANNs by introducing nodes that only apply a non-linearity to their input (e.g., CGPANNs; Khan et al., 2013).
Izzo, D., Biscani, F., & Mereta, A. (2017). Differentiable Genetic Programming. In European Conference on Genetic Programming (pp. 35-51). Springer, Cham.
Khan, M. M., Ahmad, A. M., Khan, G. M., & Miller, J. F. (2013). Fast Learning Neural Networks using Cartesian Genetic Programming. Neurocomputing, 121, 274-289.
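A minimal sketch of the idea (names are illustrative, not the library's API):

class NodeWithPrivateParameters:
    # y = scale * f(inputs) + shift; the defaults leave the node's
    # behavior unchanged relative to the current implementation
    def __init__(self):
        self.scale = 1.0
        self.shift = 0.0

    def apply(self, raw_output):
        # scale and shift are owned by this node alone, so gradient
        # information from different consumers can never "oppose" itself
        return self.scale * raw_output + self.shift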
We need documentation for users, using some automated generation tool, e.g., Sphinx.
The to_func, to_torch, and to_sympy methods of the CGPGraph class simply call the respective compile_... methods and do not add any functionality. I therefore suggest removing them or renaming the compile_... methods.
Can be set up only after making the repository public.
Some of the tests use constructions like assert abs(1.0 - y[0]) < 1e-15. We should probably use pytest.approx in these for consistent error reporting.
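For example (pytest.approx accepts an absolute tolerance):

import pytest

# before
assert abs(1.0 - y[0]) < 1e-15
# after: on failure pytest reports the expected and actual values
assert y[0] == pytest.approx(1.0, abs=1e-15)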
Remove the history recording logic from hl_api.py and replace it with a mechanism similar to the one scipy uses, e.g., https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html#scipy.optimize.minimize
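A hypothetical sketch of such an interface (the callback keyword and the attribute names are made up, mirroring scipy's callback argument):

history = []

def record_history(population):
    # called once per generation, like the callback in scipy.optimize.minimize
    history.append(population.champion.fitness)  # illustrative attribute

gp.evolve(objective, callback=record_history)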
Sometimes, I guess depending on the seed, the test for evolve fails as it does not converge to the solution. To avoid frequently restarting Travis we should fix this by adapting the parameters of the search, e.g., allowing for more generations.
Currently, the example scripts are broken due to changes in the API.
The pattern
graph = gp.CartesianGraph(individual.genome)
f = graph.to_torch()
is used often enough to provide a convenience function, similar to those for to_func and to_sympy in the Individual class. This would shorten the above to
f = individual.to_torch()
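A sketch of the proposed method, assuming the Individual class stores its genome in self.genome:

class Individual:
    # ... existing attributes and methods ...

    def to_torch(self):
        # mirrors the existing to_func/to_sympy convenience methods
        return gp.CartesianGraph(self.genome).to_torch()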
As discussed in #3, we should use a formatter (black) to ensure a consistent code style.
The factory seems pretty pointless (https://github.com/jakobj/python-gp/blob/master/gp/cgp_node.py#L236) and should be removed: it takes only three lines to define your own constant float node (and avoids pickling issues):
class CustomCGPConstantFloat(CGPConstantFloat):
    def __init__(self, idx, inputs):
        super().__init__(idx, inputs)
        self._output = 2.0  # whatever constant value you need
Isn't that short enough? @mschmidt87
The CGPPrimitives class, which collects primitives into one class, seems unused at the moment. Can it be removed?
The output values of Parameter nodes are currently stored in each individual (https://github.com/Happy-Algorithms-League/python-gp/blob/master/gp/individual.py#L33). I would argue this is rather surprising, since they define the values of nodes encoded in the Genome and, like it, are passed on to the next generation.
I think it would hence be more natural to store these values within the Genome. This would also remove the need to pass around the dictionary from the Individual to all to_... functions (e.g., https://github.com/Happy-Algorithms-League/python-gp/blob/master/gp/individual.py#L123).
In order to simplify extending the library with more evolutionary algorithms, we should create a base class for algorithms that particular algorithms inherit from; a sketch is given below.
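A minimal sketch of what such a base class could look like (names are illustrative, not a finished design):

from abc import ABC, abstractmethod

class EvolutionaryAlgorithm(ABC):
    # concrete algorithms, e.g., mu + lambda evolution strategies or
    # NSGA-II, would subclass this and implement the two hooks below

    @abstractmethod
    def initialize_fitness(self, population, objective):
        pass  # evaluate the initial population

    @abstractmethod
    def step(self, population, objective):
        pass  # create offspring, evaluate them, select the next generation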
As far as I can see, to_func produces a Python function that can only operate on scalar values. We should provide a to_numpy function that returns a Python function able to digest vectors. What do you think @mschmidt87?
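A usage sketch of the proposed function (to_numpy does not exist yet):

import numpy as np

graph = gp.CartesianGraph(individual.genome)
f = graph.to_func()      # existing: scalar inputs only, e.g., f(2.0)
f_np = graph.to_numpy()  # proposed: vectorized evaluation
y = f_np(np.linspace(-1.0, 1.0, 100))  # one call on a whole input vector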
Once #19 has been merged, we can remove the lines that add the location of the package to the system path explicitly from all example and test scripts.
I think we don't require those packages for basic functionality; they should hence be marked as optional and the package should work even if they are not available on the system.
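The standard pattern for this would be a guarded import, sketched here:

try:
    import torch  # optional dependency
except ImportError:
    torch = None

def to_torch(self):
    if torch is None:
        raise ModuleNotFoundError("to_torch requires the optional dependency torch")
    ...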
We are using a lot of unnecessary prefixes, e.g., CGPGraph, CGPInputNode, etc., which I guess was motivated by having not only a CGPPopulation but also a BinaryPopulation class. Since we decided to remove the latter, we should also go ahead and remove these prefixes to make the code more readable and to avoid unnecessary typing.
The example in the README should correspond to our convention regarding the definition of parameters (four dicts: population, genome, ea, evolve) that is used in tests/examples.
Currently all Parameter nodes are initialized to the same value, 1.0 (https://github.com/Happy-Algorithms-League/python-gp/blob/master/gp/individual.py#L33). We should provide a convenient way of choosing this default value.
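One possible interface, purely hypothetical (neither the keyword argument nor the exact call exists yet):

import numpy as np

# hypothetical keyword argument replacing the hard-coded 1.0; a callable
# would allow, e.g., random initialization
genome = gp.Genome(genome_params, parameter_init=lambda: np.random.normal(0.0, 1.0))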
We should add an example of an expensive fitness evaluation, e.g., just using sleep, where we demonstrate how to use the caching decorator.
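For example, something along these lines (objective and its return value are illustrative):

import time

@gp.utils.disk_cache("expensive_objective_cache.pkl")
def objective(x):
    time.sleep(1.0)  # stand-in for an expensive simulation
    return (x - 2.0) ** 2

The first run is slow; repeated runs with the same arguments are served from the cache.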
Minor refactoring: the code will become more readable if we use f-strings instead of the .format() syntax in the format_output_str methods of CGPNode.
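For example (variable names are illustrative):

# before
output_str = "{}({}, {})".format(self._name, input_0, input_1)
# after
output_str = f"{self._name}({input_0}, {input_1})"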
The utils.primitives_from_class_names function is currently not being used. Can it be removed?
Add docstrings to all functions.
As shown in #86, the decorator should not be used on nested functions, as the consistency check (one execution of the objective on cached input) is applied on the first call after decoration. In a multiprocessing scenario (n_processes > 1), the decorator is applied on each evaluation of the decorated function, even if the function is defined globally, due to the pickling/unpickling within multiprocessing, thus doubling the runtime.
One way to avoid this problem is suggested in the discussion of #86: the user can provide an optional keyword argument, e.g., check_consistency, to the decorator, which by default is True. If check_consistency is true, we use the currently implemented checks that work for single processes. In addition, we provide a function check_cache_consistency(<fn>, <objective>) that the user can call manually. So:
- single process: decorate with disk_cache(<fn>)
- multiple processes: decorate with disk_cache(<fn>, check_consistency=False) and call check_cache_consistency(<fn>, <objective>) manually once at the beginning, e.g., before evolve
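In code, the proposed usage would look roughly like this (check_consistency and check_cache_consistency do not exist yet):

# single process
@gp.utils.disk_cache("cache.pkl")
def objective(individual):
    ...

# multiple processes: disable the automatic check ...
@gp.utils.disk_cache("cache.pkl", check_consistency=False)
def objective(individual):
    ...

# ... and run it once manually before calling evolve
gp.utils.check_cache_consistency("cache.pkl", objective)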
This does not seem to provide any relevant information to the user, or am I wrong? It would be cool to print how many times the objective has been called, but I don't think it's this number?! @mschmidt87
The InvalidSympyExpression exception currently just passes and does not raise any error or print any information. Is this the intended, final behavior?
In particular, a mechanism for updating node values from pytorch classes in which parameters have been optimized via gradient descent would be desirable. #57 provides a first stab at this.
The installation in Travis via pip install . does not install extra requirements, and hence all tests for sympy and torch are skipped. We should perform an installation with extras on Travis to be able to also test those functions.
Currently the example for caching uses SymPy expressions. This makes sense, since SymPy can simplify expressions, thereby identifying functionally equivalent individuals even when their genotypes and even (unsimplified) phenotypes differ. However, one might want to avoid converting the Cartesian graph to a SymPy expression and rather work with Python functions or pytorch classes. In these cases caching would still make sense for individuals that only differ in inactive genes. How could we provide such functionality most easily?
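One possible approach, sketched here under the assumption that the genome exposes some accessor for its active genes (the name below is made up): derive the cache key from the active genes only, so that individuals differing only in inactive genes share a cache entry.

def cache_key(individual):
    # individuals that differ only in inactive genes map to the same key
    # and therefore reuse the cached fitness
    return tuple(individual.genome.active_genes())  # illustrative accessor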
For some graphs pretty_print cuts off the last letter of "Output":
00 Input 01 * Const 03 Mul 05 * Outpu (01)
I guess that's a bug.
Also, this function should not have "print" in the name, as it only returns a string but does not actually print anything.
In principle each node has the same number of input genes, independent of its currently selected operation; e.g., a node with arity 1 still has 2 input genes if the maximal arity in the graph is 2. Does the current implementation allow mutation of these silent input genes, or are they excluded? Since silent mutations generally seem to be beneficial, we should make sure this is the case.
Currently we only state the package names without specifying minimal versions. Should we just use the current version from pip for all packages?
Currently, the evolutionary algorithm is implemented as methods of the Population class (generate_new_parent and generate_offspring). We should investigate whether it is possible to disentangle those, to separate the representation of genome and phenotype (CGP) from the optimization algorithm (NSGA-II).