erp12 / pyshgp
Push Genetic Programming in Python.
Home Page: http://erp12.github.io/pyshgp
License: MIT License
I've just cloned a fresh repo (and upgraded all my Python binaries), and when I run an example (any of the three) I get the following:
[all the standard setup works OK]
Creating Initial Population
Traceback (most recent call last):
File "examples/integer_regression.py", line 60, in <module>
gp.evolution(error_func, problem_params)
File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/gp.py", line 166, in evolution
population = generate_random_population(evolutionary_params)
File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/gp.py", line 97, in generate_random_population
rand_genome = r.random_plush_genome(evolutionary_params)
TypeError: random_plush_genome() takes exactly 2 arguments (1 given)
I am noticing instructions like _exec_empty cropping up quite a lot. I realize that in Python variable names can only include underscores and alphanumeric characters, but I wonder if it might make Push code a bit more readable to name these in the Mathematica style: ending in Q (indicating "question", I guess?). So, for example, _exec_empty_Q.
Just a minor suggestion. In Clojure implementations I use exec-empty?, and it helps readability quite a bit.
The _handle_?_instruction() implementations should be moved out of the interpreter and into the class definition of the corresponding instruction type. PushInterpreter.execute_instruction() should be broken into PushInterpreter.eval_atom() and an execute() method found in each instruction class definition. This will clean up the code surrounding checking values on the stacks.
Now that the Push-Redux has most of its content, we can simplify the Pysh ReadTheDocs by referencing the redux.
Due to lots of recent changes, the readme needs to be completely re-written.
After automatic program simplification, run this function on the program to determine if it generalizes. generalization_function will look very similar to the error function.
Also, consider refactoring the examples to include one function which produces both an error_function and a generalization_function. This will significantly clean up the evolution() function in the gp/gp.py file.
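One way the single-factory refactor could look (make_eval_functions, run_program, and the error/generalization contract here are illustrative assumptions, not existing pyshgp code):

```python
def make_eval_functions(cases, holdout_cases, run_program):
    """Build an error function (over training cases) and a generalization
    function (over holdout cases) from one shared evaluator."""

    def evaluate(program, dataset):
        # Absolute error per case; run_program maps (program, input) -> output.
        return [abs(run_program(program, x) - y) for x, y in dataset]

    def error_function(program):
        return evaluate(program, cases)

    def generalization_function(program):
        # Generalizes iff every holdout error is zero.
        return all(e == 0 for e in evaluate(program, holdout_cases))

    return error_function, generalization_function
```

Because both functions close over the same evaluator, the example scripts only define the cases once.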
It should be easier to use (and modify) the Push interpreter without having to worry about adversely changing evolution.
Docs should be overhauled and largely gutted. Most of the PushGP descriptions should live with the nearly-complete Push-Redux which currently lives at https://erp12.github.io/push-redux/
I also just found out about the power of pairing ReadTheDocs and Autodoc. I am in the process of adding auto-generated API pages to the ReadTheDocs documentation site. Unfortunately, I don't think there is a good way to use this to document the instruction set, so we will continue to rely on the hack-y comment scraper for now.
Pysh has changed a lot during development; not all runs documented in examples/README.md are accurate anymore.
I tried to run the odd-number tutorial and got an error message reading: __init__() missing 1 required positional argument: 'spawner'. Not sure why this is happening.
Current tests should be removed because they are too difficult to maintain and are hand-written, so they probably don't cover enough cases. Tests should ideally be generated.
The difficulty is that the output of a program is the state of all the stacks. It is difficult (impossible?) to know the expected output of a generated Push program without running it, unless it isn't generated completely randomly. How can you generate a program in such a way that you know what its output should be?
Also, it is just as important to know what the output of a program should not be. In other words, if we are testing a random program, we will have to determine what values should be expected on the stacks after execution. We will also have to check that no other values are on the stacks after execution. This is difficult to check for with programs that were generated with any degree of randomness in them.
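One possible answer to the "known output" problem: generate the program while simultaneously applying a hand-written model of each instruction's effect, so the expected stacks are known by construction. This is a hypothetical sketch (MODELS and the state format are assumptions), not a proposal drawn from pyshgp's code:

```python
import random

# Hand-written semantic models: each maps a state dict to its expected effect.
MODELS = {
    '_integer_add': lambda s: s['_integer'].append(
        s['_integer'].pop() + s['_integer'].pop()),
    '_integer_pop': lambda s: s['_integer'].pop(),
}

def generate_with_expectation(length, seed=None):
    """Return (program, initial_state, expected_final_state)."""
    rng = random.Random(seed)
    # length + 2 seed values guarantees every chosen op's preconditions hold.
    initial = {'_integer': [rng.randint(0, 9) for _ in range(length + 2)]}
    expected = {'_integer': list(initial['_integer'])}
    program = []
    for _ in range(length):
        name = rng.choice(list(MODELS))
        program.append(name)
        MODELS[name](expected)  # track the expected effect as we build
    return program, initial, expected
```

A test would then run the real interpreter on initial_state and assert exact equality with expected, which also catches stray extra values on the stacks.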
Great to see all this progress on the pyshgp package. I strongly recommend using a separate development branch for developing the module, and dedicate the master branch to the latest release on pip. I got thrown off for a bit because I was installing pyshgp via pip but referring to examples on the latest dev version on GitHub.
Currently class vote instructions require a numeric argument, which adds burden to evolution. It would be beneficial to add vote instructions that increment and decrement vote levels for each class by a constant number baked into the instruction.
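Concretely, the instruction set could be stamped out up front with the constant baked into each instruction's name and behavior. make_vote_increment and the '_votes' state key are illustrative names, not pyshgp's API:

```python
def make_vote_increment(class_index, delta):
    """Build one vote instruction with its class and delta baked in."""
    name = '_vote%d_inc_%g' % (class_index, delta)

    def execute(state):
        # No numeric argument popped from a stack; delta is fixed.
        state['_votes'][class_index] += delta

    return name, execute

def vote_instruction_set(n_classes, deltas=(1, -1, 0.1, -0.1)):
    # One increment/decrement instruction per (class, delta) pair.
    return dict(make_vote_increment(c, d)
                for c in range(n_classes) for d in deltas)
```

Evolution then only has to select among instruction names, never supply a magnitude.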
I was running one of the examples on my laptop and got super boooooored, but ctrl-C just raises some kind of caught exception and leaves a pile of Python processes running.
Are you by any chance catching all exceptions, including KeyboardInterrupt? Because that's not really the way I would like ctrl-C to work, as it turns out.
Uniform mutation is a rather large variation operator. Now that pyshgp supports GeneticOperatorPipelines, it would give the user more control to break UM into:
Note: Maybe this whole issue can be ignored if we want to migrate the push interpreter to CPushPush.
We should easily be able to rely more on inheritance and add more functionality to the constructor of Instruction in order to remove a lot of code duplication.
See the way instructions are made in Propel and CPushPush for examples of patterns that are much better than what is currently in pyshgp.
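In the spirit of the Propel/CPushPush pattern, generic stack behaviors could be defined once and stamped out per type in a loop. SimpleInstruction and BEHAVIORS below are sketch names, not pyshgp's real classes:

```python
class SimpleInstruction:
    """One instruction whose behavior is a pure function of popped args."""

    def __init__(self, name, arity, fn):
        self.name = name
        self.arity = arity
        self.fn = fn  # returns the list of values to push back

    def __call__(self, stack):
        if len(stack) < self.arity:
            return  # no-op when preconditions are unmet
        args = [stack.pop() for _ in range(self.arity)]
        stack.extend(self.fn(*args))

# Each behavior is written exactly once...
BEHAVIORS = {
    'pop':  (1, lambda a: []),
    'dup':  (1, lambda a: [a, a]),
    'swap': (2, lambda a, b: [a, b]),  # a was on top; push order flips them
}

# ...and replicated across all stack types.
def make_common_instructions(type_names):
    insts = {}
    for t in type_names:
        for beh, (arity, fn) in BEHAVIORS.items():
            name = '_%s_%s' % (t, beh)
            insts[name] = SimpleInstruction(name, arity, fn)
    return insts
```

Adding a new typed stack then costs one entry in a list instead of a page of near-duplicate definitions.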
Each Python module which defines Push instructions should define __all__, and then we can safely use import * in the instructions __init__.py. This should make the whole instructions sub-package much easier to understand.
This time we should put a lot more thought into removing the code duplication. Here is a sketch of what I would currently suggest, although more thought should maybe be put into this:
common_tests = [
    [
        {'_integer': [1, 2, 3]},
        {'_integer': [1, 2]},
        '_integer_pop'
    ],
    [
        {'_boolean': [True]},
        {'_boolean': []},
        '_boolean_pop'
    ],
    [
        {'_string': ['A']},
        {'_string': ['A', 'A']},
        '_string_dup'  # was '_boolean_pop', which doesn't match the string states
    ],
    [
        {'_float': [1.5, 2.3]},
        {'_float': [2.3, 1.5]},
        '_float_swap'
    ],
    [
        {'_integer': [1, 2, 3]},
        {'_integer': [2, 3, 1]},
        '_integer_rot'
    ]
]

for test in common_tests:
    assert run_test(*test)
The concepts of plush genes and epigenetic markers need to be merged. Silent and close markers should just be attributes of plush genes. There is no need for users to control the epigenetic markers.
This refactor will mainly impact the Spawner and Translate code.
Okay, so I didn't exactly use Test Driven Development... Unit tests should still be added for GP related operations.
Exact implementation TBD because of the random nature of all the operators.
Blocked by #71
Hi, when I try running regression.py with n_jobs = -1, I get the following error:
RuntimeError:
An attempt has been made to start a new process before the current process has finished its bootstrapping phase. This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.
It was a fun idea, but difficult to make usable for anyone other than me.
Then we can avoid having csv, json, etc. files in the repo.
It is hard to judge the impact of many of our changes during CI because full evolutionary runs can take weeks of CPU time. The best we can hope for is a benchmarking tool that can be manually triggered to run a significant number of benchmarks and produce a report.
These benchmarks should track runtime and pyshgp's ability to find solutions.
Things that need to be done to complete this work: perform n runs on x non-trivial problems, and track the runtime and solution of each. Ideally runs would happen in parallel. On the fly? On DigitalOcean?
Currently I am leaning towards always using the _ prefix when referencing instruction names, because it indicates that the string is probably an instruction name and will make expressing programs as lists of strings more reliable... although I am not sure it is good to encourage (or support) the latter.
Pysh has changed enough over the past few months that the documentation about the examples is getting fairly out of date.
This could wait until #44 is done.
Create wrapper classes for Regression and Classification problems that implement the base class of scikit-learn.
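A sketch of what the wrapper contract could look like. To keep this self-contained, pyshgp's evolutionary loop is replaced by an injected evolve callable, and the class implements the fit/predict/score surface directly; a real version would subclass sklearn's BaseEstimator/RegressorMixin to also inherit get_params, set_params, and clone support:

```python
class PushGPRegressor:
    """Hypothetical scikit-learn-style wrapper around a PushGP run."""

    def __init__(self, population_size=300, max_generations=100, evolve=None):
        self.population_size = population_size
        self.max_generations = max_generations
        self.evolve = evolve  # stands in for pyshgp's evolutionary loop

    def fit(self, X, y):
        # evolve returns a callable program mapping one input to a prediction.
        self.program_ = self.evolve(X, y)
        return self  # sklearn convention: fit returns self

    def predict(self, X):
        return [self.program_(x) for x in X]

    def score(self, X, y):
        # Negative mean absolute error, so higher is better (sklearn style).
        preds = self.predict(X)
        return -sum(abs(p - t) for p, t in zip(preds, y)) / len(y)
```

With this surface, the estimator drops into model_selection.cross_val_score and friends unchanged.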
I tried python examples/idea_of_numbers.py --population_size=200 and it errors out immediately with
Traceback (most recent call last):
File "examples/idea_of_numbers.py", line 59, in <module>
gp.evolution(error_func, problem_params)
File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/gp.py", line 148, in evolution
params.grab_command_line_params(evolutionary_params)
File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/params.py", line 143, in grab_command_line_params
while sys.argv[i+j].startswith('-'):
IndexError: list index out of range
(the idea_of_numbers.py file is my own)
pyshgp's current genetic operators are large and relatively complex.
Replacing them with many smaller operations that each have their own probabilities could be beneficial. In addition, making pyshgp's current system for combining operations into something more robust and easy to use (which I am calling Operator Pipelines) might be good as well.
An Operator Pipeline which includes all of the above would be equivalent to Uniform Mutation.
An Operator Pipeline could also include recombination.
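A minimal sketch of the pipeline idea (make_pipeline and the gene-level operator shape are assumptions for illustration): each small operator carries its own per-gene rate, and a pipeline is just their ordered composition.

```python
import random

def make_pipeline(operators):
    """operators: list of (rate, fn) where fn maps a gene to a new gene.

    Each operator fires independently per gene with its own probability,
    so Uniform Mutation becomes one pipeline of small, tunable pieces.
    """
    def apply(genome, rng=random):
        out = []
        for gene in genome:
            for rate, fn in operators:
                if rng.random() < rate:
                    gene = fn(gene)
            out.append(gene)
        return out
    return apply
```

Recombination would slot in as a pipeline stage operating on whole genomes rather than single genes.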
Often fails to make programs more reasonable as-is. At the least, we should add replacement with no-op instructions.
Without the ability to generate random numbers, strings, vectors, etc., it is impossible to evolve probabilistic programs.
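One common way to supply such values in PushGP is through ephemeral random constant (ERC) generators. The names and ranges below are illustrative assumptions, not pyshgp's instruction set:

```python
import random

# Each entry maps a hypothetical ERC name to a generator of fresh constants.
ERC_GENERATORS = {
    '_integer_erc': lambda rng: rng.randint(-100, 100),
    '_float_erc':   lambda rng: rng.uniform(-1.0, 1.0),
    '_string_erc':  lambda rng: ''.join(rng.choice('abc') for _ in range(3)),
}

def spawn_constant(name, seed=None):
    """Draw one fresh constant for the named ERC."""
    rng = random.Random(seed)
    return ERC_GENERATORS[name](rng)
```

For truly probabilistic programs, similar generators would also need to be callable at run time (as instructions), not just at genome-spawn time.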
To learn more about tags, see this.
In order to enable research on the best use of tag (or tag-like) systems, we need a basic framework for general information storage in Push programs.
Some simple ideas include:
Python 2 is more work than it is worth. Type hints are nice.
This will involve is_?_type() functions in util. Related to #35, but more general. This is currently pretty well covered by the existing tests for the instruction set, but those tests will ideally be removed at some point.
I tried tweaking the string demo, where the target function takes a string s and returns s[:-2]+s[:-2], by making it duplicate and then concatenate the reverse of the string (changing the return to s[::-1]+s[::-1]), and got the error:
AttributeError Traceback (most recent call last)
in <module>
40 )
41
---> 42 est.fit(X=X, y=y, verbose=True)
43 print(est._result.program)
44 print(est.predict(X))
~/anaconda3/lib/python3.7/site-packages/pyshgp/gp/estimators.py in fit(self, X, y, verbose)
211 else:
212 y_types = [type(y[0])]
--> 213 output_types = [push_type_for_type(t).name for t in y_types]
214
215 self.evaluator = DatasetEvaluator(X, y, interpreter=self.interpreter)
~/anaconda3/lib/python3.7/site-packages/pyshgp/gp/estimators.py in <listcomp>(.0)
211 else:
212 y_types = [type(y[0])]
--> 213 output_types = [push_type_for_type(t).name for t in y_types]
214
215 self.evaluator = DatasetEvaluator(X, y, interpreter=self.interpreter)
AttributeError: 'NoneType' object has no attribute 'name'
Not sure why.
Is there a way to dump a model and load it later for prediction? Specifically, how do I use the final program printed out at the end of the GP run to perform a prediction task at later times?
I'm starting with the iris example:
from sklearn import datasets, model_selection
import numpy as np
import pyshgp.gp.base as gp
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data,
iris.target,
test_size=0.5)
model = gp.PushGPClassifier(population_size=100, max_generations=50)
model.fit(X_train, y_train)
I tried model.score(X_test, y_test), but it complained that there is no predict function. Is there an easy way to create a predict function?
Probably bad form to use the generic Exception object for everything...
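A small dedicated hierarchy would let callers catch exactly what they mean. These class names are suggestions, not pyshgp's actual exceptions:

```python
class PyshgpError(Exception):
    """Base class for all pyshgp errors; catch this for blanket handling."""

class InvalidInstructionError(PyshgpError):
    """Raised when a genome references an unknown instruction name."""
    def __init__(self, name):
        super().__init__("Unknown instruction: %s" % name)
        self.name = name

class StackTypeError(PyshgpError):
    """Raised when a value of the wrong type reaches a typed stack."""
```

Library code raises the specific subclasses; user code can catch PyshgpError without also swallowing unrelated bugs (or KeyboardInterrupt).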
I've got a couple of example problems I use in my GP classes and workshops, and they seem to be working already in Pysh. Aside from in-file docs, what needs to be added, and where, before I submit a PR?