erp12 / pyshgp Goto Github PK

Push Genetic Programming in Python.

Home Page: http://erp12.github.io/pyshgp

License: MIT License

Python 99.01% Makefile 0.20% Batchfile 0.25% Shell 0.54%

genetic-programming python software-synthesis artificial-intelligence machine-learning evolutionary-algorithms evolutionary-computation programming-by-example

pyshgp's Issues

Bug in gp.py

I've just cloned a fresh repo (and upgraded all my Python binaries), and when I run an example (any of the three) I get the following:

[all the standard setup works OK]

Creating Initial Population
Traceback (most recent call last):
  File "examples/integer_regression.py", line 60, in <module>
    gp.evolution(error_func, problem_params)
  File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/gp.py", line 166, in evolution
    population = generate_random_population(evolutionary_params)
  File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/gp.py", line 97, in generate_random_population
    rand_genome = r.random_plush_genome(evolutionary_params)
TypeError: random_plush_genome() takes exactly 2 arguments (1 given)

Command line args could be made much more robust with argparse

https://docs.python.org/3/library/argparse.html

suggestion: Name predicate instructions to indicate boolean return value

I am noticing instructions like _exec_empty cropping up quite a lot. I realize that in Python variable names can only include underscores and Alnum characters, but I wonder if it might make Push code a bit more readable to name these in the Mathematica style: ending in Q (indicating "question", I guess?).

So for example _exec_empty_Q.

Just a minor suggestion. In Clojure implementations I use exec-empty?, and it helps readability quite a bit.

Contributing Guide

Instructions are run in a strange way.

_handle_?_instruction() implementation should be moved out of interpreter and class definition of corresponding instruction type.
PushInterpreter.execute_instruction() should be broken into PushInterpreter.eval_atom() and a execute() method found in each instruction class definition.
Documentation for all of this should be much better.

Custom special stack-element objects

Stack out of bounds (or should this be an exception)
Empty stack object

These will clean up code surrounding checking values on stack.

Re-write ReadTheDocs using Push-Redux as background knowledge.

Now that the Push-Redux has most of its content, we can simplify and the Pysh ReadTheDocs by referencing the redux.

Properly implement program_growth_cap

Update README.md

Due to lots of recent changes the readme need to be completely re-written.

Add generalization_function to SimplePushGPEvolver attributes

After automatic program simplification, run this function on the program to determine if it generalizes. generalization_function will look very similar to error function.

Also, consider refactoring examples to include 1 function which produces an error_function and a generalization_function.

Utilize numpy when possible

Expand interpreter constructor to lessen code re-use in problem files.

Pass input stack values

Add generational pre-process function

This will significantly clean up the evolution() function in the gp/gp.py file.

Separate Push Interpreter from GP

It should be easier to use (and modify) the Push interpreter without having to worry about adversely changing evolution.

Reformat Docs. Add autodoc API documentation.

Docs should be overhauled and largely gutted. Most of the PushGP descriptions should live with the nearly-complete Push-Redux which currently lives at https://erp12.github.io/push-redux/

I also just found out about the power of pairing ReadTheDocs and Autodoc. I am in the process of adding an auto-generated API pages to the ReadTheDocs documentation site. Unfortunately, I don't think there is a good way to use this to document the instruction set, so we will continue to rely on the hack-y comment scraper for now.

Re-run and document examples in examples/README.md

Pysh has changed a lot during development, not all runs documented in the examples/README.md are accurate anymore.

Rename modules so they don't start with "pysh_"

odd number tutorial error

I tried to run the odd number tutorial and got an error message reading[ init missing 1 required positional argument: "spawner"], not sure why this is happening

Push Instruction Set Tests

Current tests should be removed because they are too difficult to maintain and are hand written, so they probably don't cover enough cases. Tests should ideally be generated.

The difficulty of this is that the output of a program is the state of all the stacks. It is difficult (not possible?) to know what the expected output of a generated push program without running it, unless it isn't generated completely randomly. How can you generate a program in such a way that you know what its output should be?

Also, it is just as important to know what the output of a program should not be. In other words, if we are testing a random program we will have to determine what values should be expected on the stacks after execution. We will also have to check that no other values are on the stacks after execution. This is difficult to check for with programs that were generated with any degree of randomness in them.

Use a dev branch

Great to see all this progress on the pyshgp package. I strongly recommend using a separate development branch for developing the module, and dedicate the master branch to the latest release on pip. I got thrown off for a bit because I was installing pyshgp via pip but referring to examples on the latest dev version on GitHub.

Add doc attr to Instruction. Make app to generate md based off of all_instructions.

Vote instructions need improving.

Currently class vote instructions require a numeric argument, which adds burden to evolution. It would be beneficial to add vote instructions that increment and decrement vote levels for each class by a constant number baked into the instruction.

Add vote inc and vote dec instructions
Add vote inc and dec instructions that vote powers of 2.

Keyboard interrupt ^C does not halt processing

I was running one of the examples on my laptop, and got super boooooored, but ctrl-C just raises some kind of caught exception and leaves a pile of Python processes running.

Are you by any chance catching all exceptions, including KeyboardInterrupt? Because that's not really the way I would like ctrl-C to work, as it turns out. 😃

Break Up Uniform Mutation

Uniform mutation is a rather large variation operator. Now that pyshgp supports GeneticOperatorPipelines it would give the user more control to break UM into:

PerturbClosesMutuation
PerturbIntegerMutation
PerturbFloatMutation
TweakStringMutation
FlipBooleanMutation
RandomDeletionMutation
RandomAdditionMutation
RandomReplaceMutation
Genesis
Reproduction

Possible overhaul of instructions.

Note: Maybe this whole issue can be ignored if we want to migrate the push interpreter to CPushPush.

Instructions

We should easily be able to rely more on inheritance and add more functionality to the constructor of Instruction in order to remove a lot of code duplication.

See the way instructions are made in Propel and CPushPush for examples of patterns that are much better than what is currently in pyshgp.

Instruction Set

Each python module which defines Push Instructions should define __all__ and then we can safely use import * in the instructions __init__.py. This should make the whole instructions sub-package much easier to understand.

Tests

This time we should put a lot more thought into removing the code duplication. Here is a sketch of what I would currently suggest, although more thought should maybe be put into this:

common_tests = [
    [
        {'_integer': [1, 2, 3]},
        {'_integer': [1, 2]},
        '_integer_pop'
    ],
    [
        {'_boolean': [True]},
        {'_boolean': []},
        '_boolean_pop'
    ],
    [
        {'_string': ['A']},
        {'_string': ['A', 'A']},
        '_boolean_pop'
    ],
    [
        {'_float': [1.5, 2.3]},
        {'_float': [2.3, 1.5]},
        '_float_swap'
    ],
    [
        {'_integer': [1, 2, 3]},
        {'_integer': [2, 3, 1]},
        '_integer_rot'
    ]
]

for test in common_tests:
    assert run_test(*test)

Error metrics on problems are re-written for each problem. Should create metrics module.

Rewrite push spawner and translate functions to remove epigenetic markers.

The concepts of plush genes and epigenetic markers need to be merged. Silent and close markers should just be attributes of plush genes. There is no need to users to control the epigenetic markers.

This refactor will mainly impact Spawner and Translate code.

You spelled 'evolutionary' wrong in the readme

Registered instructions should be a Set, not Dictionary.

Because

There shouldn't be duplicates
Every instruction obj has a name attribute that makes the dictionary key redundant
It is rare lookup 1 instruction, so a filter() is already used for often

GP Tests

Okay, so I didn't exactly use Test Driven Development... Unit tests should still be added for GP related operations.

Exact implementation TBD because of the random nature of all the operators.

PushGPClassifier is missing

Blocked by #71

Issue with parallelization in regression.py

Hi, when I try running the regression.py with n_jobs = -1 I get the following error

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Remove Twilio

It was a fun idea, but difficult to make usable for anyone other than me. 😢

Remove `data` folder, use sklearn datasets.

Then we can avoid have csv, json, etc files in the repo.

End-to-end benchmarks to guide development.

It is hard to judge the impact of many of our changes during CI because full evolutionary runs can take weeks of CPU time. The best we can hope for is a benchmarking tool that can manually start a significant number of benchmarks and creates a report.

These benchmarks should be tracking runtime and pyshgp's ability to find solutions.

Things that need to be done to complete this work:

Implement more of the software synthesis benchmark problems.
Write a script to start n number of runs on x non-trivial problems, and track the runtime and solution of each. Ideally runs would happen in parallel. On fly? On digital ocean?
Determine some way of storing runtimes and solution rates long term.

Need consistent usage of '_' in instruction names.

Currently I am leaning towards always using the _ when referencing instruction names because it is indicates that the string is probably and instruction name, and will make expressing programs as lists of strings as more reliable... although I am not sure it is good to encourage (or support) the latter.

Re-document Examples

Pysh has changed enough over the past few months that the documentations about the examples is getting fairly out of date.

This could wait until #44 is done.

Make interface with Sklearn much easier.

Create wrapper classes for Regression and Classification problems that implement the base class of scikit-learn.

Command line args don't work as described

I tried python examples/idea_of_numbers.py --population_size=200 and it errors out immediately with

Traceback (most recent call last):
  File "examples/idea_of_numbers.py", line 59, in <module>
    gp.evolution(error_func, problem_params)
  File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/gp.py", line 148, in evolution
    params.grab_command_line_params(evolutionary_params)
  File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/params.py", line 143, in grab_command_line_params
    while sys.argv[i+j].startswith('-'):
IndexError: list index out of range

(the idea_of_numbers.py file is my own)

Genetic Operator Pipelines

pyshgp's current genetic operators are large and relatively complex.

Replacing them with many smaller operations that each have their own probabilities could be beneficial. In additon, making pyshgp's current system to combine operations into something more robust and easy to use (which I am calling Operator Pipelines) might be good as well.

An Operator Pipeline which includes all of the above would be equivalent of Uniform Mutation.
An Operator Pipeline could also include recombination.

Genome Simplification isn't great...

Often fails to make programs more reasonable as is.

At least should add replacement with no-ops instructions.

Randomness instructions are missing

Without the ability to generate random number, strings, vectors, etc. it is impossible to evolve probabilistic programs.

Develop a very simple push storage system (aka tags)

To learn more about tags, see this.

In order to enable research on the best use of tag (or tag-like) systems, we need a basic framework for push program general information storage.

Some simple ideas include:

3 variables per datatype

Drop python 2 support. Add type hints.

Python 2 is more work than it is worth. Type hints are a nice.

This will involve

Removing import from future
Refactoring the is_?_type() functions in util.
Updating CI

Separate Push interpreter tests from instruction set tests.

Related to #35 but more general. This is currently pretty well covered by the current tests for the instruction set but those tests will ideally be removed at some point.

string demo error

I tried tweaking the string demo where the target function takes a string (s) and returns s[:-2]+s[:-2]
by making it duplicate and then concatenate the reverse of the string by just changing the return to s[::-1]+s[::-1] and got the error

AttributeError Traceback (most recent call last)
in
40 )
41
---> 42 est.fit(X=X, y=y, verbose=True)
43 print(est._result.program)
44 print(est.predict(X))

~/anaconda3/lib/python3.7/site-packages/pyshgp/gp/estimators.py in fit(self, X, y, verbose)
211 else:
212 y_types = [type(y[0])]
--> 213 output_types = [push_type_for_type(t).name for t in y_types]
214
215 self.evaluator = DatasetEvaluator(X, y, interpreter=self.interpreter)

~/anaconda3/lib/python3.7/site-packages/pyshgp/gp/estimators.py in (.0)
211 else:
212 y_types = [type(y[0])]
--> 213 output_types = [push_type_for_type(t).name for t in y_types]
214
215 self.evaluator = DatasetEvaluator(X, y, interpreter=self.interpreter)

AttributeError: 'NoneType' object has no attribute 'name'

not sure why

Dumping and loading models

Is there a way to dump a model and load it later for prediction? Specifically, how do I use the final program printed out at the end of the GP run to perform a prediction task at later times?

How do I generate predictions with the best model from a run?

I'm starting with the iris example:

from sklearn import datasets, model_selection
import numpy as np

import pyshgp.gp.base as gp


iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data,
                                                                    iris.target,
                                                                    test_size=0.5)

model = gp.PushGPClassifier(population_size=100, max_generations=50)
model.fit(X_train, y_train)

I tried model.score(X_test, y_test), but it complained that there is no predict function. Is there an easy way to create a predict function?

Pysh Exceptions Module

Probably bad form to use the generic Exception object for everything...

/pysh/gp/operators.py: raise Exception("Tried to perform unknown genetic operator " + str(op))
/pysh/gp/selection.py: raise Exception("Unknown selection method: " + str(evolutionary_params["selection_method"]))
/pysh/push/translation.py: raise Exception('Something bad found on paren_stack!')

What needs to be done to add a new example?

I've got a couple of example problems I use in my GP classes and workshops, and they seem to be working already in Pysh. Aside from in-file docs, what needs to be added, and where, before I submit a PR?

erp12 / pyshgp Goto Github PK

pyshgp's Issues

Instructions

Instruction Set

Tests

Recommend Projects

Recommend Topics

Recommend Org