
pyshgp's Introduction

PyshGP


Push Genetic Programming in Python

WARNING: The public API of this package may see breaking changes until the 1.0 version.

Motivation

What is PushGP?

Push is a programming language that plays nicely with evolutionary computing / genetic programming. It is a stack-based language with one stack per data type, including code. Programs are represented as lists of instructions that modify the values on the stacks, and instructions are executed in order.

More information about PushGP can be found on the Push Redux, the Push Homepage, and the Push Language Discourse.

Why use PushGP?

PushGP is a leading software synthesis (sometimes called "programming by example") system. It uses stochastic (typically evolutionary) search methods to produce programs that can manipulate all the common data types, control structures, and data structures. It is easily extendable to specific use cases and has produced impressive human-competitive coding results. PushGP has discovered novel quantum computer programs previously unknown to human programmers, and has achieved human-competitive results in finding algebraic terms in the study of finite algebras.

In contrast to the majority of other ML/AI methods, PushGP does not require the transformation of data into numeric structures. PushGP does not optimize a set of numeric parameters using a gradient, but rather attempts to intelligently search the space of programs. The result is a system where the primary output is a program written in the Turing complete Push language.

PushGP has proven itself to be one of the most powerful "general program synthesis" frameworks. Like most evolutionary search frameworks, it usually requires very long runtimes; however, it can solve problems that few other programming-by-example systems can solve.

Additional references on the successes of PushGP:

Goals of PyshGP

Previous PushGP frameworks have focused on supporting genetic programming and software synthesis research. One of the leading PushGP projects is Clojush, which is written in Clojure and heavily focused on the experimentation needed to further the research field.

Pyshgp aims to bring PushGP to a wider range of users and use cases. Many popular ML/AI frameworks are written in Python, and with pyshgp it is much easier to compare PushGP with other methods or build ML pipelines that contain PushGP and other models together.

Although PushGP is constantly changing through research and publication, pyshgp is meant to be a slowly changing, more stable PushGP framework. It is still possible to use pyshgp for research and development; however, accepted contributions to the main repository will be extensively benchmarked, tested, and documented.

Installing pyshgp

pyshgp is compatible with Python 3.7.x and 3.8.x.

Install from pip

pip install pyshgp
  • That's it! Read through the docs and examples to learn more.

Build from source

  • Clone the repo
  • cd into the pyshgp repo directory
  • run pip install . --upgrade
  • That's it! Read through the docs and examples to learn more.

Running Tests

Run the following command from the project root directory. Make sure all the packages from requirements-with-dev.txt are installed in the instance of Python you are using.

python -m pytest

Or run tests continuously (on save) during development using pytest-watch.

ptw 

Documentation

Example usages of pyshgp can be found:

The full pyshgp API can be found on the official website.

Pysh Roadmap / Contributing

PyshGP is nearly ready for its 1.0 release. The main outstanding items are:

  • Extensive benchmarking to make sure pyshgp has the program-finding capabilities we expect from a contemporary PushGP system.
  • More feedback on the API must be gathered before we commit to not making any breaking changes.

For information about contributing, see the Contributing Guide.

pyshgp's People

Contributors

epicfaace, erp12, nayabur, nbro, vaguery


pyshgp's Issues

suggestion: Name predicate instructions to indicate boolean return value

I am noticing instructions like _exec_empty cropping up quite a lot. I realize that in Python variable names can only include underscores and alphanumeric characters, but I wonder if it might make Push code a bit more readable to name these in the Mathematica style: ending in Q (indicating "question", I guess?).

So for example _exec_empty_Q.

Just a minor suggestion. In Clojure implementations I use exec-empty?, and it helps readability quite a bit.

End-to-end benchmarks to guide development.

It is hard to judge the impact of many of our changes during CI because full evolutionary runs can take weeks of CPU time. The best we can hope for is a benchmarking tool that we start manually, which runs a significant number of benchmarks and creates a report.

These benchmarks should be tracking runtime and pyshgp's ability to find solutions.

Things that need to be done to complete this work:

  • Implement more of the software synthesis benchmark problems.
  • Write a script to start n runs on each of x non-trivial problems, and track the runtime and solution of each. Ideally runs would happen in parallel. On fly? On DigitalOcean?
  • Determine some way of storing runtimes and solution rates long term.
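A minimal sketch of such a benchmarking script. `run_problem` is a hypothetical placeholder (it only flips a seeded coin) standing in for one full evolutionary run; only the fan-out across worker processes and the per-problem report are meant to carry over.

```python
import random
import time
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class RunResult:
    problem: str
    seed: int
    solved: bool
    runtime_sec: float


def run_problem(problem: str, seed: int) -> RunResult:
    """Hypothetical placeholder for one full evolutionary run on `problem`."""
    start = time.perf_counter()
    rng = random.Random(seed)
    solved = rng.random() < 0.5  # stands in for "did evolution find a solution?"
    return RunResult(problem, seed, solved, time.perf_counter() - start)


def run_benchmarks(problems: Sequence[str], n_runs: int, max_workers=None) -> List[RunResult]:
    """Start n_runs runs of every problem in parallel worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_problem, p, seed)
                   for p in problems for seed in range(n_runs)]
        return [f.result() for f in futures]


def summarize(results: Sequence[RunResult]) -> Dict[str, Dict[str, float]]:
    """Per-problem solution rate and mean runtime, ready to store or report."""
    report = {}
    for problem in sorted({r.problem for r in results}):
        runs = [r for r in results if r.problem == problem]
        report[problem] = {
            "solution_rate": sum(r.solved for r in runs) / len(runs),
            "mean_runtime_sec": sum(r.runtime_sec for r in runs) / len(runs),
        }
    return report
```

Called from a script's `if __name__ == "__main__":` guard, something like `summarize(run_benchmarks(["problem-a", "problem-b"], n_runs=10))` would produce the report; saving that dict per commit is one possible way to keep runtimes and solution rates long term.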

Remove Twilio

It was a fun idea, but difficult to make usable for anyone other than me. 😢

Re-document Examples

Pysh has changed enough over the past few months that the documentation about the examples is getting fairly out of date.

This could wait until #44 is done.

Possible overhaul of instructions.

Note: Maybe this whole issue can be ignored if we want to migrate the push interpreter to CPushPush.

Instructions

We should easily be able to rely more on inheritance and add more functionality to the constructor of Instruction in order to remove a lot of code duplication.

See the way instructions are made in Propel and CPushPush for examples of patterns that are much better than what is currently in pyshgp.

Instruction Set

Each python module which defines Push Instructions should define __all__ and then we can safely use import * in the instructions __init__.py. This should make the whole instructions sub-package much easier to understand.

Tests

This time we should put a lot more thought into removing the code duplication. Here is a sketch of what I would currently suggest, although more thought should maybe be put into this:

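# Each case is [initial stacks, expected stacks, instruction name].
# The top of each stack is the last element of its list.
common_tests = [
    [
        {'_integer': [1, 2, 3]},
        {'_integer': [1, 2]},
        '_integer_pop'
    ],
    [
        {'_boolean': [True]},
        {'_boolean': []},
        '_boolean_pop'
    ],
    [
        {'_string': ['A']},
        {'_string': ['A', 'A']},
        '_string_dup'  # dup copies the top string
    ],
    [
        {'_float': [1.5, 2.3]},
        {'_float': [2.3, 1.5]},
        '_float_swap'
    ],
    [
        {'_integer': [1, 2, 3]},
        {'_integer': [2, 3, 1]},
        '_integer_rot'  # rot moves the third item onto the top
    ]
]

# run_test (not defined in this sketch) should execute the instruction on the
# initial stacks and return True if the result equals the expected stacks.
for test in common_tests:
    assert run_test(*test)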
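The issue leaves `run_test` undefined. To illustrate the table-driven pattern only, here is a toy `run_test` over plain dicts of lists (the top of each stack is the end of its list) that implements just `pop`, `dup`, `swap`, and `rot`. It is a hypothetical stand-in, not pyshgp's interpreter. Comparing every non-empty stack also catches unexpected leftover values.

```python
from typing import Callable, Dict, List

Stacks = Dict[str, List]


def _pop(stack: List) -> None:
    if stack:
        stack.pop()


def _dup(stack: List) -> None:
    if stack:
        stack.append(stack[-1])


def _swap(stack: List) -> None:
    if len(stack) >= 2:
        stack[-1], stack[-2] = stack[-2], stack[-1]


def _rot(stack: List) -> None:
    # Move the third item from the top onto the top.
    if len(stack) >= 3:
        stack.append(stack.pop(-3))


STACK_OPS: Dict[str, Callable[[List], None]] = {
    "pop": _pop, "dup": _dup, "swap": _swap, "rot": _rot,
}


def _non_empty(stacks: Stacks) -> Stacks:
    return {name: values for name, values in stacks.items() if values}


def run_test(initial: Stacks, expected: Stacks, instruction: str) -> bool:
    """Run one instruction such as '_integer_pop' and compare all stacks."""
    _, type_name, op_name = instruction.split("_", 2)
    stacks = {name: list(values) for name, values in initial.items()}  # don't mutate input
    STACK_OPS[op_name](stacks.setdefault("_" + type_name, []))
    # Compare only non-empty stacks: a leftover value on any stack fails the test.
    return _non_empty(stacks) == _non_empty(expected)
```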

Instructions are run in a strange way.

  • The _handle_?_instruction() implementations should be moved out of the interpreter and into the class definition of the corresponding instruction type.
  • PushInterpreter.execute_instruction() should be broken into PushInterpreter.eval_atom() and an execute() method found in each instruction class definition.
  • Documentation for all of this should be much better.
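A sketch of the split described above, with hypothetical class names (`Interpreter`, `Instruction`, `SimpleInstruction`) rather than pyshgp's real ones: the interpreter's `eval_atom()` only dispatches, each instruction class owns its `execute()`, and a shared constructor replaces per-instruction boilerplate through inheritance.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Sequence


class Interpreter:
    """Toy interpreter: one list per type; the top of a stack is the end of its list."""

    def __init__(self, instructions: Dict[str, "Instruction"]):
        self.instructions = instructions
        self.stacks: Dict[str, List] = {"exec": [], "integer": [], "boolean": []}

    def eval_atom(self, atom) -> None:
        """Dispatch one atom: instruction names are executed, literals are pushed."""
        if isinstance(atom, str) and atom in self.instructions:
            self.instructions[atom].execute(self)
        elif isinstance(atom, bool):  # check bool before int: bool subclasses int
            self.stacks["boolean"].append(atom)
        elif isinstance(atom, int):
            self.stacks["integer"].append(atom)
        else:
            raise ValueError(f"Unrecognized atom: {atom!r}")

    def run(self, program: List) -> Dict[str, List]:
        # Load the program so that its first atom is on top of the exec stack.
        self.stacks["exec"] = list(reversed(program))
        while self.stacks["exec"]:
            self.eval_atom(self.stacks["exec"].pop())
        return self.stacks


class Instruction(ABC):
    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def execute(self, interp: Interpreter) -> None:
        """Each instruction type implements its own execution logic."""


class SimpleInstruction(Instruction):
    """Pops typed inputs, applies `fn`, and pushes one typed output."""

    def __init__(self, name: str, fn: Callable, input_types: Sequence[str], output_type: str):
        super().__init__(name)
        self.fn = fn
        self.input_types = list(input_types)
        self.output_type = output_type

    def execute(self, interp: Interpreter) -> None:
        stacks = interp.stacks
        # If any stack lacks enough inputs, the instruction is a no-op.
        for t in set(self.input_types):
            if len(stacks[t]) < self.input_types.count(t):
                return
        # Pop inputs, then restore left-to-right order (deepest argument first).
        args = [stacks[t].pop() for t in reversed(self.input_types)][::-1]
        stacks[self.output_type].append(self.fn(*args))
```

Adding an instruction then becomes one constructor call, for example `SimpleInstruction("_integer_add", lambda a, b: a + b, ["integer", "integer"], "integer")`.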

Need consistent usage of '_' in instruction names.

Currently I am leaning towards always using the _ when referencing instruction names, because it indicates that the string is probably an instruction name, and it will make expressing programs as lists of strings more reliable... although I am not sure it is good to encourage (or support) the latter.

string demo error

I tried tweaking the string demo, in which the target function takes a string s and returns s[:-2]+s[:-2]. I made it concatenate two copies of the reversed string by changing the return to s[::-1]+s[::-1], and got the error:

AttributeError Traceback (most recent call last)
in
40 )
41
---> 42 est.fit(X=X, y=y, verbose=True)
43 print(est._result.program)
44 print(est.predict(X))

~/anaconda3/lib/python3.7/site-packages/pyshgp/gp/estimators.py in fit(self, X, y, verbose)
211 else:
212 y_types = [type(y[0])]
--> 213 output_types = [push_type_for_type(t).name for t in y_types]
214
215 self.evaluator = DatasetEvaluator(X, y, interpreter=self.interpreter)

~/anaconda3/lib/python3.7/site-packages/pyshgp/gp/estimators.py in (.0)
211 else:
212 y_types = [type(y[0])]
--> 213 output_types = [push_type_for_type(t).name for t in y_types]
214
215 self.evaluator = DatasetEvaluator(X, y, interpreter=self.interpreter)

AttributeError: 'NoneType' object has no attribute 'name'

not sure why

Genetic Operator Pipelines

pyshgp's current genetic operators are large and relatively complex.

Replacing them with many smaller operations that each have their own probabilities could be beneficial. In addition, turning pyshgp's current system for combining operations into something more robust and easy to use (which I am calling Operator Pipelines) might be good as well.

  • Simple additions
  • Simple deletions
  • Gaussian perturb some numbers
  • Replace some chars in some string
  • Flip some booleans
  • Perturb close count

An Operator Pipeline which includes all of the above would be equivalent to Uniform Mutation.
An Operator Pipeline could also include recombination.
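A minimal sketch of an Operator Pipeline, assuming a genome is a flat list of genes (numbers, booleans, and instruction-name strings) and using hypothetical operator names. Each small operator acts on each gene with its own probability, and `pipeline()` applies operators in order, so a uniform-mutation-like operator is just a composition.

```python
import random
from typing import Callable, List

Genome = List[object]
Operator = Callable[[Genome, random.Random], Genome]


def gaussian_perturb(rate: float, sd: float = 1.0) -> Operator:
    """Add Gaussian noise to each float gene with probability `rate`."""
    def op(genome: Genome, rng: random.Random) -> Genome:
        return [g + rng.gauss(0.0, sd) if isinstance(g, float) and rng.random() < rate else g
                for g in genome]
    return op


def flip_booleans(rate: float) -> Operator:
    """Negate each boolean gene with probability `rate`."""
    def op(genome: Genome, rng: random.Random) -> Genome:
        return [(not g) if isinstance(g, bool) and rng.random() < rate else g for g in genome]
    return op


def random_deletion(rate: float) -> Operator:
    """Drop each gene independently with probability `rate`."""
    def op(genome: Genome, rng: random.Random) -> Genome:
        return [g for g in genome if rng.random() >= rate]
    return op


def random_addition(rate: float, new_gene: Callable[[random.Random], object]) -> Operator:
    """Insert a fresh random gene before each existing gene with probability `rate`."""
    def op(genome: Genome, rng: random.Random) -> Genome:
        out: Genome = []
        for g in genome:
            if rng.random() < rate:
                out.append(new_gene(rng))
            out.append(g)
        return out
    return op


def pipeline(*operators: Operator) -> Operator:
    """Compose operators left to right into a single variation operator."""
    def op(genome: Genome, rng: random.Random) -> Genome:
        for operator in operators:
            genome = operator(genome, rng)
        return genome
    return op
```

Recombination would fit the same shape if an operator also took a second parent genome.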

Bug in gp.py

I've just cloned a fresh repo (and upgraded all my Python binaries), and when I run an example (any of the three) I get the following:

[all the standard setup works OK]

Creating Initial Population
Traceback (most recent call last):
  File "examples/integer_regression.py", line 60, in <module>
    gp.evolution(error_func, problem_params)
  File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/gp.py", line 166, in evolution
    population = generate_random_population(evolutionary_params)
  File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/gp.py", line 97, in generate_random_population
    rand_genome = r.random_plush_genome(evolutionary_params)
TypeError: random_plush_genome() takes exactly 2 arguments (1 given)

Push Instruction Set Tests

Current tests should be removed because they are too difficult to maintain and are hand written, so they probably don't cover enough cases. Tests should ideally be generated.

The difficulty is that the output of a program is the state of all the stacks. It is difficult (not possible?) to know what the expected output of a generated Push program is without running it, unless it isn't generated completely randomly. How can you generate a program in such a way that you know what its output should be?

Also, it is just as important to know what the output of a program should not be. In other words, if we are testing a random program we will have to determine what values should be expected on the stacks after execution. We will also have to check that no other values are on the stacks after execution. This is difficult to check for with programs that were generated with any degree of randomness in them.

GP Tests

Okay, so I didn't exactly use Test Driven Development... Unit tests should still be added for GP related operations.

Exact implementation TBD because of the random nature of all the operators.
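One common way to make random operators testable is to inject a seeded `random.Random` and assert on properties that must hold for every draw (plus reproducibility for a fixed seed) instead of on exact outputs. A sketch using `unittest`; `uniform_deletion` is a hypothetical operator, not part of pyshgp.

```python
import random
import unittest


def uniform_deletion(genome, rate, rng):
    """Hypothetical operator: drop each gene independently with probability `rate`."""
    return [g for g in genome if rng.random() >= rate]


class TestUniformDeletion(unittest.TestCase):
    def test_invariants_hold_for_many_seeds(self):
        genome = list(range(20))
        for seed in range(100):
            child = uniform_deletion(genome, 0.3, random.Random(seed))
            # The child can only shrink, and survivors keep their original order.
            self.assertLessEqual(len(child), len(genome))
            self.assertEqual(child, [g for g in genome if g in child])

    def test_same_seed_gives_same_child(self):
        genome = list(range(20))
        a = uniform_deletion(genome, 0.3, random.Random(42))
        b = uniform_deletion(genome, 0.3, random.Random(42))
        self.assertEqual(a, b)

    def test_edge_rates(self):
        genome = list(range(5))
        self.assertEqual(uniform_deletion(genome, 0.0, random.Random(0)), genome)
        self.assertEqual(uniform_deletion(genome, 1.0, random.Random(0)), [])
```

pytest collects `unittest.TestCase` subclasses, so tests like these would run under `python -m pytest` alongside the existing suite.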

Vote instructions need improving.

Currently class vote instructions require a numeric argument, which adds burden to evolution. It would be beneficial to add vote instructions that increment and decrement vote levels for each class by a constant number baked into the instruction.

  • Add vote inc and vote dec instructions
  • Add vote inc and dec instructions that vote powers of 2.
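A sketch of instructions with the vote amount baked in, assuming (hypothetically) that the interpreter keeps one numeric vote level per class in a list. Names such as `_vote_inc_1_by_4` are illustrative, not existing pyshgp instructions.

```python
from typing import Callable, Dict, List

VoteInstruction = Callable[[List[float]], None]


def make_vote_instruction(class_index: int, delta: float) -> VoteInstruction:
    """Return an instruction that adds a fixed `delta` to one class's vote level."""
    def instruction(votes: List[float]) -> None:
        votes[class_index] += delta
    return instruction


def vote_instruction_set(n_classes: int, max_power: int = 3) -> Dict[str, VoteInstruction]:
    """Build inc/dec instructions (step 1) plus power-of-2 variants for every class."""
    instrs: Dict[str, VoteInstruction] = {}
    for c in range(n_classes):
        instrs[f"_vote_inc_{c}"] = make_vote_instruction(c, 1)
        instrs[f"_vote_dec_{c}"] = make_vote_instruction(c, -1)
        for p in range(1, max_power + 1):
            step = 2 ** p
            instrs[f"_vote_inc_{c}_by_{step}"] = make_vote_instruction(c, step)
            instrs[f"_vote_dec_{c}_by_{step}"] = make_vote_instruction(c, -step)
    return instrs
```

Because no instruction consumes a numeric argument, evolution only has to pick which constant-step instruction to use.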

Add generalization_function to SimplePushGPEvolver attributes

After automatic program simplification, run this function on the program to determine if it generalizes. generalization_function will look very similar to error_function.

Also, consider refactoring examples to include one function which produces an error_function and a generalization_function.
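A sketch of the suggested refactoring: one hypothetical factory that closes over training and held-out cases and returns both functions. `run_program` is an assumed callable that runs a program on one input; the large penalty for non-numeric output and the "zero error on every held-out case" definition of generalizing are illustrative choices, not pyshgp's.

```python
from typing import Callable, List, Sequence, Tuple

Case = Tuple[object, object]  # (input, expected output)
PENALTY = 1e6  # illustrative error for a missing or non-numeric output


def make_error_functions(
    run_program: Callable[[object, object], object],
    train_cases: Sequence[Case],
    test_cases: Sequence[Case],
) -> Tuple[Callable[[object], List[float]], Callable[[object], bool]]:
    def case_errors(program, cases) -> List[float]:
        errors = []
        for x, y in cases:
            out = run_program(program, x)
            errors.append(abs(out - y) if isinstance(out, (int, float)) else PENALTY)
        return errors

    def error_function(program) -> List[float]:
        return case_errors(program, train_cases)

    def generalization_function(program) -> bool:
        # Here "generalizes" means zero error on every held-out case.
        return all(e == 0 for e in case_errors(program, test_cases))

    return error_function, generalization_function
```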

How do I generate predictions with the best model from a run?

I'm starting with the iris example:

from sklearn import datasets, model_selection
import numpy as np

import pyshgp.gp.base as gp


iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data,
                                                                    iris.target,
                                                                    test_size=0.5)

model = gp.PushGPClassifier(population_size=100, max_generations=50)
model.fit(X_train, y_train)

I tried model.score(X_test, y_test), but it complained that there is no predict function. Is there an easy way to create a predict function?

Dumping and loading models

Is there a way to dump a model and load it later for prediction? Specifically, how do I use the final program printed out at the end of the GP run to perform a prediction task at later times?

Keyboard interrupt ^C does not halt processing

I was running one of the examples on my laptop, and got super boooooored, but ctrl-C just raises some kind of caught exception and leaves a pile of Python processes running.

Are you by any chance catching all exceptions, including KeyboardInterrupt? Because that's not really the way I would like ctrl-C to work, as it turns out. 😃
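In Python 3, a bare `except:` (or `except BaseException:`) swallows `KeyboardInterrupt`, whereas `except Exception:` does not, because `KeyboardInterrupt` derives from `BaseException` rather than `Exception`. A sketch of that pattern for an evaluation loop; `evaluate` is a hypothetical stand-in for running one program.

```python
from typing import Callable, Iterable, List


def evaluate_all(programs: Iterable, evaluate: Callable[[object], float]) -> List[float]:
    """Evaluate each program, penalizing programs that crash."""
    errors = []
    for program in programs:
        try:
            errors.append(evaluate(program))
        except Exception:
            # Runtime errors raised by an evolved program get a penalty error.
            # KeyboardInterrupt subclasses BaseException, not Exception, so
            # Ctrl-C is not caught here and propagates to stop the run.
            errors.append(float("inf"))
    return errors
```

If evaluation happens in a `multiprocessing.Pool`, using it as a context manager (`with Pool() as pool:`) calls `terminate()` on exit, which also stops the worker processes when the interrupt propagates.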

Use a dev branch

Great to see all this progress on the pyshgp package. I strongly recommend using a separate development branch for developing the module and dedicating the master branch to the latest release on pip. I got thrown off for a bit because I was installing pyshgp via pip but referring to examples from the latest dev version on GitHub.

Update README.md

Due to lots of recent changes, the README needs to be completely rewritten.

Break Up Uniform Mutation

Uniform mutation is a rather large variation operator. Now that pyshgp supports GeneticOperatorPipelines it would give the user more control to break UM into:

  • PerturbClosesMutation
  • PerturbIntegerMutation
  • PerturbFloatMutation
  • TweakStringMutation
  • FlipBooleanMutation
  • RandomDeletionMutation
  • RandomAdditionMutation
  • RandomReplaceMutation
  • Genesis
  • Reproduction

Reformat Docs. Add autodoc API documentation.

Docs should be overhauled and largely gutted. Most of the PushGP descriptions should live with the nearly-complete Push-Redux which currently lives at https://erp12.github.io/push-redux/

I also just found out about the power of pairing ReadTheDocs and Autodoc. I am in the process of adding auto-generated API pages to the ReadTheDocs documentation site. Unfortunately, I don't think there is a good way to use this to document the instruction set, so we will continue to rely on the hack-y comment scraper for now.

odd number tutorial error

I tried to run the odd number tutorial and got the error message "__init__() missing 1 required positional argument: 'spawner'". I'm not sure why this is happening.

Issue with parallelization in regression.py

Hi, when I try running regression.py with n_jobs = -1, I get the following error:

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
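This error comes from Python's `multiprocessing` when the "spawn" start method is in use (the default on Windows, and on macOS since Python 3.8): each worker re-imports the main script, so any code that starts processes must sit under an `if __name__ == '__main__':` guard. A minimal sketch of the idiom, with a hypothetical `fitness` function standing in for the work that `n_jobs = -1` parallelizes.

```python
from multiprocessing import Pool, cpu_count


def fitness(x: int) -> int:
    """Hypothetical per-individual work; must be a top-level function to be picklable."""
    return x * x


def main() -> list:
    # Everything that starts worker processes lives inside main().
    with Pool(processes=cpu_count()) as pool:
        return pool.map(fitness, range(10))


if __name__ == "__main__":
    print(main())
```

Applied to the example script, the same idea means moving the estimator construction and the `fit` call into a `main()` function that only runs under the guard.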

What needs to be done to add a new example?

I've got a couple of example problems I use in my GP classes and workshops, and they seem to be working already in Pysh. Aside from in-file docs, what needs to be added, and where, before I submit a PR?

Pysh Exceptions Module

Probably bad form to use the generic Exception object for everything...

  • /pysh/gp/operators.py: raise Exception("Tried to perform unknown genetic operator " + str(op))
  • /pysh/gp/selection.py: raise Exception("Unknown selection method: " + str(evolutionary_params["selection_method"]))
  • /pysh/push/translation.py: raise Exception('Something bad found on paren_stack!')
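A sketch of a small exceptions module covering the three call sites listed above. The class names are hypothetical. A shared `PyshError` base lets callers catch every pysh error at once, and subclassing `ValueError` for the lookup failures is a design choice so those can also be caught as ordinary value errors.

```python
class PyshError(Exception):
    """Base class for all pysh-specific errors."""


class UnknownGeneticOperatorError(PyshError, ValueError):
    def __init__(self, op):
        super().__init__(f"Tried to perform unknown genetic operator {op!r}")
        self.op = op


class UnknownSelectionMethodError(PyshError, ValueError):
    def __init__(self, method):
        super().__init__(f"Unknown selection method: {method!r}")
        self.method = method


class TranslationError(PyshError):
    """Raised when a genome cannot be translated into a Push program."""
```

The selection.py call site would then read `raise UnknownSelectionMethodError(evolutionary_params["selection_method"])`.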

Drop python 2 support. Add type hints.

Python 2 is more work than it is worth. Type hints are nice.

This will involve

  • Removing imports from __future__
  • Refactoring the is_?_type() functions in util.
  • Updating CI
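Without Python 2's `long` and `unicode`, type-check helpers can reduce to plain `isinstance` checks with type hints. A sketch, assuming helper names of the form `is_int_type` (the actual util functions may differ); note that `bool` is a subclass of `int` in Python, so an integer check that should reject booleans has to say so explicitly.

```python
def is_int_type(x: object) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(x, int) and not isinstance(x, bool)


def is_float_type(x: object) -> bool:
    return isinstance(x, float)


def is_bool_type(x: object) -> bool:
    return isinstance(x, bool)


def is_str_type(x: object) -> bool:
    # Python 3 has a single text type, so no separate `unicode` check is needed.
    return isinstance(x, str)
```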

Develop a very simple push storage system (aka tags)

To learn more about tags, see this.

In order to enable research on the best use of tag (or tag-like) systems, we need a basic framework for general-purpose information storage in Push programs.

Some simple ideas include:

  • 3 variables per datatype
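A minimal sketch of the "3 variables per datatype" idea, assuming stacks are plain dicts of lists and using hypothetical names (`Storage`, plus factories that would produce instructions such as `_integer_store_1`). Out-of-range slot indices wrap around, which is one design choice among several.

```python
from typing import Callable, Dict, Iterable, List, Optional

Stacks = Dict[str, List]


class Storage:
    """Fixed-size storage: `slots_per_type` variables for each data type."""

    def __init__(self, types: Iterable[str], slots_per_type: int = 3):
        self.slots: Dict[str, List[Optional[object]]] = {
            t: [None] * slots_per_type for t in types
        }

    def store(self, type_name: str, slot: int, value: object) -> None:
        cells = self.slots[type_name]
        cells[slot % len(cells)] = value  # out-of-range indices wrap around

    def load(self, type_name: str, slot: int) -> Optional[object]:
        """Return the stored value, or None if the slot is still empty."""
        cells = self.slots[type_name]
        return cells[slot % len(cells)]


def make_store_instruction(type_name: str, slot: int) -> Callable[[Stacks, Storage], None]:
    """Pop the top of the typed stack into a slot (no-op on an empty stack)."""
    def instruction(stacks: Stacks, storage: Storage) -> None:
        if stacks[type_name]:
            storage.store(type_name, slot, stacks[type_name].pop())
    return instruction


def make_load_instruction(type_name: str, slot: int) -> Callable[[Stacks, Storage], None]:
    """Push a copy of a slot's value (no-op if the slot is empty)."""
    def instruction(stacks: Stacks, storage: Storage) -> None:
        value = storage.load(type_name, slot)
        if value is not None:
            stacks[type_name].append(value)
    return instruction
```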

Command line args don't work as described

I tried python examples/idea_of_numbers.py --population_size=200 and it errors out immediately with

Traceback (most recent call last):
  File "examples/idea_of_numbers.py", line 59, in <module>
    gp.evolution(error_func, problem_params)
  File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/gp.py", line 148, in evolution
    params.grab_command_line_params(evolutionary_params)
  File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/params.py", line 143, in grab_command_line_params
    while sys.argv[i+j].startswith('-'):
IndexError: list index out of range

(the idea_of_numbers.py file is my own)
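The `IndexError` comes from the hand-rolled loop reading `sys.argv[i+j]` past the end of the list. Python's standard `argparse` already accepts both `--population_size=200` and `--population_size 200`. A sketch, assuming a hypothetical dict of default parameters (the values are illustrative, not pyshgp's defaults) whose value types decide how each flag is parsed.

```python
import argparse
from typing import Dict, List, Optional

# Illustrative defaults only; each value's type is used to parse its flag.
DEFAULT_PARAMS = {"population_size": 100, "max_generations": 50, "error_threshold": 0.0}


def grab_command_line_params(
    params: Dict[str, object], argv: Optional[List[str]] = None
) -> Dict[str, object]:
    """Return a copy of `params` with any --name=value overrides applied."""
    parser = argparse.ArgumentParser()
    for name, default in params.items():
        parser.add_argument(f"--{name}", type=type(default), default=default)
    # parse_known_args ignores flags meant for other parts of the program.
    args, _unknown = parser.parse_known_args(argv)
    return vars(args)
```

Boolean parameters would need `action="store_true"` instead of `type=bool`, since `bool("False")` is `True`.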
