pyshgp's Issues

Bug in gp.py

I've just cloned a fresh repo (and upgraded all my Python binaries), and when I run an example (any of the three) I get the following:

[all the standard setup works OK]

Creating Initial Population
Traceback (most recent call last):
  File "examples/integer_regression.py", line 60, in <module>
    gp.evolution(error_func, problem_params)
  File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/gp.py", line 166, in evolution
    population = generate_random_population(evolutionary_params)
  File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/gp.py", line 97, in generate_random_population
    rand_genome = r.random_plush_genome(evolutionary_params)
TypeError: random_plush_genome() takes exactly 2 arguments (1 given)

suggestion: Name predicate instructions to indicate boolean return value

I am noticing instructions like _exec_empty cropping up quite a lot. I realize that in Python, names can only contain underscores and alphanumeric characters, but I wonder if it might make Push code a bit more readable to name these in the Mathematica style: ending in Q (indicating "question", I guess?).

So for example _exec_empty_Q.

Just a minor suggestion. In Clojure implementations I use exec-empty?, and it helps readability quite a bit.

Instructions are run in a strange way.

  • The _handle_?_instruction() implementations should be moved out of the interpreter and into the class definition of the corresponding instruction type.
  • PushInterpreter.execute_instruction() should be broken into PushInterpreter.eval_atom() and an execute() method defined in each instruction class.
  • Documentation for all of this should be much better.
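A minimal sketch of the proposed split, assuming a toy state with only an integer stack (top of stack at the end of the list). The class names and the eval_atom()/execute() division follow the bullets above; everything else (PushState, IntegerAdd, run) is hypothetical and not pyshgp's actual code.

```python
from typing import Any, Dict, List


class PushState:
    """Hypothetical minimal state: one list per stack, top of stack at the end."""

    def __init__(self) -> None:
        self.stacks: Dict[str, List[Any]] = {"_integer": []}


class Instruction:
    """Hypothetical base class: each instruction knows how to execute itself."""

    def __init__(self, name: str) -> None:
        self.name = name

    def execute(self, state: PushState) -> None:
        raise NotImplementedError


class IntegerAdd(Instruction):
    def __init__(self) -> None:
        super().__init__("_integer_add")

    def execute(self, state: PushState) -> None:
        ints = state.stacks["_integer"]
        if len(ints) < 2:
            return  # Not enough arguments: the instruction is a no-op.
        b, a = ints.pop(), ints.pop()
        ints.append(a + b)


class PushInterpreter:
    def __init__(self) -> None:
        self.state = PushState()

    def eval_atom(self, atom: Any) -> None:
        # Instructions run themselves; literals are routed to a stack by type.
        if isinstance(atom, Instruction):
            atom.execute(self.state)
        elif isinstance(atom, int) and not isinstance(atom, bool):
            self.state.stacks["_integer"].append(atom)
        else:
            raise TypeError(f"Unsupported atom: {atom!r}")

    def run(self, program: List[Any]) -> PushState:
        for atom in program:
            self.eval_atom(atom)
        return self.state
```

With this shape the interpreter never needs per-type _handle_?_instruction() branches: adding an instruction means adding a class with its own execute().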

Update README.md

Due to lots of recent changes, the README needs to be completely rewritten.

Add generalization_function to SimplePushGPEvolver attributes

After automatic program simplification, run this function on the program to determine if it generalizes. generalization_function will look very similar to error_function.

Also, consider refactoring the examples to include one function that produces both an error_function and a generalization_function.
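One possible shape for that refactor, sketched under the assumption that a "program" can be treated as a plain Python callable. The helper name make_error_functions is hypothetical; the point is that both functions share one error computation and differ only in which inputs they use.

```python
from typing import Callable, List, Sequence, Tuple

Program = Callable[[float], float]  # Stand-in: a program is any callable here.


def make_error_functions(
    target: Callable[[float], float],
    train_inputs: Sequence[float],
    test_inputs: Sequence[float],
) -> Tuple[Callable[[Program], List[float]], Callable[[Program], List[float]]]:
    """Hypothetical helper returning (error_function, generalization_function)."""

    def errors_on(program: Program, inputs: Sequence[float]) -> List[float]:
        # Absolute error per case; the same logic serves both functions.
        return [abs(program(x) - target(x)) for x in inputs]

    def error_function(program: Program) -> List[float]:
        return errors_on(program, train_inputs)

    def generalization_function(program: Program) -> List[float]:
        return errors_on(program, test_inputs)

    return error_function, generalization_function
```

For example, `make_error_functions(lambda x: x * x, [1, 2, 3], [10, 20])` gives an error function over the training cases and a generalization function over unseen cases, both defined from a single target.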

Reformat Docs. Add autodoc API documentation.

Docs should be overhauled and largely gutted. Most of the PushGP descriptions should live with the nearly-complete Push-Redux which currently lives at https://erp12.github.io/push-redux/

I also just found out about the power of pairing ReadTheDocs with Autodoc. I am in the process of adding auto-generated API pages to the ReadTheDocs documentation site. Unfortunately, I don't think there is a good way to use this to document the instruction set, so we will continue to rely on the hacky comment scraper for now.
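For reference, a minimal Sphinx conf.py that enables autodoc might look like the sketch below. It assumes the docs/ directory sits next to the pyshgp/ package; napoleon is optional and only needed for Google/NumPy-style docstrings.

```python
# docs/conf.py: minimal Sphinx configuration sketch for autodoc on ReadTheDocs.
import os
import sys

# Make the package importable from docs/ (assumes docs/ sits next to pyshgp/).
sys.path.insert(0, os.path.abspath(".."))

project = "pyshgp"

extensions = [
    "sphinx.ext.autodoc",   # pull API documentation from docstrings
    "sphinx.ext.napoleon",  # parse Google/NumPy-style docstrings (optional)
]
```

An .rst page can then pull in a module, such as pyshgp.gp.estimators, with the `automodule` directive and its `:members:` option.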

odd number tutorial error

I tried to run the odd number tutorial and got the error message: __init__() missing 1 required positional argument: "spawner". I'm not sure why this is happening.

Push Instruction Set Tests

The current tests should be removed because they are hand-written and too difficult to maintain, so they probably don't cover enough cases. Ideally, tests should be generated.

The difficulty is that the output of a program is the state of all the stacks. It is difficult (not possible?) to know what the expected output of a generated Push program is without running it, unless the program isn't generated completely randomly. How can you generate a program in such a way that you know what its output should be?

Also, it is just as important to know what the output of a program should not be. In other words, if we are testing a random program we will have to determine what values should be expected on the stacks after execution. We will also have to check that no other values are on the stacks after execution. This is difficult to check for with programs that were generated with any degree of randomness in them.
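One way to approach the question above, offered only as a sketch: build test programs from fragments whose combined effect is known, such as "dup then pop", "swap twice", or "rot three times", each of which must leave the entire state unchanged. This is a form of metamorphic (property-based) testing. The interpreter below is a toy stand-in, not pyshgp's; comparing the whole state before and after also checks that nothing unexpected was left on any stack.

```python
import copy
import random
from typing import Dict, List

State = Dict[str, list]


def run(state: State, program: List[str]) -> State:
    """Toy stand-in for a Push interpreter (top of stack = end of list)."""
    state = copy.deepcopy(state)
    ints = state["_integer"]
    for instr in program:
        if instr == "_integer_dup" and ints:
            ints.append(ints[-1])
        elif instr == "_integer_pop" and ints:
            ints.pop()
        elif instr == "_integer_swap" and len(ints) >= 2:
            ints[-1], ints[-2] = ints[-2], ints[-1]
        elif instr == "_integer_rot" and len(ints) >= 3:
            ints.append(ints.pop(-3))  # third item from the top moves to the top
        # Any other case is a no-op, as in Push.
    return state


# Programs whose combined effect is known to be "no change at all".
IDENTITIES = [
    ["_integer_dup", "_integer_pop"],
    ["_integer_swap", "_integer_swap"],
    ["_integer_rot"] * 3,
]


def random_state(rng: random.Random) -> State:
    return {
        "_integer": [rng.randint(-9, 9) for _ in range(rng.randint(0, 5))],
        "_boolean": [rng.random() < 0.5 for _ in range(rng.randint(0, 3))],
    }


def check_identities(trials: int = 200, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        before = random_state(rng)
        for program in IDENTITIES:
            # Whole-state equality also catches values that should NOT be there.
            assert run(before, program) == before, (program, before)
```

The random part is the input state, not the program, so the expected output is known in advance even though the data varies.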

Use a dev branch

Great to see all this progress on the pyshgp package. I strongly recommend using a separate development branch for developing the module, and dedicate the master branch to the latest release on pip. I got thrown off for a bit because I was installing pyshgp via pip but referring to examples on the latest dev version on GitHub.

Vote instructions need improving.

Currently class vote instructions require a numeric argument, which adds burden to evolution. It would be beneficial to add vote instructions that increment and decrement vote levels for each class by a constant number baked into the instruction.

  • Add vote inc and vote dec instructions
  • Add vote inc and dec instructions that vote powers of 2.
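A sketch of how such constant-amount vote instructions could be generated, assuming class votes are held in a simple list indexed by class. The instruction naming scheme (_vote_inc_<amount>_class_<c>) is hypothetical.

```python
from typing import Callable, Dict, List, Sequence

VoteFn = Callable[[List[float]], None]


def make_vote_instructions(n_classes: int, amounts: Sequence[float]) -> Dict[str, VoteFn]:
    """Build hypothetical vote instructions with a constant baked in.

    For every class c and amount a, creates _vote_inc_<a>_class_<c> and
    _vote_dec_<a>_class_<c>, so evolution never has to supply a numeric argument.
    """
    instructions: Dict[str, VoteFn] = {}
    for c in range(n_classes):
        for a in amounts:
            # Default arguments bind the current c and a (avoids late binding).
            def inc(votes: List[float], c: int = c, a: float = a) -> None:
                votes[c] += a

            def dec(votes: List[float], c: int = c, a: float = a) -> None:
                votes[c] -= a

            instructions[f"_vote_inc_{a}_class_{c}"] = inc
            instructions[f"_vote_dec_{a}_class_{c}"] = dec
    return instructions
```

Passing `[2 ** k for k in range(4)]` as the amounts gives the powers-of-two variants (1, 2, 4, 8); passing `[1]` gives the plain inc/dec instructions.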

Keyboard interrupt ^C does not halt processing

I was running one of the examples on my laptop, and got super boooooored, but ctrl-C just raises some kind of caught exception and leaves a pile of Python processes running.

Are you by any chance catching all exceptions, including KeyboardInterrupt? Because that's not really the way I would like ctrl-C to work, as it turns out. 😃
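If the suspicion above is right (where the catch-all lives is not identified in this issue), two small patterns help. `except Exception` does not catch KeyboardInterrupt, because KeyboardInterrupt inherits only from BaseException; and a worker pool should be terminated when anything, including Ctrl-C, interrupts it. The function names below are hypothetical.

```python
import multiprocessing
from typing import Callable, List, Optional, Sequence


def evaluate_safely(
    program: Callable[[int], int],
    target: Callable[[int], int],
    inputs: Sequence[int],
    penalty: float = 1e6,
) -> List[float]:
    """Score a program, turning ordinary runtime errors into a penalty.

    ``except Exception`` does not catch KeyboardInterrupt or SystemExit,
    so Ctrl-C still propagates. A bare ``except:`` would swallow it.
    """
    errors: List[float] = []
    for x in inputs:
        try:
            errors.append(float(abs(program(x) - target(x))))
        except Exception:
            errors.append(penalty)
    return errors


def map_in_pool(fn: Callable, items: Sequence, processes: Optional[int] = None) -> list:
    """Map fn over items in worker processes; kill the workers on any exception."""
    pool = multiprocessing.Pool(processes)
    try:
        results = pool.map(fn, items)
        pool.close()
    except BaseException:  # includes KeyboardInterrupt
        pool.terminate()  # stop workers instead of leaving them running
        raise
    finally:
        pool.join()
    return results
```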

Break Up Uniform Mutation

Uniform mutation is a rather large variation operator. Now that pyshgp supports GeneticOperatorPipelines it would give the user more control to break UM into:

  • PerturbClosesMutation
  • PerturbIntegerMutation
  • PerturbFloatMutation
  • TweakStringMutation
  • FlipBooleanMutation
  • RandomDeletionMutation
  • RandomAdditionMutation
  • RandomReplaceMutation
  • Genesis
  • Reproduction
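To illustrate the granularity, here is a sketch of three of the proposed operators as small classes sharing one produce(genome, rng) interface. The genome is simplified to a flat list of literals and instruction names; the real Plush genome and pyshgp's operator interface may differ.

```python
import random
from typing import Any, List

Genome = List[Any]  # Stand-in genome: a flat list of literals and instruction names.


class FlipBooleanMutation:
    """Flip each boolean literal with probability ``rate``."""

    def __init__(self, rate: float = 0.1) -> None:
        self.rate = rate

    def produce(self, genome: Genome, rng: random.Random) -> Genome:
        return [
            (not g) if isinstance(g, bool) and rng.random() < self.rate else g
            for g in genome
        ]


class PerturbIntegerMutation:
    """Add a small random integer to each int literal with probability ``rate``."""

    def __init__(self, rate: float = 0.1, max_step: int = 2) -> None:
        self.rate = rate
        self.max_step = max_step

    def produce(self, genome: Genome, rng: random.Random) -> Genome:
        out: Genome = []
        for g in genome:
            is_int = isinstance(g, int) and not isinstance(g, bool)
            if is_int and rng.random() < self.rate:
                g = g + rng.randint(-self.max_step, self.max_step)
            out.append(g)
        return out


class RandomDeletionMutation:
    """Drop each gene independently with probability ``rate``."""

    def __init__(self, rate: float = 0.1) -> None:
        self.rate = rate

    def produce(self, genome: Genome, rng: random.Random) -> Genome:
        return [g for g in genome if rng.random() >= self.rate]
```

Because each operator only touches one kind of gene, the user can tune or disable each one independently.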

Possible overhaul of instructions.

Note: Maybe this whole issue can be ignored if we want to migrate the push interpreter to CPushPush.

Instructions

We should be able to rely more on inheritance and add more functionality to the Instruction constructor in order to remove a lot of code duplication.

See the way instructions are made in Propel and CPushPush for examples of patterns that are much better than what is currently in pyshgp.
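A sketch of the constructor-driven idea, assuming stacks are lists keyed by type name with the top at the end. It is loosely in the spirit of describing instructions by their input and output types; it is not copied from Propel or CPushPush, and SimpleInstruction and make_common_instructions are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Stacks = Dict[str, List[Any]]


class SimpleInstruction:
    """Pop typed inputs, apply fn, push typed outputs. The constructor holds all
    type-specific details, so most instructions need no subclass."""

    def __init__(self, name: str, fn: Callable[..., Tuple[Any, ...]],
                 input_types: Sequence[str], output_types: Sequence[str]) -> None:
        self.name = name
        self.fn = fn
        self.input_types = list(input_types)
        self.output_types = list(output_types)

    def execute(self, stacks: Stacks) -> None:
        needed: Dict[str, int] = {}
        for t in self.input_types:
            needed[t] = needed.get(t, 0) + 1
        if any(len(stacks[t]) < n for t, n in needed.items()):
            return  # Not enough arguments: no-op, as in Push.
        # Pop in reverse, then restore order, so args line up with input_types.
        args = [stacks[t].pop() for t in reversed(self.input_types)][::-1]
        for t, value in zip(self.output_types, self.fn(*args)):
            stacks[t].append(value)


def make_common_instructions(types: Sequence[str]) -> Dict[str, SimpleInstruction]:
    """Generate <type>_dup and <type>_swap for every type from one definition."""
    instrs: Dict[str, SimpleInstruction] = {}
    for t in types:
        instrs[f"{t}_dup"] = SimpleInstruction(f"{t}_dup", lambda x: (x, x), [t], [t, t])
        instrs[f"{t}_swap"] = SimpleInstruction(f"{t}_swap", lambda a, b: (b, a), [t, t], [t, t])
    return instrs
```

One definition of dup or swap then covers every stack type, which is where most of the current duplication would go away.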

Instruction Set

Each python module which defines Push Instructions should define __all__ and then we can safely use import * in the instructions __init__.py. This should make the whole instructions sub-package much easier to understand.

Tests

This time we should put a lot more thought into removing the code duplication. Here is a sketch of what I would currently suggest, although more thought should maybe be put into this:

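# Each test: [input stacks, expected stacks after execution, instruction name].
# The top of each stack is the last element of its list.
common_tests = [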
    [
        {'_integer': [1, 2, 3]},
        {'_integer': [1, 2]},
        '_integer_pop'
    ],
    [
        {'_boolean': [True]},
        {'_boolean': []},
        '_boolean_pop'
    ],
    [
        {'_string': ['A']},
        {'_string': ['A', 'A']},
        '_string_dup'
    ],
    [
        {'_float': [1.5, 2.3]},
        {'_float': [2.3, 1.5]},
        '_float_swap'
    ],
    [
        {'_integer': [1, 2, 3]},
        {'_integer': [2, 3, 1]},
        '_integer_rot'
    ]
]

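# run_test (not defined here) is assumed to load the input stacks, execute the
# instruction, and return True only if every stack matches the expected state.
for test in common_tests:
    assert run_test(*test)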
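The table above still needs a run_test. A minimal sketch follows, assuming instruction names always have the form `<_type>_<op>` and using a toy stack model rather than pyshgp's interpreter. Comparing the full state means a test also fails if an unexpected stack changes.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Stacks = Dict[str, List[Any]]


def _pop(stack: List[Any]) -> None:
    stack.pop()


def _dup(stack: List[Any]) -> None:
    stack.append(stack[-1])


def _swap(stack: List[Any]) -> None:
    stack[-1], stack[-2] = stack[-2], stack[-1]


def _rot(stack: List[Any]) -> None:
    # Move the third item from the top to the top.
    stack.append(stack.pop(-3))


# Operation -> (implementation, minimum stack size needed to run).
_OPS: Dict[str, Tuple[Callable[[List[Any]], None], int]] = {
    "pop": (_pop, 1),
    "dup": (_dup, 1),
    "swap": (_swap, 2),
    "rot": (_rot, 3),
}


def run_test(input_state: Stacks, expected_state: Stacks, instruction: str) -> bool:
    """Hypothetical run_test: execute one instruction on a copy of input_state
    and compare every stack with expected_state."""
    stack_name, op = instruction.rsplit("_", 1)  # '_integer_pop' -> ('_integer', 'pop')
    fn, min_size = _OPS[op]
    state = copy.deepcopy(input_state)
    stack = state.get(stack_name)
    if stack is not None and len(stack) >= min_size:
        fn(stack)  # Too few items means a no-op, as in Push.
    return state == expected_state
```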

GP Tests

Okay, so I didn't exactly use Test Driven Development... Unit tests should still be added for GP related operations.

Exact implementation TBD because of the random nature of all the operators.
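One common way to unit-test random operators, sketched with a hypothetical deletion operator: pass an explicit, seeded `random.Random` so results are reproducible, assert invariants that must hold for every random outcome, and check rates only within generous statistical bounds so the tests are not flaky. The test functions follow pytest's naming convention.

```python
import random
from typing import Any, List


def uniform_deletion(genome: List[Any], rate: float, rng: random.Random) -> List[Any]:
    """Hypothetical stochastic operator used as the test subject."""
    return [g for g in genome if rng.random() >= rate]


def test_deletion_is_reproducible_with_a_seed() -> None:
    genome = list(range(50))
    a = uniform_deletion(genome, 0.3, random.Random(42))
    b = uniform_deletion(genome, 0.3, random.Random(42))
    assert a == b


def test_deletion_invariants() -> None:
    rng = random.Random(0)
    genome = list(range(50))
    for _ in range(100):
        child = uniform_deletion(genome, 0.3, rng)
        # Properties that must hold for every random outcome:
        assert len(child) <= len(genome)
        assert set(child) <= set(genome)
        assert child == sorted(child)  # surviving genes keep their order


def test_deletion_rate_is_roughly_respected() -> None:
    rng = random.Random(1)
    genome = list(range(1000))
    kept = sum(len(uniform_deletion(genome, 0.3, rng)) for _ in range(20))
    # About 70% should survive; the wide bounds keep the test from being flaky.
    assert 0.65 < kept / (20 * 1000) < 0.75
```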

Issue with parallelization in regression.py

Hi, when I try running the regression.py with n_jobs = -1 I get the following error

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
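The message points at the standard fix: when child processes are started with "spawn" (the default on Windows and macOS), the main module is re-imported in each child, so any code that starts processes must sit behind an `if __name__ == '__main__':` guard. A minimal sketch with the standard library; in regression.py, the estimator construction and `fit(...)` call would be the part moved into main(), but those pyshgp calls are not shown here. The error function is a stand-in.

```python
import multiprocessing as mp


def error_for(x: int) -> int:
    # Stand-in for one evaluation; worker functions must be top-level so they can be pickled.
    return (x * x - 4) ** 2


def main() -> None:
    # Everything that starts worker processes lives inside main().
    with mp.Pool() as pool:
        errors = pool.map(error_for, range(5))
    print(errors)


if __name__ == "__main__":
    main()
```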

Remove Twilio

It was a fun idea, but difficult to make usable for anyone other than me. 😢

End-to-end benchmarks to guide development.

It is hard to judge the impact of many of our changes during CI because full evolutionary runs can take weeks of CPU time. The best we can hope for is a benchmarking tool that can be started manually to launch a significant number of benchmark runs and produce a report.

These benchmarks should track runtime and pyshgp's ability to find solutions.

Things that need to be done to complete this work:

  • Implement more of the software synthesis benchmark problems.
  • Write a script to start n runs on each of x non-trivial problems, and track the runtime and solution of each. Ideally, runs would happen in parallel. On Fly? On DigitalOcean?
  • Determine some way of storing runtimes and solution rates long term.
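A sketch of the core of such a script, assuming each run can be wrapped in a top-level function `run_fn(problem, seed) -> bool` that reports whether a solution was found (that function, and everything named here, is hypothetical). Runs execute in parallel processes, and the per-problem summary is a plain dict that could be written with `json.dump` for long-term storage.

```python
import statistics
import time
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Dict, List, NamedTuple, Optional, Sequence


class RunResult(NamedTuple):
    problem: str
    seed: int
    runtime_s: float
    solved: bool


def timed_run(run_fn: Callable[[str, int], bool], problem: str, seed: int) -> RunResult:
    """Time one evolutionary run; run_fn returns True if a solution was found."""
    start = time.perf_counter()
    solved = run_fn(problem, seed)
    return RunResult(problem, seed, time.perf_counter() - start, solved)


def run_benchmarks(run_fn: Callable[[str, int], bool], problems: Sequence[str],
                   n_runs: int, max_workers: Optional[int] = None) -> List[RunResult]:
    """Start n_runs of every problem in parallel (run_fn must be picklable)."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(timed_run, run_fn, p, seed)
                   for p in problems for seed in range(n_runs)]
        return [f.result() for f in futures]


def summarize(results: Sequence[RunResult]) -> Dict[str, Dict[str, float]]:
    """Per-problem run count, solution rate and median runtime."""
    report: Dict[str, Dict[str, float]] = {}
    for problem in sorted({r.problem for r in results}):
        runs = [r for r in results if r.problem == problem]
        report[problem] = {
            "runs": len(runs),
            "solution_rate": sum(r.solved for r in runs) / len(runs),
            "median_runtime_s": statistics.median(r.runtime_s for r in runs),
        }
    return report
```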

Need consistent usage of '_' in instruction names.

Currently I am leaning towards always using the _ when referencing instruction names, because it indicates that the string is probably an instruction name, and it will make expressing programs as lists of strings more reliable... although I am not sure it is good to encourage (or support) the latter.

Re-document Examples

Pysh has changed enough over the past few months that the documentation for the examples is getting fairly out of date.

This could wait until #44 is done.

Command line args don't work as described

I tried python examples/idea_of_numbers.py --population_size=200 and it errors out immediately with

Traceback (most recent call last):
  File "examples/idea_of_numbers.py", line 59, in <module>
    gp.evolution(error_func, problem_params)
  File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/gp.py", line 148, in evolution
    params.grab_command_line_params(evolutionary_params)
  File "/usr/local/lib/python2.7/site-packages/pyshgp/gp/params.py", line 143, in grab_command_line_params
    while sys.argv[i+j].startswith('-'):
IndexError: list index out of range

(the idea_of_numbers.py file is my own)
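The traceback shows the manual loop indexing past the end of sys.argv. A more robust option is to let argparse handle both `--name=value` and `--name value` forms. A sketch follows; the option names mirror parameters mentioned in these issues, but the defaults and choices are illustrative only, and merging the result into evolutionary_params is left as an assumption.

```python
import argparse
from typing import Dict, List, Optional


def parse_evolution_args(argv: Optional[List[str]] = None) -> Dict[str, object]:
    """Parse command line options with argparse instead of manual sys.argv indexing."""
    parser = argparse.ArgumentParser(description="Run a PushGP example.")
    parser.add_argument("--population_size", type=int, default=1000)
    parser.add_argument("--max_generations", type=int, default=300)
    parser.add_argument("--selection_method", choices=["lexicase", "tournament"],
                        default="lexicase")
    # parse_args(None) reads sys.argv[1:]; bad options produce a clear usage error.
    return vars(parser.parse_args(argv))
```

For example, `parse_evolution_args(["--population_size=200"])` returns a dict with `population_size` set to 200 and the other options at their defaults.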

Genetic Operator Pipelines

pyshgp's current genetic operators are large and relatively complex.

Replacing them with many smaller operations, each with its own probability, could be beneficial. In addition, it might be good to make pyshgp's current system for combining operations more robust and easier to use (I am calling this Operator Pipelines).

  • Simple additions
  • Simple deletions
  • Gaussian perturb some numbers
  • Replace some chars in some string
  • Flip some booleans
  • Perturb close count

An Operator Pipeline that includes all of the above would be equivalent to Uniform Mutation.
An Operator Pipeline could also include recombination.
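A sketch of what an Operator Pipeline could look like, assuming a flat list genome and operators written as plain functions `(genome, rng) -> genome`. Each stage runs with its own probability. The two operators and the instruction names they insert are hypothetical; a recombination stage would additionally need access to a second parent, which this sketch leaves out.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

Genome = List[Any]
Operator = Callable[[Genome, random.Random], Genome]


def gaussian_perturb_floats(genome: Genome, rng: random.Random,
                            rate: float = 0.5, sd: float = 0.1) -> Genome:
    """Add Gaussian noise to each float literal with probability ``rate``."""
    return [
        g + rng.gauss(0.0, sd) if isinstance(g, float) and rng.random() < rate else g
        for g in genome
    ]


def simple_addition(genome: Genome, rng: random.Random, rate: float = 0.1,
                    choices: Sequence[Any] = ("_integer_add", "_float_add")) -> Genome:
    """After each gene, insert a random gene with probability ``rate``."""
    out: Genome = []
    for g in genome:
        out.append(g)
        if rng.random() < rate:
            out.append(rng.choice(choices))
    return out


class OperatorPipeline:
    """Apply (probability, operator) stages in order; each stage runs with its own probability."""

    def __init__(self, stages: Sequence[Tuple[float, Operator]]) -> None:
        self.stages = list(stages)

    def produce(self, genome: Genome, rng: random.Random) -> Genome:
        child = list(genome)
        for probability, op in self.stages:
            if rng.random() < probability:
                child = op(child, rng)
        return child
```

Listing every small operator as a stage with probability 1.0 reproduces a Uniform-Mutation-like operator, while lower probabilities let the user mix and match.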

Develop a very simple push storage system (aka tags)

To learn more about tags, see this.

In order to enable research on the best use of tag (or tag-like) systems, we need a basic framework for general-purpose information storage in Push programs.

Some simple ideas include:

  • 3 variables per datatype
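A sketch of the "3 variables per datatype" idea, using fixed, numbered slots rather than tag matching. Stacks are lists keyed by type name with the top at the end; the class, method, and implied instruction names (such as _integer_write_2) are hypothetical.

```python
from typing import Any, Dict, List, Optional

N_SLOTS = 3  # "3 variables per datatype"


class TagStorage:
    """Hypothetical per-type storage: N_SLOTS variables for each stack type."""

    def __init__(self, types: List[str]) -> None:
        self.slots: Dict[str, List[Optional[Any]]] = {t: [None] * N_SLOTS for t in types}

    def write(self, stacks: Dict[str, List[Any]], t: str, slot: int) -> None:
        """Like a hypothetical _<type>_write_<slot>: pop the top value into a slot."""
        if stacks[t]:
            self.slots[t][slot] = stacks[t].pop()

    def read(self, stacks: Dict[str, List[Any]], t: str, slot: int) -> None:
        """Like a hypothetical _<type>_read_<slot>: push the stored value; no-op if empty."""
        value = self.slots[t][slot]
        if value is not None:
            stacks[t].append(value)
```

Reading does not clear the slot, so a stored value can be reused any number of times.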

Drop python 2 support. Add type hints.

Python 2 is more work than it is worth. Type hints are nice.

This will involve

  • Removing the from __future__ imports
  • Refactoring the is_?_type() functions in util.
  • Updating CI
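Under Python 3 the type-check helpers get simpler, since there is no separate `long` or `unicode` type. A sketch with type hints; the function names follow the is_?_type() pattern but are hypothetical. The one subtlety worth keeping is that `bool` is a subclass of `int`.

```python
from typing import Any


def is_int_type(x: Any) -> bool:
    """True for ints but not bools (bool is a subclass of int)."""
    return isinstance(x, int) and not isinstance(x, bool)


def is_float_type(x: Any) -> bool:
    return isinstance(x, float)


def is_str_type(x: Any) -> bool:
    # Python 3 has a single text type; no unicode/basestring branch is needed.
    return isinstance(x, str)


def is_bool_type(x: Any) -> bool:
    return isinstance(x, bool)
```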

string demo error

I tried tweaking the string demo, where the target function takes a string (s) and returns s[:-2]+s[:-2]. I changed only the return to s[::-1]+s[::-1], so that it duplicates the reversed string, and got the error

AttributeError Traceback (most recent call last)
in
40 )
41
---> 42 est.fit(X=X, y=y, verbose=True)
43 print(est._result.program)
44 print(est.predict(X))

~/anaconda3/lib/python3.7/site-packages/pyshgp/gp/estimators.py in fit(self, X, y, verbose)
211 else:
212 y_types = [type(y[0])]
--> 213 output_types = [push_type_for_type(t).name for t in y_types]
214
215 self.evaluator = DatasetEvaluator(X, y, interpreter=self.interpreter)

~/anaconda3/lib/python3.7/site-packages/pyshgp/gp/estimators.py in (.0)
211 else:
212 y_types = [type(y[0])]
--> 213 output_types = [push_type_for_type(t).name for t in y_types]
214
215 self.evaluator = DatasetEvaluator(X, y, interpreter=self.interpreter)

AttributeError: 'NoneType' object has no attribute 'name'

Not sure why.

Dumping and loading models

Is there a way to dump a model and load it later for prediction? Specifically, how do I use the final program printed out at the end of the GP run to perform a prediction task at later times?
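A generic way to persist a Python object is the standard pickle module. Whether a given pyshgp estimator pickles cleanly is not confirmed here; if it does not, saving the printed final program as text is the fallback. The helper names are hypothetical.

```python
import pickle
from typing import Any


def save_model(model: Any, path: str) -> None:
    """Serialize any picklable object (e.g. a fitted estimator) to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: str) -> Any:
    # Only unpickle files you trust: loading a pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)
```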

How do I generate predictions with the best model from a run?

I'm starting with the iris example:

from sklearn import datasets, model_selection
import numpy as np

import pyshgp.gp.base as gp


iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data,
                                                                    iris.target,
                                                                    test_size=0.5)

model = gp.PushGPClassifier(population_size=100, max_generations=50)
model.fit(X_train, y_train)

I tried model.score(X_test, y_test), but it complained that there is no predict function. Is there an easy way to create a predict function?
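As a stopgap, a small wrapper can supply predict() and score() around any callable that maps one row of features to a class label. How to obtain such a callable from the best program of a pyshgp run depends on the version and is not shown; ProgramClassifier is a hypothetical name. score() here returns mean accuracy, which is what score() means for scikit-learn classifiers.

```python
from typing import Callable, List, Sequence


class ProgramClassifier:
    """Hypothetical wrapper turning a 'row -> class label' callable into a predictor."""

    def __init__(self, program: Callable[[Sequence[float]], int]) -> None:
        self.program = program

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        # Run the program once per row (works for lists or 2-D NumPy arrays).
        return [self.program(row) for row in X]

    def score(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
        """Mean accuracy of predict(X) against y."""
        preds = self.predict(X)
        return sum(int(p == t) for p, t in zip(preds, y)) / len(y)
```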

Pysh Exceptions Module

Probably bad form to use the generic Exception object for everything...

  • /pysh/gp/operators.py: raise Exception("Tried to perform unknown genetic operator " + str(op))
  • /pysh/gp/selection.py: raise Exception("Unknown selection method: " + str(evolutionary_params["selection_method"]))
  • /pysh/push/translation.py: raise Exception('Something bad found on paren_stack!')
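A sketch of a small exception hierarchy covering the three raise sites above. All class names are hypothetical. A shared base class lets callers catch every pysh error at once, and also inheriting from ValueError keeps the unknown-name errors catchable the way generic code would expect.

```python
class PyshError(Exception):
    """Base class for pysh errors, so callers can catch them together."""


class UnknownGeneticOperatorError(PyshError, ValueError):
    def __init__(self, op: object) -> None:
        super().__init__(f"Tried to perform unknown genetic operator {op!r}")
        self.op = op


class UnknownSelectionMethodError(PyshError, ValueError):
    def __init__(self, method: object) -> None:
        super().__init__(f"Unknown selection method: {method!r}")
        self.method = method


class TranslationError(PyshError):
    """Raised when genome-to-program translation finds something bad on the paren stack."""
```

Each raise site then names the problem precisely, e.g. `raise UnknownSelectionMethodError(evolutionary_params["selection_method"])`.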

What needs to be done to add a new example?

I've got a couple of example problems I use in my GP classes and workshops, and they seem to be working already in Pysh. Aside from in-file docs, what needs to be added, and where, before I submit a PR?
