hibachi's People

Contributors

pschmitt52, rhiever

hibachi's Issues

Cross-validation strategies

hibachi currently seems to overfit the training dataset: the models it produces don't generalize to other datasets. Let's try changing the fitness evaluation procedure to promote the generalization ability of the generated GP model.

1) Random subsampling

For every individual, create 10 random subsamples of 25% of the dataset with replacement (this means that one sample can potentially be chosen multiple times for the same subsample). Evaluate the individual on those 10 subsamples and use the average igsum over the 10 subsamples as the igsum portion of the fitness.

(Note: Every individual should have 10 different random subsamples, so make sure the random seed isn't reset each time.)
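
A minimal sketch of the subsampling procedure, using only the standard library. The names `subsample_score` and `score_fn` are placeholders, not hibachi functions: `score_fn(individual, rows)` stands in for whatever hibachi uses to compute igsum on a set of rows.

```python
import random
from statistics import mean

def subsample_score(individual, rows, score_fn, rng,
                    n_subsamples=10, fraction=0.25):
    """Average score_fn over random subsamples drawn with replacement.

    score_fn is a hypothetical stand-in for the igsum computation.
    """
    size = max(1, int(len(rows) * fraction))
    scores = []
    for _ in range(n_subsamples):
        # rng.choices samples with replacement, so a row may repeat.
        subsample = rng.choices(rows, k=size)
        scores.append(score_fn(individual, subsample))
    return mean(scores)

# Create the generator once and pass it to every call. Re-seeding it
# per individual would hand every individual the same 10 subsamples.
rng = random.Random(42)
```

Passing one shared `rng` around, rather than calling `random.seed` inside the fitness function, is what keeps the subsamples different from one individual to the next.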

2) 10-fold cross-validation

For every individual, randomly divide the dataset into 10 equally sized "folds." Evaluate the individual on each fold separately, then use the average igsum over the 10 folds as the igsum portion of the fitness.

(Note 1: This procedure is similar to 1) but guarantees that there is no overlap between the subsamples.)

(Note 2: Every individual should have 10 different folds, so make sure the random seed isn't reset each time.)
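
The fold-based variant could look something like this. Again, `make_folds`, `fold_score` and `score_fn` are placeholder names, and `score_fn(individual, rows)` stands in for the igsum computation.

```python
import random
from statistics import mean

def make_folds(rows, rng, n_folds=10):
    """Shuffle the rows and split them into n_folds disjoint folds."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    # Striding gives folds whose sizes differ by at most one row.
    return [shuffled[i::n_folds] for i in range(n_folds)]

def fold_score(individual, rows, score_fn, rng, n_folds=10):
    """Average score_fn over the folds of a fresh random partition."""
    folds = make_folds(rows, rng, n_folds)
    return mean([score_fn(individual, fold) for fold in folds])
```

Because the rows are reshuffled on every call with the same shared `rng`, each individual gets its own partition. This assumes the dataset has at least `n_folds` rows; otherwise some folds come out empty.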

3) Adding random noise

Instead of subsampling, take the entire dataset and randomly change 10% of the feature values to another value (e.g., 0 can be changed to 1 or 2; 1 can be changed to 0 or 2; etc.). This procedure simulates noise in the dataset. Evaluate the individual on the noisy dataset. For every individual, repeat this procedure 10 times and report the average igsum over the 10 noisy copies of the dataset.

(Note: Every individual should have 10 different noisy copies of the dataset, so make sure the random seed isn't reset each time.)
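
A rough sketch of the noise procedure. It assumes features are coded 0/1/2 as in the example above and that class labels are left untouched; `add_noise`, `noisy_score` and `score_fn(individual, features, labels)` are placeholder names, not hibachi functions.

```python
import random
from statistics import mean

def add_noise(features, rng, levels=(0, 1, 2), rate=0.10):
    """Return a copy of features with `rate` of the values changed.

    Each selected value is replaced by a different level, so a
    selected 0 becomes 1 or 2, never 0 again.
    """
    noisy = [list(row) for row in features]
    cells = [(i, j) for i, row in enumerate(noisy) for j in range(len(row))]
    # rng.sample picks distinct cells, so exactly this many values change.
    for i, j in rng.sample(cells, round(len(cells) * rate)):
        noisy[i][j] = rng.choice([v for v in levels if v != noisy[i][j]])
    return noisy

def noisy_score(individual, features, labels, score_fn, rng, n_copies=10):
    """Average score_fn over n_copies independently noised datasets."""
    scores = [score_fn(individual, add_noise(features, rng), labels)
              for _ in range(n_copies)]
    return mean(scores)
```

The original `features` list is never modified, so every noisy copy starts from the clean data.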

Add column headers to the data set

If the data set does not currently have column headers, add column headers. For the class label column, make sure the column header is Class.
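
One possible way to do this with the standard `csv` module. Several things here are assumptions: that the data set is a delimited text file, that the class label is the last column, and that a first row containing any non-numeric cell is already a header. The feature names `X0`, `X1`, ... are arbitrary.

```python
import csv

def has_header(first_row):
    """Heuristic: the row is a header if any cell is non-numeric."""
    for cell in first_row:
        try:
            float(cell)
        except ValueError:
            return True
    return False

def with_headers(rows, class_index=-1):
    """Return rows with a header row whose class column is 'Class'."""
    if not rows:
        return rows
    if has_header(rows[0]):
        header, body = list(rows[0]), rows[1:]
    else:
        # Placeholder feature names; any naming scheme would do.
        header, body = [f"X{i}" for i in range(len(rows[0]))], rows
    header[class_index] = "Class"
    return [header] + body

def add_headers_to_file(in_path, out_path, delimiter=","):
    """Read a delimited file and write it back out with headers."""
    with open(in_path, newline="") as src:
        rows = list(csv.reader(src, delimiter=delimiter))
    with open(out_path, "w", newline="") as dst:
        csv.writer(dst, delimiter=delimiter).writerows(with_headers(rows))
```

If the file already has a header, only the class column is renamed and the feature names are kept.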

Check to see if models generalize to other data simulated from the same distribution

Steps:

  1. Run hibachi with one randomly-generated data set and store the best individual along with its fitness score.

  2. Test the individual on another randomly-generated data set (with the same number of features and feature types).

  3. Ensure that the fitness on the new data set is relatively close to the fitness on the original data set.

  4. Perform steps 1-3 several times with different data sets and different individuals.
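
The steps above could be wired together roughly as follows. The three callables are placeholders for the real pieces: `make_dataset(rng)` simulates a data set, `evolve(dataset)` runs hibachi and returns the best individual with its fitness, and `evaluate(individual, dataset)` scores an individual on a data set. The `tolerance` threshold is an arbitrary choice for what counts as "relatively close."

```python
import random

def generalization_check(make_dataset, evolve, evaluate, rng,
                         n_trials=5, tolerance=0.1):
    """Repeat steps 1-3 n_trials times and report each comparison.

    Returns a list of (train_fitness, test_fitness, is_close) tuples.
    """
    results = []
    for _ in range(n_trials):
        # Step 1: evolve on one simulated data set, keep the best individual.
        best, train_fitness = evolve(make_dataset(rng))
        # Step 2: score that individual on a fresh simulated data set.
        test_fitness = evaluate(best, make_dataset(rng))
        # Step 3: compare the two fitness values.
        is_close = abs(train_fitness - test_fitness) <= tolerance
        results.append((train_fitness, test_fitness, is_close))
    return results
```

Both data sets in a trial come from the same `make_dataset`, which is what keeps the number of features and the feature types identical between them.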
