gtrees: Genetic Trees in Python

gtrees is a package for building decision tree classifiers using evolutionary methods. It allows more customization of tree structure and fitting strategy than conventional tree-growing methods.

A common use case of gtrees is to build small, shallow trees whose leaf nodes are used to feed data into downstream models. In this case, the tree can be interpreted as a segmentation that optimally separates the data before training individual models within each segment.

Concepts

The gtrees package uses a distinctive structure to represent trees. gtrees separates a built tree into two parts: its node structure, which determines which leaf a row is associated with, and a function per leaf that makes the prediction for rows associated with that leaf. Predicting the value of a row therefore takes two steps: first traverse the tree with the values of that row to find the leaf node it is associated with, then use the map of leaves to prediction functions to look up the value.
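
The two-step prediction can be sketched in a few lines. This is a minimal illustration and not the gtrees API: the `Node` class, the `find_leaf` helper, the string leaf ids, and the toy values are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Node:
    """Hypothetical node: internal nodes split on a feature, leaves carry an id."""
    leaf_id: Optional[str] = None      # set only on leaves
    feature: Optional[str] = None      # set only on internal nodes
    threshold: float = 0.0
    left: Optional["Node"] = None      # rows with row[feature] <= threshold
    right: Optional["Node"] = None     # rows with row[feature] > threshold


def find_leaf(node: Node, row: Dict[str, float]) -> str:
    """Step 1: walk the node structure and return the id of the leaf reached."""
    while node.leaf_id is None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.leaf_id


# The node structure alone knows nothing about predicted values.
tree = Node(feature="age", threshold=30.0,
            left=Node(leaf_id="young"), right=Node(leaf_id="old"))

# Step 2: a separate map from leaf id to the function that predicts for that leaf.
leaf_functions: Dict[str, Callable[[Dict[str, float]], float]] = {
    "young": lambda row: 0.2,
    "old": lambda row: 0.7,
}

row = {"age": 42.0}
leaf_id = find_leaf(tree, row)
prediction = leaf_functions[leaf_id](row)
```

Because the values live in `leaf_functions` rather than in the nodes, the same `tree` object could be paired with a different map without being modified.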

This allows one to configure different prediction functions for the leaves of a tree. Most tree implementations predict with the mean of the training values in a leaf; gtrees also allows more complicated per-leaf models, such as regressions.

The loss function optimized by the tree is configurable, as is the leaf prediction function. Unlike many decision tree algorithms, gtrees allows the user to customize the loss function that the tree attempts to optimize.

Terms

tree

A Tree is an object that takes input data and determines which leaf each row ends up in. Unlike many tree implementations, the Tree itself doesn't store data about the value of a leaf. That data must be stored externally.

loss_fn

A loss_fn is a function that takes the predicted targets and the actual targets for a set of data rows and returns a single value, the "loss" or "cost" of that prediction (lower is better).

def loss_fn(predicted_targets, actual_targets) -> float

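A loss function must be additive: the loss over a set of rows must equal the sum of the losses over any partition of those rows. In particular, one should not apply a mean as part of it.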
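
To see why a mean breaks additivity, compare a summed squared error with a mean squared error when rows are split across two leaves of different sizes. This is only a sketch: the function names and the toy numbers are invented for the example and are not part of gtrees.

```python
def sum_squared_error(predicted_targets, actual_targets) -> float:
    """Additive: the loss of a set of rows is the sum of per-row losses."""
    return float(sum((p - a) ** 2 for p, a in zip(predicted_targets, actual_targets)))


def mean_squared_error(predicted_targets, actual_targets) -> float:
    """Not additive: dividing by the row count breaks the sum over leaves."""
    return sum_squared_error(predicted_targets, actual_targets) / len(actual_targets)


# Toy rows split across two leaves of different sizes.
left_pred, left_actual = [1.0, 1.0, 1.0], [1.0, 2.0, 3.0]
right_pred, right_actual = [5.0], [7.0]

all_pred = left_pred + right_pred
all_actual = left_actual + right_actual

# Summed loss: the per-leaf losses add up to the loss over all rows.
additive_holds = (
    sum_squared_error(left_pred, left_actual)
    + sum_squared_error(right_pred, right_actual)
    == sum_squared_error(all_pred, all_actual)
)

# Mean loss: the per-leaf losses do not add up to the loss over all rows.
mean_holds = (
    mean_squared_error(left_pred, left_actual)
    + mean_squared_error(right_pred, right_actual)
    == mean_squared_error(all_pred, all_actual)
)
```

With an additive loss, the total cost of a tree can be computed leaf by leaf and summed, which is what makes comparing candidate trees straightforward.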

leaf_prediction_fn

A leaf_prediction_fn is a function that takes the features of the rows that end up in a leaf and returns a Series of predictions, one per row. It is typically a constant function whose value is either the mean of the actual targets seen in that leaf (for example, the mean good rate) or the median target, but it can be anything else.

def leaf_prediction_fn(features) -> pd.Series

leaf_prediction_builder

A leaf_prediction_builder is a function that takes the features and actual targets that end up in a training leaf and returns a leaf_prediction_fn. This leaf_prediction_fn is used to predict the value of testing rows that end up in the same leaf.

def leaf_prediction_builder(features, actual_targets) -> leaf_prediction_fn
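
As an illustration, a builder for the common constant-mean case might look like the sketch below. It follows the two signatures given above but is not taken from the gtrees source; the name `mean_leaf_prediction_builder` and the toy data are assumptions made for the example.

```python
import pandas as pd


def mean_leaf_prediction_builder(features: pd.DataFrame, actual_targets: pd.Series):
    """Fit on the training rows of one leaf and return a leaf_prediction_fn."""
    leaf_mean = float(actual_targets.mean())  # computed once, at training time

    def leaf_prediction_fn(features: pd.DataFrame) -> pd.Series:
        # Constant prediction, aligned to the index of the rows being scored.
        return pd.Series(leaf_mean, index=features.index)

    return leaf_prediction_fn


# Training rows that landed in one leaf (made-up data).
train_features = pd.DataFrame({"age": [25, 31, 47]})
train_targets = pd.Series([1.0, 0.0, 1.0])
predict = mean_leaf_prediction_builder(train_features, train_targets)

# New rows that land in the same leaf receive the training mean.
test_features = pd.DataFrame({"age": [29, 52]}, index=[10, 11])
predictions = predict(test_features)
```

The builder sees the targets, while the returned function sees only features. A regression builder would have the same shape: fit a model inside the builder and return a function that applies it.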

leaf_prediction_map

A leaf_prediction_map is a map from leaf ids (e.g. their hash) to the leaf_prediction_fn for that leaf. One can only use a tree to score data if one has a leaf_prediction_map. This design allows one to use the same tree as a subset of another tree without having their leaf values become entangled.
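
The sketch below shows how a leaf_prediction_map could be built and used for scoring, and how one tree can be paired with two independent maps. It is a hedged illustration and not the gtrees API: `assign_leaves` is a hypothetical stand-in for a tree, and `build_leaf_prediction_map`, `score`, the two builders, and the data are invented for the example.

```python
import pandas as pd


def assign_leaves(features: pd.DataFrame) -> pd.Series:
    """Hypothetical stand-in for a tree: returns a leaf id for every row."""
    return features["age"].gt(30).map({True: "old", False: "young"})


def build_leaf_prediction_map(features, actual_targets, leaf_prediction_builder):
    """Call the builder once per leaf, on the training rows that reach it."""
    leaf_ids = assign_leaves(features)
    return {
        leaf_id: leaf_prediction_builder(features[leaf_ids == leaf_id],
                                         actual_targets[leaf_ids == leaf_id])
        for leaf_id in leaf_ids.unique()
    }


def score(features, leaf_prediction_map) -> pd.Series:
    """Route rows to leaves, then let each leaf's function predict its rows."""
    leaf_ids = assign_leaves(features)
    parts = [leaf_prediction_map[leaf_id](rows)
             for leaf_id, rows in features.groupby(leaf_ids)]
    return pd.concat(parts).reindex(features.index)


def mean_builder(features, actual_targets):
    value = float(actual_targets.mean())
    return lambda rows: pd.Series(value, index=rows.index)


def median_builder(features, actual_targets):
    value = float(actual_targets.median())
    return lambda rows: pd.Series(value, index=rows.index)


# Made-up training data: two rows reach "young", three reach "old".
train_features = pd.DataFrame({"age": [22, 28, 35, 50, 61]})
train_targets = pd.Series([0.0, 1.0, 1.0, 1.0, 4.0])

# Same tree, two independent maps: the leaf values never touch the tree itself.
mean_map = build_leaf_prediction_map(train_features, train_targets, mean_builder)
median_map = build_leaf_prediction_map(train_features, train_targets, median_builder)

test_features = pd.DataFrame({"age": [40, 20]})
mean_scores = score(test_features, mean_map)
median_scores = score(test_features, median_map)
```

Keeping the maps outside the tree is what lets the two scorings coexist: swapping `mean_map` for `median_map` changes the predictions without changing which leaf any row reaches.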
