Giter Club home page Giter Club logo

dtree's Introduction

Dtree - A simple pure-Python decision tree construction algorithm

Overview

Given a training data set, it constructs a decision tree for classification or regression in a single batch or incrementally.

It loads data from CSV files. It expects the first row in the CSV to be a header, with each element conforming to the pattern "name:type:mode". Mode is optional, and denotes the class attribute. Type identifies the attribute as either a continuous, discrete, or nominal.

The module is loosely based on code published by Christopher Roach in his article Building Decision Trees in Python. I refactored his code to be more object-oriented, and extended it to support basic regression.

The class attribute can be either continuous, discrete or nominal, but all other attributes can only be discrete or nominal.

Installation

Download the code and then run:

python setup.py build
sudo python setup.py install

You can also install from PyPI using pip via:

sudo pip install dtree

Or upgrade from an earlier version via:

sudo pip install --upgrade dtree

Usage

Classification and regression are handled through the same interface, and differ only in the object returned by the predict() method and how the result from test() is interpreted.

With classification, this object will always be a DDist instance, representing a probability distribution over a set of discrete or nominal classes. In this case, the result from test() will be a CDist instance representing the classification accuracy.

With regression, this object will always be a CDist instance, representing a mean and variance. In this case, the result from test() will be a CDist instance representing the mean absolute error.

from dtree import Tree, Data

tree = Tree.build(Data('classification-training.csv'))
result = t.test(Data('classification-testing.csv'))
print 'Accuracy:',result.mean
prediction = tree.predict(dict(feature1=123, feature2='abc', feature3='hot'))
print 'best:',prediction.best
print 'probs:',prediction.probs

tree = Tree.build(Data('regression-training.csv'))
result = t.test(Data('regression-testing.csv'))
print 'MAE:',result.mean
prediction = tree.predict(dict(feature1=123, feature2='abc', feature3='hot'))
print 'mean:',prediction.mean
print 'variance:',prediction.variance

Features

  • building a classification or regression tree using batch or incremental/online methods

Todo

Does not yet support:

  • sparse training data
  • sparse query vector

History

0.1.0 - 2012.01.24 Initial development.

0.2.0 - 2012.02.08 Refactored to support incremental/online tree construction and forests.

dtree's People

Contributors

chrisspen avatar pombredanne avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.