Giter Club home page Giter Club logo

osdt's Introduction

Optimal Sparse Decision Trees (OSDT)

This accompanies the paper, "Optimal Sparse Decision Trees" by Xiyang Hu, Cynthia Rudin, and Margo Seltzer.

It appeared in the 2019 NeurIPS conference

Dependencies

  • gmp (GNU Multiple Precision Arithmetic Library)
  • mpfr (GNU MPFR Library for multiple-precision floating-point computations; depends on gmp)
  • libmpc (GNU MPC for arbitrarily high precision and correct rounding; depends on gmp and mpfr)
  • gmpy2 (GMP/MPIR, MPFR, and MPC interface to Python 2.6+ and 3.x)

Main function

The main function is the bbound() function in osdt.py.

Arguments

[x] The features of the training data.

[y] The labels of the training data.

[lamb] The regularization parameter lambda of the objective function.

[prior_metric] The scheduling policy.

  • Use curiosity to prioritize by curiosity (see Section 5 of our paper).
  • Use bound to prioritize by the lower bound.
  • Use objective to prioritize by the objective.
  • Use FIFO for first-in-first-out search.

[MAXDEPTH] Maximum depth of the tree. Default value is float('Inf').

[MAX_NLEAVES] Maximum number of leaves of the tree. Default value is float('Inf').

[niter] Maximum number of tree evaluations. Default value is float('Inf').

[logon] Record relevant trees and values during the execution. Default if False.

[support] Turn on Lower bound on leaf support. Default is True.

[incre_support] Turn on Lower bound on incremental classification accuracy. Default is True.

[accu_support] Turn on Lower bound on classification accuracy. Default is True.

[equiv_points] Turn on Equivalent points bound. Default is True.

[lookahead] Turn on Lookahead bound. Default is True.

[lenbound] Turn on Prefix-specific upper bound on number of leaves. Default is True.

[timelimit] Time limit on the running time. Default is True.

[init_cart] Initialize with CART. Default is True.

Example test code

We provide our test code in test_accuracy.py.

Dataset

See data/preprocessed/.

We used 7 datasets: Five of them are from the UCI Machine Learning Repository (tic-tac-toc, car evaluation, monk1, monk2, monk3). The other two datasets are the ProPublica recidivism data set and the Fair Isaac (FICO) credit risk datasets. We predict which individuals are arrested within two years of release ({N = 7,215}) on the recidivism data set and whether an individual will default on a loan for the FICO dataset.

osdt's People

Contributors

xiyanghu avatar margoseltzer avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.