Giter Club home page Giter Club logo

xgbfir's Introduction

XGBFIR Build Status PyPI version

XGBoost Feature Interactions Reshaped

What is Xgbfir?

Xgbfir is a XGBoost model dump parser, which ranks features as well as feature interactions by different metrics.

This project started as a python port of Xgbfi - XGBoost Feature Interactions & Importance project. Thanks Far0n for great tool and idea!

Some basic description from Xgbfi project page is presented here.

The Metrics

  • Gain: Total gain of each feature or feature interaction
  • FScore: Amount of possible splits taken on a feature or feature interaction
  • wFScore: Amount of possible splits taken on a feature or feature interaction weighted by the probability of the splits to take place
  • Average wFScore: wFScore divided by FScore
  • Average Gain: Gain divided by FScore
  • Expected Gain: Total gain of each feature or feature interaction weighted by the probability to gather the gain
  • Average Tree Index
  • Average Tree Depth

Additional Features

  • Leaf Statistics
  • Split Value Histograms

Installation

You have several options to install Xgbfir.

Using pip

You can install using the pip package manager by running

pip install xgbfir

From source

Clone the repo and install:

git clone https://github.com/limexp/xgbfir.git
cd xgbfir
sudo python setup.py install

Or download the source code by pressing 'Download ZIP' on this page. Install by navigating to the proper directory and running

sudo python setup.py install

Usage

You can use Xgbfir as a python function or as a CLI (Command Line Interface) tool.

Python function

You can produce feature interactions file without saving any model dump file beforehand:

import xgbfir

xgbfir.saveXgbFI(booster) # booster is a XGBoost booster

List of saveXgbFI function parameters:

  • booster - XGBoost booster or XGBClassifier
  • feature_names (default = None) - feature names that will be set in booster
  • OutputXlsxFile (default = 'XgbFeatureInteractions.xlsx') - output file name
  • MaxTrees (default = 100) - Upper bound for trees to be parsed
  • MaxInteractionDepth (default = 2) - Upper bound for extracted feature interactions depth
  • MaxDeepening (default = -1) - Upper bound for interaction start deepening (zero deepening => interactions starting at root only)
  • TopK (default = 100) - Upper bound for exported feature interactions per depth level
  • MaxHistograms (default = 10) - Maximum number of histograms
  • SortBy (default = 'Gain') - Score metric to sort by (Gain, FScore, wFScore, AvgwFScore, AvgGain, ExpGain)

Python example

Take a look at this example of usage (available in examples):

from sklearn.datasets import load_iris, load_boston
import xgboost as xgb
import xgbfir

# loading database
boston = load_boston()

# doing all the XGBoost magic
xgb_rmodel = xgb.XGBRegressor().fit(boston['data'], boston['target'])

# saving to file with proper feature names
xgbfir.saveXgbFI(xgb_rmodel, feature_names=boston.feature_names, OutputXlsxFile='bostonFI.xlsx')


# loading database
iris = load_iris()

# doing all the XGBoost magic
xgb_cmodel = xgb.XGBClassifier().fit(iris['data'], iris['target'])

# saving to file with proper feature names
xgbfir.saveXgbFI(xgb_cmodel, feature_names=iris.feature_names, OutputXlsxFile='irisFI.xlsx')

CLI

Xgbfir can be run as a console tool with the following command:

xgbfir [options]

Use the following command for help:

xgbfir --help

XGBoost model dump must be created before running xgbfir.

To dump a model with proper feature names use the following code:

booster.feature_names = list(feature_names) # set names for XGBoost booster
booster.dump_model('xgb.dump', with_stats=True)

Dependencies

  • python (2.7+ or 3.5+)
  • xlsxwriter (>=0.9.3)

Contributing

Please see CONTRIBUTING.md.

Feel free to open issues or pull requests.

xgbfir's People

Contributors

limexp avatar marigold avatar

Watchers

Bai Feng avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.