Hi @mtl-tony -- Inference speed is a complicated topic, and the real answer is that it depends. Our paper had comparisons on a few datasets: https://arxiv.org/pdf/1909.09223.pdf
If you're looking for ways to improve inference speed, by far the biggest impact you can make is to predict in batches. If you make predictions one sample at a time, the vast majority of the time is spent executing Python code that extracts the raw data from numpy/pandas/scipy, and only a minority is spent on the actual inference. This limitation applies to all the popular tree boosting packages I've looked at, so most well-implemented models tend to perform in the same ballpark when predicting one sample at a time.
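To make the per-call overhead concrete, here is a sketch that times one-at-a-time versus batched prediction. It uses scikit-learn's GradientBoostingClassifier purely as a stand-in (the same effect applies to EBMs and the other tree boosting packages mentioned above):

```python
import time
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = GradientBoostingClassifier(n_estimators=50).fit(X, y)
X_test = rng.normal(size=(500, 10))

# One sample per call: the Python-side data extraction cost is paid 500 times.
start = time.perf_counter()
preds_single = np.concatenate(
    [model.predict(X_test[i : i + 1]) for i in range(len(X_test))]
)
t_single = time.perf_counter() - start

# One batched call: the extraction cost is paid once.
start = time.perf_counter()
preds_batch = model.predict(X_test)
t_batch = time.perf_counter() - start

assert np.array_equal(preds_single, preds_batch)
print(f"single: {t_single:.3f}s  batch: {t_batch:.3f}s")
```

The predictions are identical; only the call pattern changes, and the batched call is typically orders of magnitude faster.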
For batched predictions there can be quite a bit of variability between model types. EBMs will tend to be faster in scenarios with fewer bins, fewer features, and more-important features. If there are a lot of unimportant features, other tree algorithms might ignore most of them and thus pay little inference cost for them. This paper describes a method for pruning features from an existing EBM, with improved inference time as one of its goals: https://arxiv.org/pdf/2311.07452.pdf
EBMs also tend to do comparatively better when given Fortran-ordered numpy data, or data in pandas DataFrames (which are typically Fortran-ordered).
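If your data is a plain numpy array (which defaults to C order), converting it once before prediction is cheap. A minimal sketch of checking and converting the memory layout:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X_c = rng.normal(size=(1000, 5))      # numpy default: C-ordered (row-major)
X_f = np.asfortranarray(X_c)          # Fortran-ordered (column-major) copy

assert X_c.flags["C_CONTIGUOUS"]
assert X_f.flags["F_CONTIGUOUS"]
assert np.array_equal(X_c, X_f)       # same values, different memory layout

# A DataFrame built from a dict of columns stores each column contiguously,
# which is effectively the column-major layout the comment above refers to.
df = pd.DataFrame({f"f{i}": X_c[:, i] for i in range(5)})
```

Column-major layout helps because the model reads one feature's values across many samples at a time, so each feature column is contiguous in memory.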
If you have categorical data and you can use pandas, use a CategoricalDtype; it's around 50 times faster than using object arrays. If you need to stick with numpy, then for the fastest speeds you'd want to convert your categories to floats yourself and pass in a feature type indicating the column is categorical.
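A sketch of both options. The `feature_types` usage in the comment is illustrative of how interpret's EBM constructors let you declare a column as nominal; treat the exact spelling as an assumption to check against the interpret docs for your version:

```python
import numpy as np
import pandas as pd

colors = np.array(["red", "green", "blue"] * 1000, dtype=object)

# Slow path: object-dtype column, every value is a boxed Python string.
df_obj = pd.DataFrame({"color": colors})

# Fast path: categorical column, stored as small integer codes internally.
df_cat = pd.DataFrame({"color": pd.Categorical(colors)})
assert df_cat["color"].dtype == "category"

# Pure-numpy alternative: encode the categories as floats yourself, then tell
# the model the column is nominal, e.g. (assumed API)
# ExplainableBoostingClassifier(feature_types=["nominal"]).
codes = pd.Categorical(colors).codes.astype(np.float64)
```

With the categorical dtype, the expensive string handling happens once at construction instead of on every prediction call.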
Happy to go into more detail if you have a more specific scenario.
from interpret.