Comments (5)
Interaction strength is mentioned in the docs but does not appear to be implemented
from catboost.
@msampathkumar, yes, model.feature_importance_ and model.feature_importance are now the same thing. By default, CatBoost does not calculate feature_importance_; if you need it, we provide the additional parameter calc_feature_importance=True.
Hi Donskov7,
I might be missing something, but it looks like there's something wrong with the calc_feature_importance=True approach you suggest.
With the current version of catboost (0.1.1.5, installed with pip), the following test
import unittest


class TestCatBoost(unittest.TestCase):
    def test_importances_builtin(self):
        import catboost as cb
        import numpy as np
        config = {
            'iterations': 10,
            'random_seed': 1949,
            'calc_feature_importance': True,
            'verbose': True
        }
        train_x = np.random.random_integers(0, 255, size=[500, 3])
        train_y = train_x[:, 0] > train_x[:, 1]
        dataset = cb.Pool(train_x, label=train_y)
        booster = cb.CatBoost(params=config)
        booster.fit(dataset)
        # The `feature_importance` attribute is dynamically added if `calc_feature_importance==True`
        # noinspection PyUnresolvedReferences
        feature_importance = booster.feature_importance
        # The last column of `train_x` is not used, so its importance should be low.
        # Values in `feature_importance` appear to be percentages.
        print(feature_importance)
        self.assertGreater(feature_importance[0], 30)
        self.assertGreater(feature_importance[1], 30)
        self.assertLess(feature_importance[2], 2)


if __name__ == "__main__":
    unittest.main()
fails with:
File "C:\ProgramFiles\Anaconda\envs\matlab_py36\lib\site-packages\catboost\core.py", line 470, in fit
return self._fit(X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, plot)
File "C:\ProgramFiles\Anaconda\envs\matlab_py36\lib\site-packages\catboost\core.py", line 424, in _fit
self._train(X, eval_set, params)
File "_catboost.pyx", line 741, in _catboost._CatBoostBase._train (c:\users\donskov.ya\build\build_root\6830666230336978356274386c383661\catboost\python-package\catboost_catboost.pyx.cpp:13659)
File "_catboost.pyx", line 591, in _catboost._CatBoost._train (c:\users\donskov.ya\build\build_root\6830666230336978356274386c383661\catboost\python-package\catboost_catboost.pyx.cpp:9728)
File "_catboost.pyx", line 609, in _catboost._CatBoost._train (c:\users\donskov.ya\build\build_root\6830666230336978356274386c383661\catboost\python-package\catboost_catboost.pyx.cpp:9522)
_catboost.CatboostError: kov/documents/arcadia/catboost/libs/algo/params.cpp:288: invalid parameter: calc_feature_importance
It looks like a quick-and-dirty fix is to modify this to drop the calc_feature_importance key from params before passing it to _train():
if 'calc_feature_importance' in params:
    del params['calc_feature_importance']
with log_fixup():
    self._train(X, eval_set, params)
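The same idea can be sketched outside catboost's internals: filter out keys the training backend does not recognize before forwarding the parameter dict. Note that KNOWN_PARAMS, train_backend, and safe_train below are hypothetical names for illustration only, not part of catboost:

```python
# Sketch: strip wrapper-only parameters before forwarding to a strict backend.
# KNOWN_PARAMS, train_backend, and safe_train are hypothetical, not catboost API.
KNOWN_PARAMS = {'iterations', 'random_seed', 'verbose'}


def train_backend(params):
    # Stand-in for the real training call; it rejects unknown keys,
    # just like _catboost._CatBoost._train does in the traceback above.
    unknown = set(params) - KNOWN_PARAMS
    if unknown:
        raise ValueError('invalid parameter: %s' % ', '.join(sorted(unknown)))
    return dict(params)


def safe_train(params):
    # Drop wrapper-only keys (e.g. calc_feature_importance) before delegating.
    filtered = {k: v for k, v in params.items() if k in KNOWN_PARAMS}
    return train_backend(filtered)


config = {'iterations': 10, 'random_seed': 1949,
          'calc_feature_importance': True, 'verbose': True}
print(safe_train(config))  # the wrapper-only key is gone
```

Passing config straight to train_backend would raise the same kind of "invalid parameter" error as in the traceback; safe_train succeeds because the unknown key is filtered first.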
Different approach
But, AFAIU, there's no real reason for those "quick-and-dirty modifications" - it looks like a better workaround is to call feature_importances manually:
import unittest


class TestCatBoost(unittest.TestCase):
    def test_importances_standalone(self):
        import catboost as cb
        import numpy as np
        np.random.seed(1)
        config = {
            'iterations': 10,
            'random_seed': 1949,
            'verbose': True
        }
        train_x = np.random.random_integers(0, 255, size=[500, 3])
        train_y = train_x[:, 0] > train_x[:, 1]
        dataset = cb.Pool(train_x, label=train_y)
        booster = cb.CatBoost(params=config)
        booster.fit(dataset)
        feature_importance = booster.feature_importances(dataset)
        # The last column of `train_x` is not used, so its importance should be low.
        # Values in `feature_importance` appear to be percentages.
        #
        # Values I get are: [60.205931567769525, 38.85239652630154, 0.9416719059289234]
        print(feature_importance)
        self.assertGreater(feature_importance[0], 30)
        self.assertGreater(feature_importance[1], 30)
        self.assertLess(feature_importance[2], 2)


if __name__ == "__main__":
    unittest.main()
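As a quick sanity check on the "percentages" reading: the three importance values quoted in the test comment above sum to almost exactly 100, which is consistent with the importances being normalized to a percentage scale. A minimal standalone check:

```python
# Importance values quoted in the test comment above.
importances = [60.205931567769525, 38.85239652630154, 0.9416719059289234]

# If these are percentages, they should sum to ~100 (up to float rounding).
total = sum(importances)
print(total)
assert abs(total - 100.0) < 1e-9
```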
Also, the calc_feature_importance parameter is currently not mentioned in the list of parameters, while the second approach is documented.
By the way, there's currently no way to get "feature interaction strength" via the Python interface, right?
@i3v, thanks for your report. Unfortunately this is not mentioned in the documentation yet, but here is how it actually works: in addition to our train params we allow some extra kwargs (additional params, like calc_feature_importance=True). In the CatBoostClassifier constructor you can see **kwargs. This also works for the CatBoost class, so you can put calc_feature_importance=True into the kwargs parameter and it will work fine:
config = {'iterations': 10, 'random_seed': 1949, 'kwargs': {'calc_feature_importance': True}, 'verbose': True}
Maybe in the future we will change this and make it clearer.
UPDATE: And of course we will add it to the documentation, thanks again :)
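The mechanism described above can be sketched in plain Python: a constructor that accepts a 'kwargs' entry inside its params dict and separates it from the parameters the training backend sees. The Booster class below is a hypothetical stand-in for illustration, not catboost's real implementation:

```python
# Sketch of the described mechanism. Booster is a hypothetical stand-in,
# not catboost's real implementation.
class Booster:
    def __init__(self, params=None, **kwargs):
        params = dict(params or {})
        # Extra options may arrive either as keyword arguments ...
        extra = dict(kwargs)
        # ... or bundled under a 'kwargs' key inside params.
        extra.update(params.pop('kwargs', {}))
        self.params = params  # what the training backend would see
        self.extra = extra    # wrapper-level options


config = {'iterations': 10, 'random_seed': 1949,
          'kwargs': {'calc_feature_importance': True}, 'verbose': True}
b = Booster(params=config)
print(b.params)  # no 'kwargs' key left
print(b.extra)   # {'calc_feature_importance': True}
```

The key point is that the 'kwargs' entry never reaches the backend's parameter validation, which is why the "invalid parameter" error from the earlier test does not occur.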
Interaction strength was added in the latest version.