
Comments (5)

BrianMiner commented on May 21, 2024

Interaction strength is mentioned in the docs but does not appear to be implemented


Donskov7 commented on May 21, 2024

@msampathkumar, yes, model.feature_importance_ and model.feature_importance are now the same thing. By default, CatBoost does not calculate feature_importance_; if you need it, we provide the additional parameter calc_feature_importance=True.


i3v commented on May 21, 2024

Hi Donskov7,

I might be missing something, but it looks like there's something wrong with the calc_feature_importance=True approach you suggest.

With the current version of catboost (0.1.1.5, installed with pip), the following test

import unittest


class TestCatBoost(unittest.TestCase):

    def test_importances_builtin(self):
        import catboost as cb
        import numpy as np

        config = {
            'iterations': 10,
            'random_seed': 1949,
            'calc_feature_importance': True,
            'verbose': True
        }

        # `random_integers` is deprecated; `randint`'s upper bound is exclusive
        train_x = np.random.randint(0, 256, size=[500, 3])
        train_y = train_x[:, 0] > train_x[:, 1]

        dataset = cb.Pool(train_x, label=train_y)

        booster = cb.CatBoost(params=config)
        booster.fit(dataset)

        # The `feature_importance` attribute is dynamically added if `calc_feature_importance==True`
        # noinspection PyUnresolvedReferences
        feature_importance = booster.feature_importance

        # The last column of `train_x` is not used, so its importance should be low;
        # values in `feature_importance` appear to be percentages
        print(feature_importance)
        self.assertGreater(feature_importance[0], 30)
        self.assertGreater(feature_importance[1], 30)
        self.assertLess(feature_importance[2], 2)


if __name__ == "__main__":
    unittest.main()

fails with:

File "C:\ProgramFiles\Anaconda\envs\matlab_py36\lib\site-packages\catboost\core.py", line 470, in fit
return self._fit(X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, plot)
File "C:\ProgramFiles\Anaconda\envs\matlab_py36\lib\site-packages\catboost\core.py", line 424, in _fit
self._train(X, eval_set, params)
File "_catboost.pyx", line 741, in _catboost._CatBoostBase._train (c:\users\donskov.ya\build\build_root\6830666230336978356274386c383661\catboost\python-package\catboost_catboost.pyx.cpp:13659)
File "_catboost.pyx", line 591, in _catboost._CatBoost._train (c:\users\donskov.ya\build\build_root\6830666230336978356274386c383661\catboost\python-package\catboost_catboost.pyx.cpp:9728)
File "_catboost.pyx", line 609, in _catboost._CatBoost._train (c:\users\donskov.ya\build\build_root\6830666230336978356274386c383661\catboost\python-package\catboost_catboost.pyx.cpp:9522)
_catboost.CatboostError: kov/documents/arcadia/catboost/libs/algo/params.cpp:288: invalid parameter: calc_feature_importance

It looks like a quick-and-dirty fix is to modify _fit() in core.py to drop the calc_feature_importance key before passing params to _train().

        if 'calc_feature_importance' in params:
            del params['calc_feature_importance']
        with log_fixup():
            self._train(X, eval_set, params)
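The patch above deletes the key from the caller's dict in place. A slightly safer variant of the same idea, in plain Python (the key name comes from the error message above; _train() and log_fixup() are CatBoost internals, so this only illustrates the dict handling):

```python
# Strip keys the training backend rejects before forwarding the params,
# without mutating the caller's config dict.
UNSUPPORTED_BY_TRAIN = {'calc_feature_importance'}

def split_params(params):
    """Return (train_params, extras); extras holds keys _train() would reject."""
    train_params = {k: v for k, v in params.items() if k not in UNSUPPORTED_BY_TRAIN}
    extras = {k: params[k] for k in UNSUPPORTED_BY_TRAIN if k in params}
    return train_params, extras

config = {'iterations': 10, 'calc_feature_importance': True, 'verbose': True}
train_params, extras = split_params(config)
print(train_params)  # {'iterations': 10, 'verbose': True}
print(extras)        # {'calc_feature_importance': True}
```

Unlike the in-place del, this leaves the original config intact, so the flag is still available after training.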

Different approach

But, AFAIU, there's no real reason for those "quick-and-dirty modifications" - it looks like a better workaround is to call feature_importances manually:

import unittest


class TestCatBoost(unittest.TestCase):

    def test_importances_standalone(self):
        import catboost as cb
        import numpy as np

        np.random.seed(1)

        config = {
            'iterations': 10,
            'random_seed': 1949,
            'verbose': True
        }

        # `random_integers` is deprecated; `randint`'s upper bound is exclusive
        train_x = np.random.randint(0, 256, size=[500, 3])
        train_y = train_x[:, 0] > train_x[:, 1]

        dataset = cb.Pool(train_x, label=train_y)

        booster = cb.CatBoost(params=config)
        booster.fit(dataset)
        feature_importance = booster.feature_importances(dataset)

        # The last column of `train_x` is not used, so its importance should be low;
        # values in `feature_importance` appear to be percentages
        #
        # Values I get are: [60.205931567769525, 38.85239652630154, 0.9416719059289234]
        print(feature_importance)
        self.assertGreater(feature_importance[0], 30)
        self.assertGreater(feature_importance[1], 30)
        self.assertLess(feature_importance[2], 2)


if __name__ == "__main__":
    unittest.main()
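As a quick sanity check on the "percentages" reading, the three values printed in the test run above sum to 100 up to floating-point noise:

```python
# Importance values reported by the test run above; if they are
# percentages, they should sum to 100.
feature_importance = [60.205931567769525, 38.85239652630154, 0.9416719059289234]

total = sum(feature_importance)
print(total)  # close to 100.0
assert abs(total - 100.0) < 1e-9
```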

Also, the "calc_feature_importance" parameter is currently not mentioned in the list of parameters, while the second approach is documented.

By the way, there's currently no way to get "Feature interaction strength" via the Python interface, right?


Donskov7 commented on May 21, 2024

@i3v, thanks for your report. Unfortunately this is not mentioned in the documentation yet, but here is how it really works:
we allow some kwargs in addition to our train params (additional params, like calc_feature_importance=True). In the CatBoostClassifier constructor you can see **kwargs. It also works for the CatBoost class, so you can put calc_feature_importance=True into the kwargs parameter and it will work fine:
config = { 'iterations': 10, 'random_seed': 1949, 'kwargs': {'calc_feature_importance': True}, 'verbose': True }
Maybe in the future we will change this and make it cleaner.
UPDATE: and of course we will add it to the documentation, thanks again :)
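To make the nesting concrete, here is a minimal sketch (plain Python, no CatBoost required) of how a wrapper could split such a config into train params and extra flags. Only the dict shape comes from the comment above; the unpack() helper and its pop-based logic are assumptions for illustration:

```python
# Config shape from the workaround above: extra flags are nested under 'kwargs'.
config = {
    'iterations': 10,
    'random_seed': 1949,
    'kwargs': {'calc_feature_importance': True},
    'verbose': True,
}

def unpack(params):
    """Hypothetical helper: split a config into (train_params, extra_flags)."""
    params = dict(params)                  # work on a copy, not the caller's dict
    extra_flags = params.pop('kwargs', {}) # extra flags live under the 'kwargs' key
    return params, extra_flags

train_params, extra_flags = unpack(config)
print(train_params)  # {'iterations': 10, 'random_seed': 1949, 'verbose': True}
print(extra_flags)   # {'calc_feature_importance': True}
```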


annaveronika commented on May 21, 2024

Interaction strength was added in the latest version.

