
Comments (5)

BrianMiner commented on May 21, 2024

Interaction strength is mentioned in the docs but does not appear to be implemented


Donskov7 commented on May 21, 2024

@msampathkumar, yes, model.feature_importance_ and model.feature_importance are now the same thing. By default, CatBoost does not calculate feature_importance_; if you need it, we provide the additional parameter calc_feature_importance=True.


i3v commented on May 21, 2024

Hi Donskov7,

I might be missing something, but it looks like there's something wrong with the calc_feature_importance=True approach you suggest.

With the current version of catboost (0.1.1.5, installed with pip), the following test

import unittest


class TestCatBoost(unittest.TestCase):

    def test_importances_builtin(self):
        import catboost as cb
        import numpy as np

        config = {
            'iterations': 10,
            'random_seed': 1949,
            'calc_feature_importance': True,
            'verbose': True
        }

        # `random_integers` is deprecated; `randint`'s upper bound is exclusive
        train_x = np.random.randint(0, 256, size=[500, 3])
        train_y = train_x[:, 0] > train_x[:, 1]

        dataset = cb.Pool(train_x, label=train_y)

        booster = cb.CatBoost(params=config)
        booster.fit(dataset)

        # The `feature_importance` attribute is dynamically added if `calc_feature_importance==True`
        # noinspection PyUnresolvedReferences
        feature_importance = booster.feature_importance

        # The last column of `train_x` is not used, so its importance should be low;
        # values in `feature_importance` appear to be percentages
        print(feature_importance)
        self.assertGreater(feature_importance[0], 30)
        self.assertGreater(feature_importance[1], 30)
        self.assertLess(feature_importance[2], 2)


if __name__ == "__main__":
    unittest.main()

fails with:

File "C:\ProgramFiles\Anaconda\envs\matlab_py36\lib\site-packages\catboost\core.py", line 470, in fit
return self._fit(X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, plot)
File "C:\ProgramFiles\Anaconda\envs\matlab_py36\lib\site-packages\catboost\core.py", line 424, in _fit
self._train(X, eval_set, params)
File "_catboost.pyx", line 741, in _catboost._CatBoostBase._train (c:\users\donskov.ya\build\build_root\6830666230336978356274386c383661\catboost\python-package\catboost_catboost.pyx.cpp:13659)
File "_catboost.pyx", line 591, in _catboost._CatBoost._train (c:\users\donskov.ya\build\build_root\6830666230336978356274386c383661\catboost\python-package\catboost_catboost.pyx.cpp:9728)
File "_catboost.pyx", line 609, in _catboost._CatBoost._train (c:\users\donskov.ya\build\build_root\6830666230336978356274386c383661\catboost\python-package\catboost_catboost.pyx.cpp:9522)
_catboost.CatboostError: kov/documents/arcadia/catboost/libs/algo/params.cpp:288: invalid parameter: calc_feature_importance

It looks like a quick-and-dirty fix is to modify _fit() in core.py to drop the calc_feature_importance key before passing params to _train().

        if 'calc_feature_importance' in params:
            del params['calc_feature_importance']
        with log_fixup():
            self._train(X, eval_set, params)
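The patch above deletes the key from the caller's dict in place. A slightly safer variant of the same idea, in plain Python (the key name comes from the error message above; _train() and log_fixup() are CatBoost internals, so this only illustrates the dict handling):

```python
# Strip keys the training backend rejects before forwarding the params,
# without mutating the caller's config dict.
UNSUPPORTED_BY_TRAIN = {'calc_feature_importance'}

def split_params(params):
    """Return (train_params, extras); extras holds keys _train() would reject."""
    train_params = {k: v for k, v in params.items() if k not in UNSUPPORTED_BY_TRAIN}
    extras = {k: params[k] for k in UNSUPPORTED_BY_TRAIN if k in params}
    return train_params, extras

config = {'iterations': 10, 'calc_feature_importance': True, 'verbose': True}
train_params, extras = split_params(config)
print(train_params)  # {'iterations': 10, 'verbose': True}
print(extras)        # {'calc_feature_importance': True}
```

Unlike the in-place del, this leaves the original config intact, so the flag is still available after training.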

Different approach

But, AFAIU, there's no real reason for those "quick-and-dirty modifications" - it looks like a better workaround is to call feature_importances manually:

import unittest


class TestCatBoost(unittest.TestCase):

    def test_importances_standalone(self):
        import catboost as cb
        import numpy as np

        np.random.seed(1)

        config = {
            'iterations': 10,
            'random_seed': 1949,
            'verbose': True
        }

        # `random_integers` is deprecated; `randint`'s upper bound is exclusive
        train_x = np.random.randint(0, 256, size=[500, 3])
        train_y = train_x[:, 0] > train_x[:, 1]

        dataset = cb.Pool(train_x, label=train_y)

        booster = cb.CatBoost(params=config)
        booster.fit(dataset)
        feature_importance = booster.feature_importances(dataset)

        # The last column of `train_x` is not used, so its importance should be low;
        # values in `feature_importance` appear to be percentages
        #
        # Values I get are: [60.205931567769525, 38.85239652630154, 0.9416719059289234]
        print(feature_importance)
        self.assertGreater(feature_importance[0], 30)
        self.assertGreater(feature_importance[1], 30)
        self.assertLess(feature_importance[2], 2)


if __name__ == "__main__":
    unittest.main()
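As a quick sanity check on the "percentages" reading, the three values printed in the test run above sum to 100 up to floating-point noise:

```python
# Importance values reported by the test run above; if they are
# percentages, they should sum to 100.
feature_importance = [60.205931567769525, 38.85239652630154, 0.9416719059289234]

total = sum(feature_importance)
print(total)  # close to 100.0
assert abs(total - 100.0) < 1e-9
```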

Also, the "calc_feature_importance" parameter is currently not mentioned in the list of parameters, while the second approach is documented.

By the way, there's currently no way to get "Feature interaction strength" via the Python interface, right?


Donskov7 commented on May 21, 2024

@i3v, thanks for your report. Unfortunately this is not mentioned in the documentation yet, but here is how it really works:
we allow some kwargs in addition to our train params (additional params, like calc_feature_importance=True). In the CatBoostClassifier constructor you can see **kwargs. It also works for the CatBoost class, so you can put calc_feature_importance=True into the kwargs parameter and it will work fine:
config = { 'iterations': 10, 'random_seed': 1949, 'kwargs': {'calc_feature_importance': True}, 'verbose': True }
Maybe in the future we will change this and make it cleaner.
UPDATE: and of course we will add it to the documentation, thanks again :)
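To make the nesting concrete, here is a minimal sketch (plain Python, no CatBoost required) of how a wrapper could split such a config into train params and extra flags. Only the dict shape comes from the comment above; the unpack() helper and its pop-based logic are assumptions for illustration:

```python
# Config shape from the workaround above: extra flags are nested under 'kwargs'.
config = {
    'iterations': 10,
    'random_seed': 1949,
    'kwargs': {'calc_feature_importance': True},
    'verbose': True,
}

def unpack(params):
    """Hypothetical helper: split a config into (train_params, extra_flags)."""
    params = dict(params)                  # work on a copy, not the caller's dict
    extra_flags = params.pop('kwargs', {}) # extra flags live under the 'kwargs' key
    return params, extra_flags

train_params, extra_flags = unpack(config)
print(train_params)  # {'iterations': 10, 'random_seed': 1949, 'verbose': True}
print(extra_flags)   # {'calc_feature_importance': True}
```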


annaveronika commented on May 21, 2024

Interaction strength was added in the latest version.

