
Comments (20)

annaveronika avatar annaveronika commented on May 21, 2024 26

We don't support sparse matrices yet. We don't have a concrete plan for adding them, but this is a great feature request, and we will probably add it later.

from catboost.

annaveronika avatar annaveronika commented on May 21, 2024 14

We are very happy to announce that this is finally implemented in catboost 0.17!


kizill avatar kizill commented on May 21, 2024 4

We are at the internal code review stage (working on code for sparse column scoring), so it's a matter of weeks.


AndreyGurevich avatar AndreyGurevich commented on May 21, 2024 3

It's planned! Great! 😄


annaveronika avatar annaveronika commented on May 21, 2024 2

There were two major speedups in the latest releases, and we plan one more for CPU. However, we do not yet support sparse matrices as an input data type. So no, it's not supported yet, but we are working on it.
You can try using catboost for sparse data now: if you have something like one-hot encoding, it should work fast enough. But you need to use a dense matrix as input data.
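
A minimal sketch of that dense-input workaround, assuming the data fits in memory once densified. The helper names `to_dense_features` and `make_pool` are hypothetical, and the choice of float32 is an assumption made only to limit memory use:

```python
import numpy as np
from scipy import sparse


def to_dense_features(x_sparse, dtype=np.float32):
    """Convert a scipy sparse matrix to a dense 2-D numpy array.

    Hypothetical helper; float32 is an assumption chosen to limit memory use.
    """
    return np.asarray(x_sparse.toarray(), dtype=dtype)


def make_pool(x_dense, y):
    """Build a catboost Pool from dense features.

    catboost is imported lazily so the conversion above works without it.
    """
    import catboost as cb

    return cb.Pool(x_dense, y)


# Toy one-hot-like data: 4 rows, 3 columns, mostly zeros.
x_sparse = sparse.csr_matrix(
    np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=np.float32)
)
y = np.array([0, 1, 1, 0])

x_dense = to_dense_features(x_sparse)
```

Calling `make_pool(x_dense, y)` then gives a Pool built from the dense copy. Note that densifying a very wide matrix can exhaust memory, which is the limitation this thread is about.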


bratao avatar bratao commented on May 21, 2024

Version 0.14 contains this in the changelog:
"Impressive speedups for sparse datasets. Will depend on the dataset, but will be at least 2--3 times for sparse data."

Should we consider that catboost already supports sparse matrices?


devforfu avatar devforfu commented on May 21, 2024

When I try to create a cb.Pool object using sparse data, I am getting the error:

trn_pool = cb.Pool(x_trn, y_trn)
# CatboostError: Invalid data type=<class 'scipy.sparse.csr.csr_matrix'>

Does it mean that sparse matrices are not supported yet?


annaveronika avatar annaveronika commented on May 21, 2024

> When I try to create a cb.Pool object using sparse data, I am getting the error:
>
> trn_pool = cb.Pool(x_trn, y_trn)
> # CatboostError: Invalid data type=<class 'scipy.sparse.csr.csr_matrix'>
>
> Does it mean that sparse matrices are not supported yet?

Yes


PMeiyappan avatar PMeiyappan commented on May 21, 2024

Is there an ETA for release of the sparse matrix support?


belonesox avatar belonesox commented on May 21, 2024

> We are very happy to announce that this is finally implemented in catboost 0.17!

How should this (sparse matrix support) work? I just tried (with «catboost==0.17.2») to make a Pool from a csr_matrix and got
«_catboost.CatBoostError: only np.ndarray type is supported for cat_feature_data»


andrey-khropov avatar andrey-khropov commented on May 21, 2024

> We are very happy to announce that this is finally implemented in catboost 0.17!
>
> How should this (sparse matrix support) work? I just tried (with «catboost==0.17.2») to make a Pool from a csr_matrix and got
> «_catboost.CatBoostError: only np.ndarray type is supported for cat_feature_data»

Sparse matrices are currently not supported as arguments to FeaturesData.
The supported way to pass sparse data to the Pool constructor or to fit/predict etc. functions is to pass, as the X argument, any of the scipy sparse matrix classes (except for dia_matrix, which does not make sense as a features matrix) or a pandas.DataFrame with sparse columns (the latter is needed if you have heterogeneous data types for different columns).

We'll update the documentation soon; right now you can look at examples in the tests:

def test_pools_equal_on_dense_and_scipy_sparse_input(dataset):

def test_training_and_prediction_equal_on_pandas_dense_and_sparse_input(task_type, dataset, indexing_kind, boosting_type):
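
As an illustration of the scipy route described above, here is a small sketch, assuming catboost 0.17 or later is installed. The function names `build_sparse_features` and `fit_on_sparse`, the toy data, and the `iterations=10` setting are all assumptions made for the example, not taken from the catboost tests:

```python
import numpy as np
from scipy import sparse


def build_sparse_features():
    """Return a small CSR feature matrix and labels (toy data)."""
    dense = np.array(
        [[0.0, 2.5, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 3.0], [0.0, 1.5, 0.0]],
        dtype=np.float32,
    )
    labels = np.array([0, 1, 1, 0])
    return sparse.csr_matrix(dense), labels


def fit_on_sparse(x_sparse, y):
    """Train a tiny model directly on a scipy sparse matrix.

    Assumes catboost >= 0.17; imported lazily so the data-building
    part of this sketch can run without catboost installed.
    """
    from catboost import CatBoostClassifier, Pool

    pool = Pool(x_sparse, y)  # sparse matrix passed as the features argument
    model = CatBoostClassifier(iterations=10, verbose=False)
    model.fit(pool)
    return model.predict(x_sparse)
```

With catboost available, `fit_on_sparse(*build_sparse_features())` should train and predict without converting the matrix to a dense array first.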


belonesox avatar belonesox commented on May 21, 2024

So right now, categorical and numerical features should be in the same sparse matrix of type 'f4'?
(Yes, I tried to create a Pool with dense and sparse matrices for numerical and categorical features...)


andrey-khropov avatar andrey-khropov commented on May 21, 2024

> So right now, categorical and numerical features should be in the same sparse matrix of type 'f4'?

Not necessarily 'f4': any type supported by scipy sparse matrices will work, but 'f4' will be optimal performance-wise. For categorical features, these floating point values must actually represent integers (i.e. have a fractional part equal to 0); this is checked.

Alternatively, you can use pandas.DataFrame with sparse columns.
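
To see in advance whether categorical columns in a sparse matrix meet the integer-value requirement, a pre-check along these lines may help. This is only a sketch: `cat_columns_are_integral` is a hypothetical helper, not part of catboost, and it does not reproduce catboost's own validation:

```python
import numpy as np
from scipy import sparse


def cat_columns_are_integral(x_sparse, cat_features):
    """Return True if every stored value in the given columns is a whole number.

    Implicit zeros are whole numbers already, so only stored entries are checked.
    """
    csc = sparse.csc_matrix(x_sparse)  # column slicing is cheap in CSC format
    for col in cat_features:
        values = csc.data[csc.indptr[col]:csc.indptr[col + 1]]
        if not np.all(values == np.floor(values)):
            return False
    return True


# Column 0 holds category codes; column 1 holds a genuine float feature.
x = sparse.csr_matrix(
    np.array([[1.0, 0.5], [0.0, 2.0], [3.0, 0.0]], dtype=np.float32)
)
```

Here `cat_columns_are_integral(x, [0])` passes while column 1 would fail, so only index 0 would be a valid candidate to pass as `cat_features`.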


temakahap avatar temakahap commented on May 21, 2024

Thanks for the sparse matrix support!

Could you please update the documentation? Sparse matrix support is not mentioned there (https://catboost.ai/docs/concepts/python-reference_catboostclassifier_fit.html). It would also be wonderful to have instructions on how to use categorical features with sparse types.

Thank you!


annaveronika avatar annaveronika commented on May 21, 2024

Sure, the documentation update will be out this week.


andrey-khropov avatar andrey-khropov commented on May 21, 2024

Documentation has been updated with information about sparse data support.


temakahap avatar temakahap commented on May 21, 2024

Hello!

I am trying to fit a catboost classifier with sparse matrices on GPU,
and I get this error:
CatBoostError: util/generic/maybe.h:15: TMaybe is empty

Code:
model.fit(train_bootstraped, predict_bootsraped, self.cat_features,
          eval_set=(self.X_validation, self.y_validation))

Shapes and sums:
print(train_bootstraped.shape, predict_bootsraped.shape, predict_bootsraped.sum(),
      self.X_validation.shape, self.y_validation.shape)
(7467, 214) (7467,) 2489 (25143, 214) (25143,)

Stack Overflow doesn't have anything on this error :(
Can you please advise how to avoid this error?

Model params:
{'iterations': 1500,
'learning_rate': 0.01,
'depth': 6,
'custom_loss':'AUC',
'early_stopping_rounds': 100,
'use_best_model': True,
'task_type': 'GPU'}


annaveronika avatar annaveronika commented on May 21, 2024

Sparse data is only supported for CPU right now. Sorry that the error is not clear, we'll fix it.


annaveronika avatar annaveronika commented on May 21, 2024

Sorry, the previous comment is not fully correct. We support sparse data types on GPU (sparse features on GPU are treated like regular features; we'll add full sparse data support for GPU at some point later). I'm not sure why you get this exception; we'll look into it.


annaveronika avatar annaveronika commented on May 21, 2024

Here's an issue for that: #1175

