Comments (20)
We don't have support for sparse matrices yet. We don't have a concrete plan for adding it, but this is a great feature request, and we will probably add it later.
from catboost.
We are very happy to announce that this is finally implemented in catboost 0.17!
from catboost.
We are at the internal code review stage (working on the code for sparse column scoring), so it's a matter of weeks.
from catboost.
It's planned! Great! 😄
from catboost.
There were two major speedups in the latest releases, and we plan one more for CPU. However, we do not yet support sparse matrices as an input data type. So no, it's not supported yet, but we are working on it.
You can try using catboost for sparse data now: if you have something like one-hot encoding, it should work fast enough. But you need to pass a dense matrix as the input data.
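As a rough illustration of the dense-input workaround described above, here is a minimal sketch. The toy matrix, the labels, and the helper name make_dense_pool are made up for this example; only scipy's toarray() and catboost's Pool(data, label) are real APIs. Note that densifying allocates rows * columns values, so it only works when the dense matrix fits in memory.

```python
import numpy as np
from scipy import sparse

# Toy one-hot-style data (hypothetical): 4 rows, 3 columns, one non-zero per row.
X_sparse = sparse.csr_matrix(
    np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=np.float32)
)
y = np.array([0, 1, 1, 0])

# Convert to a dense numpy array before handing the data to catboost.
X_dense = X_sparse.toarray()


def make_dense_pool(features, labels):
    """Hypothetical helper: build a catboost Pool from a dense array.

    catboost is imported lazily so the data preparation above can be
    checked even where catboost is not installed.
    """
    from catboost import Pool
    return Pool(features, labels)
```

Calling make_dense_pool(X_dense, y) should then behave like any other Pool built from a numpy array.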
from catboost.
Version 0.14 contains this on the changelog:
"Impressive speedups for sparse datasets. Will depend on the dataset, but will be at least 2--3 times for sparse data."
Should we take this to mean that catboost already supports sparse matrices?
from catboost.
When I try to create a cb.Pool object using sparse data, I am getting the error:
trn_pool = cb.Pool(x_trn, y_trn)
# CatboostError: Invalid data type=<class 'scipy.sparse.csr.csr_matrix'>
Does it mean that sparse matrices are not supported yet?
from catboost.
When I try to create a cb.Pool object using sparse data, I am getting the error:
trn_pool = cb.Pool(x_trn, y_trn)
# CatboostError: Invalid data type=<class 'scipy.sparse.csr.csr_matrix'>
Does it mean that sparse matrices are not supported yet?
Yes
from catboost.
Is there an ETA for release of the sparse matrix support?
from catboost.
We are very happy to announce that this is finally implemented in catboost 0.17!
How is this (sparse matrix support) supposed to work? I just tried (catboost==0.17.2) to make a Pool from a csr_matrix and got:
_catboost.CatBoostError: only np.ndarray type is supported for cat_feature_data
from catboost.
We are very happy to announce that this is finally implemented in catboost 0.17!
How is this (sparse matrix support) supposed to work? I just tried (catboost==0.17.2) to make a Pool from a csr_matrix and got:
_catboost.CatBoostError: only np.ndarray type is supported for cat_feature_data
Sparse matrices are currently not supported as arguments to FeaturesData.
The supported way to pass sparse data to the Pool constructor, or to fit/predict and similar functions, is to pass as the X argument either any of the scipy sparse matrix classes (except dia_matrix, which does not make sense as a features matrix) or a pandas.DataFrame with sparse columns (the latter is needed if you have heterogeneous data types for different columns).
We'll update the documentation soon; right now you can look at the examples in the tests.
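For readers who want something to start from, the following is an illustrative sketch and not one of the catboost tests. It assumes catboost 0.17 or later; the random data, the iteration count, and the helper name fit_on_sparse are invented for the example. It passes a scipy csr_matrix directly as the features argument of Pool.

```python
import numpy as np
from scipy import sparse

# Synthetic data (hypothetical): roughly 90% of the entries are zero.
rng = np.random.default_rng(0)
dense = rng.random((100, 20)).astype(np.float32)
dense[dense < 0.9] = 0.0
X = sparse.csr_matrix(dense)          # any scipy sparse class except dia_matrix
y = rng.integers(0, 2, size=100)      # binary labels


def fit_on_sparse(features, labels):
    """Hypothetical helper: train a small classifier on sparse features.

    Assumes catboost >= 0.17, where scipy sparse matrices are accepted
    by Pool. catboost is imported lazily inside the function.
    """
    from catboost import CatBoostClassifier, Pool
    pool = Pool(features, labels)
    model = CatBoostClassifier(iterations=10, verbose=False)
    model.fit(pool)
    return model
```

fit_on_sparse(X, y) returns the fitted model; the same sparse matrix can also be passed straight to fit or predict as X, as described above.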
from catboost.
So right now, categorical and numerical features should be in the same sparse matrix of type 'f4'?
(Yes, I tried to create a Pool with dense and sparse matrices for numerical and categorical features...)
from catboost.
So right now, categorical and numerical features should be in the same sparse matrix of type 'f4'?
Not necessarily 'f4': any type supported by scipy sparse matrices will work, but 'f4' will be optimal performance-wise. For categorical features, the floating point values must actually represent integers (i.e. have a fractional part equal to 0); this is checked.
Alternatively, you can use pandas.DataFrame with sparse columns.
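The integer-value requirement for categorical columns can be checked before building the Pool. The sketch below is only an illustration under assumptions: the toy matrix, the column layout, and the helper names cat_columns_are_integral and make_pool are hypothetical, and the check mirrors the rule stated above rather than catboost's internal implementation.

```python
import numpy as np
from scipy import sparse

# Toy data (hypothetical): columns 0-1 are numerical, column 2 holds
# categorical values encoded as integer codes stored in a float32 matrix.
data = np.array([[0.5, 0.0,  3.0],
                 [0.0, 1.25, 0.0],
                 [0.0, 0.0,  7.0]], dtype=np.float32)
X = sparse.csr_matrix(data)
y = np.array([0, 1, 0])
cat_features = [2]


def cat_columns_are_integral(matrix, cat_idx):
    """Return True if every stored value in the given columns has a zero fractional part."""
    cols = matrix.tocsc()[:, cat_idx]
    return bool(np.all(np.mod(cols.data, 1) == 0))


def make_pool(matrix, labels, cat_idx):
    """Hypothetical helper: build a Pool with categorical column indices.

    Assumes catboost >= 0.17; catboost is imported lazily.
    """
    from catboost import Pool
    if not cat_columns_are_integral(matrix, cat_idx):
        raise ValueError("categorical columns must contain integer-valued floats")
    return Pool(matrix, labels, cat_features=cat_idx)
```

Here cat_columns_are_integral(X, cat_features) is true, while the same check on column 0 fails because 0.5 has a non-zero fractional part.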
from catboost.
Thanks for the sparse matrix support!
Could you please update the documentation? Sparse matrix support is not mentioned there (https://catboost.ai/docs/concepts/python-reference_catboostclassifier_fit.html). It would also be wonderful to have instructions on how to use categorical features with sparse types.
Thank you!
from catboost.
Sure, the documentation update will be out this week.
from catboost.
Documentation has been updated with information about sparse data support.
from catboost.
Hello!
I am trying to fit a catboost classifier with sparse matrices and the GPU task type,
and I get this error:
CatBoostError: util/generic/maybe.h:15: TMaybe is empty
Code:
model.fit(train_bootstraped, predict_bootsraped, self.cat_features,
          eval_set=(self.X_validation, self.y_validation))
Shapes and sums:
print(train_bootstraped.shape, predict_bootsraped.shape, predict_bootsraped.sum(),
      self.X_validation.shape, self.y_validation.shape)
(7467, 214) (7467,) 2489 (25143, 214) (25143,)
I couldn't find this error on Stack Overflow :(
Could you please advise how to avoid this error?
Model params:
{'iterations': 1500,
'learning_rate': 0.01,
'depth': 6,
'custom_loss':'AUC',
'early_stopping_rounds': 100,
'use_best_model': True,
'task_type': 'GPU'}
from catboost.
Sparse data is only supported for CPU right now. Sorry that the error is not clear, we'll fix it.
from catboost.
Sorry, the previous comment is not fully correct. We do support sparse data types on GPU (sparse features on GPU are treated like regular features; we'll add full sparse data support for GPU at some point later). I'm not sure why you get this exception; we'll look into it.
from catboost.
Here's an issue for that: #1175
from catboost.