Comments (10)
Could you please share the code that produced the error?
I can see you have 17k+ features. Are they real, or did you one-hot encode some variables?
Alex
from lightautoml.
from lightautoml.automl.presets.tabular_presets import TabularAutoML

automl = TabularAutoML(
    task=task,
    timeout=TIMEOUT,
    # cpu_limit=N_THREADS,
    reader_params={'n_jobs': N_THREADS, 'cv': N_FOLDS, 'random_state': RANDOM_STATE},
    # general_params={'use_algos': [['lgb', 'cb', 'LinearLBFGS', 'linear_l1', 'xgb']]}
)
oof_pred = automl.fit_predict(newTrainDum, roles=roles)
The error is raised by the line just above. The 17k features come from 20 original features; the one-hot-encoded features are: ['id', 'yr_built', 'yr_renovated', 'zipcode']
The dataset is the toy King County housing dataset
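A side note on where a feature count like that can come from: one-hot encoding (without dropping a reference level) creates one column per distinct value, so an identifier-like column such as id contributes roughly one column per row. Below is a minimal sketch with made-up rows; the values and the helper name ohe_column_counts are illustrative assumptions, not taken from the King County data or from any library.

```python
# Toy rows standing in for a housing table; all values are made up.
rows = [
    {"id": 1001, "yr_built": 1955, "yr_renovated": 0, "zipcode": "98178"},
    {"id": 1002, "yr_built": 1951, "yr_renovated": 1991, "zipcode": "98125"},
    {"id": 1003, "yr_built": 1933, "yr_renovated": 0, "zipcode": "98028"},
    {"id": 1004, "yr_built": 1955, "yr_renovated": 0, "zipcode": "98178"},
]


def ohe_column_counts(rows, columns):
    """Number of dummy columns one-hot encoding would create per column."""
    return {col: len({row[col] for row in rows}) for col in columns}


counts = ohe_column_counts(rows, ["id", "yr_built", "yr_renovated", "zipcode"])
total = sum(counts.values())
print(counts, total)
```

Running the same count on the real training frame before encoding should show which columns dominate the total.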
Now everything is clear. Here are a few things I can see from your code:
- If you do not set the cpu_limit param, we use the default, which is 4 vCPU cores (as you can see at the beginning of the log).
- As for use_algos, for now there are only 5 variants: 'linear_l2', 'lgb', 'lgb_tuned', 'cb' and 'cb_tuned', but you can still combine them on different stacking levels if necessary.
- And the main point: to use our TabularAutoML preset you do not need to do any preprocessing. We can work with categorical features in their raw form, with unfilled NaNs in the dataset, etc. You can just train the model on the raw dataset and get the result.
Hope this helps.
Alex
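To make the use_algos point concrete, here is a minimal sketch that checks a nested use_algos list against the five names quoted in the reply above and builds a two-level layout. The helper unknown_algos and the particular two-level split are assumptions for illustration; the names your installed LightAutoML version accepts may differ.

```python
# Algorithm names quoted in the reply above; the set supported by your
# installed LightAutoML version may differ.
KNOWN_ALGOS = ("linear_l2", "lgb", "lgb_tuned", "cb", "cb_tuned")


def unknown_algos(use_algos):
    """Return names in a nested use_algos list that are not in KNOWN_ALGOS."""
    return [name for level in use_algos for name in level if name not in KNOWN_ALGOS]


# The list from the commented-out line earlier in the thread.
original = [["lgb", "cb", "LinearLBFGS", "linear_l1", "xgb"]]
rejected = unknown_algos(original)
print(rejected)

# One possible layout: each inner list is one stacking level.
general_params = {"use_algos": [["linear_l2", "lgb", "cb"], ["lgb_tuned"]]}
assert unknown_algos(general_params["use_algos"]) == []

# Passing it on would then look like this (requires lightautoml; task, TIMEOUT
# and N_THREADS are the poster's own variables):
# automl = TabularAutoML(task=task, timeout=TIMEOUT, cpu_limit=N_THREADS,
#                        general_params=general_params)
```

The TabularAutoML call is left commented out so the check runs without lightautoml installed.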
@alexmryzhkov - I uncommented cpu_limit to properly utilize 12 threads. I backtracked and used the "wholeDf" with no OHE -- but this has resulted in the same issue. I have set
roles = {
    #'drop': 'id2',  # done when I thought it needed a drop column
    #'group': 'breath_id',  # from the kaggle root of formatting
    #'category': autoMLcat,  # just commented out to test, not working either way
    'target': 'logPrice',
}
but I still end up with the same error.
For use_algos, that line is commented out -- I will say that it's not quite clear from the documentation how to implement the various algorithms -- for instance, I used "LinearLBFGS" as a result of the documentation rather than the example on Kaggle.
In terms of processing / category: I fed categorical feats with the no dummy df (wholeDf) and I fed [dummy cats + orig cats] to roles for "newTrainDum", but no matter what I am receiving the same error.
Perhaps I am just giving it too fine a tune for a beginner? Should I just try to run it in a naive style?
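One quick sanity check when a roles dict is being edited back and forth like this is to confirm that every column it names actually exists in the frame. The sketch below is a generic check, not a diagnosis of the error in this thread; the helper missing_role_columns and the column list are hypothetical.

```python
def missing_role_columns(roles, columns):
    """Columns named in a roles dict that are absent from the data."""
    named = []
    for value in roles.values():
        # A role may map to a single column name or to a list of names.
        named.extend([value] if isinstance(value, str) else list(value))
    return [col for col in named if col not in columns]


# Hypothetical header; with pandas this would be list(df.columns).
columns = ["id", "yr_built", "yr_renovated", "zipcode", "logPrice"]
roles = {"target": "logPrice", "drop": ["id2"]}

missing = missing_role_columns(roles, columns)
print(missing)
```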
Please check my notebook on the King County dataset. If it works for you, great. If you have any questions about it, please feel free to ask.
Alex
@alexmryzhkov I rewrote my notebook to better follow the flow of initializing the CV. It worked, which is great! I think the root of the problem may have been trying to set torch.device to 'cuda'?
Either way, thank you for your notebook and confirmation on the dataset!
@AlexanderLavelle if you set torch.device to cuda, do you want to train models on GPU? If so, you do not need to do that: if your environment has a properly installed GPU and torch, LightAutoML will automatically train CatBoost models on GPU (for other models there will be almost no improvement, especially in Kaggle Kernels).
Alex
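To see whether the environment exposes a GPU to torch at all, a small check like the following can help. This is only a sketch: the helper name cuda_available is an assumption, and it says nothing about how LightAutoML itself decides where to train.

```python
def cuda_available():
    """True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        # torch is not installed, so no CUDA device can be reported.
        return False
    return torch.cuda.is_available()


print("CUDA visible to torch:", cuda_available())
```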
@alexmryzhkov I would like to train on GPU, top to bottom. I have sklearn-intelex and GPU versions of lightgbm on my local machine -- so in theory, any dataset within my 4GB nvidia card (planning to upgrade soon), I would like to have the pipeline do every calculation on GPU for speed. As far as gpu/enhanced sklearn (intelex), I have received notices that lightautoml will use the intelex augmented 'auc'.
Currently we do not have a full GPU pipeline, but we are working on it.
For now, the only parts that can run on the GPU are the models.
Alex