Comments (3)
That's fair. I will work on fit() and fit_transform() with the cross validation. In terms of API, I'd like to follow the scikit-learn style API and take the optional cv object as an input as follows:
cv = KFold(N_FOLD, shuffle=True, random_state=RANDOM_SEED)
te = TargetEncoder(cv)
Thanks for the suggestion!
BTW, if you want, please feel free to work on it and submit a PR. :)
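For illustration, the cv-based API proposed above could behave roughly like the sketch below. This is not kaggler's implementation: CVTargetEncoder is a hypothetical class that uses plain mean encoding with no smoothing, and it assumes X is a pandas DataFrame of categorical columns and y is a pandas Series aligned with it. When a splitter is passed, fit_transform() encodes each row using statistics from the other folds only, while the statistics fitted on the full data are kept for transforming test data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


class CVTargetEncoder:
    """Hypothetical mean target encoder with optional out-of-fold fitting."""

    def __init__(self, cv=None):
        self.cv = cv  # any scikit-learn splitter, or None to disable CV

    def fit(self, X, y):
        self.prior_ = y.mean()
        # One {category: mean target} Series per column
        self.means_ = {col: y.groupby(X[col]).mean() for col in X.columns}
        return self

    def transform(self, X):
        # Categories unseen during fit fall back to the global target mean
        return pd.DataFrame(
            {col: X[col].map(self.means_[col]).fillna(self.prior_) for col in X.columns},
            index=X.index,
        )

    def fit_transform(self, X, y):
        self.fit(X, y)  # full-data statistics, kept for transforming test data
        if self.cv is None:
            return self.transform(X)
        out = pd.DataFrame(np.nan, index=X.index, columns=X.columns, dtype=float)
        for i_trn, i_val in self.cv.split(X):
            # Fit on the training fold only, then encode the held-out fold
            fold = CVTargetEncoder().fit(X.iloc[i_trn], y.iloc[i_trn])
            out.iloc[i_val] = fold.transform(X.iloc[i_val]).to_numpy()
        return out


# Toy data, made up for this example
X = pd.DataFrame({"city": ["a", "a", "b", "b", "a", "b"]})
y = pd.Series([1, 0, 1, 1, 0, 0])
cv = KFold(3, shuffle=True, random_state=42)
te = CVTargetEncoder(cv)
X_oof = te.fit_transform(X, y)
```

Because each row's encoding never sees that row's own target, the training features do not leak the label, which is the concern raised in this thread.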
from kaggler.
I agree that there can be leakage and it should be used with caution. In practice, I'm using it with cross validation as you suggested as follows:
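import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
from kaggler.preprocessing import TargetEncoder
...
cat_cols = [col for col in X.columns if X[col].dtype == 'object']
# Float buffer for the out-of-fold encodings (assumes X has a default RangeIndex)
X_cat = pd.DataFrame(np.zeros((X.shape[0], len(cat_cols))), columns=cat_cols)
cv = KFold(N_FOLD)
for i, (i_trn, i_val) in enumerate(cv.split(X), 1):
    te = TargetEncoder()
    # Fit on the training fold only; y is the target, assumed to be a pandas Series aligned with X
    te.fit(X.loc[i_trn, cat_cols], y.loc[i_trn])
    # Each row is in exactly one validation fold, so assign instead of averaging over folds
    X_cat.loc[i_val] = te.transform(X.loc[i_val, cat_cols])
X.loc[:, cat_cols] = X_cat.values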
@takashioya Do you think it will be helpful to have the cross validation routine inside the class?
Yes, it would be helpful, because some people may not notice the leakage problem and make a mistake.
My favorite API is like this:
te = TargetEncoder(folds, nfold, stratified, shuffle)
For details on these four arguments, please see the lightgbm.cv function: https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.cv.html
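As a rough sketch of how those four arguments could be resolved into a single splitter inside the encoder: make_splitter below is a hypothetical helper, not part of kaggler or LightGBM. It assumes folds, when given, is already a scikit-learn splitter object, and the seed parameter is added here only so that shuffling is reproducible.

```python
from sklearn.model_selection import KFold, StratifiedKFold


def make_splitter(folds=None, nfold=5, stratified=True, shuffle=True, seed=0):
    """Hypothetical helper: map lightgbm.cv-style arguments to a scikit-learn splitter."""
    if folds is not None:
        # A user-supplied splitter takes precedence over the other arguments
        return folds
    cls = StratifiedKFold if stratified else KFold
    # scikit-learn only accepts random_state when shuffle=True
    return cls(n_splits=nfold, shuffle=shuffle, random_state=seed if shuffle else None)


default_splitter = make_splitter()
plain_splitter = make_splitter(nfold=3, stratified=False, shuffle=False)
custom_splitter = make_splitter(folds=KFold(2))
```

Stratification only makes sense for classification targets, so an encoder built this way would probably want stratified=False for regression problems.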