Comments (12)
@meddulla The regular feature selection in the AutoFeatRegression model still works, but it's more likely that you'll get a numerical error (as described in #1); besides that, the results should still be reasonable.
from autofeat.
For now, you can use the model to generate additional features (using fit_transform), which you can then use together with a classification model of your choice. For native support for classification models, you will have to wait for the next version.
from autofeat.
"For now, you can use the model to generate additional features (using fit_transform), which you can then use together with a classification model of your choice."
Can you share example code to make it clear how to do this?
from autofeat.
Here you go:
# import the autofeat model and a scikit-learn classifier
from autofeat import AutoFeatRegression
from sklearn.linear_model import LogisticRegression
# create a model instance
model = AutoFeatRegression()
# call fit_transform with your features and labels (y needs to be a vector)
df = model.fit_transform(X, y)
# df is a pandas dataframe with your original and the new features,
# so you can use it with your classification model
clf = LogisticRegression()
clf.fit(df, y)  # new features instead of original X
from autofeat.
Great, thanks.
And what about feature interactions?
Does your code take them into account, or does it treat the features as independent of each other?
from autofeat.
By the way, did you compare it with NaiveFeatureSelection?
https://github.com/aspremon/NaiveFeatureSelection
from autofeat.
Thanks for the link. No, I did not compare against this (since I have mostly focused on regression problems so far), but I'll look into it.
And yes, one of the biggest problems in feature selection is that many of the newly created features are highly correlated, which is why the feature selection process has multiple steps to deal with this. If the features were all independent, a simple L1-regularized model (like LASSO) would already get pretty good results by itself, even with more features than data points, but with correlated features this doesn't work.
from autofeat.
I'm glad you're working on this.
But does your code handle this feature correlation problem for both categorical and continuous features?
And if yes, do you return not only the features but also the groups of correlated features?
from autofeat.
Categorical features are returned as one-hot encoded vectors if you specified the columns when initializing the model.
And no, correlated features are not grouped. However, if features are very correlated, generally only one of them should be selected, since a correlated feature does not bring a lot of additional information.
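To illustrate the idea of keeping only one feature out of a highly correlated group, here is a minimal sketch in plain Python. It is not autofeat's actual selection procedure (which has multiple steps); the helper names `pearson` and `drop_correlated`, the 0.95 threshold, and the toy columns are assumptions made up for this example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.95):
    """Greedily keep a column only if it is not highly correlated
    (|r| >= threshold) with any column that was kept before it."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (hypothetical): x1_squared is almost collinear with x1 on this range.
columns = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x1_squared": [1.0, 4.0, 9.0, 16.0, 25.0],
    "x2": [2.0, -1.0, 0.5, -2.0, 1.0],
}
kept = drop_correlated(columns)
print(kept)  # ['x1', 'x2']
```

Because the filter is greedy, the result depends on column order: the first feature of each correlated group is the one that survives.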
from autofeat.
Can FeatureSelector be used in this instance, or is it better to use another feature selector afterwards (like an SGDClassifier with RFECV)?
from autofeat.
Hadn't noticed that feature selection is already run on the generated features using Lasso.
from autofeat.
As of version 1.0.0, there are two autofeat models available, AutoFeatRegressor and AutoFeatClassifier, for regression and classification problems respectively, so please update with pip install --upgrade autofeat and use these models instead. The arguments for the models are mostly the same, so you shouldn't have to change your code much.
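A minimal sketch of how the classification model might be used after upgrading. The wrapper function `add_autofeat_features` and its argument names are hypothetical; `feateng_steps`, `categorical_cols`, and `verbose` are constructor arguments as I understand them, so please check them against the version you have installed.

```python
def add_autofeat_features(X_train, y_train, X_test, feateng_steps=2,
                          categorical_cols=None):
    """Fit AutoFeatClassifier on the training split and return the fitted
    model together with the engineered train and test features."""
    # Imported lazily so this sketch can be loaded without autofeat installed.
    from autofeat import AutoFeatClassifier

    model = AutoFeatClassifier(
        feateng_steps=feateng_steps,        # number of feature engineering steps
        categorical_cols=categorical_cols,  # columns to one-hot encode (assumed)
        verbose=1,
    )
    # Generate and select features on the training data only ...
    X_train_new = model.fit_transform(X_train, y_train)
    # ... then apply the same transformation to unseen data.
    X_test_new = model.transform(X_test)
    return model, X_train_new, X_test_new
```

With your own `X_train`, `y_train`, and `X_test`, you would call `model, X_train_new, X_test_new = add_autofeat_features(X_train, y_train, X_test)` and then either use `model.predict(X_test)` directly or train a classifier of your choice on `X_train_new`.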
from autofeat.