rasbt / mlxtend
A library of extension and helper modules for Python's data analysis and machine learning libraries.
Home Page: https://rasbt.github.io/mlxtend/
License: Other
Hello,
This is a question.
I am using example 7 in:
http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/
I would like to access the indices of the selected features. When I use:
sfs1.k_feature_idx_
as shown in example 1, I get the error:
AttributeError Traceback (most recent call last)
<ipython-input-76-92f3d7e599b6> in <module>()
----> 1 sfs1.k_feature_idx_
AttributeError: 'SequentialFeatureSelector' object has no attribute 'k_feature_idx_'
I also tried to use:
gs.transform()
but got the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-102-98472ed5822d> in <module>()
----> 1 gs.transform()
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/utils/metaestimators.py in __get__(self, obj, type)
33 # delegate only on instances, not the classes.
34 # this is to allow access to the docstrings.
---> 35 self.get_attribute(obj)
36 # lambda, but not partial, allows help() to work with update_wrapper
37 out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/utils/metaestimators.py in __get__(self, obj, type)
33 # delegate only on instances, not the classes.
34 # this is to allow access to the docstrings.
---> 35 self.get_attribute(obj)
36 # lambda, but not partial, allows help() to work with update_wrapper
37 out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
AttributeError: 'KNeighborsClassifier' object has no attribute 'transform'
So how do I access this information without having to search for the features again?
TIA
P.S.: I requested membership in the Google group but have not had a reply, and the group does not seem to be in use.
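A minimal sketch of one way to reach the fitted selection step after a grid search. It assumes a recent scikit-learn (sklearn.model_selection rather than sklearn.grid_search) and uses scikit-learn's SelectKBest purely as a stand-in for the SFS step; the step names 'select' and 'knn' are made up for this illustration. The grid search fits clones, so the selector object that was passed in stays unfitted, and the fitted copy lives inside gs.best_estimator_. With mlxtend's SequentialFeatureSelector as the step, the same lookup would presumably expose k_feature_idx_ instead of get_support().

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# SelectKBest is only a stand-in for the feature-selection step.
selector = SelectKBest(score_func=f_classif)
pipe = Pipeline([('select', selector),
                 ('knn', KNeighborsClassifier())])

gs = GridSearchCV(estimator=pipe,
                  param_grid={'select__k': [1, 2, 3],
                              'knn__n_neighbors': [3, 5]},
                  scoring='accuracy',
                  cv=5,
                  refit=True)
gs.fit(X, y)

# The object passed in was cloned, so it carries no fitted attributes.
print(hasattr(selector, 'scores_'))

# The refitted pipeline does: look up the fitted step by its name.
fitted_selector = gs.best_estimator_.named_steps['select']
selected_idx = fitted_selector.get_support(indices=True)
print(selected_idx)
```

This relies on refit=True, which makes the grid search retrain the best parameter combination on the full training data.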
I've been a nose user for a long, long time, and yeah, I know it's been kind of "abandoned" for some time. However, I am also a big fan of the popular saying "If it ain't broke, don't fix it," since switching can be a time sink at the expense of more interesting, valuable, and useful improvements elsewhere. On the other hand, as long as this project is rather "small" and as long as a transition (from nose to py.test) is still "reasonably doable", I think it's better to switch over to py.test sooner rather than later ...
The reason why I bring this up now is that I just stumbled upon today's release of py.test 3.0. And as a reflex, I checked the current status of nose ... unfortunately, it still says:
Nose has been in maintenance mode for the past several years and will likely cease without a new person/team to take over maintainership. New projects should consider using Nose2, py.test, or just plain unittest/unittest2.
Hi, I am new to this library and have tried everything, but nothing seems to make this work. I even copied example 4 directly into my notebook and it won't work: it won't highlight the samples.
I am using Python 3.6, and this is the mlxtend version information:
➜ Downloads python3.6 -m pip show mlxtend
Name: mlxtend
Version: 0.5.1
Summary: Machine Learning Library Extensions
Home-page: https://github.com/rasbt/mlxtend
Author: Sebastian Raschka
Author-email: [email protected]
License: BSD 3-Clause
Location: /usr/local/lib/python3.6/site-packages
Requires: numpy, scipy
This is the example
from mlxtend.plotting import plot_decision_regions
from mlxtend.preprocessing import shuffle_arrays_unison
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC
# Loading some example data
iris = datasets.load_iris()
X, y = iris.data[:, [0,2]], iris.target
X, y = shuffle_arrays_unison(arrays=[X, y], random_seed=3)
X_train, y_train = X[:100], y[:100]
X_test, y_test = X[100:], y[100:]
# Training a classifier
svm = SVC(C=0.5, kernel='linear')
svm.fit(X_train, y_train)
# Plotting decision regions
plot_decision_regions(X, y, clf=svm, res=0.02,
                      legend=2, X_highlight=X_test)
# Adding axes annotations
plt.xlabel('sepal length [cm]')
plt.ylabel('petal length [cm]')
plt.title('SVM on Iris')
plt.show()
Hi, are there any plans to include a verbose option? I modified the code just to know which clf is being trained, but that is far from ideal.
Hello,
First of all: thanks for the great package! I have gotten a lot of good use out of it, especially the sequential feature selection.
SFS becomes problematic as the number of features d increases, since the complexity grows as O(d^2). I have found that one way to deal with this is to take a random subset of the remaining dimensions to check at each step instead of trying all of them. If the random subset has size k then the complexity goes down to O(dk).
Take an example of sequential forward selection with d=1000 and k=25.
During the first step, we can either try all 1000 univariate models or pick a random subset of 25 univariate models, and then take the best of them. It makes sense to try them all so as to start with a good baseline.
During the second step, instead of trying 999 bivariate models, we try only 25 of them.
Then 25 instead of 998 trivariate models. And so on until we have 25 left, at which point we revert to trying them all.
If you're interested in some empirical results, I wrote a blog post about this a while back: http://blog.explainmydata.com/2012/07/speeding-up-greedy-feature-selection.html
This would be a great feature to have!
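The random-subset idea described above can be sketched with NumPy alone. This is an illustration under stated assumptions, not mlxtend code: the function names are made up, and the score is the training R^2 of a least-squares fit instead of a cross-validated estimator score. The first step scores every feature, later steps score at most k random candidates, and the search reverts to scoring all candidates once k or fewer remain.

```python
import numpy as np


def r2_score_lstsq(X, y):
    """Training R^2 of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    return 1.0 - residual @ residual / np.sum((y - y.mean()) ** 2)


def random_subset_forward_selection(X, y, n_select, k, score_fn, seed=0):
    """Greedy forward selection that scores at most k random candidates per step."""
    rng = np.random.default_rng(seed)
    remaining = list(range(X.shape[1]))
    selected = []
    n_evaluations = 0
    while len(selected) < n_select and remaining:
        if not selected or len(remaining) <= k:
            # First step, or few features left: try them all.
            candidates = remaining
        else:
            candidates = list(rng.choice(remaining, size=k, replace=False))
        scores = [score_fn(X[:, selected + [c]], y) for c in candidates]
        n_evaluations += len(candidates)
        best = int(candidates[int(np.argmax(scores))])
        selected.append(best)
        remaining.remove(best)
    return selected, n_evaluations


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 60))
y = 3.0 * X[:, 5] + rng.normal(scale=0.1, size=200)

selected, n_evaluations = random_subset_forward_selection(
    X, y, n_select=5, k=10, score_fn=r2_score_lstsq)
print(selected, n_evaluations)
```

With 60 features, 5 selection steps, and k=10, this scores 60 + 4 * 10 = 100 models, where the exhaustive variant would score 60 + 59 + 58 + 57 + 56 = 290.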
All the tf.initialize_all_variables() calls need to be changed to tf.global_variables_initializer() by March 2017 to maintain compatibility with TensorFlow.
Add continuous integration tests for scikit-learn 0.18 and ensure the mlxtend library is compatible with the recently released version of scikit-learn.
First of all thank you Sebastian for the great work on this library.
Here's my code:
clf1 = RandomForestClassifier(n_estimators=10, n_jobs=-1, criterion='gini',random_state=42)
clf2 = RandomForestClassifier(n_estimators=10, n_jobs=-1, criterion='entropy',random_state=42)
clf3 = ExtraTreesClassifier(n_estimators=10,random_state=42,n_jobs=-1)
clf4 = ExtraTreesClassifier(n_estimators=10, n_jobs=-1, criterion='entropy',random_state=42)
clf5 = GradientBoostingClassifier(learning_rate=0.05, subsample=0.5,
                                  max_depth=6, n_estimators=10, random_state=42)
lr = LogisticRegression()
sclf = StackingClassifier(classifiers=[clf1, clf2, clf3, clf4, clf5],
                          meta_classifier=lr, use_probas=True)
print('3-fold cross validation:\n')
for clf, label in zip([clf1, clf2, clf3, clf4, clf5, sclf],
                      ['RF Gini',
                       'RF Entropy',
                       'Xtra Tree 1',
                       'Xtra tree 2',
                       'Grad Boost',
                       'StackingClassifier']):
    scores = cross_validation.cross_val_score(clf, train_no_events, y,
                                              cv=3, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]"
          % (scores.mean(), scores.std(), label))
It all goes well like this but if I add verbose=1 to StackingClassifier() I get a traceback:
3-fold cross validation:
Accuracy: 0.63 (+/- 0.01) [RF Gini]
Accuracy: 0.63 (+/- 0.01) [RF Entropy]
Accuracy: 0.63 (+/- 0.01) [Xtra Tree 1]
Accuracy: 0.63 (+/- 0.01) [Xtra tree 2]
Accuracy: 0.64 (+/- 0.00) [Grad Boost]
Fitting 5 classifiers...
Traceback (most recent call last):
File "<ipython-input-8-29daf2b04330>", line 21, in <module>
cv=3, scoring='accuracy')
File "C:\Anaconda3\lib\site-packages\sklearn\cross_validation.py", line 1433, in cross_val_score
for train, test in cv)
File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 800, in __call__
while self.dispatch_one_batch(iterator):
File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 658, in dispatch_one_batch
self._dispatch(tasks)
File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 566, in _dispatch
job = ImmediateComputeBatch(batch)
File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 180, in __init__
self.results = batch()
File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in <listcomp>
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "C:\Anaconda3\lib\site-packages\sklearn\cross_validation.py", line 1531, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Anaconda3\envs\devel\lib\site-packages\mlxtend\classifier\stacking_classification.py", line 95, in fit
(i, _name_estimators((clf,))[0][0], i, len(self.clf_)))
AttributeError: 'StackingClassifier' object has no attribute 'clf_'
Thanks!
Hi Sebastian,
I am posting this issue after our exchange on Twitter (I hope it can help other people).
I would like to use a Sequential Feature Selector (SFS) in a Pipeline. But at the end of the process, SFS keeps the number of features given by SFS.k_features (25 in this example):
clf1 = LogisticRegression(class_weight='balanced', solver='newton-cg', C=100.0, random_state=17)
sfs1 = SFS(clf1,
           k_features=25,
           forward=True,
           floating=False,
           scoring='roc_auc',
           cv=5)
sfs1 = sfs1.fit(data.values, y.values)
clf1_pipe = Pipeline([('sfs1', sfs1),
('Logistic Newton', clf1)])
print clf1_pipe.named_steps['sfs1'].k_feature_idx_
# (0, 1, 3, 4, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22, 23, 25, 27, 29, 30, 31, 34, 35)
The score clf1_pipe.named_steps['sfs1'].k_score_ is 0.6956081, but it is not the best score (performance) we got. In fact, we get a better score with 10 features:
result_clf1_pipe = pd.DataFrame.from_dict(clf1_pipe.named_steps['sfs1'].get_metric_dict(confidence_interval=0.90)).T
result_clf1_pipe.sort_values('avg_score', ascending=0, inplace=True)
result_clf1_pipe.head()
Can we automatically get, during the Pipeline process, the feature selection corresponding to the best performance?
You can find the notebook with the pipeline process for SFS ("Using Pipieline to do it").
Edit:
I manually searched for the best number of k_features for SFS for all my estimators. Then I plugged them into an EnsembleVoteClassifier. The result is not what I expected (see: "Find Manually the best k_features for SFS and fit our ensemble ")
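One way to pick the best-scoring subset from the per-size results, sketched in plain Python. The dictionary below is an illustrative stand-in for what get_metric_dict() returns; the 'avg_score' key appears in the snippet above, while the 'feature_idx' key and all the numbers are assumptions made for this sketch.

```python
# Illustrative stand-in for the dictionary returned by get_metric_dict():
# one entry per subset size. The scores are made up for this sketch.
metric_dict = {
    9: {'feature_idx': (0, 1, 3, 4, 5, 9, 10, 11, 12), 'avg_score': 0.701},
    10: {'feature_idx': (0, 1, 3, 4, 5, 9, 10, 11, 12, 13), 'avg_score': 0.712},
    25: {'feature_idx': tuple(range(25)), 'avg_score': 0.696},
}

# Subset size with the highest average cross-validation score.
best_k = max(metric_dict, key=lambda k: metric_dict[k]['avg_score'])
best_idx = list(metric_dict[best_k]['feature_idx'])
print(best_k, best_idx)

# The winning columns could then be sliced out before the next step, e.g.:
# X_best = data.values[:, best_idx]
```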
It would be a good idea to move the datasets out of the .py files and create a universal fetch function that can be used within the original loading functions. All the CSV files are quite small except for the MNIST subset, which I would compress into a .csv.gz.
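A rough sketch of what such a shared loader could look like, using only the standard library. The function name load_csv and the toy file are hypothetical; the point is only that one helper can read both plain .csv and compressed .csv.gz files.

```python
import csv
import gzip
import os
import tempfile


def load_csv(path):
    """Read a numeric CSV file, transparently handling .gz compression."""
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt', newline='') as handle:
        return [[float(value) for value in row] for row in csv.reader(handle)]


# Round trip through a temporary compressed file.
rows = [[5.1, 3.5, 0.0], [4.9, 3.0, 1.0]]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy.csv.gz')
    with gzip.open(path, 'wt', newline='') as handle:
        csv.writer(handle).writerows(rows)
    loaded = load_csv(path)

print(loaded)
```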
conda install -c rasbt mlxtend
would install the old version of mlxtend. It would be nice to have the new version available through conda.
If you pass an array of points of shape [n_samples, n_features], as stated in the docstring, a ValueError('X_highlight must be a 2D array') is raised.
Hello.
Don't you think that your implementation of stacking, where you fit the level-1 classifiers on the whole X_train and then predict labels (or probabilities) with them on the same X_train, is not really good? It may lead to heavy overfitting of these predicted labels.
I think a better approach is to make folds and train the classifiers something like this: you train on a (1 - 1/n_folds) part of the training set, predict labels for the rest of X_train, and repeat this for all folds. When you finish, you have labels for the whole X_train, but without overfitting. You can then fit every classifier one more time, using the whole X_train, to predict the X_test labels:
def fit(self, X_train, y_train):
    self.n_examples = X_train.shape[0]
    self.n_classes = len(set(y_train))
    self.folds = KFold(n=self.n_examples, n_folds=self.n_folds)
    clfs_preds = np.array([]).reshape(self.n_examples, 0)
    for clf in self.clfs:
        clf_ = clone(clf)
        clf_preds = np.array([]).reshape(0, self.n_classes*self.probas + 1 * (not self.probas))
        for train, pred in self.folds:
            clf_.fit(X_train[train], y_train[train])
            if not self.probas:
                clf_pred = clf_.predict(X_train[pred]).reshape(X_train[pred].shape[0], 1)
            else:
                clf_pred = clf_.predict_proba(X_train[pred]).reshape(X_train[pred].shape[0], self.n_classes)
            clf_preds = np.concatenate([clf_preds, clf_pred], axis=0)
        clfs_preds = np.concatenate([clfs_preds, clf_preds], axis=1)
    # fitting the clfs to predict X_test labels or probabilities when fitting the meta-classifier
    for clf in self.clfs:
        clf.fit(X_train, y_train)
    # add clfs predictions to X_train table or not
    if self.append_preds:
        X_train_ext = np.concatenate([X_train, clfs_preds], axis=1)
    else:
        X_train_ext = clfs_preds
    self.meta_clf.fit(X_train_ext, y_train)
    return self

def predict(self, X_test):
    clfs_preds = np.array([]).reshape(X_test.shape[0], 0)
    for clf in self.clfs:
        if not self.probas:
            clf_pred = clf.predict(X_test).reshape(X_test.shape[0], 1)
        else:
            clf_pred = clf.predict_proba(X_test).reshape(X_test.shape[0], self.n_classes)
        clfs_preds = np.concatenate([clfs_preds, clf_pred], axis=1)
    # add clfs predictions to X_test table or not
    if self.append_preds:
        X_test_ext = np.concatenate([X_test, clfs_preds], axis=1)
    else:
        X_test_ext = clfs_preds
    return self.meta_clf.predict(X_test_ext)
Add a separate documentation for development version as http://rasbt.github.io/mlxtend/dev
Once in a while, e.g., every 10th time, Travis seems to complain about a certain unit test of the multi-layer perceptron. I couldn't find out what causes the problem and can't reproduce the issue on my local machine. Also, it works just fine via Travis ~90% of the time. I am wondering if it is related to the random state being set incorrectly, but since the random state is set equally for all classifiers inside the fit method that is imported from _BaseSupervisedEstimator, I don't think this is the issue here. I think it may be something hardware-specific.
I was wondering if someone has an idea of what could be going on here? Would really appreciate any ideas and feedback!
As a temporary workaround for this problem, I changed the unit test to the following:
def test_multiclass_gd_acc():
    mlp = MLP(epochs=20,
              eta=0.05,
              hidden_layers=[10],
              minibatches=1,
              random_seed=1)
    mlp.fit(X, y)
    assert round(mlp.cost_[0], 2) == 0.55, mlp.cost_[0]
    if round(mlp.cost_[-1], 2) == 0.25:
        warnings.warn('About 10% of the time, mlp.cost_[-1] is'
                      ' 0.247213137424 when tested via Travis CI.'
                      ' Likely, it is an architecture-related problem but'
                      ' should be looked into in future.')
    else:
        assert round(mlp.cost_[-1], 2) == 0.01, mlp.cost_[-1]
    assert (y == mlp.predict(X)).all()
Hi,
I really love this package and the documentation, it's been very helpful for a machine learning beginner like myself. I am wondering if you would consider putting this package onto the Anaconda Cloud so that it can be downloaded with conda in addition to pip, like so:
conda install mlxtend
The benefit of doing this is it would make things easier for Windows users who install MiniConda (vs. Anaconda, which comes with a bunch of scientific computing packages by default) because some of this package's dependencies (scipy, numpy, matplotlib) can be difficult to install on Windows via pip.
Personally I am fine using pip, but I often share the code I write on Ubuntu with co-workers who use Windows, and for them using conda makes their lives easier.
This is a very minor issue, but I hope it will make access to this package easier.
Thanks!
There seems to be a handy package that makes deprecation warnings more convenient. Let's use it from now on.
Hey. Do you want to move some of this stuff upstream?
I wanted to have a general decision region / decision boundary function because it is reimplemented in so many examples.
Use all level-1 class probabilities to train the second level classifier instead of averaging probabilities for each classifier, which should provide more information to the second-level classifier (as suggested in #29)
Paper Reference: Ting, Kai Ming, and Ian H. Witten. "Issues in stacked generalization." J. Artif. Intell. Res.(JAIR) 10 (1999): 271-289.
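The difference between the two ways of building meta-features can be illustrated with NumPy. The probability arrays below are made-up outputs of two hypothetical level-1 classifiers; this is a sketch of the idea, not the StackingClassifier implementation.

```python
import numpy as np

# Class-probability outputs of two hypothetical level-1 classifiers
# for the same 3 samples and 2 classes.
proba_clf1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
proba_clf2 = np.array([[0.7, 0.3], [0.8, 0.2], [0.1, 0.9]])

# Averaging collapses the classifiers into one block of n_classes columns.
meta_averaged = np.mean([proba_clf1, proba_clf2], axis=0)

# Concatenating keeps one block of n_classes columns per classifier.
meta_stacked = np.hstack([proba_clf1, proba_clf2])

print(meta_averaged.shape)
print(meta_stacked.shape)
```

In the second sample the two classifiers disagree (0.4 vs. 0.8 for the first class). Averaging hides that disagreement, while the concatenated features keep it visible to the second-level classifier.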
I am not 100% sure if this is a bug in matplotlib, but I would appreciate any help or insights. There may be a workaround for the problem below, but I haven't found it yet. (Also see this matplotlib discussion for reference.)
The following code, which plots 4 decision regions, works just fine (1, 2, and 3 classes work fine as well):
(Note that the code below is a simplified version of the plot_decision_regions function in mlxtend. If we can fix this simple issue below, which I think is related to the ListedColormap, the fix can be transferred directly to the plot_decision_regions function.)
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
def plot_decision_regions(X, y, classifier, resolution=0.1):
    # setup marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))+1])
    # plot the decision surface
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())
    # plot class samples
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, c=colors[idx],
                    marker=markers[idx], label=cl)
# Loading some example data
iris = datasets.load_iris()
X = iris.data[:, [0,2]]
y = iris.target
y = np.concatenate((y, np.ones(50)+2))
y = y.astype(int)
X = np.concatenate((X, X[:50]*2))
lr = LogisticRegression(solver='newton-cg', multi_class='multinomial')
lr.fit(X, y)
plot_decision_regions(X, y, classifier=lr)
However, if I add a fifth class, I get the following problem:
iris = datasets.load_iris()
X = iris.data[:, [0,2]]
y = iris.target
y = np.concatenate((y, np.ones(50)+2, np.ones(50)+3))
y = y.astype(int)
X = np.concatenate((X, X[:50]*2, X[:50]*3))
lr = LogisticRegression(solver='newton-cg', multi_class='multinomial')
lr.fit(X, y)
plot_decision_regions(X, y, classifier=lr)
Problem: the image for Example 1 is incorrect. The given array is:
array([[3, 1],
[2, 2]])
Instead of showing the correct confusion matrix for the given array, the image displayed is a duplicate of the Example 2 image:
Source: appears to be cells run out of order in confusion_matrix.ipynb
Program will become permanently stuck if sdq contains k_idx lists that are different permutations of one another. This is easily fixed by appending k_idx as sets instead of lists.
Unfortunately I do not know of a dataset that I can use to show this example. Without a usable dataset to exploit this problem it is difficult to write a unittest for this.
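The proposed fix can be illustrated without a dataset. The sketch below assumes that sdq is a deque of recently visited index collections (an assumption, since the code is not shown here) and only demonstrates why lists miss a permutation while sets catch it.

```python
from collections import deque

# Two index lists that describe the same feature subset in a different order.
k_idx_a = [0, 2, 5]
k_idx_b = [5, 0, 2]

as_lists = deque(maxlen=4)
as_lists.append(k_idx_a)
# Lists compare element by element, so the permutation goes unnoticed.
print(k_idx_b in as_lists)

as_sets = deque(maxlen=4)
as_sets.append(set(k_idx_a))
# Sets ignore order, so the repeated subset is detected.
print(set(k_idx_b) in as_sets)
```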
I am trying to plot a confusion matrix of 1000 labels and save it to an image file. After using 15 GB of memory, the code terminates with a segmentation fault! Any idea why?
The StackingCVClassifier has an optional use_features_in_secondary parameter, which, if True, will feed the original features (in addition to the meta-features) to the level-2 classifier. Would be nice to add this to the StackingClassifier as well!
https://travis-ci.org/rasbt/mlxtend/jobs/163844960
reports
ERROR: Failure: ImportError (No module named 'sklearn.exceptions')
while "sklearn.exceptions" is present in scikit-learn>=0.17, which is specified in requirements.txt.
Please let me know if I'm doing anything wrong here!
The random state parameter in the shuffle_arrays_unison function is random_seed, but in the docstring it is named random_state. The same exists in the docs http://rasbt.github.io/mlxtend/user_guide/preprocessing/shuffle_arrays_unison/. I would make a PR but I don't know what we want to keep.
You might consider switching to using SVD (pseudoinverse) or QR decomposition for OLS. For SVD you could update after seeing new data. R's lm uses QR by default for some nice features you get for ANOVA after.
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/regression/linear_model.py#L185
Translating from weird statsmodels notation
For SVD
beta = np.linalg.pinv(X).dot(y)
For QR
Q, R = np.linalg.qr(X)
# for ANOVA
effects = np.dot(Q.T, y)
beta = np.linalg.solve(R, effects)
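As a quick check of the two snippets above, the following sketch fits the same synthetic problem three ways and compares the coefficients. The data, the noise level, and the variable names are made up for illustration; the normal-equations solve is included only as a reference point.

```python
import numpy as np

rng = np.random.default_rng(0)
# Design matrix with an explicit intercept column.
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
true_beta = np.array([1.0, 2.0, -3.0])
y = X @ true_beta + rng.normal(scale=0.01, size=50)

# Normal equations, for reference.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# SVD-based pseudoinverse.
beta_svd = np.linalg.pinv(X) @ y

# Reduced QR: X = Q R with R upper triangular, so R beta = Q^T y.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

print(np.allclose(beta_normal, beta_svd), np.allclose(beta_normal, beta_qr))
```

On a well-conditioned problem like this one all three agree. The decomposition-based routes matter mostly when X.T @ X is poorly conditioned.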
I opened this issue as a follow up on a discussion with
Can we have something like "VotingClassifier" for regression as well?
I believe the same concept would also work for regression from a technical perspective. However, I am not sure how "good" or "practical" this would be. I think one would have to be especially careful with the averaging if there is a high gap between the predicted targets, e.g., if we have 3 regressors with predicted targets of e.g., 1.1, 1.3, 1031.4, averaging could yield some strange behavior. Maybe it would be worthwhile to include some optional outlier detection. Here are some examples
I believe that an implementation would be pretty straight-forward from a technical perspective using the Ensemble Classifier as a template.
In addition, an interesting idea by @nikhilRP was
[to] use r-squared scores (or any other measure) to determine weights of each predicted targets.
Honestly, I haven't really thought about it yet, and I suggest that we brainstorm a bit more. However, in order to get started, we could already prepare a scaffold EnsembleRegressor that takes scikit-learn's regressors as input estimators and outputs the average fits (similar to the EnsembleClassifier's predict_proba). :)
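A rough scaffold along these lines might look as follows. Everything here is hypothetical: the class names, the weights argument, and the robust flag (median instead of mean) are illustrations of the ideas discussed above, not an existing mlxtend API. The constant toy regressors reproduce the 1.1, 1.3, 1031.4 example.

```python
import numpy as np


class ConstantRegressor:
    """Toy estimator that always predicts one constant (stand-in for a real regressor)."""

    def __init__(self, constant):
        self.constant = constant

    def fit(self, X, y):
        return self

    def predict(self, X):
        return np.full(len(X), self.constant)


class AveragingRegressor:
    """Hypothetical scaffold: combine the predictions of several regressors."""

    def __init__(self, regressors, weights=None, robust=False):
        self.regressors = regressors
        self.weights = weights
        self.robust = robust

    def fit(self, X, y):
        for reg in self.regressors:
            reg.fit(X, y)
        return self

    def predict(self, X):
        preds = np.array([reg.predict(X) for reg in self.regressors])
        if self.robust:
            # The median is less sensitive to a single extreme prediction.
            return np.median(preds, axis=0)
        return np.average(preds, axis=0, weights=self.weights)


X = np.zeros((2, 1))
y = np.zeros(2)
regs = [ConstantRegressor(1.1), ConstantRegressor(1.3), ConstantRegressor(1031.4)]

mean_pred = AveragingRegressor(regs).fit(X, y).predict(X)
median_pred = AveragingRegressor(regs, robust=True).fit(X, y).predict(X)
print(mean_pred[0], median_pred[0])
```

The plain average lands near 344.6, dominated by the one extreme prediction, while the median stays at 1.3. Weights derived from a score such as R^2 could be passed through the weights argument.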
Note
The mlxtend library is currently being slightly overhauled (see branch https://github.com/rasbt/mlxtend/tree/03).
The mlxtend.classifier.EnsembleClassifier from the mlxtend.classifier.ensemble.py module has been moved to the mlxtend.classifier.ensemble_vote.py module, where it is now mlxtend.classifier.EnsembleVoteClassifier. The reason is that the new name is more specific and helps distinguish it from a future mlxtend.classifier.EnsembleStackingClassifier etc.
Furthermore, the regression subpackage has been split into 2 new subpackages, regressor and regression_utils. The former contains classes (estimators) for fitting regression models, and the latter contains auxiliary functions such as visualization tools.
I am preparing the new documentation and website right now, and the new release (v. 0.3) will probably be ready next week to be merged into the current master branch.
Thus, an EnsembleRegressor would best fit into the mlxtend.regressor subpackage.
Hi.
Below are the commands that I ran on a clean install of Ubuntu 16.04 / x64 (on DigitalOcean).
Preparation:
root@delme:~# python3 --version
Python 3.5.1+
apt -y install python3-pip
root@delme:~# pip3 --version
pip 8.1.1 from /usr/lib/python3/dist-packages (python 3.5)
Pip3 install (note 'Killed')
root@delme:~# pip3 install mlxtend
Collecting mlxtend
Downloading mlxtend-0.4.1.tar.gz (1.1MB)
100% |████████████████████████████████| 1.1MB 451kB/s
Building wheels for collected packages: mlxtend
Running setup.py bdist_wheel for mlxtend ... done
Stored in directory: /root/.cache/pip/wheels/fb/59/f3/284c574e254e2e619a93e76ec9644def550940c13ac9fe8576
Successfully built mlxtend
Installing collected packages: mlxtend
Killed
root@delme:~#
Install from github
pip3 install git+git://github.com/rasbt/mlxtend.git#egg=mlxtend
root@delme:~# pip3 install git+git://github.com/rasbt/mlxtend.git#egg=mlxtend
Collecting mlxtend from git+git://github.com/rasbt/mlxtend.git#egg=mlxtend
Cloning git://github.com/rasbt/mlxtend.git to /tmp/pip-build-f0aeav6_/mlxtend
Installing collected packages: mlxtend
Running setup.py install for mlxtend ... error
Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-f0aeav6_/mlxtend/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-22e_cbrj-record/install-record.txt --single-version-externally-managed --compile" failed with error code -9 in /tmp/pip-build-f0aeav6_/mlxtend/
Hi Sebastian,
I spotted a typo in the example in the README. Please replace the Naive Bayes label with SVM.
Thank you for your wonderful package.
Laam
To avoid requiring additional dependencies that are not necessary for the core functionality of mlxtend, I think that it would be a good idea to move and unify all plotting functions and modules into a separate subpackage, e.g., mlxtend.plotting
Hello,
I am training and fitting data using different classifiers. For some of them (e.g., multi-layer perceptron) I scale the training and testing matrices using scikit-learn's StandardScaler(). For others (e.g., Random Forest) I do not scale the data.
I would like to use the EnsembleClassifier and was wondering how to use it with classifiers that were trained on different matrices (scaled vs. unscaled).
Thanks!
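One common way to handle this is to put the scaler and the classifier that needs it into their own pipeline, so that the ensemble itself always receives the unscaled matrix. The sketch below uses scikit-learn's VotingClassifier as a stand-in for the mlxtend ensemble (an assumption that the same pattern carries over), with the iris data and arbitrary hyperparameters.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The scaler lives inside the pipeline, so only this classifier sees scaled data.
scaled_mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=1))

# The random forest is used as is, on the unscaled features.
forest = RandomForestClassifier(n_estimators=10, random_state=1)

ensemble = VotingClassifier(estimators=[('mlp', scaled_mlp), ('rf', forest)],
                            voting='soft')
ensemble.fit(X, y)  # every estimator receives the same unscaled X
accuracy = ensemble.score(X, y)
print(accuracy)
```

Because scaling happens inside the pipeline, the same transformation is re-applied automatically at prediction time.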
sclf = StackingClassifier(classifiers=[clf1, clf2, clf3], meta_classifier=lr, use_probas=True, verbose=1)
when I set use_probas=True, I got these.
AttributeError:'StackingClassifier' object has no attribute 'clf_'
what should I do?
I used a gradient boosting classifier to build a classification model. I am trying to improve the model by using a stacked model. I want to ensemble 3 different models, let's say GBM, random forests, and logistic regression (except for GBM, the other models are subject to change). In my GBM model, I used weights in the fit function, giving higher weights to the positive target variable. I want to try the same thing in the ensemble, but I am unable to figure out how to implement weights in the source code of the EnsembleVoteClassifier. I am new to this, so I would like to receive suggestions regarding the implementation of weights.
Thanks
In the PyPI version, the SequentialFeatureSelector has k_features="best" by default, but this option is not explained in the docstring.
Hi, first of all ... awesome work, man!!
I am new to Python and, after moving from R, found a link to 'mlxtend' on Kaggle.
My issue is that the following code does not plot anything.
from mlxtend.regression import lin_regplot
import numpy as np
X = np.array([4, 8, 13, 26, 31, 10, 8, 30, 18, 12, 20, 5, 28, 18, 6, 31, 12, 12, 27, 11, 6, 14, 25, 7, 13,4, 15, 21, 15])
y = np.array([14, 24, 22, 59, 66, 25, 18, 60, 39, 32, 53, 18, 55, 41, 28, 61, 35, 36, 52, 23, 19, 25, 73, 16, 32, 14, 31, 43, 34])
intercept, slope, corr_coeff = lin_regplot(X[:,np.newaxis], y,)
Other code, like the classifiers and the 1D, 2D, and learning-curve plots, works as described on the main page.
Can you help in this regard? Did I miss something? All the code you provide seems to be straightforward!
Hello,
I posted this question in the Google groups but it does not seem to attract any attention. So I am posting this here. If this is not correct, please tell me.
I have taken some scikit-learn source code that used the standard grid search and adapted it to use a pipeline with the SFS. I use the "seuclidean" metric with the ball-tree algorithm, which requires a metric parameter: a variance vector. When I execute the standard scikit-learn code I have no problem. However, with the SFS in a Pipeline I get two errors:
1. TypeError: __init__() takes exactly 1 positional argument (0 given)
2. ValueError: SEuclidean dist: size of V does not match
Error 2 is understandable: because SFS does feature selection, I cannot pre-calculate this value. It depends on the features used. I was expecting the metric parameters to be calculated automatically and therefore not to require this input. I also tried to pass None as the parameter, but with no success.
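The size mismatch behind error 2 can be reproduced with NumPy alone. In this sketch the data, the selected indices, and the hand-written seuclidean helper are all illustrative; the point is that a variance vector computed on all features has to be sliced (or recomputed) for whichever subset the selector is currently evaluating.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4)) * np.array([1.0, 10.0, 0.1, 5.0])

# Variance vector computed once on all 4 features (ddof=1, as pandas does).
V_all = X.var(axis=0, ddof=1)

# Hypothetical subset picked by a feature selector.
selected_idx = [0, 2]
X_sub = X[:, selected_idx]

# The variance vector has to be sliced to match the subset.
V_sub = V_all[selected_idx]


def seuclidean(u, v, V):
    """Standardized Euclidean distance: sqrt(sum((u - v)**2 / V))."""
    if not (len(u) == len(v) == len(V)):
        raise ValueError('size of V does not match')
    return np.sqrt(np.sum((u - v) ** 2 / V))


print(seuclidean(X_sub[0], X_sub[1], V_sub))
```

Passing V_all together with the two-column subset raises the ValueError, which mirrors what happens when a fixed metric_params={'V': V} meets a changing feature subset.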
Can anyone shed light on how I should proceed? I have added my code below in case this helps
(data sets managed with Pandas).
TIA,
Hugo
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cross_validation import train_test_split
from sklearn.grid_search import GridSearchCV
from sklearn import preprocessing
# get the unnormalized data
X = dy[ dy.columns.difference(['label']).values ]
y = dy['label'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
V = X_train.var().values
C = X_train.cov().values
CPI = np.linalg.pinv(C)
CI = np.linalg.inv(C)
# k_range : must be less than the training size. What happens if number of features > sample size
k_range = range(1, len(X.columns))
weights = ['uniform' , 'distance']
#algos_all = ['auto', 'ball_tree', 'kd_tree', 'brute']
algos_all = ['ball_tree', 'kd_tree', 'brute']
algos = ['brute', 'kd_tree']
leaf_sizes = range(5, 60, 10)
metrics = ["euclidean", "manhattan", "chebyshev", "minkowski"]
# Metric can only be used with certain algorithms
# Metrics intended for real-valued vector spaces:
seuclidean = {
    'sfs__k_features' : list(range(1,len(X.columns))),
    'sfs__estimator__metric' : ['seuclidean'],
    'sfs__estimator__metric_params': [ {'V':V} ], # will be automatically calculated
    'sfs__estimator__algorithm' : ['ball_tree'], # TODO , ['brute', 'ball_tree'],
    'sfs__estimator__n_neighbors' : list(k_range),
    'sfs__estimator__weights' : weights,
    'sfs__estimator__leaf_size' : list(leaf_sizes) }
from sklearn.pipeline import Pipeline
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
import mlxtend
# Instantiate the algorithm
knn = KNeighborsClassifier(n_neighbors=10)
#print(knn.get_params().keys())
sfs1 = SFS(estimator=knn,
           k_features=3,
           forward=True,
           floating=False,
           scoring='accuracy',
           print_progress=False,
           cv=5)
# !?!? n_jobs=-1)
pipe = Pipeline([
    ('standardize', preprocessing.MinMaxScaler()),
    ('sfs', sfs1),
    ('knn', knn)])
# See KNeighborsClassifier equivalent param_grid
param_grid = [
    seuclidean
]
# Instantiate the grid search
gs = GridSearchCV(estimator=pipe,
                  param_grid=param_grid,
                  scoring='accuracy',
                  #n_jobs=-1, for better stack tracing
                  cv=5,
                  verbose=1,
                  refit=True)
# Run the grid search
gs = gs.fit(X_train.values, y_train)
TypeError Traceback (most recent call last)
<ipython-input-68-4ef553dad211> in <module>()
167
168 # Run the grid search
--> 169 gs = gs.fit(X_train.values, y_train)
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/grid_search.py in fit(self, X, y)
802
803 """
--> 804 return self._fit(X, y, ParameterGrid(self.param_grid))
805
806
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/grid_search.py in _fit(self, X, y, parameter_iterable)
551 self.fit_params, return_parameters=True,
552 error_score=self.error_score)
--> 553 for parameters in parameter_iterable
554 for train, test in cv)
555
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
798 # was dispatched. In particular this covers the edge
799 # case of Parallel used with an exhausted iterator.
--> 800 while self.dispatch_one_batch(iterator):
801 self._iterating = True
802 else:
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
656 return False
657 else:
--> 658 self._dispatch(tasks)
659 return True
660
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
564
565 if self._pool is None:
--> 566 job = ImmediateComputeBatch(batch)
567 self._jobs.append(job)
568 self.n_dispatched_batches += 1
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __init__(self, batch)
178 # Don't delay the application, to avoid keeping the input
179 # arguments in memory
--> 180 self.results = batch()
181
182 def get(self):
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/cross_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
1529 estimator.fit(X_train, **fit_params)
1530 else:
-> 1531 estimator.fit(X_train, y_train, **fit_params)
1532
1533 except Exception as e:
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
162 the pipeline.
163 """
--> 164 Xt, fit_params = self._pre_transform(X, y, **fit_params)
165 self.steps[-1][-1].fit(Xt, y, **fit_params)
166 return self
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/pipeline.py in _pre_transform(self, X, y, **fit_params)
143 for name, transform in self.steps[:-1]:
144 if hasattr(transform, "fit_transform"):
--> 145 Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
146 else:
147 Xt = transform.fit(Xt, y, **fit_params_steps[name]) \
/home/hmf/my_py3/lib/python3.4/site-packages/mlxtend/feature_selection/sequential_feature_selector.py in fit_transform(self, X, y)
239
240 def fit_transform(self, X, y):
--> 241 self.fit(X, y)
242 return self.transform(X)
243
/home/hmf/my_py3/lib/python3.4/site-packages/mlxtend/feature_selection/sequential_feature_selector.py in fit(self, X, y)
136 self._inclusion(orig_set=orig_set,
137 subset=prev_subset,
--> 138 X=X, y=y)
139 else:
140 k_idx, k_score, cv_scores = \
/home/hmf/my_py3/lib/python3.4/site-packages/mlxtend/feature_selection/sequential_feature_selector.py in _inclusion(self, orig_set, subset, X, y)
205 for feature in remaining:
206 new_subset = tuple(subset | {feature})
--> 207 cv_scores = self._calc_score(X, y, new_subset)
208 all_avg_scores.append(cv_scores.mean())
209 all_cv_scores.append(cv_scores)
/home/hmf/my_py3/lib/python3.4/site-packages/mlxtend/feature_selection/sequential_feature_selector.py in _calc_score(self, X, y, indices)
190 scoring=self.scorer,
191 n_jobs=self.n_jobs,
--> 192 pre_dispatch=self.pre_dispatch)
193 else:
194 self.est_.fit(X[:, indices], y)
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/cross_validation.py in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
1431 train, test, verbose, None,
1432 fit_params)
-> 1433 for train, test in cv)
1434 return np.array(scores)[:, 0]
1435
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
798 # was dispatched. In particular this covers the edge
799 # case of Parallel used with an exhausted iterator.
--> 800 while self.dispatch_one_batch(iterator):
801 self._iterating = True
802 else:
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
656 return False
657 else:
--> 658 self._dispatch(tasks)
659 return True
660
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
564
565 if self._pool is None:
--> 566 job = ImmediateComputeBatch(batch)
567 self._jobs.append(job)
568 self.n_dispatched_batches += 1
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __init__(self, batch)
178 # Don't delay the application, to avoid keeping the input
179 # arguments in memory
--> 180 self.results = batch()
181
182 def get(self):
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/cross_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
1529 estimator.fit(X_train, **fit_params)
1530 else:
-> 1531 estimator.fit(X_train, y_train, **fit_params)
1532
1533 except Exception as e:
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/neighbors/base.py in fit(self, X, y)
801 self._y = self._y.ravel()
802
--> 803 return self._fit(X)
804
805
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/neighbors/base.py in _fit(self, X)
256 self._tree = BallTree(X, self.leaf_size,
257 metric=self.effective_metric_,
--> 258 **self.effective_metric_params_)
259 elif self._fit_method == 'kd_tree':
260 self._tree = KDTree(X, self.leaf_size,
sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree.__init__ (sklearn/neighbors/ball_tree.c:8381)()
sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric (sklearn/neighbors/dist_metrics.c:4330)()
sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.SEuclideanDistance.__init__ (sklearn/neighbors/dist_metrics.c:5888)()
TypeError: __init__() takes exactly 1 positional argument (0 given)
ValueError Traceback (most recent call last)
<ipython-input-69-558dd50887b6> in <module>()
167
168 # Run the grid search
--> 169 gs = gs.fit(X_train.values, y_train)
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/grid_search.py in fit(self, X, y)
802
803 """
--> 804 return self._fit(X, y, ParameterGrid(self.param_grid))
805
806
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/grid_search.py in _fit(self, X, y, parameter_iterable)
551 self.fit_params, return_parameters=True,
552 error_score=self.error_score)
--> 553 for parameters in parameter_iterable
554 for train, test in cv)
555
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
798 # was dispatched. In particular this covers the edge
799 # case of Parallel used with an exhausted iterator.
--> 800 while self.dispatch_one_batch(iterator):
801 self._iterating = True
802 else:
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
656 return False
657 else:
--> 658 self._dispatch(tasks)
659 return True
660
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
564
565 if self._pool is None:
--> 566 job = ImmediateComputeBatch(batch)
567 self._jobs.append(job)
568 self.n_dispatched_batches += 1
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __init__(self, batch)
178 # Don't delay the application, to avoid keeping the input
179 # arguments in memory
--> 180 self.results = batch()
181
182 def get(self):
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/cross_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
1529 estimator.fit(X_train, **fit_params)
1530 else:
-> 1531 estimator.fit(X_train, y_train, **fit_params)
1532
1533 except Exception as e:
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
162 the pipeline.
163 """
--> 164 Xt, fit_params = self._pre_transform(X, y, **fit_params)
165 self.steps[-1][-1].fit(Xt, y, **fit_params)
166 return self
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/pipeline.py in _pre_transform(self, X, y, **fit_params)
143 for name, transform in self.steps[:-1]:
144 if hasattr(transform, "fit_transform"):
--> 145 Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
146 else:
147 Xt = transform.fit(Xt, y, **fit_params_steps[name]) \
/home/hmf/my_py3/lib/python3.4/site-packages/mlxtend/feature_selection/sequential_feature_selector.py in fit_transform(self, X, y)
239
240 def fit_transform(self, X, y):
--> 241 self.fit(X, y)
242 return self.transform(X)
243
/home/hmf/my_py3/lib/python3.4/site-packages/mlxtend/feature_selection/sequential_feature_selector.py in fit(self, X, y)
136 self._inclusion(orig_set=orig_set,
137 subset=prev_subset,
--> 138 X=X, y=y)
139 else:
140 k_idx, k_score, cv_scores = \
/home/hmf/my_py3/lib/python3.4/site-packages/mlxtend/feature_selection/sequential_feature_selector.py in _inclusion(self, orig_set, subset, X, y)
205 for feature in remaining:
206 new_subset = tuple(subset | {feature})
--> 207 cv_scores = self._calc_score(X, y, new_subset)
208 all_avg_scores.append(cv_scores.mean())
209 all_cv_scores.append(cv_scores)
/home/hmf/my_py3/lib/python3.4/site-packages/mlxtend/feature_selection/sequential_feature_selector.py in _calc_score(self, X, y, indices)
190 scoring=self.scorer,
191 n_jobs=self.n_jobs,
--> 192 pre_dispatch=self.pre_dispatch)
193 else:
194 self.est_.fit(X[:, indices], y)
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/cross_validation.py in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
1431 train, test, verbose, None,
1432 fit_params)
-> 1433 for train, test in cv)
1434 return np.array(scores)[:, 0]
1435
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
798 # was dispatched. In particular this covers the edge
799 # case of Parallel used with an exhausted iterator.
--> 800 while self.dispatch_one_batch(iterator):
801 self._iterating = True
802 else:
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
656 return False
657 else:
--> 658 self._dispatch(tasks)
659 return True
660
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
564
565 if self._pool is None:
--> 566 job = ImmediateComputeBatch(batch)
567 self._jobs.append(job)
568 self.n_dispatched_batches += 1
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __init__(self, batch)
178 # Don't delay the application, to avoid keeping the input
179 # arguments in memory
--> 180 self.results = batch()
181
182 def get(self):
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/cross_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
1529 estimator.fit(X_train, **fit_params)
1530 else:
-> 1531 estimator.fit(X_train, y_train, **fit_params)
1532
1533 except Exception as e:
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/neighbors/base.py in fit(self, X, y)
801 self._y = self._y.ravel()
802
--> 803 return self._fit(X)
804
805
/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/neighbors/base.py in _fit(self, X)
256 self._tree = BallTree(X, self.leaf_size,
257 metric=self.effective_metric_,
--> 258 **self.effective_metric_params_)
259 elif self._fit_method == 'kd_tree':
260 self._tree = KDTree(X, self.leaf_size,
sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree.__init__ (sklearn/neighbors/ball_tree.c:8793)()
sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree._recursive_build (sklearn/neighbors/ball_tree.c:10053)()
sklearn/neighbors/ball_tree.pyx in sklearn.neighbors.ball_tree.init_node (sklearn/neighbors/ball_tree.c:20030)()
sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree.rdist (sklearn/neighbors/ball_tree.c:9932)()
sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.SEuclideanDistance.rdist (sklearn/neighbors/dist_metrics.c:6065)()
ValueError: SEuclidean dist: size of V does not match
I tried to install it via:
sudo pip install
but the following error was raised. It looks like one of the files doesn't exist; it seems that CHANGELOG was moved to the parent directory!
running install_data
copying LICENSE -> /usr/local/
copying docs/README.html -> /usr/local/
error: can't copy 'docs/CHANGELOG.txt': doesn't exist or not a regular file
----------------------------------------
Cleaning up...
Add load and save methods to classifiers and regressors using JSON to avoid common pickle issues / platform incompatibilities -- based on https://github.com/rasbt/python-machine-learning-book/blob/master/code/bonus/scikit-model-to-json.ipynb
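A minimal sketch of what such save and load methods could look like, using only the standard library. The class TinyLinearModel and its attributes w_ and b_ are hypothetical stand-ins for a fitted estimator; they are not mlxtend classes, and the linked notebook may take a different approach.

```python
import json
import os
import tempfile


class TinyLinearModel:
    """Hypothetical stand-in for a fitted estimator; not an mlxtend class."""

    def __init__(self, w=None, b=0.0):
        # Fitted parameters are kept as plain Python types so that
        # they can be written to JSON without any conversion.
        self.w_ = list(w) if w is not None else []
        self.b_ = float(b)

    def predict(self, x):
        # Linear combination of the inputs plus the bias term.
        return sum(wi * xi for wi, xi in zip(self.w_, x)) + self.b_

    def save(self, path):
        # Store only the fitted parameters, not the object itself.
        with open(path, 'w') as f:
            json.dump({'w_': self.w_, 'b_': self.b_}, f)

    @classmethod
    def load(cls, path):
        # Rebuild a model from the parameters stored by save().
        with open(path) as f:
            state = json.load(f)
        return cls(w=state['w_'], b=state['b_'])


model = TinyLinearModel(w=[0.5, -1.0], b=2.0)
path = os.path.join(tempfile.mkdtemp(), 'model.json')
model.save(path)
restored = TinyLinearModel.load(path)
```

Real estimators usually hold NumPy arrays, which would have to be converted to lists before dumping and back to arrays after loading.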
Adding the documentation for PR #102
Add type hints (see PEP484) and static type checking to Travis CI using mypy.
def hello(r: int, c=5) -> str:
    s = 'hello'  # type: str
    return '(%d + %d) times %s' % (r, c, s)
vs.
def hello(r, c=5):
    s = 'hello'  # type: str
    return '(%d + %d) times %s' % (r, c, s)
Any thoughts?
As Daniel Moisset from machinalis pointed out, a Python 2.x compatible alternative to the first scenario above would be:
def hello(r, c=5):
    # type: (int, int) -> str
    """Some docstring"""
    s = 'hello'
    return '(%d + %d) times %s' % (r, c, s)
or for longer parameter lists:
def hello(r,  # type: int
          c=5):
    # type: (...) -> str
    s = 'hello'
    return '(%d + %d) times %s' % (r, c, s)
which would be the best option for now imho.
Also see his awesome blog post for more info on using mypy: http://www.machinalis.com/blog/a-day-with-mypy-part-1/
I am trying to use SequentialFeatureSelector on a dataset where the number of available features is on the same order of magnitude as the number of samples. The dataset has lots of missing values (NaNs) that cannot be imputed from other samples: they simply don't make sense in some cases.
I can obviously drop NaNs before feeding anything to the feature selector, but this needlessly reduces the number of available data points, as the missing values don't always occur in the same rows, and I'd never want to fit my model using all columns at the same time.
A way around this problem would be to allow the NA dropping to (optionally) happen within ColumnSelector.transform. In this way, I'd only be dropping the rows for which there are NAs in the specific columns that are needed for a test. This, however, breaks the sklearn API, as it would require the transform method to also modify the target vector y, so it seems I cannot just add a custom transformer that drops NAs to the base estimator.
An alternative solution could be to hard-code the dropna within the SequentialFeatureSelector._calc_score method, calling it before using cross-validation (find the row indices that contain NAs for the selected columns, then slice X and y by those rows before calling the scoring function). Would this be an acceptable/desirable change? I can put together a quick implementation if you think it is worth it.
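The row filtering described above could be sketched roughly as follows. The helper name drop_nan_rows is hypothetical and this is not the mlxtend implementation; it assumes X is a 2D float NumPy array, y is a 1D array of the same length, and indices is the tuple of column indices currently being evaluated.

```python
import numpy as np


def drop_nan_rows(X, y, indices):
    """Return X[:, indices] and y restricted to rows without NaN in those columns."""
    # Look only at the columns that are part of the current feature subset.
    X_subset = X[:, list(indices)]
    # Keep a row only if none of its selected columns is NaN.
    keep = ~np.isnan(X_subset).any(axis=1)
    return X_subset[keep], y[keep]


X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [7.0, 8.0, 9.0]])
y = np.array([0, 1, 0])

# Only the first row has a NaN in columns 0 and 1, so only that row is dropped.
X_sub, y_sub = drop_nan_rows(X, y, indices=(0, 1))
```

Because the mask is computed per feature subset, the second row is still used here even though it has a missing value in column 2, which is the point of the proposal.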
Hello,
While looking at the feature selection documentation at:
http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/
I have come across several typos. To facilitate quick annotation of the text, maybe the following tool is useful:
Anyone can comment, ask questions, suggest alterations and point out typos quickly. This way the maintainer has access to the location and the comment on a single web page.
HTH
Documentation: http://rasbt.github.io/mlxtend/user_guide/file_io/find_files/#api
The parameters recursive and check_ext are not formatted as individual parameters but are instead in the same bullet point as path.
Hi,
I just started working with the package and it seems great. However, whenever I try to use Sequential Forward Selection I get:
AttributeError: 'NoneType' object has no attribute 'write'
I tried different estimators from scikit-learn, and all of scikit-learn's built-in feature selection tools seem to get along fine with the data.
I really don't know what the problem is.
The error in detail is:
feature_selection\sequential_feature_selector.py in fit(self=SequentialFeatureSelector(clone_estimator=True, ...e, scoring='r2',
skip_if_stuck=True), X=array([[-0.09449112, 0.09449112, -0.95103007, .... 0.76772251,
0.76772251, -0.20334318]]), y=array([ 0.75431081, 0.58790606, 0.66942214, 0...3, 0.82269534, 0.79686278,
0.48766136]))
233 'cv_scores': cv_scores,
234 'avg_score': k_score}
235 sdq.append(k_idx)
236
237 if self.print_progress:
--> 238 sys.stderr.write('\rFeatures: %d/%d' % (
k_idx = (538,)
k_to_select = 20
239 len(k_idx), k_to_select))
240 sys.stderr.flush()
241
242 if select_in_range:
AttributeError: 'NoneType' object has no attribute 'write'
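The traceback suggests that sys.stderr is None at the point where the progress message is written. This can happen when Python runs without an attached console, for example when started with pythonw on Windows. Since the failing write sits inside the "if self.print_progress:" branch, disabling print_progress would presumably avoid it. Alternatively, a guarded write along the following lines would tolerate a missing stream. The helper write_progress is hypothetical and not part of mlxtend.

```python
import io
import sys


def write_progress(message, stream=None):
    """Write a progress message; do nothing if no stream is available."""
    # Fall back to sys.stderr, which may itself be None in some environments.
    target = stream if stream is not None else sys.stderr
    if target is None:
        return False
    target.write(message)
    target.flush()
    return True


# With a working stream the message is written.
buffer = io.StringIO()
wrote = write_progress('\rFeatures: %d/%d' % (1, 20), stream=buffer)

# With sys.stderr set to None the call becomes a silent no-op.
original_stderr = sys.stderr
sys.stderr = None
try:
    skipped = not write_progress('\rFeatures: %d/%d' % (2, 20))
finally:
    sys.stderr = original_stderr
```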