Thank you very much for providing the smote_variants package - an excellent tool!
It seems that parameters cannot be passed as lists. I have a question regarding parameter tuning: following the logic from the manual, one can build the grid from single integer values:
import smote_variants as sv
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

oversampler = ('smote_variants', 'MulticlassOversampling',
               {'oversampler': 'MWMOTE', 'oversampler_params': {}})
classifier = ('sklearn.ensemble', 'RandomForestClassifier',
              {'n_estimators': 50, 'max_depth': 3, 'min_samples_split': 2})
model = Pipeline([('scale', StandardScaler()),
                  ('clf', sv.classifiers.OversamplingClassifier(oversampler, classifier))])
model
param_grid = {
    'clf__oversampler': [
        ('smote_variants', 'MWMOTE', {'proportion': 0.5}),
        ('smote_variants', 'MWMOTE', {'proportion': 1.0}),
        ('smote_variants', 'MWMOTE', {'proportion': 1.5}),
    ],
    'clf__classifier': [
        ('sklearn.ensemble', 'RandomForestClassifier', {'n_estimators': 60}),
        ('sklearn.ensemble', 'RandomForestClassifier', {'n_estimators': 1000}),
        ('sklearn.ensemble', 'RandomForestClassifier', {'n_estimators': 40}),
        ('sklearn.ensemble', 'RandomForestClassifier', {'n_estimators': 10}),
        ('sklearn.ensemble', 'RandomForestClassifier', {'n_estimators': 10}),
        ('sklearn.ensemble', 'RandomForestClassifier', {'max_depth': 9}),
        ('sklearn.ensemble', 'RandomForestClassifier', {'max_depth': 4}),
        ('sklearn.ensemble', 'RandomForestClassifier', {'min_samples_split': 9}),
        ('sklearn.ensemble', 'RandomForestClassifier', {'min_samples_split': 5}),
    ],
}
Yet in this case, every candidate that GridSearchCV evaluates sets only one classifier parameter, so the best result tunes a single parameter. Another formulation of the grid keeps all parameters, but only the last value listed for each one survives, and those values are most likely not optimal:
param_grid= {'clf__classifier': [('sklearn.ensemble', 'RandomForestClassifier', {
'max_depth': 20, 'max_depth': 7, 'max_depth': 9, 'max_depth': 2},
{'n_estimators': 300, 'n_estimators': 180, 'n_estimators': 25, 'n_estimators': 2},
{'min_samples_split': 3, 'min_samples_split': 19, 'min_samples_split': 2},
{'min_samples_leaf': 3, 'min_samples_leaf': 18, 'min_samples_leaf': 2},
)] }
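As far as I can tell, the "last value only" effect is plain Python behaviour rather than anything specific to smote_variants: a dict literal that repeats a key silently keeps only the final value. A minimal check:

```python
# A dict literal with a repeated key keeps only the last assignment.
params = {'max_depth': 20, 'max_depth': 7, 'max_depth': 9, 'max_depth': 2}
print(params)  # {'max_depth': 2}
```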
The parameter argument is basically a dictionary, but its values have to be floats or integers rather than lists. Could you please provide additional instructions on how to pass ranges of parameter values through to the grid for fine-tuning?
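To illustrate what I am after, here is a rough sketch of a workaround: expanding lists of values into one tuple per combination with itertools.product. The helper name expand_classifier_grid and the value lists are made up for this example, and I am assuming OversamplingClassifier accepts each generated tuple the same way it accepts the hand-written ones above.

```python
from itertools import product


def expand_classifier_grid(module, name, value_lists):
    """Hypothetical helper: one (module, name, params) tuple per combination."""
    keys = list(value_lists)
    combos = product(*(value_lists[k] for k in keys))
    return [(module, name, dict(zip(keys, combo))) for combo in combos]


# Illustrative value lists, not recommended settings.
rf_values = {
    'n_estimators': [10, 40, 60],
    'max_depth': [4, 9],
    'min_samples_split': [5, 9],
}

classifier_grid = expand_classifier_grid(
    'sklearn.ensemble', 'RandomForestClassifier', rf_values)

# Assumed usage: pass the expanded list as the candidates for clf__classifier.
param_grid = {'clf__classifier': classifier_grid}

print(len(classifier_grid))  # 12 combinations (3 * 2 * 2)
print(classifier_grid[0])
```

Each candidate then carries a full set of parameters, so the search would compare complete configurations instead of one parameter at a time. The number of candidates grows multiplicatively with the length of each list.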
Any kind of hints would be very much appreciated. Thank you in advance!