lacava / few
a feature engineering wrapper for sklearn
Home Page: https://lacava.github.io/few
License: GNU General Public License v3.0
Greetings!
How can we add custom node types? Is it (currently) possible?
If FEW ends up selecting the original features, the transform() method fails with a TypeError.
e.g.:
print('Model: {}'.format(learner.print_model()))
Model: original features
Phi = learner.transform(X_test.values)
TypeError Traceback (most recent call last)
in ()
----> 1 Phi = learner.transform(X_test.values)
~/anaconda3/envs/ml/lib/python3.6/site-packages/FEW-0.0.38-py3.6-macosx-10.7-x86_64.egg/few/few.py in transform(self, x, inds, labels)
395 # return np.asarray(Parallel(n_jobs=10)(delayed(self.out)(I,x,labels,self.otype) for I in self._best_inds)).transpose()
396 return np.asarray(
--> 397 [self.out(I,x,labels,self.otype) for I in self._best_inds]).transpose()
398
399
TypeError: 'NoneType' object is not iterable
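A minimal sketch of one possible guard, assuming the failure comes from `_best_inds` being None when the original features win. The function name and the `out` callable are hypothetical stand-ins, not FEW's actual code:

```python
import numpy as np

def transform_features(x, best_inds, out):
    """Return engineered features, or the raw inputs when none were kept.

    x         : array of shape (n_samples, n_features)
    best_inds : list of engineered features, or None (stand-in for the
                `_best_inds` attribute seen in the traceback)
    out       : callable evaluating one feature on x (hypothetical)
    """
    if best_inds is None:
        # No engineered features were kept: pass the data through unchanged.
        return np.asarray(x)
    # One row per feature, transposed to (n_samples, n_engineered_features).
    return np.asarray([out(ind, x) for ind in best_inds]).transpose()
```

With a guard like this, calling transform on a model that reports "original features" would return the input matrix instead of raising.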
implement truncation selection based on feature scores. if none, use estimated fitness
write DistanceClassifier as separate sklearn learner
FEW cannot be installed with pip, because setup.py imports eigency before the requirements have been installed.
add option to assess ML model based on balanced accuracy.
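For reference, balanced accuracy is the mean of the per-class recalls, so a rare class counts as much as a common one. A small pure-Python sketch of the metric (the function name is illustrative, not part of FEW):

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# A majority-class predictor gets 80% plain accuracy here,
# but only 0.5 balanced accuracy.
print(balanced_accuracy([0, 0, 0, 0, 1], [0, 0, 0, 0, 0]))
```

scikit-learn 0.20 and later also provide `sklearn.metrics.balanced_accuracy_score`, which could be passed as the scoring function instead.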
add this configuration option.
I'm getting low GPU utilization with the TensorFlow evaluation strategy. There are a few things to try:
use this method to profile TensorFlow and see where the inefficiencies lie.
according to this, using feed_dict is not a good idea. Need to look into using pipelines or variables for feeding input data to the graphs.
Currently, few does not accept an instance of numpy.random as random state.
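One way this could be handled, sketched under the assumption that "an instance of numpy.random" means a `numpy.random.RandomState` object. The helper name is hypothetical; `sklearn.utils.check_random_state` performs a similar normalization:

```python
import numbers

import numpy as np

def as_random_state(seed):
    """Turn None, an int, or a RandomState into a RandomState instance."""
    if seed is None:
        return np.random.RandomState()
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(int(seed))
    if isinstance(seed, np.random.RandomState):
        # Reuse the caller's generator so its stream is shared.
        return seed
    raise ValueError("cannot use %r to seed a RandomState" % (seed,))
```

Calling this once at the start of fit would let users pass either an integer seed or their own generator.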
Hello,
Thanks for the help so far. I was able to get the tool up and running on Windows.
However, I am observing two odd things.
https://github.com/GinoWoz1/AdvancedHousePrices/blob/master/FEW_GB.ipynb
https://github.com/GinoWoz1/AdvancedHousePrices/blob/master/FEW_RF.ipynb
I think I am missing something on how to use this tool but no idea what. I am trying to use this in tandem with TPOT as I am exploring feature creation GA/GP based tools. Sincerely appreciate any advice/guidance you can provide.
Sincerely,
G
replace point mutation with a length-based probability of mutation at each node.
include option to mediate lexicase survival via size of programs for each selection event.
case statistics should just be calculated once rather than each selection event.
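A rough sketch of the last point, assuming the case statistic in question is the per-case median absolute deviation used as epsilon in epsilon-lexicase selection. All function names are illustrative and this is not FEW's implementation:

```python
import random
from statistics import median

def case_epsilons(errors):
    """errors[i][j] is the error of individual i on case j.

    Returns one epsilon per case: the median absolute deviation (MAD)
    of the population's errors on that case.
    """
    eps = []
    for j in range(len(errors[0])):
        col = [row[j] for row in errors]
        med = median(col)
        eps.append(median(abs(e - med) for e in col))
    return eps

def lexicase_select(errors, eps, rng):
    """One selection event, reusing the precomputed epsilons."""
    candidates = list(range(len(errors)))
    cases = list(range(len(eps)))
    rng.shuffle(cases)
    for j in cases:
        best = min(errors[i][j] for i in candidates)
        candidates = [i for i in candidates if errors[i][j] <= best + eps[j]]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)

def select_parents(errors, n_parents, seed=None):
    rng = random.Random(seed)
    eps = case_epsilons(errors)  # computed once, not per selection event
    return [lexicase_select(errors, eps, rng) for _ in range(n_parents)]
```

The epsilons depend only on the population's error matrix, so computing them once per generation gives the same selections at lower cost.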
Concerning errors of the form
self.ml.named_steps = undefined
205 hasattr(self.ml.named_steps['ml'],'feature_importances_')):
208 coef = (self.ml.named_steps['ml'].coef_ if
AttributeError: 'SGDClassifier' object has no attribute 'named_steps'
These occur when using FEW in GridSearchCV while changing the ml parameter. The pipeline object needs to be rebuilt in the fit method so that GridSearchCV can change self.ml and the pipeline is updated accordingly.
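The idea can be sketched as follows. `WrappedLearner` is a hypothetical stand-in for FEW, and the "standardize" step is an assumption; only the "ml" step name comes from the traceback above:

```python
from sklearn.base import BaseEstimator, clone
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

class WrappedLearner(BaseEstimator):
    """Hypothetical estimator that wraps `ml` in a pipeline."""

    def __init__(self, ml=None):
        # Store the parameter untouched; set_params() may replace it later.
        self.ml = ml

    def fit(self, X, y):
        ml = clone(self.ml) if self.ml is not None else SGDClassifier()
        # Build the pipeline at fit time so it reflects the current `ml`.
        self.pipeline_ = Pipeline([("standardize", StandardScaler()),
                                   ("ml", ml)])
        self.pipeline_.fit(X, y)
        return self

    def predict(self, X):
        return self.pipeline_.predict(X)
```

Because GridSearchCV clones the estimator and calls set_params before fit, anything derived from `ml` in `__init__` goes stale; deriving it in fit avoids that.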
track stalling in runs and act based on them.
stalling occurs when the engineered features are no longer improving either 1) the ML model CV score or 2) the median fitness of the features themselves.
if stalling occurs, there should be options to
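As a starting point, stall detection could look roughly like this. The class is hypothetical; `max_stall` mirrors the existing FEW argument name, while `tol` is an assumed extra parameter:

```python
class StallTracker:
    """Counts consecutive updates without improvement (higher is better)."""

    def __init__(self, max_stall, tol=0.0):
        self.max_stall = max_stall
        self.tol = tol
        self.best = None
        self.stall_count = 0

    def update(self, score):
        """Record a new score; return True once the run has stalled."""
        if self.best is None or score > self.best + self.tol:
            self.best = score
            self.stall_count = 0
        else:
            self.stall_count += 1
        return self.stall_count >= self.max_stall
```

One tracker could follow the ML model's CV score and a second the median feature fitness, with the chosen action triggered when either reports a stall.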
Hi, I've cloned few, built and installed on OS X 10.12 using:
CC=gcc-7 python setup.py install
But I'm getting a symbol not found error on import of the few module.
I note a few warnings during the build process beginning with: #warning "Using deprecated NumPy API, disable it by ...
and then finally:
g++ -bundle -undefined dynamic_lookup -L/Users/robertreynolds/anaconda3/envs/ml/lib -arch x86_64 -L/Users/robertreynolds/anaconda3/envs/ml/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/few/lib/few_lib.o -o build/lib.macosx-10.7-x86_64-3.6/few_lib.cpython-36m-darwin.so
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
Any advice what to check next?
Otherwise, I'm not entirely clear on why I'm seeing a clang message, so that, along with the warning above, is the first avenue I'll explore.
Greetings!
I have the following code:
feats_gen = FEW(
    ml=DecisionTreeClassifier(random_state=10, max_depth=None, min_samples_leaf=5),
    population_size=100, tourn_size=2,
    mutation_rate=0.5, crossover_rate=0.5,
    sel='epsilon_lexicase',
    clean=True,
    mdr=True, boolean=True,
    random_state=10, verbosity=1,
    scoring_function=roc_auc_score,
    max_depth=10, min_depth=1, max_depth_init=1,
    classification=True,
    generations=50, max_stall=None,
    names=list(X_train.select_dtypes(include=[np.number]).columns))
feats_gen.fit(X_train.select_dtypes(include=[np.number]).values,
              y_train.astype(int).values)
test_ = preprocessing_pipeline.transform(e.test)
X_test = test_.X
y_test = test_[test_.target_name].astype(int)
roc_auc_score(y_test, feats_gen._best_estimator.predict_proba(feats_gen.transform(X_test.select_dtypes(include=[np.number]).values))[:, 1])
Every time I run this code, I get different ROC AUC values on both the training and test sets. I'm pretty sure preprocessing_pipeline is deterministic.
Currently the training data is split into training and validation sets, and the best model is updated when a model with a higher validation score is found. We could simplify quite a bit and get a more robust validation measure by replacing train_test_split and the associated numpy arrays and fitting/predicting code with a direct call to cross_val_score(self.ml,features,labels,cv=3) or cross_val_score(self.ml,self.X[self.valid_loc(),:].transpose(),labels,cv=3).
occasional error:
File "/home/bill/anaconda3/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/home/bill/anaconda3/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 506, in <module>
main()
File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 495, in main
learner.fit(training_features, training_labels)
File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 181, in fit
self.ml.fit(pop.X.transpose(),y_t)
File "/home/bill/anaconda3/lib/python3.5/site-packages/sklearn/linear_model/least_angle.py", line 1132, in fit
Lars.fit(self, X, y)
File "/home/bill/anaconda3/lib/python3.5/site-packages/sklearn/linear_model/least_angle.py", line 671, in fit
return_n_iter=True, positive=self.positive)
File "/home/bill/anaconda3/lib/python3.5/site-packages/sklearn/linear_model/least_angle.py", line 260, in lars_path
sign_active[n_active] = np.sign(C)
ValueError: cannot convert float NaN to integer
This should not occur, since the operators are supposed to produce safe outputs.
Hello!
Thanks for sharing your work, this is really cool!
I was wondering if you could provide a bit of explanation as to the difference between these two outputs of the algorithm.
Also, is there any (outside) documentation on all this?
Thanks in advance!
Kind regards,
Theodore.
write evaluation routine for features in c++ with eigen and interface with main codebase via Cython. Include distutils changes to support package distribution.
normalize feature transformations automatically before feeding them into the ML fit method. store the transformer so that it can be used in prediction/transformation as well.
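A bare-bones sketch of the fit-once, reuse-later pattern this describes, in pure Python with a hypothetical class name. In practice scikit-learn's `StandardScaler` offers the same fit/transform behaviour:

```python
from statistics import mean, pstdev

class FeatureStandardizer:
    """Learns per-column mean and scale on training data, then reuses them."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.means_ = [mean(c) for c in cols]
        # Guard constant columns against division by zero.
        self.scales_ = [pstdev(c) or 1.0 for c in cols]
        return self

    def transform(self, rows):
        return [[(v - m) / s
                 for v, m, s in zip(row, self.means_, self.scales_)]
                for row in rows]
```

The fitted object would be stored on the learner after fit, so that predict and transform apply exactly the statistics learned on the training features.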
incorporate TensorFlow evaluation
incorporate a multi-output program representation for which each individual has its own ML model.
add operators that re-encode input SNPs based on different encodings. include (add, dom, rec, het, sub-add, super-add). Need to resolve how underlying data would be represented; maybe assume the input is additive?
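To make the question concrete, here is a sketch that assumes the input is additive, i.e. each SNP is a 0/1/2 count of the minor allele. The table and function names are hypothetical. The sub-additive and super-additive encodings are left out because their heterozygote values are not specified here:

```python
# Maps an additive genotype (0, 1 or 2 copies of the minor allele)
# to its value under each encoding.
ENCODINGS = {
    "add": {0: 0, 1: 1, 2: 2},  # additive: allele count as-is
    "dom": {0: 0, 1: 1, 2: 1},  # dominant: one copy is enough
    "rec": {0: 0, 1: 0, 2: 1},  # recessive: two copies required
    "het": {0: 0, 1: 1, 2: 0},  # heterozygous: exactly one copy
}

def recode(genotypes, encoding):
    """Re-encode a list of additive genotypes under the named encoding."""
    table = ENCODINGS[encoding]
    return [table[g] for g in genotypes]
```

If the underlying data were stored additively like this, each encoding operator would reduce to a table lookup on the input column.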
Greetings!
I would like to know if there is any practical difference between the two projects. I'm asking this because testing feat would require a lot more effort than few and, as such, I need to know if it is worth it.
Thanks in advance!