dataiku-research / cardinal
A practical Active Learning Python package with a strong focus on experiments.
Home Page: https://dataiku-research.github.io/cardinal/
License: Apache License 2.0
Thanks for sharing this package and writing the paper.
It would be nice if this package also supported binary multi-label classification problems.
In your opinion, what would be the best way to aggregate, for instance, the "Smallest Margin" computed for each label into a per-sample score?
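To make the question concrete, here is a minimal sketch of two candidate aggregations. It is not part of the package: the function name multilabel_margin_scores and the "mean" and "max" options are hypothetical, and it assumes each label is an independent binary problem whose positive-class probability is available.

```python
import numpy as np


def multilabel_margin_scores(label_proba, aggregate="mean"):
    """Turn per-label binary probabilities into one uncertainty score per sample.

    label_proba has shape (n_samples, n_labels) and holds P(label = 1).
    Hypothetical helper, written only to illustrate the question.
    """
    p = np.asarray(label_proba, dtype=float)
    # For a binary label the margin between the two classes is |p - (1 - p)|.
    margin = np.abs(2.0 * p - 1.0)
    # Same convention as the package's margin score: higher means more uncertain.
    uncertainty = 1.0 - margin
    if aggregate == "mean":
        # Average uncertainty over all labels.
        return uncertainty.mean(axis=1)
    if aggregate == "max":
        # Uncertainty of the single most ambiguous label.
        return uncertainty.max(axis=1)
    raise ValueError("aggregate must be 'mean' or 'max'")
```

The "max" variant favours samples with at least one very ambiguous label, while "mean" favours samples that are ambiguous across many labels; which one is preferable is exactly the open question.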
This is mostly a placeholder, since it is neither merged nor PR-ready; it is here so we don't forget it.
In the benchmarking script here, we should control the random_state. The permutation done beforehand could then be skipped by using the seed here instead.
We should not assume that the classifier returns probabilities for at least two classes. The error is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-30-2b90468e1fd1> in <module>
4 n_init_points,
5 base_model,
----> 6 semi_model)
<ipython-input-25-008649b258dd> in run_experiment(X_train, X_test, y_train, y_test, batch_size, n_iter, n_init_points, base_model, semi_model)
118 sampler.fit(X_train[selected], y_train[selected])
119
--> 120 new_selected = sampler.select_samples(X_train[~selected])
121 new_selected = new_selected.astype(int)
122 selected[index[~selected][new_selected]] = True
~/dss/code-envs/python/alssl/lib/python3.6/site-packages/cardinAL/base.py in select_samples(self, X, strategy)
29
30 def select_samples(self, X, strategy='top'):
---> 31 sample_scores = self.score_samples(X)
32 self.sample_scores_ = sample_scores
33 if strategy == 'top':
~/dss/code-envs/python/alssl/lib/python3.6/site-packages/cardinAL/uncertainty.py in score_samples(self, X)
175 predictions (np.array): Returns an array where selected samples are classified as 1.
176 """
--> 177 return margin_score(self.classifier_, X)
178
179
~/dss/code-envs/python/alssl/lib/python3.6/site-packages/cardinAL/uncertainty.py in margin_score(classifier, X)
45 """
46 classwise_uncertainty = _get_probability_classes(classifier, X)
---> 47 part = np.partition(classwise_uncertainty, -2, axis=1)
48 margin = 1 - (part[:, -1] - part[:, -2])
49 return margin
<__array_function__ internals> in partition(*args, **kwargs)
~/dss/code-envs/python/alssl/lib/python3.6/site-packages/numpy/core/fromnumeric.py in partition(a, kth, axis, kind, order)
744 else:
745 a = asanyarray(a).copy(order="K")
--> 746 a.partition(kth, axis=axis, kind=kind, order=order)
747 return a
748
ValueError: kth(=-1) out of bounds (1)
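The traceback above ends in np.partition being asked for the second-largest value along an axis of size 1. Below is a minimal sketch of a margin score that tolerates a single probability column. The name safe_margin_score is hypothetical, and treating the missing runner-up as probability 0 is an assumption, not the package's behaviour.

```python
import numpy as np


def safe_margin_score(proba):
    """Margin-based uncertainty that also accepts a single probability column.

    proba has shape (n_samples, n_classes). Hypothetical helper, not the
    package's implementation.
    """
    proba = np.asarray(proba, dtype=float)
    if proba.ndim == 1:
        # Accept a flat vector of probabilities as a single column.
        proba = proba.reshape(-1, 1)
    if proba.shape[1] < 2:
        # Only one class is known: assume the runner-up has probability 0.
        top = proba[:, 0]
        second = np.zeros(proba.shape[0])
    else:
        # Same computation as in the traceback: two largest values per row.
        part = np.partition(proba, -2, axis=1)
        top, second = part[:, -1], part[:, -2]
    return 1 - (top - second)
```

With a single column of ones, every sample gets a score of 0, meaning no uncertainty, instead of raising a ValueError.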
Description: In the select_samples method of the Zhdanov query sampler, the weight attribute is not usable as is. First, it is passed to the second (KMeans) step with a different dimension than the X array that step receives. More importantly, since the second step depends on the selection made by the MarginSampling sampler, there is no way to know a priori which weights correspond to the selection.
Proposed fix: Index the weights with the selection:
new_selected = self.sampler_list[1].select_samples(
X[selected], sample_weight=sample_weight[selected])
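The indexing idea in the proposed fix can be sketched in isolation. In this sketch, two_step_select, preselect and refine are hypothetical names that stand in for the two chained samplers; it only shows how the weights stay aligned with the rows passed to the second step and does not reproduce the package's API.

```python
import numpy as np


def two_step_select(X, sample_weight, preselect, refine):
    """Chain two selection steps while keeping weights aligned with samples.

    preselect(X) returns indices into X; refine(X_sub, w_sub) returns indices
    into the preselected subset. Both callables are hypothetical stand-ins.
    """
    X = np.asarray(X)
    sample_weight = np.asarray(sample_weight)
    selected = np.asarray(preselect(X))
    # Subset the weights with the same indices as X so both have equal length.
    local = np.asarray(refine(X[selected], sample_weight[selected]))
    # Map positions in the subset back to positions in the original array.
    return selected[local]


# Toy usage: keep the 4 largest values, then the 2 heaviest among them.
X = np.arange(10, dtype=float).reshape(-1, 1)
weights = np.linspace(1.0, 0.1, num=10)
chosen = two_step_select(
    X,
    weights,
    preselect=lambda X_all: np.argsort(X_all[:, 0])[-4:],
    refine=lambda X_sub, w_sub: np.argsort(w_sub)[-2:],
)
```

The weights only need to be known for the full pool; which of them reach the second step is decided at call time by the first step's selection.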
So far I have not found a non-hacky way to handle distance precomputation with sparse matrices. In particular, to reproduce the Zhdanov results, one wants to chain uncertainty sampling with submodularity.
As of today, the package loads Keras even if it is not needed. This is a bummer, since it makes the install unnecessarily complicated and the runtime slower too.
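One common way to avoid loading a heavy optional dependency up front is to defer the import until the feature that needs it is used. The helper below is a minimal sketch of that pattern. The name lazy_import is hypothetical, and nothing here reflects how the package is actually organised.

```python
import importlib


def lazy_import(module_name):
    """Import a module only when it is actually needed.

    Hypothetical helper: call it inside the function or class that needs the
    optional dependency rather than at the top of the package.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fail with an explicit message at the point of use, not at install time.
        raise ImportError(
            f"The optional dependency '{module_name}' is required for this feature."
        ) from exc
```

A sampler that needs Keras could call lazy_import("keras") inside its own methods, so users who never touch that sampler neither install nor load it.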
As the title says