dataiku-research / cardinal

A practical Active Learning python package with a strong focus on experiments.

Home Page: https://dataiku-research.github.io/cardinal/

License: Apache License 2.0

Python 99.81% Makefile 0.19%

active-learning machine-learning python

cardinal's People

simonamaggio micseb softwareimpacts arshahin mojifarmanbar

cardinal's Issues

Multi-label support

Thanks for sharing this package and writing the paper.

It would be nice if this package also supported binary multi-label classification problems.

What would be the best way to aggregate for instance "Smallest Margin" computed for each label into a per-sample score in your opinion?
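One possible answer to the aggregation question, as a minimal sketch rather than anything cardinal provides: for a binary label with positive-class probability p, the two class probabilities are p and 1 - p, so the margin-based uncertainty used elsewhere in the package (1 minus the gap between the top two probabilities) reduces to 1 - |2p - 1|. The per-label scores can then be averaged, or the maximum taken if a single very uncertain label should be enough to select a sample. The function name `multilabel_margin_scores` and the `aggregate` parameter are hypothetical.

```python
import numpy as np


def multilabel_margin_scores(proba, aggregate="mean"):
    """Aggregate per-label margin uncertainty into one score per sample.

    proba is assumed to have shape (n_samples, n_labels) and to hold the
    probability of the positive class for each binary label.
    """
    proba = np.asarray(proba, dtype=float)
    # Margin uncertainty of one binary label: 1 - |p - (1 - p)| = 1 - |2p - 1|.
    per_label = 1.0 - np.abs(2.0 * proba - 1.0)
    if aggregate == "mean":
        # Average uncertainty over all labels.
        return per_label.mean(axis=1)
    if aggregate == "max":
        # A sample is as uncertain as its most uncertain label.
        return per_label.max(axis=1)
    raise ValueError("aggregate must be 'mean' or 'max'")


proba = np.array([[0.5, 0.9], [0.99, 0.01]])
mean_scores = multilabel_margin_scores(proba, aggregate="mean")
max_scores = multilabel_margin_scores(proba, aggregate="max")
```

Which aggregation works best likely depends on the dataset, so it is worth comparing both in an experiment.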

Control the random seed in `train_test_split` in the experimental branch

This is mostly a placeholder so that we don't forget it, since the branch is neither merged nor PR-ready.
In the benchmarking script here, we should control the random_state. The permutation done beforehand could then be skipped by passing the seed here instead.
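A minimal sketch of what controlling the seed could look like, assuming the script uses scikit-learn's `train_test_split`. The helper name `split_with_seed`, the toy data, and the `test_size` value are illustrative only and are not taken from the benchmarking script.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_with_seed(X, y, seed, test_size=0.25):
    # train_test_split shuffles by default; fixing random_state makes that
    # shuffle reproducible, so no separate permutation is needed beforehand.
    return train_test_split(X, y, test_size=test_size, random_state=seed)


X = np.arange(40).reshape(20, 2)
y = np.arange(20) % 2

first = split_with_seed(X, y, seed=0)
second = split_with_seed(X, y, seed=0)
# Both calls return identical X_train, X_test, y_train, y_test arrays.
same = all(np.array_equal(a, b) for a, b in zip(first, second))
```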

Margin sampling fails with a bad error message if there is only one class in the predictions

We should not assume that the classifier returns probabilities for at least two classes. The error is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-2b90468e1fd1> in <module>
      4                                   n_init_points,
      5                                   base_model,
----> 6                                   semi_model)
<ipython-input-25-008649b258dd> in run_experiment(X_train, X_test, y_train, y_test, batch_size, n_iter, n_init_points, base_model, semi_model)
    118                 sampler.fit(X_train[selected], y_train[selected])
    119 
--> 120                 new_selected = sampler.select_samples(X_train[~selected])
    121                 new_selected = new_selected.astype(int)
    122                 selected[index[~selected][new_selected]] = True
~/dss/code-envs/python/alssl/lib/python3.6/site-packages/cardinAL/base.py in select_samples(self, X, strategy)
     29 
     30     def select_samples(self, X, strategy='top'):
---> 31         sample_scores = self.score_samples(X)
     32         self.sample_scores_ = sample_scores
     33         if strategy == 'top':
~/dss/code-envs/python/alssl/lib/python3.6/site-packages/cardinAL/uncertainty.py in score_samples(self, X)
    175             predictions (np.array): Returns an array where selected samples are classified as 1.
    176         """
--> 177         return margin_score(self.classifier_, X)
    178 
    179 
~/dss/code-envs/python/alssl/lib/python3.6/site-packages/cardinAL/uncertainty.py in margin_score(classifier, X)
     45     """
     46     classwise_uncertainty = _get_probability_classes(classifier, X)
---> 47     part = np.partition(classwise_uncertainty, -2, axis=1)
     48     margin = 1 - (part[:, -1] - part[:, -2])
     49     return margin
<__array_function__ internals> in partition(*args, **kwargs)
~/dss/code-envs/python/alssl/lib/python3.6/site-packages/numpy/core/fromnumeric.py in partition(a, kth, axis, kind, order)
    744     else:
    745         a = asanyarray(a).copy(order="K")
--> 746     a.partition(kth, axis=axis, kind=kind, order=order)
    747     return a
    748 
ValueError: kth(=-1) out of bounds (1)
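The traceback shows `np.partition(..., -2, axis=1)` failing because the probability array has a single column. A minimal sketch of a guard, assuming it is acceptable to treat the missing second class as having probability 0 (raising a clearer error would be another option). The name `safe_margin_score` is hypothetical, and the function takes the probability array directly rather than a classifier.

```python
import numpy as np


def safe_margin_score(proba):
    """Margin uncertainty that tolerates a single-class probability array."""
    proba = np.asarray(proba, dtype=float)
    if proba.ndim != 2 or proba.shape[1] == 0:
        raise ValueError(
            "Expected a 2D array of class probabilities with at least one column."
        )
    if proba.shape[1] == 1:
        # Assumption: only one class was seen during fit, so the second
        # best probability is taken to be 0 for every sample.
        proba = np.hstack([proba, np.zeros_like(proba)])
    # Same computation as in the traceback: 1 - (best - second best).
    part = np.partition(proba, -2, axis=1)
    return 1 - (part[:, -1] - part[:, -2])


single_class = safe_margin_score([[1.0], [1.0]])
two_classes = safe_margin_score([[0.6, 0.4]])
three_classes = safe_margin_score([[0.2, 0.5, 0.3]])
```

With this convention, a classifier that only knows one class gives every sample an uncertainty of 0, so the sampler's choice among them is effectively arbitrary.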

Zhdanov select_sample weights wrong dimension

Description: In the select_sample method of the Zhdanov query sampler, the weight attribute is not usable as is. First, it is passed to the second step (KMeans) with a different dimension than the X array that step receives. More importantly, since the second step depends on the selection made by the MarginSampling sampler, there is no way to know a priori which weights correspond to the selection.

Proposed fix: Index the weights with the same selection:

new_selected = self.sampler_list[1].select_samples(
    X[selected], sample_weight=sample_weight[selected])
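To illustrate why the weights must be indexed with the first-step selection, here is a minimal standalone sketch of a two-step selection. It does not use cardinal's classes: the function `two_step_select`, the `beta` pre-selection factor, and the use of scikit-learn's `KMeans` for the second step are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans


def two_step_select(X, uncertainty, sample_weight, batch_size, beta=3, seed=0):
    # Step 1: keep the beta * batch_size most uncertain samples.
    n_keep = min(len(X), beta * batch_size)
    selected = np.argsort(uncertainty)[-n_keep:]

    # Step 2: weighted KMeans on the pre-selection only. The weights are
    # indexed with the same selection so that both arrays have n_keep rows.
    X_sel = X[selected]
    w_sel = sample_weight[selected]
    kmeans = KMeans(n_clusters=batch_size, n_init=10, random_state=seed)
    kmeans.fit(X_sel, sample_weight=w_sel)

    # For each cluster center, take the closest pre-selected sample and map
    # it back to an index in the original X. Duplicates are merged, so fewer
    # than batch_size indices may be returned.
    closest = np.unique(kmeans.transform(X_sel).argmin(axis=0))
    return selected[closest]


rng = np.random.RandomState(0)
X = rng.rand(50, 2)
uncertainty = rng.rand(50)
weights = rng.rand(50)
chosen = two_step_select(X, uncertainty, weights, batch_size=4)
```

Passing the full `sample_weight` to the second step instead would hand KMeans 50 weights for 12 samples, which is the dimension mismatch described in the issue.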

Handle sparse matrices and precomputed distance

Make keras optional

Implement Zhdanov's WKMeans and add it to an example.
