
libact's Introduction

libact: Pool-based Active Learning in Python

authors: Yao-Yuan Yang, Shao-Chuan Lee, Yu-An Chung, Tung-En Wu, Si-An Chen, Hsuan-Tien Lin


Introduction

libact is a Python package designed to make active learning easier for real-world users. The package not only implements several popular active learning strategies, but also features the active-learning-by-learning meta-algorithm, which helps users automatically select the best strategy on the fly. Furthermore, the package provides a unified interface for implementing more strategies, models and application-specific labelers. The package is open source, with an issue tracker on GitHub, and can be easily installed from the Python Package Index (PyPI).

Documentation

The technical report associated with the package is on arXiv, and the documentation for the latest release is available on Read the Docs. Comments and questions about the package are welcome at [email protected]. All contributions to the documentation are greatly appreciated!

Basic Dependencies

  • Python 2.7, 3.3, 3.4, 3.5, 3.6

  • Python dependencies

pip install -r requirements.txt
  • Debian (>= 7) / Ubuntu (>= 14.04)
sudo apt-get install build-essential gfortran libatlas-base-dev liblapacke-dev python3-dev
  • Arch
sudo pacman -S lapacke
  • macOS
brew install openblas

Installation

After resolving the dependencies, you may install the package via pip (for all users):

sudo pip install libact

or pip install in home directory:

pip install --user libact

or pip install from github repository for latest source:

pip install git+https://github.com/ntucllab/libact.git

To build and install from source in your home directory:

python setup.py install --user

To build and install from source for all users on Unix/Linux:

python setup.py build
sudo python setup.py install

Installation Options

  • LIBACT_BUILD_HINTSVM: set this variable to 1 to build the HintSVM C extension. If set to 0, you will not be able to use the HintSVM query strategy. Default=1.
  • LIBACT_BUILD_VARIANCE_REDUCTION: set this variable to 1 to build the variance reduction C extension. If set to 0, you will not be able to use the VarianceReduction query strategy. Default=1.

Example:

LIBACT_BUILD_HINTSVM=1 pip install git+https://github.com/ntucllab/libact.git
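To check after installation which of the optional C extensions were actually built, one can try to locate the compiled modules without using them. The sketch below assumes the module paths libact.query_strategies._hintsvm and libact.query_strategies._variance_reduction, taken from the tracebacks quoted in the issues on this page; the helper name is hypothetical.

```python
import importlib.util


def extension_available(module_name):
    """Return True if module_name can be located, False otherwise.

    find_spec() imports the parent packages, so a missing or broken parent
    (e.g. libact not installed, or a failing native import in its
    __init__.py) raises ImportError; treat that as "not available".
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        return False


# Module paths assumed from the tracebacks quoted in the issues below.
EXTENSIONS = {
    "LIBACT_BUILD_HINTSVM": "libact.query_strategies._hintsvm",
    "LIBACT_BUILD_VARIANCE_REDUCTION": "libact.query_strategies._variance_reduction",
}

if __name__ == "__main__":
    for flag, module in EXTENSIONS.items():
        status = "built" if extension_available(module) else "missing"
        print(f"{flag}: {module} -> {status}")
```

A "missing" result can mean either that the extension was disabled at build time or that libact itself failed to import, so the error from a plain import is still worth reading.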

Usage

The main usage of libact is as follows:

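from libact.models import LogisticRegression
from libact.query_strategies import UncertaintySampling

# trn_ds: a libact Dataset (label None marks an unlabeled entry); lbr: a Labeler instance
qs = UncertaintySampling(trn_ds, method='lc', model=LogisticRegression())  # query strategy instance

ask_id = qs.make_query()  # let the query strategy suggest an entry to query
X, y = zip(*trn_ds.data)
lb = lbr.label(X[ask_id])  # query the label of the unlabeled entry from the labeler
trn_ds.update(ask_id, lb)  # update the dataset with the newly labeled entry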
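The method='lc' argument selects least-confidence uncertainty sampling: for each unlabeled entry, take the model's probability for its most likely label, and query the entry where that probability is lowest. A library-independent sketch of that selection rule, assuming the per-class probabilities (what a predict_proba-style call would return) are already available as plain lists; the function name is hypothetical and this is not libact's internal code:

```python
def least_confident_index(unlabeled_ids, probas):
    """Pick the entry whose most probable label has the lowest probability.

    unlabeled_ids: ids of the unlabeled entries in the dataset.
    probas: one list of class probabilities per unlabeled entry.
    Returns (chosen_id, scores), where score = 1 - max probability.
    """
    scores = [1.0 - max(p) for p in probas]
    best = max(range(len(scores)), key=scores.__getitem__)  # first max wins ties
    return unlabeled_ids[best], scores


# Entry 7 is the most ambiguous (0.55 vs 0.45), so it is queried.
ask_id, scores = least_confident_index(
    [3, 7, 9],
    [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]],
)
print(ask_id)  # 7
```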

Some examples are available under the examples directory. Before running, use examples/get_dataset.py to retrieve the dataset used by the examples.

Available examples:

  • plot : This example demonstrates basic usage of libact. It splits a fully labeled dataset and removes some labels from it to simulate the pool-based active learning scenario. Each query of an unlabeled example is then equivalent to revealing one label from the original dataset.
  • label_digits : This example shows how to use libact when you want a human to label the samples selected by your algorithm.
  • albl_plot: This example compares the performance of ALBL with other active learning algorithms.
  • multilabel_plot: This example compares the performance of algorithms under multilabel setting.
  • alce_plot: This example compares the performance of algorithms under cost-sensitive multi-class setting.
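The simulation described for the plot example can be illustrated without libact: start from a fully labeled dataset, hide most labels (None marks an unlabeled entry), and let an "ideal labeler" reveal the hidden label whenever an entry is queried. In the sketch below, random selection stands in for a real query strategy, and all names are hypothetical.

```python
import random


def simulate_pool(features, labels, n_initial, quota, seed=0):
    """Simulate pool-based active learning on a fully labeled dataset.

    Only the first n_initial labels stay visible; the rest become None.
    Each query reveals one hidden label from the original data, which is
    what an ideal labeler does. Assumes distinct feature vectors.
    """
    rng = random.Random(seed)
    pool_labels = list(labels[:n_initial]) + [None] * (len(labels) - n_initial)
    ideal = dict(zip(map(tuple, features), labels))  # features -> true label

    queried = []
    for _ in range(quota):
        unlabeled = [i for i, y in enumerate(pool_labels) if y is None]
        if not unlabeled:
            break  # pool exhausted before the quota was used up
        ask_id = rng.choice(unlabeled)        # stand-in query strategy
        lb = ideal[tuple(features[ask_id])]   # labeler sees only the features
        pool_labels[ask_id] = lb              # dataset update
        queried.append(ask_id)
    return pool_labels, queried
```

Swapping rng.choice for an informed rule (such as least confidence) is what turns this loop into active learning; the labeling and update steps stay the same.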

Running tests

To run the test suite:

python setup.py test

To run pylint, install it with pip install pylint and run the following command in the root directory:

pylint libact

To measure test code coverage, install coverage with pip install coverage and run the following commands in the root directory:

coverage run --source libact --omit '*/tests/*' setup.py test
coverage report

Citing

If you find this package useful, please cite the original works (see the Reference section of each strategy) as well as the following:

@techreport{YY2017,
  author = {Yao-Yuan Yang and Shao-Chuan Lee and Yu-An Chung and Tung-En Wu and Si-An Chen and Hsuan-Tien Lin},
  title = {libact: Pool-based Active Learning in Python},
  institution = {National Taiwan University},
  url = {https://github.com/ntucllab/libact},
  note = {available as arXiv preprint \url{https://arxiv.org/abs/1710.00379}},
  month = oct,
  year = 2017
}

Acknowledgments

The authors thank Chih-Wei Chang and other members of the Computational Learning Lab at National Taiwan University for valuable discussions and various contributions to making this package better.

libact's People

Contributors

alexandreabraham, ariapoy, dlackty, eugene-yang, hsuantien, iamyuanchung, jkleint, kh-huang, kjacks21, lazywei, lsc36, sian-chen, skgg, tungen, wadkar, yangarbiter


libact's Issues

Incompatibility with plotly and cufflinks

Hello,
I have found that your library is not compatible with the Python packages plotly and cufflinks. I tested it on a fresh install of Ubuntu 16.04 with Anaconda installed.
Everything was OK until I installed plotly and cufflinks:

pip install plotly --upgrade
pip install cufflinks --upgrade

Then running python setup.py test ends with this error:

======================================================================
ERROR: query_strategies (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: query_strategies
Traceback (most recent call last):
  File "/path/anaconda3/lib/python3.5/unittest/loader.py", line 153, in loadTestsFromName
    module = __import__(module_name)
  File "/path/libact/libact/query_strategies/__init__.py", line 20, in <module>
    from ._variance_reduction import estVar
ImportError: /usr/lib/liblapacke.so.3: undefined symbol: dpotrf2_
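One possible reading of this error (a hypothesis, not a confirmed diagnosis) is that /usr/lib/liblapacke.so.3 expects the LAPACK routine dpotrf2_, but the LAPACK library resolved at load time does not export it, for instance because a different LAPACK build is found first. The sketch below checks the library that ctypes locates under the assumed name "lapack"; ctypes' search is not guaranteed to match the dynamic loader's choice, so treat the result as a hint only. The helper name is hypothetical.

```python
import ctypes
import ctypes.util


def lapack_exports(symbol="dpotrf2_", name="lapack"):
    """Report whether the library ctypes finds for `name` exports `symbol`.

    Returns (library, True/False); (None, None) if no library is found;
    (library, None) if it is found but cannot be loaded.
    """
    path = ctypes.util.find_library(name)
    if path is None:
        return None, None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return path, None
    # Looking up a missing symbol raises AttributeError, so hasattr is False.
    return path, hasattr(lib, symbol)


if __name__ == "__main__":
    print(lapack_exports())
```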

Problems installing in Linux

Hello,

I am trying to install libact on my university's HPC facilities. However, I get the following error every time I try to install it:

error: Command "gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/rmegret/irodriguez/anaconda3/envs/bee/lib/python3.6/site-packages/numpy/core/include -I/usr/include/lapacke -I/home/rmegret/irodriguez/anaconda3/envs/bee/include/python3.6m -c libact/query_strategies/src/variance_reduction/variance_reduction.c -o build/temp.linux-x86_64-3.6/libact/query_strategies/src/variance_reduction/variance_reduction.o -std=c11" failed with exit status 1

I have tried both pip and cloning the repository and running setup.py.

Just in case, here are the specifications of the HPC: https://www.hpcf.upr.edu/documentation/boqueron/#ffs-tabbed-15

Clarify semantics of Model.predict_real

Currently Model.predict_real corresponds to predict_proba in scikit-learn, which returns an array of n_classes floats giving the probabilities of the corresponding labels. But decision_function is another candidate, and its return shape varies from model to model, for example (in our case n_samples = 1):

  • LogisticRegression: (n_samples,) if n_classes == 2 else (n_samples, n_classes)
  • C-SVC: (n_samples, n_classes * (n_classes-1) / 2)

We need to decide what we want in order to define the interface well. @hsuantien, can you give us some advice on this?
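One way to give predict_real a single, well-defined shape regardless of the underlying model is to expand the binary-case output of shape (n_samples,) into two columns [-d, d], so that every model returns (n_samples, n_classes) with larger values meaning "more likely". A minimal sketch on plain lists, assuming the decision values have already been obtained; the function name is hypothetical. It does not handle the one-vs-one C-SVC shape, which would first need to be aggregated per class (or the model configured with decision_function_shape='ovr').

```python
def to_n_classes_columns(decision_values):
    """Normalize decision values to one row per sample, one column per class.

    decision_values: either a flat list (binary case: one signed score per
    sample, positive meaning the second class) or a list of per-class rows.
    """
    if decision_values and not isinstance(decision_values[0], (list, tuple)):
        # Binary case: shape (n_samples,) -> (n_samples, 2) as [-d, d].
        return [[-d, d] for d in decision_values]
    # Already (n_samples, n_classes): copy rows unchanged.
    return [list(row) for row in decision_values]


print(to_n_classes_columns([0.5, -2.0]))  # [[-0.5, 0.5], [2.0, -2.0]]
```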

Identify whether the relabeling in sklearn will cause problem

sklearn internally relabels the given labels to 0..n_labels-1. If I understand correctly, it does so in the order in which the data is passed to the fit method.
So if updating an unlabeled entry changes the order of the data passed to fit, the values returned by our model's predict_real method may come out in the wrong order.
One proposal for solving this problem is to manage the relabeling ourselves in the model classes.
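A minimal sketch of that proposal, assuming the model wrapper fixes a label-to-column mapping once (here, the sorted order of the full label set) and reorders the wrapped model's per-class outputs into it. The class and method names are hypothetical; model_classes stands for the class order the wrapped model reports (for scikit-learn estimators, the fitted classes_ attribute).

```python
class FixedLabelOrder:
    """Keep per-class outputs in one fixed column order across retraining."""

    def __init__(self, all_labels):
        # Fix the order once, independent of the order data arrives in.
        self.labels = sorted(set(all_labels))
        self.column = {lab: i for i, lab in enumerate(self.labels)}

    def reorder(self, model_classes, rows):
        """Map rows whose columns follow model_classes into the fixed order.

        Labels the model has not seen yet get 0.0 in their column, which
        suits probabilities; decision values may need another filler.
        """
        fixed_rows = []
        for row in rows:
            out = [0.0] * len(self.labels)
            for lab, value in zip(model_classes, row):
                out[self.column[lab]] = value
            fixed_rows.append(out)
        return fixed_rows


order = FixedLabelOrder(["cat", "dog", "bird"])
print(order.reorder(["dog", "cat"], [[0.7, 0.3]]))  # [[0.0, 0.3, 0.7]]
```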

Enhancement for unit testing

For now, the unit tests for the active learning algorithms use results on real-world data with fixed random seeds. So if a future modification to these algorithms conflicts with the current tests, it should be handled carefully.

A more rigorous approach is to design artificial datasets for testing. We'll leave this as a future development goal.
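As an illustration of the artificial-dataset approach, one can build a tiny dataset whose correct query is known by construction and assert on it, instead of relying on results from real data with a fixed seed. The sketch below is hypothetical: a one-dimensional pool with labeled points at both ends and a threshold at the midpoint of the class means, where a boundary-based uncertainty rule must pick the unlabeled point closest to the threshold. It can be run with python -m unittest.

```python
import unittest


def closest_to_threshold(points, labels):
    """Toy uncertainty rule: threshold at the midpoint of the labeled class
    means; query the unlabeled (None) point nearest to it."""
    neg = [x for x, y in zip(points, labels) if y == 0]
    pos = [x for x, y in zip(points, labels) if y == 1]
    threshold = (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2
    unlabeled = [i for i, y in enumerate(labels) if y is None]
    return min(unlabeled, key=lambda i: abs(points[i] - threshold))


class ArtificialDatasetTest(unittest.TestCase):
    def test_queries_point_nearest_boundary(self):
        points = [0.0, 10.0, 2.0, 4.9, 8.0]
        labels = [0, 1, None, None, None]
        # The threshold is 5.0 by construction, so index 3 (x = 4.9) must win.
        self.assertEqual(closest_to_threshold(points, labels), 3)
```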

Next stage

  1. Implement more classical query strategies.
  2. Add examples for using all query strategies.

Installation using pip fails for python 2

I tried to install libact using sudo pip install libact and got the following error message:

libact/query_strategies/variance_reduction.c:26:15: error: variable ‘moduledef’ has initializer but incomplete type

You can see the full error message here.

I also tried to install using the setup.py script, which worked just fine; installing for Python 3 with pip also worked on the same machine.
I did some googling and the error looked similar to the one here; I can't look into it further because setup.py worked.
Just wanted to let you guys know.

scikit-learn model adapter

Since we use scikit-learn models a lot, we should define an adapter from scikit-learn models to libact models.

Allow make_query to return multiple items (or the entire scored set)

In certain applications, you might want to know what the top N unlabeled entities are, so that a human can go through them and do batch labeling offline. Right now I have a particularly hacky way of getting multiple results out, which just assumes the majority class in the update, but it would be great to tweak the make_query function to return an arbitrary number of ordered results for batch label processing.
for i in range(20):
    item_to_investigate = qs.make_query()
    libact_ds.update(item_to_investigate, 0)  # hack: assume the majority class (0)
    print(item_to_investigate)

Happy to contribute code to try to help this happen!
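If a query strategy can expose a score for every unlabeled entry, returning the top N for offline batch labeling reduces to selecting the N largest scores, with no need to fake labels in between. A minimal sketch, assuming a hypothetical list of (entry_id, score) pairs where a higher score means more informative; whether and how a given libact strategy exposes such scores is not covered here.

```python
import heapq


def top_n_queries(scored_entries, n):
    """Return the ids of the n highest-scoring unlabeled entries, best first.

    scored_entries: iterable of (entry_id, score) pairs.
    """
    best = heapq.nlargest(n, scored_entries, key=lambda pair: pair[1])
    return [entry_id for entry_id, _ in best]


print(top_n_queries([(4, 0.2), (8, 0.9), (15, 0.5), (16, 0.7)], 3))  # [8, 16, 15]
```

Unlike the loop above, this ranks every entry against the same model, so the N picks may be redundant with one another; batch-mode strategies exist to address that.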

HintSVM mldataset - Buffer dtype mismatch error

Hi,

I am trying to use the HintSVM query strategy with the vehicle dataset from mldata.
However, I get the following error and don't understand why:

File "testing.py", line 60, in run
    ask_id = qs.make_query()
  File "/usr/local/lib/python3.5/site-packages/libact-0.1.2-py3.5-macosx-10.12-x86_64.egg/libact/query_strategies/hintsvm.py", line 151, in make_query
    np.array([x.tolist() for x in unlabeled_pool]), self.svm_params)
  File "libact/query_strategies/_hintsvm.pyx", line 16, in libact.query_strategies._hintsvm.hintsvm_query (libact/query_strategies/_hintsvm.c:1836)
ValueError: Buffer dtype mismatch, expected 'float64_t' but got 'long'

I don't get this error when I use other strategies (UncertaintySampling, QUIRE).

def split_scale_train_test(name_dataset,test_size):
    # choose a dataset with unbalanced class instances
    #data = sklearn.datasets.fetch_mldata('segment')
    data = sklearn.datasets.fetch_mldata(name_dataset)

    X = StandardScaler().fit_transform(data['data'])
    target = np.unique(data['target'])
    # mapping the targets to 0 to n_classes-1
    y = np.array([np.where(target == i)[0][0] for i in data['target']])

    X_trn, X_tst, y_trn, y_tst = \
        train_test_split(X, y, test_size=test_size, stratify=y)

    # making sure each class appears once initially
    init_y_ind = np.array(
        [np.where(y_trn == i)[0][0] for i in range(len(target))])
    y_ind = np.array([i for i in range(len(X_trn)) if i not in init_y_ind])
    trn_ds = Dataset(
        np.vstack((X_trn[init_y_ind], X_trn[y_ind])),
        np.concatenate((y_trn[init_y_ind], [None] * (len(y_ind)))))

    tst_ds = Dataset(X_tst, y_tst)

    fully_labeled_trn_ds = Dataset(
        np.vstack((X_trn[init_y_ind], X_trn[y_ind])),
        np.concatenate((y_trn[init_y_ind], y_trn[y_ind])))

    cost_matrix = 2000. * np.random.rand(len(target), len(target))
    np.fill_diagonal(cost_matrix, 0)

    return trn_ds, tst_ds, y_trn, y_tst, fully_labeled_trn_ds, cost_matrix


def run(trn_ds, tst_ds, lbr, model, qs, quota):
    E_in, E_out = [], []
    score_train = []
    score_test = []

    for _ in range(quota):
        ask_id = qs.make_query()
        X, _ = zip(*trn_ds.data)
        lb = lbr.label(X[ask_id])
        trn_ds.update(ask_id, lb)

        model.train(trn_ds)
        E_in = np.append(E_in, 1 - model.score(trn_ds))
        E_out = np.append(E_out, 1 - model.score(tst_ds))
        score_train = np.append(score_train,model.score(trn_ds)*100)
        score_test = np.append(score_test,model.score(tst_ds)*100)

    return E_in, E_out, score_train, score_test


qs5 = HintSVM(trn_ds5, cl=1.0, ch=1.0, p=0.5)
model = SVM(kernel='rbf', C=n_C, gamma=n_gamma, decision_function_shape='ovr')
E_in_5, E_out_5, score_train_5, score_test_5 = run(trn_ds5, tst_ds, idealLabels, model, qs5, quota_to_query)
results_out.append(E_out_5.tolist())
results_score.append(score_test_5.tolist())

Do you have any insights about this error?

Thank you

Is a specific version of Python required when compiling? Compile error using "python setup.py install"

Hello, thank you for providing this project.

After installing the dependencies, I ran
python setup.py install

But I get some errors:

Platform Detection: Linux. Link to liblapacke...
running install
running build
running build_py
running build_ext
building 'libact.query_strategies._variance_reduction' extension
C compiler: x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC

compile options: '-I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/lapacke -I/usr/include/python2.7 -c'
extra options: '-std=c11'
x86_64-linux-gnu-gcc: libact/query_strategies/src/variance_reduction/variance_reduction.c
libact/query_strategies/src/variance_reduction/variance_reduction.c:26:15: error: variable ‘moduledef’ has initializer but incomplete type
static struct PyModuleDef moduledef = {
^
libact/query_strategies/src/variance_reduction/variance_reduction.c:27:5: error: ‘PyModuleDef_HEAD_INIT’ undeclared here (not in a function)
PyModuleDef_HEAD_INIT,
^
...

I wondered whether I need to specify the version of Python, so I tried
python3 setup.py install
I still cannot install it successfully, but the error changed:
File "setup.py", line 13, in
from Cython.Build import cythonize
ImportError: No module named 'Cython'

However, I have already installed Cython using "pip install Cython"

It would be very kind of you to tell me which versions of the dependencies are required.

Or could you please tell me how to modify "-I/usr/include/lapacke -I/usr/include/python2.7" in the compile options?

Many thanks

QS: Model type check at constructor

For QSs that rely on a user-given model, a type check should be performed at construction, since different QSs require different capabilities (e.g. UncertaintySampling requires a ContinuousModel).
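A minimal sketch of such a check, using hypothetical stand-ins for the model interfaces (only the name ContinuousModel comes from the text above; libact's real definitions are not reproduced here). The constructor rejects a model that lacks the required capability, so the error surfaces when the strategy is created rather than at the first make_query call.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Hypothetical stand-in: a model that can only predict labels."""

    @abstractmethod
    def predict(self, features):
        ...


class ContinuousModel(Model):
    """Hypothetical stand-in: a model that also outputs real-valued scores."""

    @abstractmethod
    def predict_real(self, features):
        ...


class UncertaintySamplingSketch:
    """Hypothetical query strategy that validates its model up front."""

    required_model = ContinuousModel

    def __init__(self, dataset, model):
        if not isinstance(model, self.required_model):
            raise TypeError(
                f"{type(self).__name__} requires a "
                f"{self.required_model.__name__}, got {type(model).__name__}"
            )
        self.dataset = dataset
        self.model = model
```

Putting required_model on the class lets each strategy declare its own requirement while sharing the same check.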

Is there a way to perform batch mode active learning ?

Hi,

Instead of having unlabeled data that comes as a stream, I would like to know whether there is a way to perform batch-mode active learning with libact, meaning that users can select multiple images at once (positives and negatives)?

Thank you in advance

Fix Travis Python 3.5 build

Python 3.5 seems to import everything before running the unit tests. The _variance_reduction native extension is built and installed, but importing it fails:

ImportError: Failed to import test module: libact.query_strategies
Traceback (most recent call last):
  File "/opt/python/3.5.0/lib/python3.5/unittest/loader.py", line 462, in _find_test_path
    package = self._get_module_from_name(name)
  File "/opt/python/3.5.0/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/home/travis/build/ntucllab/libact/libact/query_strategies/__init__.py", line 16, in <module>
    from .variance_reduction import VarianceReduction
  File "/home/travis/build/ntucllab/libact/libact/query_strategies/variance_reduction.py", line 11, in <module>
    from libact.query_strategies import _variance_reduction
ImportError: cannot import name '_variance_reduction'

Build/install log of extension:

running build_ext
building 'libact.query_strategies._variance_reduction' extension
C compiler: gcc -pthread -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/libact
creating build/temp.linux-x86_64-3.5/libact/query_strategies
compile options: '-I/home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/numpy/core/include -I/opt/python/3.5.0/include/python3.5m -c'
extra options: '-std=c11'
Warning: Can't read registry to find the necessary compiler setting
Make sure that Python modules winreg, win32api or win32con are installed.
gcc: libact/query_strategies/variance_reduction.c
gcc -pthread -shared -L/opt/python/3.5.0/lib -Wl,-rpath=/opt/python/3.5.0/lib build/temp.linux-x86_64-3.5/libact/query_strategies/variance_reduction.o -L/opt/python/3.5.0/lib -lpython3.5m -o build/lib.linux-x86_64-3.5/libact/query_strategies/_variance_reduction.cpython-35m-x86_64-linux-gnu.so -llapacke -llapack -lblas
running install_lib
creating /home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/libact
creating /home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/libact/query_strategies
copying build/lib.linux-x86_64-3.5/libact/query_strategies/_variance_reduction.cpython-35m-x86_64-linux-gnu.so -> /home/travis/virtualenv/python3.5.0/lib/python3.5/site-packages/libact/query_strategies

Probabilistic models

I would like to ask which classifiers are considered probabilistic, so that they can be combined with query strategies such as Uncertainty Sampling?

Thanks in advance.
