ramhiser / activelearning

Active Learning in R

R 100.00%

active-learning machine-learning r

activelearning's Issues

Update interface to caret

As mentioned in #10, the package is out of date. The main thing that needs updating is the interface to caret.

Refactor disagreement methods

These were written a while back based on a limited use case. They lack uniformity across the various AL methods.
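One way to get that uniformity is to give every disagreement measure the same signature. The sketch below is only an illustration under that assumption: the `votes` argument (an n x C matrix of class labels predicted by C committee members) and the `classes` argument are hypothetical and are not taken from the package. It computes the usual vote entropy for each observation.

```r
# Hypothetical uniform signature: every disagreement measure takes 'votes',
# an n x C matrix of predicted class labels (n observations, C committee
# members), and returns one score per observation.
vote_entropy <- function(votes, classes = sort(unique(as.vector(votes)))) {
  apply(votes, 1, function(v) {
    # Proportion of committee members voting for each class.
    p <- table(factor(v, levels = classes)) / length(v)
    p <- p[p > 0]  # treat 0 * log(0) as 0
    -sum(p * log(p))
  })
}

# Two observations, four committee members.
votes <- rbind(c("a", "a", "a", "a"),
               c("a", "b", "a", "b"))
scores <- vote_entropy(votes)
```

The first observation has full agreement, so its score is zero; the second is split evenly between two classes and receives the larger score.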

Add benchmark data sets to package

Consider the following data sets from the Active Learning Challenge.

Create a single check of correct arguments for caret

Implement a function that checks if the caret options are specified correctly.

This function:

Should essentially be a copy-paste of the first few error-checking lines in uncertainty_sampling.
Will be used in all activelearning methods that are built on top of caret.

Because I have several active learning methods planned, it is important to have only one copy of this code.
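A minimal sketch of such a shared check might look like the following. The function name `check_classifier` and its `available` argument are hypothetical (the latter exists only so the check can be exercised without caret installed). The error messages mirror the ones expected by the unit tests quoted later on this page, and `caret::getModelInfo()` is used to list the models caret knows about.

```r
# Hypothetical helper: validate the 'classifier' argument in one place so
# that every caret-based method can call it instead of copying the checks.
check_classifier <- function(classifier, available = NULL) {
  if (is.null(classifier) || length(classifier) != 1L || is.na(classifier)) {
    stop("A classifier must be specified", call. = FALSE)
  }
  if (is.null(available)) {
    # Assumption: the list of valid model names comes from caret itself.
    if (!requireNamespace("caret", quietly = TRUE)) {
      stop("The 'caret' package is required", call. = FALSE)
    }
    available <- names(caret::getModelInfo())
  }
  if (!(classifier %in% available)) {
    stop("Cannot find, '", classifier, "' in the 'caret' package",
         call. = FALSE)
  }
  invisible(as.character(classifier))
}
```

Each method would then start with a single call such as `check_classifier(classifier)` before doing any other work.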

Error in Query by Bagging

In a series of emails with Chayaphon Tonwongvarl from TGGS University in Thailand, we have the following error:

x <- iris[, -5]
y <- iris[, 5]
y <- replace(y, -c(1:10, 51:60, 101:110), NA)
query_by_bagging(x = x, y = y, disagreement = "vote_entropy", classifier = "qda", num_query = 5)

The above code leads to the following errors:

Error in qda.default(x, grouping, ...) : 
  rank deficiency in group virginica
In addition: There were 50 or more warnings (use warnings() to see the first 50)

> warnings()
Warning messages:
1: In eval(expr, envir, enclos) : model fit failed for Resample02: parameter=none
2: In eval(expr, envir, enclos) : model fit failed for Resample03: parameter=none
3: In eval(expr, envir, enclos) : model fit failed for Resample04: parameter=none
4: In eval(expr, envir, enclos) : model fit failed for Resample05: parameter=none
5: In eval(expr, envir, enclos) : model fit failed for Resample06: parameter=none
6: In eval(expr, envir, enclos) : model fit failed for Resample07: parameter=none
7: In eval(expr, envir, enclos) : model fit failed for Resample08: parameter=none
8: In eval(expr, envir, enclos) : model fit failed for Resample09: parameter=none
9: In eval(expr, envir, enclos) : model fit failed for Resample11: parameter=none
10: In eval(expr, envir, enclos) :
  model fit failed for Resample12: parameter=none
11: In eval(expr, envir, enclos) :

The issue appears to be that the sample covariance matrices computed by the qda classifier are singular. The reproducible example has only 10 labeled observations in each group, so caret's resampling is presumably producing resamples in which a group has too few distinct observations for a full-rank covariance matrix.

Given that we are using a wide range of classifiers, we need a better way of handling errors from the classifiers.
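One possible approach, sketched here with base R only, is to wrap each committee member's fit in `tryCatch()`, downgrade a failure to a warning, and continue with the members that did fit. The names `safe_fit` and `fit_committee` are hypothetical and this is not how the package currently works; `fit_f` follows the `fit_f(x, y, ...)` convention used elsewhere in these issues.

```r
# Hypothetical wrapper: run one model fit and return NULL (with a warning)
# instead of aborting the whole query when the classifier throws an error.
safe_fit <- function(fit_f, x, y, ...) {
  tryCatch(
    fit_f(x, y, ...),
    error = function(e) {
      warning("model fit failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Fit a committee on bootstrap resamples and keep only the successful fits.
fit_committee <- function(fit_f, x, y, num_members = 10, ...) {
  fits <- lapply(seq_len(num_members), function(i) {
    idx <- sample(nrow(x), replace = TRUE)
    safe_fit(fit_f, x[idx, , drop = FALSE], y[idx], ...)
  })
  fits <- Filter(Negate(is.null), fits)
  if (length(fits) == 0L) {
    stop("All committee members failed to fit", call. = FALSE)
  }
  fits
}
```

With this design a single rank-deficient resample costs one committee member rather than the entire call, and the caller still gets a clear error when no member can be fit at all.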

Refactor the 'query_by_committee' function

Implement the usage of caret in query_by_committee
Implement an example for query_by_committee
Update documentation in query_by_committee to describe the usage of caret

Update the NEWS file with initial features.

For an example of how to do this, see the NEWS file from the clusteval package.

Finalize package for initial push to CRAN

Is the package updated

Hello John,
I was searching for a package on active learning when I came across your GitHub repo. I was wondering whether this package is up to date, considering that the last update was in 2012. It also appears that the package is not compatible with R versions > 3.2.0.

Thanks,
Daniel

Stop querying based on unlabeled data in training data

Rather than expecting unlabeled observations to be included with the training data, it makes far more sense to create an object from which a predict call can be made. The predict function would be applied to an unlabeled data set and indicate which observations should be queried.

To facilitate this feature, explore the new functions:

tidyr::nest()
tidyr::unnest()
purrr::map()
etc.
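The fit-then-predict interface described above could look roughly like the sketch below. Everything named here (`active_learner`, `posterior_f`, `num_query` as a `predict` argument) is hypothetical, and least-confidence uncertainty is used only as a stand-in for whichever query strategy is chosen. The example assumes the MASS package is installed.

```r
# Hypothetical constructor: fit on labeled data only and return an object
# that can later score any unlabeled data set.
active_learner <- function(x, y, fit_f, posterior_f) {
  structure(
    list(model = fit_f(x, y), posterior_f = posterior_f),
    class = "active_learner"
  )
}

# predict() returns the row indices of 'newdata' to query, ranked by
# least-confidence uncertainty (1 minus the largest posterior probability).
predict.active_learner <- function(object, newdata, num_query = 1, ...) {
  posterior <- object$posterior_f(object$model, newdata)
  uncertainty <- 1 - apply(posterior, 1, max)
  head(order(uncertainty, decreasing = TRUE), num_query)
}

# Usage: train on the 30 labeled iris rows, then query the remaining 120.
labeled <- c(1:10, 51:60, 101:110)
learner <- active_learner(
  x = iris[labeled, -5],
  y = iris[labeled, 5],
  fit_f = function(x, y) MASS::lda(x, y),
  posterior_f = function(model, newdata) predict(model, newdata)$posterior
)
query <- predict(learner, iris[-labeled, -5], num_query = 5)
```

Here `query` holds row positions within the unlabeled data set passed to `predict`, so the training data never needs to contain `NA` labels.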

I got this idea from listening to Hadley's talk at An Afternoon with Hadley Wickham and Friends. Slides?

R> library(activelearning)
R> 
R> x <- iris[, -5]
R> y <- iris[, 5]
R> 
R> # For demonstration, suppose that few observations are labeled in 'y'.
R> y <- replace(y, -c(1:10, 51:60, 101:110), NA)
R> 
R> fit_f <- function(x, y, ...) {
+   MASS::lda(x, y, ...)
+ }
R> predict_f <- function(object, x) {
+   predict(object, x)$class
+ }
R> 
R> query_bagging(x=x, y=y, fit_f=fit_f, predict_f=predict_f, C=10)
Loading required package: lattice
Loading required package: ggplot2
Error in rowSums(obs * log(obs/avg_post)) : 
  'x' must be an array of at least two dimensions
Calls: query_bagging ... predict.bag -> <Anonymous> -> lapply -> FUN -> rowSums
In addition: There were 12 warnings (use warnings() to see them)
Execution halted

not available anymore?

Hi ramhiser,

Thank you so much for this package. Is it not available on CRAN anymore though?

Unit tests fail miserably

Mostly namespace issues.

> test()
Loading activelearning
Testing activelearning
Uncertainty Sampling : 1234567

1. Failure (at test-uncert_sampling.r#10): An error is thrown when the specified classifier is NULL
uncert_sampling(x = x, y = y, classifier = NULL) does not match 'A classifier must be specified'. Actual value: "Error in force(expr) : could not find function "uncert_sampling"\n"

2. Failure (at test-uncert_sampling.r#14): An error is thrown when the specified classifier is NULL
uncert_sampling(x = x, y = y, uncertainty = "least_confidence", classifier = NULL) does not match 'A classifier must be specified'. Actual value: "Error in force(expr) : could not find function "uncert_sampling"\n"

3. Failure (at test-uncert_sampling.r#22): An error is thrown when the specified classifier is NA
uncert_sampling(x = x, y = y, classifier = NA) does not match 'A classifier must be specified'. Actual value: "Error in force(expr) : could not find function "uncert_sampling"\n"

4. Failure (at test-uncert_sampling.r#26): An error is thrown when the specified classifier is NA
uncert_sampling(x = x, y = y, uncertainty = "least_confidence", classifier = NA) does not match 'A classifier must be specified'. Actual value: "Error in force(expr) : could not find function "uncert_sampling"\n"

5. Failure (at test-uncert_sampling.r#37): An error occurs when the classifier is not found in 'caret'
uncert_sampling(x = x, y = y, classifier = classifier) does not match 'Cannot find, 'wtf' in the 'caret' package'. Actual value: "Error in force(expr) : could not find function "uncert_sampling"\n"

6. Failure (at test-uncert_sampling.r#41): An error occurs when the classifier is not found in 'caret'
uncert_sampling(x = x, y = y, uncertainty = "margin", classifier = classifier) does not match 'Cannot find, 'wtf' in the 'caret' package'. Actual value: "Error in force(expr) : could not find function "uncert_sampling"\n"

7. Error: uncert_sampling works correctly with the LDA classifier and the iris data set
could not find function "uncert_sampling"
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)

> library(activelearning)
 Loading required package: caret
 Loading required package: lattice
 Loading required package: ggplot2
 Loading required package: entropy
 Loading required package: itertools2
 Loading required package: mlbench
 Loading required package: parallel
 Warning messages:
 1: replacing previous import by ‘caret::bag’ when loading ‘activelearning’
 2: replacing previous import by ‘caret::bagControl’ when loading ‘activelearning’
 3: replacing previous import by ‘entropy::entropy’ when loading ‘activelearning’