ramhiser / activelearning
Active Learning in R
As mentioned in #10, the package is out of date. The main thing that needs to be updated is the interface to caret.
These were written a while back based on a limited use case. They lack uniformity across the various AL methods.
Consider the following data sets from the Active Learning Challenge.
Implement a function that checks if the caret options are specified correctly.
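A minimal sketch of what such a check could look like. The name validate_classifier and its available argument are hypothetical, not part of the package; the error messages are the ones the existing tests expect, and the default list of model names is assumed to come from caret::getModelInfo().

```r
# Hypothetical helper (not part of the package): validate the classifier
# argument in one place so every active learning method can share the check.
validate_classifier <- function(classifier,
                                available = names(caret::getModelInfo())) {
  # Reject NULL, NA, and anything that is not a single character string.
  if (!is.character(classifier) || length(classifier) != 1L ||
      is.na(classifier)) {
    stop("A classifier must be specified", call. = FALSE)
  }
  # 'available' defaults to the model names known to caret. It is evaluated
  # lazily, so a caller may pass its own vector of names instead.
  if (!(classifier %in% available)) {
    stop("Cannot find, '", classifier, "' in the 'caret' package",
         call. = FALSE)
  }
  invisible(classifier)
}
```

Each method (uncertainty sampling, query by committee, and so on) could then call validate_classifier(classifier) as its first step, which keeps a single copy of the check.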
This function:
uncertainty_sampling.
caret.
Because I have several active learning methods planned, it is important to have only one copy of this code.
In a series of emails with Chayaphon Tonwongvarl from TGGS University in Thailand, we encountered the following error:
x <- iris[, -5]
y <- iris[, 5]
y <- replace(y, -c(1:10, 51:60, 101:110), NA)
query_by_bagging(x = x, y = y, disagreement = "vote_entropy", classifier = "qda", num_query = 5)
The above code leads to the following errors:
Error in qda.default(x, grouping, ...) :
rank deficiency in group virginica
In addition: There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In eval(expr, envir, enclos) : model fit failed for Resample02: parameter=none
2: In eval(expr, envir, enclos) : model fit failed for Resample03: parameter=none
3: In eval(expr, envir, enclos) : model fit failed for Resample04: parameter=none
4: In eval(expr, envir, enclos) : model fit failed for Resample05: parameter=none
5: In eval(expr, envir, enclos) : model fit failed for Resample06: parameter=none
6: In eval(expr, envir, enclos) : model fit failed for Resample07: parameter=none
7: In eval(expr, envir, enclos) : model fit failed for Resample08: parameter=none
8: In eval(expr, envir, enclos) : model fit failed for Resample09: parameter=none
9: In eval(expr, envir, enclos) : model fit failed for Resample11: parameter=none
10: In eval(expr, envir, enclos) : model fit failed for Resample12: parameter=none
11: In eval(expr, envir, enclos) :
The issue appears to be that the sample covariance matrices computed by the qda classifier are singular. In the reproducible example, there are 10 labeled observations in each group, so caret must be resampling in a way that causes the error, presumably by drawing resamples that contain too few distinct observations from a group.
Given that we are using a wide range of classifiers, we need a better way of handling errors from the classifiers.
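One possible approach, sketched below in base R, is to wrap each fit in tryCatch so that a failing committee member is dropped with a warning instead of aborting the whole query. The names safe_fit, fit_committee, and num_members are hypothetical, and the bootstrap resampling is an assumption about how the committee is built rather than a description of what caret does internally.

```r
# Hypothetical wrapper: fit one committee member, returning NULL (with a
# warning) instead of stopping when the classifier throws an error.
safe_fit <- function(fit_f, x, y, ...) {
  tryCatch(
    fit_f(x, y, ...),
    error = function(e) {
      warning("model fit failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Fit a committee on bootstrap resamples of the labeled observations and
# keep only the members whose fit succeeded.
fit_committee <- function(fit_f, x, y, num_members = 10, ...) {
  labeled <- which(!is.na(y))
  fits <- lapply(seq_len(num_members), function(i) {
    # Resample the labeled row indices with replacement.
    idx <- labeled[sample.int(length(labeled), replace = TRUE)]
    safe_fit(fit_f, x[idx, , drop = FALSE], y[idx], ...)
  })
  fits <- Filter(Negate(is.null), fits)
  if (length(fits) == 0L) {
    stop("all committee members failed to fit", call. = FALSE)
  }
  fits
}
```

With the example above, something like fit_committee(function(x, y) MASS::qda(x, y), x, y) would be expected to return the members that could be fit and warn about the rest, so the disagreement measure can still be computed from the surviving members.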
caret
in query_by_committee
query_by_committee
query_by_committee
to describe the usage of caret
For an example of how to do this, see the NEWS file from the clusteval package.
Hello John,
I was searching for a package on active learning when I came across your GitHub repo. I was wondering whether this package is up to date, considering that the last update was in 2012. It also appears that the package is not compatible with R versions > 3.2.0.
Thanks,
Daniel
Rather than expecting unlabeled observations to be included with the training data, it makes far more sense to create an object from which a predict call can be made. The predict function would be applied to an unlabeled data set and would indicate which observations should be queried.
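A rough sketch of how this interface might look with S3. The class name active_learner, the constructor, and the posterior_f argument are all hypothetical, and least confidence is used here only as one example of an uncertainty measure.

```r
# Hypothetical constructor: fit on labeled data only and store the fit
# together with a function that returns posterior class probabilities.
active_learner <- function(x, y, fit_f, posterior_f) {
  structure(
    list(fit = fit_f(x, y), posterior_f = posterior_f),
    class = "active_learner"
  )
}

# predict() method: score unlabeled observations by least confidence
# (1 - largest posterior probability) and return the row indices of
# 'newdata' that should be queried, most uncertain first.
predict.active_learner <- function(object, newdata, num_query = 1, ...) {
  posterior <- object$posterior_f(object$fit, newdata)
  uncertainty <- 1 - apply(posterior, 1, max)
  head(order(uncertainty, decreasing = TRUE), num_query)
}

# Usage with the iris example, assuming the MASS package is installed.
x <- iris[, -5]
y <- iris[, 5]
labeled <- c(1:10, 51:60, 101:110)
learner <- active_learner(
  x[labeled, ], y[labeled],
  fit_f = function(x, y) MASS::lda(x, y),
  posterior_f = function(fit, newdata) predict(fit, newdata)$posterior
)
predict(learner, x[-labeled, ], num_query = 5)
```

The returned indices refer to rows of the unlabeled data passed to predict, so the training call no longer needs NA placeholders in y.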
To facilitate this feature, explore the new functions:
tidyr::nest()
tidyr::unnest()
purrr::map()
I got this idea from listening to Hadley's talk at An Afternoon with Hadley Wickham and Friends. Slides?
Hi, I want to use a non-caret model for active learning. Is this possible with this set of functions? Please advise. Thanks!
I got an error when calling query_bagging() by following the example in the help page. Can you fix the bug? Thanks.
R> library(activelearning)
R>
R> x <- iris[, -5]
R> y <- iris[, 5]
R>
R> # For demonstration, suppose that few observations are labeled in 'y'.
R> y <- replace(y, -c(1:10, 51:60, 101:110), NA)
R>
R> fit_f <- function(x, y, ...) {
+ MASS::lda(x, y, ...)
+ }
R> predict_f <- function(object, x) {
+ predict(object, x)$class
+ }
R>
R> query_bagging(x=x, y=y, fit_f=fit_f, predict_f=predict_f, C=10)
Loading required package: lattice
Loading required package: ggplot2
Error in rowSums(obs * log(obs/avg_post)) :
'x' must be an array of at least two dimensions
Calls: query_bagging ... predict.bag -> <Anonymous> -> lapply -> FUN -> rowSums
In addition: There were 12 warnings (use warnings() to see them)
Execution halted
Hi ramhiser,
Thank you so much for this package. Is it not available on CRAN anymore though?
Mostly namespace issues.
> test()
Loading activelearning
Testing activelearning
Uncertainty Sampling : 1234567
1. Failure (at test-uncert_sampling.r#10): An error is thrown when the specified classifier is NULL
uncert_sampling(x = x, y = y, classifier = NULL) does not match 'A classifier must be specified'. Actual value: "Error in force(expr) : could not find function "uncert_sampling"\n"
2. Failure (at test-uncert_sampling.r#14): An error is thrown when the specified classifier is NULL
uncert_sampling(x = x, y = y, uncertainty = "least_confidence", classifier = NULL) does not match 'A classifier must be specified'. Actual value: "Error in force(expr) : could not find function "uncert_sampling"\n"
3. Failure (at test-uncert_sampling.r#22): An error is thrown when the specified classifier is NA
uncert_sampling(x = x, y = y, classifier = NA) does not match 'A classifier must be specified'. Actual value: "Error in force(expr) : could not find function "uncert_sampling"\n"
4. Failure (at test-uncert_sampling.r#26): An error is thrown when the specified classifier is NA
uncert_sampling(x = x, y = y, uncertainty = "least_confidence", classifier = NA) does not match 'A classifier must be specified'. Actual value: "Error in force(expr) : could not find function "uncert_sampling"\n"
5. Failure (at test-uncert_sampling.r#37): An error occurs when the classifier is not found in 'caret'
uncert_sampling(x = x, y = y, classifier = classifier) does not match 'Cannot find, 'wtf' in the 'caret' package'. Actual value: "Error in force(expr) : could not find function "uncert_sampling"\n"
6. Failure (at test-uncert_sampling.r#41): An error occurs when the classifier is not found in 'caret'
uncert_sampling(x = x, y = y, uncertainty = "margin", classifier = classifier) does not match 'Cannot find, 'wtf' in the 'caret' package'. Actual value: "Error in force(expr) : could not find function "uncert_sampling"\n"
7. Error: uncert_sampling works correctly with the LDA classifier and the iris data set
could not find function "uncert_sampling"
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
Focus on the active learning methods that are implemented in version 0.1.
Simple tests for query_random.
Update activelearning_sim documentation so that the 'Details' section is more independent of the first paragraph.
Here's the output when I load the package.
!> library(activelearning)
Loading required package: caret
Loading required package: lattice
Loading required package: ggplot2
Loading required package: entropy
Loading required package: itertools2
Loading required package: mlbench
Loading required package: parallel
Warning messages:
1: replacing previous import by ‘caret::bag’ when loading ‘activelearning’
2: replacing previous import by ‘caret::bagControl’ when loading ‘activelearning’
3: replacing previous import by ‘entropy::entropy’ when loading ‘activelearning’