nth-iteration-labs / contextual
Contextual Bandits in R - simulation and evaluation of Multi-Armed Bandit Policies
Home Page: https://nth-iteration-labs.github.io/contextual/
Thanks for your really quick fix last time. This time I'm having a problem with plotting: abline() does not draw a line where expected, and grid() does not give the expected results either.
Start R and run the example from ?EpsilonGreedyPolicy with the following:
library(contextual)
horizon <- 100L
simulations <- 100L
weights <- c(0.9, 0.1, 0.1)
policy <- EpsilonGreedyPolicy$new(epsilon = 0.1)
bandit <- BasicBernoulliBandit$new(weights = weights)
agent <- Agent$new(policy, bandit)
history <- Simulator$new(agent, horizon, simulations, do_parallel = FALSE)$run()
plot(history, type = "cumulative")
abline(h=0, col=3, lwd=4)
abline(h=1, col=3, lwd=4)
abline(v=0, col=3, lwd=4)
abline(v=1, col=3, lwd=4)
grid()
In the image below, the green lines show the locations of 0 and 1 according to abline. The misalignment with the axis labels makes it difficult to use abline on contextual plots. Also, the faint dashed lines of grid do not align with the axis ticks.
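As a diagnostic, the coordinate system that abline() actually draws in can be compared against the visible axes (a sketch using only base R; par("usr") reports the user coordinates of the current plot region):
# Inspect the user coordinate system that is active after the contextual
# plot; abline() draws in these coordinates, so if par("usr") does not
# match the visible axis labels, the plot used a different system.
plot(history, type = "cumulative")
usr <- par("usr")                      # c(xmin, xmax, ymin, ymax)
print(usr)
abline(h = usr[3], col = 2, lwd = 2)   # should hug the bottom axis
abline(v = usr[1], col = 2, lwd = 2)   # should hug the left axis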
sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-redhat-linux-gnu (64-bit)
## Running under: CentOS release 6.10 (Final)
##
## other attached packages:
## [1] data.table_1.11.4 contextual_0.9.8.2
In Section 7.1 of your paper you read in data from https://raw.githubusercontent.com/Nth-iteration-labs/contextual_data/master/data_cmab_basic/dataset.txt. However, both reading it with fread and visiting it in a browser result in a 404 error.
Without the data I am unable to reproduce that part of the paper. Are there plans to restore the data, or is there another location?
We are contacting you because you are the maintainer of contextual, which imports ggplot2 and uses vdiffr to manage visual test cases. The upcoming release of ggplot2 includes several improvements to plot rendering, including the ability to specify lineend and linejoin in geom_rect() and geom_tile(), and improved rendering of text. These improvements will result in subtle changes to your vdiffr doppelgangers when the new version is released.
Because vdiffr test cases do not run on CRAN by default, your CRAN checks will still pass. However, we suggest updating your visual test cases with the new version of ggplot2 as soon as possible to avoid confusion. You can install the development version of ggplot2 using remotes::install_github("tidyverse/ggplot2").
If you have any questions, let me know!
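For maintainers on the pre-1.0 vdiffr API, a sketch of the update workflow might look as follows (it assumes vdiffr's manage_cases(), which interactively reviews and validates regenerated figures; adjust to your own test setup):
# Install the development ggplot2, regenerate the visual test cases, and
# validate the new doppelgangers (assumes vdiffr < 1.0 with manage_cases()):
remotes::install_github("tidyverse/ggplot2")
vdiffr::manage_cases()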
Dear developers,
Thank you for developing and maintaining the contextual package. I benefit a lot from it in my own research.
I recently ran into a problem. I use an offline bandit with CMAB policies (ConLinTS and LinUCB) for offline bandit evaluation, and I also include MAB policies (UCB1 and TS) as benchmarks in the same agent definition, together with the CMAB policies, for the simulation.
Here is my code:
f2 <- DV ~ arm | covariates | r.1... | p
bandit <- OfflineDoublyRobustBandit$new(formula = f2, data = data, randomize = FALSE)
agents <- list(Agent$new(LinUCBDisjointOptimizedPolicy$new(1), bandit, "LinUCB"),
               Agent$new(ContextualLinTSPolicy$new(v = 0.2), bandit, "ConLinTS"),
               Agent$new(EpsilonGreedyPolicy$new(epsilon = 0.5), bandit, "EGreedy"),
               Agent$new(UCB1Policy$new(), bandit, "UCB1"),
               Agent$new(ThompsonSamplingPolicy$new(1, 1), bandit, "TS"),
               Agent$new(RandomPolicy$new(), bandit, "Random"))
simulation <- Simulator$new(agents = agents, simulations = 1, horizon = 30000,
                            save_context = TRUE, worker_max = 32)
It runs well if the number of simulations is set to 1 or 2. However, if I try to run more than 2 simulations, I get the error "Error in { : task 1 failed - "missing value where TRUE/FALSE needed"" after the main loop starts.
I did some debugging and found that the problem seems to occur only for the MAB policies: I can run 10 simulations if I use the CMAB and Random policies separately, but only 2 simulations with the MAB policies; more than 2 produce the same error. In addition, there are warning messages:
1: In for (v in val) { :
closing unused connection 5 (<-kubernetes.docker.internal:11190)
2: In for (v in val) { :
closing unused connection 4 (<-kubernetes.docker.internal:11190)
3: In for (v in val) { :
closing unused connection 3 (<-kubernetes.docker.internal:11190)
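For reference, a single-threaded rerun (a sketch using the do_parallel argument documented for Simulator) should surface the failing call directly instead of the foreach "task 1 failed" wrapper:
# Rerun without the parallel backend: the error then stops at the failing
# call, and traceback() shows the full call stack.
simulation <- Simulator$new(agents = agents, simulations = 3, horizon = 30000,
                            save_context = TRUE, do_parallel = FALSE)
history <- simulation$run()
traceback()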
Do you have any idea what is happening?
Thank you so much!
Best,
Han
Hello Robin, I wanted to point out the following, as it confused me for a while.
I was trying to set a random seed outside of Simulator, but my data was not being randomised properly. Then I remembered that your documentation mentions that calling Simulator sets a random seed so that the simulations are replicable.
Because Simulator resets the random seed each time it is called, naively doing the following:
library(contextual)
horizon <- 100L
simulations <- 100L
weights <- c(0.9, 0.1, 0.1)
policy <- EpsilonGreedyPolicy$new(epsilon = 0.1)
bandit <- BasicBernoulliBandit$new(weights = weights)
agent <- Agent$new(policy, bandit)
for (i in 1:2) {
  history <- Simulator$new(agent, horizon, simulations)$run()
  print(runif(1))
}
results in identical values from the two calls of runif(1).
I came up with the following simple solution, which works in my case (it sets a unique seed from the loop index):
for (i in 1:2) {
  history <- Simulator$new(agent, horizon, simulations, set_seed = i)$run()
  print(runif(1))
}
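Another workaround (a sketch using only base R, not the package API) is to save and restore the global RNG state around each call:
# Save .Random.seed before each simulation and restore it afterwards, so
# code outside the Simulator is unaffected by its internal set.seed() call.
if (!exists(".Random.seed")) set.seed(NULL)  # make sure the seed object exists
for (i in 1:2) {
  saved_seed <- .Random.seed
  history <- Simulator$new(agent, horizon, simulations)$run()
  .Random.seed <- saved_seed                 # restore the global RNG state
  print(runif(1))
}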
However, I wondered if there is a way for Simulator to set only its own internal random seed, rather than resetting the global seed each time it is called.
I very much like your contextual package! Thanks for sharing it.
Below is a description of how to reproduce this bug (possibly just a documentation bug).
Start R and run the example from ?ContextualBinaryBandit with the following:
library(contextual)
library(data.table)
horizon <- 100
sims <- 100
policy <- EpsilonGreedyPolicy$new(epsilon = 0.1)
bandit <- ContextualBinaryBandit$new(weights = c(0.6, 0.1, 0.1))
agent <- Agent$new(policy,bandit)
## Error in rep(list(self$theta_to_arms[[param_index]]), k) :
## invalid 'times' argument
sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-redhat-linux-gnu (64-bit)
## Running under: CentOS release 6.10 (Final)
##
## other attached packages:
## [1] data.table_1.11.4 contextual_0.9.8.2
I was reviewing the implementation of the Exp3 algorithm in this function, with the help of the formula posted in this link. I think the last statement is wrong, since it updates only the last element of probs rather than all the elements (it looks like it should be inside the loop):
get_action = function(t, context) {
  probs <- rep(0.0, context$k)
  for (i in 1:context$k) {
    probs[i] <- (1 - gamma) * (self$theta$weight[[i]] / sum_of(self$theta$weight))
  }
  inc(probs[i]) <- ((gamma) * (1.0 / context$k)) # <--------
  action$choice <- categorical_draw(probs)
  action
},
So I think it should be corrected as follows:
get_action = function(t, context) {
  probs <- rep(0.0, context$k)
  for (i in 1:context$k) {
    probs[i] <- (1 - gamma) * (self$theta$weight[[i]] / sum_of(self$theta$weight))
    inc(probs[i]) <- ((gamma) * (1.0 / context$k)) # <-------
  }
  action$choice <- categorical_draw(probs)
  action
},
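As a quick sanity check, here is a standalone sketch (with hypothetical weights, independent of the package internals) showing that the probabilities only sum to 1 when the increment is applied inside the loop:
# With the exploration term added to every arm the probabilities sum to 1;
# with it added only to the last arm, part of the probability mass is lost.
gamma <- 0.1
weights <- c(2, 1, 1)                        # hypothetical Exp3 weights
k <- length(weights)
base <- (1 - gamma) * weights / sum(weights)
probs_fixed <- base + gamma / k              # increment inside the loop
probs_buggy <- base
probs_buggy[k] <- probs_buggy[k] + gamma / k # increment after the loop
sum(probs_fixed)                             # 1
sum(probs_buggy)                             # 0.9333... < 1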
Could you confirm this is a bug? Then I can make a simple PR to correct it.
Hi Robin, I was looking through your recent demo on Simpson's paradox and I realised that there might be a wrong/false statement.
Instead of "you’d falsely conclude Sports is more popular then Movies, overall", I think the statement should allude to the opposite wrong conclusion, namely that "Movies is more popular than Sports".
Thanks for coming up with the awesome package!
In the documentation from ?ContextualEpochGreedyPolicy it says, under Usage,
policy <- EpsilonGreedyPolicy(epsilon = 0.1)
which is a different policy, that is, epsilon greedy instead of epoch greedy. I think the correction would be
policy <- ContextualEpochGreedyPolicy$new(sZl = 0.1)
sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-redhat-linux-gnu (64-bit)
## Running under: CentOS release 6.10 (Final)
##
## other attached packages:
## [1] data.table_1.11.4 contextual_0.9.8.2
Dear Robin,
This is not a bug report but more of a feature request.
We know that theta is updated after every interaction of the agent with the bandit. What I want to ask is whether it is possible to save the "trained" agent, with its theta, for later use on another dataset. The logic behind this is that the trained agent acts as an oracle/ground truth of the environment; I then want to add a benchmark full-information model based on this oracle. This way, I can see the maximum reward I could theoretically obtain if I initiate my offline evaluation with this oracle, without knowing the ground truth until the end of my simulation.
Basically, to achieve this I need to save the trained agents with their thetas, and then break the theta-updating chain and hold the thetas unchanged when they are used on another dataset.
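One possible direction for this (a sketch only; it assumes that set_reward() is where a policy updates its theta, as in the package's policy classes, and that the theta of a trained policy can be copied across; trained_policy is a placeholder for an already-trained policy object):
# Sketch: subclass a trained policy class and make set_reward() a no-op, so
# theta stays frozen while get_action() keeps using the trained parameters.
FrozenLinTSPolicy <- R6::R6Class(
  inherit = ContextualLinTSPolicy,
  public = list(
    set_reward = function(t, context, action, reward) {
      self$theta                           # return theta unchanged: no update
    }
  )
)
frozen <- FrozenLinTSPolicy$new(v = 0.2)
frozen$theta <- trained_policy$theta       # copy the trained parameters over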
Thank you so much for your help!
Best,
Han
By setting a new seed each time another round of simulation is run (if the number of simulations run is greater than 1), is the program reordering the input data? For example, if we have 10 data points, would they be ordered differently across different simulations? If not, what exactly is reassigning the seed doing in each round of simulation?
First, thank you for the awesome package.
I'm wondering how to get a prediction for test data.
Let's say that I build an online advertisement recommendation system framed as a multi-armed bandit problem. I build a simulator and history for certain data (a replay evaluator bandit and a LinUCB policy with 100-dimensional context vectors). I now want to see what the optimal output for my test data would be, i.e. which advertisement should be presented.
I tried the "predict" function with the simulator and with the agents, and neither worked. I really look forward to your help.
Thanks!
Hi!
I am trying to run the following on MRAN 3.5.1:
history <- Simulator$new(agents = agent,
                         horizon = horizon,
                         simulations = 1000, do_parallel = TRUE)$run()
but receive:
Setting up parallel backend.
Cores available: 4
Workers assigned: 3
Simulation horizon: 250
Number of simulations: 5000 # this also stays unchanged!
Number of batches: 3
Starting main loop.
Error in gp$globals[[match(s, syms)]] : subscript out of bounds
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Matrix products: default
locale:
[1] LC_COLLATE=Russian_Russia.1251 LC_CTYPE=Russian_Russia.1251 LC_MONETARY=Russian_Russia.1251
[4] LC_NUMERIC=C LC_TIME=Russian_Russia.1251
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] itertools_0.1-3 iterators_1.0.10 data.table_1.11.4 contextual_0.9.8.3 RevoUtils_11.0.1
[6] RevoUtilsMath_11.0.0
loaded via a namespace (and not attached):
[1] codetools_0.2-15 listenv_0.7.0 future_1.9.0 withr_2.1.2 digest_0.6.15 foreach_1.5.0
[7] R.methodsS3_1.7.1 R6_2.3.0 R.devices_2.16.0 doParallel_1.0.13 R.oo_1.22.0 R.utils_2.6.0
[13] devtools_1.13.6 Formula_1.2-3 rjson_0.2.20 tools_3.5.1 yaml_2.2.0 parallel_3.5.1
[19] compiler_3.5.1 base64enc_0.1-3 globals_0.12.1 memoise_1.1.0
The argument 'k' in the initialize_theta(k) function in 'policy.R' is invalid:
Error in rep(list(self$theta_to_arms[[param_index]]), k) :
invalid 'times' argument
The change itself had been discussed somewhat on the R-devel mailing list before it happened, starting here:
https://stat.ethz.ch/pipermail/r-devel/2020-February/079049.html
and also a bit afterwards:
https://stat.ethz.ch/pipermail/r-devel/2020-February/079061.html
This is quite a minor point, but the documentation from ?LinUCBGeneralPolicy mentions "Algorithm 1 LinUCB" in the paper by Lihong Li et al. (2010), whereas the documentation from ?LinUCBDisjointPolicy does not. However, comparing Algorithm 1 from the paper with your code, it seems that LinUCBDisjointPolicy is exactly "Algorithm 1 LinUCB" from the paper, whereas LinUCBGeneralPolicy is similar but not the same (more like the part of the LinUCB hybrid model that is not in LinUCBDisjointPolicy). If I'm right, I think it would be helpful for the documentation to state explicitly that LinUCBDisjointPolicy is "Algorithm 1 LinUCB" from Li's paper.
Also, the description in the documentation from ?LinUCBHybridPolicy refers to LinUCBHybridOptimizedPolicy. I guess that both LinUCBHybridPolicy and LinUCBHybridOptimizedPolicy are exactly "Algorithm 2 LinUCB with hybrid linear models" from Li's paper. Again, if I'm correct, I think it would be helpful to state this explicitly in the documentation.
Hi,
this may be a stupid question, but I don't see how I can choose the discount factor of the bandits. What is its default value?
Thanks
I have offline data available in the following format:
time | reward_arm1 | reward_arm2 | reward_arm3
This is not a contextual bandit case, as there is only the reward of each arm at time t. I want to run UCB1, UCB2 and other context-free algorithms on this data. I searched for a demo on context-free custom data but could not find one, so I am creating a custom context-free bandit from one of the available demos.
@robinvanemden For example, in your myocardial example there are two arms - treatment and no_treatment - each with its own reward. The computed R1 and R2 in the example can serve as the rewards of the arms.
I tried to create a new dataset with just these two columns, R1 and R2, as follows:
# Import myocardial infarction dataset
library(data.table)
url <- "http://d1ie9wlkzugsxr.cloudfront.net/data_propensity/myocardial_propensity.csv"
data <- fread(url)
simulations <- 1
horizon <- nrow(data)
data$trt <- data$trt + 1
data$alive <- abs(data$death - 1)
f <- alive ~ age + risk + severity
model_f <- function(arm) glm(f, data = data[trt == arm],
                             family = binomial(link = "logit"),
                             y = FALSE, model = FALSE)
arms <- sort(unique(data$trt))
model_arms <- lapply(arms, FUN = model_f)
predict_arm <- function(model) predict(model, data, type = "response")
r_data <- lapply(model_arms, FUN = predict_arm)
r_data <- do.call(cbind, r_data)
colnames(r_data) <- paste0("R", (1:max(arms)))
data <- cbind(data, r_data)
# extract only R1 and R2
data <- data[, 8:9]
I am trying to create a bandit from this data and run the algorithms, but it does not work.
# New/changed formula
#f <- alive ~ trt | age + risk + severity | R1 + R2
#2
f <- alive ~ R1 + R2
#bandit <- OfflineDirectMethodBandit$new(formula = f, data = data)
bandit <- OfflineDirectMethodBandit$new(data = data)
# Define agents.
#agents <- list(Agent$new(LinUCBDisjointOptimizedPolicy$new(0.2), bandit, "LinUCB"))
agents <- list(Agent$new(UCB1Policy$new(), bandit, "UCB"))
simulation <- Simulator$new(agents = agents, simulations = simulations,
                            horizon = horizon, do_parallel = FALSE)
sim <- simulation$run()
plot(sim, type = "cumulative", regret = FALSE, rate = TRUE, legend_position = "bottomright")
In particular, the simulation step does not run. Can you please point me towards a minimal working example of running UCB1 or UCB2 on context-free custom data?
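For what it's worth, this is the direction I have been experimenting with (a sketch only; it assumes the exported Bandit superclass can be subclassed like the package's other bandit classes, and it follows the context/reward list layout used elsewhere in this tracker):
# Sketch of a minimal context-free offline bandit: the reward of every arm
# is known at each time step, so get_reward() can simply look it up.
OfflineFullInfoBandit <- R6::R6Class(
  inherit = Bandit,
  public = list(
    rewards = NULL,
    class_name = "OfflineFullInfoBandit",
    initialize = function(rewards) {        # rewards: t x k matrix (R1, R2, ...)
      self$rewards <- as.matrix(rewards)
      self$k <- ncol(self$rewards)          # k arms
      self$d <- 1                           # no real context
    },
    get_context = function(t) {
      list(k = self$k, d = self$d, X = 1)   # constant dummy feature
    },
    get_reward = function(t, context, action) {
      r <- self$rewards[t, ]
      list(reward         = r[action$choice],
           optimal_arm    = which.max(r),
           optimal_reward = max(r))
    }
  )
)
bandit <- OfflineFullInfoBandit$new(rewards = data)  # data holds only R1 and R2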
Hi, I'd like to create a custom Bernoulli bandit where each context/state has a different probability of occurring. For instance, I want context 1 to appear 70% of the time and context 2 only 30% of the time. Here is what I have tried:
I copy-pasted the code of ContextualBernoulliBandit, added a prob argument to the initialize function, added a line with self$p <- length(self$prob), and adapted the line where the active feature is randomly chosen like so:
Xa <- sample(c(1, rep(0, self$d - 1)), prob = self$p)
Below is the full code of the class:
ContextualBernoulliBandit2 <- R6::R6Class(
  inherit = ContextualBernoulliBandit,
  class = FALSE,
  public = list(
    weights = NULL,
    class_name = "ContextualBernoulliBandit2",
    initialize = function(weights, prob) {
      self$weights <- weights
      self$prob <- prob
      if (is.vector(weights)) {
        self$weights <- matrix(weights, nrow = 1L)
      } else {
        self$weights <- weights            # d x k weight matrix
      }
      self$d <- nrow(self$weights)         # d features
      self$k <- ncol(self$weights)         # k arms
      self$p <- length(self$prob)
    },
    get_context = function(t) {
      # generate d dimensional feature vector, one random feature active at a time
      Xa <- sample(c(1, rep(0, self$d - 1)), prob = self$p)
      context <- list(
        X = Xa,
        k = self$k,
        d = self$d,
        p = self$p
      )
    },
    get_reward = function(t, context, action) {
      # which arm was selected?
      arm <- action$choice
      # d dimensional feature vector for chosen arm
      Xa <- context$X
      # weights of active context
      weight <- Xa %*% self$weights
      # assign rewards for active context with weighted probs
      rewards <- as.double(weight > runif(self$k))
      optimal_arm <- which_max_tied(weight)
      reward <- list(
        reward = rewards[arm],
        optimal_arm = optimal_arm,
        optimal_reward = rewards[optimal_arm]
      )
    }
  )
)
Now when I try to run this, I get the error message mentioned in the title:
horizon <- 10000L
simulations <- 1L
#          S----M------> Arm 1: Sport
#          |    |        Arm 2: Movie
#          |    |
weights <- matrix(c(0.4, 0.3,  # -----> Context: Male
                    0.8, 0.7), # -----> Context: Female
                  nrow = 2, ncol = 2, byrow = TRUE)
policy <- RandomPolicy$new()
bandit <- ContextualBernoulliBandit2$new(weights = weights, prob = c(0.7, 0.3))
Error in self$prob <- prob : cannot add bindings to a locked environment
I am not very familiar with R6 classes, which is probably the cause of this error. Any help would be appreciated!
Here's some info on my session:
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: openSUSE Tumbleweed
Matrix products: default
BLAS/LAPACK: /home/cbrunos/miniconda3/envs/r_env/lib/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 LC_PAPER=en_US.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] contextual_0.9.8.2
loaded via a namespace (and not attached):
[1] codetools_0.2-16 foreach_1.4.7 R.methodsS3_1.7.1 R6_2.4.1 R.devices_2.16.1 itertools_0.1-3
[7] data.table_1.12.8 doParallel_1.0.15 R.oo_1.23.0 R.utils_2.9.2 Formula_1.2-3 rjson_0.2.20
[13] iterators_1.0.12 tools_3.6.1 parallel_3.6.1 compiler_3.6.1 base64enc_0.1-3
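For reference, the R6 behaviour behind this error: R6 locks an object's environment by default, so any field assigned via self$ in initialize() must first be declared in the public list. A minimal sketch of the fix (only the declarations change; note also that sample()'s prob argument expects a probability vector the same length as x, i.e. self$prob rather than its length):
# Declaring prob and p as public fields makes them assignable in initialize();
# without the declarations, self$prob <- prob hits a locked environment.
ContextualBernoulliBandit2 <- R6::R6Class(
  inherit = ContextualBernoulliBandit,
  class = FALSE,
  public = list(
    weights = NULL,
    prob = NULL,                # declared, so self$prob <- prob is allowed
    p = NULL,                   # same for p
    class_name = "ContextualBernoulliBandit2",
    initialize = function(weights, prob) {
      self$prob <- prob
      # ... rest of initialize() as in the class above ...
    },
    get_context = function(t) {
      # sample() weights each element of x, so pass the probability vector:
      Xa <- sample(c(1, rep(0, self$d - 1)), prob = self$prob)
      list(X = Xa, k = self$k, d = self$d)
    }
    # get_reward() unchanged
  )
)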
I'm studying contextual bandits, so this library has been very useful for understanding the algorithms involved. In particular, I was reviewing the implementation of OfflineDoublyRobustBandit, and I got confused by the handling of the "inverted" parameter in these lines:
if (self$inverted) p <- 1 / p
if (self$threshold > 0) {
  if (isTRUE(self$inverted)) p <- 1 / p
  p <- 1 / max(p, self$threshold)
} else {
  if (!isTRUE(self$inverted)) p <- 1 / p
}
I think the last line could be a bug (since it inverts p even when "inverted" is FALSE), and it seems like this could be written more simply as:
if (self$threshold > 0) {
  p <- 1 / max(p, self$threshold)
} else {
  if (isTRUE(self$inverted)) p <- 1 / p
}
Could someone confirm whether this is a bug? And what is the logic behind that parameter? Thanks in advance.
Hello Robin,
This is a feature request, not a bug report.
I'd like the output from history$get_data_table() to include a column for the predicted values of the chosen arms at each step.
For example, for EpsilonGreedyPolicy it would just be self$theta$mean[[chosen_arm]], which I realise is available by setting save_theta = TRUE in Simulator$new. If I also set save_context = TRUE, the predicted value of the chosen action can be obtained. (Although I have to take into account the fact that the theta values are one time step ahead of the values for the current context-arm pair, since they have been updated with the reward from the current context-arm pair. That is, the theta values do not hold the predicted values for the current context-arm pair; they hold the values computed after the reward for the current context-arm pair is known.)
With other policies, such as ContextualEpsilonGreedyPolicy, using the output from history$get_data_table() to compute the expected reward of the current action before it is taken is not so straightforward. I see in policy_cmab_lin_epsilon_greedy.R that you compute expected_rewards[arm], but you don't seem to save the values for output later on. It is exactly expected_rewards[arm] that I would like history$get_data_table() to include in its output. Having expected_rewards[arm] for just the chosen arm would be enough for my current needs, but maybe having it for all arms would be useful in future.
I had a look at history.R to see if I could work out how to save the values of expected_rewards, but it looks rather complicated to me, and my R is nowhere near as good as yours :-).
Thanks,
Paul
Hi there, I am really excited to use your package to learn more about contextual bandit simulations.
I was trying out a particular documentation article (https://nth-iteration-labs.github.io/contextual/articles/introduction.html) and I think there might be a syntax issue with the parameter input for the EpsilonFirstPolicy object:
# Initialize an EpsilonFirstPolicy with a 100 step exploration period.
ef_policy <- EpsilonFirstPolicy$new(first = 100)
I believe it should be time_steps = 100 instead of first = 100, based on the function help documentation as shown:
# Usage
policy <- EpsilonFirstPolicy(epsilon = 0.1, N = 1000, time_steps = NULL)
# Arguments
epsilon
  numeric; value in the closed interval (0,1] that sets the number of time steps to explore through epsilon * N.
N
  integer; positive integer which sets the number of time steps to explore through epsilon * N.
time_steps
  integer; positive integer which sets the number of time steps to explore - can be used instead of epsilon and N.
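So the corrected initialization from the article would presumably be:
# Initialize an EpsilonFirstPolicy with a 100 step exploration period.
ef_policy <- EpsilonFirstPolicy$new(time_steps = 100)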
Thanks for putting up such a complete package!
The package has been removed from CRAN because of a package check problem: https://cran.r-project.org/package=contextual
I am using the contextual package to run some simulations. More specifically, I am using the LinUCBDisjointOptimizedPolicy. Is there a way for me to get the arm choice sequence from the simulation?
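In case it helps frame the question, this is the kind of access I am after (a sketch; get_data_table() is the History method mentioned elsewhere in this tracker, and the choice column name is an assumption on my part):
# Sketch: pull the per-step arm choices out of the simulation history.
dt <- history$get_data_table()
arm_sequence <- dt[order(sim, t), list(sim, t, choice)]  # assumed column names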