
mlrMBO

Package website: mlrmbo.mlr-org.com

Model-based optimization with mlr.


Installation

We recommend installing the official release version:

install.packages("mlrMBO")

For experimental use you can install the latest development version:

remotes::install_github("mlr-org/mlrMBO")

Introduction

mlrMBO is a highly configurable R toolbox for model-based / Bayesian optimization of black-box functions.

Features:

  • EGO-type algorithms (Kriging with expected improvement) on purely numerical search spaces, see Jones et al. (1998)
  • Mixed search spaces with numerical, integer, categorical and subordinate parameters
  • Arbitrary parameter transformations, allowing optimization on, e.g., log scale
  • Optimization of noisy objective functions
  • Multi-criteria optimization with approximated Pareto fronts
  • Parallelization through multi-point batch proposals
  • Parallelization on many parallel back-ends and clusters through batchtools and parallelMap

For the surrogate, mlrMBO allows any regression learner from mlr, including:

  • Kriging aka. Gaussian processes (e.g. DiceKriging)
  • Random forests (e.g. randomForest)
  • and many more…

Various infill criteria (aka. acquisition functions) are available:

  • Expected improvement (EI)
  • Upper/Lower confidence bound (LCB, aka. statistical lower or upper bound)
  • Augmented expected improvement (AEI)
  • Expected quantile improvement (EQI)
  • API for custom infill criteria

Objective functions are created with package smoof, which also offers many test functions for example runs or benchmarks.

Parameter spaces and initial designs are created with package ParamHelpers.
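A minimal end-to-end run might look like the following sketch (the test function and iteration count are chosen purely for illustration):

```r
library(mlrMBO)  # also loads mlr, ParamHelpers and smoof

# 2-d test function provided by smoof
obj.fun = makeBraninFunction()

# control object: 10 sequential iterations, expected improvement as infill
ctrl = makeMBOControl()
ctrl = setMBOControlTermination(ctrl, iters = 10L)
ctrl = setMBOControlInfill(ctrl, crit = makeMBOInfillCritEI())

res = mbo(obj.fun, control = ctrl, show.info = TRUE)
res$x  # best parameter setting found
res$y  # best objective value
```

With no learner given, mbo() picks a sensible default surrogate for the search space (Kriging for purely numerical spaces).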

How to Cite

Please cite our arXiv paper (preprint). You can get citation info via citation("mlrMBO") or copy the following BibTeX entry:

@article{mlrMBO,
  title = {{{mlrMBO}}: {{A Modular Framework}} for {{Model}}-{{Based Optimization}} of {{Expensive Black}}-{{Box Functions}}},
  url = {https://arxiv.org/abs/1703.03373},
  shorttitle = {{{mlrMBO}}},
  archivePrefix = {arXiv},
  eprinttype = {arxiv},
  eprint = {1703.03373},
  primaryClass = {stat},
  author = {Bischl, Bernd and Richter, Jakob and Bossek, Jakob and Horn, Daniel and Thomas, Janek and Lang, Michel},
  date = {2017-03-09},
}

Some parts of the package were created as part of other publications. If you use these parts, please cite the relevant work appropriately:

mlrMBO's People

Contributors

berndbischl, bklppr, danielhorn, github-actions[bot], ja-thomas, jakob-r, jakobbossek, karinschork, katrinleinweber, mb706, mllg, pat-s, surmann, tobiaswagner, verenamayer


mlrMBO's Issues

How to modify evalTargetFunction() ?

Why is evalTargetFunction() so complex? Can we get rid of "..."? We always get an error here if we try to evaluate the mbo() function step by step.

Regarding the "fun" argument of the evalTargetFunction() function: "Fitness function to minimize. The first argument has to be a list of values." It is unclear whether each parameter has to be its own list entry or whether the whole set goes into one entry, e.g., list(c(22, 13)) or list(22, 13)?

I would propose dropping the list requirement and switching to a vector. That way the user can define objective functions more easily. For example, with the list requirement it is impossible to use BBOB functions directly: objfun1 = generate_ackley_function(dimensions = 5) expects its first argument to be a vector, not a list.

To summarize, at the moment evalTargetFunction() is the source of most error messages when trying to apply the mbo() function.
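One lightweight workaround, until the interface changes, is a small adapter that converts the list mbo() passes into a plain vector before calling a vector-based objective; this is essentially what the package's makeMBOFunction() does. A sketch with an illustrative objective:

```r
# Adapter: turn a vector-based objective into one accepting a list of values.
wrapVectorFun = function(fun) {
  function(x, ...) fun(unlist(x), ...)
}

objfun1 = function(x) sum(x^2)        # takes a plain numeric vector
objfun.list = wrapVectorFun(objfun1)  # now takes a list, as mbo() expects
objfun.list(list(x1 = 2, x2 = 3))     # -> 13
```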

Parameter "infill.opt" of makeMBOControl() function.

@param infill.opt [\code{character(1)}]\cr
  How should SINGLE points be proposed by using the surrogate model? Possible are:
  \dQuote{random}: Use a large random latin hypercube design of points and
  evaluate the surrogate model at each.
  \dQuote{cmaes}: Use CMA-ES to optimize the mean prediction value.
  \dQuote{ei}: Use expected improvement.
  Default is \dQuote{random}.

The name "random" might be a bit confusing, as one could think the next point is chosen randomly. Would it not be better to name this option "seq.design"?

Is the option "ei" still active (meaningful), now that we have the "infill.crit" parameter of the makeMBOControl() function?

Tutorial: (Am I) missing some essential step?

It's called mlrMBO for a reason, right? But neither the mlr nor the mlrMBO tutorial shows how to optimize an mlr learner using mlrMBO. In mlr, however, there is something unexported like makeTuneControlMBO.
How to proceed?

Bug in mbo

The following produces an error I don't understand:

library(mlrMBO)

set.seed(1)

objfun = function(x, ...) rnorm(1)

ps = makeParamSet(
  makeNumericParam("sstep", lower=0.8, upper=1),
  makeNumericParam("distanz", lower=0.5, upper=0.8)
)

lrn = makeLearner("regr.km", predict.type="se", nugget.estim=TRUE)

ctrl = makeMBOControl(
    init.design.points = 8, 
    iters = 1, 
    infill.crit = "lcb",
    infill.opt = "cmaes"
)

res = mbo(objfun, ps, learner=lrn, control=ctrl)


Computing y column for design. Was not provided
[mbo] 0: sstep=0.94; distanz=0.52 : y=-0.059
[mbo] 0: sstep=0.92; distanz=0.62 : y= 1.100
[mbo] 0: sstep=0.84; distanz=0.66 : y= 0.763
[mbo] 0: sstep=0.89; distanz=0.73 : y=-0.165
[mbo] 0: sstep=0.96; distanz=0.57 : y=-0.253
[mbo] 0: sstep=0.80; distanz=0.80 : y= 0.697
[mbo] 0: sstep=0.98; distanz=0.72 : y= 0.557
[mbo] 0: sstep=0.86; distanz=0.59 : y=-0.689
Error in t.default(results[[j]]$par) : argument is not a matrix
> traceback()
10: t.default(results[[j]]$par)
9: t(results[[j]]$par)
8: as.data.frame(t(results[[j]]$par))
7: infill.opt.fun(infill.crit.fun, model, control, par.set, opt.path, 
       design)
6: proposePoints(model, par.set, control, opt.path)
5: mbo(objfun, ps, learner = lrn, control = ctrl) at MBOTest2.R#21
4: eval(expr, envir, enclos)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("MBOTest2.R")

Licence

Since R version 3.0.2 the plain BSD license is deprecated; R CMD check notes this. We should change to BSD_3_clause or BSD_2_clause plus a LICENSE file, as suggested by the R team. BSD 3 seems appropriate. Any suggestions?

Objective function should return more information than just y measures

At the moment I have a problem that my objective function

  1. Needs information about the current best y value.
  2. Should return more information than just the y value.

Here is a very simple and foolish example:

objfun = function(listOfValues, akt.best.y) {
  x1 = listOfValues$x1
  x2 = listOfValues$x2
  if ((x1 + x2) < akt.best.y) S = "ok" else S = "not ok"
  y = x1 * 3 + 5 * x2 - 2
  return(list(y = y, S = S))
}

Regarding the first point:

I will have to adapt the mbo function for my example. I tried the following in the mbo loop, but it does not work (of course):
akt.best= max(getOptPathY(opt.path, y.name, drop = TRUE))
evals = evalTargetFun(fun, par.set, xs, opt.path, control, show.info, oldopts, akt.best, ...)

Is it possible to implement my problem without changing the evalTargetFun?
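One hedged workaround that needs no change to evalTargetFun() is to keep the mutable state in an environment that the objective function closes over (names below are illustrative):

```r
# Mutable state shared with the objective via lexical scoping.
state = new.env()
state$best.y = Inf

objfun = function(x) {
  y = x$x1^2 + x$x2^2                     # toy objective
  if (y < state$best.y) state$best.y = y  # track the current best ourselves
  y
}

objfun(list(x1 = 2, x2 = 1))  # -> 5
state$best.y                  # -> 5
```

Extra per-evaluation information (like S above) can be logged to the same environment or to a file instead of being returned.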

Mention parallelMap in the tutorial

mlrMBO currently supports the parallelization of some internal functions with Bernd's parallelMap package. This should be briefly mentioned in the tutorial.

Typing errors in MBOcontrol object are not clearly identifiable

Hi,
Some typos produce misleading error messages.

For example, a typo in "infill.opt" yields the following message:

Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'lcb' of mode 'function' was not found

Because of this code in ProposePoints:

infill.opt.fun = switch(control$infill.opt,
  cmaes = infillOptCMAES,
  focussearch = infillOptFocus,
  ea = infillOptEA,
  # default: try to match the fun which is given as string
  match.fun(control$multipoint.method))


An error in infill.crit gives the following message, which is more useful than the previous one but still not clear:

Error in checkArg(infill.opt, "character", len = 1L, na.ok = FALSE) :
object 'infill.opt' not found
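A possible fix (a sketch only; checkInfillOpt is a hypothetical helper) is to validate the string against the known choices up front, so that a typo produces a message naming the valid options:

```r
# Validate the infill.opt string before any lookup happens.
checkInfillOpt = function(infill.opt) {
  choices = c("cmaes", "focussearch", "ea")
  if (!infill.opt %in% choices)
    stop(sprintf("infill.opt must be one of: %s (got '%s')",
                 paste(choices, collapse = ", "), infill.opt))
  infill.opt
}

checkInfillOpt("cmaes")        # ok
# checkInfillOpt("focusearch") # error that names the valid options
```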

Some learners produce errors with discrete Params

A lot of learners will fail to predict with discrete params because of dropped factor levels; here are some examples:

library(mlrMBO)
ps = makeParamSet(
  makeNumericVectorParam("x", len = 5, lower = 0, upper = 1),
  makeDiscreteParam("z", values = 1:10)
)
f = function(x) sum(x$x) + as.numeric(x$z)

mbo(f, ps,
  learner = makeBaggingWrapper(makeLearner("regr.kknn"), 10L, predict.type = "se"),
  control = makeMBOControl(init.design.points = 20, iters = 10))

mbo(f, ps,
  learner = makeBaggingWrapper(makeLearner("regr.lm"), 10L, predict.type = "se"),
  control = makeMBOControl(init.design.points = 20, iters = 10))

mbo(f, ps,
  learner = makeBaggingWrapper(makeLearner("regr.blackboost"), 10L, predict.type = "se"),
  control = makeMBOControl(init.design.points = 20, iters = 10))

mbo(f, ps,
  learner = makeBaggingWrapper(makeLearner("regr.nnet"), 10L, predict.type = "se"),
  control = makeMBOControl(init.design.points = 20, iters = 10))

Include the time estimate

We want to include the time aspect into both single- and multi-crit optimization, e.g., we want to have things like expected improvement per minute.
One special application in multicrit: include this when time is one of the target functions and we want to propose multiple points; here the time of one iteration is the maximum time over the proposed function evals.

We need to think about what we can do here.
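As a starting point, the per-minute variant could simply scale the infill value by a predicted runtime (a sketch with illustrative names; the runtime prediction would come from a second surrogate):

```r
# "EI per minute": rank candidate points by improvement per unit of time.
eiPerMinute = function(ei, predicted.minutes) {
  ei / pmax(predicted.minutes, 1e-8)  # guard against near-zero runtimes
}

# Same EI, different cost: the cheap point wins.
eiPerMinute(c(0.4, 0.4), c(2, 10))  # -> 0.20 0.04
```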

Test fails - initial design

Failure: mbo works correctly with and without initial design
mbo(f, ps, des, learner, ctrl) code did not generate an error

In particular, it is the test PROVIDE INITIAL DESIGN WITH TRAFO.

Errorhandling Thread

< I WILL UPDATE THIS WITH NEW GOOD IDEAS FROM BELOW BUT ANSWER BELOW! >

We need to discuss how errors are handled in the package. This is important, as otherwise we lose most of the info of long optimization runs.

There are errors of multiple kinds:

FE1) Function eval: an exception occurs. We could catch this.

FE2) Function eval: a crash that kills the entire R process. Problematic if the eval was done in the same process where mbo runs.

FE3) The function does not return, because it does not terminate, or terminates only after 100 years.

MBO1) An exception happens in our own code. It should not, but it could.

MBO2) A total crash of our own code.

Options to handle this:

O1) Always store all relevant information from mbo (opt.path, learner, control object and so on) on the master in an RData file every k iterations. This helps with ALL of the errors above in the sense that we do not lose PRIOR information. It does not help with the fact that the optimization stops.
Implement this in any case as a user option.
We can even try to code a warm-start / continue function.

O2) Catch FE1 errors via "try". Warn on the console about it, log a message to opt.path, and impute a value for the eval. Handles FE1 completely, but nothing else.

O3) Run evals in a separate R process (with walltime). Then basically do the same as in O2. Handles both FE1 and FE2, FE2 only if we can specify a timeout.
Incurs overhead and is much harder to implement.
We basically get this for free in parallelMap / BJ mode when we do multipoint evals. Maybe we could be tricky and use this for single-point evals as well.
Should be discussed.

O4) When an error occurs, either FE1 or MBO1, and the user does not want to or cannot impute values, we could return the opt.path somehow, either in global memory or on disk. Maybe this should simply be combined with O1: simply do a final "store-on-disk" then.

Against MBO1 and MBO2 we cannot do much except O1 / O4.

Discuss!
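Option O2 could be sketched roughly like this (safeEval and its arguments are illustrative, not package API):

```r
# Wrap one function evaluation: catch the error, warn, and impute a value.
safeEval = function(fun, x, impute.val) {
  y = try(fun(x), silent = TRUE)
  if (inherits(y, "try-error")) {
    warning("objective failed, imputing: ", attr(y, "condition")$message)
    y = impute.val
  }
  y
}

safeEval(function(x) stop("boom"), list(), impute.val = 1e6)  # -> 1e6 (with warning)
safeEval(function(x) 3, list(), impute.val = 1e6)             # -> 3
```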

Setting initial design by hand

Setting a design by hand requires the user to set the trafo attribute manually (the initial design must not be transformed). There must be a better solution. Maybe

  • assume that the initial design is not transformed (i.e., set attr(design, "trafo") = FALSE inside the mbo function), or
  • build a wrapper which the user has to call before passing the design to the function, i.e., design = makeDesign(design, transformed = FALSE).
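The second option could be as small as this sketch (makeDesign here is the proposed helper, not an existing function):

```r
# Record explicitly whether a hand-made design is already transformed,
# so mbo() does not have to guess.
makeDesign = function(design, transformed = FALSE) {
  attr(design, "trafo") = transformed
  design
}

des = makeDesign(data.frame(x = runif(5)), transformed = FALSE)
attr(des, "trafo")  # -> FALSE
```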

Finish renaming issues

random.points ---> focussearch.points

infill.opt="random" ---> "focussearch"

todo-files/parego.R
74: infill.opt="random", infill.opt.random.points=1000)

shiny/server.R
79: infill.crit="ei", infill.opt="random", infill.opt.random.points=2000)

test_src.R
48: # infill.opt.random.maxit=5, infill.opt.random.points=1000L)

inst/examples/ex_1d_1.R
24: infill.crit="ei", infill.opt="random", infill.opt.random.points=500)

inst/examples/ex_autoplot.R
24:# infill.crit="ei", infill.opt="random", infill.opt.random.points=500)
51:# ctrl = makeMBOControl(init.design.points=20, iters=5, infill.opt.random.points=100, noisy=TRUE)
65: infill.crit="ei", infill.opt="random", infill.opt.random.points=2000)

inst/examples/ex_1d_3.R
34: infill.opt.random.points=100, noisy=TRUE)

inst/examples/ex_2d_1.R
20: infill.crit="ei", infill.opt="random", infill.opt.random.points=2000)

inst/tests/test_misc.R
18:# ctrl = makeMBOControl(minimize=FALSE, infill.crit="mean", iters=30, infill.opt.random.points=100)
70:# opt = mbo(fit, ps, learner = surrogate, control = makeMBOControl(infill.opt.random.points=10))

inst/tests/test_exampleRun.R
10:# infill.opt="random", infill.opt.random.points=10)

inst/tests/test_mbo_impute.R
21: ctrl = makeMBOControl(iters=20, infill.opt.random.points=500)
23: ctrl = makeMBOControl(iters=20, infill.opt.random.points=500, impute=function(x, y, opt.path) 0)
25: ctrl = makeMBOControl(iters=50, infill.opt.random.points=500)
27: ctrl = makeMBOControl(iters=50, infill.opt.random.points=500, impute=function(x, y, opt.path) 0, impute.errors=TRUE)

Sometimes we reduce variables to only 1 factor level - but not all learners can work with such variables

Example with kknn:

library(mlrMBO)
fun = function(x)
  sin(x$num2) + ifelse(x$disc1 == "a", sin(x$num1), 0)
ps = makeParamSet(
  makeDiscreteParam("disc1", values = c("a", "b")),
  makeNumericParam("num1", lower = 0, upper = 1, 
                   requires = quote(disc1 == "a")),
  makeNumericParam("num2", lower = 0, upper = 1)
)

res = mbo(fun, ps,
           learner = makeBaggingWrapper(makeLearner("regr.kknn"), 10L, predict.type = "se"), 
           control = makeMBOControl( init.design.points = 20,
                                     iters = 10,
                                     infill.crit = "ei"))

I can think of 3 possible solutions:

  1. Guarantee inside mlrMBO (in the focus search) that every variable has at least 2 factor levels
  2. Inside mlrMBO, before learning the model, remove variables with only 1 level
  3. Force the user to use a preprocessing wrapper for their learner which removes variables with only 1 factor level
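Solution 2 could be sketched as a small preprocessing step applied to the design before training the surrogate (dropConstantFactors is illustrative):

```r
# Drop factor columns that have fewer than 2 observed levels.
dropConstantFactors = function(df) {
  keep = vapply(df, function(col)
    !is.factor(col) || nlevels(droplevels(col)) >= 2L, logical(1))
  df[, keep, drop = FALSE]
}

d = data.frame(a = factor(c("x", "x")), b = factor(c("u", "v")), y = 1:2)
names(dropConstantFactors(d))  # -> "b" "y"
```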

Bug in Initial Design

The following produces an error.


library(mlrMBO)

objfun = function(x) 1


par.set = makeParamSet(
  makeNumericParam("x", lower=0,upper=1),
  makeIntegerParam("k", lower=1, upper=2)
)


control = makeMBOControl(
  iters = 1,
  init.design.args=list(k=3, dup=4)  
)

learner_rf = makeLearner("regr.randomForest")

mbo(objfun, par.set, control=control, learner = learner_rf)

Error in (function (n, k, dup = 1) :
formal argument "k" matched by multiple actual arguments

8: (function (n, k, dup = 1)
{
if (length(n) != 1 | length(k) != 1 | length(dup) != 1)
stop("n, k, and dup may not be vectors")
if (any(is.na(c(n, k, dup))))
stop("n, k, and dup may not be NA or NaN")
if (any(is.infinite(c(n, k, dup))))
stop("n, k, and dup may not be infinite")
if (n != floor(n) | n < 1)
stop("n must be a positive integer\n")
if (k != floor(k) | k < 1)
stop("k must be a positive integer\n")
if (dup != floor(dup) | dup < 1)
stop("The dup factor must be a positive integer\n")
result <- numeric(k * n)
result2 <- .C("maximinLHS_C", as.integer(n), as.integer(k),
as.integer(dup), as.integer(result))[[4]]
eps <- runif(n * k)
result2 <- (result2 - 1 + eps)/n
return(matrix(result2, nrow = n, ncol = k, byrow = TRUE))
})(n = 20L, k = 2L, k = 3, dup = 4)
7: do.call(fun, c(list(n = n, k = k), fun.args))
6: generateDesign(control$init.design.points, par.set, control$init.design.fun,
control$init.design.args, trafo = FALSE)
5: mbo(objfun, par.set, control = control, learner = learner_rf) at bla.R#21
4: eval(expr, envir, enclos)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("C:/Users/Karin/Desktop/bla.R")

Integrate parallelMap

  • At least for initial design.
  • Think about other embarrassingly parallel bottlenecks

Errors by passing the design in mbo

Hi, the following setting causes errors:

library(mlrMBO)
library(soobench)

objfun = generate_branin_function()

ps = makeNumericParamSet(len = number_of_parameters(objfun), lower = lower_bounds(objfun), upper = upper_bounds(objfun))

design.x = generateDesign(30, ps)
y = apply(design.x, 1, objfun)
design = cbind(design.x, y)
attr(design, "trafo") = FALSE
learner_km = makeLearner("regr.km", predict.type = "se", covtype = "matern3_2", nugget.estim = TRUE)

ctrl = makeMBOControl(
  iters = 50,
  infill.crit = "ei",
  init.design.points = 10,
  infill.opt = "focussearch")

m = mbo(makeMBOFunction(objfun), design = design, par.set = ps, learner = learner_km, control = ctrl, show.info = TRUE)

I have found that it comes from the function generateMBODesign, line 57:
if (all(y.name %in% colnames(design.x)))
design.x can never contain the y names (see line 40: design.x = dropNamed(design, y.name)).

I changed lines 57-58 as follows:
if (all(y.name %in% colnames(design))) {
design.y = data.frame(design[, y.name])
names(design.y)=y.name

And the lines 70-71 as follows:
ys = convertRowsToList(as.vector(design.y))
Map(function(x,y) addOptPathEl(opt.path, x=x, y=unlist(y), dob=0), xs, ys)

With these changes the error message no longer appears.
If you think the changes are ok I will commit them.

One Starter for All

We don't want to have 2 functions for "normal" MBO and ParEGO, but one function with a method parameter. This is not crucial yet, but after implementing some more multicrit methods this should be done.

This would include: renaming the old mbo function to soMBO (single objective), writing a new function mbo with exactly the same interface that just performs some param checks and then calls the respective real function, and introducing a new method param.

Restructure the control object

I think we talked about this a while ago: At the moment, the control object and its constructor are ugly, huge things, and they are going to grow even more. I'm working on the new feature for ParEGO we discussed last Friday; this will add another parameter. And it won't stop growing. I doubt anyone except us can keep track of this mass of parameters. Most new features we implement add one or more new parameters, and we dump everything into this one function / object.

What we should do (not now, but in the near future) is restructure this a bit. We don't want to change our internal usage of the control object; we just want a better user interface.
We could split the function into useful parts, like one part for infill.crit options, one part for multipoint proposal, one part for multicrit, etc., since some parameters will never be set at the same time; e.g., the multipoint and the ParEGO params can't be set at the same time.
In the end there would be the makeControlObject function to set some main params and fuse them with the specialized control objects.

Discrete parameters

We have to adapt the code for the case of discrete parameters. E.g., after generating the LHS initial design we have to transform the output to the right discrete values.

imputeFeatures

must be removed; mlr code must be used instead.

Think about Pascal's remarks regarding 2*max.

Better log output

At the moment, the log output is rounded to 2 digits, like

[mbo] 0: cost=14092.26; gamma=0.00; epsilon=0.01 : error=0.762, execTime=2.464
[mbo] 0: cost=0.01; gamma=0.00; epsilon=0.00 : error=0.028, execTime=24374.496
[mbo] 0: cost=0.00; gamma=0.18; epsilon=0.00 : error=0.028, execTime=22879.370

As you see, this is bad, because many parameters are rounded to 0.00. We want output like 1.44e-4.
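A format based on significant digits, e.g. via sprintf("%.3g"), would keep small magnitudes readable; a sketch:

```r
# Format to 3 significant digits; %g switches to scientific notation
# automatically when values become very small or very large.
fmt = function(x) sprintf("%.3g", x)

fmt(0.000144)  # -> "0.000144"
fmt(1.44e-07)  # -> "1.44e-07"
```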
