mlr-org / paradox
ParamHelpers Next Generation
Home Page: https://paradox.mlr-org.com
License: GNU Lesser General Public License v3.0
Use case 1: You want to tune RF::mtry as a percentage in [0, 1], not as an integer from 1..k.
Use case 2: You have created some smart heuristics to set params, like a, b, c, which execute some code.
Proposal: You can add a hyperparameter (HP) plus some piece of code to a PipeOp; the code maps the setting of that new param to values of already existing ones. You can then use (or tune) the new one.
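A minimal sketch of use case 1 under this proposal, assuming the new hyperparameter is called mtry.frac and that the number of features k is known when the mapping runs. make_mtry_trafo() is a hypothetical helper, not an existing paradox or mlr function:

```r
# Hypothetical helper: builds a function that maps the new parameter
# "mtry.frac" in [0, 1] onto the learner's existing integer "mtry" in 1..k
make_mtry_trafo <- function(k) {
  k <- as.integer(k)
  function(x) {
    # x is a named list holding the value of the new parameter
    stopifnot(is.numeric(x$mtry.frac), x$mtry.frac >= 0, x$mtry.frac <= 1)
    # round to the nearest integer, but never go below 1
    list(mtry = max(1L, as.integer(round(x$mtry.frac * k))))
  }
}

trafo <- make_mtry_trafo(k = 10)
trafo(list(mtry.frac = 0.3))
```

The tuner only ever sees mtry.frac, while the learner only ever receives mtry.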
That's undocumented, and does it also make sense?
I think it might be good if the user could set a value for a ParamInt, for example, so that he/she can set the whole hyperparameter set manually. This feature would allow phng to be used in any R package as an option manager.
I can see that this is then "induced" as a function call for the whole ParamSet.
But does this make sense? Maybe it does, but let's discuss this briefly.
And it needs to be documented then.
So can the user set lower = 1, upper = 2, allow.inf = TRUE?
You are calling assertClass; that's wrong.
This seems not to be consistently implemented? If it works and is "simple" enough, it can stay!
"forbidden" might be a better name
Apparently I can ask about the bounds of all params in the set, so these things are there:
but I cannot ask about nlevels or values.
Matthias has this in ConfigSpace; this might be more general than adding a full list of params at construction?
I would like to hide from the user the fact that the tree param set contains actual values.
I have a problem with copying the ParamSet class, because clone(deep = TRUE) does not seem to make a real deep copy. See the example below:
library(paradox)
#> Loading required package: data.table
pl1 <- paradox::ParamSet$new(
params = list(
ParamInt$new("a"),
ParamInt$new("b")
)
)
pl1$params
#> $a
#> a [integer]: {-Inf, ..., Inf}
#>
#> $b
#> b [integer]: {-Inf, ..., Inf}
pl2 = pl1$clone(deep = TRUE)
pl2list <- pl2$params
invisible(
lapply(
pl2list,
function(x) x$id = paste("x", x$id, sep = ":"))
)
# new param list with renamed parameters:
# so far so good.
ParamSet$new(params = pl2list)
#> ParamSet: parset
#> Parameters:
#> x:a [integer]: {-Inf, ..., Inf}
#> x:b [integer]: {-Inf, ..., Inf}
# However the ids in pl1 were also changed:(
pl1
#> ParamSet: parset
#> Parameters:
#> x:a [integer]: {-Inf, ..., Inf}
#> x:b [integer]: {-Inf, ..., Inf}
The explanation appears to be quite simple: R6 copies the params list, but not the elements in that list. (I think this behavior of R6 makes sense, because it should not examine every data structure in each field to check whether there is something to copy.) However, in the ParamSet case, I think all elements of the params list should be copied when the deep parameter is set to TRUE.
What do you think?
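A sketch of that suggestion using R6's own hook: a class can define a private deep_clone(name, value) method, which clone(deep = TRUE) calls for each field. MiniParam and MiniParamSet are hypothetical stand-ins for ParamInt and ParamSet from the reprex above, not paradox's actual classes, and the sketch assumes the R6 package is installed.

```r
library(R6)

# Hypothetical stand-in for a single parameter (only an id field)
MiniParam <- R6Class("MiniParam", public = list(
  id = NULL,
  initialize = function(id) {
    self$id <- id
  }
))

# Hypothetical stand-in for a parameter set holding a list of MiniParams
MiniParamSet <- R6Class("MiniParamSet",
  public = list(
    params = NULL,
    initialize = function(params) {
      self$params <- params
    }
  ),
  private = list(
    # clone(deep = TRUE) calls deep_clone() once per field; R6 objects stored
    # inside a list are not cloned by default, so clone them explicitly here
    deep_clone = function(name, value) {
      if (name == "params") {
        lapply(value, function(p) p$clone(deep = TRUE))
      } else if (inherits(value, "R6")) {
        value$clone(deep = TRUE)
      } else {
        value
      }
    }
  )
)

ps1 <- MiniParamSet$new(list(a = MiniParam$new("a"), b = MiniParam$new("b")))
ps2 <- ps1$clone(deep = TRUE)
invisible(lapply(ps2$params, function(p) p$id <- paste("x", p$id, sep = ":")))
ps1$params$a$id  # "a": the original ids are untouched
```

The cost is that the class has to know which fields hold containers of R6 objects, which matches the point that R6 itself should not inspect every field.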
What is this supposed to do? Also, the name is very strange.
Please start documenting stuff.
Why is this a slot member, while is.finite is computed from the range?
If we chain a filter wrapper with a tune wrapper and want to co-tune the hyperparameters of the learner (random forest) and fw.perc, this is currently difficult: fw.perc comes first and selects the number of features that are fed into the tune wrapper, but the valid range of the random forest's mtry parameter depends on that number of features, which varies in each tuning iteration. Our new version should make this kind of tuning easier.
library(ParamHelpers)

ps.ranger = function(p) {
  makeParamSet(
    # FIXME: mtry must depend on the other parameter "fw.perc"
    # makeIntegerParam("mtry", lower = as.integer(p / 10), upper = as.integer(p / 1.5)),
    # makeIntegerParam("min.node.size", lower = 1L, upper = 50L, default = 5L),
    # makeIntegerParam("num.trees", lower = 100L, upper = 5000L, default = 500L),
    makeNumericParam("sample.fraction", lower = 0.1, upper = 1, default = 0.5),
    # makeDiscreteParam("fw.perc", values = PERF_GRID),
    makeNumericParam("fw.perc", lower = 0.001, upper = 0.8)  # feature selection percentage
  )
}
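One way to express the FIXME as a joint mapping, sketched under assumptions: both fw.perc and a hypothetical fraction mtry.frac are tuned, and p is the number of features before filtering. joint_trafo() is illustrative only. Because mtry is derived from the number of features that survive the filter, it can never exceed that number, whatever value fw.perc takes.

```r
joint_trafo <- function(x, p) {
  # number of features kept by the filter (at least one)
  n_feat <- max(1L, as.integer(ceiling(x$fw.perc * p)))
  # mtry as a fraction of the *filtered* features, clamped to 1..n_feat
  mtry <- min(n_feat, max(1L, as.integer(round(x$mtry.frac * n_feat))))
  list(fw.perc = x$fw.perc, sample.fraction = x$sample.fraction, mtry = mtry)
}

res <- joint_trafo(list(fw.perc = 0.25, mtry.frac = 0.4, sample.fraction = 0.5), p = 100)
res$mtry
```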
Because that was really bad in ParamHelpers (PH), where NA could also be a regular value.
Compare to the check function in ParamSetFlat that gets passed to super$initialize.
Needs to be properly implemented and unit tested.
The expressions in the requirements are supposed to work vectorized.
However, in the checkmate assert/test/check functions we only evaluate them on single values. Somehow we would like to have a TRUE/FALSE vector for a batch of values.
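One way to get that TRUE/FALSE vector, assuming requirements are stored as unevaluated R expressions and a batch of configurations is held in a data.frame (one row per configuration): base R's eval() accepts a data.frame as the evaluation environment, so the expression is evaluated once over whole columns. check_requirement() is a hypothetical name, and mapping NA (e.g. a missing parent value) to FALSE is a design assumption.

```r
check_requirement <- function(req, configs) {
  # evaluate the expression with the columns of configs as variables
  res <- eval(req, envir = configs)
  # treat NA as "requirement not met"
  !is.na(res) & res
}

configs <- data.frame(
  sort_algo = c("quick", "merge", NA, "quick"),
  stringsAsFactors = FALSE
)
check_requirement(quote(sort_algo == "quick"), configs)
```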
We could use the PCS definition and parse strings like these:
sort-algo{quick,insertion,merge,heap,stooge,bogo} [bogo]
quick-revert-to-insertion{1,2,4,8,16,32,64} [16]
quick-revert-to-insertion|sort-algo in {quick}
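A minimal sketch of parsing the two PCS line types shown above (a categorical parameter with a default, and a condition) with base R regular expressions. It handles only these two forms, keeps all values as strings, and does not cover numeric ranges or the rest of the PCS grammar; parse_pcs_line() is a hypothetical helper.

```r
parse_pcs_line <- function(line) {
  line <- trimws(line)
  id_re <- "([A-Za-z0-9_.-]+)"
  # child|parent in {v1,v2,...}
  cond_re <- paste0("^", id_re, "\\s*\\|\\s*", id_re, "\\s+in\\s*\\{([^}]*)\\}$")
  # name{v1,v2,...} [default]
  cat_re <- paste0("^", id_re, "\\s*\\{([^}]*)\\}\\s*\\[([^]]*)\\]$")
  split_vals <- function(s) trimws(strsplit(s, ",", fixed = TRUE)[[1]])

  m <- regmatches(line, regexec(cond_re, line, perl = TRUE))[[1]]
  if (length(m) > 0) {
    return(list(type = "condition", child = m[2], parent = m[3], values = split_vals(m[4])))
  }
  m <- regmatches(line, regexec(cat_re, line, perl = TRUE))[[1]]
  if (length(m) > 0) {
    return(list(type = "categorical", id = m[2], values = split_vals(m[3]), default = m[4]))
  }
  stop("unsupported PCS line: ", line)
}

parse_pcs_line("sort-algo{quick,insertion,merge,heap,stooge,bogo} [bogo]")
parse_pcs_line("quick-revert-to-insertion|sort-algo in {quick}")
```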
OptPath
We should have as.data.table; as.data.frame simply calls it.
Add: remove message, transform_x.
OptPath should have a transform_x method.
That's not the same operation?
Currently the tree branches only have single-parent dependencies; we should broaden this to allow multi-parent dependencies.
Function values are stored in named lists.
To transform them to a single string you could use paste(names(x), x, sep = "=", collapse = ",")
This is problematic for
So we want to shorten and format some of them.
Formatting and shortening should be configurable.
Each ParamNode should be able to transform a named list to a character.
I propose Param(Set/Real/...)$value_to_string(x)
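A sketch of the proposed formatting as a plain function rather than a Param method, assuming a named list of values. The digits and max_chars arguments stand in for the configurable formatting and shortening and are hypothetical.

```r
value_to_string <- function(x, digits = 3, max_chars = 20) {
  vals <- vapply(x, function(v) {
    s <- if (is.numeric(v)) {
      format(signif(v, digits))          # shorten numbers to `digits` significant digits
    } else if (is.character(v) || is.logical(v)) {
      as.character(v)
    } else {
      paste(deparse(v), collapse = " ")  # e.g. functions or other objects
    }
    s <- paste(s, collapse = " ")        # collapse vectors into one token
    # truncate overly long values and mark the cut with "..."
    if (nchar(s) > max_chars) paste0(substr(s, 1, max_chars - 3), "...") else s
  }, character(1))
  paste(names(x), vals, sep = "=", collapse = ",")
}

value_to_string(list(a = 1.23456, b = "gini"))
```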
a) That seems specific to LHS designs? Or is there any other use?
b) At least its name is bad.
c) I don't even know whether we should support LHS. I have never seen evidence that they are that worthwhile, and they are hard to even define properly for complex spaces.
Otherwise nobody has a chance of checking whether the intended semantics work properly.
cond = Condition$new(child=param, parent=param, cond = cond_equal(rhs))
ps$add_condition(cond)
rhs is some value / list of values
This can be implemented with:
cond_equal = function(rhs) structure(function(x) x == rhs, rhs = rhs)  # operator == with added attribute "rhs"
We can now easily implement a "check" for feasibility:
go through all params. If a param has no parent, check its value. If it has a parent, first check that the parent's value satisfies the condition.
We can easily construct a tree from the conditions.
Keep two lists, S and T: S holds all params, T is empty. Take an element from S.
If it either has no parent or its parent is already in T, create a node, move the param to T, and link the node to its parent's node; otherwise leave it in S. Repeat until S is empty.
Sampling can easily be implemented with rejection sampling, or we can use the tree.
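The activity check and the S/T construction described above can be sketched in base R, assuming each param is a plain list with an optional parent id and a condition function built by cond_equal(). is_active(), is_feasible() and build_order() are hypothetical names, and the params reuse the PCS sorting example. Value checks against bounds are omitted, and build_order() only returns a parent-before-child order (the node linking is left out).

```r
cond_equal <- function(rhs) structure(function(x) x == rhs, rhs = rhs)

params <- list(
  sort_algo = list(parent = NULL, cond = NULL),
  revert = list(parent = "sort_algo", cond = cond_equal("quick"))
)

# a param is active if it has no parent, or its parent is active and
# the parent's value in x satisfies the condition
is_active <- function(id, x, params) {
  p <- params[[id]]
  if (is.null(p$parent)) return(TRUE)
  is_active(p$parent, x, params) && isTRUE(p$cond(x[[p$parent]]))
}

# x (a named list of values) is feasible if every value it sets
# belongs to a known, active param
is_feasible <- function(x, params) {
  all(vapply(names(x), function(id) {
    id %in% names(params) && is_active(id, x, params)
  }, logical(1)))
}

# S/T construction: repeatedly move params whose parent is already in T
build_order <- function(params) {
  todo <- names(params)   # S
  done <- character(0)    # T
  while (length(todo) > 0) {
    ready <- vapply(todo, function(id) {
      parent <- params[[id]]$parent
      is.null(parent) || parent %in% done
    }, logical(1))
    if (!any(ready)) stop("cyclic or unknown parent dependency")
    done <- c(done, todo[ready])
    todo <- todo[!ready]
  }
  done
}

is_feasible(list(sort_algo = "quick", revert = 16), params)
build_order(params)
```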
Compare to @section Methods: in ParamSetFlat.R
x = ParamReal$new("x", lower = 1, special.vals = list("a"))
that does not even create an object.
Is there something like this?
E.g. properly import BBmisc::vcapply et al., and do not write BBmisc::vcapply.
This should not be in ParamSet and Param, but separate.
We also need to be able to flexibly implement different samplers.
This might work.
A couple of mini classes like this:
PSamplerIntUnif
PSamplerNumNormal
PSamplerNumUnif
They all inherit from PSampler and implement p$sample(n).
Then we have this:
pss = ParamSetSamplerIndep$new(list of samplers)
pss$sample(10)
ParamSetSamplerIndep inherits from ParamSetSampler.
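The proposed sampler hierarchy can be prototyped with plain closures before committing to R6 classes. In this sketch each 1-D sampler is a list with a sample(n) function, and the independent ParamSet sampler draws every column separately and binds them into a data.frame. The snake_case constructor names are hypothetical counterparts of PSamplerIntUnif, PSamplerNumUnif, PSamplerNumNormal and ParamSetSamplerIndep.

```r
psampler_int_unif <- function(lower, upper) {
  lower <- as.integer(lower); upper <- as.integer(upper)
  # uniform over lower..upper; avoids sample()'s special case for length-1 input
  list(sample = function(n) lower + sample.int(upper - lower + 1L, n, replace = TRUE) - 1L)
}
psampler_num_unif <- function(lower, upper) {
  force(lower); force(upper)
  list(sample = function(n) runif(n, lower, upper))
}
psampler_num_normal <- function(mean, sd) {
  force(mean); force(sd)
  list(sample = function(n) rnorm(n, mean, sd))
}
# samples every param independently; one column per named sampler
paramset_sampler_indep <- function(samplers) {
  list(sample = function(n) as.data.frame(lapply(samplers, function(s) s$sample(n))))
}

set.seed(1)
pss <- paramset_sampler_indep(list(
  mtry = psampler_int_unif(1, 10),
  sample.fraction = psampler_num_unif(0.1, 1),
  shift = psampler_num_normal(0, 1)
))
pss$sample(10)
```

A dependent sampler (e.g. one that respects conditions) would only need to provide the same sample(n) interface.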
ParamSet interface
These should be removed:
ParamSetBase, ParamBase, ParamNode, ParamHandle
Do we need the following to be public?
sampleList
and directly sample. Or is it exactly what getRecursiveList() does? Compare to the public transform method in ParamSetFlat.
Many properties of ParamNode do not make sense for ParamSets? Like storage.type, tags, etc.?
(transferred from @berndbischl FIXME)
Doesn't really make sense and couples it too tightly.
This should really go into a dedicated service class or service functions.
Some functions are not exported although they seem to be needed
e.g.
ParamTreeFac
@smilesun Please check that the vignettes run with an installed version of phng.
This makes no sense. "sample" is a stochastic function; for unit tests it is much better to have predefined objects.
It might be OK to use this a couple of times, but it is used very often, while it should probably ONLY be called in test_sampler.R.
Can we please say how we are going to handle those?
Because I currently see nothing in the package about this, and this is actually the hard part.
Otherwise it should throw an error.
The code should not be in the ParamSet, but separate.
It might simply look like this:
generate_design_lhs(par_set)
generate_design_random(par_set)
generate_design_grid(par_set)
but we need to have an "augment" function?
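A sketch of two of the proposed stand-alone design functions, kept outside any ParamSet class as suggested above. It assumes a ParamSet is described by a plain named list of params, each with a type ("int", "num" or "cat") and lower/upper or levels; that representation is hypothetical. The LHS variant and the "augment" question are left open.

```r
grid_values <- function(p, resolution) {
  switch(p$type,
    int = unique(as.integer(round(seq(p$lower, p$upper, length.out = resolution)))),
    num = seq(p$lower, p$upper, length.out = resolution),
    cat = p$levels,
    stop("unknown type: ", p$type))
}

# full factorial design over the per-param grids
generate_design_grid <- function(par_set, resolution) {
  expand.grid(lapply(par_set, grid_values, resolution = resolution),
              stringsAsFactors = FALSE, KEEP.OUT.ATTRS = FALSE)
}

# n independent uniform draws per param
generate_design_random <- function(par_set, n) {
  as.data.frame(lapply(par_set, function(p) switch(p$type,
    int = as.integer(p$lower) + sample.int(as.integer(p$upper - p$lower) + 1L, n, replace = TRUE) - 1L,
    num = runif(n, p$lower, p$upper),
    cat = sample(p$levels, n, replace = TRUE),
    stop("unknown type: ", p$type))), stringsAsFactors = FALSE)
}

par_set <- list(
  mtry = list(type = "int", lower = 1, upper = 3),
  split = list(type = "cat", levels = c("gini", "extratrees"))
)
generate_design_grid(par_set, resolution = 3)
```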
This is not urgent, but it kills at least two birds with one stone:
a) we can parse 3rd-party PCS files
b) we have a way to write down param sets with much less typing
Then we probably don't need more syntactic sugar for other abbreviations.
I left some FIXMEs in the code while traveling.
I should convert them to issues very soon.
Probably repParam is better.
One could also think about rep.Param and implement a new S3 method, but as the id is changed, this is kind of a violation of the fact that R in other cases simply copies the object?
Or call it vectorizeParam.
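A sketch of the replicate-with-new-ids idea, assuming a param is a plain list with an id field; rep_param() and the id-plus-index naming scheme are hypothetical. Because plain lists have copy-on-modify semantics, changing the id of a copy never touches the original, unlike the R6 reference objects in the deep-copy issue.

```r
rep_param <- function(param, k) {
  copies <- lapply(seq_len(k), function(i) {
    p <- param                   # copy-on-modify: the original stays unchanged
    p$id <- paste0(param$id, i)  # e.g. "x" -> "x1", "x2", ...
    p
  })
  names(copies) <- vapply(copies, function(p) p$id, character(1))
  copies
}

x <- list(id = "x", lower = 0, upper = 1)
names(rep_param(x, 3))
```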
creates a data table