I've made a pass over the SLEP. It is overall very clear and the different cases really help. Thanks @jnothman and @adrinjalali for your efforts.
Here's a list of questions and comments that I have after reading it.
we can consider the use of keys that are not limited to strings or valid identifiers (and hence are not limited to using _ as a delimiter).
I don't understand how we are currently limited to using _
as a delimiter. Should this be __
? But even then I still don't follow.
TODO: proceed from here. Note that this change implies the need to add
a parameter to unwrap_X, since we will now append an additional column to X.
I don't understand why we don't need to add an additional column to X in this case.
while a GridSearchCV wrapping a Pipeline currently takes parameters with keys like {step_name}{prop_name}, this explicit routing, and conflict with GridSearchCV routing destinations, implies keys like estimator{step_name}__{prop_name}.
I'm not sure I completely understand this. A small illustration might help? Also, does "GridSearchCV routing destinations" refers to param_grid
instead?
All consumers would be required to check that
This sentence seems unfinished?
About solution 4:
Here the meta-estimator provides only what its each of its children requests. The meta-estimator would also need to request, on behalf of its children, any prop that descendant consumers require.
I could be wrong, but this doesn't seem to define the "essence" of solution 4, especially comparatively to solution 3. It seems to me that in solution 3, as well, the meta estimators only provides what the childrens need/request:
Each meta-estimator is given a routing specification which it must follow in passing only the required parameters to each of its children
For estimators to be cloned, this request information needs to be cloned with it
I'm not sure I understand this sentence. Seems like the grammar may be wrong?
get_props_request will return a dict
It is said prior that get_props_request would return a list (and possibly a dict)
One thing that isn't clear to me is the behaviour of solution 4., outside of the use of meta-estimators.
For example, what does this do:
lr = LogisticRegression().set_props_request(['sample_weights'])
lr.fit(X, y, sample_weights=None) # ?
lr = LogisticRegression().set_props_request([])
lr.fit(X, y, sample_weights=sample_weights) # ?
-
All examples seem to illustrate how cross_validate
is used, but none of them illustrate what a call to fit()
may look like. I think this may be useful as well (and might resolve .8 for me)
**kw syntax will be used to pass props by key.
In the examples of solution 4., **kw
are never used. Should they? Or does this refer to a call to fit
?
- Other stuff
- There are still a bunch of TODOs: we should remove them before calling for a vote
- Related: "(likley out of scope) passing sample properties..." The scope of the SLEP should be non-ambiguous so maybe we want to remove this bullet point.
- Not important but the use
# %%
doesn't render in the html docs.
- It would help to have a brief description of the proposed solution at the top, or an "abstract" section, as per our SLEP template.