Comments (7)
Naive plan: imitate mlr and/or OpenML.
@fkiraly has raised some issues:
The mlr design, in my opinion, has some flaws.
A key pain point for me is the treatment of the data - where does it go? Is it part of the task (e.g., pointed to), or not? Is the task applied to the data, is the model applied to the task? And so forth.
We are meeting soon to discuss in conjunction with another project. Watch this space for further discussion. Suggestions welcome.
from mlj.jl.
What is a Task? Here's the present design for the supervised tasks:
```julia
struct SupervisedTask{U} <: MLJTask  # U is true for single target
    data                             # a table
    targets                          # list of names
    ignore::Vector{Symbol}           # list of names
    is_probabilistic
    target_scitype
    input_scitypes
    input_is_multivariate::Bool
end
```
In discussions at Turing there was consensus that tasks exclude any description of evaluation (a point of departure from OpenML), although this is not cast in stone.
So whether a task is regression or classification is part of the task description, namely in target_scitype (which is actually a little more informative).
At present the Task constructor assumes the data meets the spec outlined at doc/getting_started.md and infers the last three fields from the data. However, my idea is to eventually make the constructor more flexible, coercing the data if necessary based on user interaction. The user could also let the constructor make educated guesses about the intended scientific types, and so forth.
The user might give the task constructor a kwarg target=MultiClass (i.e., a classifier) and, supposing the target eltype is Int, the target column is then coerced to a CategoricalValue eltype. If no kwarg is given, the constructor infers the scientific type from the data (in this case Count) and reports that it has done so.
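The intended coercion behaviour might look something like the following sketch. This is hypothetical constructor logic, not the actual MLJ implementation; `coerce_target` is a made-up helper, and only `categorical` (from CategoricalArrays.jl) is a real library call:

```julia
using CategoricalArrays

# Hypothetical sketch: if the user asks for a Multiclass target and the
# column has an integer eltype, coerce it to categorical; otherwise
# infer the scientific type from the data and report the inference.
function coerce_target(y::AbstractVector; target = nothing)
    if target == :Multiclass && eltype(y) <: Integer
        return categorical(y)    # eltype becomes a CategoricalValue
    end
    @info "No coercion requested; inferring scitype Count from eltype $(eltype(y))"
    return y
end

y = coerce_target([1, 2, 1, 3]; target = :Multiclass)
# y is now a categorical vector suitable for classification
```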
The present design does suppose that, once the task is constructed, the data it wraps conforms to our standard. This aspect I would be reluctant to change at this point.
Oops. Closed by accident.
> The present design does suppose that, once the task is constructed, the data it wraps conforms to our standard. This aspect I would be reluctant to change at this point.

It seems to me this is not too restrictive, given that there can always be a "pre" step where the data is verified and/or coerced, right?
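Such a "pre" step might look like this sketch, using `coerce` and `schema` from ScientificTypes.jl (the table and column names here are made up for illustration):

```julia
using DataFrames, ScientificTypes

df = DataFrame(age = [23, 41, 35], label = [1, 2, 1])

# Verify/coerce columns before wrapping the data in a task:
df = coerce(df, :label => Multiclass)   # :label becomes categorical
schema(df)                              # inspect the resulting scitypes
```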
Pasting @kirtsar 's comment from the merged issue #96:
"What should the Task do? My vision of working with a Task object is something like this:

Assume we have some data for supervised learning: X, y. X and y can be any reasonable type (X is a Matrix, DataFrame, ...; y is some subtype of AbstractVector). Then

task = Task(data = X, target = y, goal = SomeGoal(optional args))

where SomeGoal is something from (for example):

- Binary(is_proba = true/false)
- Multiclass(is_proba = true/false)
- Regression(is_proba = true/false)

Based on the type of the task, the output of X_and_y should be appropriate (Continuous, discrete, ...)."
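For concreteness, the proposed goal-based API could be sketched like so. The type names and the keyword constructor are purely hypothetical; none of this is existing MLJ code:

```julia
# Hypothetical sketch of the goal-based task API proposed above.
abstract type Goal end

struct Binary <: Goal
    is_proba::Bool
end
struct Multiclass <: Goal
    is_proba::Bool
end
struct Regression <: Goal
    is_proba::Bool
end

struct Task{G<:Goal}
    data        # X: Matrix, DataFrame, ...
    target      # y: some subtype of AbstractVector
    goal::G
end

# Keyword constructor matching the proposed call syntax:
Task(; data, target, goal) = Task(data, target, goal)

task = Task(data = rand(10, 3), target = rand(1:2, 10), goal = Binary(true))
```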
@kirtsar I think the current design satisfies your requirements?
Except that X and y are not split explicitly; only a column reference indicates what y is. Which, in my opinion, makes a lot of sense, since there are other types of tasks where the specification is not easily done by splitting the data in two.
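Under the current design, that column reference lives in the `targets` field of the `SupervisedTask` struct shown earlier. A hypothetical instantiation (field values purely illustrative; it assumes some table `df` with a `:price` target column):

```julia
# Purely illustrative field values; in practice the constructor
# infers the last three fields from the data.
task = SupervisedTask{true}(
    df,              # data: one table holding features and target together
    [:price],        # targets: name(s) of the target column(s)
    Symbol[],        # ignore: columns to exclude
    false,           # is_probabilistic
    Continuous,      # target_scitype
    [Continuous],    # input_scitypes
    true,            # input_is_multivariate
)
```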
A basic task interface is now in place. Let's open new issues for possible enhancements.