Comments (6)
Hi @mat-ej !
Many thanks for trying CatBoost!
Multi-quantile loss is technically similar to multiclass classification: a single label, multiple predictions.
At the moment it is impossible to train models with a user-defined multi-quantile loss, because there is no way to specify the prediction dimension.
For now, you may train a separate model for each quantile independently.
However, predictions for different quantiles may then disagree slightly (quantile crossing), and this method is slower than computing all quantiles in a single pass.
We will try to add a parameter for specifying the prediction dimension of user-defined regression losses soon.
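The per-quantile workaround can also be combined with a user-defined loss. A minimal sketch of a single-target pinball (quantile) objective, following the same convention as CatBoost's documented RMSE example (der1/der2 are derivatives of the *negative* loss w.r.t. the raw prediction); the class name is illustrative, and the constant pseudo-Hessian is an assumption since the pinball loss is piecewise linear:

```python
class QuantileObjective:
    """Sketch of a calc_ders_range-style objective for one quantile level."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # target quantile level in (0, 1)

    def calc_ders_range(self, approxes, targets, weights):
        result = []
        for i in range(len(targets)):
            residual = targets[i] - approxes[i]
            # d(-pinball)/d(approx): alpha above the prediction, alpha - 1 below
            der1 = self.alpha if residual > 0 else self.alpha - 1.0
            # the true second derivative is 0; a constant -1 pseudo-Hessian
            # turns the Newton step into a plain gradient step
            der2 = -1.0
            if weights is not None:
                der1 *= weights[i]
                der2 *= weights[i]
            result.append((der1, der2))
        return result
```

One such model would then be trained independently per quantile level, which is exactly the slower-but-available path described above.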
from catboost.
Great, looking forward to such an update.
I was able to hack around this by repeating the y vector as many times as needed and then using MultiTargetCustomObjective, e.g.
from catboost import MultiTargetCustomObjective

class MultiRmseObjective(MultiTargetCustomObjective):
    def calc_ders_multi(self, approx, target, weight):
        # derivatives of the negative squared error w.r.t. each output
        assert len(target) == len(approx)
        w = weight if weight is not None else 1.0
        der1 = [(target[i] - approx[i]) * w for i in range(len(approx))]
        der2 = [-w for _ in range(len(approx))]
        return (der1, der2)
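The replication step described above can be sketched without CatBoost: each scalar target is tiled to the model's prediction dimension (the function name is illustrative):

```python
def replicate_targets(y, n_outputs):
    """Tile each scalar target across the output dimension, e.g. so a
    k-quantile multi-target model sees k identical target columns."""
    return [[yi] * n_outputs for yi in y]
```

The replicated matrix is then passed as the (multi-)target when fitting with the custom multi-target objective.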
Interestingly, if a MultiTargetCustomObjective is used, approx and target come one sample at a time (N = 1)?
(Even though the method arguments are named approxes and targets.)
When I use a normal single-target custom objective, approxes and targets come in batches with N >> 1.
class RmseObjective(object):
    def calc_ders_range(self, approxes, targets, weights):
        assert len(approxes) == len(targets)
        if weights is not None:
            assert len(weights) == len(approxes)
        result = []
        for index in range(len(targets)):
            der1 = targets[index] - approxes[index]
            der2 = -1
            if weights is not None:
                der1 *= weights[index]
                der2 *= weights[index]
            result.append((der1, der2))
        return result
Q: Is this a design choice?
Now back to the original question of writing my own multi-quantile (MQ) loss.
Can I hack around it by replicating the y vector and using a multi-target objective?
Just to give some background:
my overall goal is to use cross-entropy on a regression task (a trendy approach recently); it can be direct cross-entropy, or the cross-entropy loss can be used to learn quantiles.
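The "cross-entropy on regression" idea usually means binning the continuous target and training against a softmax cross-entropy. A sketch of the per-sample derivatives in the same negative-loss convention as the examples above (function names are illustrative, not CatBoost API):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy_ders(approx, target_bin):
    """Per-sample derivatives of the log-likelihood (negative cross-entropy)
    w.r.t. the raw logits: der1_j = 1{j == target_bin} - p_j."""
    p = softmax(approx)
    der1 = [(1.0 if j == target_bin else 0.0) - p[j] for j in range(len(approx))]
    # diagonal of the log-likelihood Hessian: -p_j * (1 - p_j)
    der2 = [-pj * (1.0 - pj) for pj in p]
    return der1, der2
```

Here each output dimension corresponds to one bin of the discretized target, so the prediction dimension is the number of bins rather than the number of quantiles.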
Q: Is this a design choice?
yep
Now back to the original question of writing my own MQ loss.
Can I hack around it by replicating the y vector and using a multi-target objective?
I think yes, if you combine it with an appropriate MultiTargetCustomMetric.
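A sketch of what that hack could look like on the objective side, assuming the targets have been replicated to the number of quantiles: dimension i predicts the alphas[i] quantile of the same replicated target, with pinball derivatives in the same negative-loss convention as the RMSE examples. The function name and the constant pseudo-Hessian are illustrative assumptions; in practice this body would sit inside a calc_ders_multi of a MultiTargetCustomObjective subclass:

```python
def multi_quantile_ders(approx, target, alphas, weight=None):
    """calc_ders_multi-style derivatives for a multi-quantile (pinball)
    objective over replicated targets."""
    w = weight if weight is not None else 1.0
    der1, der2 = [], []
    for i, a in enumerate(alphas):
        residual = target[i] - approx[i]
        # d(-pinball)/d(approx): a above the prediction, a - 1 below
        der1.append((a if residual > 0 else a - 1.0) * w)
        der2.append(-w)  # constant pseudo-Hessian, as in the RMSE example
    return der1, der2
```

Pairing this with a matching MultiTargetCustomMetric (average pinball loss over the same quantile levels) would give a consistent eval signal during training.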
Thanks for taking the time to reply to my questions.
The only issue I see is that cross-entropy loss computed per data point is very tricky: it is not the same as the cross-entropy that would be computed over a "batch" of data points.
Q1: Doesn't this difference between a single-target objective and a multi-target objective affect the behaviour of the classifier significantly?
Q2: Or am I misunderstanding things, and the MultiTargetRegressor / Classifier does indeed sum/average the per-data-point losses somewhere after the calculation, and hence optimizes in the correct direction?
A1: no, this is only related to different parallelization for single- and multiple-prediction losses
A2: MultiRMSE normalizes by the total sample weight
A2: https://catboost.ai/en/docs/concepts/loss-functions-multiregression#MultiRMSE
A2: MultiLogloss / cross-entropy normalizes by the total sample weight times the target dimension
A2: https://catboost.ai/en/docs/concepts/loss-functions-multilabel-classification#MultiLogloss
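The two normalizations differ only in the divisor applied to the summed per-sample losses; a small worked example of those divisors (the function name is illustrative):

```python
def loss_normalizers(weights, target_dim):
    """Divisors used by the two multi-target losses discussed above:
    MultiRMSE divides by the total sample weight, MultiLogloss by the
    total sample weight times the target dimension."""
    total_w = sum(weights)
    return {"MultiRMSE": total_w, "MultiLogloss": total_w * target_dim}
```

So for 3 unit-weight samples and 4 output dimensions, MultiRMSE divides by 3 while MultiLogloss divides by 12; both average the per-data-point losses, i.e. they optimize in the expected direction.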
from catboost.