
multi-task-learning-example's People

Contributors

ragmeh11, yaringal


multi-task-learning-example's Issues

Why return torch.mean(loss)?

@yaringal Hi, I have a question about your multi-task loss function.
Below you return the loss as torch.mean(loss), but if I understand this function correctly, loss is just a single tensor value and not a list, so torch.mean(loss) will be the same as loss. What was your motivation for using torch.mean(loss)?
Thank you!

def criterion(y_pred, y_true, log_vars):
  loss = 0
  for i in range(len(y_pred)):
    precision = torch.exp(-log_vars[i])
    diff = (y_pred[i]-y_true[i])**2.
    loss += torch.sum(precision * diff + log_vars[i], -1)
  return torch.mean(loss)
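One possible reading: `torch.sum(..., -1)` reduces only the last (output) dimension, so with batched multi-dimensional targets `loss` is a length-N vector and `torch.mean` averages it over the batch; it collapses to a no-op only in the 1-D case where the sum already yields a scalar. A numpy sketch of the shape bookkeeping (all values are illustrative):

```python
import numpy as np

# Mirror the criterion's reduction in numpy with illustrative values.
N, D = 4, 3                      # batch size and output dimension (assumed)
y_pred = np.zeros((N, D))
y_true = np.ones((N, D))
log_var = 0.0

precision = np.exp(-log_var)
diff = (y_pred - y_true) ** 2.
per_sample = np.sum(precision * diff + log_var, -1)  # sums only the last dim -> shape (N,)

print(per_sample.shape)     # (4,)
print(np.mean(per_sample))  # 3.0 -- the mean over the batch
```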

Any way to incorporate these methods in other tasks easily?

Hi, thanks for your excellent work! I wonder, is there any way to easily incorporate this method into other multi-task learning pipelines? I'm still trying to understand the formulas and have no idea where I can obtain a noise scalar for each task. Looking forward to your reply :)
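In case it helps, the core of the method is the combined objective Σᵢ exp(-sᵢ)·Lᵢ + sᵢ with sᵢ = log σᵢ² learned per task; the noise scalars are not obtained from the data, they are extra trainable parameters. A minimal numpy sketch (the function name `combine_losses` is made up; in a real pipeline the `log_vars` would be trainable, e.g. `nn.Parameter` in PyTorch):

```python
import numpy as np

# Sketch: combine per-task losses with learnable s_i = log(sigma_i^2).
# combine_losses is a made-up name; log_vars would be trainable parameters in practice.
def combine_losses(task_losses, log_vars):
    total = 0.0
    for L, s in zip(task_losses, log_vars):
        total += np.exp(-s) * L + s  # down-weight noisy tasks, regularize with +s
    return total

print(combine_losses([1.0, 4.0], [0.0, 0.0]))  # 5.0 -- with s_i = 0 this is the plain sum
```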

This is a lucky demo: when I change the data generation process, the variance prediction is wrong

I think this is a lucky demo. When I change the data generation code, the optimization is guided wrongly and the variance prediction is wrong.

So I think this uncertainty method only works in situations where the value of the diff is close in scale to the precision and log variance. If their values are not at the same scale, the method breaks.

```python
def gen_data(N):
    X = np.random.randn(N, Q)
    w1 = 2.*1e2
    b1 = 8.*1e2
    sigma1 = 10  # ground truth
    Y1 = X.dot(w1) + b1 + sigma1 * np.random.randn(N, D1)
    w2 = 3*1e2
    b2 = 3*1e2
    sigma2 = 1*1e2  # ground truth
    Y2 = X.dot(w2) + b2 + sigma2 * np.random.randn(N, D2)
    return X, Y1, Y2
```

Code Doesn't Agree with Paper

There are a couple of instances where the code doesn't agree with the paper (https://arxiv.org/pdf/1705.07115.pdf):

  1. For each individual task's loss, the paper suggests adding the log of the task-dependent standard deviation to the multi-task loss. However, the code adds the log of the task-dependent variance instead. Why this discrepancy? Is there a typo in the paper?

  2. Weights given to each loss in the code don't correspond to those in the equations presented in the paper, please see my followup here: #1

The loss might become negative

Thanks for your good work! But I have a question. I ported your code into my project and it worked at first. However, after several steps the loss became negative, and I found that the log_var term led to that. When I removed the log_var term the loss was fine. So I want to know if there is a better solution for this? Thanks again!

loss += K.sum(precision * (y_true - y_pred)**2. + log_var[0], -1)
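For what it's worth, a negative value here is expected rather than pathological: with a fixed positive task loss L, the per-task term exp(-s)·L + s (with s = log σ²) has a finite minimum of 1 + log L at s = log L, which is negative whenever L < 1/e. A quick numpy check (L = 0.1 is an assumed value):

```python
import numpy as np

# Per-task term from the formula above: exp(-s) * L + s, with s = log(sigma^2).
L = 0.1                                   # assumed fixed task loss
s_grid = np.linspace(-5.0, 5.0, 100001)
values = np.exp(-s_grid) * L + s_grid

print(s_grid[np.argmin(values)])  # ~ log(0.1) = -2.30: the minimizer is finite
print(values.min())               # ~ 1 + log(0.1) = -1.30: negative, but bounded below
```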

Log var can become negative and explode

The loss function can optimize in a way that keeps decreasing the log_var values, which I observe in my experiments. One simple solution is to take torch.abs(log_var). Any thoughts on how this might affect the formulation of the derivation?

about σ

Hi! Really nice work. I wonder how you calculate the noise σ when training a real network?

Question on relative weights

Thanks for the great research and code sharing.
After reading the paper and using it in my research, I got a question.
There are two styles for the implementation of weighted loss.
Case 1) L = w_a * L_a + w_b * L_b + w_c * L_c
Case 2) L = L_a + w_b * L_b + w_c * L_c
In Case 2, the weight of loss L_a is set to 1. In my humble opinion, I guess that w_b and w_c would then be learned as relative log_var values accordingly.
In your paper or code, on the other hand, all weights, i.e., all log_vars are set to learnable as in Case 1.
Is there any intention to prefer Case 1? Could it be a problem if I use the style of Case 2?

Question about the loss

As described in the paper, as the noise σ increases, the corresponding L(W) decreases. But if we understand σ as the uncertainty of y, maybe it would be better for L to increase with uncertainty, since that means y is harder to learn and needs more attention?

Final Loss equation

Hi @yaringal !

I have read the paper and it is really amazing. Thanks for your team's hard-work!

However, I have a question regarding the final equation and also your keras implementation.

[image: the final loss equation from the paper]

The equation above has a factor of 1/2 multiplying the loss, but you didn't include it in the Keras implementation.

I tried experimenting with it and included the 1/2 in the loss function, but it couldn't converge. I am wondering whether the problem is in the paper or in the Keras implementation, because if I exclude the 1/2 it converges to the ground-truth std.

Best regards,

Hardian
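One observation that may be relevant here: a constant factor of 1/2 rescales the whole objective without moving its minimizer, so including it should mainly act like halving the effective learning rate rather than changing the optimum. A quick numeric check on the per-task term (L = 2.0 is an assumed value):

```python
import numpy as np

# argmin_s of c * (exp(-s) * L + s) is s = log(L) for any constant c > 0.
L = 2.0
s_grid = np.linspace(-5.0, 5.0, 100001)

full = np.exp(-s_grid) * L + s_grid   # objective without the 1/2 factor
half = 0.5 * full                     # objective with the 1/2 factor

print(s_grid[np.argmin(full)])  # ~ log(2) = 0.693
print(s_grid[np.argmin(half)])  # identical minimizer
```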

Sounds like a lucky result that comes from a wrong formula derivation

I read the paper carefully; the formula in the paper is fundamentally wrong.

  • Under formulas (2) and (3), the probability output has a Gaussian distribution. However, the probability can't be Gaussian, as it is distributed in [0, 1] rather than (-∞, +∞).

  • Under the independence assumption (formula (4)) and the Gaussian distribution mentioned above, formula (7) is correct. However, if we just look at the first line of formula (7): when the independence assumption holds, -log p(y1, y2|f(w,x)) = -log p(y1|f(w,x)) - log p(y2|f(w,x)), which is just a sum of cross-entropy losses over the different tasks. This apparently contradicts the result under the additional Gaussian assumption.

  • Somehow, the paper replaces the cross-entropy loss with MSE, which finally reaches the result that a higher-loss task should get higher theta weights. If the paper's reported results are correct, I think the benefit here comes from loss re-balancing. Which would mean: re-balancing the task losses benefits multi-task performance?

some questions about formulation 10 in paper

Hi, thanks for your great work. I have some questions about formula (10):

  • It says "in the last transition we introduced the explicit simplifying assumption ... which becomes an equality when σ² → 1".
    In fact, I find that the final value of σ (the task-variance parameter) is not close to 1, so is the assumption appropriate?

Thanks for your reply.

Calculating back to actual weights of loss functions

@yaringal Hi, thank you very much for releasing the code here!
I am currently using your technique in adversarial loss training for semantic segmentation.
However, I would like to know the actual weights applied to each loss function (cross-entropy, adversarial loss).
Could you please help me with how to get these weights?

(I am currently using your equation of std and do 1/(2*std**2) to get the weight. Is this correct?)
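For reference, a sketch of that conversion, assuming the model learns log_var = log σ² per task (the values below are made up): the effective weight on each task is 1/(2σ²) = exp(-log_var)/2, so computing 1/(2·std²) does match the paper's weighting up to the shared constant factor.

```python
import numpy as np

# Recover effective per-task weights from trained log-variances.
log_vars = np.array([0.0, 2.0])        # made-up trained values
sigma_sq = np.exp(log_vars)            # sigma^2 per task
weights = 1.0 / (2.0 * sigma_sq)       # equivalently np.exp(-log_vars) / 2

print(weights)                  # [0.5, 0.0677]
print(weights / weights.sum())  # normalized relative weights, if those are wanted
```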

How can I use the trained model for prediction? Is it right to use prediction_model.predict(new_x)?

I'm not in the field of deep learning or computer science, but I found this work very interesting. I am confused about what I should do to use the trained model for prediction. Can I achieve this through prediction_model.predict(new_x)? I see that only the trainable_model was trained, but it cannot be used for prediction. Has the prediction_model been trained at the same time? Thanks very much.

uncertainty for self-supervised learning

@yaringal

Thank you for your example; it helps a lot for understanding the paper. I am currently using the proposed formula (exp(-log_var)*loss + log_var) for self-supervised learning with uncertainty estimation.

In my project, the loss is the L1 distance between input image pixels and warped image pixels, and it works well on its own. But when I bring uncertainty into training using the above formula, performance drops a lot.

I have totally no idea why. Do you have any advice? By the way, before taking the L1 distance, diff = warp_pixel - input_pixel follows a Gaussian distribution perfectly.
