lorenmt / auto-lambda
The implementation of "Auto-Lambda: Disentangling Dynamic Task Relationships" [TMLR 2022].
Home Page: https://shikun.io/projects/auto-lambda
License: Other
Hello, thank you very much for your work. Recently I have been trying to use auto-lambda to optimise my own model. Since my model contains some parameters that cannot be differentiated, I changed the gradient computation to the following:
model_params = [
    p for p in self.model.parameters() if p.requires_grad
]
gradients = torch.autograd.grad(loss, model_params, retain_graph=True, allow_unused=True)
However, this leaves some gradient entries as None. I simply added an if check to skip the layers whose gradient is None, and the code runs, but during training the loss grows exponentially and eventually becomes NaN:
0%| | 1/11807 [01:52<369:46:10, 112.75s/it]tensor(31.6492, device='cuda:0', grad_fn=)
0%| | 2/11807 [01:53<259:48:03, 79.23s/it] tensor(408.9402, device='cuda:0', grad_fn=)
0%| | 3/11807 [01:54<182:54:59, 55.79s/it]tensor(43848.0703, device='cuda:0', grad_fn=)
0%| | 4/11807 [01:55<129:05:55, 39.38s/it]tensor(1.1228e+15, device='cuda:0', grad_fn=)
0%| | 5/11807 [01:56<91:24:46, 27.88s/it] tensor(nan, device='cuda:0', grad_fn=)
0%| | 6/11807 [01:58<65:00:37, 19.83s/it]tensor(nan, device='cuda:0', grad_fn=)
0%| | 7/11807 [01:59<46:34:15, 14.21s/it]tensor(nan, device='cuda:0', grad_fn=)
0%| | 8/11807 [02:00<33:45:58, 10.30s/it]tensor(nan, device='cuda:0', grad_fn=)
0%| | 9/11807 [02:01<24:43:29, 7.54s/it]tensor(nan, device='cuda:0', grad_fn=)
0%| | 10/11807 [02:02<18:18:33, 5.59s/it]Traceback (most recent call last):
Could you advise on what might be going wrong here? Any suggestions would be appreciated.
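One common way to handle the None entries described above, instead of skipping those layers entirely, is to substitute explicit zero gradients so that every parameter keeps a matching gradient. This is a minimal sketch under assumed names (the two-head model here is illustrative, not the repo's architecture):

```python
import torch

# Hypothetical two-branch model: only head_a participates in this loss,
# so gradients for head_b come back as None under allow_unused=True.
model = torch.nn.ModuleDict({
    "shared": torch.nn.Linear(4, 4),
    "head_a": torch.nn.Linear(4, 1),
    "head_b": torch.nn.Linear(4, 1),
})

x = torch.randn(2, 4)
loss = model["head_a"](model["shared"](x)).mean()

params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)

# Replace None entries with explicit zero tensors rather than skipping them,
# so downstream code that pairs gradients with parameters stays aligned.
grads = [g if g is not None else torch.zeros_like(p)
         for g, p in zip(grads, params)]
```

Zero-filling keeps the parameter/gradient pairing intact; note it does not by itself explain the exploding loss, which may need separate debugging (e.g. learning rate or gradient clipping).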
Hello, thank you very much for your excellent work. I recently ran into the following problem when trying to apply it. I use semantic segmentation and depth estimation as the main tasks, i.e. the model ends in two prediction branches, one for semantic segmentation and one for depth estimation. However, when I compute gradients in virtual_step in auto_lambda.py with gradients = torch.autograd.grad(loss, self.model.parameters(), allow_unused=True), some of the gradients are always None. I printed the parameters whose gradient is None and found that they all belong to the branch of the second task, depth estimation. Without auto_lambda, using plain backward works fine. Could you advise where the problem might be? Thanks again!
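One way None gradients like these can arise is when the loss graph does not actually touch the second branch; backward() hides this because it simply leaves .grad as None and optimisers skip such parameters, while torch.autograd.grad reports it explicitly. A minimal sketch with illustrative names (not the repo's model):

```python
import torch

# Hypothetical setup mirroring the question: a shared encoder plus two
# task heads, but the loss below only uses the segmentation head.
shared = torch.nn.Linear(4, 4)
head_seg = torch.nn.Linear(4, 2)    # "semantic segmentation" branch
head_depth = torch.nn.Linear(4, 1)  # "depth estimation" branch

x = torch.randn(2, 4)
loss = head_seg(shared(x)).mean()

# backward() silently leaves .grad as None for the unused depth branch,
# which is why plain training appears to work.
loss.backward(retain_graph=True)
assert head_depth.weight.grad is None

# autograd.grad makes the same situation explicit: unused parameters
# come back as None (and raise an error without allow_unused=True).
grads = torch.autograd.grad(
    loss,
    list(shared.parameters()) + list(head_depth.parameters()),
    allow_unused=True,
)
assert grads[-1] is None
```

If the combined multi-task loss really should reach both heads, checking which per-task losses actually enter the summed loss is a reasonable first debugging step.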
As we know, weight-based methods search for different task weights, which act on the loss; back-propagating the weighted loss then affects the gradients. In other words, weight-based methods influence the gradients, while gradient-based methods directly assign different weights to each gradient. It looks like both families of methods serve the same purpose.
I have three questions about combining weighting-based and gradient-based methods:
Thank you very much.
Hi,
Congratulations on a great paper :)
Thanks a lot for making your code open-source. I went through your code and it seems like you normalise the input RGB data to the [-1, 1] scale whilst the depth data is normalised to [-1, max]. It was my understanding that for Cityscapes, the depth data would be normalised to [-1, 1] after using the map_disparity function, and the RGB data normalised to ImageNet stats if, for instance, using pre-trained weights. Am I wrong?
I also have a general question about training depth prediction models on Cityscapes. I have tried various flavours of models (DeepLabV3, HRNet) and yet training a single-task depth prediction network seems to yield overly smooth depth maps, with the loss converging very early in training regardless of the learning rate (1e-3, 1e-4, etc. for Adam). For reference, the RGB data is normalised using ImageNet stats (using encoders pre-trained on ImageNet) and the depth data is normalised to either [-1, 1] or [-1, max] (using your disparity mapping functions).
I was wondering if you could comment based on your experience with the dataset? This would be very helpful. These same networks have been tested on the 19-class segmentation problem.
Many thanks
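The ImageNet normalisation mentioned in the question can be sketched as follows (these are the standard ImageNet statistics used with pre-trained encoders; whether this repo applies them internally is exactly what the question asks, so treat this as an assumption):

```python
import torch

# Standard ImageNet channel statistics (assumed, not taken from this repo).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalise_rgb(img):
    """Normalise a [0, 1]-scaled CHW RGB tensor with ImageNet statistics."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD

rgb = torch.rand(3, 8, 8)   # dummy image in [0, 1]
out = normalise_rgb(rgb)
```

By contrast, a [-1, 1] normalisation is simply `2 * img - 1`; the two conventions generally should not be mixed when loading ImageNet-pre-trained weights.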
Hello lorenmt,
Kudos on the great work. I just wanted to know whether this repo can be used to train multiple classification tasks, for example vehicle make, colour, orientation, and model, with each of the four attributes as an individual task.
Regards,
akirs
Hi, I have a small question: why does the batch size of the model output change after this line?
Line 214 in 24591b7
It seems that before this line the batch size is 4, but afterwards it becomes 3.
Since I want to use the output of the model after loss.backward() in the next epoch, it becomes a problem if the batch size changes.
Would you kindly give me some idea about this?
With best regards
Hello, I have read your paper; thank you very much for your contribution to multi-task learning. I have a few questions I hope you can answer.
In the compute_hessian function in the file auto_lambda.py: 1. it first applies p += eps * d and differentiates with respect to the self.meta_weights; 2. it then applies p -= 2 * eps * d and differentiates with respect to self.meta_weights again; 3. finally it applies p += eps * d and computes hessian = [(p - n) / (2. * eps) for p, n in zip(d_weight_p, d_weight_n)].
(1) I do not understand: if p first adds eps * d, then subtracts 2 * eps * d, then adds eps * d again, is that not equivalent to p being left unchanged?
(2) d_model holds the gradients from the update on the crucial val_loss; I do not understand the purpose of p += eps * d.
(3) Since I am not deeply familiar with this area, and compute_hessian appears to be the core algorithm, I could not work out what this piece of code does. I would greatly appreciate your guidance. Many thanks.
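The perturbation sequence in the question can be sketched on a toy loss (names and values here are illustrative, not the repo's). The key point for question (1): p does end up unchanged, but the two gradient evaluations in between happen at the perturbed points p + eps * d and p - eps * d, which is what makes the central difference possible:

```python
import torch

# Toy bilinear loss so the mixed derivative is easy to check by hand.
def loss_fn(p, meta_w):
    return meta_w * (p ** 2).sum()

p = torch.tensor([1.0, 2.0])                 # stand-in for a model weight
meta_w = torch.tensor(0.5, requires_grad=True)  # stand-in for a meta weight
d = torch.tensor([0.3, -0.1])                # direction, e.g. an unrolled gradient
eps = 1e-2

# Evaluate the meta-weight gradient at p + eps * d ...
p += eps * d
g_pos, = torch.autograd.grad(loss_fn(p, meta_w), meta_w)

# ... and at p - eps * d ...
p -= 2 * eps * d
g_neg, = torch.autograd.grad(loss_fn(p, meta_w), meta_w)

# ... then restore p exactly to its original value.
p += eps * d

# Central finite difference approximating the Hessian-vector product
# (d^2 L / d meta d p) . d without forming any second-order graph.
hessian = (g_pos - g_neg) / (2 * eps)
```

For this toy loss the meta gradient is (p ** 2).sum(), so the finite difference approximates 2 * (p . d) = 0.2, confirming the three in-place updates are not a no-op: they are the standard trick for a cheap second-order term.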
Hello, I have tried many times to download the NYU npy file from the Dropbox link, but it always fails in the last few minutes.
I am confused about this because the code only supports the npy format... perhaps there is some preprocessing code for NYU?
With best regards
Hi, I was wondering if this method can be applied to the original MTAN (SegNet) model?
Hi,
Thank you so much for sharing the code!
I am trying to reproduce your results and just wanted to double-check that the following command is for multi-task learning, not for auxiliary learning:
python trainer_dense.py --network split --dataset nyuv2 --task all --weight autol --gpu 3
In other words, will this command give the result of Split Multi-Task Auto-Lambda?
Thanks!