lorenmt / auto-lambda
The implementation of "Auto-Lambda: Disentangling Dynamic Task Relationships" [TMLR 2022].
Home Page: https://shikun.io/projects/auto-lambda
License: Other
Hello, thank you very much for your work. Recently I have been trying to use auto-lambda to optimise my own model. Since my model contains some parameters that cannot be differentiated, I changed the gradient computation to the following:
model_params = [
    p for p in self.model.parameters() if p.requires_grad
]
gradients = torch.autograd.grad(loss, model_params, retain_graph=True, allow_unused=True)
However, this leaves some gradient entries as None. I simply added an if check to skip the layers whose gradient is None, and the code runs, but during training the loss grows exponentially and eventually becomes NaN:
0%| | 1/11807 [01:52<369:46:10, 112.75s/it]tensor(31.6492, device='cuda:0', grad_fn=)
0%| | 2/11807 [01:53<259:48:03, 79.23s/it] tensor(408.9402, device='cuda:0', grad_fn=)
0%| | 3/11807 [01:54<182:54:59, 55.79s/it]tensor(43848.0703, device='cuda:0', grad_fn=)
0%| | 4/11807 [01:55<129:05:55, 39.38s/it]tensor(1.1228e+15, device='cuda:0', grad_fn=)
0%| | 5/11807 [01:56<91:24:46, 27.88s/it] tensor(nan, device='cuda:0', grad_fn=)
0%| | 6/11807 [01:58<65:00:37, 19.83s/it]tensor(nan, device='cuda:0', grad_fn=)
0%| | 7/11807 [01:59<46:34:15, 14.21s/it]tensor(nan, device='cuda:0', grad_fn=)
0%| | 8/11807 [02:00<33:45:58, 10.30s/it]tensor(nan, device='cuda:0', grad_fn=)
0%| | 9/11807 [02:01<24:43:29, 7.54s/it]tensor(nan, device='cuda:0', grad_fn=)
0%| | 10/11807 [02:02<18:18:33, 5.59s/it]Traceback (most recent call last):
Could you advise on what might be going wrong here? Any suggestions would be appreciated.
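One common way to handle the None entries described above, instead of skipping those layers entirely, is to substitute explicit zero gradients so that every parameter keeps a matching gradient. This is a minimal sketch under assumed names (the two-head model here is illustrative, not the repo's architecture):

```python
import torch

# Hypothetical two-branch model: only head_a participates in this loss,
# so gradients for head_b come back as None under allow_unused=True.
model = torch.nn.ModuleDict({
    "shared": torch.nn.Linear(4, 4),
    "head_a": torch.nn.Linear(4, 1),
    "head_b": torch.nn.Linear(4, 1),
})

x = torch.randn(2, 4)
loss = model["head_a"](model["shared"](x)).mean()

params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)

# Replace None entries with explicit zero tensors rather than skipping them,
# so downstream code that pairs gradients with parameters stays aligned.
grads = [g if g is not None else torch.zeros_like(p)
         for g, p in zip(grads, params)]
```

Zero-filling keeps the parameter/gradient pairing intact; note it does not by itself explain the exploding loss, which may need separate debugging (e.g. learning rate or gradient clipping).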
Hello, thank you very much for your excellent work. I recently ran into the following problem when trying to apply it. I use semantic segmentation and depth estimation as the main tasks, i.e. the model ends in two prediction branches, one for semantic segmentation and one for depth estimation. However, when I compute gradients in virtual_step in auto_lambda.py with gradients = torch.autograd.grad(loss, self.model.parameters(), allow_unused=True), some of the gradients are always None. I printed the parameters whose gradient is None and found that they all belong to the branch of the second task, depth estimation. Without auto_lambda, using plain backward works fine. Could you advise where the problem might be? Thanks again!
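One way None gradients like these can arise is when the loss graph does not actually touch the second branch; backward() hides this because it simply leaves .grad as None and optimisers skip such parameters, while torch.autograd.grad reports it explicitly. A minimal sketch with illustrative names (not the repo's model):

```python
import torch

# Hypothetical setup mirroring the question: a shared encoder plus two
# task heads, but the loss below only uses the segmentation head.
shared = torch.nn.Linear(4, 4)
head_seg = torch.nn.Linear(4, 2)    # "semantic segmentation" branch
head_depth = torch.nn.Linear(4, 1)  # "depth estimation" branch

x = torch.randn(2, 4)
loss = head_seg(shared(x)).mean()

# backward() silently leaves .grad as None for the unused depth branch,
# which is why plain training appears to work.
loss.backward(retain_graph=True)
assert head_depth.weight.grad is None

# autograd.grad makes the same situation explicit: unused parameters
# come back as None (and raise an error without allow_unused=True).
grads = torch.autograd.grad(
    loss,
    list(shared.parameters()) + list(head_depth.parameters()),
    allow_unused=True,
)
assert grads[-1] is None
```

If the combined multi-task loss really should reach both heads, checking which per-task losses actually enter the summed loss is a reasonable first debugging step.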
As we know, weight-based methods search for different task weights, which act on the loss; back-propagating the weighted loss then affects the gradients. In other words, weight-based methods influence the gradients, while gradient-based methods directly assign different weights to each gradient. It looks like both families of methods serve the same purpose.
I have three questions about combining weighting-based and gradient-based methods:
Thank you very much.
Hi,
Congratulations on a great paper :)
Thanks a lot for making your code open-source. I went through your code and it seems like you normalise the input RGB data to the [-1, 1] scale whilst the depth data is normalised to [-1, max]. It was my understanding that for Cityscapes, the depth data would be normalised to [-1, 1] after using the map_disparity function, and the RGB data normalised to ImageNet stats if, for instance, using pre-trained weights. Am I wrong?
I also have a general question about training depth prediction models on Cityscapes. I have tried various flavours of models (DeepLabV3, HRNet) and yet training a single-task depth prediction network seems to yield overly smooth depth maps, with the loss converging very early in training regardless of the learning rate (1e-3, 1e-4, etc. for Adam). For reference, the RGB data is normalised using ImageNet stats (using encoders pre-trained on ImageNet) and the depth data is normalised to either [-1, 1] or [-1, max] (using your disparity mapping functions).
I was wondering if you could comment based on your experience with the dataset? This would be very helpful. These same networks have been tested on the 19-class segmentation problem.
Many thanks
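The ImageNet normalisation mentioned in the question can be sketched as follows (these are the standard ImageNet statistics used with pre-trained encoders; whether this repo applies them internally is exactly what the question asks, so treat this as an assumption):

```python
import torch

# Standard ImageNet channel statistics (assumed, not taken from this repo).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalise_rgb(img):
    """Normalise a [0, 1]-scaled CHW RGB tensor with ImageNet statistics."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD

rgb = torch.rand(3, 8, 8)   # dummy image in [0, 1]
out = normalise_rgb(rgb)
```

By contrast, a [-1, 1] normalisation is simply `2 * img - 1`; the two conventions generally should not be mixed when loading ImageNet-pre-trained weights.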
Hello lorenmt,
Kudos on the great work. I just wanted to know whether this repo can be used to train multiple classification tasks, for example vehicle make, colour, orientation, and model, with each of the four attributes as an individual task.
Regards,
akirs
Hi, I have a small question: why does the batch size of the model output change after this line?
Line 214 in 24591b7
It seems that before this line the batch size is 4, but afterwards it becomes 3.
Since I want to use the output of the model after loss.backward() in the next epoch, it becomes a problem if the batch size changes.
Would you kindly give me some idea about this?
With best regards
Hello, I have read your paper; thank you very much for your contribution to multi-task learning. I have a few questions I hope you can answer.
In the compute_hessian function in the file auto_lambda.py: 1. it first applies p += eps * d and differentiates with respect to the self.meta_weights; 2. it then applies p -= 2 * eps * d and differentiates with respect to self.meta_weights again; 3. finally it applies p += eps * d and computes hessian = [(p - n) / (2. * eps) for p, n in zip(d_weight_p, d_weight_n)].
(1) I do not understand: if p first adds eps * d, then subtracts 2 * eps * d, then adds eps * d again, is that not equivalent to p being left unchanged?
(2) d_model holds the gradients from the update on the crucial val_loss; I do not understand the purpose of p += eps * d.
(3) Since I am not deeply familiar with this area, and compute_hessian appears to be the core algorithm, I could not work out what this piece of code does. I would greatly appreciate your guidance. Many thanks.
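The perturbation sequence in the question can be sketched on a toy loss (names and values here are illustrative, not the repo's). The key point for question (1): p does end up unchanged, but the two gradient evaluations in between happen at the perturbed points p + eps * d and p - eps * d, which is what makes the central difference possible:

```python
import torch

# Toy bilinear loss so the mixed derivative is easy to check by hand.
def loss_fn(p, meta_w):
    return meta_w * (p ** 2).sum()

p = torch.tensor([1.0, 2.0])                 # stand-in for a model weight
meta_w = torch.tensor(0.5, requires_grad=True)  # stand-in for a meta weight
d = torch.tensor([0.3, -0.1])                # direction, e.g. an unrolled gradient
eps = 1e-2

# Evaluate the meta-weight gradient at p + eps * d ...
p += eps * d
g_pos, = torch.autograd.grad(loss_fn(p, meta_w), meta_w)

# ... and at p - eps * d ...
p -= 2 * eps * d
g_neg, = torch.autograd.grad(loss_fn(p, meta_w), meta_w)

# ... then restore p exactly to its original value.
p += eps * d

# Central finite difference approximating the Hessian-vector product
# (d^2 L / d meta d p) . d without forming any second-order graph.
hessian = (g_pos - g_neg) / (2 * eps)
```

For this toy loss the meta gradient is (p ** 2).sum(), so the finite difference approximates 2 * (p . d) = 0.2, confirming the three in-place updates are not a no-op: they are the standard trick for a cheap second-order term.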
Hello, I have tried many times to download the NYU npy file from the Dropbox link, but it always fails in the last few minutes.
I am confused about this because the code only supports the npy format... perhaps there is some preprocessing code for NYU?
With best regards
Hi, I was wondering if this method can be applied to the original MTAN (SegNet) model?
Hi,
Thank you so much for sharing the code!
I am trying to reproduce your results and just wanted to double-check that the following command is for multi-task learning, not for auxiliary learning:
python trainer_dense.py --network split --dataset nyuv2 --task all --weight autol --gpu 3
In other words, will this command give the result of Split Multi-Task Auto-Lambda?
Thanks!