
libmtl's Introduction

LibMTL


LibMTL is an open-source library built on PyTorch for Multi-Task Learning (MTL). See the latest documentation for detailed introductions and API instructions.

⭐ Star us on GitHub — it motivates us a lot!

News

  • [Feb 08 2024]: Added support for DB-MTL.
  • [Aug 16 2023]: Added support for MoCo (ICLR 2023). Many thanks to @heshandevaka for the help.
  • [Jul 11 2023]: Paper accepted to JMLR.
  • [Jun 19 2023]: Added support for Aligned-MTL (CVPR 2023).
  • [Mar 10 2023]: Added QM9 and PAWS-X examples.
  • [Jul 22 2022]: Added support for Nash-MTL (ICML 2022).
  • [Jul 21 2022]: Added support for Learning to Branch (ICML 2020). Many thanks to @yuezhixiong (#14).
  • [Mar 29 2022]: Paper is now available on arXiv.


Features

  • Unified: LibMTL provides a unified code base and a consistent evaluation procedure (covering data processing, metric objectives, and hyper-parameters) on several representative MTL benchmark datasets, which allows quantitative, fair, and consistent comparisons between different MTL algorithms.
  • Comprehensive: LibMTL supports many state-of-the-art MTL methods, including 8 architectures and 16 optimization strategies, and provides fair comparisons on several benchmark datasets covering different fields.
  • Extensible: LibMTL follows modular design principles, which allows users to flexibly and conveniently add customized components or make personalized modifications. Users can therefore quickly develop novel optimization strategies and architectures, or apply existing MTL algorithms to new application scenarios, with the support of LibMTL.

Overall Framework

[Figure: the overall framework of LibMTL]

Each module is introduced in the Docs.

Supported Algorithms

LibMTL currently supports the following algorithms:

| Optimization Strategies | Venues | Arguments |
| --- | --- | --- |
| Equal Weighting (EW) | - | --weighting EW |
| Gradient Normalization (GradNorm) | ICML 2018 | --weighting GradNorm |
| Uncertainty Weights (UW) | CVPR 2018 | --weighting UW |
| MGDA (official code) | NeurIPS 2018 | --weighting MGDA |
| Dynamic Weight Average (DWA) (official code) | CVPR 2019 | --weighting DWA |
| Geometric Loss Strategy (GLS) | CVPR 2019 Workshop | --weighting GLS |
| Projecting Conflicting Gradient (PCGrad) | NeurIPS 2020 | --weighting PCGrad |
| Gradient sign Dropout (GradDrop) | NeurIPS 2020 | --weighting GradDrop |
| Impartial Multi-Task Learning (IMTL) | ICLR 2021 | --weighting IMTL |
| Gradient Vaccine (GradVac) | ICLR 2021 | --weighting GradVac |
| Conflict-Averse Gradient descent (CAGrad) (official code) | NeurIPS 2021 | --weighting CAGrad |
| Nash-MTL (official code) | ICML 2022 | --weighting Nash_MTL |
| Random Loss Weighting (RLW) | TMLR 2022 | --weighting RLW |
| MoCo | ICLR 2023 | --weighting MoCo |
| Aligned-MTL (official code) | CVPR 2023 | --weighting Aligned_MTL |
| DB-MTL | arXiv | --weighting DB_MTL |
| Architectures | Venues | Arguments |
| --- | --- | --- |
| Hard Parameter Sharing (HPS) | ICML 1993 | --arch HPS |
| Cross-stitch Networks (Cross_stitch) | CVPR 2016 | --arch Cross_stitch |
| Multi-gate Mixture-of-Experts (MMoE) | KDD 2018 | --arch MMoE |
| Multi-Task Attention Network (MTAN) (official code) | CVPR 2019 | --arch MTAN |
| Customized Gate Control (CGC), Progressive Layered Extraction (PLE) | ACM RecSys 2020 | --arch CGC, --arch PLE |
| Learning to Branch (LTB) | ICML 2020 | --arch LTB |
| DSelect-k (official code) | NeurIPS 2021 | --arch DSelect_k |

Supported Benchmark Datasets

| Datasets | Problems | Task Number | Tasks | Multi-input | Supported Backbone |
| --- | --- | --- | --- | --- | --- |
| NYUv2 | Scene Understanding | 3 | Semantic Segmentation + Depth Estimation + Surface Normal Prediction | ✘ | ResNet50 / SegNet |
| Office-31 | Image Recognition | 3 | Classification | ✔ | ResNet18 |
| Office-Home | Image Recognition | 4 | Classification | ✔ | ResNet18 |
| QM9 | Molecular Property Prediction | 11 (default) | Regression | ✘ | GNN |
| PAWS-X | Paraphrase Identification | 4 (default) | Classification | ✔ | Bert |

Installation

  1. Create a virtual environment

    conda create -n libmtl python=3.8
    conda activate libmtl
    pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
  2. Clone the repository

    git clone https://github.com/median-research-group/LibMTL.git
  3. Install LibMTL

    cd LibMTL
    pip install -r requirements.txt
    pip install -e .

Quick Start

We use the NYUv2 dataset as an example to show how to use LibMTL.

Download Dataset

The NYUv2 dataset we used is pre-processed by mtan. You can download this dataset here.

Run a Model

The complete training code for the NYUv2 dataset is provided in examples/nyu. The file main.py is the main file for training on the NYUv2 dataset.

You can find the command-line arguments by running the following command.

python main.py -h

For instance, running the following command will train an MTL model with EW and HPS on the NYUv2 dataset.

python main.py --weighting EW --arch HPS --dataset_path /path/to/nyuv2 --gpu_id 0 --scheduler step --mode train --save_path PATH

More details are provided in the Docs.
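For programmatic use, the following is a minimal sketch of the Trainer API, based on the signature that appears in the issue discussions below; task_dict, MyEncoder, and decoders are hypothetical placeholders, and the exact keyword arguments may differ between versions (see the Docs and the examples/ directory).

    from LibMTL import Trainer
    from LibMTL.weighting import EW
    from LibMTL.architecture import HPS

    # Sketch only: task_dict, MyEncoder, and decoders are placeholders
    # that must be defined for your own tasks.
    trainer = Trainer(task_dict=task_dict,
                      weighting=EW,
                      architecture=HPS,
                      encoder_class=MyEncoder,
                      decoders=decoders,
                      rep_grad=False,
                      multi_input=False,
                      optim_param={'optim': 'adam', 'lr': 1e-4, 'weight_decay': 1e-5},
                      scheduler_param={'scheduler': 'step', 'step_size': 100, 'gamma': 0.5})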

Citation

If you find LibMTL useful for your research or development, please cite the following:

@article{lin2023libmtl,
  title={{LibMTL}: A {P}ython Library for Multi-Task Learning},
  author={Baijiong Lin and Yu Zhang},
  journal={Journal of Machine Learning Research},
  volume={24},
  number={209},
  pages={1--7},
  year={2023}
}

Contributor

LibMTL is developed and maintained by Baijiong Lin.

Contact Us

If you have any questions or suggestions, please feel free to contact us by raising an issue or sending an email to [email protected].

Acknowledgements

We would like to thank the authors of the following public repositories (listed alphabetically): CAGrad, dselect_k_moe, MultiObjectiveOptimization, mtan, MTL, nash-mtl, pytorch_geometric, and xtreme.

License

LibMTL is released under the MIT license.


libmtl's Issues

Decoder weights not updating?

Hi,
Thank you for putting up such a fantastic MTL experimentation library. I used it on my own datasets and everything looked good, except that when I observed the encoder/decoder weights during training, only the encoder weights were updated after each iteration or epoch; the decoder weights stayed the same. When I freeze only the encoder layers (self.model.encoder.requires_grad_(False)) and not the decoder layers, the training loss stays constant across all iterations/epochs, which confirms that the decoder weights are not updating during training. I tried the HPS architecture with EW weighting. Could you help me debug what might contribute to this issue?

Unable to save the trained model

When I try to save the trained model (i.e., the full model) using the following command:

torch.save(model, "<path>")

it throws this error:

AttributeError: Can't pickle local object 'Trainer._prepare_model.<locals>.MTLmodel'
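A possible workaround: the wrapped MTLmodel class is defined locally inside Trainer._prepare_model, so the full object cannot be pickled, but its parameters can. A sketch, assuming the trained network is reachable as trainer.model:

    # Save only the parameters, which avoids pickling the local class.
    torch.save(trainer.model.state_dict(), "mtl_model.pt")

    # Later: rebuild the Trainer with the same configuration, then restore:
    trainer.model.load_state_dict(torch.load("mtl_model.pt"))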

Issue with val Epoch and `self.base_result`

Hi,
Thanks for open sourcing this wonderful repo. I have a question regarding the following line

self.best_result['epoch'] = epoch

Shouldn't base_result be updated here, i.e., self.base_result = new_result? Otherwise the improvement will always be compared against the original base_result, and we would not get the best checkpoint. I may be missing something, but the results of my experiments don't look correct to me.
Thanks

How to determine the shared part of the network

My network uses an FPN. Some heads use the output of the FPN's first level, while other heads use the outputs of other levels. How do I determine which part of the FPN counts as the shared part of the network?

Support for missing labels

Thank you for this excellent library. I'm wondering if you would consider supporting single-input problems where there are incomplete labels for each task.

A simple example:

  • Input: Image from the camera
  • Output:
    • Task 1: Identify street signs (binary)
    • Task 2: Identity pedestrian (binary)
    • Task 3: Identify crossing animals (binary)
    • ...
    • Task 30: Identify street light (binary)
The problem is that not all training examples have labels for all tasks. For instance:
label for img1: [0, 1, 1, 0, 1, ..., 0, 0, 1]
label for img2: [1, ?, ?, 1, 1, ..., 1, ?, 0]
label for img3: [1, 0, ?, 1, 0, ..., 0, 1, ?]

(? is a missing label - the label doesn't specify whether or not img2 has a pedestrian or not.)

One could treat this as a multi-input problem: duplicate the dataset and exclude missing labels for each task, creating 30 datasets. However, it is wasteful to train 30 separate task-specific networks without parameter sharing, and even propagating forward 30 times for a single image is an inefficient use of computational resources.

Please let me know if I have the wrong assumptions about how LibMTL works, or if my problem can easily be solved with the current API. Thank you!
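One way to avoid duplicating the dataset is to keep a single shared forward pass and mask the missing labels out of the loss. A minimal sketch (not part of LibMTL; all names are hypothetical), assuming missing labels are encoded as NaN:

    import torch
    import torch.nn.functional as F

    # logits and labels are both (batch, num_tasks); NaN marks a missing label.
    def masked_per_task_bce(logits, labels):
        mask = ~torch.isnan(labels)                  # True where a label exists
        safe_labels = torch.nan_to_num(labels)       # replace NaN so BCE is defined
        per_elem = F.binary_cross_entropy_with_logits(
            logits, safe_labels, reduction='none')
        per_elem = per_elem * mask                   # zero out missing entries
        counts = mask.sum(dim=0).clamp(min=1)        # labelled examples per task
        return per_elem.sum(dim=0) / counts          # one loss value per task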

Trainer class use fails with the error "No module named 'torchvision.models.utils'"

Full stack trace:

Traceback (most recent call last):
  File "src/main/pipelines/train_nsfw_mtl.py", line 11, in <module>
    from LibMTL import Trainer
  File "/azureml-envs/azureml_8a26314e09753d45d0790003a01faf79/lib/python3.8/site-packages/LibMTL/__init__.py", line 2, in <module>
    from . import model
  File "/azureml-envs/azureml_8a26314e09753d45d0790003a01faf79/lib/python3.8/site-packages/LibMTL/model/__init__.py", line 1, in <module>
    from LibMTL.model.resnet import resnet18
  File "/azureml-envs/azureml_8a26314e09753d45d0790003a01faf79/lib/python3.8/site-packages/LibMTL/model/resnet.py", line 3, in <module>
    from torchvision.models.utils import load_state_dict_from_url
ModuleNotFoundError: No module named 'torchvision.models.utils'

The issue is fixed by using torch.hub instead, as shown below.
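The one-line change, as suggested:

    # In LibMTL/model/resnet.py, replace the broken import
    #   from torchvision.models.utils import load_state_dict_from_url
    # with the equivalent helper from torch.hub:
    from torch.hub import load_state_dict_from_url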

Two problems when changing the loss function

Hi, I have two questions about changing the loss function. Thanks!

I am currently using version 1.1.6.

  1. When switching the loss function from CELoss to KLDivLoss, a dimension mismatch occurs:
 File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/LibMTL/core/trainer.py", line 461, in train
     train_losses[tn] = self._compute_loss(
   File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/LibMTL/core/trainer.py", line 304, in _compute_loss
     train_losses = self.losses[task_name].update_loss(preds[task_name], gts)
   File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/LibMTL/loss/abstract_loss.py", line 59, in update_loss
     loss = self.compute_loss(pred, gt)
   File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/LibMTL/loss/KLDivLoss.py", line 19, in compute_loss
     loss = self.loss_fn(pred, gt)
   File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
     return forward_call(*input, **kwargs)
   File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 465, in forward
     return F.kl_div(input, target, reduction=self.reduction, log_target=self.log_target)
   File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 2916, in kl_div
     reduced = torch.kl_div(input, target, reduction_enum, log_target=log_target)
RuntimeError: The size of tensor a (64) must match the size of tensor b (31) at non-singleton dimension 1
  2. When replacing the original loss function with a new one, an in-place-operation error occurs, although the code does not seem to contain any in-place operations; if cross-entropy is computed instead, there is no problem.

The modified code:

decoder_soft_loss = nn.KLDivLoss(reduction="batchmean")(
                                 nn.functional.log_softmax(unlearned_decoder / 10.0, dim=1),
                                 nn.functional.softmax(init_decoder / 10.0, dim=1))

where unlearned_decoder is the pred output by the model, and init_decoder is the pred output by the initial model.

The error message:

  File "train_office.py", line 12, in <module>
    Officemodel.kd_train()
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/LibMTL/core/trainer.py", line 712, in kd_train
    w = self.model.backward(train_losses, **weighting_arg)
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/LibMTL/weighting/DWA.py", line 40, in backward
    loss.backward()
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 31]], which is output 0 of AsStridedBackward0, is at version 5; expected version 4 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Hello! With non-homogeneous data where tasks have very different numbers of training samples, how do we apply the optimizer's gradient strategies?

Question 1:
Suppose there are task_a and task_b whose training samples come from different sources: task_a has 100,000 training samples while task_b has 1,000. If I use a single dataloader to fetch data, a batch may contain only task_a samples, so an iteration trains only task_a and no multi-task gradient strategy can be applied.

Question 2:
Likewise, with imbalanced non-homogeneous training data for three tasks task_a, task_b, and task_c, a batch may contain only task_a and task_c samples and no task_b samples. In that case, how should the gradient strategies be applied?

Visualization tools

Hi, do you have any way of visualizing gradient updates, or is there a plan to implement one in LibMTL later?

for GNN

Hi, I know it's somewhat off-topic, but I am curious whether I can apply multi-task learning to a graph neural network. From what I understand of HPS, we share the encoder/decoder across tasks. Should I create an encoder on top of the graph layers? I'm somewhat stuck in this experiment; any suggestion would be helpful. Thanks.

Pytorch Distributed DataParallel support

Hi, great work.

Did you try running the gradient-surgery methods with DistributedDataParallel in PyTorch? I couldn't run them because calling backward multiple times creates gradient-syncing issues in distributed training.

GradVac's gradient update

Hi! The GradVac paper notes that the gradient similarities between different layers of the network converge to different values, so it sets different target values for different tasks and for different layers. The original text reads:

"To incorporate these three factors, we exploit an exponential moving average (EMA) variable for tasks i, j and parameter group k (e.g. the k-th layer) as:"

φ̂_ij^k(t) = (1 − β) · φ̂_ij^k(t−1) + β · φ_ij^k(t)

But your GradVac implementation still only sets different target values between task pairs, not between layers. Is that reasonable?

uw initialization

Hi, I found that the value -0.5 is used when initializing the parameter in line 19 of uw.py.
My question is why this value is not 0, since the variable loss_scale corresponds to log σ in the original paper.

self.loss_scale = nn.Parameter(torch.tensor([-0.5]*self.task_num, device=self.device))

Passing arguments to encoder/decoder constructors

Is it possible to currently pass in arguments to initialize the encoders and decoders?
For example, given this linear encoder class:

class SimpleLinearEncoder(nn.Module):
    def __init__(self, n_features, n_hidden1, n_output):
        super(SimpleLinearEncoder, self).__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(n_features, n_hidden1),
            torch.nn.ReLU(),
            torch.nn.Linear(n_hidden1, n_output)
        )

    def forward(self, x):
        return self.encoder(x)

In order to create an encoder with 20 inputs, 10 neurons in the hidden layer, and 4 outputs, without setting them as default argument values, can we pass these parameters via kwargs? E.g.:

kwargs = {"weight_args": {"alpha": 1.5}, "arch_args": {}, "n_features": 20, "n_hidden1": 10, "n_output": 4 }

 model = Trainer(task_dict=task_dict,
                    weighting=weighting_method.__dict__[mtl_weighting_method],
                    architecture=architecture_method.__dict__[mtl_architecture],
                    encoder_class=SimpleLinearEncoder,
                    decoders=decoders,
                    rep_grad=False,
                    multi_input=True,
                    optim_param={'optim': 'sgd', 'lr': 0.005, 'weight_decay': 0.00005, 'momentum': 0.9},
                    scheduler_param={'scheduler': 'step', 'step_size': 100, 'gamma': 0.5},
                    **kwargs)

I wasn't able to get the above to work, and my naive guess is that it's because of these lines in the LibMTL.architecture classes, e.g. HPS:

self.encoder = self.encoder_class()

I'm new to PyTorch so perhaps I'm missing an easy solution.

Thanks.
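A common workaround for this pattern: since the architecture classes call self.encoder_class() with no arguments, bind the constructor arguments beforehand with functools.partial. A sketch using the values from the question above:

    from functools import partial

    # encoder_class() now takes no arguments, as HPS expects, but builds
    # the configured encoder; a zero-argument lambda would work the same way.
    encoder_class = partial(SimpleLinearEncoder,
                            n_features=20, n_hidden1=10, n_output=4)

    # Pass it to Trainer(..., encoder_class=encoder_class, ...) unchanged.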

Saving and loading models

Hello LibMTL team,
I went through your code, and I am not sure which part could be extended to support saving and loading models to disk.
The model inside the Trainer is not picklable, and I was wondering if you could point me to the components that would need to be saved and loaded, for instance to resume training or postpone testing.
Best,
Salah

Question about Cross stitch implementation

Hi, I would like to ask a question about the Cross-stitch implementation.
In your code you create cross_unit as torch.ones(4, self.task_num), so for each of the 4 ResNet layers you have a tensor of task_num values. For example, with 2 tasks it would be a tensor of shape (4, 2), and each layer would have a 1x2 cross unit. But in equation (1) of the paper, for two tasks they use a 2x2 matrix. Is there a reason you use a 1x2 cross unit, or did I misunderstand something in the paper?

self.cross_unit = nn.Parameter(torch.ones(4, self.task_num))

def forward(self, inputs):
    s_rep = {task: self.resnet_conv[task](inputs) for task in self.task_name}
    ss_rep = {i: [0]*self.task_num for i in range(4)}
    for i in range(4):
        for tn, task in enumerate(self.task_name):
            if i == 0:
                ss_rep[i][tn] = self.resnet_layer[str(i)][tn](s_rep[task])
            else:
                cross_rep = sum([self.cross_unit[i-1][j]*ss_rep[i-1][j] for j in range(self.task_num)])
                ss_rep[i][tn] = self.resnet_layer[str(i)][tn](cross_rep)
    return ss_rep[3]

Because I was expecting to see something like this in the implementation:

self.cross_unit = nn.Parameter(torch.ones(4, self.task_num, self.task_num)) # matrix for each layer

def forward(self, inputs):
    s_rep = {task: self.resnet_conv[task](inputs) for task in self.task_name}
    ss_rep = {i: [0]*self.task_num for i in range(4)}
    for i in range(4):
        for tn, task in enumerate(self.task_name):
            if i == 0:
                ss_rep[i][tn] = self.resnet_layer[str(i)][tn](s_rep[task])
            else:
                cross_rep = sum([self.cross_unit[i-1][tn][j]*ss_rep[i-1][j] for j in range(self.task_num)]) # access matrix row of each task
                ss_rep[i][tn] = self.resnet_layer[str(i)][tn](cross_rep)
    return ss_rep[3]

Use of rep_grad with different weighting methods

Hi, why do some methods restrict rep_grad to True or False?
For example, GradDrop only allows True, while PCGrad, GradVac, CAGrad, and others only allow False.

question about uw

In the UW original paper, the objective function (for one regression task and one classification task) is:

L(W, σ1, σ2) ≈ (1 / (2σ1²)) · L1(W) + (1 / σ2²) · L2(W) + log σ1 + log σ2

According to the paper, the denominator of the second (classification) term differs from that of the first (regression) term. But in your code, loss = (losses/(2*self.loss_scale.exp())+self.loss_scale/2).sum(), there is no distinction between the denominators of the two terms.
Is that correct?

A problem with metrics.py

I used the L1Metric class and got an error:

AttributeError: 'L1Metric' object has no attribute 'abs_record'

class L1Metric(AbsMetric):
    r"""Calculate the Mean Absolute Error (MAE).
    """
    def __init__(self):
        super(L1Metric, self).__init__()
        
    def update_fun(self, pred, gt):
        r"""
        """
        abs_err = torch.abs(pred - gt)
        self.record.append(abs_err)
        self.bs.append(pred.size()[0])
        
    def score_fun(self):
        r"""
        """
        records = np.array(self.abs_record)
        batch_size = np.array(self.bs)
        return [(records*batch_size).sum()/(sum(batch_size))]

The L1Metric class inherits from AbsMetric, but AbsMetric has no attribute 'abs_record', so I guess there is a problem here; of course, this attribute may also be expected to come from somewhere else.

class AbsMetric(object):
    r"""An abstract class for the performance metrics of a task. 

    Attributes:
        record (list): A list of the metric scores in every iteration.
        bs (list): A list of the number of data in every iteration.
    """
    def __init__(self):
        self.record = []
        self.bs = []
    
    @property
    def update_fun(self, pred, gt):
        r"""Calculate the metric scores in every iteration and update :attr:`record`.

        Args:
            pred (torch.Tensor): The prediction tensor.
            gt (torch.Tensor): The ground-truth tensor.
        """
        pass
    
    @property
    def score_fun(self):
        r"""Calculate the final score (when an epoch ends).

        Return:
            list: A list of metric scores.
        """
        pass
    
    def reinit(self):
        r"""Reset :attr:`record` and :attr:`bs` (when an epoch ends).
        """
        self.record = []
        self.bs = []
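If the attribute is indeed just misnamed, a likely fix (a sketch, not the official implementation) is to read from self.record in score_fun, and to append plain numbers in update_fun so that the weighted average is well defined:

    def update_fun(self, pred, gt):
        # Store the batch-mean absolute error as a number, not a tensor.
        self.record.append(torch.abs(pred - gt).mean().item())
        self.bs.append(pred.size()[0])

    def score_fun(self):
        records = np.array(self.record)      # was: self.abs_record
        batch_size = np.array(self.bs)
        return [(records * batch_size).sum() / batch_size.sum()]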

Trainer doesn't work

When I try to run the trainer, .next() fails:

AttributeError: 'dict_keyiterator' object has no attribute 'next'

I use:
python=3.7
torch=11.3

Has this method been deleted in Python 3? Why not use next(iter)?

Or what should I do to fix it?

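Yes, the .next() method on iterators was removed in Python 3; the built-in next() function replaces it. The fix is a one-line change wherever the iterator is advanced (data_iter below stands for whatever iterator the training loop holds):

    # Python 2 style, removed in Python 3:
    #   batch = data_iter.next()
    # Python 3 style:
    batch = next(data_iter)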

Multi-input MTL

Most of the research papers for the implemented architectures and weighting methods seem to deal with single-input-multi-output tasks such as simultaneous object detection and segmentation from a single image. However, LibMTL also supports multi-input tasks, where each task has its own data (e.g. MNIST where each digit has its own task-specific layers). Is there a set of research papers that discuss this approach? I'm curious to learn how LibMTL evaluates the loss in these situations. I can dig into the code to get a deeper understanding, but in the meantime, if there are papers that discuss this, it would really help to have some links (and maybe update the docs).

Thanks!

Colab tutorial

The library is really interesting.

A Colaboratory notebook for testing it online would be very useful.

Documentation for v1.1.6

Hi, is there up-to-date documentation for v1.1.6? Some newly added parameters, such as cfg, have no complete examples or introduction; the documentation is still the old version.

saving model

After running the training script, where will the model be saved?

Wrong condition

if epoch == 0 and self.base_result is None and (mode=='val' if self.has_val else 'test'):

should be:

if epoch == 0 and self.base_result is None and mode == ('val' if self.has_val else 'test'):

Time-series MTL

Hi,

First of all, this is a fantastic library! Amazing work.

My question: can LibMTL be used in time-series applications? I.e., do we only need to provide encoder/decoder architectures such as LSTMs? Would the loss-weighting methods need to be extended in any way?
I'm fairly new to MTL, so pardon the naive question.

Thanks,
Madhu

Questions about AlignMTL

Hi there, I had a hard time reproducing the AlignMTL results and am wondering if you have encountered the same issue. Have you evaluated AlignMTL under HPS?

Thanks,
Max

Support for Visualization

Thanks for your excellent work!

Could LibMTL provide visualization support? It would make it easy to follow the training progress of the model using tools like tensorboardX or Visdom.

Thanks

About the MGDA implementation: some details I want to confirm

Hi, thanks for your wonderful project. There are some questions I want to confirm before applying the MGDA weighting method; could you please answer them? Thanks!

  1. what's the self.rep_tasks?
  2. what's the rep_grad?

For the above two questions, my own guesses: the first is the representation generated by the representation layer (the shared parameters), and the second is whether to use the gradients of the representations.

Given that, my third question is: what is the purpose of the variable rep_grad in MGDA?

Is it used to implement MGDA-UB? I noticed that the gradients of self.rep_tasks are saved in the _compute_grad() function of abstract_weighting.py, so I made that assumption.

I'm a little confused about these technical details; I hope you can help me. Thanks again!

Performance issue for NYUv2

Thank you for your excellent contribution!

I have a question about the performance of MTAN with PCGrad.

In my experiment, the result on NYUv2 is very different from the official PCGrad performance (although I did not use UW in this experiment):

[results screenshot]

Also, when I experimented with ResNet50-HPS without any weighting method, the result exceeded the official MTAN performance.

[results screenshot]

I used your official training command line in all cases and the dataset from your Dropbox.

Is this result the expected behavior?

Thank you in advance!

office-31 demo doesn't work

I ran the office-31 demo and found some errors in this example:
the version on pip is not the same as the version published on GitHub.
The operating environment is as follows:

  • python = 3.8
  • pytorch = 1.12

In train.py, .next() fails.

After I fixed this syntax, a new problem appeared at:
data = data.to(self.device, non_blocking=True)

I want to know whether this is a version issue or something else.

How to export saved models to other formats, such as onnx, mnn, etc

Hello, first of all, thank you for your great work.
I saved the best model while running the Office example, but I want to export this model as an ONNX model and use tools like Netron to inspect the model structure. How can I add code to the framework for this? Could you add an export component for various formats, such as ONNX, MNN, and TFLite?
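There is no built-in export component, but a standard PyTorch ONNX export can be attempted on the underlying network. A sketch, assuming the trained network is reachable as trainer.model, that its forward accepts a single image tensor (if forward takes extra arguments such as task_name, a thin wrapper module is needed), and the usual 3x224x224 input size for the Office example:

    import torch

    # Sketch only: the attribute name and input shape are assumptions.
    model = trainer.model.eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    torch.onnx.export(model, dummy_input, "mtl_model.onnx", opset_version=11)
    # The resulting mtl_model.onnx can then be opened in Netron.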

DataLoader errors when I set num_workers>1

I have found someone saying that setting num_workers=0 works, but it's too slow... My system is Ubuntu.

Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f7c408fce60>
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/IB/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1510, in __del__
    self._shutdown_workers()
  File "/home/user/miniconda3/envs/IB/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1493, in _shutdown_workers
    if w.is_alive():
  File "/home/user/miniconda3/envs/IB/lib/python3.7/multiprocessing/process.py", line 151, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
AssertionError: can only test a child process

When running the example code for QM9, the program seems to enter an infinite loop

When running the train_qm9.py file, the program prints the following information and then becomes unresponsive, as if it has entered an infinite loop. What could be the possible causes of this issue?

General Configuration:
Wighting: EW
Architecture: HPS
Rep_Grad: False
Multi_Input: False
Seed: 0
Save Path: None
Load Path: None
Device: cuda:0
Optimizer Configuration:
optim: adam
lr: 0.0001
weight_decay: 1e-05

Total Params: 617675
Trainable Params: 617675
Non-trainable Params: 0

LOG FORMAT | 0_LOSS MAE | 1_LOSS MAE | 2_LOSS MAE | 3_LOSS MAE | 5_LOSS MAE | 6_LOSS MAE | 12_LOSS MAE | 13_LOSS MAE | 14_LOSS MAE | 15_LOSS MAE | 11_LOSS MAE | TIME
