Giter Club home page Giter Club logo

multiobjectiveoptimization's People

Contributors

chengyinghao avatar ozansener avatar shaanrockz avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

multiobjectiveoptimization's Issues

inconsistent implementation of update for task specific weights with the description in paper

Hi, thanks for releasing this awesome code! Currently, i am working on reproducing the result on cityscapes in paper. I found that in paper the description of mtl update equation say the weights of task specific subnetwork should be updated with original learning rate, then the shared weights of network is updated with the MGDA algorithm. But i didnt find the corresponding implementation in code where both the shared weights and task specific weights are updated consistently by timing loss of different task with a weight factor determined by MGDA. Am i missing something here, or is this a implemention trick?

`iter_count` is never incremented in `min_norm_solvers`?

Both the find_min_norm_element (https://github.com/isl-org/MultiObjectiveOptimization/blob/master/multi_task/min_norm_solvers.py#L120) and find_min_norm_element_FW (https://github.com/isl-org/MultiObjectiveOptimization/blob/master/multi_task/min_norm_solvers.py#L166) functions define a while loop based on an iter_count variable.

Line 13 of Algorithm 2 in the paper (https://arxiv.org/pdf/1810.04650.pdf) notes that optimization is done until a number of iterations or convergence threshold is reached. However, the code never actually increments iter_count, and so optimization only terminates based on the convergence threshold.

Citation for 'Projected Descent' method in Readme

Hi,
thanks for the open-source code!
I'm looking to implement something like https://openreview.net/forum?id=HJewiCVFPB (Gradient Surgery for Multi-Task Learning) using/on top of this repo, but i'm a little confused about whether the projected descent method that is mentioned in the Readme is something similar already present.
Could you add citations and sources for any additional algorithms and tricks that this repo implements apart from the main paper?
That would be very helpful!
(or alternatively it would be helpful to just get some pointers on what/where to modify to get what i want working)

PS: ^ the paper i mentioned just involves projecting the gradients of all tasks onto the planes of the gradients of the other tasks before doing updates, to avoid gradient conflicts, (for some quick context)

Thanks

size of the segmentation output

I find the input size of CityScapes is 256x512, but in the transform, the target is resize into 32x64. what's more, the measure metrix is based on 32x64.

So, why the target size is not 256x512, just same as the input size?

run error, Please help me, Thanks

I use "
"algorithm": "mgda",
"use_approximation": true,", That's mean the MGDA_UB is used。
When I trained the task,At the begin time, It can run true. But when it run some steps, It will get error: "RuntimeError: CUDA out of memory." ,I use torch_version==v1.1.0, Is it torch_version question?

`_min_norm_2d` func in `min_norm_solver.py` is slow

In my test, I found change below code in _min_norm_2d func may spped up train procedure.

    def _min_norm_2d(self, vecs, dps):
        """
        Find the minimum norm solution as combination of two points
        This is correct only in 2D
        ie. min_c |\sum c_i x_i|_2^2 st. \sum c_i = 1 , 1 >= c_1 >= 0 for all i, c_i + c_j = 1.0 for some i, j
        """
        dmin = 1e8
        for i in range(len(vecs)):
            for j in range(i + 1, len(vecs)):
                if (i, j) not in dps:
                    dps[(i, j)] = 0.0
                    # for k in range(len(vecs[i])):
                    #     dps[(i, j)] += torch.dot(vecs[i][k], vecs[j][k]).data[0]
                    dps[(i, j)] = torch.mm(vecs[i].transpose(1,0), vecs[j])
                    dps[(j, i)] = dps[(i, j)]
                if (i, i) not in dps:
                    dps[(i, i)] = 0.0
                    # for k in range(len(vecs[i])):
                    #     dps[(i, i)] += torch.dot(vecs[i][k], vecs[i][k]).data[0]
                    dps[(i, i)] = torch.mm(vecs[i].transpose(1, 0), vecs[i])
                if (j, j) not in dps:
                    dps[(j, j)] = 0.0
                    # for k in range(len(vecs[i])):
                    #     dps[(j, j)] += torch.dot(vecs[j][k], vecs[j][k]).data[0]
                    dps[(j, j)] = torch.mm(vecs[j].transpose(1, 0), vecs[j])
                c, d = self._min_norm_element_from2(dps[(i, i)], dps[(i, j)], dps[(j, j)])
                if d < dmin:
                    dmin = d
                    sol = [(i, j), c, d]
        return sol, dps

cityscape dataset

Hi,Thanks you code.

Could you provide the cityscape dataset depth ground truth?

thanks!

How to speed up the calculation of func '_min_norm_2d'?

I found that if the vecs input into func _min_norm_2d is too large, the whole procedure becomes very slow. In my experiment, the vecs.shape = [39321600,1], and one single training iteration costs 40s, while the _min_norm_2d spends 35s+.
The experiment in your paper assume that the share layers shape is much smaller mine, so I guess maybe this is the point.

local variable 'sol' referenced before assignment

I encounter an error UnboundLocalError: local variable 'sol' referenced before assignment

In min_norm_solvers.py,
def _min_norm_2d(vecs, dps):
"""
Find the minimum norm solution as combination of two points
This is correct only in 2D
ie. min_c |\sum c_i x_i|_2^2 st. \sum c_i = 1 , 1 >= c_1 >= 0 for all i, c_i + c_j = 1.0 for some i, j
"""
dmin = 1e8
for i in range(len(vecs)):
for j in range(i+1,len(vecs)):
if (i,j) not in dps:
dps[(i, j)] = 0.0
for k in range(len(vecs[i])):
dps[(i,j)] += torch.mul(vecs[i][k], vecs[j][k]).sum().data.cpu()
dps[(j, i)] = dps[(i, j)]
if (i,i) not in dps:
dps[(i, i)] = 0.0
for k in range(len(vecs[i])):
dps[(i,i)] += torch.mul(vecs[i][k], vecs[i][k]).sum().data.cpu()
if (j,j) not in dps:
dps[(j, j)] = 0.0
for k in range(len(vecs[i])):
dps[(j, j)] += torch.mul(vecs[j][k], vecs[j][k]).sum().data.cpu()
c,d = MinNormSolver._min_norm_element_from2(dps[(i,i)], dps[(i,j)], dps[(j,j)])
if d < dmin:
dmin = d
sol = [(i,j),c,d]
return sol, dps

The sol variable is only assigned if d < dmin, what if d >= dmin always, what should the sol variable be?
I assume that it will be sol = [(i,j),c,dmin], is that right?

is the code corresponding to the algorithm in paper ?

Hi, thanks for providing code for your paper

I was wondering, when you do the actual update there:

It looks to me that you are not exactly performing the same thing as what's stated in the paper
https://proceedings.neurips.cc/paper/2018/file/432aca3a1e345e339f35a30c8f65edce-Paper.pdf

On line 2, the "heads" parameters are updated through theta_t = theta_t - eta * grad, but here you do theta_t = theta_t * scale_t * eta * grad
is that correct ?

best

antoine

best config for MultiMNIST?

Could you please publish the best config for MultiMNIST experiments? I tried with the following but only got ~70% testing accuracy for both tasks, which is much lower than what's reported in your paper.

{
    "optimizer": "SGD",
    "batch_size": 256,
    "lr": 0.0005,
    "dataset": "mnist",
    "tasks": ["L", "R"],
    "normalization_type": "none",
    "algorithm": "mgda",
    "use_approximation": true,
    "scales": {"L":0.5, "R":0.5}
}

Thanks!

depth loss and metric

Hi, @ozansener, I find that there may be some mismatch among the depth loss, metric and reported results in the paper. In the depth loss, those pixels with <0 loss are excluded, but in the depth metric, pixels with <250 target depth (normalized by mean and std) are all included, So I think they include all the pixels, maybe this is a bug because 250 should be used for segmentation but not depth regression. Also, in Tab. 4 of your paper w.r.t disparity error, are they the direct output of the depth branch of the model in your code? Thanks.

PS: For evaluating the depth regression performance, why should the gt and pred be cast as int?https://github.com/intel-isl/MultiObjectiveOptimization/blob/5d8e8343c56cf3184081f71cc4a661af64e7cf7e/multi_task/metrics.py#L46-L54

params.json

how can i get the file named params.json?

About Cityscapes Instance Segmentation GroundTruth Generation.

Hi Ozen,
I notice that your function encoding_instancemap in cityscapes_loaders.py is strange.
The ignore pixel in variable ins is set to self.ignore_index, as default 250. But those pixel is not instance and 250 are in the range of x_map (I sample a image and calculate it. x_map max is 1038, min is -800, 250 is in the range)
Is it right? or it will be set to (0, 0) for y_map and x_map respectively?

Redundancy of optimiser.zero_grad()?

A question about the output of rep

Hello, my backbone has two output. For example, q0, q4 are both reps that will be used in one task head. I don't know how to adjust the code. Could you give me some advices? thanks!

Cityscapes experiment

Hi,
Thanks for open sourcing the code, this is great!
Could you share your json parameter file for cityscapes?
Also, I think it is is missing the file depth_mean.npy to be able to run it.

Thanks.

KeyError with the tensorboardX

Hello, can anybody tell me how to fix this issue? Is this a version inconsistency with the tensorboardX?
Cuz I am exactly run the code offered on the github without changing anything.
Appreciate in advance.


截屏2020-04-19 16 06 50

Classification Task on different datasets.

Hi, if I use different datasets to each task. Does this algorithm still works? e.g. Three classification tasks on datasets MNIST, ImageNet, Celebra, using multi objective optimization method proposed in your paper. Can I get better performances both on these three classification tasks?

Question about the MinNormSolver

Thanks for your code!

Here is my question.

According to the following code:

# Compute gradients of each loss function wrt z
for t in tasks:
    optimizer.zero_grad()
    out_t, masks[t] = model[t](rep_variable, None)
    loss = loss_fn[t](out_t, labels[t])
    loss_data[t] = loss.item()
    loss.backward()
    grads[t] = []
    if list_rep:
        grads[t].append(Variable(rep_variable[0].grad.data.clone(), requires_grad=False))
        rep_variable[0].grad.data.zero_()
    else:
        grads[t].append(Variable(rep_variable.grad.data.clone(), requires_grad=False))
        rep_variable.grad.data.zero_()

The grads is in the following format:

grads = {'Task_1': [grad_1], 'Task_2': [grad_2]}

where grad_1 and grad_2 are tensors of shape [batch_size, N]

Then the argument vecs of MinNormSolver.find_min_norm_element is in the following format:

vecs = [[grad_1], [grad_2]]

In this way, the vecs[i][k] is not a 1D vector and will lead to error.

    def _min_norm_2d(vecs, dps):
        """
        Find the minimum norm solution as combination of two points
        This is correct only in 2D
        ie. min_c |\sum c_i x_i|_2^2 st. \sum c_i = 1 , 1 >= c_1 >= 0 for all i, c_i + c_j = 1.0 for some i, j
        """
        dmin = 1e8
        for i in range(len(vecs)):
            for j in range(i+1,len(vecs)):
                if (i,j) not in dps:
                    dps[(i, j)] = 0.0
                    for k in range(len(vecs[i])):
                        dps[(i,j)] += torch.dot(vecs[i][k], vecs[j][k]).item()
                    dps[(j, i)] = dps[(i, j)]
                if (i,i) not in dps:
                    dps[(i, i)] = 0.0
                    for k in range(len(vecs[i])):
                        dps[(i,i)] += torch.dot(vecs[i][k], vecs[i][k]).item()
                if (j,j) not in dps:
                    dps[(j, j)] = 0.0   
                    for k in range(len(vecs[i])):
                        dps[(j, j)] += torch.dot(vecs[j][k], vecs[j][k]).item()
                c,d = MinNormSolver._min_norm_element_from2(dps[(i,i)], dps[(i,j)], dps[(j,j)])
                if d < dmin:
                    dmin = d
                    sol = [(i,j),c,d]
        return sol, dps

Unstable GPU usage while Training on Cityscapes

Hi,

Thanks for open-sourcing the awesome code.

I am using PyTorch 3.6 to run the equal-weights setup(without using mgda) on Cityscapes with a single GPU, but the GPU usage is extremely unstable and keeps shifting between 0 - 99. Do you know what could cause that?

Thanks

instance segmentation

During the inference, i predict instance vectors for x/y coordinates,
How can i get the final instance output?

The cityscapes training time is so long

Thanks for your amazing code. I want to reproduce the cityscapes result.

I use one P40 card and run the training code. However, it doesn't finish one epoch in 4 hours.

I want to know if there are some problems and how long did you train the net in cityscapes dataset.

Thank you very much.

Connection between fmp and task branches

May I ask what should be the correct dimension of grads?
Suppose the output feature map size is 100, and the batch_size is 512, should the rep and grads be of size [512, 100]? And how do you deal with the results of batch_size in the following calculation? Do you average the batchsize results into single result?
In my experiments, when I compute the _min_norm_2d, the input vec is like [[tensor1], [tensor2], [tensor3]], and each tensor is of size 512x100. Therefore, the issue occurs with dps[(i,j)] += torch.dot(vecs[i][k], vecs[j][k]).data[0], RuntimeError: 1D tensors expected, got 2D

Unable to obtain the same average error as Table 1 in the paper

The average error on celeba is 8.25 in the paper, but my result is 10.14, which is obtained by directly using the codes in this repo. I list the configurations I used below, is there any problem?

in configs.json

    "celeba": {
        "path": "data/celeba",
        "all_tasks": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
                      "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39"],
        "img_rows": 64,
        "img_cols": 64
    }

in sample.json

{
    "optimizer": "Adam", 
    "batch_size": 128, 
    "lr": 0.0005,
    "dataset": "celeba", 
    "tasks": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
                  "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39"],
    "normalization_type": "loss+",
    "algorithm": "mgda",
    "use_approximation": true,
    "scales": {"0":0.025, "1":0.025, "2":0.025, "3":0.025, "4":0.025, "5":0.025, "6":0.025, "7":0.025, "8":0.025, "9":0.025, "10":0.025, 
               "11":0.025, "12":0.025, "13":0.025, "14":0.025, "15":0.025, "16":0.025, "17":0.025, "18":0.025, "19":0.025, "20":0.025, 
               "21":0.025, "22":0.025, "23":0.025, "24":0.025, "25":0.025, "26":0.025, "27":0.025, "28":0.025, "29":0.025, "30":0.025, 
               "31":0.025, "32":0.025, "33":0.025, "34":0.025, "35":0.025, "36":0.025, "37":0.025, "38":0.025, "39":0.025}
}

and the learning rate is multiplied by 0.85 every 10 epochs:
    if (epoch+1) % 10 == 0:
        # Every 50 epoch, half the LR
        for param_group in optimizer.param_groups:
            param_group['lr'] *= 0.85
        logger.info('Multiply the learning rate by {} [{} steps]'.format(0.85, n_iter))

Unable to obtain the same average error as Table 1 in the paper

The average error on celeba is 8.25 in the paper, but my result is 10.14, which is obtained by directly using the codes in this repo. I list the configurations I used below, is there any problem?

in configs.json

    "celeba": {
        "path": "data/celeba",
        "all_tasks": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
                      "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39"],
        "img_rows": 64,
        "img_cols": 64
    }

in sample.json

{
    "optimizer": "Adam", 
    "batch_size": 256, 
    "lr": 0.0005,
    "dataset": "celeba", 
    "tasks": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
                  "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39"],
    "normalization_type": "loss+",
    "algorithm": "mgda",
    "use_approximation": true,
    "scales": {"0":0.025, "1":0.025, "2":0.025, "3":0.025, "4":0.025, "5":0.025, "6":0.025, "7":0.025, "8":0.025, "9":0.025, "10":0.025, 
               "11":0.025, "12":0.025, "13":0.025, "14":0.025, "15":0.025, "16":0.025, "17":0.025, "18":0.025, "19":0.025, "20":0.025, 
               "21":0.025, "22":0.025, "23":0.025, "24":0.025, "25":0.025, "26":0.025, "27":0.025, "28":0.025, "29":0.025, "30":0.025, 
               "31":0.025, "32":0.025, "33":0.025, "34":0.025, "35":0.025, "36":0.025, "37":0.025, "38":0.025, "39":0.025}
}

and the learning rate is multiplied by 0.85 every 10 epochs:
    if (epoch+1) % 10 == 0:
        # Every 50 epoch, half the LR
        for param_group in optimizer.param_groups:
            param_group['lr'] *= 0.85
        logger.info('Multiply the learning rate by {} [{} steps]'.format(0.85, n_iter))

and I train the model on two GPUs.

sample.json for Multi Mnist?

Hi. thanks for sharing.
what is the structure for sample.json and config.json when I want to use code for multiMnist dataset?

Get test accuracy for MultiMNIST

Hi,

I'm wondering how I can get the test accuracy for MultiMNIST. Right now, it seems to me that the metric results logged at each epoch are the test accuracies. Can you clarify how you obtain the test accuracy numbers in the paper? Thanks!

How is the derivative wrt. representations computed in a single backward pass?

First, thank you for the nice work. If you don't mind, I have a question:

You mention right after defining the upper bound (Eq. 6) that the derivative wrt. representations, i.e. \nabla_Z L^t, can be computed in a single backward pass for all tasks. But all tasks are sharing the same representation nodes, so how is this possible? Wouldn't this also apply then to the original derivative wrt. to the shared parameters?

Depth loss

There is a bug in the depth loss.
All entries smaller than zero are masked. This causes most of the entries to be excluded in the loss, as the depth map has been normalized. The correct way would be to mask only the zero entries in the loss.

different sizes of gradient

Hi, thanks for sharing your code!

I have one question about the size of gradient. I found that in the current code, only 2D gradient is supported. But if the parameter has 4 dimensional, such as (batch-size, N, width, height), how should I modify the min-norm solver ?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.