
cvpr2021_plop's Introduction

PLOP: Learning without Forgetting for Continual Semantic Segmentation

Paper Conference Youtube

Visualization on VOC 15-1

This repository contains all of our code. It is a modified version of Cermelli et al.'s repository.

@inproceedings{douillard2021plop,
  title={PLOP: Learning without Forgetting for Continual Semantic Segmentation},
  author={Douillard, Arthur and Chen, Yifu and Dapogny, Arnaud and Cord, Matthieu},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

Requirements

You need to install the following libraries:

  • Python (3.6)
  • Pytorch (1.8.1+cu102)
  • torchvision (0.9.1+cu102)
  • tensorboardX (1.8)
  • apex (0.1)
  • matplotlib (3.3.1)
  • numpy (1.17.2)
  • inplace-abn (1.0.7)

Note also that apex seems to work only with certain CUDA versions, so try to install Pytorch (and torchvision) built for CUDA 10.2. You'll probably need anaconda instead of pip in that case, sorry! Do:

conda install -y pytorch torchvision cudatoolkit=10.2 -c pytorch
cd apex
pip3 install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Note that while the code should be runnable without mixed precision (apex), some users have reported lower performance without it. So try with it!
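For reference, here is a minimal sketch of how apex's mixed-precision API is typically wired in (a toy model stands in for the real network; this is not the exact code in run.py):

import torch
import torch.nn as nn
from apex import amp

# Toy stand-in for the real segmentation network.
model = nn.Conv2d(3, 21, kernel_size=1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# O1 runs most ops in fp16 while keeping fp32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(2, 3, 64, 64).cuda()
loss = model(x).mean()

# Scale the loss so fp16 gradients don't underflow.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()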

Dataset

Two scripts are available to download ADE20k and Pascal-VOC 2012; please see the data folder. For Cityscapes, you need to download it yourself because you have to request permission from the dataset holders. Rest assured, it's only a formality: you will get the download link by mail within a few days.

Performance on VOC

How to perform training

The most important file is run.py, which is in charge of starting the training or test procedure. To run it, simply use the following command:

python -m torch.distributed.launch --nproc_per_node=<num_GPUs> run.py --data_root <data_folder> --name <exp_name> .. other args ..

By default, a pretrained backbone is used; it is looked for in the pretrained folder of the project. We used the pretrained models released by the authors of In-Place ABN (as stated in the paper), which can be found here: link. I've also uploaded those weights there: link.

Since the pretrained weights were trained on multiple GPUs, every key of the network's state dict carries a 'module.' prefix. If you're working on a single GPU, be sure to remove it to stay compatible with this code (simply rename each key using key = key[7:]). If you don't want to use pretrained weights, pass --no-pretrained.
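As a sketch, the renaming can be done once, offline (the file names below are placeholders for wherever you stored the weights):

import torch

state = torch.load("pretrained/backbone.pth.tar", map_location="cpu")
state_dict = state["state_dict"] if "state_dict" in state else state

# Drop the "module." prefix added by (Distributed)DataParallel.
state_dict = {
    (k[7:] if k.startswith("module.") else k): v
    for k, v in state_dict.items()
}
torch.save({"state_dict": state_dict}, "pretrained/backbone_clean.pth.tar")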

There are many options (you can see them all by using the --help option), but we arranged the code to be straightforward for testing the reported methods. Leaving all other parameters at their defaults, you can replicate the experiments by setting the following options.

  • please specify the data folder using: --data_root <data_root>
  • dataset: --dataset voc (Pascal-VOC 2012) | ade (ADE20K)
  • task: --task <task>, where tasks are
    • 15-5, 15-5s, 19-1 (VOC); 100-50, 100-10, 50, 100-50b, 100-10b, 50b (ADE, where b indicates an alternative class order)
  • step (each step is run separately): --step <N>, where N is the step number, starting from 0
  • (only for Pascal-VOC) disjoint is the default setup; to enable overlapped, pass: --overlap
  • learning rate: --lr 0.01 (for step 0) | 0.001 (for step > 0)
  • batch size: --batch_size <24/num_GPUs>
  • epochs: --epochs 30 (Pascal-VOC 2012) | 60 (ADE20K)
  • method: --method <method name>, where names are
    • FT, LWF, LWF-MC, ILT, EWC, RW, PI, MIB

For all details please follow the information provided using the help option.

Example commands

LwF on the 100-50 setting of ADE20K, step 0:

python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name LWF --task 100-50 --step 0 --lr 0.01 --epochs 60 --method LWF

MIB on the 50b setting of ADE20K, step 2:

python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name MIB --task 50b --step 2 --lr 0.001 --epochs 60 --method MIB

LWF-MC on 15-5 disjoint setting of VOC, step 1:

python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name LWF-MC --task 15-5 --step 1 --lr 0.001 --epochs 30 --method LWF-MC

PLOP on 15-1 overlapped setting of VOC, step 1:

python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name PLOP --task 15-5s --overlap --step 1 --lr 0.001 --epochs 30 --method FT --pod local --pod_factor 0.01 --pod_logits --pseudo entropy --threshold 0.001 --classif_adaptive_factor --init_balanced --pod_options '{"switch": {"after": {"extra_channels": "sum", "factor": 0.0005, "type": "local"}}}'

Once you have trained the model, you can see the results on tensorboard (we perform the test after the whole training), or you can test it by using the same script and parameters plus the flag

--test

which will skip the whole training procedure and test the model on the test data.
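For example, to evaluate the LWF-MC model trained above (same parameters, plus --test):

python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name LWF-MC --task 15-5 --step 1 --lr 0.001 --epochs 30 --method LWF-MC --test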

Or, more simply, you can use one of the provided scripts, which will launch every step of a continual training.

For example, do

bash scripts/voc/plop_15-1.sh

Note that you will need to modify those scripts to include the path to your data.
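For instance, each script defines its data path near the top; point it at your own dataset folder (the path below is a placeholder):

DATA_ROOT=/path/to/your/data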

cvpr2021_plop's People

Contributors

arthurdouillard


cvpr2021_plop's Issues

potential bug in datasets.utils (won't change behavior)

Describe the bug
potential bug (won't change behavior)

(screenshot: two lines of datasets/utils.py highlighted in red)

Should it be the c variable in the two red rectangles? The cls variable won't cause any bugs, since the lambda can capture enclosing variables, but it could confuse other readers.
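For context, here is a minimal standalone demonstration of the capture behavior being discussed (variable names mirror the code in the screenshot):

labels = [1, 2, 7]
cls = [7, 9]

# The parameter c is never used: the lambda closes over the enclosing
# variables labels and cls, so the filter still "works" by accident.
fil = lambda c: any(x in labels for x in cls)

print(fil([0]))   # True
print(fil(None))  # True: the argument is ignored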

Reproduce 15-1 setup on Pascal VOC

Describe the bug
I tried to run the provided Pascal-VOC script with apex optimization level O1, keeping everything the same as the script, except that I used a single GPU and therefore changed the batch size to 24.
But I got the following results:

              1-15    16-20   all
Paper         65.12   21.11   54.64
Code results  58.73   21.6    49.7

To Reproduce
start=`date +%s`

START_DATE=$(date '+%Y-%m-%d')

PORT=$((9000 + RANDOM % 1000))
GPU=0
NB_GPU=1
DATA_ROOT=./data
DATASET=voc
TASK=15-5s
NAME=PLOP
METHOD=PLOP
BATCH_SIZE=24
INITIAL_EPOCHS=30
EPOCHS=30
OPTIONS="--checkpoint checkpoints/step/"

RESULTSFILE=results/${START_DATE}_${DATASET}_${TASK}_${NAME}.csv
rm -f ${RESULTSFILE}

CUDA_VISIBLE_DEVICES=${GPU} python3 -m torch.distributed.launch --master_port ${PORT} --nproc_per_node=${NB_GPU} run.py --date ${START_DATE} --data_root ${DATA_ROOT} --overlap --batch_size ${BATCH_SIZE} --dataset ${DATASET} --name ${NAME} --task ${TASK} --step 0 --lr 0.01 --epochs ${INITIAL_EPOCHS} --method ${METHOD} --opt_level O1 ${OPTIONS}
for step in 1 2 3 4 5
do
CUDA_VISIBLE_DEVICES=${GPU} python3 -m torch.distributed.launch --master_port ${PORT} --nproc_per_node=${NB_GPU} run.py --date ${START_DATE} --data_root ${DATA_ROOT} --overlap --batch_size ${BATCH_SIZE} --dataset ${DATASET} --name ${NAME} --task ${TASK} --step ${step} --lr 0.001 --epochs ${EPOCHS} --method ${METHOD} --opt_level O1 ${OPTIONS}
done
python3 average_csv.py ${RESULTSFILE}

Experiments on cityscapes dataset

Hello @arthurdouillard, I enjoyed reading your paper. I have a few queries regarding the implementation using the cityscapes dataset.

  1. I am reproducing the experiments for plop_cityscapesDomain_1-1.sh. I see that the method is specified as FT and not PLOP. I am unable to understand why that is the case. Why is the method not PLOP?
  2. There is no explicit background class in the Cityscapes dataset. How is that dealt with in the PLOP implementation? It may not be an issue in the domain-incremental setting, but it will be required in class-incremental experiments. I see one argument 'dont_predict_bg', but it hasn't been used.

Let me know if you can clarify these doubts. Thanks.

Ade20k results - which splits were used

Hi, thank you for providing the code, I have a few questions regarding the reported results in the paper.

I noticed that both PLOP and your MIB implementation significantly outperform the original MIB implementation on the old classes, and that you only provide the ADE20k weights trained on classes 1-100.

So my question is: which splits did you average when calculating the ADE20k results? Both the original order (a) and the random order (b)?
I also have a question regarding the ADE20k setting: is it based on the overlapped setting, the (pseudo-)disjoint setting of MIB, or some other setting?

Thanks for any clarification you can give,

Best regards

Results about ADE100-50

After I train ADE 100-50 according to your .sh script, the results are as follows:
Last Step: 1
Final Mean IoU 29.67
Average Mean IoU 29.67
Mean IoU first 0.0
Mean IoU last 0.0
ade_100-50_PLOP On GPUs 0,1
Run in 61645s

And I want to know why, thank you!

Loss does not converge

While reimplementing your code with the settings given in the scripts folder, I found the results are a bit lower than the paper results (2%-5%). When I checked the tensorboard for the loss, I found that from step 1 the loss does not converge and some values are NaN.
(tensorboard screenshot showing diverging/NaN losses)

Have you ever run into this problem?

Paper reproduction

Hello, I have run your code on my server. The training and prediction complete, but I can't find the predicted images in the output, only the predicted scores. My deep learning ability is limited, so I am asking for your help on how to get the segmented images shown in the paper. Thank you.
My email: [email protected]

Model weights become NaN in step 1 on VOC

Hi,
Thanks for your contribution.
I have a problem when training PLOP on the VOC dataset with the 15-5 setting.
After successfully training the model at step 0, I trained the model at the next step.
The model becomes NaN after a few training iterations, even in the first epoch.

Since the model M0 can be trained without any problem, I suspect that distilling the knowledge of M0 into M1 might lead to a divergence problem for M1.

Following the paper, I used lr=0.01 for M0 and lr=0.001 for M1.
Here is the setting I used.
Step 0:
python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root /media/hieu/DATA/semantic_segmentation/PascalVOC12 --batch_size 12 --dataset voc --name PLOP --task 15-5 --overlap --step 0 --lr 0.01 --epochs 30 --method FT --pod local --pod_factor 0.01 --pod_logits --pseudo entropy --threshold 0.001 --classif_adaptive_factor --init_balanced

Step 1:
python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root /media/hieu/DATA/semantic_segmentation/PascalVOC12 --batch_size 12 --dataset voc --name PLOP --task 15-5 --overlap --step 1 --lr 0.001 --epochs 30 --method FT --pod local --pod_factor 0.01 --pod_logits --pseudo entropy --threshold 0.001 --classif_adaptive_factor --init_balanced

Filtering images

Hi,

I'm looking at your filter_images code in utils.py. I see that the argument of the lambda function is not used at all.

fil = lambda c: any(x in labels for x in cls)

and I think the filtering code can be simplified to np.intersect1d, right?

Is this correct?
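For reference, a minimal sketch of the suggested simplification, assuming labels and cls are 1-D array-likes of class ids:

import numpy as np

labels = np.array([1, 2, 7])
cls = np.array([7, 9])

# True if the image contains at least one of the requested classes.
keep = np.intersect1d(labels, cls).size > 0
print(keep)  # True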

Clarification regarding domain shift experiments on Cityscapes

Hi @arthurdouillard, I really enjoyed reading your work! Thanks for bringing in the domain shift aspect of CSS. I have the following doubts in the implementation of ILT, MiB and PLOP for the domain shift experiments on Cityscapes (Table 5):

  1. wrt PLOP: I'm assuming the pseudo labeling will not be applicable in these experiments as the label spaces are fixed in the domain incremental scenario. So do I just use the distillation loss along with regular cross-entropy? Is my understanding correct wrt using PLOP in a domain IL scenario?
  2. MiB modifies distillation and cross-entropy to tackle the background class-shift issue. Since there is no such issue in the domain-incremental scenario, doesn't their method reduce to ILT (basically LwF)? I'm confused as to why there is a difference in performance (e.g. 59% for ILT and 61.5% for MiB in the 11-5 case).

Also, is it possible to share the joint-model (traditional segmentation model) mIoU you get for Cityscapes with DeeplabV3 and ResNet101? (I couldn't find this in the paper and wanted to see the drop relative to the joint model.)

About apex=0.1

I configured the same pytorch, torchvision and CUDA versions as you; however, I cannot install apex 0.1. Can you tell me how to install it in detail? Thank you!

TensorFlow version?

Very interesting work! Some of the ideas you propose seem perfect for our applications and we are very interested in testing them :]

However, we mostly use TF-Keras for ML, so I was wondering if you have an alternative implementation in this framework, or if anyone is trying to reimplement this code in it?

Question regarding the validation process to pick the best model

Hi, I find that the validation is done every epoch on the test set with only the current novel class annotations. However, it seems that the model is always saved regardless of whether the validation mIoU improves. That makes the validation pointless. Did you notice this problem? I think it is a legacy issue from MiB's code. Looking forward to your reply. Thanks.
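For illustration, a minimal sketch of the guard the issue is asking for (validate, model, val_loader and rank are hypothetical placeholders, not the actual names in run.py):

import torch

best_score = float("-inf")
for epoch in range(30):
    val_score = validate(model, val_loader)  # hypothetical eval helper
    # Only checkpoint when validation mIoU actually improves.
    if rank == 0 and val_score > best_score:
        best_score = val_score
        torch.save(model.state_dict(), "checkpoints/best.pth")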

Prediction and visualization

Describe the bug
Hi, I have a question about how to run prediction on a picture once the training stage finishes; in other words, how to visualize the segmentation results with your code.
I have written a simple visualization .py file, but when I load the model, I found that it did not load successfully.
If it's convenient, I would appreciate it if you could share your prediction code.
I look forward to your answer, thanks.

To Reproduce

Dataset: voc-2012
Setting: 19-1

Trying to reproduce VOC 10-1 results

I am trying to reproduce the 10-1 results shown in the table below. I notice a large gap in old-class mIoU between my reproduced result (38.82) and your reported one (44.03), roughly 5 percentage points. I am wondering what could cause this problem. I ran the experiments with 2 x RTX 3090 GPUs. I followed your original implementation except for the CUDA version: I am using CUDA 11.3 because CUDA 10.2 does not support the RTX 3090. Does that matter?

Btw, may I know which GPU model you used? I think it needs at least 16GB to hold a batch of 12 on each device and must support CUDA 10.2 as well. V100, I guess?

Meanwhile, I notice a strange phenomenon: the background performance drops drastically starting from the 8th step and becomes 0 at the 9th step. I think this harms the old-class performance a lot. Did you have a similar issue?

Thanks.

step background aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor 0-10 11-20 all
0 95.19% 89.84% 41.01% 89.74% 72.24% 85.02% 95.29% 88.26% 93.30% 43.15% 92.25% x x x x x x x x x x 80.48% - 80.48%
1 92.01% 87.25% 35.76% 82.88% 67.51% 79.44% 94.49% 82.55% 88.44% 46.25% 90.74% 37.38% x x x x x x x x x 77.03% 37.38% 73.73%
2 89.16% 84.00% 34.32% 83.28% 63.44% 68.45% 92.90% 82.54% 82.37% 31.68% 84.48% 24.68% 62.59% x x x x x x x x 72.42% 43.64% 67.99%
3 83.97% 79.07% 31.92% 73.50% 53.19% 51.06% 91.18% 80.90% 76.82% 17.40% 65.26% 18.69% 3.93% 22.11% x x x x x x x 64.02% 14.91% 53.50%
4 79.89% 77.61% 35.83% 74.77% 47.67% 49.70% 90.73% 78.02% 79.41% 13.21% 65.81% 17.77% 9.43% 20.52% 33.81% x x x x x x 62.97% 20.38% 51.61%
5 84.56% 84.34% 37.88% 81.72% 53.81% 56.75% 90.72% 82.48% 82.98% 13.77% 69.20% 0.57% 0.00% 26.10% 58.37% 69.71% x x x x x 67.11% 30.95% 55.81%
6 82.18% 80.07% 37.93% 69.65% 49.38% 55.32% 89.74% 81.63% 77.15% 14.10% 61.90% 0.48% 0.00% 23.42% 51.28% 64.77% 5.04% x x x x 63.55% 24.16% 49.65%
7 80.42% 77.61% 36.88% 50.15% 42.00% 53.55% 75.10% 80.31% 74.55% 10.30% 20.94% 0.05% 0.00% 24.42% 50.90% 57.63% 0.00% 19.33% x x x 54.71% 21.76% 41.90%
8 44.45% 71.53% 35.75% 51.99% 44.17% 50.20% 80.98% 76.45% 69.19% 29.32% 40.46% 0.18% 0.00% 22.80% 51.03% 71.25% 0.00% 15.88% 3.26% x x 54.04% 20.55% 39.94%
9 0.55% 61.80% 35.53% 51.15% 44.62% 50.52% 69.98% 77.75% 57.82% 9.92% 16.76% 0.05% 0.00% 23.56% 46.89% 62.95% 0.00% 7.37% 1.49% 2.27% x 43.31% 16.06% 31.05%
10 0.00% 50.48% 32.40% 38.14% 40.64% 51.90% 62.86% 69.76% 56.53% 17.47% 6.83% 0.02% 0.00% 23.25% 53.95% 66.92% 0.00% 1.73% 1.67% 0.04% 2.60% 38.82% 15.02% 27.49%

Reproduce ADE20k 50-50

Hi, I have trouble reproducing results for ADE20k 50-50.
Could you please share the script with me?

don't use apex

Hello, my server has an RTX 3090, which does not support CUDA 10.2. How should I change the code if I don't use apex?

Hello, does this model support training on a single GPU?

Describe the bug
ERROR:torch.distributed.elastic.multiprocessing.api:failed
To Reproduce

Dataset: VOC2012
Setting: ...
Command used or script used:
I tried two methods:

  1. python -m torch.distributed.launch --nproc_per_node=1 run.py --data_root /home/before_dawn/Code/Dataset/VOCtrainval_11-May-2012/VOCdevkit/VOC2012 --batch_size 12 --dataset voc --name PLOP --task 15-5s --overlap --step 1 --lr 0.001 --epochs 30 --method FT --pod local --pod_factor 0.01 --pod_logits --pseudo entropy --threshold 0.001 --classif_adaptive_factor --init_balanced --pod_options '{"switch": {"after": {"extra_channels": "sum", "factor": 0.0005, "type": "local"}}}'
  2. python run.py --data_root /home/before_dawn/Code/Dataset/VOCtrainval_11-May-2012/VOCdevkit/VOC2012 --batch_size 12 --dataset voc --name PLOP --task 15-5s --overlap --step 1 --lr 0.001 --epochs 30 --method FT --pod local --pod_factor 0.01 --pod_logits --pseudo entropy --threshold 0.001 --classif_adaptive_factor --init_balanced --pod_options '{"switch": {"after": {"extra_channels": "sum", "factor": 0.0005, "type": "local"}}}'
Does this model support training on a single GPU?

Question about validation data

Hello,
I have two questions about the validation data:

  1. Why does val_dst use labels=list(labels), labels_old=list(labels_old), instead of labels=list(labels_cum) like test_dst?
  2. Why is the "best" model always saved at the last iteration? How can we know the last iteration always yields the best model?
     if rank == 0:  # save best model at the last iteration

NaNs on PLOP 15-5 setting on single GPU

Hi,
I think I found a bug in your code. As others have already pointed out, you can get NaN values in your network if you train PLOP at step 1. You can avoid this problem if you use --opt_level O1. However, I wanted to edit apex out of your code, and that's when the NaNs become a real problem. I don't know exactly why, but apex seems to just skip these NaN values and carry on with training.

The problem can be found when computing the classif_adaptive_factor in train.py. It is computed as
classif_adaptive_factor = num / den
On rare occasions (when the picture consists of just one class) the den value can become zero, as it is computed from mask_background, which is set to:
mask_background = labels < self.old_classes
den = mask_background.float().sum(dim=(1, 2))

The issue can easily be solved by modifying the den value right before computing the classif_adaptive_factor, for example via:
for i in range(0, opts.batch_size):
    if den[i] == 0:
        den[i] = 1

The solution is practical but isn't that satisfying. Maybe you can look into this particular section again and fix the issue properly. I'd also be really interested if someone has an idea why apex skips these NaNs.
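For what it's worth, a vectorized variant of the same guard might look like this (a sketch, not a vetted fix):

# Clamp den so images with no old-class pixels don't divide by zero;
# such images then get a neutral adaptive factor instead of NaN.
den = den.clamp(min=1.0)
classif_adaptive_factor = num / den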

Kind regards,
Jonas

Dataset: voc
Setting: 15-5 disjoint
Script: (I didn't use torch.distributed)
python run.py --data_root data --batch_size 8 --dataset voc --name PLOP --task 15-5 --step 0 --lr 0.01 --epochs 30 --method PLOP
python run.py --data_root data --batch_size 8 --dataset voc --name PLOP --task 15-5 --step 1 --lr 0.001 --epochs 30 --method PLOP

Question about how to install inplace-abn

Thanks for your great work!
How do I install the corresponding version of Inplace-ABN on Windows?
'pip install inplace-abn' doesn't work. Neither does cloning the package and then running 'python setup.py install'.
I look forward to your reply.

Reproduce problem on 15-1

I reproduced the 15-1 setting; the results are:
  | 1-15 | 16-20 | mIoU | all
mib | 35.1 | 13.5 | 29.7 | -
PLOP(reproduce train) | 64.94 | 20.77 | 53.90 | 55.16
PLOP(reproduce test) | 65.01 | 16.85 | 52.97 | 54.11
PLOP(paper) | 65.12 | 21.11 | 54.64 | 67.21
The reproduced training result is reasonable, but when adding --test the result drops too much.

About pretrained model

As you write in the readme: "Since the pretrained weights were trained on multiple GPUs, every key of the network's state dict carries a 'module.' prefix. If you're working on a single GPU, be sure to remove it to stay compatible with this code (simply rename each key using key = key[7:])."

If I use 2 gpus, i.e., --nproc_per_node=2, do I still need to modify the pretrained network?

Reproducing the experiments on Cityscapes

We cannot reproduce the performance on Cityscapes. What are your training details? We get a final mIoU of 49.73 on the 11-1 task.
We set the initial learning rate to 0.01 and run 60 epochs, then set the learning rate for later steps to 0.001 and run 60 epochs.
The batch size is set to 24.

Cityscapes Dataset Preparation

I'm struggling to reproduce results on the Cityscapes dataset, as others have already reported.
I found that in your paper you mention that for the domain-incremental setting on Cityscapes you used images from 21 cities in a particular order. However, after checking the dataset downloaded from www.cityscapes-dataset.com, I can only see 18 cities in the train directory.

Now my questions are as follows:

  1. Where did you get the images for the three extra cities, i.e. frankfurt, lindau, and munster?
  2. Could you please share more details on preparing the Cityscapes dataset?

Thanks for your time.

Trying to reproduce cityscapes domain results


To Reproduce

Dataset: City-scapes domain
Setting: 1-1 task
Command used or script used: I cloned the GitHub repository and was able to reproduce the results for the FT (fine-tuning) method using the scripts. When I tried the PLOP method, I get the following error after task 0:
(error traceback screenshot)

How to reproduce using own dataset

I'm using the author's code to train on my own dataset; can you tell me how to generate the training.npy file shown in the image below?
(screenshot of the expected training.npy file)
Or can you explain how to modify the code to train on my own dataset? Thank you!

Asking for fine details about ADE.

Hello,
I wonder if you can provide the performance of your model on each part (1-100, 101-110, 111-120, ...) of ADE 100-10, and (1-50, 51-100, 101-150) of ADE 50-50. Hope you can help me. Thanks a lot.

About visualization on ADE20K

Describe the bug
Hi, I have run the code in the 50-50-50 setting on ADE20k.
I want to know how to visualize results on the validation set. If you have the corresponding code, could you provide it?

Thank you for your work; I look forward to your reply.

To Reproduce

Dataset: ADE20K
Setting: 50-50-50

Can't reproduce ADE20k and don't know the details of Cityscapes

Hi, sorry to disturb you, but I have a few questions:

  1. Maybe you changed some details of the Cityscapes experiments, since it seems that MiB cannot work on Cityscapes given that there is only ground truth and no pseudo-labels. I'd appreciate it if you could tell me exactly which details you changed for these experiments.
  2. I cannot reproduce the ADE20k performance reported in your paper. Having seen that you reproduced your method's Cityscapes performance in another issue, I hope you can reproduce it on ADE20k too and discuss it further with me.
    Wish you good luck.

Extremely hard incremental scenario

Thank you for your great work, PLOP!

With your code, we can reproduce performance almost identical to your paper's.

Additionally, we also conducted experiments on extremely hard incremental scenarios, such as 5-1 (16 steps) and 2-1 (19 steps).

For this, we add the lines below to task.py:

"5-1": {
        0 : [0, 1, 2, 3, 4, 5],
        1 : [6, ],
        2 : [7, ],
        3 : [8, ],
        4 : [9, ],
        5 : [10, ],
        6 : [11, ],
        7 : [12, ],
        8 : [13, ],
        9 : [14, ],
        10: [15, ],
        11: [16, ],
        12: [17, ],
        13: [18, ],
        14: [19, ],
        15: [20, ],
    },   
    "2-1":{
        0 : [0, 1, 2],
        1 : [3, ],
        2 : [4, ],
        3 : [5, ],
        4 : [6, ],
        5 : [7, ],
        6 : [8, ],
        7 : [9, ],
        8 : [10, ],
        9 : [11, ],
        10: [12, ],
        11: [13, ],
        12: [14, ],
        13: [15, ],
        14: [16, ],
        15: [17, ],
        16: [18, ],
        17: [19, ],
        18: [20, ],
    },   

However, during training, the loss diverges to NaN.

I already noticed that someone suffered from the loss divergence issue (#8) on the 15-5 task; however, I can reproduce the performance on the 15-5 task in my environment.
Also, MiB trains well on these extremely hard scenarios without loss divergence, whereas PLOP shows the issue.

Therefore, I wonder whether you also see this issue in the extremely hard scenarios 5-1 and 2-1.
If so, please tell me how I can solve it (e.g., which hyperparameter should be changed).

Thanks.

Environment error

What is your CUDA version? The torch 1.7.1 and torchvision 0.4.0 in the readme do not match, and I keep getting errors when running.

results of voc-15-5

Hi,

I ran your code on voc-15-5 and got:
Final Mean IoU 69.35
Average Mean IoU 74.84
Mean IoU first 75.25
Mean IoU last 50.47

however, the paper states:
75.73 51.71 70.09 75.19

Why are the results different and worse than what the paper states?

ade 100-10 reproduce

I used the script you gave and only got 24.xx final mIoU; I don't know if it is related to a specific seed?

By the way, due to GPU memory limitations, I used 4 GPUs with a batch of 4 per GPU, equal to a total batch size of 24, and the BN is iabn_sync. I thought 4 GPUs should not be an issue?

Any idea how I can reproduce your ADE 100-10 results?

Asking for the data of ilt, mib and plop.

Hello,
I hope this won't disturb you, but I'm wondering if you can provide the per-step data (mIoU) of ILT, MiB and PLOP on 15-5s as well as Cityscapes? It's wonderful to follow your impressive work, but I still have to do some analysis on these data.

The weight file is corrupted

I clicked the link and it says the weight file is damaged. What could be the reason? Could you upload it again? I am working on my graduation project and look forward to your reply. Thank you!
