
hrda's Introduction

HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation

by Lukas Hoyer, Dengxin Dai, and Luc Van Gool

[ECCV22 Paper] [Extension Paper]

🔔 News:

  • [2024-07-03] We are happy to announce that our work SemiVL on semi-supervised semantic segmentation with vision-language guidance was accepted at ECCV24.
  • [2024-07-03] We are happy to announce that our follow-up work DGInStyle on image diffusion for domain-generalizable semantic segmentation was accepted at ECCV24.
  • [2023-09-26] We are happy to announce that our Extension Paper on domain generalization and clear-to-adverse-weather UDA was accepted at PAMI.
  • [2023-08-25] We are happy to announce that our follow-up work EDAPS on panoptic segmentation UDA was accepted at ICCV23.
  • [2023-04-27] We further extend HRDA to domain generalization and clear-to-adverse-weather UDA in the Extension Paper.
  • [2023-02-28] We are happy to announce that our follow-up work MIC on context-enhanced UDA was accepted at CVPR23.
  • [2022-07-05] We are happy to announce that HRDA was accepted at ECCV22.

Overview

Unsupervised domain adaptation (UDA) aims to adapt a model trained on synthetic data to real-world data without requiring expensive annotations of real-world images. As UDA methods for semantic segmentation are usually GPU memory intensive, most previous methods operate only on downscaled images. We question this design as low-resolution predictions often fail to preserve fine details. The alternative of training with random crops of high-resolution images alleviates this problem but falls short in capturing long-range, domain-robust context information.

Therefore, we propose HRDA, a multi-resolution training approach for UDA, that combines the strengths of small high-resolution crops to preserve fine segmentation details and large low-resolution crops to capture long-range context dependencies with a learned scale attention, while maintaining a manageable GPU memory footprint.
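To make the fusion idea concrete, below is a minimal PyTorch sketch of a learned scale-attention blend between an upsampled low-resolution context prediction and a high-resolution detail-crop prediction. The function name, tensor shapes, and crop_box convention are illustrative assumptions for this sketch and not the actual HRDA implementation in this repository.

import torch

def fuse_multi_resolution(context_logits, detail_logits, scale_attention, crop_box):
    # context_logits:  (B, C, H, W) context prediction, upsampled to full resolution.
    # detail_logits:   (B, C, h, w) prediction of the high-resolution detail crop.
    # scale_attention: (B, 1, H, W) learned attention in [0, 1]; values near 1 trust the HR branch.
    # crop_box:        (top, left) position of the detail crop within the full image.
    fused = context_logits.clone()
    top, left = crop_box
    h, w = detail_logits.shape[-2:]
    a = scale_attention[..., top:top + h, left:left + w]
    # Convex combination inside the detail crop; the context prediction is kept elsewhere.
    fused[..., top:top + h, left:left + w] = (
        (1 - a) * fused[..., top:top + h, left:left + w] + a * detail_logits
    )
    return fused

# Example with random tensors (shapes chosen arbitrarily for illustration):
out = fuse_multi_resolution(
    torch.randn(1, 19, 128, 128), torch.randn(1, 19, 64, 64),
    torch.sigmoid(torch.randn(1, 1, 128, 128)), crop_box=(32, 32))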

HRDA Overview

HRDA enables adapting small objects and preserving fine segmentation details. It significantly improves the state-of-the-art performance by 5.5 mIoU for GTA→Cityscapes and by 4.9 mIoU for Synthia→Cityscapes, resulting in an unprecedented performance of 73.8 and 65.8 mIoU, respectively.

UDA over time

The more detailed domain-adaptive semantic segmentation of HRDA, compared to the previous state-of-the-art UDA method DAFormer, can also be observed in example predictions from the Cityscapes validation set.

Demo

HRDA.Slider.Demo.mp4

Color Palette

HRDA can be further extended to domain generalization, lifting the requirement of access to target images. In domain generalization as well, HRDA significantly improves the state-of-the-art performance, by +4.2 mIoU.

For more information on HRDA, please check our [ECCV Paper] and the [Extension Paper].

If you find HRDA useful in your research, please consider citing:

@InProceedings{hoyer2022hrda,
  title={{HRDA}: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation},
  author={Hoyer, Lukas and Dai, Dengxin and Van Gool, Luc},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={372--391},
  year={2022}
}

@Article{hoyer2024domain,
  title={Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation},
  author={Hoyer, Lukas and Dai, Dengxin and Van Gool, Luc},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)}, 
  year={2024},
  volume={46},
  number={1},
  pages={220-235},
  doi={10.1109/TPAMI.2023.3320613}
}

Comparison with SOTA UDA

HRDA significantly outperforms previous works on several UDA benchmarks. This includes synthetic-to-real adaptation on GTA→Cityscapes and Synthia→Cityscapes as well as clear-to-adverse-weather adaptation on Cityscapes→ACDC and Cityscapes→DarkZurich.

Method               GTA→CS(val)   Synthia→CS(val)   CS→ACDC(test)   CS→DarkZurich(test)
ADVENT [1]           45.5          41.2              32.7            29.7
BDL [2]              48.5          --                37.7            30.8
FDA [3]              50.5          --                45.7            --
DACS [4]             52.1          48.3              --              --
ProDA [5]            57.5          55.5              --              --
MGCDA [6]            --            --                48.7            42.5
DANNet [7]           --            --                50.0            45.2
DAFormer (Ours) [8]  68.3          60.9              55.4*           53.8*
HRDA (Ours)          73.8          65.8              68.0*           55.9*

* New results of our extension paper

References:

  1. Vu et al. "Advent: Adversarial entropy minimization for domain adaptation in semantic segmentation" in CVPR 2019.
  2. Li et al. "Bidirectional learning for domain adaptation of semantic segmentation" in CVPR 2019.
  3. Yang et al. "Fda: Fourier domain adaptation for semantic segmentation" in CVPR 2020.
  4. Tranheden et al. "DACS: Domain adaptation via cross-domain mixed sampling" in WACV 2021.
  5. Zhang et al. "Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation" in CVPR 2021.
  6. Sakaridis et al. "Map-guided curriculum domain adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation" in TPAMI, 2020.
  7. Wu et al. "DANNet: A one-stage domain adaptation network for unsupervised nighttime semantic segmentation" in CVPR, 2021.
  8. Hoyer et al. "DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation" in CVPR, 2022.

Comparison with SOTA Domain Generalization (DG)

HRDA and DAFormer significantly outperform previous works on domain generalization from GTA to real street scenes.

DG Method            Cityscapes   BDD100K   Mapillary   Avg.
IBN-Net [1,5]        37.37        34.21     36.81       36.13
DRPC [2]             42.53        38.72     38.05       39.77
ISW [3,5]            37.20        33.36     35.57       35.38
SAN-SAW [4]          45.33        41.18     40.77       42.43
SHADE [5]            46.66        43.66     45.50       45.27
DAFormer (Ours) [6]  52.65*       47.89*    54.66*      51.73*
HRDA (Ours)          57.41*       49.11*    61.16*      55.90*

* New results of our extension paper

References:

  1. Pan et al. "Two at once: Enhancing learning and generalization capacities via IBN-Net" in ECCV, 2018.
  2. Yue et al. "Domain randomization and pyramid consistency: Simulation-to-real generalization without accessing target domain data" ICCV, 2019.
  3. Choi et al. "RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening" in CVPR, 2021.
  4. Peng et al. "Semantic-aware domain generalized segmentation" in CVPR, 2022.
  5. Zhao et al. "Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation" in ECCV, 2022.
  6. Hoyer et al. "DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation" in CVPR, 2022.

Setup Environment

For this project, we used Python 3.8.5. We recommend setting up a new virtual environment:

python -m venv ~/venv/hrda
source ~/venv/hrda/bin/activate

In that environment, the requirements can be installed with:

pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.3.7  # requires the other packages to be installed first
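
After installation, a quick sanity check such as the following (an illustrative snippet, not part of the repository) can confirm that PyTorch sees the GPU and that the pinned mmcv-full version was picked up:

import torch
import mmcv

print('PyTorch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
print('mmcv-full:', mmcv.__version__)  # expected: 1.3.7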

Please, download the MiT-B5 ImageNet weights provided by SegFormer from their OneDrive and put them in the folder pretrained/. Further, download the checkpoint of HRDA on GTA→Cityscapes and extract it to the folder work_dirs/.

Setup Datasets

Cityscapes: Please, download leftImg8bit_trainvaltest.zip and gt_trainvaltest.zip from here and extract them to data/cityscapes.

GTA: Please, download all image and label packages from here and extract them to data/gta.

Synthia (Optional): Please, download SYNTHIA-RAND-CITYSCAPES from here and extract it to data/synthia.

ACDC (Optional): Please, download rgb_anon_trainvaltest.zip and gt_trainval.zip from here and extract them to data/acdc. Further, please restructure the folders from condition/split/sequence/ to split/ using the following commands:

rsync -a data/acdc/rgb_anon/*/train/*/* data/acdc/rgb_anon/train/
rsync -a data/acdc/rgb_anon/*/val/*/* data/acdc/rgb_anon/val/
rsync -a data/acdc/gt/*/train/*/*_labelTrainIds.png data/acdc/gt/train/
rsync -a data/acdc/gt/*/val/*/*_labelTrainIds.png data/acdc/gt/val/
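
Afterwards, a small check like the following (an illustrative snippet, not part of the repository) can verify that the restructured split folders contain the expected images and labelTrainIds masks:

from pathlib import Path

for split in ('train', 'val'):
    n_img = len(list(Path(f'data/acdc/rgb_anon/{split}').glob('*.png')))
    n_gt = len(list(Path(f'data/acdc/gt/{split}').glob('*_labelTrainIds.png')))
    print(f'{split}: {n_img} images, {n_gt} labelTrainIds masks')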

Dark Zurich (Optional): Please, download the Dark_Zurich_train_anon.zip and Dark_Zurich_val_anon.zip from here and extract them to data/dark_zurich.

BDD100K (Optional): Please, download the 10K Images and Segmentation from here and extract it to data/bdd100k.

Mapillary (Optional): Please, download the mapillary-vistas-dataset_public_v1.2.zip from here and extract it to data/mapillary.

The final folder structure should look like this:

HRDA
├── ...
├── data
│   ├── acdc (optional)
│   │   ├── gt
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── rgb_anon
│   │   │   ├── train
│   │   │   ├── val
│   ├── bdd100k (optional)
│   │   ├── images/10k/val
│   │   ├── labels/sem_seg/masks/val
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── dark_zurich (optional)
│   │   ├── gt
│   │   │   ├── val
│   │   ├── rgb_anon
│   │   │   ├── train
│   │   │   ├── val
│   ├── gta
│   │   ├── images
│   │   ├── labels
│   ├── mapillary (optional)
│   │   ├── validation/images
│   │   ├── validation/labels
│   ├── synthia (optional)
│   │   ├── RGB
│   │   ├── GT
│   │   │   ├── LABELS
├── ...
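
Before running the preprocessing scripts below, a quick layout check such as the following (an illustrative snippet, not part of the repository; the optional datasets may legitimately be missing) can catch misplaced folders early:

from pathlib import Path

expected = [
    'data/cityscapes/leftImg8bit/train',
    'data/cityscapes/gtFine/train',
    'data/gta/images',
    'data/gta/labels',
    'data/synthia/RGB',                 # optional
    'data/acdc/rgb_anon/train',         # optional
    'data/dark_zurich/rgb_anon/train',  # optional
]
for path in expected:
    print(f"{path}: {'found' if Path(path).is_dir() else 'missing'}")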

Data Preprocessing: Finally, please run the following scripts to convert the label IDs to the train IDs and to generate the class index for RCS:

python tools/convert_datasets/gta.py data/gta --nproc 8
python tools/convert_datasets/cityscapes.py data/cityscapes --nproc 8
python tools/convert_datasets/synthia.py data/synthia/ --nproc 8
python tools/convert_datasets/mapillary.py data/mapillary/ --nproc 8

Testing & Predictions

The provided HRDA checkpoint trained on GTA→Cityscapes can be tested on the Cityscapes validation set using:

sh test.sh work_dirs/gtaHR2csHR_hrda_246ef

The predictions are saved for inspection to work_dirs/gtaHR2csHR_hrda_246ef/preds and the mIoU of the model is printed to the console. The provided checkpoint should achieve 73.79 mIoU. Refer to the end of work_dirs/gtaHR2csHR_hrda_246ef/20220215_002056.log for more information such as the class-wise IoU.

If you want to visualize the LR predictions, HR predictions, or scale attentions of HRDA on the validation set, please refer to test.sh for further instructions.

Training

For convenience, we provide an annotated config file of the final HRDA. A training job can be launched using:

python run_experiments.py --config configs/hrda/gtaHR2csHR_hrda.py

The logs and checkpoints are stored in work_dirs/.

For the other experiments in our paper, we use a script to automatically generate and train the configs:

python run_experiments.py --exp <ID>

More information about the available experiments and their assigned IDs can be found in experiments.py. The generated configs will be stored in configs/generated/.

When evaluating a model trained on Synthia→Cityscapes, please note that the evaluation script calculates the mIoU for all 19 Cityscapes classes. However, Synthia contains only labels for 16 of these classes. Therefore, it is a common practice in UDA to report the mIoU for Synthia→Cityscapes only on these 16 classes. As the IoU for the 3 missing classes is 0, you can do the conversion mIoU16 = mIoU19 * 19 / 16.
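
For example, the following small sketch applies this conversion; the 55.41 input is just an illustrative 19-class log value that corresponds to the 65.8 mIoU16 reported for HRDA on Synthia→Cityscapes:

def miou19_to_miou16(miou19):
    # The 3 classes missing in Synthia contribute an IoU of 0 to the 19-class mean,
    # so rescaling by 19/16 recovers the mean over the 16 valid classes.
    return miou19 * 19 / 16

print(round(miou19_to_miou16(55.41), 2))  # 65.8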

The results for Cityscapes→ACDC and Cityscapes→DarkZurich are reported on the test split of the target dataset. To generate the predictions for the test set, please run:

python -m tools.test path/to/config_file path/to/checkpoint_file --test-set --format-only --eval-option imgfile_prefix=labelTrainIds to_label_id=False

The predictions can be submitted to the public evaluation server of the respective dataset to obtain the test score.

Domain Generalization

HRDA/DAFormer for domain generalization (DG) is located on the DG branch, which can be checked out with:

git checkout dg

They can be trained for DG using:

python run_experiments.py --exp 50

For further details, please refer to experiments.py. The model is directly evaluated on Cityscapes during training with GTA data only. It can be additionally evaluated on BDD100K and Mapillary with tools/test.py:

python -m tools.test path/to/config_file path/to/checkpoint_file --eval mIoU --dataset BDD100K
python -m tools.test path/to/config_file path/to/checkpoint_file --eval mIoU --dataset Mapillary --eval-option efficient_test=True

Checkpoints

Below, we provide checkpoints of HRDA for different benchmarks. They come together with the log files of their training. As the results in the paper are provided as the mean over three random seeds, we provide the checkpoint with the median validation performance here.

The checkpoints come with the training logs. Please note that:

  • The logs provide the mIoU for 19 classes. For Synthia→Cityscapes, it is necessary to convert the mIoU to the 16 valid classes. Please, read the section above for converting the mIoU.
  • The logs provide the mIoU on the validation set. For Cityscapes→ACDC and Cityscapes→DarkZurich the results reported in the paper are calculated on the test split. For DarkZurich, the performance significantly differs between validation and test split. Please, read the section above on how to obtain the test mIoU.
  • The logs for domain generalization (DG) provide the validation performance on Cityscapes. Please, refer to the section above to evaluate the checkpoint on BDD100K and Mapillary.

Framework Structure

This project is based on mmsegmentation version 0.16.0. For more information about the framework structure and the config system, please refer to the mmsegmentation documentation and the mmcv documentation.

The most relevant files for HRDA are:

Acknowledgements

HRDA is based on the following open-source projects. We thank their authors for making the source code publicly available.

License

This project is released under the Apache License 2.0, while some specific features in this repository are covered by other licenses. Please refer to LICENSES.md and check carefully if you are using our code for commercial purposes.

hrda's People

Contributors

lhoyer


hrda's Issues

Unknown CUDA arch or GPU not supported

First of all, thank you for your work and your results. It's great. I encountered this problem when installing the environment:

                raise ValueError("Unknown CUDA arch ({}) or GPU not supported".format(arch))
  ValueError: Unknown CUDA arch (8.9) or GPU not supported
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for mmcv-full
Running setup.py clean for mmcv-full
Failed to build mmcv-full
ERROR: Could not build wheels for mmcv-full, which is required to install pyproject.toml-based projects

This is the problem I encountered when installing mmcv-full.
My device is an RTX 4090. I guess it is because the CUDA architecture version of the 4090 is not supported.
Excuse me, do you know how to solve this problem?

Code to make the heatmap

Hi, I'm enjoying your excellent work! Could you please provide the code for the heatmap of Figure. 5? Thanks a lot!

Great work! Discussion about oracle performance of GTA to Cityscapes.

Thank you for your great work!

As the paper shows, the performance of GTA to Cityscapes can reach 73.8 mIoU, which is absolutely outstanding. As HRDA is based on the DAFormer structure, I checked DAFormer's oracle performance on Cityscapes, which is 77.6 mIoU. As we can see, 73.8 is just slightly lower than 77.6. Therefore, does it mean that there is not much room left to improve GTA to Cityscapes unless we improve the oracle performance?

DDP train

Hello, lhoyer:
When I use ddpwrapper to train your model, the final result only achieves 71.57 mIoU. Why does the performance drop? Can you give me some suggestions?

something about the scale attention

Hi, I have some questions about scale attention.

  1. About the scale attention decoder, there seems to be some difference between the paper and the released code:
    a SegFormer decoder in the paper, but a DAFormer decoder in the code. Will there be any difference in performance?
  2. In addition, can scale attention be understood as adding an additional segmentation head that processes the context crop and produces the result for the detail crop corresponding to the context crop?
    In the second paragraph on page 8, the scale attention decoder... Is there something wrong with the scale attention? Should it be f^A(f^E(x_c))?

Some issue

XIO: fatal IO error 25 (Inappropriate ioctl for device) on X server "localhost:12.0"
after 387 requests (387 known processed) with 4 events remaining.

Question about the inference phase

I have calculated the necessary parameters in the inference phase. Is my calculation correct?

Case1.
If not using sliding window LR context crop, config is as follows:
[1]test_cfg=dict(mode='whole'))
If not using sliding window HR detail crop, config is as follows:
[2]hr_slide_inference=False,

  1. The encoder is forwarded only once per image.
  2. According to [1], the LR context crop image (the whole image) is forwarded to the decoder once.
  3. According to [2], the HR detail crop image (the whole image) is forwarded to the decoder once.

Q1. If the model has 80M parameters (encoder: 60M, decoder: 20M), the parameters required to forward one image are 100M (60M + 20M (LR crop) + 20M (HR crop)). Is that right?

Case2.
If not using sliding window LR context crop, config is as follows:
[1]test_cfg=dict(mode='whole'))
If using sliding window HR detail crop, config is as follows:
[2]hr_slide_inference=True,

  1. The encoder is forwarded only once per image.
  2. According to [1], the LR context crop image (the whole image) is forwarded to the decoder once.
  3. According to [2], the HR detail crop image is forwarded to the decoder as many times as the number of sliding-window crops (N).

Q2. If the model has 80M parameters (encoder: 60M, decoder: 20M), the parameters required to forward one image are 60M + 20M (LR crop) + 20M x N (HR crop). Is that right?

Configs for AdvSeg and MinEnt

Dear Lukas:
Thank you for your wonderful work and excellent code. Can you provide a configuration file that can be run directly using minent.py and advseg.py? Thank you very much.

Test Pipeline Config for Mapillary

Hello,
thanks for open-sourcing your work. I would like to run the test script with the Mapillary dataset, but I am not sure which img_scale to use for the test pipeline, as the Mapillary dataset has various image resolutions. More specifically, in the usual configs, we have test_pipelines such as the one below, but I am not sure what value to give for img_scale. Could you maybe share the test_pipeline you used for Mapillary?

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1920, 1080),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]

Thanks and best regards,

Training accuracy problem

Hello, I retrained HRDA on a single V100 according to the config file you provided, but the mIoU was about ten points lower. I compared it with the log file generated by your training and found some differences.
Your log file:
exp = 10
name_dataset = 'gtaHR22cityscapesCFixHR_1024x1024'
name_architecture = 'mres40-0.1_daformer_sepaspp_sl3_mitb5'
name_decoder = 'mres40-0.1_daformer_sepaspp_sl3'
name_uda = 'dacs_a999_fdthings_rcs0.01-2.0_cpl7'

The log file generated by my retraining:
exp = 'basic'
name_dataset = 'gtaHR2cityscapesHR_1024x1024'
name_architecture = 'hrda1-512-0.1_daformer_sepaspp_sl_mitb5'
name_decoder = 'hrda1-512-0.1_daformer_sepaspp_sl'
name_uda = 'dacs_a999_fdthings_rcs0.01-2.0_cpl2'
Is the poor training accuracy caused by these differences? Or is it something else?
Attached is the complete log file generated by my training. I hope to get your reply; it is really important to me! Thank you very much!
20240227_115056.log

The performance of DAFormer in this repo.

I tried training the DAFormer configuration and got 66.1 mIoU, slightly lower than the reported DAFormer result. Is the DAFormer in this repository consistent with the original author's code? Or could it be due to a different CUDA version? I used CUDA 10.2 due to my graphics driver version.

Questions on Sliding Window Inference

In the appendix, 'whole' [2] does not use sliding window inference.
So, what does [1] mean?

[1] hr_slide_inference = False
[2] test_cfg=dict(mode='whole')

ACDC Accuracy

Hi,

Nice work! I tried to reproduce the result on Cityscapes→ACDC and ran your code with the given config script (i.e. uda_cityscapesHR_to_darkzurichHR_1024x1024.py), but I got a test set accuracy of just 60.97% rather than the 68% reported in the README. Did I miss any detail? Thanks in advance for your clarification!

Open source license

Dear Lukas,

are you planning to add a license to this/DAformer repo? Otherwise the code can unfortunately not be used.

Questions about model forward

In Figure 2(b), the HR detail crops are input to the encoder. After that, the output of the encoder is input to the decoder.
My question is: how many forward passes are required to infer a single image?

CUDA out of memory upon the start of validation

Does anyone have the issue of CUDA OOM when the validation starts?

[                                                  ] 0/500, elapsed: 0s, ETA:Traceback (most recent call last):
  File "run_experiments.py", line 116, in <module>
    train.main([config_files[i]])
  File "/home/ubuntu/Zheng/Softwares/HRDA/tools/train.py", line 168, in main
    train_segmentor(
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/apis/train.py", line 131, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 131, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 66, in train
    self.call_hook('after_train_iter')
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/mmcv/runner/hooks/evaluation.py", line 172, in after_train_iter
    self._do_evaluate(runner)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/core/evaluation/eval_hooks.py", line 36, in _do_evaluate
    results = single_gpu_test(
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/apis/test.py", line 67, in single_gpu_test
    result = model(return_loss=False, **data)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 42, in forward
    return super().forward(*inputs, **kwargs)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/segmentors/base.py", line 112, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/segmentors/base.py", line 94, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/uda/uda_decorator.py", line 95, in simple_test
    return self.get_model().simple_test(img, img_meta, rescale)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/segmentors/encoder_decoder.py", line 385, in simple_test
    seg_logit = self.inference(img, img_meta, rescale)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/segmentors/encoder_decoder.py", line 362, in inference
    seg_logit = self.slide_inference(img, img_meta, rescale)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/segmentors/encoder_decoder.py", line 280, in slide_inference
    crop_seg_logits = self.encode_decode(crop_imgs, img_meta)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/segmentors/hrda_encoder_decoder.py", line 190, in encode_decode
    out = self._decode_head_forward_test(mres_feats, img_metas)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/segmentors/encoder_decoder.py", line 173, in _decode_head_forward_test
    seg_logits = self.decode_head.forward_test(x, img_metas, self.test_cfg)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/decode_heads/hrda_head.py", line 361, in forward_test
    test_results = self.forward(inputs)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/decode_heads/hrda_head.py", line 277, in forward
    hr_seg = self.decode_hr(hr_inp, batch_size)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/decode_heads/hrda_head.py", line 150, in decode_hr
    crop_seg_logits = self.head(features)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/decode_heads/daformer_head.py", line 227, in forward
    x = self.fuse_layer(x)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/decode_heads/daformer_head.py", line 76, in forward
    aspp_outs.extend(self.aspp_modules(x))
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/Zheng/Softwares/HRDA/mmseg/models/decode_heads/aspp_head.py", line 49, in forward
    aspp_outs.append(aspp_module(x))
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/mmcv/cnn/bricks/depthwise_separable_conv_module.py", line 93, in forward
    x = self.depthwise_conv(x)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/mmcv/cnn/bricks/conv_module.py", line 200, in forward
    x = self.norm(x)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 131, in forward
    return F.batch_norm(
  File "/home/ubuntu/.conda/envs/new_da/lib/python3.8/site-packages/torch/nn/functional.py", line 2056, in batch_norm
    return torch.batch_norm(
RuntimeError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 14.56 GiB total capacity; 7.89 GiB already allocated; 1.01 GiB free; 12.49 GiB reserved in total by PyTorch)

Using mixed precision during the training process.

I used a single RTX 3090 to run the code, but I got a CUDA out-of-memory error. Therefore, I want to run the code using mixed precision. Where should I modify the code to use mixed precision during the training process? Thank you very much.

Last question. I look forward to your response.

@lhoyer

In the paper, Figure 2(b) includes "Reassemble".

What I mean by forward pass is whether or not Reassemble is performed only once.
That is, parts of the image should not be inferred multiple times.

In Case 1, does Reassemble work once?

batch_size

May I ask, where should I adjust batch_size?

Mixed Precision Training

Hello and thank you for your excellent work! I would like to ask how to use mixed precision training?

Transform to MRI

Hi!
Can this model be used for tumor segmentation? If the answer is yes, what should be modified to fit this task?

Questions on Feature Distance

Dear Lukas,

I am interested in your recent great work HRDA and thanks for sharing your code. During reading it I have some questions about the module of feature distance. [HRDA/mmseg/models/uda/dacs.py]

(screenshot of the feature distance code in dacs.py)

From the screenshot, it can be seen that features from multiple input scales are used only when feature_scale is in feature_scale_all_strs. However, according to your provided config file, feature_scale = 0.5 while feature_scale_all_strs = ['all'], so this branch will never be executed.

So are the features from multiple input scales not used during the training process?

Software to make the demo

Hi, thanks for the nice work; the demo is quite impressive. I am wondering whether you could share how you made the demo? Any response is appreciated.

CUDA out of memory: how to change the GPU? I want to specify a GPU device

RuntimeError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 23.70 GiB total capacity; 1.33 GiB already allocated; 5.00 MiB free; 1.40 GiB reserved in total by PyTorch)

When I run DAFormer, it's OK.
But when I run HRDA, it runs out of CUDA memory.
I want to change from GPU 0 to GPU 1,
but I don't know how to change it.
Usually, I specify the GPU in code:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = '1'

But it doesn't work in this project.
I found this code in your gtaHR2csHR_hrda.py:

n_gpus = 1
gpu_model = 'NVIDIATITANRTX'

Should gpu_model be changed?
My GPUs are two 3090s.
So I want to know how to change the GPU in this code (the default is GPU 0),
or how to change the configs to make the code run successfully.

Maybe GPU 1 is used: I specify GPU 1, but in PyTorch the index of GPU 1 becomes GPU 0, and then this problem occurs.

So, I want to know whether a 3090 can run this code, or how to change the configs to make it run.

Unfair comparison

In Figure 1 (c), you compare your method to ProDA and SAC, but your method is not based on DeepLabV2. Is this really a meaningful, fair comparison?

In Table 2, we can see that the mIoU based on DeepLabV2 only reaches 59.4, which is rather ordinary.

Question about the ACDC validation dataset

Thank you for your work and your results. It's great!
I am trying to replicate your work on Cityscapes→ACDC, but I am not sure how to obtain predictions for the ACDC validation set. Can you provide semantic segmentation images for the ACDC validation set, or tell me how to obtain them?
When I use this command
sh test.sh work_dirs/csHR2acdcHR_hrda_d3a68
it reports the following error:
bad substitution
Best regards!

Operating System: Windows 10 (error installing mmcv-full)

Hi :)

I have some errors when setting up the environment.

I already installed the CUDA toolkit 11.0.3 and torch 1.7.0+cu110.

But I get the same error as in issue #35.

My GPU is an RTX 4080 and my OS is Windows 10.

Can't mmcv-full 1.3.7 be installed on Windows 10?

The reason I think so is that there are only "...-manylinux1..." wheels on https://download.openmmlab.com/mmcv/dist/cu110/torch1.7/index.html.

Thank you.

Assertion length error about samples_with_class during training

Hello, I followed the README instructions trying to use gta data for training.

I can make predictions for cityscapes images with the pre-trained model gtaHR2csHR_hrda_246ef.

But when I tried to train the model with the command below (not using the full GTA dataset, only 4999 of its images),
python run_experiments.py --config configs/hrda/gtaHR2csHR_hrda.py
I got an assertion length error about samples_with_class.

2024-01-11 11:21:01,827 - mmseg - INFO - Loaded 4999 images from data/gta/images
2024-01-11 11:21:01,910 - mmseg - INFO - Loaded 2975 images from data/cityscapes/leftImg8bit/train
2024-01-11 11:21:01,911 - mmseg - INFO - RCS Classes: [17, 7, 6, 4, 9, 5, 13, 14, 3, 11, 8, 1, 10, 2, 0]
2024-01-11 11:21:01,911 - mmseg - INFO - RCS ClassProb: [1.6132097e-01 1.5626104e-01 1.5324336e-01 1.3592052e-01 1.0360345e-01
 6.5830752e-02 5.8957204e-02 5.6184866e-02 4.4540595e-02 4.0169138e-02
 2.2608098e-02 1.3449171e-03 1.5118295e-05 1.4972226e-13 3.3338909e-23]
Traceback (most recent call last):
  File "run_experiments.py", line 120, in <module>
    train.main([config_files[i]])
  File "d:\HRDA\tools\train.py", line 151, in main
    datasets = [build_dataset(cfg.data.train)]
  File "d:\HRDA\mmseg\datasets\builder.py", line 73, in build_dataset
    dataset = UDADataset(
  File "d:\HRDA\mmseg\datasets\uda_dataset.py", line 100, in __init__
    assert len(self.samples_with_class[c]) > 0
AssertionError

I wonder if you could give me some advice. The environment info is as below. Many thanks.

2024-01-11 11:20:59,637 - mmseg - INFO - Environment info:
------------------------------------------------------------
sys.platform: win32
Python: 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:57:54) [MSC v.1924 64 bit (AMD64)]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 4070 Laptop GPU
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0
NVCC: Not Available
GCC: n/a
PyTorch: 1.7.1+cu110
PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 192729112
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191125 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 2019
  - CPU capability usage: AVX2
  - CUDA Runtime 11.0
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.4
  - Magma 2.5.4
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -DUSE_FBGEMM -DUSE_VULKAN_WRAPPER, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, 

TorchVision: 0.8.2+cu110
OpenCV: 4.4.0
MMCV: 1.3.7
MMCV Compiler: n/a
MMCV CUDA Compiler: n/a
MMSegmentation: 0.16.0+504516b

About the Domain adaptive understanding of the paper

Thanks for sharing the code. I have some confusion after reading the paper, as follows:
1. The paper uses context crops, detail crops, and weighted attention to improve segmentation accuracy, but how is this made domain adaptive? Only self-training as in DAFormer, or something else?
2. Does inference with the overlapping sliding window use random crops or crops taken one by one?
Thank you very much

Binary instance segmentation masks and domain generalization with unlabeled real data

Thank you very much for sharing the code, excellent work.
I'm trying to train the framework using synthetic data for domain generalization with unlabeled real data. Is there any documentation regarding this?
When using my own data (synthetic and real), both with a binary segmentation mask (grayscale), I am having problems with the IoU and Acc metrics, which have the value "NaN"; the only class with values (above 90%, in fact) is background. Any post-processing recommendations for the labels?
Thank you so much, this is an amazing work!

Training with custom data

Thanks for sharing your code!
I have been using DAFormer successfully, training with synthetic images, and now I am looking forward to incorporating unlabeled real data to improve domain generalization.
Would you have a comprehensive guide to the files that need to be changed in order to train your model with custom data?
Thank you so much, this is an amazing work!

Regarding code usability on Colab

Hello,
I am highly interested in applications of domain adaptation in semantic segmentation, and I came across your work.
I have some questions regarding the requirements.
My current Colab environment is as follows:

Torch: 1.11.0+cu113 (CUDA available: True)
CUDA: 11.1
Compiler: GCC 7.3
MMCV: 1.5.0
MMSegmentation: 0.24.1

Since my torch, CUDA, mmcv, and mmsegmentation versions are higher than the ones recommended in the README and requirements.txt, should I downgrade them, or is there no need for any downgrading?

About Reproducing the results shown in the Paper

Dear Lukas,

I am interested in your recent great work HRDA and thanks for sharing your code. I would like to run the code you provided and reproduce the results.

I followed the settings in the "experiments.py" file, but I found that the results I got do not match those provided in the paper. Should I change some of the default settings to reach the reported results?

The attached image shows the experiment data I recorded for Table 1 in the paper. I ran it on an RTX 6000 GPU.
(attached image: my results corresponding to Table 1)

(The mIoUs for GTA5→Cityscapes are 61, 66.92, and 63.31 for 3 random seeds, and for Synthia→Cityscapes they are 55.76, 55.17, and 56.14 for 3 random seeds.)

I also copied my environment information below:

sys.platform: linux
Python: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
CUDA available: True
GPU 0: Quadro RTX 6000
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.2.r11.2/compiler.29618528_0
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.7.1+cu110
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.0
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.8.2+cu110
OpenCV: 4.4.0
MMCV: 1.3.7
MMCV Compiler: GCC 9.4
MMCV CUDA Compiler: 11.2
MMSegmentation: 0.16.0+a57d967

Thank you so much!

Question about the inference phase #21

@lhoyer

Appreciate your kindness.
You're right. What I want to know is the forward pass.

Can you cross check if the calculation is correct?

Case1.
If not using the sliding window LR context crop, the config is as follows:
[1] test_cfg=dict(mode='whole'))
If not using the sliding window HR detail crop, the config is as follows:
[2] hr_slide_inference=False,

In Case 1, is the total number of forwarded parameters 160M?
*160M = 60M (Encoder, LR crop) + 20M (Decoder, LR crop) + 60M (Encoder, HR crop) + 20M (Decoder, HR crop)

Because, if hr_slide_inference and sliding-window inference are not used, the whole image is used for both the LR context crop and the HR detail crop.

Also, am I wrong to think that HRDA is a resolution-based ensemble method?

Impressive work, but still some issues

Hi, Dr. Hoyer. Thanks for your contribution to the community. This is indeed nice work, which inspired me a lot. After reading the paper, may I summarize the core idea as combining multiple resolutions to adapt both context and fine-grained features? However, did you ever try to directly train on HR inputs via DAFormer? Moreover, the selection of HR regions is random. Have you ever considered selecting them according to some criterion, since certain feature distributions are correlated with spatial location?
