
lawin's Introduction

[ICLR2024] Multi-Scale Representations by Varying Window Attention for Semantic Segmentation

🔥🔥🔥 ICLR2024 Poster 🔥🔥🔥

arXiv

HuggingFace🤗

Running VW

VW-Swin/ConvNeXt

VW-MaskFormer

VW-Mask2Former

Citing VW

@inproceedings{yan2023multi,
  title={Multi-Scale Representations by Varying Window Attention for Semantic Segmentation},
  author={Yan, Haotian and Wu, Ming and Zhang, Chuang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}

lawin's People

Contributors

fatlime, yan-hao-tian


lawin's Issues

About results on Cityscapes

Hi!

Could you please point out which subset of Cityscapes the reported results belong to: the validation set or the test set?

Thanks

The effect of skip connection from Transition Block 1

"The output of LawinASPP is upsampled to the size of a quarter of input image, then fused with the first-level feature by a linear layer".
I see this gives a 0.8% performance gain in Table 6. The only explanation you gave is "which manifests the importance of low-level information" in Section 4.3.3. Is that all, or are there any further intuitions or proofs?
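
For concreteness, a minimal sketch of what that fusion step might look like (this is not the repository's actual lawin_head.py; the module name, the channel widths 64/48/512, and the bilinear upsampling are assumptions based on the quoted sentence and the DeepLabV3+-style decoder it resembles):

import torch
import torch.nn as nn
import torch.nn.functional as F

class LowLevelFusion(nn.Module):
    # Hypothetical sketch: upsample the LawinASPP output to 1/4 input
    # resolution and fuse it with the projected first-level feature.
    def __init__(self, low_channels=64, low_proj_channels=48, high_channels=512):
        super().__init__()
        # a 1x1 conv acts as the "linear layer" on the low-level feature
        self.low_proj = nn.Conv2d(low_channels, low_proj_channels, kernel_size=1)
        self.fuse = nn.Conv2d(high_channels + low_proj_channels, high_channels, kernel_size=1)

    def forward(self, aspp_out, low_level_feat):
        # low_level_feat is the stride-4 (quarter-resolution) encoder feature
        aspp_up = F.interpolate(aspp_out, size=low_level_feat.shape[2:],
                                mode='bilinear', align_corners=False)
        return self.fuse(torch.cat([aspp_up, self.low_proj(low_level_feat)], dim=1))

One common intuition for the gain: the quarter-resolution feature still carries edges and fine spatial detail that the heavily downsampled LawinASPP output has lost, so the fusion mainly sharpens mask boundaries.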

NaN loss

Hi, I used the Lawin decoder with MiT-B2 as the encoder to train on Cityscapes, but got a NaN loss after about 16,000 iterations. Could you please help me check the decoder code again?
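
Two generic first steps for chasing a NaN loss in mmcv/mmseg-style training, offered as assumptions rather than a diagnosis of this repo's decoder:

# Config fragment (sketch): clip gradients so one exploding update
# cannot poison the weights.
optimizer_config = dict(grad_clip=dict(max_norm=1.0, norm_type=2))

# If training with mixed precision, fp16 overflow often surfaces as NaN;
# a larger fixed loss scale (or a dynamic one on newer mmcv) can help:
# fp16 = dict(loss_scale=512.)

To locate the first operation that produces a NaN, a few debug iterations with torch.autograd.set_detect_anomaly(True) enabled (very slow, debugging only) usually pinpoint the offending layer.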

ImageNet training code

Hi! Thanks for your excellent work. Do you plan to release the ImageNet pre-training code? ^ _ ^

FULL CODE

Congratulations on such an excellent job! I'm looking forward to the release of the full code. Thanks in advance!

code

Looking forward to your release

Batch problem in the training process

During training, we encountered the following problem:
/home/buaa/anaconda3/envs/vit/bin/python3.6 /snap/pycharm-community/302/plugins/python-ce/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client 127.0.0.1 --port 45947 --file /home/buaa/songyue/lawin-master/tools/train.py
Connected to pydev debugger (build 222.4345.23)
fatal: not a git repository (or any of the parent directories): .git
2022-10-21 17:23:32,633 - mmseg - INFO - Environment info:

sys.platform: linux
Python: 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
CUDA available: True
GPU 0,1: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.3.r11.3/compiler.29920130_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.1+cu111
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
    TorchVision: 0.9.1+cu111
    OpenCV: 4.6.0
    MMCV: 1.2.7
    MMCV Compiler: GCC 7.5
    MMCV CUDA Compiler: 11.3
    MMSegmentation: 0.11.0+

2022-10-21 17:23:32,633 - mmseg - INFO - Distributed training: True
2022-10-21 17:23:33,165 - mmseg - INFO - Config:
norm_cfg = dict(type='SyncBN', requires_grad=True)
find_unused_parameters = True
................................................................................................................................................................................................................................................
2022-10-21 17:23:34,101 - mmseg - INFO - Loaded 4750 images
fatal: not a git repository (or any of the parent directories): .git
2022-10-21 17:23:36,849 - mmseg - INFO - Loaded 1188 images
2022-10-21 17:23:36,850 - mmseg - INFO - Start running, host: buaa@buaa-System-Product-Name, work_dir: /home/buaa/songyue/lawin-master/workdir
2022-10-21 17:23:36,850 - mmseg - INFO - workflow: [('train', 1)], max: 160000 iters
Traceback (most recent call last):
File "/snap/pycharm-community/302/plugins/python-ce/helpers/pydev/pydevd.py", line 1496, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/snap/pycharm-community/302/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/buaa/songyue/lawin-master/tools/train.py", line 174, in <module>
main()
File "/home/buaa/songyue/lawin-master/tools/train.py", line 170, in main
meta=meta)
File "/home/buaa/songyue/lawin-master/mmseg/apis/train.py", line 115, in train_segmentor
runner.run(data_loaders, cfg.workflow)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/runner/iter_based_runner.py", line 131, in run
iter_runner(iter_loaders[i], **kwargs)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/parallel/distributed.py", line 46, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/buaa/songyue/lawin-master/mmseg/models/segmentors/base.py", line 152, in train_step
losses = self(**data_batch)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
return old_func(*args, **kwargs)
File "/home/buaa/songyue/lawin-master/mmseg/models/segmentors/base.py", line 122, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/buaa/songyue/lawin-master/mmseg/models/segmentors/encoder_decoder.py", line 158, in forward_train
gt_semantic_seg)
File "/home/buaa/songyue/lawin-master/mmseg/models/segmentors/encoder_decoder.py", line 102, in _decode_head_forward_train
self.train_cfg)
File "/home/buaa/songyue/lawin-master/mmseg/models/decode_heads/decode_head.py", line 188, in forward_train
seg_logits = self.forward(inputs)
File "/home/buaa/songyue/lawin-master/mmseg/models/decode_heads/lawin_head.py", line 328, in forward
abc = self.image_pool(_c)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/cnn/bricks/conv_module.py", line 195, in forward
x = self.norm(x)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 539, in forward
bn_training, exponential_average_factor, self.eps)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/functional.py", line 2147, in batch_norm
_verify_batch_size(input.size())
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/functional.py", line 2114, in _verify_batch_size
raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
python-BaseException
Backend TkAgg is interactive backend. Turning interactive mode on.
We found that the problem should be the batch size being 1, but we don't know where to make the change. We suspect there is something wrong with a configuration setting that we aren't aware of. Can you give us some suggestions?
Looking forward to your reply!
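
The traceback itself supports the guess: the decoder's image_pool branch global-average-pools the feature map to 1x1, so the BatchNorm that follows sees exactly one value per channel when the per-GPU batch is 1 and refuses to train. In an mmseg 0.x-style config the usual fix is to raise the per-GPU batch size; a sketch (the surrounding dataset keys depend on your config file):

# Config fragment (sketch): give each GPU at least 2 samples so the BN
# after global pooling has more than one value per channel to normalize.
data = dict(
    samples_per_gpu=2,  # a batch of 1 breaks BN on a (1, 512, 1, 1) input
    workers_per_gpu=2,
)

If GPU memory forces a batch of 1, an alternative is to replace the norm in that branch with one that does not use batch statistics, such as GroupNorm.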

Model weights

Hi, when will the weights of the Lawin model be available?

Train code

Hello, I am really impressed with Lawin Transformer.
I would like to train the model you proposed, but I am having difficulties because the training hyper-parameters are not shared in the paper.
Could you please share the training code?
Thank you.

VW-SegFormer v.s. VW-Mask2Former

Dear Authors,

Thank you for sharing the code for your outstanding research.

I noticed in your paper that the experiments for VW-SegFormer were conducted on the COCO and Cityscapes datasets, while those for VW-Mask2Former were conducted on ADE20K. This makes it challenging for readers to determine which method offers better performance. Could you share any results or insights comparing these two methods on the same dataset?

The results don't need to be rigorously scientific, but any information would be helpful for me to decide which approach to use as a starting point for my work.

Thank you in advance for your assistance!

A little confusion about the linear layer in the decoder

Hi, thank you for sharing such excellent work. One detail makes me curious, though: when the first-level feature of the encoder passes through the linear layer of the decoder, why is the number of output channels set to the specific value of 48 rather than some other size?
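
One plausible piece of context (an observation, not the authors' stated reason): 48 matches the DeepLabV3+ decoder, whose ablations found that projecting the low-level feature to a small width such as 48 channels keeps it from outweighing the semantic feature after concatenation. The layer in question would then be nothing more than a 1x1 convolution:

import torch.nn as nn

# Hypothetical sketch of the projection being asked about: a 1x1 conv
# ("linear layer") reducing the first-level feature to 48 channels. The
# input width of 64 is an assumption (the MiT-B2 stage-1 embedding dim).
low_proj = nn.Conv2d(64, 48, kernel_size=1)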

The performance

Hi, thanks for your great work. I trained Lawin + MiT-B2 for 80k iterations and the final performance is 46.64 mIoU. The training protocol is exactly the same as SegFormer's. Here is the log file.
20220402_071143.log

More permissive license

Hi, thanks for this nice work. I know that this repository is licensed under CC Attribution-NonCommercial 4.0 and I respect your decision on that. For this to have a greater impact to the community, would you consider adopting a more permissive license (MIT or Apache 2.0) for this? Mask2Former is an example of high-impact work that is licensed under the MIT license.

Weights of Cityscapes

Can you release the Cityscapes weights? I can only find the ADE20K weights at your link. Thanks!

Speed comparison

In addition to the parameters and FLOPs, could you please provide a speed comparison with SegFormer?
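
For anyone who wants to measure this themselves, a minimal GPU timing sketch (a generic protocol, not the paper's; the input size and iteration counts are arbitrary choices):

import time
import torch

@torch.no_grad()
def benchmark(model, input_size=(1, 3, 512, 512), warmup=10, iters=50):
    # Rough frames-per-second measurement: warm up first, then time a
    # fixed number of forward passes with explicit CUDA synchronization
    # so queued kernels are actually counted.
    model = model.eval().cuda()
    x = torch.randn(*input_size, device='cuda')
    for _ in range(warmup):
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return iters / (time.time() - start)

Running the same call on Lawin and on a SegFormer baseline with identical input sizes gives a like-for-like FPS comparison.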
