yan-hao-tian / vw
iclr2024 poster Varying Window Attention
License: MIT License
Hi,
In the first draft of your paper on arXiv, you mentioned that you use an MLP-Mixer to mix the channels, but I don't see any code that uses an MLP-Mixer. Can you please clarify? If you removed the MLP-Mixer, what was the reason?
Hi, thanks for your great work. I trained Lawin + MiT-B2 for 80k iterations and the final performance is 46.64 mIoU. The training protocol is exactly the same as SegFormer's. Here is the log file.
20220402_071143.log
Hi, when will the weights of the Lawin model be available?
In addition to the parameters and FLOPs, could you please provide a speed comparison with SegFormer?
Congratulations on such excellent work! I'm looking forward to the release of the full code. Thanks in advance!
"The output of LawinASPP is upsampled to the size of a quarter of input image, then fused with the first-level feature by a linear layer".
I see this gives a 0.8% performance gain in Table 6. The only explanation you give is "which manifests the importance of low-level information" in Section 4.3.3. Is that all, or are there any further intuitions or evidence?
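The fusion step quoted above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the repository's `lawin_head.py`: `LowLevelFusion` is a hypothetical module, the "linear layer" is modeled as a 1x1 convolution (a per-pixel linear map), normalization and activation are omitted, and the channel counts (64 for the first-level feature, 48 for its projection, 512 for the LawinASPP output) and the 1/8 input scale of the LawinASPP output are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowLevelFusion(nn.Module):
    """Hypothetical sketch: upsample a high-level map to the first-level (1/4)
    resolution and fuse it with the first-level feature via 1x1 convs."""

    def __init__(self, high_ch=512, low_in_ch=64, low_proj_ch=48, out_ch=512):
        super().__init__()
        # Per-pixel linear projection of the first-level feature (assumed widths).
        self.low_proj = nn.Conv2d(low_in_ch, low_proj_ch, kernel_size=1)
        # Per-pixel linear fusion of the concatenated features.
        self.fuse = nn.Conv2d(high_ch + low_proj_ch, out_ch, kernel_size=1)

    def forward(self, high, low):
        # Resize the high-level output to the first-level spatial size (H/4, W/4).
        high = F.interpolate(high, size=low.shape[2:], mode="bilinear", align_corners=False)
        # Concatenate along channels, then mix with the 1x1 "linear layer".
        x = torch.cat([high, self.low_proj(low)], dim=1)
        return self.fuse(x)


img_h, img_w = 512, 512
low = torch.randn(2, 64, img_h // 4, img_w // 4)    # first-level feature, 1/4 scale
high = torch.randn(2, 512, img_h // 8, img_w // 8)  # aggregated output, assumed 1/8 scale
out = LowLevelFusion()(high, low)
print(out.shape)  # torch.Size([2, 512, 128, 128])
```

Upsampling the coarse map before concatenation keeps the fusion a cheap per-pixel operation at 1/4 resolution, where the first-level feature still carries fine boundary detail; whether the released code concatenates and projects exactly this way is an assumption.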
Hi! Thanks for your excellent work. Do you plan to release the ImageNet pre-training code? ^ _ ^
sys.platform: linux
Python: 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
CUDA available: True
GPU 0,1: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.3.r11.3/compiler.29920130_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.1+cu111
PyTorch compiling details: PyTorch built with:
2022-10-21 17:23:32,633 - mmseg - INFO - Distributed training: True
INFO:mmseg:Distributed training: True
2022-10-21 17:23:33,165 - mmseg - INFO - Config:
norm_cfg = dict(type='SyncBN', requires_grad=True)
find_unused_parameters = True
................................................................................................................................................................................................................................................
2022-10-21 17:23:34,101 - mmseg - INFO - Loaded 4750 images
INFO:mmseg:Loaded 4750 images
fatal: not a git repository (or any of the parent directories): .git
2022-10-21 17:23:36,849 - mmseg - INFO - Loaded 1188 images
INFO:mmseg:Loaded 1188 images
2022-10-21 17:23:36,850 - mmseg - INFO - Start running, host: buaa@buaa-System-Product-Name, work_dir: /home/buaa/songyue/lawin-master/workdir
INFO:mmseg:Start running, host: buaa@buaa-System-Product-Name, work_dir: /home/buaa/songyue/lawin-master/workdir
2022-10-21 17:23:36,850 - mmseg - INFO - workflow: [('train', 1)], max: 160000 iters
INFO:mmseg:workflow: [('train', 1)], max: 160000 iters
Traceback (most recent call last):
File "/snap/pycharm-community/302/plugins/python-ce/helpers/pydev/pydevd.py", line 1496, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/snap/pycharm-community/302/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/buaa/songyue/lawin-master/tools/train.py", line 174, in <module>
main()
File "/home/buaa/songyue/lawin-master/tools/train.py", line 170, in main
meta=meta)
File "/home/buaa/songyue/lawin-master/mmseg/apis/train.py", line 115, in train_segmentor
runner.run(data_loaders, cfg.workflow)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/runner/iter_based_runner.py", line 131, in run
iter_runner(iter_loaders[i], **kwargs)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/parallel/distributed.py", line 46, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/buaa/songyue/lawin-master/mmseg/models/segmentors/base.py", line 152, in train_step
losses = self(**data_batch)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
return old_func(*args, **kwargs)
File "/home/buaa/songyue/lawin-master/mmseg/models/segmentors/base.py", line 122, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/buaa/songyue/lawin-master/mmseg/models/segmentors/encoder_decoder.py", line 158, in forward_train
gt_semantic_seg)
File "/home/buaa/songyue/lawin-master/mmseg/models/segmentors/encoder_decoder.py", line 102, in _decode_head_forward_train
self.train_cfg)
File "/home/buaa/songyue/lawin-master/mmseg/models/decode_heads/decode_head.py", line 188, in forward_train
seg_logits = self.forward(inputs)
File "/home/buaa/songyue/lawin-master/mmseg/models/decode_heads/lawin_head.py", line 328, in forward
abc = self.image_pool(_c)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/mmcv/cnn/bricks/conv_module.py", line 195, in forward
x = self.norm(x)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 539, in forward
bn_training, exponential_average_factor, self.eps)
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/functional.py", line 2147, in batch_norm
_verify_batch_size(input.size())
File "/home/buaa/anaconda3/envs/vit/lib/python3.6/site-packages/torch/nn/functional.py", line 2114, in _verify_batch_size
raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
python-BaseException
Backend TkAgg is interactive backend. Turning interactive mode on.
We suspect the problem is a batch size of 1, but we don't know where to make the change. We think there may be something wrong in the configuration that we are not aware of. Can you give us some suggestions?
Looking forward to your reply!
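The `ValueError` above comes from PyTorch's batch-norm check, which requires more than one value per channel in training mode. The traceback shows it firing inside `self.image_pool(_c)`, and the reported input `torch.Size([1, 512, 1, 1])` suggests a globally pooled map with a single image in the batch, so each channel holds exactly one value. The sketch below reproduces this with plain `nn.BatchNorm2d`, which goes through the same `F.batch_norm` check shown in the traceback; `pooled_feature` and `runs_in_train_mode` are hypothetical helpers, not code from this repository.

```python
import torch
import torch.nn as nn


def pooled_feature(batch_size: int, channels: int = 512) -> torch.Tensor:
    """Hypothetical helper: mimic a global-average-pool branch, shape [N, C, 1, 1]."""
    x = torch.randn(batch_size, channels, 32, 32)
    return nn.AdaptiveAvgPool2d(1)(x)


def runs_in_train_mode(norm: nn.Module, x: torch.Tensor) -> bool:
    """Hypothetical helper: True if `norm` accepts `x` in training mode,
    False if it raises the 'more than 1 value per channel' ValueError."""
    norm.train()
    try:
        norm(x)
        return True
    except ValueError:
        return False


bn = nn.BatchNorm2d(512)
gn = nn.GroupNorm(num_groups=32, num_channels=512)

one = pooled_feature(batch_size=1)  # torch.Size([1, 512, 1, 1]), as in the traceback
two = pooled_feature(batch_size=2)  # torch.Size([2, 512, 1, 1])

print(runs_in_train_mode(bn, one))  # False: one value per channel
print(runs_in_train_mode(bn, two))  # True: two values per channel
print(runs_in_train_mode(gn, one))  # True: GroupNorm normalizes within each sample
```

If this matches your setup, two common workarounds are raising the per-GPU batch size to at least 2 (in mmseg 0.x configs this is usually `samples_per_gpu` in the `data` dict) or using a batch-independent norm such as GroupNorm in the pooling branch; the latter departs from the released configuration and may change results.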
Looking forward to your release.
Can you release the weights for Cityscapes? I could only find the ADE20K weights in your link. Thanks!
Why is the VWFormer code for the Cityscapes dataset different from the other VWFormer code? Does anyone know the reason?
Hi, I used the Lawin decoder with MiT-B2 as the encoder to train on Cityscapes, but got a NaN loss after about 16,000 iterations. Could you please check the decoder code again?
Dear Authors,
Thank you for sharing the code for your outstanding research.
I noticed in your paper that the experiments for VW-SegFormer were conducted on the COCO and CityScapes datasets, while those for VW-Mask2Former were conducted on ADE20k. This makes it challenging for readers to determine which method offers better performance. Could you share any results or insights comparing these two methods on the same dataset?
The results don't need to be rigorously scientific, but any information would be helpful for me to decide which approach to use as a starting point for my work.
Thank you in advance for your assistance!
Hi, thanks for this nice work. I know that this repository is licensed under CC Attribution-NonCommercial 4.0 and I respect your decision on that. For this work to have a greater impact on the community, would you consider adopting a more permissive license (MIT or Apache 2.0)? Mask2Former is an example of high-impact work that is licensed under the MIT license.
Hi, thank you for sharing such excellent work. One detail makes me curious and confused: when the first-layer feature of the encoder passes through the decoder's linear layer, why is the number of output channels set to the specific value of 48 rather than some other size?
Hi!
Could you please point out which split of Cityscapes the reported results belong to: the validation set or the test set?
Thanks
Hello, I am really impressed with Lawin Transformer.
I would like to train the model you proposed, but this is difficult because the training hyper-parameters are not given in the paper.
Could you please share the training code?
Thank you.