christophreich1996 / maxvit
PyTorch reimplementation of the paper "MaxViT: Multi-Axis Vision Transformer" [ECCV 2022].
Home Page: https://arxiv.org/abs/2204.01697
License: MIT License
Hello, thank you for your excellent work! Do you have pretrained PyTorch weights?
Hi!
I have a question about line 400 in maxvit.py.
There is a skip connection in both block attention and grid attention, so maybe we should use 'output = output + self.drop_path(self.mlp(self.norm_2(output)))' rather than '*'.
Looking forward to your reply, thanks!
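For readers following this issue, here is a minimal sketch of the residual form the question proposes: the normalized MLP branch goes through drop path and is then added back to its input. The class name `ResidualMLP`, its constructor arguments, and the inline drop path are illustrative assumptions, not code from maxvit.py.

```python
import torch
import torch.nn as nn


class ResidualMLP(nn.Module):
    """Hypothetical pre-norm MLP block: x + drop_path(mlp(norm(x)))."""

    def __init__(self, dim, hidden_dim, drop_path=0.0):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim)
        )
        self.drop_path = drop_path  # probability of dropping the branch per sample

    def forward(self, x):
        branch = self.mlp(self.norm(x))
        if self.training and self.drop_path > 0.0:
            # Stochastic depth: zero the whole branch for some samples and
            # rescale the rest so the expected value is unchanged.
            keep = 1.0 - self.drop_path
            shape = (x.shape[0],) + (1,) * (x.dim() - 1)
            mask = torch.rand(shape, device=x.device) < keep
            branch = branch * mask / keep
        # The skip connection: the input is added back to the branch output.
        return x + branch


block = ResidualMLP(dim=8, hidden_dim=16, drop_path=0.5).eval()
x = torch.rand(2, 5, 8)
out = block(x)  # in eval mode drop path is inactive, so out == x + mlp(norm(x))
```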
Hi @ChristophReich1996, thanks for implementing MaxViT!
I am just wondering whether the grid partition function is implemented correctly.
In lines, you have implemented the grid partition as:
windows = input.view(B, C, H // grid_size[0], grid_size[0], W // grid_size[1], grid_size[1])
which seems to be the same as the window partition. I think the grid partition should fix the number of windows instead of setting the window size. It should look like:
windows = input.view(B, C, grid_size[0], H // grid_size[0], grid_size[1], W // grid_size[1])
Am I right?
I am checking lucidrains's implementation here, which seems to support the reading above. Please let me know if I am wrong about this. Thanks!
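To make the difference between the two `view` calls concrete, here is a small sketch that applies both layouts to a 4x4 feature map. The function names `window_partition` and `grid_partition` and the permute orders are assumptions chosen for illustration; they are not claimed to match the repository's code.

```python
import torch


def window_partition(x, window_size):
    """Split (B, C, H, W) into local windows of window_size adjacent tokens."""
    B, C, H, W = x.shape
    wh, ww = window_size
    x = x.view(B, C, H // wh, wh, W // ww, ww)
    # -> (B, H//wh, W//ww, wh, ww, C) -> (B * num_windows, wh, ww, C)
    return x.permute(0, 2, 4, 3, 5, 1).contiguous().view(-1, wh, ww, C)


def grid_partition(x, grid_size):
    """Split (B, C, H, W) into groups of grid_size tokens sampled with a stride."""
    B, C, H, W = x.shape
    gh, gw = grid_size
    x = x.view(B, C, gh, H // gh, gw, W // gw)
    # -> (B, H//gh, W//gw, gh, gw, C) -> (B * num_groups, gh, gw, C)
    return x.permute(0, 3, 5, 2, 4, 1).contiguous().view(-1, gh, gw, C)


# Token i of a 4x4 map holds the value i, so the grouping is easy to read.
x = torch.arange(16).view(1, 1, 4, 4)
local = window_partition(x, (2, 2))[0, :, :, 0]   # [[0, 1], [4, 5]]: neighbours
strided = grid_partition(x, (2, 2))[0, :, :, 0]   # [[0, 2], [8, 10]]: every 2nd token
```

With the first layout each group holds adjacent tokens; with the second, each group holds tokens spread across the whole map, which is the sparse, global mixing the question describes.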
Thanks to the authors for sharing MaxViT as open source. I really enjoyed this project and studied it for a few days. However, I didn't understand the part of the work on the Grid Partition. In my opinion, it looks almost the same as Swin V1, so how does it accomplish the grid operation shown below? Looking forward to your advice, thank you.
Hi, good job! I have a question: how can I convert the pretrained TF checkpoint file to a PyTorch version?
Hello, thanks for reproducing this paper; I want to work with this model too. In your code, the grid operation seems to have a problem. I think your operation corresponds to the block operation rather than the global grid operation. How should the global grid operation be done? I am looking forward to your further code releases. Thank you.
In this line of code (line 69, in forward):
output = self.main_path(input)
I get this error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument weight in method wrapper_cudnn_batch_norm)
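The message says a batch-norm weight stayed on the CPU while the input was on cuda:0. One common cause of this in PyTorch generally is keeping submodules in a plain Python list, which `.to(device)` does not traverse. This is an assumption; the thread does not say what caused it in this repository. The sketch below uses two hypothetical toy classes to show the difference, and falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn


class PlainListNet(nn.Module):
    """Hypothetical: layers in a plain list are NOT registered as submodules."""

    def __init__(self):
        super().__init__()
        self.blocks = [nn.BatchNorm2d(4) for _ in range(2)]


class ModuleListNet(nn.Module):
    """Hypothetical: nn.ModuleList registers layers, so .to(device) moves them."""

    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([nn.BatchNorm2d(4) for _ in range(2)])

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x


# Unregistered layers are invisible to .parameters(), .to(), and .cuda().
plain_params = sum(p.numel() for p in PlainListNet().parameters())        # 0
registered_params = sum(p.numel() for p in ModuleListNet().parameters())  # 16

# Move the model and create the input on the same device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ModuleListNet().to(device)
x = torch.rand(1, 4, 8, 8, device=device)
out = model(x)
```

If the model is already built from registered modules, the remaining thing to check is that the input tensor is created on, or moved to, the same device as the model.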
Do you have a pre-trained model? If so, could you share it, please?
Hi Christoph, thanks for the code skeleton for the MaxViT paper.
I compared the number of parameters in your code with the paper, and they seem to differ. MaxViT tiny has 24M parameters in this GitHub repo, whereas the paper reports 31M. Can you please help me out?
Also, I believe the main_path in the MBConv block should look like this:
```python
self.main_path = nn.Sequential(
    norm_layer(in_channels),
    nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=(1, 1)),  # not in original code
    norm_layer(out_channels),
    act_layer(),
    DepthwiseSeparableConv(in_chs=out_channels, out_chs=out_channels, stride=2 if downscale else 1,
                           act_layer=act_layer, norm_layer=norm_layer, drop_path_rate=drop_path),
    SqueezeExcite(in_chs=out_channels, rd_ratio=0.25),
    nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=(1, 1))
)
```
Here, your code is missing the first Conv2d with a 1x1 kernel.
Thanks,
Saarthak
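A parameter comparison like the one in this issue can be reproduced with a short helper. The sketch below is a generic PyTorch idiom, not code from the repository; `count_parameters` and the toy model are hypothetical names used only for illustration.

```python
import torch.nn as nn


def count_parameters(model, trainable_only=True):
    """Return the number of parameters, optionally only those that require gradients."""
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


# Toy stand-in for a real network: a 1x1 convolution followed by batch norm.
toy = nn.Sequential(nn.Conv2d(3, 8, kernel_size=1), nn.BatchNorm2d(8))
total = count_parameters(toy)  # 3*8 weights + 8 biases + 8 + 8 batch-norm terms = 48
```

Running the same helper on the model under discussion and dividing by 1e6 gives a figure in millions that can be set against the number reported in the paper.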
Thank you for your work.
I imported the model and moved it to the GPU, but this error occurs:
RuntimeError: Tensor for argument #2 'weight' is on CPU, but expected it to be on GPU (while checking arguments for cudnn_batch_norm)
Looking forward to your reply!