
maxvit's People

Contributors

christophreich1996

maxvit's Issues

A question about the code in maxvit.py

Hi!
I have a question about line 400 in maxvit.py.
There is a skip connection in both block attention and grid attention, so maybe we should use 'output = output + self.drop_path(self.mlp(self.norm_2(output)))' rather than '*'.
Looking forward to your reply, thanks!
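
The additive form suggested in this issue can be sketched as follows. This is a minimal illustration, not the repository's code: the class name `ResidualMLP`, the `mlp_ratio` parameter, and the use of `nn.Identity` as a stand-in for stochastic depth are assumptions made for the example.

```python
import torch
import torch.nn as nn


class ResidualMLP(nn.Module):
    """Pre-norm MLP sub-block with an additive skip connection (hypothetical name)."""

    def __init__(self, dim: int, mlp_ratio: int = 4) -> None:
        super().__init__()
        self.norm_2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )
        # Assumption: identity stands in for stochastic depth to keep the sketch deterministic.
        self.drop_path = nn.Identity()

    def forward(self, output: torch.Tensor) -> torch.Tensor:
        # The MLP branch is added to its input, so the input always has a direct path through.
        return output + self.drop_path(self.mlp(self.norm_2(output)))


block = ResidualMLP(dim=8)
tokens = torch.rand(2, 49, 8)  # [batch, tokens, channels]
result = block(tokens)
```

With an additive skip, the block reduces to the identity whenever the MLP branch outputs zeros, which is the property a residual connection is normally expected to have.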

Grid partition issue

Hi @ChristophReich1996, thanks for implementing MaxViT!

I am just wondering whether the grid partition function has been done right.

In lines, you have implemented the grid partition as: windows = input.view(B, C, H // grid_size[0], grid_size[0], W // grid_size[1], grid_size[1]), which seems to be the same as the window partition. I think the grid partition should fix the number of windows instead of fixing the window size. It should look like:

windows = input.view(B, C, grid_size[0], H // grid_size[0], grid_size[1], W // grid_size[1])

Am I right?

I am checking lucidrains's implementation here, which seems to support this reading. Please let me know if I am wrong about this. Thanks~
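
The difference between the two reshapes discussed in this issue can be checked on a tiny tensor. The sketch below is illustrative only: the function names and the permute orders are assumptions chosen for the example, not a copy of the repository's or lucidrains's code.

```python
import torch


def window_partition(x: torch.Tensor, window_size: tuple) -> torch.Tensor:
    """Split [B, C, H, W] into contiguous windows of shape [P0, P1]."""
    B, C, H, W = x.shape
    P0, P1 = window_size
    # The window size is the inner factor: each window covers neighbouring pixels.
    x = x.reshape(B, C, H // P0, P0, W // P1, P1)
    x = x.permute(0, 2, 4, 3, 5, 1).contiguous()
    return x.view(-1, P0, P1, C)


def grid_partition(x: torch.Tensor, grid_size: tuple) -> torch.Tensor:
    """Split [B, C, H, W] into groups of G0 x G1 tokens sampled with a stride."""
    B, C, H, W = x.shape
    G0, G1 = grid_size
    # The grid size is the outer factor: tokens in a group are H // G0 (W // G1) apart.
    x = x.reshape(B, C, G0, H // G0, G1, W // G1)
    x = x.permute(0, 3, 5, 2, 4, 1).contiguous()
    return x.view(-1, G0, G1, C)


image = torch.arange(16.0).view(1, 1, 4, 4)  # pixel value = row * 4 + column
first_window = window_partition(image, (2, 2))[0, :, :, 0]
first_grid = grid_partition(image, (2, 2))[0, :, :, 0]
print(first_window.tolist())  # [[0.0, 1.0], [4.0, 5.0]]
print(first_grid.tolist())    # [[0.0, 2.0], [8.0, 10.0]]
```

On the 4x4 example, the first window holds the adjacent pixels 0, 1, 4, 5, while the first grid group holds pixels 0, 2, 8, 10, which are spread over the whole image. Attention within a grid group therefore mixes tokens globally in a dilated pattern.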

Grid Partition

Thanks to the authors for sharing MaxViT as open source. I really enjoyed this project and have studied it for a few days. However, I do not understand the Grid Partition part. In my opinion, it looks almost the same as SWIN V1, so how does it accomplish the grid operation shown below? Looking forward to your advice, thank you.
image

some questions

Hello, thanks for reproducing this paper. I want to implement this model too. In your code, the grid operation seems to have a problem: I think your operation corresponds to the block operation rather than the global grid operation. How should the global grid operation be done? I am looking forward to further code releases. Thank you.

Put the code on the GPU and run, but the run reports an error

In this line of code (line 69, in forward):
output = self.main_path(input)
Error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument weight in method wrapper_cudnn_batch_norm)
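
This error generally means that some weights and the input live on different devices. A minimal sketch of the usual pattern follows, assuming a small stand-in model (the layers below are placeholders, not the MaxViT model): move the module with `.to(device)`, create the input on the same device, and check that every parameter and buffer agrees.

```python
import torch
import torch.nn as nn

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Placeholder model; .to(device) moves all registered parameters and buffers.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
).to(device)

# The input must be created on (or moved to) the same device as the weights.
batch = torch.rand(2, 3, 32, 32, device=device)

# Collect the devices of all parameters and buffers; a single entry means they agree.
devices = {t.device for t in list(model.parameters()) + list(model.buffers())}
output = model(batch)
```

If `devices` contains more than one entry after calling `.to(device)`, one possible cause (an assumption, not confirmed by this issue) is that submodules are kept in a plain Python list instead of `nn.ModuleList`, since `.to()` only moves registered submodules.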

pretrain

Do you have a pre-trained model? If so, could you please share it?

Regarding parameters

Hi Christoph, thanks for the code skeleton for the MaxViT paper.

I compared the number of parameters in your code with the paper, and they seem to differ. MaxViT tiny has 24M parameters in this GitHub repo, whereas the paper reports 31M. Can you please help me out?

Also, I believe the main_path in the MBConv block should look like this:

`
self.main_path = nn.Sequential(
    norm_layer(in_channels),
    nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=(1, 1)),  # not in original code
    norm_layer(out_channels),
    act_layer(),
    DepthwiseSeparableConv(in_chs=out_channels, out_chs=out_channels, stride=2 if downscale else 1,
                           act_layer=act_layer, norm_layer=norm_layer, drop_path_rate=drop_path),
    SqueezeExcite(in_chs=out_channels, rd_ratio=0.25),
    nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=(1, 1))
)
`

Your code is missing the first 1x1 Conv2d.

Thanks,
Saarthak
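
A parameter-count comparison like the one in this issue can be reproduced with a small helper. The sketch below is a hedged example: `count_parameters` is a hypothetical helper name, and the 64-channel 1x1 convolution is only an illustration of how much a single extra layer contributes, not a figure taken from the repository or the paper.

```python
import torch.nn as nn


def count_parameters(model: nn.Module, trainable_only: bool = True) -> int:
    """Sum the element counts of a module's parameters (hypothetical helper)."""
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


# Illustrative layer: a 1x1 convolution with 64 input and 64 output channels.
conv = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(1, 1))
n = count_parameters(conv)  # 64 * 64 weights + 64 biases = 4160
```

Applying the same helper to the full model and to individual stages can help localize where a gap between two implementations comes from.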
