cmae's Issues

What's the output of the feature decoder?

Hello, thanks for releasing this great work.

We have recently been attempting to implement CMAE according to the paper, and would appreciate it if you could share a few crucial details.

We implemented the feature decoder similarly to the pixel decoder (almost the same as the MAE decoder, but with a shallower depth of only 2~4 blocks).

However, we're not sure about the output of the feature decoder:
Does it output only the predictions for the masked patches, with mean-pooling over those to obtain the feature representation? Or
does it output predictions for all patches (including the unmasked ones), with mean-pooling over all of them?
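
To make the question concrete, here is a minimal sketch of the two options (tensor names and shapes are our own assumptions, not from the paper):

import torch

def pool_masked_only(tokens: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # option 1: mean-pool only the feature-decoder predictions of the masked patches
    # tokens: [B, N, D] decoder output, mask: [B, N] with 1 for masked patches
    w = mask.unsqueeze(-1).float()
    return (tokens * w).sum(dim=1) / w.sum(dim=1).clamp(min=1)

def pool_all_patches(tokens: torch.Tensor) -> torch.Tensor:
    # option 2: mean-pool the predictions of all patches (masked and unmasked)
    return tokens.mean(dim=1)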

Any suggestions would be appreciated!

A problem encountered during reimplementation

Hi, thank you for your excellent work.
We are working on reimplementing CMAE, but we have hit a very tricky problem during pretraining: the InfoNCE loss increases gradually while the pixel reconstruction loss goes down. The accuracy on ImageNet-1k after fine-tuning is 83.4%, which is even worse than MAE. We have carefully followed the configuration described in your paper. Are there any details missing from the paper? We would appreciate it if you could release the code or provide pretrained weights. Looking forward to your reply.
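
For context, the contrastive term in our reimplementation is a standard in-batch InfoNCE along the lines of the sketch below (the temperature value is our own placeholder, not the paper's):

import torch
import torch.nn.functional as F

def info_nce(online_feat, target_feat, temperature=0.07):
    # positives are the matching indices in the batch;
    # all other samples in the batch serve as negatives
    q = F.normalize(online_feat, dim=-1)   # [B, D]
    k = F.normalize(target_feat, dim=-1)   # [B, D]
    logits = q @ k.t() / temperature       # [B, B] similarity matrix
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)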

Could you please also release the config for 100-epoch pretraining?

Hello, I'm wondering whether you have conducted experiments with 100-epoch pretraining of CMAE.
If so, could you release the 100-epoch pretraining / fine-tuning config (e.g. lr, layer decay, etc.), or at least give us some hints about your setup?

We have reimplemented CMAE according to the paper (although we applied a BYOL-style loss instead), but the accuracy struggles to reach that of MAE.
(Our implementation only reaches 79.35% top-1 accuracy.)
I suspect this is caused by our unchanged hyper-parameter config: we reused the MAE 1600-epoch pretraining config as-is, so the training setup may not match the CMAE case and degrades performance.
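
For reference, the MAE 1600-epoch pretraining hyper-parameters we reused are roughly the following (values from the public MAE recipe, so they are our assumption rather than anything from the CMAE paper):

# MAE-style pretraining config reused as-is (values from the public MAE recipe;
# an assumption here, not the CMAE paper's setting)
pretrain_config = dict(
    epochs=1600,
    warmup_epochs=40,
    batch_size=4096,
    base_lr=1.5e-4,        # effective lr = base_lr * batch_size / 256
    weight_decay=0.05,
    optimizer="adamw",
    betas=(0.9, 0.95),
    mask_ratio=0.75,
    norm_pix_loss=True,
)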

Any suggestions would be appreciated!

About more details of the data augmentation in CMAE

Thanks for clarifying the previous issues one by one.
We recently ran into one more question about the data augmentation: would you mind telling us what the data augmentation setup in CMAE is?

We know that you apply the full set of standard augmentations in the target branch (exactly the same as SimCLR), but we are not sure about the student branch: do you apply spatial augmentations there, such as RandomResizedCrop, rotation, or others?
Regarding the resized-crop setup, is it a random crop or a center (zoomed-in) crop?
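
For reference, the SimCLR-style pipeline we currently assume for the target branch looks like the sketch below (parameters follow the usual SimCLR/MoCo recipe, not anything confirmed in the CMAE paper):

from torchvision import transforms

# SimCLR-style augmentation assumed for the target branch
# (standard SimCLR/MoCo-style parameters, not taken from the CMAE paper)
target_aug = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomApply([transforms.GaussianBlur(23, sigma=(0.1, 2.0))], p=0.5),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])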

Any suggestions would be appreciated!

Hi, about the experiment

Hi,

All pre-training experiments are conducted on 32 NVIDIA A100 GPUs with a batch size of 4096.

If I set batch_size=16, will that be a devastating blow to the experimental results?
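
One adjustment we assume should accompany a smaller batch is the linear learning-rate scaling rule used by MAE-style recipes (an assumption on our part, not something the CMAE paper prescribes):

# linear lr scaling rule used by MAE-style recipes
# (an assumption here, not a CMAE-specific recommendation)
base_lr = 1.5e-4
batch_size = 16
lr = base_lr * batch_size / 256   # = 9.375e-06 at batch_size=16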

About the loss function design

Hello, thanks for releasing this amazing work.

I have some questions about the loss function design. The loss could be either a BYOL-style loss or a contrastive loss. To simplify the implementation we chose the BYOL-style loss, but we are not sure: is only the asymmetric loss computed and backpropagated through the network?

For example

import torch.nn as nn

class CMAE(nn.Module):
    def __init__(self, ...):  # omit args
        super().__init__()
        self.online_enc, self.target_enc = ..., ...   # omit declaration
        self.pixel_dec, self.feat_dec = ..., ...      # omit declaration
        # BYOL-style projector / predictor structure
        self.proj, self.pred, self.momentum_proj = ..., ..., ...  # omit declaration

    def forward(self, X):
        v_onl, v_tar = X  # online view and target view
        # omit masking..
        # suppose target_enc's forward already applies the mean-pooling
        onl_p, tar_feat = self.online_enc(v_onl), self.target_enc(v_tar)
        # pixel decoder -> reconstructed patches, feature decoder -> feature prediction
        im_p, onl_feat = self.pixel_dec(onl_p), self.feat_dec(onl_p)

        # predicted representation (online) and projected representation (momentum target)
        p, z = self.pred(self.proj(onl_feat)), self.momentum_proj(tar_feat)
        # omit BYOL loss implementation..
        loss = BYOL_loss(p, z)

        # No symmetric term?
        # ? p, z = self.pred(self.proj(tar_feat)), self.momentum_proj(onl_feat)
        return loss
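
For completeness, the BYOL-style loss elided above would be the usual negative cosine similarity with a stop-gradient on the momentum branch (a minimal sketch of a standard implementation, not taken from the CMAE paper):

import torch.nn.functional as F

def BYOL_loss(p, z):
    # negative cosine similarity between the online prediction p and the
    # momentum projection z; z is detached so gradients only flow through p
    p = F.normalize(p, dim=-1)
    z = F.normalize(z.detach(), dim=-1)
    return 2 - 2 * (p * z).sum(dim=-1).mean()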

Any suggestions would be appreciated!
