zhichenghuang / cmae
The official implementation of CMAE: https://arxiv.org/abs/2207.13532 and https://ieeexplore.ieee.org/document/10330745
Hello, thanks for releasing this great work.
Recently we have been trying to implement CMAE according to the paper; would you mind sharing a few crucial details?
We are implementing the feature decoder, which is similar to the pixel decoder (almost the same as the MAE decoder, but shallower, with only 2~4 blocks).
However, we are not sure about the output of the feature decoder:
Does the feature decoder output predictions only for the masked patches, and mean-pool those to get the feature representation? Or
does it output predictions for all patches (including the unmasked ones), and mean-pool all of them to get the feature representation?
Any suggestion will be appreciated!!
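The two options above differ only in which feature-decoder tokens get pooled. A minimal NumPy sketch of both variants (the function name, tensor shapes, and boolean-mask convention are our assumptions, not taken from the paper):

```python
import numpy as np

def pool_features(dec_tokens: np.ndarray, mask: np.ndarray, masked_only: bool) -> np.ndarray:
    """Mean-pool feature-decoder outputs into one vector per image.

    dec_tokens: (B, N, D) decoder predictions for all N patch tokens.
    mask: (B, N) boolean, True where a patch was masked out.
    """
    if masked_only:
        # Variant 1: average only the predictions for masked patches.
        denom = np.maximum(mask.sum(axis=1, keepdims=True), 1)
        return (dec_tokens * mask[..., None]).sum(axis=1) / denom
    # Variant 2: average the predictions for every patch, masked or not.
    return dec_tokens.mean(axis=1)
```

With a high mask ratio the two variants should be close, since most tokens are masked anyway.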
Hi, Thank you for your excellent work.
We are working on reimplementing CMAE, but we have hit a very tricky problem during pretraining: the InfoNCE loss increases gradually while the pixel reconstruction loss goes down. The accuracy on ImageNet-1K after fine-tuning is 83.4%, which is even worse than MAE. We have carefully followed the configuration described in your paper. Are there any details missing from the paper? We would appreciate it if you could release the code or provide pretrained weights. Looking forward to your reply.
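While debugging, it can help to check the contrastive term against a minimal reference InfoNCE, with positives on the diagonal of the (B, B) similarity matrix between L2-normalized online and target features. This is a generic sketch under our own assumptions (temperature value, in-batch negatives), not CMAE's released code:

```python
import numpy as np

def info_nce(q: np.ndarray, k: np.ndarray, temperature: float = 0.07) -> float:
    """InfoNCE where q[i] and k[i] are a positive pair; all other k's are negatives."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    logits = q @ k.T / temperature                 # (B, B) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))      # positives on the diagonal
```

With well-matched pairs the loss sits near zero; a loss that climbs during training may indicate the online and target features drifting apart.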
Hello, I'm wondering whether you have conducted 100-epoch pretraining experiments for CMAE?
If so, could you release the 100-epoch pretraining / fine-tuning config (e.g. lr, layer-decay, etc.), or at least give us some hints about your setup?
We have reimplemented CMAE according to the paper (although we applied a BYOL loss instead), but the accuracy struggles to reach that of MAE.
(Our implementation only reaches 79.35% top-1 accuracy.)
We suspect this is caused by our unmodified hyper-parameter config (we reuse the MAE 1600-epoch pretraining settings), which may not suit the CMAE case and degrades performance.
Any suggestion will be appreciated !!
Congratulations on the great work; I find it quite interesting.
Hoping for the release of the code and pretrained weights.
Best wishes
It's an amazing work and the idea is very impressive ~
Looking forward to the released code ~
Thanks for clarifying the previous issues one by one.
Recently we ran into one more question about the data augmentation (DA): would you mind telling us the DA setup in CMAE?
We know that you apply the full standard DA recipe on the target branch (exactly the same as SimCLR), but we are not sure about the student branch: do you apply spatial DA there, such as RandomResizedCrop, rotation, or others?
As for the resized crop, is it a random crop or a center zoom-in crop?
Any suggestion will be appreciated!!
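For reference, the "exactly the same as SimCLR" recipe mentioned for the target branch is commonly written as the torchvision pipeline below (parameter values follow common SimCLR re-implementations and are our assumptions, not confirmed CMAE settings); what spatial DA the student branch applies on top is precisely the open question:

```python
import torchvision.transforms as T

# Target branch: SimCLR-style augmentation (all values are assumptions).
target_aug = T.Compose([
    T.RandomResizedCrop(224, scale=(0.08, 1.0)),  # random crop, not center zoom-in
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.RandomApply([T.GaussianBlur(23, sigma=(0.1, 2.0))], p=0.5),
    T.ToTensor(),
])
```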
Hi,
All pre-training experiments are conducted on 32 NVIDIA A100 GPUs with a batch size of 4096.
If I set batch_size=16, will it be a devastating blow to the experimental results?
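Probably yes, for two reasons: InfoNCE negatives come from the batch itself, so batch size 16 leaves almost no negatives, and the learning rate must be rescaled. MAE-style recipes use the linear scaling rule, sketched below (the base values are illustrative assumptions, not CMAE's published numbers):

```python
def scaled_lr(base_lr: float, batch_size: int, base_batch: int = 256) -> float:
    """Linear LR scaling rule: effective lr = base_lr * batch_size / 256."""
    return base_lr * batch_size / base_batch

# A base lr tuned for batch size 4096 shrinks 256x when moving to batch size 16.
```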
Hello, thanks for releasing this amazing work.
I have some questions about the loss function design. The loss could be either a BYOL-style loss or a contrastive loss. To simplify the implementation, we chose the BYOL-style loss, but we are not sure: does it compute only the asymmetric loss and backpropagate that through the network?
For example
class CMAE:
    def __init__(self, ...):  # omit args
        self.online_enc, self.target_enc = ..., ...  # omit declaration
        self.pixl_dec, self.feat_dec = ..., ...      # omit declaration
        # BYOL-style projector/predictor structure
        self.proj, self.pred, self.momentum_proj = ..., ..., ...  # omit declaration

    def forward(self, X):
        v_onl, v_tar = X
        # omit masking...
        # suppose target_enc's forward implements the mean-pooling
        onl_p, tar_feat = self.online_enc(v_onl), self.target_enc(v_tar)
        im_p, onl_feat = self.pixl_dec(onl_p), self.feat_dec(onl_p)
        # predicted representation and projected representation
        p, z = self.pred(self.proj(onl_feat)), self.momentum_proj(tar_feat)
        # omit BYOL loss implementation...
        loss = BYOL_loss(p, z)
        # No symmetric term?
        # ? p, z = self.pred(self.proj(tar_feat)), self.momentum_proj(onl_feat)
Any suggestion will be appreciated !!
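For concreteness, the BYOL-style loss referred to above is usually the normalized MSE between the predictor output p and a stop-gradient copy of the target projection z, which equals 2 - 2*cos(p, z). The original BYOL symmetrizes by swapping the two views and averaging; whether CMAE needs that symmetric term is exactly the question. A minimal NumPy sketch of the asymmetric term:

```python
import numpy as np

def byol_loss(p: np.ndarray, z: np.ndarray) -> float:
    """BYOL regression loss: 2 - 2 * cosine(p, z), averaged over the batch.

    z should already be detached (stop-gradient) in a real training loop.
    """
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return float(np.mean(2.0 - 2.0 * np.sum(p * z, axis=1)))
```

The loss is 0 when p and z point in the same direction and 4 when they are opposite.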