zhichenghuang / cmae
The official implementation of CMAE: https://arxiv.org/abs/2207.13532 and https://ieeexplore.ieee.org/document/10330745
Hello, thanks for releasing this great work.
Recently we have been trying to implement CMAE according to the paper; would you mind sharing a few crucial details?
We are implementing the feature decoder, which is similar to the pixel decoder (almost the same as the MAE decoder, but shallower, with only 2~4 blocks).
However, we are not sure about the output of the feature decoder:
Does the feature decoder output predictions only for the masked patches, and mean-pool those to get the feature representation? Or
does it output predictions for all patches (including the unmasked ones), and mean-pool all of them to get the feature representation?
Any suggestion will be appreciated!!
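The two options above differ only in which feature-decoder tokens get pooled. A minimal NumPy sketch of both variants (the function name, tensor shapes, and boolean-mask convention are our assumptions, not taken from the paper):

```python
import numpy as np

def pool_features(dec_tokens: np.ndarray, mask: np.ndarray, masked_only: bool) -> np.ndarray:
    """Mean-pool feature-decoder outputs into one vector per image.

    dec_tokens: (B, N, D) decoder predictions for all N patch tokens.
    mask: (B, N) boolean, True where a patch was masked out.
    """
    if masked_only:
        # Variant 1: average only the predictions for masked patches.
        denom = np.maximum(mask.sum(axis=1, keepdims=True), 1)
        return (dec_tokens * mask[..., None]).sum(axis=1) / denom
    # Variant 2: average the predictions for every patch, masked or not.
    return dec_tokens.mean(axis=1)
```

With a high mask ratio the two variants should be close, since most tokens are masked anyway.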
Hi, Thank you for your excellent work.
We are working on reimplementing CMAE, but we have hit a very tricky problem during pretraining: the InfoNCE loss increases gradually while the pixel reconstruction loss goes down. The accuracy on ImageNet-1K after fine-tuning is 83.4%, which is even worse than MAE. We have carefully followed the configuration described in your paper. Are there any details missing from the paper? We would appreciate it if you could release the code or provide pretrained weights. Looking forward to your reply.
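While debugging, it can help to check the contrastive term against a minimal reference InfoNCE, with positives on the diagonal of the (B, B) similarity matrix between L2-normalized online and target features. This is a generic sketch under our own assumptions (temperature value, in-batch negatives), not CMAE's released code:

```python
import numpy as np

def info_nce(q: np.ndarray, k: np.ndarray, temperature: float = 0.07) -> float:
    """InfoNCE where q[i] and k[i] are a positive pair; all other k's are negatives."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    logits = q @ k.T / temperature                 # (B, B) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))      # positives on the diagonal
```

With well-matched pairs the loss sits near zero; a loss that climbs during training may indicate the online and target features drifting apart.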
Hello, I'm wondering whether you have conducted 100-epoch pretraining experiments for CMAE?
If so, could you release the 100-epoch pretraining / fine-tuning config (e.g. lr, layer-decay, etc.), or at least give us some hints about your setup?
We have reimplemented CMAE according to the paper (although we applied a BYOL loss instead), but the accuracy struggles to reach that of MAE.
(Our implementation only reaches 79.35% top-1 accuracy.)
We suspect this is caused by our unmodified hyper-parameter config (we reuse the MAE 1600-epoch pretraining settings), which may not suit the CMAE case and degrades performance.
Any suggestion will be appreciated !!
Congratulations on the great work; I find it quite interesting.
Hoping for the release of the code and pretrained weights.
Best wishes
It's an amazing work and the idea is very impressive ~
Looking forward to the released code ~
Thanks for clarifying the previous issues one by one.
Recently we ran into one more question about the data augmentation (DA): would you mind telling us the DA setup in CMAE?
We know that you apply the full standard DA recipe on the target branch (exactly the same as SimCLR), but we are not sure about the student branch: do you apply spatial DA there, such as RandomResizedCrop, rotation, or others?
As for the resized crop, is it a random crop or a center zoom-in crop?
Any suggestion will be appreciated!!
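For reference, the "exactly the same as SimCLR" recipe mentioned for the target branch is commonly written as the torchvision pipeline below (parameter values follow common SimCLR re-implementations and are our assumptions, not confirmed CMAE settings); what spatial DA the student branch applies on top is precisely the open question:

```python
import torchvision.transforms as T

# Target branch: SimCLR-style augmentation (all values are assumptions).
target_aug = T.Compose([
    T.RandomResizedCrop(224, scale=(0.08, 1.0)),  # random crop, not center zoom-in
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.RandomApply([T.GaussianBlur(23, sigma=(0.1, 2.0))], p=0.5),
    T.ToTensor(),
])
```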
Hi,
All pre-training experiments are conducted on 32 NVIDIA A100 GPUs with a batch size of 4096.
If I set batch_size=16, will it be a devastating blow to the experimental results?
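Probably yes, for two reasons: InfoNCE negatives come from the batch itself, so batch size 16 leaves almost no negatives, and the learning rate must be rescaled. MAE-style recipes use the linear scaling rule, sketched below (the base values are illustrative assumptions, not CMAE's published numbers):

```python
def scaled_lr(base_lr: float, batch_size: int, base_batch: int = 256) -> float:
    """Linear LR scaling rule: effective lr = base_lr * batch_size / 256."""
    return base_lr * batch_size / base_batch

# A base lr tuned for batch size 4096 shrinks 256x when moving to batch size 16.
```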
Hello, thanks for releasing this amazing work.
I have some questions about the loss function design. The loss could be either a BYOL-style loss or a contrastive loss. To simplify the implementation, we chose the BYOL-style loss, but we are not sure: does it compute only the asymmetric loss and backpropagate that through the network?
For example
class CMAE:
    def __init__(self, ...):  # omit args
        self.online_enc, self.target_enc = ..., ...  # omit declaration
        self.pixl_dec, self.feat_dec = ..., ...      # omit declaration
        # BYOL-style projector/predictor structure
        self.proj, self.pred, self.momentum_proj = ..., ..., ...  # omit declaration

    def forward(self, X):
        v_onl, v_tar = X
        # omit masking...
        # suppose target_enc's forward implements the mean-pooling
        onl_p, tar_feat = self.online_enc(v_onl), self.target_enc(v_tar)
        im_p, onl_feat = self.pixl_dec(onl_p), self.feat_dec(onl_p)
        # predicted representation and projected representation
        p, z = self.pred(self.proj(onl_feat)), self.momentum_proj(tar_feat)
        # omit BYOL loss implementation...
        loss = BYOL_loss(p, z)
        # No symmetric term?
        # ? p, z = self.pred(self.proj(tar_feat)), self.momentum_proj(onl_feat)
Any suggestion will be appreciated !!
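For concreteness, the BYOL-style loss referred to above is usually the normalized MSE between the predictor output p and a stop-gradient copy of the target projection z, which equals 2 - 2*cos(p, z). The original BYOL symmetrizes by swapping the two views and averaging; whether CMAE needs that symmetric term is exactly the question. A minimal NumPy sketch of the asymmetric term:

```python
import numpy as np

def byol_loss(p: np.ndarray, z: np.ndarray) -> float:
    """BYOL regression loss: 2 - 2 * cosine(p, z), averaged over the batch.

    z should already be detached (stop-gradient) in a real training loop.
    """
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return float(np.mean(2.0 - 2.0 * np.sum(p * z, axis=1)))
```

The loss is 0 when p and z point in the same direction and 4 when they are opposite.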