haihuangcode / cmg

The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)

Python 99.73% Shell 0.27%

cross-modal cross-modal-generalization multimodal pretrained-models

cmg's Introduction

👋 Hi, I’m @haihuangcode
👀 I’m interested in Multimodal AI

cmg's Issues

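About lld_loss and mi_loss during training

Hello, I am trying to follow your work and transfer it to another domain, but I ran into the following problems during training:

lld_loss does not converge, so the estimate of the mutual information upper bound is inaccurate, which affects training.
After enabling mi_loss, NaN values appear in the model parameters.
mi_loss keeps growing as training proceeds.

I tried adjusting the number of layers in mi_net and the learning rate, among other things, but the problems persist.

I would like to ask for more details about model training:

In your training runs, does lld_loss gradually converge, or does it stay within a stable range?
Are the parameters of mi_net updated during the backward pass of mi_loss?
What does the mi_loss curve roughly look like during training, and does it converge?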
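
For readers with the same questions, here is a minimal sketch of one common way to organise such training. It assumes a CLUB-style upper-bound estimator, which the names lld_loss and mi_net suggest but the issue does not confirm. `ClubEstimator`, `update_estimator`, `update_encoders`, and the toy linear encoders are hypothetical names, not taken from the CMG repository. The Tanh on the log-variance and the gradient clipping are generic precautions against NaN values, not settings reported by the authors.

```python
import torch
import torch.nn as nn


class ClubEstimator(nn.Module):
    """Hypothetical stand-in for mi_net: models q(y|x) as a diagonal Gaussian."""

    def __init__(self, x_dim, y_dim, hidden=32):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))
        # Tanh bounds the log-variance to [-1, 1], keeping exp(logvar) away from 0 and inf.
        self.logvar = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim), nn.Tanh())

    def log_likelihood(self, x, y):
        # Gaussian log-likelihood of y given x, up to additive constants.
        mu, logvar = self.mu(x), self.logvar(x)
        return (-(y - mu) ** 2 / logvar.exp() - logvar).sum(dim=1).mean()

    def mi_upper_bound(self, x, y):
        mu, logvar = self.mu(x), self.logvar(x)
        # Matched (x_i, y_i) pairs.
        positive = -(y - mu) ** 2 / (2 * logvar.exp())
        # Every x_i paired with every y_j in the batch, averaged over j.
        negative = -((y.unsqueeze(0) - mu.unsqueeze(1)) ** 2).mean(dim=1) / (2 * logvar.exp())
        return (positive.sum(dim=1) - negative.sum(dim=1)).mean()


def update_estimator(estimator, opt_est, x, y):
    """lld_loss step: fit q(y|x) on detached features, so only the estimator changes."""
    lld_loss = -estimator.log_likelihood(x.detach(), y.detach())
    opt_est.zero_grad()
    lld_loss.backward()
    opt_est.step()
    return lld_loss.item()


def update_encoders(estimator, encoders, opt_enc, x, y, max_norm=1.0):
    """mi_loss step: opt_enc holds only encoder parameters, so the estimator stays fixed."""
    mi_loss = estimator.mi_upper_bound(x, y)
    opt_enc.zero_grad()
    mi_loss.backward()
    nn.utils.clip_grad_norm_(encoders.parameters(), max_norm)
    opt_enc.step()
    return mi_loss.item()


torch.manual_seed(0)
encoders = nn.ModuleDict({"a": nn.Linear(8, 4), "b": nn.Linear(8, 4)})  # toy encoders
estimator = ClubEstimator(4, 4)
opt_est = torch.optim.Adam(estimator.parameters(), lr=1e-3)
opt_enc = torch.optim.Adam(encoders.parameters(), lr=1e-3)

inputs = torch.randn(16, 8)
for _ in range(5):
    lld = update_estimator(estimator, opt_est, encoders["a"](inputs), encoders["b"](inputs))
    mi = update_encoders(estimator, encoders, opt_enc, encoders["a"](inputs), encoders["b"](inputs))
```

In this arrangement the estimator is updated only through lld_loss, and mi_loss only moves the encoders. Whether CMG follows the same split is for the authors to confirm.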

self.audio_semantic_decoder and self.Audio_decoder

https://github.com/haihuangcode/CMG/blob/2cbdad8f68d6000657ddf45ace97c855c022334d/code/src/model/main_model_2.py#L507C1-L515C60

Hi! Thanks for your great work. I have a few questions. Is my understanding correct that self.audio_semantic_decoder and self.Audio_decoder are used for classification and feature reconstruction, respectively?
Also, does this work use a transformer model? I ask because I noticed a UniEncoder.py file.

Looking forward to hearing from you!

code/src/model/main_model_2.py, line 790

Hello, while reading your code I seem to have found a problem
at line 790 of code/src/model/main_model_2.py:

        for i in unactivated_indices:
            self.embedding[i] = activated_quantized[random.randint(0,len(activated_indices)-1)] + torch.Tensor(256).uniform_(-1/1024, -1/1024).cuda()

I think the range here should be (-1/1024, 1/1024) rather than (-1/1024, -1/1024).
The same problem also appears at lines 977 and 1152.
I hope this helps your work :D
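
The effect of the degenerate range can be illustrated with a small standard-library sketch. It uses `random.uniform` as a stand-in for the tensor call quoted in the issue, on the assumption that both draw from the interval between their two bounds. `reinit_dead_codes` and the two-dimensional toy codebook are hypothetical and only mirror the structure of the quoted loop.

```python
import random


def reinit_dead_codes(codebook, active_ids, dead_ids, low, high, rng):
    """Replace each dead code with a random active code plus per-dimension noise in [low, high]."""
    for i in dead_ids:
        source = codebook[rng.choice(active_ids)]
        codebook[i] = [v + rng.uniform(low, high) for v in source]
    return codebook


rng = random.Random(0)
codebook = [[1.0, 2.0], [0.0, 0.0]]  # code 0 is active, code 1 is dead

# Range as quoted in the issue: both bounds are equal, so the "noise" is a constant shift.
degenerate = reinit_dead_codes([row[:] for row in codebook], [0], [1], -1 / 1024, -1 / 1024, rng)

# Range proposed in the issue: symmetric around zero, so each dimension gets its own jitter.
jittered = reinit_dead_codes([row[:] for row in codebook], [0], [1], -1 / 1024, 1 / 1024, rng)
```

With equal bounds every reinitialised code is the source code shifted by exactly -1/1024 in every dimension, so two dead codes copied from the same active code would end up identical.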

Train on my own dataset

If I'd like to use CMG on my own dataset (video and audio), how should I prepare the data? I have video-audio pairs; should I extract their features first? If so, which feature extraction model should I use to stay compatible with CMG?

FileNotFoundError: [Errno 2] No such file or directory: 'vggsoundCategories2Prompts.csv'

Hi, nice work, but this file is missing from the repository. Looking forward to your reply, thanks!

model/CPC.py

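Line 599 of pretrain.py does not match the forward function in model/CPC.py: neither the parameters at line 40 nor the return values at line 98 correspond to the call.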

embedding updated in MM_EMA

In Cross_VQEmbeddingEMA in main_model_2.py, self.embedding is updated three times [self.embedding = self.ema_weight / self.ema_count.unsqueeze(-1)], but does only the last assignment take effect?
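
A toy sketch may help frame the question. It assumes that the running statistics (ema_count and ema_weight) are updated before each of the three assignments, which the issue does not show. `ema_update` and the scalar, single-code setup are hypothetical simplifications, and any smoothing of the counts is left out.

```python
def ema_update(ema_count, ema_weight, batch_count, batch_sum, decay=0.99):
    """One EMA step for a single scalar code: returns updated statistics and the embedding."""
    ema_count = decay * ema_count + (1 - decay) * batch_count
    ema_weight = decay * ema_weight + (1 - decay) * batch_sum
    return ema_count, ema_weight, ema_weight / ema_count


# Three hypothetical per-modality batches: (number of assignments, sum of assigned features).
batches = [(4, 8.0), (2, 6.0), (3, 3.0)]

# Variant A: recompute the embedding after every update, as in the quoted assignment.
count, weight = 1.0, 1.0
history = []
for batch_count, batch_sum in batches:
    count, weight, embedding = ema_update(count, weight, batch_count, batch_sum)
    history.append(embedding)

# Variant B: update the statistics three times, compute the embedding once at the end.
c, w = 1.0, 1.0
for batch_count, batch_sum in batches:
    c, w, _ = ema_update(c, w, batch_count, batch_sum)
final_only = w / c
```

Under these assumptions the first two assignments are overwritten, yet the final embedding still reflects all three updates, because the statistics it is computed from accumulate. The intermediate assignments would only matter if something read self.embedding between them.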

encoders design question

Hello, your work has inspired me a lot! I have a question about the semantic encoders and the modal-specific encoders: what do you need to consider when designing them, and do more complex encoders improve the experimental results?

Looking forward to hearing from you!
