cmg's Introduction

  • 👋 Hi, I’m @haihuangcode
  • 👀 I’m interested in Multimodal AI

cmg's Issues

About lld_loss and mi_loss during training

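Hello, I am trying to follow your work and transfer it to another domain, but I have run into the following problems during training:

  1. lld_loss does not converge, so the mutual information upper bound is estimated inaccurately, which affects training.
  2. After enabling mi_loss, NaN values appear in the model parameters.
  3. mi_loss keeps growing as training proceeds.

I have tried adjusting the number of layers in mi_net, the learning rate, and so on, but the problems remain.

I would like to ask for more details about how the model is trained:

  1. During your training, does lld_loss converge gradually, or does it stay within a stable range?
  2. Are the parameters of mi_net updated during the backward pass of mi_loss?
  3. What does the mi_loss curve roughly look like during training, and does it converge?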
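
For readers unfamiliar with the two losses, here is a minimal pure-Python sketch of what they might compute, assuming they follow a CLUB-style variational upper bound with a diagonal Gaussian q(y|x). The issue does not show the estimator, so the function names `lld_loss` and `mi_upper_bound`, the Gaussian form, and the toy data are illustrative assumptions, not the repository's code.

```python
import math

def lld_loss(mu, logvar, y):
    """Negative Gaussian log-likelihood of y under q(y|x), averaged over samples.

    Assumption: q(y|x) is a diagonal Gaussian with mean mu and log-variance logvar.
    The constant 0.5 * log(2 * pi) term is omitted.
    """
    total = 0.0
    for m_row, lv_row, y_row in zip(mu, logvar, y):
        for m, lv, t in zip(m_row, lv_row, y_row):
            total += -0.5 * ((m - t) ** 2 / math.exp(lv) + lv)
    return -total / len(y)

def mi_upper_bound(mu, logvar, y):
    """CLUB-style bound: log q of paired samples minus mean log q over all pairings."""
    n = len(y)
    total = 0.0
    for i in range(n):
        for d in range(len(y[i])):
            var = math.exp(logvar[i][d])
            # Matched pair (x_i, y_i).
            positive = -(mu[i][d] - y[i][d]) ** 2 / (2.0 * var)
            # x_i paired with every y_j in the batch, then averaged.
            negative = -sum((mu[i][d] - y[j][d]) ** 2 for j in range(n)) / (n * 2.0 * var)
            total += positive - negative
    return total / n

# Toy batch: three one-dimensional targets, unit variance (logvar = 0).
y = [[0.0], [1.0], [2.0]]
logvar = [[0.0], [0.0], [0.0]]
mu_informative = [[0.0], [1.0], [2.0]]    # the predictor tracks y
mu_uninformative = [[1.0], [1.0], [1.0]]  # the predictor ignores x

bound_informative = mi_upper_bound(mu_informative, logvar, y)
bound_uninformative = mi_upper_bound(mu_uninformative, logvar, y)
```

In this toy setting the bound is positive when the predicted mean tracks y and vanishes when the mean is constant. Under the same assumption, the bound is only meaningful if q(y|x) fits the data well, which is why a non-converging lld_loss would distort mi_loss.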

self.audio_semantic_decoder and self.Audio_decoder

https://github.com/haihuangcode/CMG/blob/2cbdad8f68d6000657ddf45ace97c855c022334d/code/src/model/main_model_2.py#L507C1-L515C60

Hi! Thanks for your great work. I have a few questions. First, is my understanding correct that self.audio_semantic_decoder is used for classification and self.Audio_decoder is used for feature reconstruction?
Second, does this work use a transformer model? I ask because I noticed a UniEncoder.py file.

Looking forward to hearing from you!

code/src/model/main_model_2.py, line 790

Hello, while reading your code I think I found a problem.
It is on line 790 of code/src/model/main_model_2.py:

        for i in unactivated_indices:
            self.embedding[i] = activated_quantized[random.randint(0,len(activated_indices)-1)] + torch.Tensor(256).uniform_(-1/1024, -1/1024).cuda()

I think the range here should be (-1/1024, 1/1024) rather than (-1/1024, -1/1024).
The same problem also appears on lines 977 and 1152.
I hope this helps your work :D
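
The effect of the two ranges can be illustrated with a small pure-Python sketch. It uses the standard library's `random.uniform` as a stand-in for the `uniform_` call quoted above, on the assumption that the PyTorch call behaves analogously when both endpoints are equal. The helper name `reinit_dead_code` and the 4-dimensional toy vector are hypothetical.

```python
import random

def reinit_dead_code(active_vector, low, high, rng):
    """Copy an active code vector and add per-dimension uniform noise drawn from [low, high]."""
    return [v + rng.uniform(low, high) for v in active_vector]

rng = random.Random(0)   # fixed seed so the sketch is reproducible
active = [0.5] * 4       # toy stand-in for one activated codebook vector
eps = 1 / 1024

# Range from the quoted line: both endpoints are -eps, so the "noise" is a constant shift.
degenerate = reinit_dead_code(active, -eps, -eps, rng)

# Range proposed in the issue: symmetric around zero, so each dimension is jittered differently.
symmetric = reinit_dead_code(active, -eps, eps, rng)
```

With equal endpoints every dimension is shifted by exactly -1/1024, so all re-initialised codes copied from the same active vector would be identical. The symmetric range keeps them close to the source vector but distinct.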

Train on my own dataset

If I'd like to use CMG on my own dataset (video and audio), how should I prepare the data? I have video-audio pairs. Do I need to extract features from them first? If so, which feature extraction models should I use to match what CMG expects?

model/CPC.py

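The call on line 599 of pretrain.py does not match the forward function in model/CPC.py: the arguments passed do not correspond to the parameters on line 40, and the values received do not correspond to the return values on line 98.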

embedding updated in MM_EMA

In Cross_VQEmbeddingEMA in main_model_2.py, self.embedding is updated three times (self.embedding = self.ema_weight / self.ema_count.unsqueeze(-1)). Does only the last assignment take effect?
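
The general pattern behind this question can be sketched in pure Python. This is not the repository's code: the one-dimensional codebook, the decay value, the batches, and the name `run_updates` are all hypothetical. The sketch only assumes that each assignment is preceded by an exponential moving average (EMA) update of the count and weight buffers.

```python
def run_updates(batches, decay=0.9):
    """Apply one EMA update per batch and reassign the embedding after each one.

    Each batch is (counts, sums): how many inputs hit each code, and the sum of those inputs.
    Returns the final embedding and the list of every intermediate assignment.
    """
    ema_count = [1.0, 1.0]    # toy initial buffers for a 2-entry codebook
    ema_weight = [0.0, 2.0]
    embedding = None
    history = []
    for counts, sums in batches:
        ema_count = [decay * c + (1 - decay) * n for c, n in zip(ema_count, counts)]
        ema_weight = [decay * w + (1 - decay) * s for w, s in zip(ema_weight, sums)]
        # Same form as the assignment quoted in the issue: weight divided by count.
        embedding = [w / c for w, c in zip(ema_weight, ema_count)]
        history.append(embedding)
    return embedding, history

batches = [
    ([2, 0], [4.0, 0.0]),
    ([0, 3], [0.0, 9.0]),
    ([1, 1], [1.0, 1.0]),
]

final_embedding, history = run_updates(batches)
last_only_embedding, _ = run_updates(batches[-1:])
```

In this sketch the variable does end up holding only the value from the last assignment, but that value differs from what a single final update would produce, because the EMA buffers carry the statistics of the earlier updates. Whether the intermediate assignments are read anywhere in the actual code is something only the source can confirm.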

encoders design question

Hello, your work has inspired me a lot! I have a question about the semantic encoders and the modal-specific encoders: what do you need to consider when designing them, and do more complex encoders improve the experimental results?

Looking forward to hearing from you!
