eve's People

Contributors

paranioar, wxinlong

eve's Issues

question about train

Hi,
Thanks for your excellent work. I noticed that the training code and scripts do not seem to be included in the files you released. Do you plan to release the training code? If so, when will it be made public?

Pre-Alignment Stage 1 Training JSON Format

Hi, thank you for your work. What JSON format should be used for Pre-Alignment Stage 1 training? It would be helpful if you could share a sample of the JSON.

Pre-align checkpoint

Thanks for your excellent work! 😍

Since the checkpoints for pretraining and SFT have been released, I was wondering if you could also release the pre-align model for broader use. It would be incredibly kind of you to make the pre-align checkpoint available 😃 .

Have a nice day!

about datasets release

Hi,
Thanks for your excellent work exploring the implementation of VLMs! Will the training datasets be released together with the code and model weights? That would be very helpful for the open-source VLM community. Many thanks~

What is the motivation behind Patch Aligning Layer (PAL)?

Thanks for your great work! However, I do not fully understand the function of PAL. According to the paper, PAL is connected to the output of the LLM and is forced to align with CLIP features. Why do the LLM's output features need to be aligned with CLIP features, and how does this help EVE?

Experimental setting of EVE w/ stage1 in Table 5

Hi, thanks for your great work!

I'm curious about the specific experimental setting of EVE w/ stage1 in Table 5, shown in the figure below.
image

Could you please provide the amount of training data used in each stage of this model? I appreciate your time and help in advance!

Question about loss function

Hello! Very cool project! In the paper, I saw that MSE loss is used between EVE and the image encoder. However, in the code, it looks like cosine similarity is used:

    def compute_mseloss(self, pred_feature, clip_feature):
        loss_func = nn.CosineSimilarity(dim=-1)
        return 1 - loss_func(pred_feature, clip_feature).mean()

Could you tell me the motivation of using cosine similarity vs MSE? Thanks!
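To make the difference between the two losses concrete, here is a minimal pure-Python sketch (not the repository's code; `cosine_loss` and `mse_loss` are hypothetical helper names). It only illustrates a general property: a cosine-based loss ignores the magnitude of the predicted feature, while MSE penalizes it. It does not claim to be the authors' reason for the choice.

```python
import math

def cosine_loss(pred, target):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_p * norm_t)

def mse_loss(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

target = [1.0, 2.0, 3.0]
scaled = [2.0 * t for t in target]  # same direction, twice the magnitude

print(cosine_loss(scaled, target))  # approximately 0: direction matches
print(mse_loss(scaled, target))     # clearly non-zero: magnitudes differ
```

In this toy case the prediction points in exactly the same direction as the target, so the cosine loss is essentially zero even though the MSE is not.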

Some questions for training (Parameters, Batchsize...)

Remarkable work! I have some questions about training that may not be detailed in the paper.
Were the LLM parameters fully updated in stage 2 (Generative Pre-training)? I'm also curious how the batch size can be set to 512 on 2*8 GPUs with 40GB memory. Was the training data generally short in length?
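As a rough aid for the batch-size question, the sketch below lists the per-GPU batch size and gradient-accumulation combinations that would yield a global batch of 512 on 16 GPUs. It assumes the usual relation global batch = GPUs x per-GPU batch x accumulation steps. Whether EVE actually uses gradient accumulation is not stated in this thread, and `accumulation_options` is a hypothetical helper.

```python
def accumulation_options(global_batch, num_gpus):
    """List (per_gpu_batch, grad_accum_steps) pairs that give global_batch."""
    per_step, remainder = divmod(global_batch, num_gpus)
    if remainder:
        raise ValueError("global_batch must be divisible by num_gpus")
    # Every divisor of per_step is a valid per-GPU batch size.
    return [(b, per_step // b) for b in range(1, per_step + 1) if per_step % b == 0]

print(accumulation_options(512, 16))
# [(1, 32), (2, 16), (4, 8), (8, 4), (16, 2), (32, 1)]
```

Under that assumption, a small per-GPU batch with more accumulation steps trades memory for wall-clock time.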

Fine-tuning Stage

Hi,
Thanks for your excellent work. I would like to know whether the weights used in the third training stage (Supervised Fine-tuning) are EVE_7B_Pretrain or some other checkpoint. The CKPT_PATH is set on line 31 of scripts/eve/eve7b_finetune.sh: export CKPT_PATH=checkpoints/EVE-7B-PRTR1-672-MSE

RuntimeError: shape '[33, 4096, 24, 20]' is invalid for input of size 62717952

Nice work! When I try to run the training code, I encounter the following error:

File "/ssddata/yuzhen/EVE/eve/model/language_model/eve_llama.py", line 96, in forward
    clip_loss = self.get_clip_loss()(_input_ids,
  File "/home/data/yuzhenh17/miniconda3/envs/eve_envs_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/data/yuzhenh17/miniconda3/envs/eve_envs_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/ssddata/yuzhen/EVE/eve/model/multimodal_encoder/vision_tokenizer.py", line 202, in forward
    i_features = i_features.reshape(L, D, H, W + 1)[:, :, :, :-1]
RuntimeError: shape '[33, 4096, 24, 20]' is invalid for input of size 62717952

This error is triggered by the reshape line shown in the traceback. I think the cause is the line that sets idx_end to idx_str + min(N, H * (W + 1) + 1). When N < H * (W + 1) + 1, i_features cannot be reshaped to (L, D, H, W + 1).
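The numbers in the traceback can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the four reshape dimensions map to (L, D, H, W + 1) = (33, 4096, 24, 20) as in the error message; `check_reshape` is a hypothetical helper, not part of the EVE code.

```python
def check_reshape(numel, L, D, H, W):
    """Return (tokens_per_image, tokens_needed) for reshape(L, D, H, W + 1)."""
    per_image, remainder = divmod(numel, L * D)
    if remainder:
        raise ValueError("numel is not a multiple of L * D")
    return per_image, H * (W + 1)

# Values from the traceback: shape [33, 4096, 24, 20], input size 62717952.
have, need = check_reshape(62717952, L=33, D=4096, H=24, W=19)
print(have, need)  # 464 480
```

Each of the 33 slices holds 464 feature positions, while the reshape needs 24 * 20 = 480, which is consistent with N being smaller than H * (W + 1) as described above.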
