baaivision / eve

EVE: Encoder-Free Vision-Language Models

License: MIT License

Python 93.38% Shell 6.62%

clip encoder-free-vlm instruction-following large-language-models llm mllm multimodal-large-language-models vision-language-models vlm


eve's Issues

question about train

Hi,
Thanks for your excellent work. I noticed that the training code and scripts do not seem to be included in the released files. Do you plan to release them? If so, when will they be made public?

Pre-Alignment Stage 1 Training JSON Format

Hi, thank you for your work. What JSON format should be used for Pre-Alignment Stage 1 training? A sample of the JSON would be helpful.

Pre-align checkpoint

Thanks for your excellent work! 😍

Since the checkpoints for pretraining and SFT have been released, I was wondering if you could also release the pre-align model for broader use. It would be incredibly kind of you to make the pre-align checkpoint available 😃 .

Have a nice day!

about datasets release

Hi,
Thanks for your excellent work exploring the implementation of VLMs! Will the training datasets be released together with the code and model weights? That would be very helpful for the open-source VLM community. Many thanks~

Nan llm loss when finetuning EVE-7B-pretrain on llava_v1_5_mix665k

I am trying to finetune EVE-7B-pretrain on llava_v1_5_mix665k using the script here. The output shows that some llm_loss values are NaN, which makes all_loss NaN as well. There are also many "tokenization mismatch" warnings. Is this normal? Thank you for your reply.
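One possible link between the two symptoms, offered only as a sketch: if the training code follows the LLaVA-style convention of masking every label of a sample when a tokenization mismatch is detected (an assumption, not confirmed in this issue), then a fully masked sample leaves no supervised tokens, and a mean-reduced loss over zero tokens is 0/0, which tensor libraries typically report as NaN. The names IGNORE_INDEX, masked_mean_loss and safe_masked_mean_loss below are hypothetical and are not taken from the EVE code.

```python
import math

# Label value skipped by the loss (LLaVA-style convention, assumed here).
IGNORE_INDEX = -100


def masked_mean_loss(token_losses, labels):
    """Average per-token losses over positions whose label is not IGNORE_INDEX."""
    kept = [loss for loss, label in zip(token_losses, labels) if label != IGNORE_INDEX]
    if not kept:
        # A mean over zero supervised tokens is 0/0; modelled here as NaN.
        return float("nan")
    return sum(kept) / len(kept)


def safe_masked_mean_loss(token_losses, labels):
    """Same as above, but a fully masked sample contributes 0.0 instead of NaN."""
    loss = masked_mean_loss(token_losses, labels)
    return 0.0 if math.isnan(loss) else loss


# One sample with two supervised tokens, one sample with every label masked.
normal = masked_mean_loss([0.5, 1.5, 2.0], [IGNORE_INDEX, 7, 9])
fully_masked = masked_mean_loss([0.5, 1.5, 2.0], [IGNORE_INDEX] * 3)
guarded = safe_masked_mean_loss([0.5, 1.5, 2.0], [IGNORE_INDEX] * 3)
```

If this is the cause, counting how many samples end up with no supervised tokens would confirm it before changing anything in the loss.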

OSError: Unable to load weights from pytorch checkpoint file for '../models--BAAI--EVE-7B-v1.0/snapshots/7ee69e818a52f2db723f0abd20e23bae6e2bfd63/pytorch_model-00002-of-00002.bin' at '.../models--BAAI--EVE-7B-v1.0/snapshots/7ee69e818a52f2db723f0abd20e23bae6e2bfd63/pytorch_model-00002-of-00002.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

I can't load the weights of BAAI/EVE-7B-v1.0. Can you help me?

What is the motivation behind Patch Aligning Layer (PAL)?

Thanks for your great work! However, I do not fully understand the functions of PAL. According to the paper, PAL is connected to the output of LLM and is forced to align with CLIP features. Why do the output features of LLM need to be aligned with CLIP features and how does it help EVE?

Experimental setting of EVE w/ stage1 in Table 5

Hi, thanks for your great work!

I'm curious about the specific experimental setting of EVE w/ stage1 in Table 5, shown in the figure below.

Would you please provide the amount of training data in each stage of this model? Appreciate your time and help in advance!

Question about loss function

Hello! Very cool project! In the paper, I saw that MSE loss is used between EVE and the image encoder. However, in the code, it looks like cosine similarity is used:

EVE/eve/model/multimodal_encoder/vision_tokenizer.py, lines 176 to 178 in b34b2b4:

    def compute_mseloss(self, pred_feature, clip_feature):
        loss_func = nn.CosineSimilarity(dim=-1)
        return 1 - loss_func(pred_feature, clip_feature).mean()

Could you tell me the motivation for using cosine similarity rather than MSE? Thanks!
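For readers comparing the two objectives, here is a small standard-library sketch of one well-known difference: cosine distance ignores the magnitude of the feature vectors, while MSE penalizes it. It does not answer why the authors chose cosine similarity. The helpers mse and cosine_distance and the example vectors are illustrative and not part of the EVE code.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cosine_distance(pred, target):
    """1 - cosine similarity, the quantity returned by the quoted compute_mseloss
    for a single pair of feature vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_p * norm_t)


target = [1.0, 2.0, 2.0]
scaled = [2.0, 4.0, 4.0]  # same direction as target, twice the magnitude

mse_value = mse(scaled, target)                 # non-zero: magnitudes differ
cos_value = cosine_distance(scaled, target)     # zero: directions match
```

In this example the prediction points in exactly the target direction, so the cosine-based loss is already at its minimum while the MSE is not.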

Some questions for training (Parameters, Batchsize...)

Remarkable work! I have some questions about training details that the paper may not cover.
Were the LLM parameters fully updated in stage 2 (Generative Pre-training)? I'm also curious how the batch size can be set to 512 on 2*8 GPUs with 40GB memory. Were the training samples generally short?
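The issue does not say how the batch size of 512 was reached. As background only, a global batch size is commonly the product of GPU count, per-GPU micro-batch size, and gradient accumulation steps. The sketch below works through that arithmetic; the per-GPU batch size of 4 is a hypothetical value chosen for illustration, not a setting reported by the authors.

```python
def grad_accum_steps(global_batch, num_gpus, per_gpu_batch):
    """Return the accumulation steps needed so that
    num_gpus * per_gpu_batch * steps == global_batch."""
    per_step = num_gpus * per_gpu_batch
    if global_batch % per_step != 0:
        raise ValueError("global batch is not divisible by num_gpus * per_gpu_batch")
    return global_batch // per_step


num_gpus = 2 * 8        # from the issue: 2 nodes with 8 GPUs each
global_batch = 512      # from the issue
per_gpu_batch = 4       # hypothetical micro-batch size per GPU

steps = grad_accum_steps(global_batch, num_gpus, per_gpu_batch)
```

With these assumed values each optimizer update would accumulate gradients over 8 forward/backward passes, so per-GPU memory only has to hold 4 samples at a time.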

Fine-tuning Stage

Hi,
Thanks for your excellent work. Which weight file is used in the third training stage, the Supervised Fine-tuning Stage: EVE_7B_Pretrain or some other checkpoint? The CKPT_PATH on line 31 of scripts/eve/eve7b_finetune.sh is set with export CKPT_PATH=checkpoints/EVE-7B-PRTR1-672-MSE.

RuntimeError: shape '[33, 4096, 24, 20]' is invalid for input of size 62717952

Nice work! When I try to run the training code, I encounter the following error:

File "/ssddata/yuzhen/EVE/eve/model/language_model/eve_llama.py", line 96, in forward
    clip_loss = self.get_clip_loss()(_input_ids,
  File "/home/data/yuzhenh17/miniconda3/envs/eve_envs_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/data/yuzhenh17/miniconda3/envs/eve_envs_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/ssddata/yuzhen/EVE/eve/model/multimodal_encoder/vision_tokenizer.py", line 202, in forward
    i_features = i_features.reshape(L, D, H, W + 1)[:, :, :, :-1]
RuntimeError: shape '[33, 4096, 24, 20]' is invalid for input of size 62717952

This error is triggered by this line of code. I think the cause is this line, which sets idx_end to idx_str + min(N, H * (W + 1) + 1). When N < H * (W + 1) + 1, i_features cannot be reshaped to (L, D, H, W + 1).
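The numbers in the error message are consistent with this explanation. The sketch below checks the arithmetic, assuming the first two dimensions of the target shape are L = 33 and D = 4096 as in the reshape call from the traceback. The variable names are illustrative.

```python
# Target shape from the error message: (L, D, H, W + 1)
L, D, H, W_plus_1 = 33, 4096, 24, 20
input_size = 62717952  # element count reported in the RuntimeError

# Elements the reshape needs versus what the tensor actually holds.
required_size = L * D * H * W_plus_1

# Tokens available per (L, D) slot versus tokens the grid needs.
tokens_required = H * W_plus_1
tokens_available, remainder = divmod(input_size, L * D)
missing_tokens = tokens_required - tokens_available


def can_reshape(n_elements, shape):
    """True if a tensor with n_elements can be viewed with the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total == n_elements
```

The input divides evenly into 464 tokens per slot, while a 24 x 20 grid needs 480, so each slot is 16 tokens short. That matches the case where N is smaller than the number of tokens the slice is expected to cover.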
