Hi! I attempt to implement the multi-modal ability of llama-adapter-v2 myself and I've

Questions about implementation of llama-adapter-v2's multi-modal ability and training about llama-adapter HOT 10 CLOSED

opengvlab commented on August 14, 2024 4

from llama-adapter.

Comments (10)

gaopengpjlab commented on August 14, 2024 2

Please check out demo page http://llama-adapter.opengvlab.com/

gaopengpjlab commented on August 14, 2024 2

Pretrained weights have been released.
https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/llama_adapter_v2_multimodal

gaopengpjlab commented on August 14, 2024 1

The demo and pretrained checkpoint of LLaMA-Adapter V2 will be released in a few days. Sorry for the long wait.

gaopengpjlab commented on August 14, 2024 1

If there are no vision_tokens, can the adapter_prompt in the first layer still be used to compute adaption_output and be added to the original attention_output?

When there are no vision tokens (for the GPT4LLM / Alpaca datasets), we generate a pseudo image whose pixels are all zero.
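For readers following along, the zero-image idea can be sketched roughly as below. This is only an illustration, not the released training code: the function names `image_or_placeholder` and `collate`, the use of NumPy, and the 3x224x224 image shape are all assumptions made for the example.

```python
import numpy as np


def image_or_placeholder(image, shape=(3, 224, 224)):
    """Return the image, or an all-zero pseudo image when the sample has none.

    The default shape is an assumption for illustration only.
    """
    if image is None:
        return np.zeros(shape, dtype=np.float32)
    return np.asarray(image, dtype=np.float32)


def collate(samples):
    """Collate (text, image-or-None) pairs into a list of texts and one image array."""
    texts = [text for text, _ in samples]
    images = np.stack([image_or_placeholder(img) for _, img in samples])
    return texts, images
```

With a placeholder like this, text-only samples go through the same visual path as image-caption samples, so the model code needs no separate branch for missing images.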

PanQiWei commented on August 14, 2024

I just published my attempted implementation of multi-modal llama-adapter-v2 here. It is for learning purposes only; if anything is implemented incorrectly, I would really appreciate it if someone pointed it out.

theAdamColton commented on August 14, 2024

I'm also looking forward to seeing the training code. There are some ambiguities in the paper; how the V2 models are trained is not immediately obvious from it.

As for 3.), I would guess that they probably use batches that mix instruction-following and image-caption items.

PanQiWei commented on August 14, 2024

I didn't mix instruction-following data and image-caption data in one batch, because the weight updates are disjoint (at least I think so).
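A minimal sketch of what unmixed batching could look like, assuming each batch is drawn from a single dataset and each kind of batch updates its own parameter group. The group names in `TRAINABLE` are hypothetical placeholders, not identifiers from the repository, and this is not a confirmed description of the authors' training setup.

```python
import random

# Hypothetical parameter-group names, chosen only to show that the sets are disjoint.
TRAINABLE = {
    "caption": {"visual_projection", "early_adapter_gates"},
    "instruction": {"late_adapter_prompts", "norm_bias_scale"},
}


def unmixed_batches(caption_data, instruction_data, batch_size, seed=0):
    """Return a shuffled list of (kind, batch) pairs; no batch mixes the two datasets."""
    rng = random.Random(seed)
    batches = []
    for kind, data in (("caption", caption_data), ("instruction", instruction_data)):
        items = list(data)
        rng.shuffle(items)
        # Slice each dataset separately so a batch never spans both kinds.
        for start in range(0, len(items), batch_size):
            batches.append((kind, items[start:start + batch_size]))
    rng.shuffle(batches)  # interleave the two kinds at the batch level
    return batches


def trainable_groups(kind):
    """Return the parameter groups that a batch of this kind would update."""
    return TRAINABLE[kind]
```

In a training loop, one would look up `trainable_groups(kind)` for each batch and enable gradients only for those groups, so a caption batch and an instruction batch never touch the same weights.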

PanQiWei commented on August 14, 2024

Hi! First of all, thank you very much for publishing this great work! I just read the code, and I think the model framework is very similar to X-LLM. Do you think the structure used in your work and theirs will become the standard way to build unified multi-modal LLMs?

gaopengpjlab commented on August 14, 2024

https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/imagebind_LLM

The pretraining, fine-tuning, and inference code has been released. We support image, video, text, audio, and point cloud input, and bilingual (Chinese/English) responses.

Sorry for the long wait. We hope you enjoy our code.

PanQiWei commented on August 14, 2024

This is awesome! 🔥 🔥 Thank you so much! ❤️
