runpeidong / dreamllm
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
Home Page: https://dreamllm.github.io/
License: Apache License 2.0
Hi, thanks for your great work. I have some questions from reading the paper.
The generation of conditional embeddings is shown in Equation (3). The learnable dream tokens, together with the interleaved document sequence so far x and the generated image so far V, are fed into a cross-attention model to compute the conditional embeddings. I have several detailed questions about this process:
Thanks!
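For readers following the question above, the mechanism as the poster describes it (a set of learnable query tokens attending over the preceding context to produce conditional embeddings) can be sketched roughly as below. This is only an illustration of generic single-head cross-attention with query vectors. The names `dream_queries`, `context`, and `cross_attend` are hypothetical, the vectors are random stand-ins, and nothing here is taken from the DreamLLM code or from Equation (3) itself.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Single-head attention: every query attends over all context vectors.

    Assumption for brevity: no learned key/value projections, so the
    context vectors serve as both keys and values.
    """
    dim = len(context[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        out = [sum(w * v[d] for w, v in zip(weights, context)) for d in range(dim)]
        outputs.append(out)
    return outputs


random.seed(0)
dim, num_queries, context_len = 8, 4, 10

# Hypothetical stand-ins: in a real model the queries would be trained
# parameters and the context would be text/image token embeddings.
dream_queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
context = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(context_len)]

# One conditional embedding per query, each a weighted mix of the context.
cond_embeddings = cross_attend(dream_queries, context)
```

Each output vector is a convex combination of the context vectors, so the number of conditional embeddings equals the number of queries regardless of context length.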
Hi, thanks for this new technique for extending MLLMs to interleaved documents.
I have a question about the visual encoder in Figure 2 and Section 3.
In Figure 2, as I understand it, the model learns to generate <dream> tokens during training itself, as in the cat example, for which dream queries are learned. These are essentially textual inversion embeddings that can be synergized with the remaining context of textual tokens. (The figure shows the inference stream, but the interleaved document shows text with a cat image, which confuses me a bit.)
When that happens, is the input image sent to the model again via CLIP encodings with an extra projection, as shown in the diagram? Or are only the dream queries used from that point on?
Also, for visual inputs, do you follow a pipeline of CLIP embeddings with a projection, similar to that of EMU?
Thank you
How can I try the project?
Hi Author,
Thanks for the excellent work. Looking forward to the code. Thanks