Comments (3)
Right now I don't have these planned for the near future, but in principle it should be quite easy to swap in stronger LLMs, such as some of the instruction- or dialogue-fine-tuned ones. From conversations with others about the work, I think some people may already be working on this, so hopefully we will see something public soon!
from gill.
Hi, thanks for your interest! I think this level of output is somewhat expected. Your first image is exactly the same as the one from our notebook, so the model should be working correctly.
Note that the image generator being used here is Stable Diffusion v1.5 (which is worse than SDXL and models like DeepFloyd), so the image realism/quality (not the semantic match, as explained in our paper) is somewhat upper-bounded by that model. Hope that helps!
Hi @kohjingyu, thanks for your quick reply. Will the code support stronger diffusion models and LLMs in the future?