x-plug / mplug
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
Home Page: https://arxiv.org/abs/2205.12005
Hi authors,
Thanks for your great work. Is there any chance you could release the fine-tuned COCO/Flickr30k checkpoints for the image-retrieval task?
Thanks a lot.
The links to the pre-trained models are invalid. Could you provide new ones?
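Is the model published on ModelScope the SOTA model from the paper? Using the model and the caption generation method described in the model card, I measured BLEU-4 and CIDEr on the COCO val 5k split, and both scores are lower than those reported in the paper.
Is this caused by the model itself, by an incorrect evaluation procedure, or by a mismatch in the test set or evaluation tooling?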
The code in this repo shows that the baseline reward is calculated by averaging the rewards of the generated captions. However, the original version of SCST, as well as some other SCST implementations (e.g., in VALOR), calculates the baseline reward from the caption generated by greedy search. Is there any reference or explanation for the current implementation in this repo? I would really appreciate any help.
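To make the two baselines in this question concrete, here is a minimal sketch of how the advantage (reward minus baseline) differs between them. The function names and reward values are made up for illustration and are not taken from this repo or from VALOR; the rewards would normally be CIDEr scores of the sampled captions.

```python
from typing import List


def advantages_greedy_baseline(sample_rewards: List[float],
                               greedy_reward: float) -> List[float]:
    """Baseline is the reward of the greedy-decoded caption,
    as described above for the original SCST."""
    return [r - greedy_reward for r in sample_rewards]


def advantages_mean_baseline(sample_rewards: List[float]) -> List[float]:
    """Baseline is the average reward of the sampled captions,
    which is what the question says this repo does."""
    baseline = sum(sample_rewards) / len(sample_rewards)
    return [r - baseline for r in sample_rewards]


# Hypothetical rewards for three sampled captions of one image.
sample_rewards = [1.0, 0.5, 1.5]
greedy_reward = 0.75  # hypothetical reward of the greedy caption

print(advantages_greedy_baseline(sample_rewards, greedy_reward))
print(advantages_mean_baseline(sample_rewards))
```

With the mean baseline the advantages always sum to zero across the samples of an image, and no extra greedy decoding pass is needed; with the greedy baseline, all samples can receive a positive (or negative) advantage at once.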
Dear authors,
I fine-tuned mPLUG Base on VQAv2 but only got around 75% accuracy instead of the roughly 80% reported in the README.
Could you kindly upload the fine-tuned checkpoints for VQA? I am benchmarking your model and would prefer to benchmark the strongest possible version.
Best,
Hello, when will the pre-training source code be released? The README says "coming soon", and I would like to try it on my own data. Thanks.
And how can I find it? Thanks.
Hi. I don't understand how to use this pre-trained model for image captioning. Am I supposed to clone the GitHub repo and then somehow load the pre-trained model? It would be extremely helpful if you could provide a notebook showing how to do this.
Hi, thanks for your work!
I used your model through ModelScope, but I could not find which model size is the default there; I only know that the model is named mplug_visual-question-answering_coco_large_en.
I also found that there is a memory leak with the following code.
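    import os
    from PIL import Image
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    # model_id and image_paths are defined elsewhere (not shown in the report)
    image_list, image_name_list, result_list = [], [], []
    count = 0

    pipeline_vqa = pipeline(Tasks.visual_question_answering, model=model_id)
    for image_path in image_paths:
        count = count + 1
        # Decode the image and keep it, together with its file name
        raw_image = Image.open(image_path).convert('RGB')
        image_list.append(raw_image)
        image_name_list.append(os.path.basename(image_path))
        question = "what is the man doing in the picture?"
        input_vqa = {
            'image': image_path,
            'question': question,
        }
        text = pipeline_vqa(input_vqa)  # {'text': 'talking on phone'}
        text = text['text']
        result_list.append(text)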
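One possible contributor to the memory growth, offered as an assumption rather than a confirmed diagnosis: the loop appends every decoded PIL image to image_list, so memory rises with the number of images even if the pipeline itself does not leak. Since the pipeline is called with the image path, the decoded image may not be needed at all. The sketch below keeps only file names and answers; `answer_images` and `fake_pipeline` are hypothetical names, and `fake_pipeline` merely stands in for `pipeline_vqa` so the example runs without ModelScope.

```python
import os
from typing import Callable, Dict, Iterable, List, Tuple


def answer_images(image_paths: Iterable[str],
                  ask: Callable[[Dict[str, str]], Dict[str, str]],
                  question: str) -> List[Tuple[str, str]]:
    """Run a VQA callable over image paths, storing only small strings."""
    results = []
    for image_path in image_paths:
        # Pass the path through; no decoded image is retained here.
        output = ask({'image': image_path, 'question': question})
        results.append((os.path.basename(image_path), output['text']))
    return results


def fake_pipeline(inputs: Dict[str, str]) -> Dict[str, str]:
    """Stand-in for pipeline_vqa; always returns the same answer."""
    return {'text': 'talking on phone'}


print(answer_images(['/data/a.jpg', '/data/b.jpg'], fake_pipeline,
                    "what is the man doing in the picture?"))
```

If memory still grows with this pattern when the real `pipeline_vqa` is passed as `ask`, the growth is more likely to come from the pipeline or model side than from the calling loop.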