Comments (10)
Thanks for the issue @toondata! You mention it's the same issue as #31377; does applying the same fix here solve the issue?
Sorry, my wording may have been misleading. I encountered a problem similar to issue #31377. However, idefics2 and Qwen/Qwen2-VL-7B-Instruct are implemented differently, so I'm unsure whether these similar symptoms have the same cause. I downloaded and compiled the latest main branch, and the issue is still not resolved.
@toondata can you share the hash, please? I can't find it. I tried running your code on the latest main and got no errors; can you double-check whether updating the version helps? Also, I see you have 'mps' commented out. We had several problems with size mismatches in LLaVA, so the error might be related to that. If you can try running it on 'cuda', maybe in a Colab notebook, that would help pinpoint the source of the error.
```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_path = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16
).to("cuda:0")

min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_path, min_pixels=min_pixels, max_pixels=max_pixels
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract text from pdf"},
        ],
    }
]

image = Image.open(
    requests.get("https://www.ilankelman.org/stopsigns/australia.jpg", stream=True).raw
)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image]).to("cuda:0")

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
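The `min_pixels`/`max_pixels` values in the script bound how many visual tokens each image becomes. Qwen2-VL's vision encoder works on 14x14 patches and merges each 2x2 group into one token, so one token covers a 28x28 pixel area, which gives 256 to 1280 tokens per image here. The sketch below approximates the processor's resize step. The rounding is modeled on Qwen2-VL's resize logic but is an assumption and may differ in detail from the library code. The helper names `approx_resize` and `approx_visual_tokens` are hypothetical.

```python
import math

PATCH = 14               # vision-encoder patch size in pixels
MERGE = 2                # 2x2 patches are merged into one visual token
FACTOR = PATCH * MERGE   # 28: side length (px) covered by one visual token


def approx_resize(height, width, min_pixels, max_pixels):
    """Round (height, width) to multiples of 28 while keeping the area within
    [min_pixels, max_pixels]. An approximation of Qwen2-VL's resize step,
    not the library implementation."""
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    if h * w > max_pixels:
        # Shrink both sides by the same ratio, rounding down to stay under the cap.
        beta = math.sqrt(height * width / max_pixels)
        h = max(FACTOR, math.floor(height / beta / FACTOR) * FACTOR)
        w = max(FACTOR, math.floor(width / beta / FACTOR) * FACTOR)
    elif h * w < min_pixels:
        # Grow both sides by the same ratio, rounding up to reach the floor.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / FACTOR) * FACTOR
        w = math.ceil(width * beta / FACTOR) * FACTOR
    return h, w


def approx_visual_tokens(height, width,
                         min_pixels=256 * 28 * 28, max_pixels=1280 * 28 * 28):
    """Approximate number of visual tokens one image turns into."""
    h, w = approx_resize(height, width, min_pixels, max_pixels)
    return (h // FACTOR) * (w // FACTOR)


print(approx_visual_tokens(812, 1204))   # 29 * 43 = 1247
print(approx_visual_tokens(3000, 4000))  # large image, capped by max_pixels
```

The 812x1204 size is only an illustrative input, not the actual size of the stop-sign image. It does show that a count such as the 1247 image embeddings in the tracebacks later in this thread falls inside the 256 to 1280 range these settings allow.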
Here is my git hash: 96429e7.
I just updated to the latest main branch and reinstalled transformers via pip install, but the result is unchanged.
I don't have a CUDA device at hand. My machine is a Mac M3 Max, and I commented out "mps" to give you more accurate information. Could this issue be related to MPS?
I ran your code with your image URL and the device changed to MPS. The error is the same, except that the offending tensor has different dimensions:
```
File "/Users/dev/products/dev/workspaces/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
    inputs_embeds[image_mask] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [1247, 3584] cannot be broadcast to indexing result of shape [0, 3584]
```
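The failing line is a masked assignment. Presumably `image_mask` marks the image placeholder tokens in `input_ids`, and the assignment needs exactly one selected position per image embedding. Here the vision encoder produced 1247 embeddings, but the mask selected 0 positions, which suggests that the prompt reaching the model contained no image placeholder tokens. The pure-Python sketch below performs that consistency check. It assumes that the processor output has `input_ids` and `image_grid_thw`, and that each image yields t*h*w / 2**2 embeddings because of the 2x2 merge. The function `check_image_tokens` is hypothetical, and the token id 151655 is an assumed placeholder value. Read the real one from `model.config.image_token_id`.

```python
def check_image_tokens(input_ids, image_grid_thw, image_token_id, merge_size=2):
    """Compare image placeholder tokens in the prompt with the number of
    image embeddings the vision encoder is expected to produce.

    input_ids: list of token-id lists, one per sequence
    image_grid_thw: list of (t, h, w) patch grids, one per image
    Returns (num_placeholder_tokens, num_expected_embeddings).
    """
    # Positions the model would overwrite with image embeddings.
    n_tokens = sum(tok == image_token_id for seq in input_ids for tok in seq)
    # Each merge_size x merge_size block of patches becomes one embedding.
    n_embeds = sum(t * h * w // merge_size**2 for t, h, w in image_grid_thw)
    return n_tokens, n_embeds


IMAGE_PAD_ID = 151655  # assumed id; use model.config.image_token_id in practice

# Prompt with no image placeholders: the mismatch behind "[0, 3584]".
print(check_image_tokens([[1, 2, 3]], [(1, 58, 86)], IMAGE_PAD_ID))  # (0, 1247)
# Prompt with one placeholder per expected embedding: shapes line up.
print(check_image_tokens([[1] + [IMAGE_PAD_ID] * 1247 + [2]], [(1, 58, 86)], IMAGE_PAD_ID))  # (1247, 1247)
```

With real tensors, the same two numbers could be computed from `inputs["input_ids"]` and `inputs["image_grid_thw"]` before calling `generate`. If they already disagree at that point, the problem lies in preprocessing (chat template or processor) rather than in the model's forward pass on a particular device.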
Maybe #30294 helps; it has a solution that worked for LLaVA on MPS.
After looking at #30294, I don't think it is related. I switched my local code to run on the CPU, and the problem is the same as with MPS:
```python
# model, processor and messages set up as in the earlier script, moved to "cpu"
image = Image.open(
    requests.get("https://www.ilankelman.org/stopsigns/australia.jpg", stream=True).raw
)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image]).to("cpu")
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
```

```
File "/Users/dev/products/dev/workspaces/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
    inputs_embeds[image_mask] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [1247, 3584] cannot be broadcast to indexing result of shape [0, 3584]
```
I also got the script working in a Colab notebook, and, as per the linked issue, the error might also happen on CPU.
Let me see if I can get an MPS device to reproduce it; I'll need some time to dig into it.
Thank you very much; looking forward to the results of your digging.