The same situation as #31377 occurred when using Qwen/Qwen2-VL-7B-Instruct about transformers HOT 10 OPEN

toondata commented on September 12, 2024 1

The same situation as #31377 occurred when using Qwen/Qwen2-VL-7B-Instruct

from transformers.

Comments (10)

amyer commented on September 12, 2024

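This is an automatic vacation reply from QQ Mail. Hello, I am currently on vacation and cannot reply to your email personally. I will reply as soon as possible after my vacation ends.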

LysandreJik commented on September 12, 2024

Thanks for the issue @toondata! You mention it's the same issue as #31377; does applying the same fix here solve the issue?

toondata commented on September 12, 2024

Sorry, my wording may have caused a misunderstanding. I encountered a problem similar to issue #31377. However, given the differences in implementation logic between idefics2 and Qwen/Qwen2-VL-7B-Instruct, I'm unsure whether these similar symptoms have the same cause. Even after downloading and building the latest mainline code, the issue remains unresolved.

zucchini-nlp commented on September 12, 2024

@toondata can you share the hash, please? I can't find it. I tried running your code on the latest main and got no errors, so can you double-check whether updating the version helps? Also, I see you have 'mps' commented out. We had several problems with size mismatches in LLaVA, so the error might be related to that. If you can try running it on 'cuda', maybe in a Colab notebook, that would help us see the source of the error.

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_path = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16
).to("cuda:0")

# Bounds on the resized image area, in pixels.
min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_path, min_pixels=min_pixels, max_pixels=max_pixels
)

# One user turn with an image placeholder followed by the text prompt.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract text from pdf"},
        ],
    }
]

image = Image.open(
    requests.get("https://www.ilankelman.org/stopsigns/australia.jpg", stream=True).raw
)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image]).to("cuda:0")

# Inference: generate, then drop the prompt tokens from each output sequence.
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

toondata commented on September 12, 2024

Here is my git hash: 96429e7.
I just updated to the latest version on the main branch and reinstalled transformers via pip install, but the result is still the same as before.
I don’t have a CUDA device at hand. My device is a Mac M3 Max, and I commented out “mps” in order to provide you with more accurate information. Could this issue be related to MPS?

toondata commented on September 12, 2024

I ran your code and image source with the device changed to MPS, and the issue remains the same, except the tensor that caused the issue has different dimensions.

File "/Users/dev/products/dev/workspaces/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
inputs_embeds[image_mask] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [1247, 3584] cannot be broadcast to indexing result of shape [0, 3584]
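
For readers following the traceback, the two shapes can be read separately: `[1247, 3584]` is the number of image embeddings produced by the vision tower (3584 being the embedding width), while `[0, 3584]` means `image_mask` selected zero positions in `inputs_embeds`. The sketch below is an illustration only, not code from this thread. It assumes Qwen2-VL's 14-pixel patches with 2x2 merging, so that each visual token covers a 28x28 pixel cell; the function name `expected_image_tokens` and the 58 x 86 patch grid are hypothetical, chosen because they reproduce the 1247 in the error message.

```python
def expected_image_tokens(grid_t: int, grid_h: int, grid_w: int, merge_size: int = 2) -> int:
    """Number of visual tokens for a (t, h, w) patch grid after spatial merging.

    Assumption: merge_size x merge_size neighbouring patches collapse into one token.
    """
    patches = grid_t * grid_h * grid_w
    block = merge_size * merge_size
    if patches % block != 0:
        raise ValueError("patch grid is not divisible by the merge block")
    return patches // block


# Hypothetical grid: 58 x 86 patches of 14 px, i.e. an 812 x 1204 pixel image.
tokens = expected_image_tokens(1, 58, 86)
print(tokens)  # 1247

# The processor bounds in the script (256*28*28 to 1280*28*28 pixels) correspond
# to between 256 and 1280 cells of 28x28 pixels, so 1247 falls inside that range.
print(256 <= tokens <= 1280)  # True
```

Under these assumptions the vision side looks plausible, and the zero on the other side suggests that no image placeholder token was matched in `input_ids`.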

zucchini-nlp commented on September 12, 2024

Maybe #30294 helps; it has a solution that worked for LLaVA with MPS.

toondata commented on September 12, 2024

After looking at #30294, I feel the issue might not be related. I switched my local code to run on the CPU, and the problem is the same as with MPS.

image = Image.open(requests.get("https://www.ilankelman.org/stopsigns/australia.jpg", stream=True).raw)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image]).to("cpu")

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
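
Since the failure shows up on both MPS and CPU, one device-independent check is to count the image placeholder tokens in the prompt before calling `generate`. The following is a minimal sketch using plain Python lists, not a fix confirmed in this thread. The helper names are hypothetical, and the token id `151655` is an assumption about the Qwen2-VL vocabulary; in a real run it would be safer to read the id from the model config and pass `inputs.input_ids[0].tolist()` as `input_ids`.

```python
from typing import Sequence

# Assumed placeholder id for illustration; read it from the model config in practice.
IMAGE_TOKEN_ID = 151655


def count_image_placeholders(input_ids: Sequence[int], image_token_id: int) -> int:
    """Count how many positions in the prompt hold the image placeholder token."""
    return sum(1 for tok in input_ids if tok == image_token_id)


def check_placeholders(
    input_ids: Sequence[int], image_token_id: int, num_image_features: int
) -> int:
    """Raise early if the prompt cannot receive all image embeddings."""
    found = count_image_placeholders(input_ids, image_token_id)
    if found != num_image_features:
        raise ValueError(
            f"prompt has {found} image placeholders, "
            f"but {num_image_features} image embeddings were produced"
        )
    return found


# Toy prompt: three placeholders surrounded by ordinary token ids.
toy_ids = [1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 2]
print(check_placeholders(toy_ids, IMAGE_TOKEN_ID, 3))  # 3
```

A count of zero here would match the `[0, 3584]` side of the reported error and would point at the prompt or processor output rather than at the device backend.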

zucchini-nlp commented on September 12, 2024

I could also get a Colab notebook working with the script, and per the linked issue, the error might also happen on CPU.

Let me see if I can get an MPS device to reproduce it; I will need some time to dig.

toondata commented on September 12, 2024

Thank you very much. I'm looking forward to the results of your digging.
