
Comments (10)

amyer avatar amyer commented on September 12, 2024

from transformers.

LysandreJik avatar LysandreJik commented on September 12, 2024

Thanks for the issue @toondata! You mention it's the same issue as #31377; does applying the same fix here solve the issue?


toondata avatar toondata commented on September 12, 2024

Sorry, my wording may have caused a misunderstanding. I encountered a problem similar to issue #31377. However, given the differences in implementation logic between idefics2 and Qwen/Qwen2-VL-7B-Instruct, I'm unsure whether these similar symptoms have the same cause. Even after downloading and building the latest code from main, the issue remains unresolved.


zucchini-nlp avatar zucchini-nlp commented on September 12, 2024

@toondata can you share the hash, please? I can't find it. I ran your code on the latest main and got no errors, so can you double-check whether updating your version helps? Also, I see you have 'mps' commented out. We had several problems with size mismatches in LLaVA, so the error might be related to that. If you can try running it on 'cuda', maybe in a Colab notebook, that would help pin down the source of the error.

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_path = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16
).to("cuda:0")

# Bounds on the number of pixels the processor resizes each image to
min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_path, min_pixels=min_pixels, max_pixels=max_pixels
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract text from pdf"},
        ],
    }
]

image = Image.open(
    requests.get("https://www.ilankelman.org/stopsigns/australia.jpg", stream=True).raw
)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image]).to("cuda:0")

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)


toondata avatar toondata commented on September 12, 2024

Here is my git hash: 96429e7.
I just updated to the latest version on the main branch and reinstalled transformers via pip install, but the result is still the same as before.
I don’t have a CUDA device at hand. My device is a Mac M3 Max, and I commented out “mps” in order to provide you with more accurate information. Could this issue be related to MPS?


toondata avatar toondata commented on September 12, 2024

I ran your code and image source with the device changed to MPS, and the issue remains the same, except the tensor that caused the issue has different dimensions.

File "/Users/dev/products/dev/workspaces/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
inputs_embeds[image_mask] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [1247, 3584] cannot be broadcast to indexing result of shape [0, 3584]


zucchini-nlp avatar zucchini-nlp commented on September 12, 2024

Maybe #30294 helps; it has a solution that worked for LLaVA with MPS.


toondata avatar toondata commented on September 12, 2024

After looking at #30294, I feel the issue might not be related. I switched my local code to run on the CPU, and the problem is the same as with MPS.

image = Image.open(requests.get("https://www.ilankelman.org/stopsigns/australia.jpg", stream=True).raw)
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image]).to("cpu")

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)

File "/Users/dev/products/dev/workspaces/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
inputs_embeds[image_mask] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [1247, 3584] cannot be broadcast to indexing result of shape [0, 3584]
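
An indexing result of shape [0, 3584] means that image_mask selected no positions in inputs_embeds, while the vision tower produced 1247 image embeddings. One way to narrow this down is to compare the number of image placeholder tokens in input_ids with the number implied by the image grid. The following is only a minimal sketch in plain Python, not the library's own check. It assumes that 151655 is the id of "<|image_pad|>" (confirm against model.config.image_token_id) and that the spatial merge size is 2. The helper names and the 1x58x86 grid are hypothetical; the grid was chosen only because it yields 1247, and the thread does not report the real one.

```python
from math import prod

# Assumption: 151655 is the id of "<|image_pad|>" for Qwen2-VL.
# Confirm against model.config.image_token_id before relying on it.
IMAGE_TOKEN_ID = 151655


def expected_image_tokens(grid_thw, merge_size=2):
    """Placeholder tokens expected for one image.

    grid_thw is (t, h, w) counted in patches. merge_size=2 is an
    assumption: merge_size**2 patches are merged into one token.
    """
    return prod(grid_thw) // (merge_size ** 2)


def count_image_tokens(input_ids, image_token_id=IMAGE_TOKEN_ID):
    """Count how many ids in the sequence are the image placeholder."""
    return sum(1 for tok in input_ids if tok == image_token_id)


def check_placeholders(input_ids, grids, image_token_id=IMAGE_TOKEN_ID):
    """Return (found, expected, matches) for one tokenized prompt."""
    found = count_image_tokens(input_ids, image_token_id)
    expected = sum(expected_image_tokens(g) for g in grids)
    return found, expected, found == expected


# Hypothetical inputs: a prompt with no placeholder ids and one image
# whose grid is 1 x 58 x 86 patches (58 * 86 // 4 == 1247).
print(check_placeholders([101, 102, 103], [(1, 58, 86)]))
# prints (0, 1247, False)
```

With real inputs, the same check could be run as check_placeholders(inputs.input_ids[0].tolist(), inputs.image_grid_thw.tolist(), model.config.image_token_id). A result of 0 found would point at the prompt text or tokenizer rather than at the device.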


zucchini-nlp avatar zucchini-nlp commented on September 12, 2024

I could also get a Colab notebook working with the script, and per the linked issue, the error might also happen on CPU.

Let me see if I can get an MPS device to reproduce it; I will need some time to dig.


toondata avatar toondata commented on September 12, 2024

Thank you very much, looking forward to the results of your digging.
