Comments (5)
Sometimes it works with landscape images of certain sizes, and sometimes it crashes. Do image sizes have to be multiples of 336?
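For context, LLaVA-NeXT does not require image sizes to be multiples of 336. The image processor snaps each image onto the best-fitting entry in the model's `image_grid_pinpoints` (all of which are multiples of the 336 px vision-tower input) and resizes/pads it into that grid, so arbitrary sizes are accepted in principle. Below is a minimal sketch of that selection step, paraphrased from the transformers LLaVA-NeXT image-processing logic rather than the exact library code:

```python
# Paraphrased select_best_resolution: choose the grid pinpoint that keeps
# the most effective pixels while wasting the least padding area.
def select_best_resolution(original_size, possible_resolutions):
    original_height, original_width = original_size
    best_fit, max_effective, min_wasted = None, 0, float("inf")
    for height, width in possible_resolutions:
        scale = min(width / original_width, height / original_height)
        downscaled = int(original_width * scale) * int(original_height * scale)
        effective = min(downscaled, original_width * original_height)
        wasted = height * width - effective
        if effective > max_effective or (
            effective == max_effective and wasted < min_wasted
        ):
            best_fit = (height, width)
            max_effective, min_wasted = effective, wasted
    return best_fit

# Pinpoints as in llava-v1.6-mistral-7b's config.json (all 336-multiples):
PINPOINTS = [[336, 672], [672, 336], [672, 672], [1008, 336], [336, 1008]]
print(select_best_resolution((531, 800), PINPOINTS))  # -> (672, 672)
```

Since every size is mapped onto some pinpoint, the crashes reported below look more like a token-accounting mismatch than a hard size restriction.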
Same problem: `Method Prefill encountered an error`.
It seems that the current implementation counts the tokens generated from the encoded image as part of the prompt length.
It might be better to extract the image features first and then calculate the prompt token length separately. I'm not sure whether TGI supports this approach, as it could be quite involved.
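To make that concrete, here is a rough sketch of estimating how many tokens an image expands to, so a client could budget the prompt length separately: the anyres count is the base 24 × 24 patch grid of the 336/14 vision tower, plus the unpadded high-resolution patches, plus one newline feature per remaining row. This paraphrases the accounting rather than reproducing TGI's exact implementation:

```python
def estimate_image_tokens(height, width, grid_h, grid_w, npatches=24):
    """Rough LLaVA-NeXT image-token estimate: unpadded high-res grid
    + one newline feature per row + the 24 x 24 base grid.
    npatches = 336 // 14 for the CLIP ViT-L/14-336 vision tower."""
    cur_h, cur_w = npatches * grid_h, npatches * grid_w
    # Undo the padding that was added to fit the pinpoint's aspect ratio.
    if width / height > cur_w / cur_h:
        cur_h = (height * cur_w) // width
    else:
        cur_w = (width * cur_h) // height
    return cur_h * cur_w + cur_h + npatches * npatches

# An 800 x 531 image snaps to the (672, 672) pinpoint, i.e. a 2 x 2 grid:
print(estimate_image_tokens(531, 800, 2, 2))  # -> 2095, cf. the log below
```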
Same issue; only images with width == height work.
I have the same issue, and it seems to be linked to image size: I found that some sizes work in TGI v2.0.1 but not in TGI v2.0.2, and vice versa.
Here is a recap of the image sizes I tested. Note that image 2-bis is image 2 cropped, to confirm that the dimensions are what causes the issue.
Image | Dimensions (W x H) | Ratio (W/H) | Works in v2.0.1 | Works in v2.0.2 |
---|---|---|---|---|
1 | 450 x 299 | 1.505 | No | Yes |
2 | 800 x 531 | 1.506 | Yes | No |
2-bis | 450 x 299 | 1.505 | No | Yes |
3 | 300 x 168 | 1.785 | No | Yes |
4 | 640 x 480 | 1.333 | Yes | Yes |
5 | 934 x 934 (square) | 1 | Yes | Yes |
When the image doesn't have the right dimensions, the server encounters an error and crashes. Here are the logs I get:
v2.0.1 (image 1 crash)
ERROR text_generation_launcher: Method Prefill encountered an error.
...
RuntimeError: shape mismatch: value tensor of shape [1464, 4096] cannot be broadcast to indexing result of shape [1376, 4096]
...
ERROR batch{batch_size=1}:prefill:prefill{id=0 size=1}:prefill{id=0 size=1}: text_generation_client: router/client/src/lib.rs:33: Server error: CANCELLED
ERROR batch{batch_size=1}:prefill:clear_cache{batch_id=Some(0)}:clear_cache{batch_id=Some(0)}: text_generation_client: router/client/src/lib.rs:33: Server error: transport error
ERROR chat_completions:generate:generate_stream:infer:send_error: text_generation_router::infer: router/src/infer.rs:866: Request failed during generation: Server error: CANCELLED
...
ERROR text_generation_launcher: Shard 0 crashed
v2.0.2 (image 2 crash; does not happen at warmup)
INFO text_generation_launcher: Found 2095 in image of resolution 531x800
ERROR text_generation_launcher: Method Prefill encountered an error.
...
RuntimeError: shape mismatch: value tensor of shape [2144, 4096] cannot be broadcast to indexing result of shape [2095, 4096]
...
RuntimeError: Cannot fill images right now. If error happens at warmup, make sure you have enough `--max-input-tokens` to handle images. If error happens at regular runtime, please fill in an issue: shape mismatch: value tensor of shape [2144, 4096] cannot be broadcast to indexing result of shape [2095, 4096]
...
ERROR batch{batch_size=1}:prefill:prefill{id=0 size=1}:prefill{id=0 size=1}: text_generation_client: router/client/src/lib.rs:33: Server error: CANCELLED
ERROR batch{batch_size=1}:prefill:clear_cache{batch_id=Some(0)}:clear_cache{batch_id=Some(0)}: text_generation_client: router/client/src/lib.rs:33: Server error: transport error
ERROR chat_completions:generate:generate_stream:infer:send_error: text_generation_router::infer: router/src/infer.rs:866: Request failed during generation: Server error: CANCELLED
...
ERROR text_generation_launcher: Shard 0 crashed
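The v2.0.2 failure above is consistent with two different roundings of the same unpadding step. For an 800 x 531 image on the (672, 672) pinpoint, the feature map is 48 x 48: flooring the rescaled height directly gives 31 rows (hence 2095 expected tokens), while removing symmetric padding, as the model's `unpad_image` appears to do, trims (48 - 31) // 2 = 8 rows from each side and keeps 32 (hence 2144 actual features). A small sketch of the discrepancy, using the same paraphrased arithmetic as above:

```python
# 800 x 531 -> (672, 672) pinpoint -> 2 x 2 grid of 24 x 24 patches,
# i.e. a 48 x 48 feature map before unpadding.
height, width = 531, 800
cur_h = cur_w = 48
base = 24 * 24

# Rounding A: floor the rescaled height directly (matches the 2095 count).
scaled_h = (height * cur_w) // width          # 25488 // 800 = 31 rows
count_a = scaled_h * cur_w + scaled_h + base  # 1488 + 31 + 576 = 2095

# Rounding B: trim symmetric padding instead; (48 - 31) // 2 = 8 rows
# removed from each side keeps 32 rows (matches the 2144 count).
pad = (cur_h - scaled_h) // 2                 # 8
kept_h = cur_h - 2 * pad                      # 32 rows
count_b = kept_h * cur_w + kept_h + base      # 1536 + 32 + 576 = 2144

print(count_a, count_b)  # 2095 2144 -> the two shapes in the error above
```

If that is the cause, resolutions whose rescaled feature height or width is exact (640 x 480, squares) should work everywhere, while fractional ones can disagree, which matches the pattern in the table above.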
My model info:
{
model_id: "llava-hf/llava-v1.6-mistral-7b-hf",
validation_workers: 2,
trust_remote_code: false,
max_concurrent_requests: 128,
max_best_of: 2,
max_stop_sequences: 4,
max_top_n_tokens: 5,
max_input_tokens: Some(4000),
max_total_tokens: Some(5000),
waiting_served_ratio: 0.3,
max_waiting_tokens: 20,
hostname: "0.0.0.0",
port: 80,
shard_uds_path: "/tmp/text-generation-server",
master_addr: "localhost",
master_port: 29500,
huggingface_hub_cache: Some("/data"),
disable_custom_kernels: false,
cuda_memory_fraction: 1.0,
json_output: false,
cors_allow_origin: [],
ngrok: false,
disable_grammar_support: false,
env: false,
max_client_batch_size: 4,
}