Comments (6)
For what it's worth, I think people might want to use a causal LM to generate embeddings of just the prompt; at least that's the use case I currently have.
mistralai/Mistral-7B-Instruct-v0.2 is an XXXForCausalLM model. CausalLM means that it generates text. It should not be used for embeddings. --> see the config:
{
"architectures": [
"MistralForCausalLM" # << this tells us its a generation model
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.36.0",
"use_cache": true,
"vocab_size": 32000
}
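For contrast, a model like this is driven through generate; a minimal sketch (the prompt and sampling settings are illustrative):

# Minimal generation sketch: model name taken from this thread;
# prompt and sampling settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["[INST] What is the capital of France? [/INST]"], params)
print(outputs[0].outputs[0].text)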
intfloat/e5-mistral-7b-instruct is an XXXModel. This means that the model just generates embeddings. It should be used for embeddings --> see the config:
{
"_name_or_path": "mistralai/Mistral-7B-v0.1",
"architectures": [
"MistralModel" # <<< this tells us its an embedding model
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pad_token_id": 2,
"rms_norm_eps": 1e-05,
"rope_theta": 10000.0,
"sliding_window": 4096,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.34.0",
"use_cache": false,
"vocab_size": 32000
}
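This is the kind of model that .encode is meant for. A minimal sketch based on the embedding example that ships with vLLM (the prompt is illustrative):

# Minimal embedding sketch: model name taken from this thread.
from vllm import LLM

llm = LLM(model="intfloat/e5-mistral-7b-instruct")
outputs = llm.encode(["Hello, my name is"])
# Each output carries one embedding vector (hidden_size floats).
print(len(outputs[0].outputs.embedding))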
We automatically detect whether the model is an embedding model or a generation model based on these configs (a rough sketch of the idea follows below). Supporting embedding models is a new feature. Thank you for bringing this bad UX to my attention.
I am going to update vLLM to:
- log a better error message
- add documentation to help users understand how to use this better
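For the curious, the detection boils down to the architectures field in config.json. A rough illustrative sketch of the idea, not vLLM's actual code (the helper name is made up):

# Illustrative sketch only -- not vLLM's implementation.
# An architecture ending in "Model" (e.g. "MistralModel") is treated as an
# embedding model; "...ForCausalLM" is treated as a generation model.
import json

def is_embedding_model(config_path: str) -> bool:  # hypothetical helper
    with open(config_path) as f:
        architectures = json.load(f).get("architectures", [])
    return any(arch.endswith("Model") for arch in architectures)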
I get it. Thanks for explaining this!
I just ran the example and did not see this issue.
What model are you using? This error can occur if you call .encode on an XXXForCausalLM.
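For reference, the failure mode being described is along these lines (a hypothetical reproduction, not an exact trace of the reporter's script):

# Hypothetical reproduction: calling encode() on a generation model.
from vllm import LLM

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
llm.encode(["some prompt"])  # misuse: an XXXForCausalLM has no embedding path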
Interestingly enough, the example works fine for me, and I actually see the expected results (a list of numbers) in my CLI.
Moreover, your error message states:
...
[rank0]: File "home/lib/python3.10/site-packages/vllm/model_executor/sampling_metadata.py", line 210, in _prepare_seq_groups
[rank0]: if sampling_params.seed is not None:
[rank0]: AttributeError: 'NoneType' object has no attribute 'seed'
The problem is that if sampling_params.seed is not None: is on line 208 (not 210) in the current version of the file. It seems like you may have modified the file somehow, which would explain why it stopped working.
Hope this helps.
Thanks for all of your help!
I had indeed modified the source code after encountering this error. I've since changed it back to the original (with no change in behavior).
Interestingly, the script works well with intfloat/e5-mistral-7b-instruct. After changing the model to mistralai/Mistral-7B-Instruct-v0.2, I got the error mentioned earlier. Do you have any suggestions for how I can generate embeddings with this specific model? Really appreciate your help!