Comments (7)
ref vLLM `include_stop_str_in_output`:
https://github.com/vllm-project/vllm/blob/c96fc067479453b02e92d9378eeeaebb6b3816de/vllm/sampling_params.py#L176-L183
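For context, vLLM's OpenAI-compatible server accepts this flag as a per-request field (assuming your vLLM build exposes it); a sketch of a completion request body, with the model name, prompt, and stop strings purely illustrative:

```json
{
  "model": "my-model",
  "prompt": "Q: What is LMDeploy?\nA:",
  "stream": true,
  "stop": ["\n\n"],
  "include_stop_str_in_output": true
}
```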
from lmdeploy.
Thank you for your feedback. I will take a look at `include_stop_str_in_output`. Since it is not clear whether OpenAI uses a 'stop str' or a 'stop token id', I will look into the behavior of their APIs with and without streaming.
from lmdeploy.
> it is not clear whether openai use 'stop str' or 'stop token id'

I forgot to mention, but `["the"]` works correctly as a `stop` param, so the fact that `["\n\n"]` does not work indicates to me that this issue is related to exact token matching/alignment.
from lmdeploy.
ref https://help.openai.com/en/articles/5072263-how-do-i-use-stop-sequences-in-the-openai-api
(screenshot: the stop-sequence example from the OpenAI help article linked above)
from lmdeploy.
Also, with vLLM, if `finish_reason` is `"stop"` and it was due to, for example, a `stop` parameter like `stop: ["\n"]`, then the event-stream message JSON also carries a `stop_reason` field, like this:

    ...
    "finish_reason": "stop",
    "stop_reason": "\n",
    ...

That is, it indicates which stop string caused the stop. This is a handy feature, and nice for compatibility with vLLM (ease of transition), but not strictly necessary if the `include_stop_str_in_output` feature is implemented.
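On the client side, that field can be read straight from the final streamed chunk; a minimal sketch (the chunk payload below mirrors the shape quoted above, with illustrative values):

```python
import json

# Final streamed chunk from vLLM when a stop string fired (shape as quoted above).
chunk = json.loads('{"finish_reason": "stop", "stop_reason": "\\n"}')

if chunk["finish_reason"] == "stop":
    # stop_reason names the stop string that ended generation.
    print(repr(chunk.get("stop_reason")))  # -> '\n'
```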
from lmdeploy.
For others who are hitting this issue but who desperately want to use LMDeploy: you can of course remove the `stop` parameter, manually check for the stop strings in the full generated text each time you receive a new token, and then manually abort the request when one of those stop strings is detected. That's my current workaround.
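The workaround above can be sketched as a small helper. The `token_stream` iterable here stands in for whatever streaming client you use (a hypothetical placeholder), and real code would also cancel the server-side request when the helper returns:

```python
def stream_until_stop(token_stream, stop_strings):
    """Accumulate streamed tokens and stop as soon as any stop string
    appears in the full generated text; return the text truncated at
    the earliest match (a real client should also abort the request)."""
    text = ""
    for token in token_stream:
        text += token
        cuts = [text.find(s) for s in stop_strings if s in text]
        if cuts:
            return text[:min(cuts)]
    return text

# "\n\n" spans two streamed tokens here, so per-token matching would
# miss it; matching against the accumulated text catches it.
tokens = ["Hello", " world", "\n", "\n", "ignored"]
print(stream_until_stop(tokens, ["\n\n"]))  # -> Hello world
```

Note the check runs against the accumulated text, not the latest token, precisely because a stop string like `"\n\n"` can straddle a token boundary.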
from lmdeploy.
In LMDeploy, each word in the `stop_words` list is expected to be tokenized to exactly ONE token id. Words that tokenize into multiple token ids are currently not supported as stop words. We have plans to resolve this, but it will take a while.
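A quick way to see whether a given stop word satisfies this single-token constraint is to count the token ids it encodes to. A minimal sketch with a toy greedy tokenizer (the vocabulary and `encode` function are assumptions for illustration; in practice you would use your model's real tokenizer):

```python
TOY_VOCAB = {"the": 1, "\n": 2}  # assumed vocabulary, for illustration only

def encode(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids, i = [], 0
    while i < len(text):
        for piece in sorted(TOY_VOCAB, key=len, reverse=True):
            if text.startswith(piece, i):
                ids.append(TOY_VOCAB[piece])
                i += len(piece)
                break
        else:
            i += 1  # unknown character: skip (toy behavior)
    return ids

def usable_stop_words(stop_words):
    """Keep only stop words that encode to exactly one token id."""
    return [w for w in stop_words if len(encode(w)) == 1]

print(usable_stop_words(["the", "\n\n"]))  # -> ['the']
```

Here `"the"` maps to one id and survives, while `"\n\n"` encodes to two ids and would be silently ineffective under the current constraint.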
from lmdeploy.