Comments (3)
I can confirm that the same command and model work fine on 2x H100s, so I'm guessing the failure is just due to 70B models needing about 140GB of VRAM. If the CPU issue is a wontfix, please feel free to close. Thanks!
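The 140GB figure matches a back-of-the-envelope estimate for the weights alone. The sketch below is only that arithmetic, assuming 2 bytes per parameter (fp16/bf16) and 80GB per H100; the helper name `weight_memory_gb` is made up for illustration, and activations, calibration buffers, and KV cache are not counted.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 parameters * bytes_per_param bytes / 1e9 bytes per GB
    return params_billion * bytes_per_param


# Assumption: a 70B-parameter model stored in fp16/bf16 (2 bytes per parameter).
fp16_weights_gb = weight_memory_gb(70, 2)

# Assumption: two H100 cards with 80 GB each, as mentioned in the thread (160GB total).
total_vram_gb = 2 * 80

print(f"fp16 weights: {fp16_weights_gb:.0f} GB")
print(f"2x H100 VRAM: {total_vram_gb} GB")
print(f"headroom:     {total_vram_gb - fp16_weights_gb:.0f} GB")
```

The headroom left after loading the weights is what remains for everything else, which is one reason a shorter calibration sequence length can help on tighter setups.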
from lmdeploy.
You may set a shorter calib_seqlen; there is no need to keep it at 2048, which is only the default value.
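As a rough illustration of this suggestion, the sketch below only assembles a command line with a shorter calibration length and prints it. It assumes the conversion was run through `lmdeploy lite auto_awq` and that the flags are spelled `--calib-seqlen`, `--calib-samples`, and `--work-dir`; the thread does not show the original command, so please check `lmdeploy lite auto_awq --help` for your installed version. The paths and the helper name `build_calibration_command` are placeholders.

```python
import shlex


def build_calibration_command(model_path, work_dir, calib_seqlen=2048, calib_samples=128):
    """Assemble (but do not run) a quantization command line.

    Assumption: subcommand and flag names follow the lmdeploy CLI as I recall it;
    verify them against `--help` before use.
    """
    return [
        "lmdeploy", "lite", "auto_awq", model_path,
        "--calib-samples", str(calib_samples),
        "--calib-seqlen", str(calib_seqlen),
        "--work-dir", work_dir,
    ]


# Placeholder paths; 512 is an arbitrary example of a shorter sequence length.
cmd = build_calibration_command("path/to/70b-model", "path/to/output", calib_seqlen=512)
print(shlex.join(cmd))
```

Each calibration sample is truncated to `calib_seqlen` tokens, so a smaller value shrinks the activations held during calibration, at the possible cost of less representative statistics.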
I haven't yet had another chance to test, since 2xH100 (160GB VRAM) fixed the issue for me, but when I do another conversion I will try that. Thanks for your advice! I will close this now - please feel free to reopen it of course if the CPU error is something that needs to be fixed.
Related Issues (20)
- I want to run profile_throughput.py using the smooth_quant model. Why did an error occur?
- [Bug] Deploying internVL2-40B-AWQ with lmdeploy: the container has a triton environment, but the triton environment check reports an error
- [Bug] Deployed Qwen-vl and its LoRA through lmdeploy, but on inspection the LoRA was not loaded successfully
- [Bug] Lmdeploy LLM Llama3 inference results are inconsistent between a single 4090 and dual 4090s
- [Feature] multi-node training
- [Bug] LMDeploy docker image with finetuned InternVL model doesn't work
- [Bug] lmdeploy hangs and cannot accept any requests
- No inference performance improvement after smooth quantization
- [Feature] Add `logits_processor` to `GenerationConfig`
- CPU offload when InternVL2-40B inference using lmdeploy.pipeline
- [Docs] Image preprocessing and forward inference process of llava-llama3
- [Bug] After AWQ quantization of internvl2-2b, inference speed barely improves and accuracy also drops
- [Bug] lmdeploy deployment fails with "API call is not supported in the installed CUDA driver"
- [Bug] Deploying multiple models on one GPU
- Question about implementing the LRU policy
- [Feature] Support InternVL2-1B with the Turbomind Engine?
- Can quantization of InternVL2-8B be supported, and is there any related documentation?
- [Bug] lmdeploy - ERROR - run out of tokens. session_id=1
- Scale out LLM model deployment across GPUs on different machines
- [Feature] Could support for SM60-class NVIDIA GPUs be added in a new release?