Comments (4)
@njhill thanks, it works. I'm on version 0.4.2:

    llm = LLM(model=model_local_path, dtype='float16', HF_HUB_OFFLINE=1)
@yananchen1989 what version are you using? Changes were made recently so that a connection to HF hub is not required if the model is in the local cache. You can also prevent it from attempting to connect at all by setting HF_HUB_OFFLINE=1.
@yananchen1989 actually, HF_HUB_OFFLINE=1 is an environment variable, not an argument passed to the LLM constructor. You should be able to remove it from the call; it will still work because vLLM falls back to the local cache if it can't contact HF hub.
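A minimal sketch of the suggestion above, assuming the model is already in the local HF cache or at a local path (the path and variable names here are illustrative, not from the thread):

    import os

    # Set the env variable before vLLM/huggingface_hub are imported,
    # so offline mode is picked up when their config constants are read.
    os.environ["HF_HUB_OFFLINE"] = "1"

    from vllm import LLM

    model_local_path = "/path/to/local/model"  # hypothetical local path
    llm = LLM(model=model_local_path, dtype="float16")  # no HF_HUB_OFFLINE kwarg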
My vLLM version is 0.5.0.post1, and it shows an error when I don't set HF_HUB_OFFLINE=1. Maybe that's because I'm using a private repository. I'm also using AsyncLLMEngine, so that's another possible cause.
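For reference, a hedged sketch of the offline AsyncLLMEngine setup described above, assuming the model weights and tokenizer are already on disk (the model path is a placeholder, not the commenter's actual configuration):

    import os

    os.environ["HF_HUB_OFFLINE"] = "1"  # avoid any attempt to contact HF hub

    from vllm import AsyncEngineArgs, AsyncLLMEngine

    # Hypothetical local path to the (private) model; with HF_HUB_OFFLINE=1
    # the files must already be local or in the HF cache.
    engine_args = AsyncEngineArgs(model="/path/to/private-model", dtype="float16")
    engine = AsyncLLMEngine.from_engine_args(engine_args)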