Comments (14)
Thanks to @mgoin for the mention.
#5036 has preliminarily addressed this issue; we have tested it on a TITAN RTX. You can clone that branch and build vLLM from it.
You should clone my repo using:
git clone -b refactor-punica-kernel https://github.com/jeejeelee/vllm.git
@emillykkejensen I can run AWQ + LoRA properly on a TITAN RTX. FYI: https://github.com/vllm-project/vllm/blob/main/csrc/quantization/awq/dequantize.cuh#L18
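Roughly what that looks like with the offline LLM API, in case it helps with reproducing (a minimal sketch; the model name matches the one in this thread, but the adapter directory is a placeholder that must point at a real local LoRA checkout):

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# AWQ base model with LoRA enabled, mirroring the server flags used later in this thread.
llm = LLM(
    model="TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ",
    quantization="awq",
    dtype="half",
    enable_lora=True,
    enforce_eager=True,
)

# lora_request must point at a local adapter directory (placeholder path).
outputs = llm.generate(
    ["San Francisco is a"],
    SamplingParams(temperature=0.0, max_tokens=16),
    lora_request=LoRARequest("colorist-lora", 1, "/path/to/tinyllama-colorist-lora"),
)
print(outputs[0].outputs[0].text)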
Hi again @jeejeelee
Sorry for that, you are 100% right! If I do the above, but clone the correct branch (!!) it works.
Thanks for the fix, and hope it will be merged into master soon :)
I have the same issue; however, I am running it on an Azure VM with a T4 GPU using Docker.
Hi @rikitomo and @emillykkejensen, it is unfortunately the case that punica does not support T4 or V100, per #3197
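If you want to confirm what your GPU reports, a quick check along these lines should tell you (a small sketch using PyTorch; the stock punica kernels need compute capability 8.0 or newer, while T4 is sm_75 and V100 is sm_70):

import torch

# Punica's LoRA kernels require SM 8.0 (Ampere) or newer.
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: sm_{major}{minor}")
if (major, minor) < (8, 0):
    print("This GPU is below SM 8.0, so the stock punica LoRA kernels will not run here.")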
Please follow up with this in the issue on their repo punica-ai/punica#44. Once it is addressed, we can pull in the updated kernels into vLLM - thanks!
On another note: perhaps this will be addressed by this recent work on using Triton for LoRA inference! #5036
Hi @jeejeelee
Thanks a lot for the proposed fix. However, when I try to build from your branch I get the same error. I'm building inside a Docker container, so I don't know if that is the issue.
What I did:
docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3
# and then from within the container
git clone https://github.com/jeejeelee/vllm.git
cd vllm
export VLLM_INSTALL_PUNICA_KERNELS=1
pip install -e .
Once the build was done, I ran:
python -m vllm.entrypoints.openai.api_server \
--model TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ \
--quantization awq \
--dtype half \
--enable-lora \
--enforce-eager \
--gpu-memory-utilization 0.90 \
--lora-modules sql-lora=jashing/tinyllama-colorist-lora/
That gave me this output:
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
config.json: 100%|███████████████████████████████████████████████████████████████████████████| 854/854 [00:00<00:00, 11.6MB/s]
WARNING 06-11 08:57:14 config.py:192] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 06-11 08:57:14 llm_engine.py:103] Initializing an LLM engine (v0.4.2) with config: model='TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ', speculative_config=None, tokenizer='TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=awq, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ)
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████| 1.42k/1.42k [00:00<00:00, 25.9MB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 35.2MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 18.1MB/s]
added_tokens.json: 100%|███████████████████████████████████████████████████████████████████| 69.0/69.0 [00:00<00:00, 1.30MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████| 96.0/96.0 [00:00<00:00, 1.90MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
generation_config.json: 100%|██████████████████████████████████████████████████████████████| 68.0/68.0 [00:00<00:00, 1.10MB/s]
INFO 06-11 08:57:16 selector.py:113] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 06-11 08:57:16 selector.py:44] Using XFormers backend.
INFO 06-11 08:57:18 weight_utils.py:206] Using model weights format ['*.safetensors']
model.safetensors: 100%|████████████████████████████████████████████████████████████████████| 766M/766M [00:02<00:00, 262MB/s]
INFO 06-11 08:57:22 model_runner.py:146] Loading model weights took 0.7370 GB
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]: return _run_code(code, main_globals, None,
[rank0]: File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]: exec(code, run_globals)
[rank0]: File "/workspace/vllm/vllm/entrypoints/openai/api_server.py", line 186, in <module>
[rank0]: engine = AsyncLLMEngine.from_engine_args(
[rank0]: File "/workspace/vllm/vllm/engine/async_llm_engine.py", line 382, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/workspace/vllm/vllm/engine/async_llm_engine.py", line 336, in __init__
[rank0]: self.engine = self._init_engine(*args, **kwargs)
[rank0]: File "/workspace/vllm/vllm/engine/async_llm_engine.py", line 458, in _init_engine
[rank0]: return engine_class(*args, **kwargs)
[rank0]: File "/workspace/vllm/vllm/engine/llm_engine.py", line 178, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/workspace/vllm/vllm/engine/llm_engine.py", line 255, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: File "/workspace/vllm/vllm/executor/gpu_executor.py", line 75, in determine_num_available_blocks
[rank0]: return self.driver_worker.determine_num_available_blocks()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/workspace/vllm/vllm/worker/worker.py", line 154, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/workspace/vllm/vllm/worker/model_runner.py", line 787, in profile_run
[rank0]: self.execute_model(seqs, kv_caches)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/workspace/vllm/vllm/worker/model_runner.py", line 706, in execute_model
[rank0]: hidden_states = model_executable(**execute_model_kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/workspace/vllm/vllm/model_executor/models/llama.py", line 367, in forward
[rank0]: hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/workspace/vllm/vllm/model_executor/models/llama.py", line 292, in forward
[rank0]: hidden_states, residual = layer(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/workspace/vllm/vllm/model_executor/models/llama.py", line 231, in forward
[rank0]: hidden_states = self.self_attn(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/workspace/vllm/vllm/model_executor/models/llama.py", line 160, in forward
[rank0]: qkv, _ = self.qkv_proj(hidden_states)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/workspace/vllm/vllm/lora/layers.py", line 470, in forward
[rank0]: output_parallel = self.apply(input_, bias)
[rank0]: File "/workspace/vllm/vllm/lora/layers.py", line 853, in apply
[rank0]: output = self.base_layer.quant_method.apply(self.base_layer, x, bias)
[rank0]: File "/workspace/vllm/vllm/model_executor/layers/quantization/awq.py", line 168, in apply
[rank0]: out = ops.awq_dequantize(qweight, scales, qzeros, 0, 0, 0)
[rank0]: File "/workspace/vllm/vllm/_custom_ops.py", line 119, in awq_dequantize
[rank0]: return vllm_ops.awq_dequantize(qweight, scales, zeros, split_k_iters, thx,
[rank0]: RuntimeError: CUDA error: no kernel image is available for execution on the device
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[rank0]: Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
[rank0]: frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x70ddb257a897 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
[rank0]: frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x70ddb252ab25 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
[rank0]: frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x70ddb29e1718 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
[rank0]: frame #3: <unknown function> + 0x2ea76 (0x70ddb29bda76 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
[rank0]: frame #4: <unknown function> + 0x343e4 (0x70ddb29c33e4 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
[rank0]: frame #5: <unknown function> + 0x35ca7 (0x70ddb29c4ca7 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
[rank0]: frame #6: <unknown function> + 0x360e7 (0x70ddb29c50e7 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
[rank0]: frame #7: <unknown function> + 0x1866589 (0x70dd9a7bb589 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
[rank0]: frame #8: at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, std::optional<c10::MemoryFormat>) + 0x14 (0x70dd9a7b51e4 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
[rank0]: frame #9: at::detail::empty_cuda(c10::ArrayRef<long>, c10::ScalarType, std::optional<c10::Device>, std::optional<c10::MemoryFormat>) + 0x111 (0x70dd660f6641 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[rank0]: frame #10: at::detail::empty_cuda(c10::ArrayRef<long>, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, std::optional<c10::MemoryFormat>) + 0x36 (0x70dd660f6916 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[rank0]: frame #11: at::native::empty_cuda(c10::ArrayRef<long>, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, std::optional<c10::MemoryFormat>) + 0x20 (0x70dd66334a30 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[rank0]: frame #12: <unknown function> + 0x329a789 (0x70dd6833f789 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[rank0]: frame #13: <unknown function> + 0x329a86b (0x70dd6833f86b in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[rank0]: frame #14: at::_ops::empty_memory_format::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, std::optional<c10::MemoryFormat>) + 0xe7 (0x70dd9b7b9be7 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
[rank0]: frame #15: <unknown function> + 0x2c10def (0x70dd9bb65def in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
[rank0]: frame #16: at::_ops::empty_memory_format::call(c10::ArrayRef<c10::SymInt>, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, std::optional<c10::MemoryFormat>) + 0x1a0 (0x70dd9b801a00 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
[rank0]: frame #17: at::empty(c10::ArrayRef<long>, c10::TensorOptions, std::optional<c10::MemoryFormat>) + 0x150 (0x70dcec735c60 in /workspace/vllm/vllm/_C.cpython-310-x86_64-linux-gnu.so)
[rank0]: frame #18: torch::empty(c10::ArrayRef<long>, c10::TensorOptions, std::optional<c10::MemoryFormat>) + 0x8a (0x70dcec735dea in /workspace/vllm/vllm/_C.cpython-310-x86_64-linux-gnu.so)
[rank0]: frame #19: awq_dequantize(at::Tensor, at::Tensor, at::Tensor, int, int, int) + 0x249 (0x70dcec759609 in /workspace/vllm/vllm/_C.cpython-310-x86_64-linux-gnu.so)
[rank0]: frame #20: <unknown function> + 0xf5449 (0x70dcec74f449 in /workspace/vllm/vllm/_C.cpython-310-x86_64-linux-gnu.so)
[rank0]: frame #21: <unknown function> + 0xf123d (0x70dcec74b23d in /workspace/vllm/vllm/_C.cpython-310-x86_64-linux-gnu.so)
[rank0]: <omitting python frames>
@emillykkejensen It seems that the error is triggered by awq. It's possible that awq only supports SM80+. Have you tested LoRA using an FP16 model?
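For example, something along these lines would isolate it (a sketch; the FP16 checkpoint name is an assumption, any unquantized base model your adapter was trained on will do):

from vllm import LLM
from vllm.lora.request import LoRARequest

# Same setup as the AWQ run, but with an unquantized FP16 base model and no
# quantization flag; if LoRA works here, the failure is specific to the AWQ kernels.
llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v0.3",  # assumed FP16 checkpoint
    dtype="half",
    enable_lora=True,
    enforce_eager=True,
)
out = llm.generate(
    ["San Francisco is a"],
    lora_request=LoRARequest("test-lora", 1, "/path/to/local/adapter"),
)
print(out[0].outputs[0].text)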
So I tried to build a local Docker image using your branch (docker build -t my-vllm-image https://github.com/jeejeelee/vllm.git#refactor-punica-kernel).
It seems to load vLLM and also load the model okay, but when I call it I get the following error:
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
WARNING 06-13 11:09:54 config.py:192] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 06-13 11:09:54 llm_engine.py:103] Initializing an LLM engine (v0.4.2) with config: model='TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ', speculative_config=None, tokenizer='TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=awq, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 06-13 11:09:55 selector.py:120] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 06-13 11:09:55 selector.py:51] Using XFormers backend.
INFO 06-13 11:09:56 selector.py:120] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 06-13 11:09:56 selector.py:51] Using XFormers backend.
INFO 06-13 11:09:56 weight_utils.py:207] Using model weights format ['*.safetensors']
INFO 06-13 11:09:57 weight_utils.py:250] No model.safetensors.index.json found in remote.
INFO 06-13 11:10:08 model_runner.py:146] Loading model weights took 0.7370 GB
INFO 06-13 11:10:11 gpu_executor.py:83] # GPU blocks: 32795, # CPU blocks: 11915
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 06-13 11:10:14 serving_chat.py:83] No chat template provided. Chat API will not work.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 06-13 11:10:15 serving_embedding.py:131] embedding_mode is False. Embedding API will not work.
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO 06-13 11:10:25 metrics.py:341] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 06-13 11:10:35 metrics.py:341] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 06-13 11:10:45 metrics.py:341] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 06-13 11:10:55 metrics.py:341] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 06-13 11:11:05 metrics.py:341] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 06-13 11:11:15 metrics.py:341] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 06-13 11:11:23 async_llm_engine.py:545] Received request cmpl-ca79698496dd4702a6e821afaef7b588-0: prompt: 'San Francisco is a', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=7, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [1, 3087, 8970, 338, 263], lora_request: LoRARequest(lora_name='sql-lora', lora_int_id=1, lora_local_path='jashing/tinyllama-colorist-lora/', long_lora_max_len=None).
WARNING 06-13 11:11:23 tokenizer.py:142] No tokenizer found in jashing/tinyllama-colorist-lora/, using base model tokenizer instead. (Exception: Incorrect path_or_model_id: 'jashing/tinyllama-colorist-lora/'. Please provide either the path to a local folder or the repo_id of a model on the Hub.)
ERROR 06-13 11:11:23 async_llm_engine.py:44] Engine background task failed
ERROR 06-13 11:11:23 async_llm_engine.py:44] Traceback (most recent call last):
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 174, in _load_lora
ERROR 06-13 11:11:23 async_llm_engine.py:44] lora = self._lora_model_cls.from_local_checkpoint(
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/lora/models.py", line 314, in from_local_checkpoint
ERROR 06-13 11:11:23 async_llm_engine.py:44] with open(lora_config_path) as f:
ERROR 06-13 11:11:23 async_llm_engine.py:44] FileNotFoundError: [Errno 2] No such file or directory: 'jashing/tinyllama-colorist-lora/adapter_config.json'
ERROR 06-13 11:11:23 async_llm_engine.py:44]
ERROR 06-13 11:11:23 async_llm_engine.py:44] The above exception was the direct cause of the following exception:
ERROR 06-13 11:11:23 async_llm_engine.py:44]
ERROR 06-13 11:11:23 async_llm_engine.py:44] Traceback (most recent call last):
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 39, in _raise_exception_on_finish
ERROR 06-13 11:11:23 async_llm_engine.py:44] task.result()
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 517, in run_engine_loop
ERROR 06-13 11:11:23 async_llm_engine.py:44] has_requests_in_progress = await asyncio.wait_for(
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
ERROR 06-13 11:11:23 async_llm_engine.py:44] return fut.result()
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 491, in engine_step
ERROR 06-13 11:11:23 async_llm_engine.py:44] request_outputs = await self.engine.step_async()
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 225, in step_async
ERROR 06-13 11:11:23 async_llm_engine.py:44] output = await self.model_executor.execute_model_async(
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 117, in execute_model_async
ERROR 06-13 11:11:23 async_llm_engine.py:44] output = await make_async(self.driver_worker.execute_model
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
ERROR 06-13 11:11:23 async_llm_engine.py:44] result = self.fn(*self.args, **self.kwargs)
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 06-13 11:11:23 async_llm_engine.py:44] return func(*args, **kwargs)
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 272, in execute_model
ERROR 06-13 11:11:23 async_llm_engine.py:44] output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 06-13 11:11:23 async_llm_engine.py:44] return func(*args, **kwargs)
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 689, in execute_model
ERROR 06-13 11:11:23 async_llm_engine.py:44] self.set_active_loras(lora_requests, lora_mapping)
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 827, in set_active_loras
ERROR 06-13 11:11:23 async_llm_engine.py:44] self.lora_manager.set_active_loras(lora_requests, lora_mapping)
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 137, in set_active_loras
ERROR 06-13 11:11:23 async_llm_engine.py:44] self._apply_loras(lora_requests)
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 266, in _apply_loras
ERROR 06-13 11:11:23 async_llm_engine.py:44] self.add_lora(lora)
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 274, in add_lora
ERROR 06-13 11:11:23 async_llm_engine.py:44] lora = self._load_lora(lora_request)
ERROR 06-13 11:11:23 async_llm_engine.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 187, in _load_lora
ERROR 06-13 11:11:23 async_llm_engine.py:44] raise RuntimeError(
ERROR 06-13 11:11:23 async_llm_engine.py:44] RuntimeError: Loading lora jashing/tinyllama-colorist-lora/ failed
Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7328afd917e0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7328a5023160>>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7328afd917e0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7328a5023160>>)>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 174, in _load_lora
lora = self._lora_model_cls.from_local_checkpoint(
File "/usr/local/lib/python3.10/dist-packages/vllm/lora/models.py", line 314, in from_local_checkpoint
with open(lora_config_path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'jashing/tinyllama-colorist-lora/adapter_config.json'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 39, in _raise_exception_on_finish
task.result()
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 517, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
return fut.result()
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 491, in engine_step
request_outputs = await self.engine.step_async()
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 225, in step_async
output = await self.model_executor.execute_model_async(
File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 117, in execute_model_async
output = await make_async(self.driver_worker.execute_model
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 272, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 689, in execute_model
self.set_active_loras(lora_requests, lora_mapping)
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 827, in set_active_loras
self.lora_manager.set_active_loras(lora_requests, lora_mapping)
File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 137, in set_active_loras
self._apply_loras(lora_requests)
File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 266, in _apply_loras
self.add_lora(lora)
File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 274, in add_lora
lora = self._load_lora(lora_request)
File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 187, in _load_lora
raise RuntimeError(
RuntimeError: Loading lora jashing/tinyllama-colorist-lora/ failed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 46, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
INFO 06-13 11:11:23 async_llm_engine.py:157] Aborted request cmpl-ca79698496dd4702a6e821afaef7b588-0.
INFO: 172.17.0.1:55552 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 174, in _load_lora
lora = self._lora_model_cls.from_local_checkpoint(
File "/usr/local/lib/python3.10/dist-packages/vllm/lora/models.py", line 314, in from_local_checkpoint
with open(lora_config_path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'jashing/tinyllama-colorist-lora/adapter_config.json'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 118, in create_completion
generator = await openai_serving_completion.create_completion(
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_completion.py", line 155, in create_completion
async for i, res in result_generator:
File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 241, in consumer
raise e
File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 234, in consumer
raise item
File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 218, in producer
async for item in iterator:
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 662, in generate
async for output in self.process_request(
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 780, in process_request
raise e
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 776, in process_request
async for request_output in stream:
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 79, in __anext__
raise result
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 39, in _raise_exception_on_finish
task.result()
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 517, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
return fut.result()
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 491, in engine_step
request_outputs = await self.engine.step_async()
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 225, in step_async
output = await self.model_executor.execute_model_async(
File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 117, in execute_model_async
output = await make_async(self.driver_worker.execute_model
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 272, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 689, in execute_model
self.set_active_loras(lora_requests, lora_mapping)
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 827, in set_active_loras
self.lora_manager.set_active_loras(lora_requests, lora_mapping)
File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 137, in set_active_loras
self._apply_loras(lora_requests)
File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 266, in _apply_loras
self.add_lora(lora)
File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 274, in add_lora
lora = self._load_lora(lora_request)
File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 187, in _load_lora
raise RuntimeError(
RuntimeError: Loading lora jashing/tinyllama-colorist-lora/ failed
FileNotFoundError: [Errno 2] No such file or directory: 'jashing/tinyllama-colorist-lora/adapter_config.json'
ERROR 06-13 11:11:23 async_llm_engine.py:44]
Maybe you can try passing the LoRA path as a local absolute path.
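For example, you can download the adapter first and hand vLLM the resulting directory (a small sketch using huggingface_hub, which should already be installed alongside vLLM):

from huggingface_hub import snapshot_download

# Downloads the adapter once and returns the local cache directory it landed in.
local_path = snapshot_download(repo_id="jashing/tinyllama-colorist-lora")
print(local_path)

Then start the server with --lora-modules sql-lora=<that printed path> instead of the Hub repo id.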
@jeejeelee Hi, thank you so much for your work! If I just want to run LoRA on a T4, which of your previous commits should I build from?
@jeejeelee Hi, thank you so much for your work! If I just want to run LoRA on a T4, which of your previous commits should I build from?
You can build from the last commit. If you have any questions, please feel free to contact me.
I have the same problem applying LoRA to chatglm3-6b on a T4 GPU:
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]: return _run_code(code, main_globals, None,
[rank0]: File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]: exec(code, run_globals)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 216, in <module>
[rank0]: engine = AsyncLLMEngine.from_engine_args(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 431, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 360, in __init__
[rank0]: self.engine = self._init_engine(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 507, in _init_engine
[rank0]: return engine_class(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 256, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 353, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 76, in determine_num_available_blocks
[rank0]: return self.driver_worker.determine_num_available_blocks()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 173, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 874, in profile_run
[rank0]: self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1243, in execute_model
[rank0]: hidden_or_intermediate_states = model_executable(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/chatglm.py", line 371, in forward
[rank0]: hidden_states = self.transformer(input_ids, positions, kv_caches,
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/chatglm.py", line 319, in forward
[rank0]: hidden_states = self.encoder(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/chatglm.py", line 274, in forward
[rank0]: hidden_states = layer(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/chatglm.py", line 209, in forward
[rank0]: attention_output = self.self_attention(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/chatglm.py", line 108, in forward
[rank0]: context_layer = self.attn(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/attention/layer.py", line 94, in forward
[rank0]: return self.impl.forward(query, key, value, kv_cache, attn_metadata,
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/xformers.py", line 279, in forward
[rank0]: output = torch.empty_like(query)
[rank0]: RuntimeError: CUDA error: no kernel image is available for execution on the device
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
@naturomics Hi, you can try #5036. It should be able to address your issue.