
[Feature]: LoRA support for Mixtral GPTQ and AWQ (vllm, OPEN)

StrikerRUS commented on July 3, 2024

[Feature]: LoRA support for Mixtral GPTQ and AWQ


Comments (1)

robertgshaw2-neuralmagic commented on July 3, 2024

:)


