Comments (3)
Hi! Thanks for your interest.
Yup, we make heavy use of vLLM, so it's quite similar. Aphrodite is designed mainly for in-house use on our future website, so supporting a user base won't really be our main focus.
I'll keep it in mind to run a few benchmarks on the attention and get some metrics once it's done.
from aphrodite-engine.
Hi @AlpinDale, I'm the maintainer of LiteLLM. It lets you maximize throughput by load balancing across multiple LLM endpoints.
Thought it might be useful for you; I'd love feedback if not.
Here's the quick start for the LiteLLM load balancer (works with 100+ LLMs):
doc: https://docs.litellm.ai/docs/simple_proxy#model-alias
Step 1: Create a config.yaml
model_list:
  - model_name: openhermes
    litellm_params:
      model: openhermes
      temperature: 0.6
      max_tokens: 400
      custom_llm_provider: "openai"
      api_base: http://192.168.1.23:8000/v1
  - model_name: openhermes
    litellm_params:
      model: openhermes
      custom_llm_provider: "openai"
      api_base: http://192.168.1.23:8001/v1
  - model_name: openhermes
    litellm_params:
      model: openhermes
      custom_llm_provider: "openai"
      frequency_penalty: 0.6
      api_base: http://192.168.1.23:8010/v1
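All three entries in the config share the model_name `openhermes`, so they act as three deployments behind one alias. As a rough illustration of what load balancing across them means, here is a minimal round-robin sketch in Python. It is a hypothetical stand-in, not LiteLLM's router (which has its own configurable routing strategies); the `Deployment` and `RoundRobinBalancer` names are made up for this example.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """One backend entry from model_list (hypothetical structure)."""
    model_name: str
    api_base: str
    params: dict = field(default_factory=dict)


class RoundRobinBalancer:
    """Hypothetical round-robin picker; NOT LiteLLM's actual router."""

    def __init__(self, deployments):
        # Group deployments by their shared alias (model_name).
        groups = {}
        for d in deployments:
            groups.setdefault(d.model_name, []).append(d)
        # One endless cycle per alias, in config order.
        self._cycles = {name: itertools.cycle(group) for name, group in groups.items()}

    def pick(self, model_name):
        # Raises KeyError if no deployment is registered under this alias.
        return next(self._cycles[model_name])


# Mirrors the three entries in the config above.
deployments = [
    Deployment("openhermes", "http://192.168.1.23:8000/v1",
               {"temperature": 0.6, "max_tokens": 400}),
    Deployment("openhermes", "http://192.168.1.23:8001/v1"),
    Deployment("openhermes", "http://192.168.1.23:8010/v1",
               {"frequency_penalty": 0.6}),
]

balancer = RoundRobinBalancer(deployments)
for _ in range(4):
    chosen = balancer.pick("openhermes")
    print(chosen.api_base, chosen.params)
```

Round-robin is the simplest policy; a production balancer would likely also weigh health checks, latency, or queue depth per backend.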
Step 2: Start the litellm proxy:
litellm --config /path/to/config.yaml
Step 3: Make a request to the LiteLLM proxy:
curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openhermes",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
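For calling the proxy from Python instead of curl, here is a minimal standard-library sketch of the same request. It assumes the proxy is listening at the address used in the curl command; the helper names `build_chat_request` and `send` are hypothetical, and nothing is sent until `send` is called.

```python
import json
import urllib.request

# Same address as the curl example; adjust if the proxy runs elsewhere.
PROXY_URL = "http://0.0.0.0:8000/chat/completions"


def build_chat_request(model, prompt, url=PROXY_URL):
    """Build (but do not send) a POST request matching the curl call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and parse the JSON response (needs a running proxy)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the proxy running, `send(build_chat_request("openhermes", "what llm are you"))` should return the parsed JSON response. Building the JSON with `json.dumps` also avoids hand-written syntax mistakes such as trailing commas.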
Any chance there are some benchmarks vs vLLM?
It seems you're implying the main benefit is the fp8 attention option? (But no flash decoding, unlike TGI?)