
routellm's Introduction

RouteLLM

RouteLLM is a framework for serving and evaluating LLM routers.

[ Blog ] [ Paper ]

Our core features include:

  • Drop-in replacement for OpenAI's client (or launch an OpenAI-compatible server) to route simpler queries to cheaper models.
  • Trained routers are provided out of the box, which we have shown to reduce costs by up to 85% while maintaining 95% GPT-4 performance on widely-used benchmarks like MT Bench.
  • Benchmarks also demonstrate that these routers achieve the same performance as commercial offerings while being >40% cheaper.
  • Easily extend the framework to include new routers and compare the performance of routers across multiple benchmarks.

Installation

From PyPI

pip install "routellm[serve,eval]"

From source

git clone https://github.com/lm-sys/RouteLLM.git
cd RouteLLM
pip install -e .[serve,eval]

Quickstart

Let's walk through replacing an existing OpenAI client so that queries are routed between LLMs instead of always using a single model.

  1. First, let's replace our OpenAI client by initializing the RouteLLM controller with the mf router. By default, RouteLLM will use the best-performing config:
import os
from routellm.controller import Controller

os.environ["OPENAI_API_KEY"] = "sk-XXXXXX"
# Replace with your model provider, we use Anyscale's Mixtral here.
os.environ["ANYSCALE_API_KEY"] = "esecret_XXXXXX"

client = Controller(
  routers=["mf"],
  strong_model="gpt-4-1106-preview",
  weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)

Above, we pick gpt-4-1106-preview as the strong model and anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1 as the weak model, setting the API keys accordingly. You can route between different model pairs or providers by updating the model names as described in Model Support.

Want to route to local models? Check out Routing to Local Models.
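For instance, here is a hedged sketch of routing to a local Ollama server via LiteLLM (the ollama/ model prefix is LiteLLM's convention and the model names are illustrative; as noted under Model Support, an OPENAI_API_KEY is still needed for the mf router's embeddings):

import os
from routellm.controller import Controller

# Still required: the mf router generates embeddings via OpenAI.
os.environ["OPENAI_API_KEY"] = "sk-XXXXXX"

client = Controller(
  routers=["mf"],
  strong_model="gpt-4-1106-preview",
  # Assumption: LiteLLM's ollama/ prefix for a locally running Ollama server.
  weak_model="ollama/llama3",
)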

  2. Each routing request has a cost threshold that controls the tradeoff between cost and quality. We should calibrate this based on the types of queries we receive to maximize routing performance. As an example, let's calibrate our threshold for 50% GPT-4 calls using data from Chatbot Arena.
> python -m routellm.calibrate_threshold --routers mf --strong-model-pct 0.5 --config config.example.yaml
For 50.0% strong model calls for mf, threshold = 0.11593

This means that we want to use 0.11593 as our threshold so that approximately 50% of all queries (those that require GPT-4 the most) will be routed to it (see Threshold Calibration for details).

  3. Now, let's update the model field when we generate completions to specify the router and threshold to use:
response = client.chat.completions.create(
  # This tells RouteLLM to use the MF router with a cost threshold of 0.11593
  model="router-mf-0.11593",
  messages=[
    {"role": "user", "content": "Hello!"}
  ]
)

That's it! Now, requests will be routed between the strong and weak model depending on what is required, saving costs while maintaining a high quality of responses.

Depending on your use case, you might want to consider using a different model pair, modifying the configuration, or calibrating the thresholds based on the types of queries you receive to improve performance.
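If you only need the routing decision itself (for example, to call the chosen model through your own gateway), the controller also exposes a route method, as used in some of the issues below; a minimal sketch, assuming the mf router and the calibrated threshold from above:

routed_model = client.route(
  prompt="What's the square root of 144?",
  router="mf",
  threshold=0.11593,
)
print(f"Prompt should be routed to {routed_model}")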

Server & Demo

Instead of using the Python SDK, you can also launch an OpenAI-compatible server that will work with any existing OpenAI client, using similar steps:

> export OPENAI_API_KEY=sk-XXXXXX
> export ANYSCALE_API_KEY=esecret_XXXXXX
> python -m routellm.openai_server --routers mf --strong-model gpt-4-1106-preview --weak-model anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:6060 (Press CTRL+C to quit)

Once the server is launched, you can start a local router chatbot to see how different messages are routed.

python -m examples.router_chat --router mf --threshold 0.11593

Model Support

In the above examples, GPT-4 and Mixtral 8x7B are used as the model pair, but you can modify this using the --strong-model and --weak-model arguments.

We leverage LiteLLM to support chat completions from a wide range of open-source and closed models. In general, you need to set up an API key and point to the provider with the appropriate model name. Alternatively, you can also use any OpenAI-compatible endpoint by prefixing the model name with openai/ and setting the --base-url and --api-key flags.
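For example, a hedged sketch of launching the server against an OpenAI-compatible endpoint (the endpoint URL and weak model name below are placeholders):

> python -m routellm.openai_server --routers mf --strong-model gpt-4-1106-preview --weak-model openai/my-local-model --base-url http://localhost:8000/v1 --api-key EMPTY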

Note that regardless of the model pair used, an OPENAI_API_KEY will currently still be required to generate embeddings for the mf and sw_ranking routers.

Instructions for setting up your API keys for popular providers:

For other model providers, find instructions here or raise an issue.

Motivation

Different LLMs vary widely in their costs and capabilities, which leads to a dilemma when deploying them: routing all queries to the most capable model leads to the highest-quality responses but can be very expensive, while routing queries to smaller models can save costs but may result in lower-quality responses.

LLM routing offers a solution to this. We introduce a router that looks at queries and routes simpler queries to smaller, cheaper models, saving costs while maintaining quality. We focus on routing between 2 models: a stronger, more expensive model and a cheaper but weaker model. Each request is also associated with a cost threshold that determines the cost-quality tradeoff of that request - a higher cost threshold leads to lower cost but may lead to lower-quality responses.
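Concretely, the routing decision reduces to a single comparison per request; a minimal sketch of the rule (the same logic described under Adding a new router below):

def route(prompt, threshold, router, strong_model, weak_model):
    # Route to the strong model when its predicted win rate clears the threshold.
    if router.calculate_strong_win_rate(prompt) >= threshold:
        return strong_model
    return weak_model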

The research in this repository was conducted in collaboration with Anyscale, and we are grateful for their help and support.

Server

RouteLLM offers a lightweight OpenAI-compatible server for routing requests based on different routing strategies:

python -m routellm.openai_server --routers mf --config config.example.yaml
  • --routers specifies the list of routers available to the server. For instance, here, the server is started with one available router: mf (see below for the list of routers).
  • --config specifies the path to the configuration file for the routers. If unspecified, the server will default to using our best-performing configuration (see Configuration for details).

For most use-cases, we recommend the mf router as we have evaluated it to be very strong and lightweight.

When making a request to the server, clients specify the router and cost threshold to use for each request via the model field, in the format router-[ROUTER NAME]-[THRESHOLD]. For instance, a model of router-mf-0.5 specifies that the request should be routed using the mf router with a threshold of 0.5.
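Because the server is OpenAI-compatible, any existing OpenAI client can point at it; a minimal sketch, assuming the server is running locally on the default port 6060 shown earlier:

import openai

client = openai.OpenAI(
  base_url="http://localhost:6060/v1",  # the RouteLLM server launched above
  api_key="EMPTY",  # assumption: the local server does not validate this key
)
response = client.chat.completions.create(
  model="router-mf-0.5",  # mf router with a cost threshold of 0.5
  messages=[{"role": "user", "content": "Hello!"}],
)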

Threshold Calibration

The threshold used for routing controls the cost-quality tradeoff. The range of meaningful thresholds varies depending on the type of router and the queries you receive. Therefore, we recommend calibrating thresholds using a sample of your incoming queries, as well as the % of queries you'd like to route to the stronger model.

By default, we support calibrating thresholds based on the public Chatbot Arena dataset. For example, to calibrate the threshold for the mf router such that 50% of calls are routed to the stronger model:

> python -m routellm.calibrate_threshold --task calibrate --routers mf --strong-model-pct 0.5 --config config.example.yaml
For 50.0% strong model calls for mf, threshold = 0.11593

This means that the threshold should be set to 0.11593 for the mf router so that approximately 50% of calls are routed to the strong model, i.e. using a model field of router-mf-0.11593.

However, note that because we calibrate the thresholds based on an existing dataset, the % of calls routed to each model will differ based on the actual queries received. Therefore, we recommend calibrating on a dataset that closely resembles the types of queries you receive.

Evaluation

RouteLLM also includes an evaluation framework to measure the performance of different routing strategies on benchmarks.

To evaluate a router on a benchmark, you can use the following command:

python -m routellm.evals.evaluate --routers random sw_ranking bert --benchmark gsm8k --config config.example.yaml 
  • --routers specifies the list of routers to evaluate, for instance, random, sw_ranking, and bert in this case.
  • --benchmark specifies the specific benchmark to evaluate the routers on. We currently support: mmlu, gsm8k, and mt-bench.

Evaluation results will be printed to the console. A plot of router performance will also be generated in the current directory (override the path using --output). To avoid recomputing results, the results for a router on a given benchmark are cached by default. This behavior can be overridden by using the --overwrite-cache flag, which takes in a list of routers to overwrite the cache for.
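For example, a sketch of re-running only the mf router on MT Bench while keeping cached results for the other routers:

python -m routellm.evals.evaluate --routers random mf --benchmark mt-bench --config config.example.yaml --overwrite-cache mf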

The results for all our benchmarks have been cached. For MT Bench, we use the precomputed judgements for the desired model pair. For MMLU and GSM8K, we utilized SGLang to compute the results for the desired model pair - the full code for this can be found in the benchmark directories if you would like to evaluate a different model pair.

By default, GPT-4 and Mixtral are used as the model pair for evaluation. To modify the model pair used, set them using the --strong-model and --weak-model flags.

Routers

Out of the box, RouteLLM supports four routers trained on the gpt-4-1106-preview and mixtral-8x7b-instruct-v0.1 model pair, plus a random baseline.

The full list of routers:

  1. mf: Uses a matrix factorization model trained on the preference data (recommended).
  2. sw_ranking: Uses a weighted Elo calculation for routing, where each vote is weighted according to how similar it is to the user's prompt.
  3. bert: Uses a BERT classifier trained on the preference data.
  4. causal_llm: Uses an LLM-based classifier tuned on the preference data.
  5. random: Randomly routes to either model.

While these routers have been trained on the gpt-4-1106-preview and mixtral-8x7b-instruct-v0.1 model pair, we have found that these routers generalize well to other strong and weak model pairs as well. Therefore, you can replace the model pair used for routing without having to retrain these models!

We also provide detailed instructions on how to train the LLM-based classifier in the following notebook.

For the full details, refer to our paper.

Configuration

The configuration for routers is specified in either the config argument for Controller or by passing in the path to a YAML file using the --config flag. It is a top-level mapping from router name to the keyword arguments used for router initialization.

An example configuration is provided in the config.example.yaml file - it provides the configurations for routers that have been trained on Arena data augmented using GPT-4 as a judge. The models and datasets used are all hosted on Hugging Face under the RouteLLM and LMSYS organizations.
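As a sketch, a minimal configuration for the mf router might look like the following (this checkpoint path mirrors the default GPT-4-augmented config referenced in controller.py):

mf:
  checkpoint_path: routellm/mf_gpt4_augmented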

Contribution

We welcome contributions! Please feel free to open an issue or a pull request if you have any suggestions or improvements.

Adding a new router

To add a new router to RouteLLM, implement the abstract Router class in routers.py and add the new router to the ROUTER_CLS dictionary. Then, you can immediately use the new router in the server or evaluation framework.

There is only a single method to implement: calculate_strong_win_rate, which takes in the user prompt and returns the win rate for the strong model conditioned on that prompt - if this win rate is greater than the user-specified cost threshold, then the request is routed to the strong model. Otherwise, it is routed to the weak model.
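A minimal sketch of a new router (the Router base class and ROUTER_CLS registry live in routers.py as described above; the constant win rate here is purely illustrative):

from routellm.routers.routers import ROUTER_CLS, Router

class AlwaysStrongRouter(Router):
    def calculate_strong_win_rate(self, prompt):
        # Illustrative only: claim the strong model always wins,
        # so every request clears any threshold and is routed strong.
        return 1.0

ROUTER_CLS["always_strong"] = AlwaysStrongRouter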

Adding a new benchmark

To add a new benchmark to RouteLLM, implement the abstract Benchmark class in benchmarks.py and update the evaluate.py module to properly initialize the new benchmark class. Ideally, the results for the benchmark should be precomputed to avoid having to regenerate the results for each evaluation run -- see the existing benchmarks for examples on how to do this.

Citation

The code in this repository is based on the research from the paper. Please cite it if you find the repository helpful.

@misc{ong2024routellmlearningroutellms,
      title={RouteLLM: Learning to Route LLMs with Preference Data},
      author={Isaac Ong and Amjad Almahairi and Vincent Wu and Wei-Lin Chiang and Tianhao Wu and Joseph E. Gonzalez and M Waleed Kadous and Ion Stoica},
      year={2024},
      eprint={2406.18665},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2406.18665},
}


routellm's Issues

Provide support for more than two models and provide a training guide.

It looks like this only supports two models, a strong and a weak model. But there are other things to consider, like whether privacy is a concern, whether the question is math-heavy, or whether the question has a visual element, etc.

Why not have a RouteLLM that could route to several arbitrary models (including local, self-hosted, or models-as-a-service like GPT-4)?

And provide some example training scripts and/or a training guide that we could use to fine-tune this.

Wondering how the D_golden training data is constructed.

Congrats that you've made such a great innovation in agents. I'm working on reproducing the paper, maybe using more data. But there are some problems.
In your paper, I understand that $D_{golden}$ is used to select which model gives the right response. However, when both models produce the same result (no matter whether both right or both wrong), what is the winner_model? Should I tag them as a tie? Or should I make $M_{weak}$ the winner? What did you tag in the augmented data? I am grateful to hear your answer!

Can I use Azure OpenAI?

I am trying to use Azure OpenAI but I got this error:

raise OpenAIError( openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

Also, can we use multiple models instead of only two (strong and weak) models?

OpenAIError: The api_key client option must be set

While running the basic example I get this error:

import os
from routellm.controller import Controller

os.environ["OPENAI_API_KEY"] = 'my api'

client = Controller(
  routers=["mf"],
  strong_model="gpt-4o",
  weak_model="gpt-4o-mini",
)

---------------------------------------------------------------------------
OpenAIError Traceback (most recent call last)
Cell In[1], line 2
1 import os
----> 2 from routellm.controller import Controller
      4 os.environ["OPENAI_API_KEY"] = 'sk-XXXXXX'
7 client = Controller(
8 routers=["mf"],
9 strong_model="gpt-4o",
10 weak_model="gpt-4o-mini",
11 )

File ~/playground/agent_2906/0507/RouteLLM/routellm/controller.py:10
7 from litellm import acompletion, completion
8 from tqdm import tqdm
---> 10 from routellm.routers.routers import ROUTER_CLS
12 # Default config for routers augmented using golden label data from GPT-4.
13 # This is exactly the same as config.example.yaml.
14 GPT_4_AUGMENTED_CONFIG = {
15 "sw_ranking": {
16 "arena_battle_datasets": [
(...)
27 "mf": {"checkpoint_path": "routellm/mf_gpt4_augmented"},
28 }

File ~/playground/agent_2906/0507/RouteLLM/routellm/routers/routers.py:17
12 from routellm.routers.causal_llm.llm_utils import (
13 load_prompt_format,
14 to_openai_api_messages,
15 )
16 from routellm.routers.causal_llm.model import CausalLLMClassifier
---> 17 from routellm.routers.matrix_factorization.model import MODEL_IDS, MFModel
18 from routellm.routers.similarity_weighted.utils import (
19 OPENAI_CLIENT,
20 compute_elo_mle_with_tie,
21 compute_tiers,
22 preprocess_battles,
23 )
26 def no_parallel(cls):

File ~/playground/agent_2906/0507/RouteLLM/routellm/routers/matrix_factorization/model.py:4
1 import torch
2 from huggingface_hub import PyTorchModelHubMixin
----> 4 from routellm.routers.similarity_weighted.utils import OPENAI_CLIENT
6 MODEL_IDS = {
7 "RWKV-4-Raven-14B": 0,
8 "alpaca-13b": 1,
(...)
70 "zephyr-7b-beta": 63,
71 }
74 class MFModel(torch.nn.Module, PyTorchModelHubMixin):

File ~/playground/agent_2906/0507/RouteLLM/routellm/routers/similarity_weighted/utils.py:11
8 from sklearn.linear_model import LogisticRegression
10 choices = ["A", "B", "C", "D"]
---> 11 OPENAI_CLIENT = OpenAI()
14 def compute_tiers(model_ratings, num_tiers):
15 n = len(model_ratings)

File ~/anaconda3/envs/langchain/lib/python3.11/site-packages/openai/_client.py:105, in OpenAI.init(self, api_key, organization, project, base_url, timeout, max_retries, default_headers, default_query, http_client, _strict_response_validation)
103 api_key = os.environ.get("OPENAI_API_KEY")
104 if api_key is None:
--> 105 raise OpenAIError(
106 "The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable"
107 )
108 self.api_key = api_key
110 if organization is None:

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

Allow Separate Base URLs for Strong and Weak Models in Controller

For my use case, I have different models running on different servers which all replicate the OpenAI completions endpoint. However, from what I can see, it is currently not possible to use both the default OpenAI base URL and a custom server's base URL when defining the controller (or two separate server base URLs). There is functionality to create a Controller client from within a local server running a model, but what if there is no access to run the RouteLLM code from within the server hosting the model.

It would be great if the controller could be provided with a separate base URL for a strong and weak model as, from my understanding, right now, if a base_url is provided, it overrides the base_url used for both the strong and weak model.

Example Use Case:
I want to use an OpenAI model like GPT-4o (using the OpenAI completions endpoint) and an open-source model like Mistral running on a custom server with a custom URL (replicating the OpenAI completion endpoint).

Actual Behavior:

  • Without providing a base_url parameter to the Controller class, the controller cannot call the Mistral model as it defaults to using the official OpenAI completions endpoint.
  • If a custom base_url is provided, the Mistral model works but GPT-4o does not, as the GPT-4o model is not found at this new endpoint.

Steps to Reproduce:

  • Define a controller without a base_url parameter.
    
  • Attempt to call a model (e.g., Mistral) hosted on a custom server with its own URL.
    
  • Define a controller with a custom base_url.
    
  • Attempt to call an OpenAI model (e.g., GPT-4o).
    

Current Workaround:
I am currently solving this in a rudimentary way by checking whether the model called in the Controller.completion function (found within kwargs["model"]) matches the strong_model string or the weak_model string and using the corresponding base_url ive provided for each model.

Proposed Solution:
Introduce functionality in the Controller class to allow specifying separate base URLs for the strong and weak models.

controller = Controller(
    strong_model="gpt-4o",
    weak_model="openai/mistral",
    strong_model_base_url="https://api.openai.com/v1" # or just None,
    weak_model_base_url="http://custom-endpoint.com/v1"
)

Not sure if I am just misunderstanding something and this functionality does exist. Thank you!

Trouble understanding datasets used

Hello,

I have not clearly understood the format and source of the datasets used to train these routers. They are said to be published on Hugging Face, but, for example, I can't find the dataset that is used to train routellm/mf_gpt4_augmented. As I understood from the code (train_matrix_factorization.py), there has to be a JSON dataset with the keys idx, model_a, model_b, and winner. But there is no such dataset on Hugging Face. Could you clarify the format and the creation of the dataset that is used for mf training?

Gemini Pro from OpenRouter has a context length of 91k but I'm getting an error

openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 8977 tokens (8977 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

I am using RouteLLM with OpenRouter, and this is the error I am getting when using Google's Gemini Pro.

Cannot initialize the RouteLLM controller

When I initialize the RouteLLM controller with the demo code, I get the errors below:

import os
from routellm.controller import Controller

os.environ["OPENAI_API_KEY"] = "sk-xxxxxxx"
os.environ["ANYSCALE_API_KEY"] = "esecret_xxxxx"

client = Controller(
  routers=["mf"],
  strong_model="gpt-4-1106-preview",
  weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)

response = client.chat.completions.create(
  model="router-mf-0.11593",
  messages=[
    {"role": "user", "content": "Hello!"}
  ]
)
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/routellm/openai_server.py", line 22, in <module>
    from routellm.controller import Controller, RoutingError
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/routellm/controller.py", line 10, in <module>
    from routellm.routers.routers import ROUTER_CLS
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/routellm/routers/routers.py", line 17, in <module>
    from routellm.routers.matrix_factorization.model import MODEL_IDS, MFModel
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/routellm/routers/matrix_factorization/model.py", line 4, in <module>
    from routellm.routers.similarity_weighted.utils import OPENAI_CLIENT
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/routellm/routers/similarity_weighted/utils.py", line 8, in <module>
    from sklearn.linear_model import LogisticRegression
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/__init__.py", line 84, in <module>
    from .base import clone
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/base.py", line 19, in <module>
    from .utils._estimator_html_repr import _HTMLDocumentationLinkMixin, estimator_html_repr
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/utils/__init__.py", line 11, in <module>
    from ._chunking import gen_batches, gen_even_slices
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/utils/_chunking.py", line 8, in <module>
    from ._param_validation import Interval, validate_params
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/utils/_param_validation.py", line 14, in <module>
    from .validation import _is_arraylike_not_scalar
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/utils/validation.py", line 26, in <module>
    from ..utils._array_api import _asarray_with_order, _is_numpy_namespace, get_namespace
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/utils/_array_api.py", line 11, in <module>
    from .fixes import parse_version
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 20, in <module>
    import scipy.stats
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/stats/__init__.py", line 606, in <module>
    from ._stats_py import *
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/stats/_stats_py.py", line 49, in <module>
    from . import distributions
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/stats/distributions.py", line 11, in <module>
    from . import _discrete_distns
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/stats/_discrete_distns.py", line 10, in <module>
    from scipy.interpolate import interp1d
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/interpolate/__init__.py", line 167, in <module>
    from ._interpolate import *
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/interpolate/_interpolate.py", line 14, in <module>
    from . import _fitpack_py
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/interpolate/_fitpack_py.py", line 8, in <module>
    from ._fitpack_impl import bisplrep, bisplev, dblint  # noqa: F401
  File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/interpolate/_fitpack_impl.py", line 103, in <module>
    'iwrk': array([], dfitpack_int), 'u': array([], float),
TypeError

I use Python 3.9.5 and installed routellm with

python3 -m venv .routellm
source .routellm/bin/activate

pip install "routellm[serve,eval]"

Please help me find the problem, thanks!

dep version not pinned

it's more chill to pin all the deps. #48 already pins Python>=3.10 but I don't know what the minimum versions you want are.

Also, #46 needed to happen before I could run the examples from the repo.

Will draft some pins. pdm will do it for me.

OpenAI Error

Can I ask if we must have an OpenAI API key to use RouteLLM? Currently, I'm using Groq as the weak model and Anthropic as the strong model, and it shows this error:

openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable.

Support for Multiple LiteLLM Models

Great job with the current routing setup!
I’m wondering if there’s a possibility to expand the routing capabilities to include multiple LiteLLM models. Currently, it seems we can only route between one strong and one weak model.

Here are some examples of why it would be beneficial:

  • Microsoft-PHI: Useful for enterprise tasks and Microsoft integrations.

  • Google-Gemma: Great for tasks that involve Google’s ecosystem.

  • Meta-Llama3: Ideal for open-source and research-based queries.

  • Azure-GPT: Perfect for projects, troubleshooting services, or offering guidance on optimizing cloud infrastructure.

  • And, as we currently have, a strong model like OpenAI GPT for high-quality responses on complex queries.

Supporting these models would help optimize costs and improve how we handle various data types and queries. Is there a way to integrate this with RouteLLM? Any advice or guidance would be greatly appreciated, let me know by when it will be published.

404 Client Error.

routellm version - 0.2.0
Python 3.10.14

import os

os.environ["OPENAI_API_KEY"] = "XXX"
os.environ["GROQ_API_KEY"] = "YYY"

from routellm.controller import Controller

client = Controller(
  routers=["mf"],
  strong_model="gpt-3.5-turbo",
  weak_model="llama3-8b-8192"
)

response = client.chat.completions.create(
  # This tells RouteLLM to use the MF router with a cost threshold of 0.11593
  model="router-mf-0.11593",
  messages=[
    {"role": "user", "content": "hi, how are you"}
  ]
)

message_content = response['choices'][0]['message']['content']
model_name = response['model']

print(f"Message content: {message_content}")
print(f"Model name: {model_name}")

Error:

HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/routellm/mf_gpt4_augmented/resolve/main/pytorch_model.bin

The above exception was the direct cause of the following exception:

EntryNotFoundError Traceback (most recent call last)
Cell In[1], line 8
4 os.environ["GROQ_API_KEY"] = "XX"
6 from routellm.controller import Controller
----> 8 client = Controller(
9 routers=["mf"],
10 strong_model="gpt-3.5-turbo",
11 weak_model="llama3-8b-8192"
12 )
14 response = client.chat.completions.create(
...
285 )

EntryNotFoundError: 404 Client Error. (Request ID: Root=1-66ab5a67-46661bc90bdf3b0f6b75110c;8df487c1-691d-4d82-af40-8eee63fd32ff)

How can we set parameters of the model like temperature, streaming, etc.?

Hi folks,
I have configured my application as shown here. I want to change the parameters of the model; could you please suggest how I can pass my own parameters here? I am trying to implement this in a RAG application.

client = Controller(
  routers=["mf"],
  strong_model="gpt-4-1106-preview",
  weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
  progress_bar=True
)

response = resources.routellm.chat.completions.create(
    # This tells RouteLLM to use the MF router with a cost threshold of 0.1439
    model="router-mf-0.1439",
    messages=[
                {"role": "system", "content":"""Handle chit-chat gracefully, if the user is greeting then greet them back.
                You are an honest and helpful AI assistant. Your job is to understand the Users 
                questions and only make use of the context provided to answer it clearly and precisely, 
                be descriptive, use amounts, values and percentages wherever necessary.
                Always include all necessary details. Stick to the context and never use any previous information.
                If no information is provided in the context then refrain from giving wrong answers.
                No preamble."""},
                {"role": "user", "content": query},  # Assuming 'query' is the user's input
                {"role": "assistant", "content": context}  # Including the context in the conversation
             ]
    )

GPT-4o not found in MODEL_IDS for mf router

  File "/home/andreas/PycharmProjects/RouteLLM/.venv/lib/python3.10/site-packages/starlette/routing.py", line 732, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/andreas/PycharmProjects/RouteLLM/routellm/openai_server.py", line 40, in lifespan
    ROUTERS_MAP[router] = ROUTER_CLS[router](**router_config)
  File "/home/andreas/PycharmProjects/RouteLLM/routellm/routers/routers.py", line 231, in __init__
    self.strong_model_id = MODEL_IDS[strong_model]
KeyError: 'gpt-4o'

I see there is a predefined list of LLMs in the MODEL_IDS dictionary.

Is there a way to specify arbitrary models?

For example, in my use case I want to use gpt-4o as the strong model and Qwen/Qwen2-72B-Instruct with Together.ai as the inference provider.

Is there a methodology to generate matrix factorization data for any model pair?

BERT model working

I would like to know how the BERT classifier decides between strong and weak models. Where can I see the workings of bert_gpt4_augmented?

insufficient_quota error

I set up my server successfully, but when I ask even a simple question the server responds with the following error:

INFO:     127.0.0.1:63236 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\uvicorn\protocols\http\httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\fastapi\applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\middleware\errors.py", line 186, in __call__
    raise exc
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\middleware\errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\middleware\exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\routing.py", line 77, in app 
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\routing.py", line 72, in app 
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\fastapi\routing.py", line 278, in app  
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\Desktop\Bisrat files\RouteLLM\routellm\openai_server.py", line 153, in create_chat_completion    
    routed_model = route_fn(prompt, threshold, ROUTED_PAIR)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\Desktop\Bisrat files\RouteLLM\routellm\routers\routers.py", line 42, in route
    if self.calculate_strong_win_rate(prompt) >= threshold:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\Desktop\Bisrat files\RouteLLM\routellm\routers\routers.py", line 235, in calculate_strong_win_rate
    winrate = self.model.pred_win_rate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\Desktop\Bisrat files\RouteLLM\routellm\routers\matrix_factorization\model.py", line 124, in pred_win_rate
    logits = self.forward([model_a, model_b], prompt)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\Desktop\Bisrat files\RouteLLM\routellm\routers\matrix_factorization\model.py", line 113, in forward
    OPENAI_CLIENT.embeddings.create(input=[prompt], model=self.embedding_model)
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\resources\embeddings.py", line 114, in create
    return self._post(
           ^^^^^^^^^^^
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1261, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 942, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1026, in _request
    return self._retry_request(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1074, in _retry_request
    return self._request(
           ^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1026, in _request
    return self._retry_request(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1074, in _retry_request
    return self._request(
           ^^^^^^^^^^^^^^
  File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1041, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

I don't understand how it could say the quota limit is exceeded before even a single response. Please help me navigate this error.

Can we use 2 Ollama models?

Why does every single framework force everyone to use OpenAI? Please allow using 2 Ollama models, for example llama3:8b as the weak model and llama3:70b as the strong model. We also need support for more models; what if I want an SQL model in there for SQL queries or something else?

Matrix factorization router is bloated

The current matrix factorization router (MFModel) is unnecessarily complex. Given that all operations in the forward pass are linear with no activations, we can significantly simplify this model.

Currently, we're doing several steps:

  1. Embedding model IDs
  2. Normalizing embeddings
  3. Projecting text embeddings
  4. Element-wise multiplication
  5. Linear classification

Since these are all linear operations, they can be collapsed into a single matrix multiplication, embedding * model. This would:

  • Reduce code complexity
  • Improve performance
  • Decrease the number of parameters
For an example of what this would look like, here's a flattened vector for the `mixtral-8x7b-instruct-v0.1` model:
[0.017317, 0.01118, 0.303653, 0.579448, -0.376634, -0.208742, -0.193392, 0.639659, -0.085909, 0.107312, 0.300785, -0.349391, -0.384368, -0.145022, 0.317397, -0.063074, -0.128751, 0.243364, -0.181707, 0.808825, 0.275169, 0.666149, -0.115858, 0.155953, 0.24292, -0.197154, -0.157491, 0.11632, 0.197647, 0.040279, 0.409797, -1.24056, -0.511287, -0.393113, -0.108808, 0.039914, 0.366597, 0.135737, 0.198802, 0.119974, 0.153426, -0.22505, 0.674797, 0.284063, -0.196429, 0.155066, -0.212335, -0.363016, 0.212736, 0.211674, -0.372157, 0.010955, 0.037939, -0.066029, -0.07933, -0.101132, -0.311588, 0.077285, -0.207608, 0.125983, 0.510143, -0.255973, -0.096116, 0.229892, -0.434007, 0.344456, -0.137472, 0.41125, -0.052777, 0.06959, -0.043151, -0.062137, 0.162818, 0.041656, 0.077479, 0.126347, 0.061875, 0.116124, 0.247373, 0.453157, -0.101855, 0.040579, -0.552021, 0.12112, -0.823787, -0.296899, -0.46667, 0.022095, -0.310721, -0.401873, 0.016014, 0.548683, -0.438079, 0.239599, 0.445288, 0.132415, 0.160069, 0.509489, 0.058122, 0.108559, -0.005905, -0.425724, -0.189577, 0.053441, 0.535769, -0.008355, 0.142684, 0.009374, -0.219168, 0.033156, -0.420615, -0.145288, -0.135326, -0.172469, -0.371276, -0.215616, -0.413526, -0.300192, 0.005224, -0.410654, -0.407338, -0.086193, 0.244957, -0.122847, -0.180609, -0.221066, -0.014492, -0.125457, 0.077016, -0.481223, 0.571043, -0.406598, 0.58677, -0.793018, -0.574046, 0.168964, -0.140509, 0.062438, -0.574914, 0.517542, 0.305398, -0.040312, 0.133368, 0.227152, 0.194301, -0.204837, 0.117291, 0.12243, -0.357704, -0.12873, -0.305825, 0.041006, -0.307598, 0.295055, -0.178294, -0.434973, 0.395057, 0.150889, -0.384117, 0.362086, -0.52113, 0.616021, 0.011602, -0.140464, -0.408201, 0.412563, -0.194249, -0.455087, -0.418446, 0.00887, -0.351214, 0.140768, 0.436393, 0.102469, 0.367923, -0.026261, 0.122515, -0.436895, -0.119046, -0.242179, -0.215407, 0.443827, -0.048456, -0.215488, -0.181659, 0.28717, 0.001767, 0.122558, -0.494734, 0.113986, -0.307884, 0.331145, 0.101143, 0.196236, 0.120081, -0.567916, 0.431674, 0.210675, 0.397218, -0.003222, -0.124574, 0.163897, -0.012514, -0.437356, -0.083122, -0.277771, -0.072012, 0.215686, 0.06413, -0.13142, 0.094984, -0.38486, -0.072067, 0.495137, -0.393166, -0.230718, -0.116048, 0.712394, 0.279401, 0.238164, 0.041076, 0.148722, 0.580803, 0.614566, -0.147473, 0.432371, 0.713287, -0.012816, 0.17443, 0.122719, -0.168159, 0.062227, -0.511618, -0.242144, -0.15323, 0.176365, -0.331397, -0.130046, 0.520083, -0.236528, -0.234034, -0.201974, -0.235412, 0.408897, -0.56152, 0.197764, -0.57766, -0.745011, -0.153192, 0.378314, 0.060145, -0.132778, 0.742793, 0.08398, -0.493689, 0.071289, 0.147504, -0.078614, 0.068797, -0.36456, -0.150841, -0.128449, -0.523257, -0.515868, -0.293334, -0.1087, -0.216722, -0.37464, -0.362562, 0.145622, 0.273712, -0.309493, 0.331044, -0.169746, -0.116288, 0.106025, 0.075529, -0.081049, 0.245917, 0.180469, -0.562014, -0.29112, 0.020882, 0.134045, 0.106949, 0.326906, 0.184262, 0.028326, 0.03369, -0.251042, 0.196618, -0.420975, -0.021204, -0.00376, 0.19101, -0.335425, -0.217719, 0.111878, -0.016975, 0.30771, 0.433765, 0.150516, -0.073278, 0.171964, -0.305194, 0.080526, 0.08366, -0.170164, 0.442168, 0.106601, -0.04912, -0.071456, 0.259819, -0.111718, 0.138566, -0.60584, 0.147761, 0.152774, 0.057143, -0.759514, 0.069949, -0.825877, 0.335864, 0.199449, -0.266243, -0.403074, 0.20985, -0.188599, -0.019244, -0.375069, -0.421033, -0.1462, -0.220748, -0.061277, -0.135211, -0.141422, -0.215439, 0.091185, -0.007891, -0.274731, -0.594342, 
-0.451199, 0.021285, 0.272036, -0.007255, 0.172055, -0.032696, 0.376444, -0.175173, 0.255335, -0.264267, 0.083475, 0.179118, 0.091082, 0.260392, 0.171118, 0.421613, -0.558687, -0.341742, -0.279588, -0.10411, -0.058658, 0.086409, 0.492655, -0.210353, 0.551876, -0.128579, -0.1514, 0.193864, 0.246684, -0.30106, 0.512475, -0.348025, 0.269122, -0.478439, 0.593487, 0.375225, 0.332428, 0.340556, 0.264401, 0.087356, 0.632642, 0.088945, -0.560939, 0.390676, 0.162052, 0.411639, -0.289915, -0.632261, -0.413713, 0.355988, -0.485467, 0.383603, 0.303537, -0.381534, -0.092763, 0.417598, 0.803573, -0.405532, -0.347625, 0.285436, -0.178229, -0.400952, -0.26588, 0.369776, 0.145592, -0.439068, -0.263021, -0.086975, -0.229377, 0.395024, -0.386163, 0.785113, -0.064416, 0.236719, -0.15956, 0.083073, -0.106436, 0.145759, 0.221116, 0.033651, -0.142519, -0.135173, -0.163156, 0.111862, 0.309598, 0.234952, -0.487281, 0.028245, -0.869042, 0.15329, 0.262507, 0.154243, 0.327218, -0.018497, -0.247377, -0.144596, 0.131668, 0.129669, 0.204863, 0.201405, 0.348766, -0.056677, 0.067078, -0.473868, 0.084789, 0.342684, 0.190402, -0.130603, -0.294488, -0.053648, -0.393713, -0.288915, -0.081103, 0.031378, 0.284718, 0.303025, 0.199114, 0.252887, 0.023813, 0.317988, -0.108757, 0.021954, 0.25852, 0.133369, 0.033529, 0.348155, 0.484368, -0.082799, -0.478818, 0.139661, -0.112743, 0.242792, 0.214462, -0.035537, 0.198981, 0.720355, 0.105166, 0.118839, -0.398867, 0.056433, 0.290868, -0.166652, -0.187253, -0.025869, 0.348516, -0.194342, -0.018384, -0.707275, -0.212753, -0.342487, 0.366088, 0.502167, -0.481192, 0.002395, -0.303666, -0.496359, -0.437127, 0.070926, -0.164983, 0.493785, 0.254769, -0.330188, 0.69813, 0.110217, -0.165885, -0.09634, 0.056251, 0.016961, 0.357136, -0.484442, 0.201198, 0.066619, -0.262838, -0.326917, 0.323244, 0.05816, 0.367642, -0.142585, -0.375839, -0.334615, 0.190125, -0.029278, 0.099811, 0.143965, -0.275176, -0.58817, -0.144267, -0.359548, -0.310475, 0.534627, -0.576121, -0.194085, 0.052187, -0.187249, -0.235678, -0.122017, 0.375999, -0.086289, -0.127235, 0.080998, 0.181145, 0.157067, -0.26179, -0.451746, -0.135946, -0.236, 0.052646, 0.336281, 0.21719, 0.186457, -0.002216, -0.056215, -0.369369, 0.442009, -0.228632, -0.175233, -0.292619, 0.18085, -0.222465, -0.071054, -0.036178, 0.42584, -0.052242, -0.186202, 0.438149, -0.189797, -0.16556, 0.036239, -0.02704, -0.254496, -0.069539, -0.232275, -0.695319, 0.460565, 0.33609, 0.51992, -0.283644, 0.143016, 0.185549, 0.012047, -0.222176, -0.130095, -0.261126, -0.422626, 0.286046, 0.318453, -0.25702, 0.280548, 0.066077, 0.205378, 0.221395, 0.134313, 0.202538, -0.112085, 0.112352, -0.311995, -0.114661, -0.305415, 0.163122, -0.162758, 0.064207, 0.100317, -0.297041, 0.153704, 0.412633, -0.236838, -0.213884, 0.043544, 0.078991, 0.026837, 0.399776, -0.292028, -0.702604, 0.238641, -0.057664, -0.338922, 0.101509, -0.030345, -0.092672, 0.189603, -0.184702, -0.224473, 0.232278, 0.167241, 0.204301, -0.074669, -0.31327, -0.069146, 0.169052, 0.34982, 0.001693, 0.495445, 0.169925, -0.079298, -0.00096, 0.068827, -0.110808, 0.049159, -0.156822, 0.033281, -0.138699, 0.064114, -0.183973, 0.299447, 0.020633, -0.394375, 0.22391, 0.29888, -0.162223, -0.154018, 0.0686, 0.091588, 0.010075, 0.177063, 0.337276, -0.258455, -0.172135, -0.309286, 0.11186, -0.063176, -0.131384, -0.117094, -0.025922, 0.217625, 0.064211, 0.097853, 0.21063, 0.209421, -0.003702, -0.12937, 0.568447, 0.056538, 0.071752, 0.131685, 0.265961, 0.13205, -0.342845, -0.14158, 0.327599, 0.206992, 0.380256, -0.092596, 
-0.077388, -0.19744, 0.0181, 0.287433, 0.088687, 0.097779, -0.044891, -0.404558, 0.147617, 0.422414, 0.11152, 0.308355, -0.106925, 0.204491, 0.043149, 0.065036, -0.753266, 0.122351, 0.336833, -0.00801, -0.262349, -0.193282, -0.103019, -0.089863, 0.171337, 0.309414, 0.014423, 0.098344, -0.110209, -0.169665, -0.030896, -0.097471, 0.00666, 0.101595, 0.061852, 0.176964, -0.21323, -0.099782, 0.228022, -0.262198, -0.425247, 0.417079, 0.017299, -0.191564, 0.004748, -0.250221, 0.234701, -0.271065, -0.057453, 0.304677, 0.4701, 0.250589, -0.087086, -0.429968, -0.26403, -0.387913, -0.464612, -0.342326, -0.071384, 0.056032, 0.187852, 0.380555, 0.189432, 0.34011, 0.266143, 0.009143, -0.317522, -0.234059, 0.276891, 0.174809, 0.140528, -0.105288, -0.65848, 0.084518, -0.234592, 0.318019, 0.510351, 0.006479, 0.537869, -0.392096, -0.411233, -0.189889, 0.134191, -0.075683, -0.169409, 0.125705, -0.327027, -0.066445, -0.52144, -0.097577, -0.177766, 0.232948, -0.135097, -0.343601, -0.091137, 0.062618, 0.053287, 0.18644, -0.6094, 0.048837, 0.267879, -0.413453, -0.141747, 0.207981, -0.04925, -0.174698, -0.509869, -0.476397, 0.068638, -0.152651, 0.104868, 0.197331, -0.064872, -0.1051, -1.40418, -0.194817, 0.208227, -0.045253, -0.232286, 0.073835, 0.12477, 0.393212, 0.347051, -0.187002, 0.079182, -0.27366, -0.215268, 0.375153, 0.270839, -0.334651, -0.126299, 0.34891, -0.174526, 0.234166, -0.317101, 0.057596, -0.157946, 0.15384, 0.16841, 0.158807, -0.192711, 0.192967, -0.262208, 0.108206, 0.238273, 0.236885, -0.399003, 0.221671, 0.038937, -0.107384, 0.288186, 0.160961, -0.086901, 0.055572, -0.190251, -0.233012, -0.054056, -0.080065, 0.111019, -0.044721, 0.036763, 0.068096, -0.017873, 0.261569, 0.346434, 0.065229, -0.023851, -0.330086, 0.213761, 0.128141, -0.138356, -0.062674, 0.195684, 0.215495, 0.194634, -0.339133, -0.268465, -0.298594, -0.362164, -0.253306, -0.168292, 0.199113, -0.524123, -0.090773, -0.096247, 0.046664, -0.046513, 0.13497, 0.114262, -0.488398, -0.2347, 0.26051, 0.031243, -0.152594, 0.258885, -0.064539, -0.176934, -0.027078, 0.197796, -0.050404, 0.004199, -0.020745, -0.127675, 0.053641, 0.515427, 0.131214, 0.353022, 0.284469, 0.01992, 0.120054, -0.318418, -0.026164, 0.306722, 0.035191, 0.425452, 0.046934, 0.010072, -0.134704, -0.118026, 0.033954, 0.444288, 0.004718, 0.035425, -0.030341, 0.394551, -0.165347, -0.115437, -0.017297, -0.585792, 0.17584, 0.377414, 0.421793, 0.188193, 0.307312, 0.610973, -0.196335, -0.29751, -0.105334, 0.199592, -0.195532, -0.095663, 0.142824, 0.130411, -0.080841, 0.202719, 0.471838, -0.072826, 0.246151, 0.109777, -0.101721, 0.169312, 0.54931, -0.074526, 0.021988, -0.096728, -0.223985, -0.058271, 0.23175, -0.332564, 0.169538, -0.225755, 0.046639, 0.136866, -0.158008, 0.114861, 0.065593, -0.117845, 0.490567, -0.378452, 0.408763, 0.048036, 0.315145, -0.041749, 0.309414, 0.031155, 0.347439, -0.051953, -0.201888, 0.179567, 0.17787, 0.152476, -0.050791, 0.420996, -0.111863, 0.110077, 0.268456, -0.074361, -0.144558, 0.119518, 0.188343, 0.396397, -0.381355, 0.012706, 0.245918, 0.26378, 0.207468, 0.06862, 0.268775, 0.503796, -0.042588, 0.299801, 0.264099, 0.567906, 0.343754, 0.112813, -0.058419, 0.151873, 0.105714, 0.013268, -0.104881, 0.179048, 0.103319, 0.155907, -0.207802, -0.594822, 0.001902, 0.334797, -0.128813, 0.02412, 0.158227, 0.232278, -0.168783, -0.101024, 0.001426, -0.334838, -0.25871, -0.281469, 0.175912, 0.173545, 0.199818, 0.156694, -0.202074, -0.528855, 0.341782, -0.294037, -0.567092, 0.042527, 0.229844, -0.274017, 0.111275, 0.022757, -0.276101, 0.432179, 
0.322151, -0.11445, 0.865446, 0.367544, 0.267589, 0.00913, -0.410267, 0.137246, -0.013712, 0.620266, -0.091809, -0.297659, -0.373554, 0.207084, -0.421513, -0.183964, -0.156403, 0.219091, -0.508866, 0.516564, -0.361563, -0.201876, 0.202988, 0.183052, -0.22674, 0.057602, 0.041183, -0.211405, 0.247517, 0.204372, 0.042675, -0.214661, -0.111943, 0.009249, -0.014273, -0.351459, 0.070249, -0.315316, 0.133022, -0.073426, -0.180068, -0.333467, -0.067528, 0.357887, 0.430013, 0.131229, 0.298485, 0.373571, -0.302588, -0.04142, -0.344667, -0.283525, 0.640575, 0.317337, 0.401381, 0.189486, 0.073186, 0.02416, -0.215443, 0.056143, 0.120336, -0.231008, -0.105986, -0.453503, -0.219785, -0.030274, -0.367342, -0.113358, 0.196147, 0.291157, 0.326472, 0.446857, -0.085561, 0.010959, 0.066616, 0.15023, -0.209559, -0.112984, 0.072598, -0.427699, -0.260073, 0.032521, 0.081192, -0.014159, 0.143266, 0.197289, 0.067981, 0.173343, -0.155237, 0.193014, -0.033441, -0.270513, 0.12482, -0.140087, -0.524852, -0.142413, -0.197585, 0.069683, 0.00106, -0.060416, 0.241788, -0.273508, 0.014679, -0.066452, -0.355985, -0.262008, 0.26785, -0.009632, 0.163352, -0.068926, 0.46138, -0.317769, -0.397394, 0.224559, 0.352467, -0.097191, -0.287376, 0.408935, 0.345993, 0.09068, 0.2473, 2.3111, -0.14702, -0.111799, -0.052716, 0.230692, 0.225265, -0.35181, 0.094639, -0.154193, 0.185283, -0.315491, -0.077438, 0.24265, -0.103315, -0.156623, -0.086985, -0.316301, 0.000796, -0.025065, -0.097864, -0.362233, -0.448295, -0.403811, 0.258856, -0.100113, -0.055167, 0.294756, 0.024366, 0.102181, -0.106253, 0.023481, 0.160745, 0.063656, 0.155556, -0.336469, 0.325614, -0.266145, -0.074525, 0.201849, 0.441004, -0.174538, 0.131324, 0.284181, -0.261139, 0.098757, -0.019434, -0.194059, -0.108849, -0.072083, -0.093592, -0.285213, -0.176247, 0.069006, 0.297378, -0.025485, 0.268425, -0.101778, 0.018244, 0.776521, 0.297483, 0.251349, -0.167599, -0.30711, 0.070886, 0.01418, 0.285411, -0.430578, -0.237813, 0.059797, 0.027026, -0.0401, 0.143306, -0.469388, 0.055392, 0.137084, 0.284571, 0.189084, -0.405384, 0.135162, -0.680802, -0.434545, -0.210474, 0.30213, 0.114895, 0.167591, -0.307093, -0.255949, 0.242898, 0.187186, 0.3594, -0.125649, 0.174752, 0.301497, -0.150837, 0.118552, 0.144685, 0.023964, 0.20746, -0.186843, 0.230801, 0.11998, 0.099391, -0.390997, 0.242291, -0.209336, -0.369022, 0.225537, -0.254627, -0.19489, 0.007398, 0.30297, -0.100568, -0.039901, -0.267365, 0.17685, 0.032181, -0.051405, -0.003954, 0.061989, -0.398622, -0.102953, 0.230554, 0.369276, -0.32691, 0.121757, 0.282954, 0.275177, 0.301383, -0.048143, -0.102173, 0.270449, 0.326503, 0.356696, 0.198148, 0.566387, 0.118633, 0.069914, 0.049507, 0.264942, -0.021149, -0.315653, 0.195143, -0.037403, -0.560274, 0.036958, 0.226462, -0.187307, 0.00932, 0.06245, 0.158091, -0.02271, 0.303259, -0.281134, 0.229444, 0.202054, -0.022002, -0.175618, -0.035272, -0.416639, -0.079588, -0.190756, 0.237299, 0.128946, -0.025495, 0.31631, 0.165038, -0.036987, -0.056892, -0.472618, -0.240427, 0.258912, 0.142983, -0.017613, 0.09934, 0.301944, -0.317137, -0.045731, 0.176888, -0.237915, 0.034828, -0.244753, -0.262084, 0.007381, 0.179293, 0.012775, 0.134795, -0.16332, -0.444582, -0.080167, 0.024672, -0.090209, -0.09143, 0.177423, 0.066397, -0.464973, 0.473688, 0.156524, -0.011874, -0.018553, 0.049021, -0.058733, -0.16094, -0.055641, 0.084314, -0.180604, -0.147321, 0.507487, 0.259353, 0.214523, 0.136566, 0.10569, -0.117942, 0.207137, 0.524199, 0.176873, 0.319673, 0.065076, 0.200993, 0.067377, -0.128274, -0.148678, -0.369512, 
-0.073067, 0.022234, -0.376015, -0.161213, -0.004808, -0.385252, -0.063738, 0.172607, -0.040167, -0.120519, 0.296494, -0.195137, 0.055634, 0.323904, -0.638334, -0.255347, -0.100382, 0.251132, -0.055979, 0.004391, -0.289993, -0.004406, 0.050617, 0.410566, 0.452379, -0.556643, 0.081581, 0.137408, 0.254382, 0.251986, 0.082583, -0.024478, -0.477649, 0.310222, 0.211715, 0.022005, 0.063267, -0.130571, 0.155438, 0.380635, 0.231092, 0.099042, -0.391679, -0.058661, -0.540002, -0.358878, -0.324142, 0.243863, -0.400055, 0.103157, -0.262598, -0.044676, -0.444585, 0.030034, 0.01668, 0.311564, 0.543531, -0.047709, -0.113976, -0.304748, -0.150807, -0.274888, 0.024604, -0.183968, 0.024504, 0.393683, -0.430544, -0.323938, 0.306146, -0.039433, -0.189903, 0.057104, 0.19676, 0.036725, 0.079969, -0.205473, -0.314785, 0.030175, -0.049927, 0.061419, -0.36235, -0.056072, 0.159138, 0.456674, 0.007084, 0.441482, -0.175448, 0.061765, 0.412505, -0.402356, -0.084174, 0.085337, -0.180057, 0.284374, 0.031825, 0.15114, 0.045856, 0.362218, 0.371848, 0.142496, 0.376347, 0.309523, 0.437986, -0.178713, -0.200895, -0.046065, 0.183416, -0.31115, 0.299963, -0.005362, 0.397519, -0.025268, 0.382294, -0.424654, -0.169118, 0.246686, -0.017109, -0.480841, -0.132066, 0.066515, -0.014366, 0.487456, -0.023139, 0.006938, 0.314802, 0.340747, -0.010792, 0.064729, 0.304637, 0.072488, -0.257531, -0.164407, -0.238009, 0.251726, 0.442151, -0.439882, -0.096664, 0.030146, -0.100694, -0.168094, -0.193923, 0.46795, 0.080172, 0.063586, -0.328571, -0.16416, -0.259619, 0.293085, -0.279067, 0.232538, 0.033095, -0.198362, -0.305268, -0.361208, 0.034213, 0.427696, -0.033954, -0.227259, 0.01694, -0.551509, -0.055286, -0.099024, 0.267421, 0.104194, 0.000865, -0.088973, 0.200319]

Server doesn't respond to CORS preflight requests

The server fails for some clients/chat interfaces trying to connect to it because it doesn't respond to preflight (CORS) requests, causing the client to not POST completion requests.

The following server output is observed:

INFO:     127.0.0.1:13579 - "OPTIONS /dashboard/billing/usage?start_date=2024-08-01&end_date=2024-08-07 HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:13580 - "OPTIONS /dashboard/billing/subscription HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:13590 - "OPTIONS /v1/v1/chat/completions HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:13645 - "OPTIONS /v1/chat/completions HTTP/1.1" 405 Method Not Allowed
INFO:     127.0.0.1:13655 - "OPTIONS /dashboard/billing/usage?start_date=2024-08-01&end_date=2024-08-07 HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:13656 - "OPTIONS /dashboard/billing/subscription HTTP/1.1" 404 Not Found

PR incoming

RouteLLM usage instructions please

Hi there,
First, I want to thank you for this great project. I have successfully installed and configured RouteLLM on my machine, but I cannot find any information on how to execute it. Would you kindly provide some examples of how to use the tool? In particular, I want an example of how to pass the prompt and get back the "model name".

Thanks so much,

Pirouz
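
A minimal sketch of what Pirouz is asking for, based on the `client.route(...)` call quoted in the threshold question later on this page; the model pair and threshold here are placeholders borrowed from other issues:

# Hedged sketch: return only the routed model name for a prompt,
# without generating a completion.
from routellm.controller import Controller

client = Controller(
    routers=["mf"],
    strong_model="gpt-4o",      # placeholder model pair
    weak_model="gpt-4o-mini",
)
routed_model = client.route(
    prompt="Write a haiku about routing.",
    router="mf",
    threshold=0.11593,          # calibrated threshold, taken from other issues here
)
print(f"Prompt should be routed to {routed_model}")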

About an OpenAI API key error

The code below works fine:

from openai import OpenAI

base_url = "http://xxxx"
api_key = "yummy"
client = OpenAI(base_url=base_url, api_key=api_key)
response = client.chat.completions.create(
    model="gpt4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant; help humans as much as you can."},
        {"role": "user", "content": "Hello! What are you doing?"},
    ],
)

The code below raises an error; the error is:
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided:yummy. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

from routellm.controller import Controller

base_url = "http://xxxx"
api_key = "dddd"
client = Controller(
    routers=["mf"],
    api_key=api_key,
    api_base=base_url,
    strong_model="gpt-4o",
    weak_model="gpt-4o-mini",
)

response = client.chat.completions.create(
    # This tells RouteLLM to use the MF router with a cost threshold of 0.1159
    model="router-mf-0.1159",
    messages=[
        {"role": "system", "content": "You are a helpful assistant; help humans as much as you can."},
        {"role": "user", "content": "Hello! What are you doing?"},
    ],
)

I use the same api_base and api_key. Why does the router code give me the "Incorrect API key" error?
What is wrong with my code or environment setup?

In routellm/routers/similarity_weighted/utils.py, if I set OPENAI_CLIENT = OpenAI(base_url=base_url, api_key="EMPTY"), it works fine.
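
A plausible explanation, inferred from the import chain visible in the tracebacks elsewhere on this page: the MF router builds its own embedding client (OPENAI_CLIENT = OpenAI() in utils.py) at module import time, which reads OPENAI_API_KEY from the environment and is independent of the api_key/api_base passed to Controller for completions. A hedged sketch of working with that behavior:

# Hedged sketch: the embedding client is created when routellm modules are
# imported, so OPENAI_API_KEY must be set *before* importing Controller.
import os
os.environ["OPENAI_API_KEY"] = "sk-XXXXXX"  # key used by the router's embedding calls

from routellm.controller import Controller

client = Controller(
    routers=["mf"],
    api_key="dddd",           # key/base used for chat completions only
    api_base="http://xxxx",
    strong_model="gpt-4o",
    weak_model="gpt-4o-mini",
)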

How to set threshold value?

What does this threshold value actually mean?

routed_model = client.route(
    prompt="What's the square root of 144?",
    router="bert",
    threshold=0.4066,
)
print(f"Prompt should be routed to {routed_model}")

"For 50.0% strong model calls for bert, threshold = 0.4066": what exactly does this imply? Does it mean that out of every 10 queries, 5 (50%) will be routed to the strong model?
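
For context, the routing decision visible in the tracebacks elsewhere on this page (routers.py, line 42) compares the router's predicted strong-model win rate against the threshold, which is why a threshold calibrated for 50% strong-model calls sends roughly the half of incoming queries the router scores highest to the strong model. A paraphrased sketch:

# Paraphrased from the routing logic shown in the tracebacks on this page:
# the router predicts how likely the strong model is to win on this prompt,
# and routes to the strong model only when that score clears the threshold.
def route(win_rate: float, threshold: float, strong: str, weak: str) -> str:
    return strong if win_rate >= threshold else weak

# With threshold=0.4066 calibrated for 50% strong calls, roughly half of
# real-world prompts score >= 0.4066 and go to the strong model.
print(route(0.62, 0.4066, "strong", "weak"))  # -> strong
print(route(0.21, 0.4066, "strong", "weak"))  # -> weak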

Feature: Add an endpoint that only returns which model is the best

First of all, I want to congratulate you on this project. I think it is excellent. I would like to integrate this functionality with my existing workflow. I have a gateway that integrates multiple providers, and I need to know which model would be the best to call, instead of making a request through this API.

Therefore, I believe it would be beneficial to have an endpoint that simply indicates which model is best, as sketched below. I can develop this feature myself; I just want to know whether you also think it would be a good feature, so I can make a PR to contribute.
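
A hypothetical sketch of such an endpoint, built on the `client.route(...)` call shown elsewhere on this page; the path, app wiring, model pair, and default threshold are all assumptions, not part of RouteLLM's API:

# Hypothetical endpoint sketch: return only the routed model name
# instead of proxying a completion.
from fastapi import FastAPI
from routellm.controller import Controller

app = FastAPI()
client = Controller(routers=["mf"], strong_model="gpt-4o", weak_model="gpt-4o-mini")

@app.get("/v1/route")
def best_model(prompt: str, threshold: float = 0.11593):
    routed = client.route(prompt=prompt, router="mf", threshold=threshold)
    return {"model": routed}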

Support providing separate API keys for strong and weak models

Currently the controller only takes a single scalar API key argument. If I want to override the default API key for a specific call, both models need to be from the same provider. I'd like to be able to pass a dict of { litellm_provider: key } as the api_key argument to enable precise key control (see the sketch below).
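
A sketch of the proposed interface (hypothetical; the current controller accepts only a single key):

# Proposed, not implemented: a per-provider key map instead of one scalar key.
from routellm.controller import Controller

client = Controller(
    routers=["mf"],
    strong_model="gpt-4o",
    weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
    api_key={                     # hypothetical { litellm_provider: key } dict
        "openai": "sk-XXXXXX",
        "anyscale": "esecret_XXXXXX",
    },
)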

Support for Ollama Planned/Possible?

Any ideas or plans to support Ollama?
Maybe following a similar approach as for TextGrad, which runs perfectly on an embedded device like the Jetson Orin.

P.S. Awesome work, thanks for sharing.

Module controller missing - Linux & MAC

Hi team,
Thanks a lot for the library. Unfortunately, we couldn't use it on Linux and Mac.

Version
routellm==0.1.0

Error

from routellm.controller import Controller
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'routellm.controller'

OpenAI dependency

Hello, I am facing this issue:
File "/app/app/src/router/__init__.py", line 5, in <module> from .gateway.router import router as gateway_router File "/app/app/src/router/gateway/router.py", line 4, in <module> from routellm.controller import Controller File "/usr/local/lib/python3.12/site-packages/routellm/controller.py", line 10, in <module> from routellm.routers.routers import ROUTER_CLS File "/usr/local/lib/python3.12/site-packages/routellm/routers/routers.py", line 17, in <module> from routellm.routers.matrix_factorization.model import MODEL_IDS, MFModel File "/usr/local/lib/python3.12/site-packages/routellm/routers/matrix_factorization/model.py", line 4, in <module> from routellm.routers.similarity_weighted.utils import OPENAI_CLIENT File "/usr/local/lib/python3.12/site-packages/routellm/routers/similarity_weighted/utils.py", line 11, in <module> OPENAI_CLIENT = OpenAI() ^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/openai/_client.py", line 104, in __init__ raise OpenAIError( openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
Of course, I can set the environment variable, but I don't want to depend on it. It would be preferable to have a way to set the base_url, model, and api_key for this client as well.

ollama doesn't work

For other models, RouteLLM supports any provider that has an OpenAI-compatible interface, which includes a wide range of both closed and open-source models running locally or in the cloud. Once you have an OpenAI-compatible endpoint, set the --alt-base-url and --alt-api-key flags to point to your endpoint. e.g. For Anyscale Endpoints,

I followed the instructions above to use my Ollama instance, but it doesn't work at all. The error I get when requesting the server on port 6060 (/v1) still seems to point to OpenAI; am I doing something wrong?

Command: 'python -m routellm.openai_server --routers mf --config /Users/dali/PycharmProjects/simply_crawler/configuration/routellm_config.yaml --alt-base-url http://localhost:11434/v1/ --alt-api-key ollama'

The following is the error content:

Launching server with routers: ['mf']
INFO:     Started server process [18063]
INFO:     Waiting for application startup.
Loading mf: 100%|██████████| 1/1 [00:00<00:00, 1.54it/s]
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:6060 (Press CTRL+C to quit)
INFO:     127.0.0.1:58563 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  [... uvicorn / starlette / fastapi framework frames ...]
  File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/routellm/openai_server.py", line 175, in create_chat_completion
    routed_model = route_fn(prompt, threshold, ROUTED_PAIR)
  File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/routellm/routers/routers.py", line 42, in route
    if self.calculate_strong_win_rate(prompt) >= threshold:
  File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/routellm/routers/routers.py", line 235, in calculate_strong_win_rate
    winrate = self.model.pred_win_rate(
  File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/routellm/routers/matrix_factorization/model.py", line 124, in pred_win_rate
    logits = self.forward([model_a, model_b], prompt)
  File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/routellm/routers/matrix_factorization/model.py", line 113, in forward
    OPENAI_CLIENT.embeddings.create(input=[prompt], model=self.embedding_model)
  File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/openai/resources/embeddings.py", line 114, in create
    return self._post(
  File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1240, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/openai/_base_client.py", line 921, in request
    return self._request(
  File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1020, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: ollama. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
INFO:     127.0.0.1:58565 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:58566 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[the identical traceback and 401 error were logged for each of these requests]

model directory naming conflict

The example config's model paths, such as routellm/arena_battles_embeddings, caused a conflict and an error in my tooling because the directory name is the same as the package name.
