
spice's Introduction

Spice

Spice is a light wrapper for AI SDKs like OpenAI's and Anthropic's. Spice simplifies LLM completions, embeddings, and transcriptions without obscuring any underlying parameters or processes. Spice also makes it ridiculously easy to switch between providers, such as OpenAI and Anthropic, without modifying your code.

Spice also collects useful information such as tokens used, time spent, and cost for each call, making it easily available no matter which LLM provider is being used.

Install

Spice is listed as spiceai on PyPI. To install, run pip install spiceai.

API Keys

Spice will automatically load .env files in your current directory. To add an API key, either use a .env file or set the environment variables manually. These are the current environment variables that Spice will use:

OPENAI_API_KEY=<api_key> # Required for OpenAI calls
OPENAI_API_BASE=<base_url> # Optional; overrides the base URL for OpenAI calls

AZURE_OPENAI_KEY=<api_key> # Required for Azure OpenAI calls
AZURE_OPENAI_ENDPOINT=<endpoint_url> # Required for Azure OpenAI calls.

ANTHROPIC_API_KEY=<api_key> # Required for Anthropic calls

Usage Examples

All examples can be found in scripts/run.py

from typing import List

from spice import Spice
from spice.spice_message import SpiceMessage

client = Spice()

messages: List[SpiceMessage] = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "list 5 random words"},
]
response = await client.get_response(messages=messages, model="gpt-4-0125-preview")

print(response.text)

Streaming

# You can set a default model for the client instead of passing it with each call
client = Spice(default_text_model="claude-3-opus-20240229")

# You can easily load prompts from files, directories, or even URLs.
client.load_prompt("prompt.txt", name="my prompt")

# Spice can also automatically render Jinja templates.
messages: List[SpiceMessage] = [
    {"role": "system", "content": client.get_rendered_prompt("my prompt", assistant_name="Ryan Reynolds")},
    {"role": "user", "content": "list 5 random words"},
]
stream = await client.stream_response(messages=messages)

async for text in stream:
    print(text, end="", flush=True)
# Retrieve the complete response from the stream
response = await stream.complete_response()

# The response always includes the final text; no need to build it from the stream yourself
print(response.text)

# Response also includes helpful stats
print(f"Took {response.total_time:.2f}s")
print(f"Input/Output tokens: {response.input_tokens}/{response.output_tokens}")

Mixing Providers

import asyncio

# Commonly used models and providers have premade constants
from spice.models import GPT_4_0125_PREVIEW

# Alias models for easy configuration, even mixing providers
model_aliases = {
    "task1_model": GPT_4_0125_PREVIEW,
    "task2_model": "claude-3-opus-20240229",
    "task3_model": "claude-3-haiku-20240307",
}

client = Spice(model_aliases=model_aliases)

messages: List[SpiceMessage] = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "list 5 random words"},
]
responses = await asyncio.gather(
    client.get_response(messages=messages, model="task1_model"),
    client.get_response(messages=messages, model="task2_model"),
    client.get_response(messages=messages, model="task3_model"),
)

for i, response in enumerate(responses, 1):
    print(f"\nModel {i} response:")
    print(response.text)
    print(f"Characters per second: {response.characters_per_second:.2f}")
    if response.cost is not None:
        print(f"Cost: ${response.cost / 100:.4f}")

# Spice also tracks the total cost over multiple models and providers
print(f"Total Cost: ${client.total_cost / 100:.4f}")

Using unknown models

client = Spice()

messages: List[SpiceMessage] = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "list 5 random words"},
]

# To use Azure, specify the provider and the deployment model name
response = await client.get_response(messages=messages, model="first-gpt35", provider="azure")
print(response.text)

# Alternatively, to make a model and its provider known to Spice, create a custom Model object
from spice.models import TextModel
from spice.providers import AZURE

AZURE_GPT = TextModel("first-gpt35", AZURE, context_length=16385)
response = await client.get_response(messages=messages, model=AZURE_GPT)
print(response.text)

# Creating the model automatically registers it in Spice's model list, so specifying the provider is no longer needed
response = await client.get_response(messages=messages, model="first-gpt35")
print(response.text)

Vision models

client = Spice()

# Spice makes it easy to add images from files or the internet
from spice.models import CLAUDE_3_OPUS_20240229, GPT_4_1106_VISION_PREVIEW
from spice.spice_message import SpiceMessages, file_image_message, user_message

messages: List[SpiceMessage] = [user_message("What do you see?"), file_image_message("/path/to/image.png")]
response = await client.get_response(messages, GPT_4_1106_VISION_PREVIEW)
print(response.text)

# Alternatively, you can use the SpiceMessages wrapper to easily create your prompts
spice_messages: SpiceMessages = SpiceMessages(client)
spice_messages.add_user_message("What do you see?")
spice_messages.add_file_image_message("https://example.com/image.png")
response = await client.get_response(spice_messages, CLAUDE_3_OPUS_20240229)
print(response.text)

Embeddings and Transcriptions

client = Spice()
input_texts = ["Once upon a time...", "Cinderella"]

# Spice can easily fetch embeddings and audio transcriptions
from spice.models import TEXT_EMBEDDING_ADA_002, WHISPER_1

embeddings = await client.get_embeddings(input_texts, TEXT_EMBEDDING_ADA_002)
transcription = await client.get_transcription("/path/to/audio/file", WHISPER_1)
print(transcription.text)

spice's People

Contributors

biobootloader, pcswingle, jakethekoenig, granawkins, mentatbot[bot], mentatai[bot]


spice's Issues

catch_and_convert_errors should support retry

On APIError or APIConnectionError we may want to retry. I'm not sure context managers can support this but it'd be nice if the client could have a retry strategy and if it gets one of those exceptions it sleeps with exponential back off with configurable base and start value and then starts at the top of the contextmanager block. And then raises the exception after a configurable number of failures.
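A rough sketch of that backoff loop, written as a wrapper around an async call since a context manager can't re-run its own block (the function name and defaults here are illustrative, not an actual Spice API):

import asyncio

from openai import APIConnectionError, APIError

# Sketch only: retry an arbitrary async call with exponential backoff.
# base_delay and max_retries are the configurable values mentioned above.
async def call_with_retries(call, base_delay: float = 1.0, max_retries: int = 3):
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except (APIError, APIConnectionError):
            if attempt == max_retries:
                raise  # re-raise after the configured number of failures
            # sleep base_delay, then 2x, 4x, ... before the next attempt
            await asyncio.sleep(base_delay * 2**attempt)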

Separate `call_llm` into two functions, one for streaming

The function that doesn't stream will directly return a SpiceResponse with complete info (no finalize issues)

The function that does stream will return a new class that can be looped over to stream through it. Internally, it will buffer from OpenAI/Anthropic/etc. It will expose a finalize method that waits for the stream to finish and then returns a SpiceResponse object with full info. Additionally, if a user interrupts the stream, it will expose a way to get the "current" SpiceResponse, i.e. one filled with full info but only up to that point in the stream.
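A rough sketch of what the streaming class could look like; the class, method, and field names are illustrative, not a final design:

from dataclasses import dataclass

# Sketch only: a stand-in for the full SpiceResponse described above.
@dataclass
class StreamResult:
    text: str
    finished: bool

class StreamingSpiceResponse:
    def __init__(self, chunks):
        self._chunks = chunks          # async iterator of text pieces from the provider
        self._buffer: list[str] = []
        self._done = False

    async def __aiter__(self):
        async for chunk in self._chunks:
            self._buffer.append(chunk)
            yield chunk
        self._done = True

    async def finalize(self) -> StreamResult:
        # wait for the stream to finish, then return complete info
        async for _ in self:
            pass
        return StreamResult(text="".join(self._buffer), finished=True)

    def current_response(self) -> StreamResult:
        # for user interrupts: full info, but only up to this point in the stream
        return StreamResult(text="".join(self._buffer), finished=self._done)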

Spice decorator

Make a decorator for all of our spice client functions; practically all of them take model and have to run get_model and get_client to get the client first thing. Would be nice to have a decorator do all of that for us.
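A minimal sketch of such a decorator, assuming hypothetical get_model / get_client helpers on the client:

import functools

# Sketch only: get_model / get_client stand in for Spice's internal lookups,
# and the keyword names are illustrative.
def with_resolved_client(func):
    @functools.wraps(func)
    async def wrapper(self, *args, model=None, **kwargs):
        resolved = self.get_model(model)     # assumed internal helper
        client = self.get_client(resolved)   # assumed internal helper
        # the wrapped function receives the resolved model and client directly
        return await func(self, *args, model=resolved, client=client, **kwargs)

    return wrapper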

Clean up the way internal clients are created / lazy load clients

When Spice is initialized, let the user specify a default model or model alias, and create any clients needed for those models then. However if the user later calls a model that a client isn't set up for, try to set it up then.

Also let the user optionally provide, at Spice initialization, a dictionary of values for API keys / base URLs / etc., and Spice will use those to initialize clients. If a key isn't provided, Spice will try to find it in the environment.

Probably the easiest way to do this is create a function for each possible provider that attempts to set it up, raising if keys aren't provided and they aren't in the environment.
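A sketch of what one of those per-provider setup functions might look like, preferring user-supplied values and falling back to the environment (the function itself is illustrative; the key names mirror the env vars in the README):

import os
from typing import Optional

# Sketch only: one setup function per provider, raising if no key is found.
def setup_openai_client(config: Optional[dict] = None):
    from openai import AsyncOpenAI

    config = config or {}
    api_key = config.get("OPENAI_API_KEY") or os.environ.get("OPENAI_API_KEY")
    if api_key is None:
        raise ValueError("OPENAI_API_KEY not provided and not found in the environment")
    base_url = config.get("OPENAI_API_BASE") or os.environ.get("OPENAI_API_BASE")
    return AsyncOpenAI(api_key=api_key, base_url=base_url)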

Easier way to construct messages

Right now it's annoying to navigate all the constants when making a message (role, content, etc.), and it gets worse with images. Make a message class (maybe extending List[SpiceMessage]?) that can easily add new messages and images.
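A minimal sketch of such a class (the class and method names are illustrative):

# Sketch only: a list subclass with chainable helpers for building messages.
class MessageList(list):
    def add_message(self, role: str, content: str) -> "MessageList":
        self.append({"role": role, "content": content})
        return self  # return self so calls can be chained

    def add_system_message(self, content: str) -> "MessageList":
        return self.add_message("system", content)

    def add_user_message(self, content: str) -> "MessageList":
        return self.add_message("user", content)

messages = MessageList().add_system_message("You are a helpful assistant.").add_user_message("hi")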

Add converters

We have validators to allow reattempting to get correct API output. It would also be nice to have converters. If a converter throws an exception, we can catch it and retry up to retries times.
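A rough sketch of the intended behavior (the helper name is hypothetical):

# Sketch only: apply a converter to the model output and retry on failure,
# mirroring how validators already work.
async def get_converted_response(client, messages, model, converter, retries: int = 2):
    for attempt in range(retries + 1):
        response = await client.get_response(messages=messages, model=model)
        try:
            return converter(response.text)  # e.g. converter=json.loads
        except Exception:
            if attempt == retries:
                raise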

Spice Prompt Management

prompts are a pain point in AI engineering:

  • defining them inline in your code is awkward
  • need to deal with textwrap.dedent, etc
  • editing them and having to change line wrapping is frustrating
  • storing them in separate files is better but then you have to load from files
  • sometimes you want to use string formatting, like f"blah blah {x + y} blah blah"
  • in some cases you might actually want multiple versions of prompts tuned for different models

I'm not aware of any solution that solves all of these pain points. A starting point might be a flow like this:

from spice import prompts

prompts.register_dir("path/to/prompts/directory")

# loads prompt from `path/to/prompts/directory/prompt_x.txt`
messages = [{"role": "user", "content": prompts.get_prompt("prompt_x")}]

Maybe the above methods should be part of the spice client?

Regardless this will also help with future prompt iteration features, because spice will be aware of which parts of your messages were from prompts and could be edited.

A more full solution could include a new file format for prompts?

I've seen people use .toml to store prompts and jinja templates to fill things in: https://github.com/Codium-ai/AlphaCodium/blob/f608cb5479d878348c2ffa9b64e8515314366bc2/alpha_codium/settings/code_contests_prompts_fix_solution.toml

Anthropic token counting

Anthropic doesn't have any official way to count tokens; the best way is probably to just estimate token counts. Maybe we want to add a token-estimate boolean to SpiceResponse to indicate that the count is an estimate, and set it to true whenever we didn't get the tokens directly from the API (even for OpenAI, for which we have hopefully accurate token-counting functions)?
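For reference, a crude estimator might look like this (the ~4 characters per token ratio is a common heuristic, not an Anthropic-provided figure):

# Sketch only: a rough character-based token estimate.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)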

Add GPT-4o-2024-08-06 model to spice/models.py with updated costs

We need to add the new GPT-4o-2024-08-06 model to our spice/models.py file. This model is the latest snapshot of GPT-4o that supports Structured Outputs and has reduced costs compared to previous versions.

Changes to be made:

  1. Add a new TextModel instance for GPT-4o-2024-08-06 with updated costs.
  2. Update the comment for the existing GPT-4o model to reflect that it currently points to gpt-4o-2024-05-13.

Here's the code to add:

GPT_4o_2024_08_06 = TextModel(
    "gpt-4o-2024-08-06", 
    OPEN_AI, 
    input_cost=250,  # Reduced from 500
    output_cost=1000,  # Reduced from 1500
    context_length=128000
)
# Note: This model supports a max output of 16,384 tokens, which is larger than previous versions.

# Update the comment for the existing GPT-4o model
"""Warning: This model always points to OpenAI's latest GPT-4o model (currently gpt-4o-2024-05-13), so the input and output costs may be incorrect. We recommend using specific versions of GPT-4o instead."""

This new model should be added after the existing GPT-4o models, around line 85 in the current file.

Add way to pass in api keys to client

I think the best way to use Spice will definitely be to just have a .env, but for people who want to pass in specific api keys, how should we handle this? We could just have them pass in an env dictionary to the client, but that is slightly confusing since we wouldn't actually be changing the environment variables.
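For illustration, the env-dictionary option might look like this (the api_keys parameter name is hypothetical):

from spice import Spice

# Sketch only: pass keys directly instead of relying on the environment.
client = Spice(api_keys={
    "OPENAI_API_KEY": "sk-...",
    "ANTHROPIC_API_KEY": "sk-ant-...",
})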

Add response validation

Add the following parameters to get_response:

  • validator: Optional[str -> bool] = None
  • retry_count: int = 0
  • stream_callback: Optional[str -> None] = None

get_response calls validator on the final response before returning, and if it returns false, tries again up to retry_count times. If stream_callback is passed in, the streaming API is called (as in stream_response) and stream_callback is called incrementally on each chunk. The most obvious use case for that is printing.
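For illustration, usage might look like this (the is_json validator is just an example; client and messages are as in the README examples):

import json

# Example validator: accept the response only when it parses as JSON.
def is_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

response = await client.get_response(
    messages=messages,
    model="gpt-4-0125-preview",
    validator=is_json,       # re-ask the model until the output parses
    retry_count=2,
    stream_callback=lambda chunk: print(chunk, end="", flush=True),
)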

refactor model/provider/client management

the Spice class does a lot of work in its __init__ and at the beginning of call_llm to get a client for each call and manage the options for setting it up. Let's pull all this out into a new ClientManager class that Spice would initialize in its own __init__ and then call once at the beginning of each call_llm call to get the client for that call.

Also let's add documentation for the default_model / default_provider / model_aliases settings.

Add fast way to get client / provider from model

Would take a model and maybe a provider, and would return the client or provider. This would be useful for things like counting tokens (maybe we would just have a function that would count tokens given model and provider).

strict / warning mode

Add a strict / warning mode to the client that would log warnings on certain events (a model name gets overwritten, an invalid parameter gets passed to a provider, etc.). These are things we don't want to crash on, for convenience's sake, because most of the time it'll be expected behavior, but they could be useful for the user to know about.

Add whisper support

make a new file, spice/whisper.py. Model it after spice/embeddings.py. It should contain a class SpiceWhisper that has a method kinda like this:

    async def call_whisper_api(self, audio_path: Path) -> str:
        # Use a context manager so the file handle is closed after the request
        with open(audio_path, "rb") as audio_file:
            transcript = await self.async_client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
            )
        return transcript.text

Expose SpiceWhisper from the __init__. Use the async OpenAI / Azure clients, not the sync ones like spice/embeddings does.

Add support for GPT-4-0806 model in models.py

Objective

Add support for the new GPT-4-0806 model in the models.py file, including its updated pricing.

Background

OpenAI has released a new version of GPT-4, known as GPT-4-0806, which offers improved performance and lower prices. We need to update our spice/models.py file to include this new model option.

Implementation Steps

  1. Update the spice/models.py file:

    • Add a new TextModel instance for GPT-4-0806.
    • Update the pricing information for this new model.
  2. Ensure that the new model is properly registered in the models list.

  3. Update any relevant documentation or README files to mention the new model option.

  4. Add tests to ensure the new model can be selected and used correctly.

Code Changes

In spice/models.py, add the new model:

Location to add the new model: spice/models.py:72-83

Add the following code after the existing GPT-4 models:

GPT_4_0806 = TextModel(
    "gpt-4-0806",
    OPEN_AI,
    input_cost=300,  # Adjust this value to the correct input cost in cents / million tokens
    output_cost=600,  # Adjust this value to the correct output cost in cents / million tokens
    context_length=128000  # Adjust if the context length is different
)

Note: The input_cost and output_cost values in the code above are placeholders. Please replace them with the actual pricing for the GPT-4-0806 model.

Testing

  • Add unit tests in the appropriate test file (likely tests/test_models.py) to verify that:
    • The GPT-4-0806 model can be retrieved using get_model_from_name("gpt-4-0806").
    • The model has the correct attributes (name, provider, input_cost, output_cost, context_length).

Documentation

Update relevant documentation, such as README.md or any API documentation, to include information about the new GPT-4-0806 model option.

Additional Considerations

  • Ensure that the OPEN_AI provider in providers.py supports this new model version.
  • If there are any other parts of the codebase that explicitly list available models, update those as well.

json-mode with Claude missing "{"

We trick Claude into having a JSON mode by seeding its response with "{". It works, in that the rest of the response follows JSON format, except the final Response.text doesn't include the initial "{", so it's not actually JSON-parsable.
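To illustrate with a hypothetical response value:

import json

# Sketch of the problem and workaround: the reply continues the seeded "{",
# so it only parses once the "{" is re-prepended. The string is illustrative.
completion = '"words": ["apple", "sky", "river"]}'  # what response.text holds today
parsed = json.loads("{" + completion)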

Replace validators/converters with retry_strategy

Currently get_response supports converters and validators as ways to coerce the model response and retry if it's bad. But we want to support more complicated behaviors such as:

  • increasing the temperature on failure
  • switching to a bigger model on failure
  • showing the model its output so it can understand why it failed and self-correct

To support these, let's make get_response accept a retry_strategy argument of abstract class RetryStrategy, which should have one function decide (name TBD; feel free to choose the name if you have a good idea) that accepts the call_args, the number of the previous attempt (0-indexed), and the model output as text. It then returns a tuple (behavior, next_call_args, result), where behavior is a new enum Behavior which can be either RETRY or RETURN. If it is set to RETURN, the text and result are added to the spice response and returned. Otherwise we try again with next_call_args. It's up to the strategy to throw an exception or otherwise define a failure case after a certain number of iterations.

This will deprecate the current converter/validator/retries arguments. We should support them for now and if passed create a RetryStrategy called Default which reproduces the current behavior.
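A rough sketch of the proposed interface, including an example strategy that increases the temperature on failure (all names are provisional):

from abc import ABC, abstractmethod
from enum import Enum, auto

# Sketch only: names follow the issue text but are not final.
class Behavior(Enum):
    RETRY = auto()
    RETURN = auto()

class RetryStrategy(ABC):
    @abstractmethod
    def decide(self, call_args: dict, attempt: int, model_output: str) -> tuple:
        """Return (behavior, next_call_args, result)."""

class RaiseTemperatureStrategy(RetryStrategy):
    """Example strategy: bump the temperature on each failed attempt."""

    def __init__(self, validator, max_attempts: int = 3):
        self.validator = validator
        self.max_attempts = max_attempts

    def decide(self, call_args, attempt, model_output):
        if self.validator(model_output):
            return (Behavior.RETURN, call_args, model_output)
        if attempt + 1 >= self.max_attempts:
            raise RuntimeError("model output failed validation on every attempt")
        # example failure behavior from the list above: raise the temperature
        next_args = {**call_args, "temperature": call_args.get("temperature", 0.0) + 0.2}
        return (Behavior.RETRY, next_args, None)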
