
spice's Introduction

Spice

Spice is a light wrapper for AI SDKs like OpenAI's and Anthropic's. Spice simplifies LLM completions, embeddings, and transcriptions without obscuring any underlying parameters or processes. Spice also makes it ridiculously easy to switch between providers, such as OpenAI and Anthropic, without modifying your code.

Spice also collects useful information such as tokens used, time spent, and cost for each call, making it easily available no matter which LLM provider is being used.

Install

Spice is listed as spiceai on PyPI. To install, run pip install spiceai.

API Keys

Spice will automatically load .env files in your current directory. To add an API key, either use a .env file or set the environment variables manually. These are the current environment variables that Spice will use:

OPENAI_API_KEY=<api_key> # Required for OpenAI calls
OPENAI_API_BASE=<base_url> # Optional; overrides the base URL for OpenAI calls

AZURE_OPENAI_KEY=<api_key> # Required for Azure OpenAI calls
AZURE_OPENAI_ENDPOINT=<endpoint_url> # Required for Azure OpenAI calls.

ANTHROPIC_API_KEY=<api_key> # Required for Anthropic calls

Usage Examples

All examples can be found in scripts/run.py

from typing import List

from spice import Spice
from spice.spice_message import SpiceMessage

client = Spice()

messages: List[SpiceMessage] = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "list 5 random words"},
]
response = await client.get_response(messages=messages, model="gpt-4-0125-preview")

print(response.text)

Streaming

# You can set a default model for the client instead of passing it with each call
client = Spice(default_text_model="claude-3-opus-20240229")

# You can easily load prompts from files, directories, or even URLs.
client.load_prompt("prompt.txt", name="my prompt")

# Spice can also automatically render Jinja templates.
messages: List[SpiceMessage] = [
    {"role": "system", "content": client.get_rendered_prompt("my prompt", assistant_name="Ryan Reynolds")},
    {"role": "user", "content": "list 5 random words"},
]
stream = await client.stream_response(messages=messages)

async for text in stream:
    print(text, end="", flush=True)
# Retrieve the complete response from the stream
response = await stream.complete_response()

# The response always includes the final text; no need to build it from the stream yourself
print(response.text)

# Response also includes helpful stats
print(f"Took {response.total_time:.2f}s")
print(f"Input/Output tokens: {response.input_tokens}/{response.output_tokens}")

Mixing Providers

import asyncio

# Commonly used models and providers have premade constants
from spice.models import GPT_4_0125_PREVIEW

# Alias models for easy configuration, even mixing providers
model_aliases = {
    "task1_model": GPT_4_0125_PREVIEW,
    "task2_model": "claude-3-opus-20240229",
    "task3_model": "claude-3-haiku-20240307",
}

client = Spice(model_aliases=model_aliases)

messages: List[SpiceMessage] = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "list 5 random words"},
]
responses = await asyncio.gather(
    client.get_response(messages=messages, model="task1_model"),
    client.get_response(messages=messages, model="task2_model"),
    client.get_response(messages=messages, model="task3_model"),
)

for i, response in enumerate(responses, 1):
    print(f"\nModel {i} response:")
    print(response.text)
    print(f"Characters per second: {response.characters_per_second:.2f}")
    if response.cost is not None:
        print(f"Cost: ${response.cost / 100:.4f}")

# Spice also tracks the total cost over multiple models and providers
print(f"Total Cost: ${client.total_cost / 100:.4f}")

Using unknown models

client = Spice()

messages: List[SpiceMessage] = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "list 5 random words"},
]

# To use Azure, specify the provider and the deployment model name
response = await client.get_response(messages=messages, model="first-gpt35", provider="azure")
print(response.text)

# Alternatively, to make a model and its provider known to Spice, create a custom Model object
from spice.models import TextModel
from spice.providers import AZURE

AZURE_GPT = TextModel("first-gpt35", AZURE, context_length=16385)
response = await client.get_response(messages=messages, model=AZURE_GPT)
print(response.text)

# Creating the model automatically registers it in Spice's model list, so specifying the provider is no longer needed
response = await client.get_response(messages=messages, model="first-gpt35")
print(response.text)

Vision models

client = Spice()

# Spice makes it easy to add images from files or the internet
from spice.models import CLAUDE_3_OPUS_20240229, GPT_4_1106_VISION_PREVIEW
from spice.spice_message import SpiceMessages, file_image_message, user_message

messages: List[SpiceMessage] = [user_message("What do you see?"), file_image_message("/path/to/image.png")]
response = await client.get_response(messages, GPT_4_1106_VISION_PREVIEW)
print(response.text)

# Alternatively, you can use the SpiceMessages wrapper to easily create your prompts
spice_messages: SpiceMessages = SpiceMessages(client)
spice_messages.add_user_message("What do you see?")
spice_messages.add_file_image_message("https://example.com/image.png")
response = await client.get_response(spice_messages, CLAUDE_3_OPUS_20240229)
print(response.text)

Embeddings and Transcriptions

client = Spice()
input_texts = ["Once upon a time...", "Cinderella"]

# Spice can easily fetch embeddings and audio transcriptions
from spice.models import TEXT_EMBEDDING_ADA_002, WHISPER_1

embeddings = await client.get_embeddings(input_texts, TEXT_EMBEDDING_ADA_002)
transcription = await client.get_transcription("/path/to/audio/file", WHISPER_1)
print(transcription.text)

spice's People

Contributors

biobootloader, pcswingle, jakethekoenig, granawkins, mentatbot[bot], mentatai[bot]


spice's Issues

catch_and_convert_errors should support retry

On APIError or APIConnectionError we may want to retry. I'm not sure context managers can support this but it'd be nice if the client could have a retry strategy and if it gets one of those exceptions it sleeps with exponential back off with configurable base and start value and then starts at the top of the contextmanager block. And then raises the exception after a configurable number of failures.
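A rough sketch of that backoff loop, written as a wrapper around an async call since a context manager can't re-run its own block (the function name and defaults here are illustrative, not an actual Spice API):

import asyncio

from openai import APIConnectionError, APIError

# Sketch only: retry an arbitrary async call with exponential backoff.
# base_delay and max_retries are the configurable values mentioned above.
async def call_with_retries(call, base_delay: float = 1.0, max_retries: int = 3):
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except (APIError, APIConnectionError):
            if attempt == max_retries:
                raise  # re-raise after the configured number of failures
            # sleep base_delay, then 2x, 4x, ... before the next attempt
            await asyncio.sleep(base_delay * 2**attempt)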

Separate `call_llm` into two functions, one for streaming

The function that doesn't stream will directly return a SpiceResponse with complete info (no finalize issues)

The function that does stream will return a new class that can be looped over to stream through it. Internally, it will buffer from OpenAI/Anthropic/etc. It will expose a finalize method that waits for the stream to finish and then returns a SpiceResponse object with full info. Additionally, if a user interrupts the stream, it will expose a way to get the "current" SpiceResponse, i.e. one filled with full info but only up to that point in the stream.
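A rough sketch of what the streaming class could look like; the class, method, and field names are illustrative, not a final design:

from dataclasses import dataclass

# Sketch only: a stand-in for the full SpiceResponse described above.
@dataclass
class StreamResult:
    text: str
    finished: bool

class StreamingSpiceResponse:
    def __init__(self, chunks):
        self._chunks = chunks          # async iterator of text pieces from the provider
        self._buffer: list[str] = []
        self._done = False

    async def __aiter__(self):
        async for chunk in self._chunks:
            self._buffer.append(chunk)
            yield chunk
        self._done = True

    async def finalize(self) -> StreamResult:
        # wait for the stream to finish, then return complete info
        async for _ in self:
            pass
        return StreamResult(text="".join(self._buffer), finished=True)

    def current_response(self) -> StreamResult:
        # for user interrupts: full info, but only up to this point in the stream
        return StreamResult(text="".join(self._buffer), finished=self._done)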

Spice decorator

Make a decorator for all of our spice client functions; practically all of them take model and have to run get_model and get_client to get the client first thing. Would be nice to have a decorator do all of that for us.
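A minimal sketch of such a decorator, assuming hypothetical get_model / get_client helpers on the client:

import functools

# Sketch only: get_model / get_client stand in for Spice's internal lookups,
# and the keyword names are illustrative.
def with_resolved_client(func):
    @functools.wraps(func)
    async def wrapper(self, *args, model=None, **kwargs):
        resolved = self.get_model(model)     # assumed internal helper
        client = self.get_client(resolved)   # assumed internal helper
        # the wrapped function receives the resolved model and client directly
        return await func(self, *args, model=resolved, client=client, **kwargs)

    return wrapper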

Clean up the way internal clients are created / lazy load clients

When Spice is initialized, let the user specify a default model or model alias, and create any clients needed for those models then. However if the user later calls a model that a client isn't set up for, try to set it up then.

Also let the user optionally provide, at Spice initialization, a dictionary of values for API keys / base URLs / etc., and Spice will use those to initialize clients. If a key isn't provided, Spice will try to find it in the environment.

Probably the easiest way to do this is create a function for each possible provider that attempts to set it up, raising if keys aren't provided and they aren't in the environment.
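A sketch of what one of those per-provider setup functions might look like, preferring user-supplied values and falling back to the environment (the function itself is illustrative; the key names mirror the env vars in the README):

import os
from typing import Optional

# Sketch only: one setup function per provider, raising if no key is found.
def setup_openai_client(config: Optional[dict] = None):
    from openai import AsyncOpenAI

    config = config or {}
    api_key = config.get("OPENAI_API_KEY") or os.environ.get("OPENAI_API_KEY")
    if api_key is None:
        raise ValueError("OPENAI_API_KEY not provided and not found in the environment")
    base_url = config.get("OPENAI_API_BASE") or os.environ.get("OPENAI_API_BASE")
    return AsyncOpenAI(api_key=api_key, base_url=base_url)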

Easier way to construct messages

Right now it's annoying to navigate all the constants when making a message (role, content, etc.), and it gets worse with images. Make a message class (maybe extending List[SpiceMessage]?) that can easily add new messages and images.
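A minimal sketch of such a class (the class and method names are illustrative):

# Sketch only: a list subclass with chainable helpers for building messages.
class MessageList(list):
    def add_message(self, role: str, content: str) -> "MessageList":
        self.append({"role": role, "content": content})
        return self  # return self so calls can be chained

    def add_system_message(self, content: str) -> "MessageList":
        return self.add_message("system", content)

    def add_user_message(self, content: str) -> "MessageList":
        return self.add_message("user", content)

messages = MessageList().add_system_message("You are a helpful assistant.").add_user_message("hi")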

Add converters

We have validators to allow reattempting to get correct API output. It would also be nice to have converters. If a converter throws an exception, we can catch it and retry up to retries times.
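A rough sketch of the intended behavior (the helper name is hypothetical):

# Sketch only: apply a converter to the model output and retry on failure,
# mirroring how validators already work.
async def get_converted_response(client, messages, model, converter, retries: int = 2):
    for attempt in range(retries + 1):
        response = await client.get_response(messages=messages, model=model)
        try:
            return converter(response.text)  # e.g. converter=json.loads
        except Exception:
            if attempt == retries:
                raise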

Spice Prompt Management

prompts are a pain point in AI engineering:

  • defining them inline in your code is awkward
  • need to deal with textwrap.dedent, etc
  • editing them and having to change line wrapping is frustrating
  • storing them in separate files is better but then you have to load from files
  • sometimes you want to use string formatting, like f"blah blah {x + y} blah blah"
  • in some cases you might actually want multiple versions of prompts tuned for different models

I'm not aware of any solution that solves all of these pain points. A starting point might be a flow like this:

from spice import prompts

prompts.register_dir("path/to/prompts/directory")

# loads prompt from `path/to/prompts/directory/prompt_x.txt`
messages = [{"role": "user", "content": prompts.get_prompt("prompt_x")}]

Maybe the above methods should be part of the spice client?

Regardless this will also help with future prompt iteration features, because spice will be aware of which parts of your messages were from prompts and could be edited.

A more full solution could include a new file format for prompts?

I've seen people use .toml to store prompts and jinja templates to fill things in: https://github.com/Codium-ai/AlphaCodium/blob/f608cb5479d878348c2ffa9b64e8515314366bc2/alpha_codium/settings/code_contests_prompts_fix_solution.toml

Anthropic token counting

Anthropic doesn't have any official way to count tokens; the best way is probably to just estimate token counts. Maybe we want to add a token-estimate boolean to SpiceResponse to indicate that the count is an estimate, and set it to true whenever we didn't get the tokens directly from the API (even for OpenAI, for which we have hopefully accurate token-counting functions)?
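For reference, a crude estimator might look like this (the ~4 characters per token ratio is a common heuristic, not an Anthropic-provided figure):

# Sketch only: a rough character-based token estimate.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)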

Add GPT-4o-2024-08-06 model to spice/models.py with updated costs

We need to add the new GPT-4o-2024-08-06 model to our spice/models.py file. This model is the latest snapshot of GPT-4o that supports Structured Outputs and has reduced costs compared to previous versions.

Changes to be made:

  1. Add a new TextModel instance for GPT-4o-2024-08-06 with updated costs.
  2. Update the comment for the existing GPT-4o model to reflect that it currently points to gpt-4o-2024-05-13.

Here's the code to add:

GPT_4o_2024_08_06 = TextModel(
    "gpt-4o-2024-08-06", 
    OPEN_AI, 
    input_cost=250,  # Reduced from 500
    output_cost=1000,  # Reduced from 1500
    context_length=128000
)
# Note: This model supports a max output of 16,384 tokens, which is larger than previous versions.

# Update the comment for the existing GPT-4o model
"""Warning: This model always points to OpenAI's latest GPT-4o model (currently gpt-4o-2024-05-13), so the input and output costs may be incorrect. We recommend using specific versions of GPT-4o instead."""

This new model should be added after the existing GPT-4o models, around line 85 in the current file.

Add way to pass in api keys to client

I think the best way to use Spice will definitely be to just have a .env, but for people who want to pass in specific api keys, how should we handle this? We could just have them pass in an env dictionary to the client, but that is slightly confusing since we wouldn't actually be changing the environment variables.
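For illustration, the env-dictionary option might look like this (the api_keys parameter name is hypothetical):

from spice import Spice

# Sketch only: pass keys directly instead of relying on the environment.
client = Spice(api_keys={
    "OPENAI_API_KEY": "sk-...",
    "ANTHROPIC_API_KEY": "sk-ant-...",
})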

Add response validation

Add the following parameters to get_response:

  • validator: Optional[str -> bool] = None
  • retry_count: int = 0
  • stream_callback: Optional[str -> None] = None

get_response calls validator on the final response before returning, and if it returns false, tries again up to retry_count times. If stream_callback is passed in, the streaming API is called (as in stream_response) and stream_callback is called incrementally on each chunk. The most obvious use case for that is printing.
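For illustration, usage might look like this (the is_json validator is just an example; client and messages are as in the README examples):

import json

# Example validator: accept the response only when it parses as JSON.
def is_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

response = await client.get_response(
    messages=messages,
    model="gpt-4-0125-preview",
    validator=is_json,       # re-ask the model until the output parses
    retry_count=2,
    stream_callback=lambda chunk: print(chunk, end="", flush=True),
)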

refactor model/provider/client management

the Spice class does a lot of work in its __init__ and at the beginning of call_llm to get a client for each call and manage the options for setting it up. Let's pull all this out into a new ClientManager class that Spice would initialize in its own __init__ and then call once at the beginning of each call_llm call to get the client for that call.

Also let's add documentation for the default_model / default_provider / model_aliases settings.

Add fast way to get client / provider from model

Would take a model and maybe a provider, and would return the client or provider. This would be useful for things like counting tokens (maybe we would just have a function that would count tokens given model and provider).

strict / warning mode

Add a strict / warning mode to the client that would log warnings on certain events (a model name gets overwritten, an invalid parameter gets passed to a provider, etc.). These are things we don't want to crash on, for convenience's sake, because most of the time it'll be expected behavior, but they could be useful for the user to know about.

Add whisper support

make a new file, spice/whisper.py. Model it after spice/embeddings.py. It should contain a class SpiceWhisper that has a method kinda like this:

    async def call_whisper_api(self, audio_path: Path) -> str:
        # Use a context manager so the file handle is closed after the request
        with open(audio_path, "rb") as audio_file:
            transcript = await self.async_client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
            )
        return transcript.text

Expose SpiceWhisper from the __init__. Use the async OpenAI / Azure clients, not the sync ones like spice/embeddings does.

Add support for GPT-4-0806 model in models.py

Objective

Add support for the new GPT-4-0806 model in the models.py file, including its updated pricing.

Background

OpenAI has released a new version of GPT-4, known as GPT-4-0806, which offers improved performance and lower prices. We need to update our spice/models.py file to include this new model option.

Implementation Steps

  1. Update the spice/models.py file:

    • Add a new TextModel instance for GPT-4-0806.
    • Update the pricing information for this new model.
  2. Ensure that the new model is properly registered in the models list.

  3. Update any relevant documentation or README files to mention the new model option.

  4. Add tests to ensure the new model can be selected and used correctly.

Code Changes

In spice/models.py, add the new model:

Location to add the new model: spice/models.py:72-83

Add the following code after the existing GPT-4 models:

GPT_4_0806 = TextModel(
    "gpt-4-0806",
    OPEN_AI,
    input_cost=300,  # Adjust this value to the correct input cost in cents / million tokens
    output_cost=600,  # Adjust this value to the correct output cost in cents / million tokens
    context_length=128000  # Adjust if the context length is different
)

Note: The input_cost and output_cost values in the code above are placeholders. Please replace them with the actual pricing for the GPT-4-0806 model.

Testing

  • Add unit tests in the appropriate test file (likely tests/test_models.py) to verify that:
    • The GPT-4-0806 model can be retrieved using get_model_from_name("gpt-4-0806").
    • The model has the correct attributes (name, provider, input_cost, output_cost, context_length).

Documentation

Update relevant documentation, such as README.md or any API documentation, to include information about the new GPT-4-0806 model option.

Additional Considerations

  • Ensure that the OPEN_AI provider in providers.py supports this new model version.
  • If there are any other parts of the codebase that explicitly list available models, update those as well.

json-mode with Claude missing "{"

We trick Claude into having a JSON mode by seeding its response with "{". It works, in that the rest of the response follows JSON format, except the final Response.text doesn't include the initial "{", so it's not actually JSON-parsable.
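To illustrate with a hypothetical response value:

import json

# Sketch of the problem and workaround: the reply continues the seeded "{",
# so it only parses once the "{" is re-prepended. The string is illustrative.
completion = '"words": ["apple", "sky", "river"]}'  # what response.text holds today
parsed = json.loads("{" + completion)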

Replace validators/converters with retry_strategy

Currently get_response supports converters and validators as ways to coerce the model response and retry if it's bad. But we want to support more complicated behaviors such as:

  • increasing the temperature on failure
  • switching to a bigger model on failure
  • showing the model its output so it can understand why it failed and self-correct

To support these, let's make get_response accept a retry_strategy argument of abstract class RetryStrategy, which should have one function decide (name TBD; feel free to choose the name if you have a good idea) that accepts the call_args, the number of the previous attempt (0-indexed), and the model output as text. It then returns a tuple (behavior, next_call_args, result), where behavior is a new enum Behavior which can be either RETRY or RETURN. If it is set to RETURN, the text and result are added to the spice response and returned. Otherwise we try again with next_call_args. It's up to the strategy to throw an exception or otherwise define a failure case after a certain number of iterations.

This will deprecate the current converter/validator/retries arguments. We should support them for now and if passed create a RetryStrategy called Default which reproduces the current behavior.
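A rough sketch of the proposed interface, including an example strategy that increases the temperature on failure (all names are provisional):

from abc import ABC, abstractmethod
from enum import Enum, auto

# Sketch only: names follow the issue text but are not final.
class Behavior(Enum):
    RETRY = auto()
    RETURN = auto()

class RetryStrategy(ABC):
    @abstractmethod
    def decide(self, call_args: dict, attempt: int, model_output: str) -> tuple:
        """Return (behavior, next_call_args, result)."""

class RaiseTemperatureStrategy(RetryStrategy):
    """Example strategy: bump the temperature on each failed attempt."""

    def __init__(self, validator, max_attempts: int = 3):
        self.validator = validator
        self.max_attempts = max_attempts

    def decide(self, call_args, attempt, model_output):
        if self.validator(model_output):
            return (Behavior.RETURN, call_args, model_output)
        if attempt + 1 >= self.max_attempts:
            raise RuntimeError("model output failed validation on every attempt")
        # example failure behavior from the list above: raise the temperature
        next_args = {**call_args, "temperature": call_args.get("temperature", 0.0) + 0.2}
        return (Behavior.RETRY, next_args, None)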
