AIKit is a one-stop shop for quickly getting started with hosting, deploying, building, and fine-tuning large language models (LLMs).
AIKit offers two main capabilities:
- Inference: AIKit uses LocalAI, which supports a wide range of inference capabilities and formats. LocalAI provides a drop-in replacement REST API that is compatible with the OpenAI API, so you can use any OpenAI API compatible client, such as Kubectl AI, Chatbot-UI, and many more, to send requests to open LLMs!
- Fine Tuning: AIKit offers an extensible fine-tuning interface. It supports Unsloth for a fast, memory-efficient, and easy fine-tuning experience.
For full documentation, please see the AIKit website!
- No GPU, Internet access, or additional tools needed except for Docker!
- Minimal image size, resulting in fewer vulnerabilities and a smaller attack surface with a custom distroless-based image
- Fine-tuning support
- Easy-to-use declarative configuration for inference and fine-tuning
- OpenAI API compatible, for use with any OpenAI API compatible client
- Multi-modal model support
- Image generation support with Stable Diffusion
- Support for GGUF (`llama`), GPTQ (`exllama` or `exllama2`), EXL2 (`exllama2`), GGML (`llama-ggml`), and Mamba models
- Kubernetes deployment ready
- Supports multiple models with a single image
- Supports GPU-accelerated inference with NVIDIA GPUs
- Ensures supply chain security with SBOMs, provenance attestations, and signed images
- Supports air-gapped environments with self-hosted, local, or any remote container registries to store model images for inference on the edge
You can get started with AIKit quickly on your local machine without a GPU!
docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3:8b
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llama-3-8b-instruct",
"messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
}'
Output should be similar to:
{
  // ...
  "model": "llama-3-8b-instruct",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of applications and services, allowing developers to focus on writing code rather than managing infrastructure."
      }
    }
  ],
  // ...
}
That's it! The API is OpenAI compatible, so it works as a drop-in replacement with any OpenAI API compatible client.
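The same request can also be sent from code. The following is a minimal sketch using only the Python standard library, assuming the container from the quick start is listening on `localhost:8080`. The helper names `build_chat_request`, `extract_reply`, and `chat`, as well as the `timeout` default, are illustrative assumptions and not part of AIKit.

```python
import json
import urllib.request

# Assumption: the quick-start container is reachable at this address.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(model, prompt, base_url=BASE_URL):
    """Build (without sending) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from the first choice of a parsed response."""
    return response_body["choices"][0]["message"]["content"]


def chat(model, prompt, base_url=BASE_URL, timeout=300):
    """Send one user message and return the reply (needs a running server)."""
    request = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the Llama 3 container running, `print(chat("llama-3-8b-instruct", "explain kubernetes in a sentence"))` should print a reply similar to the `content` field shown above.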
AIKit comes with pre-made models that you can use out-of-the-box!
If it doesn't include a specific model, you can always create your own images and host them in a container registry of your choice!
Model | Optimization | Parameters | Command | Model Name | License |
---|---|---|---|---|---|
Llama 3 | Instruct | 8B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3:8b` | `llama-3-8b-instruct` | Llama |
Llama 3 | Instruct | 70B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3:70b` | `llama-3-70b-instruct` | Llama |
Llama 2 | Chat | 7B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama2:7b` | `llama-2-7b-chat` | Llama |
Llama 2 | Chat | 13B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama2:13b` | `llama-2-13b-chat` | Llama |
Mixtral | Instruct | 8x7B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b` | `mixtral-8x7b-instruct` | Apache |
Phi 3 | Instruct | 3.8B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/phi3:3.8b` | `phi-3-3.8b` | MIT |
Gemma 1.1 | Instruct | 2B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/gemma:2b` | `gemma-2b-instruct` | Gemma |
Codestral 0.1 | Code | 22B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/codestral:22b` | `codestral-22b` | MNPL |
Note
To enable GPU acceleration, please see GPU Acceleration.
The only difference between the CPU commands above and the GPU commands below is the `--gpus all` flag, which enables GPU acceleration.
Model | Optimization | Parameters | Command | Model Name | License |
---|---|---|---|---|---|
Llama 3 | Instruct | 8B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3:8b` | `llama-3-8b-instruct` | Llama |
Llama 3 | Instruct | 70B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3:70b` | `llama-3-70b-instruct` | Llama |
Llama 2 | Chat | 7B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama2:7b` | `llama-2-7b-chat` | Llama |
Llama 2 | Chat | 13B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama2:13b` | `llama-2-13b-chat` | Llama |
Mixtral | Instruct | 8x7B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b` | `mixtral-8x7b-instruct` | Apache |
Phi 3 | Instruct | 3.8B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/phi3:3.8b` | `phi-3-3.8b` | MIT |
Gemma 1.1 | Instruct | 2B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/gemma:2b` | `gemma-2b-instruct` | Gemma |
Codestral 0.1 | Code | 22B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/codestral:22b` | `codestral-22b` | MNPL |
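Since the CPU and GPU commands listed above differ only in the `--gpus all` flag, a launcher script can build either form from one function. This is a small sketch. `docker_run_command` and its `gpu` and `host_port` parameters are hypothetical names chosen for illustration, and the container port is assumed to stay at 8080 as in the commands above.

```python
import shlex


def docker_run_command(image, gpu=False, host_port=8080):
    """Return the argv list for starting a model image.

    The GPU variant only adds `--gpus all`; everything else is unchanged.
    """
    args = ["docker", "run", "-d", "--rm"]
    if gpu:
        args += ["--gpus", "all"]
    # Map the chosen host port to the container's port 8080.
    args += ["-p", f"{host_port}:8080", image]
    return args


print(shlex.join(docker_run_command("ghcr.io/sozercan/llama3:8b")))
# docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3:8b
print(shlex.join(docker_run_command("ghcr.io/sozercan/llama3:8b", gpu=True)))
# docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3:8b
```

The returned list can be passed directly to `subprocess.run(...)` on a machine where Docker is installed.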
For more information, including how to fine-tune models or create your own images, please see the AIKit website!