
instinct.cpp's Introduction

✨ instinct.cpp

instinct.cpp is a toolkit for developing LLM-powered applications.

Discord C++ 20 License

🚨 This project is under active development and has not yet reached the GA stage of its first major release. See the Roadmap section for details.

Features

Components of instinct.cpp

What instinct.cpp offers:

  • Applications that work out of the box.
    • Assistant API server: Agent service that is fully compatible with OpenAI's Assistant API.
      • mini-assistant-api: Single binary for single-node deployment, with the vector database and other dependencies bundled.
      • mighty-assistant-api: (WIP) A cloud-native implementation that is highly scalable, with distributed components and multi-tenant support.
    • chat-agent: A CLI application that creates a knowledge index from your docs (PDF, TXT, MD, ...) and launches an HTTP server that is fully compatible with OpenAI ChatCompletion.
  • Frameworks to build LLM-based applications. Think of it as langchain.cpp.
    • Integration for privacy-first LLM providers: Built-in support for Ollama and other OpenAI-compatible API services like vllm, llama.cpp server, nitro and more.
    • Building blocks for common application patterns like Chatbot, RAG, LLM Agent.
    • Functional chaining components for composable LLM pipelines.
    • Agent patterns: ReACT, OpenAI-based tool agent, LLMCompiler, ...
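
To make "functional chaining" more concrete, here is a minimal Python sketch of the general idea. It is not instinct.cpp's C++ API: the `Step` class, the `|` composition operator, and `fake_llm` are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Step:
    """A pipeline stage wrapping a plain function (hypothetical helper)."""
    fn: Callable[[Any], Any]

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: it just upper-cases the prompt.
    return prompt.upper()


prompt_template = Step(lambda q: f"Answer briefly: {q}")
llm = Step(fake_llm)
output_parser = Step(str.strip)

# Compose the three stages into one pipeline and run it.
chain = prompt_template | llm | output_parser
result = chain.invoke("what is RAG?")
```

Each stage stays a plain function, so a pipeline can be tested stage by stage and a stage can be swapped without touching the others.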

User Guides

For built-in applications:

For the library itself:

Roadmap

The complete project plan is tracked on the project kanban.

Milestone | Features | Deadline
v0.1.0 | Long-short memory, PDF/TXT/DOCX ingestor, chain programming paradigm, RAG reference app doc-agent | 3.29
v0.1.1 | Performance tuning, RAG evaluation, function calling agent | 4.16
v0.1.2 | OpenAI Assistant API initial implementation, single-binary reference app mini-assistant | 4.30
v0.1.3 | mini-assistant: tool calls with open-source LLMs | 5.17
v0.1.4 | doc-agent: rerank model; mini-assistant: file-search tool support | 6.18 6.14
v0.1.5 | Overall optimization | 6.30
v0.1.6 | code-interpreter in mini-assistant | 7.15

Contributions are welcome! You can join the Discord server or contact me via email.

instinct.cpp's People

Contributors

robinqu


instinct.cpp's Issues

RAG Evaluation

A notebook for experiments, including:

  • comparison with langchain as a baseline.
  • ablation tests for different retrievers and splitter parameters.

Function calling support

  • ReActor: LLM-based actor pattern.
  • Organized by requirements generated by two reference apps:
    • Search Agent - A general chat assistant
    • tools
      • HTTP call tool for various web search
      • summarizer
    • Analytics Agent - a junior data interpreter
      • text2sql
      • data visualization.

Distributed Vector DB client integration

  • Weaviate
  • Milvus

This remains low priority until the first GA release, because brute-force search is fast enough with DuckDB and the DB vendors don't have official C++ clients yet.
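
For readers unfamiliar with the term, brute-force search means scoring every stored vector against the query and keeping the best matches, with no index structure. The sketch below shows this in plain Python rather than DuckDB SQL; the function names and the toy vectors are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(query: Sequence[float],
                       vectors: Dict[str, Sequence[float]],
                       top_k: int = 2) -> List[str]:
    """Score every stored vector against the query and return the top_k ids."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy data: three documents with 3-dimensional embeddings.
vectors = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
hits = brute_force_search([1.0, 0.0, 0.0], vectors)
```

The cost grows linearly with the number of stored vectors, which is why this approach is only a fit for single-node collections of modest size.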

Design & implementation of instinct-agent

Manifest

  • ToolAgent, or function-calling foundation
    • tool rendering, invocation command parser
    • function tool and function toolkits protocol
    • with local memory and checkpoint
  • Multi-agent orchestration
    • control flow: loop/if-else/switch
    • human in the loop
  • distributed agent protocol. (Agent Protocol based implementation?)
    • agent state checkpoint
    • distributed memory, tool server
  • built-in tools
    • external retriever: Google Search, Wikipedia, SERP API
    • calculator
    • LLM
    • python interpreter ?
  • built-in agents
    • ReACT Tool Execution Agent
    • Plan & Execute
    • LLMCompiler

Project plans

Stage 0 - POC - for one week

There are still too many open questions about the implementation. Let's build a minimal implementation for inspiration.

  • ToolAgent
  • Built-in toolkits: Search (one of Google, Tavily, DuckDuckGo or SERP API), Calculator, Python Interpreter
  • Human in the loop: pause, resume
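
As a rough picture of what such a proof of concept involves, here is a small Python sketch of a tool-calling loop with a calculator tool, an invocation command parser, and a human-approval hook. Everything in it is an assumption for illustration: the `Action: tool[argument]` format, the `run_agent` and `approve` names, and the scripted stand-in for the model.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expression: str) -> str:
    """Evaluate +, -, *, / arithmetic without calling eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))


TOOLS = {"calculator": calculator}
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")  # assumed command format
FINAL = "Final Answer:"


def run_agent(llm, question, approve=lambda tool, arg: True, max_steps=5):
    """Ask the model, parse a tool call, run it, feed back the observation."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        if reply.startswith(FINAL):
            return reply[len(FINAL):].strip()
        match = ACTION_RE.search(reply)
        if match is None or match.group(1) not in TOOLS:
            transcript += "Observation: unknown or unparsable action\n"
            continue
        tool, arg = match.groups()
        if not approve(tool, arg):  # human in the loop: stop before the call
            return None
        transcript += f"Observation: {TOOLS[tool](arg)}\n"
    return None


def scripted_llm(transcript: str) -> str:
    # Stand-in for a real model: request a tool first, then answer from it.
    if "Observation:" not in transcript:
        return "Action: calculator[2 + 3 * 4]"
    return FINAL + " " + transcript.rsplit("Observation: ", 1)[1].strip()


answer = run_agent(scripted_llm, "What is 2 + 3 * 4?")
```

Returning `None` when `approve` declines is the simplest form of pausing; a resumable version would have to persist the transcript so the run could continue later.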

Stage 1 - Assistant API Server for v0.1.2

#19

First version of mini-assistant

  • Sync api calls for Assistant, File, Run, RunStep, Message, Thread
  • Function call support, using ReACT

Future developments

Sprint | Features
v0.1.3 | file search tool; parallel calls using LLMCompiler?
v0.1.4 | code interpreter
v0.1.5 | stream support; scalability on cloud: PGSQL, Kafka, …

First version of file-search tool for assistant-api

file search

search pipeline

https://platform.openai.com/docs/assistants/tools/file-search/how-it-works

The file_search tool implements several retrieval best practices out of the box to help you extract the right data from your files and augment the model’s responses. The file_search tool:

  • Rewrites user queries to optimize them for search.
  • Breaks down complex user queries into multiple searches it can run in parallel.
  • Runs both keyword and semantic searches across both assistant and thread vector stores.
  • Reranks search results to pick the most relevant ones before generating the final response.
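
The four steps above can be sketched as a toy pipeline. This is an illustration of the described flow, not OpenAI's implementation: the rewrite and decomposition rules are trivial string operations, the "semantic" score is a character-trigram stand-in for embedding similarity, and sub-queries run sequentially rather than in parallel. All function names are hypothetical.

```python
import re


def rewrite(query: str) -> str:
    # Toy rewrite: lowercase and drop punctuation. A real system might use an LLM.
    return re.sub(r"[^\w\s]", "", query.lower()).strip()


def decompose(query: str) -> list:
    # Toy decomposition: split a compound question on "and".
    return [part.strip() for part in query.split(" and ") if part.strip()]


def keyword_score(query: str, text: str) -> float:
    q, t = set(query.split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def trigrams(s: str) -> set:
    return {s[i:i + 3] for i in range(len(s) - 2)}


def semantic_score(query: str, text: str) -> float:
    # Placeholder for embedding similarity: Jaccard overlap of character trigrams.
    a, b = trigrams(query), trigrams(text.lower())
    return len(a & b) / len(a | b) if a | b else 0.0


def file_search(query: str, stores: list, top_k: int = 2) -> list:
    """Search every chunk in every store, then rerank by combined score."""
    best = {}
    for sub_query in decompose(rewrite(query)):
        for store in stores:  # assistant store and thread store
            for chunk in store:
                score = keyword_score(sub_query, chunk) + semantic_score(sub_query, chunk)
                best[chunk] = max(best.get(chunk, 0.0), score)
    ranked = sorted(best, key=best.get, reverse=True)  # rerank step
    return ranked[:top_k]


assistant_store = ["DuckDB stores vectors in a table.", "Ollama serves local models."]
thread_store = ["The invoice total is 42 dollars."]
results = file_search(
    "How does DuckDB store vectors, and what is the invoice total?",
    [assistant_store, thread_store],
)
```

With this input, the two chunks that match a sub-query outrank the unrelated Ollama chunk.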

online search sources

https://platform.openai.com/docs/assistants/tools/file-search/vector-stores

Each vector_store can hold up to 10,000 files.
Today, you can attach at most one vector store to an assistant and at most one vector store to a thread.

vector store source:

  • tool_resources on assistant object -> vector_store_id
  • tool_resources on thread object -> vector_store_id
  • attachments on user message. -> file_id -> create a new VS or insert into VS of this thread?
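
One way to resolve these three sources for a run is sketched below. It assumes one answer to the open question above: attachments go into the thread's vector store, and a store is created only when the thread has none. The `Assistant` and `Thread` classes and the `create_store` callback are hypothetical simplifications, not the real API objects.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Assistant:
    vector_store_ids: List[str] = field(default_factory=list)  # from tool_resources


@dataclass
class Thread:
    vector_store_ids: List[str] = field(default_factory=list)  # from tool_resources


def resolve_vector_stores(assistant: Assistant,
                          thread: Thread,
                          attachment_file_ids: List[str],
                          create_store: Callable[[], str]) -> List[str]:
    """Return the vector stores a run should search: assistant first, then thread."""
    if attachment_file_ids and not thread.vector_store_ids:
        # Assumption: attachments create a thread-level store when none exists.
        thread.vector_store_ids.append(create_store())
    # Indexing the attached files into the thread store is elided in this sketch.
    # At most one store per assistant and one per thread, per the limit quoted above.
    return [*assistant.vector_store_ids[:1], *thread.vector_store_ids[:1]]


assistant = Assistant(vector_store_ids=["vs_assistant"])
thread = Thread()
store_ids = resolve_vector_stores(assistant, thread, ["file_1"], lambda: "vs_thread_new")
```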

tool choices

Does it always trigger file search if any vector store is configured? It seems it no longer does.

Read about users' complaints after V2 was released.

My guess is that an internal agent decides whether calling file-search is necessary.

Another discussion about how file search tool works:
https://community.openai.com/t/how-knowledge-base-files-are-handled-assistants-api/601721/14

data expiration

https://platform.openai.com/docs/assistants/tools/file-search/managing-costs-with-expiration-policies

Vector stores created using thread helpers (like tool_resources.file_search.vector_stores in Threads or message.attachments in Messages) have a default expiration policy of 7 days after they were last active (defined as the last time the vector store was part of a run).
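
A local implementation could model this policy with a last-active timestamp per vector store. The sketch below is one possible reading: the function names are hypothetical, and treating a store as expired exactly at the 7-day mark is an assumption.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = timedelta(days=7)  # "7 days after last active", per the quoted policy


def is_expired(last_active_at: datetime, now: datetime,
               ttl: timedelta = DEFAULT_TTL) -> bool:
    """True once at least `ttl` has elapsed since the store was last part of a run."""
    return now - last_active_at >= ttl


def touch(now: datetime) -> datetime:
    """Value to record as last_active_at whenever a run uses the store."""
    return now


last_active = datetime(2024, 5, 1, tzinfo=timezone.utc)
still_valid = not is_expired(last_active, last_active + timedelta(days=6))
expired = is_expired(last_active, last_active + timedelta(days=7))
```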

data deletion

  • Deleting the vector store file object or,
  • By deleting the underlying file object (which removes the file from all vector_store and code_interpreter configurations across all assistants and threads in your organization)

tech debt listing

This is a long-running issue that tracks technical debt found in the existing code base.

  • transaction management in instinct-assistant services.
  • ProtobufUtils refactoring
  • RE2, ICU, or both?


Limitations of mini-assistant

This is a long-lived issue that tracks limitations of the mini-assistant implementation.

Generally speaking, mini-assistant is an all-in-one, single-node jukebox that mimics OpenAI's Assistant API. It is not intended for large-scale, distributed production systems.

When mini-assistant is mature enough and the community actually demands a more powerful version, I will start working on the mighty-assistant submodule.

Related issues:

Compatible server for OpenAI Assistant API

SYNOPSIS

For a developer-friendly agent API, should we choose Agent Protocol or the OpenAI Assistant API?

We will go through the following sections to discuss and conclude.

  • Background of Agent Protocol and OpenAI Assistant.
  • Comparison of the two.
  • Responses of other open-source frameworks.
  • Conclusions

Initial datastore improvement

  • DuckDB instance sharing between docstore and vector store
  • VectorStore & DocStore refactoring
  • Performance: Connection pool, multi-thread handling

Research on agent architecture and current implementations

Background research

Readings

https://lilianweng.github.io/posts/2023-06-23-agent/

  • Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.
  • Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.
  • Reliability of natural language interface: Current agent system relies on natural language as an interface between LLMs and external components such as memory and tools. However, the reliability of model outputs is questionable, as LLMs may make formatting errors and occasionally exhibit rebellious behavior (e.g. refuse to follow an instruction). Consequently, much of the agent demo code focuses on parsing model output.

Current open-source solutions

langchain

https://python.langchain.com/docs/modules/agents/quick_start

AgentExecutor

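from langchain import hub
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


# Placeholder tool so the snippet is self-contained; the quick start defines its own tools.
@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())


tools = [word_count]

# Chat model that decides when to call a tool (requires OPENAI_API_KEY).
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/openai-functions-agent")

# Build the agent, then wrap it in an executor that runs the tool-calling loop.
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)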

auto-gpt

https://github.com/Significant-Gravitas/AutoGPT

Anatomy of an Agent:

  • Profile: Sets an agent's personality and specialization.
  • Memory: Encompasses the agent's long-term and short-term memory, storing both historical data and recent interactions.
  • Planning: The strategy the agent employs to tackle problems.
  • Action: The stage where the agent's decisions translate to tangible results.


Agent categories

General agents: like auto-gpt
Vertical agents: data-interpreter, code-interpreter, meta-gpt, agents built by coze.

Implementation details

Anatomy of an agent in Lilian Weng's blog.


Components

  • planning: ReACT, Reflection, or the approach used in XAgent
  • tools: function tool protocol, toolkits
  • memory: state handling

API

High-level API: Assistant API in OpenAI.
Low-level API: Agent Protocol by autogpt

Digging deep

More local inference support

Todos

  • BGE-M3 Embedding support
  • Possible llama.cpp support for chat models in GGUF format
  • initial support for parallelism: multi-instance, batching
