Milestone	Features	DDL
v0.1.0	Long-short memory, PDF/TXT/DOCX ingestor, `Chain` programming paradigm, RAG reference app `doc-agent`	3.29
v0.1.1	Performance tuning, RAG evaluation, Function calling agent	4.16
v0.1.2	OpenAI Assistant API initial implementation, single-binary reference app `mini-assistant`	4.30
v0.1.3	* `mini-assistant`: tool calls with open-source LLMs	5.17
v0.1.4	* `doc-agent`: rerank model * `mini-assistant`: `file-search` tool support	~~6.18~~ 6.14
v0.1.5	Overall optimization	6.30
v0.1.6	`code-interpreter` in `mini-assistant`	7.15

protobuf is not header-only

proto-objects has to be a static archive. Consider FlatBuffers.

RAG Evaluation

A notebook for experiment, including:

comparison with langchain as baseline.
ablation test for different retrievers and splitter parameters.
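
One way to organize the ablation runs is a plain grid over retriever and splitter settings. The sketch below is only an illustration: the retriever names, the parameter values, and the `evaluate` callback are hypothetical placeholders, not instinct.cpp or langchain APIs.

```python
from itertools import product


def ablation_grid(retrievers, chunk_sizes, chunk_overlaps):
    """Yield one config dict per combination, skipping overlaps that are not smaller than the chunk."""
    for retriever, size, overlap in product(retrievers, chunk_sizes, chunk_overlaps):
        if overlap >= size:
            continue
        yield {"retriever": retriever, "chunk_size": size, "chunk_overlap": overlap}


def run_ablation(configs, evaluate):
    """Score every config with a user-supplied evaluate(config) -> float; best score first."""
    results = [(evaluate(cfg), cfg) for cfg in configs]
    return sorted(results, key=lambda pair: pair[0], reverse=True)
```

In a notebook, `evaluate` would build the pipeline for one config and return the metric of interest on the QA dataset.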

Building & releasing issue

rpp via conan. Blocked by victimsnino/ReactivePlusPlus#548
Make conan optional: migrate to FetchContent.
github actions for CI, at least for instinct-core and instinct-data modules.
validate library install and uninstall

Client integration with distributed vector store

My choices would be:

Weaviate
Milvus

This remains low priority before the first GA, because brute-force search is fast enough with DuckDB and the DB vendors don't have official C++ clients yet.
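
For context, brute-force search means scoring the query against every stored vector and keeping the best hits. The sketch below shows the idea in pure Python; it does not use DuckDB, and the function names and the in-memory `vectors` dict are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(query, vectors, top_k=3):
    """vectors: dict of doc_id -> embedding. Returns [(doc_id, score)], best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The cost grows linearly with the number of stored vectors, which is acceptable for a single-node store but is the reason a distributed vector DB becomes interesting later.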

External knowledge retriever

Wikipedia
[SERP API](https://serpapi.com/dashboard)

Function calling support

ReActor: LLM-based actor pattern.
Organized by requirements generated by two reference apps:
- Search Agent - A general chat assistant
- tools
  - HTTP call tool for various web search
  - summarizer
- Analytics Agent - a junior data interpreter
  - text2sql
  - data visualization.

Distributed Vector DB client integration

Weaviate
Milvus

This remains low priority before the first GA, because brute-force search is fast enough with DuckDB and the DB vendors don't have official C++ clients yet.

Design & implementation of instinct-agent

Manifest

ToolAgent, or the function calling foundation
- tool rendering, invocation command parser
- function tool and function toolkits protocol
- with local memory and checkpoint
Multi-agent orchestration
- control flow: loop/if-else/switch
- human in the loop
distributed agent protocol. (Agent Protocol based implementation?)
- agent state checkpoint
- distributed memory, tool server
built-in tools
- external retriever: Google Search, Wikipedia, SERP API
- calculator
- LLM
- python interpreter ?
built-in agents
- ReACT Tool Execution Agent
- Plan & Execute
- LLMCompiler

Project plans

Stage 0 - POC - for one week

I'm still getting too many questions about implementation. Let's do a minimal implementation for inspiration.

ToolAgent
Built-in toolkits: Search (one of Google, Tavily, DuckDuckGo or SERP API), Calculator, Python Interpreter
Human in the loop: pause, resume
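
Pause and resume imply that agent state can be checkpointed and restored. The sketch below shows one possible shape under stated assumptions: `AgentState`, `run_until_pause`, and the JSON checkpoint format are hypothetical, and calling `run_until_pause` again on a paused state is treated as the human's approval.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AgentState:
    pending_steps: List[str]
    completed_steps: List[str] = field(default_factory=list)
    paused: bool = False


def run_until_pause(state, needs_approval):
    """Execute steps in order; pause before a step that needs human approval."""
    while state.pending_steps:
        step = state.pending_steps[0]
        if needs_approval(step) and not state.paused:
            state.paused = True  # hand control back to the human
            return state
        state.paused = False  # either no approval needed, or we were just resumed
        state.completed_steps.append(state.pending_steps.pop(0))
    return state


def checkpoint(state) -> str:
    """Serialize the state so it can be stored while the agent is paused."""
    return json.dumps(asdict(state))


def restore(blob: str) -> AgentState:
    """Rebuild the state from a checkpoint produced by checkpoint()."""
    return AgentState(**json.loads(blob))
```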

Stage 1 - Assistant API Server for v0.1.2

#19

First version of mini-assistant

Sync api calls for Assistant, File, Run, RunStep, Message, Thread
Function call support, using ReACT

Future developments

Sprint	Features
v0.1.3	* file search tool * parallel calls using LLMCompiler ?
v0.1.4	* code interpreter
v0.1.5	* stream support * scalability on cloud: PGSQL, Kafka, …

Update pipeline in `doc-agent` with latest progress

Use MultiPathRetriever with reranker
BM25 retriever
Build with application context
Benchmarks #8
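
For the BM25 retriever item above, the scoring itself is compact. The sketch below is a plain Okapi BM25 scorer using the non-negative IDF variant popularized by Lucene; the function name, the pre-tokenized input, and the default `k1` and `b` values are assumptions, not the `doc-agent` implementation.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """docs: list of token lists. Returns one BM25 score per document."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Length normalization: long documents need more hits for the same score.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```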

First version of file-search tool for assistant-api

file search

search pipeline

https://platform.openai.com/docs/assistants/tools/file-search/how-it-works

The file_search tool implements several retrieval best practices out of the box to help you extract the right data from your files and augment the model’s responses. The file_search tool:

Rewrites user queries to optimize them for search.

Breaks down complex user queries into multiple searches it can run in parallel.

Runs both keyword and semantic searches across both assistant and thread vector stores.

Reranks search results to pick the most relevant ones before generating the final response.
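
The quoted steps do not say how keyword and semantic results are combined. As a stand-in, the sketch below merges several ranked lists with reciprocal rank fusion (RRF); this is a swapped-in, commonly used technique rather than OpenAI's documented method, and the function name and `k=60` default are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """rankings: list of ranked doc-id lists (best first). Returns fused ids, best first."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents found by several lists add up.
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)
```

Each rewritten or decomposed query could contribute one keyword ranking and one semantic ranking; the fused list would then go to the reranker.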

online search sources

https://platform.openai.com/docs/assistants/tools/file-search/vector-stores

Each vector_store can hold up to 10,000 files.
Today, you can attach at most one vector store to an assistant and at most one vector store to a thread.

vector store source:

tool_resources on assistant object -> vector_store_id
tool_resources on thread object -> vector_store_id
attachments on user message. -> file_id -> create a new VS or insert into VS of this thread?
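
A minimal sketch of how these three sources could be gathered for one run, working on plain dicts shaped like the Assistants API objects. The helper names are hypothetical, and the sketch deliberately leaves the open question above unanswered: attachment file ids are returned separately rather than placed in any vector store.

```python
def vector_store_ids(obj):
    """Read tool_resources.file_search.vector_store_ids, tolerating missing or null fields."""
    resources = obj.get("tool_resources") or {}
    file_search = resources.get("file_search") or {}
    return list(file_search.get("vector_store_ids") or [])


def resolve_search_sources(assistant, thread, messages):
    """Return (vector store ids to search, attachment file ids still to be indexed)."""
    store_ids = []
    for store_id in vector_store_ids(assistant) + vector_store_ids(thread):
        if store_id not in store_ids:  # keep order, drop duplicates
            store_ids.append(store_id)
    pending_file_ids = [
        attachment["file_id"]
        for message in messages
        for attachment in (message.get("attachments") or [])
        if any(tool.get("type") == "file_search" for tool in attachment.get("tools", []))
    ]
    return store_ids, pending_file_ids
```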

tool choices

Does it always trigger file search if any vector store is configured? It seems it no longer does.

Read about users' complaints after V2 was released.

I guess that internal agent will decide if it's necessary to call file-search.

Another discussion about how file search tool works:
https://community.openai.com/t/how-knowledge-base-files-are-handled-assistants-api/601721/14

data expiration

https://platform.openai.com/docs/assistants/tools/file-search/managing-costs-with-expiration-policies

Vector stores created using thread helpers (like tool_resources.file_search.vector_stores in Threads or message.attachments in Messages) have a default expiration policy of 7 days after they were last active (defined as the last time the vector store was part of a run).
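
The default policy can be checked with a small helper. This is a sketch under assumptions: the function name is hypothetical, timestamps are passed in explicitly, and treating a store as expired once the full period has elapsed is a guess about the boundary case.

```python
from datetime import datetime, timedelta, timezone


def is_expired(last_active_at, now, days=7):
    """True once `days` have passed since the vector store was last part of a run."""
    return now - last_active_at >= timedelta(days=days)


def expired_store_ids(stores, now, days=7):
    """stores: dict of vector_store_id -> last_active_at. Returns the ids due for cleanup."""
    return [store_id for store_id, last_active in stores.items()
            if is_expired(last_active, now, days)]
```

A periodic background task in the server could call `expired_store_ids` with `datetime.now(timezone.utc)` and delete what it returns.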

data deletion

Deleting the vector store file object or,
By deleting the underlying file object (which removes the file from all vector_store and code_interpreter configurations across all assistants and threads in your organization)

Easy wins for performance tuning

stream of sequence chain
multi-threading for embedding multiple documents
multi-threading for MappingFunctionStep
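
As an illustration of the multi-threaded embedding item, the sketch below splits documents into batches and embeds the batches on a thread pool. The function name, the `embed_batch` callback, and the default sizes are assumptions; a real embedding client would be passed in as `embed_batch`.

```python
from concurrent.futures import ThreadPoolExecutor


def embed_documents(texts, embed_batch, batch_size=8, max_workers=4):
    """Embed texts in parallel batches; the output order matches the input order."""
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in submission order, so no re-sorting is needed.
        results = pool.map(embed_batch, batches)
        return [vector for batch in results for vector in batch]
```

Threads help here because embedding calls are typically I/O-bound (HTTP requests to a model server), so they overlap well even in Python.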

tech debt listing

This is a long-running issue that tracks technical debt found in the existing code base.

transaction management in instinct-assistant services.
ProtobufUtils refactoring
RE2, ICU, or both?

Releasing issues of `v0.1.1`

conan package
- rpp via conan. Blocked by victimsnino/ReactivePlusPlus#548
- figure out how to build a header-only package
github actions for CI
brew package

RAG Evaluation

A notebook for experiment, including:

comparison with langchain as baseline.
ablation test for different retrievers and splitter parameters.

External knowledge retrievers

SERP API
Wikipedia API

Limitations of mini-assistant

This is a long-lived issue that tracks limitations of the mini-assistant implementation.

Generally speaking, mini-assistant is an all-in-one, single-node jukebox that mimics OpenAI's Assistant API. It is not intended for large-scale, distributed production systems.

When mini-assistant is mature enough and the community actually demands a more powerful version, I will start working on a mighty-assistant submodule.

Related issues:

#20
#16

Compatible server for OpenAI Assistant API

SYNOPSIS

In terms of a developer-friendly agent API, should we choose Agent Protocol or OpenAI Assistant API?

We will go through the following sections to discuss and conclude.

Background of Agent Protocol and OpenAI Assistant.
Comparison of the two.
Responses of other open-source frameworks.
Conclusions

Initial datastore improvement

DuckDB instance sharing between docstore and vector store
VectorStore & DocStore refactoring
Performance: Connection pool, multi-thread handling

DuckDB DocStore and VectorStore refactoring

remove internal and custom appender classes
allow sharing db handle

Modular RAG implementations: reranker, multi-retrievers

For better evaluation results on the HF QA dataset.

Reranking: BCE, BGE-M3 scoring
Query rewrite
- to generate SQL filter for given prompt
- to generate hypothetical queries

Research on agent architecture and current implementations

Background research

Readings

https://lilianweng.github.io/posts/2023-06-23-agent/

Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.

Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.

Reliability of natural language interface: Current agent system relies on natural language as an interface between LLMs and external components such as memory and tools. However, the reliability of model outputs is questionable, as LLMs may make formatting errors and occasionally exhibit rebellious behavior (e.g. refuse to follow an instruction). Consequently, much of the agent demo code focuses on parsing model output.

Existing open-source solutions

langchain

https://python.langchain.com/docs/modules/agents/quick_start

AgentExecutor

from langchain import hub
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI

# `tools` is assumed to be a list of langchain tools defined earlier, as in the linked quick start.
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/openai-functions-agent")

# Build the function-calling agent, then wrap it in an executor that runs the tool loop.
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

auto-gpt

https://github.com/Significant-Gravitas/AutoGPT

Anatomy of an Agent:

Profile: Sets an agent's personality and specialization.

Memory: Encompasses the agent's long-term and short-term memory, storing both historical data and recent interactions.

Planning: The strategy the agent employs to tackle problems.

Action: The stage where the agent's decisions translate to tangible results.

Agent categories

General agents: like auto-gpt
Vertical agents: data-interpreter, code-interpreter, meta-gpt, agents built by coze.

Implementation details

Anatomy of an agent, as described in Lilian Weng's blog.

Components

planning: ReACT, Reflection, or the approach used in XAgent
tools: function tool protocol, toolkits
memory: state handling

API

High-level API: Assistant API in OpenAI.
Low-level API: Agent Protocol by autogpt

BGE-M3 Embedding support
Possible llama.cpp support for chat models in GGUF format
initial support for parallelism: multi-instance, batching

robinqu / instinct.cpp
