The AI agent space is young. Most developers are building agents in their own way. This creates a challenge:
It's hard to communicate with different agents, since the interface is often different every time.
Because communicating with different agents is hard, it's also hard to compare them.
Additionally, a single communication interface would make it easier to develop devtools that work with agents out of the box.
We present the Agent Protocol - a single common interface for communicating with agents.
Any agent developer can implement this protocol.
The Agent Protocol is an API specification: a list of endpoints, with predefined
response models, that the agent should expose.
The protocol is tech stack agnostic. Any agent can adopt this protocol no
matter what framework they're using (or not using).
We believe this will help the ecosystem grow faster and simplify integrations.
We're starting with a minimal core. We want to build upon that iteratively
by learning from agent developers about what they actually need.
The incentives to adopt the protocol
You can easily use the benchmarks.
Other people can more easily use and integrate your agent.
General devtools (for development, deployment and monitoring) can be built on
top of this protocol.
You don't need to write boilerplate API code, so you can focus on developing
your agent.
🎯 Immediate goals of the protocol
Set a simple, general standard that makes benchmarking agents easy. One of the
primary goals of the protocol is a great developer experience and a simple
implementation for agent developers: you just start your agent, and that's all
you have to do.
🗣️ Request for Comments
If you'd like to propose a change or an improvement to the protocol, please
follow the RFC template.
This is our implementation of the protocol: a library that you can use to build your agent. You can use it, or you can implement the protocol on your own. It's up to you.
Using the SDK should reduce the implementation of the protocol to the bare minimum, but at
the same time it shouldn't tie your hands. The goal is to let agent builders focus on
building their agents while the SDK solves the rest.
In short, it wraps your agent in a web server that allows communication with
your agent (and, in the future, between agents).
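To make "wraps your agent in a web server" concrete, here is a minimal sketch that uses only Python's standard library (wsgiref). It is not the SDK itself. The two routes follow the /ap/v1/agent/tasks paths from the protocol's OpenAPI file. The following are illustrative assumptions: my_step_handler, the in-memory TASKS dict, the call_app helper, and the simplified response fields. The real spec's models carry more fields.

```python
import io
import json
import uuid
from typing import Optional
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults

TASKS: dict = {}  # hypothetical in-memory store: task_id -> task
TASKS_ROUTE = ["ap", "v1", "agent", "tasks"]


def my_step_handler(task_input: str, step_input: str) -> str:
    """Stand-in for the agent's own logic (hypothetical name)."""
    return f"task={task_input!r} step={step_input!r}"


def app(environ, start_response):
    """Minimal WSGI app exposing two Agent Protocol-style routes."""
    method = environ["REQUEST_METHOD"]
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = json.loads(environ["wsgi.input"].read(length) or b"{}")

    def respond(status: str, payload: dict):
        start_response(status, [("Content-Type", "application/json")])
        return [json.dumps(payload).encode()]

    # POST /ap/v1/agent/tasks: create a task
    if method == "POST" and parts == TASKS_ROUTE:
        task = {"task_id": str(uuid.uuid4()), "input": body.get("input")}
        TASKS[task["task_id"]] = task
        return respond("200 OK", task)

    # POST /ap/v1/agent/tasks/{task_id}/steps: run one step of the agent
    if (method == "POST" and len(parts) == 6
            and parts[:4] == TASKS_ROUTE and parts[5] == "steps"):
        task = TASKS.get(parts[4])
        if task is None:
            return respond("404 Not Found", {"message": "Task not found"})
        output = my_step_handler(task["input"], body.get("input"))
        step = {"task_id": task["task_id"], "step_id": str(uuid.uuid4()),
                "output": output}
        return respond("200 OK", step)

    return respond("404 Not Found", {"message": "Unknown route"})


def call_app(method: str, path: str, body: Optional[dict] = None):
    """Call the app in-process (no socket) and return (status, JSON body)."""
    raw = json.dumps(body or {}).encode()
    environ: dict = {}
    setup_testing_defaults(environ)
    environ.update(REQUEST_METHOD=method, PATH_INFO=path,
                   CONTENT_LENGTH=str(len(raw)))
    environ["wsgi.input"] = io.BytesIO(raw)
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


if __name__ == "__main__":
    # Serve on 127.0.0.1:8000, the address used in the compliance logs.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Only my_step_handler is agent-specific. The routing, ID generation and JSON handling are the parts an SDK takes off your hands.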
This library is meant for the users of agents: your agent is deployed somewhere, and its users can use this library to interact with it.
Thanks to the standard, users can try multiple agents with no (or very minimal) adjustments to their code.
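The sketch below shows what this buys the user: the same client function can drive any compliant agent, and only the base URL changes. It makes three assumptions: the task and step routes from the OpenAPI file, a task_id field in the create-task response, and an is_last flag on steps. The injectable transport parameter is a hypothetical design choice that lets you test the client without a live agent.

```python
import json
import urllib.request
from typing import Callable, Optional

# A transport sends (method, url, json_body) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(method: str, url: str, body: Optional[dict]) -> dict:
    """Default transport: plain HTTP + JSON via the standard library."""
    data = None if body is None else json.dumps(body).encode()
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_task(base_url: str, task_input: str,
             transport: Transport = http_transport,
             max_steps: int = 10) -> list:
    """Create a task on any compliant agent and run steps until is_last."""
    tasks_url = base_url.rstrip("/") + "/ap/v1/agent/tasks"
    task = transport("POST", tasks_url, {"input": task_input})
    steps_url = f"{tasks_url}/{task['task_id']}/steps"
    outputs = []
    for _ in range(max_steps):  # guard against agents that never finish
        step = transport("POST", steps_url, {})
        outputs.append(step.get("output"))
        if step.get("is_last"):
            break
    return outputs


if __name__ == "__main__":
    # The same client code runs against several agents (hypothetical URLs).
    for url in ["http://localhost:8000", "http://localhost:8001"]:
        print(url, run_task(url, "Write 'Washington' to a file."))
```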
📦 How to use the protocol
If you're an agent developer, you can use the SDK to implement the protocol. You can find more info in the docs or in the SDK folder.
Is your feature request related to a problem? Please describe.
The spec doesn't define what happens when you request a task ID that doesn't exist. This is a problem because, while it's undefined, the response is indistinguishable from a 500 error, and the two cases call for different behavior on the client side.
Describe the solution you'd like
I'd like 404 to be added as a documented response code for this case.
Describe alternatives you've considered
We could also leave it undocumented, but then people would likely return 500 in this case. That would be wrong, because requesting a task that doesn't exist is not a server error.
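Here is a minimal sketch of the distinction this request asks for. The HTTP layer is replaced by a plain Response dataclass so the example runs without a server. The names get_task, fetch_task, TaskNotFoundError and AgentServerError are hypothetical, as is the message text. The point is only that a documented 404 lets a client stop immediately instead of treating the failure like a transient 500.

```python
from dataclasses import dataclass


class TaskNotFoundError(Exception):
    """The agent answered 404: the task ID is wrong, so retrying won't help."""


class AgentServerError(Exception):
    """The agent answered 5xx: something broke on the server side."""


@dataclass
class Response:
    status: int
    body: dict


# Hypothetical server-side store with a single known task.
TASKS = {"a1b2": {"task_id": "a1b2", "input": "Write 'Washington' to a file."}}


def get_task(task_id: str) -> Response:
    """Server: an unknown ID is a client error (404), not a server crash (500)."""
    task = TASKS.get(task_id)
    if task is None:
        return Response(404, {"message": f"Task with ID {task_id} not found"})
    return Response(200, task)


def fetch_task(task_id: str) -> dict:
    """Client: map status codes to distinct, actionable exceptions."""
    response = get_task(task_id)  # stands in for an HTTP GET on the task URL
    if response.status == 404:
        raise TaskNotFoundError(response.body["message"])
    if response.status >= 500:
        raise AgentServerError(response.body.get("message", "server error"))
    return response.body
```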
This RFC proposes an enhancement to the current interface specification for task management. While the existing specification defines interfaces, it lacks detailed descriptions of internal state changes and power responsibilities. This RFC aims to provide a more comprehensive description of these aspects to improve clarity and understanding. The objective is to address potential complexities that may arise due to implicit states within systems.
Motivation
The motivation behind this proposal is to resolve the ambiguity and missing information in the existing interface specification. This lack of clarity can lead to confusion and to problems in system implementations. The specification should serve all of its users, including humans, other agents, and machines.
Agent Builders Benefit
The proposed changes will benefit agent builders by offering a more detailed and well-defined interface specification. This will lead to a clearer understanding of power responsibilities and internal state changes, ultimately resulting in more robust and predictable system behavior.
Design Proposal
The core of this proposal involves adding detailed descriptions of abstract entities and their interactions. The existing system comprises four key entities, viewed from an abstract perspective:
Task: Represents a task with a specific goal.
Step: Denotes a step within a task.
Action: Corresponds to an action associated with each step.
Artifact: Signifies a persistent resource space owned by an agent.
The proposal suggests that all interfaces should revolve around these entities, including their creation, retrieval, and listing. Additionally, the protocol should include explicit descriptions of ownership relationships between these entities, addressing the question of who should have ownership of these entities.
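One way to make ownership explicit is to encode it in the data model. The sketch below uses hypothetical field names that are not taken from the spec. Each Step stores the ID of the Task that owns it, and each Action hangs off a Step. Each Artifact records the agent that owns it, as described above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


def new_id() -> str:
    return str(uuid.uuid4())


@dataclass
class Action:
    name: str
    arguments: dict
    approved: Optional[bool] = None  # None until the User grants or denies it


@dataclass
class Step:
    task_id: str  # a Step is owned by exactly one Task
    input: str
    action: Optional[Action] = None  # the Action this Step produced, if any
    step_id: str = field(default_factory=new_id)


@dataclass
class Task:
    goal: str
    steps: list = field(default_factory=list)
    task_id: str = field(default_factory=new_id)

    def add_step(self, step_input: str) -> Step:
        """Only the owning Task creates its Steps, so ownership stays explicit."""
        step = Step(task_id=self.task_id, input=step_input)
        self.steps.append(step)
        return step


@dataclass
class Artifact:
    agent_id: str  # per this RFC, Artifacts are owned by the agent
    relative_path: str
    artifact_id: str = field(default_factory=new_id)
```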
To illustrate the proposed changes, let's take the "Task" entity as an example. In the current system, three main participants are involved:
User: Interacts with the agent through a client and provides, at a minimum, the task's goal.
Client: Facilitates interactions between the user and the server, accepting user input.
Server: Provides the interface for task management.
Three primary scenarios are considered:
User interacts with the client to generate a Task, complete with metadata generated by a Language Model (LLM). The Task is then created through the interface.
User interacts with the client and uploads data to the server. The server uses an LLM to generate metadata and create the Task.
User interacts with the client, locally constructs metadata, and uploads it to the server through the protocol. The server uses the user's data to create the Task.
While the protocol's content remains consistent across these scenarios, the resulting effects and the power responsibilities of the User, Client, and Server differ, leading to potential confusion. The proposal emphasizes the need for the protocol to clearly define the roles and responsibilities of each entity and the state changes represented by the interface.
It's important to note that, in extreme scenarios, the protocol should be capable of addressing complex systems, taking into account potential future plugin system designs.
Detailed Design
These are some details in the design and situations that may need to be faced. They are provided for reference.
The following figures provide a simplified description of the relationships between User, Client, and Server, omitting some details. Although the figures are incomplete due to technical constraints, they cover the main components.
User, Client, and Server Relationship:
sequenceDiagram
title: user, client and server
participant User
participant Client
participant Server
note left of User: Human, Other Agent or Machine
note right of Server: Remote Server (possibly more than one)
User -> Client: input target and create Task
Client -> Server: build metadata and create Task
Server --> User: Task Obj: TaskID
User -> Client: Update Task Metadata
Client -> Server: Update Task Metadata
Server -> Client: Updated Task Object
Client -> User: Display Task Metadata
loop Talk
User --> Client: Run
Client --> Server: Talk about Step 1(start: 1)
Server --> Client: Talk about Step 1(start: 1)
Client -> User: Return Action and Request Run It.(User Feedback)
end
note over User, Server: Many Steps after..., Get RESULT
User, Client, and Server Interaction for Agent:
The following diagram represents the most complex scenario envisioned in the complete system. In this scenario, the author designed components in both the Client and Server for handling input and output, with metadata serving as the central element. Metadata is essential for storing state and third-party data. It is crucial to emphasize that plugins are not considered part of the protocol and should not be included in the standard interface.
graph TD
start[User Creates Task]
start --> clientInputPlugin[Handle User Input Client Plugin]
clientInputPlugin --> clientMetadataBuilder[Build Metadata LLM]
clientMetadataBuilder --> clientOutputPlugin[Handle Client Output Client Plugin]
clientOutputPlugin --> ClientRequest[Create Client Request]
ClientRequest --> ServerAPI[Server API]
ServerAPI --> serverMiddleware[Server Middleware User Control, Security Checks, etc.]
serverMiddleware --> ServerRouter[Route Server API]
ServerRouter --> serverInputPlugin[Handle Server Input Server Plugin]
serverInputPlugin --> serverMetadataBuilder[Build Metadata LLM]
serverMetadataBuilder --> serverOutputPlugin[Handle Server Output Server Plugin]
serverOutputPlugin --> database[Database Operations]
database --> ServerResponse[Generate Server Response]
The following diagram represents the internal abstract entity relationship structure of the Agent, with a focus on the interaction with the User. It is inspired by the Auto-GPT implementation and simplifies some elements.
In the current Agent design, the core process involves the Language Model (LLM) generating an Action within a given context. The system then provides feedback on this Action, resulting in an output. This loop continues until a final result is obtained, which is evaluated by either humans or the LLM.
Within this structure, the key focus is on how interactions with the User occur during task execution. Notably, interactions take place when a Step generates an Action, and the User either grants permission to execute it or provides feedback. It's essential to highlight that introducing additional entities should be avoided, as they could significantly increase the complexity of the protocol.
In this design, there is a clear separation between the resources available to the system's plug-ins and those available to the Agent. System plug-ins are not involved in the LLM decision-making process, and the resources accessible to the Agent influence the LLM's decision-making outcomes.
While the Agent maintains statefulness, the structure itself is stateless. State changes are driven by the entities operated through the interface, and they do not encompass the existing context. Additionally, the design of Actions opens up the possibility of implementing active plug-ins for the Agent, allowing Steps or Actions to be executed in various locations, including User interactions (Agent feedback), the Client (local system), and the Server.
graph TD
start[Initialize Agent]
UserPrompt[User Prompt for Task Creation]
BuildAgentMetadata[Build Agent Metadata]
TaskResult[Is Task Result Ready?]
CreateInitStep[Create Initial Step Using Client and Server Metadata]
StepGenerateAction[Generate Actions Using LLM]
WaitUserActionInput[Await User Action or Feedback]
ActionEnd[Record Action Result in Database]
End[End of Task]
start --> UserPrompt
UserPrompt --> BuildAgentMetadata
BuildAgentMetadata --> TaskResult
TaskResult -- yes --> End
TaskResult -- no --> CreateInitStep
CreateInitStep --> StepGenerateAction
StepGenerateAction --> WaitUserActionInput
WaitUserActionInput --> ActionEnd
ActionEnd --> TaskResult
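The flowchart can also be read as a loop. The sketch below is one possible translation. It assumes four plain functions are injected, all with hypothetical names: the LLM call, the user prompt, the action executor and the "is the result ready?" check. A list stands in for the database.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskState:
    goal: str
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # (action, outcome) pairs
    done: bool = False


def run_agent(goal: str,
              generate_action: Callable[[TaskState], str],
              ask_user: Callable[[str], str],
              execute: Callable[[str], str],
              is_result_ready: Callable[[TaskState], bool],
              max_iterations: int = 20) -> TaskState:
    # Initialize Agent / User Prompt / Build Agent Metadata
    state = TaskState(goal=goal, metadata={"goal": goal})
    for _ in range(max_iterations):
        # Is Task Result Ready? A "yes" leads to End of Task.
        if is_result_ready(state):
            state.done = True
            break
        # Create Step + Generate Actions Using LLM (the LLM call is injected)
        action = generate_action(state)
        # Await User Action or Feedback: "approve" runs it, anything else is feedback
        decision = ask_user(action)
        if decision == "approve":
            outcome = execute(action)
        else:
            outcome = f"feedback: {decision}"
        # Record Action Result in Database (a list stands in for the database)
        state.history.append((action, outcome))
    return state
```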
This overall system architecture highlights the need for clear delineation of responsibilities and power relationships among the User, Client, and Server in the protocol design.
Alternatives Considered
The primary alternative considered is maintaining the existing interface specification without making these enhancements. However, this alternative does not address the issue of clarity and may lead to continued confusion in system implementations.
Note on Plugin Entities
It's important to emphasize that adding plugin entities to the interface is beyond the scope of this proposal. The focus here is on enhancing the clarity of the existing interface specification and addressing issues related to entity ownership and state changes.
Compatibility
The design proposal aims to be backward compatible with existing systems. To roll out this feature, it is important to provide clear documentation and guidelines for implementing the updated interface specification. Compatibility checks with other parts of the system, such as SDK and Client SDK, will also be crucial to ensure a seamless transition.
Questions and Discussion Topics
How should the protocol be updated to clearly define entity ownership?
Are there any potential drawbacks or complexities that need to be addressed in the proposal?
How can the proposed changes benefit agent builders and system implementers?
../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py:61: KeyError
=========================== short test summary info ============================
FAILED ../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_list_agent_tasks_ids[http:/127.0.0.1:8000]
FAILED ../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_list_agent_task_steps[http:/127.0.0.1:8000]
FAILED ../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_get_agent_task_step[http:/127.0.0.1:8000]
========================= 3 failed, 4 passed in 0.14s ==========================
Traceback (most recent call last):
File "/home/axel/Software/gpt-engineer/venv/bin/agent-protocol", line 8, in <module>
sys.exit(cli())
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/cli.py", line 25, in _check_compliance
check_compliance(url, args)
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py", line 104, in check_compliance
assert exit_code == 0, "Your Agent API isn't compliant with the agent protocol"
AssertionError: Your Agent API isn't compliant with the agent protocol
This proposal outlines the development of a cross-platform CLI tool aimed at simplifying interactions with the Agent Protocol. The goal is to improve user experience, streamline debugging, and facilitate automation within the agent protocol ecosystem.
Background:
The Agent Protocol involves managing tasks, steps, and artifacts, all identified by UUIDs (v4). Tasks represent high-level commands for agents, steps denote individual actions taken by agents, and artifacts correspond to files generated or used during agent operations. This proposal aims to create a user-friendly CLI application inspired by the ease of use of Docker's command-line interface.
Proposed CLI Commands:
To provide a clear vision for this tool, we propose the following CLI commands and their expected functionalities:
ap-cli connect [endpoint] - Connects to the specified agent endpoint.
Example: $ ap-cli connect http://localhost:8000
ap-cli task list - Lists tasks with their full UUIDs.
The output of this command (for tasks, steps, and artifacts alike) could be formatted as JSON (matching the original API, so it can be piped to jq for scripting) or as a formatted table.
ap-cli task create [options] - Creates a new task.
Example: $ ap-cli task create
ap-cli task [task_uuid] step create --input="[input]" - Creates a new step within a task.
Example: $ ap-cli task 70e62fbb-0123-4567-89ab-cdef01234567 step create --input="Write 'Washington' to a file."
ap-cli task [task_uuid] artifact list - Lists artifacts associated with a task.
Example: $ ap-cli task 70e62fbb-0123-4567-89ab-cdef01234567 artifact list
Note: If multi-agent usage becomes a necessity, the agent name from the (not yet implemented) info endpoint could be used for individual calls.
Example:
$ ap-cli connect http://localhost:8000
> Connected to MyAgent
$ ap-cli connect http://localhost:8001
> Connected to NotMyAgent
$ ap-cli MyAgent task list
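Below is a rough Python sketch of part of this CLI, built with argparse. It covers connect, task list and task create, plus the JSON-or-table output idea. The --format flag and the column layout are illustrative. So is the assumption that the list response has a tasks key, which is the shape the Python client issue below reports for list_agent_tasks_ids. Nested commands such as task [task_uuid] step create are left out.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """A subset of the proposed ap-cli commands (connect, task list/create)."""
    parser = argparse.ArgumentParser(prog="ap-cli")
    commands = parser.add_subparsers(dest="command", required=True)
    connect = commands.add_parser("connect", help="Connect to an agent endpoint")
    connect.add_argument("endpoint")
    task = commands.add_parser("task", help="Work with tasks")
    task_commands = task.add_subparsers(dest="task_command", required=True)
    list_cmd = task_commands.add_parser("list", help="List tasks")
    list_cmd.add_argument("--format", choices=["table", "json"], default="table")
    task_commands.add_parser("create", help="Create a new task")
    return parser


def render_tasks(response: dict, fmt: str = "table") -> str:
    """Render a task-list response as raw JSON or as an aligned table."""
    if fmt == "json":
        return json.dumps(response, indent=2)  # same shape as the API, jq-friendly
    rows = [("TASK ID", "INPUT")]
    rows += [(t["task_id"], t.get("input") or "") for t in response["tasks"]]
    width = max(len(task_id) for task_id, _ in rows)
    return "\n".join(f"{task_id.ljust(width)}  {text}" for task_id, text in rows)
```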
Goals of the CLI Tool:
Enhance user experience with an intuitive CLI interface.
Simplify debugging and monitoring of agent tasks and steps.
Enable automation of agent protocol interactions for CI/CD workflows.
Facilitate agent protocol compliance verification and testing.
Discussion Points:
This proposal serves as a technical framework for the CLI tool's development. Your input is essential to refine the design and functionality. Please share your thoughts on the proposed commands, suggest additional features or improvements, and highlight any potential challenges or considerations.
Let's collaborate to create a powerful and user-centric CLI tool that enhances the usability and effectiveness of the Agent Protocol.
I think this is good DX. Ultimately I don't think it matters if you need to run pip install or npm i (we might want to have brew installation in the future). The point is that you can now just run agent-protocol and it'll work without anything else.
With the npx tool, we could also instruct users to use npx agent-protocol test --url <url> [...] as well as running npm i and then running it. Speaking of brew, I think we could easily get something running on the Debian/Choco/Arch repositories as well if you wanted to head in that direction.
As for the design of the agent-protocol command, if I'm not mistaken, right now we're only looking at a single command, test, with the argument --url,-u <url> to run the compliance tests against, and no other settings, correct? (Besides, I assume, the standard --version,-v and --help,-h.)
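For reference, the command surface described here fits in a few lines of argparse: a single test subcommand with --url/-u, plus --version/-v and --help/-h. This is only a sketch. The existing CLI is built with click (see the traceback above), and VERSION and the printed message are placeholders.

```python
import argparse
from typing import Optional

VERSION = "0.0.0"  # placeholder, not the real package version


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="agent-protocol")  # -h/--help is built in
    parser.add_argument("-v", "--version", action="version",
                        version=f"%(prog)s {VERSION}")
    commands = parser.add_subparsers(dest="command", required=True)
    test = commands.add_parser("test", help="Check an agent for protocol compliance")
    test.add_argument("-u", "--url", required=True,
                      help="Base URL of the agent, e.g. http://127.0.0.1:8000")
    return parser


def main(argv: Optional[list] = None) -> int:
    args = build_parser().parse_args(argv)
    if args.command == "test":
        # Placeholder: the real tool would run the compliance test suite here.
        print(f"Running compliance checks against {args.url}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Usage: agent-protocol test -u http://127.0.0.1:8000.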
People want to send emails with agents. But how do you send emails on someone's behalf?
So far the solution has been to give secrets to the agent. Once more sensitive information is involved, this is going to become a problem.
Imagine you have an agent that gives a secret to another agent to perform a task. At some point you end up with 10 agents reading your secrets. It's just asking for trouble. No one will perform sensitive actions with an agent (think about paying for something on Amazon, for example).
That's a shame, because these sensitive actions are also the core of the agentic space: if the agent can only toy around with a local file system, then what's the point?
So how do we actually give the agent the ability to do things on my behalf in my Gmail, LinkedIn, or Amazon account? (Even my bank account, let's be crazy.)
Agent Builders Benefit
As agent builders, how do we send emails on Gmail, for example? Do we all write a method for that? Then we have to make sure our client knows where to put its API key? And then any time we need a new action (like archiving an email), do we write this method again?
And now imagine you want to do things in an Outlook account. Do you build it there too? It might authenticate differently. You pretty much need to build everything in house, and we're all doing this at the moment.
Design Proposal
OK, so instead of performing the action for the client, let's just tell the user what we want to do. In continuous mode the client will do it automatically, without a human in the loop; in manual mode it will ask for the user's permission to continue.
So in REST (and I know we want to support more web protocols, such as GraphQL and WebSocket), we can literally just copy the OpenAI functions format:
POST agent/tasks/{{task_id}}/steps
BODY
{
"input": "Hey I want you to grow my fitness business. I am located in the U.S.A. My executive assistant's email is [email protected]."
}
RESPONSE
{
"output": "Ok, I will send an email to your assistant to ask her to book a strategic call with https://www.acquisition.com/",
"functions": [
{
"name": "send_email_gmail",
"description": "Send an email through gmail.",
"parameters": {
"type": "object",
"properties": {
"sender_email": {
"type": "string",
"description": "The sender's email"
},
"receiver_email": {
"type": "string",
"description": "The receiver's email"
}
}
}
}
]
}
The client then decides whether to perform this sensitive action. This assumes clients that are able to act. This is an opportunity for us to build an open-source Python or JavaScript client specialized in taking actions.
We can then pretty much standardize actions.
I know we're going to end up with a million actions, but that's better than having 10 million people all working on 10 different types of actions for their agents.
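Here is a minimal sketch of the client side of this proposal. The client keeps a registry of the actions it knows how to perform with its own credentials. It runs whatever the agent requests, either automatically in continuous mode or after asking the user in manual mode. The sketch assumes, hypothetically, that each requested function also carries an arguments object, which the example response above does not show. send_email_gmail here is a stub, not a real Gmail integration.

```python
from typing import Callable

# Hypothetical client-side registry: action name -> implementation.
# Credentials stay on the client; the agent only names the action.
ACTIONS: dict = {}


def action(name: str):
    """Decorator that registers a client-side implementation of an action."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        ACTIONS[name] = fn
        return fn
    return register


@action("send_email_gmail")
def send_email_gmail(sender_email: str, receiver_email: str) -> str:
    # A real client would call Gmail here using the user's own token.
    return f"sent from {sender_email} to {receiver_email}"


def run_requested_actions(response: dict, continuous: bool,
                          confirm: Callable[[str, dict], bool]) -> list:
    """Execute the functions an agent requested in a step response."""
    results = []
    for call in response.get("functions", []):
        name = call["name"]
        # Assumption: the agent also sends argument values; the example
        # response only declares the parameter schema.
        args = call.get("arguments", {})
        if name not in ACTIONS:
            results.append(f"unsupported action: {name}")
        elif not continuous and not confirm(name, args):
            results.append(f"declined: {name}")  # manual mode: the user said no
        else:
            results.append(ACTIONS[name](**args))
    return results
```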
Alternatives Considered
Maybe we can give the secrets to the agent and let it do its thing ? We just let each agent creator create and maintain all these actions ? I think this is pretty hard to do and on top of that, if an agent starts having secrets it could share them to subagents, and now it's a mess.
I'm sharing here information I shared on Discord: it would be valuable to extend the API specification / Swagger to additional use cases that I haven't seen covered (apologies if I am wrong):
- Swagger doesn't seem to support a "workspace" endpoint to list outputs and download them individually
- Swagger doesn't seem to support a "workspace" endpoint to list outputs as a whole (or as a subset)
- Swagger doesn't seem to have user management, which would be beneficial for people working on portable UIs
- Swagger doesn't seem to support more than one agent
- Swagger doesn't seem to offer a way to list agents
- Swagger doesn't seem to offer a way to allow X steps (the equivalent of AutoGPT's "-y N" argument)
- Swagger doesn't seem to offer budget/consumption tracking
- Swagger doesn't seem to offer a list of LLM back-ends
- Swagger doesn't seem to offer LLM back-end selection
- Swagger doesn't seem to offer a list of agent types
- Swagger doesn't seem to offer agent type selection
- Swagger doesn't seem to offer a database back-end definition (which would be beneficial for companies that want to host their own data)
Is your feature request related to a problem? Please describe.
You should not have to wait on a step to complete when you request its execution. What if the connection is broken? What if it takes a long time? Etc.
Describe the solution you'd like
A step could return the status "Running" until it's done.
Looking at the code as it is at the moment, it doesn't seem to be implemented this way.
I would use the Flask after_request decorator rather than awaiting the _step_handler: just change the step's state from created to running, then move on. You could add an option to force waiting.
This way the client is not blocked while the step is being processed.
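The issue suggests Flask's after_request. The sketch below swaps in a plain background thread instead, to show the non-blocking behavior in a self-contained way. The create call returns immediately with status "running", and the client polls get_step. Passing wait=True gives the optional force-wait mode. The class name and the "running"/"completed" status strings are assumptions.

```python
import threading
import uuid
from typing import Callable


class StepRunner:
    """Starts steps in the background and reports their status on request."""

    def __init__(self, handler: Callable[[str], str]) -> None:
        self._handler = handler  # the agent's (possibly slow) step logic
        self._steps: dict = {}
        self._lock = threading.Lock()

    def create_step(self, step_input: str, wait: bool = False) -> dict:
        step_id = str(uuid.uuid4())
        with self._lock:
            self._steps[step_id] = {"step_id": step_id, "input": step_input,
                                    "status": "running", "output": None}
        worker = threading.Thread(target=self._run, args=(step_id,), daemon=True)
        worker.start()
        if wait:  # the optional "force wait" mode
            worker.join()
        return self.get_step(step_id)

    def get_step(self, step_id: str) -> dict:
        with self._lock:
            return dict(self._steps[step_id])  # snapshot copy for the caller

    def _run(self, step_id: str) -> None:
        with self._lock:
            step_input = self._steps[step_id]["input"]
        output = self._handler(step_input)  # slow work runs outside the lock
        with self._lock:
            self._steps[step_id].update(status="completed", output=output)
```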
Is your feature request related to a problem? Please describe.
You can't scale out an Agent because tasks and artifacts are only ever stored in memory.
Describe the solution you'd like
There should be a storage abstraction that lets you replace the in-memory storage of tasks and artifacts with an external store such as Redis. I would also add the ability for the Agent developer to store their own state alongside the tasks and artifacts.
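Here is a minimal sketch of such a storage abstraction, with hypothetical method names. The server only talks to the TaskStore interface, and the in-memory class is the default. A Redis-backed class (not shown) would implement the same four methods so that several agent processes can share state. The agent_state dict is one way to let developers store their own state next to each task.

```python
import uuid
from abc import ABC, abstractmethod
from typing import Any, Optional


class TaskStore(ABC):
    """Storage interface the server talks to; implementations are swappable."""

    @abstractmethod
    def create_task(self, task_input: Optional[str]) -> dict: ...

    @abstractmethod
    def get_task(self, task_id: str) -> Optional[dict]: ...

    @abstractmethod
    def add_artifact(self, task_id: str, file_name: str) -> dict: ...

    @abstractmethod
    def set_agent_state(self, task_id: str, key: str, value: Any) -> None: ...


class InMemoryTaskStore(TaskStore):
    """Default store; only works while everything runs in one process."""

    def __init__(self) -> None:
        self._tasks: dict = {}

    def create_task(self, task_input):
        task = {"task_id": str(uuid.uuid4()), "input": task_input,
                "artifacts": [], "agent_state": {}}
        self._tasks[task["task_id"]] = task
        return task

    def get_task(self, task_id):
        return self._tasks.get(task_id)

    def add_artifact(self, task_id, file_name):
        artifact = {"artifact_id": str(uuid.uuid4()), "file_name": file_name}
        self._tasks[task_id]["artifacts"].append(artifact)
        return artifact

    def set_agent_state(self, task_id, key, value):
        # The agent developer's own state lives next to the task it belongs to.
        self._tasks[task_id]["agent_state"][key] = value
```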
The CRUD operations around artifacts should also be abstracted:
Is your feature request related to a problem? Please describe.
We want to generate as much of the code as possible from the OpenAPI spec. This will help us stay up to date with the protocol, and at the same time it improves development speed and limits possible implementation bugs.
Describe the solution you'd like
It should be in TypeScript
Add a Makefile with a generate target that regenerates the code
The list_agent_task* endpoints on the AgentApi are broken:
list_agent_task_steps(task_id) returns ['steps', 'pagination'], because its _response_types_map declares the response as List[str]
list_agent_tasks_ids() returns ['tasks', 'pagination'], same issue as above
list_agent_task_artifacts(task_id) raises a parsing error, see below:
The TaskArtifactsListResponse spec prescribes a response format like { artifacts: Artifact[] }. The client library assumes it is just Artifact[], which causes a parsing error on the response of compliant endpoints.
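This is caused by [self.__deserialize(sub_data, sub_kls) for sub_data in data] (api_client.py, line 357), where data is an object like { 'artifacts': [...] }. Iterating over the object yields its key, the string 'artifacts', and calling Artifact.parse_obj on that string fails.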
Resulting stack trace:
artifacts = await api_instance.list_agent_task_artifacts(task_id=task_id)
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/api_client.py:268: in __call_api
return_data = self.deserialize(response_data, response_type)
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/api_client.py:341: in deserialize
return self.__deserialize(data, response_type)
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/api_client.py:357: in __deserialize
return [self.__deserialize(sub_data, sub_kls) for sub_data in data]
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/api_client.py:357: in <listcomp>
return [self.__deserialize(sub_data, sub_kls) for sub_data in data]
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/api_client.py:378: in __deserialize
return self.__deserialize_model(data, klass)
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/api_client.py:840: in __deserialize_model
return klass.from_dict(data)
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/models/artifact.py:68: in from_dict
return Artifact.parse_obj(obj)
------------------------------------------------
> ???
E pydantic.error_wrappers.ValidationError: 1 validation error for Artifact
E __root__
E Artifact expected dict not str (type=type_error)
pydantic/main.py:525: ValidationError
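The sketch below shows the deserialization the spec implies: unwrap the envelope first, then parse each element. The Artifact fields used here (artifact_id, file_name, relative_path) are assumed from the spec's Artifact model, and the function name is hypothetical. This is not a patch to the generated client.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Artifact:
    artifact_id: str
    file_name: str
    relative_path: Optional[str] = None


def parse_artifact_list(data: object) -> list:
    """Parse a {"artifacts": [...], ...} response body into Artifact objects."""
    if not isinstance(data, dict) or "artifacts" not in data:
        raise ValueError("expected an object with an 'artifacts' key")
    # Iterate over the list inside the envelope. Iterating over the envelope
    # itself yields its keys ("artifacts", ...), which is what the bug does.
    return [
        Artifact(artifact_id=item["artifact_id"],
                 file_name=item["file_name"],
                 relative_path=item.get("relative_path"))
        for item in data["artifacts"]
    ]
```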
Is your feature request related to a problem? Please describe.
Regenerating the JS client overwrites package.json and the README with incorrect details. The expected changes are as follows:
The version needs to use the full semantic versioning format, e.g. v1.0.0 instead of just v1.
The package.json needs to include the repository; currently it is incomplete.
The README needs to include instructions for setting up the example, including which commands to run.
Steps to reproduce
Generate the client by running npm run generate:client:js in the root folder of the repository.
Check the generated packages/client/js/package.json and packages/client/js/README.md.
Here is the current output of the package.json file.
{
"name": "agent-protocol-client",
"version": "v1.0.0", <----- Version should use semantic
"description": "Typescript Client for the Agent Protocol", <----- Description should be more specific about the agent protocol client for the npm package.
"author": "AI Engineer Foundation", <----- Author should be AI Engineer Foundation
"repository": {
"type": "git",
"url": "https://github.com/AI-Engineer-Foundation/agent-protocol.git" <----- Repository should be the correct one
},
"main": "./dist/index.js",
"typings": "./dist/index.d.ts",
"scripts": {
"build": "tsc",
"prepare": "npm run build"
},
"devDependencies": {
"typescript": "^4.0"
}
}
About the README
The README should ideally include the instructions on setting up the minimal example and using the client to a base level. Reference the current (modified) README file for the added section on setting up the example.
We've received feedback from an advanced user (working on AutoPack and Beebot) that the GraphQL implementation of the protocol might suit the needs of agent developers better:
I needed access to the app instance. With the app object defined at module level, that's very hard to do. I needed to add CORS support as well as the websocket server.
Since I am using database persistence, I would have needed to keep the module-level variables tasks and steps in sync with the database, which is very hard to do.
The handler data structures were hard to work with, and it was easier to plug directly into my own lifecycle functions.
My recommendation is to have a class so that an agent can create a subclass that overrides whatever functionality it needs. The class would hold the FastAPI instance, meaning that agents can pass in their own app if they need to. Figuring out persistence and state is harder, but I think it's a good idea to have a proper data structure for state and make it easier for agent developers to plug in their own state systems.
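Here is a framework-free sketch of the class-based design recommended above. The app is a placeholder object rather than a real FastAPI instance, and the hook names (save_task, load_task, on_step) are hypothetical. An agent subclasses the base class, overrides only what it needs, and can pass in its own pre-configured app.

```python
import uuid
from typing import Any, Optional


class AgentServer:
    """Hypothetical base class: owns the app object and overridable hooks."""

    def __init__(self, app: Any = None) -> None:
        # Agents may pass in their own, already configured app (e.g. with
        # CORS or a websocket server attached); otherwise build a default.
        self.app = app if app is not None else self.create_default_app()
        self._tasks: dict = {}

    def create_default_app(self) -> Any:
        return {"routes": {}}  # placeholder for a real web framework app

    # Persistence hooks: override to use your own database instead of a dict.
    def save_task(self, task: dict) -> None:
        self._tasks[task["task_id"]] = task

    def load_task(self, task_id: str) -> Optional[dict]:
        return self._tasks.get(task_id)

    # Lifecycle hook: override with the agent's own step logic.
    def on_step(self, task: dict, step_input: Optional[str]) -> str:
        raise NotImplementedError

    def create_task(self, task_input: Optional[str]) -> dict:
        task = {"task_id": str(uuid.uuid4()), "input": task_input}
        self.save_task(task)
        return task

    def execute_step(self, task_id: str, step_input: Optional[str] = None) -> str:
        task = self.load_task(task_id)
        if task is None:
            raise KeyError(f"unknown task {task_id}")
        return self.on_step(task, step_input)
```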
GraphQL would allow using subscriptions
The disadvantage is that GraphQL's learning curve is steeper.
We should probably support both REST and GraphQL implementation but we can't do both at the same time for v1.
The primary advantage to be gained from integrating with Langchain is for Agent Protocol exposure/advocacy. Langchain docs and repos get a decent chunk of traffic! There is also greater potential for simple and quick interop between projects that adopt Langchain and/or the Agent Protocol already.
Additionally, the proposed Agent and AgentExecutor could be used as a reference implementation.
Background:
LangChain consists of a Python and Typescript library. There are also some other newer applications in the ecosystem such as LangSmith and LangServe. LangChain is a commonly used library that has built-in agents already.
Proposed:
TL;DR: Implement something like AgentProtocolAgent and AgentProtocolAgentExecutor, which would take an Agent Protocol API-compliant base URL. After there is a PR open for that, we can then evaluate if some other abstractions like Tool, Chain, Retriever, etc., would be worth implementing as well.
We can probably get by with following the guide at https://github.com/langchain-ai/langchainjs/blob/main/.github/contributing/INTEGRATIONS.md
Otherwise, we could contact and work with the LangChain TS and Python maintainers to maximize the chance of getting a set of pull requests merged into the upstream LangChain repos. The pull requests should consider integrating with the Agent Protocol in the following ways:
An Agent that works by consuming Agent Protocol endpoints. This could work by extending or passing in a new parameter to the agent executor initializer functions.
A Tool that works by calling Agent Protocol endpoints.
A Retriever that gets the result of a task or otherwise consumes Agent Protocol endpoints.
Add documentation (there are many Integrations docs already to use as examples!) so that discoverability is improved.
Examples:
TBD
Discussion Points:
Are the proposed abstractions the best to start with and sufficient?
I am happy to try to work on this initiative, but I also have limited time and would be more effective working on the Typescript portion.
Is your feature request related to a problem? Please describe.
The agent protocol spec has been updated, we need to update JS SDK to match the spec.
Describe the solution you'd like
The Python SDK has been refactored quite a lot; you should imitate the Python implementation.
We added artifacts.
Each task has its own workspace where it can save files (the location should be configurable via the AGENT_WORKSPACE environment variable).
There's a download endpoint.
We added a db property to the Agent class. By default it uses in-memory storage, but users should be able to swap in their preferred storage (probably some form of database).
The task/step handler logic has been reworked: the handler functions now receive the Task/Step as input, which allows the agent to run statelessly.
Routes should be extensible: if an agent wants to add another endpoint or change the default implementation, it should be able to.
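A minimal Python sketch of two of these ideas, assuming nothing about the actual SDK API: a per-task workspace directory derived from the AGENT_WORKSPACE environment variable (the `./workspace` fallback is a hypothetical default), and a storage interface with an in-memory default that users can replace. `task_workspace`, `TaskDB`, and `InMemoryTaskDB` are illustrative names, not SDK identifiers.

```python
import os
import uuid
from pathlib import Path
from typing import Dict, Optional, Protocol


def task_workspace(task_id: str) -> Path:
    """Return (and create) the workspace directory for one task."""
    root = Path(os.environ.get("AGENT_WORKSPACE", "./workspace"))  # fallback is an assumption
    path = root / task_id
    path.mkdir(parents=True, exist_ok=True)
    return path


class TaskDB(Protocol):
    """Storage interface an agent could plug its own database into."""
    def create_task(self, task_input: Optional[str]) -> Dict: ...
    def get_task(self, task_id: str) -> Dict: ...


class InMemoryTaskDB:
    """Default storage: tasks live in a dict and are lost on restart."""
    def __init__(self) -> None:
        self._tasks: Dict[str, Dict] = {}

    def create_task(self, task_input: Optional[str]) -> Dict:
        task = {"task_id": str(uuid.uuid4()), "input": task_input, "artifacts": []}
        self._tasks[task["task_id"]] = task
        return task

    def get_task(self, task_id: str) -> Dict:
        return self._tasks[task_id]  # raises KeyError for unknown ids
```

Because the handlers receive the Task/Step rather than reading global state, any class satisfying `TaskDB` (for example, one backed by SQLite) could replace the in-memory default.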
The typing for StepInput and StepOutput doesn't match the current protocol spec, which defines them as nullable strings. They should be typed as string | undefined, not any.
~/agent-protocol$ npx @stoplight/spectral-cli lint schemas/openapi.yml
/home/brandon/agent-protocol/schemas/openapi.yml
2:6 warning info-contact Info object must have "contact" object. info
11:10 warning operation-description Operation "description" must be present and non-empty string. paths./ap/v1/agent/tasks.post
35:11 warning operation-tag-defined Operation tags must be defined in global tags. paths./ap/v1/agent/tasks.post.tags[0]
36:9 warning operation-description Operation "description" must be present and non-empty string. paths./ap/v1/agent/tasks.get
70:11 warning operation-tag-defined Operation tags must be defined in global tags. paths./ap/v1/agent/tasks.get.tags[0]
72:9 warning operation-description Operation "description" must be present and non-empty string. paths./ap/v1/agent/tasks/{task_id}.get
98:11 warning operation-tag-defined Operation tags must be defined in global tags. paths./ap/v1/agent/tasks/{task_id}.get.tags[0]
100:9 warning operation-description Operation "description" must be present and non-empty string. paths./ap/v1/agent/tasks/{task_id}/steps.get
146:11 warning operation-tag-defined Operation tags must be defined in global tags. paths./ap/v1/agent/tasks/{task_id}/steps.get.tags[0]
147:10 warning operation-description Operation "description" must be present and non-empty string. paths./ap/v1/agent/tasks/{task_id}/steps.post
184:11 warning operation-tag-defined Operation tags must be defined in global tags. paths./ap/v1/agent/tasks/{task_id}/steps.post.tags[0]
186:9 warning operation-description Operation "description" must be present and non-empty string. paths./ap/v1/agent/tasks/{task_id}/steps/{step_id}.get
224:11 warning operation-tag-defined Operation tags must be defined in global tags. paths./ap/v1/agent/tasks/{task_id}/steps/{step_id}.get.tags[0]
226:9 warning operation-description Operation "description" must be present and non-empty string. paths./ap/v1/agent/tasks/{task_id}/artifacts.get
272:11 warning operation-tag-defined Operation tags must be defined in global tags. paths./ap/v1/agent/tasks/{task_id}/artifacts.get.tags[0]
273:10 warning operation-description Operation "description" must be present and non-empty string. paths./ap/v1/agent/tasks/{task_id}/artifacts.post
304:11 warning operation-tag-defined Operation tags must be defined in global tags. paths./ap/v1/agent/tasks/{task_id}/artifacts.post.tags[0]
306:9 warning operation-description Operation "description" must be present and non-empty string. paths./ap/v1/agent/tasks/{task_id}/artifacts/{artifact_id}.get
340:11 warning operation-tag-defined Operation tags must be defined in global tags. paths./ap/v1/agent/tasks/{task_id}/artifacts/{artifact_id}.get.tags[0]
406:16 error oas3-valid-schema-example "example" property type must be object components.schemas.TaskInput.example
454:16 error oas3-valid-schema-example "example" property type must be object components.schemas.StepInput.example
461:16 error oas3-valid-schema-example "example" property type must be object,null components.schemas.StepOutput.example
497:19 error oas3-valid-schema-example "0" property type must be object components.schemas.Task.allOf[1].properties.artifacts.example[0]
✖ 23 problems (4 errors, 19 warnings, 0 infos, 0 hints)
The protocol currently supports making requests to an agent service. However, some agents may need to be able to communicate with the user in order to function optimally. For example:
User: please buy me a new set of cutting boards
AI: would you like wooden or plastic cutting boards?
User: I like wood
AI: (searches for wooden cutting boards and places an order on Amazon)
Adding a way for agents to prompt the user would greatly increase the versatility of the protocol imo.
Proposal
Two primary options:
Extension of the protocol with a status awaiting_input, and a way to resolve this status with additional input for an existing task or step
Extension of the task endpoint with a callback (or similar) attribute, through which a client can specify a callback URL that receives prompts for the user until they are resolved.
Example:
Giving the agent a task
POST /agent/tasks
{
"input": "Please find a nice olive wood cutting board on Amazon and order it for me.",
"callback_url": "https://my-service.url/agents/203820/callbacks"
}
The agent wants more info, so it posts a prompt to the callback URL
POST https://my-service.url/agents/203820/callbacks
{
"prompt": "What is your budget for this purchase?"
}
Once the user has answered, the prompt is marked as resolved:
{
"prompt_id": 123,
"status": "resolved",
"answer": "I don't want to spend more than €40 on this purchase",
"created": "2023-08-23T13:49:51.141Z",
"last_updated": "2023-08-23T13:53:12.634Z"
}
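Option 1 can be sketched as a small state machine. This is a hypothetical Python model, not part of the current spec: the `awaiting_input` status comes from the proposal above, while the `running` status, the `pending_prompt` and `answers` fields, and the `ask_user`/`resolve` methods are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    task_id: str
    input: str
    status: str = "running"
    pending_prompt: Optional[str] = None
    answers: List[str] = field(default_factory=list)

    def ask_user(self, prompt: str) -> None:
        """Agent side: pause the task until the user answers."""
        self.status = "awaiting_input"
        self.pending_prompt = prompt

    def resolve(self, answer: str) -> None:
        """Client side: supply the missing input so the task can continue."""
        if self.status != "awaiting_input":
            raise ValueError(f"task {self.task_id} is not awaiting input")
        self.answers.append(answer)
        self.pending_prompt = None
        self.status = "running"
```

With this shape, a client only needs to poll the task, show `pending_prompt` whenever the status is `awaiting_input`, and send the answer back.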
Is your feature request related to a problem? Please describe.
I've run into a case where I need to display different details depending on the agent.
Things such as:
Name
Version
Describe the solution you'd like
I would like a GET endpoint added to the protocol to request information about the agent.
Describe alternatives you've considered
Requiring this information to be queryable outside the Agent Protocol is possible, but roundabout.
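A minimal sketch of what such an endpoint could return, written as a framework-agnostic Python route function. The path /ap/v1/agent/info, the `name`/`version` fields, and their values are assumptions based on this request, not part of the published spec.

```python
import json
from typing import Dict, Tuple

AGENT_INFO: Dict[str, str] = {"name": "example-agent", "version": "0.1.0"}  # hypothetical values


def handle_get(path: str) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) for a GET request."""
    if path == "/ap/v1/agent/info":
        return 200, json.dumps(AGENT_INFO)
    return 404, json.dumps({"message": "Not found"})
```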
Is your feature request related to a problem? Please describe.
As it stands, there is no universal standard in the spec for the liveness/readiness of an Agent running in the context of a system. Until I receive a bad response from a vendor Agent, model, or custom Agent to a task-related request, I don't know there are problems with my Agent (within the context of the spec).
At a high level, I believe an open protocol capably applied across all Agent implementations should treat an Agent like any other service within a stack and consider things like observability, events/metrics, shutdown, etc. This proposal, however, is limited in scope to a binary healthy/unhealthy discovery implementation.
Describe the solution you'd like
Kubernetes has a great solution in the form of two health check probes: liveness and readiness. In that context, the liveness check returns either a 200 or an unhealthy HTTP status such as 400 or 500 to indicate whether the service is alive, and the readiness check does the same but also verifies that all dependencies are available (such as a database connection).
An Agent that depends on a connection to a single model arguably doesn't need a separate readiness check, since it can simply be considered not alive/unhealthy whenever its single LLM is unreachable. However, we could easily see a future where a single orchestrator Agent facilitates Agent interactions between multiple models, and a truly universally applied spec needs to consider such circumstances.
An endpoint /ap/v1/agent/health_check would be a good place to capture health-related inquiries and I'd love to hear more discussion from there about:
Whether liveness and readiness are both required
Whether it needs to be a concern of the spec beyond providing a dedicated endpoint to query health and can be left as an Agent-implementation concern from there
Not being able to tell whether a non-200 response is caused by my Agent failing to start or by a bug in my implementation is a poor developer experience. Agent implementations will therefore solve this on their own and, without a protocol-driven spec, will undoubtedly differ in how they do it.
Describe alternatives you've considered
The alternative is simply not providing a way to query an Agent's health, which is the current state of the Agent Protocol. Agent health then remains the responsibility of each Agent-specific implementation rather than of the protocol, which leads to a lack of consistency and will promote vendor lock-in should Agents evolve into third-party SaaS tooling.
Additional context
It could be worthwhile to extend this discussion to things like Agent deployment versioning (/ap/v1/agent/version) and other deployment state/service context discovery as well, but again, this proposal is limited in scope to a health check.
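One way the proposed /ap/v1/agent/health_check endpoint could separate the two probes, sketched in Python: liveness only reports that the process can answer, while readiness runs one check per dependency (for example, each model connection an orchestrator Agent relies on) and returns 503 if any of them fails. The `health_check` function, the probe names as arguments, the 503/400 codes, and the response shape are assumptions for discussion, not spec.

```python
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]


def health_check(probe: str, dependencies: Dict[str, Check]) -> Tuple[int, dict]:
    """Return (HTTP status, body) for a liveness or readiness probe."""
    if probe == "liveness":
        # If this code runs, the agent process is up and serving requests.
        return 200, {"status": "alive"}
    if probe == "readiness":
        results = {}
        for name, check in dependencies.items():
            try:
                results[name] = bool(check())
            except Exception:
                results[name] = False  # a crashing check counts as unhealthy
        healthy = all(results.values())
        body = {"status": "ready" if healthy else "not_ready", "dependencies": results}
        return (200 if healthy else 503), body
    return 400, {"status": "unknown_probe"}
```

Reporting per-dependency results also addresses the developer-experience concern above: a failing readiness probe says which dependency is down instead of returning a bare non-200.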
Create an Agent SDK in TypeScript that works the same way as the Python SDK (https://github.com/e2b-dev/sdk/tree/main/agent/python): users only have to implement one hook, and the agent is wrapped in a web server that complies with the Agent Communication Protocol.
The internal step handler only passes the input text to the step, which means the handler can't access the additional_input properties that are currently missing (see AI-Engineer-Foundation/agent-protocol-sdk-js#4). The full request body should be passed to the handler.
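The difference can be illustrated with a hypothetical Python dispatcher (`dispatch_step` and `echo_handler` are illustrative names, not the SDK's): instead of extracting `input` before calling the handler, it passes the whole parsed request body, so fields such as `additional_input` stay reachable. The `model` key inside `additional_input` is an assumed example.

```python
from typing import Any, Callable, Dict

StepHandler = Callable[[Dict[str, Any]], Dict[str, Any]]


def dispatch_step(body: Dict[str, Any], handler: StepHandler) -> Dict[str, Any]:
    """Forward the complete request body, not just body["input"]."""
    return handler(dict(body))  # shallow copy so the handler can't rebind the caller's keys


def echo_handler(body: Dict[str, Any]) -> Dict[str, Any]:
    """Example handler that reads both the input and an additional_input field."""
    extra = body.get("additional_input") or {}
    return {"output": body.get("input"), "model": extra.get("model"), "is_last": True}
```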
Is your feature request related to a problem? Please describe.
A path is resolved locally, which can be problematic across platforms. A URI is a universal way to locate something.
If I want to upload an image right now, I'd need to have it on the same computer. With this proposal, I could say that it's in an S3 bucket or on the web, as long as my agent can resolve that.
Those that like the path based system can use the file:// URI to minimize changes.
Describe the solution you'd like
All artifacts that point to paths should instead point to a URI. Those URIs can then be resolved by the agent as needed.
An additional endpoint should exist that lists which types of URIs the system is capable of resolving. Resolving these URIs needs special security consideration, but that is within the scope of the agent itself.
The endpoints should also have a 422 Unprocessable response defined for Artifacts that contain a URI the agent cannot resolve.
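A minimal Python sketch of the resolution step, assuming an agent that can only resolve file:// and https:// URIs. The `SUPPORTED_SCHEMES` set (what the proposed discovery endpoint would list) and the `resolve_artifact_uri` name are hypothetical. Unsupported schemes map to the proposed 422 response; the security checks mentioned above are deliberately left out.

```python
from typing import Tuple
from urllib.parse import unquote, urlparse

SUPPORTED_SCHEMES = {"file", "https"}  # hypothetical capabilities of this agent


def resolve_artifact_uri(uri: str) -> Tuple[int, str]:
    """Return (HTTP status, local path / URL or error message)."""
    parsed = urlparse(uri)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        return 422, f"cannot resolve URI scheme '{parsed.scheme}'"
    if parsed.scheme == "file":
        # file:// keeps today's path-based behaviour working with minimal changes.
        return 200, unquote(parsed.path)
    return 200, uri  # remote resource; a real agent would fetch it here
```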
Current agent interaction mechanisms are primarily task-oriented. They lack the conversational fluidity and persistent memory capabilities desired for more intuitive and less repetitive user engagements. This proposal introduces the "Topic Endpoint" concept to bridge this gap, facilitating more natural, chat-based interactions and enabling memory persistence across different tasks.
Motivation
The primary motivation is to enhance user experience by aligning agent interactions more closely with user expectations. Users anticipate a conversational engagement and a memory retention feature that saves them from providing the same information repetitively. By ensuring a persistent memory and a more natural interaction model, we aim to expedite task execution and enrich user-agent interaction.
Agent Builders Benefit
Multi-task Persistent Memory: Enable agents to remember context and information across multiple tasks, reducing the need for users to repeat information.
Natural Interactions: Allow users to interact with agents in a more conversational manner, without being bound strictly to task-oriented dialogues.
Ease of Development: Simplify development by not requiring developers to override the existing task concept to achieve the desired interaction model.
Design Proposal
This proposal recommends introducing new abstract entities and refining the interactions among existing entities within the system. Here are the key entities from an abstract perspective:
Topic: A persistent long-term concept, encapsulating a set of related tasks and interactions.
Interaction: A chat-like messaging mechanism allowing for back-and-forth communication with the topic, thus facilitating a more natural dialogue.
Task: Represents a task with a specific goal. Tasks are grouped under topics and can access shared memory within the topic.
Step: Denotes a specific step within a task, guiding the task towards its goal.
Artifact: Signifies a persistent resource space owned by an agent, wherein data relevant to a topic is stored and managed.
Interaction Flow
Topic Initiation: A user initiates a topic, setting the stage for a set of related tasks.
Interaction: The user engages in a chat-based interaction within the topic, providing necessary information and context.
Task Execution: Based on the interaction, tasks are created and executed in sequence or parallel, as appropriate.
Step Processing: Each task is broken down into steps, ensuring structured progress towards the goal.
Artifact Management: Artifacts are updated and managed throughout the interaction, retaining essential information for future reference.
sequenceDiagram
title: Interaction Flow among User, Topic, Task, and Step
participant User
participant Topic
participant Task
participant Step
User -> Topic: Initiate Topic
loop Interaction Loop
User -> Topic: Chat-based Interaction
Topic -> Task: Derive Task from Interaction
loop Step Execution Loop
User -> Step: Execute Step
Step -> Task: Update Task State
Step -> Step: Create Artifact
Step -> Step: Update Step State
end
Topic -> User: Provide Task Completion Feedback
end
Detailed Design
The proposed endpoint structure encapsulates tasks within higher-level entities called topics, moving from the existing endpoint /ap/v1/agent/tasks/{task_id} to a more organized format, /ap/v2/topics/{topic_id}/tasks/{task_id}. Additionally, a new endpoint /ap/v2/topics/{topic_id}/interactions/{interaction_id} will be added to handle interactions within those topics in a more chat-like manner. The agent path segment is dropped because it adds no value. This new structure aims to make handling user interactions and task management more intuitive and organized.
Endpoint Specifications
Topics
Create a New Topic:
Endpoint: /ap/v2/topics
POST Method:
OperationId: createTopic
Summary: Create a new topic.
RequestBody:
Content: application/json
Responses:
'201': Successfully created a new topic.
'400': Bad request.
List Topics:
Endpoint: /ap/v2/topics
GET Method:
OperationId: listTopics
Summary: Retrieve a list of all topics.
Responses:
'200': Successfully retrieved list of topics.
Get a Specific Topic:
Endpoint: /ap/v2/topics/{topic_id}
GET Method:
OperationId: getTopic
Summary: Retrieve details of a specified topic.
Parameters:
topic_id
Responses:
'200': Successfully retrieved topic details.
'404': Topic not found.
Interactions
Create a New Interaction:
Endpoint: /ap/v2/topics/{topic_id}/interactions
POST Method:
OperationId: createInteraction
Summary: Create a new interaction within a specified topic.
Parameters:
topic_id
RequestBody:
Content: application/json
Responses:
'201': Successfully created a new interaction.
'400': Bad request.
List Interactions in a Topic:
Endpoint: /ap/v2/topics/{topic_id}/interactions
GET Method:
OperationId: listTopicInteractions
Summary: Retrieve a list of interactions for a specified topic.
Parameters:
topic_id
Responses:
'200': Successfully retrieved list of interactions.
The design of these endpoints follows a RESTful approach, ensuring a clear and organized way to interact with topics, interactions, and tasks within the new v2 structure. Each endpoint provides specific functionalities, enabling clients to manage and retrieve information efficiently.
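To make the hierarchy concrete, here is a minimal in-memory Python model of the topic and interaction operations listed above. It mirrors the operationIds (createTopic, listTopics, getTopic, createInteraction, listTopicInteractions) and their 201/400/404 codes, but the request and response fields (`name`, `message`, `interactions`) are assumptions, since the proposal leaves the bodies unspecified; returning 404 for an unknown topic on the interaction routes is also an added assumption.

```python
import itertools
from typing import Any, Dict, List, Tuple

Response = Tuple[int, Any]


class TopicStore:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._topics: Dict[str, Dict[str, Any]] = {}

    def create_topic(self, body: Dict[str, Any]) -> Response:  # POST /ap/v2/topics
        if not isinstance(body.get("name"), str):
            return 400, {"message": "Bad request."}
        topic_id = str(next(self._ids))
        self._topics[topic_id] = {"topic_id": topic_id, "name": body["name"], "interactions": []}
        return 201, self._topics[topic_id]

    def list_topics(self) -> Response:  # GET /ap/v2/topics
        return 200, list(self._topics.values())

    def get_topic(self, topic_id: str) -> Response:  # GET /ap/v2/topics/{topic_id}
        if topic_id not in self._topics:
            return 404, {"message": "Topic not found."}
        return 200, self._topics[topic_id]

    def create_interaction(self, topic_id: str, body: Dict[str, Any]) -> Response:
        # POST /ap/v2/topics/{topic_id}/interactions
        if topic_id not in self._topics:
            return 404, {"message": "Topic not found."}
        if not isinstance(body.get("message"), str):
            return 400, {"message": "Bad request."}
        interactions: List[Dict[str, Any]] = self._topics[topic_id]["interactions"]
        interaction = {"interaction_id": str(len(interactions) + 1), "message": body["message"]}
        interactions.append(interaction)
        return 201, interaction

    def list_interactions(self, topic_id: str) -> Response:
        # GET /ap/v2/topics/{topic_id}/interactions
        if topic_id not in self._topics:
            return 404, {"message": "Topic not found."}
        return 200, list(self._topics[topic_id]["interactions"])
```

Tasks would hang off each topic record in the same way, which is what gives them access to the topic's shared memory.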
Alternatives Considered
The principal alternative deliberated is retaining the current endpoint structure /ap/v1/agent/tasks/{task_id}/steps without migrating to the hierarchical structure proposed. However, this alternative falls short in addressing the emerging need for a shared memory framework and the ability to handle chat-like interactions which are imperative for spawning tasks in a more intuitive and natural manner. The current flat structure may continue to pose challenges in efficiently managing and retrieving data as the system scales.
Note on Plugin Entities
It's important to emphasize that adding plugin entities to the interface is beyond the scope of this proposal. The focus here is on enhancing the clarity of the existing interface specification and addressing issues related to entity ownership and state changes.
Compatibility
The design proposal is crafted with backward compatibility in mind to ensure a smooth transition for existing systems. Comprehensive documentation and guidelines are needed to support implementation of the updated endpoint structure. Ensuring compatibility with other integral components, such as the SDK and the client SDK, is also essential. Adequate checks need to be put in place to ensure that the transition to the new endpoint structure, /ap/v2/topics/{topic_id}/tasks/{task_id} and /ap/v2/topics/{topic_id}/interactions/{interaction_id}, does not disrupt existing functionality while paving the way for enhanced user interactions and data organization.
If you look at the StepRequestBody interface, it's missing the additional_input property. This should probably be typed as Record<string, unknown>. The same goes for the step response and the task-level request and response objects.
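The same idea expressed in Python typing terms, as a sketch only (the TypeScript fix would use `Record<string, unknown>`): `additional_input` becomes an optional mapping from string keys to arbitrary values. `StepRequestBody` here is an illustrative TypedDict covering just `input` and `additional_input`, and `read_additional` is a hypothetical helper.

```python
from typing import Any, Dict, Optional, TypedDict


class StepRequestBody(TypedDict, total=False):
    input: Optional[str]
    additional_input: Dict[str, Any]  # Python analogue of Record<string, unknown>


def read_additional(body: StepRequestBody, key: str) -> Any:
    """Read one additional_input value; returns None when absent."""
    return body.get("additional_input", {}).get(key)
```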
It would be useful to have examples of various scenarios included in the documentation:
Write Washington to a text file (end of task in one line)
Create a chat app (user begins conversation)
Agent starts with "Hi, would you like to book a flight or accommodation?" (#63)
A GitHub Action triggers a new task and sends the issue title (and body and comments?) as input; the agent does some tasks synchronously and also waits for a user response through Slack
Optional additional_input/config used to provide an auth token, model name, task/step type...
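For the first scenario, a handler could finish the whole task in a single step. The Python sketch below is hypothetical: the handler name and signature, the `output`/`is_last`/`artifacts` response fields, and the `output.txt` file name are assumptions, with the task's workspace directory passed in explicitly.

```python
from pathlib import Path
from typing import Any, Dict


def write_washington_step(workspace: Path) -> Dict[str, Any]:
    """Single-step task: write the file and report that the task is done."""
    workspace.mkdir(parents=True, exist_ok=True)
    target = workspace / "output.txt"
    target.write_text("Washington", encoding="utf-8")
    return {
        "output": f"Wrote 'Washington' to {target.name}",
        "is_last": True,  # the task ends after this one step
        "artifacts": [{"file_name": target.name}],
    }
```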
The OpenAPI Generator is a Java project. openapi-generator-cli will download the appropriate JAR file and invoke the java executable to run the OpenAPI Generator. You must have the java binary executable available on your PATH for this to work.
I'm also wondering if there should be a single script (in the root package.json?) to regenerate all clients and SDKs from the current OpenAPI specs.
You could then have Husky and/or a GitHub Actions workflow validate that everything is up to date.
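A root-level regeneration script could look roughly like the Python sketch below. It checks for the java binary on PATH (the requirement noted above) and builds one `openapi-generator-cli generate -i ... -g ... -o ...` invocation per target via npx. The target list and output directories are hypothetical placeholders, not the repo's actual layout.

```python
import shutil
import subprocess
from typing import List, Tuple

SPEC = "schemas/openapi.yml"
# (generator name, output directory): hypothetical targets for illustration
TARGETS: List[Tuple[str, str]] = [
    ("python", "packages/client/python"),
    ("typescript-fetch", "packages/client/js"),
]


def build_commands(spec: str = SPEC) -> List[List[str]]:
    """One openapi-generator-cli invocation per target."""
    return [
        ["npx", "@openapitools/openapi-generator-cli", "generate", "-i", spec, "-g", gen, "-o", out]
        for gen, out in TARGETS
    ]


def regenerate() -> None:
    """Run every generator; fails fast if java is missing or a command errors."""
    if shutil.which("java") is None:
        raise SystemExit("openapi-generator-cli needs `java` on your PATH")
    for cmd in build_commands():
        subprocess.run(cmd, check=True)
```

A CI job could call `regenerate()` and then run `git diff --exit-code`, failing whenever the committed clients are out of date.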
I have been trying to install this on my system with pip install agent-protocol, but it generates the following output:
(.venv) ayushyaverma@Ayushyas-MacBook-Pro Auto-GPT % pip install agent-protocol
ERROR: Could not find a version that satisfies the requirement agent-protocol (from versions: none)
ERROR: No matching distribution found for agent-protocol
(.venv) ayushyaverma@Ayushyas-MacBook-Pro Auto-GPT % pip3 install agent-protocol
ERROR: Could not find a version that satisfies the requirement agent-protocol (from versions: none)
ERROR: No matching distribution found for agent-protocol
Please help.
Also, it's my first time raising an issue anywhere on GitHub, so sorry if this doesn't fit the standard.
Is your feature request related to a problem? Please describe.
We want to generate as much of the code as possible from the OpenAPI spec. This will help the SDK stay up to date with the protocol, and at the same time it improves development speed and limits possible implementation bugs.
Describe the solution you'd like
It should stay in TypeScript
Generate routes and types; the types are the bare minimum.
Add a Makefile with a generate command that regenerates the code.
Initially, we had discussed creating a plugin system with an Agentfile placed in the root folder of the agent. However, with the addition of the info endpoint, and specifically the idea of config_options within that route, what if plugins were listed within that endpoint instead of in a separate file?
The idea is essentially to define an external resource that we could pull, describing whatever extensions to the protocol exist, including detailed specs and any other relevant info.
This might require writing another spec for agent-protocol-plugin so I'm curious what the general opinion is.