ai-engineer-foundation / agent-protocol

Common interface for interacting with AI agents. The protocol is tech stack agnostic - you can use it with any framework for building agents.

Home Page: https://agentprotocol.ai

License: MIT License

JavaScript 22.72% Shell 1.36% Makefile 0.08% Python 47.94% TypeScript 17.51% Jinja 0.11% MDX 10.12% CSS 0.15%
agents ai api javascript llms openapi protocol python typescript ai-agent

agent-protocol's Introduction

agent protocol

📚 Docs

You can find more info in the docs.

🧾 Summary

The AI agent space is young. Most developers are building agents in their own way. This creates a challenge: it's hard to communicate with different agents because the interface is often different every time. Because we struggle to communicate with different agents, it's also hard to compare them easily. Additionally, a single communication interface would make it easier to develop devtools that work with agents out of the box.

We present the Agent Protocol, a single common interface for communicating with agents. Any agent developer can implement this protocol. The Agent Protocol is an API specification: a list of endpoints that the agent should expose, with predefined response models. The protocol is tech stack agnostic. Any agent can adopt this protocol no matter what framework they're using (or not using).

We believe this will help the ecosystem grow faster and simplify integrations.

We're starting with a minimal core. We want to build upon that iteratively by learning from agent developers about what they actually need.

🚀 The incentives to adopt the protocol

  • Ease with which you can use the benchmarks.
  • Other people can more easily use and integrate your agent
  • Enable building general devtools (for development, deployment and monitoring) that can be built on top of this protocol
  • You don’t need to write boilerplate API and you can focus on developing your agent

🎯 Immediate goals of the protocol

Set a simple, general standard that allows easy-to-use benchmarking of agents. One of the primary goals of the protocol is a great developer experience and a simple implementation on the agent developer's side. You just start your agent and that’s all you have to do.

🗣️ Request for Comments

If you'd like to propose a change or an improvement to the protocol, please follow the RFC template.

⚙️ Components

The protocol is the most important part. It specifies which endpoints the agent should expose. The protocol is defined in the OpenAPI specification.

How does the protocol work?

Right now the protocol is defined as a REST API (via the OpenAPI spec) with two essential routes for interaction with your agent:

  • POST /ap/v1/agent/tasks for creating a new task for the agent (for example giving the agent an objective that you want to accomplish)
  • POST /ap/v1/agent/tasks/{task_id}/steps for executing one step of the defined task

It also has a few additional routes for listing tasks and steps and for downloading and uploading artifacts.
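As a rough illustration of the two essential routes, the following sketch builds (but does not send) the corresponding HTTP requests using only the Python standard library. The base URL and the `"input"` body field are assumptions made for the example; the OpenAPI spec is the authority on the actual request schema.

```python
import json
import urllib.request


def create_task_request(base_url: str, task_input: str) -> urllib.request.Request:
    """Build (without sending) the POST request that creates a new task."""
    # Assumption: the request body carries the objective in an "input" field.
    body = json.dumps({"input": task_input}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ap/v1/agent/tasks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_step_request(
    base_url: str, task_id: str, step_input: str
) -> urllib.request.Request:
    """Build (without sending) the POST request that executes one step."""
    body = json.dumps({"input": step_input}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ap/v1/agent/tasks/{task_id}/steps",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` would send it to a running agent, for example one assumed to listen on `http://localhost:8000`.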

The SDK is our implementation of the protocol. It’s a library that you can use to build your agent. You can use it, or you can implement the protocol on your own. It’s up to you.

Using the SDK should simplify the implementation of the protocol to the bare minimum, but at the same time it shouldn't tie your hands. The goal should be to allow agent builders to build their agents and the SDK should solve the rest.

Basically it wraps your agent in a web server that allows for communication with your agent (and in between agents in the future).

The client library should be used by the users of the agents. Your agent is deployed somewhere and the users of your agent can use this library to interact with your agent.

Thanks to the standard, users can try multiple agents with no (or very minimal) adjustments to their code.

📦 How to use the protocol

If you're an agent developer, you can use the SDK to implement the protocol. You can find more info in the docs or in the SDK folder.

🤗 Adoption

Engaged projects in development of agent protocol

Open-source agents and projects that have adopted Agent Protocol

📃 High-level future roadmap

  • Agent-to-agent communication
  • Connection to the outside world:
    • 3rd party services (= “Agent I/O”)
    • Authentication on behalf of users
  • Protocol Plugins
  • Is there anything missing? Please submit an RFC with a proposed feature!

💬 Public discourse & development

  • PRs and issues are welcome!
  • Join AIEF Discord and their dedicated agent-protocol channel
  • Join Auto-GPT Discord and their dedicated agent-protocol channel
  • Join e2b Discord and their dedicated agent-protocol channel

agent-protocol's People

Contributors

dorgjelli, hackgoofer, jakubno, jzanecook, mlejva, nalbion, pwuts, swiftyos, undertone0809, valentatomas, vignesh14052002, waynehamadi, wilsonianb, youngphlo

agent-protocol's Issues

Add 404 responses to the valid responses where you look up by task id or step id

Is your feature request related to a problem? Please describe.
The spec does not define what happens when you request a task ID that doesn't exist. This is problematic because, when the behavior is not defined, it's indistinguishable from a 500 error, which should be handled differently on the client side.

Describe the solution you'd like
I'd like 404 messages to be added as a documented response code for this case.

Describe alternatives you've considered
We could also leave it undocumented, but going forward you could expect people to use 500 in this case, which shouldn't be used, as no server error actually occurs when this happens.
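To make the client-side difference concrete, here is a minimal sketch of how a client might treat the two status codes differently once 404 is a documented response. The `Outcome` names and the retry policy are illustrative assumptions, not part of the spec.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    NOT_FOUND = "not_found"        # the task or step ID does not exist
    SERVER_ERROR = "server_error"  # something went wrong inside the agent
    OTHER = "other"


def classify(status_code: int) -> Outcome:
    """Map an HTTP status code to a coarse client-side outcome."""
    if 200 <= status_code < 300:
        return Outcome.OK
    if status_code == 404:
        return Outcome.NOT_FOUND
    if 500 <= status_code < 600:
        return Outcome.SERVER_ERROR
    return Outcome.OTHER


def should_retry(outcome: Outcome) -> bool:
    """Assumed policy: retrying may help after a server error, never after a 404."""
    return outcome is Outcome.SERVER_ERROR
```

With an undocumented 404, a client cannot rely on this split and has to treat every failed lookup as a possible server fault.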

RFC: Enhanced Interface Specification for Protocol

Feature name: Interface Specification Enhancement
Author(s): Ce-CDU ([email protected])
Updated: 2023-10-22

Summary

This RFC proposes an enhancement to the current interface specification for task management. While the existing specification defines interfaces, it lacks detailed descriptions of internal state changes and power responsibilities. This RFC aims to provide a more comprehensive description of these aspects to improve clarity and understanding. The objective is to address potential complexities that may arise due to implicit states within systems.

Motivation

The motivation behind this proposal is to solve the issue of ambiguity and insufficient information in the existing interface specification. This lack of clarity can lead to confusion and challenges in system implementations. It is crucial to provide a valuable problem-solving solution that benefits various users, including humans, other agents, and machines.

Agent Builders Benefit

The proposed changes will benefit agent builders by offering a more detailed and well-defined interface specification. This will lead to a clearer understanding of power responsibilities and internal state changes, ultimately resulting in more robust and predictable system behavior.

Design Proposal

The core of this proposal involves adding detailed descriptions of abstract entities and their interactions. The existing system comprises four key entities, viewed from an abstract perspective:

  • Task: Represents a task with a specific goal.
  • Step: Denotes a step within a task.
  • Action: Corresponds to an action associated with each step.
  • Artifact: Signifies a persistent resource space owned by an agent.

The proposal suggests that all interfaces should revolve around these entities, including their creation, retrieval, and listing. Additionally, the protocol should include explicit descriptions of ownership relationships between these entities, addressing the question of who should have ownership of these entities.

To illustrate the proposed changes, let's take the "Task" entity as an example. In the current system, three main participants are involved:

  • User: Interacts with the agent through a client and provides, at a minimum, the task's goal.
  • Client: Facilitates interactions between the user and the server, accepting user input.
  • Server: Provides the interface for task management.

Three primary scenarios are considered:

  1. User interacts with the client to generate a Task, complete with metadata generated by a Language Model (LLM). The Task is then created through the interface.
  2. User interacts with the client and uploads data to the server. The server uses LLM to generate metadata and create the Task.
  3. User interacts with the client, locally constructs metadata, and uploads it to the server through the protocol. The server uses the user's data to create the Task.

While the protocol's content remains consistent across these scenarios, the resulting effects and the power responsibilities of the User, Client, and Server differ, leading to potential confusion. The proposal emphasizes the need for the protocol to clearly define the roles and responsibilities of each entity and the state changes represented by the interface.

It's important to note that, in extreme scenarios, the protocol should be capable of addressing complex systems, taking into account potential future plugin system designs.

Detailed Design

The following design details and situations may need to be addressed. They are provided for reference.

The following figures provide a simplified description of the relationships between User, Client, and Server, omitting some details. Although the figures are incomplete due to technical constraints, they cover the main components.

User, Client, and Server Relationship:

sequenceDiagram
title: user, client and server
participant User
participant Client
participant Server

note left of User: Human, Other Agent or Machine
note right of Server: Remote Server, but not one

User -> Client: input target and create Task
Client -> Server: build metadata and create Task
Server --> User: Task Obj: TaskID
User -> Client: Update Task Metadata
Client -> Server: Update Task Metadata
Server -> Client: Updated Task Object
Client -> User: Display Task Metadata
loop Talk
  User --> Client: Run
  Client --> Server: Talk about Step 1(start: 1)
  Server --> Client: Talk about Step 1(start: 1)
  Client -> User: Return Action and Request Run It.(User Feedback)
end
note over User, Server: Many Steps after..., Get RESULT

User, Client, and Server Interaction for Agent:

The following diagram represents the most complex scenario envisioned in the complete system. In this scenario, the author designed components in both the Client and Server for handling input and output, with metadata serving as the central element. Metadata is essential for storing state and third-party data. It is crucial to emphasize that plugins are not considered part of the protocol and should not be included in the standard interface.

graph TD
  start[User Creates Task]
  start --> clientInputPlugin[Handle User Input Client Plugin]
  clientInputPlugin --> clientMetadataBuilder[Build Metadata LLM]
  clientMetadataBuilder --> clientOutputPlugin[Handle Client Output Client Plugin]
  clientOutputPlugin --> ClientRequest[Create Client Request]
  ClientRequest --> ServerAPI[Server API]
  ServerAPI --> serverMiddleware[Server Middleware User Control, Security Checks, etc.]
  serverMiddleware --> ServerRouter[Route Server API]
  ServerRouter --> serverInputPlugin[Handle Server Input Server Plugin]
  serverInputPlugin --> serverMetadataBuilder[Build Metadata LLM]
  serverMetadataBuilder --> serverOutputPlugin[Handle Server Output Server Plugin]
  serverOutputPlugin --> database[Database Operations]
  database --> ServerResponse[Generate Server Response]

The diagram below represents the internal abstract entity relationship structure of the Agent, with a focus on the interaction with the User. It is inspired by the implementation of Auto-GPT and simplifies some elements.

In the current Agent design, the core process involves the Language Model (LLM) generating an Action within a given context. The system then provides feedback on this Action, resulting in an output. This loop continues until a final result is obtained, which is evaluated by either humans or the LLM.

Within this structure, the key focus is on how interactions with the User occur during task execution. Notably, interactions take place when a Step generates an Action, and the User either grants permission to execute it or provides feedback. It's essential to highlight that introducing additional entities should be avoided, as they could significantly increase the complexity of the protocol.

In this design, there is a clear separation between the resources available to the system's plug-ins and those available to the Agent. System plug-ins are not involved in the LLM decision-making process, and the resources accessible to the Agent influence the LLM's decision-making outcomes.

While the Agent maintains statefulness, the structure itself is stateless. State changes are driven by the entities operated through the interface, and they do not encompass the existing context. Additionally, the design of Actions opens up the possibility of implementing active plug-ins for the Agent, allowing Steps or Actions to be executed in various locations, including User interactions (Agent feedback), the Client (local system), and the Server.

Agent Internal Abstract Entity Relationship Structure:

graph TD
start[Initialize Agent]
UserPrompt[User Prompt for Task Creation]
BuildAgentMetadata[Build Agent Metadata]
TaskResult[Is Task Result Ready?]

CreateInitStep[Create Initial Step Using Client and Server Metadata]
StepGenerateAction[Generate Actions Using LLM]
WaitUserActionInput[Await User Action or Feedback]
ActionEnd[Record Action Result in Database]
End[End of Task]

start --> UserPrompt
UserPrompt --> BuildAgentMetadata
BuildAgentMetadata --> TaskResult

TaskResult -- yes --> End
TaskResult -- no --> CreateInitStep

CreateInitStep --> StepGenerateAction
StepGenerateAction --> WaitUserActionInput
WaitUserActionInput --> ActionEnd
ActionEnd --> TaskResult
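The loop in the flowchart above can be sketched as a small Python function. Every name here (`run_task`, the callables, the `max_steps` guard) is hypothetical and only mirrors the boxes in the diagram; it is not taken from Auto-GPT or from the protocol.

```python
from typing import Callable, Dict, List


def run_task(
    prompt: str,
    generate_action: Callable[[dict, List[dict]], str],
    get_user_feedback: Callable[[str], str],
    is_result_ready: Callable[[List[dict]], bool],
    max_steps: int = 10,
) -> List[Dict[str, str]]:
    """Run steps until the result is ready or the step budget is exhausted."""
    metadata = {"prompt": prompt}      # "Build Agent Metadata"
    history: List[Dict[str, str]] = []
    # "Is Task Result Ready?" is checked before every new step.
    while not is_result_ready(history) and len(history) < max_steps:
        action = generate_action(metadata, history)  # "Generate Actions Using LLM"
        feedback = get_user_feedback(action)         # "Await User Action or Feedback"
        # "Record Action Result in Database" (a plain list stands in for it here)
        history.append({"action": action, "feedback": feedback})
    return history
```

The `max_steps` guard is an addition that the diagram does not show; without it, a result that never becomes ready would loop forever.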

This overall system architecture highlights the need for clear delineation of responsibilities and power relationships among the User, Client, and Server in the protocol design.

Alternatives Considered

The primary alternative considered is maintaining the existing interface specification without making these enhancements. However, this alternative does not address the issue of clarity and may lead to continued confusion in system implementations.

Note on Plugin Entities

It's important to emphasize that adding plugin entities to the interface is beyond the scope of this proposal. The focus here is on enhancing the clarity of the existing interface specification and addressing issues related to entity ownership and state changes.

Compatibility

The design proposal aims to be backward compatible with existing systems. To roll out this feature, it is important to provide clear documentation and guidelines for implementing the updated interface specification. Compatibility checks with other parts of the system, such as SDK and Client SDK, will also be crucial to ensure a seamless transition.

Questions and Discussion Topics

  • How should the protocol be updated to clearly define entity ownership?
  • Are there any potential drawbacks or complexities that need to be addressed in the proposal?
  • How can the proposed changes benefit agent builders and system implementers?

The agent protocol compliance check fails for the minimal reference implementation

The compliance test suggested at https://agentprotocol.ai/compliance fails with the error below for the minimal reference implementation (https://github.com/AI-Engineers-Foundation/agent-protocol-sdk-python/blob/main/examples/minimal.py), and identically for the API implementation in PR #698 (gpt-engineer-org/gpt-engineer#698).

It looks unsurprising that the list check fails, given that the json has both the task_ids and pagination properties.

/home/axel/Software/gpt-engineer/venv/bin/python /home/axel/Software/gpt-engineer/venv/bin/agent-protocol test --url http://127.0.0.1:8000
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.3.1, pluggy-1.3.0 -- /home/axel/Software/gpt-engineer/venv/bin/python
cachedir: .pytest_cache
rootdir: /home/axel/Software
plugins: anyio-3.7.1, asyncio-0.21.1
asyncio: mode=strict
collecting ... collected 7 items

../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_create_agent_task[http:/127.0.0.1:8000] PASSED [ 14%]
../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_list_agent_tasks_ids[http:/127.0.0.1:8000] FAILED [ 28%]
../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_get_agent_task[http:/127.0.0.1:8000] PASSED [ 42%]
../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_list_agent_task_steps[http:/127.0.0.1:8000] FAILED [ 57%]
../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_execute_agent_task_step[http:/127.0.0.1:8000] PASSED [ 71%]
../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_list_artifacts[http:/127.0.0.1:8000] PASSED [ 85%]
../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_get_agent_task_step[http:/127.0.0.1:8000] FAILED [100%]

=================================== FAILURES ===================================
_______ TestCompliance.test_list_agent_tasks_ids[http://127.0.0.1:8000] ________

self = <agent_protocol.utils.compliance.main.TestCompliance object at 0x7f6b24c05e10>
url = 'http://127.0.0.1:8000'

def test_list_agent_tasks_ids(self, url):
    response = requests.get(f"{url}/ap/v1/agent/tasks")
    assert response.status_code == 200
>   assert isinstance(response.json(), list)
E   AssertionError

../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py:20: AssertionError
_______ TestCompliance.test_list_agent_task_steps[http://127.0.0.1:8000] _______

self = <agent_protocol.utils.compliance.main.TestCompliance object at 0x7f6b24c06560>
url = 'http://127.0.0.1:8000'

def test_list_agent_task_steps(self, url):
    # Create task
    response = requests.post(f"{url}/ap/v1/agent/tasks", json=self.task_data)
    task_id = response.json()["task_id"]
    response = requests.get(f"{url}/ap/v1/agent/tasks/{task_id}/steps")
    assert response.status_code == 200
>   assert isinstance(response.json(), list)
E   AssertionError

../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py:36: AssertionError
________ TestCompliance.test_get_agent_task_step[http://127.0.0.1:8000] ________

self = <agent_protocol.utils.compliance.main.TestCompliance object at 0x7f6b24c06b00>
url = 'http://127.0.0.1:8000'

def test_get_agent_task_step(self, url):
    # Create task
    response = requests.post(f"{url}/ap/v1/agent/tasks", json=self.task_data)
    task_id = response.json()["task_id"]
    # Get steps
    response = requests.get(f"{url}/ap/v1/agent/tasks/{task_id}/steps")
>   step_id = response.json()[0]
E   KeyError: 0

../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py:61: KeyError
=========================== short test summary info ============================
FAILED ../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_list_agent_tasks_ids[http:/127.0.0.1:8000]
FAILED ../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_list_agent_task_steps[http:/127.0.0.1:8000]
FAILED ../gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py::TestCompliance::test_get_agent_task_step[http:/127.0.0.1:8000]
========================= 3 failed, 4 passed in 0.14s ==========================
Traceback (most recent call last):
File "/home/axel/Software/gpt-engineer/venv/bin/agent-protocol", line 8, in
sys.exit(cli())
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/cli.py", line 25, in _check_compliance
check_compliance(url, args)
File "/home/axel/Software/gpt-engineer/venv/lib/python3.10/site-packages/agent_protocol/utils/compliance/main.py", line 104, in check_compliance
assert exit_code == 0, "Your Agent API isn't compliant with the agent protocol"
AssertionError: Your Agent API isn't compliant with the agent protocol
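The failures above are consistent with the server returning a JSON object (items plus pagination) where the test expects a bare list: `isinstance(obj, list)` is false for a dict, and indexing a dict with `0` raises `KeyError: 0`. A minimal sketch of a check that tolerates both shapes follows; the helper name and the `"tasks"` / `"steps"` keys are assumptions for illustration, not the names used by the compliance suite.

```python
from typing import Any, List


def extract_items(payload: Any, key: str) -> List[Any]:
    """Return the item list from either a bare list or a paginated envelope."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict) and isinstance(payload.get(key), list):
        return payload[key]
    raise TypeError(f"expected a list or an object with a list under {key!r}")


# A bare list passes the original isinstance check; an envelope does not.
bare = ["task-1", "task-2"]
envelope = {"tasks": ["task-1", "task-2"], "pagination": {"page": 1}}

assert isinstance(bare, list)
assert not isinstance(envelope, list)
assert extract_items(bare, "tasks") == extract_items(envelope, "tasks")
```

Which shape is correct is a question for the spec; the sketch only shows why the two disagree.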

Proposal: Cross-Platform CLI Tool for the Agent Protocol

Objective:

This proposal outlines the development of a cross-platform CLI tool aimed at simplifying interactions with the Agent Protocol. The goal is to improve user experience, streamline debugging, and facilitate automation within the agent protocol ecosystem.

Background:

The Agent Protocol involves managing tasks, steps, and artifacts, all identified by UUIDs (v4). Tasks represent high-level commands for agents, steps denote individual actions taken by agents, and artifacts correspond to files generated or used during agent operations. This proposal aims to create a user-friendly CLI application inspired by the ease of use of Docker's command-line interface.

Proposed CLI Commands:

To provide a clear vision for this tool, we propose the following CLI commands and their expected functionalities:

  1. ap-cli connect [endpoint] - Connects to the specified agent endpoint.

    • Example: $ ap-cli connect http://localhost:8000
  2. ap-cli task list - Lists tasks with their full UUIDs.

    • The response to this command (for tasks, steps, and artifacts alike) could be formatted as JSON (like the original API response, so it can be piped to jq for scripting) or as a formatted table.
  3. ap-cli task create [options] - Creates a new task.

    • Example: $ ap-cli task create
  4. ap-cli task [task_uuid] step create --input="[input]" - Creates a new step within a task.

    • Example: $ ap-cli task 70e62fbb-0123-4567-89ab-cdef01234567 step create --input="Write 'Washington' to a file."
  5. ap-cli task [task_uuid] artifact list - Lists artifacts associated with a task.

    • Example: $ ap-cli task 70e62fbb-0123-4567-89ab-cdef01234567 artifact list

Note: If multi-agent usage becomes a necessity, the agent name from the (not yet implemented) info endpoint could be used for individual calls.
Example:

$ ap-cli connect http://localhost:8000
> Connected to MyAgent

$ ap-cli connect http://localhost:8001
> Connected to NotMyAgent

$ ap-cli MyAgent task list
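To make the proposed command grammar concrete, here is a small hand-rolled parser sketch in Python. It only turns the arguments that follow `ap-cli` into a dictionary; the function name, the returned keys, and the error messages are hypothetical, and no such tool exists yet.

```python
import uuid
from typing import Dict, List


def parse_command(argv: List[str]) -> Dict[str, object]:
    """Parse the arguments that follow `ap-cli` into a plain dict."""
    if not argv:
        raise ValueError("no command given")
    head, rest = argv[0], argv[1:]

    if head == "connect":
        if len(rest) != 1:
            raise ValueError("usage: ap-cli connect [endpoint]")
        return {"command": "connect", "endpoint": rest[0]}

    if head == "task":
        if rest == ["list"]:
            return {"command": "task list"}
        if rest[:1] == ["create"]:
            return {"command": "task create", "options": rest[1:]}
        if len(rest) >= 3:
            # uuid.UUID raises ValueError if the task identifier is malformed.
            task_uuid = str(uuid.UUID(rest[0]))
            if rest[1:3] == ["step", "create"]:
                prefix = "--input="
                inputs = [a[len(prefix):] for a in rest[3:] if a.startswith(prefix)]
                if not inputs:
                    raise ValueError("step create requires --input")
                return {"command": "step create", "task_uuid": task_uuid,
                        "input": inputs[-1]}
            if rest[1:] == ["artifact", "list"]:
                return {"command": "artifact list", "task_uuid": task_uuid}

    raise ValueError(f"unrecognized command: {' '.join(argv)}")
```

A real implementation would more likely build on a CLI library; the hand-rolled version is used here only because the `task [task_uuid] step create` form places a positional argument between subcommands.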

Goals of the CLI Tool:

  • Enhance user experience with an intuitive CLI interface.
  • Simplify debugging and monitoring of agent tasks and steps.
  • Enable automation of agent protocol interactions for CI/CD workflows.
  • Facilitate agent protocol compliance verification and testing.

Discussion Points:

This proposal serves as a technical framework for the CLI tool's development. Your input is essential to refine the design and functionality. Please share your thoughts on the proposed commands, suggest additional features or improvements, and highlight any potential challenges or considerations.

Let's collaborate to create a powerful and user-centric CLI tool that enhances the usability and effectiveness of the Agent Protocol.

Discussion: agent-protocol command

I think this is good DX. Ultimately I don't think it matters if you need to run pip install or npm i (we might want to have brew installation in the future). The point is that you can now just run agent-protocol and it'll work without anything else.

With the npx tool, we could also instruct users to use npx agent-protocol test --url <url> [...] as well as running npm i and then running it. Speaking of brew, I think we could easily get something running on the Debian/Choco/Arch repositories as well if you wanted to head in that direction.


As far as the design of the agent-protocol command, if I'm not mistaken right now we're only looking at a single command test with the argument --url,-u <url> to run the compliance on, and no other settings, correct? (Besides I would assume, a standard --version,-v and --help,-h)

I talked about using commander.js rather than click, but for the testing is there a specific js testing framework you like to use? If I look up "best alternative to pytest for javascript" I did come across Jasmine but I'm not sure if you wanted to go with that or something more used like Mocha or Jest. I went ahead and made a github comparison, and it looks like Jest takes the cake as far as popularity goes.

For the requests in the test I'll likely use the built-in fetch unless you have an axios preference.

Does that cover all the bases? Am I missing anything?

RFC: How do you perform sensitive actions on behalf of the user?

Agent Function Protocol

Feature name Example name
Author(s) Name ([email protected])
RFC PR: None
Updated 2023-08-21

Motivation

People want to send emails with agents. But how do you send emails on behalf of someone?
So far the solution has been to give secrets to the agent. Once more sensitive information is involved, that is going to become a problem.

Imagine you have an agent that gives a secret to another agent to perform a task. At some point you end up with 10 agents reading your secrets. It's just asking for trouble. There is no way anyone will do sensitive actions with an agent (think about paying for something on Amazon, for example).

That's a shame because these sensitive actions are also the core of the agentic space: if the agent can only toy around with a local file system, then what's the point?

So how do we actually give the agent the ability to do things on my behalf in my Gmail account, LinkedIn account, or Amazon account? (Or even my bank account, let's be crazy.)

Agent Builders Benefit

As agent builders, how do we send emails on Gmail, for example? Do we all create a method for that? Then do we have to make sure our client knows where to put its API key? And any time we need a new action (for example, archiving an email), do we write such a method again?

And now imagine you want to do things with an Outlook email account. Do you also implement it there? It might have different ways to authenticate. You pretty much need to build everything in house. And we're all doing this at the moment.

Design Proposal

Ok, so instead of doing the action for the client, let's just tell the user what we want to do. In continuous mode the client will do it automatically without a human in the loop. In manual mode it will ask for the user's permission to continue.

So in REST (and obviously I know we want to support more web protocols, such as GraphQL and WebSocket), we can literally just copy OpenAI functions:

POST agent/tasks/{{task_id}}/steps
BODY
{
  "input": "Hey I want you to grow my fitness business. I am located in the U.S.A. My executive assistant's email is [email protected]."
}
RESPONSE
{
  "output": "Ok, I will send an email to your assistant to ask her to book a strategic call with https://www.acquisition.com/",
  "functions": [
    {
      "name": "send_email_gmail",
      "description": "Send an email through gmail.",
      "parameters": {
        "type": "object",
        "properties": {
          "sender_email": {
            "type": "string",
            "description": "The sender's email"
          },
          "receiver_email": {
            "type": "string",
            "description": "The receiver's email"
          }
        }
      }
    }
  ]
}

And then the client decides whether to perform this sensitive action. This assumes clients that are able to do things. This is an opportunity for us to build a Python or JavaScript client specialized in taking actions, and to make it open source.
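A minimal sketch of the client side of this idea follows: the client walks the `functions` list of a step response and either runs a locally registered handler (continuous mode) or first asks the user (manual mode). The function `handle_functions`, the handler registry, and the result labels are hypothetical. The RFC does not say how argument values are conveyed, so each handler simply receives the function descriptor.

```python
from typing import Callable, Dict, List, Tuple


def handle_functions(
    response: dict,
    handlers: Dict[str, Callable[[dict], None]],
    continuous: bool,
    ask_user: Callable[[dict], bool],
) -> List[Tuple[str, str]]:
    """Decide, per requested function, whether the client performs it."""
    results = []
    for fn in response.get("functions", []):
        name = fn["name"]
        handler = handlers.get(name)
        if handler is None:
            # The client has no local implementation of this action.
            results.append((name, "unsupported"))
            continue
        if not continuous and not ask_user(fn):
            # Manual mode: the user refused permission.
            results.append((name, "declined"))
            continue
        handler(fn)  # secrets stay on the client; the agent never sees them
        results.append((name, "executed"))
    return results
```

The design point is that credentials live only in the client's handlers, so an agent (or its subagents) can request an action without ever holding the secret.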

We can then pretty much standardize actions.

I know we're going to have 1 million actions, but that's better than having 10 million people all working on 10 different types of actions for their agents.

Alternatives Considered

Maybe we can give the secrets to the agent and let it do its thing? We just let each agent creator create and maintain all these actions? I think this is pretty hard to do, and on top of that, if an agent starts having secrets it could share them with subagents, and now it's a mess.

Compatibility

It's actually backwards compatible.

Extend Swagger coverage

Hi,

I'm sharing here information I already shared on Discord. It would be valuable to extend the API specification / Swagger to additional use cases that I haven't seen covered (apologies if I am wrong):

  • Swagger doesn't seem to support a "workspace" endpoint to list productions and download productions individually
  • Swagger doesn't seem to support a "workspace" endpoint to list productions as a whole (or sub whole)
  • Swagger doesn’t seem to have user management, which would be beneficial for people working on a portable UI
  • Swagger doesn’t seem to support more than one agent
  • Swagger doesn't seem to offer a way to list agents
  • Swagger doesn't seem to offer a way to allow X steps (equivalent of the AutoGPT ‘-y N’ argument)
  • Swagger doesn't seem to offer budget/consumption follow-up
  • Swagger doesn't seem to offer an LLM back-end list
  • Swagger doesn't seem to offer LLM back-end selection
  • Swagger doesn't seem to offer an agent type list
  • Swagger doesn't seem to offer agent type selection
  • Swagger doesn't seem to offer a database back-end definition (which would be beneficial for companies wanting to host their own data)

My 2 cents,

Step POST should immediately return

Is your feature request related to a problem? Please describe.

You should not have to wait on a step to complete when you request its execution. What if the connection is broken? What if it takes a long time? Etc.

Describe the solution you'd like

A step can return the status "Running" until it's done.

Looking at the code as it is at the moment, it doesn't seem like it's implemented this way.

I would use the Flask after_request decorator rather than awaiting the _step_handler. Just change the step's state from created to running, then move on. You could add an option to force waiting.

This way the client is not actually locked into the processing of the step.

https://flask.palletsprojects.com/en/2.3.x/api/#flask.Flask.after_request

step = await _step_handler(step)
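
A minimal sketch of the idea, using plain asyncio rather than the Flask hook mentioned above. The names post_step, get_step and example_step_handler, the in-memory STEPS registry, and the "failed" status are all assumptions for illustration and are not taken from the SDK.

```python
import asyncio
import uuid

STEPS = {}        # step_id -> step record (in-memory, for illustration only)
_BACKGROUND = {}  # step_id -> asyncio.Task, kept so the task is not garbage collected


async def example_step_handler(step):
    """Hypothetical stand-in for the agent's real step handler."""
    await asyncio.sleep(0.01)  # simulate slow work
    step["output"] = f"handled: {step['input']}"


async def _run_step(step, handler):
    """Run the handler in the background and record the final status."""
    try:
        await handler(step)
        step["status"] = "completed"
    except Exception as exc:  # "failed" is an assumed status name
        step["status"] = "failed"
        step["output"] = str(exc)


async def post_step(step_input, handler=example_step_handler):
    """Create the step, schedule it, and return without awaiting the handler."""
    step = {"step_id": str(uuid.uuid4()), "input": step_input,
            "status": "created", "output": None}
    STEPS[step["step_id"]] = step
    step["status"] = "running"
    _BACKGROUND[step["step_id"]] = asyncio.create_task(_run_step(step, handler))
    return dict(step)  # snapshot handed back to the client right away


def get_step(step_id):
    """What a later GET on the step would return."""
    return dict(STEPS[step_id])


async def demo():
    snapshot = await post_step("summarize the README")
    # Only the demo waits on the task; a real client would poll get_step.
    await _BACKGROUND[snapshot["step_id"]]
    return snapshot["status"], get_step(snapshot["step_id"])["status"]
```

Running asyncio.run(demo()) shows the two views of the same step: the response to the POST carries the running status, and a later lookup carries the final one.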

JS SDK - Feature: Add a storage abstraction for persisting task objects and artifacts to other storage locations like redis #2

Is your feature request related to a problem? Please describe.

You can't scale out an Agent because tasks and artifacts are only ever stored in memory.

Describe the solution you'd like

There should be a storage abstraction added that lets you replace the in-memory based storage of tasks and artifacts with an external storage container like redis. I would also add the ability for the Agent developer to store their own state alongside the tasks and artifacts.

The CRUD operations around artifacts should also be abstracted:

https://github.com/AI-Engineer-Foundation/agent-protocol-sdk-js/blob/main/src/agent.ts#L355

I may want to store them in some form of blob storage versus the local disk.
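
To make the request concrete, here is a rough sketch of what such an abstraction could look like. It is written in Python only to keep the examples on this page in one language; the issue itself targets the JS SDK. The names AgentStore and InMemoryStore and their method signatures are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class AgentStore(ABC):
    """Hypothetical interface covering tasks, artifacts, and custom agent state."""

    @abstractmethod
    def save_task(self, task_id: str, task: dict) -> None: ...

    @abstractmethod
    def load_task(self, task_id: str) -> Optional[dict]: ...

    @abstractmethod
    def save_artifact(self, task_id: str, name: str, data: bytes) -> None: ...

    @abstractmethod
    def load_artifact(self, task_id: str, name: str) -> Optional[bytes]: ...


class InMemoryStore(AgentStore):
    """Default behaviour: everything lives in process memory."""

    def __init__(self) -> None:
        self._tasks: Dict[str, dict] = {}
        self._artifacts: Dict[Tuple[str, str], bytes] = {}

    def save_task(self, task_id: str, task: dict) -> None:
        self._tasks[task_id] = dict(task)

    def load_task(self, task_id: str) -> Optional[dict]:
        return self._tasks.get(task_id)

    def save_artifact(self, task_id: str, name: str, data: bytes) -> None:
        self._artifacts[(task_id, name)] = data

    def load_artifact(self, task_id: str, name: str) -> Optional[bytes]:
        return self._artifacts.get((task_id, name))


store: AgentStore = InMemoryStore()
# Agent-specific state can ride along inside the task record.
store.save_task("t1", {"input": "write a poem", "state": {"attempt": 1}})
store.save_artifact("t1", "poem.txt", b"roses are red")
```

A Redis-backed or blob-storage-backed class would implement the same four methods, so the agent code that calls store.save_task or store.load_artifact would not need to change.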

Original link: AI-Engineer-Foundation/agent-protocol-sdk-js#2

JS / TS Client library

Is your feature request related to a problem? Please describe.

We want to generate as much of the code as possible from the OpenAPI spec. This will help us stay up to date with the protocol and at the same time improve development speed and limit possible bugs in the implementation.

Describe the solution you'd like

  • It should be in TypeScript
  • Add a Makefile with a generate command, which will regenerate the code

Additional context

I would recommend using one of the OpenAPI codegens.

Feel free to come up with something else.

[bug] `list_agent_task*` endpoints on Python client are broken

Package: agent-protocol-client v1.0.2

The list_agent_task* endpoints on the AgentApi are broken:

  • list_agent_task_steps(task_id) returns ['steps', 'pagination'], because its _response_types_map declares the response as List[str]

  • list_agent_tasks_ids() returns ['tasks', 'pagination'], same issue as above

  • list_agent_task_artifacts(task_id) raises a parsing error, see below:

The TaskArtifactsListResponse spec prescribes a response format like { artifacts: Artifact[] }. The client library assumes it is just Artifact[], which causes a parsing error on the response of compliant endpoints.

The error occurs here:

    if type(klass) == str:
        if klass.startswith("List["):
            sub_kls = re.match(r"List\[(.*)]", klass).group(1)
            return [self.__deserialize(sub_data, sub_kls) for sub_data in data]
and is caused by ... for sub_data in data with data being an object like { 'artifacts': [...] }: it calls Artifact.parse_obj on the string artifacts, which fails.

Resulting stack trace:

    artifacts = await api_instance.list_agent_task_artifacts(task_id=task_id)
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/api_client.py:268: in __call_api
    return_data = self.deserialize(response_data, response_type)
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/api_client.py:341: in deserialize
    return self.__deserialize(data, response_type)
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/api_client.py:357: in __deserialize
    return [self.__deserialize(sub_data, sub_kls) for sub_data in data]
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/api_client.py:357: in <listcomp>
    return [self.__deserialize(sub_data, sub_kls) for sub_data in data]
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/api_client.py:378: in __deserialize
    return self.__deserialize_model(data, klass)
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/api_client.py:840: in __deserialize_model
    return klass.from_dict(data)
/home/user/.local/lib/python3.11/site-packages/agent_protocol_client/models/artifact.py:68: in from_dict
    return Artifact.parse_obj(obj)
------------------------------------------------

>   ???
E   pydantic.error_wrappers.ValidationError: 1 validation error for Artifact
E   __root__
E     Artifact expected dict not str (type=type_error)

pydantic/main.py:525: ValidationError
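
The mechanism can be reproduced without the client library. The sketch below is not the generated client code; naive_deserialize_list only mimics the list comprehension quoted above, and unwrap_then_deserialize is a hypothetical illustration of one possible fix. The sample response values are made up.

```python
from typing import Any, List


def naive_deserialize_list(data: Any) -> List[Any]:
    """Mimics the failing comprehension: iterate over whatever was received."""
    return [sub_data for sub_data in data]


def unwrap_then_deserialize(data: Any, key: str) -> List[Any]:
    """Hypothetical fix: unwrap the response envelope before iterating."""
    if isinstance(data, dict):
        data = data[key]
    return list(data)


# Shape of a spec-compliant response (values are illustrative).
response = {
    "artifacts": [{"artifact_id": "a1", "file_name": "out.txt"}],
    "pagination": {"total_items": 1},
}

# Iterating a dict yields its keys, so every "item" is a plain string.
keys_only = naive_deserialize_list(response)

# Unwrapping first yields the actual artifact objects.
artifacts = unwrap_then_deserialize(response, "artifacts")
```

Here keys_only contains the strings "artifacts" and "pagination", which matches the odd return values reported for the other two endpoints, while artifacts contains the dict that Artifact.parse_obj expects.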

JS Client - OpenAPI Generator generates wrong version format and incomplete details

Is your feature request related to a problem? Please describe.
The JS Client will override the package.json and README with details that are incorrect. The expected changes are as follows:

  • The version needs to be full semantic versioning format, e.g. v1.0.0 instead of just v1
  • The package.json needs to have the repository in there, currently it is incomplete.
  • The README needs to include instructions for setting up the example, including what commands to run.

Steps to reproduce

  • You can generate the client with the OpenAPI tool by running npm run generate:client:js in the root folder of the repository
  • Double check the outputs of packages/client/js/package.json and packages/client/js/README.md

Here is the current output of the package.json file.

{
  "name": "agent-protocol-client",
  "version": "v1",
  "description": "OpenAPI client for agent-protocol-client",
  "author": "OpenAPI-Generator",
  "repository": {
    "type": "git",
    "url": "https://github.com/GIT_USER_ID/GIT_REPO_ID.git"
  },
  "main": "./dist/index.js",
  "typings": "./dist/index.d.ts",
  "scripts": {
    "build": "tsc",
    "prepare": "npm run build"
  },
  "devDependencies": {
    "typescript": "^4.0"
  }
}

Here is the expected output of the package.json

{
  "name": "agent-protocol-client",
  "version": "v1.0.0",     <----- Version should use semantic
  "description": "Typescript Client for the Agent Protocol", <----- Description should be more specific about the agent protocol client for the npm package.
  "author": "AI Engineer Foundation",   <----- Author should be AI Engineer Foundation
  "repository": {
    "type": "git",
    "url": "https://github.com/AI-Engineer-Foundation/agent-protocol.git" <----- Repository should be the correct one
  },
  "main": "./dist/index.js",
  "typings": "./dist/index.d.ts",
  "scripts": {
    "build": "tsc",
    "prepare": "npm run build"
  },
  "devDependencies": {
    "typescript": "^4.0"
  }
}

About the README

The README should ideally include the instructions on setting up the minimal example and using the client to a base level. Reference the current (modified) README file for the added section on setting up the example.

Needs GraphQL implementation

We've received feedback from an advanced user (working on AutoPack and Beebot) that the GraphQL implementation of the protocol might suit the needs of agent developers better:

  • I needed access to the app instance. With the module-level app object it's super hard to do that. I needed to add CORS handling as well as the websocket server.

  • Since I am using database persistence I would've needed to keep the state of the module-level variables tasks and steps in sync with the database, which is super hard to do.

  • The handler data structures were hard to work with and it was easier to just directly plug in to my own lifecycle functions

  • My recommendation is to have a class so that an agent can create a subclass that overrides whatever functionality it needs to. The class would hold the FastAPI instance, meaning that agent developers can pass in their own app if they need to. Figuring out persistence and state is harder, but I think it's a good idea to have a proper data structure for state and make it easier for agent developers to plug in their own state systems.

  • GraphQL would allow using subscriptions

The disadvantage is that GraphQL's learning curve is steeper.

We should probably support both REST and GraphQL implementations, but we can't do both at the same time for v1.
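
The class-based recommendation from the feedback can be sketched roughly as follows. This is an illustration only: SimpleApp is a hypothetical stand-in for a real web app object such as a FastAPI instance, and Agent, DatabaseAgent and save_task are assumed names, not the SDK's API.

```python
class SimpleApp:
    """Hypothetical stand-in for a web app object (e.g. a FastAPI instance)."""

    def __init__(self):
        self.routes = {}
        self.middleware = []

    def add_route(self, method, path, handler):
        self.routes[(method, path)] = handler


class Agent:
    """Base class: owns the app (or accepts one) and exposes overridable hooks."""

    def __init__(self, app=None):
        self.app = app if app is not None else SimpleApp()
        self.tasks = {}
        self.app.add_route("POST", "/ap/v1/agent/tasks", self.create_task)

    def create_task(self, task_input):
        task = {"task_id": str(len(self.tasks) + 1), "input": task_input}
        self.save_task(task)
        return task

    def save_task(self, task):
        """Persistence hook; subclasses can replace or extend it."""
        self.tasks[task["task_id"]] = task


class DatabaseAgent(Agent):
    """Subclass that mirrors every task into its own storage."""

    def __init__(self, app=None, db=None):
        self.db = db if db is not None else []  # a list stands in for a database
        super().__init__(app)

    def save_task(self, task):
        super().save_task(task)
        self.db.append(task)


my_app = SimpleApp()
my_app.middleware.append("cors")  # the developer configures their own app first
agent = DatabaseAgent(app=my_app)
created = my_app.routes[("POST", "/ap/v1/agent/tasks")]("say hello")
```

Because the developer passes in my_app, they keep full access to it for CORS, websockets, or extra routes, and overriding save_task keeps their database in sync without touching module-level state.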

Proposal: Langchain Integration

Objective:

The primary advantage of integrating with LangChain is exposure and advocacy for the Agent Protocol. The LangChain docs and repos get a decent chunk of traffic! There is also greater potential for simple and quick interop between projects that already adopt LangChain and/or the Agent Protocol.

Additionally, the proposed Agent and AgentExecutor could be used as a reference implementation.

Background:

LangChain consists of a Python library and a TypeScript library. There are also some newer applications in the ecosystem such as LangSmith and LangServe. LangChain is a commonly used library that already has built-in agents.

Proposed:

TL;DR: Implement something like AgentProtocolAgent and AgentProtocolAgentExecutor, which would take an Agent Protocol API-compliant base URL. After there is a PR open for that, we can then evaluate if some other abstractions like Tool, Chain, Retriever, etc., would be worth implementing as well.

We can probably get by with following: https://github.com/langchain-ai/langchainjs/blob/main/.github/contributing/INTEGRATIONS.md
Otherwise, we contact and work with the LangChain TS and Python maintainers so that we have a high chance of getting a set of pull requests merged into the upstream LangChain repos. The pull requests should consider integrating with the Agent Protocol in the following ways:

  • An Agent that works by consuming Agent Protocol endpoints. This could work by extending or passing in a new parameter to the agent executor initializer functions.

  • A Tool that works by calling Agent Protocol endpoints.

  • A Retriever that gets the result of a task or otherwise consumes Agent Protocol endpoints.

  • Add documentation (there are many Integrations docs already to use as examples!) so that discoverability is improved.
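
As a rough idea of what the core of such a Tool or Agent wrapper would do, here is a sketch that drives an Agent Protocol server from a base URL. It deliberately avoids any LangChain classes, since their exact integration interfaces are not covered here. The function names run_task and http_transport are hypothetical, and the response fields task_id, output and is_last are assumed to match the published spec.

```python
import json
from typing import Callable, Optional
from urllib import request

# A transport takes (method, url, body) and returns the decoded JSON response.
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(method: str, url: str, body: Optional[dict] = None) -> dict:
    """Plain HTTP transport built on the standard library."""
    data = json.dumps(body).encode() if body is not None else None
    req = request.Request(url, data=data, method=method,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)


def run_task(base_url: str, task_input: str,
             transport: Transport = http_transport, max_steps: int = 10) -> str:
    """Create a task, execute steps until the agent marks one as last."""
    task = transport("POST", f"{base_url}/ap/v1/agent/tasks", {"input": task_input})
    steps_url = f"{base_url}/ap/v1/agent/tasks/{task['task_id']}/steps"
    output = ""
    for _ in range(max_steps):  # guard against agents that never finish
        step = transport("POST", steps_url, {})
        output = step.get("output") or output
        if step.get("is_last"):
            break
    return output
```

A LangChain Tool could then call run_task with the configured base URL and return the string to the chain. Injecting the transport keeps the function easy to test without a running agent.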

Examples:

TBD

Discussion Points:

Are the proposed abstractions the best to start with and sufficient?

I am happy to try to work on this initiative, but I also have limited time and would be more effective working on the Typescript portion.

JS/TS Agent Protocol v0.2.0

Is your feature request related to a problem? Please describe.
The Agent Protocol spec has been updated; we need to update the JS SDK to match it.

Describe the solution you'd like
The Python SDK has been refactored quite a lot; the JS SDK should imitate the Python implementation.

  • We added artifacts.
  • Each task has its own workspace where it can save files (this should be configurable with the environment variable AGENT_WORKSPACE).
  • There's a download endpoint.
  • We added a db property to the Agent class. By default it uses in-memory storage, but users should be able to swap in their preferred storage (probably some form of database).
  • The task / step handler logic has been reworked. The functions now receive a Task / Step as input. This allows the agent to run statelessly.
  • Routes should be extensible: if agent developers want to add another endpoint or change the default implementation, they should be able to.
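
The per-task workspace point could look roughly like this. It is a sketch in Python, not the SDK's actual code; the helper name task_workspace and the default directory name "workspace" are assumptions.

```python
import os
from pathlib import Path


def task_workspace(task_id: str, default_root: str = "workspace") -> Path:
    """Return (and create) the directory where a task may save its files.

    The root is read from the AGENT_WORKSPACE environment variable and falls
    back to an assumed default when the variable is not set.
    """
    root = Path(os.environ.get("AGENT_WORKSPACE", default_root))
    path = root / task_id
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Artifacts written by a step would then go under task_workspace(task.task_id), and the download endpoint would read from the same location.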

JS SDK - Bug: build error

~/agent-protocol/packages/sdk/js$ npm run build

> [email protected] build
> rm -rf ./dist && tsup

CLI Building entry: src/index.ts
CLI Using tsconfig: ../../../tsconfig.json
CLI tsup v7.1.0
CLI Using tsup config: /home/brandon/agent-protocol/packages/sdk/js/tsup.config.js
CLI Target: node16
CJS Build start
CJS dist/index.js     22.84 KB
CJS dist/index.js.map 44.38 KB
CJS ⚡️ Build success in 20ms
DTS Build start
src/agent.ts(350,3): error TS2322: Type 'boolean' is not assignable to type 'Artifact[]'.
src/agent.ts(351,3): error TS18048: 'task.artifacts' is possibly 'undefined'.

OpenAPI schema validation

Afaict the OpenAPI schemas were previously validated via

but this was removed in 949a177

Should swagger-cli and openapi-format be re-added?
I've alternatively used @stoplight/spectral-cli (and prettier) before:
https://github.com/interledger/open-payments/blob/main/.github/workflows/validate-openapi.yaml

~/agent-protocol$ npx @stoplight/spectral-cli lint schemas/openapi.yml

/home/brandon/agent-protocol/schemas/openapi.yml
   2:6   warning  info-contact               Info object must have "contact" object.                        info
  11:10  warning  operation-description      Operation "description" must be present and non-empty string.  paths./ap/v1/agent/tasks.post
  35:11  warning  operation-tag-defined      Operation tags must be defined in global tags.                 paths./ap/v1/agent/tasks.post.tags[0]
  36:9   warning  operation-description      Operation "description" must be present and non-empty string.  paths./ap/v1/agent/tasks.get
  70:11  warning  operation-tag-defined      Operation tags must be defined in global tags.                 paths./ap/v1/agent/tasks.get.tags[0]
  72:9   warning  operation-description      Operation "description" must be present and non-empty string.  paths./ap/v1/agent/tasks/{task_id}.get
  98:11  warning  operation-tag-defined      Operation tags must be defined in global tags.                 paths./ap/v1/agent/tasks/{task_id}.get.tags[0]
  100:9  warning  operation-description      Operation "description" must be present and non-empty string.  paths./ap/v1/agent/tasks/{task_id}/steps.get
 146:11  warning  operation-tag-defined      Operation tags must be defined in global tags.                 paths./ap/v1/agent/tasks/{task_id}/steps.get.tags[0]
 147:10  warning  operation-description      Operation "description" must be present and non-empty string.  paths./ap/v1/agent/tasks/{task_id}/steps.post
 184:11  warning  operation-tag-defined      Operation tags must be defined in global tags.                 paths./ap/v1/agent/tasks/{task_id}/steps.post.tags[0]
  186:9  warning  operation-description      Operation "description" must be present and non-empty string.  paths./ap/v1/agent/tasks/{task_id}/steps/{step_id}.get
 224:11  warning  operation-tag-defined      Operation tags must be defined in global tags.                 paths./ap/v1/agent/tasks/{task_id}/steps/{step_id}.get.tags[0]
  226:9  warning  operation-description      Operation "description" must be present and non-empty string.  paths./ap/v1/agent/tasks/{task_id}/artifacts.get
 272:11  warning  operation-tag-defined      Operation tags must be defined in global tags.                 paths./ap/v1/agent/tasks/{task_id}/artifacts.get.tags[0]
 273:10  warning  operation-description      Operation "description" must be present and non-empty string.  paths./ap/v1/agent/tasks/{task_id}/artifacts.post
 304:11  warning  operation-tag-defined      Operation tags must be defined in global tags.                 paths./ap/v1/agent/tasks/{task_id}/artifacts.post.tags[0]
  306:9  warning  operation-description      Operation "description" must be present and non-empty string.  paths./ap/v1/agent/tasks/{task_id}/artifacts/{artifact_id}.get
 340:11  warning  operation-tag-defined      Operation tags must be defined in global tags.                 paths./ap/v1/agent/tasks/{task_id}/artifacts/{artifact_id}.get.tags[0]
 406:16    error  oas3-valid-schema-example  "example" property type must be object                         components.schemas.TaskInput.example
 454:16    error  oas3-valid-schema-example  "example" property type must be object                         components.schemas.StepInput.example
 461:16    error  oas3-valid-schema-example  "example" property type must be object,null                    components.schemas.StepOutput.example
 497:19    error  oas3-valid-schema-example  "0" property type must be object                               components.schemas.Task.allOf[1].properties.artifacts.example[0]

✖ 23 problems (4 errors, 19 warnings, 0 infos, 0 hints)

RFC: Support for bidirectional communication (prompting the user)

The protocol currently supports making requests to an agent service. However, some agents may need to be able to communicate with the user in order to function optimally. For example:

  1. User: please buy me a new set of cutting boards
  2. AI: would you like wooden or plastic cutting boards?
  3. User: I like wood
  4. AI: searches for wooden cutting boards and places an order on Amazon

Adding a way for agents to prompt the user would greatly increase the versatility of the protocol imo.

Proposal

Two primary options:

  1. Extension of the protocol with a status awaiting_input, and a way to resolve this status with additional input for an existing task or step

  2. Extension of the task endpoint with a callback (or similar) attribute through which a client can specify a callback URL. The agent posts prompts for the user to this URL and polls it until they are resolved.
    Example:

    1. Giving the agent a task
    POST /agent/tasks
    {
      "input": "Please find a nice olive wood cutting board on Amazon and order it for me.",
      "callback_url": "https://my-service.url/agents/203820/callbacks"
    }
    2. The agent wants more info
    POST https://my-service.url/agents/203820/callbacks
    {
      "prompt": "What is your budget for this purchase?"
    }

    Response:

    {
      "prompt_id": 123,
      "status": "pending",
      "created": "2023-08-23T13:49:51.141Z",
      "last_updated": "2023-08-23T13:49:51.141Z"
    }
    3. The agent polls the client until the prompt is resolved
    GET https://my-service.url/agents/203820/callbacks/123

    Responses:

    {
      "prompt_id": 123,
      "status": "pending",
      "created": "2023-08-23T13:49:51.141Z",
      "last_updated": "2023-08-23T13:49:51.141Z"
    }
    {
      "prompt_id": 123,
      "status": "resolved",
      "answer": "I don't want to spend more than €40 on this purchase",
      "created": "2023-08-23T13:49:51.141Z",
      "last_updated": "2023-08-23T13:53:12.634Z"
    }
    {
      "prompt_id": 123,
      "status": "rejected",
      "created": "2023-08-23T13:49:51.141Z",
      "last_updated": "2023-08-23T13:53:12.634Z"
    }
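
For the callback option, the client side needs to track each prompt through the pending, resolved and rejected states shown in the responses above. A minimal in-memory sketch of that bookkeeping follows; the class name PromptStore and its methods are hypothetical and not part of the proposal.

```python
from datetime import datetime, timezone
from itertools import count


def _now() -> str:
    """UTC timestamp in the same style as the example payloads."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return stamp.replace("+00:00", "Z")


class PromptStore:
    """Tracks prompts posted by the agent to the callback URL."""

    def __init__(self):
        self._records = {}
        self._ids = count(1)

    def create(self, prompt: str) -> dict:
        """Handle POST .../callbacks: register a new pending prompt."""
        prompt_id = next(self._ids)
        now = _now()
        self._records[prompt_id] = {
            "prompt_id": prompt_id, "prompt": prompt, "status": "pending",
            "created": now, "last_updated": now,
        }
        return self.get(prompt_id)

    def get(self, prompt_id: int) -> dict:
        """Handle GET .../callbacks/<prompt_id>: what the agent sees when polling."""
        record = dict(self._records[prompt_id])
        record.pop("prompt")  # the example responses do not echo the prompt text
        return record

    def _close(self, prompt_id: int, status: str, answer=None) -> dict:
        record = self._records[prompt_id]
        if record["status"] != "pending":
            raise ValueError(f"prompt {prompt_id} is already {record['status']}")
        record["status"] = status
        if answer is not None:
            record["answer"] = answer
        record["last_updated"] = _now()
        return self.get(prompt_id)

    def resolve(self, prompt_id: int, answer: str) -> dict:
        """The user answered the prompt."""
        return self._close(prompt_id, "resolved", answer)

    def reject(self, prompt_id: int) -> dict:
        """The user declined to answer."""
        return self._close(prompt_id, "rejected")
```

The agent keeps polling get until the status is no longer pending, then either reads answer or treats the rejection as a signal to stop or proceed without the extra input.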

Alternatives

  • Extending the protocol with full chatting capabilities:
    • GET /agent/tasks/<task_id>/chats
      List chats regarding task <task_id>

    • POST /agent/tasks/<task_id>/chats
      Start a new chat regarding task <task_id>

    • POST /agent/tasks/<task_id>/chats/<chat_id>/messages
      Post a new message in an existing chat

    • GET /agent/tasks/<task_id>/chats/<chat_id>/messages
      Get all messages in a chat

    • POST /agent/tasks/<task_id>/chats/<chat_id>/close
      Close/resolve a chat

Add an Endpoint to get Agent Info

Is your feature request related to a problem? Please describe.
I've run into a case where I need to display different agent details as needed based on the agent.

Things such as:

  • Name
  • Version

Describe the solution you'd like
I would like a GET endpoint added to the protocol to request information about the agent.

Describe alternatives you've considered
Requiring this information to be queryable outside the agent protocol is possible, but non-linear.

Proposal: Agent healthcheck endpoint to advance production-readiness of deployed agents

Is your feature request related to a problem? Please describe.
As it stands, there is currently no universal standard in the spec for liveness/readiness of an Agent running in the context of a system. Until I receive a bad response to a task-related request from a vendor Agent, model, or custom Agent, I don't know there are problems with my Agent (within the context of the spec).

At a high level, I believe an open protocol capably applied across all Agent implementations should treat an Agent as any other service within a stack and consider things like integrating observability, events/metrics, shutdown, etc., but this proposal is limited in scope to a binary healthy/unhealthy discovery implementation.

Describe the solution you'd like
Kubernetes actually has a great solution in the form of two health check probes, liveness and readiness. In such context, the liveness healthcheck returns either a 200 or unhealthy HTTP status like 400/500 and indicates that the service is alive, and the readiness healthcheck does the same but ensures all dependencies are also alive (such as connection to a database).

An Agent that depends on connection to a single model is arguably not dependent upon any external resources as it could be considered unalive/not healthy if there is no connection to its single LLM, but I think we could easily see a future where a single orchestrator Agent is facilitating Agent interactions between multiple models, and a truly universally applied spec needs to consider such circumstances.

An endpoint /ap/v1/agent/health_check would be a good place to capture health-related inquiries and I'd love to hear more discussion from there about:

  1. Whether liveness and readiness are both requirements
  2. Whether it needs to be a concern of the spec beyond providing a dedicated endpoint to query health and can be left as an Agent-implementation concern from there

I think an implementation that lacks the ability to decipher if a non-200 response is due to a failure of my Agent to start or a bug in my implementation is a poor developer experience, so Agent-implementations will solve for this on their own and will undoubtedly differ in their implementations without a protocol-driven spec.
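
To illustrate the distinction between the two probes, here is a small sketch of the logic an agent could put behind such an endpoint. The function names, the response bodies, and the use of 503 for "not ready" are assumptions made for this example; the proposal itself does not fix them.

```python
from typing import Callable, Dict, Tuple


def liveness() -> Tuple[int, dict]:
    """The process is up and able to answer at all."""
    return 200, {"status": "alive"}


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """The agent and every dependency it needs are usable.

    `checks` maps a dependency name to a callable returning True when healthy.
    A check that raises is treated as unhealthy.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    ok = all(results.values())
    status_code = 200 if ok else 503  # 503 is an assumed choice of error status
    return status_code, {"status": "ready" if ok else "not_ready",
                         "dependencies": results}


# Example: an orchestrator agent that depends on a database and two models.
code, body = readiness({
    "database": lambda: True,
    "model_a": lambda: True,
    "model_b": lambda: False,  # pretend this model is unreachable
})
```

In this example code is 503 and body["dependencies"] shows which dependency failed, which is the kind of information that distinguishes "my agent did not start" from "one of its models is down".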

Describe alternatives you've considered
The alternative is mostly just not providing a way to query an Agent's healthy state, which is the current status of the Agent Protocol. Agent health is the responsibility of the Agent-specific implementation and not of the protocol, which leads to a lack of consistency and will promote vendor lock-in should Agents evolve to 3rd party SaaS tooling.

Additional context
It could be a worthwhile discussion to extend this conversation to things like Agent deployment versioning (/ap/v1/agent/version) and other deployment state/service context discovery as well, but again, this proposal is limited in scope to solely a health check.

Use URIs for Artifacts rather than Paths

Is your feature request related to a problem? Please describe.
A path is resolved locally and can be problematic when cross platform. A URI is a universal way to find something.

If I want to upload an image right now, I'd need to have it on the same computer. With the proposal I could say that it's in an S3 bucket or on the web, as long as my agent could resolve that.

Those that like the path based system can use the file:// URI to minimize changes.

Describe the solution you'd like
All artifacts that point to paths should instead point to a URI. Those URIs can then be resolved by the agent as needed.

An additional endpoint should exist that shows what types of URIs the system is capable of resolving. Special consideration should be made for security around resolving these URIs but that is the scope of the agent itself.

The endpoints should also have a 422 Unprocessable response defined for Artifacts that contain a URI the agent cannot resolve.
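
A sketch of how an agent might check an artifact URI against the schemes it can resolve and produce the 422 response described above. The SUPPORTED_SCHEMES set, the function name check_artifact_uri and the response bodies are illustrative assumptions.

```python
from typing import Tuple
from urllib.parse import urlparse

# Schemes this particular (hypothetical) agent knows how to resolve.
SUPPORTED_SCHEMES = {"file", "https", "s3"}


def check_artifact_uri(uri: str) -> Tuple[int, dict]:
    """Return (status_code, body) for an artifact URI submitted by a client."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        return 422, {
            "error": f"cannot resolve URI scheme {scheme!r}",
            "supported_schemes": sorted(SUPPORTED_SCHEMES),
        }
    return 200, {"scheme": scheme}


local = check_artifact_uri("file:///home/user/image.png")
remote = check_artifact_uri("s3://my-bucket/image.png")
unsupported = check_artifact_uri("ftp://example.com/image.png")
```

The same SUPPORTED_SCHEMES set could back the additional endpoint that advertises which URI types the agent can resolve. A bare path such as /tmp/image.png has no scheme and would be rejected here, so path-based clients would send the file:// form instead.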

RFC: Topic Endpoint

Feature name: Interface Specification Enhancement
Author(s): swiftyos ([email protected])
Updated: 2023-11-02

Summary

Current agent interaction mechanisms are primarily task-oriented. They lack the conversational fluidity and persistent memory capabilities desired for more intuitive and less repetitive user engagements. This proposal introduces the "Topic Endpoint" concept to bridge this gap, facilitating more natural, chat-based interactions and enabling memory persistence across different tasks.

Motivation

The primary motivation is to enhance user experience by aligning agent interactions more closely with user expectations. Users anticipate a conversational engagement and a memory retention feature that saves them from providing the same information repetitively. By ensuring a persistent memory and a more natural interaction model, we aim to expedite task execution and enrich user-agent interaction.

Agent Builders Benefit

  • Multi-task Persistent Memory: Enable agents to remember context and information across multiple tasks, reducing the need for users to repeat information.
  • Natural Interactions: Allow users to interact with agents in a more conversational manner, without being bound strictly to task-oriented dialogues.
  • Ease of Development: Simplify the development process by not requiring overriding of the existing task concept to achieve the desired interaction model.

Design Proposal

This proposal recommends introducing new abstract entities and refining the interactions among existing entities within the system. Here are the key entities from an abstract perspective:

  • Topic: A persistent long-term concept, encapsulating a set of related tasks and interactions.
  • Interaction: A chat-like messaging mechanism allowing for back-and-forth communication with the topic, thus facilitating a more natural dialogue.
  • Task: Represents a task with a specific goal. Tasks are grouped under topics and can access shared memory within the topic.
  • Step: Denotes a specific step within a task, guiding the task towards its goal.
  • Artifact: Signifies a persistent resource space owned by an agent, wherein data relevant to a topic is stored and managed.

Interaction Flow

  1. Topic Initiation: A user initiates a topic, setting the stage for a set of related tasks.
  2. Interaction: The user engages in a chat-based interaction within the topic, providing necessary information and context.
  3. Task Execution: Based on the interaction, tasks are created and executed in sequence or parallel, as appropriate.
  4. Step Processing: Each task is broken down into steps, ensuring structured progress towards the goal.
  5. Artifact Management: Artifacts are updated and managed throughout the interaction, retaining essential information for future reference.
sequenceDiagram
title: Interaction Flow among User, Topic, Task, and Step
participant User
participant Topic
participant Task
participant Step

User -> Topic: Initiate Topic
loop Interaction Loop
    User -> Topic: Chat-based Interaction
    Topic -> Task: Derive Task from Interaction
        loop Step Execution Loop
            User -> Step: Execute Step
            Step -> Task: Update Task State
            Step -> Step: Create Artifact
            Step -> Step: Update Step State
        end
    Topic -> User: Provide Task Completion Feedback
end

Detailed Design

The proposed transition in endpoint structure encapsulates tasks within higher-level entities known as topics, shifting from the existing endpoint /ap/v1/agent/tasks/{task_id} to the more organized format /ap/v2/topics/{topic_id}/tasks/{task_id}. Additionally, a new endpoint /ap/v2/topics/{topic_id}/interactions/{interaction_id} will be added to handle interactions within those topics in a more chat-like manner. The agent path segment is being dropped as it adds no value. This new structure aims to facilitate a more intuitive and organized approach to handling user interactions and task management.

Endpoint Specifications

Topics

  1. Create a New Topic:

    • Endpoint: /ap/v2/topics
    • POST Method:
      • OperationId: createTopic
      • Summary: Create a new topic.
      • RequestBody:
        • Content: application/json
      • Responses:
        • '201': Successfully created a new topic.
        • '400': Bad request.
  2. List Topics:

    • Endpoint: /ap/v2/topics
    • GET Method:
      • OperationId: listTopics
      • Summary: Retrieve a list of all topics.
      • Responses:
        • '200': Successfully retrieved list of topics.
  3. Get a Specific Topic:

    • Endpoint: /ap/v2/topics/{topic_id}
    • GET Method:
      • OperationId: getTopic
      • Summary: Retrieve details of a specified topic.
      • Parameters:
        • topic_id
      • Responses:
        • '200': Successfully retrieved topic details.
        • '404': Topic not found.

Interactions

  1. Create a New Interaction:

    • Endpoint: /ap/v2/topics/{topic_id}/interactions
    • POST Method:
      • OperationId: createInteraction
      • Summary: Create a new interaction within a specified topic.
      • Parameters:
        • topic_id
      • RequestBody:
        • Content: application/json
      • Responses:
        • '201': Successfully created a new interaction.
        • '400': Bad request.
  2. List Interactions in a Topic:

    • Endpoint: /ap/v2/topics/{topic_id}/interactions
    • GET Method:
      • OperationId: listTopicInteractions
      • Summary: Retrieve a list of interactions for a specified topic.
      • Parameters:
        • topic_id
      • Responses:
        • '200': Successfully retrieved list of interactions.
  3. Get a Specific Interaction in a Topic:

    • Endpoint: /ap/v2/topics/{topic_id}/interactions/{interaction_id}
    • GET Method:
      • OperationId: getTopicInteraction
      • Summary: Retrieve details of a specified interaction within a topic.
      • Parameters:
        • topic_id, interaction_id
      • Responses:
        • '200': Successfully retrieved interaction details.
        • '404': Interaction not found.

The design of these endpoints follows a RESTful approach, ensuring a clear and organized way to interact with topics, interactions, and tasks within the new v2 structure. Each endpoint provides specific functionalities, enabling clients to manage and retrieve information efficiently.
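
The topic endpoints above can be exercised with a small in-memory sketch that returns the listed status codes. The class name TopicService, the name field in the request body, and the helper v2_task_path are assumptions for illustration; the RFC does not define the request schema.

```python
import uuid
from typing import Tuple


class TopicService:
    """In-memory stand-in for the /ap/v2/topics endpoints."""

    def __init__(self):
        self._topics = {}

    def create_topic(self, body) -> Tuple[int, dict]:
        """POST /ap/v2/topics -> 201 on success, 400 on a bad request."""
        if not isinstance(body, dict) or not body.get("name"):
            return 400, {"error": "a JSON object with a 'name' is required"}
        topic = {"topic_id": str(uuid.uuid4()), "name": body["name"]}
        self._topics[topic["topic_id"]] = topic
        return 201, topic

    def list_topics(self) -> Tuple[int, list]:
        """GET /ap/v2/topics -> 200 with all topics."""
        return 200, list(self._topics.values())

    def get_topic(self, topic_id: str) -> Tuple[int, dict]:
        """GET /ap/v2/topics/{topic_id} -> 200, or 404 when unknown."""
        topic = self._topics.get(topic_id)
        if topic is None:
            return 404, {"error": "topic not found"}
        return 200, topic


def v2_task_path(topic_id: str, task_id: str) -> str:
    """Build the proposed v2 path for a task nested under a topic."""
    return f"/ap/v2/topics/{topic_id}/tasks/{task_id}"


service = TopicService()
status, topic = service.create_topic({"name": "kitchen shopping"})
```

The interaction endpoints would follow the same pattern, keyed by topic_id and interaction_id.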

Alternatives Considered

The principal alternative deliberated is retaining the current endpoint structure /ap/v1/agent/tasks/{task_id}/steps without migrating to the hierarchical structure proposed. However, this alternative falls short in addressing the emerging need for a shared memory framework and the ability to handle chat-like interactions which are imperative for spawning tasks in a more intuitive and natural manner. The current flat structure may continue to pose challenges in efficiently managing and retrieving data as the system scales.

Note on Plugin Entities

It's important to emphasize that adding plugin entities to the interface is beyond the scope of this proposal. The focus here is on enhancing the clarity of the existing interface specification and addressing issues related to entity ownership and state changes.

Compatibility

The design proposal is crafted with backward compatibility in mind to ensure a smooth transition for existing systems. It's crucial to create comprehensive documentation and guidelines to aid in the implementation of the updated endpoint structure. Ensuring compatibility with other integral components of the system, such as SDK and Client SDK, is also paramount. Adequate checks and balances need to be put in place to ascertain that the transition to the new endpoint structure /ap/v2/topics/{topic_id}/tasks/{task_id} and /ap/v2/topics/{topic_id}/interactions/{interaction_id} does not disrupt the existing functionalities while paving the way for enhanced user interactions and data organization.

Example scenarios in documentation

It would be useful to have examples of various scenarios included in the documentation:

  • Write Washington to a text file (end of task in one line)
  • Create a chat app (user begins conversation)
  • Agent starts with "Hi, would you like to book a flight or accommodation?" (#63)
  • A GitHub Action triggers a new task and sends the issue title (and body and comments?) as input; the agent does some tasks synchronously and also waits for a user response through Slack
  • Optional additional_input/config used to provide auth token, model name, task/step type...

OpenAPI generator

Should a single tool be used to generate all of the SDK and client libraries from the OpenAPI schema?

It looks like the Python client was generated using @openapitools/openapi-generator-cli.

The OpenAPI Generator is a Java project. openapi-generator-cli will download the appropriate JAR file and invoke the java executable to run the OpenAPI Generator. You must have the java binary executable available on your PATH for this to work.

I'm not crazy about needing Java. Would it be ok to instead require devs to have docker (just for client generation)?
https://www.npmjs.com/package/@openapitools/openapi-generator-cli#use-docker-instead-of-running-java-locally

There's also:

I'm also wondering if there should be a single script (in the root package.json?) to regenerate all clients and SDKs based on the current OpenAPI specs.
You could then have husky and/or a GitHub Actions workflow validate that everything's up to date.
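
A minimal sketch of what such a script could look like, using the Docker image of OpenAPI Generator so that Java is not needed locally. The spec path and output directories below are placeholders I made up, not the repository's actual layout, and the choice of the python and typescript-fetch generators is only an assumption.

```python
from typing import Dict, List

DOCKER_IMAGE = "openapitools/openapi-generator-cli"

# Generator name -> output directory (hypothetical layout, adjust to the repo).
TARGETS: Dict[str, str] = {
    "python": "generated/python-client",
    "typescript-fetch": "generated/js-client",
}


def docker_generate_cmd(repo_root: str, spec: str, generator: str, out_dir: str) -> List[str]:
    """Build the `docker run` command for one generator.

    The repository is mounted at /local inside the container, so the spec
    and output paths are given relative to the repository root.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{repo_root}:/local",
        DOCKER_IMAGE, "generate",
        "-i", f"/local/{spec}",
        "-g", generator,
        "-o", f"/local/{out_dir}",
    ]


def all_commands(repo_root: str, spec: str) -> List[List[str]]:
    """Return one command per configured target."""
    return [docker_generate_cmd(repo_root, spec, gen, out) for gen, out in TARGETS.items()]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(cmd, check=True) to actually regenerate.
    for cmd in all_commands("/path/to/repo", "openapi.yml"):
        print(" ".join(cmd))
```

A CI job could run the same commands and then call git diff --exit-code, which fails if the regenerated files differ from what is committed.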

Not able to install agent-protocol

I have been trying to install this on my system with pip install agent-protocol, but it generates the following output:
`(.venv) ayushyaverma@Ayushyas-MacBook-Pro Auto-GPT % pip install agent-protocol

ERROR: Could not find a version that satisfies the requirement agent-protocol (from versions: none)
ERROR: No matching distribution found for agent-protocol
(.venv) ayushyaverma@Ayushyas-MacBook-Pro Auto-GPT % pip3 install agent-protocol

ERROR: Could not find a version that satisfies the requirement agent-protocol (from versions: none)
ERROR: No matching distribution found for agent-protocol`

Please help.

Also it's my first time raising an issue anywhere on github, sorry if this doesn't fit the standard.

Generate JS SDK from OpenAPI spec

Is your feature request related to a problem? Please describe.

We want to generate as much of the code as possible from the OpenAPI spec. This will help the SDK stay up to date with the protocol, and at the same time it improves development speed and limits possible bugs in the implementation.

Describe the solution you'd like

  • It should stay in TypeScript
  • Generate routes and types; types are the bare minimum.
  • Add a Makefile with a generate command, which will regenerate the code

Additional context

These could be relevant tools:

Feel free to come up with something else.

Plugins Feature Discussion Thread

Initially, we had discussed creating a plugin system with an Agentfile that would be placed at the root folder of the agent. However, with the addition of the info endpoint and specifically the idea of config_options within that route, what if plugins were displayed within that endpoint instead of in a separate file?

The idea here is essentially to point to an external resource that we could pull, which would describe whatever extensions to the protocol exist, including detailed specs and any other relevant info.

This might require writing another spec for agent-protocol-plugin so I'm curious what the general opinion is.
