bolaa's Introduction

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

News

Introduction

This is the repository for the BOLAA paper, in which we build a benchmark for LLM-augmented Autonomous Agents (LAAs). We compare six LAA architectures: five existing designs and one new agent, BOLAA. Each agent is paired with different LLMs to compare performance. BOLAA communicates with and orchestrates multiple specialist agents:

[Figure: BOLAA]

We test on two types of environments: the WebShop navigation environment and the HotPotQA environment. An example of the BOLAA web agent simulation on the WebShop environment:

[Figure: BOLAA-webshop]
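To make the orchestration idea concrete, here is a minimal sketch of a controller that routes each observation to one of two specialist agents. Every name in it (`Controller`, `search_agent`, `click_agent`, the `page_type` argument) is hypothetical and does not come from the BOLAA code base, and the routing rule is an assumption for illustration only. The real agents prompt an LLM instead of returning fixed strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type: a labor agent maps an observation to an action string.
LaborAgent = Callable[[str], str]


def search_agent(observation: str) -> str:
    """Stand-in for a search specialist; a real one would ask an LLM for a query."""
    return "search[" + observation.strip().lower() + "]"


def click_agent(observation: str) -> str:
    """Stand-in for a click specialist; a real one would ask an LLM what to click."""
    return "click[buy now]"


@dataclass
class Controller:
    """Routes each observation to one labor agent (illustrative rule only)."""
    agents: Dict[str, LaborAgent]

    def select(self, page_type: str) -> str:
        # Assumed rule: search pages go to the search agent, all others to click.
        return "search" if page_type == "search" else "click"

    def step(self, page_type: str, observation: str) -> str:
        name = self.select(page_type)
        return self.agents[name](observation)


controller = Controller(agents={"search": search_agent, "click": click_agent})
print(controller.step("search", "Red Running Shoes"))  # search[red running shoes]
print(controller.step("item", "product page"))         # click[buy now]
```

Keeping the selection step separate from the agents means a new specialist can be added to the pool without touching the existing ones.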

Besides the BOLAA architecture, we also implement five standard LAA architectures: Zeroshot (ZS), Zeroshot-Think (ZST), ReAct, PlanAct, and PlanReAct:

[Figure: baseline_agents]

Installation

  1. Set up FastChat to use local open-source LLMs. Skip to the next step if you only test the OpenAI API.
  2. Set the OpenAI API key in both webrun/config and hotpotqa_run/config. Skip this if you only test open-source LLMs.
  3. Set up the WebShop environment if you are testing the web agent.
  4. Set up the agent_benchmarking environment as follows:
conda create -n agent_benchmark python=3.10 -y
conda activate agent_benchmark
pip install -r requirements.txt

Web Agent Simulation

python run_webagent.py --agent_name Search_Click_Control_Webrun_Agent --llm_name gpt-3.5-turbo --max_context_len 4000

Other agent options can be found in test_webagent.sh. The implementation code for the various web agents is in web_run.
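When comparing several agents and LLMs, it can help to generate the commands rather than type them. The sketch below builds the argument list for one run using only the flags shown in the command above. The helper name `build_command` is an assumption, and the lists contain only values that appear in this README; further agent names would have to be copied from test_webagent.sh.

```python
import shlex
from itertools import product
from typing import List


def build_command(script: str, agent_name: str, llm_name: str,
                  max_context_len: int = 4000) -> List[str]:
    """Return the argv list for one simulation run."""
    return [
        "python", script,
        "--agent_name", agent_name,
        "--llm_name", llm_name,
        "--max_context_len", str(max_context_len),
    ]


# Only values taken from this README; extend these lists as needed.
agent_names = ["Search_Click_Control_Webrun_Agent"]
llm_names = ["gpt-3.5-turbo"]

commands = [
    build_command("run_webagent.py", agent, llm)
    for agent, llm in product(agent_names, llm_names)
]
for cmd in commands:
    print(shlex.join(cmd))
    # To launch the run instead of printing it:
    # subprocess.run(cmd, check=True)
```

The same helper works for the HotpotQA runs by passing `run_hotpotqaagent.py` as the script.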

Webshop Reward Table

HotpotQA Agent Simulation

python run_hotpotqaagent.py --agent_name React_HotPotQA_run_Agent --llm_name gpt-3.5-turbo --max_context_len 4000

Other agent options can be found in test_hotpotqa.sh. The implementation code for the various HotpotQA agents is in hotpotqa_run.

Hotpot Reward Table

Citation

If you find our paper or code useful, please cite

@misc{liu2023bolaa,
      title={BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents}, 
      author={Zhiwei Liu and Weiran Yao and Jianguo Zhang and Le Xue and Shelby Heinecke and Rithesh Murthy and Yihao Feng and Zeyuan Chen and Juan Carlos Niebles and Devansh Arpit and Ran Xu and Phil Mui and Huan Wang and Caiming Xiong and Silvio Savarese},
      year={2023},
      eprint={2308.05960},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

Acknowledge

bolaa's People

Contributors

eltociear, jianguoz, jimjag, jimsalesforce

bolaa's Issues

How are the labor agents initialized?

Hi,

Thanks for the inspiring work!
I really enjoyed reading the paper. However, it looks like some details are not clarified in it. In particular, I am wondering how the labor agents (or specialist agents) are initialized. Do you just fine-tune them on training data from the target dataset (e.g., WebShop) with different goals? For example, do you fine-tune one small model for the CLICK action and another for the SEARCH action?

Any suggestions would be greatly appreciated!

Running run_hotpotqaagent.py prints "WARNING! stop is not default parameter.", but the agent does not seem to be working

After following the README, I ran python run_hotpotqaagent.py --agent_name React_HotPotQA_run_Agent --llm_name gpt-3.5-turbo --max_context_len 4000 and got this message: WARNING! stop is not default parameter.
stop was transferred to model_kwargs.
Please confirm that stop is what you intended. In the three generated jsonl files, every correct entry is false, and the API usage shows nothing billed, which suggests the OpenAI API was never called.
The screenshots are below:
Snipaste_2024-03-01_13-57-25
Snipaste_2024-03-01_13-57-41
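
One quick way to confirm the symptom reported here is to count the `correct` entries in a result file. This is a minimal sketch, assuming each line of the jsonl file is a JSON object with a boolean `correct` field; only the field name is taken from this report, and the rest of the file layout and the helper name `count_correct` are assumptions.

```python
import json
from pathlib import Path
from typing import Tuple


def count_correct(jsonl_path: str) -> Tuple[int, int]:
    """Return (records with a truthy `correct` field, total records)."""
    correct = total = 0
    for line in Path(jsonl_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        # A missing `correct` field is counted as not correct (assumption).
        correct += bool(record.get("correct", False))
    return correct, total


# Usage, with a hypothetical file name:
# hits, total = count_correct("results.jsonl")
# print(f"{hits}/{total} correct")
```

A count of zero across all files is consistent with the LLM never returning a usable answer, which matches the report that no API usage was billed.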

Table 2

Hi, thanks for your paper! It's a very valuable review of agent architectures and their backbones! A few things:

  1. I noticed that in Table 2, the highest average recall using vicuna-7b belongs to ReAct, while it's not in bold.

  2. Did you consider the variance in results when determining the winner (i.e., bold and underlined entries)? Or is it just the maximum value?

Many thanks!
Dmitrijs

ThinkAgent

Hi, according to the paper, the BOLAA's labor agents pool consists of two agents, SearchAgent and ClickAgent:

Regarding BOLAA, we devise one search LAA and one click LAA to generate search query and click elements,
respectively.

However, the architecture file also contains the ThinkAgent, which does not seem to be used anywhere. Could you please clarify why you decided to leave it out of the agent pool?

Thank you!
