Giter Club home page Giter Club logo

trustagent's Introduction

TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution

key

Abstract

Recent advancements in Large Language Models (LLMs) have shown remarkable capabilities in reasoning, prompting a surge in research aimed at developing trustworthy LLMs. The emergence of LLM-based agents has garnered considerable attention, yet their trustworthiness remains an underexplored area. This aspect is especially critical given the direct interaction of these agents with the physical en- vironment in everyday human activities, placing a premium on their reliability and safety. This paper presents an Agent-Constitution-based framework, TrustAgent, an initial investigation into the improvement of safety dimension of trustworthiness in LLM-based agents. Our findings reveal a concerning deficit in basic safety knowledge and awareness when LLMs function as agents. To address this issue, we propose a framework consisting of threefold strategies: pre-planning enhancement which injects safety knowledge to model prior to plan generation, in-planning enhancement which bolsters safety during plan generation, and post-planning enhancement which ensures safety by post-checking. Through experimental analysis, we demonstrate how these approaches can effectively elevate an LLM agent’s safety by identifying potential challenges. Furthermore, we explore the intricate relationship between an LLM’s general capabilities, such as reasoning, and its efficacy as a safe agent. We argue that a robust reasoning ability is a fundamental prerequisite for an LLM to function safely as an agent. This paper underscores the imperative of integrating safety awareness into the design and deployment of LLM-based agents, not only to enhance their performance but also to ensure their responsible integration into human-centric environments.

Agent Constitution: Four key considerations when designing an Agent Constitution

  1. Scope of Concern

  2. Authorities for Regulation Formation

  3. Format of the Constitution

  4. Implementation

Process Diagram for TrustAgent

process

post

four agents in the framework:

planner: having access to tools and memory, based on which conduct plan and action

safety inspector: having access to safety regulations, based on which regulate the agent's behavior by (1) editing the agent on regulations before planning (2) inform the agent during planning (3) inspect and criticize the agent after planning

simulator: having access to the tool definition and agent's action, simulate the observation

evaluator: having access to the simulator's outcome, evaluate whether the agent fulfills the user instruction and safety requirement in the end

QuickStart

install environment

conda create --name safe python=3.9
conda activate safe

git clone https://github.com/dhh1995/PromptCoder
cd PromptCoder
pip install -e .
cd ..

git clone https://github.com/agiresearch/TrustAgent.git
cd TrustAgent
pip install -r requirements.txt

run experiment

set up API keys

If you want to use OpenAI model as base LLM:

export OPENAI_API_KEY=your_openai_api_key

If you want to use Claude model as base:

export CLAUDE_API_KEY=your_claude_api_key

run command with different parameters

domain: medicine, cook, finance, everyday, chemistry
case_idx: {0,1,2,3,4,5,6,7,8,9}
agent_llm_type: gpt-4-1106-preview, gpt-3.5-turbo-1106, claude-2, claude-instant-1.2, mistralai/Mistral-7B-Instruct-v0.1, Qwen/Qwen-72B-Chat 

command

python main.py --data_dir ../assets/{domain}_data.json --case_idx {case_index} --regulation_check --regulation_prompting --agent_llm_type {agent_llm_type}

Citation

@article{hua2024trustagent,
      title={TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution}, 
      author={Wenyue Hua and Xianjun Yang and Zelong Li and Cheng Wei and Yongfeng Zhang},
      journal={arXiv:2402.01586},
      year={2024}
}

Acknowledgement

This is is built upon ToolEmu and Langchain.

trustagent's People

Contributors

wenyueh avatar agiresearch avatar evison avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.