🛠️ Awesome LMs with Tools

Language models (LMs) are powerful, yet they are mostly limited to text-generation tasks. Tools substantially enhance their performance on tasks that require complex skills.

Based on our recent survey about LM-used tools, "What Are Tools Anyway? A Survey from the Language Model Perspective", we provide a structured list of literature relevant to tool-augmented LMs.

  • Tool basics ($\S2$)
  • Tool use paradigm ($\S3$)
  • Scenarios ($\S4$)
  • Advanced methods ($\S5$)
  • Evaluation ($\S6$)

If you find our paper or code useful, please cite the paper:

@article{wang2022what,
  title={What Are Tools Anyway? A Survey from the Language Model Perspective},
  author={Zhiruo Wang and Zhoujun Cheng and Hao Zhu and Daniel Fried and Graham Neubig},
  journal={arXiv preprint arXiv:2403.15452},
  year={2024}
}

$\S2$ Tool Basics

$\S2.1$ What are tools? 🛠️

  • Definition and discussion of animal-used tools

    Animal tool behavior: the use and manufacture of tools by animals Shumaker, Robert W., Kristina R. Walkup, and Benjamin B. Beck. 2011 [Book]

  • Early discussions on LM-used tools

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Qin, Yujia, et al. 2023.07 [Paper]

  • A survey on augmented LMs, including tool augmentation

    Augmented Language Models: a Survey Mialon, Grégoire, et al. 2023.02 [Paper]

$\S2.3$ Tools and "Agents" 🤖

  • Definition of agents

    Artificial Intelligence: A Modern Approach Russell, Stuart J., and Peter Norvig. 2016 [Book]

  • Survey about agents that perceive and act in the environment

    The Rise and Potential of Large Language Model Based Agents: A Survey Xi, Zhiheng, et al. 2023.09 [Preprint]

  • Survey about the cognitive architectures for language agents

    Cognitive Architectures for Language Agents Sumers, Theodore R., et al. 2023.09 [Paper]

$\S3$ The basic tool use paradigm

  • Early works that set up the commonly used tooling paradigm

    Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

Inference-time prompting

  • Provide in-context examples of tool use for visual programming problems

    Visual Programming: Compositional visual reasoning without training Gupta, Tanmay, and Aniruddha Kembhavi. 2023 [Paper]

  • Tool learning via in-context examples on reasoning problems involving text or multi-modal inputs

    Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models Lu, Pan, et al. 2024 [Paper]

  • Tool use via in-context learning for reasoning problems in BIG-bench and MMLU

    ART: Automatic multi-step reasoning and tool-use for large language models Paranjape, Bhargavi, et al. 2023.03 [Preprint]

  • Providing tool documentation for in-context tool learning

    Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models Hsieh, Cheng-Yu, et al. 2023.08 [Preprint]
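
As a rough illustration of the inference-time paradigm above, the sketch below puts tool documentation into the prompt and then replaces tool calls found in the LM's output with their results. The bracketed call syntax `[Tool(args)]`, the `Calculator` tool, and all function names are assumptions made for this example; they do not reproduce the interface of any paper listed here, and the LM call itself is left out.

```python
import ast
import operator
import re

# Supported arithmetic operators for the hypothetical Calculator tool.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")
    return str(_eval(ast.parse(expression, mode="eval").body))


# Hypothetical tool registry and documentation (illustrative only).
TOOLS = {"Calculator": calculator}
TOOL_DOCS = "Calculator(expression): evaluates +, -, *, / over numbers."

# Assumed call format: [ToolName(arguments)]
CALL_PATTERN = re.compile(r"\[(\w+)\((.*?)\)\]")


def build_prompt(question: str) -> str:
    """Prepend tool documentation so the LM can pick up the call format in context."""
    return f"Available tools:\n{TOOL_DOCS}\n\nQuestion: {question}\nAnswer:"


def execute_tool_calls(lm_output: str) -> str:
    """Replace each [Tool(args)] span with the tool's result; unknown tools are left as-is."""
    def _run(match):
        name, args = match.group(1), match.group(2)
        tool = TOOLS.get(name)
        return tool(args) if tool else match.group(0)
    return CALL_PATTERN.sub(_run, lm_output)
```

For instance, passing the string `"The total is [Calculator(3 * 4)] apples."` to `execute_tool_calls` yields `"The total is 12 apples."`.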

Learning by training

  • Training on human-annotated examples of (NL input, tool-using solution output) pairs

    API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs Li, Minghao, et al. 2023.12 [Paper]

    Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems Kadlčík, Marek, et al. 2023 [Paper]

  • Training on model-synthesized examples

    ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases Tang, Qiaoyu, et al. 2023.06 [Preprint]

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Qin, Yujia, et al. 2023.07 [Paper]

    MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use Huang, Yue, et al. 2023.10 [Paper]

    Making Language Models Better Tool Learners with Execution Feedback Qiao, Shuofei, et al. 2023.05 [Preprint]

    LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error Wang, Boshi, et al. 2024.03 [Preprint]

  • Self-training with bootstrapped examples

    Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

$\S4$ Scenarios

Knowledge access 📚

  • Collect data from structured knowledge sources, e.g., databases, knowledge graphs, etc.

    LaMDA: Language Models for Dialog Applications Thoppilan, Romal, et al. 2022.01 [Paper]

    TALM: Tool Augmented Language Models Parisi, Aaron, Yao Zhao, and Noah Fiedel. 2022.05 [Preprint]

    ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings Hao, Shibo, et al. 2024 [Paper]

    ToolQA: A Dataset for LLM Question Answering with External Tools Zhuang, Yuchen, et al. 2024 [Paper]

    Middleware for LLMs: Tools are Instrumental for Language Agents in Complex Environments Gu, Yu, et al. 2024 [Paper]

    GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information Jin, Qiao, et al. 2024 [Paper]

  • Search for information on the web

    Internet-augmented language models through few-shot prompting for open-domain question answering Lazaridou, Angeliki, et al. 2022.03 [Paper]

    Internet-Augmented Dialogue Generation Komeili, Mojtaba, Kurt Shuster, and Jason Weston. 2022 [Paper]

  • Viewing retrieval models as tools under the retrieval-augmented generation context

    Retrieval-based Language Models and Applications Asai, Akari, et al. 2023 [Tutorial]

    Augmented Language Models: a Survey Mialon, Grégoire, et al. 2023.02 [Paper]

Computation activities 🔣

  • Using a calculator for math calculations

    Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

    Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems Kadlčík, Marek, et al. 2023 [Paper]

  • Using programs or a Python interpreter to perform more complex operations

    PAL: Program-aided Language Models Gao, Luyu, et al. 2023 [Paper]

    Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks Chen, Wenhu, et al. 2022.11 [Paper]

    MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback Wang, Xingyao, et al. 2023.09 [Paper]

    MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning Das, Debrup, et al. 2024 [Paper]

    ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving Gou, Zhibin, et al. 2023.09 [Paper]

  • Tools for more advanced business activities, e.g., financial, medical, education, etc.

    On the Tool Manipulation Capability of Open-source Large Language Models Xu, Qiantong, et al. 2023.05 [Paper]

    ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases Tang, Qiaoyu, et al. 2023.06 [Preprint]

    MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback Wang, Xingyao, et al. 2023.09 [Paper]

    AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning Jin, Qiao, et al. 2024.02 [Paper]

Interaction with the world 🌐

  • Access real-time or real-world information such as weather, location, etc.

    On the Tool Manipulation Capability of Open-source Large Language Models Xu, Qiantong, et al. 2023.05 [Paper]

    ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases Tang, Qiaoyu, et al. 2023.06 [Preprint]

  • Managing personal events such as calendar or emails

    Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

  • Tools in embodied environments, e.g., the Minecraft world

    Voyager: An Open-Ended Embodied Agent with Large Language Models Wang, Guanzhi, et al. 2023.05 [Paper]

  • Tools interacting with the physical world

    ProgPrompt: Generating Situated Robot Task Plans using Large Language Models Singh, Ishika, et al. 2023 [Paper]

    ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks Shridhar, Mohit, et al. 2020 [Paper]

    Autonomous chemical research with large language models Boiko, Daniil A., et al. 2023 [Paper]

Non-textual modalities 🎞️

  • Tools providing access to information in non-textual modalities

    ViperGPT: Visual Inference via Python Execution for Reasoning Surís, Dídac, Sachit Menon, and Carl Vondrick. 2023 [Paper]

    MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action Yang, Zhengyuan, et al. 2023.03 [Preprint]

    AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn Gao, Difei, et al. 2023.06 [Preprint]

  • Tools that can answer questions about data in other modalities

    Visual Programming: Compositional visual reasoning without training Gupta, Tanmay, and Aniruddha Kembhavi. 2023 [Paper]

Special-skilled models 🤗

  • Text-generation models that can perform specific tasks, e.g., question answering, machine translation

    Toolformer: Language Models Can Teach Themselves to Use Tools Schick, Timo, et al. 2024 [Paper]

    ART: Automatic multi-step reasoning and tool-use for large language models Paranjape, Bhargavi, et al. 2023.03 [Preprint]

  • Integration of available models on Huggingface, TorchHub, TensorHub, etc.

    HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face Shen, Yongliang, et al. 2024 [Paper]

    Gorilla: Large Language Model Connected with Massive APIs Patil, Shishir G., et al. 2023.05 [Paper]

    TaskBench: Benchmarking Large Language Models for Task Automation Shen, Yongliang, et al. 2023.11 [Paper]

$\S5$ Advanced methods

$\S5.1$ Complex tool selection and usage 🧐

  • Train retrievers that map natural language instructions to tool documentation

    DocPrompting: Generating Code by Retrieving the Docs Zhou, Shuyan, et al. 2022.07 [Paper]

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Qin, Yujia, et al. 2023.07 [Paper]

  • Ask LMs to write hypothetical tool descriptions and search relevant tools

    CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets Yuan, Lifan, et al. 2023.09 [Paper]

  • Complex tool usage, e.g., parallel calls

    Function Calling and Other API Updates Eleti, Atty, et al. 2023.06 [Blog]

    An LLM Compiler for Parallel Function Calling Kim, Sehoon, et al. 2023.12 [Paper]
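
To make the two ideas in this subsection concrete, here is a small sketch of (1) picking tools by matching an instruction against tool documentation and (2) running independent tool calls in parallel. The token-overlap scorer is a deliberately simple stand-in for a trained retriever, and the tool names, documentation strings, and the `get_weather` stub are hypothetical. It does not reproduce any listed system.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tool documentation; real systems would load this from API docs.
TOOL_DOCS = {
    "get_weather": "Return the current weather forecast for a city.",
    "convert_currency": "Convert an amount of money between two currencies.",
    "search_flights": "Search flights between two airports on a date.",
}


def _tokens(text: str) -> set:
    """Lowercase alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_tools(instruction: str, k: int = 1) -> list:
    """Rank tools by Jaccard overlap between the instruction and each tool's name and doc."""
    query = _tokens(instruction)

    def score(name: str) -> float:
        doc = _tokens(TOOL_DOCS[name]) | _tokens(name.replace("_", " "))
        return len(query & doc) / len(query | doc)

    return sorted(TOOL_DOCS, key=score, reverse=True)[:k]


def get_weather(city: str) -> str:
    """Stub tool used only for illustration; a real tool would call an external API."""
    return f"sunny in {city}"


def run_parallel(calls: list) -> list:
    """Run independent (function, kwargs) calls concurrently, returning results in call order."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(fn, **kwargs) for fn, kwargs in calls]
        return [future.result() for future in futures]
```

Parallel execution only helps when the calls do not depend on each other's outputs; dependent calls still need to run in sequence.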

$\S5.2$ Tools in programmatic contexts 👩‍💻

  • Domain-specific logical forms to query structured data

    Semantic parsing on freebase from question-answer pairs Berant, Jonathan, et al. 2013 [Paper]

    Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task Yu, Tao, et al. 2018.09 [Paper]

    Break It Down: A Question Understanding Benchmark Wolfson, Tomer, et al. 2020 [Paper]

  • Domain-specific actions for agentic tasks such as web navigation

    Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration Liu, Evan Zheran, et al. 2018.02 [Paper]

    WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents Yao, Shunyu, et al. 2022.07 [Paper]

    WebArena: A Realistic Web Environment for Building Autonomous Agents Zhou, Shuyan, et al. 2023.07 [Paper]

  • Using external Python libraries as tools

    ToolCoder: Teach Code Generation Models to use API search tools Zhang, Kechi, et al. 2023.05 [Paper]

  • Using expert-designed functions as tools to answer questions about images

    Visual Programming: Compositional visual reasoning without training Gupta, Tanmay, and Aniruddha Kembhavi. 2023 [Paper]

    ViperGPT: Visual Inference via Python Execution for Reasoning Surís, Dídac, Sachit Menon, and Carl Vondrick. 2023 [Paper]

  • Using GPT as a tool to query external Wikipedia knowledge for table-based question answering

    Binding Language Models in Symbolic Languages Cheng, Zhoujun, et al. 2022.10 [Paper]

  • Incorporate QA API and operation APIs to assist table-based question answering

    API-Assisted Code Generation for Question Answering on Varied Table Structures Cao, Yihan, et al. 2023.12 [Paper]

$\S5.3$ Tool creation and reuse 👩‍🔬

  • Approaches to abstract libraries for domain-specific logical forms from a large corpus

    DreamCoder: growing generalizable, interpretable knowledge with wake--sleep Bayesian program learning Ellis, Kevin, et al. 2020.06 [Paper]

    Leveraging Language to Learn Program Abstractions and Search Heuristics Wong, Catherine, et al. 2021 [Paper]

    Top-Down Synthesis for Library Learning Bowers, Matthew, et al. 2023 [Paper]

    LILO: Learning Interpretable Libraries by Compressing and Documenting Code Grand, Gabriel, et al. 2023.10 [Paper]

  • Make and learn skills (JavaScript programs) in the embodied Minecraft world

    Voyager: An Open-Ended Embodied Agent with Large Language Models Wang, Guanzhi, et al. 2023.05 [Paper]

  • Leverage LMs as tool makers on BIG-bench tasks

    Large Language Models as Tool Makers Cai, Tianle, et al. 2023.05 [Preprint]

  • Create tools for math and table QA tasks by example-wise tool making

    CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation Qian, Cheng, et al. 2023.05 [Paper]

  • Make tools via heuristic-based training and tool deduplication

    CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets Yuan, Lifan, et al. 2023.09 [Paper]

  • Learning tools by refactoring a small number of programs

    ReGAL: Refactoring Programs to Discover Generalizable Abstractions Stengel-Eskin, Elias, Archiki Prasad, and Mohit Bansal. 2024.01 [Preprint]

  • A training-free approach to make tools via execution consistency

    🎁 TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks Wang, Zhiruo, Daniel Fried, and Graham Neubig. 2024.01 [Preprint]
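
As a loose illustration of selecting candidate programs by execution agreement, the sketch below runs several candidates, groups them by output, and keeps the shortest program among the majority. This is a simplified assumption-laden example, not the procedure of any paper listed above: the `solve()` entry point, the shortest-program tie-break, and the function names are all made up here, and `exec` on untrusted code would need sandboxing in practice.

```python
from collections import Counter


def run_candidate(source: str):
    """Execute a candidate program and return solve()'s result, or None on any failure."""
    namespace = {}
    try:
        exec(source, namespace)  # Assumption: candidates are trusted; sandbox otherwise.
        return namespace["solve"]()
    except Exception:
        return None


def select_by_agreement(candidates: list):
    """Return (source, output) for the shortest candidate whose output is the most common."""
    results = [(src, run_candidate(src)) for src in candidates]
    valid = [(src, out) for src, out in results if out is not None]
    if not valid:
        return None, None
    # Compare outputs by repr so unhashable results (e.g. lists) can be counted too.
    top_output, _ = Counter(repr(out) for _, out in valid).most_common(1)[0]
    agreeing = [(src, out) for src, out in valid if repr(out) == top_output]
    return min(agreeing, key=lambda pair: len(pair[0]))
```

Candidates that crash or fail to parse are dropped before voting, so a single malformed program does not affect the selection.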

$\S6$ Evaluation: Testbeds

$\S6.1.1$ Repurposed existing datasets

  • Datasets that require reasoning over texts

    Measuring Mathematical Problem Solving With the MATH Dataset Hendrycks, Dan, et al. 2021.03 [Paper]

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models Srivastava, Aarohi, et al. 2022.06 [Paper]

  • Datasets that require reasoning over structured data, e.g., tables

    Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning Lu, Pan, et al. 2022.09 [Paper]

    Compositional Semantic Parsing on Semi-Structured Tables Pasupat, Panupong, and Percy Liang. 2015 [Paper]

    HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation Cheng, Zhoujun, et al. 2022 [Paper]

  • Datasets that require reasoning over other modalities, e.g., images and image pairs

    GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering Hudson, Drew A., and Christopher D. Manning. 2019.02 [Paper]

    A Corpus for Reasoning about Natural Language Grounded in Photographs Suhr, Alane, et al. 2019 [Paper]

  • Example datasets that require a retriever model (tool) to solve

    Natural Questions: A Benchmark for Question Answering Research Kwiatkowski, Tom, et al. 2019 [Paper]

    TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension Joshi, Mandar, et al. 2017 [Paper]

$\S6.1.2$ Aggregated API benchmarks

  • Collect RapidAPIs and use models to synthesize examples for evaluation

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Qin, Yujia, et al. 2023.07 [Paper]

  • Collect APIs from PublicAPIs and use models to synthesize examples

    ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases Tang, Qiaoyu, et al. 2023.06 [Preprint]

  • Collect APIs from PublicAPIs and manually annotate examples for evaluation

    API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs Li, Minghao, et al. 2023.12 [Paper]

  • Collect APIs from OpenAI plugin list and use models to synthesize examples

    MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use Huang, Yue, et al. 2023.10 [Paper]

  • Collect neural model tools from Huggingface hub, TorchHub, and TensorHub

    Gorilla: Large Language Model Connected with Massive APIs Patil, Shishir G., et al. 2023.05 [Paper]

  • Collect neural model tools from Huggingface

    HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face Shen, Yongliang, et al. 2024 [Paper]

  • Collect tools from Huggingface and PublicAPIs

    TaskBench: Benchmarking Large Language Models for Task Automation Shen, Yongliang, et al. 2023.11 [Paper]

  • Collect action sequences from real-world macOS/iPadOS/iOS

    ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents Shen, Haiyang, et al. 2024.07 [Paper]
