rci-agent's Introduction

RCI Agent for MiniWoB++

Welcome to the codebase for our paper, "Language Models can Solve Computer Tasks". In this codebase, you will find the implementation of our RCI agent, which uses a pre-trained language model to execute computer tasks in the MiniWoB++ benchmark, guided by natural language. The agent employs a simple RCI prompting scheme that allows it to improve its outputs.

[Website] [Arxiv Paper] [PDF]

Dependencies

The RCI agent is implemented in Python 3.9 and requires the following dependencies:

  • gym
  • openai
  • selenium
  • Pillow
  • regex

Install them with:

pip install -r requirements.txt

Note: MiniWoB++ is not officially supported on Windows. Please refer to this issue.

Usage

Setup

To run the code, you must first install MiniWoB++ and configure your OpenAI API key. MiniWoB++ is integrated with the OpenAI Gym environment. Navigate to the computergym directory and execute the following command to install it:

cd computergym
pip install -e .

Once that's done, write your OpenAI API key in the example_config.json file, then rename the file to config.json.
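
As a quick sanity check on the rename step, a minimal sketch like the following could be used. The helper name load_config is hypothetical and not part of the codebase, and the sketch makes no assumption about which fields the config file contains; it only confirms that config.json exists and parses as JSON.

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Load the JSON config, with a hint if the rename step was skipped.

    Hypothetical helper for illustration; not part of the rci-agent code.
    """
    config_path = Path(path)
    if not config_path.exists():
        raise FileNotFoundError(
            f"{config_path} not found; add your key to example_config.json "
            "and rename it to config.json"
        )
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)
```

Calling load_config() from the repository root should then either return the parsed settings or fail with a message that points back to the setup step.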

Run

To run the code, simply execute the following command:

python main.py --env [TASK NAME] --llm [LLM NAME] --num-episodes [NUM EPISODES] --erci [NUM Explicit RCI] --irci [NUM Implicit RCI] --sgrounding

Here are the arguments you need to specify:

  • --env: Name of the MiniWoB++ task you want to run. You can see the list of available tasks in available_tasks.txt
  • --llm: Name of the language model you want to use. The model name and corresponding API name are specified below:
    • chatgpt: "gpt-3.5-turbo"
    • davinci: "text-davinci-003"
    • ada: "ada"
    • babbage: "babbage"
    • curie: "curie"
    • davinci1: "davinci"
    • davinci2: "text-davinci-002"
  • --num-episodes: Number of episodes to run the task.
  • --erci: Number of explicit RCI loops for the action plan. A value of -1 disables action plan sampling.
  • --irci: Number of implicit RCI loops for agent grounding.
  • --sgrounding: Flag; when passed, the state grounding update is enabled.
  • --headless: Flag; when passed, the MiniWoB++ environment runs in headless mode.
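
To make the argument list concrete, here is a minimal sketch of how such a command line could be parsed with argparse. It is not the actual main.py: the default values and the required settings are assumptions chosen for illustration, and only the flag names and the model-name table come from the list above.

```python
import argparse

# Short names accepted by --llm and the API model names listed above.
LLM_API_NAMES = {
    "chatgpt": "gpt-3.5-turbo",
    "davinci": "text-davinci-003",
    "ada": "ada",
    "babbage": "babbage",
    "curie": "curie",
    "davinci1": "davinci",
    "davinci2": "text-davinci-002",
}


def build_parser():
    """Build an illustrative parser; defaults here are assumptions."""
    parser = argparse.ArgumentParser(description="Sketch of the RCI agent CLI")
    parser.add_argument("--env", required=True, help="MiniWoB++ task name")
    parser.add_argument("--llm", required=True, choices=sorted(LLM_API_NAMES))
    parser.add_argument("--num-episodes", type=int, default=1)
    parser.add_argument("--erci", type=int, default=0)
    parser.add_argument("--irci", type=int, default=1)
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--sgrounding", action="store_true")
    parser.add_argument("--headless", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--env", "choose-list", "--llm", "chatgpt",
     "--num-episodes", "1", "--irci", "1", "--sgrounding"]
)
print(LLM_API_NAMES[args.llm], args.sgrounding, args.headless)
# prints: gpt-3.5-turbo True False
```

Modeling --sgrounding and --headless as store_true switches matches how they are passed in the example commands, where neither takes a value.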

Consider running the following command to verify if everything is functioning correctly:

python main.py --env choose-list --llm chatgpt --num-episodes 1 --irci 1 --sgrounding

Evaluation

Our agent achieves the second-highest score among all tested models. It outperforms every baseline except CC-Net (SL + RL), which uses dictionary-based typing actions.

What sets our RCI agent apart is that it accomplished this feat using 120 times fewer samples than WebN-T5-3B and 11,000 times fewer samples than CC-Net. Obtaining expert demonstrations and defining reward functions for computer tasks can be a daunting challenge, but our research highlights the potential of using LLMs to overcome these obstacles and achieve success in general computer tasks.

Check out our paper!

Our paper is available on Arxiv. If you use this code in your research, we kindly ask that you cite our paper.

@article{kim2023language,
      title={Language Models can Solve Computer Tasks}, 
      author={Geunwoo Kim and Pierre Baldi and Stephen McAleer},
      journal={arXiv preprint arXiv:2303.17491},
      year={2023},
}

rci-agent's Issues

How to reproduce the results in Table 1&2?

Hi authors,

Thanks for your excellent work! I'm wondering if you could provide more details on how to produce the results in Table 1 and 2, for example:

  • Which LM did you use? (Section 3.1 said "InstructGPT-3 + RLHF", but which specific checkpoint?)
  • What are the hyperparameters?
  • What's the full prompt?
  • Would it be possible to provide the dataset split/model predictions/relevant code in order to reproduce the results?

Thanks in advance!

Prompt/Codebase for reasoning tasks

Hi, first of all thank you so much for the amazing work and paper.

I was actually interested in evaluating RCI across different reasoning tasks and was wondering about the prompt used in the paper for reasoning. Is it the prompt text mentioned in figure 2 or was there any additional info in the instructions? Also any chance on releasing the RCI codebase for the reasoning tasks?

Thank you for your time.

MINIWOB_BASE_URL environment variable not defined

Hello,

After installing the different required packages, I tried to run an experiment on the choose-list environment using the command line you provided :

python main.py --env choose-list --llm chatgpt --num-episodes 1 --irci 1 --sgrounding

And I got the following error :

(RCI-agent) PS C:\Users\Tom\Desktop\rci-agent> python main.py --env choose-list --llm chatgpt --num-episodes 1 --irci 1 --sgrounding
INFO:root:Starting WebDriver Instance 0
C:\Users\Tom\miniconda3\envs\RCI-agent\lib\site-packages\gym\utils\passive_env_checker.py:20: UserWarning: WARN: It seems a Box observation space is an image but the `dtype` is not `np.uint8`, actual type: int32. If the Box observation space is not an image, we recommend flattening the observation to have only a 1D vector.
  logger.warn(
C:\Users\Tom\miniconda3\envs\RCI-agent\lib\site-packages\gym\utils\passive_env_checker.py:174: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.
  logger.warn(
C:\Users\Tom\miniconda3\envs\RCI-agent\lib\site-packages\gym\utils\passive_env_checker.py:187: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.
  logger.warn(
INFO:selenium.webdriver.common.selenium_manager:Applicable driver not found; attempting to install with Selenium Manager (Beta)

DevTools listening on ws://127.0.0.1:51802/devtools/browser/6802c38e-d0ec-42dd-b55b-0574baaefc72
ERROR:root:Page did not load properly. Wrong MINIWOB_BASE_URL?
INFO:root:Closed instance 0
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\Tom\miniconda3\envs\RCI-agent\lib\threading.py", line 980, in _bootstrap_inner
    self.run()
  File "c:\users\tom\desktop\rci-agent\computergym\computergym\miniwob\miniwob_interface\instance.py", line 128, in run
    self.create_driver()
  File "c:\users\tom\desktop\rci-agent\computergym\computergym\miniwob\miniwob_interface\instance.py", line 200, in create_driver
    raise e
        (No symbol) [0x0034A304]
        (No symbol) [0x0035C482]
        (No symbol) [0x0034A0B6]
        (No symbol) [0x00327E08]
        (No symbol) [0x00328F2D]
        GetHandleVerifier [0x006C8E3A+2540266]
        GetHandleVerifier [0x00708959+2801161]
        GetHandleVerifier [0x0070295C+2776588]
        GetHandleVerifier [0x004F2280+612144]
        (No symbol) [0x00404F6C]
        (No symbol) [0x004011D8]
        (No symbol) [0x004012BB]
        (No symbol) [0x003F4857]
        BaseThreadInitThunk [0x76C97D59+25]
        RtlInitializeExceptionChain [0x77C9B74B+107]
        RtlClearBits [0x77C9B6CF+191]

I tried to run the file environment.py and got the same issue. The reason is that the environment variable MINIWOB_BASE_URL is not defined.

debug console :

import os 
base_url=os.environ.get("MINIWOB_BASE_URL")
print(base_url)
None

Am I supposed to define this environment variable myself?

PS : I'm running on Windows 11, Python 3.9.16, and I use a conda env.
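
One possible workaround, sketched here and not confirmed by the maintainers: in upstream MiniWoB++, MINIWOB_BASE_URL is read as the base location of the task HTML files, so the variable can be set in the process before the environment is created. The helper name set_miniwob_base_url and the directory passed to it are hypothetical; where the HTML files live in this repository is an assumption to be checked locally.

```python
import os
from pathlib import Path


def set_miniwob_base_url(html_dir):
    """Point MINIWOB_BASE_URL at a local directory of MiniWoB++ HTML files.

    Hypothetical helper; html_dir must be chosen by the user.
    """
    # resolve() makes the path absolute, which as_uri() requires;
    # on Windows this yields a file:///C:/... style URL.
    url = Path(html_dir).resolve().as_uri()
    if not url.endswith("/"):
        url += "/"
    os.environ["MINIWOB_BASE_URL"] = url
    return url
```

Calling this at the top of the script, before any environment is constructed, should make os.environ.get("MINIWOB_BASE_URL") return a file:// URL instead of None.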

Bug Report: Some task can't be addressed when Headless parameter is enabled

Bug Description:
I encountered a strange bug while running two nearly identical benchmarks for the "enter-time" task (it may affect many other tasks as well). The only difference between the two runs is the value of the "headless" parameter. In the first case, I set it to False (headless = False), while in the second case, I left it as True, which was the default value.

Steps to Reproduce:

  1. Git clone the SNow_benchmark branch from my fork and follow the installation in the README.md.
  2. Set the headless parameter to False and run the benchmark for the "enter-time" task:
    python main.py --env enter-time --llm chatgpt --num-episodes 1 --irci 1 --sgrounding
  3. Set the headless parameter to True and run the benchmark for the "enter-time" task:
    python main.py --env enter-time --llm chatgpt --num-episodes 1 --irci 1 --sgrounding --headless

Expected Behavior:
The results should be identical, regardless of the value of the headless parameter.

Actual Behavior:
When the headless parameter is enabled (set to True), certain actions are not allowed or counted, resulting in a failed task. (I can run the benchmark several times and still get the same results.)

(RCI-agent-WSL) thirdcore@DESKTOP-5I4C9HH:~/rci-agent$ python main.py --env enter-time --llm chatgpt --num-episodes 1 --irci 1 --sgrounding 
False
INFO:root:Starting WebDriver Instance 0
INFO:selenium.webdriver.common.selenium_manager:Applicable driver not found; attempting to install with Selenium Manager (Beta)
INFO:root:Send a request to the language model from initialize_plan
INFO:root:The number of generated action steps: 4
INFO:root:Send a request to the language model from generate_action
INFO:root:The executed instruction: clickxpath //*[@id="tt"]
INFO:root:Send a request to the language model from generate_action
INFO:root:The executed instruction: type 02:07PM
INFO:root:Send a request to the language model from generate_action
INFO:root:The executed instruction: clickxpath //*[@id="subbtn"]
success rate: 1.0
(RCI-agent-WSL) thirdcore@DESKTOP-5I4C9HH:~/rci-agent$ python main.py --env enter-time --llm chatgpt --num-episodes 1 --irci 1 --sgrounding --headless
True
INFO:root:Starting WebDriver Instance 0
INFO:selenium.webdriver.common.selenium_manager:Applicable driver not found; attempting to install with Selenium Manager (Beta)
INFO:root:Send a request to the language model from initialize_plan
INFO:root:The number of generated action steps: 4
INFO:root:Send a request to the language model from generate_action
INFO:root:The executed instruction: clickxpath //*[@id="tt"]
INFO:root:Send a request to the language model from generate_action
INFO:root:The executed instruction: type 1017AM
INFO:root:Send a request to the language model from generate_action
INFO:root:The executed instruction: clickxpath //*[@id="subbtn"]
success rate: 0.0

Additional Information:
I'm still investigating the root cause of this issue. It seems that when the browser is not displayed, some actions are restricted or not properly accounted for, leading to the task failure. Have you seen the same behavior, or is there something I'm missing?
