
Comments (3)

joshbickett commented on July 20, 2024

Originally I tried a bunch of things to avoid repetition. gpt-4-vision-preview just doesn't seem to be as good at following instructions.

First, I added language like this to the prompt:

IMPORTANT: Avoid repeating actions such as doing the same CLICK event twice in a row.

That didn't help much, so I played with presence_penalty and frequency_penalty, which maybe helped a little. It's hard to say for sure.

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=pseudo_messages,
    presence_penalty=1,
    frequency_penalty=1,
    temperature=0.7,
    max_tokens=300,
)

What made the largest impact was advice from @mshumer to add the actual previous_action to the prompt so that GPT sees it in a very obvious way. This improved things slightly but, as noticed, there's still an issue. Ultimately our agent-1 model will not have this problem, but to fix this with gpt-4-vision-preview I recommend playing around with the {previous_action} part of the prompting system:


{previous_action}

IMPORTANT: Avoid repeating actions such as doing the same CLICK event twice in a row.

Objective: {objective}
"""
...
def format_vision_prompt(objective, previous_action):
    """
    Format the vision prompt
    """
    if previous_action:
        previous_action = f"Here was the previous action you took: {previous_action}"
    else:
        previous_action = ""
    prompt = VISION_PROMPT.format(objective=objective, previous_action=previous_action)
    return prompt
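
One way to "play around" with the {previous_action} slot is to pass the last few actions instead of only the most recent one, so a repeated action stands out more. This is only a sketch, not the project's code: the VISION_PROMPT below is a shortened stand-in for the real template (whose full text is not shown here), and the previous_actions list, the max_history parameter, and the example action strings are assumptions made for illustration.

```python
# Hypothetical, shortened stand-in for the real VISION_PROMPT template.
VISION_PROMPT = """
{previous_action}

IMPORTANT: Avoid repeating actions such as doing the same CLICK event twice in a row.

Objective: {objective}
"""


def format_vision_prompt(objective, previous_actions, max_history=3):
    """Format the vision prompt with the last few actions (assumed variant)."""
    # Keep only the most recent max_history actions.
    recent = previous_actions[-max_history:]
    if recent:
        lines = [f"{i}. {action}" for i, action in enumerate(recent, start=1)]
        history = (
            "Here are the most recent actions you took, oldest first:\n"
            + "\n".join(lines)
        )
    else:
        history = ""
    return VISION_PROMPT.format(objective=objective, previous_action=history)


# Example action strings are made up for this sketch.
prompt = format_vision_prompt(
    "Open a browser and search for cats",
    ["SEARCH Google Chrome", "SEARCH Google Chrome"],
)
print(prompt)
```

Whether a longer history actually reduces repetition with gpt-4-vision-preview is untested here; it is just one variation to try.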

from self-operating-computer.

michaelhhogue commented on July 20, 2024

Hello @yibie. Can you confirm if you still have this issue on the most recent version of the repo?


AzorianMatt commented on July 20, 2024

@michaelhhogue I occasionally have this issue on the current main branch. It doesn't happen every time: it will often progress to other steps even though it never really succeeds at the prior ones. Most of the time it attempts 2-3 launches of the browser and then moves on to the next step. Sometimes, though, it just keeps repeating the search command, and I usually cut it off after 7-8 attempts, before the loop limit kicks in.
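
Instead of cutting the run off by hand, a small guard could stop it once the same action repeats several times in a row. The following is a minimal sketch under assumptions, not something that exists in the repo: the RepeatGuard class, its max_repeats parameter, and the action strings are all hypothetical, and actions are compared as plain strings.

```python
from collections import deque


class RepeatGuard:
    """Flag when the same action occurs max_repeats times in a row (hypothetical helper)."""

    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats
        # Only the last max_repeats actions are kept.
        self.recent = deque(maxlen=max_repeats)

    def should_stop(self, action):
        self.recent.append(action)
        window_full = len(self.recent) == self.max_repeats
        return window_full and len(set(self.recent)) == 1


guard = RepeatGuard(max_repeats=3)
# Made-up action sequence for illustration.
actions = [
    "SEARCH browser",
    "SEARCH browser",
    "CLICK address bar",
    "SEARCH browser",
    "SEARCH browser",
    "SEARCH browser",
]
for step, action in enumerate(actions, start=1):
    if guard.should_stop(action):
        print(f"Stopping at step {step}: '{action}' repeated {guard.max_repeats} times in a row")
        break
```

This only catches exact repeats; actions that differ slightly each time (for example, clicks at nearby coordinates) would need a looser comparison.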
