Comments (3)
Originally I tried a bunch of things to avoid repetition. gpt-4-vision-preview just doesn't seem as good at following instructions.
First I added language like this to the prompt.
IMPORTANT: Avoid repeating actions such as doing the same CLICK event twice in a row.
That didn't help much, so I played with presence_penalty & frequency_penalty, which maybe helped a little. Hard to say for sure.
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=pseudo_messages,
    presence_penalty=1,
    frequency_penalty=1,
    temperature=0.7,
    max_tokens=300,
)
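For context on why the penalties may only help a little: as I understand the formula in OpenAI's API documentation, both penalties lower a token's logit based on how often that token has already been sampled in the current completion. They act on individual tokens, not on whole actions from earlier requests. Here is a minimal sketch of that adjustment; the function name and the example logits are made up for illustration.

```python
from collections import Counter


def apply_penalties(logits, sampled_tokens, presence_penalty, frequency_penalty):
    """Return logits adjusted as: logit - count * frequency - (count > 0) * presence."""
    counts = Counter(sampled_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts[token]
        adjusted[token] = (
            logit
            - count * frequency_penalty
            - (1.0 if count > 0 else 0.0) * presence_penalty
        )
    return adjusted


# Hypothetical logits for three candidate next tokens.
logits = {"CLICK": 2.0, "TYPE": 1.5, "SEARCH": 1.0}

# "CLICK" was already sampled twice in this completion, so only it is penalized.
adjusted = apply_penalties(
    logits, ["CLICK", "CLICK"], presence_penalty=1, frequency_penalty=1
)
print(adjusted)
```

If that reading is right, a CLICK repeated across two separate requests would not be penalized at all, which would fit with the penalties being hard to evaluate here.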
What made the largest impact was advice from @mshumer to add the actual previous_action to the prompt so that GPT sees it in a very obvious way. This improved it slightly, but there's still an issue, as noted. Ultimately our agent-1 model will not have this problem, but to fix this with gpt-4-v I recommend playing around with the {previous_action} part of the prompting system:
{previous_action}
IMPORTANT: Avoid repeating actions such as doing the same CLICK event twice in a row.
Objective: {objective}
"""
...
def format_vision_prompt(objective, previous_action):
    """
    Format the vision prompt
    """
    if previous_action:
        previous_action = f"Here was the previous action you took: {previous_action}"
    else:
        previous_action = ""
    prompt = VISION_PROMPT.format(objective=objective, previous_action=previous_action)
    return prompt
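One way to play around with the {previous_action} slot is to pass the last few actions instead of only the most recent one, so a repeat stands out more. This is a sketch, not what the repo does: the shortened VISION_PROMPT below is a stand-in for the real template, and format_action_history and format_vision_prompt_with_history are hypothetical names.

```python
# Hypothetical, shortened stand-in for the real VISION_PROMPT template.
VISION_PROMPT = """
{previous_action}
IMPORTANT: Avoid repeating actions such as doing the same CLICK event twice in a row.
Objective: {objective}
"""


def format_action_history(actions, max_items=3):
    """Render the most recent actions as a numbered list, oldest first."""
    recent = actions[-max_items:]
    if not recent:
        return ""
    lines = [f"{i}. {action}" for i, action in enumerate(recent, start=1)]
    return "Here are the previous actions you took, oldest first:\n" + "\n".join(lines)


def format_vision_prompt_with_history(objective, actions):
    """Fill the template with the objective and the recent action history."""
    return VISION_PROMPT.format(
        objective=objective,
        previous_action=format_action_history(actions),
    )


prompt = format_vision_prompt_with_history(
    "Open the browser and search for cats",
    ["SEARCH Chrome", "SEARCH Chrome", "CLICK address bar", "TYPE cats"],
)
print(prompt)
```

Only the last three actions are kept here to keep the prompt short; whether a longer history helps or just adds noise would need testing.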
from self-operating-computer.
Hello @yibie. Can you confirm if you still have this issue on the most recent version of the repo?
@michaelhhogue I still have this issue occasionally on the current main branch. It doesn't happen every time: it will often progress to other steps even though it never really succeeds at the prior ones. Most of the time it attempts 2-3 launches of the browser and then moves on to the next step. Sometimes, though, it just seems to keep repeating the search command, and I usually cut it off after 7-8 attempts, before the loop limit kicks in.
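A guard along these lines could stop a run earlier than the loop limit when the same action keeps coming back. It is only a sketch: is_stuck and the threshold of 3 are hypothetical and not part of the repo, and actions are assumed to be comparable as plain strings.

```python
def is_stuck(actions, max_repeats=3):
    """Return True if the last `max_repeats` actions are all identical."""
    if max_repeats < 1 or len(actions) < max_repeats:
        return False
    tail = actions[-max_repeats:]
    return all(action == tail[0] for action in tail)


history = []
# Hypothetical sequence of actions returned by the model.
for action in ["SEARCH Chrome", "SEARCH Chrome", "SEARCH Chrome", "CLICK address bar"]:
    history.append(action)
    if is_stuck(history):
        print(f"Stopping: '{action}' was repeated 3 times in a row")
        break
```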