Comments (5)
@klxu03 @shubhexists Here's an example of how you could scroll using PyAutoGUI, from https://pyautogui.readthedocs.io/en/latest/mouse.html. This would probably be preferable to simulating the scroll some other way, since the .scroll(...) function scrolls as a human would with a mouse wheel. It should also work across platforms.
>>> pyautogui.scroll(10) # scroll up 10 "clicks"
>>> pyautogui.scroll(-10) # scroll down 10 "clicks"
>>> pyautogui.scroll(10, x=100, y=100) # move mouse cursor to 100, 100, then scroll up 10 "clicks"
The CLICK action and its prompt could be modified so the model can also return a scroll amount in its response, or something along those lines.
I might open a PR for this if you're not working on it @klxu03.
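As a rough sketch of the idea above (not the project's actual implementation), the model's response could carry a scroll amount that gets handed to pyautogui.scroll. The JSON shape, the "SCROLL" action name, and the perform_scroll helper are all hypothetical and used only for illustration.

```python
import json
from typing import Callable, Optional


def perform_scroll(response_text: str,
                   scroll_fn: Optional[Callable[[int], None]] = None) -> int:
    """Parse a hypothetical SCROLL response and scroll by the requested amount.

    Assumed (made-up) response format: {"action": "SCROLL", "amount": -10}
    Positive amounts scroll up and negative amounts scroll down, matching
    the sign convention of pyautogui.scroll.
    """
    data = json.loads(response_text)
    if data.get("action") != "SCROLL":
        raise ValueError("not a SCROLL response")
    amount = int(data["amount"])
    if scroll_fn is None:
        # Imported lazily so the parsing logic can run without a display.
        import pyautogui
        scroll_fn = pyautogui.scroll
    scroll_fn(amount)
    return amount
```

Passing scroll_fn explicitly makes it possible to try the parsing without moving the real mouse wheel; leaving it out falls back to pyautogui.scroll.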
from self-operating-computer.
@klxu03 I have #76 opened! Feel free to clone the PR and try it out. Let me know if you have any feedback or suggestions.
I have it taking more of an "exploration" approach, as a human would, rather than knowing ahead of time what will be shown after scrolling. When the model scrolls, it can choose not to left-click, so it doesn't accidentally click on something after the scroll.
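To illustrate the "scroll, then optionally click" behavior described here, a minimal sketch follows. It is not the code from the PR; the scroll_then_maybe_click helper and its parameters are hypothetical, and only pyautogui.scroll and pyautogui.click are real PyAutoGUI calls.

```python
from typing import Optional, Tuple


def scroll_then_maybe_click(amount: int,
                            click_at: Optional[Tuple[int, int]] = None,
                            gui=None) -> bool:
    """Scroll first, and click only if a position was explicitly given.

    Returns True if a click was performed, False if the click was skipped.
    `gui` is any object with scroll() and click() methods; it defaults to
    the pyautogui module.
    """
    if gui is None:
        # Imported lazily so the logic can be exercised without a display.
        import pyautogui
        gui = pyautogui
    gui.scroll(amount)
    if click_at is None:
        # Skip the click so nothing is hit by accident after the scroll.
        return False
    x, y = click_at
    gui.click(x=x, y=y)
    return True
```

Calling scroll_then_maybe_click(-10) scrolls down without clicking; a later step can look at the new screenshot and decide where to click.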
How about using pyautogui to scroll by pressing the down arrow key? The open question is whether the model (GPT-4) can identify when it has to scroll.
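The arrow-key idea could look roughly like this. It is only a sketch: scroll_with_arrow_keys is a hypothetical helper, and how far one key press moves the page depends on the focused application.

```python
from typing import Callable, Optional


def scroll_with_arrow_keys(steps: int,
                           press_fn: Optional[Callable[..., None]] = None) -> Optional[str]:
    """Approximate scrolling by pressing the up/down arrow keys.

    Negative steps press "down" and positive steps press "up", mirroring
    the sign convention of pyautogui.scroll. Returns the key that was
    pressed, or None when steps is 0.
    """
    if steps == 0:
        return None
    key = "down" if steps < 0 else "up"
    if press_fn is None:
        # Imported lazily so the logic can be exercised without a display.
        import pyautogui
        press_fn = pyautogui.press
    press_fn(key, presses=abs(steps))
    return key
```

Unlike a wheel scroll, this only works when the window that should scroll has keyboard focus.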
Interested to see how the scroll performs. I'll take a look this week.
@michaelhhogue thanks for sending! Yeah, maybe you could open a PR with pyautogui so we can see how it performs. I'm not sure how easy it would be to bake this into the prompt and have GPT figure out that it needs to scroll.
When I talked about baking it into Selenium, I meant with additional functionality, like Selenium taking a picture of the entire website (including the parts of the site below the current viewport) so the model knows what is down there.
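For the full-page picture idea, one possible sketch uses the full-page screenshot method that Selenium's Firefox driver provides. This assumes Firefox and geckodriver are installed; capture_full_page and make_firefox_driver are hypothetical helper names, and other browsers would need a different approach.

```python
def capture_full_page(driver, url: str, path: str) -> str:
    """Load a page and save a screenshot of the whole document, not just
    the visible viewport. Expects a Selenium Firefox WebDriver, which
    exposes save_full_page_screenshot(); `path` should end in .png.
    """
    driver.get(url)
    if not driver.save_full_page_screenshot(path):
        raise RuntimeError("full-page screenshot failed")
    return path


def make_firefox_driver():
    """Create a headless Firefox driver (assumes geckodriver is available)."""
    # Imported lazily so capture_full_page can be tried with a stand-in driver.
    from selenium import webdriver
    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)
```

A typical use would be driver = make_firefox_driver(), then capture_full_page(driver, some_url, "page.png"), then driver.quit(). The saved image could be passed to the model alongside the normal screenshot.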