
Comments (15)

michaelhhogue commented on August 21, 2024

@centopw I am going to try out your install script in #19 and see how it works.

from self-operating-computer.

centopw commented on August 21, 2024

@michaelhhogue Then how about this? I don't really work with Windows that much, so this draft only works on Mac (using webbrowser) and Linux (using xdg-settings):

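import subprocess
import webbrowser

def get_default_browser_macos():
    # Name of the controller Python selects; this may be a generic label
    # rather than the browser's actual name.
    return webbrowser.get().name

def get_default_browser_linux():
    # xdg-settings prints the default browser's .desktop file name.
    result = subprocess.run(["xdg-settings", "get", "default-web-browser"], stdout=subprocess.PIPE, text=True)
    return result.stdout.strip()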

michaelhhogue commented on August 21, 2024

@centopw Thanks for this proposed change. It's interesting to see that you can just open the default browser by searching for "browser" in Mac OS. Do you have any ideas on how the default browser could be opened on Windows and Linux? I've tested just searching for "browser" on my Linux distro and it doesn't find the default.

centopw commented on August 21, 2024

Issue Description

When searching for browsers on different Linux distros, the current behavior is as follows:

Ubuntu 22.04.3

  • Returns all available browsers but fails to display the correct default browser.

    Ubuntu Screenshot

Kali Linux 2023.3

  • Similar to Ubuntu, it shows all available browsers but does not identify the default browser correctly.

    Kali Linux Screenshot

Proposed Changes

Two potential solutions have been considered:

  1. Script Improvement (PR #19):
    Enhance the existing scripts to prompt the user for their default browser choice and update the main.py with the selected browser.

  2. Update main.py:
    Modify main.py to prompt the user to select the default browser every time it runs.

Pros & Cons

Both options offer improved accuracy:

  • The user can specify the location of the search bar for each browser, expanding support for future browsers.

Trade-offs:

  1. Option 1:

    • Pros: Users can set their preferred default browser with the updated scripts.
    • Cons: Users must run the additional script (#19) for installation; otherwise, it defaults to Google Chrome.
  2. Option 2:

    • Pros: User flexibility in selecting the default browser each time.
    • Cons: Users are required to input their default browser choice with every run.

centopw commented on August 21, 2024

With this proposal, I have drafted a simple update for main.py, shown below:

    # Ask the user for their default browser
    default_browser = prompt(
        "Please enter your default browser (e.g., Chrome, Firefox): "
    )

    # Adjust the behavior based on the user's default browser
    if default_browser.lower() == "chrome":
        browser_prompt = "Google Chrome"
        browser_address_bar = {"x": "50%", "y": "9%"}
    elif default_browser.lower() == "firefox":
        browser_prompt = "Mozilla Firefox"
        browser_address_bar = {"x": "50%", "y": "10%"}
    else:
        # Default to Chrome behavior if the input is unknown
        browser_prompt = "Google Chrome"
        browser_address_bar = {"x": "50%", "y": "9%"}

    message_dialog(
        title="Self-Operating Computer",
        text=f"Ask a computer to do anything. Default browser set to {browser_prompt}.",
        style=style,
    ).run()

    print("SYSTEM", platform.system())

    # Update the prompts based on the chosen/default browser
    VISION_PROMPT = f"""
    You are a Self-Operating Computer. You use {browser_prompt} as your default browser.

    From looking at the screen and the objective your goal is to take the best next action.

    To operate the computer you have the four options below.

    1. CLICK - Move mouse and click
    2. TYPE - Type on the keyboard
    3. SEARCH - Search for a program on {browser_prompt} and open it
    4. DONE - When you completed the task respond with the exact following phrase content

    Here are the response formats below.

    1. CLICK
    Response: CLICK {{ "x": "percent", "y": "percent", "description": "~description here~", "reason": "~reason here~" }}

    2. TYPE
    Response: TYPE "value you want to type"

    3. SEARCH
    Response: SEARCH "app you want to search for on {browser_prompt}"

    4. DONE
    Response: DONE

    Here are examples of how to respond.
    ...
    """

centopw commented on August 21, 2024

Also, instead of asking the user to type the name out, we could incorporate a menu that lets the user pick from a predefined selection of browsers.
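
A minimal sketch of what such a menu could look like. The `BROWSER_CHOICES` table and the `select_browser` / `ask_for_browser` helpers are hypothetical names, not part of the project; the address-bar percentages are the ones from the draft above.

```python
# Hypothetical table of predefined browsers; the coordinates reuse the
# address-bar percentages from the draft above.
BROWSER_CHOICES = [
    ("Google Chrome", {"x": "50%", "y": "9%"}),
    ("Mozilla Firefox", {"x": "50%", "y": "10%"}),
]


def select_browser(raw_choice, choices=BROWSER_CHOICES):
    """Map a menu answer such as "2" to a (name, address_bar) pair.

    Falls back to the first entry when the answer is not a valid number.
    """
    try:
        index = int(raw_choice.strip()) - 1
    except ValueError:
        return choices[0]
    if 0 <= index < len(choices):
        return choices[index]
    return choices[0]


def ask_for_browser(read_line=input):
    """Print the numbered menu and return the user's selection."""
    for number, (name, _) in enumerate(BROWSER_CHOICES, start=1):
        print(f"{number}. {name}")
    return select_browser(read_line("Select your default browser: "))
```

Keeping the browsers in one table means supporting another browser is a one-line addition rather than another elif branch.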

michaelhhogue commented on August 21, 2024

@centopw Interesting. I think the ideal solution would be to just automatically detect the default browser if possible. On Windows, I'm pretty sure this can just be read from the registry using OpenKey. For Linux, this would probably be found in xdg-settings. I'm not sure about Mac OS. It would probably require some special permissions to access that system setting. If no default browser was found, it could just default to searching for "browser" or something. What do you think about this approach?
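
A rough sketch of that detection flow, under a few assumptions: the registry path below is where I believe Windows stores the per-user https handler, the function names are hypothetical, and macOS detection is deliberately left out, so it falls through to the generic "browser" search term.

```python
import platform
import subprocess

FALLBACK_SEARCH_TERM = "browser"  # generic term used when nothing is detected


def detect_default_browser():
    """Return an identifier for the default browser, or None if unknown."""
    system = platform.system()
    try:
        if system == "Windows":
            import winreg  # standard library, but only present on Windows

            # Assumed location of the per-user handler for https links.
            key_path = (
                r"Software\Microsoft\Windows\Shell\Associations"
                r"\UrlAssociations\https\UserChoice"
            )
            with winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path) as key:
                prog_id, _ = winreg.QueryValueEx(key, "ProgId")
            return prog_id or None
        if system == "Linux":
            result = subprocess.run(
                ["xdg-settings", "get", "default-web-browser"],
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,
                text=True,
            )
            name = result.stdout.strip()
            # xdg-settings reports a .desktop file name; drop the suffix.
            if name.endswith(".desktop"):
                name = name[: -len(".desktop")]
            return name or None
    except OSError:
        # Registry key missing, or xdg-settings is not installed.
        return None
    return None  # macOS and other systems: not attempted in this sketch


def browser_search_term():
    """Detected browser identifier, or the generic fallback term."""
    return detect_default_browser() or FALLBACK_SEARCH_TERM
```

The identifiers returned here (a Windows ProgId or a .desktop base name) are not display names, so they would probably need mapping to something the OS search can find.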

centopw commented on August 21, 2024

If you want to go with a terminal approach, we could simply open any website from the terminal, for example:

  • Linux: xdg-open http://www.google.com
  • Windows: start http://www.google.com
  • MacOS: open http://www.google.com

When you run one of these commands in the terminal, it automatically opens the site in the default browser on each system. One more benefit: since it always opens google.com, we can define where the search box is located and avoid misclicks even more.
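
For reference, the three commands above could be selected per platform roughly like this. `open_url_command` is a hypothetical helper; it only builds the command and does not run it.

```python
import platform


def open_url_command(url, system=None):
    """Return the argument list that would open `url` in the default browser."""
    system = system or platform.system()
    if system == "Linux":
        return ["xdg-open", url]
    if system == "Darwin":  # platform.system() reports macOS as "Darwin"
        return ["open", url]
    if system == "Windows":
        # `start` is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", url]
    raise ValueError(f"unsupported platform: {system}")
```

The returned list could be passed to subprocess.run if this approach were adopted.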

Screenshot 2023-12-02 at 5 08 57 PM

michaelhhogue commented on August 21, 2024

@centopw That's an interesting approach. However, the project is aiming more towards only giving the model control over the OS via mouse movements, mouse clicks, key-presses, and search operations (from key-presses). Running xdg-open, start, or open from the code itself would violate that vision (restricting the model to only have the same inputs to the OS as a human: mouse and keyboard).

So, having the model open a terminal and run xdg-open using only the cursor and key-presses would be a valid operation (although not very practical). Running xdg-open from the python code itself wouldn't be valid. Hope that makes sense.

The program should probably follow this order of operations:

Get name of the user's default browser (either manually or automatically) -> Give default browser name to model in prompt -> Model references default browser name to be included in the search action.
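
To make that order concrete, here is a small sketch. Both function names and the exact prompt wording are hypothetical; the real prompt in main.py may differ.

```python
def resolve_browser_name(manual=None, detected=None, fallback="browser"):
    """Step 1: prefer a manual choice, then an auto-detected one."""
    for candidate in (manual, detected):
        if candidate and candidate.strip():
            return candidate.strip()
    return fallback


def build_search_instruction(browser_name):
    """Step 2: embed the browser name in the text given to the model."""
    return (
        f"You use {browser_name} as your default browser. "
        f'To reach the internet, respond with: SEARCH "{browser_name}"'
    )
```

Step 3 is then up to the model: it sees the name in its prompt and uses it in a SEARCH action, so the code itself never launches the browser.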

michaelhhogue commented on August 21, 2024

@centopw I'll test this out as well and get back with you.

Kreijstal commented on August 21, 2024

What if browser is already open?

centopw commented on August 21, 2024

@Kreijstal For now, I don't think it matters whether the browser is already open. But that is an interesting idea; I will play around with it and let you know.

michaelhhogue commented on August 21, 2024

@centopw Just noting here that I haven't yet tested any default browser checking. Want to first see what happens with #19.

joshbickett commented on August 21, 2024

Problem

Currently, the application is prompted to use Google Chrome by default, limiting accessibility and user experience for individuals using alternative browsers. This monolithic approach excludes a significant user base and hinders the platform's adaptability to diverse browser environments.

Proposal

This issue advocates for a transition from Chrome-centric development to a more inclusive approach that supports a broader range of web browsers. The goal is to enhance accessibility, improve user experience, and adhere to web standards that promote compatibility across different platforms.

Proposed Changes

When testing, I realized that on macOS you can open your default browser by just typing this in the search bar:

browser

So instead of searching for Google Chrome, you can search for "browser" and press Enter; it will open the default browser without requiring the user to have Google Chrome. Since most browsers have the search bar in the same location, you can still use the default setting for it.

I originally hacked in Google Chrome as the default, but I agree we've outgrown this. Chrome is 70% of the market, if I understand correctly, though. Would it make sense to "check for chrome" and, if it isn't found, search for "browser" as shown above?

- Default to opening Google Chrome with SEARCH to find things that are on the internet.
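
A rough sketch of the "check for chrome, else search for browser" idea. The executable names are an assumption about common Linux installs and the helper name is hypothetical; macOS and Windows installs would need a different check.

```python
import shutil

# Executable names Chrome is commonly installed under on Linux (an assumption;
# the list is not exhaustive and does not cover macOS or Windows installs).
CHROME_EXECUTABLES = ("google-chrome", "google-chrome-stable", "chrome")


def search_term_for_browser(which=shutil.which):
    """Return "Google Chrome" if a Chrome executable is on PATH, else "browser"."""
    for executable in CHROME_EXECUTABLES:
        if which(executable):
            return "Google Chrome"
    return "browser"
```

The `which` parameter exists only so the lookup can be swapped out when testing.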

joshbickett commented on August 21, 2024

With this proposal, I have drafted a simple update for main.py, shown below:

    # Ask the user for their default browser
    default_browser = prompt(
        "Please enter your default browser (e.g., Chrome, Firefox): "
    )

    # Adjust the behavior based on the user's default browser
    if default_browser.lower() == "chrome":
        browser_prompt = "Google Chrome"
        browser_address_bar = {"x": "50%", "y": "9%"}
    elif default_browser.lower() == "firefox":
        browser_prompt = "Mozilla Firefox"
        browser_address_bar = {"x": "50%", "y": "10%"}
    else:
        # Default to Chrome behavior if the input is unknown
        browser_prompt = "Google Chrome"
        browser_address_bar = {"x": "50%", "y": "9%"}

    message_dialog(
        title="Self-Operating Computer",
        text=f"Ask a computer to do anything. Default browser set to {browser_prompt}.",
        style=style,
    ).run()

    print("SYSTEM", platform.system())

    # Update the prompts based on the chosen/default browser
    VISION_PROMPT = f"""
    You are a Self-Operating Computer. You use {browser_prompt} as your default browser.

    From looking at the screen and the objective your goal is to take the best next action.

    To operate the computer you have the four options below.

    1. CLICK - Move mouse and click
    2. TYPE - Type on the keyboard
    3. SEARCH - Search for a program on {browser_prompt} and open it
    4. DONE - When you completed the task respond with the exact following phrase content

    Here are the response formats below.

    1. CLICK
    Response: CLICK {{ "x": "percent", "y": "percent", "description": "~description here~", "reason": "~reason here~" }}

    2. TYPE
    Response: TYPE "value you want to type"

    3. SEARCH
    Response: SEARCH "app you want to search for on {browser_prompt}"

    4. DONE
    Response: DONE

    Here are examples of how to respond.
    ...
    """

I lean away from asking the user additional questions if possible, but I'm curious what the community thinks.

