
selenium-proxy-integration-python's Introduction

Oxylabs’ Residential Proxies integration with Selenium in Python


Requirements

For the integration to work, you'll need to install Selenium Wire, which extends Selenium's Python bindings. Setting up proxies that require authentication with the default Selenium module is considerably more complicated.

You can install it with pip:

pip install selenium-wire

Another recommended package is webdriver-manager. It manages browser driver binaries for you, so you don't need to manually download a new web driver after each browser update. See the project's page on PyPI for more information.

You can install it with pip as well:

pip install webdriver-manager

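Required Python version: 3.6 or higher (the example code uses f-strings, which Python 3.5 does not support). Recent releases of Selenium Wire and webdriver-manager may require a newer version.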

Proxy Authentication

For proxies to work, you'll need to specify your account credentials inside the main.py file.

USERNAME = "your_username"
PASSWORD = "your_password"
ENDPOINT = "pr.oxylabs.io:7777"

Replace your_username and your_password with the username and password of your Oxylabs account.
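Hard-coding credentials works for a quick test, but you may prefer to keep them out of main.py. The sketch below is one possible approach, not part of the original integration: it reads the credentials from environment variables (the names OXYLABS_USERNAME and OXYLABS_PASSWORD are hypothetical, pick your own) and percent-encodes them, so a password containing characters such as @ or : does not break the proxy URL. Whether your installed Selenium Wire version decodes percent-encoded credentials is an assumption you should verify.

```python
import os
from typing import Mapping, Tuple
from urllib.parse import quote

# Hypothetical variable names; not defined by Oxylabs or Selenium Wire.
USERNAME_VAR = "OXYLABS_USERNAME"
PASSWORD_VAR = "OXYLABS_PASSWORD"
DEFAULT_ENDPOINT = "pr.oxylabs.io:7777"


def load_credentials(environ: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read proxy credentials from the environment, failing loudly if absent."""
    try:
        return environ[USERNAME_VAR], environ[PASSWORD_VAR]
    except KeyError as missing:
        raise RuntimeError(f"Set the {missing.args[0]} environment variable") from None


def build_proxy_url(user: str, password: str, endpoint: str = DEFAULT_ENDPOINT) -> str:
    """Return http://user:password@endpoint with both credentials percent-encoded."""
    # safe="" makes quote() encode "/", ":" and "@" too, so they can't
    # be mistaken for URL delimiters.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{endpoint}"
```

You could then use the resulting URL for both the "http" and "https" entries of the proxy dictionary instead of the hard-coded constants. Avoid printing or logging it, since it contains the password.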

Testing Proxy Connection

To check whether the proxy is working, visit https://ip.oxylabs.io/location through it. If everything is set up correctly, the page returns the IP address of the proxy you're using.
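You can also run this check without launching a browser. The sketch below uses only Python's standard library: urllib.request.ProxyHandler reads the username and password embedded in the proxy URL and sends them as Basic proxy authentication, including for HTTPS requests tunnelled through the proxy. The helper names are hypothetical, and the response body is returned as-is because its exact format isn't documented here.

```python
import urllib.request

LOCATION_URL = "https://ip.oxylabs.io/location"


def make_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(opener: urllib.request.OpenerDirector,
                    url: str = LOCATION_URL,
                    timeout: float = 30.0) -> str:
    """Fetch url through the opener and return the response body as text."""
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, fetch_via_proxy(make_proxy_opener("http://your_username:your_password@pr.oxylabs.io:7777")) should return the same information the browser shows. If it raises an HTTP 407 error, the proxy rejected the credentials.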

Full Code

from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from seleniumwire import webdriver
# Downloads a chromedriver that matches the installed Chrome version.
from webdriver_manager.chrome import ChromeDriverManager

USERNAME = "your_username"
PASSWORD = "your_password"
ENDPOINT = "pr.oxylabs.io:7777"


def get_chrome_proxy(user: str, password: str, endpoint: str) -> dict:
    # Route both HTTP and HTTPS traffic through the authenticated proxy.
    # The proxy itself is addressed with an http:// URL in both cases.
    wire_options = {
        "proxy": {
            "http": f"http://{user}:{password}@{endpoint}",
            "https": f"http://{user}:{password}@{endpoint}",
        }
    }

    return wire_options


def execute_driver():
    options = webdriver.ChromeOptions()
    # options.headless is deprecated and removed in recent Selenium 4
    # releases, so pass Chrome's headless flag directly.
    options.add_argument("--headless=new")
    # The chromedriver path goes into a Service object; it is not a
    # Selenium Wire option.
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(
        service=service,
        options=options,
        seleniumwire_options=get_chrome_proxy(USERNAME, PASSWORD, ENDPOINT),
    )
    try:
        driver.get("https://ip.oxylabs.io/location")
        # Chrome renders the plain-text response inside a <pre> element.
        return f'\nYour IP is: {driver.find_element(By.CSS_SELECTOR, "pre").text}'
    finally:
        driver.quit()


if __name__ == "__main__":
    print(execute_driver())

If you run into trouble integrating proxies with Selenium and this guide doesn't help, feel free to contact Oxylabs customer support.

selenium-proxy-integration-python's People

Contributors

augustoxy, oxyjohan, oxylabsorg

selenium-proxy-integration-python's Issues

ModuleNotFoundError: No module named 'blinker._saferef'

Python version: 3.10
macOS 14.1.2 (23B92), Apple M1

selenium-wire==5.1.0
webdriver-manager==4.0.1

Traceback (most recent call last):
  File "/Documents/projects/selenium/main.py", line 4, in <module>
    from seleniumwire import webdriver
  File "/Documents/projects/selenium/.venv/lib/python3.10/site-packages/seleniumwire/webdriver.py", line 28, in <module>
    from seleniumwire import backend, utils
  File "/Documents/projects/selenium/.venv/lib/python3.10/site-packages/seleniumwire/backend.py", line 4, in <module>
    from seleniumwire.server import MitmProxy
  File "/Documents/projects/selenium/.venv/lib/python3.10/site-packages/seleniumwire/server.py", line 5, in <module>
    from seleniumwire.handler import InterceptRequestHandler
  File "/Documents/projects/selenium/.venv/lib/python3.10/site-packages/seleniumwire/handler.py", line 5, in <module>
    from seleniumwire import har
  File "/Documents/projects/selenium/.venv/lib/python3.10/site-packages/seleniumwire/har.py", line 11, in <module>
    from seleniumwire.thirdparty.mitmproxy import connections
  File "/Documents/projects/selenium/.venv/lib/python3.10/site-packages/seleniumwire/thirdparty/mitmproxy/connections.py", line 10, in <module>
    from seleniumwire.thirdparty.mitmproxy.net import tls, tcp
  File "/Documents/projects/selenium/.venv/lib/python3.10/site-packages/seleniumwire/thirdparty/mitmproxy/net/tls.py", line 15, in <module>
    import seleniumwire.thirdparty.mitmproxy.options
  File "Documents/projects/selenium/.venv/lib/python3.10/site-packages/seleniumwire/thirdparty/mitmproxy/options.py", line 5, in <module>
    from seleniumwire.thirdparty.mitmproxy import optmanager
  File "/Documents/projects/selenium/.venv/lib/python3.10/site-packages/seleniumwire/thirdparty/mitmproxy/optmanager.py", line 9, in <module>
    import blinker._saferef
ModuleNotFoundError: No module named 'blinker._saferef'
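This error is widely reported with blinker 1.8 and later, which no longer ship the private blinker._saferef module that the mitmproxy code bundled in Selenium Wire 5.1.0 imports (see the last frame of the traceback). Pinning an older blinker, for example pip install "blinker<1.8", is the commonly cited workaround. The version boundary is an assumption based on those reports, and the helper names below are hypothetical; the sketch only reports which side of that boundary your installed blinker is on. It needs Python 3.8+ for importlib.metadata.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumed boundary: blinker 1.8.0 dropped the private _saferef module.
FIRST_BREAKING = (1, 8)


def parse_major_minor(version_string: str) -> tuple:
    """Turn '1.7.0' into (1, 7); only leading digits of each part count."""
    numbers = []
    for piece in version_string.split(".")[:2]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def blinker_needs_pin(version_string: str) -> bool:
    """Return True if this blinker version is assumed to lack _saferef."""
    return parse_major_minor(version_string) >= FIRST_BREAKING


def check_installed_blinker() -> str:
    """Describe the installed blinker relative to the assumed boundary."""
    try:
        installed = version("blinker")
    except PackageNotFoundError:
        return "blinker is not installed"
    if blinker_needs_pin(installed):
        return f'blinker {installed} found; try: pip install "blinker<1.8"'
    return f"blinker {installed} should still provide _saferef"
```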

SSL Proxy Support

Hi, I have created a package named botasaurus-proxy-authentication, which enables SSL support for proxies requiring authentication.

For instance, when you use an authenticated proxy with a tool like Selenium Wire to scrape a Cloudflare-protected website such as G2.com, the non-SSL connection typically gets blocked.

To illustrate, first install the required packages:

python -m pip install selenium_wire chromedriver_autoinstaller

Then, execute this Python script:

from selenium.webdriver.chrome.service import Service
from seleniumwire import webdriver
from chromedriver_autoinstaller import install

# Define the proxy
proxy_options = {
    'proxy': {
        'http': 'http://username:password@proxy-provider-domain:port', # Replace with your proxy
        'https': 'http://username:password@proxy-provider-domain:port', # Replace with your proxy
    }
}

# Install the driver and pass its path through a Service object
# (Selenium 4.10+ no longer accepts the path as a positional argument)
driver_path = install()
driver = webdriver.Chrome(service=Service(driver_path), seleniumwire_options=proxy_options)

# Navigate to the desired URL
link = 'https://www.g2.com/products/github/reviews'
driver.get("https://www.google.com/")
driver.execute_script(f'window.location.href = "{link}"')

# Wait for user input
input("Press Enter to exit...")

# Clean up
driver.quit()

You'll likely be blocked by Cloudflare.

However, using botasaurus_proxy_authentication with proxies circumvents this problem. First, install the package:

python -m pip install botasaurus-proxy-authentication

Then run the following code and notice the difference:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from chromedriver_autoinstaller import install
from botasaurus_proxy_authentication import add_proxy_options

# Define the proxy settings
proxy = 'http://username:password@proxy-provider-domain:port'  # Replace with your proxy

# Set Chrome options
chrome_options = Options()
add_proxy_options(chrome_options, proxy)

# Install the driver and pass its path through a Service object
# (Selenium 4.10+ no longer accepts the path as a positional argument)
driver_path = install()
driver = webdriver.Chrome(service=Service(driver_path), options=chrome_options)

# Navigate to the desired URL
link = 'https://www.g2.com/products/github/reviews'
driver.get("https://www.google.com/")
driver.execute_script(f'window.location.href = "{link}"')

# Wait for user input
input("Press Enter to exit...")

# Clean up
driver.quit()

Result: not blocked.

I suggest using botasaurus_proxy_authentication for its SSL support for authenticated proxies, which improves the success rate of scraping Cloudflare-protected websites and thus increases revenue for Oxylabs.
Also, thanks, Oxylabs, for your great work on proxies.
Good luck to the team.

sample is broken

I got TypeError:

WebDriver.__init__() got multiple values for argument 'options'
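This message is typical of Selenium 4.10 and later, where webdriver.Chrome no longer takes the driver path as its first positional argument: the first positional parameter is options, so a call such as webdriver.Chrome(driver_path, options=chrome_options) supplies options twice. Selenium Wire's Chrome forwards positional arguments to Selenium's and sets options itself, so webdriver.Chrome(driver_path, seleniumwire_options=...) fails the same way. The self-contained sketch below reproduces the mechanism with a hypothetical stand-in function (assumed to mirror the newer parameter order) rather than the real class. With a real install, the usual fix is to wrap the path in selenium.webdriver.chrome.service.Service and pass it as service=.

```python
def chrome_init_stub(options=None, service=None, keep_alive=True):
    """Hypothetical stand-in mirroring the parameter order assumed for
    selenium.webdriver.Chrome in Selenium 4.10+."""
    return {"options": options, "service": service, "keep_alive": keep_alive}


def reproduce_error() -> str:
    """Call the stub the way the older samples call webdriver.Chrome."""
    try:
        # Old style: driver path positionally (binds to `options`),
        # then options again by keyword -> TypeError.
        chrome_init_stub("/path/to/chromedriver", options="chrome_options")
    except TypeError as error:
        return str(error)
    return "no error"


def fixed_call() -> dict:
    """New style: pass everything by keyword. In real code `service`
    would be Service("/path/to/chromedriver")."""
    return chrome_init_stub(options="chrome_options", service="service_object")
```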
