nodriver's Introduction

NODRIVER

This package provides next-level web scraping and browser automation using a relatively simple interface.

  • This is the official successor of the Undetected-Chromedriver Python package.
  • No more WebDriver, no more Selenium.

Communicating directly with the browser provides even better resistance against web application firewalls (WAFs), while performance gets a massive boost. Unlike undetected-chromedriver, this module is fully asynchronous.

What makes this package different from other known packages is its optimization to stay undetected by most anti-bot solutions.

Another focus point is usability and quick prototyping, so expect a lot to work as-is, with most method parameters having best-practice defaults. With 1 or 2 lines it is up and running, using a best-practice config by default.

While usability and convenience are important, it is also easy to fully customize everything using the entire array of CDP domains, methods and events available.

Some features

  • A blazing fast undetected chrome (-ish) automation library
  • No chromedriver binary or Selenium dependency
  • This means a large performance increase and fewer detections!
  • Up and running in 1 line of code*
  • Uses a fresh profile on each run and cleans up on exit
  • Save and load cookies to a file to avoid repeating tedious login steps
  • Smart element lookup by selector or text, including iframe content. This can also be used as a wait condition for an element to appear, since the lookup retries for the duration of the timeout until the element is found. Single-element lookup by text using tab.find() accepts a best_match flag, which does not naively return the first match but picks the candidate whose text length is closest to the search text.
  • Descriptive __repr__ for elements, which represents the element as HTML
  • Utility function to convert a running undetected_chromedriver.Chrome instance to a nodriver.Browser instance and continue from there
  • Packed with helpers and utility methods for the most used and important operations
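
The best_match behaviour described in the list above can be illustrated with a small standalone sketch. This is not nodriver's implementation: the function name, the substring test and the plain list of strings are assumptions made for illustration only. It shows the idea of preferring the candidate whose text length is closest to the search text instead of returning the first hit.

```python
def best_match(candidates, text):
    """Return the candidate containing `text` whose length is closest to len(text).

    Hypothetical helper for illustration; nodriver works on DOM nodes, not strings.
    """
    needle = text.strip().lower()
    # keep only candidates that contain the search text (case-insensitive)
    matches = [c for c in candidates if needle in c.strip().lower()]
    if not matches:
        return None
    # the smallest length difference wins, so a short button label beats a long paragraph
    return min(matches, key=lambda c: abs(len(c.strip()) - len(needle)))


candidates = [
    "Create account to follow your interests and join the conversation",
    "Create account",
    "Already have an account? Sign in",
]
picked = best_match(candidates, "create account")
print(picked)
```

A naive first-match lookup would return the long paragraph here, because it also contains the search text and comes first.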

Installation

Since it’s a part of undetected-chromedriver, installation goes via

pip install undetected-chromedriver

In case you don’t want undetected-chromedriver, this package can be installed using

pip install nodriver

Usage example

The aim of this project (just like undetected-chromedriver, somewhere long ago) is to keep it short and simple, so you can quickly open an editor or interactive session, type or paste a few lines and off you go.

import asyncio
import nodriver as uc

async def main():
    browser = await uc.start()
    page = await browser.get('https://www.nowsecure.nl')

    await page.save_screenshot()
    await page.get_content()
    await page.scroll_down(150)
    elems = await page.select_all('*[src]')
    for elem in elems:
        await elem.flash()

    page2 = await browser.get('https://twitter.com', new_tab=True)
    page3 = await browser.get('https://github.com/ultrafunkamsterdam/nodriver', new_window=True)

    for p in (page, page2, page3):
        await p.bring_to_front()
        await p.scroll_down(200)
        await p   # wait for events to be processed
        await p.reload()
        if p != page3:
            await p.close()


if __name__ == '__main__':

    # since asyncio.run never worked (for me)
    uc.loop().run_until_complete(main())
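
The last line above runs the coroutine on an explicitly obtained event loop rather than through asyncio.run. As a rough standard-library sketch of that pattern: make_loop is a hypothetical name, and the assumption that uc.loop() does something comparable (create a fresh loop and install it as the current one) is not confirmed by this document.

```python
import asyncio


def make_loop():
    """Create a fresh event loop and install it as the current one (hypothetical helper)."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    return loop


async def main():
    # stand-in for real browser work
    await asyncio.sleep(0)
    return "done"


loop = make_loop()
try:
    result = loop.run_until_complete(main())
finally:
    # close the loop explicitly, since asyncio.run is not doing it for us
    loop.close()
print(result)
```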

A more concrete example, which can be found in the ./example/ folder, shows a script that creates a Twitter account:

import random
import string
import logging

logging.basicConfig(level=logging.WARNING)  # WARNING == 30

import nodriver as uc

months = [
    "january",
    "february",
    "march",
    "april",
    "may",
    "june",
    "july",
    "august",
    "september",
    "october",
    "november",
    "december",
]


async def main():
    driver = await uc.start()

    tab = await driver.get("https://twitter.com")

    # wait for text to appear instead of a static number of seconds to wait
    # this does not always work as expected, due to speed.
    print('finding the "create account" button')
    create_account = await tab.find("create account", best_match=True)

    print('"create account" => click')
    await create_account.click()

    print("finding the email input field")
    email = await tab.select("input[type=email]")

    # sometimes, email field is not shown, because phone is being asked instead
    # when this occurs, find the small text which says "use email instead"
    if not email:
        use_mail_instead = await tab.find("use email instead")
        # and click it
        await use_mail_instead.click()

        # now find the email field again
        email = await tab.select("input[type=email]")

    randstr = lambda k: "".join(random.choices(string.ascii_letters, k=k))

    # send keys to email field
    print('filling in the "email" input field')
    await email.send_keys("".join([randstr(8), "@", randstr(8), ".com"]))

    # find the name input field
    print("finding the name input field")
    name = await tab.select("input[type=text]")

    # again, send random text
    print('filling in the "name" input field')
    await name.send_keys(randstr(8))

    # since there are 3 select fields on the tab, we can use unpacking
    # to assign each field
    print('finding the "month" , "day" and "year" fields in 1 go')
    sel_month, sel_day, sel_year = await tab.select_all("select")

    # await sel_month.focus()
    print('filling in the "month" input field')
    await sel_month.send_keys(months[random.randint(0, 11)].title())

    # await sel_day.focus()
    # i don't want to bother with month-lengths and leap years
    print('filling in the "day" input field')
    await sel_day.send_keys(str(random.randint(1, 28)))

    # await sel_year.focus()
    # i don't want to bother with age restrictions
    print('filling in the "year" input field')
    await sel_year.send_keys(str(random.randint(1980, 2005)))

    await tab

    # let's handle the cookie nag as well
    cookie_bar_accept = await tab.find("accept all", best_match=True)
    if cookie_bar_accept:
        await cookie_bar_accept.click()

    await tab.sleep(1)

    next_btn = await tab.find(text="next", best_match=True)
    # for btn in reversed(next_btns):
    await next_btn.mouse_click()

    print("sleeping 2 seconds")
    await tab.sleep(2)  # visually see what part we're actually in

    print('finding "next" button')
    next_btn = await tab.find(text="next", best_match=True)
    print('clicking "next" button')
    await next_btn.mouse_click()

    # just wait for some button, before we continue
    await tab.select("[role=button]")

    print('finding "sign up"  button')
    sign_up_btn = await tab.find("Sign up", best_match=True)
    # we need the second one
    print('clicking "sign up"  button')
    await sign_up_btn.click()

    print('the rest of the "implementation" is out of scope')
    # further implementation outside of scope
    await tab.sleep(10)
    driver.stop()

    # verification code per mail


if __name__ == "__main__":
    # since asyncio.run never worked (for me)
    # i use
    uc.loop().run_until_complete(main())
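
The script above relies on lookups that wait for an element to appear instead of sleeping for a fixed number of seconds. The retry-until-timeout idea can be sketched in plain asyncio as follows. This is not nodriver's code: wait_until_found, its parameters and fake_lookup are hypothetical names used only to show the pattern.

```python
import asyncio
import time


async def wait_until_found(lookup, timeout=10.0, interval=0.5):
    """Await `lookup()` repeatedly until it returns a truthy value or `timeout` elapses.

    Returns the found value, or None if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = await lookup()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        await asyncio.sleep(interval)


async def demo():
    attempts = 0

    async def fake_lookup():
        # pretend the element only shows up on the third attempt
        nonlocal attempts
        attempts += 1
        return "element" if attempts >= 3 else None

    found = await wait_until_found(fake_lookup, timeout=2.0, interval=0.01)
    return found, attempts


found, attempts = asyncio.run(demo())
print(found, attempts)
```

Compared with a fixed sleep, this returns as soon as the lookup succeeds and only spends the full timeout when the element never shows up.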

nodriver's People

Contributors

ultrafunkamsterdam


nodriver's Issues

Add arguments to browser

How do I add arguments to the browser? I tried setting Config(browser_args=['incognito']) and conf.browser_args.append('incognito'), and neither of those works.
