pyppeteer's Introduction

Attention: This repo is unmaintained and has seen nothing beyond minor changes for a long time. Please consider playwright-python as an alternative.

If you are interested in maintaining this, please contact me

pyppeteer

Note: this is a continuation of the pyppeteer project

Unofficial Python port of puppeteer, the JavaScript (headless) Chrome/Chromium browser automation library.

Installation

pyppeteer requires Python >= 3.8

Install with pip from PyPI:

pip install pyppeteer

Or install the latest version from this GitHub repo:

pip install -U git+https://github.com/pyppeteer/pyppeteer@dev

Usage

Note: when you run pyppeteer for the first time, it downloads the latest version of Chromium (~150MB) if it is not found on your system. If you would rather avoid this behavior, ensure that a suitable Chrome binary is installed beforehand. One way to do this is to run the pyppeteer-install command prior to using this library.
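
For example, as part of an image build or environment setup script:

pyppeteer-install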

Full documentation can be found here. Puppeteer's documentation and its troubleshooting guide are also great resources for pyppeteer users.

Examples

Open a web page and take a screenshot:

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    await page.screenshot({'path': 'example.png'})
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Evaluate JavaScript on a page:

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    await page.screenshot({'path': 'example.png'})

    dimensions = await page.evaluate('''() => {
        return {
            width: document.documentElement.clientWidth,
            height: document.documentElement.clientHeight,
            deviceScaleFactor: window.devicePixelRatio,
        }
    }''')

    print(dimensions)
    # >>> {'width': 800, 'height': 600, 'deviceScaleFactor': 1}
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Differences between puppeteer and pyppeteer

pyppeteer strives to replicate the puppeteer API as closely as possible; however, fundamental differences between JavaScript and Python make this difficult to do precisely. More information on specifics can be found in the documentation.

Keyword arguments for options

puppeteer uses an object for passing options to functions/methods. pyppeteer methods/functions accept both dictionaries (the Python equivalent of JavaScript objects) and keyword arguments for options.

Dictionary style options (similar to puppeteer):

browser = await launch({'headless': True})

Keyword argument style options (more pythonic, isn't it?):

browser = await launch(headless=True)

Element selector method names

In Python, $ is not a valid identifier. The pyppeteer equivalents of puppeteer's $, $$, and $x methods are listed below, along with some shorthand methods for your convenience:

puppeteer    pyppeteer                  pyppeteer shorthand
Page.$()     Page.querySelector()       Page.J()
Page.$$()    Page.querySelectorAll()    Page.JJ()
Page.$x()    Page.xpath()               Page.Jx()

Arguments of Page.evaluate() and Page.querySelectorEval()

puppeteer's version of evaluate() takes a JavaScript function or a string representation of a JavaScript expression. pyppeteer takes a string representation of a JavaScript expression or function. pyppeteer will try to automatically detect whether the string is a function or an expression, but it will sometimes fail. If an expression is erroneously treated as a function and an error is raised, try setting force_expr to True to force pyppeteer to treat the string as an expression.

Examples:

Get a page's textContent:

content = await page.evaluate('document.body.textContent', force_expr=True)

Get an element's textContent:

element = await page.querySelector('h1')
title = await page.evaluate('(element) => element.textContent', element)

Roadmap

See projects

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

pyppeteer's Issues

Overhaul docs

Eventually, we are going to need to update the docs. This issue discusses some problems that I think should eventually be addressed. Feel free to leave any comments you have about anything.

What docs generator to use?

The abandoned project used sphinx. There are remnants of that left behind.

I must admit I'm not experienced with doc-generating solutions. I've played around with Sphinx, and the whole thing seems a bit obtuse: makefiles, batch files, intermediate doc tree files, it all seems messy and overcomplicated for this (relatively simple) project's documentation needs. The same goes for Doxygen.

For that reason MkDocs + mkdocstrings seems like a perfect fit. It's simple, it's fast, and it has the potential to look very good.

It's worth noting that mkdocstrings is still fairly young and seems to only support the Google docstring format at the moment, but it's still my preference for now.

What docstring format?

The abandoned project used rST. Currently, most, if not all, docstrings use this format. Converting to any other format shouldn't really be a big issue, and since I find rST to be the 'ugliest' of the three main formats, at the moment I'm partial to Google.

Where to host?

Read the Docs or Github pages seem the only sensible options, and they both seem great. I'm slightly partial to Read the Docs though, but either way this seems inconsequential.

Methodology for Updating docs

In my opinion, updating docs is always going to be a tedious process if we embed them within the relevant docstrings (which, at the moment, I strongly feel we should). To mitigate this annoyance, I propose we bite the bullet and write a utility to insert the relevant docstrings into the project from this URL: https://github.com/puppeteer/puppeteer/blob/$VERSION/docs/api.md. At a high level, the script could probably begin by parsing the API md document with mistune. In the meantime, copy/pasting doc info is not an unworkable stopgap solution. A sketch of the idea follows.
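
As a rough illustration (not existing tooling; it uses only the stdlib rather than mistune, and the pinned tag is a placeholder), such a utility could start like this:

import urllib.request

# Placeholder tag; in practice this would track the puppeteer release we mirror.
API_MD = 'https://raw.githubusercontent.com/puppeteer/puppeteer/v2.1.1/docs/api.md'

def fetch_api_sections():
    """Split puppeteer's api.md into {heading: body} pairs for later insertion."""
    text = urllib.request.urlopen(API_MD).read().decode('utf-8')
    sections, current, body = {}, None, []
    for line in text.splitlines():
        if line.startswith('####'):  # api.md uses h4 headings per method
            if current is not None:
                sections[current] = '\n'.join(body).strip()
            current, body = line.lstrip('# ').strip(), []
        elif current is not None:
            body.append(line)
    if current is not None:
        sections[current] = '\n'.join(body).strip()
    return sections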

How to catch JavaScript / Network issues

Hi, thank you for pyppeteer2!

I'm trying to scrape webpages with JS support, and during the scraping I also want to track all JavaScript errors, as on this page: https://javascript.info/try-catch (when I open this page in Chrome, I can see two JS errors in the DevTools console, like "TypeError: Cannot read property 'querySelectorAll' of null").

I'm trying to catch the same errors with pyppeteer2, but no luck so far:

import asyncio
from pyppeteer import launch


async def main():
    browser = await launch()
    page = await browser.newPage()

    def logme(msg):
        print(msg)
        print(msg.text, msg.type)

    page.on('console', logme)
    page.on('javascript', logme)
    page.on('pageerror', logme)
    page.on('error', logme)
    await page.goto('https://javascript.info/try-catch')

    await page.content()

    await browser.close()


asyncio.get_event_loop().run_until_complete(main())

Is it possible to do? How can I get all JS errors after rendering the page?

Any way to save/load cache?

I run pyppeteer on AWS Lambda. One of the problems with running on Lambda is that the functions are short-lived, and whenever the function restarts, the cache (and cookies) start clean.

It would be nice if I could save the cache to S3, then reload the cache the next time the browser launches from a new Lambda function. Is it possible to save and load the cache? Thanks!
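
One untested approach (the bucket, key, and paths below are placeholders): launch Chromium with a userDataDir under /tmp, archive that directory at the end of the invocation, and restore it at the start of the next one:

import shutil
import boto3
import botocore.exceptions
from pyppeteer import launch

s3 = boto3.client('s3')
BUCKET, KEY, PROFILE = 'my-bucket', 'chromium-profile.zip', '/tmp/profile'  # placeholders

async def launch_with_saved_profile():
    try:
        s3.download_file(BUCKET, KEY, '/tmp/profile.zip')
        shutil.unpack_archive('/tmp/profile.zip', PROFILE)
    except botocore.exceptions.ClientError:
        pass  # no saved profile yet; start with a clean one
    return await launch(userDataDir=PROFILE, args=['--no-sandbox'])

def save_profile():
    # Archive the profile (cache + cookies) so the next cold start can reuse it.
    shutil.make_archive('/tmp/profile', 'zip', PROFILE)
    s3.upload_file('/tmp/profile.zip', BUCKET, KEY)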

browser.close() raises error message when `userDataDir` option is set

I'm using the latest version of pyppeteer (0.2.2 from the dev branch) on Windows 10.

Minimal code to reproduce the issue:

import asyncio
from pyppeteer import launch

async def test():
    browser = await launch({'userDataDir': 'test'})
    await browser.close()

asyncio.run(test())

And here is the error message:

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "C:\[ projects ]\kroger-cli\venv\lib\site-packages\pyppeteer\launcher.py", line 151, in _close_process
    self._loop.run_until_complete(self.killChrome())
  File "D:\Python\Python38\lib\asyncio\base_events.py", line 591, in run_until_complete
    self._check_closed()
  File "D:\Python\Python38\lib\asyncio\base_events.py", line 508, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
sys:1: RuntimeWarning: coroutine 'Launcher.killChrome' was never awaited
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

Removing the userDataDir option from the launch eliminates the issue (i.e. everything is working as expected).

Is it something you can help with? Happy to provide any logs/details that could help. Thanks!
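
For what it's worth, a commonly suggested workaround for this class of error (untested against this exact report) is to avoid asyncio.run, which closes the event loop before pyppeteer's atexit cleanup runs:

import asyncio
from pyppeteer import launch

async def test():
    browser = await launch({'userDataDir': 'test'})
    await browser.close()

# run_until_complete leaves the default loop open, so pyppeteer's
# atexit handler can still use it to kill Chrome.
asyncio.get_event_loop().run_until_complete(test())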

Getting valid page handles after redirects

Hi everyone. I'm wondering if there is any documentation regarding getting valid page handles after redirects. I'm trying to get to grips with Pyppeteer but struggling with a couple of things right now. I've Googled around, but if there is any documentation I may have missed I'd really appreciate a nod in the right direction.

Many thanks in advance for any help.

Pyppeteer not detecting chrome installed when using docker

I have a Docker container which uses Selenium, and I'm trying to switch it to pyppeteer, but pyppeteer doesn't seem to detect that I have Chrome already installed, so when I run the tests it downloads Chrome.

Example taken from here https://github.com/buildkite/docker-puppeteer

RUN  apt-get update \
     && apt-get install -y wget --no-install-recommends \
     && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
     && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
     && apt-get update \
     && apt-get install -y google-chrome-unstable --no-install-recommends \
     && rm -rf /var/lib/apt/lists/* \
     && wget --quiet https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh -O /usr/sbin/wait-for-it.sh \
     && chmod +x /usr/sbin/wait-for-it.sh

This is what I do right now for Selenium:

RUN apt-get update && apt-get install -y \
        curl \
        unzip \
        libglib2.0-dev \
        libnss3=2:3.26.2-1.1+deb9u1 \
        libgconf-2-4=3.2.6-4+b1 \
        libfontconfig1=2.11.0-6.7+b1 \
        xvfb && \
    curl https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb -o /chrome.deb && \
    dpkg -i /chrome.deb || apt-get install -yf && \
    rm /chrome.deb && \
    curl https://chromedriver.storage.googleapis.com/2.31/chromedriver_linux64.zip -o /usr/local/chromedriver && \
    unzip /usr/local/chromedriver -d /usr/local/bin && \
    chmod u+x /usr/local/bin/chromedriver
ENV CHROMEDRIVER=/usr/local/bin/chromedriver
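
If Chrome is already baked into the image, pyppeteer can be pointed at the installed binary with the executablePath launch option so it never tries to download Chromium. A sketch (the path is a guess for the Dockerfile above):

from pyppeteer import launch

async def make_browser():
    return await launch(
        # Path installed by the google-chrome-unstable package; adjust to
        # /usr/bin/google-chrome-stable for the Selenium-style Dockerfile.
        executablePath='/usr/bin/google-chrome-unstable',
        args=['--no-sandbox'],
    )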

Protocol error Browser.close: Target closed

My development environment is Windows 10 with Python 3.7.6 and pyppeteer2 from the pup2.1.1 branch.
An exception was raised when I tried to run the code below.
(If I pass headless=False to launch, the exception is not raised.)
Finally I read miyakogi/pyppeteer#171 (comment) and some relevant issues, but I still have no idea how to fix this.

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(userDataDir=r'c:\temp')
    page = await browser.newPage()
    await page.goto('https://example.com')
    await browser.close()


asyncio.get_event_loop().run_until_complete(main())
Transport connection closed: code = 1006 (connection closed abnormally [internal]), no reason
Traceback (most recent call last):
  File "D:/workspace/sthq/htmlspider/main.py", line 13, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "D:\Program Files\Python37\lib\asyncio\base_events.py", line 583, in run_until_complete
    return future.result()
  File "D:/workspace/sthq/htmlspider/main.py", line 10, in main
    await browser.close()
  File "D:\venv\htmlspider\lib\site-packages\pyppeteer\browser.py", line 275, in close
    await self._closeCallback()
  File "D:\venv\htmlspider\lib\site-packages\pyppeteer\launcher.py", line 128, in close
    return await self._close_proc()
  File "D:\venv\htmlspider\lib\site-packages\pyppeteer\launcher.py", line 103, in _close_proc
    await self.connection.send('Browser.close')
pyppeteer.errors.NetworkError: Protocol error Browser.close: Target closed.

Websocket connection is lost on some websites (ConnectionClosed)

Error trace:

[E:pyppeteer.connection] connection unexpectedly closed
Task exception was never retrieved
future: <Task finished coro=<Connection._async_send() done, defined at /home/prox/refundr/venv/lib/python3.7/site-packages/pyppeteer/connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
  File "/home/prox/refundr/venv/lib/python3.7/site-packages/websockets/protocol.py", line 827, in transfer_data
    message = await self.read_message()
  File "/home/prox/refundr/venv/lib/python3.7/site-packages/websockets/protocol.py", line 895, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
  File "/home/prox/refundr/venv/lib/python3.7/site-packages/websockets/protocol.py", line 971, in read_data_frame
    frame = await self.read_frame(max_size)
  File "/home/prox/refundr/venv/lib/python3.7/site-packages/websockets/protocol.py", line 1051, in read_frame
    extensions=self.extensions,
  File "/home/prox/refundr/venv/lib/python3.7/site-packages/websockets/framing.py", line 105, in read
    data = await reader(2)
  File "/home/prox/storage/anconda3/lib/python3.7/asyncio/streams.py", line 677, in readexactly
    raise IncompleteReadError(incomplete, n)
asyncio.streams.IncompleteReadError: 0 bytes read on a total of 2 expected bytes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/prox/refundr/venv/lib/python3.7/site-packages/pyppeteer/connection.py", line 73, in _async_send
    await self.connection.send(msg)
  File "/home/prox/refundr/venv/lib/python3.7/site-packages/websockets/protocol.py", line 555, in send
    await self.ensure_open()
  File "/home/prox/refundr/venv/lib/python3.7/site-packages/websockets/protocol.py", line 803, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: code = 1006 (connection closed abnormally [internal]), no reason

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/prox/refundr/venv/lib/python3.7/site-packages/pyppeteer/connection.py", line 79, in _async_send
    await self.dispose()
  File "/home/prox/refundr/venv/lib/python3.7/site-packages/pyppeteer/connection.py", line 170, in dispose
    await self._on_close()
  File "/home/prox/refundr/venv/lib/python3.7/site-packages/pyppeteer/connection.py", line 153, in _on_close
    f'Protocol error {cb.method}: Target closed.',  # type: ignore
asyncio.base_futures.InvalidStateError: invalid state

What my code does is simply:

class ShortExample:
    async def scrape(self):
        self.url = "https://www.kids-world.com/sv/tommy-hilfiger-tshirt-marinblaa-p-104610.html?gclid=EAIaIQobChMIjebdl9zd6AIVBKsYCh1HAg2gEAkYAyABEgIRpvD_BwE"
        await self.page.goto(
            self.url,
            {"waitUntil": 'domcontentloaded'}
        )

        await self.page.evaluate(self.open_js("js/getElementClasses.js"), f"//img[@*='{url}']", url)

This is a long-standing bug that has been discussed in the issue tracker of the old pyppeteer:
miyakogi/pyppeteer#62

I've tried all the "workarounds" mentioned, but none of them fixes the error.

And by the way, I'm on pyppeteer2@dev

Screenshot Quality argument not working?

The quality argument of the screenshot coroutine does not seem to be working.

The code I tested with:

browser = await launch(
    headless=True,
    executablePath=EXEC_PATH,
)
page = await browser.newPage()
await page.goto(link)
await page.screenshot({'fullPage': True, 'path': './FILES/584512526/3726/Python Dictionarieswebshotbot.jpeg', 'type': 'jpeg', 'quality': 1})

Unfortunately the output quality is the same if I change the quality argument to 100. Below is an example of a screenshot taken with quality 1:

[attached screenshot: alenpaul2001 (AlenPaulVarghese) · GitHub-webshotbot]

Is there something wrong with my code?

UnicodeDecodeError on Response body

I'm unable to obtain the Response body for requests for non-text objects, such as images, as Response.json() and Response.text() throw UnicodeDecodeError. The following snippet produces output including gif and png:

browser = await pyppeteer.launch()
try:
    page = await browser.newPage()

    @page.on('requestfinished')
    async def handler(r):
        if r.response.status == 200:
            try:
                data = await r.response.text()
            except UnicodeDecodeError:
                print(r.url.split('.')[-1])
    
    await page.goto('https://www.google.com.au')
    await page.waitFor(3000)
finally:
    await browser.close()

It seems like something is attempting to interpret binary data as utf8 (and then complaining that it isn't valid utf8). The tracebacks include:

File "[..]site-packages/pyppeteer/network_manager.py", line 673, in text
    return content.decode('utf-8')

Tested on pyppeteer 0.0.25, MacOS.
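
pyppeteer's Response also has a buffer() coroutine that returns the raw bytes without any decoding, so a workaround for binary resources might look like this (untested sketch):

@page.on('requestfinished')
async def handler(r):
    if r.response and r.response.status == 200:
        data = await r.response.buffer()  # raw bytes, no utf-8 assumption
        if r.url.endswith(('.png', '.gif')):
            print(r.url, len(data), 'bytes')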

Pyppeteer Browser closed unexpectedly in heroku

I recently deployed an app to Heroku. It uses the Python pyppeteer package. I didn't have any issues while testing on repl.it, but unfortunately on Heroku the browser keeps crashing.

I used requirements.txt for installing the pyppeteer package. I also tried using the apt Heroku buildpack to install the requirements needed for puppeteer to work, as per here.

My program:

async def mainer(link, path, is_image):
    browser = await launch(args=['--no-sandbox'])
    page = await browser.newPage()
    await page.goto(link)
    if is_image:
        await page.screenshot({'path': f'{path}', 'fullPage': True, 'type': 'png'})
    else:
        await page.pdf({'path': f'{path}'})
    await browser.close()

Here is the full traceback from Heroku:

2020-05-14T19:39:50.115643+00:00 app[worker.1]:     await handler.callback(self.client, *args)
2020-05-14T19:39:50.115644+00:00 app[worker.1]:   File "/app/plugins/downloader.py", line 61, in cb_
2020-05-14T19:39:50.115645+00:00 app[worker.1]:     await mainer(url,file,mode)
2020-05-14T19:39:50.115645+00:00 app[worker.1]:   File "/app/plugins/downloader.py", line 13, in mainer
2020-05-14T19:39:50.115646+00:00 app[worker.1]:     browser = await launch(args=['--no-sandbox'])
2020-05-14T19:39:50.115646+00:00 app[worker.1]:   File "/app/.heroku/python/lib/python3.7/site-packages/pyppeteer/launcher.py", line 305, in launch
2020-05-14T19:39:50.115647+00:00 app[worker.1]:     return await Launcher(options, **kwargs).launch()
2020-05-14T19:39:50.115648+00:00 app[worker.1]:   File "/app/.heroku/python/lib/python3.7/site-packages/pyppeteer/launcher.py", line 166, in launch
2020-05-14T19:39:50.115648+00:00 app[worker.1]:     self.browserWSEndpoint = get_ws_endpoint(self.url)
2020-05-14T19:39:50.115648+00:00 app[worker.1]:   File "/app/.heroku/python/lib/python3.7/site-packages/pyppeteer/launcher.py", line 225, in get_ws_endpoint
2020-05-14T19:39:50.115649+00:00 app[worker.1]:     raise BrowserError('Browser closed unexpectedly:\n')
2020-05-14T19:39:50.115649+00:00 app[worker.1]: pyppeteer.errors.BrowserError: Browser closed unexpectedly:
2020-05-14T19:39:50.115650+00:00 app[worker.1]: 

Tuple index out of range in Pyppeteer Exception _rewriteError

Hey team,

I am getting a bit of an odd error that may be a bug; to be truthful, I am unsure.

I am using Python 3.8.2.

For reference, I am automatically scanning a large number of URLs that may or may not exist or be online; basically, think of them as the worst possible types of URLs you could throw at it: timeouts, URLs that don't actually exist, you get the idea. Each URL is checked to see if it loads with the http:// or https:// protocol (as some only load with one protocol but not the other). As it goes, I am trying to catch any redirections and add them to a list. By the time this code runs, the browser has been launched and a page has been created. Keep in mind this isn't the exact code (for example, exception catching is much more thorough), but roughly the logic is as follows:

    async def cCheck(self,page,pCheck):    
        try:
            with async_timeout.timeout(120):
                httpsURL = url.replace('http://','https://', 1)
                httpURL = url.replace('https://','http://', 1)
                if pCheck is True: #All this does is flip between HTTP vs HTTPS - the flag is determined in the prior function
                    if url.startswith('http://'):
                        url = httpsURL
                    elif url.startswith('https://'):
                        url = httpURL
                else:
                    url = url
                redirectionHistory = [url]
                htmlSource = 'Source Code Not Found'
                pageTitle = 'Page Title Not Found'
                browserResponse = None             
                page.setDefaultNavigationTimeout(60000)
                page.on(pyppeteer.frame_manager.FrameManager.Events.FrameNavigated,
                    lambda event: redirectionHistory.append(page.url))
                browserResponse = await page.goto(url,waitUntil=['load','domcontentloaded','networkidle0'])
                htmlSource = await page.content()
                pageTitle = await page.title()
        except Exception as e:
            log.error('Could not parse results of '+url+' due to the following error: '+str(e)+ ' traceback: ' + traceback.format_exc())
            return
        if browserResponse is not None:
            pass  # Do Stuff
        else:
            pass  # Do Stuff

I am getting a transient error (as in, it does not happen with most URLs I check) with the below traceback.

When this occurs, it occurs as an uncaught exception and everything grinds to a halt.

2020-05-27 08:00:00,000 - model.check - ERROR - check : 424 - Could not parse results of http://notarealdomain.com due to the following error: tuple index out of range traceback: 

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pyppeteer/execution_context.py", line 105, in evaluateHandle
    'userGesture': True,
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/afcc/watcher/model/check.py", line 405, in cCheck
    htmlSource = await page.content()
  File "/usr/local/lib/python3.6/dist-packages/pyppeteer/page.py", line 805, in content
    return await frame.content()
  File "/usr/local/lib/python3.6/dist-packages/pyppeteer/frame_manager.py", line 393, in content
    '''.strip())
  File "/usr/local/lib/python3.6/dist-packages/pyppeteer/frame_manager.py", line 309, in evaluate
    pageFunction, *args, force_expr=force_expr)
  File "/usr/local/lib/python3.6/dist-packages/pyppeteer/execution_context.py", line 54, in evaluate
    pageFunction, *args, force_expr=force_expr)
  File "/usr/local/lib/python3.6/dist-packages/pyppeteer/execution_context.py", line 108, in evaluateHandle
    _rewriteError(e)
  File "/usr/local/lib/python3.6/dist-packages/pyppeteer/execution_context.py", line 235, in _rewriteError
    if error.args[0].endswith('Cannot find context with specified id'):
IndexError: tuple index out of range

What seems to be happening is that it tries to fire _rewriteError in execution_context.py, where it hits the line:

if error.args[0].endswith('Cannot find context with specified id'):

in

def _rewriteError(error: Exception) -> None:
    if error.args[0].endswith('Cannot find context with specified id'):
        msg = 'Execution context was destroyed, most likely because of a navigation.'  # noqa: E501
        raise type(error)(msg)
    raise error

And it then generates an uncaught IndexError, at which point everything breaks down.

If I understand the above error correctly, it is caused by trying to process a page that has since navigated to a different URL/rendered JS. However, I have goto's waitUntil set so it should be waiting for it to finish, yes?

What seems to be happening is it is hitting:

                htmlSource = await page.content()

And then it tries to raise the 'Execution context was destroyed, most likely because of a navigation.' error, but fails because the line:

    if error.args[0].endswith('Cannot find context with specified id'):

Causes an IndexError.

I've been trying to diagnose this on my own for days, so I'm finally turning here to see if anyone has any suggestions, and to run it by you as a possible bug.

Please tell me if you need any further information/code.

Thanks in advance to anyone who is able to help out.
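
For reference, a defensive version of _rewriteError, sketched from the snippet above (not an official fix), would guard the args access before indexing:

def _rewriteError(error: Exception) -> None:
    # Some exceptions (e.g. CancelledError) carry no args, which is what
    # produces the IndexError reported above; treat those as non-matching.
    msg = error.args[0] if error.args and isinstance(error.args[0], str) else ''
    if msg.endswith('Cannot find context with specified id'):
        raise type(error)(
            'Execution context was destroyed, most likely because of a navigation.')
    raise error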

Browser gets disconnected on net::ERR_CONNECTION_RESET

  • The browser gets disconnected on net::ERR_CONNECTION_RESET.
  • I'm in a situation where I need to capture screenshots of multiple domains, so I created one browser and loop through the URLs one by one.
  • If any of the URLs throws a connection reset, the browser gets disconnected.
import asyncio
import pyppeteer
from pyppeteer import launch
from pyppeteer.errors import PyppeteerError, TimeoutError,NetworkError
import logging


async def screen(browser,url):
	print(f"[=] TAKING : {url}")
	try:
		page = await browser.newPage()
		await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3419.0 Safari/537.36')
		await page.goto(url,timeout=10000)
		name = url.replace("://",'__').replace('/','_')
		await asyncio.sleep(0.5) #await page.waitFor(2)
		await page.screenshot(path=f"{name}.jpg",quality=10,fullPage=True)
		print(f"[+] TOOK : {url}")
		await page.close()
	except NetworkError as e:
		print(f"[-] PYPETER NetworkError: {url} : {e}")
	except PyppeteerError as e:
		print(f"[-] PYPETER EXCEPTION: {url} : {e}")
		return False
	except ConnectionError as e:
		print(f"[-] CONNECTION ERROR : {url} : {e}")
		return False
	except Exception as e:
		print(f"[-] OTHER EXCEPTION: {url} : {e}")
		return False

async def main():
	urls = ['https://slc-b-origin-aktest-1.paypal.com','https://paypal.com']
	browser = await launch(ignoreHTTPSErrors=True,logLevel=logging.DEBUG,args=['--no-sandbox','--disable-gpu','--single-process'])
	for url in urls:
		await asyncio.create_task(screen(browser,url))


asyncio.run(main())

Is there any better way to handle this situation?

  • OUTPUT:
[=] TAKING : https://slc-b-origin-aktest-1.paypal.com
[-] PYPETER EXCEPTION: https://slc-b-origin-aktest-1.paypal.com : net::ERR_CONNECTION_RESET at https://slc-b-origin-aktest-1.paypal.com
[=] TAKING : https://paypal.com
[-] PYPETER EXCEPTION: https://paypal.com : Navigation failed because browser has disconnected
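
One possible mitigation (a sketch reusing launch and screen from the code above, not verified against this exact failure): listen for the browser's 'disconnected' event and relaunch before processing the next URL:

async def main():
    urls = ['https://slc-b-origin-aktest-1.paypal.com', 'https://paypal.com']
    state = {'alive': True}

    async def fresh_browser():
        b = await launch(ignoreHTTPSErrors=True, args=['--no-sandbox'])
        state['alive'] = True
        b.on('disconnected', lambda: state.update(alive=False))
        return b

    browser = await fresh_browser()
    for url in urls:
        if not state['alive']:
            browser = await fresh_browser()  # previous URL killed the connection
        await screen(browser, url)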

Screenshot is not working in headless

I am trying to take a screenshot after the page has loaded, but once it is fully loaded I can't take the screenshot. I had to update the Chrome version to solve the problem. Is updating Chrome, or being able to use chromedriver, on the roadmap?

OS: Windows 10 1809
IDE: PyCharm 2019.3
PyVersion: 3.8.2

code:

async def test():
    browser = await launch()

    page = await browser.newPage()

    await page.setViewport(viewport={'width': 1280, 'height': 800})
    await page.setJavaScriptEnabled(enabled=True)
    await page.setUserAgent(
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/58.0.3029.110 '
        'Safari/537.36 '
        'Edge/16.16299')
    print('new page')
    await page.goto(full_url, {"waitUntil": 'networkidle0', "timeout": 0})
    await asyncio.sleep(5)
    print('await start')
    # Promise.race equivalent: whichever task finishes first wins
    await asyncio.wait([
        asyncio.create_task(page.waitForNavigation({"waitUntil": 'networkidle0', "timeout": 0})),
        asyncio.create_task(asyncio.sleep(10))
    ], return_when=asyncio.FIRST_COMPLETED)
    print('await done')
    print('screenshot start')
    buffer = await page.screenshot()
    print('screenshot end')
    await page.close()
    print(buffer)

output:

new page
await start
await done
screenshot start

Page goto returns None Even in Headmode

Hi,
For some URLs, goto() does not fail, but still returns None. But when we process it later, or after a number of retries, it returns the correct response.

For example url:
https://www.vitals.com/dentists/Dr_John_C_Paris.html

I have used pyppeteer2 as well as pyppeteer, with Python 3.6.
Here is a minimal example:

res = await page.goto(url, options={
    "waitUntil": "domcontentloaded",
    "timeout": 0,
})

if res is not None:
    # some processing done
else:
    # error raised

Multiload Browser Sessions.

puppeteer/puppeteer#85

add to Line 110
https://github.com/miyakogi/pyppeteer/blob/dev/pyppeteer/browser.py#L110

async def newIncognitoPage(self) -> Page:
    """Make new incognito page on this browser and return its object."""
    browserContextId = (await self._connection.send(
        'Target.createBrowserContext', {})).get('browserContextId')
    targetId = (await self._connection.send(
        'Target.createTarget',
        {'url': 'about:blank', 'browserContextId': browserContextId})).get('targetId')
    target = self._targets.get(targetId)
    if target is None:
        raise BrowserError('Failed to create target for page.')
    if not await target._initializedPromise:
        raise BrowserError('Failed to create target for page.')
    page = await target.page()
    if page is None:
        raise BrowserError('Failed to create page.')
    return page, browserContextId
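
Note that, unlike newPage, the proposed helper returns a tuple, so a caller would look something like this (the disposeBrowserContext teardown is an untested assumption):

page, browser_context_id = await browser.newIncognitoPage()
# ... use the page, then tear the incognito context down again:
await browser._connection.send(
    'Target.disposeBrowserContext', {'browserContextId': browser_context_id})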

Migrate to AsyncIOEventEmitter or BaseEventEmitter

EventEmitter, aka CompatEventEmitter, will eventually be deprecated and removed from the pyee module. See this: https://github.com/jfhbrook/pyee/blob/master/pyee/_compat.py#L35.

pyee.EventEmitter is deprecated and will be removed in a future major version; you should instead use either pyee.AsyncIOEventEmitter, pyee.TwistedEventEmitter, pyee.ExecutorEventEmitter, pyee.TrioEventEmitter, or pyee.BaseEventEmitter.

Right now I'm thinking AsyncIOEventEmitter would be the best option: it provides a superset of the functionality found in BaseEventEmitter, and there are bound to be callbacks that would ideally be asynchronous.
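
A minimal sketch of the proposed replacement, assuming a pyee version that exports AsyncIOEventEmitter at the top level; coroutine listeners are scheduled on the loop rather than called synchronously:

import asyncio
from pyee import AsyncIOEventEmitter

emitter = AsyncIOEventEmitter()

@emitter.on('event')
async def listener(data):  # async callbacks are supported natively
    print('got', data)

async def main():
    emitter.emit('event', 42)  # listener is scheduled as a task, not awaited
    await asyncio.sleep(0)     # give the scheduled task a chance to run

asyncio.get_event_loop().run_until_complete(main())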

future: <Future finished exception=NetworkError('Protocol error (Target.detachFromTarget): No session with given id')>

  • Sometimes the application throws the following exception. How can I handle and avoid such exceptions, and what causes them?
Future exception was never retrieved
future: <Future finished exception=NetworkError('Protocol error (Target.detachFromTarget): No session with given id')>
pyppeteer.errors.NetworkError: Protocol error (Target.detachFromTarget): No session with given id
Future exception was never retrieved
future: <Future finished exception=NetworkError('Protocol error (Target.detachFromTarget): No session with given id')>
pyppeteer.errors.NetworkError: Protocol error (Target.detachFromTarget): No session with given id
[+] TOOK : https://republicwireless.com/account/phones/cdrs/
[+] TAKING : https://republicwireless.com/account/tickets/detail/
Future exception was never retrieved
future: <Future finished exception=NetworkError('Protocol error (Target.detachFromTarget): No session with given id')>
pyppeteer.errors.NetworkError: Protocol error (Target.detachFromTarget): No session with given id
[+] TOOK : https://republicwireless.com/account/tickets/detail/
[+] TAKING : http://www.republicwireless.com/invite/U77S2DV5?cid=dis:raf:in_app:rw_app:&utm_medium=dis&utm_campaign=raf:in_app&utm_source=rw_
.......

import asyncio
from pyppeteer import launch
from pyppeteer.errors import PyppeteerError, TimeoutError,NetworkError

N_THREADS = 5
q = asyncio.Queue()

urls = []
with open('urls.txt') as f:
	for i in f.readlines():
		i = i.strip()
		urls.append(i)
f.close()

def _patch_pyppeteer():
    from typing import Any
    from pyppeteer import connection, launcher
    import websockets.client

    class PatchedConnection(connection.Connection):  # type: ignore
        def __init__(self, *args: Any, **kwargs: Any) -> None:
            super().__init__(*args, **kwargs)
            # the _ws argument is not yet connected, can simply be replaced with another
            # with better defaults.
            self._ws = websockets.client.connect(
                self._url,
                loop=self._loop,
                # the following parameters are all passed to WebSocketCommonProtocol
                # which marks all three as Optional, but connect() doesn't, hence the liberal
                # use of type: ignore on these lines.
                # fixed upstream but not yet released, see aaugustin/websockets#93ad88
                max_size=None,  # type: ignore
                ping_interval=None,  # type: ignore
                ping_timeout=None,  # type: ignore
            )

    connection.Connection = PatchedConnection
    # also imported as a  global in pyppeteer.launcher
    launcher.Connection = PatchedConnection

_patch_pyppeteer()

async def screen(browser,url):
	print(f"[+] TAKING : {url}")
	try:
		page = await browser.newPage()
		await page.goto(url)
		name = url.replace("://",'__').replace('/','_')
		await asyncio.sleep(0.5) #await page.waitFor(2)
		await page.screenshot(path=f"{name}.jpg",type='jpeg',quality=10,fullPage=True)
		print(f"[+] TOOK : {url}")
		await page.close()
	except NetworkError as e:
		print(f"[+] PYPETER NetworkError: {url} : {e}")
	except PyppeteerError as e:
		print(f"[+] PYPETER EXCEPTION: {url} : {e}")
		return False
	except ConnectionError:
		print(f"[+] CONNECTION ERROR : {url}")
		return False
	except Exception as e:
		print(f"[+] OTHER EXCEPTION: {url} : {e}")
		return False

async def ProcessQueue():
	while not q.empty():
		try:
			browser,get_url = await q.get()
			await screen(browser,get_url)
		except Exception as e:
			print("ERROR",e)
	return True

async def main():
	browser = await launch({'headless':True,'ignoreHTTPSErrors': True,'acceptInsecureCerts':True,'args': ['--no-sandbox','--disable-setuid-sandbox']})
	
	for each_url in urls:
		await q.put((browser,each_url))

	tasks = []
	for each_thread in range(N_THREADS):
		task = asyncio.create_task(ProcessQueue())
		tasks.append(task)

	try:
		await asyncio.gather(*tasks)
	except Exception as e:
		print("ERROR2",e)

	await browser.close()

asyncio.get_event_loop().run_until_complete(main())
Ubuntu 18

> python3 -V
Python 3.8.2

> pip install -U git+https://github.com/pyppeteer/pyppeteer2@dev

Using pyppeteer behind a proxy

Hi,

When we launch pyppeteer from behind a proxy for the first time, the downloader can't fetch the Chromium zip.

MaxRetryError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /chromium-browser-snapshots/Linux_x64/588429/chrome-linux.zip (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7ffa71bf6e90>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

The issue can be solved by using a urllib3.ProxyManager instead of the urllib3.PoolManager, setting the proxy manually at the point where the downloader does:

with urllib3.PoolManager() as http:
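
Concretely (a sketch of the suggestion, not the actual patch; the proxy URL below is a placeholder that could be read from the HTTPS_PROXY environment variable):

import os
import urllib3

proxy_url = os.environ.get('HTTPS_PROXY', 'http://proxy.example.com:3128')  # placeholder
with urllib3.ProxyManager(proxy_url) as http:
    # same download call as before, now routed through the proxy
    r = http.request('GET',
                     'https://storage.googleapis.com/chromium-browser-snapshots/'
                     'Linux_x64/588429/chrome-linux.zip',
                     preload_content=False)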

pyppeteer hangs when using it with nest_asyncio

nest_asyncio is currently the only way to run async code that has its own main loop on Jupyter or IPython.

pyppeteer (the original project) was, until a few versions ago, compatible with nest_asyncio, but the latest version is broken in the same way as pyppeteer2.

If you run this:

import asyncio
from pyppeteer import launch
import nest_asyncio

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://www.google.cl')
    await page.screenshot({'path': 'example.png'})
    await browser.close()


nest_asyncio.apply()
asyncio.get_event_loop().run_until_complete(main())

The code will hang forever at browser = await launch(), with no error messages at all.

Note that this only happens with launch() from pyppeteer; every other async function runs just fine with nest_asyncio (it is not a nest_asyncio bug).

Tested on Python 3.7.6.

Re-evaluate type hints and class member access

Many type hints are much vaguer than they could be. I think if we revisited all the type hints we could improve this project a bit. Another thing we could do is enforce proper class member access: currently, a lot of 'protected' class attributes are accessed from outside the class instance.

Fixing #26 would be implied.

Implement Enums for chrome browser protocol

Out of the scope of this PR, but I wonder if it'd be useful to write a script to generate enums based on this, and modify CDPSession to send the str representation of those enums. The benefit would be autocompletion in your IDE when developing. OTOH, it would (overall) add a lot of imports, and most commands are copy/pasted from puppeteer, so it's unlikely errors would creep in from not having devtools protocol commands as enums.

Protocol source: https://github.com/ChromeDevTools/devtools-protocol/blob/master/json/browser_protocol.json

i.e.

self._client.send(
    'Emulation.setDeviceMetricsOverride', {'arg': 'value'})

# would become something like
from pyppeteer.protocol import Emulation
self._client.send(
    Emulation.setDeviceMetricsOverride, {'arg': 'value'})

Ability to use without asyncio

I would love to see the ability to use this tool without needing to worry about async/await logic, which IMO makes code less readable. I understand there are reasons for others to want to make use of the concurrent connections, but for many tasks it's simply unneeded.
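
Nothing like this exists in pyppeteer today, but a thin blocking wrapper can be built on top of the async API; a minimal sketch:

import asyncio
from pyppeteer import launch

def run(coro):
    """Block until the coroutine finishes, hiding the event loop."""
    return asyncio.get_event_loop().run_until_complete(coro)

browser = run(launch())
page = run(browser.newPage())
run(page.goto('https://example.com'))
run(page.screenshot({'path': 'example.png'}))
run(browser.close())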

question: `browser.on`

is there a way to implement in pyppeteer the following puppeteer functionality?

browser.on('targetchanged', target => console.log(target.url()));

it's quite useful to track the URLs visited by a headless execution.

I searched among the browser stuff in the docs but could not find the appropriate API.
Sorry for the straightforward question, and thanks for the help!
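
pyppeteer's Browser exposes the same event-emitter interface, so the direct translation should be something like this (untested; note that Target.url is a property in pyppeteer rather than a method):

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    browser.on('targetchanged', lambda target: print(target.url))
    page = await browser.newPage()
    await page.goto('https://example.com')
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())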

Updating issue

When I update pyppeteer to version 0.2.2, pip list shows this has happened, but when I run the following in the command line REPL:

import pyppeteer
print(pyppeteer.__version__)

it shows the old version, 0.0.25. Has anyone encountered this issue before? I'm running it on Raspbian on a Pi 4, using Python 3.7.

Many thanks in advance for any help.

Page.goto returns None instead of exception

Hi,
For some URLs, the goto method does not fail, but returns None.

Here is a minimal example (using pyppeteer2 0.2.2 and Python 3.7.4):

import asyncio
from pyppeteer import launch

urls = [
    'http://www.swisscamps.ch/de/index.php',
    'http://www.whisky-club-oberwallis.ch/brennereien']

async def main():
    browser = await launch(headless=True)
    page = await browser.newPage()
    for url in urls:
        response = await page.goto(url, waitUntil='networkidle0')
        print(url, response)
    await page.close()
    await browser.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

Output:

http://www.swisscamps.ch/de/index.php None
http://www.whisky-club-oberwallis.ch None

Using urllib3 instead, the same URLs throw an exception:

Received response with content-encoding: gzip, but failed to decode it. Error -3 while decompressing data: incorrect header check

Would it be possible to throw exceptions instead, or at least state in the documentation that None is a possible return value?

To me, this is especially problematic since puppeteer has this note in its documentation:

NOTE page.goto either throws an error or returns a main resource response. The only exceptions are navigation to about:blank or navigation to the same URL with a different hash, which would succeed and return null.

Coroutine page.pdf is not rendering properly

HowtoMoveth-webshotbot (2).pdf
Perhaps it may be an issue with YouTube, since the header is sticky; is there any way to avoid this? Also, some of the video thumbnails are missing. I already set the waitUntil argument, but sometimes while loading heavy sites some images don't get loaded. The issue is apparently with PDF only.

Code I used to generate the PDF:

browser = await launch(
    headless=True,
    executablePath=EXEC_PATH,
)
page = await browser.newPage()
await page.goto(link, {"waitUntil": 'networkidle0'})
await page.emulateMedia('screen')
await page.pdf({'width': 1280, 'height': 720, 'path': './FILES/584512526/4605/HowtoMoveth-webshotbot.pdf', 'printBackground': True})

Websocket patch

Apply the following patch, or find a better solution to the timeout issue.

def disable_timeout_pyppeteer():
    import pyppeteer.connection
    original_method = pyppeteer.connection.websockets.client.connect
    def new_method(*args, **kwargs):
        kwargs['ping_interval'] = None
        kwargs['ping_timeout'] = None
        return original_method(*args, **kwargs)

    pyppeteer.connection.websockets.client.connect = new_method

Related issues: #62, #267, #178, #170, #160, and others.

Problem with launching Chrome in AWS Lambda

Not sure if this is a bug or I'm doing something wrong. My basic test works fine in my local env (Mac) but fails to launch Chrome in the AWS Lambda env. Apologies if I'm spamming here; I have posted the question on Stack Overflow as well, but no response from the community yet, hence posting here.

Exception originated from get_ws_endpoint() due to websocket response timeout

raise BrowserError('Browser closed unexpectedly:\n')

Tried with Python3.6, 3.7 and 3.8

Test

import os
import json
import asyncio
import logging
import pyppeteer
from pyppeteer import launch

def lambda_handler(event, context):
    asyncio.get_event_loop().run_until_complete(main())

async def main():
    browser = await launch({
        'headless': True,
        'args': [  '--no-sandbox' ]
    })
    page = await browser.newPage()
    await page.goto('http://example.com')
    await page.screenshot({'path': '/tmp/example.png'})
    await browser.close()
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

Exception:

Response:
{
  "errorMessage": "Browser closed unexpectedly:\n",
  "errorType": "BrowserError",
  "stackTrace": [
    "  File \"/var/task/lambda_handler.py\", line 23, in lambda_handler\n    asyncio.get_event_loop().run_until_complete(main())\n",
    "  File \"/var/lang/lib/python3.8/asyncio/base_events.py\", line 616, in run_until_complete\n    return future.result()\n",
    "  File \"/var/task/lambda_handler.py\", line 72, in main\n    browser = await launch({\n",
    "  File \"/opt/python/pyppeteer/launcher.py\", line 307, in launch\n    return await Launcher(options, **kwargs).launch()\n",
    "  File \"/opt/python/pyppeteer/launcher.py\", line 168, in launch\n    self.browserWSEndpoint = get_ws_endpoint(self.url)\n",
    "  File \"/opt/python/pyppeteer/launcher.py\", line 227, in get_ws_endpoint\n    raise BrowserError('Browser closed unexpectedly:\\n')\n"
  ]
}

Request ID:
"06be0620-8b5c-4600-a76e-bc785210244e"

Function Logs:

    START RequestId: 06be0620-8b5c-4600-a76e-bc785210244e Version: $LATEST
    ---- files in /tmp ----
    [W:pyppeteer.chromium_downloader] start chromium download.
    Download may take a few minutes.

      0%|          | 0/108773488 [00:00<?, ?it/s]
     11%|█▏        | 12267520/108773488 [00:00<00:00, 122665958.31it/s]
     27%|██▋       | 29470720/108773488 [00:00<00:00, 134220418.14it/s]
     42%|████▏     | 46172160/108773488 [00:00<00:00, 142570388.86it/s]
     58%|█████▊    | 62607360/108773488 [00:00<00:00, 148471487.93it/s]
     73%|███████▎  | 79626240/108773488 [00:00<00:00, 154371569.93it/s]
     88%|████████▊ | 95754240/108773488 [00:00<00:00, 156353972.12it/s]
    100%|██████████| 108773488/108773488 [00:00<00:00, 161750092.47it/s]
    [W:pyppeteer.chromium_downloader] 
    chromium download done.
    [W:pyppeteer.chromium_downloader] chromium extracted to: /tmp/local-chromium/588429
    -----
    /tmp/local-chromium/588429/chrome-linux/chrome
    [ERROR] BrowserError: Browser closed unexpectedly:

    Traceback (most recent call last):
      File "/var/task/lambda_handler.py", line 23, in lambda_handler
        asyncio.get_event_loop().run_until_complete(main())
      File "/var/lang/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
        return future.result()
      File "/var/task/lambda_handler.py", line 72, in main
        browser = await launch({
      File "/opt/python/pyppeteer/launcher.py", line 307, in launch
        return await Launcher(options, **kwargs).launch()
      File "/opt/python/pyppeteer/launcher.py", line 168, in launch
        self.browserWSEndpoint = get_ws_endpoint(self.url)
      File "/opt/python/pyppeteer/launcher.py", line 227, in get_ws_endpoint
        raise BrowserError('Browser closed unexpectedly:\n')END RequestId: 06be0620-8b5c-4600-a76e-bc785210244e
    REPORT RequestId: 06be0620-8b5c-4600-a76e-bc785210244e  Duration: 33370.61 ms   Billed Duration: 33400 ms   Memory Size: 3008 MB    Max Memory Used: 481 MB Init Duration: 445.58 ms

Choose a CI for builds

It looks like both TravisCI and Appveyor have configs in this project. IMO, there's little benefit to having two CIs; it just adds bloat and maintenance overhead (feel free to correct me if I'm wrong here).

Add Python 3.8 to CI

Python 3.8 has been out for about 4 months now; I think this could be added to the build pipeline pretty easily.

--enable-automation

After commenting out --enable-automation, the browser can no longer be detected as being driven by an automation tool.

Currently unmaintained - call for contributors and maintainers!

Edit from Mattwmaster58: This repo is unmaintained and has seen nothing beyond minor changes for a long time. Please consider playwright-python as an alternative.

The original repository has been unmaintained for a while, and pyppeteer turned out to be quite a hit, so I've established an organization and a fork - calling for contributors and additional maintainers!

Join us at matrix room: #[email protected] or https://matrix.to/#/!ScehqfCSdMAUhZoeDC:matrix.org?via=matrix.org

SSL error while downloading chromium for the first time

While downloading Chromium for the first time, I got the following error:

OpenSSL.SSL.Error: [('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')]

I had to use https://github.com/kiwi0fruit/pyppdf/blob/11d082f7a35cdac2ae3e7ffa7022c1d1e9747cd2/pyppdf/patch_pyppeteer/patch_pyppeteer.py#L59 to solve my issue.

As seen in the above link, it uses HTTPS for the download, while pyppeteer uses HTTP. Can't HTTPS be used by pyppeteer to solve this issue?

network_manager.py is missing attribute _requestIdToResponseWillBeSent

As the title says. Setting await page.setRequestInterception(True) and crawling any page will cause an exception: AttributeError: 'NetworkManager' object has no attribute '_requestIdToResponseWillBeSent'. Adding the attribute at the top with self._requestIdToResponseWillBeSent = {} solves the issue.
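
Until that lands upstream, the reporter's one-line fix can be applied as a monkey-patch from user code; a sketch:

from pyppeteer import network_manager

_orig_init = network_manager.NetworkManager.__init__

def _patched_init(self, *args, **kwargs):
    _orig_init(self, *args, **kwargs)
    # attribute referenced by the request-interception path but never initialized
    self._requestIdToResponseWillBeSent = {}

network_manager.NetworkManager.__init__ = _patched_init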

Deprecate JS-style option arguments

IMO, there's really no tangible benefit to the user in this style. Here are the downsides, though:

  • less specific typing
  • annoying to develop, every public function has to have an extra arg and line

Unless someone else comes up with a really good reason, we'll be getting rid of these in the near future. Some incidental fulfillment of this issue may occur with #16

Navigation Timeout Exceeded after clicking logon button.

I am trying to log in to the E*Trade website.

await asyncio.gather(*[
    page.evaluate('(element) => element.click()', element),
    page.waitForNavigation(),
])

await asyncio.gather(*[
    page.click("#logon_button"),
    page.waitForNavigation(),
])

Both of them give me a navigation timeout error!

pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 60000 ms exceeded
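
One pattern worth trying (a sketch that reuses page from the snippets above; the waitUntil value is an assumption): create the waitForNavigation task before triggering the click, so the navigation listener is already installed when the new page starts loading:

import asyncio

nav = asyncio.ensure_future(
    page.waitForNavigation({'waitUntil': 'networkidle2'}))  # waitUntil is a guess
await page.click('#logon_button')
await nav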

UTF-8 characters are coming as garbage

Hi all,
I'm using pyppeteer2==0.2.2. UTF-8 characters are coming through as garbage. I have tried setting

await page.setExtraHTTPHeaders({'Accept-Charset': 'utf-8', 'Content-Type': 'text/html; charset=utf-8',})
but it doesn't work. Any help is appreciated.

Shadowing builtins

We should come to a more formal decision on shadowing builtins. Currently, in the pup2.1.1 branch, it seems like all occurrences of shadowed builtins (e.g. type) have been suffixed with _ (usually by me 😉). However, I don't think this is the best or most ergonomic solution, and I'm really open to input here. Admittedly, I thought PEP 8 had a section on this; however, it actually refers to shadowing keywords, not builtins:

single_trailing_underscore_: used by convention to avoid conflicts with Python keyword

Of course, shadowing the builtins comes with downsides. Here are three I can think of:

  • it will be a little harder to use the builtin within that function
  • if you miss it in the function type signature, weird errors will result
  • IDEs will complain

I hadn't given it much thought, but now that I have, I think that not suffixing _ is the right thing to do and will reduce the JS/Python translation overhead. Probably the most prominent example of 'prior art' can be found in Python's very own argparse, which doesn't suffix _.

Actually, upon looking through the codebase, it looks like only type and id are suffixed, so this isn't as widespread an occurrence as I thought it would be, but all my points still stand. Either way, I think this should be codified in CONTRIBUTING.md.

@miracle2k, @Granitosaurus

Clarify which methods allow keyword arguments or require a dict

In the README it says:

puppeteer uses an object for passing options to functions/methods. pyppeteer methods/functions accept both dictionary (python equivalent to JavaScript's objects) and keyword arguments for options.

Dictionary style options (similar to puppeteer):

browser = await launch({'headless': True})

Keyword argument style options (more pythonic, isn't it?):

browser = await launch(headless=True)

This is great and I 100% prefer the keyword argument-style options. But not all methods seem to support them, for example page.setViewport:

await page.setViewport(width=800, height=600, deviceScaleFactor=2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-4d077b87a1ec> in <module>
----> 1 await page.setViewport(width=800, height=600, deviceScaleFactor=2)

TypeError: setViewport() got an unexpected keyword argument 'width'

I think it would be best if all the methods in the pyppeteer API supported keyword arguments but, barring that, documentation on which ones do would be second best.

Thanks for the great library, it's exactly what I was looking for!
