
browserless's Introduction

browserless.io logo


Browserless allows remote clients to connect and execute headless work, all inside of docker. It supports the standard, unforked Puppeteer and Playwright libraries, as well as offering REST-based APIs for common actions like data collection, PDF generation and more.

We take care of common issues such as missing system fonts, missing external libraries, and performance improvements, along with edge cases like downloading files and managing sessions. For details, check out the documentation site built into the project, which includes OpenAPI docs.

If you've been struggling to deploy headless browsers without running into issues or bloated resource requirements, then Browserless was built for you. Run the browsers in our cloud or your own, free for non-commercial uses.

Table of Contents

External links

  1. Full documentation site
  2. Live Debugger (using browserless.io)
  3. Docker
  4. Slack

Features

General

  • Parallelism and request-queueing are built-in + configurable.
  • Fonts and emoji working out-of-the-box.
  • Debug Viewer for actively viewing/debugging running sessions.
  • An interactive puppeteer debugger, so you can see what the headless browser is doing and use its DevTools.
  • Works with unforked Puppeteer and Playwright.
  • Configurable session timers and health-checks to keep things running smoothly.
  • Error tolerant: if Chrome dies, browserless won't.
  • Support for running and development on Apple M1 machines.

Cloud

Our cloud accounts include all the general features, plus extras.

How it works

Browserless listens for both incoming WebSocket requests, generally issued by most libraries, and pre-built REST API calls for common functions (PDF generation, images and so on). When a WebSocket connects to Browserless, it starts Chrome and proxies your request into it. Once the session is done, it closes and awaits more connections. Some libraries use Chrome's HTTP endpoints, like /json to inspect debuggable targets, which Browserless also supports.

You still execute the script itself, which gives you total control over which library you want to use and when to do upgrades. This also comes with the benefit of keeping your code proprietary and able to run on numerous platforms. We simply take care of all the browser aspects and offer a management layer on top of the browser.
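For REST usage, the shape of a call can be sketched with plain fetch. The /pdf path and JSON payload here follow the common pattern from the built-in docs site; treat the exact payload options as an assumption and check /docs on your own instance.

```javascript
// Hedged sketch: build the options for a POST to a Browserless REST
// endpoint such as /pdf. The payload shape ({ url }) is an assumption;
// consult the OpenAPI docs bundled with your instance for the full schema.
function buildPdfRequest(url) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url }),
  };
}

// Usage (assumes a Browserless container on localhost:3000):
// const res = await fetch('http://localhost:3000/pdf',
//   buildPdfRequest('https://example.com'));
// const pdf = Buffer.from(await res.arrayBuffer());
```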

Docker

Tip

See more options on our full documentation site.

  1. docker run -p 3000:3000 ghcr.io/browserless/chromium
  2. Visit http://localhost:3000/docs to see the documentation site.
  3. See more at our docker package.

Hosting Providers

We offer a first-class hosted product located here. Alternatively you can host this image on just about any major platform that offers hosting for docker. Our hosted service takes care of all the machine provisioning, notifications, dashboards and monitoring plus more:

  • Easily upgrade and toggle between versions at the press of a button. No managing repositories and other code artifacts.
  • Never need to update or pull anything from docker. There's literally zero software to install to get started.
  • Scale your consumption up or down with different plans. We support up to thousands of concurrent sessions at a given time.

If you're interested in using this image for commercial purposes, please read the section on licensing below.

Puppeteer

Puppeteer allows you to specify a remote location for Chrome via the browserWSEndpoint option. Pointing it at Browserless is a one-line change.

Before

const browser = await puppeteer.launch();

After

const browser = await puppeteer.connect({
  browserWSEndpoint: 'ws://localhost:3000',
});
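If your deployment requires authentication, the endpoint can carry a token as a query parameter. The helper below is a sketch: the token query parameter assumes you started the container with a TOKEN environment variable, so verify against your own setup.

```javascript
// Hedged sketch: build a browserWSEndpoint, optionally appending an auth
// token as a query parameter (assumes the server was configured with a
// matching TOKEN environment variable).
function buildEndpoint(host, token) {
  const url = new URL(`ws://${host}`);
  if (token) url.searchParams.set('token', token);
  return url.toString();
}

// const browser = await puppeteer.connect({
//   browserWSEndpoint: buildEndpoint('localhost:3000', process.env.TOKEN),
// });
```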

Playwright

We support running with Playwright via its remote browser-connection protocols out of the box. Just make sure that your Docker image, Playwright browser type, and endpoint match:

Before

import pw from "playwright";
const browser = await pw.firefox.launch();

After

docker run -p 3000:3000 ghcr.io/browserless/firefox
# or ghcr.io/browserless/multi for all the browsers
import pw from "playwright-core";

const browser = await pw.firefox.connect(
  'ws://localhost:3000/firefox/playwright',
);

After that, the rest of your code remains the same with no other changes required.
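One way to keep the Docker image, browser type, and endpoint path in sync is a small helper. The paths below are assumed from the example above; the validation is purely illustrative.

```javascript
// Illustrative sketch: derive the Playwright connection endpoint from the
// browser type. Paths follow the /<browser>/playwright pattern shown above;
// check your instance's docs for the authoritative list.
function playwrightEndpoint(host, browserType) {
  const supported = ['chromium', 'firefox', 'webkit'];
  if (!supported.includes(browserType)) {
    throw new Error(`Unsupported browser type: ${browserType}`);
  }
  return `ws://${host}/${browserType}/playwright`;
}

// const browser = await pw.firefox.connect(
//   playwrightEndpoint('localhost:3000', 'firefox'),
// );
```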

Extending (NodeJS SDK)

Browserless comes with built-in extension capabilities, and allows for extending nearly any aspect of the system (for Version 2+). For more details on how to write your own routes, build docker images, and more, see our SDK README.md or simply run "npx @browserless.io/browserless create" in a terminal and follow the onscreen prompts.

Usage with other libraries

Most libraries allow you to specify a remote instance of Chrome to interact with. They typically look for a websocket endpoint, a host and port, or some address. Browserless supports these by default; however, if you're having issues, please open an issue in this project and we'll try to work with the library authors to get them integrated with browserless. Please note that in V2 we no longer support selenium or webdriver integrations.

You can find a much larger list of supported libraries on our documentation site.

Motivations

Running Chrome on lambda or on your own infrastructure is a fantastic idea, but in practice it's quite challenging in production. You're met with pretty tough cloud limits, possibly building Chrome yourself, and then dealing with odd invocation issues should everything else go OK. A lot of issues in various repositories come down to the challenges of getting Chrome running smoothly in AWS (see here). You can see for yourself by going to nearly any library and sorting issues by most commented.

Getting Chrome running well in docker is also a challenge, as there are quite a few packages you need in order to get Chrome running. Once that's done, there are still missing fonts, getting libraries to work with it, and limitations on service reliability. And that's ignoring CVEs, access controls, and scaling strategies.

All of these issues prompted us to build a first-class image and workflow for interacting with Chrome in a more streamlined way. With Browserless you never have to worry about fonts, extra packages, library support, security, or anything else. It just works reliably like any other modern web service. On top of that it comes with a prescribed approach on how you interact with Chrome, which is through socket connections (similar to a database or any other external appliance). What this means is that you get the ability to drive Chrome remotely without having to do updates/releases to the thing that runs Chrome since it's divorced from your application.

Licensing

SPDX-License-Identifier: SSPL-1.0 OR Browserless Commercial License.

If you want to use Browserless to build commercial sites, applications, or in a continuous-integration system that's closed-source then you'll need to purchase a commercial license. This allows you to keep your software proprietary whilst still using browserless. You can purchase a commercial license here. A commercial license grants you:

  • Priority support on issues and features.
  • On-premise running as well as running on public cloud providers for commercial/CI purposes for proprietary systems.
  • Ability to modify the source (forking) for your own purposes.
  • A new admin user-interface.

Not only does it grant you a license to run such a critical piece of infrastructure, but you are also supporting further innovation in this space and our ability to contribute to it.

If you are creating an open source application under a license compatible with the Server Side Public License 1.0, you may use Browserless under those terms.

browserless's People

Contributors

adriansillo, alexloyola, almogcohen, amotzte, andymrtnzp, anteprimorac, apeckham, arenstar, ashiknesin, blopker, brianhawley, cristian-gabbanini, deadwards90, denzonl, dependabot-preview[bot], dependabot[bot], devonsams, filipoliko, greenkeeper[bot], jasonparekh, joelgriffith, kadaan, kikobeats, louiswrwright, olofsj, snyk-bot, tomasc, unlikelyzero, zach-browserless, zinggi



browserless's Issues

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on all branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please delete the greenkeeper/initial branch in this repository, and then remove and re-add this repository to the Greenkeeper App’s white list on Github. You'll find this list on your repo or organization’s settings page, under Installed GitHub Apps.

Caching performance

Because each browser keeps its own separate userDataDir by default, and browserless maintains a pool of browsers to serve incoming requests, it seems to me that caching will be somewhat sub-optimal: for N browsers there are N caches to fill, more cache misses, and more cached data.

Is there any way to ensure that the same browser (and associated cache) is used for the same source, so that a consistent cache is used for a particular user? Perhaps based on some ID or key, or even IP address?

Ideally a common cache area would be used for all browsers, but I'm not sure this is possible without horrible conflicts. I did try passing the same userDataDir to all browsers and indeed it didn't work out well. Perhaps using a shared memory mount (/dev/shm) would work better since that is "supposed" to be shared between processes?
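One possible direction for the "same browser for the same user" idea above (purely illustrative, not a built-in Browserless feature) is to map a stable client key to a fixed browser index:

```javascript
// Hypothetical sketch: hash a stable client key (user ID, API key, or IP)
// to a fixed index into the browser pool, so repeat requests from the same
// source reuse the same browser and its cache. Simple hash-mod, no rebalancing.
function browserIndexFor(key, poolSize) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // 32-bit rolling hash
  }
  return hash % poolSize;
}
```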

An in-range update of puppeteer is breaking the build 🚨

The dependency puppeteer was updated from 1.8.0 to 1.9.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

puppeteer is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Release Notes for v1.9.0

Big Changes

API Changes

Added:

Bug Fixes

  • #2374 - feat(browser): Run puppeteer in browser (POC)
  • #2377 - Certificates error using puppeteer
  • #2721 - page.goto doesn't clear internal timeout when the browser is closed
  • #2888 - Cannot read property '_bodyLoadedPromiseFulfill' of null
  • #2918 - Support waitForNavigation for frames
  • #3104 - Full page screenshot fails when defaultViewport is null
  • #3109 - Is it clear what <...Type> means in the docs?
  • #3204 - docs: mention require('puppeteer-core')
  • #3221 - As for puppeteer to emulate the movement of the mouse while pressing?
  • #3232 - Add documentation and examples for iframe API.
  • #3234 - Black render with omitBackground: true
  • #3340 - Does --filter=SomeTest do anything when running npm run unit

Raw Notes

4abf7d1 - docs(bundling): add docs about bundling for web (#3348)
8becb31 - test: add failing test for page.select (#3346)
5ebfe1a - docs(contributing): remove the --filter note (#3342)
cd54ce3 - fix(types): upgrade node types to 8.10.34 (#3341)
c9657f8 - docs(api.md): minor grammar and consistency fixes (#3320)
c237947 - chore(types): upgrade to TypeScript 3.1.1 (#3331)
842fee8 - fix(page): full page screenshot when defaultViewport is null (#3306)
e75e36b - feat(chromium): roll Chromium to r594312 (#3310)
85aca8e - chore(testserver): prepare test server (#3294)
9c89090 - chore(testrunner): fix readme description (#3293)
12e317c - chore: add .npmignore for testrunner (#3290)
5b3ddf5 - chore(testrunner): bump version to v0.5.0-post (#3291)
907d9be - chore: prepare testrunner to be published to npm (#3289)
4e48dfc - feat(launcher): add experimental "transport" option to pptr.connect (#3265)
5acf953 - feat(frame): introduce Frame.goto and Frame.waitForNavigation (#3276)
ad49f79 - docs(api.md): Fix description of SecurityDetails class (#3277)
0b9d8a6 - feat: async stacks for all "async" public methods (#3262)
9223bca - refactor: move navigation management to FrameManager (#3266)
27477a1 - docs(api.md): Fix typo (#3273)
b97bddf - refactor: unify response tracking in page.goto and waitForNavigation (#3258)
a1a211d - chore: nicer stack highlight (#3259)
a4abb4a - feat(chromium): Roll Chromium to r591618 (#3263)
7f00860 - fix(browserfetcher): Fix windows fetching (#3256)
f5d388a - docs(api.md): add example for Mouse class (#3244)
d547b9d - fix(browser): browser closing/disconnecting should abort navigations (#3245)
f0beabd - chore: drop DEBUG for public API calls (#3246)
d929f7e - fix: set JPG background to white when omitBackground option is used (#3240)
6ec3ce6 - chore: make sure Puppeteer bundling works (#3239)
f49687f - docs(api.md): add frame example (#3237)
a582acd - feat(chromium): roll Chromium to r590951 (#3236)
7ec0801 - fix: expect Network.responseReceived event is never dispatched (#3233)
c644a3b - test: make sure zero-width screenshots don't hang (#3214)
9c4b6d0 - refactor: use browser-compliant interface of 'ws' (#3218)
56b3bd8 - docs(readme.md): Added yarn guide also to puppeteer-core (#3227)
6581ee9 - docs: add ndb as a debugging tip (#3195)
1b2c811 - refactor: move Connection to use ConnectionTransport (#3217)
c967aeb - docs(api.md): add an include statement for puppeteer-core (#3213)
c5511ec - docs(api.md): Clarify how to call page.setCookie (#3215)
78e9d5c - chore: bump version to v1.8.0-post (#3212)

Commits

The new version differs by 40 commits.

  • f6c05e6 chore: mark version v1.9.0 (#3350)
  • 4abf7d1 docs(bundling): add docs about bundling for web (#3348)
  • 8becb31 test: add failing test for page.select (#3346)
  • 5ebfe1a docs(contributing): remove the --filter note (#3342)
  • cd54ce3 fix(types): upgrade node types to 8.10.34 (#3341)
  • c9657f8 docs(api.md): minor grammar and consistency fixes (#3320)
  • c237947 chore(types): upgrade to TypeScript 3.1.1 (#3331)
  • 842fee8 fix(page): full page screenshot when defaultViewport is null (#3306)
  • e75e36b feat(chromium): roll Chromium to r594312 (#3310)
  • 85aca8e chore(testserver): prepare test server (#3294)
  • 9c89090 chore(testrunner): fix readme description (#3293)
  • 12e317c chore: add .npmignore for testrunner (#3290)
  • 5b3ddf5 chore(testrunner): bump version to v0.5.0-post (#3291)
  • 907d9be chore: prepare testrunner to be published to npm (#3289)
  • 4e48dfc feat(launcher): add experimental "transport" option to pptr.connect (#3265)

There are 40 commits in total.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Loading issue

Hey,

I have an issue with a website: when I try to load it in my debugger it fails, but it works in the browserless debugger.

On my debugger: (screenshot, 2018-10-17 13:16)

On the browserless.io debugger: (screenshot, 2018-10-17 13:14)

I know the issue comes from the website itself, but I'm wondering why I get this error.

Possible EventEmitter memory leak

Hi,

This page explains that it is good practice to close the browser connection after processing.

After 10 connections, the Browserless container throws this log:

(node:26) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGHUP listeners added. Use emitter.setMaxListeners() to increase limit

Here is an example :

  1. Run a Browserless container in Docker:
    docker run -p 3000:3000 browserless/chrome:release-puppeteer-1.3.0

  2. Exec this code example :

const puppeteer = require('puppeteer');

(async function () {

    // Synchronous Loop
    for (let i = 0; i < 20; i++) {
        const browser = await puppeteer.connect({
            browserWSEndpoint: `ws://127.0.0.1:3000`,
        });

        const page = await browser.newPage();

        try {
            await page.goto('https://www.google.com');
            await page.screenshot({
                path: './browserless.png'
            });
            browser.close();

            await new Promise(resolve => setTimeout(resolve, 5000));
            console.log('close: ' + i)
        } catch (error) {
            console.error({
                error
            }, 'Something happened!');
            browser.close();
        }
    }
})();

After 10 loops, a warning appears in the Browserless container terminal.

An in-range update of husky is breaking the build 🚨

The devDependency husky was updated from 1.1.0 to 1.1.1.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

husky is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 8 commits.

See the full diff


error: HTTP/1.1 500 The module 'request' is not whitelisted in VM

My code is like this:

const request = require('request');
module.exports = async ({ page, context, id }) => { };

Then I get this error message. And on your website you say: "You can currently require 'url', 'util', 'path', 'querystring', 'lodash', 'node-fetch', and 'request' in your functions. Please contact us for adding a module".
So I wonder: am I writing my code the wrong way?

Using with docker-compose ?

Hi! Thanks for this awesome docker image!
How do I use it with docker-compose?

Here is my docker-compose.yml.

version: '3' # https://blog.codeship.com/using-docker-compose-for-nodejs-development/
services:
  app:
    build:
      dockerfile: Dockerfile.dev
      context: .
    image: app
    environment:
      - TIMBER=true
      - NODE_ENV=development
    command: node --inspect=0.0.0.0:3001 --require dotenv/config ./dist/index.js
    ports:
      - "8080:8080"
      - "9229:3001"
    links:
      - browserless
    depends_on:
      - browserless
  browserless:
    image: browserless/chrome:latest
    container_name: "browserless"
    environment:
      - DEBUG=browserless/chrome
      - MAX_CONCURRENT_SESSIONS=10
    ports:
        - 3002:3000

What should I put as the URL to connect to it with puppeteer?
app.locals.browser = await puppeteer.connect({browserWSEndpoint: 'ws://localhost:3002'}) doesn't work.

Thanks for your help!
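For what it's worth, a likely gotcha in setups like this (an assumption about this particular compose file, but consistent with how Compose networking works): from inside the app container, the service name and internal port apply, not the host-published port.

```javascript
// Sketch for the compose file above: the published port (3002) only works
// from the host machine; inside the Compose network, other containers reach
// the service by its name and *internal* port (browserless:3000).
function composeEndpoint(fromHost) {
  return fromHost ? 'ws://localhost:3002' : 'ws://browserless:3000';
}

// Inside the `app` container:
// await puppeteer.connect({ browserWSEndpoint: composeEndpoint(false) });
```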

get WebSocket is not open: readyState 3 (CLOSED) from puppeteer in second fetching html loop

Hi :)
In my scenario I have 10 "active" pages. In the initialize phase I connect to browserless with puppeteer and create a Chromium instance.
In an iterating loop I want to read each page's HTML.
Here is the browserless package.json start configuration:
"dev": "npm run build && cross-env ENABLE_CORS=true MAX_CONCURRENT_SESSIONS=1 MAX_QUEUE_LENGTH=20 PREBOOT_CHROME=true CHROME_REFRESH_TIME=3600000 KEEP_ALIVE=true DEBUG=browserless* PORT=3030 node build/index.js"
As you can see, I set the PREBOOT_CHROME=true and KEEP_ALIVE=true properties.
My problem:
As I mentioned, in the initialize phase I connect to browserless with puppeteer.connect. In the first iteration 10 Chromium instances are created. In the second step (NOT the second iteration, but still within the first) I use browser.newPage to get a page, and then page.content to get its content. Note that puppeteer.connect (the first step) is exported from a module and is a singleton (I think :) ).

The first iteration completes successfully, but after it browserless closes the Chromium instances, and in every subsequent iteration I get the WebSocket is not open: readyState 3 (CLOSED) error!

What should I do? Can you help me, @joelgriffith?

Browserless proxy terminates session in response to cookie DELETE request

I've been trying to get capybara working with browserless in a Rails app using selenium. During the test cycle, capybara tells the selenium driver to reset cookies. Selenium uses a DELETE request to /session/XXXXXXXXXX/cookie to handle this. It appears that browserless's proxy somewhat naively treats all DELETE requests by closing the session, leading all subsequent requests from capybara/selenium to fail.

Here's a small script that demonstrates the issue:

require 'selenium-webdriver'
Selenium::WebDriver.logger.level = 'debug'

chrome_options = Selenium::WebDriver::Remote::Capabilities.chrome(
  "chromeOptions" => { args: %w( --headless --no-sandbox --disable-gpu ) }
)

s = Selenium::WebDriver::Remote::Driver.new(desired_capabilities: chrome_options, url: "http://chrome:4444/webdriver")

s.manage.delete_all_cookies
s.navigate.to('about:blank') # Will fail

It seems from the documentation that there are a handful of DELETE requests in the spec. I think the appropriate thing to do is to only close the session when encountering a DELETE to the /session/ID url.

I'm gonna mess around with a PR for this, but feel free to comment in the meantime if you have encountered this and feel my approach is ill advised for some reason.
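A sketch of the narrower matching proposed above: treat only a DELETE to exactly /session/:id as session teardown, and proxy other DELETEs (such as /session/:id/cookie) through untouched. The regex is illustrative, not Browserless's actual implementation.

```javascript
// Hypothetical matcher: only DELETE /session/:id (with or without trailing
// slash) counts as teardown; deeper paths like /session/:id/cookie pass
// through to the browser as normal WebDriver commands.
function isSessionTeardown(method, path) {
  return method === 'DELETE' && /^\/session\/[^/]+\/?$/.test(path);
}
```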

Browserless Debugger not fetching results

I followed the Docker Quickstart:

  1. docker pull browserless/chrome
  2. docker run --shm-size=1gb -p 3000:3000 browserless/chrome
  3. Visit http://localhost:3000/ to use the interactive debugger.

I tried this on two separate VPS's, same issue - running Ubuntu Server 16.04.3 LTS.

It's up on one of my test systems: http://144.217.188.229:3000/


Edit: I've also tried the Node Quickstart, same issue.

[nodemon] failed to start process, "ts-node" exec not found

Hi,

the docker version is working properly, but if I try npm run dev I get the following error:

[nodemon] failed to start process, "ts-node" exec not found

What works in my case:

  1. changing port to 8081 in src/config.ts
  2. npm run build
  3. npm run start

Can't run page.$$

docker image tag: puppeteer-1.9.0
puppeteer-core: 1.9.0
Node.js version: 8.9.1

After a page is open, I tried to run page.$$(selector). At that point, this error message showed up in the logs of the Node app:

Error: Protocol error (Target.activateTarget): Target closed.

What does this error message mean? How can I fix it?

An in-range update of express-http-proxy is breaking the build 🚨

The dependency express-http-proxy was updated from 1.3.0 to 1.4.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

express-http-proxy is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 2 commits.

See the full diff


Dynamically uploading files with browserWSEndpoint

Hey,
I'm trying this out as a potential service for web testing, and one of the first tests I tried was file upload.
When connected through puppeteer with browserWSEndpoint, I just can't seem to upload files dynamically, while without browserWSEndpoint it works fine.
I've even tried passing the flags --allow-file-access-from-files and --disable-web-security, but without any luck.

Here's how i'm trying:

const browser = await puppeteer.connect({
    browserWSEndpoint: 'ws://localhost:32769',
    headless: true
})
const page = await browser.newPage();
await page.goto('http://localhost:8080', {waitUntil: 'networkidle2'});

(...)

let testUpload = async () => {
    const upload = await page.$("input");
    await upload.uploadFile('test.jpg');
    await page.screenshot({path: 'test_s_'+Date.now()+'.png'});
}

(...)

I'm 100% sure that everything exists, whether the DOM elements or the files.
Any ideas, or is this a limitation?

Cheers

Repo should link to additional docs

You have other documentation on https://docs.browserless.io, but you don't point that out in the README or anywhere else in the repo. I was having an issue with the debugger session disconnecting and was digging into whether it was an issue with my proxy server, but it turns out all I needed was to know about the CONNECTION_TIMEOUT env variable mentioned here. So I think it would make sense to specifically call out the additional documentation somewhere obvious, such as the README, or to combine all the docs for the open source project and separate them from the service docs.

I also just want to add a 👏👏: great job on the project. I've been trying to get something set up with puppeteer and headless Chrome for a while, and this is super useful and much appreciated. And the web debugger is 🔥😍.

await page.click didn't resolve after append chrome inspector

First of all, thank you for this great project. I'm working on another project that shows puppeteer action steps via the Chrome inspector, and I got lots of ideas from your project.

Now I'm stuck on a very weird problem: once I attach the Chrome inspector to the puppeteer session (I load a frame whose URL looks like https://chrome-devtools-frontend.appspot.com/serve_file/@7f3cdc3f76faecc6425814688e3b2b71bf1630a4/inspector.html?ws=127.0.0.1:3000/devtools/page/(CA197D0A33141BE44B19C0603ADD9E7C)&remoteFrontend=true ), the await page.click('dom-selector') action always blocks. But when I run puppeteer without the Chrome inspector, everything works fine.

No error shows up. I tried debugging the page.click function, and the code runs to the following:

send(method, params = {}) {
    if (!this._connection)
      return Promise.reject(new Error(`Protocol error (${method}): Session closed. Most likely the page has been closed.`));
    const id = ++this._lastId;
    const message = JSON.stringify({id, method, params});
    debugSession('SEND ► ' + message);
    this._connection.send('Target.sendMessageToTarget', {sessionId: this._sessionId, message}).catch(e => {
      // The response from target might have been already dispatched.
      if (!this._callbacks.has(id))
        return;
      const callback = this._callbacks.get(id);
      this._callbacks.delete(id);
      callback.reject(e);
    });
    return new Promise((resolve, reject) => {
      this._callbacks.set(id, {resolve, reject, method});
    });
  }

It returns a promise, but this promise never resolves, so the code after page.click never runs.

I've tried to figure out this problem for several days with no progress, so I'm opening an issue here to ask for help. Do you have any idea about this problem?

I tried the code below in browserless, and sometimes the problem shows up.

await page.goto("https://weidian.com/item.html?itemID=1692458617")
await page.click("#buy_now");
await page.waitForSelector('#item_control');
await page.click('#sku_ul > li:nth-child(2) > a')

Support for Cookies Object?

I emailed earlier but wanted to create an official issue about adding support for cookies in Chrome. This could be accomplished by passing a cookies object.

Let me know if you have follow up questions.

This site can’t be reached ... unexpectedly closed the connection.

I have installed the browserless/chrome docker image (as a service), and have opened port 3000, so that it should in theory be accessible via browserless.mysite.com:3000 (like lots of other web apps I've deployed in the same way). But when I browse to browserless.mysite.com:3000 I just get:

This site can’t be reached
browserless.mysite.com unexpectedly closed the connection

Any ideas?

Site can't load due to audio

Hi,

(it's me again)

I guess this error is probably due to the way the website has been developed, but I'm wondering if there is a fix for this.
When screenshotting this website, I get an error that prevents it from loading:
Uncaught (in promise) DOMException: Failed to load because no supported source was found.

https://chrome.browserless.io/?script=await%20page.goto(%27http%3A%2F%2Fhki.paris%2Fhome%27)%3B

With dumpio, I got more info:

[0926/072802.585761:INFO:CONSOLE(12)] "The Web Audio autoplay policy will be re-enabled in Chrome 70 (October 2018). Please check that your website is compatible with it. https://goo.gl/7K7WLu", source: http://hki.paris/build/desktop.js?v=1 (12)
[0926/072810.183949:ERROR:render_media_log.cc(30)] MediaEvent: MEDIA_ERROR_LOG_ENTRY {"error":"FFmpegDemuxer: no supported streams"}
[0926/072810.416384:ERROR:render_media_log.cc(30)] MediaEvent: PIPELINE_ERROR DEMUXER_ERROR_NO_SUPPORTED_STREAMS
[0926/072810.921032:INFO:CONSOLE(0)] "Uncaught (in promise) NotSupportedError: Failed to load because no supported source was found.", source: http://hki.paris/home (0)

I tried to install FFmpeg on my server, but it didn't change anything.

Any idea?

Thanks!

Edit:
I'm using chrome & not chromium

Edit 2:
Here is the list of args I'm launching puppeteer with:

args: [
  '--disable-gpu',
  '--no-sandbox',
  '--disable-setuid-sandbox',
  '--disable-translate',
  '--mute-audio',
  '--hide-scrollbars',
  '--disable-translate',
  '--ignore-certificate-errors',
  '--ignore-certificate-errors-spki-list'
],

An in-range update of ts-jest is breaking the build 🚨

The dependency ts-jest was updated from 23.10.0 to 23.10.1.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

ts-jest is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 13 commits.

  • d9c5b45 Merge pull request #743 from huafu/release/23.10.1
  • e4a3a09 chore(release): 23.10.1
  • ab94359 Merge pull request #742 from huafu/fix-740-no-js-compilation-with-allow-js
  • a844fd4 Merge branch 'master' into fix-740-no-js-compilation-with-allow-js
  • 18dced1 Merge pull request #741 from huafu/e2e-weird-deep-paths
  • 9e7d6a0 test(config): adds test related to allowJs
  • 374dca1 fix(compile): js files were never transpiled thru TS
  • 70fd9af ci(cache): removes some paths from the caching
  • c12dfff fix(windows): normalize paths
  • 0141098 test(e2e): deep paths and coverage
  • 6ccbff3 Merge pull request #736 from huafu/detect-import-and-throw-if-none
  • a2a4be2 fix(config): warn instead of forcing ESM interoperability
  • 21644da Merge pull request #735 from huafu/master

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Implement option to set server host

I encountered an issue in our gitlab-ci pipeline: sometimes a connection reset error occurred when two or more docker containers communicated with each other. I found that, to solve this, I should set the server host to bind to IP 0.0.0.0. I updated the IP address and it seems to be working without any issues now. We created our own image with this change, but it would be great to have this integrated into the official repo.

An in-range update of lighthouse is breaking the build 🚨

The dependency lighthouse was updated from 3.1.1 to 3.2.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

lighthouse is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Release Notes for 3.2.0 (2018-09-26)

Full Changelog

New Audits

  • add js-libraries audit, just listing detected js libs (#6081)

Faster

  • driver: deliver trace as events rather than a stream (#6056)
  • network-recorder: consider iframe responses finished. helps avoid pageload timeout (#6078)
  • replace WebInspector traceparser with native JSON.parse (#6099)

Core

  • add emulatedFormFactor setting (#6098)
  • remove some trivial uses of WebInspector (#6090)
  • use cssstyle to parse CSS colors instead of WebInspector (#6091)
  • initial refactor of computedArtifact import/caching (#5907)
  • asset-saver: stop creating screenshot files during --save-assets (#6066)
  • content-width: not applicable on desktop (#5893)
  • driver: add check to make sure Runtime.evaluate result exists (#6089)
  • icons: Add PNG check to manifest icon validation (#6024)
  • lhr: add top-level runtimeError (#6014)
    • gather-runner: include error status codes in pageLoadError (#6051)
    • smooth rough edges of pageLoadError display and reporting (#6083)
  • net-request: transferSize now shared via 'X-TotalFetchedSize' (#6050)
  • don't allow analysis of file:// urls (#5936)

Report

  • dont show zero ms savings in preconnect, preload audits (#5983)
  • align table headings & columns left/right (#6063)
  • audit: make dom-size table prettier (#6065)
  • cursor:pointer on Passed Audits, etc (#5977)
  • psi: remove redundant varience disclaimer (#6110)
  • util: ✅ audits should be in Passed Audits (#5963)
  • vulnerable-jslibs: tweak snyk link for highlighted matches (#6096)
  • xbrowser: replace Typed OM getComputedStyle() with CSSOM equivalent (#5984)

CLI

  • add --print-config flag (#6107)

Deps

Docs

  • readme: add lighthouse4u (#6008)
  • readme: updated report screenshot to 3.1.0 (#6042)
  • readme: add lighthouse-badges to related projects (#5969)
  • recipes: update custom-audit package.json (#6007)
  • releasing: minor updates (#5345)

i18n

  • roll latest strings from TC (#6109)
  • mv locale files (#5981)
  • speed up replacement regex (#6072)

Misc

  • bump bundlesize threshold a little more (#6055)
  • runner: added locale to settings that can change between -G and -A (#6080)
  • tsc: add type checking to sentry usage (#5993)
Commits

The new version differs by 44 commits.

  • 081864e 3.2.0 (#6120)
  • e15f87d docs(releasing): minor updates (#5345)
  • 34b55b3 cli: add --print-config flag (#6107)
  • 72b59c5 core(content-width): not applicable on desktop (#5893)
  • a097a23 report(psi): remove redundant varience disclaimer (#6110)
  • 3a6f6c5 deps: [email protected] (#6106)
  • f5c043d core: add emulatedFormFactor setting (#6098)
  • b8d1496 i18n: roll latest strings from TC (#6109)
  • b49a1d2 deps: [email protected] (#6102)
  • fed4a88 report(vulnerable-jslibs): tweak snyk link for highlighted matches (#6096)
  • ce96d76 core(asset-saver): stop creating screenshot files during --save-assets (#6066)
  • 67302a0 core: update chrome-devtools-frontend to latest (#6101)
  • f0e6dd9 core(driver): add check to make sure Runtime.evaluate result exists (#6089)
  • 14d6450 core: replace WebInspector traceparser with native JSON.parse (#6099)
  • 265d956 core: remove some trivial uses of WebInspector (#6090)

There are 44 commits in total.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

user-data-dir

In order to get caching of service workers working in browserless (and puppeteer generally) I find that it's necessary to set a userDataDir via the --user-data-dir passed to puppeteer.launch(). This may be necessary to get plain http caching working too... not sure.

You can see this (for service worker caching) via the following: when the flag is passed via the ws endpoint, the browser is able to cache the content in the service worker, and subsequently (after the content has been cached) you'll see "retrieved from service worker cache". Without the flag, you just get "retrieved from network" no matter how many times you request the content.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.connect({
    browserWSEndpoint: 'ws://browserless.mydomain.com:3000?--user-data-dir=/tmp'
  });
  const page = await browser.newPage();
  const url = 'https://cloud3squared.com/files/sw-stackoverflow-demo/index.html';
  await page.goto(url, { waitUntil: 'networkidle0' });
  await page.screenshot({ path: './temp.png' });
  await browser.close();
})();

This is fine when only one browser has been instantiated by browserless under the hood. The problem comes when multiple browsers are all passed the same --user-data-dir and are all trying to access and write to the same storage area.

At least, I'm seeing a few issues when using browserless which I think result from this. It's partly educated guesswork at this stage.

I wonder if it's necessary to be able to somehow use a different --user-data-dir for each different browser that is instantiated, so that there is no conflict?

Or maybe it's possible to use /dev/shm (if it's big enough and hasn't been disabled) for --user-data-dir (I think that /dev/shm is "supposed" to be shared between different processes) ... I haven't yet tried that ... mainly because I can't since browserless disables /dev/shm by default.

P.S. the page rendered by browserless when "fetched from network" in the above example seems to be missing (unable to render) this character... the "tick" is rendered fine (for "fetched from service worker cache")

MaxListenersExceededWarning if MAX_CONCURRENT_SESSION>10 and PREBOOT_CHROME=true

If you use PREBOOT_CHROME and MAX_CONCURRENT_SESSIONS > 10, you get a MaxListenersExceededWarning for a bunch of events:

% docker run -e "PREBOOT_CHROME=true" -e "MAX_CONCURRENT_SESSIONS=11" -p 3000:3000 browserless/chrome:latest
(node:27) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 exit listeners added. Use emitter.setMaxListeners() to increase limit
(node:27) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGINT listeners added. Use emitter.setMaxListeners() to increase limit
(node:27) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGTERM listeners added. Use emitter.setMaxListeners() to increase limit
(node:27) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGHUP listeners added. Use emitter.setMaxListeners() to increase limit

From Node documentation:

By default EventEmitters will print a warning if more than 10 listeners are added for a particular event.

Maybe browserless should increase the allowed max listeners to match MAX_CONCURRENT_SESSIONS?
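Node's cap is adjustable per process, so a fix along these lines could look like the following sketch (reading MAX_CONCURRENT_SESSIONS from the environment is an assumption about where browserless keeps that value):

```javascript
// Raise the process-level listener cap to match the configured concurrency,
// so one exit/SIGINT/SIGTERM/SIGHUP listener per prebooted Chrome fits
// without tripping the default cap of 10.
const maxSessions = Number(process.env.MAX_CONCURRENT_SESSIONS) || 10;
process.setMaxListeners(maxSessions + 1);
```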

An in-range update of @types/node is breaking the build 🚨

The dependency @types/node was updated from 10.11.5 to 10.11.6.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

@types/node is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

RFC: Multiple version of Chrome in one image

Wanted to reach out and gather thoughts on bundling the last 2-5 versions of Chrome in a single image vs maintaining numerous images whose only difference is the Chrome version.

PROS

  • Less Docker and related build maintenance
  • Easier to understand build tools
  • Quick way to test different versions for performance and other intricacies
  • Can apply semver builds in docker (vs confusing release-puppeteer-x.x.x)

CONS

  • Bigger image bundles
  • Slightly more complex runtime (have to specify the puppeteer version in the URL or something similar).
  • Having to download several versions of Chrome, and building relevant tooling for each in the debugger (all the tooltip stuff).
  • Things like pre-booting Chrome become harder since you'd have to pre-boot one version only vs many.

Thoughts? Comments? Concerns?

DEPTH_ZERO_SELF_SIGNED_CERT Nginx Proxy

Hey, I can connect to my https://browserless.site just fine and the wss initiates just fine. However, when I'm using wss://browserless.site I get an error that returns DEPTH_ZERO_SELF_SIGNED_CERT. Any idea how I can fix this?

I did have it working over ws:// as the Getting Started recommends but I'd like to have my connection encrypted if possible.
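DEPTH_ZERO_SELF_SIGNED_CERT comes from the Node client refusing the proxy's self-signed certificate, not from browserless itself. Two standard Node-level workarounds, both assuming the certificate really is yours: trust the cert explicitly by launching the client with NODE_EXTRA_CA_CERTS=/path/to/cert.pem, or, less safely, disable verification for the client process:

```javascript
// Least-safe option: skip TLS certificate verification for this process only.
// Prefer NODE_EXTRA_CA_CERTS=/path/to/your/cert.pem on the client instead.
process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0';
// ...then connect to wss://browserless.site as usual.
```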

An in-range update of @types/node is breaking the build 🚨

The dependency @types/node was updated from 10.11.0 to 10.11.1.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

@types/node is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on all branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please delete the greenkeeper/initial branch in this repository, and then remove and re-add this repository to the Greenkeeper App’s white list on Github. You'll find this list on your repo or organization’s settings page, under Installed GitHub Apps.

Unexpected PDF file size

When converting an HTML file that contains emojis, my PDF becomes extremely large. Without any emoji characters, the file is around 1.5MB; with emoji characters, it becomes around 8MB. Other than these emojis, the file is nothing special, just some plain text.

I saw that one of the advertised features of Browserless is that fonts/emojis are supported out of the box, so this seems an unusual issue. I can print the file as a PDF myself and it will be less than 1MB, so browserless seems to make this file exceptionally large.

Possibility of defining remote userDataDir

Hello! First of all, thanks for the amazing project. It helps a lot.

My problem is: I'm running browserless (hosted) in a ECS cluster. So my scraping jobs just connect to it (on the launch method) and get things running.

I need to keep a log-in session alive for several hours - without actually needing to keep a Chrome instance alive for that long (most of the time I'm just waiting for things to happen). So I'm saving the userDataDir somewhere and, when I need to get back to that session, I launch a new Chrome using this previously saved dir.

But I'm saving the userDataDir from a different machine than the one running Chrome.

Can I pass, somehow, this dir to browserless (without being way too hacky)?

Timeout issues ?

I'm wondering, would there be any timeout issues using this technique?
For example, if I launched a big crawl running for 24+ hours.

I'd say no, but maybe you've pushed it to its limits :)

Thanks

An in-range update of lodash is breaking the build 🚨

The dependency lodash was updated from 4.17.10 to 4.17.11.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

lodash is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Problem with docker run option --shm-size=1gb

According to this it is necessary to pass --shm-size=1gb to docker run.

The problem I have is that I would not be firing browserless up via docker run but rather docker service create... and it's not possible to pass --shm-size as an option to that. There is a possible workaround to pass --mount instead, but to cut a long story short... that's not possible for me.

I have encountered this shm issue before, when deploying headless chrome to docker in a different way, and for that I am passing the option --disable-dev-shm-usage to puppeteer.launch()... based on this advice.

These two different options (i.e. --shm-size=1gb passed to docker run and --disable-dev-shm-usage passed to puppeteer.launch()) seem to be addressing the same problem, so...

Is it possible to pass a custom option (and in particular, --disable-dev-shm-usage) to puppeteer.launch() in browserless?

I think, if that were possible, it would solve my problem!

Would it be better anyway to have browserless pass --disable-dev-shm-usage to puppeteer.launch() by default, since there would then no longer be any need to advise users to pass --shm-size=1gb to docker run? Just a simple docker run -p 3000:3000 browserless/chrome.

But that, as well as the ability to pass other custom options, would be nice.

EDIT: OK, now that I look at the browserless launch code, I see that you do already pass --disable-dev-shm-usage to puppeteer.launch(). So I suppose the question above becomes: is it actually still necessary to pass --shm-size=1gb to docker run if --disable-dev-shm-usage is also being passed to puppeteer.launch()? Aren't they there for the same reason?

Document how to run Docker container with custom arguments

Having read README.md, it is unclear how to start a docker container with custom arguments.

I have tried:

$ docker run -p 3000:3000 browserless/chrome --proxy-server=127.0.0.1:8050

But that does not work.

Furthermore, even if this did work, how would one then acquire the host IP address? That requires checking ifconfig on the container.

Perhaps what's needed is an initialisation script (JavaScript) that could be loaded into the container and used to configure the instance.
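Until this is documented, note that flags like --user-data-dir are passed elsewhere in these issues via the websocket query string rather than docker run. If --proxy-server works the same way (an assumption, not something I've verified), building the endpoint would look like:

```javascript
// Attach the Chrome flag to the connection URL instead of container args.
const endpoint = new URL('ws://localhost:3000');
endpoint.searchParams.set('--proxy-server', '127.0.0.1:8050');

// Hand this to puppeteer.connect({ browserWSEndpoint: endpoint.toString() }).
console.log(endpoint.toString());
```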

Support Puppeteer 1.5

With Puppeteer 1.5 connecting fails with:

     Error: Protocol error (Target.getBrowserContexts): 'Target.getBrowserContexts' wasn't found undefined
      at Promise (node_modules/puppeteer/lib/Connection.js:86:56)
  From previous event:
      at Connection.send (node_modules/puppeteer/lib/Connection.js:85:12)
      at Function.connect (node_modules/puppeteer/lib/Launcher.js:257:50)
      at <anonymous>

Proxy per page?

Hey!

One can set a proxy like:

  const browser = await puppeteer.launch({
    headless: false,
    args: [
      '--proxy-server=PROXY_URI'
    ],
  });

Is there a possibility to set different proxies for different pages in the same browser instance?
Like:

browser1, page1 => proxy1
browser1, page2 => proxy2

Cheers

Dockerhub tag for build with puppeteer-1.1.1

Hello,

I was wondering if it would be possible to create a tag on Docker Hub for the build with puppeteer-1.1.1.
I'd prefer to use this over latest to prevent potentially breaking changes from being pulled when I rebuild my service.

Thanks a lot,
Andy

Video tag support

Hi,

Puppeteer doesn't support the mp4 codec by default, and the only way to fix that is to point puppeteer.launch at a full Chrome install (see here and here).

How could it work with browserless?
Is there a way to launch browserless with a specific Chrome app?

Thanks!
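For a plain Puppeteer setup (outside browserless), the Chrome-path approach in those links boils down to puppeteer.launch({ executablePath }). A sketch of the usual pattern; the CHROME_PATH variable and fallback path are placeholders, not anything browserless defines:

```javascript
// Resolve a full Chrome binary (with proprietary codecs) instead of the
// bundled Chromium. Env var name and fallback path are illustrative only.
function resolveChromePath() {
  return process.env.CHROME_PATH || '/usr/bin/google-chrome';
}

// e.g. puppeteer.launch({ executablePath: resolveChromePath() })
```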

An in-range update of joi is breaking the build 🚨

The dependency joi was updated from 13.6.0 to 13.7.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

joi is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 9 commits.

  • e4b82f6 13.7.0
  • 6bbbdaf Add documentation for #1562.
  • 1e837de Merge pull request #1599 from rluba/patch-1
  • fd1911a Link to isemail for email() options
  • a496210 Merge pull request #1572 from dnalborczyk/patch-1
  • 73f3efd Update API.md
  • da70a73 Merge pull request #1562 from kanongil/symbol-support
  • 070d3c9 Remove symbol key for map and revise stringification
  • 8f7f242 Add symbol() type

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

You’re doing an amazing job!

Hi, just want to say you really are an inspiration towards an indiehacker path. Always good to see someone who’s hustling and making it work and living life on their own terms.
