
browserless's Introduction

browserless.io logo


Browserless allows remote clients to connect and execute headless work, all inside of docker. It supports the standard, unforked Puppeteer and Playwright libraries, as well as offering REST-based APIs for common actions like data collection, PDF generation and more.

We take care of common issues such as missing system fonts, missing external libraries, and performance improvements, along with edge cases like downloading files and managing sessions. For details, check out the documentation site built into the project, which includes OpenAPI docs.

If you've been struggling to deploy headless browsers without running into issues or bloated resource requirements, then Browserless was built for you. Run the browsers in our cloud or your own, free for non-commercial uses.

Table of Contents

External links

  1. Full documentation site
  2. Live Debugger (using browserless.io)
  3. Docker
  4. Slack

Features

General

  • Parallelism and request-queueing are built-in + configurable.
  • Fonts and emoji working out-of-the-box.
  • Debug Viewer for actively viewing/debugging running sessions.
  • An interactive puppeteer debugger, so you can see what the headless browser is doing and use its DevTools.
  • Works with unforked Puppeteer and Playwright.
  • Configurable session timers and health-checks to keep things running smoothly.
  • Error tolerant: if Chrome dies, browserless won't.
  • Support for running and development on Apple M1 machines.

Cloud

Our cloud accounts include all the general features, plus extras.

How it works

Browserless listens for both incoming WebSocket requests, generally issued by most libraries, and pre-built REST API calls for common functions (PDF generation, images and so on). When a WebSocket connects to Browserless, it starts Chrome and proxies your request into it. Once the session is done, it closes and awaits more connections. Some libraries use Chrome's HTTP endpoints, like /json to inspect debuggable targets, which Browserless also supports.

You still execute the script itself, which gives you total control over which library you want to use and when to do upgrades. This also comes with the benefit of keeping your code proprietary and able to run on numerous platforms. We simply take care of all the browser aspects and offer a management layer on top of the browser.
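For REST usage, the shape of a call can be sketched with plain fetch. The /pdf path and JSON payload here follow the common pattern from the built-in docs site; treat the exact payload options as an assumption and check /docs on your own instance.

```javascript
// Hedged sketch: build the options for a POST to a Browserless REST
// endpoint such as /pdf. The payload shape ({ url }) is an assumption;
// consult the OpenAPI docs bundled with your instance for the full schema.
function buildPdfRequest(url) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url }),
  };
}

// Usage (assumes a Browserless container on localhost:3000):
// const res = await fetch('http://localhost:3000/pdf',
//   buildPdfRequest('https://example.com'));
// const pdf = Buffer.from(await res.arrayBuffer());
```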

Docker

Tip

See more options on our full documentation site.

  1. docker run -p 3000:3000 ghcr.io/browserless/chromium
  2. Visit http://localhost:3000/docs to see the documentation site.
  3. See more at our docker package.

Hosting Providers

We offer a first-class hosted product located here. Alternatively you can host this image on just about any major platform that offers hosting for docker. Our hosted service takes care of all the machine provisioning, notifications, dashboards and monitoring plus more:

  • Easily upgrade and toggle between versions at the press of a button. No managing repositories and other code artifacts.
  • Never need to update or pull anything from docker. There's literally zero software to install to get started.
  • Scale your consumption up or down with different plans. We support up to thousands of concurrent sessions at a given time.

If you're interested in using this image for commercial purposes, please read the section on licensing below.

Puppeteer

Puppeteer allows you to specify a remote location for Chrome via the browserWSEndpoint option. Pointing it at Browserless is a one-line change.

Before

const browser = await puppeteer.launch();

After

const browser = await puppeteer.connect({
  browserWSEndpoint: 'ws://localhost:3000',
});
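If your deployment requires authentication, the endpoint can carry a token as a query parameter. The helper below is a sketch: the token query parameter assumes you started the container with a TOKEN environment variable, so verify against your own setup.

```javascript
// Hedged sketch: build a browserWSEndpoint, optionally appending an auth
// token as a query parameter (assumes the server was configured with a
// matching TOKEN environment variable).
function buildEndpoint(host, token) {
  const url = new URL(`ws://${host}`);
  if (token) url.searchParams.set('token', token);
  return url.toString();
}

// const browser = await puppeteer.connect({
//   browserWSEndpoint: buildEndpoint('localhost:3000', process.env.TOKEN),
// });
```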

Playwright

We support running with Playwright via its remote browser-connection protocols out of the box. Just make sure that your Docker image, Playwright browser type, and endpoint match:

Before

import pw from "playwright";
const browser = await pw.firefox.launch();

After

docker run -p 3000:3000 ghcr.io/browserless/firefox
# or ghcr.io/browserless/multi for all the browsers
import pw from "playwright-core";

const browser = await pw.firefox.connect(
  'ws://localhost:3000/firefox/playwright',
);

After that, the rest of your code remains the same with no other changes required.
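One way to keep the Docker image, browser type, and endpoint path in sync is a small helper. The paths below are assumed from the example above; the validation is purely illustrative.

```javascript
// Illustrative sketch: derive the Playwright connection endpoint from the
// browser type. Paths follow the /<browser>/playwright pattern shown above;
// check your instance's docs for the authoritative list.
function playwrightEndpoint(host, browserType) {
  const supported = ['chromium', 'firefox', 'webkit'];
  if (!supported.includes(browserType)) {
    throw new Error(`Unsupported browser type: ${browserType}`);
  }
  return `ws://${host}/${browserType}/playwright`;
}

// const browser = await pw.firefox.connect(
//   playwrightEndpoint('localhost:3000', 'firefox'),
// );
```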

Extending (NodeJS SDK)

Browserless comes with built-in extension capabilities, and allows for extending nearly any aspect of the system (for Version 2+). For more details on how to write your own routes, build docker images, and more, see our SDK README.md or simply run "npx @browserless.io/browserless create" in a terminal and follow the onscreen prompts.

Usage with other libraries

Most libraries allow you to specify a remote instance of Chrome to interact with. They typically look for a websocket endpoint, a host and port, or some address. Browserless supports these by default; however, if you're having issues, please open an issue in this project and we'll try to work with the library authors to get them integrated with browserless. Please note that in V2 we no longer support selenium or webdriver integrations.

You can find a much larger list of supported libraries on our documentation site.

Motivations

Running Chrome on lambda or on your own infrastructure is a fantastic idea, but in practice it's quite challenging in production. You're met with pretty tough cloud limits, possibly building Chrome yourself, and then dealing with odd invocation issues should everything else go OK. A lot of issues in various repositories come down to the challenges of getting Chrome running smoothly in AWS (see here). You can see for yourself by going to nearly any library and sorting issues by most commented.

Getting Chrome running well in docker is also a challenge, as there are quite a few packages you need in order to get Chrome running. Once that's done, there are still missing fonts, getting libraries to work with it, and limitations on service reliability. And that's ignoring CVEs, access controls, and scaling strategies.

All of these issues prompted us to build a first-class image and workflow for interacting with Chrome in a more streamlined way. With Browserless you never have to worry about fonts, extra packages, library support, security, or anything else. It just works reliably like any other modern web service. On top of that it comes with a prescribed approach on how you interact with Chrome, which is through socket connections (similar to a database or any other external appliance). What this means is that you get the ability to drive Chrome remotely without having to do updates/releases to the thing that runs Chrome since it's divorced from your application.

Licensing

SPDX-License-Identifier: SSPL-1.0 OR Browserless Commercial License.

If you want to use Browserless to build commercial sites, applications, or in a continuous-integration system that's closed-source then you'll need to purchase a commercial license. This allows you to keep your software proprietary whilst still using browserless. You can purchase a commercial license here. A commercial license grants you:

  • Priority support on issues and features.
  • On-premise running as well as running on public cloud providers for commercial/CI purposes for proprietary systems.
  • Ability to modify the source (forking) for your own purposes.
  • A new admin user-interface.

Not only does it grant you a license to run such a critical piece of infrastructure, but you are also supporting further innovation in this space and our ability to contribute to it.

If you are creating an open source application under a license compatible with the Server Side Public License 1.0, you may use Browserless under those terms.

browserless's People

Contributors

adriansillo, alexloyola, almogcohen, amotzte, andymrtnzp, anteprimorac, apeckham, arenstar, ashiknesin, blopker, brianhawley, cristian-gabbanini, deadwards90, denzonl, dependabot-preview[bot], dependabot[bot], devonsams, filipoliko, greenkeeper[bot], jasonparekh, joelgriffith, kadaan, kikobeats, louiswrwright, olofsj, snyk-bot, tomasc, unlikelyzero, zach-browserless, zinggi



browserless's Issues

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on all branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please delete the greenkeeper/initial branch in this repository, and then remove and re-add this repository to the Greenkeeper App’s white list on Github. You'll find this list on your repo or organization’s settings page, under Installed GitHub Apps.

Caching performance

Because each browser keeps its own separate userDataDir by default, and browserless maintains a pool of browsers to serve incoming requests, it seems to me that caching will be somewhat sub-optimal: for N browsers there are N caches to fill, more cache misses, and more cached data.

Is there any way to ensure that the same browser (and associated cache) is used for the same source, so that a consistent cache is used for a particular user? Perhaps based on some ID or key, or even IP address?

Ideally a common cache area would be used for all browsers, but I'm not sure this is possible without horrible conflicts. I did try passing the same userDataDir to all browsers and indeed it didn't work out well. Perhaps using a shared memory mount (/dev/shm) would work better since that is "supposed" to be shared between processes?
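One possible direction for the "same browser for the same user" idea above (purely illustrative, not a built-in Browserless feature) is to map a stable client key to a fixed browser index:

```javascript
// Hypothetical sketch: hash a stable client key (user ID, API key, or IP)
// to a fixed index into the browser pool, so repeat requests from the same
// source reuse the same browser and its cache. Simple hash-mod, no rebalancing.
function browserIndexFor(key, poolSize) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // 32-bit rolling hash
  }
  return hash % poolSize;
}
```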

An in-range update of puppeteer is breaking the build 🚨

The dependency puppeteer was updated from 1.8.0 to 1.9.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

puppeteer is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Release Notes for v1.9.0

Big Changes

API Changes

Added:

Bug Fixes

  • #2374 - feat(browser): Run puppeteer in browser (POC)
  • #2377 - Certificates error using puppeteer
  • #2721 - page.goto doesn't clear internal timeout when the browser is closed
  • #2888 - Cannot read property '_bodyLoadedPromiseFulfill' of null
  • #2918 - Support waitForNavigation for frames
  • #3104 - Full page screenshot fails when defaultViewport is null
  • #3109 - Is it clear what <...Type> means in the docs?
  • #3204 - docs: mention require('puppeteer-core')
  • #3221 - As for puppeteer to emulate the movement of the mouse while pressing?
  • #3232 - Add documentation and examples for iframe API.
  • #3234 - Black render with omitBackground: true
  • #3340 - Does --filter=SomeTest do anything when running npm run unit

Raw Notes

4abf7d1 - docs(bundling): add docs about bundling for web (#3348)
8becb31 - test: add failing test for page.select (#3346)
5ebfe1a - docs(contributing): remove the --filter note (#3342)
cd54ce3 - fix(types): upgrade node types to 8.10.34 (#3341)
c9657f8 - docs(api.md): minor grammar and consistency fixes (#3320)
c237947 - chore(types): upgrade to TypeScript 3.1.1 (#3331)
842fee8 - fix(page): full page screenshot when defaultViewport is null (#3306)
e75e36b - feat(chromium): roll Chromium to r594312 (#3310)
85aca8e - chore(testserver): prepare test server (#3294)
9c89090 - chore(testrunner): fix readme description (#3293)
12e317c - chore: add .npmignore for testrunner (#3290)
5b3ddf5 - chore(testrunner): bump version to v0.5.0-post (#3291)
907d9be - chore: prepare testrunner to be published to npm (#3289)
4e48dfc - feat(launcher): add experimental "transport" option to pptr.connect (#3265)
5acf953 - feat(frame): introduce Frame.goto and Frame.waitForNavigation (#3276)
ad49f79 - docs(api.md): Fix description of SecurityDetails class (#3277)
0b9d8a6 - feat: async stacks for all "async" public methods (#3262)
9223bca - refactor: move navigation management to FrameManager (#3266)
27477a1 - docs(api.md): Fix typo (#3273)
b97bddf - refactor: unify response tracking in page.goto and waitForNavigation (#3258)
a1a211d - chore: nicer stack highlight (#3259)
a4abb4a - feat(chromium): Roll Chromium to r591618 (#3263)
7f00860 - fix(browserfetcher): Fix windows fetching (#3256)
f5d388a - docs(api.md): add example for Mouse class (#3244)
d547b9d - fix(browser): browser closing/disconnecting should abort navigations (#3245)
f0beabd - chore: drop DEBUG for public API calls (#3246)
d929f7e - fix: set JPG background to white when omitBackground option is used (#3240)
6ec3ce6 - chore: make sure Puppeteer bundling works (#3239)
f49687f - docs(api.md): add frame example (#3237)
a582acd - feat(chromium): roll Chromium to r590951 (#3236)
7ec0801 - fix: expect Network.responseReceived event is never dispatched (#3233)
c644a3b - test: make sure zero-width screenshots don't hang (#3214)
9c4b6d0 - refactor: use browser-compliant interface of 'ws' (#3218)
56b3bd8 - docs(readme.md): Added yarn guide also to puppeteer-core (#3227)
6581ee9 - docs: add ndb as a debugging tip (#3195)
1b2c811 - refactor: move Connection to use ConnectionTransport (#3217)
c967aeb - docs(api.md): add an include statement for puppeteer-core (#3213)
c5511ec - docs(api.md): Clarify how to call page.setCookie (#3215)
78e9d5c - chore: bump version to v1.8.0-post (#3212)

Commits

The new version differs by 40 commits.

  • f6c05e6 chore: mark version v1.9.0 (#3350)
  • 4abf7d1 docs(bundling): add docs about bundling for web (#3348)
  • 8becb31 test: add failing test for page.select (#3346)
  • 5ebfe1a docs(contributing): remove the --filter note (#3342)
  • cd54ce3 fix(types): upgrade node types to 8.10.34 (#3341)
  • c9657f8 docs(api.md): minor grammar and consistency fixes (#3320)
  • c237947 chore(types): upgrade to TypeScript 3.1.1 (#3331)
  • 842fee8 fix(page): full page screenshot when defaultViewport is null (#3306)
  • e75e36b feat(chromium): roll Chromium to r594312 (#3310)
  • 85aca8e chore(testserver): prepare test server (#3294)
  • 9c89090 chore(testrunner): fix readme description (#3293)
  • 12e317c chore: add .npmignore for testrunner (#3290)
  • 5b3ddf5 chore(testrunner): bump version to v0.5.0-post (#3291)
  • 907d9be chore: prepare testrunner to be published to npm (#3289)
  • 4e48dfc feat(launcher): add experimental "transport" option to pptr.connect (#3265)

There are 40 commits in total.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Loading issue

Hey,

I have an issue with a website: when I try to load it in my debugger it fails, but it works in the browserless debugger.

On my debugger: (screenshot, 2018-10-17 13:16)

On the browserless.io debugger: (screenshot, 2018-10-17 13:14)

I know the issue comes from the website itself, but I'm wondering why I get this error.

Possible EventEmitter memory leak

Hi,

This page explains that it is good practice to close the browser connection after processing.

After 10 connections, the Browserless container throws this log:

(node:26) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGHUP listeners added. Use emitter.setMaxListeners() to increase limit

Here is an example :

  1. Run a Browserless container in Docker:
    docker run -p 3000:3000 browserless/chrome:release-puppeteer-1.3.0

  2. Exec this code example :

const puppeteer = require('puppeteer');

(async function () {

    // Synchronous Loop
    for (let i = 0; i < 20; i++) {
        const browser = await puppeteer.connect({
            browserWSEndpoint: `ws://127.0.0.1:3000`,
        });

        const page = await browser.newPage();

        try {
            await page.goto('https://www.google.com');
            await page.screenshot({
                path: './browserless.png'
            });
            browser.close();

            await new Promise(resolve => setTimeout(resolve, 5000));
            console.log('close: ' + i)
        } catch (error) {
            console.error({
                error
            }, 'Something happened!');
            browser.close();
        }
    }
})();

After 10 loops, a warning appears in the Browserless container terminal.

An in-range update of husky is breaking the build 🚨

The devDependency husky was updated from 1.1.0 to 1.1.1.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

husky is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 8 commits.

See the full diff


error: HTTP/1.1 500 The module 'request' is not whitelisted in VM

My code is like this:

const request = require('request');
module.exports = async ({ page, context, id }) => { };

Then I get this error message. And on your website you say: "You can currently require 'url', 'util', 'path', 'querystring', 'lodash', 'node-fetch', and 'request' in your functions. Please contact us for adding a module".
So I wonder: am I writing my code the wrong way?

Using with docker-compose ?

Hi! Thanks for this awesome docker image!
How do I use it with docker-compose?

Here is my docker-compose.yml.

version: '3' # https://blog.codeship.com/using-docker-compose-for-nodejs-development/
services:
  app:
    build:
      dockerfile: Dockerfile.dev
      context: .
    image: app
    environment:
      - TIMBER=true
      - NODE_ENV=development
    command: node --inspect=0.0.0.0:3001 --require dotenv/config ./dist/index.js
    ports:
      - "8080:8080"
      - "9229:3001"
    links:
      - browserless
    depends_on:
      - browserless
  browserless:
    image: browserless/chrome:latest
    container_name: "browserless"
    environment:
      - DEBUG=browserless/chrome
      - MAX_CONCURRENT_SESSIONS=10
    ports:
        - 3002:3000

What should I put as the URL to connect to it with puppeteer?
app.locals.browser = await puppeteer.connect({browserWSEndpoint: 'ws://localhost:3002'}) doesn't work.

Thanks for your help!
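For what it's worth, a likely gotcha in setups like this (an assumption about this particular compose file, but consistent with how Compose networking works): from inside the app container, the service name and internal port apply, not the host-published port.

```javascript
// Sketch for the compose file above: the published port (3002) only works
// from the host machine; inside the Compose network, other containers reach
// the service by its name and *internal* port (browserless:3000).
function composeEndpoint(fromHost) {
  return fromHost ? 'ws://localhost:3002' : 'ws://browserless:3000';
}

// Inside the `app` container:
// await puppeteer.connect({ browserWSEndpoint: composeEndpoint(false) });
```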

get WebSocket is not open: readyState 3 (CLOSED) from puppeteer in second fetching html loop

Hi :)
In my scenario I have 10 "active" pages. In the initialize phase I connect to browserless with puppeteer and create a Chromium instance.
In an iterating loop I want to read each page's HTML.
Here is the browserless package.json start configuration:
"dev": "npm run build && cross-env ENABLE_CORS=true MAX_CONCURRENT_SESSIONS=1 MAX_QUEUE_LENGTH=20 PREBOOT_CHROME=true CHROME_REFRESH_TIME=3600000 KEEP_ALIVE=true DEBUG=browserless* PORT=3030 node build/index.js"
As you can see, I set the PREBOOT_CHROME=true and KEEP_ALIVE=true properties.
My problem:
As I mentioned, in the initialize phase I connect to browserless with puppeteer.connect. In the first iteration 10 Chromium instances are created. In the second step (NOT the second iteration, but still within the first) I use browser.newPage to get a page, and then page.content to get its content. Note that puppeteer.connect (the first step) is exported from a module and is a singleton (I think :) ).

The first iteration completes successfully, but after it browserless closes the Chromium instances, and in every subsequent iteration I get the WebSocket is not open: readyState 3 (CLOSED) error!

What should I do? Can you help me, @joelgriffith?

Browserless proxy terminates session in response to cookie DELETE request

I've been trying to get capybara working with browserless in a Rails app using selenium. During the test cycle, capybara tells the selenium driver to reset cookies. Selenium uses a DELETE request to /session/XXXXXXXXXX/cookie to handle this. It appears that browserless's proxy somewhat naively treats all DELETE requests by closing the session, leading all subsequent requests from capybara/selenium to fail.

Here's a small script that demonstrates the issue:

require 'selenium-webdriver'
Selenium::WebDriver.logger.level = 'debug'

chrome_options = Selenium::WebDriver::Remote::Capabilities.chrome(
  "chromeOptions" => { args: %w( --headless --no-sandbox --disable-gpu ) }
)

s = Selenium::WebDriver::Remote::Driver.new(desired_capabilities: chrome_options, url: "http://chrome:4444/webdriver")

s.manage.delete_all_cookies
s.navigate.to('about:blank') # Will fail

It seems from the documentation that there are a handful of DELETE requests in the spec. I think the appropriate thing to do is to only close the session when encountering a DELETE to the /session/ID url.

I'm gonna mess around with a PR for this, but feel free to comment in the meantime if you have encountered this and feel my approach is ill advised for some reason.
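A sketch of the narrower matching proposed above: treat only a DELETE to exactly /session/:id as session teardown, and proxy other DELETEs (such as /session/:id/cookie) through untouched. The regex is illustrative, not Browserless's actual implementation.

```javascript
// Hypothetical matcher: only DELETE /session/:id (with or without trailing
// slash) counts as teardown; deeper paths like /session/:id/cookie pass
// through to the browser as normal WebDriver commands.
function isSessionTeardown(method, path) {
  return method === 'DELETE' && /^\/session\/[^/]+\/?$/.test(path);
}
```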

Browserless Debugger not fetching results

I followed the Docker Quickstart:

  1. docker pull browserless/chrome
  2. docker run --shm-size=1gb -p 3000:3000 browserless/chrome
  3. Visit http://localhost:3000/ to use the interactive debugger.

I tried this on two separate VPS's, same issue - running Ubuntu Server 16.04.3 LTS.

It's up on one of my test systems: http://144.217.188.229:3000/


Edit: I've also tried the Node Quickstart, same issue.

[nodemon] failed to start process, "ts-node" exec not found

Hi,

the docker version is working properly, but if I try npm run dev I get the following error:

[nodemon] failed to start process, "ts-node" exec not found

What works in my case:

  1. changing port to 8081 in src/config.ts
  2. npm run build
  3. npm run start

Can't run page.$$

docker image tag: puppeteer-1.9.0
puppeteer-core: 1.9.0
Node.js version: 8.9.1

After a page is open, I tried to run page.$$(selector). At that point, this error message showed up in the logs of the Node app:

Error: Protocol error (Target.activateTarget): Target closed.

What does this error message mean? How can I fix it?

An in-range update of express-http-proxy is breaking the build 🚨

The dependency express-http-proxy was updated from 1.3.0 to 1.4.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

express-http-proxy is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 2 commits.

See the full diff


Dynamically uploading files with browserWSEndpoint

Hey,
I'm trying this out as a potential service for web testing, and one of the first tests I tried was file upload.
When connected through puppeteer with browserWSEndpoint, I just can't seem to upload files dynamically, while without browserWSEndpoint it works fine.
I've even tried passing the flags --allow-file-access-from-files and --disable-web-security, but without any luck.

Here's how i'm trying:

const browser = await puppeteer.connect({
    browserWSEndpoint: 'ws://localhost:32769',
    headless: true
})
const page = await browser.newPage();
await page.goto('http://localhost:8080', {waitUntil: 'networkidle2'});

(...)

let testUpload = async () => {
    const upload = await page.$("input");
    await upload.uploadFile('test.jpg');
    await page.screenshot({path: 'test_s_'+Date.now()+'.png'});
}

(...)

I'm 100% sure that everything exists, whether the DOM elements or the files.
Any ideas, or is this a limitation?

Cheers

Repo should link to additional docs

You have other documentation on https://docs.browserless.io, but you don't point that out in the README or anywhere else in the repo. I was having an issue with the debugger session disconnecting and was digging into whether it was an issue with my proxy server, but it turns out all I needed was to know about the CONNECTION_TIMEOUT env variable mentioned here. So I think it would make sense to specifically call out the additional documentation somewhere obvious, such as the README, or to combine all the docs for the open source project and separate them from the service docs.

I also just want to add a 👏👏: great job on the project. I've been trying to get something set up with puppeteer and headless Chrome for a while, and this is super useful and much appreciated. And the web debugger is 🔥😍.

await page.click didn't resolve after append chrome inspector

First of all, thank you for this great project. I'm working on another project that shows puppeteer action steps via the Chrome inspector, and I got lots of ideas from your project.

Now I'm stuck on a very weird problem: once I attach the Chrome inspector to the puppeteer session (I load a frame whose URL looks like https://chrome-devtools-frontend.appspot.com/serve_file/@7f3cdc3f76faecc6425814688e3b2b71bf1630a4/inspector.html?ws=127.0.0.1:3000/devtools/page/(CA197D0A33141BE44B19C0603ADD9E7C)&remoteFrontend=true ), the await page.click('dom-selector') action always blocks. But when I run puppeteer without the Chrome inspector, everything works fine.

No error shows up. I tried debugging the page.click function, and the code runs to the following:

send(method, params = {}) {
    if (!this._connection)
      return Promise.reject(new Error(`Protocol error (${method}): Session closed. Most likely the page has been closed.`));
    const id = ++this._lastId;
    const message = JSON.stringify({id, method, params});
    debugSession('SEND ► ' + message);
    this._connection.send('Target.sendMessageToTarget', {sessionId: this._sessionId, message}).catch(e => {
      // The response from target might have been already dispatched.
      if (!this._callbacks.has(id))
        return;
      const callback = this._callbacks.get(id);
      this._callbacks.delete(id);
      callback.reject(e);
    });
    return new Promise((resolve, reject) => {
      this._callbacks.set(id, {resolve, reject, method});
    });
  }

It returns a promise, but this promise never resolves, so the code after page.click never runs.

I've tried to figure out this problem for several days with no progress, so I'm opening an issue here to ask for help. Do you have any idea about this problem?

I tried the code below in browserless, and sometimes the problem shows up.

await page.goto("https://weidian.com/item.html?itemID=1692458617")
await page.click("#buy_now");
await page.waitForSelector('#item_control');
await page.click('#sku_ul > li:nth-child(2) > a')

Support for Cookies Object?

I emailed earlier but wanted to create an official issue about adding support for cookies in Chrome. This could be accomplished by passing a cookies object.

Let me know if you have follow up questions.

This site can’t be reached ... unexpectedly closed the connection.

I have installed the browserless/chrome docker image (as a service), and have opened port 3000, so that it should in theory be accessible via browserless.mysite.com:3000 (like lots of other web apps I've deployed in the same way). But when I browse to browserless.mysite.com:3000 I just get:

This site can’t be reached
browserless.mysite.com unexpectedly closed the connection

Any ideas?

Site can't load due to audio

Hi,

(it's me again)

I guess this error is probably due to the way the website has been developed, but I'm wondering if there is a fix for this.
When screenshotting this website, I get an error that prevents it from loading:
Uncaught (in promise) DOMException: Failed to load because no supported source was found.

https://chrome.browserless.io/?script=await%20page.goto(%27http%3A%2F%2Fhki.paris%2Fhome%27)%3B

With dumpio, I got more info:

[0926/072802.585761:INFO:CONSOLE(12)] "The Web Audio autoplay policy will be re-enabled in Chrome 70 (October 2018). Please check that your website is compatible with it. https://goo.gl/7K7WLu", source: http://hki.paris/build/desktop.js?v=1 (12)
[0926/072810.183949:ERROR:render_media_log.cc(30)] MediaEvent: MEDIA_ERROR_LOG_ENTRY {"error":"FFmpegDemuxer: no supported streams"}
[0926/072810.416384:ERROR:render_media_log.cc(30)] MediaEvent: PIPELINE_ERROR DEMUXER_ERROR_NO_SUPPORTED_STREAMS
[0926/072810.921032:INFO:CONSOLE(0)] "Uncaught (in promise) NotSupportedError: Failed to load because no supported source was found.", source: http://hki.paris/home (0)

I tried to install FFmpeg on my server, but it didn't change anything.

Any idea?

Thanks!

Edit:
I'm using chrome & not chromium

Edit 2:
Here is the list of args I'm launching puppeteer with:

args: [
  '--disable-gpu',
  '--no-sandbox',
  '--disable-setuid-sandbox',
  '--disable-translate',
  '--mute-audio',
  '--hide-scrollbars',
  '--disable-translate',
  '--ignore-certificate-errors',
  '--ignore-certificate-errors-spki-list'
],

An in-range update of ts-jest is breaking the build 🚨

The dependency ts-jest was updated from 23.10.0 to 23.10.1.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

ts-jest is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 13 commits.

  • d9c5b45 Merge pull request #743 from huafu/release/23.10.1
  • e4a3a09 chore(release): 23.10.1
  • ab94359 Merge pull request #742 from huafu/fix-740-no-js-compilation-with-allow-js
  • a844fd4 Merge branch 'master' into fix-740-no-js-compilation-with-allow-js
  • 18dced1 Merge pull request #741 from huafu/e2e-weird-deep-paths
  • 9e7d6a0 test(config): adds test related to allowJs
  • 374dca1 fix(compile): js files were never transpiled thru TS
  • 70fd9af ci(cache): removes some paths from the caching
  • c12dfff fix(windows): normalize paths
  • 0141098 test(e2e): deep paths and coverage
  • 6ccbff3 Merge pull request #736 from huafu/detect-import-and-throw-if-none
  • a2a4be2 fix(config): warn instead of forcing ESM interoperability
  • 21644da Merge pull request #735 from huafu/master

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Implement option to set server host

I encountered an issue in our gitlab-ci pipeline: sometimes a connection reset error occurred when two or more docker containers communicated with each other. I found that, to solve this, I should set the server host to bind to IP 0.0.0.0. I updated the IP address and it seems to be working without any issues now. We created our own image with this change, but it would be great to have this integrated into the official repo.

An in-range update of lighthouse is breaking the build 🚨

The dependency lighthouse was updated from 3.1.1 to 3.2.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

lighthouse is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Release Notes for 3.2.0 (2018-09-26)

Full Changelog

New Audits

  • add js-libraries audit, just listing detected js libs (#6081)

Faster

  • driver: deliver trace as events rather than a stream (#6056)
  • network-recorder: consider iframe responses finished. helps avoid pageload timeout (#6078)
  • replace WebInspector traceparser with native JSON.parse (#6099)

Core

  • add emulatedFormFactor setting (#6098)
  • remove some trivial uses of WebInspector (#6090)
  • use cssstyle to parse CSS colors instead of WebInspector (#6091)
  • initial refactor of computedArtifact import/caching (#5907)
  • asset-saver: stop creating screenshot files during --save-assets (#6066)
  • content-width: not applicable on desktop (#5893)
  • driver: add check to make sure Runtime.evaluate result exists (#6089)
  • icons: Add PNG check to manifest icon validation (#6024)
  • lhr: add top-level runtimeError (#6014)
    • gather-runner: include error status codes in pageLoadError (#6051)
    • smooth rough edges of pageLoadError display and reporting (#6083)
  • net-request: transferSize now shared via 'X-TotalFetchedSize' (#6050)
  • don't allow analysis of file:// urls (#5936)

Report

  • dont show zero ms savings in preconnect, preload audits (#5983)
  • align table headings & columns left/right (#6063)
  • audit: make dom-size table prettier (#6065)
  • cursor:pointer on Passed Audits, etc (#5977)
  • psi: remove redundant varience disclaimer (#6110)
  • util: ✅ audits should be in Passed Audits (#5963)
  • vulnerable-jslibs: tweak snyk link for highlighted matches (#6096)
  • xbrowser: replace Typed OM getComputedStyle() with CSSOM equivalent (#5984)

CLI

  • add --print-config flag (#6107)

Deps

Docs

  • readme: add lighthouse4u (#6008)
  • readme: updated report screenshot to 3.1.0 (#6042)
  • readme: add lighthouse-badges to related projects (#5969)
  • recipes: update custom-audit package.json (#6007)
  • releasing: minor updates (#5345)

i18n

  • roll latest strings from TC (#6109)
  • mv locale files (#5981)
  • speed up replacement regex (#6072)

Misc

  • bump bundlesize threshold a little more (#6055)
  • runner: added locale to settings that can change between -G and -A (#6080)
  • tsc: add type checking to sentry usage (#5993)
Commits

The new version differs by 44 commits.

  • 081864e 3.2.0 (#6120)
  • e15f87d docs(releasing): minor updates (#5345)
  • 34b55b3 cli: add --print-config flag (#6107)
  • 72b59c5 core(content-width): not applicable on desktop (#5893)
  • a097a23 report(psi): remove redundant varience disclaimer (#6110)
  • 3a6f6c5 deps: [email protected] (#6106)
  • f5c043d core: add emulatedFormFactor setting (#6098)
  • b8d1496 i18n: roll latest strings from TC (#6109)
  • b49a1d2 deps: [email protected] (#6102)
  • fed4a88 report(vulnerable-jslibs): tweak snyk link for highlighted matches (#6096)
  • ce96d76 core(asset-saver): stop creating screenshot files during --save-assets (#6066)
  • 67302a0 core: update chrome-devtools-frontend to latest (#6101)
  • f0e6dd9 core(driver): add check to make sure Runtime.evaluate result exists (#6089)
  • 14d6450 core: replace WebInspector traceparser with native JSON.parse (#6099)
  • 265d956 core: remove some trivial uses of WebInspector (#6090)

There are 44 commits in total.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

user-data-dir

In order to get caching of service workers working in browserless (and puppeteer generally) I find that it's necessary to set a userDataDir via the --user-data-dir passed to puppeteer.launch(). This may be necessary to get plain http caching working too... not sure.

You can see this (for service worker caching) via the following: when the flag is passed via the ws endpoint, the browser is able to cache the content in the service worker, and subsequently (after the content has been cached) you'll see "retrieved from service worker cache". Without the flag, you just get "retrieved from network" no matter how many times you request the content.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.connect({
    browserWSEndpoint: 'ws://browserless.mydomain.com:3000?--user-data-dir=/tmp'
  });
  const page = await browser.newPage();
  const url = 'https://cloud3squared.com/files/sw-stackoverflow-demo/index.html';
  await page.goto(url, { waitUntil: 'networkidle0' });
  await page.screenshot({ path: './temp.png' });
  await browser.close();
})();

This is fine when only one browser has been instantiated by browserless under the hood. The problem comes when multiple browsers are all passed the same --user-data-dir and are all trying to access and write to the same storage area.

At least, I'm seeing a few issues when using browserless which I think result from this. It's partly educated guesswork at this stage.

I wonder if it's necessary to be able to somehow use a different --user-data-dir for each different browser that is instantiated, so that there is no conflict?

Or maybe it's possible to use /dev/shm (if it's big enough and hasn't been disabled) for --user-data-dir (I think that /dev/shm is "supposed" to be shared between different processes) ... I haven't yet tried that ... mainly because I can't since browserless disables /dev/shm by default.

P.S. the page rendered by browserless when "fetched from network" in the above example seems to be missing (unable to render) this character... the "tick" is rendered fine (for "fetched from service worker cache")

MaxListenersExceededWarning if MAX_CONCURRENT_SESSION>10 and PREBOOT_CHROME=true

If you use PREBOOT_CHROME and MAX_CONCURRENT_SESSIONS > 10, you get a MaxListenersExceededWarning for a bunch of events:

% docker run -e "PREBOOT_CHROME=true" -e "MAX_CONCURRENT_SESSIONS=11" -p 3000:3000 browserless/chrome:latest
(node:27) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 exit listeners added. Use emitter.setMaxListeners() to increase limit
(node:27) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGINT listeners added. Use emitter.setMaxListeners() to increase limit
(node:27) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGTERM listeners added. Use emitter.setMaxListeners() to increase limit
(node:27) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGHUP listeners added. Use emitter.setMaxListeners() to increase limit

From Node documentation:

By default EventEmitters will print a warning if more than 10 listeners are added for a particular event.

Maybe browserless should increase the allowed max listeners to match MAX_CONCURRENT_SESSIONS?
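Node's cap is adjustable per process, so a fix along these lines could look like the following sketch (reading MAX_CONCURRENT_SESSIONS from the environment is an assumption about where browserless keeps that value):

```javascript
// Raise the process-level listener cap to match the configured concurrency,
// so one exit/SIGINT/SIGTERM/SIGHUP listener per prebooted Chrome fits
// without tripping the default cap of 10.
const maxSessions = Number(process.env.MAX_CONCURRENT_SESSIONS) || 10;
process.setMaxListeners(maxSessions + 1);
```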

An in-range update of @types/node is breaking the build 🚨

The dependency @types/node was updated from 10.11.5 to 10.11.6.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

@types/node is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

RFC: Multiple version of Chrome in one image

Wanted to reach out and gather thoughts on bundling the last 2-5 versions of Chrome in a single image vs maintaining numerous images whose only difference is the Chrome version.

PROS

  • Less Docker and related build maintenance
  • Easier to understand build tools
  • Quick way to test different versions for performance and other intricacies
  • Can apply semver builds in docker (vs confusing release-puppeteer-x.x.x)

CONS

  • Bigger image bundles
  • Slightly more complex runtime (have to specify the puppeteer version in the URL or something similar).
  • Having to download several versions of Chrome, and building relevant tooling for each in the debugger (all the tooltip stuff).
  • Things like pre-booting Chrome become harder since you'd have to pre-boot one version only vs many.

Thoughts? Comments? Concerns?

DEPTH_ZERO_SELF_SIGNED_CERT Nginx Proxy

Hey, I can connect to my https://browserless.site just fine and the wss initiates just fine. However, when I'm using wss://browserless.site I get an error that returns DEPTH_ZERO_SELF_SIGNED_CERT. Any idea how I can fix this?

I did have it working over ws:// as the Getting Started recommends but I'd like to have my connection encrypted if possible.
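DEPTH_ZERO_SELF_SIGNED_CERT comes from the Node client refusing the proxy's self-signed certificate, not from browserless itself. Two standard Node-level workarounds, both assuming the certificate really is yours: trust the cert explicitly by launching the client with NODE_EXTRA_CA_CERTS=/path/to/cert.pem, or, less safely, disable verification for the client process:

```javascript
// Least-safe option: skip TLS certificate verification for this process only.
// Prefer NODE_EXTRA_CA_CERTS=/path/to/your/cert.pem on the client instead.
process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0';
// ...then connect to wss://browserless.site as usual.
```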

An in-range update of @types/node is breaking the build 🚨

The dependency @types/node was updated from 10.11.0 to 10.11.1.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

@types/node is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on all branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please delete the greenkeeper/initial branch in this repository, and then remove and re-add this repository to the Greenkeeper App’s white list on Github. You'll find this list on your repo or organization’s settings page, under Installed GitHub Apps.

Unexpected PDF file size

When converting an HTML file that contains emojis, my PDF becomes extremely large. Without any emoji characters, the file is around 1.5MB; with emoji characters, it becomes around 8MB. Other than these emojis, the file is nothing special, just some plain text.

I saw that one of the advertised features of Browserless is that fonts/emojis are supported out of the box, so this seems an unusual issue. I can print the file as a PDF myself and it will be less than 1MB, so browserless seems to make this file exceptionally large.

Possibility of defining remote userDataDir

Hello! First of all, thanks for the amazing project. It helps a lot.

My problem is: I'm running browserless (hosted) in a ECS cluster. So my scraping jobs just connect to it (on the launch method) and get things running.

I need to keep a log-in session alive for several hours - without actually needing to keep a Chrome instance alive for that long (most of the time I'm just waiting for things to happen). So I'm saving the userDataDir somewhere and, when I need to get back to that session, I launch a new Chrome using this previously saved dir.

But I'm saving the userDataDir from a different machine than the one running Chrome.

Can I pass, somehow, this dir to browserless (without being way too hacky)?

Timeout issues ?

I'm wondering, would there be any timeout issues using this technique?
For example, if I launched a big crawl running for 24+ hours.

I'd say no, but maybe you've pushed it to its limits :)

Thanks

An in-range update of lodash is breaking the build 🚨

The dependency lodash was updated from 4.17.10 to 4.17.11.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

lodash is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Problem with docker run option --shm-size=1gb

According to this it is necessary to pass --shm-size=1gb to docker run.

The problem I have is that I would not be firing browserless up via docker run but rather docker service create... and it's not possible to pass --shm-size as an option to that. There is a possible workaround to pass --mount instead, but to cut a long story short... that's not possible for me.

I have encountered this shm issue before, when deploying headless chrome to docker in a different way, and for that I am passing the option --disable-dev-shm-usage to puppeteer.launch()... based on this advice.

These two different options (i.e. --shm-size=1gb passed to docker run and --disable-dev-shm-usage passed to puppeteer.launch()) seem to be addressing the same problem, so...

Is it possible to pass a custom option (and in particular, --disable-dev-shm-usage) to puppeteer.launch() in browserless?

I think, if that were possible, it would solve my problem!

Would it be better anyway to have browserless pass --disable-dev-shm-usage to puppeteer.launch() by default, since there would then no longer be any need to advise users to pass --shm-size=1gb to docker run? Just a simple docker run -p 3000:3000 browserless/chrome.

But that, as well as the ability to pass other custom options, would be nice.

EDIT: OK, now that I look at the browserless launch code, I see that you do already pass --disable-dev-shm-usage to puppeteer.launch(). So I suppose the question above becomes: is it actually still necessary to pass --shm-size=1gb to docker run if --disable-dev-shm-usage is also being passed to puppeteer.launch()? Aren't they there for the same reason?

Document how to run Docker container with custom arguments

Having read README.md, it is unclear how to start a docker container with custom arguments.

I have tried:

$ docker run -p 3000:3000 browserless/chrome --proxy-server=127.0.0.1:8050

But that does not work.

Furthermore, even if this did work, how would one then acquire the host IP address? That requires checking ifconfig on the container.

Perhaps what's needed is an initialisation script (JavaScript) that could be loaded into the container and used to configure the instance.
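Until this is documented, note that flags like --user-data-dir are passed elsewhere in these issues via the websocket query string rather than docker run. If --proxy-server works the same way (an assumption, not something I've verified), building the endpoint would look like:

```javascript
// Attach the Chrome flag to the connection URL instead of container args.
const endpoint = new URL('ws://localhost:3000');
endpoint.searchParams.set('--proxy-server', '127.0.0.1:8050');

// Hand this to puppeteer.connect({ browserWSEndpoint: endpoint.toString() }).
console.log(endpoint.toString());
```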

Support Puppeteer 1.5

With Puppeteer 1.5 connecting fails with:

     Error: Protocol error (Target.getBrowserContexts): 'Target.getBrowserContexts' wasn't found undefined
      at Promise (node_modules/puppeteer/lib/Connection.js:86:56)
  From previous event:
      at Connection.send (node_modules/puppeteer/lib/Connection.js:85:12)
      at Function.connect (node_modules/puppeteer/lib/Launcher.js:257:50)
      at <anonymous>

Proxy per page?

Hey!

One can set a proxy like:

  const browser = await puppeteer.launch({
    headless: false,
    args: [
      '--proxy-server=PROXY_URI'
    ],
  });

Is there a possibility to set different proxies for different pages in the same browser instance?
Like:

browser1, page1 => proxy1
browser1, page2 => proxy2

Cheers

Dockerhub tag for build with puppeteer-1.1.1

Hello,

I was wondering if it would be possible to create a tag on Docker Hub for the build with puppeteer-1.1.1.
I'd prefer to use this over latest to prevent potentially breaking changes from being pulled when I rebuild my service.

Thanks a lot,
Andy

Video tag support

Hi,

Puppeteer doesn't support the mp4 codec by default, and the only way to fix that is to point puppeteer.launch at a full Chrome install (see here and here).

How could it work with browserless?
Is there a way to launch browserless with a specific Chrome app?

Thanks!
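For a plain Puppeteer setup (outside browserless), the Chrome-path approach in those links boils down to puppeteer.launch({ executablePath }). A sketch of the usual pattern; the CHROME_PATH variable and fallback path are placeholders, not anything browserless defines:

```javascript
// Resolve a full Chrome binary (with proprietary codecs) instead of the
// bundled Chromium. Env var name and fallback path are illustrative only.
function resolveChromePath() {
  return process.env.CHROME_PATH || '/usr/bin/google-chrome';
}

// e.g. puppeteer.launch({ executablePath: resolveChromePath() })
```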

An in-range update of joi is breaking the build 🚨

The dependency joi was updated from 13.6.0 to 13.7.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

joi is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 9 commits.

  • e4b82f6 13.7.0
  • 6bbbdaf Add documentation for #1562.
  • 1e837de Merge pull request #1599 from rluba/patch-1
  • fd1911a Link to isemail for email() options
  • a496210 Merge pull request #1572 from dnalborczyk/patch-1
  • 73f3efd Update API.md
  • da70a73 Merge pull request #1562 from kanongil/symbol-support
  • 070d3c9 Remove symbol key for map and revise stringification
  • 8f7f242 Add symbol() type

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

You’re doing an amazing job!

Hi, just want to say you really are an inspiration towards an indiehacker path. Always good to see someone who’s hustling and making it work and living life on their own terms.
