
docker-puppeteer's Introduction

puppeteer docker image

docker image with Google Puppeteer installed

and screenshots scripts

nodesource/node

docker tags

  • latest
  • 1
  • 1.1.1
  • 1.1.0
  • 1.0.0
  • 0
  • 0.13.0
  • 0.12.0
  • 0.11.0
  • 0.10.2
  • 0.10.1
  • 0.10.0
  • 0.9.0

install

docker pull alekzonder/puppeteer:latest
# OR
docker pull alekzonder/puppeteer:1.0.0
# OR
docker pull alekzonder/puppeteer:1

before usage

  1. pass the --no-sandbox and --disable-setuid-sandbox args when launching the browser:
const puppeteer = require('puppeteer');

(async() => {

    const browser = await puppeteer.launch({
        args: [
            '--no-sandbox',
            '--disable-setuid-sandbox'
        ]
    });

    const page = await browser.newPage();

    await page.goto('https://www.google.com/', {waitUntil: 'networkidle2'});

    await browser.close();

})();
  2. if a page crashes with BUS_ADRERR (chromium issue), increase the shared memory size with the --shm-size argument on docker run:
docker run --shm-size 1G --rm -v <path_to_script>:/app/index.js alekzonder/puppeteer:latest
  3. if you see random navigation errors (unreachable url), IPv6 being enabled in docker is a likely cause. The navigation errors are caused by ERR_NETWORK_CHANGED (-21) in chromium. Disable IPv6 in your container with --sysctl net.ipv6.conf.all.disable_ipv6=1:
docker run --shm-size 1G --sysctl net.ipv6.conf.all.disable_ipv6=1 --rm -v <path_to_script>:/app/index.js alekzonder/puppeteer:latest
  4. add --enable-logging for chrome debug logging (http://www.chromium.org/for-testers/enable-logging):
const puppeteer = require('puppeteer');

(async() => {

    const browser = await puppeteer.launch({args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',

        // debug logging
        '--enable-logging', '--v=1'
    ]});

    // ... use the browser as in the first example ...

    await browser.close();

})();

usage

mount your script to /app/index.js

docker run --shm-size 1G --rm -v <path_to_script>:/app/index.js alekzonder/puppeteer:latest

custom script from dir

docker run --shm-size 1G --rm \
 -v <path_to_dir>:/app \
 alekzonder/puppeteer:latest \
 node my_script.js

screenshots tools

simple screenshot tools in image

docker run --shm-size 1G --rm -v /tmp/screenshots:/screenshots \
 alekzonder/puppeteer:latest \
 <screenshot,full_screenshot,screenshot_series,full_screenshot_series> 'https://www.google.com' 1366x768

screenshot tools syntax

<tool> <url> <width>x<height> [<delay_in_ms>]

  • delay_in_ms: is optional (defaults to 0)
    • Waits for delay_in_ms milliseconds before taking the screenshot
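
The argument syntax above can be sketched in Node.js as follows. This is only an illustration of how the three arguments fit together, not the image's actual implementation; the helper name parseScreenshotArgs and its error messages are assumptions.

```javascript
// Hypothetical helper: parses "<url> <width>x<height> [<delay_in_ms>]".
// Not the code shipped in the image, just a sketch of the syntax.
function parseScreenshotArgs(argv) {
    const [url, size, delay] = argv;

    if (!url) {
        throw new Error('url is required');
    }

    const match = /^(\d+)x(\d+)$/.exec(size || '');
    if (!match) {
        throw new Error('size must look like <width>x<height>, e.g. 1366x768');
    }

    // delay_in_ms is optional and defaults to 0
    const delayMs = delay === undefined ? 0 : Number(delay);
    if (!Number.isInteger(delayMs) || delayMs < 0) {
        throw new Error('delay_in_ms must be a non-negative integer');
    }

    return {
        url,
        width: Number(match[1]),
        height: Number(match[2]),
        delayMs
    };
}

console.log(parseScreenshotArgs(['https://www.google.com', '1366x768']));
```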

screenshot

docker run --shm-size 1G --rm -v /tmp/screenshots:/screenshots \
 alekzonder/puppeteer:latest \
 screenshot 'https://www.google.com' 1366x768

output: one line json

{
    "date":"2017-09-01T05:03:27.464Z",
    "timestamp":1504242207,
    "filename":"screenshot_1366_768.png",
    "width":1366,
    "height":768
}

the screenshot is saved to /tmp/screenshots/screenshot_1366_768.png
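
If you call the tool from another program, the JSON line can be used to locate the file on the host. A minimal sketch, assuming the output has the fields shown above; the helper name screenshotPathOnHost is hypothetical, and hostDir is whatever directory you mounted to /screenshots.

```javascript
const path = require('path');

// Hypothetical helper: `line` is the JSON line printed by the tool,
// `hostDir` is the host directory that was mounted to /screenshots.
function screenshotPathOnHost(line, hostDir) {
    const info = JSON.parse(line);
    // posix join, since the docker host paths in this README are Unix-style
    return path.posix.join(hostDir, info.filename);
}

const line = '{"date":"2017-09-01T05:03:27.464Z","timestamp":1504242207,' +
    '"filename":"screenshot_1366_768.png","width":1366,"height":768}';

console.log(screenshotPathOnHost(line, '/tmp/screenshots'));
// prints /tmp/screenshots/screenshot_1366_768.png
```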

full_screenshot

save full screenshot of page

docker run --shm-size 1G --rm -v /tmp/screenshots:/screenshots \
 alekzonder/puppeteer:latest \
 full_screenshot 'https://www.google.com' 1366x768

screenshot_series, full_screenshot_series

adds datetime in ISO format into filename

useful for cron screenshots

docker run --shm-size 1G --rm -v /tmp/screenshots:/screenshots \
 alekzonder/puppeteer:latest \
 screenshot_series 'https://www.google.com' 1366x768
docker run --shm-size 1G --rm -v /tmp/screenshots:/screenshots \
 alekzonder/puppeteer:latest \
 full_screenshot_series 'https://www.google.com' 1366x768
2017-09-01T05:08:55.027Z_screenshot_1366_768.png
# OR
2017-09-01T05:08:55.027Z_full_screenshot_1366_768.png
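
The naming scheme of the series tools can be reproduced like this. It is a sketch inferred from the example filenames above; the helper name seriesFilename is an assumption, and the image's own scripts may build the name differently.

```javascript
// Hypothetical helper: prefixes the regular screenshot filename with an
// ISO 8601 timestamp, matching the example filenames in this README.
function seriesFilename(tool, width, height, date = new Date()) {
    return `${date.toISOString()}_${tool}_${width}_${height}.png`;
}

console.log(seriesFilename('screenshot', 1366, 768, new Date('2017-09-01T05:08:55.027Z')));
// prints 2017-09-01T05:08:55.027Z_screenshot_1366_768.png
```

Because the ISO timestamp sorts lexicographically, files written by a cron job list in chronological order.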

docker-puppeteer's People

Contributors

alekzonder, captn3m0, esdras, foxted, hofmeister, ityoung, ryuheechul, schliflo, sidkwok, tinti


docker-puppeteer's Issues

Fails with 'jest-puppeteer' due to missing 'procps' package

I found this while using this image in GitLab's CI environment, using jest-puppeteer. The default globalTeardown relies on the ps utility which is not included by default.

It errors out:

PASS functional-tests/app.test.js
Test Suites: 2 passed, 2 total
Tests:       5 passed, 5 total
Snapshots:   0 total
Time:        18.289 s
Ran all test suites.
events.js:292
      throw er; // Unhandled 'error' event
      ^
Error: spawn ps ENOENT
    at Process.ChildProcess._handle.onexit (internal/child_process.js:267:19)
    at onErrorNT (internal/child_process.js:469:16)
    at processTicksAndRejections (internal/process/task_queues.js:84:21)
Emitted 'error' event on ChildProcess instance at:
    at Process.ChildProcess._handle.onexit (internal/child_process.js:273:12)
    at onErrorNT (internal/child_process.js:469:16)
    at processTicksAndRejections (internal/process/task_queues.js:84:21) {
  errno: 'ENOENT',
  code: 'ENOENT',
  syscall: 'spawn ps',
  path: 'ps',
  spawnargs: [ '-o', 'pid', '--no-headers', '--ppid', 119 ]
}

To work around this, I would need to install the procps package after pulling the image (using before_script):

apt-get update && apt-get install -y procps

Unfortunately, the user pptruser is non-privileged and there is no easy way of adding a package in a CI environment, so I suggest adding this to the base image.
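
A script can detect the missing utility up front instead of failing in the teardown. The following is a minimal sketch using Node's child_process module; the helper name commandExists is hypothetical and this check is not part of jest-puppeteer.

```javascript
const { spawnSync } = require('child_process');

// Returns false if spawning `command` fails with ENOENT (binary not found),
// true otherwise. The exit status of the command itself is ignored.
function commandExists(command, args = ['--version']) {
    const result = spawnSync(command, args, { stdio: 'ignore' });
    return !(result.error && result.error.code === 'ENOENT');
}

if (!commandExists('ps')) {
    console.error('ps not found: the procps package is probably missing from the image');
}
```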

Screenshots of long pages get interrupted

Taking screenshots of longer pages results in incomplete images.

Example:

docker run --shm-size 1G --rm -v ${PWD}:/screenshots alekzonder/puppeteer:latest full_screenshot_series "https://medium.com/@micheledaliessi/how-does-the-blockchain-work-98c8cd01d2ae" 1366x768 1500

Gives this result:
image

Is there any way to receive a complete screenshot of longer pages?

Node modules conflict

What if I install a module foo (for example) that requires compiling with something like node-gyp, and then use code like this:

const puppeteer = require('puppeteer');
const foo = require('foo');
....

and run it with the recommended command:

$ docker run --shm-size 1G --rm -v <my-application>:/app/index.js alekzonder/puppeteer:latest

My foo module would then be used in an environment it wasn't compiled for, correct?

Furthermore, what if my project has a module that has puppeteer in its dependencies (e.g. https://github.com/americanexpress/jest-image-snapshot)? My understanding is that when you do:

const puppeteer = require('puppeteer');
...

puppeteer would be loaded from one of the submodules, not from the docker image?

If so, what would be the recommended workaround?
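
One way to check which copy is actually loaded is to ask Node where it resolves the module from. Node looks in node_modules directories starting at the requiring file's directory and walking upward, then in the NODE_PATH directories. The sketch below only reports the result; the helper name resolvedLocation is hypothetical.

```javascript
// Returns the file that require(name) would load from this script's
// location, or null if the module cannot be resolved.
function resolvedLocation(name) {
    try {
        return require.resolve(name);
    } catch (err) {
        return null;
    }
}

// Run inside the container to see whether the image's puppeteer or a
// copy from your own node_modules wins.
console.log('puppeteer resolves to:', resolvedLocation('puppeteer'));
```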

Cannot find module '/app/index.js'

docker run --rm -v /root/www/app/index.js:/app/index.js alekzonder/puppeteer:latest
module.js:549
throw err;
^

Error: Cannot find module '/app/index.js'
at Function.Module._resolveFilename (module.js:547:15)
at Function.Module._load (module.js:474:25)
at Function.Module.runMain (module.js:693:10)
at startup (bootstrap_node.js:191:16)
at bootstrap_node.js:612:3

Error: EACCES: permission denied, open '/app/w.js'

I am using Synology's Docker.

# docker run --shm-size 1G --rm -v /volume1/homes/lese/app:/app alekzonder/puppeteer:latest node w.js

fs.js:646
  return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
                 ^

Error: EACCES: permission denied, open '/app/w.js'
    at Object.fs.openSync (fs.js:646:18)
    at Object.fs.readFileSync (fs.js:551:33)
    at Object.Module._extensions..js (module.js:662:20)
    at Module.load (module.js:565:32)
    at tryModuleLoad (module.js:505:12)
    at Function.Module._load (module.js:497:3)
    at Function.Module.runMain (module.js:693:10)
    at startup (bootstrap_node.js:191:16)
    at bootstrap_node.js:612:3

and w.js contains just console.log(1)

I don't seem to have permission when I log in as root over ssh

Container doesn't exit on error

(node:6) UnhandledPromiseRejectionWarning: Error: EACCES: permission denied, open '/app/out/0502721643.pdf'


(node:6) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)


(node:6) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

But the container is still in running status.
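
The warnings above show that the rejection was never handled, so the script itself has to turn the failure into a non-zero exit code. A minimal sketch, assuming the script's work lives in one async entry point; the helper name runMain is hypothetical, and its exit parameter exists only so the sketch can be exercised without terminating the process.

```javascript
// Runs an async entry point and exits with code 1 if it rejects.
// `exit` defaults to process.exit and is injectable for testing only.
function runMain(main, exit = process.exit) {
    return main().catch((err) => {
        console.error(err);
        exit(1);
    });
}

// Usage in a script:
// runMain(async () => {
//     /* launch the browser, write the PDF, close the browser */
// });
```

Exiting explicitly also ends the process when the browser was never closed because the error occurred before browser.close() was reached.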

Error: net::ERR_FAILED at https://www.google.com/

root@test:~/test# node index.js
(node:10397) UnhandledPromiseRejectionWarning: Error: net::ERR_FAILED at https://www.google.com/
at navigate (/home/s/node_modules/puppeteer/lib/FrameManager.js:120:37)
at process._tickCallback (internal/process/next_tick.js:68:7)
-- ASYNC --
at Frame.<anonymous> (/home/s/node_modules/puppeteer/lib/helper.js:111:15)
at Page.goto (/home/s/node_modules/puppeteer/lib/Page.js:670:49)
at Page.<anonymous> (/home/s/node_modules/puppeteer/lib/helper.js:112:23)
at /home/s/test/index.js:14:16
at process._tickCallback (internal/process/next_tick.js:68:7)
(node:10397) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:10397) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Your Readme Is Letting You Down

I would make a PR if I could get this to work.

Path to what script? In the container? On the host? What is index.js?
image

Am I supposed to do "before usage" 1,2,3,4? Why number things that aren't steps? Before usage of what? The jobs? A script to do things automatically?

The stuff under usage looks like the stuff in before usage? Is before usage supposed to be for bugs? Is that supposed to be in a "common issues" section?

Do I need these example scripts? Am I putting them into an index.js file or do I need that and something else? Is that a node app that I'm screenshotting? Can I just run the job, or are these two different options?

can't npm install globally

npm WARN checkPermissions Missing write access to /usr/local/lib/node_modules
npm ERR! path /usr/local/lib/node_modules
npm ERR! code EACCES
npm ERR! errno -13
npm ERR! syscall access
npm ERR! Error: EACCES: permission denied, access '/usr/local/lib/node_modules'
npm ERR!  { Error: EACCES: permission denied, access '/usr/local/lib/node_modules'
npm ERR!   stack: 'Error: EACCES: permission denied, access \'/usr/local/lib/node_modules\'',
npm ERR!   errno: -13,
npm ERR!   code: 'EACCES',
npm ERR!   syscall: 'access',
npm ERR!   path: '/usr/local/lib/node_modules' }
npm ERR! 
npm ERR! Please try running this command again as root/Administrator.

Monitoring requested url returns "undefined"

Thank you for creating this easy to start docker image.
When I use the following code to monitor the requested URLs,
it prints "undefined".
How can I get the requested URLs?
Thank you in advance.

//////////////////////////////
const puppeteer = require('puppeteer');
 (async() => {

    const browser = await puppeteer.launch({
        args: [
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--enable-logging', '--v=1'
        ]
    });

    const page = await browser.newPage();
    await page.setRequestInterception(true);
    page.on('request', request => {
	    console.log(request.url);
            request.continue();
	  });

    await page.goto('https://camo.githubusercontent.com/05492f5e135964801f6cbe748dc7668925a965e2/687474703a2f2f646f636b6572692e636f2f696d6167652f616c656b7a6f6e6465722f707570706574656572', {waitUntil: 'networkidle2'});

    browser.close();
})();
//////////////
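
One possible cause, offered as an assumption rather than a confirmed diagnosis: in Puppeteer 1.x, url is a method on the request object (request.url()), while older 0.x releases exposed it as a plain property, so code written for one version does not print the URL on the other. A small sketch that tolerates both forms; the helper name requestUrl is hypothetical.

```javascript
// Reads the URL from a request object whether `url` is a method
// (Puppeteer 1.x) or a plain property (older 0.x releases).
function requestUrl(request) {
    return typeof request.url === 'function' ? request.url() : request.url;
}

// In the listener this would be: console.log(requestUrl(request));
// The two objects below are stand-ins for real Puppeteer requests.
console.log(requestUrl({ url: () => 'https://example.com/a' }));
console.log(requestUrl({ url: 'https://example.com/b' }));
```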

Bind volumes don't work

Thanks for the excellent work Alek,

Your Dockerfile changes the user from root which breaks bind volumes on docker for linux. The use case is that I output a pdf to the bind volume so I can access it from the docker host. When I run

docker run -it --entrypoint=/bin/bash -v `pwd`/tmp:/app/tmp alekzonder/docker-puppeteer
pptruser@ip-172-31-26-224:/app$ cd tmp
pptruser@ip-172-31-26-224:/app/tmp$ mkdir f
mkdir: cannot create directory ‘f’: Permission denied
pptruser@ip-172-31-26-224:/app/tmp$ ls -all .
total 8
drwxrwxr-x 2      500      500 4096 Feb 27 18:43 .
drwxr-xr-x 1 pptruser pptruser 4096 Feb 27 19:21 ..

So you see the bind volume is not owned by pptruser, therefore we cannot write files to it to share with the docker host. You might want to mention a workaround in Readme.md. If you can suggest one I'm happy to submit a PR.
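
Until the ownership question is settled, a script can at least report the problem clearly before it tries to write. This is a diagnostic sketch, not a workaround; the helper name isWritable and the default directory /app/tmp (taken from the example above) are assumptions.

```javascript
const fs = require('fs');

// Returns true if the current user may write to `dir`, false otherwise
// (including when `dir` does not exist).
function isWritable(dir) {
    try {
        fs.accessSync(dir, fs.constants.W_OK);
        return true;
    } catch (err) {
        return false;
    }
}

const outDir = process.argv[2] || '/app/tmp';
if (!isWritable(outDir)) {
    const uid = process.getuid ? process.getuid() : 'unknown';
    console.error(`${outDir} is not writable by uid ${uid}`);
}
```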

Can't install additional software

I'm using it as a GitLab CI image and need the pip3 package for deployment.

image: alekzonder/puppeteer:latest

services:
  - docker:dind

Getting an error:

$ apt-get install python3-pip -y
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?

yarn version is too old

The base docker image node:8-slim has yarn-v1.6.0, but this image has yarn-v1.3.2.

$ docker run alekzonder/puppeteer:latest yarn -v
1.3.2

$ docker run node:8-slim yarn -v
1.6.0

Error: spawn EACCES

Version 1.1.0 throws an Error: spawn EACCES when I try to launch puppeteer. Not sure if this is important, but I have puppeteer installed locally in my project.

can't take screenshots EACCES permission denied exception

docker run --shm-size 1G --rm -v $(pwd):/screenshots alekzonder/docker-puppeteer:latest full_screenshot 'https://www.google.com' 1366x768

{ Error: EACCES: permission denied, open '/screenshots/full_screenshot_1366_768.png'
errno: -13,
code: 'EACCES',
syscall: 'open',
path: '/screenshots/full_screenshot_1366_768.png' } Promise {
{ Error: EACCES: permission denied, open '/screenshots/full_screenshot_1366_768.png'
errno: -13,
code: 'EACCES',
syscall: 'open',
path: '/screenshots/full_screenshot_1366_768.png' } }

root password

I'm having connectivity issues (only the first request worked, then I get a "could not resolve" error) and I would like to do things like service networking restart inside the container, but I need the root password for that.
Could you provide it?

O.S: Windows 10
Command:

 docker run --net=host --dns 8.8.8.8 --dns 8.8.4.4 --rm -v ~/Phpstorm/test:/app/ alekzonder/puppeteer:latest node index.js

Timed out after 30000 ms while trying to connect to Chrome! The only Chrome revision guaranteed to work is r571375

Avoid --no-sandbox with a non-root user.

The official Puppeteer Docker guidance includes various steps to avoid operating as a root user, which allows sidestepping the sandbox issues. I configured something based on the official docs, with a bit of flair for this project, by adding the following layers to my Dockerfile:

# Add user so we don't need --no-sandbox.
RUN groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
    && mkdir -p /home/pptruser/Downloads \
    && chown -R pptruser:pptruser /home/pptruser \
    && chown -R pptruser:pptruser /screenshots \
    && chown -R pptruser:pptruser /usr/local/share/.config/yarn/global/node_modules

USER pptruser

Allow running in headful mode

Great docker setup here, but was hoping I'd be able to run this in headful mode out of the box (headless: false).

can't launch chrome extension

I want to load an extension, but it always fails with Navigation Timeout Exceeded: 30000ms exceeded.

async ({ option }) => {

const pathToExtension = '../mydir';
  const browser = puppeteer.launch({
    headless: false,
    args: [
      `--disable-dev-shm-usage`,
      `--disable-extensions-except=${pathToExtension}`,
      '--no-sandbox', 
      '--disable-setuid-sandbox',
      `--load-extension=${pathToExtension}`
    ]
  });

});
