alekzonder / docker-puppeteer
docker image with Google Puppeteer installed
Home Page: https://hub.docker.com/r/alekzonder/puppeteer/
License: MIT License
I would make a PR if I could get this to work.
Path to what script? In the container? On the host? What is index.js?
Am I supposed to do "before usage" 1,2,3,4? Why number things that aren't steps? Before usage of what? The jobs? A script to do things automatically?
The stuff under usage looks like the stuff in before usage? Is before usage supposed to be for bugs? Is that supposed to be in a "common issues" section?
Do I need these example scripts? Am I putting them into an index.js file or do I need that and something else? Is that a node app that I'm screenshotting? Can I just run the job, or are these two different options?
The official Puppeteer Docker guidance includes steps to avoid running as the root user, which sidesteps the sandbox issues. I configured something based on the official docs, with a bit of flair for this project, by adding the following layers to my Dockerfile:
# Add user so we don't need --no-sandbox.
RUN groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
&& mkdir -p /home/pptruser/Downloads \
&& chown -R pptruser:pptruser /home/pptruser \
&& chown -R pptruser:pptruser /screenshots \
&& chown -R pptruser:pptruser /usr/local/share/.config/yarn/global/node_modules
USER pptruser
Version 1.1.0 throws Error: spawn EACCES when I try to launch Puppeteer. Not sure if this is important, but I have Puppeteer installed locally in my project.
Thank you for creating this easy to start docker image.
When I use the following code to monitor the requested URLs, it prints "undefined".
How can I get the requested URLs?
Thank you in advance.
//////////////////////////////
const puppeteer = require('puppeteer');
(async() => {
const browser = await puppeteer.launch({
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--enable-logging', '--v=1'
]
});
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', request => {
console.log(request.url);
request.continue();
});
await page.goto('https://camo.githubusercontent.com/05492f5e135964801f6cbe748dc7668925a965e2/687474703a2f2f646f636b6572692e636f2f696d6167652f616c656b7a6f6e6465722f707570706574656572', {waitUntil: 'networkidle2'});
browser.close();
})();
//////////////
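One thing worth checking for the snippet above is how the installed Puppeteer version exposes the request URL. The sketch below assumes that Puppeteer 1.0 and later make `url` a method on the request object, while some earlier releases exposed it as a plain string property. The helper names `requestUrl` and `logAndContinue` are hypothetical and only illustrate handling both shapes; they are not part of Puppeteer or of this image.

```javascript
// Return the URL of a Puppeteer request object.
// Assumption: `request.url` is a method in Puppeteer >= 1.0 and a
// string property in some older releases, so both forms are handled.
function requestUrl(request) {
  return typeof request.url === 'function' ? request.url() : request.url;
}

// Hypothetical handler for page.on('request', ...): log the URL,
// then let the intercepted request proceed.
function logAndContinue(request, log = console.log) {
  log(requestUrl(request));
  return request.continue();
}

// Possible usage inside the script above (not executed here):
//   await page.setRequestInterception(true);
//   page.on('request', (request) => logAndContinue(request));
```

If the logged value is still not a URL, printing `typeof request.url` once would show which form the installed version uses.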
I want to load an extension, but it always fails with Navigation Timeout Exceeded: 30000ms exceeded.
async ({ option }) => {
const pathToExtension = '../mydir';
const browser = puppeteer.launch({
headless: false,
args: [
`--disable-dev-shm-usage`,
`--disable-extensions-except=${pathToExtension}`,
'--no-sandbox',
'--disable-setuid-sandbox',
`--load-extension=${pathToExtension}`
]
});
});
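Two details in the snippet above may be worth ruling out: `puppeteer.launch` returns a promise and is not awaited there, and the relative extension path is resolved against the current working directory of the process. The sketch below builds the same launch options with an absolute path. The helper name `extensionLaunchOptions` is hypothetical, and nothing here is confirmed to fix the timeout; in particular, `headless: false` needs a display (for example a virtual one) inside the container.

```javascript
const path = require('path');

// Build Puppeteer launch options for loading an unpacked extension.
// The flags are the ones used in the snippet above; the only change is
// that the extension directory is turned into an absolute path.
function extensionLaunchOptions(extensionDir) {
  const absoluteDir = path.resolve(extensionDir);
  return {
    headless: false,
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
      `--disable-extensions-except=${absoluteDir}`,
      `--load-extension=${absoluteDir}`,
    ],
  };
}

// Possible usage (not executed here):
//   const browser = await puppeteer.launch(extensionLaunchOptions('../mydir'));
```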
Great docker setup here, but I was hoping I'd be able to run this in headful mode out of the box (headless: false).
Is there any reason preventing this docker image from using node:lts (10.15.*)?
I am using Docker on a Synology NAS.
# docker run --shm-size 1G --rm -v /volume1/homes/lese/app:/app alekzonder/puppeteer:latest node w.js
fs.js:646
return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
^
Error: EACCES: permission denied, open '/app/w.js'
at Object.fs.openSync (fs.js:646:18)
at Object.fs.readFileSync (fs.js:551:33)
at Object.Module._extensions..js (module.js:662:20)
at Module.load (module.js:565:32)
at tryModuleLoad (module.js:505:12)
at Function.Module._load (module.js:497:3)
at Function.Module.runMain (module.js:693:10)
at startup (bootstrap_node.js:191:16)
at bootstrap_node.js:612:3
w.js contains only console.log(1).
I don't seem to have permission even when I log in as root over SSH.
It is used by Karma... and it would make it much easier to call, and also to upgrade the image if the path changes.
root@test:~/test# node index.js
(node:10397) UnhandledPromiseRejectionWarning: Error: net::ERR_FAILED at https://www.google.com/
at navigate (/home/s/node_modules/puppeteer/lib/FrameManager.js:120:37)
at process._tickCallback (internal/process/next_tick.js:68:7)
-- ASYNC --
at Frame. (/home/s/node_modules/puppeteer/lib/helper.js:111:15)
at Page.goto (/home/s/node_modules/puppeteer/lib/Page.js:670:49)
at Page. (/home/s/node_modules/puppeteer/lib/helper.js:112:23)
at /home/s/test/index.js:14:16
at process._tickCallback (internal/process/next_tick.js:68:7)
(node:10397) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:10397) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:6) UnhandledPromiseRejectionWarning: Error: EACCES: permission denied, open '/app/out/0502721643.pdf'
(node:6) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:6) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
But the container is still in the running state.
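The warnings above come from a rejected promise that nothing catches, and a script in that situation can keep running if a browser child process is still alive. A minimal sketch of one way to make the script exit with a non-zero code follows. The names `runJob`, `main`, and `cleanup` are hypothetical, and whether this is why the container stays up has not been confirmed.

```javascript
// Run an async job and translate its outcome into an exit code.
// `cleanup` always runs, so resources such as a browser can be closed
// whether the job succeeded or failed.
async function runJob(job, cleanup = async () => {}) {
  try {
    await job();
    return 0;
  } catch (err) {
    console.error(err);
    return 1;
  } finally {
    await cleanup();
  }
}

// Possible usage (not executed here), assuming `main` does the Puppeteer
// work and `browser` is reachable from the cleanup callback:
//   runJob(main, () => browser.close()).then((code) => process.exit(code));
```

Exiting explicitly with the returned code lets `docker run` stop and report the failure instead of waiting on a process that never ends.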
Line 41 in 35983d4
Could you explain that comment a bit more?
Screenshotting longer pages results in incomplete images.
Example:
docker run --shm-size 1G --rm -v ${PWD}:/screenshots alekzonder/puppeteer:latest full_screenshot_series "https://medium.com/@micheledaliessi/how-does-the-blockchain-work-98c8cd01d2ae" 1366x768 1500
Is there any way to receive a complete screenshot of longer pages?
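One possible approach, assuming the missing parts are images that only load when scrolled into view, is to scroll through the page before taking a full-page screenshot. The sketch below uses Puppeteer's `page.evaluate` and `page.screenshot` with `fullPage: true`. The function name `captureLongPage` and its default step and pause values are hypothetical, and this is not how the image's bundled `full_screenshot_series` script is known to work.

```javascript
// Scroll through the page in steps so lazy-loaded content has a chance
// to load, then capture the whole page in a single screenshot.
async function captureLongPage(page, outputPath, stepPx = 768, pauseMs = 200) {
  const pause = () => new Promise((resolve) => setTimeout(resolve, pauseMs));
  // Total document height, measured once before scrolling.
  const height = await page.evaluate(() => document.body.scrollHeight);
  for (let y = 0; y < height; y += stepPx) {
    await page.evaluate((top) => window.scrollTo(0, top), y);
    await pause();
  }
  // Return to the top so the capture starts from the beginning of the page.
  await page.evaluate((top) => window.scrollTo(0, top), 0);
  await page.screenshot({ path: outputPath, fullPage: true });
  return outputPath;
}
```

Pages that keep growing while scrolling would need the height to be measured again inside the loop; this sketch does not handle that case.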
What if I install a module foo (for example) that requires compiling with something like node-gyp, and then use code like this:
const puppeteer = require('puppeteer');
const foo = require('foo');
....
and run the recommended command like this:
$ docker run --shm-size 1G --rm -v <my-application>:/app/index.js alekzonder/puppeteer:latest
My foo module would be used in an environment it wasn't compiled for, correct?
Furthermore, what if my project has a module that lists puppeteer in its dependencies (for example https://github.com/americanexpress/jest-image-snapshot)? My understanding is that when you do:
const puppeteer = require('puppeteer');
...
puppeteer would be loaded from one of the submodules, not from the Docker image?
If so, what would be the recommended workaround?
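A quick way to see which copy of a module Node would load is to ask the resolver directly. The sketch below wraps `require.resolve` with its `paths` option, which assumes Node 8.9 or later. The helper name `resolveFrom` is hypothetical, and the `/app` directory in the usage comment is taken from the mount point used in the command above.

```javascript
// Return the file that `require(moduleName)` would load when the lookup
// starts from `fromDir`. Node searches node_modules folders from that
// directory upward, so a project-local copy is found before a global one.
function resolveFrom(moduleName, fromDir) {
  return require.resolve(moduleName, { paths: [fromDir] });
}

// Possible usage inside the container (not executed here):
//   console.log(resolveFrom('puppeteer', '/app'));
```

A path under the mounted project's node_modules would mean that copy is the one loaded, rather than the one installed in the image.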
UTF-8 characters do not seem to display in screenshots.
I was using 'latest' tag and realised that it was using puppeteer version 1.6.0 while tag '1.8.0' uses puppeteer version 1.8.0. IMO the 'latest' tag should point to the latest release.
I found this while using this image in GitLab's CI environment with jest-puppeteer. The default globalTeardown relies on the ps utility, which is not included by default.
It errors out:
PASS functional-tests/app.test.js
Test Suites: 2 passed, 2 total
Tests: 5 passed, 5 total
Snapshots: 0 total
Time: 18.289 s
Ran all test suites.
events.js:292
throw er; // Unhandled 'error' event
^
Error: spawn ps ENOENT
at Process.ChildProcess._handle.onexit (internal/child_process.js:267:19)
at onErrorNT (internal/child_process.js:469:16)
at processTicksAndRejections (internal/process/task_queues.js:84:21)
Emitted 'error' event on ChildProcess instance at:
at Process.ChildProcess._handle.onexit (internal/child_process.js:273:12)
at onErrorNT (internal/child_process.js:469:16)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
errno: 'ENOENT',
code: 'ENOENT',
syscall: 'spawn ps',
path: 'ps',
spawnargs: [ '-o', 'pid', '--no-headers', '--ppid', 119 ]
}
To work around this, I would need to install the procps package after pulling the image (using before_script):
apt-get update && apt-get install -y procps
Unfortunately, the user pptruser is non-privileged and there is no easy way of adding a package in a CI environment, so I suggest adding this to the base image.
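Until the package is available in the image, a test setup could at least detect the missing utility up front instead of failing at teardown. The sketch below checks for a command by trying to spawn it and looking for the same `ENOENT` code shown in the error above. The helper name `commandExists` is hypothetical, and the check assumes the command tolerates being run with `--version`.

```javascript
const { spawnSync } = require('child_process');

// Return false when the command cannot be found on PATH (spawn reports
// ENOENT), true otherwise. The command's own exit status is ignored.
function commandExists(command) {
  const result = spawnSync(command, ['--version'], { stdio: 'ignore' });
  return !(result.error && result.error.code === 'ENOENT');
}

// Possible usage (not executed here):
//   if (!commandExists('ps')) {
//     console.warn('ps not found; the procps package is probably missing');
//   }
```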
I need a module named sprintf-js, and I am new to Docker. Could you please explain how to install my dependency with npm install?
I'm having connectivity issues (only the first request worked, then I get a "could not resolve" error), and I would like to do things like service networking restart inside the machine, but I need the root password for that.
Could you provide it?
OS: Windows 10
Command:
docker run --net=host --dns 8.8.8.8 --dns 8.8.4.4 --rm -v ~/Phpstorm/test:/app/ alekzonder/puppeteer:latest node index.js
Timed out after 30000 ms while trying to connect to Chrome! The only Chrome revision guaranteed to work is r571375
The base docker image node:8-slim has yarn-v1.6.0, but this image has yarn-v1.3.2.
$ docker run alekzonder/puppeteer:latest yarn -v
1.3.2
$ docker run node:8-slim yarn -v
1.6.0
The Official Puppeteer Docker guidance recommends using dumb-init as an entrypoint to prevent chrome from spinning up and being unable to terminate processes. Adding it here will make the image more robust.
I am using this as a GitLab CI image and need pip3 for deployment.
image: alekzonder/puppeteer:latest
services:
- docker:dind
Getting an error:
$ apt-get install python3-pip -y
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
docker run --shm-size 1G --rm -v $(pwd):/screenshots alekzonder/docker-puppeteer:latest full_screenshot 'https://www.google.com' 1366x768
{ Error: EACCES: permission denied, open '/screenshots/full_screenshot_1366_768.png'
errno: -13,
code: 'EACCES',
syscall: 'open',
path: '/screenshots/full_screenshot_1366_768.png' } Promise {
{ Error: EACCES: permission denied, open '/screenshots/full_screenshot_1366_768.png'
errno: -13,
code: 'EACCES',
syscall: 'open',
path: '/screenshots/full_screenshot_1366_768.png' } }
Please provide support for the latest Node.js in the image.
docker run --rm -v /root/www/app/index.js:/app/index.js alekzonder/puppeteer:latest
module.js:549
throw err;
^
Error: Cannot find module '/app/index.js'
at Function.Module._resolveFilename (module.js:547:15)
at Function.Module._load (module.js:474:25)
at Function.Module.runMain (module.js:693:10)
at startup (bootstrap_node.js:191:16)
at bootstrap_node.js:612:3
Thanks for the excellent work Alek,
Your Dockerfile changes the user from root which breaks bind volumes on docker for linux. The use case is that I output a pdf to the bind volume so I can access it from the docker host. When I run
docker run -it --entrypoint=/bin/bash -v `pwd`/tmp:/app/tmp alekzonder/docker-puppeteer
pptruser@ip-172-31-26-224:/app$ cd tmp
pptruser@ip-172-31-26-224:/app/tmp$ mkdir f
mkdir: cannot create directory ‘f’: Permission denied
pptruser@ip-172-31-26-224:/app/tmp$ ls -all .
total 8
drwxrwxr-x 2 500 500 4096 Feb 27 18:43 .
drwxr-xr-x 1 pptruser pptruser 4096 Feb 27 19:21 ..
So you see the bind volume is not owned by pptruser, therefore we cannot write files to it to share with the docker host. You might want to mention a workaround in Readme.md. If you can suggest one I'm happy to submit a PR.
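The mismatch described above can be checked from inside the container before a script tries to write any output. The sketch below compares the directory owner with the current process user and tests write access using Node's `fs` module. The helper name `describeWriteAccess` is hypothetical, and `/app/tmp` in the usage comment is the mount point from the session above.

```javascript
const fs = require('fs');

// Report who owns a directory, which user the process runs as, and
// whether the process can write to the directory.
function describeWriteAccess(dir) {
  const stat = fs.statSync(dir);
  let writable = true;
  try {
    fs.accessSync(dir, fs.constants.W_OK);
  } catch (err) {
    writable = false;
  }
  // process.getuid is not available on every platform (for example Windows).
  const processUid = typeof process.getuid === 'function' ? process.getuid() : null;
  return { dir, ownerUid: stat.uid, processUid, writable };
}

// Possible usage inside the container (not executed here):
//   console.log(describeWriteAccess('/app/tmp'));
```

One commonly used workaround, not confirmed for this image, is to pass Docker's `--user` flag with the host UID and GID so that the container process matches the owner of the bind-mounted directory.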
npm WARN checkPermissions Missing write access to /usr/local/lib/node_modules
npm ERR! path /usr/local/lib/node_modules
npm ERR! code EACCES
npm ERR! errno -13
npm ERR! syscall access
npm ERR! Error: EACCES: permission denied, access '/usr/local/lib/node_modules'
npm ERR! { Error: EACCES: permission denied, access '/usr/local/lib/node_modules'
npm ERR! stack: 'Error: EACCES: permission denied, access \'/usr/local/lib/node_modules\'',
npm ERR! errno: -13,
npm ERR! code: 'EACCES',
npm ERR! syscall: 'access',
npm ERR! path: '/usr/local/lib/node_modules' }
npm ERR!
npm ERR! Please try running this command again as root/Administrator.