dhamaniasad / headlessbrowsers
A list of (almost) all headless web browsers in existence
Home Page: http://dhamaniasad.github.io/HeadlessBrowsers
License: Creative Commons Zero v1.0 Universal
Thank you for the comprehensive list.
Have you considered woob ('web outside of browsers')?
From their readme
Under the covers it uses Electron, which is similar to PhantomJS but roughly twice as fast and more modern. Because Nightmare uses Electron, it is your responsibility to ensure that the webpages loaded by Nightmare are not malicious. If you do load a malicious website, that website can execute arbitrary code on your computer.
Basically, some of these browsers (Nightmare comes to mind) use Electron as their runtime, and Electron requires a working X11 install!
Personally, I think if a browser requires X to be installed, it's not really headless, but in any event, this is a critical distinction for many use cases (I do a bit of browser automation, entirely in environments that don't have any X install whatsoever).
If a "headless" browser requires X and xvfb, at that point you might as well just run full Chromium or Firefox with WebDriver or similar.
It'd be nice to disambiguate between "headless", as in "doesn't open a visible window" and actually headless, as in has no requirement for a framebuffer or X11/etc...
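A minimal sketch of that distinction in Python, assuming a Linux-style environment where a display server is advertised through the `DISPLAY` or `WAYLAND_DISPLAY` variables. The function names and the `requires_display_server` flag are hypothetical and only illustrate the idea: a browser that merely hides its window still needs something like `xvfb-run` on a machine without X, while a browser with no framebuffer requirement does not.

```python
import os


def has_display(env=None):
    """Return True if the environment advertises an X11 or Wayland display."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def command_prefix(requires_display_server, env=None):
    """Return the wrapper needed to launch a browser on this machine.

    requires_display_server is a property of the browser (an assumption
    supplied by the caller): True for "no visible window, but still needs X",
    False for "no framebuffer or X11 requirement at all".
    """
    if requires_display_server and not has_display(env):
        # xvfb-run starts a virtual framebuffer; -a picks a free display number.
        return ["xvfb-run", "-a"]
    return []
```

For example, `command_prefix(True, {})` asks for the `xvfb-run` wrapper, while a browser without any display requirement gets an empty prefix everywhere.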
Hi,
From the options on the list, what would be good alternatives to Puppeteer?
I basically need two things:
Set the user agent and wait for the site to load completely!
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
await page.goto('https://www.site.org', {
  waitUntil: 'networkidle0',
});
But Puppeteer is using a lot of resources.
Thanks in advance!
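One possible alternative for the two requirements above is Selenium driving headless Firefox. The following is a sketch under assumptions, not a drop-in replacement: it assumes Selenium, geckodriver and Firefox are installed, the `wait_until` and `load_page` helpers are hypothetical names, and polling `document.readyState` only approximates Puppeteer's `networkidle0` (it reports that the document and its resources have loaded, not that the network has gone quiet).

```python
import time

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
)


def wait_until(predicate, timeout=30.0, interval=0.25):
    """Poll predicate() until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def load_page(url, user_agent=USER_AGENT, timeout=30.0):
    """Open url in headless Firefox and return the HTML once it has loaded."""
    # Imported here so wait_until stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    # Firefox preference that overrides the User-Agent string.
    options.set_preference("general.useragent.override", user_agent)

    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        loaded = wait_until(
            lambda: driver.execute_script("return document.readyState") == "complete",
            timeout=timeout,
        )
        if not loaded:
            raise TimeoutError(f"{url} did not finish loading in {timeout} s")
        return driver.page_source
    finally:
        driver.quit()
```

Whether this is lighter than Puppeteer depends on the workload; it still runs a full browser engine.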
After following this:
https://cat.ninja/using-selenium-with-headless-firefox-on-freebsd/
and many other tutorials, I still cannot get what I need: I want to get and save the content of a webpage that uses JavaScript, the way it looks in the browser.
After spending hours on dead projects like console crawler and PhantomJS, I found Selenium, and it MIGHT be the solution for what I need, but I still cannot get it to work.
OS: latest stable FreeBSD (FreeBSD 11.2-RELEASE)
pkg install py36-pip
pip-3.6 install selenium
pkg install geckodriver
pkg install firefox-62.0.2,1
Example 1:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element_by_name("q")
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()
Traceback (most recent call last):
  File "open.py", line 1, in <module>
    from selenium import webdriver
ImportError: No module named selenium
I try:
python3.6 a.sel
Traceback (most recent call last):
  File "a.sel", line 5, in <module>
    br = webdriver.Firefox()
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/firefox/webdriver.py", line 162, in __init__
    keep_alive=True)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 154, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 243, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: Unable to find a matching set of capabilities
Example 2:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
br = webdriver.Firefox()
br.get('http://www.google.com/')
save_me = ActionChains(br).key_down(Keys.CONTROL)\
.key_down('s').key_up(Keys.CONTROL).key_up('s')
save_me.perform()
Let's start with the fact that these examples are Python: why on earth is it not mentioned anywhere that this is freaking Python, and why can't at the very least a simple example be provided for opening a god damn webpage?!?
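For reference, a simple example of opening a webpage and saving it after its JavaScript has run might look like the sketch below. It assumes Python 3 with Selenium installed for the same interpreter that runs the script, plus working geckodriver and Firefox versions that support headless mode. The names `headless_firefox` and `save_rendered_page` are hypothetical, and the sketch does not by itself resolve the `SessionNotCreatedException` shown above.

```python
from pathlib import Path


def headless_firefox():
    """Start Firefox without a visible window (needs Selenium and geckodriver)."""
    # Imported lazily so the rest of the module loads without Selenium.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)


def save_rendered_page(url, out_path, driver_factory=headless_firefox):
    """Load url and write the page source, as the browser reports it, to out_path."""
    driver = driver_factory()
    try:
        driver.get(url)
        # page_source reflects the DOM after scripts have modified it.
        html = driver.page_source
    finally:
        # quit() ends the browser process; close() would only close the window.
        driver.quit()
    Path(out_path).write_text(html, encoding="utf-8")
    return html
```

Calling `save_rendered_page("http://www.python.org", "python_org.html")` with the interpreter that has Selenium (here that would be `python3.6`) writes the rendered HTML to disk. The `driver_factory` parameter exists only so a different browser, or a stub for testing, can be swapped in.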
Chrome now has a headless mode on Linux and macOS as of version 59, and headless mode will be available on Windows in version 60.
https://developers.google.com/web/updates/2017/04/headless-chrome
https://chromium.googlesource.com/chromium/src/+/lkgr/headless/README.md
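As a rough illustration of what that mode looks like from a script, here is a Python sketch that shells out to the browser. The `--headless`, `--disable-gpu` and `--dump-dom` flags are the ones described in the first article linked above; the binary name `chromium` and both function names are assumptions and will differ between systems.

```python
import subprocess


def headless_chrome_command(url, binary="chromium"):
    """Build the command line that prints a page's DOM without opening a window."""
    # The binary name is an assumption; it may be google-chrome, chrome, etc.
    return [binary, "--headless", "--disable-gpu", "--dump-dom", url]


def dump_dom(url, binary="chromium", timeout=60):
    """Run headless Chrome and return the serialized DOM it prints to stdout."""
    result = subprocess.run(
        headless_chrome_command(url, binary),
        capture_output=True,  # requires Python 3.7 or newer
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```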
http://dhamaniasad.github.io/HeadlessBrowsers is not working
Firstly, thanks for your list, it really helps.
TestCafe mainly uses TypeScript.
Before making a PR I'd like to ask: in my opinion it would be good to have an additional piece of information on the list, namely the main usage of each headless browser. AFAIK DalekJS and TestCafe are used for testing web pages, while PhantomJS has a wider range of uses.
I'd like opinions on this; hopefully we can restructure the list to include this additional info.
There are two more headless browsers for .NET:
ScrapySharp (MIT)
https://github.com/rflechner/ScrapySharp
(works great, but unfortunately no JavaScript support)
Optimus (MIT)
https://github.com/RusKnyaz/Optimus
(allows plugging in different JavaScript implementations)
Hi
Just found your list via the Twitters - nice work! I don't see headless Chromium (https://chromium.googlesource.com/chromium/src/+/lkgr/headless/README.md) or JSDOM (https://github.com/tmpvar/jsdom) and thought you might want to consider adding them?
Cheers
Many of these are just "emulating" a browser to a greater or lesser extent, and some of them actually use the rendering engines of full standards-compliant browsers.
I've been working on Wendigo, a Puppeteer wrapper to make testing easier.
I wanted to know if it is suitable for this list, as it is not a headless browser itself, but a wrapper around one.
But consider the project https://github.com/rubycdp/ferrum
Although defining performance tests is a hard task, I think performance is one of the most useful pieces of information for choosing between these browsers, and it would be a nice addition to this repo.
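A very small starting point for such a comparison could be a timing harness like the Python sketch below. Everything in it is an assumption for illustration: each browser is represented by a hypothetical `load_page(url)` callable supplied by the caller, and wall-clock page-load time is only one of many possible metrics (memory use and start-up cost would need separate measurements).

```python
import statistics
import time


def time_loads(load_page, url, repeats=5):
    """Call load_page(url) repeatedly and summarize wall-clock durations in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        load_page(url)
        samples.append(time.perf_counter() - start)
    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }


def compare(loaders, url, repeats=5):
    """Time every loader in a {name: callable} mapping against the same URL."""
    return {name: time_loads(fn, url, repeats) for name, fn in loaders.items()}
```

Reporting the median rather than the mean keeps a single slow run, such as a cold cache on the first load, from dominating the result.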
Can someone add pyppeteer?
https://github.com/miyakogi/pyppeteer
It's a port of Puppeteer to Python. Seems to be dead but still usable.
Hello, I am trying to load my html page in order to send it as mail. I am using nodemailer for this job, and until now I was trying with puppeteer to retrieve the HTML of the page AFTER js has run. This is because 99.99% of my dom elements are created by js scripts.
I am not happy at all with Puppeteer: the HTML I get after the JS has run still contains all the scripts (they can't be sent over email), and there is no easy way to handle pictures.
Even after creating the HTML of the content I want by hand, with nodemailer I don't get backgrounds and pictures. That is off topic, but you might have some experience with something similar.
Which Node module do you suggest I use for this implementation?
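The script-removal part of this question does not depend on which headless browser produced the HTML. As an illustration of the idea only (written in Python with the standard library rather than as a Node module, and with the hypothetical names `ScriptStripper` and `strip_scripts`), rendered HTML can be re-emitted with its `<script>` elements dropped. Images and backgrounds are a separate problem that this sketch does not address.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML while dropping <script> elements and their contents."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif not self.in_script:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if tag != "script" and not self.in_script:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif not self.in_script:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:
            self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_scripts(html):
    """Return html with every <script> element removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Inline event handlers such as `onclick` attributes are left in place by this sketch; most mail clients ignore them, but a stricter sanitizer would remove those too.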
They are not fake browsers, since they use their own implementation of a stripped-down browser to render JavaScript, like what Splash does, for example. Right?
Mozilla Firefox has a headless mode:
It works well as a substitute for PhantomJS for automated testing.
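One way to switch that mode on for an existing test run, without changing how the tests start the browser, is the `MOZ_HEADLESS` environment variable that Firefox reads at startup. The Python sketch below assumes a Firefox version with headless support; the helper names and the idea of wrapping a test command are illustrative assumptions.

```python
import os
import subprocess


def headless_firefox_env(base=None):
    """Return a copy of the environment with Firefox's headless switch set."""
    env = dict(os.environ if base is None else base)
    env["MOZ_HEADLESS"] = "1"  # any Firefox started with this env stays windowless
    return env


def run_headless(command):
    """Run a command (for example a test runner) with headless Firefox enabled."""
    return subprocess.run(command, env=headless_firefox_env(), check=True)
```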
Headless Chrome is a high-level API, written in Rust, to control headless Chrome or Chromium over the DevTools Protocol.
EDIT: Added a link.