node-phantom-simple's People

Contributors

anko, apendua, arakir, baudehlo, chaddjohnson, davidmfoley, domasx2, giakki, janl, joscha, kevingrandon, kirill89, lgladdy, maarekelets, nathancarter, royalgarter, seebees, siboulet, spalger, walruscow, wilriker

node-phantom-simple's Issues

options.phantomPath -> options.path?

Now that SlimerJS is supported, the option name phantomPath looks strange. I suggest renaming it to something like path / binary / execPath. As usual, the old name will still be supported, but will show a deprecation message.

What do you think?

Solved: Error parsing JSON from phantom: SyntaxError: Unexpected end of input

Trying with your web scraping example:

var phantom=require('node-phantom-simple');
phantom.create(function(err,ph) {
    return ph.createPage(function(err,page) {
    ...
    });
});

When calling ph.createPage, the process stops with the following error message:

Error parsing JSON from phantom: SyntaxError: Unexpected end of input
Data from phantom was: 
Error parsing JSON from phantom: SyntaxError: Unexpected end of input
Data from phantom was: 

Environment:
OSX Mavericks on MBP Retina
node: v0.10.31
node-phantom-simple: 1.2.0
phantomjs: 1.9.7

Document comparison with node-phantom-ws

Not really an issue, but I'm trying to use this at a larger scale and am seeing things start to slow down, time out and crash over time. I'm considering switching to node-phantom-ws but don't really know if it will help, and it takes a couple of days for me to get enough data to know whether the performance is better. Do you have any insight into how this compares to node-phantom-ws WRT:

  1. Speed
  2. Durability for long lived processes.
  3. Resource utilization - mem and cpu?

My current use case has between 5 and 20 concurrent pages loaded.

Casperjs?

What are the chances of this supporting Casper?

That would make this the #1 scraping suite under node, by far.

All of that syntactic sugar provided by Casper would be a huge boost.

Can't set user agent

When I set the user agent page.open hangs.

I'm setting it like this:

page.set('settings', {userAgent: options.userAgent});

Phantom crash handling?

Hi. Loving the project so far. One question.
Is there any way to catch phantom's crash (i.e. Request() error evaluating evaluate() call: Error: connect ECONNREFUSED) and re-run it?

Missing support for `customHeaders`

var phantom = require('node-phantom-simple');

var url = 'http://google.com/';

phantom.create(function (err, ph) {
  return ph.createPage(function (err, page) {
    page.address = url;
    page.customHeaders({
      'Cache-Control': 'no-cache',
      'Pragma': 'no-cache'
    });

    return page.open(url, function (err, status) {
      console.log('Status', status);
    });
  });
});
%% node test.js
TypeError: Object #<Object> has no method 'customHeaders'
    at /opt/phantomHAR/test.js:9:10
    at /opt/phantomHAR/node_modules/node-phantom-simple/node-phantom-simple.js:40:26
    at IncomingMessage.<anonymous> (/opt/phantomHAR/node_modules/node-phantom-simple/node-phantom-simple.js:455:17)
    at IncomingMessage.EventEmitter.emit (events.js:126:20)
    at _stream_readable.js:895:16
    at process._tickCallback (node.js:339:11)

Node and callback as last argument

There are a lot of functions here that do not honor this paradigm. I can start sending you PRs that would "fix" these issues, but I don't know if you are even interested, as it might break current functionality.

What do you think?

for example:
https://github.com/baudehlo/node-phantom-simple/blob/master/node-phantom-simple.js#L238-L242

IMO this would be better done like this:

{
  evaluate: function (fn) {
    var extra_args = [];
    var cb = false;
    if (arguments.length > 1) {
      // convert the remaining arguments into a real array
      extra_args = Array.prototype.slice.call(arguments, 1);
      // grab the last one to check whether it's a callback
      cb = extra_args.pop();
      if (typeof cb !== 'function') {
        // not a callback? put it back...
        extra_args.push(cb);
        cb = false;
      }
    }
    request_queue.push([
      [id, 'evaluate', fn.toString()].concat(extra_args),
      callbackOrDummy(cb, poll_func)
    ]);
  }
}
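The "callback as last argument" convention discussed above can be factored into one small helper so that every method handles it the same way. This is only a sketch: `splitTrailingCallback` and the stand-in `evaluate` below are hypothetical names for illustration, not part of node-phantom-simple.

```javascript
// Hypothetical helper: separates an optional trailing callback
// from the remaining arguments.
function splitTrailingCallback(args) {
  var list = Array.prototype.slice.call(args);
  var last = list[list.length - 1];
  if (typeof last === 'function') {
    return { args: list.slice(0, -1), callback: last };
  }
  return { args: list, callback: null };
}

// Stand-in wrapper with the shape evaluate(fn, arg1, arg2, ..., [callback]).
// It only returns what it parsed; a real method would queue a request.
function evaluate(fn /*, ...args, callback */) {
  var rest = splitTrailingCallback(Array.prototype.slice.call(arguments, 1));
  return { source: fn.toString(), args: rest.args, callback: rest.callback };
}
```

A caveat of this convention is that a function passed as the last "real" argument is indistinguishable from a callback, which is one way such a change could break existing callers.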

Getting port for pid of phantomjs on Windows 7

Hi! Can you please add this to node-phantom-simple.js? This code works on Windows 7:

switch (platform) {
  case 'linux':
    cmd = 'netstat -nlp | grep ' + pid + '/';
    break;
  case 'darwin':
    cmd = 'lsof -p ' + pid + ' | grep LISTEN';
    break;
  case 'win32':
    cmd = 'netstat -ano | find "' + pid + '"';
    break;
  default:
    phantom.kill();
    return callback("Your OS is not supported yet. Tell us how to get the listening port based on PID");
}

As you can see, I added a cmd for the win32 platform:

cmd = 'netstat -ano | find "' + pid + '"';

It works ok on my PC.

Thanks.
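One thing to watch with the win32 command: `find` matches the PID string anywhere on the line, so a port or address that happens to contain the same digits would match too. A stricter approach is to parse the output in Node. The sketch below assumes the usual five-column `netstat -ano` layout for TCP rows (Proto, Local Address, Foreign Address, State, PID); `findListeningPort` is a hypothetical helper name.

```javascript
// Hypothetical helper: find the listening TCP port owned by `pid`
// in the text output of `netstat -ano`.
function findListeningPort(netstatOutput, pid) {
  var lines = netstatOutput.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var cols = lines[i].trim().split(/\s+/);
    // Assumed columns: Proto, Local Address, Foreign Address, State, PID
    if (cols.length === 5 && cols[3] === 'LISTENING' && cols[4] === String(pid)) {
      var match = /:(\d+)$/.exec(cols[1]);
      if (match) {
        return parseInt(match[1], 10);
      }
    }
  }
  return null; // no listening socket found for this PID
}
```

Comparing the PID column exactly avoids false matches on rows where the PID digits only appear inside a port number.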

abort in onResourceRequested

Hi all,

I want to abort some ad resources on request, but I cannot call the abort() function.
Error: TypeError: Object #<Object> has no method 'abort'

Can someone give me an example of how to do it in page.onResourceRequested, or am I misunderstanding something?

Long polling isn't cleaned up when exiting browsers

So I have this ExpressJS server that I'm using your awesome library to help manage PhantomJS.

This server maintains a pool of browsers that each incoming request can use to visit URLs and extract data about the pages when they load. Every so often, depending on the number of times a given browser in the "pool" is used, we kill that browser, remove it from the pool and replace it w/ a new one.

This has all been working swimmingly (pun intended) up until I started throwing thousands of requests at it. Over time (after around 15K requests or so) I noticed that the CPU would end up just pegging at 100% on the server. I ended up narrowing it down to the long polling that node-phantom-simple(NPS) is using to perform IPC.

So I went ahead and created a simple pull request that essentially kills the long poll for a given browser if and when it's explicitly exited from: #68

I'm sure there are much better ways to approach this (utilize phantom.killed?) but I needed something quick and dirty. We have this running in production and I've now thrown 2X anything that I've thrown at it in the past, and the CPU usage always comes back down to normal after processing completes.
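The general idea of tying a polling loop's lifetime to the browser it serves can be sketched as follows. This is a simplified timer-based stand-in, not the library's long-polling code or the change in the pull request; `createPoller` and its `poll` argument are hypothetical.

```javascript
// Hypothetical poller: `poll` stands in for whatever performs one poll request.
function createPoller(poll, intervalMs) {
  var timer = null;
  var stopped = false;

  function tick() {
    timer = null;
    if (stopped) { return; }
    poll();
    timer = setTimeout(tick, intervalMs);
  }

  return {
    // Starts polling unless it is already running or was stopped for good.
    start: function () {
      if (!stopped && timer === null) { tick(); }
    },
    // Call this when the browser exits so no further polls are scheduled.
    stop: function () {
      stopped = true;
      if (timer !== null) {
        clearTimeout(timer);
        timer = null;
      }
    },
    isActive: function () { return timer !== null; }
  };
}
```

Calling `stop()` from the same place that exits the browser means a replaced browser in the pool cannot leave a poll loop behind.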

So many node-phantom-simple processes are running

Hi,

I am using node-phantom-simple in a cluster environment.
Even after stopping the complete service, many processes are still left running (see below).
Please let me know how these processes can be stopped after they finish their job.

root 29946 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29947 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29956 29931 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29957 29930 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29962 29936 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29963 29946 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29964 29941 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29965 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29966 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29971 29947 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29976 29965 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29977 29966 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29978 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29983 29978 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29984 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29989 29984 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29990 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29991 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29992 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29993 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 29994 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30015 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30016 29990 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30019 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30020 29993 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30022 29994 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30023 29992 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30024 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30025 29991 0 10:37 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30026 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30027 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30028 1 0 10:37 ? 00:00:00 node ./node_modules/phantomjs/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30083 30026 0 10:38 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30086 30015 0 10:38 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30089 30027 0 10:38 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30097 1 0 10:38 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30129 30019 0 10:38 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30130 1 0 10:38 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30135 30024 0 10:38 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30149 30028 0 10:38 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30150 1 0 10:38 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30154 1 0 10:38 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/bridge.js
root 30158 1 0 10:38 ? 00:00:00 /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/phantomjs/lib/phantom/bin/phantomjs /idap/ReleaseBilling/ibps/ibps_scheduler/node_modules/nodeice/node_modules/node-phantom-simple/brid

Thanks
Chitransh

Docs update

  • LICENSE (+ update year)
  • CHANGELOG.md
  • README
    • add travis/npm buttons
    • shorten licence info (link to separate file)
    • add SlimerJS info
  • package.json
    • refresh fields to modern format (shorten git, add license)
    • add SlimerJS to description

injectJs DOES NOT work

This test fails miserably.

var phantom = require('node-phantom-simple');

exports.testPhantomInjectJs = function(beforeExit,assert) {
    phantom.create(function(error,ph){
        assert.ifError(error);
        ph.injectJs('test/files/injecttest.js',function(err){
            assert.ifError(err);
            ph.exit();
        });
    });
};

Request() error evaluating close() call: Error: read ECONNRESET

page.onCallback = function(data) {
    page.render(filepath, function(err){
        if(err){
            ph.exit();
            throw new Error('page.render failed');
        }
        if(callback){
            callback(filepath);
        }
        ph.exit(); // this doesn't seem to work properly; is this the wrong position to exit?
    });
};

You saved me a lot. Thanks.

Suspicious exit & uncaught handler

There is some strange legacy code (we didn't touch it during refactoring):

  1. var uncaughtHandler = function (err) {
       console.error(err.stack);
       closeChild();
     };
     It's the responsibility of the parent app to care about uncaught exceptions. A package should not write to the console.
  2. var uncaughtHandler = function (err) {
       console.error(err.stack);
       closeChild();
     };
     No need to force exit here. That can break the parent app's logic.

@baudehlo do you have any idea why this code was added? I think it should be removed.

settings.userAgent has no effect

Two problems, not sure if I'm just stupid or node-phantom-simple fails. I tried the following gist on Windows 7 with PhantomJS 1.9.7 and node-phantom-simple 1.2.0 and on Ubuntu 12.04 with PhantomJS 1.9.8:

https://gist.github.com/manuelbieh/2d838715e4472c1f0a72

In case 1 I get no output at all. Neither the user agent nor -- the end --:

$ node ua-test
null 'success'

In case 2 I get output but the user agent hasn't been changed:

$ node ua-test
null 'success'
phantom stdout: userAgent: Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.9.8 Safari/534.34
-- the end --

Can somebody help me find the error?

Port detection in Mac OS X Yosemite

It looks like the lsof command on darwin will now return the last entry in /etc/hosts as the hostname for any application running on localhost, so for users with MAMP Pro, which automatically manages /etc/hosts, this will be the bottom-most website set up in MAMP Pro.

In my case, it fails with:

Fatal error: Error extracting port from: phantomjs 89461 Liam 12u IPv4 0xdd3c25009a2ebb05 0t0 TCP gladdy.local:60628 (LISTEN)

I'm not sure if there is a nice way around this (you could possibly argue that it's a bug in lsof).
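One possible way around it is to ignore the hostname entirely and only look at the digits between the last colon and the `(LISTEN)` marker. The sketch below is an illustration under that assumption; `extractPortFromLsof` is a hypothetical helper, not the module's actual parsing code.

```javascript
// Hypothetical helper: pull the port out of one `lsof` line,
// whatever hostname lsof decided to print in front of it.
function extractPortFromLsof(line) {
  var match = /:(\d+)\s+\(LISTEN\)/.exec(line);
  return match ? parseInt(match[1], 10) : null;
}

// The line from the report above:
var reported = 'phantomjs 89461 Liam 12u IPv4 0xdd3c25009a2ebb05 0t0 TCP gladdy.local:60628 (LISTEN)';
var port = extractPortFromLsof(reported); // 60628
```

Because the pattern does not mention `localhost` or any other name, it gives the same result for `localhost:60628`, `*:60628` or a custom /etc/hosts entry.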

Please add more description for page.evaluate()

EDIT: Sorry, I didn't read the example carefully.

Could you add documentation for page.evaluate(), since it is quite different from phantomjs's page.evaluate()?

I found the following by experimenting and reading the source code:

  1. It is async. In var ret = page.evaluate(fn, arg1, arg2, ...); the page.evaluate() call returns immediately, before fn() is executed.
  2. Your page.evaluate() takes an extra parameter "cb": page.evaluate(fn, cb, arg1, arg2...).

HeadlessError: Error executing command to extract phantom ports

Hello everyone!
Thanks for the project, I like it. But when I try to run node-phantom-simple inside a Docker container (dokku on a DigitalOcean droplet), it fails with this error:

[HeadlessError: Error executing command to extract phantom ports: Error: Command failed: /bin/sh -c if which ss > /dev/null; then ss -nlp | grep "[,=]41,"; else netstat -nlp | grep "[[:space:]]41/"; fi

It looks like the same issue as Shippable/support#884.

Undesirable side-effects on host module signal handling

Hi,

I ran into this issue when using node-phantom-simple alongside Express. When running the following snippet:

var phantom = require('node-phantom-simple');
var app = require('express')();

app.listen(0);

The node process stops handling SIGINT (i.e. Ctrl+C) and must manually be killed with SIGQUIT in order to exit. Removing the first line reverts back to the usual behavior.

Any hints?

Env:

$ node -v
v4.1.1
$ npm -v
3.3.4
$ cat node_modules/node-phantom-simple/package.json | grep version
  "version": "2.0.5"
$ cat node_modules/express/package.json | grep version
  "version": "4.13.3"
$ uname -a
Linux lgce-arch3 4.2.1-1-ARCH #1 SMP PREEMPT Tue Sep 22 06:57:07 CEST 2015 x86_64 GNU/Linux

Is it possible to have multiple phantomjs instances running?

I have an application that requires running multiple phantomjs instances.
My current implementation forks a number of them and uses my own http interface to interact with them.

I wonder if I could replace my implementation with yours. Could you post some sample code for that?

Besides, does this library support pages that create new pages (through phantomjs' onPageCreated event)?

onFilePicker

Is onFilePicker supported? I tried it and didn't get the callback.

Thanks!

Detecting the port used by the bridge server is too complicated

Right now there are a couple of lines that try to find which port the webserver uses, so that messages can be sent to it. They start on line 94.
The problem is that this doesn't work correctly and relies on system-specific calls to external programs. See @Fresa's pull request for an attempt at fixing some of the issues.

I think, contrary to what the node-phantom-simple documentation says, that the webserver module used is not one built into node but the one built into phantomjs.

If I understand the phantomjs docs for webserver correctly, there should be a port field exposed on the webserver object.

port : the port on which the server listen requests (readonly)

It should be possible to send that value via stdout from the bridge to the node-phantom-simple module running in the node scope, just like the bridge already does to signal that phantomjs is ready (see line 78).

Does all this make sense? Have I missed something? Otherwise I'll have a go at this tomorrow.
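On the node side, the proposal could look roughly like the sketch below. It assumes a made-up message format in which the bridge prints a line such as `Ready [54321]` once its server is listening; the actual bridge output format is not shown in this issue, and `createReadyParser` is a hypothetical name. Stdout chunks can split a line anywhere, so the parser buffers until it sees a newline.

```javascript
// Hypothetical parser for an assumed "Ready [<port>]" line on the bridge's stdout.
// Returns a function suitable as a 'data' handler; calls onReady(port) once.
function createReadyParser(onReady) {
  var buffer = '';
  var done = false;

  return function onData(chunk) {
    if (done) { return; }
    buffer += String(chunk);
    var newline;
    while ((newline = buffer.indexOf('\n')) !== -1) {
      var line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      var match = /^Ready \[(\d+)\]$/.exec(line);
      if (match) {
        done = true;
        onReady(parseInt(match[1], 10));
        return;
      }
    }
  };
}
```

With something like this in place, no platform-specific `netstat`, `ss` or `lsof` call would be needed to learn the port.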

How can it be used in headless mode?

Hey guys,
How can this module be used in headless mode with xvfb?

slimerjs + xvfb-run works well. But I have no idea how to use node-phantom-simple + xvfb.

Thanks

TypeError: object is not a function

I used node-phantom-simple before and loved it. It seems to have changed a little over the past few months. I am getting "TypeError: object is not a function". However, it seems to come from inside my node_modules.
I installed node phantom js as well. Not sure what the problem is.
TypeError: object is not a function
at /Users/stevenkauyedauty/projects/finalProject/moneyScrape/node_modules/node-phantom-simple/node-phantom-simple.js:203:20
at ChildProcess.<anonymous> (/Users/stevenkauyedauty/projects/finalProject/moneyScrape/node_modules/node-phantom-simple/node-phantom-simple.js:93:10)
at ChildProcess.g (events.js:180:16)
at ChildProcess.emit (events.js:95:17)
at Process.ChildProcess._handle.onexit (child_process.js:808:12)

Problems with event listener

I'm trying to grab the html when AJAX operations are complete with some client javascript like this:

var evt = document.createEvent('Event');
evt.initEvent('__htmlReady__', true, true);
document.dispatchEvent(evt);

I'm listening for events like this:

page.onCallback = function() {
  page.get('content', function(err, html) {
    ph.exit();
    console.log(html);
  });
};

page.onInitialized = function() {
  page.evaluate(function() {
    document.addEventListener('__htmlReady__', function() {
      window.callPhantom();
    }, false);
    setTimeout(function() {
      window.callPhantom();
    }, 10000);
  });
};

But the event never triggers, only the timeout triggers.
This issue doesn't happen when I run phantomjs from the command line.

Any ideas?

Move to a single handler (for SIGINT)

// Note it's possible to blow up maxEventListeners doing this - consider moving to a single handler.
[ 'SIGINT', 'SIGTERM' ].forEach(function(sig) {
  process.on(sig, closeChild);
});
process.on('uncaughtException', uncaughtHandler);

I'm running into the warning mentioned:

(node) warning: possible EventEmitter memory leak detected. 11 SIGTERM listeners added. Use emitter.setMaxListeners() to increase limit.
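A single-handler arrangement could look something like the sketch below: one listener per signal is registered once, and each phantom instance only adds its cleanup function to a shared list. `createSignalRegistry` is a hypothetical name, and a plain EventEmitter stands in for `process` here so the example has no side effects on signal handling.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical registry: installs exactly one listener per signal on `emitter`
// and fans out to however many cleanup functions have been added.
function createSignalRegistry(emitter, signals) {
  var cleanups = [];

  signals.forEach(function (sig) {
    emitter.on(sig, function () {
      // Iterate over a copy so cleanups may remove themselves while running.
      cleanups.slice().forEach(function (fn) { fn(); });
    });
  });

  return {
    add: function (fn) { cleanups.push(fn); },
    remove: function (fn) {
      var i = cleanups.indexOf(fn);
      if (i !== -1) { cleanups.splice(i, 1); }
    }
  };
}

// Stand-in emitter; a real module would pass `process` instead.
var fake = new EventEmitter();
var registry = createSignalRegistry(fake, ['SIGINT', 'SIGTERM']);
```

However many instances are created, the listener count per signal stays at one, so the maxListeners warning is not triggered. Each instance should call `remove` when it exits so the list does not grow without bound.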

Google Search using node-phantom-simple

I found an interesting PhantomJS "Google Search-script" on StackOverflow: http://stackoverflow.com/a/20943662. How can this be implemented using node-phantom-simple?

The use case is simple and just what I'm looking for:

  1. Load a page
  2. Fill in form data
  3. Submit form
  4. Scrape Result data

I have tried various ways, without any success so far.

Cheers,
Ove

Falsy values are returned as null

Falsy values, such as an empty string and false, get returned as null. For example, the following code returns the expected values, because they are wrapped in an object:

var phantom = require('node-phantom-simple');

phantom.create(function(err,ph) {
    ph.createPage(function(err,page) {
        page.open("http://www.google.com", function(err,status) {
            console.log("opened site? ", status);
            page.includeJs('http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js', function(err) {
                page.evaluate( function(){
                    return {
                        count : $('input[name="q"]').length,
                        emtpyString : $('input[name="q"]').val(),
                        bool : $('input[name="q"]').is('a')
                    };
                }, function( err, result ){
                    console.log( result );
                    ph.exit();
                });
            });
        });
    });
});

The output is:
{ bool: false, count: 1, emtpyString: '' }

However, if we change the inner page.evaluate to this, where it returns just the emptyString:

                page.evaluate( function(){
                    return $('input[name="q"]').val();
                }, function( err, result ){
                    console.log( result );
                    ph.exit();
                });

The output is null, when instead an empty string, ``, was expected.

The same thing occurs when returning a false:

                page.evaluate( function(){
                    return $('input[name="q"]').is('a');
                }, function( err, result ){
                    console.log( result );
                    ph.exit();
                });

Again, the output is now null, when a false was expected.
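One common way this kind of symptom arises is a truthiness check somewhere on the path that serializes the result. The sketch below illustrates that pattern and a stricter alternative; it is an assumption about the likely cause, not a reading of the library's source, and both function names are hypothetical.

```javascript
// Lossy pattern: any falsy result ('', false, 0) collapses to null.
function wrapResultLossy(result) {
  return JSON.stringify({ data: result || null });
}

// Stricter pattern: only undefined, which JSON cannot represent, becomes null.
function wrapResult(result) {
  return JSON.stringify({ data: result === undefined ? null : result });
}
```

This would also explain why wrapping the values in an object works: the object itself is truthy, so its falsy properties pass through untouched.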

Not all page resources sent to onResourceRequested in PhantomJS

So I've got two different pages that I'm trying to gather all the resources requested against:

http://www.clarizen.com/work/resource-management

and

https://www.docusign.com/free/sign-anywhere

If you load both pages in chrome, you'll see in the network tab >= 4 requests to http://pixel.captora.com/img/pix.gif, passing various data in the query string for each page.

But when I try to load the same two pages in phantomJS (using node-phantom-simple bridge), in the case of the clarizen.com page, all the pix.gif calls are sent to the page.onResourceRequested callback. But in the case of the docusign.com page, only the first call to the pix.gif is ever sent to the callback.

    phantom.create(function(err, ph){
      ph.createPage(function(err, page){
        page.onResourceRequested = function(requestData){
          requestData = requestData.shift();
          console.log(requestData.url);
        };

        page.open([page url], function(status){
          setTimeout(function(){ page.close(); }, 10000);
        });
      });
    });

Obviously it's got something specifically to do w/ the docusign.com page.. but I can't, for the life of me figure out where the delta lies. Please help?

Related SO question: http://stackoverflow.com/questions/30677596/not-all-page-resources-sent-to-onresourcerequested-in-phantomjs

Improve testing & fix broken tests

  • mocha is a bit more comfortable to manage. Use it instead of the custom script.
  • 1 test (testUnexpectedExit) hangs after the last commit with SlimerJS support (shame on me).
  • 3 other tests are broken (not investigated yet).
  • add a test pass with SlimerJS.
  • NO global dependencies; add phantomjs & slimerjs to devDeps.
  • automate with travis-ci

'On confirm' event is not responding to callback return value

Pushing an issue in node-horseman further down the chain...johntitus/node-horseman#41.

The approach that @Unlearn is using seems fine. I don't see any obvious javascript errors. So I tried to determine if this is a node-phantom-simple error or a node-horseman issue.

I've recreated the test script for node-phantom-simple, below, and I'm getting the same kind of error. I don't know if this is
a) me and @Unlearn doing something wrong
b) an error in node-phantom-simple
c) an error in phantom

Any help is appreciated.

var driver = require('node-phantom-simple');

driver.create( function (err, browser) {
    browser.createPage(function (err, page) {
        page.onConfirm = function( msg ){
            console.log('Confirm message: %s', msg);
            return true;
        }
        page.onLoadFinished = function(status) {
            console.log('Status %s', status);
        }
        page.onConsoleMessage = function( msg, lineNumber, sourceId ) {
            console.log('Console: %s, %s, %s', msg, lineNumber, sourceId);
        }
        page.open('file:///tmp/confirm/confirm.html', function (err,status) {
            console.log("opened site? ", status);
            page.evaluate( function() {           
                var element = document.querySelector('a#confirm');
                var event = document.createEvent('MouseEvent');
                event.initEvent('click', true, true);
                element.dispatchEvent(event);
            }, function(){
                browser.exit();
            });
        });
    });
});

Need linter and unified coding style

@baudehlo, we are about to finish the work for the 2.0.0 release window. I highly recommend using eslint with the tests and unifying the coding style. That will be a very big changeset.

One question:

  • Any preferences? For example, we use 2 space chars for left indentation. All our presets can be found here

For me, the simplest way is to copy my linter configs. But I'm not sure you'd like those. Let me know what you think.

Any pending problems after 2.0.0 release?

Hi all! As you can see, @Kirill89 and I fixed all the pending PRs & issues we could understand. I have 2 requests for users of this package:

  1. If you know of any real problems that should be fixed, please create new issues or add more details to existing ones.
  2. Please help improve the docs, because my English is not perfect.

IMHO, the rest of the open issues are "questions" (not bugs), or "need more info" / "don't need fixes".

This issue is just a notice and will be closed in several days.

How to communicate between the nodejs and phantomjs?

I use setFn() to set the onLoadFinished callback of phantomjs. However, how can I pass the data back to node after the onLoadFinished callback? Is it true that the only way to do this is to listen to the ph.process.stdout 'data' event and parse the stream myself?

Detect Phantom crash

I keep getting the following crash on Linux:

09:11:03.887 phantom stderr: PhantomJS has crashed. Please read the crash reporting guide at https://github.com/ariya/phantomjs/wiki/Crash-Reporting and file a bug report at https://github.com/ariya/phantomjs/issues/new with the crash dump file attached: /tmp/106608aa-64f2-37ce-6b453bd7-6876263f.dmp

09:11:04.011 Request() error evaluating open() call: Error: socket hang up
09:11:04.012 Poll Request error: Error: socket hang up
  1. Is there a way to detect it?
  2. Is there a way to restart Phantom?
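
One possible way to approach both questions is to watch the child process for its 'exit' event and start a new one when it fires. The sketch below is not part of node-phantom-simple: superviseProcess, start, maxRestarts and onCrash are hypothetical names, and it assumes you can get hold of an EventEmitter that emits 'exit' when PhantomJS dies (for example a child_process.ChildProcess).

```javascript
'use strict';

// Hypothetical supervisor, not part of node-phantom-simple.
// `start` must return an EventEmitter that emits 'exit' (code, signal) when
// the browser process ends, e.g. a child_process.ChildProcess.
function superviseProcess(start, options) {
  var restarts = 0;
  var stopped = false;

  function launch() {
    var child = start();
    child.once('exit', function (code, signal) {
      // An exit after stop() is an intentional shutdown, not a crash.
      if (stopped) { return; }
      options.onCrash(code, signal);
      // Restart only a limited number of times to avoid a crash loop.
      if (restarts < options.maxRestarts) {
        restarts += 1;
        launch();
      }
    });
  }

  launch();

  return {
    restartCount: function () { return restarts; },
    stop: function () { stopped = true; }
  };
}
```

Note that this treats every exit before stop() as a crash, so stop() should be called before shutting the browser down on purpose. Any pages created on the old process are lost and would have to be recreated after a restart.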

SlimerJS usage in headless (windowless) situations

Since there is a call for post-2.0 issues, I wanted to raise this one. One of the major differences between SlimerJS and PhantomJS is that SlimerJS is not headless and requires a windowing system of some sort to function. Under Linux or FreeBSD there are situations where PhantomJS works without problems but SlimerJS exits immediately, because there is no X Window environment available to run Firefox/XULRunner in. SlimerJS suggests using xvfb-run in these cases under Linux (or OSX).

This situation can be detected by checking process.env.DISPLAY (the shell variable $DISPLAY): if it is empty or unset, there is no display available to run in.

There are three potential approaches I see to addressing this:

  1. Document the requirement when using SlimerJS as the engine and make no code changes. This is likely a small use case (even though it did affect me, mine isn't a typical usage scenario).
  2. Document the requirement, but also add a check of the DISPLAY variable under Linux/FreeBSD/OSX and throw a meaningful error on failure (or fall back on PhantomJS?).
  3. Add a check of the DISPLAY variable and, when it is null/empty, use xvfb-run as the child-process command, pushing the SlimerJS command as the first argument (or second, since we would probably want to use the -a switch to have xvfb-run automatically grab the first open server number).

a. This would also require documentation to call out the dependency on xvfb and xvfb-run for the instances where headless operation is desired.

I'm willing to make the implementation attempt for option 3, but wanted to poll opinions before putting too much effort into it.
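
To make option 3 more concrete, here is a minimal sketch of the decision logic. It is not existing node-phantom-simple code: buildLaunchCommand and its parameters are hypothetical, detecting SlimerJS by the binary path is an assumption, and the check is limited to Linux and FreeBSD here (the platform list could be widened).

```javascript
'use strict';

// Hypothetical helper, not part of node-phantom-simple: decide which command
// to spawn depending on the engine and on whether a display is available.
function buildLaunchCommand(enginePath, engineArgs, env, platform) {
  // Assumption: the engine is SlimerJS if its path mentions "slimerjs".
  var isSlimer = /slimerjs/i.test(enginePath);
  // Assumption: only these platforms need an X display for SlimerJS.
  var needsDisplay = isSlimer && (platform === 'linux' || platform === 'freebsd');
  var hasDisplay = typeof env.DISPLAY === 'string' && env.DISPLAY.length > 0;

  if (!needsDisplay || hasDisplay) {
    return { command: enginePath, args: engineArgs.slice() };
  }

  // No display: wrap the engine in xvfb-run. The -a switch lets xvfb-run
  // pick the first free server number automatically.
  return { command: 'xvfb-run', args: ['-a', enginePath].concat(engineArgs) };
}
```

The result could then be passed to child_process.spawn(result.command, result.args), with process.env and process.platform supplied as the last two arguments of the helper. If xvfb-run is not installed the spawn would fail, so the dependency still needs to be documented.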

fix get/setProperty of phantom

Could you add 'set' and 'get' methods for the properties of the phantom object?
I added the following code in node-phantom-simple.js, around line 280, where the 'proxy' is defined,
e.g.

set: function (property, value, callback) {
  request_queue.push([[0, 'setProperty', property, value], callbackOrDummy(callback, poll_func)]);
},
get: function (property, callback) {
  request_queue.push([[0, 'getProperty', property], callbackOrDummy(callback, poll_func)]);
}

You should also change the code in bridge.js, since there are two bugs related to get/setProperty:

  getProperty: function (prop) {
    return phantom[prop];
  },

  setProperty: function (prop, value) {
    phantom[prop] = value;
    return true;
  },

I tested on my MBP with node 0.8.22 and phantomjs 1.9.0

Pass Header Contents to paperSize

Phantom seems to support passing a function to the contents attribute of its page.paperSize.header & footer.

http://phantomjs.org/api/webpage/property/paper-size.html

Is there some way to accomplish this in node-phantom-simple?

Initial attempt fails because phantom is undefined. I tried just sending a function, but that didn't work either. Setting the header.height works, but I can't get the contents to do anything.

var driver = require('node-phantom-simple');

var paperSize = {
    format: 'A4',
    margin: "1cm",
    /* default header/footer for pages that don't have custom overwrites (see below) */
    header: {
        height: "1cm",
        contents: phantom.callback(function(pageNum, numPages) {
            if (pageNum == 1) {
                return "";
            }
            return "<h1>Header " + pageNum + " / " + numPages + "</h1>";
        })
    }
};

var path = 'out.pdf';

driver.create(function(err, browser) {
    return browser.createPage(function(err, page) {
        return page.open("http://www.google.com", function(err, status) {
            page.set('paperSize', paperSize, function() {
                page.render(path, {
                    format: 'pdf',
                    quality: '100'
                }, function() {
                    browser.exit();
                    console.log('done');
                });
            });
        });
    });
});
