parallel-js / parallel.js

Easy multi-core processing utilities for Node.

License: MIT License

JavaScript 94.15% HTML 1.07% CSS 4.78%

parallel javascript worker-threads paralleljs node webworker

parallel.js's People

jscr vibster michliu dolfly calvinmetcalf tinganho abpin gitlisted cartercole 99plus2 muddydixon y-ich newfront growlybear randylien nazrulworld mnishihan nmatyukov ibank littlekfc miumiu kurellajunior ameyapg adambabik slavah f2er johncblandii burkeen gunderson xenovyzarz monw3c wavesun malachaifrazier shakakira mrhieu nathanielescribano bsparks th3wltr wheercool troland streaver91 alankavanagh secret-octo meloncloud kelsadita halkibsi newsky amilkey mdenisov nvdnkpr thecodejack evanhahn zixiinlian joaquimserafim karthikbadam hanthomas ganinaleksei sschepis antixrist ashanfernando krishedges alexmocioi mwager jhnstn is00hcw rich-harris romuloctba infect2 russellwmy telekomatrix royriojas joshyu rzurad wildermuthn hardik91 imclab arkadiuszsz simba-mupfunya tcezeaux syzer unitedonline swesh sdtsui zdlopez fu4k6pingu lubberscorrado lemonhall jamesblunt devlato mikeaddison93 ariel-isaacm neurobe jocull code-fury kosta-github scalarmapedia fuguangsen zhiying8710 mrunderhill89 alexgraul

parallel.js's Issues

Remove underscore dependency?

It looks like underscore is used only to provide utility functions. Some of these functions are already available in ECMAScript 5 (e.g. bind), and the others are not hard to write.

It would mean more bytes in parallel.js itself, but a smaller total payload when the script is used in the browser.
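
As a rough illustration of the suggestion above, here is how a few common underscore helpers could be replaced with ES5 built-ins or a few lines of hand-written code. Which underscore functions parallel.js actually uses is not stated in this issue, so the selection below is an assumption.

```javascript
// Hypothetical examples; the exact helpers parallel.js relies on are an assumption.

// _.bind(fn, ctx) -> Function.prototype.bind (ES5)
var counter = { count: 41, next: function () { return ++this.count; } };
var next = counter.next.bind(counter);

// _.map / _.filter -> Array.prototype.map / filter (ES5)
var squares = [1, 2, 3].map(function (n) { return n * n; });
var evens = [1, 2, 3, 4].filter(function (n) { return n % 2 === 0; });

// _.isArray -> Array.isArray (ES5)
var isArr = Array.isArray(squares);

// _.extend has no ES5 built-in, but is short to write by hand.
function extend(target, source) {
  Object.keys(source).forEach(function (key) { target[key] = source[key]; });
  return target;
}
var opts = extend({ maxWorkers: 4 }, { synchronous: true });
```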

Reduce is throwing errors

Take this case:

var p = new Parallel([0, 1, 2, 3, 4, 5, 6, 7, 8]);

function add(d) { return d[0] + d[1]; }
function factorial(n) { return n < 2 ? 1 : n * factorial(n - 1); }

p.require(factorial)

p.map(function (n) { return Math.pow(10, n); }).reduce(add);

Throws 'Uncaught TypeError: Object 216.6666666666 has no method splice', line 258. Looks like we're trying to call splice on a number.

I'm working on docs and I'm going to assume this should be working for the purposes of documentation.

Better mapreduce syntax

I want to eventually move away from exposing the syntax spawn and mapreduce.

I kind of like the query language used by RethinkDB. It would be nice to have something like this:

var sqr = function (n) { return n * n; },
    add = function (a, b) { return a + b; },
    log = function (a) { console.log(a); },
    data = [10, 20, 30, 40];


var r = Parallel.data(data)
                .map(sqr)
                .reduce(add)
                .fetch(log);

Each step in this chain returns a wrapped object that can be treated like a promise. Data constructs a "distributed array", so to speak, that gets operated on by the functions that follow it.

I could express this equivalently as:

r = Parallel.map(sqr, data).reduce(add).fetch(log);

Notice that in the map call data is passed after the mapping function. This is intentional, so that multiple arguments or a single atomic argument may be passed instead of an array. It will also allow us to partially apply operations to our chained functions:

Parallel.mixin({
    square: _.partial(Parallel.map, sqr)
});

r = Parallel.data(data).square().fetch() // returns the values of data, squared

Spawn would basically work the same way:

r = Parallel.data(1, 2, 3).spawn(function (a, b, c) { return a + b + c; });

// equivalently:
r = Parallel.spawn(function (a, b, c) { return a + b + c; }, 1, 2, 3);

Use clear options

Since I just saw 546849f with the updated readme and options, we need to make sure the options have a clear purpose:

path (legacyEvalPath): Will be defined in node.js environment (since it's needed anyways), but will default to null in browser environments. If the path is set, it will fall back to this file in case of a cross-site-origin restriction (i.e. IE10). It will therefore merge the meaning with the current ie10shim option.
maxWorkers: Is obvious and will stay this way.
synchronous: If webworkers are not available at all (i.e. IE < 10, etc), fall back to synchronous operations (with an initial setTimeout, currently no setTimeouts afterwards, they may be possible with .map() though).
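
A minimal sketch of how these three options might be resolved against defaults. The option names come from the list above; the default values, the placeholder path, and the `resolveOptions` helper are assumptions made for illustration only.

```javascript
// Sketch only: option names come from the list above; default values are assumptions.
function resolveOptions(userOptions, isNode) {
  var defaults = {
    // Assumed placeholder path for node; in the browser the path defaults to null.
    path: isNode ? './eval.js' : null,
    maxWorkers: 4,      // assumed default
    synchronous: true   // assumed default: fall back when workers are missing
  };
  var result = {};
  Object.keys(defaults).forEach(function (key) {
    var supplied = userOptions && Object.prototype.hasOwnProperty.call(userOptions, key);
    result[key] = supplied ? userOptions[key] : defaults[key];
  });
  return result;
}

var browserOptions = resolveOptions({ maxWorkers: 2 }, false);
```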

Ping: @adambom - Please add the remaining options to the readme.

Tests

It's about time to get a test suite up. I'm partial to Jasmine, but I'm open to other alternatives.

Is this code licensed?

Is this code licensed? Maybe I missed it but I can't find what the code is licensed under if anything at all ;-)

Provide a fallback if workers are not available

At the moment if the web browser doesn't support workers at all, an error will be thrown.

There should be an option to enable a blocking execution of the operation, or continue throwing an error.
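
One way the requested behaviour could look, as a sketch rather than the library's implementation. `runJob` and its `allowBlocking` option are hypothetical names, and the environment is passed in as `scope` so the missing-worker case can be simulated.

```javascript
// Sketch: `runJob` and its `allowBlocking` option are hypothetical names.
function runJob(fn, data, options, scope) {
  var hasWorkers = typeof scope.Worker === 'function';
  if (hasWorkers) {
    // A real implementation would hand fn and data to a Worker here.
    return { mode: 'worker' };
  }
  if (options.allowBlocking) {
    // Blocking fallback: run the function on the calling thread.
    return { mode: 'blocking', result: fn(data) };
  }
  throw new Error('Web workers are not supported in this environment');
}

// Simulate a browser without worker support by passing an empty scope.
var outcome = runJob(function (n) { return n * 2; }, 21, { allowBlocking: true }, {});
```

With `allowBlocking` left off, the same call keeps throwing, which preserves the current behaviour for callers who prefer an error.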

Require should take any number of arguments

This represents a regression since the rewrite.

Support remote references on another machine

This will allow us to run stuff in the cloud as opposed to locally.

The API would introduce an additional method, connect, that takes two arguments:

location: A url that points to a server that's listening for connections
callback: A function to get called when the connection is initiated (or if the connection errs). The callback will receive two arguments: err, and conn. conn is an instance of a new class called Connection.

The Connection class implements the spawn and mapreduce interfaces except instead of creating web workers, it issues commands to the remote server.

We'd also have to introduce a web server for hosting parallel on a node server somewhere in the cloud. You could start the server using a basic CLI.

no such file, file.js on spawn

I apologize if this is absolutely trivial. I am learning parallel and javascript at the same time. I know it's not recommended. It's a long story. Anyway I thought you may want to know this happens going naively through the parallel web site examples. Detail cmds and errors below. The missing path exists up to parallel.js, but as you can imagine it is not a directory. cwd is /Users/antonio. I am on OS X.

apc-2:~ antonio$ node

var Parallel = require('parallel.js');
undefined
Parallel
{ mapreduce: [Function],
spawn: [Function],
require: { [Function] state: { files: [], funcs: [] } } }
var sqrt = function (n) { return Math.sqrt(n); };
undefined
var r = Parallel.spawn(sqrt, 100);
Error: ENOENT, no such file or directory '/Users/antonio/node_modules/parallel.js/tmp/file.js'
at Object.fs.openSync (fs.js:427:18)
at Object.fs.writeFileSync (fs.js:966:15)
at Object.URL.createObjectURL (/Users/antonio/node_modules/parallel.js/url.js:16:6)
at new RemoteRef (/Users/antonio/node_modules/parallel.js/parallel.js:66:27)
at Object.DistributedProcess.mapper as spawn
at repl:1:18
at REPLServer.self.eval (repl.js:110:21)
at repl.js:249:20
at REPLServer.self.eval (repl.js:122:7)
at Interface.<anonymous> (repl.js:239:12)

Aw Snap error when calling constructor repeatedly

I have thousands of chunks of data that arrive on the fly and need to be processed.

If I do this for every chunk:

var p = new Parallel();
p.parallel.spawn(function (input) { .. })

the system starts to slow down until Chrome crashes...
I tried memoizing the instance and changing p.data before calling spawn again. That avoided the crashes, but the input is not reliable...

Need a way to serialize closure

Take memoized fibonacci:

Where we store computed values in closure scope...

var p = new Parallel([0, 1, 2, 3, 4, 5, 6]),
    fib = (function () {
      var memo = {};  

      return function (n) {
        if (n < 2) {
          return 1;
        };

        memo[n] = memo[n] || fib(n - 1) + fib(n - 2);

        return memo[n];
      }
    })(),
    log = function () { console.log(arguments); };

p.map(fib).then(log)

This fails with the error "memo is not defined".
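
The failure can be reproduced without any workers. The sketch below assumes the function is shipped as source text (via toString) and re-created on the other side; it mimics that round trip but is not parallel.js's own code.

```javascript
// Sketch of why the closure is lost; this mimics, but is not, parallel.js's own code.
var makeCounter = function () {
  var memo = { calls: 0 };
  return function () { memo.calls += 1; return memo.calls; };
};
var original = makeCounter();

// Serializing a function keeps only its source text, not its closure scope.
var fnSource = original.toString();

// Re-create the function from its source, as a worker would have to do.
var revived = new Function('return (' + fnSource + ');')();

var errorName = null;
try {
  revived();
} catch (e) {
  errorName = e.name; // memo does not exist in the new scope
}
```

Any state the function needs would therefore have to live inside the function body or be passed in explicitly.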

require can not be chained

I just checked the code: require needs a "return this" to make it chainable, so that the examples on http://adambom.github.io/parallel.js/ can run.
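
A minimal sketch of the change being asked for, using a stand-in class. `Job` is hypothetical and only illustrates the pattern; it is not the library's implementation.

```javascript
// Minimal stand-in class; `Job` is hypothetical and not the library's own implementation.
function Job() {
  this.required = [];
}

Job.prototype.require = function (dependency) {
  this.required.push(dependency);
  return this; // returning the instance is what makes chaining possible
};

function factorial(n) { return n < 2 ? 1 : n * factorial(n - 1); }
function square(n) { return n * n; }

var job = new Job().require(factorial).require(square);
```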

eval.js should not always be required in browsers

If you include a file without passing in an evalPath, then you always get the error:

'Uncaught Error: Can't use required scripts without eval.js!'

While informative, this error is still annoying because most browsers are capable of serializing and importing files correctly. It seems wrong to be dumbing this down to support IE.

Is there any way we could pull this off?

Queue based map reduce

The one-chunk-per-core approach that map reduce uses seems a bit strange. It basically means I end up having to do a reduce in the map as well as in the reduce. It might make sense for the syntax to just have you specify a number of threads and a single array of values.

You could also set it up so that a central process hands out the data, and when a worker is finished it requests more, which helps if your data is unevenly sized. The wisdom of that is going to depend on message passing overheads, but it would also allow you to incrementally add data. I whipped together a gist about what I was trying to say, which quickly got out of control:

https://gist.github.com/calvinmetcalf/5039604
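
To make the trade-off concrete, here is a small simulation (not taken from the gist) that compares one chunk per worker with a pull-based queue. Costs are abstract time units, no real workers are created, and message passing overhead is ignored, so this only illustrates the load-balancing argument.

```javascript
// Simulation only: costs are abstract time units and no real workers are created.

// Static scheme: split the tasks into one contiguous chunk per worker.
function staticChunks(costs, workerCount) {
  var size = Math.ceil(costs.length / workerCount);
  var loads = [];
  for (var w = 0; w < workerCount; w++) {
    var chunk = costs.slice(w * size, (w + 1) * size);
    loads.push(chunk.reduce(function (a, b) { return a + b; }, 0));
  }
  return Math.max.apply(null, loads); // finish time of the slowest worker
}

// Queue scheme: whichever worker becomes free first takes the next task.
function queued(costs, workerCount) {
  var busyUntil = [];
  for (var w = 0; w < workerCount; w++) { busyUntil.push(0); }
  costs.forEach(function (cost) {
    var next = busyUntil.indexOf(Math.min.apply(null, busyUntil));
    busyUntil[next] += cost;
  });
  return Math.max.apply(null, busyUntil);
}

var costs = [8, 7, 1, 1, 1, 1, 1, 1]; // unevenly sized tasks
var staticTime = staticChunks(costs, 4); // 15: one worker gets both large tasks
var queueTime = queued(costs, 4);        // 8: the large tasks land on different workers
```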

Adding middleware to map function

Hi Adam and Seb!
I appreciate your amazing product. You guys rock.
But I want to discuss one question.
We process huge amounts of data, and the map/reduce interface is a magic gift for me))
The code below is a common snippet for image processing:

data
.map(cb)
.map(myMagicFunc)
.then(next);

But sometimes I want to get the data after the first round of processing, map(cb), so I propose adding map middleware as a callback function.

data
.map(cb, middleware) // here I can process first-round data before the second round starts
.map(myMagicFunc)
.then(next);

Please let me know what you think about my idea.

Add testling hook

@adambom See https://ci.testling.com/#hook.

dacbb08 should make the repo compatible.

Parallel.js has problems with Blob in IE

I need to execute functions in "parallel" and I use parallel.js:

var p = new Parallel(items);

var fn1 = function (item) {
    doSomething(item);
};

p.map(fn1).then(function () {
    otherFunction();
});

But IE shows the following error:

[Q] Unhandled rejection reasons (should be empty): (no stack) SecurityError

HTML7007: One or more blob URLs were revoked by closing the blob
for which they were created. These URLs will no longer resolve as
the data backing the URL has been freed.
How can I fix this error?

I reviewed the parallel.js page in IE and all the examples work fine.

I use Durandal, Breeze and Knockout.

Firefox shows the following error:

[Q] Unhandled rejection reasons (should be empty): ["(no stack) [Exception..... location: ""]"]

Google Chrome shows no error, but parallel.js does not work.

maxWorkers should be global not per instance

If I do something like:

_.range(10000).map(function(i) {
    return new Parallel(data[i]).map(transform);
});

then at the moment, parallel.js will attempt to create 10000*4 workers at once, when it should really only create 4 workers (by default), and delay creating the others until the first lot have finished.
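
A sketch of what a shared limit could look like. `WorkerPool` and its methods are hypothetical names, and the "workers" here are plain callbacks so the example runs without a browser; the point is only that one pool is shared across all instances.

```javascript
// Sketch of a shared limiter; `WorkerPool` and its methods are hypothetical names.
function WorkerPool(limit) {
  this.limit = limit;
  this.active = 0;
  this.waiting = [];
}

// `task` receives a `done` callback that it must call when its worker finishes.
WorkerPool.prototype.run = function (task) {
  if (this.active >= this.limit) {
    this.waiting.push(task); // delay until a slot frees up
    return;
  }
  var pool = this;
  this.active += 1;
  task(function done() {
    pool.active -= 1;
    if (pool.waiting.length > 0) {
      pool.run(pool.waiting.shift());
    }
  });
};

// One pool shared by every Parallel-like instance, instead of one per instance.
var sharedPool = new WorkerPool(4);
var finishers = [];
for (var i = 0; i < 10; i++) {
  sharedPool.run(function (done) { finishers.push(done); });
}
var activeAtStart = sharedPool.active;          // 4 running
var waitingAtStart = sharedPool.waiting.length; // 6 delayed
```

Calling one of the stored `done` callbacks frees a slot and immediately starts the next waiting task, so the number of active workers never exceeds the limit.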

Real-world use case

Very nice library! The examples are really good in terms of how to use the library - I'm just curious about where this has been put to use. Maybe add a wiki page or something for users to add to?

parallel.js doesn't play well with yeoman and can't require after install -g

With yeoman installed:

$ npm install -g parallel.js
npm http GET https://registry.npmjs.org/parallel.js
npm http 304 https://registry.npmjs.org/parallel.js
unbuild [email protected]
npm WARN unmet dependency .../homebrew/share/npm/lib/node_modules/yo/node_modules/yeoman-generator requires lodash@'~1.3.0' but will load
npm WARN unmet dependency .../homebrew/share/npm/lib/node_modules/yo/node_modules/lodash,
npm WARN unmet dependency which is version 1.1.1
npm WARN unmet dependency .../homebrew/share/npm/lib/node_modules/yo/node_modules/insight requires async@'~0.1.22' but will load
npm WARN unmet dependency .../homebrew/share/npm/lib/node_modules/yo/node_modules/async,
npm WARN unmet dependency which is version 0.2.9
npm WARN unmet dependency .../homebrew/share/npm/lib/node_modules/yo/node_modules/insight requires lodash@'~1.0.0-rc.2' but will load
npm WARN unmet dependency .../homebrew/share/npm/lib/node_modules/yo/node_modules/lodash,
npm WARN unmet dependency which is version 1.1.1

Removing yeoman (and bower):

$ npm uninstall -g yo
unbuild [email protected]
$ npm uninstall -g yeoman-generator
unbuild [email protected]
$ npm uninstall -g bower
unbuild [email protected]
$ npm install -g parallel.js
npm http GET https://registry.npmjs.org/parallel.js
npm http 304 https://registry.npmjs.org/parallel.js
unbuild [email protected]
[email protected] .../homebrew/share/npm/lib/node_modules/parallel.js

Ok, good.

Yet still can't require parallel.js:

$ echo 'var p = require("parallel.js");' | node 

module.js:340
    throw err;
          ^
Error: Cannot find module 'parallel.js'
    at Function.Module._resolveFilename (module.js:338:15)
    at Function.Module._load (module.js:280:25)
    at Module.require (module.js:364:17)
    at require (module.js:380:17)
    at [stdin]:1:9
    at Object.<anonymous> ([stdin]-wrapper:6:22)
    at Module._compile (module.js:456:26)
    at evalScript (node.js:532:25)
    at Socket.<anonymous> (node.js:154:11)
    at Socket.EventEmitter.emit (events.js:117:20)

I've even tried a fresh homebrew [email protected] install and same require failure.

Detecting around cross-domain restrictions for web workers in IE 10

When we require a script in a web worker and map over a function, Parallel asks for an evalPath so that it can convert the mapping function to a string and re-evaluate it. No doubt this is a good idea, but it is a problem for Chrome packaged apps (ChromeOS). Please see http://developer.chrome.com/extensions/contentSecurityPolicy.
As far as I can see, the main reason for using eval.js is the cross-domain policy for web workers in IE 10. So we propose using eval.js only for browsers that need it and using blobs for all modern browsers. We can prepare a fix for this or continue the discussion.

try {
  var blob = new Blob([src], { type: 'text/javascript' });
  var url = URL.createObjectURL(blob);

  wrk = new Worker(url);
} catch (e) {
  // Blob/URL unsupported, or a cross-origin error
  if (this.options.evalPath !== null) {
    wrk = new Worker(this.options.evalPath);
    wrk.postMessage(src);
  } else {
    throw e;
  }
}

The code above works perfectly for a Chrome app, so a 'browser detection' fix would be a quick solution. Thanks IE!

Modify input dataset with multiple calls

The title might be a bit misleading, if someone has a better explanation I'll replace it.

What I mean is the following:
You might have a basic input dataset, for example a base image in a pixel format; the input dataset.

Afterwards you modify this input dataset with the worker and its arguments. You have the ability to call the worker multiple times with different arguments.

After you're done with the calculations you can retrieve the input dataset.

Let me explain with a bit of code:

var inputData = new Uint8Array([1,2,3]);
var worker = Parallel.prepare(inputData, function(input, arg) {
  for (var i=0; i < input.length; ++i) {
    input[i] += arg;
  }
});

worker.work(1);
worker.work(5);

worker.fetchInput(function(input) {
  console.log(input); // Outputs: Uint8Array([7, 8, 9])
});

BUG: paralleljs fails in browserify

When I try to use paralleljs in browserify with the following code:

// test.js
var Parallel = require('paralleljs');
var p = new Parallel('forwards');
p.spawn(function (data) {
  data = data.split('').reverse().join('');
  console.log(data); // logs sdrawrof
  return data;
}).then(function (data) {
  console.log(data) // logs sdrawrof
});

And then I compile it:

$ npm install -g browserify
$ browserify test.js > bundle.js

And then I include it in a simple HTML page:

<!DOCTYPE html>
<script src="bundle.js"></script>

I get the following errors in Chrome:

Uncaught Error: Cannot find module '/node_modules/paralleljs/lib/Worker.js' bundle.js:1
s bundle.js:1
(anonymous function) bundle.js:1
(anonymous function) bundle.js:54
os bundle.js:358
s bundle.js:1
(anonymous function) bundle.js:1
paralleljs bundle.js:360
s bundle.js:1
e bundle.js:1
(anonymous function)

I know you can just include paralleljs directly via a script tag, but it would be nice to use this within browserify.

Add support for node.js

I want to make this work locally, running in either the browser or node.

Making it run in node is simple: we need to create a package and simply require phantom.js. I think this will be useful for most users who want to do parallel computing but don't really want to fire up a web page to do it.

Write more documentation for rewrite

We are still missing "good" documentation for the rewrite-branch. I'm absolutely terrible at writing docs.

@adambom Could you take over this issue, or are you AWOL again?

Uncaught ReferenceError: window is not defined

I have the following function:

var dateranges = ["01/01/2013", "02/02/2013"];

var p = new Parallel(dateranges, {evalPath: "_lib/eval.js" });
p.require("_lib/jquery.js");
p.map(worker).then(function (data){ console.log(data); });

and the following worker:

worker: function(daterange)
{
    return $.get("csp/acb/CAC.wsJSON.cls");
}

This produces the error: Uncaught ReferenceError: window is not defined

Proposal for changing reduce syntax

I'm used to formulating reduce jobs like so:

[1, 2, 3].reduce(function (prev, curr, index, arr) { return prev + curr }, 0);

See the MDN docs: https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Array/Reduce

The way we're doing it now is passing in a big lump of data every time.

I'd like to change this if it's ok with you @Sebmaster

async functions / examples

Is it possible to use parallel.js with functions that take a callback to deliver their results?

AJAX example

It would be really useful to post an Ajax example in the documentation. I am having a hard time implementing it; the examples on the web do not use parallel.js, and I have not been able to adapt them to parallel.js.

I am especially interested in using it to make simultaneous AJAX calls using map.

Thanks.

Race condition producing dangling workers

Hi, this gave me a headache.
I managed to create a race condition where the callback given to the fetch method was never called.
I finally found a possible reason: there is a chance that a spawned process reads ./tmp/file.js while it is being written again. In the worst case it reads the file just between being cleared and rewritten, so it reads an empty file and does nothing at all.

My solution would be to create individual files and delete them again on calls to either fetch or terminate.
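
The proposal could be sketched as follows with Node's built-in modules. The function names are hypothetical, and the sketch writes to the system temp directory rather than ./tmp, which is an assumption made to keep it self-contained.

```javascript
// Sketch using Node's built-in modules; function names are hypothetical.
var fs = require('fs');
var os = require('os');
var path = require('path');

// Write each job's source to its own file so concurrent jobs never share one.
function writeJobFile(source) {
  var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'parallel-'));
  var file = path.join(dir, 'job.js');
  fs.writeFileSync(file, source);
  return file;
}

// Remove the file (and its directory) once fetch or terminate has been called.
function removeJobFile(file) {
  fs.unlinkSync(file);
  fs.rmdirSync(path.dirname(file));
}

var first = writeJobFile('module.exports = 1;');
var second = writeJobFile('module.exports = 2;');
var distinct = first !== second;
var firstContent = fs.readFileSync(first, 'utf8');
removeJobFile(first);
removeJobFile(second);
```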

Do you prefer a pull request, if so based on which branch?

Reduce gives incorrect results through fetch

Take the following map and reduce functions:

var mapFunc = function (w){
    var out = {};
    var letters = w.toLowerCase().split("");
    letters.forEach(function(l){
        if(l in out){
            out[l]++;
        }else{
            out[l]=1;
        }
    });
    return out;
},
reduce1 = function (a,b){
    var out = {};
    [a,b].forEach(function(c){
        for(var key in c){
            if(key in out){
                out[key]+=c[key];
            }else{
                out[key]=c[key];
            }
        }
    });
    return out;
},
reduce2 = function (a,b){
    for(var key in b){
        if(key in a){
            a[key]+=b[key];
        }else{
            a[key]=b[key];
        }
    }
    return a;
}

reduce1 works fine no matter how you call it

var d = Parallel.mapreduce(mapFunc , reduce1, ["ab", "ca", "bc"],function (result) {    console.log("1"+JSON.stringify(result));});
//prints 1{"a":2,"b":2,"c":2}
d.fetch(function (result) {console.log("2"+JSON.stringify(result));});
//prints 2{"a":2,"b":2,"c":2}

reduce2, on the other hand, returns the wrong answer, but only when called through fetch, not when the result is returned through a callback:

var d = Parallel.mapreduce(mapFunc , reduce2, ["ab", "ca", "bc"],function (result) {    console.log("1"+JSON.stringify(result));});
//prints 1{"a":2,"b":2,"c":2}
d.fetch(function (result) {console.log("2"+JSON.stringify(result));});
//prints 2{"a":3,"b":3,"c":4}
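
One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: reduce2 mutates its first argument, so if fetch runs the reduction again over the already reduced map results, the totals are counted twice. The sketch below uses plain arrays instead of parallel.js and reproduces the reported numbers under that assumption.

```javascript
// Hypothesis only: shows that re-reducing already mutated map results
// reproduces the reported numbers. Uses plain arrays, not parallel.js.
function countLetters(w) {
  var out = {};
  w.toLowerCase().split('').forEach(function (l) { out[l] = (out[l] || 0) + 1; });
  return out;
}

// Same behavior as reduce2 above: merges b into a and returns a (mutating a).
function mutatingReduce(a, b) {
  for (var key in b) {
    a[key] = (key in a ? a[key] : 0) + b[key];
  }
  return a;
}

var mapped = ['ab', 'ca', 'bc'].map(countLetters);

var firstPass = JSON.stringify(mapped.reduce(mutatingReduce));
// mapped[0] has now been modified in place, so a second pass starts from the totals.
var secondPass = JSON.stringify(mapped.reduce(mutatingReduce));
```

Here `firstPass` is `{"a":2,"b":2,"c":2}` and `secondPass` is `{"a":3,"b":3,"c":4}`, matching the two outputs reported above. reduce1 is unaffected because it builds a fresh object on every call.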

Promises

It would be cool if Parallel#spawn returned a promise object.

EDIT:

Or give RemoteReference a .promise() method.

IE10 can't create WebWorkers from blob:-URLs

IE10 throws a SecurityError when trying to instantiate a Worker from a blob URL.

According to this (http://stackoverflow.com/questions/10343913/how-to-create-a-web-worker-from-a-string) SO question, you have to use a wrapper script which basically calls eval on the input string.

File url generation should be relative to current document

Let's say I'm working with a file at http://localhost/foo/bar/index.html

And I do:

Parallel.require('myfile.js')

It should be smart enough to look in http://localhost/foo/bar/myfile.js

Instead of http://localhost/myfile.js

It would also be nice if we could support relative url commands, like '..' and '.'

So I could do

Parallel.require('../resources/myfile.js')

And have it point to: http://localhost/foo/resources/myfile.js
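
In environments that provide the standard URL constructor, the requested resolution can be sketched like this. `resolveRequire` is a hypothetical helper, and the base is passed in explicitly; in a browser it would presumably come from the current document's location.

```javascript
// Sketch using the standard URL constructor; `resolveRequire` is a hypothetical helper.
function resolveRequire(file, documentUrl) {
  // The second argument is the base against which relative paths are resolved.
  return new URL(file, documentUrl).href;
}

var base = 'http://localhost/foo/bar/index.html';
var sameDir = resolveRequire('myfile.js', base);
var parentDir = resolveRequire('../resources/myfile.js', base);
```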

Not working in Chrome/Safari

Hello,

It looks like the examples on the demonstration page http://adambom.github.com/parallel.js/ do not work in Chrome and Safari.

Chrome error: Uncaught ReferenceError: URL is not defined

var RemoteRef = function (fn) {
    var str = wrap(fn),
        blob = new Blob([str], { type: 'text/javascript' }),
>>>     url = URL.createObjectURL(blob),
...

Safari error: TypeError: '[object BlobConstructor]' is not a constructor (evaluating 'new Blob([str], { type: 'text/javascript' })')

var RemoteRef = function (fn) {
    var str = wrap(fn),
>>>     blob = new Blob([str], { type: 'text/javascript' }),
        url = URL.createObjectURL(blob),
...

I hope it's easy to fix :)

No graceful degradation.

In order to be useful in the wild, parallel.js must gracefully degrade to single-threaded / blocking code when Worker in window is false.

Handle arguments better

Shouldn't have to pass an array of arguments

Allow shared state

Introduce a require method on remote references that allows you to "require" libraries in your workers.

Better error messaging for bad functions

The following code:

    for (i = 0; i < this.requiredFunctions.length; ++i) {
        if (this.requiredFunctions[i].name) {
            preStr += 'var ' + this.requiredFunctions[i].name + ' = ' + this.requiredFunctions[i].fn.toString() + ';';
        } else {
            preStr += this.requiredFunctions[i].fn.toString();
        }
    }

...will die with "TypeError: Cannot call method 'toString' of undefined" if one of the functions passed into require() is not defined properly. Please detect this condition and report it in a more usable manner.
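
A sketch of the kind of check being requested. `buildPreamble` is a hypothetical stand-in for the loop above, and the wording of the error message is only a suggestion.

```javascript
// Sketch: `buildPreamble` is a hypothetical stand-in for the loop above.
function buildPreamble(requiredFunctions) {
  var preStr = '';
  requiredFunctions.forEach(function (entry, i) {
    // Detect missing or non-function entries before calling toString on them.
    if (!entry || typeof entry.fn !== 'function') {
      var label = entry && entry.name ? '"' + entry.name + '"' : 'at index ' + i;
      throw new TypeError('require(): entry ' + label + ' is not a function');
    }
    preStr += entry.name
      ? 'var ' + entry.name + ' = ' + entry.fn.toString() + ';'
      : entry.fn.toString();
  });
  return preStr;
}

var message = null;
try {
  buildPreamble([{ name: 'helper', fn: undefined }]);
} catch (e) {
  message = e.message;
}
```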

Ability to close worker

Is there any way to call self.close() from within the worker?

Cannot chain map calls

var d = [1, 2, 3, 4], 
    sqr = function (n) { return n * n; }, 
    add = function (a, b) { return a + b; }, 
    log = function () { console.log(arguments); };

Parallel(d).map(sqr).map(sqr).reduce(add).then(log);

// logs 30, should log 354
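
For reference, the same pipeline with built-in array methods (no workers involved) shows where both numbers come from: 30 is the result after a single map, which suggests the second map call is being dropped.

```javascript
// Reference computation with built-in array methods (no workers involved).
var d = [1, 2, 3, 4];
var sqr = function (n) { return n * n; };
var add = function (a, b) { return a + b; };

var once = d.map(sqr).reduce(add);           // 1 + 4 + 9 + 16
var twice = d.map(sqr).map(sqr).reduce(add); // 1 + 16 + 81 + 256
```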

Test for Promises/A spec

The CommonJS Promises/A spec should guarantee compatibility with all major promise libraries.

I'm thinking it might be best to have duplicates of all our API tests, but done with Q. This might be the best solution for now.

Need tests for require

I don't see any tests for require. Am I missing something?

Fix broken examples to work on web and Node

I'm not sure if the examples were only tested in node but trying in a browser throws RTEs. Can someone confirm this?

<!DOCTYPE html>
<html>
<head>
    <title>Parallel.js Test</title>

    <script type="text/javascript" src="../lib/paralleljs/lib/parallel.js"></script>

    <script type="text/javascript">
        var p = new Parallel('forwards');

        // Spawn a remote job (we'll see more on how to use then later)
        p.spawn(function (data) {
            data = data.reverse();

            console.log(data); // logs sdrawrof

            return data;
        }).then(function (data) {
          console.log(data) // logs sdrawrof
        });
    </script>
</head>

<body>
    <section id="content">...</section>
</body>
</html>

The JS is exactly from the examples but I get this error:
Uncaught TypeError: Object forwards has no method 'reverse'

If I move the console.log in the anonymous spawn method, it throws this error:
Uncaught ReferenceError: console is not defined

Changing the spawn to the following works like a charm though:

        p.spawn(function (data) {
            data = data.split("").reverse().join("");

            return data;
        }).then(function (data) {
          console.log(data) // logs sdrawrof
        });

Progress events

Hey.

I do a heavy operation in a worker method and want to notify the caller about the calculation's progress. With plain WebWorkers I can send progress objects via postMessage.

Is there any way to send progress events from worker using Parallel?

Passing environment to functions

Hi,

I'm looking for a way to pass additional variables to functions that are used in map. Since you can't pass a closure to map that traps a variable I was thinking of passing the variable using:

var fn = function() {  /*...*/ }
fn.extraVar = 123;

Then inside the function I could access the variable using arguments:

var fn = function(a) {  
  var extraVar = arguments.callee.extraVar;
  return a + extraVar;
}

var p = new Parallel([0, 1, 2, 3]);
p.map(fn);

At the moment additional properties of functions are not serialized in Parallel but it seems like they could be.
Would this work and be of interest to the project? I'm happy to code it and submit a pull request.
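
An alternative sketch that avoids arguments.callee, which is not available in strict mode. `withEnv` is a hypothetical helper, not part of parallel.js; it embeds the environment as JSON in the function source, so it only works for JSON-serializable values.

```javascript
// Sketch: `withEnv` is a hypothetical helper, not part of parallel.js.
// It builds source text in which the environment is embedded as JSON,
// so the result survives being serialized and re-evaluated elsewhere.
function withEnv(fn, env) {
  return '(function (item) {' +
    ' var env = ' + JSON.stringify(env) + ';' +
    ' return (' + fn.toString() + ')(item, env);' +
    ' })';
}

var addExtra = function (a, env) { return a + env.extraVar; };
var wrappedSource = withEnv(addExtra, { extraVar: 123 });

// Simulate what a worker would do: rebuild the function from its source alone.
var rebuilt = new Function('return ' + wrappedSource + ';')();
var results = [0, 1, 2, 3].map(function (n) { return rebuilt(n); });
```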

Regards,

Access to the workers `.terminate()` API

I'd like to have timeouts, though it seems there is no way to cancel running workers?

ReUse Webworkers

Another feature that would be quite handy would be to reuse the WebWorkers. Currently I'm playing around with digit-calculation of π and that implies the same algorithm used for each digit-index. So it would come in quite handy to add an ability to use an already created Worker with different input-data.