parallel-js / parallel.js
Easy multi-core processing utilities for Node.
License: MIT License
It looks like underscore is only being used to provide utility functions. Some of those functions are already available in ECMAScript 5 (e.g. bind), and the others are not hard to write.
Dropping it would mean more bytes in parallel.js, but a smaller payload when the script is used in the browser.
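As a rough illustration of the point above, here is a minimal sketch of doing without underscore for two common utilities. The partial helper is a hand-written stand-in and is hypothetical; it is not taken from parallel.js.

```javascript
// ES5's Function.prototype.bind covers the _.bind use case directly.
var counter = {
  count: 0,
  increment: function () {
    this.count += 1;
    return this.count;
  }
};
var boundIncrement = counter.increment.bind(counter);

// Hypothetical hand-written stand-in for _.partial.
function partial(fn) {
  var preset = Array.prototype.slice.call(arguments, 1);
  return function () {
    var rest = Array.prototype.slice.call(arguments);
    return fn.apply(this, preset.concat(rest));
  };
}

var addTen = partial(function (a, b) { return a + b; }, 10);

var firstIncrement = boundIncrement(); // `this` stays bound to counter
var sum = addTen(5);
console.log(firstIncrement, sum); // 1 15
```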
The gh-pages branch is currently linked with the master branch because it got branched from it. It also contains the whole parallel branch. To clean it up:
git checkout --orphan gh-pages
Recreate only necessary files.
git push -f
Is that okay with you @adambom?
Take this case:
var p = new Parallel([0, 1, 2, 3, 4, 5, 6, 7, 8]);
function add(d) { return d[0] + d[1]; }
function factorial(n) { return n < 2 ? 1 : n * factorial(n - 1); }
p.require(factorial)
p.map(function (n) { return Math.pow(10, n); }).reduce(add);
Throws 'Uncaught TypeError: Object 216.6666666666 has no method splice', line 258. Looks like we're trying to call splice on a number.
I'm working on docs and I'm going to assume this should be working for the purposes of documentation.
I want to eventually move away from exposing the syntax spawn and mapreduce.
I kind of like the query language used by RethinkDB. It would be nice to have something like this:
var sqr = function (n) { return n * n; },
add = function (a, b) { return a + b; },
log = function (a) { console.log(a); },
data = [10, 20, 30, 40];
var r = Parallel.data(data)
.map(sqr)
.reduce(add)
.fetch(log);
Each step in this chain returns a wrapped object that can be treated like a promise. Data constructs a "distributed array", so to speak, that gets operated on by the functions that follow it.
I could express this equivalently as:
r = Parallel.map(sqr, data).reduce(add).fetch(log);
Notice that in the map call data is passed after the mapping function. This is intentional, so that multiple arguments or a single atomic argument may be passed instead of an array. It will also allow us to partially apply operations to our chained functions:
Parallel.mixin({
square: _.partial(Parallel.map, sqr)
});
r = Parallel.data(data).square().fetch() // returns the values of data, squared
Spawn would basically work the same way:
r = Parallel.data(1, 2, 3).spawn(function (a, b, c) { return a + b + c; });
// equivalently:
r = Parallel.spawn(function (a, b, c) { return a + b + c; }, 1, 2, 3);
Since I just saw 546849f with the updated readme and options, we need to make sure the options have a clear purpose:
path (legacyEvalPath): Will be defined in the node.js environment (since it's needed anyway), but will default to null in browser environments. If the path is set, it will fall back to this file in case of a cross-origin restriction (e.g. IE10). It will therefore merge its meaning with the current ie10shim option.
maxWorkers: Is obvious and will stay this way.
synchronous: If web workers are not available at all (e.g. IE < 10), fall back to synchronous operations (with an initial setTimeout; currently no setTimeouts afterwards, though they may be possible with .map()).
Ping: @adambom - Please add the remaining options to the readme.
It's about time to get a test suite up. I'm partial to Jasmine, but I'm open to other alternatives.
Is this code licensed? Maybe I missed it but I can't find what the code is licensed under if anything at all ;-)
At the moment if the web browser doesn't support workers at all, an error will be thrown.
There should be an option to enable a blocking execution of the operation, or continue throwing an error.
This represents a regression since the rewrite.
This will allow us to run stuff in the cloud as opposed to locally.
The API would introduce an additional method, connect, that takes two arguments: err and conn. conn is an instance of a new class called Connection.
The Connection class implements the spawn and mapreduce interfaces, except that instead of creating web workers, it issues commands to the remote server.
We'd also have to introduce a web server for hosting parallel on a node server somewhere in the cloud. You could start the server using a basic CLI.
I apologize if this is absolutely trivial. I am learning parallel and javascript at the same time. I know it's not recommended. It's a long story. Anyway I thought you may want to know this happens going naively through the parallel web site examples. Detail cmds and errors below. The missing path exists up to parallel.js, but as you can imagine it is not a directory. cwd is /Users/antonio. I am on OS X.
apc-2:~ antonio$ node
var Parallel = require('parallel.js');
undefined
Parallel
{ mapreduce: [Function],
spawn: [Function],
require: { [Function] state: { files: [], funcs: [] } } }
var sqrt = function (n) { return Math.sqrt(n); };
undefined
var r = Parallel.spawn(sqrt, 100);
Error: ENOENT, no such file or directory '/Users/antonio/node_modules/parallel.js/tmp/file.js'
at Object.fs.openSync (fs.js:427:18)
at Object.fs.writeFileSync (fs.js:966:15)
at Object.URL.createObjectURL (/Users/antonio/node_modules/parallel.js/url.js:16:6)
at new RemoteRef (/Users/antonio/node_modules/parallel.js/parallel.js:66:27)
at Object.DistributedProcess.mapper as spawn
at repl:1:18
at REPLServer.self.eval (repl.js:110:21)
at repl.js:249:20
at REPLServer.self.eval (repl.js:122:7)
at Interface. (repl.js:239:12)
I have thousands of chunks of data that arrive on the fly and need to be processed.
If I do this for every chunk:
var p = new Parallel();
p.parallel.spawn(function (input) { .. })
the system slows down until Chrome crashes...
I tried to memoize the instance and change p.data before calling spawn again. That avoided the crashes, but the input is not reliable...
Take memoized fibonacci:
Where we store computed values in closure scope...
var p = new Parallel([0, 1, 2, 3, 4, 5, 6]),
fib = (function () {
var memo = {};
return function (n) {
if (n < 2) {
return 1;
};
memo[n] = memo[n] || fib(n - 1) + fib(n - 2);
return memo[n];
}
})(),
log = function () { console.log(arguments); };
p.map(fib).then(log)
Fails with the error, memo is not defined
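A likely cause, assuming the worker receives the function as source text (for example via toString) rather than as a live closure: the rebuilt function can no longer see memo. The sketch below simulates that round trip with a hypothetical roundTrip helper, then shows a self-contained variant that creates its cache inside the function body.

```javascript
// Hypothetical helper: rebuild a function from its source text, roughly the
// way a worker would see it. Closure variables do not survive the trip.
function roundTrip(fn) {
  return new Function('return (' + fn.toString() + ');')();
}

// Closure-based memoization, as in the report: memo lives outside the function.
var closureFib = (function () {
  var memo = {};
  return function fibInner(n) {
    if (n < 2) { return 1; }
    memo[n] = memo[n] || fibInner(n - 1) + fibInner(n - 2);
    return memo[n];
  };
})();

// Self-contained variant: the cache is created inside the function body.
function selfContainedFib(n) {
  var memo = {};
  function inner(k) {
    if (k < 2) { return 1; }
    if (!(k in memo)) { memo[k] = inner(k - 1) + inner(k - 2); }
    return memo[k];
  }
  return inner(n);
}

var closureError = null;
try {
  roundTrip(closureFib)(5);
} catch (e) {
  closureError = e; // memo is not defined after the round trip
}

var rebuiltResult = roundTrip(selfContainedFib)(6);
console.log(closureError && closureError.name, rebuiltResult); // ReferenceError 13
```

The trade-off is that the cache in the self-contained variant only lives for a single call, so values are not shared between mapped items.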
I just checked the code: require needs a "return this" to make it chainable, which is needed to run the examples on http://adambom.github.io/parallel.js/
If you include a file without passing in an evalPath, then you always get the error:
'Uncaught Error: Can't use required scripts without eval.js!'
While informative, this error is still annoying because most browsers are capable of serializing and importing files correctly. It seems wrong to be dumbing this down to support IE.
Is there any way we could pull this off?
The one-chunk-per-core approach that map reduce uses seems a bit strange. It basically means I end up having to do a reduce in the map as well as in the reduce. It might make sense for the syntax to just have you specify a number of threads and a single array of values.
You could also set it up so that a central process hands out the data and a worker requests more when it is finished, which helps if your data is unevenly sized. The wisdom of that is going to depend on message-passing overheads, but it would also allow you to add data incrementally. I whipped together a gist about what I was trying to say, which quickly got out of control.
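To illustrate the "reduce in the map as well as in the reduce" complaint about one chunk per core, here is a minimal sequential sketch. It assumes each worker receives a whole chunk; splitIntoChunks and the chunk count of 4 are hypothetical, and no workers are actually spawned.

```javascript
// Hypothetical helper: split `items` into at most `count` contiguous chunks.
function splitIntoChunks(items, count) {
  var size = Math.ceil(items.length / count);
  var chunks = [];
  for (var i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

function add(a, b) { return a + b; }
function square(n) { return n * n; }

var values = [1, 2, 3, 4, 5, 6, 7];
var chunks = splitIntoChunks(values, 4); // pretend there are 4 cores

// "map" step: each worker gets a whole chunk, so it has to reduce locally.
var partials = chunks.map(function (chunk) {
  return chunk.map(square).reduce(add, 0);
});

// "reduce" step: combine the per-chunk partial results.
var total = partials.reduce(add, 0);
console.log(chunks.length, total); // 4 140
```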
Hi Adam and Seb!
I appreciate your amazing product. You guys rock.
But I want to discuss one question.
We process a huge amount of data, and the map/reduce interface is a magic gift for me. :)
The code below is a common snippet for image processing:
data
.map(cb)
.map(myMagicFunc)
.then(next);
But sometimes I want to get the data after the first round of processing, map(cb),
so I propose adding a map middleware as a callback function.
data
.map(cb, middleware) // here I can process the first-round data before the second round starts
.map(myMagicFunc)
.then(next);
Please let me know what you think about it and if my idea
@adambom See https://ci.testling.com/#hook.
dacbb08 should make the repo compatible.
I need to execute functions in "parallel" and I use parallel.js:
var p = new Parallel(items);
var fn1 = function (item) {
doSomething(item);
};
p.map(fn1).then(function () {
otherFunction();
});
But IE shows the following error:
[Q] Unhandled rejection reasons (should be empty): (no stack) SecurityError
HTML7007: One or more blob URLs were revoked by closing the blob
for which they were created. These URLs will no longer resolve as
the data backing the URL has been freed.
How can I fix this error?
I reviewed the parallel.js page in IE and all the examples work fine.
I use Durandal, Breeze and Knockout.
Firefox shows the following error:
[Q] Unhandled rejection reasons (should be empty): ["(no stack) [Exception..... location: ""]"]
and Google Chrome shows no error, but parallel.js does not work.
If I do something like:
_.range(10000).map(function(i) {
return new Parallel(data[i]).map(transform);
});
then at the moment, parallel.js will attempt to create 10000*4 workers at once, when it should really only create 4 workers (by default), and delay creating the others until the first lot have finished.
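A minimal sketch of the requested behaviour, assuming each job can be wrapped in a function that returns a promise. createLimiter and simulatedJob are hypothetical names, not part of parallel.js, and timers stand in for real workers.

```javascript
// Hypothetical limiter: at most `max` tasks run at once; the rest wait in a queue.
function createLimiter(max) {
  var active = 0;
  var waiting = [];

  function next() {
    if (active >= max || waiting.length === 0) { return; }
    var job = waiting.shift();
    active += 1;
    Promise.resolve()
      .then(job.task)
      .then(job.resolve, job.reject)
      .then(function () {
        active -= 1;
        next(); // start the next queued task, if any
      });
  }

  return function limit(task) {
    return new Promise(function (resolve, reject) {
      waiting.push({ task: task, resolve: resolve, reject: reject });
      next();
    });
  };
}

// Usage: 20 simulated jobs, but never more than 4 in flight.
var limit = createLimiter(4);
var running = 0;
var peak = 0;

function simulatedJob(i) {
  return function () {
    running += 1;
    peak = Math.max(peak, running);
    return new Promise(function (resolve) {
      setTimeout(function () {
        running -= 1;
        resolve(i * 2);
      }, 5);
    });
  };
}

var jobs = [];
for (var i = 0; i < 20; i += 1) {
  jobs.push(limit(simulatedJob(i)));
}

var allDone = Promise.all(jobs).then(function (results) {
  console.log('peak concurrency:', peak, 'results:', results.length);
  return results;
});
```

In a real setup the task passed to limit would be the place to create a job and return its promise, so that worker creation itself is delayed until a slot is free.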
Very nice library! The examples are really good in terms of how to use the library - I'm just curious about where this has been put to use. Maybe add a wiki page or something for users to add to?
With yeoman installed:
$ npm install -g parallel.js
npm http GET https://registry.npmjs.org/parallel.js
npm http 304 https://registry.npmjs.org/parallel.js
unbuild [email protected]
npm WARN unmet dependency .../homebrew/share/npm/lib/node_modules/yo/node_modules/yeoman-generator requires lodash@'~1.3.0' but will load
npm WARN unmet dependency .../homebrew/share/npm/lib/node_modules/yo/node_modules/lodash,
npm WARN unmet dependency which is version 1.1.1
npm WARN unmet dependency .../homebrew/share/npm/lib/node_modules/yo/node_modules/insight requires async@'~0.1.22' but will load
npm WARN unmet dependency .../homebrew/share/npm/lib/node_modules/yo/node_modules/async,
npm WARN unmet dependency which is version 0.2.9
npm WARN unmet dependency .../homebrew/share/npm/lib/node_modules/yo/node_modules/insight requires lodash@'~1.0.0-rc.2' but will load
npm WARN unmet dependency .../homebrew/share/npm/lib/node_modules/yo/node_modules/lodash,
npm WARN unmet dependency which is version 1.1.1
Removing yeoman (and bower):
$ npm uninstall -g yo
unbuild [email protected]
$ npm uninstall -g yeoman-generator
unbuild [email protected]
$ npm uninstall -g bower
unbuild [email protected]
$ npm install -g parallel.js
npm http GET https://registry.npmjs.org/parallel.js
npm http 304 https://registry.npmjs.org/parallel.js
unbuild [email protected]
[email protected] .../homebrew/share/npm/lib/node_modules/parallel.js
Ok, good.
Yet still can't require parallel.js:
$ echo 'var p = require("parallel.js");' | node
module.js:340
throw err;
^
Error: Cannot find module 'parallel.js'
at Function.Module._resolveFilename (module.js:338:15)
at Function.Module._load (module.js:280:25)
at Module.require (module.js:364:17)
at require (module.js:380:17)
at [stdin]:1:9
at Object.<anonymous> ([stdin]-wrapper:6:22)
at Module._compile (module.js:456:26)
at evalScript (node.js:532:25)
at Socket.<anonymous> (node.js:154:11)
at Socket.EventEmitter.emit (events.js:117:20)
I've even tried a fresh homebrew [email protected] install and same require failure.
When we require a script into a web worker and map any function, Parallel tells us to set evalPath so that the mapping function can be converted to a string and re-evaluated. No doubt that's a good idea, but it is a problem for Chrome packaged apps (ChromeOS). Please see http://developer.chrome.com/extensions/contentSecurityPolicy.
As I see it, the main reason for using eval.js is the cross-domain policy for web workers in IE 10. So we propose using eval.js only for browsers that need it and using blobs for all modern browsers. We can prepare a fix for it or continue the discussion.
try {
var blob = new Blob([src], { type: 'text/javascript' });
var url = URL.createObjectURL(blob);
wrk = new Worker(url);
} catch (e) {
if (this.options.evalPath !== null) { // blob/url unsupported, cross-origin error
wrk = new Worker(this.options.evalPath);
wrk.postMessage(src);
} else {
throw e;
}
}
}
The code above works perfectly for a Chrome app, so a 'browser detection' fix would be a quick solution. Thanks IE!
The title might be a bit misleading, if someone has a better explanation I'll replace it.
What I mean is the following:
You might have a basic input dataset, for example a base image in a pixel format; the input dataset.
Afterwards you modify this input dataset with the worker and its arguments. You have the ability to call the worker multiple times with different arguments.
After you're done with the calculations you can retrieve the input dataset.
Let me explain with a bit of code:
var inputData = new Uint8Array([1,2,3]);
var worker = Parallel.prepare(inputData, function(input, arg) {
for (var i=0; i < input.length; ++i) {
input[i] += arg;
}
});
worker.work(1);
worker.work(5);
worker.fetchInput(function(input) {
console.log(input); // Outputs: Uint8Array([7, 8, 9])
});
When I try to use paralleljs in browserify with the following code:
// test.js
var Parallel = require('paralleljs');
var p = new Parallel('forwards');
p.spawn(function (data) {
data = data.split('').reverse().join('');
console.log(data); // logs sdrawrof
return data;
}).then(function (data) {
console.log(data) // logs sdrawrof
});
And then I compile it:
$ npm install -g browserify
$ browserify test.js > bundle.js
And then I include it in a simple HTML page:
<!DOCTYPE html>
<script src="bundle.js"></script>
I get the following errors in Chrome:
Uncaught Error: Cannot find module '/node_modules/paralleljs/lib/Worker.js' bundle.js:1
s bundle.js:1
(anonymous function) bundle.js:1
(anonymous function) bundle.js:54
os bundle.js:358
s bundle.js:1
(anonymous function) bundle.js:1
paralleljs bundle.js:360
s bundle.js:1
e bundle.js:1
(anonymous function)
I know you can just include paralleljs directly as a script tag, but it would be nice to use this within browserify.
I want to make this work locally, running in either the browser or node.
Making it run in node is simple: we need to create a package and simply require phantom.js. I think this will be useful for most users who want to do parallel computing but don't really want to fire up a web page to do it.
We are still missing "good" documentation for the rewrite-branch. I'm absolutely terrible at writing docs.
@adambom Could you take over this issue, or are you AWOL again?
I have the following function:
var dateranges = ["01/01/2013", "02/02/2013"];
var p = new Parallel(dateranges, {evalPath: "_lib/eval.js" });
p.require("_lib/jquery.js");
p.map(worker).then(function (data){ console.log(data); });
and the following worker:
worker: function(daterange)
{
return $.get("csp/acb/CAC.wsJSON.cls");
}
this produces the error: Uncaught ReferenceError: window is not defined
I'm used to formulating reduce jobs like so:
[1, 2, 3].reduce(function (prev, curr, index, arr) { return prev + curr }, 0);
See the MDN docs: https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Array/Reduce
The way we're doing it now is passing in a big lump of data every time.
I'd like to change this if it's ok with you @Sebmaster
Is it possible to use parallel.js with functions that take a callback to deliver their results?
It would be really useful to post an Ajax example in the documentation. I am having a hard time implementing it: the examples on the web do not use parallel.js, and I have not been able to adapt them to parallel.js.
I am especially interested in using it to make simultaneous AJAX calls using map
Thanks.
Hi, this gave me a headache.
I managed to create a race condition where the callback given to the fetch method was never called.
I finally found a possible reason: there is a chance that a spawned process is reading ./tmp/file.js while it is being written again. In the worst case it reads the file just between it being cleared and rewritten, so it reads an empty file and does nothing at all.
My solution would be to create individual files and delete them again on calls to either fetch or terminate.
Do you prefer a pull request, if so based on which branch?
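The "individual files" idea could look roughly like the sketch below. It assumes Node's standard fs, os, path and crypto modules; writeWorkerSource and removeWorkerSource are hypothetical names, and the file naming scheme is only an illustration.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');
var crypto = require('crypto');

// Hypothetical helper: write worker source to its own uniquely named file,
// so two spawns never read or overwrite each other's code.
function writeWorkerSource(source) {
  var name = 'parallel-' + process.pid + '-' +
    crypto.randomBytes(8).toString('hex') + '.js';
  var file = path.join(os.tmpdir(), name);
  fs.writeFileSync(file, source);
  return file;
}

// Hypothetical cleanup, intended to be called from fetch or terminate.
function removeWorkerSource(file) {
  if (fs.existsSync(file)) {
    fs.unlinkSync(file);
  }
}

var fileA = writeWorkerSource('module.exports = 1;');
var fileB = writeWorkerSource('module.exports = 2;');
var contentA = fs.readFileSync(fileA, 'utf8');
var contentB = fs.readFileSync(fileB, 'utf8');

removeWorkerSource(fileA);
removeWorkerSource(fileB);
console.log(fileA !== fileB, contentA, contentB);
```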
so take the map and reduce functions:
var mapFunc = function (w){
var out = {};
var letters = w.toLowerCase().split("");
letters.forEach(function(l){
if(l in out){
out[l]++;
}else{
out[l]=1;
}
});
return out;
},
reduce1 = function (a,b){
var out = {};
[a,b].forEach(function(c){
for(var key in c){
if(key in out){
out[key]+=c[key];
}else{
out[key]=c[key];
}
}
});
return out;
},
reduce2 = function (a,b){
for(var key in b){
if(key in a){
a[key]+=b[key];
}else{
a[key]=b[key];
}
}
return a;
}
reduce1 works fine no matter how you call it
var d = Parallel.mapreduce(mapFunc , reduce1, ["ab", "ca", "bc"],function (result) { console.log("1"+JSON.stringify(result));});
//prints 1{"a":2,"b":2,"c":2}
d.fetch(function (result) {console.log("2"+JSON.stringify(result));});
//prints 2{"a":2,"b":2,"c":2}
reduce2, on the other hand, returns the wrong answer, but only when called through fetch, not when returned through a callback:
var d = Parallel.mapreduce(mapFunc , reduce2, ["ab", "ca", "bc"],function (result) { console.log("1"+JSON.stringify(result));});
//prints 1{"a":2,"b":2,"c":2}
d.fetch(function (result) {console.log("2"+JSON.stringify(result));});
//prints 2{"a":3,"b":3,"c":4}
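One possible explanation, offered as an assumption rather than a confirmed diagnosis: reduce2 mutates its first argument, so if the mapped results are kept and the reduce step runs a second time for fetch, the first mapped object already holds the totals from the first pass. The sketch below reproduces the reported numbers with a plain array and no workers.

```javascript
// Mapped results, as mapFunc would produce them for ["ab", "ca", "bc"].
var mapped = [{ a: 1, b: 1 }, { c: 1, a: 1 }, { b: 1, c: 1 }];

// Same logic as reduce2: merges b into a and returns a (mutating it).
function mutatingReduce(a, b) {
  for (var key in b) {
    if (key in a) {
      a[key] += b[key];
    } else {
      a[key] = b[key];
    }
  }
  return a;
}

// First pass accumulates into mapped[0] itself.
var firstPass = JSON.stringify(mapped.reduce(mutatingReduce));
// Second pass starts from the already-accumulated mapped[0].
var secondPass = JSON.stringify(mapped.reduce(mutatingReduce));

console.log(firstPass);  // {"a":2,"b":2,"c":2}
console.log(secondPass); // {"a":3,"b":3,"c":4}
```

A reducer that builds a fresh object, like reduce1, is unaffected because it never writes into its inputs.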
It would be cool if Parallel#spawn returned a promise object.
EDIT:
Or give RemoteReference a .promise() method.
IE10 throws a SecurityError when trying to instantiate a Worker from a blob URL.
According to this (http://stackoverflow.com/questions/10343913/how-to-create-a-web-worker-from-a-string) SO question, you have to use a wrapper script which basically calls eval on the input string.
Let's say I'm working with a file at http://localhost/foo/bar/index.html
And I do:
Parallel.require('myfile.js')
It should be smart enough to look in http://localhost/foo/bar/myfile.js
Instead of http://localhost/myfile.js
It would also be nice if we could support relative url commands, like '..' and '.'
So I could do
Parallel.require('../resources/myfile.js')
And have it point to: http://localhost/foo/resources/myfile.js
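A minimal sketch of the resolution rule described above, assuming the standard URL constructor is available (it is in current browsers and Node, but not in the older browsers discussed elsewhere in these issues). resolveScript is a hypothetical helper name.

```javascript
// Hypothetical helper: resolve a required script relative to the page URL.
function resolveScript(relativePath, pageUrl) {
  return new URL(relativePath, pageUrl).href;
}

var page = 'http://localhost/foo/bar/index.html';

var sibling = resolveScript('myfile.js', page);
var viaDot = resolveScript('./myfile.js', page);
var viaParent = resolveScript('../resources/myfile.js', page);

console.log(sibling);   // http://localhost/foo/bar/myfile.js
console.log(viaDot);    // http://localhost/foo/bar/myfile.js
console.log(viaParent); // http://localhost/foo/resources/myfile.js
```

In the browser the second argument would typically be the current page location.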
Hello,
It looks like the examples on the demonstration page http://adambom.github.com/parallel.js/ do not work under Chrome and Safari.
Chrome error: Uncaught ReferenceError: URL is not defined
var RemoteRef = function (fn) {
var str = wrap(fn),
blob = new Blob([str], { type: 'text/javascript' }),
>>> url = URL.createObjectURL(blob),
...
Safari error: TypeError: '[object BlobConstructor]' is not a constructor (evaluating 'new Blob([str], { type: 'text/javascript' })')
var RemoteRef = function (fn) {
var str = wrap(fn),
>>> blob = new Blob([str], { type: 'text/javascript' }),
url = URL.createObjectURL(blob),
...
I hope it's easy to fix :)
In order to be useful in the wild, parallel.js must gracefully degrade to single-threaded / blocking code when Worker in window is false.
Shouldn't have to pass an array of arguments
Introduce a require method on remote references that allows you to "require" libraries in your workers.
The following code:
for (i = 0; i < this.requiredFunctions.length; ++i) {
if (this.requiredFunctions[i].name) {
preStr += 'var ' + this.requiredFunctions[i].name + ' = ' + this.requiredFunctions[i].fn.toString() + ';';
} else {
preStr += this.requiredFunctions[i].fn.toString();
}
}
...will die with "TypeError: Cannot call method 'toString' of undefined" if one of the functions passed into require() is not defined properly. Please detect this condition and report it in a more usable manner.
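A check along these lines could run before the loop above. This is only a sketch: assertRequiredFunctions is a hypothetical name, and the entry shape ({ name, fn }) is inferred from the snippet in this report.

```javascript
// Hypothetical check to run before building the worker preamble.
function assertRequiredFunctions(requiredFunctions) {
  requiredFunctions.forEach(function (entry, index) {
    if (!entry || typeof entry.fn !== 'function') {
      var label = entry && entry.name
        ? '"' + entry.name + '"'
        : 'at index ' + index;
      throw new TypeError(
        'require(): entry ' + label + ' is not a function (got ' +
        typeof (entry && entry.fn) + ')'
      );
    }
  });
}

var validationMessage = null;
try {
  assertRequiredFunctions([
    { name: 'ok', fn: function () { return 1; } },
    { name: 'broken', fn: undefined }
  ]);
} catch (e) {
  validationMessage = e.message;
}
console.log(validationMessage);
```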
Is there any way to add self.close(); to the worker?
var d = [1, 2, 3, 4],
sqr = function (n) { return n * n; },
add = function (a, b) { return a + b; },
log = function () { console.log(arguments); };
Parallel(d).map(sqr).map(sqr).reduce(add).then(log);
// logs 30, should log 354
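For reference, the same chain on a plain array (no workers) confirms the expected value. It also shows that 30 is what a single map step produces, which suggests, though nothing here confirms it, that one of the two map calls is being dropped.

```javascript
var d = [1, 2, 3, 4];
function sqr(n) { return n * n; }
function add(a, b) { return a + b; }

var expected = d.map(sqr).map(sqr).reduce(add); // 1 + 16 + 81 + 256
var singleMap = d.map(sqr).reduce(add);         // 1 + 4 + 9 + 16

console.log(expected, singleMap); // 354 30
```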
The CommonJS Promises/A spec should guarantee compatibility with all major promise libraries.
I'm thinking it might be best to have duplicates of all our API tests, but done with Q. This might be the best solution for now.
I don't see any tests for require. Am I missing something?
I'm not sure if the examples were only tested in node but trying in a browser throws RTEs. Can someone confirm this?
<!DOCTYPE html>
<html>
<head>
<title>Parallel.js Test</title>
<script type="text/javascript" src="../lib/paralleljs/lib/parallel.js"></script>
<script type="text/javascript">
var p = new Parallel('forwards');
// Spawn a remote job (we'll see more on how to use then later)
p.spawn(function (data) {
data = data.reverse();
console.log(data); // logs sdrawrof
return data;
}).then(function (data) {
console.log(data) // logs sdrawrof
});
</script>
</head>
<body>
<section id="content">...</section>
</body>
</html>
The JS is exactly from the examples but I get this error:
Uncaught TypeError: Object forwards has no method 'reverse'
If I move the console.log in the anonymous spawn method, it throws this error:
Uncaught ReferenceError: console is not defined
Changing the spawn to the following works like a charm though:
p.spawn(function (data) {
data = data.split("").reverse().join("");
return data;
}).then(function (data) {
console.log(data) // logs sdrawrof
});
Hey.
I do a heavy operation in a worker method and want to notify the caller about the calculation progress. With plain Web Workers I can send progress objects via postMessage.
Is there any way to send progress events from worker using Parallel?
Hi,
I'm looking for a way to pass additional variables to functions that are used in map. Since you can't pass a closure to map that traps a variable, I was thinking of passing the variable using:
var fn = function() { /*...*/ }
fn.extraVar = 123;
Then inside the function I could access the variable using arguments:
var fn = function(a) {
var extraVar = arguments.callee.extraVar;
return a + extraVar;
}
var p = new Parallel([0, 1, 2, 3]);
p.map(fn);
At the moment additional properties of functions are not serialized in Parallel but it seems like they could be.
Would this work and be of interest to the project? I'm happy to code it and submit a pull request.
Regards,
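A rough sketch of how such serialization might work, assuming functions travel to the worker as source text and that the extra properties are JSON-serializable. serializeWithProps is a hypothetical name, and the function refers to itself by name because arguments.callee is not available in strict mode.

```javascript
// Hypothetical serializer: emit source that recreates `fn` and re-attaches
// its own enumerable, JSON-serializable properties.
function serializeWithProps(fn) {
  var props = {};
  Object.keys(fn).forEach(function (key) {
    props[key] = fn[key];
  });
  return '(function () {' +
    'var f = ' + fn.toString() + ';' +
    'var props = ' + JSON.stringify(props) + ';' +
    'for (var k in props) { f[k] = props[k]; }' +
    'return f;' +
    '})()';
}

var addExtra = function addExtra(a) {
  return a + addExtra.extraVar; // named reference instead of arguments.callee
};
addExtra.extraVar = 123;

// Simulate the worker side by evaluating the generated source.
var rebuilt = new Function('return ' + serializeWithProps(addExtra) + ';')();
var mappedValues = [0, 1, 2, 3].map(function (n) { return rebuilt(n); });

console.log(mappedValues); // [ 123, 124, 125, 126 ]
```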
I'd like to have timeouts, though it seems there is no way to cancel running workers?
Another feature that would be quite handy would be to reuse the Web Workers. Currently I'm playing around with digit calculation of π, which uses the same algorithm for each digit index. So it would come in quite handy to be able to use an already created Worker with different input data.