michareiser / parallel.es
Parallelize your JavaScript Applications with Ease
Home Page: https://michareiser.github.io/parallel.es/
License: MIT License
This requires #55 and describes a potential approach.
The idea is that recursive functions like QuickSort can be implemented with manual or automatic task scheduling, similar to what the TPL supports (ContinueWith).
This requires that a task executed in a worker can schedule tasks itself. Requiring the parallel library inside the worker is possible; however, each worker then uses its own thread pool that spawns new workers...
To avoid this, the idea is to centralize the thread pool state.
A potential solution is to use SharedWorkers instead (if supported), allowing communication between the workers. A shared worker can be identified by a name. To maintain the worker-state information (needed by the thread pool), a SharedArrayBuffer can be used (again, if supported). This buffer records which workers are currently idle and which are not (and, if work stealing is used, potentially how many tasks are queued per worker).
If SharedArrayBuffers are not supported, a dedicated main worker is needed that plays the role of a master / coordinator and is solely responsible for telling a worker which worker is free next (if any).
If neither technology is supported, recursion might still be possible, but it always requires a round trip back to the main thread.
This enables continuation support, recursion support, and potentially a work-stealing thread pool.
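The shared-buffer bookkeeping described above can be sketched as follows. This is only an illustration of the idea, not the library's design: the layout (one Int32 slot per worker, 0 = idle, 1 = busy) and the function names are assumptions.

```javascript
// Sketch: per-worker idle state kept in a SharedArrayBuffer so that a worker
// can claim a free peer without a round trip to the main thread.
// Layout assumption: one Int32 slot per worker, 0 = idle, 1 = busy.
const WORKER_COUNT = 4;
const state = new Int32Array(
  new SharedArrayBuffer(WORKER_COUNT * Int32Array.BYTES_PER_ELEMENT)
);

// Atomically claim the first idle worker; returns its index or -1 if none is free.
function claimIdleWorker() {
  for (let i = 0; i < WORKER_COUNT; i++) {
    // compareExchange returns the previous value; 0 means the slot was idle
    // and this caller has now marked it busy.
    if (Atomics.compareExchange(state, i, 0, 1) === 0) {
      return i;
    }
  }
  return -1;
}

// Mark a worker as idle again once its task completes.
function releaseWorker(index) {
  Atomics.store(state, index, 0);
}
```

Because the claim is a single compare-and-exchange, two workers racing for the same slot cannot both succeed, which is exactly the property the coordinator-worker fallback would otherwise have to provide.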
Iteratee functions passed to parallel should be extracted and registered as static functions.
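A minimal sketch of what such extraction could produce. The registry shape and the names `registerStaticFunction` / `getStaticFunction` are hypothetical, chosen only to illustrate the idea of shipping a serializable id instead of the function itself:

```javascript
// Hypothetical sketch of the rewriter's output: the inline iteratee is hoisted
// into a registry keyed by a stable id, and the call site only ships the id.
const staticFunctions = new Map();

function registerStaticFunction(id, fn) {
  staticFunctions.set(id, fn);
  return { functionId: id }; // serializable reference sent to the worker
}

function getStaticFunction(ref) {
  return staticFunctions.get(ref.functionId);
}

// Before rewriting: parallel.range(0, 10).map(value => value * value)
// After rewriting, the iteratee is registered once at module load:
const squareRef = registerStaticFunction('static:square', value => value * value);
```

The worker side would run the same registration code at startup, so looking up `functionId` yields the identical function without serializing any closure state.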
I've got an example repo where I'm trying to get some simple examples working. I'm using webpack in a Node environment. I would like to include lodash functions in some of the parallel processing, but right now I'm just trying to get the simplest example working.
Here is the function:
https://github.com/jefffriesen/parallel-es-node-example/blob/master/src/parallel-range.ts
Here is the error:
TypeError: __WEBPACK_IMPORTED_MODULE_0_parallel_es__.range is not a function
at Object.<anonymous> (/Users/jeffers/git/jefffriesen/parallel-es-node-example/dist/parallel-range.js:86:51)
at __webpack_require__ (/Users/jeffers/git/jefffriesen/parallel-es-node-example/dist/parallel-range.js:20:30)
at /Users/jeffers/git/jefffriesen/parallel-es-node-example/dist/parallel-range.js:66:18
at Object.<anonymous> (/Users/jeffers/git/jefffriesen/parallel-es-node-example/dist/parallel-range.js:69:10)
at Module._compile (module.js:571:32)
at Object.Module._extensions..js (module.js:580:10)
at Module.load (module.js:488:32)
at tryModuleLoad (module.js:447:12)
at Function.Module._load (module.js:439:3)
at Module.runMain (module.js:605:10)
parallel is importing something. I can see it here:
Any suggestions?
I'm trying to get the basic example in the readme working. I've tried on Node.js and in the browser. I thought this might be a regression, so I tried both versions 0.1.18 and 0.1.17 without getting correct results.
I'm not using ParallelEsPlugin. I've tried the version of this function without the .subscribe as well, with the same results.
const parallel = require('parallel-es')
parallel
.range(0, 10)
.map(value => value * value)
.subscribe((subresult, taskIndex) =>
console.log(`The result of the task ${taskIndex} is`, subresult)
)
.then(result => console.log(result))
error:
GET http://localhost:5000/worker-slave.parallel.js 404 (Not Found)
t.exports @ browser-commonjs.parallel.js:2
value @ browser-commonjs.parallel.js:1
Node.js version 7.10, no webpack or transpilation. I've tried the version of this function without the .subscribe as well, with the same results.
// You mentioned default on the require. I still had to use default for v0.1.18 for parallel.range to be found
const parallel = require('parallel-es').default
parallel
.range(0, 10)
.map(value => value * value)
.subscribe((subresult, taskIndex) =>
console.log(`The result of the task ${taskIndex} is`, subresult)
)
.then(result => console.log(result))
result
The result of the task 0 is [ null ]
The result of the task 8 is [ null ]
The result of the task 5 is [ null ]
The result of the task 9 is [ null ]
The result of the task 1 is [ null ]
The result of the task 4 is [ null ]
The result of the task 6 is [ null ]
The result of the task 7 is [ null ]
The result of the task 2 is [ null ]
The result of the task 3 is [ null ]
[ null, null, null, null, null, null, null, null, null, null ]
I spent quite a bit of time rewriting my Node.js scripts to take advantage of the parallelization in this library. There were lots of loops I could try to optimize, but for simplicity I focused on the most expensive one. While I was rewriting, I kept reminding myself that this is what your Webpack plugin does, and appreciating the brilliance of Webpack itself (what would have been brilliant was me just getting your Webpack plugin to work with Node.js). Sometime during that rewrite I found this 2012 article by Paul Graham that was re-posted to Hacker News:
http://www.paulgraham.com/ambitious.html
Frighteningly Ambitious Startup Ideas
6) Bring Back Moore's Law
The last 10 years have reminded us what Moore's Law actually says. Till about 2002 you could safely misinterpret it as promising that clock speeds would double every 18 months. Actually what it says is that circuit densities will double every 18 months. It used to seem pedantic to point that out. Not any more. Intel can no longer give us faster CPUs, just more of them.
This Moore's Law is not as good as the old one. Moore's Law used to mean that if your software was slow, all you had to do was wait, and the inexorable progress of hardware would solve your problems. Now if your software is slow you have to rewrite it to do more things in parallel, which is a lot more work than waiting.
It would be great if a startup could give us something of the old Moore's Law back, by writing software that could make a large number of CPUs look to the developer like one very fast CPU. There are several ways to approach this problem. The most ambitious is to try to do it automatically: to write a compiler that will parallelize our code for us. There's a name for this compiler, the sufficiently smart compiler, and it is a byword for impossibility. But is it really impossible? Is there no configuration of the bits in memory of a present day computer that is this compiler? If you really think so, you should try to prove it, because that would be an interesting result. And if it's not impossible but simply very hard, it might be worth trying to write it. The expected value would be high even if the chance of succeeding was low.
It got me thinking: why did I have to pick only one loop to optimize? Why isn't the computer (or Webpack) doing this for me? Related to this is choosing the optimal number of threads:
Therefore, the idea is to use a good default that you can override if it is unsuitable.
One could override it using trial and error based on heuristics (like the ones you linked to). This could, in theory, be done by the computer. Later, you even said what I'd been thinking in the back of my mind:
In general, parallelizing might help. However, I would first start by profiling your application to identify any bottlenecks.
What I'm getting at is this:
Let's say we can profile the code. We can track how long each function takes, whether it's parallelizable (like mapping over collections), and whether those functions are mostly IO- or CPU-bound.
Then, based on the CPU and memory available, a Webpack plugin could go into the code and rewrite functions to use parallel-es. It would ignore simple, fast functions, because the up-front cost of parallelizing them is greater than running them single-threaded. It might optimize just on heuristics, or it might run multiple times to find the best solution (like the automated machine learning resource I posted before). Or a combination of both.
The Webpack plugin would likely need to output a config file. If the optimization config file is found the next time it runs, it uses that; otherwise it does the more expensive optimization step.
Optimizing based on hardware has a lot of advantages in a Node.js environment where the hardware is known. But there may still be a lot to gain in a browser-based environment. Maybe in most cases we can count on having 2-4 cores with graceful fallbacks. It's something to explore.
Profiling: There are quite a few profiling solutions on npm, but maybe the best tool in our toolbox is the Chrome Inspector. The detail of the data in that tool is incredible, and it's exportable (at least manually). Maybe it could be run and captured programmatically.
I realize this is a pretty ambitious project. It's ambitious like in Paul Graham's essay. But Prepack is an ambitious project that has significant implications for web and Node.js performance, and Speedy.js also seems like an ambitious project. This seems, if it can work, worth it.
I'd be interested in your thoughts on this approach. Is it a good idea, or even possible? (independent of whether you have the time or interest to take it on)
Thanks.
It's hard to determine the end of a parallel chain in the static code rewriter. This makes it difficult to call inEnvironment at the appropriate time. Besides, user calls to inEnvironment may conflict with calls added by the rewriter, resulting in ill-defined parallel chains --- which leads to runtime errors.
Therefore, the API should be redefined to guarantee safe usage of inEnvironment together with the static code rewriter.
In my quest to parallelize some of my long-running functions, I'm figuring out some of this API. I'm on Node, so the webpack compiling of outer variables into functions isn't available.
I was hoping I could do something like I can do in lodash. Here is a working lodash example:
function formatAddresses(zip, address) {
const {num, street, city} = address
return `${num} ${street} ${city} ${zip}`
}
const addresses = [
{num: '123', street: 'Main St.', city: 'Boulder'},
{num: '555', street: 'Elm St.', city: 'Boulder'},
{num: '100', street: '10th Ave.', city: 'Boulder'},
]
const zip = '80306'
const lodashAddresses = _(addresses)
.map(formatAddresses.bind(null, zip))
.value()
console.log('lodashAddresses: ', lodashAddresses)
// [ '123 Main St. Boulder 80306',
// '555 Elm St. Boulder 80306',
// '100 10th Ave. Boulder 80306' ]
Applying this approach to parallel-es does not work:
parallel.from(addresses)
.map(formatAddresses.bind(null, zip))
.then(result => console.log(result))
This throws an error:
node-slave.parallel.js:1
(function (exports, require, module, __filename, __dirname) { !function(t,n){"object"==typeof exports&&"object"==typeof module?module.exports=n(require("process")):"function"==typeof define&&define.amd?define(["process"],n):"object"==typeof exports?exports["parallel-es"]=n(require("process")):t["parallel-es"]=n(t.process)}(this,function(t){return function(t){function n(r){if(e[r])return e[r].exports;var o=e[r]={i:r,l:!1,exports:{}};return t[r].call(o.exports,o,o.exports,n),o.l=!0,o.exports}var e={};return n.m=t,n.c=e,n.i=function(t){return t},n.d=function(t,e,r){n.o(t,e)||Object.defineProperty(t,e,{configurable:!1,enumerable:!0,get:r})},n.n=function(t){var e=t&&t.__esModule?function(){return t.default}:function(){return t};return n.d(e,"a",e),e},n.o=function(t,n){return Object.prototype.hasOwnProperty.call(t,n)},n.p="",n(n.s=173)}([function(t,n,e){"use strict";n.__esModule=!0,n.default=function(t,n){if(!(t instanceof n))th
SyntaxError: Unexpected identifier
at Function (<anonymous>)
at t.value (/Users/jeffers/git/jefffriesen/parallel-es-node-example/dist/node-slave.parallel.js:1:24998)
at t.value (/Users/jeffers/git/jefffriesen/parallel-es-node-example/dist/node-slave.parallel.js:1:24541)
at n.value (/Users/jeffers/git/jefffriesen/parallel-es-node-example/dist/node-slave.parallel.js:1:30793)
at n.value (/Users/jeffers/git/jefffriesen/parallel-es-node-example/dist/node-slave.parallel.js:1:28831)
at emitTwo (events.js:106:13)
at process.emit (events.js:194:7)
at process.nextTick (internal/child_process.js:766:12)
at _combinedTickCallback (internal/process/next_tick.js:73:7)
at process._tickCallback (internal/process/next_tick.js:104:9)
I realize this may not be allowed. I'm curious whether you think it's a good idea? It follows a known JavaScript approach for those cases where you can't or don't want to use webpack.
Thanks
SharedArrayBuffer adds support for shared memory in JavaScript. This could improve performance compared to message passing.
Add support for transferables.
The maxConcurrencyLevel used by the thread pool cannot be changed. Expose an API that allows changing the concurrency level --- potentially via parallel.defaultOptions().
To Consider
Allow scheduling tasks from a web worker thread
Add cancellation support to tasks. This allows canceling any pending tasks that have not yet been scheduled.
Tasks should be canceled if any promise in the PromiseStream has failed.
If the static transpiler is used, any variables accessed (read-only) should be registered and provided in the environment. This also requires that globals are provided and extracted from the environment.
Set up the project with Flow and ESLint
The current implementation only supports browsers, using WebWorkers. To leverage code reusability, a worker backend for Node.js should be implemented using the same API.
Existing libraries like lodash or underscore pass the index of the current element to the iteratee. This information might be useful.
However, it is hard to implement, as the absolute index cannot safely be determined, e.g. in the following case:
const values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
parallel
.collection(values)
.filter(value => value > 5)
.map((value, index) => value * index)
.value()
.then(result => console.log(result));
The expected result is [6*0, 7*1, 8*2, 9*3, 10*4] = [0, 7, 16, 27, 40].
The difficulty in this example is that each slice runs independently. The second slice doesn't notice how many of the preceding elements have been filtered out and therefore doesn't adjust the indexes it passes to the iteratee.
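A small simulation makes the problem concrete. Splitting the array into two slices (the split point is chosen here for illustration) and filtering/mapping each slice independently produces indexes that restart at 0 per slice, which differs from a single sequential run:

```javascript
// Each slice filters and maps independently, so the index passed to the
// iteratee restarts at 0 for every slice.
function processSlice(slice) {
  return slice
    .filter(value => value > 5)
    .map((value, index) => value * index); // index is local to the slice
}

const sliceA = [1, 2, 3, 4, 5, 6]; // only 6 survives the filter, index 0
const sliceB = [7, 8, 9, 10];      // all survive, but indexes restart at 0
const perSlice = [...processSlice(sliceA), ...processSlice(sliceB)];
// perSlice is [0, 0, 8, 18, 30]

// A single-threaded run over the whole array yields the expected indexes:
const sequential = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  .filter(value => value > 5)
  .map((value, index) => value * index);
// sequential is [0, 7, 16, 27, 40]
```

Fixing this would require each slice to know how many elements survived the filter in all earlier slices before its map runs, which is exactly the cross-slice coordination the slices currently avoid.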
There are two solutions.
Currently not that important, therefore suspended until it comes up as a requirement.
Integrate typedoc into the production build and update missing documentation.
This is needed to discuss the api design.
Our stack is built on Node.js, so naturally we reach for it for any data-processing tasks we run into. However, we're starting to go beyond what Node is typically good at. We have a monster processing task that takes about 6 days (500k records with 1 second of compute per record). This is all CPU and memory work --- almost no network or I/O. Everything is written functionally with lodash map, reduce, and filter, so it's a good candidate to run in parallel.
I'm fairly inexperienced with threads vs. CPUs, concurrency, and parallelism, but I've read a decent amount. People in the Java world talk about throwing 2,500 threads at a problem and reducing a job from a day of processing to seconds. I'm not sure what Node.js can do, but I'm curious.
Based on reading your source code and paper, it seems that by default you spawn one thread per CPU core. Is that a limitation of Node, of your library, or is it not limited? Could I map over a function using 1,000 threads with this library?
Thanks.
The coverage report is created over the transpiled ES5 code. This is misleading. Therefore, coverage should apply the source maps and report over the TypeScript code.
Document the API
Instead of an explicit start of the parallel processing, delay the processing until the next tick (or until an explicit call to then, catch, or subscribe). This bundles as many calls as possible for the moment, but does not delay execution until then is explicitly called.
Potentially add a configuration option that disables this behavior and waits for an explicit then, catch, or subscribe call.
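The proposed scheduling behavior could look roughly like the following sketch. This is illustrative only (not the library's implementation); `lazyChain` is an invented name, and Node's `process.nextTick` stands in for whatever deferral mechanism the library would actually use:

```javascript
// Sketch: chain calls only record operations; execution is scheduled on the
// next tick, but an explicit .then() can trigger it earlier. Scheduling on
// nextTick lets all operations added in the current synchronous run be
// bundled into one task submission.
function lazyChain(data) {
  const ops = [];
  let resultPromise = null;

  function run() {
    if (!resultPromise) {
      // All recorded operations are applied in one go.
      resultPromise = Promise.resolve(ops.reduce((acc, op) => op(acc), data));
    }
    return resultPromise;
  }

  // Without an explicit .then(), execution still starts on the next tick.
  process.nextTick(run);

  return {
    map(fn) { ops.push(arr => arr.map(fn)); return this; },
    then(onFulfilled, onRejected) { return run().then(onFulfilled, onRejected); },
  };
}
```

Because `run()` memoizes its promise, the nextTick callback and an explicit `.then()` can both fire without executing the chain twice.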
As the parallelized task might take a while to complete, displaying sub-results may help entertain the user in the meantime. Therefore, the UI thread should have access to the sub-results created by the web workers.
Add a streaming API that allows subscribing to sub-results.
Suggested API.
Parallel.filter, map, and so on return a ParallelStream: a PromiseLike object that also exposes a subscribe(onNext, onError, onComplete) method, similar to an RxJS Observable. The onNext handler is invoked for every resolved element, not strictly in order.
reduce only returns a promise-like object, as reduce is itself a streaming operation and therefore only the end result is of interest.
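A minimal sketch of how such a ParallelStream could behave (illustrative only; `parallelStream` and its internals are assumptions, with plain promises standing in for worker tasks):

```javascript
// Sketch: a PromiseLike stream over per-task promises. subscribe() delivers
// each sub-result as its task resolves (not strictly in task order), while
// then() waits for all tasks, mirroring the proposed API shape.
function parallelStream(taskPromises) {
  const subscribers = [];
  taskPromises.forEach((taskPromise, taskIndex) =>
    taskPromise.then(subResult =>
      subscribers.forEach(s => s.onNext(subResult, taskIndex))));
  const all = Promise.all(taskPromises);
  return {
    subscribe(onNext, onError, onComplete) {
      subscribers.push({ onNext });
      if (onComplete) all.then(onComplete, onError);
      return this; // chainable, so .subscribe(...).then(...) works
    },
    then(onFulfilled, onRejected) { return all.then(onFulfilled, onRejected); },
  };
}
```

Returning `this` from subscribe keeps the chain promise-like, which matches the usage reported in the issues above (`.subscribe(...).then(...)`).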
Describe the Programming Model in the Wiki and discuss it with the supervisor
Add cancellation support to task and parallel. This allows canceling already scheduled tasks.
Implement an additional overload of the parallel.times method that accepts a constant value, instead of a generator, that is repeated n times.
parallel.times(1000, 0).value().then(val => console.log(val));
Creates an array containing 1000 zeros (yep, there are more efficient ways to do so).
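One way the overload could be resolved is by inspecting the second argument's type; this is only a sketch of the dispatch logic, not the library's implementation, and the function name is invented:

```javascript
// Sketch: if the second argument is not a function, treat it as a constant
// and wrap it in a generator, so both overloads share one code path.
function timesValueOrGenerator(n, valueOrGenerator) {
  const generator = typeof valueOrGenerator === 'function'
    ? valueOrGenerator
    : () => valueOrGenerator;
  return Array.from({ length: n }, (_, i) => generator(i));
}
```

A caveat of this kind of dispatch: a constant that is itself a function could not be distinguished from a generator, so the real API might prefer a separate method name instead.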
Debugging is an important feature for locating bugs. The current implementation does not support debugging of iteratee functions, as the functions are (de)serialized without source maps.
Add source maps to the transferred functions to support debugging of iteratee functions. This is a transpile-time-only feature, as line and column numbers cannot be determined at runtime.
Set up two performance-measurement environments.
The first is integrated into the example page and measures the time needed to execute a single example. This requires a synchronous and a parallel implementation of each example.
The second performs multiple runs over all examples and displays the results in a table. It would be beneficial if the page had a progress report (at least on the console).
Most often the processing logic depends not only on the value but also on additional data. Normally such data is either passed to the function or is part of the function's closure. As the function cannot access any data from its closure when executed on a worker, the data needs to be passed explicitly to the iteratee. Therefore, extend the parallel interface to support argument passing.
It should be possible to pass additional arguments that apply to all iteratees in the ParallelChain. Additionally, it should be possible to pass additional --- or override existing --- arguments for a single operation in the parallel chain.
parallel.environment({ test: true }) defines the arguments for the whole chain.
An environment option that is accepted by all operations of the parallel chain (except value) and by the parallel operations.
If the parallel chain is created using the generator functions from, range, or times, no access to the shared parameters is possible.
Possible solution: add an additional environment parameter to the generator functions. This applies globally to the whole chain (equal to calling environment).
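The merge semantics described above (chain-wide environment, overridable per operation, passed explicitly because the worker cannot see the closure) could be sketched like this. The function name and the convention of passing the environment as a second iteratee argument are assumptions for illustration:

```javascript
// Sketch: apply a map operation where the iteratee receives an explicitly
// passed environment instead of reading from its closure. Per-operation
// values override chain-wide defaults.
function applyMapWithEnvironment(values, iteratee, chainEnv, operationEnv) {
  const env = Object.assign({}, chainEnv, operationEnv);
  return values.map(value => iteratee(value, env));
}
```

Usage mirrors the lodash-style example earlier in this document: data the iteratee needs (like the zip code there) travels in the environment object rather than in a bound closure.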
Migrate to a testing framework that can be executed on Node as well as in the browser.
Add the missing Node.js tests.
ES6 allows destructuring of function parameters, e.g.
export function knightTours(startPath, { board, boardSize }) {}
Such a function is currently not correctly serialized / deserialized and therefore fails when executed on a worker.
Add support for functions using destructured parameters.
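A common round trip for shipping iteratees to a worker is source-string serialization, sketched below. This is an illustration of the general technique (not the library's actual serializer); the toy body of `knightTours` is invented, and the fix amounts to the deserializer accepting parameter lists that contain destructuring patterns:

```javascript
// Sketch: the function source is shipped as a string and recompiled on the
// worker via the Function constructor.
function serializeFunction(fn) {
  return fn.toString();
}

function deserializeFunction(source) {
  // Wrapping the source in parentheses lets both function declarations and
  // arrow functions be evaluated as expressions.
  return new Function('return (' + source + ')')();
}

// A function with a destructured parameter, as in the issue (toy body):
function knightTours(startPath, { board, boardSize }) {
  return startPath.length + boardSize;
}
```

Modern engines preserve the destructuring pattern in `toString()`, so the round trip works as long as the deserializer does not try to re-parse the parameter list itself with a grammar that only allows plain identifiers.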
Android 5 tests are not running on BrowserStack.
The browser fails to retrieve the Karma test-runner page.
Define and implement how a parallel task should determine the number of workers to spawn. Should the number be defined by the number of elements in the array? Is an options parameter needed that defines a ratio (one worker per x items), a maximum, or a minimum?