
Comments (3)

MichaReiser avatar MichaReiser commented on May 24, 2024

I'm fairly inexperienced with threads vs cpus, concurrency, and parallelism but I've read a decent amount. People in the Java world talk about throwing 2500 threads at a problem and reducing the job from a day of processing to seconds.

I would say it depends: first on how many cores your computer has, then on the amount of memory available, and on whether your task is CPU- or I/O-bound. If your computer only has two cores, creating a thousand threads will not help (unless your tasks are heavily I/O-bound, but in that case I suggest using async APIs, which are not yet supported by the library). The point is, the higher the ratio of threads per core, the more the threads compete against each other for time to perform their computation. This results in frequent switching of the running threads per CPU, which can have quite significant overhead and may even increase the total runtime. The optimal number of threads is problem-specific.

Overall, for CPU-bound problems the best case is a linear speedup, meaning the runtime can be computed as: single-threaded runtime / number of cores. However, this best case assumes each task is independent and that the cost of spawning the threads and of inter-process communication is small. In the case of parallel.es, the time needed to spawn a task and return its result to the main process can be a significant overhead, depending on the amount of data that needs to be transferred (it requires serialization).
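The formula above can be sketched as a small cost model. This is a back-of-the-envelope helper, not part of parallel.es; `perTaskOverheadMs` is a hypothetical stand-in for the spawn-plus-serialization cost described above:

```javascript
// Best-case parallel runtime: the work divides evenly across cores,
// but each task pays a fixed overhead for spawning and for
// serializing its input/output back to the main process.
function estimateParallelRuntime(singleThreadedMs, cores, tasks, perTaskOverheadMs) {
  const compute = singleThreadedMs / cores;   // ideal linear speedup
  const overhead = tasks * perTaskOverheadMs; // spawn + serialization cost
  return compute + overhead;
}

// With zero overhead the speedup is exactly linear:
// estimateParallelRuntime(8000, 4, 4, 0) === 2000
// Enough per-task overhead can make the parallel version slower
// than the original 8000 ms:
// estimateParallelRuntime(8000, 4, 4, 3000) === 14000
```

This also shows why the worst case mentioned below exists: once the overhead term dominates the compute term, parallelizing loses.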

Based on reading your source code and paper, it seems like by default you spawn 1 thread per CPU core.

This is just a default that has proven to be a good number for many tasks. However, this approach fits some problems better than others; in the worst case it might even result in a slower runtime than executing synchronously. Therefore, the idea is to use a good default that you can override if it is unsuitable. The library itself defines no limit, so you can choose an arbitrary positive integer. It allows you to configure both the number of tasks to create and the number of threads used:

// this changes the number of threads for all tasks (globally shared thread pool instance)
parallel.defaultOptions().threadPool.maxThreads = 2500;

You can also set the number of items computed per task (the number of tasks can be larger than the number of threads; by default, it creates as many tasks as there are threads available):

parallel.from(array, { minValuesPerTask: 1000, maxValuesPerTask: 20000 })

Creating more tasks than threads may improve performance for nonlinear problems where the computation per item depends on the arguments (e.g. the Mandelbrot set, where the rows in the middle require more computation time than those at the top or bottom). Creating more tasks than threads allows a form of load balancing. For linear problems it often reduces throughput, as more tasks mean more management overhead. However, maybe you are interested in smaller sub-results, in which case you favor visible progress over throughput.
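The load-balancing effect can be demonstrated with a self-contained simulation (this is not parallel.es code, just a model of a greedy scheduler assigning tasks to the least-loaded thread):

```javascript
// Simulate greedy scheduling: each task goes to the thread with the
// least accumulated work. Uneven per-item costs (like Mandelbrot rows)
// balance better when split into more, smaller tasks.
function makespan(taskCosts, threads) {
  const load = new Array(threads).fill(0);
  for (const cost of taskCosts) {
    const min = load.indexOf(Math.min(...load));
    load[min] += cost;
  }
  return Math.max(...load); // total runtime = most loaded thread
}

// Eight "rows" where the last four are ten times as expensive.
const rows = [1, 1, 1, 1, 10, 10, 10, 10];

// One big task per thread: one thread gets all the expensive rows.
// makespan([4, 40], 2) === 40
// One task per row on the same two threads: the work evens out.
// makespan(rows, 2) === 22
```

The splits correspond roughly to tuning `maxValuesPerTask` down so that a slow region of the input does not end up entirely inside one task.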

Is that a limitation of Node?

As V8 uses native threads, it might also be an OS limitation (or the operating system may become unstable). I don't know of any limits in Node, but I do know that some browsers limit the number of workers that can be spawned.

You're welcome.

Update:

In general, parallelizing might help. However, I would first start by profiling your application to identify any bottlenecks. One of them might be exactly the functional programming style: invoking functions can be quite expensive.

from parallel.es.

jefffriesen avatar jefffriesen commented on May 24, 2024

I would say it depends. First, on how many cores your computer has, the amount of memory available and if your task is CPU or I/O bound.

I will probably end up running this on cloud compute resources; I would probably just rent 64-CPU, high-performance, high-memory machines.

This results in frequent switching of the running threads per CPU, which can have quite significant overhead and may even increase the total runtime. The optimal number of threads is problem-specific ... Creating more tasks than threads may improve performance for nonlinear problems where the computation per item depends on the arguments (e.g. the Mandelbrot set, where the rows in the middle require more computation time than those at the top or bottom). Creating more tasks than threads allows a form of load balancing. For linear problems it often reduces throughput, as more tasks mean more management overhead. However, maybe you are interested in smaller sub-results, in which case you favor visible progress over throughput.

This is all really interesting and educational. Thank you. It would be cool to get some of this information in the readme.

It also reminds me of something else I've been looking into. Here's a great article on automated machine learning tools. Much of the chore of picking the right machine-learning architecture and parameters can be automated: prediction error rates are knowable and objective, so you can have the code run through lots of different scenarios and give you a summary of the time and error rates achieved.

I wonder if we could determine the most efficient number of threads for a process in a similar way: run the function (maybe with a subset of the data) with different numbers of threads and see what runs fastest. It could use a binary-search-type approach where it starts with 1 thread, 1x the CPU count, 10x the CPU count, and 100x the CPU count, finds the fastest two, and then starts trying options between those. Once you deploy the function on new hardware, run the thread check first, then set the thread count for production use. It could save a ton of time and money, and could be an interesting addition to the library.
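The search described above could be sketched roughly as follows. This is not a parallel.es feature; `timeWithThreads` is a hypothetical caller-supplied probe that runs the workload (ideally on a data subset) with a given thread count and returns the elapsed milliseconds:

```javascript
// Coarse-then-fine search for a good thread count, as proposed above.
// timeWithThreads(n): run the workload with n threads, return elapsed ms.
function tuneThreadCount(timeWithThreads, cpuCount) {
  // Coarse pass: 1 thread, then 1x, 10x and 100x the CPU count.
  const candidates = [1, cpuCount, 10 * cpuCount, 100 * cpuCount];
  const timed = candidates.map(n => ({ n, ms: timeWithThreads(n) }));
  timed.sort((a, b) => a.ms - b.ms);
  let [best, second] = timed;

  // Fine pass: bisect between the two fastest coarse candidates.
  let lo = Math.min(best.n, second.n);
  let hi = Math.max(best.n, second.n);
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    const ms = timeWithThreads(mid);
    if (ms < best.ms) best = { n: mid, ms };
    // Narrow towards the side containing the current best.
    if (best.n <= mid) hi = mid; else lo = mid;
  }
  return best.n;
}
```

Real measurements are noisy, so in practice each probe would need to be repeated and averaged; this sketch only captures the search strategy.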

In general, parallelizing might help. However, I would first start by profiling your application to identify any bottlenecks. One of them might be exactly the functional programming style: invoking functions can be quite expensive.

I will definitely do this more. But it's helpful to get an overview like yours of how threads can work in JavaScript. As an anecdote, I originally wrote the script in a more imperative forEach, let, and push style. Rewriting it functionally (without any optimizations) added about 10% to the processing time.


MichaReiser avatar MichaReiser commented on May 24, 2024

I wonder if we could determine the most efficient number of threads for a process in a similar way: run the function (maybe with a subset of the data) with different numbers of threads and see what runs fastest. It could use a binary-search-type approach where it starts with 1 thread, 1x the CPU count, 10x the CPU count, and 100x the CPU count

This sounds interesting as a standalone tool. The primary target of the library was originally the browser, where every user has different hardware. Something I intend to add is a heuristic like the one the .NET framework uses: with a heuristic, the library would tune itself at runtime towards the optimal number of threads.

Re-writing it functionally (without any optimizations) added about 10% to the processing time.

That is interesting. push on its own is problematic, as it might require reallocating the whole array. For performance-sensitive code it is good practice to initialize arrays with a length, if the length is known.
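To illustrate the preallocation advice, here are the two styles side by side (a minimal example, not taken from the script being discussed):

```javascript
// Growing an array with push() may force the engine to repeatedly
// reallocate the backing store as it fills up.
function squaresWithPush(n) {
  const out = [];
  for (let i = 0; i < n; i++) out.push(i * i);
  return out;
}

// When the final length is known up front, allocate once and
// write by index instead.
function squaresPreallocated(n) {
  const out = new Array(n); // length known up front: allocate once
  for (let i = 0; i < n; i++) out[i] = i * i;
  return out;
}

// Both produce the same result, e.g. squaresPreallocated(5)
// returns [0, 1, 4, 9, 16].
```

Whether this matters in practice depends on the engine and the array size, so it is worth confirming with the profiler first.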

