
Comments (3)

MichaReiser avatar MichaReiser commented on May 24, 2024

I'm fairly inexperienced with threads vs cpus, concurrency, and parallelism but I've read a decent amount. People in the Java world talk about throwing 2500 threads at a problem and reducing the job from a day of processing to seconds.

I would say it depends: first on how many cores your computer has, then on the amount of memory available, and on whether your task is CPU- or I/O-bound. If your computer only has two cores, creating a thousand threads will not help (unless your tasks are heavily I/O-bound, but in that case I suggest using async APIs, which are not yet supported by the library). The point is, the higher the ratio of threads per core, the more the threads compete against each other for time to perform their computation. This results in frequent switching of the running threads per CPU, which can have quite significant overhead and may even increase the total runtime. The optimal number of threads is problem-specific.

Overall, for CPU-bound problems the best case is a linear speedup, meaning the runtime can be computed as: single-threaded runtime / number of cores. However, this best case assumes each task is independent and that the cost of spawning the threads and of inter-process communication is small. In the case of parallel.es, the time needed to spawn a task and return its result to the main process can be a significant overhead, depending on the amount of data that needs to be transferred (it requires serialization).
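The formula above can be sketched as a small cost model. This is a back-of-the-envelope helper, not part of parallel.es; `perTaskOverheadMs` is a hypothetical stand-in for the spawn-plus-serialization cost described above:

```javascript
// Best-case parallel runtime: the work divides evenly across cores,
// but each task pays a fixed overhead for spawning and for
// serializing its input/output back to the main process.
function estimateParallelRuntime(singleThreadedMs, cores, tasks, perTaskOverheadMs) {
  const compute = singleThreadedMs / cores;   // ideal linear speedup
  const overhead = tasks * perTaskOverheadMs; // spawn + serialization cost
  return compute + overhead;
}

// With zero overhead the speedup is exactly linear:
// estimateParallelRuntime(8000, 4, 4, 0) === 2000
// Enough per-task overhead can make the parallel version slower
// than the original 8000 ms:
// estimateParallelRuntime(8000, 4, 4, 3000) === 14000
```

This also shows why the worst case mentioned below exists: once the overhead term dominates the compute term, parallelizing loses.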

Based on reading your source code and paper, it seems like by default you spawn 1 thread per CPU core.

This is just a default that has proven to be a good number for many tasks. However, this approach fits some problems better than others; in the worst case it might even result in a slower runtime than executing synchronously. Therefore, the idea is to use a good default that you can override if it is unsuitable. The library itself defines no limit, so you can choose an arbitrary positive integer. It allows you to configure both the number of tasks to create and the number of threads used:

// this changes the number of threads for all tasks (globally shared thread pool instance)
parallel.defaultOptions().threadPool.maxThreads = 2500;

You can also set the number of items computed per task (the number of tasks can be larger than the number of threads; by default, it creates as many tasks as there are threads available):

parallel.from(array, { minValuesPerTask: 1000, maxValuesPerTask: 20000 })

Creating more tasks than threads may improve performance for nonlinear problems where the computation per item depends on the arguments (e.g. the Mandelbrot set, where the rows in the middle require more computation time than those at the top or bottom). Creating more tasks than threads allows a form of load balancing. For linear problems it often reduces throughput, as more tasks mean more management overhead. However, maybe you are interested in smaller sub-results, in which case you favor visible progress over throughput.
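The load-balancing effect can be demonstrated with a self-contained simulation (this is not parallel.es code, just a model of a greedy scheduler assigning tasks to the least-loaded thread):

```javascript
// Simulate greedy scheduling: each task goes to the thread with the
// least accumulated work. Uneven per-item costs (like Mandelbrot rows)
// balance better when split into more, smaller tasks.
function makespan(taskCosts, threads) {
  const load = new Array(threads).fill(0);
  for (const cost of taskCosts) {
    const min = load.indexOf(Math.min(...load));
    load[min] += cost;
  }
  return Math.max(...load); // total runtime = most loaded thread
}

// Eight "rows" where the last four are ten times as expensive.
const rows = [1, 1, 1, 1, 10, 10, 10, 10];

// One big task per thread: one thread gets all the expensive rows.
// makespan([4, 40], 2) === 40
// One task per row on the same two threads: the work evens out.
// makespan(rows, 2) === 22
```

The splits correspond roughly to tuning `maxValuesPerTask` down so that a slow region of the input does not end up entirely inside one task.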

Is that a limitation of Node?

As V8 uses native threads, it might also be an OS limitation (or the operating system may become unstable). I don't know of any limits in Node, but I do know that some browsers limit the number of workers that can be spawned.

You're welcome.

Update:

In general, parallelizing might help. However, I would first start by profiling your application to identify any bottlenecks. One of them might be exactly the functional programming style: invoking functions can be quite expensive.

from parallel.es.

jefffriesen avatar jefffriesen commented on May 24, 2024

I would say it depends. First, on how many cores your computer has, the amount of memory available and if your task is CPU or I/O bound.

I will probably end up running this on cloud compute resources; I would probably just rent 64-CPU, high-performance, high-memory machines.

This results in frequent switching of the running threads per CPU, which can have quite significant overhead and may even increase the total runtime. The optimal number of threads is problem-specific ... Creating more tasks than threads may improve performance for nonlinear problems where the computation per item depends on the arguments (e.g. the Mandelbrot set, where the rows in the middle require more computation time than those at the top or bottom). Creating more tasks than threads allows a form of load balancing. For linear problems it often reduces throughput, as more tasks mean more management overhead. However, maybe you are interested in smaller sub-results, in which case you favor visible progress over throughput.

This is all really interesting and educational. Thank you. It would be cool to get some of this information in the readme.

It also reminds me of something else I've been looking into. Here's a great article on automated machine learning tools. Much of the chore of picking the right machine-learning architecture and parameters can be automated: prediction error rates are knowable and objective, so you can have the code run through lots of different scenarios and give you a summary of the time and error rates achieved.

I wonder if we could determine the most efficient number of threads for a process in a similar way: run the function (maybe with a subset of the data) with different numbers of threads and see what runs fastest. It could use a binary-search-type approach where it starts with 1 thread, 1x the CPU count, 10x the CPU count, and 100x the CPU count, finds the fastest two, and then starts trying options between those. Once you deploy the function on new hardware, run the thread check first, then set the thread count for production use. It could save a ton of time and money, and could be an interesting addition to the library.
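The search described above could be sketched roughly as follows. This is not a parallel.es feature; `timeWithThreads` is a hypothetical caller-supplied probe that runs the workload (ideally on a data subset) with a given thread count and returns the elapsed milliseconds:

```javascript
// Coarse-then-fine search for a good thread count, as proposed above.
// timeWithThreads(n): run the workload with n threads, return elapsed ms.
function tuneThreadCount(timeWithThreads, cpuCount) {
  // Coarse pass: 1 thread, then 1x, 10x and 100x the CPU count.
  const candidates = [1, cpuCount, 10 * cpuCount, 100 * cpuCount];
  const timed = candidates.map(n => ({ n, ms: timeWithThreads(n) }));
  timed.sort((a, b) => a.ms - b.ms);
  let [best, second] = timed;

  // Fine pass: bisect between the two fastest coarse candidates.
  let lo = Math.min(best.n, second.n);
  let hi = Math.max(best.n, second.n);
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    const ms = timeWithThreads(mid);
    if (ms < best.ms) best = { n: mid, ms };
    // Narrow towards the side containing the current best.
    if (best.n <= mid) hi = mid; else lo = mid;
  }
  return best.n;
}
```

Real measurements are noisy, so in practice each probe would need to be repeated and averaged; this sketch only captures the search strategy.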

In general, parallelizing might help. However, I would first start by profiling your application to identify any bottlenecks. One of them might be exactly the functional programming style: invoking functions can be quite expensive.

I will definitely do this more. But it's helpful to get an overview like yours of how threads can work in JavaScript. As an anecdote, I originally wrote the script in a more imperative forEach, let, and push style. Rewriting it functionally (without any optimizations) added about 10% to the processing time.


MichaReiser avatar MichaReiser commented on May 24, 2024

I wonder if we could determine the most efficient number of threads for a process in a similar way: run the function (maybe with a subset of the data) with different numbers of threads and see what runs fastest. It could use a binary-search-type approach where it starts with 1 thread, 1x the CPU count, 10x the CPU count, and 100x the CPU count

This sounds interesting as a standalone tool. The primary target of the library was originally the browser, where every user has different hardware. Something I intend to add is a heuristic like the one the .NET framework uses: with a heuristic, the library would tune itself at runtime towards the optimal number of threads.

Re-writing it functionally (without any optimizations) added about 10% to the processing time.

That is interesting. push on its own is problematic, as it might require reallocating the whole array. For performance-sensitive code it is good practice to initialize arrays with a length, if the length is known.
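To illustrate the preallocation advice, here are the two styles side by side (a minimal example, not taken from the script being discussed):

```javascript
// Growing an array with push() may force the engine to repeatedly
// reallocate the backing store as it fills up.
function squaresWithPush(n) {
  const out = [];
  for (let i = 0; i < n; i++) out.push(i * i);
  return out;
}

// When the final length is known up front, allocate once and
// write by index instead.
function squaresPreallocated(n) {
  const out = new Array(n); // length known up front: allocate once
  for (let i = 0; i < n; i++) out[i] = i * i;
  return out;
}

// Both produce the same result, e.g. squaresPreallocated(5)
// returns [0, 1, 4, 9, 16].
```

Whether this matters in practice depends on the engine and the array size, so it is worth confirming with the profiler first.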

