The data-parallelism from juliafolds

Picking a proper parallel library

Hello,

It is nice to see that the manual is becoming more and more comprehensive with time!

However, the section that mentions other parallel libraries makes me wonder how to choose one in practice. Besides the basic tools covered in Julia's official documentation, there are already about 10 libraries aimed at making better use of multi-threading or multi-core parallelism. Many of them provide more or less the same functionality, which makes it harder for users to choose among them. Maybe this is also a sign that much more can be done in this area, and that eventually one standard package will emerge.

What is your opinion on this? Will there be a standard MPI- or OpenMP-like library in Julia? If I want to build a massively parallel project in Julia in the future, a good parallel library would be a solid building block. I know you are an active developer in this field, so it would be good to hear from an expert!

Questions about the tutorial

Hi,

This is a very nice tutorial! I have some questions and also comments after going through it:

I tried the letter-count example from the mapreduce section. On my Mac with Julia 1.5.1, the performance is a bit surprising.

With 1 thread:

@btime f1 = mapreduce(x -> Dict(x => 1), mergewith!(+), str)
  8.830 μs (203 allocations: 27.66 KiB)
@btime f2 = ThreadsX.mapreduce(x -> SingletonDict(x => 1), mergewith!!(+), str)
  36.834 μs (308 allocations: 20.67 KiB)

With 4 threads:

@btime f1 = mapreduce(x -> Dict(x => 1), mergewith!(+), str)
  9.466 μs (203 allocations: 27.66 KiB)
@btime f2 = ThreadsX.mapreduce(x -> SingletonDict(x => 1), mergewith!!(+), str)
  55.702 μs (1347 allocations: 86.23 KiB)

Shouldn't the threaded version be faster? Is the workload here too small to show a speedup? I assume there is some overhead for launching threads, but are these numbers normal?
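One way to check whether the workload size explains these numbers is to time a serial and a threaded letter count on inputs of different lengths. The following is only a minimal sketch that uses Base alone: `count_serial`, `count_threaded`, and the input sizes are hypothetical names and values chosen for illustration, and the hand-rolled chunked reduction is not how ThreadsX implements `mapreduce`. The two functions also do different amounts of allocation per character, so the timings only illustrate how a fixed task-spawning cost compares with the total work.

```julia
# Serial letter count, written as in the tutorial example.
count_serial(chars) = mapreduce(c -> Dict(c => 1), mergewith!(+), chars)

# Hypothetical hand-rolled threaded version: one task per chunk, then merge.
function count_threaded(chars; ntasks = Threads.nthreads())
    chunksize = max(1, cld(length(chars), ntasks))
    tasks = map(Iterators.partition(chars, chunksize)) do chunk
        Threads.@spawn begin
            counts = Dict{Char,Int}()          # task-local accumulator
            for c in chunk
                counts[c] = get(counts, c, 0) + 1
            end
            counts
        end
    end
    # Merge the per-task dictionaries into a fresh one.
    return mapreduce(fetch, mergewith!(+), tasks; init = Dict{Char,Int}())
end

for n in (100, 200_000)                        # assumed sizes: tiny vs. larger input
    chars = rand('a':'z', n)
    count_serial(chars); count_threaded(chars) # warm up (compilation)
    t_serial = @elapsed count_serial(chars)
    t_threaded = @elapsed count_threaded(chars)
    println("n = ", n, ": serial ", t_serial, " s, threaded ", t_threaded, " s")
end
```

For reliable numbers, the same calls can be timed with `@btime` from BenchmarkTools, interpolating the input as `$chars`.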

In the section "Practical example: Stopping time of Collatz function":

julia> Threads.nthreads()  # I started `julia` with `-t 4`
4

julia> using BenchmarkTools

julia> @btime map(collatz_stopping_time, 1:100_000);
  18.116 ms (2 allocations: 781.33 KiB)

julia> @btime ThreadsX.map(collatz_stopping_time, 1:100_000);
  5.391 ms (1665 allocations: 7.09 MiB)

With 4 threads, why is the total memory allocated about 10 times larger (7.09 MiB versus 781.33 KiB)? Is it generally useful to check memory usage for parallel programs?
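To see how a parallel map can allocate more than a serial one, allocations can be compared directly with `@allocated`. The sketch below is an assumption-laden illustration, not the ThreadsX implementation: `collatz_stopping_time` is redefined here as a guess at the tutorial's function, and `chunked_map` is a hypothetical helper that maps each chunk in its own task and then concatenates the results. The per-chunk arrays plus the final concatenation are one plausible source of extra allocation; it assumes a non-empty input.

```julia
collatz(x) = iseven(x) ? x ÷ 2 : 3x + 1

# Assumed definition: number of steps until the sequence reaches 1.
function collatz_stopping_time(x)
    n = 0
    while x != 1
        x = collatz(x)
        n += 1
    end
    return n
end

# Hypothetical chunked threaded map (assumes `xs` is non-empty).
function chunked_map(f, xs; ntasks = Threads.nthreads())
    chunksize = max(1, cld(length(xs), ntasks))
    tasks = map(Iterators.partition(xs, chunksize)) do chunk
        Threads.@spawn map(f, chunk)       # each task allocates its own result array
    end
    return reduce(vcat, fetch.(tasks))     # concatenation copies every chunk once more
end

xs = 1:100_000
map(collatz_stopping_time, xs); chunked_map(collatz_stopping_time, xs)  # warm up
serial_bytes = @allocated map(collatz_stopping_time, xs)
chunked_bytes = @allocated chunked_map(collatz_stopping_time, xs)
println("serial: ", serial_bytes, " bytes, chunked: ", chunked_bytes, " bytes")
```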

The section "Practical example: Histogram of stopping time of Collatz function" shows a more complicated use of the FLoops package, which is a bit hard to follow without prior knowledge of the package.
