Comments (6)
Thank you for your feedback and suggestions! I will have a look at dask and ray to see the details of this mechanism. I am pretty sure it can be implemented in torcpy. As far as I understand, the above example demonstrates some kind of task dependencies and pipeline execution. However, I have some comments and questions:
a) I can see some limitations in the example above, at least at the API level, due to the usage of the pool object. Why can f2 (bar) not be spawned within f1 (foo)?
b) In the above example, the main code can access the result of f1. Is that meaningful?
from torc_py.
Let me give a few more illustrative examples! (I'll use pool = dask.distributed.Client() here so the code runs, but I guess ideally I'm suggesting one might do pool = torcpy eventually):

import numpy as np
import random
import time
from dask.distributed import Client

pool = Client()
Imagine we have some function that iterates some object x, but has varying runtime:
def foo(x, t):
    time.sleep(random.uniform(0, 1.0) * t)
    return x + t
Say we want to iterate some objects xs up to time t=5; rather than doing it in one chunk, we can do five steps of t=1:
xs = [np.random.randn(100, 100) for _ in range(4)]
fs = xs
for _ in range(5):
    fs = [pool.submit(foo, f, t=1) for f in fs]
rs = [f.result() for f in fs]
with the intermediate state of each x being passed from worker to worker (or kept in place) directly. I haven't thought about it much, but I imagine you can do this kind of thing by spawning tasks within the workers as well.
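The "spawn tasks from inside a worker" pattern can be sketched with Python's stdlib concurrent.futures as a stand-in for the dask/torcpy pool (foo, the step count, and the worker count here are made up for illustration; each task submits its own successor):

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

# enough workers so that each chain of nested tasks can block without deadlock
pool = ThreadPoolExecutor(max_workers=32)

def foo(x, t, steps):
    time.sleep(random.uniform(0, 0.01))
    x = x + t
    if steps > 1:
        # spawn the next stage from inside the running task
        return pool.submit(foo, x, t, steps - 1).result()
    return x

fs = [pool.submit(foo, x, 1, 5) for x in range(4)]
rs = [f.result() for f in fs]
print(rs)  # each x advanced by 5 steps of t=1
```

Note that with nested blocking submissions the pool must be large enough to hold every in-flight chain, otherwise the parents waiting on children can exhaust the workers.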
Where you (I think?) can't do that is in combining futures, e.g.:
from operator import matmul

a = pool.submit(np.random.randn, 1000, 1000)
b = pool.submit(np.random.randn, 1000, 1000)
c = pool.submit(np.random.randn, 1000, 1000)
d = pool.submit(np.random.randn, 1000, 1000)
ab = pool.submit(matmul, a, b)      # futures passed directly as arguments
cd = pool.submit(matmul, c, d)
abcd = pool.submit(matmul, ab, cd)
y = pool.submit(np.trace, abcd)
y.result()
Here the current process only ever needs to retrieve the single scalar y.
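For contrast, a minimal stdlib sketch of why this is special: concurrent.futures does not resolve future arguments automatically, so you have to chain .result() calls yourself (the lambda wrapper below is just an illustrative workaround, not a dask feature):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()

a = pool.submit(np.random.randn, 100, 100)
b = pool.submit(np.random.randn, 100, 100)

# pool.submit(np.matmul, a, b) would pass the Future objects themselves;
# dask's automatic resolution of future arguments is what builds the DAG above.
ab = pool.submit(lambda fa, fb: fa.result() @ fb.result(), a, b)
print(ab.result().shape)
```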
from torc_py.
Thank you for the examples. I come from the High Performance Computing and C worlds, so I need some time to process and accept what I see. However, I understand the need for task dependencies, and OpenMP supports them too.
Since there is support for callbacks and nested tasks in torcpy, I think that the above functionality can be implemented.
Apparently, the result of the first future (passed by reference) must match the input arguments of the second one. Passing either a normal variable (e.g. a numpy array) or a future object to the same function is somewhat odd, at least for me.
The second example shows the advantages of this mechanism: it offers great flexibility to the user. My only concern is that the user loses control of the execution, but obviously this can be managed efficiently by the runtime system. So, before implementing this mechanism, I first need to see where dask stores all these intermediate results, because transferring them from worker to worker can cost a lot.
A possible solution using torcpy follows. What you see below could have an equivalent implementation in OpenMP, of course in a more compact form thanks to compiler support. Please note that I do not specify the target worker queue, and I do not enable work stealing either. Moreover, the main task is executed by worker 0, which participates in the execution of the tasks. In fact, the tasks in torcpy are completely decoupled from the workers.
import numpy as np
import torcpy as torc
from operator import matmul

def taskf():
    c = torc.submit(np.random.randn, 1000, 1000)  # spawn task for c
    d = np.random.randn(1000, 1000)               # no need for a second task here
    torc.waitall()                                # wait for c (d is ready)
    cd = matmul(c.result(), d)                    # c*d
    return cd

def main():
    cd = torc.submit(taskf)                       # submit c*d
    a = torc.submit(np.random.randn, 1000, 1000)  # spawn task for a
    b = np.random.randn(1000, 1000)               # no need for a task here
    torc.wait([a])                                # wait for a (b is ready)
    ab = matmul(a.result(), b)                    # a*b
    torc.wait([cd])                               # wait for c*d
    abcd = matmul(ab, cd.result())                # ab*cd
    y = np.trace(abcd)
    print(y)

if __name__ == "__main__":
    torc.start(main)
from torc_py.
The above functionality can be easily implemented if the task results are stored in a shared object storage. In that case, you do not care where the result of a task has been produced. This is probably the case for ray and maybe dask-distributed. You can get lost when you look into their implementations. For torcpy, I used less than 1000 lines of code, and I do not exclude native MPI code...
from torc_py.
Indeed, ray and dask are very big projects! Dask works with a centralized scheduler for tasks, so I think the results of tasks are kept by the workers and transferred around as needed (the scheduler tells them what to do), whereas ray, I think, is built on some kind of shared-memory database.
It may be that torcpy is not a good fit for this approach, in which case there are probably ways to do equivalent things, as you suggest! My thought was just that having this same functionality would make it easier to try torcpy as a 'drop-in' for dask or ray.
from torc_py.
The idea is very nice, so I think I should try to implement it, using some kind of shared memory based on MPI one-sided communication. Thank you again for this discussion!
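A very rough single-node sketch of that idea, using Python's stdlib shared memory as a stand-in for an MPI one-sided window (the real implementation would use mpi4py's MPI.Win with Put/Get; everything here is illustrative):

```python
import numpy as np
from multiprocessing import shared_memory

result = np.random.randn(100, 100)

# one "worker" exposes the task result (cf. creating a window and MPI_Put)
shm = shared_memory.SharedMemory(create=True, size=result.nbytes)
src = np.ndarray(result.shape, dtype=result.dtype, buffer=shm.buf)
src[:] = result

# another "worker" attaches by name and reads it (cf. MPI_Get)
peer = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(result.shape, dtype=result.dtype, buffer=peer.buf)
ok = np.array_equal(view, result)
print(ok)

# release numpy views before closing, then clean up the segment
del src, view
peer.close()
shm.close()
shm.unlink()
```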
from torc_py.