dask4dvc's Issues

Check with `zn.nodes`

There might be an issue with stages that only create git-tracked outputs, like the hash from zn.nodes.

EDIT: They are DVC outs, but the directory does not exist before the stage runs. If that is the case, maybe create the directory before dvc stage add so that the .gitignore can be tracked by git.

Check CPU usage

E.g. with `ipsuite.calculators.CP2KSinglePoint`, the Node seems to use only a single core in a LocalCluster.

show stdout/stderr

By default, stdout/stderr is written to a file such as slurm-xxxx.out. It would be a good addition to show that output when running dask4dvc repro.
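
One way to show that output would be to poll the log file and print whatever was appended since the last read. This is only a sketch: read_new_output is a hypothetical helper, not part of dask4dvc, and it assumes the log path is known to the caller.

```python
from pathlib import Path


def read_new_output(log_file: Path, offset: int = 0) -> tuple[str, int]:
    """Return the text appended to ``log_file`` since ``offset`` and the new offset."""
    if not log_file.exists():
        # The scheduler may not have created the file yet.
        return "", offset
    with log_file.open("rb") as handle:
        handle.seek(offset)
        data = handle.read()
    return data.decode(errors="replace"), offset + len(data)
```

Calling it in a loop and carrying the returned offset forward prints only the new lines on every poll.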

Test fails

Currently, there are no tests for runs that fail.

Check before submitting to a Cluster

If you have many Nodes but most of them haven't changed, you might only want to submit the Nodes that truly changed.
You could use e.g. dvc status to check whether the Node that would be submitted actually has changed outputs.
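
A rough sketch of that check, assuming that dvc status --json prints a JSON object keyed by the changed stages (possibly prefixed with the dvc.yaml path) and an empty object when everything is up to date. Both function names are made up for illustration.

```python
import json
import subprocess


def parse_changed_stages(status_json: str) -> set[str]:
    """Return the stage names listed in the JSON output of ``dvc status``."""
    status = json.loads(status_json or "{}")
    # Keys may look like "path/dvc.yaml:stage"; keep only the stage name.
    return {key.split(":")[-1] for key in status}


def changed_stages() -> set[str]:
    """Run ``dvc status --json`` in the current repository."""
    result = subprocess.run(
        ["dvc", "status", "--json"], capture_output=True, text=True, check=True
    )
    return parse_changed_stages(result.stdout)
```

Only Nodes whose name is in changed_stages() would then be submitted; everything else could be checked out instead.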

Spawn Cluster

Do not rely on a dedicated Cluster, but allow starting e.g. a Slurm Cluster together with dask4dvc <cmd>.

Handle Worker Resources

Currently, there is no way of setting the resources required per Node. client.submit supports this, see https://distributed.dask.org/en/stable/resources.html.

My idea would be to add a dask4dvc.yaml file (or alternatively use the meta key in dvc.yaml https://distributed.dask.org/en/stable/resources.html) to define the resources that should be acquired for the respective Node.

node1:
  GPU: 1
  MEMORY: 100e9

node2:
  MEMORY: 16e9

node3:
  GPU: 3
  MEMORY: 128e9

We parse this file, if it exists, and use it here:

dask_node = client.submit(
    cmd,
    name=node,  # required
    deps=deps,  # required
    pure=False,
    **kwargs,
)
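
A minimal sketch of the parsing step, assuming the file has already been loaded into a dict. Some YAML loaders (PyYAML, for example) read values such as 100e9 as strings, so every value is converted with float(). normalize_resources is a hypothetical name, and skipping a top-level cluster key is an assumption about the file layout.

```python
def normalize_resources(config: dict) -> dict[str, dict[str, float]]:
    """Convert a parsed dask4dvc.yaml mapping into per-node Dask resources."""
    resources = {}
    for node, requirements in config.items():
        if node == "cluster":
            # Assumed to hold general cluster settings, not a Node.
            continue
        resources[node] = {key: float(value) for key, value in requirements.items()}
    return resources


config = {
    "node1": {"GPU": 1, "MEMORY": "100e9"},
    "node2": {"MEMORY": "16e9"},
}
resources = normalize_resources(config)
# Intended use: client.submit(cmd, ..., resources=resources.get(node))
```

The workers have to advertise matching resources when they are started, otherwise tasks that request them will not be scheduled.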

Furthermore, we might also want to have a general section e.g. for https://jobqueue.dask.org/en/latest/generated/dask_jobqueue.SLURMCluster.html

cluster:
   slurm:
      project: "my-project"

support module load

Have some way of running custom shell scripts as a setup before running the stage. Probably through dask4dvc.yaml
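
A possible sketch: chain the setup commands and the stage command into a single shell line, so the stage only runs if the setup succeeded. with_setup and run_stage are hypothetical helpers, "module load cp2k" is just an example, and the login shell is an assumption about where the module function is defined.

```python
import shlex
import subprocess


def with_setup(setup: list[str], cmd: list[str]) -> str:
    """Build one shell line that runs the setup commands, then the stage."""
    return " && ".join([*setup, shlex.join(cmd)])


def run_stage(setup: list[str], cmd: list[str]) -> int:
    # Assumption: a login shell defines shell functions such as `module`.
    return subprocess.run(["bash", "-lc", with_setup(setup, cmd)]).returncode
```

E.g. with_setup(["module load cp2k"], ["dvc", "repro", "node1"]) gives the line that would be handed to bash.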

dvc repro in parallel

You can actually use the dvc graph which makes things even easier.

import subprocess

import dvc.cli
import znflow
import znflow.deployment

# `project` (the ZnTrack project holding the graph) and `client` (a Dask client)
# are assumed to be defined elsewhere.


@znflow.nodify
def submit_zntrack_node(name, cls, *args):
    # *args only carries the predecessors so that znflow knows the dependencies.
    # Ask DVC whether the stage is cached; if so, only check out its outputs.
    outs = subprocess.run(["dvc", "repro", "--dry", name], capture_output=True)
    if outs.stdout.decode().startswith(f"Stage '{name}' is cached - skipping run, checking out outputs"):
        dvc.cli.main(["checkout", name])
        return

    node = cls.from_rev(name=name, results=False)
    node.run()
    node.save(parameter=False)
    # TODO retry commit, because of lock
    for _ in range(10):
        code = dvc.cli.main(["commit", "-f", name])
        if code == 0:
            break


graph = znflow.DiGraph()
mapping = {}
for node_uuid in project.graph.reverse():
    node = project.graph.nodes[node_uuid]["value"]
    predecessors = list(project.graph.predecessors(node.uuid))
    # Pass the already submitted predecessors (none for root nodes) as dependencies.
    with graph:
        node = submit_zntrack_node(node.name, type(node), *[mapping[predecessor] for predecessor in predecessors])
    mapping[node_uuid] = node

deployment = znflow.deployment.Deployment(graph=graph, client=client)
deployment.submit_graph()
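
For the TODO about retrying the commit, a small helper with a pause between attempts might be enough. This is a sketch, not something dask4dvc provides: retry is a hypothetical name, and the linear back-off is an arbitrary choice.

```python
import time
from typing import Callable


def retry(func: Callable[[], int], attempts: int = 10, delay: float = 1.0) -> int:
    """Call ``func`` until it returns 0 or ``attempts`` calls were made."""
    code = 1
    for attempt in range(1, attempts + 1):
        code = func()
        if code == 0 or attempt == attempts:
            break
        # Wait a little longer after every failed attempt.
        time.sleep(delay * attempt)
    return code
```

The commit loop could then become retry(lambda: dvc.cli.main(["commit", "-f", name])), and the returned code tells whether the commit finally succeeded.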

detached mode

Add dask4dvc -d to detach. This can be useful, e.g. when running on a cluster.
