zincware / dask4dvc
Use dask to run the DVC Graph
License: Apache License 2.0
Currently there are no tests for runs that fail
E.g. with ipsuite.calculators.CP2KSingelPoint, the Node seems to use only a single core in a LocalCluster.
Do not rely on a dedicated cluster, but allow starting a cluster (e.g. a Slurm cluster) together with dask4dvc <cmd>.
To avoid "failed with Could not acquire lock after 10 tries."
If you run dask4dvc repro, will it clean up existing files first?
Add dask4dvc -d to detach. This can be useful, e.g. when running on a cluster.
Have some way of running custom shell scripts as a setup before running the stage. Probably through dask4dvc.yaml
You can actually use the dvc graph, which makes things even easier.
import subprocess

import dvc.cli
import znflow


@znflow.nodify
def submit_zntrack_node(name, cls, *args):
    # *args only carry the predecessor Nodes so that the graph knows the order.
    outs = subprocess.run(["dvc", "repro", "--dry", name], capture_output=True)
    if outs.stdout.decode().startswith(
        f"Stage '{name}' is cached - skipping run, checking out outputs"
    ):
        # The stage is cached: check out its outputs instead of running it again.
        dvc.cli.main(["checkout", name])
        return
    node = cls.from_rev(name=name, results=False)
    node.run()
    node.save(parameter=False)
    # TODO retry commit, because of lock
    for _ in range(10):
        code = dvc.cli.main(["commit", "-f", name])
        if code == 0:
            break


# `project` and `client` are assumed to be defined elsewhere.
graph = znflow.DiGraph()
mapping = {}
for node_uuid in project.graph.reverse():
    node = project.graph.nodes[node_uuid]["value"]
    predecessors = list(project.graph.predecessors(node.uuid))
    if len(predecessors) == 0:
        with graph:
            node = submit_zntrack_node(node.name, type(node))
            mapping[node_uuid] = node
    else:
        with graph:
            node = submit_zntrack_node(
                node.name,
                type(node),
                *[mapping[predecessor] for predecessor in predecessors],
            )
            mapping[node_uuid] = node

deployment = znflow.deployment.Deployment(graph=graph, client=client)
deployment.submit_graph()
If you have many Nodes but most of them haven't changed, you might only want to submit the Nodes that truly changed.
You could use e.g. dvc status to check whether the Node that would be submitted actually has changed outputs.
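A minimal sketch of that check, assuming dvc status --json prints a JSON object whose keys are the names of the changed stages (up-to-date stages are assumed to be absent). The helper names parse_changed_stages and changed_stages are hypothetical and not part of dask4dvc.

```python
import json
import subprocess


def parse_changed_stages(status_json: str) -> set[str]:
    """Return the stage names that a `dvc status --json` output lists as changed."""
    # Assumption: the output is a JSON object keyed by stage name,
    # and an up-to-date workspace yields "{}" (or no output at all).
    status = json.loads(status_json or "{}")
    return {name for name, changes in status.items() if changes}


def changed_stages() -> set[str]:
    """Run `dvc status --json` in the current repository and parse the result."""
    result = subprocess.run(
        ["dvc", "status", "--json"], capture_output=True, text=True, check=True
    )
    return parse_changed_stages(result.stdout)
```

A submit loop could then skip every Node whose name is not in changed_stages(). Whether the keys match the Node names exactly (for example for stages outside the root dvc.yaml) would need to be verified.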
dask4dvc/dask4dvc/dvc_repro.py
Line 161 in c44a98c
dask4dvc/dask4dvc/dvc_repro.py
Lines 138 to 144 in c44a98c
When using dask4dvc clone, the git apply patch fails, claiming the patch would be corrupted.
Printing the path can be helpful, e.g. for following custom outputs while the experiment is running.
We could e.g. use some environment variables or not remove previous experiments, etc.
Just for a small speed-up, this would be nice to have.
Currently there is no way of setting the resources required per Node. There are ways to define this for client.submit via https://distributed.dask.org/en/stable/resources.html.
My idea would be to add a dask4dvc.yaml file (or alternatively use the meta key in dvc.yaml, https://distributed.dask.org/en/stable/resources.html) to define the resources that should be acquired for the respective Node.
node1:
  GPU: 1
  MEMORY: 100e9
node2:
  MEMORY: 16e9
node3:
  GPU: 3
  MEMORY: 128e9
We parse this file, if it exists, and use it here:
dask4dvc/dask4dvc/utils/graph.py
Lines 96 to 101 in 86396b2
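As a rough sketch of how such a file could feed into client.submit, assuming it has already been parsed into a dict: the helpers node_resources and submit_with_resources are hypothetical, and only the resources keyword of client.submit comes from the dask documentation linked above. Values are converted with float() because, depending on the YAML parser, 100e9 may be read as a string rather than a number.

```python
def node_resources(config: dict, node_name: str) -> dict[str, float]:
    """Look up the resources of one Node in the parsed (hypothetical) dask4dvc.yaml."""
    raw = config.get(node_name) or {}
    # float() accepts both numbers and strings such as "100e9".
    return {key: float(value) for key, value in raw.items()}


def submit_with_resources(client, func, node_name: str, config: dict):
    """Submit func via a dask client, restricted to workers with the Node's resources."""
    resources = node_resources(config, node_name)
    # Pass None when nothing is configured so the task is not restricted.
    return client.submit(func, resources=resources or None)


# Parsed content of the example file, with values as a YAML parser might return them.
config = {
    "node1": {"GPU": 1, "MEMORY": "100e9"},
    "node2": {"MEMORY": "16e9"},
}
```

Note that workers only accept such tasks if they were started with matching resource declarations, e.g. through the worker's --resources option.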
Furthermore, we might also want to have a general section, e.g. for https://jobqueue.dask.org/en/latest/generated/dask_jobqueue.SLURMCluster.html:
cluster:
  slurm:
    project: "my-project"
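One way such a section could be consumed, sketched under the assumption that the keys below slurm are forwarded unchanged as keyword arguments to dask_jobqueue.SLURMCluster. The helpers cluster_kwargs and build_cluster are hypothetical.

```python
def cluster_kwargs(config: dict, kind: str) -> dict:
    """Return the keyword arguments configured for one cluster type, e.g. 'slurm'."""
    section = config.get("cluster") or {}
    return dict(section.get(kind) or {})


def build_cluster(config: dict):
    """Create a SLURMCluster if a cluster/slurm section exists, else return None."""
    section = config.get("cluster") or {}
    if "slurm" not in section:
        return None
    # Imported lazily so that dask-jobqueue stays an optional dependency.
    from dask_jobqueue import SLURMCluster

    return SLURMCluster(**cluster_kwargs(config, "slurm"))


# Parsed content of the example section above.
config = {"cluster": {"slurm": {"project": "my-project"}}}
```

SLURMCluster usually also needs cores and memory, either here or in the jobqueue configuration, and newer dask-jobqueue releases may expect account instead of project.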
Line 87 in c44a98c
By default, stdout/stderr is written to e.g. slurm-xxxx.out. It would be a good addition to show that output when running dask4dvc repro.
There might be an issue with stages that only create git tracked outputs like the hash from zn.nodes.
EDIT: they are dvc outs, but the directory does not exist prior to the stage running. If that is the case, maybe create the directory before dvc stage add so the gitignore can be git tracked.