🚀 Feature Request What problem is this feature looking t


[FEAT] merlin one-off about merlin HOT 5 CLOSED

llnl commented on June 16, 2024 1

[FEAT] merlin one-off

from merlin.

Comments (5)

jpbrodsky commented on June 16, 2024 1

@koning mentioned this may be another concept related to one-off execution: running a single merlin task from the queue.

Something like:

merlin run-one myworkflow.yaml

would retrieve the first task in the merlin queue, run that task, and finish. In every situation I can imagine, you'd just run the task in the current shell rather than using srun or an equivalent; if you want to do this inside a batch allocation, you could srun merlin run-one.

Ideally, this would ask celery to retrieve and run a single task, then quit. If celery does not have this feature, a run-one feature is presumably still possible by manually querying the task queue and running the retrieved task.

This could take a -n argument to run n tasks then quit.
It could also take a -i argument to run the i-th task, though I don't see why you'd need this.

For context: see the Fireworks feature rlaunch singleshot, which does this: retrieves a single task from the queue and executes it.
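As a rough illustration of the "retrieve n tasks, run them, then quit" behavior described above, here is a minimal sketch that uses Python's standard-library queue as a stand-in for the real task queue. The function name consume_n and the callable-based task representation are assumptions made for this example; this is not Merlin's or celery's API.

```python
import queue
from typing import List


def consume_n(task_queue: queue.Queue, n: int = 1) -> List[object]:
    """Run at most n tasks from the queue in the current process, then return."""
    results = []
    for _ in range(n):
        try:
            # Do not block: an empty queue simply ends the run early.
            task = task_queue.get_nowait()
        except queue.Empty:
            break
        results.append(task())  # execute the task in the current shell/process
        task_queue.task_done()
    return results


# Stand-in queue holding three trivial tasks (hypothetical data).
q = queue.Queue()
for i in range(3):
    q.put(lambda i=i: f"task-{i} done")

first = consume_n(q)           # like a plain run-one: a single task
next_two = consume_n(q, n=5)   # like -n 5: asks for five, only two remain
```

Stopping on an empty queue rather than waiting is one possible design choice; a blocking variant would wait for new tasks instead.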


jwhite242 commented on June 16, 2024 1

For the one-off part of the feature discussion, this reminds me of something I'd wanted to add to maestro a while back and had forgotten about. I just made the ticket here before forgetting again: LLNL/maestrowf#293. The gist of it is enabling extra arguments to pass through to the scheduler, such as Slurm's delayed execution time (presumably other schedulers have something similar). This would make it pretty easy to use the inherited script adapter to make and submit these one-off scripts. That still leaves working out what a one-off is, so we know what to dump into this script: a whole step, a single sample in a step, or all steps in a sample's chain.

Also, for a one-off, does it make sense to run any of this through the servers vs just submitting the single batch script? I know celery has features for this, but it seems like a lot of unnecessary extra steps to kickstart that vs a single script.


ben-bay commented on June 16, 2024

Thinking of all the args we'd need for this.

--local for consistency
--args equivalent to merlin.resources.workers.<worker_name>.args
--queue name of the task queue to use
--block or --noblock

Thing is, we currently need a yaml spec for run-workers, so would we want an alternate, simplified run-workers command too?
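To make the argument list above concrete, here is a hedged argparse sketch. The program name "merlin consume-one", the help strings, and the choice of blocking as the default are assumptions for illustration only; none of this is an existing Merlin command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical command name; the thread has not settled on one.
    parser = argparse.ArgumentParser(prog="merlin consume-one")
    parser.add_argument("--local", action="store_true",
                        help="run locally, for consistency with other commands")
    parser.add_argument("--args", default="",
                        help="worker arguments, like "
                             "merlin.resources.workers.<worker_name>.args")
    parser.add_argument("--queue", default=None,
                        help="name of the task queue to use")
    # --block and --noblock write to the same destination and cannot be combined.
    blocking = parser.add_mutually_exclusive_group()
    blocking.add_argument("--block", dest="block", action="store_true")
    blocking.add_argument("--noblock", dest="block", action="store_false")
    parser.set_defaults(block=True)  # assumed default, not from the thread
    return parser


opts = build_parser().parse_args(["--queue", "step2_queue", "--noblock"])
```

Because every option here has a default, such a command could in principle run without a yaml spec, which is the open question raised above.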


koning commented on June 16, 2024

In the merlin world, the "run" argument is synonymous with "define tasks". Nothing runs at that point; tasks run only once workers are started.
So in your case, is
merlin run-one myworkflow.yaml
defining a single task from the yaml, or a single instance of the DAG in the yaml file (all steps for one sample)?

We would need to change the producer/consumer concept for merlin run-one to actually run a step. Maybe it could also start local workers in that case.
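The two readings of "define one thing" in the question above can be sketched as follows. This assumes a simple linear chain of steps and represents a task as a (step name, sample index) pair; both the representation and the function names are hypothetical and do not reflect how Merlin stores tasks.

```python
from typing import List, Tuple

Task = Tuple[str, int]  # (step name, sample index), an assumed representation


def single_task(steps: List[str], sample: int = 0) -> List[Task]:
    """Reading 1: define only the first step, for one sample."""
    return [(steps[0], sample)]


def single_chain(steps: List[str], sample: int = 0) -> List[Task]:
    """Reading 2: define every step of the DAG for one sample, in order."""
    return [(step, sample) for step in steps]


steps = ["step_1", "step_2"]     # hypothetical linear workflow
one_task = single_task(steps)    # one task is queued
one_chain = single_chain(steps)  # one task per step is queued
```

In both readings the producer only defines tasks; nothing executes until a consumer (a worker) picks them up.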


jpbrodsky commented on June 16, 2024

We could call this concept merlin consume-one for now--versus the example at the top of this issue which looks like produce-one. It consumes one task from the queue associated with the provided yaml file. It should be as close as possible to merlin run-workers, except instead of continually consuming tasks it will stop after a single task is consumed.

Why would somebody want this? Suppose the DAG looks like:
Single setup -> many samples step 1 -> step 2 for each sample.

I might want to run the setup step and check that things look correct.

Or, perhaps I've finished all the step 1s, but I haven't started the step 2s (because step 2 uses a different machine, and I haven't launched any workers on that machine). Maybe I'd like to run a step 2 individually to check if it's working before spinning up a whole bunch of workers.

Or perhaps I come back after a couple of days to see that my batch sessions have timed out and there are still a few step 1s left in the queue. I could run a single one of those immediately to see what the issue is.
