Comments (14)
Deprecated support for executable containers: #529
from nextflow.
+1
from nextflow.
I want to implement this feature request, however I'm still wondering what's the best way to handle it.
The main problem is that a Nextflow task is supposed to execute an arbitrary piece of Bash script (or any other interpreter specified by the command's shebang declaration). For this reason the command is wrapped into a script file in order to execute it.
Moreover, I support the idea that pipeline tasks should be able to run natively or with Docker containers in a transparent manner, without having to modify the task code but simply by modifying the configuration file.
An executable container breaks this approach because it runs its own predefined command, thus it cannot be used to execute a script. However, I agree that Nextflow should be able to manage them.
My proposal is that by default Nextflow sets the option --entrypoint=/bin/bash
on the Docker run command, so that the container can be executed independently of the image's ENTRYPOINT
definition.
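To make this part of the proposal concrete, here is a rough sketch of the kind of command line the default entrypoint override could produce. The image name, the mount/workdir flags, and the `.command.sh` wrapper name are illustrative assumptions, not a description of Nextflow's actual internals:

```shell
# Sketch: override the image ENTRYPOINT so any image can run the task script.
# 'repo/name' and '.command.sh' are illustrative placeholders.
IMAGE='repo/name'
TASK_SCRIPT='.command.sh'

# Build the docker run command line; --entrypoint forces /bin/bash
# regardless of what ENTRYPOINT the image declares. The -v/-w flags
# stand in for the mounts Nextflow would add.
CMD="docker run --entrypoint /bin/bash -v \$PWD:\$PWD -w \$PWD $IMAGE $TASK_SCRIPT"

echo "$CMD"
```

With the entrypoint forced to /bin/bash, the task script is passed as the container's command, so the image's own ENTRYPOINT never interferes.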
In order to support the usage of executable containers in Nextflow tasks, it could be possible to add a configuration option, let's call it executable
(or inline
?). When it is set to true,
Nextflow won't override the image entrypoint, and the command will be appended to the Docker run command line instead of being wrapped into a script file. For example:
process foo {
    container 'repo/name'
    executable true

    """
    <command line to append to container run>
    """
}
There still remains the problem of how to interpret a multiline string in this case. One option would be to raise an error; another possibility would be to launch a Docker run for each line in the command, which would be interpreted as multiple container executions.
Thoughts?
from nextflow.
Do you think we can make running an executable Docker image a special case of running any single executable? There's nothing special about Docker images per se - I think maybe the "feature" here would be the ability to run something without going through bash.
I don't know, maybe there's some additional edge case where non-Docker users also want the ability to avoid potential vulnerabilities like Shellshock or something.
from nextflow.
Docker aside, I would like to share my thoughts on multiline scripts. Frankly speaking, I am not a fan of supporting multiline scripts, since they bring business logic into the process and pipeline level, which should be avoided. I believe doing one thing in each process is a better and more graceful design philosophy. Users can encapsulate those steps in a bash script and put it into the pipeline script as a one-line command.
Back to the Docker issue, I would suggest staying compatible with common Docker images. It means we should not force Docker image developers to follow specific rules to make their images adaptable to Nextflow. Keeping a specific version of a Docker image makes no sense. If a bash script is unavoidable, is there any solution to add it at runtime to the Docker image and leave the original image untouched after the process is finished? This could be a Docker adapter in some sense. This is IMHO.
from nextflow.
In principle I may agree with you about multiline commands. However, with Nextflow we didn't want to have a strict approach; the choice is left to the developer to adopt the approach that works better for them.
Regarding the Docker issue, my proposal is to override by default the image entry point (if any) at launch time, adding the --entrypoint /bin/bash
option to the run command line. In this way any Docker image can be used to run a Nextflow script.
In the case a user wants to use an executable image, I'm suggesting to add a directive executable true
in the process definition. When doing this, the process command string will be appended to the Docker run command (instead of creating a bash script).
from nextflow.
Do you think it's worthwhile to treat running an executable Docker container as just a special case of running a single executable instead of a bash script? It seems like it might be cleaner that way.
from nextflow.
@taion I think that sounds like the most appropriate way. Then one has the choice of using Docker as a generic executor directive like now, or can just call an executable docker container like any other executable. In fact, is there anything preventing one from doing just that right now without any changes? I haven't tried executable docker images myself yet, but I imagine the only possible obstacle here would be mounting the working directory into the docker container.
from nextflow.
Sure - since the current executor runs a bash script, you can do whatever you want in the script itself. It might be nice to have a specific concept that is more specialized than that, though.
from nextflow.
Right. I just looked into it a bit more. The only real inconvenience seems to be mounting into the container. Much depends on how the image was actually built, and on somehow anticipating where a host path should be mounted inside the container. It's not clear to me how that could be done without requiring the container's executing process to somehow be aware of where that host mount point will be made.
What would be really nice is if there were a way to automatically handle CLI-passed file inputs into the executable container, i.e.:
docker run myexecutable /local/path/to/inputfile
# or
docker run myexecutable ./inputfile
And have any output files generated inside the container end up in the host cwd, maybe with some kind of wrapper. That's getting outside the scope of both Docker and Nextflow, though.
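One possible shape of such a wrapper, sketched here as plain shell: mount the host working directory at a fixed location in the container and rewrite the host input path to its container-side equivalent. The `myexecutable` image name and the `/data` mount point are illustrative assumptions, not an actual convention:

```shell
# Sketch of a wrapper around an executable image: mount the host cwd at a
# fixed location in the container and rewrite the input path accordingly.
# 'myexecutable' and '/data' are illustrative placeholders.
HOST_INPUT="$PWD/inputfile"
MOUNT_POINT='/data'

# Rewrite the host path into the container path under the mount point.
CONTAINER_INPUT="$MOUNT_POINT/$(basename "$HOST_INPUT")"

CMD="docker run -v \$PWD:$MOUNT_POINT -w $MOUNT_POINT myexecutable $CONTAINER_INPUT"
echo "$CMD"
```

Because the host cwd is mounted and used as the working directory, outputs written to the container's cwd land back on the host automatically.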
I suppose the simplest solution would be to allow the Docker executor to distinguish between a user-defined process and a standalone executable container in the pipeline process definition.
process foobar {
    container 'my/nonexecutable'

    input:
    file in

    output:
    file out

    """
    script.sh $in $out
    """
}
vs
process foobar {
    container 'my/executable'

    input:
    file("/path/to/input/in/container") from inputs

    output:
    file("/path/to/output/in/container") into outputs

    // need to specify the arguments following 'docker run my/executable ...'
    container.exec("--input /path/to/input/in/container --output /path/to/output/in/container")
}
from nextflow.
Looking at that... it doesn't really simplify things any more than just:
process foobar {
    input:
    file in from inputs

    output:
    file output into outputs

    """
    docker run \
        -v $in:/path/to/input/in/container \
        -v $out:/path/to/output/in/container \
        my/executable \
        --input /path/to/input/in/container \
        --output /path/to/output/in/container
    """
}
from nextflow.
Fair enough. I do feel like it would be ideal not to have to specify some of those things, like the volume mappings, as currently set up, but I agree it's not that big a deal in practice.
from nextflow.
Nextflow already automatically mounts the required input/output paths when it executes a process in a Docker container, so it will do the same for an executable container. It won't be necessary to explicitly specify the mounts.
Indeed, I'm struggling to find a coherent and elegant syntax to handle this use case. I no longer like the first proposal I made at the top of this thread.
I am more inclined to one of the following solutions. The first is similar to the one sketched by @andrewcstewart, and consists of using a kind of template method as shown below:
process foo {
    input:
    file x

    script:
    dockerRun "my/container", "--any --other --param"
}
Basically, dockerRun
is a kind of template method that will return the required docker run command line to be included in the bash wrapper. The container
directive isn't required anymore, and the container can be parametrised with a generic input or params value.
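A shell analogue may help illustrate the idea: a function that takes the image name and its arguments and emits the docker run command line destined for the bash wrapper. The mount flags are an assumption standing in for whatever mounts Nextflow would add, and this dockerRun is a sketch of the proposal, not an existing API:

```shell
# Sketch: what the proposed (hypothetical) dockerRun template method might
# expand to inside the generated bash wrapper.
dockerRun() {
  local image="$1"; shift
  # Nextflow would add the required volume mounts automatically;
  # '-v $PWD:$PWD -w $PWD' here is a stand-in for that behaviour.
  echo "docker run -v \$PWD:\$PWD -w \$PWD $image $*"
}

CMD=$(dockerRun "my/container" --any --other --param)
echo "$CMD"
```

The process script would then contain only the call site, with the image name and parameters freely computable from inputs or params.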
A second solution proposed by @emi80 looks like the following:
process foo {
container true
input:
file x
script:
"""
my/container --any --other --param
"""
}
In this case the idea is to use a special value for the container
directive (for example just the boolean true
) in order to instruct the framework to manage the process as the run of an executable container, whose name (along with the parameters) is specified in the script string.
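Under this reading of the proposal, the framework would take the first token of the script string as the image name and the rest as the container's arguments. This parsing is an assumption about the intended semantics, sketched in shell:

```shell
# Sketch: with 'container true', the script string's first token could be
# taken as the image name and the remainder as its arguments
# (an assumption about the semantics, not the actual implementation).
SCRIPT='my/container --any --other --param'

IMAGE=${SCRIPT%% *}   # first token -> image name
ARGS=${SCRIPT#* }     # remainder   -> container arguments

CMD="docker run $IMAGE $ARGS"
echo "$CMD"
```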
from nextflow.
This feature has been included in version 0.13.4:
http://www.nextflow.io/docs/latest/docker.html#executable-containers
from nextflow.