Comments (14)
Deprecated support for executable containers: #529
from nextflow.
+1
from nextflow.
I want to implement this feature request, however I'm still wondering what's the best way to handle it.
The main problem is that a Nextflow task is supposed to execute an arbitrary piece of Bash script (or any other interpreter specified by the command's shebang declaration). For this reason the command is wrapped into a script file in order to execute it.
Moreover, I support the idea that pipeline tasks should be able to run natively or with Docker containers in a transparent manner, without having to modify the task code but simply by modifying the configuration file.
An executable container breaks this approach because it runs its own predefined command, thus it cannot be used to execute a script. However, I agree that Nextflow should be able to manage them.
My proposal is that by default Nextflow sets the option --entrypoint=/bin/bash
on the Docker run command, so that the container can be executed independently of the image's ENTRYPOINT
definition.
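To make this part of the proposal concrete, here is a rough sketch of the kind of command line the default entrypoint override could produce. The image name, the mount/workdir flags, and the `.command.sh` wrapper name are illustrative assumptions, not a description of Nextflow's actual internals:

```shell
# Sketch: override the image ENTRYPOINT so any image can run the task script.
# 'repo/name' and '.command.sh' are illustrative placeholders.
IMAGE='repo/name'
TASK_SCRIPT='.command.sh'

# Build the docker run command line; --entrypoint forces /bin/bash
# regardless of what ENTRYPOINT the image declares. The -v/-w flags
# stand in for the mounts Nextflow would add.
CMD="docker run --entrypoint /bin/bash -v \$PWD:\$PWD -w \$PWD $IMAGE $TASK_SCRIPT"

echo "$CMD"
```

With the entrypoint forced to /bin/bash, the task script is passed as the container's command, so the image's own ENTRYPOINT never interferes.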
In order to support the usage of executable containers in Nextflow tasks, it could be possible to add a configuration option, let's call it executable
(or inline
?). When it is set to true,
Nextflow won't override the image entrypoint, and the command will be appended to the Docker run command line instead of being wrapped into a script file. For example:
process foo {
    container 'repo/name'
    executable true

    """
    <command line to append to container run>
    """
}
There still remains the problem of how to interpret a multiline string in this case. One option would be to raise an error; another possibility would be to launch a Docker run for each line in the command, which would be interpreted as multiple container executions.
Thoughts?
from nextflow.
Do you think we can make running an executable Docker image a special case of running any single executable? There's nothing special about Docker images per se - I think maybe the "feature" here would be the ability to run something without going through bash.
I don't know, maybe there's some additional edge case where non-Docker users also want the ability to avoid potential vulnerabilities like Shellshock or something.
from nextflow.
Docker aside, I would like to share my thoughts on multiline scripts. Frankly speaking, I am not a fan of supporting multiline scripts, since they bring business logic into the process and pipeline level, which should be avoided. I believe doing one thing in each process is a better and more graceful design philosophy. Users can encapsulate those steps in a bash script and put it into the pipeline script as a one-line command.
Back to the Docker issue, I would suggest staying compatible with common Docker images. It means we should not force Docker image developers to follow specific rules to make their images adaptable to Nextflow. Keeping a specific version of a Docker image makes no sense. If a bash script is unavoidable, is there any solution to add it at runtime to the Docker image and leave the original image untouched after the process is finished? This could be a Docker adapter in some sense. This is IMHO.
from nextflow.
In principle I may agree with you about multiline commands. However, with Nextflow we didn't want to have a strict approach; the choice is left to the developer to adopt the approach that works better for them.
Regarding the Docker issue, my proposal is to override by default the image entry point (if any) at launch time, adding the --entrypoint /bin/bash
option to the run command line. In this way any Docker image can be used to run a Nextflow script.
In the case a user wants to use an executable image, I'm suggesting to add a directive executable true
in the process definition. When doing this, the process command string will be appended to the Docker run command (instead of creating a bash script).
from nextflow.
Do you think it's worthwhile to treat running an executable Docker container as just a special case of running a single executable instead of a bash script? It seems like it might be cleaner that way.
from nextflow.
@taion I think that sounds like the most appropriate way. Then one has the choice of using Docker as a generic executor directive like now, or can just call an executable docker container like any other executable. In fact, is there anything preventing one from doing just that right now without any changes? I haven't tried executable docker images myself yet, but I imagine the only possible obstacle here would be mounting the working directory into the docker container.
from nextflow.
Sure - since the current executor runs a bash script, you can do whatever you want in the script itself. It might be nice to have a specific concept that is more specialized than that, though.
from nextflow.
Right. I just looked into it a bit more. The only real inconvenience seems to be mounting into the container. Much depends on how the image was actually built, and on somehow anticipating where a host path should be mounted inside the container. It's not clear to me how that could be done without requiring the container's executing process to somehow be aware of where that host mount point will be made.
What would be really nice is if there were a way to automatically handle CLI-passed file inputs into the executable container, i.e.:
docker run myexecutable /local/path/to/inputfile
# or
docker run myexecutable ./inputfile
And have any output files generated inside the container end up in the host cwd, maybe with some kind of wrapper. That's getting outside the scope of both Docker and Nextflow, though.
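One possible shape of such a wrapper, sketched here as plain shell: mount the host working directory at a fixed location in the container and rewrite the host input path to its container-side equivalent. The `myexecutable` image name and the `/data` mount point are illustrative assumptions, not an actual convention:

```shell
# Sketch of a wrapper around an executable image: mount the host cwd at a
# fixed location in the container and rewrite the input path accordingly.
# 'myexecutable' and '/data' are illustrative placeholders.
HOST_INPUT="$PWD/inputfile"
MOUNT_POINT='/data'

# Rewrite the host path into the container path under the mount point.
CONTAINER_INPUT="$MOUNT_POINT/$(basename "$HOST_INPUT")"

CMD="docker run -v \$PWD:$MOUNT_POINT -w $MOUNT_POINT myexecutable $CONTAINER_INPUT"
echo "$CMD"
```

Because the host cwd is mounted and used as the working directory, outputs written to the container's cwd land back on the host automatically.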
I suppose the simplest solution would be to allow the Docker executor to distinguish between a user-defined process and a standalone executable container in the pipeline process definition.
process foobar {
    container 'my/nonexecutable'

    input:
    file in

    output:
    file out

    """
    script.sh $in $out
    """
}
vs
process foobar {
    container 'my/executable'

    input:
    file("/path/to/input/in/container") from inputs

    output:
    file("/path/to/output/in/container") into outputs

    // need to specify the arguments following 'docker run my/executable ...'
    container.exec("--input /path/to/input/in/container --output /path/to/output/in/container")
}
from nextflow.
Looking at that... it doesn't really simplify things any more than just:
process foobar {
    input:
    file in from inputs

    output:
    file output into outputs

    """
    docker run \
        -v $in:/path/to/input/in/container \
        -v $out:/path/to/output/in/container \
        my/executable \
        --input /path/to/input/in/container \
        --output /path/to/output/in/container
    """
}
from nextflow.
Fair enough. I do feel like it would be ideal not to have to specify some of those things, like the volume mappings, as currently set up, but I agree it's not that big a deal in practice.
from nextflow.
Nextflow already automatically mounts the required input/output paths when it executes a process in a Docker container, so it will do the same for an executable container. It won't be necessary to explicitly specify the mounts.
Indeed, I'm struggling to find a coherent and elegant syntax to handle this use case. I no longer like the first proposal I made at the top of this thread.
I am more inclined to one of the following solutions. The first is similar to the one sketched by @andrewcstewart, and consists of using a kind of template method as shown below:
process foo {
    input:
    file x

    script:
    dockerRun "my/container", "--any --other --param"
}
Basically, dockerRun
is a kind of template method that will return the required docker run command line to be included in the bash wrapper. The container
directive isn't required anymore, and the container can be parametrised with a generic input or params value.
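A shell analogue may help illustrate the idea: a function that takes the image name and its arguments and emits the docker run command line destined for the bash wrapper. The mount flags are an assumption standing in for whatever mounts Nextflow would add, and this dockerRun is a sketch of the proposal, not an existing API:

```shell
# Sketch: what the proposed (hypothetical) dockerRun template method might
# expand to inside the generated bash wrapper.
dockerRun() {
  local image="$1"; shift
  # Nextflow would add the required volume mounts automatically;
  # '-v $PWD:$PWD -w $PWD' here is a stand-in for that behaviour.
  echo "docker run -v \$PWD:\$PWD -w \$PWD $image $*"
}

CMD=$(dockerRun "my/container" --any --other --param)
echo "$CMD"
```

The process script would then contain only the call site, with the image name and parameters freely computable from inputs or params.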
A second solution proposed by @emi80 looks like the following:
process foo {
container true
input:
file x
script:
"""
my/container --any --other --param
"""
}
In this case the idea is to use a special value for the container
directive (for example just the boolean true
) in order to instruct the framework to manage the process as the run of an executable container, whose name (along with the parameters) is specified in the script string.
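Under this reading of the proposal, the framework would take the first token of the script string as the image name and the rest as the container's arguments. This parsing is an assumption about the intended semantics, sketched in shell:

```shell
# Sketch: with 'container true', the script string's first token could be
# taken as the image name and the remainder as its arguments
# (an assumption about the semantics, not the actual implementation).
SCRIPT='my/container --any --other --param'

IMAGE=${SCRIPT%% *}   # first token -> image name
ARGS=${SCRIPT#* }     # remainder   -> container arguments

CMD="docker run $IMAGE $ARGS"
echo "$CMD"
```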
from nextflow.
This feature has been included in version 0.13.4:
http://www.nextflow.io/docs/latest/docker.html#executable-containers
from nextflow.