yuch7 / cwlexec
A new open source tool to run CWL workflows on LSF
License: Other
#37 seems to be resolved now with my minimal example, but when I ran my larger workflow again it still failed. After dissecting each piece, I found that an unrelated parameter seems to be causing this same error.
When I change the example CLT/WF from #37 to allow the CLT to have an optional boolean, and then edit the WF to interpret the boolean as z: {valueFrom: $(true)}, I get the same std error I did in #37. However, the new z parameter is (I think) completely unrelated to the parameter involved with the std error.
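For concreteness, here is a minimal sketch of the kind of change described above (tool and parameter names are hypothetical; the actual files are the #37 examples):

```yaml
# tool.cwl (sketch): a CLT with an optional boolean input
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:
  z:
    type: boolean?
    inputBinding: {prefix: -z}
outputs: []

# In the workflow, the step then supplies the boolean via an
# expression (requires StepInputExpressionRequirement):
#   in:
#     z: {valueFrom: $(true)}
```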
command (just the usual):
cwlexec -p --workdir /home/user/<username>/output/ TranscriptsAnnotation-i5only-wf.cwl TranscriptsAnnotation-i5only-wf.test.job.yaml
cwlexec fails at the scattered functionalAnalysis step and reports the following:
[15:49:15.857] INFO - The step (functionalAnalysis/runInterproscan) scatter of 1 jobs.
[15:49:15.857] INFO - Started job (functionalAnalysis/runInterproscan_1) with
bsub \
-cwd \
/home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/runInterproscan/scatter1 \
-o \
%J_out \
-e \
%J_err \
-env \
all,TMPDIR=/home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4 \
-R \
mem > 8192 \
-n \
3 \
/bin/sh -c 'interproscan.sh --outfile /home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/runInterproscan/transcript-01.p2_transcript-01.p2.i5_annotations --disable-precalc --goterms --pathways --tempdir /home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/runInterproscan --input /home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/splitSeqs/transcript-01.p2_transcript-01.p2.fasta --applications PfamA --formats TSV'
[15:49:15.877] INFO - Job (functionalAnalysis/runInterproscan_1) was submitted. Job <1421> is submitted to default queue <normal>.
[15:49:15.877] INFO - Started to wait for jobs by
bwait \
-w \
done(1421)
[15:50:07.854] INFO - Fill out the scatter gather result in the script /home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/runInterproscan/functionalAnalysis/runInterproscan
[15:50:07.855] ERROR - Failed to wait for job functionalAnalysis/runInterproscan <1415>, Failed to write file "/home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/runInterproscan/functionalAnalysis/runInterproscan": /home/user/maxim/output/a286910d-d3e3-41a4-b707-1e0a7654e4d4/functionalAnalysis/runInterproscan/functionalAnalysis/runInterproscan (No such file or directory)
[15:50:07.855] ERROR - The workflow (TranscriptsAnnotation-i5only-wf) exited with <255>.
[15:50:07.855] WARN - killing waiting job (functionalAnalysis/runInterproscan) <1415>.
[15:50:07.855] WARN - killing waiting job (functionalAnalysis/combineResults) <1418>.
We have a 3-step pipeline (map, foo, reduce) where map creates N files, foo transforms a file into another file, and reduce cats all files into one. When we scatter on foo with CWLEXEC, it only performs foo on one file and delivers it (with success) to reduce, even though a Java error is involved:
16:28:26.595 default [pool-4-thread-2] ERROR c.i.s.c.e.u.outputs.OutputsCapturer - Fail to write scatter values
java.nio.file.FileAlreadyExistsException: /home/jmichael/CWLEXEC/FailToWriteScatterValuesError/workdir/a5efe1f2-2a7d-42d9-906b-049c154ffff2/foo/1.foo.txt
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.createFile(Files.java:632)
at com.ibm.spectrumcomputing.cwl.exec.util.outputs.OutputsCapturer.writeScatterValues(OutputsCapturer.java:459)
at com.ibm.spectrumcomputing.cwl.exec.util.outputs.OutputsCapturer.findScatterOutputValue(OutputsCapturer.java:444)
at com.ibm.spectrumcomputing.cwl.exec.util.outputs.OutputsCapturer.findScatterOuputValue(OutputsCapturer.java:262)
at com.ibm.spectrumcomputing.cwl.exec.util.outputs.OutputsCapturer.captureCommandOutputsByType(OutputsCapturer.java:181)
at com.ibm.spectrumcomputing.cwl.exec.util.outputs.OutputsCapturer.captureCommandOutputs(OutputsCapturer.java:94)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.captureStepOutputs(LSFBwaitExecutorTask.java:373)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.makeStepSuccessful(LSFBwaitExecutorTask.java:142)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.waitSteps(LSFBwaitExecutorTask.java:132)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.run(LSFBwaitExecutorTask.java:97)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
cwlexec reports the following and exits:
The field [SchemaType] is required by [type].
when a type definition is imported like this:
cwlVersion: v1.0
class: CommandLineTool
requirements:
  - class: SchemaDefRequirement
    types:
      - $import: test_values.yaml
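The imported file usually defines one or more named types. A hypothetical test_values.yaml might look like the following (the actual attachment is not reproduced here):

```yaml
# test_values.yaml (hypothetical contents): a named enum type to import
- name: values
  type: enum
  symbols:
    - a
    - b
```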
here is a simple test case for reproduction:
test.cwl.txt
test.yaml.txt
test_values.yaml.txt
call:
$ cwlexec test.cwl test.yaml
Hi,
We have a simple tool (attached) that performs echo on an input and accepts either a File or a Directory. When we run it, it returns the error too many types for one paramter. I've tested this with an array of [File, string], [Directory, string], or [string, int] and it seems to work, but the combination of [File, Directory] throws this error.
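The failing declaration is essentially a union of File and Directory, along these lines (a sketch; the parameter name is assumed, the actual tool is attached to the issue):

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:
  target:
    # [File, string] works; [File, Directory] triggers the error
    type: [File, Directory]
    inputBinding: {position: 1}
outputs: []
```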
As far as I can tell from perusal of the cwlexec source and the description of its behavior in the README: cwlexec submits the user's jobs (bsub), dynamically monitors their dependency states (bwait), and actually starts the user's job once all dependencies are completed (bresume). (If I misunderstand, please correct me!)
LSF has built-in job dependency monitoring via bsub -w. Why does cwlexec dynamically monitor dependency states instead of offloading the job to LSF?
As a note, this would have the side effect of permitting reasoning about the CWL job from the LSF side using bjdepinfo, which might be useful in its own right. Unless bjdepinfo already tracks dependencies listed by bwait -- does it?
Hi,
We have a simple command line tool (attached) that takes a record type as input with two strings, one for a file name and one for a directory name. Using InitialWorkDirRequirement should set up the directory for use by the command, but it fails to evaluate the record in this context.
At line 202 in the attached outfile.txt:
09:23:29.529 default [main] DEBUG c.i.s.c.e.util.evaluator.JSEvaluator - Evaluated js expression "$(inputs.parameters.out_dir)" to A null object
However, it is able to parse the record properly for creating the base command, just not for the above step.
InitialWorkDirError.tar.gz
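A sketch of the shape being described (record field names are assumed from the log; the actual tool is in the attachment):

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: touch
requirements:
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      # create a directory named after the record field
      - entryname: $(inputs.parameters.out_dir)
        entry: "$({class: 'Directory', listing: []})"
        writable: true
inputs:
  parameters:
    type:
      type: record
      fields:
        out_file: string
        out_dir: string
outputs: []
```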
I am trying to scatter over 2 arrays of the same size using scatterMethod: flat_crossproduct, and CWLEXEC fails with:
com.ibm.spectrumcomputing.cwl.model.process.parameter.type.file.CWLFile cannot be cast to java.lang.CharSequence
This seems to happen when using any scatterMethod. A failed run with debug info is attached.
For tools which require a flag before each item in an array, we can use the methods described in the array-inputs tutorial. This works well for cwltool, but does not seem to get passed to the baseCommand in cwlexec.
Attached is an example command 'foo' which takes multiple --INPUT files and cats them all to a single --OUTPUT file. With cwltool it works, but with cwlexec it does not pass any of the --INPUT flags.
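The array-inputs tutorial pattern referred to places the inputBinding (with its prefix) on the array items, so the flag is repeated before each file. Roughly (a sketch; names follow the issue, not the attachment):

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: foo
inputs:
  input_files:
    type:
      type: array
      items: File
      inputBinding:
        prefix: --INPUT   # repeated before each array item
    inputBinding:
      position: 1
  output_file:
    type: string
    inputBinding:
      prefix: --OUTPUT
outputs: []
```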
Complete support for CWL 1.0 as seen by the CWL Conformance test suite for cwlexec - https://ci.commonwl.org/job/cwlexec/
The cwlexec command cannot be executed concurrently by one user, because cwlexec uses HyperSQL (a file-based database) to record workflow execution information (the db file is in $HOME/.cwlexec by default), and HyperSQL's file mode does not support concurrent writes to the database. More information can be found at: http://hsqldb.org/doc/2.0/guide/running-chapt.html#N100CF
Hello,
Could you say a little about how this works with jsrun? I am working on the Summit supercomputer at ORNL. Has anyone run this on Summit?
Thanks.
We have a simple workflow which uses baseCommand: [awk, '{print $2}'], and the interpreted command does not keep the quotes. Instead, it interprets the baseCommand as "baseCommand" : [ "awk", "{print $2}" ] (line 40 in the attached outfile.txt) and attempts to execute awk {print $2} (line 148), which fails.
BaseCommandError.tar.gz
Hi,
We have a simple example workflow (foo_wf.cwl) that takes an optional string array as input. It calls foo.cwl, which sets a default for the input array. This workflow works when given the string array as input, but when given no inputs it throws the following error in errfile.txt:
com.ibm.spectrumcomputing.cwl.model.process.parameter.type.NullValue cannot be cast to java.util.List
This only happens at the workflow level with the optional array input. I believe that, since it is optional for the workflow, it should return null as input to the step calling foo.cwl, which then uses the default array for the command line tool input because the input is null. This is the behavior for a non-array optional input.
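A sketch of the shapes involved (names follow the issue; the actual files are attached there):

```yaml
# foo_wf.cwl (sketch): workflow-level optional array input
inputs:
  arr: string[]?
steps:
  foo:
    run: foo.cwl
    in:
      arr: arr   # null when the workflow receives no input
    out: []

# foo.cwl (sketch): tool-level default expected to apply when input is null
inputs:
  arr:
    type: string[]?
    default: ["a", "b"]
```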
I have a simple workflow with 2 inputs: one is type: File, and the other is an array of files. I want to run a command in which each of the files in the array is used against the single input file from input1.
When I run cwltool, it performs as expected: it runs each file in the array against the single input1, in serial.
When I run cwlexec, it gives me no errors or messages, and the job immediately terminates.
the tool is:
cwlVersion: v1.0
class: CommandLineTool
hints:
  SoftwareRequirement:
    packages:
      bedtools:
        version: [ "2.25.0" ]
inputs:
  outputGenomeCov:
    type: File
    inputBinding:
      position: 1
      prefix: -a
  regionsBedFile:
    type: File
    inputBinding:
      position: 2
      prefix: -b
  allPositions:
    type: string
    default: "-c"
    inputBinding:
      position: 3
      prefix: -c
outputs:
  allDepthOutput:
    type: File
    outputBinding: {glob: $(inputs.regionsBedFile.basename)_AtoB.txt}
stdout: $(inputs.regionsBedFile.basename)_AtoB.txt
baseCommand: [bedtools, intersect]
And the workflow is:
cwlVersion: v1.0
class: Workflow
requirements:
  - class: ScatterFeatureRequirement
inputs:
  outputGenomeCov: File
  regionsBedFile: File[]
outputs:
  intersectAB:
    type: File[]
    outputSource: intersect/allDepthOutput
steps:
  intersect:
    run: 2_bedtoolsIntersect.cwl
    scatter: regionsBedFile
    in:
      outputGenomeCov: outputGenomeCov
      regionsBedFile: regionsBedFile
    out: [allDepthOutput]
The .yml file:
outputGenomeCov:
  class: File
  path: /path/to/input.txt
regionsBedFile:
  - {class: File, path: /path/to/bedfile1.bed}
  - {class: File, path: /path/to/bedfile2.bed}
  - {class: File, path: /path/to/bedfile3.bed}
I have another workflow with designated output files created by the program, but bedtools prints its outputs to stdout. The other workflow works well, but this one fails.
Thanks,
Dennis
Working on #34 again, and I can now reproduce the same error in my larger workflow.
Step 1 is split_reads, which now correctly scatters over the files after the proposed workaround.
Step 2 scatters over those files and tries to generate a string for output_file from one of the files generated as part of step 1. I am using the following JavaScript to generate this string, and it works in cwltool, so I had assumed it was the correct approach:
output_file:
  valueFrom: |
    ${
      var s = inputs.R1_file.nameroot;
      s = s.replace(".R1","");
      return s + ".out";
    }
However, the error I get with cwlexec is:
[var runtime={"tmpdir":"/home/jmichael/cwl-workdir/79cb5eaa-3438-497f-8be8-85fd9a5523c7","tmpdirSize":"15005232752754688","outdirSize":"15005232752754688","cores":"1","outdir":"/home/jmichael/cwl-workdir/79cb5eaa-3438-497f-8be8-85fd9a5523c7","ram":"1024"};, var inputs={"R
12:20:50.855 default [pool-5-thread-1] ERROR c.i.s.c.e.e.lsf.LSFBwaitExecutorTask - Failed to wait for job process_reads <66124407>, Failed to evaluate the expression "${
var s = inputs.R1_file.nameroot;
s = s.replace(".R1","");
return s + ".out";
}
": TypeError: Cannot read property "replace" from undefined in <eval> at line number 3
so it looks like it is not correctly using inputs.R1_file. Am I using the correct approach here? It appears to be the same general issue as in #34, but I don't know that I can use the same workaround, since I don't take anything in the inputs section in the scatter, so I can't use that as source.
cwlexec reports the following and exits:
The variable type of the field [type] is not valid, "a valid CWL type" is required.
when the input port is defined like this:
inputs:
  - id: applications
    type:
      type: array
      items:
        type: enum
        name: applications
        symbols:
          - PfamA
          - TIGRFAM
On the other hand, cwltool and cwl-runner accept those type definitions.
The Directory specification requires a basename attribute, but this is currently being evaluated as a null object by cwlexec, since it is not included in the fields.
The attached shows a simple example of attempting to evaluate a basename of a directory where all required fields except basename are included, so cwlexec fails:
09:50:13.340 default [pool-4-thread-1] DEBUG c.i.s.c.e.util.evaluator.JSEvaluator - Evaluate js expression "$(inputs.out_dir.basename)" with context
[var inputs={"sample":"MySample","out_dir":{"location":"/research/rgs01/home/clusterHome/kbrown1/DirectoryBasenameError/outdir/MySample","path":"/home/kbrown1/DirectoryBasenameError/workdir/d9623834-8552-4554-b8b5-c58184d22730/MySample","srcPath":"/research/rgs01/home/clusterHome/kbrown1/DirectoryBasenameError/outdir/MySample","listing":[],"class":"Directory"}};]
09:50:13.353 default [pool-4-thread-1] DEBUG c.i.s.c.e.util.evaluator.JSEvaluator - Evaluated js expression "$(inputs.out_dir.basename)" to A null object
09:50:13.353 default [pool-4-thread-1] ERROR c.i.s.c.e.e.lsf.LSFBwaitExecutorTask - Failed to wait for job touch_sample <42464255>, null
09:50:13.354 default [pool-4-thread-1] ERROR c.i.s.c.e.e.lsf.LSFBwaitExecutorTask - The exception stacks:
java.lang.NullPointerException: null
at java.lang.String.replace(String.java:2240)
at com.ibm.spectrumcomputing.cwl.exec.util.evaluator.JSEvaluator.parsePlaceholder(JSEvaluator.java:136)
at com.ibm.spectrumcomputing.cwl.exec.util.evaluator.JSEvaluator.parseExpr(JSEvaluator.java:171)
at com.ibm.spectrumcomputing.cwl.exec.util.evaluator.JSEvaluator.evaluate(JSEvaluator.java:56)
at com.ibm.spectrumcomputing.cwl.exec.util.evaluator.CommandOutputBindingEvaluator.evalGlob(CommandOutputBindingEvaluator.java:64)
at com.ibm.spectrumcomputing.cwl.exec.util.outputs.OutputsCapturer.captureCommandOutputs(OutputsCapturer.java:92)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.captureStepOutputs(LSFBwaitExecutorTask.java:376)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.makeStepSuccessful(LSFBwaitExecutorTask.java:140)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.waitSteps(LSFBwaitExecutorTask.java:133)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBwaitExecutorTask.run(LSFBwaitExecutorTask.java:98)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
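As a possible workaround sketch (assuming cwlexec honors a basename supplied explicitly in the job file, which is not verified), basename could be spelled out rather than derived from location:

```yaml
# job.yml (sketch): supply basename explicitly for the Directory input
sample: MySample
out_dir:
  class: Directory
  location: /path/to/MySample
  basename: MySample
```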
Hi,
We were testing the ResourceRequirement field in the CWL document and noticed that when using ramMin, the bsub command submits -R mem>coresMin.
It looks like the error is likely fixed simply by replacing coresMin with ramMin here: https://github.com/IBMSpectrumComputing/cwlexec/blob/e3c19121ac9ec8db24f09c542931345a43bb4ef0/src/main/java/com/ibm/spectrumcomputing/cwl/exec/service/CWLLSFCommandServiceImpl.java#L177
Attached is a test case. Even though ramMin is set to 100, it uses coresMin (either the given value, or null if it is not given, in which case it produces an error).
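The triggering requirement is of this shape (a sketch based on the test case described above):

```yaml
requirements:
  ResourceRequirement:
    # expected: bsub ... -R "mem > 100"
    # observed: the coresMin value (or null) is substituted instead
    ramMin: 100
```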
Following up on #38, I find that a new error is produced in my actual workflow, which I have now duplicated here. Specifically, it looks like the JS interpreter is passing values as [], and so a Java IndexOutOfBoundsException is thrown when trying to evaluate the contents of the array. See lines 1739 and 1742 below (contained in cwlexec.out):
1739 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=output_file, type=string, value=[]) for process_reads
1740 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=threads, type=null, value=2) for process_reads
1741 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=K, type=null, value=NULL) for process_reads
1742 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=Y, type=null, value=[]) for process_reads
1743 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=k, type=null, value=NULL) for process_reads
1744 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=M, type=null, value=NULL) for process_reads
1745 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=R, type=null, value=NULL) for process_reads
1746 12:41:59.517 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=I, type=null, value=NULL) for process_reads
1747 12:41:59.518 default [pool-5-thread-1] DEBUG c.i.s.c.e.util.command.CommandUtil - Prepare input (id=fastq, type=File, value=File:/home/jmichael/cwl-workdir/b388f356-940f-4daf-9a99-c722488fc0d7/split_reads/scatter1/input1_R1.fastq.gz) for process_reads
On June 7th we ran the CWL conformance tests against 4ea1396 and there were 20 failures (same as before)
Today we ran the CWL conformance tests against the latest code 023b1b5 and there are 48 failures (28 more)
https://ci.commonwl.org/job/cwlexec/96/console
Newly failed tests
I have a simple foo.sh script which is wrapped with foo.cwl. foo_wf.cwl is a workflow which scatters over foo.cwl. When specifying a queue in a config file, all scatter jobs correctly hit that queue, but the final Scatter gather job action is sent to my default queue, not the queue specified in my config file.
See attached for a fully reproducible example (aside from queues 'priority' and 'short' being specified).
Hi!
I ran into an issue when testing several subworkflows in CWLEXEC. My pipeline works fine with cwltool but fails in CWLEXEC.
The structure of pipeline is very simple:
step 1:
-- subworkflow 1:
------ copy file from input to another fille
step 2:
-- subworkflow 2:
------ grep the output of step 1 (by condition), output is stdout
------ copy result to another file
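A sketch of that structure as CWL (file names are hypothetical; the real scripts are in the attached archive):

```yaml
# test-pipeline.cwl (sketch): two steps, each running a subworkflow
cwlVersion: v1.0
class: Workflow
requirements:
  SubworkflowFeatureRequirement: {}
inputs:
  infile: File
outputs:
  result:
    type: File
    outputSource: step2/outfile
steps:
  step1:
    run: subwf1.cwl   # subworkflow 1: copy input to another file
    in: {infile: infile}
    out: [outfile]
  step2:
    run: subwf2.cwl   # subworkflow 2: grep step1's output, copy result
    in: {infile: step1/outfile}
    out: [outfile]
```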
The error is:
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 0.02 sec.
Max Memory : -
Average Memory : -
Total Requested Memory : -
Delta Memory : -
Max Swap : -
Max Processes : -
Max Threads : -
Run time : 7 sec.
Turnaround time : 1 sec.
The output (if any) is above this job summary.
[13:32:02.086] INFO - Fill out commands in the script <path>/step-wf-2/step-subwf-1/step-wf-2_step-subwf-1:
grep 2 <command>
[13:32:02.090] INFO - Resuming job (step-wf-2/step-subwf-1) <1896579> with
bresume \
1896579
[13:32:02.236] INFO - Started to wait for jobs by
bwait \
-w \
done(1896579)
[13:32:04.773] INFO - The job (step-wf-2/step-subwf-1) <1896579> is done with stdout from LSF:
------------------------------------------------------------
Job <<path>/step-wf-2/step-subwf-1/step-wf-2_step-subwf-1> was submitted from host <host> by user <user> in cluster <cluster> at Wed Sep 11 13:32:00 2019
Job was executed on host(s) <host>, in queue <queue>, as user <user> in cluster <cluster> at Wed Sep 11 13:32:03 2019
<dirr> was used as the home directory.
<path/step-wf-2/step-subwf-1> was used as the working directory.
Started at Wed Sep 11 13:32:03 2019
Terminated at Wed Sep 11 13:32:03 2019
Results reported at Wed Sep 11 13:32:03 2019
------------------------------------------------------------
# LSBATCH: User input
path/step-wf-2/step-subwf-1/step-wf-2_step-subwf-1
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 0.02 sec.
Max Memory : -
Average Memory : -
Total Requested Memory : -
Delta Memory : -
Max Swap : -
Max Processes : -
Max Threads : -
Run time : 2 sec.
Turnaround time : 3 sec.
The output (if any) is above this job summary.
[13:32:04.837] ERROR - Failed to wait for job step-wf-2/step-subwf-2 <1896578>, null
[13:32:04.837] ERROR - The workflow (test-pipeline) exited with <255>.
[13:32:04.837] WARN - killing waiting job (step-wf-2/step-subwf-2) <1896578>.
I didn't encounter this problem when running steps without subworkflows, but this case is very important for me because I use a similar structure with more complicated workflows and tools.
All scripts attached in archive.
for_issue.zip
Thank you!
Kate
We run into this issue whenever we try to execute workflows in CWLEXEC with more than 20 steps (21 steps, for instance). Also, sometimes CWLEXEC hangs after it has reported the error. Could it be that sessions are not closed properly after each database transaction? We created a simple test workflow so it is easy for you to reproduce.
Here is the workflow:
test-workflow.zip
This is the command we are running:
cwlexec -debug -L -p -w <work-dir> -o <output-dir> test-workflow.cwl
CWLEXEC reports the following and exits or sometimes just hangs:
17:14:04.579 default [pool-3-thread-20] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_20) was submitted. Job <6076314> is submitted to default queue <research-rh74>.
17:14:04.579 default [pool-3-thread-16] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_16) was submitted. Job <6076305> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-12] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_12) was submitted. Job <6076312> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-14] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_14) was submitted. Job <6076313> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-21] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_21) was submitted. Job <6076319> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-18] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_18) was submitted. Job <6076322> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-17] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_17) was submitted. Job <6076317> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-13] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_13) was submitted. Job <6076321> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-2] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_2) was submitted. Job <6076316> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-5] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_5) was submitted. Job <6076310> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-1] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_1) was submitted. Job <6076324> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-15] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_15) was submitted. Job <6076318> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-4] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_4) was submitted. Job <6076320> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-19] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_19) was submitted. Job <6076307> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-3] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_3) was submitted. Job <6076325> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-7] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_7) was submitted. Job <6076306> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-6] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_6) was submitted. Job <6076315> is submitted to default queue <research-rh74>.
17:14:04.580 default [pool-3-thread-11] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_11) was submitted. Job <6076309> is submitted to default queue <research-rh74>.
17:14:04.581 default [pool-3-thread-10] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_10) was submitted. Job <6076311> is submitted to default queue <research-rh74>.
17:14:04.581 default [pool-3-thread-9] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_9) was submitted. Job <6076323> is submitted to default queue <research-rh74>.
17:14:04.584 default [pool-3-thread-8] INFO c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Job (touch_8) was submitted. Job <6076308> is submitted to default queue <research-rh74>.
17:14:04.737 default [pool-3-thread-6] ERROR c.i.s.c.e.e.lsf.LSFBsubExecutorTask - Failed to submit the step touch_6, The internal connection pool has reached its maximum size and no connection is currently available!
17:14:04.743 default [pool-3-thread-6] ERROR c.i.s.c.e.e.lsf.LSFBsubExecutorTask - The exception stacks:
org.hibernate.HibernateException: The internal connection pool has reached its maximum size and no connection is currently available!
at org.hibernate.engine.jdbc.connections.internal.PooledConnections.poll(PooledConnections.java:82)
at org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl.getConnection(DriverManagerConnectionProviderImpl.java:186)
at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:35)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:106)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:136)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getConnectionForTransactionManagement(LogicalConnectionManagedImpl.java:254)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.begin(LogicalConnectionManagedImpl.java:262)
at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.begin(JdbcResourceLocalTransactionCoordinatorImpl.java:214)
at org.hibernate.engine.transaction.internal.TransactionImpl.begin(TransactionImpl.java:56)
at org.hibernate.internal.AbstractSharedSessionContract.beginTransaction(AbstractSharedSessionContract.java:409)
at com.ibm.spectrumcomputing.cwl.exec.service.CWLInstanceService.updateCWLProcessInstance(CWLInstanceService.java:83)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBsubExecutorTask.runStep(LSFBsubExecutorTask.java:108)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFBsubExecutorTask.run(LSFBsubExecutorTask.java:56)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17:14:04.744 default [pool-3-thread-6] DEBUG c.i.s.c.e.e.lsf.LSFWorkflowRunner - broadcast event EXIT, touch_6
17:14:04.746 default [pool-3-thread-6] ERROR c.i.s.c.e.e.lsf.LSFWorkflowRunner - The workflow (test-wf) exited with <255>.
Here is a simple CWL script "int_to_array.cwl" to convert an int to an int array and a string array:
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: ExpressionTool
requirements:
inputs:
  number:
    type: int
    label: a positive integer
outputs:
  int_array:
    type: int[]
  str_array:
    type: string[]
expression: |
  ${ var s_arr = [], i_arr = [];
     for (var i = 0; i < inputs.number; i++) {
       s_arr.push('hello' + i + '.txt');
       i_arr.push(i);
     }
     return { "int_array": i_arr, "str_array": s_arr };
  }
This works with cwltool, but does not work with cwlexec-0.2.2:
$ cwltool int_to_array.cwl int_to_array.yml
/research/rgs01/project_space/yu3grp/software_JY/yu3grp/conda_env/yulab_env/bin/cwltool 1.0.20190228155703
Resolved 'int_to_array.cwl' to 'file:///research/rgs01/home/clusterHome/lding/develop/cwl/practices/expression/int_to_array.cwl'
{
"int_array": [
0,
1,
2,
3
],
"str_array": [
"hello0.txt",
"hello1.txt",
"hello2.txt",
"hello3.txt"
]
}
Final process status is success
$ cwlexec int_to_array.cwl int_to_array.yml
[17:24:24.592] INFO - Workflow ID: 20fed44a-9f27-4797-b886-28846559711f
[17:24:24.593] INFO - Name: int_to_array
[17:24:24.593] INFO - Description file path: /research/rgs01/home/clusterHome/lding/develop/cwl/practices/expression/int_to_array.cwl
[17:24:24.594] INFO - Input settings file path: /research/rgs01/home/clusterHome/lding/develop/cwl/practices/expression/int_to_array.yml
[17:24:24.594] INFO - Output directory: /home/lding/cwl-workdir/20fed44a-9f27-4797-b886-28846559711f
[17:24:24.594] INFO - Work directory: /home/lding/cwl-workdir/20fed44a-9f27-4797-b886-28846559711f
[17:24:24.594] INFO - Workflow "int_to_array" started to execute.
[17:24:24.871] INFO - Job (int_to_array) was submitted. Job <78813896> is submitted to queue .
[17:24:29.446] ERROR - Failed to wait for job int_to_array <78813896>, java.lang.String cannot be cast to java.lang.Long
[17:24:29.446] ERROR - The job (int_to_array) exited.
Hi,
I have a simple ExpressionTool (attached) I wanted to test with the recently added feature. It takes a directory as input and returns the directory's listing as a file array. However, when I run this I get an error that states:
09:54:43.736 default [pool-4-thread-1] ERROR c.i.s.c.e.e.lsf.LSFWorkflowRunner - Failed to capture output for job (directory_to_files): The file "/home/kbrown1/ExpressionToolDirectoryError/workdir/1bd21483-9530-48e9-bc2c-f7d129b428e9/2.tmp" cannot be accessed.
It also exits with exit code 0 instead of a non-zero exit code, and returns no output.
As far as I can tell, it seems to be an issue with the input directory listing. I tested swapping the listing attribute for basename to simply return a string of the directory's name, and this worked successfully. It seems to know what outputs to collect, it just can't actually collect them.
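A sketch of the ExpressionTool as described (identifiers assumed; the actual tool is attached to the issue):

```yaml
cwlVersion: v1.0
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
inputs:
  in_dir: Directory
outputs:
  out_files: File[]
expression: |
  ${ return { "out_files": inputs.in_dir.listing }; }
```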
A simple workflow which requires a string input runs even when no input is provided. cwltool fails to run on the same example, as expected.
[jmichael(BASH)@nodecn011]: cwlexec foo.cwl
[15:51:20.864] INFO - Workflow ID: ca52aa98-d283-4610-afde-b56a6b8e1ad9
[15:51:20.865] INFO - Name: foo
[15:51:20.865] INFO - Description file path: /research/rgs01/home/clusterHome/jmichael/cwlexec_bugs/non-optional-inputs/foo.cwl
[15:51:20.865] INFO - Output directory: /home/jmichael/cwl-workdir/ca52aa98-d283-4610-afde-b56a6b8e1ad9
[15:51:20.865] INFO - Work directory: /home/jmichael/cwl-workdir/ca52aa98-d283-4610-afde-b56a6b8e1ad9
[15:51:20.865] INFO - Workflow "foo" started to execute.
[15:51:20.870] INFO - Started job (foo) with
bsub \
-cwd \
/home/jmichael/cwl-workdir/ca52aa98-d283-4610-afde-b56a6b8e1ad9 \
-o \
%J_out \
-e \
%J_err \
-env \
TMPDIR=/home/jmichael/cwl-workdir/ca52aa98-d283-4610-afde-b56a6b8e1ad9 \
echo
[15:51:20.993] INFO - Job (foo) was submitted. Job <61886769> is submitted to queue <normal>.
[15:51:21.009] INFO - Started to wait for jobs by
bwait \
-w \
done(61886769)
[15:51:25.188] INFO - The job (foo) <61886769> is done with stdout from LSF:
{ }
[jmichael(BASH)@nodecn011]: echo $?
0
[jmichael(BASH)@nodecn011]: cat foo.cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:
foo:
type: string
inputBinding:
position: 1
outputs: []
[jmichael(BASH)@nodecn011]: cwltool foo.cwl
/hpcf/apps/python/install/3.5.2/bin/cwltool 1.0.20180525185854
Resolved 'foo.cwl' to 'file:///research/rgs01/home/clusterHome/jmichael/cwlexec_bugs/non-optional-inputs/foo.cwl'
usage: foo.cwl [-h] --foo FOO [job_order]
foo.cwl: error: argument --foo is required
[jmichael(BASH)@nodecn011]: cwlexec --version
0.2.0
[jmichael(BASH)@nodecn011]:
Test [89/128] Test file literal as input
Test failed: /home/jenkins/cwlexec-0.1/cwlexec --outdir=/tmp/tmpgf1k1pyg --quiet v1.0/cat3-tool.cwl v1.0/file-literal.yml
Test file literal as input
Returned non-zero
Failed to write file "/common-workflow-language-master/v1.0/v1.0/file1-78a66506": /common-workflow-language-master/v1.0/v1.0/file1-78a66506 (Permission denied)
Here the user doesn't have permissions to write to /common-workflow-language-master/v1.0/v1.0/
Currently only classes for CWL v1.2 have been generated, but I'd be happy to generate classes to parse and represent CWL v1.0 and v1.1 documents as well (that is not much work for me)
http://github.com/common-workflow-language/cwljava
Alternatively, we could add a helper method so that submitted documents would automatically be upgraded to the latest CWL version.
Not sure if I am doing anything wrong, but it works that way with the CWL tool description reference implementation. Any advice would be appreciated.
Here is the workflow and a YAML job description:
issue_43.zip
command:
$ unzip issue_43.zip
$ cd issue_43
$ cwlexec -X -p --workdir /home/user/output/ cmsearch-multimodel-wf.cwl jobs/cmsearch-multimodel-wf.test.job.yaml
cwlexec stops and reports the following:
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - The command input argument: 1000 for step cmsearch
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - The command input argument: 1000 for step cmsearch
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - The input (id=covariance_model_database, type=File, value=File:/home/user/maxim/output/cwlexec/7fd4f443-eab1-4fd5-b767-7a68add47c5d/cmsearch/tRNA5.c.cm) of step cmsearch
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - The command input argument: /home/user/maxim/output/cwlexec/7fd4f443-eab1-4fd5-b767-7a68add47c5d/cmsearch/tRNA5.c.cm for step cmsearch
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - The input (id=query_sequences, type=File, value=File:/home/user/maxim/output/cwlexec/7fd4f443-eab1-4fd5-b767-7a68add47c5d/cmsearch/mrum-genome.fa) of step cmsearch
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - The command input argument: /home/user/maxim/output/cwlexec/7fd4f443-eab1-4fd5-b767-7a68add47c5d/cmsearch/mrum-genome.fa for step cmsearch
16:13:44.227 default [main] DEBUG c.i.s.c.e.util.command.CommandUtil - Has Shell Command, build commands as:
[cmsearch, --tblout, mrum-genome.fa.cmsearch_matches.tbl, -o, mrum-genome.fa.cmsearch.out, --cpu, 1, --noali, --hmmonly, -Z, 1000, /home/user/maxim/output/cwlexec/7fd4f443-eab1-4fd5-b767-7a68add47c5d/cmsearch/tRNA5.c.cm, /home/user/maxim/output/cwlexec/7fd4f443-eab1-4fd5-b767-7a68add47c5d/cmsearch/mrum-genome.fa]
java.util.ArrayList cannot be cast to com.ibm.spectrumcomputing.cwl.model.process.parameter.type.file.CWLFile
16:13:44.235 default [Thread-3] DEBUG c.i.s.cwl.exec.CWLExec - Stop cwlexec...
16:13:44.236 default [Thread-3] DEBUG c.i.s.cwl.exec.CWLExec - cwlexec has been stopped
I have a CommandLineTool (foo.cwl), Workflow (inner.cwl) which calls foo.cwl, and another workflow (outter.cwl) which scatters over inner.cwl. Both foo.cwl and inner.cwl work as intended, but outter.cwl does not properly scatter over inner.cwl. That is, instead of scattering over the 'input_file' as it should, it simply calls 'foo' with all possible 'input_file's as input.
It should be:
foo --INPUT file1.txt
foo --INPUT file2.txt
but instead it invokes foo as:
foo --INPUT file1.txt file2.txt
This may be related to issue 20 which I saw was recently moved to 'enhancement' rather than 'known issue'. In either case I'm hoping we will be able to scatter over subworkflows in this way as it will be very useful for our pipelines.
In the attached 'BadScatter.tar.gz' you will see that '01_foo.sh' and '02_inner.sh' give the expected output, but '03_outer.sh' fails: the invocation is a single call to foo with all possible input files, though the intent is for them to be scattered over as described above.
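The outer workflow's scatter declaration follows the usual CWL pattern; a sketch assuming the step and field names from the description above (the attached files may differ):

```yaml
cwlVersion: v1.0
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
  SubworkflowFeatureRequirement: {}
inputs:
  input_files: File[]
outputs:
  output_files:
    type: File[]
    outputSource: inner/output_file
steps:
  inner:
    run: inner.cwl
    # Scatter over input_file: one invocation of inner.cwl per array element.
    scatter: input_file
    in:
      input_file: input_files
    out: [output_file]
```

With this declaration, each element of input_files should yield a separate invocation of inner.cwl, i.e. one foo --INPUT per file.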
We have a simple example where we are trying to concatenate english.txt with french.txt, german.txt, and spanish.txt via scatter, using $(inputs.other_file.nameroot) in the CommandLineTool being scattered. This works as desired in cwltool, but in CWLEXEC we get no output files and the following information in debug mode (found also in the attached 'outfile.txt'):
16:15:47.191 default [main] DEBUG c.i.s.c.e.util.evaluator.JSEvaluator - Evaluated js expression "$(inputs.other_file.nameroot)" to A null object
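The pattern in question looks roughly like the following; this is a sketch, not the attached tool, and the input names and the use of the expression in the stdout field are assumptions:

```yaml
cwlVersion: v1.0
class: CommandLineTool
requirements:
  InlineJavascriptRequirement: {}
baseCommand: cat
inputs:
  base_file:
    type: File
    inputBinding:
      position: 1
  other_file:
    type: File
    inputBinding:
      position: 2
# Name the concatenated output after the scattered file's nameroot,
# e.g. french.txt -> french.txt's nameroot "french" -> french.txt output name.
stdout: $(inputs.other_file.nameroot).txt
outputs:
  concatenated:
    type: stdout
```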
Hi,
We have a simple example that takes a string as input and outputs two files whose names are based on the input string. When using glob to find both files with the name patterns $(inputs.name)_1.txt and $(inputs.name)_2.txt, glob seems to interpret these as literal strings rather than evaluating $(inputs.name) in each case:
outfile.txt
182 "outputBinding" : {
183 "glob" : {
184 "patterns" : [ "$(inputs.name)_1.txt", "$(inputs.name)_2.txt" ],
then returns:
218 {
219 "out_file" : [ ]
220 }
If $(inputs.name) is changed to the exact string, it works, but the expressions should be evaluated before pattern matching.
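The relevant outputs section, reconstructed from the outfile.txt excerpt above, would be:

```yaml
outputs:
  out_file:
    type: File[]
    outputBinding:
      # Per the CWL v1.0 spec, each glob entry may be an expression,
      # which should be evaluated before pattern matching.
      glob:
        - $(inputs.name)_1.txt
        - $(inputs.name)_2.txt
```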
We would like to run containers safely in a multi-user LSF cluster. Docker has many security issues stemming from dockerd being run as root. Singularity is an alternative that gains more and more popularity in Science.
In the foreseeable future we need a way to execute Singularity containers (and thus, via Singularity's features, also Docker images!) in our multi-user LSF cluster. We therefore need a SingularityRequirement analogous to the DockerRequirement in e.g. cwltool.
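For comparison, container use is declared today via DockerRequirement; a SingularityRequirement would presumably carry an analogous image reference. The hint below is standard CWL v1.0, while the Singularity analogue remains hypothetical:

```yaml
hints:
  DockerRequirement:
    # Image reference; a Singularity analogue would point at a Singularity
    # image (or reuse a Docker reference, as Singularity can run Docker images).
    dockerPull: ubuntu:16.04
```

An alternative model is cwltool's --singularity mode, which satisfies DockerRequirement by running the image under Singularity rather than introducing a separate requirement.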
Hi,
I have a simple example (attached) where the input to a workflow is an optional string. The command line tool has a default value for this input. When I use no outputs or a fixed output file name (as in #19), it works. If I modify the command line tool to glob for $(inputs.foo) to return a file matching the input string, it evaluates the input to null and fails to build the command. It should evaluate the tool's default input when the optional workflow input is not given.
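A minimal sketch of the pattern (names and baseCommand assumed; not the attached files):

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: touch
inputs:
  foo:
    type: string
    # Tool-level default: should apply when the optional workflow input is omitted.
    default: result.txt
    inputBinding:
      position: 1
outputs:
  out:
    type: File
    outputBinding:
      # $(inputs.foo) should resolve to "result.txt" here when no value is passed in.
      glob: $(inputs.foo)
```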
When a step's input comes from a previous step's cwl.output.json (e.g. there are two steps, s1 and s2; the input of s2 comes from the output of s1, and s1 writes a cwl.output.json that includes the s1 input field), cwlexec fails to execute.
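For reference, cwl.output.json is a file a tool writes into its output directory to declare its outputs directly, bypassing outputBinding collection; a minimal example for a step s1 with a File output named out (names illustrative):

```json
{
  "out": {
    "class": "File",
    "path": "s1_result.txt"
  }
}
```

A downstream step's input bound to s1/out must then be resolved from this JSON rather than from a glob.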
the pre-exec script dockerOptions.sh
#!/bin/bash
for OPTION in $LSB_CONTAINER_OPTIONS
do
echo $OPTION
done
works fine with long options (--env=VAR=value) but fails with short options (-e VAR=value) because of the whitespace between option and value. Instead it should simply print out $LSB_CONTAINER_OPTIONS as-is:
#!/bin/bash
echo "$LSB_CONTAINER_OPTIONS"
Hi,
We have a simple example workflow that seems to pass array inputs to lower-level scripts without scattering them:
top_workflow.cwl calls -> subworkflow.cwl calls -> echocat.cwl calls -> echocat.sh, which takes 3 inputs (string, file, file).
subworkflow.cwl has just a single step, which takes a string input and a File[] input and passes them to the command line tool. This works fine with CWLEXEC. When I use top_workflow.cwl to scatter over an array of strings or an array of arrays of files, they do not get scattered but are instead passed directly to the command line tool, where the shell script fails because it cannot use them this way: the string array arrives as a single string, and the File array of arrays as a single array. Attached is the example; in the output.txt file at line 646 the command is built incorrectly.
Command used:
./cwlexec /home/johnsoni/Innovation-Pipeline/workflows/QC/qc_workflow_wo_waltz.cwl ~/Innovation-Pipeline/test/workflows/EZ_QC_test.yaml
Error: Could not find or load main class com.ibm.spectrumcomputing.cwl.Application
I've downloaded and extracted the 0.2.2 release; is there any advice on this error?
Currently, cwlexec copies files from the work directories to the output directory (here if I am correct).
If possible, avoid copying output files. These files can be huge (e.g. we usually have files of 100 GB, but they can be much bigger; this is common with human whole genome sequencing files) and copying is really a waste of space and time. While space may not be a problem, because copies can be deleted after processing, time is more of a problem on network-based storage with tight requirements for short processing times (e.g. for routine cancer diagnostics).
Alternatives are (at least on POSIX filesystems): symbolic links and hard links.
I am not sure what the standard says about it, but even if the standard says "do copy", for some of our workflows we'd rather drop CWL than accept copies.
It may be desirable to give the user the choice between copying, symlinking, or hardlinking. Replacing a symlink with the pointed-to file is only a small problem, so symlinking seems to be a reasonable default.
For both linking approaches, file ownership may be more of an issue, because the access rights are identical for all hard/soft links to the same data.
The fix for #36 resolved that issue but now I am running into an issue when I try to redirect stdout to a file within the CommandLineTool being scattered over. I have modified that example to show that the first step seems to succeed, but the second step fails with
Failed to evaluate the expression "$(inputs.output_file)" in the [std] field, string is required.
Hi,
We have a workflow which has two steps. Each step calls a subworkflow that calls a command line tool. When the steps/subworkflows are independent of each other, the run succeeds (two_input_workflow.cwl), but when one step relies on the output of another step (one_input_workflow.cwl), it fails to resolve the workflow, giving the following error:
09:18:12.103 default [pool-2-thread-1] ERROR c.i.s.c.e.e.CWLInstanceSchedulerTask - Fail to run one_input_workflow (Failed to resolve the step (flow2) dependencies.)
09:18:12.105 default [pool-2-thread-1] ERROR c.i.s.c.e.e.CWLInstanceSchedulerTask - The exception stacks:
com.ibm.spectrumcomputing.cwl.model.exception.CWLException: Failed to resolve the step (flow2) dependencies.
at com.ibm.spectrumcomputing.cwl.exec.util.CWLStepBindingResolver.resolveStepInput(CWLStepBindingResolver.java:178)
at com.ibm.spectrumcomputing.cwl.exec.util.CWLStepBindingResolver.resolveStepInput(CWLStepBindingResolver.java:142)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFWorkflowStepRunner.prepareStepCommand(LSFWorkflowStepRunner.java:158)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFWorkflowStepRunner.resovleExpectDependencies(LSFWorkflowStepRunner.java:111)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFWorkflowStepRunner.<init>(LSFWorkflowStepRunner.java:65)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFWorkflowRunner.addSteps(LSFWorkflowRunner.java:272)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFWorkflowRunner.<init>(LSFWorkflowRunner.java:99)
at com.ibm.spectrumcomputing.cwl.exec.executor.lsf.LSFWorkflowRunner.runner(LSFWorkflowRunner.java:92)
at com.ibm.spectrumcomputing.cwl.exec.executor.CWLInstanceSchedulerTask.schedule(CWLInstanceSchedulerTask.java:76)
at com.ibm.spectrumcomputing.cwl.exec.executor.CWLInstanceSchedulerTask.run(CWLInstanceSchedulerTask.java:62)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The full output (1inp.out) is attached along with both the failure and success examples. This seems to happen only with subworkflows, as far as we can tell.
After implementing the workaround in #34, I was able to get past that step and ran into a new problem at the next step in the workflow. In trying to rebuild a minimal example for this from scratch, I've run into new problems at the same step as #34 again.
My workflow is:
1. Split a bam file into R1/R2 fastq files (simulating this with a 'split_reads.sh' script so that my examples are not dependent on external software). This is accomplished with the split_reads.cwl CLT.
2. Scatter over multiple input files using scatter_split.cwl. I was hoping the workaround in #34 would let me get past this part.
3+) Continue simulating my real workflow to reproduce the issues I'm seeing.
Step 2, above, is where I ran into issues on #34. In rebuilding a different workflow, I am getting new issues. Specifically, my CLT and the scatter_split.cwl workflow both work in cwltool, but the scatter_split.cwl WF fails with CWLEXEC with the error Failed to bind value for [R1_file], The value cannot be found.
I've compared this workflow with the working flow from #34 and I think they are very similar, so I'm not sure why this one is failing. Is this an issue in CWLEXEC or my own code? The script 02_scatter_split_reads.sh in the attached example should reproduce this issue.
In an attempt to overcome #20 (using the same general example as in #33), I have moved the scatter down to the lowest level. However, I found that when building a filename with valueFrom inside the scatter, it returns undefined.
steps:
  foo:
    run: foo.cwl
    scatter: input_file
    in:
      input_file: input_files
      output_filename:
        valueFrom: ${return inputs.input_file.nameroot + ".out";}
    out: [output_file]
However, this is not reproducible in cwltool, where the filename is built correctly. This seems like a cwlexec-specific issue, but it could also be that I am not following best practices when building a string inside a scatter.
I have tried to set up an LSF config file for a workflow that looks like this:
{
  "queue": "standard",
  "steps": {
    "step1": {
      "rerunnable": false,
      "res_req": "rusage[mem=20000]",
      "num_processors": 4
    }
  }
}
And it fails, so I stepped into your code here:
https://github.com/IBMSpectrumComputing/cwlexec/blob/master/src/main/java/com/ibm/spectrumcomputing/cwl/model/conf/FlowExecConf.java
and found that you have nothing that handles the -n option for LSF, which would distribute a job across multiple processors.
I'm tagging this as a feature enhancement because without it, the only way to make a job distributable is to hard-code it in the CWL source, which we don't want backend users to have to do.
It seems like you could do it by adding:
private int processors;
...
public int getProcessors() {
    return processors;
}

/**
 * Sets the LSF num_processors (-n) option
 *
 * @param processors
 *     The LSF num_processors requirement option
 */
public void setProcessors(int processors) {
    this.processors = processors;
}
to the files FlowExecConf.java and StepExecConf.java.
I don't know if there is anything else that needs changing, but this seems like a great feature to add.
If an argument valueFrom expression is a code fragment, e.g.
arguments:
  - prefix: -c
    valueFrom: |
      import json
      fileString = []
      with open("$(inputs.inputFile.path)", "r") as inputFile:
          for line in inputFile:
              fileString.append(line)
      with open("cwl.output.json", "w") as output:
          json.dump({"fileString": fileString}, output)
cwlexec cannot evaluate it correctly.