
caper's Introduction


Introduction

Caper (Cromwell Assisted Pipeline ExecutoR) is a wrapper Python package for Cromwell. Caper wraps Cromwell to run pipelines on multiple platforms such as GCP (Google Cloud Platform), AWS (Amazon Web Services) and HPCs with SLURM, SGE, PBS/Torque or LSF. It provides an easier way of running Cromwell in server/run mode by automatically composing the necessary input files for Cromwell. Caper can run each task in a specified environment (Docker, Singularity or Conda).

Caper also automatically localizes all files (keeping their directory structure) defined in your input JSON and command line according to the specified backend. For example, if your chosen backend is GCP and files in your input JSON are on S3 buckets (or even URLs), then Caper automatically transfers those s3:// and http(s):// files to a specified gs:// bucket directory. Supported URIs are s3://, gs://, http(s):// and local absolute paths. You can use such URIs both in the CLI and in the input JSON. Private URIs are also accessible if you authenticate with cloud platform CLIs (gcloud auth, aws configure) or with ~/.netrc for URLs.
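A minimal sketch (pipeline name, file names and bucket paths are hypothetical): an input JSON can freely mix these URIs, and Caper localizes them for the chosen backend.

$ cat input.json
{
    "my_pipeline.fastq": "s3://my-bucket/sample.fastq.gz",
    "my_pipeline.blacklist": "https://example.org/hg38_blacklist.bed.gz",
    "my_pipeline.chrom_sizes": "/data/hg38/hg38.chrom.sizes"
}

# With the gcp backend, the s3:// and http(s):// files above are copied to
# a gs:// bucket directory before the workflow starts.
$ caper run my_pipeline.wdl -i input.json -b gcp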

Installation for Google Cloud Platform and AWS

See this for details.

Installation for AWS

See this for details.

Installation for local computers and HPCs

  1. Make sure that you have Java (>= 11), Python (>= 3.6) and pip installed on your system, then install Caper with pip:

    $ pip install caper
  2. If you see an error message like caper: command not found after installation, add the following line to the bottom of ~/.bashrc and re-login.

    export PATH=$PATH:~/.local/bin
  3. Choose a backend from the following table and initialize Caper. This will create a default Caper configuration file ~/.caper/default.conf, which has only the required parameters for each backend. caper init will also install the Cromwell/Womtool JARs under ~/.caper/. Downloading those files can take up to 10 minutes. Once they are installed, Caper can work completely offline with local data files.

    Backend  Description
    local    local computer without a cluster engine
    slurm    SLURM (e.g. Stanford Sherlock and SCG)
    sge      Sun GridEngine
    pbs      PBS cluster
    lsf      LSF cluster

    IMPORTANT: The sherlock and scg backends have been deprecated. Use the slurm backend instead and follow the instruction comments in the configuration file.

    $ caper init [BACKEND]
  4. Edit ~/.caper/default.conf and follow the instructions in it (see the example below). CAREFULLY READ THE INSTRUCTIONS AND DO NOT LEAVE IMPORTANT PARAMETERS UNDEFINED, OR CAPER WILL NOT WORK CORRECTLY.
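For example, a filled-out configuration file for the slurm backend looks roughly like the sketch below (partition/account names are hypothetical, and the other parameters written by caper init are omitted):

$ cat ~/.caper/default.conf
backend=slurm

# define one of the followings (or both) according to your
# cluster's SLURM configuration.
slurm-partition=my_partition
slurm-account=my_account

cromwell=/home/user/.caper/cromwell_jar/cromwell-59.jar
womtool=/home/user/.caper/womtool_jar/womtool-59.jar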

Docker, Singularity and Conda

For local backends (local, slurm, sge, pbs and lsf), you can use --docker, --singularity or --conda to run a pipeline's WDL tasks within one of these environments. For example, caper run ... --singularity docker://ubuntu:latest will run each task within a Singularity image built from the Docker image ubuntu:latest. These parameters can also be used as flags. If used as a flag, Caper will try to find a default docker/singularity/conda environment in the WDL; e.g. all ENCODE pipelines have default docker/singularity images defined in the WDL's meta section (under the key caper_docker or default_docker).
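For illustration, such a default image declaration in a WDL's meta section looks like the sketch below (the workflow name is hypothetical; the key name caper_docker comes from the paragraph above, and the image is the ENCODE ATAC-seq pipeline image mentioned later on this page):

$ cat my_pipeline.wdl
version 1.0

workflow my_pipeline {
	meta {
		caper_docker: "encodedcc/atac-seq-pipeline:v1.9.0"
	}
	...
}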

IMPORTANT: Docker/Singularity/Conda environments defined in Caper's configuration file or on the CLI (--docker, --singularity and --conda) will be overridden by those defined in a WDL task's runtime. We provide these parameters to define a default/base environment for a pipeline, not to override a WDL task's runtime.

For Conda users, make sure that you have installed the pipeline's Conda environments before running pipelines. Caper only knows the Conda environment's name. You don't need to activate any Conda environment before running a pipeline, since Caper internally runs conda run -n ENV_NAME TASK_SHELL_SCRIPT for each task.
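As a sketch for the ENCODE ATAC-seq pipeline mentioned later on this page (the installer script path and environment name are specific to that pipeline):

$ bash scripts/install_conda_env.sh    # install the pipeline's Conda environments once
$ caper run atac.wdl -i input.json --conda encode-atac-seq-pipeline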

Take a look at the following examples:

$ caper run test.wdl --docker # can be used as a flag too, Caper will find a default docker image in WDL if defined
$ caper run test.wdl --singularity docker://ubuntu:latest # define default singularity image in the command line
$ caper hpc submit test.wdl --singularity --leader-job-name test1 # submit to job engine and use singularity defined in WDL
$ caper submit test.wdl --conda your_conda_env_name # running caper server is required

An environment defined here will be overridden by those defined in a WDL task's runtime. Therefore, think of this as a base/default environment for your pipeline. You can define per-task docker or singularity images to override those defined on Caper's command line. For example:

task my_task {
	...
	runtime {
		docker: "ubuntu:latest"
		singularity: "docker://ubuntu:latest"
	}
}

For cloud backends (gcp and aws), Caper will automatically try to find a base docker image defined in your WDL. For other pipelines, define a base docker image in Caper's CLI or directly in each WDL task's runtime.

Running pipelines on HPCs

Use --singularity or --conda in the CLI to run a pipeline inside a Singularity image or a Conda environment; most HPCs do not allow Docker. For example, caper hpc submit ... --singularity will submit the Caper process to the job engine as a leader job. Caper's leader job will then submit its child jobs to the job engine, so both the leader and child jobs can be found with squeue or qstat.

Use caper hpc list to list all leader jobs. Use caper hpc abort JOB_ID to abort a running leader job. DO NOT DIRECTLY CANCEL A LEADER JOB WITH A CLUSTER COMMAND LIKE SCANCEL OR QDEL; otherwise only your leader job will be canceled and its child jobs will keep running.

Here are some example command lines to submit Caper as a leader job. Make sure that you correctly configured Caper with caper init and filled all parameters in the conf file ~/.caper/default.conf.

There is an extra parameter, --file-db [METADATA_DB_PATH_FOR_CALL_CACHING], to use call-caching (restarting workflows by re-using previous outputs). If you want to restart a failed workflow, use the same metadata DB path and the pipeline will start from where it left off. It will actually start over but will re-use (soft-link) previous outputs.

# make a new output directory for a workflow.
$ cd [OUTPUT_DIR]

# Example with Singularity without using call-caching.
$ caper hpc submit [WDL] -i [INPUT_JSON] --singularity --leader-job-name GOOD_NAME1

# Example with Conda and using call-caching (restarting a workflow from where it left off)
# Use the same --file-db PATH for next re-run then Caper will collect and softlink previous outputs.
# If you see any DB connection error then replace it with "--db in-memory", but then call-caching will be disabled.
$ caper hpc submit [WDL] -i [INPUT_JSON] --conda --leader-job-name GOOD_NAME2 --file-db [METADATA_DB_PATH]

# List all leader jobs.
$ caper hpc list

# Check leader job's STDOUT file to monitor workflow's status.
# Example for SLURM
$ tail -f slurm-[JOB_ID].out

# Cromwell's log will be written to cromwell.out* in the same directory.
# It is helpful for monitoring your workflow in detail.
$ ls -l cromwell.out*

# Abort a leader job (this will cascade-kill all its child jobs)
# If you directly use a job engine command like scancel or qdel, child jobs will remain running.
$ caper hpc abort [JOB_ID]

Restarting a pipeline on local machine (and HPCs)

Caper uses Cromwell's call-caching to restart a pipeline from where it left off. The call-caching database is automatically created in local_out_dir, which is defined in the configuration file ~/.caper/default.conf. The DB file name simply consists of the WDL's basename and the input JSON file's basename, so you can run the same caper run command line in the same working directory to restart a workflow.
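For example (WDL/JSON names are hypothetical):

# First attempt fails somewhere in the middle.
$ cd [OUTPUT_DIR]
$ caper run atac.wdl -i input.json

# Re-running the exact same command in the same directory picks up the
# automatically generated file DB, so previously finished tasks are
# call-cached (soft-linked) instead of being re-computed.
$ caper run atac.wdl -i input.json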

To disable Cromwell's metadata DB entirely (no call-caching; see "DB connection timeout" below):

# for standalone/client
$ caper run ... --db in-memory

# for server
$ caper server ... --db in-memory

DB connection timeout

If you see a DB connection timeout error, it means that multiple Caper/Cromwell processes are trying to connect to the same file DB. Check for running Cromwell processes with ps aux | grep cromwell and close them with kill PID. If that does not fix the problem, use caper run ... --db in-memory to disable Cromwell's metadata DB. You will not be able to use call-caching.
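The checks described above as a short sketch (the PID is a placeholder taken from the ps output):

# Find Caper/Cromwell processes competing for the same file DB.
$ ps aux | grep cromwell

# Close them by process ID.
$ kill [PID]

# Last resort: disable the metadata DB (call-caching will not work).
$ caper run test.wdl --db in-memory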

Customize resource parameters on HPCs

If Caper's default settings do not work with your HPC, see this document to manually customize the resource command line (e.g. sbatch ... [YOUR_CUSTOM_PARAMETER]) for your chosen backend.
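As a hedged sketch only (the key name slurm-resource-param and the template below are assumptions based on recent Caper versions; the document referenced above is authoritative for your backend and version), the per-task resource command line can be edited in the Caper configuration file:

$ cat ~/.caper/default.conf
...
# Cromwell fills in per-task values such as ${cpu} and ${memory_mb} here.
slurm-resource-param=-n 1 --ntasks-per-node=1 --cpus-per-task=${cpu} --mem=${memory_mb}M
...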

DETAILS

See details.

caper's People

Contributors

leepc12, ottojolanki, paul-sud, strattan


caper's Issues

Running caper without --ntasks-per-node

I am using caper to run the ATAC-seq pipeline on a slurm cluster. The partition on the cluster I am using does not allow me to specify --ntasks-per-node, but caper seems to add this as an option to the cromwell jobs it submits. Is there a way to stop caper from doing this?

Thanks so much!

Running pipeline on Stanford Sherlock

Hi,
After installing and initializing caper on the Sherlock platform, I would like to start running the pipeline on Sherlock using my raw sequencing data samples (fq files: cleaned and ready for alignment), which have already been transferred to Sherlock.
Is there any guide or instructions to follow if someone has used the pipeline on Sherlock before? This would be very helpful since I have not used it before.
Many thanks in advance

-bash: inspectXeption: line 0: syntax error near unexpected token `;'

Hi Jin,

Apart from my earlier question (#30) about how to deal with the issue that the backend version ran for 24 hrs before it crashed because of excessive job submission, I also found that even before that there are errors in the call-read_genome_tsv step:

$ find .|grep stderr|xargs ls -lS|head -n20
-rw------- 1 user group 1215 Nov  1 23:42 ./atac/21385ee3-a703-4a97-8a87-855e8941fa03/call-read_genome_tsv/execution/stderr
-rw-r--r-- 1 user group    0 Nov  2 01:23 ./atac/21385ee3-a703-4a97-8a87-855e8941fa03/call-align_mito/shard-0/execution/stderr.check
-rw-r--r-- 1 user group    0 Nov  1 23:37 ./atac/21385ee3-a703-4a97-8a87-855e8941fa03/call-align_mito/shard-0/execution/stderr.submit
-rw-r--r-- 1 user group    0 Nov  2 01:24 ./atac/21385ee3-a703-4a97-8a87-855e8941fa03/call-align_mito/shard-1/execution/stderr.check
-rw-r--r-- 1 user group    0 Nov  1 23:37 ./atac/21385ee3-a703-4a97-8a87-855e8941fa03/call-align_mito/shard-1/execution/stderr.submit
-rw-r--r-- 1 user group    0 Nov  2 01:25 ./atac/21385ee3-a703-4a97-8a87-855e8941fa03/call-align_mito/shard-2/execution/stderr.check
-rw-r--r-- 1 user group    0 Nov  1 23:37 ./atac/21385ee3-a703-4a97-8a87-855e8941fa03/call-align_mito/shard-2/execution/stderr.submit

The first stderr file above has non-zero size.
Below is the contents of the first stderr file above:

$ cat ./atac/21385ee3-a703-4a97-8a87-855e8941fa03/call-read_genome_tsv/execution/stderr
-bash: inspectXeption: line 0: syntax error near unexpected token `;'
-bash: inspectXeption: line 0: `inspectXeption () {  echo "***checking exceptions***"; for i in `ls *reeting*`; do;  cnt=`cat $i|grep ception|wc -l`; if [ $cnt -gt 0 ]; then;  echo $i '       ' $cnt; fi; done; echo ""; echo "***checking unfinished jobs***"; for i in `ls *reeting*`; do;  cnt=`cat $i|grep "Total time"|wc -l`; if [ $cnt -eq 0 ]; then;  echo $i '    ' $cnt; fi; done; }'
-bash: error importing function definition for `BASH_FUNC_inspectXeption'
-bash: waitqueue: line 0: syntax error near unexpected token `;'
-bash: waitqueue: line 0: `waitqueue () {  while true; do;  sleep 10; cnt1=`qstat -rn1|grep user|wc -l`; if [ $cnt1 -gt 0 ]; then;  echo $cnt1; else;  break; fi; done; }'
-bash: error importing function definition for `BASH_FUNC_waitqueue'
-bash: sedline: line 0: syntax error near unexpected token `;'
-bash: sedline: line 0: `sedline () {  for i in $(seq 1 10); do;  sed -i 's/  / /g' $1; done; sed -i 's/^ //g' $1; sed -i ':a;N;$!ba;s/\n /\n/g' $1; sed -i ':a;N;$!ba;s/ \n/\n/g' $1; sed -i 's/ /\t/g' $1; sed -i 's/\tCHR/CHR/g' $1; sed -i 's/^\t//g' $1; }'
-bash: error importing function definition for `BASH_FUNC_sedline'

This stderr was not found in my successfully finished run of your pipeline for individual bio-reps on local/non-backend CPUs.

What might be wrong? Is it a bug too?
Thanks.

Best
Chan

backend.conf parsing error

Hi,

I'm trying to run the ATAC-seq pipeline with a local backend.

conda activate encode-atac-seq-pipeline
caper init local
caper run /path/to/atac.wdl -i /path/to/json

This fails with the following error:

[...]
2020-04-12 17:27:13,140|caper|INFO| Validating WDL/input JSON with womtool...
Success!
2020-04-12 17:27:17,474|caper|INFO| cmd: ['java', '-Xmx3G', '-XX:ParallelGCThreads=1', '-DLOG_LEVEL=INFO', '-DLOG_MODE=standard', '-jar', '-Dconfig.file=/data1/dwesche/data/ATAC_Hox/encode/test/.caper_tmp/atac/20200412_172711_438485/backend.conf', '/home/newmanlab/dwesche/.caper/cromwell_jar/cromwell-47.jar', 'run', '/data1/dwesche/programs/atac-seq-pipeline/atac.wdl', '-i', '/data1/dwesche/data/ATAC_Hox/encode/test/.caper_tmp/data1/dwesche/data/ATAC_Hox/encode/scripts/test.local.json', '-o', '/data1/dwesche/data/ATAC_Hox/encode/test/.caper_tmp/atac/20200412_172711_438485/workflow_opts.json', '-l', '/data1/dwesche/data/ATAC_Hox/encode/test/.caper_tmp/atac/20200412_172711_438485/labels.json', '-m', '/data1/dwesche/data/ATAC_Hox/encode/test/.caper_tmp/atac/20200412_172711_438485/metadata.json']
Exception in thread "main" java.lang.ExceptionInInitializerError
	at cromwell.CromwellApp$.runCromwell(CromwellApp.scala:14)
[...]
Caused by: com.typesafe.config.ConfigExceptions$Parse: /data1/dwesche/data/ATAC_Hox/encode/test/.caper_tmp/atac/20200412_172711_438485/backend.conf: 62: 
Expecting a value but got wrong token: 'd' (backslash followed by 'd', this is not a valid escape
 sequence (quoted strings use JSON escaping, so use double backslash \\ for literal backslash))
[...]

When I check the automatically generated backend.conf, the referenced line says indeed
job-id-regex = "Submitted batch job (\d+).*"
Happens with both the test and my own .json

The sample backend.conf in the caper documentation has a double backslash there, so I tried editing the backend.conf and adding the missing backslash, then running
caper run /path/to/atac.wdl -i /path/to/json --backend-file backend.edited.conf
But that seems to create another backend.conf with the same missing \ and fails with the same error message.

Additional info:
conda=miniconda3/4.7.12
caper=0.8.2
pipeline=latest
default.conf:

backend=local
tmp-dir=
cromwell=/home/newmanlab/dwesche/.caper/cromwell_jar/cromwell-47.jar
womtool=/home/newmanlab/dwesche/.caper/womtool_jar/womtool-47.jar

Please advise how to fix this. Thanks!

I 'pip install caper', but cannot run '$caper'

Hi,

I follow your readme, and run this line on our server:
python3 -m pip install caper --user

Then I checked that caper can indeed be loaded:

$ python3
Python 3.6.4 (default, May 28 2019, 12:12:54)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import caper
>>>

but I cannot run caper at the shell prompt:

$ caper
bash: caper: command not found...

In your README landing page, you clearly say that installation is either by pip or by git clone. Why can't I run caper after I chose the first way, which is pip?

[The git clone method works for me, but using it to run your atac-pipeline produced tons of errors pointing to dependencies not being installed or loaded. See https://github.com/ENCODE-DCC/atac-seq-pipeline/issues/149. Thus I tried to see if a pip installation of caper would resolve the issue.]

We chose your software because we liked the big name 'ENCODE' on it. But your software is certainly not easy to use. Very painful sometimes. Can you please kindly help? Thanks!

caper Cannot Connect to Server

I am using caper to submit the latest ATAC-seq pipeline, and I am getting the following error:
[CaperURI] copying from url to local, src: https://github.com/broadinstitute/cromwell/releases/download/47/cromwell-47.jar
Traceback (most recent call last):
File "/home/ikaplow/miniconda3/envs/encode-atac-seq-pipeline/bin/caper", line 13, in
main()
File "/home/ikaplow/miniconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/caper.py", line 1267, in main
c.run()
File "/home/ikaplow/miniconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/caper.py", line 213, in run
self.__download_cromwell_jar(), 'run',
File "/home/ikaplow/miniconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/caper.py", line 591, in __download_cromwell_jar
return cu.copy(target_uri=path)
File "/home/ikaplow/miniconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/caper_uri.py", line 445, in copy
ignored_http_err=(416,))
File "/home/ikaplow/miniconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/caper_uri.py", line 988, in __curl_auto_auth
rc, http_err, stderr))
Exception: cURL RC: 7, HTTP_ERR: 0, STDERR: % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
^M 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Couldn't connect to server

I am running caper from a login node. Do you know how I might be able to solve this problem? Thanks!

slurmstepd: error: *** JOB CANCELLED DUE TO TIME LIMIT ***

Hello,

I am trying to run a WDL workflow using caper and the SLURM scheduler. I have noticed that my "children" jobs (named cromwell) stop after around 15 minutes, preventing the entire workflow from finishing all the processes. The main job keeps running until a point at which it stops, showing that the workflow failed. Why is this happening? Is this due to the time value in the backend.conf file? (time=24 in the content of the backend.conf, see below).

 slurm {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
	default-runtime-attributes {
          time = 24
          slurm_partition = "long"
          slurm_account = "jgu-cbdm"
        }
	script-epilogue = "sleep 10 && sync"
        concurrent-job-limit = 1000
        runtime-attributes = """
	String? docker
        String? docker_user
        Int cpu = 1
        Int? gpu
        Int? time
        Int? memory_mb
        String? slurm_partition
        String? slurm_account
        String? slurm_extra_param
        String? singularity
        String? singularity_bindpath
        String? singularity_cachedir

I tried to provide a different amount of time while running the workflow, such as adding the option -t with 120 hours as follows:

sbatch -p long -A jgu-cbdm -J test_caper_scMeth -t 120:00:00 --export=ALL --mem 16G --wrap "caper run scMeth.wdl -i scMeth.json"

but this did not help. How can I modify time=24? Is that 24 minutes or hours? If this is not the problem, how can I solve this issue, given that my jobs are killed after running for 15 minutes?

I have investigated what might be the problem consulting this post:

https://stackoverflow.com/questions/34653226/job-unexpectedly-cancelled-due-to-time-limit

Basically what I am noticing is that the jobs killed after 15 minutes are only the ones named "Cromwell". The main job is not killed. Is it possible that the "Cromwell" jobs run with other time parameters? Looking at the partitions of my cluster, the "long" partition only has 15 minutes by default in the DefaultTime parameter. This is what I see when I type:

scontrol show partitions long

I have this:

PartitionName=long
   AllowGroups=ALL DenyAccounts=none AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=long_limits
   DefaultTime=00:15:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=1 MaxTime=5-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=63
   Nodes=a[0004-0018,0022-0263,0265-0287,0289-0338,0340-0360,0362-0533,0535-0550,0552-0555]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=34752 TotalNodes=543 SelectTypeParameters=NONE
   DefMemPerCPU=300 MaxMemPerNode=UNLIMITED
   TRESBillingWeights=cpu=1.0,mem=0.25G

This should be the problem: jobs are run with the default parameters above. Any idea how to prevent this?

Thanks in advance for any reply
Tommaso

Fail to define tmp directory

Hi, Jin. Thank you for your previous help with the PBS cluster issue. Now I found that there's another problem regarding the temporary directory.

I edited the "tmp-dir =" option in the file ~/.caper/default.conf, but the large files are still stored in my home directory ~/atac.

Can you show the correct way to define a tmp directory?

Thank you!

Womtool validation doesn't raise an exception

Submission with CaperClientSubmit returns false when the Womtool validation fails. Is it possible to raise an error exception with a message indicating why the validation failed?

respect environment variables in conf file

$ caper --version
0.6.3

If I set my conf file like so, then caper uses the appropriate directory and all is right with the world.

$ cat ~/.caper/default.conf
backend=slurm

# define one of the followings (or both) according to your
# cluster's SLURM configuration.
slurm-partition=my_partition
slurm-account=

# DO NOT use /tmp here
# Caper stores all important temp files and cached big data files here
tmp-dir="/home/some_user/caper_test"

But if I try to accomplish the same thing with an environment variable, caper appears to get confused and tries to use my $PWD instead.

$ export MY_DIR=caper_test

$ cat ~/.caper/default.conf
backend=slurm

# define one of the followings (or both) according to your
# cluster's SLURM configuration.
slurm-partition=my_partition
slurm-account=

# DO NOT use /tmp here
# Caper stores all important temp files and cached big data files here
tmp-dir="/home/some_user/${MY_DIR}"

This is a problem. On our system we set up temporary directories on a per-job basis that are supposed to be accessed by environment variables.

Thanks for looking at this! 😸

sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified

Hello,

I am trying to use caper to run my WDL workflow on a SLURM cluster. I have successfully installed caper and created the configuration file. Now I am modifying its contents:

backend=slurm

# define one of the followings (or both) according to your
# cluster's SLURM configuration.
slurm-partition=andrade
slurm-account=jgu-cbdm
tmp-dir=

In slurm-partition I have added what is defined by slurm as:
#SBATCH -A

In slurm-account I have added what is defined by slurm as:
#SBATCH -p

when I try to test it:
sbatch -p andrade -J test_caper_scMeth --export=ALL --mem 16G -t 4-0 --wrap "caper run scMeth.wdl -i scMeth.json"

I have this error:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified

What is the problem? I am sure that the slurm partition "andrade" and the account "jgu-cbdm" are correct since I have used them for a while.

Any help on this?
Thanks in advance

Error while running Caper with Singularity

I am trying this command:

sbatch -p download -J atac_submit --export=ALL --mem 3G -t 4-0 --wrap "caper run atac.wdl -c caper.conf --cromwell cromwell-42.jar --womtool womtool-52.jar -i ENCSR356KRQ_subsampled.json --no-build-singularity --singularity encode-atac-pipeline.sif"

But I have this error:

* Error: pipeline dependencies not found.
Conda users: Did you activate Conda environment (conda activate encode-atac-seq-pipeline)?
    Or did you install Conda and environment correctly (bash scripts/install_conda_env.sh)?
GCP/AWS/Docker users: Did you add --docker flag to Caper command line arg?
Singularity users: Did you add --singularity flag to Caper command line arg?

When I use this command without giving any image:
sbatch -p download -J atac_submit --export=ALL --mem 3G -t 4-0 --wrap "caper run atac.wdl -c caper.conf --cromwell cromwell-42.jar --womtool womtool-52.jar -i ENCSR356KRQ_subsampled.json --singularity"

I have this error:

ESC[31mFATAL:  ESC[0m Unable to handle docker://encodedcc/atac-seq-pipeline:v1.9
.0 uri: unable to create new build: while searching for mksquashfs: exec: "mksquashfs": executable file not found in $PATH

Would you please let me know what is wrong here in both ways? And how can I fix this? I'll be very thankful for any suggestions.

Default -s/--str-label to -i/--inputs basename

I often find I'm repeating the same string in -s and the input JSON filename (and sometimes forgetting to make them the same when they should be). A nice default (perhaps better than None) for -s/--str-label might be the basename of the input JSON.

how to limit the backend from excessively submitting jobs?

Hi Jin,

Thanks for the previous help. The backend ran for 24 hrs before it crashed again. The reason seems to be that a large number of jobs were submitted, beyond my account limit (max 10 jobs running and queued), causing the pipeline to break.

Below are last messages from the pipe:

[2019-10-30 14:11:07,41] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2019-10-30 14:11:07,41] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2019-10-30 14:11:07,41] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2019-10-30 14:11:07,41] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2019-10-30 14:11:07,41] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2019-10-30 14:11:07,41] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2019-10-30 14:11:07,41] [info] SubWorkflowStoreActor stopped
[2019-10-30 14:11:07,41] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2019-10-30 14:11:07,41] [info] JobStoreActor stopped
[2019-10-30 14:11:07,41] [info] CallCacheWriteActor stopped
[2019-10-30 14:11:07,41] [info] KvWriteActor Shutting down: 0 queued messages to process
[2019-10-30 14:11:07,41] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2019-10-30 14:11:07,41] [info] IoProxy stopped
[2019-10-30 14:11:07,41] [info] ServiceRegistryActor stopped
[2019-10-30 14:11:07,42] [info] DockerHashActor stopped
[2019-10-30 14:11:07,43] [info] Database closed
[2019-10-30 14:11:07,43] [info] Stream materializer shut down
[2019-10-30 14:11:07,43] [info] WDL HTTP import resolver closed
[CaperURI] write to local, target: /project1/atac/ec11e753-3974-4682-a20c-04e00e3654c2/metadata.json, size: 239367
[Caper] troubleshooting ec11e753-3974-4682-a20c-04e00e3654c2 ...
Found failures:
[
    {
        "message": "Workflow failed",
        "causedBy": [
            {
                "causedBy": [],
                "message": "Unable to start job. Check the stderr file for possible errors: /project1/atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito
            },
            {
                "causedBy": [],
                "message": "Unable to start job. Check the stderr file for possible errors: /project1/atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito
            },
            {
                "causedBy": [],
                "message": "Unable to start job. Check the stderr file for possible errors: /project1/atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito
            },
            {
                "causedBy": [],
                "message": "Unable to start job. Check the stderr file for possible errors: /project1/atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito
            },
            {
                "message": "Unable to start job. Check the stderr file for possible errors: /project1/atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito
                "causedBy": []
            }
        ]
    }
]

atac.align_mito Failed. SHARD_IDX=0, RC=None, JOB_ID=None, RUN_START=2019-10-31T13:23:09.912Z, RUN_END=2019-10-31T13:26:56.918Z, STDOUT=/project1/atac/ec11e753

atac.align_mito Failed. SHARD_IDX=1, RC=None, JOB_ID=None, RUN_START=2019-10-31T13:23:13.914Z, RUN_END=2019-10-31T13:27:04.331Z, STDOUT=/project1/atac/ec11e753

atac.align_mito Failed. SHARD_IDX=2, RC=None, JOB_ID=None, RUN_START=2019-10-31T13:23:15.907Z, RUN_END=2019-10-31T13:26:06.704Z, STDOUT=/project1/atac/ec11e753

atac.align_mito Failed. SHARD_IDX=3, RC=None, JOB_ID=None, RUN_START=2019-10-31T13:23:11.904Z, RUN_END=2019-10-31T13:27:38.640Z, STDOUT=/project1/atac/ec11e753

atac.align_mito Failed. SHARD_IDX=5, RC=None, JOB_ID=None, RUN_START=2019-10-31T13:23:07.906Z, RUN_END=2019-10-31T13:27:09.525Z, STDOUT=/project1/atac/ec11e753
[Caper] run:  1 ec11e753-3974-4682-a20c-04e00e3654c2 /project1/atac/ec11e753-3974-4682-a20c-04e00e3654c2/metadata.json

So I find all stderrs:

user@hpc$ find .|grep stderr|xargs ls -Sl
-rw-r--r-- 1 user group 50 Oct 30 21:23 ./atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito/shard-0/execution/stderr.submit
-rw-r--r-- 1 user group 50 Oct 30 21:23 ./atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito/shard-1/execution/stderr.submit
-rw-r--r-- 1 user group 50 Oct 30 21:23 ./atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito/shard-2/execution/stderr.submit
-rw-r--r-- 1 user group 50 Oct 30 21:23 ./atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito/shard-3/execution/stderr.submit
-rw-r--r-- 1 user group  0 Oct  31 06:23 ./atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito/shard-4/execution/stderr
-rw-r--r-- 1 user group  0 Oct  31 10:42 ./atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito/shard-4/execution/stderr.check
-rw-r--r-- 1 user group  0 Oct 30 21:23 ./atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito/shard-4/execution/stderr.submit

The contents of the stderrs:

user@hpc$ cat ./atac/ec11e753-3974-4682-a20c-04e00e3654c2/call-align_mito/shard-?/execution/stderr.submit
qsub: would exceed queue generic's per-user limit
qsub: would exceed queue generic's per-user limit
qsub: would exceed queue generic's per-user limit
qsub: would exceed queue generic's per-user limit
qsub: would exceed queue generic's per-user limit

The apparent cause is that many jobs were submitted by the backend under my name to our HPC, where I don't have that much quota.

May I know if there is a way to limit this?

Thanks

Best
Chan

Running caper within a docker container

Is it possible to run caper inside of a docker container while using the --docker flag? I know a docker within a docker container isn't a good idea, but I was hoping to spin up a sibling instance.
This is because my university's compute cluster requires everyone to run everything within a docker container.

Can caper server use a custom Cromwell conf file?

To Whom It May Concern,

I'm trying to run caper server with a Cromwell conf file I wrote by myself with the following command on a GCP instance:

caper server --port 8080 --backend-file gcp.conf

However, it gives the following error:

Traceback (most recent call last):
  File "/home/yyang/caper/bin/caper", line 13, in <module>
    main()
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/cli.py", line 675, in main
    return runner(parsed_args, nonblocking_server=nonblocking_server)
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/cli.py", line 229, in runner
    return subcmd_server(c, args, nonblocking=nonblocking_server)
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/cli.py", line 331, in subcmd_server
    thread = caper_runner.server(fileobj_stdout=f, **args_from_cli)
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/caper_runner.py", line 519, in server
    custom_backend_conf=custom_backend_conf,
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/caper_backend_conf.py", line 362, in create_file
    hocon_s.merge(s, update=True)
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/hocon_string.py", line 165, in merge
    d = HOCONString(b).to_dict()
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/hocon_string.py", line 149, in to_dict
    return json.loads(j)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 29 column 36 (char 820)

Is it because my specification in gcp.conf is incorrect? But I was able to directly run Cromwell with it in the same instance:

java -Dconfig.file=gcp.conf -jar cromwell-59.jar run star-solo.wdl -i starsolo_inputs.json

Could you please give me a hand on how to properly fit it into Caper? I've attached my gcp.conf right below. Thanks!

include required(classpath("application"))

google {
    application-name = "cromwell"
    auths = [
        {
            name = "application-default"
            scheme = "application_default"
        }
    ]
}

engine {
    filesystems {
        gcs {
            auth = "application-default"
            project = "mgh-lilab-archive"
        }
    }
}

backend {
    default = PAPIv2

    providers {
        PAPIv2 {
            actor-factory = "cromwell.backend.google.pipelines.v2beta.PipelinesApiLifecycleActorFactory"
            config {
                # Google project
                project = "mgh-lilab-archive"

                # Base bucket for workflow executions
                root = "gs://mgh-lilab-fileshare/cromwell_execution"

                # Make the name of the backend used for call caching purposes insensitive to the PAPI version.
                name-for-call-caching-purposes: PAPI

                # Emit a warning if jobs last longer than this amount of time. This might indicate that something got stuck in PAPI.
                slow-job-warning-time: 24 hours

                # Set this to the lower of the two values "Queries per 100 seconds" and "Queries per 100 seconds per user" for
                # your project.
                #
                # Used to help determine maximum throughput to the Google Genomics API. Setting this value too low will
                # cause a drop in performance. Setting this value too high will cause QPS based locks from Google.
                # 1000 is the default "Queries per 100 seconds per user", 50000 is the default "Queries per 100 seconds"
                # See https://cloud.google.com/genomics/quotas for more information
                genomics-api-queries-per-100-seconds = 1000

                # Polling for completion backs-off gradually for slower-running jobs.
                # This is the maximum polling interval (in seconds):
                maximum-polling-interval = 600

                # Number of workers to assign to PAPI requests
                request-workers = 3

                genomics {
                    # A reference to an auth defined in the `google` stanza at the top.
                    # This auth is used to create pipelines and manipulate auth JSONs.
                    auth = "application-default"

                    # Endpoint for APIs, no reason to change this unless directed by Google.
                    endpoint-url = "https://lifesciences.googleapis.com/"

                    # Currently Cloud Life Sciences API is available only in `us-central1` and `europe-west2` locations.
                    location = "us-central1"

                    # Restrict access to VM metadata. Useful in cases when untrusted containers are running under a service
                    # account not owned by the submitting user
                    restrict-metadata-access = false

                    # Pipelines v2 only: specify the number of times localization and delocalization operations should be attempted
                    # There is no logic to determine if the error was transient or not, everything is retried upon failure
                    # Defaults to 3
                    localization-attempts = 3

                    # Specifies the minimum file size for `gsutil cp` to use parallel composite uploads during delocalization.
                    # Parallel composite uploads can result in a significant improvement in delocalization speed for large files
                    # but may introduce complexities in downloading such files from GCS, please see
                    # https://cloud.google.com/storage/docs/gsutil/commands/cp#parallel-composite-uploads for more information.
                    #
                    # If set to 0 parallel composite uploads are turned off. The default Cromwell configuration turns off
                    # parallel composite uploads, this sample configuration turns it on for files of 150M or larger.
                    parallel-composite-upload-threshold="150M"
                }

                filesystems {
                    gcs {
                        # A reference to a potentially different auth for manipulating files via engine functions.
                        auth = "application-default"

                        # Google project which will be billed for the requests
                        project = "mgh-lilab-archive"

                        caching {
                            # When a cache hit is found, the following duplication strategy will be followed to use the cached outputs
                            # Possible values: "copy", "reference". Defaults to "copy"
                            # "copy": Copy the output files
                            # "reference": DO NOT copy the output files but point to the original output files instead.
                            #              Will still make sure than all the original output files exist and are accessible before
                            #              going forward with the cache hit.
                            duplication-strategy = "copy"
                        }
                    }
                }

                default-runtime-attributes {
                    cpu: 1
                    failOnStderr: false
                    continueOnReturnCode: 0
                    memory: "2048 MB"
                    bootDiskSizeGb: 10
                    # Allowed to be a String, or a list of Strings
                    disks: "local-disk 10 SSD"
                    noAddress: false
                    preemptible: 0
                    zones: ["us-central1-a", "us-central1-b"]
                }

                include "papi_v2_reference_image_manifest.conf"
            }
        }
    }
}

Job doesn't submit with custom backend memory specification

This is the current backend file

include required(classpath("application"))
backend{
default="pbs"
providers{
pbs {
  config {
    submit = """
            LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active/:/storage1/fs1/dspencer/Active/" \ 
            bsub \
            -J ${job_name} \
            -G compute-oncology \
            -q oncology \
            -n ${cpu} \
            ${true="-M " false="" defined(memory_mb)}${memory_mb} \
            ${"-R \"rusage[mem=" + memory_mb + "] span[hosts=1]\""} \
            -a "docker(henrycwong/pipeline)" \
            -g /wongh/rnaseq /usr/bin/env bash ${script}
"""
    kill = "bkill ${job_id}"
    check-alive = "bjobs ${job_id}"
    job-id-regex = "Job <(\\d+)>.*"
  }
}
}
}

And I'm trying to run it with an easy hello world wdl

wongh@compute1-exec-277:/storage1/fs1/dspencer/Active/wongh/rnaseq$ cat hello.wdl 
workflow myWorkflow {
    call myTask
}

task myTask {
    command {
        echo "hello world"
    }
    output {
        String out = read_string(stdout())
    }

    runtime {
      cpu: "1"
      maxRetries: "0"
      memory: "1 GB"
      time: "1"

    }
}

The job only submits if I remove the ${"-R \"rusage[mem=" + memory_mb + "] span[hosts=1]\""} line. I suspect this may have something to do with string parsing? It seems to work in cromwell, not in caper though. Thanks!

caper run: error: argument --pbs-extra-param: expected one argument

Hi Jin,

I figured out that the reason my jobs were killed for excessive CPU use is that everything ran within the same job, using only that job's CPUs.

So the apparent solution from your caper document seems to be setting up a 'backend' approach wherein the smaller tasks (such as alignment) are submitted to the job queue for running.

Our HPC uses PBS. Below is my command for running caper:

caper run \
                        -i input.json \
                        --tmp-dir /TMPDIR3 \
                        --pbs-queue normal \
                        --pbs-extra-param "-l nodes=1:ppn=12,mem=45g,walltime=96:00:00" \
                        --pbs-extra-param "-N Task.backend" \
                        --pbs-extra-param " -j oe" \
                        ../atac-seq-pipeline/atac.wdl

And here is the content of ~/.caper/default.conf (it is empty).

Running the above command results in this error message:

usage: caper run [-h] [--dry-run] [-i INPUTS] [-o OPTIONS] [-l LABELS]
                 [-p IMPORTS] [-s STR_LABEL] [--hold]
                 [--singularity-cachedir SINGULARITY_CACHEDIR] [--no-deepcopy]
                 [--deepcopy-ext DEEPCOPY_EXT]
                 [--docker [DOCKER [DOCKER ...]]]
                 [--singularity [SINGULARITY [SINGULARITY ...]]]
                 [--no-build-singularity] [--slurm-partition SLURM_PARTITION]
                 [--slurm-account SLURM_ACCOUNT]
                 [--slurm-extra-param SLURM_EXTRA_PARAM] [--sge-pe SGE_PE]
                 [--sge-queue SGE_QUEUE] [--sge-extra-param SGE_EXTRA_PARAM]
                 [--pbs-queue PBS_QUEUE] [--pbs-extra-param PBS_EXTRA_PARAM]
                 [-m METADATA_OUTPUT] [--java-heap-run JAVA_HEAP_RUN]
                 [--db-timeout DB_TIMEOUT] [--file-db FILE_DB] [--no-file-db]
                 [--mysql-db-ip MYSQL_DB_IP] [--mysql-db-port MYSQL_DB_PORT]
                 [--mysql-db-user MYSQL_DB_USER]
                 [--mysql-db-password MYSQL_DB_PASSWORD] [--cromwell CROMWELL]
                 [--max-concurrent-tasks MAX_CONCURRENT_TASKS]
                 [--max-concurrent-workflows MAX_CONCURRENT_WORKFLOWS]
                 [--max-retries MAX_RETRIES] [--disable-call-caching]
                 [--backend-file BACKEND_FILE] [--out-dir OUT_DIR]
                 [--tmp-dir TMP_DIR] [--gcp-prj GCP_PRJ]
                 [--gcp-zones GCP_ZONES] [--out-gcs-bucket OUT_GCS_BUCKET]
                 [--tmp-gcs-bucket TMP_GCS_BUCKET]
                 [--aws-batch-arn AWS_BATCH_ARN] [--aws-region AWS_REGION]
                 [--out-s3-bucket OUT_S3_BUCKET]
                 [--tmp-s3-bucket TMP_S3_BUCKET] [--use-gsutil-over-aws-s3]
                 [-b BACKEND] [--http-user HTTP_USER]
                 [--http-password HTTP_PASSWORD] [--use-netrc]
                 wdl
caper run: error: argument --pbs-extra-param: expected one argument

Would you please kindly see what the problem is, and how to solve it?
Thanks

Chan

How does docker get called in caper?

Whenever one runs caper with the --docker flag, how does the docker container get run?

I'm only asking because I'm trying to run a WDL file that's supposed to run on caper with miniwdl instead, because caper doesn't run within a docker container. When I run the ENCODE/hic-pipeline I get a "python3 not found" error, since the pipeline uses raw commands to call python scripts. It doesn't seem like miniwdl is calling docker the same way caper is.

how to easily find metadata.json file after run

I'd like to be able to easily find the metadata.json file after a submitted job completes, so that I can run croo against it.

caper metadata <id> seems to show the contents of the metadata, but I'd like to retrieve the path. Is there any way to do this?

metadata.json has changed significantly

I have noticed that the latest caper version 1.4.2 does not constantly write the metadata.json file, which I guess is a good thing, because the server certainly crashed if too much had to be written to the metadata.json file. So this is great: a gigantic job that I ran using the previous caper version was always crashing, and now it completes.

But I still need the metadata.json file. The way I found to generate the metadata.json file is by running this command:

caper metadata e8c2155f-ee2c-4eac-8aa9-a32cdbbd4de0 > metadata.json

But to my surprise, there are many changes in the structure of the file, for example:

  1. When the job is re-run and most of the previous (failed) runs are cached, most of the values printed in the metadata.json file do not work anymore, i.e., the "job / bucket id" is not updated. This means that if the metadata shows, for example:
gs://proteomics-pipeline/results/proteomics_msgfplus/87b2ae48-889a-4ece-a83e-2f0e77122392/call-msconvert_mzrefiner/shard-0/stdout

in reality, that stdout is not in that bucket folder, but in the previous job

gs://proteomics-pipeline/results/proteomics_msgfplus/e8c2155f-ee2c-4eac-8aa9-a32cdbbd4de0/call-msconvert_mzrefiner/shard-0/stdout
  2. The JSON key-value "commandLine" has disappeared and is not available anymore! Is there any other way to find the command that was run?
  3. Could the metadata.json be written to the original bucket where all the output data is located, instead of to the local VM folder from where the caper metadata command is run?

Thanks a lot for this great tool!

Localization via hard link has failed

Cromwell seems to love hard links. For security reasons, hard links are not supported on many network mounts and in virtual environments.

Caper uses hard links by default. How can I change the configuration file to use soft links?

~/.caper/default.conf file not created after invoking caper init

Hi, I am trying to run your pipeline on the cluster at my university, Imperial College London, which uses a PBS cluster. After running the caper init or caper init pbs commands, the ~/.caper/default.conf file doesn't seem to be created, and as a consequence I get the following error:

Traceback (most recent call last):
  File "/rds/general/user/rw4917/home/anaconda3/envs/encode-atac-seq-pipeline/bin/caper", line 13, in <module>
    main()
  File "/rds/general/user/rw4917/home/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.6/site-packages/caper/cli.py", line 500, in main
    init_caper_conf(parsed_args.conf, parsed_args.platform)
  File "/rds/general/user/rw4917/home/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.6/site-packages/caper/caper_init.py", line 180, in init_caper_conf
    with open(conf_file, 'w') as fp:
FileNotFoundError: [Errno 2] No such file or directory: '/rds/general/user/rw4917/home/.caper/default.conf'

I tried manually creating a blank ~/.caper/default.conf file, but then I get:

Traceback (most recent call last):
  File "/rds/general/user/rw4917/home/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.6/configparser.py", line 846, in items
    d.update(self._sections[section])
KeyError: 'defaults'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/rds/general/user/rw4917/home/anaconda3/envs/encode-atac-seq-pipeline/bin/caper", line 13, in <module>
    main()
  File "/rds/general/user/rw4917/home/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.6/site-packages/caper/cli.py", line 481, in main
    parser, _ = get_parser_and_defaults()
  File "/rds/general/user/rw4917/home/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.6/site-packages/caper/caper_args.py", line 699, in get_parser_and_defaults
    conf_key_map=CAPER_1_0_0_PARAM_KEY_NAME_CHANGE,
  File "/rds/general/user/rw4917/home/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.6/site-packages/caper/arg_tool.py", line 120, in update_parsers_defaults_with_conf
    no_strip_quote=no_strip_quote,
  File "/rds/general/user/rw4917/home/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.6/site-packages/caper/arg_tool.py", line 47, in read_from_conf
    d_ = dict(config.items(conf_section))
  File "/rds/general/user/rw4917/home/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.6/configparser.py", line 849, in items
    raise NoSectionError(section)
configparser.NoSectionError: No section: 'defaults'

Sorry if this is a trivial fix; I am quite new to submitting jobs on clusters.
Thanks in advance !

Pipeline hangs on: "task=chip.read_genome_tsv:-1, retry=0, status=WaitingForReturnCode"

Describe the problem

When running the encode chip-seq-pipeline on HPC with SLURM, it consistently hangs on the line that reads "task=chip.read_genome_tsv:-1, retry=0, status=WaitingForReturnCode". I have tried running the pipeline with and without an active server and cannot overcome this error.

OS/Platform

  • OS/Platform: HPC with SLURM cluster engine, CentOS Linux
  • Conda version: 4.10.1
  • Pipeline version: v1.9.0
  • Caper version: v1.6.1

Caper configuration file

backend=slurm

# define one of the followings (or both) according to your
# cluster's SLURM configuration.
slurm-partition=long
slurm-account=science

# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# "file" method has been default for all old versions of Caper<1.0.
# "path+modtime" is a new default for Caper>=1.0,
#   file: use md5sum hash (slow).
#   path: use path.
#   path+modtime: use path and modification time.
local-hash-strat=path+modtime

# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper store all localized data files
# on this directory (e.g. input FASTQs defined as URLs in input JSON).
local-loc-dir=/home/hpc/batchus1/caper_tmp/

cromwell=/home/hpc/batchus1/.caper/cromwell_jar/cromwell-59.jar
womtool=/home/hpc/batchus1/.caper/womtool_jar/womtool-59.jar

Input JSON file

{
    "chip.title" : "RT4",
    "chip.description" : "RT4",

    "chip.pipeline_type" : "histone",
    "chip.aligner" : "bwa",
    "chip.align_only" : false,
    "chip.true_rep_only" : false,

    "chip.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v3/hg38.tsv",

    "chip.paired_end" : true,
    "chip.ctl_paired_end" : true,

    "chip.always_use_pooled_ctl" : true,

    "chip.fastqs_rep1_R1" : [ "/home/hpc/batchus1/GSE148079/SRR11478947_1.fastq" ],
    "chip.fastqs_rep1_R2" : [ "/home/hpc/batchus1/GSE148079/SRR11478947_2.fastq" ],
    "chip.fastqs_rep2_R1" : [ "/home/hpc/batchus1/GSE148079/SRR11478948_1.fastq" ],
    "chip.fastqs_rep2_R2" : [ "/home/hpc/batchus1/GSE148079/SRR11478948_2.fastq" ],

    "chip.ctl_fastqs_rep1_R1" : [ "/home/hpc/batchus1/GSE148079/SRR11478957_1.fastq" ],
    "chip.ctl_fastqs_rep1_R2" : [ "/home/hpc/batchus1/GSE148079/SRR11478957_2.fastq" ],
    "chip.ctl_fastqs_rep2_R1" : [ "/home/hpc/batchus1/GSE148079/SRR11478958_1.fastq" ],
    "chip.ctl_fastqs_rep2_R2" : [ "/home/hpc/batchus1/GSE148079/SRR11478958_2.fastq" ]
}

Sbatch Submission Script

#!/bin/bash

#SBATCH --ntasks-per-node=1
#SBATCH --partition=long 
#SBATCH --export=ALL 
#SBATCH --time=4-00:00:00 
#SBATCH --cpus-per-task=12
#SBATCH --account=science

source ~/miniconda3/etc/profile.d/conda.sh
conda activate encode-chip-seq-pipeline
caper run /home/hpc/batchus1/chip-seq-pipeline2/chip.wdl -i /home/hpc/batchus1/param_short.json

Troubleshooting result

SLURM output file

2021-05-13 12:39:33,587|caper.caper_base|INFO| Creating a timestamped temporary directory. /home/hpc/batchus1/caper_tmp/chip/20210513_123933_586540
2021-05-13 12:39:33,587|caper.caper_runner|INFO| Localizing files on work_dir. /home/hpc/batchus1/caper_tmp/chip/20210513_123933_586540
2021-05-13 12:39:35,039|autouri.autouri|INFO| cp: skipped due to name_size_match, size=872949833, mt=1549739698.0, src=https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15/@@download/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz, dest=/home/hpc/batchus1/caper_tmp/caf534ed3cf684406e731d19be272b4a/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz
2021-05-13 12:39:35,796|autouri.autouri|INFO| cp: skipped due to md5_match, md5=05297d96dd1f7cfb45a7b637d6dd7036, src=https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15_mito_only/@@download/GRCh38_no_alt_analysis_set_GCA_000001405.15_mito_only.fasta.gz, dest=/home/hpc/batchus1/caper_tmp/f43b63a83784d3ec8055f1a22168ed89/GRCh38_no_alt_analysis_set_GCA_000001405.15_mito_only.fasta.gz
2021-05-13 12:39:36,631|autouri.autouri|INFO| cp: skipped due to md5_match, md5=393688b4f06c9ce26165d47433dd8c37, src=https://www.encodeproject.org/files/ENCFF356LFX/@@download/ENCFF356LFX.bed.gz, dest=/home/hpc/batchus1/caper_tmp/f183dcba5d34f959d8b55ed438ee2e22/ENCFF356LFX.bed.gz
2021-05-13 12:39:38,246|autouri.autouri|INFO| cp: skipped due to md5_match, md5=c95303fb77cc3e11d50e3c3a4b93b3fb, src=https://www.encodeproject.org/files/GRCh38_EBV.chrom.sizes/@@download/GRCh38_EBV.chrom.sizes.tsv, dest=/home/hpc/batchus1/caper_tmp/c52f52c7bfa357f55a39b1de7e4d0b0c/GRCh38_EBV.chrom.sizes.tsv
2021-05-13 12:39:39,532|autouri.autouri|INFO| cp: skipped due to name_size_match, size=3749246230, mt=1571469011.0, src=https://www.encodeproject.org/files/ENCFF110MCL/@@download/ENCFF110MCL.tar.gz, dest=/home/hpc/batchus1/caper_tmp/3ff4ac4c3f59d096b1a3842a182072ae/ENCFF110MCL.tar.gz
2021-05-13 12:39:40,386|autouri.autouri|INFO| cp: skipped due to md5_match, md5=80b263f6ea6ff65d547eef07102535db, src=https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15_mito_only_bowtie2_index/@@download/GRCh38_no_alt_analysis_set_GCA_000001405.15_mito_only_bowtie2_index.tar.gz, dest=/home/hpc/batchus1/caper_tmp/df5193e07055d13c48be59bacd0f56b8/GRCh38_no_alt_analysis_set_GCA_000001405.15_mito_only_bowtie2_index.tar.gz
2021-05-13 12:39:41,713|autouri.autouri|INFO| cp: skipped due to name_size_match, size=4318261891, mt=1549723866.0, src=https://www.encodeproject.org/files/ENCFF643CGH/@@download/ENCFF643CGH.tar.gz, dest=/home/hpc/batchus1/caper_tmp/8c692fba4640609720272154ab0faa30/ENCFF643CGH.tar.gz
2021-05-13 12:39:42,490|autouri.autouri|INFO| cp: skipped due to md5_match, md5=7e088c24a017a43b1db5e8f50060eec1, src=https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15_mito_only_bwa_index/@@download/GRCh38_no_alt_analysis_set_GCA_000001405.15_mito_only_bwa_index.tar.gz, dest=/home/hpc/batchus1/caper_tmp/d3dff25534e93d893902540d81e4f475/GRCh38_no_alt_analysis_set_GCA_000001405.15_mito_only_bwa_index.tar.gz
2021-05-13 12:39:43,365|autouri.autouri|INFO| cp: skipped due to md5_match, md5=aca8cf959206aa3ad257fc46dc783266, src=https://www.encodeproject.org/files/ENCFF493CCB/@@download/ENCFF493CCB.bed.gz, dest=/home/hpc/batchus1/caper_tmp/0fa7d04b32e66fa02fb2c1ae39e41447/ENCFF493CCB.bed.gz
2021-05-13 12:39:44,640|autouri.autouri|INFO| cp: skipped due to name_size_match, size=14377496, mt=1592463730.0, src=https://www.encodeproject.org/files/ENCFF304XEX/@@download/ENCFF304XEX.bed.gz, dest=/home/hpc/batchus1/caper_tmp/805e179275a9c0fb7a37def40c4312d1/ENCFF304XEX.bed.gz
2021-05-13 12:39:45,485|autouri.autouri|INFO| cp: skipped due to md5_match, md5=91047588129069ff91ec1b0664179f8e, src=https://www.encodeproject.org/files/ENCFF140XLU/@@download/ENCFF140XLU.bed.gz, dest=/home/hpc/batchus1/caper_tmp/0cbd2c602ddad252bc39729fc8a29286/ENCFF140XLU.bed.gz
2021-05-13 12:39:46,764|autouri.autouri|INFO| cp: skipped due to name_size_match, size=18381891, mt=1592463727.0, src=https://www.encodeproject.org/files/ENCFF212UAV/@@download/ENCFF212UAV.bed.gz, dest=/home/hpc/batchus1/caper_tmp/1d3aa436b05f16a509edb94789c061d3/ENCFF212UAV.bed.gz
2021-05-13 12:39:46,923|autouri.autouri|INFO| cp: skipped due to md5_match, md5=df624401f76fbd4d651e736068c43a1a, src=https://storage.googleapis.com/encode-pipeline-genome-data/hg38/ataqc/hg38_dnase_avg_fseq_signal_formatted.txt.gz, dest=/home/hpc/batchus1/caper_tmp/3b39284516e676ea52238f0636c0bbbf/hg38_dnase_avg_fseq_signal_formatted.txt.gz
2021-05-13 12:39:46,996|autouri.autouri|INFO| cp: skipped due to md5_match, md5=ced0c653d28628654288f7a8ab052590, src=https://storage.googleapis.com/encode-pipeline-genome-data/hg38/ataqc/hg38_celltype_compare_subsample.bed.gz, dest=/home/hpc/batchus1/caper_tmp/c73f434c3fa4f3f54bc2ecad09c065c2/hg38_celltype_compare_subsample.bed.gz
2021-05-13 12:39:47,086|autouri.autouri|INFO| cp: skipped due to md5_match, md5=3f7fd85ab9a4c6274f28c3e82a79c10d, src=https://storage.googleapis.com/encode-pipeline-genome-data/hg38/ataqc/hg38_dnase_avg_fseq_signal_metadata.txt, dest=/home/hpc/batchus1/caper_tmp/a9745b33b4ffdd83d7d2c5a7d3c8036a/hg38_dnase_avg_fseq_signal_metadata.txt
2021-05-13 12:39:47,850|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
2021-05-13 12:39:51,199|caper.cromwell|INFO| Womtool validation passed.
2021-05-13 12:39:51,200|caper.caper_runner|INFO| launching run: wdl=/home/hpc/batchus1/chip-seq-pipeline2/chip.wdl, inputs=/home/hpc/batchus1/caper_tmp/home/hpc/batchus1/param_short.local.json, backend_conf=/home/hpc/batchus1/caper_tmp/chip/20210513_123933_586540/backend.conf
2021-05-13 12:40:00,685|caper.cromwell_workflow_monitor|INFO| Workflow: id=3b6d19ac-dd11-4e6f-9246-af6ffb9af467, status=Submitted
2021-05-13 12:40:00,729|caper.cromwell_workflow_monitor|INFO| Workflow: id=3b6d19ac-dd11-4e6f-9246-af6ffb9af467, status=Running
2021-05-13 12:40:10,133|caper.cromwell_workflow_monitor|INFO| Task: id=3b6d19ac-dd11-4e6f-9246-af6ffb9af467, task=chip.read_genome_tsv:-1, retry=0, status=Started, job_id=44018
2021-05-13 12:40:10,139|caper.cromwell_workflow_monitor|INFO| Task: id=3b6d19ac-dd11-4e6f-9246-af6ffb9af467, task=chip.read_genome_tsv:-1, retry=0, status=WaitingForReturnCode

cromwell.out file

2021-05-13 12:39:52,796  INFO  - Running with database db.url = jdbc:hsqldb:mem:4bb55e12-96c4-4bba-a1f8-78ea07e02915;shutdown=false;hsqldb.tx=mvcc
2021-05-13 12:39:59,327  INFO  - Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
2021-05-13 12:39:59,338  INFO  - [RenameWorkflowOptionsInMetadata] 100%
2021-05-13 12:39:59,417  INFO  - Running with database db.url = jdbc:hsqldb:mem:cac1db66-f6bc-4aaa-a0e5-adb74c67f90b;shutdown=false;hsqldb.tx=mvcc
2021-05-13 12:39:59,754  INFO  - Slf4jLogger started
2021-05-13 12:39:59,952 cromwell-system-akka.dispatchers.engine-dispatcher-5 INFO  - Workflow heartbeat configuration:
{
  "cromwellId" : "cromid-ee56600",
  "heartbeatInterval" : "2 minutes",
  "ttl" : "10 minutes",
  "failureShutdownDuration" : "5 minutes",
  "writeBatchSize" : 10000,
  "writeThreshold" : 10000
}
2021-05-13 12:40:00,005 cromwell-system-akka.dispatchers.service-dispatcher-13 INFO  - Metadata summary refreshing every 1 second.
2021-05-13 12:40:00,096 cromwell-system-akka.dispatchers.service-dispatcher-12 INFO  - WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
2021-05-13 12:40:00,100 cromwell-system-akka.actor.default-dispatcher-4 INFO  - KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
2021-05-13 12:40:00,123 cromwell-system-akka.dispatchers.engine-dispatcher-42 INFO  - CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
2021-05-13 12:40:00,124  WARN  - 'docker.hash-lookup.gcr-api-queries-per-100-seconds' is being deprecated, use 'docker.hash-lookup.gcr.throttle' instead (see reference.conf)
2021-05-13 12:40:00,610 cromwell-system-akka.dispatchers.engine-dispatcher-42 INFO  - JobExecutionTokenDispenser - Distribution rate: 1 per 2 seconds.
2021-05-13 12:40:00,637 cromwell-system-akka.dispatchers.engine-dispatcher-5 INFO  - SingleWorkflowRunnerActor: Version 59
2021-05-13 12:40:00,643 cromwell-system-akka.dispatchers.engine-dispatcher-5 INFO  - SingleWorkflowRunnerActor: Submitting workflow
2021-05-13 12:40:00,685 cromwell-system-akka.dispatchers.api-dispatcher-47 INFO  - Unspecified type (Unspecified version) workflow 3b6d19ac-dd11-4e6f-9246-af6ffb9af467 submitted
2021-05-13 12:40:00,711 cromwell-system-akka.dispatchers.engine-dispatcher-41 INFO  - SingleWorkflowRunnerActor: Workflow submitted UUID(3b6d19ac-dd11-4e6f-9246-af6ffb9af467)
2021-05-13 12:40:00,714 cromwell-system-akka.dispatchers.engine-dispatcher-42 INFO  - 1 new workflows fetched by cromid-ee56600: 3b6d19ac-dd11-4e6f-9246-af6ffb9af467
2021-05-13 12:40:00,722 cromwell-system-akka.dispatchers.engine-dispatcher-41 INFO  - WorkflowManagerActor: Starting workflow UUID(3b6d19ac-dd11-4e6f-9246-af6ffb9af467)
2021-05-13 12:40:00,728 cromwell-system-akka.dispatchers.engine-dispatcher-41 INFO  - WorkflowManagerActor: Successfully started WorkflowActor-3b6d19ac-dd11-4e6f-9246-af6ffb9af467
2021-05-13 12:40:00,729 cromwell-system-akka.dispatchers.engine-dispatcher-41 INFO  - Retrieved 1 workflows from the WorkflowStoreActor
2021-05-13 12:40:00,743 cromwell-system-akka.dispatchers.engine-dispatcher-42 INFO  - WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
2021-05-13 12:40:00,804 cromwell-system-akka.dispatchers.engine-dispatcher-41 INFO  - MaterializeWorkflowDescriptorActor [UUID(3b6d19ac)]: Parsing workflow as WDL 1.0
2021-05-13 12:40:03,540 cromwell-system-akka.dispatchers.engine-dispatcher-41 INFO  - MaterializeWorkflowDescriptorActor [UUID(3b6d19ac)]: Call-to-Backend assignments: chip.align_R1 -> slurm, chip.bam2ta_ctl -> slurm, chip.idr_pr -> slurm, chip.filter_no_dedup -> slurm, chip.pool_ta_ctl -> slurm, chip.error_subsample_pooled_control_with_mixed_endedness -> slurm, chip.overlap_ppr -> slurm, chip.filter -> slurm, chip.qc_report -> slurm, chip.error_wrong_aligner -> slurm, chip.call_peak_pooled -> slurm, chip.filter_R1 -> slurm, chip.pool_ta_pr2 -> slurm, chip.spr -> slurm, chip.filter_ctl -> slurm, chip.call_peak_pr1 -> slurm, chip.error_custom_aligner -> slurm, chip.count_signal_track_pooled -> slurm, chip.error_ctl_fastq_input_required_for_control_mode -> slurm, chip.call_peak_ppr1 -> slurm, chip.idr_ppr -> slurm, chip.call_peak -> slurm, chip.error_control_required -> slurm, chip.align -> slurm, chip.jsd -> slurm, chip.error_input_data -> slurm, chip.reproducibility_overlap -> slurm, chip.idr -> slurm, chip.call_peak_ppr2 -> slurm, chip.bam2ta -> slurm, chip.pool_ta -> slurm, chip.xcor -> slurm, chip.macs2_signal_track -> slurm, chip.overlap -> slurm, chip.subsample_ctl -> slurm, chip.macs2_signal_track_pooled -> slurm, chip.call_peak_pr2 -> slurm, chip.gc_bias -> slurm, chip.read_genome_tsv -> slurm, chip.pool_blacklist -> slurm, chip.error_use_bowtie2_local_mode_for_non_bowtie2 -> slurm, chip.error_use_bwa_mem_for_non_bwa -> slurm, chip.fraglen_mean -> slurm, chip.bam2ta_no_dedup_R1 -> slurm, chip.bam2ta_no_dedup -> slurm, chip.pool_ta_pr1 -> slurm, chip.subsample_ctl_pooled -> slurm, chip.count_signal_track -> slurm, chip.align_ctl -> slurm, chip.overlap_pr -> slurm, chip.error_ctl_input_defined_in_control_mode -> slurm, chip.choose_ctl -> slurm, chip.reproducibility_idr -> slurm
2021-05-13 12:40:03,813 cromwell-system-akka.dispatchers.backend-dispatcher-101 WARN  - slurm [UUID(3b6d19ac)]: Key/s [preemptible, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
2021-05-13 12:40:03,814 cromwell-system-akka.dispatchers.backend-dispatcher-101 WARN  - slurm [UUID(3b6d19ac)]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[... the same "Key/s [disks]" / "Key/s [preemptible, disks]" is/are not supported by backend warning repeats for each of the remaining tasks ...]
2021-05-13 12:40:05,631 cromwell-system-akka.dispatchers.engine-dispatcher-152 INFO  - Not triggering log of token queue status. Effective log interval = None
2021-05-13 12:40:08,202 cromwell-system-akka.dispatchers.engine-dispatcher-152 INFO  - WorkflowExecutionActor-3b6d19ac-dd11-4e6f-9246-af6ffb9af467 [UUID(3b6d19ac)]: Starting chip.read_genome_tsv
2021-05-13 12:40:08,645 cromwell-system-akka.dispatchers.engine-dispatcher-152 INFO  - Assigned new job execution tokens to the following groups: 3b6d19ac: 1
2021-05-13 12:40:08,808 cromwell-system-akka.dispatchers.engine-dispatcher-112 INFO  - 3b6d19ac-dd11-4e6f-9246-af6ffb9af467-EngineJobExecutionActor-chip.read_genome_tsv:NA:1 [UUID(3b6d19ac)]: Could not copy a suitable cache hit for 3b6d19ac:chip.read_genome_tsv:-1:1. No copy attempts were made.
2021-05-13 12:40:08,836 cromwell-system-akka.dispatchers.backend-dispatcher-156 WARN  - BackgroundConfigAsyncJobExecutionActor [UUID(3b6d19ac)chip.read_genome_tsv:NA:1]: Unrecognized runtime attribute keys: disks
2021-05-13 12:40:08,914 cromwell-system-akka.dispatchers.backend-dispatcher-156 INFO  - BackgroundConfigAsyncJobExecutionActor [UUID(3b6d19ac)chip.read_genome_tsv:NA:1]: `echo "$(basename /home/hpc/batchus1/RT4_output/chip/3b6d19ac-dd11-4e6f-9246-af6ffb9af467/call-read_genome_tsv/inputs/955696674/hg38.local.tsv)" > genome_name
# create empty files for all entries
touch ref_fa bowtie2_idx_tar bwa_idx_tar chrsz gensz blacklist blacklist2
touch mito_chr_name
touch regex_bfilt_peak_chr_name

python <<CODE
import os
with open('/home/hpc/batchus1/RT4_output/chip/3b6d19ac-dd11-4e6f-9246-af6ffb9af467/call-read_genome_tsv/inputs/955696674/hg38.local.tsv','r') as fp:
    for line in fp:
        arr = line.strip('\n').split('\t')
        if arr:
            key, val = arr
            with open(key,'w') as fp2:
                fp2.write(val)
CODE`
2021-05-13 12:40:09,169 cromwell-system-akka.dispatchers.backend-dispatcher-156 INFO  - BackgroundConfigAsyncJobExecutionActor [UUID(3b6d19ac)chip.read_genome_tsv:NA:1]: executing: if [ -z \"$SINGULARITY_BINDPATH\" ]; then export SINGULARITY_BINDPATH=; fi; \
if [ -z \"$SINGULARITY_CACHEDIR\" ]; then export SINGULARITY_CACHEDIR=; fi;

ITER=0
until [ $ITER -ge 3 ]; do
    sbatch \
        --export=ALL \
        -J cromwell_3b6d19ac_read_genome_tsv \
        -D /home/hpc/batchus1/RT4_output/chip/3b6d19ac-dd11-4e6f-9246-af6ffb9af467/call-read_genome_tsv \
        -o /home/hpc/batchus1/RT4_output/chip/3b6d19ac-dd11-4e6f-9246-af6ffb9af467/call-read_genome_tsv/execution/stdout \
        -e /home/hpc/batchus1/RT4_output/chip/3b6d19ac-dd11-4e6f-9246-af6ffb9af467/call-read_genome_tsv/execution/stderr \
        -t 60 \
        -n 1 \
        --ntasks-per-node=1 \
        --cpus-per-task=1 \
        --mem=2048 \
        -p long \
        --account science \
         \
         \
        --wrap "/bin/bash /home/hpc/batchus1/RT4_output/chip/3b6d19ac-dd11-4e6f-9246-af6ffb9af467/call-read_genome_tsv/execution/script" \
        && break
    ITER=$[$ITER+1]
    sleep 30
done
2021-05-13 12:40:10,132 cromwell-system-akka.dispatchers.backend-dispatcher-156 INFO  - BackgroundConfigAsyncJobExecutionActor [UUID(3b6d19ac)chip.read_genome_tsv:NA:1]: job id: 44018
2021-05-13 12:40:10,139 cromwell-system-akka.dispatchers.backend-dispatcher-157 INFO  - BackgroundConfigAsyncJobExecutionActor [UUID(3b6d19ac)chip.read_genome_tsv:NA:1]: Status change from - to WaitingForReturnCode

caper submit without call-caching?

Is it possible to do caper submit but choose not to do call-caching? I.e., even if I submit the exact same job, to use a parameter to force Cromwell to do everything from the beginning again (but with a new id, obviously). Thanks!!
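
For context, Cromwell itself exposes per-workflow options that switch call-caching off; below is a sketch of such a workflow-options file (whether and how caper submit lets you pass something like this through is exactly the open question here):

{
    "read_from_cache": false,
    "write_to_cache": false
}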

PBS qsub Node Specification Error

Hi there,

I installed the ATAC-seq pipeline on a PBS cluster.
Caper is configured with pbs backend and the lead job was submitted.
It seems the pipeline fails to submit the first subjob; this is what the generated submission script script.submit looks like:

#!/bin/bash
if [ -z \"$SINGULARITY_BINDPATH\" ]; then export SINGULARITY_BINDPATH=; fi; \
if [ -z \"$SINGULARITY_CACHEDIR\" ]; then export SINGULARITY_CACHEDIR=; fi;

echo "/bin/bash /scratch/username/KMS11_ATAC/runQSZNCL.ctrls/atac/cc317db4-d5f2-4ad7-ad32-5b95862f02c7/call-read_genome_tsv/execution/script" | \
qsub \
    -N cromwell_cc317db4_read_genome_tsv \
    -o /scratch/username/KMS11_ATAC/runQSZNCL.ctrls/atac/cc317db4-d5f2-4ad7-ad32-5b95862f02c7/call-read_genome_tsv/execution/stdout \
    -e /scratch/username/KMS11_ATAC/runQSZNCL.ctrls/atac/cc317db4-d5f2-4ad7-ad32-5b95862f02c7/call-read_genome_tsv/execution/stderr \
    -lnodes=1:ppn=1:mem=2048mb \
    -lwalltime=1:0:0 \
     \
    -q q32 \
     -P sbs_liyh \
    -V

When this is submitted, stderr.background comes back with qsub: node(s) specification error
Looks like some spaces are missing?
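
For comparison, here is a hedged sketch of the same resource requests written with a space after each -l and the memory request split out (Torque/PBS-style; the values simply mirror the generated script, and the wrapped script path is illustrative):

echo "/bin/bash ./execution/script" | \
qsub \
    -N cromwell_cc317db4_read_genome_tsv \
    -l nodes=1:ppn=1 \
    -l mem=2048mb \
    -l walltime=1:0:0 \
    -q q32 \
    -P sbs_liyh \
    -V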

Workflow failed error

Hi, I tried running the example: caper run ~/atac-seq-pipeline/atac.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_caper.json and eventually ran into an error:

2020-08-20 17:25:43,168|caper.cromwell_workflow_monitor|INFO| Task: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, task=atac.align:0, retry=1, status=Started, job_id=25149
2020-08-20 17:25:43,180|caper.cromwell_workflow_monitor|INFO| Task: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, task=atac.align:1, retry=1, status=WaitingForReturnCode
2020-08-20 17:25:43,191|caper.cromwell_workflow_monitor|INFO| Task: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, task=atac.align:0, retry=1, status=WaitingForReturnCode
2020-08-20 17:25:48,160|caper.cromwell_workflow_monitor|INFO| Task: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, task=atac.align_mito:1, retry=1, status=Started, job_id=25169
2020-08-20 17:25:48,172|caper.cromwell_workflow_monitor|INFO| Task: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, task=atac.align_mito:0, retry=1, status=Started, job_id=25189
2020-08-20 17:25:48,183|caper.cromwell_workflow_monitor|INFO| Task: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, task=atac.align_mito:1, retry=1, status=WaitingForReturnCode
2020-08-20 17:25:48,195|caper.cromwell_workflow_monitor|INFO| Task: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, task=atac.align_mito:0, retry=1, status=WaitingForReturnCode
2020-08-20 17:25:52,296|caper.cromwell_workflow_monitor|INFO| Task: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, task=atac.align:1, retry=1, status=Done
2020-08-20 17:25:52,446|caper.cromwell_workflow_monitor|INFO| Task: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, task=atac.align:0, retry=1, status=Done
2020-08-20 17:25:55,609|caper.cromwell_workflow_monitor|INFO| Task: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, task=atac.align_mito:1, retry=1, status=Done
2020-08-20 17:25:58,550|caper.cromwell_workflow_monitor|INFO| Task: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, task=atac.align_mito:0, retry=1, status=Done
2020-08-20 17:25:59,598|caper.cromwell_workflow_monitor|INFO| Workflow: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, status=Failed
2020-08-20 17:26:13,667|caper.cromwell_metadata|INFO| Wrote metadata file. /Users/j/Desktop/atac/atac/8bdeb27a-baa7-4835-a5bb-07ca655d2f07/metadata.json
2020-08-20 17:26:13,668|caper.cromwell|INFO| Workflow failed. Auto-troubleshooting...
* Started troubleshooting workflow: id=8bdeb27a-baa7-4835-a5bb-07ca655d2f07, status=Failed
* Found failures JSON object.
[
    {
        "causedBy": [
            {
                "message": "Job atac.align:1:2 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job atac.align:0:2 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job atac.align_mito:1:2 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            },
            {
                "message": "Job atac.align_mito:0:2 exited with return code 3 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.",
                "causedBy": []
            }
        ],
        "message": "Workflow failed"
    }
]
* Recursively finding failures in calls (tasks)...

==== NAME=atac.align_mito, STATUS=RetryableFailure, PARENT=
SHARD_IDX=0, RC=3, JOB_ID=24986
START=2020-08-20T21:25:28.208Z, END=2020-08-20T21:25:43.151Z
STDOUT=/Users/j/Desktop/atac/atac/8bdeb27a-baa7-4835-a5bb-07ca655d2f07/call-align_mito/shard-0/execution/stdout
STDERR=/Users/j/Desktop/atac/atac/8bdeb27a-baa7-4835-a5bb-07ca655d2f07/call-align_mito/shard-0/execution/stderr
STDERR_CONTENTS=

* Error: pipeline dependencies not found.
Conda users: Did you activate Conda environment (conda activate encode-atac-seq-pipeline)?
    Or did you install Conda and environment correctly (bash scripts/install_conda_env.sh)?
GCP/AWS/Docker users: Did you add --docker flag to Caper command line arg?
Singularity users: Did you add --singularity flag to Caper command line arg? 

==== NAME=atac.align, STATUS=RetryableFailure, PARENT=
SHARD_IDX=0, RC=3, JOB_ID=24946
START=2020-08-20T21:25:24.211Z, END=2020-08-20T21:25:38.156Z
STDOUT=/Users/j/Desktop/atac/atac/8bdeb27a-baa7-4835-a5bb-07ca655d2f07/call-align/shard-0/execution/stdout
STDERR=/Users/j/Desktop/atac/atac/8bdeb27a-baa7-4835-a5bb-07ca655d2f07/call-align/shard-0/execution/stderr
STDERR_CONTENTS=

* Error: pipeline dependencies not found.
Conda users: Did you activate Conda environment (conda activate encode-atac-seq-pipeline)?
    Or did you install Conda and environment correctly (bash scripts/install_conda_env.sh)?
GCP/AWS/Docker users: Did you add --docker flag to Caper command line arg?
Singularity users: Did you add --singularity flag to Caper command line arg?

2020-08-20 17:26:13,670|caper.nb_subproc_thread|ERROR| Subprocess failed. returncode=1
2020-08-20 17:26:13,670|caper.cli|ERROR| Check stdout/stderr in /Users/j/Desktop/atac/cromwell.out
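
A sketch of what the troubleshooting hints above point at (commands and paths taken from this report; pick the variant that matches your setup):

# Conda: install and activate the pipeline's environment first, as the hint suggests
$ bash scripts/install_conda_env.sh
$ conda activate encode-atac-seq-pipeline
$ caper run ~/atac-seq-pipeline/atac.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_caper.json

# or run each task inside a container instead
$ caper run ~/atac-seq-pipeline/atac.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_caper.json --singularity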

No such file or directory: 'singularity'

I'm trying to run caper on the ENCODE hic-pipeline with Singularity, using the following command:
caper run hic.wdl -i tests/functional/json/test_hic.json --singularity
and I get the following error message:

FATAL:   while extracting ~/.caper/singularity_cachedir/cache/oci-tmp/ad135405272624a90ae386db4d4deaeaff7126532bbd444b071827ea30107530: root filesystem extraction failed: extract command failed: FATAL:   could not open image /~/.caper/singularity_cachedir/rootfs-971186452/tmp-rootfs-720708493: failed to retrieve path for /~/.caper/singularity_cachedir/rootfs-971186452/tmp-rootfs-720708493: lstat /~: no such file or directory
: exit status 255
Traceback (most recent call last):
  File "/home/kos/.local/bin/caper", line 13, in <module>
    main()
  File "/home/kos/.local/lib/python3.8/site-packages/caper/cli.py", line 675, in main
    return runner(parsed_args, nonblocking_server=nonblocking_server)
  File "/home/kos/.local/lib/python3.8/site-packages/caper/cli.py", line 226, in runner
    subcmd_run(c, args)
  File "/home/kos/.local/lib/python3.8/site-packages/caper/cli.py", line 347, in subcmd_run
    thread = caper_runner.run(
  File "/home/kos/.local/lib/python3.8/site-packages/caper/caper_runner.py", line 410, in run
    options = self._caper_workflow_opts.create_file(
  File "/home/kos/.local/lib/python3.8/site-packages/caper/caper_workflow_opts.py", line 234, in create_file
    s.build_local_image()
  File "/home/kos/.local/lib/python3.8/site-packages/caper/singularity.py", line 56, in build_local_image
    return check_call(cmd, env=env)
  File "/usr/lib64/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['singularity', 'exec', 'docker://encodedcc/hic-pipeline:0.4.0', 'echo', 'Built local singularity image for docker://encodedcc/hic-pipeline:0.4.0']' returned non-zero exit status 255.
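
The /~ prefix in those paths suggests the Singularity cache directory was configured with an unexpanded ~. A hedged workaround sketch (the directory below is illustrative, based on the home directory in the traceback):

$ export SINGULARITY_CACHEDIR=/home/kos/.caper/singularity_cachedir
$ caper run hic.wdl -i tests/functional/json/test_hic.json --singularity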

initialization of Caper (Sherlock platform)

Hi,
I installed caper on my MacOS. I did the test run 'caper'. However, when I tried to initialize Caper using Sherlock (I work at Stanford), it did not work. Do you think there's a step I'm missing?
Thanks
[screenshot attached: Screen Shot 2020-04-01 at 7.23.14 PM]

Error related to caper server on scg node

I received this error after I submitted jobs to my caper server. I think the SCG node isn't compatible. Error below:

HTTPConnectionPool(host='sgisummit-frcf-111-28', port=8000): Max retries exceeded with url: /api/workflows/v1/query (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2aba5d2e2fd0>: Failed to establish a new connection: [Errno 111] Connection refused'))

Can Caper upload data for cloud backend?

Hi,

I'm just wondering if Caper has the functionality of uploading data when submitting a job to the cloud, or whether users need to upload using the cloud SDK themselves and then submit jobs by referencing file locations on the cloud bucket. Thanks!

Sincerely,
Yiming

Error when submitting to caper server on SGE

When I submit my atac-seq input.json files to the caper server, I get the following error:
HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /api/workflows/v1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b9025e3c7b8>: Failed to establish a new connection: [Errno 111] Connection refused'))...I attached my default.conf file.
default.conf.txt

Missing module "AutoURI"

Hi, I have been trying to use the ATAC-Seq Pipeline and have been encountering some issues.
After installing Conda, Caper gives an error saying the "autouri" module is missing. I tried "pip install autouri", after which Caper seems normal when I test it.
However, when I do a Caper run, I get a ValueError that says "Not a valid URI?...". It says the same for the "Test input JSON File" and a JSON file I created. I am not sure where the problem is. Could you please help me with it?
Thanks!

Example WDL file

Hi,

I am working with the Stanford SCG cluster.
I am wondering if you would be kind enough to share an example of a working WDL file (for the RNA-seq pipeline using caper) for me to adapt? Also, if you don't mind, a JSON file. I can't seem to find one in your GitHub (especially the WDL).
Thanks.

Wil

caper submits leader job but no child jobs on SLURM HPC

Hi team,

I'm using the ENCODE chip-seq-pipeline2 and installed the conda environment for it.
I also edited the ~/.caper/default.conf as follows:

backend=slurm

# define one of the followings (or both) according to your
# cluster's SLURM configuration.
slurm-partition=genomics
slurm-account=ls25

# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# "file" method has been default for all old versions of Caper<1.0.
# "path+modtime" is a new default for Caper>=1.0,
#   file: use md5sum hash (slow).
#   path: use path.
#   path+modtime: use path and modification time.
local-hash-strat=path+modtime

# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper store all localized data files
# on this directory (e.g. input FASTQs defined as URLs in input JSON).
local-loc-dir=/home/fyan0011/ls25_scratch/feng.yan/caperfiles/

cromwell=/home/fyan0011/.caper/cromwell_jar/cromwell-52.jar
womtool=/home/fyan0011/.caper/womtool_jar/womtool-52.jar

Then I activated the conda environment and ran this command as per your manual:
sbatch -A ls25 -p genomics --qos=genomics -J chip-seq --export=ALL --mem 4G -t 4:00:00 --wrap 'caper run /home/fyan0011/ls25_scratch/feng.yan/software/chip-seq-pipeline2/chip.wdl -i template.json'
I noticed the qos flag does not seem to be used according to the logs; anyway, the job was submitted, but no child jobs were seen.

The SLURM out file showed:

2020-11-10 15:05:32,956|caper.caper_base|INFO| Creating a timestamped temporary directory. /home/fyan0011/ls25_scratch/feng.yan/caperfiles/chip/20201110_150532_953560
2020-11-10 15:05:32,957|caper.caper_runner|INFO| Localizing files on work_dir. /home/fyan0011/ls25_scratch/feng.yan/caperfiles/chip/20201110_150532_953560
2020-11-10 15:05:34,243|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
2020-11-10 15:05:43,850|caper.cromwell|INFO| Womtool validation passed.
2020-11-10 15:05:43,851|caper.caper_runner|INFO| launching run: wdl=/home/fyan0011/ls25_scratch/feng.yan/software/chip-seq-pipeline2/chip.wdl, inputs=/fs03/ls25/feng.yan/Lmo2_ChIP/test/caper/template.json, backend_conf=/home/fyan0011/ls25_scratch/feng.yan/caperfiles/chip/20201110_150532_953560/backend.conf
2020-11-10 15:06:07,320|caper.cromwell_workflow_monitor|INFO| Workflow: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, status=Submitted
2020-11-10 15:06:07,545|caper.cromwell_workflow_monitor|INFO| Workflow: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, status=Running
2020-11-10 15:06:25,686|caper.cromwell_workflow_monitor|INFO| Task: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, task=chip.read_genome_tsv:-1, retry=0, status=Started, job_id=35516
2020-11-10 15:06:25,697|caper.cromwell_workflow_monitor|INFO| Task: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, task=chip.read_genome_tsv:-1, retry=0, status=WaitingForReturnCode

Could you help with this?
Thank you!
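
One generic way to check whether Cromwell's child tasks ever reach the SLURM queue (not Caper-specific; the job-name pattern matches the sbatch command quoted in an earlier log above):

# child tasks are submitted with names like cromwell_<workflow-id-prefix>_<task-name>
$ squeue -u $USER -o "%.18i %.40j %.10T %.10M"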

caper init subcommand DNE

Hello, for clarity, can you please remove Step 5 caper init [PLATFORM] from the README, since it looks like the config file is now generated by running caper without any arguments?

caper cannot find singularity command if SINGULARITY_CACHEDIR environment variable does not exist

I'm running caper with the slurm backend. When the SINGULARITY_CACHEDIR variable is unset, I get the following error when running caper submit --singularity ... within a bash script:

Traceback (most recent call last):
  File "/N/users/rojoshi/anaconda3/envs/encode-atac-seq-pipeline/bin/caper", line 13, in <module>
    main()
  File "/N/users/rojoshi/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/caper.py", line 1238, in main
    c.submit()
  File "/N/users/rojoshi/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/caper.py", line 382, in submit
    input_file, tmp_dir)
  File "/N/users/rojoshi/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/caper.py", line 745, in __create_workflow_opts_json_file
    self.__build_singularity_image(singularity)
  File "/N/users/rojoshi/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/caper.py", line 1042, in __build_singularity_image
    return check_call(cmd, env=env)
  File "/N/users/rojoshi/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/subprocess.py", line 323, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/N/users/rojoshi/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/subprocess.py", line 304, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/N/users/rojoshi/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/subprocess.py", line 756, in __init__
    restore_signals, start_new_session)
  File "/N/users/rojoshi/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/subprocess.py", line 1499, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'singularity': 'singularity'

If I execute the script that calls caper submit with SINGULARITY_CACHEDIR=<path>/singularity_cachedir/ script.sh, the command executes successfully.

I think these lines in caper.py:__build_singularity_image:1037-1039 are the problem:

            if self._singularity_cachedir is not None \
                    and 'SINGULARITY_CACHEDIR' not in os.environ:
                env = {'SINGULARITY_CACHEDIR': self._singularity_cachedir}

Likely could be fixed by:

            if self._singularity_cachedir is not None \
                    and 'SINGULARITY_CACHEDIR' not in os.environ:
                env=os.environ.copy()
                env['SINGULARITY_CACHEDIR'] = self._singularity_cachedir
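
Until a fix along those lines lands, the workaround described above can be made explicit in the wrapper script itself (the cache-dir path, WDL and input JSON names are placeholders):

# keep the rest of the environment (including PATH) and just add the cache dir
export SINGULARITY_CACHEDIR=$HOME/.caper/singularity_cachedir
caper submit atac.wdl -i input.json --singularity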

Can GCP backend use Genomics API?

Hi,

I'm trying caper on Google Cloud. And since my VPC only supports the us-west-1 region, I can only use the deprecated Genomics API, not the Google Life Sciences API. However, even if I set use-google-cloud-life-sciences to false in the caper default.conf file, the job still failed with an error message saying that the Life Sciences API was not enabled. In contrast, when I directly used Cromwell to run the jobs, it worked properly.

So I'm wondering whether caper still supports the old Genomics API, or whether only the Life Sciences API is supported. Thanks!
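
For reference, the setting in question looks like this in ~/.caper/default.conf (a sketch; whether Caper still falls back to the old Genomics API when it is set to false is exactly what is being asked):

backend=gcp
use-google-cloud-life-sciences=false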

How to specify the Cromwell version

caper currently downloads cromwell-47.jar by default. Is it possible to specify which version to download? I know that it is possible to specify which Cromwell version to run if it is already downloaded, but what about specifying which version to download?

It would be great to know whether this is currently possible; otherwise, it would be an excellent and very useful new feature.

Thanks!
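
As noted, pointing Caper at an already-downloaded JAR is possible via the configuration file; a sketch mirroring the cromwell=/womtool= entries quoted in another issue above (paths are illustrative):

cromwell=/path/to/cromwell-52.jar
womtool=/path/to/womtool-52.jar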

~/.caper/default.conf file isn't created after caper init

I've installed caper v1.6.3 on the MD Anderson seadragon cluster, and after I try caper init seadragon, it shows:

Traceback (most recent call last):
  File "***/.conda/envs/encode-atac-seq-pipeline/bin/caper", line 13, in <module>
    main()
  File "***/.conda/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/cli.py", line 672, in main
    init_caper_conf(parsed_args.conf, parsed_args.platform)
  File "***/.conda/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/caper/caper_init.py", line 176, in init_caper_conf
    raise ValueError('Unsupported backend {p}'.format(p=backend))
ValueError: Unsupported backend seadragon

How do I deal with this problem? I'd really appreciate your response.

caper does not respect --cromwell option

Hi @leepc12,
I am running caper 0.6.3 on Mac using Docker. My caper configuration contains only the line backend=local.
When I run the command

(dnase_dev) ottoDN52es83~/github/dnase-seq-pipeline caper run tests/unit/test_bwa_index.wdl -i tests/unit/json/test_bwa_index.json --cromwell /Users/otto/github/dnase-seq-pipeline/cromwell-48.jar --docker quay.io/encode-dcc/dnase-seq-pipeline:mytag

caper still downloads the cromwell 47:

[CaperURI] read from local, src: /Users/otto/github/dnase-seq-pipeline/tests/unit/json/test_bwa_index.json
[CaperURI] copying from url to local, src: https://github.com/broadinstitute/cromwell/releases/download/47/womtool-47.jar
[CaperURI] file already exists. skip downloading and ignore HTTP_ERR 416
[CaperURI] copying skipped, target: /Users/otto/.caper/womtool_jar/womtool-47.jar
[Caper] Validating WDL/input JSON with womtool...
Success!

However caper runs with the correct cromwell that is defined in the command line:

(dnase_dev) ottoDN52es83~/github/dnase-seq-pipeline caper run tests/unit/test_bwa_index.wdl -i tests/unit/json/test_bwa_index.json --cromwell /this/path/does/not/exist/cromwell-48.jar --docker quay.io/encode-dcc/dnase-seq-pipeline:mytag
[CaperURI] read from local, src: /Users/otto/github/dnase-seq-pipeline/tests/unit/json/test_bwa_index.json
[CaperURI] copying from url to local, src: https://github.com/broadinstitute/cromwell/releases/download/47/womtool-47.jar
[CaperURI] file already exists. skip downloading and ignore HTTP_ERR 416
[CaperURI] copying skipped, target: /Users/otto/.caper/womtool_jar/womtool-47.jar
[Caper] Validating WDL/input JSON with womtool...
Success!
[Caper] cmd:  ['java', '-Xmx3G', '-XX:ParallelGCThreads=1', '-DLOG_LEVEL=INFO', '-DLOG_MODE=standard', '-jar', '-Dconfig.file=/Users/otto/github/dnase-seq-pipeline/.caper_tmp/test_bwa_index/20200122_120544_485865/backend.conf', '/this/path/does/not/exist/cromwell-48.jar', 'run', '/Users/otto/github/dnase-seq-pipeline/tests/unit/test_bwa_index.wdl', '-i', '/Users/otto/github/dnase-seq-pipeline/tests/unit/json/test_bwa_index.json', '-o', '/Users/otto/github/dnase-seq-pipeline/.caper_tmp/test_bwa_index/20200122_120544_485865/workflow_opts.json', '-l', '/Users/otto/github/dnase-seq-pipeline/.caper_tmp/test_bwa_index/20200122_120544_485865/labels.json', '-m', '/Users/otto/github/dnase-seq-pipeline/.caper_tmp/test_bwa_index/20200122_120544_485865/metadata.json']
Error: Unable to access jarfile /this/path/does/not/exist/cromwell-48.jar

So caper functions correctly, but downloads an unnecessary .jar.

Thanks!

caper install

Caper runs fine on Python 3.8.5.

Does anyone know what the possible errors can be? I get the error below on Python 3.9.5 and on Python 2.7.16 :: Anaconda, Inc.
Thank you

Traceback (most recent call last):
  File "/home/shiyi/.local/bin/caper", line 3, in <module>
    from caper.cli import main
  File "/home/shiyi/.local/lib/python3.9/site-packages/caper/cli.py", line 13, in <module>
    from .caper_args import ResourceAnalysisReductionMethod, get_parser_and_defaults
  File "/home/shiyi/.local/lib/python3.9/site-packages/caper/caper_args.py", line 25, in <module>
    from .resource_analysis import ResourceAnalysis
  File "/home/shiyi/.local/lib/python3.9/site-packages/caper/resource_analysis.py", line 10, in <module>
    from sklearn import linear_model
  File "/home/shiyi/.local/lib/python3.9/site-packages/sklearn/__init__.py", line 82, in <module>
    from .base import clone
  File "/home/shiyi/.local/lib/python3.9/site-packages/sklearn/base.py", line 17, in <module>
    from .utils import _IS_32BIT
  File "/home/shiyi/.local/lib/python3.9/site-packages/sklearn/utils/__init__.py", line 20, in <module>
    from scipy.sparse import issparse
  File "/home/shiyi/.local/lib/python3.9/site-packages/scipy/__init__.py", line 153, in <module>
    from scipy._lib._ccallback import LowLevelCallable
  File "/home/shiyi/.local/lib/python3.9/site-packages/scipy/_lib/_ccallback.py", line 1, in <module>
    from . import _ccallback_c
  File "_ccallback_c.pyx", line 210, in init scipy._lib._ccallback_c
  File "/usr/local/lib/python3.9/ctypes/__init__.py", line 8, in <module>
    from _ctypes import Union, Structure, Array
ModuleNotFoundError: No module named '_ctypes'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shiyi/.local/bin/caper", line 10, in <module>
    from caper.cli import main
  File "/home/shiyi/.local/lib/python3.9/site-packages/caper/cli.py", line 13, in <module>
    from .caper_args import ResourceAnalysisReductionMethod, get_parser_and_defaults
  File "/home/shiyi/.local/lib/python3.9/site-packages/caper/caper_args.py", line 25, in <module>
    from .resource_analysis import ResourceAnalysis
  File "/home/shiyi/.local/lib/python3.9/site-packages/caper/resource_analysis.py", line 10, in <module>
    from sklearn import linear_model
  File "/home/shiyi/.local/lib/python3.9/site-packages/sklearn/__init__.py", line 82, in <module>
    from .base import clone
  File "/home/shiyi/.local/lib/python3.9/site-packages/sklearn/base.py", line 17, in <module>
    from .utils import _IS_32BIT
  File "/home/shiyi/.local/lib/python3.9/site-packages/sklearn/utils/__init__.py", line 20, in <module>
    from scipy.sparse import issparse
  File "/home/shiyi/.local/lib/python3.9/site-packages/scipy/__init__.py", line 153, in <module>
    from scipy._lib._ccallback import LowLevelCallable
  File "/home/shiyi/.local/lib/python3.9/site-packages/scipy/_lib/_ccallback.py", line 1, in <module>
    from . import _ccallback_c
  File "_ccallback_c.pyx", line 210, in init scipy._lib._ccallback_c
  File "/usr/local/lib/python3.9/ctypes/__init__.py", line 8, in <module>
    from _ctypes import Union, Structure, Array
ModuleNotFoundError: No module named '_ctypes'
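
A missing _ctypes module usually means that particular Python interpreter was built without libffi support. A hedged sketch of the usual remedy (the package name is for Debian/Ubuntu-style systems, and pyenv is just one way to rebuild the interpreter):

$ sudo apt-get install libffi-dev
# rebuild/reinstall the Python 3.9.5 interpreter so the _ctypes extension gets built
$ pyenv install 3.9.5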

Cannot build a local path from file link on GCP GS

To Whom It May Concern,

I'm running caper v1.6.3 on a GCP instance. When running it with my WDL workflow, which contains the following code:

File ref_index_file = "gs://regev-lab/resources/count_tools/ref_index.tsv"
Map[String, String] ref_index2gsurl = read_map(ref_index_file)

Cromwell gave me the error saying

LinuxFileSystem: Cannot build a local path from gs://regev-lab/resources/count_tools/ref_index.tsv
Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems

and

Evaluating read_map(ref_index_file) failed: Failed to read_map("gs://regev-lab/resources/count_tools/ref_index.tsv") (reason 1 of 1): java.lang.IllegalArgumentException: Could not build the path "gs://regev-lab/resources/count_tools/ref_index.tsv". It may refer to a filesystem not supported by this instance of Cromwell. Supported filesystems are: HTTP, LinuxFileSystem.

Since it refers to the documentation for Cromwell's HPC backend, I suspect that caper treated my instance as HPC. However, I did run caper init gcp, and it worked properly with my other WDL workflows that don't use the read_map function.

The issue went away after I copied the file from GS to the instance, and modified my workflow to refer to the local link.

Do you have any suggestions for fixing this issue? I'm not sure if it is related to caper or Cromwell. But since my workflow was executed successfully on the Broad Terra platform, which uses Cromwell as the engine and GCP as the backend, I guess there is something I may not have configured correctly in caper, rather than in Cromwell.

Any help would be appreciated. Thanks!
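
The workaround described above, spelled out as commands (the bucket path is copied from the WDL snippet; the local destination is illustrative):

$ gsutil cp gs://regev-lab/resources/count_tools/ref_index.tsv /path/to/local/ref_index.tsv
# then point ref_index_file at the local copy instead of the gs:// URL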

cannot run caper

Traceback (most recent call last):
  File "/home/sxx128/caper/bin/caper", line 11, in <module>
    from caper.caper import main
  File "/home/sxx128/caper/bin/../caper/caper.py", line 18, in <module>
    from pyhocon import ConfigFactory, HOCONConverter
ModuleNotFoundError: No module named 'pyhocon'
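
A quick check that often resolves this kind of ModuleNotFoundError: install the missing dependency into the same Python environment that the caper script at /home/sxx128/caper/bin/caper belongs to (pyhocon is a regular PyPI package):

$ python3 -m pip install pyhocon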

List module raises KeyError on Subworkflows

Just started using the tool and am pleased by the promise to have something I can use to coordinate runs across projects.

I'm using caper v0.6.2 and have noticed the list command doesn't handle the JSON structure for subworkflows. Because subworkflows don't have a 'submission' key returned by the API, it raises a KeyError. Are there any plans to handle subworkflows or at least skip them in the future?

Log:
Traceback (most recent call last):
  File "/Users/awaldrop/anaconda3/bin/caper", line 13, in <module>
    main()
  File "/Users/awaldrop/anaconda3/lib/python3.6/site-packages/caper/caper.py", line 1274, in main
    c.list()
  File "/Users/awaldrop/anaconda3/lib/python3.6/site-packages/caper/caper.py", line 482, in list
    submission = w['submission']
KeyError: 'submission'
