gagneurlab / drop

Pipeline to find aberrant events in RNA-Seq data, useful for diagnosis of rare disorders

License: MIT License

Python 48.90% R 45.78% Shell 5.31%

genetic-diagnosis pipeline rna-seq

drop's Issues

Error running demo - TypeError in line 7

Hi,
I'm getting this error in the demo run:
check for missing R packages
create temporary files directory /gpfs/scratch/evrong01/droptest/.drop/tmp
TypeError in line 7 of /gpfs/scratch/evrong01/droptest/Snakefile:
replace() got an unexpected keyword argument 'regex'
File "/gpfs/scratch/evrong01/droptest/Snakefile", line 7, in
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/setupDrop.py", line 20, in setupDrop
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/configHelper.py", line 35, in parse
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/configHelper.py", line 186, in getSampleAnnotation
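
A possible explanation, offered as an assumption because the traceback does not show the failing line itself: if `getSampleAnnotation` calls pandas' `Series.str.replace` with `regex=...`, this error would be expected on pandas older than 0.23.0, where that keyword did not exist yet. The sketch below only compares version strings; `version_tuple` and `supports_regex_kwarg` are hypothetical helper names, not part of DROP.

```python
import re


def version_tuple(version):
    """Turn '0.22.0' or '0.23.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def supports_regex_kwarg(pandas_version):
    # Series.str.replace(..., regex=...) is available from pandas 0.23.0 onwards.
    return version_tuple(pandas_version) >= (0, 23)
```

Calling `supports_regex_kwarg(pandas.__version__)` in the Python environment that runs the Snakefile would show whether the installed pandas is new enough.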

Error running demo dataset - .onLoad failed in loadNamespace() for 'shiny'

Hi,

When I am trying to run this pipeline with demo datasets I have this problem:

[Fri Jan 10 14:34:44 2020]
rule Scripts_Outrider_Summary_R:
    input: demo/Output/processed_results/aberrant_expression/v29/outrider/outrider/ods.Rds, demo/Output/processed_results/aberrant_expression/v29/outrider/outrider/OUTRIDER_results.tsv, Scripts/Outrider/Summary.R
    output: demo/Output/htmlOutput/AberrantExpression/Outrider/v29/Summary_outrider.html
    jobid: 9
    wildcards: annotation=v29, dataset=outrider

Loading required package: knitr
Loading required package: rmarkdown
[1] TRUE


processing file: /scratch_tmp/19940125/RtmpFkh6vQ/file5eb9168ca90c/Summary.spin.Rmd
  |....                                                                                                          |   4%
   inline R code fragments

  |........                                                                                                      |   8%
label: unnamed-chunk-1 (with options) 
List of 1
 $ echo: symbol F

  |.............                                                                                                 |  12%
  ordinary text without R code

  |.................                                                                                             |  15%
label: unnamed-chunk-2
  |.....................                                                                                         |  19%
   inline R code fragments

  |.........................                                                                                     |  23%
label: unnamed-chunk-3
  |..............................                                                                                |  27%
  ordinary text without R code

  |..................................                                                                            |  31%
label: unnamed-chunk-4
  |......................................                                                                        |  35%
  ordinary text without R code

  |..........................................                                                                    |  38%
label: countCorHeatmap (with options) 
List of 2
 $ fig.height: num 8
 $ fig.width : num 8

  |...............................................                                                               |  42%
  ordinary text without R code

  |...................................................                                                           |  46%
label: geneSampleHeatmap (with options) 
List of 2
 $ fig.height: num 12
 $ fig.width : num 6

  |.......................................................                                                       |  50%
  ordinary text without R code

  |...........................................................                                                   |  54%
label: unnamed-chunk-5
  |...............................................................                                               |  58%
label: BCV (with options) 
List of 2
 $ fig.height: num 5
 $ fig.width : num 6

  |....................................................................                                          |  62%
  ordinary text without R code

  |........................................................................                                      |  65%
label: unnamed-chunk-6
  |............................................................................                                  |  69%
  ordinary text without R code

  |................................................................................                              |  73%
label: unnamed-chunk-7
  |.....................................................................................                         |  77%
  ordinary text without R code

  |.........................................................................................                     |  81%
label: unnamed-chunk-8
  |.............................................................................................                 |  85%
  ordinary text without R code

  |.................................................................................................             |  88%
label: unnamed-chunk-9 (with options) 
List of 1
 $ echo: symbol F

**Quitting from lines 149-152 (/scratch_tmp/19940125/RtmpFkh6vQ/file5eb9168ca90c/Summary.spin.Rmd) 
Error: .onLoad failed in loadNamespace() for 'shiny', details:
  call: NULL
  error: invalid version specification '1,5'**
In addition: Warning messages:
1: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
  span too small.   fewer data values than degrees of freedom.
2: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
  pseudoinverse used at 0.47601
3: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
  neighborhood radius 0.12605
4: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
  reciprocal condition number  0
5: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :
  There are other near singularities as well. 0.0096078
6: In predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)),  :
  span too small.   fewer data values than degrees of freedom.
7: In predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)),  :
  pseudoinverse used at 0.47601
8: In predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)),  :
  neighborhood radius 0.12605
9: In predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)),  :
  reciprocal condition number  0
10: In predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)),  :
  There are other near singularities as well. 0.0096078

Execution halted

VCF source

Hi,
What are the possible sources for the input VCF file? Does it have to be whole-genome sequencing VCF? Or can it be WES, or even RNA-seq-based VCF files?

Sample labels for batch correlation plots

How do I display or find out the sample names on the batch correlation plots?
It would be helpful to know which samples are clustering together into batches to understand which batches are different.

Error in aberrantExpression: Cannot detect whether 'file' is a GFF3 or GTF

I'm getting this error below. Please advise.

[Tue Jun 9 23:15:19 2020]
rule bam_stats:
input: UDP-1002_RNA.cram, resource/chr_UCSC_NCBI.txt
output: /droptest/root/processed_data/aberrant_expression/v32/coverage/UDP-1002_RNA.tsv
jobid: 78
wildcards: annotation=v32, sampleID=UDP-1002_RNA

Error in makeTxDbFromGFF(snakemake@input$gtf) :
Cannot detect whether 'file' is a GFF3 or GTF file. Please use the
'format' argument to specify the format ("gff3" or "gtf").
Execution halted
[Tue Jun 9 23:15:58 2020]
Error in rule Scripts_Counting_preprocessGeneAnnotation_R:
jobid: 8
output: /droptest/root/processed_data/aberrant_expression/v32/txdb.db, /droptest/root/processed_data/aberrant_expression/v32/count_ranges.Rds, /droptest/root/processed_data/aberrant_expression/v32/gene_name_mapping_v32.tsv

CRAM files

Does DROP accept CRAM files? Because I'm getting the below error:

Error in value[3L] :
failed to open BamFile: 'filename' is not a BAM file
file: /gpfs/data/mapping/UDP-1000/RNA/UDP-1001_RNA.cram
Calls: seqlevelsStyle ... tryCatch -> tryCatchList -> tryCatchOne ->
Execution halted
[Mon Jun 15 19:46:14 2020]
Error in rule Scripts_Counting_countReads_R:
jobid: 167
output: /gpfs/scratch/evrong01/droptest/root/processed_data/aberrant_expression/v32/counts/UDP-1001_RNA.Rds
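
The message says the counting step opens the input as a BAM file, so one workaround, until CRAM is confirmed as supported, is to convert each CRAM to BAM first. A minimal sketch that builds the usual `samtools view` and `samtools index` commands; the helper names and the reference FASTA path are assumptions for illustration, and the CRAM has to be decoded against the same reference it was written with.

```python
import subprocess
from pathlib import Path


def cram_to_bam_commands(cram_path, reference_fasta, samtools="samtools"):
    """Build the commands that convert one CRAM file to an indexed BAM."""
    cram = Path(cram_path)
    bam = cram.with_suffix(".bam")  # same folder, .bam extension
    convert = [samtools, "view", "-b", "-T", str(reference_fasta),
               "-o", str(bam), str(cram)]
    index = [samtools, "index", str(bam)]
    return [convert, index]


def run_all(commands):
    # Stop at the first failing command instead of indexing a broken BAM.
    for command in commands:
        subprocess.run(command, check=True)
```

The resulting BAM path is then what goes into the sample annotation instead of the CRAM path.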

Bugs in conda mae pipeline

Hi,
Here is the next bug in the mae pipeline I am getting. It looks much more like a bug, because gatk is getting an error due to an existing .dict file. This should not give an error, and the gatk command should be told to overwrite it if it exists.
Please let me know when it is fixed.

There are 3 other bugs in the conda mae pipeline that prevent it from working. But I think it will be easier to fix them one by one. I can post the next one after this is fixed.

[Mon Aug 3 22:51:08 2020]
rule create_dict:
input: /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa
output: /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.dict
jobid: 31

INFO 2020-08-03 22:51:54 CreateSequenceDictionary Output dictionary will be written in /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.dict
22:51:54.446 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/data/bin/drop_conda/share/gatk4-4.1.8.1-0/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Aug 03 22:51:54 EDT 2020] CreateSequenceDictionary --REFERENCE /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Mon Aug 03 22:51:56 EDT 2020] Executing as evrong01@cn-0044 on Linux 3.10.0-693.17.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_192-b01; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.8.1
[Mon Aug 03 22:51:57 EDT 2020] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.05 minutes.
Runtime.totalMemory()=2667577344
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
picard.PicardException: /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.dict already exists. Delete this file and try again, or specify a different output file.
at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:220)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /gpfs/data/bin/drop_conda/share/gatk4-4.1.8.1-0/gatk-package-4.1.8.1-local.jar
Running:
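
Note that in the log above the rule expects `GRCh38.dict`, while the tool writes `GRCh38.primary_assembly.genome.dict` next to the FASTA file, so the rule's output never appears and the dictionary is created again on every attempt. A sketch of the behaviour being requested, assuming a hypothetical helper `create_dict_command`: pass the expected output explicitly with `--OUTPUT` and skip the step when the file is already there.

```python
from pathlib import Path


def create_dict_command(fasta, dict_path, gatk="gatk"):
    """Return the GATK command, or None when the dictionary already exists."""
    if Path(dict_path).exists():
        return None  # nothing to do, avoids the 'already exists' exception
    return [gatk, "CreateSequenceDictionary",
            "--REFERENCE", str(fasta),
            "--OUTPUT", str(dict_path)]
```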

BAM file alignment

Hi,
Does the alignment software need to be the same for all samples? For example, I would like to use GTEx samples that are only available as BAM files. However, the read aligner that GTEx used is different than the one I use. Is it ok to use my BAM files together with GTEx BAM files?

Also, GTEx aligned to GENCODE v26, while I aligned to GENCODE v32. So the transcriptome references used were different.

Same question for VCF input files.
Thanks.

minDeltaPsi default

The default config.yaml created by 'drop init' sets minDeltaPsi to 0. However, your documentation says the default should be 0.05.

Which value should it be?

drop update requires to rerun the full pipeline

When running drop update, the whole pipeline has to be rerun.

This is because we rely on the files in our modules, which are replaced by the update command. Maybe it would be better to use something like rsync that updates a file only if it has changed. This would ensure that we only have to rerun what has changed.

@vyepez88 @mumichae

how to reproduce it:

drop demo
drop update
snakemake --cores 10

# this should not do anything as everything should be up to date
drop update
snakemake --cores 10
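
The rsync-like behaviour suggested above could look roughly like this. It is a sketch, not the current `drop update` code: `sync_changed` is a hypothetical helper that copies a file only when its content differs, so unchanged module files keep their old modification time. It assumes Snakemake decides what to rerun from file modification times.

```python
import filecmp
import shutil
from pathlib import Path


def sync_changed(src_dir, dst_dir):
    """Copy files from src_dir to dst_dir only when their content differs."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    copied = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_dir / src.relative_to(src_dir)
        if dst.exists() and filecmp.cmp(src, dst, shallow=False):
            continue  # identical content: leave the file and its mtime alone
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)  # changed or new file gets a fresh mtime
        copied.append(dst)
    return copied
```

With this, a second update right after the first one returns an empty list and touches nothing.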

Errors/warnings in running demo

Hi, I'm getting various errors/warnings in running the demo. See below.

/gpfs/share/apps/pandoc/2.2.3.2/bin/pandoc +RTS -K512m -RTS /tmp/Rtmp0annei/file5241f2818ed4f/Datasets.utf8.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash+smart --output /tmp/Rtmp0annei/file5241f11371564/Scripts_OUTRIDER_Datasets.html --email-obfuscation none --standalone --section-divs --table-of-contents --toc-depth 3 --variable toc_float=1 --variable toc_selectors=h1,h2,h3 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /gpfs/share/apps/R/3.6.3/lib64/R/library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --css lib/add_content_table.css --css lib/leo_style.css --variable 'theme:bootstrap' --include-in-header /tmp/Rtmp0annei/rmarkdown-str5241f5299b0a7.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --lua-filter /gpfs/share/apps/R/3.6.3/lib64/R/library/rmarkdown/rmd/lua/pagebreak.lua --lua-filter /gpfs/share/apps/R/3.6.3/lib64/R/library/rmarkdown/rmd/lua/latex-div.lua --variable code_folding=hide --variable source_embed=Datasets.R --include-after-body /tmp/Rtmp0annei/file5241f340af0de.html --variable code_menu=1
[WARNING] Could not parse YAML metadata at line 1 column 1: :8:122: Unexpected '

[Mon May 18 11:47:58 2020]
rule markdown:
input: aberrant_expression_readme.md
output: /gpfs/scratch/evrong01/droptest/Output/htmlOutput/aberrant_expression_readme.html
jobid: 3
[WARNING] This document format requires a nonempty <title> element.
Please specify either 'title' or 'pagetitle' in the metadata.
Falling back to 'aberrant_expression_readme'

/gpfs/share/apps/pandoc/2.2.3.2/bin/pandoc +RTS -K512m -RTS /tmp/RtmpEilFDB/file52b51499c4c67/Datasets.utf8.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash+smart --output /tmp/RtmpEilFDB/file52b51182475e8/Scripts_Counting_Datasets.html --email-obfuscation none --standalone --section-divs --table-of-contents --toc-depth 3 --variable toc_float=1 --variable toc_selectors=h1,h2,h3 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /gpfs/share/apps/R/3.6.3/lib64/R/library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --css lib/add_content_table.css --css lib/leo_style.css --variable 'theme:bootstrap' --include-in-header /tmp/RtmpEilFDB/rmarkdown-str52b517a25e6e6.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --lua-filter /gpfs/share/apps/R/3.6.3/lib64/R/library/rmarkdown/rmd/lua/pagebreak.lua --lua-filter /gpfs/share/apps/R/3.6.3/lib64/R/library/rmarkdown/rmd/lua/latex-div.lua --variable code_folding=hide --variable source_embed=Datasets.R --include-after-body /tmp/RtmpEilFDB/file52b517b12ba71.html --variable code_menu=1
[WARNING] Could not parse YAML metadata at line 1 column 1: :9:121: Unexpected '

[Mon May 18 11:48:38 2020]
rule create_graph:
output: /gpfs/scratch/evrong01/droptest/Output/htmlOutput/AE_rulegraph.svg, /gpfs/scratch/evrong01/droptest/Output/htmlOutput/AE_rulegraph.png
jobid: 1

WARNING: Less than 30 IDs in DROP_GROUP outrider
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
Error: Directory cannot be locked. This usually means that another Snakemake instance is running on this directory. Another possibility is that a previous run exited unexpectedly.
WARNING: Less than 30 IDs in DROP_GROUP outrider
Structuring dependencies...
Dependencies file generated.

Mon May 18 11:56:11 2020: Computing PCA ...
Mon May 18 11:56:11 2020: Fitting rho ...
Min. 1st Qu. Median Mean 3rd Qu. Max.
6.000e-08 6.000e-08 6.000e-08 2.712e-02 3.189e-02 1.162e-01
Mon May 18 11:56:13 2020: Writing final FRASER object ('/gpfs/scratch/evrong01/droptest/Output/processed_data/aberrant_splicing/datasets//savedObjects/fraser/fds-object.RDS').
Warning messages:
1: In injectOutliers(fds_copy, type = type, freq = injectFreq, minDpsi = minDeltaPsi, :
Injection-frequency is to low. Increasing it to 0.015 so we can inject at least 10 events into the data set!
2: In optimHyperParams(fds, type = type, implementation = implementation, :
No outliers could be injected so the hyperparameter optimization could not run. Possible reason: too few junctions in the data.
3: In injectOutliers(fds_copy, type = type, freq = injectFreq, minDpsi = minDeltaPsi, :
Injection-frequency is to low. Increasing it to 0.017 so we can inject at least 10 events into the data set!
4: In optimHyperParams(fds, type = type, implementation = implementation, :
No outliers could be injected so the hyperparameter optimization could not run. Possible reason: too few junctions in the data.
5: In injectOutliers(fds_copy, type = type, freq = injectFreq, minDpsi = minDeltaPsi, :
Injection-frequency is to low. Increasing it to 0.11 so we can inject at least 10 events into the data set!
Mon May 18 11:56:14 2020: Writing final FRASER object ('/gpfs/scratch/evrong01/droptest/Output/processed_data/aberrant_splicing/datasets//savedObjects/fraser/fds-object.RDS').

/gpfs/share/apps/pandoc/2.2.3.2/bin/pandoc +RTS -K512m -RTS /tmp/RtmpT6Hg3C/file60a823d069667/Summary.utf8.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash+smart --output /tmp/RtmpT6Hg3C/file60a821c21ec95/fraser_countSummary.html --email-obfuscation none --standalone --section-divs --table-of-contents --toc-depth 3 --variable toc_float=1 --variable toc_selectors=h1,h2,h3 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /gpfs/share/apps/R/3.6.3/lib64/R/library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --css lib/add_content_table.css --css lib/leo_style.css --variable 'theme:bootstrap' --include-in-header /tmp/RtmpT6Hg3C/rmarkdown-str60a8261548510.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --lua-filter /gpfs/share/apps/R/3.6.3/lib64/R/library/rmarkdown/rmd/lua/pagebreak.lua --lua-filter /gpfs/share/apps/R/3.6.3/lib64/R/library/rmarkdown/rmd/lua/latex-div.lua --variable code_folding=hide --variable code_menu=1

Output created: /tmp/RtmpT6Hg3C/file60a821c21ec95/fraser_countSummary.html
Warning messages:
1: Transformation introduced infinite values in continuous y-axis
2: Removed 380 rows containing missing values (geom_bar).
3: Removed 75 rows containing non-finite values (stat_bin).
4: Transformation introduced infinite values in continuous y-axis
5: Removed 195 rows containing missing values (geom_bar).
[1] TRUE TRUE TRUE TRUE

Bus error

Hi,
I keep getting this error. I thought it was a memory issue, but I'm running with 70 GB of memory for 100 samples, so that should not be an issue.

*** caught segfault ***
address (nil), cause 'unknown'

Traceback:
1: doTryCatch(return(expr), name, parentenv, handler)
2: tryCatchOne(expr, names, parentenv, handlers[[1L]])
3: tryCatchList(expr, classes, parentenv, handlers)
4: tryCatch({ .Call(func, .extptr(file), regions, flag, simpleCigar, tagFilter, mapqFilter, ...)}, error = function(err) { stop(conditionMessage(err), "\n file: ", path(file), "\n index: ", index(file))})
5: .io_bam(.scan_bamfile, file, reverseComplement, yieldSize(file), tmpl, obeyQname(file), asMates(file), qnamePrefix, qnameSuffix, param = param)
6: scanBam(file, param = param)
7: scanBam(file, param = param)
8: .load_bamcols_from_BamFile(file, param, what0, with.which_label = with.which_label)
9: readGAlignments(file, use.names = use.names, param = param2, with.which_label = with.which_label)
10: readGAlignments(file, use.names = use.names, param = param2, with.which_label = with.which_label)
11: FUN(bf, param = param, ...)
12: FUN(bf, param = param, ...)
13: .countWithYieldSize(FUN, features, bf, mode, ignore.strand, inter.feature, param, preprocess.reads, ...)
14: FUN(...)
15: doTryCatch(return(expr), name, parentenv, handler)
16: tryCatchOne(expr, names, parentenv, handlers[[1L]])
17: tryCatchList(expr, classes, parentenv, handlers)
18: tryCatch({ FUN(...)}, error = handle_error)
19: withCallingHandlers({ tryCatch({ FUN(...) }, error = handle_error)}, warning = handle_warning)
20: FUN(...)
21: FUN(X[[i]], ...)
22: lapply(X, FUN_, ...)
23: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = param)
24: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = param)
25: bplapply(setNames(seq_along(reads), names(reads)), function(i, FUN, reads, features, mode, ignore.strand, inter.feature, param, preprocess.reads, ...) { bf <- reads[[i]] .countWithYieldSize(FUN, features, bf, mode, ignore.strand, inter.feature, param, preprocess.reads, ...)}, FUN, reads, features, mode = match.fun(mode), ignore.strand = ignore.strand, inter.feature = inter.feature, param = param, preprocess.reads = preprocess.reads, ...)
26: bplapply(setNames(seq_along(reads), names(reads)), function(i, FUN, reads, features, mode, ignore.strand, inter.feature, param, preprocess.reads, ...) { bf <- reads[[i]] .countWithYieldSize(FUN, features, bf, mode, ignore.strand, inter.feature, param, preprocess.reads, ...)}, FUN, reads, features, mode = match.fun(mode), ignore.strand = ignore.strand, inter.feature = inter.feature, param = param, preprocess.reads = preprocess.reads, ...)
27: .dispatchBamFiles(features, BamFileList(reads), mode, ignore.strand, inter.feature = inter.feature, singleEnd = singleEnd, fragments = fragments, param = param, preprocess.reads = preprocess.reads, ...)
28: .local(features, reads, mode, ignore.strand, ...)
29: summarizeOverlaps(count_ranges, bam_file, mode = count_mode, singleEnd = !paired_end, ignore.strand = !strand_spec, fragments = F, count.mapped.reads = T, inter.feature = inter_feature, preprocess.reads = preprocess_reads, BPPARAM = MulticoreParam(snakemake@threads))
30: summarizeOverlaps(count_ranges, bam_file, mode = count_mode, singleEnd = !paired_end, ignore.strand = !strand_spec, fragments = F, count.mapped.reads = T, inter.feature = inter_feature, preprocess.reads = preprocess_reads, BPPARAM = MulticoreParam(snakemake@threads))
An irrecoverable exception occurred. R is aborting now ...
/usr/bin/bash: line 1: 423802 Segmentation fault Rscript --vanilla /gpfs/scratch/evrong01/droptest/.drop/modules/aberrant-expression-pipeline/.snakemake/scripts/tmpb18uoms6.countReads.R
[Wed Jun 10 15:35:40 2020]
Error in rule Scripts_Counting_countReads_R:
jobid: 138
output: /gpfs/scratch/evrong01/droptest/root/processed_data/aberrant_expression/v32/counts/GTEX-11ONC_RNA.Rds

/usr/bin/bash: line 1: 423809 Killed Rscript --vanilla /gpfs/scratch/evrong01/droptest/.drop/modules/aberrant-expression-pipeline/.snakemake/scripts/tmpgqqfg8w3.countReads.R
[Wed Jun 10 15:35:41 2020]
:

*** caught bus error ***
address 0x7ff6869431f0, cause 'non-existent physical address'

Too few IDs in DROP_GROUP... please ensure that it has at least 10 IDs

I'm comparing 100 GTEX samples to 3 families with 3 different rare diseases, with 2, 3, and 3 samples respectively (8 samples in total).

I gave the GTEX samples one DROP_GROUP, and the 8 unrelated samples from rare disease families a different DROP_GROUP.

However, I am getting this error:
Too few IDs in DROP_GROUP ... please ensure that it has at least 10 IDs

What is the best way to analyze a small set of rare disease families together with a large set of control samples? Should I instead put all the samples in 1 group? I guess I'm not sure how DROP actually works in terms of groups. Does DROP still find outliers even within groups? If so, what is the purpose of groups in the first place? Could I just give all samples the same group name?
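
To see in advance which groups would trip this check, the sample annotation can be tallied per group. A minimal sketch, assuming a tab-separated annotation with a `DROP_GROUP` column and, as a further assumption, that a sample may list several groups separated by commas; the threshold of 10 comes from the error message above, and the helper names are hypothetical.

```python
import csv
import io
from collections import Counter


def group_sizes(tsv_text, group_column="DROP_GROUP"):
    """Count how many samples are assigned to each group."""
    counts = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Assumption: several groups per sample are written as 'a,b'.
        for group in (row.get(group_column) or "").split(","):
            group = group.strip()
            if group:
                counts[group] += 1
    return counts


def small_groups(counts, minimum=10):
    """Names of groups with fewer samples than the minimum."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

For the setup described here, the 8-sample patient group would be reported and the 100-sample GTEx group would not.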

Viewing plot of gene expression across all samples in aberrantExpression pipeline

Is there an easy way, or can you add to the output a way to see a plot of the gene expression of any specific gene across all samples, for all the genes that are outliers, with the outlier sample in a different color point and labeled?

Cannot unlock directory

Hi, Snakemake will not let me unlock the directory.
I am running snakemake --unlock, but I'm still getting this when I run the drop pipeline:

check for missing R packages
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
Unlocking working directory.
check for missing R packages
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
Executing subworkflow AE.
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
Error: Directory cannot be locked. Please make sure that no other Snakemake process is trying to create the same files in the following directory:
/gpfs/scratch/evrong01/droptest/.drop/modules/aberrant-expression-pipeline
If you are sure that no other instances of snakemake are running on this directory, the remaining lock was likely caused by a kill signal or a power loss. It can be removed with the --unlock argument.
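
The directory named in the error is the module directory, not the project root where `snakemake --unlock` was run. Snakemake keeps its lock files in `.snakemake/locks` inside the working directory it locked, so a quick search can show which directories still hold one. A sketch with a hypothetical helper `find_snakemake_locks`; it only lists files and leaves the decision whether a lock is stale to you.

```python
import os
from pathlib import Path


def find_snakemake_locks(project_dir):
    """List lock files in every .snakemake/locks folder below project_dir."""
    locks = []
    for root, _dirs, files in os.walk(project_dir):
        path = Path(root)
        if path.name == "locks" and path.parent.name == ".snakemake":
            locks.extend(path / name for name in files)
    return sorted(locks)
```

If this reports files under `.drop/modules/aberrant-expression-pipeline/.snakemake/locks` while no Snakemake process is running, the lock there is the leftover one.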

Error: unsupported operand type(s) for +: 'NoneType' and 'str'

Hi,
I'm getting the below error:

[evrong01@bigpurple-ln3 drop100test]$ snakemake sampleAnnotation
check for missing R packages
TypeError in line 7 of /gpfs/scratch/evrong01/drop100test/Snakefile:
unsupported operand type(s) for +: 'NoneType' and 'str'
File "/gpfs/scratch/evrong01/drop100test/Snakefile", line 7, in
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/setupDrop.py", line 13, in setupDrop
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/configHelper.py", line 24, in init
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/configHelper.py", line 75, in createDirs
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/configHelper.py", line 285, in getProcDataDir
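
Since the traceback ends in `getProcDataDir`, one plausible cause is a directory entry in config.yaml that is empty and therefore loaded as `None` before a path suffix is appended. This is an assumption; the key name `root` and the suffix used below are illustrative, not taken from the DROP source. A sketch of a check that fails with a clearer message:

```python
def missing_keys(config, required=("root",)):
    """Return required config keys that are absent or set to None."""
    return [key for key in required if config.get(key) is None]


def proc_data_dir(config):
    missing = missing_keys(config)
    if missing:
        raise ValueError(
            "config.yaml has empty or missing entries: " + ", ".join(missing)
        )
    # Hypothetical layout: processed data lives below the root directory.
    return config["root"] + "/processed_data"
```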

sampleID

In the aberrantExpression output, it labels samples with sampleIDs such as sample_10, sample_11, etc. But I don't see any of the sample IDs that I configured in the original samples.tsv file. How do I convert these sampleIDs? Can you change it so the output uses my sample IDs? Otherwise, I have no ability to analyze the data.

Pseudoexon outlier not detected

Hi,
I have a sample with a known pathogenic pseudoexon that is completely (100%) the only isoform. The pseudoexon is between two constitutive exons. So there should be 3 outliers detected by DROP:

1. Reduction in normal splicing between the flanking exons.
2. New splice site between the upstream exon and the pseudoexon.
3. New splice site between the downstream exon and the pseudoexon.

The results by junction table only detected #1 and #3. But it did not detect #2. However, there are many splice junction reads for #2, and there is no chance this outlier event is present in any other samples.

Can you help me figure out why #2 was not detected? This might help you improve your pipeline, because positive control cases like this where there is a very striking splice abnormality are useful.

COUNT_MODE and COUNT_OVERLAPS - explanation

Can you please further clarify the meanings of these annotation columns?

COUNT_MODE: either “Union”, “IntersectionStrict” or “IntersectionNotEmpty”.

COUNT_OVERLAPS: either TRUE or FALSE, depending on whether reads overlapping different regions are allowed and counted.
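
For orientation: the traceback in the "Bus error" issue shows the counting call `summarizeOverlaps(..., mode = count_mode, inter.feature = inter_feature, ...)`, and the three mode names match the usual HTSeq-style counting modes. The toy sketch below illustrates those modes on half-open integer intervals. How COUNT_OVERLAPS maps onto `inter.feature` is an assumption here, modelled as a `count_overlaps` switch that keeps reads hitting several features; all names in the code are hypothetical.

```python
def features_at(pos, features):
    """Names of all features covering a single position."""
    return {name for name, (start, end) in features.items() if start <= pos < end}


def assign_read(read, features, mode, count_overlaps=False):
    """Return the features a read is counted for under the given mode."""
    start, end = read
    sets = [features_at(pos, features) for pos in range(start, end)]
    if mode == "Union":
        # Any feature touched anywhere along the read.
        hits = set().union(*sets)
    elif mode == "IntersectionStrict":
        # Only features covering every position of the read.
        hits = set.intersection(*sets)
    elif mode == "IntersectionNotEmpty":
        # Like strict, but positions outside all features are ignored.
        non_empty = [s for s in sets if s]
        hits = set.intersection(*non_empty) if non_empty else set()
    else:
        raise ValueError("unknown mode: " + mode)
    if len(hits) == 1 or count_overlaps:
        return sorted(hits)
    return []  # ambiguous or no feature: the read is not counted
```

With two overlapping genes `A = (0, 100)` and `B = (80, 200)`, a read at `(90, 120)` is dropped under Union because it touches both genes, but is counted for `B` under both intersection modes because it lies fully inside `B` only.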

gatk bug in mae pipeline

Hi,
I got this error below. However, gatk is configured and works fine.

tools:
gatkCmd: /gpfs/data/bin/gatk/gatk
bcftoolsCmd: bcftools
samtoolsCmd: samtools

[evrong01@bigpurple-ln2 droptest]$ /gpfs/data/bin/gatk/gatk

Usage template for all tools (uses --spark-runner LOCAL when used with a Spark tool)
gatk AnyTool toolArgs

Usage template for Spark tools (will NOT work on non-Spark tools)
gatk SparkTool toolArgs [ -- --spark-runner <LOCAL | SPARK | GCS> sparkArgs ]

Getting help
gatk --list Print the list of available tools

gatk Tool --help  Print help on a particular tool

ERROR:
[Tue Jul 21 10:25:50 2020]
rule create_dict:
input: /gpfs/data/evronylab/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa
output: /gpfs/data/evronylab/reference-files/GRCh38_gencode-STAR/GRCh38.dict
jobid: 31

/usr/bin/bash: gatk: command not found
[Tue Jul 21 10:25:50 2020]
Error in rule create_dict:
jobid: 31
output: /gpfs/data/evronylab/reference-files/GRCh38_gencode-STAR/GRCh38.dict
shell:
gatk CreateSequenceDictionary --REFERENCE /gpfs/data/evronylab/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa
(exited with non-zero exit code)
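
The config above sets `gatkCmd` to `/gpfs/data/bin/gatk/gatk`, yet the shell line in the log runs a bare `gatk`, which suggests the rule does not use the configured value. A sketch of what using it could look like; `resolve_tool` and `create_dict_shell` are hypothetical helpers, and only the `tools` keys shown in the config are taken from this report.

```python
import shlex
import shutil


def resolve_tool(tools, key, default):
    """Look up a tool command in the config and check that it is executable."""
    command = tools.get(key) or default
    resolved = shutil.which(command)
    if resolved is None:
        raise FileNotFoundError(
            f"{key}={command!r} is not an executable on this system"
        )
    return resolved


def create_dict_shell(tools, fasta):
    # Build the shell line from the configured command instead of a bare 'gatk'.
    gatk = resolve_tool(tools, "gatkCmd", "gatk")
    return f"{gatk} CreateSequenceDictionary --REFERENCE {shlex.quote(str(fasta))}"
```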

Using python3

reposting misago's issue:

Hi,

Although I have found the problem, now I have another error with snakemake:

I'm using CentOS 7, which still ships Python 2.7. So on this operating system, running plain 'python' gives you Python 2.7. To work with Python 3 you have to run the command python3:

Singularity> cat /etc/centos-release
CentOS Linux release 7.7.1908 (Core)
Singularity> python --version
Python 2.7.5
Singularity> python3 --version
Python 3.6.8
Singularity>

So, in the script drop/download_data.sh, to be sure that I'm executing Python 3 (the Python where DROP is installed), I had to change the line:

python fix_sample_anno.py

to:

python3 fix_sample_anno.py

After reinstalling DROP, the command 'drop demo' works fine.

The new error is with snakemake:

Singularity> snakemake -n
check for missing R packages
WARNING: Less than 30 IDs in DROP_GROUP outrider
WARNING: Less than 30 IDs in DROP_GROUP fraser
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
Executing subworkflow AE.
Error: Snakefile "/root/drop-demo/.drop/modules/aberrant-expression-pipeline/Snakefile" not found.
Singularity>

The content of the .drop directory in the demo project directory is:

Singularity> ls -l .drop
total 8
drwxr-xr-x 3 root root 4096 mar 26 17:05 modules
drwxr-xr-x 5 root root 4096 mar 26 17:07 tmp
Singularity> ls -l .drop/modules/
total 4
drwxr-xr-x 2 root root 4096 mar 26 17:05 helpers
Singularity>

Any idea? Thanks a lot for your time.

qcgroups

Hi,
'drop init' sets the default for qcGroups to mae. Why is that? I'm confused about what that means.

Sample annotation columns

Hi,
There is some inconsistency in sample annotation columns listed in the protocol here: https://protocolexchange.researchsquare.com/article/pex-787/v1

versus the Documentation site

versus the example table here: https://github.com/gagneurlab/drop/blob/d30fe27368b9fe0226863dc0d0486a8d2e97ebb3/manuscript/TableS1_sample_annotation.tsv

Each of these has a different set of columns, so I'm not sure which are necessary and which are optional. For example, STRAND and SEX are only listed in some of them. It isn't clear what phase1TG and RNA_exists are, etc.

Can you please clarify?
Thanks.

Different STRAND in the same analysis

Can samples with different STRAND values be used together in the same analysis?

qc_vcf_1000G.vcf.gz file

Hi, the documentation says to download this, but I can't find it.
Thanks.

Rerun snakemake pipeline even if already run

Is there a way to force rerun of all the jobs of a pipeline even though it was already run?
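One possible approach, assuming the standard Snakemake command-line flags apply here: `--forceall` asks Snakemake to re-execute the selected target and all of its dependencies regardless of existing output, and `-n` previews the plan without running anything. Whether the flag propagates into Drop's subworkflows is not confirmed in this thread. The helper below is hypothetical and only assembles the command:

```python
def rerun_command(target=None, cores=1, dry_run=True):
    """Assemble a snakemake command line that forces all jobs to rerun."""
    cmd = ["snakemake", "--forceall", "--cores", str(cores)]
    if dry_run:
        # -n is the dry-run flag already used elsewhere in these issues.
        cmd.append("-n")
    if target:
        cmd.append(target)
    return cmd


if __name__ == "__main__":
    print(" ".join(rerun_command("aberrantExpression", cores=2)))
```

Starting with the dry run shows how many jobs would be scheduled before committing to a full rerun.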

Can't unlock directory

I did snakemake --unlock, but I am still getting this error:

check for missing R packages
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
Executing subworkflow AE.
Structuring dependencies...
Dependencies file generated.

MissingInputException

Hi, I'm getting these two errors when running snakemake unlock; snakemake aberrantSplicing:

Building DAG of jobs...
MissingInputException in line 62 of /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule create_SNVs:
/gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt
/gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt
/gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
[Mon Jul 6 22:31:57 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest/.drop/tmp/config.yaml
(exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

RNA_ID and DNA_ID

Can these two be the same? Because it would be simpler to have the same ID for both of these when it is the same sample.

Reads with ambiguous pairing

I'm getting these warning messages. What do they mean? Are they a problem?

counting
Warning messages:
1: In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, :
11612496 alignments with ambiguous pairing were dumped.
Use 'getDumpedAlignments()' to retrieve them from the dump environment.
2: In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, :
11046444 alignments with ambiguous pairing were dumped.
Use 'getDumpedAlignments()' to retrieve them from the dump environment.

Error running demo project

Hello

I'm installing the drop software in a Singularity container with CentOS 7. I have already installed all the dependencies and the drop software as explained in the Readme file. When I execute the drop init command, everything works fine, but when I run the demo command something goes wrong. Can you help me, please? Thanks:

Singularity> cd drop-test/
Singularity> drop init
init...done
Singularity> drop demo
overwriting module scripts
resource/dna_vcf/ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.exome.vcf.gz
resource/rna_bam/HG00176.4.M_120208_2_chr21.bam_trunc.bam
resource/qc_vcf_1000G.vcf.gz
resource/rna_bam/HG00103.4.M_120208_3_chr21.bam.bai
resource/qc_vcf_1000G.vcf.gz.tbi
resource/rna_bam/HG00116.2.M_120131_1_chr21.bam
resource/rna_bam/HG00149.1.M_111124_6_chr21.bam
resource/rna_bam/HG00132.2.M_111215_4_chr21.bam
resource/dna_vcf/ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.exome.vcf.gz.tbi
resource/rna_bam/HG00126.1.M_111124_8_chr21.bam_trunc.bam
resource/rna_bam/HG00150.4.M_120208_7_chr21.bam_trunc.bam
resource/sample_annotation_relative.tsv
resource/rna_bam/HG00111.2.M_111215_4_chr21.bam_trunc.bam
resource/rna_bam/HG00106.4.M_120208_5_chr21.bam_trunc.bam
resource/rna_bam/HG00096.1.M_111124_6_chr21.bam_trunc.bam
resource/rna_bam/HG00126.1.M_111124_8_chr21.bam_trunc.bam.bai
resource/rna_bam/HG00150.4.M_120208_7_chr21.bam_trunc.bam.bai
resource/rna_bam/HG00116.2.M_120131_1_chr21.bam.bai
resource/rna_bam/HG00149.1.M_111124_6_chr21.bam.bai
resource/rna_bam/HG00176.4.M_120208_2_chr21.bam_trunc.bam.bai
resource/chr21.fa.gz
resource/rna_bam/HG00106.4.M_120208_5_chr21.bam_trunc.bam.bai
resource/rna_bam/HG00176.4.M_120208_2_chr21.bam
resource/config.yaml
resource/rna_bam/HG00111.2.M_111215_4_chr21.bam_trunc.bam.bai
resource/rna_bam/HG00126.1.M_111124_8_chr21.bam.bai
resource/rna_bam/HG00111.2.M_111215_4_chr21.bam
resource/rna_bam/HG00103.4.M_120208_3_chr21.bam
resource/rna_bam/HG00126.1.M_111124_8_chr21.bam
resource/rna_bam/HG00132.2.M_111215_4_chr21.bam_trunc.bam.bai
resource/rna_bam/HG00116.2.M_120131_1_chr21.bam_trunc.bam.bai
resource/rna_bam/HG00176.4.M_120208_2_chr21.bam.bai
resource/rna_bam/HG00096.1.M_111124_6_chr21.bam_trunc.bam.bai
resource/hpo_genes.tsv.gz
resource/rna_bam/
resource/rna_bam/HG00096.1.M_111124_6_chr21.bam.bai
resource/rna_bam/HG00106.4.M_120208_5_chr21.bam.bai
resource/rna_bam/HG00150.4.M_120208_7_chr21.bam.bai
resource/gencode_annotation_trunc.gtf
resource/rna_bam/HG00111.2.M_111215_4_chr21.bam.bai
resource/dna_vcf/
resource/rna_bam/HG00149.1.M_111124_6_chr21.bam_trunc.bam
resource/rna_bam/HG00132.2.M_111215_4_chr21.bam_trunc.bam
resource/rna_bam/HG00132.2.M_111215_4_chr21.bam.bai
resource/
resource/config_relative.yaml
resource/rna_bam/HG00103.4.M_120208_3_chr21.bam_trunc.bam
resource/rna_bam/HG00096.1.M_111124_6_chr21.bam
resource/rna_bam/HG00103.4.M_120208_3_chr21.bam_trunc.bam.bai
resource/rna_bam/HG00150.4.M_120208_7_chr21.bam
resource/rna_bam/HG00149.1.M_111124_6_chr21.bam_trunc.bam.bai
resource/rna_bam/HG00106.4.M_120208_5_chr21.bam
resource/fix_sample_anno.py
resource/rna_bam/HG00116.2.M_120131_1_chr21.bam_trunc.bam
Traceback (most recent call last):
File "/usr/local/bin/drop", line 11, in <module>
load_entry_point('drop==0.9.0', 'console_scripts', 'drop')()
File "/usr/local/lib64/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib64/python3.6/site-packages/click/core.py", line 1259, in invoke
return process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib64/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib64/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/drop/cli.py", line 85, in demo
dict[key] = str(pathlib.Path(dict_[key]).resolve())
File "/usr/lib64/python3.6/pathlib.py", line 1001, in __new__
self = cls._from_parts(args, init=False)
File "/usr/lib64/python3.6/pathlib.py", line 656, in _from_parts
drv, root, parts = self._parse_args(args)
File "/usr/lib64/python3.6/pathlib.py", line 640, in _parse_args
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
Singularity>
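The last frames of the traceback show `pathlib.Path` being called with `None`, which always raises `TypeError`. A minimal sketch of a guard, assuming the values being resolved come from a config dictionary in which some entries may be unset; `resolve_paths` and the keys in the demo are hypothetical and not Drop's actual code:

```python
import pathlib


def resolve_paths(config):
    """Return a copy of config with non-empty values resolved to absolute paths."""
    resolved = {}
    for key, value in config.items():
        if value is None:
            # pathlib.Path(None) raises TypeError, so leave unset entries alone.
            resolved[key] = None
        else:
            resolved[key] = str(pathlib.Path(value).resolve())
    return resolved


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    print(resolve_paths({"some_file": "resource/config.yaml", "unset_file": None}))
```

If this is what is happening, the underlying cause would be a config entry that is empty in the demo config, which is worth checking before patching anything.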

n+1 runs

If I have a new RNA-seq sample, is there any way to add it to an existing analysis set so that the pipeline only has to redo the jobs that require that new sample, without having to rerun the entire pipeline from the beginning for all the samples?

BSgenome.Hsapiens.UCSC.hg38 not found

I'm getting this error. I will install it manually, but you should add this to the automatic package checks and installs.

BiocParallel errors

Hi, I'm getting this strange error in aberrantSplicing. The same samples ran fine for aberrantExpression.

Thu Jul 9 19:11:43 2020: Count split reads for sample: UDP-1203_RNA
Error: BiocParallel errors
element index: 26
first error: sequence GL000008.2 not found
Execution halted
[Thu Jul 9 19:21:26 2020]
Error in rule Scripts_Counting_01_1_countRNA_splitReads_samplewise_R:
jobid: 94
output: /gpfs/scratch/evrong01/droptest/root/processed_data/aberrant_splicing/datasets/cache/raw-GTEX100/sample_tmp/splitCounts/sample_UDP-1203_RNA.done

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest/.drop/modules/aberrant-splicing-pipeline/.snakemake/log/2020-07-09T174827.943245.snakemake.log
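The message "sequence GL000008.2 not found" suggests that a sequence name used by one input is absent from another, although the log does not say which two inputs disagree. Assuming that is the cause, comparing the two name lists shows every offending contig at once. The helper and the example names below (other than GL000008.2) are hypothetical:

```python
def missing_sequences(alignment_contigs, reference_contigs):
    """Return contigs named by the alignments that the reference does not define."""
    reference = set(reference_contigs)
    return [name for name in alignment_contigs if name not in reference]


if __name__ == "__main__":
    # Placeholder lists; in practice these would come from the BAM header
    # and from the sequence names of the reference in use.
    alignment_contigs = ["chr1", "chr2", "GL000008.2"]
    reference_contigs = ["chr1", "chr2"]
    print(missing_sequences(alignment_contigs, reference_contigs))
```

An empty result would rule out a naming mismatch and point to something else in the split-read counting step.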

aberrantbygene

What is the meaning of this column in the aberrantExpression results?

Unlock issues

Hi,
Unlock issues are still happening.

Snakemake unlock gives this error:
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
MissingInputException in line 38 of /gpfs/home/evrong01/.local/lib/python3.6/site-packages/wbuild/wBuild.snakefile:
Missing input files for rule markdown:
MAE/UDP--v32_results.md
[Mon Jul 20 23:43:01 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest/.drop/tmp/config.yaml
(exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/.snakemake/log/2020-07-20T234300.144071.snakemake.log

snakemake --unlock works without error:
Structuring dependencies...
Dependencies file generated.

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/.snakemake/log/2020-07-20T234300.144071.snakemake.log

But then running the mae command after the above gives this error:
check for missing R packages
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
Subworkflow AE: Nothing to be done.
Subworkflow AS: Nothing to be done.
Executing subworkflow MAE.
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
Error: Directory cannot be locked. Please make sure that no other Snakemake process is trying to create the same files in the following directory:
/gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline
If you are sure that no other instances of snakemake are running on this directory, the remaining lock was likely caused by a kill signal or a power loss. It can be removed with the --unlock argument.

I strongly suggest changing the pipeline so that these unlock issues don't happen. I got it to work again by deleting this directory: /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline

And then doing drop update.
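Before deleting the whole module directory, it may be worth checking whether stale lock files are what is blocking the run. The sketch below assumes Snakemake keeps its locks in a `.snakemake/locks` folder inside the working directory; that location is an assumption based on Snakemake's usual layout and is not shown in the logs above. `find_lock_files` is a hypothetical helper.

```python
from pathlib import Path


def find_lock_files(workdir):
    """List files under <workdir>/.snakemake/locks, if that directory exists."""
    lock_dir = Path(workdir) / ".snakemake" / "locks"  # assumed location
    if not lock_dir.is_dir():
        return []
    return sorted(p for p in lock_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    for path in find_lock_files(".drop/modules/mae-pipeline"):
        print("lock file:", path)
```

If lock files show up while no Snakemake process is running, they are probably leftovers from an interrupted run, which matches the hint in the error message.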

Multiple bugs in conda drop - mae pipeline

Hi,
The 'mae' pipeline still has multiple bugs. I thought the conda environment was tested and working.

See below. Can you please fix these?

[Sun Aug 2 12:24:43 2020]
rule create_SNVs:
input: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt, /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt, /gpfs/data/variants/UDP-1000/WES.SNVINDEL/UDP-1002_exome.vcf.gz, /gpfs/scratch/evrong01/UDP-RNA_seq/UDP-1002_RNA.bam, /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
output: /gpfs/scratch/evrong01/droptest2/root/processed_data/mae/snvs/1841999--UDP-1002_RNA.vcf.gz, /gpfs/scratch/evrong01/droptest2/root/processed_data/mae/snvs/1841999--UDP-1002_RNA.vcf.gz.tbi
jobid: 32
wildcards: vcf=1841999, rna=UDP-1002_RNA

samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory

[Sun Aug 2 12:24:53 2020]
rule Scripts_MAE_gene_name_mapping_R:
input: /gpfs/data/reference-files/GRCh38_gencode-STAR/gencode.v32.primary_assembly.annotation.gtf, Scripts/MAE/gene_name_mapping.R
output: /gpfs/scratch/evrong01/droptest2/root/processed_data/mae/gene_name_mapping_v32.tsv
jobid: 14
wildcards: annotation=v32

INFO 2020-08-02 12:24:55 CreateSequenceDictionary Output dictionary will be written in /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.dict
12:24:55.253 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/data/bin/drop_conda/share/gatk4-4.1.8.1-0/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Sun Aug 02 12:24:55 EDT 2020] CreateSequenceDictionary --REFERENCE /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Sun Aug 02 12:24:55 EDT 2020] Executing as evrong01@cn-0020 on Linux 3.10.0-693.17.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_192-b01; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.8.1
[Sun Aug 02 12:24:55 EDT 2020] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2326265856
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
picard.PicardException: /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.dict already exists. Delete this file and try again, or specify a different output file.
at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:220)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /gpfs/data/bin/drop_conda/share/gatk4-4.1.8.1-0/gatk-package-4.1.8.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gpfs/data/bin/drop_conda/share/gatk4-4.1.8.1-0/gatk-package-4.1.8.1-local.jar CreateSequenceDictionary --REFERENCE /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa
[Sun Aug 2 12:24:56 2020]
Error in rule create_dict:
jobid: 31
output: /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.dict
shell:
gatk CreateSequenceDictionary --REFERENCE /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa
(exited with non-zero exit code)

Error in eval(jsub, SDenv, parent.frame()) :
object 'gene_status' not found
Calls: [ -> [.data.table -> eval -> eval
Execution halted
[Sun Aug 2 12:25:53 2020]
Error in rule Scripts_MAE_gene_name_mapping_R:
jobid: 14
output: /gpfs/scratch/evrong01/droptest2/root/processed_data/mae/gene_name_mapping_v32.tsv
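One detail in the create_dict log may be relevant: the tool reports that it writes to GRCh38.primary_assembly.genome.dict, while the rule declares GRCh38.dict as its output. The sketch below reproduces the path printed in the log by swapping the final extension of the FASTA name and compares it with the rule's output. Both paths are copied from the log; treating this mismatch as the cause of the repeated "already exists" failure is an assumption, and `default_dict_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

REF_DIR = "/gpfs/data/reference-files/GRCh38_gencode-STAR"
FASTA = PurePosixPath(REF_DIR) / "GRCh38.primary_assembly.genome.fa"
RULE_OUTPUT = PurePosixPath(REF_DIR) / "GRCh38.dict"


def default_dict_path(fasta_path):
    """Swap the final extension for .dict, matching the path printed in the log."""
    return PurePosixPath(fasta_path).with_suffix(".dict")


if __name__ == "__main__":
    written = default_dict_path(FASTA)
    print("tool writes to:", written)
    print("rule expects  :", RULE_OUTPUT)
    print("same file?    :", written == RULE_OUTPUT)
```

If the rule's declared output is never created, Snakemake would schedule create_dict again on every run, and the tool would then refuse to overwrite the dictionary it wrote the first time.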

mae pipeline not usable - too slow

Hi,
The MAE pipeline is running very slowly. It is not usable. See below for timestamps. Each sample is taking several hours.

check for missing R packages
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
Subworkflow AE: Nothing to be done.
Subworkflow AS: Nothing to be done.
Executing subworkflow MAE.
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 Index
1 Scripts_MAE_Datasets_R
1 Scripts_MAE_Results_R
107 Scripts_MAE_deseq_mae_R
1 Scripts_MAE_gene_name_mapping_R
1 Scripts_QC_DNA_RNA_matrix_plot_R
1 Scripts_QC_Datasets_R
1 Scripts_QC_create_matrix_dna_rna_cor_R
107 Scripts_QC_deseq_qc_R
1 all
107 allelic_counts
107 allelic_counts_qc
107 create_SNVs
1 create_dict
1 markdown
545

[Tue Jul 14 00:01:54 2020]
rule create_SNVs:
input: /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt, /gpfs/scratch/evrong01/GTEX/VCF/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.vcf.gz, /gpfs/scratch/evrong01/GTEX/BAM/GTEX-11TTK-0005-SM-5O9BX.Aligned.sortedByCoord.out.patched.md.bam, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
output: /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-11TTK--GTEX-11TTK_RNA.vcf.gz, /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-11TTK--GTEX-11TTK_RNA.vcf.gz.tbi
jobid: 363
wildcards: vcf=GTEX-11TTK, rna=GTEX-11TTK_RNA

[Tue Jul 14 00:01:54 2020]
rule create_SNVs:
input: /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt, /gpfs/scratch/evrong01/GTEX/VCF/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.vcf.gz, /gpfs/scratch/evrong01/GTEX/BAM/GTEX-131XE-0006-SM-5P9F9.Aligned.sortedByCoord.out.patched.md.bam, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
output: /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-131XE--GTEX-131XE_RNA.vcf.gz, /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-131XE--GTEX-131XE_RNA.vcf.gz.tbi
jobid: 400
wildcards: vcf=GTEX-131XE, rna=GTEX-131XE_RNA

[Tue Jul 14 16:49:45 2020]
Finished job 363.
1 of 545 steps (0.18%) done

[Tue Jul 14 16:49:45 2020]
rule create_SNVs:
input: /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt, /gpfs/scratch/evrong01/GTEX/VCF/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.vcf.gz, /gpfs/scratch/evrong01/GTEX/BAM/GTEX-11OF3-0006-SM-5O9CM.Aligned.sortedByCoord.out.patched.md.bam, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
output: /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-11OF3--GTEX-11OF3_RNA.vcf.gz, /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-11OF3--GTEX-11OF3_RNA.vcf.gz.tbi
jobid: 356
wildcards: vcf=GTEX-11OF3, rna=GTEX-11OF3_RNA

[Tue Jul 14 17:00:45 2020]
Finished job 400.
2 of 545 steps (0.37%) done

[Tue Jul 14 17:00:45 2020]
rule create_SNVs:
input: /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt, /gpfs/scratch/evrong01/GTEX/VCF/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.vcf.gz, /gpfs/scratch/evrong01/GTEX/BAM/GTEX-11EMC-0006-SM-5O9DN.Aligned.sortedByCoord.out.patched.md.bam, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
output: /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-11EMC--GTEX-11EMC_RNA.vcf.gz, /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-11EMC--GTEX-11EMC_RNA.vcf.gz.tbi
jobid: 345
wildcards: vcf=GTEX-11EMC, rna=GTEX-11EMC_RNA

[Wed Jul 15 09:15:21 2020]
Finished job 356.
3 of 545 steps (0.55%) done

[Wed Jul 15 09:15:21 2020]
rule create_SNVs:
input: /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt, /gpfs/scratch/evrong01/GTEX/VCF/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.vcf.gz, /gpfs/scratch/evrong01/GTEX/BAM/GTEX-11ONC-0005-SM-5O9CY.Aligned.sortedByCoord.out.patched.md.bam, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
output: /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-11ONC--GTEX-11ONC_RNA.vcf.gz, /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-11ONC--GTEX-11ONC_RNA.vcf.gz.tbi
jobid: 357
wildcards: vcf=GTEX-11ONC, rna=GTEX-11ONC_RNA

[Wed Jul 15 09:26:07 2020]
Finished job 345.
4 of 545 steps (0.73%) done

[Wed Jul 15 09:26:07 2020]
rule create_SNVs:
input: /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt, /gpfs/scratch/evrong01/GTEX/VCF/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.vcf.gz, /gpfs/scratch/evrong01/GTEX/BAM/GTEX-139TT-0006-SM-5O9CG.Aligned.sortedByCoord.out.patched.md.bam, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
output: /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-139TT--GTEX-139TT_RNA.vcf.gz, /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-139TT--GTEX-139TT_RNA.vcf.gz.tbi
jobid: 417
wildcards: vcf=GTEX-139TT, rna=GTEX-139TT_RNA

[Thu Jul 16 02:04:43 2020]
Finished job 357.
5 of 545 steps (0.92%) done

[Thu Jul 16 02:04:43 2020]
rule create_SNVs:
input: /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt, /gpfs/scratch/evrong01/GTEX/VCF/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.vcf.gz, /gpfs/scratch/evrong01/GTEX/BAM/GTEX-1128S-0005-SM-5P9HI.Aligned.sortedByCoord.out.patched.md.bam, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
output: /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-1128S--GTEX-1128S_RNA.vcf.gz, /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-1128S--GTEX-1128S_RNA.vcf.gz.tbi
jobid: 333
wildcards: vcf=GTEX-1128S, rna=GTEX-1128S_RNA

[Thu Jul 16 02:15:35 2020]
Finished job 417.
6 of 545 steps (1%) done

[Thu Jul 16 02:15:35 2020]
rule create_SNVs:
input: /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt, /gpfs/scratch/evrong01/GTEX/VCF/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.vcf.gz, /gpfs/scratch/evrong01/GTEX/BAM/GTEX-1314G-0005-SM-5NQ9O.Aligned.sortedByCoord.out.patched.md.bam, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
output: /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-1314G--GTEX-1314G_RNA.vcf.gz, /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-1314G--GTEX-1314G_RNA.vcf.gz.tbi
jobid: 399
wildcards: vcf=GTEX-1314G, rna=GTEX-1314G_RNA

[Thu Jul 16 18:29:34 2020]
Finished job 333.
7 of 545 steps (1%) done

[Thu Jul 16 18:29:34 2020]
rule create_SNVs:
input: /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt, /gpfs/scratch/evrong01/GTEX/VCF/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.vcf.gz, /gpfs/scratch/evrong01/GTEX/BAM/GTEX-12C56-0006-SM-5N9E9.Aligned.sortedByCoord.out.patched.md.bam, /gpfs/scratch/evrong01/droptest/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
output: /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-12C56--GTEX-12C56_RNA.vcf.gz, /gpfs/scratch/evrong01/droptest/root/processed_data/mae/snvs/GTEX-12C56--GTEX-12C56_RNA.vcf.gz.tbi
jobid: 381
wildcards: vcf=GTEX-12C56, rna=GTEX-12C56_RNA

[Thu Jul 16 18:40:07 2020]
Finished job 399.
8 of 545 steps (1%) done
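To put numbers on "several hours", the per-job runtime can be computed from the start and finish timestamps in the log. The sketch below uses the first three create_SNVs jobs, with timestamps copied from the output above; the timestamp format string is an assumption about how Snakemake prints them.

```python
from datetime import datetime

LOG_FORMAT = "%a %b %d %H:%M:%S %Y"

# (jobid, start, finish) copied from the log above.
JOBS = [
    (363, "Tue Jul 14 00:01:54 2020", "Tue Jul 14 16:49:45 2020"),
    (400, "Tue Jul 14 00:01:54 2020", "Tue Jul 14 17:00:45 2020"),
    (356, "Tue Jul 14 16:49:45 2020", "Wed Jul 15 09:15:21 2020"),
]


def elapsed(start, finish):
    """Return the time between two log timestamps as a timedelta."""
    return datetime.strptime(finish, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)


if __name__ == "__main__":
    for jobid, start, finish in JOBS:
        print(f"job {jobid}: {elapsed(start, finish)}")
```

Each of these jobs took between 16 and 17 hours. With 107 create_SNVs jobs and 2 cores, that rule alone would need on the order of several weeks if the rate holds.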

Miscellaneous warnings

I'm getting several different warnings when running snakemake aberrantExpression.
Please advise.

Warning 1:
output file: /tmp/RtmpqpfsOK/file47f182f1fc6fa/Datasets.knit.md

/gpfs/share/apps/pandoc/2.2.3.2/bin/pandoc +RTS -K512m -RTS /tmp/RtmpqpfsOK/file47f182f1fc6fa/Datasets.utf8.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output /tmp/RtmpqpfsOK/file47f18539284e5/Scripts_Counting_Datasets.html --email-obfuscation none --standalone --section-divs --table-of-contents --toc-depth 3 --variable toc_float=1 --variable toc_selectors=h1,h2,h3 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /gpfs/data/bin/Gilad/R/libs/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --css lib/add_content_table.css --css lib/leo_style.css --variable 'theme:bootstrap' --include-in-header /tmp/RtmpqpfsOK/rmarkdown-str47f187a527639.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --lua-filter /gpfs/data//bin/Gilad/R/libs/rmarkdown/rmd/lua/pagebreak.lua --lua-filter /gpfs/data/bin/Gilad/R/libs/rmarkdown/rmd/lua/latex-div.lua --variable code_folding=hide --variable source_embed=Datasets.R --include-after-body /tmp/RtmpqpfsOK/file47f187c2684bf.html --variable code_menu=1
[WARNING] Could not parse YAML metadata at line 1 column 1: :9:121: Unexpected '
'
[WARNING] This document format requires a nonempty <title> element.
Please specify either 'title' or 'pagetitle' in the metadata.
Falling back to 'Datasets.utf8'

Output created: /tmp/RtmpqpfsOK/file47f18539284e5/Scripts_Counting_Datasets.html

Warning 2 (note: I have X11 configured, so I'm not sure why this message appears):
Output created: /tmp/RtmpY461CM/file480162435d245/Summary_GTEX100.html
Warning messages:
1: In grDevices::png(f) : unable to open connection to X11 display ''
2: Transformation introduced infinite values in continuous y-axis
3: Removed 62 rows containing missing values (geom_bar).
[1] TRUE TRUE TRUE TRUE
[Wed Jun 24 10:45:12 2020]
Finished job 9.
101 of 104 steps (97%) done
Warning 3:
Warning message:
In OutriderDataSet(counts) :
No sampleID was specified. We will generate a generic one.
47762 genes did not pass the filter. This is 78.8% of the genes.
[Wed Jun 24 07:25:52 2020]
Finished job 10.
96 of 104 steps (92%) done
Warning 4:
Output created: /tmp/Rtmp6qybWF/file3d75e19e5acf9/Summary_GTEX100.html
Warning messages:
1: In grDevices::png(f) : unable to open connection to X11 display ''
2: Transformation introduced infinite values in continuous x-axis
3: Removed 14236 rows containing non-finite values (stat_bin).
4: Transformation introduced infinite values in continuous x-axis
5: Removed 14236 rows containing non-finite values (stat_density).
Warning 5:
/gpfs/share/apps/pandoc/2.2.3.2/bin/pandoc +RTS -K512m -RTS /tmp/RtmpT3Evl2/file4812b3d242953/Datasets.utf8.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output /tmp/RtmpT3Evl2/file4812b61d64465/Scripts_OUTRIDER_Datasets.html --email-obfuscation none --standalone --section-divs --table-of-contents --toc-depth 3 --variable toc_float=1 --variable toc_selectors=h1,h2,h3 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /gpfs/data/evronylab/bin/Gilad/R/libs/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --css lib/add_content_table.css --css lib/leo_style.css --variable 'theme:bootstrap' --include-in-header /tmp/RtmpT3Evl2/rmarkdown-str4812b133c5d3.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --lua-filter /gpfs/data/evronylab/bin/Gilad/R/libs/rmarkdown/rmd/lua/pagebreak.lua --lua-filter /gpfs/data/evronylab/bin/Gilad/R/libs/rmarkdown/rmd/lua/latex-div.lua --variable code_folding=hide --variable source_embed=Datasets.R --include-after-body /tmp/RtmpT3Evl2/file4812b18abd38e.html --variable code_menu=1
[WARNING] Could not parse YAML metadata at line 1 column 1: :8:122: Unexpected '
'
[WARNING] This document format requires a nonempty <title> element.
Please specify either 'title' or 'pagetitle' in the metadata.
Falling back to 'Datasets.utf8'

Output created: /tmp/RtmpT3Evl2/file4812b61d64465/Scripts_OUTRIDER_Datasets.html
Warning message:
In grDevices::png(f) : unable to open connection to X11 display ''

addAF default

Hi,
The default in documentation for addAF is true, but drop init sets this to false. What should it be?

drop demo problem in new version

Hi, the new drop demo conda package runs better than before, but it still crashed with the error below:

output file: /tmp/Rtmp60vPDo/file57c0f73dd6e64/Datasets.knit.md

/gpfs/home/evrong01/evronylab/bin/drop-conda/bin/pandoc +RTS -K512m -RTS /tmp/Rtmp60vPDo/file57c0f73dd6e64/Datasets.utf8.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output /tmp/Rtmp60vPDo/file57c0f1f227cf5/Scripts_FRASER_Datasets.html --email-obfuscation none --standalone --section-divs --table-of-contents --toc-depth 3 --variable toc_float=1 --variable toc_selectors=h1,h2,h3 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /gpfs/data/evronylab/bin/drop-conda/lib/R/library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --css lib/add_content_table.css --css lib/leo_style.css --variable 'theme:bootstrap' --include-in-header /tmp/Rtmp60vPDo/rmarkdown-str57c0f7e73c2f4.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --lua-filter /gpfs/data/evronylab/bin/drop-conda/lib/R/library/rmarkdown/rmd/lua/pagebreak.lua --lua-filter /gpfs/data/evronylab/bin/drop-conda/lib/R/library/rmarkdown/rmd/lua/latex-div.lua --variable code_folding=hide --variable code_menu=1

Output created: /tmp/Rtmp60vPDo/file57c0f1f227cf5/Scripts_FRASER_Datasets.html
[1] TRUE TRUE TRUE
[Thu Oct 1 14:27:44 2020]
Finished job 6.
35 of 39 steps (90%) done

[Thu Oct 1 14:27:44 2020]
rule Index:
input: /gpfs/scratch/evrong01/dropdemo/Output/html/Scripts_Counting_DatasetsF.html, /gpfs/scratch/evrong01/dropdemo/Output/html/_gpfs_scratch_evrong01_dropdemo_readme.html, /gpfs/scratch/evrong01/dropdemo/Output/html/Scripts_FRASER_Datasets.html, /gpfs/scratch/evrong01/dropdemo/Output/html/_gpfs_scratch_evrong01_dropdemo_readme.html
output: /gpfs/scratch/evrong01/dropdemo/Output/html/aberrant-splicing-pipeline_index.html
jobid: 2

MissingInputException in line 158 of /tmp/tmprb45x8gw:
Missing input files for rule Index:
/gpfs/scratch/evrong01/dropdemo/Output/html/aberrant_splicing_readme.html
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/dropdemo/.drop/modules/aberrant-splicing-pipeline/.snakemake/log/2020-10-01T141706.964755.snakemake.log

KeyError in new version of DROP

I'm getting this error at the beginning of the pipeline:

KeyError in line 7 of /gpfs/scratch/evrong01/droptest/Snakefile:
'geneAnnotations'
File "/gpfs/scratch/evrong01/droptest/Snakefile", line 7, in <module>
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/setupDrop.py", line 13, in setupDrop
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/configHelper.py", line 24, in __init__
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/configHelper.py", line 114, in setDefaults
check for missing R packages
KeyError in line 7 of /gpfs/scratch/evrong01/droptest/Snakefile:
'geneAnnotations'
File "/gpfs/scratch/evrong01/droptest/Snakefile", line 7, in <module>
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/setupDrop.py", line 13, in setupDrop
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/configHelper.py", line 24, in __init__
File "/gpfs/home/evrong01/.local/lib/python3.6/site-packages/drop/configHelper.py", line 114, in setDefaults
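A KeyError like this is raised when code indexes a dictionary with a key that is not present. Assuming the config file is loaded as a plain dictionary and the new version expects a top-level geneAnnotations entry (an inference from the traceback, not confirmed here), a quick check for missing keys could look like the sketch below. `missing_config_keys` and the demo config are hypothetical.

```python
def missing_config_keys(config, required):
    """Return the required top-level keys that are absent from config."""
    return [key for key in required if key not in config]


if __name__ == "__main__":
    # Hypothetical config written for an older version of the pipeline.
    old_config = {"projectTitle": "demo"}
    print(missing_config_keys(old_config, ["geneAnnotations"]))
```

If the key turns out to be missing, comparing the existing config.yaml with one freshly generated by drop init should show what changed between versions.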

Could not connect to ENSEMBL

Hi, I'm getting the below error:

Loading assay: pvaluesBetaBinomial_psi3
Loading assay: padjBetaBinomial_psi3
Loading assay: rawCountsSS
Loading assay: psiSite
Loading assay: rawOtherCounts_psiSite
Loading assay: delta_psiSite
Loading assay: predictedMeans_psiSite
Loading assay: zScores_psiSite
Loading assay: pvaluesBetaBinomial_psiSite
Loading assay: padjBetaBinomial_psiSite
Ensembl site unresponsive, trying uswest mirror

Check if we have a internet connection! Could not connect to ENSEMBL.
Nothing was annotated!
Mon Jul 13 01:50:55 2020: Collecting results for: psi3
Mon Jul 13 01:51:05 2020: Process chunk: 1 for: psi3
Error: BiocParallel errors
element index: 1, 2, 3
first error: subscript contains invalid names
Execution halted
[Mon Jul 13 01:51:12 2020]
Error in rule Scripts_FRASER_07_extract_results_FraseR_R:
jobid: 9
output: /gpfs/scratch/evrong01/droptest/root/processed_data/aberrant_splicing/results/GTEX100_results_per_junction.tsv, /gpfs/scratch/evrong01/droptest/root/processed_data/aberrant_splicing/results/GTEX100_results.tsv

drop demo not working

Hi, the command 'drop demo' in the latest conda version of drop gives this error:

resource/dna_vcf/ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.exome.vcf.gz.tbi
resource/dna_vcf/demo_chr21.vcf.gz.tbi
resource/dna_vcf/demo_chr21.vcf.gz
resource/dna_vcf/ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.exome.vcf.gz
resource/external_geneCounts.tsv.gz
resource/hpo_genes.tsv.gz
Traceback (most recent call last):
File "/gpfs/home/evrong01/evronylab/bin/drop-conda/bin/drop", line 10, in
sys.exit(main())
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/click/core.py", line 829, in call
return self.main(*args, **kwargs)
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return process_result(sub_ctx.command.invoke(sub_ctx))
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/gpfs/home/evrong01/evronylab/bin/drop-conda/lib/python3.6/site-packages/drop/cli.py", line 87, in demo
dict[key] = str(pathlib.Path(dict_[key]).resolve())
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/pathlib-1.0.1-py3.6.egg/pathlib.py", line 1034, in resolve

File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/pathlib-1.0.1-py3.6.egg/pathlib.py", line 320, in resolve
if newpath in seen:
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/pathlib-1.0.1-py3.6.egg/pathlib.py", line 305, in _resolve
accessor = path._accessor
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/site-packages/pathlib-1.0.1-py3.6.egg/pathlib.py", line 398, in readlink
lstat = _wrap_strfunc(os.lstat)
FileNotFoundError: [Errno 2] No such file or directory: '/gpfs/scratch/evrong01/dropdemo/Output'
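
The frames above show `resolve()` running from `site-packages/pathlib-1.0.1-py3.6.egg` rather than from the standard library. Since Python 3.6 the standard-library `Path.resolve()` defaults to `strict=False` and does not raise for a path that does not exist, so a third-party `pathlib` package shadowing it is a plausible cause. A small sketch, under that assumption, to check which `pathlib` is imported and how `resolve()` treats a missing path (the directory name is a placeholder):

```python
import pathlib
import tempfile


def pathlib_is_backport():
    """True if the imported pathlib lives in site-packages (likely the PyPI backport)."""
    return "site-packages" in (getattr(pathlib, "__file__", "") or "")


def resolves_missing_path():
    """True if resolve() tolerates a path that does not exist."""
    with tempfile.TemporaryDirectory() as tmp:
        missing = pathlib.Path(tmp) / "Output"  # placeholder name, never created
        try:
            missing.resolve()
            return True
        except FileNotFoundError:
            return False


print("pathlib loaded from:", getattr(pathlib, "__file__", "unknown"))
print("backport in use:", pathlib_is_backport())
print("resolve() tolerates missing paths:", resolves_missing_path())
```

If the backport turns out to be in use, removing it from that environment (for example with `pip uninstall pathlib`) is one thing to try. Whether that is sufficient here is not confirmed by the log.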

Input at gene count levels

Hi, saving all the control BAM files as input for DROP takes a lot of space. After I run DROP the first time, is there an intermediate file for each sample (gene and exon/intron counts, for example) that I can keep for future runs of those samples, so that I do not have to store the original BAM files?

STRAND annotation

Hi,
What is the correct STRAND value for Illumina TruSeq stranded libraries?

No rule to produce MAE

Hi, I'm getting this error for the MAE pipeline.

check for missing R packages
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
MissingRuleException:
No rule to produce MAE (if you use input functions make sure that they don't raise unexpected exceptions).

Links to publicly available gene counts do not work

Hi,
The links to publicly available gene counts (DOI links) do not work. They lead to a page that asks me to request the data via "ILLiad", which is not a feasible option.

Missing file exception

I'm getting the below error. Not sure what the issue is.

Loading assay: psi3
Loading assay: rawOtherCounts_psi5
Loading assay: rawOtherCounts_psi3
Loading assay: delta_psi5
Loading assay: delta_psi3
Loading assay: predictedMeans_psi5
Loading assay: predictedMeans_psi3
Loading assay: zScores_psi5
Loading assay: pvaluesBetaBinomial_psi5
Loading assay: padjBetaBinomial_psi5
Loading assay: zScores_psi3
Loading assay: pvaluesBetaBinomial_psi3
Loading assay: padjBetaBinomial_psi3
Loading assay: rawCountsSS
Loading assay: psiSite
Loading assay: rawOtherCounts_psiSite
Loading assay: delta_psiSite
Loading assay: predictedMeans_psiSite
Loading assay: zScores_psiSite
Loading assay: pvaluesBetaBinomial_psiSite
Loading assay: padjBetaBinomial_psiSite
Fri Jun 19 19:34:14 2020: Writing final FRASER object ('/gpfs/scratch/evrong01/droptest/root/processed_data/aberrant_splicing/datasets//savedObjects/GTEX100/fds-object.RDS').
Waiting at most 5 seconds for missing files.
MissingOutputException in line 115 of /tmp/tmpvhlvdoxe:
Missing files after 5 seconds:
/gpfs/scratch/evrong01/droptest/root/processed_data/aberrant_splicing/datasets/savedObjects/GTEX100/pajdBetaBinomial_psiSite.h5
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest/.drop/modules/aberrant-splicing-pipeline/.snakemake/log/2020-06-19T183802.842513.snakemake.log
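
Note that the missing file is named `pajdBetaBinomial_psiSite.h5`, while the assay loaded above is `padjBetaBinomial_psiSite`. The two differ by a transposed `d`/`j`, so filesystem latency may not be the cause. A sketch of flagging such near-miss names with the standard-library `difflib`, using names copied from the log (the helper `closest_assay` is hypothetical and not part of drop or FRASER):

```python
import difflib
from pathlib import PurePosixPath


def closest_assay(expected_file, assays, cutoff=0.8):
    """Return the assay name closest to the stem of expected_file, or None."""
    stem = PurePosixPath(expected_file).stem
    if stem in assays:
        return stem
    matches = difflib.get_close_matches(stem, assays, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Assay names as they appear in the "Loading assay" lines of the log.
assays = [
    "padjBetaBinomial_psi5",
    "padjBetaBinomial_psi3",
    "pvaluesBetaBinomial_psiSite",
    "padjBetaBinomial_psiSite",
]
# File name reported by MissingOutputException (directory part omitted).
missing = "savedObjects/GTEX100/pajdBetaBinomial_psiSite.h5"

stem = PurePosixPath(missing).stem
match = closest_assay(missing, assays)
if match is not None and match != stem:
    print(f"expected output '{stem}' looks like a misspelling of assay '{match}'")
```

If the names really do disagree, raising `--latency-wait` would not make the expected file appear.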

R error: dev.control

I'm running snakemake sampleAnnotation and getting the below error.
There appears to be something in the R configuration that is not compatible with your code.

Quitting from lines 18-27 (/tmp/RtmpET6935/file1d6d9205f4556/SampleAnnotation.spin.Rmd)
Error in dev.control(displaylist = if (record) "enable" else "inhibit") :
dev.control() called without an open graphics device
Calls: render ... call_block -> block_exec -> chunk_device -> dev.control
In addition: Warning messages:
1: In grDevices::png(f) : no png support in this version of R
2: In (function (filename = if (onefile) "Rplots.svg" else "Rplot%03d.svg", :
unable to load shared object '/gpfs/share/apps/R/4.0.0/lib64/R/library/grDevices/libs//cairo.so':
/gpfs/share/apps/R/4.0.0/lib64/R/library/grDevices/libs//cairo.so: cannot open shared object file: No such file or directory
3: In (function (filename = if (onefile) "Rplots.svg" else "Rplot%03d.svg", :
failed to load cairo DLL

Execution halted
[Mon Jun 8 11:37:13 2020]
Error in rule Scripts_Pipeline_SampleAnnotation_R:
jobid: 1
output: /gpfs/scratch/evrong01/droptest/root/processed_data/sample_anno/sample_anno.done, /gpfs/scratch/evrong01/droptest/htmlOutput/Scripts_Pipeline_SampleAnnotation.html

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest/.snakemake/log/2020-06-08T113707.550475.snakemake.log
