wzthu / esatac Goto Github PK

View Code? Open in Web Editor NEW

23.0 4.0 11.0 24.49 MB

Bioconductor package esATAC: an Easy-to-use Systematic pipeline for ATAC-seq data analysis

Home Page: https://www.bioconductor.org/packages/release/bioc/html/esATAC.html

License: GNU General Public License v3.0

R 83.86% C++ 16.14%

bioconductor atac-seq pipeline

esatac's People

Contributors

Stargazers

Watchers

Forkers

springtan rogerzou0108 fengyq wan-yifei mauriziopaul jianguozhou3 jefferys bowangxjtu chlorproaine msq-123 honchkrow

esatac's Issues

Bioconductor install esATAC stuck

Hello esATAC team,

I tried to install esATAC followed by https://www.bioconductor.org/packages/release/bioc/html/esATAC.html. I use Bioconductor version 3.16 (BiocManager 1.30.18), R 4.2.1 (2022-06-23), use the commands
R

library(BiocManager)
BiocManager::install("esATAC")

it successfully installed the dependencies packages, but stuck on the testing stage.

installing to /rsrch3/home/itops/ryao/R/x86_64-pc-linux-gnu-library/4.2/00LOCK-esATAC/00new/esATAC/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location

May I have your suggestion? Thank you!

Rong Yao

Pipeline produces error for single end fastq

I am using esATAC 1.4.0 in R studio.

In the documentation, it says that for single end sequencing, I just need to input fastqInput1 and adapter 1. I am running the following command.

conclusion<-atacPipe2(case=list(fastqInput1="TGFbKO_69p103p_S11_L005_R1_001.fastq.gz", adapter1="NexteraSE-SE.fa"), control=list(fastqInput1="TGFbKO_Spleen_S2_L005_R1_001.fastq.gz", adapter1= "NexteraSE-SE.fa"), genome = "mm10", threads = 8)

I get the following error.

Error in atacPipe2(case = list(fastqInput1 = "TGFbKO_69p103p_S11_L005_R1_001.fastqz", : fastqInput2 for case can not be NULL

Am I missing another parameter that is needed for single end sequencing that would bypass the need for fastqInput2?

Can I use the intermediate files to run?

Hello, I want to run case/control peak analysis with peak file. But it seems the input file need to be fastq file. I also want to know the R package can do analysis with intermediate files, Is there exists one step to next step commands?

Thank you.

Error when run atacRepsPipe

Dear Zheng,

Here I met a problem when I run atacRepsPipe.
Error in atacPipe(genome = genome, fastqInput1 = fastqInput1[[n]], fastqInput2 = fastqInput2n, : formal argument "basicAnalysis" matched by multiple actual arguments

But I checked my code and it shows similar with the code in your examples. I could not know how to resovle it. Could you help me with this? It could be very grateful when I get your kindly reply.

BTW, I listed my codes below.
conclusion <- atacRepsPipe( fastqInput1 = list("H:/ATAC_analysis/raw_data/ATAC_12hpf-4_1.clean.fq.gz", "H:/ATAC_analysis/raw_data/ATAC_12hpf-5_1.clean.fq.gz", "H:/ATAC_analysis/raw_data/ATAC_12hpf-6_1.clean.fq.gz"), fastqInput2 = list("H:/ATAC_analysis/raw_data/ATAC_12hpf-4_2.clean.fq.gz", "H:/ATAC_analysis/raw_data/ATAC_12hpf-5_2.clean.fq.gz", "H:/ATAC_analysis/raw_data/ATAC_12hpf-6_2.clean.fq.gz"), refdir <- "H:/ATAC_analysis/refdatafolder", tmpdir = "H:/ATAC_analysis/analysis/sample_12hpf/replicate_basic/intermediate_result", chr = c(1:25), genome = "danRer10")

BEST
BJ

Example data does not work as stated because it is still under development

Hi,
I am trying to run the example data but I received an error stating the function is still underdevelopment.

conclusion <-

atacPipe2(

# MODIFY: Change these paths to your own case files!

# e.g. fastqInput1 = "your/own/data/path.fastq"

case=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"),

          fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")),

# MODIFY: Change these paths to your own control files!

# e.g. fastqInput1 = "your/own/data/path.fastq"

control=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"),

             fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")),

# MODIFY: Change this path to an permanent path to be used in future!

#       refdir = "./esATAC_pipeline/refdir",

#       tmpdir = "./esATAC_pipeline/intermediate_results",

```
# MODIFY: Set the genome for your data
```
```
genome = "hg19")
```

Error in atacPipe2(case = list(fastqInput1 = system.file(package = "esATAC", :
this function is still under developing

error in atacpipe2

Hello guys, When I execute atacpipe2 command I always get the following error regardless of the input samples. It happens during the processing of the control case.

start processing(paired end data): PeakCallingFseq
Error in .jcall("edu/duke/igsp/gkde/Main", "V", "main", argvs) :
java.lang.ArithmeticException: / by zero

Does anybody have an idea or a solution to this? Thanks :-)

TSS enrichment score

Dear authors of esATAC,

First, thanks for the wonderful tool.
Under the TSS enrichment module section of your sample report, I could see the 'Nuceleosome Free Reads' plot. Do you also have an option of generating TSS enrichment score, like recommended by the EnCODE project: https://www.encodeproject.org/data-standards/terms/#enrichment ?
Essentially, normalizing the above plot using the flanking regions.

Thanks
SK

R Error subscript out of bounds

I have 3 samples, strangely, one sample worked perfectly. The another two gives me this error during the peakcalling, after filtering 100bp fragments with bedUtils:

Error in strsplit(filename[i], split = "\\.")[[1]] : 
  subscript out of bounds
In addition: Warning messages:
1: In dir.create(esATAC_result) :
  '/Users/tiagob/R_Projects/Mariya/esATAC_result' already exists
2: In dir.create(esATAC_report) :
  '/Users/tiagob/R_Projects/Mariya/esATAC_report' already exists
3: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 4 out-of-bound ranges located on sequence chr4_GL456350_random. Note that only ranges located on a non-circular sequence whose length is
  not NA can be considered out-of-bound (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use trim()
  to trim these ranges. See ?`trim,GenomicRanges-method` for more information.
4: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 4 out-of-bound ranges located on sequence chr4_GL456350_random. Note that only ranges located on a non-circular sequence whose length is
  not NA can be considered out-of-bound (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use trim()
  to trim these ranges. See ?`trim,GenomicRanges-method` for more information.
5: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 4 out-of-bound ranges located on sequence chr4_GL456350_random. Note that only ranges located on a non-circular sequence whose length is
  not NA can be considered out-of-bound (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use trim()
  to trim these ranges. See ?`trim,GenomicRanges-method` for more information.
6: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 4 out-of-bound ranges located on sequence chr4_GL456350_random. Note that only ranges located on a non-circular sequence whose length is
  not NA can be considered out-of-bound (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use trim()
  to trim these ranges. See ?`trim,GenomicRanges-method` for more information.
7: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 4 out-of-bound ranges located on sequence chr4_GL456350_random. Note that only ranges located on a non-circular sequence whose length is
  not NA can be considered out-of-bound (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use trim()
  to trim these ranges. See ?`trim,GenomicRanges-method` for more information.
8: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 4 out-of-bound ranges located on sequence chr4_GL456350_random. Note that only ranges located on a non-circular sequence whose length is
  not NA can be considered out-of-bound (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use trim()
  to trim these ranges. See ?`trim,GenomicRanges-method` for more information.

I have already done everything, but could not find the error origin. More strange is the fact that, one sample worked nicely, but all three Fastqs are all the same origin aligned in the same pipeline. Do you have any idea of what is it?

(EDITED) I thought that maybe is related to these out of bound ranges but even after removing manually all those chromosomes it did not worked. Do you know any way to fix this?

Actually this error is strangely related to a selection of a file name, cutting the name in pieces separated by a dot and taking the first element, but it return this error. Any Idea why?

Thanks!

Error in RmotifScan, character not in lookup table

Hello,

Thanks for developing esATAC, I'm having an issue at the motif scanning part in the atacPipe2() function. Below is my call:

D7_MSC_IMMOB_v_D7_MSC_MOB = atacPipe2(
    case = list(
        fastqInput1 = file.path(run_dir, 'Sample_130301/130301_CGTACTAG-ACTGCATA_S5_R1_001.fastq.gz'),
        fastqInput2 = file.path(run_dir, 'Sample_130301/130301_CGTACTAG-ACTGCATA_S5_R2_001.fastq.gz')),
    control = list(
        fastqInput1 = file.path(run_dir, 'Sample_130299/130299_AGGCAGAA-GTAAGGAG_S3_R1_001.fastq.gz'),
        fastqInput2 = file.path(run_dir, 'Sample_130299/130299_AGGCAGAA-GTAAGGAG_S3_R2_001.fastq.gz')),
    refdir = '/nfs/med-bfx-common/iGenomes/Mus_musculus/UCSC/mm10_esatac',
    motifs = getMotifInfo(motif.file = file.path(project_dir, 'Huber_2860_ATAC01_AH-205', 'MSC_jaspar.txt')),
    tmpdir = 'tmp',
    threads = 2,
    interleave = FALSE,
    genome = 'mm10')

When the RMotifScan part of the pipeline begins I get the following:

...
>>>>>>========================================
2019-05-23 11:34:30
start processing(paired end data): RMotifScan
processing file:
peak file:/nfs/turbo/epicore-active/Huber_2860_ATAC01_AH-205/esATAC/D7_MSC_IMMOB_v_D7_MSC_MOB/tmp/130301_CGTACTAG-ACTGCATA_S5_R1_001.PeakCallingFseq.bed
Output destination:/nfs/turbo/epicore-active/Huber_2860_ATAC01_AH-205/esATAC/D7_MSC_IMMOB_v_D7_MSC_MOB/tmp/130301_CGTACTAG-ACTGCATA_S5_R1_001.PeakCallingFseq_Case_data_MotifScanOutput
Error in .Call2("new_CHARACTER_from_XStringSet", x, xs_dec_lkup(x), PACKAGE = "Biostrings") :
  key 48 (char '0') not in lookup table

I feel like this error comes from a non-IUPAC letter as it relates to XStringSets. I looked through my JASPAR formatted motif file and didn't see anything amiss. So I tried using the file that you've included with the package:

motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC"))

And I actually get a similar error:

>>>>>>========================================
2019-05-23 12:23:45
start processing(paired end data): RMotifScan
processing file:
peak file:/nfs/turbo/epicore-active/Huber_2860_ATAC01_AH-205/esATAC/D7_MSC_IMMOB_v_D7_MSC_MOB/tmp/130301_CGTACTAG-ACTGCATA_S5_R1_001.PeakCallingFseq.bed
Output destination:/nfs/turbo/epicore-active/Huber_2860_ATAC01_AH-205/esATAC/D7_MSC_IMMOB_v_D7_MSC_MOB/tmp/130301_CGTACTAG-ACTGCATA_S5_R1_001.PeakCallingFseq_Case_data_MotifScanOutput
Error in .Call2("new_CHARACTER_from_XStringSet", x, xs_dec_lkup(x), PACKAGE = "Biostrings") : 
  key 48 (char '0') not in lookup table

I made sure that my file and your file are very similar:

> readLines(system.file("extdata", "CustomizedMotif.txt", package="esATAC"))
 [1] ">MA0139.1\tCTCF"                                                                                                                           
 [2] "A  [    87    167    281     56      8    744     40    107    851      5    333     54     12     56    104    372     82    117    402 ]"
 [3] "C  [   291    145     49    800    903     13    528    433     11      0      3     12      0      8    733     13    482    322    181 ]"
 [4] "G  [    76    414    449     21      0     65    334     48     32    903    566    504    890    775      5    507    307     73    266 ]"
 [5] "T  [   459    187    134     36      2     91     11    324     18      3      9    341      8     71     67     17     37    396     59 ]"
 [6] ""                                                                                                                                          
 [7] ">MA0835.1\tBATF3"                                                                                                                          
 [8] "A  [    40     23    396      1      0    429      1      8      0      6    420      5     18    357 ]"                                   
 [9] "C  [    98     11     11      0      0      0    497      0      0    513      0     39    520     24 ]"                                   
[10] "G  [    30    777     20      0    776      0      0    746      2      2      0     27      7    147 ]"                                   
[11] "T  [   625     17      1    732      0      0      0      0    692      3      2    684     16     49 ]"                                   
> head(readLines(file.path(project_dir, 'Huber_2860_ATAC01_AH-205', 'MSC_jaspar.txt')), n = 11)
 [1] ">PH0024.1\tDlx5"                                                                                                      
 [2] "A  [    11     17     28     18      1     97     98      1      1     61     19      5      7     26     15     19 ]"
 [3] "C  [    29     29     17     14     38      2      0      1      0      0     25     49     17     34     22     27 ]"
 [4] "G  [    37     37     49     59      0      0      1      0      2     38     37     17     37     23     22     35 ]"
 [5] "T  [    22     17      5      9     61      1      1     98     97      1     18     28     39     17     41     19 ]"
 [6] ""                                                                                                                     
 [7] ">MA0482.1\tGata4"                                                                                                     
 [8] "A  [    10      0      0      0   2707      0      0    547      0    601    386 ]"                                   
 [9] "C  [  1039   2165     89      0     39      0   2746      0   1240   1004    929 ]"                                   
[10] "G  [   374    306      0      0      0      0      0      0    940    157    682 ]"                                   
[11] "T  [  1323    275   2657   2746      0   2746      0   2199    566    984    749 ]"

Have you encountered this problem before? I can't seem to find anything via Google.

Thanks in advance,
Raymond

And here is my session info:

> devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 3.5.0 (2018-04-23)
 os       Red Hat Enterprise Linux Server 7.6 (Maipo)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Detroit
 date     2019-05-23
  
─ Packages ───────────────────────────────────────────────────────────────────
 package                            * version   date       lib source
 annotate                             1.58.0    2018-06-27 [2] Bioconductor
 AnnotationDbi                      * 1.44.0    2018-10-30 [1] Bioconductor
 assertthat                           0.2.0     2017-04-11 [2] CRAN (R 3.5.0)
 backports                            1.1.2     2017-12-13 [2] CRAN (R 3.5.0)
 Biobase                            * 2.40.0    2018-06-27 [2] Bioconductor
 BiocGenerics                       * 0.28.0    2018-10-30 [1] Bioconductor
 BiocInstaller                        1.30.0    2018-06-27 [2] Bioconductor
 BiocParallel                       * 1.14.2    2018-07-08 [2] Bioconductor
 biomaRt                              2.36.1    2018-06-27 [2] Bioconductor
 Biostrings                         * 2.48.0    2018-06-27 [2] Bioconductor
 bit                                  1.1-14    2018-05-29 [2] CRAN (R 3.5.0)
 bit64                                0.9-7     2017-05-08 [2] CRAN (R 3.5.0)
 bitops                               1.0-6     2013-08-17 [2] CRAN (R 3.5.0)
 blob                                 1.1.1     2018-03-25 [2] CRAN (R 3.5.0)
 boot                                 1.3-20    2017-08-06 [2] CRAN (R 3.5.0)
 brew                                 1.0-6     2011-04-13 [2] CRAN (R 3.5.0)
 BSgenome                           * 1.48.0    2018-06-27 [2] Bioconductor
 BSgenome.Mmusculus.UCSC.mm10       * 1.4.0     2019-05-22 [1] Bioconductor
 callr                                3.1.1     2018-12-21 [1] CRAN (R 3.5.0)
 caTools                              1.17.1.1  2018-07-20 [2] CRAN (R 3.5.0)
 ChIPseeker                           1.16.1    2018-07-21 [2] Bioconductor
 cli                                  1.0.1     2018-09-25 [1] CRAN (R 3.5.0)
 clusterProfiler                      3.8.1     2018-06-27 [2] Bioconductor
 CNEr                                 1.16.1    2018-06-27 [2] Bioconductor
 colorspace                           1.3-2     2016-12-14 [2] CRAN (R 3.5.0)
 corrplot                             0.84      2017-10-16 [2] CRAN (R 3.5.0)
 cowplot                              0.9.3     2018-07-15 [2] CRAN (R 3.5.0)
 crayon                               1.3.4     2017-09-16 [2] CRAN (R 3.5.0)
 data.table                           1.11.4    2018-05-27 [2] CRAN (R 3.5.0)
 DBI                                  1.0.0     2018-05-02 [2] CRAN (R 3.5.0)
 DelayedArray                       * 0.8.0     2018-10-30 [1] Bioconductor
 desc                                 1.2.0     2018-05-01 [1] CRAN (R 3.5.0)
 devtools                             2.0.1     2018-10-26 [1] CRAN (R 3.5.0)
DiagrammeR                           1.0.0     2018-03-01 [2] CRAN (R 3.5.0)
 digest                               0.6.16    2018-08-22 [2] CRAN (R 3.5.0)
 DirichletMultinomial                 1.22.0    2018-06-27 [2] Bioconductor
 DO.db                                2.9       2018-06-27 [2] Bioconductor
 DOSE                                 3.6.1     2018-06-20 [2] Bioconductor
 downloader                           0.4       2015-07-09 [2] CRAN (R 3.5.0)
 dplyr                                0.8.0.1   2019-02-15 [1] CRAN (R 3.5.0)
 enrichplot                           1.0.2     2018-06-27 [2] Bioconductor
 esATAC                             * 1.2.1     2018-08-07 [1] Bioconductor
 evaluate                             0.11      2018-07-17 [2] CRAN (R 3.5.0)
 fastmatch                            1.1-0     2017-01-28 [2] CRAN (R 3.5.0)
 fgsea                                1.6.0     2018-06-27 [2] Bioconductor
 formatR                              1.5       2017-04-25 [2] CRAN (R 3.5.0)
 fs                                   1.2.6     2018-08-23 [1] CRAN (R 3.5.0)
 futile.logger                        1.4.3     2016-07-10 [2] CRAN (R 3.5.0)
 futile.options                       1.0.1     2018-04-20 [2] CRAN (R 3.5.0)
 gdata                                2.18.0    2017-06-06 [2] CRAN (R 3.5.0)
 GenomeInfoDb                       * 1.16.0    2018-06-27 [2] Bioconductor
 GenomeInfoDbData                     1.1.0     2018-06-27 [2] Bioconductor
 GenomicAlignments                  * 1.16.0    2018-06-27 [2] Bioconductor
 GenomicFeatures                    * 1.32.3    2018-10-05 [1] Bioconductor
 GenomicRanges                      * 1.32.7    2018-09-20 [1] Bioconductor
 ggforce                              0.1.3     2018-07-07 [2] CRAN (R 3.5.0)
 ggplot2                              3.1.0     2018-10-25 [1] CRAN (R 3.5.0)
 ggraph                               1.0.2     2018-07-07 [2] CRAN (R 3.5.0)
 ggrepel                              0.8.0     2018-05-09 [2] CRAN (R 3.5.0)
 ggridges                             0.5.0     2018-04-05 [2] CRAN (R 3.5.0)
 glue                                 1.3.0     2018-07-17 [2] CRAN (R 3.5.0)
 GO.db                                3.6.0     2018-06-27 [2] Bioconductor
 GOSemSim                             2.6.2     2018-08-08 [2] Bioconductor
 gplots                               3.0.1     2016-03-30 [2] CRAN (R 3.5.0)
 gridBase                             0.4-7     2014-02-24 [2] CRAN (R 3.5.0)
 gridExtra                            2.3       2017-09-09 [2] CRAN (R 3.5.0)
 gtable                               0.2.0     2016-02-26 [2] CRAN (R 3.5.0)
 gtools                               3.8.1     2018-06-26 [2] CRAN (R 3.5.0)
 hms                                  0.4.2     2018-03-10 [2] CRAN (R 3.5.0)
 htmltools                            0.3.6     2017-04-28 [2] CRAN (R 3.5.0)
 htmlwidgets                          1.2       2018-04-19 [2] CRAN (R 3.5.0)
 httr                                 1.4.0     2018-12-11 [1] CRAN (R 3.5.0)
 hwriter                              1.3.2     2014-09-10 [2] CRAN (R 3.5.0)
 igraph                               1.2.2     2018-07-27 [2] CRAN (R 3.5.0)
 influenceR                           0.1.0     2015-09-03 [2] CRAN (R 3.5.0)
 IRanges                            * 2.16.0    2018-10-30 [1] Bioconductor
 JASPAR2016                           1.8.0     2018-06-27 [2] Bioconductor
 jsonlite                           * 1.5       2017-06-01 [2] CRAN (R 3.5.0)
 KEGGREST                             1.20.1    2018-06-27 [2] Bioconductor
 KernSmooth                           2.23-15   2015-06-29 [2] CRAN (R 3.5.0)
 knitr                                1.20      2018-02-20 [2] CRAN (R 3.5.0)
 lambda.r                             1.2.3     2018-05-17 [2] CRAN (R 3.5.0)
 lattice                              0.20-35   2017-03-25 [2] CRAN (R 3.5.0)
 latticeExtra                         0.6-28    2016-02-09 [2] CRAN (R 3.5.0)
 lazyeval                             0.2.1     2017-10-29 [2] CRAN (R 3.5.0)
 magrittr                           * 1.5       2014-11-22 [2] CRAN (R 3.5.0)
 MASS                                 7.3-50    2018-04-30 [2] CRAN (R 3.5.0)
 Matrix                               1.2-14    2018-04-13 [2] CRAN (R 3.5.0)
 matrixStats                        * 0.54.0    2018-07-23 [2] CRAN (R 3.5.0)
 memoise                              1.1.0     2017-04-21 [2] CRAN (R 3.5.0)
 motifmatchr                          1.2.0     2018-06-27 [2] Bioconductor
 munsell                              0.5.0     2018-06-12 [2] CRAN (R 3.5.0)
 org.Mm.eg.db                       * 3.7.0     2019-03-07 [1] Bioconductor
 pillar                               1.3.1     2018-12-15 [1] CRAN (R 3.5.0)
 pkgbuild                             1.0.2     2018-10-16 [1] CRAN (R 3.5.0)
 pkgconfig                            2.0.2     2018-08-16 [2] CRAN (R 3.5.0)
 pkgload                              1.0.2     2018-10-29 [1] CRAN (R 3.5.0)
 plotrix                              3.7-2     2018-05-27 [2] CRAN (R 3.5.0)
 plyr                                 1.8.4     2016-06-08 [2] CRAN (R 3.5.0)
 png                                  0.1-7     2013-12-03 [2] CRAN (R 3.5.0)
 poweRlaw                             0.70.1    2017-08-29 [2] CRAN (R 3.5.0)
 prettyunits                          1.0.2     2015-07-13 [2] CRAN (R 3.5.0)
 processx                             3.2.1     2018-12-05 [1] CRAN (R 3.5.0)
 progress                             1.2.0     2018-06-14 [2] CRAN (R 3.5.0)
 ps                                   1.3.0     2018-12-21 [1] CRAN (R 3.5.0)
 purrr                                0.2.5     2018-05-29 [2] CRAN (R 3.5.0)
qvalue                               2.12.0    2018-06-27 [2] Bioconductor
 R.methodsS3                        * 1.7.1     2016-02-16 [2] CRAN (R 3.5.0)
 R.oo                               * 1.22.0    2018-04-22 [2] CRAN (R 3.5.0)
 R.utils                            * 2.6.0     2017-11-05 [2] CRAN (R 3.5.0)
 R6                                   2.4.0     2019-02-14 [1] CRAN (R 3.5.0)
 Rbowtie2                             1.2.0     2018-06-27 [2] Bioconductor
 RColorBrewer                         1.1-2     2014-12-07 [1] CRAN (R 3.5.0)
 Rcpp                                 1.0.0     2018-11-07 [1] CRAN (R 3.5.0)
 RCurl                                1.95-4.11 2018-07-15 [2] CRAN (R 3.5.0)
 readr                                1.1.1     2017-05-16 [2] CRAN (R 3.5.0)
 remotes                              2.0.2     2018-10-30 [1] CRAN (R 3.5.0)
 reshape2                             1.4.3     2017-12-11 [2] CRAN (R 3.5.0)
 rgexf                                0.15.3    2015-03-24 [2] CRAN (R 3.5.0)
 rJava                                0.9-10    2018-05-29 [2] CRAN (R 3.5.0)
 rlang                                0.3.1     2019-01-08 [1] CRAN (R 3.5.0)
 rmarkdown                            1.10      2018-06-11 [2] CRAN (R 3.5.0)
 Rook                                 1.1-1     2014-10-20 [2] CRAN (R 3.5.0)
 rprojroot                            1.3-2     2018-01-03 [2] CRAN (R 3.5.0)
 Rsamtools                          * 1.34.1    2019-01-31 [1] Bioconductor
 RSQLite                              2.1.1     2018-05-06 [2] CRAN (R 3.5.0)
 rstudioapi                           0.7       2017-09-07 [2] CRAN (R 3.5.0)
 rtracklayer                        * 1.42.2    2019-03-01 [1] Bioconductor
 rvcheck                              0.1.0     2018-05-23 [2] CRAN (R 3.5.0)
 S4Vectors                          * 0.20.1    2018-11-09 [1] Bioconductor
 scales                               1.0.0     2018-08-09 [2] CRAN (R 3.5.0)
 seqLogo                              1.46.0    2018-06-27 [2] Bioconductor
 sessioninfo                          1.1.1     2018-11-05 [1] CRAN (R 3.5.0)
 ShortRead                          * 1.38.0    2018-06-27 [2] Bioconductor
 stringi                              1.2.4     2018-07-20 [2] CRAN (R 3.5.0)
 stringr                            * 1.3.1     2018-05-10 [2] CRAN (R 3.5.0)
 SummarizedExperiment               * 1.10.1    2018-06-27 [2] Bioconductor
 TFBSTools                            1.18.0    2018-06-27 [2] Bioconductor
 TFMPvalue                            0.0.8     2018-05-16 [2] CRAN (R 3.5.0)
 tibble                               2.0.1     2019-01-12 [1] CRAN (R 3.5.0)
 tidyr                                0.8.1     2018-05-18 [2] CRAN (R 3.5.0)
 tidyselect                           0.2.5     2018-10-11 [1] CRAN (R 3.5.0)
 tweenr                               0.1.5     2016-10-10 [2] CRAN (R 3.5.0)
 TxDb.Hsapiens.UCSC.hg19.knownGene    3.2.2     2018-06-27 [2] Bioconductor
 TxDb.Mmusculus.UCSC.mm10.knownGene * 3.4.0     2019-05-22 [1] Bioconductor
 units                                0.6-0     2018-06-09 [2] CRAN (R 3.5.0)
 UpSetR                               1.3.3     2017-03-21 [2] CRAN (R 3.5.0)
 usethis                              1.4.0     2018-08-14 [1] CRAN (R 3.5.0)
 VennDiagram                          1.6.20    2018-03-28 [2] CRAN (R 3.5.0)
 VGAM                                 1.0-6     2018-08-18 [2] CRAN (R 3.5.0)
 viridis                              0.5.1     2018-03-29 [2] CRAN (R 3.5.0)
 viridisLite                          0.3.0     2018-02-01 [2] CRAN (R 3.5.0)
 visNetwork                           2.0.4     2018-06-14 [2] CRAN (R 3.5.0)
 withr                                2.1.2     2018-03-15 [2] CRAN (R 3.5.0)
 XML                                  3.98-1.16 2018-08-19 [2] CRAN (R 3.5.0)
 xtable                               1.8-2     2016-02-05 [2] CRAN (R 3.5.0)
 XVector                            * 0.20.0    2018-06-27 [2] Bioconductor
 zlibbioc                             1.26.0    2018-06-27 [2] Bioconductor
  
[1] /home/rcavalca/R/x86_64-redhat-linux-gnu-library/3.5
[2] /app/med/centos7/R/3.5.0/lib64/R/library

Fragment Size

What is the difference between fragment size and the field #9 in the SAM format?

Thank u!

Qiong

Trouble performing configuration steps offline

Hello,

I am not able to get through the configuration steps when run offline (on a cluster that does not have internet access). When I run the script on my login node (with internet access) it works fine. This is the error I get when I run without internet access.

Reference data configuraion start ...
Configure bsgenome ...
Error: Bioconductor version cannot be validated; no internet connection?
In addition: Warning messages:
1: In file(con, "r") :
URL 'https://bioconductor.org/config.yaml': status was 'Couldn't connect to server'
2: In file(con, "r") :
URL 'http://bioconductor.org/config.yaml': status was 'Couldn't connect to server'
Execution halted

I believe this is because I am not configuring correctly to run offline. Please let me know how to fix this problem. Thank you.

Cannot generate report when run atacRepsPipe for one of replicates

Hi,

I found after update, the atacRepsPipe still occur an unexpected error. I have 4 replicates, the atacRepsPipe cannot generate report for 4th sample and return following message

Quitting from lines 277-278 (Report.Rmd)
Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) :
  cannot popen '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory'
Calls: atacRepsPipe ... withVisible -> eval -> eval -> sessionInfo -> Sys.which
In addition: There were 30 warnings (use warnings() to see them)

system call failed: Cannot allocate memory

I have set the options(java.parameters = "-Xmx8192m") for JVM. Could you please figure out the reason of this problem? Do I need more memory?

Thank you very much! I think I ask too many questions，thanks for your patient.

Result from example deviates from demo

Hi Zheng Wei,
Thank you for developing this package. As someone new in this field, i am running your codes posted in: https://bioconductor.org/packages/release/bioc/vignettes/esATAC/inst/doc/esATAC-Introduction.html#2_introduction
under the section 3.3.7 motif scan and 3.3.8
I copied and pasted your codes and ran on my computer. the results look different from the example:
this is my result:

this is the introduction's result:

I really don't understand why could this happen.

also in general, for the function atacCutSiteCount, in real life which of the following bedfile should i use as input?:
*.SamToBed.bed
*.SamToBed.BedUtils.bed
or
*.PeakCallingFseq.bed
Thank you
Atom

Error mentions Step_03 pipe_FindAdapter/find' had Status 1

I am using the quick start single sample analysis of the esATAC pipeline, and this is a recent error I got in which my pipeline is no longer generating the "find.adapter1" and "find.adapter2" files for Step 3 of the pipeline. I would get 2 warning messages in which it mentions that the "/pipe/Step_03_pipe_FindAdapter/find' had status 1" and the second warning is "In file(con, "r") :
cannot open file '/[my source directory]/tmpdir/pipe/Step_03_pipe_FindAdapter/find.adapter1': No such file or directory".

I updated my R to the newest version and also re-installed esATAC through Bioconductor just in case. Any assistance would be appreciated!

error trying to run atacPipe2

Hello! Even though I have downloaded this package, and loaded it, I am still receiving this error:

Error in atacPipe2(case = list(fastqInput1 = "ELS3_BB-read-1.fastq.gz",  : 
  could not find function "atacPipe2"

when I run this:

conclusion <- atacPipe2(
+     case=list(fastqInput1 = "experimental-read-1.fastq.gz", fastqInput2 = "experimental-read-1.fastq.gz"),
+     control=list(fastqInput1 = "controlSample-read-1.fastq.gz", fastqInput2 = "controlSample_BB-read-1.fastq.gz"),
+     genome = "mm10")

Are there other dependencies that I am unaware of or something else I should do?
Thank you!

Handling large fastq files

Hi,

I'm having some issues running the pipeline.

After running these commands, R session gets aborted. I can see that the package also depends ShortRead (which uses the FastqStreamer function). Are there any hacks to deal with large fastq files (ranging from 3-12 GB)?

library(esATAC)

call pipeline

conclusion <-
atacPipe(
# MODIFY: Change these paths to your own case files!
# e.g. fastqInput1 = "your/own/data/path.fastq"
fastqInput1 ="ATCT_Seq_2_S53_L003_R1_001.fastq.gz",
fastqInput2 = "ATCT_Seq_2_S53_L003_R2_001.fastq.gz",
# Reference genome:
genome = "hg19")
#######

Thanks.

library(esATAC) is failed

Hi:
Thanks for the useful package!!
But when I use "library(esATAC)", here are some error message:

library(esATAC)
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply,
parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, basename, cbind, colMeans, colnames, colSums, dirname, do.call,
duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, lengths, Map,
mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce,
rowMeans, rownames, rowSums, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which,
which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

expand.grid

Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:grDevices’:

windows

Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

strsplit

Loading required package: ShortRead
Loading required package: BiocParallel
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: DelayedArray
Loading required package: matrixStats

Attaching package: ‘matrixStats’

The following objects are masked from ‘package:Biobase’:

anyMissing, rowMedians

Attaching package: ‘DelayedArray’

The following objects are masked from ‘package:matrixStats’:

colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following object is masked from ‘package:Biostrings’:

type

The following objects are masked from ‘package:base’:

aperm, apply

Error: package or namespace load failed for ‘esATAC’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
there is no package called ‘TxDb.Hsapiens.UCSC.hg19.knownGene’

And it seems not activate esATAC package:

Error in atacPipe(fastqInput1 = system.file(package = "esATAC", "extdata", :
could not find function "atacPipe"

library（esATAC）error

Hi，
Thanks for developing this user-friendly tool. I have a problem when I run library (esATAC).
The error is:
Error: package or namespace load failed for ‘esATAC’:
.onLoad failed in loadNamespace() for 'rJava', details:
call: dyn.load(file, DLLpath = DLLpath, ...)
error: unable to load shared object '/home/vip1/R/x86_64-pc-linux-gnu-library/3.6/rJava/libs/rJava.so':
libjvm.so: cannot open shared object file: No such file or directory
Could you help me solving this problem
Looking forward to your reply.

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘NSBS’ for signature ‘"SortedByQueryHits"’

Hi, there,

I tried to feed in the bed file of my peaks into the function "peakanno", but I got the similar error as in one previous post:
2019-10-13 16:50:05
start processing(paired end data): RPeakAnno
processing file:
source:/BGFS1/home/wuq/data/test/KO-Treg.bed
dataframe destination:/BGFS1/home/wuq/data/test/KO-Treg.RPeakAnno.txt
Image destination:/BGFS1/home/wuq/data/test/KO-Treg.RPeakAnno.pdf

preparing features information... 2019-10-13 04:50:05 PM
identifying nearest features... 2019-10-13 04:50:05 PM
calculating distance from peak to TSS... 2019-10-13 04:50:05 PM
assigning genomic annotation... 2019-10-13 04:50:05 PM
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘NSBS’ for signature ‘"SortedByQueryHits"’
In addition: Warning messages:
1: In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
2: In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
3: In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
4: In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
5: In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)

My code is as follows:
batch1 <- "/path to/KO-Treg.bed.bz2"
peak1_path <- as.vector(bunzip2(filename = batch1, destname = file.path(getwd(), "KO-Treg.bed"), ext="bz2", FUN=bzfile, overwrite=TRUE, remove = FALSE))
AnnoInfo <- peakanno(peakInput = peak1_path, TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene, annoDb = "org.Mm.eg.db")

Could you please help me to have a look?

Thanks so much!

Best wishes,
Qiong

sam to bam conversion output differs from bam file from samtools

Hi,

I converted sam to bam file using sam2bam provided in esATAC package. I would like to add this step to my esATAC pipline. However, it gave me 488M BAM file which is quite different from 605M BAM file using $samtools view -b -S file.sam file.bam. I looked at the source code of sam2bam in GitHub and could not find the default arguments to compare with samtools. I would be glad if you could help me figure it out so, I feel more confident to include sam2bed in a pipeline. Thank you in advance for your response.

Error in dir.create

Hello, I get the following error while running atacpipe2 after the Rgo stage:
Error in dir.create(.Object@paramlist[["scanO.dir"]]) : 'path' too long

Would anyone have any advice on that ? :-s

removeAdapter error

Hello, I got the following error while running the program:

"""
start processing(paired end data): RemoveAdapter
begin to find adapter
arguments 'show.output.on.console', 'minimized' and 'invisible' are for Windows only
Error in file(con, "r") : cannot open the connection

In file(con, "r") :
cannot open file 'R1.RemoveAdapter.report.adapter1': No such file or directory

"""

Would anyone help me to sort the error out? Thanks in advance!

Installing error

After I installed your package, I was returned with the following comment
Error: package or namespace load failed for ‘esATAC’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
不存在叫‘TxDb.Hsapiens.UCSC.hg19.knownGene’这个名字的程辑包

atacRepsPipe cannot generate rep_Report

Dear Zheng,

When I apply atacRepsPipe, it can generate a report for each replicate respectively. But when it tries to generate the final report rep_Report, below errors would pop up:

Quitting from lines 100-103 (Rep_Report.Rmd)
Error in parse(text = x, srcfile = src) : <text>:3:1: unexpected ')'
2: knitr::kable(x = getReportVal(atacProcs$DHSQC,"report")))
3: )
   ^
Calls: atacRepsPipe ... <Anonymous> -> parse_all -> parse_all.character -> parse
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Execution halted
Warning message:
system call failed: Cannot allocate memory

I have tried that adding options(java.parameters = "-Xmx8192m") at the beginning of my code. But the error still pops up. My server has 35GB memory in total. Could you please help me to figure out this problem?

Error in RMotifScan

Hi Zheng,

When I run esATAC to the RMotifScan step, I get an error like this,

Error in loadFUN(x, seqname, ranges) : trying to load regions beyond the boundaries of non-circular sequence "chrUn_KN150437v1"

In fact, I get the similar error when I run one replicated samples via atacRepsPipe.
Would you mind showing me the proper way to deal with this issue?

Thanks a lot
BJ

Error in `[<-.data.frame`(`tmp`, tssIndex, "Promoter", value = TRUE) : replacement has 1 row, data has 0

Hello, I met this problem when using atacRepsPipe2...

2019-04-25 17:32:23
start processing(paired end data): FRiPQC
load reads
load peak
get overlap number
finish overlap
2019-04-25 17:32:25
processing finished
========================================<<<<<<

========================================
2019-04-25 17:32:25
start processing(paired end data): RPeakAnno
processing file:
source:/media/shen/disk2/jjc/2019-4-24-atac-seq/esatac_female/case/replicate1/wy-0331-5.R1_val_1.PeakCallingFseq.bed
dataframe destination:/media/shen/disk2/jjc/2019-4-24-atac-seq/esatac_female/case/replicate1/wy-0331-5.R1_val_1.RPeakAnno.txt
Image destination:/media/shen/disk2/jjc/2019-4-24-atac-seq/esatac_female/case/replicate1/wy-0331-5.R1_val_1.RPeakAnno.pdf
preparing features information... 2019-04-25 17时32分26秒
identifying nearest features... 2019-04-25 17时32分26秒
calculating distance from peak to TSS... 2019-04-25 17时32分28秒
assigning genomic annotation... 2019-04-25 17时32分28秒
Error in [<-.data.frame(*tmp*, tssIndex, "Promoter", value = TRUE) :
replacement has 1 row, data has 0
此外: There were 19 warnings (use warnings() to see them)

I have googled this problem but I still don't know how to solve it .... THANKS A LOT!

pipeline for drosophila

Hi,

As my data is drosophila, and I get the error like this,

Error in .obtainConfigure("knownGene") : 
  knownGene has not been configured yet! Please call 'options(atacConf=setConfigure("knownGene","set your value here"))' to configure first

Is it possible to modify something to apply it to drosophila (dm6)?

Thanks

Error in checkForRemoteErrors(val) : 2 nodes produced errors; first error: could not find function "input"

When I set the thread parameter to 1, this error will not appear, but once changed, it will report tha error.

Definition of score in RPeakAnno.txt

Hi wzthu,

I don't know the exact meaning of score in the RPeakAnno.txt file. esATAC calls peaks based on F-seq. But in the standard output of F-seq, the score determines the level of gray in which this feature is displayed, which should be int. But in the RPeakAnno.txt, the score is the float. So, could you please tell me the definition of score column in the RPeakAnno?

Thanks a lot!

Single-end data with replicates esATAC error

I have a single end data (2 replicates) without control. I want to analyze the data using esATAC. I have used


conclusion <-
  atacPipe(
    fastqInput1 = "/home/jplab/esATAC/ETP/SRR5183778.fastq.gz", adapter1= "CTGTCTCTTATACACATCT",
    fastqInput2 = "/home/jplab/esATAC/ETP/SRR5183779.fastq.gz", adapter1= "CTGTCTCTTATACACATCT",
    genome = "mm10", refdir ="/home/jplab/esATAC/mm10/")

after running for some time it gives the following error

Error in atacPipe(fastqInput1 = "/home/jplab/esATAC/ETP/SRR5183778.fastq.gz",  : 
  formal argument "adapter1" matched by multiple actual arguments

Please let me know how I should be using esATAC for single-end replicate data without control

Is Bowtie2 local alignment mode supported?

Glancing over the Sam to Bed code, it looks to me that the end point is calculated based on the values of the CIGAR string. I think this would include soft-clipped ends of the reads.

That would explain the issue I am seeing with my fragment length distribution where I see the nucleosome free region peak and mono-nucleosome peaks merged into one broader peak. I think I'm getting a lot of overestimated fragment lengths due to the soft-clipped being counted as part of the length.

If I do trimming before using esATAC with trim galore (so there will be trimming a second time with esATAC's removeAdapter afterward) and thus have less soft-clipping (at least on the 3' end), I get slightly better alignment rate (by ~1%) but a much better looking fragment length distribution where the NFR and mono-nucleosome peaks are more distinct.

Our dataset overall aligns poorly in Bowtie2's end-to-end mode, hence resorting to local alignment mode. But I wonder of the validity of continuing using esATAC in this particular case since I suspect post-alignment operations may not be compatible.

esATAC.only.FragLenDistr.lendistr.pdf
Using.Trim.Galore.First.FragLenDistr.lendistr.revised.pdf

PeakCallingFseq failed

Below is the error report. Someone said it would be attributed to the rJava, so I reinstalled rJava and esATAC, not work. Any help will be much appreciated.

========================================
2018-11-26 05:58:46
start processing(paired end data): PeakCallingFseq
Error in .jcall("edu/duke/igsp/gkde/Main", "V", "main", argvs) :
java.lang.ArithmeticException: / by zero
Calls: atacRepsPipe2 ... processing -> .fseq_call -> fseq -> .jcall -> .jcheck
此外: There were 30 warnings (use warnings() to see them)
stopped

Errors when run atacRepsPipe

Dear Zheng,

When I ran esATAC for the first time, I used R3.5.1 to do it. However, I could not get the report as there is an error showed that
pandoc version 1.12.3 or higher is required and was not found.

So I changed to RStudio to re-do it. It's quite well that I could get all the results and reports. But the problem is that when I ran atacRepsPipe, which could combine many replicates, I got another error.
It showed that
error reading from the connection.

I think maybe it's caused by the large size of the files. So I changed to the R3.5.1 in our linux system. However, I get the same error
pandoc version 1.12.3 or higher is required and was not found.

SO, I know that I could resolve the pandoc error by RStudio. But how to resolve the problem of the connection. Could you help me with this error? It would be very grateful for me.

Best regards,
BJ

Error in PeakCallingFseq

start processing(paired end data): PeakCallingFseq
F-Seq Version 1.85
Error in strsplit(filename[i], split = "\.")[[1]] : 下标出界

Any help will be appreciated

Error during removeAdapter/Bowtie2Mapping

Hello,
I've been trying to use esATAC for some ATAC-seq data I have from mice. When I run the pre-made pipeline, I encounter errors:
"Error in file(con, "r") : cannot open the connection"
"In file(con, "r") : cannot open file 'C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipelineintermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_1.RemoveAdapter.report.adapter1': No such file or directory"

This also happens when I try to run the customized pipeline step-by-step by calling each function sequentially and this typically happens at the removeAdapter or Bowtie2Mapping steps. Any help would be greatly appreciated.

Full Code:
library(dplyr)
library(esATAC)
library(magrittr)

td<-tempdir()
options(atacConf=setConfigure("tmpdir",td))
dir.create("./esATAC_pipeline")
dir.create("./esATAC_pipeline/refdir")
dir.create("./esATAC_pipeline/result")

options(atacConf=setConfigure("refdir","esATAC_pipeline/refdir"))
options(atacConf=setConfigure("tmpdir","esATAC_pipeline/result"))
options(atacConf=setConfigure("threads",2))
options(atacConf=setConfigure("genome","mm10"))

Read1 = "Td148506_USPD16089050_HNN23BBXX_L1_1.fq.gz"
Read2 = "Td148506_USPD16089050_HNN23BBXX_L1_2.fq.gz"

conclusion <-
atacPipe(
# MODIFY: Change these paths to your own case files!
# e.g. fastqInput1 = "your/own/data/path.fastq"
fastqInput1 = Read1,
fastqInput2 = Read2,
# MODIFY: Change this path to an permanent path to be used in future!
# refdir <- "./esATAC_pipeline/refdir",
# tmpdir = "./esATAC_pipeline/intermediate_results",
# MODIFY: Set the genome for your data
genome = "mm10")

Console:

conclusion <-

atacPipe(

# MODIFY: Change these paths to your own case files!

# e.g. fastqInput1 = "your/own/data/path.fastq"

```
fastqInput1 = Read1,
```
```
fastqInput2 = Read2,
```

# MODIFY: Change this path to an permanent path to be used in future!

#       refdir <- "./esATAC_pipeline/refdir",

#       tmpdir = "./esATAC_pipeline/intermediate_results",

```
# MODIFY: Set the genome for your data
```
```
genome = "mm10")
```

========================================
Reference data configuraion start ...
Configure bsgenome ...
Configure annoDb ...
Loading required package: AnnotationDbi

Attaching package: ‘AnnotationDbi’

The following object is masked from ‘package:dplyr’:

select

Configure knownGene ...
Loading required package: GenomicFeatures
Generate genome fasta file ...
Configure bowtie2 index ...
Configure blacklist and DHS ...
Reference data configuraion done
========================================<<<<<<

final esATAC_result
C:/Users/Omar/Desktop/Tomato ATAC-seq Raw Data/Td148506/esATAC_pipeline/esATAC_result
final esATAC_report
C:/Users/Omar/Desktop/Tomato ATAC-seq Raw Data/Td148506/esATAC_pipeline/esATAC_report

========================================
2018-11-07 20:07:11
start processing(paired end data): UnzipAndMerge
processing file:
source:C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\Td148506_USPD16089050_HNN23BBXX_L1_1.fq.gz
destination:C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_1.UnzipAndMerge.fq
processing file:
source:C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\Td148506_USPD16089050_HNN23BBXX_L1_2.fq.gz
destination:C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_2.UnzipAndMerge.fq
2018-11-07 21:29:43
processing finished
========================================<<<<<<

========================================
2018-11-07 21:29:45
start processing(paired end data): FastQC
processing file:
source:C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_1.UnzipAndMerge.fqsource:C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_2.UnzipAndMerge.fq
destination:C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_1.UnzipAndMerge_FastQC.pdf
creating QC plots
2018-11-07 22:34:12
processing finished
========================================<<<<<<

========================================
2018-11-07 22:34:13
start processing(paired end data): Renamer
processing file:
source:C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_1.UnzipAndMerge.fq
destination:C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_1.Renamer.fq
processing file:
source:C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_2.UnzipAndMerge.fq
destination:C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_2.Renamer.fq
2018-11-08 01:55:09
processing finished
========================================<<<<<<

========================================
2018-11-08 01:55:17
start processing(paired end data): RemoveAdapter
begin to find adapter
Error in file(con, "r") : cannot open the connection
In addition: Warning messages:
1: In atacPipe(fastqInput1 = Read1, fastqInput2 = Read2, genome = "mm10") :
path 'esATAC_pipeline/intermediate_results' is exist
2: In dir.create(esATAC_result) :
'C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\esATAC_result' already exists
3: In dir.create(esATAC_report) :
'C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\esATAC_report' already exists
4: In system(call, intern = TRUE, show.output.on.console = TRUE) :
running command '"C:/Users/Omar/Documents/R/win-library/3.5/Rbowtie2/AdapterRemoval" --identify-adapters --file1 C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_1.Renamer.fq --file2 C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_2.Renamer.fq --threads 2 --basename C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_1.RemoveAdapter.report' had status 1
5: In file(con, "r") :
cannot open file 'C:\Users\Omar\Desktop\Tomato ATAC-seq Raw Data\Td148506\esATAC_pipeline\intermediate_results/Td148506_USPD16089050_HNN23BBXX_L1_1.RemoveAdapter.report.adapter1': No such file or directory

libComplexQC

Hello,
Thanks to previous support, I have managed to trim adapters out of my ATAC sequences and align them back to the mm10 genome using removeAdapter and Bowtie2Mapping. However, when I wanted to QC the resulting sam file using libComplexQC, it ran without issues but produced a 2.7GB file which I've been unable to open due to its size.

Although the size of the sam was 64GB, I was under the impression the libComplexQC output would be a small report detailing NRF and other statistics so I'm not sure why it is so big....

Any help/advice would be much appreciated. Thanks again for your time and effort, it is much appreciated.

Error at peak calling stage

Hi,
When running the pipelines (atacPipe, atacPipe2 etc.), the run successfully completes at times while at other times it stops at the peak calling stage with the following error:
F-Seq Version 1.85
Error in strsplit(filename[i], split = "\.")[[1]] :
subscript out of bounds
Execution halted

Do you know what might be causing this error or how I can solve it?
Thank in advance!

arguments 'show.output.on.console', 'minimized' and 'invisible' are for Windows only

Hi I am trying to follow the vignette here https://wzthu.github.io/esATAC/doc/vignettes.html in Rstudio.

The first example of case-control using atacPipe2() failed for the error of "this function is still under developing".

Then I tried the example of single sample in 1.3.2 using atacPipe(), it have been running for over an hour using the sample fastq chr20_1.1.fq.gz. And the console stays at

"arguments 'show.output.on.console', 'minimized' and 'invisible' are for Windows only".

I am not sure whether the pipeline is stuck or it is still running. Also, changing the thread number in the function to 8 seems not working as the console outputs "--threads 2" anyway.

Could anyone give suggestions on these? Thanks in advance!

Error in atacRepsPipe(... : this function is still under developing

Dear Zheng,
Here I met a problem when I run atacRepsPipe.

Error in atacRepsPipe(fastqInput1 = list("/0-1/0-1_S0_L000_R1_000.fastq.gz", :
this function is still under developing

Is atacRepsPipe updating? Or can it be replaced by atacPipe?

My codes are listed as below:

atacRepsPipe(
fastqInput1 =list("./0-1/0-1_S0_L000_R1_000.fastq.gz","./0-2/0-2_S0_L000_R1_000.fastq.gz"),
fastqInput2 =list("./0-1/0-1_S0_L000_R2_000.fastq.gz","./0-2/0-2_S0_L000_R2_000.fastq.gz"),
threads = 48,
genome = "mm10",
refdir = "~/ref",
motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC")))

Best wishes,
PS

Configure issue

Hi, I have problem in starting the pipeline. It seems there is configuration issue for bt2Idx. Can you help me solving it? Thanks!

Configure bsgenome ...
Configure bsgenome finished
Configure knownGene ...
Configure knownGene finished
Configure annoDb ...
Configure annoDb finished
Configure fasta ...
Configure fasta finished
Configure bt2Idx ...
--threads 2
arguments 'show.output.on.console', 'minimized' and 'invisible' are for Windows only

Error in runWithFinishCheck(func = checkAndInstallBt2Idx, refName = "bt2Idx", :
file.exists(refFilePath) are not all TRUE

GRanges object contains missing values

Hi, I am running esATAC for case control with replicatess (atacRepsPipe2) with mm10 and motifs=NULL.

After the first replicate of case sample has run through libcomplexQC, SamToBed conversion seems to work but it failes at BedToBigWig with an error message shown in the attached screenshot.

Any pointers to where should I start the troubleshooting.

How to skip several analysis in the pipeline?

Hi,

Here I met a problem that stop me to move in the pipeline. As my data is zebrafish not human or mouse, I modified the pipeline. But I get the error like this,

Error in .obtainConfigure("DHS") :
  DHS has not been configured yet! Please call 'options(atacConf=setConfigure("DHS","set your value here"))' to configure first

But as we haven't obtained the datasets of the DHS of zebrafish, I want to skip this analysis and the analysis of the peaks in blacklist.
Could it possible to skip these steps to the next, such as TF motifs analysis?

Thanks very much
BJ

Incomplete Bowtie2Mapping

I am trying out esATAC by starting with the Quick Start approach (https://wzthu.github.io/esATAC/doc/vignettes.html#1_quick_start).

Basically trying out a single paired end read from my dataset using the following script:

conclusion <-
atacPipe(
# MODIFY: Change these paths to your own case files!
fastqInput1 = file.path(dataDir, "RawData/S1_R1_001.fastq.gz"),
fastqInput2 = file.path(dataDir, "RawData/S1_R2_001.fastq.gz"),
refdir = file.path(dataDir, "esATAC_pipeline/refdir"),
tmpdir = file.path(dataDir, "esATAC_pipeline/intermediate_results"),
genome = "mm10")

The script produces output data in folders from Step_00 to Step_05.

However, in Step_05_pipe_Bowtie2Mapping, I see sam file for only the 1st mate ( "S1_R1_001.sam"), and the S1_R1_001.report file is empty.

The log file for Bowtie mapping (ntermediate_results/pipe/Step_05_pipe_Bowtie2Mapping/pipeFrame.obj.log) is as below:

Begin to check if it is finished...
2021-07-07 00:13:03
New step. Start processing data:
start mapping with parameters:
bowtie2 index:/esATAC_pipeline/refdir/mm10/genome
samOutput:/esATAC_pipeline/intermediate_results/pipe/Step_05_pipe_Bowtie2Mapping/S1_R1_001.sam
report:/esATAC_pipeline/intermediate_results/pipe/Step_05_pipe_Bowtie2Mapping/S1_R1_001.report
fastqInput1:/esATAC_pipeline/intermediate_results/pipe/Step_04_pipe_RemoveAdapter/S1_R1_001.fq
fastqInput2:~/esATAC_pipeline/intermediate_results/pipe/Step_04_pipe_RemoveAdapter/S1_R2_001.fq
other parameters:-p 2 --no-discordant --no-unal --no-mixed -X 2000
2021-07-07 02:50:00
processing finished

ems everything is running fine but I don't get an output for read-2 (R2) and no html report. Will appreciate any suggestions of where I am going wrong.

Thank you.

Segmentation fault during installation

Thank you for providing this package!

I was trying to install it into R 3.4.1 via Anaconda. I finally got all the dependencies working but now, it's having a problem with this package specifically. This is what I see:

installing to ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
Segmentation fault (core dumped)
ERROR: loading failed

Do you have any advice on what I should do? Perhaps I should use another version of R?

major problem in atacpipe2 / java

When I run atacpipe2 I get:
start processing(paired end data): PeakCallingFseq
F-Seq Version 1.85
Error in strsplit(filename[i], split = "\.")[[1]] :
subscript out of bounds

start processing(paired end data): PeakCallingFseq
Error in .jcall("edu/duke/igsp/gkde/Main", "V", "main", argvs) :
java.lang.OutOfMemoryError: Requested array size exceeds VM limit

I have 64GB RAM and I increased the java limits using -Xmx60192m and XX:MaxMetaspaceSize=60192m but still no solution. Heeeeeelp :-(

Error with Renamer function ".renamer_call"

Hello, I was very excited to find this package for my atacseq data. When I run atacPipe, I get the following error in the Renamer step:

Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: could not find function ".renamer_call"

I have everything installed and setup per user guide. Thanks!

Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'masks': could not find function ".validate_Vector_vertical_slots"

Hi Wei,

Thank you so much for developing this pipeline. For biologists like me with no coding bg it's great to learn to analyze complex data this way. I have run into a problem when running single sample analysis on my data. I am attaching the console script below :

Configure bsgenome ...
Configure bsgenome finished
Configure knownGene ...
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Configure knownGene finished
Configure annoDb ...

Configure annoDb finished
Configure fasta ...
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'masks': could not find function ".validate_Vector_vertical_slots"

Could you please help me in overcoming this?
Thank you so much!

Best,
Rosh

Problem with custom genome bowtie index

I am trying to replicate this example script using a custom genome that was previously indexed:

library(esATAC)
conclusion <-
atacPipe(
fastqInput1 = "path_to/SRR5799398_1.fastq.gz",
fastqInput2 = "path_to/SRR5799398_2.fastq.gz",
refdir = "./refdir",
tmpdir = "./tmp",
genome = "MusMus10_primary")

In the refdir there are the 6 index files previously generated (e.g., MusMus10_primary.1.bt2 ...)
When I try to run the pipeline it is throwing me this error:

========================================
Reference data configuraion start ...
Configure bsgenome ...
Error in isValidVal(.configObj, item, val) : bsgenome not

What is the problem?

Thank you

Giulia

how to turn off auto overwrite function???

I am using atacRepsPipe2 for a total 100 samples, it takes days to processing, which is fine though. And I think you have a good pipeline design that if the pipeline crushed due to (such as) memory issue, hard disk space issue etc.. and I can re-run the pipeline without re-creating pre-generated files.

however, in my situation, some fastq files failed to generate correct SAM files which causes down stream error. When I manually fixed the SAM file and re-run the pipeline, the manually fixed the SAM file will be over-written, I tried modify the file time stamp using touch, not work either.

now my question is how can I tune off the auto overwrite function !!!

many thanks

Chen

conclusion <-
#' atacRepsPipe2(
#' # MODIFY: Change these paths to your own case files!
#' # e.g. fastqInput1 = "your/own/data/path.fastq"
#' caseFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"),
#' system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz")),
#' # MODIFY: Change these paths to your own case files!
#' # e.g. fastqInput1 = "your/own/data/path.fastq"
#' caseFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"),
#' system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")),
#' # MODIFY: Change these paths to your own control files!
#' # e.g. fastqInput1 = "your/own/data/path.fastq"
#' ctrlFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"),
#' system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")),
#' # MODIFY: Change these paths to your own control files!
#' # e.g. fastqInput1 = "your/own/data/path.fastq"
#' ctrlFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2"),
#' system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")),
#' # MODIFY: Set the genome for your data
#' genome = "hg19",
#' motifs = getMotifInfo(motif.file = system.file("extdata", "CustomizedMotif.txt", package="esATAC")))

Error in atacRepsPipe(fastqInput1 = list("malemouse1_S1_L001_R1_001.fastq.gz", : the number of adapters in adapter1 do not match the number of replicates in fastqInput1

Here is my code:
conclusion <-
atacRepsPipe(
fastqInput1 = list("malemouse1_S1_L001_R1_001.fastq.gz","malemouse2_S2_L001_R1_001.fastq.gz","malemouse3_S3_L001_R1_001.fastq.gz","mommouse1_S7_L001_R1_001.fastq.gz","pupmouse1_S4_L001_R1_001.fastq.gz","pupmouse2_S5_L001_R1_001.fastq.gz","pupmouse3_S6_L001_R1_001.fastq.gz"),
fastqInput2 = list("malemouse1_S1_L001_R2_001.fastq.gz","malemouse2_S2_L001_R2_001.fastq.gz","malemouse3_S3_L001_R2_001.fastq.gz","mommouse1_S7_L001_R2_001.fastq.gz","pupmouse1_S4_L001_R2_001.fastq.gz","pupmouse2_S5_L001_R2_001.fastq.gz","pupmouse3_S6_L001_R2_001.fastq.gz"),
genome = "mm10",
threads = 2,
adapter1 = "CTGTCTCTTATACACATCT", adapter2 = "CTGTCTCTTATACACATCT",
chr = c(12),
createReport = TRUE)

Not sure what this error means? Do I need to specify adapter sequence for each of the samples?

wzthu / esatac Goto Github PK

esatac's People

Contributors

Stargazers

Watchers

Forkers

esatac's Issues

call pipeline

Error: package or namespace load failed for ‘esATAC’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): there is no package called ‘TxDb.Hsapiens.UCSC.hg19.knownGene’

Recommend Projects

Recommend Topics

Recommend Org

Error: package or namespace load failed for ‘esATAC’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
there is no package called ‘TxDb.Hsapiens.UCSC.hg19.knownGene’