liulabub / hmmratac Goto Github PK

View Code? Open in Web Editor NEW

98.0 8.0 22.0 67.2 MB

HMMRATAC peak caller for ATAC-seq data

License: GNU General Public License v3.0

Java 99.38% Shell 0.62%

peak-caller atac-seq peak-detection sequencing machine-learning

hmmratac's Introduction

HMMRATAC

Quick Start

Assume that you have a BAM file from aligner such as bwa mem named ATACseq.bam.

Sort the BAM file to get a ATACseq.sorted.bam file:

samtools sort ATACseq.bam -o ATACseq.sorted.bam
Make index from the BAM file to get a ATACseq.sorted.bam.bai file:

samtools index ATACseq.sorted.bam ATACseq.sorted.bam.bai
Make genome information (chromosome sizes) from the BAM file to get a genome.info file:

samtools view -H ATACseq.sorted.bam| perl -ne 'if(/^@SQ.*?SN:(\w+)\s+LN:(\d+)/){print $1,"\t",$2,"\n"}' > genome.info
Run HMMRATAC on the sorted BAM ATACseq.sorted.bam, the BAM index file ATACseq.sorted.bam.bai, and the genome information file genome.info:

java -jar HMMRATAC_V1.2.4_exe.jar -b ATACseq.sorted.bam -i ATACseq.sorted.bam.bai -g genome.info
Filter HMMRATAC output by the score, if desired. Score threshold will depend on dataset, score type and user preference. A threshold of 10 would be:

awk -v OFS="\t" '$13>=10 {print}' NAME_peaks.gappedPeak > NAME.filteredPeaks.gappedPeak

To filter the summit file by the same threshold:

awk -v OFS="\t" '$5>=10 {print}' NAME_summits.bed > NAME.filteredSummits.bed

NOTE: HMMRATAC will report all peaks that match the structure defined by the model, including weak peaks. Filtering by score can be used to retain stronger peaks. Lower score = higher sensitivity and lower precision, Higher score = lower sensitivity and higher precision.

Samtools can be downloaded here: http://www.htslib.org/download/

Be sure to run HMMRATAC using the executable file, found here: https://github.com/LiuLabUB/HMMRATAC/releases For details on HOW to run HMMRATAC, see HMMRATAC_Guide.md, which contains a thorough runthrough of all parameters, output files and input requirements and troubleshooting.

HMMRATAC requires paired-end data. Single-end data will not work. HMMRATAC is designed to process ATAC-seq data that hasn't undergone any size selection, either physical or in silico. This should be standard practice for any ATAC-seq analysis. Size selected data can be processed by HMMRATAC (see HMMRATAC_Guide.md on --trim option).

If you use HMMRATAC in your research, please cite the following paper:

Evan D. Tarbell and Tao Liu, HMMRATAC: a Hidden Markov ModeleR for ATAC-seq, Nucleic Acids Res. 2019 Jun 14. doi: 10.1093/nar/gkz533

hmmratac's People

Contributors

Stargazers

Watchers

hmmratac's Issues

How can I have the data of NFRs and 1Ns to create the heatmap shown in Fig. 2B of your NAR paper?

We are planing to analyze our data with HMMRATAC to investigate whether nucleososome changes in mutants for chromatin remodeling complex and related genes. To have more comprehensive understanding, we would like to examine the heapmap of the NFRs and 1Ns. Therefore, I am wondering how I could output the needed data generated by HMMRATAC.

Thank you.

FR: Removing chrM

Is your feature request related to a problem? Please describe.
Does HMMRATAC provide an easy way to remove chrM?

Describe the solution you'd like
A new parameter to remove a list of chromosomes:
e.g. --exclude chrM,GL (comma-separated)

Describe alternatives you've considered

Supply a chrM bed file to --blacklist
Run awk to remove chrM or M

Error when running HMMRATAC

When I run HMMRATAC with my sorted.bam file it gives me an error.

Exception in thread "main" java.util.InputMismatchException
        at java.util.Scanner.throwFor(Scanner.java:864)
        at java.util.Scanner.next(Scanner.java:1485)
        at java.util.Scanner.nextInt(Scanner.java:2117)
        at java.util.Scanner.nextInt(Scanner.java:2076)
        at GenomeFileReaders.GenomeFileReader.<init>(GenomeFileReader.java:48)

The code I use is the following one:

java -jar HMMRATAC_V1.2.5_exe.jar \
--bam bamfiles/${sample}/${sample}_.bam \
--index bamfiles/${sample}/${sample}_.bam.bai \
--genome genome.info \
--output ${trial}/results/hmmr/${sample}/${sample} \
--upper 20 \
--lower 10 \
--peaks True \

The genome.info file contains the names and sizes of the chromosomes of the mm9 mouse genome. This is what the file says:

chr1 197195432
chr2	181748087
chr3 159599783
chr4	155630120
chr5	152537259
chr6 149517037
chr7	152524553
chr8 131738871
chr9 124076172
chr10 129993255
chr11 121843856
chr12 121257530
chr13 120284312
chr14 125194864
chr15 103494974
chr16 98319150
chr17 95272651
chr18 90772031
chr19 61342430
chrX 166650296
chrY 15902555
chrM 16299
chr13_random	400311
chr16_random	3994
chr17_random 628739
chr1_random 1231697
chr3_random 41899
chr4_random 160594
chr5_random 357350
chr7_random 362490
chr8_random 849593
chr9_random 449403
chrUn_random 5900358
chrX_random 1785075
chrY_random 58682461

Could someone give me some advice, please?

Thank you.

HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:489)

Hi,

Thanks for your recent update.

I tried to run the code using the following command:

java -jar /Applications/bioinfo_tools/HMMRATAC-master/HMMRATAC_V1.2.1_exe.jar
-b BAM_FILE.bam -i BAM_FILE.bam.bai -g dm6.genome --bedgraph true --bgscore true -o file_hmmratac

After several minutes it started to print:
chr3L 16828 26916 training
chr3L 16917 26920 training
chr3L 16922 26931 training
chr3L 16932 26939 training
chr3L 16940 26957 training
....

And then an error message:
Exception in thread "main" java.lang.NullPointerException
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:489)

it generated 2 non empty files: file_hmmratac.log & file_hmmratac.model

Do you have any idea of the problem here ?

Version of JAVA:
$java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)

Version of samtools 1.9 (using htslib 1.9)

Thank you very much for your help,

What threshold should be used to keep the peaks?

In my data of all samples, the results showed significant different peak numbers cross all the samples? Why that happened? What should I do with that? In the Troubleshooting part5, I checked my model, the thing I don't understand is emission parameters, what do they mean? and what are the numbers in the log files? And also, can I use different -u -l parameters to run each samples? If so, can compare the open chromatin regions of different samples?

Q: The shifted bam reads from ATACseqQC are unable to call the peaks.

Use case
A clear and concise description of what the use case is. For example: my data is from [...]

Describe the problem
A clear and concise description of what happend.

Describe the solution you tried
A clear and concise description of any solutions you've tried.

Additional context
Add any other context or screenshots about your use case here.

Q: negative standard deviation

Use case
Hello, I'm analysing some ATAC seq from different tissue and different samples of 5 instar butterflies. For some of the file the tool works properly but for some reason with others don't.

Describe the problem
the error I'm getting is this:
#################################
Exception in thread "main" org.apache.commons.math3.exception.NotStrictlyPositiveException: standard deviation (-3.722)
at org.apache.commons.math3.distribution.NormalDistribution.(NormalDistribution.java:142)
at org.apache.commons.math3.distribution.NormalDistribution.(NormalDistribution.java:107)
at org.apache.commons.math3.distribution.NormalDistribution.(NormalDistribution.java:85)
at GEMM.HMMR_EM.getWeightedDensity(HMMR_EM.java:170)
at GEMM.HMMR_EM.iterate(HMMR_EM.java:74)
at GEMM.HMMR_EM.learn(HMMR_EM.java:113)
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:228)
##########################
the reference genome (and chromosome size) is the same for all the data.

Describe the solution you tried
Previously I got the same error when I was using the wrong size file but in this situation the size is the same that worked for other samples.

Adjust the read start sites

Dear,

Tn5 transposase has been shown to bind as a dimer and inserts two adaptors into accessible DNA locations separated by 9 bp. Therefore, for downstream analysis, such as peak-calling and footprinting, all reads in input bamfile need to be shifted. The function shiftGAlignmentsList can be used to shift the reads. By default, all reads aligning to the positive strand are offset by +4bp, and all reads aligning to the negative strand are offset by -5bp. The adjusted reads will be written into a new bamfile for peak calling or footprinting.
This is the strategy used by ATACseqQC ( an bioconductor R package to perform quanlity control of ATACseq, https://bioconductor.org/packages/release/bioc/vignettes/ATACseqQC/inst/doc/ATACseqQC.html). Is it conflict with HMMRATAC ?

.bed file does not reflect Summit Name or Peak Score in .gappedPeak file

I executed the following command:

java -jar ~/scr/local/bin/HMMRATAC_V1.2.1_exe.jar --peaks true --score max --model NA.model --bedgraph true --bgscore true -b jcf7180000043062.bam -i jcf7180000043062.bam.bai -g jcf7180000043062.genome

And obtained the following outputs for .gappedPeak and .bed respectively

$ head NA_peaks.gappedPeak
jcf7180000043062        16390   18220   Peak_1  .       .       16640   17100   255,0,0 3       1,460,1 0,250,1829  3.0      -1      -1
jcf7180000043062        17100   18910   Peak_2  .       .       18220   18620   255,0,0 3       1,400,1 0,1120,1809 3.0      -1      -1
jcf7180000043062        25530   26160   Peak_3  .       .       25540   25930   255,0,0 3       1,390,1 0,10,629    3.0      -1      -1
jcf7180000043062        39050   42720   Peak_4  .       .       42230   42700   255,0,0 3       1,470,1 0,3180,3669 1.0      -1      -1
jcf7180000043062        47000   48420   Peak_5  .       .       48000   48410   255,0,0 3       1,410,1 0,1000,1419 3.0      -1      -1
jcf7180000043062        50210   52570   Peak_6  .       .       50320   50900   255,0,0 3       1,580,1 0,110,2359  2.0      -1      -1
jcf7180000043062        54940   56510   Peak_7  .       .       55560   55990   255,0,0 3       1,430,1 0,620,1569  24.0     -1      -1
jcf7180000043062        56940   57780   Peak_8  .       .       56980   57420   255,0,0 3       1,440,1 0,40,839    3.0      -1      -1
jcf7180000043062        57930   59550   Peak_9  .       .       57950   58430   255,0,0 3       1,480,1 0,20,1619   4.0      -1      -1
jcf7180000043062        67200   68820   Peak_10 .       .       67390   67850   255,0,0 3       1,460,1 0,190,1619  3.0      -1      -1

$ head NA_summits.bed
jcf7180000043062        17015   17016           0.0
jcf7180000043062        18551   18552           0.0
jcf7180000043062        25603   25604           0.0
jcf7180000043062        42296   42297           0.0
jcf7180000043062        48091   48092           0.0
jcf7180000043062        50452   50453           0.0
jcf7180000043062        55595   55596           0.0
jcf7180000043062        57381   57382           0.0
jcf7180000043062        58329   58330           0.0
jcf7180000043062        67461   67462           0.0

Any help gratefully received!

Cheers,
Michael

Why is there a large chromosome area and there is no peak？

Hi，
When I use HMMRATAC for ATAC-seq data to call peak,I found there is no peak in the interval of about 10M on a chromosome.Not all chromosomes have this kind of situation。The following figure is the result of my viewing with IGV software.

Hope your replay，thanks .

Does HMMRATAC still work when BAM reads have been stripped of sequence and base qualities such as that produced by deepTools alignmentSieve?

Originally posted by @jeffhsu3 in #46 (comment)

Add CPM normalization to track signals

Add cpm normalization to signal tracks. Will help with consistency between samples (at similar settings) and will allow data sets to be run using master models, rather than needing one for every data set.

FR: Can we add an input control into this program?

Because I found out some ATAC-Seq data has some input control, like ChIP-seq. I was wondering can we add another parameter to let the control in this program to get some more valid peaks? Thanks

Q: On the Opdf values in the log file

Hi,

I've been working with HMMRATAC to identify summits for downstream analysis but I'm somewhat unsure of some of the values describing the model.

In the examples that I found scattered in the issue section and examples I noticed that the Opdf values from my runs are lower (<1) than the values that others are getting. I started off with higher -l and -u values for "a safe bet" and running currently with lower -u and -l values (10,5 and 20,10 respectively) and experiencing the same thing. The values get higher by the state so in this sense they are in line with the expected outcome of a good model but I just want to be sure before continuing downstream.

Is this normal and how would one interpret these values in question? I attached a log file from one of the peak calling runs below. Thank you in advance!

Contents of the log file with retracted filenames.
Version: 1.2.9
Arguments Used:
--bam < .bam file >
--index < .bai file >
--genome < genome size file >
--means 75,200,400,600
--minmapq 30
--upper 40
--lower 20
--output < outprefix >
--bedgraph True
--bgscore True
Fragment Expectation Maximum Done
Mean 75.0 StdDevs 20.0
Mean 182.55893183563134 StdDevs 42.994465842745974
Mean 365.96938542104164 StdDevs 52.02589220132546
Mean 566.0401496018148 StdDevs 73.1375099614439
ScalingFactor 80.065888
Training Regions found and Zscore regions for exclusion found
Training Fragment Pileup completed
Kmeans Model:
HMM with 3 state(s)

State 0
Pi: 0.3333333333333333
Aij: 0.333 0.333 0.333
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.06 0.103 0.082 0.055 ]

State 1
Pi: 0.3333333333333333
Aij: 0.333 0.333 0.333
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.18 0.297 0.319 0.251 ]

State 2
Pi: 0.3333333333333333
Aij: 0.333 0.333 0.333
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.464 0.723 0.75 0.571 ]

Model created and refined. See < outprefix >.model
Model:
HMM with 3 state(s)

State 0
Pi: 0.3333333333333333
Aij: 0.973 0.027 0
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.037 0.086 0.071 0.043 ]

State 1
Pi: 0.3333333333333333
Aij: 0.03 0.963 0.008
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.127 0.208 0.203 0.159 ]

State 2
Pi: 0.3333333333333333
Aij: 0 0.021 0.979
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.352 0.51 0.531 0.411 ]

Genome split and subtracted masked regions
0 round viterbi done
50 round viterbi done
100 round viterbi done
131 round viterbi done
Total time (seconds)= 33066

Q: normalization of read depth

Hi,
I'm new to this type of analysis. I compared the ENCODE pipeline and this one on bulk seq data of hES cells. Encode one gave peaks on NANOG and POU5F1 which are kind of expected but HMMRATAC didn't on both genes. I'm wondering if this is caused by the low read depth (66510354 reads) of the data. Or should i lower the stringency? Thanks!

Best,
Shanzheng

BR: java.lang.NumberFormatException

I am running HMMRATAC Version 1.2.10-0 from conda installed into singularity image with conda 3 base. Trying to call peaks on mm10 based data.

Here is the command that is being run:

singularity run hmmratac.sif java -Xms512m -Xmx10g -jar /opt/conda/share/hmmratac-1.2.10-0/HMMRATAC.jar -b input.bam -i input.bam.bai -g chrom2.info -o testing -e mm10-blacklist.v2.bed --window 1250000 --score all

Run proceeds to step where testing_peaks.gappedPeak and testing_summits.bed are generated. The following error is then reported and run ends with both gappedPeak and summits files empty:

Exception in thread "main" java.lang.NumberFormatException: For input string: "1.0_0.740740740740741_0.0_0.04420339549616825_1.4173579467268351"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.lang.Double.parseDouble(Double.java:538)
	at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:660)

Based on the error, I suspect that --score all is to blame

Run Log:

Fragment Expectation Maximum Done
Mean	50.0	StdDevs	20.0
Mean	187.2568319263678	StdDevs	52.04863123120787
Mean	376.3211422102428	StdDevs	48.231939080159705
Mean	599.1807715834468	StdDevs	96.32246076484671
ScalingFactor	1.087837
Training Regions found and Zscore regions for exclusion found
Training Fragment Pileup completed
Kmeans Model:
HMM with 3 state(s)

State 0
  Pi: 0.3333333333333333
  Aij: 0.333 0.333 0.333
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.016 0.017 0.153 0.133 ]

State 1
  Pi: 0.3333333333333333
  Aij: 0.333 0.333 0.333
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.455 0.839 0.284 0.152 ]

State 2
  Pi: 0.3333333333333333
  Aij: 0.333 0.333 0.333
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.736 1.409 1.274 0.948 ]

Model created and refined. See testing.model
Model:
HMM with 3 state(s)

State 0
  Pi: 0.3333333333333333
  Aij: 0.967 0.028 0.005
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0 0 0 0 ]

State 1
  Pi: 0.3333333333333333
  Aij: 0.05 0.941 0.009
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.443 0.81 0.196 0.027 ]

State 2
  Pi: 0.3333333333333333
  Aij: 0.008 0.007 0.984
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.342 0.626 0.822 0.692 ]

Genome split and subtracted masked regions
50 round viterbi done
100 round viterbi done
150 round viterbi done
200 round viterbi done
250 round viterbi done
300 round viterbi done
349 round viterbi done

Discrepancy from the same dataset

Dear,

I mapped my paired-end ATAC-seq fastq file to ensemble GRCm38 genome and UCSC mm10 genome respectively using bowtie2. The only difference between GRCm38 and mm10 is the chromosome order, which is 1,2,3,,4,5,6,7,8,9,10,11,12,13,14,15,1,6,1,7,18,19,20,X,Y,MT for GRCm38 and chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr1,chr2,chr3,
chr4,chr5,chr6,chr7,chr8,chr9,chrM,chrX,chrY for mm10. I have checked the two bam files, I am sure that the only difference is the chromosome order. After running HMMRATAC with the same set up, I find the results differs a lot. HMMRATAC used regions from chromosome 11 to train the model (GRCm38), while used regions from chromosome 9 to train the model (mm10). For GRCm38, 30624 peaks was called, while for mm10 it's 976422 ! Could you give me some advices ?
By the way, does the chromosome order in genome_info file would affect the results ? And, the default value of --score “Max” refers to the maximum number of reads mapping to the open region, is the "Max" the same as the maximum coverage of the open region ?

memory error using conda

Hi,

I tried running HMMRATAC. Unfortunately, it's not going great. Execution time is quite long and on top of that, the command seems to be throwing the memory error. I have secured 40GB of RAM for this process, and I'm running it on the 3.2GB alignment file. I reduced the window size but still was given this error. I am using 1.2.9 version.

How much memory do I have to provide and what is the expected run time of the algorithm?

$ HMMRATAC --bam ../../alignment/filtered/XXX.bam --index ../../alignment/filtered/XXX.bam.bai --genome /ref/hg38_sizes.genome --blacklist /ref/blacklisted/hg38.blacklist.bed --bedgraph true --output XXX --window 25000 |& tee ../../logs/hmmratac_XXX.log
0Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOf(Arrays.java:3181)
        at java.util.ArrayList.grow(ArrayList.java:261)
        at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
        at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
        at java.util.ArrayList.add(ArrayList.java:458)
        at WigMath.pileup.toBedGraph(pileup.java:214)
        at WigMath.pileup.build(pileup.java:205)
        at WigMath.pileup.<init>(pileup.java:70)
        at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:270)

-k option disagreement between help and userguide

found this disagreement between userguide and help message.
User guide:
-k , --kmeans Number of states in the model. Default is 4.
help message:
-k , --kmeans Number of States in the model. Default = 3.

Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!

Hi,

I'm trying to run HMMRATAC but had this error:

Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract! at java.base/java.util.TimSort.mergeLo(TimSort.java:777) at java.base/java.util.TimSort.mergeAt(TimSort.java:514) at java.base/java.util.TimSort.mergeCollapse(TimSort.java:441) at java.base/java.util.TimSort.sort(TimSort.java:245) at java.base/java.util.Arrays.sort(Arrays.java:1514) at java.base/java.util.ArrayList.sort(ArrayList.java:1585) at java.base/java.util.Collections.sort(Collections.java:176) at HMMR_ATAC.SubtractBed.subtract(SubtractBed.java:131) at HMMR_ATAC.SubtractBed.<init>(SubtractBed.java:42) at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:465)

After merging my replicates and sorting the merged file, I ran this command:
java -jar ~/HMMRATAC_V1.2.8_exe.jar -b /path/to/atac.forHMMRATAC.bam -i /path/to/atac.forHMMRATAC.bam.bai -g /path/to/mm9.size -e /path/to/blacklist.bed

My genome file is:

chr1 197195432
chr2 181748087
chr3 159599783
chr4 155630120
chr5 152537259
chr6 149517037
chr7 152524553
chr8 131738871
chr9 124076172
chr10 129993255
chr11 121843856
chr12 121257530
chr13 120284312
chr14 125194864
chr15 103494974
chr16 98319150
chr17 95272651
chr18 90772031
chr19 61342430
chrX 166650296
chrY 15902555
chrM 16299

Thank you for your help!

BR: Significant slow down in performance time compared to older versions

I've noticed that running the simple HMMRATAC example with files from the TestData directory on different versions of HMMRATAC (a presumably older one found at the link https://raw.githubusercontent.com/taoliu/HMMRATAC/master/store/HMMRATAC.jar) and the newer ones that I either obtained from the the most recent gradle artifacts (https://github.com/LiuLabUB/HMMRATAC/commit/f6a09324ea61f8b599d17e9412a569da474f382b/checks?check_suite_id=349972561) or the release page (https://github.com/LiuLabUB/HMMRATAC/releases) results in the presumably older version of HMMRATAC running much faster than the newer ones. The newer one seem to take over an hour to run while the presumably older one takes about 10-15 minutes. Is there a reason for this? I've noticed that this older version does not include the "*_training.bed" file on output, so perhaps this is a reason.

The example I am using is just running the jar with these various jar files.

java -jar HMMRATAC.jar -b TestData/65283_treat_rep1.chr22.bam -i TestData/65283_treat_rep1.chr22.bam.bai -g TestData/hg19.chr22.genome

Hard-coded duplicate removal

Duplicate removal prior to peak calling makes perfect sense when calling peaks on tissue-level ATAC-seq experiments. However, single-cell experiments can retain identical fragments derived from different cells. Thus, I am curious if future releases will be sensitive to this characteristic of single cell ATAC-seq data?

java.io.FileNotFoundException: (Too many open files) error

Hi,
Hope you are doing well. I am trying to run HMMRATAC and it starts/ runs initially without any issues. It creates the model and starts with viterbi algorithm. But soon after that, it stops and throws up what I think is a "too many open files" error. I was just wondering if this is a known issue for you and/ or if I need to add other options to my java usage. The actual usage command and error message are below.

I am running this on a cluster and hope that the open file limit is not a problem ( for me, ulimit -n 1024). If this is a potential issue then I would need to request more from the admin.

Please let me know, when you get a chance.

Thanks and Warm Regards,
Sudarshan.

Usage
java -jar HMMRATAC_V1.0_exe.jar
-b file.bam
-i file.bai
-g genomefile.txt
-w file.bigwig
--bedgraph true
--bgscore true
-o file_hmmratac

Actual error
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)

Caused by: java.io.FileNotFoundException:file.bigwig (Too many open files)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.(FileInputStream.java:138)
at org.broad.tribble.util.SeekableFileStream.(SeekableFileStream.java:40)
at org.broad.igv.bbfile.BBFileReader.(BBFileReader.java:127)
at HMMR_ATAC.GetSignal.find(GetSignal.java:72)
at HMMR_ATAC.GetSignal.(GetSignal.java:40)
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:417)
... 5 more

BR: Gradle command line test is failing

The gradle action "Run command line test" is failing on the feature-CICD branch. It's error is:

Run java -jar ../build/libs/HMMRATAC-snapshot-exec.jar -b 65283_treat_rep1.chr22.bam -i 65283_treat_rep1.chr22.bam.bai -g hg19.chr22.genome -o Test2 --model 65283_chr22.model
Error: Unable to access jarfile ../build/libs/HMMRATAC-snapshot-exec.jar
##[error]Process completed with exit code 1.

It's not clear where the artifact being generated is.

FR: Please include some sample data that will produce a known result

Hi there! I've been trying to validate this tool before installing, and I'm running into some trouble finding the correct kind of data to give it.

problem

I'm an admin trying to test this software on behalf of a user. I have some .bam files that other admins use as test data, but when I follow the quickstart steps, I get errors telling me that the .bam file header is incorrect or that the data needs to be paired read. I realize that these are probably trivial errors that someone with an omics background could solve easily, but it would be nice to be able to install and test this software on behalf of someone else.

solution

Including some test data with the source code along with (real) example commands and expected results would greatly help people like me who install and validate software on behalf of other users.

additional benefits

Having some test data that produces a known result seems like it would be a prerequisite for #36, and might produce a nice starting point.

Thanks for considering this feature request!

😄

Q: What should we do to output the HMMRATAC results in a format so that differential peak analysis can be conducted? What tool for differential peak analysis do you recommend?

Use case
A clear and concise description of what the use case is. For example: my data is from [...]

Describe the problem
A clear and concise description of what happend.

Describe the solution you tried
A clear and concise description of any solutions you've tried.

Additional context
Add any other context or screenshots about your use case here.

How to find the four chromatin states from output files?

Hi,
HMMRATAC identifies a set of training regions throughout the genome to train a 4-states Hidden Markov Model (HMM) with multivariate Gaussian emissions, representing the underlying four chromatin states of 1) the center of open chromatin, 2) the flanking regions, 3) the non-flanking regions and 4) the background.I used HMMRATAC by default parameters,but the output file( Name_peaks.gappedPeak ) just indicate score of standard peaks and standard peaks with the position of flanking nucleosome and open state , not include four chromatin states .And could HMMRATAC output the score of nucleosome to measure nucleosome occupancy？
Hope your replay，thanks .

What is the major difference between MACS and HMMRATAC?

Hi，
what is the difference betwwen MACS2 and HMMRATAC in peak calling when input ATAC-seq data,

and the difference in detecting chrom stat when using iNPS and HMMRATAC ?

How to define Open Chromatin Regions?

My purpose is to use the open chromatin regions to do MEME analysis to get the potential motif in the open chromatin regions, in this way, which column I should use in the Name_peaks.gappedPeak
? Is the peak start and peak end columns, or the Open state start and end columns? Thanks!

Q: Which output file is suggested to use for downstream Differential Accessibility of nucleosome free regions?

Should the gappedpeak be converted to a narrow peak to read in to Diffbind or use the summits.bed file instead. I noticed the ranges are slightly different between the two (nucleosome free in gappedpeak and range in summits). Does the summits file only contain nucleosome free regions?

Tn5 cut site shift

Hi,

Should the bam files that hmmratac runs on be adjusted for the Tn5 transposase occupancy prior to peak calling?

download HMMRATAC always error and slowly

Hi dear developers,
Thank you so much for developing.
I use centos and the java version is

$ java -version
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)```
I download it

$ nohup wget https://github.com/LiuLabUB/HMMRATAC/releases/download/1.2.9/HMMRATAC_V1.2.9_exe.jar &```
but it really strange that I tried HMMRATAC_V1.2.9_exe.jar/HMMRATAC_V1.2.8_exe.jar/HMMRATAC_V1.2.7_exe.jar it always error and very very slowly.Attachment is my nohup.out file.I am not good at java and I really don't know if it is just because my slow Internet speed.Could you please give me some advice about this stupid question?Thank you so much in advance.
nohup.log

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

One of my samples won't complete all the jobs within the time allotted while the other jobs do finish.

Here is the error:
[jimcdonald@login4 ATACseq.HMMR.STDERR.STDOUT.files]$ cat HMMR.v1.2.1.blacklist.A2780.ATACseq.library001.treat1.mock.3535969.err
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at WigMath.pileup.toBedGraph(pileup.java:192)
at WigMath.pileup.build(pileup.java:184)
at WigMath.pileup.(pileup.java:50)
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:223)
slurmstepd: error: *** JOB 3535969 ON node079 CANCELLED AT 2019-01-08T11:46:06 DUE TO TIME LIMIT ***

However, I generally only get the slurmstepd message without the java exception.

The SBATCH script I used to submit the samples is attached.
HMMRATAC_GC_Troubleshooting_SBATCH.txt

I think this could just be a problem with the size of the sequencing library for this sample. Its BAM file is 12 GB. I have had successful runs with 8.4 GB and 7.2 GB BAM files, but the majority of my other samples are around 3-4 GB.

Is this the issue or is it some form of user error? Any ideas on how to fix it? Thank you for all your help!

Add threshold option

Add a score option to report peaks/summits above certain threshold. Rather than have it a post-analysis process. Default to a value comparable to MACS2 -q 0.05. Will solve many users issues

Are the ATACseq data with fragment distribution of NFRs, 1Ns, and (slightly) 2Ns suitable for HMMRATAC

I am wondering whether my data belong to the proper data with multi-peak distribution of fragment length, mentioned at #28 since the peaks of >1N are not clear or even not there.

Do you think my data are suitable for HMMRATAC? The species is budding yeast with the genome size of 12M bp. We sequenced with 10M reads (2x150bp)/5K cells. The distribution shown here are from 3 repeats. That is 30 M reads.

Another unrelated question is what the sharp peak of 150bp could be.

Thank you.

FR: CI/CD through Travis

This issue ticket is to track our progress to build a CI/CD pipeline for HMMRATAC in TravisCI. We need to implement:

Auto-compilation and packaging triggered by each commit
Testing triggered by each commit: need testing codes and testing data
Integration test triggered by each PR
Auto-release

Current working branch will be "feature-CICD"

Replicate handling

In the paper (HMMRATAC: a Hidden Markov ModeleR for ATAC-seq) I see you just merge the fastq files for the replicates. Is this the recommended way of doing it? Or is there a more 'sophisticated' way?

Exception in thread "main" java.lang.RuntimeException: The dataset should not be empty

Dear LiuLab,
When I ran HMMRATAC, I met this error:

Exception in thread "main" java.lang.RuntimeException: The dataset should not be empty
at net.sf.javaml.clustering.KMeans.cluster(KMeans.java:120)
at RobustHMM.KMeansToHMM.build(KMeansToHMM.java:86)
at RobustHMM.KMeansToHMM.(KMeansToHMM.java:31)
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:344)

What's the problem might be?
Thanks

Can HMMRATAC be set to use multiple threads?

Hi,
Can HMMRATAC be set to run using multiple threads? I could not find any indication of this from the documentation.
Or could I just set it through java itself with something like the following?
java -XX:ParallelGCThreads=<num_of _threads> -jar HMMRATAC_V1.2.4_exe.jar

HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:636

I am getting the "HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:636" for all the samples.

Here is the command.

java -Xmx8g -jar HMMRATAC/HMMRATAC_V1.2.4_exe.jar -b sorted.bam -i sorted.bam.bai -g genome.info -e blacklist.bed -o out.file

It does create log and model file.

Here is the tail of log file
181 round viterbi done
182 round viterbi done
183 round viterbi done
184 round viterbi done
185 round viterbi done

Keyur

'--score all' option adds only a single value

Hi,

I am trying to run HMMRATAC v1.2.3 with the following options

java -jar HMMRATAC_V1.0_exe.jar
-b file.bam
-i file.bai
-g genomefile.txt
-w file.bigwig
--bedgraph true
--bgscore true
--score all
-o file_hmmratac

According to the guide the --score all option should produce all types of scores separated by an 'underscore'. But I get only the default score in the relevant output. Just wondering if I am misunderstanding the score option or if this is a bug. Please let me know when you get a chance.

Thank you!

illegal output format

I got this line in the gappedPeak output:
chr13 51779070 51780230 Peak_298118 . . 51779600 51780230 255,0,0 3 1,630,1 0,530,1159

Note that 630+530 = 1160. Blocks are overlapping.
Why are there these flanking 1-bp blocks anyway?

State3_regions.bed is 0 means no open region and my data quality is poor?

Hi I ran HMMRATAC in openjdk version "1.8.0_191" with attachment of log file.
As.log

I have known from #2 which you said

in most cases, E0 means the background state. E1 is the non-flanking nucleosome, E2 is the flanking nucleosome and E3 is the open region in the bedgraph file.

so I grep the 4 files

grep E0 As.bedgraph > As.State0_regions.bed
grep E1 As.bedgraph > As.State1_regions.bed
grep E2 As.bedgraph > As.State2_regions.bed
grep E3 As.bedgraph > As.State3_regions.bed

but the State3_regions.bed is 0.

-rw-rw-r--. 1 qiaowei qiaowei 157M Oct  6 21:34 As.State0_regions.bed
-rw-rw-r--. 1 qiaowei qiaowei 159M Oct  6 21:33 As.State1_regions.bed
-rw-rw-r--. 1 qiaowei qiaowei 321K Oct  6 21:32 As.State2_regions.bed
-rw-rw-r--. 1 qiaowei qiaowei    0 Oct  6 21:14 As.State3_regions.bed
-rw-rw-r--. 1 qiaowei qiaowei 1.7K Oct  6 18:16 As.log
-rw-rw-r--. 1 qiaowei qiaowei 140K Oct  6 18:16 As_peaks.gappedPeak
-rw-rw-r--. 1 qiaowei qiaowei 316M Oct  6 18:16 As.bedgraph
-rw-rw-r--. 1 qiaowei qiaowei  51K Oct  6 18:16 As_summits.bed
-rw-rw-r--. 1 qiaowei qiaowei 1.4K Oct  5 19:17 As.model
-rw-rw-r--. 1 qiaowei qiaowei  26K Oct  5 18:48 As_training.bed
-rw-rw-r--. 1 qiaowei qiaowei 2.4M Oct  5 13:07 As.sorted.bam.bai
-rw-rw-r--. 1 qiaowei qiaowei 5.4G Oct  5 12:35 As.sorted.bam

I just want to know if it means there is no open region and my data quality is so poor? Besides, how can I know my data quality and how to trim the data in order to obtain some open region data in HMMRATAC?
Another thing is I have not add -e blacklist.bed for some install problem which I haven't fixed now this time.Does the poor quality is caused by blacklist? And I also read #23 after I solved blacklist install problem I can have a test.
Thank you so much for any help.

java.lang.ClassNotFoundException: net.sf.samtools.SAMFormatException

Hi,

First, this is a very interesting paper, congratulation!

I tried to used it by writing the following command:
java -jar /Applications/bioinfo_tools/HMMRATAC-master/HMMRATAC_V1.0_exe.jar
-b BAM_FILE.bam -i BAM_FILE.bam.bai -g dm6.genome -w BIGWIG_FILE.bw --bedgraph true --bgscore true -o file_hmmratac

Note that the bam has been obtain using bowtie2+samtools, and has been sorted.

However, instantaneously, I obtain the following error message:

Exception in thread "main" java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NoClassDefFoundError: net/sf/samtools/SAMFormatException
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:165)
... 5 more
Caused by: java.lang.ClassNotFoundException: net.sf.samtools.SAMFormatException
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:466)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:566)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:499)
... 6 more

Do you have any idea of the problem here ?

Thank you very much,

Version of Java:
java version "10.0.1" 2018-04-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.1+10)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.1+10, mixed mode)

Version of samtools:
samtools 1.8

BR: HMMRATAC fails due to SAM validation error?

Describe the bug

HMMRATAC fails with this error message:

Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 292169, Read name A00519:269:H7FM2DRXX:1:1218:5303:31219, Zero-length read without FZ, CS or CQ tag
	at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
	at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:541)
	at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
	at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
	at net.sf.samtools.BAMFileReader$BAMQueryFilteringIterator.advance(BAMFileReader.java:750)
	at net.sf.samtools.BAMFileReader$BAMQueryFilteringIterator.next(BAMFileReader.java:720)
	at net.sf.samtools.BAMFileReader$BAMQueryFilteringIterator.next(BAMFileReader.java:673)
	at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
	at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
	at WigMath.pileup.makeBlock(pileup.java:228)
	at WigMath.pileup.build(pileup.java:205)
	at WigMath.pileup.<init>(pileup.java:70)
	at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:270)

Here is the actual reads that HMMRATAC complained about:

$ samtools view possorted_bam.bam | grep -F "A00519:269:H7FM2DRXX:1:1218:5303:31219"
A00519:269:H7FM2DRXX:1:1218:5303:31219	69	chr6	148759869	0	*	=	148759869	0	*	*	MC:Z:12S37M	AS:i:0	XS:i:0	CR:Z:GGTGAAGCACTGTCGG	CY:Z:FF:F,FFFFFFFFFFF	CB:Z:GGTGAAGCACTGTCGG-1	BC:Z:CTTTCGAC	QT:Z::FFFFFFF	TR:Z:TGTCTCTTATACACATCTCCGAGCCCACGAGACCTTTCGACATCTCGTAT	TQ:Z:FFFFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFF	GP:i:-1	MP:i:1211301816	MQ:i:22	RG:Z:atac_pbmc_500_v1:MissingLibrary:1:H7FM2DRXX:1
A00519:269:H7FM2DRXX:1:1218:5303:31219	137	chr6	148759869	22	12S37M	=	148759869	0	GGGGGGGGTTTTAAAAAAAAAAAAAACCAAAACAAACAAAAAAAAAAAC	FF,FF:,:,,F,,,,,:F:::F,,,,,F:,,:,,,:,F,:,:,,,F::,	NM:i:0	MD:Z:37	AS:i:37	XS:i:32	XA:Z:chr13,+32718338,17S32M,0;chr3,-57922436,5S32M12S,0;chrX,-147034285,5S32M12S,0;chr3,-184460874,2S35M12S,1;chr20,-52312303,9S30M10S,0;	CR:Z:GGTGAAGCACTGTCGG	CY:Z:FF:F,FFFFFFFFFFF	CB:Z:GGTGAAGCACTGTCGG-1	BC:Z:CTTTCGAC	QT:Z::FFFFFFF	GP:i:1211301816	MP:i:-1	MQ:i:0	RG:Z:atac_pbmc_500_v1:MissingLibrary:1:H7FM2DRXX:1

To Reproduce

Only using these absolutely mandatory parameters: --bam, --index, --genome, and --output. Nothing else.

The dataset is 10x's public datasets: 500 PBMC from a healthy donor.

System (please complete the following information):

Open JDK 8
HMMRATAC v1.2.9

Thanks for your help.

If my data quality is not good, do this software can still be used to identify open chromatin regions?

Quality control of ATAC-Seq data showed that one of samples showed bad qualities, since this method is based on machine learning (self-study process), I was wondering whether HMMRATAC can be used to detect open chromatin regions for this poor-quality data

genome.info is empty

When I use the command you provided to generate the genome.info I end up with an empty genome.info. These are the first couple lines of the bam file:

samtools view -H /vol/atacpipeline/dedup/GSM2837484-Bl71nemr.bam | head
@HD	VN:1.5	SO:coordinate
@SQ	SN:FLLO01000001.1	LN:10571537
@SQ	SN:FLLO01000002.1	LN:9833161
@SQ	SN:FLLO01000003.1	LN:8797521
@SQ	SN:FLLO01000004.1	LN:8740412
@SQ	SN:FLLO01000005.1	LN:8020686
@SQ	SN:FLLO01000006.1	LN:7953648
@SQ	SN:FLLO01000007.1	LN:7634011
@SQ	SN:FLLO01000008.1	LN:7394200
@SQ	SN:FLLO01000009.1	LN:6209474

And when I run this no output is generated:

samtools view -H /vol/atacpipeline/dedup/GSM2837484-Bl71nemr.bam| perl -ne 'if(/^@SQ.*?SN:(\w+)\s+LN:(\d+)/){print $1,"\t",$2,"\n"}'

perl -v
This is perl 5, version 26, subversion 2 (v5.26.2) built for x86_64-linux-thread-multi 
...

What am I doing wrong?

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException

Evan,

I have run HMMRATAC v1.2, on 7 cell lines. It appears to have worked on 5 of them, but 2 of them produced errors which prevented the production of any output. The errors are the following, and are consistent across all 4 individual samples I have for each cell line:

For the Hey cell line:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 14
at HMMR_ATAC.ArgParser.set(ArgParser.java:228)
at HMMR_ATAC.ArgParser.(ArgParser.java:44)
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:74)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 18
at HMMR_ATAC.ArgParser.set(ArgParser.java:228)
at HMMR_ATAC.ArgParser.(ArgParser.java:44)
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:74)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 18
at HMMR_ATAC.ArgParser.set(ArgParser.java:228)
at HMMR_ATAC.ArgParser.(ArgParser.java:44)
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:74)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 18
at HMMR_ATAC.ArgParser.set(ArgParser.java:228)
at HMMR_ATAC.ArgParser.(ArgParser.java:44)
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:74)

For the Kuramochi cell line:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 14
at HMMR_ATAC.ArgParser.set(ArgParser.java:212)
at HMMR_ATAC.ArgParser.(ArgParser.java:44)
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:74)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 18
at HMMR_ATAC.ArgParser.set(ArgParser.java:212)
at HMMR_ATAC.ArgParser.(ArgParser.java:44)
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:74)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 18
at HMMR_ATAC.ArgParser.set(ArgParser.java:212)
at HMMR_ATAC.ArgParser.(ArgParser.java:44)
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:74)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 18
at HMMR_ATAC.ArgParser.set(ArgParser.java:212)
at HMMR_ATAC.ArgParser.(ArgParser.java:44)
at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:74)

The only difference between the two is: at HMMR_ATAC.ArgParser.set(ArgParser.java:228) vs at HMMR_ATAC.ArgParser.set(ArgParser.java:212).

I tried to track down the issue by digging around in the source code zip file for the release. If the (ArgParser.java:XXX) numbers are line numbers, then I think the subroutine is passed an incorrect set of files as line 44 of ArgParser.java deals with receiving the strings passed by the main program.

What confuses me is that the script I am using to submit the jobs to the SLURM job manager has the exact same inputs for all the cell lines. All I do is change the name of the directory in each script but the commands are all the same (see below).

An example of this code follows:
#!/bin/bash

#SBATCH -o HMMR.v1.2.TykNu.ATACseq.library001.treat1.mock.%j.out # File for STD OUT
#SBATCH -e HMMR.v1.2.TykNu.ATACseq.library001.treat1.mock.%j.err # File for STD ERR
#SBATCH -p defq # Node queue to submit to
#SBATCH -N 1 # Number of nodes to reserve
#SBATCH -D /lustre/groups/chiappinellilab/processed_data/ATACseq.bowtie2 # Path to working directory
#SBATCH -J HMMR.v1.2.TykNu.ATACseq.library001.treat1.mock # Job TykNu.ATACseq.library001.treat1.mock
#SBATCH --export=NONE # Don't export my environmental variables to this job
#SBATCH -t 2-00:00:00 # Time to reserve
#SBATCH --nice=100 # Niceness value

ml jdk/1.7.0
d=TykNu.ATACseq.library001.treat1.mock
java -jar /groups/chiappinellilab/software/executables/HMMRATAC_V1.2_exe.jar -m 75,200,400,600 --bedgraph true -o ${d}/${d}.HMMRv1.2.u-10.l-02 -b ${d}/.sorted.bam -i ${d}/.sorted.bam.bai -g /groups/chiappinellilab/genomes/hg38/ucsc.analysisSet/hg38.analysisSet.chroms.sizes -w ${d}/.bw
java -jar /groups/chiappinellilab/software/executables/HMMRATAC_V1.2_exe.jar -m 75,200,400,600 --bedgraph true -u 20 -l 10 -o ${d}/${d}.HMMRv1.2.u-20.l-10 -b ${d}/.sorted.bam -i ${d}/.sorted.bam.bai -g /groups/chiappinellilab/genomes/hg38/ucsc.analysisSet/hg38.analysisSet.chroms.sizes -w ${d}/.bw
java -jar /groups/chiappinellilab/software/executables/HMMRATAC_V1.2_exe.jar -m 75,200,400,600 --bedgraph true -u 40 -l 20 -o ${d}/${d}.HMMRv1.2.u-40.l-20 -b ${d}/.sorted.bam -i ${d}/.sorted.bam.bai -g /groups/chiappinellilab/genomes/hg38/ucsc.analysisSet/hg38.analysisSet.chroms.sizes -w ${d}/.bw
java -jar /groups/chiappinellilab/software/executables/HMMRATAC_V1.2_exe.jar -m 75,200,400,600 --bedgraph true -u 30 -l 10 -o ${d}/${d}.HMMRv1.2.u-30.l-10 -b ${d}/.sorted.bam -i ${d}/.sorted.bam.bai -g /groups/chiappinellilab/genomes/hg38/ucsc.analysisSet/hg38.analysisSet.chroms.sizes -w ${d}/.bw

Do you have any ideas why the script would be inconsistently giving this error?

In the meantime, I'll move forward with the cell lines I do have data for. Thank you for fixing the "too many files" exception!

How is it supposed to be executed?

Section 2 of the readme describes the options but does not tell the user how to execute the program.
I am not a java programmer. I tried java -jar HMMRATAC_V1.0_src.jar with the result:
no main manifest attribute, in HMMRATAC_V1.0_src.jar

My java: openjdk version "1.8.0_162"

Training bed not outputted when blacklist list included

I found that when blacklist was included with -e blacklist, there was no Name_training.bed in the output, while when blacklist was not included, Name_traning.bed was generated in the output. Are there any speical purposes for this ?

Originally posted by @wangjiawen2013 in #22 (comment)

liulabub / hmmratac Goto Github PK

hmmratac's Introduction

HMMRATAC

hmmratac's People

Contributors

Stargazers

Watchers

Forkers

hmmratac's Issues

problem

solution

additional benefits

Recommend Projects

Recommend Topics

Recommend Org