Giter Club home page Giter Club logo

hmmratac's Introduction

HMMRATAC

Quick Start

Assume that you have a BAM file from aligner such as bwa mem named ATACseq.bam.

  1. Sort the BAM file to get a ATACseq.sorted.bam file:

    samtools sort ATACseq.bam -o ATACseq.sorted.bam

  2. Make index from the BAM file to get a ATACseq.sorted.bam.bai file:

    samtools index ATACseq.sorted.bam ATACseq.sorted.bam.bai

  3. Make genome information (chromosome sizes) from the BAM file to get a genome.info file:

    samtools view -H ATACseq.sorted.bam| perl -ne 'if(/^@SQ.*?SN:(\w+)\s+LN:(\d+)/){print $1,"\t",$2,"\n"}' > genome.info

  4. Run HMMRATAC on the sorted BAM ATACseq.sorted.bam, the BAM index file ATACseq.sorted.bam.bai, and the genome information file genome.info:

    java -jar HMMRATAC_V1.2.4_exe.jar -b ATACseq.sorted.bam -i ATACseq.sorted.bam.bai -g genome.info

  5. Filter HMMRATAC output by the score, if desired. Score threshold will depend on dataset, score type and user preference. A threshold of 10 would be:

    awk -v OFS="\t" '$13>=10 {print}' NAME_peaks.gappedPeak > NAME.filteredPeaks.gappedPeak

    To filter the summit file by the same threshold:

    awk -v OFS="\t" '$5>=10 {print}' NAME_summits.bed > NAME.filteredSummits.bed

    NOTE: HMMRATAC will report all peaks that match the structure defined by the model, including weak peaks. Filtering by score can be used to retain stronger peaks. Lower score = higher sensitivity and lower precision, Higher score = lower sensitivity and higher precision.

Samtools can be downloaded here: http://www.htslib.org/download/

Be sure to run HMMRATAC using the executable file, found here: https://github.com/LiuLabUB/HMMRATAC/releases For details on HOW to run HMMRATAC, see HMMRATAC_Guide.md, which contains a thorough runthrough of all parameters, output files and input requirements and troubleshooting.

HMMRATAC requires paired-end data. Single-end data will not work. HMMRATAC is designed to process ATAC-seq data that hasn't undergone any size selection, either physical or in silico. This should be standard practice for any ATAC-seq analysis. Size selected data can be processed by HMMRATAC (see HMMRATAC_Guide.md on --trim option).

If you use HMMRATAC in your research, please cite the following paper:

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.