spundhir / pare


PARE: a computational method to Predict Active Regulatory Elements

Home Page: http://spundhir.github.io/PARE

License: GNU General Public License v3.0

Makefile 0.07% Shell 59.37% Perl 31.24% R 4.26% C 5.06%

chip-seq nucleosome ndr histone enhancer promoter

pare's Introduction

README file for PARE (v0.08)

Changes

Important bug fix in the FDR computation of the previous version (v0.07).

The user can now control the stringency level at which to search for NFRs using the -l parameter.

Read coverage is now normalized (sequencing depth normalization) after removing PCR duplicates.

Since v0.06, PARE supports one or more BAM files (replicates) as input. Previous versions of PARE (0.01 - 0.05) supported only two BAM files (replicates) as input. In practical terms, the -j parameter is deprecated and the -p parameter now requires an argument.

Description

PARE is a computational method to Predict Active Regulatory Elements (enhancers and promoters). It implements a novel approach to detect the Peak-Valley-Peak (PVP) pattern, defined on the H3K4me1 signal at enhancers and on the H3K4me3 signal at promoters.

Genomic regions enriched for H3K4me1 PVP pattern are predicted as active enhancers.

Genomic regions enriched for H3K4me3 PVP pattern are predicted as active promoters.

Version

0.08

Citation

Please cite:

Pundhir S, Bagger FO, Lauridsen FB, Rapin N, Porse BT. (2016) Peak-valley-peak pattern of histone modifications delineates active regulatory elements and their directionality. Nucleic Acids Res. [PMID: 27095194].

Programs and datasets

Most users will be interested in the following two scripts:

pare: it is the main script to detect enhancers or promoters based on input BAM files (H3K4me1 - enhancers; H3K4me3 - promoters).

bed2direction: this script is used to detect directionality of stable transcription at promoter regions, provided as input in BED format.

An example dataset and expected results are available at:

http://servers.binf.ku.dk/pare/download/test_run/

Installation

To install PARESuite, download PARESuite.tar.gz and unpack it. A directory named PARESuite will be created.

tar -zxvf PARESuite.tar.gz

Now compile and create the blockbuster executable

make        # or: make all

Export environment variable 'PAREPATH' containing path to PARESuite installation directory

export PAREPATH=<path to PARESuite installation directory>

Add 'PAREPATH' to your 'PATH' environment variable

export PATH=$PATH:$PAREPATH/bin

Add 'PAREPATH' to your 'PERL5LIB' environment variable

export PERL5LIB=$PERL5LIB:$PAREPATH/share/perl/

To set the environment variable(s) permanently, add the three export commands above to your ~/.bashrc file.
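The three export steps above can be checked with a small shell function before running PARE. This is only a sketch: `check_pare_env` is a hypothetical helper that is not part of PARESuite, and it assumes the directory layout implied by the export commands (`$PAREPATH/bin` and `$PAREPATH/share/perl/`).

```shell
# Hypothetical helper (not shipped with PARESuite): confirm that PAREPATH,
# PATH and PERL5LIB are set the way the installation steps describe.
check_pare_env() {
    if [ -z "${PAREPATH:-}" ]; then
        echo "PAREPATH is not set"
        return 1
    fi
    # Wrap the lists in ':' so that every entry can be matched as ':entry:'.
    case ":${PATH}:" in
        *":${PAREPATH}/bin:"*) ;;
        *) echo "PATH does not contain ${PAREPATH}/bin"; return 1 ;;
    esac
    case ":${PERL5LIB:-}:" in
        *":${PAREPATH}/share/perl/:"* | *":${PAREPATH}/share/perl:"*) ;;
        *) echo "PERL5LIB does not contain ${PAREPATH}/share/perl/"; return 1 ;;
    esac
    echo "PARE environment looks consistent"
}
```

Calling `check_pare_env` in a fresh shell after editing ~/.bashrc shows whether the settings were picked up.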

Dependency

We assume that the following programming platforms are installed and working: perl, R, and gcc. In addition, the following packages should be installed.

Install the needed perl modules

sudo cpan Tie::IxHash Statistics::Basic
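Whether the Perl modules are actually visible to the perl on your PATH can be checked with a short sketch like the one below. `missing_perl_modules` is a hypothetical helper name; it relies only on perl's standard `-M` switch and prints every module that fails to load.

```shell
# Hypothetical helper: print each Perl module that the current perl cannot load.
missing_perl_modules() {
    for module in "$@"; do
        # 'perl -MModule -e 1' exits non-zero if the module is not found in @INC.
        if ! perl -M"$module" -e 1 >/dev/null 2>&1; then
            echo "$module"
        fi
    done
}

# The two modules named in the cpan command above; no output means both load.
missing_perl_modules Tie::IxHash Statistics::Basic
```

A module printed here corresponds to a "Can't locate ... in @INC" error at run time, which can also happen when a different perl installation comes first in PATH.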

R packages are installed by starting R (type R on the command line) and entering the following three commands (follow the instructions on the screen):

install.packages(c("ggplot2", "gridExtra", "optparse", "randomForest", "e1071"))

source("http://bioconductor.org/biocLite.R")

biocLite(c("DESeq"))

download samtools from http://sourceforge.net/projects/samtools/files/samtools/1.2/samtools-1.2.tar.bz2/download, go to the download location and do

tar xjf samtools-1.2.tar.bz2

cd samtools-1.2

make -j10 prefix=$HOME install

download bedtools from https://github.com/arq5x/bedtools2/releases/download/v2.23.0/bedtools-2.23.0.tar.gz, go to the download location and do

tar xzf bedtools-2.23.0.tar.gz

cd bedtools-2.23.0/

make -j 10

cp bin/* $HOME/bin

download featureCounts (subread) from http://sourceforge.net/projects/subread/files/subread-1.4.6-p4/, go to the download location and do

tar xzf subread-1.4.6-p4-Linux-x86_64.tar.gz

cd subread-1.4.6-p4-Linux-x86_64

cp bin/featureCounts $HOME/bin

download bedGraphToBigWig from http://hgdownload.soe.ucsc.edu/admin/exe/ for your operating system, go to the download location and do

cp bedGraphToBigWig $HOME/bin

chmod 755 $HOME/bin/bedGraphToBigWig

download macs2 version 2.1.0 from https://github.com/taoliu/MACS/, go to the download location and install as mentioned in INSTALL.rst file
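Once the dependencies are installed, a quick way to confirm that they are reachable is to look each one up on PATH. The sketch below uses a hypothetical helper, `missing_tools`; the tool list is taken from this section, and `parallel` is added as an assumption because an error message reported in the issues shows bin/nfrAnaAll calling it.

```shell
# Hypothetical helper: print every command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools named in the Dependency section, plus 'parallel' (assumed requirement,
# see lead-in). No output means everything was found.
missing_tools perl R gcc samtools bedtools featureCounts bedGraphToBigWig macs2 parallel
```

This only checks that the executables exist; it does not verify the versions listed above.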

Usage

PARESuite is called with the following parameters

pare -i <BAM file(s)> [OPTIONS]

Example

A usage example of PARESuite is shown below. As input, the method requires mapped reads in BAM format. An example dataset and expected results are available at http://servers.binf.ku.dk/pare/download/test_run/

pare -i data/h3k4me1_helas3_Rep1.bam,data/h3k4me1_helas3_Rep2.bam -o results -m hg19 -p 10 &> pare.log
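In the example above the replicate BAM files are passed to -i as a single comma-separated argument. With more than a couple of replicates it can be convenient to build that argument from a list of files. The following is a minimal sketch; `join_bams` is a hypothetical helper and the final line only prints the resulting command instead of running it.

```shell
# Hypothetical helper: join all arguments with commas.
# The body runs in a subshell so that changing IFS does not leak to the caller.
join_bams() (
    IFS=,
    printf '%s\n' "$*"
)

# Print (not run) the command from the example above, built from a file list.
echo "pare -i $(join_bams data/h3k4me1_helas3_Rep1.bam data/h3k4me1_helas3_Rep2.bam) -o results -m hg19 -p 10"
```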

Input

As input, the method requires one or more BAM files, one for each replicate of an H3K4me1 (enhancer prediction) or H3K4me3 (promoter prediction) ChIP-seq experiment. The name of the input file(s) should be formatted as

Input file name (replicate 1): <unique id><Rep1>.bam (example: h3k4me1_Rep1.bam)

Input file name (replicate 2): <unique id><Rep2>.bam (example: h3k4me1_Rep2.bam)

...

Input file name (replicate N): <unique id><RepN>.bam (example: h3k4me1_RepN.bam)
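The naming convention above can be checked before a run with a small pattern test. This sketch assumes that a valid name is anything ending in `Rep<number>.bam`, as in the examples; how strictly PARE itself parses file names is not stated here, and `is_replicate_name` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the file name ends in Rep<number>.bam.
is_replicate_name() {
    base=${1##*/}    # strip any leading directory components
    printf '%s\n' "$base" | grep -Eq '^.+Rep[0-9]+\.bam$'
}

for f in h3k4me1_Rep1.bam ENCFF001KOY.bam; do
    if is_replicate_name "$f"; then
        echo "$f: follows the Rep<N> convention"
    else
        echo "$f: does not follow the Rep<N> convention"
    fi
done
```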

The chromosome identifiers in the input BAM files should start with chr (for example, chrY rather than Y).
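The chromosome naming requirement can be verified from the BAM header. The sketch below reads SAM header text on standard input and prints every reference name in an @SQ line that does not start with chr. `non_chr_references` is a hypothetical helper; it is meant to be fed by `samtools view -H`, which prints only the header.

```shell
# Hypothetical helper: list reference names (SN: tags of @SQ header lines)
# that do not start with "chr". Reads a SAM header from standard input.
non_chr_references() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/ && $i !~ /^SN:chr/) print substr($i, 4)
    }'
}

# Intended use (requires samtools):
#   samtools view -H h3k4me1_Rep1.bam | non_chr_references
# Small self-contained demonstration with a hand-written header:
printf '@SQ\tSN:chr1\tLN:100\n@SQ\tSN:Y\tLN:50\n' | non_chr_references
```

Empty output from the samtools pipeline means all reference names already carry the chr prefix.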

Output

The results from PARESuite are presented in two text files:

RESULTS.TXT: main result file in BED format

For easy access, the html version of this file (RESULTS.HTML) is also available within the output directory

RESULTS.UCSC: file to view the enhancer and promoter regions in UCSC browser
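Because RESULTS.TXT is described as a BED file, its first column should hold the chromosome name, so a first sanity check is to count predicted regions per chromosome. This is a sketch under that assumption only; the remaining columns are not described here, and `count_by_chrom` is a hypothetical helper that also skips comment, track and browser lines in case any are present.

```shell
# Hypothetical helper: count BED records per chromosome (column 1).
# Reads BED text from standard input and prints "chromosome<TAB>count".
count_by_chrom() {
    awk -F'\t' '
        /^(track|browser)/ || substr($0, 1, 1) == "#" { next }
        NF >= 3 { n[$1]++ }
        END { for (c in n) printf "%s\t%d\n", c, n[c] }
    ' | LC_ALL=C sort
}

# Intended use, assuming the output directory from the example is "results":
#   count_by_chrom < results/RESULTS.TXT
```

A file that yields a single record or none at all suggests that an earlier step of the pipeline failed, which is worth checking in the log.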

More info

For more and up-to-date information, please refer to http://spundhir.github.io/PARE/ or http://servers.binf.ku.dk/pare/

License

PARE: a computational method to Predict Active Regulatory Elements using histone marks

Copyright (C) 2015 Sachin Pundhir ([email protected])

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.


pare's Issues

Defining which perl to use

Several of the scripts in PARE have a #!/usr/bin/perl -w at the top which can lead to a problem with defining which perl version/installation to use if multiple versions are installed.

Changing it to #!/usr/bin/env perl will allow the user to define which perl installation they would like to use based on whatever comes first in PATH.

Happy to set up a pull request.
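The proposed shebang change could be applied locally with a sketch like the following. `fix_perl_shebang` is a hypothetical helper, not something in the repository. Note one caveat: the `-w` switch from the original line is dropped, since on Linux everything after the interpreter path is passed as one argument and `env` would not find a program called "perl -w". Scripts that rely on `-w` would need `use warnings;` instead.

```shell
# Hypothetical helper: rewrite a '#!/usr/bin/perl ...' first line to
# '#!/usr/bin/env perl'. Only line 1 is touched; other lines are unchanged.
fix_perl_shebang() {
    file=$1
    tmp=$(mktemp) || return 1
    if sed '1s|^#!/usr/bin/perl.*$|#!/usr/bin/env perl|' "$file" > "$tmp"; then
        cat "$tmp" > "$file"    # overwrite in place, keeping the executable bit
    fi
    rm -f "$tmp"
}

# Intended use from the PARE installation directory (review before running):
#   for f in bin/*.pl; do fix_perl_shebang "$f"; done
```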

error /risapps/rhel6/python/2.7.6/anaconda/lib/libreadline.so.6: undefined symbol: PC

Hi,
I got an error when running it on our HPC cluster. What's wrong?

Thanks

/risapps/rhel6/R/3.1.0/lib64/R/bin/exec/R: symbol lookup error: /risapps/rhel6/python/2.7.6/anaconda/lib/libreadline.so.6: undefined symbol: PC
/scratch/genomic_med/mtang1/softwares/spundhir-PARE-63c74f7/bin/checkPrerequisite: line 35: [: ==: unary operator expected
/risapps/rhel6/R/3.1.0/lib64/R/bin/exec/R: symbol lookup error: /risapps/rhel6/python/2.7.6/anaconda/lib/libreadline.so.6: undefined symbol: PC
/scratch/genomic_med/mtang1/softwares/spundhir-PARE-63c74f7/bin/checkPrerequisite: line 35: [: ==: unary operator expected
/risapps/rhel6/R/3.1.0/lib64/R/bin/exec/R: symbol lookup error: /risapps/rhel6/python/2.7.6/anaconda/lib/libreadline.so.6: undefined symbol: PC
/scratch/genomic_med/mtang1/softwares/spundhir-PARE-63c74f7/bin/checkPrerequisite: line 35: [: ==: unary operator expected
/risapps/rhel6/R/3.1.0/lib64/R/bin/exec/R: symbol lookup error: /risapps/rhel6/python/2.7.6/anaconda/lib/libreadline.so.6: undefined symbol: PC
/scratch/genomic_med/mtang1/softwares/spundhir-PARE-63c74f7/bin/checkPrerequisite: line 35: [: ==: unary operator expected
/risapps/rhel6/R/3.1.0/lib64/R/bin/exec/R: symbol lookup error: /risapps/rhel6/python/2.7.6/anaconda/lib/libreadline.so.6: undefined symbol: PC
/scratch/genomic_med/mtang1/softwares/spundhir-PARE-63c74f7/bin/checkPrerequisite: line 35: [: ==: unary operator expected
/risapps/rhel6/R/3.1.0/lib64/R/bin/exec/R: symbol lookup error: /risapps/rhel6/python/2.7.6/anaconda/lib/libreadline.so.6: undefined symbol: PC
/scratch/genomic_med/mtang1/softwares/spundhir-PARE-63c74f7/bin/checkPrerequisite: line 35: [: ==: unary operator expected
Can't locate Statistics/Basic.pm in @INC (@INC contains: /opt/moab/lib/perl5 /opt/moab/lib/perl5 /scratch/genomic_med/mtang1/softwares/spundhir-PARE-63c74f7/sha
BEGIN failed--compilation aborted at /scratch/genomic_med/mtang1/softwares/spundhir-PARE-63c74f7/bin/findNFRAll.pl line 24.
Can't locate Statistics/Basic.pm in @INC (@INC contains: /opt/moab/lib/perl5 /opt/moab/lib/perl5 /scratch/genomic_med/mtang1/softwares/spundhir-PARE-63c74f7/sha
BEGIN failed--compilation aborted at /scratch/genomic_med/mtang1/softwares/spundhir-PARE-63c74f7/bin/commonNFR.pl line 24.
gzip: mybam-PARE-NFR/analysis/my.bam.All.nfr.gz: No such file or directory
mybam-PARE-NFR/analysis/my.bam.All.nfr: No such file or directory
cat: mybam-PARE-NFR/analysis/rep0/my.bam.tmp*: No such file or directory
cat: mybam-PARE-NFR/analysis/rep0/my.bam.tmp*: No such file or directory
my.sorted.bedGraph is not case-sensitive sorted at line 29707282.  Please use "sort -k1,1 -k2,2n" with LC_COLLATE=C,  or bedSort and try again
my_pare.e536160 (END)
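The last message in the log above asks for the bedGraph file to be sorted with `sort -k1,1 -k2,2n` under a C collation. A minimal sketch of that sort is shown below. `sort_bedgraph` is a hypothetical helper; it sets `LC_ALL=C` rather than `LC_COLLATE=C` because `LC_ALL`, when set in the environment, overrides `LC_COLLATE`. Whether re-sorting alone resolves the run is not certain, since the earlier errors in the log (R and the missing Perl module) may have produced an incomplete file.

```shell
# Hypothetical helper: sort bedGraph text from standard input by chromosome
# (byte order) and then by numeric start position.
sort_bedgraph() {
    LC_ALL=C sort -k1,1 -k2,2n
}

# Intended use:
#   sort_bedgraph < my.sorted.bedGraph > my.resorted.bedGraph
```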

PARE: can enhancers and promoters be predicted from other histone marks?

Hello. I have data for many histone modifications. Can I predict regulatory elements in a more comprehensive way, for example by adding H3K56ac and H3K9ac?

Is PARE useful for H3K27ac data?

Hi,
I want to find valleys in the H3K27ac signal and use them as enhancer regions. Is this software suitable for that?

NFR analysis for randomly distributed nfr regions failed using 100,000 regions

Hi there,

I ran into an error when running PARE: RESULTS.TXT only has one line.
It is also taking really long to run, about 6 hours for a 1.5G BAM.
Is there any way to speed it up?
EDIT: I saw a -p flag, but is there a way to specify how many CPUs the program will use?
I am running PARE on a cluster.

Thanks for your help.

Ming

....
[main_samview] region "chrX:153237367-10020489" specifies an unknown reference name. Continue anyway.
Error: Only a single file was specified. Nothing to combine, exiting.
gzip: PARE-NFR/analysis/my.bam.All.nfr.gz: No such file or directory
PARE-NFR/analysis/myy.sorted.bam.All.nfr: No such file or directory
my.sorted.bedGraph is not case-sensitive sorted at line 29707282.  Please use "sort -k1,1 -k2,2n
(END)

Check, if all required parameters and files are provided (Wed Feb 17 19:24:06 CST 2016).. done
Determine number of input bam files (Wed Feb 17 19:24:06 CST 2016).. done
Create directory structure (Wed Feb 17 19:24:06 CST 2016).. done
Populating files based on input genome, hg19 (Wed Feb 17 19:24:06 CST 2016).. done
Determine number of bases by which to extend the 3' end of reads (Wed Feb 17 19:24:06 CST 2016).. done
Optimize the threshold for max length and min number of reads in a block group (Wed Feb 17 19:30:19 CST 2016).. do
Create index of input BAM files (Wed Feb 17 20:02:19 CST 2016).. done
Compute size factor for each replicate (Wed Feb 17 20:02:19 CST 2016).. done
Retrieve size factors to normalize the expression of reads... done
Convert input bam file into bed (Wed Feb 17 20:02:19 CST 2016).. convert input bam file into bed
done
Check, if BED files are created properly.. (Wed Feb 17 20:28:16 CST 2016)... done
Predict nucleosome free regions (NFR) for each replicate (Wed Feb 17 20:29:23 CST 2016).. done
Determine common NFR between replicates (Wed Feb 17 23:40:07 CST 2016).. done
Check if size factor files already exist (Wed Feb 17 23:40:07 CST 2016).. done
Create file containing genomic coordinates within which to randomly shuffle the NFRs (Wed Feb 17 23:40:07 CST 2016
NFR analysis for randomly distributed nfr regions (Wed Feb 17 23:40:18 CST 2016).. (failed using 100,000 regions, 
Convert input bam to bigWig format to visualize in UCSC browser (Wed Feb 17 23:40:18 CST 2016).. All done. Bye
(END)

parallel: command not found

Hi there,
I ran into a problem when using PARE. It works with the test data that you provide, but there is an error when I use my own data:

" Determine common NFR between replicates ~/software/spundhir-PARE-0e89a50/bin/nfrAnaAll: line 402: parallel: command not found"
~/software/spundhir-PARE-0e89a50/bin/nfrAnaAll: line 387: parallel: command not found

This is my command: pare -i ENCFF001KOY.bam,ENCFF001KPA.bam -o results -m mm9 -p 10 &> pare.log
I have also tried renaming my data files: pare -i h3k4me1_Rep1.bam,h3k4me1_Rep2.bam -o results -m mm9 -p 10 &> pare.log
Thanks for your help.

Gang