DELTA

a Distal Enhancer Locating Tool based on AdaBoost and shape features of chromatin modifications

Introduction

Accurate identification of DNA regulatory elements has become an urgent need in the post-genomic era. Recent genome-wide chromatin-state mapping efforts revealed that DNA elements are associated with characteristic chromatin modification signatures, and several approaches have been developed on this basis to predict transcriptional enhancers. However, their practical application is limited by incomplete extraction of chromatin features and by model inconsistency when predicting enhancers across different cell types. To address these issues, we define a set of non-redundant shape features of histone modifications, which show high consistency across cell types and can greatly reduce the feature dimension. By integrating these shape features with the machine-learning algorithm AdaBoost, we developed an enhancer prediction method, DELTA (Distal Enhancer Locating Tool based on AdaBoost). We show that DELTA significantly outperforms current enhancer prediction methods in prediction accuracy on different datasets and can predict enhancers in one cell type using models trained in other cell types without loss of accuracy. Overall, our study presents a novel framework for accurately identifying enhancers from epigenetic data across multiple cell types.
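DELTA's actual training is performed by the R script it generates (adaboost.R, see Output files), and its shape features are not specified in this README. Purely to illustrate what boosting does over a fixed number of rounds (the number of rounds corresponds to --iteration_number), here is a minimal sketch of discrete AdaBoost with one-feature threshold stumps. The function names, the stump representation and the toy feature vectors are hypothetical and are not taken from DELTA.

```python
import math
from typing import List, Tuple

# Hypothetical decision stump: (feature index, threshold, polarity).
# Predicts +1 when polarity * (x[feature] - threshold) > 0, otherwise -1.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: List[float]) -> int:
    feature, threshold, polarity = stump
    return 1 if polarity * (x[feature] - threshold) > 0 else -1


def fit_adaboost(X: List[List[float]], y: List[int],
                 n_rounds: int = 100) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with stumps; labels in y must be +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n          # start with uniform sample weights
    model: List[Tuple[float, Stump]] = []
    for _ in range(n_rounds):
        # Pick the stump with the lowest weighted training error.
        best, best_err = None, float("inf")
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                for p in (1, -1):
                    err = sum(w for w, xi, yi in zip(weights, X, y)
                              if stump_predict((f, t, p), xi) != yi)
                    if err < best_err:
                        best, best_err = (f, t, p), err
        if best_err >= 0.5:          # no better than chance: stop early
            break
        # Clamp the error so a perfect stump does not get an infinite vote.
        err = max(best_err, 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, best))
        if best_err == 0:            # training data already separated
            break
        # Up-weight misclassified samples, down-weight correct ones.
        weights = [w * math.exp(-alpha * yi * stump_predict(best, xi))
                   for w, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model: List[Tuple[float, Stump]], x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score > 0 else -1


if __name__ == "__main__":
    # Toy data: label +1 ("enhancer") when the first feature is high.
    X = [[0.1, 5.0], [0.2, 3.0], [0.8, 4.0], [0.9, 1.0]]
    y = [-1, -1, 1, 1]
    model = fit_adaboost(X, y, n_rounds=100)
    print(predict(model, [0.85, 2.0]))
```

Each round re-weights the samples so that the next stump concentrates on the examples the current ensemble gets wrong; the final prediction is a weighted vote, which is why the number of iterations is a tunable parameter.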

Install

Please check the file 'INSTALL' in the distribution.

Usage

Usage: delta.py [-c chip_files] [-P promoter_loci] [-E enhancer_loci] [options]

Example: delta.py -c H3K4me1.bed,H3K4me3.bed,H3K27ac.bed -E p300.bed -P tss.bed -g hg19

--version
									Show program's version number and exit
-h, --help
									Show this help message and exit
-c CHIP_BEDS, --chip_bed=CHIP_BEDS
									ChIP-seq bed file of histone modifications
-E ENHANCER, --enhancer=ENHANCER
									BED file containing the enhancer loci
-P PROMOTER, --promoter=PROMOTER
									BED file containing the promoter loci
-R, --read
									Read existing training and predicting data instead of 
									generating them from ChIP-seq (default: False)
-g GENOME, --genome=GENOME
									Genome assembly; should be one of the following: dm3, 
									mm9, hg17, hg18, hg19
-b BIN_SIZE, --bin_size=BIN_SIZE
									Length of dividing bins (default: 100)
-w WIN_SIZE, --window_size=WIN_SIZE
									Length of sliding window; should be an integer multiple 
									of the bin size (default: 2000)
--iteration_number=ITER_NUM
									Number of iterations for AdaBoost (default: 100)
--pvalue_threshold=P_THRES
									P-value threshold for enhancer prediction (default: 
									0.5)
-o OUTPUT, --output=OUTPUT
									Output file name (default output file is 
									"predicted_enhancer.bed")
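When calling DELTA from a pipeline it can help to validate the documented constraints before launching a long run. The sketch below is a hypothetical helper (build_delta_command is not part of DELTA) that uses only the options listed in the help text above: it joins the ChIP-seq files into the single comma-separated -c value, checks the genome against the listed assemblies, and checks that the window size is an integer multiple of the bin size (with the defaults, 2000 / 100 = 20 bins per window). It assumes delta.py is executable and on the PATH, as in the usage line.

```python
import shlex
from typing import List, Optional

# Assemblies listed under -g/--genome in the help text.
SUPPORTED_GENOMES = {"dm3", "mm9", "hg17", "hg18", "hg19"}


def build_delta_command(chip_beds: List[str], enhancer_bed: str,
                        promoter_bed: str, genome: str,
                        bin_size: int = 100, window_size: int = 2000,
                        iterations: int = 100, pvalue: float = 0.5,
                        output: Optional[str] = None) -> List[str]:
    """Build an argument list for delta.py (hypothetical helper)."""
    if genome not in SUPPORTED_GENOMES:
        raise ValueError(f"unsupported genome assembly: {genome}")
    # The window must cover a whole number of bins.
    if bin_size <= 0 or window_size < bin_size or window_size % bin_size != 0:
        raise ValueError("window size must be an integer multiple of bin size")
    cmd = ["delta.py",
           "-c", ",".join(chip_beds),   # -c takes one comma-separated list
           "-E", enhancer_bed,
           "-P", promoter_bed,
           "-g", genome,
           "-b", str(bin_size),
           "-w", str(window_size),
           f"--iteration_number={iterations}",
           f"--pvalue_threshold={pvalue}"]
    if output is not None:
        cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    cmd = build_delta_command(
        ["H3K4me1.bed", "H3K4me3.bed", "H3K27ac.bed"],
        "p300.bed", "tss.bed", "hg19")
    print(shlex.join(cmd))  # shell-quoted form of the command
    # To run it: subprocess.run(cmd, check=True)
```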

Parameters

-c / --chip_bed

ChIP-seq files contain chromatin modification mapping data. Users should provide the ChIP-seq BED files as a single comma-separated list, e.g. H3K4me1.bed,H3K4me3.bed,H3K27ac.bed.

The BED format is defined in "http://genome.ucsc.edu/FAQ/FAQformat#format1".
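How DELTA reads these files internally is not described here. As a rough illustration of the two input conventions above, the hypothetical helpers below split a -c value into file names and read the three required BED fields (chrom, chromStart, chromEnd), skipping blank, comment, track and browser lines. Per the UCSC definition, chromStart is 0-based and chromEnd is not included, so a record "chr1 100 200" covers 100 bp. Splitting on any whitespace is an assumption made for tolerance; tab-delimited files work as well.

```python
from typing import Iterable, Iterator, List, Tuple


def split_chip_arg(value: str) -> List[str]:
    """Split a -c value such as 'a.bed,b.bed' into file names."""
    return [part.strip() for part in value.split(",") if part.strip()]


def read_bed_intervals(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) from BED lines; extra columns are ignored."""
    for line in lines:
        line = line.strip()
        # Skip empty lines and UCSC header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end < start:
            raise ValueError(f"invalid BED interval: {line}")
        yield chrom, start, end


if __name__ == "__main__":
    for name in split_chip_arg("H3K4me1.bed,H3K4me3.bed,H3K27ac.bed"):
        print(name)
    # With real files: with open(name) as fh: list(read_bed_intervals(fh))
```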

-R / --read

The "-R" option lets the user read existing training and predicting data instead of generating them from the ChIP-seq files, which can be time-consuming. WARNING: use with care, because wrong training and predicting data could be loaded.

--pvalue_threshold

P-value threshold for enhancer prediction. Users can adjust the number of predictions by tuning this parameter.

Output files

1. predicted_enhancer.bed is a BED file containing the predicted enhancers. Be aware that if the step size is smaller than the window size, the predicted enhancers may be redundant. In this case, use the uniq command to remove repeated predictions; because uniq only collapses adjacent identical lines, sort the file first (e.g. sort predicted_enhancer.bed | uniq).

2. adaboost.R is an R script generated by delta.py for running the AdaBoost algorithm.

3. tmp_dir is a directory containing temporary files created by delta.py. It should not be removed until training and prediction are complete.
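If the redundant predictions are post-processed in Python rather than with uniq, a minimal sketch is shown below (unique_bed_lines and dedupe_bed_file are hypothetical helpers). It removes every exact duplicate record, not only adjacent ones, so it behaves like sort followed by uniq except that it keeps the original line order. Like uniq, it does not merge windows that merely overlap; it only drops identical lines.

```python
from typing import Iterable, List


def unique_bed_lines(lines: Iterable[str]) -> List[str]:
    """Drop exact duplicate BED records, keeping first-seen order."""
    seen = set()
    out = []
    for line in lines:
        record = line.rstrip("\r\n")
        if record and record not in seen:   # skip blanks and repeats
            seen.add(record)
            out.append(record)
    return out


def dedupe_bed_file(src: str, dst: str) -> int:
    """Write the unique records of src to dst; return how many were kept."""
    with open(src) as fin:
        records = unique_bed_lines(fin)
    with open(dst, "w") as fout:
        fout.writelines(record + "\n" for record in records)
    return len(records)


if __name__ == "__main__":
    # Usage: dedupe_bed_file("predicted_enhancer.bed", "predicted_enhancer.uniq.bed")
    print(unique_bed_lines(["chr1\t100\t2100\n", "chr1\t100\t2100\n"]))
```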

License

Source code of DELTA is freely available for academic use. For commercial license please contact Dr. Chenggang Zhang ([email protected]).
