CTCF-MP

Required Package

CTCF-MP requires:

Python (tested 2.7.13)
numpy (tested 1.13.1)
pandas (tested 0.20.3)
gensim (tested 3.2.0)
sklearn (tested 0.19.1)
xgboost (tested 0.7)

Required Data

Put hg19 sequences in a folder named 'Chromosome' and named each file as ‘chr1.fasta’,'chr2.fasta'... The folder directory should look like this:

\Chromosome
\CTCF-MP
\CTCF-MP\Code
\CTCF-MP\Data
...

Usage

The parameters are as followed:

"-c","--cell",default = 'gm12878' :Controls the data it runs
"-w", "--word",default = 6 :Controls k in k-mer we choose
"-r", "--range",default = 250 :Controls the size of flanking region of the CTCF motif. (The final length of DNA sequence would be 2r+length of CTCF motif)
"-d", "--direnction",default = 'conv' :Controls the subset of the dataset we run. 'conv' for 'convergent','tandem' for 'in tandem','imb' for 'imbalance'.

##Example

python entrance.py -c gm12878 -d conv -r 250

Note

One problem that might occur in the de novo prediction using CTCF-MP is the uncalibrated probability. The models are usually trained in a balanced dataset while the actual prediction is on an imbalanced dataset(more negative samples as compared to the positive ones). So, one should tune the threshold of probability from the prediction model to get the final results. (For instance, using 0.6 or 0.8 as threshold rather than the default one 0.5).

Because how to tune the threshold remains an open problem, we don't implement a specific one in the scripts. A possible solution would be to separate an individual imbalanced validation set from the training set, and use that validation set to tune the threshold for the highest F1-score. Then one can apply the trained model with the calibrated threshold for the de novo prediction on a new cell line.

Cite

If you want to cite our work

@article{zhang2018predicting,
  title={Predicting CTCF-mediated chromatin loops using CTCF-MP},
  author={Zhang, Ruochi and Wang, Yuchuan and Yang, Yang and Zhang, Yang and Ma, Jian},
  journal={Bioinformatics},
  volume={34},
  number={13},
  pages={i133--i141},
  year={2018},
  publisher={Oxford University Press}
}

ma-compbio / ctcf-mp Goto Github PK

ctcf-mp's Introduction

CTCF-MP

Required Package

Required Data

Usage

Note

Cite

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent