CTCF-MP

Required Packages

CTCF-MP requires:

  • Python (tested 2.7.13)
  • numpy (tested 1.13.1)
  • pandas (tested 0.20.3)
  • gensim (tested 3.2.0)
  • sklearn (tested 0.19.1)
  • xgboost (tested 0.7)

Required Data

Put the hg19 chromosome sequences in a folder named 'Chromosome' and name the files 'chr1.fasta', 'chr2.fasta', and so on. The Chromosome folder sits at the same level as the CTCF-MP folder, so the directory layout should look like this:

  • \Chromosome
  • \CTCF-MP
  • \CTCF-MP\Code
  • \CTCF-MP\Data
  • ...
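Before running, it can help to confirm that every chromosome file is in place. The sketch below is not part of CTCF-MP: the helper names `missing_chromosome_files` and `read_fasta_sequence` are hypothetical, and the list of expected chromosomes (chr1 to chr22, chrX, chrY) is an assumption, so trim it to whatever the scripts actually read. It sticks to syntax that works under both Python 2.7 and Python 3.

```python
import os

# Assumption: hg19 autosomes plus X and Y; adjust if CTCF-MP reads a different set.
EXPECTED_CHROMS = ["chr%d" % i for i in range(1, 23)] + ["chrX", "chrY"]


def missing_chromosome_files(root, chroms=EXPECTED_CHROMS):
    """Return the chromosome names whose <name>.fasta file is absent from root/Chromosome."""
    chrom_dir = os.path.join(root, "Chromosome")
    return [c for c in chroms
            if not os.path.isfile(os.path.join(chrom_dir, c + ".fasta"))]


def read_fasta_sequence(path):
    """Concatenate all sequence lines of a FASTA file, skipping '>' header lines."""
    parts = []
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                parts.append(line.strip())
    return "".join(parts)


if __name__ == "__main__":
    # Assumes this is run from the directory that holds both Chromosome and CTCF-MP.
    missing = missing_chromosome_files(".")
    if missing:
        print("Missing FASTA files: " + ", ".join(missing))
    else:
        print("All expected chromosome files found.")
```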

Usage

The parameters are as follows:

  • "-c", "--cell", default = 'gm12878': selects the cell line whose data is used.
  • "-w", "--word", default = 6: sets k, the length of the k-mers.
  • "-r", "--range", default = 250: sets the size of the flanking region on each side of the CTCF motif. The final DNA sequence length is 2r + the length of the CTCF motif.
  • "-d", "--direnction", default = 'conv': selects the subset of the dataset to run: 'conv' for convergent, 'tandem' for in tandem, and 'imb' for imbalance.
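The effect of -r and -w can be illustrated with a minimal sketch, assuming 0-based, end-exclusive motif coordinates and overlapping k-mers taken with a stride of 1. How CTCF-MP actually tokenizes sequences before passing them to gensim may differ; `flanking_window` and `kmer_words` are hypothetical helpers, not functions from the CTCF-MP code.

```python
def flanking_window(chrom_seq, motif_start, motif_end, r=250):
    """Return chrom_seq[motif_start - r : motif_end + r].

    Coordinates are 0-based and end-exclusive. The result has length
    2*r + (motif_end - motif_start) when the window fits inside the
    chromosome; near the chromosome ends it is clipped, not padded.
    """
    start = max(0, motif_start - r)
    return chrom_seq[start:motif_end + r]


def kmer_words(seq, k=6):
    """Split a sequence into overlapping k-mers (stride 1), upper-cased."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]
```

For example, with the defaults (-r 250, -w 6) and a motif assumed, for illustration only, to span 19 bp, the window is 2 x 250 + 19 = 519 bp long and yields 519 - 6 + 1 = 514 overlapping 6-mers.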

Example

python entrance.py -c gm12878 -d conv -r 250
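To run all three subsets for one cell line in sequence, a small wrapper like the following could be used. It is a hypothetical convenience script, not part of CTCF-MP: it only passes the documented flags -c, -d, -r and -w, and it assumes entrance.py is in the current working directory and that the interpreter running the wrapper is the one CTCF-MP's dependencies are installed under.

```python
import subprocess
import sys

# The three subsets accepted by -d, as listed in the parameter description.
DIRECTIONS = ["conv", "tandem", "imb"]


def build_command(cell="gm12878", direction="conv", flank=250, k=6):
    """Build the argument list for one entrance.py run using the documented short flags."""
    return [sys.executable, "entrance.py", "-c", cell, "-d", direction,
            "-r", str(flank), "-w", str(k)]


def run_all(cell="gm12878", flank=250, k=6):
    """Run entrance.py once per subset; call from the directory containing entrance.py."""
    for direction in DIRECTIONS:
        cmd = build_command(cell, direction, flank, k)
        print("Running: " + " ".join(cmd))
        subprocess.check_call(cmd)  # raises CalledProcessError if a run fails
```

Calling `run_all()` reproduces the example command above for 'conv' and then repeats it for 'tandem' and 'imb'.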

Note

One problem that may occur in de novo prediction with CTCF-MP is uncalibrated probabilities. The models are usually trained on a balanced dataset, while the actual prediction is made on an imbalanced dataset (with many more negative samples than positive ones). One should therefore tune the probability threshold of the prediction model to obtain the final results (for instance, using 0.6 or 0.8 as the threshold rather than the default of 0.5).

Because how best to tune the threshold remains an open problem, we do not implement a specific method in the scripts. One possible solution is to hold out a separate imbalanced validation set from the training set and use it to find the threshold that gives the highest F1-score. The trained model can then be applied with this calibrated threshold for de novo prediction on a new cell line.
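The threshold search described above can be sketched with scikit-learn's `f1_score`. This is one possible approach, not something shipped with CTCF-MP: `tune_threshold` and `predict_with_threshold` are hypothetical helpers, the 0.05-step grid is an arbitrary choice, and `y_prob` is assumed to hold the positive-class probabilities predicted on the held-out imbalanced validation set.

```python
import numpy as np
from sklearn.metrics import f1_score


def tune_threshold(y_true, y_prob, grid=None):
    """Return (threshold, f1) that maximises F1 on a held-out validation set."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # candidate cut-offs in steps of 0.05
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        # Binarise at this cut-off and score against the true labels.
        score = float(f1_score(y_true, (y_prob >= t).astype(int)))
        if score > best_f1:  # strict '>' keeps the lowest threshold among ties
            best_t, best_f1 = float(t), score
    return best_t, best_f1


def predict_with_threshold(y_prob, threshold):
    """Binarise positive-class probabilities with a tuned cut-off instead of 0.5."""
    return (np.asarray(y_prob) >= threshold).astype(int)
```

With an xgboost `XGBClassifier` trained on the balanced data, `y_prob` would typically come from `model.predict_proba(X_val)[:, 1]`; the returned threshold is then passed to `predict_with_threshold` together with the probabilities for the new cell line. Grid values that produce no positive predictions score an F1 of 0, and scikit-learn may emit a warning for them.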

Cite

If you use our work, please cite:

@article{zhang2018predicting,
  title={Predicting CTCF-mediated chromatin loops using CTCF-MP},
  author={Zhang, Ruochi and Wang, Yuchuan and Yang, Yang and Zhang, Yang and Ma, Jian},
  journal={Bioinformatics},
  volume={34},
  number={13},
  pages={i133--i141},
  year={2018},
  publisher={Oxford University Press}
}