DeepTACT

DeepTACT is a bootstrapping deep learning model that integrates genome sequences and chromatin accessibility data to predict chromatin contacts between regulatory elements.

Data preprocessing

Promoter capture Hi-C (PCHi-C) data for total B cells (tB), monocytes (Mon), foetal thymus (FoeT), total CD4+ T cells (tCD4), naive CD4+ T cells (nCD4), and total CD8+ T cells (tCD8) were obtained from https://osf.io/u8tzp/ (Javierre et al. 2016). Labeled training pairs for these cell lines are available in the DeepTACT/ directory, where each cell line has its own subdirectory. For each cell line, we train and test separate models for promoter-promoter (P-P) and promoter-enhancer (P-E) interactions. The sequence of each interaction pair is extracted from hg19.fa; DataPrepare.py opens this file by the relative path 'hg19.fa', so it must be present in the working directory. The openness signals are calculated from DNase-seq data.

  • Data augmentation. Each 'pairs.csv' file contains all positive pairs and the augmented negative pairs. DeepTACT/DataPrepare.py augments the positive training pairs and produces balanced training data and imbalanced testing data for each cell line. For example, for monocyte P-E interactions, run
python DataPrepare.py Mon P-E
  • Bootstrapping. DeepTACT/Bootstrapping.py bootstraps each original dataset 20 times for ensemble learning. An example of the inputs is provided in the DeepTACT/demo directory. When running 'Bootstrapping.py', specify the input directory, the interaction type, and the number of DNase-seq experiments used to provide openness signals. Using the data in the demo directory as an example, run
python Bootstrapping.py demo P-E 3
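
The resampling idea behind the bootstrapping step can be pictured with the minimal sketch below. It is not the code in Bootstrapping.py: the function name `bootstrap_indices`, the seed, and the toy labels are assumptions made for illustration. It draws 20 resamples with replacement, each as large as the original dataset, which is the standard bootstrap; the resample size actually used by Bootstrapping.py may differ.

```python
import numpy as np


def bootstrap_indices(n_samples, n_rounds=20, seed=0):
    """Return n_rounds index arrays, each drawn with replacement.

    Hypothetical helper for illustration; not part of DeepTACT.
    """
    rng = np.random.RandomState(seed)
    # Each round samples n_samples row indices from [0, n_samples).
    return [rng.randint(0, n_samples, size=n_samples) for _ in range(n_rounds)]


# Toy dataset: labels of five interaction pairs (1 = positive, 0 = negative).
labels = np.array([1, 0, 1, 0, 1])

# One index array per classifier in the ensemble.
resamples = bootstrap_indices(len(labels), n_rounds=20, seed=0)

# Labels of the first bootstrapped training set.
first_labels = labels[resamples[0]]
```

Each index array would select the rows used to train one classifier, so the 20 classifiers see different resampled versions of the same data.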

Training and evaluation

We implemented the DeepTACT model using Keras 1.2.0 on a Linux server. All experiments were carried out on 4 Nvidia K80 GPUs, which significantly accelerated training compared with CPUs. We provide examples of sequence and DNase inputs in the DeepTACT/demo directory. To train your own model with DeepTACT, replace the files in DeepTACT/demo with your own data in the same format. To train a model, run

python DeepTACT.py demo P-E 3

We evaluate the ensemble model with a voting strategy: for a given input sample, the final prediction score is the average of the prediction scores from all classifiers in the ensemble.
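
The averaging step can be sketched as follows. This is an illustrative example rather than the evaluation code in DeepTACT.py: the function name `ensemble_scores`, the toy scores, and the 0.5 decision threshold are assumptions.

```python
import numpy as np


def ensemble_scores(per_model_scores):
    """Average prediction scores across classifiers.

    per_model_scores has shape (n_classifiers, n_samples).
    Hypothetical helper for illustration; not part of DeepTACT.
    """
    scores = np.asarray(per_model_scores, dtype=float)
    return scores.mean(axis=0)


# Toy scores from three classifiers for three samples.
per_model = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.5],
    [0.8, 0.3, 0.1],
]

final = ensemble_scores(per_model)

# Assumed cutoff of 0.5 to turn averaged scores into class labels.
predicted = (final >= 0.5).astype(int)
```

Here the third sample is scored above 0.5 by one classifier, but its averaged score falls below the assumed cutoff, so the ensemble labels it negative.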

Application: predict promoter-level interactions from PCHi-C data

We apply the trained DeepTACT model to infer contacts between individual regulatory elements when one or both interacting regions contain multiple regulatory elements. In this way, promoter-level interactions are predicted from bin-level interactions. For each cell line, the predicted promoter-level interactions are saved in a 'predictions.csv' file (e.g. DeepTACT/Mon/P-E/predictions.csv).
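
One way to picture the step from bin-level to promoter-level interactions is to enumerate every promoter-element pair inside an interacting pair of bins and score each pair separately. The sketch below assumes this enumeration; the function names, the dictionary layout, the toy scores, and the 0.5 threshold are all hypothetical and are not taken from the DeepTACT code.

```python
from itertools import product


def candidate_pairs(promoters, enhancers):
    """All promoter-enhancer combinations within one bin-level interaction."""
    return list(product(promoters, enhancers))


def promoter_level_predictions(bin_interaction, score_fn, threshold=0.5):
    """Score every candidate pair and keep those at or above the threshold.

    score_fn stands in for a trained model; here it is any callable that
    maps (promoter, enhancer) to a score in [0, 1].
    """
    pairs = candidate_pairs(bin_interaction["promoters"],
                            bin_interaction["enhancers"])
    scored = [(p, e, score_fn(p, e)) for p, e in pairs]
    return [(p, e, s) for p, e, s in scored if s >= threshold]


# Toy bin-level interaction: each bin contains two regulatory elements.
interaction = {"promoters": ["P1", "P2"], "enhancers": ["E1", "E2"]}

# Made-up scores standing in for model output.
toy_scores = {
    ("P1", "E1"): 0.9, ("P1", "E2"): 0.2,
    ("P2", "E1"): 0.4, ("P2", "E2"): 0.7,
}

kept = promoter_level_predictions(interaction, lambda p, e: toy_scores[(p, e)])
```

In this toy case one bin-level interaction yields four candidate pairs, of which two are retained as promoter-level predictions.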

Requirements

  • hickle
  • numpy=1.13.3
  • Theano=0.8.0
  • keras=1.2.0
  • pandas=0.20.1
  • biopython=1.70
  • scikit-learn=0.18.2

Installation

Download DeepTACT by

git clone https://github.com/liwenran/DeepTACT

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

deeptact's Issues

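About DNA accessibility score data

Hello!

I have recently been trying to verify whether adding other DNA information can benefit more downstream tasks, so I would very much like to reproduce DeepTACT for this exploration.

I would like to ask which of the following was used as the DNA accessibility score data in DeepTACT. The paper does not mention this.

foreground read count,
raw read openness,
narrow peak openness,
and broad peak openness.

Looking forward to your reply!

Lichao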

IOError: [Errno 2] No such file or directory: 'hg19.fa'

Hello,
Thanks for the nice paper and the associated code.

I wanted to test it on the monocyte example doing first:
python $indir/DataPrepare.py $indir/Mon P-E

but I had this error message
File "/..../DeepTACT/DataPrepare.py", line 119, in <module>
main()
File "/.../DeepTACT/DataPrepare.py", line 115, in main
sequence_dict = SeqIO.to_dict(SeqIO.parse(open('hg19.fa'), 'fasta'))
IOError: [Errno 2] No such file or directory: 'hg19.fa'

So could you please provide the hg19.fa or tell me where to find the one you used? (masked or not, what type of masking... ?). And also tell me where I should place it in the tree structure ?

Also I think you should indicate that your code works with python2 and not python3.

Best regards,
Sarah Djebali

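How to get DNase data for the 6 cell types

Hello! Your paper is excellent. I am trying to retrain the model, and afterwards I plan to reproduce the work with my own new Hi-C data.

Using the data-processing code in the current GitHub repository, I can obtain the sequence data (Seq.npz) for the 6 cell lines, but I cannot obtain the corresponding DNase data for these 6 cell lines.

So I reread the paper and found an Excel spreadsheet in the supplementary files. Its first sheet lists the matching DNase data, but these do not seem to be quite the same as the "DNase" data in the "demo" directory you provide.

That is why I am asking: how can I obtain the corresponding processed DNase data? Or, after downloading the data, how should it be processed?

Looking forward to your reply. Thank you!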

Some questions about Supplementary Figure S6

Hello,
I have recently been reading your DeepTACT paper. In the Results section "DeepTACT provides finer mapping of promoter–enhancer interactions from promoter capture Hi-C data", in the final part on validation with the GM12878 cell line, Supplementary Text S7 states: "We compared the overlaps between Hi-C contacts and positive predictions with that between Hi-C contacts and negative predictions". In Supplementary Figure S6, however, the positive and negative predictions appear as DeepTACT results and candidate interactions results, which confuses me.
What is the relationship between negative predictions and candidate interactions, and between positive predictions and DeepTACT?
