MethylationToActivity

a deep-learning framework that reveals promoter activity landscapes from DNA methylomes in individual tumors

MethylationToActivity (M2A) is a machine learning framework that uses convolutional neural networks (CNNs) to infer histone modification (HM) enrichment from whole-genome bisulfite sequencing (WGBS) data. M2A currently supports prediction of H3K27ac and H3K4me3 enrichment from WGBS M-values supplied as a tab-delimited text file. Optionally, M2A also supports transfer learning for users who have matching H3K27ac or H3K4me3 data, with appropriate controls, in addition to WGBS data.

M2A comprises five parts, including optional transfer learning:

| Process | Description |
| --- | --- |
| 1_ResponseVariable | Generate histone enrichment for each unique promoter region (transfer learning only) |
| 2_MethylationFeatures | Process WGBS features for model input |
| 3_CombineInput | Scale and recombine features and, for transfer learning, the calculated HM values |
| 4_TransferLearning | Train the fully-connected layers of a particular model for increased performance in your domain of interest (optional) |
| 5_RunModel | Using pre-generated input, get HM predictions for each unique promoter region |

Prerequisites

Python 3.6.5 or greater:

  1. pyBigWig v0.3.13
  2. numpy v1.17.1
  3. pandas v0.25.1
  4. pandarallel v1.4.2
  5. scikit-learn v0.20.2
  6. h5py v2.9.0
  7. keras v2.2.4
  8. tensorflow v1.10.1
  9. scipy v1.3.1
  10. matplotlib v3.3.0
  11. cwltool v1.0
  12. psutil v5.6.1

Obtain M2A

Clone M2A from GitHub:

git clone https://github.com/chenlab-sj/M2A.git

Inputs

M2A requires five inputs, defined in a YAML file as CWL inputs. E.g., inputs.yml:

chipBigwig:
  class: File
  path: sample.bw
inputBigwig:
  class: File
  path: input.bw
curated:
  class: File
  path: sites.txt
promoterDefinitions:
  class: File
  path: promoters.txt
model: 
  class: File
  path: model.h5

Input description

| Name | Description |
| --- | --- |
| Sample HM bigwig file (only if using M2A with transfer learning) | HM ChIP-seq experiment bigwig track. |
| Sample HM control (Input) bigwig (only if using M2A with transfer learning) | ChIP-seq experiment control (Input) bigwig track. |
| WGBS data file | M-values by chromosome and position (non-standard format, see below). |
| Promoter region definition file (provided, or user defined) | File describing the promoter regions to be predicted (non-standard format, see below). |
| Model weights (provided, or user defined from transfer learning) | HDF5 model weights for either H3K27ac prediction or H3K4me3 prediction. |

Promoter region definition file

A tab-delimited file containing the unique promoter regions for either:

  • hg19-based data: 2_Promoter_Definitions_hg19.txt, or
  • GRCh38-based data: 2_Promoter_Definitions_GRCh38.txt

| Column | Description |
| --- | --- |
| EnsmblID_T | Ensembl transcript ID (unique) |
| EnsmblID_G | Ensembl gene ID (not unique) |
| Gene | Human-readable gene name (abbreviated, not unique) |
| Strand | +, - |
| Chr | chr1, chr2, ... chr22, etc. |
| Start | Beginning of transcript definition |
| End | End of transcript definition |
| RStart | TSS - 1000bp |
| REnd | TSS + 1000bp |
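
The column layout above can be sanity-checked with a short Python sketch. It is illustrative only and not part of M2A: it assumes the file starts with a header row using exactly the column names listed, and the sample row is made up (the IDs, gene name, and coordinates are placeholders). Because RStart and REnd are defined as TSS - 1000bp and TSS + 1000bp, each window should span 2000bp and the TSS can be recovered as RStart + 1000.

```python
import csv
import io

FLANK_BP = 1000  # from the column descriptions: RStart = TSS - 1000bp, REnd = TSS + 1000bp

# Made-up example content; assumes a header row with the documented column names.
SAMPLE = (
    "EnsmblID_T\tEnsmblID_G\tGene\tStrand\tChr\tStart\tEnd\tRStart\tREnd\n"
    "ENST_EXAMPLE\tENSG_EXAMPLE\tGENE1\t+\tchr1\t11000\t15000\t10000\t12000\n"
)


def read_promoters(handle):
    """Yield one dict per promoter row, adding the TSS implied by the window."""
    for row in csv.DictReader(handle, delimiter="\t"):
        row["RStart"] = int(row["RStart"])
        row["REnd"] = int(row["REnd"])
        if row["REnd"] - row["RStart"] != 2 * FLANK_BP:
            raise ValueError(
                f"{row['EnsmblID_T']}: promoter window is not {2 * FLANK_BP}bp wide"
            )
        row["TSS"] = row["RStart"] + FLANK_BP
        yield row


if __name__ == "__main__":
    for promoter in read_promoters(io.StringIO(SAMPLE)):
        print(promoter["EnsmblID_T"], promoter["Chr"], promoter["TSS"])
```

To check a real definitions file, pass an open file handle instead of the `io.StringIO` wrapper.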

WGBS data file

A BED-like, tab-delimited file of genomic positions with corresponding M-values:

| Column | Description |
| --- | --- |
| chrom | Chromosome ID, e.g. 1, 2, 3, ... 22 |
| pos | Position of the 5' cytosine of a CpG on the positive strand |
| mval | Calculated M-value of a given CpG, typically M-value = log2(Beta / (1 - Beta)) |
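
For users starting from Beta values, the M-value definition above, read as log2(Beta / (1 - Beta)), can be sketched in Python as follows. The helper names and the clipping constant `eps` are assumptions made for illustration and are not M2A parameters; clipping is one common way to avoid infinite values when Beta is exactly 0 or 1.

```python
import math


def beta_to_mvalue(beta, eps=1e-6):
    """Convert a methylation Beta value in [0, 1] to an M-value.

    M = log2(Beta / (1 - Beta)). `eps` is a hypothetical clipping constant
    that keeps Beta away from exactly 0 and 1.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0, 1]")
    clipped = min(max(beta, eps), 1.0 - eps)
    return math.log2(clipped / (1.0 - clipped))


def format_wgbs_row(chrom, pos, beta):
    """Format one tab-delimited chrom/pos/mval row."""
    return f"{chrom}\t{pos}\t{beta_to_mvalue(beta):.4f}"


if __name__ == "__main__":
    print(format_wgbs_row("1", 10468, 0.5))
```

A Beta of 0.5 maps to an M-value of 0, and the transform is symmetric around that point, so Beta values of 0.2 and 0.8 give M-values of equal magnitude and opposite sign.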

Run M2A with transfer learning

M2A uses CWL to describe its workflow. To run an example workflow, update sample_data/input_data/inputs.yml with the path to a promoter definitions file. Then run the following.

$ mkdir results
$ cwltool --outdir results cwl/m2a.cwl sample_data/input_data/inputs.yml

Run M2A without transfer learning

M2A without transfer learning is contained in the CWL workflow cwl/m2a_without_transfer_learning.cwl. It requires the same inputs as the transfer-learning pipeline, except for the two bigwig files.
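
The difference between the two input sets can be sketched with a small Python helper that writes an inputs.yml for either mode. This is an illustration, not an M2A utility: the function name `render_inputs` is hypothetical, and it assumes the workflow without transfer learning uses the same key names (curated, promoterDefinitions, model) as the inputs.yml example in the Inputs section. Paths are written unquoted, so paths containing YAML special characters would need quoting.

```python
def render_inputs(paths, transfer_learning=True):
    """Return inputs.yml text for a {key: path} mapping.

    Key names follow the inputs.yml example in this README; reusing them for
    the workflow without transfer learning is an assumption.
    """
    keys = ["curated", "promoterDefinitions", "model"]
    if transfer_learning:
        # The two bigwig tracks are only needed with transfer learning.
        keys = ["chipBigwig", "inputBigwig"] + keys
    missing = [key for key in keys if key not in paths]
    if missing:
        raise KeyError(f"missing inputs: {missing}")
    lines = []
    for key in keys:
        lines += [f"{key}:", "  class: File", f"  path: {paths[key]}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "curated": "sites.txt",
        "promoterDefinitions": "promoters.txt",
        "model": "model.h5",
    }
    print(render_inputs(example, transfer_learning=False), end="")
```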

Docker

M2A provides a Dockerfile that builds an image containing all of the prerequisites. The CWL workflow uses this image; to build and run it, install Docker for your platform.

Build Docker image

In the M2A project directory, build the Docker image.

$ docker build --tag stjude/m2a:0.0.1 .

Evaluate test data results

The M2A pipeline does not currently produce an interactive visualization. If M2A was run with transfer learning, the simplest measures of training prediction accuracy are the squared Pearson correlation (R²) and the root mean square error (RMSE) between the measured and M2A-predicted values. Comparing sample-to-sample consistency within the same or a similar cancer type (again using Pearson's R²) is a good starting point for putting M2A predictions in context.
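
Both measures can be computed with a few lines of standard-library Python. The sketch below is illustrative and not part of the M2A pipeline: the function names are hypothetical, and it assumes the measured and predicted values have already been loaded into two equal-length lists aligned by promoter region.

```python
import math


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_m * var_p)


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def evaluate(measured, predicted):
    """Return squared Pearson correlation and RMSE as a dict."""
    r = pearson_r(measured, predicted)
    return {"r2": r * r, "rmse": rmse(measured, predicted)}
```

The same `pearson_r` function can be applied to the predictions of two samples to compare sample-to-sample consistency. Note that R² measures linear agreement only, so a high R² can coexist with a large RMSE when predictions are systematically scaled or shifted.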

St. Jude Cloud

To run M2A in St. Jude Cloud, please follow the directions at https://university.stjude.cloud/docs/genomics-platform/workflow-guides/methylation-to-activity/

Availability

Copyright 2019 St. Jude Children's Research Hospital

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Seeking help

For questions and bug reports, please open an issue on the GitHub project page.

Publication analyses

All scripts describing the experiments and analyses in the M2A publication, including previous (unsupported) versions of M2A, can be found in the M2A_analyses directory.

Citing M2A

(In submission) Justin Williams, Beisi Xu, Daniel Putnam, Andrew Thrasher, and Xiang Chen. MethylationToActivity: a deep-learning framework that reveals promoter activity landscapes from DNA methylomes in individual tumors.
