
The Deep Review

HTML Manuscript · PDF Manuscript · GitHub Actions Status · Code Climate

Manuscript description

This repository is home to the Deep Review, a review article on deep learning in precision medicine. The Deep Review is collaboratively written on GitHub using a tool called Manubot (see below). The project operates on an open contribution model, welcoming contributions from anyone (see CONTRIBUTING.md or an existing example for more info). To see what's incoming, check the open pull requests. For project discussion and planning see the Issues.

The original version of the Deep Review was published in 2018 and should be cited as:

Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow P-M, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, and Greene CS. 2018. Opportunities and obstacles for deep learning in biology and medicine. Journal of The Royal Society Interface 15(141):20170387. doi:10.1098/rsif.2017.0387

Current stage: planning Deep Review version 2.0

As of writing, we are aiming to publish an update of the Deep Review. We will continue to make project preprints available on bioRxiv or another preprint service and aim to continue publishing the finished reviews in a peer-reviewed venue as well. Like the initial release, we are planning an open and collaborative effort. New contributors are welcome and will be listed as version 2.0 authors. Please see issue #810 to contribute to the discussion of future plans and to help decide how best to continue this project.

Manubot updates: We recently updated this repository to use the latest Manubot version. Citations must now be semicolon-separated, like [@doi:10.1002/minf.201501008; @doi:10.1002/jcc.24764], and citation tags are required when the identifier contains forbidden characters. Previously, multiple citations were simply separated by whitespace. In addition, we are switching from wrapping text at a character cutoff to "one sentence per line", as described in USAGE.md. Please base your pull requests on the latest version of the greenelab:master branch. Keep your fork synced by setting its upstream remote to greenelab and running:

# If your branch only has commits from greenelab:master but is outdated
git pull --ff-only upstream master

# If your branch is outdated and has diverged from greenelab:master
git pull --rebase upstream master
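
If the upstream remote has not been configured yet, it can be added once beforehand. A minimal sketch, using the greenelab/deep-review repository URL as the upstream:

# Add the greenelab repository as the upstream remote (one-time setup)
git remote add upstream https://github.com/greenelab/deep-review.git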

Headline review format

The initial manuscript was a headline review for Journal of the Royal Society Interface on a topic at the interface of the computer and life sciences, in the area of systems pharmacology. The headline review solicitation states:

A Headline Review is one in a short, targeted series of high-level reviews within a particular topic of a burgeoning research area. We encourage authors to write in a style that opens the door to a broad range of readers working at the physical sciences - life sciences interface. We intend the reviews to address critical developments in an area of cross-disciplinary research and, when possible, to place such research in a broader context. This is not a place for comprehensive literature surveys.

We do encourage you to speculate in an informed way, and to be topical and provocative about the subject without worrying unduly about space, (the provisional target length is 8-12,000 words). Please think of this as an article which will be a landmark in your area, and will come to be considered as a classic paper of the literature.

Inspiration

On August 2, 2016, project maintainer Casey Greene introduced the project and its motivations:

I was recently inspired by Harold Pimentel's crowd-sourced collection of deep learning papers. Instead of having one individual write this, I thought that this invitation provided a wonderful opportunity to take advantage of the wisdom of crowds to bring a team together around this topic.

This repository provides a home for the paper. We'll operate on a pull request model. Anyone whose contributions meet the ICMJE standards of authorship will be included as an author on the manuscript. I can't guarantee that it will be accepted, but I look forward to trying this approach out.

On August 5, Deep Review was announced with a tweet.

Manubot

Manubot is a system for writing scholarly manuscripts via GitHub. Manubot automates citations and references, versions manuscripts using git, and enables collaborative writing via GitHub. An overview manuscript presents the benefits of collaborative writing with Manubot and its unique features. The rootstock repository is a general-purpose template for creating new Manubot instances. See USAGE.md for documentation on how to write a manuscript.
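
For a concrete sense of the source format, here is a short, purely illustrative snippet of manuscript markdown following the conventions noted above (one sentence per line, semicolon-separated citations); the sentence text is placeholder and the DOIs are the example identifiers reused from the Manubot update note:

Deep learning methods are increasingly applied across biology and medicine [@doi:10.1002/minf.201501008; @doi:10.1002/jcc.24764].
Each sentence is written on its own line, which keeps diffs small and reviews focused.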

Please open an issue for questions related to Manubot usage, bug reports, or general inquiries.

Repository directories & files

The directories are as follows:

  • content contains the manuscript source, which includes markdown files as well as inputs for citations and references. See USAGE.md for more information.
  • output contains the outputs (generated files) from Manubot including the resulting manuscripts. You should not edit these files manually, because they will get overwritten.
  • webpage is a directory meant to be rendered as a static webpage for viewing the HTML manuscript.
  • build contains commands and tools for building the manuscript.
  • ci contains files necessary for deployment via continuous integration.

Local execution

The easiest way to run Manubot is to use continuous integration to rebuild the manuscript when the content changes. If you want to build a Manubot manuscript locally, install the conda environment as described in build. Then, you can build the manuscript on POSIX systems by running the following commands from this root directory.
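
Before the first build, the conda environment itself needs to be created. A minimal sketch of that install step, assuming the build directory follows the standard Manubot rootstock layout with an environment.yml file:

# Create the manubot conda environment (one-time setup)
conda env create --file build/environment.yml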

# Activate the manubot conda environment (assumes conda version >= 4.4)
conda activate manubot

# Build the manuscript, saving outputs to the output directory
bash build/build.sh

# At this point, the HTML & PDF outputs will have been created. The remaining
# commands are for serving the webpage to view the HTML manuscript locally.
# This is required to view local images in the HTML output.

# Configure the webpage directory
manubot webpage

# You can now open the manuscript webpage/index.html in a web browser.
# Alternatively, open a local webserver at http://localhost:8000/ with the
# following commands.
cd webpage
python -m http.server

Sometimes it's helpful to monitor the content directory and automatically rebuild the manuscript when a change is detected. The following command, while running, will trigger both the build.sh script and manubot webpage command upon content changes:

bash build/autobuild.sh

Continuous Integration

Whenever a pull request is opened, CI (continuous integration) will test whether the changes break the build process to generate a formatted manuscript. The build process aims to detect common errors, such as invalid citations. If your pull request build fails, see the CI logs for the cause of failure and revise your pull request accordingly.

When a commit to the master branch occurs (for example, when a pull request is merged), CI builds the manuscript and writes the results to the gh-pages and output branches. The gh-pages branch uses GitHub Pages to host the latest HTML and PDF versions of the manuscript.

For continuous integration configuration details, see .github/workflows/manubot.yaml if using GitHub Actions or .travis.yml if using Travis CI.

License

License: CC BY 4.0 · License: CC0 1.0

Except when noted otherwise, the entirety of this repository is licensed under a CC BY 4.0 License (LICENSE.md), which allows reuse with attribution. Please attribute by linking to https://github.com/greenelab/deep-review.

Since CC BY is not ideal for code and data, certain repository components are also released under the CC0 1.0 public domain dedication (LICENSE-CC0.md). All files matched by the following glob patterns are dual licensed under CC BY 4.0 and CC0 1.0:

  • *.sh
  • *.py
  • *.yml / *.yaml
  • *.json
  • *.bib
  • *.tsv
  • .gitignore
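
To get a rough listing of the dual-licensed files in a local checkout, a shell sketch along these lines can be used (the patterns mirror the glob list above and are an approximation, not a legal determination):

# List files matching the CC0 dual-license patterns, skipping git internals
find . -type f -not -path './.git/*' \( -name '*.sh' -o -name '*.py' \
  -o -name '*.yml' -o -name '*.yaml' -o -name '*.json' -o -name '*.bib' \
  -o -name '*.tsv' -o -name '.gitignore' \)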

All other files are only available under CC BY 4.0, including:

  • *.md
  • *.html
  • *.pdf
  • *.docx

Please open an issue for any question related to licensing.

Contributors

agapow, agitter, alxndrkalinin, annecarpenter, austinvhuang, bdo311, brettbj, cgreene, davharris, dhimmel, enricoferrero, evancofer, gailrosen, gwaybio, j3xugit, jacklanchantin, lavenderca, luzh2, michaelmhoffman, mrwns, olgabot, rgieseke, slochower, stephenra, sw1, swamidass, traversc, vincerubinetti, vsmalladi, xieconnect

deep-review's Issues

Overall Manuscript Structure

The overall aims of the Headline Review articles are outlined in the README. Here's a document structure that I am playing around with to target the review at this question: What would need to be true for deep learning to transform how we categorize, study, and treat individuals to maintain or restore health?

  • Relevant areas where methods inspired by deep learning are already having an impact
    • Sequence -> Function
    • Transcriptional regulation
    • Patient information
    • Imaging + Bio
  • The structures of problem statements which use deep learning towards these ends
    • Supervised approaches
      • Convolutional NNs on genome
      • Principles in which multiple synergistic patterns are learned simultaneously
      • more examples of shared properties across approaches
    • Unsupervised approaches
      • some denoising autoencoder work is common across systems
      • more shared properties
  • Perspectives towards the future & overall question.
    • Which challenges do we think will be resolved first?
    • Are there any approaches/data types that have taken off in other fields but that are under-utilized here?
    • What initiatives or data do we think are particularly interesting for/amenable to deep learning analyses and why?
  • Overall summary on state of the field & reflection towards overall question.

There are some wonderful GitHub-based reading groups/lists by @pimentel, @hussius, and @gokceneraslan. If any of you have feedback as we structure this review, please provide it. If you'd like to participate, dive in!

Switch to preferred DOI resolution URLs

I noticed you were using DOI links in issues to refer to papers. This is great, but the URLs include the dx. prefix, which still works but is deprecated.

So for example:

https://dx.doi.org/10.1093/bioinformatics/btw252

can be shortened to

https://doi.org/10.1093/bioinformatics/btw252

More details from the DOI handbook:

Users may resolve DOI names that are structured to use the DOI system Proxy Server (http://doi.org (preferred) or http://dx.doi.org). The resolution of the DOI name in this case depends on the use of URL syntax: the example DOI name doi:10.123/456 would be resolved from the address: "http://doi.org/10.123/456". Any standard browser encountering a DOI name in this form will be able to resolve it. The proxy service (both doi.org and dx.doi.org) is accessible over IPv6, and supports DNSSEC. The proxy servers respond to HTTPS as well as HTTP requests.
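
If existing manuscript text still uses the deprecated prefix, the links can be rewritten in bulk. A minimal sketch, assuming GNU sed and that the manuscript markdown lives under content/ (review the resulting diff before committing; on macOS, use sed -i '' instead):

# Rewrite deprecated dx.doi.org links to the preferred doi.org form
sed -E -i 's#https?://dx\.doi\.org/#https://doi.org/#g' content/*.md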

Denoising genome-wide histone ChIP-seq with convolutional neural networks

https://doi.org/10.1093/bioinformatics/btx243 (previously https://doi.org/10.1101/052118)

Motivation: Chromatin immunoprecipitation sequencing (ChIP-seq) experiments are commonly used to obtain genome-wide profiles of histone modifications associated with different types of functional genomic elements. However, the quality of histone ChIP-seq data is affected by many experimental parameters such as the amount of input DNA, antibody specificity, ChIP enrichment and sequencing depth. Making accurate inferences from chromatin profiling experiments that involve diverse experimental parameters is challenging.
Results: We introduce a convolutional denoising algorithm, Coda, that uses convolutional neural networks to learn a mapping from suboptimal to high-quality histone ChIP-seq data. This overcomes various sources of noise and variability, substantially enhancing and recovering signal when applied to low-quality chromatin profiling datasets across individuals, cell types and species. Our method has the potential to improve data quality at reduced costs. More broadly, this approach—using a high-dimensional discriminative model to encode a generative noise process—is generally applicable to other biological domains where it is easy to generate noisy data but difficult to analytically characterize the noise or underlying data distribution.
Availability and implementation: https://github.com/kundajelab/coda.
