Giter Club home page Giter Club logo

deeph3-distances-orientations's Introduction

Deep H3 Loop Prediction

A deep residual network architecture to predict probability distributions of inter-residue distances and angles for CDR H3 loops in antibodies. This work is protected by https://creativecommons.org/licenses/by-nc/3.0/. Please cite:

  • Ruffolo JA, Guerra C, Mahajan SP, Sulam J, & Gray JJ, "Geometric Potentials from Deep Learning Improve Prediction of CDR H3 Loop Structures," bioRXiv 2020. doi:10.1101/2020.02.09.940254

ResNet part of the code is re-implemented from https://github.com/KaimingHe/resnet-1k-layers which was based on
https://github.com/facebook/fb.resnet.torch. Network architecture is based on that of Wang et al. (RaptorX-Contact), and geometric descriptors based on Yang et al. (trRosetta) (references below).

Trained Model

Model trained on ~ 1400 antibodies from the SAbDab Database is available in deeph3/models/

Requirements and Setup

torch, tensorboard (2.1 or higher), biopython (see requirements.txt for the complete list). Install with:

pip3 install -r requirements.txt [--user]

Be sure that your PYTHONPATH environment variable has the deepH3-distances-orientations/ directory. On linux, use the following command:

export PYTHONPATH="$PYTHONPATH:/absolute/path/to/deepH3-distances-orientations"

Training

To train a model using a non-redundant set of bound and unbound antibodies downloaded from SAbDab, run:

cd deeph3
python3 train.py 

By default, structures are selected from SAbDab with paired VH/VL chains, a resolution of 3 A or better, and at most 99% sequence identity (ie, the set used in our original preprint.)

Other arguments can be listed using the --help or -h option.

Note that you can skip this step since the model described in our paper is available in this archive

Prediction

To predict the binned distance and angle matrices for a given antibody sequence (in a fasta file), run:

cd deeph3
python3 predict.py [--fasta_file [fasta file path] --model [model file path]]

The fasta file must have the following format:

>[PDB ID]:H	[heavy chain sequence length]
[heavy chain sequence]
>[PDB ID]:L	[light chain sequence length]
[light chain sequence]

Output is in the form of a pickle file ([fasta_file_basename].p) containing the predicted distance and orientation distributions.

See deeph3/data/antibody_dataset/fastas_testrun for an example.

Other arguments can be listed using the --help or -h option.

Generating Constraints

To generate constraint files to use in Rosetta, run:

cd deeph3
python3 generate_constraints.py [--fasta_file [fasta file path] --model [model file path]]

The fasta file must have the following format:

>[PDB ID]:H	[heavy chain sequence length]
[heavy chain sequence]
>[PDB ID]:L	[light chain sequence length]
[light chain sequence]

By default, the program will run the 1a0q example from the fasta files in data/.

Output will go to output_dir/ as a file (for example) 1a0q.constraints to use in Rosetta as -constraint_file deeph3/output_dir/1a0q.constraints. In turn, that file references a set of data files with spline parameters in output_dir/1a0q.histograms/.

Other arguments can be listed using the --help or -h option.

Authors

Research Advisors

References

  1. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, "Identity Mappings in Deep Residual Networks," ECCV, 2016. arXiv:1603.05027
  2. S. Wang, S. Sun, Z. Li, R. Zhang and J. Xu, "Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model", PLOS Computational Biology, vol. 13, no. 1, p. e1005324, 2017. Available: 10.1371/journal.pcbi.1005324.
  3. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. arXix:1512.03385
  4. J. Yang, I. Anishchenko, H. Park, Z. Peng, S. Ovchinnikov and D. Baker, “Improved protein structure prediction using predicted interresidue orientations.,” Proceedings of the National Academy of Sciences, 2020. PNAS
  5. B. D. Weitzner, D. Kuroda, N. Marze, J. Xu and J. J. Gray, “Blind prediction performance of RosettaAntibody 3.0: grafting, relaxation, kinematic loop modeling, and full CDR optimization.,” Proteins: Structure, Function, and Bioinformatics, vol. 82, no. 8, pp. 1611–1623, 2014. Wiley

deeph3-distances-orientations's People

Contributors

cguerramain avatar heiidii avatar jjgray avatar jeffreyruffolo avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.