rvanasa / deep-antibody

Find optimal docking parameters using Hex

Docking simulators give a useful, if rather janky, metric for how well an antibody is likely to bind to its target antigen. Most of these tools generate thousands of candidate configurations (poses) and then leave it to the user to post-process them, for example with energy minimization, to extract useful information from the simulation.

We are focusing on three different docking tools:

Hex: http://hex.loria.fr/dist/index.php
Frodock: http://frodock.chaconlab.org/
ZDock: http://zdock.umassmed.edu/software/

Each of these has its own benefits and drawbacks: Hex is very fast and relatively easy to use; Frodock is state-of-the-art but somewhat slower; and ZDock is well-established but relatively inaccurate and slow.

The goal is to build a pipeline that automatically docks antibodies to our target and measures how closely the three simulators agree with each other. Configurations that the simulators agree on most have been shown to be much more reliable in their contact predictions.
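One way to put a number on "agreement" (our assumption; the issue does not fix a metric) is to reduce each simulator's top pose to a set of residue-residue contacts and compare the sets. This sketch scores every pair of tools by Jaccard overlap and keeps the contacts predicted by at least two tools as a consensus. Extracting contacts from each tool's output is not shown, the tool names are just dictionary keys, and the residue numbers in the demo are made up.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Hashable
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two contact sets: 1.0 = identical, 0.0 = nothing in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agreement_report(contacts_by_tool: dict[str, set[Hashable]], min_votes: int = 2) -> dict:
    """Pairwise Jaccard scores plus the contacts predicted by >= min_votes tools.

    A contact can be any hashable value, e.g. a
    ((receptor_chain, residue_number), (ligand_chain, residue_number)) tuple.
    """
    pairwise = {
        (t1, t2): jaccard(contacts_by_tool[t1], contacts_by_tool[t2])
        for t1, t2 in combinations(sorted(contacts_by_tool), 2)
    }
    # Count how many tools predicted each individual contact.
    votes = Counter(c for contacts in contacts_by_tool.values() for c in contacts)
    consensus = {c for c, n in votes.items() if n >= min_votes}
    return {"pairwise": pairwise, "consensus": consensus}


if __name__ == "__main__":
    # Made-up contacts, only to show the call shape.
    report = agreement_report({
        "hex": {(("H", 33), ("C", 380)), (("L", 92), ("C", 385))},
        "frodock": {(("H", 33), ("C", 380)), (("H", 52), ("C", 412))},
        "zdock": {(("H", 33), ("C", 380))},
    })
    print(report["consensus"])  # {(('H', 33), ('C', 380))}
```

Jaccard penalizes contacts that only one tool finds, which suits a "trust what they agree on" filter; a weighted vote or a per-residue (rather than per-pair) comparison would be reasonable alternatives.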

The idea is to feed the contact points from this pipeline into the deep learning model, which returns a score for how strong the binding in that docking configuration would be in real life.

Fine-tuning the docking simulations will require learning how proteins and ligands interact with each other, so this is a perfect entry point for anyone who wants to dig into the biochemical side of the project.

How to start:

1. Download Hex (http://hex.loria.fr/dist/index.php)
- Available for Windows/Linux/Mac
2. Acquire some experimental data
- I would recommend downloading 6W41.pdb (https://www.rcsb.org/structure/6W41) because it is a real antibody bound to the receptor-binding domain of SARS-CoV-2, the virus that causes COVID-19
- Split the antibody-antigen pair so you can re-dock them (chains H and L are the antibody heavy and light chains; chain C is the antigen)
  - One way of doing this is using pdb-tools (https://pypi.org/project/pdb-tools/)
    - $ pip install pdb-tools
    - $ pdb_selchain -H,L 6W41.pdb > receptor.pdb
    - $ pdb_selchain -C 6W41.pdb > ligand.pdb
- If you just want the pre-split coronavirus PDB files, I'll drop them into the Slack chat for you
3. Run a docking simulation
4. Save the docked PDB file and open it in the online iCn3D viewer (https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html)
5. Experiment with the docking parameters until the simulation matches the real-life (crystal structure) configuration
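Matching the real-life configuration can also be checked numerically rather than only by eye. A common choice in docking assessment (CAPRI) is fnat: the fraction of the native interface's residue-residue contacts that a docked model reproduces, where two residues count as in contact if any of their atoms lie within 5 Å. This is a dependency-free sketch; hydrogens are skipped. It assumes the model saved from Hex keeps the original chain IDs (H, L, C) and residue numbering, which is worth confirming on your own output.

```python
import math
from collections import defaultdict


def parse_atoms(lines):
    """Group heavy-atom coordinates by residue from PDB-format text lines.

    Returns {(chain, residue_number, insertion_code): [(x, y, z), ...]}.
    Only ATOM records are read, so waters and other HETATM groups are ignored.
    """
    residues = defaultdict(list)
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        # The element column (77-78) is sometimes blank; fall back to the atom name.
        element = line[76:78].strip() or name.lstrip("0123456789")[:1]
        if element in ("H", "D"):
            continue  # skip hydrogens so protonated and bare files compare alike
        key = (line[21], int(line[22:26]), line[26].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues[key].append(xyz)
    return residues


def _bounding_sphere(atoms):
    """Centroid and radius of a residue, used to skip distant residue pairs quickly."""
    centre = tuple(sum(a[i] for a in atoms) / len(atoms) for i in range(3))
    return centre, max(math.dist(centre, a) for a in atoms)


def interface_contacts(residues, receptor_chains, ligand_chains, cutoff=5.0):
    """(receptor_residue, ligand_residue) pairs with any atoms within `cutoff` Å."""
    rec = {k: _bounding_sphere(v) for k, v in residues.items() if k[0] in receptor_chains}
    lig = {k: _bounding_sphere(v) for k, v in residues.items() if k[0] in ligand_chains}
    contacts = set()
    for rk, (rc, rr) in rec.items():
        for lk, (lc, lr) in lig.items():
            # If the bounding spheres are farther apart than the cutoff,
            # no atom pair between the two residues can be in contact.
            if math.dist(rc, lc) > rr + lr + cutoff:
                continue
            if any(math.dist(a, b) <= cutoff for a in residues[rk] for b in residues[lk]):
                contacts.add((rk, lk))
    return contacts


def fnat(native_contacts, model_contacts):
    """Fraction of native interface contacts that the docked model reproduces."""
    if not native_contacts:
        raise ValueError("native structure has no interface contacts")
    return len(native_contacts & model_contacts) / len(native_contacts)
```

To score a pose, run parse_atoms over the lines of 6W41.pdb and of the docked file (docked.pdb is a hypothetical name), call interface_contacts on each with receptor chains {"H", "L"} and ligand chains {"C"}, and pass both contact sets to fnat. A value near 1.0 means the pose reproduces most of the crystal-structure interface, which gives a single number to track while tuning Hex's parameters.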

Common terminology:

Receptor: antibody
Ligand: antigen (in the context of the docking simulators)
Light / heavy chains: the two kinds of protein chain that make up an antibody; they pair up and sort of twist around each other (6W41 contains one of each, as chains L and H)
CDRs (complementarity-determining regions): the loops at the tips of the light and heavy chains that bind to the antigen
- Docked solutions should pretty much always bind through these regions
Epitope: contact region on the antigen
Paratope: contact region on the antibody
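The rule that docked solutions should bind through the CDRs can double as a cheap filter on poses. A minimal sketch, assuming you already have a pose's antibody-side contact residues (its paratope) as (chain, residue number) pairs: it computes what fraction of them fall inside a CDR. The ranges below are the Kabat CDR definitions, which only line up if the PDB file itself is Kabat-numbered (insertion codes such as 35A are ignored); otherwise derive the ranges for your antibody with a numbering tool. The 0.7 cut-off is a hypothetical starting point, not a validated value.

```python
# Kabat CDR definitions as inclusive residue-number ranges, keyed by chain ID.
# Only valid for Kabat-numbered structures; insertion codes are ignored.
KABAT_CDRS = {
    "H": [(31, 35), (50, 65), (95, 102)],
    "L": [(24, 34), (50, 56), (89, 97)],
}


def cdr_fraction(paratope_residues, cdr_ranges=KABAT_CDRS):
    """Fraction of antibody contact residues that fall inside a CDR."""
    residues = set(paratope_residues)
    if not residues:
        return 0.0
    in_cdr = sum(
        1
        for chain, number in residues
        if any(start <= number <= end for start, end in cdr_ranges.get(chain, []))
    )
    return in_cdr / len(residues)


def is_cdr_mediated(paratope_residues, cdr_ranges=KABAT_CDRS, min_fraction=0.7):
    """Hypothetical filter: keep a pose only if most of its paratope is in the CDRs."""
    return cdr_fraction(paratope_residues, cdr_ranges) >= min_fraction
```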

Note that in most simulators, the antibody is called the "receptor" while the antigen is called the "ligand." If you need any other clarification on terminology, be sure to let me know since other people will probably run into it as well.

Here's a bit of documentation for Hex: http://hex.loria.fr/manual800/hex_manual.pdf

Find a machine learning model and hyperparameters with maximal precision on our current dataset

An extremely important challenge in antibody design is making "no-go" predictions to filter out antibodies that would not be able to bind to the target antigen. Since machine learning has been shown to improve the precision of these guesses, we can use a neural network or other machine learning model to significantly improve the results of the antibody screening process.
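Since the target metric is precision, it helps to pin down what that means for a no-go filter. Here a "positive" is an antibody flagged as no-go (predicted not to bind); that labelling convention is our assumption. Precision is TP / (TP + FP), the share of flagged antibodies that truly would not bind. A model that outputs a score can trade recall for precision by raising its decision threshold, so this sketch finds the lowest threshold that still reaches a target precision (0.95 is an arbitrary example). That keeps the filter as aggressive as the target allows.

```python
import numpy as np


def precision(y_true, y_pred):
    """TP / (TP + FP); returns 0.0 when nothing is flagged."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    flagged = int(y_pred.sum())
    if flagged == 0:
        return 0.0
    return int((y_true & y_pred).sum()) / flagged


def lowest_threshold_for_precision(y_true, scores, target=0.95):
    """Smallest cut-off t such that flagging every score >= t reaches `target` precision.

    Returns None if no threshold reaches the target.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        return None
    order = np.argsort(-scores, kind="stable")  # highest scores first
    s, y = scores[order], y_true[order]
    prec = np.cumsum(y) / np.arange(1, len(s) + 1)  # precision when flagging the top k
    # A threshold flags all tied scores at once, so only evaluate the end of each tie run.
    last_of_tie = np.append(s[1:] != s[:-1], True)
    ok = last_of_tie & (prec >= target)
    if not ok.any():
        return None
    return float(s[np.flatnonzero(ok)[-1]])  # later position = lower threshold
```

Choose the threshold on a validation split and report precision on a separate test split; tuning and evaluating on the same rows overstates it.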

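The dataset we have created has 512 columns and about 125,000 rows with boolean labels. Structurally this resembles the famous MNIST handwritten digit classification task (each row is a fixed-length numeric feature vector with one label), except that ours is a binary rather than a 10-class problem.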
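A minimal loading and splitting sketch with pandas and NumPy. The label column name "label" is a placeholder, and we assume the label sits in one extra column next to the features (adjust if it is counted among the 512). Because the two classes may not be balanced, the split holds out the same fraction of each class; scikit-learn's train_test_split(..., stratify=y) does the same job if you have it installed.

```python
import numpy as np
import pandas as pd


def to_arrays(df, label_col="label"):
    """Split a DataFrame into a float32 feature matrix X and a boolean label vector y."""
    X = df.drop(columns=[label_col]).to_numpy(dtype=np.float32)
    y = df[label_col].astype(bool).to_numpy()
    return X, y


def stratified_split(X, y, test_fraction=0.2, seed=0):
    """Hold out the same fraction of each class so both splits keep the class balance."""
    rng = np.random.default_rng(seed)
    test_mask = np.zeros(len(y), dtype=bool)
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        test_mask[idx[: int(round(len(idx) * test_fraction))]] = True
    return X[~test_mask], X[test_mask], y[~test_mask], y[test_mask]
```

Typical use is X, y = to_arrays(pd.read_csv("dataset.csv")) with a hypothetical file name; checking y.mean() first shows the share of True labels, which matters when reading precision numbers.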

Because this task has lots of possible approaches, this is a perfect entry point if you want to learn how to design neural networks and/or have a clever idea for how to tackle this challenge.

Recommended Python packages:

Pandas (loading and manipulating tabular data such as CSV files)
NumPy (the array format most ML libraries expect as input)
Keras (bundled with TensorFlow as tf.keras; for building deep learning models)
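As a starting point for the model search, here is a small fully connected Keras classifier for 512-feature rows that reports precision and recall during training. The layer sizes, dropout rate and learning rate are arbitrary defaults to tune, not recommendations from this project, and the sketch assumes a TensorFlow 2.x install with its bundled Keras.

```python
from tensorflow import keras


def build_baseline(n_features=512, hidden=(256, 64), dropout=0.3, learning_rate=1e-3):
    """Small MLP binary classifier; every hyperparameter here is a tunable guess."""
    model = keras.Sequential([keras.Input(shape=(n_features,))])
    for units in hidden:
        model.add(keras.layers.Dense(units, activation="relu"))
        model.add(keras.layers.Dropout(dropout))  # regularization between dense layers
    model.add(keras.layers.Dense(1, activation="sigmoid"))  # probability of the positive class
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=[
            keras.metrics.Precision(name="precision"),
            keras.metrics.Recall(name="recall"),
        ],
    )
    return model
```

Train it with something like model.fit(X_train, y_train.astype("float32"), validation_split=0.1, epochs=10, batch_size=256). The built-in Precision metric uses a fixed 0.5 threshold, so for a precision-focused filter it is worth choosing the threshold on the predicted probabilities afterwards.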

Relevant papers:
