deep-antibody's People

Contributors

rvanasa

deep-antibody's Issues

Find optimal docking parameters using Hex

Docking simulators give a useful, if rather rough, estimate of how an antibody might bind to the target antigen. Most of these tools generate thousands of candidate configurations and then leave the user to post-process them, for example with energy minimization, to extract useful information from the simulation.

We are focusing on three different docking tools: Hex, Frodock, and ZDock.

Each of these has its own benefits and drawbacks. Hex is very fast and relatively easy to use; Frodock is state-of-the-art but somewhat slower; and ZDock is well-established but relatively inaccurate and slow.

The goal is to create a pipeline where we can automatically dock antibodies to our target and evaluate how much the three simulators agree with each other. Configurations with the most agreement across simulations have been demonstrated to be much more reliable in their contact predictions.
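One way to put a number on "agreement" is to compare the sets of predicted contact pairs from each simulator. The following is a minimal sketch, assuming each simulator's best pose has already been reduced to a set of (antibody residue, antigen residue) pairs. The residue labels are made up for illustration, and the Jaccard score and vote threshold are assumptions, not a metric the project has settled on:

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_agreement(contacts_by_tool):
    """Average Jaccard similarity over every pair of simulators."""
    pairs = list(combinations(contacts_by_tool.values(), 2))
    if not pairs:
        raise ValueError("need contact sets from at least two simulators")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def consensus_contacts(contacts_by_tool, min_votes=2):
    """Contacts predicted by at least `min_votes` simulators."""
    votes = {}
    for contacts in contacts_by_tool.values():
        for contact in contacts:
            votes[contact] = votes.get(contact, 0) + 1
    return {contact for contact, n in votes.items() if n >= min_votes}


# Hypothetical contact sets, labelled "chain:residue number".
example = {
    "hex": {("H:33", "C:378"), ("H:100", "C:380"), ("L:50", "C:377")},
    "frodock": {("H:33", "C:378"), ("L:50", "C:377")},
    "zdock": {("H:33", "C:378"), ("H:55", "C:385")},
}

print(mean_pairwise_agreement(example))
print(consensus_contacts(example))
```

Only the consensus contacts would then be passed on to the scoring model; raising `min_votes` to 3 keeps only the contacts all three tools agree on.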

The idea is that we feed the contact points from this pipeline into the deep learning model, which returns a score based on how strong the docking configuration would be in real life.

Fine-tuning the docking simulations is going to require learning how proteins and ligands interact with each other, so this is a perfect entry point for anyone who wants to dig into the structural biology side of the project.

How to start:

  • Download Hex (http://hex.loria.fr/dist/index.php)
    • Available for Windows/Linux/Mac
  • Acquire some experimental data
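    • I would recommend downloading the 6W41 structure in PDB format (https://www.rcsb.org/structure/6W41) because it contains an actual antibody (CR3022) bound to the receptor-binding domain of the SARS-CoV-2 spike protein (the virus that causes COVID-19)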
    • Split the antibody-antigen pair so you can re-dock them
      • One way of doing this is using pdb-tools (https://pypi.org/project/pdb-tools/)
        • $ pip install pdb-tools
        • $ pdb_selchain -H,L 6w41.pdb > receptor.pdb
        • $ pdb_selchain -C 6w41.pdb > ligand.pdb
    • If you just want the pre-split coronavirus PDB files, I'll drop them into the Slack chat for you
  • Run a docking simulation
  • Save and open the docked PDB file online (https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html)
  • Experiment with the docking parameters to get the simulation to match the real-life configuration
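To judge how closely a docked pose matches the real-life configuration, one option is the RMSD between the antigen's C-alpha atoms in the crystal structure and in the docked output. This is a minimal sketch using only the standard library. It assumes the docked file keeps the antibody in the same coordinate frame as the crystal structure (no superposition is performed), that the antigen is chain C as in the split above, and that the docked file is called `docked.pdb`, which is a hypothetical name:

```python
import math


def read_ca_coords(pdb_lines, chain_id):
    """Map (residue number, insertion code) -> C-alpha coordinates for one chain."""
    coords = {}
    for line in pdb_lines:
        # Fixed-column PDB format: atom name in columns 13-16, chain ID in column 22.
        if line.startswith("ATOM") and line[12:16].strip() == "CA" and line[21] == chain_id:
            key = (int(line[22:26]), line[26].strip())
            coords[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def ca_rmsd(reference, docked):
    """RMSD in angstroms over C-alpha atoms present in both structures."""
    shared = sorted(set(reference) & set(docked))
    if not shared:
        raise ValueError("no matching C-alpha atoms")
    total = sum(math.dist(reference[k], docked[k]) ** 2 for k in shared)
    return math.sqrt(total / len(shared))


# Usage (file names are assumptions):
# with open("ligand.pdb") as ref, open("docked.pdb") as out:
#     print(ca_rmsd(read_ca_coords(ref, "C"), read_ca_coords(out, "C")))
```

A lower value means the docked antigen sits closer to its experimentally observed position, so the RMSD gives a single number to track while experimenting with the docking parameters.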

Common terminology:

  • Receptor: antibody
  • Ligand: antigen (in the context of the docking simulators)
  • Light / Heavy chains: the two types of protein chain which make up the antibody (they sort of twist around each other)
  • CDR (complementarity-determining region): specific parts of the antibody which bind to the antigen (the loops at the tips of the light/heavy chains)
    • Docked solutions should pretty much always bind to these regions
  • Epitope: contact point on the antigen
  • Paratope: contact point on the antibody

Note that in most simulators, the antibody is called the "receptor" while the antigen is called the "ligand." If you need any other clarification on terminology, be sure to let me know since other people will probably run into it as well.
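To make the last two terms concrete: the paratope is the set of antibody residues touching the antigen, and the epitope is the set of antigen residues touching the antibody. Both can be read off a complex with a simple distance cutoff. This is a rough sketch, assuming chains H and L are the antibody and chain C is the antigen (as in 6W41); the 4.5 angstrom cutoff is a common but arbitrary choice, not a project decision:

```python
import math


def parse_atoms(pdb_lines):
    """Yield (chain, residue number, residue name, (x, y, z)) for each ATOM record."""
    for line in pdb_lines:
        if line.startswith("ATOM"):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield line[21], int(line[22:26]), line[17:20], xyz


def interface_residues(pdb_lines, antibody_chains=("H", "L"),
                       antigen_chains=("C",), cutoff=4.5):
    """Return (paratope, epitope) as sets of (chain, residue number, residue name)."""
    atoms = list(parse_atoms(pdb_lines))
    antibody = [a for a in atoms if a[0] in antibody_chains]
    antigen = [a for a in atoms if a[0] in antigen_chains]
    paratope, epitope = set(), set()
    # Brute-force all atom pairs; slow but simple for a single complex.
    for ab in antibody:
        for ag in antigen:
            if math.dist(ab[3], ag[3]) <= cutoff:
                paratope.add(ab[:3])  # residue on the antibody side
                epitope.add(ag[:3])   # residue on the antigen side
    return paratope, epitope
```

For a good docked solution, most paratope residues returned here should fall inside the CDRs.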

Here's a bit of documentation for Hex: http://hex.loria.fr/manual800/hex_manual.pdf

Find a machine learning model and hyperparameters with maximal precision on our current dataset

An extremely important challenge in antibody design is making "no-go" predictions to filter out antibodies that would not be able to bind to the target antigen. Since machine learning has been shown to improve the precision of these guesses, we can use a neural network or other machine learning model to significantly improve the results of the antibody screening process.

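The dataset we have created has 512 columns and about 125,000 rows with boolean labels. The setup resembles the famous MNIST handwritten digit classification task (fixed-length feature rows with one label each), except that the label here is binary rather than one of ten digits.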
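Since the target metric is precision, it helps to pin down what is being measured. The sketch below assumes the positive class means "binds" and that the model outputs a score between 0 and 1 per row; both are assumptions about the dataset, and the labels and scores shown are made up:

```python
def precision(labels, scores, threshold=0.5):
    """Fraction of predicted positives (score >= threshold) that are truly positive.

    Returns None when nothing is predicted positive, since precision is undefined.
    """
    true_pos = sum(1 for y, s in zip(labels, scores) if s >= threshold and y)
    false_pos = sum(1 for y, s in zip(labels, scores) if s >= threshold and not y)
    if true_pos + false_pos == 0:
        return None
    return true_pos / (true_pos + false_pos)


# Made-up example: five antibodies with boolean labels and model scores.
labels = [True, True, False, False, True]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]

for t in (0.5, 0.75, 0.95):
    print(t, precision(labels, scores, threshold=t))
```

Raising the threshold generally raises precision but rejects more true binders, so any comparison of models should report the threshold (or the recall) alongside the precision.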

Because this task has lots of possible approaches, this is a perfect entry point if you want to learn how to design neural networks and/or have a clever idea for how to tackle this challenge.

Recommended Python packages:

  • Pandas (loading / manipulating tabulated data such as csv files)
  • NumPy (input data for most ML models)
  • Keras (included in TensorFlow for creating deep learning models)

Relevant papers:
