
MCAPST5-X for Protein-Protein Interaction Prediction

MCAPST5-X is a hybrid of deep multi-kernel convolutional neural networks and XGBoost that uses a protein language model for protein-protein interaction prediction.

Directory Structure

  • models: This directory houses the various models used and developed throughout the project. These include:

    • LGBM - FSNN (Mahapatra et al., 2021)
    • PIPR (Chen et al., 2019)
    • D-SCRIPT (Sledzieski et al., 2021)
    • Topsy-Turvy (Singh et al., 2022)
    • MCAPST5-X (proposed)
  • data: This directory contains various datasets including:

    • Gold-standard datasets: E. coli (Martin et al., 2005); Yeast (Guo et al., 2008); Human (Pan et al., 2010)
    • Independent test sets: Cross-species (Guo et al., 2008), HPRD version 2010, HIPPIE version 2.0, DIP version 20160430
    • Dscript-data: Human, E. coli, Fly, Worm, Yeast (Sledzieski et al., 2021)
    • Interspecies datasets: Virus-human PPI datasets (Yang et al., 2021)
  • checkpoints: This directory contains the saved states of MCAPST5-X trained on the Pan and Sledzieski human datasets. They can be used to resume training or to run inference on the other independent test sets.

  • embeddings: This directory contains pre-computed embeddings used specifically for the PIPR model.

  • environment: This directory contains a requirements.txt file that lists all the Python packages needed to reproduce the MCAPST5-X model. However, this can be ignored as all the necessary libraries with specified versions are included in the Jupyter Notebook for MCAPST5-X.
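As a quick sanity check after cloning, the layout described above can be verified with a few lines of Python. This is only a minimal sketch: the directory names come from the list above, while the helper name `missing_directories` and the assumption that the script runs from the repository root are illustrative and not part of the project.

```python
from pathlib import Path

# Top-level directory names taken from the list above.
EXPECTED_DIRS = ("models", "data", "checkpoints", "embeddings", "environment")


def missing_directories(repo_root):
    """Return the expected top-level directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    missing = missing_directories(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

An empty result means all five directories were found. The check does not inspect their contents.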

Usage

To reproduce or experiment with the models, navigate to the models directory and open the corresponding Jupyter Notebook for each model. The notebooks suffixed with cross_validation are used for running cross-validation assessments, while the ones suffixed with inference are used for inferring on new independent datasets.

You can then choose different datasets from the data directory to perform cross-validation tests or inference evaluations. Each notebook contains detailed instructions and comments to guide you through the process. For more information on how to use each model, refer to the corresponding Jupyter Notebook.
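To see at a glance which notebooks serve which purpose, the suffix convention described above can be applied programmatically. The sketch below assumes the notebooks are `.ipynb` files somewhere under the models directory and that the suffix is the end of the file name before the extension. The function name `group_notebooks` is hypothetical.

```python
from pathlib import Path


def group_notebooks(models_dir):
    """Group .ipynb files under models_dir by the suffix of their file name."""
    groups = {"cross_validation": [], "inference": [], "other": []}
    for path in sorted(Path(models_dir).rglob("*.ipynb")):
        # Skip autosave copies that Jupyter keeps in .ipynb_checkpoints folders.
        if ".ipynb_checkpoints" in path.parts:
            continue
        if path.stem.endswith("cross_validation"):
            groups["cross_validation"].append(path)
        elif path.stem.endswith("inference"):
            groups["inference"].append(path)
        else:
            groups["other"].append(path)
    return groups


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    for kind, paths in group_notebooks("models").items():
        print(kind)
        for path in paths:
            print("  ", path)
```

Notebooks that match neither suffix are listed under `other` rather than dropped, so nothing is silently hidden if a file follows a different naming scheme.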

Hardware Requirements

The MCAPST5-X project is designed to run on high-performance computing hardware. We recommend using a virtual machine equipped with an A100 SXM4 GPU (80 GB VRAM) and 120 GB of system RAM. This setup keeps model cross-validation, training, and inference efficient.

We access this level of hardware through VastAI, a cloud computing service. We use the Docker Image Template tensorflow:latest-gpu with the Launch Mode jupyter-python notebook, which provides a convenient and consistent setup for running deep learning experiments in a Jupyter Notebook environment. If you're using your own setup, please ensure your hardware meets these requirements to achieve optimal performance.
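If you are on your own machine, one way to compare your GPU against the recommendation above is to query `nvidia-smi`. The following is a rough sketch, not part of the project: it assumes an NVIDIA driver with `nvidia-smi` on the PATH, and the helper names and the 5% tolerance (drivers often report slightly less than the nominal capacity) are illustrative choices.

```python
import subprocess


def parse_gpu_line(line):
    """Parse one 'name, memory.total' CSV row (memory in MiB) from nvidia-smi."""
    name, memory_mib = line.rsplit(",", 1)
    return name.strip(), int(memory_mib.strip())


def query_gpus():
    """Return [(name, total_memory_mib), ...], or [] if nvidia-smi is unavailable."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=name,memory.total",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [parse_gpu_line(row) for row in result.stdout.splitlines() if row.strip()]


def meets_vram_recommendation(total_memory_mib, recommended_gib=80, tolerance=0.05):
    """Check the reported VRAM against the recommended size, within a tolerance."""
    # The tolerance is an assumption made for this sketch, not a project setting.
    return total_memory_mib >= recommended_gib * 1024 * (1 - tolerance)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No NVIDIA GPU detected (or nvidia-smi is not installed).")
    for name, memory_mib in gpus:
        status = "OK" if meets_vram_recommendation(memory_mib) else "below recommendation"
        print(f"{name}: {memory_mib} MiB ({status})")
```

A GPU below the recommendation may still work with smaller batch sizes. The check only reports how your hardware compares to the setup described above.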

Contributors

anhvt00, vmeomeo, duong755