MCAPST5-X for Protein-Protein Interaction Prediction

MCAPST5-X is a hybrid of deep multi-kernel convolutional neural networks and XGBoost that uses a protein language model for protein-protein interaction prediction.

Directory Structure

models: This directory houses the various models used and developed throughout the project. These include:
- LGBM - FSNN (Mahapatra et al., 2021)
- PIPR (Chen et al., 2019)
- D-SCRIPT (Sledzieski et al., 2021)
- Topsy-Turvy (Singh et al., 2022)
- MCAPST5-X (proposed)
data: This directory contains various datasets including:
- Gold-standard datasets: E. coli (Martin et al., 2005); Yeast (Guo et al., 2008); Human (Pan et al., 2010)
- Independent test sets: Cross-species (Guo et al., 2008), HPRD version 2010, HIPPIE version 2.0, DIP version 20160430
- Dscript-data: Human, E. coli, Fly, Worm, Yeast (Sledzieski et al., 2021)
- Interspecies datasets: Virus-human PPI datasets (Yang et al., 2021)
checkpoints: This directory contains the saved states of MCAPST5-X trained on the Pan and Sledzieski human datasets. They can be used to run inference on the other independent test sets or to resume training.
embeddings: This directory contains pre-computed embeddings used specifically for the PIPR model.
environment: This directory contains a requirements.txt file that lists all the Python packages needed to reproduce the MCAPST5-X model. However, this can be ignored as all the necessary libraries with specified versions are included in the Jupyter Notebook for MCAPST5-X.
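
As a quick sanity check after cloning, the layout described above can be verified with a few lines of Python. This is only a minimal sketch: the helper name missing_dirs is hypothetical and not part of the repository, and it assumes the five directories sit directly under the repository root.

```python
from pathlib import Path

# Top-level directories named in this README.
EXPECTED_DIRS = ("models", "data", "checkpoints", "embeddings", "environment")


def missing_dirs(repo_root):
    """Return the expected directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # "." assumes the script is run from the repository root.
    absent = missing_dirs(".")
    print("All directories present" if not absent else f"Missing: {absent}")
```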

Usage

To reproduce or experiment with the models, navigate to the models directory and open the corresponding Jupyter Notebook for each model. The notebooks suffixed with cross_validation are used for running cross-validation assessments, while the ones suffixed with inference are used for inferring on new independent datasets.

You can then choose different datasets from the data directory to perform cross-validation tests or inference evaluations. Each notebook contains detailed instructions and comments to guide you through the process. For more information on how to use each model, refer to the corresponding Jupyter Notebook.
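
To see at a glance which notebooks serve which purpose, the suffix convention described above can be sketched as a small helper. It assumes the notebooks use the .ipynb extension and that the file name (without extension) ends in cross_validation or inference; the function name group_notebooks is hypothetical and not part of the repository.

```python
from pathlib import Path

# Suffixes described in the Usage section; the ".ipynb" extension is assumed.
SUFFIXES = ("cross_validation", "inference")


def group_notebooks(models_dir):
    """Map each suffix to the sorted notebook paths whose stem ends with it."""
    groups = {suffix: [] for suffix in SUFFIXES}
    root = Path(models_dir)
    if not root.is_dir():
        # Nothing to list if the directory does not exist.
        return groups
    for path in sorted(root.rglob("*.ipynb")):
        for suffix in SUFFIXES:
            if path.stem.endswith(suffix):
                groups[suffix].append(path)
    return groups


if __name__ == "__main__":
    # "models" assumes the script is run from the repository root.
    for suffix, paths in group_notebooks("models").items():
        print(f"{suffix}:")
        for path in paths:
            print(f"  {path}")
```

Notebooks that match neither suffix are simply left out of the result.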

Hardware Requirements

The MCAPST5-X project is designed to run on high-performance computing hardware. We recommend a virtual machine equipped with an A100 SXM4 GPU (80 GB VRAM) and 120 GB of system RAM. This setup keeps cross-validation, training, and inference efficient.

We access this level of hardware through VastAI, a high-throughput computing service. We use the Docker Image Template tensorflow:latest-gpu with the Launch Mode jupyter-python notebook, which gives a convenient and consistent setup for running deep learning experiments in a Jupyter Notebook environment. If you're using your own setup, please ensure your hardware meets these requirements to achieve optimal performance.
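
If you are unsure whether your own machine comes close to the recommended GPU, a rough check is possible with the nvidia-smi command-line tool. The sketch below is an illustration only: the function names are hypothetical, and the 80,000 MiB threshold is an assumed approximation of "80 GB" with some slack, since the total memory reported varies by driver. It does not check system RAM.

```python
import shutil
import subprocess

# Assumed threshold approximating the recommended 80 GB of VRAM, in MiB.
MIN_VRAM_MIB = 80_000


def parse_gpu_report(report):
    """Parse 'name, memory_mib' CSV lines into (name, memory_mib) tuples."""
    gpus = []
    for line in report.strip().splitlines():
        name, _, memory = line.rpartition(",")
        try:
            gpus.append((name.strip(), int(memory.strip())))
        except ValueError:
            # Skip lines without a numeric memory field (e.g. "[N/A]").
            continue
    return gpus


def query_gpus():
    """Return GPUs reported by nvidia-smi, or an empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_report(result.stdout) if result.returncode == 0 else []


def has_recommended_gpu(gpus, min_mib=MIN_VRAM_MIB):
    """True if any listed GPU reports at least min_mib MiB of memory."""
    return any(memory >= min_mib for _, memory in gpus)


if __name__ == "__main__":
    found = query_gpus()
    for name, memory in found:
        print(f"{name}: {memory} MiB")
    print("Recommended GPU found" if has_recommended_gpu(found)
          else "No GPU meeting the recommendation was detected")
```

A smaller GPU may still run the notebooks, though memory-heavy steps could need smaller batch sizes; the check only reports whether the recommendation is met.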
