Giter Club home page Giter Club logo

dmpmetal's Introduction

DMPmetal

DMPmetal is a deep learning-based method for predicting metal binding sites from amino acid sequences. This uses a pretrained protein language model (pLM) to embed the target sequences and to provide the features for simple feed-forward classifier. From a user perspective, the input to the model is a single amino acid sequence, and the output probabilities relate to each of the 29 CHEBI metal codes. This model was ranked 1st in the UniProt Metal Binding Site Machine Learning Challenge held in 2022, and was trained on the organizers’ provided NEG_TRAIN and POS_TRAIN_FULL datasets, based on curated UniProt annotations (http://insideuniprot.blogspot.com/2022/02/the-uniprot-metal-binding-site-machine.html).

Installation

DMPmetal requires pytorch and the flash-attn packages. At the time of writing flash-attn most easily installs with pytorch 2.0.1 and cuda 11.8. Though you should be able to compile it against later versions of both. Python dependencies can be installed with

pip install -r requirements.txt

The weights file should be downloaded and unpackaged from:

http://bioinfadmin.cs.ucl.ac.uk/downloads/DMPmetal/metalpred.pt.gz

Usage

The standard usage is:

python pytorch_dmp2e2e_pred.py -i /path/to/file.fasta

Example

python pytorch_dmp2e2e_pred.py -i 5pcy.fasta

Options

  -i INPUT   Specify path to fasta file input. 
  -d DEVICE  Hardware to run on. Options: 'cpu', 'cuda'. default is 'cpu'

Outputs

String output to stdout

By default, DMPmetal returns output to stdout. You can capture this to a file with a typical redirection. A typical output might be:

FASTA HEADER ID, CHEBI CODE, RESIUDE NUMBER, PROBABILITY
PDB|5PCY	CHEBI:25213	11	0.01
PDB|5PCY	CHEBI:23378	37	0.99
PDB|5PCY	CHEBI:29036	37	0.02
PDB|5PCY	CHEBI:60240	37	0.03
PDB|5PCY	CHEBI:24875	37	0.02

Only binding residues that pass a 0.01 cutoff are returned. Metal binding residues are predicted for the following CHEBI codes:

CHEBI:48775   : Cd2+
CHEBI:29108   : Ca2+
CHEBI:48828   : Co2+
CHEBI:49415   : Co3+
CHEBI:23378   : Cu cation
CHEBI:49552   : Cu+
CHEBI:29036   : Cu2+
CHEBI:60240   : divalent metal cation
CHEBI:190135  : di-μ-sulfido-diiron
CHEBI:24875   : iron cation
CHEBI:29033   : Fe2+
CHEBI:29034   : Fe3+
CHEBI:30408   : iron-sulfur cluster
CHEBI:49713   : Li+
CHEBI:18420   : Mg2+
CHEBI:29035   : Mn2+
CHEBI:16793   : Hg2+
CHEBI:49786   : Ni2+
CHEBI:60400   : nickel-iron-sulfur cluster
CHEBI:47739   : NiFe4S4 cluster
CHEBI:29103   : K+
CHEBI:29101   : Na+
CHEBI:49883   : tetra-μ3-sulfido-tetrairon
CHEBI:21137   : tri-μ-sulfido-μ3-sulfido-triiron
CHEBI:29105   : Zn2+
CHEBI:177874  : NiFe4S5 cluster
CHEBI:21143   : Fe8S7 iron-sulfur cluster
CHEBI:60504   : iron-sulfur-iron cofactor
CHEBI:25213   : metal cation

dmpmetal's People

Contributors

danbuchan avatar

Watchers

 avatar Shaun Kandathil avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.