
pconpy's Introduction

PConPy

Overview

[Figures: 1ubq CA-CA distance map; 1ubq CA-CA contact map]

This is the official repository for the redevelopment of PConPy. The original (now obsolete) source code associated with the article is accessible from the legacy branch of this repository.

About

A contact map is a 2D representation of protein structure that illustrates the presence or absence of contacts between individual amino acids. This enables the rapid visual exploratory data analysis of structural features without 3D rendering software. Additionally, the underlying 2D matrix of a contact map, known as a contact matrix, can naturally be used as numeric input in subsequent automated knowledge discovery or machine learning tasks. PConPy generates publication-quality renderings of contact maps, and the related distance- and hydrogen bond maps.
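
To make the relationship between a distance map and a contact matrix concrete, here is a minimal NumPy sketch. It is not PConPy's own code: the function name `contact_matrix`, the toy coordinates, and the strict `<` comparison against the threshold are all assumptions made for illustration.

```python
import numpy as np

def contact_matrix(coords, threshold=8.0):
    """Return (distance matrix, boolean contact matrix) for an (N, 3) array.

    Hypothetical helper for illustration; not part of PConPy.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise coordinate differences via broadcasting: shape (N, N, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    # Euclidean distance between every pair of points: shape (N, N).
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumption: two residues are "in contact" if strictly closer than threshold.
    return dist, dist < threshold

# Three made-up points standing in for CA atom positions (in angstroms).
toy_coords = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 20.0]]
dist, contacts = contact_matrix(toy_coords, threshold=8.0)
```

With these toy points, the first two are 5 Å apart and count as a contact at an 8 Å threshold, while the third is too far from either. The diagonal is trivially in contact with itself under this definition.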

Publication

Please consider citing our paper if you find PConPy useful in your research:

  • H. K. Ho, M. Kuiper and K. Ramamohanarao, "PConPy—a Python module for generating 2D protein maps", Bioinformatics, vol. 24, no. 24, pp. 2934-2935, 2008. [article]

Installation

Dependencies

PConPy was developed with Python 2.7 and the following libraries:

  • NumPy
  • BioPython
  • Matplotlib
  • docopt

These can be installed via apt-get on Ubuntu:

sudo apt-get install python-numpy python-biopython python-matplotlib python-docopt

or via the Anaconda Python Distribution:

conda install numpy biopython matplotlib pip
pip install docopt

DSSP

PConPy uses the DSSP secondary structure assignment program to obtain inter-residue hydrogen bond information. The DSSP executable must be installed on your system path and renamed to dssp. It can be downloaded from:

  • ftp://ftp.cmbi.ru.nl/pub/software/dssp/

Example usage

Generate a PDF contact map using the CA-CA distance measure:

python ./pconpy/pconpy.py cmap 8.0 --pdb ./tests/pdb_files/1ubq.pdb \
          --chains A --output 1ubqA_cmap.pdf --measure CA 

Generate a PNG distance map using the minimum van der Waals (VDW) distance measure:

python ./pconpy/pconpy.py dmap --pdb ./tests/pdb_files/3erd.pdb \
          --chains B,C --output 3erdBC_dmap.png --measure minvdw

Generate a plain-text hydrogen bond matrix:

python ./pconpy/pconpy.py hbmap --pdb ./tests/pdb_files/1ubq.pdb \
          --chains A --plaintext --output 1ubq.txt

Who's using PConPy?

  • B. Konopka, M. Ciombor, M. Kurczynska and M. Kotulska, "Automated Procedure for Contact-Map-Based Protein Structure Reconstruction", The Journal of Membrane Biology, vol. 247, no. 5, pp. 409-420, 2014. [article]

  • A. Stivala, A. Wirth and P. Stuckey, Tableau-based protein substructure search using quadratic programming, BMC Bioinformatics, vol. 10, no. 1, p. 153, 2009. [article]

Useful links

  • Peter Cock's (@peterjc) tutorial covers the basics of PDB file parsing and visualisation using the powerful Biopython library.

Contributors

See CONTRIBUTORS.md

pconpy's Issues

default to symmetric plots

Users currently need to specify the --symmetric option on the command line to mirror the upper triangle of the plot onto the lower triangle.

solution

  • change the --symmetric option to --asymmetric so that plots are symmetric by default.
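
For reference, mirroring an upper-triangle matrix onto the lower triangle can be sketched in NumPy as follows. The helper name `mirror_upper` is hypothetical, and this is one possible approach rather than how pconpy.py implements it.

```python
import numpy as np

def mirror_upper(mat):
    """Return a symmetric copy of mat built from its upper triangle.

    Hypothetical helper for illustration; not part of PConPy.
    """
    mat = np.asarray(mat)
    # Keep the diagonal and everything above it; zero out the lower triangle.
    upper = np.triu(mat)
    # Transpose the strictly-upper part (k=1 excludes the diagonal) so it
    # lands in the lower triangle, then add it in.
    return upper + np.triu(mat, k=1).T

toy = [[1, 2, 3],
       [0, 4, 5],
       [0, 0, 6]]
mirrored = mirror_upper(toy)
```

Whatever values were previously stored below the diagonal are discarded, so the result depends only on the upper triangle.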

fix sccmass and cmass calculation

Tests reveal that both are computed incorrectly: the weighted sum needs to be computed over axis=0, e.g.

weighted_sum = numpy.array(sidechain_atom_coords).sum(axis=0)
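
As a minimal sketch of why the axis matters, assume the coordinates are stored as an (n_atoms, 3) array and the masses as a length-n_atoms vector. Both layouts, the function name `centre_of_mass`, and the toy values are assumptions for illustration and are not taken from pconpy.py.

```python
import numpy as np

def centre_of_mass(coords, masses):
    """Mass-weighted mean position of a set of atoms.

    Hypothetical helper for illustration; not part of PConPy.
    """
    coords = np.asarray(coords, dtype=float)   # shape (n_atoms, 3)
    masses = np.asarray(masses, dtype=float)   # shape (n_atoms,)
    # Scale each atom's (x, y, z) row by its mass, then sum down the atom
    # axis (axis=0) so that one x, one y and one z component remain.
    weighted_sum = (coords * masses[:, None]).sum(axis=0)   # shape (3,)
    return weighted_sum / masses.sum()

# Two made-up atoms on the x axis with masses 1 and 3.
toy_coords = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
toy_masses = [1.0, 3.0]
com = centre_of_mass(toy_coords, toy_masses)
```

Summing over axis=0 collapses the atoms and leaves a 3-vector. Summing over the other axis would instead add x, y and z together for each atom, which is not a position.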

DSSP not found

ftp://ftp.cmbi.ru.nl/pub/software/dssp/ cannot be found.

change --plaintext output format to edge tuples

The --plaintext option currently outputs the matrices as dense adjacency vectors, which is needlessly verbose. Change the output format to graph-style edge tuples of the form (res index 1, res index 2, distance), omitting tuples for non-contacts.
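
One way the proposed edge-tuple format could look is sketched below. The function name `to_edge_tuples`, the zero-based residue indices, the tab-separated output, and the choice to emit each pair only once (upper triangle) are assumptions; the issue itself only specifies the tuple contents.

```python
def to_edge_tuples(dist_matrix, threshold):
    """Convert a dense symmetric distance matrix to a list of contact edges.

    Hypothetical helper for illustration; not part of PConPy.
    """
    edges = []
    n = len(dist_matrix)
    for i in range(n):
        # Start at i + 1 to skip the diagonal and avoid duplicate (j, i) pairs.
        for j in range(i + 1, n):
            d = dist_matrix[i][j]
            if d < threshold:
                edges.append((i, j, d))
    return edges

# Made-up 3x3 distance matrix (angstroms).
toy = [[0.0, 5.0, 20.0],
       [5.0, 0.0, 7.5],
       [20.0, 7.5, 0.0]]
edges = to_edge_tuples(toy, threshold=8.0)

# One edge per line: index 1, index 2, distance.
for i, j, d in edges:
    print(f"{i}\t{j}\t{d:.2f}")
```

For a sparse contact matrix this output grows with the number of contacts rather than with the square of the chain length.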

standalone pconpy web-server

A standalone pconpy web-server would allow lab members to use pconpy without the need for the command-line.

Possible solution

This could be implemented in Flask.

hydrogen-bond maps

Add hydrogen-bond map (or HB-plot, http://en.wikipedia.org/wiki/Protein_contact_map#HB_Plot) functionality.

This requires parsing of the inter-residue hydrogen-bond interaction columns from DSSP output, a feature which isn't currently supported by biopython.

I have implemented this feature and made a pull-request to the official biopython repository (biopython/biopython#464). The feature has been informally accepted but not yet merged.

possible solution

Copy the required code from my biopython contribution into pconpy.py. Then remove it once the pull-request has been formally merged.

Some Fixes

For other users:

  1. The DSSP executable is not available from the FTP server given above. I was able to install it with conda install -c salilab dssp; the resulting command is mkdssp instead of dssp.
  2. from Bio._py3k import StringIO should be removed under Python 3. Without StringIO, the output can be read as a str instead of a file.
  3. Without chain_ids defined, residues = get_residues(opts["--pdb"], chain_ids=chain_ids) will throw an error. Just add chain_ids=None.
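
Since the executable may be installed under either name, a lookup that tries both could avoid the rename step. The following is a sketch using only the standard library; the function name `find_dssp` and the order of the candidate names are assumptions, and pconpy.py does not necessarily locate DSSP this way.

```python
import shutil

def find_dssp(candidates=("dssp", "mkdssp")):
    """Return the full path of the first candidate found on PATH, else None.

    Hypothetical helper for illustration; not part of PConPy.
    """
    for name in candidates:
        # shutil.which returns None when the executable is not on PATH.
        path = shutil.which(name)
        if path is not None:
            return path
    return None

dssp_path = find_dssp()
if dssp_path is None:
    print("No DSSP executable found on PATH (tried dssp and mkdssp).")
else:
    print(f"Using DSSP executable at {dssp_path}")
```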

Features and labels

Hello, I have two questions that I hope you can answer.

1. First, the role of the sequence alignment is to extract chunks of subsequences that represent the first sequence.

2. These alignments are then used to compute a covariance matrix, which measures the correlations between the alignments.

3. From what I understand, a protein contact map describes the distance matrix as labels. For example, if the distance between the first amino acid in the first chain and the first amino acid in the second chain is 200 Å and we set a threshold of 8 Å, then the contact map entry for this pair will be "not in contact", i.e. False, or 0 in binary. Am I right in this understanding?

My Questions
First
1. What is the role of the covariance matrix?
2. What is the role of the protein contact map? Is it the set of labels for the distance matrix? If so, what is the role of the covariance matrix?
3. What is the input to the neural network model?
A. What are the features? Are they the distance matrix? If so, what is the role of the covariance matrix?
B. What are the labels for these features? Is the protein contact map the set of labels (0s and 1s)?

Second
1. Could you kindly give me a hint or the steps, i.e. which script to use first, which second, and so on? I would like to cite your paper, as I have been inspired by your great work.

Thanks in advance.

Inefficient rendering of PDF plots.

Rendering PDF plots of large proteins takes a long time.
E.g. rendering 1mtp.

possible solutions

  • Is there a matplotlib trick to make this more efficient?
  • Issue a warning to the user.

BUG: the default inter-residue distance measure fails to be set

Note that opts["--measure"] is set to None by default at runtime; it should be "CA" instead.

$ ./pconpy.py cmap 8.0 -p ../tests/pdb_files/1ubq.pdb -o ./cmap.txt --plaintext -D
{'--dpi': '80',
 '--font-family': 'sans',
 '--font-size': '10',
 '--greyscale': False,
 '--height-inches': '6.0',
 '--mask-thresh': None,
 '--measure': None,
 '--no-colorbar': False,
 '--output': './cmap.txt',
 '--pdb': '../tests/pdb_files/1ubq.pdb',
 '--plaintext': True,
 '--show-frame': False,
 '--symmetric': False,
 '--title': None,
 '--transparent': False,
 '--width-inches': '6.0',
 '--xlabel': 'Residue index',
 '--ylabel': 'Residue index',
 '-D': True,
 '-c': None,
 '<dist>': 8.0,
 'cmap': True,
 'dmap': False,
 'hbmap': False}
Traceback (most recent call last):
  File "./pconpy.py", line 553, in <module>
    symmetric=opts["--symmetric"])
  File "./pconpy.py", line 467, in calc_dist_matrix
    dist = calc_distance(res_a, res_b, measure)
  File "./pconpy.py", line 433, in calc_distance
    raise NotImplementedError
NotImplementedError
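
A defensive fallback for the missing default could look like the sketch below. The helper name `resolve_measure` and the constant `DEFAULT_MEASURE` are hypothetical and do not come from pconpy.py; the options dictionary mimics the shape of the one printed above.

```python
DEFAULT_MEASURE = "CA"  # assumed default, as stated in this issue

def resolve_measure(opts, default=DEFAULT_MEASURE):
    """Return the requested distance measure, falling back to the default.

    Hypothetical helper for illustration; not part of PConPy.
    """
    measure = opts.get("--measure")
    # The parser reports an omitted option as None, so substitute the default.
    return default if measure is None else measure

# Options dictionaries shaped like the one in the debug output above.
print(resolve_measure({"--measure": None}))       # falls back to the default
print(resolve_measure({"--measure": "minvdw"}))   # explicit choice is kept
```

Alternatively, docopt can supply the default itself if the option's description in the usage docstring carries a `[default: CA]` annotation, which would make a fallback like this unnecessary.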
