pybel / pybel-tools Goto Github PK

View Code? Open in Web Editor NEW

5.0 6.0 5.0 3.33 MB

A PyBEL extension for analyzing BEL graphs

Home Page: http://pybel-tools.readthedocs.io/

License: MIT License

Python 28.90% Jupyter Notebook 67.57% HTML 3.50% JavaScript 0.04%

systems-biology biological-expression-language networks bioinformatics

pybel-tools's Introduction

PyBEL

PyBEL is a pure Python package for parsing and handling biological networks encoded in the Biological Expression Language (BEL).

It facilitates data interchange between data formats like NetworkX, Node-Link JSON, JGIF, CSV, SIF, Cytoscape, CX, INDRA, and GraphDati; database systems like SQL and Neo4J; and web services like NDEx, BioDati Studio, and BEL Commons. It also provides exports for analytical tools like HiPathia, Drug2ways and SPIA; machine learning tools like PyKEEN and OpenBioLink; and others.

Its companion package, PyBEL Tools, contains a suite of functions and pipelines for analyzing the resulting biological networks.

We realize that we have a name conflict with the python wrapper for the cheminformatics package, OpenBabel. If you're looking for their python wrapper, see here.

Citation

If you find PyBEL useful for your work, please consider citing:

[1]	Hoyt, C. T., et al. (2017). PyBEL: a Computational Framework for Biological Expression Language. Bioinformatics, 34(December), 1–2.

Installation

PyBEL can be installed easily from PyPI with the following code in your favorite shell:

$ pip install pybel

or from the latest code on GitHub with:

$ pip install git+https://github.com/pybel/pybel.git

See the installation documentation for more advanced instructions. Also, check the change log at CHANGELOG.rst.

Getting Started

More examples can be found in the documentation and in the PyBEL Notebooks repository.

Compiling and Saving a BEL Graph

This example illustrates how the a BEL document from the Human Brain Pharmacome project can be loaded and compiled directly from GitHub.

>>> import pybel
>>> url = 'https://raw.githubusercontent.com/pharmacome/conib/master/hbp_knowledge/proteostasis/kim2013.bel'
>>> graph = pybel.from_bel_script_url(url)

Other functions for loading BEL content from many formats can be found in the I/O documentation. Note that PyBEL can handle BEL 1.0 and BEL 2.0+ simultaneously.

After you have a BEL graph, there are numerous ways to save it. The pybel.dump function knows how to output it in many formats based on the file extension you give. For all of the possibilities, check the I/O documentation.

>>> import pybel
>>> graph = ...
>>> # write as BEL
>>> pybel.dump(graph, 'my_graph.bel')
>>> # write as Node-Link JSON for network viewers like D3
>>> pybel.dump(graph, 'my_graph.bel.nodelink.json')
>>> # write as GraphDati JSON for BioDati
>>> pybel.dump(graph, 'my_graph.bel.graphdati.json')
>>> # write as CX JSON for NDEx
>>> pybel.dump(graph, 'my_graph.bel.cx.json')
>>> # write as INDRA JSON for INDRA
>>> pybel.dump(graph, 'my_graph.indra.json')

Summarizing the Contents of the Graph

The BELGraph object has several "dispatches" which are properties that organize its various functionalities. One is the BELGraph.summarize dispatch, which allows for printing summaries to the console.

These examples will use the RAS Model from EMMAA, so you'll have to be sure to pip install indra first. The graph can be acquired and summarized with BELGraph.summarize.statistics() as in:

>>> import pybel
>>> graph = pybel.from_emmaa('rasmodel', date='2020-05-29-17-31-58')  # Needs
>>> graph.summarize.statistics()
---------------------  -------------------
Name                   rasmodel
Version                2020-05-29-17-31-58
Number of Nodes        126
Number of Namespaces   5
Number of Edges        206
Number of Annotations  4
Number of Citations    1
Number of Authors      0
Network Density        1.31E-02
Number of Components   1
Number of Warnings     0
---------------------  -------------------

The number of nodes of each type can be summarized with BELGraph.summarize.nodes() as in:

>>> graph.summarize.nodes(examples=False)
Type (3)        Count
------------  -------
Protein            97
Complex            27
Abundance           2

The number of nodes with each namespace can be summarized with BELGraph.summarize.namespaces() as in:

>>> graph.summarize.namespaces(examples=False)
Namespace (4)      Count
---------------  -------
HGNC                  94
FPLX                   3
CHEBI                  1
TEXT                   1

The edges can be summarized with BELGraph.summarize.edges() as in:

>>> graph.summarize.edges(examples=False)
Edge Type (12)                       Count
---------------------------------  -------
Protein increases Protein               64
Protein hasVariant Protein              48
Protein partOf Complex                  47
Complex increases Protein               20
Protein decreases Protein                9
Complex directlyIncreases Protein        8
Protein increases Complex                3
Abundance partOf Complex                 3
Protein increases Abundance              1
Complex partOf Complex                   1
Protein decreases Abundance              1
Abundance decreases Protein              1

Grounding the Graph

Not all BEL graphs contain both the name and identifier for each entity. Some even use non-standard prefixes (also called namespaces in BEL). Usually, BEL graphs are validated against controlled vocabularies, so the following demo shows how to add the corresponding identifiers to all nodes.

from urllib.request import urlretrieve

url = 'https://github.com/cthoyt/selventa-knowledge/blob/master/selventa_knowledge/large_corpus.bel.nodelink.json.gz'
urlretrieve(url, 'large_corpus.bel.nodelink.json.gz')

import pybel
graph = pybel.load('large_corpus.bel.nodelink.json.gz')

import pybel.grounding
grounded_graph = pybel.grounding.ground(graph)

Note: you have to install pyobo for this to work and be running Python 3.7+.

Displaying a BEL Graph in Jupyter

After installing jinja2 and ipython, BEL graphs can be displayed in Jupyter notebooks.

>>> from pybel.examples import sialic_acid_graph
>>> from pybel.io.jupyter import to_jupyter
>>> to_jupyter(sialic_acid_graph)

Using the Parser

If you don't want to use the pybel.BELGraph data structure and just want to turn BEL statements into JSON for your own purposes, you can directly use the pybel.parse() function.

>>> import pybel
>>> pybel.parse('p(hgnc:4617 ! GSK3B) regulates p(hgnc:6893 ! MAPT)')
{'source': {'function': 'Protein', 'concept': {'namespace': 'hgnc', 'identifier': '4617', 'name': 'GSK3B'}}, 'relation': 'regulates', 'target': {'function': 'Protein', 'concept': {'namespace': 'hgnc', 'identifier': '6893', 'name': 'MAPT'}}}

This functionality can also be exposed through a Flask-based web application with python -m pybel.apps.parser after installing flask with pip install flask. Note that the first run requires about a ~2 second delay to generate the parser, after which each parse is very fast.

Using the CLI

PyBEL also installs a command line interface with the command pybel for simple utilities such as data conversion. In this example, a BEL document is compiled then exported to GraphML for viewing in Cytoscape.

$ pybel compile ~/Desktop/example.bel
$ pybel serialize ~/Desktop/example.bel --graphml ~/Desktop/example.graphml

In Cytoscape, open with Import > Network > From File.

Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.rst for more information on getting involved.

Acknowledgements

Support

The development of PyBEL has been supported by several projects/organizations (in alphabetical order):

Funding

DARPA Young Faculty Award W911NF2010255 (PI: Benjamin M. Gyori).
The European Union, European Federation of Pharmaceutical Industries and Associations (EFPIA), and Innovative Medicines Initiative Joint Undertaking under AETIONOMY [grant number 115568], resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies in kind contribution.

Logo

The PyBEL logo was designed by Scott Colby.

pybel-tools's People

Stargazers

Watchers

Forkers

andersx johnbachman bgyori covid-19-causal-reasoning flalix

pybel-tools's Issues

Allowed function in OWL Ontology

Waiting on Reagon for a standard format for adding allowed function usage for each term in an ontology. Have the parser pick these out and add to the [Values] section of the BEL namespace

Testing of mutation functions

Switch service to use flask bootstrap and flask wtf

Induce subgraph around single node

For Reagon's NPA variant, all of the upstream controllers of a given node are grabbed, then possibly recurring on their upstream controllers as well

Collapse based on orthology

Make a function that collapses nodes based on their orthology connections

def collapse_by_orthology(graph, priority_list=None):
   """Collapses a graph based on the orthology between nodes"""
   priority_list = ['HGNC', 'MGI', 'RGD'] if priority_list is none else priority_list
   ...

Produce graph permutations by rewiring edges

Give certain probability. Check the waltz-strogatz algorithm for small world network generation

Citation processing

Acquisition

Get all PubMed ID's in a graph
Look up the information for the given PMID. Store in a dictionary so multiple lookup isn't necessary
Stick in the network

Storage

Export data to pybel.manager.models.Citation table for long term storage
Provide inverse functions to relationalize/unrelationalize the citations in a BEL Graph

Summary

Get all authors across all the citations in a given BEL Graph
Get all authors, grouped by a given annotation (usually subgraph) in a given BEL Graph

See: pybel/pybel#60

Again, from aetionomy.pybel2 line 62:

def get_pubmedInfos_by_pmidList(self,pmids):
    """
    fetch from NCBI all publication information.
    @param pmids: PMID identifiers
    @type pmids: list
    @type pmids; tuple
    @return: dict with PMID as key and value => dict with keys: authors,title,pubdate,lastauthor,journal,volume,issue,pages,firstauthor,pmcId
    """
    pmids=list(set(list(pmids)))
    resultDict={}
    n = 200
    url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=%s&retmode=json"
    for pmidList in [','.join(pmids[i:i+n]) for i in xrange(0, len(pmids), n)]:
        pmidsJson = json.loads(urllib2.urlopen(url % pmidList).read())
        for pmid in pmidsJson['result']['uids']:
            p=pmidsJson['result'][pmid]
            if 'error' not in p:
                authors = ', '.join([x['name'] for x in p['authors']]) if 'authors' in p else None 
                pubdate = None
                if re.search('^[12][0-9]{3} [a-zA-Z]{3} \d{1,2}$',p['pubdate']):
                    pubdate = datetime.datetime.strptime(p['pubdate'], '%Y %b %d').strftime('%Y-%m-%d')
                elif re.search('^[12][0-9]{3} [a-zA-Z]{3}$',p['pubdate']):
                    pubdate = datetime.datetime.strptime(p['pubdate'], '%Y %b').strftime('%Y-%m-01')
                elif re.search('^[12][0-9]{3}$',p['pubdate']):
                    pubdate = p['pubdate']+"-01-01"
                elif re.search('^[12][0-9]{3} [a-zA-Z]{3}-[a-zA-Z]{3}$',p['pubdate']):
                    pubdate = datetime.datetime.strptime(p['pubdate'][:-4], '%Y %b').strftime('%Y-%m-01')
                else:
                    print pubdate
                pmcIds = [x['value'] for x in p['articleids'] if x['idtype']=='pmc']
                pmcId = pmcIds[0] if pmcIds else None  
                resultDict[pmid]={
                    'authors':authors,
                    'title':p['title'],
                    'pubdate':pubdate,
                    'lastauthor':p['lastauthor'],
                    'journal':p['fulljournalname'],
                    'volume':p['volume'],
                    'issue':p['issue'],
                    'pages':p['pages'],
                    'firstauthor':p['sortfirstauthor'],
                    'pmcId':pmcId}
            else:
                print "probelms with following id:%s\nhttp://www.ncbi.nlm.nih.gov/pubmed/%s" % (pmid,pmid)
        time.sleep(1)
    return resultDict

see also: pybel_tools.boilerplate line 154 for grabbing information about abstracts

Induce on shortest paths between list of nodes

Utility to merge namespaces

Allow list of definitions by file paths and urls

Merge DatabaseService with CacheManager

Since the functions in the database service are just simple wrappers around queries in the graph cache manager, would it make sense to just stick them in it? Or should we keep the cache manager just for functions of filling up and retrieving the cache, while query systems can wrap it and just use its connection?

Merge all node variants by value

Make a function that merge all node variants to its original protein/gene so we can infer all the neighborhood around that original node.

Validation checks for nodes and edges

Validate if:

enzyme activity is described by the IUBMB EC classification system (http://www.chem.qmul.ac.uk/iubmb/)
a dbSNP RS identifier is correctly linked to a gene
substrates and products in a (bio-)chemical reaction are correct
Please write comments if you have more suggestions

This issue has been migrated from pybel/pybel#65

From Reagon:

Find the statements that don't have a disease state (normal, AD, etc) to get flagged.

This could either be done on compile time, or the line number for each statement could be included in the graph.

This has been migrated from pybel/pybel#17.

Download orthology data

Build a simple function that download all orthology data from HGNC's dump:

def download_orthology_dict():
   """Downloads orthology data from HGNC's data service

   :rtype: dict
   """
   pass

Reimplement NeuroMMSig Topological Analysis

Filters for subgraph overlap

Chemicals shouldn't be counted, neither should pathologies, or pathologies

Maybe just pick genes after collapsing to genes?

Testing of summary functions

Allow submission of gene expression data directly to networks via posting

Implement Context-menu for nodes (e.g., expand neighborhood )

Collapse nodes utility

Download and merge orthology data

def integrate_orthology(graph, orthology_dict=None):
   """Integrates the information from the HGNC orthology data dump. 
  
   :param graph: A BEL Graph
   :type graph: BELGraph
   :param orthology_dict: A dictionary of parsed orthology data from HGNC's data service If :code:`None`, downloads and parses the data. 
   :type orthology_dict: dict
   """
   pass

Summary functions for molecular activities, translocations, and degradation

Check exporting functionalities in Flask dict_service.py

Graphml, csv, graph not sure whether they are working. Check with @cthoyt

Pygments Lexer for BEL

Make docs folder and get sphinx set up

Color coding of nodes in jupyter viewer

encode color map in python and have it format it into the javascript
new color map
need legend

PyBEL Flask-Restless API

Reference sankey.js and inspire-tree.js from CDN

Left Network Merge

Make an asymmetric network merge merge(A, B) where the data from network A's nodes take preference over the data from network B

Use case:
Build network to represent Gene - SNP connections, but don't add any node dicts. I just want to add the edges, but not mess with the nodes in another graph.

Better MVC for /explorer/ endpoint

We need to be able to give GET parameters to the /explorer/ endpoint and have those passed along to the ajax calls update the network. The ajax calls can conversely update the window's path to match any changes made without needing a hard reload.

Alternatively, the entire app could be model/view controller, like @cebel wanted in the first place.

With this all in place, it becomes trivial to implement the subgraph seeding

Fix expand neighborhood function

This needs a thorough testing suite with example graphs. Also test that node data gets updated properly (not overwritten?)

Filter by annotation value

It would be great if the next step in the visualization system is to have a way to filter by annotations.

The obvious use case would be to filter by a specific subgraph, defined in the AD and PD assemblies.

Additionally, it might be nice to have a keyword argument to pre-specify which annotation to filter by, so the whole graph can be loaded in javascript, but only a specific slice would be shown initially

Seed subgraph by list of pubmed id's/authors

Get all edges from papers with either the given pubmed id's or authors, then expand on the first neighbors of these networks

Component Connection Algorithm

For any two subgraphs and their parent graph, find the shortest path between the components. Do this with directed paths, and undirected path transformations.

Better documentation of flask cli

Write information about the default host and port in the cli help message

BEL to Natural Language

This would be useful for the pybel-recuration interface.

It might also be possible to use/adapt the INDRA english assembler

Overlay tabular data on nodes

Code skeleton:

def overlay_data(graph, data, label):
    """
    :type graph: :class:`pybel.BELGraph`
    :param data: a dictionary of {pybel node: data for that node}
    :type data: dict
    :param label:
    :type label: str
    """
    pass

Use Cases:

I have differential gene expression data that I want to add as an attribute to all of my RNA nodes so I can do RCR or NPA later.
I have a list of functional consequences for all of my SNPs and want to include this information in my network

Complete origin on miRNA

Most miRNA can be written with g(HGNC:YFG) transcribedTo m(HGNC:YFG) so the inference of the central dogma should also make these edges.

I'm also curious about miRNA's themselves, since there's a differentiation between a "premature" and a "mature" sequence. Does the premature sequence count as an RNA and not an miRNA?

Thoughts @cebel, @dexterpratt?

Resources

Full swagger.json: https://api.stoplight.io/v1/versions/2PcWLTYmcRWXSZ8a7/export/oas.json
Nanopub Schema: https://openbel.api-docs.io/1.1.0/models/nanopub
Connexion: https://connexion.readthedocs.io/en/latest/routing.html

.Gene {
fill: #FFBB78;
}

.miRNA {
fill: #D62728;
}

.Protein {
fill: #1F77B4;
}

.RNA {
fill: #FF9896;
}

.BiologicalProcess {
fill: #2CA02C;
}

.Pathology {
fill: #FF7F0E;
}

.Complex {
fill: #98DF8A;
}

.Composite {
fill: #9467BD;
}

// ADD MORE HERE