anergictcell / pyhpo



A Python library to work with, analyze, filter and inspect the Human Phenotype Ontology

License: MIT License

Makefile 0.35% Python 99.65%

bioinformatics hpo hpo-similarity ontology

pyhpo's Introduction

PyHPO

A Python library to work with, analyze, filter and inspect the Human Phenotype Ontology

Visit the PyHPO Documentation for a more detailed overview of all the functionality.

Main features

👫 Identify patient cohorts based on clinical features
👨‍👧‍👦 Cluster patients or other clinical information for GWAS
🩻→🧬 Phenotype to Genotype studies
🍎🍊 HPO similarity analysis
🕸️ Graph based analysis of phenotypes, genes and diseases

PyHPO allows working on individual terms HPOTerm, a set of terms HPOSet and the full Ontology.

The library is helpful for discovering novel gene-disease associations and for GWAS data analysis studies. It can also be used to organize the clinical information of patients in research or diagnostic settings.

Internally, the ontology is represented as a branched linked list: every term contains pointers to its parent and child terms, which allows fast tree traversal.
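To illustrate the idea of terms that point to their parents and children, here is a minimal sketch. The `Term` class is hypothetical and is not pyhpo's `HPOTerm`. It links nodes in both directions and answers "is A a direct or indirect child of B" by following parent pointers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so terms can live in sets
class Term:
    """Toy ontology node (hypothetical, not pyhpo's HPOTerm)."""

    hpo_id: str
    name: str
    parents: set[Term] = field(default_factory=set, repr=False)
    children: set[Term] = field(default_factory=set, repr=False)

    def add_parent(self, parent: Term) -> None:
        # Store the link in both directions so the graph can be walked up and down
        self.parents.add(parent)
        parent.children.add(self)

    def all_ancestors(self) -> set[Term]:
        # Follow parent pointers until no new terms are found
        found: set[Term] = set()
        stack = list(self.parents)
        while stack:
            term = stack.pop()
            if term not in found:
                found.add(term)
                stack.extend(term.parents)
        return found

    def child_of(self, other: Term) -> bool:
        # True for direct and indirect children
        return other in self.all_ancestors()


# Chain taken from the path example on this page
axial = Term("HP:0009121", "Abnormal axial skeleton morphology")
column = Term("HP:0000925", "Abnormality of the vertebral column")
curvature = Term("HP:0010674", "Abnormality of the curvature of the vertebral column")
scoliosis = Term("HP:0002650", "Scoliosis")
column.add_parent(axial)
curvature.add_parent(column)
scoliosis.add_parent(curvature)

print(scoliosis.child_of(axial))  # True
print(axial.child_of(scoliosis))  # False
```

Because every node keeps direct references to its neighbours, a traversal only ever touches the terms on the way up or down. It never has to scan the whole ontology.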

It provides an interface to create pandas DataFrames from its data, allowing integration into existing data analysis tools.

Hint

Check out hpo3 (Documentation) for an alternative implementation. hpo3 has the exact same functionality, but is much faster 🚀 and supports multithreading for even faster large data processing.

Getting started

The easiest way to install PyHPO is via pip:

pip install pyhpo

This will install a base version of PyHPO that offers most functionality.

Note

Some features of PyHPO require pandas and scipy. The standard installation via pip will not include pandas or scipy and PyHPO will work just fine. (You will get a warning on the initial import though).

Without installing pandas, you won't be able to export the Ontology as a Dataframe, everything else will work fine.

Without installing scipy, you won't be able to use the stats module, especially the enrichment calculations.

If you want to do enrichment analysis, you must also install scipy.

pip install 'pyhpo[scipy]'

If you want to work with PyHPO using pandas DataFrames, install the pandas dependency:

pip install 'pyhpo[pandas]'

Or simply install both together:

# Include all dependencies
pip install 'pyhpo[all]'
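The DataFrame export needs pandas and the stats module needs scipy. You can check which of them are installed using only the standard library. This small sketch does not involve pyhpo, and it is not how pyhpo performs its own import-time check.

```python
from __future__ import annotations

import importlib.util


def optional_features() -> dict[str, bool]:
    # find_spec() returns None for a missing top-level package;
    # the packages themselves are not imported here
    return {
        "pandas": importlib.util.find_spec("pandas") is not None,
        "scipy": importlib.util.find_spec("scipy") is not None,
    }


for package, installed in optional_features().items():
    print(f"{package}: {'installed' if installed else 'missing'}")
```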

Usage example

Basic use cases

Some examples for basic functionality of PyHPO

How similar are the phenotypes of two patients

from pyhpo import Ontology
from pyhpo.set import HPOSet

# initialize the Ontology
_ = Ontology()

# Declare the clinical information of the patients
patient_1 = HPOSet.from_queries([
    'HP:0002943',
    'HP:0008458',
    'HP:0100884',
    'HP:0002944',
    'HP:0002751'
])

patient_2 = HPOSet.from_queries([
    'HP:0002650',
    'HP:0010674',
    'HP:0000925',
    'HP:0009121'
])

# and compare their similarity
patient_1.similarity(patient_2)
#> 0.7594183905785477
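To compare two patients, `HPOSet.similarity` has to combine many term-to-term scores into a single number. One widely used strategy is often called funSimAvg. It takes the best match of each term in one set against the other set, averages those, does the same in the opposite direction, and then averages the two results. The sketch below uses made-up scores. This page does not state which combination pyhpo applies by default.

```python
from __future__ import annotations

from statistics import mean


def fun_sim_avg(matrix: list[list[float]]) -> float:
    """Combine a term-by-term similarity matrix into one set score.

    Rows are the terms of set A, columns the terms of set B.
    """
    # Best match in B for every term of A
    row_best = [max(row) for row in matrix]
    # Best match in A for every term of B
    col_best = [max(col) for col in zip(*matrix)]
    return (mean(row_best) + mean(col_best)) / 2


# Hypothetical pairwise scores: a 2-term set against a 3-term set
scores = [
    [0.9, 0.1, 0.4],
    [0.2, 0.6, 0.3],
]
print(fun_sim_avg(scores))
```

The two directions are averaged so that the result is symmetric: comparing A with B gives the same value as comparing B with A.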

How close are two HPO terms

from pyhpo import Ontology

# initialize the Ontology
_ = Ontology()

term_1 = Ontology.get_hpo_object('Scoliosis')
term_2 = Ontology.get_hpo_object('Abnormal axial skeleton morphology')

path = term_1.path_to_other(term_2)
for t in path[1]:
    print(t)

"""
HP:0002650 | Scoliosis
HP:0010674 | Abnormality of the curvature of the vertebral column
HP:0000925 | Abnormality of the vertebral column
HP:0009121 | Abnormal axial skeleton morphology
"""

HPOTerm

An HPOTerm contains various metadata about the term, as well as pointers to its parent and child terms. You can access its information content, calculate similarity scores to other terms, find the shortest or longest connection between two terms, and list all associated genes or diseases.
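Information content (IC) measures how specific a term is: the rarer its annotations, the higher the value. The sketch below uses the common frequency-based definition. It assumes natural logarithms and a simple count of annotated items against total items (diseases, genes, and so on). This page does not spell out the exact counting rules pyhpo uses.

```python
import math


def information_content(annotated: int, total: int) -> float:
    """IC = -ln(annotated / total), the usual frequency-based definition."""
    if not 0 < annotated <= total:
        raise ValueError("annotated must be between 1 and total")
    return -math.log(annotated / total)


# Hypothetical counts: a term linked to 10 of 100 diseases
print(round(information_content(10, 100), 3))  # 2.303
```

A term that is annotated to every item, such as the root, gets an IC of 0.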

Examples:

Basic functionalities of an HPO-Term

from pyhpo import Ontology

# initialize the Ontology
_ = Ontology()

# Retrieve a term, e.g. via its name
term = Ontology.get_hpo_object('Scoliosis')

print(term)
#> HP:0002650 | Scoliosis

# Get information content from Term <--> Omim associations
term.information_content['omim']
#> 2.39

# Show how many genes are associated to the term
# (This includes indirect associations, i.e. genes that are
# associated with any of the term's descendant terms.)
len(term.genes)
#> 947

# Show how many Omim Diseases are associated to the term
# (This includes indirect associations, i.e. diseases that are
# associated with any of the term's descendant terms.)
len(term.omim_diseases)
#> 730

# Get a list of all parent terms
for p in term.parents:
    print(p)
#> HP:0010674 | Abnormality of the curvature of the vertebral column

# Get a list of all children terms
for p in term.children:
    print(p)
"""
HP:0002943 | Thoracic scoliosis
HP:0008458 | Progressive congenital scoliosis
HP:0100884 | Compensatory scoliosis
HP:0002944 | Thoracolumbar scoliosis
HP:0002751 | Kyphoscoliosis
"""

(This script is complete, it should run "as is")

Some additional functionality, working with more than one term

from pyhpo import Ontology
_ = Ontology()
term = Ontology.get_hpo_object('Scoliosis')

# Let's get a second term, this time using its HPO-ID
term_2 = Ontology.get_hpo_object('HP:0009121')

print(term_2)
#> HP:0009121 | Abnormal axial skeleton morphology

# Check if Scoliosis is a direct or indirect child
# of Abnormal axial skeleton morphology

term.child_of(term_2)
#> True

# or vice versa
term_2.parent_of(term)
#> True

# Show all nodes between the two terms:
path = term.path_to_other(term_2)
for t in path[1]:
    print(t)

"""
HP:0002650 | Scoliosis
HP:0010674 | Abnormality of the curvature of the vertebral column
HP:0000925 | Abnormality of the vertebral column
HP:0009121 | Abnormal axial skeleton morphology
"""

print(f'Steps from Term 1 to Term 2: {path[0]}')
#> Steps from Term 1 to Term 2: 3


# Calculate the similarity between two terms
term.similarity_score(term_2)
#> 0.442

(This script is complete, it should run "as is")

Ontology

The Ontology contains all HPO terms, their connections to each other, and their associations with genes and diseases. It also provides helper functions for searching HPOTerms.
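The lookups below accept either an HPO-ID string or its integer index, where 2650 corresponds to HP:0002650. Here is a small sketch of how the two forms map onto each other. It assumes HPO-IDs always carry seven zero-padded digits, which holds for every ID on this page. `normalize_hpo_id` is a hypothetical helper and is not part of pyhpo.

```python
from __future__ import annotations

import re


def normalize_hpo_id(query: int | str) -> str:
    """Turn 2650 or 'HP:0002650' into the canonical 'HP:0002650' form (hypothetical helper)."""
    if isinstance(query, int):
        # Integer index -> seven zero-padded digits
        return f"HP:{query:07d}"
    match = re.fullmatch(r"HP:(\d{7})", query.strip())
    if match is None:
        raise ValueError(f"not an HPO-ID: {query!r}")
    return f"HP:{match.group(1)}"


print(normalize_hpo_id(2650))          # HP:0002650
print(normalize_hpo_id("HP:0002650"))  # HP:0002650
```

Looking a term up by name, as in `get_hpo_object('Scoliosis')`, needs a separate name-to-term mapping. That mapping is not covered by this helper.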

Examples

from pyhpo import Ontology, HPOSet

# initialize the Ontology (this must be done only once)
_ = Ontology()

# Get a term based on its name
term = Ontology.get_hpo_object('Scoliosis')
print(term)
#> HP:0002650 | Scoliosis

# ...or based on HPO-ID
term = Ontology.get_hpo_object('HP:0002650')
print(term)
#> HP:0002650 | Scoliosis

# ...or based on its index
term = Ontology.get_hpo_object(2650)
print(term)
#> HP:0002650 | Scoliosis

# shortcut to retrieve a term based on its index
term = Ontology[2650]
print(term)
#> HP:0002650 | Scoliosis

# Search for term
for term in Ontology.search('olios'):
    print(term)

"""
HP:0002211 | White forelock
HP:0002290 | Poliosis
HP:0002650 | Scoliosis
HP:0002751 | Kyphoscoliosis
HP:0002943 | Thoracic scoliosis
HP:0002944 | Thoracolumbar scoliosis
HP:0003423 | Thoracolumbar kyphoscoliosis
HP:0004619 | Lumbar kyphoscoliosis
HP:0004626 | Lumbar scoliosis
HP:0005659 | Thoracic kyphoscoliosis
HP:0008453 | Congenital kyphoscoliosis
HP:0008458 | Progressive congenital scoliosis
HP:0100884 | Compensatory scoliosis
"""

(This script is complete, it should run "as is")

The Ontology is a singleton and should only be initialized once. It can be reused across several modules, e.g.:

main.py

from pyhpo import Ontology

import module2

# initialize the Ontology once, before other modules use it
_ = Ontology()

if __name__ == '__main__':
    print(module2.find_term('Compensatory scoliosis'))

module2.py

from pyhpo import Ontology

def find_term(term):
    return Ontology.get_hpo_object(term)
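The general singleton idea can be sketched in plain Python: every instantiation hands back the same object, so expensive loading happens only once. `LazySingleton` is hypothetical, and pyhpo's actual mechanism may differ.

```python
class LazySingleton:
    """Hypothetical sketch: every instantiation returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            instance = super().__new__(cls)
            # Expensive set-up (e.g. parsing ontology files) would run only here, once
            instance.terms = {}
            cls._instance = instance
        return cls._instance


first = LazySingleton()
second = LazySingleton()
first.terms["HP:0002650"] = "Scoliosis"
print(second.terms)     # {'HP:0002650': 'Scoliosis'}
print(first is second)  # True
```

Python also caches imported modules in `sys.modules`. As a result, a module-level object such as `Ontology` is the same object in `main.py` and `module2.py`, which is why `module2` can use it without initializing it again.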

HPOSet

An HPOSet is a collection of HPOTerms and can be used to represent, e.g., a patient's clinical information. It provides APIs for filtering, comparisons with other HPOSets, and term/gene/disease enrichment.

Examples:

from pyhpo import Ontology, HPOSet

# initialize the Ontology
_ = Ontology()

# Create HPOSets, e.g. to represent
# the clinical information of a patient
# You can initiate an HPOSet using either
# - HPO-ID: 'HP:0002943'
# - HPO-Name: 'Scoliosis'
# - HPO-ID (int): 2943

ci_1 = HPOSet.from_queries([
    'HP:0002943',
    'HP:0008458',
    'HP:0100884',
    'HP:0002944',
    'HP:0002751'
])

ci_2 = HPOSet.from_queries([
    'HP:0002650',
    'HP:0010674',
    'HP:0000925',
    'HP:0009121'
])

# Compare the similarity
ci_1.similarity(ci_2)
#> 0.7593552670152157

# Keep only the leaf nodes: drop every term that is an ancestor of another term in the set
ci_leaf = ci_2.child_nodes()
len(ci_2)
#> 4
len(ci_leaf)
#> 1
ci_2
#> HPOSet.from_serialized("925+2650+9121+10674")
ci_leaf
#> HPOSet.from_serialized("2650")

# Check the information content of an HPOSet
ci_1.information_content()
"""
{
    'mean': 6.571224974009769,
    'total': 32.856124870048845,
    'max': 8.97979449089521,
    'all': [5.98406221734122, 8.286647310335265, 8.97979449089521, 5.5458072864100645, 4.059813565067086]
}
"""

(This script is complete, it should run "as is")

Get genes enriched in an `HPOSet`

Examples:

from pyhpo import Ontology, HPOSet
from pyhpo.stats import EnrichmentModel

# initialize the Ontology
_ = Ontology()

ci = HPOSet.from_queries([
    'HP:0002943',
    'HP:0008458',
    'HP:0100884',
    'HP:0002944',
    'HP:0002751'
])

gene_model = EnrichmentModel('gene')
genes = gene_model.enrichment(method='hypergeom', hposet=ci)

print(genes[0]['item'])
#> PAPSS2

(This script is complete, it should run "as is")

For a more detailed description of how to use PyHPO, visit the PyHPO Documentation.

Contributing

Yes, please do so. We appreciate any help, suggestions for improvement or other feedback. Just create a pull-request or open an issue.

License

PyHPO is released under the MIT license.

PyHPO is using the Human Phenotype Ontology. Find out more at http://www.human-phenotype-ontology.org

Sebastian Köhler, Leigh Carmody, Nicole Vasilevsky, Julius O B Jacobsen, et al. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Research. (2018) doi: 10.1093/nar/gky1105


pyhpo's Issues

Unexpected values using JC and JC2 for hpo similarity

Values between JC and JC2 are not similar in pyhpo, but are similar in hpo3. Also, the dataset I am using produces values that range from .1 to 134 in JC and -.1 to -9 in JC2. I've read that the JC score is supposed to be between 0 and 1, so these values wouldn't make sense. Are values outside of 0-1 expected?

I ran the code below using pyhpo versions 3.2.4, 3.2.5, 3.2.6 and hpo3 version 1.0.3.

I made the small example below to demonstrate the differences.

from pyhpo import HPOSet, Ontology

disease_name = "Neurodevelopmental disorder with central hypotonia and dysmorphic facies"
hpo_terms_to_compare = ['HP:0000218', 'HP:0000384', 'HP:0000842', 'HP:0001212', 'HP:0001274', 'HP:0009796']

_ = Ontology()
omim_diseases = list(Ontology.omim_diseases)

omim_disease_hpo = [list(x.hpo) for x in omim_diseases if disease_name == x.name]
omim_query = HPOSet.from_queries(omim_disease_hpo[0])
hpo_query = HPOSet.from_queries(hpo_terms_to_compare)

jc = hpo_query.similarity(omim_query, kind="omim", method="jc")
jc2 = hpo_query.similarity(omim_query, kind="omim", method="jc2")

print(jc)
print(jc2)

This code gives values of 139.57 and -4.31 when using pyhpo v3.2.6 and -4.31 and -4.31 when using hpo3. In addition, there seems to be a discrepancy between versions in pyhpo. Versions 3.2.4 and 3.2.5 both give values of 54.24 and -4.3.
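Some context on whether values outside 0 to 1 are expected: in its textbook form, the Jiang-Conrath measure is a distance rather than a similarity. It is 0 for identical terms and has no fixed upper bound, and a separate conversion is needed to map it into (0, 1]. This page does not state how pyhpo's `jc` and `jc2` methods are defined, so the sketch below only shows the standard formula and one common conversion.

```python
def jc_distance(ic_a: float, ic_b: float, ic_mica: float) -> float:
    """Textbook Jiang-Conrath distance between two terms.

    ic_mica is the information content of their most informative common ancestor.
    """
    return ic_a + ic_b - 2 * ic_mica


def jc_similarity(ic_a: float, ic_b: float, ic_mica: float) -> float:
    # One common (not universal) conversion of the distance into (0, 1]
    return 1 / (1 + jc_distance(ic_a, ic_b, ic_mica))


# Hypothetical IC values
print(jc_distance(5.0, 3.0, 1.0))    # 6.0
print(jc_similarity(5.0, 3.0, 1.0))  # 0.14285714285714285
```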

Different number of children for term than ebi HPO browser

pyhpo gives a different number of children for a term than the ebi browser.

For example, looking at term HP:0003674, the ebi HPO browser lists ~27 children and sub-children terms, but pyhpo appears to only list children with additional subchildren, and doesn't list the children terms:

from pyhpo import Ontology

_ = Ontology()
term = Ontology.get_hpo_object('HP:0003674')

for p in term.children:
    print(p)

HP:0003577 | Congenital onset
HP:4000040 | Puerpural onset
HP:0030674 | Antenatal onset
HP:0003623 | Neonatal onset
HP:0410280 | Pediatric onset
HP:0003581 | Adult onset

I'm not sure if I'm misusing pyhpo, or if it's perhaps using a different HPO version than the EBI browser?
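In the HPOTerm example earlier on this page, `term.children` returns only the direct children of Scoliosis. A browser that reports a larger count is likely including children of children, and collecting those means walking the tree recursively. Here is a sketch on hypothetical toy data that does not use pyhpo's API.

```python
from __future__ import annotations

# Hypothetical parent -> direct children map
CHILDREN = {
    "onset": ["congenital", "pediatric"],
    "congenital": [],
    "pediatric": ["infantile", "childhood"],
    "infantile": [],
    "childhood": [],
}


def all_descendants(term: str) -> set[str]:
    # Depth-first walk over child links, collecting every term below `term`
    found: set[str] = set()
    stack = list(CHILDREN.get(term, []))
    while stack:
        child = stack.pop()
        if child not in found:
            found.add(child)
            stack.extend(CHILDREN.get(child, []))
    return found


print(len(CHILDREN["onset"]))         # 2 direct children
print(len(all_descendants("onset")))  # 4 terms in total below "onset"
```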

Translating existing strings into HPO terms

Hi,

I have sets of strings which are not yet in HPO form, but should be translated into HPO terms. In many cases they are (when comparing strings) already very close.

What would be the best way to map these strings to their corresponding HPO term (with uncertainty estimate maybe aka number of character mismatches)?

If I search via

for term in Ontology.search(MYOWNTERM):
    print(term.name)

will I get the best matches or are they sorted alphabetically?
Any pointers in general?
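One possible approach, which is not part of pyhpo, is to rank candidate names by string similarity with the standard library's `difflib`. In the search example earlier on this page, results appear in HPO-ID order rather than by match quality. The candidate list below is hypothetical. Collecting the names from the loaded ontology is left out and would be an assumption about pyhpo's API.

```python
from __future__ import annotations

import difflib

# Hypothetical candidate names; in practice they would come from the ontology
TERM_NAMES = ["Scoliosis", "Kyphoscoliosis", "Thoracic scoliosis", "Lumbar scoliosis", "Poliosis"]


def best_matches(text: str, n: int = 3, cutoff: float = 0.6) -> list[tuple[str, float]]:
    # Score every candidate with difflib's ratio (1.0 means identical strings)
    scored = [
        (name, difflib.SequenceMatcher(None, text.lower(), name.lower()).ratio())
        for name in TERM_NAMES
    ]
    # Keep reasonable matches only, best first
    kept = [item for item in scored if item[1] >= cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:n]


for name, score in best_matches("scoliossis"):
    print(f"{name}: {score:.2f}")
```

The ratio doubles as a rough uncertainty estimate. A dedicated fuzzy-matching library or synonym-aware matching may work better for clinical free text.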

New version of HPO

Hello,

last month HPO released an update. I tried to use the new version with this library but it seems that it cannot parse the ontology.

I tried to update the files with

from pyhpo.update_data import download_data
download_data()

and I also created a directory with the new files manually downloaded, but in both cases I get an error:

Traceback (most recent call last):
  File "*/pruebahpo.py", line 2, in <module>
    _ = Ontology()
  File "*/lib/python3.10/site-packages/pyhpo/ontology.py", line 51, in __call__
    self._load_from_obo_file(data_folder)
  File "*/lib/python3.10/site-packages/pyhpo/ontology.py", line 380, in _load_from_obo_file
    for term in terms_from_file(data_folder):
  File "*/lib/python3.10/site-packages/pyhpo/parser/obo.py", line 121, in terms_from_file
    yield parse_obo_section(term_section)
  File "*/lib/python3.10/site-packages/pyhpo/parser/obo.py", line 137, in parse_obo_section
    key, value = line.split(':', 1)
ValueError: not enough values to unpack (expected 2, got 1)

This error appears when I try to do:

from pyhpo import Ontology
_ = Ontology()

term = Ontology.get_hpo_object('Scoliosis')
print(term)

It would be a pity that this library could not be used with the new versions of HPO. It could also be that I did something wrong, but this exact script worked with the previous version of the ontology.

Thank you for your help.
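The traceback shows `parse_obo_section` failing at `key, value = line.split(':', 1)` on a line that contains no colon. Below is a minimal sketch of a more tolerant stanza parser that skips such lines instead of raising. `parse_stanza` is hypothetical: it is neither pyhpo's code nor a fix shipped by the library.

```python
from __future__ import annotations


def parse_stanza(lines: list[str]) -> dict[str, list[str]]:
    """Parse the 'key: value' lines of one OBO [Term] stanza (hypothetical helper)."""
    parsed: dict[str, list[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("[") or ":" not in line:
            # Headers, blank lines and lines without a colon are skipped
            # instead of raising "not enough values to unpack"
            continue
        key, value = line.split(":", 1)
        # OBO keys such as is_a or synonym can repeat, so values are collected in lists
        parsed.setdefault(key.strip(), []).append(value.strip())
    return parsed


stanza = [
    "[Term]",
    "id: HP:0002650",
    "name: Scoliosis",
    "is_a: HP:0010674 ! Abnormality of the curvature of the vertebral column",
    "a stray line without a colon",
]
print(parse_stanza(stanza)["name"])  # ['Scoliosis']
```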

After install problem

Dear all.
I've just installed the package (via 'pip install pyhpo', and also tried 'pip install pyhpo[all]') on two different platforms: Linux Mint with Python 3.8 in the Spyder IDE, and independently on Windows 11 with Python 3.11.4 in IDLE.
On both devices I get this error when trying to use the library (Ontology, HPOSet...)

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    from pyhpo import Ontology
  File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pyhpo\__init__.py", line 5, in <module>
    from pyhpo.term import HPOTerm
  File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pyhpo\term.py", line 9, in <module>
    from pyhpo.similarity import SimScore
  File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pyhpo\similarity\__init__.py", line 1, in <module>
    from pyhpo.similarity.base import SimScore
  File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pyhpo\similarity\base.py", line 8, in <module>
    class _Similarity(BaseModel):
  File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pydantic\_internal\_model_construction.py", line 95, in __new__
    private_attributes = inspect_namespace(
  File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pydantic\_internal\_model_construction.py", line 328, in inspect_namespace
    raise PydanticUserError(
pydantic.errors.PydanticUserError: A non-annotated attribute was detected: kind = 'omim'. All model fields require a type annotation; if kind is not meant to be a field, you may be able to resolve this error by annotating it as a ClassVar or updating model_config['ignored_types'].

For further information visit https://errors.pydantic.dev/2.0.1/u/model-field-missing-annotation

Thanks!

Calculating information content from different datasets

Hello,

I have been using pyHPO to calculate the similarity scores between patients within a dataset using their clinical phenotype lists as HPOSets. ie - HPOSet1.similarity(HPOSet2)
However, I am worried my analyses may be skewed because pyHPO calculates the information contents used in the scoring algorithms based on the "kind" parameter - OMIM, orpha, decipher, or gene. I am wondering if there is any way to create a "custom kind" of sorts so that my patients' similarity scores are calculated using information contents derived from my dataset of choice instead of these publicly available ones?

Any feedback would be greatly appreciated. Thanks!
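The question lists the built-in kinds as OMIM, orpha, decipher and gene. This page does not say whether pyhpo accepts a custom source. Independently of the library, an IC derived from your own cohort could be computed as below. The sketch assumes each patient's term set already includes all ancestor terms. Plugging these values into pyhpo's similarity functions would require library support that this page does not document.

```python
from __future__ import annotations

import math

# Hypothetical cohort: each patient's terms, already including ancestor terms
COHORT = [
    {"HP:0009121", "HP:0000925", "HP:0010674", "HP:0002650"},
    {"HP:0009121", "HP:0000925"},
    {"HP:0009121"},
    {"HP:0009121", "HP:0000925", "HP:0010674"},
]


def cohort_information_content(cohort: list[set[str]]) -> dict[str, float]:
    # IC(t) = -ln(patients annotated with t / all patients)
    counts: dict[str, int] = {}
    for terms in cohort:
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return {term: -math.log(n / len(cohort)) for term, n in counts.items()}


ic = cohort_information_content(COHORT)
print(round(ic["HP:0002650"], 3))  # 1.386
```

Terms that every patient shares get an IC of 0, so they contribute nothing to similarity within this cohort.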

