pyontutils

python utilities for working with ontologies

Installation

pyontutils is slowly approaching stability. You can obtain it and other related packages from PyPI and install them as you see fit (e.g. pip install --user pyontutils). If you need a bleeding edge version, I recommend installing it into your environment (virtual or otherwise) using pip install --user --editable .[dev,test] run from your local copy of this repo.

Configuration

pyontutils makes use of 3 configuration files:

  1. ~/.config/pyontutils/config.yaml. This file can be used to augment the variables defined in auth-config.py. For more details about the config see the orthauth guide.
  2. secrets.yaml, which you can put wherever you want by editing the auth-stores: secrets: path: entry in config.yaml. The file mode needs to be set to 0600 so that only you can read and write it. It is also advisable to place it inside a folder with its mode set to 0700, since some editors do not preserve file modes. orthauth will fail loudly if the file mode has changed.
  3. ./nifstd/scigraph/curie_map.yaml, or ~/.config/pyontutils/curie_map.yaml if a pyontutils git repository is not found. pyontutils will retrieve the latest version of this file from github on first run if it cannot find a local copy. The full list of locations that are searched for curie_map.yaml is specified in auth-config.py.
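The mode requirements for secrets.yaml and its folder can be checked before orthauth does. The following is a minimal sketch using only the Python standard library. The helper name mode_is_private and the temporary paths are assumptions made for illustration; this is not how orthauth itself performs the check.

```python
import os
import stat
import tempfile


def mode_is_private(path, expected=0o600):
    """Return True if the permission bits of path equal expected."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected


# Demonstrate on a throwaway folder rather than a real secrets file.
with tempfile.TemporaryDirectory() as folder:
    os.chmod(folder, 0o700)
    secrets = os.path.join(folder, 'secrets.yaml')  # hypothetical location
    with open(secrets, 'w') as f:
        f.write('placeholder: value\n')

    os.chmod(secrets, 0o600)
    file_ok = mode_is_private(secrets, 0o600)
    folder_ok = mode_is_private(folder, 0o700)

    # Simulate an editor that rewrote the file with a looser mode.
    os.chmod(secrets, 0o644)
    loose_ok = mode_is_private(secrets, 0o600)
```

On a POSIX system, running mode_is_private against your real secrets path after editing the file shows whether the editor kept the mode intact.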

If you are going to use the SciCrunch SciGraph production instance, follow the instructions in the sparc curation setup guide to obtain an API key and put it in the right place. In short, you can set the key in your secrets file and specify the path to it in config.yaml under the scigraph-api-key variable. Alternatively, you can set the key using the SCICRUNCH_API_KEY environment variable (e.g., by running export SCICRUNCH_API_KEY=$(cat path/to/my/apikey)) or by whatever means you prefer for managing your keys.
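To confirm that the environment variable route is set up before running anything against SciGraph, a script can look the key up directly. This is a minimal sketch; the function name scicrunch_api_key is hypothetical, and pyontutils resolves the key through orthauth rather than through this code.

```python
import os


def scicrunch_api_key(environ=os.environ):
    """Return the key from SCICRUNCH_API_KEY, or None if unset or empty."""
    key = environ.get('SCICRUNCH_API_KEY', '').strip()
    return key or None


# A plain dict stands in for the process environment in this example.
example_key = scicrunch_api_key({'SCICRUNCH_API_KEY': ' not-a-real-key \n'})
missing_key = scicrunch_api_key({})
```

Calling scicrunch_api_key() with no arguments reads the real environment of the current process.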

Development Installation

From the directory that contains this readme run the following. Refer to .travis.yml for full details.

for f in {librdflib,htmlfn,ttlser,.,neurondm,nifstd}; do pushd $f; pip install --user --pre --editable . ; popd; done

If you need even more information, there is fairly exhaustive documentation located in the sparc curation setup doc.

Requirements

This repo requires PyPy3 or Python >= 3.6. See setup.py and Pipfile for additional requirements. ontload requires Java 8 and maven >= 3.3 in order to build SciGraph. parcellation requires FSL to be installed, or you need to obtain the atlases in some other way. In order to build the packages required by this repo you will need gcc (and toolchain) installed and will need to have the development packages for libxml installed. To build the development dependencies you will also need the development packages for postgresql and protobuf installed on your system. Building the documentation for the ontology requires pandoc and emacs along with orgstrap. See .travis.yml for an example of how to bootstrap a working dev environment. Alternatively, see pyontutils-9999.ebuild and nifstd-tools-9999.ebuild in tgbugs-overlay.

Utility Scripts

pyontutils provides a set of scripts that are useful for maintaining and managing ontologies using git, and making them available via SciGraph. Note that if you choose the development installation option you will need to ln -sT the scripts to your preferred bin folder. For the full list please see the documentation.

  1. ttlfmt: Reserialize ontology files using deterministic turtle (spec).
  2. ontutils: Various useful and frequently needed commands for ontology processes as well as less frequent refactorings.
  3. ontload: Load an ontology managed by git into SciGraph for easy deployment of services.
  4. qnamefix: Set qnames based on the curies defined for a given ontology.
  5. necromancy: Find dead ids in an ontology and raise them to be owl:Classes again.
  6. scigraph-codegen: Generate a rest client against a SciGraph services endpoint.
  7. scig: Run queries against a SciGraph endpoint from the command line.
  8. graphml_to_ttl: Convert yEd graphml files to ttl.
  9. ontree: Run a webserver to query and view hierarchies from the ontology.

NIF-Ontology

Many of these scripts are written for working on the NIF standard ontology found here.

SciGraph

scigraph_codegen.py is a code generator for creating a python client library against a SciGraph REST endpoint. scigraph_client.py is the client library generated against the nif development scigraph instance. ontload can be used to load your ontology into SciGraph for local use.

Building releases

See release.org.

pyontutils's People

Contributors

christian-oreilly, dbrnz, katrinleinweber, memartone, tgbugs, tmsincomb

pyontutils's Issues

Classification error

I notice that after I run the reasoner, all projection neurons are listed as subclasses of Gigantocellular reticular nucleus projection neuron. It doesn't look like a location was defined for that cell type, which is probably the reason why.

Olfactory cells are classified as reticular cells

Two olfactory cells have assertions that they are subclasses of gigantocellular reticular neurons:

'Olfactory bulb (main) granule cell' SubClassOf 'Gigantocellular reticular nucleus intrinsic neuron'

'Olfactory cortex large multipolar cell' SubClassOf 'Gigantocellular reticular nucleus intrinsic neuron'

consolidate various tools into a basic utilities script

Beyond ontload, ontrefactor, and necromancy, there are a number of oneliners and other functionality that should be consolidated into a single file, many from https://github.com/SciCrunch/NIF-Ontology/blob/uri-switch/docs/processes.md. Some scripts such as make_catalog could be subsumed.

One thing that is not in the docs at the moment and that needs to be implemented is the ability to make a ttl/reasoner-subset.ttl from a glob/urls, so that we can reason about collections of ontologies that are far apart in the import chain and whose only common denominator is nif.ttl or nif_backend.ttl.

ttlser serializes shortened ontology iris

We should avoid serializing ontology iris in the import statement if possible. Currently we avoid this by not including the relevant prefixes, however if they creep in as an ns1 (e.g. http://purl.obolibrary.org/obo/) then the ontology iris will also be shortened, which is irritating from a usability standpoint.

ttlser fails if a literal is a subject

import rdflib
from pyontutils.core import makeGraph
g = makeGraph('buggraph', graph=rdflib.Graph())
g.g.add((rdflib.Literal('UT OH'), rdflib.RDFS.comment, rdflib.Literal('This will end badly.')))
g.write()

This produces a KeyError, since subject literals don't seem to make it into the global sort keys.

The default turtle serializer produces the following.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

"UT OH" rdfs:comment "This will end badly." .

scig fails on pipe

scig g UBERON:0000955 -v | head
http://localhost:9000/scigraph/graph/neighbors/UBERON:0000955
UBERON:0000955
	nodes
---------------------------------------------------
:birnlex_4    None

    category: 'anatomical entity'
    types: [
        'anatomical entity'
        'Class'
Traceback (most recent call last):
  File "/home/tom/bin/scig", line 94, in <module>
    main()
  File "/home/tom/bin/scig", line 66, in main
    scigPrint.pprint_neighbors(out)
  File "/home/tom/git/pyontutils/pyontutils/utils.py", line 716, in pprint_neighbors
    scigPrint.pprint_node({'nodes':[node]})
  File "/home/tom/git/pyontutils/pyontutils/utils.py", line 674, in pprint_node
    scigPrint.pprint_meta(node['meta'])
  File "/home/tom/git/pyontutils/pyontutils/utils.py", line 695, in pprint_meta
    print(base, scigPrint.sv(asdf, len(base) + 1, 4))
BrokenPipeError: [Errno 32] Broken pipe
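The traceback above shows print raising BrokenPipeError once head has closed its end of the pipe. One common way to handle this in a Python command line tool is sketched below. The names write_lines, print_or_exit_quietly, and ClosedPipe are hypothetical and are not part of scig; the devnull redirect follows the approach described in the Python signal module documentation.

```python
import io
import os
import sys


def write_lines(lines, stream):
    """Write lines to stream; return False if the reader closed the pipe."""
    try:
        for line in lines:
            stream.write(line + '\n')
        stream.flush()
    except BrokenPipeError:
        return False
    return True


def print_or_exit_quietly(lines):
    """Hypothetical command line wrapper around write_lines."""
    if not write_lines(lines, sys.stdout):
        # Point stdout at devnull so the interpreter's final flush at exit
        # does not raise a second BrokenPipeError.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)


class ClosedPipe(io.StringIO):
    """Stand-in for a pipe whose reader (e.g. head) has already exited."""

    def write(self, text):
        raise BrokenPipeError(32, 'Broken pipe')


buffer = io.StringIO()
completed = write_lines(['a', 'b'], buffer)
interrupted = write_lines(['a'], ClosedPipe())
```

Whether scig should exit silently or with a nonzero status when the reader goes away is a design choice; the sketch exits with status 1.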

Vestibular nuclei cells are missing from the CUTs. Please add.

Also, I think we should be consistent about putting any spatial qualifiers after the main nucleus. We have a bunch of "medial nucleus X" and I think it should be "nucleus X, medial" where the spatial qualifier differentiates different parts of the nucleus.

Neurofilament mRNA

We have an old class from NIF called Nucleic acid > RNA > mRNA > Neurofilament 150kD mRNA = http://uri.neuinfo.org/nif/nifstd/nlx_mol_090801

It is assigned to a few neuron classes. As it only has one subclass, I think this should probably be removed. Also given that it is a generic neuron protein, I'm not sure that encoding it for these classes is warranted, but if we are going to include inferred neuronal markers, then we should be consistent.

CUT Martinotti cell

Replace: somatostatin with correct identifier
No soma location property recorded. Has layer locations but these are not reasoned over to determine that this is a neocortical cell.

Why only one intrinsic neuron?

When I run the reasoner, there are two classes for intrinsic neuron, one asserted and one equivalent, but each only has one neuron. We say that we classify according to this property and have indicated for each neuron the axon phenotype. So why don't they show up?

hierarchies dematerialization issues

  1. dematerialize needs to account for whether a relationship is transitive or not. For example, for subClassOf it is correct to remove cases where a class occurs higher up, but it is incorrect to do so for citations, where the relationship is not transitive.
  2. The asterisk marking the presence of additional parents is added too early/not removed when the graph is dematerialized.
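The first point can be sketched as a reduction that only drops an edge when its predicate is declared transitive and a longer path already implies it. This is an illustration under stated assumptions: edges are plain (subject, predicate, object) tuples, the graph is acyclic, and the names dematerialize and implied_by_longer_path are hypothetical rather than the functions in the hierarchies code.

```python
def implied_by_longer_path(edges, s, p, o):
    """True if o is reachable from s over predicate p without the direct edge."""
    # Start from the other direct neighbours of s, skipping o itself.
    stack = [m for (a, q, m) in edges if a == s and q == p and m not in (s, o)]
    seen = {s}
    while stack:
        node = stack.pop()
        if node == o:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(m for (a, q, m) in edges if a == node and q == p)
    return False


def dematerialize(edges, transitive_predicates):
    """Drop edges that are implied by transitivity; keep everything else."""
    edges = set(edges)
    return {
        (s, p, o)
        for (s, p, o) in edges
        if not (p in transitive_predicates
                and implied_by_longer_path(edges, s, p, o))
    }


example = {
    ('A', 'subClassOf', 'B'), ('B', 'subClassOf', 'C'), ('A', 'subClassOf', 'C'),
    ('A', 'cites', 'B'), ('B', 'cites', 'C'), ('A', 'cites', 'C'),
}
reduced = dematerialize(example, transitive_predicates={'subClassOf'})
```

In the example, the redundant A subClassOf C edge is removed, while A cites C is kept because citation is not treated as transitive.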

ttlser determinism fails when OWLAPI reparents disjointness axioms

Running ttlcmp reveals that ttlser only deterministically serializes graphs that have identical BNode structures. There are a number of symmetric axioms in OWL where OWLAPI will move a BNode from one parent class to another. We probably need to make ttlser minimally aware of the cases where this can happen and come up with a consistent rule for where such BNodes are serialized.

Got a ModuleNotFoundError for 'pyontutils' when using NeuronLangExample.ipynb

Hi,
I've followed the prerequisites steps in order to set up an env for NeuronLangExample.ipynb. The set up went well except for the following two operations:

  pip install dist/pyontutils-*-py3-none-any.whl

and

pip install dist/rdflib-*-py3-none-any.whl

that both finished with the following (non blocking) error:

Failed building wheel for mysql-connector

The notebook launches at the end, but it fails with the following error:
ModuleNotFoundError: No module named 'pyontutils'

I am using python 3.6.

Any chance you know what's wrong with this?

neurondm labels for logical phenotypes

Only the first of the 3 phenotypes from the logical phenotype on layers is listed.

rdfs:label "Mammalia neocortex EGL (with-axon-in cortical layer I) Martinotti +GABA receptor role +Glutamate Receptor +SS +GABA interneuron"

NeuronCUT
and ((hasLayerLocationPhenotype some 'cortical layer II') or (hasLayerLocationPhenotype some 'cortical layer III') or (hasLayerLocationPhenotype some 'cortical layer V'))
and (hasCircuitRolePhenotype some 'Intrinsic Phenotype')
and (hasLocationPhenotype some neocortex)
and (hasMolecularPhenotype some somatostatin)
and (hasMolecularPhenotype some 'GABA receptor role')
and (hasMolecularPhenotype some 'Glutamate Receptor')
and (hasMorphologicalPhenotype some 'Martinotti Phenotype')
and (hasNeurotransmitterPhenotype some GABA)
and (hasTaxonRank some Mammalia)

joblib 0.12.2 unpickling issues

There seems to have been a change to something that joblib is doing. Possibly related to joblib/joblib#745. When running parcellation.py all pools show the following error.

Process ForkPoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/home/tom/.local/share/virtualenvs/pyontutils-a77fIBQE/lib/python3.6/site-packages/joblib/pool.py", line 149, in get
    return recv()
  File "/usr/lib64/python3.6/multiprocessing/connection.py", line 253, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __new__() takes 1 positional argument but 2 were given

So far this has been traced to /usr/lib64/python3.6/multiprocessing/connection.py when using backend='multiprocessing' but also occurs for other backends. When saving the pickle to disk and trying to reopen in another interpreter the same error occurs.

Erroneous class definition

We have two equivalent classes: Mammalia neuron, one of which seems to contain the definition for a superior colliculus piriform neuron:
http://uri.neuinfo.org/nif/nifstd/BAMSC1117
NeuronCUT
and (hasLayerLocationPhenotype some 'Superior colliculus stratum opticum')
and (hasLocationPhenotype some 'Superior colliculus stratum opticum')
and (hasTaxonRank some Mammalia)

scigraph_client bug in Graph.getEdges

pyontutils/scigraph_client.py in <dictcomp>(.0)
    126 
    127         kwargs = {'type':type, 'entail':entail, 'limit':limit, 'skip':skip, 'callback':callback}
--> 128         kwargs = {k:dumps(v) if type(v) is dict else v for k, v in kwargs.items()}
    129         param_rest = self._make_rest('type', **kwargs)
    130         url = self._basePath + ('/graph/edges/{type}').format(**kwargs)

TypeError: 'str' object is not callable
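The traceback is consistent with the parameter named type shadowing the builtin type inside the method, so type(v) ends up calling the string that was passed in. A minimal sketch of one possible fix follows, using isinstance, which is not shadowed. The standalone function build_kwargs and its default values are assumptions made for illustration; it mirrors the lines in the traceback but is not the generated client method.

```python
from json import dumps


def build_kwargs(type, entail=True, limit=20, skip=0, callback=None):
    """Mirror the kwargs handling from the traceback without calling type()."""
    kwargs = {'type': type, 'entail': entail, 'limit': limit,
              'skip': skip, 'callback': callback}
    # isinstance still refers to the builtin even though the name `type`
    # is bound to the caller's argument in this scope.
    return {k: dumps(v) if isinstance(v, dict) else v
            for k, v in kwargs.items()}


string_case = build_kwargs('isDefinedBy')
dict_case = build_kwargs({'a': 1})
```

Because the fix would belong in the code generator rather than in the generated file, renaming the parameter in the generated signature is another option, at the cost of changing the public keyword.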

Martinotti cell-problems classifying

In all other neurons, you use "hasSomaLocatedIn", but in this class, you have "hasLocationPhenotype" and "LayerLocationPhenotype". Please make it consistent. I cannot return the Martinotti CUT neuron, however, even with this query: Neuron and (hasLocationPhenotype some neocortex) and ((hasExpressionPhenotype some PR_000015665) or (hasExpressionPhenotype some Sst) or (hasExpressionPhenotype some somatostatin) or (hasExpressionPhenotype some Sst-IRES-Cre))

move Ont Class, etc to their own file

Something like builder or similar, but need to come up with a better name. These depend on git and thus fall outside the core. Alternately, come up with a better version of makeGraph that covers the 80% use case and doesn't have all the baggage.

Duplicate class

Why do we have both cortical +SOM neuron and cortical somatostatin neuron?

switch neurons graphBase/Config to use Ont

c = Config('output-file') seems like a better pattern for setting up the export filename, imports, etc. We also need to integrate core.Ont to get the prov related functionality.

decouple external resource dependent code

There are many scripts that depend on resources and other repos and on being in a git repo which need to be separated from the core so that they can be tested independently. They should be distributed, but they need to be logically separate.
