pangenome / spodgi

RDF and SPARQL ideas to build on top of [odgi](https://github.com/pangenome/odgi)

License: MIT License

Shell 15.59% Python 81.78% Common Workflow Language 1.59% Dockerfile 1.04%

spodgi's Issues

Import errors when using the docker container

I built the Docker container and tried using it interactively, but the scripts odgi_to_rdf.py and sparql_odgi.py fail with the following import error:

root@9580c13cbb68:/spodgi# odgi_to_rdf.py
Traceback (most recent call last):
  File "/usr/bin/odgi_to_rdf.py", line 3, in <module>
    import rdflib
  File "/usr/local/lib/python3.7/dist-packages/rdflib-7.0.0-py3.7.egg/rdflib/__init__.py", line 47, in <module>
    from importlib import metadata
ImportError: cannot import name 'metadata' from 'importlib' (/usr/lib/python3.7/importlib/__init__.py)

For the sparql_server.py script I got a different ImportError:

root@9580c13cbb68:/spodgi# ./sparql_server.py
Traceback (most recent call last):
  File "./sparql_server.py", line 2, in <module>
    from flask import Flask, request, jsonify, Response, g
  File "/usr/local/lib/python3.7/dist-packages/flask-3.0.2-py3.7.egg/flask/__init__.py", line 5, in <module>
    from . import json as json
  File "/usr/local/lib/python3.7/dist-packages/flask-3.0.2-py3.7.egg/flask/json/__init__.py", line 6, in <module>
    from ..globals import current_app
  File "/usr/local/lib/python3.7/dist-packages/flask-3.0.2-py3.7.egg/flask/globals.py", line 6, in <module>
    from werkzeug.local import LocalProxy
  File "/usr/local/lib/python3.7/dist-packages/werkzeug-3.0.1-py3.7.egg/werkzeug/__init__.py", line 5, in <module>
    from .serving import run_simple as run_simple
  File "/usr/local/lib/python3.7/dist-packages/werkzeug-3.0.1-py3.7.egg/werkzeug/serving.py", line 76, in <module>
    t.Union["ssl.SSLContext", t.Tuple[str, t.Optional[str]], t.Literal["adhoc"]]
AttributeError: module 'typing' has no attribute 'Literal'

Both errors seem to come from the Docker container having Python 3.7 installed, while rdflib 7.0.0 and flask 3.0.2 require Python 3.8 or newer.
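Both missing names in the tracebacks (importlib.metadata and typing.Literal) were added to the standard library in Python 3.8. A minimal sketch of a startup check that reports this before the imports fail could look like the following; the function name missing_features is made up for illustration and is not part of spodgi.

```python
import sys


def missing_features(version_info=sys.version_info):
    """List the stdlib features from the tracebacks that this interpreter lacks.

    Hypothetical helper: both features below first appeared in Python 3.8.
    """
    missing = []
    if tuple(version_info[:2]) < (3, 8):
        missing.append("importlib.metadata")  # imported by rdflib 7.0.0
        missing.append("typing.Literal")      # used by werkzeug 3.0.1
    return missing


if __name__ == "__main__":
    problems = missing_features()
    if problems:
        print("Python >= 3.8 is needed; missing: " + ", ".join(problems))
```

Running this inside the container would show whether the base image needs a newer Python or whether the dependencies need to be pinned to older releases.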

Mapping an IRI based lookup to ODGI

For example, in the current vg RDF model, all nodes have an IRI of the form {SOMEBASE}/node/{ID} as identifier. These can be used as a hack to identify which methods to call.

Consider the SPARQL query:

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX node:<http://example.org/node/>
PREFIX vg:<http://biohackathon.org/resource/vg#>

SELECT
   ?node ?sequenceLength
WHERE {
  BIND(node:25 as ?node)
  ?node a vg:Node ;
     rdf:value ?sequence .
  BIND(strlen(?sequence) AS ?sequenceLength)
}

By statically analysing the query AST, we should be able to determine that this requires a call to odgi.get_handle, as that will give us the handle for the node id.

ASK {
  node:25 a vg:Node .
}

This can return true because we can look into the IRI string to see that it is a node.

SELECT
?sequence
WHERE
{
node:25 rdf:value ?sequence .
}

This can be mapped to odgi.get_handle, and the resulting handle can be asked for its sequence string.

The engine can then do a classic translation to sequence length by just calling the Python method.
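The IRI-based dispatch described above can be sketched as follows. This is only an illustration: the helper names are invented, and graph is assumed to be any object exposing get_handle(node_id) and get_sequence(handle), which is how the issue refers to the odgi bindings.

```python
import re
from typing import Optional

# Matches IRIs shaped like {SOMEBASE}/node/{ID}, as described in the issue.
NODE_IRI = re.compile(r"^.+/node/(\d+)$")


def node_id_from_iri(iri: str) -> Optional[int]:
    """Return the numeric node id encoded in a node IRI, or None for other IRIs."""
    match = NODE_IRI.match(iri)
    return int(match.group(1)) if match else None


def ask_is_node(iri: str) -> bool:
    """Answer `ASK { <iri> a vg:Node }` from the shape of the IRI string alone."""
    return node_id_from_iri(iri) is not None


def sequence_for(graph, iri: str) -> Optional[str]:
    """Answer `<iri> rdf:value ?sequence` with a direct handle lookup.

    Assumption: `graph` exposes get_handle(node_id) and get_sequence(handle).
    """
    node_id = node_id_from_iri(iri)
    if node_id is None:
        return None
    return graph.get_sequence(graph.get_handle(node_id))
```

Note that ask_is_node only inspects the string, so it answers true for any well-formed node IRI, whether or not the node exists in the graph.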

SPARQL 1.1. Service does not work

This is a limitation of rdflib 4.2.2. You can check out my branch of rdflib 5 and install it with pip install --pre ~/git/rdflib

Docker images

Make it easier to install and use SpOdgi as it is from master.

Step IRI generation

We need to figure out how to generate an IRI from a step. This requires at least two components: the name of the path, and either an ordinal offset or a rank (the step number in the path).

For the first we can use odgi::get_path_name.
I don't yet know how to find the ordinal.
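One possible shape for such an IRI, assuming the rank can be obtained by simply counting steps while walking the path from its start, is sketched below. The {base}/path/{name}/step/{rank} layout and both function names are assumptions for illustration, not something spodgi defines.

```python
from urllib.parse import quote


def step_iri(base: str, path_name: str, rank: int) -> str:
    """Build a step IRI from the path name and the step's rank in that path.

    The IRI layout is hypothetical; the path name is percent-encoded so that
    spaces or slashes in it cannot break the IRI structure.
    """
    return f"{base.rstrip('/')}/path/{quote(path_name, safe='')}/step/{rank}"


def ranked_step_iris(base: str, path_name: str, steps):
    """Pair every step of a path with an IRI, counting ranks from 1.

    `steps` is any iterable that yields the path's steps in order.
    """
    for rank, step in enumerate(steps, start=1):
        yield step_iri(base, path_name, rank), step
```

Counting during the walk avoids needing a rank lookup per step, at the cost of always iterating from the start of the path.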

Inspect filters to see if they can be translated into a better execution model.

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
SELECT 
  ?seq 
WHERE {
  ?x rdf:value ?seq . 
  FILTER(strlen(?seq) >5)
}

This query materializes each sequence as a Python string instead of using the odgi.get_length(handle) method.
If we could push such filter constraints into the triples method, we would be faster because we would generate fewer intermediate objects.
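A rough sketch of that pushdown idea follows. It recognizes the strlen comparison with a regular expression instead of walking the real query AST, and it assumes a graph object with get_length(handle) and get_sequence(handle); all helper names are hypothetical.

```python
import operator
import re

# Comparison operators that can be evaluated on the length alone.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "=": operator.eq}

# Recognizes filters such as: FILTER(strlen(?seq) >5)
STRLEN_FILTER = re.compile(
    r"strlen\(\s*\?(\w+)\s*\)\s*(>=|<=|>|<|=)\s*(\d+)", re.IGNORECASE
)


def parse_strlen_filter(expr):
    """Return (variable, predicate on length) or None if the filter does not fit."""
    match = STRLEN_FILTER.search(expr)
    if match is None:
        return None
    var, op, bound = match.group(1), OPS[match.group(2)], int(match.group(3))
    return var, lambda length: op(length, bound)


def sequences_matching(graph, handles, length_ok):
    """Yield sequences, building the string only for handles that pass the length test.

    Assumption: `graph` exposes get_length(handle) and get_sequence(handle).
    """
    for handle in handles:
        if length_ok(graph.get_length(handle)):
            yield graph.get_sequence(handle)
```

With this split, nodes rejected by the filter never have their sequence copied into a Python string.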

How to query SpOdgi to get external Ensembl annotations

First we modify the example from Ensembl.

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> 
PREFIX skos:<http://www.w3.org/2004/02/skos/core#> 
PREFIX owl:<http://www.w3.org/2002/07/owl#> 
PREFIX dc:<http://purl.org/dc/terms/> 
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#> 
PREFIX faldo:<http://biohackathon.org/resource/faldo#> 
PREFIX ensembltranscript:<http://rdf.ebi.ac.uk/resource/ensembl.transcript/> 
PREFIX sio:<http://semanticscience.org/resource/> 
PREFIX dcterms:<http://purl.org/dc/terms/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
PREFIX obo:<http://purl.obolibrary.org/obo/>
PREFIX vg:<http://biohackathon.org/resource/vg#>

SELECT *

WHERE {
  ?target a vg:Step ;
    vg:node|vg:reverseNode ?node ;
    faldo:location ?stepLinearLocation .
  ?stepLinearLocation faldo:begin ?bp ;
    faldo:end ?ep .
  ?bp faldo:position ?stepBegin .
  ?ep faldo:position ?stepEnd  .
  BIND(<http://rdf.ebi.ac.uk/resource/ensembl/97/saccharomyces_cerevisiae/R64-1-1/VIII> AS ?ref) .
  SERVICE<https://www.ebi.ac.uk/rdf/services/sparql/>{
    SELECT DISTINCT ?transcript ?ref ?begin ?end {
      ?transcript a <http://rdf.ebi.ac.uk/terms/ensembl/protein_coding> .
      ?transcript faldo:location ?location .
      ?location faldo:begin
        [a faldo:ExactPosition ;
        faldo:position ?begin] .
      ?location faldo:end
        [a faldo:ExactPosition ;
        faldo:position ?end] .
        ?location faldo:reference ?ref .
      FILTER(?begin > ?stepBegin && ?end < ?stepEnd)
    } LIMIT 10
  }
}

Validate Prefixes

Currently, prefixes are not validated.
I tried out several queries and none of them worked, until I realized that the prefix was spelled wrong.
Validating prefixes would be a huge relief for users.
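A minimal sketch of what such a validation could check is shown below: prefix labels that are used but never declared, and declared namespaces that do not match any namespace the store knows. It works on the raw query text with regular expressions, not a real SPARQL parser, so it is only an approximation, and the function names are invented for illustration.

```python
import re

# PREFIX label:<iri> declarations, as written in the queries in this tracker.
PREFIX_DECL = re.compile(r"PREFIX\s+([A-Za-z][\w-]*):\s*<([^>]*)>", re.IGNORECASE)


def undeclared_prefixes(query: str) -> set:
    """Return prefix labels that appear in the query body without a PREFIX line."""
    declared = {label for label, _ in PREFIX_DECL.findall(query)}
    body = PREFIX_DECL.sub("", query)
    body = re.sub(r"<[^>\s]*>", "", body)             # drop full IRIs
    body = re.sub(r'"[^"]*"|\'[^\']*\'', "", body)    # drop string literals
    used = set(re.findall(r"(?<![\w?$])([A-Za-z][\w-]*):", body))
    return used - declared


def unknown_namespaces(query: str, known_namespaces) -> dict:
    """Return {label: iri} for declared namespaces that are not in the known set."""
    return {
        label: iri
        for label, iri in PREFIX_DECL.findall(query)
        if iri not in known_namespaces
    }
```

The second check is the one that would catch a misspelled namespace IRI, which otherwise just produces an empty result set.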

odgi_to_rdf misleading error message when 2nd argument is missing

When odgi_to_rdf is executed without the 2nd argument, the error message is very cryptic about what is going on.

(/usr) [heumos@wave spodgi]$ time python odgi_to_rdf.py --syntax=ttl0k_R64-1-1.odgi 
Usage: odgi_to_rdf.py [OPTIONS] ODGIFILE TTL
Try "odgi_to_rdf.py --help" for help.

Error: Missing argument "TTL".

real    0m0.256s
user    0m0.231s
sys     0m0.025s
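As an illustration of a friendlier message, here is a sketch that uses the standard library's argparse as a stand-in; it is not the actual command-line code of odgi_to_rdf.py. The default value for --syntax and the wording of the message are assumptions.

```python
import argparse


def build_parser():
    """Stand-in parser mirroring the usage line: [OPTIONS] ODGIFILE TTL."""
    parser = argparse.ArgumentParser(prog="odgi_to_rdf.py")
    parser.add_argument("--syntax", default="ttl", help="RDF serialization to write")
    parser.add_argument("odgifile", metavar="ODGIFILE", help="input odgi graph")
    # Optional at the parser level so that a custom message can be produced below.
    parser.add_argument("ttl", metavar="TTL", nargs="?", help="output RDF file")
    return parser


def parse_args(argv):
    """Parse argv and explain what TTL means when it is missing."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.ttl is None:
        # parser.error prints the usage line plus this message and exits with status 2.
        parser.error(
            f"no output file given: pass the path to write the RDF to after "
            f"{args.odgifile!r}, e.g. odgi_to_rdf.py --syntax={args.syntax} "
            f"{args.odgifile} out.ttl"
        )
    return args
```

The point of the sketch is only that the message names the missing output file and shows a complete example invocation.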

Integrate ODGI as a submodule into the repository?

Make this Conda, pip install etc. friendly

If path names have real IRIs then the "IRI denotes a path" hack does not work.

SPARQL endpoint over http?

Best way to turn for_each_handle into a generator instead of an internal iterator