spodgi's Issues

Import errors when using the docker container

I built the docker container and tried using it interactively but for the scripts odgi_to_rdf.py and sparql_odgi.py I get the following import error:

root@9580c13cbb68:/spodgi# odgi_to_rdf.py
Traceback (most recent call last):
  File "/usr/bin/odgi_to_rdf.py", line 3, in <module>
    import rdflib
  File "/usr/local/lib/python3.7/dist-packages/rdflib-7.0.0-py3.7.egg/rdflib/__init__.py", line 47, in <module>
    from importlib import metadata
ImportError: cannot import name 'metadata' from 'importlib' (/usr/lib/python3.7/importlib/__init__.py)

For the sparql_server.py script I got a different ImportError:

root@9580c13cbb68:/spodgi# ./sparql_server.py
Traceback (most recent call last):
  File "./sparql_server.py", line 2, in <module>
    from flask import Flask, request, jsonify, Response, g
  File "/usr/local/lib/python3.7/dist-packages/flask-3.0.2-py3.7.egg/flask/__init__.py", line 5, in <module>
    from . import json as json
  File "/usr/local/lib/python3.7/dist-packages/flask-3.0.2-py3.7.egg/flask/json/__init__.py", line 6, in <module>
    from ..globals import current_app
  File "/usr/local/lib/python3.7/dist-packages/flask-3.0.2-py3.7.egg/flask/globals.py", line 6, in <module>
    from werkzeug.local import LocalProxy
  File "/usr/local/lib/python3.7/dist-packages/werkzeug-3.0.1-py3.7.egg/werkzeug/__init__.py", line 5, in <module>
    from .serving import run_simple as run_simple
  File "/usr/local/lib/python3.7/dist-packages/werkzeug-3.0.1-py3.7.egg/werkzeug/serving.py", line 76, in <module>
    t.Union["ssl.SSLContext", t.Tuple[str, t.Optional[str]], t.Literal["adhoc"]]
AttributeError: module 'typing' has no attribute 'Literal'

Both of these issues seem to come from the fact that the Docker container has Python 3.7 installed, while rdflib 7.0.0 and Flask 3.0.2 require Python 3.8 or newer: importlib.metadata and typing.Literal were both added in Python 3.8.
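A startup guard along these lines could turn both tracebacks into one readable message. This is only a sketch: check_python and MIN_PYTHON are hypothetical names, and the 3.8 minimum is taken from the issue text above rather than from the packages' metadata.

```python
import sys

# Assumption: 3.8 is the minimum, as stated in the issue text.
MIN_PYTHON = (3, 8)


def check_python(version_info=None, minimum=MIN_PYTHON):
    """Return None if the interpreter is new enough, else a readable message.

    version_info defaults to the running interpreter; passing a tuple makes
    the check easy to try out for other versions.
    """
    current = tuple((version_info or sys.version_info)[:2])
    if current >= minimum:
        return None
    return "Python {}.{} or newer is required, found {}.{}".format(*minimum, *current)


if __name__ == "__main__":
    problem = check_python()
    if problem:
        sys.exit(problem)
```

Calling the check before importing rdflib or flask would let the scripts exit with this message instead of an ImportError deep inside a dependency.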

Mapping an IRI based lookup to ODGI

For example, in the current vg models in RDF, all nodes have a {SOMEBASE}/node/{ID} IRI as identifier. These can be used as a hack to identify which methods to call.

Consider the following SPARQL query.

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX node:<http://example.org/node/>
PREFIX vg:<http://biohackathon.org/resource/vg#>

SELECT
   ?node ?sequenceLength
WHERE {
  BIND(node:25 AS ?node)
  ?node a vg:Node ;
     rdf:value ?sequence .
  BIND(strlen(?sequence) AS ?sequenceLength)
}

By statically analysing the query AST, we should be able to determine that this requires a call to odgi.get_handle, as that will give us the handle for the node id.

ASK {
  node:25 a vg:Node .
}

This can return true, as we can look into the IRI string to see that it is a node.

SELECT
?sequence
WHERE
{
node:25 rdf:value ?sequence .
}

This can be mapped to odgi.get_handle, on which we can ask for the sequence string.

Then the engine can do a classic translation to sequence length by just calling the Python method.
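The IRI inspection described above could be sketched roughly as follows. The function names and the regular expression are hypothetical; the only assumptions about odgi are the get_handle and get_length(handle) calls mentioned in these issues, and graph stands for whatever object exposes them.

```python
import re
from typing import Optional

# Assumed IRI layout, as described above: {SOMEBASE}/node/{ID}
NODE_IRI = re.compile(r"^(?P<base>.+)/node/(?P<id>\d+)$")


def node_id_from_iri(iri: str) -> Optional[int]:
    """Return the numeric node id encoded in a node IRI, or None if it is not one."""
    match = NODE_IRI.match(iri)
    return int(match.group("id")) if match else None


def is_node(iri: str) -> bool:
    """Answer 'ASK { <iri> a vg:Node }' by looking at the IRI string only."""
    return node_id_from_iri(iri) is not None


def sequence_length(graph, iri: str) -> Optional[int]:
    """Resolve a node IRI to its sequence length without building the string.

    graph is assumed to offer get_handle(node_id) and get_length(handle).
    """
    node_id = node_id_from_iri(iri)
    if node_id is None:
        return None
    handle = graph.get_handle(node_id)
    return graph.get_length(handle)
```

Whether a node with that id actually exists in the graph is not checked here; a real engine would probably still need to confirm that before answering the ASK query.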

SPARQL 1.1 SERVICE does not work

This is a limitation of rdflib 4.2.2. You can check out my branch of rdflib 5 and install it with pip install --pre ~/git/rdflib

Docker images

Provide Docker images to make it easier to install and use SpOdgi as it is on master.

Step IRI generation

We need to figure out how to generate an IRI from a step. This requires at least two components: the name of the path, and either an ordinal offset or a rank (the step number in the path).

For the first we can use odgi::get_path_name.
I don't yet know how to find the ordinal.
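Once both components are available, assembling the IRI itself is simple. The sketch below assumes a hypothetical {base}/path/{name}/step/{rank} layout and takes the rank as an argument, since how to obtain it from odgi is still open.

```python
from urllib.parse import quote


def step_iri(base: str, path_name: str, rank: int) -> str:
    """Build a step IRI from a path name and a 1-based rank.

    The IRI layout is an assumption made for illustration only.
    """
    if rank < 1:
        raise ValueError("rank must be >= 1")
    # Percent-encode the path name so that '/', spaces, etc. cannot break the IRI.
    return "{}/path/{}/step/{}".format(base.rstrip("/"), quote(path_name, safe=""), rank)
```

For example, step_iri("http://example.org", "chr VIII", 3) gives http://example.org/path/chr%20VIII/step/3.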

Inspect filters to see if they can be translated to a better execution model.

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
SELECT 
  ?seq 
WHERE {
  ?x rdf:value ?seq . 
  FILTER(strlen(?seq) >5)
}

This materializes the sequence as a Python string instead of using the odgi.get_length(handle) method.
If we could push such filter constraints into the triples method, we would be faster because fewer intermediate objects would be generated.
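As a toy illustration of the pushdown, the sketch below checks the length first and only builds the string for matches. ToyGraph is a stand-in, not the odgi API: only get_length(handle) is named in this issue, while handles() and get_sequence() are hypothetical.

```python
class ToyGraph:
    """Stand-in for an odgi graph, backed by a dict of handle -> sequence."""

    def __init__(self, sequences):
        self._sequences = dict(sequences)
        self.materialized = 0  # how many sequence strings were handed out

    def handles(self):
        return list(self._sequences)

    def get_sequence(self, handle):
        self.materialized += 1
        return self._sequences[handle]

    def get_length(self, handle):
        # In odgi this would not need to build the Python string at all.
        return len(self._sequences[handle])


def sequences_longer_than(graph, minimum):
    """Evaluate FILTER(strlen(?seq) > minimum) before materializing ?seq."""
    for handle in graph.handles():
        if graph.get_length(handle) > minimum:
            yield graph.get_sequence(handle)


graph = ToyGraph({1: "ACGT", 2: "ACGTACGT", 3: "AC"})
long_sequences = list(sequences_longer_than(graph, 5))
```

With the three toy nodes above, only one string is materialized, where the naive plan would have built all three.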

How to query SpOdgi to get external Ensembl annotations

First we modify the example from Ensembl.

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> 
PREFIX skos:<http://www.w3.org/2004/02/skos/core#> 
PREFIX owl:<http://www.w3.org/2002/07/owl#> 
PREFIX dc:<http://purl.org/dc/terms/> 
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#> 
PREFIX faldo:<http://biohackathon.org/resource/faldo#> 
PREFIX ensembltranscript:<http://rdf.ebi.ac.uk/resource/ensembl.transcript/> 
PREFIX sio:<http://semanticscience.org/resource/> 
PREFIX dcterms:<http://purl.org/dc/terms/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
PREFIX obo:<http://purl.obolibrary.org/obo/>
PREFIX vg:<http://biohackathon.org/resource/vg#>

SELECT *

WHERE {
  ?target a vg:Step ;
    vg:node|vg:reverseNode ?node ;
    faldo:location ?stepLinearLocation .
  ?stepLinearLocation faldo:begin ?bp ;
    faldo:end ?ep .
  ?bp faldo:position ?stepBegin .
  ?ep faldo:position ?stepEnd  .
  BIND(<http://rdf.ebi.ac.uk/resource/ensembl/97/saccharomyces_cerevisiae/R64-1-1/VIII> AS ?ref) .
  SERVICE<https://www.ebi.ac.uk/rdf/services/sparql/>{
    SELECT DISTINCT ?transcript ?ref ?begin ?end {
      ?transcript a <http://rdf.ebi.ac.uk/terms/ensembl/protein_coding> .
      ?transcript faldo:location ?location .
      ?location faldo:begin
        [a faldo:ExactPosition ;
        faldo:position ?begin] .
      ?location faldo:end
        [a faldo:ExactPosition ;
        faldo:position ?end] .
        ?location faldo:reference ?ref .
      FILTER(?begin > ?stepBegin && ?end < ?stepEnd)
    } LIMIT 10
  }
}

Validate Prefixes

Currently, prefixes are not validated.
I tried out several queries and none of them worked until I realized that the prefix was spelled wrong.
Validating prefixes would be a huge relief for users.
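One possible shape for such a check, assuming the misspelling is in the namespace IRI of a PREFIX declaration: compare each declared IRI with a list of known namespaces and report near misses. The allow-list, the function name and the similarity cutoff are all hypothetical choices for this sketch.

```python
import difflib
import re

# Assumption: a small allow-list; a real one would come from the project.
KNOWN_NAMESPACES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://biohackathon.org/resource/vg#",
    "http://biohackathon.org/resource/faldo#",
}

# Matches declarations such as: PREFIX vg:<http://biohackathon.org/resource/vg#>
PREFIX_DECL = re.compile(r"PREFIX\s+([A-Za-z][\w-]*)?\s*:\s*<([^>]*)>", re.IGNORECASE)


def suspicious_prefixes(query, known=KNOWN_NAMESPACES):
    """Return (prefix, declared_iri, suggestion) for likely misspelled namespaces.

    A namespace is reported only if it is unknown but very similar to a known
    one, so unrelated namespaces (e.g. a local node base) are not flagged.
    """
    findings = []
    for name, iri in PREFIX_DECL.findall(query):
        if iri in known:
            continue
        close = difflib.get_close_matches(iri, sorted(known), n=1, cutoff=0.9)
        if close:
            findings.append((name, iri, close[0]))
    return findings
```

A warning built from these tuples could be printed before the query runs, so an empty result is not the only hint that something is off.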

odgi_to_rdf misleading error message when 2nd argument is missing

When odgi_to_rdf is executed without the 2nd argument, the error message is very cryptic about what is going on.

(/usr) [heumos@wave spodgi]$ time python odgi_to_rdf.py --syntax=ttl0k_R64-1-1.odgi 
Usage: odgi_to_rdf.py [OPTIONS] ODGIFILE TTL
Try "odgi_to_rdf.py --help" for help.

Error: Missing argument "TTL".

real    0m0.256s
user    0m0.231s
sys     0m0.025s

SPARQL endpoint over http?

The current code is command line only. We would need an rdflib-based SPARQL endpoint to make these pangenomes available on the semantic web.
