pangenome / spodgi
RDF and SPARQL ideas to build on top of [odgi](https://github.com/pangenome/odgi)
License: MIT License
I built the docker container and tried using it interactively, but for the scripts odgi_to_rdf.py and sparql_odgi.py I get the following import error:
root@9580c13cbb68:/spodgi# odgi_to_rdf.py
Traceback (most recent call last):
File "/usr/bin/odgi_to_rdf.py", line 3, in <module>
import rdflib
File "/usr/local/lib/python3.7/dist-packages/rdflib-7.0.0-py3.7.egg/rdflib/__init__.py", line 47, in <module>
from importlib import metadata
ImportError: cannot import name 'metadata' from 'importlib' (/usr/lib/python3.7/importlib/__init__.py)
For the sparql_server.py script I got a different error at import time, an AttributeError:
root@9580c13cbb68:/spodgi# ./sparql_server.py
Traceback (most recent call last):
File "./sparql_server.py", line 2, in <module>
from flask import Flask, request, jsonify, Response, g
File "/usr/local/lib/python3.7/dist-packages/flask-3.0.2-py3.7.egg/flask/__init__.py", line 5, in <module>
from . import json as json
File "/usr/local/lib/python3.7/dist-packages/flask-3.0.2-py3.7.egg/flask/json/__init__.py", line 6, in <module>
from ..globals import current_app
File "/usr/local/lib/python3.7/dist-packages/flask-3.0.2-py3.7.egg/flask/globals.py", line 6, in <module>
from werkzeug.local import LocalProxy
File "/usr/local/lib/python3.7/dist-packages/werkzeug-3.0.1-py3.7.egg/werkzeug/__init__.py", line 5, in <module>
from .serving import run_simple as run_simple
File "/usr/local/lib/python3.7/dist-packages/werkzeug-3.0.1-py3.7.egg/werkzeug/serving.py", line 76, in <module>
t.Union["ssl.SSLContext", t.Tuple[str, t.Optional[str]], t.Literal["adhoc"]]
AttributeError: module 'typing' has no attribute 'Literal'
Both of these issues seem to come from the fact that the docker container has Python 3.7 installed, but rdflib 7.0.0 and flask 3.0.2 require Python 3.8 or newer: importlib.metadata and typing.Literal were both added in Python 3.8.
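A quick way to confirm this diagnosis inside the container is to probe for the two standard-library features named in the tracebacks. The following is only a minimal sketch; the helper name `missing_features` and the `REQUIRED` list are made up for illustration and are not part of spodgi.

```python
import importlib

# (module, attribute) pairs taken from the two tracebacks above.
# None means "the module itself must be importable".
REQUIRED = [("importlib.metadata", None), ("typing", "Literal")]


def missing_features(required=REQUIRED):
    """Return the names of standard-library features this interpreter lacks."""
    missing = []
    for module_name, attr in required:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            missing.append(module_name)
            continue
        if attr is not None and not hasattr(module, attr):
            missing.append(f"{module_name}.{attr}")
    return missing


if __name__ == "__main__":
    # On Python 3.7 this should list both features; on 3.8+ it should be empty.
    print(missing_features())
```

If the list is non-empty, either the base image needs a newer Python or the rdflib and flask versions need to be pinned to releases that still support 3.7.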
For example, in the current vg model in RDF, all nodes have a {SOMEBASE}/node/{ID} IRI as identifier. These can be used as a hack to identify which methods to call.
Consider the SPARQL query:
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX node:<http://example.org/node/>
PREFIX vg:<http://biohackathon.org/resource/vg#>
SELECT
?node ?sequenceLength
WHERE {
BIND(node:25 as ?node)
?node a vg:Node ;
rdf:value ?sequence .
BIND(strlen(?sequence) AS ?sequenceLength)
}
Statically analysing the query AST, we should be able to determine that this requires a call to odgi.get_handle, as that will give us the handle for the node id.
ASK {
node:25 a vg:Node .
}
This can return true, as we can look at the IRI string to see that it is a node.
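The IRI inspection described above could look roughly like this. It is a sketch that assumes the {SOMEBASE}/node/{ID} layout with a numeric ID; the function names are hypothetical and nothing here calls odgi.

```python
import re
from typing import Optional

# Assumed layout from the text: {SOMEBASE}/node/{ID}, with a numeric ID.
NODE_IRI = re.compile(r"^(?P<base>.+)/node/(?P<id>\d+)$")


def node_id_from_iri(iri: str) -> Optional[int]:
    """Return the node id encoded in the IRI, or None if it is not a node IRI."""
    match = NODE_IRI.match(iri)
    return int(match.group("id")) if match else None


def ask_is_node(iri: str) -> bool:
    """Answer 'ASK { <iri> a vg:Node }' from the IRI string alone."""
    return node_id_from_iri(iri) is not None
```

Answering from the string alone treats every well-formed node IRI as an existing node; a real implementation would probably also want to check that the graph actually contains that id.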
SELECT
?sequence
WHERE
{
node:25 rdf:value ?sequence .
}
This can be mapped to odgi.get_handle, on which we can ask for the sequence string.
Then the engine can do a classic translation to sequence length by just calling the Python method.
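To make the mapping concrete, here is a small sketch of resolving a triple pattern with a bound node subject. `StubGraph` is a hypothetical stand-in so the example runs without odgi; its method names mirror the ones mentioned in this issue, but the real bindings may differ in signature. `objects_for` is likewise an illustrative name.

```python
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"


class StubGraph:
    """Hypothetical stand-in for an odgi graph; not the real bindings."""

    def __init__(self, sequences):
        self._sequences = sequences  # node id -> sequence string

    def get_handle(self, node_id):
        # In this stub the handle is simply the node id.
        return node_id

    def get_sequence(self, handle):
        return self._sequences[handle]


def objects_for(graph, subject_iri, predicate_iri):
    """Resolve '<subject> <predicate> ?o' for a node IRI subject."""
    node_id = int(subject_iri.rsplit("/node/", 1)[1])
    handle = graph.get_handle(node_id)
    if predicate_iri == RDF_VALUE:
        return [graph.get_sequence(handle)]
    return []  # other predicates are not handled in this sketch


graph = StubGraph({25: "GATTACA"})
sequences = objects_for(graph, "http://example.org/node/25", RDF_VALUE)
# strlen(?sequence) then becomes a plain Python len() on the bound value.
lengths = [len(seq) for seq in sequences]
```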
This is a limitation of rdflib 4.2.2. You can check out and pip install my branch of rdflib 5 with pip install --pre ~/git/rdflib
Make it easier to install and use SpOdgi as it is from master.
We need to figure out how to generate an IRI from a step. This requires at least two components. The name of the path, and either an ordinal offset or rank (step number in the path).
For the first we can use odgi::get_path_name.
I don't yet know how to find the ordinal.
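One possible approach, sketched under loud assumptions: if the steps of a path can be walked in order, the rank can be obtained by counting during the walk. The IRI layout `{base}/path/{name}/step/{rank}`, the 1-based rank, and both function names are assumptions for illustration, not something the issue settles.

```python
from urllib.parse import quote


def step_iri(base, path_name, rank):
    """Build a step IRI from a path name and a rank (assumed layout)."""
    # Percent-encode the path name so characters such as spaces or '/'
    # cannot break the IRI structure.
    return f"{base}/path/{quote(path_name, safe='')}/step/{rank}"


def step_iris(base, path_name, steps):
    """Yield one IRI per step, using the position in the walk as the rank."""
    for rank, _step in enumerate(steps, start=1):
        yield step_iri(base, path_name, rank)
```

Counting while iterating is only cheap when every step of the path is visited anyway; looking up the rank of a single arbitrary step this way would cost a walk from the start of the path.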
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT
?seq
WHERE {
?x rdf:value ?seq .
FILTER(strlen(?seq) >5)
}
This query materializes the sequence as a Python string instead of using the odgi.get_length(handle) method.
If we could push such filter constraints into the triples method, we would be faster because we would generate fewer intermediate objects.
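The pushdown idea can be sketched as follows. `CountingGraph` is a hypothetical stub that counts how many sequence strings get materialized, and `sequences_longer_than` is an illustrative name; how the constraint would actually reach the triples method in rdflib is not shown here.

```python
class CountingGraph:
    """Hypothetical stub graph that counts materialized sequence strings."""

    def __init__(self, sequences):
        self._sequences = sequences  # handle -> sequence string
        self.materialized = 0

    def get_length(self, handle):
        # Stands in for a cheap length lookup that builds no string.
        return len(self._sequences[handle])

    def get_sequence(self, handle):
        self.materialized += 1
        return self._sequences[handle]


def sequences_longer_than(graph, handles, min_length):
    """Yield (handle, sequence) only where strlen(sequence) > min_length."""
    for handle in handles:
        # Check the length first so short sequences are never materialized.
        if graph.get_length(handle) > min_length:
            yield handle, graph.get_sequence(handle)


graph = CountingGraph({1: "ACGT", 2: "ACGTACGT"})
result = list(sequences_longer_than(graph, [1, 2], 5))
```

In this toy run only one of the two sequences is turned into a Python string, which is the saving the paragraph above is after.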
First we modify the example from Ensembl.
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX dc:<http://purl.org/dc/terms/>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX faldo:<http://biohackathon.org/resource/faldo#>
PREFIX ensembltranscript:<http://rdf.ebi.ac.uk/resource/ensembl.transcript/>
PREFIX sio:<http://semanticscience.org/resource/>
PREFIX dcterms:<http://purl.org/dc/terms/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
PREFIX obo:<http://purl.obolibrary.org/obo/>
PREFIX vg:<http://biohackathon.org/resource/vg#>
SELECT *
WHERE {
?target a vg:Step ;
vg:node|vg:reverseNode ?node ;
faldo:location ?stepLinearLocation .
?stepLinearLocation faldo:begin ?bp ;
faldo:end ?ep .
?bp faldo:position ?stepBegin .
?ep faldo:position ?stepEnd .
BIND(<http://rdf.ebi.ac.uk/resource/ensembl/97/saccharomyces_cerevisiae/R64-1-1/VIII> AS ?ref) .
SERVICE<https://www.ebi.ac.uk/rdf/services/sparql/>{
SELECT DISTINCT ?transcript ?ref ?begin ?end {
?transcript a <http://rdf.ebi.ac.uk/terms/ensembl/protein_coding> .
?transcript faldo:location ?location .
?location faldo:begin
[a faldo:ExactPosition ;
faldo:position ?begin] .
?location faldo:end
[a faldo:ExactPosition ;
faldo:position ?end] .
?location faldo:reference ?ref .
FILTER(?begin > ?stepBegin && ?end < ?stepEnd)
} LIMIT 10
}
}
Currently, prefixes are not validated.
I tried out several queries and none of them worked, until I realized that the prefix was spelled wrong.
Validating prefixes would be a huge relief for users.
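A lightweight check could compare the namespaces declared in a query against the ones the graph is known to use. This is a sketch only: the `KNOWN_NAMESPACES` set and the function name are assumptions, and the regular expression is a rough approximation of the SPARQL PREFIX syntax rather than a full parser.

```python
import re

# Namespaces the exported RDF is assumed to use; adjust to the real data.
KNOWN_NAMESPACES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://biohackathon.org/resource/vg#",
    "http://biohackathon.org/resource/faldo#",
}

# Rough match for 'PREFIX name:<iri>' declarations (name may be empty).
PREFIX_DECL = re.compile(r"PREFIX\s+([A-Za-z][\w-]*)?\s*:\s*<([^>]*)>", re.IGNORECASE)


def unknown_prefixes(query, known=KNOWN_NAMESPACES):
    """Return (prefix, iri) pairs whose namespace IRI is not in `known`."""
    return [(name, iri) for name, iri in PREFIX_DECL.findall(query) if iri not in known]


query = """
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX vg:<http://biohackathon.org/resource/vg>
SELECT ?node WHERE { ?node a vg:Node }
"""
suspicious = unknown_prefixes(query)
```

Here the vg namespace is missing its trailing `#`, so it is reported. A warning is probably more appropriate than a hard error, since queries may legitimately use namespaces the graph does not know about.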
When odgi_to_rdf is executed without the 2nd argument, the error message is very cryptic about what is going on.
(/usr) [heumos@wave spodgi]$ time python odgi_to_rdf.py --syntax=ttl 0k_R64-1-1.odgi
Usage: odgi_to_rdf.py [OPTIONS] ODGIFILE TTL
Try "odgi_to_rdf.py --help" for help.
Error: Missing argument "TTL".
real 0m0.256s
user 0m0.231s
sys 0m0.025s
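One way to make this less surprising would be to document what TTL means and let it default to standard output. The message format above suggests the script uses click; the sketch below deliberately swaps in argparse from the standard library so it runs without extra dependencies. The help texts and the `-` convention are assumptions, not the current behaviour of odgi_to_rdf.py.

```python
import argparse


def build_parser():
    """Command line with a described, optional output argument."""
    parser = argparse.ArgumentParser(
        prog="odgi_to_rdf.py",
        description="Convert an odgi graph to RDF.",
    )
    parser.add_argument(
        "--syntax", default="ttl",
        help="RDF serialization to write (assumed default: ttl)",
    )
    parser.add_argument("odgifile", metavar="ODGIFILE", help="input odgi graph file")
    parser.add_argument(
        "ttl", metavar="TTL", nargs="?", default="-",
        help="output file for the RDF; '-' (the default) means standard output",
    )
    return parser


args = build_parser().parse_args(["--syntax=ttl", "0k_R64-1-1.odgi"])
```

With this layout the invocation from the report parses successfully and `args.ttl` is `-`, while `--help` explains what the second argument is for.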
As the tool will heavily rely on ODGI, would it make sense to integrate it as a submodule into the repository?
I have no clue how modern Python is supposed to be shipped. I would love some help on this!
This impacts ttl and sparql logic.
The current code is command-line only. We would need an rdflib-based SPARQL endpoint to make these pangenomes available on the semantic web.
for_each_handle and for_each_path are the two key methods that need an efficient implementation.
However, these are C++ internal iterators, while rdflib would like to see a generator.
I would love to see some help on how to best convert from the one into the other.
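One generic way to bridge the two styles is to run the callback-driven iteration in a worker thread and hand items over through a bounded queue. This is a sketch with a pure-Python stand-in for the C++ iterator; `callback_to_generator` and `for_each_number` are hypothetical names, and the convention that the callback returns True to continue is an assumption about the odgi bindings.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the iteration


def callback_to_generator(for_each, maxsize=1024):
    """Expose a callback-style (internal) iterator as a Python generator."""
    items = queue.Queue(maxsize=maxsize)

    def produce():
        try:
            # put() returns None, so 'or True' tells the iterator to continue.
            for_each(lambda item: items.put(item) or True)
        finally:
            items.put(_DONE)

    threading.Thread(target=produce, daemon=True).start()
    while True:
        item = items.get()
        if item is _DONE:
            return
        yield item


def for_each_number(callback):
    """Stand-in for for_each_handle: calls back until told to stop."""
    for n in range(5):
        if not callback(n):
            break


numbers = list(callback_to_generator(for_each_number))
```

The bounded queue keeps memory use flat, at the cost of a thread hand-off per item. Two caveats of this sketch: an exception raised inside the iteration is not propagated to the consumer, and a consumer that stops early leaves the worker blocked on a full queue. Simply collecting everything into a list first avoids both, but gives up laziness.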