metasoarous / tripl

This one weird trick turns JSON documents into semantic graph databases!

Python 100.00%

graph-data rdf datomic datascript json-data bioinformatics eav

tripl's Issues

tripl/tripl.py", line 586, in _entity_lookup - AttributeError: 'NoneType' object has no attribute 'keys'

The following code (mostly copy & paste from the readme):

from tripl import tripl


def cft_cons(name):
    return tripl.entity_cons('cft.type:' + name, 'cft.' + name)


def main():
    subject = cft_cons('subject')

    # Next our schema

    schema = {
        'cft.seq:timepoint': {'db:valueType': 'db.type:ref',
                              'db:cardinality': 'db.cardinality:many'},
        'cft.seq:subject': {'db:valueType': 'db.type:ref'}}

    ts = tripl.TripleStore(schema=schema, default_cardinality='db.cardinality:one')

    ts.assert_facts([
        subject(id='QA255')],
        id_attrs=['cft.timepoint:id', 'cft.seq:id', 'cft.subject:id'])

Causes an exception:

Traceback (most recent call last):
  File ".../venv/bin/...", line 11, in <module>
    load_entry_point('...', 'console_scripts', '...')()
  File ".../.../__init__.py", line 38, in main
    id_attrs=['cft.timepoint:id', 'cft.seq:id', 'cft.subject:id'])
  File "build/bdist.linux-x86_64/egg/tripl/tripl.py", line 521, in assert_facts
  File "build/bdist.linux-x86_64/egg/tripl/tripl.py", line 499, in assert_fact
  File "build/bdist.linux-x86_64/egg/tripl/tripl.py", line 472, in _assert_dict
  File "build/bdist.linux-x86_64/egg/tripl/tripl.py", line 448, in _resolve_eid
  File "build/bdist.linux-x86_64/egg/tripl/tripl.py", line 448, in <dictcomp>
  File "build/bdist.linux-x86_64/egg/tripl/tripl.py", line 591, in match
  File "build/bdist.linux-x86_64/egg/tripl/tripl.py", line 586, in _entity_lookup
AttributeError: 'NoneType' object has no attribute 'keys'

Do you have a hint at what might be causing this?

I am using Python 2.7.16 and an unmodified tripl from the master branch.

Clarify output formats in documentation

The documentation doesn't mention the EAV index option for assert_facts, or the EAV index as a file format option alongside lists of dicts.
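For readers unfamiliar with the two shapes, here is a minimal sketch of how the same facts look as a list of dicts versus an EAV (entity -> attribute -> values) index. The `db:id` key, the integer entity ids, and the nested-dict layout of the index are assumptions for illustration, not tripl's actual output format.

```python
from collections import defaultdict

# Hypothetical input: entities as a list of dicts, each carrying an integer id
# under the placeholder key "db:id" (an assumption, not tripl's convention).
entities = [
    {"db:id": 1, "person:name": "Momma Jones"},
    {"db:id": 2, "person:name": "Little Joe Jones", "person:parent": 1},
]


def to_eav_index(dicts):
    """Build an entity -> attribute -> set-of-values index from a list of dicts."""
    index = defaultdict(lambda: defaultdict(set))
    for d in dicts:
        eid = d["db:id"]
        for attr, value in d.items():
            if attr != "db:id":
                index[eid][attr].add(value)
    return index


def to_dicts(index):
    """Invert the index back into a list of dicts (single values unwrapped)."""
    out = []
    for eid, attrs in sorted(index.items()):
        d = {"db:id": eid}
        for attr, values in attrs.items():
            d[attr] = next(iter(values)) if len(values) == 1 else sorted(values)
        out.append(d)
    return out
```

For single-valued attributes the two shapes round-trip: `to_dicts(to_eav_index(entities)) == entities`.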

Build out the bio namespace

We want a collection of utilities for representing and working with sequence, tree, and tabular data. This will involve some data modelling work, tooling for slurping/spitting standard formats, and, in the case of ingest, linking the data to the rest of the store (for example, I'm imagining being able to specify a join between some sequence data and a CSV metadata file and representing the result as triples). There's also a lot of room for tooling at the build pipeline level, since pipelines tend to get into the semantics of the actual data ("for each subject, for each cell cluster, for each ..."); I'll probably build something along these lines specifically for nestly. This will likely have to be broken up into smaller pieces, so it's a bit of an epic issue.
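As a rough illustration of the join idea, the sketch below reads FASTA-formatted sequences and a CSV metadata table, joins them on a sequence id, and emits (entity, attribute, value) triples. The attribute names (`bio.seq:id`, `bio.seq:string`, `bio.seq:<column>`), the `id` join column, and the function names are hypothetical, and the parser is a bare-bones stand-in for a real FASTA reader such as Biopython's `Bio.SeqIO`.

```python
import csv
import io


def parse_fasta(text):
    """Yield (id, sequence) pairs from FASTA-formatted text (minimal parser)."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The id is the first whitespace-delimited token of the header.
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def join_to_triples(fasta_text, csv_text, key="id"):
    """Join sequences to CSV rows on `key` and return a list of triples.

    Sequence ids double as entity ids here; that is a simplification.
    """
    metadata = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    triples = []
    for seq_id, seq in parse_fasta(fasta_text):
        triples.append((seq_id, "bio.seq:id", seq_id))
        triples.append((seq_id, "bio.seq:string", seq))
        # Every non-key CSV column becomes one more attribute on the sequence.
        for column, value in metadata.get(seq_id, {}).items():
            if column != key:
                triples.append((seq_id, "bio.seq:" + column, value))
    return triples
```

For example, joining `">a1\nACTG\n"` with `"id,subject\na1,QA255\n"` yields three triples for `a1`, including `('a1', 'bio.seq:subject', 'QA255')`. Sequences without a metadata row still get their id and string triples.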

Integrate with GraphQL via pull

The pull query and the GraphQL query are homomorphic (both describe a nested selection of attributes to fetch), so it would be cool to think about how we could plug them together.

https://blog.codeship.com/an-introduction-to-graphql-via-the-github-api/
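To make the correspondence concrete, here is a minimal sketch that translates a Datomic-style pull pattern (a list of attribute names, with dicts mapping ref attributes to sub-patterns) into a GraphQL selection string. Both the Python shape of the pull pattern and the rule for mangling `namespace:name` attributes into legal GraphQL field names (`.` and `:` become `_`) are assumptions, not existing tripl behavior.

```python
def graphql_field(attr):
    """Mangle a namespaced attribute into a GraphQL-legal field name (assumed rule)."""
    return attr.replace(".", "_").replace(":", "_")


def pull_to_graphql(pattern, indent=0):
    """Render a pull pattern as a GraphQL selection set.

    Strings are leaf attributes; a dict maps a ref attribute to a sub-pattern.
    """
    pad = "  " * (indent + 1)
    lines = ["{"]
    for item in pattern:
        if isinstance(item, dict):
            for attr, sub_pattern in item.items():
                # Nested selections recurse with one more level of indentation.
                nested = pull_to_graphql(sub_pattern, indent + 1)
                lines.append(pad + graphql_field(attr) + " " + nested)
        else:
            lines.append(pad + graphql_field(item))
    lines.append("  " * indent + "}")
    return "\n".join(lines)
```

For the pattern `['person:name', {'person:parent': ['person:name']}]` this produces a `person_name` field plus a nested `person_parent { person_name }` selection. The mangling is lossy (`a.b:c` and `a:b.c` both become `a_b_c`), so a real integration would need a reversible naming scheme.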

Add documentation specific to bioinformatic uses

Probably a wiki page or .md file in docs. It should go a little more in depth into how these ideas apply to bioinformatics and what they might look like in a bioinformatic context, with tree/FASTA/CSV examples. It should also reference the nestly work.

Assert by reverse lookup attribute

Right now it's possible to query reverse relationships by using an underscore after the namespace separator (so the reverse relation for person:parent would be person:_parent, and would map from parents to children). This should be possible for assertions as well. Imagine you wanted to describe a mother with 10 children.

data = [
    {'person:name': 'Momma Jones',
     'person:_parent': [{'person:name': 'Little Joe Jones'}, {'person:name': 'Wilma Jenkins'}, ...]}]
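One way to support this would be to rewrite reverse attributes into forward assertions before they reach the store: each nested child gets the forward attribute (`person:parent`) pointing back at the parent. The sketch below assumes nested dicts are accepted as ref values and that the parent's non-reverse attributes are enough to identify it (for instance via `id_attrs=['person:name']`); the function names are hypothetical.

```python
def reverse_to_forward(attr):
    """Map 'person:_parent' -> 'person:parent'; return None for forward attributes."""
    namespace, sep, name = attr.partition(":")
    if sep and name.startswith("_"):
        return namespace + ":" + name[1:]
    return None


def expand_reverse_assertions(entity):
    """Split one entity dict into the parent dict plus forward-asserted children."""
    # The parent keeps only its ordinary (forward) attributes.
    parent = {k: v for k, v in entity.items() if reverse_to_forward(k) is None}
    facts = [parent]
    for attr, children in entity.items():
        forward = reverse_to_forward(attr)
        if forward is None:
            continue
        if isinstance(children, dict):  # allow a single child as well as a list
            children = [children]
        for child in children:
            # Each child refers back to the parent via the forward attribute.
            facts.append({**child, forward: parent})
    return facts
```

Applied to the Momma Jones example (without the `...`), this yields the mother's dict followed by one dict per child, each carrying `'person:parent': {'person:name': 'Momma Jones'}`.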

namespaced/keywords instead of namespaced:keywords

My argument is readability: when you specify a dict, there are many ":" characters, which (IMHO) decreases readability. Compare

        'mock:type': 'mock.type:seq',
        'mock.seq:id': 'a1',
        'mock.seq:string': 'ACTGA',
        'mock:description': 'some foo from bar',

with

        'mock/type': 'mock.type:seq',
        'mock.seq/id': 'a1',
        'mock.seq/string': 'ACTGA',
        'mock/description': 'some foo from bar',
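Because attribute names are plain strings, the switch is mechanical for keys. A minimal sketch, assuming the namespace separator is the first `:` in a key (the helper names are hypothetical); ident-valued strings such as `'mock.type:seq'` are left unchanged, just as in the example above.

```python
def slash_keys(entity):
    """Rewrite 'namespace:name' keys as 'namespace/name'; values are left as-is."""
    return {key.replace(":", "/", 1): value for key, value in entity.items()}


def colon_keys(entity):
    """Undo slash_keys for keys that contain no other '/' characters."""
    return {key.replace("/", ":", 1): value for key, value in entity.items()}
```

Running `slash_keys` on the first dict gives exactly the second one, and `colon_keys` converts it back.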

Document entity API

There's an entity API via tp.entity(entity_id) that lets you traverse the graph as a "live" graph of connected dicts. Some cardinality or reverse-lookup ref work may also be needed here, but either way, the details should be documented in the README.
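To illustrate what a "live" graph of connected dicts means, here is a minimal, hypothetical sketch: a read-only `Mapping` over an EAV-style index where ref-valued attributes resolve lazily to further entity views. The `EntityView` class, the index layout, and the explicit `ref_attrs` set are assumptions for illustration, not tripl's `entity` implementation; cardinality-many refs and reverse lookups are left out.

```python
from collections.abc import Mapping


class EntityView(Mapping):
    """Read-only, lazily traversed view of one entity in an index."""

    def __init__(self, index, eid, ref_attrs):
        self._index = index          # {eid: {attr: value}}
        self._eid = eid
        self._ref_attrs = ref_attrs  # attributes whose values are entity ids

    def __getitem__(self, attr):
        value = self._index[self._eid][attr]
        if attr in self._ref_attrs:
            # Follow the reference only when it is accessed.
            return EntityView(self._index, value, self._ref_attrs)
        return value

    def __iter__(self):
        return iter(self._index[self._eid])

    def __len__(self):
        return len(self._index[self._eid])
```

Because each lookup reads the underlying index at access time, the view reflects later changes to the store: with `index = {1: {'person:name': 'Momma Jones'}, 2: {'person:name': 'Little Joe Jones', 'person:parent': 1}}`, `EntityView(index, 2, {'person:parent'})['person:parent']['person:name']` returns `'Momma Jones'`.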

Finalize all special schema attribute names

Once we're out of alpha/beta we should just NEVER BREAK THE SCHEMA! Otherwise we'll have to figure out how to load data written under different versions, and that just sucks. So we have to settle on all these little things, like: what do we call our primary key? db:ident? tripl:id? Where do we install the schema? On a tripl:schema ident?
