pytassium's Introduction

pytassium

A Python library for working with Kasabi.com APIs.

Overview

This is a library for working with the APIs provided by Kasabi.

Homepage: http://github.com/iand/pytassium
PyPI: http://pypi.python.org/pypi/pytassium

Installing

Install with easy_install:

>sudo easy_install pytassium

If you already have it installed then simply upgrade with:

>sudo easy_install --upgrade pytassium

pytassium requires the following (these should be handled automatically by easy_install):

  • rdflib (>= 3.1.0)

Getting Started

The basic pattern of usage is as follows:

import pytassium
import time
dataset = pytassium.Dataset('nasa','put-your-api-key-here')

# --------------------------
# Use the lookup API
# --------------------------
response, data = dataset.lookup('http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan')
if response.status in range(200,300):
  # data now contains an rdflib.Graph
  print data.serialize(format="turtle") 
else:
  print "Oh no! %d %s - %s" % (response.status, response.reason, body)

# --------------------------
# Use the sparql API
# --------------------------
response, data = dataset.select('select ?s where {?s a <http://xmlns.com/foaf/0.1/Person>} limit 10')
if response.status in range(200,300):
  # data now contains a dictionary of results
  print data
else:
  print "Oh no! %d %s - %s" % (response.status, response.reason, body)

# --------------------------
# Use the attribution API
# --------------------------
response, data = dataset.attribution()
# assuming success, data now contains dictionary
print data['homepage']

# --------------------------
# Use the search API
# --------------------------
# search for 5 results matching apollo
response, data = dataset.search("apollo", 5)
for result in data['results']:
  print "%s (score: %s)" % (result['title'], result['score'])

# facet on a search for alan, with the name and type fields
fields = ['name', 'type']
query = "alan"
response, data = dataset.facet(query, fields)
for facet in data['fields']:
  print "Top %ss matching %s" % (facet['name'],query)
  for term in facet['terms']:
    print "%s (%s results)" % (term['value'], term['number'])


# --------------------------
# Use the reconciliation API
# --------------------------
# Reconcile one label
response, data = dataset.reconcile('Alan Shepard')
print "Best match is: %s" % data['result'][0]['id']

# Reconcile a list of labels
labels = ['Neil Armstrong','alan shepard']
response, data = dataset.reconcile(labels)
for i, label in enumerate(labels):
  print "Best match for %s is: %s" % (label, data['q%d' % i]['result'][0]['id'])

# Reconcile a label with specific parameters
response, data = dataset.reconcile('Apollo 11', limit=3, type='http://purl.org/net/schemas/space/Mission', type_strict='any')
print "Best match is: %s" % data['result'][0]['id']

# Reconcile with a specific query
query = {
    "query" : "Apollo 11",
    "limit" : 3,
    "type" : "http://purl.org/net/schemas/space/Mission",
    "type_strict" : "any",
}
response, data = dataset.reconcile(query)
print "Best match is: %s" % data['result'][0]['id']

# --------------------------
# Use the update API
# --------------------------
dataset = pytassium.Dataset('my-writable-dataset','put-your-api-key-here')

# Store the contents of a turtle file
dataset.store_file('/tmp/mydata.ttl', media_type='text/turtle') 

# Store data from a string
mytriples = "<http://example.com/foo> a <http://example.com/Cat> ."
dataset.store_data(mytriples, media_type='text/turtle') 

# --------------------------
# Use the jobs API
# --------------------------
response, job_uri = dataset.schedule_reset()
print "Reset scheduled, URI is: %s" % job_uri
print "Waiting for reset to complete"
done = False
while not done:
  response, data = dataset.job_status(job_uri)
  if response.status in range(200,300):
    if data['status'] == 'scheduled':
      print "Reset has not started yet"
    elif data['status'] == 'running':
      print "Reset is in progress"
    elif data['status'] == 'failed':
      print "Reset has failed :("
      done = True
    elif data['status'] == 'succeeded':
      print "Reset has completed :)"
      done = True

  if not done:
    time.sleep(5)

Using the pytassium command line

The pytassium package comes with a command line utility. Use it from the command line like this:

pytassium

You'll be presented with a command prompt:

>>>

First you need to tell it which dataset you want to work with; the "use" command does this. You can supply either the short name of the store or the full URI:

>>> use nasa
Using nasa

You also need to supply your API key:

>>> apikey yourapikey

You can also specify the dataset and apikey using the -d and -a command line options:

./pytassium -d nasa -a yourapikey

Alternatively you can specify the default apikey to use by setting the KASABI_API_KEY environment variable. In Linux:

export KASABI_API_KEY=yourapikey
pytassium -d nasa
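
On Windows the equivalent for cmd.exe is:

set KASABI_API_KEY=yourapikey
pytassium -d nasa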

The -a parameter will override the environment variable.

To quit pytassium, use the "exit" command:

>>> exit

Exploring a dataset

pytassium has a number of commands that help with exploring a dataset. First up is "sample", which returns a sample of the subjects from the data:

>>> sample
0. http://data.kasabi.com/dataset/nasa/launchsite/russia
1. http://data.kasabi.com/dataset/nasa/mission/apollo-10/role/lunar-module-pilot
2. http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan
3. http://data.kasabi.com/dataset/nasa/launchsite/tyuratambaikonurcosmodrome
4. http://data.kasabi.com/dataset/nasa/launchsite/xichang
5. http://data.kasabi.com/dataset/nasa/discipline/resupply/refurbishment/repair
6. http://data.kasabi.com/dataset/nasa/mission/apollo-10
7. http://www.bbc.co.uk/programmes/b00lg2xb#programme
8. http://data.kasabi.com/dataset/nasa/mission/apollo-11
9. http://data.kasabi.com/dataset/nasa/mission/apollo-11/role/backup-commander

You'll see that each URI is numbered. You can quickly describe that URI by typing the describe command followed by its number:

>>> describe 1
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix space: <http://purl.org/net/schemas/space/> .

<http://data.kasabi.com/dataset/nasa/mission/apollo-10/role/lunar-module-pilot> 
    a <http://purl.org/net/schemas/space/MissionRole>;
    rdfs:label "Apollo 10 Lunar Module Pilot";
    space:actor <http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan>;
    space:mission <http://data.kasabi.com/dataset/nasa/mission/apollo-10>;
    space:role <http://data.kasabi.com/dataset/nasa/roles/lunar-module-pilot> .

The numbers are remembered between each listing of URIs, so describe 2 will still work.

You can also describe by URI:

>>> describe <http://www.bbc.co.uk/programmes/b00lg2xb#programme>
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix po: <http://purl.org/ontology/po/> .

<http://www.bbc.co.uk/programmes/b00lg2xb#programme> a <http://purl.org/ontology/po/Episode>;
    dc:title "One Small Step";
    po:short_synopsis "The story of Neil Armstrong and Buzz Aldrin's trip to the moon.";
    foaf:primaryTopic <http://data.kasabi.com/dataset/nasa/mission/apollo-10>,
        <http://data.kasabi.com/dataset/nasa/mission/apollo-11>,
        <http://data.kasabi.com/dataset/nasa/person/edwineugenealdrinjr>,
        <http://data.kasabi.com/dataset/nasa/person/neilaldenarmstrong>;
    foaf:topic <http://data.kasabi.com/dataset/nasa/mission/apollo-10> .

You can get a count of the triples in a store:

>>> count
99448 triples

Or counts of various other types:

>>> count subjects
12357 subjects

>>> count classes
10 classes

>>> count properties
39 properties

You can also count occurrences of a class:

>>> count <http://xmlns.com/foaf/0.1/Person>
58 <http://xmlns.com/foaf/0.1/Person>

Or you can use the prefixed version (see below for more on prefixes):

>>> count foaf:Person
58 foaf:Person

The "show" command enables you to explore characteristics of the data:

>>> show classes
0. http://purl.org/net/schemas/space/MissionRole
1. http://purl.org/net/schemas/space/Mission
2. http://xmlns.com/foaf/0.1/Person
3. http://purl.org/net/schemas/space/Spacecraft
4. http://purl.org/net/schemas/space/Launch
5. http://xmlns.com/foaf/0.1/Image
6. http://purl.org/net/schemas/space/Discipline
7. http://purl.org/net/schemas/space/LaunchSite
8. http://purl.org/ontology/po/Episode
9. http://rdfs.org/ns/void#Dataset

>>> show properties
0. http://purl.org/net/schemas/space/place
1. http://www.w3.org/2000/01/rdf-schema#label
2. http://www.w3.org/1999/02/22-rdf-syntax-ns#type
3. http://purl.org/net/schemas/space/actor
4. http://purl.org/net/schemas/space/role
5. http://purl.org/net/schemas/space/mission
6. http://xmlns.com/foaf/0.1/name
7. http://purl.org/net/schemas/space/performed
8. http://www.w3.org/2002/07/owl#sameAs
9. http://purl.org/net/schemas/space/country
10. http://xmlns.com/foaf/0.1/isPrimaryTopicOf
11. http://purl.org/dc/elements/1.1/title
12. http://purl.org/net/schemas/space/missionRole
13. http://purl.org/ontology/po/short_synopsis

>>> show schemas
http://purl.org/net/schemas/space/
http://xmlns.com/foaf/0.1/
http://purl.org/ontology/po/
http://rdfs.org/ns/void#
http://www.w3.org/2000/01/rdf-schema#
http://www.w3.org/1999/02/22-rdf-syntax-ns#
http://www.w3.org/2002/07/owl#
http://purl.org/dc/elements/1.1/
http://purl.org/dc/terms/

>>> show topclasses
                    class                     | count
==============================================+======
http://purl.org/net/schemas/space/Spacecraft  | 6692 
http://purl.org/net/schemas/space/Launch      | 5090 
http://xmlns.com/foaf/0.1/Image               | 303  
http://purl.org/net/schemas/space/MissionRole | 142  
http://xmlns.com/foaf/0.1/Person              | 58   

The "show void" command lists all void descriptions in the dataset, or describes it if there is only one:

>>> show void
@prefix dcterm: <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .

<http://data.kasabi.com/dataset/nasa/> a <http://rdfs.org/ns/void#Dataset>;
    dcterm:description """
  Conversion of various NASA datasets into RDF, starting with the spacecraft data from the NSSDC master catalog
  """;
    dcterm:source <http://history.nasa.gov/SP-4029/Apollo_00a_Cover.htm>,
        <http://nssdc.gsfc.nasa.gov/nmc/>;
    dcterm:title "NASA Space Flight & Astronaut data";
    void:exampleResource <http://data.kasabi.com/dataset/nasa/mission/apollo-11>,
        <http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan>,
        <http://data.kasabi.com/dataset/nasa/person/neilaldenarmstrong>,
        <http://data.kasabi.com/dataset/nasa/spacecraft/1957-001B>,
        <http://data.kasabi.com/dataset/nasa/spacecraft/1969-059A>,
        <http://data.kasabi.com/dataset/nasa/spacecraft/1977-084A>;
    void:sparqlEndpoint <http://api.talis.com/stores/space/services/sparql>;
    void:uriRegexPattern "http://data.kasabi.com/dataset/nasa/.+" .

The "status" and "attribution" commands provide more information about a dataset:

>>> status
Status: published

>>> attribution
Name: NASA
Homepage: http://beta.kasabi.com/dataset/nasa
Source: http://beta.kasabi.com
Logo: http://api.kasabi.com/images/kasabi-20-20.png

Loading data

You can load data from a local file with the "store" command:

>>> store yourdata.nt
Uploading 'yourdata.nt'

The "store" command will automatically chunk ntriples files and load the pieces into the store. Note: the chunking does not take account of blank nodes, so don't use store on files larger than 2MB if they contain blank nodes.
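
The chunking is line-based: each ntriples line is a complete triple, so a file can be split safely on line boundaries unless blank nodes relate triples across lines. A minimal sketch of the idea follows; this is not pytassium's actual implementation, the 2MB threshold simply mirrors the chunk sizes the CLI reports, and text/plain is assumed as the classic ntriples media type.

import pytassium

CHUNK_SIZE = 2 * 1000 * 1000  # ~2MB, matching the chunk sizes reported by the CLI

def chunk_ntriples(filename):
    # Each ntriples line is a complete triple, so splitting on line
    # boundaries is safe -- unless blank nodes relate triples across
    # lines, which is why the warning above exists.
    chunk, size = [], 0
    for line in open(filename):
        chunk.append(line)
        size += len(line)
        if size >= CHUNK_SIZE:
            yield ''.join(chunk)
            chunk, size = [], 0
    if chunk:
        yield ''.join(chunk)  # the final, smaller chunk

dataset = pytassium.Dataset('my-writable-dataset', 'put-your-api-key-here')
for piece in chunk_ntriples('yourdata.nt'):
    dataset.store_data(piece, media_type='text/plain')  # classic ntriples media type (assumption)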

A future version will add support for the ingest service.

Querying data

pytassium provides a "sparql" command to run a sparql query. It will attempt to format the results nicely.

>>> sparql select * where {?s a <http://xmlns.com/foaf/0.1/Person>} limit 5
                                 s                                 
===================================================================
http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan      
http://data.kasabi.com/dataset/nasa/person/jamesarthurlovelljr     
http://data.kasabi.com/dataset/nasa/person/richardfrancisgordonjr  
http://data.kasabi.com/dataset/nasa/person/robertfranklynovermyer  
http://data.kasabi.com/dataset/nasa/person/edgardeanmitchellusn/scd

The "sparql" command expands well-known prefixes automatically:

>>> sparql select ?title where {?s a space:Mission; dc:title ?title } limit 5
  title  
=========
Apollo 10
Apollo 11
Apollo 12
Apollo 17
Apollo 15

You can use "show prefixes" to list the recognised prefixes:

>>> show prefixes
foaf: <http://xmlns.com/foaf/0.1/>
owl: <http://www.w3.org/2002/07/owl#>
xsd: <http://www.w3.org/2001/XMLSchema#>
bibo: <http://purl.org/ontology/bibo/>

You can add your own prefixes with the "prefix" command:

>>> prefix ex <http://example.com/foo/>

By default, when pytassium starts up it attempts to fetch a list of common prefixes from http://prefix.cc. This file is cached in the system temp directory for future use.
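
A minimal sketch of that fetch-and-cache step, assuming prefix.cc's JSON endpoint and an illustrative cache filename (pytassium's actual code may differ):

import json
import os
import tempfile
import urllib2  # Python 2, as used by the examples in this README

def load_prefixes():
    # Cache the prefix list in the system temp directory so later
    # sessions can start without a network round trip.
    cache = os.path.join(tempfile.gettempdir(), 'pytassium-prefixes.json')  # filename assumed
    if not os.path.exists(cache):
        data = urllib2.urlopen('http://prefix.cc/popular/all.file.json').read()
        open(cache, 'w').write(data)
    return json.load(open(cache))

prefixes = load_prefixes()  # e.g. prefixes['foaf'] -> 'http://xmlns.com/foaf/0.1/'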

Searching

pytassium provides the "search" command for accessing a dataset's search API. All following parameters are assumed to be the search query:

>>> search apollo
0. Apollo 6 (score: 1.0)
1. ASTP-Apollo (score: 0.9938665)
2. Apollo 7 (score: 0.9717672)
3. Apollo 10 (score: 0.9620834)
4. Apollo 8 (score: 0.9620834)

>>> search apollo 13
0. Apollo 13 (score: 1.0)
1. Apollo 13 Lunar Module/ALSEP (score: 0.97858286)
2. Apollo 13 Command and Service Module (CSM) (score: 0.83995086)
3. Apollo 13 SIVB (score: 0.7720434)
4. Soyuz 13 (score: 0.71551764)

>>> search "apollo 13"
0. Apollo 13 (score: 1.0)
1. Apollo 13 Lunar Module/ALSEP (score: 0.9803725)
2. Apollo 13 Command and Service Module (CSM) (score: 0.84758013)
3. Apollo 13 SIVB (score: 0.49793136)
4. Apollo 13 Capsule Communicator (score: 0.41402295)

Reconciling data

pytassium provides a "reconcile" command which invokes the dataset's reconciliation service.

>>> reconcile apollo
0. http://data.kasabi.com/dataset/nasa/spacecraft/1968-025A (score: 1.0)
1. http://data.kasabi.com/dataset/nasa/spacecraft/1975-066A (score: 0.9938665)
2. http://data.kasabi.com/dataset/nasa/spacecraft/1968-089A (score: 0.9717672)
3. http://data.kasabi.com/dataset/nasa/spacecraft/1969-043A (score: 0.9620834)
4. http://data.kasabi.com/dataset/nasa/spacecraft/1968-118A (score: 0.9620834)

Enclose multi-word labels in quotes:

>>> reconcile "apollo 13"
0. http://data.kasabi.com/dataset/nasa/mission/apollo-13 (score: 1.0)
1. http://data.kasabi.com/dataset/nasa/spacecraft/1970-029C (score: 0.97858286)
2. http://data.kasabi.com/dataset/nasa/spacecraft/1970-029A (score: 0.83995086)
3. http://data.kasabi.com/dataset/nasa/spacecraft/1970-029B (score: 0.7720434)
4. http://data.kasabi.com/dataset/nasa/spacecraft/1973-103A (score: 0.71551764)

Specify a type:

>>> reconcile apollo space:Mission
0. http://data.kasabi.com/dataset/nasa/mission/apollo-13 (score: 1.0)
1. http://data.kasabi.com/dataset/nasa/mission/apollo-12 (score: 1.0)
2. http://data.kasabi.com/dataset/nasa/mission/apollo-14 (score: 1.0)
3. http://data.kasabi.com/dataset/nasa/mission/apollo-7 (score: 1.0)
4. http://data.kasabi.com/dataset/nasa/mission/apollo-10 (score: 1.0)

Resetting a dataset

You can schedule a reset job on your dataset:

>>> reset
Scheduling reset job for immediate execution
Reset scheduled, URI is: http://api.kasabi.com/dataset/id-test-dataset/jobs/8777c36e-a904-4498-b837-bcc214a9216d
Reset has not started yet
Reset is in progress
Reset has completed

Batch scripts

pytassium provides a -f command line option which specifies a filename containing commands to run. When pytassium is invoked with the -f option it reads the commands from the file, runs them and then terminates:

./pytassium -f /tmp/myscript

You can save the history from an interactive session with the "save" command:

>>> save history /tmp/newscript

And execute the commands in any script with the "run" command:

>>> run /tmp/newscript

Command line operation

Any parameters supplied on the command line are assumed to be a command for pytassium. It runs the command and then terminates:

pytassium -a yourapikey -d nasa show classes
0. http://purl.org/net/schemas/space/MissionRole
1. http://purl.org/net/schemas/space/Mission
2. http://xmlns.com/foaf/0.1/Person

SPARQL queries will typically need to be enclosed in quotes:

pytassium -a yourapikey -d nasa sparql "select * where {?s a <http://xmlns.com/foaf/0.1/Person>}"
                                 s                                 
===================================================================
http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan      
http://data.kasabi.com/dataset/nasa/person/jamesarthurlovelljr     
http://data.kasabi.com/dataset/nasa/person/richardfrancisgordonjr  
http://data.kasabi.com/dataset/nasa/person/robertfranklynovermyer  
http://data.kasabi.com/dataset/nasa/person/edgardeanmitchellusn/scd

Multi-word reconciliations will need quotes to be doubled or escaped, otherwise the second word will be treated as the type:

pytassium -a yourapikey -d nasa reconcile "'apollo 13'" space:Mission
0. http://data.kasabi.com/dataset/nasa/mission/apollo-13 (score: 1.0)

A common pattern is to reset a dataset and load some fresh data into it:

pytassium -a yourapikey -d yourdataset reset
pytassium -a yourapikey -d yourdataset store yourdata.nt

To-do

The following APIs are not yet implemented:

  • Augmentation

Related Projects

If Python's not your thing, you may also be interested in:

Author

Ian Davis, [email protected]

Licence

This work is hereby released into the Public Domain.

To view a copy of the public domain dedication, visit http://creativecommons.org/licenses/publicdomain or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.


pytassium's Issues

attempting to upload data fails with missing graph dependency

I tried both this Python script:

import pytassium
import time

dataset = pytassium.Dataset('chembl-rdf','XXXX')

# Store the contents of a turtle file
dataset.store_file('docs.ttl', media_type='text/turtle')

and with the pytassium command line:

pytassium -a XXX -d chembl-rdf store docs.ttl

Both give this error about the missing graph package:

Traceback (most recent call last):
  File "/usr/local/bin/pytassium", line 5, in <module>
    pkg_resources.run_script('pytassium==0.2.2', 'pytassium')
  File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line 467, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line 1200, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/local/lib/python2.6/dist-packages/pytassium-0.2.2-py2.6.egg/EGG-INFO/scripts/pytassium", line 17, in <module>
    from pytassium import Dataset
  File "/usr/local/lib/python2.6/dist-packages/pytassium-0.2.2-py2.6.egg/pytassium/__init__.py", line 14, in <module>
    from rdflib.graph import Graph
ImportError: No module named graph

I tried: $ sudo easy_install graph
Searching for graph
Reading http://pypi.python.org/simple/graph/
Reading http://robertdick.org/python/mods.html
Reading http://ziyang.ece.northwestern.edu/~dickrp/python/mods.html
Download error: [Errno -2] Name or service not known -- Some packages may not be found!
Reading http://ziyang.ece.northwestern.edu/~dickrp/python/mods.html
Download error: [Errno -2] Name or service not known -- Some packages may not be found!
Reading http://ziyang.ece.northwestern.edu/~dickrp/python/mods.html
Download error: [Errno -2] Name or service not known -- Some packages may not be found!
Best match: graph 0.4
Downloading http://robertdick.org/python/graph-0.4.tar.gz
Processing graph-0.4.tar.gz
Running graph-0.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-pJhiqC/graph-0.4/egg-dist-tmp-BDsCi7
zip_safe flag not set; analyzing archive contents...
Adding graph 0.4 to easy-install.pth file

Installed /usr/local/lib/python2.6/dist-packages/graph-0.4-py2.6.egg
Processing dependencies for graph
Finished processing dependencies for graph

... but that did not help either.

The docs do not seem to say anything about it, and the install config files apparently assume the right graph version is installed, without defining the right dependency?

wishlist: threaded/parallel uploading?

Currently uploading is rather slow: with chunks of 2MB it effectively uploads one chunk every few seconds, not limited by network throughput. This means that uploading large amounts of data takes a very long time. On #kasabi it was suggested to split my .ttl input into several files (I used split) and run pytassium on those files in parallel.

This feature request is for the library itself to do this, with a reasonable number of parallel upload threads; a sketch of the manual workaround follows.
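
A minimal sketch of the workaround suggested on #kasabi: split the input file and upload the pieces from parallel threads. The dataset name, file pattern and worker function are illustrative assumptions, not part of pytassium.

import glob
import threading
import pytassium

def upload(path):
    # One Dataset per thread, in case a shared client is not thread-safe.
    dataset = pytassium.Dataset('chembl-rdf', 'put-your-api-key-here')
    dataset.store_file(path, media_type='text/turtle')

# Upload every piece produced by `split` in parallel. A real
# implementation would cap the number of concurrent workers.
threads = [threading.Thread(target=upload, args=(p,))
           for p in glob.glob('/tmp/parts/x*')]
for t in threads:
    t.start()
for t in threads:
    t.join()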

Carriage returns cause changeset to fail

If I try to remove a triple that contains a carriage return, calling apply_changeset() fails with a 409 saying that the triple does not exist, when I know that it does. I finally got it to work by altering apply_changeset: after the serialize() line and before the actual request(), I added this line:

data = data.replace('\r', '&#13;')

...which then leads to it working. So the carriage returns are lost somewhere if they're not escaped in the XML representation of the changeset. Hope that's helpful.

last chunk not uploaded?

For two subsets of the ChEMBL-RDF data I noted that the last reported chunk also had a size of about 2MB. For the first file I could accept it being chance, but seeing it with both suggests that the final, smaller chunk is not uploaded:

ls -al *.nt
-rw-r--r-- 1 egonw egonw 41360761 Aug 15 11:54 docs.nt
-rw-r--r-- 1 egonw egonw 24221404 Aug 15 11:31 targets.nt

pytassium -a 9f349b2668f976950855ad2b37376731fa354a85 -d chembl-rdf store docs.nt
Uploading 'docs.nt'
Storing chunk 1 of docs.nt (2000096 bytes)
Storing chunk 2 of docs.nt (2000025 bytes)
Storing chunk 3 of docs.nt (2000117 bytes)
Storing chunk 4 of docs.nt (2000003 bytes)
Storing chunk 5 of docs.nt (2000056 bytes)
Storing chunk 6 of docs.nt (2000051 bytes)
Storing chunk 7 of docs.nt (2000096 bytes)
Storing chunk 8 of docs.nt (2000043 bytes)
Storing chunk 9 of docs.nt (2000103 bytes)
Storing chunk 10 of docs.nt (2000018 bytes)
Storing chunk 11 of docs.nt (2000094 bytes)
Storing chunk 12 of docs.nt (2000006 bytes)
Storing chunk 13 of docs.nt (2000102 bytes)
Storing chunk 14 of docs.nt (2000101 bytes)
Storing chunk 15 of docs.nt (2000084 bytes)
Storing chunk 16 of docs.nt (2000037 bytes)
Storing chunk 17 of docs.nt (2000094 bytes)
Storing chunk 18 of docs.nt (2000106 bytes)
Storing chunk 19 of docs.nt (2000087 bytes)
Storing chunk 20 of docs.nt (2000063 bytes)

No 21st chunk.

I checked one of the last triples in my docs.nt and it seems indeed missing from kasabi:

http://data.kasabi.com/dataset/chembl-rdf/09/resource/r52546

The upload for the other one looks like:

pytassium -a 9f349b2668f976950855ad2b37376731fa354a85 -d chembl-rdf store targets.nt
Uploading 'targets.nt'
Storing chunk 1 of targets.nt (2000113 bytes)
Storing chunk 2 of targets.nt (2000072 bytes)
Storing chunk 3 of targets.nt (2000011 bytes)
Storing chunk 4 of targets.nt (2000059 bytes)
Storing chunk 5 of targets.nt (2000051 bytes)
Storing chunk 6 of targets.nt (2000563 bytes)
Storing chunk 7 of targets.nt (2000109 bytes)
Storing chunk 8 of targets.nt (2000149 bytes)
Storing chunk 9 of targets.nt (2000104 bytes)
Storing chunk 10 of targets.nt (2000603 bytes)
Storing chunk 11 of targets.nt (2000075 bytes)
Storing chunk 12 of targets.nt (2000063 bytes)

without a 13th chunk.

>>> describe 1 in README.md error

The README.md documents:

describe 1

However, doing that here gives an error:

$ pytassium
>>> use cdk-cito
>>> apikey 9f349b2668f976950855ad2b37376731fa354a85
>>> describe 1
1 does not look like a URI

easy_install failed with HTML page for rdflib download

On Debian testing I get:

$ sudo easy_install pytassium
Processing pytassium
Running setup.py -q bdist_egg --dist-dir /home/egonw/var/Projects/GitHub/pytassium/egg-dist-tmp-mOcFj1
zip_safe flag not set; analyzing archive contents...
pytassium 0.2.8 is already the active version in easy-install.pth
Installing pytassium script to /usr/local/bin

Installed /usr/local/lib/python2.7/dist-packages/pytassium-0.2.8-py2.7.egg
Processing dependencies for pytassium==0.2.8
Searching for rdflib>=3.1.0
Reading http://pypi.python.org/simple/rdflib/
Reading http://rdflib.net/
Best match: rdflib 3.2.0
Downloading http://rdflib.net/rdflib-3.2.0.tar.gz
error: Unexpected HTML page found at http://rdflib.net/rdflib-3.2.0.tar.gz

A manual download of that tar.gz with wget also gives an HTML page, complaining about an unsupported web browser.

Querying draft dataset APIs

iand, thanks so much for this library, you are making my life so easy right now. :) I think I may have found a bug, but it may be an issue with my dataset.

I have a dataset called ons-stats that's in draft; I should be able to, and can, query it via, for instance, the sparql API. It seems I can't make this query when using pytassium, however.

dataset = pytassium.Dataset('ons-stats','5dac588049e18e51fead5e788dc2449e38c2077a')

response, data = dataset.select("SELECT * WHERE{?s ?p ?o} limit 10")

if response.status in range(200,300):
    # data now contains a dictionary of results
    import pprint
    pprint.pprint(data)
else:
    print "Oh no! %d %s " % (response.status, response.reason)

gives output:

Traceback (most recent call last):
  File "/home/tim/workspace/KasabiUpdate/src/pytassiumTest.py", line 11, in <module>
    response, data = dataset.select("SELECT * WHERE{?s ?p ?o} limit 10")
  File "/usr/lib/python2.7/site-packages/pytassium/__init__.py", line 394, in select
    raise PytassiumError("Dataset does not have a sparql api")
pytassium.PytassiumError: Dataset does not have a sparql api

But if I use pytassium with a published repo it works fine.

Another thing, do you know any similar libraries for querying rdf stores but not limited to kasabi? Is it possible to use pytassium for other sparql endpoints?

Counting

Hi Ian,

If I'm using a particular dataset and have sampled it, I can then count it. If I run "count" before sampling (expecting it to give me the number of triples in the 'set) I get an error:

Traceback (most recent call last):
  File "/Library/Python/2.6/site-packages/pytassium-0.2.2-py2.6.egg/EGG-INFO/scripts/pytassium", line 128, in __call__
  File "/Library/Python/2.6/site-packages/pytassium-0.2.2-py2.6.egg/EGG-INFO/scripts/pytassium", line 525, in handle_count
  File "/Library/Python/2.6/site-packages/pytassium-0.2.2-py2.6.egg/EGG-INFO/scripts/pytassium", line 136, in execute
  File "build/bdist.macosx-10.6-universal/egg/pytassium/__init__.py", line 395, in select
  File "build/bdist.macosx-10.6-universal/egg/pytassium/__init__.py", line 107, in select
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/etree/ElementTree.py", line 964, in XML
    return parser.close()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/etree/ElementTree.py", line 1254, in close
    self._parser.Parse("", 1) # end of data
ExpatError: no element found: line 7, column 0

If I "sample" the dataset first, I can then run count.

What I'm not sure of is whether I'm "counting" the triples in a sample, or whether it's running over the whole set?

So:

use dataset
sample
0
1
2
...
count
n triples

Does n==triples in the sample?

last chunk not uploaded

A while ago I filed issue #4 about the last chunk of my files not being uploaded. At the time a fix was committed, but unfortunately it did not fix the problem for me; the last chunk is still not uploaded...

I cannot reopen the original report, so I'm opening this new one, but #4 has some debug info:

#4
