

sparql-engine


An open-source framework for building SPARQL query engines in Javascript/Typescript.

Online documentation

Main features:

  • Evaluate SPARQL queries on top of any data storage system, by implementing a single Graph class.
  • Support for the SPARQL UPDATE protocol.
  • Automatic evaluation of federated SPARQL queries, using the SERVICE keyword.
  • Support for custom SPARQL functions and full text search.
  • Customizable query execution pipeline and semantic caching.


Installation

npm install --save sparql-engine

Getting started

The sparql-engine framework allows you to build a custom SPARQL query engine on top of any data storage system.

In short, to support SPARQL queries on top of your data storage system, you need to:

  • Implement a subclass of Graph, which exposes your storage system as an RDF Graph (see RDF Graphs).
  • Gather your graphs into an RDF Dataset, e.g., using the provided HashMapDataset (see RDF Datasets).
  • Use a PlanBuilder to parse SPARQL queries and execute them against the dataset (see Running a SPARQL query).

Examples

As a starting point, we provide you with two examples of integration:

  • The N3 example, which implements a Graph on top of an in-memory store from the N3.js library.
  • The LevelGraph example, which implements a Graph, with a custom evalBGP method, on top of LevelGraph.

Preliminaries

SPARQL.js algebra and TypeScript

The sparql-engine framework uses the SPARQL.js library for parsing and manipulating SPARQL queries as JSON objects. For TypeScript compilation, we use a custom package, sparqljs-legacy-type, to provide the type information.

Thus, if you are working with sparql-engine in TypeScript, you will need to install the sparqljs-legacy-type package.

If you want to know why we use a custom types package, see the discussion of this issue.

RDF triples representation

This framework represents RDF triples as plain JavaScript objects. You will find below, in TypeScript-like syntax, the "shape" of such an object.

interface TripleObject {
  subject: string; // The Triple's subject
  predicate: string; // The Triple's predicate
  object: string; // The Triple's object
}
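
For instance, the triple :a foaf:name "a" would be represented as follows. Note that, in this string-based representation, literals keep their enclosing double quotes.

const triple = {
  subject: 'http://example.org#a',
  predicate: 'http://xmlns.com/foaf/0.1/name',
  object: '"a"'
}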

PipelineStage

The sparql-engine framework uses a pipeline of iterators to execute SPARQL queries. Thus, many methods encountered in this framework need to return a PipelineStage<T>, i.e., an object that generates items of type T in a pull-based fashion.

A PipelineStage<T> can easily be created from common source objects, such as arrays, iterables or promises.

To create a new PipelineStage<T> from one of these objects, you can use the following code:

const { Pipeline } = require('sparql-engine')

// the object to convert into a PipelineStage, e.g., an array
const sourceObject = [1, 2, 3]

const stage = Pipeline.getInstance().from(sourceObject)
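
The resulting PipelineStage can then be consumed with its subscribe method, exactly like the query results shown in the Running a SPARQL query section below:

stage.subscribe(
  item => console.log(item),
  err => console.error(err),
  () => console.log('done')
)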

For more information on how to create and manipulate the pipeline, please refer to the documentation of Pipeline and PipelineEngine.

RDF Graphs

The first thing to do is to implement a subclass of the Graph abstract class. A Graph represents an RDF Graph and is responsible for inserting, deleting and searching for RDF triples in the database.

The main method to implement is Graph.find(triple), which is used by the framework to find RDF triples matching a triple pattern in the RDF Graph. This method must return a PipelineStage<TripleObject>, which will be consumed to find matching RDF triples. You can find an example of such an implementation in the N3 example.

Similarly, to support the SPARQL UPDATE protocol, you have to provide a graph that implements the Graph.insert(triple) and Graph.delete(triple) methods, which insert and delete RDF triples from the graph, respectively. These methods must return Promises, which are fulfilled when the insertion/deletion operation is completed.

Finally, the sparql-engine framework also lets you customize how Basic Graph Patterns (BGPs) are evaluated against the RDF graph. The engine provides a default implementation based on the Graph.find method and the Index Nested Loop Join algorithm. However, if you wish to supply your own implementation for BGP evaluation, you just have to implement a Graph with an evalBGP(triples) method. This method must return a PipelineStage<Bindings>. You can find an example of such an implementation in the LevelGraph example. A minimal sketch of such an override follows.
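
In this sketch, queryBackend is a hypothetical helper (not part of the framework) standing in for whatever API your storage system exposes for evaluating a set of triple patterns:

const { Graph, Pipeline, BindingBase } = require('sparql-engine')

class GraphWithCustomBGP extends Graph {
  /* find, insert and delete omitted ... */

  evalBGP (triples, context) {
    // `queryBackend` is a hypothetical helper that evaluates a set of
    // triple patterns and returns raw solution mappings, e.g.,
    // [ { '?s': 'http://example.org#a' }, ... ]
    const solutions = queryBackend(triples)
    // wrap each raw mapping into a Bindings object
    const bindings = solutions.map(mapping => BindingBase.fromObject(mapping))
    return Pipeline.getInstance().from(bindings)
  }
}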

You will find below, in TypeScript-like syntax, an example subclass of a Graph.

  const { Graph } = require('sparql-engine')

  class CustomGraph extends Graph {
    /**
     * Returns an iterator that finds RDF triples matching a triple pattern in the graph.
     * @param  triple - Triple pattern to find
     * @return A PipelineStage which produces RDF triples matching a triple pattern
     */
    find (triple: TripleObject, options: Object): PipelineStage<TripleObject> { /* ... */ }

    /**
     * Insert a RDF triple into the RDF Graph
     * @param  triple - RDF Triple to insert
     * @return A Promise fulfilled when the insertion has been completed
     */
    insert (triple: TripleObject): Promise { /* ... */ }

    /**
     * Delete a RDF triple from the RDF Graph
     * @param  triple - RDF Triple to delete
     * @return A Promise fulfilled when the deletion has been completed
     */
    delete (triple: TripleObject): Promise { /* ... */ }
  }
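
To make this more concrete, here is a minimal sketch of a Graph backed by a plain JavaScript array. It is ours, not part of the framework, and depending on the framework version, additional methods (e.g., clear or estimateCardinality) may also need to be implemented:

const { Graph, Pipeline } = require('sparql-engine')

// returns true if a triple matches a triple pattern, where SPARQL
// variables (terms starting with '?') match any value
function matches (triple, pattern) {
  return ['subject', 'predicate', 'object'].every(pos => {
    return pattern[pos].startsWith('?') || pattern[pos] === triple[pos]
  })
}

class ArrayGraph extends Graph {
  constructor () {
    super()
    this._triples = []
  }

  find (pattern) {
    const results = this._triples.filter(t => matches(t, pattern))
    return Pipeline.getInstance().from(results)
  }

  insert (triple) {
    this._triples.push(triple)
    return Promise.resolve()
  }

  delete (triple) {
    this._triples = this._triples.filter(t => !matches(t, triple))
    return Promise.resolve()
  }
}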

RDF Datasets

Once you have your subclass of Graph ready, you need to build a collection of RDF Graphs, called an RDF Dataset. A default implementation, HashMapDataset, is made available by the framework, but you can build your own by subclassing Dataset.

 const { HashMapDataset } = require('sparql-engine')
 const CustomGraph = require(/* import your Graph subclass */)

 const GRAPH_A_IRI = 'http://example.org#graph-a'
 const GRAPH_B_IRI = 'http://example.org#graph-b'
 const graph_a = new CustomGraph(/* ... */)
 const graph_b = new CustomGraph(/* ... */)

 // we set graph_a as the Default RDF dataset
 const dataset = new HashMapDataset(GRAPH_A_IRI, graph_a)

 // insert graph_b as a Named Graph
 dataset.addNamedGraph(GRAPH_B_IRI, graph_b)
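
Graphs registered in this way can later be retrieved by their IRI, using the dataset's getNamedGraph method:

// retrieve a named graph by its IRI
const namedGraph = dataset.getNamedGraph(GRAPH_B_IRI)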

Running a SPARQL query

Finally, to run a SPARQL query on your RDF dataset, you need to use the PlanBuilder class. It is responsible for parsing SPARQL queries and building a pipeline of iterators to evaluate them.

  const { PlanBuilder } = require('sparql-engine')

  // Get the name of all people in the Default Graph
  const query = `
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name
    WHERE {
      ?s a foaf:Person .
      ?s foaf:name ?name .
    }`

  // Creates a plan builder for the RDF dataset
  const builder = new PlanBuilder(dataset)

  // Get an iterator to evaluate the query
  const iterator = builder.build(query)

  // Read results
  iterator.subscribe(
    bindings => console.log(bindings),
    err => console.error(err),
    () => console.log('Query evaluation complete!')
  )
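
Note that, for SPARQL UPDATE queries, PlanBuilder.build returns a Consumable instead of a PipelineStage: rather than subscribing to it, you call its execute method, which returns a Promise. A minimal sketch:

// Insert a triple into the Default Graph
const updateQuery = `
  PREFIX dc: <http://purl.org/dc/elements/1.1/>
  INSERT DATA { <http://example/book1> dc:title "Fundamentals of Compiler Design" }`

builder.build(updateQuery)
  .execute()
  .then(() => console.log('Update completed!'))
  .catch(err => console.error(err))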

Enable caching

The sparql-engine framework provides support for automatic caching of Basic Graph Pattern evaluation, using the Semantic Cache algorithm. Basically, the cache saves the results of BGPs already evaluated and, when the engine wants to evaluate a BGP, it looks for the largest subset of that BGP in the cache. If one is available, it re-uses the cached results to speed up query processing.

By default, semantic caching is disabled. You can turn it on/off using the PlanBuilder.useCache and PlanBuilder.disableCache methods, respectively. The useCache method accepts an optional parameter, so you can provide your own implementation of the semantic cache. By default, it uses an in-memory LRU cache which stores up to 500MB of items for 20 minutes.

// get an instance of a PlanBuilder
const builder = new PlanBuilder(/* ... */)

// activate the cache
builder.useCache()

// disable the cache
builder.disableCache()

Full Text Search

The sparql-engine framework provides a non-standard full text search functionality, allowing users to perform approximate string matching on RDF Terms retrieved by SPARQL queries. To accomplish this integration, it follows an approach similar to BlazeGraph and defines several magic predicates that are given special meaning: when encountered in a SPARQL query, they are interpreted as configuration parameters for a full text search query.

The simplest way to integrate a full text search into a SPARQL query is to use the magic predicate ses:search inside a SPARQL join group. In the following query, this predicate is used to search for the keywords neil and gaiman in the values bound at the ?o position of the triple pattern.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ses: <https://callidon.github.io/sparql-engine/search#>
SELECT * WHERE {
  ?s foaf:knows ?o .
  ?o ses:search "neil gaiman" .
}

In a way, full text search queries allow users to express more complex SPARQL filters that perform approximate string matching over RDF terms. Each result is annotated with a relevance score (how well it matches the keywords; higher is better) and a rank (ranks follow the descending order of relevance scores). These two values are not bound by default in the query results, but you can use magic predicates to get access to them (see below). Note that the meaning of relevance scores is specific to the implementation of the full text search.

The full list of magic predicates that you can use in a full text search query is:

  • ses:search defines the keywords to search for, as a list of keywords separated by spaces.
  • ses:matchAllTerms indicates that only values that contain all of the specified search terms should be considered.
  • ses:minRelevance and ses:maxRelevance limit the search to matches with a minimum/maximum relevance score, respectively. In the default implementation, scores are floating-point numbers ranging from 0.0 to 1.0, with a precision of 4 digits.
  • ses:minRank and ses:maxRank limit the search to matches with a minimum/maximum rank value, respectively. In the default implementation, ranks are positive integers starting at 0.
  • ses:relevance binds each term's relevance score to a SPARQL variable.
  • ses:rank binds each term's rank to a SPARQL variable.

Below is a more complete example that uses most of these predicates to customize the full text search.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ses: <https://callidon.github.io/sparql-engine/search#>
SELECT ?s ?o ?score ?rank WHERE {
  ?s foaf:knows ?o .
  ?o ses:search "neil gaiman" .
  ?o ses:minRelevance "0.25" .
  ?o ses:maxRank "1000" .
  ?o ses:relevance ?score .
  ?o ses:rank ?rank .
  ?o ses:matchAllTerms "true" .
}

To provide a custom implementation for the full text search that is more integrated with your backend, you simply need to override the fullTextSearch method of the Graph class. You can find the full signature of this method in the relevant documentation.

The sparql-engine framework provides a default implementation of this method, which computes relevance scores as the average ratio of keywords matched by words in the RDF terms. Notice that this default implementation is not suited for production usage. It will perform fine on small RDF datasets but, when possible, you should always provide a dedicated implementation that leverages your backend. For example, for SQL databases, you could use GIN or GiST indexes.

Federated SPARQL Queries

The sparql-engine framework provides automatic support for evaluating federated SPARQL queries, using the SERVICE keyword.

To enable them, you need to set a Graph Factory on the RDF dataset used to evaluate SPARQL queries. This graph factory is used by the dataset to create new RDF Graphs on demand. To set it, use the Dataset.setGraphFactory method, as detailed below. It takes as parameter a callback, which will be invoked to create a new graph from an IRI. It is your responsibility to define the graph creation logic, depending on your application.

const { HashMapDataset } = require('sparql-engine')
const CustomGraph = require(/* import your Graph subclass */)

const my_graph = new CustomGraph(/* ... */)

const dataset = new HashMapDataset('http://example.org#graph-a', my_graph)

// set the Graph factory of the dataset
dataset.setGraphFactory(iri => {
  // return a new graph for the provided iri
  return new CustomGraph(/* .. */)
})

Once the Graph factory is set, you have nothing more to do: just execute your federated SPARQL queries like regular queries, as before!
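
For example, the following federated query (using a hypothetical remote graph IRI) evaluates part of its WHERE clause against a remote graph, which the engine creates on-demand using your Graph factory:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?s ?name WHERE {
  SERVICE <http://example.org#remote-graph> {
    ?s foaf:name ?name .
  }
}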

Custom Functions

SPARQL allows custom functions in expressions so that queries can be used on domain-specific data. The sparql-engine framework provides support for declaring such custom functions.

A SPARQL value function is an extension point of the SPARQL query language that allows an IRI to name a function in the query processor. It is invoked by its IRI in a FILTER, BIND or HAVING expression. To register custom functions, you must create a JSON object that maps each function's IRI to a JavaScript function that takes a variable number of RDF Term arguments and returns one of the following:

  • A new RDF Term (an IRI, a Literal or a Blank Node) in RDF.js format.
  • An array of RDF Terms.
  • An Iterable or a Generator that yields RDF Terms.
  • The null value, to indicate that the function's evaluation has failed.

RDF Terms are represented using the RDF.js data model. The rdf subpackage exposes many utility methods to create and manipulate RDF.js terms in the context of custom SPARQL functions.

The following shows a declaration of some simple custom functions.

// load the utility functions used to manipulate RDF terms
const { rdf } = require('sparql-engine')

// define some custom SPARQL functions
const customFunctions = {
  // reverse an RDF literal
  'http://example.com#REVERSE': function (rdfTerm) {
    const reverseValue = rdfTerm.value.split("").reverse().join("")
    return rdf.shallowCloneTerm(rdfTerm, reverseValue)
  },
  // Test if an RDF Literal is a palindrome
  'http://example.com#IS_PALINDROME': function (rdfTerm) {
    const result = rdfTerm.value.split("").reverse().join("") === rdfTerm.value
    return rdf.createBoolean(result)
  },
  // Test if a number is even
  'http://example.com#IS_EVEN': function (rdfTerm) {
    if (rdf.termIsLiteral(rdfTerm) && rdf.literalIsNumeric(rdfTerm)) {
      const jsValue = rdf.asJS(rdfTerm.value, rdfTerm.datatype.value)
      const result = jsValue % 2 === 0
      return rdf.createBoolean(result)
    }
    return rdf.createBoolean(false) // the input is not a numeric literal
  }
}

Then, this JSON object is passed into the constructor of your PlanBuilder.

const builder = new PlanBuilder(dataset, {}, customFunctions)

Now, you can execute SPARQL queries with your custom functions! For example, here is a query that uses our newly defined custom SPARQL functions.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX example: <http://example.com#>
SELECT ?length
WHERE {
  ?s foaf:name ?name .

  # this bind is not critical, but is here for illustrative purposes
  BIND(<http://example.com#REVERSE>(?name) as ?reverse)

  BIND(STRLEN(?reverse) as ?length)

  # only keep palindromes
  FILTER (example:IS_PALINDROME(?name))
}
GROUP BY ?length
HAVING (example:IS_EVEN(?length))

Advanced usage

Customize the pipeline implementation

The class PipelineEngine (and its subclasses) is the main component used by sparql-engine to evaluate all SPARQL operations. It defines basic operations (map, filter, etc.) that can be used to manipulate intermediate results and evaluate SPARQL queries.

By default, the framework uses an implementation of PipelineEngine based on rxjs, to implement a SPARQL query execution plan as a pipeline of iterators. However, you can switch to other implementations of PipelineEngine using Pipeline.setInstance.

const { Pipeline, PipelineEngine } = require('sparql-engine')

class CustomEngine extends PipelineEngine {
  // ...
}

// add this before creating a new plan builder
Pipeline.setInstance(new CustomEngine())
// ...

Two implementations of PipelineEngine are provided by default.

  • RxjsPipeline, based on rxjs, which provides a pure pipeline approach. This approach is selected by default when loading the framework.
  • VectorPipeline, which materializes all intermediate results at each pipeline computation step. This approach is more efficient CPU-wise, but also consumes a lot more memory.

These implementations can be imported as follows:

const { RxjsPipeline, VectorPipeline } = require('sparql-engine')
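
For example, you can switch the whole framework to the vector-based implementation before building any query plans:

// materialize intermediate results at each step instead of streaming them
Pipeline.setInstance(new VectorPipeline())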

Customize query execution

A PlanBuilder implements the Builder pattern in order to create a physical query execution plan for a given SPARQL query. Internally, it relies on stage builders to generate operators for executing all types of SPARQL operations. For example, the OrderByStageBuilder is invoked when the PlanBuilder needs to evaluate an ORDER BY modifier.

If you want to customize how query execution plans are built, you have to implement your own stage builders, by extending existing ones. Then, you need to configure your plan builder to use them, with the use function.

  const { PlanBuilder, stages } = require('sparql-engine')

  class MyOrderByStageBuilder extends stages.OrderByStageBuilder {
    /* Define your custom execution logic for ORDER BY */
  }

  const dataset = /* a RDF dataset */

  // Creates a plan builder for the RDF dataset
  const builder = new PlanBuilder(dataset)

  // Plug-in your custom stage builder
  builder.use(stages.SPARQL_OPERATION.ORDER_BY, new MyOrderByStageBuilder(dataset))

  // Now, execute SPARQL queries as before with your PlanBuilder

You will find below a reference table of all stage builders used by sparql-engine to evaluate SPARQL queries. Please see the API documentation for more details.

Executors

SPARQL Operation       Default Stage Builder   Symbol
---------------------  ----------------------  ------------------------------
Aggregates             AggregateStageBuilder   SPARQL_OPERATION.AGGREGATE
Basic Graph Patterns   BGPStageBuilder         SPARQL_OPERATION.BGP
BIND                   BindStageBuilder        SPARQL_OPERATION.BIND
DISTINCT               DistinctStageBuilder    SPARQL_OPERATION.DISTINCT
FILTER                 FilterStageBuilder      SPARQL_OPERATION.FILTER
Property Paths         PathStageBuilder        SPARQL_OPERATION.PROPERTY_PATH
GRAPH                  GraphStageBuilder       SPARQL_OPERATION.GRAPH
MINUS                  MinusStageBuilder       SPARQL_OPERATION.MINUS
OPTIONAL               OptionalStageBuilder    SPARQL_OPERATION.OPTIONAL
ORDER BY               OrderByStageBuilder     SPARQL_OPERATION.ORDER_BY
SERVICE                ServiceStageBuilder     SPARQL_OPERATION.SERVICE
UNION                  UnionStageBuilder       SPARQL_OPERATION.UNION
UPDATE                 UpdateStageBuilder      SPARQL_OPERATION.UPDATE

Documentation

To generate the documentation in the docs directory:

git clone https://github.com/Callidon/sparql-engine.git
cd sparql-engine
yarn install
npm run doc

Acknowledgments

This framework has been developed since 2018 by many contributors, and we thank them very much for their contributions to this project! Here is the full list of our amazing contributors.

  • Corentin Marionneau (@Slaanaroth)
  • Merlin Barzilai (@Rintarou)
    • Merlin designed the first SPARQL compliance tests for the framework during his research internship at the LS2N.
  • Dustin Whitney (@dwhitney)
    • Dustin implemented the support for custom SPARQL functions and provided a lot of feedback during the early stages of development.
  • Julien Aimonier-Davat (@Lastshot97)
    • Julien implemented the support for SPARQL Property Paths evaluation during his research internship at the LS2N. He is now a Ph.D. Student at the University of Nantes.
  • Arnaud Grall (@folkvir)
    • Arnaud contributed to many bugfixes and provided a lot of feedback throughout the development of the framework. He is now a Software Engineer at SII Atlantique.
  • Thomas Minier (@Callidon)


sparql-engine's Issues

BGP question

This is not really an issue but more of a question. Some of the data sets I'm working with have time series data in them, and basic queries to get date and value suffer from the N + 1 select problem: date and value are related to an entity with a blank node, so I get a date and time query for each blank node, which can number in the thousands. It's slow, and I'm wondering if you have some suggestions for improving performance. My naive guess is that something different from the indexed nested loop join would be better, but I'm new to this stuff and could use a pointer in the right direction.

RDF/JS Compatibility

Is your feature request related to a problem? Please describe.

I do not find any mention of RDF/JS spec in the docs. It would be really useful if the query engine was usable with any RDF/JS DatasetCore. It's a shame that the library uses its own abstraction of the RDF model.

Describe the solution you'd like

Ideally, the usage would be something like

import { PlanBuilder } from 'sparql-engine'
// or any other RDF/JS compatible factory
import $rdf from 'rdf-ext'

const dataset = $rdf.dataset()

// the rest unchanged
const builder = new PlanBuilder(dataset)
const iterator = builder.build(query)

Describe alternatives you've considered

An adapter, like the one below, would also be an option, but given the incompatibility of the graph and triple models, I don't think it would make a great solution

import { PlanBuilder, RdfjsDatasetAdapter } from 'sparql-engine'
import $rdf from 'rdf-ext'

const dataset = $rdf.dataset()
const builder = new PlanBuilder(new RdfjsDatasetAdapter(dataset))

Additional context

RDF/JS is a de-facto standard, driven by the community, and widely supported. N3 is compatible, graph.js is compatible, comunica is compatible. I think it would be beneficial to also use RDF/JS as the underlying model for sparql-engine

Abstract the Observable type

Is your feature request related to a problem? Please describe.

I brought this up in #3 and I'd like to revisit it before I attempt an implementation.

There is significant overhead in using Observable, and I'd like to abstract the type away while still providing a concrete implementation with it.

For some context - I've written my own triple store because those available were either too slow or too memory hungry. The triple store I've written can often return results in the 10s to 100s of microseconds, but streaming the results through rxjs turns those 10s to 100s of microseconds into 10s to 100s of milliseconds. The JS runtime will often optimize this if a graph is reused, but the results are still not great. And for my use case, the runtime doesn't really get a chance to do much optimization since I will often run a query against a graph just once.

From my perspective, if the query is streaming across the network, Observable is great, because it'll keep memory consumption low, and it's what you'd want, but often in my case, I already have the data in an Array, and pushing each triple through Observable adds a ton of overhead for really no gain. If I could choose my "stream" type, then I could use Observable when it makes sense, or just get a "pure" piece of data back when that makes sense.

The example of this I've used most is from a Scala library called fs2. In essence you have a Stream type with two type arguments - the first represents the stream's container, which could be something like Observable or Identity, and the second represents the type contained, which would be Algebra.TripleObject in the case of the find function. You can do all sorts of nice map, fold, reduce things etc on it, and once you're ready for it to run you call run (or something similar), and out pops your Observable or Identity (Identity will just give you a pure data structure if that's what you want). More streaming libraries can be supported by simply implementing a few interfaces.

I wanted to get your feedback before attempting anything.

Thanks!

SPARQL JSON format compatibility

Is your feature request related to a problem? Please describe.
I'd like to display the results of sparql-engine in a YasGUI/YASR component (https://triply.cc/docs/yasgui-api/#yasr) in the browser. For this I need to pass a SPARQL JSON result format (https://www.w3.org/TR/sparql11-results-json/)

Describe the solution you'd like
Either directly return bindings in the expected structure, or provide a utility function to convert the original format into a SPARQL JSON result format.

Describe alternatives you've considered
None

Additional context
Sparnatural : https://sparnatural.eu/
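
Until such a utility exists, here is a rough sketch of a conversion function. It assumes bindings objects as produced by bindings.toObject() (e.g., { '?s': 'http://example.org#a', '?name': '"a"' }) and only distinguishes IRIs from plain literals:

// rough sketch: convert an array of bindings objects into the
// W3C SPARQL JSON results format. Term parsing is simplified:
// quoted values become literals (datatypes and language tags are
// ignored), everything else is treated as an IRI.
function toSparqlJson (rows) {
  const vars = new Set()
  const bindings = rows.map(row => {
    const solution = {}
    for (const [variable, value] of Object.entries(row)) {
      const name = variable.replace(/^\?/, '')
      vars.add(name)
      solution[name] = value.startsWith('"')
        ? { type: 'literal', value: value.slice(1, -1) }
        : { type: 'uri', value }
    }
    return solution
  })
  return { head: { vars: [...vars] }, results: { bindings } }
}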

Move away from SPARQL.js (and potentially N3)

Is your feature request related to a problem? Please describe.

SPARQL.js is a major performance bottleneck! I've found that in the majority of my queries, parsing the query takes the most time. For example this query takes just over 100ms to parse, and about 6ms to process the data. I realize it's not a trivial query, but 100ms!

What I like about this library is that I can cache small data sets locally in the browser and query large data sets remotely via SERVICE clauses (very cool!), but if it takes 100ms to parse the query, I may as well just do everything remotely, because caching isn't saving me any time.

Describe the solution you'd like

I did some research on JS parsers and found a library under pretty active development called Chevrotain that pretty much blows jison out of the water from a performance perspective (jison is what SPARQL.js uses). Then I found another library that already has a SPARQL and Turtle parser, called millan. Just some quick benchmarking on the query I linked to above showed a 100% performance improvement on the first parse and then a 500% performance improvement upon subsequent parses.

millan has a bunch of "junk" in its library that I don't think is necessary for sparql-engine, so I think I'd maybe take that library as inspiration rather than simply using it directly. Also, millan is running an older version of Chevrotain, and I can see a bunch of places where performance improvements could be made to its current parser.

What do you think about this idea:

  • abstract out the parser and make the PlanBuilder take a parser param of type string => Algebra.RootNode.
  • create another project called sparql-engine-sparqljs-parser that provides an implementation with the current functionality
  • create another project called sparql-engine-chevrotain-parser that provides an implementation with a Chevrotain parser.

This would let people choose which they liked better -- maybe they are already using one or the other.

Also down the road N3 could probably be replaced with a Chevrotain parser as well. I have had issues with N3 in the past, and I find the source code very difficult to read, but it's fine for now.

Let me know your thoughts and I am happy to do this when I get to it.

custom functions

Is your feature request related to a problem? Please describe.

I'd like to create a custom data type, similar to something like "xsd:date", and have a couple of functions available to filter on that custom type.

For more detail - I have a lot of time series data describing some subject over time. Currently, for each moment in time, I have a blank node relating date and value. This is making my queries slow, because each blank node requires two BGP calls to get its date and value. I'd like to create a data type that smashes date and value together into one value; then I would not need the blank node, but I'd still need a way to filter on those values. I can use regular expressions, but I think that will be tough for my users. It would be nice to have a custom function to decouple those values, like BIND(timeseries:date(?timeseries) as ?date) or BIND(timeseries:value(?timeseries) as ?value).

Describe the solution you'd like

Not sure.

Describe alternatives you've considered

I can think of a way I could do this with regular expressions as described above, but I think that's not ideal if there is another way. I do see that Jena offers something like this: https://github.com/apache/jena/blob/master/jena-arq/src-examples/arq/examples/propertyfunction/uppercase.java

Outline Query Hints

Hi, I'm doing some performance optimization over the next couple of weeks in our app, and I'm wondering if you could outline some of the existing query hints - how they work and what they do, etc.

Also I see a query hint called SORTED_TRIPLES, but I don't think there is an implementation for it. In my custom graph my triples are indexed and sorted, and I think I could see some pretty good benefits from this particular query hint. Could you describe how it's supposed to work and if I get a chance I will attempt to implement it and make a PR?

Thanks!

Union Types are Hard to Use

Apologies if this issue is due to my not understanding how the library works, but I'm having trouble understanding the type system:

The method PlanBuilder.build has a return signature of PipelineStage<QueryOutput> | Consumable. This means that calling code can't know in advance whether the returned object is subscribable or a promise. Similarly, QueryOutput is defined as the union Bindings | Algebra.TripleObject | boolean, so again calling code can't know what to expect.

This results in awkward code that needs to test the return types before using it and prevents the IDE from suggesting completions:

import { Graph, Dataset, HashMapDataset, PlanBuilder, PipelineStage, Bindings } from 'sparql-engine';
import { Consumable } from 'sparql-engine/dist/operators/update/consumer';
import { QueryOutput } from 'sparql-engine/dist/engine/plan-builder';
import { Algebra } from 'sparqljs';

    const query = `
      PREFIX dc: <http://purl.org/dc/elements/1.1/>
      INSERT DATA { <http://example/book1>  dc:title  "Fundamentals of Compiler Design" }`;
    const output = builder.build(query);
    if ('subscribe' in output) {
      (output as PipelineStage<QueryOutput>).subscribe((value: QueryOutput) => {
        if (value instanceof Bindings) {
          value.forEach((variable, val) => console.log(`${variable}: ${val}`));
        } else if (typeof value !== 'boolean') {
          const t = value as Algebra.TripleObject;
          console.log(`Triple: ${t.subject} ${t.predicate} ${t.object}`);
        } else { // boolean result
          console.log(`Boolean result: ${value}`);
        }
      }, console.error, () => {});
    } else {
      (output as Consumable).execute().then(result => {
        console.log('Query completed. No result returned');
      });
    }

Also, many of those imports aren't exported by sparql-engine, so they have to be imported from specific files in the dist directory. Is this intentional?

Finally, I'm wondering why the RXJS interfaces aren't used directly? E.g. PipelineOutput could implement Subscribable or just be an rxjs Observable. Then it would be pipeable to other rxjs operators.

JsonFormat not valid when no results

Describe the bug

The JsonFormat produces {]}} if there are no results.

To Reproduce
Steps to reproduce the behavior:

Just use any query (e.g. "SELECT * WHERE { ?s ?p ?o }") on a dataset with no data and apply the JsonFormat.

Expected behavior

It would ideally produce the expected W3C-formatted result; however, I realise that you can't produce the bindings with the current setup, as the first result is used. But it would be good if the output were at least valid JSON, so that JSON.parse doesn't throw.

Failing example

Thanks for this great project! ❤️

Describe the bug

The levelgraph example doesn't run: https://github.com/Callidon/sparql-engine/blob/2d699c93033d8e9737f19d50395f8c4fb13d2b87/examples/levelgraph.js

To Reproduce
Steps to reproduce the behavior:

  1. Copy the above file
  2. NPM Install sparql-engine
  3. node levelgraph.js
  4. See error:
error TypeError: dest.on is not a function
    at FormatterStream.Readable.pipe (_stream_readable.js:669:8)
    at RxjsPipeline.map (/example/node_modules/sparql-engine/dist/engine/pipeline/rxjs-pipeline.js:95:22)
    at MergeMapSubscriber.project (/example/node_modules/sparql-engine/dist/engine/stages/bgp-stage-builder.js:71:23)
    at MergeMapSubscriber._tryNext (/example/node_modules/rxjs/internal/operators/mergeMap.js:69:27)
    at MergeMapSubscriber._next (/example/node_modules/rxjs/internal/operators/mergeMap.js:59:18)
    at MergeMapSubscriber.Subscriber.next (/example/node_modules/rxjs/internal/Subscriber.js:66:18)
    at Observable._subscribe (/example/node_modules/rxjs/internal/util/subscribeToArray.js:5:20)
    at Observable._trySubscribe (/example/node_modules/rxjs/internal/Observable.js:44:25)
    at Observable.subscribe (/example/node_modules/rxjs/internal/Observable.js:30:22)
    at MergeMapOperator.call (/example/node_modules/rxjs/internal/operators/mergeMap.js:39:23)

Expected behavior
Example should not crash.

Unexpected slowdown

Originally posted by @dwhitney in #3 (comment)

Hmm... I'm seeing some pretty significant slowdown in master now. A query that previously took roughly 400ms is now taking 40 seconds. Odd thing is it seems to only be happening in the browser and not in node. Here's a screenshot of some profiling I've been doing. I'll post more as I get more results.


Add ability to sparql query over unbound (?g) named graphs

Having variables that describe graph IRIs should be possible within a query

A SPARQL query that should work according to the 1.1 spec is:

  PREFIX dblp-pers: <https://dblp.org/pers/m/>
  PREFIX dblp-rdf: <https://dblp.uni-trier.de/rdf/schema-2017-04-18#>
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  SELECT ?s ?name ?article ?g
  WHERE {
        GRAPH ?g {
          ?s rdf:type dblp-rdf:Person .
          ?s dblp-rdf:primaryFullPersonName ?name .
          ?s dblp-rdf:authorOf ?article .
      }
  }

I am also willing to contribute, but I don't have much experience with this project yet. Maybe you can sketch a solution and point me to the important parts?

I created a test case for it that (of course) currently fails, and placed it into pull request #43

Aggregates

Do aggregates actually work? I have tested them out and they don't seem to really do what's advertised. If this is not a known issue, I can provide some examples, but otherwise, are there plans to build this in?

TIA!

Modules not found after fresh install

Describe the bug
Relative modules in ./dist/api.js are not found after installing

To Reproduce
Steps to reproduce the behavior:

  1. Run npm install sparql-engine
  2. Use the package

Expected behavior
It should not throw errors like the following:

ERROR in ./node_modules/sparql-engine/dist/api.js
Module not found: Error: Can't resolve './engine/pipeline/pipeline-engine' in ...

Desktop (please complete the following information):

  • OS: Windows
  • Browser: Chrome (but not relevant)
  • Version: latest

Additional context
The dist/ folder only contains api.js after fresh install but should contain more files.

I ran npm run build to build the package and all the other files appeared then and the package works fine.

Documentation / help on how to use in the browser

Is your feature request related to a problem? Please describe.

It's not related to a problem, but I would like documentation or examples on how to use sparql-engine + N3.js in the browser.

Describe the solution you'd like

A documentation/example HTML page that demonstrates how to:

  1. Init an N3 storage + sparql-engine on top of it
  2. Load an RDF file from a URL into the N3 storage
  3. Execute a SPARQL query
  4. Show the necessary JavaScript files to do that (from a CDN? compiled locally?)

Additional context

My goal is to use a local SPARQL-compliant triplestore to execute SPARQL queries built with Sparnatural : https://github.com/sparna-git/Sparnatural

cc'ing @dwhitney - as I read in another issue, you are caching local data in the browser, so maybe you could share an example?

Thanks for your help !

The N3 example is not working...

Describe the bug
The N3.js implementation example seems to be outdated; it's not working, for several reasons:

e.g. Store is a constructor and needs to be called with 'new', the N3 store fully relies on rdf.js spec terms (not on strings), ...

Expected behavior

I attached a working example.

sparql-engine-n3-example.txt

Error: Unknown graph with iri ?g

Describe the bug

Insert/Where with a graph variable fails

To Reproduce

I tried to run the following query

PREFIX sh: <http://www.w3.org/ns/shacl#>

INSERT {
  GRAPH ?g { 
    ?shape sh:property ?property .
    ?property sh:group ?group . 
  }
} WHERE {
  GRAPH ?g { 
    ?shape <http://example.com/group> ?group .
    ?group <http://example.com/property> ?property . 
  }
}

This fails as seen below

node_modules/sparql-engine/dist/rdf/hashmap-dataset.js:81
            throw new Error("Unknown graph with iri " + iri);
            ^

Error: Unknown graph with iri ?g
    at HashMapDataset.getNamedGraph (node_modules/sparql-engine/dist/rdf/hashmap-dataset.js:81:19)
    at UpdateStageBuilder._buildInsertConsumer (node_modules/sparql-engine/dist/engine/stages/update-stage-builder.js:208:81)
    at node_modules/sparql-engine/dist/engine/stages/update-stage-builder.js:192:30
    at Array.map (<anonymous>)
    at UpdateStageBuilder._handleInsertDelete (node_modules/sparql-engine/dist/engine/stages/update-stage-builder.js:191:60)
    at node_modules/sparql-engine/dist/engine/stages/update-stage-builder.js:77:38
    at Array.map (<anonymous>)
    at UpdateStageBuilder.execute (node_modules/sparql-engine/dist/engine/stages/update-stage-builder.js:71:53)
    at PlanBuilder.build (node_modules/sparql-engine/dist/engine/plan-builder.js:211:73)
    at test.js:66:30

Expected behavior

The query should succeed, since ?g is a variable bound in the WHERE clause and not an IRI

Incompatible sparqljs Type Definitions

Describe the bug
This library includes its own type definitions for sparqljs. These definitions conflict with the ones hosted on DefinitelyTyped that are used by other projects. These incompatibilities make it very difficult to use sparql-engine in an application that also uses other libraries that depend on the DefinitelyTyped definitions. Importing from 'sparqljs' will resolve to @types/sparqljs rather than the custom ones (which is what is desired some of the time). Could you please transition to using the types provided by DefinitelyTyped (ideally from v3)?

To Reproduce
Steps to reproduce the behavior:

  1. Create a TypeScript NodeJS application (see e.g. https://www.digitalocean.com/community/tutorials/setting-up-a-node-project-with-typescript)
  2. Install sparql-engine and @types/sparqljs (or another library that depends on @types/sparqljs)
  3. Attempt to create a custom sparql-engine Graph and import needed types from sparqljs
  4. The system tries to import from @types/sparqljs instead of sparql-engine/types/sparqljs

I've managed to alter my .tsconfig file to prefer one set of definitions over the other, but this fails when it needs to use both.

Expected behavior
Only a single set of types should exist with the given name. Apps within the TypeScript SPARQL ecosystem should be able to share type definitions as much as possible.

The DefinitelyTyped definitions are here: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/sparqljs/index.d.ts

[Federated SPARQL Queries] setGraphFactory callback should return a `Promise<Graph>` instead of `Graph`

I think it would make sense to have the callback parameter in setGraphFactory on Dataset return a Promise<Graph> instead of Graph because it will help improve performance quite a bit for Graphs that deal with remote data.

Under the current type signature, Graphs must handle async themselves, so the underlying data must be wrapped in a Promise, which means there are unnecessary function invocations to get at the underlying data (calls to .then) as well as unnecessary context switching as the runtime evaluates Promises. This can be avoided if the data for a graph is loaded inside of setGraphFactory and then injected into a custom Graph. On some queries there are quite a large number of invocations of Graph.find, and this overhead impacts performance quite a bit. I believe that setGraphFactory is far less likely to be invoked than Graph.find, and having setGraphFactory return a Promise doesn't prevent the underlying Graph implementation from continuing to use Promises to fetch data, but the current type signature definitely requires it for remote data.

The main place where I see this is when using VectorStage. The content is wrapped in a Promise, so much of the performance characteristics one would expect from this Pipeline are lost.

If you agree to this, I'd be happy to make a PR.

Types Not Visible for Importing

Describe the bug
I'm trying to use sparql-engine in a browser Angular Typescript app. When I try to implement Graph, the Algebra namespace is not visible for importing, resulting in all triples having type any.

The ExecutionContext and PipelineInput types also require importing from deep within the sparql-engine internals (e.g. 'sparql-engine/dist/engine/context/execution-context')

I have other questions about how to correctly implement Graph, that perhaps can be answered here as well:

  • What is the correct use of ExecutionContext within the find method? Neither of the examples show its use.
  • How do graphs within the execution context relate to the Graph subclass being implemented?

To Reproduce
Steps to reproduce the behavior:

  1. Create an angular app: npm i -g @angular/cli, ng new sparql-app, sparql-app
  2. Install sparql-engine and rdf-lib: npm i --save sparql-engine rdflib, npm i --save-dev @types/rdflib
  3. Implement Graph (see attached source code)
    rdflib-graph.ts.txt
  4. See type errors

Expected behavior
All necessary types should be easily imported from the sparql-engine package.

Desktop (please complete the following information):

  • OS: Mac OSX
  • Version: 0.5.1

Bug in levelgraph adapter (`this` not resolving as expected)

First, thanks for this fantastic project! It is a thing of beauty.

To Reproduce
Steps to reproduce the behavior:

  1. Use the levelgraph adapter (from the examples directory)
  2. Do any update operation (the test program only does a read operation)

Expected behavior
Performs update operation

Actual behavior

TypeError: Cannot read property '_db' of undefined

Additional context

The reason for the error is that the insert and delete operations use function expressions in the Promise constructors, which changes the context of this resolution.

I am submitting a PR momentarily.

Add support for official W3C SPARQL compliance tests

Here we are: we now have all the features required to match SPARQL 1.1 compliance. However, we need to run the official benchmark for W3C SPARQL compliance on the framework in order to claim such compliance.

To do so, we need a few things:

  • A runner that can automatically load W3C SPARQL compliance tests and execute them using sparql-engine.
  • A method for comparing sparql-engine query results with the expected test results.

For me, the main issue is that the W3C SPARQL compliance tests rely on XML output files to compare the expected output of query execution, and the syntax can be very permissive.

Q: Setting up federated queries to existing SPARQL endpoints

Is there an easy way to set up federated queries to existing SPARQL endpoints? From what I have seen, I cannot simply implement find() based on SPARQL endpoints, so I came up with the following (which has a few problems):

const { Graph, Pipeline, BindingBase } = require('sparql-engine')
const fetch = require('node-fetch')

function formatTriple (triple) {
  return [triple.subject, triple.predicate, triple.object].join(' ')
}

function formatQuery (triples) {
  return `SELECT * WHERE {
  ${triples.map(formatTriple).join(` .
  `)}  
}`
}

class EndpointGraph extends Graph {
  constructor (iri) {
    super()
    this.iri = iri
  }

  evalBGP (triples) {
    return Pipeline.getInstance().fromAsync(input => {
      
      fetch(this.iri, {
        method: 'POST',
        body: formatQuery(triples),
        headers: {
          'Content-Type': 'application/sparql-query',
          Accept: 'application/json'
        }
      })
        .then(response => response.json())
        .then(response => {
          for (const binding of response.results.bindings) {
            for (const variable of Object.keys(binding)) {
              binding[variable] = binding[variable].value
            }

            input.next(BindingBase.fromObject(binding))
          }

          input.complete()
        })
        .catch(error => {
          input.error(error)
        })
    })
  }
}

module.exports = EndpointGraph

/*
dataset.setGraphFactory(iri => {
  return new EndpointGraph(iri)
})
*/

For one thing though, formatQuery() results in queries like this one:

SELECT * WHERE {
  ?rhea http://www.w3.org/2000/01/rdf-schema#subClassOf http://rdf.rhea-db.org/Reaction .
  ?rhea http://rdf.rhea-db.org/ec http://purl.uniprot.org/enzyme/1.17.4.1  
}

This isn't correct as the URIs should have angle brackets. Also, it seems a bit inefficient as it would be a HTTP request per value of ?protein (http://purl.uniprot.org/enzyme/1.17.4.1 in this case). I could make a VALUES but that would mean collecting the BGPs manually, I think. I started using this framework yesterday though so I may be missing something.
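
A rough fix for the angle-bracket problem is to wrap IRIs in <> when formatting terms, leaving variables and literals untouched (with deliberately simplified term detection):

function formatTerm (term) {
  // variables and literals pass through; everything else is treated as an IRI
  if (term.startsWith('?') || term.startsWith('"')) {
    return term
  }
  return '<' + term + '>'
}

function formatTriple (triple) {
  return [triple.subject, triple.predicate, triple.object].map(formatTerm).join(' ')
}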

OPTIONAL clauses include a redundant result if the value exists

Describe the bug
In an OPTIONAL clause if the result exists, an extra result will be produced containing a non-existent value.

To Reproduce
Steps to reproduce the behavior:

The following query:

SELECT * 
WHERE {
  OPTIONAL {
    VALUES (?s ?p ?o) { ("s" "p" "o") }
  }
}

produces

[ 
  { '?s': '"s"', '?p': '"p"', '?o': '"o"' }, 
  {} 
]

Expected behavior

[ { '?s': '"s"', '?p': '"p"', '?o': '"o"' } ]

Additional context

I've written a test case: https://github.com/dwhitney/sparql-engine/blob/optional-problem/tests/sparql/optional-test.js#L37

I will try and resolve it this weekend.

When a variable is in a `GRAPH` clause, the bound variables are not used

NOTE I have fixed this bug in my fork, but there is a failing test I need to resolve and I have only fixed this for GRAPH clauses. This also needs to work in SERVICE clauses. I will submit a PR when that work is done.

Describe the bug

If you check section 13.3.3 (Restricting Possible Graph IRIs) of the SPARQL 1.1 spec (https://www.w3.org/TR/sparql11-query/#GraphPattern), it describes and shows an example where a variable is used in a graph clause, like GRAPH ?ppd, where ?ppd is a bound variable. This example won't work in sparql-engine, since only named and default graphs are checked for variables.

To Reproduce

You can load the data and query from the example mentioned above.

Expected behavior

The results described in the section should be produced

BIND operator broken since 7.0

Describe the bug
The BIND operator has been working incorrectly since 7.0

To Reproduce
Just use a query with BIND inside one of the examples (here I chose the one about using a storage based on N3.js).

'use strict'

const { Parser, Store } = require('n3')
const { HashMapDataset, Graph, PlanBuilder } = require('sparql-engine')

// Format a triple pattern according to N3 API:
// SPARQL variables must be replaced by `null` values
function formatTriplePattern (triple) {
  let subject = null
  let predicate = null
  let object = null
  if (!triple.subject.startsWith('?')) {
    subject = triple.subject
  }
  if (!triple.predicate.startsWith('?')) {
    predicate = triple.predicate
  }
  if (!triple.object.startsWith('?')) {
    object = triple.object
  }
  return { subject, predicate, object }
}

class N3Graph extends Graph {
  constructor () {
    super()
    this._store = Store()
  }

  insert (triple) {
    return new Promise((resolve, reject) => {
      try {
        this._store.addTriple(triple.subject, triple.predicate, triple.object)
        resolve()
      } catch (e) {
        reject(e)
      }
    })
  }

  delete (triple) {
    return new Promise((resolve, reject) => {
      try {
        this._store.removeTriple(triple.subject, triple.predicate, triple.object)
        resolve()
      } catch (e) {
        reject(e)
      }
    })
  }

  find (triple) {
    const { subject, predicate, object } = formatTriplePattern(triple)
    return this._store.getTriples(subject, predicate, object)
  }

  estimateCardinality (triple) {
    const { subject, predicate, object } = formatTriplePattern(triple)
    return Promise.resolve(this._store.countTriples(subject, predicate, object))
  }
}

const graph = new N3Graph()
const dataset = new HashMapDataset('http://example.org#default', graph)

// Load some RDF data into the graph
const parser = new Parser()
parser.parse(`
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix : <http://example.org#> .
  :a foaf:name "a" .
  :b foaf:name "b" .
`).forEach(t => {
  graph._store.addTriple(t)
})

const query = `
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?s ?name
  WHERE {
    BIND(<http://example.org#a> as ?s)
    ?s foaf:name ?name .
  }`

// Creates a plan builder for the RDF dataset
const builder = new PlanBuilder(dataset)

// Get an iterator to evaluate the query
const iterator = builder.build(query)

// Read results
iterator.subscribe(bindings => {
  console.log('Found solution:', bindings.toObject())
}, err => {
  console.error('error', err)
}, () => {
  console.log('Query evaluation complete!')
})

Expected behavior
On 6.0 and below, I get the expected result:

Found solution: { '?s': 'http://example.org#a', '?name': '"a"' }
Query evaluation complete!

While on 7.0 and above, I get:

Found solution: { '?s': 'http://example.org#a', '?name': '"a"' }
Found solution: { '?s': 'http://example.org#a', '?name': '"b"' }
Query evaluation complete!

The sparqljs dependency is broken.

Describe the bug

Npm install fails

To Reproduce
npm install --save sparql-engine

Expected behavior

An installation

Screenshots

ENOENT: no such file or directory, chmod '.../node_modules/sparqljs/bin/sparql-to-json'

Desktop (please complete the following information):

node v10.15.0 (npm v6.4.1)

Additional context

RubenVerborgh/SPARQL.js#74

Abstract return type from find?

Hey I've been using this the last few weeks and I really love it! I'm wondering how you'd feel about making the return type from "find" on the graph function more abstract. I'd rather not have to add the dependency of asynciterator to my project and if there were just some class or interface to make sure I returned from "find" then I wouldn't have to. Thoughts?

Support for SILENT modifiers with SERVICE queries

Since v0.5.1, SERVICE clauses are automatically executed by the engine.
However, we are still missing support for the SILENT modifier, which removes errors when querying a distant RDF graph with a federated SPARQL query.

This can be done in rxjs using the catchError operator, so we need to add a similar operation to the PipelineEngine API, and then use it in the ServiceStageBuilder.
