eclipse-rdf4j / rdf4j
Eclipse RDF4J: scalable RDF for Java
Home Page: https://rdf4j.org/
License: BSD 3-Clause "New" or "Revised" License
(Migrated from https://openrdf.atlassian.net/browse/SES-2209 )
Federated queries like the following won't return bound values for variable assignments defined in the SERVICE graph pattern.
{code}
SELECT ?s ?bindingS ?now
WHERE {
SERVICE <http://dbpedia.org/sparql> {
?s a ?someType .
BIND (?s as ?bindingS)
BIND (now() as ?now)
}
} LIMIT 1
{code}
We have yet to file a CQ for inclusion of the ElasticSearch library.
We need to remove references to 'Sesame' from the UI, and the logo as well.
(Migrated from SES-2161)
There's a query parse error with the following query:
{code}
insert data {
<urn:alpha> <urn:beta> """\U0001F61F""" .
}
{code}
I am fairly certain this is a valid query; from what I can grok of the spec, that Unicode escape sequence is correct. ARQ also bombs out on this query, though, which leaves me with some doubt.
(Migrated from https://openrdf.atlassian.net/browse/SES-2178 )
As Jeen pointed out in http://stackoverflow.com/questions/28415722/sparql-1-1-entailment-regimes-and-query-with-from-clause , the inferencing store implementations predate the http://www.w3.org/TR/sparql11-entailment/ recommendation.
It would be nice to have this recommendation fulfilled for more complete SPARQL 1.1 support.
We need to stabilize the build - several things fail after merging in the final sync with the old Sesame repo.
The goal would be to provide the old package hierarchy with all classes deprecated and marked as (empty) subclasses of their equivalents.
We are currently focusing on the Sesame 4 code base as the launch point for RDF4J. However, there are several core users who need to stay on Java 7 for a while longer. We should consider bringing over the Sesame 2.9 code base to RDF4J to live alongside the main branch, so we can do parallel releases for those users who wish to stick with Java 7.
We need to migrate open JIRA issues from our old JIRA issue tracker to GitHub. Anybody know any good tools for this?
The current RDF4J Javadoc is massive and quite hard to navigate. We should try to simplify it to make it easier to browse. Things to think of:
(Migrated from https://openrdf.atlassian.net/browse/SES-2191 )
We should provide utility methods to allow checking that a given string is a valid IRI according to RFC3987.
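A minimal sketch of what such a utility could look like; the class and method names are hypothetical, and java.net.URI (which implements RFC 2396/3986 rather than RFC 3987) is used only as a rough first-pass check that a real implementation would replace with proper RFC 3987 character-range validation:
{code}
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical utility for IRI validation. A real RFC 3987 implementation
 * would validate the international character ranges itself; java.net.URI
 * is used here only as a rough approximation.
 */
public class IRIUtil {

    public static boolean isValidIRI(String iri) {
        if (iri == null || iri.isEmpty()) {
            return false;
        }
        try {
            // require an absolute IRI, i.e. one with a scheme
            return new URI(iri).isAbsolute();
        }
        catch (URISyntaxException e) {
            return false;
        }
    }
}
{code}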
Either (temporarily) using ci.rdf4j.net, or looking into using an Eclipse-hosted environment
(Migrated from https://openrdf.atlassian.net/browse/SES-2218 )
To guard against server-side out-of-memory (OOM) errors (especially when processing SPARQL update requests), the RDF4J Server should support a configurable size limit for the request payload.
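A minimal sketch of one way to enforce such a limit, as a servlet filter that rejects oversized requests up front; the system property name and the use of a filter are assumptions for illustration, not the actual RDF4J Server configuration:
{code}
import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Sketch of a configurable payload size limit. The system property name
 * and default (-1 = disabled) are hypothetical.
 */
public class PayloadLimitFilter implements Filter {

    private long maxPayloadSize;

    @Override
    public void init(FilterConfig config) {
        maxPayloadSize = Long.getLong("rdf4j.server.maxPayloadSize", -1L);
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException
    {
        // Content-Length is -1 for chunked requests; those would need a
        // counting stream wrapper instead of this simple check.
        long contentLength = ((HttpServletRequest) req).getContentLengthLong();
        if (maxPayloadSize > 0 && contentLength > maxPayloadSize) {
            // fail fast with 413 instead of risking an OOM later on
            ((HttpServletResponse) resp).sendError(HttpServletResponse.SC_REQUEST_ENTITY_TOO_LARGE,
                    "Request payload exceeds configured limit");
            return;
        }
        chain.doFilter(req, resp);
    }

    @Override
    public void destroy() {
    }
}
{code}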
These all need to be modified to refer to the correct new package naming.
(Migrated from https://openrdf.atlassian.net/browse/SES-2234 )
This query:
{code}
PREFIX ex: <ex:>
ASK WHERE {
?this ex:score ?score .
FILTER (!(?score+5 != 0)) .
}
{code}
produces the following algebra expression:
{code}
Slice ( limit=1 )
Filter
Not
Compare (!=)
MathExpr (+)
Var (name=score)
ValueConstant (value="+5"^^http://www.w3.org/2001/XMLSchema#integer)
ValueConstant (value="0"^^http://www.w3.org/2001/XMLSchema#integer)
StatementPattern
Var (name=this)
Var (name=_const-313ecd0b-uri, value=ex:score, anonymous)
Var (name=score)
{code}
The value constant representing the integer 5 incorrectly has a '+' sign prepended - presumably because the parser incorrectly processes the + math operator as part of the integer value.
Although this causes no problems in normal operation of the SPARQL engine, it is an issue in work by [~pulquero] on a SPIN engine.
(Migrated from https://openrdf.atlassian.net/browse/SES-2175 )
If a Statement has an associated context, it is ignored by the SPARQLConnection#add method. The user needs to provide an explicit context to the add method to make it work.
The cause is inside SPARQLConnection#createInsertDataCommand: this method simply ignores the Statement's associated context.
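A sketch of the kind of fix needed: when serializing statements into the INSERT DATA body, a statement's own context should be wrapped in a GRAPH clause. The method below is a simplified stand-in for the actual private helper, assuming the new org.eclipse.rdf4j package names:
{code}
import org.eclipse.rdf4j.model.Resource;
import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.rio.ntriples.NTriplesUtil;

/**
 * Simplified stand-in for the serialization step inside
 * SPARQLConnection#createInsertDataCommand, honouring the statement's
 * own context by wrapping it in a GRAPH clause.
 */
public class InsertDataSketch {

    static void appendStatement(StringBuilder qb, Statement st) {
        Resource context = st.getContext();
        if (context != null) {
            // note: assumes the context is an IRI; a bnode context would
            // need separate handling
            qb.append("GRAPH ").append(NTriplesUtil.toNTriplesString(context)).append(" { ");
        }
        qb.append(NTriplesUtil.toNTriplesString(st.getSubject())).append(' ');
        qb.append(NTriplesUtil.toNTriplesString(st.getPredicate())).append(' ');
        qb.append(NTriplesUtil.toNTriplesString(st.getObject())).append(" .");
        if (context != null) {
            qb.append(" }");
        }
        qb.append('\n');
    }
}
{code}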
(Migrated from https://openrdf.atlassian.net/browse/SES-2226 )
Example taken from http://www.datypic.com/sc/xsd/t-xsd_anyURI.html
new java.net.URI("http://datypic.com#f% rag")
throws "java.net.URISyntaxException: Malformed escape pair at index 20: http://datypic.com#f% rag"
whereas:
XMLDatatypeUtil.isValidValue("http://datypic.com#f% rag", XMLSchema.ANYURI)
returns true.
Looking at the source for isValidValue, there is no case to validate XMLSchema.ANYURI - is this deliberate or simply an omission?
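For illustration, a minimal sketch of what an ANYURI branch could look like if it simply delegated to java.net.URI as the example above suggests; whether xsd:anyURI (which XML Schema defines very leniently) should be validated this strictly is exactly the open question:
{code}
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Sketch of a possible ANYURI validation branch delegating to
 * java.net.URI, matching the behaviour of the example above.
 */
public class AnyURIValidationSketch {

    static boolean isValidAnyURI(String value) {
        try {
            new URI(value.trim());
            return true;
        }
        catch (URISyntaxException e) {
            return false;
        }
    }
}
{code}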
(Migrated from https://openrdf.atlassian.net/browse/SES-2206 )
SPARQL 1.1 constructs cannot be rendered via the SparqlQueryRenderer class.
For example, if I parse this query with QueryParserUtil.parseQuery:
SELECT * WHERE {
?s ?p ?o .
BIND(uri("http://test-graph.com/") AS ?foo) .
}
And then render the ParsedQuery back out again with SPARQLQueryRenderer, it appears to lose the binding clause, returning the following string:
select ?g ?s ?p ?o ?g2
where {
GRAPH ?g {
?s ?p ?o.
}}
Looking at renderTupleExpr in SparqlTupleExprRenderer I can see that the following lines are commented out:
// aRenderer.mProjection = new ArrayList<ProjectionElemList>(mProjection);
// aRenderer.mDistinct = mDistinct;
// aRenderer.mReduced = mReduced;
// aRenderer.mExtensions = new HashMap<String, ValueExpr>(mExtensions);
// aRenderer.mOrdering = new ArrayList<OrderElem>(mOrdering);
// aRenderer.mLimit = mLimit;
// aRenderer.mOffset = mOffset;
With the following commented out in SPARQLQueryRenderer:
// SPARQL does not support this, its an artifact of copy and
// paste from the serql stuff
// aQuery.append(mRenderer.getExtensions().containsKey(aElem.getSourceName())
// ?
// mRenderer.renderValueExpr(mRenderer.getExtensions().get(aElem.getSourceName()))
// : "?"+aElem.getSourceName());
//
// if (!aElem.getSourceName().equals(aElem.getTargetName()) ||
// (mRenderer.getExtensions().containsKey(aElem.getTargetName())
// &&
// !mRenderer.getExtensions().containsKey(aElem.getSourceName())))
// {
// aQuery.append(" as ").append(mRenderer.getExtensions().containsKey(aElem.getTargetName())
// ?
// mRenderer.renderValueExpr(mRenderer.getExtensions().get(aElem.getTargetName()))
// : aElem.getTargetName());
// }
I believe these lines are commented out in error and that they should be commented back in, in order to be able to round-trip queries from SPARQL text into the AST and back out again.
Other SPARQL 1.1 queries that fail include:
SELECT (COUNT (*) as ?c) WHERE { ?s ?p ?o }
is rendered as
select ?c where { ?s ?p ?o }
and the query
SELECT (?p as ?x) WHERE { ?s ?p ?o }
is rendered as
select ?p WHERE { ?s ?p ?o }
(Migrated from https://openrdf.atlassian.net/browse/SES-2189 )
This is simple to reproduce. I installed RDF4J into Tomcat and created a new in-memory repository called "test".
Add the following triples:
<http://example.org/a> <http://example.org/value> 1 .
<http://example.org/b> <http://example.org/value> 2 .
Running this query returns the value "3" as expected.
SELECT (SUM(?value) AS ?total) {
?s <http://example.org/value> ?value
}
Now, create a second in-memory repository called "test2".
Running this query from that repository returns a blank value.
SELECT ?total {
SERVICE <http://localhost:8080/openrdf-sesame/repositories/test> {{
SELECT (SUM(?value) AS ?total) {
?s <http://example.org/value> ?value
}
}}
}
By turning on debug logging, I was able to see the query being sent to "test".
[DEBUG] 2015-02-27 11:38:31,682 [http-bio-8080-exec-7] path info: /test
[DEBUG] 2015-02-27 11:38:31,682 [http-bio-8080-exec-7] repositoryID is 'test'
[DEBUG] 2015-02-27 11:38:31,682 [http-bio-8080-exec-7] queryLn="SPARQL"
[DEBUG] 2015-02-27 11:38:31,682 [http-bio-8080-exec-7] query="PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX sesame: <http://www.openrdf.org/schema/sesame#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX fn: <http://www.w3.org/2005/xpath-functions#> SELECT ?s ?value WHERE { {
SELECT (SUM(?value) AS ?total) {
?s <http://example.org/value> ?value
}
} }"
[DEBUG] 2015-02-27 11:38:31,682 [http-bio-8080-exec-7] infer="true"
Scrolling to the right, you can see that although ?value is included in the projection, ?total from within the aggregation is not. As a workaround, I added an additional inner select to ensure ?total is projected:
SELECT ?total {
SERVICE <http://localhost:8080/openrdf-sesame/repositories/test> {{
SELECT ?total {
{
SELECT (SUM(?value) AS ?total) {
?s <http://example.org/value> ?value
}
}
}
}}
}
We need to make a decision on what version number to use for the initial RDF4J release. There are, roughly, three options: continue the existing Sesame version numbering, move up to a new major version number, or start fresh at 1.0.
Advantage of the first option is that it's a more 'gradual' transition. Potential downside is that it suggests it's not the first Eclipse RDF4J release.
Advantage of the second option is that it is more clear that there may be compatibility problems between the last Sesame release and the first RDF4J release.
Advantage of the last option is that we get to start fresh. Downside is that it's not obvious how this release relates to existing Sesame releases.
No matter what we choose, we will always need to provide accompanying upgrade notes anyway.
We have yet to file a CQ for inclusion of the Solr library.
(Migrated from https://openrdf.atlassian.net/browse/SES-2229 )
The fix for SES-1995 is less than ideal when dealing with a results page coming directly from the query page POSTing a long query (roughly more than 1k characters): it requires a workaround of saving the long query on the server.
However, the query text is actually present in the cookies, along with the other parameters needed to specify the query. These cookies could be copied into a hidden form at page load; the Download link would then perform its request as a form POST, getting around the URL character limit.
(Migrated from https://openrdf.atlassian.net/browse/SES-2248 )
Hi Jeen,
I could reproduce the behaviour for [https://openrdf.atlassian.net/browse/SES-2099] with a much simpler query; you'll see that each result binds ?ct01 to a different bnode.
SELECT * WHERE {
BIND (bnode() as ?ct01)
{ SELECT ?s WHERE {
?s ?p ?o .
}
LIMIT 10
}
}
If I'm not mistaken, the query should be equivalent to this one, which actually works as expected:
SELECT * WHERE {
BIND (bnode() as ?ct01)
?s ?p ?o .
}
LIMIT 10
meaning the algebra should first create a SingletonSet, then extend it with the BIND, and only then do the join, so the variable ?ct01 should be bound to the same bnode for each result of the subquery.
So it seems that evaluating the subquery first (which is indeed required by the recommendation) does not respect the evaluation or join order of the preceding graph patterns.
Possible candidates are RDFException or RDF4JException. I personally prefer the first since it's shorter. OpenRDFException should remain as a deprecated class for backward compatibility.
The SPIN compliance tests severely slow down the build (these tests alone take almost 45 minutes to run on our HIPP), and moreover they are unstable: in several builds the testOrderByQueriesAreInterruptable test intermittently fails.
We should temporarily disable these compliance tests from the normal build process and only (manually) execute them when changes are made to the SPIN modules.
The current Maven configuration still relies on old Sesame project settings for syncing with Sonatype OSS (and from there to Maven Central). This needs to be tweaked/reconfigured according to what Eclipse projects do for Maven artifact deployment.
Basic housecleaning aimed at getting the maven lifecycle to run more smoothly in combination with M2E.
(Migrated from https://openrdf.atlassian.net/browse/SES-2185 )
When uploading a file through the "Add RDF" screen, the (autodetect) option is supposed to determine the correct format and select the right parser. However, this does not work: in the current system, for any format other than RDF/XML, file upload with autodetect results in the error "Content is not allowed in prolog. [line 1, column 1]".
Only after explicitly selecting the correct format from the dropdown does file upload work.
Eclipse recommends that contributors be shown or directed to the following text when attempting to make a contribution:
Before your contribution can be accepted by the project, you need to create and
electronically sign the Eclipse Foundation Contributor License Agreement (CLA) and sign
off on the Eclipse Foundation Certificate of Origin.
For more information, please visit
http://wiki.eclipse.org/Development_Resources/Contributing_via_Git
This can be done for GitHub issues and pull requests by adding a file to the repository named either CONTRIBUTING or CONTRIBUTING.md:
https://help.github.com/articles/setting-guidelines-for-repository-contributors/
(Migrated from https://openrdf.atlassian.net/browse/SES-2194 )
Since RepositoryConnection now extends AutoCloseable, it is a valid candidate for use in try-with-resources.
We should simplify our internal code based on this to reduce the number of finally blocks that we need to maintain.
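For example, a minimal sketch of the new idiom (assuming the new org.eclipse.rdf4j package names):
{code}
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;

public class TryWithResourcesExample {

    static void addData(Repository repo) {
        // the connection is closed automatically, even if add() throws,
        // replacing the old try/finally { con.close(); } idiom
        try (RepositoryConnection con = repo.getConnection()) {
            ValueFactory vf = con.getValueFactory();
            con.begin();
            con.add(vf.createIRI("http://example.org/a"),
                    vf.createIRI("http://example.org/value"),
                    vf.createLiteral(1));
            con.commit();
        }
    }
}
{code}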
The test evaluates the handling of a limit on a subselect in a larger CONSTRUCT query. The failure is specific to the FederationSail, as the same test succeeds on other store types.
(Migrated from https://openrdf.atlassian.net/browse/SES-2250 )
When using a BIND variable in a pattern join, the result is a cross join of the dataset triples when the BIND expression raises a type error.
This query should expose the behavior on any store that contains blank nodes.
{code}
SELECT *
WHERE {
?s ?p ?o .
FILTER(isBlank(?o))
BIND (iri(?o) as ?s2)
?s2 ?p2 ?o2 .
} LIMIT 10
{code}
The join evaluation should normally conclude that the two multisets are incompatible, since ?s2 is unbound in the join's left argument, so the query should return no results.
Creation of the SDK distro files still uses org.openrdf and sesame in places.
(Migrated from https://openrdf.atlassian.net/browse/SES-2227 )
Investigate creation of a new persistent RDF store using MapDB - possibly as a replacement/alternative for the native store.
(Migrated from https://openrdf.atlassian.net/browse/SES-2168 )
afaict, there's no way for Gradle projects to pull down Sesame artifacts from Maven Central.
I am admittedly still new to Gradle, so I might have overlooked something obvious, but I think the fact that some dependencies are unversioned and others use variables is problematic for Gradle when trying to resolve the dependency.
If you look at [http://repo1.maven.org/maven2/org/openrdf/sesame/sesame-model/2.7.14/sesame-model-2.7.14.pom] you can see that junit has no scope or version, and that sesame-util uses variable placeholders.
Trying to grab that artifact via
{code}
compile ("org.openrdf.sesame:sesame-model:2.7.14")
{code}
will yield:
{code}
Could not resolve org.openrdf.sesame:sesame-model:2.7.14.
Required by:
com.complexible.stardog.openrdf-utils:openrdf:2.2.4
Could not parse POM https://repo1.maven.org/maven2/org/openrdf/sesame/sesame-model/2.7.14/sesame-model-2.7.14.pom
> Unable to resolve version for dependency 'junit:junit:jar'
{code}
I'm not a Maven guru either, but I thought these, while legal, were not recommended.
As an aside, this works fine using Ivy to resolve the exact same dependency, and I'm assuming it works fine with Maven. So I think only Gradle users are affected.
I know Jeen is mucking about with the Maven stuff atm; it would be nice if this could be resolved as well.
The project documentation and website at http://rdf4j.org/ will need to be updated to reflect the changes from Sesame to RDF4J. In particular, we'll need:
(Migrated from https://openrdf.atlassian.net/browse/SES-2162 )
The current SAIL interface assumes it gets passed a TupleExpr (that is, an algebra representation of a query); currently this is handled by SailRepositoryConnection.prepareQuery, which passes the query string to the RDF4J query parser and produces a TupleExpr.
However, some SAIL implementations prefer to do their own parsing and/or prefer not to base their query evaluation on RDF4J's algebra model. To facilitate this, we should pass the query string down at the prepare stage, allowing a SAIL to (optionally) process or wrap the query in such a way that the RDF4J query parser is bypassed and the SAIL implementation can opt to use a completely independent parser and query engine.
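A rough sketch of what such an optional pass-down could look like; the interface and method names are illustrative only, not a committed design:
{code}
import org.eclipse.rdf4j.query.QueryLanguage;

/**
 * Illustrative only: an optional mixin interface a SAIL could implement
 * to receive the raw query string at prepare time, before (or instead of)
 * translation into a TupleExpr.
 */
public interface QueryStringAware {

    /**
     * Offers the original query string to the SAIL. Returning false means
     * "not handled", and the default RDF4J parser and evaluation pipeline
     * is used as before.
     */
    default boolean prepareNativeQuery(QueryLanguage language, String queryString, String baseURI) {
        return false;
    }
}
{code}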
The Javadoc still contains references to 'org.openrdf' and 'sesame' in many places. This needs to be reviewed and edited.
Current integration tests fail because the W3C test case data was not included in the initial contrib. We need to re-integrate this.
Hudson build is currently failing with test failures. We need to get the build stabilized ASAP.
We should run code formatting with rdf4j settings over the entire master branch, so that the code base is consistently well-formatted again.
GitHub released a new feature enabling a template to be created as the basis for new issues and pull requests. This is more visible than the guidelines for contributing, as it is inserted into the comment for each pull request when it is opened, so it may be useful to add support for it.
https://github.com/blog/2111-issue-and-pull-request-templates
Sesame datadirs are by default stored in $APP_DIR/Aduna/OpenRDF Sesame or something along those lines. This needs to be changed to something simpler. A preference is to have a root dir $APP_DIR/RDF4J/ with subdirs for the various RDF4J applications: RDF4J/Server, RDF4J/Workbench, etc.
In addition, we should provide a conversion method that allows users to migrate their existing data to the new dir structure. This should either be a separate script (so that users can choose to run it), or an automated one-time migration, with a preference for the former (an automated procedure can cause problems if the datadirs are sufficiently large).
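A minimal sketch of the one-time migration step, assuming a simple move of the old directory tree; the old and new locations shown are illustrative and would need to be resolved per platform:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Sketch of a one-time datadir migration. The old and new locations are
 * illustrative; a real script would resolve $APP_DIR per platform and
 * per application.
 */
public class DataDirMigration {

    public static void main(String[] args) throws IOException {
        Path oldDir = Paths.get(System.getProperty("user.home"), "Aduna", "OpenRDF Sesame");
        Path newDir = Paths.get(System.getProperty("user.home"), "RDF4J", "Server");

        if (Files.isDirectory(oldDir) && !Files.exists(newDir)) {
            Files.createDirectories(newDir.getParent());
            // a move (rather than a copy) stays cheap even for large
            // datadirs, as long as both paths are on the same filesystem
            Files.move(oldDir, newDir);
        }
    }
}
{code}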
We need a new logo (and house style) for the rdf4j project, to visually distinguish ourselves from the 'old' Sesame project. This issue can be used to propose and discuss designs.
The Lucene SAILs for Lucene 3 and 4 need to be removed from the code base.
YASQE and its dependency CodeMirror were excluded from the initial code contribution and treated as third-party dependencies. We need to reintegrate this code.
The CQ for YASQE (v 2.7.2) is https://dev.eclipse.org/ipzilla/show_bug.cgi?id=10646 .
The CQ for CodeMirror (v 4.13) is https://dev.eclipse.org/ipzilla/show_bug.cgi?id=10573 .
The current SPARQL endpoint implementation handles update sequences by sending them down to the underlying Repository. Since no transactions are supported at the level of the SPARQL protocol, this effectively means that transaction handling is left to the Repository API.
The Repository API handles SPARQL update sequences by treating each operation in the sequence as a separate update, which conforms to the SPARQL 1.1 Update specification (section 3):
Implementations MUST ensure that the operations of a single request are
executed in a fashion that guarantees the same effects as executing them
sequentially in the order they appear in the request.
In effect, the SPARQL endpoint implementation handles update sequence requests as several transactions. The SPARQL spec, however, also has the following soft requirement (see section 2.2):
SPARQL 1.1 Update requests are sequences of operations. Each request SHOULD
be treated atomically by a SPARQL 1.1 Update service. The term 'atomically'
means that a single request will result in either no effect or a complete
effect, regardless of the number of operations that may be present in the
request.
While the current implementation does not break the spec, it does deviate from this recommended pattern. To change this, we should add a flag to the RDF4J REST protocol that allows our service implementation to distinguish between requests coming from a SPARQL endpoint client, and requests coming from an RDF4J client. In the former case, the service can choose to explicitly start a transaction before executing the sequence, so that the sequence is treated as an atomic update.
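On the server side, this could translate to something like the following sketch when the new flag indicates a SPARQL endpoint client (using the standard Repository API; the flag handling itself is elided):
{code}
import org.eclipse.rdf4j.query.QueryLanguage;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;

public class AtomicUpdateSketch {

    /**
     * Executes a (possibly multi-operation) SPARQL update request as a
     * single transaction: either every operation takes effect, or none does.
     */
    static void executeAtomically(Repository repo, String updateRequest) {
        try (RepositoryConnection con = repo.getConnection()) {
            con.begin();
            try {
                con.prepareUpdate(QueryLanguage.SPARQL, updateRequest).execute();
                con.commit();
            }
            catch (RuntimeException e) {
                con.rollback();
                throw e;
            }
        }
    }
}
{code}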