Comments (7)
Note that the GTAA service layer also provides score information, so that results can be relevance ranked.
Thanks, @wmelder, that’s the hint that solved our issue: the GeoNames query wasn’t sorting results by relevance score yet. Fixed in #1066.
from network-of-terms.
It should expand the search query with an AND clause when multiple terms are used. We already do so for ?virtuosoQuery; you can try using that in your SPARQL query from the NoT, although the backend seems to be Fuseki rather than Virtuoso.
We could also add a third query variable besides ?query and ?virtuosoQuery. Something like ?intersectionQuery, which joins all query words with AND but does not quote them like ?virtuosoQuery does.
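A minimal sketch of what such a variable could expand to, assuming the query is split on whitespace. The function name toIntersectionQuery is hypothetical and not part of network-of-terms; it only illustrates joining words with AND without adding quotes.

```typescript
// Hypothetical helper (not from network-of-terms): joins all query words
// with AND and leaves them unquoted.
function toIntersectionQuery(query: string): string {
  return query
    .trim()
    .split(/\s+/) // split on any run of whitespace
    .filter(word => word.length > 0) // drop the empty entry produced by a blank query
    .join(' AND ');
}

console.log(toIntersectionQuery('Sas van Gent'));
```

Words are passed through unchanged, so any characters that are special in the backend's query syntax would still need escaping before the value is substituted into a SPARQL query.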
I think it would indeed be very useful to add another variable like that. Treating the phrase as "OR"ed words seems to be the default behaviour of the Jena Full Text Search. We have a similar problem with the current Geonames implementation. Searching for "Sas van Gent" returns 195 results, almost all of them irrelevant. Rewriting the query to "(sas AND gent AND van)" returns the two relevant ones, see https://api.triplydb.com/s/8DBt_82Qy.
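As a rough illustration of the rewrite above, the sketch below wraps a Lucene query string in a Jena text:query pattern against the index's default field. The helper name buildTextSearch and the ?uri variable are assumptions made for this example; the actual geonames.rq query is not shown in this thread and may look different.

```typescript
// Hypothetical helper: builds a SPARQL query that passes a Lucene query
// string to Jena's text:query property function (default field).
function buildTextSearch(luceneQuery: string): string {
  // Escape backslashes and double quotes so the value stays a valid SPARQL string literal.
  const literal = luceneQuery.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return [
    'PREFIX text: <http://jena.apache.org/text#>',
    'SELECT ?uri WHERE {',
    `  ?uri text:query "${literal}" .`,
    '}',
  ].join('\n');
}

// With explicit AND operators, every word has to match.
console.log(buildTextSearch('sas AND van AND gent'));
```

Only SPARQL string escaping is handled here; characters that are special to the Lucene query parser are left untouched.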
Note: The GTAA also uses the Jena Full Text Search but does not have this problem: searching for Wim de Bie returns the correct results. Maybe we should approach @wmelder to ask for details on his config settings?
Thanks for signalling this, @EnnoMeijers. I agree, using an AND clause for multiple terms should preferably be standard.
@ddeboer what would be the optimal solution in your opinion? Would adding ?virtuosoQuery to the queries work for all datasets?
Sounds familiar, the problem of too many hits with full text search. Here are the GTAA settings for the Lucene index.
The service points to the text datastore instead of the triple datastore. The text datastore points to an index and the triple datastore.
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb1: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix tdb2: <http://jena.apache.org/2016/tdb#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix : <#> .
[] rdf:type fuseki:Server .
<#service_public> rdf:type fuseki:Service ;
    fuseki:name "public" ;
    fuseki:label "Service Layer Triple Store (for public access)" ;
    fuseki:serviceQuery "query" , "sparql" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:dataset <#text_public> ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    .
<#text_public> rdf:type text:TextDataset ;
    text:dataset <#tdb_public> ;
    text:index <#index_public> ;
    .
<#index_public> rdf:type text:TextIndexLucene ;
    text:directory "/opt/fuseki/databases/public_text" ;
    text:entityMap <#entity_map_public> ;
    text:analyzer [
        rdf:type text:ConfigurableAnalyzer ;
        text:tokenizer text:StandardTokenizer ;
        text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter)
    ] ;
    .
<#entity_map_public> rdf:type text:EntityMap ;
    text:defaultField "prefLabel" ;
    text:entityField "uri" ;
    text:uidField "uid" ;
    text:map (
        [
            text:field "prefLabel" ;
            text:predicate <http://www.w3.org/2004/02/skos/core#prefLabel>
        ]
        [
            text:field "altLabel" ;
            text:predicate <http://www.w3.org/2004/02/skos/core#altLabel>
        ]
        [
            text:field "hiddenLabel" ;
            text:predicate <http://www.w3.org/2004/02/skos/core#hiddenLabel>
        ]
    ) ;
    .
<#tdb_public> rdf:type tdb1:DatasetTDB ;
    tdb1:location "/opt/fuseki/databases/public" ;
    ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "120000" ] ;
    .
Note that the GTAA service layer also provides score information, so that results can be relevance ranked.
Please let me know if you need more information.
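To illustrate the point about relevance ranking: Jena's text query can bind a score next to the subject, with a pattern such as (?uri ?score) text:query "...". Once each result carries that score, ranking comes down to a descending sort. The sketch below assumes results have already been parsed into objects; the ScoredTerm type and rankByScore function are hypothetical names, not the GTAA service layer's or network-of-terms' API.

```typescript
// Hypothetical result shape: one search hit with the score reported by the text index.
interface ScoredTerm {
  uri: string;
  score: number;
}

// Returns a new array ordered by descending score; the input is left unchanged.
function rankByScore(terms: ScoredTerm[]): ScoredTerm[] {
  return [...terms].sort((a, b) => b.score - a.score);
}

const ranked = rankByScore([
  { uri: 'http://example.org/term/1', score: 1.2 },
  { uri: 'http://example.org/term/2', score: 4.7 },
]);
console.log(ranked.map(term => term.uri));
```

The same ordering can be requested from the store with ORDER BY DESC(?score) in the SPARQL query, which avoids sorting on the client.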
Ah, thanks for sharing @wmelder! It looks quite similar to the Geonames configuration that we currently use. I can't explain the differences in behaviour for searching between these two, can you @ddeboer?
Using the ?virtuosoQuery var instead of the regular ?query var in the SPARQL query seems to fix the problems with the Jena full text search. Updated the geonames.rq accordingly, see #1065.