hbz / lobid-gnd
UI and API to the Integrated Authority File (Gemeinsame Normdatei, GND)
Home Page: http://lobid.org/gnd
License: Eclipse Public License 2.0
See e.g. http://test.lobid.org/gnd/search?q=*+AND+type:%22CorporateBody%22, where all GND resources are listed, with those that contain "type" and something similar to "CorporateBody" on top.
If an authority resource is covered in lobid-resources either in the contribution array or in the subject array, we should show links underneath its entry titled:
Clicking on the link will open another browser tab/window with the respective result list.
Copied from lobid/lodmill#75.
To improve disambiguation during auto-suggest selection, we should add the profession to the auto-suggest labels on the /person endpoint. The information is available through the GND links, but we have to resolve the labels.
At some point, we should switch lobid-authorities to public beta, i.e. add it, marked as "beta", to the main navigation at http://lobid.org. Perhaps we could add a beta label for the issues that need to be solved before we do so.
@acka47 What do you think?
Like "Erscheinungsjahr" (year of publication) in lobid-resources: http://lobid.org/resources/search
From #1 (comment):
Re. the framing output from http://tinyurl.com/ychm4t92, I just noticed that blank nodes get an id:
"hasGeometry": { "@id": "_:b0", "@type": "http://www.opengis.net/ont/sf#Point", "asWKT": "Point ( -000.125740 +051.508530 )" }
We should get rid of them. This has already been addressed in the JSON-LD Framing spec 1.1 ("pruneBlankNodeIdentifiers") but is currently only implemented in the Ruby library, see json-ld/json-ld.org#293.
We consistently use label/altlabel in lobid-resources and lobid-organisations and, thus, should probably also use it in lobid-authorities...
We decided to rename the service to lobid-gnd.
I will provide a label per type in a later comment to this issue.
Moving from old lobid-gnd repo, issue by @acka47.
We will use short keys instead of the URIs for classes/types in GND API 2.0. Thus, we will need these keys to be defined in the @context. Ideally, we will generate this (half-)automatically from the GND ontology.
We already did some half-automatic generation of the GND context, see lobid/lodmill#251 & https://gist.github.com/niklasl/2770154#gistcomment-946311.
I once again used Niklas' Python tool to create the JSON-LD context. Afterwards, a bit of manual post-processing was necessary, as well as the addition of owl:sameAs and foaf:page and the deletion of deprecated properties and classes. The result can be found at https://gist.github.com/acka47/98035a3f215c783bdc00.
Leaving this open (and renaming the ticket) until the context is published.
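The generation and the manual post-processing steps described above could roughly be combined like this. A minimal sketch, assuming the class and property URIs have already been extracted from the GND ontology into a list and that the deprecated terms are known; the input list, the `deprecated` argument and the function name are hypothetical. Short keys are the local names after the `#` of the GND namespace, and `sameAs` and `page` are added by hand as links to `owl:sameAs` and `foaf:page`.

```python
import json
from typing import Dict, Iterable

GNDO = "http://d-nb.info/standards/elementset/gnd#"

# Terms that are not in the GND ontology and are added by hand.
EXTRA_TERMS = {
    "sameAs": {"@id": "http://www.w3.org/2002/07/owl#sameAs", "@type": "@id"},
    "page": {"@id": "http://xmlns.com/foaf/0.1/page", "@type": "@id"},
}


def build_context(uris: Iterable[str], deprecated: Iterable[str] = ()) -> Dict:
    """Map each non-deprecated GND ontology URI to its local name."""
    skip = set(deprecated)
    terms: Dict = {}
    for uri in uris:
        if not uri.startswith(GNDO) or uri in skip:
            continue
        terms[uri[len(GNDO):]] = uri  # e.g. "Person" -> full class URI
    terms.update(EXTRA_TERMS)
    return {"@context": terms}


ontology_uris = [  # hypothetical excerpt of the extracted URIs
    GNDO + "Person",
    GNDO + "CorporateBody",
    GNDO + "preferredNameForThePerson",
]
print(json.dumps(build_context(ontology_uris), indent=2))
```

Object properties would additionally need `"@type": "@id"` in their term definitions; that distinction would have to come from the ontology and is left out here.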
For example:
Probably the best way to handle this would be to use Elasticsearch filters instead of queries for faceting, see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html (as we do in OER World Map).
BTW, we do it like this in all our services...
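In Elasticsearch terms, this means putting the user's query string into the scored `must` part of a `bool` query and the selected facets into its `filter` part, which does not affect scoring and can be cached. A minimal sketch of such a request body, assuming the types are indexed in a field called `type`; the field name, the aggregation name `types` and the function name are assumptions, not necessarily what our index uses.

```python
import json
from typing import Dict, List


def build_search_body(q: str, type_filters: List[str]) -> Dict:
    """Build an Elasticsearch request body: the query scores, filters only restrict."""
    return {
        "query": {
            "bool": {
                # Scored part: relevance ranking comes only from the user's query.
                "must": {"query_string": {"query": q}},
                # Filter context: no scoring, so selecting a facet doesn't change the ranking.
                "filter": [{"terms": {"type": type_filters}}] if type_filters else [],
            }
        },
        # Facet counts for the type field (field name is an assumption).
        "aggs": {"types": {"terms": {"field": "type"}}},
    }


print(json.dumps(build_search_body("London", ["PlaceOrGeographicName"]), indent=2))
```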
For example, http://lobid.org/gnd/2047974-8 causes problems. We might show an image, if available, instead of a second column (see #23).
When there are lots of variant names it doesn't look very good, e.g. https://test.lobid.org/gnd/1083009273
If we show the field at all we should hide it by default like so:
e.g. /authorities/5502844-5
People should get suggestions when typing a search term.
...while they are actually subclasses of "Differentiated Person".
The GND release 2018.02 on 2018-05-15 will bring some additional subclasses with it. From the announcement:
3. Changes to the GND ontology
The class "gndo:CorporateBody" gets four new subclasses: "gndo:Company", "gndo:MusicalCorporateBody", "gndo:ReligiousCorporateBody" and "gndo:ReligiousAdministrativeUnit".
4. Changes to the GND conversion
Four further kinds of corporate bodies ("gndo:CorporateBody") are explicitly marked as such with rdf:type. These are the classes "gndo:Company", "gndo:MusicalCorporateBody", "gndo:ReligiousCorporateBody" and "gndo:ReligiousAdministrativeUnit". Each of them is a subclass (rdfs:subClassOf) of "gndo:CorporateBody", cf. section 3 on the GND ontology. As usual in such cases, there is no additional explicit assignment to the superclass "gndo:CorporateBody".
Examples in the test file:
http://d-nb.info/gnd/6518137-2, http://d-nb.info/gnd/111260619X,
http://d-nb.info/gnd/2001043-6
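Since the data will not state the superclass explicitly, we would probably have to add it ourselves for these resources to show up when filtering by CorporateBody. A minimal sketch, assuming a hand-maintained subclass-to-superclass mapping (only the four new classes from the announcement are listed); the mapping and the function name are hypothetical.

```python
from typing import Dict, List

# Subclass -> superclass, taken from the 2018.02 announcement quoted above.
SUPERCLASS: Dict[str, str] = {
    "Company": "CorporateBody",
    "MusicalCorporateBody": "CorporateBody",
    "ReligiousCorporateBody": "CorporateBody",
    "ReligiousAdministrativeUnit": "CorporateBody",
}


def with_superclasses(types: List[str]) -> List[str]:
    """Return the given types plus their (transitive) superclasses, without duplicates."""
    result: List[str] = []
    for t in types:
        current = t
        # Walk up the hierarchy; stop at the top or at a type we already added.
        while current is not None and current not in result:
            result.append(current)
            current = SUPERCLASS.get(current)
    return result


print(with_superclasses(["Company"]))  # ['Company', 'CorporateBody']
```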
Test indexing the format from #1 in Elasticsearch.
ConferenceOrEvent with preferredNameForTheCorporateBody: http://test.lobid.org/authorities/search?q=type:ConferenceOrEvent+AND+preferredNameForTheCorporateBody:*
CorporateBody with preferredNameForTheConferenceOrEvent: http://test.lobid.org/authorities/search?q=type:CorporateBody+AND+preferredNameForTheConferenceOrEvent:*
DifferentiatedPerson with preferredNameForTheCorporateBody: http://test.lobid.org/authorities/search?q=type:DifferentiatedPerson+AND+preferredNameForTheCorporateBody:*
CorporateBody with preferredNameForThePerson: http://test.lobid.org/authorities/search?q=type:CorporateBody+AND+preferredNameForThePerson:*
PlaceOrGeographicName with preferredNameForTheCorporateBody: http://test.lobid.org/authorities/search?q=type:PlaceOrGeographicName+AND+preferredNameForTheCorporateBody:*
ConferenceOrEvent with preferredNameForTheSubjectHeading: http://test.lobid.org/authorities/search?q=type:ConferenceOrEvent+AND+preferredNameForTheSubjectHeading:*
SubjectHeadingSensoStricto with preferredNameForTheCorporateBody: http://test.lobid.org/authorities/search?q=type:SubjectHeadingSensoStricto+AND+preferredNameForTheCorporateBody:*
gndo:academicDegree is used both with URIs (e.g. 100095089) and with strings (e.g. 100072488), although it was announced in October 2013 that the property would only be used as a datatype property.
As resources only have one preferred name, an array isn't needed and might be confusing. Also, we don't have an array for label in lobid-resources, so this would be consistent with our other services.
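Turning the single-element array into a plain value could be a small step before indexing. A minimal sketch, assuming the record is a Python dict and that only the listed keys are known to be single-valued; the key list and the function name are hypothetical.

```python
from typing import Any, Dict, Iterable


def unwrap_single_values(doc: Dict[str, Any],
                         keys: Iterable[str] = ("preferredName",)) -> Dict[str, Any]:
    """Replace one-element lists with their only element for the given keys."""
    result = dict(doc)
    for key in keys:
        value = result.get(key)
        # Lists with more than one element are left alone so they can be inspected.
        if isinstance(value, list) and len(value) == 1:
            result[key] = value[0]
    return result


print(unwrap_single_values({"preferredName": ["London"], "variantName": ["Londres"]}))
```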
PlaceOrGeographicName, if easy.
We have to automate our data transformation setup for production mode:
Both could be triggered via curl POSTs from a cron job (like in lobid-organisations). Calling the transformation from within the web app would result in logging to the application.log (currently, we run the transformations by manually calling classes with main methods).
Server side setup should probably wait until https://github.com/hbz/lobid-webserver/issues/5 is done.
See http://www.dnb.de/DE/Service/DigitaleDienste/OAI/oai_node.html
See also implementation in lodmill and lobid-organisations.
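Getting updates via OAI-PMH means paging through `ListRecords` responses, using the `resumptionToken` contained in each response to request the next page. A minimal sketch of the paging logic with the standard library only; the base URL, the set `authorities` and the metadata prefix `RDFxml` are assumptions to be checked against the DNB page linked above, and the HTTP call itself is left out so that the parsing can be tested offline.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://services.dnb.de/oai/repository"  # assumption, check DNB docs


def list_records_url(token: Optional[str] = None, since: str = "2018-01-01") -> str:
    """First page sends prefix/set/from; follow-up pages must send only the token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "RDFxml",
                  "set": "authorities", "from": since}
    return BASE_URL + "?" + urlencode(params)


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumption token, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    elem = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty (or missing) resumptionToken element marks the last page.
    if elem is None or elem.text is None or not elem.text.strip():
        return None
    return elem.text.strip()
```

A harvesting loop would fetch `list_records_url()`, process the records, and continue with `list_records_url(token)` until `next_token` returns None.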
Currently, I always get a maximum of ten results, no matter what I set the size parameter to (see e.g. http://test.lobid.org/authorities/search?q=academicDegree:*&size=19).
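Ten is the default number of hits Elasticsearch returns when no `size` is given, which suggests that the parameter is not passed on to the search request. A minimal sketch of parsing the paging parameters and forwarding them into the request body; the helper name and the upper limit of 100 are assumptions, not existing settings.

```python
from typing import Dict, Optional

DEFAULT_SIZE = 10  # Elasticsearch's own default number of hits
MAX_SIZE = 100     # assumed upper limit for the API


def _to_int(raw: Optional[str], default: int) -> int:
    try:
        return int(raw) if raw is not None else default
    except ValueError:
        return default


def paging_params(size: Optional[str], offset: Optional[str]) -> Dict[str, int]:
    """Turn raw `size` and offset query parameters into Elasticsearch paging fields."""
    n = _to_int(size, DEFAULT_SIZE)
    start = _to_int(offset, 0)
    return {"from": max(start, 0), "size": min(max(n, 0), MAX_SIZE)}


print(paging_params("19", None))  # {'from': 0, 'size': 19}
```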
Currently, we only allow filtering by the first-level types Person, CorporateBody, ConferenceOrEvent, Work, PlaceOrGeographicName, SubjectHeading and Family.
One adjustment should be that the filter section adapts when clicking on a type: either the first-level types that were not clicked should disappear (and the subtypes should be shown, see below), or the types that were not clicked should be greyed out.
We should also enable filtering by the subtypes in these two ways:
From #1:
We don't want specific name properties like preferredNameForThePlaceOrGeographicName and variantNameForThePlaceOrGeographicName. For all entities, we should just use preferredName and variantName.
This will allow us to query the whole data in a uniform way. (The type is made clear by other means so that we don't need the specific properties.)
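The renaming could be done with a pattern on the property names during conversion. A minimal sketch, assuming each record is a flat Python dict; the function name is hypothetical. Variant names coming from several specific properties are merged, while two different preferred-name properties on one record raise an error, since that case needs a decision rather than silent overwriting.

```python
import re
from typing import Any, Dict, List

# Matches e.g. preferredNameForThePlaceOrGeographicName, variantNameForThePerson
_NAME_KEY = re.compile(r"^(preferredName|variantName)ForThe[A-Za-z]+$")


def _as_list(value: Any) -> List[Any]:
    return value if isinstance(value, list) else [value]


def uniform_name_keys(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Rename entity-specific name properties to preferredName / variantName."""
    result: Dict[str, Any] = {}
    for key, value in doc.items():
        match = _NAME_KEY.match(key)
        target = match.group(1) if match else key
        if target not in result:
            result[target] = value
        elif target == "variantName":
            # Variant names from several specific properties are merged.
            result[target] = _as_list(result[target]) + _as_list(value)
        else:
            # Two preferred-name properties on one resource: needs manual review.
            raise ValueError(f"conflicting values for {target}")
    return result
```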
e.g. for 4074335-4
E.g. God & spirit, example http://test.lobid.org/gnd/1024589951
e.g. to search for other supernatural beings (übernatürliche Wesen) from http://test.lobid.org/authorities/4021469-2.html
or who else is acquainted with Charlotte von Stein in http://test.lobid.org/authorities/118540238.html
Suggested by @jschnasse, e.g. for http://test.lobid.org/authorities/118540238
We should add the date of our latest pull from the GND, so that users don't have to pull all GND files.
Even if the DNB added the modification date from PICA, it wouldn't be ideal, as that date changes even when the postbox is used and nothing changes in the authority data itself.
With the new release on 2018-01-16, some new properties will be used in GND data, see the news "Änderungen im Format RDF ab 16. Januar 2018 (Export-Release 2018.01)" (changes to the RDF format as of 16 January 2018), pages 1-3. We will have to update the JSON-LD context to accommodate those changes.
Four abbreviatedName properties:
Twelve properties from AgRelOn:
For getting labels for URIs like http://d-nb.info/standards/vocab/gnd/geographic-area-code#XA-DE
Moved here with some adjustments from lobid/lodmill#447.
Example query: "heinsberg"
Facets/possibilities for filtering by second-level types:
(Similar to e.g. https://portal.dnb.de/opac.htm?method=simpleSearch&query=heinsberg)
Clicking a facet should yield filtered search results with sub-facets, e.g.:
Place or Geographic Name:
Currently, it looks like an "edit" button.
From #1 (comment):
Furthermore, we should have a type from the second level of the GND ontology attached to each resource. We will need this for faceting. The GND ontology has three levels in its type hierarchy (except for Person, where we have added a fourth one); see the overview of the GND class hierarchy at https://wiki1.hbz-nrw.de/x/CIeW. In the concrete example, PlaceOrGeographicName should be in the data.
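Given the class hierarchy, the second-level type can be found by walking up from a resource's most specific type. A minimal sketch, assuming the hierarchy is available as a subclass-to-superclass mapping with `AuthorityResource` as the root (level 1); the excerpt below is illustrative only and should be checked against the class hierarchy overview linked above, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Illustrative excerpt of the GND class hierarchy: subclass -> superclass.
PARENT: Dict[str, str] = {
    "PlaceOrGeographicName": "AuthorityResource",
    "BuildingOrMemorial": "PlaceOrGeographicName",
    "CorporateBody": "AuthorityResource",
    "Company": "CorporateBody",
}


def ancestors(cls: str) -> List[str]:
    """Return the path from the root down to cls."""
    path = [cls]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def type_at_level(cls: str, level: int = 2) -> Optional[str]:
    """Return the ancestor of cls at the given level (the root is level 1)."""
    path = ancestors(cls)
    return path[level - 1] if len(path) >= level else None


print(type_at_level("BuildingOrMemorial"))  # PlaceOrGeographicName
```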
For example http://test.lobid.org/authorities/118506560.
Currently, it looks like:
{
"professionOrOccupation": [
"http://d-nb.info/gnd/4131406-2",
"http://d-nb.info/gnd/4012434-4"
]
}
With added labels:
{
"professionOrOccupation":[
{
"id":"http://d-nb.info/gnd/4131406-2",
"preferredName":"Pianist"
},
{
"id":"http://d-nb.info/gnd/4012434-4",
"preferredName":"Dirigent"
}
]
}
However, there are several other cases (e.g. placeOfDeath, relatedWork, familialRelationship) where we only have internal GND links without labels in the data. Thus, a general approach that automatically adds a label for all URIs from the http://d-nb.info/gnd/ namespace would probably make sense. (If we implement a general approach, it should be possible to exclude some properties from this label enrichment: e.g. we might run into problems when trying to fetch the label for deprecatedUri, and I think labels for sameAs links within GND aren't necessary.) What do you think, @fsteeg?
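A general enrichment step could walk the whole record and replace every GND URI with an object like the one in the "With added labels" example, skipping excluded properties. A minimal sketch, assuming a `lookup` function that returns the preferred name for a URI (in practice it might be backed by our own index or a cache); `lookup`, the default exclusion list and the function name are hypothetical. The record's own `id` is excluded so that it stays a plain URI.

```python
from typing import Any, Callable, Iterable, Optional

GND_PREFIX = "http://d-nb.info/gnd/"


def add_labels(node: Any, lookup: Callable[[str], Optional[str]],
               exclude: Iterable[str] = ("id", "deprecatedUri", "sameAs"),
               key: Optional[str] = None) -> Any:
    """Replace GND URIs with {"id": uri, "preferredName": label} objects."""
    excluded = set(exclude)
    if key in excluded:
        return node  # leave excluded properties untouched
    if isinstance(node, dict):
        return {k: add_labels(v, lookup, excluded, k) for k, v in node.items()}
    if isinstance(node, list):
        # List items inherit the property name of the list.
        return [add_labels(item, lookup, excluded, key) for item in node]
    if isinstance(node, str) and node.startswith(GND_PREFIX):
        label = lookup(node)
        if label is not None:
            return {"id": node, "preferredName": label}
    return node


labels = {"http://d-nb.info/gnd/4131406-2": "Pianist",
          "http://d-nb.info/gnd/4012434-4": "Dirigent"}
doc = {"id": "http://d-nb.info/gnd/118506560",
       "professionOrOccupation": ["http://d-nb.info/gnd/4131406-2",
                                  "http://d-nb.info/gnd/4012434-4"]}
print(add_labels(doc, labels.get))
```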
Search yields weird results, e.g.:
(@context is not_analyzed)
This probably has #24 (plus the addition of other labels) as a prerequisite.
Currently, the search result list shows name, some variant names and the GND ID with a link to the entry at the DNB. With this ticket we should discuss and implement a more useful result list.
Here are some ideas:
Person: show occupation (professionOrOccupation) and birth/death year.
ConferenceOrEvent: show place (placeOfConferenceOrEvent) and date (dateOfConferenceOrEvent). (May not be necessary, as this is also appended in parentheses to the preferred name.)
CorporateBody: show placeOfBusiness (more than half of the 1.4 million corporate bodies have that info).
Work: we definitely need firstAuthor/firstComposer.
PlaceOrGeographicName: show geographicAreaCode.
SubjectHeading: show gndSubjectCategory.
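These ideas could be captured as a per-type list of fields from which the extra line in the result list is built. A minimal sketch, assuming records carry a `type` array and that linked values are either plain strings or enriched objects with `preferredName`; the use of dateOfBirth/dateOfDeath for the birth/death year, the mapping and the function names are assumptions.

```python
from typing import Any, Dict, List

# Fields to show per type in the result list, following the ideas above.
RESULT_FIELDS: Dict[str, List[str]] = {
    "Person": ["professionOrOccupation", "dateOfBirth", "dateOfDeath"],
    "ConferenceOrEvent": ["placeOfConferenceOrEvent", "dateOfConferenceOrEvent"],
    "CorporateBody": ["placeOfBusiness"],
    "Work": ["firstAuthor", "firstComposer"],
    "PlaceOrGeographicName": ["geographicAreaCode"],
    "SubjectHeading": ["gndSubjectCategory"],
}


def _label(value: Any) -> str:
    """Use the label of an enriched link object, otherwise the raw value."""
    if isinstance(value, dict):
        return str(value.get("preferredName") or value.get("id", ""))
    return str(value)


def summary(doc: Dict[str, Any]) -> str:
    """Build the extra line shown below the name in the result list."""
    types = doc.get("type", [])
    if isinstance(types, str):
        types = [types]
    fields = next((RESULT_FIELDS[t] for t in types if t in RESULT_FIELDS), [])
    parts: List[str] = []
    for field in fields:
        values = doc.get(field, [])
        parts.extend(_label(v) for v in (values if isinstance(values, list) else [values]))
    return "; ".join(p for p in parts if p)
```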
Copied from lobid/lodmill#468, originally opened by @nichtich. We should consider this for the new implementation of GND lookup. (I think, @jschnasse also mentioned this.)
For simple lookup, it would be nice to add format=suggest, returning results in OpenSearch Suggestions format. Format short returns a plain array of strings, e.g. http://api.lobid.org/person?name=Marx&format=short
[
"Marx, Antônio Augusto (1919-)",
"Marx, J. A.",
"Marxsen, Peter Christian (1806-1869)",
...
]
OpenSearch Suggestions would be:
[
"Marx",
[
"Marx, Antônio Augusto",
"Marx, J. A.",
"Marxsen, Peter Christian",
...
],
[
"Architekt und Künstler (1919-)",
"Organist und Musiker in Yselstein",
"Subrektor (1806-1869)",
...
],
[
"http://d-nb.info/gnd/1030092001",
"http://d-nb.info/gnd/1012540626",
"http://d-nb.info/gnd/1016221088",
...
]
]
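The OpenSearch Suggestions format is a four-element array: the query, the completions, their descriptions, and their URLs. A minimal sketch of building it from search hits, assuming each hit already provides a label without dates, a description and the GND URI; the hit field names and the function name are hypothetical. The response would typically be served with the content type `application/x-suggestions+json`.

```python
import json
from typing import Dict, List


def opensearch_suggestions(query: str, hits: List[Dict[str, str]]) -> list:
    """Build [query, completions, descriptions, urls] as in the example above."""
    return [
        query,
        [hit["label"] for hit in hits],
        [hit.get("description", "") for hit in hits],
        [hit["id"] for hit in hits],
    ]


hits = [{"label": "Marx, J. A.",
         "description": "Organist und Musiker in Yselstein",
         "id": "http://d-nb.info/gnd/1012540626"}]
print(json.dumps(opensearch_suggestions("Marx", hits), ensure_ascii=False))
```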
Both dumps and updates (via OAI) are available as RDF-XML, so that would be a suitable source format:
http://datendienst.dnb.de/cgi-bin/mabit.pl?userID=opendata&pass=opendata&cmd=login
http://www.dnb.de/DE/Service/DigitaleDienste/OAI/oai_node.html (s. "Formate")
We should test serializing that RDF-XML as compact JSON-LD using the entityfacts context:
http://hub.culturegraph.org/entityfacts/context/v1/entityfacts.jsonld
http://hub.culturegraph.org/entityfacts/118540238
If the result looks good, this might be the format to index in Elasticsearch. We might have to do some preprocessing to make sure the values always have the same type (see footnote 1 in http://blog.lobid.org/2017/06/08/lobid-api-why-how.html about compact JSON-LD serialization in Elasticsearch).
Documentation: http://www.dnb.de/DE/Service/DigitaleDienste/EntityFacts/entityfacts_node.html
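Compact JSON-LD uses a plain value when there is one value and an array when there are several, and a property may hold a link in one record and a string in another (as with academicDegree), while Elasticsearch expects one consistent type per field. A minimal sketch of such preprocessing, assuming we know which keys may be multi-valued and which may mix links and strings; both key lists, the `label` key for wrapped strings and the function name are hypothetical.

```python
from typing import Any, Dict, Iterable


def normalize(doc: Dict[str, Any],
              array_keys: Iterable[str] = ("variantName", "professionOrOccupation"),
              link_keys: Iterable[str] = ("academicDegree",)) -> Dict[str, Any]:
    """Give each listed field the same JSON type in every record."""
    result = dict(doc)
    for key in array_keys:
        if key in result and not isinstance(result[key], list):
            result[key] = [result[key]]  # single value -> one-element array
    for key in link_keys:
        if key in result:
            values = result[key] if isinstance(result[key], list) else [result[key]]
            # Plain strings become objects, so links and literals share one shape.
            result[key] = [v if isinstance(v, dict) else {"label": v} for v in values]
    return result
```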
Example of sameAs links for London:
{
"sameAs":[
{
"@id":"http://d-nb.info/gnd/4074335-4/about",
"collection":{
"abbr":"DNB",
"name":"Gemeinsame Normdatei (GND) im Katalog der Deutschen Nationalbibliothek",
"publisher":"Deutsche Nationalbibliothek",
"icon":"http://www.dnb.de/SiteGlobals/StyleBundles/Bilder/favicon.png?__blob=normal&v=1"
}
},
{
"@id":"http://viaf.org/viaf/236493943",
"collection":{
"abbr":"VIAF",
"name":"Virtual International Authority File (VIAF)",
"publisher":"OCLC",
"icon":"http://viaf.org/viaf/images/viaf.ico"
}
},
{
"@id":"http://www.wikidata.org/entity/Q6669738",
"collection":{
"abbr":"WIKIDATA",
"name":"Wikidata",
"publisher":"Wikimedia Foundation Inc.",
"icon":"https://www.wikidata.org/static/favicon/wikidata.ico"
}
},
{
"@id":"https://de.wikisource.org/wiki/London",
"collection":{
"abbr":"WIKISOURCE",
"name":"Wikisource",
"publisher":"Wikimedia Foundation Inc.",
"icon":"https://wikisource.org/static/favicon/wikisource.ico"
}
},
{
"@id":"https://en.wikipedia.org/wiki/London%2C_Wisconsin",
"collection":{
"abbr":"enwiki",
"name":"Wikipedia (English)",
"publisher":"Wikimedia Foundation Inc.",
"icon":"https://en.wikipedia.org/static/favicon/wikipedia.ico"
}
}
]
}
First step: Use EntityFacts to show links to other resources in the HTML interface (using the icons linked in the JSON).
Possible further step: Enrich JSON-LD with links from EntityFacts. There might be a dump in the future we could build a map from, see https://twitter.com/junicatalo/status/971793109796433926.
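For the first step, the links could be read from the sameAs array shown above and rendered as icon links. A minimal sketch, assuming the EntityFacts JSON (e.g. from http://hub.culturegraph.org/entityfacts/118540238) has already been fetched and parsed; the function names are hypothetical. Attribute values are HTML-escaped, which matters for icon URLs containing `&`.

```python
from html import escape
from typing import Any, Dict, List, Tuple


def entityfacts_links(entityfacts: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Extract (url, abbr, icon) triples from an EntityFacts sameAs array."""
    links = []
    for same_as in entityfacts.get("sameAs", []):
        collection = same_as.get("collection", {})
        url = same_as.get("@id")
        if url and collection.get("icon"):
            links.append((url, collection.get("abbr", ""), collection["icon"]))
    return links


def links_html(links: List[Tuple[str, str, str]]) -> str:
    """Render each link as an icon image wrapped in an anchor."""
    return "\n".join(
        f'<a href="{escape(url)}"><img src="{escape(icon)}" alt="{escape(abbr)}"/></a>'
        for url, abbr, icon in links)
```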
Currently, we only link from the result list.
Use thousands separators for the result and facet counts, e.g. "4.577.743" instead of "4577743". Keep in mind that we will implement this differently for the English version of the service, which is still to be added.
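A minimal sketch of language-dependent grouping that does not depend on the server's locale settings: German uses "." and English uses "," as the thousands separator. The function name and the language codes are hypothetical.

```python
def format_count(n: int, lang: str = "de") -> str:
    """Format a hit count with a thousands separator: '.' for German, ',' for English."""
    grouped = f"{n:,}"  # the ',' format option always groups with ',' regardless of locale
    return grouped.replace(",", ".") if lang == "de" else grouped


print(format_count(4577743))        # 4.577.743
print(format_count(4577743, "en"))  # 4,577,743
```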
We discussed which filter options we could add. I identified four additional filters that could be useful, two of them covering only persons:
For differentiated persons, which account for ~31% of the entries, there are two more: