

International Prosopographical Interchange Framework (IPIF)

This is a draft of a RESTful API for prosopographical data.

It should allow you to search for factoid-modelled data on persons extracted from historical sources, and to update resources based on this model. By "factoid model" we refer to Bradley/Short 2005. For an ontology of the factoid model, see: https://github.com/johnBradley501/FPO
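To make the factoid model concrete, here is a minimal Python sketch under a strongly simplified assumption: a factoid ties one person to one source and carries the statements that this source makes about the person. The class and field names are illustrative only and are not the IPIF schema defined in prosopogrAPhI.yaml.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    id: str


@dataclass
class Source:
    id: str
    label: str


@dataclass
class Statement:
    id: str
    # Simplified: only two optional statement fields are modelled here.
    name: Optional[str] = None
    place: Optional[str] = None


@dataclass
class Factoid:
    """Links one person and one source to the statements that source makes about the person."""
    id: str
    person: Person
    source: Source
    statements: List[Statement] = field(default_factory=list)


def statements_about(factoids: List[Factoid], person_id: str) -> List[Statement]:
    """Collect every statement about one person, across all factoids and thus all sources."""
    return [st for f in factoids if f.person.id == person_id for st in f.statements]
```

Because the source is attached to the factoid rather than to the person, statements from different (possibly contradicting) sources can coexist for the same person.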

At some point it will be available at http://prosopography.org

Initiated by Georg Vogeler (Graz University), but developed by the collective intelligence of the following persons:

  • Gunter Vasold (Univ. Graz)
  • Daniel Jeller (ICARus Vienna)
  • Thomas Wallnig (Univ. Vienna)
  • Matthew Wilcoxson (Univ. Oxford)
  • Matthias Schlögl (ÖAW)
  • Miguel Vieira (King's Digital Lab London)
  • John Bradley (London)
  • Francesco Beretta (LARHRA, Lyon)
  • Rainer Simon (AIT Vienna)
  • Stefan Eichert (ÖAW Vienna)
  • Bärbel Kröger (Germania Sacra, Göttingen)
  • Christian Popp (Germania Sacra, Göttingen)
  • Vincent Cheng (Czech Academy)
  • Dagmar Mrozik
  • James Kelly (University of Durham)
  • Nada Zečević
  • Ekaterini Mitsiou (Univ. Vienna)
  • P. Alkuin Schachenmayr OCist (Stift Heiligenkreuz)
  • Stephan Makowski (CCeH Köln)
  • Hedvika Kuchařová, Jana Borovičková (Prague)
  • Irene Rabl
  • Katja Almberger
  • (tbc)

Started in 2016 at a workshop at the University of Vienna. Substantially enhanced during the prosopography hackathon in February 2019 (Vienna).

The main file in this repository is the Swagger description of the proposed API (prosopogrAPhI.yaml). rationale.md gives some background to it.

The wiki contains some proof of concept implementations.

See Vogeler, Georg; Vasold, Gunter; Schlögl, Matthias. "Von IIIF zu IPIF? Ein Vorschlag für den Datenaustausch über Personen". In: Sahle, Patrick (Hg.): DHd 2019 Digital Humanities: multimedial & multimodal. Konferenzabstracts. Frankfurt / Mainz. DHd. 2019 DOI: 10.5281/zenodo.2600812. pp. 239-241. (Slides of the presentation)

Georg Vogeler, Gunter Vasold, Matthias Schlögl. "Data exchange in practice: Towards a prosopographical API". BD2019, ed. by Angel Daza; Antske Fokkens; Petya Osenova; Kiril Simov; Alexander Popov; Paul Arthur; Thierry Declerck; Ronald Sluijter; Serge ter Braake; Eveline Wandl-Vogt. CEUR Workshop series 3152. 2022, 40-48 (preprint)

Matthias Schlögl, Georg Vogeler, Gunter Vasold, Richard Hadden. "IPIF - pragmatic modelling decisions", presentation at the Data for History Conference, Berlin, 9.6.2021, https://d4h2020.sciencesconf.org/data/pages/Schlo_gl_Vogeler_Vasold_IPIF_2.pdf

Vogeler, Georg, Hadden, Richard, Schlögl, Matthias, & Vasold, Gunter. (2022, March 7). Prosopographische Interoperabilität (IPIF) - Stand der Entwicklungen. DHd 2022 Kulturen des digitalen Gedächtnisses. 8. Tagung des Verbands "Digital Humanities im deutschsprachigen Raum" (DHd 2022), Potsdam. https://doi.org/10.5281/zenodo.6328211

Hadden, Richard, Matthias Schlögl, Georg Vogeler. Towards a prosopographical ecosystem: modelling, design, and implementation issues. In: Yifan Wang et al. (Eds.): Digital Humanities 2022: Conference Abstracts. The University of Tokyo, Japan, 25-29 July 2022. ADHO. 2022. 472-473


Issues

Introduce a more strict schema for the data model of the statements

Currently the statements are only loosely defined. https://github.com/GVogeler/prosopogrAPhI/blob/master/prosopogrAPhI.yaml#L1107-L1111 gives some indications of the data model by listing URIs used in the role and statementType properties. With such a loose description of the data model, using the interface to create data in a database maintained by others is almost impossible. Building a visual interface for data capture is also very hard, because the structure of the statements can only be inferred from all existing statements. To mitigate this, I propose adding a schema property to the describe endpoint that provides a JSON Schema for the data model. Preferably this should be a JSON-LD schema, so that valid RDF data is created. The schema would be considered complete, i.e. no classes or properties other than those defined in the schema would be allowed.
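A minimal sketch of what the proposed schema property could look like, assuming a plain JSON Schema (not a JSON-LD schema) and hypothetical statement property names. "additionalProperties": false expresses the "complete schema" requirement. The check function covers only required and additional top-level properties; a real service would use a full JSON Schema validator.

```python
import json

# Hypothetical JSON Schema for a statement; the property names are illustrative only.
STATEMENT_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "@id": {"type": "string"},
        "statementType": {"type": "object"},
        "name": {"type": "string"},
        "role": {"type": "object"},
    },
    "required": ["@id"],
    # "Complete" schema: no properties beyond those listed above are allowed.
    "additionalProperties": False,
}


def describe_response() -> dict:
    """Hypothetical body of the describe endpoint carrying the proposed 'schema' property."""
    return {"schema": {"statement": STATEMENT_SCHEMA}}


def check_top_level(instance: dict, schema: dict) -> list:
    """Tiny subset of JSON Schema checking: 'required' and 'additionalProperties' only."""
    errors = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property: {key}")
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        for key in instance:
            if key not in allowed:
                errors.append(f"property not allowed: {key}")
    return errors


print(json.dumps(describe_response(), indent=2))
```

A data-capture client could read this schema from the describe endpoint and generate its input form from it, instead of inferring the structure from existing statements.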

add statementType property to statement

The statement currently carries no information about its model. IPIF recommends some properties but does not define how they are used (see also #32). It seems beneficial to record more information about the model of a statement by assigning a 'statementType' property to it. This could refer to externally formalised models or simply to a project-specific controlled vocabulary.

combining json-ld with api endpoint resolving

If we want to achieve fully JSON-LD-based API consumption, it would be good to let the *-refs properties point to the explicit endpoint. In JSON-LD this could be achieved by adding a local @context that replaces the @base from the general context file with the current endpoint.
But:

  1. according to https://www.w3.org/TR/json-ld/#advanced-context-usage, each context definition overrides the previous one, so the new @base would apply to all subsequent URI expansions
  2. It would be necessary to make this @context required.
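The effect of a local @base can be illustrated with a simplified Python sketch. It is not a JSON-LD processor: it only resolves @id and properties typed as "@id" against the effective base IRI using standard RFC 3986 resolution (urllib.parse.urljoin). The base URLs and the property name person-ref (standing in for the *-refs properties) are hypothetical.

```python
from urllib.parse import urljoin

GENERAL_BASE = "https://example.org/ipif/v0.1/"  # hypothetical @base from the general context file

factoid = {
    "@context": {
        # Local context overriding the general @base for this node.
        "@base": "https://example.org/ipif/v0.1/persons/",
        "person-ref": {"@type": "@id"},  # hypothetical *-refs property, typed as IRI
    },
    "@id": "F1",
    "person-ref": "P1",
}


def resolve_refs(node: dict, inherited_base: str) -> dict:
    """Resolve relative IRIs against the effective @base (illustration only, not JSON-LD expansion)."""
    ctx = node.get("@context", {})
    base = ctx.get("@base", inherited_base)
    resolved = {}
    for key, value in node.items():
        if key == "@context":
            continue
        term = ctx.get(key)
        if key == "@id" or (isinstance(term, dict) and term.get("@type") == "@id"):
            resolved[key] = urljoin(base, value)
        else:
            resolved[key] = value
    return resolved


print(resolve_refs(factoid, GENERAL_BASE))
```

The person reference now resolves to the persons endpoint, but so does the factoid's own @id, which illustrates point 1 above: the overriding @base affects every relative IRI in its scope.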

JSON-HAL HATEOAS compliant refs

My question is: wouldn't it be better for API consumers to have HAL-conformant JSON responses available?
Otherwise a consumer would have to implement API logic in the client code and could not rely on generic REST traversal features provided by HATEOAS, e.g. via the Traverson library in Spring Boot or in JavaScript.

So instead of relying on implicit knowledge on the client side, each REST response (and thus the server side) would include the knowledge of how to traverse and use the API to obtain further information, following the factoid model. E.g. instead of providing ids in the response, we could use full URLs in a HAL-conformant way to tell a consumer which further REST parameters and paths are available.

HAL Spec:
http://stateless.co/hal_specification.html

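As an illustration of the HAL idea, here is a minimal Python sketch that wraps a person in a HAL document (served as application/hal+json) and a client helper that follows links by relation name instead of building URLs itself. The API root, the personId filter parameter and the relation names are hypothetical; only the _links, href and templated structure comes from the HAL specification.

```python
API_ROOT = "https://example.org/ipif/v0.1"  # hypothetical deployment URL


def person_as_hal(person_id: str, label: str) -> dict:
    """Wrap a person in a HAL document whose links tell the client where to go next."""
    return {
        "@id": person_id,
        "label": label,
        "_links": {
            "self": {"href": f"{API_ROOT}/persons/{person_id}"},
            # Hypothetical filter parameter; the actual IPIF query parameters may differ.
            "factoids": {"href": f"{API_ROOT}/factoids/?personId={person_id}"},
            # Templated link (RFC 6570 URI template) advertising further search parameters.
            "search": {"href": f"{API_ROOT}/persons/{{?name,place}}", "templated": True},
        },
    }


def follow(resource: dict, rel: str) -> str:
    """Client side: look up a link by relation name rather than constructing the URL."""
    link = resource["_links"][rel]
    if link.get("templated"):
        raise ValueError(f"link {rel!r} is templated and must be expanded first")
    return link["href"]


doc = person_as_hal("P1", "Georg Vogeler")
print(follow(doc, "factoids"))
```

The client only needs to know the relation names; if the server changes its URL layout, clients that follow links keep working.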

Statement params wildcard

Given the wide range of vocabularies in use, it would be useful to select statements based on a non-empty value in a specific field, e.g. using the presence of any value for the ?name parameter to select "naming" statements.

The in-development Python client (https://gitlab.com/acdh-oeaw/ipif-client-python) works around this by fetching each statement with a separate request, but this is clearly inefficient.

I propose using a 'wildcard' value to match any non-empty value, e.g. /statements/?name=*. (An alternative would be a mustNotBeEmpty parameter that takes a list of required parameters, though this is clearly more complicated.)

If the intention for full-text matching (not yet fully described in the IPIF spec) is to adhere to Lucene syntax (https://lucene.apache.org/core/2_9_4/queryparsersyntax.html), an asterisk is not the best character, because a Lucene query cannot start with an asterisk.

The canonical way to do a non-empty search in Lucene is fieldName:[* TO *], but [* TO *] seems arcane as a URI parameter. Are there any single, suitable characters that will not break a URL or Lucene?

(One possibility is to 'repurpose' the single asterisk, which is not legitimate Lucene syntax anyway, and have each endpoint translate it into whatever query is required to match non-empty values.)
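A minimal sketch of the 'repurposed asterisk' option, assuming statements are plain dicts and query parameters are single-valued. An in-memory filter reads "*" as "present and non-empty", and a translator turns the same parameters into a Lucene query string using the canonical field:[* TO *] range for the wildcard. Function names are hypothetical, and phrase quoting is simplified (no escaping).

```python
from typing import Dict
from urllib.parse import parse_qs

WILDCARD = "*"


def params_from_query(query: str) -> Dict[str, str]:
    """Read single-valued parameters from a raw query string such as 'name=*'."""
    return {key: values[0] for key, values in parse_qs(query).items()}


def matches(statement: Dict[str, object], params: Dict[str, str]) -> bool:
    """True if a statement satisfies all parameters; '*' means 'field present and non-empty'."""
    for field, wanted in params.items():
        value = statement.get(field)
        if wanted == WILDCARD:
            if value in (None, "", [], {}):
                return False
        elif value != wanted:
            return False
    return True


def to_lucene(params: Dict[str, str]) -> str:
    """Translate parameters into a Lucene query; the wildcard becomes field:[* TO *]."""
    clauses = []
    for field, wanted in params.items():
        if wanted == WILDCARD:
            clauses.append(f"{field}:[* TO *]")
        else:
            clauses.append(f'{field}:"{wanted}"')
    return " AND ".join(clauses)


print(to_lucene(params_from_query("name=*&place=Graz")))
```

The URL keeps the short name=* form, while each endpoint decides internally how to express "non-empty" for its own search backend.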

Use case "Autocomplete": human readable identification for a person

A typical use case is that users want to look up which persons match a search constraint, e.g. by typing into an autocomplete input box. The response to such a request should be human-readable, to help users judge whether a result is the one they are looking for. Typically an identifier would be returned, but it should be human-readable.
Conceptually, such a human-readable label for a person would have to be considered a statement about the person, or be constructed from the statements stored in the prosopographical resource, and could thus only be returned via the factoid and statement endpoints. Constructing it on the client side is a) expensive and b) requires knowledge about the details of the implementation.
We (@sennierer, @richardhadden and I, @GVogeler) suggest introducing a label property in the response to GET requests on the person resource, with the following features:

  • no expectation of consistency over time
  • not considered to be explicitly stored (i.e. it might be created algorithmically on the fly from the current state of the stored data)
  • not mandatory (we suggest using the id of the person as the default value)
  • not part of the content of POST and PUT requests, although it can be part of the responses.
    In practice this label would typically be constructed from statements on names, basic biographical dates (birth, death) and perhaps a claim to fame or occupation, but how to construct it would be entirely the responsibility of the service provider.
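One possible server-side construction of such a label, as a minimal Python sketch: it assumes hypothetical flat statement fields name, birth, death and occupation, takes the first non-empty value for each, and falls back to the person id when no name is known. Any real provider would choose its own rules.

```python
from typing import Dict, List, Optional


def person_label(person_id: str, statements: List[Dict[str, str]]) -> str:
    """Build a human-readable label from a person's statements (hypothetical field names)."""

    def first(field: str) -> Optional[str]:
        # First non-empty value of a field across all statements, if any.
        for st in statements:
            if st.get(field):
                return st[field]
        return None

    name = first("name")
    if name is None:
        return person_id  # default: the id of the person
    birth, death, occupation = first("birth"), first("death"), first("occupation")
    label = name
    if birth or death:
        label += f" ({birth or '?'}-{death or '?'})"
    if occupation:
        label += f", {occupation}"
    return label


print(person_label("P1", [{"name": "Georg"}, {"birth": "1450"}, {"occupation": "abbot"}]))
```

Since the label is computed on the fly, it may change whenever statements change, which matches the "no expectation of consistency over time" feature.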

add param `independentStatement` to make the combination of parameters explicit

As described in Vogeler, Hadden, Schlögl & Vasold, "Prosopographische Interoperabilität (IPIF) - Stand der Entwicklungen" (DHd 2022, Potsdam, https://doi.org/10.5281/zenodo.6328211) and Schlögl, Vogeler, Vasold & Hadden, "IPIF - pragmatic modelling decisions" (Data for History Conference, Berlin, 9.6.2021, https://d4h2020.sciencesconf.org/data/pages/Schlo_gl_Vogeler_Vasold_IPIF_2.pdf), the combination of parameters in https://example.org/ipif/v0.1/person/?name=Georg&place=Graz&from=2010&to=2015 has two possible interpretations: searching for a single statement that matches all parameters, or applying each filter parameter to a possibly different statement. Both interpretations are sensible. We suggest using the first as the default and making the second accessible via an optional filter parameter independentStatement=true.
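The two interpretations differ only in the order of "any" and "all", as this minimal Python sketch shows. It assumes statements are dicts with hypothetical name, place and date fields (date as a year integer), and treats from/to together as one date-range filter.

```python
from typing import Callable, Dict, List, Optional

Statement = Dict[str, object]
Filter = Callable[[Statement], bool]


def build_filters(name: Optional[str] = None, place: Optional[str] = None,
                  date_from: Optional[int] = None, date_to: Optional[int] = None) -> List[Filter]:
    """Turn query parameters into predicates; from/to form a single date-range predicate."""
    filters: List[Filter] = []
    if name is not None:
        filters.append(lambda st: st.get("name") == name)
    if place is not None:
        filters.append(lambda st: st.get("place") == place)
    if date_from is not None or date_to is not None:
        lo = date_from if date_from is not None else float("-inf")
        hi = date_to if date_to is not None else float("inf")
        filters.append(lambda st: st.get("date") is not None and lo <= st["date"] <= hi)
    return filters


def person_matches(statements: List[Statement], filters: List[Filter],
                   independent_statement: bool = False) -> bool:
    if independent_statement:
        # independentStatement=true: each filter may be satisfied by a different statement.
        return all(any(f(st) for st in statements) for f in filters)
    # Default: one single statement must satisfy all filters.
    return any(all(f(st) for f in filters) for st in statements)
```

For ?name=Georg&place=Graz&from=2010&to=2015, a person with the statements "Georg in Wien, 2012" and "Gregor in Graz, 2013" is not returned by default, but is returned with independentStatement=true.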

full text search syntax

The API could define the full-text search syntax for endpoints such as ?s=John%20Smith; in particular, it could introduce quotation marks to mark a phrase search.
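A minimal sketch of such a syntax, assuming a simple convention in which quoted runs are phrases and all other whitespace-separated runs are single terms. This is not Lucene's query parser; the helper names are hypothetical, and unbalanced quotes are not handled specially.

```python
import re
from typing import List, Tuple
from urllib.parse import parse_qs, urlsplit

# A quoted run is a phrase; any other run of non-space characters is a single term.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def parse_fulltext(query: str) -> Tuple[List[str], List[str]]:
    """Split a full-text query into (terms, phrases). A lone '"' simply ends up inside a term."""
    terms: List[str] = []
    phrases: List[str] = []
    for phrase, term in TOKEN.findall(query):
        if term:
            terms.append(term)
        elif phrase:
            phrases.append(phrase)
    return terms, phrases


def fulltext_from_url(url: str) -> Tuple[List[str], List[str]]:
    """Read the s parameter from a request URL (percent-decoding included) and parse it."""
    values = parse_qs(urlsplit(url).query).get("s", [""])
    return parse_fulltext(values[0])


print(fulltext_from_url("https://example.org/ipif/v0.1/persons/?s=%22John%20Smith%22%20Graz"))
```

Under this convention ?s=John%20Smith searches for the two terms John and Smith, while ?s=%22John%20Smith%22 searches for the phrase "John Smith".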

consider adding a JSON Schema reference to the output?

JSON Schema would allow validation and documentation. Documentation is already provided by this API definition. Validation would make it easier to check the output, but on the other hand API conformity should in theory be sufficient.
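One way to add a schema reference without changing the response body is the "describedby" link relation, which the JSON Schema specification recommends for associating an instance with its schema. A minimal Python sketch, assuming a hypothetical schema URL and a simple (headers, body) response shape:

```python
import json
from typing import Dict, Tuple

SCHEMA_URL = "https://example.org/ipif/v0.1/schema/person.json"  # hypothetical location


def person_response(person: Dict[str, object]) -> Tuple[Dict[str, str], str]:
    """Return (headers, body) for a person resource, pointing to its JSON Schema.

    The body stays unchanged; the schema reference travels in a Link header
    with rel="describedby", so clients that ignore it are unaffected.
    """
    headers = {
        "Content-Type": "application/json",
        "Link": f'<{SCHEMA_URL}>; rel="describedby"',
    }
    return headers, json.dumps(person)
```

Keeping the reference in a header leaves the output itself conformant to the API definition, while clients that want validation can fetch the schema.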

implement EDTF for `sortdate`?

EDTF covers many typical cases of historical dates and has been integrated into ISO 8601-2:2019. Implementation in databases is still rather fragmentary. Would it already make sense to define sortdate as EDTF?
