Datamodel Ontology

Overview

An ontological description of a simple data model aimed at making application-specific data semantically interoperable.

The basic idea is that the data model should stay very close to the way the application represents its data. At the same time, the data model allows easy mapping of the elements in the data model (entities, dimensions and properties) to a globally shared ontology, thereby enabling semantic interoperability.

This ontology only describes the data model. A set of accompanying tools is needed in order to achieve actual interoperability in real applications.

Short description of the data model ontology

The root concept in the taxonomy of this ontology is called DataModel.

Entity

The Entity is the most central concept in this ontology. It is the class of individuals that represent any self-contained piece of information. It is uniquely defined by its IRI. In addition, it has the following parts (composition):

  • description: a human description of the entity.
  • dimensions: zero or more named dimensions, which are referred to by the property shapes (see below). A dimension has two parts:
    • label: a label identifying the dimension within the entity.
    • description: a human description of this dimension.
  • properties: a set of properties describing the underlying data. A property has the following parts:
    • label: a label identifying the property within the entity.
    • type: the data type of the property. More specific types, like integer, float and string, are subclasses of Type. Type may also be a reference to another entity. The actual type values are implementation-specific and are not included in this ontology. In future releases, type may be changed to a class.
    • relation: if type refers to another entity, relation specifies how this entity and the other entity are related.
    • shape: the shape of the property. It is an ordered list of DimensionExpressions, for example ["N", "N+1"], where "N" is a dimension label. Actual implementations may leave the shape optional.
    • unit: the unit of the property. Would typically refer to other ontologies, like EMMO, QUDT or OM, or simply be a conventional symbol for the unit (e.g. "km/h"). In future releases unit may be changed to a class.
    • description: a human description of the property.
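
The entity structure described above can be sketched with Python dataclasses. This is a minimal, hypothetical illustration and not the API of DLite or any other tool: the class names, the resolve_shape helper and the example IRI are made up, and it is assumed that dimension expressions are simple integer arithmetic (+, -, *) over dimension labels.

```python
import ast
import operator
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Arithmetic operators allowed in dimension expressions (an assumption).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def eval_dimension_expression(expr: str, dims: Dict[str, int]) -> int:
    """Evaluate an expression like "N+1" given values for the dimension labels."""
    def ev(node: ast.AST) -> int:
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name):
            return dims[node.id]  # KeyError if the label is unknown
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported dimension expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))


@dataclass
class Dimension:
    label: str             # identifies the dimension within the entity
    description: str = ""  # human description


@dataclass
class Property:
    label: str                                      # identifies the property within the entity
    type: str                                       # implementation-specific, e.g. "float64"
    shape: List[str] = field(default_factory=list)  # ordered dimension expressions
    unit: Optional[str] = None                      # e.g. a conventional symbol like "km/h"
    relation: Optional[str] = None                  # only used if type refers to another entity
    description: str = ""


@dataclass
class Entity:
    iri: str  # the entity is uniquely defined by its IRI
    description: str = ""
    dimensions: List[Dimension] = field(default_factory=list)
    properties: List[Property] = field(default_factory=list)

    def resolve_shape(self, prop_label: str, dims: Dict[str, int]) -> List[int]:
        """Return the concrete shape of a property for given dimension values."""
        prop = next(p for p in self.properties if p.label == prop_label)
        values = {d.label: dims[d.label] for d in self.dimensions}
        return [eval_dimension_expression(e, values) for e in prop.shape]


mesh = Entity(
    iri="http://example.org/0.1/Mesh",  # hypothetical IRI
    dimensions=[Dimension("N", "number of cells")],
    properties=[Property("edges", "float64", ["N", "N+1"], unit="m")],
)
print(mesh.resolve_shape("edges", {"N": 4}))  # [4, 5]
```

Keeping the shape as unevaluated expressions means the entity stays independent of any particular data instance; concrete sizes only appear once dimension values are supplied.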

Figure 1 shows the relations between the entity and its parts.

Figure 1: The relations between the Entity parts. The taxonomy is not shown for clarity.

Relations

The datamodel ontology categorises its relations in terms of:

  • composition, which describes parthood relations. It corresponds to mereology in EMMO and composition in UML.
  • connection, which describes connections between two concepts. It corresponds to topology in EMMO and to the subproperty of UML aggregation that is not a composition.
  • relation, which describes generic relations between concepts that are not connections or compositions. It corresponds to semiotical in EMMO. In UML it is the subproperty of UML association that is not an aggregation.
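
The correspondence with UML in the list above can be expressed as a small lookup. The string labels for UML relationship kinds and the category_for_uml function are hypothetical and chosen only for this sketch; "aggregation" means an aggregation that is not a composition, and "association" an association that is not an aggregation.

```python
from enum import Enum


class RelationCategory(Enum):
    COMPOSITION = "composition"  # parthood (EMMO mereology)
    CONNECTION = "connection"    # connections between concepts (EMMO topology)
    RELATION = "relation"        # other generic relations (EMMO semiotical)


# Hypothetical labels for UML relationship kinds.
_UML_TO_CATEGORY = {
    "composition": RelationCategory.COMPOSITION,  # composite aggregation
    "aggregation": RelationCategory.CONNECTION,   # aggregation that is not a composition
    "association": RelationCategory.RELATION,     # association that is not an aggregation
}


def category_for_uml(kind: str) -> RelationCategory:
    """Return the datamodel relation category matching a UML relationship kind."""
    try:
        return _UML_TO_CATEGORY[kind.lower()]
    except KeyError:
        raise ValueError(f"unknown UML relationship kind: {kind!r}") from None


print(category_for_uml("Aggregation"))  # RelationCategory.CONNECTION
```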

As shown in Figure 2, the same categorisation is used for both object and data properties.

Figure 2: Taxonomy of object properties and data properties.

Metamodel

A metamodel for the metadata hierarchy, which is not part of the basic entity ontology, was suggested in SOFT and implemented in DLite. It extends the entity ontology with the following concepts:

  • Metadata, which is a generalisation of Entity that is able to describe not only data objects, but also entities and other metadata.
  • DataInstance, which is an instance of an entity representing actual data.
  • Instance, which is the class of all metadata instances, i.e. what can be described by a Metadata. It is the disjoint union of Metadata and DataInstance.
  • EntitySchema is a metadata that can describe an entity (i.e. a meta-metadata).
  • BasicMetadataSchema is a metadata that can describe an entity schema. Furthermore it has the ability to describe itself, terminating the metadata hierarchy.
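
A minimal sketch of the metadata hierarchy, assuming each instance simply stores a reference to the metadata that describes it. The class names mirror the concepts above, but the meta attribute and the hierarchy function are hypothetical. The self-reference on BasicMetadataSchema is what terminates the walk up the hierarchy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Instance:
    """Anything that can be described by a Metadata."""
    name: str
    meta: Optional["Metadata"] = None  # the metadata this is an instance of


@dataclass(eq=False)
class Metadata(Instance):
    """An instance that can describe other instances."""


@dataclass(eq=False)
class DataInstance(Instance):
    """An instance of an entity, holding actual data."""


def hierarchy(inst: Instance) -> List[str]:
    """Follow instance-of links upwards until a self-describing metadata is reached."""
    chain = [inst.name]
    while inst.meta is not None and inst.meta is not inst:
        inst = inst.meta
        chain.append(inst.name)
    return chain


basic = Metadata("BasicMetadataSchema")
basic.meta = basic                                    # describes itself
entity_schema = Metadata("EntitySchema", meta=basic)  # meta-metadata
person = Metadata("Person", meta=entity_schema)       # an entity
alice = DataInstance("alice", meta=person)            # actual data

print(hierarchy(alice))
# ['alice', 'Person', 'EntitySchema', 'BasicMetadataSchema']
print(isinstance(person, Instance))  # True: metadata are also instances
```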

Since metadata are instances of the meta-metadata that describes them, all metadata are also instances. This is similar to Python, where classes (Metadata) are a special kind (subclasses) of Python objects (Instance).
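
The Python analogy can be checked directly with built-ins. The correspondences in the comments are an interpretation for illustration only: in Python both the EntitySchema role and the self-describing BasicMetadataSchema role are played by the single metaclass type.

```python
class Person:     # a class plays the role of an entity (Metadata)
    pass


alice = Person()  # an object plays the role of a DataInstance

# Classes are objects too, just as all metadata are also instances.
assert isinstance(alice, object)
assert isinstance(Person, object)

# Person is described by the metaclass `type` ...
assert type(Person) is type
# ... and `type` describes itself, which ends the hierarchy,
# much like BasicMetadataSchema.
assert type(type) is type
print("all checks passed")
```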

The metadata model is shown in Figure 3. Note that these multiple levels of abstraction require second-order logic to describe. They can therefore not be described formally in OWL description logic, which is based on first-order logic. Instead, we introduce the instanceOf relation as an annotation property. It is ignored by the reasoner, but should be thought of as having the same semantic meaning as rdf:type without the constraint that the domain must be an individual.

OWL2 punning, i.e. using the same IRI for both a class and an individual, could have been another way to formalise the metadata hierarchy. However, we would like to avoid that, since punning is not anchored in first-order logic.

Figure 3: The extended metadata hierarchy.

Connection to EMMO

When connecting to EMMO, the datamodel ontology is described as a formal language. That entities are self-contained is reflected by making them subclasses of spatially fundamental wholes. The entity dimension, property, shape and dimension expression parts are therefore constituents. Since shape has a finite, ordered set of dimension expressions as direct parts, it is a state. This is shown in Figure 4.

The relations are not shown in Figure 4, but they fit very well with EMMO:

  • composition -> emmo:hasOverlap (mereological)
  • connection -> emmo:isCausallyConnectedWith (topological)
  • relation -> emmo:semiotical
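
One way to write the alignment above in Turtle is to declare each relation category as a subproperty of its EMMO counterpart. This is a hedged sketch: whether datamodel.ttl actually uses rdfs:subPropertyOf for the alignment is an assumption, and the example.org namespace IRIs are placeholders, not the real ontology IRIs. The snippet only builds the Turtle text with the standard library.

```python
# Placeholder namespaces (assumptions); substitute the IRIs used in the turtle files.
DM = "http://example.org/datamodel#"
EMMO = "http://example.org/emmo#"

# Relation category -> EMMO relation it is aligned with, as listed above.
EMMO_ALIGNMENT = {
    "composition": "hasOverlap",              # mereological
    "connection": "isCausallyConnectedWith",  # topological
    "relation": "semiotical",
}


def emmo_alignment_turtle() -> str:
    """Build Turtle declaring each category as a subproperty of its EMMO counterpart."""
    lines = [
        f"@prefix dm: <{DM}> .",
        f"@prefix emmo: <{EMMO}> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for category, emmo_relation in EMMO_ALIGNMENT.items():
        lines.append(f"dm:{category} rdfs:subPropertyOf emmo:{emmo_relation} .")
    return "\n".join(lines)


print(emmo_alignment_turtle())
```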

Figure 4: Connection to EMMO.

The provided turtle files

  • entity.ttl defines the basic datamodel as a standalone turtle file.
  • metamodel.ttl imports entity.ttl and extends it with the metadata model.
  • transaction.ttl imports metamodel.ttl and extends it with the concept of transactions. A transaction is an instance that has an immutable parent instance. This allows for time series and strict provenance.
  • collection.ttl imports transaction.ttl and extends it with the concept of collections. A collection is a special type of instance that contains references to a set of other instances and relations between them. Collections make it possible to describe complex data structures and can be seen as a local knowledge base.
  • datamodel.ttl imports collection.ttl and links it to EMMO.
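
The transaction and collection concepts can be illustrated with a small in-memory model. This is a hypothetical sketch, not the behaviour of any particular tool: a frozen dataclass stands in for the immutable parent link, and a collection is modelled as a dictionary of instances plus a set of (subject, predicate, object) triples. The labels and the derivedFrom predicate are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Transaction:
    """An instance whose parent link cannot be changed once created."""
    uuid: str
    data: Tuple[Tuple[str, float], ...]     # (property label, value) pairs
    parent: Optional["Transaction"] = None  # immutable reference to the previous state

    def history(self) -> List[str]:
        """Return uuids from this transaction back to the first one (provenance)."""
        chain: List[str] = []
        node: Optional[Transaction] = self
        while node is not None:
            chain.append(node.uuid)
            node = node.parent
        return chain


@dataclass
class Collection:
    """References to a set of instances plus relations between them."""
    instances: Dict[str, object] = field(default_factory=dict)
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add(self, label: str, inst: object) -> None:
        self.instances[label] = inst

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        # Both ends of a relation must be instances in the collection.
        if subject not in self.instances or obj not in self.instances:
            raise KeyError("both ends must be instances in the collection")
        self.relations.add((subject, predicate, obj))

    def find(self, predicate: str) -> List[Tuple[str, str]]:
        """Query the collection like a small local knowledge base."""
        return sorted((s, o) for s, p, o in self.relations if p == predicate)


t0 = Transaction("t0", (("temperature", 293.0),))
t1 = Transaction("t1", (("temperature", 295.5),), parent=t0)
print(t1.history())  # ['t1', 't0']

coll = Collection()
coll.add("first", t0)
coll.add("second", t1)
coll.relate("second", "derivedFrom", "first")
print(coll.find("derivedFrom"))  # [('second', 'first')]
```

Because the parent cannot be reassigned (doing so raises an AttributeError subclass), the chain returned by history() is a stable record of how the data evolved.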

Usage example

See docs/serialisation.md for an example of how entities and instances can be serialised with the Datamodel Ontology.

Attributions and credits

Contributors and contacts

  • Jesper Friis, SINTEF
  • Francesca Lønstad Bleken, SINTEF
  • Thomas Hagelien, SINTEF

License

The datamodel ontology is released under the Creative Commons Attribution 4.0 International license (CC BY 4.0).

datamodel's Issues

Add inferred ontology

Hi @jesper-friis ,

I am currently including a specification for the installation of this ontology within osp-core. The result of my implementation will make it pico-installable via the command line (e.g. pico install datamodel or pico install datamodel.v0xyz).

Would it be possible to add an inferred serialization of the datamodel ontology, so that I can add it as a raw GitHub URL to the yml specification?

Best wishes
Matthias

Document how entities and instances are serialised

Even though the ontology is fixed, there are still some choices for how entities and instances can be serialised in RDF, namely

  • whether to serialise dimensions and properties as classes or individuals
  • whether the shape is explicitly serialised for instances

These alternatives are illustrated in this figure:
serialisation.svg

It is important that we choose an alternative. The choice should be documented. The above figure could be used for that.

Any opinions?

I think I like alternative 2 best (serialisation as individuals and without explicit serialisation of the shape for individuals) because of its simplicity.

Difference between hasUnit and hasUnitSymbol

Both relations have xsd:string as range and the same example (m/s); which should be used when?
Also, in EMMO units are classes, not literals, and EMMO also includes the hasReferenceUnit object property. Am I correct in assuming that hasUnit and hasUnitSymbol are used here as a simplification, because it is difficult to map to the correct unit class?
