I see that we currently have the logical source cardinality to be equal to 1,

Logical source cardinality about rml-core HOT 25 CLOSED

kg-construct commented on June 26, 2024

Logical source cardinality

from rml-core.

Comments (25)

DylanVanAssche commented on June 26, 2024

Has this been solved in the I/O spec? Shall we move this to that repo or close it?

This is a Core issue: the cardinality of rml:logicalSource property inside a TriplesMap.

DylanVanAssche commented on June 26, 2024

Needs updating in the spec. I'm all for the proposal:

- A Triples Map SHOULD have 1 Logical Source.
- A Triples Map MAY omit the Logical Source if each of its Term Maps either
  - is a constant-valued expression map, or
  - has a blank node as term type and no expression map.

bjdmeest commented on June 26, 2024

👍 to change to 'at least 1'

pmaria commented on June 26, 2024

I would say at most 1, keeping the option to have triples maps without direct logicalSource relationship (think old (and maybe current??) function use case).

Say we were to allow more than 1: what is the expected behavior when a TriplesMap has more than 1 LogicalSource?

andimou commented on June 26, 2024

Interesting 🤔
I see your point regarding why more than 1. The case I can think of is having logical sources that come from different data sources but still have the same structure and format. But then again, you need multiple data sources and not multiple logical sources.
@bjdmeest what do you have in mind, and do you agree on having multiple data sources?

@pmaria I am not sure how safe it is to have a Triples map without a logical source. Then how do we know where the references refer to?

DylanVanAssche commented on June 26, 2024

I would even prefer exactly one, because anything else complicates the idea of a Triples Map with rml:reference, rr:column, rr:template, etc. too much.

at least one

- Multiple sources can then be defined, causing issues with references unless you require RML Fields, which I am really against as it removes the ability to implement only a few specifications instead of all. Multiple sources without Fields make it impossible to reconcile references: JSONPath vs XPath vs a tabular column vs $WHATEVER. Joining will also be an issue.
- Avoids the problem where no Logical Source is defined and it becomes hard to know where the references are pointing to, which is a good thing.
- Multiple sources can easily be achieved by making 2 Triples Maps.

at most one

- Avoids the multiple sources possibility.
- Lack of a Logical Source makes the references from rml:reference, rr:column, rr:template undefined as you don't know where they point to, unless the processor has it as a config, but then we are not declarative anymore.
- This cannot be achieved currently without violating any shape.
- Functions are currently not in a Triples Map, but are linked with FNML as a TermMap.

exactly one

- Avoids the multiple sources possibility.
- Avoids the lack of a Logical Source, and thus ambiguous references.
- Multiple sources are handled with multiple Triples Maps.
- Functions are TermMaps, so they are unaffected.
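
The three options differ only in which counts of rml:logicalSource they accept on a Triples Map. A minimal Python sketch of that difference follows; the names `POLICIES` and `accepts` are made up for illustration and come from neither the spec nor any engine.

```python
# Hypothetical illustration: each policy is a predicate over the number of
# rml:logicalSource values found on one Triples Map.
POLICIES = {
    "at least one": lambda n: n >= 1,
    "at most one": lambda n: n <= 1,
    "exactly one": lambda n: n == 1,
}


def accepts(policy: str, logical_source_count: int) -> bool:
    """Return True if `policy` allows this many Logical Sources."""
    return POLICIES[policy](logical_source_count)


if __name__ == "__main__":
    for count in (0, 1, 2):
        print(count, {name: accepts(name, count) for name in POLICIES})
```

Only "exactly one" rejects both a count of 0 (references with nothing to point to) and a count of 2 or more (references that cannot be reconciled across sources).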

andimou commented on June 26, 2024

Just for the record, R2RML does say exactly 1

pmaria commented on June 26, 2024

@pmaria I am not sure how safe it is to have a Triples map without a logical source. Then how do we know where the references refer to?

at most one

Avoids the multiple sources possibility.

Lack of a Logical Source makes the references from rml:reference, rr:column, rr:template undefined as you don't know where they point to, unless the processor has it as a config, but then we are not declarative anymore.

This cannot be achieved currently without violating any shape.

Functions are currently not in a Triples Map, but are linked with FNML as a TermMap.

Well the point is more that the LogicalSource could be implied in the mapping. During evaluation of the mapping the single correct logical source would then still be bound.
For example, one could imagine a use case for a TriplesMap somehow nested in another TriplesMap. Where the LogicalSource of the nested TriplesMap is implied to be the one of the outer TriplesMap.
In fact this is what was discussed for the old FunctionTriplesMap, which is now Execution in the new fnml spec.
Conceptually, an Execution still behaves much like a TriplesMap does, and could be seen as a subClass thereof (however currently not explicitly stated I believe).

So, I think we should be careful with requiring a rml:logicalSource property to be specified for each TriplesMap.

bjdmeest commented on June 26, 2024

Hah, I was maybe a bit quick :). I would, in the ontology, not make any cardinality restrictions, but in the application profile of RML-Core, maybe having exactly one LogicalSource makes things the most clear-cut. The Function case was an extension, and we can specify that the application profile of a 'Nested Triples Map' may have 0..1 LogicalSource, bc that's an extension without violating the ontology. Making the standard stuff a bit more focused will ease development I assume.

If we would introduce the Nested Triples Map or smth similar in RML core, we need to revisit this issue I assume.

My use case was a folder of CSV files that was chunked in, eg 10k lines, and having multiple logicalsources could allow having a single line in RML to state 'take all chunk-*.csv files'. But revisiting that, I agree it makes more sense to have a more specialized CSV DataSource that supports globs or smth

DylanVanAssche commented on June 26, 2024

The Function case was an extension, and we can specify that the application profile of a 'Nested Triples Map' may have 0..1 LogicalSource, bc that's an extension without violating the ontology.

I would like this semantic difference, this makes things more clear cut.

If we would introduce the Nested Triples Map or smth similar in RML core, we need to revisit this issue I assume.

Not entirely, if it differs semantically, it can co-exist.
Depends also a bit if the implementation MUST | SHOULD | MAY support it besides a regular Triples Map if we ever come to it.

My use case was a folder of CSV files that was chunked in, eg 10k lines, and having multiple logicalsources could allow having a single line in RML to state 'take all chunk-*.csv files'. But revisiting that, I agree it makes more sense to have a more specialized CSV DataSource that supports globs or smth

That's mainly an access description for a Logical Source IMO. There are 2 possible 'use cases' here:

- Archive with multiple files that do not share the same structure, for example: a GTFS dump has different CSVs with different headers. This would be multiple separate Logical Sources, each with its own Triples Map.
- Archive with multiple files that share the same structure, for example: @bjdmeest's use case. This would be a single Logical Source with its Triples Map. Combining these chunks happens in the access of the Logical Source.
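
The second case could look roughly like the following on the access side. This is only a sketch in plain Python, assuming every chunk has the same header; `iter_chunked_records` is a hypothetical helper, not part of any RML engine or of the spec.

```python
import csv
import glob
import os
import tempfile
from typing import Dict, Iterator


def iter_chunked_records(pattern: str) -> Iterator[Dict[str, str]]:
    """Yield the rows of every CSV file matching `pattern` as one record stream.

    Assumes all matching files share the same header (the chunked case).
    """
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)


if __name__ == "__main__":
    # Build two small chunks in a temporary folder and read them back as one source.
    with tempfile.TemporaryDirectory() as folder:
        for number, row in ((1, "1,Ann"), (2, "2,Bob")):
            chunk = os.path.join(folder, f"chunk-{number}.csv")
            with open(chunk, "w", encoding="utf-8") as handle:
                handle.write("id,name\n" + row + "\n")
        for record in iter_chunked_records(os.path.join(folder, "chunk-*.csv")):
            print(record)
```

The Triples Map then sees a single stream of records and never needs to know how many files were behind it.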

pmaria commented on June 26, 2024

So would you agree with the following?

in the ontology we don't limit the cardinality
in the specifications we describe what the cardinality is
in the shapes we can describe the cardinality matching the specification as long as it does not restrict cardinality generally for all instances of rml:TriplesMap.

andimou commented on June 26, 2024

@pmaria your summary is correct: so far we have agreed on not putting restrictions in the ontology and on including all restrictions in the shapes. However, the question is what this cardinality should be. (My question came up while I was updating the text of the spec.)

Let's try to think of rml:TriplesMap independently of potential Nested Maps, would we still stick to 0 to 1? Or would we go for exactly 1?

pmaria commented on June 26, 2024

I guess that depends on how we set up the shapes.

If we want to support extensions creating subclasses of a TriplesMap where the rml:logicalSource is not specified, which seems reasonable to expect, we cannot define a shape like:

<TriplesMapShape>
  sh:targetClass rml:TriplesMap ;
  sh:property [
    sh:path rml:logicalSource ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
.

since this would also hold for all X where X rdfs:subClassOf rml:TriplesMap, thus limiting the alternative usage in extensions.

So, if we want the restriction to be strictly defined for core, we'd need to use an alternative approach. Maybe using an sh:or or sh:xone which wraps the property shape, which could be extended in the extensions? I'd have to think a bit more about the best approach. (This is a more general issue: how to provide extension points in the shapes)
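
To make the subclass problem concrete, here is a small Python sketch that mimics class-based targeting over a toy set of triples. It is a simplification, not a SHACL engine, and `ex:NestedTriplesMap`, `ex:tm1`, `ex:nested1` and `ex:source1` are invented names used only for illustration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: ex:NestedTriplesMap stands in for an extension class.
GRAPH: List[Triple] = [
    ("ex:NestedTriplesMap", "rdfs:subClassOf", "rml:TriplesMap"),
    ("ex:tm1", "rdf:type", "rml:TriplesMap"),
    ("ex:tm1", "rml:logicalSource", "ex:source1"),
    ("ex:nested1", "rdf:type", "ex:NestedTriplesMap"),  # no rml:logicalSource
]


def subclasses_of(graph: List[Triple], cls: str) -> Set[str]:
    """Return `cls` plus every class reachable through rdfs:subClassOf."""
    found = {cls}
    changed = True
    while changed:
        changed = False
        for s, p, o in graph:
            if p == "rdfs:subClassOf" and o in found and s not in found:
                found.add(s)
                changed = True
    return found


def violations(graph: List[Triple], target_class: str, path: str,
               min_count: int, max_count: int) -> List[str]:
    """Mimic a class-targeted property shape with a min and max count."""
    classes = subclasses_of(graph, target_class)
    focus_nodes = sorted({s for s, p, o in graph
                          if p == "rdf:type" and o in classes})
    bad = []
    for node in focus_nodes:
        count = sum(1 for s, p, _ in graph if s == node and p == path)
        if not (min_count <= count <= max_count):
            bad.append(node)
    return bad


if __name__ == "__main__":
    print(violations(GRAPH, "rml:TriplesMap", "rml:logicalSource", 1, 1))
```

With this toy data the instance of the extension class is reported as a violation, even though it was never typed as rml:TriplesMap directly.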

pmaria commented on June 26, 2024

Discussed and decided in today's meeting: Keep cardinality 1 to 1 in shapes.

DylanVanAssche commented on June 26, 2024

I came across a possibility where a Logical Source should not be required:

Triples Map:

SubjectMap: rml:constant
PredicateObjectMap:
- PredicateMap: rml:constant
- ObjectMap: rml:constant

In this case (where ALL are rml:constant), the Triples Map is completely constant and should be executed once.
This is an edge case, but it can be useful if you need to express some constant properties in a separate Triples Map, e.g. you publish a dataset as DCAT and fill in the information about the publisher/dataset (dcat:Catalog and friends) [1], which is constant. Currently, you need to put this in a file, link it to a source and perform the mapping like that.

Should we add an exception for the cardinality in such case?
(although would be hard to check this with SHACL shapes and friends)

[1] https://www.w3.org/TR/vocab-dcat-3/#Class:Catalog

pmaria commented on June 26, 2024

Interesting.

the Triples Map is completely constant and should be executed once.

How do you define that it should be executed only once?

DylanVanAssche commented on June 26, 2024

How do you define that it should be executed only once?

Well, that it must be executed once is 'trivial' to see, as it would always yield the same RDF triples, no matter which sources are involved in the RML mapping. The question is indeed how to advertise this: an engine would need to read the Triples Map and, when it can be executed without any Logical Source (read: all Term Maps have rr:constant), proceed.

Currently, engines rely on the Logical Source records to trigger Triples Map execution, but in this edge case, there's no source involved as all Term Maps have rr:constant.
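
A rough Python sketch of how an engine could branch on this. The term map model (a `(kind, value)` pair) and the functions `is_constant_only`, `resolve` and `execute` are assumptions made for this example; no existing engine is claimed to work this way.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
# Hypothetical model: a term map is ("constant", value) or ("reference", field).
TermMap = Tuple[str, str]


def is_constant_only(term_maps: Iterable[TermMap]) -> bool:
    """True when every term map is a constant, so no record is needed."""
    return all(kind == "constant" for kind, _ in term_maps)


def resolve(term_map: TermMap, record: Dict[str, str]) -> str:
    """Return the constant itself, or look the reference up in the record."""
    kind, value = term_map
    return value if kind == "constant" else record[value]


def execute(subject: TermMap, predicate: TermMap, obj: TermMap,
            records: Optional[List[Dict[str, str]]] = None) -> List[Triple]:
    """Generate triples for one subject/predicate/object combination."""
    term_maps = (subject, predicate, obj)
    if is_constant_only(term_maps):
        # No Logical Source involved: run exactly once.
        return [(subject[1], predicate[1], obj[1])]
    if records is None:
        raise ValueError("references need a Logical Source to resolve against")
    # Regular case: one execution per record of the Logical Source.
    return [(resolve(subject, r), resolve(predicate, r), resolve(obj, r))
            for r in records]


if __name__ == "__main__":
    print(execute(("constant", "ex:catalog"),
                  ("constant", "dct:publisher"),
                  ("constant", "ex:org")))
```

The constant-only branch is the only place where the absence of records is not an error, which is the edge case being discussed.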

DylanVanAssche commented on June 26, 2024

At most one
Enforce this through SHACL: at most one for rr:constant-only, exactly one when rml:reference or rml:template are present for a TriplesMap
Document this edge case in the spec.

dachafra commented on June 26, 2024

@DylanVanAssche has this been solved in the I/O spec? Shall we move this to that repo or close it?

andimou commented on June 26, 2024

@DylanVanAssche do I summarize it correctly here?

- A Triples Map SHOULD have 1 Logical Source.
- A Triples Map MAY omit the Logical Source if each of its Term Maps either
  - is a constant-valued expression map, or
  - has a blank node as term type and no expression map.
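
As a sanity check of how this summary reads, here is a minimal Python sketch of the rule. The flattened `TermMap` and `TriplesMap` classes and `check_logical_source` are hypothetical and only model what the rule needs. Treating a term map that has no expression map and is not a blank node as "needs a source" is an assumption of this sketch, not something decided in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermMap:
    # Hypothetical flattened model: at most one of the three is expected to be set.
    constant: Optional[str] = None
    reference: Optional[str] = None
    template: Optional[str] = None
    term_type: str = "IRI"  # "IRI", "Literal" or "BlankNode"

    def needs_source(self) -> bool:
        """True if this term map can only be evaluated against a record."""
        if self.reference is not None or self.template is not None:
            return True
        if self.constant is not None:
            return False
        # No expression map at all: only a blank node can do without a source
        # (assumption: anything else is treated conservatively).
        return self.term_type != "BlankNode"


@dataclass
class TriplesMap:
    term_maps: List[TermMap]
    logical_sources: List[str] = field(default_factory=list)


def check_logical_source(tm: TriplesMap) -> List[str]:
    """Return a list of problems; an empty list means the rule is satisfied."""
    problems = []
    if len(tm.logical_sources) > 1:
        problems.append("more than one Logical Source")
    if not tm.logical_sources and any(t.needs_source() for t in tm.term_maps):
        problems.append("Logical Source missing but a Term Map needs one")
    return problems


if __name__ == "__main__":
    constant_only = TriplesMap([TermMap(constant="ex:catalog"),
                                TermMap(term_type="BlankNode")])
    print(check_logical_source(constant_only))
```

A Triples Map with a single rml:reference and no Logical Source would be reported, while the constant and blank-node-only map above passes.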

tirrolo commented on June 26, 2024

Just a question: given two sources, say file1.csv and file2.json, would it be possible to write a mapping specifying the class of "employees occurring both in file1.csv and file2.json"?

This looks like a pretty natural thing to write, in a federated setting, but I do not see how it could be achieved if at most one logical source is allowed (can multiple data sources be "wrapped" within the same logical source? That was not my impression by reading the spec).

DylanVanAssche commented on June 26, 2024

@andimou

do I summarize it correctly here?

Yes

dachafra commented on June 26, 2024

@andimou @DylanVanAssche seems there is a final proposal for this, is it already in the spec or shall we make a PR for closing this?

dachafra commented on June 26, 2024

Can you then make a PR with the update and we can close this? @DylanVanAssche

DylanVanAssche commented on June 26, 2024

See #84
