Comments (25)

DylanVanAssche avatar DylanVanAssche commented on June 26, 2024 1

Has this been solved in the I/O spec? Shall we move this to that repo or close it?

This is a Core issue: the cardinality of rml:logicalSource property inside a TriplesMap.

from rml-core.

DylanVanAssche avatar DylanVanAssche commented on June 26, 2024 1

Needs updating in the spec. I'm all for the proposal:

A Triples Map SHOULD have 1 Logical Source.
A triples map MAY not have a Logical Source if all its Term Maps are

  • constant-valued expression maps
  • have a blank node as term type and no expression map

from rml-core.

bjdmeest avatar bjdmeest commented on June 26, 2024

👍 to change to 'at least 1'

from rml-core.

pmaria avatar pmaria commented on June 26, 2024

I would say at most 1, keeping the option to have triples maps without direct logicalSource relationship (think old (and maybe current??) function use case).

Say we were to allow more than 1: what is the expected behavior when a TriplesMap has more than 1 LogicalSource?

from rml-core.

andimou avatar andimou commented on June 26, 2024

Interesting 🤔
I see your point about why one would want more than 1. The case I can think of is having logical sources that come from different data sources but still have the same structure and format. But then again, you need multiple data sources, not multiple logical sources.
@bjdmeest what do you have in mind, and do you agree on having multiple data sources?

@pmaria I am not sure how safe it is to have a Triples map without a logical source. Then how do we know where the references refer to?

from rml-core.

DylanVanAssche avatar DylanVanAssche commented on June 26, 2024

I would even prefer exactly one, because anything else complicates the idea of a Triples Map with rml:reference, rr:column, rr:template, etc. too much.

at least one

  • Multiple sources can be defined then, causing issues with references unless you require RML Fields, which I am really against as it removes the ability to only implement a few specifications instead of all. Multiple sources without Fields will make it impossible to reconcile: JSONPath vs XPath vs a tabular column vs $WHATEVER. Also joining will be an issue.
  • Avoids the problem where no Logical Source is defined and it becomes hard to know where the references are pointing to, which is a good thing.
  • This can easily be achieved by making 2 Triples Maps.

at most one

  • Avoids the multiple sources possibility.
  • Lack of a Logical Source makes the references from rml:reference, rr:column, rr:template undefined as you don't know where they point to, unless the processor has it as a config, but then we are not declarative anymore.
  • This cannot be achieved currently without violating any shape.
  • Functions are currently not in a Triples Map, but are linked with FNML as a TermMap.

exactly one

  • Avoids the multiple sources possibility.
  • Avoids the lack of a Logical Source, thus ambiguous references
  • Multiple sources can be handled with multiple Triples Maps.
  • Functions are TermMaps so they are unaffected.
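
The three options above differ only in how many rml:logicalSource values a Triples Map may carry. A minimal Python sketch of that comparison follows; the policy table and the function name `cardinality_ok` are hypothetical and only illustrate the discussion, they are not part of any RML tooling.

```python
# Hypothetical policy table: (minimum, maximum) number of Logical Sources
# per Triples Map; None means "no upper bound".
POLICIES = {
    "at least one": (1, None),
    "at most one": (0, 1),
    "exactly one": (1, 1),
}


def cardinality_ok(policy: str, logical_source_count: int) -> bool:
    """Return True if the count satisfies the named policy."""
    minimum, maximum = POLICIES[policy]
    if logical_source_count < minimum:
        return False
    return maximum is None or logical_source_count <= maximum


if __name__ == "__main__":
    # Compare how each policy treats 0, 1 and 2 Logical Sources.
    for policy in POLICIES:
        print(policy, [cardinality_ok(policy, n) for n in (0, 1, 2)])
```

Of the three, only "exactly one" rejects both a missing Logical Source and multiple Logical Sources.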

from rml-core.

andimou avatar andimou commented on June 26, 2024

Just for the record, R2RML does say exactly 1.

from rml-core.

pmaria avatar pmaria commented on June 26, 2024

@pmaria I am not sure how safe it is to have a Triples map without a logical source. Then how do we know where the references refer to?

at most one

  • Avoids the multiple sources possibility.
  • Lack of a Logical Source makes the references from rml:reference, rr:column, rr:template undefined as you don't know where they point to, unless the processor has it as a config, but then we are not declarative anymore.
  • This cannot be achieved currently without violating any shape.
  • Functions are currently not in a Triples Map, but are linked with FNML as a TermMap.

Well the point is more that the LogicalSource could be implied in the mapping. During evaluation of the mapping the single correct logical source would then still be bound.
For example, one could imagine a use case for a TriplesMap somehow nested in another TriplesMap. Where the LogicalSource of the nested TriplesMap is implied to be the one of the outer TriplesMap.
In fact this is what was discussed for the old FunctionTriplesMap, which is now Execution in the new fnml spec.
Conceptually, an Execution still behaves much like a TriplesMap does, and could be seen as a subClass thereof (however currently not explicitly stated I believe).

So, I think we should be careful with requiring a rml:logicalSource property to be specified for each TriplesMap.

from rml-core.

bjdmeest avatar bjdmeest commented on June 26, 2024

Hah, I was maybe a bit quick :). I would, in the ontology, not make any cardinality restrictions, but in the application profile of RML-Core, maybe having exactly one LogicalSource makes things the most clear-cut. The Function case was an extension, and we can specify that the application profile of a 'Nested Triples Map' may have 0..1 LogicalSource, bc that's an extension without violating the ontology. Making the standard stuff a bit more focused will ease development I assume.

If we would introduce the Nested Triples Map or smth similar in RML core, we need to revisit this issue I assume.

My use case was a folder of CSV files that was chunked in, eg 10k lines, and having multiple logicalsources could allow having a single line in RML to state 'take all chunk-*.csv files'. But revisiting that, I agree it makes more sense to have a more specialized CSV DataSource that supports globs or smth

from rml-core.

DylanVanAssche avatar DylanVanAssche commented on June 26, 2024

The Function case was an extension, and we can specify that the application profile of a 'Nested Triples Map' may have 0..1 LogicalSource, bc that's an extension without violating the ontology.

I would like this semantic difference, this makes things more clear cut.

If we would introduce the Nested Triples Map or smth similar in RML core, we need to revisit this issue I assume.

Not entirely, if it differs semantically, it can co-exist.
Depends also a bit if the implementation MUST | SHOULD | MAY support it besides a regular Triples Map if we ever come to it.

My use case was a folder of CSV files that was chunked in, eg 10k lines, and having multiple logicalsources could allow having a single line in RML to state 'take all chunk-*.csv files'. But revisiting that, I agree it makes more sense to have a more specialized CSV DataSource that supports globs or smth

That's mainly an access description IMO for a Logical Source, there are 2 possible 'use cases' here:

  1. Archive with multiple files which are not sharing the same structure, for example: GTFS dump has different CSVs with different headers. This would be multiple separate Logical Sources, each with their own Triples Map.
  2. Archive with multiple files, sharing the same structure, for example: @bjdmeest use case. This would be a single Logical Source with its Triples Map. Combining these chunks happens in the access of the Logical Source.
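
Use case 2 can be sketched as follows: the chunks are combined when the source is accessed, so the Triples Map only ever sees one stream of records. This is a rough Python illustration, not how any particular RML engine does it; the function name `iter_chunked_records` and the glob pattern are assumptions, and all chunks are assumed to share the same CSV header.

```python
import csv
import glob
from typing import Dict, Iterator


def iter_chunked_records(pattern: str) -> Iterator[Dict[str, str]]:
    """Yield rows from every CSV file matching `pattern` as one record stream.

    Assumes every chunk has the same header (use case 2 above).
    """
    # Sort the matches so the chunks are read in a stable order.
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)
```

A caller would iterate over `iter_chunked_records("chunk-*.csv")` exactly as it would over the records of a single file, which is why one Logical Source and one Triples Map suffice in this case.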

from rml-core.

pmaria avatar pmaria commented on June 26, 2024

So would you agree with the following?

  • in the ontology we don't limit the cardinality
  • in the specifications we describe what the cardinality is
  • in the shapes we can describe the cardinality matching the specification as long as it does not restrict cardinality generally for all instances of rml:TriplesMap.

from rml-core.

andimou avatar andimou commented on June 26, 2024

@pmaria your summary is correct: so far we have agreed not to put restrictions in the ontology and to include all restrictions in the shapes. However, the question is what this cardinality should be. (My question comes after I was updating the text of the spec.)

Let's try to think of rml:TriplesMap independently of potential Nested Maps, would we still stick to 0 to 1? Or would we go for exactly 1?

from rml-core.

pmaria avatar pmaria commented on June 26, 2024

I guess that depends on how we set up the shapes.

If we want to support extensions creating subclasses of a TriplesMap where the rml:logicalSource is not specified, which seems reasonable to expect, we cannot define a shape like:

<TriplesMapShape>
  sh:targetClass rml:TriplesMap ;
  sh:property [
    sh:path rml:logicalSource ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
.

since this would also hold for all X where X rdfs:subClassOf rml:TriplesMap, thus limiting the alternative usage in extensions.
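
To illustrate why the shape also reaches subclasses: sh:targetClass selects every node whose type is the target class or one of its transitive subclasses, provided the rdfs:subClassOf statements are visible in the graph being validated. A small Python sketch of that selection rule, where `ex:NestedTriplesMap` and the helper names are hypothetical:

```python
from typing import Dict, Set

# Hypothetical class hierarchy: child class -> direct superclasses.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "ex:NestedTriplesMap": {"rml:TriplesMap"},
}


def superclasses(cls: str) -> Set[str]:
    """Return `cls` together with all of its transitive superclasses."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def is_targeted(node_type: str, target_class: str) -> bool:
    """True if a node typed `node_type` is selected by sh:targetClass."""
    return target_class in superclasses(node_type)
```

Under these assumptions, `is_targeted("ex:NestedTriplesMap", "rml:TriplesMap")` is true, so an extension's subclass instances would have to satisfy the 1..1 constraint as well.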

So, if we want the restriction to be strictly defined for core, we'd need to use an alternative approach. Maybe using an sh:or or sh:xone which wraps the property shape, which could be extended in the extensions? I'd have to think a bit more about the best approach. (This is a more general issue: how to provide extension points in the shapes)

from rml-core.

pmaria avatar pmaria commented on June 26, 2024

Discussed and decided in today's meeting: Keep cardinality 1 to 1 in shapes.

from rml-core.

DylanVanAssche avatar DylanVanAssche commented on June 26, 2024

I came across a possibility where a Logical Source should not be required:

Triples Map:

  • SubjectMap: rml:constant
  • PredicateObjectMap:
    • PredicateMap: rml:constant
    • ObjectMap: rml:constant

In this case (where ALL are rml:constant), the Triples Map is completely constant and should be executed once.
This is an edge case, but it can be useful if you need to express some constant properties in a separate Triples Map, e.g. you publish a dataset as DCAT and fill in the information about the publisher/dataset (dcat:Catalog and friends) [1], which is constant. Currently, you need to put this in a file, link it to a source, and perform the mapping that way.

Should we add an exception for the cardinality in such case?
(although would be hard to check this with SHACL shapes and friends)

[1] https://www.w3.org/TR/vocab-dcat-3/#Class:Catalog

from rml-core.

pmaria avatar pmaria commented on June 26, 2024

Interesting.

the Triples Map is completely constant and should be executed once.

How do you define that it should be executed only once?

from rml-core.

DylanVanAssche avatar DylanVanAssche commented on June 26, 2024

How do you define that it should be executed only once?

Well, that it must be executed once is 'trivial' to see, as it would always yield the same RDF triples, no matter which sources are involved in the RML mapping. The question is indeed how to advertise this: an engine would need to read the Triples Map and, when it can be executed without any Logical Source (read: all Term Maps have rr:constant), proceed.

Currently, engines rely on the Logical Source records to trigger Triples Map execution, but in this edge case, there's no source involved as all Term Maps have rr:constant.
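
As a rough sketch of the behaviour described here, an engine could check whether every Term Map is constant and, if so, emit the triple once instead of once per record. The data layout (`term_maps` as a dict of `(kind, value)` pairs) and the function name `generate_triples` are assumptions made for illustration, not an existing engine API.

```python
from typing import Dict, Iterable, List, Optional, Tuple

POSITIONS = ("subject", "predicate", "object")


def generate_triples(
    term_maps: Dict[str, Tuple[str, str]],
    records: Optional[Iterable[Dict[str, str]]] = None,
) -> List[Tuple[str, ...]]:
    """Generate triples for one (simplified) Triples Map.

    Each term map is a (kind, value) pair, where kind is "constant" or
    "reference". Constant-only maps are executed exactly once.
    """

    def resolve(position: str, record: Dict[str, str]) -> str:
        kind, value = term_maps[position]
        return value if kind == "constant" else record[value]

    # Edge case: nothing depends on a record, so no Logical Source is needed.
    if all(kind == "constant" for kind, _ in term_maps.values()):
        return [tuple(resolve(p, {}) for p in POSITIONS)]

    if records is None:
        raise ValueError("references need a Logical Source to resolve against")

    # Regular case: one triple per record of the Logical Source.
    return [tuple(resolve(p, record) for p in POSITIONS) for record in records]
```

With three constant term maps the function returns a single triple even when `records` is omitted; as soon as one term map is a reference, it iterates over the records and fails if none are supplied.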

from rml-core.

DylanVanAssche avatar DylanVanAssche commented on June 26, 2024

  • At most one
  • Enforce this through SHACL: at most one for rr:constant-only, exactly one when rml:reference or rml:template are present for a TriplesMap
  • Document this edge case in the spec.

from rml-core.

dachafra avatar dachafra commented on June 26, 2024

@DylanVanAssche has this been solved in the I/O spec? Shall we move this to that repo or close it?

from rml-core.

andimou avatar andimou commented on June 26, 2024

@DylanVanAssche do I summarize it correctly here?

A Triples Map SHOULD have 1 Logical Source.
A triples map MAY not have a Logical Source if all its Term Maps are

  • constant-valued expression maps
  • have a blank node as term type and no expression map
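
One possible reading of this rule, sketched in Python: a Triples Map may omit its Logical Source only if each of its Term Maps is either constant-valued or a blank node without any expression map. The `TermMap` dataclass, its field names, and the function names below are hypothetical and only model the wording above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TermMap:
    term_type: str                   # e.g. "IRI", "BlankNode", "Literal"
    constant: Optional[str] = None   # constant-valued expression map
    reference: Optional[str] = None  # reference-valued expression map
    template: Optional[str] = None   # template-valued expression map


def needs_no_source(term_map: TermMap) -> bool:
    """True if this Term Map can be evaluated without any source record."""
    uses_record = term_map.reference is not None or term_map.template is not None
    if uses_record:
        return False
    if term_map.constant is not None:
        return True  # constant-valued expression map
    # No expression map at all: only allowed for blank nodes.
    return term_map.term_type == "BlankNode"


def may_omit_logical_source(term_maps: Iterable[TermMap]) -> bool:
    """Apply the rule to all Term Maps of one Triples Map."""
    return all(needs_no_source(tm) for tm in term_maps)
```

A single reference-valued or template-valued Term Map is enough to make the Logical Source required again.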

from rml-core.

tirrolo avatar tirrolo commented on June 26, 2024

Just a question: given two sources, say file1.csv and file2.json, would it be possible to write a mapping specifying the class of "employees occurring both in file1.csv and file2.json"?

This looks like a pretty natural thing to write, in a federated setting, but I do not see how it could be achieved if at most one logical source is allowed (can multiple data sources be "wrapped" within the same logical source? That was not my impression by reading the spec).

from rml-core.

DylanVanAssche avatar DylanVanAssche commented on June 26, 2024

@andimou

do I summarize it correctly here?

Yes

from rml-core.

dachafra avatar dachafra commented on June 26, 2024

@andimou @DylanVanAssche seems there is a final proposal for this, is it already in the spec or shall we make a PR for closing this?

from rml-core.

dachafra avatar dachafra commented on June 26, 2024

Can you then make a PR with the update and we can close this? @DylanVanAssche

from rml-core.

DylanVanAssche avatar DylanVanAssche commented on June 26, 2024

See #84

from rml-core.
