Comments (25)
This is a Core issue: the cardinality of the rml:logicalSource property inside a TriplesMap.
Needs updating in the spec. I'm all for the proposal:
A Triples Map SHOULD have 1 Logical Source.
A Triples Map MAY have no Logical Source if all its Term Maps:
- are constant-valued expression maps, or
- have a blank node as term type and no expression map.
👍 to change to 'at least 1'
I would say at most 1, keeping the option to have Triples Maps without a direct rml:logicalSource relationship (think of the old, and maybe still current, function use case).
Say we were to allow more than 1: what is the expected behavior when a TriplesMap has more than 1 LogicalSource?
Interesting 🤔
I see your point regarding more than 1. The case I can think of is having logical sources that come from different data sources but share the same structure and format. But then again, what you need there is multiple data sources, not multiple logical sources.
@bjdmeest what do you have in mind, and do you agree on having multiple data sources?
@pmaria I am not sure how safe it is to have a Triples map without a logical source. Then how do we know where the references refer to?
I would even prefer exactly one, because anything else complicates the idea of a Triples Map with rml:reference, rr:column, rr:template, etc. too much.
at least one
- Multiple sources can then be defined, causing issues with references unless you require RML Fields, which I am really against, as it removes the ability to implement only a few specifications instead of all of them. Multiple sources without Fields make it impossible to reconcile references: JSONPath vs XPath vs a tabular column vs $WHATEVER. Joining will also be an issue.
- Avoids the problem where no Logical Source is defined and it becomes hard to know what the references point to, which is a good thing.
- Multiple sources can also easily be achieved by making 2 Triples Maps.
at most one
- Avoids the multiple sources possibility.
- Lack of a Logical Source makes the references from rml:reference, rr:column, rr:template undefined as you don't know where they point to, unless the processor has it as a config, but then we are not declarative anymore.
- This cannot be achieved currently without violating any shape.
- Functions are currently not in a Triples Map, but are linked with FNML as a TermMap.
exactly one
- Avoids the multiple sources possibility.
- Avoids a missing Logical Source, and thus ambiguous references.
- Multiple sources: just use multiple Triples Maps.
- Functions are TermMaps so they are unaffected.
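For the 'exactly one' option, the multiple-sources case is covered by one Triples Map per source. A minimal Turtle sketch, assuming two hypothetical Logical Sources ex:EmployeesCsv (over a CSV file) and ex:EmployeesJson (over a JSON file) whose descriptions are omitted; all ex: IRIs are illustrative only:

```turtle
@prefix rml: <http://w3id.org/rml/> .
@prefix ex:  <http://example.org/> .

# One Triples Map per Logical Source; each has exactly one rml:logicalSource.
# ex:EmployeesCsv and ex:EmployeesJson are hypothetical and defined elsewhere.
ex:EmployeesCsvMap a rml:TriplesMap ;
    rml:logicalSource ex:EmployeesCsv ;
    rml:subjectMap [
        rml:template "http://example.org/employee/{id}" ;    # CSV column reference
        rml:class ex:Employee
    ] .

ex:EmployeesJsonMap a rml:TriplesMap ;
    rml:logicalSource ex:EmployeesJson ;
    rml:subjectMap [
        rml:template "http://example.org/employee/{$.id}" ;  # JSONPath reference
        rml:class ex:Employee
    ] .
```

Both maps type their subjects as ex:Employee, so the output contains employees from both sources, while each map's references stay bound to a single reference formulation.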
Just for the record, R2RML does say exactly 1 (exactly one rr:logicalTable per Triples Map).
> @pmaria I am not sure how safe it is to have a Triples map without a logical source. Then how do we know where the references refer to?
> at most one
> - Avoids the multiple sources possibility.
> - Lack of a Logical Source makes the references from rml:reference, rr:column, rr:template undefined as you don't know where they point to, unless the processor has it as a config, but then we are not declarative anymore.
> - This cannot be achieved currently without violating any shape.
> - Functions are currently not in a Triples Map, but are linked with FNML as a TermMap.
Well, the point is more that the LogicalSource could be implied in the mapping. During evaluation of the mapping, the single correct logical source would then still be bound.
For example, one could imagine a use case for a TriplesMap somehow nested in another TriplesMap, where the LogicalSource of the nested TriplesMap is implied to be the one of the outer TriplesMap.
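A Turtle sketch of such a nested Triples Map, for illustration only: ext:NestedTriplesMap and ext:nestedMap are hypothetical extension terms, not part of RML-Core, and ex:PeopleSource stands for a Logical Source defined elsewhere.

```turtle
@prefix rml:  <http://w3id.org/rml/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .
@prefix ext:  <http://example.org/ext#> .

# Hypothetical extension class: a Triples Map that inherits its Logical Source.
ext:NestedTriplesMap rdfs:subClassOf rml:TriplesMap .

# Outer Triples Map: declares its Logical Source explicitly
# (ex:PeopleSource is assumed to be defined elsewhere).
ex:PersonMap a rml:TriplesMap ;
    rml:logicalSource ex:PeopleSource ;
    rml:subjectMap [ rml:template "http://example.org/person/{id}" ] ;
    # Hypothetical link to the nested map; not an RML-Core property.
    ext:nestedMap ex:AddressMap .

# Nested Triples Map: no rml:logicalSource of its own. Its references
# ("id", "street") would be evaluated against the records of ex:PeopleSource.
ex:AddressMap a ext:NestedTriplesMap ;
    rml:subjectMap [ rml:template "http://example.org/address/{id}" ] ;
    rml:predicateObjectMap [
        rml:predicateMap [ rml:constant ex:street ] ;
        rml:objectMap [ rml:reference "street" ]
    ] .
```

If the rdfs:subClassOf triple is present in the data graph, SHACL treats ex:AddressMap as an instance of rml:TriplesMap, so any shape targeting rml:TriplesMap also applies to it.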
In fact, this is what was discussed for the old FunctionTriplesMap, which is now Execution in the new fnml spec.
Conceptually, an Execution still behaves much like a TriplesMap does, and could be seen as a subclass thereof (though I believe this is currently not stated explicitly).
So, I think we should be careful about requiring an rml:logicalSource property to be specified for each TriplesMap.
Hah, I was maybe a bit quick :). I would not put any cardinality restrictions in the ontology, but in the application profile of RML-Core, having exactly one LogicalSource probably makes things the most clear-cut. The Function case was an extension, and we can specify that the application profile of a 'Nested Triples Map' may have 0..1 LogicalSource, because that is an extension that does not violate the ontology. Making the standard part a bit more focused will ease development, I assume.
If we introduce the Nested Triples Map or something similar in RML-Core, we will need to revisit this issue, I assume.
My use case was a folder of CSV files chunked into, e.g., 10k-line pieces; having multiple logical sources would allow a single line in RML to state 'take all chunk-*.csv files'. But revisiting that, I agree it makes more sense to have a more specialized CSV DataSource that supports globs or something similar.
> The Function case was an extension, and we can specify that the application profile of a 'Nested Triples Map' may have 0..1 LogicalSource, because that is an extension that does not violate the ontology.
I like this semantic difference; it makes things more clear-cut.
> If we introduce the Nested Triples Map or something similar in RML-Core, we will need to revisit this issue, I assume.
Not entirely: if it differs semantically, it can coexist.
It also depends a bit on whether implementations MUST, SHOULD, or MAY support it alongside a regular Triples Map, if we ever get there.
> My use case was a folder of CSV files chunked into, e.g., 10k-line pieces; having multiple logical sources would allow a single line in RML to state 'take all chunk-*.csv files'. But revisiting that, I agree it makes more sense to have a more specialized CSV DataSource that supports globs or something similar.
IMO, that is mainly an access description for a Logical Source. There are 2 possible 'use cases' here:
- Archive with multiple files that do not share the same structure, for example a GTFS dump, which has different CSVs with different headers. This would be multiple separate Logical Sources, each with its own Triples Map.
- Archive with multiple files that share the same structure, for example @bjdmeest's use case. This would be a single Logical Source with its Triples Map. Combining these chunks happens in the access of the Logical Source.
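The second case can be sketched as one Logical Source whose access description handles the chunks. This assumes a glob-capable source type: ext:GlobFileSource and ext:pattern are hypothetical names, not RML-IO terms, while rml:source, rml:referenceFormulation and rml:CSV are used as in RML-IO:

```turtle
@prefix rml: <http://w3id.org/rml/> .
@prefix ex:  <http://example.org/> .
@prefix ext: <http://example.org/ext#> .

# One Logical Source over all chunks. The (hypothetical) access description
# ext:GlobFileSource is responsible for reading chunk-*.csv as one table.
ex:ChunkedSource a rml:LogicalSource ;
    rml:source [
        a ext:GlobFileSource ;
        ext:pattern "chunk-*.csv"
    ] ;
    rml:referenceFormulation rml:CSV .

# A single Triples Map with exactly one rml:logicalSource.
ex:RowMap a rml:TriplesMap ;
    rml:logicalSource ex:ChunkedSource ;
    rml:subjectMap [ rml:template "http://example.org/row/{id}" ] .
```

The Triples Map never sees the individual chunks, so the 'exactly one' cardinality is unaffected.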
So would you agree with the following?
- in the ontology we don't limit the cardinality
- in the specifications we describe what the cardinality is
- in the shapes we can describe the cardinality matching the specification, as long as it does not restrict the cardinality generally for all instances of rml:TriplesMap.
@pmaria your summary is correct: so far we have agreed not to put restrictions in the ontology and to include all restrictions in the shapes. However, the question remains what this cardinality should be. (The question came up while I was updating the text of the spec.)
Let's try to think of rml:TriplesMap independently of potential Nested Maps: would we still stick to 0 to 1, or would we go for exactly 1?
I guess that depends on how we set up the shapes.
If we want to support extensions creating subclasses of a TriplesMap where the rml:logicalSource is not specified, which seems reasonable to expect, we cannot define a shape like:
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix rml: <http://w3id.org/rml/> .

<TriplesMapShape>
    a sh:NodeShape ;
    sh:targetClass rml:TriplesMap ;
    sh:property [
        sh:path rml:logicalSource ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
.
since this would also hold for all X where X rdfs:subClassOf rml:TriplesMap, thus limiting the alternative usage in extensions.
So, if we want the restriction to be strictly defined for core, we'd need an alternative approach. Maybe using an sh:or or sh:xone that wraps the property shape, which could be extended in the extensions? I'd have to think a bit more about the best approach. (This is a more general issue: how to provide extension points in the shapes.)
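One possible reading of the sh:or idea, as a sketch only: the shape accepts a Triples Map that either has exactly one rml:logicalSource or is an instance of an extension class allowed to omit it. ex:TriplesMapShape and ext:NestedTriplesMap are hypothetical names:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix rml: <http://w3id.org/rml/> .
@prefix ex:  <http://example.org/shapes#> .
@prefix ext: <http://example.org/ext#> .

ex:TriplesMapShape
    a sh:NodeShape ;
    sh:targetClass rml:TriplesMap ;
    # Never more than one Logical Source, for core and extensions alike.
    sh:property [
        sh:path rml:logicalSource ;
        sh:maxCount 1 ;
    ] ;
    # Core rule: at least one Logical Source (so exactly one, given the
    # maxCount above), unless the focus node is an instance of the
    # hypothetical extension class ext:NestedTriplesMap.
    sh:or (
        [ sh:path rml:logicalSource ; sh:minCount 1 ]
        [ sh:class ext:NestedTriplesMap ]
    ) .
```

With sh:xone instead of sh:or, a nested map that does declare its own Logical Source would satisfy both members and be rejected. Each further extension class would still need its own member in the list, so this only partly answers the extension-point question.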
Discussed and decided in today's meeting: keep the cardinality at 1 to 1 (exactly one) in the shapes.
I came across a possibility where a Logical Source should not be required:
Triples Map:
  - SubjectMap: rml:constant
  - PredicateObjectMap:
    - PredicateMap: rml:constant
    - ObjectMap: rml:constant
    - PredicateMap:
In this case (where ALL are rml:constant), the Triples Map is completely constant and should be executed once.
This is an edge case, but it can be useful if you need to express some constant properties in a separate Triples Map, e.g., when you publish a dataset as DCAT and fill in the information about the publisher/dataset (dcat:Catalog and friends) [1], which is constant. Currently, you need to put this in a file, link it to a source, and perform the mapping that way.
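A sketch of such a constant-only Triples Map for the DCAT case, assuming the RML-Core rml: namespace; ex:catalog and ex:publisher are hypothetical IRIs and the title literal is made up:

```turtle
@prefix rml:  <http://w3id.org/rml/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

# Constant-only Triples Map: no rml:logicalSource, and every Term Map is an
# rml:constant, so every execution would produce exactly the same triples.
ex:CatalogMap a rml:TriplesMap ;
    rml:subjectMap [
        rml:constant ex:catalog ;
        rml:class dcat:Catalog
    ] ;
    rml:predicateObjectMap [
        rml:predicateMap [ rml:constant dct:title ] ;
        rml:objectMap [ rml:constant "Example catalog" ]
    ] ;
    rml:predicateObjectMap [
        rml:predicateMap [ rml:constant dct:publisher ] ;
        rml:objectMap [ rml:constant ex:publisher ]
    ] .
```

Executed once, this would produce three triples (the dcat:Catalog type, the title and the publisher). A shape requiring exactly one rml:logicalSource rejects this mapping, which is why an exception to the cardinality would be needed.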
Should we add an exception to the cardinality in such a case?
(Although this would be hard to check with SHACL shapes and friends.)
[1] https://www.w3.org/TR/vocab-dcat-3/#Class:Catalog
Interesting.
> the Triples Map is completely constant and should be executed once.
How do you define that it should be executed only once?
> How do you define that it should be executed only once?
Well, that it must be executed once is 'trivial' to see, as it would always yield the same RDF triples, no matter which sources are involved in the RML mapping. The question is indeed how to advertise this: an engine would need to read the Triples Map and, when it can be executed without any Logical Source (read: all Term Maps have rr:constant), proceed.
Currently, engines rely on the Logical Source records to trigger Triples Map execution, but in this edge case there is no source involved, as all Term Maps have rr:constant.
- At most one.
- Enforce this through SHACL: at most one for rr:constant-only Triples Maps, exactly one when rml:reference or rml:template is present in a TriplesMap.
- Document this edge case in the spec.
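The SHACL bullet could be sketched as follows. This is an illustration under assumptions, not the shape from the specification: ex:TriplesMapLogicalSourceShape is a hypothetical name, and the path only inspects Subject, Predicate and Object Maps (not Graph Maps, join conditions, constant shortcut properties, or FNML function inputs).

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix rml: <http://w3id.org/rml/> .
@prefix ex:  <http://example.org/shapes#> .

ex:TriplesMapLogicalSourceShape
    a sh:NodeShape ;
    sh:targetClass rml:TriplesMap ;
    # At most one Logical Source in every case.
    sh:property [
        sh:path rml:logicalSource ;
        sh:maxCount 1 ;
    ] ;
    sh:or (
        # Either the Triples Map has its (single) Logical Source ...
        [ sh:path rml:logicalSource ; sh:minCount 1 ]
        # ... or none of its Subject, Predicate or Object Maps uses
        # rml:reference or rml:template.
        [
            sh:path (
                [ sh:alternativePath (
                    rml:subjectMap
                    ( rml:predicateObjectMap
                      [ sh:alternativePath ( rml:predicateMap rml:objectMap ) ] )
                ) ]
                [ sh:alternativePath ( rml:reference rml:template ) ]
            ) ;
            sh:maxCount 0 ;
        ]
    ) .
```

A map whose inspected Term Maps are all constants, or blank nodes without an expression, passes with zero or one Logical Source; as soon as one of them uses a reference or template, exactly one is required.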
@DylanVanAssche has this been solved in the I/O spec? Shall we move this to that repo or close it?
@DylanVanAssche do I summarize it correctly here?
A Triples Map SHOULD have 1 Logical Source.
A Triples Map MAY have no Logical Source if all its Term Maps:
- are constant-valued expression maps, or
- have a blank node as term type and no expression map.
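Because the summary uses SHOULD, one option is to report a missing Logical Source with warning severity rather than as a violation. A sketch with a hypothetical shape name; sh:severity and sh:Warning are standard SHACL terms:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix rml: <http://w3id.org/rml/> .
@prefix ex:  <http://example.org/shapes#> .

ex:TriplesMapLogicalSourceRecommendation
    a sh:NodeShape ;
    sh:targetClass rml:TriplesMap ;
    # More than one Logical Source: reported with the default severity (sh:Violation).
    sh:property [
        sh:path rml:logicalSource ;
        sh:maxCount 1 ;
    ] ;
    # No Logical Source: only reported as a warning (SHOULD, not MUST).
    sh:property [
        sh:path rml:logicalSource ;
        sh:minCount 1 ;
        sh:severity sh:Warning ;
        sh:message "A Triples Map SHOULD have 1 Logical Source." ;
    ] .
```

Note that SHACL sets sh:conforms to false whenever any validation result is produced, whatever its severity, and that maps allowed by the MAY clause would still receive this warning.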
Just a question: given two sources, say file1.csv and file2.json, would it be possible to write a mapping specifying the class of "employees occurring both in file1.csv and file2.json"?
This looks like a pretty natural thing to write in a federated setting, but I do not see how it could be achieved if at most one logical source is allowed. (Can multiple data sources be "wrapped" within the same logical source? That was not my impression from reading the spec.)
> do I summarize it correctly here?
Yes
@andimou @DylanVanAssche it seems there is a final proposal for this. Is it already in the spec, or shall we make a PR to close this?
Can you then make a PR with the update and we can close this? @DylanVanAssche
See #84