oeg-upm / mapeathor

Translator of spreadsheet mappings into R2RML, RML or YARRRML

Home Page: https://morph.oeg.fi.upm.es/tool/mapeathor

License: Apache License 2.0

Python 98.76% Dockerfile 1.24%

r2rml rml knowledge-graph data-integration

mapeathor's Introduction

Mapeathor

Mapeathor translates your mapping rules specified in spreadsheets to a mapping language.

Mapeathor is a simple spreadsheet parser that generates mapping rules in three mapping languages: R2RML, RML (with the FnO extension for functions) and YARRRML. It takes the mapping rules expressed in a spreadsheet and transforms them into the desired language. The spreadsheet template is designed to make writing the mapping rules easier and to be language independent, thus lowering the barrier to generating mappings for non-expert users.

Example

A more detailed explanation is provided in the wiki.

First Step: Fill the xlsx template with the transformation rules

The template has five mandatory sheets: Prefixes, Source, Subject, PredicateObjectMap and Functions. The last one can be left blank if there are no functions. The spreadsheet can be an XLSX file or a Google Spreadsheet. Careful! When using Google Spreadsheets, the sharing option must be enabled. Here is an example of the structure of the spreadsheet.
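As a quick sanity check before running the tool, the list of mandatory sheets can be compared against the sheets actually present in a workbook. The following is only a sketch: the sheet names are copied from the paragraph above, and the exact spelling expected by the real template (some issues below refer to a Predicate_Object sheet, for example) is an assumption to verify against the template file.

```python
# Sheet names as listed in the README paragraph; the exact spelling used by
# the real template is an assumption and should be checked against it.
REQUIRED_SHEETS = ("Prefixes", "Source", "Subject", "PredicateObjectMap", "Functions")


def missing_sheets(sheet_names, required=REQUIRED_SHEETS):
    """Return the required sheet names that are absent, in template order."""
    present = {name.strip() for name in sheet_names}
    return [name for name in required if name not in present]


# A workbook that only contains the first three sheets.
print(missing_sheets(["Prefixes", "Source", "Subject"]))
```

With pandas installed, the sheet names of a real file could be read from `pandas.ExcelFile(path).sheet_names` and passed to `missing_sheets`.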

Second Step: Choose the output language

One of three options can be chosen: R2RML, RML or YARRRML.

Third Step: Run it!

The easiest way to run Mapeathor is through the web service and its Swagger instance. For CLI lovers, the tool is also available as a PyPI package and as a Docker image. Instructions for the latter can be found in the wiki.
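For the PyPI route, a command line can be assembled as in the sketch below. The `-l` and `-i` flags and the value `rml` are taken from a command quoted in one of the issues further down (`python -m mapeathor -l rml -i stop-stoptimes.xlsx`); their interpretation as output language and input file, as well as the file name `mapping.xlsx`, are assumptions, so the wiki remains the reference.

```python
import shlex
import sys


def mapeathor_command(spreadsheet, language="rml"):
    """Build the argument list for `python -m mapeathor -l <language> -i <file>`."""
    return [sys.executable, "-m", "mapeathor", "-l", language, "-i", spreadsheet]


command = mapeathor_command("mapping.xlsx")  # hypothetical input file
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command)` once the package is installed.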

Publications

Iglesias-Molina, A., Pozo-Gilo, L., Dona, D., Ruckhaus, E., Chaves-Fraga, D., & Corcho, O. (2020, January). Mapeathor: Simplifying the Specification of Declarative Rules for Knowledge Graph Construction. In ISWC (Demos/Industry). Online version

Iglesias-Molina, A., Chaves-Fraga, D., Priyatna, F., & Corcho, O. (2019). Towards the Definition of a Language-Independent Mapping Template for Knowledge Graph Creation. In Proceedings of the Third International Workshop on Capturing Scientific Knowledge co-located with the 10th International Conference on Knowledge Capture (K-CAP 2019) (pp. 33-36). Online version

Authors and contact

Ana Iglesias-Molina - [email protected]

mapeathor's Issues

Error in prefix syntax in generated rml mapping

The generated prefix has the following problem: @prefix foaf: <http//xmlns.com/foaf/0.1/>. should be @prefix foaf: <http://xmlns.com/foaf/0.1/>. (the colon after http is missing). Some of the other prefixes have the same problem. See https://github.com/oeg-upm/Mapeathor/blob/master/examples/publicBus/inputBus.rml.ttl
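A generated mapping can be scanned for this kind of malformed prefix with a few lines of Python. This is a heuristic sketch rather than a Turtle parser: it only looks at prefix declarations and flags IRIs that do not start with `scheme://`, so legitimate IRIs such as `urn:` or `mailto:` ones would be flagged too.

```python
import re

# Matches Turtle prefix declarations such as:
#   @prefix foaf: <http://xmlns.com/foaf/0.1/> .
PREFIX_LINE = re.compile(r"@prefix\s+([\w-]*):\s*<([^>]*)>\s*\.", re.IGNORECASE)
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")


def suspicious_prefixes(turtle_text):
    """Return (prefix, iri) pairs whose IRI does not start with 'scheme://'."""
    return [
        (prefix, iri)
        for prefix, iri in PREFIX_LINE.findall(turtle_text)
        if not SCHEME.match(iri)
    ]


sample = (
    "@prefix foaf: <http//xmlns.com/foaf/0.1/>.\n"
    "@prefix rr: <http://www.w3.org/ns/r2rml#>.\n"
)
print(suspicious_prefixes(sample))
```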

Handling missing data

I am currently working on transforming species interactions data from csv to rdf. My data looks like this:

consumerID	resourceID	resourceTaxonID
NCBI:211278	SFWO:0000464	nan
NCBI:211278	nan	GBIF:68

From this data, I'd like to generate triples of the form:
consumer member_of [rdf:type consumerID]
consumer eats resource
with resource member_of [rdf:type resourceTaxonID] OR resource rdf:type resourceID depending on whether the field resourceID or resourceTaxonID is not nan for the resource.

To do that, I thought I could generate either an individual resourceAsTaxon or an individual resourceAsMaterial in the Subject tab, depending on whether the field resourceID or resourceTaxonID is set. That would look something like this:

The data

ID1	consumerID	ID2	resourceID	ID3	resourceTaxonID
1	NCBI:211278	1	SFWO:0000464	nan	nan
2	NCBI:211278	nan	nan	2	GBIF:68

The mapping

ID	Class	URI
CONSUMER	obo:CARO_0001010	consumer_{ID1}
RESOURCEASMATERIAL	obo:BFO_0000040	resourceAsMaterial_{ID2}
RESOURCEASTAXON	obo:BFO_0000040	resourceAsTaxon_{ID3}

However, the missing data seems to generate errors:

INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/__main__.py", line 3, in <module>
INFO -     mapeathor.main()
INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/__init__.py", line 43, in main
INFO -     outputFile = mapping_generator.generateMapping(inputFile, args.output_file)
INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 294, in generateMapping
INFO -     json = organizeJson(json)
INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 54, in organizeJson
INFO -     json['TriplesMap'][subject['ID']]['Source'] = reFormatSource(json['TriplesMap'][subject['ID']]['Source'])
INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 269, in reFormatSource
INFO -     result['ID'] = data[0]['ID']
INFO - IndexError: list index out of range

Do you have any advice?

Installing under Windows 10

Is Mapeathor ready to be used under Windows 10? I have some issues getting started.

Non valid turtle from Function

Given the following configuration in the Function tab:

FunctionID	Feature	Value
<Divide>	fno:executes	ex:function
<Divide>	zin:p_dec_a	{omvang}
<Divide>	zin:p_dec_b	36

The following RML file is generated:

<#Divide>
    a rr:TriplesMap;
    a fnml:FunctionTermMap;
    rr:termType rr:IRI;

    fnml:functionValue [
        rml:logicalSource [
            rml:source "source.csv";
            rml:referenceFormulation ql:CSV;
        ];
        rr:predicateObjectMap [
            rr:predicate fno:executes ;
            rr:objectMap rr:constant ex:function
        ];
        rr:predicateObjectMap [
            rr:predicate zin:p_dec_a ;
            rr:objectMap rml:reference [ "omvang" ]
        ];
        rr:predicateObjectMap [
            rr:predicate zin:p_dec_b ;
            rr:objectMap rr:constant [ "36" ]
        ];
    ]
.

I believe this should be:

<#Divide>
    a rr:TriplesMap;
    a fnml:FunctionTermMap;
    rr:termType rr:IRI;

    fnml:functionValue [
        rml:logicalSource [
            rml:source "source.csv";
            rml:referenceFormulation ql:CSV;
        ];
        rr:predicateObjectMap [
            rr:predicate fno:executes ;
            rr:objectMap [ rr:constant ex:function ]
        ];
        rr:predicateObjectMap [
            rr:predicate zin:p_dec_a ;
            rr:objectMap [ rml:reference "omvang" ]
        ];
        rr:predicateObjectMap [
            rr:predicate zin:p_dec_b ;
            rr:objectMap [ rr:constant "36" ]
        ];
    ]
.

Notice the difference in the use of brackets in lines following rr:objectMap (three times, line numbers 13, 17 and 21)
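Until the generator is fixed, the bracket placement can be repaired by post-processing the generated file. The sketch below uses two regular expressions written only for the three shapes shown in this issue; it is not a general Turtle rewriter and assumes each object map sits on a single line.

```python
import re

# 'rr:objectMap rr:constant [ "36" ]'  ->  'rr:objectMap [ rr:constant "36" ]'
MISPLACED = re.compile(r"rr:objectMap\s+(rr:constant|rml:reference)\s+\[\s*(.+?)\s*\]")
# 'rr:objectMap rr:constant ex:function'  ->  'rr:objectMap [ rr:constant ex:function ]'
MISSING = re.compile(r"rr:objectMap\s+(rr:constant|rml:reference)\s+([^\s\[\]]+)")


def fix_object_maps(turtle_text):
    """Wrap the term map that follows rr:objectMap in a single pair of brackets."""
    text = MISPLACED.sub(r"rr:objectMap [ \1 \2 ]", turtle_text)
    return MISSING.sub(r"rr:objectMap [ \1 \2 ]", text)


for line in (
    "rr:objectMap rr:constant ex:function",
    'rr:objectMap rml:reference [ "omvang" ]',
    'rr:objectMap rr:constant [ "36" ]',
):
    print(fix_object_maps(line))
```

Lines that already have the expected form are left untouched, so the function can be applied to a whole file.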

Use rr:column/rr:tableName instead of rml:reference/rml:source when generating R2RML mapping

Pandas openpyxl version

While investigating bug #27, I found another possible problem.

I was getting this output from Mapeathor:

ERROR: File not found

But the real problem is this:

>>> data = pandas.ExcelFile("input/InputIncidences.xlsx", engine='openpyxl')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dani/.local/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 824, in __init__
    self._reader = self._engines[engine](self._io)
  File "/home/dani/.local/lib/python3.6/site-packages/pandas/io/excel/_openpyxl.py", line 484, in __init__
    import_optional_dependency("openpyxl")
  File "/home/dani/.local/lib/python3.6/site-packages/pandas/compat/_optional.py", line 109, in import_optional_dependency
    raise ImportError(msg)
ImportError: Pandas requires version '2.5.7' or newer of 'openpyxl' (version '2.4.9' currently installed).

Maybe the error messages can be improved a bit.
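One way to improve the messages is to catch the two failure modes separately instead of reporting everything as a missing file. The sketch below is not Mapeathor's actual code: `load_spreadsheet` and its `opener` parameter are hypothetical names, with `opener` standing in for whatever reads the file (for example the `pandas.ExcelFile` call shown above).

```python
def load_spreadsheet(path, opener):
    """Call opener(path) and report failures with distinct messages.

    `opener` is a stand-in for the real reader, for example
    lambda p: pandas.ExcelFile(p, engine="openpyxl").
    """
    try:
        return opener(path)
    except FileNotFoundError:
        raise SystemExit(f"ERROR: File not found: {path}")
    except ImportError as exc:
        # Covers missing or too-old optional dependencies such as openpyxl.
        raise SystemExit(f"ERROR: Missing or outdated dependency: {exc}")


# Demonstration with a fake opener that simply returns its argument.
print(load_spreadsheet("input/InputIncidences.xlsx", lambda p: p))
```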

Subjects as blank nodes

Hi.
The YARRRML specification allows subjects to be blank nodes: https://rml.io/yarrrml/spec/#subjects
In the case of a blank node, subjects is set to null or is not specified at all.
I tried to mimic this in Mapeathor, but the subject is systematically filled with "nan".
Am I missing something, or is Mapeathor unable to handle blank nodes at the moment?
Many thanks.

add explicit a rr:TriplesMap

There are some engines that need explicit a rr:TriplesMap.

Currently, I have to use this command: sed 's/rml:logicalSource/a rr:TriplesMap; rml:logicalSource/g'
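For environments without sed, the same workaround can be written in Python. This is a direct translation of the command above and shares its limitations: every occurrence of rml:logicalSource is rewritten, including nested ones, and running it twice adds the type twice.

```python
def add_triples_map_type(turtle_text):
    """Insert 'a rr:TriplesMap;' before every rml:logicalSource, like the sed command."""
    return turtle_text.replace("rml:logicalSource", "a rr:TriplesMap; rml:logicalSource")


print(add_triples_map_type("<#Example>\n    rml:logicalSource ["))
```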

generate xsd:string instead of xsd:nan for unspecified literal datatype

If no datatype is specified for a literal, it would be better to generate xsd:string instead of xsd:nan.
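The value nan most likely comes from the way pandas represents empty spreadsheet cells (as NaN, which becomes the text "nan" when converted to a string); that this is the cause inside Mapeathor is an assumption. A sketch of the requested fallback, with `datatype_or_default` as a hypothetical helper name:

```python
import math


def datatype_or_default(cell, default="string"):
    """Return the datatype written in a cell, or `default` when the cell is empty."""
    if cell is None:
        return default
    if isinstance(cell, float) and math.isnan(cell):
        return default
    text = str(cell).strip()
    if text == "" or text.lower() == "nan":
        return default
    return text


print("xsd:" + datatype_or_default(float("nan")))
print("xsd:" + datatype_or_default("integer"))
```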

add support for defining sql view (sqlQuery) in the logical table element

Not considering datatype in predicateObjectMap

The datatype is not considered in the predicateObjectMap. For example, a boolean should be generated as "false"^^<http://www.w3.org/2001/XMLSchema#boolean>, but it is generated as "false". The same applies to numbers, which should be generated as double, integer or float. All IRIs are generated as strings.

Defining delimiter when source is a csv file

Hi. It does not seem possible to define the delimiter when working with CSV files:
https://rml.io/yarrrml/spec/#delimiter
Please consider adding this feature :)

Update readme with changes in templates

Especially for functions, but also to include language.

add the option to add default namespace

Currently, I have to use this command sed 's/<no value>//g' mapeathor_output.ttl

An unused Function leads to an error

As soon as a Function is defined, but not used in the Predicate_Object sheet, I get the following error:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/mapeathor/__main__.py", line 3, in <module>
    mapeathor.main()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/mapeathor/__init__.py", line 43, in main
    outputFile = mapping_generator.generateMapping(inputFile, args.output_file)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 273, in generateMapping
    json = organizeJson(json)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 55, in organizeJson
    json['Function'] = reFormatFunction(data['Function'], json)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 138, in reFormatFunction
    result[fun]['Source']['FunctionID'] = fun
TypeError: 'NoneType' object does not support item assignment

Why would I define a function that I'm not using?

Well, I do use the function: its result is the input for another function. That works fine, but the error only goes away once I also use the first function as a 'dummy' Object in the Predicate_Object sheet. This leaves me with a dummy triple I don't really want.

Blank node as object in pom

Now that we have subjects as blank nodes, objects as blank nodes would also be very useful.

Using blank nodes as objects as follows is normal practice:

<#DecubitusWondMapping> a rr:TriplesMap;
...
    rr:predicateObjectMap [
        rr:predicate vph-g:hasQuality ;
        rr:objectMap [
            rr:template "ernst_{wond_id}_{categorie}" ;
            rr:termType rr:BlankNode
        ]
    ];

<#DecubitusWondErnstMapping> a rr:TriplesMap;
...
    rr:subjectMap [ 
            rr:template "ernst_{wond_id}_{categorie}" ;
            #rr:class vph-g:MedicalRatingQuality ;
            rr:termType rr:BlankNode ;
        ];

Adding (cities/{city_id}) as object and blanknode as datatype in the pom tab looks appropriate.

Google Sheet Integration

It would be nice to add Google Sheet in addition to Excel files as input

Specify graph IRI in RDF quads

Hi
I would like to generate quads from a CSV source, and I am wondering whether and how I can specify the IRI of the graph in the mapping rule definition.

CSVW generation

It would be nice to have the option to generate CSVW.

Google spreadsheets API

Generate rr:Literal instead of rr:literal for Term Type

Currently, Mapeathor generates rr:termType rr:literal but it should be rr:termType rr:Literal.

I need to use this command to fix it: sed 's/rr:termType rr:literal/rr:termType rr:Literal/g'

Provide a docker image for mapeathor

Create a Docker image and publish it under the oegdataintegration Docker Hub organization.

Subject Tab URI to also run functions from function tab

The Subject tab only allows a URI to be entered manually, with a reference to a column in brackets. As an enhancement, could you allow this column (URI) to also execute a function defined in the Functions tab?

ID Class URI
ExampleID owl:Class https://data.example.edu/xyz/

Result

A Turtle file with an IRI such as: https://data.example.edu/xyz/1a593ea3-81f8-4706-8294-386aff8ef469 a owl:Class .

Import error still exists

The import error still exists. from .global_config import * might work, but import * is not good practice and may not be supported anymore.

Changing all local imports into the following seems to work better:

from . import global_config

Relational databases as source

When the data source is a table, the YML file shows the line corresponding to the source field as:

sources: 
  - [name_of-the_table~SQL2008]

This does not work when transforming the YML file into an RML file. The problem is fixed by modifying the YML file manually as follows:

sources: 
  - table: name_of-the_table

Could Mapeathor generate this line in that form directly?
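In the meantime, the manual edit described above can be automated. The sketch below only rewrites source entries of the exact shape shown in this issue (a bracketed name followed by ~SQL2008 on its own line); other source forms are left alone, and whether the `table:` form is accepted by every YARRRML tool is not verified here.

```python
import re

# Matches lines such as "  - [name_of-the_table~SQL2008]".
SQL_SOURCE = re.compile(r"^([ \t]*)- \[([^\]~]+)~SQL2008\][ \t]*$", re.MULTILINE)


def rewrite_table_sources(yarrrml_text):
    """Turn '- [table~SQL2008]' entries into '- table: table' entries."""
    return SQL_SOURCE.sub(r"\1- table: \2", yarrrml_text)


print(rewrite_table_sources("sources:\n  - [name_of-the_table~SQL2008]\n"))
```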

Specify string language

It would be convenient to be able to specify the language of a string.

Error in PyPI package

Hi.

Running Mapeathor installed from PyPI on your test file returns the following error:

...mapeathor/__init__.py", line 198, in reFormatFunction
if data_function[0]['Feature'] == 'nan' and data_function[0]['Value'] == 'nan':
IndexError: list index out of range

Everything works fine using Docker

Generation of uris as objects in mappings is incorrect

An example of the input specification is this predicate object map:
sosa:observedProperty
recurso-trafico:propiedadmediciontrafico/intensidad

The generated output is: - [sosa:observedProperty, "recurso-trafico:propiedadmediciontrafico/carga"~iri, xsd:nan]

should be - [sosa:observedProperty, recurso-trafico:propiedadmediciontrafico/carga~iri]

Authors and contact

Add the names of the authors of the tool and the contact

Documentation issue

This link is dead

Here you can see the [Available Languages](https://github.com/oeg-upm/mapeathor/blob/master/templates).

RDF based Mapping Generation

Instead of using templates, you could generate an RDF representation of the spreadsheet and then easily transform the RDF data into YARRRML, RML or R2RML.
This can be done manually, by reading the data and building the graph with rdflib, or by creating a mapping for the "csv" files (one per sheet of the spreadsheet) and generating the RDF with tools like RDFizer or RMLMapper.
I can help too.

Get predicate from source sheet

Hi.

For my use case, I need to read the Predicate linking two entities (two "subjects") from a source sheet.
It was a bit tricky to do that directly with YARRRML but I managed to make it work.
I wonder if this is something Mapeathor could do?

Constant, full URI in Object column is generated wrong in RML

I have declared a prefix in my spreadsheet, p: http://demo.org/data/
In the Predicate_Object sheet, in the Object column, it works fine if I provide a constant value using this prefix as p:item.
The RML is generated correctly and looks like:

    rr:predicateObjectMap [
                rr:predicateMap [ rr:constant some:property];
                rr:objectMap    [ rr:constant p:item; rr:termType rr:IRI; rr:datatype xsd:anyURI ]
    ];

However, if I specify the full URI in the same cell in my spreadsheet, i.e. http://demo.org/data/item instead of p:item, the RML output is incorrect:

    rr:predicateObjectMap [
                rr:predicateMap [ rr:constant some:property];
                rr:objectMap    [ rr:constant http://demo.org/data/item; rr:termType rr:IRI; rr:datatype xsd:anyURI ]
    ];

For the RML/TTL to be valid, the URI in question should be in quotes.
I understand that full, constant URI values should be supported in the Object column of the Predicate_Object sheet, so this looks like a bug.

Read predicate from source

Hi, to generate my RML rules, I need to read a predicate IRI from the source file.

In the Predicate column of the Predicate_Object tab, I wrote {interaction_type_uri}, where interaction_type_uri is the name of a column in my csv file containing the IRI of a relation (e.g. http://purl.obolibrary.org/obo/RO_0002470).

This creates a rml:reference to "interaction_type_uri" in the RML rules. However, in the resulting triples file, the predicate is interpreted as a string (between quotes) and not a valid IRI (between <>).

Is it possible to read a predicate IRI from the source file, and if so, how can I do that?

DataType for result of Function is ignored?

When I define the DataType to be of type 'iri', this seems to be ignored for the result of a Function.

In the Predicate_Object sheet:

Predicate	Object	DataType
rdf:type	<Function1>	iri

The generated RML mapping looks something like:

<#Function1>
    a rr:TriplesMap;
    a fnml:FunctionTermMap;

    fnml:functionValue [ ....

The result when using RMLMapper is a triple that looks like:
<subject> a "http://example.com/uri"

I would expect:
<subject> a ex:uri

When I manually add rr:termType rr:IRI to the RML mapping I do get the desired triple

<#Function1>
    a rr:TriplesMap;
    a fnml:FunctionTermMap;
    rr:termType rr:IRI;

    fnml:functionValue [ ....

add the option to add base uri

Currently, I have to use this command echo '@base <http://www.example.com/> .' | cat - mapeathor_output.ttl

Error in template file and setLanguage function

When I try to run mapeathor via command line with the example template, I get the following error:
ERROR: The spreadsheet template is not correct. Check the sheet and column names are correct.

After cloning the source and trying to fix the issue, I came across the following errors within the code:

In the function reFormatPredicateObject(data) in mapping_generator.py, the statement element['language'] = element['language'].lower() makes the program halt, since the key language does not exist in the data.
In global_config.py, the function setMappingLanguage(language) is never called, so the variable templatesDir is never assigned. This halts execution later, when the mappings are written, because the directory value does not exist.
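For the first point, a defensive version of the statement could fall back to an empty string when the key is absent. This is only a sketch of the idea, not the project's actual fix; `normalize_language` is a hypothetical helper name, and only the key name `language` comes from the report above.

```python
def normalize_language(element):
    """Lower-case the 'language' entry, treating a missing key as empty."""
    element["language"] = str(element.get("language", "")).lower()
    return element


print(normalize_language({"Predicate": "rdfs:label", "language": "EN"}))
print(normalize_language({"Predicate": "rdfs:label"}))
```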

Function as parameter generates error in RML mapping

When I use the result of a function as the input parameter in another function, an incorrect RML mapping seems to be generated.

In the Function sheet:

FunctionID	Feature	Value
<Function2>	grel:valueParameter	<Function1>

This generates an output.rml.ttl file in which the triples map ends with:

rr:predicateObjectMap [
            rr:predicate zin:p_string_b ;
            rr:objectMap [  <#Function1> ]
        ];

The brackets that surround <#Function1> lead to an RDFParseException in RMLMapper: Expected an RDF value here, found ']'.

When I manually delete the brackets the correct output is generated by RMLMapper.
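The manual deletion can be scripted as a stopgap. The regular expression below is a sketch that targets only the pattern shown in this issue, namely a lone `<#...>` reference wrapped in brackets directly after rr:objectMap; bracketed object maps with real content are not touched.

```python
import re

# Matches 'rr:objectMap [  <#Function1> ]' and captures the function reference.
BRACKETED_FUNCTION = re.compile(r"rr:objectMap\s+\[\s*(<#[^>\s]+>)\s*\]")


def unwrap_function_references(turtle_text):
    """Remove the brackets around a bare function reference used as object map."""
    return BRACKETED_FUNCTION.sub(r"rr:objectMap \1", turtle_text)


print(unwrap_function_references("rr:objectMap [  <#Function1> ]"))
```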

Website inaccessible?

The repo lists https://morph.oeg.fi.upm.es/tool/mapeathor as a hosted version of the tool, but it seems to be unavailable.

Generate owl:equivalentClass triples

Hi,
As part of my knowledge graph creation process, I have some data in a CSV file that I need to transform into owl:equivalentClass triples.
My data basically consists of two columns, one with the subject IRI and one with the object IRI, and I want to generate triples of the form (subject_iri, owl:equivalentClass, object_iri).
This does not seem possible, as Mapeathor treats every subject as an individual. Here, my subjects are classes.
Am I right?

<prefix:property> in output

Using Mapeathor 1.5.2 and having in the template

ID            Predicate                  Object                                  DataType
IdEmployee    vph:heeftWerkOvereenkomst  http://data#overeenkomst_{objectId}     anyURI

results in the following mapping

<#IdEmployee>
    a rr:TriplesMap;
    rml:logicalSource [
    	rml:source "employees.csv";
    	rml:referenceFormulation ql:CSV;
    ];
    rr:subjectMap [
    	a rr:Subject;
    	rr:termType rr:IRI;
    	rr:template "http://data#employee_{identificationNo}";
    	rr:class vph:Human;
    ];
    rr:predicateObjectMap [
    	rr:predicateMap	[ rr:constant "vph:heeftWerkOvereenkomst"];
    	rr:objectMap	[ rr:template "http://data#overeenkomst_{objectId}"; rr:termType rr:IRI; rr:datatype xsd:anyURI ]
    ];
.

and the following triples using rmlmapper or rdfizer

<http://data#employee_10> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.purl.org/vph#Human>.
<http://data#employee_10> <vph:heeftWerkOvereenkomst> <http://data#overeenkomst_1>.
<http://data#employee_20> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.purl.org/vph#Human>.
<http://data#employee_20> <vph:heeftWerkOvereenkomst> <http://data#overeenkomst_2>.

in which <vph:heeftWerkOvereenkomst> is not correct.

Add a reference to the SciKnow2019 paper

When the bibliographic reference is ready in CEUR-WS

ERROR: The spreadsheet template is not correct.

I just installed mapeathor==1.6.1 and have tried the tutorial-kgc22 example, the template and some input files from the use-cases and always get an error

python -m mapeathor -l rml -i stop-stoptimes.xlsx
Generating mapping file
ERROR: The spreadsheet template is not correct. Check the sheet and column names are correct.

Import error

Since Ana changed the templating to Jinja, Mapeathor works fine under Windows with one minor change.
The imports should be changed from

import global_config

into

from . import global_config
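The reason the second form is needed is that Python 3 no longer resolves a bare `import global_config` relative to the package; it looks for a top-level module instead. The self-contained sketch below builds a throwaway package in a temporary directory to show the explicit relative import working. The package name `demo_pkg`, the module name `generator` and the value of `templatesDir` are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "global_config.py").write_text("templatesDir = 'templates/'\n")
    # Explicit relative import, as proposed in this issue.
    (pkg / "generator.py").write_text(
        "from . import global_config\n"
        "def templates():\n"
        "    return global_config.templatesDir\n"
    )
    sys.path.insert(0, tmp)
    try:
        generator = importlib.import_module("demo_pkg.generator")
        result = generator.templates()
    finally:
        sys.path.remove(tmp)

print(result)
```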

xml rml:iterator

Using an iterator to read an XML file produces incorrect RML:

<#IdElement>
    a rr:TriplesMap;
    rml:logicalSource [
    	rml:source "architest.xml";
    	rml:referenceFormulation ql:XPath;
        rml:iterator ;
    "/model/elements/element"];
    rr:subjectMap [
    	a rr:Subject;

Removing the ; and the line break gives valid RML syntax.
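That manual fix can be expressed as a small post-processing step. The sketch below assumes the broken output always has the shape shown above, with `rml:iterator ;` on one line and the quoted iterator expression at the start of the next.

```python
import re

# 'rml:iterator ;' followed by a line break and the quoted iterator expression.
SPLIT_ITERATOR = re.compile(r'rml:iterator\s*;\s*\n\s*(?=")')


def join_iterator(turtle_text):
    """Put the iterator expression back on the rml:iterator line."""
    return SPLIT_ITERATOR.sub("rml:iterator ", turtle_text)


broken = '        rml:iterator ;\n    "/model/elements/element"];'
print(join_iterator(broken))
```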

input file:
archimatetest.xlsx

Invalid RML file

Hi. Generating RML rules from my spreadsheet results in an invalid RML file (according to RMLMapper).
I managed to identify one of the problems, not the other.

CSV appears to be an invalid data format. Replacing 'CSV' by 'csv' as specified in YARRRML specification works. From the attached spreadsheet, I am able to generate YARRRML rules, then to convert these rules to RML rules using yarrrml-parser, then to generate RDF triples using RMLMapper.
However, if I try to generate RML rules directly from the spreadsheet, then RMLMapper returns the following error during RDF generation:

10:40:42.691 [main] ERROR be.ugent.rml.cli.Main               .main(179) - Unable to parse mapping rules as Turtle. Does the file exist and is it valid Turtle?
org.eclipse.rdf4j.rio.RDFParseException: Not a valid (absolute) IRI: #CONSUMER [line 11]

mapping.xlsx
s.tsv.txt
