mapeathor's Introduction

workflow

Mapeathor

Mapeathor translates your mapping rules specified in spreadsheets to a mapping language.

Mapeathor is a simple spreadsheet parser able to generate mapping rules in three mapping languages: R2RML, RML (with extension to functions from FnO) and YARRRML. It takes the mapping rules expressed in a spreadsheet and transforms them into the desired language. The spreadsheet template is designed to make the mapping rules easier to write and to be language independent, thus lowering the barrier to generating mappings for non-expert users.

Example

A more detailed explanation is provided in the wiki.

First Step: Fill the xlsx template with the transformation rules

The template has five mandatory sheets: Prefixes, Source, Subject, PredicateObjectMap and Functions. The last one can be left blank if there are no functions. The spreadsheet can be an XLSX file or a Google Spreadsheet. Careful! When using Google Spreadsheets, the sharing option must be enabled. Here is an example of the structure of the spreadsheet.

sheets

Second Step: Choose the output language

One of three options can be chosen: R2RML, RML or YARRRML.

Third Step: Run it!

The easiest way to run Mapeathor is through the web service and its Swagger instance. For CLI lovers, the service is available as a PyPI package and a Docker image. Instructions for the latter can be found in the wiki.

Publications

Iglesias-Molina, A., Pozo-Gilo, L., Dona, D., Ruckhaus, E., Chaves-Fraga, D., & Corcho, O. (2020, January). Mapeathor: Simplifying the Specification of Declarative Rules for Knowledge Graph Construction. In ISWC (Demos/Industry). Online version

Iglesias-Molina, A., Chaves-Fraga, D., Priyatna, F., & Corcho, O. (2019). Towards the Definition of a Language-Independent Mapping Template for Knowledge Graph Creation. In Proceedings of the Third International Workshop on Capturing Scientific Knowledge co-located with the 10th International Conference on Knowledge Capture (K-CAP 2019) (pp. 33-36). Online version

Authors and contact

mapeathor's People

Contributors

anaigmo, arenas-guerrero-julian, dachafra, daniel-dona, ednaru

mapeathor's Issues

Handling missing data

I am currently working on transforming species interactions data from csv to rdf. My data looks like this:

consumerID resourceID resourceTaxonID
NCBI:211278 SFWO:0000464 nan
NCBI:211278 nan GBIF:68

From this data, I'd like to generate triples of the form:
consumer member_of [rdf:type consumerID]
consumer eats resource
with resource member_of [rdf:type resourceTaxonID] OR resource rdf:type resourceID depending on whether the field resourceID or resourceTaxonID is not nan for the resource.

To do that, I thought maybe I could generate an individual resourceAsTaxon or an individual resourceAsMaterial
in the Subject tab, depending on whether the field resourceID or resourceTaxonID is set. That would look like something like this:

The data

ID1 consumerID ID2 resourceID ID3 resourceTaxonID
1 NCBI:211278 1 SFWO:0000464 nan nan
2 NCBI:211278 nan nan 2 GBIF:68

The mapping

ID Class URI
CONSUMER obo:CARO_0001010 consumer_{ID1}
RESOURCEASMATERIAL obo:BFO_0000040 resourceAsMaterial_{ID2}
RESOURCEASTAXON obo:BFO_0000040 resourceAsTaxon_{ID3}

but it seems that missing data are generating errors :

INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/__main__.py", line 3, in <module>
INFO -     mapeathor.main()
INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/__init__.py", line 43, in main
INFO -     outputFile = mapping_generator.generateMapping(inputFile, args.output_file)
INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 294, in generateMapping
INFO -     json = organizeJson(json)
INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 54, in organizeJson
INFO -     json['TriplesMap'][subject['ID']]['Source'] = reFormatSource(json['TriplesMap'][subject['ID']]['Source'])
INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 269, in reFormatSource
INFO -     result['ID'] = data[0]['ID']
INFO - IndexError: list index out of range

Do you have any advice?
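
One possible workaround, not confirmed by the maintainers, is to split the source before mapping so that each subject only reads rows where its identifier is present. The sketch below is only an illustration: the function name `split_by_populated_column` and the convention that missing values appear as an empty string or `nan` are assumptions, and it is not known whether this avoids the IndexError shown above.

```python
import csv
import io

MISSING = {"", "nan"}  # assumption: how missing values appear in the CSV


def split_by_populated_column(rows, columns):
    """Group rows by which of `columns` is populated.

    Returns a dict mapping each column name to the rows where that
    column has a value. A row with several populated columns appears
    under each of them.
    """
    groups = {column: [] for column in columns}
    for row in rows:
        for column in columns:
            value = (row.get(column) or "").strip()
            if value not in MISSING:
                groups[column].append(row)
    return groups


# The two rows from the issue, written as CSV text.
data = io.StringIO(
    "consumerID,resourceID,resourceTaxonID\n"
    "NCBI:211278,SFWO:0000464,nan\n"
    "NCBI:211278,nan,GBIF:68\n"
)
groups = split_by_populated_column(
    csv.DictReader(data), ["resourceID", "resourceTaxonID"]
)
for column, rows in groups.items():
    print(column, [row[column] for row in rows])
```

Each group could then be written to its own file and referenced as a separate source in the Source sheet, one for the material subject and one for the taxon subject.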

Non valid turtle from Function

Given the following configuration in the Function tab:

FunctionID Feature Value
<Divide> fno:executes ex:function
<Divide> zin:p_dec_a {omvang}
<Divide> zin:p_dec_b 36

The following RML file is generated:

<#Divide>
    a rr:TriplesMap;
    a fnml:FunctionTermMap;
    rr:termType rr:IRI;

    fnml:functionValue [
        rml:logicalSource [
            rml:source "source.csv";
            rml:referenceFormulation ql:CSV;
        ];
        rr:predicateObjectMap [
            rr:predicate fno:executes ;
            rr:objectMap rr:constant ex:function
        ];
        rr:predicateObjectMap [
            rr:predicate zin:p_dec_a ;
            rr:objectMap rml:reference [ "omvang" ]
        ];
        rr:predicateObjectMap [
            rr:predicate zin:p_dec_b ;
            rr:objectMap rr:constant [ "36" ]
        ];
    ]
.

I believe this should be:

<#Divide>
    a rr:TriplesMap;
    a fnml:FunctionTermMap;
    rr:termType rr:IRI;

    fnml:functionValue [
        rml:logicalSource [
            rml:source "source.csv";
            rml:referenceFormulation ql:CSV;
        ];
        rr:predicateObjectMap [
            rr:predicate fno:executes ;
            rr:objectMap [ rr:constant ex:function ]
        ];
        rr:predicateObjectMap [
            rr:predicate zin:p_dec_a ;
            rr:objectMap [ rml:reference "omvang" ]
        ];
        rr:predicateObjectMap [
            rr:predicate zin:p_dec_b ;
            rr:objectMap [ rr:constant "36" ]
        ];
    ]
.

Notice the difference in the use of brackets in lines following rr:objectMap (three times, line numbers 13, 17 and 21)

Pandas openpyxl version

While investigating bug #27, I found another possible problem.

I was getting this output from Mapeathor:

ERROR: File not found

But the real problem is this:

>>> data = pandas.ExcelFile("input/InputIncidences.xlsx", engine='openpyxl')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dani/.local/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 824, in __init__
    self._reader = self._engines[engine](self._io)
  File "/home/dani/.local/lib/python3.6/site-packages/pandas/io/excel/_openpyxl.py", line 484, in __init__
    import_optional_dependency("openpyxl")
  File "/home/dani/.local/lib/python3.6/site-packages/pandas/compat/_optional.py", line 109, in import_optional_dependency
    raise ImportError(msg)
ImportError: Pandas requires version '2.5.7' or newer of 'openpyxl' (version '2.4.9' currently installed).

Maybe the error messages can be improved a bit.
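
As an illustration of what a more specific message could look like, here is a minimal sketch that distinguishes a missing file from a dependency problem. The names `describe_load_error` and `load` are hypothetical and are not part of Mapeathor; the opener is passed in as a parameter so the sketch does not depend on pandas.

```python
def describe_load_error(path, exc):
    """Return a user-facing message for an exception raised while opening `path`."""
    if isinstance(exc, FileNotFoundError):
        return f"ERROR: File not found: {path}"
    if isinstance(exc, ImportError):
        # Keep the library's own explanation, e.g. the required openpyxl version.
        return f"ERROR: Missing or outdated dependency: {exc}"
    return f"ERROR: Could not read {path}: {exc}"


def load(path, opener):
    """Call `opener(path)`; return (result, None) or (None, error message)."""
    try:
        return opener(path), None
    except Exception as exc:
        return None, describe_load_error(path, exc)


def fake_opener(path):
    # Stand-in for the real spreadsheet reader, raising the error from the issue.
    raise ImportError(
        "Pandas requires version '2.5.7' or newer of 'openpyxl' "
        "(version '2.4.9' currently installed)."
    )


_, message = load("input/InputIncidences.xlsx", fake_opener)
print(message)
```

With this shape, the openpyxl version problem would be reported as a dependency error instead of "File not found".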

Subjects as blank nodes

Hi.
YARRRML specification allows subjects to be blank nodes : https://rml.io/yarrrml/spec/#subjects
In the case of a blank node, subjects is set to null or is not specified at all.
I tried to mimic this in Mapeathor, but the subject is systematically filled with "nan".
Am I missing something, or is Mapeathor unable to handle blank nodes at the moment ?
Many thanks.

add explicit a rr:TriplesMap

There are some engines that need explicit a rr:TriplesMap.

Currently, I have to use this command: sed 's/rml:logicalSource/a rr:TriplesMap; rml:logicalSource/g'
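
For anyone without sed at hand, the same substitution can be sketched in Python. The function name `add_explicit_triples_map` is made up for this example; like the sed command, it rewrites every occurrence of rml:logicalSource, including nested ones.

```python
def add_explicit_triples_map(mapping_text):
    """Insert `a rr:TriplesMap;` before every `rml:logicalSource`, like the sed command."""
    return mapping_text.replace(
        "rml:logicalSource", "a rr:TriplesMap; rml:logicalSource"
    )


sample = '<#Employee>\n    rml:logicalSource [ rml:source "employees.csv" ];\n.\n'
print(add_explicit_triples_map(sample))
```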

An unused Function leads to an error

As soon as a Function is defined, but not used in the Predicate_Object sheet, I get the following error:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/mapeathor/__main__.py", line 3, in <module>
    mapeathor.main()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/mapeathor/__init__.py", line 43, in main
    outputFile = mapping_generator.generateMapping(inputFile, args.output_file)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 273, in generateMapping
    json = organizeJson(json)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 55, in organizeJson
    json['Function'] = reFormatFunction(data['Function'], json)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 138, in reFormatFunction
    result[fun]['Source']['FunctionID'] = fun
TypeError: 'NoneType' object does not support item assignment

Why would I define a function that I'm not using?

Well I do use the function. The result of the function is used as the input for another function. That works fine, but only after I also use the first function as a 'dummy' Object in the Predicate_Object sheet, the error is gone. However, this leaves me with a dummy triple I don't really want....

Blank node as object in pom

Now that we have subjects as blank nodes, objects as blank nodes would also be very useful.

Using blank nodes as objects as follows is normal practice:

<#DecubitusWondMapping> a rr:TriplesMap;
...
    rr:predicateObjectMap [
        rr:predicate vph-g:hasQuality ;
        rr:objectMap [
            rr:template "ernst_{wond_id}_{categorie}" ;
            rr:termType rr:BlankNode
        ]
    ];

<#DecubitusWondErnstMapping> a rr:TriplesMap;
...
    rr:subjectMap [ 
            rr:template "ernst_{wond_id}_{categorie}" ;
            #rr:class vph-g:MedicalRatingQuality ;
            rr:termType rr:BlankNode ;
        ];

Adding (cities/{city_id}) as object and blanknode as datatype in the pom tab looks appropriate.

Specify graph IRI in RDF quads

Hi
I would like to generate quads from a csv source, and I am wondering if and how I can specify the IRI of the graph in the mapping rule definition ?

Import error still exists

The import error still exists. from .global_config import * might work, but import * is not good practice and may no longer be supported.

Changing all local imports into the following seems to work better:

from . import global_config

Relational databases as source

When the data source is a table, the YML file shows the line corresponding to the source field as:

sources: 
  - [name_of-the_table~SQL2008]

this does not work when transforming the YML file to an RML file. The problem is fixed by modifying the YML file manually as follows:

sources: 
  - table: name_of-the_table

Could Mapeathor generate this line in this form directly?
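
Until then, the manual edit can be automated. The following is a minimal sketch that only handles the two shapes shown in this issue; the function name `rewrite_table_sources` is an assumption, and the regular expression is deliberately limited to single-line entries ending in ~SQL2008.

```python
import re

# Matches a list entry such as "  - [name_of-the_table~SQL2008]" on its own line.
_SQL_SOURCE = re.compile(
    r"^([ \t]*)- \[([^\]~]+)~SQL2008\][ \t]*$", re.MULTILINE
)


def rewrite_table_sources(yarrrml_text):
    """Rewrite `- [table~SQL2008]` entries as `- table: table`."""
    return _SQL_SOURCE.sub(r"\1- table: \2", yarrrml_text)


generated = "sources: \n  - [name_of-the_table~SQL2008]\n"
print(rewrite_table_sources(generated))
```

Entries for other formats, such as a CSV source, do not match the pattern and are left as they are.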

Error in PyPI package

Hi.

Running Mapeathor installed from PyPI on your test file returns the following error :

...mapeathor/__init__.py", line 198, in reFormatFunction
if data_function[0]['Feature'] == 'nan' and data_function[0]['Value'] == 'nan':
IndexError: list index out of range

Everything works fine using Docker

Generation of uris as objects in mappings is incorrect

An example of the input specification is this predicate object map:
sosa:observedProperty
recurso-trafico:propiedadmediciontrafico/intensidad

The generated output is: - [sosa:observedProperty, "recurso-trafico:propiedadmediciontrafico/carga"~iri, xsd:nan]

should be - [sosa:observedProperty, recurso-trafico:propiedadmediciontrafico/carga~iri]

--

Documentation issue

This link is dead

Here you can see the [Available Languages](https://github.com/oeg-upm/mapeathor/blob/master/templates).

RDF based Mapping Generation

Instead of using templates, you could generate an RDF representation of the spreadsheets and then transform that RDF data into YARRRML, RML or R2RML.
This can be done manually, by reading the data and generating the graph with rdflib, or by creating a mapping for the "csv" files (one per sheet of the spreadsheet) and generating the RDF with tools like RDFizer or RMLMapper.
I can help too.

Get predicate from source sheet

Hi.

For my use case, I need to read the Predicate linking two entities (two "subjects") from a source sheet.
It was a bit tricky to do that directly with YARRRML but I managed to make it work.
I wonder if it is something Mapeathor could do ?

Constant, full URI in Object column is generated wrong in RML

I have declared a prefix in my spreadsheet, p: http://demo.org/data/
In the Predicate_Object sheet, in the Object column, it works fine if I provide a constant value using this prefix as p:item.
The RML is generated correctly and looks like:

    rr:predicateObjectMap [
                rr:predicateMap [ rr:constant some:property];
                rr:objectMap    [ rr:constant p:item; rr:termType rr:IRI; rr:datatype xsd:anyURI ]
    ];

However, if I specify the full URI in the same cell in my spreadsheet, i.e. http://demo.org/data/item instead of p:item, the RML output is incorrect:

    rr:predicateObjectMap [
                rr:predicateMap [ rr:constant some:property];
                rr:objectMap    [ rr:constant http://demo.org/data/item; rr:termType rr:IRI; rr:datatype xsd:anyURI ]
    ];

For the RML/TTL to be valid, the URI in question should be in quotes.
I understand that full, constant URI values should be supported in the Object column of the Predicate_Object sheet, so this looks like a bug.
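
For reference, Turtle writes an absolute IRI between angle brackets, while a quoted value is a string literal. A minimal sketch of how a cell value could be serialized under that rule follows; the helper name `as_turtle_term` is hypothetical, and the check only recognizes values of the form scheme://..., which is an assumption made for this example.

```python
import re

# Assumption: a full URI in the spreadsheet always looks like "scheme://...".
_ABSOLUTE_IRI = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def as_turtle_term(value):
    """Wrap absolute IRIs in angle brackets; leave prefixed names untouched."""
    if _ABSOLUTE_IRI.match(value):
        return f"<{value}>"
    return value


for cell in ("p:item", "http://demo.org/data/item"):
    print(f"rr:objectMap [ rr:constant {as_turtle_term(cell)}; rr:termType rr:IRI ]")
```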

Read predicate from source

Hi, to generate my RML rules, I need to read a predicate IRI from the source file.

In the Predicate column of the Predicate_Object tab, I wrote {interaction_type_uri}, where interaction_type_uri is the name of a column in my csv file containing the IRI of a relation (e.g. http://purl.obolibrary.org/obo/RO_0002470).

This creates a rml:reference to "interaction_type_uri" in the RML rules. However, in the resulting triples file, the predicate is interpreted as a string (between quotes) and not a valid IRI (between <>).

Is it possible to read a predicate IRI from the source file, and if so, how can I do that?

DataType for result of Function is ignored?

When I define the DataType to be of type 'iri', this seems to be ignored for the result of a Function.

In the Predicate_Object sheet:

Predicate Object DataType
rdf:type <Function1> iri

The generated RML mapping looks something like:

<#Function1>
    a rr:TriplesMap;
    a fnml:FunctionTermMap;

    fnml:functionValue [ ....

The result when using RMLMapper is a triple that looks like:
<subject> a "http://example.com/uri"

I would expect:
<subject> a ex:uri

When I manually add rr:termType rr:IRI to the RML mapping I do get the desired triple

<#Function1>
    a rr:TriplesMap;
    a fnml:FunctionTermMap;
    rr:termType rr:IRI;

    fnml:functionValue [ ....

Error in template file and setLanguage function

When I try to run mapeathor via command line with the example template, I get the following error:
ERROR: The spreadsheet template is not correct. Check the sheet and column names are correct.

After cloning the source and trying to fix the issue, I came across the following errors within the code:

  • In the function reFormatPredicateObject(data) of the mapping_generator.py file, the statement element['language'] = element['language'].lower() makes the program halt, since the key language does not exist in the data.
  • In global_config.py, the function setMappingLanguage(language) is never called, so the variable templatesDir is not assigned. This causes the execution to halt later, when the mappings are written and the directory value does not exist.

Function as parameter generates error in RML mapping

When I use the result of a function as the input parameter in another function, an incorrect RML mapping seems to be generated.

In the Function sheet:

FunctionID Feature Value
<Function2> grel:valueParameter <Function1>

This generates an output.rml.ttl file in which the TriplesMap ends with:

rr:predicateObjectMap [
            rr:predicate zin:p_string_b ;
            rr:objectMap [  <#Function1> ]
        ];

The brackets that surround #Function1 lead to an RDFParseException in RMLMapper: Expected an RDF value here, found ']'.

When I manually delete the brackets the correct output is generated by RMLMapper.
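
The manual fix can be scripted as a post-processing step. This is a minimal sketch under the assumption that the broken pattern always has the shape shown above, a bare <#...> reference alone inside square brackets after rr:objectMap; the function name `unwrap_function_references` is made up for the example.

```python
import re

# A bracketed object map whose only content is a <#...> reference.
_WRAPPED_REF = re.compile(r"rr:objectMap\s*\[\s*(<#[^>\s]+>)\s*\]")


def unwrap_function_references(mapping_text):
    """Turn `rr:objectMap [ <#F> ]` into `rr:objectMap <#F>`."""
    return _WRAPPED_REF.sub(r"rr:objectMap \1", mapping_text)


broken = "            rr:objectMap [  <#Function1> ]"
print(unwrap_function_references(broken))
```

Object maps with real content inside the brackets, such as a constant or a reference, do not match and stay unchanged.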

Generate owl:equivalentClass triples

Hi,
As part of my knowledge graph creation process, I have some data in a csv file I need to transform into owl:equivalentClass triples.
My data basically consists of two columns, one with the subject IRI, the second with the object IRI, and I want to generate triples of the form (subject_iri, owl:equivalentClass, object_iri).
It does not seem possible, as mapeathor treats each subject as an individual. Here, my subjects are classes.
Am I right ?

<prefix:property> in output

Using Mapeathor 1.5.2 and having in the template

ID            Predicate                  Object                                  DataType
IdEmployee    vph:heeftWerkOvereenkomst  http://data#overeenkomst_{objectId}     anyURI

results in the following mapping

<#IdEmployee>
    a rr:TriplesMap;
    rml:logicalSource [
    	rml:source "employees.csv";
    	rml:referenceFormulation ql:CSV;
    ];
    rr:subjectMap [
    	a rr:Subject;
    	rr:termType rr:IRI;
    	rr:template "http://data#employee_{identificationNo}";
    	rr:class vph:Human;
    ];
    rr:predicateObjectMap [
    	rr:predicateMap	[ rr:constant "vph:heeftWerkOvereenkomst"];
    	rr:objectMap	[ rr:template "http://data#overeenkomst_{objectId}"; rr:termType rr:IRI; rr:datatype xsd:anyURI ]
    ];
.

and the following triples using rmlmapper or rdfizer

<http://data#employee_10> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.purl.org/vph#Human>.
<http://data#employee_10> <vph:heeftWerkOvereenkomst> <http://data#overeenkomst_1>.
<http://data#employee_20> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.purl.org/vph#Human>.
<http://data#employee_20> <vph:heeftWerkOvereenkomst> <http://data#overeenkomst_2>.

in which <vph:heeftWerkOvereenkomst> is not correct.
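
As a stopgap, the quotes around the prefixed name can be removed after generation. The sketch below assumes the set of declared prefixes is known (here only vph) and that no genuine string constant has the form prefix:name with a declared prefix; the function name `unquote_prefixed_constants` is hypothetical.

```python
import re

# A quoted rr:constant whose value looks like prefix:localName.
_QUOTED_CONSTANT = re.compile(r'rr:constant\s+"([A-Za-z][\w.-]*):([^"\s]*)"')


def unquote_prefixed_constants(mapping_text, prefixes):
    """Remove the quotes around rr:constant values that use a declared prefix."""

    def replace(match):
        prefix, local = match.group(1), match.group(2)
        # Skip full URIs such as "http://..." and prefixes that were not declared.
        if prefix in prefixes and not local.startswith("//"):
            return f"rr:constant {prefix}:{local}"
        return match.group(0)

    return _QUOTED_CONSTANT.sub(replace, mapping_text)


line = 'rr:predicateMap [ rr:constant "vph:heeftWerkOvereenkomst"];'
print(unquote_prefixed_constants(line, {"vph"}))
```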

ERROR: The spreadsheet template is not correct.

I just installed mapeathor==1.6.1 and have tried the tutorial-kgc22 example, the template and some input files from the use-cases and always get an error

python -m mapeathor -l rml -i stop-stoptimes.xlsx
Generating mapping file
ERROR: The spreadsheet template is not correct. Check the sheet and column names are correct.

Import error

Since Ana changed the templating to jinja, Mapeathor works fine under Windows with one minor change.
The imports should be changed from

import global_config

into

from . import global_config

xml rml:iterator

Using an iterator to read an XML file produces incorrect RML:

<#IdElement>
    a rr:TriplesMap;
    rml:logicalSource [
    	rml:source "architest.xml";
    	rml:referenceFormulation ql:XPath;
        rml:iterator ;
    "/model/elements/element"];
    rr:subjectMap [
    	a rr:Subject;

Removing the ; and the line break after rml:iterator gives valid RML syntax.
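
That manual correction can be expressed as a small post-processing step. This is a minimal sketch, assuming the broken output always has the shape shown above (rml:iterator, a stray semicolon, a line break, then the quoted iterator); the function name `join_iterator` is made up for the example.

```python
import re

# "rml:iterator ;" followed by a line break and the quoted iterator expression.
_BROKEN_ITERATOR = re.compile(r'rml:iterator\s*;\s*\n\s*("[^"\n]*")')


def join_iterator(mapping_text):
    """Put the iterator string back on the same line as rml:iterator."""
    return _BROKEN_ITERATOR.sub(r"rml:iterator \1", mapping_text)


broken = '        rml:iterator ;\n    "/model/elements/element"];\n'
print(join_iterator(broken))
```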

input file:
archimatetest.xlsx

Invalid RML file

Hi. Generating RML rules from my spreadsheet results in an invalid RML file (according to RMLMapper).
I managed to identify one of the problems, not the other.

  1. CSV appears to be an invalid data format. Replacing 'CSV' by 'csv' as specified in YARRRML specification works. From the attached spreadsheet, I am able to generate YARRRML rules, then to convert these rules to RML rules using yarrrml-parser, then to generate RDF triples using RMLMapper.
  2. However, if I try to generate RML rules directly from the spreadsheet, then RMLMapper returns the following error during RDF generation:
10:40:42.691 [main] ERROR be.ugent.rml.cli.Main               .main(179) - Unable to parse mapping rules as Turtle. Does the file exist and is it valid Turtle?
org.eclipse.rdf4j.rio.RDFParseException: Not a valid (absolute) IRI: #CONSUMER [line 11]

mapping.xlsx
s.tsv.txt
