
ELEXIS Dictionary Service

This tool provides a simple way to host dictionaries so that they can be contributed to the ELEXIS infrastructure. It is the reference implementation of the REST API defined at:

https://elexis-eu.github.io/elexis-rest

Installation

From Source

The tool can be built with Rust's Cargo using the following command:

cargo build --release

This will create a single binary at target/release/elexis-dictionary-service.

Using Docker

The dictionary service is available from Docker Hub and can be run with:

docker run -it --rm -p 8000:8000 jmccrae/elexis-dictionary-service

Usage

The ELEXIS dictionary service provides several commands, described below.

Loading data

Data can be loaded with the load command:

USAGE:
    elexis-dictionary-service load [FLAGS] [OPTIONS] <data>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -c, --config <config>                                    Configuration to help with mapping
        --db-path <db_path>                                  The path to use for the database (Default: eds.db)
    -f, --format <json|ttl|tei>                              The format of the input
        --genre <gen|lrn|ety|spe|his|ort|trm>                The genre(s) of the dataset (comma separated)
        --id <id>                                            The identifier of the dataset
        --release <PUBLIC|NONCOMMERCIAL|RESEARCH|PRIVATE>    The release level of the resource

ARGS:
    <data>    The data to host

For example, to load a file it is normally sufficient to give one of the following commands:

# A Json file
elexis-dictionary-service load example/example.json
# A TEI-Lex0 file
elexis-dictionary-service load example/example-tei.xml --id tei_dict --release PUBLIC
# An OntoLex file
elexis-dictionary-service load example/example.rdf --release PUBLIC
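When loading many files from a script, it can help to assemble the command line programmatically. The following Python sketch builds an argv list using only the flags shown in the load help output above, and rejects values outside the documented choices before the tool is invoked. build_load_command is a hypothetical helper, not part of the tool.

```python
"""Sketch: assemble an `elexis-dictionary-service load` command line.

Only flags from the `load` help output are used. The helper name
`build_load_command` is hypothetical and not part of the tool.
"""
from typing import Optional, Sequence

FORMATS = {"json", "ttl", "tei"}
GENRES = {"gen", "lrn", "ety", "spe", "his", "ort", "trm"}
RELEASES = {"PUBLIC", "NONCOMMERCIAL", "RESEARCH", "PRIVATE"}


def build_load_command(
    data: str,
    dict_id: Optional[str] = None,
    release: Optional[str] = None,
    fmt: Optional[str] = None,
    genres: Sequence[str] = (),
    config: Optional[str] = None,
    db_path: Optional[str] = None,
    binary: str = "elexis-dictionary-service",
) -> list[str]:
    """Return the argv list for `load`, checking enumerated option values."""
    argv = [binary, "load"]
    if dict_id is not None:
        argv += ["--id", dict_id]
    if release is not None:
        if release not in RELEASES:
            raise ValueError(f"unknown release level: {release}")
        argv += ["--release", release]
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unknown format: {fmt}")
        argv += ["--format", fmt]
    if genres:
        unknown = set(genres) - GENRES
        if unknown:
            raise ValueError(f"unknown genre(s): {sorted(unknown)}")
        # Several genres are passed as one comma-separated value.
        argv += ["--genre", ",".join(genres)]
    if config is not None:
        argv += ["--config", config]
    if db_path is not None:
        argv += ["--db-path", db_path]
    argv.append(data)  # positional <data> argument
    return argv


if __name__ == "__main__":
    cmd = build_load_command("example/example-tei.xml", dict_id="tei_dict", release="PUBLIC")
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to perform the actual load.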

Starting the server

The REST server may be started with the start command:

Start the server

USAGE:
    elexis-dictionary-service start [FLAGS] [OPTIONS]

FLAGS:
    -h, --help       Prints help information
        --no-sql     Do not use SQLite (all data is temporary and session only)
    -V, --version    Prints version information

OPTIONS:
    -c, --config <config>                                    Configuration to help with mapping
    -d, --data <data>                                        Also load a single data file
        --db-path <db_path>                                  The path to use for the database (Default: eds.db)
    -f, --format <json|ttl|tei>                              The format of the input
        --genre <gen|lrn|ety|spe|his|ort|trm>                The genre(s) of the dataset (comma separated)
        --id <id>                                            The identifier of the dataset
    -p, --port <port>                                        The port to start the server on
        --release <PUBLIC|NONCOMMERCIAL|RESEARCH|PRIVATE>    The release level of the resource

For example, to start a server:

elexis-dictionary-service start

The server will then be available at http://localhost:8000/ (a different port can be chosen with -p/--port).
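As a minimal sketch, a running server can be queried from Python using only the standard library. The endpoint paths mentioned here (/dictionaries and /about/{dictionary}) are assumptions based on the ELEXIS REST specification linked above and should be checked against it; the helper names endpoint and get_json are hypothetical.

```python
"""Sketch: query a running dictionary service over its REST API.

Assumption: endpoint paths follow the ELEXIS REST specification
(e.g. /dictionaries, /about/{dictionary}); verify them against the spec.
"""
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000/"


def endpoint(*segments: str, base: str = BASE_URL) -> str:
    """Join URL-quoted path segments onto the server's base URL."""
    path = "/".join(urllib.parse.quote(s, safe="") for s in segments)
    return base.rstrip("/") + "/" + path


def get_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the response body as JSON."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With a server running, get_json(endpoint("dictionaries")) would request the list of hosted dictionaries, and get_json(endpoint("about", "dict_id")) the metadata of a single one. Quoting each segment keeps identifiers or headwords containing spaces or slashes from breaking the path.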

To start a temporary server for a single file (without SQLite), use the following command:

elexis-dictionary-service start -d example/example.json --no-sql

Deleting a dictionary

A dictionary may be removed from the server with the delete command:

USAGE:
    elexis-dictionary-service delete [OPTIONS] [data]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --db-path <db_path>    The path to use for the database (Default: eds.db)

ARGS:
    <data>    Data file to delete

For example:

elexis-dictionary-service delete dict_id

Formats

JSON

The JSON format consists of an object of the following form:

{
    "dict_id": {
        "meta": {    },
        "entries": [    ]
    }
}

Here dict_id is the identifier of the dictionary, and the meta value is exactly what the about REST call would return. The entries value is an array in which each element is what the entry-as-JSON REST call would return.
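A minimal Python sketch for producing a file in this shape and checking its top-level structure before loading it. Only the dict_id/meta/entries layout comes from the description above; the helper names are hypothetical, and the contents of meta and of each entry are left to the caller.

```python
"""Sketch: build and sanity-check a dictionary file in the JSON load format.

The layout {dict_id: {"meta": {...}, "entries": [...]}} follows the
description above; function names here are hypothetical.
"""
import json
from typing import Any


def make_dictionary_document(dict_id: str, meta: dict, entries: list) -> dict:
    """Wrap metadata and entries under the dictionary identifier."""
    return {dict_id: {"meta": meta, "entries": entries}}


def check_dictionary_document(doc: Any) -> list[str]:
    """Return structural problems found (an empty list if none)."""
    if not isinstance(doc, dict) or not doc:
        return ["top level must be a non-empty object keyed by dictionary id"]
    problems = []
    for dict_id, body in doc.items():
        if not isinstance(body, dict):
            problems.append(f"{dict_id}: value must be an object")
            continue
        if not isinstance(body.get("meta"), dict):
            problems.append(f"{dict_id}: 'meta' must be an object")
        if not isinstance(body.get("entries"), list):
            problems.append(f"{dict_id}: 'entries' must be an array")
    return problems


if __name__ == "__main__":
    doc = make_dictionary_document("dict_id", meta={}, entries=[])
    print(json.dumps(doc, ensure_ascii=False, indent=4))
```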

TEI-Lex0

The TEI-Lex0 document should be a valid XML document containing at least the following elements:

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>Name of the dictionary</title>
            </titleStmt>
            <publicationStmt>
                <publisher>Name of the publisher</publisher>
                <availability>
                    <licence target="http://url.of.licence">...</licence>
                </availability>
            </publicationStmt>
            <sourceDesc>
                <author>Name of the author</author>
            </sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <entry xml:lang="en" xml:id="test">
                <form type="lemma">
                    <orth>girl</orth>
                </form>
                <form type="variant">
                    <orth>girls</orth>
                </form>
                <gramGrp>
                    <gram type="pos" norm="NOUN">noun</gram>
                </gramGrp>
                <sense>
                    <def>young female</def>
                </sense>
            </entry>
        </body>
    </text>
</TEI>

The following constraints apply:

  1. A licence element must be given, with a target attribute
  2. An entry must have a form[@type=lemma]
  3. An entry must have a gram[@type=pos], whose norm attribute should give a Universal Dependencies (UD) category unless a mapping is used (see Configuration below)
  4. An entry must have an xml:lang and an xml:id attribute
  5. An entry must not occur within another entry
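The constraints above can be checked before loading with a short standard-library script. This is a hypothetical pre-flight check written for illustration, not the service's own validator; it reports a norm outside the 17 UD tags as something a posMapping (see Configuration) would need to resolve.

```python
"""Sketch: check a TEI-Lex0 document against the listed constraints.

Hypothetical pre-flight check, not the service's own validator.
"""
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML = "{http://www.w3.org/XML/1998/namespace}"  # namespace of xml:lang / xml:id
UD_TAGS = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
           "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"}


def check_tei(xml_text: str) -> list[str]:
    """Return constraint violations as readable strings (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    # 1. A licence must be given with a target.
    if not any(lic.get("target") for lic in root.iter(TEI + "licence")):
        problems.append("no <licence> with a target attribute")
    for entry in root.iter(TEI + "entry"):
        name = entry.get(XML + "id", "<entry without xml:id>")
        # 2. An entry must have a lemma form.
        if not any(f.get("type") == "lemma" for f in entry.iter(TEI + "form")):
            problems.append(f"{name}: no form[@type=lemma]")
        # 3. An entry must have a POS gram; norm should be a UD tag
        #    unless a posMapping in the configuration covers the value.
        pos = [g for g in entry.iter(TEI + "gram") if g.get("type") == "pos"]
        if not pos:
            problems.append(f"{name}: no gram[@type=pos]")
        elif not any(g.get("norm") in UD_TAGS for g in pos):
            problems.append(f"{name}: pos has no UD norm (needs a posMapping)")
        # 4. An entry must have xml:lang and xml:id.
        for attr in ("lang", "id"):
            if entry.get(XML + attr) is None:
                problems.append(f"{name}: missing xml:{attr}")
        # 5. Entries must not be nested (iter() also yields the entry itself).
        if any(e is not entry for e in entry.iter(TEI + "entry")):
            problems.append(f"{name}: contains a nested entry")
    return problems
```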

OntoLex

An OntoLex document should be a valid Turtle document, such as the following:

@prefix lime: <http://www.w3.org/ns/lemon/lime#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .

<#dictionary> a lime:Lexicon ;
    lime:language "en" ;
    dct:license <http://www.example.com/license> ;
    dct:description "A test resource" ;
    dct:creator [
        foaf:name "Joe Bloggs" ;
        foaf:mbox <mailto:[email protected]> ;
        foaf:homepage <http://www.example.com/>
    ] ;
    dct:publisher [
        foaf:name "Publisher"
    ] ;
    lime:entry <#entry1>, <#test> .

<#entry1> a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:commonNoun ;
    ontolex:canonicalForm [
        ontolex:writtenRep "cat"@en 
    ] ;
    ontolex:sense [
        skos:definition "This is a definition"@en
    ] .

<#test>  a ontolex:LexicalEntry ;
    ontolex:canonicalForm [
        ontolex:writtenRep "dog"@en 
    ] ;
    ontolex:sense [
        ontolex:reference <http://www.example.com/ontology>  
    ] .

For the file to be processed correctly, certain information should be grouped together. In particular, all information about the lexicon should directly follow the triple

<#dictionary> a lime:Lexicon

A dictionary must have a lime:language and a dct:license.

Each entry starts with a triple of the form

<#entry1> a ontolex:LexicalEntry

All triples following it, up to the next such triple in the file, are treated as the description of that entry.

All entries must have an ontolex:canonicalForm with an ontolex:writtenRep.

All entries must be identified by URIs and referenced from a lexicon by a lime:entry triple.

Configuration

Configuration may be performed using a configuration file, passed with the -c/--config option. This is particularly useful for providing part-of-speech mappings. An example configuration is shown below:

{
    "posProperty": "http://www.lexinfo.net/ontology/2.0/lexinfo#partOfSpeech",
    "posMapping": {
        "substantive": "NOUN",
        "http://www.lexinfo.net/ontology/2.0/lexinfo#pronoun": "PRON"
    },
    "defaultId": "dict_id",
    "defaultRelease": "PUBLIC"
}

The configuration supports the following keys:

  • posProperty: The URI of the RDF property used to indicate part of speech
  • posMapping: A mapping from part-of-speech values, either RDF URIs or the text content of TEI tags, to a Universal Dependencies (UD) tag (ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X)
  • defaultId: The default identifier for a dictionary (used in place of the --id flag)
  • defaultRelease: The default release level of the dictionary (PUBLIC, NONCOMMERCIAL, RESEARCH, PRIVATE)
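A sketch of how such a configuration might be read and applied, in Python. The precedence rules here are assumptions not stated in this documentation: an explicit --id is preferred over defaultId, and a POS value that is already a UD tag is passed through unchanged when it has no mapping. The function names are hypothetical.

```python
"""Sketch: read a configuration like the example above and apply it.

Assumptions (not confirmed by the documentation): an explicit --id flag
takes precedence over defaultId, and an unmapped POS value that is
already a UD tag is kept as-is. Function names are hypothetical.
"""
import json
from typing import Optional

UD_TAGS = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
           "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"}
RELEASES = {"PUBLIC", "NONCOMMERCIAL", "RESEARCH", "PRIVATE"}


def load_config(text: str) -> dict:
    """Parse the JSON configuration and check mapped tags and release level."""
    config = json.loads(text)
    for source, target in config.get("posMapping", {}).items():
        if target not in UD_TAGS:
            raise ValueError(f"posMapping[{source!r}] = {target!r} is not a UD tag")
    release = config.get("defaultRelease")
    if release is not None and release not in RELEASES:
        raise ValueError(f"invalid defaultRelease: {release!r}")
    return config


def map_pos(value: str, config: dict) -> Optional[str]:
    """Map a TEI gram text or RDF URI to a UD tag; None if unmappable."""
    mapped = config.get("posMapping", {}).get(value)
    if mapped is not None:
        return mapped
    # Assumption: values that are already UD tags pass through unchanged.
    return value if value in UD_TAGS else None


def resolve_id(cli_id: Optional[str], config: dict) -> Optional[str]:
    """Prefer the --id flag (assumed), falling back to defaultId."""
    return cli_id if cli_id is not None else config.get("defaultId")
```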
