Word-Sense Disambiguator for Finnish

A word-sense disambiguator for Finnish using the FinnWordNet synsets.

The app reads plain Finnish text, tokenizes it, and assigns each word token a synset (concept) from FinnWordNet. For more information on the analysis pipeline, see the Wiki.

The app is available as a Docker image, so it can be run without installing the dependencies; see below.

Docker Image

The ready-made Docker image is published on Docker Hub, so Docker pulls it automatically the first time you run it.

To disambiguate Finnish text, run:

$ echo 'Tämä on hyvä esimerkkilause.' | docker run -i teemuruokolainen/sense-disambiguator:latest

Each output row contains the word token, its lemma, part of speech, the assigned synset, the Brown corpus frequency of the synset (see the Wiki), and the synset definition, separated by tabs (\t). Tokens without an assigned synset show - in the last three columns:

$ echo 'Tämä on hyvä esimerkki.' | docker run -i teemuruokolainen/sense-disambiguator:latest
token lemma pos synset Brown_frequency definition
Tämä tämä PRONOUN - - -
on olla VERB Synset('be.v.01') 10742 have the quality of being; (copula, used with an adjective or a predicate noun)
hyvä hyvä ADJECTIVE Synset('good.a.01') 190 having desirable or positive qualities especially those suitable for a thing specified
esimerkki esimerkki NOUN Synset('example.n.01') 50 an item of information that is typical of a class or group
. . PUNCTUATION - - -
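
Because the output is tab-separated, it can be post-processed with standard tools. The following is a minimal sketch, not part of the app: assigned_synsets is a hypothetical helper name, and it assumes the six tab-separated columns described above, a header row whose fourth column is the literal word synset, and - in the synset column for tokens without a synset.

```shell
# Print lemma, synset, and definition for every token that received a synset.
# Assumption: six tab-separated columns, "-" in column 4 for unassigned tokens,
# and a header row whose column 4 is the literal word "synset".
assigned_synsets() {
  awk -F '\t' 'NF >= 6 && $4 != "-" && $4 != "synset" { print $2 "\t" $4 "\t" $6 }'
}
```

One possible use is to append the helper to the pipeline:

$ echo 'Tämä on hyvä esimerkki.' | docker run -i teemuruokolainen/sense-disambiguator:latest | assigned_synsets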

To disambiguate a collection of texts (e.g. sentences or documents), run:

$ cat text-file.txt | docker run -i teemuruokolainen/sense-disambiguator:latest

Here text-file.txt should contain one text per row. In the output, the texts are separated by an empty row.
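
The empty rows make it possible to tell the texts apart when post-processing. As an illustration only, the sketch below counts the token rows of each text. tokens_per_text is a hypothetical helper name, and the sketch assumes that any header row starts with the column names token and lemma (whether the header is printed once or once per text is not stated here, so every such row is skipped).

```shell
# Print "<text number><TAB><token count>" for each text in the output.
# Assumption: texts are separated by empty rows, and header rows start
# with the column names "token" and "lemma".
tokens_per_text() {
  awk -F '\t' '
    $1 == "token" && $2 == "lemma" { next }   # skip header rows
    /^[ \t]*$/ {                              # an empty row closes the current text
      if (n > 0) { t++; print t "\t" n; n = 0 }
      next
    }
    { n++ }                                   # any other row is one token
    END { if (n > 0) { t++; print t "\t" n } }
  '
}
```

For example:

$ cat text-file.txt | docker run -i teemuruokolainen/sense-disambiguator:latest | tokens_per_text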
