

Resources

A curated list of resources for the processing of Slovak language.

Pages

Tools

  • Spelling Dictionary
  • List of common names, abbreviations, pejoratives and neologisms.
  • tokenization, segmentation, UPOS, XPOS (SNK), lemmatization, UD
  • models trained on UD
  • implementation in Python/PyTorch, command-line interface, web service interface
  • license: Apache v2.0
  • tokenization, segmentation, UPOS, XPOS (SNK), lemmatization, UD
  • models trained on UD
  • implementation in Python/dyNET, command-line interface, web service interface
  • license: Apache v2.0
  • tokenization, stemming
  • tokenization, segmentation
  • implementation in C++
  • license: GPL v3.0
  • UPOS, UD
  • models trained on UD
  • implementation in Python/PyTorch, command-line interface
  • license: MIT
  • tokenization, segmentation, UPOS, XPOS (SNK), lemmatization, UD
  • models trained on UD
  • implementation in C++, bindings in Java, Python, Perl, C#, command-line interface, web service interface
  • license: MPL v2.0
  • tokenization, stemming, lemmatization, diacritic restoration, POS (SNK), NER
  • web service interface only
  • license: ?
  • tokenization, segmentation, lemmatization, POS (OpenNLP, SNK), UD (CoreNLP), NER
  • implementation in Java/DL4J
  • license: GNU AGPLv3
  • Web-based visualisation of Slovak word vectors
  • Lemmatization for 25 languages
  • implementation in Python
  • Slovak model trained on the UDP corpus
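Several of the tools above provide tokenization and sentence segmentation. As a rough illustration of what these steps produce (not how any of the listed tools implements them), a naive rule-based splitter in Python might look like this; the regular expressions and the names `segment` and `tokenize` are assumptions for illustration only:

```python
import re

# Hypothetical rules, NOT taken from any tool listed above.
# A sentence ends at ., ! or ? when followed by whitespace and an uppercase
# letter (including Slovak capitals with diacritics).
SENT_END = re.compile(r"(?<=[.!?])\s+(?=[A-ZÁÄČĎÉÍĹĽŇÓÔŔŠŤÚÝŽ])")

# A token is a run of word characters (optionally joined by - or '),
# or a single punctuation character. In Python 3, \w matches Unicode letters
# such as á, ň, ž.
TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def segment(text: str) -> list[str]:
    """Split raw text into sentences using the naive SENT_END rule."""
    return [s for s in SENT_END.split(text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Split one sentence into word tokens and punctuation marks."""
    return TOKEN.findall(sentence)


if __name__ == "__main__":
    for sent in segment("Dobrý deň. Ako sa máš? Mám sa dobre!"):
        print(tokenize(sent))
```

Rules like these break on abbreviations such as "napr." followed by a capitalized word, which is one reason the trained models listed above are preferable in practice.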

Corpora, datasets, vocabularies

Web

  • automatic POS (SNK)
  • source: web
  • deduplicated
  • source: Common Crawl
  • automatic POS (AUT, TreeTagger)
  • source: web
  • no annotation
  • twitter part
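One of the web corpora above is listed as deduplicated, but the method is not stated here. A minimal sketch of exact-duplicate removal, assuming line-level deduplication after Unicode and whitespace normalization (the helper names `normalize` and `deduplicate` are hypothetical), could be:

```python
import hashlib
import unicodedata


def normalize(line: str) -> str:
    """Assumed normalization: NFC, lowercase, collapsed whitespace."""
    line = unicodedata.normalize("NFC", line)
    return " ".join(line.lower().split())


def deduplicate(lines):
    """Keep the first occurrence of each normalized line; drop blank lines."""
    seen = set()
    kept = []
    for line in lines:
        key_text = normalize(line)
        if not key_text:
            continue  # skip empty or whitespace-only lines
        # Store a fixed-size digest instead of the full line to save memory.
        key = hashlib.sha1(key_text.encode("utf-8")).digest()
        if key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return kept


if __name__ == "__main__":
    print(deduplicate(["Ahoj svet", "ahoj  svet", "Iný riadok"]))
```

Large web corpora often also remove near-duplicates (for example with MinHash), which this sketch does not attempt.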

Morpho-syntactic

  • tokenization, segmentation, UPOS, XPOS (SNK), UD, lemma
  • manual annotation
  • format: conllu
  • source: SNK
  • tokenization, segmentation, UPOS, XPOS (SNK), UD, lemma
  • format: conllu
  • source: Slovak UD, SNK
  • form, lemma, POS (SNK)
  • source: SNK
  • form, lemma, POS (Multext East)
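Two of the morpho-syntactic datasets above use the CoNLL-U format: one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with `#`, and a blank line after each sentence. A minimal standard-library reader, sketched under these assumptions (`read_conllu` is a hypothetical helper, not part of any listed tool), could be:

```python
from typing import Iterable, Iterator

FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[list[dict]]:
    """Yield one sentence at a time as a list of token dicts keyed by FIELDS."""
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # comments such as "# sent_id = ..." or "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Skip multiword-token ranges ("1-2") and empty nodes ("8.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:  # file may not end with a blank line
        yield sentence
```

For example, iterating over `read_conllu(open("train.conllu", encoding="utf-8"))` yields one list of token dictionaries per sentence (the file name is a placeholder). Skipping multiword-token ranges and empty nodes is adequate when only form, lemma and POS tags are needed.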

Parallel

  • source: Europarl
  • speech, vectors, language
  • automatic POS (SNK)
  • source: Acquis, Europarl, EU-journal, EC-Europa, OPUS
  • automatic POS (SNK)
  • source: Acquis, Europarl, EU-journal, EC-Europa, OPUS
  • sentence aligned, POS
  • Bulgarian, Czech, English, Estonian, Hungarian, Macedonian, Persian, Polish, Romanian, Serbian, Slovak, Slovenian
  • source: "1984" novel
  • Parallel web Corpus with Slovak Part
  • 3.3 million English-Slovak sentences
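Sentence-aligned parallel corpora are commonly distributed as two plain-text files, one per language, where line i of one file translates line i of the other (for example the Moses format offered by OPUS). Assuming that layout, here is a sketch of loading sentence pairs with a simple length-ratio filter. `read_parallel` and `max_ratio` are hypothetical names, and the filter is a common cleaning heuristic rather than something these corpora prescribe:

```python
from itertools import zip_longest


def read_parallel(src_lines, tgt_lines, max_ratio=3.0):
    """Pair line-aligned source/target sentences.

    Empty pairs are skipped, and pairs whose character lengths differ by more
    than max_ratio (an assumed heuristic) are dropped as likely misalignments.
    """
    pairs = []
    for i, (src, tgt) in enumerate(zip_longest(src_lines, tgt_lines), start=1):
        if src is None or tgt is None:
            raise ValueError(f"files differ in length at line {i}")
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio <= max_ratio:
            pairs.append((src, tgt))
    return pairs
```

With real files, pass two file objects opened with `encoding="utf-8"`; both are read line by line, so memory grows only with the number of kept pairs.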

Sentiment

  • source: Twitter

NER

Wordnet

Models

Word embeddings

  • source: Wikipedia, Common Crawl
  • source: Common Crawl
  • source: Wikipedia
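The word embeddings above are typically available in the plain-text `.vec` format used by word2vec and fastText: a header line with the vocabulary size and vector dimension, then one word per line followed by its components. Assuming that format (the helpers `load_vec`, `cosine` and `most_similar` are hypothetical), a pure-Python nearest-neighbour lookup by cosine similarity might look like:

```python
import math


def load_vec(lines, limit=None):
    """Parse word2vec/fastText text format into {word: [floats]}."""
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])  # header is "<vocab_size> <dim>"
    vectors = {}
    for line in it:
        if limit is not None and len(vectors) >= limit:
            break
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed lines
        vectors[word] = [float(v) for v in values]
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def most_similar(vectors, word, topn=5):
    """Rank all other words by cosine similarity to `word`."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]
```

For full-size vector files, `limit` keeps memory manageable; libraries such as gensim offer optimized loaders for the same format.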

Transformers

  • Slovak RoBERTa base language model
  • trained on a web corpus
  • Transformer models for machine translation
  • Slovak, English, Finnish, Swedish, Spanish, French
  • VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
  • Facebook's Wav2Vec2 base model pretrained on the 10K unlabeled subset of the VoxPopuli corpus and fine-tuned on the transcribed Slovak (sk) data
  • multilingual BERT, trained on Wikipedia
  • Language-agnostic BERT Sentence Encoder (LaBSE) is a BERT-based model trained to produce sentence embeddings for 109 languages.
  • Flores101: Large-Scale Multilingual Machine Translation
  • Baseline pretrained models for small and large tracks of WMT 21 Large-Scale Multilingual Machine Translation competition.
  • Includes Slovak language
  • For fairseq

Contributors

peterbednar, hladek
