

Kurdish-G2P-dataset

Datasets for evaluating Central Kurdish grapheme-to-phoneme (G2P) conversion systems.

Format

Each line contains a Central Kurdish word in the standard Arabic-based script and its corresponding phoneme string, separated by a tab character. The start of each syllable is marked by a full stop. For example: ئازادی .ʔa.za.dî
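The format above can be read with a few lines of code. The following is a minimal Python sketch, assuming UTF-8 files with one tab-separated entry per line and a full stop before every syllable. The helper names `parse_line`, `to_phoneme_string` and `load_dataset` are hypothetical and not part of this repository.

```python
from typing import List, Tuple


def parse_line(line: str) -> Tuple[str, List[str]]:
    """Split one dataset line into (word, list of syllables).

    Assumption: a single tab separates the Arabic-script word from the
    phoneme string, and each syllable is preceded by a full stop.
    """
    word, phonemes = line.rstrip("\r\n").split("\t", 1)
    # The phoneme string starts with '.', so splitting yields an empty
    # first element; drop empty pieces to keep only real syllables.
    syllables = [s for s in phonemes.split(".") if s]
    return word, syllables


def to_phoneme_string(syllables: List[str]) -> str:
    """Rebuild the dataset notation: a full stop before every syllable."""
    return "".join("." + s for s in syllables)


def load_dataset(path: str) -> List[Tuple[str, List[str]]]:
    """Read every non-empty line of a (hypothetical) dataset file."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                entries.append(parse_line(line))
    return entries
```

For the example entry, `parse_line("ئازادی\t.ʔa.za.dî")` returns `("ئازادی", ["ʔa", "za", "dî"])`, and joining the syllables without separators gives the unsyllabified pronunciation `ʔazadî`.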

Datasets

AsoSoft Kurdish Corpus most frequent tokens

The 5,000 most frequent words of the AsoSoft Kurdish Corpus, manually converted to phoneme strings. The corpus is presented in:

Veisi, H., MohammadAmini, M., & Hosseini, H. (2019). “Toward Kurdish language processing: Experiments in collecting and processing the AsoSoft text corpus”. Digital Scholarship in the Humanities.

@article{veisi2020toward,
  title={Toward Kurdish language processing: Experiments in collecting and processing the AsoSoft text corpus},
  author={Veisi, Hadi and MohammadAmini, Mohammad and Hosseini, Hawre},
  journal={Digital Scholarship in the Humanities},
  volume={35},
  number={1},
  pages={176--193},
  year={2020},
  publisher={Oxford University Press}
}

Wergor dataset

The 5,041 unique words of the document presented at https://github.com/sinaahmadi/wergor, manually converted to phoneme strings. The Wergor system is described in:

Ahmadi, S. (2019). “A Rule-Based Kurdish Text Transliteration System”. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(2), 18.

@article{ahmadi2019rule,
  title={A Rule-Based Kurdish Text Transliteration System},
  author={Ahmadi, Sina},
  journal={ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)},
  volume={18},
  number={2},
  pages={18},
  year={2019},
  publisher={ACM}
}
