
python-jamo's Introduction

Jamo: Hangul Character Analysis

https://readthedocs.org/projects/python-jamo/badge/?version=latest

Python-jamo is a Python Hangul syllable decomposition and synthesis library for working with Hangul characters and jamo.
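Decomposition and synthesis of modern syllables come down to arithmetic on code points. Each precomposed syllable in U+AC00..U+D7A3 encodes a leading consonant, a vowel, and an optional final consonant. The sketch below uses only the standard library and the constants from the Unicode Standard's Hangul syllable algorithm. The names decompose and compose are illustrative and are not jamo's own API.

```python
# Constants from the Unicode Standard's Hangul syllable composition algorithm
S_BASE = 0xAC00   # first precomposed syllable
L_BASE = 0x1100   # first leading consonant (choseong)
V_BASE = 0x1161   # first vowel (jungseong)
T_BASE = 0x11A7   # one before the first final (jongseong); index 0 means "no final"
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant


def decompose(syllable):
    """Split one precomposed syllable into (lead, vowel, tail) conjoining jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * N_COUNT:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    lead = chr(L_BASE + index // N_COUNT)
    vowel = chr(V_BASE + (index % N_COUNT) // T_COUNT)
    t = index % T_COUNT
    tail = chr(T_BASE + t) if t else ""  # empty string when there is no final
    return lead, vowel, tail


def compose(lead, vowel, tail=""):
    """Inverse of decompose: build a syllable from conjoining jamo."""
    l = ord(lead) - L_BASE
    v = ord(vowel) - V_BASE
    t = ord(tail) - T_BASE if tail else 0
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)


print([hex(ord(j)) for j in decompose("한")])  # ['0x1112', '0x1161', '0x11ab']
print(compose(*decompose("한")))               # 한
```

Every one of the 11,172 modern syllables round-trips through these two functions.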

The library is currently in beta, so function names are subject to change, but it covers nearly all Hangul-related code points in Unicode 7.0.

Originally designed to help students identify difficult-to-spell words containing (ㅔ,ㅐ) or (ㅗ,ㅜ), this project hopes to fill the niche of Korean phonetic and spelling analysis.

Installation

To install jamo from PyPI, run:

$ pip install jamo

The jamo module is Python 3 only. Viva the bleeding edge!

Documentation

Documentation is available at ReadTheDocs in English.

Contributing

Like this project or want to help? Take a look at the issues! I'm active on GitHub and will review pull requests. I'm open to email as well, so please contact me if you have any ideas for this project.

License

Apache 2.0 licensed.

Anyone is free to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software, under the terms of the license, without concern for royalties.

python-jamo's People

Contributors

brownbat, jdongian, mmcauliffe, ollipa, peblair

python-jamo's Issues

Add syllable -> English name

As defined in Unicode 7.0, e.g.
name('ㄱ') == "HANGUL LETTER KIYEOK"
name('ᅖ') == "HANGUL CHOSEONG PHIEUPH-PIEUP"
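Python's standard unicodedata module already knows these names, including the algorithmically generated names of precomposed syllables. A first version of this feature could therefore be a thin wrapper. The helper name hangul_name is hypothetical. Restricting it to names that begin with "HANGUL" is also an assumption; it excludes halfwidth, parenthesized, and circled forms.

```python
import unicodedata


def hangul_name(ch):
    """Return the Unicode name of a Hangul jamo or syllable (hypothetical helper)."""
    name = unicodedata.name(ch, "")  # "" instead of an exception for unnamed characters
    # Assumption: only names starting with "HANGUL" count, so halfwidth and
    # parenthesized/circled forms are rejected along with non-Hangul text.
    if not name.startswith("HANGUL"):
        raise ValueError(f"{ch!r} is not a Hangul jamo or syllable")
    return name


print(hangul_name("ㄱ"))  # HANGUL LETTER KIYEOK
print(hangul_name("한"))  # HANGUL SYLLABLE HAN
```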

bad import style

There should be no jamo.jamo; users should be able to write something like from jamo import hangul_to_jamo.

installing error

[Screenshot from 2018-01-05 08-37-12 showing the error]

As you kindly explained in the README, I ran "sudo pip install python-jamo" but got the error shown in the screenshot.

I might be able to solve this myself, but could you check what the cause might be?

If it is related to the version of Python I'm using, please let me know.

Have a good day, guys.

쌍- and cluster decomposition

I'd find it useful to be able to break down compound consonant and vowel clusters further, into their components.

i.e., I might expect
jamo.h2j('ㄸ')
to produce
ㄷㄷ
rather than simply
ㄸ
as it does now.

The inverse would be nice too:
jamo.j2h('ㅅ','ㅅ','ㅏ','ㅇ')
output:
쌍
Maybe a separate, higher-level function would be required for that. I see this as very useful for Issue #7, though it might cause some ambiguities. For example, with arbitrary jamo, would ㅇ ㅜ ㅅ ㅅ ㅡ ㅇ become 웃승 or 우씅? Maybe you'd always just err on the side of avoiding clusters unless required.

Same for ㄺ and ㅀ and the other 9(?) mixed consonant clusters. I think this gets wildly more complicated if you support old Hangul, which seems to have occasional triple (or more?) letter clusters. Don't quote me on that, though; I'm new to all this.

Even what I propose would break the model of assuming all characters are 2-3 jamo, so this might not be trivial.

An is_cluster() or is_compound() helper returning True or False might be useful too. Constants listing all the clusters, consonants, and vowels would also let you just say:
jamo.is_consonant("ㄸ")
> True
or
"ㄸ" in jamo.COMPOUNDS
> True

I'm just starting to learn Korean and decided to write a short random grammar quiz for the vocab I know. One of the challenges was identifying final vowels in stems for conjugations, and identifying final consonants for irregulars. I started to write something to turn Hangul into jamo before learning about the craziness of Hangul in Unicode. I was completely yak shaving until I found this great project.

Thanks for this, it's really useful!
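The helpers proposed in this issue could be sketched as a lookup table. It maps each modern compound letter, written as Hangul Compatibility Jamo, to its two components. Everything here is hypothetical and not part of the jamo package: COMPOUNDS, is_compound, split_compounds, and combine_pair. The functions work on compatibility jamo, such as the output of a syllable-to-jamo step, and not on precomposed syllables. The table covers modern letters only. It leaves out old Hangul, and it omits letters such as ㅐ that are usually taught as single vowels.

```python
# Hypothetical table (not part of jamo): modern compound letters, as
# Hangul Compatibility Jamo, mapped to their two component letters.
COMPOUNDS = {
    # doubled (쌍) consonants
    "ㄲ": "ㄱㄱ", "ㄸ": "ㄷㄷ", "ㅃ": "ㅂㅂ", "ㅆ": "ㅅㅅ", "ㅉ": "ㅈㅈ",
    # mixed final-consonant clusters
    "ㄳ": "ㄱㅅ", "ㄵ": "ㄴㅈ", "ㄶ": "ㄴㅎ", "ㄺ": "ㄹㄱ", "ㄻ": "ㄹㅁ", "ㄼ": "ㄹㅂ",
    "ㄽ": "ㄹㅅ", "ㄾ": "ㄹㅌ", "ㄿ": "ㄹㅍ", "ㅀ": "ㄹㅎ", "ㅄ": "ㅂㅅ",
    # compound vowels
    "ㅘ": "ㅗㅏ", "ㅙ": "ㅗㅐ", "ㅚ": "ㅗㅣ", "ㅝ": "ㅜㅓ", "ㅞ": "ㅜㅔ", "ㅟ": "ㅜㅣ", "ㅢ": "ㅡㅣ",
}
# Reverse lookup: component pair -> compound letter
COMBINE = {pair: compound for compound, pair in COMPOUNDS.items()}


def is_compound(letter):
    return letter in COMPOUNDS


def split_compounds(text):
    """Replace every compound letter with its components; leave everything else."""
    return "".join(COMPOUNDS.get(ch, ch) for ch in text)


def combine_pair(first, second):
    """Return the compound for two component letters, or None if they don't combine."""
    return COMBINE.get(first + second)


print(split_compounds("ㄸ"))     # ㄷㄷ
print(is_compound("ㄸ"))         # True
print(combine_pair("ㅅ", "ㅅ"))  # ㅆ
```

With ㅅ + ㅅ merged into ㅆ, the 쌍 example above reduces to composing ㅆ, ㅏ, ㅇ into one syllable. Whether to merge at all (웃승 vs. 우씅) remains a policy decision the table cannot make.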

Trie

It would be interesting to build an example project that does jamo-level autocomplete. Maybe not a fit for this repo, but worth playing with.
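Here is a minimal sketch of the idea, assuming the standard library is enough. NFKD normalization turns both precomposed syllables and compatibility jamo into conjoining jamo. A plain dictionary-of-dictionaries trie keyed on those jamo can then match partially typed syllables. The class name JamoTrie is illustrative.

```python
import unicodedata


class JamoTrie:
    """Prefix tree keyed on conjoining jamo (illustrative sketch)."""

    _END = None  # sentinel key marking a complete word; jamo keys are always str

    def __init__(self):
        self.root = {}

    @staticmethod
    def _jamo(text):
        # NFKD splits syllables into conjoining jamo and maps compatibility
        # jamo such as "ㅎ" (U+314E) to conjoining ones such as U+1112.
        return unicodedata.normalize("NFKD", text)

    def insert(self, word):
        node = self.root
        for j in self._jamo(word):
            node = node.setdefault(j, {})
        node[self._END] = word

    def complete(self, prefix):
        """Return all stored words whose jamo sequence starts with prefix's jamo."""
        node = self.root
        for j in self._jamo(prefix):
            if j not in node:
                return []
            node = node[j]
        words, stack = [], [node]
        while stack:  # collect every word below the prefix node
            current = stack.pop()
            for key, child in current.items():
                if key is self._END:
                    words.append(child)
                else:
                    stack.append(child)
        return sorted(words)


trie = JamoTrie()
for w in ["하나", "한국", "한글", "국어"]:
    trie.insert(w)
print(trie.complete("ㅎ"))  # ['하나', '한국', '한글']
print(trie.complete("한"))  # ['한국', '한글']
```

One caveat: a final consonant and the same consonant as the next syllable's lead are different code points (U+11AB vs. U+1102 for ㄴ). That is why the prefix 한 matches 한국 but not 하나.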

Version 0.4.2 and 0.4.3 missing from PyPI

The Git history for jamo/__init__.py shows that two new releases of the library, 0.4.2 and 0.4.3, should have been made, but neither is available on PyPI: https://pypi.org/project/jamo/#history.

Would it be possible to get these versions (or at least 0.4.3) uploaded to PyPI so it's possible to install the latest version without pointing at the Git repository?

ReadTheDocs

Set jamo up on ReadTheDocs with some good examples. Korean documentation would be nice too.

jamo list <-> hangul string function

It would be nice to have functions that automatically convert any of these:

[0x1112, 0x1161, 0x11ab,
 0x1100, 0x116e, 0x11a8,
 0x110b, 0x1165, 0x0000]
or
['ᄒ', 'ᅡ', 'ᆫ',
 'ᄀ', 'ᅮ', 'ᆨ',
 'ᄋ', 'ᅥ', '\x00']
or
"한국어\x00"
or
['ㅎ', 'ㅏ', 'ㄴ', 'ㄱ', 'ㅜ', 'ㄱ', 'ㅇ', 'ㅓ']
or
"ㅎㅏㄴㄱㅜㄱㅇㅓ"

to

한국어

both ways.
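For the code-point and conjoining-jamo forms, Unicode normalization already does most of the work. NFC composes leading consonant + vowel (+ final) sequences into precomposed syllables, and NFD reverses it. The sketch below assumes 0x0000 serves only as padding for a missing final. The function names are hypothetical and not jamo's API.

```python
import unicodedata


def jamo_to_hangul_string(jamo):
    """Join code points or conjoining-jamo characters into precomposed Hangul.

    Accepts a list of ints, a list of one-character strings, or a str.
    Assumption: 0x0000 is only used as padding for a missing final consonant.
    """
    chars = [chr(j) if isinstance(j, int) else j for j in jamo]
    text = "".join(chars).replace("\x00", "")
    return unicodedata.normalize("NFC", text)


def hangul_string_to_codepoints(text, pad=True):
    """Inverse direction: three code points per syllable, 0x0000 for a missing final."""
    codepoints = []
    for ch in text:
        parts = [ord(j) for j in unicodedata.normalize("NFD", ch)]
        is_syllable = 0xAC00 <= ord(ch) <= 0xD7A3
        if pad and is_syllable and len(parts) == 2:  # lead + vowel only
            parts.append(0x0000)
        codepoints.extend(parts)
    return codepoints


codes = [0x1112, 0x1161, 0x11ab, 0x1100, 0x116e, 0x11a8, 0x110b, 0x1165, 0x0000]
print(jamo_to_hangul_string(codes))                # 한국어
print(hangul_string_to_codepoints("한국어") == codes)  # True
```

This does not cover the compatibility-jamo forms such as "ㅎㅏㄴㄱㅜㄱㅇㅓ". NFKC maps every compatibility consonant to its leading form, so ㅎㅏㄴ comes out as 하 followed by a stray leading consonant (U+1102) rather than 한. Those inputs need a composer that decides which consonants are finals.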

Update README.md

Add the ReadTheDocs link. This should probably happen after #9, though.

string -> Hangul function

example inputs:

ㅎㅏㄴ
ㅎㅏㅎㅏㅎㅏ
Mixing roman characters with jamo: ㅈㅏㅁㅗ

should produce

한
하하하
Mixing roman characters with jamo: 자모
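For compatibility-jamo input mixed with other text, a small greedy composer is enough to handle the examples above. The sketch assumes one simple policy, in the spirit of the "avoid clusters unless required" suggestion from the 쌍 issue: a consonant becomes a final only when the next character is not a vowel. It accepts modern compound letters typed as single characters (ㄲ, ㄺ, ㅘ). It does not merge two separate letters, such as ㄹ and ㄱ, into a cluster. The names compose_hcj, syllable, LEADS, VOWELS, and TAILS are hypothetical.

```python
# Modern Hangul Compatibility Jamo, listed in the order the Unicode
# syllable formula uses for leads, vowels, and finals.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"                  # 19 leads
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"              # 21 vowels
TAILS = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # 27 finals; 0 = none


def syllable(lead, vowel, tail=""):
    """Build one precomposed syllable from compatibility jamo."""
    l = LEADS.index(lead)
    v = VOWELS.index(vowel)
    t = TAILS.index(tail) + 1 if tail else 0
    return chr(0xAC00 + (l * 21 + v) * 28 + t)


def compose_hcj(text):
    """Greedily turn compatibility jamo into syllables; other characters pass through."""
    out, i, n = [], 0, len(text)
    while i < n:
        ch = text[i]
        if ch in LEADS and i + 1 < n and text[i + 1] in VOWELS:
            lead, vowel = ch, text[i + 1]
            i += 2
            tail = ""
            # Take the next consonant as a final only if no vowel follows it;
            # otherwise it starts the next syllable.
            if i < n and text[i] in TAILS and not (i + 1 < n and text[i + 1] in VOWELS):
                tail = text[i]
                i += 1
            out.append(syllable(lead, vowel, tail))
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(compose_hcj("ㅎㅏㅎㅏㅎㅏ"))                               # 하하하
print(compose_hcj("Mixing roman characters with jamo: ㅈㅏㅁㅗ"))  # Mixing roman characters with jamo: 자모
```

Under this policy, compose_hcj("ㅇㅜㅅㅅㅡㅇ") gives 웃승, one of the two readings raised in the 쌍 issue.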
