jaconv's Introduction

Hi there 👋

jaconv's People

Contributors

cuddlemuffin007, ernix, frog42, furukawatakumi, ikegami-yukino, kokimame, ksato9700, kyamada-exwzd-xware, letuananh, manjuu-eater, shiumachi

jaconv's Issues

kwarg `ignore` was not used in `jaconv.normalize`

Hi!
I'm pretty lucky to get to use this convenient tool for handling Japanese script.
The keyword argument `ignore` comes in handy in situations where you want some characters to stay unchanged. I just noticed that `ignore` is not used in `jaconv.normalize`. It should be an easy fix, though.
Thank you very much!
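To illustrate what honoring `ignore` could look like, here is a minimal sketch built only on the standard library's `unicodedata.normalize`. It is not jaconv's code: `normalize_keeping` is a hypothetical name, and it assumes `ignore` is a string of characters that must come through unchanged. Runs of non-ignored characters are normalized as whole runs rather than one character at a time, so that composition across neighbouring characters (for example a kana followed by a combining dakuten) still works.

```python
import unicodedata


def normalize_keeping(text, mode='NFKC', ignore=''):
    """Hypothetical helper: Unicode-normalize `text` with `mode`, but copy
    every character that appears in `ignore` through unchanged."""
    out = []
    run = []  # characters waiting to be normalized together
    for ch in text:
        if ch in ignore:
            # Normalize what we have so far, then keep the ignored char as-is.
            out.append(unicodedata.normalize(mode, ''.join(run)))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.append(unicodedata.normalize(mode, ''.join(run)))
    return ''.join(out)


print(normalize_keeping('ＡＢＣ１２３', ignore='１'))  # ABC１23
```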

Samples in the Japanese README

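Note: Sorry for writing this in Japanese. Writing it up in English felt a bit like a penalty game, and I would probably have just let it slide, so I'm filing this issue in Japanese instead. I hope you'll forgive me.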

Isn't the Japanese in the following part wrong?

# Half-width characters other than ASCII to full-width characters
jaconv.h2z(u'abc', ascii=True)
# => u'ａｂｃ'

# Half-width characters other than digits to full-width characters
jaconv.h2z(u'123', digit=True)
# => u'１２３'

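Also, from the user's point of view, there are cases where you operate on strings that mix character types, so I don't think simply rewording "other than ASCII" to "ASCII only" is right either.

That is because `kana` defaults to True, so with the current logic you get the following, and kana are always converted: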

>>> import jaconv
>>> jaconv.h2z(u'123abcティロ・フィナーレ', digit=True)
'123abcティロ・フィナーレ'
>>> jaconv.h2z(u'123abcティロ・フィナーレ', ascii=True)
'123abcティロ・フィナーレ'
>>>

I can't tell whether this calls for fixing the English comments or fixing the logic, so I'd appreciate it if you could decide that as well.
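The behaviour in the session above follows from `kana` defaulting to True: `ascii` and `digit` add character classes on top of kana rather than replacing them, so passing `kana=False` explicitly should be the way to convert only ASCII or only digits. The sketch below is not jaconv's implementation; it is a hypothetical `h2z_sketch` that handles only ASCII letters, symbols and digits (there is no kana table at all), relying on the fact that the full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above printable ASCII U+0021 to U+007E. It shows the "each flag selects one class, everything else is left alone" reading that the README comments seem to be aiming for.

```python
def h2z_sketch(text, ascii=False, digit=False):
    """Hypothetical helper: turn only the selected classes of half-width
    ASCII into full-width; every other character (kana included) is kept."""
    out = []
    for ch in text:
        code = ord(ch)
        if '0' <= ch <= '9':
            convert = digit
        elif 0x21 <= code <= 0x7E:
            convert = ascii  # printable ASCII letters and symbols, minus digits
        else:
            convert = False  # space, kana, kanji, ... are left untouched
        # U+FF01..U+FF5E are the full-width twins of U+0021..U+007E.
        out.append(chr(code + 0xFEE0) if convert else ch)
    return ''.join(out)


print(h2z_sketch('123abc', digit=True))  # １２３abc
print(h2z_sketch('123abc', ascii=True))  # 123ａｂｃ
```

The parameter name `ascii` shadows the built-in of the same name inside the function; it is kept only to mirror the keyword used in the examples above.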

README.rst and CHANGES.rst are installed in the wrong location

OS: Arch Linux
jaconv version: 0.3.4

  • When installing jaconv as a system package, README.rst and CHANGES.rst get installed to /usr/README.rst and /usr/CHANGES.rst.
  • When installing locally for the current user, the files get installed to $HOME/.local/README.rst and $HOME/.local/CHANGES.rst.

This happens with both `python setup.py install` and `pip install`.

The issue seems to be the data_files argument to setup in setup.py. This keyword is deprecated according to the documentation and removing it fixes the problem.
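Why the files land directly under `/usr` or `~/.local` can be seen from how distutils-style `data_files` entries are resolved: a relative directory is interpreted relative to the installation prefix (`sys.prefix` for system installs, the user base such as `~/.local` for user installs). The sketch below models only that rule. `data_files_targets` is a hypothetical helper, and the entry `('', ['README.rst', 'CHANGES.rst'])` is an assumed example of what `setup.py` might contain, since the issue does not quote it.

```python
import os
import site
import sys


def data_files_targets(data_files, user=False):
    """Hypothetical helper: list where each distutils-style data_files entry
    would be installed. Relative directories hang off the install prefix."""
    prefix = site.getuserbase() if user else sys.prefix
    targets = []
    for directory, files in data_files:
        base = directory if os.path.isabs(directory) else os.path.join(prefix, directory)
        targets.extend(os.path.join(base, os.path.basename(f)) for f in files)
    return targets


# Assumed entry: an empty directory means "the prefix itself".
entry = [('', ['README.rst', 'CHANGES.rst'])]
# With sys.prefix == '/usr' this gives ['/usr/README.rst', '/usr/CHANGES.rst'].
print(data_files_targets(entry))
print(data_files_targets(entry, user=True))
```

Dropping `data_files`, as suggested, means nothing is copied under the prefix; if the files should still ship in the source archive, `MANIFEST.in` is the usual place to list them.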

alphabet2kana bug

  1. su is converted before tsu, so words like 'atsui' become 「あっすい」.
  2. Trailing oh is converted too early, leaving words like 'itoh' as 「いっおお」.
  3. The remaining oh replacements should be done after the other replacements; their current placement turns words like 'toho' into 「っおおお」.
  4. A lone m (before a consonant) is left unconverted and becomes 「っ」, e.g. namba => なっば.
  5. dzu is not converted, e.g. tsudzuku => っすっずく.

>>> from jaconv import alphabet2kana as a2k
>>> a2k('tsudzuku')
'っすっずく'
>>> a2k('namba')
'なっば'
>>> a2k('itoh')
'いっおお'
>>> a2k('toho')
'っおおお'
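Several of these bugs come from applying replacements in a fixed order. A common alternative is greedy longest-match conversion: at each position, try the longest romaji key first, so `tsu` wins over `su`, `dzu` over `zu`, and `ho` in `toho` is consumed as one syllable. The sketch below is not jaconv's algorithm; `romaji_to_hiragana` and its deliberately tiny table are hypothetical, plus two small rules for a doubled consonant (っ) and for `n`, or `m` before b/m/p, as the moraic ん. Trailing `oh` (as in `itoh`) is left unhandled because the issue does not say whether おう or おお is the intended reading.

```python
# Deliberately tiny table: just enough syllables for the words in the issue.
ROMAJI = {
    'a': 'あ', 'i': 'い', 'u': 'う', 'e': 'え', 'o': 'お',
    'ku': 'く', 'su': 'す', 'tsu': 'つ', 'to': 'と', 'na': 'な',
    'ba': 'ば', 'ho': 'ほ', 'zu': 'ず', 'dzu': 'づ',
}
VOWELS = set('aeiou')
MAX_KEY = max(len(key) for key in ROMAJI)


def romaji_to_hiragana(text):
    """Hypothetical greedy longest-match converter (not jaconv's code)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ''
        # n not followed by a vowel or y, or m before b/m/p, is the moraic ん.
        moraic_n = (ch == 'n' and nxt not in VOWELS and nxt != 'y') or (
            ch == 'm' and nxt != '' and nxt in 'bmp')
        if moraic_n:
            out.append('ん')
            i += 1
            continue
        # Any other doubled consonant, e.g. the first k of "kku", is っ.
        if ch not in VOWELS and ch == nxt:
            out.append('っ')
            i += 1
            continue
        # Try the longest key first so that "tsu" beats "su" and "dzu" beats "zu".
        for size in range(min(MAX_KEY, len(text) - i), 0, -1):
            kana = ROMAJI.get(text[i:i + size])
            if kana is not None:
                out.append(kana)
                i += size
                break
        else:
            out.append(ch)  # no table entry: keep the letter as-is
            i += 1
    return ''.join(out)


print(romaji_to_hiragana('tsudzuku'))  # つづく
print(romaji_to_hiragana('namba'))     # なんば
```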

Support Small/Large Conversion

This is mostly useful for OCR purposes, but being able to change っ to つ, ぃ to い, etc. would be helpful when standardizing texts for search.
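For the small-to-large direction, a plain translation table is enough. This is a minimal sketch using `str.maketrans`; the hypothetical `enlarge` helper covers only the common small kana listed below (not, for example, ゕ, ゖ, ヵ or ヶ), and it assumes the goal is search normalization, where losing the small/large distinction is acceptable. The typing-module issue further down mentions an `enlargesmallkana()` function in jaconv, whose exact behaviour is not shown on this page.

```python
# Small kana and their full-size counterparts, position by position.
SMALL = 'ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ'
LARGE = 'あいうえおつやゆよわアイウエオツヤユヨワ'
SMALL_TO_LARGE = str.maketrans(SMALL, LARGE)


def enlarge(text):
    """Hypothetical helper: replace every small kana in SMALL with its large form."""
    return text.translate(SMALL_TO_LARGE)


print(enlarge('ちぃさいキャット'))  # ちいさいキヤツト
```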

kana2alphabet bug っ(xtsu)

I found a bug in your code when I tried to convert a sentence that ends with 'っ' (for example, 'あっ' or 'ぐっ').
IMO, the cause is clear: the list variable `text` must be converted back to a str before it is concatenated with 'xtsu'.

In [2]: jaconv.kana2alphabet('あっ')


TypeError Traceback (most recent call last)
in ()
----> 1 jaconv.kana2alphabet('あっ')

C:\ProgramData\Anaconda3\lib\site-packages\jaconv\jaconv.py in kana2alphabet(text)
211 tsu_pos = text.index('っ')
212 if len(text) <= tsu_pos + 1:
--> 213 return text[:-1] + 'xtsu'
214 text[tsu_pos] = text[tsu_pos + 1]
215 text = ''.join(text)

TypeError: can only concatenate list (not "str") to list
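The traceback points at `text[:-1] + 'xtsu'`, where `text` is still a list, so the smallest fix is to join first: `''.join(text[:-1]) + 'xtsu'`. The standalone sketch below shows the same っ step without the list/str mix-up. It is not jaconv's code; `resolve_small_tsu` is a hypothetical name, and it assumes its input is romaji in which only っ is still unconverted (for example `'aっ'` for あっ). It walks from right to left so that each っ can simply copy the first letter of whatever now follows it.

```python
def resolve_small_tsu(text):
    """Hypothetical sketch: rewrite each 'っ' in half-romanized text."""
    chars = list(text)
    for pos in range(len(chars) - 1, -1, -1):
        if chars[pos] != 'っ':
            continue
        if pos == len(chars) - 1:
            chars[pos] = 'xtsu'  # sentence-final っ has no consonant to double
        else:
            chars[pos] = chars[pos + 1][0]  # double the next consonant
    return ''.join(chars)  # always hand back a str, never a list


print(resolve_small_tsu('aっ'))     # axtsu
print(resolve_small_tsu('kaっta'))  # katta
```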

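Supports <= 3.4 but uses the typing module, which is new in version 3.5

Python 3.4 and earlier seem to be supported, but the typing module appears to have been added in Python 3.5.
https://docs.python.org/ja/3/library/typing.html

I couldn't test this in a Python 3.4 environment, but at least the type-hint syntax seems to be invalid syntax in Python 2.7... (checked on Wandbox)

Won't the type hints in enlargesmallkana() get in the way in environments <= 3.4?

Sorry that I'm not entirely sure about this. Thanks in advance.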
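For context: function annotations have been valid syntax since Python 3.0 (PEP 3107), so on 3.4 the failure should come only from `import typing`, whereas on 2.7 the annotation syntax itself is a SyntaxError. One way to keep type information while still supporting those versions is PEP 484 type comments plus a guarded import, sketched below. `is_small_kana` is a hypothetical function used only to show the pattern; it is not `enlargesmallkana()`.

```python
# -*- coding: utf-8 -*-
try:
    from typing import Text  # noqa: F401  (only referenced by type comments)
except ImportError:  # Python <= 3.4 has no typing module
    pass

# The u'' prefix keeps this a unicode string on Python 2.7 as well.
SMALL_KANA = u'ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ'


def is_small_kana(char):
    # type: (Text) -> bool
    """Hypothetical helper: True if `char` is exactly one small kana."""
    return len(char) == 1 and char in SMALL_KANA
```

Type checkers such as mypy read the `# type:` comment, while the interpreter treats it as an ordinary comment, so the same file stays importable on 2.7 and 3.4.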

Tag the source

Could you please tag the source again? This allows distributions to get the complete source from GitHub if they want.

This was done in the past but not for 0.3.3.

Thanks

Tag version 0.4.0

The git tag for version v0.4.0 is missing. Could it be added, please?

See also #27.
