worksapplications / sudachipy

Python version of Sudachi, a Japanese tokenizer.

License: Apache License 2.0

Python 93.23% Shell 0.50% Cython 6.27%

nlp-library morphological-analysis segmentation pos-tagging

sudachipy's Introduction

SudachiPy

日本語

SudachiPy is a Python version of Sudachi, a Japanese morphological analyzer.

Warning

This repository is for 0.5.* version of SudachiPy, 0.6* and above are developed as Sudachi.rs.

TL;DR

$ pip install sudachipy sudachidict_core

$ echo "高輪ゲートウェイ駅" | sudachipy
高輪ゲートウェイ駅	名詞,固有名詞,一般,*,*,*	高輪ゲートウェイ駅
EOS

$ echo "高輪ゲートウェイ駅" | sudachipy -m A
高輪	名詞,固有名詞,地名,一般,*,*	高輪
ゲートウェイ	名詞,普通名詞,一般,*,*,*	ゲートウェー
駅	名詞,普通名詞,一般,*,*,*	駅
EOS

$ echo "空缶空罐空きカン" | sudachipy -a
空缶	名詞,普通名詞,一般,*,*,*	空き缶	空缶	アキカン	0
空罐	名詞,普通名詞,一般,*,*,*	空き缶	空罐	アキカン	0
空きカン	名詞,普通名詞,一般,*,*,*	空き缶	空きカン	アキカン	0
EOS

Setup

You need SudachiPy and a dictionary.

Step 1. Install SudachiPy

$ pip install sudachipy

Step 2. Get a Dictionary

You can get a dictionary as a Python package. It may take a while to download the dictionary file (around 70MB for the core edition).

$ pip install sudachidict_core

Alternatively, you can choose other dictionary editions. See the Dictionary Edition section for details.

Usage: As a command

There is a CLI command sudachipy.

$ echo "外国人参政権" | sudachipy
外国人参政権	名詞,普通名詞,一般,*,*,*	外国人参政権
EOS
$ echo "外国人参政権" | sudachipy -m A
外国	名詞,普通名詞,一般,*,*,*	外国
人	接尾辞,名詞的,一般,*,*,*	人
参政	名詞,普通名詞,一般,*,*,*	参政
権	接尾辞,名詞的,一般,*,*,*	権
EOS

$ sudachipy tokenize -h
usage: sudachipy tokenize [-h] [-r file] [-m {A,B,C}] [-o file] [-s string]
                          [-a] [-d] [-v]
                          [file [file ...]]

Tokenize Text

positional arguments:
  file           text written in utf-8

optional arguments:
  -h, --help     show this help message and exit
  -r file        the setting file in JSON format
  -m {A,B,C}     the mode of splitting
  -o file        the output file
  -s string      sudachidict type
  -a             print all of the fields
  -d             print the debug information
  -v, --version  print sudachipy version

Output

Columns are tab separated.

Surface
Part-of-Speech Tags (comma separated)
Normalized Form

When you add the -a option, it additionally outputs

Dictionary Form
Reading Form
Dictionary ID
- 0 for the system dictionary
- 1 and above for the user dictionaries
- -1\t(OOV) if a word is Out-of-Vocabulary (not in the dictionary)
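The column layout above can be read back into structured records. The following is a minimal sketch, assuming the output is exactly the tab-separated format described here; the function name `parse_line` and the field names are illustrative and not part of SudachiPy.

```python
from typing import Dict, List, Optional

# Field names are illustrative labels for the columns described above.
BASE_FIELDS: List[str] = ["surface", "pos", "normalized_form"]
EXTRA_FIELDS: List[str] = ["dictionary_form", "reading_form", "dictionary_id"]


def parse_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one line of sudachipy output; return None for the EOS marker."""
    line = line.rstrip("\n")
    if line == "EOS":
        return None
    columns = line.split("\t")
    # zip stops at the shorter sequence, so this works with or without -a.
    record: Dict[str, object] = dict(zip(BASE_FIELDS + EXTRA_FIELDS, columns))
    # Part-of-speech tags are comma separated inside their column.
    record["pos"] = columns[1].split(",")
    if len(columns) > 5:
        record["dictionary_id"] = int(columns[5])
        # With -a, OOV words carry an extra "(OOV)" column after the ID.
        record["is_oov"] = len(columns) > 6 and columns[6] == "(OOV)"
    return record
```

A caller would typically feed each line of the command output to `parse_line` and start a new sentence whenever it returns None.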

$ echo "外国人参政権" | sudachipy -a
外国人参政権	名詞,普通名詞,一般,*,*,*	外国人参政権	外国人参政権	ガイコクジンサンセイケン	0
EOS

$ echo "阿quei" | sudachipy -a
阿	名詞,普通名詞,一般,*,*,*	阿	阿		-1	(OOV)
quei	名詞,普通名詞,一般,*,*,*	quei	quei		-1	(OOV)
EOS

Usage: As a Python package

Here is an example:

from sudachipy import tokenizer
from sudachipy import dictionary

tokenizer_obj = dictionary.Dictionary().create()

# Multi-granular Tokenization

mode = tokenizer.Tokenizer.SplitMode.C
[m.surface() for m in tokenizer_obj.tokenize("国家公務員", mode)]
# => ['国家公務員']

mode = tokenizer.Tokenizer.SplitMode.B
[m.surface() for m in tokenizer_obj.tokenize("国家公務員", mode)]
# => ['国家', '公務員']

mode = tokenizer.Tokenizer.SplitMode.A
[m.surface() for m in tokenizer_obj.tokenize("国家公務員", mode)]
# => ['国家', '公務', '員']

# Morpheme information

m = tokenizer_obj.tokenize("食べ", mode)[0]

m.surface() # => '食べ'
m.dictionary_form() # => '食べる'
m.reading_form() # => 'タベ'
m.part_of_speech() # => ['動詞', '一般', '*', '*', '下一段-バ行', '連用形-一般']

# Normalization

tokenizer_obj.tokenize("附属", mode)[0].normalized_form()
# => '付属'
tokenizer_obj.tokenize("SUMMER", mode)[0].normalized_form()
# => 'サマー'
tokenizer_obj.tokenize("シュミレーション", mode)[0].normalized_form()
# => 'シミュレーション'

(With 20200330 core dictionary. The results may change when you use other versions)

Dictionary Edition

WARNING: The sudachipy link command is no longer available in SudachiPy v0.5.2 and later.

There are three editions of Sudachi Dictionary, namely, small, core, and full. See WorksApplications/SudachiDict for details.

SudachiPy uses sudachidict_core by default.

Dictionaries are installed as Python packages sudachidict_small, sudachidict_core, and sudachidict_full.

The dictionary files are not included in the packages themselves; they are downloaded upon installation.

Dictionary option: command line

You can specify the dictionary with the tokenize option -s.

$ pip install sudachidict_small
$ echo "外国人参政権" | sudachipy -s small

$ pip install sudachidict_full
$ echo "外国人参政権" | sudachipy -s full

Dictionary option: Python package

You can specify the dictionary with the Dictionary() arguments config_path or dict_type.

class Dictionary(config_path=None, resource_dir=None, dict_type=None)

config_path
- You can specify the file path to the setting file with config_path (see the Dictionary in The Setting File section for details).
- If the dictionary file is specified in the setting file as systemDict, SudachiPy will use the dictionary.
dict_type
- You can also specify the dictionary type with dict_type.
- The available arguments are small, core, or full.
- If different dictionaries are specified with config_path and dict_type, the dictionary given by dict_type overrides the one defined in the config file.

from sudachipy import tokenizer
from sudachipy import dictionary

# default: sudachidict_core
tokenizer_obj = dictionary.Dictionary().create()  

# The dictionary given by the `systemDict` key in the config file (/path/to/sudachi.json) will be used
tokenizer_obj = dictionary.Dictionary(config_path="/path/to/sudachi.json").create()  

# The dictionary specified by `dict_type` will be set.
tokenizer_obj = dictionary.Dictionary(dict_type="core").create()  # sudachidict_core (same as default)
tokenizer_obj = dictionary.Dictionary(dict_type="small").create()  # sudachidict_small
tokenizer_obj = dictionary.Dictionary(dict_type="full").create()  # sudachidict_full

# The dictionary specified by `dict_type` overrides those defined in the config path.
# In the following code, `sudachidict_full` will be used regardless of a dictionary defined in the config file. 
tokenizer_obj = dictionary.Dictionary(config_path="/path/to/sudachi.json", dict_type="full").create()

Dictionary in The Setting File

Alternatively, if the dictionary file is specified in the setting file, sudachi.json, SudachiPy will use that file.

{
    "systemDict" : "relative/path/to/system.dic",
    ...
}

The default setting file is sudachipy/resources/sudachi.json. You can specify your sudachi.json with the -r option.

$ sudachipy -r path/to/sudachi.json

User Dictionary

To use a user dictionary, user.dic, place sudachi.json anywhere you like, and add a userDict value with the relative path from sudachi.json to your user.dic.

{
    "userDict" : ["relative/path/to/user.dic"],
    ...
}

Then specify your sudachi.json with the -r option.

$ sudachipy -r path/to/sudachi.json
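Because the userDict entry is a path relative to sudachi.json, it can be convenient to compute it rather than type it by hand. The sketch below assumes you already have a copy of a settings file to edit and that its userDict value, if present, is a list; the helper name `add_user_dict` is hypothetical and not part of SudachiPy.

```python
import json
import os
from pathlib import Path


def add_user_dict(settings_path, user_dic_path):
    """Append user_dic_path to the userDict list of an existing sudachi.json.

    The stored path is relative to the directory that holds sudachi.json.
    Returns the relative path that was written.
    """
    settings_path = Path(settings_path)
    settings = json.loads(settings_path.read_text(encoding="utf-8"))
    relative = os.path.relpath(str(user_dic_path), start=str(settings_path.parent))
    # Store forward slashes so the file reads the same on every platform.
    relative = Path(relative).as_posix()
    # Assumption: userDict is either missing or already a list of paths.
    user_dicts = settings.setdefault("userDict", [])
    if relative not in user_dicts:
        user_dicts.append(relative)
    settings_path.write_text(
        json.dumps(settings, ensure_ascii=False, indent=4), encoding="utf-8"
    )
    return relative
```

All other keys in the settings file are left untouched, so the systemDict entry and plugin settings survive the edit.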

You can build a user dictionary with the subcommand ubuild.

WARNING: ubuild in v0.3.* contains a bug.

$ sudachipy ubuild -h
usage: sudachipy ubuild [-h] [-d string] [-o file] [-s file] file [file ...]

Build User Dictionary

positional arguments:
  file        source files with CSV format (one or more)

optional arguments:
  -h, --help  show this help message and exit
  -d string   description comment to be embedded on dictionary
  -o file     output file (default: user.dic)
  -s file     system dictionary path (default: system core dictionary path)

For the dictionary file format, please refer to this document (written in Japanese; an English version is not available yet).

Customized System Dictionary

$ sudachipy build -h
usage: sudachipy build [-h] [-o file] [-d string] -m file file [file ...]

Build Sudachi Dictionary

positional arguments:
  file        source files with CSV format (one of more)

optional arguments:
  -h, --help  show this help message and exit
  -o file     output file (default: system.dic)
  -d string   description comment to be embedded on dictionary

required named arguments:
  -m file     connection matrix file with MeCab's matrix.def format

To use your customized system.dic, place sudachi.json anywhere you like, and overwrite the systemDict value with the relative path from sudachi.json to your system.dic.

{
    "systemDict" : "relative/path/to/system.dic",
    ...
}

Then specify your sudachi.json with the -r option.

$ sudachipy -r path/to/sudachi.json

For Developers

Cython Build

$ python setup.py build_ext --inplace

Code Format

Run scripts/format.sh to check if your code is formatted correctly.

You need the packages flake8, flake8-import-order, and flake8-builtins (see requirements.txt).

Test

Run scripts/test.sh to run the tests.

Contact

Sudachi and SudachiPy are developed by WAP Tokushima Laboratory of AI and NLP.

Open an issue, or come to our Slack workspace for questions and discussion.

https://sudachi-dev.slack.com/ (Get invitation here)

Enjoy tokenization!

sudachipy's Issues

Plugin structure for Python version

The original Java version has a plugin structure for adding and removing extra processing. How can we do something similar with SudachiPy?

UnicodeDecodeError

I got the following error when splitting a mobile app store description.
The description is:
'Take advantage at the Black Friday sale with 50% off.Now for $1.99 for this successful Keyboard Application at Google Play, Save money and tell your friends. The sale is extended till Monday , November 30, 2015. Don’t miss it! ai.type のスマートフォンやタブレットのための賢い、最もパーソナライズされたのキーボードです。全世界で2500万人を超えるユーザーにより、我々は、メッセージングの経験を変換する。\xa0私たちのアプリは、あなたが好きなようにあなたの文章のスタイルを学習することにより、それがカスタマイズにあなたを可能にし、パーソナライズのキーボードより良く、より速く入力できます。***すべての私達のプレミアム機能（予測、補完、訂正、スワイプ、絵文字など）を楽しみ、すべての回でそれらを使用し続けています。オリジナル★機能セットのハイライト★デザインキーボードレイアウトの変更、テーマ、色、フォント、800以上の絵文字を使用し、正確にあなたがしたいように自分を表現。★新機能★♦自由なテーマの何千人もの - 私たちのテーマ市場の一環として、何千もの中から自由で魅力的なキーボードのテーマのいずれかの利用可能なテーマを選択してください。♦デザインと独自のテーマを共有する：\to デザインのパーソナライズされたレイアウト（背景、色、およびフォント）、\to💫共有のテーマや他の何百万ものユーザーが、参照レートと⬇それらをダウンロードすることができます我々のアプリのテーマ市場へのテーマをアップロードします。♦自動絵文字🙏🚄💑を提案し、スクロール可能な😄レイアウト\to 私たちが助けて、あなたが使う言葉とコンテキスト😍🍩に基づいて右の絵文字を提案してみましょう。\to あなたは簡単に完全な絵文字パック（> 800）をナビゲートするのを助ける私たちの新しい絵文字スクロール可能なレイアウトをお楽しみください。♦の改良補正予測。★オリジナル★\t♦次単語予測、完成＆自動補正 - コンテキストベースの次の単語予測し、独自のユニークな文体に基づいて、（英語で文法チェックを含む）を自動補正。\t♦スワイプ - キーのキーから指をスワイプすることによってより速く書く。\t♦パーソナライズの🔧キーボードのルックアンドフィール、機能性と、独自のカスタムキー＆ショートカットを作成。\t\to 動的に必要に応じて、キーボードのサイズのサイズを変更します。\t\to あなたの背景画像として任意の画像を設定します。\t\to 主要キーボード画面の中から、効率的に数字、句読点（およびその他）の文字を追加するために私たちのトップ（5日）の行を使用してください。\t\to ロング任意のキーを押して、便利に、アクセントや代替文字を使用します。\t♦語キーボード検索の🔎あなたはすぐにテキストを見つけること。\t♦音声ナレーションには - あなたがそれらを入力するよう🔊言葉を指示する。\t♦新規ロリポップ材料設計＆新素材のキーボード。\t♦言語サポート - 自動予測、40以上の言語で完了し、訂正能力。\t♦プライバシー - あなたのプライバシーは私達の主な関心事である。私たちは、あなたのデータを共有しないか、パスワードフィールドから学ぶん。テキストは暗号化され、プライベートのまま。★パーミッションので説明★連絡先データベースを読む権限を連絡先リストに基づいて名前を生成するために必要とされる。SMSを読み取る権限がSMSの内容に基づいてワードを生成するために必要とされる。すべての情報は、局部的にスマートフォンの語彙に保存されます。★サポート＆質問★動画、回答＆ヒントを見つけるために私たちのヘルプ＆よくある\u200b\u200b質問のページを訪問し、サポート要求を開きます。http://www.aitype.com/support/また[email protected]に私達に電子メールを送ったり、Facebook上で私たちを訪問することができます：http://www.facebook.com/pages/aitype。'

UnicodeDecodeError Traceback (most recent call last)
in
1 for text in jp_descriptions.dropna():
----> 2 sudachi_split_words(text)

in sudachi_split_words(text)
4 wakati_words = []
5 for token in tokens:
----> 6 hinshi = token.part_of_speech()[0]
7 wakati_words.append(token.normalized_form())
8 return wakati_words

~/Dropbox/MobileApp/main/src/sudachipy/sudachipy/morpheme.py in part_of_speech(self)
15
16 def part_of_speech(self):
---> 17 wi = self.get_word_info()
18 return self.list.grammar.get_part_of_speech_string(wi.pos_id)
19

~/Dropbox/MobileApp/main/src/sudachipy/sudachipy/morpheme.py in get_word_info(self)
45 def get_word_info(self):
46 if not self.word_info:
---> 47 self.word_info = self.list.get_word_info(self.index)
48 return self.word_info

~/Dropbox/MobileApp/main/src/sudachipy/sudachipy/morphemelist.py in get_word_info(self, index)
34
35 def get_word_info(self, index):
---> 36 return self.path[index].get_word_info()
37
38 def split(self, mode, index, wi):

~/Dropbox/MobileApp/main/src/sudachipy/sudachipy/latticenode.py in get_word_info(self)
44 def get_word_info(self):
45 if self.word_id >= 0:
---> 46 return self.lexicon.get_word_info(self.word_id)
47 elif(self.extra_word_info is not None):
48 return self.extra_word_info

~/Dropbox/MobileApp/main/src/sudachipy/sudachipy/dictionarylib/lexiconset.py in get_word_info(self, word_id)
39
40 def get_word_info(self, word_id):
---> 41 return self.lexicons[self.get_dictionary_id(word_id)].get_word_info(self.get_word_id(word_id))
42
43 def get_dictionary_id(self, word_id):

~/Dropbox/MobileApp/main/src/sudachipy/sudachipy/dictionarylib/doublearraylexicon.py in get_word_info(self, word_id)
49
50 def get_word_info(self, word_id):
---> 51 return self.word_infos.get_word_info(word_id)

~/Dropbox/MobileApp/main/src/sudachipy/sudachipy/dictionarylib/wordinfolist.py in get_word_info(self, word_id)
17 pos_id = int.from_bytes(self.bytes[index:index+2], 'little')
18 index += 2
---> 19 normalized_form = self.buffer_to_string(index)
20 index += 1 + 2 * len(normalized_form)
21 if not normalized_form:

~/Dropbox/MobileApp/main/src/sudachipy/sudachipy/dictionarylib/wordinfolist.py in buffer_to_string(self, offset)
48 offset += 1
49 end = offset + 2 * length
---> 50 return self.bytes[offset:end].decode("utf-16-le")
51
52 def buffer_to_int_array(self, offset):

~/anaconda3/envs/mobileapp/lib/python3.7/encodings/utf_16_le.py in decode(input, errors)
14
15 def decode(input, errors='strict'):
---> 16 return codecs.utf_16_le_decode(input, errors, True)
17
18 class IncrementalEncoder(codecs.IncrementalEncoder):

UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 40-41: illegal UTF-16 surrogate

No reading form for certain words

>>> from sudachipy import tokenizer, dictionary
>>> tokenizer_obj = dictionary.Dictionary().create()
>>> [m.reading_form() for m in tokenizer_obj.tokenize("コンピュータ")]
['']
>>> [m.reading_form() for m in tokenizer_obj.tokenize("計算機")]
['ケイサンキ']

It should show the surface when the reading_form does not exist in the lexicon.

For example, in the original Java implementation (dictionary/WordInfoList.java):

    WordInfo getWordInfo(int wordId) {
        
        ...

        String readingForm = bufferToString(buf);
        if (readingForm.isEmpty()) {
            readingForm = surface;
        }

        ...

    }

Thanks to sig_m on the Slack channel for reporting this!
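The fallback in the Java snippet can be sketched in Python as follows. This is only an illustration of the rule "use the surface when the stored reading is empty"; the function name `resolve_reading_form` is hypothetical and does not correspond to an existing SudachiPy function.

```python
def resolve_reading_form(stored_reading: str, surface: str) -> str:
    """Return the stored reading, or the surface when no reading is stored."""
    # An empty string in the lexicon means "same as the surface".
    return stored_reading if stored_reading else surface


# The katakana word from the report falls back to its own surface.
print(resolve_reading_form("", "コンピュータ"))
# A word with a stored reading keeps it.
print(resolve_reading_form("ケイサンキ", "計算機"))
```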

Add to PyPI

After we settle #4 and #5.

replace function of UTF8InputTextBuilder

The implementation in utf8inputtextbuilder.py, at line 27, is:

self.modified_text = self.modified_text.replace(self.modified_text[begin:end], str_, 1)

As a result, the situation below happens: the first matching substring is replaced, not the span at the given position.
Is this the intended behavior? If not, I can fix it.

>>> b = sudachipy.utf8inputtextbuilder.UTF8InputTextBuilder("abcdabcd", None)
>>> b.replace(4, 6, "ほげ")
>>> b.get_text()
'ほげcdabcd'
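The difference between the reported behavior and a position-based replacement can be reproduced with plain strings. This is a minimal sketch that does not use UTF8InputTextBuilder itself; both function names are hypothetical.

```python
def replace_first_match(text: str, begin: int, end: int, replacement: str) -> str:
    """Behavior reported above: str.replace swaps the first equal substring,
    wherever it occurs, not the span at [begin:end]."""
    return text.replace(text[begin:end], replacement, 1)


def replace_span(text: str, begin: int, end: int, replacement: str) -> str:
    """Position-based alternative: replace exactly text[begin:end]."""
    return text[:begin] + replacement + text[end:]


print(replace_first_match("abcdabcd", 4, 6, "ほげ"))  # 'ほげcdabcd'
print(replace_span("abcdabcd", 4, 6, "ほげ"))         # 'abcdほげcd'
```

The two functions agree whenever the substring at [begin:end] does not occur earlier in the text, which may be why the problem is easy to miss in tests.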

can't release v0.3.0

Uploading to PyPI failed. This is the error message from Travis:

HTTPError: 400 Client Error: Invalid value for requires_dist. Error: Can't have direct dependency: 'SudachiDict-core @ https://object-storage.tyo2.conoha.io/v1/nc_2520839e1f9641b08211a5c85243124a/sudachi/SudachiDict_core-20190531.tar.gz' for url: https://upload.pypi.org/legacy/

refactor: sudachipy.Dictionary.init

Requirement

sudachipy.Dictionary should be refactored to be testable.
Currently, sudachipy.Dictionary takes a config file path and parses all parameters from files, such as the dictionary named in the config file.

I would like these files (their names) to be configurable from Python code in a simple manner.

Problem

I want to use a test dictionary in all tests, but in some tests this is difficult.

optimize JoinKatakana plugin

related #74

The JoinKatakana plugin is one of the most time-consuming yet simple plugins (its input and output are clear).

OK: cythonize
OK: change implementation

Tokenization fails because of UnicodeDecodeError for specific python versions

Detail: #17 (comment)

AttributeError: EOS is not connected to BOS

阿Qu
With SudachiPy v0.3.13 + Python 3.6.5 + MacOS 10.14.6, the above text causes 'AttributeError: EOS is not connected to BOS'.
(SudachiPy v0.3.12 works correctly.)

Register module archive to PyPI

There is strong demand for using SudachiPy as a library, so I hope it can be installed with:
pip install SudachiPy

Assertion fail in test_utf8inputtext.py

The test fails at line 140 (I think the original index should be 8):

self.assertEqual(input_.get_original_index(21), 10)

User dictionary

add ignore_test_*.py to CI

after #13

Documentation

easy installable dictionary

explosion/spaCy#3756 (comment)

We are asking the PyPI organization for an exception to the 60MB limit for the full and core dictionaries.
This issue is heavily related to https://github.com/WorksApplications/SudachiDict

keep dictionary URL up-to-date

setup.py
requirements.txt
README.md

These files contain the exact dictionary URL, written like:

"SudachiDict_core @ https://object-storage.tyo2.conoha.io/v1/nc_2520839e1f9641b08211a5c85243124a/sudachi/SudachiDict_core-20190531.tar.gz"

We need to rewrite it whenever the dictionary is updated.

To make this easier to operate, we need to introduce an approach like one of the following:

prepare proxy server to route exact URL
parse URL index file on web like nltk (https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml)

prolonged sound mark ... plugin

after #7

link command has inconsistent effect to build and tokenize command

The link command changes the dictionary type for the tokenize command but not for the build and ubuild commands. It should be consistent.

dictionarylib not found

Hi, I tried it and got the following error.
Is there anything else I need to install?
Thank you.

$ sudachipy
Traceback (most recent call last):
  File "/home/xxxx/.pyenv/versions/anaconda3-5.1.0/bin/sudachipy", line 7, in <module>
    from sudachipy.command_line import main
  File "/home/xxxx/.pyenv/versions/anaconda3-5.1.0/lib/python3.6/site-packages/sudachipy/__init__.py", line 2, in <module>
    from . import tokenizer
  File "/home/xxxx/.pyenv/versions/anaconda3-5.1.0/lib/python3.6/site-packages/sudachipy/tokenizer.py", line 3, in <module>
    from . import lattice
  File "/home/xxxx/.pyenv/versions/anaconda3-5.1.0/lib/python3.6/site-packages/sudachipy/lattice.py", line 2, in <module>
    from . import dictionarylib
ImportError: cannot import name 'dictionarylib'

Test for plugins

depends on #7 #32

UnicodeDecodeError

SudachiPy Command Line Ver on Cygwin Terminal.

Type: 貴社の記者が汽車で帰社する [Enter]

貴社の記者が汽車で帰社する
貴社    名詞,普通名詞,一般,*,*,*        貴社
の      助詞,格助詞,*,*,*,*     の
記者    名詞,普通名詞,一般,*,*,*        記者
が      助詞,格助詞,*,*,*,*     が
汽車    名詞,普通名詞,一般,*,*,*        汽車
で      助詞,格助詞,*,*,*,*     で
帰社    名詞,普通名詞,サ変可能,*,*,*    帰社
する    動詞,非自立可能,*,*,サ行変格,終止形-一般        為る
EOS

Type: 貴社の記者が汽車で帰社する [Back Space] [Enter]

貴社の記者が汽車で帰社す
  File "~/bin/sudachipy", line 11, in <module>
    sys.exit(main())
  File "~/lib/python3.7/site-packages/sudachipy/command_line.py", line 235, in main
    args.handler(args, args.print_usage)
  File "~/lib/python3.7/site-packages/sudachipy/command_line.py", line 173, in _command_tokenize
    run(tokenizer_obj, mode, input_, print_all, stdout_logger, enable_dump)
  File "/lib/python3.7/site-packages/sudachipy/command_line.py", line 61, in run
    for line in input_:
  File "~/lib/python3.7/fileinput.py", line 252, in __next__
    line = self._readline()
  File "~/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 36-37: invalid continuation byte

Sudachipy: 0.3.13 / Dict: 20190718_core
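One way a command-line reader could survive input like this is to decode with a lenient error handler instead of the strict default. The sketch below is an assumption about a possible mitigation, not a description of what SudachiPy does; `decode_line` is a hypothetical helper, and the truncated bytes only imitate what a terminal might leave behind after a backspace.

```python
def decode_line(raw: bytes) -> str:
    """Decode UTF-8 input, replacing undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# Imitate a multi-byte character that lost its last byte.
truncated = "する".encode("utf-8")[:-1] + b"\n"

try:
    truncated.decode("utf-8")  # strict decoding, as in the traceback above
except UnicodeDecodeError as error:
    print("strict decoding failed:", error.reason)

print(repr(decode_line(truncated)))
```

Replacing bad bytes keeps the tokenizer running, at the cost of a replacement character appearing in the output.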

Add more docstring

Failed to install the dictionary on Windows

Hi, I'm trying to install the built dictionary for Python following instructions in doc, but the installation failed. The same happens for the core dictionary, so is there anything that I've missed?

OS: Windows 10 64-bit
Python: 3.7.4 64-bit

pip install SudachiDict_full-20190718.tar.gz
Processing sudachidict_full-20190718.tar.gz
ERROR: Exception:
Traceback (most recent call last):
  File "d:\python\lib\site-packages\pip\_internal\cli\base_command.py", line 188, in main
    status = self.run(options, args)
  File "d:\python\lib\site-packages\pip\_internal\commands\install.py", line 345, in run
    resolver.resolve(requirement_set)
  File "d:\python\lib\site-packages\pip\_internal\legacy_resolve.py", line 196, in resolve
    self._resolve_one(requirement_set, req)
  File "d:\python\lib\site-packages\pip\_internal\legacy_resolve.py", line 359, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "d:\python\lib\site-packages\pip\_internal\legacy_resolve.py", line 307, in _get_abstract_dist_for
    self.require_hashes
  File "d:\python\lib\site-packages\pip\_internal\operations\prepare.py", line 199, in prepare_linked_requirement
    progress_bar=self.progress_bar
  File "d:\python\lib\site-packages\pip\_internal\download.py", line 1051, in unpack_url
    unpack_file_url(link, location, download_dir, hashes=hashes)
  File "d:\python\lib\site-packages\pip\_internal\download.py", line 985, in unpack_file_url
    unpack_file(from_path, location, content_type, link)
  File "d:\python\lib\site-packages\pip\_internal\utils\misc.py", line 741, in unpack_file
    untar_file(filename, location)
  File "d:\python\lib\site-packages\pip\_internal\utils\misc.py", line 672, in untar_file
    member.name for member in tar.getmembers()
  File "d:\python\lib\tarfile.py", line 1763, in getmembers
    self._load()        # all members, we first have to
  File "d:\python\lib\tarfile.py", line 2350, in _load
    tarinfo = self.next()
  File "d:\python\lib\tarfile.py", line 2281, in next
    self.fileobj.seek(self.offset - 1)
  File "d:\python\lib\gzip.py", line 368, in seek
    return self._buffer.seek(offset, whence)
  File "d:\python\lib\_compression.py", line 143, in seek
    data = self.read(min(io.DEFAULT_BUFFER_SIZE, offset))
  File "d:\python\lib\gzip.py", line 471, in read
    uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: invalid code lengths set

optimize JoinNumericPlugin

related #74

The JoinNumeric plugin is one of the most time-consuming yet simple plugins (its input and output are clear).

OK: cythonize
OK: change implementation

faster parsing

The suspected bottlenecks are:

fetch word info from dictionary
fetch connection weight from matrix

potential bug at connection matrix

WorksApplications/Sudachi#108

What's the tagset used by SudachiPy?

Hi, thanks for the great SudachiPy; I'm using it in my own project. Is there any reference for the tagset used by SudachiPy? I want to convert the POS tags to universal POS tags.

Is it possible to import and use sudachipy's full dictionary directly?

Since I need to package and distribute my program, which includes SudachiPy, to my users, the symlink would not be very reliable if I want to use the full version of SudachiPy's dictionary. Is there any way to import and use SudachiPy's dictionary directly instead of running the linking command beforehand (the linking command fails if the user does not have admin privileges), just like spaCy:

import spacy
nlp = spacy.load("en_core_web_sm")

Integration with spaCy

Dictionary file management structure

Currently, the dictionary file is not included in the repository. We would like to make a flow to get these resources via the code, like NLTK (e.g., import nltk; nltk.download()) or spaCy (e.g., $python -m spacy download en).

test for dartsclone

Problem with installation

Hi, I actually have some problems with installation.
I cannot install it with pip because I'm getting the following error:

with open("README.md", encoding="utf-8") as f: TypeError: 'encoding' is an invalid keyword argument for this function

When I use pip3 the installation works, but then I can't import it in Python (no module error).

Proposal to simplify mode setting

In example code I noticed the mode is passed to tokenize on every call, like this:

# From the README
mode = tokenizer.Tokenizer.SplitMode.C
[m.surface() for m in tokenizer_obj.tokenize("医薬品安全管理責任者", mode)]

That strikes me as weird, since I assume usually you'll pick a single tokenization mode for any instance of a tokenizer and use it the whole time. Would it be possible to add a default mode when the tokenizer is constructed, something like this?

mode = tokenizer.Tokenizer.SplitMode.C
tokenizer_obj = dictionary.Dictionary().create(mode=mode) 
tokenizer_obj.tokenize("医薬品安全管理責任者") # uses mode C

If this is acceptable I can submit a PR.
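Until something like this exists in the library, the same effect can be had with a thin wrapper. This is a sketch of the idea only: `DefaultModeTokenizer` is a hypothetical class, and it assumes nothing about the wrapped object beyond a `tokenize(text, mode)` method as used in the README example.

```python
class DefaultModeTokenizer:
    """Wrap a tokenizer so that a split mode chosen once is reused."""

    def __init__(self, tokenizer, default_mode):
        self._tokenizer = tokenizer
        self._default_mode = default_mode

    def tokenize(self, text, mode=None):
        # An explicit mode still wins over the stored default.
        chosen = self._default_mode if mode is None else mode
        return self._tokenizer.tokenize(text, chosen)
```

With SudachiPy this would be used as `DefaultModeTokenizer(dictionary.Dictionary().create(), tokenizer.Tokenizer.SplitMode.C)`, after which `tokenize(text)` can be called without passing the mode each time.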

parsing user-dictionary name

I wrote
"userDict" : "year.dic"
in sudachi.json and ran the sudachipy command, which caused an error:

FileNotFoundError: [Errno 2] No such file or directory: '~~/sudachipy/resources/y'

When I renamed 'year.dic' to 'y', wrote "userDict" : "y" in sudachi.json, and ran it again, it ran normally.
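The path ending in 'y' is consistent with the string being iterated character by character, as if it were a list of paths. That is an inference from the error message, not something confirmed from the SudachiPy source. The sketch below shows the effect and a hypothetical normalizer (`normalize_user_dicts`) that accepts both a single string and a list.

```python
def normalize_user_dicts(value):
    """Return the userDict setting as a list of path strings."""
    if value is None:
        return []
    if isinstance(value, str):
        # A bare string is one path, not a sequence of one-character paths.
        return [value]
    return list(value)


# Iterating over a string yields its characters, so the first "path" is 'y'.
print([path for path in "year.dic"][0])

print(normalize_user_dicts("year.dic"))    # ['year.dic']
print(normalize_user_dicts(["year.dic"]))  # ['year.dic']
```

Writing the setting as a list, "userDict" : ["year.dic"], matches the format shown in the README and avoids the problem without any code change.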

OOV flag is not properly set

Everything becomes OOV:

$ echo 徳島Sudachi | sudachipy -a
徳島    名詞,固有名詞,地名,一般,*,*     徳島    徳島    トクシマ        0       (OOV)
Sudachi 名詞,普通名詞,一般,*,*,*        sudachi sudachi         -1      (OOV)
EOS

>>> from sudachipy import tokenizer, dictionary
>>> tokenizer_obj = dictionary.Dictionary().create()
>>> [(m.surface(), m.dictionary_id(), m.is_oov()) for m in tokenizer_obj.tokenize("徳島Sudachi")]
[('徳島', 0, <bound method LatticeNode.is_oov of <sudachipy.latticenode.LatticeNode object at 0x7f18280f4c50>>), ('Sudachi', -1, <bound method LatticeNode.is_oov of <sudachipy.latticenode.LatticeNode object at 0x7f182807e590>>)]
>>>
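The Python output above shows a bound method where a boolean was expected. One pattern that produces exactly this is returning a method without calling it; since any method object is truthy, every word then looks like OOV. The classes below are hypothetical stand-ins written to illustrate that pattern and are not the SudachiPy implementation.

```python
class Node:
    """Hypothetical stand-in for a lattice node."""

    def __init__(self, oov: bool) -> None:
        self._oov = oov

    def is_oov(self) -> bool:
        return self._oov


class BuggyMorpheme:
    def __init__(self, node: Node) -> None:
        self.node = node

    def is_oov(self):
        # Missing parentheses: this returns the method object itself.
        return self.node.is_oov


class FixedMorpheme:
    def __init__(self, node: Node) -> None:
        self.node = node

    def is_oov(self) -> bool:
        # Calling the method returns the actual flag.
        return self.node.is_oov()


in_dictionary = Node(oov=False)
print(bool(BuggyMorpheme(in_dictionary).is_oov()))  # True, although the word is known
print(FixedMorpheme(in_dictionary).is_oov())        # False
```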

Cross platform

CI

README example tokenization seems off

This is an example in the README:

mode = tokenizer.Tokenizer.SplitMode.C
[m.surface() for m in tokenizer_obj.tokenize("医薬品安全管理責任者", mode)]
# => ['医薬品安全管理責任者']

However, if I run this example my output doesn't match the given output. I get this output:

['医薬品', '安全', '管理責任者']

I'm not sure if this is due to a change in the dictionary, a bug, or something else.

This is the output of pip freeze:

sortedcontainers==2.1.0
SudachiDict-core==20190718
SudachiPy==0.3.6

Invalid space with voiced/semi-voiced sound mark

WorksApplications/Sudachi#48

愛゛の゛ム゛チ゛
愛	名詞,普通名詞,一般,*,*,*	愛
	空白,*,*,*,*,*	 
゛の	名詞,普通名詞,一般,*,*,*	゙の
	空白,*,*,*,*,*	 
゛	感動詞,一般,*,*,*,*	゙
ム	助動詞,*,*,*,文語助動詞-ム,終止形-一般	む
	空白,*,*,*,*,*	 
゛	名詞,普通名詞,一般,*,*,*	゙
チ゛	名詞,普通名詞,一般,*,*,*	ヂ

SudachiPy doesn't work with Windows with "OSError: symbolic link privilege not held"

SudachiPy doesn't work with Windows since Windows requires administrator privilege for creating symlink. It'd be nice if we could avoid using symlink for dictionary setting.

λ  python
Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from sudachipy import tokenizer
>>> from sudachipy import dictionary
>>> tokenizer_obj = dictionary.Dictionary().create()
Traceback (most recent call last):
  File "C:\Users\chezo\source\sudachi-test\.venv\lib\site-packages\sudachipy\config.py", line 55, in create_default_link_for_sudachidict_core
    dict_path = Path(import_module('sudachidict').__file__).parent
  File "C:\Python36\Lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'sudachidict'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\chezo\source\sudachi-test\.venv\lib\site-packages\sudachipy\dictionary.py", line 37, in __init__
    self._read_system_dictionary(config.settings.system_dict_path())
  File "C:\Users\chezo\source\sudachi-test\.venv\lib\site-packages\sudachipy\config.py", line 106, in system_dict_path
    dict_path = create_default_link_for_sudachidict_core(output=f)
  File "C:\Users\chezo\source\sudachi-test\.venv\lib\site-packages\sudachipy\config.py", line 71, in create_default_link_for_sudachidict_core
    dict_path = set_default_dict_package('sudachidict_core', output=output)
  File "C:\Users\chezo\source\sudachi-test\.venv\lib\site-packages\sudachipy\config.py", line 47, in set_default_dict_package
    dst_path.symlink_to(src_path)
  File "C:\Python36\Lib\pathlib.py", line 1325, in symlink_to
    self._accessor.symlink(target, self, target_is_directory)
  File "C:\Python36\Lib\pathlib.py", line 393, in wrapped
    return strfunc(str(pathobjA), str(pathobjB), *args)
OSError: symbolic link privilege not held
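A symlink-free alternative is to ask the import system where the dictionary package lives and build the file path from there. The sketch below uses only the standard library. The helper name `find_package_dir` is hypothetical, and the location of the dictionary file inside the package (shown in the trailing comment) is an assumption that should be checked against the installed package.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_dir(package_name: str) -> Optional[Path]:
    """Return the directory of an installed package, or None if it is absent."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, origin points at its __init__.py.
    return Path(spec.origin).parent


package_dir = find_package_dir("sudachidict_core")
if package_dir is None:
    print("sudachidict_core is not installed")
else:
    # Assumption: the dictionary file sits under resources/ in the package.
    print(package_dir / "resources" / "system.dic")
```

Because nothing is written to disk, this approach needs no special privileges on Windows.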

edit connection cost plugin

after #7

v0.3.2 Install fails on Windows

Unfortunately, an encoding problem occurred on Windows while installing v0.3.2.

        long_description=open('README.md').read(),

This is the log from spaCy's CI test environment.

##[section]Starting: Install dependencies
==============================================================================
Task         : Command line
Description  : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
Version      : 2.151.1
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
==============================================================================
Generating script.
========================== Starting Command Output ===========================
##[command]"C:\windows\system32\cmd.exe" /D /E:ON /V:OFF /S /C "CALL "d:\a\_temp\3c6304a3-b77d-4d3f-9537-ed92905ea38d.cmd""
Collecting pip==18.1
  Downloading https://files.pythonhosted.org/packages/c2/d7/90f34cb0d83a6c5631cf71dfe64cc1054598c843a92b400e55675cc2ac37/pip-18.1-py2.py3-none-any.whl (1.3MB)
Installing collected packages: pip
  Found existing installation: pip 19.1.1
    Uninstalling pip-19.1.1:
      Successfully uninstalled pip-19.1.1
Successfully installed pip-18.1
Ignoring pathlib: markers 'python_version < "3.4"' don't match your environment
Collecting https://github.com/megagonlabs/ginza/releases/download/v2.0.0/SudachiPy-0.3.2-python27.tar.gz (from -r requirements.txt (line 24))
  Downloading https://github.com/megagonlabs/ginza/releases/download/v2.0.0/SudachiPy-0.3.2-python27.tar.gz (44kB)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\VSSADM~1\AppData\Local\Temp\pip-req-build-ebs0tm0p\setup.py", line 7, in <module>
        long_description=open('README.md').read(),
      File "c:\hostedtoolcache\windows\python\3.5.4\x64\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4011: character maps to <undefined>
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\VSSADM~1\AppData\Local\Temp\pip-req-build-ebs0tm0p\
You are using pip version 18.1, however version 19.1.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
##[error]Cmd.exe exited with code '1'.
##[section]Finishing: Install dependencies
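The traceback shows README.md being decoded with cp1252, the default encoding on that Windows machine, because open() was called without an encoding. A common remedy is to name the encoding explicitly. The sketch below is a general illustration rather than the project's actual setup.py; `read_text_utf8` is a hypothetical helper, and io.open is used because it accepts the encoding argument on both Python 2 and Python 3.

```python
import io


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the platform default."""
    with io.open(path, encoding="utf-8") as f:
        return f.read()


# UTF-8 bytes of Japanese text can contain 0x81, which cp1252 cannot map.
sample = "あ".encode("utf-8")
try:
    sample.decode("cp1252")
except UnicodeDecodeError as error:
    print("cp1252 failed:", error.reason)
print(sample.decode("utf-8"))
```

In a setup.py, the long description would then be read with `read_text_utf8("README.md")` instead of `open('README.md').read()`.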

handling alphabet

SudachiPy's alphabet handling is different from Sudachi (Java).

example: ideco
Sudachi:
ideco
ideco	名詞,普通名詞,一般,*,*,*	iDeCo
EOS

SudachiPy:
ideco
id	名詞,普通名詞,一般,*,*,*	ID
eco	名詞,普通名詞,サ変可能,*,*,*	エコ
EOS

It is probably caused by the dictionary lookup position.

"BufferError" happens when calling create method of dictionary object

Dear maintainers.

When creating the dictionary object, the following error occurred.
This error occurs in version 0.4.0 but not in 0.3.8.
Could you please look into it?

Code

from sudachipy import dictionary

dictionary_obj = dictionary.Dictionary().create()

Error

BufferError: memoryview: underlying buffer is not writable

/aaa/python/lib/python3.5/site-packages/sudachipy/dictionary.py in __init__(self, config_path, resource_dir)
     35         self.dictionaries = []
     36         self.header = None
---> 37         self._read_system_dictionary(config.settings.system_dict_path())
     38 
     39         # self.edit_connection_plugin = [InhibitConnectionPlugin()]

/aaa/python/lib/python3.5/site-packages/sudachipy/dictionary.py in _read_system_dictionary(self, filename)
     64         if filename is None:
     65             raise AttributeError("system dictionary is not specified")
---> 66         dict_ = BinaryDictionary.from_system_dictionary(filename)
     67         self.dictionaries.append(dict_)
     68         self.grammar = dict_.grammar

/aaa/python/lib/python3.5/site-packages/sudachipy/dictionarylib/binarydictionary.py in from_system_dictionary(cls, filename)
     48     @classmethod
     49     def from_system_dictionary(cls, filename):
---> 50         args = cls._read_dictionary(filename)
     51         version = args[2].version
     52         if version != SYSTEM_DICT_VERSION:

/aaa/python/lib/python3.5/site-packages/sudachipy/dictionarylib/binarydictionary.py in _read_dictionary(filename, access)
     43             offset += grammar.get_storage_size()
     44 
---> 45         lexicon = DoubleArrayLexicon(bytes_, offset)
     46         return bytes_, grammar, header, lexicon
     47 

/aaa/python/lib/python3.5/site-packages/sudachipy/dictionarylib/doublearraylexicon.py in __init__(self, bytes_, offset)
     40 
     41         array = memoryview(bytes_)[offset:offset + size * 4]
---> 42         self.trie.set_array(array, size)
     43         offset += self.trie.total_size()
     44 

/aaa/python/lib/python3.5/site-packages/dartsclone/_dartsclone.cpython-35m-x86_64-linux-gnu.so in dartsclone._dartsclone.DoubleArray.set_array (dartsclone/_dartsclone.cpp:1898)()

/aaa/python/lib/python3.5/site-packages/dartsclone/_dartsclone.cpython-35m-x86_64-linux-gnu.so in View.MemoryView.memoryview_cwrapper (dartsclone/_dartsclone.cpp:10995)()

/aaa/python/lib/python3.5/site-packages/dartsclone/_dartsclone.cpython-35m-x86_64-linux-gnu.so in View.MemoryView.memoryview.__cinit__ (dartsclone/_dartsclone.cpp:7270)()

Environment

Python: 3.5
SudachiPy: 0.4.0
Dict_Core: https://object-storage.tyo2.conoha.io/v1/nc_2520839e1f9641b08211a5c85243124a/sudachi/SudachiDict_core-20190718.tar.gz
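
The last frames of the traceback are inside the compiled dartsclone extension, at the point where a memoryview is created over a slice of the dictionary bytes. One plausible reading, which is not confirmed in this report, is that the dictionary file is mapped read-only while the extension requests a writable buffer. The sketch below uses only the standard library to show how the mapping's access mode determines whether a memoryview over it is read-only. The helper name `buffer_is_readonly` is hypothetical, and the code does not call SudachiPy or dartsclone.

```python
import mmap
import tempfile


def buffer_is_readonly(access):
    """Map a small temporary file with the given access mode and report
    whether a memoryview over the mapping is read-only."""
    with tempfile.TemporaryFile() as f:
        f.write(b"\x00" * 16)  # mmap cannot map an empty file
        f.flush()
        with mmap.mmap(f.fileno(), 0, access=access) as mapped:
            # Release the view before the mapping is closed.
            with memoryview(mapped) as view:
                return view.readonly


# Slicing keeps the flag, as in memoryview(bytes_)[offset:offset + size * 4].
slice_is_readonly = memoryview(b"abcdef")[2:4].readonly
```

`mmap.ACCESS_READ` yields a read-only view, while `mmap.ACCESS_COPY` yields a writable copy-on-write view that never changes the file on disk.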

Slack invitation link is expired.

WorksApplications/Sudachi#112

A text triggers AttributeError: EOS is not connected to BOS

Hi, the following text from Wikipedia causes an AttributeError.
It seems to be 100% reproducible. I used the core dictionary.
(Apologies for including my wrapper code; I can take the time to strip the unneeded lines if you want.)
Thanks.

lines = 'あと、国道7号相染こ線橋北交差点・終点に至る。\n\n起点より秋田厚生医療センター付近までは林間や水田地帯を通り、その後終点までは市街地を通る。\n\n\n\n\n\n\n\n\n\n\n\nローレンス・ハリス\n\nローレンス・ハリス（Lawrence Larry Harris 1954年8月4日- ）は、アメリカ合衆国の元NFL選手でバリトン歌手。\n\nオクラホマ州立大学でディフェンシブラインマンとしてプレーしていた彼は1976年のNFLドラフト7巡目でヒューストン・オイラーズに指名されて入団した。キャンプ中にバム・フィリップスヘッドコーチによってNFLリーディングラッシャーのアール・キャンベルをサポートするオフェンシブラインマンにコンバートされた。1977年にマイアミ・オレンジボウルでのマイアミ・ドルフィンズ戦で初出場したが、7年半のキャリアの大部分をインジャリー・リザーブで過ごした。その後バッファロー・ビルズ、USFLのボストン・ブレイカーズ、CFLのトロント・アルゴノーツでプレーした。1989年にプロフットボール選手としてのキャリアを終えた後オペラ歌手となった。\n\n\n\n\n支那の夜 (曲)\n\n「支那の夜」（しなのよる）は、日本映画『支那の夜』の主題歌。作詞:西條八十、作曲:竹岡信幸、編曲:奥山貞吉、初生歌手は:渡辺はま子。\n\n松平晃の歌った「上海航路」とのカップリングで1938年（昭和13年）12月に発売された。当初は全く売れなかったが、半年ぐらい経った頃から売れ出し、1年後には戦線の将兵の間でも大流行した。1940年（昭和15年）には李香蘭・長谷川一夫の主演で映画化（『支那の夜』）された。\n\n山本五十六はこの歌が好きで、聯合艦隊旗艦の長門では、軍楽隊によって演奏されたこともある。\n\n1940年（昭和15年）前後、ベトナム、タイ王国、インドネシア、**など、当時日本の占領地だったアジアでは、この歌が広く流行しており、そのため各国で映画が上映された。\n\nビルマ首相バー・モウは、この歌を「日本人の誰よりも上手く」唄ったという。\n\n中華民国上海市では、1942年（昭和17年）秋以降流行し、現地の人気歌手・姚莉によって、北京語のカバー曲（タイトル「春的夢」、作詞: ）が作られた。\n\nその影響で、1943年（昭和18年'

import json
from sudachipy import tokenizer
from sudachipy import dictionary
from sudachipy import config

class TokenizeBySudachi:
    def __init__(self, stop_words, normalize=True, mode=tokenizer.Tokenizer.SplitMode.B):
        with open(config.SETTINGFILE, "r", encoding="utf-8") as f:
            settings = json.load(f)
        self.tokenizer = dictionary.Dictionary(settings).create()
        self.stop_words = stop_words
        self.normalize = normalize
        self.mode = mode
    def _format(self, word):
        if word.isdigit():
            return '0'
        elif word in self.stop_words:
            return ''
        else:
            return word
    def tokenize(self, text):
        self.raw_tokens = self.tokenizer.tokenize(self.mode, text.strip())
        _tokens = [self._format(w.normalized_form() if self.normalize else w.surface())
                   for w in self.raw_tokens]
        self.tokens = [w for w in _tokens if w != '']
        return self.tokens

def get_sudachi_tokenizer(stop_words=('\u3000',), normalize=True):
    return TokenizeBySudachi(stop_words=stop_words, normalize=normalize)

tokenizer = get_sudachi_tokenizer()
tokens = tokenizer.tokenize(lines)
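
Because the input is long, it may help to narrow down which part triggers the error. The sketch below feeds the text to a tokenizer one line at a time and collects the lines that raise. The helper name `find_failing_lines` is hypothetical, and `tokenize` is any callable that takes a string. If the failure depends on context spanning several lines, this per-line split may not reproduce it.

```python
def find_failing_lines(text, tokenize):
    """Return (line_number, line, error) for each non-empty line on which
    the given tokenize callable raises an exception."""
    failures = []
    for number, line in enumerate(text.split("\n"), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, which carry nothing to tokenize
        try:
            tokenize(line)
        except Exception as error:  # broad on purpose: this is a diagnostic aid
            failures.append((number, line, error))
    return failures
```

With the wrapper above, a call might look like `find_failing_lines(lines, tokenizer.tokenize)`.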

Building a user dictionary fails with a long CSV

Hi guys

I ran into an issue when building a user dictionary from a somewhat longer CSV (224 lines); it raises the following exception:

reading the source file...224 words
writing the POS table...2 bytes
writing the connection matrix...4 bytes
building the trie.......Traceback (most recent call last):
  File "custom_dict.py", line 247, in <module>
    builder.build([my_dict_dir + '/my_dict.csv'], None, wf)
  File "/Users/allovince/.pyenv/versions/3.7.3/lib/python3.7/site-packages/sudachipy/dictionarylib/userdictionarybuilder.py", line 40, in build
    self.write_lexicon(out_stream)
  File "/Users/allovince/.pyenv/versions/3.7.3/lib/python3.7/site-packages/sudachipy/dictionarylib/dictionarybuilder.py", line 230, in write_lexicon
    trie.build(keys, vals, progress_func)
  File "/Users/allovince/.pyenv/versions/3.7.3/lib/python3.7/site-packages/sudachipy/dartsclone/doublearray.py", line 54, in build
    builder.build(key_set)
  File "/Users/allovince/.pyenv/versions/3.7.3/lib/python3.7/site-packages/sudachipy/dartsclone/doublearraybuilder.py", line 45, in build
    self.build_from_dawg_header(dawg_builder)
  File "/Users/allovince/.pyenv/versions/3.7.3/lib/python3.7/site-packages/sudachipy/dartsclone/doublearraybuilder.py", line 95, in build_from_dawg_header
    self.build_from_dawg_insert(dawg, dawg.root(), 0)
  File "/Users/allovince/.pyenv/versions/3.7.3/lib/python3.7/site-packages/sudachipy/dartsclone/doublearraybuilder.py", line 124, in build_from_dawg_insert
    self.build_from_dawg_insert(dawg, dawg_child_id, dic_child_id)
  File "/Users/allovince/.pyenv/versions/3.7.3/lib/python3.7/site-packages/sudachipy/dartsclone/doublearraybuilder.py", line 124, in build_from_dawg_insert
    self.build_from_dawg_insert(dawg, dawg_child_id, dic_child_id)
  File "/Users/allovince/.pyenv/versions/3.7.3/lib/python3.7/site-packages/sudachipy/dartsclone/doublearraybuilder.py", line 124, in build_from_dawg_insert
    self.build_from_dawg_insert(dawg, dawg_child_id, dic_child_id)
  [Previous line repeated 2 more times]
  File "/Users/allovince/.pyenv/versions/3.7.3/lib/python3.7/site-packages/sudachipy/dartsclone/doublearraybuilder.py", line 116, in build_from_dawg_insert
    offset = self.arrange_from_dawg(dawg, dawg_id, dic_id)
  File "/Users/allovince/.pyenv/versions/3.7.3/lib/python3.7/site-packages/sudachipy/dartsclone/doublearraybuilder.py", line 137, in arrange_from_dawg
    offset = self.find_valid_offset(dic_id)
  File "/Users/allovince/.pyenv/versions/3.7.3/lib/python3.7/site-packages/sudachipy/dartsclone/doublearraybuilder.py", line 260, in find_valid_offset
    raise RuntimeError(unfixed_id, memo)
RuntimeError: (311, [257, 258, 259, 260, 261, 262, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 285, 286, 287, 295, 298, 300, 301, 305, 308, 310, 311, 318, 319, 349, 355, 356, 357, 362, 364, 365, 366, 367, 370, 371, 372, 373, 377, 379, 382, 383, 428, 430, 431, 433, 436, 437, 440, 442, 445, 446, 447, 470, 471, 472, 474, 475, 476, 478, 479, 482, 484, 486, 487, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511])

Here is the test code as well; you can save it as a file and run it directly.

from sudachipy import dictionarylib
from sudachipy.dictionarylib.userdictionarybuilder import UserDictionaryBuilder
import os
import time
from importlib import import_module
from pathlib import Path

dict_content = '''幸福人,5146,5146,8000,幸福人,名詞,固有名詞,一般,*,*,*,こうふくじん,幸福人,*,*,*,*,*
鶉坂,5146,5146,8000,鶉坂,名詞,固有名詞,一般,*,*,*,うずらざか,鶉坂,*,*,*,*,*
一車,5146,5146,8000,一車,名詞,固有名詞,一般,*,*,*,ひとくるま,一車,*,*,*,*,*
初客,5146,5146,8000,初客,名詞,固有名詞,一般,*,*,*,はつきゃく,初客,*,*,*,*,*
御番衆,5146,5146,8000,御番衆,名詞,固有名詞,一般,*,*,*,ごばんしゅう,御番衆,*,*,*,*,*
山桔梗,5146,5146,8000,山桔梗,名詞,固有名詞,一般,*,*,*,やまききょう,山桔梗,*,*,*,*,*
崖庭,5146,5146,8000,崖庭,名詞,固有名詞,一般,*,*,*,がけにわ,崖庭,*,*,*,*,*
愛縄堂,5146,5146,8000,愛縄堂,名詞,固有名詞,一般,*,*,*,あいじょうどう,愛縄堂,*,*,*,*,*
木額,5146,5146,8000,木額,名詞,固有名詞,一般,*,*,*,もくがく,木額,*,*,*,*,*
鮓箱,5146,5146,8000,鮓箱,名詞,固有名詞,一般,*,*,*,すしばこ,鮓箱,*,*,*,*,*
塙隼人,5146,5146,8000,塙隼人,名詞,固有名詞,一般,*,*,*,はなわはやと,塙隼人,*,*,*,*,*
情涙,5146,5146,8000,情涙,名詞,固有名詞,一般,*,*,*,じょうるい,情涙,*,*,*,*,*
大与力,5146,5146,8000,大与力,名詞,固有名詞,一般,*,*,*,だいよりき,大与力,*,*,*,*,*
役退,5146,5146,8000,役退,名詞,固有名詞,一般,*,*,*,やくび,役退,*,*,*,*,*
塙家,5146,5146,8000,塙家,名詞,固有名詞,一般,*,*,*,はなわけ,塙家,*,*,*,*,*
皎々,5146,5146,8000,皎々,名詞,固有名詞,一般,*,*,*,こうこう,皎々,*,*,*,*,*
三本錐,5146,5146,8000,三本錐,名詞,固有名詞,一般,*,*,*,さんぼんぎり,三本錐,*,*,*,*,*
御成道,5146,5146,8000,御成道,名詞,固有名詞,一般,*,*,*,おなりみち,御成道,*,*,*,*,*
寂地,5146,5146,8000,寂地,名詞,固有名詞,一般,*,*,*,じゃくち,寂地,*,*,*,*,*
眉深,5146,5146,8000,眉深,名詞,固有名詞,一般,*,*,*,まぶか,眉深,*,*,*,*,*
御霊廟,5146,5146,8000,御霊廟,名詞,固有名詞,一般,*,*,*,みたまや,御霊廟,*,*,*,*,*
死笑靨,5146,5146,8000,死笑靨,名詞,固有名詞,一般,*,*,*,しにえくぼ,死笑靨,*,*,*,*,*
笑靨,5146,5146,8000,笑靨,名詞,固有名詞,一般,*,*,*,えくぼ,笑靨,*,*,*,*,*
鮫柄,5146,5146,8000,鮫柄,名詞,固有名詞,一般,*,*,*,さめづか,鮫柄,*,*,*,*,*
麗玉,5146,5146,8000,麗玉,名詞,固有名詞,一般,*,*,*,れいぎょく,麗玉,*,*,*,*,*
専役,5146,5146,8000,専役,名詞,固有名詞,一般,*,*,*,せんやく,専役,*,*,*,*,*
幾歳,5146,5146,8000,幾歳,名詞,固有名詞,一般,*,*,*,いくつ,幾歳,*,*,*,*,*
石責,5146,5146,8000,石責,名詞,固有名詞,一般,*,*,*,いしぜ,石責,*,*,*,*,*
斑痕,5146,5146,8000,斑痕,名詞,固有名詞,一般,*,*,*,はんこん,斑痕,*,*,*,*,*
蟇然,5146,5146,8000,蟇然,名詞,固有名詞,一般,*,*,*,がまぜん,蟇然,*,*,*,*,*
偽聾,5146,5146,8000,偽聾,名詞,固有名詞,一般,*,*,*,にせつんぼ,偽聾,*,*,*,*,*
塙江漢,5146,5146,8000,塙江漢,名詞,固有名詞,一般,*,*,*,はなわこうかん,塙江漢,*,*,*,*,*
公暇,5146,5146,8000,公暇,名詞,固有名詞,一般,*,*,*,こうか,公暇,*,*,*,*,*
何人,5146,5146,8000,何人,名詞,固有名詞,一般,*,*,*,なんぴと,何人,*,*,*,*,*
谷文晁,5146,5146,8000,谷文晁,名詞,固有名詞,一般,*,*,*,たにぶんちょう,谷文晁,*,*,*,*,*
酒菰,5146,5146,8000,酒菰,名詞,固有名詞,一般,*,*,*,さかごも,酒菰,*,*,*,*,*
手抗,5146,5146,8000,手抗,名詞,固有名詞,一般,*,*,*,てむか,手抗,*,*,*,*,*
自鳴鐘,5146,5146,8000,自鳴鐘,名詞,固有名詞,一般,*,*,*,とけい,自鳴鐘,*,*,*,*,*
牢鞘,5146,5146,8000,牢鞘,名詞,固有名詞,一般,*,*,*,ろうざや,牢鞘,*,*,*,*,*
夜雲雀,5146,5146,8000,夜雲雀,名詞,固有名詞,一般,*,*,*,よひばり,夜雲雀,*,*,*,*,*
町駕,5146,5146,8000,町駕,名詞,固有名詞,一般,*,*,*,まちかご,町駕,*,*,*,*,*
大目的,5146,5146,8000,大目的,名詞,固有名詞,一般,*,*,*,おおあて,大目的,*,*,*,*,*
鈍々,5146,5146,8000,鈍々,名詞,固有名詞,一般,*,*,*,どんどん,鈍々,*,*,*,*,*
眼鑑,5146,5146,8000,眼鑑,名詞,固有名詞,一般,*,*,*,めがね,眼鑑,*,*,*,*,*
京普請,5146,5146,8000,京普請,名詞,固有名詞,一般,*,*,*,きょうぶしん,京普請,*,*,*,*,*
細軸,5146,5146,8000,細軸,名詞,固有名詞,一般,*,*,*,ほそもの,細軸,*,*,*,*,*
書額,5146,5146,8000,書額,名詞,固有名詞,一般,*,*,*,しょがく,書額,*,*,*,*,*
客卓,5146,5146,8000,客卓,名詞,固有名詞,一般,*,*,*,きゃくたく,客卓,*,*,*,*,*
冷蔑,5146,5146,8000,冷蔑,名詞,固有名詞,一般,*,*,*,れいべつ,冷蔑,*,*,*,*,*
眉眼,5146,5146,8000,眉眼,名詞,固有名詞,一般,*,*,*,びがん,眉眼,*,*,*,*,*
図取,5146,5146,8000,図取,名詞,固有名詞,一般,*,*,*,ずどり,図取,*,*,*,*,*
一眄,5146,5146,8000,一眄,名詞,固有名詞,一般,*,*,*,いちべん,一眄,*,*,*,*,*
女主人,5146,5146,8000,女主人,名詞,固有名詞,一般,*,*,*,おんなあるじ,女主人,*,*,*,*,*
町人体,5146,5146,8000,町人体,名詞,固有名詞,一般,*,*,*,ちょうにんてい,町人体,*,*,*,*,*
此家,5146,5146,8000,此家,名詞,固有名詞,一般,*,*,*,ここ,此家,*,*,*,*,*
切銘,5146,5146,8000,切銘,名詞,固有名詞,一般,*,*,*,きりめい,切銘,*,*,*,*,*
其許,5146,5146,8000,其許,名詞,固有名詞,一般,*,*,*,そこもと,其許,*,*,*,*,*
年扶持,5146,5146,8000,年扶持,名詞,固有名詞,一般,*,*,*,ねんぶち,年扶持,*,*,*,*,*
万代,5146,5146,8000,万代,名詞,固有名詞,一般,*,*,*,ばんだい,万代,*,*,*,*,*
草埃,5146,5146,8000,草埃,名詞,固有名詞,一般,*,*,*,くさぼこり,草埃,*,*,*,*,*
唖男,5146,5146,8000,唖男,名詞,固有名詞,一般,*,*,*,おしおとこ,唖男,*,*,*,*,*
庭垣,5146,5146,8000,庭垣,名詞,固有名詞,一般,*,*,*,にわがき,庭垣,*,*,*,*,*
惚々,5146,5146,8000,惚々,名詞,固有名詞,一般,*,*,*,ほれぼれ,惚々,*,*,*,*,*
塙郁次郎,5146,5146,8000,塙郁次郎,名詞,固有名詞,一般,*,*,*,はなわいくじろう,塙郁次郎,*,*,*,*,*
火除地,5146,5146,8000,火除地,名詞,固有名詞,一般,*,*,*,ひよけち,火除地,*,*,*,*,*
鳩渓,5146,5146,8000,鳩渓,名詞,固有名詞,一般,*,*,*,きゅうけい,鳩渓,*,*,*,*,*
蘭人,5146,5146,8000,蘭人,名詞,固有名詞,一般,*,*,*,らんじん,蘭人,*,*,*,*,*
逆磔,5146,5146,8000,逆磔,名詞,固有名詞,一般,*,*,*,さかさはりつけ,逆磔,*,*,*,*,*
手欄,5146,5146,8000,手欄,名詞,固有名詞,一般,*,*,*,てすり,手欄,*,*,*,*,*
掌脂,5146,5146,8000,掌脂,名詞,固有名詞,一般,*,*,*,てあぶら,掌脂,*,*,*,*,*
藍弁慶,5146,5146,8000,藍弁慶,名詞,固有名詞,一般,*,*,*,あいべんけい,藍弁慶,*,*,*,*,*
手裡,5146,5146,8000,手裡,名詞,固有名詞,一般,*,*,*,しゅり,手裡,*,*,*,*,*
召捕,5146,5146,8000,召捕,名詞,固有名詞,一般,*,*,*,やっ,召捕,*,*,*,*,*
取做,5146,5146,8000,取做,名詞,固有名詞,一般,*,*,*,とりな,取做,*,*,*,*,*
細縞,5146,5146,8000,細縞,名詞,固有名詞,一般,*,*,*,ほそじま,細縞,*,*,*,*,*
其方,5146,5146,8000,其方,名詞,固有名詞,一般,*,*,*,そのほう,其方,*,*,*,*,*
組為替,5146,5146,8000,組為替,名詞,固有名詞,一般,*,*,*,くみがわせ,組為替,*,*,*,*,*
暗灰色,5146,5146,8000,暗灰色,名詞,固有名詞,一般,*,*,*,あんかいしょく,暗灰色,*,*,*,*,*
真夜半,5146,5146,8000,真夜半,名詞,固有名詞,一般,*,*,*,まよなか,真夜半,*,*,*,*,*
法縄,5146,5146,8000,法縄,名詞,固有名詞,一般,*,*,*,ほうじょう,法縄,*,*,*,*,*
征悪,5146,5146,8000,征悪,名詞,固有名詞,一般,*,*,*,せいあく,征悪,*,*,*,*,*
帯際,5146,5146,8000,帯際,名詞,固有名詞,一般,*,*,*,おびぎわ,帯際,*,*,*,*,*
仄白,5146,5146,8000,仄白,名詞,固有名詞,一般,*,*,*,ほのじろ,仄白,*,*,*,*,*
好捕手,5146,5146,8000,好捕手,名詞,固有名詞,一般,*,*,*,こうとりて,好捕手,*,*,*,*,*
富武家,5146,5146,8000,富武家,名詞,固有名詞,一般,*,*,*,とみたけけ,富武家,*,*,*,*,*
仄暗,5146,5146,8000,仄暗,名詞,固有名詞,一般,*,*,*,ほのぐら,仄暗,*,*,*,*,*
小襖,5146,5146,8000,小襖,名詞,固有名詞,一般,*,*,*,こぶすま,小襖,*,*,*,*,*
塙氏,5146,5146,8000,塙氏,名詞,固有名詞,一般,*,*,*,はなわうじ,塙氏,*,*,*,*,*
人無,5146,5146,8000,人無,名詞,固有名詞,一般,*,*,*,ひとな,人無,*,*,*,*,*
底泥土,5146,5146,8000,底泥土,名詞,固有名詞,一般,*,*,*,そこどろ,底泥土,*,*,*,*,*
一襲,5146,5146,8000,一襲,名詞,固有名詞,一般,*,*,*,ひとかさ,一襲,*,*,*,*,*
宿端,5146,5146,8000,宿端,名詞,固有名詞,一般,*,*,*,しゅくはず,宿端,*,*,*,*,*
石餅屋,5146,5146,8000,石餅屋,名詞,固有名詞,一般,*,*,*,いしもちや,石餅屋,*,*,*,*,*
量見,5146,5146,8000,量見,名詞,固有名詞,一般,*,*,*,りょうけん,量見,*,*,*,*,*
桝酒,5146,5146,8000,桝酒,名詞,固有名詞,一般,*,*,*,ますざけ,桝酒,*,*,*,*,*
駕舁,5146,5146,8000,駕舁,名詞,固有名詞,一般,*,*,*,かごかき,駕舁,*,*,*,*,*
宿触,5146,5146,8000,宿触,名詞,固有名詞,一般,*,*,*,しゅくぶ,宿触,*,*,*,*,*
早飛脚,5146,5146,8000,早飛脚,名詞,固有名詞,一般,*,*,*,はや,早飛脚,*,*,*,*,*
電瞬,5146,5146,8000,電瞬,名詞,固有名詞,一般,*,*,*,でんしゅん,電瞬,*,*,*,*,*
古鈴,5146,5146,8000,古鈴,名詞,固有名詞,一般,*,*,*,これい,古鈴,*,*,*,*,*
七歳,5146,5146,8000,七歳,名詞,固有名詞,一般,*,*,*,ななつ,七歳,*,*,*,*,*
一羽雁,5146,5146,8000,一羽雁,名詞,固有名詞,一般,*,*,*,ひとはがり,一羽雁,*,*,*,*,*
名捕手,5146,5146,8000,名捕手,名詞,固有名詞,一般,*,*,*,めいとりて,名捕手,*,*,*,*,*
含月荘,5146,5146,8000,含月荘,名詞,固有名詞,一般,*,*,*,がんげつそう,含月荘,*,*,*,*,*
腕頸,5146,5146,8000,腕頸,名詞,固有名詞,一般,*,*,*,うでくび,腕頸,*,*,*,*,*
一挺,5146,5146,8000,一挺,名詞,固有名詞,一般,*,*,*,いっちょう,一挺,*,*,*,*,*
紺合羽,5146,5146,8000,紺合羽,名詞,固有名詞,一般,*,*,*,こんがっぱ,紺合羽,*,*,*,*,*
新駕,5146,5146,8000,新駕,名詞,固有名詞,一般,*,*,*,あらかご,新駕,*,*,*,*,*
小筥,5146,5146,8000,小筥,名詞,固有名詞,一般,*,*,*,こばこ,小筥,*,*,*,*,*
燧打石,5146,5146,8000,燧打石,名詞,固有名詞,一般,*,*,*,ひうちいし,燧打石,*,*,*,*,*
御番士方,5146,5146,8000,御番士方,名詞,固有名詞,一般,*,*,*,ごばんしがた,御番士方,*,*,*,*,*
香筥,5146,5146,8000,香筥,名詞,固有名詞,一般,*,*,*,こうばこ,香筥,*,*,*,*,*
思判,5146,5146,8000,思判,名詞,固有名詞,一般,*,*,*,しはん,思判,*,*,*,*,*
居辣,5146,5146,8000,居辣,名詞,固有名詞,一般,*,*,*,いすく,居辣,*,*,*,*,*
山詰,5146,5146,8000,山詰,名詞,固有名詞,一般,*,*,*,やまづ,山詰,*,*,*,*,*
短槍,5146,5146,8000,短槍,名詞,固有名詞,一般,*,*,*,たんそう,短槍,*,*,*,*,*
七刻,5146,5146,8000,七刻,名詞,固有名詞,一般,*,*,*,ななつ,七刻,*,*,*,*,*
前黄門公,5146,5146,8000,前黄門公,名詞,固有名詞,一般,*,*,*,さきのこうもんこう,前黄門公,*,*,*,*,*
革袴,5146,5146,8000,革袴,名詞,固有名詞,一般,*,*,*,かわばかま,革袴,*,*,*,*,*
其女,5146,5146,8000,其女,名詞,固有名詞,一般,*,*,*,そなた,其女,*,*,*,*,*
孤寂,5146,5146,8000,孤寂,名詞,固有名詞,一般,*,*,*,こじゃく,孤寂,*,*,*,*,*
蘭之助,5146,5146,8000,蘭之助,名詞,固有名詞,一般,*,*,*,らんのすけ,蘭之助,*,*,*,*,*
金行燈,5146,5146,8000,金行燈,名詞,固有名詞,一般,*,*,*,かなあんどう,金行燈,*,*,*,*,*
稀々,5146,5146,8000,稀々,名詞,固有名詞,一般,*,*,*,たまたま,稀々,*,*,*,*,*
含月,5146,5146,8000,含月,名詞,固有名詞,一般,*,*,*,がんげつ,含月,*,*,*,*,*
甲走,5146,5146,8000,甲走,名詞,固有名詞,一般,*,*,*,かんばし,甲走,*,*,*,*,*
中山越,5146,5146,8000,中山越,名詞,固有名詞,一般,*,*,*,なかやまごえ,中山越,*,*,*,*,*
人焼竈,5146,5146,8000,人焼竈,名詞,固有名詞,一般,*,*,*,ひとやきがま,人焼竈,*,*,*,*,*
一撃,5146,5146,8000,一撃,名詞,固有名詞,一般,*,*,*,ひとう,一撃,*,*,*,*,*
革襷,5146,5146,8000,革襷,名詞,固有名詞,一般,*,*,*,かわだすき,革襷,*,*,*,*,*
大破綻,5146,5146,8000,大破綻,名詞,固有名詞,一般,*,*,*,おおごと,大破綻,*,*,*,*,*
細智,5146,5146,8000,細智,名詞,固有名詞,一般,*,*,*,さいち,細智,*,*,*,*,*
楢薪,5146,5146,8000,楢薪,名詞,固有名詞,一般,*,*,*,ならまき,楢薪,*,*,*,*,*
竈場,5146,5146,8000,竈場,名詞,固有名詞,一般,*,*,*,かまば,竈場,*,*,*,*,*
一箇,5146,5146,8000,一箇,名詞,固有名詞,一般,*,*,*,ひとつ,一箇,*,*,*,*,*
土蓋,5146,5146,8000,土蓋,名詞,固有名詞,一般,*,*,*,どぶた,土蓋,*,*,*,*,*
六刻,5146,5146,8000,六刻,名詞,固有名詞,一般,*,*,*,むつ,六刻,*,*,*,*,*
竈開,5146,5146,8000,竈開,名詞,固有名詞,一般,*,*,*,かまあ,竈開,*,*,*,*,*
燐木,5146,5146,8000,燐木,名詞,固有名詞,一般,*,*,*,つけぎ,燐木,*,*,*,*,*
竈肌,5146,5146,8000,竈肌,名詞,固有名詞,一般,*,*,*,かまはだ,竈肌,*,*,*,*,*
合客,5146,5146,8000,合客,名詞,固有名詞,一般,*,*,*,あいきゃく,合客,*,*,*,*,*
炉部屋,5146,5146,8000,炉部屋,名詞,固有名詞,一般,*,*,*,ろべや,炉部屋,*,*,*,*,*
閾際,5146,5146,8000,閾際,名詞,固有名詞,一般,*,*,*,しきいぎわ,閾際,*,*,*,*,*
丑満,5146,5146,8000,丑満,名詞,固有名詞,一般,*,*,*,うしみつ,丑満,*,*,*,*,*
手功焦,5146,5146,8000,手功焦,名詞,固有名詞,一般,*,*,*,てがらあせ,手功焦,*,*,*,*,*
父愛,5146,5146,8000,父愛,名詞,固有名詞,一般,*,*,*,ふあい,父愛,*,*,*,*,*
不音,5146,5146,8000,不音,名詞,固有名詞,一般,*,*,*,ぶいん,不音,*,*,*,*,*
一間,5146,5146,8000,一間,名詞,固有名詞,一般,*,*,*,ひとま,一間,*,*,*,*,*
表書院,5146,5146,8000,表書院,名詞,固有名詞,一般,*,*,*,おもてしょいん,表書院,*,*,*,*,*
相成,5146,5146,8000,相成,名詞,固有名詞,一般,*,*,*,あいな,相成,*,*,*,*,*
父情,5146,5146,8000,父情,名詞,固有名詞,一般,*,*,*,ふじょう,父情,*,*,*,*,*
獄土,5146,5146,8000,獄土,名詞,固有名詞,一般,*,*,*,ごくど,獄土,*,*,*,*,*
一縛,5146,5146,8000,一縛,名詞,固有名詞,一般,*,*,*,ひとから,一縛,*,*,*,*,*
藪牢,5146,5146,8000,藪牢,名詞,固有名詞,一般,*,*,*,やぶろう,藪牢,*,*,*,*,*
懸合,5146,5146,8000,懸合,名詞,固有名詞,一般,*,*,*,かけあ,懸合,*,*,*,*,*
我説,5146,5146,8000,我説,名詞,固有名詞,一般,*,*,*,がせつ,我説,*,*,*,*,*
前差,5146,5146,8000,前差,名詞,固有名詞,一般,*,*,*,まえざし,前差,*,*,*,*,*
諸倒,5146,5146,8000,諸倒,名詞,固有名詞,一般,*,*,*,もろだお,諸倒,*,*,*,*,*
巧々,5146,5146,8000,巧々,名詞,固有名詞,一般,*,*,*,うまうま,巧々,*,*,*,*,*
見限,5146,5146,8000,見限,名詞,固有名詞,一般,*,*,*,みき,見限,*,*,*,*,*
舟辰,5146,5146,8000,舟辰,名詞,固有名詞,一般,*,*,*,ふなたつ,舟辰,*,*,*,*,*
夜魔,5146,5146,8000,夜魔,名詞,固有名詞,一般,*,*,*,よま,夜魔,*,*,*,*,*
塗鞘,5146,5146,8000,塗鞘,名詞,固有名詞,一般,*,*,*,ぬりざや,塗鞘,*,*,*,*,*
夜靄,5146,5146,8000,夜靄,名詞,固有名詞,一般,*,*,*,よもや,夜靄,*,*,*,*,*
櫓韻,5146,5146,8000,櫓韻,名詞,固有名詞,一般,*,*,*,ろいん,櫓韻,*,*,*,*,*
宗服,5146,5146,8000,宗服,名詞,固有名詞,一般,*,*,*,しゅうふく,宗服,*,*,*,*,*
名与力,5146,5146,8000,名与力,名詞,固有名詞,一般,*,*,*,めいよりき,名与力,*,*,*,*,*
中川尻,5146,5146,8000,中川尻,名詞,固有名詞,一般,*,*,*,なかがわじり,中川尻,*,*,*,*,*
槙町,5146,5146,8000,槙町,名詞,固有名詞,一般,*,*,*,まきちょう,槙町,*,*,*,*,*
訴文,5146,5146,8000,訴文,名詞,固有名詞,一般,*,*,*,そぶん,訴文,*,*,*,*,*
大捕手,5146,5146,8000,大捕手,名詞,固有名詞,一般,*,*,*,おおとりて,大捕手,*,*,*,*,*
苦茗,5146,5146,8000,苦茗,名詞,固有名詞,一般,*,*,*,くめい,苦茗,*,*,*,*,*
老腹,5146,5146,8000,老腹,名詞,固有名詞,一般,*,*,*,おいばら,老腹,*,*,*,*,*
一分,5146,5146,8000,一分,名詞,固有名詞,一般,*,*,*,いちぶん,一分,*,*,*,*,*
何艘,5146,5146,8000,何艘,名詞,固有名詞,一般,*,*,*,なんばい,何艘,*,*,*,*,*
血脂,5146,5146,8000,血脂,名詞,固有名詞,一般,*,*,*,ちあぶら,血脂,*,*,*,*,*
四尋,5146,5146,8000,四尋,名詞,固有名詞,一般,*,*,*,よひろ,四尋,*,*,*,*,*
迅舟,5146,5146,8000,迅舟,名詞,固有名詞,一般,*,*,*,はやぶね,迅舟,*,*,*,*,*
鱚舟,5146,5146,8000,鱚舟,名詞,固有名詞,一般,*,*,*,きすぶね,鱚舟,*,*,*,*,*
烏爪,5146,5146,8000,烏爪,名詞,固有名詞,一般,*,*,*,からすづめ,烏爪,*,*,*,*,*
女橋,5146,5146,8000,女橋,名詞,固有名詞,一般,*,*,*,おんなばし,女橋,*,*,*,*,*
白紫陽花,5146,5146,8000,白紫陽花,名詞,固有名詞,一般,*,*,*,しろあじさい,白紫陽花,*,*,*,*,*
舅父様,5146,5146,8000,舅父様,名詞,固有名詞,一般,*,*,*,とうさま,舅父様,*,*,*,*,*
舅父,5146,5146,8000,舅父,名詞,固有名詞,一般,*,*,*,とう,舅父,*,*,*,*,*
手裡剣,5146,5146,8000,手裡剣,名詞,固有名詞,一般,*,*,*,しゅりけん,手裡剣,*,*,*,*,*
扱帯,5146,5146,8000,扱帯,名詞,固有名詞,一般,*,*,*,しごき,扱帯,*,*,*,*,*
犯蹟,5146,5146,8000,犯蹟,名詞,固有名詞,一般,*,*,*,はんせき,犯蹟,*,*,*,*,*
前黄門,5146,5146,8000,前黄門,名詞,固有名詞,一般,*,*,*,さきのこうもん,前黄門,*,*,*,*,*
御内,5146,5146,8000,御内,名詞,固有名詞,一般,*,*,*,おんうち,御内,*,*,*,*,*
此寺,5146,5146,8000,此寺,名詞,固有名詞,一般,*,*,*,ここ,此寺,*,*,*,*,*
一期,5146,5146,8000,一期,名詞,固有名詞,一般,*,*,*,いちご,一期,*,*,*,*,*
観破,5146,5146,8000,観破,名詞,固有名詞,一般,*,*,*,みやぶ,観破,*,*,*,*,*
直手,5146,5146,8000,直手,名詞,固有名詞,一般,*,*,*,じきしゅ,直手,*,*,*,*,*
無役者,5146,5146,8000,無役者,名詞,固有名詞,一般,*,*,*,むやくもの,無役者,*,*,*,*,*
喝声,5146,5146,8000,喝声,名詞,固有名詞,一般,*,*,*,かっせい,喝声,*,*,*,*,*
山扮装,5146,5146,8000,山扮装,名詞,固有名詞,一般,*,*,*,やまいでたち,山扮装,*,*,*,*,*
霧谷,5146,5146,8000,霧谷,名詞,固有名詞,一般,*,*,*,きりだに,霧谷,*,*,*,*,*
岩牢,5146,5146,8000,岩牢,名詞,固有名詞,一般,*,*,*,いわろう,岩牢,*,*,*,*,*
渓川,5146,5146,8000,渓川,名詞,固有名詞,一般,*,*,*,たにがわ,渓川,*,*,*,*,*
蒸殺,5146,5146,8000,蒸殺,名詞,固有名詞,一般,*,*,*,むしごろ,蒸殺,*,*,*,*,*
薪小屋,5146,5146,8000,薪小屋,名詞,固有名詞,一般,*,*,*,たきぎごや,薪小屋,*,*,*,*,*
身振,5146,5146,8000,身振,名詞,固有名詞,一般,*,*,*,みぶり,身振,*,*,*,*,*
片唾,5146,5146,8000,片唾,名詞,固有名詞,一般,*,*,*,かたず,片唾,*,*,*,*,*
重粥,5146,5146,8000,重粥,名詞,固有名詞,一般,*,*,*,おもゆ,重粥,*,*,*,*,*
暮六刻,5146,5146,8000,暮六刻,名詞,固有名詞,一般,*,*,*,くれむつ,暮六刻,*,*,*,*,*
七刻仕舞,5146,5146,8000,七刻仕舞,名詞,固有名詞,一般,*,*,*,ななつじま,七刻仕舞,*,*,*,*,*
問詰,5146,5146,8000,問詰,名詞,固有名詞,一般,*,*,*,なじ,問詰,*,*,*,*,*
頬綿,5146,5146,8000,頬綿,名詞,固有名詞,一般,*,*,*,ほおわた,頬綿,*,*,*,*,*
黒龍紋,5146,5146,8000,黒龍紋,名詞,固有名詞,一般,*,*,*,くろりゅうもん,黒龍紋,*,*,*,*,*
身扮,5146,5146,8000,身扮,名詞,固有名詞,一般,*,*,*,みなり,身扮,*,*,*,*,*
恐察,5146,5146,8000,恐察,名詞,固有名詞,一般,*,*,*,きょうさつ,恐察,*,*,*,*,*
周防守,5146,5146,8000,周防守,名詞,固有名詞,一般,*,*,*,すおうのかみ,周防守,*,*,*,*,*
正腹,5146,5146,8000,正腹,名詞,固有名詞,一般,*,*,*,しょうふく,正腹,*,*,*,*,*
鋲乗物,5146,5146,8000,鋲乗物,名詞,固有名詞,一般,*,*,*,びょうのりもの,鋲乗物,*,*,*,*,*
二更,5146,5146,8000,二更,名詞,固有名詞,一般,*,*,*,にこう,二更,*,*,*,*,*
大吟味,5146,5146,8000,大吟味,名詞,固有名詞,一般,*,*,*,だいぎんみ,大吟味,*,*,*,*,*
凛絶,5146,5146,8000,凛絶,名詞,固有名詞,一般,*,*,*,りんぜつ,凛絶,*,*,*,*,*
目企,5146,5146,8000,目企,名詞,固有名詞,一般,*,*,*,もくろ,目企,*,*,*,*,*
世囈言,5146,5146,8000,世囈言,名詞,固有名詞,一般,*,*,*,よまいごと,世囈言,*,*,*,*,*
偽落胤,5146,5146,8000,偽落胤,名詞,固有名詞,一般,*,*,*,にせらくいん,偽落胤,*,*,*,*,*
干乾,5146,5146,8000,干乾,名詞,固有名詞,一般,*,*,*,ひから,干乾,*,*,*,*,*
愛涙燦爛,5146,5146,8000,愛涙燦爛,名詞,固有名詞,一般,*,*,*,あいるいさんらん,愛涙燦爛,*,*,*,*,*
一献,5146,5146,8000,一献,名詞,固有名詞,一般,*,*,*,いっこん,一献,*,*,*,*,*
一荷,5146,5146,8000,一荷,名詞,固有名詞,一般,*,*,*,いっか,一荷,*,*,*,*,*
'''

my_dict_dir = os.path.abspath(os.path.dirname(__file__))
system_dict_path = Path(import_module('sudachidict').__file__).parent
system_dict_path = os.path.join(system_dict_path, 'resources', 'system.dic')
system_dict = dictionarylib.BinaryDictionary.from_system_dictionary(system_dict_path)
with open(my_dict_dir + '/my_dict.csv', 'w', encoding='utf-8') as wf:
    wf.write(dict_content)

header = dictionarylib.dictionaryheader.DictionaryHeader(
    dictionarylib.USER_DICT_VERSION_2, int(time.time()), 'my_dict')
with open(my_dict_dir + '/my_dict.dic', 'wb') as wf:
    wf.write(header.to_bytes())
    builder = UserDictionaryBuilder(system_dict.grammar,
                                    system_dict.lexicon, )
    builder.build([my_dict_dir + '/my_dict.csv'], None, wf)
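
The builder reports 224 words before it fails during trie construction, so the CSV itself was read. Before bisecting the input, it can still be worth ruling out malformed rows. The sketch below uses only the standard library to count the columns per row (the rows above have 18) and to list headwords that appear more than once. The helper name `summarize_rows` is hypothetical, and whether either condition is related to the RuntimeError is not established here.

```python
import csv
import io
from collections import Counter


def summarize_rows(csv_text):
    """Return (row_count, column_counts, duplicate_headwords) for CSV text."""
    # csv.reader yields an empty list for blank lines; drop those.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    column_counts = Counter(len(row) for row in rows)
    headword_counts = Counter(row[0] for row in rows)
    duplicates = [word for word, n in headword_counts.items() if n > 1]
    return len(rows), column_counts, duplicates
```

Calling `summarize_rows(dict_content)` on the string defined in the script returns the three values for the full input.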