depccg

Codebase for A* CCG Parsing with a Supertag and Dependency Factored Model

Requirements

Python (Either 2 or 3)
Chainer (newer versions)
Cython
A C++ compiler supporting C++11 standard
OpenMP (optional)
CMake

Build

if you have not installed Chainer or Cython, do pip install chainer cython. Then,

mkdir build
cd build
cmake ..
# In pyenv environment, you may need to pass the path to libpython.so explicitly.
# cmake -DPYTHON_LIBRARY=$HOME/.pyenv/versions/3.6.1/lib/libpython3.so ..
make

Pretrained models

Pretrained models are available:

English (189M)
Japanese (56M)

Running parser

Having successfully built the sources, you'll see depccg.so in build/src directory. In python,

from depccg import PyAStarParser
model = "/path/to/model/directory"
parser = PyAStarParser(model)
res = parser.parse("this is a test sentence .")
# print res.deriv
#  this      is         a     test   sentence  . 
#   NP   ((S\NP)/NP)  (NP/N)  (N/N)     N      . 
#                            ----------------->
#                                  N ->
#                    ------------------------->
#                              NP ->
#       -------------------------------------->
#                     (S\NP) ->
# --------------------------------------------<
#                     S ->
# -----------------------------------------------<rp>
#                      S ->

# parser.parse_doc performs A* search in threads (using OpenMP), which is highly efficient. 
res = praser.parse_doc(sents) # sents: list of (python2: unicode, 3: str)
for tree in res:
    print tree.deriv

For Japanese CCG parsing, use depccg.PyJaAStarParser, which has the exactly same interface.
Note that the Japanese parser accepts pre-tokenized sentences as input.

src/run.py implements example running code. Please refer to it for the detailed usage of the parser.

Training model

TODO

$ python -m py.lstm_parser_bi create
usage: CCG parser's LSTM supertag tagger create [-h]
                                                [--cat-freq-cut CAT_FREQ_CUT]
                                                [--word-freq-cut WORD_FREQ_CUT]
                                                [--afix-freq-cut AFIX_FREQ_CUT]
                                                [--subset {train,test,dev,all}]
                                                [--mode {train,test}]
                                                path out

$ python -m py.lstm_parser_bi train
usage: CCG parser's LSTM supertag tagger train [-h] [--gpu GPU]
                                               [--tritrain TRITRAIN]
                                               [--tri-weight TRI_WEIGHT]
                                               [--batchsize BATCHSIZE]
                                               [--epoch EPOCH]
                                               [--word-emb-size WORD_EMB_SIZE]
                                               [--afix-emb-size AFIX_EMB_SIZE]
                                               [--nlayers NLAYERS]
                                               [--hidden-dim HIDDEN_DIM]
                                               [--dep-dim DEP_DIM]
                                               [--dropout-ratio DROPOUT_RATIO]
                                               [--initmodel INITMODEL]
                                               [--pretrained PRETRAINED]
                                               model train val

We make tri-training dataset publicly available: English Tri-training Dataset (309M)

Evaluation

You can evaluate the performance of a supertagger with src/py/eval_tagger.py:

$ python eval_tagger.py 
usage: evaluate lstm tagger [-h] [--save SAVE] model defs_dir test_data

For the evaluation in CCG-based dependencies, please use evaluation scripts in EasyCCG and C&C.

Citation

If you make use of this software, please cite the following:

@inproceedings{yoshikawa:2017acl,
  author={Yoshikawa, Masashi and Noji, Hiroshi and Matsumoto, Yuji},
  title={A* CCG Parsing with a Supertag and Dependency Factored Model},
  booktitle={Proc. ACL},
  year=2017,
}

Licence

MIT Licence

Contact

For questions and usage issues, please contact [email protected] .

Acknowledgement

In creating the parser, I owe very much to:

EasyCCG: from which I learned everything
NLTK: for nice pretty printing for parse derivation

texttheater / depccg Goto Github PK

depccg's Introduction

depccg

Requirements

Build

Pretrained models

Running parser

Training model

Evaluation

Citation

Licence

Contact

Acknowledgement

depccg's People

Stargazers

Watchers

Forkers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent