depccg

Codebase for A* CCG Parsing with a Supertag and Dependency Factored Model

Requirements

Build

If you have not installed Chainer or Cython, run pip install chainer cython first. Then:

mkdir build
cd build
cmake ..
# In pyenv environment, you may need to pass the path to libpython.so explicitly.
# cmake -DPYTHON_LIBRARY=$HOME/.pyenv/versions/3.6.1/lib/libpython3.so ..
make
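
The pyenv note in the build steps asks for an explicit path to libpython. One way to find a candidate value for -DPYTHON_LIBRARY is to ask the interpreter itself through the standard sysconfig module. This is a minimal sketch, assuming the active interpreter is the one you want to link against; the helper name libpython_candidate is hypothetical, and the result may point to a static library (.a) if Python was built without a shared one.

```python
import os
import sysconfig


def libpython_candidate():
    """Return a likely path to libpython for the running interpreter, or None."""
    libdir = sysconfig.get_config_var("LIBDIR")      # directory holding the library
    libname = sysconfig.get_config_var("LDLIBRARY")  # file name of the library
    if not libdir or not libname:
        # Some platforms do not define these variables.
        return None
    return os.path.join(libdir, libname)


# Print the candidate so it can be pasted into the cmake command line.
print(libpython_candidate())
```

Check that the printed file exists before passing it to cmake.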

Pretrained models

Pretrained models are available:

Running parser

After a successful build, depccg.so appears in the build/src directory. In Python:

from depccg import PyAStarParser
model = "/path/to/model/directory"
parser = PyAStarParser(model)
res = parser.parse("this is a test sentence .")
# print(res.deriv)
#  this      is         a     test   sentence  . 
#   NP   ((S\NP)/NP)  (NP/N)  (N/N)     N      . 
#                            ----------------->
#                                  N ->
#                    ------------------------->
#                              NP ->
#       -------------------------------------->
#                     (S\NP) ->
# --------------------------------------------<
#                     S ->
# -----------------------------------------------<rp>
#                      S ->

# parser.parse_doc runs A* search in parallel threads (using OpenMP), which is highly efficient.
res = parser.parse_doc(sents) # sents: list of sentences (Python 2: unicode, Python 3: str)
for tree in res:
    print(tree.deriv)

For Japanese CCG parsing, use depccg.PyJaAStarParser, which has exactly the same interface.
Note that the Japanese parser accepts pre-tokenized sentences as input.

src/run.py contains example code for running the parser. Please refer to it for detailed usage.
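
For batch use, the parse_doc call shown earlier can be wrapped in two small helpers. This is only a sketch and is not taken from src/run.py: the function names clean_sentences and parse_sentences are hypothetical, and the only parser features assumed are the parse_doc method and the deriv attribute that appear in the example above.

```python
def clean_sentences(lines):
    """Strip surrounding whitespace and drop empty lines.

    Each remaining line is treated as one sentence.
    """
    return [line.strip() for line in lines if line.strip()]


def parse_sentences(parser, sents):
    """Parse all sentences with one parse_doc call and collect the derivations.

    `parser` is any object with a parse_doc(list_of_str) method whose
    results expose a `deriv` attribute, as in the example above.
    """
    return [tree.deriv for tree in parser.parse_doc(sents)]
```

With a built depccg, parser would be a PyAStarParser (or PyJaAStarParser) instance, and lines could come from an open text file with one sentence per line.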

Training model

TODO

$ python -m py.lstm_parser_bi create
usage: CCG parser's LSTM supertag tagger create [-h]
                                                [--cat-freq-cut CAT_FREQ_CUT]
                                                [--word-freq-cut WORD_FREQ_CUT]
                                                [--afix-freq-cut AFIX_FREQ_CUT]
                                                [--subset {train,test,dev,all}]
                                                [--mode {train,test}]
                                                path out
$ python -m py.lstm_parser_bi train
usage: CCG parser's LSTM supertag tagger train [-h] [--gpu GPU]
                                               [--tritrain TRITRAIN]
                                               [--tri-weight TRI_WEIGHT]
                                               [--batchsize BATCHSIZE]
                                               [--epoch EPOCH]
                                               [--word-emb-size WORD_EMB_SIZE]
                                               [--afix-emb-size AFIX_EMB_SIZE]
                                               [--nlayers NLAYERS]
                                               [--hidden-dim HIDDEN_DIM]
                                               [--dep-dim DEP_DIM]
                                               [--dropout-ratio DROPOUT_RATIO]
                                               [--initmodel INITMODEL]
                                               [--pretrained PRETRAINED]
                                               model train val
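
When scripting experiments, the train invocation can be assembled as an argument list instead of a hand-written shell string. The sketch below uses only flags and positional arguments that appear in the usage text above (--gpu, --epoch, --batchsize, and model train val). The function name train_command is hypothetical, and the meaning of each argument is an assumption inferred from its name.

```python
import sys


def train_command(model, train, val, gpu=None, epoch=None, batchsize=None):
    """Build an argv list for `python -m py.lstm_parser_bi train`.

    Optional flags are added only when a value is given, so the
    script's own defaults apply otherwise.
    """
    cmd = [sys.executable, "-m", "py.lstm_parser_bi", "train"]
    for flag, value in (("--gpu", gpu), ("--epoch", epoch), ("--batchsize", batchsize)):
        if value is not None:
            cmd += [flag, str(value)]
    # Positional arguments come last, in the order shown by the usage text.
    return cmd + [model, train, val]
```

The resulting list can be passed to subprocess.run from the directory in which the py package is importable.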

The tri-training dataset is publicly available: English Tri-training Dataset (309M)

Evaluation

You can evaluate the performance of a supertagger with src/py/eval_tagger.py:

$ python eval_tagger.py 
usage: evaluate lstm tagger [-h] [--save SAVE] model defs_dir test_data

To evaluate CCG-based dependencies, please use the evaluation scripts from EasyCCG and C&C.

Citation

If you make use of this software, please cite the following:

@inproceedings{yoshikawa:2017acl,
  author={Yoshikawa, Masashi and Noji, Hiroshi and Matsumoto, Yuji},
  title={A* CCG Parsing with a Supertag and Dependency Factored Model},
  booktitle={Proc. ACL},
  year=2017,
}

Licence

MIT Licence

Contact

For questions and usage issues, please contact [email protected] .

Acknowledgement

In creating the parser, I owe a great deal to:

  • EasyCCG: from which I learned everything
  • NLTK: for its pretty printing of parse derivations
