
baselines-emnlp2016's Introduction

Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction

This repository contains baseline models, training scripts, and instructions on how to reproduce the results for our state-of-the-art grammar correction system from M. Junczys-Dowmunt, R. Grundkiewicz: Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction, EMNLP 2016.

Citation

@InProceedings{junczysdowmunt-grundkiewicz:2016:EMNLP2016,
  author    = {Junczys-Dowmunt, Marcin  and  Grundkiewicz, Roman},
  title     = {Phrase-based Machine Translation is State-of-the-Art for
               Automatic Grammatical Error Correction},
  booktitle = {Proceedings of the 2016 Conference on Empirical Methods in
               Natural Language Processing},
  month     = {November},
  year      = {2016},
  address   = {Austin, Texas},
  publisher = {Association for Computational Linguistics},
  pages     = {1546--1556},
  url       = {https://aclweb.org/anthology/D16-1161}
}

Updates

Last update: 3/8/2018

Updated training scripts

The train-2018 directory contains updated training scripts and instructions that we used to create the SMT systems in our paper: R. Grundkiewicz, M. Junczys-Dowmunt: Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation, NAACL 2018 [bibtex]. The main modifications include switching to NLTK tokenization, using BPE subword segmentation, and adding GLEU tuning.
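BPE subword segmentation learns a merge vocabulary by repeatedly joining the most frequent adjacent symbol pair. The actual scripts rely on existing tooling (Sennrich's subword-nmt); the following is only a minimal illustrative sketch of a single merge step, with a toy vocabulary:

```python
from collections import Counter

def most_frequent_pair(vocab):
    """Count adjacent symbol pairs over a {word-as-symbol-tuple: frequency} vocab
    and return the most frequent one."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(pair, vocab):
    """Apply one BPE merge: replace every occurrence of `pair` with its concatenation."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged
```

Repeating these two steps for a fixed number of merges yields the BPE merge table that is then applied to segment training and test data.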

Text data used for training CCLM

We publish the text data we used for training a web-scale language model (CCLM): http://data.statmt.org/romang/gec-emnlp16/sim/

The data is tokenized with the NLTK tokenizer and truecased with the Moses truecaser. All parts are separate files and can be extracted individually using the xz tool. Parts 00-04 consist of 1000M lines each; part 05 consists of 291,262,763 lines.

Results on JFLEG data sets

Outputs generated by our models for the JFLEG data sets are available in the folder jfleg. These are produced by our systems tuned on M^2. See the README in that folder for more information.

New: We also report results for the systems tuned on GLEU using JFLEG dev.

Update on phrase tables

The phrase table that we previously made publicly available for download was filtered for the CoNLL test sets, so evaluating our systems with that phrase table on other data sets is not meaningful. We now provide the original unfiltered phrase table in binarized format (due to its size). The outputs for the CoNLL test sets produced with the binarized phrase table should remain unchanged.

All .ini files and instructions on how to use them have been updated.

Update on CCLM+sparse models

We have updated the model that uses CCLM and sparse features. That model was used to generate the results reported in the paper as Best sparse + CCLM. Moses .ini files are available in the folder models.

We also provide the script models/run_gecsmt.py to run our models (see notes below).

Update for 10gec dataset

The results reported in the camera-ready version of the paper on the dataset from Bryant and Ng (2015) (Tab. 4, last three columns) are understated due to incorrect preparation of the M2 file. The correct scores are as follows:

System        Prec.   Recall  M^2
Baseline      69.22   37.00   58.95
+CCLM         76.66   36.39   62.77
Best dense    71.11   37.44   60.27
+CCLM         79.76   39.52   66.27
Best sparse   76.48   35.99   62.43
+CCLM         80.57   39.74   66.83
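The M^2 column is the F_0.5 measure over the precision and recall columns (precision weighted higher than recall, as in the M2Scorer). The entries can be reproduced, up to rounding of the published precision and recall, with the standard F_beta formula:

```python
def f_beta(precision, recall, beta=0.5):
    """F_beta score; beta < 1 weights precision more heavily than recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# e.g. the Baseline row: f_beta(69.22, 37.00) is approximately 58.95
```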

We would like to thank Shamil Chollampatt for reporting this issue!

Outputs

Outputs generated by our models for the CoNLL-2014 test set are available in the folder outputs. These correspond to Table 4 of our paper. See the README in that folder for more information.

Baseline models

You can download and run our baseline models (1.3 GB).

models/
├── data
│   ├── lm.cor.kenlm
│   ├── osm.kenlm
│   ├── phrase-table.0-0.gz
│   └── phrase-table.0-0.unfiltered.minphr
├── moses.dense-cclm.mert.avg.ini
├── moses.dense.mert.avg.ini
├── moses.sparse-cclm.mert.avg.ini
├── moses.sparse.mert.avg.ini
└── sparse
    ├── moses.cc.sparse
    └── moses.wiki.sparse

The four *.ini configuration files correspond to the last four systems described in Table 4.

To use the models you need to install the Moses decoder (master branch). It has to be compiled with support for 9-gram KenLM language models and with binarized tables enabled by providing the path to the CMPH library (see details here), e.g.:

/usr/bin/bjam -j16 --max-kenlm-order=9 --with-cmph=/path/to/cmph

The language model data are available in separate packages, which contain:

wikilm/
├── wiki.blm
├── wiki.classes.gz
└── wiki.wclm.kenlm
cclm/
├── cc.classes.gz
├── cc.kenlm
└── cc.wclm.kenlm

Adjust the absolute paths in the moses.*.ini files by replacing /path/to/ with the path to the directory where you downloaded the models and language models. Finally, run Moses, e.g.:

/path/to/mosesdecoder/bin/moses -f moses.dense.mert.avg.ini < input.txt

The input file should contain one sentence per line, and each sentence has to follow the Moses tokenization and truecasing as presented in train/run_cross.perl.

Alternatively, you can use the script models/run_gecsmt.py, which performs pre- and postprocessing, e.g.:

python ./run/run_gecsmt.py -f moses.ini -w workdir -i input.txt -o output.txt

It can be used to evaluate M2 input:

python ./run/run_gecsmt.py -f moses.ini -w workdir -i test2014.m2 --m2

You will need to provide paths to Moses, Lazy and this repository. Use --help option for more details.
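In the M2 format expected by the --m2 option, lines starting with "S " hold source sentences and lines starting with "A " hold gold-standard edit annotations. Extracting the plain-text sources that the decoder actually translates can be sketched as follows (a simplification for illustration, not the script's actual preprocessing):

```python
def m2_sources(m2_text):
    """Return the source sentences (lines prefixed 'S ') from M2-formatted text."""
    return [line[2:] for line in m2_text.splitlines() if line.startswith("S ")]
```

The script then scores the system output for these sources against the "A " annotations with the M2Scorer.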

Running our models might give slightly different results (up to +/- 0.0020 F-score) than those presented in the paper, due to differing versions of the official CoNLL-2014 test set (we used the version provided during the CoNLL shared task), the M2Scorer, the NLTK tokenizer, Moses, and the LM used for truecasing.

Training models

Training is described in the README in the folder train.

Acknowledgments

This project was partially funded by the Polish National Science Centre (Grant No. 2014/15/N/ST6/02330).

baselines-emnlp2016's People

Contributors

emjotde, shamilcm, snukky


baselines-emnlp2016's Issues

Training with sparse features

For training sparse features (as with train/config.sparse.yml):

Line 293 of train/run_cross.perl fails because the following file does not exist:

$DIR/cross.00/work.err-cor/binmodel.err-cor/moses.mert.ini.sparse

In train/train_smt.perl, for the esm flag:

$MOSESDIR/bin/ESMSequences (on line 288) does not appear to be available in a standard Moses build. Do you plan to make it available?

Retraining the smt-2016 model

Hi,
Thank you for answering my questions before. I am currently training the smt-2016 model. Everything works fine when using Moses to train the model, but I encounter an error while tuning. The error message is:
Name:moses VmPeak:30234320 kB VmRSS:702832 kB RSSMax:29396400 kB user:474.828 sys:7.216 CPU:482.044 real:55.818
The decoder returns the scores in this order: OpSequenceModel0 LM0 LM1 LM2 EditOps0 EditOps0 EditOps0 WordPenalty0 PhrasePenalty0 TranslationModel0 TranslationModel0 TranslationModel0 TranslationModel0
Executing: gzip -f run1.best100.out
Scoring the nbestlist.
exec: /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/tuning.0.1/extractor.sh
Executing: /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/tuning.0.1/extractor.sh > extract.out 2> extract.err
Executing: \cp -f init.opt run1.init.opt
Executing: echo 'not used' > weights.txt
exec: /data/home/ghoznfan/mosesdecoder-master/bin/kbmira --sctype M2SCORER --scconfig beta:0.5,max_unchanged_words:2,case:false --model-bg -D 0.001 --dense-init run1.dense --ffile run1.features.dat --scfile run1.scores.dat -o mert.out
Executing: /data/home/ghoznfan/mosesdecoder-master/bin/kbmira --sctype M2SCORER --scconfig beta:0.5,max_unchanged_words:2,case:false --model-bg -D 0.001 --dense-init run1.dense --ffile run1.features.dat --scfile run1.scores.dat -o mert.out > run1.mira.out 2> mert.log
sh: line 1: 34173 abandoned /data/home/ghoznfan/mosesdecoder-master/bin/kbmira --sctype M2SCORER --scconfig beta:0.5,max_unchanged_words:2,case:false --model-bg -D 0.001 --dense-init run1.dense --ffile run1.features.dat --scfile run1.scores.dat -o mert.out > run1.mira.out 2> mert.log
Exit code: 134
ERROR: Failed to run '/data/home/ghoznfan/mosesdecoder-master/bin/kbmira --sctype M2SCORER --scconfig beta:0.5,max_unchanged_words:2,case:false --model-bg -D 0.001 --dense-init run1.dense --ffile run1.features.dat --scfile run1.scores.dat -o mert.out'. at /data/home/ghoznfan/mosesdecoder-master/scripts/training/mert-moses.pl line 1775.
06/12/2019 19:54:13 Command: perl /data/home/ghoznfan/mosesdecoder-master/scripts/training/mert-moses.pl /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/test.lc.0.mer.err.fact /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/test.lc.0.mer.m2 /data/home/ghoznfan/mosesdecoder-master/bin/moses /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/binmodel.err-cor/moses.ini --working-dir=/data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/tuning.0.1 --mertdir=/data/home/ghoznfan/mosesdecoder-master/bin --mertargs "--sctype M2SCORER" --no-filter-phrase-table --nbest=100 --threads 16 --decoder-flags "-threads 16 -fd '|'" --maximum-iterations 15 --batch-mira --return-best-dev --batch-mira-args "--sctype M2SCORER --scconfig beta:0.5,max_unchanged_words:2,case:false --model-bg -D 0.001"
finished with non-zero status 512
Died at train/run_cross.perl line 695.
Died at train/run_cross.perl line 11.
[ghoznfan@train-shuaidong-20190308-1708-gpu-pod-0 baselines-emnlp2016-master]$ paste: /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.*/work.err-cor/binmodel.err-cor/moses.mert.?.1.ini: no such file
06/12/2019 19:54:15 Running command: perl /data/home/ghoznfan/baselines-emnlp2016-master/train/scripts/reuse-weights.perl /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/binmodel.err-cor/moses.mert.1.ini < /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/release/work.err-cor/binmodel.err-cor/moses.ini > /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/release/work.err-cor/binmodel.err-cor/moses.mert.1.ini
ERROR: could not open weight file: /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/binmodel.err-cor/moses.mert.1.ini at /data/home/ghoznfan/baselines-emnlp2016-master/train/scripts/reuse-weights.perl line 9.
06/12/2019 19:54:15 Command: perl /data/home/ghoznfan/baselines-emnlp2016-master/train/scripts/reuse-weights.perl /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/binmodel.err-cor/moses.mert.1.ini < /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/release/work.err-cor/binmodel.err-cor/moses.ini > /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/release/work.err-cor/binmodel.err-cor/moses.mert.1.ini
finished with non-zero status 512
Died at train/run_cross.perl line 695.
Where is the problem here?

Mismatch in weights and features?

I'm getting the following error running the suggested moses command:

FATAL ERROR: Mismatch in number of features and number of weights for Feature Function OpSequenceModel0 (features: 5 vs. weights: 1)

The full log is below. Is there anything I need to do other than fix the paths in the ini files?

Defined parameters (per moses.ini or switch):
	config: moses.sparse.mert.avg.ini
	distortion-limit: 1
	feature: CorrectionPattern factor=0 context=1 context-factor=1 CorrectionPattern factor=1 OpSequenceModel path=/home/baselines-emnlp2016/models/data/osm.kenlm input-factor=0 output-factor=0 support-features=no EditOps scores=dis Generation name=Generation0 num-features=0 input-factor=0 output-factor=1 path=/home/baselines-emnlp2016/wikilm/wiki.classes.gz UnknownWordPenalty WordPenalty PhrasePenalty PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/baselines-emnlp2016/models/sparse/phrase-table.0-0.gz input-factor=0 output-factor=0 KENLM lazyken=0 name=LM0 factor=0 path=/home/baselines-emnlp2016/models/data/lm.cor.kenlm order=5 KENLM lazyken=0 name=LM1 factor=0 path=/home/baselines-emnlp2016/wikilm/wiki.blm order=5 KENLM lazyken=0 name=LM2 factor=1 path=/home/baselines-emnlp2016/wikilm/wiki.wclm.kenlm order=9
	input-factors: 0 1
	mapping: 0 T 0 0 G 0
	search-algorithm: 1
	weight: OpSequenceModel0= 0.056400116 EditOps0= 0.089810909 0.055824475 0.251698374 UnknownWordPenalty0= 0.000000000 WordPenalty0= 0.033986809 PhrasePenalty0= 0.213073353 TranslationModel0= 0.053263575 0.079491017 0.050398784 -0.002760660 LM0= 0.030412285 LM1= 0.059879919 LM2= 0.022263916
	weight-file: /home/baselines-emnlp2016/models/sparse/moses.wiki.sparse
line=CorrectionPattern factor=0 context=1 context-factor=1
Initializing correction pattern feature..
FeatureFunction: CorrectionPattern0 start: 0 end: 18446744073709551615
line=CorrectionPattern factor=1
Initializing correction pattern feature..
FeatureFunction: CorrectionPattern1 start: 0 end: 18446744073709551615
line=OpSequenceModel path=/home/baselines-emnlp2016/models/data/osm.kenlm input-factor=0 output-factor=0 support-features=no
FeatureFunction: OpSequenceModel0 start: 0 end: 4
Exception: moses/FF/Factory.cpp:191 in static void Moses::FeatureFactory::DefaultSetup(F*) [with F = Moses::OpSequenceModel] threw util::Exception because `weights.size() != feature->GetNumScoreComponents()'.
FATAL ERROR: Mismatch in number of features and number of weights for Feature Function OpSequenceModel0 (features: 5 vs. weights: 1)

Suspicious casing while reproducing the conll14 results

Hi,

I want to reproduce the same (or at least very similar) m2 scores on the official conll14 test set. Following the README file, I successfully set up the environment and could get some results by the following command:

python2 models/run_gecsmt.py \
    -f models/moses.dense-cclm.mert.avg.ini \
    -w reproduce/ \
    -i conll14st-test/noalt/official-2014.combined.m2 \
    --m2 \
    -o reproduce/conll.out \
    --moses $PWD/build/mosesdecoder \
    --lazy $PWD/build/lazy \
    --scripts $PWD/train/scripts

The output file was supposed to be almost (if not exactly) the same as your submission, and so should the m2 scores be. However, I only got the following m2 scores:

Precision : 0.5977
Recall : 0.2794
F_0.5 : 0.4868

while the reported F0.5 is 0.4893, which is what I was expecting.

I vimdiffed my output against yours and found that my output contained a few casing mistakes while yours does not. For example, in the middle part of sentence 333, my output was:

... doctors to disclose information To Patients Relatives.It challenges The Confidentiality and privacy principles.Currently , under the Health Insurance Portability and ...

The bolded tokens look suspicious. Their first letters are all capitalized, but they are not capitalized in the original input. Your output looks fine there, too.

I dug a little into the script models/run_gecsmt.py and realized maybe something goes wrong during the recasing phase. More specifically, at line 78:

run_cmd("cat {pfx}.out.tok" \
" | {scripts}/impose_case.perl {pfx}.in {pfx}.out.tok.aln" \
" | {moses}/scripts/tokenizer/deescape-special-chars.perl" \
" | {scripts}/impose_tok.perl {pfx}.in > {pfx}.out" \
.format(pfx=prefix, scripts=args.scripts, moses=args.moses))

It looks like we are recasing the output (tokenized) using the raw input (untokenized) and the alignment file. I suspect this is incorrect because the alignment file is based on the tokenized files, and we should do something like this:

{scripts}/impose_case.perl {pfx}.in.tok {pfx}.out.tok.aln

I did try doing so. While I successfully got the correct cases for the example above, now all sentence-initial letters are lowercase too.

This got me totally confused. How can I get the expected results and scores? What seems to be the problem? Could you shed some light?

For your reference, I also attached my output and logs here.

run.log
conll.out.txt

FDException while reading wikilm/wiki.blm

Hi,
Thank you for open-sourcing your fantastic work. I encountered an error while running the script run_gecsmt.py. The error message is as follows:

util/file.cc:138 in std::size_t util::PartialRead(int, void *, std::size_t) threw FDException because `ret < 0'.
Invalid argument in fd 3 while reading 21992807322 bytes File: /Users/admin/fhs/smt-baseline/baselines-emnlp2016-master/wikilm/wiki.blm
Done

I tried running the script tokenizer.perl individually to tokenize the data, and it works to a certain degree. But the M2 score I got is far from the result in the paper:
Precision : 0.5617
Recall : 0.2371
F_0.5 : 0.4409
At the same time, I ran the evaluation script with the sparse output in the folder outputs and got:
Precision : 0.5854
Recall : 0.2493
F_0.5 : 0.4610
There is a huge difference between my result and yours. Where is the problem?

Error while running run_gecsmt.py

Run: cp models/data_gec/wi-locness-dev-origin.err /Users/admin/fhs/smt-baseline/baselines-emnlp2016-master/workdir/wi-locness-dev-origin.in
Run: /Users/admin/fhs/smt-baseline/baselines-emnlp2016-master/train/scripts/m2_tok/detokenize.py < /Users/admin/fhs/smt-baseline/baselines-emnlp2016-master/workdir/wi-locness-dev-origin.in | /Users/admin/fhs/smt-baseline/moses/mosesdecoder-master/scripts/tokenizer/tokenizer.perl -threads 16 | /Users/admin/fhs/smt-baseline/baselines-emnlp2016-master/train/scripts/case_graph.perl --lm /Users/admin/fhs/smt-baseline/baselines-emnlp2016-master/wikilm/wiki.blm --decode /Users/admin/fhs/smt-baseline/lazy/lazy-master/bin/decode > /Users/admin/fhs/smt-baseline/baselines-emnlp2016-master/workdir/wi-locness-dev-origin.in.tok
Tokenizer Version 1.1
Language: en
Number of threads: 16
Using 16 threads
Creating Graphs
Loading /Users/admin/fhs/smt-baseline/baselines-emnlp2016-master/wikilm/wiki.blm
Recasing
util/file.cc:136 in std::size_t util::PartialRead(int, void *, std::size_t) threw FDException because `ret < 0'.
Invalid argument in fd 3 while reading 21992807322 bytes File: /Users/admin/fhs/smt-baseline/baselines-emnlp2016-master/wikilm/wiki.blm
Done

Here are the errors I get while running the script run_gecsmt.py. It seems that the system can't tokenize the input.
