
simplification's Introduction

Contact: Wei Xu (Ohio State University)

Code, data and trained models from the following papers:

 @article{Xu-EtAl:2016:TACL,
 author = {Wei Xu and Courtney Napoles and Ellie Pavlick and Quanze Chen and Chris Callison-Burch},
 title = {Optimizing Statistical Machine Translation for Text Simplification},
 journal = {Transactions of the Association for Computational Linguistics},
 volume = {4},
 year = {2016},
 url = {https://cocoxu.github.io/publications/tacl2016-smt-simplification.pdf},
 pages = {401--415}
 }

and

 @article{Xu-EtAl:2015:TACL,
 author = {Wei Xu and Chris Callison-Burch and Courtney Napoles},
 title = {Problems in Current Text Simplification Research: New Data Can Help},
 journal = {Transactions of the Association for Computational Linguistics},
 volume = {3},
 year = {2015},
 url = {http://www.cis.upenn.edu/~ccb/publications/publications/new-data-for-text-simplification.pdf},
 pages = {283--297}
 }

Data

./tacl2016-smt-simplification.pdf the paper

./data/turkcorpus/ tuning and test data

*.norm       tokenized sentences from English Wikipedia

*.simp       tokenized, corresponding sentences from Simple English Wikipedia

*.turk.0~7   8 reference simplifications by different Amazon Mechanical Turkers 
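For concreteness, here is a minimal Python sketch of how these line-aligned files can be read together. The file stem test.8turkers.tok used below is an assumption for illustration; check the actual filenames under ./data/turkcorpus/.

    # Sketch: read the source sentences, the Simple Wikipedia sentences, and the
    # 8 Turker references; all files are line-aligned.
    # ASSUMPTION: the stem "test.8turkers.tok" is hypothetical.
    stem = "data/turkcorpus/test.8turkers.tok"

    with open(stem + ".norm", encoding="utf-8") as f:
        sources = [line.strip() for line in f]
    with open(stem + ".simp", encoding="utf-8") as f:
        simple = [line.strip() for line in f]

    references = []
    for i in range(8):
        with open("{}.turk.{}".format(stem, i), encoding="utf-8") as f:
            references.append([line.strip() for line in f])

    # references[k][j] is Turker k's simplification of source sentence j.
    print(sources[0])
    print(simple[0])
    print([refs[0] for refs in references])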

./data/systemoutputs/ 4 different system outputs compared in the paper

./data/ppdb/ppdb-1.0-xl-all-simp.gz (a 3.8G file) paraphrase rules (PPDB 1.0) with added simplification-specific features

./data/ppdb/ppdb-1.0-xxxl-lexical-self-simp.gz (a 27M file) self-paraphrase lexical rules that map words to themselves, and help to copy input words into outputs
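As a rough sketch of how one might inspect these rules in Python (assuming the standard PPDB "LHS ||| source ||| target ||| features ||| alignment" field layout; verify against the actual -simp files before relying on it):

    # Sketch: print the first few paraphrase rules from the gzipped PPDB file.
    # ASSUMPTION: fields are separated by " ||| " as in standard PPDB releases.
    import gzip

    with gzip.open("data/ppdb/ppdb-1.0-xxxl-lexical-self-simp.gz", "rt", encoding="utf-8") as f:
        for i, line in enumerate(f):
            fields = line.rstrip("\n").split(" ||| ")
            lhs, source, target = fields[0], fields[1], fields[2]
            print(lhs, source, "->", target)
            if i >= 4:
                break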

Code

./SARI.py a stand-alone Python implementation of the SARI metric for text simplification evaluation

There is also a Java implementation of SARI that is integrated into the Joshua codebase.
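A minimal usage sketch of the sentence-level metric is shown below; it assumes SARI.py is importable from the working directory and follows the SARIsent(source, output, references) call pattern used in the issues further down. The example sentences are made up, and inputs are expected to be tokenized.

    # Sketch: score one system output against several reference simplifications.
    from SARI import SARIsent  # assumes SARI.py is in the working directory

    source = "the cat perched upon the mat ."
    output = "the cat sat on the mat ."
    references = [
        "the cat sat on the mat .",
        "a cat sat on the mat .",
    ]

    score = SARIsent(source, output, references)
    print(score)  # a value between 0 and 1 (higher is better)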

Crowdsourcing User Interface Design (Human Evaluation and Data Collection)

./HIT_MTurk_crowdsourcing/ HTML interfaces designed for human evaluation of simplification systems, as well as parallel corpus collection (originally used in Amazon Mechanical Turk HITs)

The Text Simplification System

The text simplification system was implemented on top of the Joshua machine translation decoder.

./ppdb-simplification-release-joshua5.0.zip (a 281M file) The experiments in our TACL 2016 paper used Joshua 5.0. Example scripts for training the simplification system are under the ./bin/ directory. The joshua_TACL2016.config file is also provided; it corresponds to the best system in our paper. You may find the Joshua pipeline tutorial useful. Note that STAR is the corpus-level version of SARI, while SARI is sentence-level; the current STAR.java hardcodes the number of reference sentences to 8 and uses the F-score of the deletion operation rather than only its precision (you may want to change this before using it).

Preprocessing Scripts

./scripts_preprocessing/ The tokenizer and sentence splitter used for preprocessing.

simplification's People

Contributors

borgr, cocoxu, mounicam, socialmedia-class


simplification's Issues

Details on pre-processing

Hello,

Could you provide details on the pre-processing applied to the dataset? For example, which tokenizer was used (with which options)? Thank you.

ZeroDivisionError in SARI.py

In SARI.py, line 107, I think it should be "if len(addgramcounterall) > 0" instead of "if len(addgramcounter) > 0"

Recall score for the kept tokens in SARI

Thank you for making this code available! I was trying to understand how the different components of the SARI score are computed, and I wonder if I've misunderstood something or if there's an inconsistency between the code and the paper. Consider the following example.

Input: "a b"
Output: "b"
Ref-1: "a b"
Ref-2: "a"

Now if I manually compute the recall of kept tokens using Eq. 5 from the paper, I get

    r_{keep}(1) = [min(0, 1) + min(1, 1/2)] / [1 + 1/2] = 1/3,

where the first terms of the numerator and denominator correspond to "a" and the second terms to "b". However, the GitHub implementation gives me

    r_{keep}(1) = 1/2.

The reason is that in the code the terms of the numerator are divided individually by the corresponding denominator terms on line 58, instead of dividing the sum of the numerator terms by the sum of the denominator terms as done in Eq. 5 in the paper.
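To make the difference concrete, here is a small stand-alone sketch (unigrams only, with the numerator and denominator terms written out by hand for this example) that reproduces both aggregation strategies:

    # Per-unigram terms for Input "a b", Output "b", Ref-1 "a b", Ref-2 "a":
    #   "a": not kept in the output (numerator term 0); should be kept with weight 1
    #        (it appears in 2 of 2 references)
    #   "b": kept in the output (numerator term 1/2); should be kept with weight 1/2
    #        (it appears in 1 of 2 references)
    good = {"a": 0.0, "b": 0.5}   # numerator terms of Eq. 5
    total = {"a": 1.0, "b": 0.5}  # denominator terms of Eq. 5

    # Eq. 5 in the paper: one ratio of sums.
    r_keep_paper = sum(good.values()) / sum(total.values())            # 0.5 / 1.5 = 1/3

    # SARI.py (line 58): average of per-n-gram ratios.
    r_keep_code = sum(good[g] / total[g] for g in total) / len(total)  # (0 + 1) / 2 = 1/2

    print(r_keep_paper, r_keep_code)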

Replacing line 58 by:

    keeptmpscore2 += keepgramcountergood_rep[keepgram]

and line 65 by:

    keepscore_recall = keeptmpscore2 / sum(keepgramcounterall_rep.values())

seems to fix this and yields r_{keep}(1) = 1/3, as I would expect.

Have I missed something? Thanks in advance!

SARI does not work for short sequence?

Hi,

Thanks for the great work!
I have a problem with the SARI score. It looks like it does not support sequences with only one word, for example. See the example below:

test_src = "global"
test_pred = "scenarioPattern"
test_ref = "scenarioPattern"
print(SARIsent(test_src, test_pred, [test_ref]))

The score is 0.17 rather than 1.

Is this a bug, or is it expected behavior?

Thank you.

Questions About STAR

Hi, I have some questions about STAR.java. I found that this script uses the F-score of the deletion operation rather than the precision, which differs from the SARI definition:
    double recall_del_n = 0.0;
    if (delRefTotalNgram > 0) {
        recall_del_n = delCandCorrectNgram / (double) delRefTotalNgram;
    }

    double f1_del_n = meanHarmonic(prec_del_n, recall_del_n);

    sc += weights[n] * f1_del_n;

Could you tell me the reason?

Also, I tested STAR on the wiki-small corpus (1 reference) and found that using a random sentence as the simplified output can get a very high STAR score (about 33.7). Is this a normal phenomenon?

STAR - corpus-level SARI

In the readme it is mentioned:

Note that STAR is the corpus-level version of SARI, while SARI is sentence-level.

But I couldn't find any information on STAR anywhere. How is it calculated? Is it simply the mean of all sentence-level SARI scores?

Difference between paper equations and code

In Equation 7 of the paper, my understanding is that you compute the precision/recall for each n-gram order and average these over the maximum n-gram order (which is 4). Only after that do you calculate the F1 score of each operation, and then compute SARI/STAR by averaging them:

add_precision = (add_precision_1 + add_precision_2 + add_precision_3 + add_precision_4) / 4
add_recall = (add_recall_1 + add_recall_2 + add_recall_3 + add_recall_4) / 4
add_f1 = 2 * add_precision * add_recall / (add_precision + add_recall)

keep_precision = (keep_precision_1 + keep_precision_2 + keep_precision_3 + keep_precision_4) / 4
keep_recall = (keep_recall_1 + keep_recall_2 + keep_recall_3 + keep_recall_4) / 4
keep_f1 = 2 * keep_precision * keep_recall / (keep_precision + keep_recall)

del_precision = (del_precision_1 + del_precision_2 + del_precision_3 + del_precision_4) / 4

sari = (add_f1 + keep_f1 + del_precision) / 3

However, the code follows a different procedure. There, an F1 score (for each operation) is computed for each n-gram order. These are averaged over the maximum n-gram order and divided by 3 (the number of operations) at the end:

add_f1_1 = 2 * add_precision_1 * add_recall_1 / (add_precision_1 + add_recall_1)
add_f1_2 = 2 * add_precision_2 * add_recall_2 / (add_precision_2 + add_recall_2)
add_f1_3 = 2 * add_precision_3 * add_recall_3 / (add_precision_3 + add_recall_3)
add_f1_4 = 2 * add_precision_4 * add_recall_4 / (add_precision_4 + add_recall_4)

add_1 = (add_f1_1 + add_f1_2 + add_f1_3 + add_f1_4) / 4

keep_f1_1 = 2 * keep_precision_1 * keep_recall_1 / (keep_precision_1 + keep_recall_1)
keep_f1_2 = 2 * keep_precision_2 * keep_recall_2 / (keep_precision_2 + keep_recall_2)
keep_f1_3 = 2 * keep_precision_3 * keep_recall_3 / (keep_precision_3 + keep_recall_3)
keep_f1_4 = 2 * keep_precision_4 * keep_recall_4 / (keep_precision_4 + keep_recall_4)

keep_1 = (keep_f1_1 + keep_f1_2 + keep_f1_3 + keep_f1_4) / 4

del_precision = (del_precision_1 + del_precision_2 + del_precision_3 + del_precision_4) / 4

sari = (add_1 + keep_1 + del_precision) / 3

These are not mathematically equivalent, so the scores produced by both ways of calculating the metric are different. Which is the correct process then? The one in the paper or the one in the code?

Thanks for your help and clarification.

Access to Human Ratings

Hello,

Would it be possible to get access to the data collected for the human evaluation? That is, the ratings submitted by the Turkers for Grammaticality, Meaning Preservation, and Simplicity Gain for all the systems that were evaluated?

Thank you.
