
smooth_bleu's Introduction

BLEU

An out-of-the-box Python script for sentence-level and corpus-level BLEU calculation. We recommend the nltk-based script (nltk_bleu.py), which requires installing nltk first.

Run python bleu.py -h or python nltk_bleu.py -h to see the help information

Usage

Run python bleu.py -h to see the help information.

  1. Input files

    • python bleu.py -t translation.file -r reference.file
    • To print sentence-level BLEU scores, set --sentence-level along with related parameters such as --smooth-epsilon and --smooth.
  2. Input strings

    python bleu.py -t "test bleu calculation" -r "test blue calculation"
    
    BLEU = 90.36,  66.7/0.0/0.0/0.0 (BP=1.0,  ratio=1.0,  hyp_len=3,  ref_len=3)
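The per-order precisions and the brevity penalty in the output above can be checked by hand. The following is a minimal sketch in plain Python, assuming whitespace tokenization and a single reference. The helper names (ngrams, clipped_counts, brevity_penalty) are illustrative and are not taken from bleu.py. It does not attempt to reproduce the final BLEU figure, which depends on the script's smoothing settings.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_counts(hypothesis, reference, n):
    """Return (matched, total) n-gram counts for one hypothesis/reference pair."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched, sum(hyp_counts.values())


def brevity_penalty(hyp_len, ref_len):
    """Standard BLEU brevity penalty; 1.0 when the hypothesis is not shorter."""
    if hyp_len == 0:
        return 0.0
    return 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)


hypothesis = "test bleu calculation".split()
reference = "test blue calculation".split()

precisions = []
for n in range(1, 5):
    matched, total = clipped_counts(hypothesis, reference, n)
    # A 3-token sentence has no 4-grams, so that order is reported as 0.0 here.
    precisions.append(100.0 * matched / total if total else 0.0)

bp = brevity_penalty(len(hypothesis), len(reference))
print("/".join(f"{p:.1f}" for p in precisions), f"BP={bp}")
# prints: 66.7/0.0/0.0/0.0 BP=1.0
```

Two of the three unigrams match ("test" and "calculation"), which gives 66.7, and no higher-order n-gram matches because "bleu" and "blue" differ.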

bleu.py is an updated version of this gist.

nltk-based

Instead of using bleu.py, one can use nltk_bleu.py based on nltk.

For example: python nltk_bleu.py -r reference.file -t translation.file

Similarly, one can use --sentence-level or -sl to print sentence-level BLEU score.
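Smoothing matters mostly at the sentence level, because a single n-gram order with zero matches drives the geometric mean to zero. The sketch below illustrates one common scheme, in which a zero match count is replaced by a small epsilon. This is an assumption made for illustration: it is not confirmed to be what --smooth-epsilon does in either script, and the function name sentence_bleu_sketch is hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu_sketch(hypothesis, reference, max_n=4, epsilon=0.0):
    """Single-reference sentence BLEU in [0, 1] with optional epsilon smoothing.

    epsilon is a hypothetical stand-in for a smoothing parameter: when an
    n-gram order has zero matches, the match count is replaced by epsilon.
    """
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
        total = max(1, sum(hyp.values()))  # avoid division by zero for short inputs
        if matched == 0:
            if epsilon <= 0:
                return 0.0  # one zero precision zeroes the geometric mean
            matched = epsilon
        log_precision_sum += math.log(matched / total)
    hyp_len, ref_len = len(hypothesis), len(reference)
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision_sum / max_n)


hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()

unsmoothed = sentence_bleu_sketch(hyp, ref)             # no 4-gram matches
smoothed = sentence_bleu_sketch(hyp, ref, epsilon=0.1)  # small positive score
print(unsmoothed, smoothed)
```

Without smoothing the score is exactly zero even though most unigrams and bigrams match. With a nonzero epsilon the sentence receives a small positive score that still reflects the missing 4-gram matches.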

TODO

  1. bleu.py correctness check
  2. better wrapper for nltk_bleu.py
  3. NLTK installation included?

smooth_bleu's People

Contributors

cshanbo

Forkers

andyshenas

smooth_bleu's Issues

tensor2tensor

Hi Justin - I saw that you are an original author of the WMT17 translation code for tensor2tensor. May you send me your email so I can ask you some questions about your thoughts and comparisons of Transformer model on English-Chinese translation? I can also be reached on twitter @twairball

Maybe the BLEU is wrong

In bleu.py, the function modified_precision(references, hypothesis, n) is defined as:

def modified_precision(references, hypothesis, n):
    # Extracts all ngrams in hypothesis.
    counts = Counter(ngrams(hypothesis, n))
    if not counts:
        return Fraction(0)
    # Extract a union of references' counts.
    max_counts = reduce(or_, [Counter(ngrams(ref, n)) for ref in references])
    # Assigns the intersection between hypothesis and references' counts.
    clipped_counts = {ngram: min(count, max_counts[ngram]) for ngram, count in counts.items()}
    return Fraction(sum(clipped_counts.values()), sum(counts.values()))

Fraction() returns the fraction in reduced form. For example, value = Fraction(3, 6) gives value.numerator = 1 and value.denominator = 2. However, when we accumulate n-gram counts, we should add 3, not 1. Looking forward to your reply.
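The concern can be reproduced with the standard library alone. The sketch below assumes that corpus-level code sums the numerators and denominators of per-sentence Fraction objects; whether bleu.py actually does so is not confirmed here. The sentence_stats values are made up for illustration.

```python
from fractions import Fraction

# Hypothetical per-sentence (matched, total) n-gram counts.
sentence_stats = [(3, 6), (1, 4)]

# Fraction reduces to lowest terms, so the raw counts are lost.
reduced = [Fraction(matched, total) for matched, total in sentence_stats]
lossy_matched = sum(f.numerator for f in reduced)      # 1 + 1 = 2
lossy_total = sum(f.denominator for f in reduced)      # 2 + 4 = 6

# Keeping the raw integers preserves the corpus-level totals.
raw_matched = sum(matched for matched, _ in sentence_stats)  # 3 + 1 = 4
raw_total = sum(total for _, total in sentence_stats)        # 6 + 4 = 10

print(reduced[0])                                      # prints: 1/2
print(lossy_matched, lossy_total, raw_matched, raw_total)
# prints: 2 6 4 10
```

Summing the reduced numerators and denominators yields 2/6, while the raw counts yield 4/10, so the two approaches disagree on the corpus-level precision. Returning the raw (matched, total) pair from the per-sentence function is one way to avoid the problem.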
