mtorht

Classifying Parallel Sentences as Machine or Human Translation

A corresponding blog post can be found here.

The classifier is implemented in the script classifier.py, located in the code/ directory. The script expects the data to be split into a train directory and a test directory, each containing the following files:

  1. source_ht : A text file containing the source sentences that were translated by a human
  2. trans_ht : A text file containing the target sentences translated by a human
  3. source_mt : A text file containing the source sentences that were translated by a machine
  4. trans_mt : A text file containing the target sentences translated by a machine

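The files are line-aligned: the sentence on a given line of source_ht is the source of the sentence on the same line of trans_ht, and likewise the sentence on a given line of source_mt is the source of the sentence on the same line of trans_mt.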
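The layout above can be read into labeled examples roughly as follows. This is a minimal sketch, not code from classifier.py: the names `read_lines`, `load_pairs` and `FILE_PAIRS`, and the label scheme (0 for human, 1 for machine translation), are assumptions made for illustration. It pairs each line of a source file with the same line of its translation file and rejects files whose line counts differ.

```python
from pathlib import Path

# Hypothetical mapping from label to (source file, translation file).
# The file names come from this README; the labels 0 = human, 1 = machine
# are an assumption made for this sketch.
FILE_PAIRS = {
    0: ("source_ht", "trans_ht"),  # human translations
    1: ("source_mt", "trans_mt"),  # machine translations
}


def read_lines(path: Path) -> list[str]:
    """Read a UTF-8 text file and return its lines without line endings."""
    return path.read_text(encoding="utf-8").splitlines()


def load_pairs(directory: str) -> list[tuple[str, str, int]]:
    """Return (source, translation, label) triples from one train or test directory.

    Line n of each source file is paired with line n of the matching
    translation file; mismatched line counts are treated as an error.
    """
    root = Path(directory)
    examples = []
    for label, (src_name, trans_name) in FILE_PAIRS.items():
        sources = read_lines(root / src_name)
        translations = read_lines(root / trans_name)
        if len(sources) != len(translations):
            raise ValueError(
                f"{src_name} has {len(sources)} lines "
                f"but {trans_name} has {len(translations)}"
            )
        examples.extend((s, t, label) for s, t in zip(sources, translations))
    return examples
```

For example, `load_pairs("data_for_code/train")` would return one `(source, translation, label)` triple per aligned line in the default training directory.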

Specifying train and test data:

By default, the script uses the data provided in the data_for_code/ directory. To specify the aligned sentence pairs to use as training data, use the "-tr" flag followed by the directory where the training data is stored. To specify the aligned sentence pairs to use as test data, use the "-te" flag followed by the directory where the test data is stored. Without any specified parameters, the classifier trains on the aligned sentence pairs in data_for_code/train and tests on the aligned sentence pairs in data_for_code/dev.
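The flag behaviour described above can be mirrored with Python's standard `argparse` module. This is a hypothetical reconstruction, not the actual argument handling in classifier.py: only the flag names `-tr` and `-te` and the default directories come from this README, while `build_parser`, `train_dir` and `test_dir` are illustrative names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the -tr / -te flags described in this README."""
    parser = argparse.ArgumentParser(
        description="Classify parallel sentences as human or machine translation."
    )
    # argparse accepts single-dash multi-character options such as -tr.
    parser.add_argument(
        "-tr",
        dest="train_dir",
        default="data_for_code/train",
        help="directory containing the training files (default: %(default)s)",
    )
    parser.add_argument(
        "-te",
        dest="test_dir",
        default="data_for_code/dev",
        help="directory containing the test files (default: %(default)s)",
    )
    return parser
```

Calling `build_parser().parse_args([])` yields the defaults data_for_code/train and data_for_code/dev, matching the behaviour when no parameters are given, while `parse_args(["-tr", "my/train"])` would override only the training directory.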

Specifying the type of classifier:

By default, the classifier uses a Support Vector Machine (SVM). To use a different type of classifier, uncomment the corresponding line between lines 173 and 178 of classifier.py. Currently, the classifier type cannot be set with a command-line argument.
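Since the classifier type is chosen by editing source lines, one way to expose it on the command line would be a registry mapping names to estimator factories. This is a hypothetical sketch: it assumes the script uses scikit-learn estimators, which this README does not state, and the `--classifier` flag, the registry keys and the helper names are invented for illustration and do not exist in classifier.py.

```python
import argparse


# Hypothetical registry: assumes scikit-learn estimators, which this README
# does not confirm. Each factory imports its estimator lazily, so only the
# selected estimator's module is loaded.
def _svm():
    from sklearn.svm import SVC
    return SVC()


def _logistic_regression():
    from sklearn.linear_model import LogisticRegression
    return LogisticRegression(max_iter=1000)


def _random_forest():
    from sklearn.ensemble import RandomForestClassifier
    return RandomForestClassifier()


CLASSIFIERS = {
    "svm": _svm,
    "logreg": _logistic_regression,
    "forest": _random_forest,
}


def parse_classifier_choice(argv=None) -> str:
    """Parse a hypothetical --classifier flag; the SVM remains the default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--classifier", choices=sorted(CLASSIFIERS), default="svm")
    return parser.parse_args(argv).classifier


def make_classifier(name: str):
    """Instantiate the estimator registered under `name`."""
    return CLASSIFIERS[name]()
```

Importing each estimator inside its factory keeps the registry cheap to build: scikit-learn is only needed when `make_classifier` is actually called, and an unknown name is rejected by `argparse` before any model is constructed.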

For any questions or comments, please email me at [email protected]

Contributors

azpoliak