missing-word's Introduction

Locating and Filling Missing Words in Sentences

Introduction

This project locates and fills in missing words in sentences using language modeling. By exploiting the statistical connection between the candidate word and the words surrounding the candidate location, it achieves a missing-word location accuracy of up to 52% and a missing-word filling accuracy of up to 32%.

Table of Contents

  • What it can do
  • What it includes
  • Contributors
  • Additional information

What it can do

Train the language model on a large training set containing millions of complete sentences, then apply the model to each incomplete sentence to predict the location of the missing word and fill it in. Note that each sentence should have exactly one missing word, and that word can be neither the first nor the last word of the sentence.
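The train-then-predict workflow described above can be illustrated with a toy bigram model. This is only a minimal sketch, not the method implemented in the repository's scripts: the function names (train_bigrams, locate_gap, fill_gap), the add-one smoothing, and the restriction to adjacent word pairs are all assumptions made for illustration.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams over complete, whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def locate_gap(words, unigrams, bigrams):
    """Return index i such that the missing word most plausibly sits before words[i].

    The gap is never at the start or end, so i ranges over 1..len(words)-1.
    The adjacent pair that co-occurs least often in training is the suspect.
    """
    vocab_size = len(unigrams)

    def pair_score(i):
        left, right = words[i - 1], words[i]
        # Add-one smoothed estimate of P(right | left) (illustrative choice).
        return (bigrams[(left, right)] + 1) / (unigrams[left] + vocab_size)

    return min(range(1, len(words)), key=pair_score)


def fill_gap(words, i, unigrams, bigrams):
    """Pick the vocabulary word that best bridges words[i-1] and words[i]."""
    vocab_size = len(unigrams)
    left, right = words[i - 1], words[i]

    def candidate_score(w):
        # Proportional to smoothed P(w | left) * P(right | w); the factor
        # depending only on `left` is constant across candidates and dropped.
        return ((bigrams[(left, w)] + 1) * (bigrams[(w, right)] + 1)
                / (unigrams[w] + vocab_size))

    return max(unigrams, key=candidate_score)


# Toy usage: the training corpus and the incomplete sentence are made up.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sat on a mat",
]
unigrams, bigrams = train_bigrams(corpus)
incomplete = "the cat on the mat".split()
position = locate_gap(incomplete, unigrams, bigrams)
word = fill_gap(incomplete, position, unigrams, bigrams)
restored = incomplete[:position] + [word] + incomplete[position:]
print(position, word, " ".join(restored))
```

Scoring only positions 1 through len(words)-1 reflects the constraint that the missing word is never the first or the last one. A real model trained on millions of sentences would likely use longer contexts and better smoothing than this sketch does.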

What it includes

  • data\vocabulary-14126.txt: vocabulary listing 14216 high-frequency words and their corresponding frequencies
  • data\train_v2.txt and test_v2.txt: too large to include; please download them from https://www.kaggle.com/c/billion-word-imputation/data, or use your own data and regenerate the high-frequency vocabulary.
  • LocationAndFillingCrossValidation.py: model training and cross validation (based on train_v2.txt only)
  • LocationAndFillingTestData.py: model training and result generation for the test data (train_v2.txt and test_v2.txt)
  • Presentation Slides.pdf & Project Report.pdf: supporting documents
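If you use your own data, the high-frequency vocabulary mentioned in the list above has to be regenerated. The sketch below shows one way this could be done by counting whitespace-separated tokens. The function name build_vocabulary and the tokenization are assumptions, and the sketch does not reproduce the format of the vocabulary file shipped in the data folder, which is not documented here.

```python
from collections import Counter


def build_vocabulary(lines, size):
    """Return the `size` most frequent words with their counts.

    `lines` is any iterable of sentences (for example an open text file);
    tokens are split on whitespace, which is an illustrative simplification.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts.most_common(size)


# Toy usage with in-memory sentences instead of a large training file.
sample = ["a b a", "b a c"]
for word, frequency in build_vocabulary(sample, 2):
    print(word, frequency)
```

Because the function accepts any iterable of lines, a large training file can be streamed through it by passing the open file object, so the whole corpus never has to be held in memory at once.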

Contributors

  • Tianlong Song
  • Zhe Wang

Additional information

Please refer to the presentation slides and project report in this repository. A brief theoretical explanation is also available on Tianlong's blog.
