learning_to_speak

Natural Language Processing

This repository is an attempt to implement all the algorithms in Speech and Language Processing (Second Edition) by Daniel Jurafsky and James H. Martin.

Dependencies

The installations.txt file lists all the libraries this project needs.

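In addition, you need two pieces of NLTK data: the WordNet corpus and the Punkt tokenizer models that word_tokenize relies on. Download them from a Python shell in your terminal:

import nltk
nltk.download('wordnet')  # WordNet corpus
nltk.download('punkt')    # Punkt models used by nltk.word_tokenize
# recent NLTK releases may also need: nltk.download('punkt_tab')

Running nltk.download() with no arguments instead opens the interactive downloader.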

How to Run

The simplest way to run an algorithm right now is to go into its folder. If the folder contains a run.py file, you can run it with

python run.py <arguments>

To see which arguments a given run.py accepts, use argparse's help option:

python run.py -h

Done:

Simplified Lesk Algorithm

This algorithm finds the sense in which a word is used by comparing the words around it with the WordNet definitions and examples of each candidate sense, and choosing the sense with the largest overlap.
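The core of Simplified Lesk fits in a few lines. This is a minimal, self-contained sketch, not the repository's code: instead of querying WordNet it takes a hypothetical senses dictionary mapping each sense name to a gloss and example sentences, and it uses a small hand-written stop-word list (both assumptions). As in the textbook pseudocode, the first sense is the default when no sense overlaps the context. The function names tokens and simplified_lesk are illustrative.

```python
import re

# Small illustrative stop-word list (an assumption; nltk.corpus.stopwords is a fuller option).
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "at", "it", "that", "and", "to",
              "is", "he", "they", "my", "can", "will", "because", "into", "up"}


def tokens(text):
    """Lower-cased set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def simplified_lesk(word, sentence, senses):
    """Pick the sense whose gloss and examples share the most words with the sentence.

    senses maps a sense name to (gloss, [example sentences]); the first entry is
    used as the default (most frequent) sense when no sense overlaps the context.
    """
    # The target word itself and stop words are excluded from the overlap count.
    context = tokens(sentence) - STOP_WORDS - {word}
    best_sense = next(iter(senses))
    max_overlap = 0
    for sense, (gloss, examples) in senses.items():
        signature = tokens(gloss + " " + " ".join(examples)) - STOP_WORDS - {word}
        overlap = len(signature & context)
        if overlap > max_overlap:
            best_sense, max_overlap = sense, overlap
    return best_sense


# Hypothetical hand-coded two-sense inventory for "bank", standing in for WordNet lookups.
SENSES = {
    "financial": ("a financial institution that accepts deposits and channels "
                  "the money into lending activities",
                  ["he cashed a check at the bank",
                   "that bank holds the mortgage on my home"]),
    "river": ("sloping land (especially the slope beside a body of water)",
              ["they pulled the canoe up on the bank",
               "he sat on the bank of the river and watched the currents"]),
}

print(simplified_lesk("bank", "The bank can guarantee deposits will eventually cover "
                      "future tuition costs because it invests in adjustable-rate "
                      "mortgage securities.", SENSES))  # financial ("deposits", "mortgage")
```

Swapping SENSES for real WordNet synsets would mean building the (gloss, examples) pairs from each synset's definition and example sentences.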

Quora Question Pairs

As part of a class project, we attempted the Quora Question Pairs competition, whose task is to decide whether two questions are duplicates, and implemented three basic methods to identify duplicate pairs.

  1. Adapted Lesk Algorithm: The Lesk algorithm identifies the sense in which each word is used, so its output gives a semantic understanding of the sentence. To compare two sentences, we used these senses to generate a similarity score based on how similar the words in the two sentences are.

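  2. Cosine Similarity using TF-IDF: TF-IDF is one of the basic methods for turning sentences into vectors. Each sentence becomes a vector of term weights, where a term's weight grows with how often it appears in the sentence and shrinks with how many documents contain it. This made it a natural way to vectorize the two questions and measure their similarity as the cosine of the angle between the vectors.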

  3. LSTM using Doc2Vec: This worked best of the three. It required little new code: mostly assembling existing components and training an LSTM for hours, with Doc2Vec vectors as input and the duplicate/not-duplicate class as output.
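The TF-IDF cosine-similarity method can be sketched without external libraries. This is a minimal illustration under our own assumptions, not the code used for the competition: it uses a simple regex tokenizer, fits IDF on a collection of questions with the smoothed formula idf(t) = ln((1 + N) / (1 + df(t))) + 1 (the default in scikit-learn's TfidfVectorizer), and drops words not seen during fitting. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-cased alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(corpus):
    """Smoothed IDF per term: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector as {term: raw count * idf}; unseen terms are dropped."""
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


questions = ["How do I learn Python?",
             "How can I learn Python fast?",
             "What is the capital of France?"]
idf = fit_idf(questions)
vectors = [tfidf_vector(q, idf) for q in questions]
print(cosine_similarity(vectors[0], vectors[1]))  # positive: shares how, i, learn, python
print(cosine_similarity(vectors[0], vectors[2]))  # 0.0: no words in common
```

Fitting IDF over many questions rather than just the pair being compared matters here: with the unsmoothed ln(N / df) and N = 2, every word shared by both questions would get weight zero.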

Context Free Grammar Parser

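The CFG parser loads a grammar and uses it to parse (tag) sentences. It is a dynamic program, as explained in the Speech and Language Processing book: for a sentence of n words it fills O(n^2) chart cells, and each cell considers up to n split points, so the parse takes O(n^3) time (times a factor that depends on the grammar size).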
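A CKY-style parser is one common way to implement this dynamic program. The sketch below is illustrative, not the repository's implementation: it assumes the grammar is in Chomsky normal form and is stored in two hypothetical dictionaries, lexical rules (word to the set of nonterminals A with A → word) and binary rules ((B, C) to the set of A with A → B C). The three nested loops over span length, start position and split point are where the O(n^3) running time comes from.

```python
from itertools import product


def cky_parse(words, lexical, binary, start="S"):
    """Return one parse tree (nested tuples) rooted at `start`, or None."""
    n = len(words)
    # table[i][j] maps nonterminal -> backpointer for the span words[i:j]
    table = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for a in lexical.get(w, ()):
            table[i][i + 1][a] = w  # leaf backpointer: the word itself
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for b, c in product(table[i][k], table[k][j]):
                    for a in binary.get((b, c), ()):
                        if a not in table[i][j]:  # keep the first derivation found
                            table[i][j][a] = (k, b, c)
    if start not in table[0][n]:
        return None

    def build(i, j, a):
        back = table[i][j][a]
        if isinstance(back, str):
            return (a, back)
        k, b, c = back
        return (a, build(i, k, b), build(k, j, c))

    return build(0, n, start)


# Hypothetical toy grammar in Chomsky normal form.
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}
BINARY = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}

print(cky_parse("the dog saw the cat".split(), LEXICAL, BINARY))
print(cky_parse("dog the saw".split(), LEXICAL, BINARY))  # None: not derivable
```

Grammars that are not already in Chomsky normal form would need to be converted first (for example by binarizing longer right-hand sides).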

Probabilistic Context Free Grammar Parser

This extends the CFG parser to a probabilistic CFG: each grammar rule carries a probability, and the parser keeps only the most probable parse of the sentence.
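Probabilistic CKY is one way to select the most probable parse: each chart cell keeps, for every nonterminal, only the highest-probability way to build it over that span, plus a backpointer. The sketch below is an illustration, not the repository's code. The toy grammar, its rule probabilities and the dictionary format (word to {A: P(A → word)}, (B, C) to {A: P(A → B C)}) are hypothetical, chosen to show a prepositional-phrase attachment ambiguity.

```python
def probabilistic_cky(words, lexical, binary, start="S"):
    """Return (probability, tree) of the most probable parse, or None."""
    n = len(words)
    # best[i][j] maps nonterminal -> (probability, backpointer) for words[i:j]
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for a, p in lexical.get(w, {}).items():
            best[i][i + 1][a] = (p, w)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for b, (pb, _) in best[i][k].items():
                    for c, (pc, _) in best[k][j].items():
                        for a, p_rule in binary.get((b, c), {}).items():
                            p = p_rule * pb * pc
                            # Viterbi step: keep only the best way to build A here.
                            if p > best[i][j].get(a, (0.0, None))[0]:
                                best[i][j][a] = (p, (k, b, c))
    if start not in best[0][n]:
        return None

    def build(i, j, a):
        back = best[i][j][a][1]
        if isinstance(back, str):
            return (a, back)
        k, b, c = back
        return (a, build(i, k, b), build(k, j, c))

    return best[0][n][start][0], build(0, n, start)


# Hypothetical toy PCFG in Chomsky normal form (probabilities invented for illustration).
PCFG_LEXICAL = {"I": {"NP": 0.2}, "saw": {"V": 1.0}, "the": {"Det": 1.0},
                "man": {"N": 0.5}, "telescope": {"N": 0.5}, "with": {"P": 1.0}}
PCFG_BINARY = {("NP", "VP"): {"S": 1.0}, ("V", "NP"): {"VP": 0.7},
               ("VP", "PP"): {"VP": 0.3}, ("NP", "PP"): {"NP": 0.2},
               ("Det", "N"): {"NP": 0.6}, ("P", "NP"): {"PP": 1.0}}

prob, tree = probabilistic_cky("I saw the man with the telescope".split(),
                               PCFG_LEXICAL, PCFG_BINARY)
print(prob)  # about 0.00378: the PP attaches to the VP under these probabilities
print(tree)
```

With these numbers the verb-phrase attachment scores 0.00378 against 0.00252 for attaching "with the telescope" to "the man". Longer sentences would usually sum log probabilities instead of multiplying to avoid floating-point underflow.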

TODO

  • Automata - NDRecognise

  • Automata - DRecognise

  • Conversational Bot (LSTM)

  • Brill Tagger

Contributors

antidigest, evamy
