Natural Language Processing

This repository is an attempt to implement all algorithms in the Speech and Language Processing, Second Edition book.

Dependencies

The

installations.txt

files contains all the libraries needed for this.

Other than this, you will need to download WordNet and word_tokenize modules using

import nltk
nltk.download()

in a python shell in your terminal.

How to Run ?

The simplest method right now to access each algorithm is to go to each of the folders, if there is a

run.py

file in the folder, you will be able to run it with simply

python run.py <arguments>

you can access what arguments each of those run files takes using the

python run.py -h

option for argparse help.

Done:

Simplified Lesk Algorithm

This algorithm finds the sense of the word by matching context with wordnet definitions and examples to identify a best sense.

Quora Question Pairs

Attempted to solve the Quora Question Pairs competition as part of the class project, so implemented three different basic methods to identify.

Adapted Lesk Algorithm: The Lesk algorithm was a good way to identify which words were being used in what sense and the output of the Lesk algorithm gave a semantic understanding of the sentence. When comparing two sentences, this understanding was used and a similarity score was generated based on the similarity of words in the two sentences.
Cosine Similarity using TFIDF: TFIDF is one of the basic methods to convert sentences to vectors. This is mostly used for numerical representation of words and so seemed a great method to vectorize and identify the similarity between the two sentences.
LSTM using Doc2Vec: Best implementation of the three. Actually this did not require a lot of implementation, just assembling of some methods and training the LSTM for hours over Doc2Vec input and class output.

Context Free Grammar Parser

CFG Parser loads the grammar and tags the sentences using the CFG. The program runs in O(n^2) and is a dynamic program as explained in the Speech and Language Processing book.

Probability Context Free Grammar Parser

This is an extension of the CFG parser where it selects only the most probable tagging of the text.

TODO

Automata - NDRecognise
Automata - DRecognise
Conversational Bot (LSTM)
Brill Tagger

antidigest / learning_to_speak Goto Github PK

learning_to_speak's Introduction

Natural Language Processing

Dependencies

How to Run ?

Done:

Simplified Lesk Algorithm

Quora Question Pairs

Context Free Grammar Parser

Probability Context Free Grammar Parser

TODO

learning_to_speak's People

Contributors

Stargazers

Watchers

learning_to_speak's Issues

Confusion regarding your repository

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent