Semantic Textual Similarity using Natural Language Processing (NLP)

Abstract

Semantic textual similarity measures how equivalent two sentences are on the basis of their conceptual similarity. It is widely used in natural language processing tasks such as essay scoring, machine translation, text classification, information extraction, and question answering. This project focuses on one application of semantic textual similarity, automatic short answer grading (ASAG), which assigns a grade to a student's response by comparing it with one or more model answers. In particular, we selected a state-of-the-art short answer grading approach that uses the Stanford CoreNLP library and reimplemented the same approach with two open source libraries: Natural Language Toolkit (NLTK) and spaCy. For evaluation, we used the Texas dataset and an in-house ASAG benchmarking dataset based on the Mathematics for Robotics and Control (MRC) course. The performance of all three libraries was evaluated using the Pearson correlation coefficient, root mean square error (RMSE), and runtime. On the Texas dataset, the Stanford CoreNLP library achieved a higher Pearson correlation coefficient (0.66) and a lower RMSE (0.85) than the NLTK and spaCy libraries. On the MRC dataset, all three libraries showed comparable results on the evaluated metrics.
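The two accuracy metrics named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's evaluation code: the function names `pearson` and `rmse` and the example grades are assumptions made up for this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root mean square error between two equally long lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical grades, for illustration only (not taken from any dataset).
human = [5.0, 3.0, 4.0, 1.0]
predicted = [4.5, 3.5, 4.0, 2.0]

print("Pearson:", pearson(human, predicted))
print("RMSE:", rmse(human, predicted))
```

A higher Pearson value means the predicted grades rank and scale answers more like the human grader, while a lower RMSE means the predicted grades are closer in absolute terms.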

Contents of Repository

This repository contains:

    Exercises on textual similarity using the NLTK and spaCy libraries that can help with short answer grading

    Comparison of spell corrector approaches:
                    - Spell corrector using n-grams, Jaccard coefficient, and minimum edit distance
                    - Spell corrector using minimum edit distance (MED)

    Creation of Jupyter notebooks for each student from the Mohler dataset of short questions and answers

    Creation of the instructor version of assignments using nbgrader

    Creation of the student version of assignments using nbgrader

    Wiki covering the theoretical concepts: https://github.com/rameshjesswani/Semantic-Textual-Similarity/wiki

    Word aligner using the NLTK and spaCy libraries

    ASAG based on the Sultan et al. (2016) approach using the NLTK and spaCy libraries

Guidelines for Monolingual Word Aligner

It can be used as a standalone module. For more usage details, see: Word Aligner using NLTK and Spacy

Install the NLTK library (procedure given below)
Set up the Stanford Parser, NER, and POS Tagger (links to the NLTK setup instructions are given below)

Guidelines for ASAG

Details about ASAG can be found here: ASAG

Installation

NLTK requires Python versions 2.7, 3.4, or 3.5

Install NLTK library

sudo pip install -U nltk

Install the NLTK data packages

import nltk
nltk.download()

spaCy is compatible with 64-bit CPython 2.6+/3.3+ and runs on Unix/Linux, macOS/OS X, and Windows

Install the spaCy library (the code works with version 2.0.12)

pip install -U spacy

After installing spaCy, you need to download a language model

python -m spacy download en

Nbgrader Installation

pip install nbgrader

If you are using Anaconda:

conda install jupyter
conda install -c conda-forge nbgrader

To install nbgrader extensions:

jupyter nbextension install --sys-prefix --py nbgrader --overwrite
jupyter nbextension enable --sys-prefix --py nbgrader
jupyter serverextension enable --sys-prefix --py nbgrader

For more docs about nbgrader:

http://nbgrader.readthedocs.io/en/stable/user_guide/installation.html

To use the Stanford Parser, NER, and POS Tagger in NLTK, check these files:

https://github.com/rameshjesswani/Semantic-Textual-Similarity/blob/master/monolingualWordAligner/stanfordParser_setup.txt

https://github.com/rameshjesswani/Semantic-Textual-Similarity/blob/master/monolingualWordAligner/stanfordNERTagger_setup.txt

https://github.com/rameshjesswani/Semantic-Textual-Similarity/blob/master/monolingualWordAligner/stanfordPOSTagger_setup.txt

Mind Map

[Figure: mind map]

General NLP Pipeline

[Figure: general NLP pipeline]

Bibtex

@unpublished{[RnD]Kumar,
	Author = {Ramesh Kumar},
	Month = {January},
	Note = {WS17, H-BRS, Evaluation of Semantic Textual Similarity Approaches for Automatic Short Answer Grading, Ploeger, Nair supervising},
	Title = {Evaluation of Semantic Textual Similarity Approaches for Automatic Short Answer Grading},
	Year = {2017/18}}

semantic-textual-similarity's Issues

Demo for nbgrader: convert dataset to notebooks

  • Create a separate notebook for each student with the question and answers (the raw folder contains the answer and question files), either automated using nbformat or manually.
  • Add the Create Assignment cell toolbar to all notebooks and mark the cells as manually graded.
  • Release the assignment.
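The first step above (one notebook per student) could be automated roughly as follows. This is a minimal sketch that writes the notebook JSON directly with the standard library rather than through nbformat, so it runs without extra dependencies. The function names, the `student_<id>.ipynb` file naming, and the two-cell layout are assumptions made for this illustration, and the nbgrader cell metadata is deliberately left to the Create Assignment toolbar.

```python
import json
import os


def build_notebook(question, answer):
    """Return a minimal notebook (format version 4) as a plain dict."""
    def markdown_cell(text):
        return {"cell_type": "markdown", "metadata": {}, "source": text}

    return {
        "cells": [
            markdown_cell("**Question:** " + question),
            markdown_cell("**Answer:** " + answer),
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def write_student_notebooks(question, answers_by_student, out_dir):
    """Write one .ipynb file per student and return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for student_id, answer in sorted(answers_by_student.items()):
        # Hypothetical naming scheme: student_<id>.ipynb
        path = os.path.join(out_dir, "student_{}.ipynb".format(student_id))
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(build_notebook(question, answer), handle, indent=1)
        paths.append(path)
    return paths
```

Calling `write_student_notebooks(question, {"s01": "...", "s02": "..."}, "notebooks")` would produce `notebooks/student_s01.ipynb` and `notebooks/student_s02.ipynb`, which can then be opened in Jupyter and marked for manual grading.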

Read the Mohler dataset

  • Download the dataset from http://web.eecs.umich.edu/~mihalcea/downloads.html
    A larger collection of short student answers and grades for a course in Computer Science. The data set consists of 10 assignments (with 4-7 questions each) and 2 exams (with 10 questions each), with 30 student answers each. (July 1, 2011)

  • Read the questions, model answers, and answers provided by students

  • Answers provided by students should be easy to visualize (for example, with the ID of each student)

Create data set of misspelled and correct words

  • Create a data set (manually or from an online source) that pairs each misspelled word with its correct word.
  • Apply the spell corrector using minimum edit distance and the one using n-grams to each misspelled word from the data set.
  • Compare the results of both methods.
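The comparison described above can be sketched as follows. This is an illustrative version, not the repository's implementation: it assumes a Levenshtein distance with unit cost for insertion, deletion, and substitution, character bigrams for the n-gram method, and a tiny made-up vocabulary. All function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit insert/delete/substitute costs."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def char_ngrams(word, n=2):
    """Set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a, b, n=2):
    """Jaccard coefficient between the n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def correct_by_med(word, vocabulary):
    """Pick the vocabulary word with the smallest edit distance."""
    return min(vocabulary, key=lambda candidate: edit_distance(word, candidate))


def correct_by_ngram(word, vocabulary, n=2):
    """Pick the vocabulary word with the highest n-gram Jaccard overlap."""
    return max(vocabulary, key=lambda candidate: jaccard(word, candidate, n))


# Made-up vocabulary and misspelling, for illustration only.
vocabulary = ["spelling", "grading", "answer"]
print(correct_by_med("speling", vocabulary))
print(correct_by_ngram("speling", vocabulary))
```

Running both correctors over the same list of misspelled and correct word pairs and counting how often each returns the correct word gives a simple accuracy comparison between the two methods.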

Implement Monolingual Word Aligner

  • Create a monolingual word aligner using NLTK
  • Create a monolingual word aligner in spaCy; if some functions are not available in spaCy, combine it with NLTK functions
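To make the task concrete, here is a toy illustration of what a monolingual word aligner produces. It is not the aligner in this repository and not the Sultan et al. method: it assumes whitespace tokenization and aligns only identical lowercase tokens, greedily from left to right, whereas a real aligner would also use lemmas, part-of-speech tags, dependencies, and named entities. The function name `align_exact` and the index-pair output format are assumptions of this sketch.

```python
def align_exact(sentence_a, sentence_b):
    """Greedily align identical tokens; return (index_a, index_b) pairs."""
    tokens_a = sentence_a.lower().split()
    tokens_b = sentence_b.lower().split()
    used_b = set()  # each token in sentence_b is aligned at most once
    alignments = []
    for i, token in enumerate(tokens_a):
        for j, other in enumerate(tokens_b):
            if j not in used_b and token == other:
                alignments.append((i, j))
                used_b.add(j)
                break
    return alignments


# Made-up model answer and student answer, for illustration only.
print(align_exact("A stack is LIFO", "the stack is a LIFO structure"))
```

The proportion of aligned words in each sentence is one simple similarity signal that a grading approach could build on.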
