
Sarcasm Detection in Reddit Comments

This repository showcases my work on the sarcasm detection task.
Problem description: Given raw comments from Reddit, we have to classify them as sarcastic or not.
Dataset source: https://www.kaggle.com/danofer/sarcasm
Paper referred: https://arxiv.org/abs/1610.08815

Description of each notebook

prepare-data-csv.ipynb: Using raw data source to create usable data CSVs
Data cleaning and EDA.ipynb: Cleaning text and some exploratory data analysis (EDA) on our data
Modelling.ipynb: CNN model for text classification task
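
As an illustration of the text-cleaning step, here is a minimal sketch. The notebook's actual cleaning rules are not listed in this README, so the three steps below (lowercasing, dropping URLs, collapsing whitespace) and the name `clean_comment` are assumptions chosen only for illustration.

```python
import re

# Illustrative patterns only; the notebook may clean text differently.
URL_RE = re.compile(r"https?://\S+")
WS_RE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Lowercase a comment, drop URLs, and collapse runs of whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = WS_RE.sub(" ", text)
    return text.strip()


print(clean_comment("Oh GREAT,   another   link: https://example.com/x  thanks"))
# prints: oh great, another link: thanks
```

Punctuation is left untouched here on purpose, since marks such as "!" or "..." may carry signal for sarcasm; whether the notebook keeps them is not stated.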

Dataset details

We have prepared a perfectly balanced dataset for our task.

	Sarcastic (1)	Not sarcastic (0)
Train	400000	400000
CV	50000	50000
Test	50000	50000
Total	500000	500000
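
One way to obtain a split with equal class counts like the table above is sketched below. This is not the code from prepare-data-csv.ipynb; the function name `balanced_split`, its parameters, and the toy data are hypothetical, and only the 8:1:1 proportions are taken from the table.

```python
import random


def balanced_split(sarcastic, not_sarcastic, n_train, n_cv, n_test, seed=0):
    """Split two classes into train/CV/test so each split has equal class counts."""
    rng = random.Random(seed)
    splits = {"train": [], "cv": [], "test": []}
    bounds = {
        "train": (0, n_train),
        "cv": (n_train, n_train + n_cv),
        "test": (n_train + n_cv, n_train + n_cv + n_test),
    }
    for label, rows in ((1, sarcastic), (0, not_sarcastic)):
        rows = list(rows)
        if len(rows) < n_train + n_cv + n_test:
            raise ValueError("not enough rows for the requested split sizes")
        rng.shuffle(rows)
        for name, (lo, hi) in bounds.items():
            splits[name].extend((text, label) for text in rows[lo:hi])
    # Shuffle each split so the two classes are interleaved.
    for part in splits.values():
        rng.shuffle(part)
    return splits


# Toy run: the table's 400000/50000/50000 per class, scaled down to 8/1/1.
sarc = [f"s{i}" for i in range(10)]
non = [f"n{i}" for i in range(10)]
splits = balanced_split(sarc, non, n_train=8, n_cv=1, n_test=1)
print({name: len(part) for name, part in splits.items()})
# prints: {'train': 16, 'cv': 2, 'test': 2}
```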

Modelling

We've used 1D CNN models to extract features from raw texts and make classifications. We've used combinations of three different kinds of features:

Content features from raw texts
Sentiment features using transfer learning, from a model trained on a Twitter dataset. More information can be found here: https://github.com/NamanJain2050/semeval-2014-task-9/
Emotion features using transfer learning, from two models trained on two different datasets. More information can be found here: https://github.com/NamanJain2050/emotion-detection
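
To make the idea of combining these feature kinds concrete, here is a small dependency-free sketch. It assumes that content features come from 1D convolution filters followed by ReLU and max-over-time pooling, and that the sentiment and emotion vectors are simply concatenated onto them. The README does not state the filter sizes or how the features are merged, so `conv1d_max_pool`, `combine_features`, and all numbers below are hypothetical.

```python
def conv1d_max_pool(embedded, filters):
    """Slide each filter over the token sequence, apply ReLU, keep the max activation.

    embedded: list of token vectors, shape (seq_len, emb_dim)
    filters:  list of filters, each a list of weight vectors, shape (width, emb_dim)
    """
    features = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU floor: negative activations are clipped to 0
        for start in range(len(embedded) - width + 1):
            window = embedded[start:start + width]
            activation = sum(
                w * x
                for w_vec, x_vec in zip(filt, window)
                for w, x in zip(w_vec, x_vec)
            )
            best = max(best, activation)
        features.append(best)
    return features


def combine_features(content, sentiment=None, emotion=None):
    """Concatenate whichever feature vectors are available (assumed merge strategy)."""
    return list(content) + list(sentiment or []) + list(emotion or [])


# Toy input: 3 tokens with 2-dimensional embeddings, two width-2 filters.
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
content = conv1d_max_pool(embedded, filters)
combined = combine_features(content, sentiment=[0.1, 0.9], emotion=[0.2, 0.3, 0.5])
print(content)        # prints: [2.0, 0.0]
print(len(combined))  # prints: 7
```

In a real model the filter weights are learned and the pre-trained sentiment and emotion networks supply their own vectors; the sketch only shows the shape of the computation.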

Model 1: Using only content features

Predictions are made using only the content features extracted by the 1D CNN. Model architecture is as follows:

Results of this model are as follows:

We achieved an F1-score of 0.7234 and correctly classified 73.58% of sarcastic comments.
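
For reference, the two kinds of numbers reported for each model can be computed from confusion-matrix counts as sketched below. The sketch treats the sarcastic class as the positive class; the README does not say whether its F1-score is per-class or averaged, so that is an assumption, and the counts in the example are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the positive class.

    tp: sarcastic comments predicted sarcastic
    fp: non-sarcastic comments predicted sarcastic
    fn: sarcastic comments predicted non-sarcastic
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, not results from this repository.
p, r, f1 = precision_recall_f1(tp=3, fp=1, fn=1)
print(p, r, f1)  # prints: 0.75 0.75 0.75
```

The share of sarcastic comments classified correctly corresponds to `recall` here: tp divided by all truly sarcastic comments.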

Model 2: Using content features + sentiment features

Predictions are made using content features extracted by the 1D CNN and sentiment features from a pre-trained model. Model architecture is as follows:

Results of this model are as follows:

We achieved an F1-score of 0.7179 and correctly classified 71.78% of sarcastic comments.

Model 3: Using content features + emotion features

Predictions are made using content features extracted by the 1D CNN and emotion features from a pre-trained model. Model architecture is as follows:

Results of this model are as follows:

We achieved an F1-score of 0.7242 and correctly classified 71.75% of sarcastic comments.

Model 4: Using content features + emotion features

Predictions are made using content features extracted by the 1D CNN and emotion features from a pre-trained model. This time we use a different pre-trained emotion model. Model architecture is as follows:

Results of this model are as follows:

We achieved an F1-score of 0.7235 and correctly classified 72.07% of sarcastic comments.

Model 5: Using content features + sentiment features + emotion features

Predictions are made using content features extracted by the 1D CNN, plus sentiment features and emotion features extracted from pre-trained models. Model architecture is as follows:

Results of this model are as follows:

We achieved an F1-score of 0.7222 and correctly classified 70.71% of sarcastic comments.

Model 6: Using content features + sentiment features + emotion features

Predictions are made using content features extracted by the 1D CNN, plus sentiment features and emotion features extracted from pre-trained models. Model architecture is as follows:

Results of this model are as follows:

We achieved an F1-score of 0.7215 and correctly classified 72.1% of sarcastic comments.

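Summary of results

Model	Features	F1-score	Sarcastic comments classified correctly
1	Content	0.7234	73.58%
2	Content + sentiment	0.7179	71.78%
3	Content + emotion	0.7242	71.75%
4	Content + emotion (different emotion model)	0.7235	72.07%
5	Content + sentiment + emotion	0.7222	70.71%
6	Content + sentiment + emotion	0.7215	72.1%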

Conclusions

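We've seen that adding emotion and sentiment features from pretrained models did not improve our results. The F1-score stayed within 0.006 of the content-only model (0.7179 to 0.7242, against 0.7234), and the share of sarcastic comments classified correctly fell in every combined model (70.71% to 72.1%, against 73.58%).
Possible reason(s):

The pretrained models were trained on much smaller datasets than our SARC dataset.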
