Sarcasm Detection in Reddit Comments

This repository showcases my work on the sarcasm detection task.
Problem description: Given raw comments from Reddit, we have to classify them as sarcastic or not.
Dataset source: https://www.kaggle.com/danofer/sarcasm
Paper referred: https://arxiv.org/abs/1610.08815

Description of each notebook

  1. prepare-data-csv.ipynb: Using raw data source to create usable data CSVs
  2. Data cleaning and EDA.ipynb: Cleaning text and some exploratory data analysis (EDA) on our data
  3. Modelling.ipynb: CNN model for text classification task

Dataset details

We have prepared a perfectly balanced dataset for our task.

         Sarcastic (1)   Not sarcastic (0)
Train    400000          400000
CV       50000           50000
Test     50000           50000
Total    500000          500000
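
A minimal sketch of how a perfectly balanced 80/10/10 train/CV/test split with the proportions above could be produced. The helper name `balanced_split`, the seed, and the toy comments are illustrative assumptions; the notebooks may prepare the splits differently.

```python
import random


def balanced_split(sarcastic, not_sarcastic, fractions=(0.8, 0.1), seed=0):
    """Split two equally sized classes into train/CV/test sets.

    Hypothetical helper, not taken from the notebooks. `fractions` gives the
    train and CV shares; the remainder goes to the test set. Each class is
    cut at the same indices, so every split stays perfectly balanced.
    """
    if len(sarcastic) != len(not_sarcastic):
        raise ValueError("both classes must have the same number of comments")
    rng = random.Random(seed)
    n = len(sarcastic)
    n_train = round(n * fractions[0])
    n_cv = round(n * fractions[1])
    splits = {"train": [], "cv": [], "test": []}
    for label, comments in ((1, sarcastic), (0, not_sarcastic)):
        shuffled = list(comments)
        rng.shuffle(shuffled)
        splits["train"] += [(c, label) for c in shuffled[:n_train]]
        splits["cv"] += [(c, label) for c in shuffled[n_train:n_train + n_cv]]
        splits["test"] += [(c, label) for c in shuffled[n_train + n_cv:]]
    # Mix the two classes within each split.
    for rows in splits.values():
        rng.shuffle(rows)
    return splits


# Toy data: 50 comments per class instead of 500000.
toy = balanced_split([f"s{i}" for i in range(50)], [f"n{i}" for i in range(50)])
print({name: len(rows) for name, rows in toy.items()})
```

With 500000 comments per class, the same fractions give 400000 / 50000 / 50000 per class.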

Modelling

We've used 1D CNN models to extract features from raw texts and make classifications. We've used combinations of three different kinds of features:

  1. Content features from raw texts
  2. Sentiment features obtained via transfer learning from a model trained on a Twitter dataset. More information can be found here: https://github.com/NamanJain2050/semeval-2014-task-9/
  3. Emotion features obtained via transfer learning from two models, each trained on a different dataset. More information can be found here: https://github.com/NamanJain2050/emotion-detection
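
To illustrate how these feature kinds can be combined, here is a rough NumPy sketch of a 1D convolution with global max pooling over token embeddings, followed by concatenation with sentiment and emotion vectors. All sizes, the random weights, and the function names `conv1d_global_max` and `combine_features` are assumptions made for illustration; they are not the architecture used in Modelling.ipynb.

```python
import numpy as np


def conv1d_global_max(embedded, filters):
    """Apply 1D convolution filters over a token sequence, then global max pooling.

    embedded: (seq_len, emb_dim) token embeddings for one comment.
    filters:  (n_filters, width, emb_dim) convolution kernels.
    Returns an (n_filters,) content feature vector.
    """
    seq_len, _ = embedded.shape
    _, width, _ = filters.shape
    # One window per position (valid convolution, stride 1).
    windows = np.stack([embedded[i:i + width] for i in range(seq_len - width + 1)])
    # Contract the width and embedding axes: result is (positions, n_filters).
    activations = np.tensordot(windows, filters, axes=([1, 2], [1, 2]))
    activations = np.maximum(activations, 0.0)  # ReLU
    return activations.max(axis=0)  # strongest response of each filter


def combine_features(content, sentiment=None, emotion=None):
    """Concatenate content features with optional pre-trained feature vectors."""
    parts = [content] + [f for f in (sentiment, emotion) if f is not None]
    return np.concatenate(parts)


rng = np.random.default_rng(0)
embedded = rng.normal(size=(20, 8))    # 20 tokens, 8-dim embeddings (toy sizes)
filters = rng.normal(size=(16, 3, 8))  # 16 filters of width 3 (assumed)
content = conv1d_global_max(embedded, filters)
sentiment = rng.normal(size=4)         # stand-in for the sentiment model's output
emotion = rng.normal(size=6)           # stand-in for an emotion model's output
features = combine_features(content, sentiment, emotion)
print(features.shape)  # (26,)
```

The combined vector would then feed a classification layer; passing only `content` corresponds to the content-only setup.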

Model 1: Using only content features

Predictions are made using only the content features extracted by the 1D CNN. Model architecture is as follows:

[Figure: model_01 architecture]

Results of this model are as follows:

[Figure: model_01 results]

We've achieved an F1-score of 0.7234 and were able to classify 73.58% of sarcastic comments correctly.
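
The share of sarcastic comments classified correctly is the recall of the sarcastic class. The sketch below shows how recall, precision, and F1-score follow from confusion-matrix counts, assuming the F1-score is computed for the sarcastic class. The counts are made up for illustration and are not the model's actual confusion matrix.

```python
def recall_precision_f1(tp, fp, fn):
    """Return (recall, precision, F1) for the positive (sarcastic) class.

    tp: sarcastic comments predicted as sarcastic
    fp: non-sarcastic comments predicted as sarcastic
    fn: sarcastic comments predicted as non-sarcastic
    """
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


# Made-up counts for illustration only, not the repository's results.
recall, precision, f1 = recall_precision_f1(tp=75, fp=30, fn=25)
print(f"recall={recall:.4f} precision={precision:.4f} f1={f1:.4f}")
```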

Model 2: Using content features + sentiment features

Predictions are made using content features extracted by the 1D CNN and sentiment features from the pre-trained model. Model architecture is as follows:

[Figure: model_02 architecture]

Results of this model are as follows:

[Figure: model_02 results]

We've achieved an F1-score of 0.7179 and were able to classify 71.78% of sarcastic comments correctly.

Model 3: Using content features + emotion features

Predictions are made using content features extracted by the 1D CNN and emotion features from a pre-trained model. Model architecture is as follows:

[Figure: model_03 architecture]

Results of this model are as follows:

[Figure: model_03 results]

We've achieved an F1-score of 0.7242 and were able to classify 71.75% of sarcastic comments correctly.

Model 4: Using content features + emotion features

Predictions are made using content features extracted by the 1D CNN and emotion features from a pre-trained model. This time we use the other pre-trained emotion model. Model architecture is as follows:

[Figure: model_04 architecture]

Results of this model are as follows:

[Figure: model_04 results]

We've achieved an F1-score of 0.7235 and were able to classify 72.07% of sarcastic comments correctly.

Model 5: Using content features + sentiment features + emotion features

Predictions are made using content features extracted by the 1D CNN, together with sentiment features and emotion features extracted from pre-trained models. Model architecture is as follows:

[Figure: model_05 architecture]

Results of this model are as follows:

[Figure: model_05 results]

We've achieved an F1-score of 0.7222 and were able to classify 70.71% of sarcastic comments correctly.

Model 6: Using content features + sentiment features + emotion features

Predictions are made using content features extracted by the 1D CNN, together with sentiment features and emotion features extracted from pre-trained models. Model architecture is as follows:

[Figure: model_06 architecture]

Results of this model are as follows:

[Figure: model_06 results]

We've achieved an F1-score of 0.7215 and were able to classify 72.1% of sarcastic comments correctly.

Summary of results

[Figure: summary]

Conclusions

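We've seen that adding emotion and sentiment features from pre-trained models has not helped: the F1-score changed by less than 0.006 in either direction relative to the content-only model (0.7234), and the share of sarcastic comments classified correctly dropped in all five combined models.
Possible reason:

  1. The pre-trained models were trained on much smaller datasets than our SARC dataset.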
