EmoNoBa: A Dataset for Analyzing Fine-Grained Emotions on Noisy Bangla Texts

This repository contains the implementation of our paper "EmoNoBa: A Dataset for Analyzing Fine-Grained Emotions on Noisy Bangla Texts". The paper was accepted at AACL-IJCNLP 2022. You can find the paper here.

Abstract

For low-resourced Bangla language, works on detecting emotions on textual data suffer from size and cross-domain adaptability. In our paper, we propose a manually annotated dataset of 22,698 Bangla public comments from social media sites covering 12 different domains such as Personal, Politics, and Health, labeled for 6 fine-grained emotion categories of the Junto Emotion Wheel. We invest efforts in the data preparation to 1) preserve the linguistic richness and 2) challenge any classification model. Our experiments to develop a benchmark classification system show that random baselines perform better than neural networks and pre-trained language models as hand-crafted features provide superior performance.

Authors

  • Khondoker Ittehadul Islam 1
  • Tanvir Hossain Yuvraz 1
  • Md Saiful Islam 1,2
  • Enamul Hassan 1

1 Shahjalal University of Science and Technology, Bangladesh

2 University of Alberta, Canada

The EmoNoBa dataset is available here.

List of files

  • Train.csv
  • Val.csv
  • Test.csv

Files Format

| Column Title | Description |
|---|---|
| Data | Social media comment |
| Love | 0, 1. '1' for Love, '0' for Not Love |
| Joy | 0, 1. '1' for Joy, '0' for Not Joy |
| Surprise | 0, 1. '1' for Surprise, '0' for Not Surprise |
| Anger | 0, 1. '1' for Anger, '0' for Not Anger |
| Sadness | 0, 1. '1' for Sadness, '0' for Not Sadness |
| Fear | 0, 1. '1' for Fear, '0' for Not Fear |
| Topic | Topic of the comment |
| Domain | Source of the comment from {Youtube, Facebook and Twitter} |
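
As a quick sanity check of this layout, the sketch below counts how many comments carry each emotion label. It uses only the standard library and assumes the CSV header row uses exactly the column titles listed above; read_rows and count_emotions are hypothetical helpers, not part of the repository, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Emotion columns as listed in the file format description.
EMOTIONS = ["Love", "Joy", "Surprise", "Anger", "Sadness", "Fear"]


def read_rows(handle):
    """Yield one dict per comment from an open CSV file handle."""
    yield from csv.DictReader(handle)


def count_emotions(rows):
    """Count how many comments carry each emotion label (value '1')."""
    counts = Counter({emotion: 0 for emotion in EMOTIONS})
    for row in rows:
        for emotion in EMOTIONS:
            if row[emotion].strip() == "1":
                counts[emotion] += 1
    return dict(counts)


# Made-up sample rows, for illustration only (not taken from the dataset).
SAMPLE = io.StringIO(
    "Data,Love,Joy,Surprise,Anger,Sadness,Fear,Topic,Domain\n"
    "sample comment one,1,1,0,0,0,0,Personal,Facebook\n"
    "sample comment two,0,0,0,1,0,0,Politics,Youtube\n"
)
print(count_emotions(read_rows(SAMPLE)))

# With the real files (the path is an assumption):
# with open("EmoNoBa Dataset/Train.csv", encoding="utf-8", newline="") as f:
#     print(count_emotions(read_rows(f)))
```

Because each emotion has its own 0/1 column, a single comment can contribute to more than one count.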

Installation

Requires the following packages:

  • Python 3.10.7 or higher

We recommend using a virtual environment tool such as virtualenv. Follow the steps below to set up the project:

  • Clone this repository: git clone https://github.com/KhondokerIslam/EmoNoBa.git
  • Install the required packages: pip install -r requirements.txt
  • Run the setup.sh file to download additional data and set up pre-processing
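
Before installing, it can help to confirm that the active interpreter satisfies the stated Python requirement. The following is a minimal sketch; meets_minimum is a hypothetical helper and not part of the repository.

```python
import sys

# Minimum version stated in the requirements above.
MINIMUM = (3, 10, 7)


def meets_minimum(version, minimum=MINIMUM):
    """Return True if a (major, minor, micro) version is at least `minimum`."""
    return tuple(version[:3]) >= minimum


if meets_minimum(sys.version_info):
    print("Python version OK")
else:
    print("Python %d.%d.%d or higher is required" % MINIMUM)
```

Running this inside the virtual environment checks the interpreter that pip install will actually use.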

Usage

  1. Download the EmoNoBa dataset from here
  2. Unzip the folder
  3. Ensure the folder name is "EmoNoBa Dataset"
  4. Go to the data_processing folder and run python preprocess.py to obtain the preprocessed data.
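
The folder and file checks in the steps above can be sketched as follows. The folder name "EmoNoBa Dataset" and the three CSV names come from this README; where the folder must sit relative to the scripts is not stated, so the location used here is an assumption, and missing_files is a hypothetical helper.

```python
from pathlib import Path

# File names from the "List of files" section.
EXPECTED_FILES = ("Train.csv", "Val.csv", "Test.csv")


def missing_files(dataset_dir):
    """Return the expected CSV files that are not present in `dataset_dir`."""
    folder = Path(dataset_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


# Assumes the unzipped folder is in the current working directory.
missing = missing_files("EmoNoBa Dataset")
print("Missing files:", missing or "none")
```

An empty result means all three files were found and preprocess.py can be run.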

Feature-Based Experiments

  • Go to the Models folder
  • Run python feature_based.py
  • When the console prompts for a model name, type one of the names listed below
  • Model names (see the paper for details of the experiments):
    • W1
    • W2
    • W3
    • W4
    • W1+W2
    • W1+W2+W3
    • W1+W2+W3+W4
    • C2
    • C3
    • C4
    • C5
    • C1+C2+C3
    • C1+C2+C3+C4
    • C1+C2+C3+C4+C5
    • W1+C1+C2+C3+C4+C5
    • W1+W2+W3+C1+C2+C3
    • W1+W2+W3+W4+C1+C2+C3
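
The model names above combine feature identifiers with +. A minimal sketch of splitting such a name into its parts follows. It assumes that W and C stand for word-level and character-level n-grams and that the digit is the n-gram order; this README does not define the letters, so consult the paper for the authoritative meaning. parse_model_name is a hypothetical helper, not part of the repository.

```python
# Assumed meaning of the prefixes; the paper is the authority on this.
LEVELS = {"W": "word", "C": "char"}


def parse_model_name(name):
    """Split a name such as 'W1+C2' into (level, order) pairs."""
    parts = []
    for token in name.split("+"):
        token = token.strip()
        prefix, order = token[:1].upper(), token[1:]
        if prefix not in LEVELS or not order.isdigit():
            raise ValueError(f"Unrecognised feature identifier: {token!r}")
        parts.append((LEVELS[prefix], int(order)))
    return parts


print(parse_model_name("W1+W2+W3+C1+C2+C3"))
```

Validating the name this way before typing it into the console catches typos such as a missing + or a stray letter.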

Neural Network Experiments

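Random Initialization

  • Go to the Models folder
  • Run python "neural_network_(random).py" to run an experiment. Keep the quotes around the file name: shells such as bash treat unquoted parentheses as syntax.

FastText

  • Go to the Models folder
  • Run python "neural_network_(embedding).py" to run an experiment. The quotes are needed here for the same reason.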

Bangla-BERT

  • Go to the Models folder
  • Run python bangla-bert.py to run an experiment.
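
Two of the script names above contain parentheses, which some shells interpret specially. One way to sidestep quoting is to launch the scripts from Python with an argument list. The sketch below uses the script names from this README; the short keys ("random", "fasttext", "bangla-bert") and both helper functions are hypothetical, and the Models folder is assumed to be in the current working directory.

```python
import subprocess
import sys

# Script names as given in the sections above; the keys are made-up labels.
SCRIPTS = {
    "random": "neural_network_(random).py",
    "fasttext": "neural_network_(embedding).py",
    "bangla-bert": "bangla-bert.py",
}


def experiment_command(experiment):
    """Build the argument list for one experiment; no shell quoting is needed."""
    return [sys.executable, SCRIPTS[experiment]]


def run_experiment(experiment, models_dir="Models"):
    """Run the script from inside the Models folder and raise on failure."""
    subprocess.run(experiment_command(experiment), cwd=models_dir, check=True)


# Show the command without running it.
print(experiment_command("random"))
```

Calling run_experiment("fasttext") would then start the FastText experiment with the same interpreter that runs this snippet.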

BibTeX

@inproceedings{islam2022emonoba,
  title={EmoNoBa: A Dataset for Analyzing Fine-Grained Emotions on Noisy Bangla Texts},
  author={Islam, Khondoker Ittehadul and Yuvraz, Tanvir and Islam, Md Saiful and Hassan, Enamul},
  booktitle={Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing},
  pages={128--134},
  year={2022}
}
