ArSarcasm-v2 Dataset

ArSarcasm-v2 is an extension of the original ArSarcasm dataset, which was published along with the paper From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset. ArSarcasm-v2 consists of ArSarcasm along with portions of the DAICT corpus and some new tweets. Each tweet was annotated for sarcasm, sentiment and dialect. The final dataset consists of 15,548 tweets, divided into 12,548 training tweets and 3,000 testing tweets. ArSarcasm-v2 was used and released as part of the shared task on sarcasm detection and sentiment analysis in Arabic. You can find more details in the paper Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic.

Dataset details:

ArSarcasm-v2 is provided in CSV format, with the same split that was used for the shared task. The training set contains 12,548 tweets, while the test set contains 3,000 tweets.

The dataset contains the following fields:

  • tweet: the original tweet text.
  • sarcasm: boolean that indicates whether a tweet is sarcastic or not.
  • sentiment: the sentiment of the tweet (positive, negative, neutral).
  • dialect: the dialect used in the tweet. We used the 5 main regions in the Arab world; the labels and their meanings are as follows:
    • msa: Modern Standard Arabic.
    • egypt: the dialect of Egypt and Sudan.
    • levant: the Levantine dialect including Palestine, Jordan, Syria and Lebanon.
    • gulf: the Gulf countries including Saudi Arabia, UAE, Qatar, Bahrain, Yemen, Oman, Iraq and Kuwait.
    • magreb: the North African Arab countries including Algeria, Libya, Tunisia and Morocco.
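
As a rough illustration of the fields listed above, the following sketch reads rows with Python's standard `csv` module and checks each label against the documented values. It makes several assumptions that are not confirmed by this README: that the CSV has a header row with columns named exactly `tweet`, `sarcasm`, `sentiment` and `dialect`, that booleans are written as `True`/`False`, and that sentiment labels are spelled `positive`/`negative`/`neutral` in the file. The helper names `parse_bool` and `load_rows` and the sample rows are hypothetical; adjust the label sets if the actual files use different spellings.

```python
import csv
import io
from collections import Counter

# Label sets taken from the field descriptions above; the exact
# spellings used in the released CSV files are an assumption.
SENTIMENTS = {"positive", "negative", "neutral"}
DIALECTS = {"msa", "egypt", "levant", "gulf", "magreb"}


def parse_bool(value):
    """Interpret a textual boolean (assumed to be True/False or 1/0)."""
    text = value.strip().lower()
    if text in {"true", "1"}:
        return True
    if text in {"false", "0"}:
        return False
    raise ValueError(f"unexpected sarcasm value: {value!r}")


def load_rows(handle):
    """Read rows from an open CSV handle and validate the four fields."""
    rows = []
    for record in csv.DictReader(handle):
        sentiment = record["sentiment"].strip().lower()
        dialect = record["dialect"].strip().lower()
        if sentiment not in SENTIMENTS:
            raise ValueError(f"unexpected sentiment label: {sentiment!r}")
        if dialect not in DIALECTS:
            raise ValueError(f"unexpected dialect label: {dialect!r}")
        rows.append(
            {
                "tweet": record["tweet"],
                "sarcasm": parse_bool(record["sarcasm"]),
                "sentiment": sentiment,
                "dialect": dialect,
            }
        )
    return rows


# Made-up placeholder rows standing in for the real file contents.
SAMPLE = """tweet,sarcasm,sentiment,dialect
placeholder tweet one,False,neutral,msa
placeholder tweet two,True,negative,egypt
placeholder tweet three,False,positive,gulf
"""

rows = load_rows(io.StringIO(SAMPLE))
dialect_counts = Counter(row["dialect"] for row in rows)
sarcastic = sum(row["sarcasm"] for row in rows)
print(dialect_counts, sarcastic)
```

To run the same check on the released data, pass a file opened with `encoding="utf-8"` and `newline=""` to `load_rows` in place of the in-memory sample.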

Citation

Please use the following citation if you use ArSarcasm-v2:

@inproceedings{abufarha-etal-2021-arsarcasm-v2,
    title = "Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic",
    author = "Abu Farha, Ibrahim  and
      Zaghouani, Wajdi  and
      Magdy, Walid",
    booktitle = "Proceedings of the Sixth Arabic Natural Language Processing Workshop",
    month = apr,
    year = "2021",
}

Other resources

If you are interested in other Arabic NLP resources check:
