Propaganda in Hindi

Repository of the data and models generated by Mr. Shyam Ratan as part of his MPhil dissertation titled 'Automatic Detection Of Propaganda In Hindi On Social Media', in collaboration with UnReaL-TecE LLP. All the information regarding the dataset, models, and results is given below:

Dataset

v0.1

This data is used for the automatic detection of propaganda in Hindi and for two supporting case studies in the MPhil dissertation. This version has two phases, and each phase has two divisions: Annotated data and Raw data. Phase - 1 contains the data used for the pilot of this work on automatic detection and for its results. The Phase - 2 data is used in two important case studies of this research. In the final stage of the research, however, the whole data of Phase - 1 and Phase - 2 is used to train and test language models for the automatic detection of propaganda in Hindi.

Data Structure

Navigation - Dataset -> v0.1 -> Phase - 1 -> {1. Annotated and 2. Raw} - 500 articles/documents; Phase - 2 -> {1. Annotated and 2. Raw} - 399 articles/documents.
In this version the data is distributed across the two phases mentioned earlier. Phase - 1 has annotated data from 8 Hindi newspapers, viz. Aap Ki Kranti, Amar Ujala, Dainik Bhaskar, Dainik Jagran, Hindustan, Media Vigil, Saamana, tfipost, and 2 periodicals, viz. Kamal Sandesh and Panchjanya. For balance, each source has 50 annotated news articles/documents (10 sources x 50 = 500). Each directory has the same number of ann and txt files: the ann files hold the propaganda-labelled spans and sentences, while the txt files hold the article text. This phase also has the same number of raw news articles/documents in the Raw directory. Phase - 2 has annotated data from 18 newspapers, viz. Aap Ki Kranti, Agnibaan, Amar Ujala, Dainik Bihar, Dainik Bhaskar, Dainik Jagran, Haribhoomi, Hindustan, Jansandesh Times, Janwarta, Media Vigil, Naye Samikaran, Newslaundry, Saamana, Swarajya, Swatantra Bharat, tfipost, Virarjun, and 2 periodicals, viz. Kamal Sandesh and Panchjanya. Here each source has 20 annotated news articles/documents, except Panchjanya, which has 19 (19 x 20 + 19 = 399).
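As a rough illustration of the ann/txt pairing described above, the following Python sketch matches each ann file to the txt file with the same base name inside one source directory and reports any file left without a partner. The function name `pair_annotations`, the shared-base-name convention, and the example path are assumptions made for illustration; they are not taken from the repository and should be adjusted to the actual directory names.

```python
from pathlib import Path


def pair_annotations(source_dir):
    """Pair each .ann file with the .txt file that shares its base name.

    Returns (paired, unmatched): a list of (ann_path, txt_path) tuples and
    a sorted list of base names that have only one of the two files.
    """
    source_dir = Path(source_dir)
    ann = {p.stem: p for p in source_dir.glob("*.ann")}
    txt = {p.stem: p for p in source_dir.glob("*.txt")}
    paired = [(ann[s], txt[s]) for s in sorted(ann.keys() & txt.keys())]
    unmatched = sorted(ann.keys() ^ txt.keys())
    return paired, unmatched


if __name__ == "__main__":
    # Hypothetical location, guessed from the Navigation line; adjust as needed.
    root = Path("Dataset/v0.1/Phase - 1/Annotated")
    if root.is_dir():
        for source in sorted(p for p in root.iterdir() if p.is_dir()):
            paired, unmatched = pair_annotations(source)
            print(f"{source.name}: {len(paired)} pairs, {len(unmatched)} unmatched")
```

If the layout matches these assumptions, each Phase - 1 source directory would be expected to report 50 pairs and no unmatched files.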

v0.2

This data is annotated but was not used in the MPhil work, in order to keep the data used for automatic detection and the case studies balanced.

v0.3

This data is still in raw form and was collected from social media. It is available to anyone interested in using it for further study in this direction.
