
hate_target's Introduction

Targeted Identity Group Prediction in Hate Speech Corpora

by Pratik Sachdeva, Renata Barreto, Claudia von Vacano, and Chris J. Kennedy

Overview

This repository provides the code to reproduce the analyses and figures developed in the paper Targeted Identity Group Prediction in Hate Speech Corpora by Sachdeva et al., published in 2022 at the 6th Workshop on Online Abuse and Harms (WOAH).

Abstract

The past decade has seen an abundance of work seeking to detect, characterize, and measure online hate speech. A related but less-studied problem is the detection of identity groups targeted by that hate speech. Predictive accuracy on this task can supplement additional analyses beyond hate speech detection, motivating its study. Using the Measuring Hate Speech corpus, which provided annotations for targeted identity groups, we created neural network models to perform multi-label binary prediction of identity groups targeted by a comment. Specifically, we studied 8 broad identity groups and 12 identity sub-groups within race and gender identity. We found that these networks exhibited good predictive performance, achieving ROC AUCs of greater than 0.9 and PR AUCs of greater than 0.7 on several identity groups. We validated their performance on HateCheck and the Gab Hate Corpus, finding that predictive performance generalized in most settings. We additionally examined the performance of the model on comments targeting multiple identity groups. Our results demonstrate the feasibility of simultaneously identifying targeted groups in social media comments.
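In the multi-label setup described in the abstract, each identity group is scored as its own binary task, and ROC AUC and PR AUC are reported per group. The sketch below illustrates that evaluation pattern in plain Python. It is not the authors' evaluation code: the group names and scores are made-up toy values, and PR AUC is estimated here as average precision.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC is undefined without both positive and negative examples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """PR AUC estimated as average precision: mean precision at the rank of each positive.

    Tied scores are broken by sort order, so results can differ slightly from
    implementations that group tied thresholds (e.g. scikit-learn).
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive examples")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def per_group_metrics(
    y_true: List[List[int]], y_score: List[List[float]], groups: List[str]
) -> Dict[str, Dict[str, float]]:
    """Evaluate each identity group (one column per group) as an independent binary task."""
    results = {}
    for j, group in enumerate(groups):
        labels = [row[j] for row in y_true]
        scores = [row[j] for row in y_score]
        results[group] = {
            "roc_auc": roc_auc(labels, scores),
            "pr_auc": average_precision(labels, scores),
        }
    return results


# Hypothetical toy data: 4 comments x 2 groups; a comment may target several groups at once.
toy_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.5], [0.3, 0.8], [0.7, 0.3], [0.1, 0.4]]
metrics = per_group_metrics(toy_true, toy_scores, ["race", "gender"])
# race: roc_auc 1.0, pr_auc 1.0; gender: roc_auc 0.5, pr_auc 0.75
```

On real arrays, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same per-column quantities and are the more usual choice.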

Repository Structure

The code used for this paper is divided into two repositories. To train the target identity models, you will need the TensorFlow layers defined in the hate_measure repository. To run the scripts used to set up and train those models, run secondary analyses, and generate the figures, you will need the code in this repository.

The repository is divided into the following folders:

  • figures: Jupyter notebooks used to generate the figures in the paper.
  • hate_target: Contains the codebase used in scripts, analyses, and figure generation for this paper.
  • notebooks: Contains supplementary Jupyter notebooks used in secondary analyses.
  • scripts: Contains Python scripts used to train variants of the model, whose predictions were analyzed in the paper.

Set Up and Install

To run the code, first download this repository to your local machine. The easiest way to do this is to clone the code via SSH:

git clone git@github.com:dlab-projects/hate_target.git

Navigate to the cloned folder on your local machine. Then, create a new Anaconda environment with the environment.yml file:

conda env create -f environment.yml

Finally, install an editable version of this package using pip. Be sure to run the following command in the hate_target folder, where setup.py is visible:

pip install -e .

You should now have access to hate_target's functions as importable modules anywhere in your virtual environment.
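As a quick sanity check of the editable install, the snippet below asks Python's import system whether `hate_target` can be found from the active environment and which file it would load. This is a minimal sketch, not part of the repository; `is_importable` and `module_location` are hypothetical helper names. With `pip install -e .`, the reported location should typically point into the cloned folder.

```python
import importlib.util
from typing import Optional


def is_importable(module_name: str) -> bool:
    """True if module_name can be located by the active interpreter's import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # find_spec raises (instead of returning None) when the parent of a dotted name is missing.
        return False


def module_location(module_name: str) -> Optional[str]:
    """Path of the file the module would be loaded from, or None if it cannot be found."""
    if not is_importable(module_name):
        return None
    spec = importlib.util.find_spec(module_name)
    return spec.origin


if __name__ == "__main__":
    # Run inside the activated conda environment after `pip install -e .`.
    print("hate_target importable:", is_importable("hate_target"))
    print("hate_target location:", module_location("hate_target"))
```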

Generate Figures

Run the Code


hate_target's Issues

Incomplete Model on HuggingFace Hub

Hi,

I found ucberkeley-dlab/hate-measure-roberta-base and ucberkeley-dlab/hate-measure-roberta-large on the HuggingFace Hub; these are the better-performing models reported in your paper.

However, the uploaded models appear to be incomplete: loading them into a classification pipeline raises an error. For example, running

from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="ucberkeley-dlab/hate-measure-roberta-large",
)

raises:

OSError: ucberkeley-dlab/hate-measure-roberta-large does not appear to have a file named config.json. Checkout 'https://huggingface.co/ucberkeley-dlab/hate-measure-roberta-large/main' for available files.

On closer inspection, the available files are each only a few MB, whereas a complete model should be several hundred MB or even a few GB.

Do you plan to release the complete models on the HuggingFace Hub?
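One way to confirm this kind of incompleteness locally is to inspect a downloaded copy of the model folder (for example, one fetched with `huggingface_hub.snapshot_download`) for the files a `transformers` pipeline loads. The sketch below is a hedged helper, not part of hate_target or transformers: `missing_model_files` and `total_size_mb` are hypothetical names, and `WEIGHT_FILES` is a common but non-exhaustive list of checkpoint filenames.

```python
import tempfile
from pathlib import Path
from typing import List

# Common (non-exhaustive) weight-file names used by Hugging Face transformers checkpoints,
# including the index files written for sharded checkpoints. This list is an assumption.
WEIGHT_FILES = (
    "model.safetensors",
    "model.safetensors.index.json",
    "pytorch_model.bin",
    "pytorch_model.bin.index.json",
    "tf_model.h5",
)


def missing_model_files(model_dir: str) -> List[str]:
    """List what a transformers pipeline would likely need but cannot find in model_dir."""
    directory = Path(model_dir)
    missing = []
    if not (directory / "config.json").is_file():
        missing.append("config.json")
    if not any((directory / name).is_file() for name in WEIGHT_FILES):
        missing.append("model weights")
    return missing


def total_size_mb(model_dir: str) -> float:
    """Total size of all files under model_dir, in megabytes (10**6 bytes)."""
    return sum(p.stat().st_size for p in Path(model_dir).rglob("*") if p.is_file()) / 1e6


if __name__ == "__main__":
    # Demo on a throwaway folder that contains only a small tokenizer file.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "tokenizer.json").write_text("{}")
        print(missing_model_files(tmp))  # ['config.json', 'model weights']
        print(f"{total_size_mb(tmp):.6f} MB")
```

A folder totalling only a few MB with no weight file matches the symptom described above.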
