
ml_final_project

Machine Learning Fall 2022 Final Project

2-label version (Open In Colab)

3-label version (Open In Colab)

Project Description

We aim to automate the detection of hate speech and offensive language.

Hate speech is common on social media, and it would be easier to regulate if a program could detect it automatically. The problem is similar to the language recognition task in the hw3 lab: we take a natural-language input as a sequence and train a model to predict the labels associated with that sequence. What makes this task distinctive is that hate speech and offensive language can be hard to detect, because whether a phrase is hateful or offensive often depends on the context in which it is used. By automating hate speech and offensive language detection, we could contribute to a healthier internet environment.

Install Dependencies

On macOS/Linux:

pip install -r requirements.txt

Quickstart

ML_project_3labels.ipynb is a Jupyter notebook containing the 3-label classifier model we wrote.

ML_project_2labels.ipynb is a Jupyter notebook in which we merged the "Offensive" and "Hate" classes of the dataset into a single class, turning the task into binary classification.
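The 2-label variant can be pictured as a relabeling step applied before training. The sketch below assumes string labels "Hate", "Offensive", and "Neither". It also uses a hypothetical helper, `to_binary`. The notebook's actual label encoding may differ.

```python
# Hypothetical mapping: "Hate" and "Offensive" are merged into one positive class.
BINARY_MAP = {"Hate": 1, "Offensive": 1, "Neither": 0}


def to_binary(labels):
    """Collapse 3-way labels into 1 (hate or offensive) vs 0 (neither)."""
    return [BINARY_MAP[label] for label in labels]


print(to_binary(["Hate", "Neither", "Offensive"]))  # [1, 0, 1]
```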

Methods Documentation

create_train_and_test_set_balanced(X, y, train_ratio=0.8)

Parameters

  • X: array of sentence embeddings
  • y: labels
  • train_ratio: fraction of the samples assigned to the training set (default 0.8)

Returns

  • X_train: Training data
  • X_rem: Testing data
  • y_train: Training labels
  • y_rem: Testing labels
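One way to read "balanced" is as a per-class (stratified) split. Under this reading, each label contributes the same fraction `train_ratio` of its samples to the training set. The NumPy sketch below implements that interpretation. The `seed` parameter is our own addition. The notebook's function may balance classes differently, for example by undersampling the majority class.

```python
import numpy as np


def create_train_and_test_set_balanced(X, y, train_ratio=0.8, seed=0):
    """Stratified split: each class keeps about train_ratio of its rows in training."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)  # `seed` is an assumed extra parameter
    train_idx, rem_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)  # row indices belonging to this class
        rng.shuffle(idx)
        cut = int(round(len(idx) * train_ratio))
        train_idx.extend(idx[:cut])
        rem_idx.extend(idx[cut:])
    train_idx = np.array(train_idx, dtype=int)
    rem_idx = np.array(rem_idx, dtype=int)
    return X[train_idx], X[rem_idx], y[train_idx], y[rem_idx]
```

With ten samples split evenly across two classes and `train_ratio=0.8`, each class places four rows in `X_train` and one row in `X_rem`.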

model.fit(train_loader, epochs=300, lr=1e-5, interval=100)

Parameters

  • train_loader: Dataloader for the training dataset
  • epochs: number of epochs in training
  • lr: learning rate of optimizer
  • interval: how often to print loss information during training
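A training loop with this signature might look like the PyTorch sketch below. It makes several assumptions that this README does not confirm:

- the model is an `nn.Module` that produces class logits;
- the loader yields `(embeddings, labels)` batches;
- training uses Adam with cross-entropy loss;
- `interval` counts epochs.

```python
import torch
from torch import nn


def fit(model, train_loader, epochs=300, lr=1e-5, interval=100):
    """Train `model`; print the mean batch loss every `interval` epochs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimizer
    loss_fn = nn.CrossEntropyLoss()  # assumed loss for integer class labels
    history = []
    for epoch in range(1, epochs + 1):
        model.train()
        total, batches = 0.0, 0
        for xb, yb in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
            total += loss.item()
            batches += 1
        history.append(total / max(batches, 1))  # mean loss over this epoch's batches
        if epoch % interval == 0:
            print(f"epoch {epoch}: loss {history[-1]:.4f}")
    return history
```

The function returns the per-epoch loss history so that it can be plotted afterwards. The original `model.fit` may only print the loss.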

model.validate(valid_loader)

Parameters

  • valid_loader: Dataloader for the validation dataset

Returns

  • The average validation loss
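"Average validation loss" could mean a per-batch or a per-sample average. The PyTorch sketch below takes the per-sample reading. It assumes cross-entropy loss and `(embeddings, labels)` batches, as a standalone function rather than the notebook's method.

```python
import torch
from torch import nn


@torch.no_grad()  # no gradients are needed for evaluation
def validate(model, valid_loader):
    """Return the mean cross-entropy loss per validation sample."""
    model.eval()  # disable dropout and similar training-only behavior
    loss_fn = nn.CrossEntropyLoss(reduction="sum")  # sum per batch, divide by count later
    total_loss, n = 0.0, 0
    for xb, yb in valid_loader:
        total_loss += loss_fn(model(xb), yb).item()
        n += yb.size(0)
    return total_loss / n
```

Summing and dividing by the sample count keeps a smaller final batch from being over-weighted.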

model.accuracy(test_loader)

Parameters

  • test_loader: Dataloader for the testing dataset

Returns

  • Accuracy score of the model on the testing dataset
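Accuracy on a classification loader is usually the fraction of samples whose highest-scoring class matches the label. Below is a minimal PyTorch sketch, assuming the model returns one logit per class.

```python
import torch


@torch.no_grad()
def accuracy(model, test_loader):
    """Fraction of test samples whose arg-max predicted class equals the label."""
    model.eval()
    correct, total = 0, 0
    for xb, yb in test_loader:
        preds = model(xb).argmax(dim=1)  # highest-scoring class per row
        correct += (preds == yb).sum().item()
        total += yb.size(0)
    return correct / total
```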

model.predict(sentence)

Parameters

  • sentence: Input sentence to predict its category

Returns

  • The predicted category: Hate, Offensive, or Neither
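The classifier is trained on sentence embeddings (the `X` of `create_train_and_test_set_balanced`). So `predict` presumably embeds the raw sentence before scoring it. The sketch below rests on three assumptions:

- a caller-supplied `embed` function (hypothetical; the embedding model is not named in this README) returns a 1-D tensor;
- the model returns class logits;
- the class order is Hate, Offensive, Neither.

```python
import torch

LABELS = ["Hate", "Offensive", "Neither"]  # assumed class-index order


@torch.no_grad()
def predict(model, sentence, embed):
    """Embed one sentence, score it with `model`, and return the label name."""
    model.eval()
    x = embed(sentence).unsqueeze(0)  # shape (1, embedding_dim): a batch of one
    class_idx = model(x).argmax(dim=1).item()
    return LABELS[class_idx]
```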

model.metrics(test_loader)

Note: this method is only available in the 2-label version.

Parameters

  • test_loader: Dataloader for the testing dataset

Returns

  • Evaluation metrics including accuracy, precision, recall, and F1 score, plus a plot of the ROC curve
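The binary metrics can be computed from two arrays collected from the test loader: true labels (1 = hate or offensive) and positive-class scores. The NumPy sketch below uses a hypothetical `binary_metrics` function. It assumes a 0.5 decision threshold and computes the ROC curve by sweeping every distinct score as a threshold. It is not necessarily how the notebook computes its metrics.

```python
import numpy as np


def binary_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall, F1 at `threshold`, plus ROC points and AUC."""
    y_true = np.asarray(y_true).astype(int)
    scores = np.asarray(scores, dtype=float)
    y_pred = (scores >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # ROC: sweep thresholds from high to low; each one yields an (FPR, TPR) point.
    thresholds = np.r_[np.inf, np.sort(np.unique(scores))[::-1]]
    pos, neg = max(int(y_true.sum()), 1), max(int((1 - y_true).sum()), 1)
    fpr = np.array([np.sum((scores >= t) & (y_true == 0)) / neg for t in thresholds])
    tpr = np.array([np.sum((scores >= t) & (y_true == 1)) / pos for t in thresholds])
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))  # trapezoidal rule
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "roc": (fpr, tpr),
        "auc": auc,
    }
```

To draw the ROC graph, the returned `fpr` and `tpr` arrays can be passed to `matplotlib.pyplot.plot(fpr, tpr)`.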
