Toxic-comments-analysis


This repository contains multiple ways of analyzing toxic comments. The goal is to compare different methods and see which one is the most efficient.

A quick report written in French can be found in the main branch.

Repository organization

Each toxic-comment model lives on its own branch of the repository. The branches are:

  • main : contains the main code of the project
  • tf-idf : contains the code for the tf-idf model
  • RNN : contains the code for the basic RNN model
  • GRU : contains the code for the GRU model
  • LSTM : contains the code for the LSTM model

/!\ There are two LSTM branches: LSTMathis is the one described in the report, while LouiSTM is a notebook where we tried to go further in the implementation.

Each model's metrics are displayed in its code.

Prerequisites

  • Python 3.11.5
  • TensorFlow 2.16.0

Installation

We use GloVe embeddings. Download them from https://nlp.stanford.edu/projects/glove/ and put the file glove.6B.100d.txt and the others in the datasets folder.
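
Each line of a GloVe text file holds one token followed by its vector components, separated by spaces. As a minimal sketch of how such a file could be read, the hypothetical helper `load_glove` below parses it into a dictionary. The function name, the `dim` parameter, and the `datasets/glove.6B.100d.txt` path are assumptions for illustration, not code taken from this repository.

```python
from pathlib import Path


def load_glove(path, dim=100):
    """Parse a GloVe text file into a {word: list of floats} dictionary.

    Hypothetical helper: the repository's own loading code may differ.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            # Skip blank or malformed lines instead of failing midway.
            if len(parts) != dim + 1:
                continue
            embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings


if __name__ == "__main__":
    # Assumed location, based on the datasets folder mentioned above.
    glove_path = Path("datasets") / "glove.6B.100d.txt"
    if glove_path.exists():
        vectors = load_glove(glove_path, dim=100)
        print(f"Loaded {len(vectors)} word vectors")
```

Plain Python lists keep the sketch dependency-free. In a training notebook the vectors would typically be converted to arrays to build an embedding matrix.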

Data

We use the Jigsaw dataset, which you can download from https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data
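
The training file of this Kaggle challenge is a CSV with a `comment_text` column and six binary label columns. The sketch below shows one way to read it using only the standard library. The helper name `load_jigsaw` and the `datasets/train.csv` path are assumptions for illustration and may not match how the notebooks load the data.

```python
import csv
from pathlib import Path

# Label columns of the Jigsaw toxic comment training file.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def load_jigsaw(path):
    """Read the training CSV into parallel lists of texts and 0/1 label rows.

    Hypothetical helper: shown only to illustrate the file layout.
    """
    texts, labels = [], []
    # newline="" lets the csv module handle comments that span several lines.
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            texts.append(row["comment_text"])
            labels.append([int(row[name]) for name in LABELS])
    return texts, labels


if __name__ == "__main__":
    # Assumed location of the downloaded file.
    train_path = Path("datasets") / "train.csv"
    if train_path.exists():
        texts, labels = load_jigsaw(train_path)
        print(f"Loaded {len(texts)} comments")
```

Each comment can carry several labels at once, so every row of `labels` is a list of six 0/1 flags, not a single class index.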

Project structure

The project is structured as follows:

  • datasets : contains the data used for the project
  • helpers : contains functions used in the project
  • GloVe : contains the glove embeddings
  • models : contains the trained models
  • notebook.ipynb : the notebook used for training the models
  • pipeline.py : the pipeline for running the model in production

Contributors

  • spihcness
  • louislecouturier
