In this project, we propose a multi-label classification model using LSTM and Bi-LSTM networks to classify toxic comments into six classes: toxic, severe toxic, obscene, threat, insult, and identity hate.
The dataset for this project is taken from Kaggle and is provided by the Conversation AI team (a research initiative co-founded by Jigsaw and Google). Following insights from previous research, pre-trained word embeddings are used. This project uses GloVe.6B.300d, which is trained on a corpus of 6 billion tokens and represents each token as a 300-dimensional vector.
Download the dataset from here.
Download the word embeddings from here.
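Once downloaded, the GloVe file can be parsed and aligned with a tokenizer's vocabulary. The sketch below shows one common way to do this; the helper names and the 300-dimensional default are illustrative, not the project's exact code.

```python
import numpy as np

def load_glove(path):
    """Parse a GloVe text file (word followed by floats, space-separated)
    into a {word: vector} dict."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

def build_embedding_matrix(word_index, embeddings, dim=300):
    """Build a matrix whose row i holds the GloVe vector of the word
    with tokenizer index i; words missing from GloVe stay zero."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, i in word_index.items():
        vector = embeddings.get(word)
        if vector is not None:
            matrix[i] = vector
    return matrix
```

The resulting matrix is typically passed as the initial weights of the model's `Embedding` layer, often with the layer frozen so the pre-trained vectors are not updated during training.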
The following figure shows the distribution of comments across the six labels by comment length.
The following figure shows the proposed model of this project work.
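A model of this kind can be sketched in Keras as below. The vocabulary size and LSTM width are assumed placeholders; only the 300-dimensional embeddings and the six sigmoid outputs come from the description above.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

VOCAB_SIZE = 20000  # assumed tokenizer vocabulary size
EMBED_DIM = 300     # matches GloVe.6B.300d

model = Sequential([
    Embedding(VOCAB_SIZE, EMBED_DIM),
    Bidirectional(LSTM(64)),  # assumed hidden size
    # Six sigmoid units: one independent probability per label
    # (toxic, severe toxic, obscene, threat, insult, identity hate),
    # since a comment can carry several labels at once.
    Dense(6, activation="sigmoid"),
])
# Binary cross-entropy treats each of the six labels independently,
# which is the standard loss for multi-label classification.
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

Note the use of `sigmoid` rather than `softmax`: in a multi-label setting the six label probabilities need not sum to one.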
If you found this project useful, please consider giving it a ⭐ on GitHub and sharing it with your friends via social media.
Machine Learning Enthusiast #MachineLearning #DeepLearning #ToxicCommentsClassification #LSTM #Bi-LSTM #Python