🏷️ Topic Classification of UN Speeches

📝 Description

This project implements a semi-supervised approach to classifying UN speeches by topic.

We implemented this approach in two ways:

1. 🌐 Graph Neural Network

The method is best illustrated with the following diagram:

Approach 1

  • Generate sentence embeddings using a BERT Sentence Transformer
  • Build a graph with sentences as nodes and edges derived from the cosine similarity between their embeddings
  • Generate node embeddings from the graph using Node2Vec
  • Train a Neural Network on the graph embeddings to classify sentences into topics
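
The graph-construction step can be sketched as follows. This is a minimal illustration, not the code in approach1.py: the function names, the toy 2-dimensional vectors, and the 0.8 similarity threshold are all assumptions made for the example. In the real pipeline the vectors would come from the Sentence Transformer, and the resulting edges would be passed on to Node2Vec.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def build_similarity_edges(embeddings, threshold=0.8):
    """Return weighted edges (i, j, similarity) between sentence nodes.

    An edge is added only when the similarity reaches `threshold`.
    The threshold rule and its value are assumptions for this sketch.
    """
    edges = []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        if sim >= threshold:
            edges.append((i, j, sim))
    return edges


# Toy stand-ins for sentence embeddings (hypothetical values).
toy_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
edges = build_similarity_edges(toy_embeddings, threshold=0.8)
```

Here only sentences 0 and 1 end up connected, because the third vector is nearly orthogonal to the other two. The pairwise loop is quadratic in the number of sentences, so a corpus of this size would likely need batching or an approximate nearest-neighbour search instead.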

2. 🧠 Neural Networks

The flowchart illustrating this approach:

Approach 2

  • Generate sentence embeddings using a BERT Sentence Transformer
  • Train a Neural Network (N1) on these embeddings
  • Pseudo-label data using N1
  • Stack labelled and pseudo-labelled data
  • Train a more complex Neural Network (N2)
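
The pseudo-labelling and stacking steps can be sketched as follows. This is a minimal illustration, not the code in approach2.py: the function names, the toy data, and the 0.9 confidence cut-off are assumptions made for the example. The README does not say whether N1's predictions are filtered by confidence before stacking.

```python
def pseudo_label(probabilities, min_confidence=0.9):
    """Pick (index, label) pairs for predictions N1 is confident about.

    `probabilities` holds one class-probability list per unlabelled sentence.
    The confidence filter is an assumption for this sketch.
    """
    selected = []
    for idx, probs in enumerate(probabilities):
        label = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if probs[label] >= min_confidence:
            selected.append((idx, label))
    return selected


def stack_datasets(labelled, unlabelled_features, pseudo_labels):
    """Append pseudo-labelled examples to the labelled (features, label) pairs."""
    stacked = list(labelled)
    stacked.extend(
        (unlabelled_features[idx], label) for idx, label in pseudo_labels
    )
    return stacked


# Toy stand-ins for embeddings and N1 outputs (hypothetical values).
labelled = [([0.1, 0.2], 0), ([0.9, 0.8], 1)]
unlabelled_features = [[0.15, 0.25], [0.5, 0.5], [0.85, 0.9]]
n1_probabilities = [[0.95, 0.05], [0.55, 0.45], [0.02, 0.98]]

pseudo = pseudo_label(n1_probabilities, min_confidence=0.9)
train_set = stack_datasets(labelled, unlabelled_features, pseudo)
```

In this toy run the middle sentence is dropped because N1 is only 55% sure of its class, and the remaining two are added to the training set that N2 would see.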

Read the Detailed Report for further information.

📦 Dataset

The dataset for this project contains approximately 2 million sentences from UN General Debate speeches held from 1970 to 2016.

A sample of the dataset is saved as CSV files in this repo. The full dataset is publicly available on the Harvard Dataverse and on my GDrive.

⚙️ Training Setup

  1. Download the dataset from the above GDrive link and unzip it into the data folder
  2. Execute the preprocess.py file
  3. Execute either:
    1. approach1.py to train the model using the first approach

      OR

    2. approach2.py to train the model using the second approach

🚀 Inference Demo

  1. Download the contents from the above GDrive link
  2. Put the CSV files in the data folder
  3. Put everything else in the weights folder
  4. Execute inference2.py, which uses the saved weights from the second approach. The program asks for an input sentence and outputs the predicted class.

⚠️ Requirements

  • pandas==2.2.1
  • numpy==1.26.4
  • nltk==3.8.1
  • matplotlib==3.8.3
  • sentence_transformers==2.5.1
  • tensorflow==2.16.1
  • gensim==4.3.2
  • node2vec==0.4.6

👤 Contributors

  • Yash Jain
  • Abhinav Shukla
