Automated Research Paper Classification

Overview

This project automates the classification of research papers into predefined categories using natural language processing (NLP). It uses SciBERT, a language model pre-trained on scientific text, extended with BiLSTM layers, together with custom data preprocessing and augmentation strategies intended to improve model performance. Potential applications include academic database management, systematic reviews, and the categorization of scholarly articles for researchers and publishers.
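As a rough illustration of what the preprocessing and augmentation stages might involve, here is a minimal sketch in plain Python. It is not taken from the notebooks: the function names (`clean_text`, `tokenize`, `random_swap`, `random_deletion`) are hypothetical, the cleaning rules (lowercasing, keeping only letters, digits, and hyphens) are assumptions, and random swap and random deletion are generic text augmentation operations that may differ from the strategies the project actually uses.

```python
import random
import re
import unicodedata


def clean_text(text):
    """Normalize Unicode, lowercase, drop punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^a-z0-9\s-]", " ", text)  # keep letters, digits, hyphens
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Whitespace tokenization applied to the cleaned text."""
    return clean_text(text).split()


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    tokens = tokenize("Graph Neural Networks for Citation Analysis: A Survey")
    print(tokens)
    print(random_swap(tokens, 2, rng))
    print(random_deletion(tokens, 0.2, rng))
```

Both augmentation functions return new lists, so each original abstract can yield several perturbed variants. A transformer tokenizer would normally replace the whitespace split before the text reaches SciBERT.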

Files Description

Notebooks

  • Augmentation.ipynb

    • Description: Demonstrates data augmentation techniques specifically designed for textual data, enhancing the diversity of our training dataset to improve model robustness against varied linguistic expressions found in research papers.
  • Data_preprocessing.ipynb

    • Description: Outlines the preprocessing steps applied to the dataset of research papers, including text cleaning, tokenization, and normalization, preparing data for efficient model training.
  • bilstm_scibert.ipynb

    • Description: Combines the capabilities of SciBERT, an NLP model pre-trained on scientific literature, with BiLSTM layers to capture both the contextual and sequential nuances in research papers, aiming at improving classification accuracy.
  • linear_layer_after_scibert.ipynb

    • Description: Explores the impact of adding linear layers following SciBERT embeddings, fine-tuning the model to tailor it to the specific classification needs of our dataset.
  • main_model_training.ipynb

    • Description: The core notebook that orchestrates the model training process, from loading preprocessed data to training and validating the model, culminating in the evaluation of its performance on a test set.
  • weight_tensor_custom_loss.ipynb

    • Description: Implements a custom loss function that leverages a weight tensor to address class imbalance in the dataset, ensuring a fair and effective learning process.
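To make the class-imbalance idea behind weight_tensor_custom_loss.ipynb concrete, the sketch below computes per-class weights and a weighted cross-entropy in plain Python. It is an illustration under assumptions, not the notebook's code: the inverse-frequency weighting scheme and the function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical choices.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Weight class c by N / (K * count_c), so rarer classes get larger weights."""
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(logits, targets, weights):
    """Weighted mean of per-example cross-entropy, normalized by the weight sum."""
    weighted_loss = 0.0
    weight_sum = 0.0
    for row, y in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        nll = log_z - row[y]  # negative log-probability of the true class
        weighted_loss += weights[y] * nll
        weight_sum += weights[y]
    return weighted_loss / weight_sum


if __name__ == "__main__":
    train_labels = [0, 0, 0, 1]  # toy data: class 1 is under-represented
    weights = inverse_frequency_weights(train_labels, num_classes=2)
    logits = [[2.0, 0.0], [2.0, 0.0]]  # the model favors class 0 both times
    print(weights)
    print(weighted_cross_entropy(logits, [0, 1], weights))
```

With these weights, an error on the rare class raises the loss more than the same error on the common class. In PyTorch, a weight list like this would typically be converted to a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`.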

Other Files

  • README.md
    • Provides an overview of the project, including detailed descriptions of all included files and instructions for their use.

To install the dependencies, run:

    pip install -r requirements.txt

Contributors

  • soumyatus
