This project helps label research papers according to arXiv's extensive labelling system. We used SciBERT to build a classification pipeline that sorts research papers into 50+ labels based on their abstracts. It relies on the now-popular transformer architecture to model the text of each abstract, and then uses that representation to classify the paper.
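As a rough illustration of the classification step described above, here is a minimal sketch. It is not the project's actual code: the helper names `top_labels` and `classify_abstract` are hypothetical, the public `allenai/scibert_scivocab_uncased` checkpoint is assumed as the base model, and a single-label softmax over the labels is assumed (a multi-label setup would use a sigmoid per label instead).

```python
import math


def top_labels(logits, labels, k=3):
    """Turn raw per-label scores into the k most probable (label, probability) pairs."""
    # Numerically stable softmax: subtract the largest score before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank labels from most to least probable and keep the first k.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def classify_abstract(abstract, labels, model_name="allenai/scibert_scivocab_uncased"):
    """Score one abstract against `labels` (hypothetical helper, assumed setup)."""
    # Imported lazily so top_labels can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Assumption: the classification head created here is untrained; meaningful
    # predictions require a checkpoint fine-tuned on labelled arXiv abstracts.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=len(labels)
    )
    model.eval()

    # BERT-style models accept at most 512 tokens, so long abstracts are truncated.
    inputs = tokenizer(abstract, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_labels(logits, labels)
```

Calling `top_labels([2.0, 0.5, 1.0], ["cs.LG", "math.ST", "cs.CL"], k=2)` ranks `cs.LG` first and `cs.CL` second. Keeping the ranking separate from the model call makes the post-processing easy to check without downloading any weights.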
Before running the app on your machine, install the necessary libraries:
pip install streamlit pandas scikit-learn transformers imbalanced-learn nltk torch
- Install the requirements
$ pip install -r requirements.txt
- Run the app
$ streamlit run streamlit_app.py