sdg-classification-bert (Streamlit/Gradio App)

This repository powers a Streamlit app for classifying text with respect to the United Nations Sustainable Development Goals (SDGs). The classification model is a fine-tuned BERT named sdgBERT. The labelled data used to fine-tune sdgBERT was obtained from the OSDG Community Dataset, publicly available at https://zenodo.org/record/5550238#.Y93vry9ByF4. The OSDG dataset includes text from diverse fields; hence, the fine-tuned BERT model and the Streamlit app are generic and can be used to predict the SDG of most texts.

The Streamlit app supports SDG 1 to SDG 16 (SDG image source: https://www.un.org/development/desa/disabilities/about-us/sustainable-development-goals-sdgs-and-disability.html).

Streamlit app link and key functions

The app can be accessed from two sources including:

The app has the following key functions:

  • Single text prediction: copy/paste or type in a text box
  • Multiple text prediction: upload a CSV file. Note: the column containing the texts to be predicted must be titled "text_inputs". The app generates an output CSV file that you can download. This file includes all the original columns from the uploaded CSV, a column of predicted SDGs, and a column of prediction probability scores. Any text in "text_inputs" longer than the model's maximum sequence length of 512 word pieces (approximately 300 to 400 words) is automatically truncated. For now, if you want to analyse large documents with this model or the Streamlit app, I recommend breaking the document into chunks of 300 to 400 words and placing each chunk in its own cell in the "text_inputs" column of your CSV file. For example, you can analyse a large document page by page, with the text of each page in one CSV cell.

In future updates of the app, support for directly analysing PDF documents may be added to make large documents easier to analyse.
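The chunking recommendation above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the app: the helper names `chunk_words` and `write_chunks_csv` are hypothetical, and the default of 350 words per chunk is an assumed midpoint of the 300 to 400 word range. Only the "text_inputs" column name comes from the app's requirements.

```python
import csv


def chunk_words(text, max_words=350):
    """Split text into consecutive chunks of at most max_words words.

    max_words=350 is an assumed value inside the recommended 300-400 range.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def write_chunks_csv(text, path, max_words=350):
    """Write one chunk per row under the "text_inputs" column.

    Returns the number of chunks written.
    """
    chunks = chunk_words(text, max_words)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text_inputs"])  # column name required by the app
        writer.writerows([chunk] for chunk in chunks)
    return len(chunks)
```

Calling `write_chunks_csv(document_text, "chunks.csv")` would produce a file that can be uploaded for multiple text prediction. Splitting on whitespace ignores sentence and page boundaries, so splitting page by page may give more meaningful predictions when page structure is available.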

Use the fine-tuned BERT Transformer model directly

If you would like to use the fine-tuned BERT model directly, you can do so using the code below:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("sadickam/sdg-classification-bert")

model = AutoModelForSequenceClassification.from_pretrained("sadickam/sdg-classification-bert")
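Once the tokenizer and model are loaded, a single prediction might look like the sketch below. It is an assumption-laden illustration rather than the app's actual code: `predict_sdg` and `top_prediction` are hypothetical helper names, PyTorch is assumed to be installed, and the label text returned depends on whatever `id2label` mapping is stored in the model's configuration.

```python
def top_prediction(probabilities, id2label):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return id2label[best], probabilities[best]


def predict_sdg(text, model_name="sadickam/sdg-classification-bert"):
    """Classify one text and return its top label with a probability score."""
    # Imported here so the helper above can be used without these packages.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # Truncate to the 512 word-piece limit mentioned above.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Softmax turns the raw logits into per-class probabilities.
    probabilities = torch.softmax(logits, dim=-1)[0].tolist()
    return top_prediction(probabilities, model.config.id2label)
```

Calling `predict_sdg("Some text about access to clean water")` downloads the model on first use and returns a `(label, probability)` pair. For many texts, it would be more efficient to load the tokenizer and model once and reuse them across calls.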

Or just clone the model repo from Hugging Face using the code below:

git lfs install
git clone https://huggingface.co/sadickam/sdg-classification-bert

# if you want to clone without large files - just their pointers
# prepend your git clone with the following env var:
GIT_LFS_SKIP_SMUDGE=1

OSDG online tool

OSDG has an online tool for SDG classification of text. I encourage you to check it out at https://www.osdg.ai/ or visit their GitHub page at https://github.com/osdg-ai/osdg-data to learn more about their tool.

To do

  • Add model evaluation metrics
  • Citation information
