
Disaster Response Pipeline Project

Table of Contents

  1. Motivation
  2. Some Cool Screenshots of the App
  3. Installation
  4. Data
  5. Usage
  6. Licensing, Authors, Acknowledgements

1. Motivation

This repository contains a machine learning solution for classifying disaster response messages. It uses text data provided by Appen; thanks to their contribution, we had public data on which to train and test our NLP models. The project contains three main components:

  • ETL pipeline: extracts data from multiple Appen data sources, transforms it, and merges it into a suitable format.
  • Modeling pipeline: creates features based on the ETL output and trains and evaluates different models.
  • Web app: lets users interact with the model and see the results.

2. Some Cool Screenshots of the App

Data Analysis (see images/data_analysis.png)

Model In Action (see images/model_in_action.png)

3. Installation

The code was run with:

  • Ubuntu 20.04
  • Python 3.7

Create & activate a conda environment with:

conda create --name disaster-response-classification python=3.7
conda activate disaster-response-classification

Install all the requirements:

pip install --upgrade pip
pip install -r requirements.txt

4. Data

We used the disaster response messages dataset provided by Appen. It contains two data sources: messages and categories. Because the data sources are small, we keep the raw files in the repository under data/ to simplify the process.

All the supported categories are the following (a sketch of how they are expanded into binary label columns follows the list):

  • related
  • request
  • offer
  • aid_related
  • medical_help
  • medical_products
  • search_and_rescue
  • security
  • military
  • water
  • food
  • shelter
  • clothing
  • money
  • missing_people
  • refugees
  • death
  • other_aid
  • infrastructure_related
  • transport
  • buildings
  • electricity
  • tools
  • hospitals
  • shops
  • aid_centers
  • other_infrastructure
  • weather_related
  • floods
  • storm
  • fire
  • earthquake
  • cold
  • other_weather
  • direct_report
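
For intuition, here is a minimal sketch of how the two data sources fit together, assuming the categories file stores all labels in a single semicolon-separated column keyed by a shared id (e.g. related-1;request-0;...), which the ETL pipeline expands into one binary column per category:

import pandas as pd

# Load the two raw data sources (paths match the repository layout).
messages = pd.read_csv("data/disaster_messages.csv")
categories = pd.read_csv("data/disaster_categories.csv")

# Merge the sources on the shared "id" column.
df = messages.merge(categories, on="id")

# Expand the semicolon-separated "categories" column into one binary
# column per category, e.g. "related-1" -> related = 1.
expanded = df["categories"].str.split(";", expand=True)
expanded.columns = [value.split("-")[0] for value in expanded.iloc[0]]
for column in expanded.columns:
    expanded[column] = expanded[column].str.split("-").str[-1].astype(int)

df = pd.concat([df.drop(columns=["categories"]), expanded], axis=1)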

5. Usage

File Description

 disaster_response_pipeline
          |-- app
                |-- templates
                        |-- go.html
                        |-- master.html
                |-- plots.py
                |-- run.py
          |-- data
                |-- disaster_messages.csv
                |-- disaster_categories.csv
                |-- data.db (outcome of the ETL pipeline)
                |-- process_data.py
          |-- models
                |-- classifier.pkl (outcome of the modeling pipeline)
                |-- config.yaml
                |-- config_gridsearch.yaml
                |-- train_classifier.py
          |-- notebooks
                |-- ETL Pipeline Preparation.ipynb
                |-- ML Pipeline Preparation.ipynb
          |-- images
                |-- data_analysis.png
                |-- model_in_action.png
          |-- airflow-docker (docker-compose setup for Airflow)
          |-- README

Instructions

Step 1

From the root directory, run the ETL pipeline:

python data/process_data.py -messages-filepath data/disaster_messages.csv -categories-filepath data/disaster_categories.csv -database-filepath data/data.db
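
To sanity-check the ETL output, you can read the SQLite database back with pandas. The table name below is an assumption, so list the actual tables first:

import pandas as pd
from sqlalchemy import create_engine, inspect

# Connect to the SQLite database produced by the ETL pipeline.
engine = create_engine("sqlite:///data/data.db")

# List the tables process_data.py actually wrote...
print(inspect(engine).get_table_names())

# ...then load one for a quick look ("messages" is an assumed name).
df = pd.read_sql_table("messages", engine)
print(df.shape)
print(df.head())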

Step 2

From the root directory, train the model:

python models/train_classifier.py -database-filepath data/data.db -model-filepath models/classifier.pkl -config-filepath models/config.yaml

Hyperparameter tuning with GridSearchCV:

python models/train_classifier.py -database-filepath data/data.db -model-filepath models/classifier.pkl -config-filepath models/config_gridsearch.yaml -gridsearch True
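
The exact estimator and search space live in the YAML config files, so the code below is only a rough sketch of a typical pipeline for this kind of multi-label text problem; the vectorizer, estimator, and parameter grid are assumptions, not the repository's exact setup:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Multi-label text classification: TF-IDF features feeding one binary
# classifier per disaster category.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", MultiOutputClassifier(RandomForestClassifier())),
])

# A small grid in the spirit of models/config_gridsearch.yaml
# (the parameters actually searched are defined in that file).
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__estimator__n_estimators": [50, 100],
}
search = GridSearchCV(pipeline, param_grid, cv=3, n_jobs=-1)
# search.fit(X_train, Y_train)  # X: message texts, Y: the binary category columns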

Step 3

From the root directory, run the web app:

python -m app.run

To access the web app, open a browser and navigate to http://localhost:3001/
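
For orientation, app/run.py is a Flask app; the port follows from the URL above, while the routes, template wiring, and model loading below are assumptions sketching how such an entry point typically looks:

import pickle

from flask import Flask, render_template, request

app = Flask(__name__)

# Load the trained model produced by the modeling pipeline.
with open("models/classifier.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/")
def index():
    # master.html renders the landing page with the data-analysis plots.
    return render_template("master.html")

@app.route("/go")
def go():
    # Classify the user's message and render the predicted categories.
    query = request.args.get("query", "")
    predictions = model.predict([query])[0]
    return render_template("go.html", query=query, predictions=predictions)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3001, debug=True)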

Apache Airflow (Optional)

Install

The ETL pipeline has Airflow support. This part is optional because you can run the ETL script directly with Python.
For development purposes, you can install Airflow locally as follows (if you just want to run it, go directly to the next step):

# Airflow needs a home. `~/airflow` is the default, but you can put it
# somewhere else if you prefer (optional)
export AIRFLOW_HOME=~/airflow

# Install Airflow using the constraints file
AIRFLOW_VERSION=2.3.3
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
# For example: 3.7
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.3.3/constraints-3.7.txt
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

# Start all components and create an admin user with:
airflow standalone

# Visit localhost:8080 in the browser and use the admin account details
# shown in the terminal

NOTE: The steps above are taken from the official Airflow documentation.

Run

Start Airflow locally with docker-compose:

cd airflow-docker
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
docker-compose up airflow-init
docker-compose up

NOTE: You need Docker and Docker Compose installed on your machine.
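
For context, a minimal DAG wrapping the ETL script could look like the sketch below; the DAG id, schedule, and task layout are assumptions, while the bash command mirrors Step 1 above:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Wrap the existing ETL script in a single Airflow task.
with DAG(
    dag_id="disaster_response_etl",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,  # trigger manually from the Airflow UI
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="process_data",
        bash_command=(
            "python data/process_data.py "
            "-messages-filepath data/disaster_messages.csv "
            "-categories-filepath data/disaster_categories.csv "
            "-database-filepath data/data.db"
        ),
    )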

6. Licensing, Authors, Acknowledgements

The code is licensed under the MIT license. Anybody is welcome to use and share the code as long as they give credit to the original author. I want to thank Appen for making the data available; without their assistance, I would not have been able to train the model.

If anybody has machine learning questions, suggestions, or wants to collaborate with me, feel free to contact me at [email protected] or to connect with me on LinkedIn.
