Disaster Response Pipeline

This repo contains my implementation of the Disaster Response Pipeline project, which is part of the Udacity Data Scientist Program.

Motivation

This project is part of the Udacity Data Scientist Program (Data Engineering). I followed the instructions to build ETL, NLP and machine learning pipelines during the course, and used the code and skills learned to complete this project.

Installation

To reproduce the results, install the exact library versions listed in requirements.txt using pip:

pip install -r requirements.txt

I recommend installing into a virtual environment, such as one managed by Anaconda.

Structure

The folder structure of this repo is as follows:

|-app
  |-templates				# HTML templates
  |-run.py				# script to run the web demo
|-data
  |-disaster_categories.csv	# original categories dataset
  |-disaster_messages.csv	# original messages dataset
  |-process_data.py			# ETL pipeline script
  |-ETL Pipeline Preparation.ipynb	# ETL pipeline preparation notebook
|-models
  |-train_classifier.py		# NLP & ML pipeline script
  |-ML Pipeline Preparation.ipynb	# NLP & ML pipeline preparation notebook

Usage

  1. Run the following commands in the project's root directory to set up the database and model.
  • To run the ETL pipeline that cleans the data and stores it in a database:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
  • To run the ML pipeline that trains the classifier and saves it:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
  2. Go to the app directory: cd app

  3. Run the web app: python run.py

  4. Click the PREVIEW button to open the homepage

  5. Enter a message and click the green "Classify Message" button to see the results
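
For readers who want a feel for what the ETL step does before running it, here is a minimal sketch using only the Python standard library. It is not the code in process_data.py. The sample rows, the table name `messages`, the helper names `parse_categories` and `run_etl`, and the assumed category encoding (`name-0;name-1;...`) are all illustrative assumptions.

```python
import sqlite3

# Hypothetical sample data; the real CSV files are much larger.
messages = [
    (1, "We need water and food"),
    (2, "Road is blocked"),
    (2, "Road is blocked"),  # duplicate row, dropped during cleaning
]
# Assumed encoding: one "name-value" pair per category, separated by ";".
categories = {
    1: "water-1;food-1;shelter-0",
    2: "water-0;food-0;shelter-0",
}


def parse_categories(raw):
    """Turn 'water-1;food-0' into {'water': 1, 'food': 0}."""
    labels = {}
    for item in raw.split(";"):
        name, _, value = item.rpartition("-")
        labels[name] = int(value)
    return labels


def run_etl(messages, categories, conn):
    """Join messages with their categories, drop duplicates, store in SQLite."""
    seen = set()
    names = None
    for msg_id, text in messages:
        if msg_id in seen:
            continue
        seen.add(msg_id)
        labels = parse_categories(categories[msg_id])
        if names is None:
            # Create one integer column per category on the first row.
            names = list(labels)
            cols = ", ".join(f'"{n}" INTEGER' for n in names)
            conn.execute(
                f"CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT, {cols})"
            )
        placeholders = ", ".join("?" * (2 + len(names)))
        conn.execute(
            f"INSERT INTO messages VALUES ({placeholders})",
            [msg_id, text] + [labels[n] for n in names],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
run_etl(messages, categories, conn)
rows = conn.execute("SELECT * FROM messages ORDER BY id").fetchall()
print(rows)
```

The sketch writes to an in-memory database so it leaves no files behind; the actual script writes to the database path given on the command line.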

Limitations

This implementation was written as coursework to record my learning path, and it has several limitations:

  • The dataset is heavily imbalanced. Some categories have very few positive samples, and some have none. For categories with no positive samples, the classification results are meaningless. For categories with few positive samples, additional techniques such as over-sampling need to be introduced during training.
  • The model parameters could be tuned further. Because Udacity offers limited online computing resources, the grid search had to be restricted to a small parameter space. Better parameters could be retrieved with more computing power.
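
As a rough illustration of the over-sampling idea mentioned in the first limitation, the sketch below duplicates randomly chosen minority-class samples for a single binary category until both classes have the same size. It is not part of this repo. The function names `count_positives` and `oversample` and the toy data are hypothetical, and the sketch treats one category at a time, which sidesteps the harder multi-label case.

```python
import random


def count_positives(labels_by_category):
    """Return the number of positive samples for each category."""
    return {name: sum(values) for name, values in labels_by_category.items()}


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until both classes match in size."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if not pos or not neg:
        # A category with a single class cannot be balanced by resampling.
        raise ValueError("cannot over-sample a category with only one class")
    if len(pos) < len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    # Draw extra minority samples with replacement to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = [(s, minority_label) for s in minority + extra]
    balanced += [(s, 1 - minority_label) for s in majority]
    rng.shuffle(balanced)
    return balanced


# Toy data: one positive message among five.
texts = ["need water", "hello", "thanks", "good morning", "see you"]
water = [1, 0, 0, 0, 0]

balanced = oversample(texts, water)
n_pos = sum(1 for _, y in balanced if y == 1)
n_neg = sum(1 for _, y in balanced if y == 0)
print(n_pos, n_neg)
```

Over-sampling should be applied to the training split only, so that duplicated samples do not leak into the evaluation data. Categories with zero positive samples, which `count_positives` can flag, cannot be fixed this way and are better excluded or reported separately.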
