This project is part of the Data Science Nanodegree Program by Udacity, in collaboration with Figure Eight. It builds a machine learning pipeline that categorizes emergency messages based on the needs communicated by the sender.
['request', 'offer', 'aid_related', 'medical_help', 'medical_products',
'search_and_rescue', 'security', 'military', 'child_alone', 'water',
'food', 'shelter', 'clothing', 'money', 'missing_people', 'refugees',
'death', 'other_aid', 'infrastructure_related', 'transport',
'buildings', 'electricity', 'tools', 'hospitals', 'shops',
'aid_centers', 'other_infrastructure', 'weather_related', 'floods',
'storm', 'fire', 'earthquake', 'cold', 'other_weather',
'direct_report']
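Each message may belong to several of the categories above. In the raw disaster_categories.csv every row encodes its labels as a single semicolon-separated string (the `name-0`/`name-1` format shown here is an assumption based on the Figure Eight export); a minimal sketch of turning it into binary labels:

```python
# Hypothetical raw value from disaster_categories.csv; process_data.py
# expands it into one binary column per category.
raw = "request-1;offer-0;water-1;food-0"
labels = {}
for item in raw.split(";"):
    name, _, value = item.rpartition("-")  # split on the LAST dash
    labels[name] = int(value)
print(labels)  # {'request': 1, 'offer': 0, 'water': 1, 'food': 0}
```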
- app
| - template
| |- master.html # main page of the web application
| |- go.html # classification result page of the web application
|- run.py # script for running the web application using Flask
- data
|- disaster_categories.csv # message categories dataset used for training the model
|- disaster_messages.csv # messages dataset used for training the model
|- process_data.py # data processing (ETL) script
|- DisasterResponse.db # database to save clean data to
- models
|- train_classifier.py # Model training script
|- classifier.pkl # Model file
- notebooks
|- ETL Pipeline Preparation.ipynb
|- ML Pipeline Preparation.ipynb
- Create a Python 3.6 conda virtual environment
conda create --name py36 python=3.6
- Activate the new environment
conda activate py36
- Install required packages by running the following command in the app's directory
pip install -r requirements.txt
- To download the model file you need Git LFS installed and enabled in your local copy of the repository. Install Git LFS following the instructions at https://git-lfs.github.com/, then initialize it in the local repository with
git lfs install
Run the processing script in the data folder
e.g. python process_data.py disaster_messages.csv disaster_categories.csv DisasterResponse.db
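As a rough sketch of what process_data.py does (the category-string format and the column names used here are illustrative assumptions), the script merges messages with their categories and writes the cleaned rows to an SQLite database:

```python
import csv
import io
import sqlite3

# Tiny in-memory stand-ins for the two CSV files (columns are assumptions).
messages_csv = "id,message\n1,we need water\n2,roads are flooded\n"
categories_csv = "id,categories\n1,water-1;food-0\n2,water-0;food-0\n"

messages = {r["id"]: r["message"] for r in csv.DictReader(io.StringIO(messages_csv))}

conn = sqlite3.connect(":memory:")  # the real script writes DisasterResponse.db
conn.execute("CREATE TABLE messages (id TEXT, message TEXT, water INTEGER, food INTEGER)")
for r in csv.DictReader(io.StringIO(categories_csv)):
    # Expand "name-0;name-1;..." into binary label columns
    labels = {name: int(v) for name, _, v in
              (c.rpartition("-") for c in r["categories"].split(";"))}
    conn.execute("INSERT INTO messages VALUES (?, ?, ?, ?)",
                 (r["id"], messages[r["id"]], labels["water"], labels["food"]))

rows = conn.execute("SELECT message, water FROM messages").fetchall()
print(rows)  # [('we need water', 1), ('roads are flooded', 0)]
```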
Run the training script in the models folder
e.g. python train_classifier.py ../data/DisasterResponse.db classifier.pkl
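The training step can be sketched as a scikit-learn pipeline that turns messages into TF-IDF features and fits one classifier per category. The estimators below are a minimal multi-label example, not necessarily the exact configuration in train_classifier.py:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy data: three messages, two label columns (e.g. 'water', 'food')
X = ["we need drinking water urgently",
     "food supplies are running low",
     "send water and food"]
y = [[1, 0], [0, 1], [1, 1]]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(
        RandomForestClassifier(n_estimators=10, random_state=0))),
])
pipeline.fit(X, y)

# One prediction row, one column per category
print(pipeline.predict(["water please"]).shape)  # (1, 2)
```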
You can check the scripts' PEP 8 style by running the following command in the project's top folder: pylint --rcfile .pylintrc file_path
e.g.
pylint --rcfile .pylintrc models/train_classifier.py
F1 score of the disaster message categories
- Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline that cleans the data and stores it in the database
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline that trains the classifier and saves the model
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- Run the following command in the app's directory to run your web app.
python run.py
- Go to http://0.0.0.0:3001/. You will see the home page with visualizations of the training data set.
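Submitting a message takes you to the go.html result page, which is backed by a route that classifies the query. A minimal sketch of such a route, with a hypothetical classify() function standing in for the pickled model (run.py actually loads models/classifier.pkl and renders go.html):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for the trained model: the real app predicts
# all 35 categories from classifier.pkl.
def classify(message):
    return {"water": int("water" in message.lower())}

@app.route("/go")
def go():
    return jsonify(classify(request.args.get("query", "")))

# Exercise the route with Flask's built-in test client
client = app.test_client()
resp = client.get("/go?query=water")
print(resp.get_json())  # {'water': 1}
```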
- The message categories are highlighted in green
Sample Output