Disease Prediction from Symptoms

This project explores the use of machine learning algorithms to predict diseases from symptoms.

Algorithms Explored

The following algorithms have been explored in code:

Naive Bayes
Decision Tree
Random Forest
Gradient Boosting

Dataset

Source-1

The dataset for this problem used with the main.py and increased_accuracy.py script is downloaded from here:

https://www.kaggle.com/kaushil268/disease-prediction-using-machine-learning

This dataset has 133 total columns, 132 of them being symptoms experienced by patiend and last column in prognosis for the same.

Source-2

The dataset for this problem used with the Jupyter notebook is downloaded from here:

https://impact.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/index.html

This dataset has 3 columns:

Disease  | Count of Disease Occurrence | Symptom

You can either copy paste the whole table from here to an excel sheet or scrape it out using Beautifulsoup.

Directory Structure

|_ dataset/
         |_ training_data.csv
         |_ test_data.csv

|_ saved_model/
         |_ [ pre-trained models ]

|_ main.py [ code for laoding kaggle dataset, training & saving the model]

|_ increased_accuracy.py [ enhancement for Decision Tree and Random Forest models ]

|_ notebook/
         |_ dataset/
                  |_ raw_data.xlsx [Columbia dataset for notebook]
         |_ Disease-Prediction-from-Symptoms-checkpoint.ipynb [ IPython Notebook for loading Columbia dataset, training model and Inference ]

Usage

Please make sure to install all dependencies before running the frontend, using the following:

pip install -r requirements.txt

Training Models with Enhanced Accuracy

To enhance the accuracy and cross-validation scores of the Decision Tree and Random Forest models while reducing overfitting, utilize the increased_accuracy.py script.

Run the script as follows:

python increased_accuracy.py

Interactive Frontend

For running an interactive frontend or sharing it with others, please run frontend.py.

python frontend.py

Standalone Demo

For running the inference on test set or on custom inputs, you can also use the infr.py file as follows:

python infer.py

NOTE: This project is for demo purposes only. For any symptoms/disease, please refer to a Doctor.

silver-shadow / disease-prediction-from-symptoms Goto Github PK