

Detecting Pneumonia

Figure: chest X-ray images

By Nadine Amersi-Belton

Problem statement

This project was completed in September 2020, during the worldwide COVID-19 pandemic. Chest X-rays are examined to help diagnose patients, and machine learning presents an opportunity to expedite diagnosis and reduce human error.

Due to insufficient data on COVID-19 patients, we chose to apply machine learning tools, in particular deep neural networks, to detect pneumonia in chest X-ray images.

Components

  • Jupyter Notebook

The Jupyter Notebook is our key deliverable and contains details of our approach and methodology, data preprocessing and exploration, deep learning models and results.

  • Presentation

The presentation gives a high-level overview of our approach, findings and recommendations for non-technical stakeholders. It is intended to last between 5 and 10 minutes.

  • Data

The dataset was obtained from Kaggle. Because of its large file size, it is not included in this repository.

  • Blog Post

A blog post on Medium was created as part of this project.

Data exploration and preprocessing

The data consists of just over 5,000 chest X-ray images in two classes: patients diagnosed with pneumonia and healthy (normal) patients. We split the data into training, validation and test sets.
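With imbalanced classes, a split is usually stratified so that each subset keeps the same class ratio. The sketch below is a minimal illustration of that idea in plain Python, not the notebook's actual code: the helper `stratified_split`, the 10% validation and test fractions, the seed and the file names are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(paths, labels, val_frac=0.1, test_frac=0.1, seed=42):
    """Split (path, label) pairs into train/val/test, preserving class ratios.

    Hypothetical helper for illustration; fractions and seed are assumptions.
    """
    by_class = defaultdict(list)
    for path, label in zip(paths, labels):
        by_class[label].append(path)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, items in by_class.items():
        items = items[:]  # copy so the caller's list is not shuffled in place
        rng.shuffle(items)
        n_test = round(len(items) * test_frac)
        n_val = round(len(items) * val_frac)
        # Each class contributes the same fraction to every subset.
        splits["test"] += [(p, label) for p in items[:n_test]]
        splits["val"] += [(p, label) for p in items[n_test:n_test + n_val]]
        splits["train"] += [(p, label) for p in items[n_test + n_val:]]
    return splits


# Hypothetical file names and labels with a 3:1 class ratio.
paths = [f"img_{i}.jpeg" for i in range(400)]
labels = ["PNEUMONIA"] * 300 + ["NORMAL"] * 100
splits = stratified_split(paths, labels)
print({name: len(part) for name, part in splits.items()})
# {'train': 320, 'val': 40, 'test': 40}
```

Splitting per class keeps the 3:1 ratio in the validation and test sets as well, so metrics measured on them reflect the same imbalance the model sees during training.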

We noted that the data is imbalanced, with 3 times as many images of patients diagnosed with pneumonia as of normal patients.
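One common way to counter an imbalance like the 3:1 ratio noted above is to weight the loss so that errors on the minority class cost more. This is a swapped-in technique for illustration, not necessarily what the notebook does. The sketch below computes the widely used "balanced" heuristic, n_samples / (n_classes × n_samples_in_class), in plain Python; the labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_samples_in_class).

    Illustrative "balanced" heuristic; an assumption, not the project's method.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical labels with the 3:1 imbalance described above.
labels = ["PNEUMONIA"] * 300 + ["NORMAL"] * 100
weights = balanced_class_weights(labels)
print(weights)  # PNEUMONIA ~0.67, NORMAL 2.0
```

In Keras, a dict like this can be passed to `model.fit` through its `class_weight` argument, which expects integer class indices as keys, so the string labels would first need mapping to 0/1.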

Figure: class distribution in the training set

Results and recommendations

We applied the following models:

  • basic neural network model
  • convolutional neural network (CNN)
  • CNN with dropout layers
  • more complex CNN model
  • complex CNN model with L2 regularisation
  • VGG19
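Dropout, used in the model chosen below, randomly zeroes a fraction of a layer's activations during training so the network cannot rely on any single unit. The sketch below illustrates the mechanism ("inverted" dropout) in plain Python rather than reproducing the notebook's Keras model; the function name and parameters are hypothetical. Keras's `Dropout` layer applies the same 1 / (1 - rate) scaling during training and is inactive at inference.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout on a list of activations (illustrative sketch).

    During training, each unit is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate), keeping the expected value
    unchanged. At inference time the input passes through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


x = [1.0, 2.0, 3.0, 4.0]
print(dropout(x, rate=0.5, rng=random.Random(0)))  # some entries 0.0, the rest doubled
print(dropout(x, rate=0.5, training=False))        # [1.0, 2.0, 3.0, 4.0]
```

Scaling at training time, rather than at inference, means the trained network can be used as-is for prediction with no extra correction step.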

The CNN model with dropout layers performed best on the validation set and was chosen as our final model. Its performance on the test set was as follows:

  • accuracy of 0.78
  • recall of 0.99
  • F1 score of 0.85
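All three metrics come from the four confusion-matrix counts, with pneumonia as the positive class. Recall is TP / (TP + FN), so a model with very few false negatives scores high recall even when false positives pull accuracy down. The sketch below shows the standard formulas; the counts in the example are hypothetical and are not this project's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts.

    Pneumonia is treated as the positive class.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only: a model that flags most
# images as pneumonia, missing few positives but raising many false alarms.
metrics = classification_metrics(tp=97, fp=40, fn=3, tn=60)
print(metrics)  # accuracy 0.785, recall 0.97, f1 ~0.82
```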

The confusion matrix for the test set is as follows:

Figure: confusion matrix for the test set

With only 3 false negatives, our model achieves high recall, which prioritises patient safety.

Whilst we would prefer higher overall accuracy, our focus is on recall, because a missed pneumonia case (a false negative) poses the greatest risk to patient safety and the greatest legal risk.

We would recommend the following actions:

  • gather additional data: this classification was undertaken on a small sample of around 5k images and served as a proof of concept
  • address the class imbalance, for example with oversampling techniques, to improve performance
  • use this tool to support medical professionals whilst it is further improved
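As a starting point for the oversampling recommendation above, the simplest variant is random oversampling: duplicate randomly chosen minority-class images until the classes are the same size. The sketch below is a hypothetical plain-Python helper, not part of the project; libraries such as imbalanced-learn provide the same idea as `RandomOverSampler`. It should be applied to the training set only, otherwise duplicated images can leak into the validation or test sets.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until every class matches the
    size of the largest class. Illustrative sketch; use on training data only.
    """
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    rng = random.Random(seed)
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Sample with replacement from the existing items of this class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    # Output is grouped by class; shuffle before training.
    return out_samples, out_labels


# Hypothetical training set with a 3:1 imbalance.
samples = [f"img_{i}.jpeg" for i in range(400)]
labels = ["PNEUMONIA"] * 300 + ["NORMAL"] * 100
xs, ys = random_oversample(samples, labels)
print(Counter(ys))  # both classes now have 300 samples
```

Because the extra images are exact duplicates, random oversampling can encourage overfitting to them; it is usually combined with data augmentation or compared against loss weighting.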

Contact
