Giter Club home page Giter Club logo

malaria-detection's Introduction

malaria-detection

Neural network for malaria detection on segmented cell images. The current state of the project is summarized here and I will update this as further work is done.

Prerequisites

  • Python 3
  • PyTorch
  • Matplotlib
  • Pandas (for results analysis)
  • Numpy (for results analysis)

Dataset

The dataset used for this project is provided by the National Institutes of Health. Available here.

The dataset consists of 27,558 segmented cells from the thin blood smear slide images of 151 patients used for malaria screening. Cells are "stained" with an agent to distinguish the parasite. The dataset contains an equal number of parasitized and uninfected specimens.

Healthy cells usually display a uniform appearance whereas parasitized cells will display distinctive marks (image courtesy of pyimagesearch):

malaria-images

Dataset was divided into three splits : training (60%), validation(20%) and test (20%).

Images are resized to 96x96 before training, normalized by channel and augmented with rotation, translation, sheering, scaling and flipping.

Models

All models are defined in models.py. The PaperCNN model is detailed below:

PaperCNN

Two other models were tested so far. The 'CustomVGG' models follows a similar architecture, but stacks additional convolutional layers before pooling and relies on 3x3 filters [Input > 2 x Conv(3x3) > Max-Pool + BN > 3 x Conv(3x3) > Max-Pool + BN > 3 x Conv(3x3) > Max-Pool + BN > Avg-Pool > FC(1x256) > Output].

Finally, the 'CustomInception' model applies a single 3x3 convolution and a stack of two 3x3 convolutions, both on the same input. The output of both branches are concatenated on the channel dimension and pooled before moving forward. The process is repeated three times before average pooling and sending to the fully-connected layer detailed above. The idea was to see if having two different receptive fields working on the same input in parallel would lead to any improvements.

ReLU was used for all activations.

Results and Hyper-parameters

The best results achieved so far with the three models are the following :

Accuracy PaperCNN CustomVGG CustomInception
Train 98.35 % 99.63 % 98.76 %
Validation 97.70 % 98.08 % 97.55 %

And here's a summary of the best experiment (98.08 % on validation set):

Model Optimizer Epochs completed Patience Learning Rate Dropout (convolutions) Dropout (fully connected)
CustomVGG Adam 536 200 0.001 0.05 0.5

I will provide results on the test set after implementing the changes detailed below.

Further Steps

Results do not quite match those achieved with similar models by Rajaraman and al. (99.09 % with their custom CNN). Reading the paper gave a few ideas as to how to improve this project:

  • Cell images were randomly assigned to training, validation and test sets. This should arguably have been done with patients instead (there are multiple cell specimens per patient) to avoid any 'data leakage' and contamination of the validation and test sets.
  • Models could probably benefit from pretraining on a larger dataset to strengthen the extraction of generic features (e.g. ImageNet).
  • Hyper-parameters were tuned using basic grid-search on handpicked discrete values. The process should be randomized and applied on ranges instead.
  • Models should be evaluated using k-fold cross-validation and other performance metrics.

malaria-detection's People

Contributors

sachamorin avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.