wsics's Introduction

Introduction

This repository offers an implementation of the WSICS algorithm, as described in "Stain Specific Standardization of Whole-Slide Histopathological Images" (DOI: 10.1109/TMI.2015.2476509). WSICS can create templates from H&E stained images and use these to normalize other H&E stained images accordingly.

Binaries

A precompiled binary is available for 64-bit Windows. Additionally, a Docker container can be built from the Dockerfile within the repository. Both are standalone and don't require any additional work to function.

Compilation

This project aims to provide source code that is compatible with both Windows and Linux. However, Linux systems may vary in terms of available packages, which is why we don't officially support or provide binaries for any specific distribution.

In order to compile the WSICS binary, the following prerequisites must be met:

Because WSICS relies on the image reading capabilities of ASAP, all of ASAP's dependencies are required as well.

Usage

WSICS can be called through a CLI and accepts whole-slide images in a tiled image format, or as a flat patch. Image reading is provided by the ASAP project, so WSICS supports any format that ASAP does. WSICS attempts to locate tiles or static images that don't just contain background. If no usable tiles or static images are found, adjusting the --background_threshold parameter controls the strictness of this process.

The normalization process requires that a template image is converted to a CSV file with the relevant parameters. Using another WSI directly isn't supported yet. Please see the --template_output parameter for the creation of a template file.

Docker

The containerized version of WSICS relies on volumes to access the required files and images. In order to access images and export results, a volume must be mounted through the -v option for docker. An example can be seen below:

docker run -v [local directory]:/data/ diagnijmegen/wsics --input /data/[image] --template_output /data/[template name]
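
For scripted runs, the same call can be assembled programmatically. The sketch below only builds the argument list for the command above. The helper name build_docker_command and the example paths are hypothetical, and only flags that appear in this README are used.

```python
from pathlib import PurePosixPath


def build_docker_command(local_dir, image_name, template_name,
                         docker_image="diagnijmegen/wsics"):
    """Return the argument list for a template-creating WSICS container run.

    Hypothetical helper: it only mirrors the example command in this README.
    """
    mount_point = PurePosixPath("/data")
    return [
        "docker", "run",
        # Mount the local directory so the container can read and write files.
        "-v", f"{local_dir}:{mount_point}/",
        docker_image,
        "--input", str(mount_point / image_name),
        "--template_output", str(mount_point / template_name),
    ]


# Example paths are placeholders, not files shipped with WSICS.
command = build_docker_command("/home/user/slides", "slide.tif", "template.csv")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where Docker is installed.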

The algorithm can use a large amount of memory, which can result in a segfault error in Docker. To resolve this, reduce the training size (see the max_training parameter) or increase the amount of memory available to Docker.

Input and Output

To execute the algorithm, set the input parameter to a file or directory path. If a directory path is given, the algorithm will attempt to normalize all readable files within that directory. The output parameters are interpreted based on the input: if a file is given, the output paths are treated as file paths, and if a directory is given, they are treated as directory paths.

-i, --input [file or directory path]

If a directory is used as input, the original filenames are used to name the resulting normalized whole-slide images. To customize these names, a prefix or postfix can be set through the corresponding parameters.

--prefix [prefix string]
--postfix [postfix string]
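
The naming described above can be pictured as follows. This is a minimal sketch, assuming the prefix is placed before the original file name and the postfix between the name and its extension. The helper output_path is hypothetical, and WSICS may place these strings differently.

```python
from pathlib import Path


def output_path(input_file, output_dir, prefix="", postfix=""):
    """Derive an output path for one file found in an input directory.

    Assumption: prefix goes before the stem, postfix before the extension.
    """
    source = Path(input_file)
    return Path(output_dir) / f"{prefix}{source.stem}{postfix}{source.suffix}"


# Hypothetical file and directory names.
result = output_path("slides/case_01.tif", "normalized", postfix="_norm")
print(result.as_posix())
```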

To output normalized images, the image_output parameter must be set. This also outputs the LUT as a tif file. If only the LUT is required, the lut_output parameter can be set instead.

--image_output [file or directory path]
--lut_output [file or directory path]

A whole-slide image can be used as a reference for the normalization, which allows the algorithm to transform the source image to more closely resemble the reference. To do this, a set of template parameters is created and exported into a CSV file. This template can be generated by setting the template_output parameter, and used by setting the template_input parameter.

--template_input [file path to a template CSV file]
--template_output [file path to an output location]

If one or more whole-slide images contain ink or other heavily represented artifacts, the ink (or k) parameter can be set. This reduces the chance of an affected patch being selected to collect training pixels for the algorithm, which can lead to a better result.

-k, --ink

Several steps of the normalization process are based on randomized processes. To still offer deterministic execution, the Boost Mersenne Twister implementation is used as the random generator. Its seed can be set through the seed parameter.

-s, --seed [positive integer]
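
The effect of a fixed seed can be illustrated outside WSICS. The sketch below uses Python's random.Random, which is also a Mersenne Twister generator, as a stand-in for the Boost implementation. The function name and the pixel counts are hypothetical, and the sequence produced here should not be expected to match what WSICS generates for the same seed.

```python
import random


def sample_pixel_indices(pixel_count, sample_size, seed):
    """Pick sample_size distinct pixel indices reproducibly."""
    generator = random.Random(seed)  # Mersenne Twister (MT19937)
    return generator.sample(range(pixel_count), sample_size)


# Two runs with the same seed select exactly the same pixels.
first_run = sample_pixel_indices(10_000, 5, seed=42)
second_run = sample_pixel_indices(10_000, 5, seed=42)
print(first_run == second_run)
```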

Training

The creation of the Look Up Table uses a Naïve Bayes classifier to determine the probability of a pixel belonging to a certain class. To train this classifier, pixels corresponding to the background and to Eosin and Hematoxylin colored tissue are selected and added to a training set. The max_training and min_training parameters define the total size of the training set and the minimum number of selected pixels required to continue an execution.

--max_training [size as integer]
--min_training [size as integer]
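
One way to picture how the two limits interact is sketched below. The function build_training_set and its behaviour (cap the set at max_training, stop when fewer than min_training pixels are available) are assumptions drawn from the description above, not the WSICS implementation.

```python
import random


def build_training_set(candidates, max_training, min_training, seed=0):
    """Cap the training set at max_training and reject sets below min_training."""
    if len(candidates) < min_training:
        raise ValueError(
            f"only {len(candidates)} training pixels found, "
            f"at least {min_training} are required"
        )
    if len(candidates) <= max_training:
        return list(candidates)
    # Too many candidates: draw a reproducible random subset.
    return random.Random(seed).sample(list(candidates), max_training)


pixels = [(r, 120, 200) for r in range(256)]  # hypothetical RGB candidates
training_set = build_training_set(pixels, max_training=100, min_training=50)
print(len(training_set))
```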

The training pixels are selected from tiles that contain little to no background. This is determined by calculating the number of pixels that are near white or black. If this is higher than the percentage indicated by the background_threshold parameter, the tile isn't used for the selection of training pixels.

--background_threshold [positive float]
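
The tile check described above can be sketched as follows. The cutoffs for "near white" and "near black" (white_cutoff, black_cutoff), the expression of the threshold as a fraction between 0 and 1, and the function names are assumptions for illustration. The values WSICS uses internally are not stated here.

```python
def background_fraction(tile, white_cutoff=230, black_cutoff=25):
    """Return the fraction of pixels that are near white or near black.

    tile is a list of (R, G, B) tuples with 8-bit channel values.
    """
    if not tile:
        return 1.0  # treat an empty tile as pure background
    background = sum(
        1 for pixel in tile
        if min(pixel) >= white_cutoff or max(pixel) <= black_cutoff
    )
    return background / len(tile)


def is_usable_tile(tile, background_threshold):
    """A tile is skipped when its background fraction exceeds the threshold."""
    return background_fraction(tile) <= background_threshold


# Hypothetical tile: 3 near-white pixels and 7 stained pixels.
tile = [(250, 250, 250)] * 3 + [(180, 90, 160)] * 7
print(is_usable_tile(tile, background_threshold=0.5))
print(is_usable_tile(tile, background_threshold=0.2))
```

With this example tile, a looser threshold keeps the tile and a stricter one rejects it, which is the behaviour to expect when tuning the parameter after no usable tiles were found.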

Additionally, the number of detected ellipses is also considered when selecting tiles to extract pixels from. Normally the required number is calculated based on the tile size. However, it can also be set through the min_ellipses parameter.

The selection of Hematoxylin colored pixels is done by detecting ellipses within the tissue and then calculating the mean red density value of the HSD color space. The hema_percentile parameter then defines which ellipse mean is selected to serve as threshold for the selection of Hematoxylin pixels.

--hema_percentile [float value between 0 and 1]

The Eosin threshold is defined by collecting all pixels that aren't considered background or Hematoxylin stained, and then selecting the value in that list nearest to the position given by the eosin_percentile parameter. The pixels passing the threshold are then collected as Eosin stained pixels.

--eosin_percentile [float value between 0 and 1]
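
Both percentile parameters pick a threshold from a list of values. The sketch below uses a nearest-rank selection on the sorted list. The rounding rule, the helper name value_at_percentile, and the sample values are assumptions for illustration and may differ from what WSICS does.

```python
def value_at_percentile(values, percentile):
    """Return the sorted value nearest to the given percentile (0 to 1)."""
    if not values:
        raise ValueError("no values to select a threshold from")
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must lie between 0 and 1")
    ordered = sorted(values)
    # Nearest rank; Python's round() resolves exact ties to the even index.
    index = round(percentile * (len(ordered) - 1))
    return ordered[index]


# Hypothetical mean red density per detected ellipse.
ellipse_means = [0.42, 0.61, 0.55, 0.38, 0.70]
hema_threshold = value_at_percentile(ellipse_means, 0.5)

# Hypothetical values for pixels that are neither background nor Hematoxylin.
remaining_pixels = [0.10, 0.25, 0.18, 0.31, 0.22, 0.15]
eosin_threshold = value_at_percentile(remaining_pixels, 0.2)

print(hema_threshold, eosin_threshold)
```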

Normally, tiles that contain few detected ellipses are skipped and ignored for the creation of the training set. However, it's possible to set a static lower limit for the number of detections, such as when there's little actual tissue on the image. This can be done with the min_ellipses parameter.

--min_ellipses [positive integer]

wsics's People

Contributors

karelger

wsics's Issues

Allow direct usage of previously created LUT

Right now, only template parameters can be used to generate LUTs and normalized images. An option should be added that can directly transform a WSI with an existing LUT. Potentially define whether a normalized WSI, a template CSV, or a LUT is output. (Or all three!)

  • Direct LUT application

    • Test whether or not feature extraction is required for proper template application
    • If no further feature extraction is required beyond the LUT creation, change algorithm to simply apply the LUT to an input WSI
  • Write resulting LUTs into a more easily accessible format

Output filepaths not consistent

The program has trouble recognizing whether an output should be a prefix or a full directory.

TODO:

  • Implement output based on whether or not an input path points towards a directory or a file.
  • Implement prefix and postfix parameters that can prepend and append when a directory serves as input.

Reduce Docker size

The size of the current Docker image is rather large, and should preferably be reduced.

Windows and Docker both output an invalid image

Currently, both the Windows binary and the Dockerized instance output an unreadable image. This is likely due to incorrect DLLs or linking.

  • Determine cause
  • Implement fix
  • Release updated version

Whitespace Artifacts

Testing shows artifacts being generated within whitespace regions.

  • Debug why this happens
  • Document parameterization cause, or fix bug causing this issue

Add Python bindings

It would be really useful to have Python bindings for the library so we could use it directly from Python. Version 3.7 would be preferred, as our upcoming Docker images are based on it, but 3.6 would also be useful.

Running code in Docker does not output any file

When I run the command "docker run -it --name devtest --mount type=bind,source=D:/path/,target=/app diagnijmegen/wsics -i /app/530896.svs --template_output /app/template_output", I do not get any template parameters as output. Can you provide a standard script for outputting the template parameters (CSV file)?

Test Deterministic Processing

The deterministic processing hasn't been tested yet. Run tests and fix any potential issues.

  • Test deterministic processing on local machine
    • Fix potential issues
  • Test deterministic processing within docker container

Segfault errors for docker containers

Hi,
I'm getting the docker container randomly stopping with a segfault before the normalisation of the image is completed. I've tried reducing max training and increasing memory and the problem still occurs. Is there any guidance on what these sizes need to be for the docker container to be stable to run?
