
Refining Coarse Annotations on Single Whole-Slide Image

Detailed and exhaustive annotations on whole-slide images (WSIs) are extremely labor-intensive and time-consuming. In this repository, we provide implementations of two methods -- (1) Deep k-NN (DkNN) and (2) Label Cleaning Multiple Instance Learning (LC-MIL) -- for refining these coarse annotations and producing a more accurate version of them. The figure below shows an example of coarse annotations and the refined annotations produced by one of our methods (LC-MIL).

image.png

Notably, although both methods are machine-learning based, the refinement can be conducted on each single slide, and NO external data is needed.

Dataset

Usage

DkNN

Training

python train_model.py 

positional arguments:
  slide_root            The location of the whole-slide image.
  slide_ID              Name of the whole-slide image.
  slide_format          Data format of the whole-slide image. Permitted formats
                        are `.svs`, `.ndpi`, and `.tif`.
  ca_path               The path to the coarse annotations. File format should
                        be `.sav`
  model_save_root       Where the model will be saved
  
optional arguments:
  -h, --help            show this help message and exit
  --remove_blank REMOVE_BLANK
                        How to remove blank regions (i.e. identify tissue
                        regions) of the WSI. We provide three functions: 0:
                        convert to HSV, then Otsu threshold on the H and S
                        channels; 1: apply thresholds [235, 210, 235] on the R,
                        G, and B channels, respectively; 2: convert to grayscale,
                        then Otsu threshold. Default is 0. For new datasets, the
                        user is encouraged to write a custom function
  --focal_loss FOCAL_LOSS
                        Whether or not to use focal loss (True: use focal
                        loss; False: use cross entropy), default is False
  --patch_shape PATCH_SHAPE
                        Patch shape (size), default is 256
  --unit UNIT           Smallest unit when cropping patches, default is 256
  --gpu GPU             GPU to use
  --lr LR               Initial learning rate, default is 0.00005
  --step_size STEP_SIZE
                        Step size for learning rate decay, default is 1
  --reg REG             Regularization strength, default is 10e-5
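
The three --remove_blank modes above are simple thresholding heuristics for separating tissue from blank background. The sketch below illustrates them on an RGB thumbnail of the slide; the function name tissue_mask and the exact OpenCV calls are illustrative assumptions, not the repository's implementation.

# Illustrative sketch of the three --remove_blank modes; details may differ from the repository.
import cv2
import numpy as np

def tissue_mask(rgb, mode=0):
    """Return a boolean mask of likely tissue pixels for an RGB thumbnail."""
    if mode == 0:
        # Mode 0: convert to HSV, Otsu-threshold the H and S channels.
        hsv = cv2.cvtColor(rgb, cv2.COLOR_RGB2HSV)
        _, h_mask = cv2.threshold(hsv[:, :, 0], 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        _, s_mask = cv2.threshold(hsv[:, :, 1], 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return (h_mask > 0) & (s_mask > 0)
    if mode == 1:
        # Mode 1: fixed thresholds [235, 210, 235] on the R, G, B channels.
        r, g, b = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
        return (r < 235) & (g < 210) & (b < 235)
    # Mode 2: convert to grayscale, then Otsu threshold (bright background removed).
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return mask > 0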

Applying/Inference

python apply_model.py 

positional arguments:
  slide_root            The location of the whole-slide image.
  slide_ID              Name of the whole-slide image.
  slide_format          Data format of the whole-slide image. Permitted formats
                        are `.svs`, `.ndpi`, and `.tif`.
  ca_path               The path to the coarse annotations. File format should
                        be `.sav`
  model_dir             Where to load the model (to conduct feature
                        extraction)
  feature_save_root     Where the mapped features will be saved
  knn_save_root         Where the KNN results (distance and index) will be
                        saved
  heatmap_save_root     Where the predicted heatmap will be saved

optional arguments:
  -h, --help            show this help message and exit
  --remove_blank REMOVE_BLANK
                        How to remove blank regions (i.e. identify tissue
                        regions) of the WSI. We provide three functions: 0:
                        convert to HSV, then Otsu threshold on the H and S
                        channels; 1: apply thresholds [235, 210, 235] on the R,
                        G, and B channels, respectively; 2: convert to grayscale,
                        then Otsu threshold. Default is 0. For new datasets, the
                        user is encouraged to write a custom function
  --focal_loss FOCAL_LOSS
                        Whether or not to use focal loss (True: use focal
                        loss; False: use cross entropy), default is False
  --patch_shape PATCH_SHAPE
                        Patch shape (size), default is 256
  --unit UNIT           Smallest unit when cropping patches, default is 256
  --gpu GPU             GPU to use
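
At inference time the script extracts deep features for every tissue patch, runs a k-nearest-neighbour search among the coarsely labelled patches, and turns the neighbour votes into a heatmap. Below is a minimal sketch of that refinement idea; dknn_refine, the choice of k, and the use of scikit-learn are assumptions for illustration, not the repository's code.

# Illustrative sketch of Deep k-NN label refinement in feature space (not the repository's code).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def dknn_refine(features, coarse_labels, k=10):
    """features: (N, D) deep features of the patches; coarse_labels: (N,) 0/1
    labels from the coarse annotation. Returns a per-patch score in [0, 1]."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    dist, idx = nn.kneighbors(features)           # column 0 is the patch itself
    neighbour_labels = coarse_labels[idx[:, 1:]]  # drop the self-match
    return neighbour_labels.mean(axis=1)          # fraction of positive neighbours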

Template command

cd DkNN
python train_model.py ../Data test_016 .tif ../coarse_annotations.sav . 
python apply_model.py ../Data test_016 .tif ../coarse_annotations.sav model_test_016.pth . . . 

We cannot upload our test WSI, test_016.tif, to this repository due to GitHub's space limit, but you can find it in the Google Drive.

LC-MIL

Training

python train_model.py 

positional arguments:
  slide_root            The location of the whole-slide image.
  slide_ID              Name of the whole-slide image.
  slide_format          Data format of the whole-slide image. Permitted formats
                        are `.svs`, `.ndpi`, and `.tif`.
  ca_path               The path to the coarse annotations. File format should
                        be `.sav`
  model_save_root       Where the model will be saved

optional arguments:
  -h, --help            show this help message and exit
  --remove_blank REMOVE_BLANK
                        How to remove blank regions (i.e. identify tissue
                        regions) of the WSI. We provide three functions: 0:
                        convert to HSV, then Otsu threshold on the H and S
                        channels; 1: apply thresholds [235, 210, 235] on the R,
                        G, and B channels, respectively; 2: convert to grayscale,
                        then Otsu threshold. Default is 0. For new datasets, the
                        user is encouraged to write a custom function
  --length_bag_mean LENGTH_BAG_MEAN
                        Average length of a bag (binomial distribution), default
                        = 10
  --num_bags NUM_BAGS   Number of bags to train, default = 1000
  --focal_loss FOCAL_LOSS
                        Whether or not to use focal loss (True: use focal
                        loss; False: use cross entropy), default is FL (focal
                        loss)
  --patch_shape PATCH_SHAPE
                        Patch shape (size), default is 256
  --unit UNIT           Smallest unit when cropping patches, default is 256
  --gpu GPU             GPU to use
  --lr LR               Initial learning rate, default is 0.00005
  --step_size STEP_SIZE
                        Step size for learning rate decay, default is 1
  --reg REG             Regularization strength, default is 10e-5
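
LC-MIL trains on bags of patches sampled from the single slide, which is what --num_bags and --length_bag_mean control. The sketch below shows one plausible way such bags could be assembled; the binomial bag-length draw, the bag-labelling rule, and all names are assumptions for illustration rather than the repository's implementation.

# Illustrative sketch of bag sampling from one slide (not the repository's code).
import numpy as np

def sample_bags(coarse_labels, num_bags=1000, length_bag_mean=10, seed=0):
    """coarse_labels: (N,) 0/1 labels of the N tissue patches under the coarse annotation."""
    rng = np.random.default_rng(seed)
    n = len(coarse_labels)
    bags = []
    for _ in range(num_bags):
        # Bag length drawn from a binomial whose mean is length_bag_mean.
        length = max(1, min(n, rng.binomial(2 * length_bag_mean, 0.5)))
        members = rng.choice(n, size=length, replace=False)
        # A bag is labelled positive if any member lies inside the coarse annotation.
        bag_label = int(coarse_labels[members].any())
        bags.append((members, bag_label))
    return bags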

Applying/Inference

python apply_model.py 

positional arguments:
  slide_root            The location of the whole-slide image.
  slide_ID              Name of the whole-slide image.
  slide_format          Data format of the whole-slide image. Permitted formats
                        are `.svs`, `.ndpi`, and `.tif`.
  model_dir             The path to the MIL model
  heatmap_save_root     Where the predicted heatmap will be saved

optional arguments:
  -h, --help            show this help message and exit
  --remove_blank REMOVE_BLANK
                        How to remove blank regions (i.e. identify tissue
                        regions) of the WSI. We provide three functions: 0:
                        convert to HSV, then Otsu threshold on the H and S
                        channels; 1: apply thresholds [235, 210, 235] on the R,
                        G, and B channels, respectively; 2: convert to grayscale,
                        then Otsu threshold. Default is 0. For new datasets, the
                        user is encouraged to write a custom function
  --length_bag_mean LENGTH_BAG_MEAN
                        Average length of a bag (binomial distribution), default
                        = 10
  --num_bags NUM_BAGS   Number of bags to train, default = 1000
  --focal_loss FOCAL_LOSS
                        Whether or not to use focal loss (True: use focal
                        loss; False: use cross entropy), default is FL (focal
                        loss)
  --patch_shape PATCH_SHAPE
                        Patch shape (size), default is 256
  --unit UNIT           Smallest unit when cropping patches, default is 256
  --gpu GPU             GPU to use
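
Inference produces a per-patch heatmap over the tissue region. Assuming the trained MIL model can score individual patches (which the heatmap output implies), it could be assembled roughly as sketched below; build_heatmap, the sigmoid, and the model interface are assumptions for illustration, not the repository's code.

# Illustrative sketch of assembling a patch-level heatmap (model interface is assumed).
import numpy as np
import torch

@torch.no_grad()
def build_heatmap(model, patches, coords, grid_shape, patch_shape=256):
    """patches: (N, 3, patch_shape, patch_shape) float tensor; coords: (N, 2) pixel
    coordinates of the patch corners; grid_shape: (rows, cols) of the heatmap."""
    model.eval()
    heatmap = np.zeros(grid_shape, dtype=np.float32)
    scores = torch.sigmoid(model(patches)).reshape(-1).cpu().numpy()
    for (x, y), s in zip(coords, scores):
        heatmap[int(y) // patch_shape, int(x) // patch_shape] = s
    return heatmap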

Template command

cd LC_MIL
python train_model.py ../Data test_016 .tif ../coarse_annotations.sav . 
python apply_model.py ../Data test_016 .tif model_test_016.pth . . . 

We cannot upload our test WSI, test_016.tif, to this repository due to GitHub's space limit, but you can find it in the Google Drive.

Post-processing

The post-processing procedure for both methods (DkNN and LC-MIL), along with an illustration, can be found in Post-process.ipynb.
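
As a rough idea of what such post-processing can look like, the sketch below thresholds the predicted heatmap and removes small connected components; the threshold value, the SciPy-based cleanup, and the function name are assumptions, and Post-process.ipynb remains the reference for the actual procedure.

# Illustrative sketch of heatmap post-processing (see Post-process.ipynb for the real procedure).
import numpy as np
from scipy import ndimage

def refine_annotation(heatmap, threshold=0.5, min_size=5):
    """Turn a per-patch heatmap into a refined binary annotation mask."""
    mask = heatmap >= threshold
    mask = ndimage.binary_closing(mask)                     # close small gaps
    labeled, num = ndimage.label(mask)                      # connected components
    sizes = ndimage.sum(mask, labeled, range(1, num + 1))   # component sizes
    keep = np.isin(labeled, np.flatnonzero(sizes >= min_size) + 1)
    return keep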

Publication

@misc{wang2021label,
      title={Label Cleaning Multiple Instance Learning: Refining Coarse Annotations on Single Whole-Slide Images}, 
      author={Zhenzhen Wang and Aleksander S. Popel and Jeremias Sulam},
      year={2021},
      eprint={2109.10778},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
