Training a Lung Cancer Histological Type Classifier Using a Synthetic Dataset Generated by Progressive Growing GAN (PGGAN)
This repository contains the code for a bachelor's thesis project that trains a classifier to identify the histological type of lung cancer nodules using a synthetic dataset generated by a Progressive Growing Generative Adversarial Network (PGGAN).
Early detection of the lung cancer subtype is crucial for the prognosis and treatment of the disease and can significantly increase the survival rate of patients. Currently, subtype detection is done through a biopsy, a time-consuming and highly invasive procedure with potential clinical implications.
To circumvent these problems, this work proposes using computed tomography (CT) images combined with deep learning (DL) algorithms, specifically convolutional neural networks (CNNs), to classify the histological type of nodules. However, DL algorithms require a large number of images annotated by a specialist, which are costly to obtain in the medical domain.
To address this limitation, the work proposes using a PGGAN to generate additional synthetic images that are used to train the classifier and improve its performance.
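The idea of augmenting the real training data with generated images can be sketched as follows. This is a minimal illustration and not the repository's actual pipeline: the `Sample` record layout, the `build_training_set` function, the `synthetic_ratio` parameter, and the file names are all hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical record layout: (image path, class label, is_synthetic)
Sample = Tuple[str, str, bool]


def build_training_set(
    real: List[Sample],
    synthetic: List[Sample],
    synthetic_ratio: float,
    seed: int = 0,
) -> List[Sample]:
    """Return all real samples plus a random subset of synthetic ones.

    synthetic_ratio is the number of synthetic samples to add per real
    sample (an assumed knob, not a parameter from the thesis).
    """
    if synthetic_ratio < 0:
        raise ValueError("synthetic_ratio must be non-negative")
    rng = random.Random(seed)
    # Never request more synthetic samples than are available.
    n_synthetic = min(len(synthetic), int(round(synthetic_ratio * len(real))))
    mixed = list(real) + rng.sample(synthetic, n_synthetic)
    rng.shuffle(mixed)
    return mixed


# Placeholder file names, for illustration only.
real_samples = [(f"real_{i}.png", "adenocarcinoma", False) for i in range(4)]
synthetic_samples = [(f"pggan_{i}.png", "adenocarcinoma", True) for i in range(10)]

training_set = build_training_set(real_samples, synthetic_samples, synthetic_ratio=0.5)
print(len(training_set), "training samples")
```

Keeping an `is_synthetic` flag on each record makes it easy to report how many generated images ended up in a given training run.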
Three different datasets were used in this study: HCFMRP, LungPET-CT-Dx, and NSCLC Radiomics. Together, the datasets contain 3263 images from 205 exams diagnosed as adenocarcinoma and 4274 images from 220 exams diagnosed as squamous cell carcinoma.
The results show that adding PGGAN-generated images improves the performance of the classifier, which reaches an accuracy of 0.940 compared to 0.906 for the model trained only with real images, a relative improvement of 3.8%.
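For reference, the 3.8% figure is consistent with a relative gain over the baseline accuracy rather than an absolute difference. The short check below uses only the two accuracies reported above; the function and variable names are illustrative.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Gain of `improved` over `baseline`, as a percentage of the baseline."""
    return (improved - baseline) / baseline * 100.0


acc_real_only = 0.906        # classifier trained only with real images
acc_with_synthetic = 0.940   # classifier trained with PGGAN-generated images added

absolute_gain = acc_with_synthetic - acc_real_only
relative_gain = relative_improvement(acc_real_only, acc_with_synthetic)

print(f"absolute gain: {absolute_gain * 100:.1f} percentage points")  # 3.4
print(f"relative gain: {relative_gain:.1f}%")                         # 3.8%
```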
Adenocarcinoma and squamous cell carcinoma nodules (left to right) generated by the PGGAN.
This repository contains the code for training and evaluating both the classifier and the PGGAN. The code is implemented in Python using the deep learning library PyTorch.