mahmoodlab / patch-gcn

Context-Aware Survival Prediction using Patch-based Graph Convolutional Networks - MICCAI 2021

Home Page: http://mahmoodlab.org

License: GNU General Public License v3.0

Python 87.41% Jupyter Notebook 12.59%

wsi wsi-images gcn survival-prediction pathology histopathology mahmoodlab

patch-gcn's Issues

Some questions about Censorship

Does Censorship=1 mean that the survival time of the patient is fully observed?
And in the TCGA data, how is the censorship status of each patient determined?
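
A minimal sketch of one common convention, offered as an assumption rather than as confirmed repository behavior: censorship = 1 marks a right-censored patient (alive at last follow-up, so the death was not observed) and censorship = 0 marks an observed death. The field names `vital_status`, `days_to_death` and `days_to_last_follow_up` are assumed to match a typical TCGA clinical export and should be checked against your own file.

```python
def survival_label(record):
    """Return (time_in_days, censorship) for one clinical record.

    Assumed convention: censorship = 1 -> right-censored (event not observed),
    censorship = 0 -> death observed.
    """
    status = record["vital_status"].strip().lower()
    if status == "dead":
        return float(record["days_to_death"]), 0
    if status == "alive":
        return float(record["days_to_last_follow_up"]), 1
    raise ValueError(f"unexpected vital_status: {record['vital_status']!r}")


# Hypothetical records, for illustration only.
patients = [
    {"vital_status": "Dead", "days_to_death": 412, "days_to_last_follow_up": None},
    {"vital_status": "Alive", "days_to_death": None, "days_to_last_follow_up": 1530},
]
labels = [survival_label(p) for p in patients]
```

Under this convention a patient's survival time is fully observed only when censorship is 0.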

Code seems not runnable as-is

Hi!

Very impressed with the paper. My lab would love to try PatchGCN on our own dataset, but we ran into a problem:

It seems that the below variable reg_fn is never defined

Patch-GCN/utils/core_utils.py

Line 182 in f6d771b

 train_loop_survival_cluster(epoch, model, train_loader, optimizer, args.n_classes, writer, loss_fn, reg_fn, args.lambda_reg, args.gc, VAE) 

This was also mentioned in issue #4 (comment)

Is it possible that some code is missing?

I want to run MI-FCN, but I don't know how to obtain the 'fast_cluster_ids.pkl' file.

Originally posted by @genhao3 in #17 (comment)

Thanks for sharing this code, it looks very interesting. Is it possible to train PatchGCN_Surv in a classification manner? For example, can we use the hazards variable to model the risk probabilities of a WSI and use these as input to a cross-entropy loss to train the model?

Thanks,
Jeroen
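
For context, here is a rough sketch of how per-bin hazards can feed a discrete-time survival likelihood. It is written in plain Python and is not taken from the repository; the function names, the bin layout and the example logits are all assumptions made for illustration. The point it tries to show is that, unlike plain cross entropy, this loss has a separate term for censored patients.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discrete_survival_nll(logits, bin_index, censored, eps=1e-7):
    """Negative log-likelihood for one sample under a discrete-time hazard model.

    logits:    one raw score per time bin
    bin_index: bin containing the event time or the last follow-up time
    censored:  True if the event was not observed
    """
    hazards = [sigmoid(z) for z in logits]
    # survival[k] = P(T > bin k) = product over j <= k of (1 - hazards[j])
    survival, running = [], 1.0
    for h in hazards:
        running *= 1.0 - h
        survival.append(running)
    if censored:
        # The patient is only known to have survived through bin_index.
        return -math.log(survival[bin_index] + eps)
    # Event in bin_index: survive all earlier bins, then fail in this one.
    survived_before = 1.0 if bin_index == 0 else survival[bin_index - 1]
    return -(math.log(survived_before + eps) + math.log(hazards[bin_index] + eps))


def risk_score(logits):
    """Higher value = higher predicted risk (negative sum of survival probabilities)."""
    running, total = 1.0, 0.0
    for z in logits:
        running *= 1.0 - sigmoid(z)
        total += running
    return -total


logits = [-2.0, -1.0, 0.5, 1.0]  # hypothetical model output for 4 time bins
loss_event = discrete_survival_nll(logits, bin_index=2, censored=False)
loss_censored = discrete_survival_nll(logits, bin_index=2, censored=True)
```

A softmax plus cross-entropy over the bins would treat a censored patient as if the event happened in the last observed bin, which is why a censoring-aware likelihood is usually preferred for this kind of data.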

What data does the dataset used contain?

What data do I need to download in addition to the SVS formatted images? Survival predictions should also include survival time and whether death has occurred. Where can these data be downloaded?

Can the dataset be re-uploaded?

Hello, I am trying to reproduce your code, but it seems that the dataset is unavailable. Could you please re-upload the dataset?

When running the main file, it shows a missing .pt file.

The error message you provided indicates that the code is unable to find the file 'tcga_blca_20x_features\graph_files\tcga_blca_20x_features\graph_euclidean_files\TCGA-XF-AAMW-01Z-00-DX1.875DED6D-805B-418C-8C9E-AE45FDECC504.pt'.

P-Value computation

Thanks for your wonderful work.
I ran into some trouble while reading the article. The description of Fig. 4 says: "Out-of-sample risk predictions from each validation fold were pooled and then plotted against their survival time."
I can't really understand this. Could you please upload the code used to compute it, or explain it in detail?
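
One possible reading of that sentence, sketched in plain Python: collect the validation-fold predictions from every fold into a single cohort, split it at the median risk, and compare the two groups with a log-rank test. This is a guess at the procedure, not the authors' code; the fold data, the median cutoff and the function name are hypothetical.

```python
import math
from statistics import median


def logrank_p_value(times, events, groups):
    """Two-sided log-rank test for two groups (group labels 0 and 1).

    events[i] is 1 if the death was observed and 0 if the patient was censored.
    """
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = [(g, ti, e) for g, ti, e in zip(groups, times, events) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == 1)
        d = sum(1 for _, ti, e in at_risk if ti == t and e == 1)
        d1 = sum(1 for g, ti, e in at_risk if g == 1 and ti == t and e == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 d.o.f.


# Hypothetical out-of-sample predictions: one dict per validation fold.
folds = [
    {"risk": [0.9, 0.2, 0.7], "time": [5, 40, 12], "event": [1, 0, 1]},
    {"risk": [0.1, 0.8, 0.3], "time": [50, 8, 35], "event": [0, 1, 1]},
]
# Pool every fold's validation predictions into one cohort.
risks = [r for f in folds for r in f["risk"]]
times = [t for f in folds for t in f["time"]]
events = [e for f in folds for e in f["event"]]

# Stratify at the median pooled risk, then compare the two survival curves.
cutoff = median(risks)
high_risk = [1 if r > cutoff else 0 for r in risks]
p_value = logrank_p_value(times, events, high_risk)
```

In practice a tested library routine would be a safer choice than a hand-written test; the loop above is only meant to make the pooling step concrete.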

Failed to reproduce c-index of 0.58 for LUAD

Hello, I've successfully reproduced the c-indices for BRCA, UCEC and BLCA. However, for LUAD I got 0.529072 ± 0.057346, which is much lower than the 0.585 ± 0.012 reported in your article. I repeated the experiment multiple times using the same parameters as in your code, and the results are consistently around 0.53. Could you please help me out?

MI-FC reproducibility

Hi, I am trying to reproduce your results but am having trouble with MI-FC, as I do not have the required fast_cluster_ids.pkl file. I saw quite a bit of discussion in various issues; the closest answer to what I was looking for was this:

Hi @genhao3 - the fast_cluster_ids.pkl is a dictionary that I load for each cancer type, which maps case_id to [M x 1]-dim array of cluster assignments 1⋯C, where M is the number of patches and the indices correspond to the cluster assignment of a given patch embedding. You can use packages like faiss to generate these cluster assignments for running MI-FCN / DeepAttnMISL comparisons.

but it is a bit unspecific and the library mentioned is not straightforward for me. Could you provide either code used to create the file or the .pkl file itself? Any help is much appreciated :)
I know it is a baseline and not your proposed model, but would be interested to run this also.

Best
Valentin
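
Going only by the description quoted above (a dictionary mapping case_id to one cluster id per patch), a file with that shape could be produced roughly as follows. This is a sketch under assumptions: it uses a plain-Python Lloyd's k-means instead of faiss, 0-based cluster ids, tiny made-up embeddings, and a temporary output directory. The exact format expected by the repository has not been confirmed.

```python
import os
import pickle
import random
import tempfile


def kmeans_assign(features, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster id (0..n_clusters-1) per row."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, n_clusters)]
    assignments = [0] * len(features)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, f in enumerate(features):
            dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in centroids]
            assignments[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = [f for f, a in zip(features, assignments) if a == k]
            if members:  # keep the old centroid if a cluster went empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


# Hypothetical patch embeddings per case (real ones are much higher-dimensional).
bags = {
    "case_A": [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.3, 5.2]],
    "case_B": [[1.0, 1.0], [1.3, 0.9], [9.0, 9.0], [9.2, 8.7]],
}
cluster_ids = {case: kmeans_assign(feats, n_clusters=2) for case, feats in bags.items()}

# Written to a temporary directory here; point this at your own feature folder.
out_path = os.path.join(tempfile.mkdtemp(), "fast_cluster_ids.pkl")
with open(out_path, "wb") as handle:
    pickle.dump(cluster_ids, handle)
```

For real slides with thousands of patches, a library implementation such as the faiss one mentioned in the quote would be far faster than this loop.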

Questions Regarding Censored Data Handling and Case Selection

Hello @Richarizardd ,

Thanks for your great work. I have a question regarding Case Selection: The /data/data/tcga_gbmlgg_all_clean.csv.zip file contains 1042 cases. However, cBioPortal reports a higher number of combined LGG and GBM cases. Could you clarify the criteria used to select these 1042 cases? This information was not found in the repository. I apologize if I missed this information.

Thank you for your time and any clarification you can provide!

DeepAttnMISL performance

I noticed that DeepAttnMISL performs even worse than Attention MIL. I would like to ask for some technical details: when using DeepAttnMISL and Attention MIL, did you use all the patches from the WSI, or did you randomly select a subset of them?

Failed to generate TCGA-HT-7483-01Z-00-DX1.7241DF0C-1881-4366-8DD9-11BF8BDD6FBF.pt

Hello,
thanks for your great work. I'm trying to reproduce the results from your article and succeeded for BRCA. However, when I tried GBMLGG, the code failed to find TCGA-HT-7483-01Z-00-DX1.7241DF0C-1881-4366-8DD9-11BF8BDD6FBF.pt. This is due to a failure in the previous step, which extracts features with the ResNet. Besides, it seems that the quality of the svs file is not good enough. How should I generate a .pt file for it? Thank you.

CoxSurvLoss has a lot of bugs

As far as we know, CoxSurvLoss requires a batch size greater than or equal to 2.
However, the batch size is 1 (default: 1, due to varying bag sizes), so CoxSurvLoss cannot work correctly with the default setting.
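
To make the batch-size point concrete, here is a small plain-Python sketch of a Cox negative log partial likelihood. It does not reproduce the repository's CoxSurvLoss; the function name and the example numbers are assumptions. With a single sample the risk set contains only that sample, so the loss is zero whatever the model predicts.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative log partial likelihood, averaged over observed events.

    events[i] is 1 if the death was observed and 0 if censored.
    """
    total, n_events = 0.0, 0
    for theta_i, t_i, e_i in zip(risk_scores, times, events):
        if e_i != 1:
            continue
        # Risk set: every sample in the batch still at risk at time t_i.
        log_denominator = math.log(
            sum(math.exp(theta_j) for theta_j, t_j in zip(risk_scores, times) if t_j >= t_i)
        )
        total -= theta_i - log_denominator
        n_events += 1
    return total / n_events if n_events else 0.0


# Batch of one: the numerator and the risk-set sum cancel exactly.
single = cox_neg_log_partial_likelihood([1.7], [10.0], [1])
# Batch of two: the loss now depends on how the two risk scores compare.
pair = cox_neg_log_partial_likelihood([1.7, -0.3], [10.0, 25.0], [1, 0])
```

A loss that is constant in the prediction carries no gradient, which is one way to read the concern raised above.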

c-index value on validation set is very high

In the BRCA dataset, the c-index on the training set is around 0.60, while the c-index on the validation set reaches 0.80+. That seems impossible! Could your team please answer my question? Please let me know if your team is responsible for this repo, thanks!
Moreover, how should I set n_classes when I just want to make survival predictions on the TCGA-BRCA dataset?

Are the five c-indices from 5-fold cross-validation averaged to get the final c-index?
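
If the reported numbers are a mean and standard deviation over folds (an assumption; the mean ± std format seen elsewhere in these issues suggests it), the computation might look like the sketch below. The c-index function is a simple pairwise version in plain Python, the fold data is made up, and three folds are used only to keep the example short.

```python
from statistics import mean, stdev


def concordance_index(risk, times, events):
    """Harrell's c-index; events[i] is 1 for an observed death, 0 for censored."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event strictly before j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in this fold")
    return concordant / comparable


# Hypothetical validation results, one tuple (risk, times, events) per fold.
folds = [
    ([0.9, 0.2, 0.7, 0.4], [5, 40, 12, 9], [1, 0, 1, 1]),
    ([0.1, 0.8, 0.3], [50, 8, 35], [0, 1, 1]),
    ([0.5, 0.6, 0.2], [10, 20, 30], [1, 1, 0]),
]
per_fold = [concordance_index(*fold) for fold in folds]
summary = f"{mean(per_fold):.3f} ± {stdev(per_fold):.3f}"
```

Note that `stdev` is the sample standard deviation; whether the paper used the sample or the population form is not stated in this thread.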

c-index value on validation set is very high

In the BRCA dataset, the c-index of the training set is around 0.67, while the c-index of the validation set reaches 0.82+.
Is this normal?

main.py

Will the code for main.py be released? Also, the compressed files in datasets_csv cannot be decompressed. Looking forward to your reply!

"……csv.zip" files not found

First of all, thank you for your team's contribution to the whole project!
I want to ask whether the csv.zip file in the datasets_csv folder is a source data file or a file generated by the program, because I always get an error when running main.py indicating that the csv.zip file cannot be found. I'm using my own svs data, but I don't understand what the csv.zip file really is.

Graph Construction

Hi @Richarizardd,

Thanks for sharing the code. I have a question regarding graph construction in latent space:
I was wondering why you pass coords to the query function for latent graph generation, as in:

model.fit(features)
a = np.repeat(range(num_patches), radius-1)    
b = np.fromiter(chain(*[model.query(coords[v_idx], topn=radius)[1:] for v_idx in range(num_patches)]),dtype=int)
edge_latent = torch.Tensor(np.stack([a,b])).type(torch.LongTensor)

I wonder whether this is how the DGC model requires the graph to be constructed?

Thanks
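
For comparison, here is a brute-force sketch in which the neighbour search always queries the same space it was built on: coordinates for the spatial graph, embeddings for the latent graph. It assumes the intent of the quoted snippet is a k-nearest-neighbour graph in feature space, which the thread does not confirm; the function name and the toy data are hypothetical, and an exhaustive search stands in for the approximate index used in the notebook.

```python
def knn_edge_index(points, k):
    """Return [sources, targets]: each node linked to its k nearest other nodes."""
    sources, targets = [], []
    for i, p in enumerate(points):
        # Squared Euclidean distance to every other node, nearest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        for _, j in dists[:k]:
            sources.append(i)
            targets.append(j)
    return [sources, targets]


coords = [[0, 0], [0, 1], [5, 5], [5, 6]]                    # hypothetical patch positions
features = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]  # hypothetical embeddings

edge_spatial = knn_edge_index(coords, k=1)   # neighbours in slide space
edge_latent = knn_edge_index(features, k=1)  # neighbours in feature space
```

In this toy example the two graphs differ: patches that sit next to each other on the slide are not the ones with the most similar embeddings.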

About TCGA data

With regard to the TCGA data, I would like to know whether the .svs files used in the experiments are Tissue Slides or Diagnostic Slides, because I found that TCGA has both kinds of slide images.
Also, I can't find GBMLGG. Is GBMLGG the same as GBM?

unzip error

Hello! I want to repeat the experiment, but after downloading the repo I tried to unzip it with the unzip command on Linux and some errors occurred.

BRCA

Thanks for your amazing work. I have a question: in /datasets/dataset_survival.py, lines 49-50 read:

if "IDC" in slide_data['oncotree_code']: # must be BRCA (and if so, use only IDCs)
    slide_data = slide_data[slide_data['oncotree_code'] == 'IDC']

That means that for BRCA you did not use all the data (fewer than 1022 cases), but your paper describes the BRCA data as Breast Invasive Carcinoma (BRCA) (n = 1022). Is this a mistake? I would also like to know why you only use the BRCA data where slide_data['oncotree_code'] == 'IDC'. Looking forward to your reply!

What's the difference between the two folders dataset_csv and datasets_csv?

"Unable to open object (object 'features' doesn't exist)"

Hi Mahmood Lab team,
Thanks for creating this interesting repo.
I generated h5 files with the "Basic, Fully Automated Run" command from the CLAM repo. Now, when I try to generate the graphs using the notebook in Patch-GCN (https://github.com/mahmoodlab/Patch-GCN/blob/master/WSI-Graph%20Construction.ipynb), I get the log below.
Could you please suggest how to solve this issue?
Thanks

KeyError Traceback (most recent call last)
Input In [47], in <cell line: 1>()
----> 1 createDir_h5toPyG(h5_path, save_path)

Input In [44], in createDir_h5toPyG(h5_path, save_path)
6 try:
7 wsi_h5 = h5py.File(os.path.join(h5_path, h5_fname), "r")
----> 8 G = pt2graph(wsi_h5)
9 torch.save(G, os.path.join(save_path, h5_fname[:-3]+'.pt'))
10 wsi_h5.close()

Input In [43], in pt2graph(wsi_h5, radius)
74 from torch_geometric.data import Data as geomData
75 from itertools import chain
---> 76 coords, features = np.array(wsi_h5['coords']), np.array(wsi_h5['features'])
77 assert coords.shape[0] == features.shape[0]
78 num_patches = coords.shape[0]

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File ~/.local/lib/python3.8/site-packages/h5py/_hl/group.py:264, in Group.__getitem__(self, name)
262 raise ValueError("Invalid HDF5 object reference")
263 else:
--> 264 oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
266 otype = h5i.get_type(oid)
267 if otype == h5i.GROUP:

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File h5py/h5o.pyx:190, in h5py.h5o.open()

KeyError: "Unable to open object (object 'features' doesn't exist)"

MI-FCN corresponds to DeepAttnMISL in the paper?

RuntimeError: CUDA out of memory when not skipping large bags

I encountered a memory issue when I did not skip bags of large size. The error message I received was:
RuntimeError: CUDA out of memory. Tried to allocate 2.92 GiB (GPU 0; 23.65 GiB total capacity; 19.63 GiB already allocated; 188.56 MiB free; 21.93 GiB reserved in total by PyTorch)
I am running on a 4090 graphics card; the system has six 4090 GPUs and four 3090 GPUs in total.
Are there any solutions or suggestions to address this memory issue?

main.py

When will you publish the code of main.py?
thanks.

survival information

Hello, may I ask how to get the survival information for TCGA?

AttributeError: 'Tensor' object has no attribute 'keys'

In the following code, I hit this error: AttributeError: 'Tensor' object has no attribute 'keys'. I found that data_list[0] is a list, not a dict, so the code raises the error. Could you please tell me how to fix it?

def from_data_list(cls, data_list, follow_batch=[], exclude_keys=[], update_cat_dims={}):
            r"""Constructs a batch object from a python list holding
            :class:`torch_geometric.data.Data` objects.
            The assignment vector :obj:`batch` is created on the fly.
            Additionally, creates assignment batch vectors for each key in
            :obj:`follow_batch`.
            Will exclude any keys given in :obj:`exclude_keys`."""
            import pdb
            pdb.set_trace()
            #AttributeError: 'Tensor' object has no attribute 'keys'
            #cls:datasets.BatchWSI.BatchWSI
            keys = list(set(data_list[0].keys) - set(exclude_keys))
            assert 'batch' not in keys and 'ptr' not in keys

type(wsi_bag) is Tensor and type(path_features) is list, so type(data_list[0]) is list. How should I understand the line 'keys = list(set(data_list[0].keys) - set(exclude_keys))'?
The content of path_features is:

I found that wsi_bag = torch.load(wsi_path) in dataset_survival.py returns a Tensor instead of a dict, so path_features contains only a Tensor instead of a dict.
Could you please tell me how to get the right data format?

Files missing

Hello, thanks for your wonderful work and source code. I noticed that some files may be missing, for example the module imported here:

from utils.file_utils import save_pkl, load_pkl

Could you upload them?

Preprocessing steps

Hi :),
Nice paper.
I want to reproduce the results. I noticed that you describe the processing of the WSIs in the paper; could you share the script for the preprocessing step?

Patch-GCN main.py do not work

First, I would like to thank Mahmood Lab @mahmoodlab and their scientist @Richarizardd for such outstanding work!
I downloaded the whole slide images (WSIs) of TCGA-BLCA from GDC and preprocessed the svs files with your CLAM pipeline, following the instructions on the CLAM GitHub main page:

(1) created patches using the Otsu method, which generated masks, stitches, and patches (code: CUDA_VISIBLE_DEVICES=0,1 python create_patches_fp.py --source /mnt1/TCGA/data --save_dir /mnt1/TCGA/ostu_result --patch_size 256 --preset tcga_ostu.csv --seg --patch --stitch);
(2) extracted features based on the h5 files generated above, which created pt files for subsequent analysis. (code: CUDA_VISIBLE_DEVICES=0,1 python extract_features_fp.py --data_h5_dir /mnt1/TCGA/ostu_result --data_slide_dir /mnt1/TCGA/data --csv_path /mnt1/TCGA/ostu_result/process_list_autogen.csv --feat_dir /mnt1/TCGA/ostu_result --batch_size 512 --slide_ext .svs)

(3) created graphs following the instructions at https://github.com/mahmoodlab/Patch-GCN/blob/master/WSI-Graph%20Construction.ipynb, which produced graph-based pt files.

Then, I tried to analyze the TCGA-BLCA WSIs using Patch-GCN. To do this, I set the directory ostu_result as the DATA_ROOT_DIR and put two sub-directories in it: (1) splits/5foldcv/tcga_blca/split_0.csv...split_4.csv; (2) tcga_blca/.pt (pt files of the graphs generated above).
The directories and data are organized as follows:
ostu_result/
|--tcga_blca
||--slide_1.pt
||--slide_2.pt
||...
|--splits
||--5foldcv
|||--tcga_blca
||||--splits_0.csv
||||--splits_1.csv
||||--splits_2.csv
||||--splits_3.csv
||||--splits_4.csv

Then I ran Patch-GCN using the following command: CUDA_VISIBLE_DEVICES=0,1 python main.py --data_root_dir "/mnt1/TCGA/ostu_result" --which_splits 5foldcv --split_dir tcga_blca --mode graph --model_type patchgcn
However, it failed with the following traceback:
Traceback (most recent call last):
File "main.py", line 221, in <module>
results = main(args)
File "main.py", line 54, in main
train_dataset, val_dataset = dataset.return_splits(from_id=False,
File "/mnt1/Patch-GCN/datasets/dataset_survival.py", line 193, in return_splits
train_split = self.get_split_from_df(all_splits=all_splits, split_key='train')
File "/mnt1/Patch-GCN/datasets/dataset_survival.py", line 180, in get_split_from_df
split = Generic_Split(df_slice, metadata=self.metadata, mode=self.mode, data_dir=self.data_dir, label_col=self.label_col, patient_dict=self.patient_dict, num_classes=self.num_classes)
File "/mnt1/Patch-GCN/datasets/dataset_survival.py", line 287, in __init__
with open(os.path.join(data_dir, 'fast_cluster_ids.pkl'), 'rb') as handle:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt1/TCGA/ostu_result/tcga_blca_20x_features/fast_cluster_ids.pkl'

I am just a beginner in Python, and this error really confused me. I preprocessed the WSI data and organized the directories and pt files according to the instructions, but I did not create, and do not know how to create, the tcga_blca_20x_features/fast_cluster_ids.pkl file or its directory. I guess there must be something wrong with the data directory or the dataset preparation for main.py, but I really do not know how to address it.

Would you please help me resolve this error? Are there any detailed pipelines, examples, or tutorials to guide me in preparing the data and the associated directories for Patch-GCN's main.py?

About learning rate

Where do you change the learning rate? I don't see any scheduler that decreases the learning rate.
