havakv / pycox

Survival analysis with PyTorch

License: BSD 2-Clause "Simplified" License

Python 100.00%

deep-learning machine-learning neural-networks python pytorch survival-analysis

pycox's Issues

Predict log partial hazard

Hi,

May I ask if there is a way to predict log partial hazard from DeepSurv or other models?

Thanks,

Is 'entity embedding' implemented here?

Hi again,

Thanks to your kind support, I could successfully load data and run some models.

I have one more question about data processing.

In the paper, categorical covariates are embedded by using entity embeddings.

For the neural networks, we standardize the numerical covariates and encode the categorical covariates by entity embeddings (Guo and Berkhahn, 2016) half the size of the number of categories.

Could you please let me know how I can use 'entity embeddings' in this package (or torchtuples)?
(If there is example code or documentation, it would be very helpful.)
I found entity embedding code here and here, but I have no idea how to combine the entity embedding with an MLP (or the CoxPH model) for a survival analysis dataframe.

Thanks.

Combining LogisticHazard models

Hi,
I built three LogisticHazard models and was trying to make an ensemble of them along the lines of:
https://discuss.pytorch.org/t/combine-two-model-on-pytorch/47858/5

However, this failed, probably because the ensemble did not include the parts specific to LogisticHazard. Is there a way to do that, or can you give me some guidance on how to ensemble LogisticHazard models?

Thank you!
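
One way to ensemble without merging the networks is to average the predicted survival curves of the individual models. The following is a minimal sketch under the assumption that each model yields a survival array of shape (n_times, n_samples) on the same time grid; `average_survival` and the arrays are made up for illustration and are not part of pycox.

```python
import numpy as np

def average_survival(curves):
    """Average survival curves from several models.

    Each array has shape (n_times, n_samples) and shares one time grid.
    """
    stacked = np.stack(curves, axis=0)  # (n_models, n_times, n_samples)
    return stacked.mean(axis=0)

# Made-up predictions from three hypothetical models.
surv_a = np.array([[0.9, 0.8], [0.7, 0.5]])
surv_b = np.array([[0.8, 0.9], [0.6, 0.7]])
surv_c = np.array([[1.0, 0.7], [0.8, 0.6]])
ensemble = average_survival([surv_a, surv_b, surv_c])
```

Averaging non-increasing curves yields a non-increasing curve, so the result is still a valid survival estimate per sample.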

How do I extend the prediction window for Cox-Time model?

I would like to predict the survival probabilities further into the future than the model predicts.
I tried to set max_duration to 1200 when predicting the probabilities as such:
surv = model.predict_surv_df(x_test, max_duration=1200)

but printing the last 8 rows of surv shows that 1) it does not predict further than period 1012, and 2) the periods do not increase by the same amount in each row (993, 998, 999, 1005):

print(surv.iloc[920:,:1])
989 0.469803
991 0.469803
992 0.469803
993 0.469803
998 0.469803
999 0.469803
1005 0.469803
1012 0.469803

How do I extend the prediction window and obtain a survival probability for every single period?
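
The irregular index in the output above is consistent with a step function that only changes at observed times. A value for every period can then be read off by carrying the last known value forward. The sketch below assumes a right-continuous step function; `step_survival` is a hypothetical helper and the numbers are invented. Values past the last observed time are simply held constant, which is an assumption, not a model prediction.

```python
import numpy as np

def step_survival(times, surv, grid):
    """Evaluate a right-continuous step function given at `times` on `grid`.

    Before the first time the value is 1.0; after the last time the
    last value is carried forward (an extrapolation assumption).
    """
    times = np.asarray(times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    # Index of the last time point that is <= each grid value (-1 if none).
    idx = np.searchsorted(times, grid, side="right") - 1
    padded = np.concatenate(([1.0], surv))
    return padded[idx + 1]

times = [2, 5, 6, 10]          # irregular time points
surv = [0.9, 0.8, 0.6, 0.5]    # survival estimates at those points
grid = np.arange(0, 13)        # every single period, including 11 and 12
dense = step_survival(times, surv, grid)
```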

pad_col in DeepHit pmf

Hi Havard,

I do have a question regarding the pmf computation in the DeepHit Models: As I noticed, you are zero-padding the logits at the end before taking the softmax and I was wondering what the reason behind that was? Why is it not enough to take the softmax on the unpadded logits?

Many thanks for your help & time!
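
One plausible reading of the padding, stated here as an assumption rather than a confirmed answer: the extra zero logit acts as a reference class for "no event within the time grid", so the PMF over the grid can sum to less than one. The sketch uses plain NumPy with made-up logits.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[0.5, -1.0, 2.0]])

# Without padding, all probability mass is forced onto the grid.
pmf_unpadded = softmax(logits)

# With a zero column appended, the last class absorbs the mass for
# "event after the last grid point"; it is dropped from the PMF.
padded = np.concatenate([logits, np.zeros((1, 1))], axis=1)
pmf_padded = softmax(padded)[:, :-1]
surv_past_grid = 1.0 - pmf_padded.sum(axis=1)
```

Under this reading, the unpadded version would imply that every individual experiences the event by the last grid point.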

how to calculate estimated risk

Hi, I was asked to calculate the following metric:
https://scikit-survival.readthedocs.io/en/latest/generated/sksurv.metrics.concordance_index_censored.html
which requires 3 input arrays:

event_indicator - Boolean array denotes whether an event occurred
event_time – Array containing the time of an event or time of censoring
estimate – Estimated risk of experiencing an event

How would I calculate the estimated risk from, e.g., a DeepHitSingle model, or alternatively from any of the Cox models?

Edit: for CoxPH with one output feature, using model.predict() on the test data as the estimate gives identical values for ev.concordance_td('antolini') and sksurv.metrics.concordance_index_censored().
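
For models that output survival curves rather than a single score, one common choice of risk estimate is 1 - S(t) at a fixed time t. A minimal sketch, assuming a survival array of shape (n_times, n_samples); `risk_at_time` and the numbers are illustrative only.

```python
import numpy as np

def risk_at_time(times, surv, t):
    """Return 1 - S(t) for each sample, where surv has shape (n_times, n_samples)."""
    times = np.asarray(times)
    # Last time point that is <= t; before the first point S(t) is taken as 1.
    idx = np.searchsorted(times, t, side="right") - 1
    if idx < 0:
        return np.zeros(surv.shape[1])
    return 1.0 - surv[idx]

times = np.array([1.0, 2.0, 3.0])
surv = np.array([[0.9, 0.7],
                 [0.8, 0.4],
                 [0.6, 0.1]])
risk = risk_at_time(times, surv, 2.5)  # higher value = higher estimated risk
```

The resulting vector has one entry per sample and can be passed as the `estimate` argument described above. The ordering can depend on the chosen t when curves cross.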

Is there really an added value to couple the pycox to torchtuples?

While torchtuples simplifies the interface considerably, the function interfaces appear to depend heavily on the implementation and semantics of torchtuples. In many cases, especially for research, it may be preferable to expose the detailed model implementation and the training/testing procedure. Encapsulating both inside torchtuples can significantly reduce overall readability and flexibility; for example, a fair amount of time may be needed to write bridging code.

But if I understand the code correctly, base.Model can also take pre-trained nn.Modules and perform predictions directly, which means any external model can be brought into the pycox pipeline, which is quite neat. It might be possible to decouple pycox from torchtuples to some extent, i.e., under the assumption that the model is always an nn.Module and the input/target are tensors, while a separate wrapper bridges pycox and torchtuples. That would retain the flexibility to use the pipeline with or without torchtuples.

Still, really appreciate the effort. I gotta say it is amazing that eventually there is an awesome pytorch version of DeepSurv here:)

Evaluation metric

Hi,
How do we find the variance or confidence intervals of the evaluation metrics obtained?
E.g., a confidence interval for C^td, IBS, etc.
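
A generic way to attach an interval to any test-set metric is the percentile bootstrap: resample the test set with replacement and recompute the metric. The sketch below uses a plain pairwise concordance as a stand-in for the metric (it is not pycox's time-dependent concordance), and all data are simulated.

```python
import numpy as np

def concordance(durations, events, risk):
    """Plain pairwise concordance: share of comparable pairs ordered correctly."""
    num, den = 0.0, 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            continue  # only pairs anchored at an observed event are comparable
        for j in range(n):
            if durations[i] < durations[j]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den if den else float("nan")

def bootstrap_ci(durations, events, risk, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the concordance."""
    rng = np.random.default_rng(seed)
    n = len(durations)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        value = concordance(durations[idx], events[idx], risk[idx])
        if not np.isnan(value):
            stats.append(value)
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Simulated test set: risk is negatively related to duration.
rng = np.random.default_rng(1)
durations = rng.exponential(10.0, size=40)
events = rng.random(40) < 0.7
risk = -durations + rng.normal(0.0, 3.0, size=40)

point = concordance(durations, events, risk)
low, high = bootstrap_ci(durations, events, risk)
```

The same resampling loop works for other metrics by swapping the metric function. It only captures test-set sampling variability, not variability from retraining the model.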

grid_search issue

Hi,
when I try to use the grid search method, I run into the problem that the training process does not restart/re-initialize. When I try another set of hyper-parameters, the training loss continues from the last run.

For example,
when I run

for lr in [0.01,0.1,0.001]:
    net = tt.practical.MLPVanilla(in_features, num_nodes, out_features, batch_norm,
                              dropout, output_bias=output_bias)
    model = CoxPH(net, tt.optim.Adam)
    model.optimizer.set_lr(0.01)
    log = model.fit(x_train, y_train, batch_size, epochs, callbacks, verbose=True,
                val_data=val, val_batch_size=batch_size)

the result is

0:	[0s / 0s],		train_loss: 4.8275,	val_loss: 3.8472
1:	[0s / 0s],		train_loss: 4.7032,	val_loss: 3.8083
2:	[0s / 0s],		train_loss: 4.6465,	val_loss: 3.8090
3:	[0s / 0s],		train_loss: 4.6266,	val_loss: 3.8291
4:	[0s / 0s],		train_loss: 4.6113,	val_loss: 3.8204
5:	[0s / 0s],		train_loss: 4.6186,	val_loss: 3.8120
6:	[0s / 0s],		train_loss: 4.5811,	val_loss: 3.8143
7:	[0s / 0s],		train_loss: 4.6007,	val_loss: 3.8200
8:	[0s / 0s],		train_loss: 4.5901,	val_loss: 3.8217
9:	[0s / 0s],		train_loss: 4.5858,	val_loss: 3.8167
10:	[0s / 0s],		train_loss: 4.5680,	val_loss: 3.8179
11:	[0s / 0s],		train_loss: 4.5737,	val_loss: 3.8225

0:	[0s / 0s],		train_loss: 4.8113,	val_loss: 3.8552
0:	[0s / 0s],		train_loss: 4.7864,	val_loss: 3.8699

The net can't be re-initialized for the next training run. I tried del model, del log, and re-initializing the net, but none of these worked.
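
Two things in the loop above may be worth checking. First, the loop calls set_lr(0.01) on every iteration, so the loop variable lr is never used. Second, the log shows the later runs stopping after a single epoch, which is what a stateful early-stopping callback would do if the same `callbacks` list were reused across runs. That reuse is an assumption, since the thread does not show where `callbacks` is created. The toy below illustrates the effect with a hypothetical `EarlyStopper` class; it is not the torchtuples implementation.

```python
class EarlyStopper:
    """Hypothetical stand-in for an early-stopping callback with internal state."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def run(val_losses, stopper):
    """Return the number of epochs completed before stopping."""
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.should_stop(loss):
            return epoch
    return len(val_losses)

losses = [3.85, 3.81, 3.82, 3.83, 3.80]

shared = EarlyStopper(patience=2)
first = run(losses, shared)    # stops once patience is exhausted
second = run(losses, shared)   # stale state: stops right away

fresh = run(losses, EarlyStopper(patience=2))  # new callback per run
```

If this is the cause, creating the callbacks inside the loop, once per hyper-parameter setting, would give each run a clean state.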

predict survival function

Hi
Thanks for developing this package. It is really useful.
In the "DeepHitSingle" class, what are the differences between "predict", "predict_surv", and "predict_surv_df"?

No loss return during model training

Hi, I encountered a problem when using the LogisticHazard and DeepHitSingle models to fit a customer churn model.

The dataset after label transformation is as below. There are around 1400 features and the duration ranges from 0 to 60. The dataset is imbalanced: the ratio of censored observations (event = 0) to the remaining observations is 5 in the training set and 20 in the eval set. I discretize the durations by setting num_duration to 3.

But when I try to fit the model, it only shows the training time per epoch; no training or eval loss is shown. May I know how to solve the problem? Many thanks.
log = model.fit(x_train, y_train, batch_size, epochs, callbacks, val_data=val_sample)

ValueError: Network output `phi` is too small for `idx_durations`

Ubuntu 16.04.6 LTS
Python 3.7.7
torch 1.2.0

Hi, when I followed example_4 in "Get Started", something went wrong with this:

callbacks = [tt.cb.EarlyStopping(patience=5)]
epochs = 50
verbose = True
log = model.fit_dataloader(dl_train, epochs, callbacks, verbose, val_dataloader=dl_test)

Here is my network, with input shape (32, 1, 246, 246):

class Net(nn.Module):
    def __init__(self,n_intervals):
        super().__init__()
        self.backbone = models.resnet50(pretrained=False)
        self.backbone.conv1 = nn.Conv2d(1,64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
        self.fc1 = nn.Linear(1000, 128)
        self.fc2 = nn.Linear(128,n_intervals)
        self.bn1 = nn.BatchNorm1d(1000)
        self.bn2 = nn.BatchNorm1d(128)
        
    def forward(self, x):
        x = self.backbone(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.fc1(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x

The error is as follows:

ValueError                                Traceback (most recent call last)
<ipython-input-18-7ae9e531dd45> in <module>
      2 epochs = 50
      3 verbose = True
----> 4 log = model.fit_dataloader(dl_train, epochs, callbacks, verbose, val_dataloader=dl_test)

~/anaconda3/envs/pytorch37/lib/python3.7/site-packages/torchtuples/base.py in fit_dataloader(self, dataloader, epochs, callbacks, verbose, metrics, val_dataloader)
    227                 if stop: break
    228                 self.optimizer.zero_grad()
--> 229                 self.batch_metrics = self.compute_metrics(data, self.metrics)
    230                 self.batch_loss = self.batch_metrics['loss']
    231                 self.batch_loss.backward()

~/anaconda3/envs/pytorch37/lib/python3.7/site-packages/torchtuples/base.py in compute_metrics(self, data, metrics)
    178         out = self.net(*input)
    179         out = tuplefy(out)
--> 180         return {name: metric(*out, *target) for name, metric in metrics.items()}
    181 
    182     def _setup_metrics(self, metrics=None):

~/anaconda3/envs/pytorch37/lib/python3.7/site-packages/torchtuples/base.py in <dictcomp>(.0)
    178         out = self.net(*input)
    179         out = tuplefy(out)
--> 180         return {name: metric(*out, *target) for name, metric in metrics.items()}
    181 
    182     def _setup_metrics(self, metrics=None):

~/anaconda3/envs/pytorch37/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/anaconda3/envs/pytorch37/lib/python3.7/site-packages/pycox/models/loss.py in forward(self, phi, idx_durations, events)
    466     """
    467     def forward(self, phi: Tensor, idx_durations: Tensor, events: Tensor) -> Tensor:
--> 468         return nll_logistic_hazard(phi, idx_durations, events, self.reduction)
    469 
    470 

~/anaconda3/envs/pytorch37/lib/python3.7/site-packages/pycox/models/loss.py in nll_logistic_hazard(phi, idx_durations, events, reduction)
     41         raise ValueError(f"Network output `phi` is too small for `idx_durations`."+
     42                          f" Need at least `phi.shape[1] = {idx_durations.max().item()+1}`,"+
---> 43                          f" but got `phi.shape[1] = {phi.shape[1]}`")
     44     if events.dtype is torch.bool:
     45         events = events.float()

ValueError: Network output `phi` is too small for `idx_durations`. Need at least `phi.shape[1] = 2717.0`, but got `phi.shape[1] = 4
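
The message suggests that the targets contain values up to 2716 while the network has 4 outputs, which would happen if raw durations were passed instead of discretized indices. This is an inference from the error text only. Other issues on this page use a label_transform step for this purpose; the NumPy sketch below only illustrates the idea with a hypothetical `discretize` helper and made-up durations.

```python
import numpy as np

def discretize(durations, num_durations):
    """Map continuous durations to indices 0..num_durations-1 on an equidistant grid."""
    cuts = np.linspace(0.0, durations.max(), num_durations)
    idx = np.searchsorted(cuts, durations, side="left")
    return idx.clip(0, num_durations - 1), cuts

durations = np.array([12.0, 340.0, 2716.0, 980.0])
n_intervals = 4  # must match the number of network outputs

idx_durations, cuts = discretize(durations, n_intervals)
# The loss needs phi.shape[1] >= idx_durations.max() + 1.
```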

PicklingError while trying to save the model

Hi all,

I used CoxPH, and at the end, when trying to save the trained model with save_net like this: model.save_net('data_root/model'), I get this error:

PicklingError: Can't pickle <class 'torch._C._VariableFunctions'>: it's not the same object as torch._C._VariableFunctions

Any help with that, please?

Discretization error

Hi, the following code

num_durations = 10
labtrans = MTLR.label_transform(num_durations)
get_target = lambda df: (df['time'].values, df['event'].values)
y_train = labtrans.fit_transform(*get_target(data_train))
y_val = labtrans.transform(*get_target(data_val))
y_test = labtrans.transform(*get_target(data_test))

gives me the error:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Curiously, the same code worked smoothly before, but it no longer works now that I have changed the dataframe features (the columns time and event have remained the same, without null values, so the problem should not arise).
A hint about what might cause this error would be ideal, because I cannot work out where it comes from.

Thanks in advance and congratulations on your excellent work.

EDIT: the column event had a negative value, which gave rise to the error.

DL-based survival analysis for high dimensional data

Hi havakv,
I am a radiologist interested in your work on DL-based survival analysis. Recently, I used pycox to process medical survival data. I have some questions for you:

When I used N-MTLR on my data, I found that it does not work well. My data has 2800+ input variables but only 243 samples. So I first used an AE to reduce the variables to 70 and then used N-MTLR for training; in this case it performs well. So my question is: when the dimension of the data is much higher than the number of samples, does N-MTLR still work well? Do you have any experience with this? Or can the N-MTLR network be replaced by an AE network that does both dimension reduction and survival analysis?
Since N-MTLR is a non-PH model, how can I find the individual weight of each input variable in the fitted function? Can N-MTLR output something similar to a hazard ratio, to show which variables are important predictors of survival?
I am a doctor, so I hope you can help me with this: how do I save the predicted probabilities (the point-by-point probabilities for each sample) to a CSV file?

I hope to receive your response.

If possible, you can respond to: [email protected]

Is there a way to save a trained model?

Hi,

I was able to train a DeepSurv model following the tutorial, but I was wondering if I can save the trained model. I tried to pickle it, but it generates the following error:

AttributeError: Can't pickle local object '_ActionOnBestMetric.on_fit_start.<locals>.<lambda>'

I also attempted to use state_dict, but CoxPH doesn't have such an attribute.

In the end, I would like to create a checkpoint so that I can save and load models. Could you please let me know if there's a method? Thank you.

Predicting ranks

First, thank you for this very useful package and your open-source work. Second, is there a way to predict ranks directly? I can use model.predict_surv_df(dataframe) to predict a dataframe of survival probabilities for each chosen time point, but can this be converted to a rank for the observations? Say, rank 1 for the observation with the highest risk, and so on, similar to the sksurv package? One example is the predict function of sksurv's gradient boosting model, see here.
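
Given any per-observation risk score, ranks follow from a sort. The sketch below assumes the score is 1 - S(t) at one chosen time point; `risk_ranks` is a hypothetical helper and the values are invented.

```python
import numpy as np

def risk_ranks(risk):
    """Rank 1 = highest risk. Ties are broken by position (stable sort)."""
    order = np.argsort(-np.asarray(risk), kind="stable")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks

# Survival probabilities of four observations at one chosen time point.
surv_at_t = np.array([0.9, 0.2, 0.6, 0.4])
ranks = risk_ranks(1.0 - surv_at_t)
```

When survival curves cross, the ranking can change with the chosen time point, so the time point is part of the definition of the rank.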

cox_ph_loss_sorted

Hi,

Many thanks for implementing this easy-to-use and flexible package. I have a short question regarding the implementation of the partial likelihood function of DeepSurv (cox_ph_loss_sorted). In the paper there is a cumulative sum over the risk sets; in your implementation, however, this is approximated by taking the sum over all ranked samples.

Could you comment on why the approximation in cox_ph_loss_sorted is legitimate? What is the reasoning behind it?

This would help a lot! Thanks!
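
A possible way to see the connection, offered as a sketch and not as a description of the library's code: once samples are sorted by descending duration, the running sum of exp(g) up to position i covers exactly the samples still at risk at that duration, provided there are no ties. The NumPy functions below are illustrative re-implementations with made-up data.

```python
import numpy as np

def cox_loss_risk_sets(log_h, durations, events):
    """Negative partial log-likelihood with explicit risk sets."""
    total = 0.0
    for i in range(len(log_h)):
        if events[i]:
            at_risk = durations >= durations[i]
            total += log_h[i] - np.log(np.exp(log_h[at_risk]).sum())
    return -total / events.sum()

def cox_loss_sorted(log_h, events):
    """Same loss via a cumulative sum; assumes descending-duration order."""
    log_cumsum = np.log(np.cumsum(np.exp(log_h)))
    return -((log_h - log_cumsum) * events).sum() / events.sum()

durations = np.array([5.0, 2.0, 9.0, 7.0])  # no tied durations
events = np.array([1, 0, 1, 1])
log_h = np.array([0.3, -0.2, 1.1, 0.5])

order = np.argsort(-durations)  # longest duration first
explicit = cox_loss_risk_sets(log_h, durations, events)
fast = cox_loss_sorted(log_h[order], events[order])
```

With tied durations the cumulative sum can miss tied samples that sort later, and in mini-batch training the risk set is restricted to the batch. These are the places where the two versions can differ.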

Make example notebook on how to use torch optimizers/schedulers, early stopping callbacks etc

Make an example notebook on how to use callbacks for added monitoring/functionality.

Examples could include

Learning rate schedulers such as in #49
Monitor metrics such as Concordance #49
Early stopping based on Concordance #49

About dataset

Hi,
When I ran cox_models_1_introduction from the examples folder, I found that there are no metabric and support datasets in the datasets module. How can I deal with this problem?
Looking forward to your reply!

MTLR - transform predicted survival probability to hazard rate

Hi, I would like to ask about the transformation from predicted survival probabilities to hazard rates using the MTLR model.

In a discrete-time model such as MTLR, could we simply apply the formula h(t) = (S(t-1) - S(t)) / S(t-1) to transform the predicted survival probability into the hazard rate for a specific time interval?

In the implementation below, from class LogisticHazard(models.base.SurvBase) in pycox.models, the transformation from hazard to survival function looks a bit different from the formula above. Which one should I adopt in discrete-time models when the predicted survival probability is already discretized by time interval?

def predict_surv(self, input, batch_size=8224, numpy=None, eval_=True, to_cpu=False,
                     num_workers=0, epsilon=1e-7):
        hazard = self.predict_hazard(input, batch_size, False, eval_, to_cpu, num_workers)
        surv = (1 - hazard).add(epsilon).log().cumsum(1).exp()
        return tt.utils.array_or_tensor(surv, numpy, input)

Many thanks!
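
The two expressions can be checked against each other numerically. The log-cumsum-exp line in the quoted code computes the product of (1 - h) over time, and h(t) = (S(t-1) - S(t)) / S(t-1) is the inverse of that product when S is taken to be 1 before the first interval. A NumPy sketch with made-up hazards (the function names are illustrative):

```python
import numpy as np

def surv_from_hazard(hazard, epsilon=1e-7):
    """S(t_j) = prod_{k<=j} (1 - h(t_k)), computed in log space as in the quoted code."""
    return np.exp(np.cumsum(np.log(1.0 - hazard + epsilon)))

def hazard_from_surv(surv):
    """h(t_j) = (S(t_{j-1}) - S(t_j)) / S(t_{j-1}), with S = 1 before the first interval."""
    prev = np.concatenate(([1.0], surv[:-1]))
    return (prev - surv) / prev

hazard = np.array([0.1, 0.2, 0.05, 0.3])
surv = surv_from_hazard(hazard, epsilon=0.0)
recovered = hazard_from_surv(surv)
```

With the default epsilon the round trip is only approximate, since epsilon is added for numerical stability of the logarithm.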

Integration of Deep Survival Machines

Hi, I am the first author of the paper 'Deep Survival Machines: Fully Parametric Survival Regression and Representation Learning for Censored Data with Competing Risks',
and I think it might be a good addition to pycox. My current code for DSM is in torch and is available here: https://github.com/chiragnagpal/DeepSurvivalMachines

I'd look forward to beginning discussions regarding integration in case that's of interest...

Cheers !

Survival time predictions (with CoxPH) using medical images

Hello,

I'd like to build a model that takes images and predicts overall survival time as a continuous value. For that reason, I followed the model shown in the Jupyter notebook 04_mnist_dataloaders_cnn.ipynb, using CoxPH instead of LogisticHazard. However, I got two different errors. I am using databatchloader, by the way.

When I tried with CoxPH and fit the model with:

callbacks = [tt.cb.EarlyStopping()]
epochs = 100
verbose = True
log = model.fit_dataloader(dl_train, epochs, callbacks, verbose, val_dataloader=dl_val)

Running this code (the net is the same as in the sample notebook mentioned above):

model = CoxPH(net, tt.optim.Adam(0.01))
surv = model.predict_surv_df(dl_test_x)

gave me this error:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 surv = model.predict_surv_df(dl_test_x)

/usr/local/anaconda/lib/python3.6/site-packages/pycox/models/cox.py in predict_surv_df(self, input, max_duration, batch_size, verbose, baseline_hazards_, eval_, num_workers)
153 """
154 return np.exp(-self.predict_cumulative_hazards(input, max_duration, batch_size, verbose, baseline_hazards_,
--> 155 eval_, num_workers))
156
157 def predict_surv(self, input, max_duration=None, batch_size=8224, numpy=None, verbose=False,

/usr/local/anaconda/lib/python3.6/site-packages/pycox/models/cox.py in predict_cumulative_hazards(self, input, max_duration, batch_size, verbose, baseline_hazards_, eval_, num_workers)
123 if baseline_hazards_ is None:
124 if not hasattr(self, 'baseline_hazards_'):
--> 125 raise ValueError('Need to compute baseline_hazards_. E.g run model.compute_baseline_hazards()')
126 baseline_hazards_ = self.baseline_hazards_
127 assert baseline_hazards_.index.is_monotonic_increasing,\

ValueError: Need to compute baseline_hazards_. E.g run model.compute_baseline_hazards()

Then, when I tried to run model.compute_baseline_hazards(), it gave me this error:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 _ = model.compute_baseline_hazards()

/usr/local/anaconda/lib/python3.6/site-packages/pycox/models/cox.py in compute_baseline_hazards(self, input, target, max_duration, sample, batch_size, set_hazards, eval_, num_workers)
82 if (input is None) and (target is None):
83 if not hasattr(self, 'training_data'):
---> 84 raise ValueError("Need to give a 'input' and 'target' to this function.")
85 input, target = self.training_data
86 df = self.target_to_df(target)#.sort_values(self.duration_col)

ValueError: Need to give a 'input' and 'target' to this function.

Since the training data has shape (torch.Size([16, 1, 128, 128]), (torch.Size([16]), torch.Size([16]))), I don't understand how to give the input and target to model.compute_baseline_hazards(). I fed the images as input and a tuple of time and event values as target, but it then threw another error saying there were too many values to unpack.

Could you please help me understand how I can solve this issue?

I really appreciate any help.
Regards,
Asli Y.

Example of how to use time-dependent covariates

Hi, thanks for the repo!

It looks from the article that CoxTime can do time-dependent covariates. Is this functionality implemented in pycox? If so, do you have any examples on how to format the inputs to the model for this usecase?

Thanks again,
Guillem

Assertion Error for Eval_Surv

Thanks for this awesome package. The version I used is 0.2.0. Here is the code:

_ = model.compute_baseline_hazards()
surv = model.predict_surv_df((x1, x2))
print(np.isnan(np.sum(surv.to_numpy()))) #Output False
print(pd.Series(surv.index.values).is_monotonic) #Output True
ev = EvalSurv(surv, durations, events, censor_surv='km')
return ev.concordance_td()

However, the following error showed up:

File "test.py", line 106, in CV
Cindex = Coxnnet_evaluate(model, x1_test, x2_test, y1_test, y2_test)
File "test.py", line 85, in Coxnnet_evaluate
ev = EvalSurv(surv, durations, events, censor_surv='km')
File "/home/group8/anaconda3/envs/BHI/lib/python3.8/site-packages/pycox/evaluation/eval_surv.py", line 33, in init
self.censor_surv = censor_surv
File "/home/group8/anaconda3/envs/BHI/lib/python3.8/site-packages/pycox/evaluation/eval_surv.py", line 51, in censor_surv
self.add_km_censor()
File "/home/group8/anaconda3/envs/BHI/lib/python3.8/site-packages/pycox/evaluation/eval_surv.py", line 107, in add_km_censor
return self.add_censor_est(surv, steps)
File "/home/group8/anaconda3/envs/BHI/lib/python3.8/site-packages/pycox/evaluation/eval_surv.py", line 95, in add_censor_est
censor_surv = self._constructor(censor_surv, self.durations, 1-self.events, None,
File "/home/group8/anaconda3/envs/BHI/lib/python3.8/site-packages/pycox/evaluation/eval_surv.py", line 36, in init
assert pd.Series(self.index_surv).is_monotonic
AssertionError

Would you mind suggesting some possible reasons? Thank you.

Error occurs with 'download_kkbox()' method from 'kkbox' dataset

Hi,

I've just tried to download the kkbox dataset with the download_kkbox() method,
but a FileNotFoundError occurs during the Extracting 'train'... step.

How can I handle this problem? Thank you

out_features size for MLPVanilla

Does MLPVanilla's out_features for DeepSurv have to stay as size 1? Or is it possible to increase it?

Error occurs with downloading kkbox dataset

Hi, havakv

I have the same issue as #42 (Error occurs with 'download_kkbox()').

My OS is Windows and I installed the package with pip.
I checked the path you suggested and the 3 files (for training) were there.
But I got the same problem (FileNotFoundError) during 'extracting train...'.

Do you have any idea about this?

Thank you

Hi,
What OS are you on (Windows, Mac, Linux)?
Have you installed the package using pip or by pulling this repo?
Have you set the PYCOX_DATA_DIR environment variable?

Can you check the directory <pycox_path>/datasets/data/kkbox and list the files there?
The <pycox_path> can be found by running

import pycox
pycox.__file__  # '/Users/teboozas/anaconda3/envs/some_env/lib/python3.8/site-packages/pycox/__init__.py'

and removing the __init__.py part. So I want you to list the contents of a folder such as
/Users/teboozas/anaconda3/envs/some_env/lib/python3.8/site-packages/pycox/datasets/data/kkbox (if you're on a mac).
You should be seeing some *.7z files.

Originally posted by @havakv in #42 (comment)

For reproducibility

Hi,

I think the seeds (numpy and torch) are fixed to make DeepSurv reproducible.

But if I try again after a few days, the results are not reproduced. Is this problem caused by 'He initialization'?

How can I make it reproducible?

Customized evaluation metric or loss function for imbalanced dataset

Hi, I have encountered another problem, which is specific to the dataset I am using. Since the dataset is very imbalanced, I undersample the negative class (censored observations with event = 0) in the training set but not in the validation set. But when I use a negative log-likelihood loss, e.g., NLLMTLRLoss, the validation loss is far lower than the training loss. Is there a way to add customized evaluation metrics in model.fit for model training, aside from the negative log-likelihood? Or should I just write another _Loss class that takes the class weights into account?

Many thanks!

Survival time predictions (with CoxCC) using medical images

Hello,

This issue is similar to my other post, but this time I wanted to build a model that takes images and predicts overall survival time as a continuous value using CoxCC (instead of LogisticHazard), following the model shown in the Jupyter notebook 04_mnist_dataloaders_cnn.ipynb. However, I got the error below. I am using databatchloader, by the way.

When I tried with CoxCC and fit the model with:

callbacks = [tt.cb.EarlyStopping()]
epochs = 100
verbose = True
model = CoxCC(net, tt.optim.Adam(0.01))
log = model.fit_dataloader(dl_train, epochs, callbacks, verbose, val_dataloader=dl_val)

gave me this error:

ValueError Traceback (most recent call last)
in ()
----> 1 log = model.fit_dataloader(dl_train, epochs, callbacks, verbose, val_dataloader=dl_val)

/usr/local/anaconda/lib/python3.6/site-packages/torchtuples/base.py in fit_dataloader(self, dataloader, epochs, callbacks, verbose, metrics, val_dataloader)
227 if stop: break
228 self.optimizer.zero_grad()
--> 229 self.batch_metrics = self.compute_metrics(data, self.metrics)
230 self.batch_loss = self.batch_metrics['loss']
231 self.batch_loss.backward()

/usr/local/anaconda/lib/python3.6/site-packages/pycox/models/cox_cc.py in compute_metrics(self, input, metrics)
50 raise RuntimeError("All elements in input does not have the same length.")
51 case, control = input # both are TupleTree
---> 52 input_all = tt.TupleTree((case,) + control).cat()
53 g_all = self.net(*input_all)
54 g_all = tt.tuplefy(g_all).split(batch_size).flatten()

/usr/local/anaconda/lib/python3.6/site-packages/torchtuples/tupletree.py in cat(self, dim)
420 @Docstring(cat)
421 def cat(self, dim=0):
--> 422 return cat(self, dim)
423
424 def reduce_nrec(self, func, initial=None):

/usr/local/anaconda/lib/python3.6/site-packages/torchtuples/tupletree.py in cat(seq, dim)
202 """
203 if not seq.shapes().apply(lambda x: x[1:]).all_equal():
--> 204 raise ValueError("Shapes of merged arrays need to be the same")
205
206 type_ = seq.type()

ValueError: Shapes of merged arrays need to be the same

Could you please help me understand how I can solve this issue?
I really appreciate any help.

Best regards,

Asli Y.

Getting risk scores

Thanks for making this library available. I'm currently trying to produce time-dependent AUCs for the available models. Is there a way to get the models to produce risk scores given features?

How can I get the KKBox split as described in the paper?

I really appreciate your commitment to this field.

I got kkbox dataset by using kkbox.read_df().

The kkbox dataset consists of 2,814,735 instances, so how can I get the same split as described in the paper?

In the paper, there are 1,786,333 train samples, 661,748 test samples, and 198,665 valid samples.
(2,646,746 instances in total)

Thanks.

Predict survival probabilities at given times

Hi, I'm trying to predict the survival probabilities of a population at 6, 12 and 24 months. lifelines.CoxPHFitter has that functionality: there I can specify a times parameter when predicting the survival function. Is this also possible with the DeepSurv model? If not, how can I achieve similar results?

available models

Hi!
Is there any chance that also the deephit model for competing risks will be implemented in pycox? That would be great!

metabric dataset doesn't load

When I run your introduction example, the metabric dataset doesn't load. Is it a problem with the package?

Question about baseline hazard in cox.py

Hi,

Thank you for sharing the great implementation.

I have a short question about the baseline hazard estimated in cox.py.
Is the _compute_baseline_hazards function in _CoxPHBase the Breslow estimator?

Thank you!
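
For reference, the textbook Breslow estimator of the baseline hazard at an event time t is the number of events at t divided by the sum of exp(g) over everyone still at risk at t. The sketch below is a plain NumPy illustration of that formula with made-up data; it makes no claim about how cox.py is written.

```python
import numpy as np

def breslow_baseline_hazard(durations, events, log_h):
    """Textbook Breslow estimator: d_t / sum_{j in R(t)} exp(g_j) at each event time."""
    event_times = np.unique(durations[events == 1])
    exp_g = np.exp(log_h)
    hazards = []
    for t in event_times:
        d_t = np.sum((durations == t) & (events == 1))  # events at time t
        at_risk = exp_g[durations >= t].sum()           # risk set R(t)
        hazards.append(d_t / at_risk)
    return event_times, np.array(hazards)

durations = np.array([2.0, 3.0, 3.0, 5.0])
events = np.array([1, 1, 0, 1])
log_h = np.zeros(4)  # all-zero predictors reduce this to Nelson-Aalen increments

times, h0 = breslow_baseline_hazard(durations, events, log_h)
cumulative = np.cumsum(h0)  # cumulative baseline hazard H0(t)
```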

Dynamic-Deephit

Hi,

Just wanted to ask if there are any plans to include the extension of DeepHit to panel data,
the Dynamic-DeepHit method? It would be awesome for a couple of projects I am trying to test out.

Kind regards,
Milan

Thoughts about integration with pytorch-lightning?

Great work here havakv, looks really modular and well thought out.

I'm interested in using some of the time-to-event tools here with high dimensional imaging inputs. I've been building up a medical imaging codebase using pytorch-lightning for a little bit mostly because of how modular & convenient it makes iterating over multiple experiments on a cluster environment.

Do you have any ideas of how best to re-organize some of pycox.models (PCHazard for example) to torch-lightning before I start on this? What I'm really interested in is being able to use the pytorch lightning trainer.

Unstable c-index results

Hi. Many thanks for this powerful implementation of survival analysis.
I am new to survival analysis, and I am trying to use the Cox-Time model for my dataset. But I observed the c-index result on the test set is really unstable with different random seeds. (varying from less than 0.6 to more than 0.7)
Did you find the same problem in your experiments? Or do you have any suggestions about it?

AssertionError during c-index computation

Hi,
I ran into the following assertion error when computing the c-index for the discrete MTLR method.

assert durations.shape[0] == surv.shape[1] == surv_idx.shape[0] == events.shape[0]

I suppose the error is due to the fact that the maximum of the test durations is 1628***, while the function receives as input a number between 0 and 490. This range (0, 490) is the result of applying the following:

num_durations = 50
scheme = 'quantiles'
labtrans = MTLR.label_transform(num_durations, scheme)

and

surv = model.interpolate(10).predict_surv_df(x_test)

As the parameter of the function interpolate increases, the number of grid points also increases, and vice versa: it is the product of num_durations and the parameter of interpolate.
In the example I followed step by step, it was also pointed out that in the plot "the time scale is correct because we have set model.duration_index to be the grid points".

Thanks in advance,
Luca

*** EDIT: I hadn't read the error carefully and now I understand what I did wrong (I had saved only part of the df derived from surv and then computed the c-index on the entire test set). Unfortunately, I still haven't figured out how to fix the time scale problem in discrete models.

how to get time-to-event prediction

Hello! I'm a bit confused by the examples: how do I get the time-to-event prediction for a test feature?
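
When a model outputs a survival curve rather than a time, one common point prediction is the median survival time, i.e., the first time at which S(t) drops to 0.5 or below. A minimal sketch with a hypothetical `median_survival_time` helper and invented curves:

```python
import numpy as np

def median_survival_time(times, surv):
    """First time at which S(t) <= 0.5, or inf if the curve never gets there."""
    below = np.flatnonzero(np.asarray(surv) <= 0.5)
    return float(times[below[0]]) if below.size else float("inf")

times = np.array([10.0, 20.0, 30.0, 40.0])
median_a = median_survival_time(times, [0.9, 0.6, 0.45, 0.2])
median_b = median_survival_time(times, [0.95, 0.9, 0.8, 0.7])  # never reaches 0.5
```

If the curve stays above 0.5 on the whole grid, the median is not identified from the prediction, which is why the sketch returns infinity in that case.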

how to handle patient dropout?

Hi, first off, thank you so much for this great Python package. I started with survival analysis only very recently, so your work and notebooks are an absolute godsend. My question might be silly, though.

I am wondering how to handle dropout patients, i.e., patients who quit a study while still alive. I see that in the METABRIC dataset there does not seem to be a category like this, as there are only 0 and 1 in the event column.

So I was thinking that I could treat dropout patients as a competing risk problem. Does that make sense?

time-dependent co-variates

Hi,

I wonder how to use Cox-Time with time-dependent covariates?

How to provide the rank_mat for training deepHit model using the fit_dataloader()?

Hi,
I am trying to use the DeepHit model from pycox with images as input, and I get the following error regarding rank_mat within fit_dataloader():

File "train.py", line 210, in main
log = model.fit_dataloader(train_loader, epochs, callbacks, verbose, val_dataloader=val_loader)
File "/opt/conda/envs/env/lib/python3.6/site-packages/torchtuples/base.py", line 229, in fit_dataloader
self.batch_metrics = self.compute_metrics(data, self.metrics)
File "/opt/conda/envs/env/lib/python3.6/site-packages/torchtuples/base.py", line 180, in compute_metrics
return {name: metric(*out, *target) for name, metric in metrics.items()}
File "/opt/conda/envs/env/lib/python3.6/site-packages/torchtuples/base.py", line 180, in
return {name: metric(*out, *target) for name, metric in metrics.items()}
File "/opt/conda/envs/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'rank_mat'
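
The traceback suggests that the loss expects a `rank_mat` argument which the custom data loader does not supply. A minimal sketch of the pairwise ranking indicator that a DeepHit-style ranking loss is built on, assuming the usual rule that individual `i` is ranked before `j` when `i` has an observed event and either a shorter duration, or the same duration while `j` is censored. `pair_rank_matrix` is a hypothetical pure-Python helper; pycox appears to ship an equivalent in `pycox.models.data`, which is worth checking before relying on this version.

```python
def pair_rank_matrix(idx_durations, events):
    """Return an n x n matrix with 1.0 where individual i is ranked before j."""
    n = len(idx_durations)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if events[i] == 0:
            continue  # censored individuals are never ranked before anyone
        for j in range(n):
            shorter = idx_durations[i] < idx_durations[j]
            tie_with_censored = (
                idx_durations[i] == idx_durations[j] and events[j] == 0
            )
            if shorter or tie_with_censored:
                mat[i][j] = 1.0
    return mat


idx_durations = [2, 5, 5, 1]   # made-up discretized durations for one batch
events = [1, 0, 1, 0]          # 1 = event observed, 0 = censored
rank_mat = pair_rank_matrix(idx_durations, events)
for row in rank_mat:
    print(row)
```

The matrix compares individuals within one batch, so with a custom dataset it would have to be built per batch and returned together with the input and the target, so that the loss receives it as the extra positional argument.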

Hyperparameter tuning in DeepSurv method

Hi there,

I am looking for the code for hyperparameter tuning of the DeepSurv method in the following example, but I am not able to find it.

https://github.com/havakv/pycox/blob/master/examples/cox-ph.ipynb

I was wondering if you could help me with that.

Thank you,
Afshin
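
A minimal grid-search sketch, assuming a user-supplied function that trains one model for a given configuration and returns a validation score to maximise (for example a concordance index). `train_and_score` below is a hypothetical stand-in with a made-up formula so that the snippet runs on its own; in practice it would build the net, fit the model and evaluate it on a validation set. The hyperparameter names and values are illustrative only.

```python
from itertools import product


def train_and_score(lr, dropout, num_nodes):
    """Hypothetical stand-in: replace with real training and validation scoring."""
    return 0.7 - abs(lr - 0.01) - abs(dropout - 0.1) - 0.0001 * abs(num_nodes - 32)


def grid_search(grid, score_fn):
    """Evaluate every combination in `grid` and return (best_score, best_config)."""
    names = list(grid)
    best_score, best_config = float("-inf"), None
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = score_fn(**config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config


grid = {
    "lr": [0.001, 0.01, 0.1],
    "dropout": [0.0, 0.1, 0.5],
    "num_nodes": [32, 64],
}
best_score, best_config = grid_search(grid, train_and_score)
print(best_config, round(best_score, 3))
```

The score used for selection should come from a validation split that is separate from the test set, otherwise the reported test performance will be optimistic.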

Question about the discretization time grid in 01_introduction.ipynb

Hello everyone,
I am new to pycox and currently working through the notebook 01_introduction.ipynb.
I don't understand why the labtrans.cuts value for the 1st observation is 78.9 while the value for the 2nd observation is 118.4 (output below). The duration for observation 0 is 99.3 and the duration for observation 1 is 95.7, so I would expect them to fall in the same interval of the discretization grid (from 78.9 to 118.4).
Thanks for any help with this.

IN: labtrans.cuts[y_train[0]]
OUT: array([ 78.933334, 118.4 , 236.8 , ..., 39.466667, 197.33334 ,
118.4 ], dtype=float32)

IN: labtrans.cuts
OUT: array([ 0. , 39.466667, 78.933334, 118.4 , 157.86667 ,
197.33334 , 236.8 , 276.26666 , 315.73334 , 355.2 ],
dtype=float32)
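
One reading of this output, offered as an assumption rather than a statement about the library's internals: `y_train[0]` is the array of discretized duration indices for all training observations, so `labtrans.cuts[y_train[0]]` lists the assigned grid point for every observation. If observed events are moved up to the next grid point and censored durations are moved down to the previous one, then 99.3 and 95.7 land on different grid points whenever one is censored and the other is an event. The event indicators are not shown in the question, so the values below are assumed. `discretize` is a hypothetical helper that uses the cuts printed above.

```python
from bisect import bisect_left, bisect_right

cuts = [0.0, 39.466667, 78.933334, 118.4, 157.86667,
        197.33334, 236.8, 276.26666, 315.73334, 355.2]


def discretize(duration, event, cuts):
    """Return the index of the grid point assigned to one observation.

    Assumed rule: events are rounded up to the next cut,
    censored durations are rounded down to the previous cut.
    """
    if event:
        return min(bisect_left(cuts, duration), len(cuts) - 1)
    return max(bisect_right(cuts, duration) - 1, 0)


print(cuts[discretize(99.3, 0, cuts)])   # assumed censored
print(cuts[discretize(95.7, 1, cuts)])   # assumed event
```

Comparing the event column of the first two training rows with this rule would confirm or rule out the explanation.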

Using LSTM instead of MLPVanilla

To use an LSTM instead of MLPVanilla with the CoxTime and CoxPH models, I wrote the following model class. It runs, but I want to make sure that the implementation is theoretically correct. I'm trying to make each patient's data the input sequence for the LSTM, so that the hidden and cell states are carried only within that patient's sequence rather than across the whole batch of patients treated as one sequence. Would you be able to share some insights?

from torch import nn
import torchtuples as tt
from pycox.models import CoxPH

class LSTMCox(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, n_layers, output_size):
        super(LSTMCox, self).__init__()
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        self.embedding_dim = embedding_dim

        # nn.LSTM defaults to batch_first=False: input is (seq_len, batch, features)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers)
        self.fc = nn.Linear(hidden_dim, output_size)
        self.activation = nn.ReLU()

    def forward(self, input):
        # Reshape the batch to (len(input), 1, embedding_dim)
        input = input.view(len(input), 1, self.embedding_dim)

        lstm_out, _ = self.lstm(input)
        # Flatten back to one row per element of the batch
        lstm_out = lstm_out.contiguous().view(len(input), -1)

        out = self.fc(lstm_out)
        out = self.activation(out)

        return out

net = LSTMCox(in_features, 512, 1, 1)
model = CoxPH(net, tt.optim.Adam)
model.optimizer.set_lr(0.01)
log = model.fit(x_train, y_train, batch_size, epochs, callbacks, val_data=val, val_batch_size=batch_size)
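
One thing to check in the code above: `nn.LSTM` defaults to `batch_first=False`, so an input reshaped to `(len(input), 1, embedding_dim)` is read as a single sequence whose time steps are the patients in the batch. A minimal sketch of the per-patient alternative, assuming each patient's data is available as a sequence and the batch is stacked as `(n_patients, seq_len, n_features)`. `PatientLSTM` and all sizes are hypothetical, and the final ReLU is dropped on the assumption that a Cox-type output (a log-risk score) should not be restricted to non-negative values.

```python
import torch
from torch import nn


class PatientLSTM(nn.Module):
    def __init__(self, n_features, hidden_dim, n_layers=1, out_features=1):
        super().__init__()
        # batch_first=True: input is (n_patients, seq_len, n_features)
        self.lstm = nn.LSTM(n_features, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, out_features)

    def forward(self, x):
        # Hidden and cell states start at zero for every patient and are
        # carried only along that patient's own time steps.
        lstm_out, _ = self.lstm(x)
        last_step = lstm_out[:, -1, :]      # (n_patients, hidden_dim)
        return self.fc(last_step)           # (n_patients, out_features)


torch.manual_seed(0)
net = PatientLSTM(n_features=3, hidden_dim=8).eval()
x = torch.randn(4, 7, 3)                    # 4 patients, 7 time steps, 3 features
with torch.no_grad():
    out = net(x)
print(out.shape)
```

With this layout, changing one patient's sequence leaves the outputs for the other patients unchanged, which is a quick way to verify that no state leaks across the batch. If each patient has only a single feature vector and no time dimension, an LSTM has nothing to iterate over and an MLP is likely the more natural choice.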

Having troubles with downloading FLCHAIN dataset

Hi!

Thank you for making this library, it's epic.

I'm having an issue downloading the FLCHAIN dataset: it gives me a 404 error.

Thanks for the help!

Best regards,
Robert

Embedding layer for categorical covariates

Hello everyone,

I have a small question about this nice package for deep survival analysis. Would it be possible to feed two sets of input features into the different deep learning models: one set for the numeric features and a second one for the encoded categorical covariates, and then use an embedding layer to learn task-specific embeddings for the categorical variables?

Maybe someone has an idea of how I can prepare my dataset (it is a pandas DataFrame) and feed it into such a model. Unfortunately, I don't have much experience with PyTorch. Until now I have used TensorFlow and its Dataset API to prepare my datasets for multi-input models.

Thank you very much.
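
A minimal sketch of a two-input network in plain PyTorch, assuming the categorical columns have already been integer-encoded as 0 to n_categories - 1 and the numeric columns standardized. `MixedInputNet` and all sizes are hypothetical; torchtuples also seems to offer a ready-made mixed-input MLP in `tt.practical`, which may be worth checking first.

```python
import torch
from torch import nn


class MixedInputNet(nn.Module):
    def __init__(self, n_numeric, n_categories, emb_dims, hidden_dim, out_features=1):
        super().__init__()
        # One embedding table per categorical column.
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n_cat, dim) for n_cat, dim in zip(n_categories, emb_dims)]
        )
        self.mlp = nn.Sequential(
            nn.Linear(n_numeric + sum(emb_dims), hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_features),
        )

    def forward(self, x_num, x_cat):
        # x_num: float tensor (batch, n_numeric)
        # x_cat: long tensor (batch, number of categorical columns)
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        return self.mlp(torch.cat([x_num] + embedded, dim=1))


torch.manual_seed(0)
# Two categorical columns with 3 and 6 levels, embedded in 2 and 3 dimensions.
net = MixedInputNet(n_numeric=4, n_categories=[3, 6], emb_dims=[2, 3], hidden_dim=16)
x_num = torch.randn(5, 4)
x_cat = torch.stack([torch.randint(0, 3, (5,)), torch.randint(0, 6, (5,))], dim=1)
out = net(x_num, x_cat)
print(out.shape)
```

Starting from a pandas DataFrame, the numeric columns would be converted to a float32 array and the categorical columns to an int64 array. If I understand torchtuples correctly, the two arrays can then be passed together as a tuple input so that `forward` receives them as two arguments, but that behaviour should be verified.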
