Comments (11)

havakv commented on July 21, 2024

The short answer is that the Cox-Time implementation does not explicitly support time-dependent covariates.
I do, however, believe this can be achieved by partly conditional modeling. The benefit of this approach is that it only requires preprocessing and no changes to the Cox-Time code.

In short, the idea of partly conditional modeling is that every time the covariates of an individual change, you create a new individual and consider the residual time. So for an individual with event time t and a new set of covariates x(s) at time s, you would consider this a new individual with event time t - s.
This means that if the covariates of this individual change k times, you will consider a set of k independent individuals. Your new data set will contain many "copies" of your individuals, and you can fit the Cox-Time model to this larger data set. Survival predictions should work as before.

I would recommend including s as a covariate.
Also, you are not restricted to only use the covariates at time s, and can instead create a set of covariates that are representative for the history up till time s.
The WTT-RNN is built on this principle, where the covariate history up till time s is processed by an RNN.
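
The preprocessing described above could be sketched roughly as follows. This is only an illustration under assumptions, not tested against pycox: the helper name `to_partly_conditional` and the column names (`id`, `s`, `x`, `duration`, `event`) are made up for the example. Each measurement at time s becomes its own row, with the residual time t - s as the new duration and s kept as a covariate.

```python
import pandas as pd


def to_partly_conditional(obs: pd.DataFrame, outcomes: pd.DataFrame) -> pd.DataFrame:
    """Expand covariate measurements into one pseudo-individual per measurement.

    obs:      one row per (id, s), holding the covariates observed at time s.
    outcomes: one row per id, with 'duration' (event or censoring time t)
              and 'event' (1 = event, 0 = censored).
    All column names are assumptions made for this sketch.
    """
    df = obs.merge(outcomes, on="id", how="inner")
    # Only measurements taken while the individual was still at risk are usable.
    df = df[df["s"] < df["duration"]].copy()
    # The residual time t - s becomes the duration of the pseudo-individual.
    # The event indicator is carried over unchanged, and s stays as a covariate.
    df["duration"] = df["duration"] - df["s"]
    return df.reset_index(drop=True)


obs = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "s": [0.0, 2.0, 5.0, 0.0, 3.0],
    "x": [0.1, 0.4, 0.9, 0.3, 0.2],
})
outcomes = pd.DataFrame({"id": [1, 2], "duration": [8.0, 4.0], "event": [1, 0]})

expanded = to_partly_conditional(obs, outcomes)
print(expanded)
```

The columns `x` and `s` would then serve as features, and `duration` and `event` as the targets. Since the rows that share an `id` come from the same individual, it is probably wise to keep all of them on the same side of any train/validation/test split.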

I hope this can help you get started.

from pycox.

hgjlee commented on July 21, 2024

Thanks for your explanation in the previous comments. I noticed that time-varying covariates are conventionally formatted in a counting process form, with interval start and end columns in place of the event duration column (e.g. lifelines' Cox time-varying PH model).
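
For reference, here is a rough sketch of how data in counting process form might be converted to the residual-time format suggested above. The column names (`id`, `start`, `stop`, `event`, `x`) and the helper name `counting_process_to_residual` are assumptions for illustration. It also assumes that each individual's intervals cover the full follow-up and that `event` is 1 only on the final interval when the event occurs.

```python
import pandas as pd


def counting_process_to_residual(cp: pd.DataFrame) -> pd.DataFrame:
    """Convert (id, start, stop, event) rows to residual-time rows.

    Each interval start is treated as the conditioning time s. The new
    duration is the individual's last 'stop' minus s, and the new event
    indicator is whether the individual ever had the event.
    """
    grouped = cp.groupby("id")
    out = cp.drop(columns=["stop", "event"]).rename(columns={"start": "s"})
    # transform() returns values aligned with the original rows of cp.
    out["duration"] = grouped["stop"].transform("max") - cp["start"]
    out["event"] = grouped["event"].transform("max")
    return out


cp = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "start": [0.0, 2.0, 5.0, 0.0, 3.0],
    "stop": [2.0, 5.0, 8.0, 3.0, 4.0],
    "event": [0, 0, 1, 0, 0],
    "x": [0.1, 0.4, 0.9, 0.3, 0.2],
})

residual = counting_process_to_residual(cp)
print(residual)
```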

I'm doing an experiment to compare the prediction accuracy of the standard Cox time-varying PH model, with data in counting process form, against CoxTime and DeepSurv, with data formatted according to your suggestion. Would this be a fair comparison? Or are there any implications that I need to consider?

MohdSafwanAhmad commented on July 21, 2024

If the output also has multiple datapoints for each subject, then which would be the best one to consider when making the prediction?

havakv commented on July 21, 2024

When you say the output has multiple datapoints, what are you referring to? Are the multiple datapoints describing the survival function, such as for LogisticHazard and DeepHit? Or are the multiple datapoints containing some other information? If you are referring to the output of e.g., LogisticHazard, the same approach as described above should be fine. An example of this is the WTT-RNN, which has two outputs (alpha and beta) describing the survival function.

MohdSafwanAhmad commented on July 21, 2024

Sorry, I meant the test data. Just like the training data, if there are multiple data points for each subject at different times (time-dependent covariates), we will get the survival probability for each data point separately. How do we read the survival probabilities in that case, since one subject will have several S(t|x) vs time plots? The NASA turbofan dataset developed for RUL prediction can be taken as an example (assuming some binary values for the event column).

havakv commented on July 21, 2024

Ah, I think I understand. So, again specifying that I have not tried the approach, prediction on the test data should be straightforward. Evaluation of the predictions, on the other hand, might be harder.

As all time dependency is captured by the covariates up to a given time s, i.e., x(s), your predictions are conditioned on this time and you can treat this as the starting point (0) of your survival predictions. In other words, your survival predictions are S(t | x(s), t > s), so S(s | x(s), t > s) = 1. I realise that the notation here might be confusing, though. It is probably better explained by the WTT-RNN blog or the WTT-RNN master's thesis.
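
To make the change of time origin concrete, here is a small sketch of how a predicted curve on the residual time scale could be placed back on the original time axis. The helper `to_absolute_time` is hypothetical, and the curve is assumed to be a pandas Series indexed by residual time (time since s) with survival probabilities as values.

```python
import pandas as pd


def to_absolute_time(surv_residual: pd.Series, s: float) -> pd.Series:
    """Shift a residual-time survival curve so that it starts at time s.

    The prediction is conditional on survival up to s, so the curve is
    anchored at S(s | x(s), t > s) = 1 if that point is not already present.
    """
    shifted = surv_residual.copy()
    shifted.index = shifted.index + s
    if s not in shifted.index:
        shifted = pd.concat([pd.Series([1.0], index=[s]), shifted])
    return shifted.sort_index()


# Hypothetical prediction for covariates observed at s = 5.
surv_residual = pd.Series([0.9, 0.7, 0.5], index=[1.0, 2.0, 3.0])
curve = to_absolute_time(surv_residual, s=5.0)
print(curve)
```

With several measurement times s for the same subject, each shifted curve starts at 1 at its own s, so the curves are conditional on different information and are not directly comparable point by point.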

For evaluation of the predictions the problem is that the multiple survival predictions S(t | x(s), t > s) for different s are highly correlated, so considering them independent might be problematic. I don't know what the best approach for this evaluation would be.

havakv commented on July 21, 2024

@hgjlee I think it is very interesting that you are conducting these kinds of experiments, and I hope you will share your results with us in the future!

Off the top of my head, the only problematic part of comparing time-varying Cox PH with CoxTime and DeepSurv is already listed at the bottom of your lifelines link short-note-on-prediciton.
I.e., as time-varying Cox PH is not really intended for prediction, you would need to simulate the time-dependent covariate process to be able to make predictions without cheating (using covariates from the future).

hgjlee commented on July 21, 2024

@havakv Thank you for your reply. I'd love to share my results when they're ready. In that case, it'd make more sense to apply partly conditional modeling to all the models and then compare the results.

Niccolo-Ajroldi commented on July 21, 2024

I'm doing an experiment to compare the prediction accuracies from the standard cox time-varying PH model with data in a counting process form and CoxTime and DeepSurv with data formatted according to your suggestion. Would this be a fair comparison? Or are there any implications that I need to consider?

Hi Jacob, can I ask you how you finally managed the issue? Were you able to obtain meaningful results?
Thanks!

ymao418 commented on July 21, 2024

@hgjlee any updates on the results comparison? I am working on a problem with a covariate: transactions in the last 30 days. Obviously, this covariate will change over time, and I wonder whether not constructing it as a time-varying covariate will affect predictions much.

hxian commented on July 21, 2024

@havakv

I would recommend including s as a covariate.

When you said using s as a covariate, did you mean using it as a numeric covariate, thus needing to standardize it, or just putting it together with the other binary variables that don't need to be transformed?
