Comments (7)
Thanks for getting back to me! However, I'm still not sure I understand how the model avoids underestimating the time to event. (By predictions I mean the mean or median of the distribution.)
As an example, say we have a customer with 30 days since the last event, so the observation is censored with a time to event of at least 30. Maybe the model has learned something about the pattern underlying long periods of inactivity, but without knowledge of the minimum bound it might still predict 15 days, or any other number less than 30 in this example.
Even for relatively short periods of inactivity we could end up with predictions lower than the observed censoring time. Hopefully you can shed some light on this if I have misinterpreted something.
from wtte-rnn.
Hey, before reading through your examples I tried what was maybe a bad example of WTTE-RNN with the jet engine data, where we have readings at every timestep. This didn't yield very good results (scale and predictions were OK but overall accuracy was quite bad), but I realized that I had missed something crucial.
I see that I should also train on the data between events, so the censoring time becomes part of an observation's history. So regarding my previous question you are correct that the minimum bound will be respected.
If I may ask a couple more questions about the problem setup that have puzzled me a bit since reading your examples.
- Masking
First is the masking that is needed for shorter sequences. I get that I can mask them with impossible/improbable values in X, but what about y? Why have you used the expected TTE and 0.95 in your data pipeline examples? Aren't these discarded anyway when you pass 0.0 sample_weights for these points, so why not use some random crap value here as well?
- Censoring backwards?
Why do we count the censoring backwards? I.e. if we have a censored observation with time since event 30, we start counting at 30 and go down to 1 when we hit the "present".
- Training time, size, loss etc.
What are some typical metrics of this problem? I am seeing that the loss plateaus quite fast. Assuming I can get it to train without NaN losses etc., can I expect to hit a loss close to 0, or should I be happy once I have reached the plateau?
Have you tried estimating with deeper or wider networks?
How much data, approximately, would you say this needs for convergence? From my view it looks to be converging quite rapidly even with small amounts of data. I know it's an impossible question, but if you have any ballpark numbers it would be beneficial to know.
Big thanks for being so active here and answering questions. I'm hoping this could replace / complement our current churn models.
Thank you - all good answers that came to good use preprocessing my data and training this bad boy. I got mostly NaN losses when I tried a wider network, but a deep and narrow network seems to work quite well and also reduces overfitting, as you said.
I am in the process of evaluating this model, will return if I have something interesting to share.
Sorry for the slow answer, and thanks for the great question. I replied in the blog too http://disq.us/p/1mghgb2.
The gist is that the censoring indicator should be considered part of the training data only. We use every timestep for every customer to train the algorithm.
The present-day data will naturally be mostly censored, but these censored points say "the TTE was greater than 0", which might lead to overestimation; it's hard to see how it could lead to underestimation.
You have a very valid question. Censoring is a hard concept, and the reason why this should work has puzzled me too, which is why I've put in so much effort to convince myself.
Why we can figure out the correct distribution even if there's (one type of) censoring:
- Empirical argument: it actually seems to work (check the tests, the notebook, and the visualization of the same experiment)
- Mathematical argument: check the proof on page 18
- Intuitive argument: "Push beyond point of censoring if censored, concentrate at tte if uncensored"
But there's actually a shorter answer to your question:
As an example, say we have a customer with 30 days since the last event, so the observation is censored with a time to event of at least 30. Maybe the model has learned something about the pattern underlying long periods of inactivity, but without knowledge of the minimum bound it might still predict 15 days, or any other number less than 30 in this example.
The minimum bound is "at least 30", and the model will try to push density above that point; hence the median/mean should be above 30 if it's inferable!
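That "push density beyond the censoring point" intuition maps directly onto the censored term of the continuous Weibull log-likelihood: a censored point contributes log S(t) = -(t/alpha)^beta, which increases as mass is pushed past the censoring time. A tiny sketch with made-up parameters:

```python
import math

def weibull_loglik(t, a, b, uncensored):
    # continuous Weibull: uncensored -> log f(t), censored -> log S(t)
    haz = (t / a) ** b
    if uncensored:
        return math.log(b / a) + (b - 1.0) * math.log(t / a) - haz
    return -haz  # log survival: grows as mass moves beyond t

# censored at t = 30: a fit with more mass beyond 30 (larger alpha) scores higher
ll_small_alpha = weibull_loglik(30.0, 15.0, 1.5, uncensored=False)
ll_large_alpha = weibull_loglik(30.0, 60.0, 1.5, uncensored=False)
# maximizing likelihood therefore pushes the distribution past the censoring point
```

So a model that "predicts 15 for a point censored at 30" pays a likelihood penalty relative to one that pushes mass above 30.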
- Why not use some random crap value here also?
- I am! Just crappy but numerically reasonable. I can spot 0.95 when I'm testing, and the expected TTE is unlikely to cause numerical problems. Masked points may (depending on the backend) still be used in the forward pass.
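The "numerically reasonable" point can be seen in a toy weighted loss (my own illustration, not the library's code): masked timesteps drop out of the objective via their zero weight, but their y-values still pass through the forward computation, so extreme values could still overflow there.

```python
import numpy as np

# toy per-timestep squared loss; the last step is padding
y_true = np.array([5.0, 3.0, 999.0])   # 999.0 stands in for a "crap" mask value
y_pred = np.array([4.0, 3.5, 0.0])
weights = np.array([1.0, 1.0, 0.0])    # sample_weight zeroes out the padded step

per_step = (y_true - y_pred) ** 2
loss = (per_step * weights).sum() / weights.sum()
# the masked step contributes nothing to the loss or its gradient, but
# y_true[2] was still touched in the forward pass; with the exp/log terms
# in a WTTE-style loss, an extreme value there could turn inf/NaN before
# the zero weight gets a chance to mask it
```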
- Why do we count the censoring backwards?
- It follows from the definition of censoring, sorry: for a censored timestep the only thing we know is how many steps remain until the end of the observed window, and that lower bound shrinks as we approach the present.
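A numpy sketch of one way such (tte, uncensored) targets can be built (my own helper, not the wtte-rnn pipeline; conventions for the tte at an event step vary):

```python
import numpy as np

def tte_targets(events):
    """events: 1-D 0/1 array, one entry per timestep.
    Returns (tte, uncensored) target arrays.
    Convention here: tte = 0 at an event step."""
    T = len(events)
    tte = np.zeros(T)
    uncensored = np.zeros(T)
    steps_to_event = None  # steps until the next event, scanning backwards
    for t in reversed(range(T)):
        if events[t]:
            steps_to_event = 0
        if steps_to_event is not None:
            tte[t] = steps_to_event
            uncensored[t] = 1.0
            steps_to_event += 1
        else:
            # censored tail after the last event: the target counts down
            # toward the "present" (the end of the observed window)
            tte[t] = T - t
    return tte, uncensored
```

For a sequence with an event at the first step and nothing after, the censored tail gets targets 3, 2, 1 - exactly the backwards count described in the question.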
- What are some typical metrics of this problem?
- Good question that I don't have numbers for. In short it depends on the noisiness of the data, and I haven't given it much thought. The fraction of censoring has a huge impact on the magnitude of the loss. In the noisy GitHub commit-log example the training loss goes to around 1.2665, while a null-Weibull model with the correct parameters reaches around 1.7.
If the model is initialized properly, outputting alpha around the mean TTE and beta around 1, you can see every step downward as an improvement over the null-exponential model. I'm always happy when I reach a plateau, and early stopping there is often numerically sound.
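That null-exponential baseline can be computed directly; a minimal sketch using the continuous Weibull negative log-likelihood, with made-up TTEs treated as fully uncensored:

```python
import math

def weibull_neg_loglik(t, u, a, b):
    # continuous Weibull negative log-likelihood with right censoring:
    # uncensored (u=1) points contribute -log f(t), censored (u=0) -log S(t)
    haz = (t / a) ** b
    return -(u * (math.log(b / a) + (b - 1.0) * math.log(t / a)) - haz)

# null-exponential baseline: beta = 1, alpha = mean observed TTE
ttes = [2.0, 5.0, 3.0, 10.0]      # made-up uncensored TTEs
alpha0 = sum(ttes) / len(ttes)    # 5.0
baseline = sum(weibull_neg_loglik(t, 1.0, alpha0, 1.0) for t in ttes) / len(ttes)
# a trained model whose mean loss drops below this baseline has learned
# something beyond the marginal event rate
```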
- Have you tried estimating with deeper or wider networks?
- My experience is that stacking recurrent layers gives smoother predictions between timesteps and reduces overfitting, but it learns and reacts slowly to incoming data. A wider network learns much faster, but this sometimes causes overfitting and numerical instability.
I have no hints regarding data size, sorry. I've had good results with very small datasets, but then again my expectations about how tight the predictions would be weren't that high.
Great to hear! Looking forward to hearing your results. Also, check out #33; almost always there's a truth being revealed (i.e., a data problem) when the loss turns NaN!
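When the loss does turn NaN, a few target-side checks usually surface the data problem quickly; a small sanity-check sketch (my own checklist, not code from the repo or issue #33):

```python
import numpy as np

def check_targets(tte):
    """Flag the usual data problems behind NaN losses.
    Returns the count of zero-valued TTE targets."""
    assert np.isfinite(tte).all(), "non-finite tte target"
    assert (tte >= 0).all(), "negative tte target"
    # tte == 0 under a continuous log-likelihood hits log(0);
    # shift by one step or use the discrete log-likelihood instead
    return int((tte == 0).sum())
```

Running this on the target array before training catches non-finite, negative, and zero TTEs, which are frequent causes of log/divide blow-ups in the loss.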