WTTE-RNN

Weibull Time To Event Recurrent Neural Network

A less hacky machine-learning framework for churn and time-to-event prediction. Forecasting problems as diverse as server monitoring, earthquake prediction and churn prediction can be posed as the problem of predicting the time to an event. WTTE-RNN is an algorithm and a philosophy about how this should be done.

Basics

You have data consisting of many time series of events and want to use historic data to predict the time to the next event (TTE). If you haven't observed the last event yet, you've only observed a lower bound on the TTE to train on. This results in what's called censored data (in red):

Censored data
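As a minimal sketch of the idea above, the following computes the TTE and a censoring flag for one timeline of 0/1 event indicators. The function name `tte_and_censoring` and the convention that an event step has TTE 0 are assumptions for illustration, not the project's helper functions.

```python
import numpy as np

def tte_and_censoring(is_event):
    """Return (tte, is_censored) for a 1-D sequence of 0/1 event indicators.

    tte[t] counts the steps from t to the next event (0 on an event step).
    After the last observed event, tte[t] is only a lower bound (the number
    of steps left in the observed window), and the step is flagged censored.
    """
    is_event = np.asarray(is_event, dtype=bool)
    n = len(is_event)
    tte = np.zeros(n, dtype=int)
    is_censored = np.zeros(n, dtype=bool)
    steps_to_next = 0
    seen_event = False
    # Walk backwards so each step knows the distance to the next event.
    for t in range(n - 1, -1, -1):
        if is_event[t]:
            steps_to_next = 0
            seen_event = True
        else:
            steps_to_next += 1
        tte[t] = steps_to_next
        is_censored[t] = not seen_event
    return tte, is_censored

tte, is_censored = tte_and_censoring([0, 1, 0, 0, 1, 0, 0])
```

The last two steps come after the final observed event, so their TTE values are lower bounds and are marked as censored.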

Instead of predicting the TTE itself the trick is to let your machine learning model output the parameters of a distribution. This could be anything but we like the Weibull distribution because it's awesome. The machine learning algorithm could be anything gradient-based but we like RNNs because they are awesome too.

example WTTE-RNN architecture
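A model that outputs distribution parameters has to keep them in a valid range, since both Weibull parameters must be positive. The sketch below shows one way to map two unconstrained outputs per time step to (alpha, beta). The choice of `exp` for alpha and softplus for beta, and the name `to_weibull_params`, are assumptions for illustration and not necessarily what the project uses.

```python
import numpy as np

def to_weibull_params(raw):
    """Map unconstrained outputs of shape (..., 2) to positive (alpha, beta)."""
    raw = np.asarray(raw, dtype=float)
    alpha = np.exp(raw[..., 0])            # scale, always > 0
    beta = np.logaddexp(0.0, raw[..., 1])  # softplus: log(1 + exp(x)), always > 0
    return alpha, beta

# Two time steps of raw network output (made-up numbers).
alpha, beta = to_weibull_params([[0.0, 0.0], [1.0, -3.0]])
```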

The next step is to train the algorithm of choice with a special log-loss that can work with censored data. The intuition behind it is that we want to assign high probability at the next event, or low probability where there weren't any events (censored data):

WTTE-RNN prediction over a timeline

What we get is a pretty neat prediction about the distribution of the TTE in each step (here for a single event):

WTTE-RNN prediction

A neat side result is that the predicted parameters form a 2-D embedding that can be used to visualize and group predictions by how soon (alpha) and how sure (beta). Here by stacking timelines of predicted alpha (left) and beta (right):

WTTE-RNN alphabeta.png

Warnings

There's a lot of mathematical theory that basically justifies using this nice loss function in certain situations:

loss-equation
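The equation image is not reproduced here. For orientation, the standard continuous-time log-likelihood of a right-censored Weibull observation is given below, with scale alpha, shape beta, observed time t, and indicator u (u = 1 if the event was observed, u = 0 if censored). The notation is an assumption, and the original figure may use a different or discrete form.

```latex
\log L(\alpha, \beta;\, t, u)
  = u \left[ \log\beta - \log\alpha + (\beta - 1)\,(\log t - \log\alpha) \right]
  - \left( \frac{t}{\alpha} \right)^{\beta}
```

The bracketed term is the log hazard and the last term is the cumulative hazard. With u = 0 only the cumulative hazard remains.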

So for censored data it only rewards pushing the distribution up, beyond the point of censoring. To get this to work you need the censoring mechanism to be independent of your feature data. If your features contain information about the point of censoring, your algorithm will learn to cheat by predicting far away based on the probability of censoring instead of the TTE. This is a type of overfitting or artifact learning. Global features can have this effect if not properly treated.
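To make the "pushing the distribution up" behaviour concrete, here is a small sketch using the standard continuous-time censored Weibull log-likelihood (an assumption, since the project also discusses discrete variants). The function name and the example numbers are hypothetical.

```python
import numpy as np

def weibull_censored_loglik(t, u, alpha, beta):
    """Continuous Weibull log-likelihood; u=1 observed, u=0 right-censored.

    Requires t > 0, alpha > 0 and beta > 0.
    """
    t, u, alpha, beta = (np.asarray(x, dtype=float) for x in (t, u, alpha, beta))
    log_hazard = np.log(beta) - np.log(alpha) + (beta - 1.0) * (np.log(t) - np.log(alpha))
    cum_hazard = (t / alpha) ** beta
    return u * log_hazard - cum_hazard

alphas = np.array([1.0, 5.0, 25.0, 125.0])
# Censored at t=5: the log-likelihood keeps improving as alpha grows.
censored = weibull_censored_loglik(t=5.0, u=0.0, alpha=alphas, beta=2.0)
# Observed at t=5: the log-likelihood peaks where alpha matches t.
observed = weibull_censored_loglik(t=5.0, u=1.0, alpha=alphas, beta=2.0)
```

For the censored case nothing in the loss stops alpha from growing, so a feature that leaks the censoring point lets the model collect reward by predicting far away rather than by learning the TTE.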

ROADMAP

The project is in the TODO state. The goal is to create a forkable and easily deployable model framework. WTTE-RNN is the algorithm, churn_watch is the deployment: an opinionated idea about how churn monitoring and reporting can be made beautiful and easy. Pull requests, recommendations, comments and contributions are very welcome.

Implementations of the objective functions

The core technology is the set of objective functions. These can be used with any gradient-based machine-learning algorithm. To spread the word we should implement and commit them to various ML projects.

  • TensorFlow (Done but not implemented as raw op yet)
  • MXNet
  • Theano
  • Keras
  • Torch
  • H2O
  • scikitFlow
  • MLlib

Auxiliary

To use the model one needs basic TTE transforms of raw data. To consume the models we need Weibull-related functions for the final output.

  • Ready to run helper functions implemented in SQL, R, Python
    • get_time_to_event (calculates tte and censored tte)
    • get_is_censored
    • Weibull hazard, CHF, CDF, PDF, quantile, expected value, etc. (Python done).
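As a rough illustration of the Weibull functions listed above, here is a minimal sketch for the continuous Weibull distribution with scale `a` (alpha) and shape `b` (beta). The function names and signatures are hypothetical and are not taken from the project's own Python helpers.

```python
import numpy as np
from math import gamma

def weibull_chf(t, a, b):
    """Cumulative hazard function: (t / a) ** b."""
    return (t / a) ** b

def weibull_hazard(t, a, b):
    """Hazard: derivative of the cumulative hazard with respect to t."""
    return (b / a) * (t / a) ** (b - 1.0)

def weibull_cdf(t, a, b):
    """P(T <= t) = 1 - exp(-chf)."""
    return -np.expm1(-weibull_chf(t, a, b))

def weibull_pdf(t, a, b):
    """Density = hazard * survival."""
    return weibull_hazard(t, a, b) * np.exp(-weibull_chf(t, a, b))

def weibull_quantile(p, a, b):
    """Inverse CDF for 0 <= p < 1."""
    return a * (-np.log1p(-p)) ** (1.0 / b)

def weibull_mean(a, b):
    """Expected value: a * Gamma(1 + 1/b)."""
    return a * gamma(1.0 + 1.0 / b)

# Example: median predicted time to event for alpha=3, beta=1.5.
median_tte = weibull_quantile(0.5, 3.0, 1.5)
```

The quantile function is a convenient way to turn a predicted (alpha, beta) pair into a point estimate such as the median TTE.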

Monitoring

The WTTE-RNN is as much an ML-algorithm as a visual language to talk about this shape of data and our predictions.

  • Plots (partly done)
  • Shiny webapp and/or similar (partly done elsewhere)
  • Integration. Slack/E-mail bots & summaries
  • API

Models

To get this going we need at least one off-the-shelf deep-learning implementation that scales. Currently there's one that doesn't.

  • Best-practices WTTE-RNN implementation

Deployment

  • Notebooks
  • Containers
  • IBM project?
  • Cortana project?

Licensing

  • MIT-license.

Citation

@MastersThesis{martinsson:Thesis:2016,
    author     =     {Egil Martinsson},
    title     =     {WTTE-RNN : Weibull Time To Event Recurrent Neural Network},
    school     =     {Chalmers University Of Technology},
    year     =     {2016},
    }

Reach out to egil.martinsson[at]gmail.com if you have any questions!
