Giter Club home page Giter Club logo

survival-analysis's Introduction

# Predictive Maintenance -- Survival Analysis
tags: tech blog, tutorials, predictive maintenance, python, lifelines

Predictive maintenance is to predict which machinery at which condition needs preventative maintenance so as to eliminate outages and the costs associated with it. Instead of predicting each individual part's failure, Survival analysis is a statistics approach to estimate failure rate. It is also called reliability analysis and event history analysis. Simply put, it intends to answer -- "How long until an event occurs?"

For example,

  • How long patients survive?
  • How long mechanical parts last?

In this tutorial, we exemplify the second question with 28,529 censored parts which had no failure history, 1334 failed parts and their time to event (fail or exit the censorship). Input table is prepared as below with event and time_to_event columns.

input_table

df.event.value_counts()

# censor    28529
# fail       1334
# Name: event, dtype: int64

Kaplan-Meier Estimation

At each t + 1, compute how many at risk, which is the last entrance substracted by parts failed at t and also those did not fail but monitored until t (i.e. # censored)

from lifelines import KaplanMeierFitter

durations = df['time_to_event'].apply(lambda x: 0.5 if x==0 else x) # workarounds for DOA
event_observed = df['event'].apply(lambda x: 1 if x=='fail' else 0)

km = KaplanMeierFitter()

km.fit(durations, event_observed, label='KM')
  • Event Table:

km.event_table

event_table

Survival Function

  • What fraction survive past t?
  • e.g., 5-yr survival rates, median survival time
  • Also called Reliability Function

survival function

Estimated by event table

Survival function can be estimated by the following formula where $d_i$ stands for number of defects/observed and $n_i$ is number of parts entering period $i$ / at risk. km survival function

km.survival_function_

km.survival_function

Hazard Function

  • Of the people who survive until t, what fraction die at t?
  • Conditional density probability

$$ \lambda(t) = {f(y)\over S(t)} $$

  • The hazard function might be of more intrinsic interest than the p.d.f. to a patient who had survived a certain time period and wanted to know something about their prognosis.

Popular Distributions

Weibull

from lifelines import * 

wbf = WeibullFitter().fit(durations[1:], event_observed[1:], label='WeibullFitter')

wbf.summary

wbf.plot_hazard() weibull_hazard

plt.figure(figsize=(10, 24))
plt.subplot(4, 1, 1)
wbf.plot_survival_function()
plt.title('Survival Curve')
plt.xlabel('Time to Event (days)')
plt.ylabel('Survival Probability')

plt.subplot(4, 1, 2)
wbf.plot_cumulative_density()
plt.title('Cumulative Failure Density')
plt.xlabel('Time to Event (days)')
plt.ylabel('Cum. Density Prob. (failure)')


plt.subplot(4, 1, 3)
wbf.plot_hazard()
plt.title('Hazard Function')
plt.xlabel('Time to Event (days)')
plt.ylabel('Hazard Function (pdf)')

plt.subplot(4, 1, 4)
wbf.plot_cumulative_hazard()
plt.title('Cumulative Hazard Function')
plt.xlabel('Time to Event (days)')
plt.ylabel('Cumulative Hazard Function')

plt.show()

QQPlot

In fact, weibull distribution fits our data poortly, and we can observe that by looking at qqplot as below. None falls on the diagonal line. Instead, Log Normal seems to be a better fit.

from lifelines.plotting import qq_plot

durations = df['time_to_event'].apply(lambda x: 0.5 if x==0 else x)
event_observed = df['event'].apply(lambda x: 1 if x=='fail' else 0)

fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes = axes.reshape(4, )

for i, model in enumerate([WeibullFitter(), LogNormalFitter(), LogLogisticFitter(), ExponentialFitter()]):
    model.fit(durations, event_observed)
    qq_plot(model, ax=axes[i])

qq_plot

LogNormal

lnf = LogNormalFitter().fit(durations, event_observed)
lnf.summary

plt.figure(figsize=(10, 24))
plt.subplot(4, 1, 1)
lnf.plot_survival_function()
plt.title('Survival Curve')
plt.xlabel('Time to Event (days)')
plt.ylabel('Survival Probability')

plt.subplot(4, 1, 2)
lnf.plot_cumulative_density()
plt.title('Cumulative Failure Density')
plt.xlabel('Time to Event (days)')
plt.ylabel('Cum. Density Prob. (failure)')


plt.subplot(4, 1, 3)
lnf.plot_hazard()
plt.title('Hazard Function')
plt.xlabel('Time to Event (days)')
plt.ylabel('Hazard Function (pdf)')

plt.subplot(4, 1, 4)
lnf.plot_cumulative_hazard()
plt.title('Cumulative Hazard Function')
plt.xlabel('Time to Event (days)')
plt.ylabel('Cumulative Hazard Function')

plt.show()

Poisson

Having explored hazard function, which shows unpredictable but extremely low failure rate, it is reasonable to assume the part's failure rate will converge to a constant (random failure in bathtub curve). With that constant failure rate - probability of failure in day, I can convert that to Mean Time Between Failure (MTBF) and Annualized Failure Rate (AFR).

# ultimate failure rate
fr = lnf.hazard_.iloc[-1, 0]
MTBF_days = 1/fr
MTBF_mons = round(MTBF_days / 30, 1)
MTBF_yrs = round(MTBF_days / 365, 1)

AFR = fr*365

num_active_parts = 200000
num_active_parts * AFR  # estimated number of failures per year out of 200K active parts

To fit in a possion distribution, I only need one independent variable. That is average failure rate, and I use AFR multiplied by number of active part base.

from scipy.stats import poisson

mu = num_active_parts * AFR
x = np.arange(0, 1000, 1)
plt.plot(x, poisson.pmf(x, mu))
plt.title('Number of failures occur in a year in Prob.')

plt.plot(x, poisson.cdf(x, mu))
plt.title('Cumu. Demand Distribution (Probability of number of failures less than x)')
plt.xlabel('x (demand)')
plt.show()

i = np.arange(0.01, 1, 0.01)
plt.plot(i, poisson.isf(1-i, mu)) # cdf modified from isf, for some reason, there is no inverse CDF, but inverse survival function
plt.title('Inverse CDF')
plt.show()

With the cummulative density distribution, I can use its inverse function to map cummulative failure rate back to number of parts that failures migth happen - which is equivalent to the number of spares we would like prepare when a failure happens.

(Bonus) Spares Demand Forecast using Newsvendor Model

Newsvendor Model

The optimal spare quantiy given underage cost $C_u$ and overage cost $C_o$ is to minimize expected overall cost: To minize: $$ Expected\ Overage\ Cost + Expected\ Underage\ Cost\ \ = E(Demand<Q)\times C_o + E(Demand>Q) \times C_u \ $$

Equals to derivative = 0, equals to $$ P(Demand<Q*) = {C_u\over C_u+C_o} \ ==Q* = F^{-1}({C_u \over C_u+C_o}) $$

Results

SLAs = [0.50, 0.75, 0.95, 0.96, 0.97, 0.98, 0.99, 0.9999999, 1]
print('{:<30}{:<40}'.format('Reliability', '# Spares'))
for sla in SLAs:
    print('{:<30}{:<40}'.format(sla, poisson.isf(1-sla, mu)))

This simplified model (constant failure rate = 0.0003 per day / AFR = 203 per year) fitted in poisson distribution recommends 237 spares in a year to achieve 99% reliability or when ${C_u\over C_u+C_o} = 0.99$.

Reference

survival-analysis's People

Contributors

wanda15tw avatar hackmd-deploy avatar

Stargazers

Josephbkt avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.