
predicting_failure_hard_disks's Introduction

Predicting hard disk failure at the Delta center in the Netherlands

For centuries, the Dutch have pumped water out of lakes and the sea to build large cities on the reclaimed land. As a result, around sixty percent of the surface area of the Netherlands is below sea level or at high risk of flooding. To prevent flooding that could destroy the western part of the country, artificial beaches, sand dunes and dikes were built to absorb the force of a rising sea. However, the Dutch hydraulic system was not properly built and maintained until the 1950s. Proof of this was the most devastating flood in the history of the Netherlands, in which 1,800 people and 200,000 animals died after the dikes collapsed.

The Delta project started in 1953, twenty days after the flood. Its aim was to build a complex system of automated dikes, barriers and dams that controls the sea level and drains off excess water from the large rivers. Currently, the Netherlands has 700 km of dikes, divided into 53 dike areas. The dikes and dams are controlled by supercomputers, which monitor the status of these structures 24 hours a day. Damage to a supercomputer, for instance a failure of some of its hard disks, could have devastating effects and result in another flood.

The aim of this project is to predict the number of hard disks that fail during the first week of 2016 at the Delta center in the Netherlands. To do so, I analyze measurements of several hard disk features recorded during 2015.

This repository contains the following files:

  • Capstone_report.pdf: report that explains the full data analysis carried out to make the predictions.

  • Capstone project_proposal_CAMartinez.pdf: introduction to the problem and to the dataset used for the analysis and predictions.

  • data_reading_and_wrangling.ipynb: notebook that reads and cleans the data.

  • exploratory_data_analysis.ipynb: notebook that performs an exploratory analysis of the data.

  • statistical_analysis.ipynb: notebook that performs a statistical analysis of the data. In particular, it looks for the features whose distribution for failed disks differs from their distribution for working disks.

  • machine_learning.ipynb: notebook that uses different machine learning techniques to predict which hard drives fail at the Delta center in the Netherlands.

  • final_presentation: slide deck of the project.
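
To make the idea behind statistical_analysis.ipynb more concrete, here is a minimal sketch of one common way to check whether a feature is distributed differently for failed and working disks: a permutation test on the difference in means. This is an illustration only and not necessarily the method used in the notebook. The function name `permutation_test`, its parameters, and the sample values are all hypothetical.

```python
import random
import statistics


def permutation_test(failed, working, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns the observed difference and an approximate p-value.
    Hypothetical helper for illustration; not taken from the repository.
    """
    rng = random.Random(seed)
    observed = statistics.mean(failed) - statistics.mean(working)
    pooled = list(failed) + list(working)
    n_failed = len(failed)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the "failed" / "working" labels.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_failed]) - statistics.mean(pooled[n_failed:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up readings of a single hypothetical feature (not real data).
failed_disks = [12, 15, 14, 18, 20, 17, 16, 19]
working_disks = [3, 5, 4, 6, 2, 5, 4, 3, 6, 5]

observed, p_value = permutation_test(failed_disks, working_disks)
print(f"difference in means: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value suggests that the feature separates the two groups and may be worth keeping as a predictor. With many features, the p-values would need a correction for multiple comparisons.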


predicting_failure_hard_disks's Issues

What does the overlap probability mean?

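import numpy as np

def overlap_superiority(self, dist1, dist2):
    # Method excerpt (hence `self`): dist1 is the control sample, dist2 the treatment sample.
    control_sample = dist1
    treatment_sample = dist2
    # Threshold halfway between the two sample means.
    thresh = (control_sample.mean() + treatment_sample.mean()) / 2
    # Fraction of each sample lying on the "wrong" side of the threshold.
    control_above = np.mean(control_sample > thresh)
    treatment_below = np.mean(treatment_sample < thresh)
    overlap = control_above + treatment_below
    # Fraction of element-wise pairs in which the treatment value exceeds the control value.
    superiority = sum(x > y for x, y in zip(treatment_sample, control_sample)) / len(treatment_sample)
    return overlap, superiority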

What does this code mean?
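
One possible reading of the function quoted in this issue, offered as a sketch and not as the author's own explanation: `overlap` adds the fraction of control values above the midpoint between the two sample means to the fraction of treatment values below it. Assuming the treatment mean is the larger one, a value near 0 means the two samples are well separated, and larger values mean the distributions overlap more. `superiority` is the fraction of element-wise pairs in which the treatment value is larger than the control value. The standalone, pure-Python version below uses made-up numbers, and unlike the quoted code it divides by the number of pairs so that samples of unequal length are handled consistently.

```python
from statistics import mean


def overlap_superiority(control_sample, treatment_sample):
    """Standalone re-implementation for illustration (no numpy, no `self`)."""
    # Threshold halfway between the two sample means.
    thresh = (mean(control_sample) + mean(treatment_sample)) / 2
    # Control values that land above the threshold, as a fraction.
    control_above = sum(x > thresh for x in control_sample) / len(control_sample)
    # Treatment values that land below the threshold, as a fraction.
    treatment_below = sum(x < thresh for x in treatment_sample) / len(treatment_sample)
    overlap = control_above + treatment_below
    # Pair values by position and count how often treatment beats control.
    pairs = list(zip(treatment_sample, control_sample))
    superiority = sum(t > c for t, c in pairs) / len(pairs)
    return overlap, superiority


# Made-up samples: means are 3 and 6, so the threshold is 4.5.
control = [1, 2, 3, 4, 5]
treatment = [4, 5, 6, 7, 8]

overlap, superiority = overlap_superiority(control, treatment)
print(overlap, superiority)  # prints: 0.4 1.0
```

Here one control value (5) lies above 4.5 and one treatment value (4) lies below it, so the overlap is 0.2 + 0.2 = 0.4, while every treatment value exceeds its paired control value, so the superiority is 1.0.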
