Capstone_Project-IBM_HR_Analytics

Introduction

Attrition, in Human Resources terminology, refers to employees leaving the company. It is usually measured with a metric called the attrition rate: the number of employees who leave the company over a given period (whether by voluntarily resigning or by being laid off), typically expressed as a percentage of the average headcount.
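
As a worked illustration of the metric, here is a minimal sketch. It assumes one common definition: leavers during a period divided by average headcount, times 100. Organizations define the rate differently, so treat this formula as an assumption. The second helper is hypothetical. It assumes a snapshot dataset with an `Attrition` column holding `"Yes"`/`"No"` labels, and reports the share of employees labeled as having left.

```python
def attrition_rate(leavers: int, average_headcount: float) -> float:
    """Attrition rate in percent, assuming leavers / average headcount * 100."""
    if average_headcount <= 0:
        raise ValueError("average_headcount must be positive")
    return 100.0 * leavers / average_headcount


def attrition_rate_from_labels(labels) -> float:
    """Percent of 'Yes' labels in an (assumed) Yes/No Attrition column."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    return 100.0 * sum(1 for v in labels if v == "Yes") / len(labels)
```

For example, `attrition_rate(12, 240)` returns `5.0`: 12 leavers against an average headcount of 240.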

In this project, we want to predict attrition among the company's valuable employees and uncover the factors that lead to it.

Project Proposal

The project proposal describes the problem and the approach to solving it. You can click here to reach it.

Data

I used the IBM HR Analytics Employee Attrition & Performance dataset from Kaggle, which was created by IBM data scientists. The dataset is publicly available and can be reached from this link. It has 1,470 rows and 35 columns, with both numeric and categorical columns. I downloaded the dataset in CSV format from this link and read it into a Jupyter notebook after importing the necessary libraries.
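
The loading step can be sketched with pandas. This is a minimal sketch, not the notebook's actual code. The helper names `load_hr_data` and `split_columns` are hypothetical, and `path` is wherever the Kaggle CSV was saved locally. The shape check only reflects the 1,470 x 35 size stated above. `split_columns` separates numeric from categorical columns by pandas dtype, which assumes the CSV's text columns load as `object` dtype.

```python
import warnings

import pandas as pd


def load_hr_data(path: str) -> pd.DataFrame:
    """Read the attrition CSV from a local path (hypothetical helper)."""
    df = pd.read_csv(path)
    # The dataset is described as 1,470 rows x 35 columns; warn if it differs.
    if df.shape != (1470, 35):
        warnings.warn(f"Unexpected shape {df.shape}; expected (1470, 35)")
    return df


def split_columns(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (numeric, categorical) column names based on pandas dtypes."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical
```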

Data Wrangling

The data wrangling notebook and report can be reached from this link.

Exploratory Data Analysis

Univariate and bivariate analyses were done using bar charts and heatmap visualizations. These analyses can be found here.
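
The tables behind such charts can be computed directly with pandas. This is a hedged sketch, not the project's own analysis. The helper names and the column names in the usage (`OverTime`, `Attrition`) are assumptions for illustration. A row-normalized crosstab gives the attrition share per category, which is what a stacked bar chart shows. A correlation matrix of the numeric columns is what a heatmap typically displays.

```python
import pandas as pd


def attrition_by_category(df: pd.DataFrame, column: str, target: str = "Attrition") -> pd.DataFrame:
    """Share of each target value within each category (rows sum to 1)."""
    return pd.crosstab(df[column], df[target], normalize="index")


def numeric_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix of the numeric columns, as used for a heatmap."""
    return df.select_dtypes(include="number").corr()
```

Passing the results to `DataFrame.plot.bar(stacked=True)` or `seaborn.heatmap` would draw the corresponding bar chart and heatmap. For example: `attrition_by_category(df, "OverTime").plot.bar(stacked=True)`.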

Hypothesis Test

The inferential statistics notebook can be reached here.

Interim report

The interim report can be found here.

Modeling

This is a supervised binary classification problem. We used Python's scikit-learn library to solve it, implementing Logistic Regression, Decision Tree, Random Forest, K-NN, SVM, Kernel SVM, Naive Bayes, Gradient Boosting, and AdaBoost algorithms. We also applied hyperparameter tuning, PCA, and SMOTE to these algorithms.
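
Comparing the nine algorithms can be sketched with scikit-learn's cross-validation utilities. This is a minimal sketch under stated assumptions, not the project's notebooks. `build_models` and `compare_models` are hypothetical names. "SVM" is assumed to be a linear-kernel `SVC` and "Kernel SVM" an RBF `SVC`. F1 is assumed as the metric because leavers are the minority class. `make_classification` data stands in for the encoded HR features; the 84/16 class split is also an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed: int = 0) -> dict:
    """One default-configured estimator per algorithm named in the text."""
    return {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "K-NN": KNeighborsClassifier(),
        "SVM": SVC(kernel="linear"),      # assumption: linear-kernel SVM
        "Kernel SVM": SVC(kernel="rbf"),  # assumption: RBF kernel
        "Naive Bayes": GaussianNB(),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
    }


def compare_models(X, y, cv_splits: int = 5, seed: int = 0) -> dict:
    """Mean stratified cross-validated F1 score for each model."""
    cv = StratifiedKFold(n_splits=cv_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, model in build_models(seed).items():
        # Scaling matters for distance- and margin-based models; trees ignore it.
        pipe = make_pipeline(StandardScaler(), model)
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="f1").mean()
    return scores


if __name__ == "__main__":
    # Synthetic stand-in sized like the dataset, with ~16% positives (assumed).
    X, y = make_classification(n_samples=1470, n_features=20, weights=[0.84], random_state=0)
    for name, score in sorted(compare_models(X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} F1 = {score:.3f}")
```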

The modeling notebooks can be reached from the following link.
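
SMOTE and PCA-based hyperparameter tuning can be sketched as follows. In Python, SMOTE usually comes from imbalanced-learn's `SMOTE` class, not scikit-learn itself. To stay scikit-learn-only, this sketch reimplements SMOTE's core step by hand. For each synthetic row, it picks a minority sample and one of its k nearest minority neighbours, then interpolates a random point between them. PCA is then tuned inside a `Pipeline` with `GridSearchCV`. The helper names, the choice of logistic regression as the tuned model, and the grid values are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def smote_oversample(X, y, minority=1, k=5, seed=0):
    """Minimal SMOTE sketch (binary case): add minority rows until classes balance."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples")
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    # Ask for k + 1 neighbours: column 0 is normally the point itself.
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = idx[base, rng.integers(1, idx.shape[1], size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[neigh] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)])


def tune_logistic_with_pca(X_train, y_train, seed=0):
    """Grid search over PCA variance kept and regularization strength C."""
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(random_state=seed)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    # Fractional n_components keeps enough components for that share of variance.
    grid = {"pca__n_components": [0.8, 0.9, 0.95], "clf__C": [0.1, 1.0, 10.0]}
    return GridSearchCV(pipe, grid, scoring="f1", cv=5).fit(X_train, y_train)


if __name__ == "__main__":
    X, y = make_classification(n_samples=1470, n_features=20, weights=[0.84], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
    X_bal, y_bal = smote_oversample(X_tr, y_tr)  # oversample the training split only
    search = tune_logistic_with_pca(X_bal, y_bal)
    print(search.best_params_)
    print("held-out F1:", f1_score(y_te, search.predict(X_te)))
```

Oversampling happens here before `GridSearchCV` splits the data, so synthetic rows can leak across folds. The cross-validation scores are therefore optimistic, and only the untouched held-out split gives a fair estimate. imbalanced-learn's own `Pipeline` avoids this by applying SMOTE inside each training fold.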

Conclusion

The final report can be found here.

The project presentation can be found here.