Attrition, in human resources terminology, refers to employees leaving a company. It is usually measured with a metric called the attrition rate, which measures the number of employees who leave the company, whether through voluntary resignation or layoff.
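As a rough illustration of the metric, the sketch below computes an attrition rate for one period. The formula (separations divided by average headcount) is one common convention and is an assumption here, as are the function name `attrition_rate` and the example numbers; the project itself does not specify a formula.

```python
def attrition_rate(separations: int, headcount_start: int, headcount_end: int) -> float:
    """Return attrition as a percentage of the average headcount over a period.

    Assumed convention: separations / average headcount * 100.
    """
    average_headcount = (headcount_start + headcount_end) / 2
    if average_headcount <= 0:
        raise ValueError("average headcount must be positive")
    return 100.0 * separations / average_headcount


# Hypothetical numbers: 30 employees left, headcount went from 210 to 190.
rate = attrition_rate(separations=30, headcount_start=210, headcount_end=190)
print(f"{rate:.1f}%")  # prints 15.0%
```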
In this project, we want to predict the attrition of the company’s valuable employees and uncover the factors that lead to it.
The problem and the approach to solving it are described in the project proposal. You can click here to reach it.
I use the IBM HR Analytics Employee Attrition & Performance dataset from Kaggle, which was created by IBM data scientists. The dataset is openly available and can be reached from this link. It has 1470 rows and 35 columns, containing both numeric and categorical data. I downloaded the dataset from this link in CSV format and read it into a Jupyter notebook after importing the necessary libraries.
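The loading step might look roughly like the sketch below, assuming pandas is used. The helper name `load_attrition_data` and the derived `AttritionFlag` column are hypothetical, and a small in-memory CSV with illustrative rows stands in for the downloaded file so the example runs on its own.

```python
import io

import pandas as pd


def load_attrition_data(source):
    """Read the CSV and add a 0/1 target derived from the 'Attrition' column.

    `source` can be a file path or any file-like object accepted by pandas.
    """
    df = pd.read_csv(source)
    # Assumption: the target column is named 'Attrition' with values Yes/No.
    df["AttritionFlag"] = (df["Attrition"] == "Yes").astype(int)
    return df


# Illustrative stand-in for the real file (the actual data has 1470 rows).
sample_csv = io.StringIO(
    "Age,Attrition,Department,MonthlyIncome\n"
    "41,Yes,Sales,5993\n"
    "49,No,Research & Development,5130\n"
    "37,Yes,Research & Development,2090\n"
)

df = load_attrition_data(sample_csv)
print(df.shape)   # rows x columns
print(df.dtypes)  # numeric vs. categorical (object) columns
```

With the real file, the same function would be called with the path to the downloaded CSV instead of `sample_csv`.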
The data wrangling notebook and report can be reached from this link.
Univariate and bivariate analyses were done using bar charts and heatmap visualizations. These analyses can be found here.
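The tables behind such charts can be sketched with pandas alone. The example below is only an illustration on a tiny made-up frame: the rows are invented, and the choice of `OverTime`, `Age`, and `MonthlyIncome` as example columns is an assumption rather than a summary of the linked notebook.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No"],
    "OverTime": ["Yes", "No", "No", "Yes", "Yes", "No"],
    "Age": [28, 45, 39, 31, 50, 36],
    "MonthlyIncome": [2500, 9000, 6200, 3100, 11000, 5400],
})

# Univariate: class counts, the data behind a simple bar chart.
counts = df["Attrition"].value_counts()

# Bivariate: share of attrition within each OverTime group.
rates = pd.crosstab(df["OverTime"], df["Attrition"], normalize="index")

# Correlation matrix of numeric columns, the data behind a heatmap.
corr = df[["Age", "MonthlyIncome"]].corr()

print(counts)
print(rates)
print(corr)
```

Passing `counts` to a bar plot and `corr` to a heatmap function in a plotting library would produce the corresponding visualizations.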
The inferential statistics notebook can be reached here.
The interim report can be found here.
This is a supervised binary classification problem. We used Python’s scikit-learn library to solve it. We implemented Logistic Regression, Decision Tree, Random Forest, K-NN, SVM, Kernel SVM, Naive Bayes, Gradient Boosting, and AdaBoost models, and applied hyperparameter tuning, PCA, and SMOTE to them.
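A minimal sketch of one such setup is shown below: scaling, PCA, and Logistic Regression in a scikit-learn pipeline, tuned with a grid search. It is not the project's actual code. The data is synthetic (`make_classification` with an assumed imbalanced class ratio), the parameter grid is hypothetical, and `class_weight="balanced"` is used as a simple stand-in for SMOTE, which lives in the separate imbalanced-learn package and is left out here to keep the example dependent on scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the encoded HR features (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("scale", StandardScaler()),  # PCA is sensitive to feature scale
    ("pca", PCA()),
    # class_weight="balanced" reweights classes; it is not SMOTE.
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

# Hypothetical grid: number of PCA components and regularization strength.
param_grid = {
    "pca__n_components": [5, 10],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

test_score = search.score(X_test, y_test)  # F1 on the held-out split
print(search.best_params_)
print(round(test_score, 3))
```

Scoring on F1 rather than accuracy is a deliberate choice for imbalanced classes, where always predicting the majority class would already give a high accuracy. Swapping the `clf` step for another estimator (for example a Random Forest) only requires changing that step and its grid entries.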
The modeling notebooks can be reached from the following link.
The final report can be found here.
The project presentation can be found here.