This project dives deep into the realm of data analysis using SQL and Power BI to uncover important human resource insights that can greatly benefit the company. Featuring eye-catching dashboards offer crucial HR metrics like employee turnover, diversity, recruitment efficacy and performance evaluations. These help HR professionals make informed decisions and strategic workforce planning.
The source data contained Human Resource 22000 records from 2000 to 2020. This is included in the repository.
This was done on SQL server 2022 involving
- Data loading & inspection
- Handling missing values
- Data cleaning and analysis
Power BI Desktop In a corporate setting, results can be shared online on www.powerbi.com
- What's the age distribution in the company?
- What's the gender breakdown in the company?
- How does gender vary across departments and job titles?
- What's the race distribution in the company?
- What's the average length of employment in the company?
- Which department has the highest turnover rate?
- What is the tenure distribution for each department?
- How many employees work remotely for each department?
- What's the distribution of employees across different states?
- How are job titles distributed in the company?
- How have employee hire counts varied over time?
- There are more male employees than female or non-conforming employees.
- The genders are fairly evenly distributed across departments. There are slightly more male employees overall.
- Employees 21-30 years old are the fewest in the company. Most employees are 31-50 years old. Surprisingly, the age group 50+ have the most employees in the company.
- Caucasian employees are the majority in the company, followed by mixed race, black, Asian, Hispanic, and native Americans.
- The average length of employment is 7 years.
- Auditing has the highest turnover rate, followed by Legal, Research & Development and Training. Business Development & Marketing have the lowest turnover rates.
- Employees tend to stay with the company for 6-8 years. Tenure is quite evenly distributed across departments.
- About 25% of employees work remotely.
- Most employees are in Ohio (14,788) followed distantly by Pennsylvania (930) and Illinois (730), Indiana (572), Michigan (569), Kentucky (375) and Wisconsin (321).
- There are 182 job titles in the company, with Research Assistant II taking most of the employees (634) and Assistant Professor, Marketing Manager, Office Assistant IV, Associate Professor and VP of Training and Development taking the just 1 employee each.
- Employee hire counts have increased over the years
CREATE DATABASE hr;
- Right-click on Human_Resources > Tasks > Import Data
- Use import wizard to import HR Data.csv to hr table. Watch this at https://www.youtube.com/watch?v=4yn7iUJnJtE
- Verify that the import worked:
use hr;
SELECT *
FROM hr_data;
The termdate was imported as nvarchar(50). This column contains termination dates, hence it needs to be converted to the date format.
Update termdate date/time to date
-
- convert dates to yyyy-MM-dd
-
- create new column new_termdate
-
- copy converted time values from termdate to new_termdate
-
convert dates to yyyy-MM-dd
UPDATE hr_data
SET termdate = FORMAT(CONVERT(DATETIME, LEFT(termdate, 19), 120), 'yyyy-MM-dd');
- create new column new_termdate
ALTER TABLE hr_data
ADD new_termdate DATE;
- copy converted time values from termdate to new_termdate
UPDATE hr_data
SET new_termdate = CASE
WHEN termdate IS NOT NULL AND ISDATE(termdate) = 1 THEN CAST(termdate AS DATETIME) ELSE NULL
END;
- check results
SELECT new_termdate
FROM hr_data;
ALTER TABLE hr_data
ADD age nvarchar(50)
UPDATE hr_data
SET age = DATEDIFF(YEAR, birthdate, GETDATE());
- age distribution
SELECT
MIN(age) AS youngest,
MAX(age) AS OLDEST
FROM hr_data;
- age group count
SELECT age_group,
count(*) AS count
FROM
(SELECT
CASE
WHEN age <= 21 AND age <= 30 THEN '21 to 30'
WHEN age <= 31 AND age <= 40 THEN '31 to 40'
WHEN age <= 41 AND age <= 50 THEN '41 to 50'
ELSE '50+'
END AS age_group
FROM hr_data
WHERE new_termdate IS NULL
) AS subquery
GROUP BY age_group
ORDER BY age_group;
- Age group by gender
SELECT age_group,
gender,
count(*) AS count
FROM
(SELECT
CASE
WHEN age <= 21 AND age <= 30 THEN '21 to 30'
WHEN age <= 31 AND age <= 40 THEN '31 to 40'
WHEN age <= 41 AND age <= 50 THEN '41 to 50'
ELSE '50+'
END AS age_group,
gender
FROM hr_data
WHERE new_termdate IS NULL
) AS subquery
GROUP BY age_group, gender
ORDER BY age_group, gender;
SELECT
gender,
COUNT(gender) AS count
FROM hr_data
WHERE new_termdate IS NULL
GROUP BY gender
ORDER BY gender ASC;
SELECT
department,
gender,
count(gender) AS count
FROM hr_data
WHERE new_termdate IS NULL
GROUP BY department, gender,
ORDER BY department, gender ASC;
- job titles
SELECT
department, jobtitle,
gender,
count(gender) AS count
FROM hr_data
WHERE new_termdate IS NULL
GROUP BY department, jobtitle, gender
ORDER BY department, jobtitle, gender ASC;
SELECT
race,
count(*) AS count
FROM
hr_data
WHERE new_termdate IS NULL
GROUP BY race
ORDER BY count DESC;
SELECT
AVG(DATEDIFF(year, hire_date, new_termdate)) AS tenure
FROM hr_data
WHERE new_termdate IS NOT NULL AND new_termdate <= GETDATE();
- get total count
- get terminated count
- terminated count/total count
SELECT
department,
total_count,
terminated_count,
(round((CAST(terminated_count AS FLOAT)/total_count), 2)) * 100 AS turnover_rate
FROM
(SELECT
department,
count(*) AS total_count,
SUM(CASE
WHEN new_termdate IS NOT NULL AND new_termdate <= GETDATE() THEN 1 ELSE 0
END
) AS terminated_count
FROM hr_data
GROUP BY department
) AS subquery
ORDER BY turnover_rate DESC;
SELECT
department,
AVG(DATEDIFF(year, hire_date, new_termdate)) AS tenure
FROM
hr_data
WHERE
new_termdate IS NOT NULL
AND new_termdate <= GETDATE()
GROUP BY
department;
SELECT
location,
count(*) as count
FROM hr_data
WHERE new_termdate IS NULL
GROUP BY location;
SELECT
location_state,
count(*) AS count
FROM hr_data
WHERE new_termdate IS NULL
GROUP BY location_state
ORDER BY count DESC;
SELECT
jobtitle,
count(*) AS count
FROM hr_data
WHERE new_termdate IS NULL
GROUP BY jobtitle
ORDER BY count DESC;
- calculate hires
- calculate terminations
- (hires-terminations)/hires percent hire change
SELECT
hire_year,
hires,
terminations,
hires - terminations AS net_change,
(round(CAST(hires-terminations AS FLOAT)/hires, 2)) * 100 AS percent_hire_change
FROM
(SELECT
YEAR(hire_date) AS hire_year,
count(*) AS hires,
SUM(CASE
WHEN new_termdate is not null and new_termdate <= GETDATE() THEN 1 ELSE 0
END
) AS terminations
FROM hr_data
GROUP BY YEAR(hire_date)
) AS subquery
ORDER BY percent_hire_change ASC;