job_recomendation_kg's Introduction

Job Recommendation System Using Knowledge Graph

Faced with the enormous amount of recruiting information on the Internet, a job seeker often spends hours finding the useful postings. To reduce this laborious work, we designed and implemented a recommendation system for online job hunting. Instead of collaborative filtering (CF) algorithms, we concentrate on a knowledge-graph-based recommender system (RS) approach to uncover more interrelations between candidates and job descriptions.

Glimpse of the Knowledge Graph

Architectural Overview

Open in Whimsical for a better viewing experience

Dataset

Resume Dataset

The resume dataset was published by Stack Overflow on Kaggle in 2018. Stack Overflow ran a survey asking the developer community about everything from their favorite technologies to their job preferences.

There are 98,855 responses in this public data release.

Job Description Dataset

The job description dataset was created by PromptCloud's in-house web-crawling service. It is a pre-crawled subset of a larger dataset (more than 4.6 million job listings) extracted in 2017 from Dice, a prominent US-based technology job board.

There are 22,000 job profiles in this public data release.

Testing and verifying results

Adamic Adar

Adamic Adar is a measure used to compute the closeness of nodes based on their shared neighbors.
The Adamic Adar algorithm was introduced in 2003 by Lada Adamic and Eytan Adar to predict links in a social network. It is computed using the following formula:

$$A(x, y) = \sum_{u \in N(x) \cap N(y)} \frac{1}{\log |N(u)|}$$

where N(u) is the set of nodes adjacent to u.

A value of 0 indicates that two nodes are not close, while higher values indicate nodes are closer.
Neo4j's graph algorithms library contains a function to calculate this closeness between two nodes.
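To make the score concrete, here is a minimal pure-Python sketch of Adamic Adar on a toy graph. It is not the project's Neo4j code: the node names (`resume_1`, `job_1`, and so on), the helper names `build_adjacency` and `adamic_adar`, and the use of the natural logarithm are assumptions made for illustration.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def adamic_adar(adj, x, y):
    """Sum 1 / log|N(u)| over every neighbor u shared by distinct nodes x and y."""
    shared = adj.get(x, set()) & adj.get(y, set())
    # A neighbor shared by two distinct nodes has degree >= 2, so log() > 0.
    return sum(1.0 / math.log(len(adj[u])) for u in shared)


# Hypothetical toy graph: resumes and jobs linked through skill nodes.
edges = [
    ("resume_1", "python"), ("resume_1", "sql"),
    ("job_1", "python"), ("job_1", "sql"),
    ("job_2", "sql"),
    ("job_3", "java"),
]
adj = build_adjacency(edges)

score_job_1 = adamic_adar(adj, "resume_1", "job_1")  # shares python and sql
score_job_2 = adamic_adar(adj, "resume_1", "job_2")  # shares sql only
score_job_3 = adamic_adar(adj, "resume_1", "job_3")  # shares nothing
```

In this toy graph `job_1` scores highest for `resume_1`, and the shared `python` node contributes more than `sql` because fewer nodes connect to it. Rarer shared neighbors count for more.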

Future Scope

Extending KG to more dimensions like location, salary
Using unstructured dataset
Native language support

FAQs

1. What is the problem statement?

We leverage a knowledge-graph-based recommendation system that helps candidates find jobs matching their skill sets.

2. What are the general tasks?

We analysed various aspects that help to recommend jobs and job descriptions based on location, age group, etc. Future: build homogeneous graphs such as resume-skills, resume-location, and resume-dev_type (backend/frontend), then take the most popular nodes and build a heterogeneous knowledge graph.
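The planned step of merging per-relation graphs into one heterogeneous graph could look roughly like the sketch below. Everything in it is illustrative: the sample edges, the relation labels `HAS_SKILL` and `HAS_DEV_TYPE`, the `top_k` cutoff, and the reading of "most popular" as "linked to by the most resumes" are assumptions, not details from the project.

```python
from collections import Counter

# Hypothetical per-relation edge lists, each as (resume, target) pairs.
resume_skills = [
    ("r1", "python"), ("r2", "python"), ("r3", "sql"),
    ("r1", "sql"), ("r2", "go"),
]
resume_dev_type = [("r1", "backend"), ("r2", "backend"), ("r3", "frontend")]


def most_popular(edges, top_k):
    """Return the top_k target nodes ranked by how many resumes link to them."""
    counts = Counter(target for _, target in edges)
    return {node for node, _ in counts.most_common(top_k)}


def build_heterogeneous(relations, top_k):
    """Merge relation-specific edge lists into typed (source, relation, target)
    triples, keeping only edges that point at a popular node."""
    triples = []
    for relation, edges in relations.items():
        popular = most_popular(edges, top_k)
        triples.extend((src, relation, dst) for src, dst in edges if dst in popular)
    return triples


kg = build_heterogeneous(
    {"HAS_SKILL": resume_skills, "HAS_DEV_TYPE": resume_dev_type},
    top_k=2,
)
```

With `top_k=2`, the rarely used `go` skill is dropped while `python` and `sql` survive, so the merged graph keeps only the densest part of each relation.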

3. Why a knowledge graph?

A knowledge graph is self-descriptive: it provides a single place to find the data and understand what it is all about. Knowledge graphs are used for a wide range of applications, from space, journalism, and biomedicine to entertainment, network security, and pharmaceuticals.

4. Why Neo4j?

Neo4j delivers the lightning-fast read and write performance you need, while still protecting your data integrity. Neo4j graph algorithms are scalable and production-ready; they are written in Java and performance tested. NetworkX, by contrast, is a single-node graph implementation written in Python, and response times are much faster in Neo4j.

job_recomendation_kg's Issues

final minor II eval notes

https://github.com/arszen123/offer-notification-application
https://betterprogramming.pub/building-an-offer-notification-service-on-aws-99faad5d2806

https://github.com/allen-tran/drop-it

https://github.com/Vennify-Inc/DoogleGrive

age gender skills salary

Age
0-10 m,f,o (popular skills) salary
11-20
21-30

What had we done by the mid evaluation?

Resume preprocessing
- Binning of salaries; merging the framework, language, and database columns
- Omitting NA values
Built KG on Neo4j
- Nodes and relationships between id, domain, age & gender
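The preprocessing steps above can be sketched in plain Python as follows. The column names, the salary bin edges and labels, and the `;`-separated multi-value format are assumptions for illustration; the project's real schema and thresholds are not given in these notes.

```python
# Hypothetical raw survey rows; None stands for an NA value.
rows = [
    {"id": 1, "salary": 45000, "language": "Python;SQL",
     "framework": "Django", "database": "MySQL"},
    {"id": 2, "salary": None, "language": "Java",
     "framework": "Spring", "database": "Oracle"},
    {"id": 3, "salary": 120000, "language": "Go",
     "framework": "Gin", "database": "PostgreSQL"},
]

# Assumed bin edges: [lower, upper) mapped to a label.
SALARY_BINS = [
    (0, 50000, "low"),
    (50000, 100000, "medium"),
    (100000, float("inf"), "high"),
]


def salary_bin(salary):
    """Map a numeric salary to the label of the bin that contains it."""
    for lower, upper, label in SALARY_BINS:
        if lower <= salary < upper:
            return label
    raise ValueError(f"salary out of range: {salary}")


def preprocess(rows):
    """Drop rows with NA values, bin salaries, and merge skill columns."""
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # omit rows that contain an NA value
        skills = []
        for column in ("language", "framework", "database"):
            skills.extend(part.strip() for part in row[column].split(";"))
        cleaned.append({
            "id": row["id"],
            "salary_bin": salary_bin(row["salary"]),
            "skills": skills,
        })
    return cleaned


cleaned = preprocess(rows)
```

Here row 2 is dropped because its salary is missing, and each remaining row ends up with one salary label and one merged skill list, which maps naturally onto nodes and relationships in the graph.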

What did we do after that?
Divided into 3 parts
- Analysis
  - Skill distribution across age and gender
  - Domain distribution across age and gender
  - Slightly more women in the 25-35 age group
  - Slight salary increase with more experience
  - Association rule mining using FP-Growth on skill and domain
  - Popular skill, domain
  - Rare skill and domain
  - Heat map between skill and domain
- Building Knowledge Graph
  - For resume
  - For job description
- Recommendation System
  - Manipulation scripts
    - Add and delete relationships and nodes
  - Basic scripts
    - Finding co resumes
    - Get job IDs that ask for SQL skills
    - List all skills a resume has
  - Graph info
    - Job ID and resume ID node counts
  - Recommendation
    - Using skills in between
    - Using skills in between & prioritizing
    - Using empty relationship & prioritizing
    - Using skill and domain & prioritizing them
    - Adamic Adar verification
  - Analytics
    - Resume having max skills count
  - Link prediction
    - Predict link between JD and resume
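As a rough illustration of recommending through the skills that sit between a resume and a job, the sketch below ranks jobs by their shared skills. It treats "prioritizing" as giving some skills a higher weight, which is only one possible reading of these notes; the data, the `skill_weight` values, and the `recommend` helper are hypothetical and do not reproduce the project's Neo4j queries.

```python
# Hypothetical skills for one resume and for three job descriptions.
resume_skills = {"python", "sql", "docker"}
job_skills = {
    "job_1": {"python", "sql", "aws"},
    "job_2": {"java", "sql"},
    "job_3": {"rust"},
}

# Assumed priority weights; skills not listed default to 1.0.
skill_weight = {"python": 2.0}


def recommend(resume_skills, job_skills, skill_weight=None, top_n=5):
    """Rank jobs by the weighted number of skills shared with the resume."""
    skill_weight = skill_weight or {}
    scored = []
    for job_id, required in job_skills.items():
        shared = resume_skills & required
        if not shared:
            continue  # no skill node connects this resume and job
        score = sum(skill_weight.get(skill, 1.0) for skill in shared)
        scored.append((job_id, score))
    # Highest score first; ties broken by job ID for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


ranking = recommend(resume_skills, job_skills, skill_weight)
```

Jobs with no shared skill never appear in the ranking, which mirrors the idea that a recommendation needs at least one path through a skill node.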

Future Scope?
- Extending KG to more dimensions like location, salary
- Using unstructured dataset
- Native language support

We are also thinking of writing a research paper for IC3, so this analysis
would be helpful there as well.

Resumes -> 19K -> Kaggle Stack Overflow survey
JD -> 22K dataworld 2017 -> dice job board of US 2017

Notes

1. GitHub repo
2. Project Synopsis
3. Graphical Analysis/mining
4. Presentation (10 min slides)

Mid-eval

Data Cleaning
1.1 Remove or replace null values in the following columns:
Age, salary, Married/Unmarried, Gender, skills
=> Build a credibility score out of 10

++ How gender, skills, dependency, dev_type, and salary relate to each other -> graphical representation
2.1 Highest-paid skills & highest salary grouped by skill
2.2 Highest-paid dev_type
2.3 Age with salary
2.4 Find salary-wise job_satisfaction grouped by skills

For Final Eval

Build homogeneous graphs such as resume-skills, resume-location, and resume-dev_type (backend/frontend),
then take the most popular nodes and build a heterogeneous knowledge graph

mystic-trooper / job_recomendation_kg