employability

A study of postsecondary graduate employability using topic modeling.

Project Overview

This project investigates how much the core academic skills taught at the postsecondary level overlap with the skills expected of entry-level workers. In short, it asks how well universities are preparing students for the workforce: to what degree are they promoting graduate employability?

This repository defines an open-source software tool to perform that analysis. It relies on topic modeling, a method from machine learning and natural language processing, to infer concepts from large datasets of job postings and course descriptions.

Installation

Dependencies
Import data
docker-compose up -d elasticsearch
./elasticsearch/bin/import-data small  # {small|medium|large}
Start the server
docker-compose up -d web

You should be able to access the visualization server at localhost:9000.
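
Containers started with `docker-compose up -d` return before the services inside them are ready, so a script that chains the steps above may need to wait. The following is a minimal sketch of a retry helper; `wait_for` is a hypothetical name and is not part of this repository.

```shell
# Hypothetical helper (not part of this repository).
# Usage: wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Retries COMMAND until it succeeds or MAX_ATTEMPTS is reached.
wait_for() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    # Success: stop retrying immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Give up once the attempt budget is spent.
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, after `docker-compose up -d web`, running `wait_for 30 2 curl -fsS http://localhost:9000` would poll the server for up to about a minute. The attempt count and delay are illustrative values, not tuned recommendations.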

Data Ingestion

If you would like to run the whole ingestion and analysis process, there are a few more steps.

data.world API token

Register for an account at data.world and export your API token as an environment variable.

export DATA_WORLD_API_TOKEN=
Start up background services
docker-compose up -d elasticsearch kibana postgres
Run data pipelines
sbt ingest/run
sbt preprocess/run
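
The pipeline steps above depend on the API token and on running in order. As a rough sketch, they could be wrapped so that a missing token or a failed stage stops the run early. The function names `require_env` and `run_pipelines` are hypothetical and do not exist in this repository; only the `sbt` tasks and the `DATA_WORLD_API_TOKEN` variable come from the steps above.

```shell
# Hypothetical helper: fail unless the named environment variable is non-empty.
require_env() {
  name="$1"
  # Indirect lookup of the variable whose name is stored in $name.
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "error: $name is not set" >&2
    return 1
  fi
}

# Hypothetical wrapper: run the documented stages in order,
# stopping at the first failure.
run_pipelines() {
  require_env DATA_WORLD_API_TOKEN || return 1
  sbt ingest/run || return 1
  sbt preprocess/run
}
```

Calling `run_pipelines` assumes the background services are already up.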

Executing LDA

With default parameters
sbt analysis/run

Configuring LDA

You can modify the behavior of LDA through environment variables. Three predefined configurations are provided.

# Source one of these before running analysis/run.
source ./analysis/config/small
source ./analysis/config/medium
source ./analysis/config/large
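
When the configuration is chosen by a script rather than by hand, it may help to validate the name before sourcing anything. This is a sketch only: `lda_config_path` is a hypothetical helper, and it assumes the three file names listed above are the only valid choices.

```shell
# Hypothetical helper: map a configuration name to its file path,
# rejecting anything other than the three documented names.
lda_config_path() {
  case "$1" in
    small|medium|large)
      printf '%s\n' "./analysis/config/$1"
      ;;
    *)
      echo "unknown configuration: $1 (expected small, medium, or large)" >&2
      return 1
      ;;
  esac
}
```

A run could then look like `. "$(lda_config_path medium)" && sbt analysis/run`, executed from the repository root.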

Exporting data

./elasticsearch/bin/export-data NEW_SNAPSHOT_ID

This does the following:

  • create a local snapshot repository in your Elasticsearch cluster
    • this lives on your local filesystem: ./data/elasticsearch-snapshots/local/
  • create a new snapshot NEW_SNAPSHOT_ID in the local repository
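
To confirm that an export succeeded, the snapshot can be looked up through Elasticsearch's snapshot REST API. The sketch below makes two assumptions that the text above does not confirm: that Elasticsearch is reachable on its default HTTP port 9200, and that the repository is registered under the name `local` (inferred from the directory path). The `ES_URL` variable and both function names are hypothetical.

```shell
# Assumption: Elasticsearch listens on its default HTTP port.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: build the URL that identifies one snapshot in a repository.
snapshot_url() {
  repo="$1"
  snapshot="$2"
  printf '%s/_snapshot/%s/%s\n' "$ES_URL" "$repo" "$snapshot"
}

# Hypothetical helper: fetch the snapshot's metadata.
# Requires curl and a running cluster.
show_snapshot() {
  curl -fsS "$(snapshot_url "$1" "$2")"
}
```

For example, `show_snapshot local NEW_SNAPSHOT_ID` would print the snapshot's metadata as JSON if the repository name assumption holds.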

Project Modules

core

Core components, models, and glue code.

"net.rouly" % "employability-core" % "x.x.x"

elasticsearch

Elasticsearch read/write services. Interaction is defined using Reactive Streams.

"net.rouly" % "employability-elasticsearch" % "x.x.x"

postgres

Postgres read/write services. Interaction is defined using Reactive Streams.

"net.rouly" % "employability-postgres" % "x.x.x"

ingest

Entry point application to ingest raw data into Elasticsearch.

Raw data is accepted from the following data providers:

  • data.world: add dataset definitions under resources/datasets/data.world/

"net.rouly" % "employability-ingest" % "x.x.x"

preprocess

Entry point application to preprocess and clean ingested data. Cleaned and prepared data is exported to Postgres.

"net.rouly" % "employability-preprocess" % "x.x.x"

analysis

Entry point application to read processed data from Postgres and execute the primary topic modeling steps. Topics are output to Elasticsearch.

"net.rouly" % "employability-analysis" % "x.x.x"

web

User-facing web application to explore the generated topics and render various statistics about them.

"net.rouly" % "employability-web" % "x.x.x"
