oyebamiji-micheal / youth-income-prediction-challenge-api

A machine learning web app and API for predicting youth employment based on data from labour market surveys in South Africa

Home Page: https://oyebamiji-micheal-youth-income-prediction-challenge.streamlit.app/

License: MIT License


Youth Income Prediction Challenge API

A machine learning web app and API for predicting youth income based on data from labour market surveys in South Africa

You can view the live demo of the web app at https://oyebamiji-micheal-youth-income-prediction-challenge.streamlit.app/

You can interact with the API here

Overview and Objective

Until now, I have always deployed my models with Streamlit for easier interaction, testing and sharing. This project and subsequent ones aim to go beyond model development in Jupyter notebooks and web apps by also exposing the model through an API built with FastAPI. This project in particular also explores several hyperparameter tuning techniques to optimize the performance of the machine learning model.

Data

The dataset used in this repository comes from a competition on Zindi. The data is drawn from four rounds of a survey of youth in the South African labour market, conducted at 6-month intervals. The survey contains numerical, categorical and free-form text responses. Each person in the dataset was surveyed one year before the follow-up survey (the ‘baseline’ data). The objective of the challenge is to build a machine learning model that predicts whether a person is employed at the follow-up survey, based on their labour market status and other characteristics at baseline.

Insights from EDA

The importance of EDA before model building cannot be overemphasized. EDA gives a clearer picture of how the data is distributed, including class imbalance, outliers, correlations and so on. Below are some of the insights gained from a light EDA:

  • [Figure: proportion of people with a positive outcome versus a negative outcome]

  • The ages of candidates with a positive outcome and those with a negative outcome seem to follow a similar distribution.

  • People from "Urban" areas are the most likely to get a positive outcome.
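As a rough illustration of how insights like the ones above can be computed, the sketch below derives the overall class proportions and the positive-outcome rate per geography. The rows are made up for the example and are not taken from the Zindi dataset; the column name `target` and the helper names are also assumptions.

```python
from collections import Counter, defaultdict

# Made-up rows for illustration only; they are NOT taken from the Zindi dataset.
# "target" is a hypothetical column name: 1 = positive outcome, 0 = otherwise.
rows = [
    {"geography": "Urban", "target": 1},
    {"geography": "Urban", "target": 0},
    {"geography": "Urban", "target": 1},
    {"geography": "Rural", "target": 0},
    {"geography": "Rural", "target": 0},
    {"geography": "Suburb", "target": 1},
    {"geography": "Suburb", "target": 0},
    {"geography": "Rural", "target": 1},
]


def class_proportions(records, label="target"):
    """Share of each class label, useful for spotting class imbalance."""
    counts = Counter(r[label] for r in records)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def positive_rate_by(records, key, label="target"):
    """Fraction of positive outcomes within each group of `key`."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        positives[r[key]] += r[label]
    return {group: positives[group] / totals[group] for group in totals}


overall = class_proportions(rows)
by_geo = positive_rate_by(rows, "geography")
print(overall)
print(by_geo)
```

On the real data, the same two summaries would show whether the classes are imbalanced and how the outcome rate differs between "Urban", "Rural" and "Suburb".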

Model and Evaluation Metric

For the sake of simplicity, only one type of classification model (LightGBM Classifier) was used in the notebook. The hyperparameter tuning techniques used are GridSearchCV and RandomizedSearchCV. In subsequent models, I hope to explore Bayesian Optimization with Gaussian Processes. The performance of the base model and of the tuned ones can be found in the notebook.
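To make the difference between the two search strategies concrete, here is a library-free sketch: grid search scores every combination in the search space, while random search scores only a fixed number of sampled combinations. The parameter names mirror common LightGBM settings, but the values, the `toy_score` function and the helper names are assumptions made for illustration; the notebook itself relies on scikit-learn's search classes with cross-validation rather than this code.

```python
import itertools
import random

# Hypothetical search space; the names mirror common LightGBM parameters,
# but the values are illustrative and not taken from the notebook.
param_grid = {
    "num_leaves": [15, 31, 63],
    "learning_rate": [0.01, 0.05, 0.1],
    "n_estimators": [100, 300],
}


def toy_score(params):
    """Stand-in for a cross-validated score; a real run would train a model."""
    return (
        -abs(params["num_leaves"] - 31) / 100
        - abs(params["learning_rate"] - 0.05)
        - abs(params["n_estimators"] - 300) / 1000
    )


def grid_candidates(grid):
    """Yield every combination in the grid (exhaustive search)."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def random_candidates(grid, n_iter, seed=0):
    """Yield n_iter combinations sampled independently (with replacement)."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        yield {k: rng.choice(v) for k, v in grid.items()}


grid_list = list(grid_candidates(param_grid))
random_list = list(random_candidates(param_grid, n_iter=5))

best_grid = max(grid_list, key=toy_score)
best_random = max(random_list, key=toy_score)
print(len(grid_list), best_grid)
print(len(random_list), best_random)
```

Grid search here evaluates all 18 combinations, whereas random search evaluates only 5, which is the usual trade-off: random search is cheaper but may miss the best combination. Sampling with replacement is a simplification of this sketch.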

Simple API Doc

Note: All string inputs are case-sensitive and whitespace-sensitive.

| Input | Data Type | Description | Expected Value |
|---|---|---|---|
| survey_date | string | The date the survey was conducted | The format should be dd-mm-year |
| survey_round | int | Survey round | Ranges from 1 to 4 |
| status | string | Prior employment status | One of: "Studying", "Unemployed", "Wage Employed", "Self Employed", "Employment Programme", "Wage and Self Employed", "Other" |
| tenure | int | Prior employment tenure (days) | Feasible values in the range 1 to 220000 |
| geography | string | Geography | One of: "Suburb", "Rural", "Urban" |
| province | string | Province | One of: "Mpumalanga", "North West", "Free State", "Eastern Cape", "Limpopo", "KwaZulu-Natal", "Gauteng", "Western Cape", "Northern Cape" |
| matric | int | Matriculation | 1 if matriculated, 0 otherwise |
| degree | int | Degree | 1 if you have a degree, 0 otherwise |
| diploma | int | Diploma | 1 if you have a diploma, 0 otherwise |
| school_quantile | string | School quantile | Values range from 0 to 5 |
| additional_lang | string | Additional language | One of: "50 - 59 %", "40 - 49 %", "60 - 69 %", "70 - 79 %", "30 - 39 %", "80 - 100 %" |
| gender | int | Gender | 0 corresponds to male, 1 corresponds to female |
| sa_citizen | int | South African citizen | Either 0 or 1 |
| birth_year | int | Birth year | Feasible values in the range 1950 to 2010 |
| birth_month | int | Birth month | Ranges from 1 to 12 |
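The input rules above can be checked on the client side before a request is sent. The following is a minimal sketch, not the API's own validation code: the helper names, the example values, the four-digit year in `survey_date`, and the reading of `school_quantile` as one of the strings "0" to "5" are all assumptions. The "feasible" ranges are treated as hard limits here for simplicity.

```python
from datetime import datetime

STATUSES = {"Studying", "Unemployed", "Wage Employed", "Self Employed",
            "Employment Programme", "Wage and Self Employed", "Other"}
GEOGRAPHIES = {"Suburb", "Rural", "Urban"}
PROVINCES = {"Mpumalanga", "North West", "Free State", "Eastern Cape", "Limpopo",
             "KwaZulu-Natal", "Gauteng", "Western Cape", "Northern Cape"}
ADDITIONAL_LANG = {"50 - 59 %", "40 - 49 %", "60 - 69 %", "70 - 79 %",
                   "30 - 39 %", "80 - 100 %"}


def _date_ok(value):
    # Assumption: "dd-mm-year" means a four-digit year, e.g. "15-03-2022".
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, "%d-%m-%Y")
    except ValueError:
        return False
    return True


def _int_between(low, high):
    # bool is a subclass of int in Python, so it is rejected explicitly.
    return lambda v: isinstance(v, int) and not isinstance(v, bool) and low <= v <= high


def _one_of(options):
    # Exact match: string inputs are case- and whitespace-sensitive.
    return lambda v: isinstance(v, str) and v in options


RULES = {
    "survey_date": _date_ok,
    "survey_round": _int_between(1, 4),
    "status": _one_of(STATUSES),
    "tenure": _int_between(1, 220000),
    "geography": _one_of(GEOGRAPHIES),
    "province": _one_of(PROVINCES),
    "matric": _int_between(0, 1),
    "degree": _int_between(0, 1),
    "diploma": _int_between(0, 1),
    # Assumption: the quantile is sent as a digit string, "0" through "5".
    "school_quantile": _one_of({"0", "1", "2", "3", "4", "5"}),
    "additional_lang": _one_of(ADDITIONAL_LANG),
    "gender": _int_between(0, 1),
    "sa_citizen": _int_between(0, 1),
    "birth_year": _int_between(1950, 2010),
    "birth_month": _int_between(1, 12),
}


def validate_payload(payload):
    """Return a sorted list of field names that are missing or invalid."""
    return sorted(name for name, ok in RULES.items()
                  if name not in payload or not ok(payload[name]))


# Hypothetical example values, chosen only to satisfy the rules above.
example = {
    "survey_date": "15-03-2022", "survey_round": 2, "status": "Unemployed",
    "tenure": 365, "geography": "Urban", "province": "Gauteng", "matric": 1,
    "degree": 0, "diploma": 0, "school_quantile": "3",
    "additional_lang": "50 - 59 %", "gender": 1, "sa_citizen": 1,
    "birth_year": 1998, "birth_month": 7,
}

print(validate_payload(example))
print(validate_payload(dict(example, status="unemployed", birth_month=13)))
```

The second call flags `status` because "unemployed" differs in case from the accepted "Unemployed", which is the kind of mistake the note on case sensitivity warns about.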
