A simple classification of an individual's annual income into two classes (at least $50,000 and less than $50,000) using PySpark.
US Income: an individual’s annual income is influenced by various factors. Intuitively, these include the individual’s education level, age, gender, occupation, and so on. The dataset contains 16 columns. Target field: Income. The income is divided into two classes: >=50K and <50K. Number of attributes: 14. Dataset: adult.csv
Predict whether income is >=50K or <50K using Logistic Regression and Decision Tree Classifier models. Display classification metrics, accuracy, and a confusion matrix.
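The workflow described here could be sketched in PySpark roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the column names, the 80/20 split, the seed, and the hyperparameters (maxIter, maxDepth) are all hypothetical and need to be adjusted to the real header of adult.csv. It relies on the multi-column OneHotEncoder available in Spark 3.0 and later.

```python
# Assumed column names for adult.csv (hypothetical; adjust to the real header).
CATEGORICAL_COLS = ["workclass", "education", "marital_status", "occupation",
                    "relationship", "race", "sex", "native_country"]
NUMERIC_COLS = ["age", "fnlwgt", "education_num", "capital_gain",
                "capital_loss", "hours_per_week"]
LABEL_COL = "income"  # assumed name of the target column


def build_pipeline(classifier):
    """Index and one-hot encode categoricals, assemble features, then classify."""
    # Imported here so the module can be loaded without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler

    indexers = [
        StringIndexer(inputCol=c, outputCol=c + "_idx", handleInvalid="keep")
        for c in CATEGORICAL_COLS
    ]
    encoder = OneHotEncoder(
        inputCols=[c + "_idx" for c in CATEGORICAL_COLS],
        outputCols=[c + "_vec" for c in CATEGORICAL_COLS],
    )
    # Turn the string income class into a numeric "label" column.
    label_indexer = StringIndexer(inputCol=LABEL_COL, outputCol="label")
    assembler = VectorAssembler(
        inputCols=[c + "_vec" for c in CATEGORICAL_COLS] + NUMERIC_COLS,
        outputCol="features",
    )
    return Pipeline(stages=indexers + [encoder, label_indexer, assembler, classifier])


def train_and_score(csv_path="adult.csv", seed=42):
    """Fit both models on a train split and return test accuracy per model."""
    from pyspark.ml.classification import DecisionTreeClassifier, LogisticRegression
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("income-classification").getOrCreate()
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    train, test = df.randomSplit([0.8, 0.2], seed=seed)  # assumed 80/20 split

    evaluator = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction", metricName="accuracy"
    )
    classifiers = {
        "logistic_regression": LogisticRegression(
            featuresCol="features", labelCol="label", maxIter=20),
        "decision_tree": DecisionTreeClassifier(
            featuresCol="features", labelCol="label", maxDepth=5),
    }
    scores = {}
    for name, clf in classifiers.items():
        predictions = build_pipeline(clf).fit(train).transform(test)
        scores[name] = evaluator.evaluate(predictions)
    return scores
```

Calling train_and_score() with a local Spark installation and adult.csv in the working directory would return a dictionary of accuracies keyed by model name. One-hot encoding is used for both models so the decision tree does not need a larger maxBins for high-cardinality categorical columns.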
I was able to classify income with an accuracy of 85% using the decision tree classifier. A confusion matrix was computed to compare the predicted income classes against the actual income classes of individuals.
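To make the confusion matrix and accuracy concrete, here is a small sketch in plain Python. It assumes (actual, predicted) pairs encoded as 0 and 1, with 1 standing for the >=50K class; with PySpark such pairs could be collected from the label and prediction columns of a predictions DataFrame. The data below is a toy example, not output from adult.csv.

```python
from collections import Counter


def confusion_matrix(pairs):
    """Build a 2x2 matrix [[TN, FP], [FN, TP]] from (actual, predicted) pairs."""
    counts = Counter((int(actual), int(predicted)) for actual, predicted in pairs)
    return [
        [counts[(0, 0)], counts[(0, 1)]],  # actual 0: true negatives, false positives
        [counts[(1, 0)], counts[(1, 1)]],  # actual 1: false negatives, true positives
    ]


def accuracy(matrix):
    """Share of predictions on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = matrix[0][0] + matrix[1][1]
    return correct / total if total else 0.0


# Toy (actual, predicted) pairs for illustration only.
pairs = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 0), (1, 1), (0, 0), (1, 1)]
matrix = confusion_matrix(pairs)
print(matrix)            # [[3, 1], [1, 3]]
print(accuracy(matrix))  # 0.75
```

The diagonal holds the correctly classified individuals, so accuracy is the diagonal sum divided by the total count.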