Assessing the credit score of individuals using PySpark. Used the dataset from the Kaggle competition 'GiveMeSomeCredit'. Created ML Pipelines and then used three approaches to predict whether a person will default in the next 2 years:
- Random Forest Classifier
- Gradient Boosting Classifier
- Logistic Regression
Based on the results, the Random Forest Classifier gave the best predictions. The attributes are defined in the data dictionary.