Capstone project of the Udacity Data Scientist Nanodegree
This project analyzes Sparkify data to predict user churn on the platform. You can read more about this project on my Medium blog.
- seaborn==0.8.1
- scipy==1.2.1
- scikit-learn==0.24.1
- pandas==1.1.5
- numpy==1.19.5
- matplotlib==2.1.0
- httpagentparser==1.9.1
- spark==2.4.3
This project has two notebooks: Sparkify Data Analysis.ipynb contains the data analysis, and Modeling - Sparkify.ipynb contains the modeling part.
mini_sparkify_event_data.json is a sample of the Sparkify data.
workspace_utils.py is a helper script that keeps the kernel active in the Udacity Workspace.
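As a rough illustration of how the sample file could be inspected outside the notebooks, the sketch below counts log events per user using only the standard library. It assumes the file holds one JSON object per line and that users are identified by a `userId` field; both are assumptions made for this example, and the function name `count_events_per_user` is hypothetical rather than something the notebooks define.

```python
import json
from collections import Counter


def count_events_per_user(path, user_field="userId"):
    """Count log events per user in a line-delimited JSON file.

    Assumptions (not confirmed by this README): each non-empty line is
    one JSON object, and `user_field` names the user identifier.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            user = event.get(user_field)
            if user not in (None, ""):
                counts[user] += 1  # events without a user id are ignored
    return counts
```

Calling `count_events_per_user("mini_sparkify_event_data.json")` would return a `Counter` mapping each user id to its number of logged events, which can serve as a quick sanity check before the heavier analysis in the notebooks.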
::
sparkify
├── LICENCE.txt
├── Sparkify Data Analysis.ipynb
├── requirements.txt
├── Modeling - Sparkify.ipynb
└── workspace_utils.py
The Logistic Regression and GBT models overfit, with very high F1-scores of 0.99 and 0.97 respectively, so Random Forest gave the best performance without overfitting, with an F1-score of 0.86.
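For reference, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for binary labels in plain Python. The function name `f1_score_binary` and the label lists are made up for illustration; they are not results from this project, and the notebooks may compute the metric differently.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1-score of the `positive` class for two label sequences."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only (1 = churned, 0 = stayed).
example_true = [1, 0, 1, 1, 0, 0]
example_pred = [1, 0, 0, 1, 1, 0]
example_f1 = f1_score_binary(example_true, example_pred)
```

Comparing this score on the training data with the score on held-out data is one common way to spot overfitting: a model that scores much higher on the data it was trained on than on unseen data is likely memorizing rather than generalizing.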
Thanks to the Udacity instructors, who helped me understand many concepts along the way and offered an amazing project idea to work on.