The goal of this project is to choose a dataset from the available choices and work through the full data analysis process, ending with conclusions drawn from the results. The project uses the following Python libraries: NumPy, pandas, and Matplotlib. The specific dataset I use is "No-show appointments". I analyze the data to see whether any trends or correlations explain why people in this dataset missed (or kept) their appointments.
I looked into the dataset and handled a few problems: unifying names, removing wrong data, and adding new features based on existing data. I also investigated most of the independent variables in the dataset and made a few observations comparing them to each other as well as to the dependent variable (no_show). As this was only an exploratory analysis, many potential correlations may remain undiscovered. The data should be investigated further with more advanced statistical analysis to potentially reveal new insights and correlations.
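The cleaning steps described above could look roughly like the following. This is only a minimal sketch, not the notebook's actual code: the helper `clean_appointments`, the column names other than `no_show`, the "Yes"/"No" encoding, and the four toy rows are all assumptions made up for illustration.

```python
import pandas as pd


def clean_appointments(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper; column names are assumed, not taken from the notebook."""
    out = df.copy()

    # Unify names: lowercase everything and use underscores instead of hyphens.
    out.columns = [c.strip().lower().replace("-", "_") for c in out.columns]

    # Remove wrong data: a negative age cannot be valid.
    out = out[out["age"] >= 0].copy()

    # Add a new feature from existing data: days between scheduling and the appointment.
    out["scheduled_day"] = pd.to_datetime(out["scheduled_day"])
    out["appointment_day"] = pd.to_datetime(out["appointment_day"])
    out["waiting_days"] = (
        out["appointment_day"].dt.normalize() - out["scheduled_day"].dt.normalize()
    ).dt.days

    # Store the dependent variable as a boolean (assumes "Yes"/"No" strings).
    out["no_show"] = out["no_show"].eq("Yes")
    return out


# Toy rows invented for this example; they are not real records from the dataset.
raw = pd.DataFrame(
    {
        "Age": [34, -1, 60, 25],
        "Scheduled_Day": ["2016-04-29", "2016-04-29", "2016-04-27", "2016-04-20"],
        "Appointment_Day": ["2016-04-29", "2016-05-02", "2016-05-04", "2016-04-30"],
        "SMS_received": [0, 1, 1, 0],
        "No-show": ["No", "Yes", "Yes", "No"],
    }
)

clean = clean_appointments(raw)

# Compare one independent variable against the dependent one:
# the share of missed appointments within each group.
rate_by_sms = clean.groupby("sms_received")["no_show"].mean()
print(rate_by_sms)
```

Grouping by an independent variable and taking the mean of the boolean `no_show` column gives a no-show rate per group, which is one simple way to compare variables in an exploratory pass. It shows association only and says nothing about causation.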
For details, see the analysis documentation (Jupyter Notebook or HTML).