This project teaches you the basics of machine learning and how to use it to build a model that predicts precipitation. It is aimed at students who want to begin a career in machine learning and are looking for a starting point. We start from the basics and move toward more advanced concepts.
The goal is to build a model that predicts whether precipitation occurs, using several machine learning algorithms.
This week will be lighter for those already familiar with Python. For those who are new, we have got you covered.
- Python: Watch up to 3:36:00. Write the code yourself and run it; this applies to all the tutorials that follow.
- NumPy
- Pandas
- Matplotlib
Assignment: Test your knowledge of Python, NumPy, Pandas and Matplotlib
- W1_Assignment1
- W1_Assignment1 dataset
- W1_Assignment submission
- Upload the assignment and dataset to Google Colab and solve it. To download individual files from GitHub, use the "GitZip for GitHub" Chrome extension.
- Supervised Machine Learning: Regression and Classification
- Advanced Learning Algorithms
- Apply for financial aid for these courses. While your application is pending, audit the courses for free: auditing gives you immediate access to the videos without waiting for approval. Watch the videos at 2x speed; you do not need to complete the assignments.
This week, we will review the key concepts from week 2, write code for the machine learning algorithms introduced there, and begin the final project.
Let's begin with our final project. We will proceed step by step, and resources for implementation will be provided.
- Dataset: Download the dataset to begin your final project. To download individual files from GitHub, use the "GitZip for GitHub" Chrome extension.
- The precipitation (PRCP) column of the data frame is the target variable for this model. Replace all values greater than 0 with 1 (precipitation occurs) and keep values equal to 0 as 0 (precipitation does not occur).
- Handling null values: Drop any column with an excessive number of null values. In the remaining columns, replace null values with that column's mode.
- EDA: Perform exploratory data analysis (EDA) to visualize the data and identify outliers.
- Data preprocessing: Remove outliers and compute the correlation matrix.
- Use SMOTE to handle class imbalance: Many classification algorithms assume roughly equal numbers of examples in each class, so a model trained on imbalanced data tends to favor the majority class. SMOTE (Synthetic Minority Over-sampling Technique) removes or reduces the imbalance by generating synthetic examples of the minority class.
- Check for null values once again and proceed
- Feature selection: Select features using the chi-square test; refer to scikit-learn's SelectKBest and chi2.
- Normalise the dataset
- Train models using different techniques:
- Split the data into training and test sets.
- Train a logistic regression classifier, a decision tree classifier, and a neural network on the training set.
- Calculate accuracy, precision, recall, F1 score, and ROC AUC on the test set and visualize the results.
- Plot the confusion matrix using scikit-learn.
- Refer to the official documentation for each library (easily found via Google) to perform the steps above.
- Model comparison: Compare the models by accuracy and ROC AUC score, and visualize the comparison using seaborn.
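A minimal pandas sketch of the PRCP target step from the list above, run on a tiny made-up frame (the values are illustrative, not taken from the project dataset). It adds one precaution the steps do not mention: `NaN > 0` evaluates to `False`, so rows with a missing PRCP value would silently become class 0. The sketch drops those rows before binarizing.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset; only the PRCP column name comes from the project text.
df = pd.DataFrame({"PRCP": [0.0, 0.3, np.nan, 0.0, 12.7, 0.05]})

# Added precaution: drop rows with missing PRCP so they are not mislabeled as "no precipitation".
df = df.dropna(subset=["PRCP"])

# Values > 0 become 1 (precipitation occurs); values == 0 become 0 (no precipitation).
df["PRCP"] = (df["PRCP"] > 0).astype(int)

print(df["PRCP"].tolist())  # [0, 1, 0, 1, 1]
```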
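The null-value step can be sketched as below. The steps do not define "excessive", so the 50% cutoff (`NULL_THRESHOLD`) is an assumption, as are the column names other than PRCP and the toy values.

```python
import numpy as np
import pandas as pd

# Toy data; TMAX and SNOW are hypothetical column names.
df = pd.DataFrame({
    "PRCP": [0, 1, 0, 1, 1, 0],
    "TMAX": [30.0, np.nan, 28.0, 28.0, 31.0, 29.0],
    "SNOW": [np.nan, np.nan, np.nan, np.nan, 0.0, np.nan],
})

NULL_THRESHOLD = 0.5  # assumed cutoff for an "excessive" share of nulls

# Share of null values per column; drop the columns above the cutoff.
null_fraction = df.isnull().mean()
too_sparse = null_fraction[null_fraction > NULL_THRESHOLD].index
df = df.drop(columns=too_sparse)

# Fill the remaining nulls with each column's mode (most frequent value).
for col in df.columns:
    if df[col].isnull().any():
        # mode() returns all tied values in sorted order; take the first.
        df[col] = df[col].fillna(df[col].mode()[0])

print(df.columns.tolist())  # ['PRCP', 'TMAX']
print(df["TMAX"].tolist())  # [30.0, 28.0, 28.0, 28.0, 31.0, 29.0]
```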
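The steps do not name an outlier-removal method. The sketch below assumes the common interquartile-range (IQR) rule, which drops a row if any feature lies more than 1.5 times the IQR below the first quartile or above the third, and then computes the Pearson correlation matrix with `DataFrame.corr()`. Column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical features; 95.0 in TMAX is a deliberately planted outlier.
df = pd.DataFrame({
    "TMAX": [30.0, 28.0, 29.0, 31.0, 27.0, 95.0],
    "TMIN": [20.0, 18.0, 19.0, 21.0, 17.0, 22.0],
    "PRCP": [0, 1, 0, 1, 1, 0],
})

features = ["TMAX", "TMIN"]  # the binary target is excluded from outlier filtering

# Per-column quartiles and IQR bounds.
q1 = df[features].quantile(0.25)
q3 = df[features].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep a row only if every feature lies inside its [lower, upper] band.
mask = ((df[features] >= lower) & (df[features] <= upper)).all(axis=1)
df = df[mask]

# Pearson correlation between all remaining numeric columns.
corr = df.corr()
print(len(df))  # 5
print(corr.round(2))
```

To visualize the matrix, `seaborn.heatmap(corr, annot=True)` is one option; during EDA, `df.boxplot()` gives a quick per-feature view of the outliers.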
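To show what SMOTE does under the hood, here is a simplified NumPy reimplementation for a binary target: for each synthetic point it picks a random minority sample, picks one of that sample's k nearest minority neighbours, and places the new point at a random position on the segment between them. In practice you would use the imbalanced-learn library (`from imblearn.over_sampling import SMOTE`, then `SMOTE(random_state=42).fit_resample(X, y)`); the function name `smote` and its parameters are hypothetical, and this sketch is not a drop-in replacement.

```python
import numpy as np


def smote(X, y, k=5, seed=0):
    """Simplified SMOTE for a binary target: oversample the minority class to parity."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_needed = counts.max() - counts.min()
    X_min = X[y == minority]
    if n_needed == 0 or len(X_min) < 2:
        return X, y  # already balanced, or too few samples to interpolate

    # Brute-force pairwise distances between minority samples (fine for a sketch).
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest minority neighbours

    synthetic = np.empty((n_needed, X.shape[1]))
    for i in range(n_needed):
        base = rng.integers(len(X_min))         # random minority sample
        nb = neighbours[base, rng.integers(k)]  # one of its k nearest neighbours
        gap = rng.random()                      # uniform in [0, 1)
        # The new point lies on the segment between the sample and its neighbour.
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])

    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_needed, minority)])
    return X_res, y_res


X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1], [1.0, 0.8]])
y = np.array([1, 1, 1, 0, 0, 0, 0, 0])
X_res, y_res = smote(X, y, k=2)
print(np.bincount(y_res))  # [5 5]
```

Following the order of the steps above, the sketch resamples the full dataset. A common alternative is to apply SMOTE only to the training split after the train/test split, so that synthetic points derived from test rows cannot influence the evaluation.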
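Feature selection and normalisation can be sketched together with scikit-learn. One ordering detail matters: `chi2` raises an error on negative feature values (temperatures, for example, can be negative), so this sketch min-max scales the features into [0, 1] before running `SelectKBest`, rather than after. The synthetic features, their names and `k=2` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; feature names are hypothetical.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
X = pd.DataFrame({
    "HUMIDITY": 50 + 30 * y + rng.normal(0, 5, n),  # strongly related to y
    "TMIN": -5 + 3 * y + rng.normal(0, 2, n),       # moderately related, often negative
    "NOISE": rng.normal(0, 1, n),                   # unrelated to y
})

# Normalise: min-max scaling maps each column to [0, 1], which also satisfies chi2.
scaler = MinMaxScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

# Keep the k features with the highest chi-square scores against the target.
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X_scaled, y)
selected = X_scaled.columns[selector.get_support()].tolist()

print(selected)
print(dict(zip(X_scaled.columns, selector.scores_.round(2))))
```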
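The training and evaluation steps can be sketched as a single loop. Assumptions: `make_classification` generates synthetic data in place of the preprocessed dataset, and scikit-learn's `MLPClassifier` stands in for the neural network (the steps do not say which library to use). ROC AUC is computed from `predict_proba` scores rather than hard labels, because the metric is defined over a ranking of scores.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed features and binary PRCP target.
X, y = make_classification(n_samples=500, n_features=6, random_state=42)

# Hold out 20% for testing; stratify keeps the class ratio equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    # Assumption: a small scikit-learn MLP stands in for "neural networks".
    "Neural Network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                    random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # ROC AUC uses positive-class probabilities, not hard labels.
    y_score = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, scores in results.items():
    summary = {k: round(float(v), 3) for k, v in scores.items() if k != "confusion_matrix"}
    print(name, summary)
    print(scores["confusion_matrix"])
```

To plot a confusion matrix, scikit-learn 1.0+ provides `ConfusionMatrixDisplay.from_predictions(y_test, y_pred)` (call `matplotlib.pyplot.show()` afterwards); `ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)` does the same from a fitted model.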
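For the comparison step, one option is to reshape the scores into a long-form table and draw a grouped bar chart with seaborn. The helper names `comparison_frame` and `plot_comparison` are hypothetical; `results` is assumed to map each model name to a dict containing at least `accuracy` and `roc_auc`.

```python
import pandas as pd


def comparison_frame(results):
    """Turn {model_name: {"accuracy": ..., "roc_auc": ...}} into a long-form table."""
    rows = [
        {"model": name, "metric": metric, "score": scores[metric]}
        for name, scores in results.items()
        for metric in ("accuracy", "roc_auc")
    ]
    return pd.DataFrame(rows)


def plot_comparison(frame):
    """Grouped bar chart: one group per model, one bar per metric."""
    # Imported here so comparison_frame works without the plotting libraries installed.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.barplot(data=frame, x="model", y="score", hue="metric")
    ax.set_ylim(0, 1)
    ax.set_title("Model comparison: accuracy vs ROC AUC")
    plt.tight_layout()
    plt.show()
```

Call `plot_comparison(comparison_frame(results))` once the scores have been computed.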