- Data Collection using web scraping and a third-party API
- Data Preprocessing (Feature Engineering)
- Feature Selection
- Modeling
- Hyperparameter optimization
- Deployment using Heroku
In this project, I apply several regression methods (linear regression, ridge and lasso regression, decision tree regression, random forest regression, XGBoost regression, and KNN regression) and compare their results to find the best method for predicting the air quality index.
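As a rough illustration of how such a comparison can be set up, the sketch below scores several scikit-learn regressors with 5-fold cross-validation and ranks them by RMSE. It is not the code from the notebooks: the data is synthetic (generated with `make_regression`), the hyperparameter values are arbitrary, the `compare_models` helper is a hypothetical name, and XGBoost is left out to keep the example dependent on scikit-learn only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic stand-in data, NOT the project's weather features or AQI target.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Candidate regressors; the hyperparameters here are illustrative, not tuned.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "knn": KNeighborsRegressor(n_neighbors=5),
}


def compare_models(models, X, y, cv=5):
    """Return the mean cross-validated RMSE for each model (lower is better)."""
    results = {}
    for name, model in models.items():
        # scikit-learn reports negated RMSE so that higher is always better.
        scores = cross_val_score(
            model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
        )
        results[name] = float(-scores.mean())
    return results


results = compare_models(models, X, y)
best = min(results, key=results.get)

# Print the models from lowest to highest error.
for name, rmse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:>14}: RMSE = {rmse:.2f}")
print("best:", best)
```

On real data the ranking depends on the features and on tuning, so the ordering printed here says nothing about which regressor performs best on the air quality dataset.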
This project requires Python and the following Python libraries installed:
You will also need to have software installed to run and execute a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included.
If a Jupyter Notebook doesn't load on GitHub, it can be viewed using the links below for the respective regressors.
- Decision Tree Regressor: https://nbviewer.org/github/AshutoshDevpura/Air-Quality-Index-Prediction/blob/main/DecisionTreeRegressor.ipynb
- KNN Regressor: https://nbviewer.org/github/AshutoshDevpura/Air-Quality-Index-Prediction/blob/main/KNearestNeighborRegressor.ipynb
- Ridge and Lasso Regression: https://nbviewer.org/github/AshutoshDevpura/Air-Quality-Index-Prediction/blob/main/LassoRegression.ipynb
- Linear Regression: https://nbviewer.org/github/AshutoshDevpura/Air-Quality-Index-Prediction/blob/main/LinearRegression.ipynb
- Random Forest Regressor: https://nbviewer.org/github/AshutoshDevpura/Air-Quality-Index-Prediction/blob/main/RandomForestRegressor.ipynb
- XGBoost Regressor: https://nbviewer.org/github/AshutoshDevpura/Air-Quality-Index-Prediction/blob/main/XgboostRegressor.ipynb
On Heroku: