
regressionProjectIITK

This repository contains a group project for the course MTH416A: Regression Analysis, completed during the 2021-2022 academic session (even semester) at IIT Kanpur.

Project Title:

Ozone concentration and meteorology in the LA Basin, 1976 - A Regression Study [Report] [Presentation]

Project Guide

Prof. Sharmishtha Mitra, Department of Mathematics and Statistics, IIT Kanpur

Project Members :

Project Outline

1. Introduction
2. Data Description
3. Exploratory Data Analysis

Parametric setup:

4. Multicollinearity
   Detection:
     • Eigen-decomposition Proportion
     • Variance Inflation Factor
   Remedy:
     • Variable Drop (Model A)
     • Ridge Regression (Model B)
     • Principal Components Regression (Model C)
5. Variable Selection
   Selection Methods:
     • Best Subset Selection
     • Mallows's $C_p$
     • Adjusted $R^2$
     • AIC vs. $p$ Plot
     • Scree Plot and Validation Plot
6. Heteroscedasticity of Errors
   Detection:
     • Breusch-Pagan Test
   Remedy:
     • Box-Cox Transformation
7. Normality of Errors
   Detection:
     • Q-Q Plot
     • Shapiro-Wilk Test
8. Autocorrelation
   Detection:
     • $\epsilon_t$ vs. $\epsilon_{t-1}$ Plot
     • Durbin-Watson Test
   Remedy:
     • ARIMA Fitting
9. Prediction

Nonparametric setup:

10. Alternating Conditional Expectation (ACE)
     • Optimal Transformations Plot
11. Final Model Fit and Predictions
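
Two of the quantities named in the outline are simple enough to sketch directly. The block below is an illustrative sketch only, not the project's code: the README does not say which software or functions were used, and the names `box_cox` and `durbin_watson` are hypothetical. It shows the Box-Cox power transform for a fixed exponent `lam` (choosing `lam` by maximum likelihood is not shown) and the Durbin-Watson statistic computed from a residual series in time order.

```python
import math
from typing import List, Sequence


def box_cox(y: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform: (y^lam - 1) / lam, or log(y) when lam == 0.

    Assumes every response value is strictly positive.
    """
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def durbin_watson(residuals: Sequence[float]) -> float:
    """Durbin-Watson statistic for residuals listed in time order.

    Ratio of the sum of squared successive differences to the sum of
    squared residuals. The value lies between 0 and 4.
    """
    den = sum(e ** 2 for e in residuals)
    if den == 0:
        raise ValueError("residuals must not all be zero")
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    return num / den


if __name__ == "__main__":
    # Residuals that flip sign at every step: negative autocorrelation.
    print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
    # Square-root transform (lam = 0.5) of a made-up positive response.
    print(box_cox([1.0, 4.0, 9.0], 0.5))
```

Values of the statistic near 2 are consistent with no first-order autocorrelation, values well below 2 point to positive autocorrelation, and values well above 2 point to negative autocorrelation.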

Summary of Fitted Models:

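| Model Type    | Model Name | $R^2$  | RMSE   |
|---------------|------------|--------|--------|
| Parametric    | Model 0    | 0.6986 | 4.2745 |
| Parametric    | Model A    | 0.7662 | 0.8272 |
| Parametric    | Model B    | 0.7202 | 0.8830 |
| Parametric    | Model C    | 0.7077 | 1.2565 |
| Nonparametric | ACE        | 0.8271 | 0.3132 |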
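
For reference, the two summary metrics can be computed as in the sketch below. This is a minimal illustration with hypothetical function names (`r_squared`, `rmse`) and made-up numbers. The README does not state whether the reported values were computed on training or held-out data, or on the original or a transformed response scale, so nothing here reproduces the table.

```python
import math
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of the response."""
    n = len(y_true)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / n)


if __name__ == "__main__":
    # Made-up observed and fitted values, for illustration only.
    observed = [3.0, 5.0, 7.0, 9.0]
    fitted = [2.5, 5.5, 7.0, 9.0]
    print(r_squared(observed, fitted))
    print(rmse(observed, fitted))
```

RMSE is expressed in the units of the response, so RMSE values are only directly comparable across models when they are measured on the same response scale.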

Conclusions:

  • Among the parametric models, Model A has the highest $R^2$ and the lowest RMSE.
  • Models A, B and C all outperform the baseline, Model 0. This supports our corrections for multicollinearity, heteroscedasticity and autocorrelation, together with variable selection.
  • Simple nonparametric models are generally preferable when the goal is prediction. Here, however, the ACE model transforms the data so as to maximise $R^2$ and, as expected, it attains the highest $R^2$ and the lowest RMSE among all the models.
  • Hence, among the models considered here, the ACE model is the best, both for prediction and for explaining ozone concentration through the meteorological variables in the ozone dataset.

References:

  1. Breiman, Leo; Friedman, Jerome H. (1985). "Estimating Optimal Transformations for Multiple Regression and Correlation". Journal of the American Statistical Association. 80 (391): 580–598.
  2. Jolliffe, Ian T. (1982). "A Note on the Use of Principal Components in Regression". Journal of the Royal Statistical Society, Series C. 31 (3): 300–303. doi:10.2307/2348005. JSTOR 2348005.
  3. Park, Sung H. (1981). "Collinearity and Optimal Restrictions on Regression Parameters for Estimating Responses". Technometrics. 23 (3): 289–295. doi:10.2307/1267793.
  4. Wilkinson, L.; Dallal, G. E. (1981). "Tests of significance in forward selection regression with an F-to-enter stopping rule". Technometrics. 23: 377–380.
  5. Akaike, H. (1973). "Information theory and an extension of the maximum likelihood principle". In Petrov, B. N.; Csáki, F. (eds.). 2nd International Symposium on Information Theory, Tsahkadsor, Armenia, USSR, September 2-8, 1971. Budapest: Akadémiai Kiadó. pp. 267–281. Republished in Kotz, S.; Johnson, N. L. (eds.) (1992). Breakthroughs in Statistics, I. Springer-Verlag. pp. 610–624.
  6. Akaike, H. (1974). "A new look at the statistical model identification". IEEE Transactions on Automatic Control. 19 (6): 716–723. doi:10.1109/TAC.1974.1100705. MR 0423716.
  7. Shapiro, S. S.; Wilk, M. B. (1965). "An analysis of variance test for normality (complete samples)". Biometrika. 52 (3–4): 591–611. doi:10.1093/biomet/52.3-4.591. JSTOR 2333709. MR 0205384.
  8. Breusch, T. S.; Pagan, A. R. (1979). "A Simple Test for Heteroskedasticity and Random Coefficient Variation". Econometrica. 47 (5): 1287–1294. doi:10.2307/1911963. JSTOR 1911963. MR 0545960.
  9. Box, George E. P.; Cox, D. R. (1964). "An analysis of transformations". Journal of the Royal Statistical Society, Series B. 26 (2): 211–252. JSTOR 2984418. MR 0192611.
  10. Durbin, J.; Watson, G. S. (1950). "Testing for Serial Correlation in Least Squares Regression, I". Biometrika. 37 (3–4): 409–428. doi:10.1093/biomet/37.3-4.409. JSTOR 2332391.
  11. Durbin, J.; Watson, G. S. (1951). "Testing for Serial Correlation in Least Squares Regression, II". Biometrika. 38 (1–2): 159–179. doi:10.1093/biomet/38.1-2.159. JSTOR 2332325.
  12. Faraway, J. J. (2004). Linear Models with R (1st ed.). Chapman and Hall/CRC. https://doi.org/10.4324/9780203507278
  13. Hoerl, A. E.; Kennard, R. W.; Baldwin, K. F. (1975). "Ridge regression: Some simulations". Communications in Statistics - Theory and Methods. 4 (2): 105–123.

