
introduction_to_statistical_learning_summary_python's Introduction

Statistical Machine Learning in Python

A summary of the book "Introduction to Statistical Learning"

Whenever someone asks me “How to get started in data science?”, I usually recommend the book 📕 Introduction to Statistical Learning by Daniela Witten, Trevor Hastie, Gareth M. James, Robert Tibshirani, to learn the basics of statistics and machine learning models.

Understandably, completing a technical book while practicing it with relevant data and code is a challenge for a lot of us.

So, I created a concise version of the book as a course on statistical machine learning in Python. In this repo, each chapter of the book has been translated into a Jupyter notebook with a summary of the key concepts, plus data & Python code to play with.

If you want to quickly understand the book, or learn statistical machine learning and/or Python for data science, then just clone the repo and get started! 👩‍💻

Expect to learn the following concepts & their implementation in Python:

Notebook: Chapter 2: Statistical Learning explains-

  • What Is Statistical Learning?
  • Assessing Model Accuracy
  • Introduction to the Python Programming Language

Notebook: Chapter 3: Linear Regression explains-

  • Linear Regression (LR)- simple, multiple
  • Qualitative Predictors in LR
  • Non-linear Transformations of the Predictors
  • Potential Problems with Least Squares Linear Regression

Notebook: Chapter 4: Classification explains-

  • Classification Overview
  • Logistic Regression
  • Linear Discriminant Analysis (LDA)
  • Quadratic Discriminant Analysis (QDA)
  • K-Nearest Neighbours (KNN)

Notebook: Chapter 5: Resampling Methods explains-

  • Cross-Validation
    • The Validation Set Approach
    • Leave-One-Out Cross-Validation
    • k-Fold Cross-Validation
  • The Bootstrap
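
To give a flavour of the resampling ideas listed above, here is a minimal sketch of k-fold cross-validation in plain Python. It is not taken from the notebook: the function names `k_fold_splits` and `cross_validated_mse`, the toy response values, and the deliberately trivial "predict the training mean" model are all assumptions made for illustration. Leave-one-out cross-validation falls out as the special case where k equals the number of observations.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds of nearly equal size."""
    # The first (n_samples % k) folds get one extra observation.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_validated_mse(y, k):
    """Estimate the test MSE of a predict-the-training-mean model with k-fold CV."""
    fold_errors = []
    for train, test in k_fold_splits(len(y), k):
        # "Fit" on the training folds: the model is just the training mean.
        prediction = sum(y[i] for i in train) / len(train)
        # Evaluate on the held-out fold.
        fold_errors.append(sum((y[i] - prediction) ** 2 for i in test) / len(test))
    # The CV estimate is the average of the per-fold MSEs.
    return sum(fold_errors) / len(fold_errors)


# Toy response values, made up for illustration.
y = [3.0, 5.0, 4.0, 6.0, 7.0, 5.0]
cv_mse = cross_validated_mse(y, k=3)
loocv_mse = cross_validated_mse(y, k=len(y))  # leave-one-out is k-fold with k = n
```

In practice the observations are usually shuffled before splitting; the sketch keeps the folds contiguous so that the splits are easy to inspect by hand.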

Notebook: Chapter 6: Linear Model Selection and Regularization explains-

  • Subset Selection Models
    • Best Subset Selection
    • Forward Stepwise Selection
    • Backward Stepwise Selection
  • Shrinkage Methods
    • Ridge Regression
    • The Lasso
  • Dimension Reduction Methods- PCR and PLS Regression
    • Principal Components Regression
    • Partial Least Squares

Notebook: Chapter 7: Moving Beyond Linearity explains-

  • Polynomial Regression
  • Step/Piecewise Functions
  • Basis Functions
  • Regression Splines
  • Smoothing Splines
  • Generalized Additive Models

Notebook: Chapter 8: Tree-Based Methods explains-

  • Decision Trees
    • Regression Trees
    • Classification Trees
  • Bagging, Random Forests, Boosting

Note: Chapters 9 and 10 will be added soon.


More about the book:

"This book is intended for anyone who is interested in using modern statistical methods for modeling and prediction from data. This group includes scientists, engineers, data analysts, or quants, but also less technical individuals with degrees in non-quantitative fields such as the social sciences or business. We expect that the reader will have had at least one elementary course in statistics."

I recommend ✅ this book because-

  1. This book (and the derived notebooks in this repo) marries statistical machine learning concepts with real-life data science problem statements. Each chapter/concept begins with a real scenario, like "You are a consultant who needs to advise on the best advertising medium & budgets to increase the sales of a product, using the advertising data", and explains techniques and methods step by step as we work through it.

  2. It gives a modest introduction to the statistics and mathematics behind the most used methods, like:

  • Regressions
  • Classifications
  • Decision Trees
  • SVM
  • Clustering
  • Unsupervised Learning
  • Resampling
  • Cross-Validation Methods
  • Dimension reduction methods
  3. It also provides a 💡 lab section at the end of each chapter. It offers R code snippets & various libraries that will come in handy to analyze data, build models, and test them. 🌟 This repo gives the same code in Python, so you are covered either way! This will help you get started and equip you to test out the given methods & models on your own data.
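
As an illustration of the advertising scenario mentioned above, the following is a minimal sketch of simple least-squares linear regression in plain Python. The budget and sales figures are invented for this example and do not come from the book's advertising data or from the notebooks, and the helper name `fit_simple_linear_regression` is hypothetical.

```python
# Hypothetical advertising data: TV budget (in thousands of dollars)
# and product sales (in thousands of units). Made up for illustration.
tv_budget = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [6.0, 8.0, 10.0, 12.0, 14.0]


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_simple_linear_regression(tv_budget, sales)

# Predicted sales for a hypothetical TV budget of 60 (thousand dollars).
predicted_sales = intercept + slope * 60.0
```

The slope can be read as the estimated change in sales per extra unit of TV budget, which is the kind of question the consultant in the scenario is trying to answer.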

A few important concepts it does not touch at all are-

  • Time series data models
  • Neural networks
  • Deep learning
  • Bayesian methods

This is an independent part of my blog series, Data science for analytical minds, which serves as a resource for people, especially those from backgrounds like economics, statistics, mathematics, physics etc., to learn different components of data science through real-life problem statements.

Check out its 👉 introductory blog & data quality & cleaning blog. This is the 3rd part of the series, focusing on statistics & machine learning basics.

This is meant to give you a quick head start with the most used statistical concepts, with data and code to play with. For a deeper understanding of any concept, I recommend referring back to the book.

If you find any problems or have questions, feel free to open an issue.

If you have any general feedback, ideas for collaboration, or anything interesting to say, you can reach me at shilpaarora992[at]gmail[dot]com.

Contributors: ankita1017, shilpa9a
