STA 380: Predictive Modeling

Welcome to part 2 of STA 380, a course on predictive modeling in the MS program in Business Analytics at UT-Austin. All course materials can be found through this GitHub page. Please see the course syllabus for links and descriptions of the readings mentioned below.

Scribe notes and exercises

You can find the up-to-date collection of scribe notes here.

The first set of exercises is available here.

Topics

The readings listed below are not yet complete, but the topics list is accurate.

(1) The data scientist's toolbox

Good data-curation and data-analysis practices; R; Markdown and R Markdown; the importance of replicable analyses; version control with Git and GitHub.

Readings:

(2) Exploratory analysis

Contingency tables; basic plots (scatterplot, boxplot, histogram); lattice plots; basic measures of association (relative risk, odds ratio, correlation, rank correlation).
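Relative risk and the odds ratio are both simple functions of the four cells of a 2x2 contingency table. The sketch below is written in Python rather than the R used in the course scripts. The function names and the table layout (rows are groups, columns are outcome yes/no) are illustrative assumptions, and the counts are made up rather than taken from any course dataset.

```python
def relative_risk(a, b, c, d):
    """Relative risk for a 2x2 table laid out as
                 outcome   no outcome
       group 1      a          b
       group 2      c          d
    RR = P(outcome | group 1) / P(outcome | group 2)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds ratio = odds(outcome | group 1) / odds(outcome | group 2)
    = (a / b) / (c / d) = (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only
a, b, c, d = 30, 70, 10, 90
print("relative risk:", relative_risk(a, b, c, d))
print("odds ratio:", odds_ratio(a, b, c, d))
```

When the outcome is rare in both groups, b and d dominate the row totals, so the odds ratio is close to the relative risk; here the outcome is fairly common and the two measures differ noticeably.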

Scripts and data:

gdpgrowth.R and gdpgrowth.csv
titanic.R and TitanicSurvival

Readings:

NIST Handbook, Chapter 1.
R walkthroughs on basic EDA: contingency tables, histograms, and scatterplots/lattice plots.
Bad graphics
Good graphics: scan through some of the New York Times' best data visualizations

(3) Resampling methods

The bootstrap and the permutation test; joint distributions; basic moment identities for linear combinations; using the bootstrap to approximate value at risk (VaR).
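The core bootstrap idea is to resample the observed data with replacement, recompute a statistic on each resample, and read its variability off the resulting replicates. The Python sketch below (the course itself uses R) applies this twice: to the standard error of a sample mean, and to a simple value-at-risk estimate obtained by resampling daily returns into 20-day paths. The helper names, the simulated "returns", the $100,000 starting value, the 20-day horizon, and the 5% level are all illustrative assumptions, not course settings.

```python
import random
import statistics


def mean(xs):
    return sum(xs) / len(xs)


def bootstrap(data, stat, n_boot=2000, seed=0):
    """Return n_boot bootstrap replicates of stat: each replicate
    resamples len(data) points from data with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(n_boot)]


def empirical_quantile(xs, q):
    """Simple empirical quantile: the sorted value at rank floor(q * (n - 1))."""
    s = sorted(xs)
    return s[int(q * (len(s) - 1))]


def simulate_wealth(returns, initial=100_000, horizon=20, n_sims=5000, seed=1):
    """Bootstrap final wealth: compound `horizon` daily returns,
    each drawn with replacement from the observed returns."""
    rng = random.Random(seed)
    final = []
    for _ in range(n_sims):
        w = initial
        for r in rng.choices(returns, k=horizon):
            w *= 1 + r
        final.append(w)
    return final


# Hypothetical daily returns, simulated only so the example runs
gen = random.Random(42)
returns = [gen.gauss(0.0005, 0.01) for _ in range(250)]

# 1) Bootstrap standard error of the sample mean
se_mean = statistics.stdev(bootstrap(returns, mean))

# 2) 5% VaR over 20 days: the loss that is exceeded in only 5% of resampled paths
final_wealth = simulate_wealth(returns)
var_5pct = 100_000 - empirical_quantile(final_wealth, 0.05)

print(f"bootstrap SE of mean: {se_mean:.6f}")
print(f"5% 20-day VaR: ${var_5pct:,.0f}")
```

Resampling individual days treats them as independent; real return series may have serial dependence that this simple scheme ignores.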

Scripts:

Readings:

ISL Section 5.2 for a basic overview.
These notes, pages 99-111. This is an introduction to the bootstrap from the (by now familiar) perspective of linear regression modeling, but it conveys the essential idea.
This R walkthrough on using the bootstrap to estimate the variability of a sample mean.
Another R walkthrough on the permutation test in a simple 2x2 table.
Any basic explanation of the concept of value at risk (VaR) for a financial portfolio, e.g. here, here, or here.

Optionally, Shalizi (Chapter 6) has a much lengthier treatment of the bootstrap, should you wish to consult it.
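The permutation test for a 2x2 table, mentioned in the readings above, can be sketched in a few lines. Under the null hypothesis of no association, the group labels are exchangeable, so the test shuffles the outcomes across groups many times and asks how often the shuffled difference in proportions is at least as extreme as the observed one. This is a Python sketch (the course uses R); the function name, the choice of difference in proportions as the test statistic, and the example counts are assumptions for illustration.

```python
import random


def permutation_test_2x2(a, b, c, d, n_perm=5000, seed=0):
    """Permutation test for association in a 2x2 table
              outcome=1  outcome=0
       g1        a          b
       g2        c          d
    Statistic: p1 - p2, the difference in outcome proportions.
    Returns (observed statistic, two-sided permutation p-value)."""
    rng = random.Random(seed)
    outcomes = [1] * a + [0] * b + [1] * c + [0] * d
    n1, n2 = a + b, c + d
    observed = a / n1 - c / n2
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(outcomes)  # break any link between group and outcome
        p1 = sum(outcomes[:n1]) / n1
        p2 = sum(outcomes[n1:]) / n2
        # Small tolerance so ties with the observed value count as extreme
        if abs(p1 - p2) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical counts: 30/100 vs. 10/100 with the outcome
obs, p = permutation_test_2x2(30, 70, 10, 90)
print(f"observed difference: {obs:.2f}, permutation p-value: {p:.4f}")
```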

(4) Latent classes

Basics of clustering; K-means clustering; mixture models; hierarchical clustering.
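K-means is usually fit with Lloyd's algorithm, which alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it, stopping once the assignments no longer change. The Python sketch below (the course scripts are in R) uses plain tuples for points and initializes the centroids at k randomly chosen data points. That initialization, the function names, and the toy data are assumptions; practical implementations typically use smarter starts (e.g. k-means++) and several restarts.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm for k-means on a list of tuples.
    Returns (cluster label per point, list of centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: sqdist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments (and hence centroids) are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = tuple(sum(m[i] for m in members) / len(members)
                                     for i in range(dim))
    return labels, centroids


# Toy data: two well-separated groups of four points
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centroids = kmeans(pts, 2)
print(labels, centroids)
```

Because each step can only lower the within-cluster sum of squares, the algorithm always terminates, but only at a local optimum that depends on the starting centroids.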

Scripts and data:

Readings:

ISL Sections 10.1 and 10.3
Elements Section 14.3 (more advanced)
K-means examples: a few stylized examples to build your intuition for how k-means behaves.
Hierarchical clustering examples: ditto for hierarchical clustering.

(5) Latent features and structure

Principal component analysis (PCA); factor analysis; canonical correlation analysis; multi-dimensional scaling.
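The first principal component is the unit direction along which the centered data have the largest variance; equivalently, it is the leading eigenvector of the sample covariance matrix, and the corresponding eigenvalue is the variance explained. The dependency-free Python sketch below (the course uses R) finds that eigenvector by power iteration. The function name and the toy data are assumptions; in practice one would call a library routine based on the SVD.

```python
import math
import random


def first_principal_component(X, n_iter=500, seed=0):
    """First principal component of X (a list of rows) by power
    iteration on the sample covariance matrix.
    Returns (unit loading vector, variance along that direction)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]  # center columns
    # Sample covariance matrix S = Z^T Z / (n - 1)
    S = [[sum(z[i] * z[j] for z in Z) / (n - 1) for j in range(p)]
         for i in range(p)]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(p)]  # random starting direction
    for _ in range(n_iter):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # repeated multiplication by S, renormalized
    # Rayleigh quotient v^T S v gives the variance explained by v
    eigval = sum(v[i] * sum(S[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigval


# Toy data lying close to the line y = 2x
X = [[t, 2 * t + (0.1 if t % 2 else -0.1)] for t in range(-5, 6)]
loading, variance = first_principal_component(X)
print(loading, variance)
```

The sign of a principal component is arbitrary, so the loading vector may come back as its negative.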

Scripts and data:

pca_2D.R
pca_intro.R
congress109.R, congress109.csv, and congress109members.csv
gasoline.R and gasoline.csv
FXmonthly.R, FXmonthly.csv, and currency_codes.txt
cca_intro.R, mmreg.csv, and mouse_nutrition.csv

Readings:

ISL Section 10.2 for the basics
Shalizi Chapters 18 and 19 (more advanced). Chapter 19 in particular covers factor analysis in much more depth than we did in class.
Elements Section 14.5 (more advanced)

(6) Text data

Co-occurrence statistics; naive Bayes; TF-IDF; topic models; vector-space models of text (if time allows).
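TF-IDF reweights raw term counts so that words common to every document (like "the") get little weight, while words concentrated in a few documents get more. The Python sketch below (the course uses R and the tm package) implements one common variant: term frequency normalized by document length, times idf(t) = log(N / df(t)). Many other variants exist (log-scaled term frequency, smoothed idf), so treat this particular formula, the function name, and the toy documents as assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """TF-IDF weights for a list of tokenized documents.
    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where df(t) = number of docs containing t
    Returns one {term: weight} dict per document."""
    N = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(N / df[t])
                        for t, c in counts.items()})
    return weights


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
for w in tf_idf(docs):
    print(w)
```

A word that appears in every document, such as "the" here, gets idf log(1) = 0 and therefore zero weight everywhere.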

Scripts and data:

Readings:

Stanford NLP notes on vector-space models of text, TF-IDF weighting, and so forth.
Using the tm package (http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf) for text mining in R.
Dave Blei's survey of topic models.
A pretty long blog post on naive-Bayes classification.
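Naive Bayes classification of text treats each document as a bag of words that are conditionally independent given the class, so a document's log posterior score for class y is log P(y) plus the sum of log P(word | y) over its words. The Python sketch below (the course uses R) is a multinomial naive Bayes with add-alpha (Laplace) smoothing. The function names, alpha = 1, the choice to skip words never seen in training, and the toy spam/ham data are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with add-alpha smoothing.
    docs: list of token lists; labels: one class label per doc."""
    vocab = {t for doc in docs for t in doc}
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        term_counts[y].update(doc)
    log_prior = {y: math.log(n / len(labels)) for y, n in class_counts.items()}
    log_lik = {}
    for y in class_counts:
        total = sum(term_counts[y].values())
        # Smoothed estimate of P(word | class y)
        log_lik[y] = {t: math.log((term_counts[y][t] + alpha) /
                                  (total + alpha * len(vocab)))
                      for t in vocab}
    return log_prior, log_lik


def predict_nb(model, doc):
    """Return the class with the highest log posterior score;
    words not seen in training are ignored."""
    log_prior, log_lik = model
    scores = {y: log_prior[y] + sum(log_lik[y][t] for t in doc if t in log_lik[y])
              for y in log_prior}
    return max(scores, key=scores.get)


docs = [["cheap", "pills", "buy"], ["buy", "cheap", "now"],
        ["meeting", "tomorrow", "agenda"], ["agenda", "for", "meeting"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb(model, ["buy", "cheap", "pills"]))
```

Working with sums of logs rather than products of probabilities avoids numerical underflow on long documents.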

(7) Miscellaneous

Coverage of these topics will depend on the time available. Possibilities include: anomaly detection; label propagation; learning association rules; graph partitioning; partial least squares.
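Association rules such as {A} -> {B} are usually screened with three numbers: support (how often A and B occur together), confidence (an estimate of P(B | A)), and lift (confidence divided by the overall frequency of B, so lift above 1 means A and B co-occur more than chance would suggest). The Python sketch below only computes these statistics for a given rule; it does not search for rules the way a mining algorithm such as Apriori does. The function names and the toy baskets are illustrative assumptions, not the playlists data.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(b) for b in baskets) / len(baskets)


def rule_stats(lhs, rhs, baskets):
    """Support, confidence, and lift of the rule lhs -> rhs."""
    s_both = support(set(lhs) | set(rhs), baskets)
    s_lhs = support(lhs, baskets)
    s_rhs = support(rhs, baskets)
    confidence = s_both / s_lhs   # estimate of P(rhs | lhs)
    lift = confidence / s_rhs     # > 1: rhs more likely given lhs than overall
    return s_both, confidence, lift


# Toy baskets (e.g. artists appearing on the same playlist)
baskets = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}, {"a", "b"}]
print(rule_stats({"a"}, {"b"}, baskets))
```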

Scripts and data:

playlists.R and playlists.csv

Readings:

Pradeep Ravikumar's notes on association rule mining
