STA 380: Predictive Modeling

Welcome to part 2 of STA 380, a course on predictive modeling in the MS program in Business Analytics at UT-Austin. All course materials can be found through this GitHub page. Please see the course syllabus for links and descriptions of the readings mentioned below.

Scribe notes and exercises

You can find the up-to-date collection of scribe notes here.

The first set of exercises is available here.

Topics

The readings listed below are not yet complete, but the topics list is accurate.

(1) The data scientist's toolbox

Good data-curation and data-analysis practices; R; Markdown and RMarkdown; the importance of replicable analyses; version control with Git and GitHub.

Readings:

(2) Exploratory analysis

Contingency tables; basic plots (scatterplot, boxplot, histogram); lattice plots; basic measures of association (relative risk, odds ratio, correlation, rank correlation).
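
As a rough illustration of two of the association measures named above, here is a minimal sketch of relative risk and odds ratio for a 2x2 contingency table. It is written in Python with the standard library only (the course scripts themselves use R). The function name `two_by_two_measures`, the table layout, and the counts are all made up for the example.

```python
def two_by_two_measures(a, b, c, d):
    """Relative risk and odds ratio for a 2x2 table.

    Assumed layout (a convention chosen for this sketch):
                     outcome   no outcome
        exposed         a          b
        unexposed       c          d
    """
    risk_exposed = a / (a + b)        # P(outcome | exposed)
    risk_unexposed = c / (c + d)      # P(outcome | unexposed)
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)    # cross-product ratio
    return relative_risk, odds_ratio


# Hypothetical counts, purely for illustration.
rr, odds = two_by_two_measures(a=30, b=70, c=10, d=90)
print(f"relative risk = {rr:.2f}, odds ratio = {odds:.2f}")
```

The sketch does no checking for zero cells; a real analysis would need to decide how to handle them.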

Scripts and data:

Readings:

(3) Resampling methods

The bootstrap and the permutation test; joint distributions; basic moment identities for linear combinations; using the bootstrap to approximate value at risk (VaR).
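
For orientation, the two resampling ideas named above can be sketched in a few lines. This is a minimal Python sketch using only the standard library (the course walkthroughs are in R). The function names, the default of 2000 resamples, and the fixed seeds are assumptions made for the example, not part of the course materials. The permutation test here compares two group means, which with 0/1 outcomes corresponds to the 2x2 table setting.

```python
import random
import statistics


def bootstrap_se_of_mean(data, n_boot=2000, seed=0):
    """Bootstrap estimate of the standard error of the sample mean."""
    rng = random.Random(seed)
    n = len(data)
    boot_means = []
    for _ in range(n_boot):
        # Resample n observations with replacement from the original data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(resample))
    # The spread of the resampled means approximates the sampling variability.
    return statistics.stdev(boot_means)


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling the pooled data breaks any link between label and value.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Made-up numbers, purely for illustration.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
print("bootstrap SE of the mean:", bootstrap_se_of_mean(sample))
print("permutation p-value:", permutation_p_value([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16]))
```

The same resampling loop extends to other statistics, such as a quantile of simulated portfolio returns, by swapping out the quantity computed on each resample.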

Scripts:

Readings:

  • ISL Section 5.2 for a basic overview.
  • These notes, pages 99-111. This is an introduction to the bootstrap from the (by now familiar) perspective of linear regression modeling, but it conveys the essential idea.
  • This R walkthrough on using the bootstrap to estimate the variability of a sample mean.
  • Another R walkthrough on the permutation test in a simple 2x2 table.
  • Any basic explanation of the concept of value at risk (VaR) for a financial portfolio, e.g. here, here, or here.

Optionally, Shalizi (Chapter 6) has a much lengthier treatment of the bootstrap, should you wish to consult it.

(4) Latent classes

Basics of clustering; K-means clustering; mixture models; hierarchical clustering.
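
To make the K-means item above concrete, here is a minimal sketch of the usual alternating scheme (often called Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its points. It is plain Python with no external libraries; the function names, the random initialization from the data, and the toy points are assumptions for the example, and the course scripts may do this differently.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Basic K-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Initialize the centers as k distinct data points chosen at random.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # leave an empty cluster's center where it was
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
    return centers, labels


# Two well-separated toy blobs, purely for illustration.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
toy_centers, toy_labels = kmeans(toy_points, k=2)
print(toy_centers, toy_labels)
```

K-means only finds a local optimum, so in practice it is common to rerun it from several random starts and keep the best solution.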

Scripts and data:

Readings:

(5) Latent features and structure

Principal component analysis (PCA); factor analysis; canonical correlation analysis; multi-dimensional scaling.
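
As a small illustration of the PCA item above, the first principal component is the direction of greatest variance, i.e. the leading eigenvector of the sample covariance matrix. The sketch below finds it by power iteration in plain Python. The function name, the starting vector, and the iteration count are assumptions for the example; in practice one would call a library routine (for instance `prcomp` in R) rather than hand-roll this.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Leading eigenvector of the sample covariance matrix, via power iteration."""
    n, d = len(rows), len(rows[0])
    # Center each column at its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Arbitrary starting direction (an assumption; it must not be exactly
    # orthogonal to the true leading eigenvector).
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        # Multiply by the covariance matrix, then rescale to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # zero variance along the current direction; stop iterating
        v = [x / norm for x in w]
    # Variance of the data along the direction v.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance


# Toy data lying exactly on the line y = 2x, purely for illustration.
direction, variance = first_principal_component([(1, 2), (2, 4), (3, 6), (4, 8)])
print(direction, variance)
```

Note that the sign of a principal component is arbitrary, so a library may return the same direction with every entry negated.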

Scripts and data:

Readings:

  • ISL Section 10.2 for the basics.
  • Shalizi Chapters 18 and 19 (more advanced). In particular, Chapter 19 covers factor analysis in considerably more depth than we did in class.
  • Elements Chapter 14.5 (more advanced).

(6) Text data

Co-occurrence statistics; naive Bayes; TF-IDF; topic models; vector-space models of text (if time allows).
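
To give a feel for the TF-IDF item above, here is a minimal sketch in plain Python. TF-IDF has several common variants; this one assumes term frequency is the within-document proportion and inverse document frequency is log(N / document frequency), which may differ from the weighting used in the course scripts. The function name, the whitespace tokenizer, and the toy documents are made up for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """TF-IDF weights for a list of strings; returns one {term: weight} dict per document."""
    # Crude tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term proportion in this document times log inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, purely for illustration.
toy_docs = ["the cat sat", "the dog sat", "the bird flew"]
toy_weights = tf_idf(toy_docs)
print(toy_weights[0])
```

A term that appears in every document (here "the") gets weight zero under this variant, which is the intended effect: it carries no information for telling documents apart.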

Scripts and data:

Readings:

(7) Miscellaneous

Coverage of these topics will depend on the time available. Possibilities include: anomaly detection; label propagation; learning association rules; graph partitioning; partial least squares.

Scripts and data:

Readings:
