APMAE4990 - Introduction to Data Science in Industry

Instructor: Dorian Goldman

Term: Spring 2016

Time and location: Thursdays (R) 7:00pm-9:30pm, 214 Seeley W. Mudd Building

Objectives:

This course is designed for graduate and advanced undergraduate students who wish to learn the fundamentals of data science and machine learning in the context of real-world applications. The emphasis will be on problems faced by companies such as Amazon, Booking.com, Netflix and others, with a slight tilt toward problems arising at The New York Times, where I was a data scientist. Although the focus is on applications, the course will be mathematically rigorous; the goal is to motivate each theorem and technique with a concrete problem arising in industry. The course will follow an online IPython notebook in which students can try out various algorithms in real time as we go through the course.

There will be no midterms or exams. Instead, assignments will be handed in periodically throughout the term. The final project will be yours to choose, but will ideally be a production-ready tool, delivered as a web app, that uses some of the methods taught in this class (or others) to solve a concrete problem.

Prerequisites:

Exposure to undergraduate-level probability, statistics, graph theory, algorithms, and linear algebra is strongly encouraged, but these topics will be covered as we encounter them.

Grading:

  • 30% Assignments
  • 70% Final Project

Tentative Course Outline:

Introduction

  • Problems that arise in industry involving data.
  • Introduction to regression, classification, clustering. Model training and evaluation.

Predictive learning (Supervised)

  • Predicting Virality of Content (Regression: Linear Regression, Random Forest)
  • User Churn, Acquisition and Conversion. (Classification. Exponential Family.)
  • Model selection and feature selection. Regularization. Real world performance evaluation.

Descriptive Learning (Unsupervised)

  • Clustering users (Clustering and Support Vector Machines)
  • Correlation of features. Principal Component Analysis.

Prescriptive Modeling and A/B tests

  • A/B experiments. Causal inference introduction.
  • Uplift Modeling. How do we target who should have received treatment?
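
As a rough illustration of the kind of calculation the A/B experiments topic involves, here is a minimal sketch of a two-sided two-proportion z-test on conversion counts. It is not taken from the course materials: the function name `two_proportion_z_test` and the counts in the usage lines are hypothetical, and the test assumes samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a, conv_b: number of conversions in the control and treatment groups.
    n_a, n_b: number of users in each group.
    Returns (z, p_value). Assumes large samples (normal approximation).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts, for illustration only.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

A test like this only says whether the average treatment effect is distinguishable from noise; deciding which users should receive the treatment is the separate question that uplift modeling addresses.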

Intro to Data Engineering

  • MapReduce. SQL. Bash.

Recommendation Engines and Personalization

  • Diffusion on Graphs and NYT Article Recommendations.
  • Topic Modeling.
  • Introduction to Bayesian statistics. Bayesian vs. Frequentist approach.
  • Multi-armed Bandits. Thompson Sampling. LinUCB.
  • Cold Starts. Continuous Cold Starts. Warm Starts.

Time Series Analysis and Paper Distribution

  • The paper distribution problem at The New York Times.
  • Review of Random Variables and Distributions.
  • Time Series Models. Autoregressive Models. Poisson Regression. Negative Binomial Regression.
  • The Newsvendor Problem and profit optimization.
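
To give a flavor of the Newsvendor Problem, the sketch below computes a stocking quantity from the classical critical-ratio rule: stock the smallest quantity q whose demand CDF reaches (price - cost) / price. This is an illustration under stated assumptions, not the model used at The New York Times: demand is assumed to be Poisson, unsold copies are assumed to have no salvage value, and the function name `newsvendor_quantity` and all numbers are hypothetical.

```python
import math


def newsvendor_quantity(price, cost, mean_demand):
    """Smallest q with P(demand <= q) >= (price - cost) / price.

    Assumes Poisson demand with the given mean and zero salvage value.
    Intended for moderate means (exp(-mean_demand) must not underflow).
    """
    if not 0 < cost < price:
        raise ValueError("expected 0 < cost < price")
    # Underage cost is (price - cost), overage cost is cost.
    critical_ratio = (price - cost) / price
    q = 0
    pmf = math.exp(-mean_demand)  # P(demand = 0)
    cdf = pmf
    while cdf < critical_ratio:
        q += 1
        pmf *= mean_demand / q  # Poisson recurrence: P(q) = P(q-1) * mean / q
        cdf += pmf
    return q


# Hypothetical numbers, for illustration only.
print(newsvendor_quantity(price=2.50, cost=1.00, mean_demand=100))
```

The higher the margin relative to the unit cost, the higher the critical ratio, so the rule stocks above mean demand for high-margin items and below it for low-margin ones.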

References

These are references to deepen your understanding of material presented in lecture. The list is by no means exhaustive.

Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.

Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning, Springer, 2013.

Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

Cameron Davidson-Pilon, Bayesian Methods for Hackers, https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
