This tutorial provides an example-driven introduction to probabilistic programming. Through real-world examples, you will see how to perform parameter estimation, case/control analysis, hierarchical modelling, and regression analysis. Theory will be introduced where appropriate, but it is not the focus of the tutorial; the structured examples are what will drive the learning.
If you feel limited by your current toolkit or analytic workflow when it comes to expressing and solving modeling problems and accounting for statistical uncertainty, then this tutorial is for you. We assume intermediate Python proficiency, including the use of context managers, Python objects and their methods, and the ability to read code documentation. We also assume some knowledge of introductory statistics (e.g. definitions of central tendency and variance measures). If you've ever performed a t-test and know what a Gaussian distribution looks like, you have all the background you need. By the end of the tutorial, you will be able to describe a problem as a probabilistic model and fit that model using PyMC3.
- Introduction (10 min)
- Bayes Theorem
- Why uncertainty?
- Features of a PP language (intro to PyMC3)
- Distribution library
- Syntax
- Sampling algorithms <-- fancy math for lazy programmers (and the algebra-blind)!
- Where Bayesian models are (and are not) useful
- Comparing two groups with binary outcomes (30 min)
- Worked example: Coin flip (10 min)
- Hands-on coding: Sepsis deaths (15 min)
- Discussion/questions (5 min)
- Break (5 min)
- Comparing two groups with continuous outcomes (40 min)
- Worked example: IQ drug (10 min)
- Hands-on coding: Radon contamination (20 min)
- Discussion/questions (5 min)
- Note to self: someone might ask how this compares to a t-test.
- Break (10 min)
- Regression Analysis (40 min)
- Problem class description (given Xs, predict Y, but now with uncertainty) (5 min)
- Worked example: Low birthweight infants (10 min)
- Hands-on coding: Sepsis deaths (20 min)
- Discussion/questions (5 min)
- Break (5 min)
- Hierarchical Modelling (40 min)
- Rationale (5 min)
- Worked example: School test performance (10 min)
- Hands-on coding: Radon contamination (20 min)
- Discussion/questions (5 min)
- End
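
To give a flavour of the "Bayes Theorem" and "Coin flip" items in the outline above, here is a minimal sketch of estimating a coin's probability of heads by applying Bayes' theorem on a grid of candidate values. It deliberately uses plain Python rather than PyMC3, so it is not the tutorial's own code; the data (7 heads in 10 flips), the flat prior, and all variable names are assumptions made up for illustration.

```python
# Hypothetical data, invented for this sketch: 10 flips, 7 heads.
n_flips = 10
n_heads = 7

# Candidate values for p, the probability of heads: 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]

# Assumed flat prior: every candidate value of p is equally plausible.
prior = [1.0 for _ in grid]

# Binomial likelihood of the data at each p. The binomial coefficient is
# omitted because it is constant in p and cancels on normalisation.
likelihood = [p ** n_heads * (1 - p) ** (n_flips - n_heads) for p in grid]

# Bayes' theorem: posterior is proportional to prior times likelihood.
unnormalised = [pr * lk for pr, lk in zip(prior, likelihood)]
evidence = sum(unnormalised)
posterior = [u / evidence for u in unnormalised]

# Summaries of the posterior over the grid.
posterior_mean = sum(p * w for p, w in zip(grid, posterior))
map_estimate = grid[posterior.index(max(posterior))]

print("posterior mean:", round(posterior_mean, 3))
print("most probable p on the grid:", map_estimate)
```

A grid works here because there is only one parameter. With more parameters the grid grows too quickly to be practical, which is where the sampling algorithms listed in the outline come in.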
Eric finished his doctoral degree in the Department of Biological Engineering at MIT, where he studied influenza evolution and ecology. He currently works at the Novartis Institutes for Biomedical Research, where he is an Investigator on the Scientific Data Analysis team. He has given talks & tutorials on practical aspects of Network Analysis and Bayesian Statistics at a variety of Python conferences, including PyCon, SciPy, and PyData (recordings available online). His website is www.ericmjl.com.
Chris is a professor of Statistics at Vanderbilt University, and is the creator and one of the lead maintainers of PyMC3. His research interests span computational and Bayesian statistics, epidemiology, and meta-analysis.
- Make sure you have Python installed. The Anaconda distribution is recommended.
- Run the environment script to get set up:
  $ bash conda_environment.sh
- Install all packages specified in environment.yml using your favourite package manager (e.g. pip, conda).