BIOSTAT 578A: Bioinformatics for Big Omics Data

Instructor: Raphael Gottardo, PhD, Fred Hutchinson Cancer Research Center

If you need to contact me, please email me at [email protected].

Time and location: Tuesday 10:30-11:50, HST T531; Thursday 10:30-11:50, HST T747

Prerequisite: BIOSTAT 511/12 or permission of the instructor. Please email me if you're unsure.

Grading scheme (Tentative): HW (40%), Midterm (30%), Final project (30%)

Important dates: Midterm (Feb 20); final project presentations in the last 2 weeks of class (March 4, 6, 11 & 13).

Scope: This practical, hands-on course in bioinformatics for high-dimensional omics emphasizes how to use statistical methods, together with the R programming language and the Bioconductor project, as tools to manipulate, visualize and analyze real-world omics datasets. The course will be organized around the following topics:

Introduction to computing for Bioinformatics using R: Introduction to R/RStudio, review of main data structures and tools for efficient and reproducible research, data manipulation and visualization
Managing "big omics data" using relational databases: Overview of main database management systems (MySQL, Postgres, SQLite), and review of the Structured Query Language and main operations
How to connect to a database from R, and alternatives to databases in R (sqldf and data.table)
How to evaluate and adjust the data for the presence of "batch effects"
Regression techniques for high throughput biomedical data: Multiple regression analysis and logistic regression, ANOVA and design of experiments
Statistical methods for high dimensional hypothesis testing: Permutation tests, empirical Bayes and multiple comparison adjustment
Modeling of gene expression data: Introduction to Bioconductor, and basic packages for gene expression analysis (GEOquery, Limma, DAVIDquery, etc)
Genome-wide association studies and eQTLs; review of main packages in R/Bioconductor (e.g. rqtl)
Overview of other high-throughput technologies (e.g. RNA-seq, ChIP-seq) and available tools in R/Bioconductor
Data integration: Using R to integrate multiple data types and perform "systems biology" type analysis
Drawbacks and limitations of high dimensional omics analysis (overfitting, inference)
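
As an illustration of the multiple comparison adjustment topic listed above, here is a minimal sketch of the Benjamini-Hochberg false discovery rate adjustment. It is written in Python with only the standard library, not in the R used in the course; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the course material.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-value increases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, e.g. from four gene-level tests.
pvals = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(pvals)
print(adj)
```

In R, the same adjustment is available through the base function `p.adjust` with `method = "BH"`.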

Note that this is a tentative outline and minor modifications are likely to occur. Please check this page regularly for updates.

Lecture notes:

01/07/14 Introduction to R
01/08/14 Same as above and Advanced graphics in R
01/14/14 Advanced data manipulation in R
01/16/14 Advanced data manipulation in R & Biology basics
01/21/14 Intro to Microarrays & Normalization & Probe summary
01/23/14 Probe summary & Intro to Bioconductor
01/28/14 Differential expression
02/03/14 Differential expression & Batch correction
02/06/14 Sequence analysis in R
02/11/14 RNA-Seq Data analysis
02/13/14 RNA-Seq Data analysis
02/18/14 Gene set enrichment analysis
02/20/14 In class midterm. Andrew McDavid will proctor the exam
02/25/14 In class discussion of the final projects
02/27/14 ChIP-seq and perhaps start talking about prediction
03/04/14 Finish talking about predictions and talk about projects
03/06/14, 03/10/14, 03/13/14 Project presentations
