Microbial Ecology Group (MEG) - AMR++ bioinformatics workshop

Course syllabus

Start Date: October 10, 2022

This Dropbox folder contains all of the videos from our Zoom course sessions, along with recordings from a previous MEG bioinformatics workshop.

Course content

Course summary
Learning objectives
Bioinformatics overview
Statistics overview
Resources

Summary

These lessons are designed to introduce researchers to the R programming language for statistical analysis of metagenomic sequencing data. While we are primarily developing these training resources for the Microbial Ecology Group (MEG), we would love to get your input on improvements to any component so that we can one day provide this as a useful public resource. As the lessons are meant to be an informal collection of resources and tutorials, we have liberally used parts and pieces of other online lessons and tailored them for our purposes. We attempt to give credit when possible by linking the original source, and we are happy to hear recommendations for other resources to include.

We wholeheartedly encourage students to independently troubleshoot the majority of problems they might encounter by:

googling it (or using another search engine)
getting help from other students by using our Slack group channel #2021-AMR++workshop
searching bioinformatics forums such as stackoverflow.com, biostars.org, and seqanswers.com

Workshop details

Learning objectives:

Upon completion of these lessons, students will:

have their computer set up with the R and RStudio software
know how to read-in count matrices from bioinformatic analysis of sequence data
be able to explore and summarize bioinformatic results using:
- diversity indices and box plots
- ordination with non-metric multidimensional scaling (NMDS)
- heatmaps
be familiar with common statistical techniques such as:
- Wilcoxon test
- Generalized linear models
- Analysis of similarities (ANOSIM)
- Differential abundance testing using a zero-inflated Gaussian (ZIG) model
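
As a rough illustration of two of these objectives (diversity indices and the Wilcoxon test), here is a minimal sketch in base R. The count values, the sample names, the group labels, and the helper function `shannon` are all made up for this example and are not taken from the workshop scripts, which may compute diversity differently (for instance with a dedicated package).

```r
# Toy count matrix (made-up numbers). matrix() fills column by column,
# so each line of four values below is one sample.
counts <- matrix(
  c(10,  0,  5, 20,
     3,  8,  0, 12,
     0, 15,  7,  1,
    25,  2,  9,  4,
     6, 11, 14,  0,
     9,  1,  3, 30),
  nrow = 4,
  dimnames = list(paste0("feature", 1:4), paste0("sample", 1:6))
)

# Hypothetical helper: Shannon diversity index for one sample,
# H = -sum(p * log(p)) over the features with non-zero counts.
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# One diversity value per sample (apply over columns).
diversity <- apply(counts, 2, shannon)

# Assumed study design: first three samples are controls, last three are treated.
group <- factor(rep(c("control", "treated"), each = 3))

# Box plot of diversity by group (only drawn in an interactive session).
if (interactive()) {
  boxplot(diversity ~ group, ylab = "Shannon index")
}

# Wilcoxon rank-sum test comparing diversity between the two groups.
result <- wilcox.test(diversity ~ group)
print(diversity)
print(result$p.value)
```

With only three samples per group the test has very little power, so treat this purely as a template for the function calls rather than as a meaningful analysis.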

Instructors

Group email: [email protected]

Dr. Paul Morley -- [email protected]

Dr. Noelle Noyes -- [email protected]

Peter Ferm -- [email protected]

Dr. Lee Pinnell -- [email protected]

Dr. Enrique Doster -- [email protected]

Dr. Lisa Perez -- [email protected]

Bioinformatic overview

The metagenomic sequencing approach you use determines the type of analysis you can perform:

Shotgun metagenomic sequencing
- can analyze both the microbiome and resistome, in addition to other sequences such as plasmid-associated or virulence factors
Target-enriched resistome sequencing (MEGARes baits)
- can only analyze the resistome
16S rRNA amplicon sequencing
- can only analyze the microbiome

In this repository, we'll show you examples of running variants of the AMR++ pipeline to achieve your bioinformatic analysis goals. We'll be using code found in this repository of bioinformatic pipelines.

AMR++ pipeline
- The main_AmrPlusPlus_v2_withKraken.nf script analyzes shotgun metagenomic sequencing reads. It characterizes the microbiome using the taxonomic classifier kraken2, and characterizes the resistome by aligning reads to our MEGARes database.
- The main_AmrPlusPlus_v2.nf script is simply a subset of the entire pipeline and only performs the resistome analysis.
Qiime2 pipeline
- We use the Qiime2 pipeline to analyze 16S rRNA reads and export the results to a file format that we can analyze with R.

Statistics overview

Remember, the analysis will always have to be based on your study design and performed with the goal of testing your a priori hypotheses. The scripts in this repository are merely meant to provide an outline for you to begin your analysis and branch off as needed.

Using RStudio, download everything in this repository and change your working directory to the newly downloaded AMRplusplus_bioinformatic_workshop directory. Start by opening the script on the main page, Stats_overview_script.R, and follow along for a brief explanation of how each of the scripts below fits into your analysis.

If you don't have RStudio installed, click on the link below to explore our test dataset using Binder and RStudio:

Otherwise, follow the instructions in this tutorial for installing R and RStudio on your personal computer.

The data exploration and statistical analysis we will cover are divided into four main steps, each with associated scripts:

Loading count matrix results from bioinformatic analyses into R
- Load qiime microbiome data
- Load MEGARes resistome data
Calculating summary statistics
- Calculating summary statistics
Normalizing counts and creating exploratory figures
Running some common statistical tests
- Basic stats with R
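
The first three steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: the inline CSV text, the feature and sample names, and the counts are invented, and simple relative abundance (total-sum scaling) is swapped in as a stand-in for whatever normalization the workshop scripts actually apply.

```r
# Step 1: load a count matrix. A real analysis would read a file exported by
# the pipeline; here a small made-up table is read from inline text instead.
csv_text <- "feature,S1,S2,S3
geneA,10,0,5
geneB,30,20,15
geneC,60,80,0
"
counts <- as.matrix(
  read.csv(text = csv_text, row.names = 1, check.names = FALSE)
)

# Step 2: summary statistics per sample.
depth    <- colSums(counts)      # total counts per sample
richness <- colSums(counts > 0)  # number of features observed per sample

# Step 3: normalize counts so samples with different depths are comparable.
# Assumption: total-sum scaling, i.e. divide each column by its column total.
rel_abund <- sweep(counts, 2, depth, "/")

print(depth)
print(richness)
print(round(rel_abund, 2))
```

After normalization every column of `rel_abund` sums to 1, which is a quick sanity check before moving on to exploratory figures and statistical tests.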

Resources:

MEG resources

R programming

RStudio cheatsheets
- This website has tons of helpful cheatsheets for various R packages and analysis methods. It also includes cheatsheets translated into other languages.
YaRrr! The Pirate’s Guide to R
- This is a free online book that goes over many useful topics in a quirky, but fun way! Follow along with our simplified R scripts in Lesson 1 and reference this book if you have any other questions.
R programming Coursera course
- This free Coursera course goes in-depth with all of the functionality of R. It combines videos with example R scripts for you to follow along with. We recommend this course after you have been playing around with R a bit and want to learn more about the details of how R works.
Introduction to R workshop
- We haven't personally tried this workshop, but they have a combination of videos, slides, and R code for various topics.
ggpubr
- Nice package for "publication-ready" figures.
Harvard's Data Science: R Basics

Data visualization

dataviz project
- This website is for a private company, but they have a great interface for exploring different figure types
Visual vocabulary
- Handy outline and explanation for the uses of different plots.
- You can also check out this interactive figure of the same material
FT Visual Journalism Team
- Awesome site with articles covering various topics and with the emphasis on creating awesome graphics to convey
Interactive Jupyter notebooks
- Also use this site for neat Jupyter tips and tricks
GGplot colors and themes
More ggplot colors and themes
Interactive heatmaps

Command-line

Explain shell
- cool website that explains bash commands piece by piece

Statistics resources

GUide to STatistical Analysis in Microbial Ecology (GUSTA ME)
Diversity indices
LHS 610: Exploratory Data Analysis for Health
- We haven't personally tried this course, but they provide great videos and code examples for learning how to explore data using R.
R-specific resources
- Amelia McNamara STATS 220 in R
Choose the right test
Batch effects
Misc
- #bioinformatics live Twitter feed
- Collaborative spreadsheet of resources

Funding Information:

The development of this tutorial was supported in part by USDA NIFA Grant No. 2018-51300-28563, University of Minnesota College of Veterinary Medicine, The VERO Program at Texas A&M University and West Texas A&M University, and the State of Minnesota Agricultural Research, Education, Extension and Technology Transfer program.
