dge_workshop's Introduction

THIS REPO IS ARCHIVED, PLEASE GO TO https://hbctraining.github.io/main FOR CURRENT LESSONS.

Differential gene expression workshop

| Audience | Computational skills required | Duration |
|----------|-------------------------------|----------|
| Biologists | Introduction to R | 1.5-day workshop (~10 hours of trainer-led time) |

Description

This repository has teaching materials for a 1.5-day, hands-on Introduction to differential gene expression (DGE) analysis workshop. The workshop will lead participants through performing a differential gene expression analysis workflow on RNA-seq count data using R/RStudio. Working knowledge of R, or completion of the Introduction to R workshop, is required.

Learning Objectives

  • QC on count data using Principal Component Analysis (PCA) and hierarchical clustering
  • Using DESeq2 to obtain a list of significantly different genes
  • Visualizing expression patterns of differentially expressed genes
  • Performing functional analysis on gene lists with R-based tools

These materials are developed for a trainer-led workshop, but are also amenable to self-guided learning.

Lessons

Below are links to the lessons and suggested schedules:

Installation Requirements

  1. Download the most recent versions of R and RStudio for your laptop.
  2. Install the following packages using the instructions provided below.

NOTE: When installing the following packages, if you are asked to select (a/s/n) or (y/n), please select "a" or "y" as applicable, but be aware that installation can take a while.

(a) Install the below packages on your laptop from CRAN. You DO NOT have to go to the CRAN webpage; you can use the following function to install them one by one:

install.packages("insert_first_package_name_in_quotations")
install.packages("insert_second_package_name_in_quotations")
# ... and so on

Packages to install from CRAN (note that these package names are case sensitive!):

  • BiocManager
  • RColorBrewer
  • pheatmap
  • ggrepel
  • devtools
  • tidyverse

(b) Install the below packages from Bioconductor, using the BiocManager::install() function once for each of the 9 packages:

BiocManager::install("insert_first_package_name_in_quotations")
BiocManager::install("insert_second_package_name_in_quotations") 

Packages to install from Bioconductor (note that these package names are case sensitive!):

  • DESeq2
  • clusterProfiler
  • DOSE
  • org.Hs.eg.db
  • pathview
  • DEGreport
  • EnsDb.Hsapiens.v86
  • AnnotationHub
  • ensembldb

  3. Finally, please check that all the packages were installed successfully by loading them one at a time using the library() function.

library(DESeq2)
library(ggplot2)
library(RColorBrewer)
library(pheatmap)
library(ggrepel)
library(clusterProfiler)
library(DEGreport)
library(org.Hs.eg.db)
library(DOSE)
library(pathview)
library(tidyverse)
library(EnsDb.Hsapiens.v86)
library(AnnotationHub)
library(ensembldb)

  4. Once all packages have been loaded, run sessionInfo().
sessionInfo()
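
The install-and-check steps above can also be scripted. What follows is a minimal sketch in base R, assuming the two package lists given above; the helper name missing_packages is hypothetical and not part of the workshop materials. It only reports which packages are not yet installed, and the install calls are left commented out so nothing is installed by accident.

```r
# Package names copied from the lists above (case sensitive).
cran_pkgs <- c("BiocManager", "RColorBrewer", "pheatmap", "ggrepel",
               "devtools", "tidyverse")
bioc_pkgs <- c("DESeq2", "clusterProfiler", "DOSE", "org.Hs.eg.db",
               "pathview", "DEGreport", "EnsDb.Hsapiens.v86",
               "AnnotationHub", "ensembldb")

# Hypothetical helper: return the packages that are not installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

missing_cran <- missing_packages(cran_pkgs)
missing_bioc <- missing_packages(bioc_pkgs)

print(missing_cran)
print(missing_bioc)

# To install whatever is missing, uncomment the lines below.
# Both functions accept a character vector of package names.
# if (length(missing_cran) > 0) install.packages(missing_cran)
# if (length(missing_bioc) > 0) BiocManager::install(missing_bioc)
```

If both printed vectors are empty (character(0)), every package in the lists is already installed.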

These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

dge_workshop's People

Contributors

jihe-liu, marypiper, mistrm82, rkhetani

dge_workshop's Issues

plots in deseq2

Hi,
First, totally new to rnaseq analysis... figured out the command line portion (star, stringtie) and also managed deseq2 in R.
So I figured out how to analyze the differential expression in R, got actual fold change values in the command line tables!! But... I can't figure out how to 'save' the resulting expression analysis as a csv file. I did this below but I have no idea where the file is; I checked all of the folders, don't see it.

write.table(res, file="ctrlVS10um.txt", append = FALSE, sep = "\t" )

The other issue is that when I tried plotMA or plotCounts, it looks like it's running, but I have no idea where to find the graphs. Are they supposed to pop up in my command line window? How would I find the graphs so I can save and export them onto my local computer?

I've been googling both, but all I find are instructions on how to generate graphs or save files, nothing on how to actually save them or where they go.

Thank you.
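
For readers with the same question: a file written with a relative name such as "ctrlVS10um.txt" goes into R's current working directory, and plots go to whichever graphics device is open. The sketch below uses only base R and a made-up stand-in table (res_df) in place of real DESeq2 results; the output folder (out_dir) and file names are illustrative assumptions.

```r
# Relative file names are resolved against the working directory.
print(getwd())

# Stand-in for a results table (made-up values); with DESeq2 results
# you would typically start from as.data.frame(res).
res_df <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  log2FoldChange = c(1.2, -0.8, 0.1),
  padj = c(0.001, 0.04, 0.9)
)

# Write to an explicit location so the file is easy to find.
# out_dir is a hypothetical choice; any writable folder works.
out_dir <- tempdir()
csv_path <- file.path(out_dir, "ctrlVS10um.csv")
write.csv(res_df, file = csv_path, row.names = FALSE)
print(csv_path)

# To save a plot: open a file device, draw, then close the device.
# A call such as plotMA(res) would go where plot() is used here.
pdf_path <- file.path(out_dir, "example_plot.pdf")
pdf(pdf_path, width = 7, height = 5)
plot(res_df$log2FoldChange, -log10(res_df$padj),
     xlab = "log2 fold change", ylab = "-log10 adjusted p-value")
dev.off()
print(pdf_path)
```

When R is run non-interactively (for example with Rscript) and no device has been opened explicitly, plots are written to Rplots.pdf in the working directory, which is another place worth checking.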

results() function needs alpha value

We should be specifying the alpha value in the results() function when extracting our results. By default it uses alpha = 0.1, and since we select genes with padj < 0.05, we should be using alpha = 0.05.

Also, if we use a lfcThreshold, we should specify that in the results() function as well. I think this is what Mike had mentioned to us previously.
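
A sketch of what this could look like, assuming DESeq2 is installed. The simulated data set from makeExampleDESeqDataSet() and the lfcThreshold value of 0.58 are illustrative choices, not values taken from the lessons.

```r
padj_cutoff <- 0.05  # cutoff later used to select significant genes

if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)

  # Simulated data set shipped with DESeq2, used only for illustration.
  dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
  dds <- DESeq(dds)

  # Match alpha to the padj cutoff used for filtering.
  res <- results(dds, alpha = padj_cutoff)

  # If a fold-change threshold is wanted, test against it directly.
  # 0.58 (about 1.5-fold) is an arbitrary illustrative value.
  res_lfc <- results(dds, alpha = padj_cutoff, lfcThreshold = 0.58)

  # Keep genes passing the cutoff; which() drops rows where padj is NA.
  sig <- res[which(res$padj < padj_cutoff), ]
  print(nrow(sig))
}
```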

sleuth pca function plots pca on non-log transformed counts

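The code below uses log-transformed values instead:

# Load the packages used below
library(sleuth)
library(ggplot2)

# Extract normalized counts from the sleuth object and log-transform them
norm_counts <- sleuth_to_matrix(de, "obs_norm", "est_counts")
log_norm_counts <- de$transform_fun_counts(norm_counts)

# Compute PCs with samples as rows
# (summarydata is assumed to be the sample metadata, in the same order as the samples)
pc <- prcomp(t(log_norm_counts))
plot_pca <- data.frame(pc$x, summarydata)

# Plot samples as points colored by genotype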
ggplot(plot_pca, aes(PC1, PC2)) + 
  theme_bw() +
  geom_point(aes(color=genotype)) +
  xlab('PC1') +
  ylab('PC2') +
  scale_x_continuous(expand = c(0.3,  0.3)) +
  #geom_text_repel(aes(x=PC1, y=PC2), label=name) +
  theme(plot.title = element_text(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        axis.text = element_text(size = rel(1.25)))

Error/Issue with code for stripping version from ENSEMBL ids

Hi,

I believe there is an error in the code for stripping version numbers from ENSEMBL IDs, because some version numbers have two digits (lesson 9a).

If you apply the current code to a file that contains any double-digit ENSEMBL version numbers (including the demo salmon files):
Current code: ids.strip <- str_replace(ids, "([.][0-9])", "")
then only the dot and the first digit are removed, so the second digit of the version number is left at the end of the ENSEMBL ID, which results in errors in downstream processes.
ENST00000339924.12
becomes
ENST000003399242
instead of
ENST00000339924

There is probably a more efficient way to do this, but I worked around it by running two steps, stripping double-digit and then single-digit version numbers:
ids.strip <- str_replace(ids, "([.][0-9][0-9])", "")
ids.strip <- str_replace(ids.strip, "([.][0-9])", "")

Best,
Sam
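
One possible single-step alternative is to match a dot followed by one or more digits at the end of the string, which covers version numbers of any length. The sketch below uses base R's sub(); the example IDs other than ENST00000339924.12 are illustrative.

```r
# Example transcript IDs: double-digit version, single-digit version, no version.
ids <- c("ENST00000339924.12", "ENST00000456328.2", "ENST00000450305")

# Remove a dot followed by one or more digits at the end of the string.
ids.strip <- sub("[.][0-9]+$", "", ids)
print(ids.strip)

# The same pattern also works with stringr:
# ids.strip <- stringr::str_replace(ids, "[.][0-9]+$", "")
```

Anchoring the pattern with $ means only a trailing version suffix is removed, and IDs without a version are returned unchanged.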

update normalization lesson

Bring in materials from Mary's BOSC lesson, specifically the table for normalization methods and associated text.

Overview of DGE Analysis Workflow

Hello,

I am new to R, and I cannot follow the step in "Overview of DGE Analysis Workflow" that goes from Salmon (quant.sf) to tximport: https://hbctraining.github.io/DGE_workshop_salmon/lessons/01_DGE_setup_and_overview.html

When I run the code below:

# List all directories containing data
samples <- list.files(path = "./data", full.names = T, pattern="salmon$")

# Obtain a vector of all filenames including the path
files <- file.path(samples, "quant.sf")

# Since all quant files have the same name it is useful to have names for each element
names(files) <- str_replace(samples, "./data/", "") %>%
  str_replace(".salmon", "")

The environment then shows files as character(0) and samples as character (empty), and I cannot run tximport.

I am looking for advice and would really appreciate any help.
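
An empty result from list.files() usually means that "./data" does not exist relative to the current working directory, or that no folder name in it ends in "salmon"; this is a likely cause, not a confirmed diagnosis. The sketch below builds a throwaway folder layout so it can run anywhere; the sample folder names and the helper find_quant_files are hypothetical.

```r
# Build a throwaway project layout that mimics "<sample>.salmon" folders.
project <- file.path(tempdir(), "dge_demo")
for (s in c("ctrl1.salmon", "treated1.salmon")) {
  dir.create(file.path(project, "data", s), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(project, "data", s, "quant.sf"))
}

# Hypothetical helper: find quant.sf files under a data directory,
# warning when the directory itself cannot be found.
find_quant_files <- function(data_dir) {
  if (!dir.exists(data_dir)) {
    warning("Directory not found: ", normalizePath(data_dir, mustWork = FALSE))
    return(character(0))
  }
  samples <- list.files(path = data_dir, full.names = TRUE, pattern = "salmon$")
  files <- file.path(samples, "quant.sf")
  names(files) <- sub("[.]salmon$", "", basename(samples))
  files[file.exists(files)]
}

# Check where R is looking before using a relative path such as "./data".
print(getwd())

files <- find_quant_files(file.path(project, "data"))
print(names(files))
```

In a real session, comparing getwd() with the location of the data folder, and setting the working directory to the project folder if they differ, is a reasonable first check.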

Cannot use FPKM/RPKM/TPM for comparison between samples?

I checked the DESeq2 normalization result, and the column sums also differ between samples, just as with FPKM/RPKM/TPM. Following your reasoning, I think RPGC (1x average coverage) would be better.

Most papers (on cancer) that show gene expression changes using TCGA data adopt FPKM rather than DESeq2-normalized values, even though TCGA provides the raw counts.
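
As background to this question, here is a base-R sketch of the median-of-ratios idea behind DESeq2 size factors, on a made-up 4-gene, 2-sample matrix. It is a simplified illustration, not DESeq2's own code, and the function name median_of_ratios is hypothetical. It shows that the normalized column sums are not expected to be equal: the method scales samples so that the typical gene matches, rather than forcing equal totals.

```r
# Toy counts (made up): sampleB is sequenced twice as deeply as sampleA,
# and gene4 is additionally much more highly expressed in sampleB.
counts <- matrix(
  c(10, 20,
    100, 200,
    50, 100,
    5, 400),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("sampleA", "sampleB"))
)

# Size factor per sample: median over genes of count / geometric mean
# of that gene across samples (computed on the log scale).
median_of_ratios <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(col) {
    keep <- is.finite(log_geo_means) & col > 0
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

size_factors <- median_of_ratios(counts)
normalized <- sweep(counts, 2, size_factors, "/")

print(size_factors)
print(normalized)
print(colSums(normalized))
```

After normalization, gene1 to gene3 have equal values in both samples, while gene4 and therefore the column sums still differ.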

How to use DEseq2 for Differential expression

Hi,
I have a .txt file generated by featureCounts. I want to use DESeq2 for differential expression analysis. Please suggest a script to run the program.
Here is what my input file looks like:
counts.txt

Thank you,
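
A possible starting point is sketched below. It assumes the usual featureCounts layout (one comment line, then the columns Geneid, Chr, Start, End, Strand, Length followed by one count column per sample). The tiny input file, the sample names, and the condition labels are all made up so that the example can run on its own; the DESeq2 step only runs if DESeq2 is installed.

```r
# Write a tiny file in the featureCounts layout (made-up values).
fc_path <- file.path(tempdir(), "counts.txt")
writeLines(c(
  "# Program:featureCounts (header comment line)",
  "Geneid\tChr\tStart\tEnd\tStrand\tLength\tctrl_1.bam\tctrl_2.bam\ttrt_1.bam\ttrt_2.bam",
  "geneA\tchr1\t100\t900\t+\t801\t10\t12\t30\t28",
  "geneB\tchr1\t2000\t2500\t-\t501\t0\t1\t0\t2",
  "geneC\tchr2\t50\t1050\t+\t1001\t100\t95\t90\t110"
), fc_path)

# Read the table; lines starting with "#" are skipped.
fc <- read.delim(fc_path, comment.char = "#", check.names = FALSE)

# Columns 1-6 are annotation; the remaining columns are per-sample counts.
count_matrix <- as.matrix(fc[, -(1:6)])
rownames(count_matrix) <- fc$Geneid

# Sample table; the condition labels are assumptions for this toy example.
col_data <- data.frame(
  condition = factor(c("ctrl", "ctrl", "trt", "trt")),
  row.names = colnames(count_matrix)
)

if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  dds <- DESeqDataSetFromMatrix(countData = count_matrix,
                                colData = col_data,
                                design = ~ condition)
  # With real data (thousands of genes) the analysis would continue with:
  # dds <- DESeq(dds)
  # res <- results(dds, alpha = 0.05)
}
```

The row names of the sample table must match the column names of the count matrix, in the same order, which is why col_data is built from colnames(count_matrix).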
