rna-seq-lesson's People

Contributors

abbycabs, alee, anacost, asarafoglou, brandoncurtis, erinbecker, evanwill, fmichonneau, franskloet, fred-white94, gvwilson, jduckles, jpallen, jsta, katrinleinweber, mawds, maxim-belkin, mgalland, mr-c, neon-ninja, pbanaszkiewicz, pipitone, raynamharris, rgaiacs, sshinne, synesthesiam, tijsbliek, tracykteal, twitwi, wclose

rna-seq-lesson's Issues

scripts/mypca.R

Dear ScienceParkStudyGroup-Team,

I am struggling with your Introduction to RNA-seq lessons.
I managed to start an RStudio instance in the browser and then tried to follow Episode 05. However, I got stuck at the very beginning when I tried to execute:
source("scripts/mypca.R")
This is followed by an error:
Error in file(filename, "r", encoding = encoding) :
cannot open the connection
In addition: Warning message:
In file(filename, "r", encoding = encoding) :
cannot open file 'scripts/mypca.R': No such file or directory

Maybe this is a trivial problem, but I cannot figure out where it comes from.

Help would be appreciated.
Thanks and best,
Matthias
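
This error usually means that the relative path does not resolve from the current R working directory. A minimal diagnostic sketch in base R, assuming (the report does not confirm this) that the session was started outside the lesson's project folder:

```r
# Path taken from the lesson; everything else is an illustrative check.
script_path <- "scripts/mypca.R"

# Relative paths are resolved against the working directory.
cat("Working directory:", getwd(), "\n")

can_source <- file.exists(script_path)
if (can_source) {
  source(script_path)
} else {
  message("'", script_path, "' not found from here. Visible entries:")
  print(list.files())
  # If the lesson folder lives elsewhere, setwd() to it (or open its
  # .Rproj file) and try again.
}
```

If the scripts folder does not appear in the listing, the file may simply not have been downloaded into this RStudio instance.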


Outdated GO terms for enrichment analysis

Apparently, the GO terms in the org.At.tair.db Bioconductor package (release 3.10) are outdated.

In this package, there are 27,416 Arabidopsis genes for a total of 4837 GO terms.

columns(org.At.tair.db)
 [1] "ARACYC"       "ARACYCENZYME" "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL" 
 [7] "GENENAME"     "GO"           "GOALL"        "ONTOLOGY"     "ONTOLOGYALL"  "PATH"        
[13] "PMID"         "REFSEQ"       "SYMBOL"       "TAIR"        

length(keys(org.At.tair.db, keytype = 'TAIR'))
[1] 27416

length(keys(org.At.tair.db, keytype = 'GO'))
[1] 4837

One solution would perhaps be to use the official, most up-to-date GO terms (available here) and perform the enrichment analysis manually.

Broken link in episode 3

Hello,
Thanks for providing this wonderful resource.

I wanted to point out that there is a broken link in episode 3. Specifically, the linked file does not exist, and I couldn't find it anywhere in the repo:

We encourage you to look at the full set of reads and note how the QC results differ when using the entire dataset

We encourage you to look at the [full set of reads](../fastqc/Mov10oe_1-fastqc_report.html) and note how the QC results differ when using the entire dataset

Can't find this file: ../fastqc/Mov10oe_1-fastqc_report.html

Cheers,
Jacob

Searching for regulatory elements

From a list of differential genes.

6.1 Extracting the coordinates of genes

Using biomartr

6.2 Adding or subtracting X nt

For instance, 5000 nt:
If the gene is on the + DNA strand, subtract 5000 nt from its start coordinate.
If the gene is on the - DNA strand, add 5000 nt to its end coordinate.

Promoter retrieval using GenomicRanges
MEME for motif...
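
The strand rule above can be sketched in base R as follows. This is only an illustration under stated assumptions: the gene table, its column names and the helper promoter_window() are hypothetical, and real coordinates would come from an annotation query. Windows are clamped at position 1, but not at the chromosome end, since chromosome lengths are not part of the toy data.

```r
# Hypothetical toy coordinates (1-based, start <= end on both strands).
genes <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  start   = c(10000, 3000, 20000),
  end     = c(12000, 4500, 21000),
  strand  = c("+", "+", "-"),
  stringsAsFactors = FALSE
)

# Return the window of `upstream` nt located upstream of each gene.
promoter_window <- function(genes, upstream = 5000) {
  is_plus <- genes$strand == "+"
  # + strand: upstream means smaller coordinates, before the gene start.
  # - strand: upstream means larger coordinates, after the gene end.
  prom_start <- ifelse(is_plus, genes$start - upstream, genes$end + 1)
  prom_end   <- ifelse(is_plus, genes$start - 1, genes$end + upstream)
  # Avoid coordinates below 1 for genes close to the chromosome start.
  prom_start <- pmax(prom_start, 1)
  data.frame(
    gene_id = genes$gene_id,
    strand = genes$strand,
    prom_start = prom_start,
    prom_end = prom_end,
    stringsAsFactors = FALSE
  )
}

promoters <- promoter_window(genes)
print(promoters)
```

The same windows could presumably be obtained with GenomicRanges, as the note suggests; the base R version only makes the arithmetic explicit.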

episode 02 improvement ideas

For the statistical refresher part:

What is a good p-value histogram? It should have a high peak on the left (near 0), suggesting that the two conditions being compared have different distributions.
If the histogram is uniform, there is no detectable difference between the experimental conditions being tested.

Perhaps also split episode 02 into "statistical refresher" and a new episode termed 03 "statistics applied to RNA-seq"

For episode 02:

  1. population and sample notions
  2. simulate two populations from two countries with different heights.
  3. draw a sample + increase size of sample and make average + sd estimations.
  4. Case 1 = identical populations (= same country)
    • draw N samples of similar size. Say N = 5, N = 10 groups or N = 10,000 groups.
    • perform a t test for each of these three group sizes.
    • draw a p-value histogram for these 3 group sizes.
    • FDR procedure to control for type I error = false positives.
  5. Case 2 = different populations
    • draw N samples of similar size. Say N = 5, N = 10 groups or N = 10,000 groups.
    • perform a t test for each of these three group sizes.
    • draw a p-value histogram for these 3 group sizes.
    • FDR procedure to control for type I error = false positives.
  6. Type I error and type II error.
  7. FDR procedure
  8. Power
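
The simulation outlined in points 2 to 7 could look roughly like the base R sketch below. The mean heights (170 and 180 cm), the standard deviation, the group size and the helper simulate_pvalues() are all made-up values for illustration, not part of the lesson.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Run `n_tests` independent t-tests between two samples of heights
# drawn from normal populations, and return the p-values.
simulate_pvalues <- function(n_tests, n_per_group, mean_a, mean_b, sd = 7) {
  replicate(n_tests, {
    a <- rnorm(n_per_group, mean = mean_a, sd = sd)
    b <- rnorm(n_per_group, mean = mean_b, sd = sd)
    t.test(a, b)$p.value
  })
}

# Case 1: identical populations (same country).
p_null <- simulate_pvalues(1000, n_per_group = 10, mean_a = 170, mean_b = 170)
# Case 2: different populations (assumed 10 cm difference in mean height).
p_diff <- simulate_pvalues(1000, n_per_group = 10, mean_a = 170, mean_b = 180)

# Draw the two p-value histograms when run interactively.
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  hist(p_null, breaks = 20, main = "Identical populations", xlab = "p-value")
  hist(p_diff, breaks = 20, main = "Different populations", xlab = "p-value")
  par(op)
}

# Benjamini-Hochberg adjustment to control the false discovery rate.
padj_null <- p.adjust(p_null, method = "BH")
padj_diff <- p.adjust(p_diff, method = "BH")

cat("Raw p < 0.05, identical populations:", sum(p_null < 0.05), "\n")
cat("Adjusted p < 0.05, identical populations:", sum(padj_null < 0.05), "\n")
cat("Adjusted p < 0.05, different populations:", sum(padj_diff < 0.05), "\n")
```

Changing n_per_group is a simple way to introduce power: with larger groups, more of the tests in the second case fall below the threshold.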

For episode 03 = application to RNA-seq

  • maximise biological replicate sample numbers to increase statistical power.
  • talk about sample sequencing depth = rarefaction curve.
  • p-value histogram profiles and what to do about them.

Useful links
https://www.bioconductor.org/packages/release/bioc/html/qvalue.html

Episode 7 Functional Enrichment Analysis GO/KEGG for non-model organisms

I just wanted to point out that using clusterProfiler with OrgDb objects is not ideal for less well-annotated species. This is the case when the OrgDb comes from AnnotationHub.

This includes rice, for example. The issue is that the OrgDb lacks translations from Entrez IDs to GO terms: ~75% of the input Entrez IDs do not map to GO terms through this method.
Since the OrgDb object does not have an Ensembl keytype, I was forced to translate from Ensembl to Entrez IDs using biomaRt. This also loses some IDs.
A direct translation from Ensembl IDs to GO terms (using biomaRt) leaves only ~39% of genes unmapped.
I am unaware of a method to update OrgDb objects with, for example, new keytypes, but I need to look into it, as this clusterProfiler method for GSEA is unusable for less well-annotated species.

I have not tried creating an OrgDb from NCBI, but I would not recommend using AnnotationHub, as was recommended by the authors of clusterProfiler.

grading table

  • Figures: present / not:

    • PCA: +1
    • Top ten table: +1
    • Volcano plot: +1
    • Heatmap: +1
  • Interpretation of the figures: 0.5 point for each figure commented.

    • PCA: +0.5
    • Top ten table: +0.5
    • Volcano plot: +0.5
    • Heatmap: +0.5
  • Volcano plot:

    • Title: +0.2
    • Explicit arguments: +0.2
    • Set cutoff axes: +0.2
    • Set the limits of the min/max: +0.2
  • Heatmap:

    • Top ten/50/differential genes: +0.2
    • Clustering of the genes: +0.2

assignment ideas

Make a clustering of the samples: step by step.

Find individual genes affected by the treatment and make up a story from it:

  • retrieving info on TAIR.org
  • gene function
  • subcellular localization, etc.

Have each student pick a gene from the list of differentially expressed genes, different from the genes picked by the other students.

Variance stabilisation

Need to add the vst(dds) step before the PCA. It is not correct to plot the PCA without this transformation.
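
To illustrate why the transformation matters, here is a base R sketch on simulated counts. It uses log2(count + 1) as a crude stand-in for vst(), which is a different and more careful transformation provided by DESeq2. The simulated data, the gene and sample names and the helper pc1_share() are invented for the example.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n_samples <- 6
n_genes <- 200

# One very highly expressed gene and many moderately expressed ones.
# No biological signal is simulated: all samples are replicates.
gene_means <- c(1e5, rep(100, n_genes - 1))
counts <- t(sapply(gene_means, function(mu) {
  rnbinom(n_samples, size = 10, mu = mu)
}))
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0("sample", seq_len(n_samples))

# Fraction of the total variance captured by the first principal component.
pc1_share <- function(mat) {
  pca <- prcomp(t(mat))  # samples as rows, genes as columns
  unname(pca$sdev[1]^2 / sum(pca$sdev^2))
}

share_raw <- pc1_share(counts)
share_log <- pc1_share(log2(counts + 1))

# Gene with the largest absolute loading on PC1 of the raw counts.
pca_raw <- prcomp(t(counts))
top_gene_raw <- names(which.max(abs(pca_raw$rotation[, 1])))

cat("PC1 share, raw counts:", round(share_raw, 3), "\n")
cat("PC1 share, log2(count + 1):", round(share_log, 3), "\n")
cat("Gene driving PC1 on raw counts:", top_gene_raw, "\n")
```

On raw counts the variance grows with the mean, so the first component mostly tracks the most highly expressed gene rather than differences between samples.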

biomart cache clear

Add a short example showing what happens when the biomartr function runs. Make a screenshot of the query while it runs.

Then add the biomaRt::biomartCacheClear() piece of code in the main body of the episode, as this is a frequent bug.
