scienceparkstudygroup / rna-seq-lesson
A Carpentries-style lesson on RNA-Sequencing
Home Page: https://scienceparkstudygroup.github.io/rna-seq-lesson/
License: Other
In order to have them available for the next course.
Dear ScienceParkStudyGroup-Team,
I am struggling with your Introduction to RNA-seq lessons.
I managed to start an RStudio instance in the browser and then tried to follow Episode 05. However, I got stuck at the very beginning when I tried to execute:
source("scripts/mypca.R")
This is followed by an error:
Error in file(filename, "r", encoding = encoding) :
cannot open the connection
In addition: Warning message:
In file(filename, "r", encoding = encoding) :
cannot open file 'scripts/mypca.R': No such file or directory
Maybe this is a trivial problem, but I cannot figure out where it comes from.
Help would be appreciated.
Thanks and best,
Matthias
Apparently, the GO terms in the org.At.tair.db Bioconductor package (release 3.10) are outdated.
In this package, there are 27,416 Arabidopsis genes for a total of 4,837 GO terms.
columns(org.At.tair.db)
[1] "ARACYC" "ARACYCENZYME" "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL"
[7] "GENENAME" "GO" "GOALL" "ONTOLOGY" "ONTOLOGYALL" "PATH"
[13] "PMID" "REFSEQ" "SYMBOL" "TAIR"
length(keys(org.At.tair.db, keytype = 'TAIR'))
[1] 27416
length(keys(org.At.tair.db, keytype = 'GO'))
[1] 4837
One solution would perhaps be to use the official, most up-to-date GO terms (available here) and perform the enrichment analysis manually.
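Performing the enrichment "manually" usually means running an over-representation test for each GO term. The following is a minimal Python sketch of the one-sided hypergeometric test (the lesson itself works in R). It assumes that, for one GO term, you have already counted the background size `N`, the background genes annotated with the term `K`, the differential genes `n`, and how many of those carry the term `k`. The function name and the example numbers are hypothetical.

```python
from math import comb


def go_enrichment_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for one GO term.

    N: genes in the annotated background (universe)
    K: background genes annotated with the GO term
    n: differentially expressed genes (study set), drawn from the background
    k: study-set genes annotated with the GO term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of observing k or more annotated genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical numbers: 20 of 500 DE genes carry a term found on 200 of 20,000 genes.
    print(go_enrichment_pvalue(20000, 200, 500, 20))
```

Because thousands of GO terms are usually tested, the resulting p-values still need a multiple-testing correction (for example Benjamini-Hochberg).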
To add to the next versions of the lesson:
https://www.slideshare.net/jakonix/4rna-seqpart4extracting-countsandanalysing
A great week of courses:
https://www.physalia-courses.org/courses-workshops/course19/curriculum-19/
Hello,
Thanks for providing this wonderful resource.
I wanted to point out that there is a broken link in episode 3. Specifically, the linked file does not exist, and I couldn't find it anywhere in the repo:
We encourage you to look at the full set of reads and note how the QC results differ when using the entire dataset
We encourage you to look at the [full set of reads](../fastqc/Mov10oe_1-fastqc_report.html) and note how the QC results differ when using the entire dataset
Can't find this file: ../fastqc/Mov10oe_1-fastqc_report.html
Cheers,
Jacob
From a list of differentially expressed genes:
Using biomartr
For instance, 5000 nt upstream
If the gene is on the + strand, subtract 5000 nt from its start coordinate
If the gene is on the - strand, add 5000 nt to its end coordinate
Promoter retrieval using GenomicRanges
MEME for motif...
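The strand rule above can be sketched as follows. This is a minimal Python illustration that assumes 1-based, inclusive gene coordinates where `start` is always the leftmost position. The `Gene` class and `promoter_region` function are hypothetical helpers, not part of biomartr or GenomicRanges.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Gene:
    gene_id: str
    chrom: str
    start: int   # leftmost coordinate, 1-based inclusive
    end: int     # rightmost coordinate, 1-based inclusive
    strand: str  # "+" or "-"


def promoter_region(gene: Gene, upstream: int = 5000) -> Tuple[str, int, int]:
    """Return (chrom, start, end) of the `upstream` nt immediately 5' of the gene."""
    if gene.strand == "+":
        # Transcription starts at `start`, so upstream means lower coordinates.
        return gene.chrom, max(1, gene.start - upstream), gene.start - 1
    if gene.strand == "-":
        # Transcription starts at `end`, so upstream means higher coordinates.
        # Not clipped at the chromosome end: chromosome lengths are not known here.
        return gene.chrom, gene.end + 1, gene.end + upstream
    raise ValueError(f"unknown strand {gene.strand!r} for {gene.gene_id}")
```

In R, GenomicRanges' strand-aware `promoters()` function (for example `promoters(gr, upstream = 5000, downstream = 0)`) should produce equivalent windows. The extracted sequences can then be passed to MEME.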
For the statistical refresher part:
What does a good p-value histogram look like? It should have a high peak on the left (near 0), indicating that many genes behave differently between the two conditions being compared.
If the distribution is uniform, there is no evidence of a difference between the experimental conditions being tested.
Perhaps also split episode 02 into a "statistical refresher" and a new episode 03, "statistics applied to RNA-seq".
Useful links
https://www.bioconductor.org/packages/release/bioc/html/qvalue.html
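The histogram reading above can be made quantitative with an estimate of pi0, the proportion of genes with no real difference. The minimal Python sketch below uses a single, fixed lambda cutoff. The qvalue package refines this by evaluating a range of lambda values, so treat the result as a simplified approximation. The function names and the simulated p-values are illustrative only.

```python
def pvalue_histogram(pvalues, n_bins=10):
    """Count p-values in n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        counts[min(int(p * n_bins), n_bins - 1)] += 1  # p == 1.0 goes in the last bin
    return counts


def estimate_pi0(pvalues, lam=0.5):
    """Storey-style estimate of the proportion of true null hypotheses.

    Under the null, p-values are uniform, so about pi0 * m * (1 - lam)
    of them should exceed lam.
    """
    m = len(pvalues)
    if m == 0:
        raise ValueError("no p-values")
    return min(1.0, sum(p > lam for p in pvalues) / (m * (1 - lam)))


if __name__ == "__main__":
    import random

    random.seed(1)
    null_only = [random.random() for _ in range(10_000)]  # simulated: no real differences
    # Simulated mix: 2,000 p-values skewed towards 0 plus 8,000 uniform (null) ones.
    mixed = [random.random() ** 8 for _ in range(2_000)] + [random.random() for _ in range(8_000)]
    for label, ps in [("uniform", null_only), ("peak near 0", mixed)]:
        print(label, pvalue_histogram(ps), round(estimate_pi0(ps), 2))
```

A flat histogram gives a pi0 close to 1 (nothing differs). A peak near 0 pulls pi0 below 1, and 1 - pi0 roughly estimates the fraction of genes that do differ.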
I just wanted to point out that using clusterProfiler with OrgDb objects is not ideal for less well-annotated species, for instance when the OrgDb comes from AnnotationHub.
This includes rice, for example. The issue is that the OrgDb lacks mappings from Entrez IDs to GO terms: ~75% of the input Entrez IDs do not map to GO terms through this method.
Since the OrgDb object does not have an Ensembl keytype, I was forced to translate from Ensembl to Entrez IDs using biomaRt. This also loses some IDs.
A direct translation from Ensembl IDs to GO terms (using biomaRt) leaves only ~39% of genes unmapped.
I am unaware of a method to update OrgDb objects with, for example, new keytypes, but I need to look into it, as this clusterProfiler method for GSEA is unusable for less well-annotated species.
I have not tried creating an OrgDb from NCBI, but I would not recommend using AnnotationHub, even though the authors of clusterProfiler recommended it.
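To compare annotation routes on equal terms, it helps to compute the unmapped fraction the same way for each route. The following minimal Python sketch assumes each route has been exported as a dictionary from gene ID to a set of GO IDs. The function name and the toy IDs are hypothetical.

```python
def mapping_report(gene_ids, gene_to_go):
    """Return (fraction unmapped, sorted unmapped IDs) for a set of input IDs.

    gene_to_go: dict mapping an ID to a set of GO term IDs; IDs that are
    missing or map to an empty set count as unmapped.
    """
    ids = set(gene_ids)
    if not ids:
        raise ValueError("no input IDs")
    unmapped = sorted(g for g in ids if not gene_to_go.get(g))
    return len(unmapped) / len(ids), unmapped


if __name__ == "__main__":
    # Toy example with hypothetical IDs: only g1 has a GO annotation.
    print(mapping_report(["g1", "g2", "g3", "g4"], {"g1": {"GO:0008150"}, "g2": set()}))
```

Running it on the same input list with the OrgDb-derived mapping and with the biomaRt Ensembl-to-GO mapping yields directly comparable percentages, and the returned list shows which genes are lost.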
Figures: present / not:
Interpretation of the figures: 0.5 point for each figure commented.
Volcano plot:
Title: +0.2
Explicit arguments: +0.2
Set cutoff axes: +0.2
Set the axis limits (min/max): +0.2
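The cutoff and axis-limit items of the volcano-plot rubric come down to a few computed values. The minimal Python sketch below prepares them independently of any plotting library. The cutoffs (|log2FC| >= 1, padj < 0.05) are hypothetical defaults, and `volcano_data` is an illustrative helper rather than a lesson function.

```python
import math
import sys


def volcano_data(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Prepare volcano-plot coordinates, labels, cutoff lines and axis limits.

    results: iterable of (gene_id, log2_fold_change, padj); padj may be None
    (e.g. genes without an adjusted p-value), and those genes are skipped.
    """
    points = []
    for gene, lfc, padj in results:
        if padj is None:
            continue
        y = -math.log10(max(padj, sys.float_info.min))  # avoid log10(0)
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            status = "up"
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            status = "down"
        else:
            status = "ns"
        points.append((gene, lfc, y, status))
    if not points:
        raise ValueError("no genes with an adjusted p-value")
    x_max = max(abs(p[1]) for p in points)
    y_max = max(p[2] for p in points)
    return {
        "points": points,
        "vlines": (-lfc_cutoff, lfc_cutoff),  # fold-change cutoff lines
        "hline": -math.log10(padj_cutoff),    # significance cutoff line
        "xlim": (-x_max, x_max),              # symmetric x axis
        "ylim": (0.0, y_max),
    }
```

The title and explicit arguments remain the plotting library's job. Keeping the x axis symmetric makes up- and down-regulated genes visually comparable.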
Heatmap
Make a clustering of the samples: step by step.
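The step-by-step sample clustering can be sketched as follows: first a sample-to-sample distance matrix, then repeated merging of the two closest clusters. This minimal Python version uses Euclidean distance and average linkage. Note that R's `hclust()` defaults to complete linkage, so the merge order may differ. The input is assumed to be already transformed (for example vst) values, and the sample names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_samples(samples):
    """Agglomerative (average-linkage) clustering of samples, one merge per step.

    samples: dict sample name -> list of transformed expression values,
    in the same gene order for every sample.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    names = list(samples)
    # Step 1: sample-to-sample distance matrix.
    dist = {frozenset(pair): euclidean(samples[pair[0]], samples[pair[1]])
            for pair in combinations(names, 2)}

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster sample pairs.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    # Step 2: start with every sample in its own cluster.
    clusters = [(n,) for n in names]
    merges = []
    # Step 3: repeatedly merge the two closest clusters until one remains.
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges
```

Printing the merges one by one shows students exactly which samples join at each step and at what distance, which is the information a dendrogram draws.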
Find individual genes affected by the treatment and build a story around them:
Have each student pick a different gene from the list of differentially expressed genes.
Need to add the vst(dds) step before the PCA. It is not correct to plot the PCA without this transformation.
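The reason the transformation matters can be shown without running a full PCA. PCA is driven by the genes with the largest variance, and on raw counts those tend to be the most highly expressed genes, even when their variation is only noise. The toy Python sketch below uses hypothetical counts and log2(count + 1) as a simple stand-in for DESeq2's `vst()`, which behaves differently at low counts. In the lesson itself, the R step would be `vsd <- vst(dds)`, with `vsd` then passed to `plotPCA()`.

```python
import math
import statistics


def variance_share(expr):
    """Fraction of the total across-sample variance contributed by each gene."""
    var = {gene: statistics.pvariance(values) for gene, values in expr.items()}
    total = sum(var.values())
    return {gene: v / total for gene, v in var.items()}


def log2p1(counts):
    """Simple log2(count + 1) transform (a stand-in, not DESeq2's vst)."""
    return {gene: [math.log2(c + 1) for c in values] for gene, values in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts for 4 samples: ctrl, ctrl, treated, treated.
    counts = {
        "high_noisy": [1000, 1100, 1000, 1100],  # highly expressed, no treatment effect
        "low_de": [5, 5, 50, 50],                # lowly expressed, 10-fold induction
    }
    print("raw   :", variance_share(counts))
    print("log2+1:", variance_share(log2p1(counts)))
```

On raw counts, the noisy high-count gene dominates the variance. After the transformation, the gene with the real treatment effect does, which is what the PCA should reflect.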
Add a short example showing the biomartr function while it runs, with a screenshot of the query in progress.
Then add the biomaRt::biomartCacheClear() piece of code to the main body of the episode, as this is a frequent bug.