namlab / qbio
QBIO HHU course materials
myTopHits2 <- topTable(ebFit2, adjust = "BH", coef = 1, number = 50, sort.by = "logFC")
myTopHits2
ebFit2
gost.res2 <- gost(rownames(myTopHits2), organism = "tccriollo", correction_method = "fdr", significant = FALSE)
gost.res2
That is the code I ran, and this is the resulting error message:
No results to show
Please make sure that the organism is correct or set significant = FALSE
We could not solve this problem on our own: setting significant = FALSE did not help, and neither did adding a user_threshold.
See https://github.com/IngoGiebel/qbio304-student-work/blob/main/scripts/dge-analysis-PRJCA004229.R for how the variables used here were created:
top_genes_gostres_onivara <- gprofiler2::gost(
  top_genes_onivara_df$geneID[1:100],
  organism = "onivara",
  correction_method = "fdr")
gprofiler2::gostplot(
  top_genes_gostres_onivara,
  interactive = TRUE,
  capped = FALSE)
gprofiler2::gostplot(
  top_genes_gostres_onivara,
  interactive = FALSE,
  capped = FALSE) |>
  gprofiler2::publish_gostplot(
    highlight_terms = top_genes_gostres_onivara$result$term_id[1:10])
gprofiler2::publish_gosttable(
  top_genes_gostres_onivara,
  highlight_terms = top_genes_gostres_onivara$result$term_id[1:20],
  show_columns = c("source", "term_name", "term_size", "intersection_size"))
top_genes_gostres_osativa <- gprofiler2::gost(
  top_genes_osativa_df$geneID[1:100],
  organism = "osativa",
  correction_method = "fdr")
gprofiler2::gostplot(
  top_genes_gostres_osativa,
  interactive = TRUE,
  capped = FALSE)
gprofiler2::gostplot(
  top_genes_gostres_osativa,
  interactive = FALSE,
  capped = FALSE) |>
  gprofiler2::publish_gostplot(
    highlight_terms = top_genes_gostres_osativa$result$term_id[1:10])
gprofiler2::publish_gosttable(
  top_genes_gostres_osativa,
  highlight_terms = top_genes_gostres_osativa$result$term_id[1:20],
  show_columns = c("source", "term_name", "term_size", "intersection_size"))
I checked the GMT files at http://structuralbiology.cau.edu.cn/PlantGSEA/download.php
None of these files fully adheres to the GMT standard, which requires the genes to be separated by tabs. In these files the genes are separated by ",". That issue can of course be worked around. Once it is, however, a blocking problem appears: the gene identifiers differ from those used in the reference genome file "https://ftp.ebi.ac.uk/ensemblgenomes/pub/release-56/plants/fasta/oryza_sativa/cdna/".
For example:
BioMart gene codes: Os12g0469300, Os07g0249200
MSU Rice Genome Annotation Project gene codes (used in the GMT files): LOC_Os01g07760, LOC_Os01g40630, LOC_Os03g59220
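The separator issue mentioned above can be worked around in a few lines of base R. The following is only a sketch: it assumes each line of the PlantGSEA file has the layout "set name, tab, description, tab, comma-separated genes", which should be checked against the actual file. The function name `fix_gmt_line`, the toy set name and description, and the output file name in the commented usage are all hypothetical.

```r
# Convert one GMT-like line whose gene field is comma-separated
# into a tab-separated GMT line.
# Assumption: the line looks like "<set name>\t<description>\t<gene1,gene2,...>".
fix_gmt_line <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  # Split every field after the first two on commas, trim whitespace,
  # and drop empty entries
  genes <- unlist(strsplit(fields[-(1:2)], ",", fixed = TRUE))
  genes <- trimws(genes)
  genes <- genes[nzchar(genes)]
  paste(c(fields[1:2], genes), collapse = "\t")
}

# Toy input: set name and description are placeholders; the gene IDs are
# the MSU examples quoted in this thread.
toy <- "EXAMPLE_SET\texample description\tLOC_Os01g07760,LOC_Os01g40630,LOC_Os03g59220"
fixed <- fix_gmt_line(toy)
cat(fixed, "\n")

# Possible use on a whole file (file names are assumptions):
# lines_in <- readLines("Osa.DetailInfo")
# writeLines(vapply(lines_in, fix_gmt_line, character(1), USE.NAMES = FALSE),
#            "Osa.fixed.gmt")
```

This only repairs the separator. It does not address the mismatch between MSU and BioMart identifiers described here.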
I found the following information, and only at http://plants.ensembl.org/Oryza_sativa/Location/Viewdb=core;g=Os03g0786000;r=3:32624612-32627796;t=Os03t0786000-01, when displaying the details for one of the genes:
Transcript: LOC_Os01g02240.1.1
Gene: LOC_Os01g02240
Protein product: LOC_Os01g02240.1
Location: Chromosome 1: 678,778-684,594
Gene type: Msu gene
Strand: Reverse
Base pairs: 4,758
Amino acids: 1,585
Analysis: Genes (MSU)
Annotation method: Gene annotation by MSU Rice Genome Annotation Project dated 2011-10-31. These genes are included alongside the IRGSP annotations, but are not included in Compara or BioMart.
Genome Analysis
rGREAT: an R/Bioconductor package for functional enrichment on genomic regions
Unfortunately, I could not find any other suitable GMT files that use the BioMart gene codes (the ones used by the kallisto reference genome file and by tximport).
Some of you get errors while importing some of the PlantGSEA GMT files:
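Before running an enrichment against a GMT file, it may help to check which naming scheme each set of IDs follows. The sketch below classifies IDs with regular expressions. The two patterns are inferred only from the example IDs quoted in this thread, so they are assumptions and may not cover every valid identifier. The function name `classify_rice_id` is hypothetical.

```r
# Classify rice gene IDs by naming scheme.
# Assumption: patterns are derived from the examples in this thread only
#   BioMart-style: "Os" + 2 digits + "g" + 7 digits, e.g. Os12g0469300
#   MSU-style:     "LOC_Os" + 2 digits + "g" + 5 digits, e.g. LOC_Os01g07760
classify_rice_id <- function(ids) {
  ifelse(grepl("^LOC_Os[0-9]{2}g[0-9]{5}$", ids), "MSU",
    ifelse(grepl("^Os[0-9]{2}g[0-9]{7}$", ids), "BioMart", "unknown"))
}

# IDs as they would come from the expression data and from a GMT file
expr_ids <- c("Os12g0469300", "Os07g0249200")
gmt_ids  <- c("LOC_Os01g07760", "LOC_Os01g40630", "LOC_Os03g59220")

expr_scheme <- classify_rice_id(expr_ids)
gmt_scheme  <- classify_rice_id(gmt_ids)
print(table(expr_scheme))
print(table(gmt_scheme))

# With different schemes, no ID can match, so any enrichment would be empty
shared <- intersect(expr_ids, gmt_ids)
length(shared)
```

If the two tables report different schemes, the IDs need to be translated through a mapping table before the GMT file can be used.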
> broadSet.C2.ALL <- getGmt("Osa.DetailInfo.csv", geneIdType=SymbolIdentifier())
Error in validObject(.Object) :
invalid class “GeneSetCollection” object: each setName must be distinct
In addition: Warning message:
In getGmt("Osa.DetailInfo.csv", geneIdType = SymbolIdentifier()) :
5788 record(s) contain duplicate ids: 'DE_NOVO'_IMP_BIOSYNTHETIC_PROCESS, 'DE_NOVO'_PYRIMIDINE_NUCLEOBASE_BIOSYNTHETIC_PROCESS, ..., ZINC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY, ZINC_ION_TRANSPORT
The error is caused by duplicated names among the gene sets. I don't know why such duplicates occur in the file, but they can easily be removed in R with the code below.
# Quick solution
library(GSEABase)  # provides getGmt() and SymbolIdentifier()
# 1. Add the ".csv" extension to the downloaded file; for rice, the file downloaded from PlantGSEA is named "Osa.DetailInfo"
# 2. Read the file
tmp <- read.csv("Osa.DetailInfo.csv", header = FALSE, sep = "\t")
# 3. Make a tibble (as_tibble() replaces the deprecated as.tibble())
tmp <- tibble::as_tibble(tmp)
# 4. Remove rows whose gene set name (first column) is duplicated, keeping the first occurrence
tmp <- tmp[!duplicated(tmp$V1), ]
# 5. Write the new file; quote = FALSE keeps quotation marks out of the set names and gene IDs
write.table(tmp, "OsaUnique.csv", sep = "\t", quote = FALSE, col.names = FALSE, row.names = FALSE)
# 6. Read the file as GMT
broadSet.Osa.Unique <- getGmt("OsaUnique.csv", geneIdType = SymbolIdentifier())
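Dropping duplicates keeps only the first gene set under each name, so it may be worth checking first whether the copies really are identical. The sketch below does this on a made-up table that stands in for the one read from the PlantGSEA file. The column layout (V1 = set name, V2 = description, V3 = comma-separated genes) and all values are assumptions for illustration only.

```r
# Toy stand-in for the table read from the PlantGSEA file; all values are made up.
tmp <- data.frame(
  V1 = c("SET_A", "SET_A", "SET_B", "SET_B"),
  V2 = c("d", "d", "d", "d"),
  V3 = c("g1,g2", "g1,g2", "g3", "g3,g4"),
  stringsAsFactors = FALSE
)

# Set names that occur more than once
dup_names <- unique(tmp$V1[duplicated(tmp$V1)])

# For each duplicated name, count the distinct gene lists:
# 1 means all copies are identical, so dropping the repeats loses nothing
n_variants <- vapply(
  dup_names,
  function(nm) length(unique(tmp$V3[tmp$V1 == nm])),
  integer(1)
)

# Names whose copies differ in gene content and deserve a closer look
conflicting <- names(n_variants)[n_variants > 1]
print(n_variants)
print(conflicting)
```

In this toy table the copies of SET_A are identical, while the two copies of SET_B carry different gene lists, so removing the second copy would discard genes.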