qbio's Issues

Error in Step 5

myTopHits2 <- topTable(ebFit2, adjust = "BH", coef = 1, number = 50, sort.by = "logFC")
myTopHits2
ebFit2
gost.res2 <- gost(rownames(myTopHits2), organism = "tccriollo", correction_method = "fdr", significant = FALSE)
gost.res2

This is the code I ran, and this is the resulting error message:
No results to show
Please make sure that the organism is correct or set significant = FALSE

We couldn't solve this problem on our own: setting significant = FALSE did not help, and adding a user_threshold did not help either.

Gene set enrichment analysis (GSEA) using g:Profiler still works fine for me!

See https://github.com/IngoGiebel/qbio304-student-work/blob/main/scripts/dge-analysis-PRJCA004229.R for how the variables used here were created.

# ------------------------------------------------------------------------------
# Step 5: Gene set enrichment analysis (GSEA) using g:Profiler
# ------------------------------------------------------------------------------

# Oryza nivara

# Functional enrichment analysis of the 100 top-ranked genes
top_genes_gostres_onivara <- gprofiler2::gost(
  top_genes_onivara_df$geneID[1:100],
  organism = "onivara",
  correction_method = "fdr")

# Produce an interactive Manhattan plot of the enriched GO terms
gprofiler2::gostplot(
  top_genes_gostres_onivara,
  interactive = TRUE,
  capped = FALSE)

# Produce a static publication-quality Manhattan plot
# with the first 10 top-ranked GO terms highlighted.
gprofiler2::gostplot(
  top_genes_gostres_onivara,
  interactive = FALSE,
  capped = FALSE) |>
  gprofiler2::publish_gostplot(
    highlight_terms = top_genes_gostres_onivara$result$term_id[1:10])

# Generate a table of the gost results of the first 20 top-ranked GO terms
gprofiler2::publish_gosttable(
  top_genes_gostres_onivara,
  highlight_terms = top_genes_gostres_onivara$result$term_id[1:20],
  show_columns = c("source", "term_name", "term_size", "intersection_size"))

# Oryza sativa

# Functional enrichment analysis of the 100 top-ranked genes
top_genes_gostres_osativa <- gprofiler2::gost(
  top_genes_osativa_df$geneID[1:100],
  organism = "osativa",
  correction_method = "fdr")

# Produce an interactive Manhattan plot of the enriched GO terms
gprofiler2::gostplot(
  top_genes_gostres_osativa,
  interactive = TRUE,
  capped = FALSE)

# Produce a static publication-quality Manhattan plot
# with the first 10 top-ranked GO terms highlighted.
gprofiler2::gostplot(
  top_genes_gostres_osativa,
  interactive = FALSE,
  capped = FALSE) |>
  gprofiler2::publish_gostplot(
    highlight_terms = top_genes_gostres_osativa$result$term_id[1:10])

# Generate a table of the gost results of the first 20 top-ranked GO terms
gprofiler2::publish_gosttable(
  top_genes_gostres_osativa,
  highlight_terms = top_genes_gostres_osativa$result$term_id[1:20],
  show_columns = c("source", "term_name", "term_size", "intersection_size"))
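
One detail of the script above that may be worth guarding against: indexing with [1:10] or [1:20] pads the result with NA when fewer terms are enriched than requested. The sketch below only illustrates this base R behaviour on a made-up stand-in for a gost() result; mock_gostres and top_terms are hypothetical names, and how the gprofiler2 publishing functions react to NA entries has not been checked here.

```r
# Made-up stand-in for a gost() result with only three enriched terms
mock_gostres <- list(result = data.frame(
  term_id = c("GO:0000001", "GO:0000002", "GO:0000003"),
  stringsAsFactors = FALSE
))

# [1:10] pads the vector with NA when fewer than 10 terms exist
padded <- mock_gostres$result$term_id[1:10]

# Hypothetical helper: head() returns at most n elements and never pads
top_terms <- function(gostres, n) head(gostres$result$term_id, n)
safe <- top_terms(mock_gostres, 10)

print(padded)
print(safe)
```

With such a helper, highlight_terms could be set to top_terms(top_genes_gostres_osativa, 10) instead of the fixed-length index.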

Oryza sativa: Found GMT files use different gene codes from those used in BioMart

Checked GMT files: http://structuralbiology.cau.edu.cn/PlantGSEA/download.php

- GO (Gene Ontology) gene sets : http://structuralbiology.cau.edu.cn/PlantGSEA/database/Osa_GO

- Gene Family based gene sets : http://structuralbiology.cau.edu.cn/PlantGSEA/database/Osa_GFam

- KEGG gene sets : http://structuralbiology.cau.edu.cn/PlantGSEA/database/Osa_KEGG

- PO gene sets : http://structuralbiology.cau.edu.cn/PlantGSEA/database/Osa_PO

None of these files fully adheres to the GMT format, which requires the genes to be separated by tabs. In these files the genes are separated by ",". That issue can of course be worked around. Once it is, however, a showstopper arises: the gene codes differ from the codes used in the reference genome file "https://ftp.ebi.ac.uk/ensemblgenomes/pub/release-56/plants/fasta/oryza_sativa/cdna/".

For example:
BioMart gene codes: Os12g0469300, Os07g0249200
MSU Rice Genome Annotation Project gene codes (used in the GMT files): LOC_Os01g07760, LOC_Os01g40630, LOC_Os03g59220
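
The separator problem mentioned above can be worked around with a few lines of base R. The following is only a sketch: it assumes that each line consists of a set name, a description, and a comma-separated gene list, with these three fields separated by tabs, which has not been verified for every PlantGSEA file. The function name fix_gmt_lines and the example line are made up for illustration. It does nothing about the mismatch between the MSU and BioMart gene codes.

```r
# Hypothetical helper: turn "name<TAB>description<TAB>g1,g2,g3" lines
# into GMT lines in which the genes are separated by tabs.
fix_gmt_lines <- function(lines) {
  vapply(strsplit(lines, "\t", fixed = TRUE), function(fields) {
    # Fields 1 and 2 are the set name and the description;
    # everything after them is treated as comma-separated genes.
    genes <- unlist(strsplit(fields[-(1:2)], ",", fixed = TRUE))
    genes <- trimws(genes)
    genes <- genes[nzchar(genes)]
    paste(c(fields[1:2], genes), collapse = "\t")
  }, character(1), USE.NAMES = FALSE)
}

# Made-up example line (set name and description are illustrative only)
example_line <- "SOME_SET\tsome description\tLOC_Os01g07760,LOC_Os01g40630,LOC_Os03g59220"
fixed <- fix_gmt_lines(example_line)
cat(fixed, "\n")

# Possible use on a downloaded file (file names are assumptions):
# writeLines(fix_gmt_lines(readLines("Osa_GO")), "Osa_GO.gmt")
```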

At http://plants.ensembl.org/Oryza_sativa/Location/Viewdb=core;g=Os03g0786000;r=3:32624612-32627796;t=Os03t0786000-01 I found the following information (and only there) when displaying the information for one of the genes:

Transcript: LOC_Os01g02240.1.1
Gene: LOC_Os01g02240
Protein product: LOC_Os01g02240.1
Location: Chromosome 1: 678,778-684,594
Gene type: Msu gene
Strand: Reverse
Base pairs: 4,758
Amino acids: 1,585
Analysis: Genes (MSU)
Annotation method: Gene annotation by MSU Rice Genome Annotation Project dated 2011-10-31. These genes are included alongside the IRGSP annotations, but are not included in Compara or BioMart.


Genome Analysis
rGREAT: an R/Bioconductor package for functional enrichment on genomic regions

Unfortunately, I could not find any other suitable GMT files that use the BioMart gene codes (the codes used in the kallisto reference genome file and by tximport).

Script 5 - getGmt Error

Some of you get errors while importing some of the PlantGSEA GMT files:

> broadSet.C2.ALL <- getGmt("Osa.DetailInfo.csv", geneIdType=SymbolIdentifier())
Error in validObject(.Object) : 
  invalid class “GeneSetCollection” object: each setName must be distinct
In addition: Warning message:
In getGmt("Osa.DetailInfo.csv", geneIdType = SymbolIdentifier()) :
  5788 record(s) contain duplicate ids: 'DE_NOVO'_IMP_BIOSYNTHETIC_PROCESS, 'DE_NOVO'_PYRIMIDINE_NUCLEOBASE_BIOSYNTHETIC_PROCESS, ..., ZINC_ION_TRANSMEMBRANE_TRANSPORTER_ACTIVITY, ZINC_ION_TRANSPORT

The error is caused by duplicated names of some of the gene sets. I don't know why such duplicates occur in the file, but they can easily be removed in R with the code below.

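# Quick solution
library(GSEABase)  # provides getGmt() and SymbolIdentifier()
# 1. Add the ".csv" extension to the downloaded file; for rice, the file downloaded from PlantGSEA is named "Osa.DetailInfo"
# 2. Read the file (tab-separated, no header; quote = "" keeps quote characters in the set names as they are)
tmp <- read.csv("Osa.DetailInfo.csv", header = FALSE, sep = "\t", quote = "")
# 3. Make a tibble (optional; as.tibble() is deprecated in favour of as_tibble())
tmp <- tibble::as_tibble(tmp)
# 4. Remove duplicates: keep only the first row for each gene set name (column V1)
tmp <- tmp[!duplicated(tmp$V1), ]
# 5. Write the new file; quote = FALSE prevents write.table from wrapping the fields in double quotes
write.table(tmp, "OsaUnique.csv", sep = "\t", quote = FALSE, col.names = FALSE, row.names = FALSE)
# 6. Read the file as GMT
broadSet.Osa.Unique <- getGmt("OsaUnique.csv", geneIdType = SymbolIdentifier())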
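
Before dropping rows, it may be worth checking whether the duplicated names are exact copies or carry different content, because keeping only the first row would silently discard the rest. The sketch below shows one way to check this in base R. It uses a small made-up data frame in place of the real file, and the three-column layout (name, description, genes) is an assumption.

```r
# Made-up stand-in for the table read from "Osa.DetailInfo.csv":
# V1 = gene set name, V2 = description, V3 = comma-separated genes
tmp <- data.frame(
  V1 = c("SET_A", "SET_A", "SET_B", "SET_B", "SET_C"),
  V2 = c("desc", "desc", "desc", "desc", "desc"),
  V3 = c("g1,g2", "g1,g2", "g3", "g3,g4", "g5"),
  stringsAsFactors = FALSE
)

# Gene set names that occur more than once
dup_names <- unique(tmp$V1[duplicated(tmp$V1)])

# Rows that are exact copies of an earlier row (all columns identical)
exact_copy <- duplicated(tmp)

# Duplicated names whose rows are not all identical, i.e. where keeping
# only the first row would discard differing content
conflicting <- dup_names[vapply(dup_names, function(nm) {
  nrow(unique(tmp[tmp$V1 == nm, , drop = FALSE])) > 1
}, logical(1))]

print(dup_names)
print(conflicting)
```

If conflicting is empty for the real file, removing the duplicates loses nothing; otherwise the affected sets may need to be merged or renamed instead.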
