llrs / biocor

Package to calculate functional similarity between genes https://biocor.llrs.dev

Home Page: https://www.bioconductor.org/packages/BioCor/

License: Other

R 100.00%

biocor's Introduction

BioCor

BioCor lets users calculate functional similarities (originally called biological correlation, hence the name) and use them for network building or other purposes.

Installation

BioCor is an R package. You can install it from the Bioconductor project with:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("BioCor")

You can install the version of BioCor in this repository with:

if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}
devtools::install_github("llrs/BioCor")

How to use BioCor?

See the vignette on the Bioconductor site and the advanced vignette.
Here is a minimal example:

# The data must be provided, see the vignette for more details.
# Get some pathways from the pathway data
(pathways <- sample(unlist(genesReact, use.names = FALSE), 5))
#> [1] "R-HSA-372790" "R-HSA-168188" "R-HSA-450294" "R-HSA-109582" "R-HSA-194840"
# Calculate the similarity between those pathways
mpathSim(pathways, genesReact, NULL)
#>              R-HSA-372790 R-HSA-168188 R-HSA-450294 R-HSA-109582 R-HSA-194840
#> R-HSA-372790   1.00000000   0.02341920   0.01924619   0.14301552   0.08478425
#> R-HSA-168188   0.02341920   1.00000000   0.79012346   0.02781641   0.00000000
#> R-HSA-450294   0.01924619   0.79012346   1.00000000   0.02335766   0.00000000
#> R-HSA-109582   0.14301552   0.02781641   0.02335766   1.00000000   0.03689065
#> R-HSA-194840   0.08478425   0.00000000   0.00000000   0.03689065   1.00000000
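
To give an intuition for the numbers in a matrix like the one above, here is a minimal base R sketch of a Dice-style overlap between two gene sets. It assumes that the pathway similarity is a Dice coefficient (the package does list a diceSim function), and the helper `dice_overlap` and the toy gene identifiers are hypothetical, not part of BioCor.

```r
# Toy pathways as character vectors of gene identifiers (made-up data).
path_a <- c("1", "2", "3", "4")
path_b <- c("3", "4", "5")

# Dice coefficient: twice the number of shared genes divided by the
# sum of the sizes of both sets. Hypothetical helper, not BioCor code.
dice_overlap <- function(x, y) {
    x <- unique(x)
    y <- unique(y)
    2 * length(intersect(x, y)) / (length(x) + length(y))
}

dice_overlap(path_a, path_b)
#> [1] 0.5714286
```

Two identical sets score 1 and two sets without shared genes score 0, which matches the range seen in the matrix.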

Who might use this package?

It is intended for bioinformaticians: both people interested in the functional similarity of some genes or clusters, and people developing other analyses on top of it.

What is the goal of this project?

The goal of this project is to provide methods to calculate functional similarities based on pathways.

What can BioCor be used for?

Here is a non-comprehensive list:

- Diseases or drugs: by observing which genes with the same pathways are more affected
- Gene/protein functional analysis: by testing how new pathways are similar to existing pathways
- Protein-protein interaction: by testing if they are involved in the same pathways
- miRNA-mRNA interaction: by comparing clusters they affect
- sRNA regulation: by observing the relationship between sRNA and genes
- Evolution: by comparing similarities of genes between species
- Networks improvement: by adding information about the known relationship between genes
- Evaluate pathways databases: by comparing scores of the same entities

See the advanced vignette

Contributing

Please read how to contribute for details on the code of conduct, and the process for submitting pull requests.

Acknowledgments

Thanks to everyone who has contributed to making this package what it is, especially my advisor.

biocor's Issues

How to quantify evidence of co-functionality?

From Bioinformatics:

"quantify how likely two genes are correlated in their enrichment, function etc. For example, using STRING we can see that PIK3CA and PTEN are more co-functioning than PIK3CA and SF3B1. "

The question is how to add this evidence of higher co-functioning to BioCor. My answer is that these should be two separate metrics.

Improvements for version 1.2

List of improvements for the release 1.2

Vignettes

- Remove wall of text when loading org.Hs.eg.db via suppressPackageStartupMessages
- Remove merging similarities explanation in the vignette
- Call GOSemSim in the vignette instead of comparing with static/hard coded values
- Make a note to section 9.8 about clashing namespace
- Remove cluster description in the GOSemSim comparison
- Section 9.3 move last sentence as a note
- Correct title of section 9.8 / Check grammar
- [ ] Add advanced vignette (currently hosted here)
  8.1. - [x] Add the packages needed as suggested
  8.2. - [x] Remove section 1.2 but keep the subset of genes
  8.3. - [x] Explain better the implication of the tests.
  8.4. - [x] Compare the similarity within the DE genes and between DE subset and the others
  8.5. - [x] Maybe a plot of one similarity and the other

Package

- Reduce memory footprint
- Reduce build time on Windows

Bug report on BBS

Describe the bug
The error raised during the check is "the condition has length > 1".

To Reproduce
I tried the local option without Docker and couldn't reproduce the error, despite setting these checks:

_R_CHECK_LENGTH_1_CONDITION_=${_R_CHECK_LENGTH_1_CONDITION_-verbose}
_R_CHECK_LENGTH_1_LOGIC2_=${_R_CHECK_LENGTH_1_LOGIC2_-verbose}

I should use the Bioconductor Docker image: bioconductor/bioconductor_docker:devel

Expected behavior
Not a faulty build

Additional context
Version 1.11.1 didn't solve the issues, so I might need to do something else. I should also check on R-release.
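
For context, the kind of code that triggers "the condition has length > 1" can be sketched in base R as follows. This is an illustration with made-up data, not the code from the failing build; depending on the R version and check settings, the commented-out form produces a warning or an error.

```r
# A vector with more than one element (made-up data).
ids <- c("a", NA)

# Problematic: is.na(ids) has length 2, so it is not a valid
# condition for if(). Left commented out on purpose.
# if (is.na(ids)) message("missing ids")

# Safer: collapse the check to a single TRUE/FALSE before branching.
has_missing <- anyNA(ids)
if (has_missing) {
    message("Some ids are missing")
}
```

Collapsing with anyNA(), any() or all() makes the intent explicit, which is what these checks are designed to enforce.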

Improvements for version 1.8

Package

- #10 Improve testing by using AppVeyor to test on Windows
- #3 Improve tests for coherence between using GeneSetCollections and not using them.
- #8 Classify gene sets.

Taking advantage of GeneSetCollection

BioCor should handle GeneSetCollection objects more gracefully, adding methods to calculate the functional similarities and compare them more easily.

Warning while installing it on a CentOS machine

Warning: multiple methods tables found for ‘toTable’

Create sticker

https://github.com/Bioconductor/BiocStickers

To evaluate enrichment

Explore the idea that the less similar an enrichment is, the better the input is (either the gene sets or the genes used for the enrichment).

Calculate gene information

This issue is related to #4; the goal is to use those variables for each gene.

This could also shed light on the issue of finding functional similarities between genes. Squashing the size of the pathways and comparing only the content might not be the best approach.

Use case: classify gene sets

Add an example here or in the blog showing how to use it to classify similar GeneSets.

Suggestions:

- Find related gene sets via a dendrogram
- Parse the names of the gene sets to find the right label
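
The dendrogram suggestion could be sketched roughly as below with base R only. The similarity values and gene set names are made up, and turning similarity into distance as `1 - similarity` with average linkage is one assumption among several reasonable choices.

```r
# Toy symmetric similarity matrix between three gene sets (made-up values).
ids <- c("setA", "setB", "setC")
sim <- matrix(
    c(1.0, 0.8, 0.1,
      0.8, 1.0, 0.2,
      0.1, 0.2, 1.0),
    nrow = 3, byrow = TRUE, dimnames = list(ids, ids)
)

# Convert similarities to distances and cluster hierarchically.
hc <- hclust(as.dist(1 - sim), method = "average")

# Cut the dendrogram into two groups of related gene sets.
groups <- cutree(hc, k = 2)
groups

# plot(hc) would draw the dendrogram itself.
```

Here setA and setB end up in the same group because they are the most similar pair. Labels for each group would still need to come from the gene set names.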

Reduce complexity

Reduce complexity of combineScoresPar and combineScores, to remove the error in #3 and point 2 of #11:

library("cyclocomp")
cyclocomp_package("BioCor")
#>                name cyclocomp
#> 7     combineScores        28
#> 8  combineScoresPar        24
#> 11          diceSim         7
#> 30     weighted.sum         7
#> 6        combinadic         6
#> 1   addSimilarities         5
#> 27     similarities         5
#> 24       reciprocal         4
#> 29    weighted.prod         4
#> 2            AintoB         3
#> 9    combineSources         3
#> 10              D2J         3
#> 16              J2D         3
#> 3               BMA         2
#> 12 duplicateIndices         2
#> 15      inverseList         2
#> 23            rcmax         2
#> 25        removeDup         2
#> 26          seq2mat         2
#> 28         vdiceSim         2
#> 4    clusterGeneSim         1
#> 5        clusterSim         1
#> 13          geneSim         1
#> 14             Info         1
#> 17  mclusterGeneSim         1
#> 18      mclusterSim         1
#> 19         mgeneSim         1
#> 20         mpathSim         1
#> 21          pathSim         1
#> 22  pathSims_matrix         1

Plot a heatmap

People are used to seeing plots of similarity. It would be nice to have one in the "About BioCor" vignette.

It could probably replace a matrix in this section.
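
A minimal sketch of such a plot, using base R's heatmap() and the similarity values shown in the README example. The plotting choices (no scaling, symmetric layout, margins, title) are assumptions, not what the vignette uses.

```r
# Similarity matrix copied from the README example output.
ids <- c("R-HSA-372790", "R-HSA-168188", "R-HSA-450294",
         "R-HSA-109582", "R-HSA-194840")
sim <- matrix(
    c(1.00000000, 0.02341920, 0.01924619, 0.14301552, 0.08478425,
      0.02341920, 1.00000000, 0.79012346, 0.02781641, 0.00000000,
      0.01924619, 0.79012346, 1.00000000, 0.02335766, 0.00000000,
      0.14301552, 0.02781641, 0.02335766, 1.00000000, 0.03689065,
      0.08478425, 0.00000000, 0.00000000, 0.03689065, 1.00000000),
    nrow = 5, byrow = TRUE, dimnames = list(ids, ids)
)

# Draw the heatmap; scale = "none" keeps the similarities as they are.
hm <- heatmap(sim, symm = TRUE, scale = "none",
              margins = c(8, 8), main = "Pathway similarity")
```

The dendrograms drawn on the margins also group the two highly similar pathways (R-HSA-168188 and R-HSA-450294) together.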

Remove the printing on the loading of the package

Remove the message printed when the package is loaded.
Basically, remove the zzz.R file.

Export inverseList and redirect users to it

inverseList is useful: export it, and probably hint at it when no pathway name is found, as in:

genesSim <- mpathSim(names(models), genes, method = NULL)
lengths(genes)
##      model0       model1       model2  model2_best       model3  model3_best model3_best2 model3_bestB 
##         3461         3783         3734         3743         3575         3580         3584         3578
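
To illustrate what inverting such a list means, here is a base R sketch that turns a genes-to-pathways list into a pathways-to-genes list. The helper `invert_list` and the toy identifiers are hypothetical; this is not the package's inverseList implementation.

```r
# Toy mapping from genes to the pathways they belong to (made-up data).
genes2paths <- list(g1 = c("p1", "p2"), g2 = "p1")

# Repeat each gene name once per pathway, then group the gene names
# by pathway. Hypothetical helper, not BioCor's inverseList.
invert_list <- function(x) {
    keys <- rep(names(x), lengths(x))
    split(keys, unlist(x, use.names = FALSE))
}

paths2genes <- invert_list(genes2paths)
paths2genes
```

Passing a list in the wrong orientation is an easy mistake to make, which is why a hint towards the inverse could help.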

Summary information for a GeneSetCollection

Related to #3 and #4, to compare or assess GeneSetCollections it would be good to have a summary of the GeneSetCollection.

Use other information of the gene sets

Use other metrics of the gene sets aside from the similarity.

See the page about other variables

Link between vignettes

It seems that the official way to do it is using Biocpkg(package, vignette=name_vignette.html, label=text_to_show)

For reference, that links to the iSEE vignette.

Use cffr

Use cffr to make it easier to cite the package.

It might also be worth mentioning it on the Slacks.

Error building the package

Error (on build of 2018-10-21 21:45:59 -0400 (Sun, 21 Oct 2018)) related to the $ operator, but raised while building an image. It doesn't seem related to my package's code.

MacOS

* creating vignettes ... ERROR
sh: line 1: 30774 Abort trap: 6           'convert' 'BioCor_1_basics_files/figure-html/hclust1-1.png' -trim 'BioCor_1_basics_files/figure-html/hclust1-1.png' > /dev/null
sh: line 1: 31077 Abort trap: 6           'convert' 'BioCor_1_basics_files/figure-html/hclust3-1.png' -trim 'BioCor_1_basics_files/figure-html/hclust3-1.png' > /dev/null
sh: line 1: 31136 Abort trap: 6           'convert' 'BioCor_1_basics_files/figure-html/hclust3b-1.png' -trim 'BioCor_1_basics_files/figure-html/hclust3b-1.png' > /dev/null
Quitting from lines 271-282 (BioCor_1_basics.Rmd) 
Error: processing vignette 'BioCor_1_basics.Rmd' failed with diagnostics:
$ operator is invalid for atomic vectors
Execution halted

Windows

* creating vignettes ... ERROR
Invalid Parameter - /figure-html
Warning in shell(paste(c(cmd, args), collapse = " ")) :
  'convert "BioCor_1_basics_files/figure-html/hclust1-1.png" -trim "BioCor_1_basics_files/figure-html/hclust1-1.png"' execution failed with error code 4
Invalid Parameter - /figure-html
Warning in shell(paste(c(cmd, args), collapse = " ")) :
  'convert "BioCor_1_basics_files/figure-html/hclust3-1.png" -trim "BioCor_1_basics_files/figure-html/hclust3-1.png"' execution failed with error code 4
Invalid Parameter - /figure-html
Warning in shell(paste(c(cmd, args), collapse = " ")) :
  'convert "BioCor_1_basics_files/figure-html/hclust3b-1.png" -trim "BioCor_1_basics_files/figure-html/hclust3b-1.png"' execution failed with error code 4
Quitting from lines 271-282 (BioCor_1_basics.Rmd) 
Error: processing vignette 'BioCor_1_basics.Rmd' failed with diagnostics:
$ operator is invalid for atomic vectors
Execution halted

Linux

* creating vignettes ... ERROR
Quitting from lines 271-282 (BioCor_1_basics.Rmd) 
Error: processing vignette 'BioCor_1_basics.Rmd' failed with diagnostics:
$ operator is invalid for atomic vectors
Execution halted

Enable hiding code

Convert these lines

BioCor/vignettes/BioCor_2_advanced.Rmd

Lines 9 to 10 in b8411a9

  BiocStyle::html_document:
    fig_caption: true

To this:

BioCor/vignettes/BioCor_1_basics.Rmd

Lines 12 to 15 in b8411a9

  BiocStyle::html_document:
    fig_caption: true
    code_folding: show
    self_contained: yes

This makes it possible to hide long pieces of code.

It could probably be used in some code chunks, for example to hide the closing session info by default.

Pay attention to BioCor.Rproj and other related files/changes, which might not be fully in sync.

Allow to convert pathway information to GeneSetCollection

Related to #3, instead of using lists from metabolic pathways databases, use GeneSetCollections

library("reactome.db")
genesReact <- as.list(reactomeEXTID2PATHID)

It would be great to work with:

library("reactome.db")
genesReact <- as.GeneSetCollection(reactomeEXTID2PATHID)
genesReact
## GeneSetCollection
##   names: R-HSA-109582, R-HSA-114608, R-HSA-168249, R-HSA-168256, R-HSA-6798695, R-HSA-76002, ... (22001 total)
##   unique identifiers: 5167, 100288400, ..., 57191 (69713 total)
##   types in collection:
##     geneIdType: EntrezIdentifier (1 total)

Check that using lists works

I got a strange error about a list not being character. I was using mclusterGeneSim; perhaps it was using the function for GeneSetCollection.

The input was:

set.seed(456)
# info
library("reactome.db")
#> Loading required package: AnnotationDbi
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind,
#>     colMeans, colnames, colSums, dirname, do.call, duplicated,
#>     eval, evalq, Filter, Find, get, grep, grepl, intersect,
#>     is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
#>     paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
#>     Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
#>     table, tapply, union, unique, unsplit, which, which.max,
#>     which.min
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: IRanges
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:base':
#> 
#>     expand.grid
library("BioCor")
#> If you use BioCor in published research, please cite:
genes2Pathways <- as.list(reactomeEXTID2PATHID)
pathways <- unlist(genes2Pathways, use.names = FALSE)
genes <- rep(names(genes2Pathways), lengths(genes2Pathways))
paths2genes <- split(genes, pathways)
human <- grep("R-HSA-", names(paths2genes))
paths2genes <- paths2genes[human]
paths2genes <- lapply(paths2genes, unique)
paths2genes <- paths2genes[lengths(paths2genes) >= 2]
genes2paths <- GSEAdv:::inverseList(paths2genes)

# clusters
clusters <- list(a=sample(genes, 50), b = sample(genes, 25))
mclusterGeneSim(clusters, info = genes2paths, method = c("max", "BMA"))
#> Warning in mclusterGeneSim(clusters, info = genes2paths, method =
#> c("max", : Some genes are not in the list provided.
#> Error in if (is.na(rowIds) || is.na(colIds)) {: missing value where TRUE/FALSE needed
mclusterGeneSim(clusters, info = paths2genes, method = c("max", "BMA"))
#> Warning in mclusterGeneSim(clusters, info = paths2genes, method =
#> c("max", : Some genes are not in the list provided.
#> Error in mpathSim(pathwaysl, info, NULL): The input pathways should be characters

Created on 2018-11-15 by the reprex package (v0.2.1)

Ensure that NEWS is in the right format

If you are using a NEWS file, make sure it can be parsed by utils::news()

Build failure on devel due to GOSemSim

Build failure on devel due to GOSemSim:

genes <- c("23098", "4843", "5431", "4710", "4287", "5217", "7321", "1207", 
"9891", "27252", "56922", "1136", "51668", "5241", "54700", "43", 
"11020", "5372", "7528", "79913", "2717", "6650", "9738", "3718", 
"9827", "23586", "9148", "975", "84274", "80824", "8078", "10686", 
"6152", "374291", "60482", "6509", "2582", "10560", "9194", "5228", 
"25950", "10564", "26212", "8189", "94101", "8520", "968", "4301", 
"2643", "51763", "23164", "254428", "29079", "56886", "9380", 
"85465", "2247", "254013", "54509", "4123", "3801", "27043", 
"10907", "84958", "26230", "9589", "908", "27147", "6129", "6749", 
"2308", "7069", "3628", "5352", "1525", "58494", "9337", "7273", 
"10670", "138199", "6750", "26958", "136227", "29115", "51005", 
"7086", "285231", "4724", "9232", "1020", "2923", "124975", "55048", 
"55867", "3516", "9677", "3965", "6940", "27258", "3866", "54811", 
"5707", "201626", "7025", "10458", "127064", "126375", "9735", 
"3852", "388567", "55615", "401541", "388552", "728", "5660", 
"5336", "8337", "5004", "3833", "26063", "51750", "3690", "92335"
)
library("GOSemSim")
BP <- godata('org.Hs.eg.db', ont="BP", computeIC=TRUE)
gsGO <- GOSemSim::mgeneSim(genes, semData = BP, measure = "Resnik", verbose = FALSE)
## Error in infoContentMethod_cpp(ID1, ID2, .anc, IC, method, ont) : 
##   Expecting a string vector: [type=logical; required=STRSXP].

Improvements for version 1.3

List of improvements for the release 1.3:

Vignettes

- Explain how to use the functions on point 2 of the package development

Package

- Reduce memory footprint (from issue #1)
  This could use an approach similar to that of GS² by Troy Ruths.
- Add functions to select highly similar and dissimilar genes/pathways/gene sets.
- Add the possibility to use the incidence matrix of gene set collections to calculate the pathway similarities
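
As a rough illustration of the incidence matrix idea in the last item, all pairwise pathway overlaps can be obtained with one matrix product. The sketch below uses base R and made-up data, and assumes a Dice-style similarity. It is not how the package currently computes similarities.

```r
# Toy pathways and their genes (made-up data).
paths2genes <- list(
    p1 = c("g1", "g2", "g3"),
    p2 = c("g2", "g3", "g4"),
    p3 = "g4"
)

# Incidence matrix: one row per gene, one column per pathway,
# 1 if the gene belongs to the pathway and 0 otherwise.
genes <- sort(unique(unlist(paths2genes, use.names = FALSE)))
incidence <- vapply(
    paths2genes,
    function(p) as.numeric(genes %in% p),
    numeric(length(genes))
)
rownames(incidence) <- genes

# Shared genes for every pair of pathways, and the size of each pathway.
shared <- crossprod(incidence)
sizes <- colSums(incidence)

# Dice-style similarity for all pairs at once.
dice <- 2 * shared / outer(sizes, sizes, "+")
dice
```

For large collections a sparse matrix would probably be needed to keep memory use down, which ties in with the memory footprint item above.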
