BioCor

Project status: Active. The project has reached a stable, usable state and is being actively developed.

BioCor lets you calculate functional similarities (originally called biological correlation, hence the name) and use them for network building or other purposes.

Installation

BioCor is an R package. You can install it from the Bioconductor project with:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("BioCor")

You can install the version in this repository from GitHub with:

if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}
devtools::install_github("llrs/BioCor")

How to use BioCor?

See the vignette on the Bioconductor site and the advanced vignette.
Here is a minimal example:

# The data must be provided, see the vignette for more details.
# Get some pathways from the pathway data
(pathways <- sample(unlist(genesReact, use.names = FALSE), 5))
#> [1] "R-HSA-372790" "R-HSA-168188" "R-HSA-450294" "R-HSA-109582" "R-HSA-194840"
# Calculate the pathway similarity of them
mpathSim(pathways, genesReact, NULL)
#>              R-HSA-372790 R-HSA-168188 R-HSA-450294 R-HSA-109582 R-HSA-194840
#> R-HSA-372790   1.00000000   0.02341920   0.01924619   0.14301552   0.08478425
#> R-HSA-168188   0.02341920   1.00000000   0.79012346   0.02781641   0.00000000
#> R-HSA-450294   0.01924619   0.79012346   1.00000000   0.02335766   0.00000000
#> R-HSA-109582   0.14301552   0.02781641   0.02335766   1.00000000   0.03689065
#> R-HSA-194840   0.08478425   0.00000000   0.00000000   0.03689065   1.00000000
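
To give an intuition for the scores above, here is a minimal base R sketch of a Dice coefficient between two gene sets. It assumes that the pathway similarity is a Dice index over the genes of each pathway (the package lists a diceSim function, but check its documentation for the exact definition). The helper `dice` and the gene identifiers are hypothetical and not part of BioCor's API.

```r
# Hypothetical helper for illustration only, not BioCor's implementation
dice <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  # Two empty sets have no defined similarity
  if (length(a) + length(b) == 0L) {
    return(NA_real_)
  }
  2 * length(intersect(a, b)) / (length(a) + length(b))
}

# Made-up gene identifiers for two pathways
pathway_a <- c("10", "20", "30", "40")
pathway_b <- c("30", "40", "50")

# Two shared genes out of 4 + 3 members: 2 * 2 / 7
score <- dice(pathway_a, pathway_b)
score
```

A score of 1 means both pathways contain exactly the same genes, and 0 means they share none, which matches the diagonal and the zeros in the matrix above.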

Who might use this package?

It is intended for bioinformaticians: both people interested in the functional similarity of some genes or clusters and people building other analyses on top of it.

What is the goal of this project?

The goal of this project is to provide methods to calculate functional similarities based on pathways.

What can BioCor be used for?

Here is a non-comprehensive list:

  • Diseases or drugs:
    By observing which genes with the same pathways are more affected
  • Gene/protein functional analysis:
    By testing how new pathways are similar to existing pathways
  • Protein-protein interaction:
    By testing if they are involved in the same pathways
  • miRNA-mRNA interaction:
    By comparing clusters they affect
  • sRNA regulation:
    By observing the relationship between sRNA and genes
  • Evolution:
    By comparing similarities of genes between species
  • Network improvement:
    By adding information about the known relationship between genes
  • Evaluating pathway databases:
    By comparing scores of the same entities

See the advanced vignette

Contributing

Please read how to contribute for details on the code of conduct, and the process for submitting pull requests.

Acknowledgments

Thanks to everyone who has contributed to making this package what it is, especially my advisor.


biocor's Issues

How to quantify evidence of co-functionality?

From Bioinformatics:

"quantify how likely two genes are correlated in their enrichment, function etc. For example, using STRING we can see that PIK3CA and PTEN are more co-functioning than PIK3CA and SF3B1. "
8lpz1

The question is how to incorporate this evidence of higher co-functioning into BioCor. My answer is that these should be two separate metrics.

Improvements for version 1.2

List of improvements for the release 1.2

Vignettes

    • Remove wall of text when loading org.Hs.eg.db via suppressPackageStartupMessages
    • Remove merging similarities explanation in the vignette
    • Call GOSemSim in the vignette instead of comparing with static/hard coded values
    • Make a note to section 9.8 about clashing namespace
    • Remove cluster description in the GOSemSim comparison
    • Section 9.3 move last sentence as a note
    • Correct title of section 9.8 / Check grammar
    • [ ] Add advanced vignette (currently hosted in here)
      - [x] Add the packages needed as suggested
      - [x] Remove section 1.2 but keep the subset of genes
      - [x] Explain better the implication of the tests.
      - [x] Compare the similarity within the DE genes and between DE subset and the others
      - [x] Maybe a plot of one similarity and the other
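
The first item in the list could look roughly like the sketch below. `chatty_load` is a hypothetical stand-in for a package that prints many start-up messages, so that the example runs without any annotation package installed. The commented line shows the call a vignette chunk would presumably use.

```r
# Hypothetical stand-in for a package with a noisy start-up
chatty_load <- function() {
  packageStartupMessage("Attaching many dependencies ...")
  invisible(TRUE)
}

# The start-up message is silenced; ordinary warnings and errors still surface
loaded <- suppressPackageStartupMessages(chatty_load())

# In a vignette chunk (assuming the annotation package is installed):
# suppressPackageStartupMessages(library("org.Hs.eg.db"))
```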

Package

    • Reduce memory footprint
    • Reduce build time on Windows

Bug report on BBS

Describe the bug
The error reported when checking is "the condition has length > 1".

To Reproduce
I tried locally, without Docker, and couldn't reproduce the error, despite enabling these checks:

_R_CHECK_LENGTH_1_CONDITION_=${_R_CHECK_LENGTH_1_CONDITION_-verbose}
_R_CHECK_LENGTH_1_LOGIC2_=${_R_CHECK_LENGTH_1_LOGIC2_-verbose}

I should use the Bioconductor Docker image: bioconductor/bioconductor_docker:devel
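
For context, this kind of failure typically comes from passing a vector of length greater than one to `if ()` or `||`. The sketch below uses made-up values (`rowIds`, `colIds` are illustrative, not taken from the failing build) and shows one common way to collapse a vector test to a single logical value. It is an assumption about the likely cause, not a confirmed fix for the build.

```r
# Illustrative values, not taken from BioCor's internals
rowIds <- c(1L, NA)
colIds <- 2L

# is.na(rowIds) has length 2, so using it directly in `if ()` or `||`
# triggers the "condition has length > 1" check (an error in recent R versions).
# anyNA() always returns a single TRUE or FALSE.
has_missing <- anyNA(rowIds) || anyNA(colIds)

result <- if (has_missing) NA_real_ else 1
```

Whether `anyNA()` (any value missing) or `all(is.na())` (every value missing) is the right collapse depends on what the calling code intends.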

Expected behavior
Not a faulty build

Additional context
Version 1.11.1 didn't solve the issues, so I might need to do something else. I should also check on R-release.

Improvements for version 1.8

Package

    • #10 Improve testing by using AppVeyor to test on Windows
    • #3 Improve tests for coherence between using GeneSetCollections and not using them.
    • #8 Classify gene sets.

To evaluate enrichment

Explore the idea that the less similar the enriched gene sets are, the better the input is (either the gene sets or the genes used for the enrichment).

Calculate gene information

This issue is related to #4; the goal is to use those variables for each gene.

This could also shed light on the issue of finding functional similarities between genes. Squashing the size of the pathways and comparing only the content might not be the best approach.

Use case: classify gene sets

Add an example here or in the blog showing how to use it to classify similar GeneSets.
Suggestion:

  1. Via a dendrogram find those that are related
  2. Parsing of the names of the gene sets to find the right label
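
The first suggestion could be sketched in base R as follows, reusing the similarity matrix printed in the README example. Converting similarity to distance as `1 - similarity` and cutting the tree at a height of 0.5 are both assumptions made for illustration; other transformations and cut-offs may suit real data better.

```r
ids <- c("R-HSA-372790", "R-HSA-168188", "R-HSA-450294",
         "R-HSA-109582", "R-HSA-194840")
# Similarity matrix copied from the README example output
sim <- matrix(c(
  1.00000000, 0.02341920, 0.01924619, 0.14301552, 0.08478425,
  0.02341920, 1.00000000, 0.79012346, 0.02781641, 0.00000000,
  0.01924619, 0.79012346, 1.00000000, 0.02335766, 0.00000000,
  0.14301552, 0.02781641, 0.02335766, 1.00000000, 0.03689065,
  0.08478425, 0.00000000, 0.00000000, 0.03689065, 1.00000000
), nrow = 5, byrow = TRUE, dimnames = list(ids, ids))

# Turn similarities into distances and build the dendrogram
tree <- hclust(as.dist(1 - sim))   # complete linkage by default

# Cut the tree; the 0.5 threshold is an arbitrary choice
groups <- cutree(tree, h = 0.5)
split(names(groups), groups)

# plot(tree) draws the dendrogram for visual inspection
```

With this cut-off only R-HSA-168188 and R-HSA-450294 end up in the same group, which agrees with their similarity of about 0.79 in the matrix.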

Reduce complexity

Reduce the complexity of combineScoresPar and combineScores to remove the error in #3 and point 2 of #11:

library("cyclocomp")
cyclocomp_package("BioCor")
#>                name cyclocomp
#> 7     combineScores        28
#> 8  combineScoresPar        24
#> 11          diceSim         7
#> 30     weighted.sum         7
#> 6        combinadic         6
#> 1   addSimilarities         5
#> 27     similarities         5
#> 24       reciprocal         4
#> 29    weighted.prod         4
#> 2            AintoB         3
#> 9    combineSources         3
#> 10              D2J         3
#> 16              J2D         3
#> 3               BMA         2
#> 12 duplicateIndices         2
#> 15      inverseList         2
#> 23            rcmax         2
#> 25        removeDup         2
#> 26          seq2mat         2
#> 28         vdiceSim         2
#> 4    clusterGeneSim         1
#> 5        clusterSim         1
#> 13          geneSim         1
#> 14             Info         1
#> 17  mclusterGeneSim         1
#> 18      mclusterSim         1
#> 19         mgeneSim         1
#> 20         mpathSim         1
#> 21          pathSim         1
#> 22  pathSims_matrix         1

Export inverseList and redirect users to it

inverseList is useful: export it, and probably hint at it when no pathway name is found, as in:

genesSim <- mpathSim(names(models), genes, method = NULL)
lengths(genes)
##      model0       model1       model2  model2_best       model3  model3_best model3_best2 model3_bestB 
##         3461         3783         3734         3743         3575         3580         3584         3578 
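
For readers unfamiliar with the idea, inverting a named list can be sketched in base R as below. `invert_list` is a hypothetical name used for illustration and is not necessarily how inverseList is implemented; the toy identifiers are made up.

```r
# Hypothetical helper: turns a "set -> members" list into "member -> sets"
invert_list <- function(x) {
  stopifnot(is.list(x), !is.null(names(x)))
  members <- unlist(x, use.names = FALSE)
  sets <- rep(names(x), lengths(x))
  split(sets, members)
}

# Toy input: pathways (or models) mapped to their genes
paths2genes <- list(p1 = c("a", "b"), p2 = c("b", "c"))

# Result: genes mapped to the pathways that contain them
genes2paths <- invert_list(paths2genes)
genes2paths
```

In the snippet above, `genes` appears to map models to genes, so inverting it would give the gene-to-set orientation.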

Use cffr

Use cffr to make it easier to cite the package.

It might also be worth mentioning it on the Slack channels.

Error building the package

The error (on the build of 2018-10-21 21:45:59 -0400 (Sun, 21 Oct 2018)) is related to the $ operator, but it occurs while building an image. It doesn't seem related to my package's code.

macOS:
* creating vignettes ... ERROR
sh: line 1: 30774 Abort trap: 6           'convert' 'BioCor_1_basics_files/figure-html/hclust1-1.png' -trim 'BioCor_1_basics_files/figure-html/hclust1-1.png' > /dev/null
sh: line 1: 31077 Abort trap: 6           'convert' 'BioCor_1_basics_files/figure-html/hclust3-1.png' -trim 'BioCor_1_basics_files/figure-html/hclust3-1.png' > /dev/null
sh: line 1: 31136 Abort trap: 6           'convert' 'BioCor_1_basics_files/figure-html/hclust3b-1.png' -trim 'BioCor_1_basics_files/figure-html/hclust3b-1.png' > /dev/null
Quitting from lines 271-282 (BioCor_1_basics.Rmd) 
Error: processing vignette 'BioCor_1_basics.Rmd' failed with diagnostics:
$ operator is invalid for atomic vectors
Execution halted
Windows:
* creating vignettes ... ERROR
Invalid Parameter - /figure-html
Warning in shell(paste(c(cmd, args), collapse = " ")) :
  'convert "BioCor_1_basics_files/figure-html/hclust1-1.png" -trim "BioCor_1_basics_files/figure-html/hclust1-1.png"' execution failed with error code 4
Invalid Parameter - /figure-html
Warning in shell(paste(c(cmd, args), collapse = " ")) :
  'convert "BioCor_1_basics_files/figure-html/hclust3-1.png" -trim "BioCor_1_basics_files/figure-html/hclust3-1.png"' execution failed with error code 4
Invalid Parameter - /figure-html
Warning in shell(paste(c(cmd, args), collapse = " ")) :
  'convert "BioCor_1_basics_files/figure-html/hclust3b-1.png" -trim "BioCor_1_basics_files/figure-html/hclust3b-1.png"' execution failed with error code 4
Quitting from lines 271-282 (BioCor_1_basics.Rmd) 
Error: processing vignette 'BioCor_1_basics.Rmd' failed with diagnostics:
$ operator is invalid for atomic vectors
Execution halted
Linux:
* creating vignettes ... ERROR
Quitting from lines 271-282 (BioCor_1_basics.Rmd) 
Error: processing vignette 'BioCor_1_basics.Rmd' failed with diagnostics:
$ operator is invalid for atomic vectors
Execution halted

GeneOverlap package

I don't know how I missed the GeneOverlap package, but it might be worth exploring whether to depend on it, or how much this package overlaps 🙄 with that one.

At the very least, it should be mentioned in the vignette.

Minor tweaks

Add code_download: true to the vignettes (it is already provided by Bioconductor, but it might be nice to have).

Also check that the GOSemSim chunk in BioCor_2_advanced.Rmd doesn't have an eval=FALSE option.

Pay attention to BioCor.Rproj and other related files or changes, which might not be fully in sync.

Allow converting pathway information to a GeneSetCollection

Related to #3: instead of using lists from metabolic pathway databases, use GeneSetCollections.

library("reactome.db")
genesReact <- as.list(reactomeEXTID2PATHID)

It would be great to work with:

library("reactome.db")
genesReact <- as.GeneSetCollection(reactomeEXTID2PATHID)
genesReact
## GeneSetCollection
##   names: R-HSA-109582, R-HSA-114608, R-HSA-168249, R-HSA-168256, R-HSA-6798695, R-HSA-76002, ... (22001 total)
##   unique identifiers: 5167, 100288400, ..., 57191 (69713 total)
##   types in collection:
##     geneIdType: EntrezIdentifier (1 total)

Check that using lists works

I got a strange error about a list not being character. I was using mclusterGeneSim; perhaps it was dispatching to the method for GeneSetCollection.

The input was:

set.seed(456)
# info
library("reactome.db")
#> Loading required package: AnnotationDbi
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind,
#>     colMeans, colnames, colSums, dirname, do.call, duplicated,
#>     eval, evalq, Filter, Find, get, grep, grepl, intersect,
#>     is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
#>     paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
#>     Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
#>     table, tapply, union, unique, unsplit, which, which.max,
#>     which.min
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: IRanges
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:base':
#> 
#>     expand.grid
library("BioCor")
#> If you use BioCor in published research, please cite:
genes2Pathways <- as.list(reactomeEXTID2PATHID)
pathways <- unlist(genes2Pathways, use.names = FALSE)
genes <- rep(names(genes2Pathways), lengths(genes2Pathways))
paths2genes <- split(genes, pathways)
human <- grep("R-HSA-", names(paths2genes))
paths2genes <- paths2genes[human]
paths2genes <- lapply(paths2genes, unique)
paths2genes <- paths2genes[lengths(paths2genes) >= 2]
genes2paths <- GSEAdv:::inverseList(paths2genes)

# clusters
clusters <- list(a=sample(genes, 50), b = sample(genes, 25))
mclusterGeneSim(clusters, info = genes2paths, method = c("max", "BMA"))
#> Warning in mclusterGeneSim(clusters, info = genes2paths, method =
#> c("max", : Some genes are not in the list provided.
#> Error in if (is.na(rowIds) || is.na(colIds)) {: missing value where TRUE/FALSE needed
mclusterGeneSim(clusters, info = paths2genes, method = c("max", "BMA"))
#> Warning in mclusterGeneSim(clusters, info = paths2genes, method =
#> c("max", : Some genes are not in the list provided.
#> Error in mpathSim(pathwaysl, info, NULL): The input pathways should be characters

Created on 2018-11-15 by the reprex package (v0.2.1)

Build failure on devel due to GOSemSim

Build failure on devel due to GOSemSim:

genes <- c("23098", "4843", "5431", "4710", "4287", "5217", "7321", "1207", 
"9891", "27252", "56922", "1136", "51668", "5241", "54700", "43", 
"11020", "5372", "7528", "79913", "2717", "6650", "9738", "3718", 
"9827", "23586", "9148", "975", "84274", "80824", "8078", "10686", 
"6152", "374291", "60482", "6509", "2582", "10560", "9194", "5228", 
"25950", "10564", "26212", "8189", "94101", "8520", "968", "4301", 
"2643", "51763", "23164", "254428", "29079", "56886", "9380", 
"85465", "2247", "254013", "54509", "4123", "3801", "27043", 
"10907", "84958", "26230", "9589", "908", "27147", "6129", "6749", 
"2308", "7069", "3628", "5352", "1525", "58494", "9337", "7273", 
"10670", "138199", "6750", "26958", "136227", "29115", "51005", 
"7086", "285231", "4724", "9232", "1020", "2923", "124975", "55048", 
"55867", "3516", "9677", "3965", "6940", "27258", "3866", "54811", 
"5707", "201626", "7025", "10458", "127064", "126375", "9735", 
"3852", "388567", "55615", "401541", "388552", "728", "5660", 
"5336", "8337", "5004", "3833", "26063", "51750", "3690", "92335"
)
library("GOSemSim")
BP <- godata('org.Hs.eg.db', ont="BP", computeIC=TRUE)
gsGO <- GOSemSim::mgeneSim(genes, semData = BP, measure = "Resnik", verbose = FALSE)
## Error in infoContentMethod_cpp(ID1, ID2, .anc, IC, method, ont) : 
##   Expecting a string vector: [type=logical; required=STRSXP].

Improvements for version 1.3

List of improvements for the release 1.3:

Vignettes

    • Explain how to use the functions on point 2 of the package development

Package

    • Reduce memory footprint (from issue #1)
      This could use an approach similar to that of GS² by Troy Ruths.
    • Add functions to select highly similar and dissimilar genes/pathways/gene sets.
    • Add the possibility to use the incidence matrix of gene set collections to calculate the pathway similarities
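
The incidence matrix idea in the last item can be sketched in base R as below. It assumes a Dice coefficient as the pathway similarity and builds the incidence matrix by hand from a plain list rather than from a GeneSetCollection. The gene set names and identifiers are made up, and this is only one possible way to do it.

```r
# Toy gene sets; identifiers are made up for illustration
gene_sets <- list(
  setA = c("g1", "g2", "g3", "g4"),
  setB = c("g3", "g4", "g5"),
  setC = c("g6")
)
all_genes <- sort(unique(unlist(gene_sets, use.names = FALSE)))

# Incidence matrix: one row per gene set, one column per gene (1 = member)
incidence <- t(vapply(
  gene_sets,
  function(g) as.numeric(all_genes %in% g),
  numeric(length(all_genes))
))
colnames(incidence) <- all_genes

# Number of shared genes for every pair of sets, and the size of each set
shared <- tcrossprod(incidence)
sizes <- rowSums(incidence)

# Dice coefficient for all pairs at once: 2 * shared / (size_i + size_j)
dice_matrix <- 2 * shared / outer(sizes, sizes, "+")
dice_matrix
```

Computing all pairs with a single matrix product avoids looping over pairs of gene sets, at the cost of holding the full incidence matrix in memory; a sparse matrix could reduce that cost for large collections.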
