nsheff / lola Goto Github PK
View Code? Open in Web Editor NEWLocus Overlap Analysis: Enrichment of Genomic Ranges
Home Page: http://code.databio.org/LOLA
Locus Overlap Analysis: Enrichment of Genomic Ranges
Home Page: http://code.databio.org/LOLA
The checkUniverseAppropriateness is a bit confusing, with no real recommendations. I need to clean it up a little bit to make it more universal and useful, and formalize some of the concepts of an appropriate universe.
In addition, I need to fix the current example, which might (gasp!) use an inappropriate universe.
Database downloads include cistrome files for hg19 and hg38 but not for mm10. The database downloads page on databio site describes a LOLACore for mm10 including Cistrome, but when I downloaded the region core database, cistrome is missing from mm10. Is this a future endeavor? Do you plan on including cistrome as part of LOLAcore for mm10 soon?
In the meantime I have contacted Cistrome directly to get access to the files for mm10 and plan to create a custom database to run with LOLA. Hope to hear about future plans with LOLACore for mm10!
Would it be possible to integrate the recently published JSAPR binding prediction for mm10 into Lola Jaspar?
See http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/mm10/ for the data ,http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/ for the method details, and also http://jaspar.genereg.net/genome-tracks/ for a general overview of the new track feature of JASPAR.
Since it's already a region dataset (bed-file) a conversion into the LOLA db format may be straightforward.
Hi,
Many thanks for this extremely useful package.
I have two questions about Fisher's test and output data frame.
Firstly, I would like to ask if is it possible to implement two sided alternative in runLOLA()
function (as shown below)?
if (direction == "depletion") {
fisherAlternative = "less"
}
else if (direction == "two.sided") {
fisherAlternative = "two.sided"
cat("Running fisher test with alternative=two.sided")
}
else {
fisherAlternative = "greater"
}
This topic was already mentioned here https://github.com/nsheff/LOLA/issues/20, however this feature is still not available.
Another question is about the presence of raw p-value within the output data frame, this kind of information may be useful for example to use custom correction for multiple comparisons problem.
Thanks in advance.
We should make sure the PEP loading function has the same caching capability the current loadRegionDB function has:
Hi Nathan,
I would like to highlight that in your paper named "Nonstationary Evolution and Compositional Heterogeneity in Beetle
Mitochondrial Phylogenomics" you had plotted p4 results saying it groups according to the AT%. When I tried to run p4 with the dataset which you had included in the supplementary, It didn't group and the result obtained was different when I plotted for the same. It would be really helpful if you can give me a clarification. Maybe I missed something unlike you while p4 run. Could you please give a look and respond.
I would like to test for enrichment of skeletal muscle TFBS in my differentially methylated regions (DMRs).
I have a Granges object of my DMRs.
I have a Granges object of all of my tested CpGs (is this correct for the userUniverse?)
And for the regionDB I have come up with two options:
Please advise if I am using this package correctly and if the results look normal.
Also I am concerned that using this method doesn't account for the multiple probe and gene bias issues in the EPIC array (see https://www.biorxiv.org/content/10.1101/2020.08.24.265702v1.abstract). But that is a whole other thing.
Lastly, what is the code to analyze for enrichment of all encode TFBS? I can't seem to figure out the input.
Thanks in advance!
Would it be possible to use LOLA to test for enrichment of significant individual CpGs identified in DNA methylation study? we can make a .bed file for each sig CpG, would that be ok to use as input for LOLA? Thanks!
I created a custom LOLA database using the instructions in the README.md file.
The database loads fine and is cached without problems.
However, when I run runLOLA()
I get this warning message:
2: In runLOLA(userSets, userUniverse, lolaCustom, cores = detectCores()) :
Negative c entry in table. Bug with userSetsLength; this should not happen.
and indeed quite a few of my c values are negative.
The data.table returned only has the following columns: userSet, dbSet, support, b,c and d.
How can I fix this?
The same userSets and userUniverse work perfectly fine with LOLACore and LOLAExt.
This is how I create my userUniverse:
sheffield_dnase <- unlist(lolaCore$regionGRL[which(lolaCore$regionAnno$collection == "sheffield_dnase")])
userUniverse <- sort(disjoin(c(sheffield_dnase, unlist(userSets))))
This is my userSetsLength:
first second third fourth fifth sixth seventh
518 1 483 3 280 4 245
eighth ninth tenth eleventh
51 1450 60 812
Thanks for your help!
Enrico
gettingStarted.Rmd
regionDB = loadRegionDB(dbPath)
Found collections: ucsc_example
In 'ucsc_example', found index file:/home/fhalbritter/R/x86_64-unknown-linux-gnu-library/3.1/LOLA/extdata/hg19/ucsc_example/index.txt
Found collections: ucsc_example
In 'ucsc_example', no collection file.
ucsc_example
::Loading cache:: /home/fhalbritter/R/x86_64-unknown-linux-gnu-library/3.1/LOLA/extdata/hg19/ucsc_example/ucsc_example.RData
Hi there,
I'm really appreciate your brilliant tool for enrichment analysis.
But when I try to create my own reference dataset, I got the warnings:
Warning messages:
1: In dir.create(cacheDir, recursive = TRUE) :
'D:\APP\R-4.0.3\library\LOLA\mydata\danrer11\my_dataset' already exists
2: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrUn_KN149687v1, chrUn_KN149729v1, chrUn_KN149738v1, chrUn_KN149739v1, chrUn_KN149743v1, chrUn_KN149745v1, chrUn_KN149779v1, chrUn_KN149794v1, chrUn_KN149813v1, chrUn_KN149822v1, chrUn_KN149857v1, chrUn_KN149915v1, chrUn_KN149926v1, chrUn_KN149950v1, chrUn_KN149955v1, chrUn_KN149967v1, chrUn_KN149997v1, chrUn_KN150000v1, chrUn_KN150002v1, chrUn_KN150037v1, chrUn_KN150039v1, chrUn_KN150070v1, chrUn_KN150086v1, chrUn_KN150100v1, chrUn_KN150110v1, chrUn_KN150112v1, chrUn_KN150152v1, chrUn_KN150165v1, chrUn_KN150185v1, chrUn_KN150218v1, chrUn_KN150254v1, chrUn_KN150255v2, chrUn_KN150286v1, chrUn_KN150289v1, chrUn_KN150330v1, chrUn_KN150352v2, chrUn_KN150354v1, chrUn_KN150362v1, chrUn_KN150363v1, chrUn_KN150380v1, chrUn_KN150383v1, chrUn_KN150421v1, chrUn_KN150430v1, chrUn_KN150481v1, chrUn_KN150485v1, chrUn_KN150504v1, chrUn_KN150523v1, chrUn_KN150544v1, chrUn_KN150550v1, chrUn_KN150562v1, chrUn_KN150575v1, c [... truncated]
3: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrUn_KN149687v1, chrUn_KN149738v1, chrUn_KN149745v1, chrUn_KN149747v1, chrUn_KN149811v1, chrUn_KN149841v1, chrUn_KN149857v1, chrUn_KN149861v1, chrUn_KN149874v1, chrUn_KN149915v1, chrUn_KN149916v1, chrUn_KN149926v1, chrUn_KN149950v1, chrUn_KN150008v1, chrUn_KN150037v1, chrUn_KN150039v1, chrUn_KN150065v1, chrUn_KN150086v1, chrUn_KN150100v1, chrUn_KN150136v1, chrUn_KN150141v1, chrUn_KN150152v1, chrUn_KN150213v1, chrUn_KN150218v1, chrUn_KN150228v1, chrUn_KN150324v1, chrUn_KN150352v2, chrUn_KN150354v1, chrUn_KN150380v1, chrUn_KN150411v1, chrUn_KN150421v1, chrUn_KN150430v1, chrUn_KN150439v1, chrUn_KN150481v1, chrUn_KN150485v1, chrUn_KN150504v1, chrUn_KN150544v1, chrUn_KN150623v1, chrUn_KZ115963v1, chrUn_KZ115976v1, chrUn_KZ115993v1, chrUn_KZ116046v1, chrUn_KZ116047v1, chrUn_KZ116053v1, chrUn_KN148869v2, chrUn_KN149722v1, chrUn_KN149730v1, chrUn_KN149755v1, chrUn_KN149762v1, chrUn_KN149765v1, chrUn_KN149768v1, c [... truncated]
4: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrUn_KN149687v1, chrUn_KN149729v1, chrUn_KN149741v1, chrUn_KN149745v1, chrUn_KN149747v1, chrUn_KN149760v1, chrUn_KN149803v1, chrUn_KN149810v1, chrUn_KN149811v1, chrUn_KN149841v1, chrUn_KN149855v1, chrUn_KN149857v1, chrUn_KN149861v1, chrUn_KN149874v1, chrUn_KN149915v1, chrUn_KN149916v1, chrUn_KN149926v1, chrUn_KN149950v1, chrUn_KN149997v1, chrUn_KN150008v1, chrUn_KN150037v1, chrUn_KN150039v1, chrUn_KN150065v1, chrUn_KN150079v1, chrUn_KN150086v1, chrUn_KN150112v1, chrUn_KN150185v1, chrUn_KN150213v1, chrUn_KN150228v1, chrUn_KN150342v1, chrUn_KN150351v1, chrUn_KN150352v2, chrUn_KN150354v1, chrUn_KN150366v1, chrUn_KN150375v1, chrUn_KN150399v1, chrUn_KN150411v1, chrUn_KN150481v1, chrUn_KN150485v1, chrUn_KN150504v1, chrUn_KN150544v1, chrUn_KN150575v1, chrUn_KN150602v1, chrUn_KN150614v1, chrUn_KN150623v1, chrUn_KN150647v1, chrUn_KZ115963v1, chrUn_KZ115982v1, chrUn_KZ116008v1, chrUn_KZ116024v1, chrUn_KZ116046v1, c [... truncated]
5: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrUn_KN147632v2, chrUn_KN147636v1, chrUn_KN147652v2, chrUn_KN149683v1, chrUn_KN149685v1, chrUn_KN149687v1, chrUn_KN149688v2, chrUn_KN149689v2, chrUn_KN149691v1, chrUn_KN149694v1, chrUn_KN149698v1, chrUn_KN149702v1, chrUn_KN149705v1, chrUn_KN149716v1, chrUn_KN149717v1, chrUn_KN149719v1, chrUn_KN149723v1, chrUn_KN149725v1, chrUn_KN149741v1, chrUn_KN149743v1, chrUn_KN149745v1, chrUn_KN149747v1, chrUn_KN149749v1, chrUn_KN149760v1, chrUn_KN149778v1, chrUn_KN149794v1, chrUn_KN149795v1, chrUn_KN149803v1, chrUn_KN149805v1, chrUn_KN149810v1, chrUn_KN149811v1, chrUn_KN149812v1, chrUn_KN149813v1, chrUn_KN149841v1, chrUn_KN149852v1, chrUn_KN149856v1, chrUn_KN149861v1, chrUn_KN149884v1, chrUn_KN149893v1, chrUn_KN149904v1, chrUn_KN149912v1, chrUn_KN149915v1, chrUn_KN149939v1, chrUn_KN149946v1, chrUn_KN149949v1, chrUn_KN149950v1, chrUn_KN149967v1, chrUn_KN149968v1, chrUn_KN149973v1, chrUn_KN149998v1, chrUn_KN150000v1, c [... truncated]
I have eight reference datasets under the regions/
directory, all download from UCSC in bed format. It seems like these bed files doesn't have the same chrmosome numbers? But the eight files should be in different. I have no idea for these warnings.
Thank you so much for your answer!
Best
Ariel
Hi, is it possible to combine LOLA Core and Extended in a single run? If so, what would be the best way to do so?
Thank you,
Rajat
LOLA should suggest, but not depend on simpleCache. If it exists, it may be used to speed up database loading, but it's also possible to just load the database manually each time, if you have a small DB, and so simpleCache would not be required.
Hi,
I wanted to get my overlap regions by using the extractEnrichmentOverlaps function, but it allways gives me the Error in .subset(x, j) : invalid subscript type 'S4'
Obviously I have a wrong datatype as input, but I can´t find out what datatype I need instead of S4. Also my colleagues used the exact same data for runLOLA and that worked. Had anyone the same problem or could help me?
Best regards
Thomas
Instead of collections, loading a LOLA PEP should allow a sample attribute value filter.
it's more flexible because it can filter out files from multiple collections.
Suggestion: move description column further to the left, since this is sometimes the only information about the comparison.
Hi,
Does anyone know why this error appears when I run the example code in LOLA website:
library("LOLA")
dbPath = system.file("extdata", "hg19", package="LOLA")
regionDB = loadRegionDB(dbPath)
data("sample_input", package="LOLA") # load userSets
data("sample_universe", package="LOLA") # load userUniverse
locResults = runLOLA(userSets, userUniverse, regionDB, cores=1)
Error in setnames(scoreTable, c("Var1", "Var2", "value"), c("userSet", :
Items of 'old' not found in column names: Var1,Var2
Thanks !
Hi,
I would very much like to use LOLA, but I can't get the LOLACore db to load into R. I've tried both the cache and the ordinary db, but they both stop loading with error "Error in .combine_DFrame_rows(x, objects, strict.colnames = FALSE) : the objects to combine must be DFrame objects or derivatives". The same ting happens if I try to load the example db in your gettingStared-vignette:
library("LOLA")
dbPath = system.file("extdata", "hg19", package="LOLA")
regionDB = loadRegionDB(dbPath)
Reading collection annotations:
ucsc_example: found collection annotation:C:/Users/HH/Documents/R/win-library/4.1/LOLA/extdata/hg19/ucsc_example/collection.txt
Reading region annotations...
::Loading cache:: C:/Users/HH/Documents/R/win-library/4.1/LOLA/extdata/hg19/ucsc_example//ucsc_example_files.RData
ucsc_example
::Creating cache:: C:/Users/HH/Documents/R/win-library/4.1/LOLA/extdata/hg19/ucsc_example/ucsc_example.RData
Reading 5 files...
1: C:/Users/HH/Documents/R/win-library/4.1/LOLA/extdata/hg19/ucsc_example/regions/cpgIslandExt.bed
1 ERR:C:/Users/HH/Documents/R/win-library/4.1/LOLA/extdata/hg19/ucsc_example/regions/cpgIslandExt.bed
Error in .combine_DFrame_rows(x, objects, strict.colnames = FALSE) :
the objects to combine must be DFrame objects or derivatives
In addition: Warning message:
In dir.create(cacheDir, recursive = TRUE) :
'C:\Users\HH\Documents\R\win-library\4.1\LOLA\extdata\hg19\ucsc_example' already exists
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=Norwegian Bokmål_Norway.1252 LC_CTYPE=Norwegian Bokmål_Norway.1252 LC_MONETARY=Norwegian Bokmål_Norway.1252 LC_NUMERIC=C
[5] LC_TIME=Norwegian Bokmål_Norway.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] LOLA_1.19.1
loaded via a namespace (and not attached):
[1] xfun_0.30 remotes_2.4.2 reshape2_1.4.4 purrr_1.0.1 vctrs_0.5.2 miniUI_0.1.1.1 usethis_2.1.6
[8] htmltools_0.5.2 stats4_4.1.2 yaml_2.2.1 rlang_1.0.6 pkgbuild_1.3.1 urlchecker_1.0.1 later_1.2.0
[15] glue_1.6.2 withr_2.5.0 BiocGenerics_0.40.0 sessioninfo_1.2.2 plyr_1.8.6 GenomeInfoDbData_1.2.7 lifecycle_1.0.3
[22] zlibbioc_1.36.0 stringr_1.5.0 devtools_2.4.4 htmlwidgets_1.5.4 memoise_2.0.1 evaluate_0.16 knitr_1.40
[29] callr_3.7.2 IRanges_2.28.0 fastmap_1.1.0 httpuv_1.6.1 ps_1.7.1 GenomeInfoDb_1.30.1 curl_4.3.2
[36] simpleCache_0.4.2 Rcpp_1.0.7 xtable_1.8-4 promises_1.2.0.1 cachem_1.0.6 writexl_1.4.2 S4Vectors_0.32.3
[43] pkgload_1.3.0 XVector_0.30.0 mime_0.10 fs_1.5.2 digest_0.6.27 stringi_1.5.3 processx_3.7.0
[50] shiny_1.7.2 GenomicRanges_1.42.0 cli_3.4.1 tools_4.1.2 bitops_1.0-7 magrittr_2.0.1 RCurl_1.98-1.3
[57] profvis_0.3.7 crayon_1.5.2 ellipsis_0.3.2 data.table_1.14.0 prettyunits_1.1.1 rmarkdown_2.16 rstudioapi_0.14
[64] R6_2.5.1 compiler_4.1.2
Do you have any suggestions as to how this problem can be solved?.
For the record, I am using Windows.
Kind regards,
Hanne
The current input format for readBed requires that the strand be +, -, or *. The standard bed format uses . instead of * when the strand is unknown (https://genome.ucsc.edu/FAQ/FAQformat.html#format1). It would be great if readBed could be modified so that unknown strands can be . or *. Thank you!
As of now, the runLOLA() results table displays columns defined in the index.txt file (e.g tissue, antibody). If the PEP sample annotation sheet doesn't contain those fields, the overlaps and fisher scores can be calculated but the function will crash when writing the results table.
Perhaps we could make this aspect of runLOLA() more flexible so that it accepts any fields the user provides in the PEP sample_annotation.csv file.
Lines 180 to 185 in 6d5111a
Hi there
I noticed that a handful of the entries for encodeTFBSmm10 have NA in the cellType and antibody annotations. Fortunately, this data seems to be available here, and I've enclosed a code snippet that adds the missing entries to lola.db$regionAnno
to make things easier for you. (Right now I'm using the below code as a workaround for myself.)
Thanks for making and maintaining a very useful tool!
lola.db <- loadRegionDB('LOLA/LOLACore/mm10') #change to wherever LOLACore is downloaded
# Fix missing cell/antibody entries in LOLA
encode.meta <- read.table('https://raw.githubusercontent.com/theaidenlab/juicebox/master/src/juicebox/encode/encode.mm9.txt',
sep='\t',
header=TRUE)
encode.meta$path <- sub('.*\\/', '', encode.meta$path)
encode.meta$path <- sub('.gz$', '', encode.meta$path)
rownames(encode.meta) <- encode.meta$path
lola.missing <- which(lola.db$regionAnno$collection=='encodeTFBSmm10'&is.na(lola.db$regionAnno$cellType))
missing.files <- lola.db$regionAnno$filename[lola.missing]
lola.db$regionAnno[lola.missing, c('cellType', 'antibody')] <-
encode.meta[missing.files, c('cell', 'antibody')]
Dear Nathan,
I started using LOLA a few days ago and am coming across an issue for which I hope you can help. When I read my database with loadRegionDB containing bed files, the conversion of 0-based bed files to 1-based regions in LOLA doesn't quite work. Hence I am getting a few overlaps between my userSets and the database that aren't correct.
For example, my bed file contains the following lines:
scaffold_19 23641645 23642149
scaffold_13 24500587 24501311
scaffold_0 23378037 23378626
After loading the database, this looks like:
seqnames ranges strand
1 scaffold_19 [23641645, 23642149] *
2 scaffold_13 [24500587, 24501311] *
3 scaffold_0 [23378037, 23378626] *
... when actually the left coordinate should be +1.
Reading my userSet and userUniverse with makeGRangesFromDataFrame(df=userSet, starts.in.df.are.0based=TRUE) works.
I am loading the latest LOLA version that I just downloaded from github, since I saw you were working on this issue 20 days ago. Can you comment? Am I doing anything wrong?
Thank you in advance!
Christoph made a suggestion that LOLA could understand a zipped archive of bed files, in addition to the current custom database parsing.
Currently, refreshing the size file causes an error if you don't have write access to the directory containing the collection.
Thanks
I got different results from LOLAweb interactive server and LOLA installed from bioconductor with the same input files( region set, universe, and core database). What's the difference between web-based LOLA and LOLA R package?
Hi Nathan,
I installed LOLA 1.4.0 in R 3.3.2 and Bioconductor 3.4 by biocLite("LOLA")
. It seems that readBed()
and loadRegionDB()
would create GRanges without adding 1 to the left sites in the bed files, leading to not so right 1-based GRanges which would latter give out not so right results.
library("LOLA")
bed <- system.file("extdata", "hg19", "ucsc_example", "regions", "cpgIslandExt.bed", package="LOLA")
cat(readLines(bed, n = 5), sep = "\n")
## chr1 28735 29810
## chr1 135124 135563
## chr1 327790 328229
## chr1 437151 438164
## chr1 449273 450544
readBed(bed)
## GRanges object with 28691 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## 1 chr1 [ 28735, 29810] *
## 2 chr1 [135124, 135563] *
## 3 chr1 [327790, 328229] *
## 4 chr1 [437151, 438164] *
## 5 chr1 [449273, 450544] *
## ... ... ... ...
## 28687 chrY [27610115, 27611088] *
## 28688 chrY [28555535, 28555932] *
## 28689 chrY [28773315, 28773544] *
## 28690 chrY [59213794, 59214183] *
## 28691 chrY [59349266, 59349574] *
## -------
## seqinfo: 69 sequences from an unspecified genome; no seqlengths
library("rtracklayer")
import(bed)
## GRanges object with 28691 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 [ 28736, 29810] *
## [2] chr1 [135125, 135563] *
## [3] chr1 [327791, 328229] *
## [4] chr1 [437152, 438164] *
## [5] chr1 [449274, 450544] *
## ... ... ... ...
## [28687] chrY [27610116, 27611088] *
## [28688] chrY [28555536, 28555932] *
## [28689] chrY [28773316, 28773544] *
## [28690] chrY [59213795, 59214183] *
## [28691] chrY [59349267, 59349574] *
## -------
## seqinfo: 69 sequences from an unspecified genome; no seqlengths
I find that runLOLA()
invokes countOverlaps()
. However, countOverlaps()
is based on 1-based closed interval. So readBed()
would in fact add one more base on the left side of all the regions. Even it would have small impact on the result, it maybe better to be fixed. :)
When I replaced the background data set in the R package
I run LOLA and found an error
Negative b entry in table. This means either: 1) Your user sets contain items outside your universe; or 2) your universe has a region that overlaps multiple user set regions, interfering with the universe set overlap calculation.
But I used the background data set of your example, but no error was reported.
Hi,
I think I read somewhere that hg38 is available but when I download your package and run dbPath = system.file("extdata", "hg38", package="LOLA") gives me an empty variable when I checked the extdata directory its also not there. Can you add it to your package?
Hi there,
Thanks for the useful package, it makes enrichment calculations a breeze.
I am currently trying to find whether two region sets are differentially enriched in certain features. Could you upload an example on how to compare enrichment for testing two region sets for differential enrichment against a reference database?
From the publication:
Furthermore, the buildRestrictedUniverse() function automatically builds a universe based on query sets and can be used to test two region sets for differential enrichment against a reference database.
The only example I find in documentation is from ?redefineUserSets:
dbPath = system.file("extdata", "hg19", package="LOLA")
regionDB = loadRegionDB(dbLocation=dbPath)
data("sample_universe", package="LOLA")
data("sample_input", package="LOLA")
getRegionSet(regionDB, collections="ucsc_example", filenames="vistaEnhancers.bed")
getRegionSet(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed")
getRegionFile(dbPath, collections="ucsc_example", filenames="vistaEnhancers.bed")
res = runLOLA(userSets, userUniverse, regionDB, cores=1)
locResult = res[2,]
extractEnrichmentOverlaps(locResult, userSets, regionDB)
writeCombinedEnrichment(locResult, "temp_outfolder")
userSetsRedefined = redefineUserSets(userSets, userUniverse)
resRedefined = runLOLA(userSetsRedefined, userUniverse, regionDB, cores=1)
g = plotTopLOLAEnrichments(resRedefined)
However, in this example, despite userSets containing setA and setB, I cant't find any statistics showing whether two sets are differentially enriched in certain features. What is suggested way to approach this? Thanks!
If feasible, please provide a basic command line interface for LOLA so that the tool can be integrated into computational pipelines w/o the need to write wrapper scripts.
loadRegionDB() silently fails to load if you give an incorrectly formatted folder as a regionDB. Instead, it show give a meaningful error.
I think you should re-think the names of some of the output columns:
Hello, I think I found a bug in the readBed function. I used it to read a .bed file with differentially methylated CpG sites. LOLA complained that the ranges are not disjoined. This was strange as each CpG range is 2 nucleotides long and they can't possibly be overlapping. I inspected the GRanges object and saw that the ranges are 3 nucleotides long. There were a few instances in which there were neighbouring CpGs and their ranges overlapped, e.g. 34567-34569 and 34569-34571, matching the ranges in the .bed file. As GRanges coordinates are 1-based, the coordinates should have been transformed to 34568-34569 and 34570-34571. I found a workaround for my problem by creating and using a non-standard .bed file with 1-based coordinates but I think the readBed function should be made to take into account the difference between coordinate systems and transform coordinates correctly.
I was thinking it would be nice to have the option to choose which type of fisher's test to use for the enrichment, unless there is some reason why you would only check for positive enrichment?
For my work, it would be informative to know not only which regions are enriched but also which are depleted.
It is a simple change in the code to allow an option for the fisher test. For example, instead of hard-coding alternative="greater"
you could include a variable fisher=c("greater","less", "two.sided")
Let me know what you think. Thanks.
Hello,
Thanks for LOLA! It looks like a great resource.
I get this error when trying to load the latest LOLAExt cached database:
> regionDB2 <- loadRegionDB("LOLAExt/hg19")
Reading collection annotations:
Error in readCollectionAnnotation(dbLocation, collections) :
No regions found. Are you sure '../../data/DNase-seq/12LOLA/LOLAExt/hg19' is a region database?
In addition: Warning messages:
1: In readCollectionAnnotation(dbLocation, collections) :
jaspar_motifs has no collection annotation or region files. Skipping...
2: In readCollectionAnnotation(dbLocation, collections) :
roadmap_epigenomics has no collection annotation or region files. Skipping...
LOLACore loads fine.
Thanks,
Enrico
Are you accepting feature requests?
If yes, I think it'd be great to have collections of histone marks from ENCODE and Roadmap Epigenomics, similar to encode_tfbs, cistrome_cistrome or cistrome_epigenome.
Thanks!
Is there a simple programmatic way to merge two regionsets from regionDB into a new comboned set within regionDB?
This workaround I can think of seems complicated:
For example, if I want enrichment for regions that are DNAse Weak OR UCSC CpG islands, would have to
extract the two sets from redionDM (or bed files) and create a new bed file, etc.
Seems like there should be a simpler way?
Hi,
I have a question. fisher.test returns the odds ratio ( estimate ). I could not find in the code that the odds ratio is been converted to logOddsRatio. Does the runLOLA outputs odds ratio or logOddsRatio ?
No problem with the path.
regionDB = loadRegionDB(dbLocation=dbPath)
Reading collection annotations:
ucsc_example: found collection annotation:C:/Program Files/R/R-3.6.0/library/LOLA/extdata/hg19/ucsc_example/collection.txt
Reading region annotations...
C:/Program Files/R/R-3.6.0/library/LOLA/extdata/hg19//ucsc_example/regions
In 'ucsc_example', found index file:C:/Program Files/R/R-3.6.0/library/LOLA/extdata/hg19/ucsc_example/index.txt
Collection: ucsc_example. Creating size file...
Error in system(paste("wc -l ", filename), intern = TRUE) :
'wc' not found
In addition: Warning message:
In readRegionSetAnnotation(dbLocation, collections) :
You don't have simpleCache installed, so you won't be able to cache the
regionDB after reading it in. Install simpleCache to speed up later database
loading.
If the GRangesList that is given to runLOLA as part of the regionSetDB has names for each region set, an error results. This can be fixed by setting the names to NULL to trigger the code chunk referenced below. Suggested fix: make the conditional code chunk no longer conditional.
Line 60 in 8262ff5
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.