greenleaflab / archr

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)

License: MIT License

R 98.38% C++ 1.62%

archr's Issues

plotEmbedding should report an error if supplied names don't exist in the matrix features

Currently, plotEmbedding() fails but provides no error message if you pass values to the name argument that don't exist as features in useMatrix.
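
A minimal sketch of the kind of check being requested, written in plain R. The helper `checkFeatureNames` and its arguments are hypothetical and not part of ArchR; a real fix would live inside plotEmbedding() and compare against the feature list of useMatrix.

```r
# Hypothetical helper: fail loudly when requested names are not known features.
checkFeatureNames <- function(requested, available) {
  missingNames <- setdiff(requested, available)
  if (length(missingNames) > 0) {
    stop(
      "The following names were not found in the matrix features: ",
      paste(missingNames, collapse = ", "),
      call. = FALSE
    )
  }
  # Return the validated names invisibly so the call can be used inline.
  invisible(requested)
}
```

Calling `checkFeatureNames(c("CD34", "NOTAGENE"), c("CD34", "GATA1"))` would stop with a message naming NOTAGENE instead of failing silently.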

getProjectSummary bug?

This function is supposed to output a text summary if returnSummary is set to FALSE. I'm not sure the code does that given that this else statement just has return(0)? Or maybe I'm reading that wrong.

ArchR/R/ArchRProjectMethods.R

Line 519 in c78a264

getBdgPeaks is both exported and hidden - two copies

in MatrixDeviations.R

ArchR/R/MatrixDeviations.R

Line 485 in 6465c3d

getBdgPeaks <- function(

and

ArchR/R/MatrixDeviations.R

Line 531 in 6465c3d

.getBdgPeaks <- function(

duplicate quantileCut functions

quantileCut function is present both in HelperUtils.R and in Trajectory.R

Support more pseudobulk-based analyses

Example situation:
You have cluster calls and you want to compare the difference between Cluster1-SampleA and Cluster1-SampleB. This is possible with markerFeatures. But say you want to do this across a much larger comparison and see how similar Samples A, B, C, D, and E are within Cluster1. This is much harder to do.

One solution:
Make it possible to get a pseudobulk count matrix.
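
As an illustration of what such a pseudobulk matrix could look like, here is a sketch in base R. It assumes a features-by-cells count matrix and one group label per cell (for example "Cluster1-SampleA"); the function name `pseudobulkCounts` is hypothetical and this is not how ArchR stores or exposes its data.

```r
# Hypothetical helper: sum per-cell counts into one column per group.
pseudobulkCounts <- function(counts, groups) {
  stopifnot(ncol(counts) == length(groups))
  # rowsum() adds up rows that share a group label, so transpose to
  # cells x features first, then transpose back to features x groups.
  t(rowsum(t(counts), group = groups))
}

# Toy data: 3 features, 4 cells, 2 cluster-by-sample groups.
counts <- matrix(
  1:12, nrow = 3,
  dimnames = list(paste0("f", 1:3), paste0("cell", 1:4))
)
groups <- c("Cluster1-SampleA", "Cluster1-SampleB",
            "Cluster1-SampleA", "Cluster1-SampleB")
pb <- pseudobulkCounts(counts, groups)
```

The resulting matrix has one column per group, which could then be passed to any bulk-style comparison across Samples A to E within a cluster.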

Continuing through after error ggplot for TSS by Frags

Hi,

Congrats and thank you for developing and sharing ArchR!

I'm running into an error very quickly at createArrowFiles when using your tutorial dataset. Here is the output copied from RStudio as I run createArrowFiles:
Using GeneAnnotation set by addArchRGenome(Hg19)!
Using GeneAnnotation set by addArchRGenome(Hg19)!
ArchR logging to : ArchRLogs/ArchR-createArrows-38764dcec8f-Date-2020-04-29_Time-19-45-15.log
If there is an issue, please report to github with logFile!
2020-04-29 19:45:15 : Batch Execution w/ safelapply!, 0 mins elapsed.
2020-04-29 19:51:01 : createArrowFiles has encountered an error, checking if any ArrowFiles completed..
ArchR logging successful to : ArchRLogs/ArchR-createArrows-38764dcec8f-Date-2020-04-29_Time-19-45-15.log
Warning message:
In mclapply(..., mc.cores = threads, mc.preschedule = preschedule) :
3 function calls resulted in an error

ArrowFiles
character(0)

And this is with the latest install:

devtools::install_github("GreenleafLab/ArchR", ref="master", repos = BiocManager::repositories())
Skipping install of 'ArchR' from a github remote, the SHA1 (c323e3c) has not changed since last install.
Use force = TRUE to force installation

When I look through the logs, the error seems to be at the ggplot step for the TSS by frag plot. The frag size distribution plot comes out for all three datasets. Any help would be appreciated - thanks!!

ArchR-createArrows-38764dcec8f-Date-2020-04-29_Time-19-45-15.log

Duplicate markerHeatmap() and .ArchRHeatmap() functions

These functions and a couple others below them are present in both the MarkerFeatures.R and MarkerHeatmap.R. I will annotate both copies so just delete whichever you want.

No fragments found

Hi,

I used my own 10X inputs to ArchR and somehow I couldn't create the arrow file.

I also tried to extract the 10x barcodes with getValidBarcodes() and feed it into the createArrowFiles() function. It did not work. Also tried to add 'chr' string to each line in the fragments file. It did not work either.

This .log file is the attempt with 10x fragments file as it is.

Thank you.

ArchR-createArrows-4456fb7160a-Date-2020-05-02_Time-04-33-16.log

bdg vs bgd

Change all instances of "bdg" to "bgd" typically with "bdgPeaks" or "bdgPeaks"

Remove support for SEs in footprinting code

I think we should remove support for footprinting summarized experiments in the input to plotFootprints. It adds confusion and is not really applicable to ArchR.

ArchR/R/Footprinting.R

Line 5 in c78a264

#' @param input An ArchRProject object or Footprint Summarized Experiment

Create more ways to pull data out of ArchR

For ex. for pseudobulks, we could support exporting fragment files based on a given cellColData column. The other option is to solve #46 in a more comprehensive way.

easiest way to load group coverages into public browser?

I usually use the WashU browser, which doesn't have an option for h5 files. Also, the output folders have replicates - ideally I would like to merge these and be able to load whatever tracks are shown in the ArchR browser (which look great btw) into a public browser.

Thanks!

Ridges plots have different length y axis for top-most sample

... additional args should probably be removed where not applicable

I think that if additional arguments are not handled by a function, then the ... additional arguments option should be excluded. It makes the documentation confusing. There are of course some times where the additional arguments are needed (clustering with Seurat / louvain for ex.).

incorrect function export

I think there are some functions that are incorrectly being exported but I'm not sure.

If a function name is preceded with "." for example ".nullGeneAnnotation", it should not have a @export tag right?

See

ArchR/R/ArchRProjectMethods.R

Line 417 in c78a264

#' @export

And many others below it in ArchRProjectMethods.R
This may exist elsewhere; I'll try to annotate as I find them.
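
For reference, a sketch of the usual roxygen2 convention the question refers to. The function names are hypothetical and not taken from ArchR; the point is only that a dot-prefixed internal helper normally carries no @export tag.

```r
#' Public helper (hypothetical name). The @export tag adds it to NAMESPACE.
#' @param x A numeric vector.
#' @export
publicHelper <- function(x) {
  .hiddenHelper(x) * 2
}

#' Internal helper (hypothetical name). No @export tag, so it stays private.
#' @keywords internal
#' @noRd
.hiddenHelper <- function(x) {
  x + 1
}
```

With this layout, only `publicHelper` would be listed in the package NAMESPACE after documentation is regenerated.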

param documentation for `scaleTo`

The utility and explanation for the scaleTo parameter need to be improved across the board.

Disconnect between addImputeWeights() and getImputeWeights()

From chapter 7.3 of the bookdown:

proj2 <- addImputeWeights(proj2)

2020-03-19 12:53:52 : Computing Impute Weights Using Magic (Cell 2018), 0 mins elapsed..
2020-03-19 12:54:13 : Completed Getting Magic Weights!, 0.348 mins elapsed..

p <- plotEmbedding(
  ArchRProj = proj2, 
  colorBy = "GeneScoreMatrix", 
  name = markerGenes, 
  embedding = "UMAP",
  imputeWeights = getImputeWeights(proj2)
)

Getting Matrix Values...

Error in imputeMatrix(mat = as.matrix(colorMat), imputeWeights = proj@imputeWeights) : 
  trying to get slot "imputeWeights" from an object of a basic class ("function") with no slots

Sounds to me like getImputeWeights is just not looking at the right level of the ArchRProject object, since I see the imputation weight files listed:
proj2@imputeWeights$Weights@listData

$w1
[1] "/Users/dannyconrad/ArchR_Walkthrough/ArchR/ImputeWeights/Impute-Weights-Rep-1"

$w2
[1] "/Users/dannyconrad/ArchR_Walkthrough/ArchR/ImputeWeights/Impute-Weights-Rep-2"
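
One plausible reading of the error, offered as an assumption rather than a confirmed diagnosis: the failing call refers to `proj@imputeWeights`, and if no variable named `proj` exists in scope, R resolves the name to the function `stats::proj`, which has no slots. The toy class and function names below are hypothetical and only reproduce that lookup pattern.

```r
library(methods)

# Toy S4 class standing in for an ArchRProject (hypothetical).
setClass("DemoProject", representation(imputeWeights = "list"))
demo <- new("DemoProject", imputeWeights = list(w1 = "Impute-Weights-Rep-1"))

# Bug pattern: the argument is named ArchRProj, but the body refers to `proj`.
# With no variable called `proj` in scope, R finds stats::proj, a function.
brokenWeights <- function(ArchRProj) proj@imputeWeights

# Fixed pattern: read the slot from the argument that was actually passed.
fixedWeights <- function(ArchRProj) ArchRProj@imputeWeights

fixedValue <- fixedWeights(demo)$w1
outcome <- tryCatch(brokenWeights(demo), error = function(e) "error")
```

If this reading is right, the weight files listed above are stored correctly and the problem is in how the project object is referenced, not in where getImputeWeights() looks.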

Imputation.R missing parameters

The functions in Imputation.R are missing param descriptions and I'm not familiar with the code or functionality.

Ridges plots show samples in reverse order

Footprint plot axes

Update the footprint plot axes to be more descriptive. something like this:

Also note that the x-axis says "BP" instead of "bp"

Using outDir with createArrowFiles causes issues downstream

I haven't fully diagnosed this one, but I believe that using the outDir param with createArrowFiles() causes issues with downstream functions not being able to find files in the expected location.
For example, after running:

ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  outDir = "/oak/stanford/groups/howchang/users/mcorces/scATAC_TCGA/analysis/ENCODE_Lung/ArrowFiles/",
  sampleNames = names(inputFiles),
  filterTSS = 4,
  filterFrags = 1000,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

I get this error when adding doublet scores:

> doubScores <- addDoubletScores(
+   input = ArrowFiles,
+   k = 10, #Refers to how many cells near a "pseudo-doublet" to count.
+   knnMethod = "UMAP", #Refers to embedding to use for nearest neighbor search with doublet projection.
+   LSIMethod = 1
+ )
2020-04-09 09:47:55 : Batch Execution w/ safelapply!, 0 mins elapsed.
###########
2020-04-09 09:47:55 : Computing Doublet Scores W62_LNGL_B_8819_X020_S03_B1_T1 (1 of 7)!, 0 mins elapsed.
###########
Checking Inputs...
2020-04-09 09:47:57 : Computing Total Accessibility Across All Features, 0.001 mins elapsed.
2020-04-09 09:48:02 : Computing Top Features, 0.086 mins elapsed.
###########
2020-04-09 09:48:03 : Running LSI (1 of 2) on Top Features, 0.095 mins elapsed.
###########
2020-04-09 09:48:03 : Creating Partial Matrix, 0.095 mins elapsed.
2020-04-09 09:48:15 : Computing LSI, 0.303 mins elapsed.
2020-04-09 09:48:23 : Identifying Clusters, 0.431 mins elapsed.
2020-04-09 09:48:35 : Identified 2 Clusters, 0.636 mins elapsed.
2020-04-09 09:48:35 : Creating Cluster Matrix on the total Group Features, 0.636 mins elapsed.
2020-04-09 09:48:46 : Computing Variable Features, 0.82 mins elapsed.
###########
2020-04-09 09:48:47 : Running LSI (2 of 2) on Variable Features, 0.822 mins elapsed.
###########
2020-04-09 09:48:47 : Creating Partial Matrix, 0.822 mins elapsed.
2020-04-09 09:48:57 : Computing LSI, 1 mins elapsed.
2020-04-09 09:49:04 : Finished Running IterativeLSI, 1.114 mins elapsed.
###########
2020-04-09 09:49:04 : Constructing Partial Matrix for Projection, 1.159 mins elapsed.
###########
###########
2020-04-09 09:49:14 : Running LSI UMAP, 1.323 mins elapsed.
###########
###########
2020-04-09 09:49:26 : Simulating and Projecting Doublets, 1.518 mins elapsed.
###########
UMAP Projection R^2 = 0.94171
Error in gzfile(file, mode) : cannot open the connection
In addition: Warning message:
In gzfile(file, mode) :
  cannot open compressed file 'QualityControl/W62_LNGL_B_8819_X020_S03_B1_T1/W62_LNGL_B_8819_X020_S03_B1_T1-Doublet-Summary.rds', probable reason 'No such file or directory'

This error does not happen when I run without outDir

As a separate issue, I also think the parameter outDir is confusing in createArrowFiles() because I assumed it was the directory where the Arrow files would be created but it is not.
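
A small diagnostic sketch for the missing-file error above. It only rebuilds the relative path shown in the warning and checks whether it exists under a chosen base directory; `doubletSummaryPath` is a hypothetical helper, not an ArchR function.

```r
# Hypothetical helper: rebuild the path the doublet step tried to open,
# relative to a chosen base directory (defaults to the working directory).
doubletSummaryPath <- function(sampleName, baseDir = getwd()) {
  file.path(
    baseDir, "QualityControl", sampleName,
    paste0(sampleName, "-Doublet-Summary.rds")
  )
}

sampleName <- "W62_LNGL_B_8819_X020_S03_B1_T1"
candidate <- doubletSummaryPath(sampleName)
summaryExists <- file.exists(candidate)          # FALSE would match the reported error
folderExists <- dir.exists(dirname(candidate))   # is the per-sample QC folder present?
```

Running the same check with `baseDir` set to the outDir value would show whether the QualityControl folder was written there instead of the working directory.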

Not able to read files to create Arrow

Hi,

I'm trying to read the fragment files to create ArrowFiles, but unfortunately ArchR is not able to read them.
Here is my attempt:


> inputFiles<-c("/Volumes/G_Drive/scAL/ATAC/S111/fragments.tsv.gz")
> 
> ArrowFiles <- createArrowFiles(
+   inputFiles = inputFiles,
+   sampleNames = c("111"),
+   filterTSS = 4, #Don't set this too high because you can always increase later
+   filterFrags = 1000, 
+   addTileMat = TRUE,
+   addGeneScoreMat = TRUE
+ )
Using GeneAnnotation set by addArchRGenome(Hg19)!
Using GeneAnnotation set by addArchRGenome(Hg19)!
ArchR logging to : ArchRLogs/ArchR-createArrows-9d3a110ba5be-Date-2020-04-30_Time-17-06-20.log
If there is an issue, please report to github with logFile!
Error in file(file, ifelse(append, "a", "w")) : 
  cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
  cannot open file 'ArchRLogs/ArchR-createArrows-9d3a110ba5be-Date-2020-04-30_Time-17-06-20.log': No such file or directory
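
The message says the log file could not be opened inside ArchRLogs, so a first check is whether that folder can be created in the current working directory. The sketch below is a generic base R diagnostic; `checkLogDir` is a hypothetical helper and the folder name is taken from the error above.

```r
# Hypothetical helper: report whether a log folder can exist under baseDir.
checkLogDir <- function(baseDir = getwd(), logFolder = "ArchRLogs") {
  logDir <- file.path(baseDir, logFolder)
  # file.access() with mode = 2 returns 0 when the directory is writable.
  writable <- unname(file.access(baseDir, mode = 2) == 0)
  if (writable && !dir.exists(logDir)) {
    dir.create(logDir, showWarnings = FALSE)
  }
  list(logDir = logDir, writable = writable, exists = dir.exists(logDir))
}

status <- checkLogDir()
```

If `status$writable` is FALSE, switching to a writable directory with setwd() before calling createArrowFiles() may be worth trying.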

Add support for user-supplied genomic region in ArchRBrowser

For a future release.
We should add support within the ArchRBrowser to navigate to a user-supplied genomic region.

motif marker heatmap

Is there a way to do this? I have tried the below and am getting errors, although it worked for GeneScoreMatrix and PeakMatrix

markerFeatures(proj, useMatrix = "MotifMatrix")
Error in (function (ArchRProj = NULL, groupBy = "Clusters", useGroups = NULL, : When accessing features from a matrix of class Sparse.Assays.Matrix it requires seqnames! Please specify 1 seqname in useSeqnames to continue! If confused, try getFeatures(ArchRProj, useMatrix) to list out available seqnames for input!

and when I try to pass in features from
getFeatures(proj, "MotifMatrix")

that also fails
Error in (function (ArchRProj = NULL, groupBy = "Clusters", useGroups = NULL, : Less than 1 feature is remaining in featureDF please check input!

I did see the motif enrichment in peaks heatmap, but it would also be great to directly do markers.

thanks!

Standardize fonts to Arial

In making pub-quality figures, some of the PDFs come with Helvetica and some with Helvetica Light. I haven't looked up how to standardize this, but those fonts will not be available to anyone using Windows or Linux. Using this as a reminder to look into font standardization.

scRNA-seq and scATAC-seq integration

Great package and superb documentation.

Maybe I missed it in the documentation but is it possible to plot integrated scRNA-seq and scATAC-seq on the same UMAP?
As in the examples here https://satijalab.org/signac/articles/integration.html

If not, and again sorry if I missed it in the documentation, is it possible to export an arrow project to a SingleCellExperiment or Seurat class object?

Thanks

Docs: GSL dependency resolution on Linux systems

First: Thanks for putting out ArchR! I'm excited to test it out on our data! I hit a snag in installation that I thought other users might encounter. This was found on a Debian 9 system:

If this is an issue with documentation that is absent/missing:

Describe what material you feel should be explained
Some dependencies of ArchR require the GNU Scientific Library (GSL) to be installed. In particular, I hit an installation roadblock with DirichletMultinomial failing to install. The errors weren't exactly clear for this case, so it may be good to provide a brief documentation section to help users along if they encounter this.

GSL can be installed with:

wget http://gnu.mirror.constant.com/gsl/gsl-2.6.tar.gz
tar -xzvf gsl-2.6.tar.gz
cd gsl-2.6
./configure
make
sudo make install

On some systems, the GSL library location will have to be provided to R using:

ld_path <- paste(Sys.getenv("LD_LIBRARY_PATH"), "/usr/local/lib/", sep = ":")
Sys.setenv(LD_LIBRARY_PATH = ld_path)

ArchR and its dependencies should then be able to be installed as normal with:

devtools::install_github("GreenleafLab/ArchR", ref="master", repos = BiocManager::repositories())
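
Before building GSL from source, it may be worth checking whether it is already visible from R. This sketch assumes that an installed GSL puts its `gsl-config` script on the PATH; `hasTool` is a hypothetical helper.

```r
# Hypothetical helper: TRUE when an executable with this name is on the PATH.
hasTool <- function(name) {
  unname(nzchar(Sys.which(name)))
}

if (hasTool("gsl-config")) {
  message("GSL appears to be installed: ", Sys.which("gsl-config"))
} else {
  message("gsl-config not found; GSL may need to be installed first.")
}
```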

Where do you think this documentation would belong?
This may belong in the "Known trouble spots for installation" section of the site at https://www.archrproject.com/ .

Cheers,
-Lucas

Error creating pseudoBulk replicates

Good day!

Thanks for developing such a user-friendly tool. It is very well documented. I ran the tool successfully up to the point of creating pseudo-bulk profiles. There I get the following error after the message Batch Execution w/ safelapply!, 0.031 mins elapsed.:

Warning message in mclapply(..., mc.cores = threads, mc.preschedule = preschedule):
"3 function calls resulted in an error"
Error in (function (..., threads = 1, preschedule = FALSE) : 
Error Found Iteration 12 : 
	[1] "Error in H5Fcreate(file) : HDF5. File accessibilty. Unable to open file.\n"
	<simpleError in H5Fcreate(file): HDF5. File accessibilty. Unable to open file.>
Error Found Iteration 13 : 
	[1] "Error in H5Fcreate(file) : HDF5. File accessibilty. Unable to open file.\n"
	<simpleError in H5Fcreate(file): HDF5. File accessibilty. Unable to open file.>
Error Found Iteration 14 : 
	[1] "Error in H5Fcreate(file) : HDF5. File accessibilty. Unable to open file.\n"
	<simpleError in H5Fcreate(file): HDF5. File accessibilty. Unable to open file.>

Traceback:

1. addGroupCoverages(ArchRProj = projHeme2, groupBy = "Clusters")
2. .batchlapply(args)
3. do.call(.safelapply, args)
4. (function (..., threads = 1, preschedule = FALSE) 
 . {
 .     if (tolower(.Platform$OS.type) == "windows") {
 .         threads <- 1
 .     }
 .     if (threads > 1) {
 .         o <- mclapply(..., mc.cores = threads, mc.preschedule = preschedule)
 .         errorMsg <- list()
 .         for (i in seq_along(o)) {
 .             if (inherits(o[[i]], "try-error")) {
 .                 capOut <- utils::capture.output(o[[i]])
 .                 capOut <- capOut[!grepl("attr\\(\\,|try-error", 
 .                   capOut)]
 .                 capOut <- head(capOut, 10)
 .                 capOut <- unlist(lapply(capOut, function(x) substr(x, 
 .                   1, 250)))
 .                 capOut <- paste0("\t", capOut)
 .                 errorMsg[[length(errorMsg) + 1]] <- paste0(c(paste0("Error Found Iteration ", 
 .                   i, " : "), capOut), "\n")
 .             }
 .         }
 .         if (length(errorMsg) != 0) {
 .             errorMsg <- unlist(errorMsg)
 .             errorMsg <- head(errorMsg, 50)
 .             errorMsg[1] <- paste0("\n", errorMsg[1])
 .             stop(errorMsg)
 .         }
 .     }
 .     else {
 .         o <- lapply(...)
 .     }
 .     o
 . })(X = 1:33, FUN = function (i = NULL, cellGroups, kmerBias = NULL, 
 .     kmerLength = 6, genome = NULL, ArrowFiles = NULL, cellsInArrow = NULL, 
 .     availableChr = NULL, chromLengths = NULL, covDir = NULL, 
 .     tstart = NULL, subThreads = 1, verbose = TRUE, logFile = NULL) 
 . {
 .     prefix <- sprintf("Group (%s of %s) :", i, length(cellGroups))
 .     .logDiffTime(sprintf("%s Creating Group Coverage", prefix), 
 .         tstart, verbose = verbose, logFile = logFile)
 .     cellGroupi <- cellGroups[[i]]
 .     tableGroupi <- table(cellGroupi)
 .     covFile <- file.path(covDir, paste0(names(cellGroups)[i], 
 .         ".insertions.coverage.h5"))
 .     rmf <- .suppressAll(file.remove(covFile))
 .     o <- h5createFile(covFile)
 .     o <- h5createGroup(covFile, paste0("Coverage"))
 .     o <- h5createGroup(covFile, paste0("Metadata"))
 .     o <- h5write(obj = "ArrowCoverage", file = covFile, name = "Class")
 .     o <- h5createGroup(covFile, paste0("Coverage/Info"))
 .     o <- h5write(as.character(cellGroupi), covFile, "Coverage/Info/CellNames")
 .     nFragDump <- 0
 .     nCells <- c()
 .     for (k in seq_along(availableChr)) {
 .         .logDiffTime(sprintf("%s Processed Fragments Chr (%s of %s)", 
 .             prefix, k, length(availableChr)), tstart, verbose = FALSE, 
 .             logFile = logFile)
 .         it <- 0
 .         for (j in seq_along(ArrowFiles)) {
 .             cellsInI <- sum(cellsInArrow[[names(ArrowFiles)[j]]] %in% 
 .                 cellGroupi)
 .             if (cellsInI > 0) {
 .                 it <- it + 1
 .                 if (it == 1) {
 .                   fragik <- .getFragsFromArrow(ArrowFiles[j], 
 .                     chr = availableChr[k], out = "GRanges", cellNames = cellGroupi)
 .                 }
 .                 else {
 .                   fragik <- c(fragik, .getFragsFromArrow(ArrowFiles[j], 
 .                     chr = availableChr[k], out = "GRanges", cellNames = cellGroupi))
 .                 }
 .             }
 .         }
 .         matchRG <- as.vector(S4Vectors::match(mcols(fragik)$RG, 
 .             names(tableGroupi)))
 .         fragik <- rep(fragik, tableGroupi[matchRG])
 .         nCells <- c(nCells, unique(mcols(fragik)$RG))
 .         covk <- coverage(IRanges(start = c(start(fragik), end(fragik)), 
 .             width = 1), width = chromLengths[availableChr[k]])
 .         nFragDump <- nFragDump + length(fragik)
 .         rm(fragik)
 .         chrLengths <- paste0("Coverage/", availableChr[k], "/Lengths")
 .         chrValues <- paste0("Coverage/", availableChr[k], "/Values")
 .         lengthRle <- length(covk@lengths)
 .         o <- h5createGroup(covFile, paste0("Coverage/", availableChr[k]))
 .         o <- .suppressAll(h5createDataset(covFile, chrLengths, 
 .             storage.mode = "integer", dims = c(lengthRle, 1), 
 .             level = 0))
 .         o <- .suppressAll(h5createDataset(covFile, chrValues, 
 .             storage.mode = "integer", dims = c(lengthRle, 1), 
 .             level = 0))
 .         o <- h5write(obj = covk@lengths, file = covFile, name = chrLengths)
 .         o <- h5write(obj = covk@values, file = covFile, name = chrValues)
 .         gc()
 .     }
 .     if (length(unique(cellGroupi)) != length(unique(nCells))) {
 .         .logMessage(paste0("Not all cells (", length(unique(cellGroupi)), 
 .             ") were found for coverage creation (", length(unique(nCells)), 
 .             ")!"), logFile = logFile)
 .         stop("Not all cells (", length(unique(cellGroupi)), ") were found for coverage creation (", 
 .             length(unique(nCells)), ")!")
 .     }
 .     out <- list(covFile = covFile, nCells = length(cellGroupi), 
 .         nFragments = nFragDump)
 .     return(out)
 . }, cellGroups = new("SimpleCharacterList", elementType = "character", 
 .     elementMetadata = NULL, metadata = list(), listData = list(....... cellIDs....),
covDir = "/home/cruiz/10x_scATACseq/analysis/heme_project/archr/Save-ProjHeme2/GroupCoverages/Clusters", 
 .     threads = 20L, verbose = TRUE, tstart = structure(1588423926.40649, class = c("POSIXct", 
 .     "POSIXt")), logFile = "ArchRLogs/ArchR-addGroupCoverages-da1d623db3da-Date-2020-05-02_Time-14-52-06.log", 
 .     subThreads = 1)
5. stop(errorMsg)

This happens with 10, 20, or 48 cores (I thought it could be a problem with the number of threads). This is the message after I changed the cluster names in my ArchR project. If I do not change the identities of the clusters, the error is different (and is the same one I get with the tutorial data):

Warning message in mclapply(..., mc.cores = threads, mc.preschedule = preschedule):
"21 function calls resulted in an error"
Error in .safelapply(seq_along(availableChr), function(x) {: 
Error Found Iteration 1 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 2 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 3 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 4 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 5 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 6 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 7 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 8 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 9 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 10 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 11 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 12 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 13 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 14 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 15 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 16 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 17 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"

Traceback:

1. addGroupCoverages(ArchRProj = projHeme2, groupBy = "Clusters")
2. .addKmerBiasToCoverage(coverageMetadata = coverageMetadata, genome = getGenome(ArchRProj), 
 .     kmerLength = kmerLength, threads = threads, verbose = FALSE, 
 .     logFile = logFile)
3. .safelapply(seq_along(availableChr), function(x) {
 .     .logMessage(sprintf("Kmer Bias %s (%s of %s)", availableChr[x], 
 .         x, length(availableChr)), logFile = logFile)
 .     message(availableChr[x], " ", appendLF = FALSE)
 .     chrBS <- BSgenome[[availableChr[x]]]
 .     exp <- Biostrings::oligonucleotideFrequency(chrBS, width = kmerLength)
 .     obsList <- lapply(seq_along(coverageFiles), function(y) {
 .         .logMessage(sprintf("Coverage File %s (%s of %s)", availableChr[x], 
 .             y, length(coverageFiles)), logFile = logFile)
 .         tryCatch({
 .             obsx <- .getCoverageInsertionSites(coverageFiles[y], 
 .                 availableChr[x]) %>% {
 .                 BSgenome::Views(chrBS, IRanges(start = . - floor(kmerLength/2), 
 .                   width = kmerLength))
 .             } %>% {
 .                 Biostrings::oligonucleotideFrequency(., width = kmerLength, 
 .                   simplify.as = "collapsed")
 .             }
 .             gc()
 .             obsx
 .         }, error = function(e) {
 .             errorList <- list(y = y, coverageFile = coverageFiles[y], 
 .                 chr = availableChr[x], iS = tryCatch({
 .                   .getCoverageInsertionSites(coverageFiles[y], 
 .                     availableChr[x])
 .                 }, error = function(e) {
 .                   "Error .getCoverageInsertionSites"
 .                 }))
 .             .logError(e, fn = ".addKmerBiasToCoverage", info = "", 
 .                 errorList = errorList, logFile = logFile)
 .         })
 .     }) %>% SimpleList
 .     names(obsList) <- names(coverageFiles)
 .     SimpleList(expected = exp, observed = obsList)
 . }, threads = threads) %>% SimpleList
4. eval(lhs, parent, parent)
5. eval(lhs, parent, parent)
6. .safelapply(seq_along(availableChr), function(x) {
 .     .logMessage(sprintf("Kmer Bias %s (%s of %s)", availableChr[x], 
 .         x, length(availableChr)), logFile = logFile)
 .     message(availableChr[x], " ", appendLF = FALSE)
 .     chrBS <- BSgenome[[availableChr[x]]]
 .     exp <- Biostrings::oligonucleotideFrequency(chrBS, width = kmerLength)
 .     obsList <- lapply(seq_along(coverageFiles), function(y) {
 .         .logMessage(sprintf("Coverage File %s (%s of %s)", availableChr[x], 
 .             y, length(coverageFiles)), logFile = logFile)
 .         tryCatch({
 .             obsx <- .getCoverageInsertionSites(coverageFiles[y], 
 .                 availableChr[x]) %>% {
 .                 BSgenome::Views(chrBS, IRanges(start = . - floor(kmerLength/2), 
 .                   width = kmerLength))
 .             } %>% {
 .                 Biostrings::oligonucleotideFrequency(., width = kmerLength, 
 .                   simplify.as = "collapsed")
 .             }
 .             gc()
 .             obsx
 .         }, error = function(e) {
 .             errorList <- list(y = y, coverageFile = coverageFiles[y], 
 .                 chr = availableChr[x], iS = tryCatch({
 .                   .getCoverageInsertionSites(coverageFiles[y], 
 .                     availableChr[x])
 .                 }, error = function(e) {
 .                   "Error .getCoverageInsertionSites"
 .                 }))
 .             .logError(e, fn = ".addKmerBiasToCoverage", info = "", 
 .                 errorList = errorList, logFile = logFile)
 .         })
 .     }) %>% SimpleList
 .     names(obsList) <- names(coverageFiles)
 .     SimpleList(expected = exp, observed = obsList)
 . }, threads = threads)
7. stop(errorMsg)

What do you think might be the problem? Thanks in advance for your help!

ArchR-addGroupCoverages-d819747731bc-Date-2020-05-02_Time-14-43-35.log
ArchR-addGroupCoverages-d93d104d33a2-Date-2020-05-02_Time-14-47-21.log
ArchR-addGroupCoverages-da1d623db3da-Date-2020-05-02_Time-14-52-06.log
ArchR_tutotial_addGroupCoverages-ec51247cb782-Date-2020-05-02_Time-18-25-47.log

14.1: Myeloid Trajectory - error with addTrajectory()

I'm trying to apply Myeloid Trajectory with my own data, and am running into an error with addTrajectory(), and I've tried it with different combinations of clusters for the trajectory.

trajectory <- c(paste0("Cluster", c(6, 7, 8, 9, 10))) # I changed these to experiment, same error
trajectory
# [1] "Cluster6"  "Cluster7"  "Cluster8"  "Cluster9"  "Cluster10"

#First we need to create a Trajectory and add it to ArchRProj cellColData
projAll <- addTrajectory(ArchRProj = projAll, name = "MyeloidU", trajectory = trajectory, embedding = "UMAP", force = TRUE)

I get the following error and traceback:

Error in smooth.spline(x = initialTime, y = matFilter[names(initialTime),  : 
  'tol' must be strictly positive and finite
> traceback()
8: stop("'tol' must be strictly positive and finite")
7: smooth.spline(x = initialTime, y = matFilter[names(initialTime), 
       x], df = dof, spar = spar)
6: FUN(X[[i]], ...)
5: lapply(seq_len(ncol(matFilter)), function(x) {
       smooth.spline(x = initialTime, y = matFilter[names(initialTime), 
           x], df = dof, spar = spar)[[2]]
   })
4: eval(lhs, parent, parent)
3: eval(lhs, parent, parent)
2: lapply(seq_len(ncol(matFilter)), function(x) {
       smooth.spline(x = initialTime, y = matFilter[names(initialTime), 
           x], df = dof, spar = spar)[[2]]
   }) %>% Reduce("cbind", .) %>% data.frame()
1: addTrajectory(ArchRProj = projAll, name = "MyeloidU", trajectory = trajectory, 
       embedding = "UMAP", force = TRUE)

First, I confirmed that clustering has been run for the project:

table(projAll$Clusters)

  C1   C2   C3   C4   C5   C6   C7   C8   C9 
  66 3689 3382   15  147 4326 2506 2680 3682

A separate issue is that we do not yet have scRNA data to integrate into our project, so lines of code a few steps earlier, such as p2 <- plotEmbedding(ArchRProj = projAll, colorBy = "cellColData", name = "Clusters2", embedding = "UMAP"), do not work yet.

From searching online, the smooth.spline error seems to occur when the dataset is too small. I have a feeling it could even be of size 0 here, but I don't know exactly how to check. Any idea how to solve this problem? Thank you!

Hugo
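
One thing visible in the output above: the trajectory vector uses labels of the form "Cluster6", while table(projAll$Clusters) reports labels of the form "C6", and there is no tenth cluster. If the trajectory labels match no cells, the spline would be fitted to an empty set, which would be consistent with the guess that the data are of size 0. The sketch below is a base-R check of that idea. The clusters vector is a stand-in rebuilt from the posted table counts, not the real project, and whether this mismatch is the actual cause is an assumption.

```r
# Stand-in for projAll$Clusters, rebuilt from the counts in the table above.
clusters <- rep(
  paste0("C", 1:9),
  times = c(66, 3689, 3382, 15, 147, 4326, 2506, 2680, 3682)
)

# The trajectory as written in the post.
trajectory <- paste0("Cluster", c(6, 7, 8, 9, 10))

# Which requested labels actually occur in the clustering?
found <- trajectory %in% unique(clusters)

# How many cells would the trajectory select?
nCells <- sum(clusters %in% trajectory)

# The same check with labels that exist in the clustering.
trajectoryFixed <- paste0("C", 6:9)
nCellsFixed <- sum(clusters %in% trajectoryFixed)

print(found)
print(nCells)
print(nCellsFixed)
```

On the real project, the same two checks can be run with projAll$Clusters in place of the stand-in vector. If the overlap is zero, using the labels that exist in projAll$Clusters would be the first thing to try.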

Heatmap for marker peaks and marker genes doesn't render

Code :

markersPeaks <- markerFeatures(ArchRProj = proj, useMatrix = "PeakMatrix", groupBy = "Clusters")

#Visualize Markers as a heatmap
heatmapPeaks <- markerHeatmap(
seMarker = markersPeaks,
cutOff = "FDR <= 0.1 & Log2FC >= 1"
)
plotPDF(heatmapPeaks, name = "Peak-Marker-Heatmap", width = 8, height = 12, ArchRProj = proj, addDOC = FALSE)

Error :

2020-04-22 15:54:46 : ERROR Found in .ArchRHeatmap for
LogFile = ArchRLogs/ArchR-plotMarkerHeatmap-b7c95bbc52b7-Date-2020-04-22_Time-15-54-42.log

Error in .logError(e, fn = ".ArchRHeatmap", info = "", errorList = errorList, : Exiting See Error Above
Traceback:

markerHeatmap(seMarker = markersPeaks, cutOff = "FDR <= 0.1 & Log2FC >= 1")
plotMarkerHeatmap(...)
tryCatch({
. .ArchRHeatmap(mat = mat, scale = FALSE, limits = c(min(mat),
. max(mat)), color = pal, clusterCols = clusterCols, clusterRows = clusterRows,
. labelRows = labelRows, labelCols = TRUE, customRowLabel = mn,
. showColDendrogram = TRUE, draw = FALSE, name = paste0("Row Z-Scores\n",
. nrow(mat), " features\n", metadata(seMarker)$Params$useMatrix))
. .logError(e, fn = ".ArchRHeatmap", info = "", errorList = errorList,
. logFile = logFile)
. }, error = function(e) {
. errorList <- list(mat = mat, scale = FALSE, limits = c(min(mat),
. max(mat)), color = pal, clusterCols = clusterCols, clusterRows = clusterRows,
. labelRows = labelRows, labelCols = TRUE, customRowLabel = mn,
. showColDendrogram = TRUE, draw = FALSE, name = paste0("Row Z-Scores\n",
. nrow(mat), " features\n", metadata(seMarker)$Params$useMatrix))
. .logError(e, fn = ".ArchRHeatmap", info = "", errorList = errorList,
. logFile = logFile)
. })
tryCatchList(expr, classes, parentenv, handlers)
tryCatchOne(expr, names, parentenv, handlers[[1L]])
value[3L]
.logError(e, fn = ".ArchRHeatmap", info = "", errorList = errorList,
. logFile = logFile)
stop("Exiting See Error Above")

param documentation for `pal`

The documentation for pal is all over the place. I'd like to create a single, universal description that fits this parameter. My understanding:

pal is a named character vector that contains a defined set of colors represented as hexadecimal codes. The name of each vector component is a number that indicates the order in which that color set should be parsed to create an optimally-distinct set of colors for a discrete palette based on the provided number of colors.

Is it true that the value of pal should always be provided by calling paletteContinuous() or paletteDiscrete() unless the user decides to provide a properly formatted color palette? Is pal checked robustly for poor user inputs? Do we even want to endorse user-defined palettes?
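
To make the description above concrete, here is a minimal base-R sketch of what a user-supplied palette of that shape could look like, together with one possible input check. The helper names isValidColor and checkPal are hypothetical and are not ArchR functions, and the hex codes are arbitrary examples. This only illustrates the kind of validation the question is asking about.

```r
# A named character vector of hexadecimal colors; the names give the order.
pal <- c("1" = "#D51F26", "2" = "#272E6A", "3" = "#208A42")

# Hypothetical helper: TRUE for each entry that R can interpret as a color.
isValidColor <- function(x) {
  vapply(x, function(col) {
    tryCatch(is.matrix(grDevices::col2rgb(col)), error = function(e) FALSE)
  }, logical(1), USE.NAMES = FALSE)
}

# Hypothetical helper: a palette must be a named character vector of valid colors.
checkPal <- function(pal) {
  is.character(pal) && !is.null(names(pal)) && all(isValidColor(pal))
}

print(checkPal(pal))
print(checkPal(c("1" = "notacolor")))
print(checkPal(c("#FFFFFF")))
```

A check along these lines would reject unnamed vectors and strings that are not colors, which are the two most likely user mistakes if custom palettes are endorsed.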

Pseudo-bulk replicates

Just a reminder to double check that the number of cells annotated per replicate is the number of unique cells and does not include cells sampled with replacement.
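
The distinction matters because sampling with replacement can return more draws than there are distinct cells. A small base-R sketch with made-up cell names (not ArchR code) shows the two counts diverging:

```r
set.seed(1)

# 40 distinct cells in a hypothetical group.
cells <- paste0("cell", 1:40)

# Draw 100 cells with replacement, as might happen when a group is too small.
sampled <- sample(cells, size = 100, replace = TRUE)

nDrawn  <- length(sampled)          # number of draws, counts repeats
nUnique <- length(unique(sampled))  # number of distinct cells

print(nDrawn)
print(nUnique)
```

Reporting length(unique(...)) rather than length(...) is what would give the number of unique cells per replicate.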

addDoubletScores(): "Correlation of UMAP Projection is below 0.9"

Hi, I am following the bookdown walkthrough with my own scATAC files, and I'm running into an issue with addDoubletScores().

When I call the method using the default parameters:
doubScores <- addDoubletScores(
  input = ArrowFiles,
  k = 10, #Refers to how many cells near a "pseudo-doublet" to count.
  knnMethod = "UMAP", #Refers to embedding to use for nearest neighbor search with doublet projection.
  LSIMethod = 1
)

I get UMAP Projection R^2 = 0.58245, 0.78375, and 0.67597 for my three files. These are far less than 0.9, thus the files have little heterogeneity and the doubletCalling is inaccurate. Should I:

Fiddle with the parameters in order to try to fix the R values?
Continue as if it worked correctly?
Check my pipeline to see if something happened incorrectly with the files before ArchR?
Or something else?

Thank you!

plotTrajectory has params "name" and "trajectory" which seem to be the same thing

ArchR/R/Trajectory.R

Line 417 in ce9d5bb

trajectory = "Trajectory",

The parameters trajectory and name appear to be the same thing.

addArchRThreads suggested warning

It might be worth throwing a warning from addArchRThreads when people request more than an advisable number of cores. For whatever reason, on the MacBook I'm using, if I ask for 8 cores, ArchR seems to hang during arrow file creation, but if I ask for 4 cores it goes quickly.
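
A sketch of what such a check could look like, in base R. The function checkThreads and the 0.5 threshold are hypothetical choices made for illustration and are not part of ArchR; the only real call relied on is parallel::detectCores().

```r
# Hypothetical helper, not part of ArchR: warn when the requested thread count
# exceeds a chosen fraction of the cores that R can detect.
checkThreads <- function(threads,
                         available = parallel::detectCores(),
                         maxFraction = 0.5) {
  # detectCores() can return NA on some platforms; skip the check in that case.
  if (is.na(available)) {
    return(invisible(threads))
  }
  advisable <- max(1, floor(available * maxFraction))
  if (threads > advisable) {
    warning(sprintf(
      "Requested %d threads; at most %d of %d detected cores are advised.",
      threads, advisable, available
    ))
  }
  invisible(threads)
}

# With 8 detected cores, asking for 8 warns and asking for 4 does not.
res8 <- tryCatch({ checkThreads(8, available = 8); "none" },
                 warning = function(w) "warned")
res4 <- tryCatch({ checkThreads(4, available = 8); "none" },
                 warning = function(w) "warned")
print(c(res8, res4))
```

The right threshold is an open question. Half of the detected cores simply matches the 8-versus-4 observation above and may not generalize to other machines.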

Log2FC = NaN in getMarkerFeatures results

I have been using getMarkerFeatures and am noticing that a lot of the reported Log2FC results are NaN, particularly for the most significant features. What do these NaNs mean? A lot of them look interesting and I'm wondering if I can assign them a number (e.g., > 5 or something). Log file is attached. Thank you!

ArchR-getMarkerFeatures-166ea3ffd2219-Date-2020-04-19_Time-12-14-33.log
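
As general background for interpreting such values, the base-R sketch below shows which inputs to a plain log2 ratio give NaN as opposed to Inf or -Inf, and how a pseudocount changes that. The numbers are made up, and this is not a statement about how getMarkerFeatures computes Log2FC. Which case, if any, applies to the results above is not known from the post.

```r
# Hypothetical group means for four features.
meanA <- c(4, 4, 0, 0)
meanB <- c(1, 0, 1, 0)

# Plain log2 ratio: finite, Inf, -Inf, and NaN (0/0) respectively.
log2fc <- log2(meanA / meanB)

# Adding a pseudocount to both groups keeps every value finite.
pseudo <- 1
log2fcPseudo <- log2((meanA + pseudo) / (meanB + pseudo))

print(log2fc)
print(log2fcPseudo)

# Locate NaN entries in a result vector.
print(which(is.nan(log2fc)))
```

Checking the underlying group means for the affected features would show whether any of these cases is involved before assigning them a placeholder value.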

Error in doubScores

Hi,

I downloaded ArchR today and tried to run it on some of 6 scATAC-seq (Cellranger output) files. While the creation of the arrow files did not pose any problems, the next step of detecting doublets ran into an error. This error was so severe that it ended up with me being possibly fork bombed (not sure about this, but I kept getting bash fork retry: no child processes and had to ask the sys admin to delete my processes).

I have attached my log files. I should probably mention that I installed ArchR in an Anaconda environment.

If you have any pointers, it would be much appreciated.
ArchR-addDoubletScores-6def61bf62cb-Date-2020-05-01_Time-14-29-10.log

Creating Arrow files with tutorial data

Hi,

I just downloaded ArchR and tried to use the tutorial data. I set threads to 1 (just in case) and gave it a go, but I'm getting the error in the attached log.

Thank you

ArchR-createArrows-4ca794f406c-Date-2020-05-02_Time-01-59-00.log

How to do differential (integrative) analysis between case and control?

Hi,

Great package, but I think one important aspect of the analysis is missing in your package.
I would love to have the following feature in your package.
Most of the time, people have single-cell data from case and control groups and want to learn about disease-specific perturbations in a cell type-specific manner. For example:
linking differential peaks (disease vs. control) to differential gene expression, or
looking for differential TF activity in a specific cell type between case and control.

Having these features would broaden the use cases of your package.

Minor: Repeated warnings from addIterativeLSI() and plotEmbedding()

Attach your log file
ArchR has a built-in logging functionality for all complex functions. You MUST attach your log file (indicated in the console output) to this issue. Just drag and drop it here.
ArchR-addIterativeLSI-7a7238e4f7af-Date-2020-05-01_Time-18-12-25.log

Describe the bug
After running addIterativeLSI() or plotEmbedding(), I'm getting repeated warning messages printed to the console.

To Reproduce
This also occurs for me with the demo datasets. I'm attaching a notebook here with those results:
01_explore_archr_demodata.pdf

Expected behavior
Everything except the warnings :)

Screenshots
Here's the console output generated by addIterativeLSI():

Checking Inputs...
ArchR logging to : ArchRLogs/ArchR-addIterativeLSI-7a7238e4f7af-Date-2020-05-01_Time-18-12-25.log
If there is an issue, please report to github with logFile!
2020-05-01 18:12:25 : Computing Total Accessibility Across All Features, 0 mins elapsed.
2020-05-01 18:12:30 : Computing Top Features, 0.077 mins elapsed.
###########
2020-05-01 18:12:30 : Running LSI (1 of 2) on Top Features, 0.085 mins elapsed.
###########
2020-05-01 18:12:30 : Sampling Cells (N = 10000) for Estimated LSI, 0.086 mins elapsed.
2020-05-01 18:12:30 : Creating Sampled Partial Matrix, 0.086 mins elapsed.
2020-05-01 18:12:48 : Computing Estimated LSI (projectAll = FALSE), 0.392 mins elapsed.
2020-05-01 18:13:58 : Identifying Clusters, 1.55 mins elapsed.
2020-05-01 18:14:27 : Identified 5 Clusters, 2.027 mins elapsed.
2020-05-01 18:14:27 : Saving LSI Iteration, 2.027 mins elapsed.
Warning message:
“Use of `dfMean$color` is discouraged. Use `color` instead.”
[the warning above is printed 64 times in total]
2020-05-01 18:14:42 : Creating Cluster Matrix on the total Group Features, 2.279 mins elapsed.
2020-05-01 18:14:48 : Computing Variable Features, 2.392 mins elapsed.
###########
2020-05-01 18:14:49 : Running LSI (2 of 2) on Variable Features, 2.394 mins elapsed.
###########
2020-05-01 18:14:49 : Creating Partial Matrix, 2.395 mins elapsed.
2020-05-01 18:15:06 : Computing LSI, 2.688 mins elapsed.
2020-05-01 18:16:05 : Finished Running IterativeLSI, 3.661 mins elapsed.

Outputs from plotEmbedding:

ArchR logging to : ArchRLogs/ArchR-plotEmbedding-7a723e0f661a-Date-2020-05-01_Time-18-26-36.log
If there is an issue, please report to github with logFile!
Getting UMAP Embedding
ColorBy = cellColData
Plotting Embedding
1 
ArchR logging successful to : ArchRLogs/ArchR-plotEmbedding-7a723e0f661a-Date-2020-05-01_Time-18-26-36.log
Warning message:
“Use of `dfMean$color` is discouraged. Use `color` instead.”
[the warning above is printed 16 times in total]

Session Info

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS:   /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] gridExtra_2.3                     ggrastr_0.1.7                    
 [3] uwot_0.1.8                        Seurat_3.1.5                     
 [5] BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome_1.54.0                  
 [7] rtracklayer_1.46.0                Biostrings_2.54.0                
 [9] XVector_0.26.0                    ArchR_0.9.2                      
[11] magrittr_1.5                      rhdf5_2.30.1                     
[13] Matrix_1.2-17                     data.table_1.12.8                
[15] SummarizedExperiment_1.16.1       DelayedArray_0.12.3              
[17] BiocParallel_1.20.1               matrixStats_0.56.0               
[19] Biobase_2.46.0                    GenomicRanges_1.38.0             
[21] GenomeInfoDb_1.22.1               IRanges_2.20.2                   
[23] S4Vectors_0.24.4                  BiocGenerics_0.32.0              
[25] ggplot2_3.3.0                    

loaded via a namespace (and not attached):
 [1] Rtsne_0.15               colorspace_1.4-1         ellipsis_0.3.0          
 [4] ggridges_0.5.2           IRdisplay_0.7.0          base64enc_0.1-3         
 [7] farver_2.0.3             leiden_0.3.3             listenv_0.8.0           
[10] npsurv_0.4-0             ggrepel_0.8.2            RSpectra_0.16-0         
[13] codetools_0.2-16         splines_3.6.1            lsei_1.2-0              
[16] IRkernel_1.0.2           jsonlite_1.6.1           Cairo_1.5-12            
[19] Rsamtools_2.2.3          ica_1.0-2                cluster_2.1.0           
[22] png_0.1-7                sctransform_0.2.1        compiler_3.6.1          
[25] httr_1.4.1               assertthat_0.2.1         lazyeval_0.2.2          
[28] htmltools_0.4.0          tools_3.6.1              rsvd_1.0.3              
[31] igraph_1.2.5             gtable_0.3.0             glue_1.4.0              
[34] GenomeInfoDbData_1.2.2   RANN_2.6.1               reshape2_1.4.4          
[37] dplyr_0.8.5              rappdirs_0.3.1           Rcpp_1.0.4.6            
[40] vctrs_0.2.4              gdata_2.18.0             ape_5.3                 
[43] nlme_3.1-140             lmtest_0.9-37            stringr_1.4.0           
[46] globals_0.12.5           lifecycle_0.2.0          irlba_2.3.3             
[49] gtools_3.8.2             XML_3.99-0.3             future_1.17.0           
[52] zlibbioc_1.32.0          MASS_7.3-51.4            zoo_1.8-7               
[55] scales_1.1.0             RColorBrewer_1.1-2       reticulate_1.15         
[58] pbapply_1.4-2            stringi_1.4.6            caTools_1.18.0          
[61] repr_1.0.1               rlang_0.4.5              pkgconfig_2.0.3         
[64] bitops_1.0-6             evaluate_0.14            lattice_0.20-38         
[67] ROCR_1.0-7               purrr_0.3.4              Rhdf5lib_1.8.0          
[70] labeling_0.3             GenomicAlignments_1.22.1 patchwork_1.0.0         
[73] htmlwidgets_1.5.1        cowplot_1.0.0            tidyselect_1.0.0        
[76] RcppAnnoy_0.0.16         plyr_1.8.6               R6_2.4.1                
[79] gplots_3.0.3             pbdZMQ_0.3-3             pillar_1.4.3            
[82] withr_2.2.0              fitdistrplus_1.0-14      survival_3.1-12         
[85] RCurl_1.98-1.2           tibble_3.0.1             future.apply_1.5.0      
[88] tsne_0.1-3               crayon_1.3.4             uuid_0.1-4              
[91] KernSmooth_2.23-15       plotly_4.9.2             digest_0.6.25           
[94] tidyr_1.0.2              munsell_0.5.0            viridisLite_0.3.0


documentation for ArrowFiles

For ArrowFiles, is there ever a situation where a path needs to be provided to the arrow file? Or is it only ever the name of the file, with the files always found in the outDir of the ArchRProject? The documentation is not very explicit about whether ArrowFiles are strictly file names or sometimes relative file paths.

Remove duplicate instance of .isColor from ArchRBrowser.R

chromosome prefix issue

The hidden utility .availableChr ensures that chromosomes contain the prefix "chr" in all GRange objects, which breaks the pipeline for non-model genomes. Is this requirement really necessary?

If so, a work-around would be much appreciated.
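
Until the requirement is clarified, one possible workaround is to rename the sequences so that they carry the prefix. The base-R sketch below operates on a plain character vector of sequence names; the helper addChrPrefix is hypothetical and is not an ArchR function. Whether renaming is acceptable for a given genome is an assumption, and the same renaming would have to be applied consistently to the genome, the annotations, and the fragment files.

```r
# Hypothetical helper: add "chr" to sequence names that do not already have it.
addChrPrefix <- function(seqNames) {
  hasPrefix <- grepl("^chr", seqNames)
  ifelse(hasPrefix, seqNames, paste0("chr", seqNames))
}

renamed <- addChrPrefix(c("1", "2L", "chrX", "scaffold_12"))
print(renamed)
```

Names that already start with "chr" are left unchanged, so the helper can be applied more than once without stacking prefixes.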

advice / doc request on subsetting projects

After subsetting a project, which analysis steps should I redo? Clearly I need to redo addIterativeLSI and addUMAP, which I am keeping track of by creating new names for each subsetted project. But should I also redo the steps below? I was redoing everything, but I'm a bit confused because addPeakMatrix has no "name" argument, so I think it is overwriting the previous PeakMatrix in my project.

proj <- addImputeWeights(proj, reducedDims=dr)
proj <- addGroupCoverages(proj, groupBy=clustering)
proj <- addReproduciblePeakSet(proj, groupBy=clustering)
proj <- addPeakMatrix(proj)
proj <- addMotifAnnotations(proj,force=TRUE)
proj <- addDeviationsMatrix(proj)

If this is an issue with documentation that is absent/missing:

Describe what material you feel should be explained
Currently subsetting is discussed very briefly as an example in section 3.2 and it would be great if this section could be expanded with more details

Where do you think this documentation would belong?
Subsetting is an important use case so I feel like this could have its own subsection (maybe 3.7)

issues with outputs in RStudio

createArrowFiles() does not produce status messages in RStudio.

computeKNN being exported without parameter descriptions

Is the function computeKNN supposed to be exported? If so, we should add function and parameter information.

ArchR/R/Clustering.R

Line 330 in c78a264

 computeKNN <- function(data, query = NULL, k = 50, method = NULL, includeSelf = FALSE, ...){ 

There should not be a default genome

This is going to cause problems. ArchR should error out until you add the genome you want. Otherwise, people are just going to import their data and it's going to look like crap unless they aligned to hg19 (which nobody should be doing anymore).

weak parameter name for addMotifAnnotations

I would change the parameter w to be named width

ArchR/R/ArchRProjectMethods.R

Line 654 in c78a264

w = 7,

geneAnno vs geneAnnotation

Currently, createArrowFiles() uses a parameter called geneAnno and the ArchRProject() constructor uses a parameter called geneAnnotation.
Is that intentional? If not, maybe harmonize those.

plotEmbedding() missing quantHex parameter?

The parameter quantHex appears to be missing from the function definition.

ArchR/R/VisualizeData.R

Line 25 in ce9d5bb

plotEmbedding <- function(
