archr's Issues

Support more pseudobulk-based analyses

Example situation:
You have cluster calls and you want to compare the difference between Cluster1-SampleA and Cluster1-SampleB. This is possible with markerFeatures. But say you want to do this across a much larger comparison and see how similar Samples A, B, C, D, and E are within Cluster1. This is much harder to do.

One solution:
Make it possible to get a pseudobulk count matrix.
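To make the request concrete, here is a minimal base R sketch of what a pseudobulk count matrix could look like. It is not ArchR code: the toy `counts` matrix, the `cluster` and `sample` vectors, and the "Cluster-Sample" grouping label are all assumptions made up for illustration.

```r
# Toy feature-by-cell count matrix (hypothetical data, 6 features x 8 cells)
set.seed(1)
counts <- matrix(
  rpois(6 * 8, lambda = 2),
  nrow = 6,
  dimnames = list(paste0("feature", 1:6), paste0("cell", 1:8))
)

# Hypothetical per-cell annotations
cluster <- rep(c("Cluster1", "Cluster2"), each = 4)
sample  <- rep(c("SampleA", "SampleB"), times = 4)
group   <- paste(cluster, sample, sep = "-")

# rowsum() sums rows by group, so transpose to cells-by-features first,
# then transpose back to features-by-groups
pseudobulk <- t(rowsum(t(counts), group = group))

# Compare samples within Cluster1 using the aggregated columns
cluster1 <- pseudobulk[, grep("^Cluster1-", colnames(pseudobulk)), drop = FALSE]
cluster1_cor <- cor(cluster1)
```

With one column per cluster-sample combination, comparing Samples A through E within a cluster becomes an ordinary matrix operation rather than a series of pairwise marker tests.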

Continuing through after error ggplot for TSS by Frags

Hi,

Congrats and thank you for developing and sharing ArchR!

I'm running into an error very quickly using your tutorial dataset at createArrowFiles. Copy/pasting the output from RStudio as I run createArrowFiles:
Using GeneAnnotation set by addArchRGenome(Hg19)!
Using GeneAnnotation set by addArchRGenome(Hg19)!
ArchR logging to : ArchRLogs/ArchR-createArrows-38764dcec8f-Date-2020-04-29_Time-19-45-15.log
If there is an issue, please report to github with logFile!
2020-04-29 19:45:15 : Batch Execution w/ safelapply!, 0 mins elapsed.
2020-04-29 19:51:01 : createArrowFiles has encountered an error, checking if any ArrowFiles completed..
ArchR logging successful to : ArchRLogs/ArchR-createArrows-38764dcec8f-Date-2020-04-29_Time-19-45-15.log
Warning message:
In mclapply(..., mc.cores = threads, mc.preschedule = preschedule) :
3 function calls resulted in an error

ArrowFiles
character(0)

And this is with the latest install:

devtools::install_github("GreenleafLab/ArchR", ref="master", repos = BiocManager::repositories())
Skipping install of 'ArchR' from a github remote, the SHA1 (c323e3c) has not changed since last install.
Use force = TRUE to force installation

When I look through the logs, the error seems to be at the ggplot step for the TSS by frag. The frag size distribution plot comes out for all three datasets. Any help would be appreciated - thanks!!

ArchR-createArrows-38764dcec8f-Date-2020-04-29_Time-19-45-15.log

No fragments found

Hi,

I used my own 10X inputs to ArchR and somehow I couldn't create the arrow file.

I also tried to extract the 10x barcodes with getValidBarcodes() and feed them into the createArrowFiles() function. That did not work. I also tried adding the 'chr' string to each line in the fragments file. That did not work either.

The attached .log file is from the attempt with the 10x fragments file as is.

Thank you.

ArchR-createArrows-4456fb7160a-Date-2020-05-02_Time-04-33-16.log
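When chromosome naming is a suspect, as with the 'chr' attempt above, a quick look at the first column of the fragments file shows which names are actually present. The sketch below uses only base R; `peek_chromosomes` is a hypothetical helper, and the demo file it reads is fabricated so the example runs on its own. It does not diagnose the error in the attached log.

```r
# Hypothetical helper: tabulate chromosome names in the first rows of a
# tab-separated fragments file (gzip-compressed files are read transparently)
peek_chromosomes <- function(path, n = 1000) {
  frags <- read.table(
    path, sep = "\t", nrows = n,
    colClasses = "character",   # keep names like "1" as text
    comment.char = "#"          # skip header comment lines if present
  )
  table(frags[[1]])
}

# Fabricated demo file standing in for a real fragments.tsv.gz
demo_path <- tempfile(fileext = ".tsv.gz")
con <- gzfile(demo_path, "w")
writeLines(c(
  "1\t100\t200\tAAAC-1\t1",
  "1\t300\t420\tAAAG-1\t2",
  "2\t150\t260\tAAAC-1\t1"
), con)
close(con)

chrom_counts <- peek_chromosomes(demo_path)
has_chr_prefix <- all(startsWith(names(chrom_counts), "chr"))
```

If `has_chr_prefix` is FALSE for a real file while the genome annotation uses "chr"-style names (or the reverse), that mismatch is worth mentioning in the report.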

bdg vs bgd

Change all instances of "bdg" to "bgd" typically with "bdgPeaks" or "bdgPeaks"

easiest way to load group coverages into public browser?

I usually use the WashU browser, which doesn't have an option for h5 files. Also, the output folders have replicates - ideally I would like to merge these and be able to load whatever tracks are shown in the ArchR browser (which look great btw) into a public browser.

Thanks!

... additional args should probably be removed where not applicable

I think that if additional arguments are not handled by a function, then the ... additional arguments option should be excluded. It makes the documentation confusing. There are of course some cases where the additional arguments are needed (for example, clustering with Seurat / Louvain).

incorrect function export

I think there are some functions that are incorrectly being exported but I'm not sure.

If a function name is preceded with ".", for example ".nullGeneAnnotation", it should not have an @export tag, right?

See

#' @export

And many others below it in ArchRProjectMethods.R
This may exist elsewhere; I'll try to annotate them as I find them.
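Rather than finding these by eye, a small scan could flag candidates. This is only a sketch under a simplifying assumption: that the `#' @export` line sits directly above the function definition. The `src` vector is made-up example content, not the real ArchRProjectMethods.R, and `find_dot_exports` is a hypothetical helper.

```r
# Hypothetical helper: return definitions that start with "." and sit
# directly below a roxygen @export line
find_dot_exports <- function(src) {
  idx <- which(grepl("^#'\\s*@export", src))
  idx <- idx[idx < length(src)]          # ignore an @export on the last line
  next_lines <- src[idx + 1]
  next_lines[grepl("^\\.", next_lines)]
}

# Made-up source lines for illustration
src <- c(
  "#' @export",
  ".nullGeneAnnotation <- function() NULL",
  "#' @export",
  "getGenome <- function() NULL"
)

flagged <- find_dot_exports(src)
```

On a real file, `src` would come from `readLines()` on the source path.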

Disconnect between addImputeWeights() and getImputeWeights()

From chapter 7.3 of the bookdown:

proj2 <- addImputeWeights(proj2)
2020-03-19 12:53:52 : Computing Impute Weights Using Magic (Cell 2018), 0 mins elapsed..
2020-03-19 12:54:13 : Completed Getting Magic Weights!, 0.348 mins elapsed..
p <- plotEmbedding(
  ArchRProj = proj2, 
  colorBy = "GeneScoreMatrix", 
  name = markerGenes, 
  embedding = "UMAP",
  imputeWeights = getImputeWeights(proj2)
)
Getting Matrix Values...

Error in imputeMatrix(mat = as.matrix(colorMat), imputeWeights = proj@imputeWeights) : 
  trying to get slot "imputeWeights" from an object of a basic class ("function") with no slots

Sounds to me like getImputeWeights is just not looking at the right level of the ArchRProject object, since I see the imputation weight files listed:
proj2@imputeWeights$Weights@listData

$w1
[1] "/Users/dannyconrad/ArchR_Walkthrough/ArchR/ImputeWeights/Impute-Weights-Rep-1"

$w2
[1] "/Users/dannyconrad/ArchR_Walkthrough/ArchR/ImputeWeights/Impute-Weights-Rep-2"
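Another reading of the error message: it says the object being accessed is a function, which is what happens when the name `proj` is not defined as a variable in the calling scope and R falls back to `stats::proj`. The sketch below reproduces that behaviour in base R only. Whether this is what happens inside plotEmbedding is an assumption based on the message text, not on the ArchR source.

```r
# With no variable called `proj` in scope, the bare name resolves to the
# function stats::proj; it is referenced explicitly here so the example is
# deterministic
resolved <- stats::proj
is_fn <- is.function(resolved)

# Accessing a slot on a function fails; the exact wording of the message
# differs between R versions
msg <- tryCatch({
  resolved@imputeWeights
  "no error"
}, error = function(e) conditionMessage(e))
```

If that is the cause, the weights listed under proj2@imputeWeights are fine and the fix would be on the side of the function that refers to `proj` instead of its own argument.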

Footprint plot axes

Update the footprint plot axes to be more descriptive. Something like this:
[image]
Also note that the x-axis says "BP" instead of "bp".

Using outDir with createArrowFiles causes issues downstream

I haven't fully diagnosed this one, but I believe that using the outDir param with createArrowFiles() causes issues with downstream functions not being able to find files in the expected location.
For example, after running:

ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  outDir = "/oak/stanford/groups/howchang/users/mcorces/scATAC_TCGA/analysis/ENCODE_Lung/ArrowFiles/",
  sampleNames = names(inputFiles),
  filterTSS = 4,
  filterFrags = 1000,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

I get this error when adding doublet scores:

> doubScores <- addDoubletScores(
+   input = ArrowFiles,
+   k = 10, #Refers to how many cells near a "pseudo-doublet" to count.
+   knnMethod = "UMAP", #Refers to embedding to use for nearest neighbor search with doublet projection.
+   LSIMethod = 1
+ )
2020-04-09 09:47:55 : Batch Execution w/ safelapply!, 0 mins elapsed.
###########
2020-04-09 09:47:55 : Computing Doublet Scores W62_LNGL_B_8819_X020_S03_B1_T1 (1 of 7)!, 0 mins elapsed.
###########
Checking Inputs...
2020-04-09 09:47:57 : Computing Total Accessibility Across All Features, 0.001 mins elapsed.
2020-04-09 09:48:02 : Computing Top Features, 0.086 mins elapsed.
###########
2020-04-09 09:48:03 : Running LSI (1 of 2) on Top Features, 0.095 mins elapsed.
###########
2020-04-09 09:48:03 : Creating Partial Matrix, 0.095 mins elapsed.
2020-04-09 09:48:15 : Computing LSI, 0.303 mins elapsed.
2020-04-09 09:48:23 : Identifying Clusters, 0.431 mins elapsed.
2020-04-09 09:48:35 : Identified 2 Clusters, 0.636 mins elapsed.
2020-04-09 09:48:35 : Creating Cluster Matrix on the total Group Features, 0.636 mins elapsed.
2020-04-09 09:48:46 : Computing Variable Features, 0.82 mins elapsed.
###########
2020-04-09 09:48:47 : Running LSI (2 of 2) on Variable Features, 0.822 mins elapsed.
###########
2020-04-09 09:48:47 : Creating Partial Matrix, 0.822 mins elapsed.
2020-04-09 09:48:57 : Computing LSI, 1 mins elapsed.
2020-04-09 09:49:04 : Finished Running IterativeLSI, 1.114 mins elapsed.
###########
2020-04-09 09:49:04 : Constructing Partial Matrix for Projection, 1.159 mins elapsed.
###########
###########
2020-04-09 09:49:14 : Running LSI UMAP, 1.323 mins elapsed.
###########
###########
2020-04-09 09:49:26 : Simulating and Projecting Doublets, 1.518 mins elapsed.
###########
UMAP Projection R^2 = 0.94171
Error in gzfile(file, mode) : cannot open the connection
In addition: Warning message:
In gzfile(file, mode) :
  cannot open compressed file 'QualityControl/W62_LNGL_B_8819_X020_S03_B1_T1/W62_LNGL_B_8819_X020_S03_B1_T1-Doublet-Summary.rds', probable reason 'No such file or directory'

This error does not happen when I run without outDir

As a separate issue, I also think the parameter outDir is confusing in createArrowFiles() because I assumed it was the directory where the Arrow files would be created but it is not.
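The path in the warning ('QualityControl/...') is relative, so R resolves it against the current working directory. A base R sketch of that mismatch follows; `out_dir` is a hypothetical stand-in for the outDir value, and the file name is invented.

```r
# Hypothetical output directory, distinct from the working directory
out_dir <- file.path(tempdir(), "ArrowFiles")
dir.create(file.path(out_dir, "QualityControl"),
           recursive = TRUE, showWarnings = FALSE)

# A relative path of the same shape as the one in the warning
rel_path <- file.path("QualityControl", "example-Doublet-Summary.rds")

# Write the file under out_dir ...
saveRDS(list(ok = TRUE), file.path(out_dir, rel_path))

# ... then look for it the way a relative path is resolved: against getwd()
found_from_wd      <- file.exists(rel_path)
found_from_out_dir <- file.exists(file.path(out_dir, rel_path))
```

If the QC files are written under outDir but later read through a relative path, the lookup fails in the same way unless the working directory happens to be outDir.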

Not able to read files to create Arrow

Hi,

I'm trying to read the fragment files to create ArrowFiles, but unfortunately ArchR is not able to read them.
Here is my attempt:


> inputFiles<-c("/Volumes/G_Drive/scAL/ATAC/S111/fragments.tsv.gz")
> 
> ArrowFiles <- createArrowFiles(
+   inputFiles = inputFiles,
+   sampleNames = c("111"),
+   filterTSS = 4, #Dont set this too high because you can always increase later
+   filterFrags = 1000, 
+   addTileMat = TRUE,
+   addGeneScoreMat = TRUE
+ )
Using GeneAnnotation set by addArchRGenome(Hg19)!
Using GeneAnnotation set by addArchRGenome(Hg19)!
ArchR logging to : ArchRLogs/ArchR-createArrows-9d3a110ba5be-Date-2020-04-30_Time-17-06-20.log
If there is an issue, please report to github with logFile!
Error in file(file, ifelse(append, "a", "w")) : 
  cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
  cannot open file 'ArchRLogs/ArchR-createArrows-9d3a110ba5be-Date-2020-04-30_Time-17-06-20.log': No such file or directory
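The failure here is on opening the log file for writing, before any fragments are read, so it may be worth checking that an ArchRLogs folder can be created and written to from the current working directory. A base R sketch, where `ensure_log_dir` is a hypothetical helper and not part of ArchR:

```r
# Hypothetical helper: make sure a log folder exists under base_dir and
# report whether it is writable
ensure_log_dir <- function(base_dir = getwd(), log_dir = "ArchRLogs") {
  path <- file.path(base_dir, log_dir)
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  # file.access() returns 0 when the requested access (mode 2 = write) is allowed
  writable <- unname(file.access(path, mode = 2) == 0)
  list(path = path, exists = dir.exists(path), writable = writable)
}

# Demonstrated against a temporary directory; in a real session base_dir
# would be left at its default, getwd()
status <- ensure_log_dir(base_dir = tempdir())
```

If `writable` comes back FALSE for the real working directory (for example on a read-only or external volume), switching with setwd() to a writable location before calling createArrowFiles() is one thing to try.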

motif marker heatmap

Is there a way to do this? I have tried the below and am getting errors, although it worked for GeneScoreMatrix and PeakMatrix

markerFeatures(proj, useMatrix = "MotifMatrix")
Error in (function (ArchRProj = NULL, groupBy = "Clusters", useGroups = NULL, : When accessing features from a matrix of class Sparse.Assays.Matrix it requires seqnames! Please specify 1 seqname in useSeqnames to continue! If confused, try getFeatures(ArchRProj, useMatrix) to list out available seqnames for input!

and when I try to pass in features from
getFeatures(proj, "MotifMatrix")

that also fails
Error in (function (ArchRProj = NULL, groupBy = "Clusters", useGroups = NULL, : Less than 1 feature is remaining in featureDF please check input!

I did see the motif enrichment in peaks heatmap, but it would also be great to directly do markers.

thanks!

Standardize fonts to Arial

In making pub-quality figures, some of the PDFs come with Helvetica and some with Helvetica Light. I haven't looked up how to standardize this, but those fonts will not be available to anyone using Windows or Linux. Using this as a reminder to look into font standardization.
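As a starting point for that investigation, base R can list the font families registered for the pdf device and open a PDF with one family pinned. This sketch only uses grDevices and base graphics. It does not show how ArchR sets fonts, and whether pinning the device family is enough for the ggplot-based figures is an open assumption.

```r
# Font families currently registered for the pdf() device
available <- names(grDevices::pdfFonts())

# Open a PDF with an explicit family so all default text uses the same font
out_file <- tempfile(fileext = ".pdf")
grDevices::pdf(out_file, family = "Helvetica", width = 4, height = 4)
plot(1:10, main = "Font check")
invisible(grDevices::dev.off())
```

Checking `available` on Windows, macOS and Linux would show which family names can be relied on before settling on Arial or an equivalent.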

Docs: GSL dependency resolution on Linux systems

First: Thanks for putting out ArchR! I'm excited to test it out on our data! I hit a snag in installation that I thought other users might encounter. This was found on a Debian 9 system:

If this is an issue with documentation that is absent/missing:

Describe what material you feel should be explained
Some dependencies of ArchR require the GNU Scientific Library (GSL) to be installed. In particular, I hit an installation roadblock with DirichletMultinomial failing to install. The errors weren't exactly clear for this case, so it may be good to provide a brief documentation section to help users along if they encounter this.

GSL can be installed with:

wget http://gnu.mirror.constant.com/gsl/gsl-2.6.tar.gz
tar -xzvf gsl-2.6.tar.gz
cd gsl-2.6
./configure
make
sudo make install

On some systems, the GSL library location will have to be provided to R using:

# LD_LIBRARY_PATH entries are separated by ":" on Linux
ld_path <- paste(Sys.getenv("LD_LIBRARY_PATH"), "/usr/local/lib/", sep = ":")
Sys.setenv(LD_LIBRARY_PATH = ld_path)

ArchR and its dependencies should then be able to be installed as normal with:

devtools::install_github("GreenleafLab/ArchR", ref="master", repos = BiocManager::repositories())

Where do you think this documentation would belong?
This may belong in the "Known trouble spots for installation" section of the site at https://www.archrproject.com/ .

Cheers,
-Lucas

Error creating pseudoBulk replicates

Good day!

Thanks for developing such a user-friendly tool. It is very well documented. I have successfully run the tools up to the point where I have to create pseudo-bulk profiles. I get the following error after the message Batch Execution w/ safelapply!, 0.031 mins elapsed.:

Warning message in mclapply(..., mc.cores = threads, mc.preschedule = preschedule):
"3 function calls resulted in an error"
Error in (function (..., threads = 1, preschedule = FALSE) : 
Error Found Iteration 12 : 
	[1] "Error in H5Fcreate(file) : HDF5. File accessibilty. Unable to open file.\n"
	<simpleError in H5Fcreate(file): HDF5. File accessibilty. Unable to open file.>
Error Found Iteration 13 : 
	[1] "Error in H5Fcreate(file) : HDF5. File accessibilty. Unable to open file.\n"
	<simpleError in H5Fcreate(file): HDF5. File accessibilty. Unable to open file.>
Error Found Iteration 14 : 
	[1] "Error in H5Fcreate(file) : HDF5. File accessibilty. Unable to open file.\n"
	<simpleError in H5Fcreate(file): HDF5. File accessibilty. Unable to open file.>

Traceback:

1. addGroupCoverages(ArchRProj = projHeme2, groupBy = "Clusters")
2. .batchlapply(args)
3. do.call(.safelapply, args)
4. (function (..., threads = 1, preschedule = FALSE) 
 . {
 .     if (tolower(.Platform$OS.type) == "windows") {
 .         threads <- 1
 .     }
 .     if (threads > 1) {
 .         o <- mclapply(..., mc.cores = threads, mc.preschedule = preschedule)
 .         errorMsg <- list()
 .         for (i in seq_along(o)) {
 .             if (inherits(o[[i]], "try-error")) {
 .                 capOut <- utils::capture.output(o[[i]])
 .                 capOut <- capOut[!grepl("attr\\(\\,|try-error", 
 .                   capOut)]
 .                 capOut <- head(capOut, 10)
 .                 capOut <- unlist(lapply(capOut, function(x) substr(x, 
 .                   1, 250)))
 .                 capOut <- paste0("\t", capOut)
 .                 errorMsg[[length(errorMsg) + 1]] <- paste0(c(paste0("Error Found Iteration ", 
 .                   i, " : "), capOut), "\n")
 .             }
 .         }
 .         if (length(errorMsg) != 0) {
 .             errorMsg <- unlist(errorMsg)
 .             errorMsg <- head(errorMsg, 50)
 .             errorMsg[1] <- paste0("\n", errorMsg[1])
 .             stop(errorMsg)
 .         }
 .     }
 .     else {
 .         o <- lapply(...)
 .     }
 .     o
 . })(X = 1:33, FUN = function (i = NULL, cellGroups, kmerBias = NULL, 
 .     kmerLength = 6, genome = NULL, ArrowFiles = NULL, cellsInArrow = NULL, 
 .     availableChr = NULL, chromLengths = NULL, covDir = NULL, 
 .     tstart = NULL, subThreads = 1, verbose = TRUE, logFile = NULL) 
 . {
 .     prefix <- sprintf("Group (%s of %s) :", i, length(cellGroups))
 .     .logDiffTime(sprintf("%s Creating Group Coverage", prefix), 
 .         tstart, verbose = verbose, logFile = logFile)
 .     cellGroupi <- cellGroups[[i]]
 .     tableGroupi <- table(cellGroupi)
 .     covFile <- file.path(covDir, paste0(names(cellGroups)[i], 
 .         ".insertions.coverage.h5"))
 .     rmf <- .suppressAll(file.remove(covFile))
 .     o <- h5createFile(covFile)
 .     o <- h5createGroup(covFile, paste0("Coverage"))
 .     o <- h5createGroup(covFile, paste0("Metadata"))
 .     o <- h5write(obj = "ArrowCoverage", file = covFile, name = "Class")
 .     o <- h5createGroup(covFile, paste0("Coverage/Info"))
 .     o <- h5write(as.character(cellGroupi), covFile, "Coverage/Info/CellNames")
 .     nFragDump <- 0
 .     nCells <- c()
 .     for (k in seq_along(availableChr)) {
 .         .logDiffTime(sprintf("%s Processed Fragments Chr (%s of %s)", 
 .             prefix, k, length(availableChr)), tstart, verbose = FALSE, 
 .             logFile = logFile)
 .         it <- 0
 .         for (j in seq_along(ArrowFiles)) {
 .             cellsInI <- sum(cellsInArrow[[names(ArrowFiles)[j]]] %in% 
 .                 cellGroupi)
 .             if (cellsInI > 0) {
 .                 it <- it + 1
 .                 if (it == 1) {
 .                   fragik <- .getFragsFromArrow(ArrowFiles[j], 
 .                     chr = availableChr[k], out = "GRanges", cellNames = cellGroupi)
 .                 }
 .                 else {
 .                   fragik <- c(fragik, .getFragsFromArrow(ArrowFiles[j], 
 .                     chr = availableChr[k], out = "GRanges", cellNames = cellGroupi))
 .                 }
 .             }
 .         }
 .         matchRG <- as.vector(S4Vectors::match(mcols(fragik)$RG, 
 .             names(tableGroupi)))
 .         fragik <- rep(fragik, tableGroupi[matchRG])
 .         nCells <- c(nCells, unique(mcols(fragik)$RG))
 .         covk <- coverage(IRanges(start = c(start(fragik), end(fragik)), 
 .             width = 1), width = chromLengths[availableChr[k]])
 .         nFragDump <- nFragDump + length(fragik)
 .         rm(fragik)
 .         chrLengths <- paste0("Coverage/", availableChr[k], "/Lengths")
 .         chrValues <- paste0("Coverage/", availableChr[k], "/Values")
 .         lengthRle <- length(covk@lengths)
 .         o <- h5createGroup(covFile, paste0("Coverage/", availableChr[k]))
 .         o <- .suppressAll(h5createDataset(covFile, chrLengths, 
 .             storage.mode = "integer", dims = c(lengthRle, 1), 
 .             level = 0))
 .         o <- .suppressAll(h5createDataset(covFile, chrValues, 
 .             storage.mode = "integer", dims = c(lengthRle, 1), 
 .             level = 0))
 .         o <- h5write(obj = covk@lengths, file = covFile, name = chrLengths)
 .         o <- h5write(obj = covk@values, file = covFile, name = chrValues)
 .         gc()
 .     }
 .     if (length(unique(cellGroupi)) != length(unique(nCells))) {
 .         .logMessage(paste0("Not all cells (", length(unique(cellGroupi)), 
 .             ") were found for coverage creation (", length(unique(nCells)), 
 .             ")!"), logFile = logFile)
 .         stop("Not all cells (", length(unique(cellGroupi)), ") were found for coverage creation (", 
 .             length(unique(nCells)), ")!")
 .     }
 .     out <- list(covFile = covFile, nCells = length(cellGroupi), 
 .         nFragments = nFragDump)
 .     return(out)
 . }, cellGroups = new("SimpleCharacterList", elementType = "character", 
 .     elementMetadata = NULL, metadata = list(), listData = list(....... cellIDs....),
covDir = "/home/cruiz/10x_scATACseq/analysis/heme_project/archr/Save-ProjHeme2/GroupCoverages/Clusters", 
 .     threads = 20L, verbose = TRUE, tstart = structure(1588423926.40649, class = c("POSIXct", 
 .     "POSIXt")), logFile = "ArchRLogs/ArchR-addGroupCoverages-da1d623db3da-Date-2020-05-02_Time-14-52-06.log", 
 .     subThreads = 1)
5. stop(errorMsg)

This happens using either 10, 20 or 48 cores (I thought it could be a problem with the number of threads). This is the message I get after changing the cluster names in my ArchR project. If I do not change the idents of the clusters, the error is different (and is the same one I get with the tutorial data):

Warning message in mclapply(..., mc.cores = threads, mc.preschedule = preschedule):
"21 function calls resulted in an error"
Error in .safelapply(seq_along(availableChr), function(x) {: 
Error Found Iteration 1 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 2 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 3 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 4 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 5 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 6 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 7 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 8 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 9 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 10 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 11 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 12 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 13 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 14 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 15 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 16 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"
	<simpleError in validObject(.Object): invalid class "DNAString" object: superclass "vector_OR_Vector" not defined in the environment of the object's class>
Error Found Iteration 17 : 
	[1] "Error in validObject(.Object) : \n  invalid class \"DNAString\" object: superclass \"vector_OR_Vector\" not defined in the environment of the object's class\n"

Traceback:

1. addGroupCoverages(ArchRProj = projHeme2, groupBy = "Clusters")
2. .addKmerBiasToCoverage(coverageMetadata = coverageMetadata, genome = getGenome(ArchRProj), 
 .     kmerLength = kmerLength, threads = threads, verbose = FALSE, 
 .     logFile = logFile)
3. .safelapply(seq_along(availableChr), function(x) {
 .     .logMessage(sprintf("Kmer Bias %s (%s of %s)", availableChr[x], 
 .         x, length(availableChr)), logFile = logFile)
 .     message(availableChr[x], " ", appendLF = FALSE)
 .     chrBS <- BSgenome[[availableChr[x]]]
 .     exp <- Biostrings::oligonucleotideFrequency(chrBS, width = kmerLength)
 .     obsList <- lapply(seq_along(coverageFiles), function(y) {
 .         .logMessage(sprintf("Coverage File %s (%s of %s)", availableChr[x], 
 .             y, length(coverageFiles)), logFile = logFile)
 .         tryCatch({
 .             obsx <- .getCoverageInsertionSites(coverageFiles[y], 
 .                 availableChr[x]) %>% {
 .                 BSgenome::Views(chrBS, IRanges(start = . - floor(kmerLength/2), 
 .                   width = kmerLength))
 .             } %>% {
 .                 Biostrings::oligonucleotideFrequency(., width = kmerLength, 
 .                   simplify.as = "collapsed")
 .             }
 .             gc()
 .             obsx
 .         }, error = function(e) {
 .             errorList <- list(y = y, coverageFile = coverageFiles[y], 
 .                 chr = availableChr[x], iS = tryCatch({
 .                   .getCoverageInsertionSites(coverageFiles[y], 
 .                     availableChr[x])
 .                 }, error = function(e) {
 .                   "Error .getCoverageInsertionSites"
 .                 }))
 .             .logError(e, fn = ".addKmerBiasToCoverage", info = "", 
 .                 errorList = errorList, logFile = logFile)
 .         })
 .     }) %>% SimpleList
 .     names(obsList) <- names(coverageFiles)
 .     SimpleList(expected = exp, observed = obsList)
 . }, threads = threads) %>% SimpleList
4. eval(lhs, parent, parent)
5. eval(lhs, parent, parent)
6. .safelapply(seq_along(availableChr), function(x) {
 .     .logMessage(sprintf("Kmer Bias %s (%s of %s)", availableChr[x], 
 .         x, length(availableChr)), logFile = logFile)
 .     message(availableChr[x], " ", appendLF = FALSE)
 .     chrBS <- BSgenome[[availableChr[x]]]
 .     exp <- Biostrings::oligonucleotideFrequency(chrBS, width = kmerLength)
 .     obsList <- lapply(seq_along(coverageFiles), function(y) {
 .         .logMessage(sprintf("Coverage File %s (%s of %s)", availableChr[x], 
 .             y, length(coverageFiles)), logFile = logFile)
 .         tryCatch({
 .             obsx <- .getCoverageInsertionSites(coverageFiles[y], 
 .                 availableChr[x]) %>% {
 .                 BSgenome::Views(chrBS, IRanges(start = . - floor(kmerLength/2), 
 .                   width = kmerLength))
 .             } %>% {
 .                 Biostrings::oligonucleotideFrequency(., width = kmerLength, 
 .                   simplify.as = "collapsed")
 .             }
 .             gc()
 .             obsx
 .         }, error = function(e) {
 .             errorList <- list(y = y, coverageFile = coverageFiles[y], 
 .                 chr = availableChr[x], iS = tryCatch({
 .                   .getCoverageInsertionSites(coverageFiles[y], 
 .                     availableChr[x])
 .                 }, error = function(e) {
 .                   "Error .getCoverageInsertionSites"
 .                 }))
 .             .logError(e, fn = ".addKmerBiasToCoverage", info = "", 
 .                 errorList = errorList, logFile = logFile)
 .         })
 .     }) %>% SimpleList
 .     names(obsList) <- names(coverageFiles)
 .     SimpleList(expected = exp, observed = obsList)
 . }, threads = threads)
7. stop(errorMsg)

What do you think might be the problem? Thanks in advance for your help!

ArchR-addGroupCoverages-d819747731bc-Date-2020-05-02_Time-14-43-35.log
ArchR-addGroupCoverages-d93d104d33a2-Date-2020-05-02_Time-14-47-21.log
ArchR-addGroupCoverages-da1d623db3da-Date-2020-05-02_Time-14-52-06.log
ArchR_tutotial_addGroupCoverages-ec51247cb782-Date-2020-05-02_Time-18-25-47.log
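On the first error (H5Fcreate, seen after renaming clusters): in the traceback, covFile is built as file.path(covDir, paste0(names(cellGroups)[i], ".insertions.coverage.h5")), so group names end up in file names. One thing worth ruling out is a cluster name containing a character that is awkward in a path. The labels below are hypothetical, since the renamed clusters are not shown in the report, and the sanitizing rule is just one possible choice.

```r
# Hypothetical cluster labels after renaming
group_names <- c("Cluster1", "B cells", "CD4/CD8 T", "Mono:classical")

# Flag names containing characters that commonly break file paths
unsafe <- grepl("[/\\\\:*?\"<>|]", group_names)

# One possible sanitizing rule: keep letters, digits, dot, underscore, hyphen
safe_names <- gsub("[^A-Za-z0-9._-]", "_", group_names)

# File names built the same way as covFile in the traceback
cov_files <- file.path("GroupCoverages",
                       paste0(safe_names, ".insertions.coverage.h5"))
```

A name such as "CD4/CD8 T" would otherwise point into a subdirectory that does not exist, which is one way a file-creation call can fail for only a few of the groups.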

14.1: Myeloid Trajectory - error with addTrajectory()

I'm trying to apply Myeloid Trajectory with my own data, and am running into an error with addTrajectory(), and I've tried it with different combinations of clusters for the trajectory.

trajectory <- c(paste0("Cluster", c(6, 7, 8, 9, 10))) # I changed these to experiment, same error
trajectory
# [1] "Cluster6"  "Cluster7"  "Cluster8"  "Cluster9"  "Cluster10"

#First we need to create a Trajectory and add it to ArchRProj cellColData
projAll <- addTrajectory(ArchRProj = projAll, name = "MyeloidU", trajectory = trajectory, embedding = "UMAP", force = TRUE)

I get the following error and traceback:

Error in smooth.spline(x = initialTime, y = matFilter[names(initialTime),  : 
  'tol' must be strictly positive and finite
> traceback()
8: stop("'tol' must be strictly positive and finite")
7: smooth.spline(x = initialTime, y = matFilter[names(initialTime), 
       x], df = dof, spar = spar)
6: FUN(X[[i]], ...)
5: lapply(seq_len(ncol(matFilter)), function(x) {
       smooth.spline(x = initialTime, y = matFilter[names(initialTime), 
           x], df = dof, spar = spar)[[2]]
   })
4: eval(lhs, parent, parent)
3: eval(lhs, parent, parent)
2: lapply(seq_len(ncol(matFilter)), function(x) {
       smooth.spline(x = initialTime, y = matFilter[names(initialTime), 
           x], df = dof, spar = spar)[[2]]
   }) %>% Reduce("cbind", .) %>% data.frame()
1: addTrajectory(ArchRProj = projAll, name = "MyeloidU", trajectory = trajectory, 
       embedding = "UMAP", force = TRUE)

So first things first, I clarified that the clustering for the project has occurred:

table(projAll$Clusters)

  C1   C2   C3   C4   C5   C6   C7   C8   C9 
  66 3689 3382   15  147 4326 2506 2680 3682 

Another issue is that we do not yet have scRNA data to integrate into our project, so lines of code like p2 <- plotEmbedding(ArchRProj = projAll, colorBy = "cellColData", name = "Clusters2", embedding = "UMAP"), a few lines before, do not work yet.

From scouring online, the smooth.spline error can happen when the datasets are too small. I have a feeling they could even be of size 0 here, but I don't know exactly how to check. Any idea for how to solve this problem? Thank you!

Hugo
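One way to check for size 0: the table(projAll$Clusters) output above shows labels C1 to C9, while the trajectory vector uses "Cluster6" to "Cluster10". A base R sketch of the comparison, with the labels copied from the output in this report; `candidate` is only a suggestion of what matching names would look like, not a recommended trajectory order.

```r
# Labels as shown by table(projAll$Clusters) in the report
cluster_labels <- paste0("C", 1:9)

# Trajectory vector as written in the report
trajectory <- paste0("Cluster", c(6, 7, 8, 9, 10))

# Trajectory entries that do not exist among the cluster labels
missing_labels <- setdiff(trajectory, cluster_labels)

# Number of cluster labels the trajectory would actually select
n_matching <- sum(cluster_labels %in% trajectory)

# Names in the same style as the existing labels (illustrative only)
candidate <- paste0("C", 6:9)
```

With these values every trajectory entry is missing and nothing matches, which would be consistent with the guess that the data reaching smooth.spline is empty.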

Heatmap for marker peaks and marker genes doesn't render

Code :

markersPeaks <- markerFeatures(ArchRProj = proj, useMatrix = "PeakMatrix", groupBy = "Clusters")

#Visualize Markers as a heatmap
heatmapPeaks <- markerHeatmap(
  seMarker = markersPeaks,
  cutOff = "FDR <= 0.1 & Log2FC >= 1"
)
plotPDF(heatmapPeaks, name = "Peak-Marker-Heatmap", width = 8, height = 12, ArchRProj = proj, addDOC = FALSE)

Error :


2020-04-22 15:54:46 : ERROR Found in .ArchRHeatmap for
LogFile = ArchRLogs/ArchR-plotMarkerHeatmap-b7c95bbc52b7-Date-2020-04-22_Time-15-54-42.log

<simpleError in .logError(e, fn = ".ArchRHeatmap", info = "", errorList = errorList, logFile = logFile): object 'e' not found>


Error in .logError(e, fn = ".ArchRHeatmap", info = "", errorList = errorList, : Exiting See Error Above
Traceback:

  1. markerHeatmap(seMarker = markersPeaks, cutOff = "FDR <= 0.1 & Log2FC >= 1")
  2. plotMarkerHeatmap(...)
  3. tryCatch({
    . .ArchRHeatmap(mat = mat, scale = FALSE, limits = c(min(mat),
    . max(mat)), color = pal, clusterCols = clusterCols, clusterRows = clusterRows,
    . labelRows = labelRows, labelCols = TRUE, customRowLabel = mn,
    . showColDendrogram = TRUE, draw = FALSE, name = paste0("Row Z-Scores\n",
    . nrow(mat), " features\n", metadata(seMarker)$Params$useMatrix))
    . .logError(e, fn = ".ArchRHeatmap", info = "", errorList = errorList,
    . logFile = logFile)
    . }, error = function(e) {
    . errorList <- list(mat = mat, scale = FALSE, limits = c(min(mat),
    . max(mat)), color = pal, clusterCols = clusterCols, clusterRows = clusterRows,
    . labelRows = labelRows, labelCols = TRUE, customRowLabel = mn,
    . showColDendrogram = TRUE, draw = FALSE, name = paste0("Row Z-Scores\n",
    . nrow(mat), " features\n", metadata(seMarker)$Params$useMatrix))
    . .logError(e, fn = ".ArchRHeatmap", info = "", errorList = errorList,
    . logFile = logFile)
    . })
  4. tryCatchList(expr, classes, parentenv, handlers)
  5. tryCatchOne(expr, names, parentenv, handlers[[1L]])
  6. value[3L]
  7. .logError(e, fn = ".ArchRHeatmap", info = "", errorList = errorList,
    . logFile = logFile)
  8. stop("Exiting See Error Above")

param documentation for `pal`

The documentation for pal is all over the place. I'd like to create a single, universal description that fits this parameter. My understanding:

pal is a named character vector that contains a defined set of colors represented as hexadecimal codes. The name of each vector component is a number that indicates the order in which that color set should be parsed to create an optimally-distinct set of colors for a discrete palette based on the provided number of colors.

Is it true that the value of pal should always be provided by calling paletteContinuous() or paletteDiscrete() unless the user decides to provide a properly formatted color palette? Is pal checked robustly for poor user inputs? Do we even want to endorse user-defined palettes?
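To make the proposed description concrete, here is a minimal sketch of what a check for user-supplied palettes could look like. `isValidPal` is a hypothetical helper, not an existing ArchR function, and the three colors are arbitrary examples in the format described above.

```r
# Hypothetical validator (not part of ArchR): checks that `pal` is a named
# character vector of hexadecimal color codes (#RRGGBB or #RRGGBBAA).
isValidPal <- function(pal) {
  is.character(pal) &&
    length(pal) > 0 &&
    !is.null(names(pal)) &&
    !anyNA(names(pal)) &&
    all(nzchar(names(pal))) &&
    all(grepl("^#[0-9A-Fa-f]{6}([0-9A-Fa-f]{2})?$", pal))
}

# Example palette in the described format: the names give the parsing order
examplePal <- c("1" = "#D51F26", "2" = "#272E6A", "3" = "#208A42")

isValidPal(examplePal)               # well-formed
isValidPal(c("#D51F26", "#272E6A"))  # unnamed, so rejected
isValidPal(c("1" = "red"))           # not a hex code, so rejected
```

A stricter version could also require the names to be the integers 1..n, if that ordering is what the palette functions rely on.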

Pseudo-bulk replicates

Just a reminder to double check that the number of cells annotated per replicate is the number of unique cells and does not include cells sampled with replacement.
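The distinction matters because sampling with replacement repeats cells. A toy illustration in base R, with made-up barcodes rather than real project data:

```r
set.seed(1)

# Toy barcodes; a real replicate would use the project's cell names
cells <- paste0("cell_", 1:40)

# Sampling with replacement draws more entries than there are distinct cells
sampled <- sample(cells, size = 100, replace = TRUE)

nSampled <- length(sampled)         # number of draws
nUnique  <- length(unique(sampled)) # number of distinct cells actually used

c(sampled = nSampled, unique = nUnique)
```

Here the annotated count should be `nUnique`, not `nSampled`.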

addDoubletScores(): "Correlation of UMAP Projection is below 0.9"

Hi, I am following the bookdown walkthrough with my own scATAC files, and I'm running into an issue with addDoubletScores().

When I call the method using the default parameters:
doubScores <- addDoubletScores(
  input = ArrowFiles,
  k = 10, # Refers to how many cells near a "pseudo-doublet" to count.
  knnMethod = "UMAP", # Refers to embedding to use for nearest neighbor search with doublet projection.
  LSIMethod = 1
)

I get UMAP Projection R^2 = 0.58245, 0.78375, and 0.67597 for my three files. These are well below 0.9, which suggests that the files have little heterogeneity and that the doublet calling is inaccurate. Should I:

  1. Fiddle with the parameters to try to improve the R^2 values?
  2. Continue as if it worked correctly?
  3. Check my pipeline to see if something happened incorrectly with the files before ArchR?
  4. Or something else?

Thank you!

addArchRThreads suggested warning

It might be worth throwing a warning from addArchRThreads if people request more than an advisable number of cores. For whatever reason, on the MacBook I'm using, if I ask for 8 cores, ArchR seems to hang during arrow file creation, but if I ask for 4 cores it goes quickly.
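A minimal sketch of such a check, using parallel::detectCores() from base R. `checkThreads` is a hypothetical helper, not ArchR code, and the "half of the detected cores" threshold is an arbitrary assumption chosen only for illustration.

```r
library(parallel)

# Hypothetical check (not part of ArchR): warn when the requested thread
# count exceeds half of the detected cores. The 50% threshold is an assumption.
checkThreads <- function(threads, available = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; skip the check in that case
  if (is.na(available)) return(invisible(threads))
  limit <- max(1, floor(available / 2))
  if (threads > limit) {
    warning(sprintf(
      "Requested %d threads with %d cores detected; consider %d or fewer.",
      threads, available, limit
    ))
  }
  invisible(threads)
}

# Capture the outcome instead of printing the warning
outcome <- function(threads, available) {
  tryCatch(
    {
      checkThreads(threads, available = available)
      "ok"
    },
    warning = function(w) "warned"
  )
}

outcome(8, 8)  # 8 requested on an 8-core machine
outcome(4, 8)  # 4 requested on an 8-core machine
```

Passing `available` explicitly keeps the example reproducible; in practice the default would be used.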

Error in doubScores

Hi,

I downloaded ArchR today and tried to run it on some of 6 scATAC-seq (Cellranger output) files. While the creation of the arrow files did not pose any problems, the next step of detecting doublets ran into an error. This error was so severe that it ended up with me being possibly fork bombed (not sure about this, but I kept getting bash fork retry: no child processes and had to ask the sys admin to delete my processes).

I have attached my log files. I should probably mention that I installed ArchR in an anaconda environment.

If you have any pointers, it would be much appreciated.
ArchR-addDoubletScores-6def61bf62cb-Date-2020-05-01_Time-14-29-10.log

How to do differential (integrative) analysis between case and control?

Hi,

Great package, but I think one important aspect of the analysis is missing.
I would love to have the following feature in your package.
Most of the time, people have single-cell data from case and control groups and are interested in learning about disease-specific perturbations in a cell type-specific manner.
For example, linking differential peaks (between disease and control) to differential gene expression, or
looking for differential TF activity in a specific cell type between case and control.

Having these features would broaden the use cases of your package.

Minor: Repeated warnings from addIterativeLSI() and plotEmbedding()

Attach your log file
ArchR has a built-in logging functionality for all complex functions. You MUST attach your log file (indicated in the console output) to this issue. Just drag and drop it here.
ArchR-addIterativeLSI-7a7238e4f7af-Date-2020-05-01_Time-18-12-25.log

Describe the bug
After running addIterativeLSI() or plotEmbedding(), I'm getting repeated warning messages printed to the console.

To Reproduce
This also occurs for me with the demo datasets. I'm attaching a notebook here with those results:
01_explore_archr_demodata.pdf

Expected behavior
Everything except the warnings :)

Screenshots
Here's the console output generated by addIterativeLSI():

Checking Inputs...
ArchR logging to : ArchRLogs/ArchR-addIterativeLSI-7a7238e4f7af-Date-2020-05-01_Time-18-12-25.log
If there is an issue, please report to github with logFile!
2020-05-01 18:12:25 : Computing Total Accessibility Across All Features, 0 mins elapsed.
2020-05-01 18:12:30 : Computing Top Features, 0.077 mins elapsed.
###########
2020-05-01 18:12:30 : Running LSI (1 of 2) on Top Features, 0.085 mins elapsed.
###########
2020-05-01 18:12:30 : Sampling Cells (N = 10000) for Estimated LSI, 0.086 mins elapsed.
2020-05-01 18:12:30 : Creating Sampled Partial Matrix, 0.086 mins elapsed.
2020-05-01 18:12:48 : Computing Estimated LSI (projectAll = FALSE), 0.392 mins elapsed.
2020-05-01 18:13:58 : Identifying Clusters, 1.55 mins elapsed.
2020-05-01 18:14:27 : Identified 5 Clusters, 2.027 mins elapsed.
2020-05-01 18:14:27 : Saving LSI Iteration, 2.027 mins elapsed.
Warning message:
“Use of `dfMean$color` is discouraged. Use `color` instead.”
[the warning above is printed 64 times in a row]
2020-05-01 18:14:42 : Creating Cluster Matrix on the total Group Features, 2.279 mins elapsed.
2020-05-01 18:14:48 : Computing Variable Features, 2.392 mins elapsed.
###########
2020-05-01 18:14:49 : Running LSI (2 of 2) on Variable Features, 2.394 mins elapsed.
###########
2020-05-01 18:14:49 : Creating Partial Matrix, 2.395 mins elapsed.
2020-05-01 18:15:06 : Computing LSI, 2.688 mins elapsed.
2020-05-01 18:16:05 : Finished Running IterativeLSI, 3.661 mins elapsed.

Outputs from plotEmbedding:

ArchR logging to : ArchRLogs/ArchR-plotEmbedding-7a723e0f661a-Date-2020-05-01_Time-18-26-36.log
If there is an issue, please report to github with logFile!
Getting UMAP Embedding
ColorBy = cellColData
Plotting Embedding
1 
ArchR logging successful to : ArchRLogs/ArchR-plotEmbedding-7a723e0f661a-Date-2020-05-01_Time-18-26-36.log
Warning message:
“Use of `dfMean$color` is discouraged. Use `color` instead.”
[the warning above is printed 16 times in a row]

Session Info

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS:   /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] gridExtra_2.3                     ggrastr_0.1.7                    
 [3] uwot_0.1.8                        Seurat_3.1.5                     
 [5] BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome_1.54.0                  
 [7] rtracklayer_1.46.0                Biostrings_2.54.0                
 [9] XVector_0.26.0                    ArchR_0.9.2                      
[11] magrittr_1.5                      rhdf5_2.30.1                     
[13] Matrix_1.2-17                     data.table_1.12.8                
[15] SummarizedExperiment_1.16.1       DelayedArray_0.12.3              
[17] BiocParallel_1.20.1               matrixStats_0.56.0               
[19] Biobase_2.46.0                    GenomicRanges_1.38.0             
[21] GenomeInfoDb_1.22.1               IRanges_2.20.2                   
[23] S4Vectors_0.24.4                  BiocGenerics_0.32.0              
[25] ggplot2_3.3.0                    

loaded via a namespace (and not attached):
 [1] Rtsne_0.15               colorspace_1.4-1         ellipsis_0.3.0          
 [4] ggridges_0.5.2           IRdisplay_0.7.0          base64enc_0.1-3         
 [7] farver_2.0.3             leiden_0.3.3             listenv_0.8.0           
[10] npsurv_0.4-0             ggrepel_0.8.2            RSpectra_0.16-0         
[13] codetools_0.2-16         splines_3.6.1            lsei_1.2-0              
[16] IRkernel_1.0.2           jsonlite_1.6.1           Cairo_1.5-12            
[19] Rsamtools_2.2.3          ica_1.0-2                cluster_2.1.0           
[22] png_0.1-7                sctransform_0.2.1        compiler_3.6.1          
[25] httr_1.4.1               assertthat_0.2.1         lazyeval_0.2.2          
[28] htmltools_0.4.0          tools_3.6.1              rsvd_1.0.3              
[31] igraph_1.2.5             gtable_0.3.0             glue_1.4.0              
[34] GenomeInfoDbData_1.2.2   RANN_2.6.1               reshape2_1.4.4          
[37] dplyr_0.8.5              rappdirs_0.3.1           Rcpp_1.0.4.6            
[40] vctrs_0.2.4              gdata_2.18.0             ape_5.3                 
[43] nlme_3.1-140             lmtest_0.9-37            stringr_1.4.0           
[46] globals_0.12.5           lifecycle_0.2.0          irlba_2.3.3             
[49] gtools_3.8.2             XML_3.99-0.3             future_1.17.0           
[52] zlibbioc_1.32.0          MASS_7.3-51.4            zoo_1.8-7               
[55] scales_1.1.0             RColorBrewer_1.1-2       reticulate_1.15         
[58] pbapply_1.4-2            stringi_1.4.6            caTools_1.18.0          
[61] repr_1.0.1               rlang_0.4.5              pkgconfig_2.0.3         
[64] bitops_1.0-6             evaluate_0.14            lattice_0.20-38         
[67] ROCR_1.0-7               purrr_0.3.4              Rhdf5lib_1.8.0          
[70] labeling_0.3             GenomicAlignments_1.22.1 patchwork_1.0.0         
[73] htmlwidgets_1.5.1        cowplot_1.0.0            tidyselect_1.0.0        
[76] RcppAnnoy_0.0.16         plyr_1.8.6               R6_2.4.1                
[79] gplots_3.0.3             pbdZMQ_0.3-3             pillar_1.4.3            
[82] withr_2.2.0              fitdistrplus_1.0-14      survival_3.1-12         
[85] RCurl_1.98-1.2           tibble_3.0.1             future.apply_1.5.0      
[88] tsne_0.1-3               crayon_1.3.4             uuid_0.1-4              
[91] KernSmooth_2.23-15       plotly_4.9.2             digest_0.6.25           
[94] tidyr_1.0.2              munsell_0.5.0            viridisLite_0.3.0   

Additional context
Add any other context about the problem here.
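The message has the form of the ggplot2 warning about using `data$column` inside aes(), so the lasting fix is presumably on the package side. Until then, one possible stopgap is to muffle only warnings that match this text while letting others through. The sketch below uses base R condition handling; `muffleWarningPattern` is a hypothetical wrapper, and `noisy()` is a stand-in for the ArchR call that triggers the warnings.

```r
# Hypothetical wrapper (not part of ArchR): evaluates `expr` and silences only
# the warnings whose message matches `pattern`.
muffleWarningPattern <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for a call that emits the repeated warning
noisy <- function() {
  for (i in 1:3) {
    warning("Use of `dfMean$color` is discouraged. Use `color` instead.")
  }
  "done"
}

# The matching warnings are dropped and the result is returned unchanged
result <- muffleWarningPattern(noisy(), "is discouraged")
result
```

Unlike suppressWarnings(), this keeps unrelated warnings visible.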

documentation for ArrowFiles

For ArrowFiles, is there ever a situation where a path needs to be provided to the arrow file? Or is it only ever the name of the file, with the files always found in the outDir of the ArchRProject? The documentation is not very explicit about whether ArrowFiles are strictly file names or sometimes relative file paths.

chromosome prefix issue

The hidden utility .availableChr requires chromosome names to carry the prefix "chr" in all GRanges objects, which breaks the pipeline for non-model genomes. Is this requirement really necessary?

If so, a work-around would be much appreciated.
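One stopgap on the user side might be to rename the sequences so that every name carries the prefix. The sketch below shows only the renaming step in base R; `addChrPrefix` is a hypothetical helper, the scaffold names are made up, and it is an untested assumption that renaming alone is sufficient. The same names would have to be used consistently in the fragment files and in the genome and gene annotations.

```r
# Hypothetical helper (not part of ArchR): prepend "chr" to any sequence name
# that does not already start with it.
addChrPrefix <- function(seqnames) {
  seqnames <- as.character(seqnames)
  ifelse(grepl("^chr", seqnames), seqnames, paste0("chr", seqnames))
}

# Made-up sequence names from a non-model genome
scaffolds <- c("1", "2", "scaffold_12", "chrX")

renamed <- addChrPrefix(scaffolds)
renamed
```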

advice / doc request on subsetting projects

After subsetting a project, which analysis steps should I redo? Clearly I need to redo addIterativeLSI and addUMAP, which I am keeping track of by creating new names for each subsetted project. But should I redo the steps below? I was redoing everything, but I'm a bit confused because there is no "name" argument for addPeakMatrix, so I think addPeakMatrix is overwriting the previous PeakMatrix in my project.

proj <- addImputeWeights(proj, reducedDims=dr)
proj <- addGroupCoverages(proj, groupBy=clustering)
proj <- addReproduciblePeakSet(proj, groupBy=clustering)
proj <- addPeakMatrix(proj)
proj <- addMotifAnnotations(proj,force=TRUE)
proj <- addDeviationsMatrix(proj)

If this is an issue with documentation that is absent/missing:

Describe what material you feel should be explained
Currently, subsetting is discussed very briefly as an example in section 3.2, and it would be great if this section could be expanded with more details.

Where do you think this documentation would belong?
Subsetting is an important use case, so I feel like this could have its own subsection (maybe 3.7).

There should not be a default genome

This is going to cause problems. ArchR should error out until you add the genome you want. Otherwise, people are just going to import their data and it's going to look like crap unless they aligned to hg19 (which nobody should be doing anymore).

geneAnno vs geneAnnotation

Currently, createArrowFiles() uses a parameter called geneAnno and the ArchRProject() constructor uses a parameter called geneAnnotation.
Is that intentional? If not, maybe harmonize those.
