erichson / rsvd

Randomized Matrix Decompositions using R

Home Page: https://arxiv.org/abs/1608.02148

R 100.00%

dimension-reduction svd principal-component-analysis probabilistic-algorithms cran randomized-algorithm singular-value-decomposition matrix-approximation pca

rsvd's Issues

NULL output from rsvd when nu=0 or nv=0

Currently, rsvd sets the output $u or $v to NULL when nu=0 or nv=0, respectively.

Is this really necessary? It complicates downstream processing for client packages, which now have to check whether $u is NULL before attempting to perform an operation on it.

You might say that one shouldn't be doing anything on $u if nu=0, and that would be mostly correct - I'm not using that output for anything meaningful. But I do have to do some sanity checks, coercions and renamings, and it's a bother to have to check for NULL every time.

It seems easier for both of us to simply have nu=0 return an empty matrix with no columns. You would not need the special-case code that sets the output to NULL, and the output would be consistent with the documentation in ?rsvd, with no surprising behaviour. I, in turn, could rely on getting a matrix with zero singular vectors when I ask for one, without worrying about a sudden NULL.
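
Until the behaviour changes, a client package could normalize the output itself. The following is a minimal sketch in base R; `null_to_empty` is a hypothetical helper (not part of rsvd), and the list `res` only mimics the NULL components described above rather than calling rsvd.

```r
# Hypothetical helper (not part of rsvd): replace a NULL factor with a
# zero-column matrix so downstream code can treat both cases uniformly.
null_to_empty <- function(m, n_rows) {
  if (is.null(m)) matrix(numeric(0), nrow = n_rows, ncol = 0L) else m
}

set.seed(1)
A <- matrix(rnorm(20), nrow = 5, ncol = 4)

# Stand-in for a decomposition requested with nu = 0 and nv = 0:
# singular values are present, both factors are NULL.
res <- list(d = svd(A, nu = 0, nv = 0)$d, u = NULL, v = NULL)

u <- null_to_empty(res$u, nrow(A))
v <- null_to_empty(res$v, ncol(A))

dim(u)  # 5 rows, 0 columns
dim(v)  # 4 rows, 0 columns
```

With this in place, sanity checks such as `is.matrix(u)` or `nrow(u) == nrow(A)` no longer need a separate NULL branch.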

Tests are not picked by testthat

Apparently something is missing: the tests folder is there, but testthat does not find it.

--->  Testing R-rsvd
* using log directory ‘/opt/local/var/macports/build/_opt_PPCRosettaPorts_R_R-rsvd/R-rsvd/work/rsvd/rsvd.Rcheck’
* using R version 4.2.3 (2023-03-15)
* using platform: powerpc-apple-darwin10.8.0 (32-bit)
* using session charset: UTF-8
* checking for file ‘rsvd/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘rsvd’ version ‘1.0.5’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘rsvd’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking LazyData ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking examples ... OK
* checking PDF version of manual ... OK
* DONE

Status: OK
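
For context, R CMD check runs the R scripts that sit directly under tests/, and the conventional testthat setup relies on a runner script tests/testthat.R that calls test_check(). The sketch below is a small base-R diagnostic for that layout; `check_test_layout` is a hypothetical helper, and the demo directory is created in a temporary location purely for illustration. Whether a missing runner file is the actual cause here is not established by the log above.

```r
# Hypothetical helper: report which pieces of the conventional testthat
# layout exist in a package source directory.
check_test_layout <- function(pkg_dir) {
  test_dir <- file.path(pkg_dir, "tests", "testthat")
  c(
    tests_dir   = dir.exists(file.path(pkg_dir, "tests")),
    runner_file = file.exists(file.path(pkg_dir, "tests", "testthat.R")),
    test_files  = length(list.files(test_dir, pattern = "^test.*\\.[rR]$")) > 0
  )
}

# Demo package skeleton with test files but no runner script.
pkg <- file.path(tempdir(), "demo_pkg")
dir.create(file.path(pkg, "tests", "testthat"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(pkg, "tests", "testthat", "test-basic.R"))

layout <- check_test_layout(pkg)
print(layout)
```

If `runner_file` is FALSE, or the tests directory is excluded from the built tarball, the check has nothing to run and no test step appears in the log.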

rpca falls over with k=1 when retx=TRUE

Running rpca with k=1 and retx=TRUE gets the following error:

# set up
library('rsvd')
data('iris')
log_iris     = log( iris[ , 1:4] )
iris_species = iris[ , 5]
# Perform rPCA and compute only the first PC
iris_rpca    = rpca(log_iris, k=1, retx=TRUE)
# Error in array(STATS, dims[perm]) : 'dims' cannot be of length 0

I think you've forgotten a drop=FALSE in line ~64 in rpca.default, so it should be:

rpcaObj$x <- sweep(out$u[, 1:k, drop=FALSE], MARGIN = 2, STATS = out$d[1:k], FUN = "*", check.margin = TRUE)

(this now works for me)

Cheers!
Will
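
The effect of drop=FALSE can be reproduced in base R without rsvd. This is a small sketch with made-up values for `u` and `d`; it only illustrates why selecting a single column breaks sweep() and how the suggested fix avoids it.

```r
u <- matrix(1:6, nrow = 3)   # stand-in for out$u
d <- c(2, 5)                 # stand-in for out$d
k <- 1

# Selecting one column without drop = FALSE collapses the matrix to a vector,
# so dim() is NULL and sweep() has no margin to work on.
is.null(dim(u[, 1:k]))

# The unfixed call errors, as in the report above.
failed <- inherits(
  try(sweep(u[, 1:k], MARGIN = 2, STATS = d[1:k], FUN = "*"), silent = TRUE),
  "try-error"
)

# With drop = FALSE the result stays a 3 x 1 matrix and sweep() works.
x <- sweep(u[, 1:k, drop = FALSE], MARGIN = 2, STATS = d[1:k], FUN = "*")
dim(x)
```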

Error in ggbiplot when retx = FALSE in rpca

When I set retx = FALSE when running rpca, I get an error that is difficult to interpret, e.g.,

> data(iris)
> X   <- log(iris[1:4])
> out <- rpca(X,scale = FALSE,retx = FALSE)
> print(ggbiplot(out))
Error in `colnames<-`(`*tmp*`, value = c("a", "b")) :
  'names' attribute [2] must be the same length as the vector [0]

Perhaps you could report a more informative error in ggbiplot in this case?

Thanks,
Peter
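
One way such a message could look is sketched below. `require_scores` is a hypothetical guard, not an existing rsvd function, and it assumes that the plot needs the scores stored in the `x` component, which is only present when retx = TRUE. The demo uses plain lists in place of real rpca output.

```r
# Hypothetical guard (not part of rsvd): fail early with a readable message
# when the scores needed for plotting are missing.
require_scores <- function(obj) {
  if (is.null(obj$x)) {
    stop("No scores found in component 'x'; rerun rpca() with retx = TRUE.",
         call. = FALSE)
  }
  invisible(obj)
}

# Plain lists standing in for rpca output with and without scores.
with_scores    <- list(x = matrix(0, nrow = 3, ncol = 2))
without_scores <- list(x = NULL)

require_scores(with_scores)   # passes silently
msg <- tryCatch(require_scores(without_scores),
                error = function(e) conditionMessage(e))
msg
```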

Wrong pca$sdev for the first pca component (after calling rsvd::rpca)

Hi! Thank you for the nice package.

I'm running

pca <- rsvd::rpca(data, 10) 
sdev <- apply(pca$x, 2, sd)

and then comparing sdev with pca$sdev. It's basically the same except the first principal component. Here is the output for my sample data:

> sdev
 [1] 3.718906 3.480278 2.946602 2.451098 1.920354 1.913088 1.768413 1.758458 1.722685 1.650044
> pca$sdev
 [1] 12.668942  3.483281  2.964610  2.452116  1.935481  1.919646  1.771491  1.763022  1.724073  1.650757
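
A pattern like this can arise when the scores of the first component do not have zero mean: a standard deviation derived from singular values, d / sqrt(n - 1), matches sd() of the scores only when the decomposed matrix was column-centered. The base-R sketch below illustrates that relationship on synthetic data. It is a diagnostic idea under that assumption, not a confirmed explanation of what rpca does with the reporter's data.

```r
set.seed(42)
X <- matrix(rnorm(200 * 5, mean = 3), nrow = 200)  # columns with nonzero mean

# Case 1: SVD of the uncentered matrix.
s <- svd(X)
scores <- s$u %*% diag(s$d)
sdev_from_d      <- s$d / sqrt(nrow(X) - 1)
sdev_from_scores <- apply(scores, 2, sd)
# The first entries disagree strongly; the remaining ones are close.
round(rbind(sdev_from_d, sdev_from_scores), 3)

# Case 2: SVD of the column-centered matrix.
Xc <- scale(X, center = TRUE, scale = FALSE)
sc <- svd(Xc)
scores_c <- sc$u %*% diag(sc$d)
centered_match <- all.equal(sc$d / sqrt(nrow(Xc) - 1), apply(scores_c, 2, sd))
centered_match
```

A quick check on real output along the same lines is to look at colMeans(pca$x): a first column far from zero would point in this direction.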

Can you "project" new samples onto the L and S Matrices

Hi there,

Thanks for this package and your implementation of the randomized robust PCA method in the function rrpca.

Once you have applied rrpca to some training data to learn the L and S matrices, is there a way to project new data onto these matrices to extract the corresponding L and S values for these new data? The rationale is to prevent the "re-learning" of the L and S matrix for each new data sample if I were to take the approach of just including the new sample in the input matrix to rrpca.

In other words, I am looking for the equivalent of projecting a new sample onto an existing PCA space in a standard PCA analysis. For instance:

training_data <- USArrests[1:48, ]
new_data <- USArrests[49:50, ]
pr_out <- prcomp(training_data, scale = TRUE)
scale(new_data, pr_out$center, pr_out$scale) %*% pr_out$rotation

                PC1        PC2        PC3        PC4
Wisconsin 2.1059185 -0.6184669 -0.1558858  0.1897872
Wyoming   0.6759575  0.3035009 -0.2465021 -0.1636568

With these principal component (PC) values, I could then do something like subtracting them from the new data to clean up the input signal (assuming that the PC values represent noise in the system):

Is there an equivalent for rrpca, something like this:

rrpca_out <- rsvd::rrpca(training_data)
predict(rrpca_out, new_data)

Thanks!
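
Absent such a method, one possible workaround is an ordinary orthogonal projection of the new rows onto the row space of the learned low-rank matrix. The sketch below uses base R only. `L` is a synthetic stand-in for the low-rank component, the rank `r` is assumed known, and rows are assumed to be samples. This is a plain least-squares projection, not a robust decomposition of the new samples, so the residual is only a rough analogue of S.

```r
set.seed(1)

# Synthetic rank-2 stand-in for the low-rank component L learned on
# training data (assumption: 48 samples in rows, 4 features in columns).
L <- matrix(rnorm(48 * 2), 48, 2) %*% matrix(rnorm(2 * 4), 2, 4)
r <- 2                                 # assumed rank of L

# Orthonormal basis of the row space of L (right singular vectors).
V <- svd(L, nu = 0, nv = r)$v          # 4 x r

# Two new samples with the same 4 features.
new_data <- matrix(rnorm(2 * 4), 2, 4)

L_new <- new_data %*% V %*% t(V)       # part lying in the learned subspace
S_new <- new_data - L_new              # residual, orthogonal to the subspace
```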

What is the expected amount of error?

Hi,

first of all, thanks for the cool package. I was playing around with randomized SVD to perform dimension reduction on sparse single-cell RNA-seq data, but was surprised that the results differed from irlba's and from the svd function.

To illustrate the issue, I generated a random matrix and compared the results with and without compression:

set.seed(1)
mat <- matrix(rnorm(100 * 500), nrow =100, ncol = 500)
v1 <- rsvd::rsvd(mat, k = 3, p = 30)$v
v2 <- rsvd::rsvd(mat, nu = 3, nv = 3)$v
plot(v1, v2); abline(0,1); abline(0, -1)

Created on 2023-12-13 with reprex v2.0.2

My questions are: Are the differences between the results expected? Is there a good adaptive way to set p to ensure that the results are within a reasonable bound of the true result?

Best,
Constantin
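
Some background that may help frame the question: the accuracy of randomized SVD generally depends on how quickly the singular values decay, and a Gaussian random matrix has a nearly flat spectrum, so its leading singular vectors are poorly determined. The sketch below compares a flat and a rapidly decaying spectrum using a minimal randomized SVD written in base R. `rand_svd` and `alignment` are hypothetical helpers illustrating the general technique (random projection plus power iterations); they are not rsvd's implementation, and the parameter names `p` and `q` are assumptions of this sketch.

```r
# Minimal randomized SVD: random range finder with q power iterations.
rand_svd <- function(A, k, p = 10, q = 2) {
  l <- k + p                                   # sketch size with oversampling
  Omega <- matrix(rnorm(ncol(A) * l), ncol(A), l)
  Y <- A %*% Omega
  for (i in seq_len(q)) {                      # power iterations, re-orthonormalized
    Y <- qr.Q(qr(Y))
    Z <- qr.Q(qr(crossprod(A, Y)))
    Y <- A %*% Z
  }
  Q <- qr.Q(qr(Y))                             # orthonormal basis of the sketch
  s <- svd(crossprod(Q, A), nu = k, nv = k)    # SVD of the small matrix
  list(d = s$d[1:k], u = Q %*% s$u, v = s$v)
}

# |cosine| between exact and approximate right singular vectors (1 = identical).
alignment <- function(A, k, ...) {
  exact  <- svd(A, nu = k, nv = k)
  approx <- rand_svd(A, k, ...)
  abs(colSums(exact$v * approx$v))
}

set.seed(1)
flat <- matrix(rnorm(100 * 500), nrow = 100, ncol = 500)

U <- qr.Q(qr(matrix(rnorm(100 * 100), 100)))
V <- qr.Q(qr(matrix(rnorm(500 * 100), 500)))
decay <- U %*% diag(10 * 0.5^(0:99)) %*% t(V)  # singular values halve each step

align_flat  <- alignment(flat,  k = 3)
align_decay <- alignment(decay, k = 3)
round(rbind(align_flat, align_decay), 4)
```

For data without a clear spectral gap, comparing reconstruction error or the recovered singular values is usually more informative than comparing individual singular vectors.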

Support non-default matrix types?

I was wondering whether it would be possible to support non-default matrix types in rsvd? For example, anything from Matrix, or some of our custom matrix classes in Bioconductor packages. Some testing suggests that this would only require minor modifications to the existing code, namely:

Removal of the as.matrix(A) line near the top of rsvd.R.
Addition of importFrom(Matrix,crossprod) to the NAMESPACE.

And then stuff like this automatically works without trying to expand the matrix into a dense array:

library(Matrix)
library(rsvd)
out <- rsvd(rsparsematrix(10000, 10000, 0.01), k=10)

In our case, we're dealing with fairly huge matrices (>100 GB in RAM) that are held on file. We have %*% and crossprod defined; the only things preventing us from using rsvd() are the two points above.

I'm happy to put in a PR on this matter if you're open to it.
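
The reason this can work is that the core steps of a randomized decomposition only touch the input through matrix products. The sketch below, assuming the Matrix package is installed, runs the sketching and projection steps on a sparse matrix without ever converting it to a dense array. Only the thin intermediate results are densified. It illustrates the general idea and is not the code in rsvd.R.

```r
library(Matrix)

set.seed(1)
A <- rsparsematrix(2000, 1000, density = 0.01)   # stays sparse throughout
l <- 15                                          # sketch size (assumed)

# Random test matrix and sketch: only the thin 2000 x 15 result is dense.
Omega <- matrix(rnorm(ncol(A) * l), ncol(A), l)
Y <- as.matrix(A %*% Omega)
Q <- qr.Q(qr(Y))

# Projection B = t(Q) %*% A, computed via crossprod on the sparse input.
B <- t(as.matrix(crossprod(A, Q)))               # 15 x 1000

s <- svd(B, nu = 10, nv = 10)
u <- Q %*% s$u                                   # approximate left singular vectors
```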
