rsvd's People

Contributors

benli11, erichson, ltla, odow

rsvd's Issues

NULL output from rsvd when nu=0 or nv=0

Currently, rsvd sets the output $u or $v to NULL when nu=0 or nv=0, respectively.

Is this really necessary? It complicates downstream processing for client packages, which now have to check whether $u is NULL before attempting to perform an operation on it.

You might say that one shouldn't be doing anything on $u if nu=0, and that would be mostly correct - I'm not using that output for anything meaningful. But I do have to do some sanity checks, coercions and renamings, and it's a bother to have to check for NULL every time.

It seems easier for both of us to simply require that nu=0 result in an empty matrix with no columns. You would no longer need the special code that sets the output to NULL, and the output would be consistent with the documentation in ?rsvd, with no surprising behaviours; I, in turn, could be sure of getting a matrix with zero SVs when I ask for one, without worrying about a sudden NULL.
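
Until such a change is made, client code can normalise the result itself. The following is a minimal sketch that assumes only what the issue states (that $u or $v comes back as NULL). The helper name fill_empty_factors is hypothetical and not part of rsvd, and the input list is a hand-built stand-in for an rsvd result:

```r
# Hypothetical helper (not part of rsvd): replace NULL factors with
# zero-column matrices so downstream code can treat them uniformly.
fill_empty_factors <- function(out, n_rows, n_cols) {
  if (is.null(out$u)) out$u <- matrix(0, nrow = n_rows, ncol = 0)
  if (is.null(out$v)) out$v <- matrix(0, nrow = n_cols, ncol = 0)
  out
}

# Hand-built stand-in for a result obtained with nu = 0 on a 5 x 3 input.
res <- list(d = c(2, 1), u = NULL, v = diag(3)[, 1:2])
res <- fill_empty_factors(res, n_rows = 5, n_cols = 3)
dim(res$u)
```

Sanity checks, coercions and renamings can then run on res$u without a separate NULL branch.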

Tests are not picked by testthat

Something appears to be missing: the tests folder is there, but testthat does not find it.

--->  Testing R-rsvd
* using log directory ‘/opt/local/var/macports/build/_opt_PPCRosettaPorts_R_R-rsvd/R-rsvd/work/rsvd/rsvd.Rcheck’
* using R version 4.2.3 (2023-03-15)
* using platform: powerpc-apple-darwin10.8.0 (32-bit)
* using session charset: UTF-8
* checking for file ‘rsvd/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘rsvd’ version ‘1.0.5’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘rsvd’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking LazyData ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking examples ... OK
* checking PDF version of manual ... OK
* DONE

Status: OK

rpca falls over with k=1 when retx=TRUE

Hi

Running rpca with k=1 and retx=TRUE gets the following error:

# set up
library('rsvd')
data('iris')
log_iris     = log(iris[, 1:4])
iris_species = iris[, 5]
# Perform rPCA and compute only the first PC
iris_rpca    = rpca(log_iris, k=1, retx=TRUE)
# Error in array(STATS, dims[perm]) : 'dims' cannot be of length 0

I think you've forgotten a drop=FALSE in line ~64 in rpca.default, so it should be:

rpcaObj$x <- sweep(out$u[, 1:k, drop=FALSE], MARGIN = 2, STATS = out$d[1:k], FUN = "*", check.margin = TRUE)

(this now works for me)

Cheers!
Will
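
The mechanism behind this error can be reproduced in base R without rsvd. In the sketch below, u and d are made-up stand-ins for out$u and out$d; it shows that selecting a single column without drop=FALSE yields a plain vector, which sweep() cannot handle:

```r
set.seed(1)
u <- matrix(rnorm(12), nrow = 4)  # stand-in for out$u
d <- c(3, 2, 1)                   # stand-in for out$d
k <- 1

# Without drop = FALSE, selecting one column yields a plain vector (no dims).
is.null(dim(u[, 1:k]))

# sweep() needs dims, so the vector input fails.
err <- tryCatch(
  sweep(u[, 1:k], MARGIN = 2, STATS = d[1:k], FUN = "*"),
  error = function(e) e
)

# With drop = FALSE the 4 x 1 matrix shape is kept and sweep() succeeds.
scores <- sweep(u[, 1:k, drop = FALSE], MARGIN = 2, STATS = d[1:k], FUN = "*")
dim(scores)
```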

Error in ggbiplot when retx = FALSE in rpca

When I set retx = FALSE when running rpca, I get an error that is difficult to interpret, e.g.,

> data(iris)
> X   <- log(iris[1:4])
> out <- rpca(X,scale = FALSE,retx = FALSE)
> print(ggbiplot(out))
Error in `colnames<-`(`*tmp*`, value = c("a", "b")) :
  'names' attribute [2] must be the same length as the vector [0]

Perhaps you could report a more informative error in ggbiplot in this case?

Thanks,
Peter
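
A more informative failure could come from an early check on the fitted object. The sketch below is not the package's code: require_scores is a hypothetical helper, and it assumes that the scores live in the element x and that this element is absent when retx = FALSE. The input is a hand-built stand-in for such a fit:

```r
# Hypothetical guard (not part of rsvd): fail early with a readable message
# when the fitted object carries no scores.
require_scores <- function(fit) {
  if (is.null(fit$x)) {
    stop("no scores found in 'fit$x'; refit with retx = TRUE before plotting",
         call. = FALSE)
  }
  invisible(fit)
}

# Hand-built stand-in for a fit produced with retx = FALSE.
fit_without_scores <- list(sdev = c(2, 1), rotation = diag(2))
msg <- tryCatch(require_scores(fit_without_scores),
                error = function(e) conditionMessage(e))
msg
```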

Wrong pca$sdev for the first pca component (after calling rsvd::rpca)

Hi! Thank you for the nice package.

I'm running

pca <- rsvd::rpca(data, 10) 
sdev <- apply(pca$x, 2, sd)

and then comparing sdev with pca$sdev. They are basically the same except for the first principal component. Here is the output for my sample data:

> sdev
 [1] 3.718906 3.480278 2.946602 2.451098 1.920354 1.913088 1.768413 1.758458 1.722685 1.650044
> pca$sdev
 [1] 12.668942  3.483281  2.964610  2.452116  1.935481  1.919646  1.771491  1.763022  1.724073  1.650757
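
One possible explanation, not confirmed in this thread, is a first component whose scores have a non-zero mean, for example when the data entering the decomposition are not centred. The base-R sketch below uses made-up data and assumes that the reported sdev is derived from the singular values as d / sqrt(n - 1). It shows that this quantity matches sd() of the scores only when the scores have mean zero:

```r
set.seed(1)
n <- 200
# Deliberately uncentred toy data: every column has mean 5.
X <- matrix(rnorm(n * 4, mean = 5), nrow = n)

s <- svd(X)
scores <- s$u %*% diag(s$d)

sdev_from_d <- s$d / sqrt(n - 1)                  # ignores the score means
rms_scores  <- sqrt(colSums(scores^2) / (n - 1))  # same quantity, from scores
sd_scores   <- apply(scores, 2, sd)               # subtracts the score means

round(rbind(sdev_from_d, rms_scores, sd_scores), 3)
round(colMeans(scores), 3)
```

Checking colMeans(pca$x) on the real data would indicate whether this applies: a first column with a mean far from zero would be consistent with the discrepancy above.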

Can you "project" new samples onto the L and S Matrices

Hi there,

Thanks for this package and your implementation of the randomized robust PCA method in the function rrpca.

Once you have applied rrpca to some training data to learn the L and S matrices, is there a way to project new data onto these matrices to extract the corresponding L and S values for these new data? The rationale is to prevent the "re-learning" of the L and S matrix for each new data sample if I were to take the approach of just including the new sample in the input matrix to rrpca.

In other words, I am looking for something equivalent to projecting a new sample onto an existing PCA space in a standard PCA analysis. For instance:

training_data <- USArrests[1:48, ]
new_data <- USArrests[49:50, ]
pr_out <- prcomp(training_data, scale = TRUE)
scale(new_data, pr_out$center, pr_out$scale) %*% pr_out$rotation
                PC1        PC2        PC3        PC4
Wisconsin 2.1059185 -0.6184669 -0.1558858  0.1897872
Wyoming   0.6759575  0.3035009 -0.2465021 -0.1636568

With these principal component (PC) values, I could then do something like subtracting them from the new data to clean up the input signal (assuming that the PC values represent noise in the system).

Is there an equivalent for rrpca, something like this:

rrpca_out <- rsvd::rrpca(training_data)
predict(rrpca_out, new_data)

Thanks!
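
The source does not show a predict method for rrpca. One workaround, sketched here under stated assumptions, is to take the row space of the learned low-rank matrix L and project new rows onto it by ordinary least squares. This is a plain orthogonal projection, not a robust decomposition, so a gross outlier in a new row also leaks into the projected part. L_train is a simulated stand-in for the L component, the rank r is assumed known, and project_low_rank is a hypothetical helper:

```r
set.seed(42)
# Simulated stand-in for the low-rank component L learned on training data.
basis   <- matrix(rnorm(2 * 4), nrow = 2)  # 2 latent factors, 4 variables
L_train <- matrix(rnorm(48 * 2), ncol = 2) %*% basis

r <- 2                                     # assumed rank of L
V <- svd(L_train, nu = 0, nv = r)$v        # orthonormal basis of L's row space

# Hypothetical helper: split new rows into a part inside the learned
# row space (L) and the remainder (S).
project_low_rank <- function(new_data, V) {
  new_data <- as.matrix(new_data)
  L_new <- new_data %*% V %*% t(V)
  list(L = L_new, S = new_data - L_new)
}

new_data <- matrix(rnorm(2 * 2), ncol = 2) %*% basis
new_data[1, 3] <- new_data[1, 3] + 10      # corrupt one entry of row 1
res <- project_low_rank(new_data, V)
round(res$S, 3)
```

The uncorrupted second row is reproduced by the low-rank part, so its remainder is numerically zero, while the corrupted first row leaves a non-zero remainder.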

What is the expected amount of error?

Hi,

first of all, thanks for the cool package. I was playing around with randomized SVD to perform dimension reduction on sparse single-cell RNA-seq data, but was surprised that the results differed from irlba's and from the svd function.

To illustrate the issue, I generated a random matrix and compared the results with and without compression:

set.seed(1)
mat <- matrix(rnorm(100 * 500), nrow =100, ncol = 500)
v1 <- rsvd::rsvd(mat, k = 3, p = 30)$v
v2 <- rsvd::rsvd(mat, nu = 3, nv = 3)$v
plot(v1, v2); abline(0,1); abline(0, -1)

Created on 2023-12-13 with reprex v2.0.2

My questions are: Are the differences between the results expected? Is there a good adaptive way to set p to ensure that the results are within a reasonable bound of the true result?

Best,
Constantin
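
Part of the scatter in such a plot can come from the comparison itself: singular vectors are defined only up to sign, and they are well determined only when the corresponding singular values are well separated, which a pure-noise matrix does not provide. A sign-invariant alternative is to compare the subspaces they span. The base-R sketch below uses a hypothetical helper, subspace_cosines, and for illustration compares an exact svd() result with a sign-flipped copy of itself:

```r
# Cosines of the principal angles between the column spaces of two
# matrices with orthonormal columns; values near 1 mean the subspaces agree.
subspace_cosines <- function(v1, v2) {
  svd(crossprod(v1, v2), nu = 0, nv = 0)$d
}

set.seed(1)
mat <- matrix(rnorm(100 * 500), nrow = 100, ncol = 500)
v_exact <- svd(mat, nu = 0, nv = 3)$v

# Flipping the sign of a column changes the scatter plot but not the subspace.
v_flipped <- v_exact
v_flipped[, 2] <- -v_flipped[, 2]

subspace_cosines(v_exact, v_flipped)
```

Passing the $v matrices from the two rsvd calls instead of v_exact and v_flipped would show how closely the approximated subspaces agree, independent of sign and ordering.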

Support non-default matrix types?

I was wondering whether it would be possible to support non-default matrix types in rsvd? For example, anything from Matrix, or some of our custom matrix classes in Bioconductor packages. Some testing suggests that this would only require minor modifications to the existing code, namely:

  • Removal of the as.matrix(A) line near the top of rsvd.R.
  • Addition of importFrom(Matrix,crossprod) to the NAMESPACE.

And then stuff like this automatically works without trying to expand the matrix into a dense array:

library(Matrix)
library(rsvd)
out <- rsvd(rsparsematrix(10000, 10000, 0.01), k=10)

In our case, we're dealing with fairly huge matrices (>100 GB in RAM) that are held on file. We have %*% and crossprod defined; the only things preventing us from using rsvd() are the two points above.

I'm happy to put in a PR on this matter if you're open to it.
