selbouhaddani / omicspls

R package for High dimensional data analysis and integration with O2PLS!

Home Page: https://doi.org/10.1186/s12859-018-2371-3

R 100.00%

bioinformatics biostatistics omics multi-omics data-integration principal-component-analysis pca partial-least-squares-regression pls latent-variable-models

omicspls's Issues

Difference in the o2m objects produced by full and stripped versions of o2m

The objects returned by the stripped versions use T_Yosc.:

OmicsPLS/R/OmicsPLS_o2m.R

Line 598 in 69086e5

T_Yosc. = T_Yosc, U_Xosc. = U_Xosc, W_Yosc = W_Yosc, C_Xosc = C_Xosc,

OmicsPLS/R/OmicsPLS_o2m.R

Line 727 in 69086e5

T_Yosc. = T_Yosc, U_Xosc. = U_Xosc, W_Yosc = W_Yosc, C_Xosc = C_Xosc,

while non-stripped versions use T_Yosc:

OmicsPLS/R/OmicsPLS_o2m.R

Line 202 in 69086e5

 model <- list(Tt = Tt, W. = W, U = U, C. = C, E = E, Ff = Ff, T_Yosc = T_Yosc, P_Yosc. = P_Yosc, W_Yosc = W_Yosc, 

OmicsPLS/R/OmicsPLS_o2m.R

Line 477 in 69086e5

 model <- list(Tt = Tt, W. = W, U = U, C. = C, E = 0, Ff = 0, T_Yosc = T_Yosc, P_Yosc. = P_Yosc, W_Yosc = W_Yosc, 

Is it intentional or just a typo? The documentation mentions the latter only.
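The practical effect of the trailing dot can be illustrated with plain lists. The sketch below uses toy lists rather than real o2m fits (an assumption made so that it runs without the package) and shows that `$` partially matches names while `[[` does not, so code that works on one kind of fit may silently return NULL on the other.

```r
# Two toy lists mimicking the naming used by the stripped and full fits.
stripped_fit <- list(T_Yosc. = matrix(1, 2, 1))  # trailing dot, as on line 598
full_fit     <- list(T_Yosc  = matrix(2, 2, 1))  # no dot, as on line 202

# `$` partially matches names on lists, so the dotless name still finds "T_Yosc.".
via_dollar <- stripped_fit$T_Yosc

# `[[` matches exactly by default, so the same lookup returns NULL.
via_brackets <- stripped_fit[["T_Yosc"]]

# Asking the full fit for the dotted name returns NULL as well.
dotted_on_full <- full_fit$T_Yosc.

print(is.null(via_dollar))      # FALSE
print(is.null(via_brackets))    # TRUE
print(is.null(dotted_on_full))  # TRUE
```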

Add VIP function for OmicsPLS

Have you considered adding a VIP (variable influence on projection) function to OmicsPLS?
Thanks!
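As a starting point for discussion, here is a minimal sketch of the classical PLS-style VIP formula applied to a weight matrix. The helper `vip_from_weights` and its arguments are hypothetical and not part of OmicsPLS; which weights and which per-component sums of squares are appropriate for an O2PLS fit is an open choice that this sketch does not settle.

```r
# Hypothetical helper (not part of OmicsPLS): classical PLS-style VIP.
# W  : p x A matrix of weights (one column per component)
# ss : length-A vector, sum of squares explained by each component
vip_from_weights <- function(W, ss) {
  stopifnot(is.matrix(W), length(ss) == ncol(W), all(ss >= 0), sum(ss) > 0)
  p <- nrow(W)
  # Normalise each weight column to unit length, then square.
  W2 <- sweep(W^2, 2, colSums(W^2), "/")
  # Weighted average of squared weights across components, scaled by p.
  sqrt(p * as.vector(W2 %*% ss) / sum(ss))
}

set.seed(1)
W  <- matrix(rnorm(20 * 2), nrow = 20, ncol = 2)  # toy weights
ss <- c(3, 1)                                      # toy explained sums of squares
vip <- vip_from_weights(W, ss)

# By construction the squared VIPs average to 1.
print(mean(vip^2))
```

Because the squared VIPs average to 1, values above 1 are conventionally read as "more influential than average".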

crossval_o2m_adjR2 shows MSE of "NA"

Hi,

I'm trying to run OmicsPLS on an RNA-Seq and methylation-array dataset. When I run crossval_o2m_adjR2, I get an MSE of "NA". These results do not look valid, especially since no value of n is reported as the minimum. Do you have any insight into what is going on?

Thanks!
Jen

Command/output:
crossval_o2m_adjR2(methyl.shared.trans, rna.shared.trans, 1:3, 0:3, 0:3, nr_folds = 2, nr_cores = 4)
minimum is at n =
Elapsed time: 570.87 sec
  MSE n nx ny
1  NA 1  0  3
2  NA 2  0  2
3  NA 3  0  3

Residuals are not populated when using NIPALS

E and Ff are set to zero in o2m2 and are not changed afterwards.

OmicsPLS/R/OmicsPLS_o2m.R

Line 477 in d32510a

 model <- list(Tt = Tt, W. = W, U = U, C. = C, E = 0, Ff = 0, T_Yosc = T_Yosc, P_Yosc. = P_Yosc, W_Yosc = W_Yosc, 

Out of curiosity, is 't' in Tt or 'f' in Ff meaningful or are those just added due to potentially conflicting symbols in R or something?
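Until the residuals are populated, they can in principle be recomputed from the other elements of the fit. The sketch below assumes the usual O2PLS decomposition of X into a joint part, a Y-orthogonal part and a residual, and uses a toy list with the element names shown above instead of a real fit; `x_residuals` is a hypothetical helper. The Y residuals Ff would follow the same pattern with the Y-side scores and loadings.

```r
# Hypothetical helper: recompute the X residuals from a fitted-model list.
# Element names follow the list shown in the issue (Tt, W., T_Yosc, P_Yosc.).
x_residuals <- function(X, fit) {
  X - fit$Tt %*% t(fit$W.) - fit$T_Yosc %*% t(fit$P_Yosc.)
}

set.seed(2)
# Toy stand-in for a fit: 10 samples, 5 variables, 2 joint and 1 orthogonal component.
fit <- list(
  Tt      = matrix(rnorm(10 * 2), 10, 2),
  W.      = matrix(rnorm(5 * 2), 5, 2),
  T_Yosc  = matrix(rnorm(10 * 1), 10, 1),
  P_Yosc. = matrix(rnorm(5 * 1), 5, 1)
)
noise <- matrix(rnorm(10 * 5, sd = 0.1), 10, 5)
X <- fit$Tt %*% t(fit$W.) + fit$T_Yosc %*% t(fit$P_Yosc.) + noise

# The recomputed residuals recover the noise that was added to the toy data.
E <- x_residuals(X, fit)
print(all.equal(E, noise))
```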

Printing the result of o2m2 results in an error

Code:

r = o2m2(as.matrix(X), as.matrix(Y), 2, 0, 0) # works fine
print(r) # error

Error:

Error in if (x$flags$stripped) cat("O2PLS fit: Stripped \n") else if (x$flags$highd) cat("O2PLS fit: High dimensional \n") else cat("O2PLS fit \n") :
argument is of length zero
Calls: -> -> withVisible -> print -> print.o2m

But using:

r = o2m(as.matrix(X), as.matrix(Y), 2, 0, 0, p_thresh=0)
print(r)

works fine.
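The error message is what R produces when the condition of an `if` is NULL, which would happen if the object returned by o2m2 had no `flags` element (an assumption based on the message, not on reading the code). The sketch below reproduces this with a toy list and shows a defensive alternative using `isTRUE()`.

```r
# A toy object without a `flags` element, standing in for the o2m2 result.
r <- list(Tt = matrix(0, 2, 1))

# `r$flags$stripped` is NULL, and `if (NULL)` is an error in R.
msg <- tryCatch(
  { if (r$flags$stripped) "stripped" else "full" },
  error = function(e) conditionMessage(e)
)
print(msg)

# Defensive variant: isTRUE() treats a missing flag as FALSE.
label <- if (isTRUE(r$flags$stripped)) "stripped" else "full"
print(label)
```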

How to set the parameters of cross-validation

Dear Author,
I am new to machine learning.
When using cross-validation, I don't know how to set its parameters. The PDF accompanying the article gives crossval_o2m_adjR2(rna, metab, 1:3, c(0,1,5,10), c(0,1,5,10), nr_folds = 2, nr_cores = 4), but I don't know how the values 1:3, c(0,1,5,10) and c(0,1,5,10) should be chosen. For example, I have two matrices with 6 rows and 3000 columns. Should I select columns at random, or is there some requirement?
Here's a partial screenshot of my data

Looking forward to your reply, thanks!
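To make the role of the three vectors concrete: in the quoted call they are the candidate numbers of joint, X-specific and Y-specific components. The sketch below lists the combinations that a full grid search over those candidates would consider, then filters them with a rule of thumb based on the number of training samples. That rule is an assumption for illustration, not a check taken from the package, and it treats rows as samples.

```r
# Candidate values, copied from the call quoted in the question.
a  <- 1:3               # joint components to try
ax <- c(0, 1, 5, 10)    # X-specific (orthogonal) components to try
ay <- c(0, 1, 5, 10)    # Y-specific (orthogonal) components to try

# Every combination that a full grid search would consider.
grid <- expand.grid(n = a, nx = ax, ny = ay)
print(nrow(grid))

# Assumed rule of thumb (not taken from the package): keep only settings where
# n + max(nx, ny) is below the number of training samples in each fold.
n_samples <- 6
nr_folds  <- 2
n_train   <- n_samples - n_samples / nr_folds
feasible  <- grid[grid$n + pmax(grid$nx, grid$ny) < n_train, ]
print(feasible)
```

With only 6 samples and 2 folds, very few settings survive such a filter, which suggests using small candidate values rather than subsampling columns.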

scores does not work for orthogonal scores

OmicsPLS/R/OmicsPLS.R

Line 783 in c2e3348

 which_scores = switch(which_part, Xjoint = "Tt", Yjoint = "U", Xorth = "T_Yosc.", Yorth = "U_Xosc.") 

To me, it seems that the dots are the issue here (fit does not have attributes T_Yosc. and U_Xosc., but has T_Yosc and U_Xosc)
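One way to make such a lookup robust to both spellings is to try each name in turn with exact matching. The accessor below is a hypothetical sketch operating on toy lists, not the package's own scores function.

```r
# Hypothetical accessor (not the package's scores()): tolerate both spellings.
get_orth_scores <- function(fit, which_part = c("Xorth", "Yorth")) {
  which_part <- match.arg(which_part)
  base_name <- switch(which_part, Xorth = "T_Yosc", Yorth = "U_Xosc")
  # Try the dotless name first, then the dotted one; `[[` matches exactly.
  for (nm in c(base_name, paste0(base_name, "."))) {
    if (!is.null(fit[[nm]])) return(fit[[nm]])
  }
  stop("no orthogonal scores found for ", which_part)
}

# Toy stand-ins for a full and a stripped fit.
full_fit     <- list(T_Yosc  = matrix(1, 3, 1), U_Xosc  = matrix(2, 3, 1))
stripped_fit <- list(T_Yosc. = matrix(1, 3, 1), U_Xosc. = matrix(2, 3, 1))

# Both kinds of fit now yield the same scores.
same_x <- identical(get_orth_scores(full_fit, "Xorth"),
                    get_orth_scores(stripped_fit, "Xorth"))
print(same_x)
```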

Store sparsity cross-validation results for each component to make diagnostic plots

Hi,

First, thanks for this tool! I have been using sO2PLS regularly for integrating omics datasets. I have been working on some visualisations to show the results of a sO2PLS analysis. In particular, I wanted to show the results of the sparsity cross-validation step, by plotting, for each joint component, the covariance mean and SD for the different values of keepx and keepy tested. However these are not currently returned by the function (I am using OmicsPLS version 2.0.2): the mean_covTU and srr_covTU matrices are overwritten by the next component, so that what is returned at the end is only the matrix of covariance mean and SD for the last component.
I think this can easily be fixed by adding in the crossval_sparsity function the following code:

Add on line 299 in Crossval_OmicsPLS.R (just before the if (method == "SO2PLS") { line):

mean_covTU_list <- list()
srr_covTU_list <- list()

Then on line 348 and line 447 (both times before the 1-standard-error rule code):

mean_covTU_list[[comp]] <- mean_covTU
srr_covTU_list[[comp]] <- srr_covTU

And then the return on line 491 would be:

return( list(Best = unlist(bestsp), Covs = mean_covTU_list, SEcov = srr_covTU_list))

This allows me to make plots like this one (it might need to be improved, but that's the idea):

Do you think it would be possible (and useful) to add this to the function?
Thanks!

An example for O2PLS

Hi, selbouhaddani,
Could you please provide an example to implement the O2PLS?
For example, what is the difference between the two cross-validation functions, i.e., crossval_o2m_adjR2() and crossval_o2m()?
How to obtain the resulting common and distinctive matrices for each block?
When I open OmicsPLS_vignette.Rmd in the Firefox browser, the math formulas are not rendered.

Thanks.

The input checks seem too strict

I have a couple of cases where I wished to run o2m but could not because the input checks failed: data with NaN is not accepted; it is impossible to perform O2PLS-DA (there is a strict "less than" check of the number of components against the number of columns in the data; granted, this is less common than OPLS-DA); and in cross-validation the sum of requested components is checked against the number of columns, which will of course work for omics but not for many other datasets.

I understand that some limitations may arise from the implementation details (e.g. use of SVD for PCA) but, I wonder if it would be possible to relax some of the checks. Do you plan to support the cases I mentioned above in this package?
Or would it be reasonable to provide a "force" argument that ignores the checks and lets the user take the risk of failing miserably (when the algorithm indeed does not support the specific case)?
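A "force" argument of this kind could downgrade a failed check from an error to a warning. The function below is a hypothetical sketch of that pattern for the strict "less than" check on the number of components; it is not the package's own input check.

```r
# Hypothetical check (not the package's own): n must be smaller than ncol(X).
check_components <- function(X, n, force = FALSE) {
  if (n < ncol(X)) return(invisible(TRUE))
  msg <- sprintf("n = %d is not smaller than ncol(X) = %d", n, ncol(X))
  # With force = TRUE the failure is reported but does not stop the run.
  if (force) {
    warning(msg, call. = FALSE)
    return(invisible(FALSE))
  }
  stop(msg, call. = FALSE)
}

# Single-column class label, as in a discriminant-analysis setting.
Y_dummy <- matrix(c(1, 0, 1, 0), ncol = 1)

strict  <- tryCatch(check_components(Y_dummy, 1), error = function(e) "error")
relaxed <- tryCatch(check_components(Y_dummy, 1, force = TRUE),
                    warning = function(w) "warning")
print(c(strict, relaxed))
```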

On the model statistics

R2Xcorr is currently computed as:

OmicsPLS/R/OmicsPLS_o2m.R

Lines 712 to 713 in 69086e5

R2Xcorr <- ssq(Tt) / ssq(X_true)
R2Ycorr <- ssq(U) / ssq(Y_true)

I wonder why it is not R2Xcorr <- ssq(Tt %*% t(W)) / ssq(X_true), as Table 2 of "Evaluation of O2PLS in Omics data integration" would suggest. I understand that something elsewhere in the code may make the two equivalent, but I could not find it in the codebase. I would be very grateful for a hint.
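One possible explanation, offered as an assumption rather than a reading of the code: if the columns of W are orthonormal (as they would be when taken from singular vectors), the two expressions give the same number. The sketch below checks this numerically on toy matrices, with `ssq` defined here as a plain sum of squares.

```r
ssq <- function(A) sum(A^2)  # sum of squares, mirroring the name used in the issue

set.seed(3)
Tt <- matrix(rnorm(10 * 2), 10, 2)           # toy joint scores
W  <- qr.Q(qr(matrix(rnorm(6 * 2), 6, 2)))   # toy loadings with orthonormal columns

# With t(W) %*% W equal to the identity, both sums of squares coincide.
print(all.equal(ssq(Tt), ssq(Tt %*% t(W))))

# With non-orthonormal loadings they generally differ; here by a factor of 4.
W_scaled <- 2 * W
print(ssq(Tt %*% t(W_scaled)) / ssq(Tt))
```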

Also, I wanted to thank you for sharing your work and apologize for opening so many issues on GitHub; I can offer help in fixing the minor typos I found if you wish to accept PRs. To my knowledge, this is not only the only open-source package offering O2PLS, but also a well designed and documented one and I hope that I could contribute to make it more bulletproof and be able to use it again in the future!

Edit: I think that some other statistics may require more attention.

Any special reason for n + max(nx, ny) in o2m2?

I just wanted to let you know that I was able to (roughly) reproduce Figure 12b from the Trygg and Wold (2003) paper using a slightly modified o2m2. It occurred to me that if I replace the number of components in the first pass (A in the paper) with n rather than n + max(nx, ny), the algorithm better matches my reading of the paper, and the recreated figure is more similar to the original. The SVD version works almost as well (up to a sign flip).

Result for n + max(nx, ny):

Result for n:

Relevant code:

OmicsPLS/R/OmicsPLS_o2m.R

Lines 130 to 134 in 913c3e5

 if (nx + ny > 0) {
   # larger principal subspace
   n2 <- n + max(nx, ny)
   cdw <- svd(t(Y) %*% X, nu = n2, nv = n2)

Based on the comment ("larger principal subspace") I understand that there might be a reason for this modification and would be happy to learn more if you could point me to a reference. If you don't have anything at hand, please feel free to close this issue. I wanted to put this up somewhere so that another curious person (or future me) would not need to go through the debugging process again.
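To isolate what the choice of n2 changes, the sketch below runs both SVD calls on toy data (the matrix sizes and component counts are arbitrary assumptions). The leading n singular vectors agree, so any difference in the final fit would have to come from how the extra n2 - n columns are used afterwards.

```r
set.seed(4)
X <- matrix(rnorm(20 * 8), 20, 8)   # toy X block
Y <- matrix(rnorm(20 * 6), 20, 6)   # toy Y block
n <- 2; nx <- 1; ny <- 2

n2 <- n + max(nx, ny)                        # size used in the excerpt
big   <- svd(t(Y) %*% X, nu = n2, nv = n2)   # larger subspace
small <- svd(t(Y) %*% X, nu = n,  nv = n)    # the modification described in the issue

# The leading n singular vectors are the same in both calls ...
print(all.equal(big$v[, 1:n], small$v))
# ... so the two variants differ only through the extra n2 - n columns.
print(dim(big$v))
```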

There is still some noise (which may be due to differences in the cross-validation splits or in the OSC filtering) and the y-axis scales differ (I tried autoscaling; it did not help). I could not find anything that would explain the differences, and it seems that not much more can be deduced from the original publication without access to their code.

Finally, thank you for all the recent improvements!

Best wishes, Michał
