omicspls's People

Contributors

selbouhaddani, wenlitang, zhujiegu


omicspls's Issues

Difference in the o2m objects produced by full and stripped versions of o2m

The objects returned by the stripped versions use T_Yosc.:

T_Yosc. = T_Yosc, U_Xosc. = U_Xosc, W_Yosc = W_Yosc, C_Xosc = C_Xosc,

T_Yosc. = T_Yosc, U_Xosc. = U_Xosc, W_Yosc = W_Yosc, C_Xosc = C_Xosc,

while non-stripped versions use T_Yosc:

model <- list(Tt = Tt, W. = W, U = U, C. = C, E = E, Ff = Ff, T_Yosc = T_Yosc, P_Yosc. = P_Yosc, W_Yosc = W_Yosc,

model <- list(Tt = Tt, W. = W, U = U, C. = C, E = 0, Ff = 0, T_Yosc = T_Yosc, P_Yosc. = P_Yosc, W_Yosc = W_Yosc,

Is it intentional or just a typo? The documentation mentions the latter only.
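
Until the naming is settled, downstream code can be made tolerant of both spellings. The following is a minimal sketch in base R; get_field is a hypothetical helper (not part of OmicsPLS), and the two toy lists only mimic the element names quoted above.

```r
# Hypothetical helper (not part of OmicsPLS): fetch an element by name,
# accepting the spelling with or without a trailing dot.
get_field <- function(fit, name) {
  base <- sub("\\.$", "", name)              # drop a trailing dot, if any
  candidates <- c(base, paste0(base, "."))
  hit <- candidates[candidates %in% names(fit)]
  if (length(hit) == 0) stop("no element named ", base, " or ", base, ".")
  fit[[hit[1]]]
}

# Toy lists mimicking the two naming schemes
full     <- list(T_Yosc  = matrix(1, 2, 1))
stripped <- list(T_Yosc. = matrix(1, 2, 1))

same <- identical(get_field(full, "T_Yosc"), get_field(stripped, "T_Yosc"))
print(same)

# Note: `$` partially matches list names, so stripped$T_Yosc still finds
# "T_Yosc.", whereas full$T_Yosc. returns NULL.
print(is.null(full$T_Yosc.))
```

Using `[[` inside the helper avoids relying on the partial matching that `$` performs on lists.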

crossval_o2m_adjR2 shows MSE of "NA"

Hi,

I'm trying to run OmicsPLS on an RNA-Seq and methylation array dataset. When I run crossval_o2m_adjR2, I get an MSE of "NA" for every row. These results do not look valid, especially since no value of n is reported as the minimum. Do you have any insight into what is going on?

Thanks!
Jen

Command/output:
crossval_o2m_adjR2(methyl.shared.trans, rna.shared.trans, 1:3, 0:3, 0:3, nr_folds = 2, nr_cores = 4)
minimum is at n =
Elapsed time: 570.87 sec
  MSE n nx ny
1  NA 1  0  3
2  NA 2  0  2
3  NA 3  0  3

Residuals are not populated when using NIPALS

E and Ff are set to zero in o2m2 and are not changed afterwards.

model <- list(Tt = Tt, W. = W, U = U, C. = C, E = 0, Ff = 0, T_Yosc = T_Yosc, P_Yosc. = P_Yosc, W_Yosc = W_Yosc,

Out of curiosity, is the 't' in Tt or the 'f' in Ff meaningful, or were they just added to avoid conflicts with existing symbols in R or something similar?
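
As a workaround, the X-residuals can be recomputed from the returned components. This is a minimal sketch assuming the usual O2PLS decomposition X = Tt W' + T_Yosc P_Yosc' + E and the element names shown in the list above; x_residuals is a hypothetical helper, and X must be preprocessed exactly as it was when passed to the fit.

```r
# Hypothetical helper (not part of OmicsPLS): rebuild E from the joint and
# Y-orthogonal parts, assuming X = Tt W' + T_Yosc P_Yosc' + E.
x_residuals <- function(X, fit) {
  joint    <- fit[["Tt"]] %*% t(fit[["W."]])
  specific <- fit[["T_Yosc"]] %*% t(fit[["P_Yosc."]])
  X - joint - specific
}

# Toy fit with known residuals, mimicking a result where E was left at 0
set.seed(42)
fit <- list(
  Tt      = matrix(rnorm(5), 5, 1),
  W.      = matrix(rnorm(4), 4, 1),
  T_Yosc  = matrix(rnorm(5), 5, 1),
  P_Yosc. = matrix(rnorm(4), 4, 1),
  E       = 0
)
E_true <- matrix(rnorm(20, sd = 0.1), 5, 4)
X <- fit[["Tt"]] %*% t(fit[["W."]]) +
  fit[["T_Yosc"]] %*% t(fit[["P_Yosc."]]) + E_true

E_hat <- x_residuals(X, fit)
print(all.equal(E_hat, E_true))
```

The Y-residuals Ff would follow the same pattern with U, C., U_Xosc and P_Xosc., again under the assumption that the fit stores them under those names.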

Printing the result of o2m2 results in an error

Code:

r = o2m2(as.matrix(X), as.matrix(Y), 2, 0, 0) # works fine
print(r) # error

Error:

Error in if (x$flags$stripped) cat("O2PLS fit: Stripped \n") else if (x$flags$highd) cat("O2PLS fit: High dimensional \n") else cat("O2PLS fit \n") :
argument is of length zero
Calls: -> -> withVisible -> print -> print.o2m

But using:

r = o2m(as.matrix(X), as.matrix(Y), 2, 0, 0, p_thresh=0)
print(r)

works fine.
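
The message "argument is of length zero" is what R reports when the condition of an if is NULL, which suggests (as an assumption, not verified against the package source) that the o2m2 result has no flags element. A small base R sketch of the failure mode and a defensive alternative:

```r
# A toy object without a $flags element, standing in for the o2m2 result
r <- list(Tt = matrix(0, 2, 1))

# r$flags is NULL, so r$flags$stripped is NULL and if (NULL) is an error
msg <- tryCatch(
  if (r$flags$stripped) "stripped" else "full",
  error = function(e) conditionMessage(e)
)
print(msg)

# A defensive variant: isTRUE() maps NULL (and NA) to FALSE
label <- if (isTRUE(r$flags$stripped)) "stripped" else "full"
print(label)
```

Guarding the condition with isTRUE() is one option; making every fitting function populate flags is another.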

How to set the parameters of cross-validation

Dear Author,
I am new to machine learning and unsure how to set the cross-validation parameters. The example in the PDF accompanying the article is crossval_o2m_adjR2(rna, metab, 1:3, c(0,1,5,10), c(0,1,5,10), nr_folds = 2, nr_cores = 4), but I don't know how the values 1:3, c(0,1,5,10) and c(0,1,5,10) should be determined. For example, I have two matrices with 6 rows and 3000 columns. Should I select columns at random, or is there a requirement?
Here are partial screenshots of my data (images "daixie" and "gene", not reproduced here).
Looking forward to your reply, thanks!
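
For orientation, the three vectors in the call can be read as candidate numbers of joint, X-specific and Y-specific components (named n, nx and ny here, an assumption based on the column names the function prints). The sketch below only enumerates such a grid in base R and does not call OmicsPLS; the feasibility rule is an illustrative assumption, not the package's actual check.

```r
n_samples <- 6  # rows shared by both matrices in the question

# All combinations of the candidate values from the example call
grid <- expand.grid(n = 1:3, nx = c(0, 1, 5, 10), ny = c(0, 1, 5, 10))
print(nrow(grid))  # 3 * 4 * 4 candidate settings

# Illustrative assumption: with few samples, keep only settings whose
# component count on either side stays below the number of samples
feasible <- subset(grid, n + pmax(nx, ny) < n_samples)
print(nrow(feasible))
```

With only 6 samples most of the grid from the example call is out of reach under this rule, which hints that smaller candidate values are the sensible starting point for such data.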

Store sparsity cross-validation results for each component to make diagnostic plots

Hi,

First, thanks for this tool! I have been using sO2PLS regularly for integrating omics datasets. I have been working on some visualisations to show the results of a sO2PLS analysis. In particular, I wanted to show the results of the sparsity cross-validation step, by plotting, for each joint component, the covariance mean and SD for the different values of keepx and keepy tested. However these are not currently returned by the function (I am using OmicsPLS version 2.0.2): the mean_covTU and srr_covTU matrices are overwritten by the next component, so that what is returned at the end is only the matrix of covariance mean and SD for the last component.
I think this can easily be fixed by adding in the crossval_sparsity function the following code:

Add on line 299 in Crossval_OmicsPLS.R (just before the if (method == "SO2PLS") { line):

mean_covTU_list <- list()
srr_covTU_list <- list()

Then on line 348 and line 447 (both times before the 1-standard error rule code):

mean_covTU_list[[comp]] <- mean_covTU
srr_covTU_list[[comp]] <- srr_covTU

And then the return on line 491 would be:

return( list(Best = unlist(bestsp), Covs = mean_covTU_list, SEcov = srr_covTU_list))

This allows me to make plots like the following (it might need to be improved, but that's the idea):

[image not reproduced]

Do you think it would be possible (and useful) to add this to the function?
Thanks!
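
The accumulation pattern proposed above can be sketched in isolation. The loop below is a stand-in written in base R, with random matrices replacing the real cross-validation summaries; keepx, keepy and n_comp are illustrative values, not taken from the package.

```r
set.seed(1)
n_comp <- 3
keepx <- c(10, 20)
keepy <- c(5, 10, 15)

mean_covTU_list <- list()
srr_covTU_list <- list()

for (comp in seq_len(n_comp)) {
  # Stand-ins for the per-component summaries computed inside the function
  mean_covTU <- matrix(runif(length(keepy) * length(keepx)),
                       nrow = length(keepy), ncol = length(keepx),
                       dimnames = list(keepy, keepx))
  srr_covTU <- mean_covTU / 10

  # Store each component's matrices instead of overwriting them
  mean_covTU_list[[comp]] <- mean_covTU
  srr_covTU_list[[comp]] <- srr_covTU
}

print(length(mean_covTU_list))
print(dim(mean_covTU_list[[2]]))
```

Each list element can then be plotted separately, one panel per joint component.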

An example for O2PLS

Hi, selbouhaddani,
Could you please provide an example of how to run O2PLS?
For example, what is the difference between the two cross-validation functions, i.e., crossval_o2m_adjR2() and crossval_o2m()?
How can I obtain the resulting common and distinctive matrices for each block?
Also, when I open OmicsPLS_vignette.Rmd in the Firefox browser, the math formulas are not rendered.

Thanks.

The input checks seem too strict

I've got a couple of cases where I wished to run o2m but could not because the input checks failed:

- data with NaN is not accepted;
- it is impossible to perform O2PLS-DA because of the strict "less than" check of the number of components against the number of columns in the data (granted, it is a less common thing to do than OPLS-DA);
- in cross-validation, the sum of the requested components is checked against the number of columns, which will of course work for omics but not for many other datasets.

I understand that some limitations may arise from the implementation details (e.g. use of SVD for PCA), but I wonder if it would be possible to relax some of the checks. Do you plan to support the cases I mentioned above in this package?
Or would it be reasonable to provide a "force" argument that ignores the checks and lets the user take the risk of failing miserably (when the algorithm indeed does not support the specific case)?
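
One way such a "force" argument could behave is sketched below in base R. check_components is a hypothetical function written for illustration; it is not the package's actual input checker, and the two checks shown are only examples of the kind described above.

```r
# Hypothetical check (not part of OmicsPLS): collect problems, then either
# stop, or downgrade them to a warning when force = TRUE.
check_components <- function(X, n, force = FALSE) {
  problems <- character(0)
  if (any(!is.finite(X))) {
    problems <- c(problems, "X contains non-finite values")
  }
  if (n >= ncol(X)) {
    problems <- c(problems, "n is not smaller than ncol(X)")
  }
  if (length(problems) == 0) return(invisible(TRUE))

  msg <- paste(problems, collapse = "; ")
  if (force) {
    warning(msg, call. = FALSE)   # the user accepts the risk
    return(invisible(FALSE))
  }
  stop(msg, call. = FALSE)
}

X <- matrix(1:6, nrow = 3, ncol = 2)

strict <- tryCatch(check_components(X, n = 2), error = function(e) "stopped")
forced <- suppressWarnings(check_components(X, n = 2, force = TRUE))
print(strict)
print(forced)
```

Returning FALSE in the forced case lets calling code know that the checks did not pass, without aborting the fit.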

On the model statistics

R2Xcorr is currently computed as:

OmicsPLS/R/OmicsPLS_o2m.R

Lines 712 to 713 in 69086e5

R2Xcorr <- ssq(Tt) / ssq(X_true)
R2Ycorr <- ssq(U) / ssq(Y_true)

I wonder why it is not R2Xcorr <- ssq(Tt %*% t(W)) / ssq(X_true), as Table 2 of "Evaluation of O2PLS in Omics data integration" would suggest. I understand that there might be some compensation in the code which would make the two equivalent, but it eludes my comprehension of the codebase. I would be very grateful if you could give me a hint on that.
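
One possible explanation, offered as a sketch rather than a statement about the codebase: if the columns of W are orthonormal (W'W = I), then the sum of squares of Tt W' equals trace(W Tt'Tt W') = trace(Tt'Tt W'W) = trace(Tt'Tt), so both expressions give the same number. The example below checks this numerically in base R; ssq is defined here as a plain sum of squares, which is an assumption about what the package function computes.

```r
ssq <- function(A) sum(A^2)  # assumed meaning of ssq: sum of squared entries

set.seed(1)
Tt <- matrix(rnorm(20 * 2), 20, 2)

# A 10 x 2 loading matrix with orthonormal columns
W <- qr.Q(qr(matrix(rnorm(10 * 2), 10, 2)))

equal_when_orthonormal <- all.equal(ssq(Tt), ssq(Tt %*% t(W)))
print(equal_when_orthonormal)

# With loadings that are not orthonormal the two quantities differ
W_scaled <- 2 * W
ratio <- ssq(Tt %*% t(W_scaled)) / ssq(Tt)
print(ratio)
```

Whether the loadings returned by a given fit are exactly orthonormal can be checked with crossprod(W), which should then be close to the identity matrix.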

Also, I wanted to thank you for sharing your work and apologize for opening so many issues on GitHub; I can offer help in fixing the minor typos I found if you wish to accept PRs. To my knowledge, this is not only the only open-source package offering O2PLS, but also a well-designed and well-documented one, and I hope that I can contribute to making it more bulletproof and be able to use it again in the future!

Edit: I think that some other statistics may require more attention.

Any special reason for n + max(nx, ny) in o2m2?

I just wanted to let you know that I was able to (roughly) reproduce Figure 12b from the Trygg and Wold (2003) paper using a (slightly modified) o2m2. It occurred to me that if I replace the number of components in the first pass (A in the paper) with n rather than n + max(nx, ny), the algorithm better reflects my reading of the paper, and the recreated figure is more similar to the original one. The SVD version works almost as well (up to a sign flip).

Result for n + max(nx, ny):
[image: 12a_reproduction_n+max(nx,ny)]

Result for n:
[image: 12b_reproduction_n]

Relevant code:

OmicsPLS/R/OmicsPLS_o2m.R

Lines 130 to 134 in 913c3e5

if (nx + ny > 0) {
    # larger principal subspace
    n2 <- n + max(nx, ny)
    cdw <- svd(t(Y) %*% X, nu = n2, nv = n2)

Based on the comment ("larger principal subspace") I understand that there might be a reason for this modification and would be happy to learn more if you could point me to a reference. If you don't have anything at hand, please feel free to close this issue. I wanted to put this up somewhere so that another curious person (or future me) would not need to go through the debugging process again.
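
To make the two choices concrete, the sketch below runs the svd call from the excerpt with both n2 = n + max(nx, ny) and n on random centered data (the data and component counts are illustrative). It only shows that the leading n singular vectors coincide, so the difference lies in the extra vectors that the larger subspace passes on to the later filtering steps.

```r
set.seed(1)
X <- scale(matrix(rnorm(30 * 8), 30, 8), scale = FALSE)  # 30 samples, 8 vars
Y <- scale(matrix(rnorm(30 * 6), 30, 6), scale = FALSE)  # 30 samples, 6 vars

n <- 2; nx <- 1; ny <- 2
n2 <- n + max(nx, ny)  # larger principal subspace, as in the excerpt

cdw_large <- svd(t(Y) %*% X, nu = n2, nv = n2)
cdw_small <- svd(t(Y) %*% X, nu = n, nv = n)

print(dim(cdw_large$v))
print(dim(cdw_small$v))

# The first n right singular vectors are the same in both cases
same_leading <- all.equal(cdw_large$v[, 1:n], cdw_small$v)
print(same_leading)
```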

There is still some noise (which may have to do with the difference in cross-validation splits or with the differences in the OSC filtering) and the y-axis scales differ (I tried passing the data through autoscaling; it did not help). I could not find anything that could explain the differences, and it seems that not much more can be deduced from the original publication without having access to their code.

Finally, thank you for all the recent improvements!

Best wishes, Michał
