Giter Club home page Giter Club logo

Comments (9)

Datseris avatar Datseris commented on June 14, 2024

(to be exact, the error does come from MultivariateStats: Type at /home/isensee/.julia/packages/MultivariateStats/nNJuu/src/pca.jl:24, but this is most likely bad computation on our part)

from timeseriesprediction.jl.

JonasIsensee avatar JonasIsensee commented on June 14, 2024

Not sure if I fully agree with the low priority label.
This definitely needs to be investigated before publication.

from timeseriesprediction.jl.

Datseris avatar Datseris commented on June 14, 2024

Okay. Next time you encounter this save the initial conditions of the run.

The error comes from:

compute_pca(covmat, pratio, maxoutdim) = pcacov(covmat, T[]; maxoutdim=maxoutdim, pratio=pratio)

which is calling pcacov.

In addition:

  • What does this error mean? What is the principal variance and what is the "total"? Is the total the sum of all variances across each dimension? Then is the principal the first one?
  • Has this happened with Float64? I guess not, since you are suggesting the conversions. Then, at what point to convert?
  • Does MultivariateStats allows you to give the variances? We already compute the covariance matrix ourselves, right?
  • Do you have a MWE? That would be our best bet into trying to solve this. A reproducible example that makes it happen.
  • Can you record the numbers that go into pcacov when this happens? Also, is it sensitive to the parameters pratio, maxoutdim?

from timeseriesprediction.jl.

JonasIsensee avatar JonasIsensee commented on June 14, 2024

To answer your questions:
let C be the covariance matrix that we approximate

  • The total variance is the variance of the reconstruction prior to dimension reduction. In other words this is trace(C) the sum of the diagonal elements of the cov mat.
    This quantity is should be conserved in a PCA. The principal variance is the sum of the
    first D_R eigenvalues of C. (where D_R is the reduced dimension)
    Therefore the principal variance must always be smaller than the total variance and can only be equal
    if the dimension was not reduced. If not, an error is thrown.
    This means that some numerical error magically increases the observed variance in the dataset.

  • This has not happened with Float64

  • We compute the covariance matrix. the variances are the diagonal elements of it.

  • Sadly I do not have an MWE.

  • Recording the numbers that produce this error will be very difficult.
    The function that throws the error does not have access to them.
    I would have to hack into TimeseriesPrediction.jl to save every single covmat it computes into a file
    and sort through them later on. This would be very very hacky though.
    This has nothing to to with pratio or maxoutdim

from timeseriesprediction.jl.

JonasIsensee avatar JonasIsensee commented on June 14, 2024

Update:
It is possible to produce the error above by calling pcacov on a matrix with negative eigenvalues.

julia> data = rand(1000,10);

julia> cmat = cov(data)
10×10 Array{Float64,2}:
  0.0857479    -0.000731168   0.00411678    0.000172448  -0.00451867   -0.00309346   -0.00206307   -0.00120418    0.00153542   -0.00397786 
 -0.000731168   0.0868766     0.00118265    3.84247e-5    0.000199679  -0.00170826   -0.00653588    0.000576799  -0.000971882  -0.00445345 
  0.00411678    0.00118265    0.0875116     0.00155156   -0.00312113    0.00279713    0.000809654   0.000550121   0.000293062  -0.00077902 
  0.000172448   3.84247e-5    0.00155156    0.0838742     0.000380285  -0.00335316    0.000226646  -0.00275513    0.00279633   -0.00328242 
 -0.00451867    0.000199679  -0.00312113    0.000380285   0.080504      0.00322914    0.00195956   -0.00178002   -0.00328432   -4.66188e-5 
 -0.00309346   -0.00170826    0.00279713   -0.00335316    0.00322914    0.0853385    -0.000485913   0.00205254    0.00168462    0.00196793 
 -0.00206307   -0.00653588    0.000809654   0.000226646   0.00195956   -0.000485913   0.0845162     0.000258483   0.00016042    0.000706343
 -0.00120418    0.000576799   0.000550121  -0.00275513   -0.00178002    0.00205254    0.000258483   0.0839405    -0.00214507   -0.0018921  
  0.00153542   -0.000971882   0.000293062   0.00279633   -0.00328432    0.00168462    0.00016042   -0.00214507    0.0829183    -0.0019793  
 -0.00397786   -0.00445345   -0.00077902   -0.00328242   -4.66188e-5    0.00196793    0.000706343  -0.0018921    -0.0019793     0.0829308  

julia> for n = 1:10
       cmat[n,n] -= 1
       end

julia> pcacov(cmat, Float64[])
ERROR: ArgumentError: principal variance cannot exceed total variance.
Stacktrace:
 [1] PCA(::Array{Float64,1}, ::Array{Float64,2}, ::Array{Float64,1}, ::Float64) at /home/jonas/.julia/packages/MultivariateStats/nNJuu/src/pca.jl:24
 [2] #pcacov#8(::Int64, ::Float64, ::Function, ::Array{Float64,2}, ::Array{Float64,1}) at /home/jonas/.julia/packages/MultivariateStats/nNJuu/src/pca.jl:112
 [3] pcacov(::Array{Float64,2}, ::Array{Float64,1}) at /home/jonas/.julia/packages/MultivariateStats/nNJuu/src/pca.jl:105
 [4] top-level scope at none:0

A covariance matrix should never have negative eigenvalues.
However this is likely the cause

julia> eps(Float32)
1.1920929f-7
julia> eps(Float64)
2.220446049250313e-16

During estimation of the covmat we normalize with the number of points ~10^8.
This will likely cause significant rounding errors.
Either we have to get smarter on when to do the normalization or we have to use Float64

from timeseriesprediction.jl.

Datseris avatar Datseris commented on June 14, 2024

Can we just run through the diagonal of the covariance matrix and do

if C[n, n] \le 0
   C[n,n] = eps(eltype(C))
end

?

I think this would solve all problems?

from timeseriesprediction.jl.

JonasIsensee avatar JonasIsensee commented on June 14, 2024

That will surely stop the error from showing up but
the resulting PCA model will be less than useful.
No, that is not an option.

I believe it may be best to write a second constructor for PCAEmbedding.
This one will convert the values to Float64 for the relevant computations
and convert back once the tricky part is done.

from timeseriesprediction.jl.

Datseris avatar Datseris commented on June 14, 2024

Sure, this seems an easy enough solution. What are the typical sizes of the covariance matrix? maximum of up to 500x500 right? That's quite small, it won't be expensive.

from timeseriesprediction.jl.

JonasIsensee avatar JonasIsensee commented on June 14, 2024

Converting to 64 bit precision for covmat works.

from timeseriesprediction.jl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.