This happens to me sometimes on the cluster. So far the worker nodes haven't given

Update: It is possible to produce the error above by calling <code class="notransl

ArgumentError: principal variance cannot exceed total variance. about timeseriesprediction.jl HOT 9 CLOSED

JonasIsensee commented on June 14, 2024

ArgumentError: principal variance cannot exceed total variance.

from timeseriesprediction.jl.

Comments (9)

Datseris commented on June 14, 2024

(to be exact, the error does come from MultivariateStats: Type at /home/isensee/.julia/packages/MultivariateStats/nNJuu/src/pca.jl:24, but this is most likely bad computation on our part)

from timeseriesprediction.jl.

JonasIsensee commented on June 14, 2024

Not sure if I fully agree with the low priority label.
This definitely needs to be investigated before publication.

from timeseriesprediction.jl.

Datseris commented on June 14, 2024

Okay. Next time you encounter this save the initial conditions of the run.

The error comes from:

compute_pca(covmat, pratio, maxoutdim) = pcacov(covmat, T[]; maxoutdim=maxoutdim, pratio=pratio)

which is calling pcacov.

In addition:

What does this error mean? What is the principal variance and what is the "total"? Is the total the sum of all variances across each dimension? Then is the principal the first one?
Has this happened with Float64? I guess not, since you are suggesting the conversions. Then, at what point to convert?
Does MultivariateStats allows you to give the variances? We already compute the covariance matrix ourselves, right?
Do you have a MWE? That would be our best bet into trying to solve this. A reproducible example that makes it happen.
Can you record the numbers that go into pcacov when this happens? Also, is it sensitive to the parameters pratio, maxoutdim?

from timeseriesprediction.jl.

JonasIsensee commented on June 14, 2024

To answer your questions:
let C be the covariance matrix that we approximate

The total variance is the variance of the reconstruction prior to dimension reduction. In other words this is trace(C) the sum of the diagonal elements of the cov mat.
This quantity is should be conserved in a PCA. The principal variance is the sum of the
first D_R eigenvalues of C. (where D_R is the reduced dimension)
Therefore the principal variance must always be smaller than the total variance and can only be equal
if the dimension was not reduced. If not, an error is thrown.
This means that some numerical error magically increases the observed variance in the dataset.
This has not happened with Float64
We compute the covariance matrix. the variances are the diagonal elements of it.
Sadly I do not have an MWE.
Recording the numbers that produce this error will be very difficult.
The function that throws the error does not have access to them.
I would have to hack into TimeseriesPrediction.jl to save every single covmat it computes into a file
and sort through them later on. This would be very very hacky though.
This has nothing to to with pratio or maxoutdim

from timeseriesprediction.jl.

JonasIsensee commented on June 14, 2024

Update:
It is possible to produce the error above by calling pcacov on a matrix with negative eigenvalues.

julia> data = rand(1000,10);

julia> cmat = cov(data)
10×10 Array{Float64,2}:
  0.0857479    -0.000731168   0.00411678    0.000172448  -0.00451867   -0.00309346   -0.00206307   -0.00120418    0.00153542   -0.00397786 
 -0.000731168   0.0868766     0.00118265    3.84247e-5    0.000199679  -0.00170826   -0.00653588    0.000576799  -0.000971882  -0.00445345 
  0.00411678    0.00118265    0.0875116     0.00155156   -0.00312113    0.00279713    0.000809654   0.000550121   0.000293062  -0.00077902 
  0.000172448   3.84247e-5    0.00155156    0.0838742     0.000380285  -0.00335316    0.000226646  -0.00275513    0.00279633   -0.00328242 
 -0.00451867    0.000199679  -0.00312113    0.000380285   0.080504      0.00322914    0.00195956   -0.00178002   -0.00328432   -4.66188e-5 
 -0.00309346   -0.00170826    0.00279713   -0.00335316    0.00322914    0.0853385    -0.000485913   0.00205254    0.00168462    0.00196793 
 -0.00206307   -0.00653588    0.000809654   0.000226646   0.00195956   -0.000485913   0.0845162     0.000258483   0.00016042    0.000706343
 -0.00120418    0.000576799   0.000550121  -0.00275513   -0.00178002    0.00205254    0.000258483   0.0839405    -0.00214507   -0.0018921  
  0.00153542   -0.000971882   0.000293062   0.00279633   -0.00328432    0.00168462    0.00016042   -0.00214507    0.0829183    -0.0019793  
 -0.00397786   -0.00445345   -0.00077902   -0.00328242   -4.66188e-5    0.00196793    0.000706343  -0.0018921    -0.0019793     0.0829308  

julia> for n = 1:10
       cmat[n,n] -= 1
       end

julia> pcacov(cmat, Float64[])
ERROR: ArgumentError: principal variance cannot exceed total variance.
Stacktrace:
 [1] PCA(::Array{Float64,1}, ::Array{Float64,2}, ::Array{Float64,1}, ::Float64) at /home/jonas/.julia/packages/MultivariateStats/nNJuu/src/pca.jl:24
 [2] #pcacov#8(::Int64, ::Float64, ::Function, ::Array{Float64,2}, ::Array{Float64,1}) at /home/jonas/.julia/packages/MultivariateStats/nNJuu/src/pca.jl:112
 [3] pcacov(::Array{Float64,2}, ::Array{Float64,1}) at /home/jonas/.julia/packages/MultivariateStats/nNJuu/src/pca.jl:105
 [4] top-level scope at none:0

A covariance matrix should never have negative eigenvalues.
However this is likely the cause

julia> eps(Float32)
1.1920929f-7
julia> eps(Float64)
2.220446049250313e-16

During estimation of the covmat we normalize with the number of points ~10^8.
This will likely cause significant rounding errors.
Either we have to get smarter on when to do the normalization or we have to use Float64

from timeseriesprediction.jl.

Datseris commented on June 14, 2024

Can we just run through the diagonal of the covariance matrix and do

if C[n, n] \le 0
   C[n,n] = eps(eltype(C))
end

I think this would solve all problems?

from timeseriesprediction.jl.

JonasIsensee commented on June 14, 2024

That will surely stop the error from showing up but
the resulting PCA model will be less than useful.
No, that is not an option.

I believe it may be best to write a second constructor for PCAEmbedding.
This one will convert the values to Float64 for the relevant computations
and convert back once the tricky part is done.

from timeseriesprediction.jl.

Datseris commented on June 14, 2024

Sure, this seems an easy enough solution. What are the typical sizes of the covariance matrix? maximum of up to 500x500 right? That's quite small, it won't be expensive.

from timeseriesprediction.jl.

JonasIsensee commented on June 14, 2024

Converting to 64 bit precision for covmat works.

from timeseriesprediction.jl.

ArgumentError: principal variance cannot exceed total variance. about timeseriesprediction.jl HOT 9 CLOSED

Comments (9)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent