Comments (9)
(to be exact, the error does come from MultivariateStats: Type at /home/isensee/.julia/packages/MultivariateStats/nNJuu/src/pca.jl:24
, but this is most likely bad computation on our part)
from timeseriesprediction.jl.
Not sure if I fully agree with the low priority
label.
This definitely needs to be investigated before publication.
from timeseriesprediction.jl.
Okay. Next time you encounter this save the initial conditions of the run.
The error comes from:
compute_pca(covmat, pratio, maxoutdim) = pcacov(covmat, T[]; maxoutdim=maxoutdim, pratio=pratio)
which is calling pcacov.
In addition:
- What does this error mean? What is the principal variance and what is the "total"? Is the total the sum of all variances across each dimension? Then is the principal the first one?
- Has this happened with Float64? I guess not, since you are suggesting the conversions. Then, at what point to convert?
- Does MultivariateStats allows you to give the variances? We already compute the covariance matrix ourselves, right?
- Do you have a MWE? That would be our best bet into trying to solve this. A reproducible example that makes it happen.
- Can you record the numbers that go into
pcacov
when this happens? Also, is it sensitive to the parameterspratio, maxoutdim
?
from timeseriesprediction.jl.
To answer your questions:
let C
be the covariance matrix that we approximate
-
The total variance is the variance of the reconstruction prior to dimension reduction. In other words this is
trace(C)
the sum of the diagonal elements of the cov mat.
This quantity is should be conserved in a PCA. The principal variance is the sum of the
firstD_R
eigenvalues ofC
. (whereD_R
is the reduced dimension)
Therefore the principal variance must always be smaller than the total variance and can only be equal
if the dimension was not reduced. If not, an error is thrown.
This means that some numerical error magically increases the observed variance in the dataset. -
This has not happened with
Float64
-
We compute the covariance matrix. the variances are the diagonal elements of it.
-
Sadly I do not have an MWE.
-
Recording the numbers that produce this error will be very difficult.
The function that throws the error does not have access to them.
I would have to hack intoTimeseriesPrediction.jl
to save every single covmat it computes into a file
and sort through them later on. This would be very very hacky though.
This has nothing to to withpratio
ormaxoutdim
from timeseriesprediction.jl.
Update:
It is possible to produce the error above by calling pcacov
on a matrix with negative eigenvalues.
julia> data = rand(1000,10);
julia> cmat = cov(data)
10×10 Array{Float64,2}:
0.0857479 -0.000731168 0.00411678 0.000172448 -0.00451867 -0.00309346 -0.00206307 -0.00120418 0.00153542 -0.00397786
-0.000731168 0.0868766 0.00118265 3.84247e-5 0.000199679 -0.00170826 -0.00653588 0.000576799 -0.000971882 -0.00445345
0.00411678 0.00118265 0.0875116 0.00155156 -0.00312113 0.00279713 0.000809654 0.000550121 0.000293062 -0.00077902
0.000172448 3.84247e-5 0.00155156 0.0838742 0.000380285 -0.00335316 0.000226646 -0.00275513 0.00279633 -0.00328242
-0.00451867 0.000199679 -0.00312113 0.000380285 0.080504 0.00322914 0.00195956 -0.00178002 -0.00328432 -4.66188e-5
-0.00309346 -0.00170826 0.00279713 -0.00335316 0.00322914 0.0853385 -0.000485913 0.00205254 0.00168462 0.00196793
-0.00206307 -0.00653588 0.000809654 0.000226646 0.00195956 -0.000485913 0.0845162 0.000258483 0.00016042 0.000706343
-0.00120418 0.000576799 0.000550121 -0.00275513 -0.00178002 0.00205254 0.000258483 0.0839405 -0.00214507 -0.0018921
0.00153542 -0.000971882 0.000293062 0.00279633 -0.00328432 0.00168462 0.00016042 -0.00214507 0.0829183 -0.0019793
-0.00397786 -0.00445345 -0.00077902 -0.00328242 -4.66188e-5 0.00196793 0.000706343 -0.0018921 -0.0019793 0.0829308
julia> for n = 1:10
cmat[n,n] -= 1
end
julia> pcacov(cmat, Float64[])
ERROR: ArgumentError: principal variance cannot exceed total variance.
Stacktrace:
[1] PCA(::Array{Float64,1}, ::Array{Float64,2}, ::Array{Float64,1}, ::Float64) at /home/jonas/.julia/packages/MultivariateStats/nNJuu/src/pca.jl:24
[2] #pcacov#8(::Int64, ::Float64, ::Function, ::Array{Float64,2}, ::Array{Float64,1}) at /home/jonas/.julia/packages/MultivariateStats/nNJuu/src/pca.jl:112
[3] pcacov(::Array{Float64,2}, ::Array{Float64,1}) at /home/jonas/.julia/packages/MultivariateStats/nNJuu/src/pca.jl:105
[4] top-level scope at none:0
A covariance matrix should never have negative eigenvalues.
However this is likely the cause
julia> eps(Float32)
1.1920929f-7
julia> eps(Float64)
2.220446049250313e-16
During estimation of the covmat we normalize with the number of points ~10^8
.
This will likely cause significant rounding errors.
Either we have to get smarter on when to do the normalization or we have to use Float64
from timeseriesprediction.jl.
Can we just run through the diagonal of the covariance matrix and do
if C[n, n] \le 0
C[n,n] = eps(eltype(C))
end
?
I think this would solve all problems?
from timeseriesprediction.jl.
That will surely stop the error from showing up but
the resulting PCA model will be less than useful.
No, that is not an option.
I believe it may be best to write a second constructor for PCAEmbedding
.
This one will convert the values to Float64
for the relevant computations
and convert back once the tricky part is done.
from timeseriesprediction.jl.
Sure, this seems an easy enough solution. What are the typical sizes of the covariance matrix? maximum of up to 500x500 right? That's quite small, it won't be expensive.
from timeseriesprediction.jl.
Converting to 64 bit precision for covmat works.
from timeseriesprediction.jl.
Related Issues (20)
- Move documentation of TSPred to this repo.
- Issue with light cone embedding
- Implement a CompositeBoundary HOT 1
- Prediction of flow dynamics using point processes (new prediction scheme)
- Documentation of SymmetricSpatialEmbedding
- Documentation: light_cone_embedding and r0 HOT 2
- Nearest Trajectory Strategy for Time Series Prediction HOT 2
- Barkley in tests/system_defs.jl is very slow
- Consistency of source code for crossprediction
- versions, documentation, paper
- Use multiple input timeseries as training HOT 1
- Remove duplication of inner to reduce source
- Better Boundary Conditions
- LoadError: UndefVarError: AbstractEmbedding not defined HOT 4
- Is this repo supported? HOT 2
- error to run timeseriesprediction.ipynb HOT 1
- Compile fail : AbstractDataset not defined HOT 2
- TagBot trigger issue HOT 10
- Compatibility with DynamicalSystems v3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from timeseriesprediction.jl.