Giter Club home page Giter Club logo

symmetriceigenproblem.jl's Introduction

SymmetricEigenProblem.jl

A more or less working implementation of the QR-algorithm for real, symmetric, tridiagonal matrices.

Turns out trivial bulge-chasing can be competitive with MRRR.

Features

  • Fast computation of eigenvectors through applying Given's rotations in bulk without using GEMM. [1]
  • Otherwise standard bulge chasing with Wilkinson shifts, no multiple tightly packed bulges or anything.

The Given's rotations can theoretically get 75% of gemm performance (4 muls, 2 adds). To benchmark this non-blas-type routine, try:

julia> include("benchmark/benchmark.jl")

julia> bench(Float64, 2000, 2000, 64)
45.722959723206735

julia> bench(Float32, 2000, 2000, 64)
89.314389055655

which will apply 64 × 1999 Given's rotations to the columns of a 2000 × 2000 matrix and output the GFLOP/s. On my computer single-threaded peakflops is:

julia> using LinearAlgebra

julia> BLAS.set_num_threads(1)

julia> LinearAlgebra.peakflops() / 1e9
60.28352275144608

So indeed roughly 74.95% of dgemm.

Threading

For threading use JULIA_NUM_THREADS=8 julia. It will parellellize the application of Given's rotations and gives significant speedups.

julia> include("benchmark/benchmark.jl")

julia> bench_parallel(Float64, 2000, 2000, 64)
267.1348970050438

julia> bench_parallel(Float32, 2000, 2000, 64)
530.8279734148967

Caveats

Currently only supports matrices of order divisable by 2.

This restriction can be lifted when the fused Given's rotation kernels are generated (4 rotations are combined, so ≤ 16 different kernels, I've implemented 3 by hand in src/bulk_givens.jl).

Stability issues

Sometimes the algorithm might use exact eigenvalues as shifts, which might introduce extremely big errors. I have to figure out how to stabilize it a bit more, but it should be fine as longs as the eigenvalues you have are separated a bit.

Example

julia> using LinearAlgebra, SymmetricEigenProblem

julia> n = 2000;

julia> A = SymTridiagonal(collect(1.0 : n), rand(n - 1));

julia> Q = Matrix(1.0I, n, n);

julia> D = copy(A);

julia> @time SymmetricEigenProblem.qr_algorithm!(D, Q);
  0.258179 seconds (1.15 k allocations: 4.041 MiB)

julia> norm(A * Q - Q * Diagonal(D))
2.8632552826583774e-10

[1] Van Zee, Field G., Robert A. Van De Geijn, and Gregorio Quintana-Orti. "Restructuring the QR-Algorithm for High-Performance Applications of Givens Rotations." (2011).

symmetriceigenproblem.jl's People

Contributors

haampie avatar

Stargazers

Yijiang Huang avatar SilenceOverflow avatar Yixuan Qiu avatar Yingbo Ma avatar

Watchers

 avatar James Cloos avatar  avatar

symmetriceigenproblem.jl's Issues

Do aggressive early deflation and multiple bulges

The terminology is terrible, but to be more competitive with MRRR it seems necessary.

Basic idea:

  1. A is the input matrix, T is the current approximate eigenvalue matrix, Q is the matrix that should converge to the eigenvectors
  2. The last few columns Q[:, end-k:end] span a subspace containing some eigenvectors up to convergence tolerance.
  3. The bottom right block of T is the Galerkin projection T[end-k:end, end-k:end] = Q[:, end-k:end]' * A * Q[:, end-k:end], so finding eigenvalues of that block is finding Ritz values of A with respect to the subspace induced by the last columns of Q
  4. Solve the eigenproblem for T[end-k:end, end-k:end] and you magically get the residual norms of all approximate eigenpairs (normally you would look at the off-diagonal value T[end-k-1,end-k] to judge whether the bottom right block as a whole is converged as an invariant subspace, now instead you kind of split that specific value into residual norms per approximate eigenpair in the block, which is richer information)
  5. Permute the diagonal of T to move converged eigenvalues to the back
  6. Store the non-converged Ritz values
  7. Restore spike -> tridiagonal using householder reflections
  8. Use multiple of the non-converged Ritz values as shifts and chase a series of bulges through T.

So, we both detect convergence of individual eigenvalues earlier instead of waiting for the block as a whole to converge, and we can exploit knowing the unconverged Ritz values of the block to do multiple bulge chases as the same time.

It should integrate nicely with the Given's kernel, since multiple bulges = multiple layers, and multiple layers are later applied in bulk to the Q matrix.

The only not so nice thing is that it requires applying householder reflectors to the Q matrix as well, and introduces more parameters to tune.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.