flaport / torch_sparse_solve Goto Github PK

View Code? Open in Web Editor NEW

57.0 2.0 4.0 50 KB

A sparse KLU solver for PyTorch.

Home Page: https://pypi.org/project/torch-sparse-solve

License: GNU Lesser General Public License v2.1

Python 60.89% C++ 35.29% Makefile 3.82%

pytorch sparse-matrix sparse-linear-systems klu-algorithm sparse sparse-matrices suitesparse python

torch_sparse_solve's Introduction

Torch Sparse Solve

An alternative to torch.solve for sparse PyTorch CPU tensors using the efficient KLU algorithm.

CPU tensors only

This library is a wrapper around the SuiteSparse KLU algorithms. This means the algorithm is only implemented for C-arrays and hence is only available for PyTorch CPU tensors. However, for large, sparse enough tensors, it might still be worth doing the GPU→CPU conversion.

Usage

The torch_sparse_solve library provides a single function solve(A, b), which solves for x in the batched matrix × batched matrix system Ax=b for torch.float64 tensors (notice the different API in comparison to torch.solve):

import torch
from torch_sparse_solve import solve
torch.manual_seed(42)
mask = torch.tensor([[[1,0,0],[1,1,0],[0,0,1]]], dtype=torch.float64)
A = (mask * torch.randn(4, 3, 3, dtype=torch.float64)).to_sparse()
b = torch.randn(4, 3, 2, dtype=torch.float64)
x = solve(A, b)

# compare to torch.solve:
A = A.to_dense()
print( (x - torch.solve(b, A)[0] < 1e-9).all() )

True

Caveats

There are two major caveats you should be aware of when using torch_sparse_solve.solve(A, b):

A should be 'dense' in the first dimension, i.e. the batch dimension should contain as many elements as the batch size.
A should have the same sparsity pattern for every element in the batch. If this is not the case, you have two options:
1. Create a new sparse matrix with the same sparsity pattern for every element in the batch by adding zeros to the sparse representation.
2. OR loop over the batch dimension and solve sequentially, i.e. with shapes (1, m, m) and (1, m, n) for each element in A and b respectively.
solve is differentiable, but only for the non-zero elements of A (which in most cases is what you want anyway).

Installation

The library can be installed with pip:

pip install torch_sparse_solve

Please note that no pre-built wheels exist. This means that pip will attempt to install the library from source. Make sure you have the necessary dependencies installed for your OS.

Dependencies

Linux

On Linux, having PyTorch, scipy and suitesparse installed is often enough to be able install the library (along with the typical developer tools for your distribution). Run the following inside a conda environment:

conda install suitesparse scipy
conda install pytorch -c pytorch
pip install torch_sparse_solve

Windows

On Windows, the installation process is a bit more involved as typically the build dependencies are not installed. To install those, download Visual Studio Community 2017 from here. During installation, go to Workloads and select the following workloads:

Desktop development with C++
Python development

Then go to Individual Components and select the following additional items:

C++/CLI support
VC++ 2015.3 v14.00 (v140) toolset for desktop

Then, download and install Microsoft Visual C++ Redistributable from here.

After these installation steps, run the following commands inside a x64 Native Tools Command Prompt for VS 2017, after activating your conda environment:

set DISTUTILS_USE_SDK=1
conda install suitesparse scipy
conda install pytorch -c pytorch
pip install torch_sparse_solve

License & Credits

This library was partly based on:

SuiteSparse, LGPL-2.1
kagami-c/PyKLU, LGPL-2.1
scipy.sparse, BSD-3

torch_sparse_solve's People

Contributors

Stargazers

Watchers

Forkers

penghao94 amir90 hengshu1 njkrichardson

torch_sparse_solve's Issues

speed?

Hi!
Thank you for this implementation. Do you have any idea on the speed compared to using torch.solve?

Error: Failed building wheel for torch-sparse-solve

I was trying to build it on my linux machine where I get this error:

1417 | inline Tensor Tensor::index(TensorList indices) const {
| ~~~~~~~~~~~^~~~~~~
torch_sparse_solve.cpp:32:143: error: cannot convert '' to 'at::TensorList' {aka 'c10::ArrayRefat::Tensor'}
32 | at::Tensor gradA_tmp = at::sparse_coo_tensor(indices, -at::sum((gradb.index({i, indices.index({0})}) * x.index({i, indices.index({1})})), -1)).unsqueeze(0);
|

Avoiding dense matmult

Very nice library!

If I'm understanding your CPP code right, this line

torch_sparse_solve/torch_sparse_solve.cpp

Line 28 in ce765ce

at::Tensor gradA = (-at::bmm(gradb, at::transpose(x, -1, -2)));

is creating dense matrices, even though the result will later be sparsified. This wouldn't scale to large matrices that won't fit into memory dense; is there a way to compute just the nonzero elements?

Cannot install torch_sparse_solve

Dear flaport,

I tried multiple times with different environmental settings in Colab, but I consistantly get the same error msg:

Preparing metadata (setup.py) ... done
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for torch-sparse-solve (setup.py) ... error
ERROR: Failed building wheel for torch-sparse-solve
ERROR: Could not build wheels for torch-sparse-solve, which is required to install pyproject.toml-based projects

Could you plz help with this?

Thank you very much!

Best,
JY

[Question] Backward pass with respect to the sparse matrix - dense or sparse?

Using the notation x = A^{-1} b, I am interested in the gradient with respect to sparse matrix A and more specifically the nonzero values in A but haven't been able to test the library yet (See #12 ).

I looked at the documentation but it wasn't clear to me whether the library would return a dense or a sparse matrix for A.grad. Reading the blog post I can see the derivative with respect to the (dense matrix) being explained but the derivative with respect to only the non-zero values in A is not made explicit.

Does the libray support providing derivatives with repect to only the non-zero elements of A?

Wired bugs: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Hi Flaport:

Thanks for your amazing code!

However, when I applied your code, there is a weired error:
'''Process finished with exit code 139 (interrupted by signal 11: SIGSEGV).'''
The code is instantly stopped by the above bugs. And I have no idea what should I do.

To Reproduce

The following code can reproduce the behavior:

import torch
from torch_sparse_solve import solve
import random

def random_int_list(start, stop, length):
    start, stop = (int(start), int(stop)) if start <= stop else (int(stop), int(start))
    length = int(abs(length)) if length else 0
    random_list = []
    for i in range(length):
        random_list.append(random.randint(start, stop))
    return random_list

'''
create b
'''
b = torch.randn(1024,1,requires_grad=True,dtype=torch.float64).cuda()

'''
create A
'''
L_width = 1024
nnz = random.randint(300, 800) # number of nonzero elements
val = [random.uniform(-2, 2) for i in range(nnz)]
row = sorted(random_int_list(0,L_width-1,nnz))
col = sorted(random_int_list(0,L_width-1,nnz))

A = torch.sparse_coo_tensor([row, col], val, (L_width, L_width), dtype=torch.float64, requires_grad=True).cuda()

result_oc = solve(A.unsqueeze(0), b.unsqueeze(0))

Expected behavior

Can properly apply your code: solve(A, b)

Environment

Collecting environment information...
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 9.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.2 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.9

Python version: 3.6.12 |Anaconda, Inc.| (default, Sep  8 2020, 23:10:56)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.15.0-151-generic-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: 9.2.148
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 396.54
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.4.1
/usr/local/cuda-9.2/lib64/libcudnn.so.7
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.1
[pip3] torch-sparse-solve==0.0.4
[pip3] torchaudio==0.7.0a0+a853dff
[pip3] torchvision==0.8.2
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               9.2                           0
[conda] mkl                       2020.2                      256
[conda] mkl-service               2.3.0            py36he8ac12f_0
[conda] mkl_fft                   1.2.1            py36h54f3939_0
[conda] mkl_random                1.1.1            py36h0573a6f_0
[conda] numpy                     1.19.2           py36h54aff64_0
[conda] numpy-base                1.19.2           py36hfa32c7d_0
[conda] pytorch                   1.7.1           py3.6_cuda9.2.148_cudnn7.6.3_0    pytorch
[conda] torch-sparse-solve        0.0.4                     <pip>
[conda] torchaudio                0.7.2                      py36    pytorch
[conda] torchvision               0.8.2                 py36_cu92    pytorch

Additional context

useful link: https://stackoverflow.com/questions/49414841/process-finished-with-exit-code-139-interrupted-by-signal-11-sigsegv

Really looking forward for your reply

Best Regards

Supporting pytorch_sparse

Hi again,

You might be aware of this project (pytorch_sparse) which is really cool.
It's be great to support their SparseTensor as an input to solve, or maybe adding another method for those tensors.

I don't know much about sparse format but it does support efficient csc conversion (here), it might be worth having a look.

Formulas in the backward

Thanks for this great wrapper !

I wonder, could you point me towards formulas for solve_backward please?

https://github.com/flaport/torch_sparse_solve/blob/ce765cea780ab084e819048a77550c1129ccd878/torch_sparse_solve.cpp#L26-33

Occur bugs when doing backward process.

Hi flaport:

Many thanks for your great blog!

However, when I run the backward process, there is some weird problems I cannot solve.
Could you give me some suggestion?

Code:

import torch
from torch_sparse_solve import solve
torch.manual_seed(42)
from torch.autograd import gradcheck

mask = torch.tensor([[[1, 0, 0], [1, 1, 0], [0, 0, 1]]], dtype=torch.float64)
A = mask * torch.randn(4, 3, 3, dtype=torch.float64)
Asp = A.to_sparse()
Asp.requires_grad_()
b = torch.randn(4, 3, 2, dtype=torch.float64, requires_grad=True)
gradcheck(solve, [Asp, b], check_sparse_nnz=True)`

Error:

/home/hongjin/anaconda3/envs/20210714colorization/bin/python3 /home/hongjin/PycharmProjects/Year_2021/Colorization_libo/0714_customed_sparse_linear_system_pytorch/test.py
Traceback (most recent call last):
  File "/home/hongjin/PycharmProjects/Year_2021/Colorization_libo/0714_customed_sparse_linear_system_pytorch/test.py", line 60, in <module>
    gradcheck(solve, [Asp, b], check_sparse_nnz=True)
  File "/home/hongjin/anaconda3/envs/20210714colorization/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 478, in gradcheck
    if not check_undefined_grad_support(output_to_check):
  File "/home/hongjin/anaconda3/envs/20210714colorization/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 458, in check_undefined_grad_support
    if (gi is not None) and (not gi.eq(0).all()):
RuntimeError: Could not run 'aten::eq.Scalar' with arguments from the 'SparseCPU' backend. 'aten::eq.Scalar' is only available for these backends: [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /opt/conda/conda-bld/pytorch_1607370169888/work/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
CUDA: registered at /opt/conda/conda-bld/pytorch_1607370169888/work/build/aten/src/ATen/CUDAType.cpp:2983 [kernel]
QuantizedCPU: registered at /opt/conda/conda-bld/pytorch_1607370169888/work/build/aten/src/ATen/QuantizedCPUType.cpp:297 [kernel]
BackendSelect: fallthrough registered at /opt/conda/conda-bld/pytorch_1607370169888/work/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: fallthrough registered at /opt/conda/conda-bld/pytorch_1607370169888/work/aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
AutogradOther: registered at /opt/conda/conda-bld/pytorch_1607370169888/work/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradCPU: registered at /opt/conda/conda-bld/pytorch_1607370169888/work/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradCUDA: registered at /opt/conda/conda-bld/pytorch_1607370169888/work/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradXLA: registered at /opt/conda/conda-bld/pytorch_1607370169888/work/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse1: registered at /opt/conda/conda-bld/pytorch_1607370169888/work/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse2: registered at /opt/conda/conda-bld/pytorch_1607370169888/work/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse3: registered at /opt/conda/conda-bld/pytorch_1607370169888/work/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
Tracer: registered at /opt/conda/conda-bld/pytorch_1607370169888/work/torch/csrc/autograd/generated/TraceType_4.cpp:9291 [kernel]
Autocast: fallthrough registered at /opt/conda/conda-bld/pytorch_1607370169888/work/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /opt/conda/conda-bld/pytorch_1607370169888/work/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /opt/conda/conda-bld/pytorch_1607370169888/work/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
```


Process finished with exit code 1
`

Many thanks for your kind help!

HONGJIN

(very) incorrect solve for default test

No idea what's happening but the solver returns erroneous solutions, using the provided test. Tested on both Mac and Linux.

import torch
from torch_sparse_solve import solve
torch.manual_seed(42)
mask = torch.tensor([[[1,0,0],[1,1,0],[0,0,1]]], dtype=torch.float64)
A = (mask * torch.randn(4, 3, 3, dtype=torch.float64)).to_sparse()
b = torch.randn(4, 3, 2, dtype=torch.float64)
x = solve(A, b)

# compare to torch.solve:
A = A.to_dense()
print( (x - torch.linalg.solve( A,b)[0] < 1e-9).all() ) #changed your syntax from the deprecated "torch.solve"

returns

False

looking at the error:

(x - torch.linalg.solve( A,b)[0]).abs().max()

returns

tensor(6.0243, dtype=torch.float64)

Install issue on google collab

Thanks for the very helpful post here. I wanted to try your library and just started with google colab. Unfortunately it fails to install there:

!apt-get install libsuitesparse-dev
!pip install torch_sparse_solve

leads to

Collecting torch_sparse_solve
  Downloading torch_sparse_solve-0.0.5.tar.gz (5.2 kB)
Building wheels for collected packages: torch-sparse-solve
  Building wheel for torch-sparse-solve (setup.py) ... error
  ERROR: Failed building wheel for torch-sparse-solve
  Running setup.py clean for torch-sparse-solve
Failed to build torch-sparse-solve
Installing collected packages: torch-sparse-solve
    Running setup.py install for torch-sparse-solve ... error
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-18a9e695/torch-sparse-solve_4ae219b9ded1407587aaac79c4097416/setup.py'"'"'; __file__='"'"'/tmp/pip-install-18a9e695/torch-sparse-solve_4ae219b9ded1407587aaac79c4097416/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-02qc6g_t/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.7/torch-sparse-solve Check the logs for full command output.

---------------------------------------------------------------------------

Is there a simple workaround?

Much slower than `torch.linalg.solve`

I wrote the following code to benchmark against torch.linalg.solve for large sparse matrices. Here, A is a 5000x5000 coalesced COO matrix with 10 nonzero elements in every row. b is a length-5000 vector.

import torch
from torch_sparse_solve import solve

import time

torch.manual_seed(0)

N = 5000
k = 10
row_inds = torch.arange(N).reshape(-1, 1).expand(-1, k)
col_inds = torch.multinomial(torch.ones(N, N, dtype=torch.float64), num_samples=k).sort(axis=1).values
indices = torch.cat([torch.zeros(1, N * k, dtype=torch.int64), row_inds.reshape(1, -1), col_inds.reshape(1, -1)], dim=0)
values = torch.randn(N * k, dtype=torch.float64)
A = torch.sparse_coo_tensor(indices, values, size=(1, N, N))
b = torch.randn(1, N, 1, dtype=torch.float64)

start_time = time.time()
x = solve(A, b)
print(f"sparse solve: {time.time() - start_time: .7f} seconds")

# compare to torch.solve:
A = A.to_dense()
start_time = time.time()
x_true = torch.linalg.solve(A, b)
print(f"dense solve: {time.time() - start_time: .7f} seconds")
print("error: {}".format(torch.max(torch.abs(x - x_true))))

The output:

sparse solve:  45.2241280 seconds
dense solve:  0.9555798 seconds
error: 1.7035422615663265e-05

I was hoping to use torch_sparse_solve to solve some large systems quickly and in a memory-efficient manner, so it would be great if this performance issue can be looked into. Thanks!

Does torch_sparse_solve support backward propagation?

Hi, Floris!
Thank you for this amazing work! I just wonder if the solving process supports backward propagation?
Best regards,
Rodney.

Avoiding the for loop in solve

Hey Floris,

I wonder if you'd consider removing the for-loop in solve by using a method similar to this i.e treating the whole system with a block diagonal A where the batches are the blocks?

I guess the speed-up would be quite large compared to the for loop (and doubled since it is used in forward and backward).