CUSP : A C++ Templated Sparse Matrix Library

License: Apache License 2.0

C++ 68.51% C 2.70% Cuda 27.47% Python 1.31% Shell 0.01%

cusplibrary's Introduction

For more information, see the project documentation at CUSP Website.

A Simple Example

#include <cuda.h>
#include <thrust/version.h>

#include <cusp/version.h>
#include <cusp/hyb_matrix.h>
#include <cusp/io/matrix_market.h>
#include <cusp/krylov/cg.h>

#include <iostream>

int main(void)
{
    int cuda_major =  CUDA_VERSION / 1000;
    int cuda_minor = (CUDA_VERSION % 1000) / 10;
    int thrust_major = THRUST_MAJOR_VERSION;
    int thrust_minor = THRUST_MINOR_VERSION;
    int cusp_major = CUSP_MAJOR_VERSION;
    int cusp_minor = CUSP_MINOR_VERSION;
    std::cout << "CUDA   v" << cuda_major   << "." << cuda_minor   << std::endl;
    std::cout << "Thrust v" << thrust_major << "." << thrust_minor << std::endl;
    std::cout << "Cusp   v" << cusp_major   << "." << cusp_minor   << std::endl;

    // create an empty sparse matrix structure (HYB format)
    cusp::hyb_matrix<int, float, cusp::device_memory> A;

    // load a matrix stored in Matrix-Market format
    cusp::io::read_matrix_market_file(A, "./testing/data/laplacian/5pt_10x10.mtx");

    // allocate storage for solution (x) and right hand side (b)
    cusp::array1d<float, cusp::device_memory> x(A.num_rows, 0);
    cusp::array1d<float, cusp::device_memory> b(A.num_rows, 1);

    // solve the linear system A * x = b with the conjugate gradient method
    cusp::krylov::cg(A, x, b);

    return 0;
}

CUSP is a header-only library. To compile this example clone both CUSP and Nvidia/cccl:

git clone git@github.com:cusplibrary/cusplibrary.git
cd cusplibrary
git clone git@github.com:NVIDIA/cccl.git
nvcc -Icccl/thrust -Icccl/libcudacxx/include -Icccl/cub -I. example.cu -o example

Stable Releases

CUSP releases are labeled using version identifiers having three fields:

Date	Version
04/28/2015	CUSP v0.5.1
03/13/2015	CUSP v0.5.0
08/30/2013	CUSP v0.4.0
03/08/2012	CUSP v0.3.1
02/04/2012	CUSP v0.3.0
05/30/2011	CUSP v0.2.0
07/10/2010	CUSP v0.1.0

Contributors

CUSP is developed as an open-source project by NVIDIA Research. Nathan Bell was the original creator and Steven Dalton is the current primary contributor.

CUSP is available under the Apache v2.0 open source LICENSE

Citing

@MISC{Cusp,
  author = "Steven Dalton and Nathan Bell and Luke Olson and Michael Garland",
  title = "Cusp: Generic Parallel Algorithms for Sparse Matrix and Graph Computations",
  year = "2014",
  url = "http://cusplibrary.github.io/", note = "Version 0.5.0"
}

cusplibrary's Issues

ELL accesses not consistent

Algorithms acting on ell matrices or views should respect the ordering of the indices and value matrices (row or column) and propagate the alignment if provided during construction.
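
To make the request concrete, here is a minimal sketch of an ELL sparse matrix-vector product that honors both the storage ordering and the allocated pitch. `EllSketch`, `Layout`, `make_tridiagonal` and `ell_spmv` are hypothetical names introduced for illustration; this is plain C++ and not the cusp::ell_matrix implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified ELL container for illustration; not cusp::ell_matrix.
enum class Layout { RowMajor, ColumnMajor };

struct EllSketch {
    std::size_t num_rows;
    std::size_t entries_per_row;      // padded number of slots per row
    std::size_t pitch;                // allocated leading dimension (may exceed the logical extent)
    Layout layout;
    std::vector<int> column_indices;  // -1 marks a padding slot
    std::vector<float> values;

    // Linear offset of slot k in the given row, honoring layout and pitch.
    std::size_t offset(std::size_t row, std::size_t k) const {
        return layout == Layout::ColumnMajor ? k * pitch + row : row * pitch + k;
    }
};

// Build an n x n tridiagonal matrix (-1, 4, -1) in the requested layout.
EllSketch make_tridiagonal(std::size_t n, Layout layout, std::size_t pitch) {
    const std::size_t width = 3;
    const bool col_major = (layout == Layout::ColumnMajor);
    assert(pitch >= (col_major ? n : width));
    EllSketch A{n, width, pitch, layout, {}, {}};
    const std::size_t slots = pitch * (col_major ? width : n);
    A.column_indices.assign(slots, -1);
    A.values.assign(slots, 0.0f);
    for (std::size_t row = 0; row < n; ++row) {
        std::size_t k = 0;
        auto put = [&](std::size_t col, float v) {
            const std::size_t idx = A.offset(row, k++);
            A.column_indices[idx] = static_cast<int>(col);
            A.values[idx] = v;
        };
        if (row > 0) put(row - 1, -1.0f);
        put(row, 4.0f);
        if (row + 1 < n) put(row + 1, -1.0f);
    }
    return A;
}

// y = A * x; every access goes through offset(), so the result does not
// depend on which layout or pitch the matrix was constructed with.
std::vector<float> ell_spmv(const EllSketch& A, const std::vector<float>& x) {
    std::vector<float> y(A.num_rows, 0.0f);
    for (std::size_t row = 0; row < A.num_rows; ++row) {
        for (std::size_t k = 0; k < A.entries_per_row; ++k) {
            const std::size_t idx = A.offset(row, k);
            const int col = A.column_indices[idx];
            if (col >= 0) y[row] += A.values[idx] * x[static_cast<std::size_t>(col)];
        }
    }
    return y;
}
```

Calling `ell_spmv` on `make_tridiagonal(3, Layout::ColumnMajor, 4)` and on `make_tridiagonal(3, Layout::RowMajor, 3)` with the same input vector should give identical results, which is the consistency property the issue asks for.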

ell_cg.txt

warnings about unused typedefs

Dear developers,

Thank you for your great library.
It works flawlessly for me, but there is an issue that has bugged me for some time now: I get warnings about unused typedefs when I compile my programs with -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP.
Since my compiler (g++ 4.9) prints the whole inclusion path, this clutters the compiler output considerably. Could you maybe fix that?
Thanks
Matthias

The whole warning message I get on v0.5.1 is

matthias@pc40-c722:~/feltor/src/toefl$ make device=omp
g++ -O3 -Wall -x c++ -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -fopenmp toefl_hpc.cu -o toefl_hpc -I/home/matthias/include -I../ -I../../inc -lnetcdf -lhdf5 -lhdf5_hl -L/home/matthias/include/json/../../src/lib_json -ljsoncpp -DDG_BENCHMARK -g
In file included from /home/matthias/include/cusp/sort.h:287:0,
from /home/matthias/include/cusp/system/detail/generic/conversions/coo_to_other.h:22,
from /home/matthias/include/cusp/system/detail/generic/convert.inl:24,
from /home/matthias/include/cusp/system/detail/generic/convert.h:50,
from /home/matthias/include/cusp/detail/convert.inl:25,
from /home/matthias/include/cusp/convert.h:94,
from /home/matthias/include/cusp/detail/array2d.inl:322,
from /home/matthias/include/cusp/array2d.h:515,
from /home/matthias/include/cusp/detail/format_utils.inl:18,
from /home/matthias/include/cusp/format_utils.h:169,
from /home/matthias/include/cusp/detail/csr_matrix.inl:17,
from /home/matthias/include/cusp/csr_matrix.h:525,
from /home/matthias/include/cusp/system/detail/sequential/elementwise.h:28,
from /home/matthias/include/cusp/system/cpp/detail/elementwise.h:22,
from /home/matthias/include/cusp/system/cpp/execution_policy.h:45,
from /home/matthias/include/cusp/execution_policy.h:29,
from /home/matthias/include/cusp/multiply.h:25,
from ../../inc/dg/backend/cusp_matrix_blas.cuh:8,
from ../../inc/dg/blas2.h:8,
from ../../inc/dg/blas.h:7,
from ../../inc/dg/algorithm.h:8,
from toeflR.cuh:4,
from toefl_hpc.cu:8:
/home/matthias/include/cusp/detail/sort.inl: In function ‘void cusp::detail::sort_by_row(thrust::execution_policy&, ArrayType1&, ArrayType2&, ArrayType3&, typename ArrayType1::value_type, typename ArrayType1::value_type)’:
/home/matthias/include/cusp/detail/sort.inl:46:47: warning: typedef ‘MemorySpace’ locally defined but not used [-Wunused-local-typedefs]
typedef typename ArrayType1::memory_space MemorySpace;
^
/home/matthias/include/cusp/detail/sort.inl: In function ‘void cusp::detail::sort_by_row_and_column(thrust::execution_policy&, ArrayType1&, ArrayType2&, ArrayType3&, typename ArrayType1::value_type, typename ArrayType1::value_type, typename ArrayType2::value_type, typename ArrayType2::value_type)’:
/home/matthias/include/cusp/detail/sort.inl:84:47: warning: typedef ‘MemorySpace’ locally defined but not used [-Wunused-local-typedefs]
typedef typename ArrayType1::memory_space MemorySpace;
^
In file included from /home/matthias/include/cusp/format_utils.h:169:0,
from /home/matthias/include/cusp/detail/csr_matrix.inl:17,
from /home/matthias/include/cusp/csr_matrix.h:525,
from /home/matthias/include/cusp/system/detail/sequential/elementwise.h:28,
from /home/matthias/include/cusp/system/cpp/detail/elementwise.h:22,
from /home/matthias/include/cusp/system/cpp/execution_policy.h:45,
from /home/matthias/include/cusp/execution_policy.h:29,
from /home/matthias/include/cusp/multiply.h:25,
from ../../inc/dg/backend/cusp_matrix_blas.cuh:8,
from ../../inc/dg/blas2.h:8,
from ../../inc/dg/blas.h:7,
from ../../inc/dg/algorithm.h:8,
from toeflR.cuh:4,
from toefl_hpc.cu:8:
/home/matthias/include/cusp/detail/format_utils.inl: In function ‘void cusp::detail::extract_diagonal(thrust::execution_policy&, const Matrix&, Array&, cusp::csr_format)’:
/home/matthias/include/cusp/detail/format_utils.inl:72:42: warning: typedef ‘MemorySpace’ locally defined but not used [-Wunused-local-typedefs]
typedef typename Array::memory_space MemorySpace;
^
/home/matthias/include/cusp/detail/format_utils.inl: In function ‘void cusp::detail::extract_diagonal(thrust::execution_policy&, const Matrix&, Array&, cusp::dia_format)’:
/home/matthias/include/cusp/detail/format_utils.inl:98:42: warning: typedef ‘IndexType’ locally defined but not used [-Wunused-local-typedefs]
typedef typename Matrix::index_type IndexType;
^
In file included from /home/matthias/include/cusp/format_utils.h:169:0,
from /home/matthias/include/cusp/detail/csr_matrix.inl:17,
from /home/matthias/include/cusp/csr_matrix.h:525,
from /home/matthias/include/cusp/system/detail/sequential/elementwise.h:28,
from /home/matthias/include/cusp/system/cpp/detail/elementwise.h:22,
from /home/matthias/include/cusp/system/cpp/execution_policy.h:45,
from /home/matthias/include/cusp/execution_policy.h:29,
from /home/matthias/include/cusp/multiply.h:25,
from ../../inc/dg/backend/cusp_matrix_blas.cuh:8,
from ../../inc/dg/blas2.h:8,
from ../../inc/dg/blas.h:7,
from ../../inc/dg/algorithm.h:8,
from toeflR.cuh:4,
from toefl_hpc.cu:8:
/home/matthias/include/cusp/detail/format_utils.inl: In function ‘size_t cusp::detail::count_diagonals(thrust::execution_policy&, size_t, size_t, const ArrayType1&, const ArrayType2&)’:
/home/matthias/include/cusp/detail/format_utils.inl:211:47: warning: typedef ‘MemorySpace’ locally defined but not used [-Wunused-local-typedefs]
typedef typename ArrayType1::memory_space MemorySpace;
^
/home/matthias/include/cusp/detail/format_utils.inl: In function ‘size_t cusp::detail::compute_optimal_entries_per_row(thrust::execution_policy&, const ArrayType&, float, size_t)’:
/home/matthias/include/cusp/detail/format_utils.inl:269:46: warning: typedef ‘MemorySpace’ locally defined but not used [-Wunused-local-typedefs]
typedef typename ArrayType::memory_space MemorySpace;
^
matthias@pc40-c722:~/feltor/src/toefl$ g++ --version
g++ (Ubuntu 4.9.3-13ubuntu2) 4.9.3
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

cusp::multiply inner working..

I am not able to find the implementation (or inner workings) of the multiply function, which I think is the common API for all dense and sparse matrices in every available format (COO, CSR, HYB, etc.). I think it is also shared between the CPU path, when we pass matrices allocated on the host, and the GPU path, when we pass matrices allocated on the device. Please correct me if I am wrong.
I want to know whether multiply performs only the multiplication (SpMV in my case) in the given format, or whether it does extra work such as converting to another format or running multiple iterations of the multiply. I ask because it is slower than cuSPARSE in my implementation, but I would like to stay with CUSP.
Please help me with this.
I have another question as well:
is a conversion like the one below correct, or do I need to use the convert function to perform the conversion?
cusp::coo_matrix<int, float, cusp::device_memory> coo_device;
cusp::io::read_matrix_market_file(coo_device, mtx_file);
cusp::csr_matrix<int, float, cusp::device_memory> csr_device;
csr_device = coo_device;
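
For readers wondering what a COO-to-CSR conversion involves, independent of how CUSP performs it internally, the sketch below shows the core step: compressing the row indices into row offsets. `CooSketch`, `CsrSketch` and `coo_to_csr` are hypothetical plain-C++ names, and the sketch assumes the COO entries are already sorted by row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ containers for illustration; these are not the cusp types.
struct CooSketch {
    std::size_t num_rows;
    std::vector<int> rows, cols;
    std::vector<float> vals;
};

struct CsrSketch {
    std::size_t num_rows;
    std::vector<int> row_offsets, cols;
    std::vector<float> vals;
};

// Convert COO to CSR. Assumption: entries are sorted by row index, so the
// column and value arrays can be reused unchanged.
CsrSketch coo_to_csr(const CooSketch& coo) {
    assert(std::is_sorted(coo.rows.begin(), coo.rows.end()));
    CsrSketch csr;
    csr.num_rows = coo.num_rows;
    csr.row_offsets.assign(coo.num_rows + 1, 0);
    // Count the entries in each row ...
    for (int r : coo.rows) {
        assert(r >= 0 && static_cast<std::size_t>(r) < coo.num_rows);
        ++csr.row_offsets[static_cast<std::size_t>(r) + 1];
    }
    // ... then take a running sum to turn the counts into offsets.
    for (std::size_t i = 0; i < coo.num_rows; ++i) {
        csr.row_offsets[i + 1] += csr.row_offsets[i];
    }
    csr.cols = coo.cols;
    csr.vals = coo.vals;
    return csr;
}
```

The conversion is linear in the number of entries, so if it is done once before a timing loop it should not distort an SpMV benchmark; if it sits inside the loop, it would.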

error: non-type template argument depends on a template parameter of the partial specialization

I'm working on a Homebrew formula for CUSP. I'm on OSX 10.9 with CUDA 7.5. I can compile and run this example with version 0.4.0:

#include <cusp/io/matrix_market.h>
#include <cusp/krylov/cg.h>
#include <iostream>

int main(void)
{  
  cusp::hyb_matrix<int, float, cusp::device_memory> A;
  cusp::io::read_matrix_market_file(A, "5pt_10x10.mtx");
  cusp::array1d<float, cusp::device_memory> x(A.num_rows, 0);
  cusp::array1d<float, cusp::device_memory> b(A.num_rows, 1);
  cusp::krylov::cg(A, x, b);                                                                            
  return 0;
}

but with CUSP 0.5.0 and 0.5.1 I get lots of messages of the form

/usr/local/include/cusp/system/cuda/detail/graph/b40c/graph/bfs/contract_expand_atomic/../../../util/io/load_tile.cuh:227:23: error: non-type template argument depends on a template parameter of the partial specialization
struct Iterate< LOAD, LOAD_VEC_SIZE, dummy> {
                      ^~~~~~~~~~~~~

I compile with /Developer/NVIDIA/CUDA-7.5/bin/nvcc testcusp.cu -o testcusp and I set the CUDACC environment variable to /Developer/NVIDIA/CUDA-7.5/bin/nvcc.

Any idea what's wrong?

Inconsistent GMRES results for complex numbers

I was trying to replace the GMRES solver in my codebase with the CUSP version. My matrices and vectors are all complex. After implementing a linear operator that calls my own functions, the result and convergence rate were very different from my reference code. So I compared the two implementations and found that with the following three modifications I am able to reproduce the reference result.

template <typename ValueType>
void ApplyPlaneRotation(ValueType& dx,
                        ValueType& dy,
                        ValueType& cs,
                        ValueType& sn)
{
    ValueType temp = cs * dx + sn * dy;
    // original: dy = -sn * dx + cs * dy;
    dy = -conj(sn) * dx + cs * dy;        // added conjugate
    dx = temp;
}


in detail/gmres.inl
template <class LinearOperator,
         class Vector,
         class Monitor,
         class Preconditioner>
void gmres(LinearOperator& A,
           Vector& x,
           Vector& b,
           const size_t restart,
           Monitor& monitor,
           Preconditioner& M)
{
    ...

    for (k = 0; k <= i; k++) {
        // H(k,i) = <V(i+1),V(k)>
        // original: H(k, i) = blas::dotc(w, V.column(k));
        H(k, i) = blas::dotc(V.column(k), w);     // order swapped
        // V(i+1) -= H(k, i) * V(k)
        blas::axpy(V.column(k), w, -H(k, i));
    }

    ...
}
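
Why the argument order matters for complex vectors can be checked in isolation. The sketch below assumes the BLAS-style convention that the conjugated dot product conjugates its first argument, i.e. it computes sum_i conj(x_i) * y_i; `dotc_sketch` and `remove_component` are hypothetical helpers written in plain C++, not the cusp::blas routines.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Conjugated dot product, assuming the convention sum_i conj(x_i) * y_i.
cplx dotc_sketch(const std::vector<cplx>& x, const std::vector<cplx>& y) {
    assert(x.size() == y.size());
    cplx sum(0.0, 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += std::conj(x[i]) * y[i];
    }
    return sum;
}

// One Gram-Schmidt step: return w - h * v.
std::vector<cplx> remove_component(const std::vector<cplx>& v,
                                   const std::vector<cplx>& w,
                                   cplx h) {
    assert(v.size() == w.size());
    std::vector<cplx> out(w);
    for (std::size_t i = 0; i < v.size(); ++i) {
        out[i] -= h * v[i];
    }
    return out;
}
```

With a unit vector v, choosing h = dotc_sketch(v, w) leaves a result orthogonal to v, whereas h = dotc_sketch(w, v) is the complex conjugate of that coefficient and in general does not. For real vectors the two orders coincide, which may be why the difference only shows up with complex data.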


In detail/gmres.inl, I replaced the following function; I am not sure whether there is a better implementation:
template <typename ValueType>
void GeneratePlaneRotation(ValueType& dx,
                           ValueType& dy,
                           ValueType& cs,
                           ValueType& sn)
{
  typedef typename cusp::detail::norm_type<ValueType>::type NormType;

  if (dx == ValueType(0.)) {
    cs = ValueType(0);
    sn = ValueType(1);
  } else {
    NormType scale = abs(dx) + abs(dy);
    NormType norm = scale * sqrt(abs(dx / scale) * abs(dx / scale) +
                             abs(dy / scale) * abs(dy / scale));
    ValueType alpha = dx / abs(dx);
    cs = abs(dx) / norm;
    sn = alpha * conj(dy) / norm;
  }

}
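
The effect of the conjugate can be verified on its own with std::complex. The following is a minimal standalone sketch of the rotation proposed above, not the CUSP code itself; `Rotation`, `generate_rotation`, `apply_rotation` and the `conjugate_sn` switch are hypothetical names added so both variants can be compared.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

using cplx = std::complex<double>;

struct Rotation {
    cplx cs;
    cplx sn;
};

// Build a rotation that maps (dx, dy) to (r, 0), following the formulas
// proposed in the issue text.
Rotation generate_rotation(cplx dx, cplx dy) {
    if (dx == cplx(0.0)) {
        return {cplx(0.0), cplx(1.0)};
    }
    const double norm = std::hypot(std::abs(dx), std::abs(dy));
    const cplx alpha = dx / std::abs(dx);
    return {cplx(std::abs(dx) / norm), alpha * std::conj(dy) / norm};
}

// Apply the rotation in place. conjugate_sn = true is the modified version;
// false reproduces the expression marked "original" in the issue.
void apply_rotation(cplx& dx, cplx& dy, const Rotation& r, bool conjugate_sn) {
    const cplx s = conjugate_sn ? std::conj(r.sn) : r.sn;
    const cplx temp = r.cs * dx + r.sn * dy;
    dy = -s * dx + r.cs * dy;
    dx = temp;
}
```

For example, with dx = 1 + i and dy = 2 - i, the conjugated variant drives dy to zero (up to rounding) and leaves |dx| equal to sqrt(7), the norm of the input pair. The unconjugated variant leaves a clearly nonzero dy, so the Hessenberg matrix is no longer reduced to triangular form. For real inputs conj is a no-op and both variants agree.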

current implementation:
max iterations=500, restart = 500, gmres failed to converge
max iterations=500, restart =50, gmres converged after 52 iterations
max iterations=500, restart =10, gmres converged after 44 iterations
not sure if this indicates that current implementation is good enough in general

with modification:
converged in 11 iterations

Bug ('parallel_for failed' error) in cusp/system/detail/generic/multiply.inl and a quick fix

Hi,
When running the cg/preconditioned-cg examples in the cusp-cuda9 branch with DIA/CSR/ELL matrix formats on CUDA 9.1, I got the following error:

terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: invalid configuration argument
Aborted (core dumped)

Interestingly, the COO and HYB formats worked. I found that the cusp::multiply function somehow breaks the policy, so that other functions such as cusp::blas::axpby and even cusp::print stop working (all threw the same error).

A quick fix that worked for me is to change the following line in cusp/system/detail/generic/multiply.inl

multiply(thrust::detail::derived_cast(exec), A, B, C, initialize, combine, reduce, format1, format2, format3);

into

multiply(exec, A, B, C, initialize, combine, reduce, format1, format2, format3);

I am not sure whether this fix is robust, but I hope this info is helpful for your official fix.

Regards,
Ruxi

Preconditioners/smoothed_aggregation example does not compile

scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
nvcc.exe -o Preconditioners\smoothed_aggregation.obj -c -arch=sm_13 -Xcompiler -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -Xcompiler /Ox -Xcompiler /bigobj -I d:\documents\cusplibrary-master -I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" Preconditioners\smoothed_aggregation.cu
smoothed_aggregation.cu
d:/documents/cusplibrary-master\cusp/precond/aggregation/smoothed_aggregation.h(112): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during instantiation of class "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType> [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/smoothed_aggregation.h(112): error: type "cusp::multilevel<MatrixType, SmootherType>::MemorySpace [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(48): here is inaccessible
detected during instantiation of class "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType> [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/smoothed_aggregation.h(113): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during instantiation of class "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType> [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/smoothed_aggregation.h(113): error: type "cusp::multilevel<MatrixType, SmootherType>::MemorySpace [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(48): here is inaccessible
detected during instantiation of class "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType> [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/smoothed_aggregation.h(125): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during instantiation of class "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType> [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/smoothed_aggregation.h(132): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during instantiation of class "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType> [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/smoothed_aggregation.h(135): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during instantiation of class "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType> [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/smoothed_aggregation.h(119): error: type "cusp::multilevel<MatrixType, SmootherType>::MemorySpace [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(48): here is inaccessible
detected during:
instantiation of class "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::sa_level [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64/../../../VC/INCLUDE\vector(1270): here
instantiation of "void std::vector<_Ty, _Ax>::_Destroy(std::vector<_Ty, _Ax>::pointer, std::vector<_Ty, _Ax>::pointer) [with _Ty=cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level, _Ax=std::allocator<cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level>]"
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64/../../../VC/INCLUDE\vector(1305): here
instantiation of "void std::vector<_Ty, _Ax>::_Tidy() [with _Ty=cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level, _Ax=std::allocator<cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level>]"
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64/../../../VC/INCLUDE\vector(705): here
instantiation of "std::vector<_Ty, _Ax>::~vector() [with _Ty=cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level, _Ax=std::allocator<cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level>]"
(109): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/smoothed_aggregation.h(120): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during:
instantiation of class "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::sa_level [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64/../../../VC/INCLUDE\vector(1270): here
instantiation of "void std::vector<_Ty, _Ax>::_Destroy(std::vector<_Ty, _Ax>::pointer, std::vector<_Ty, _Ax>::pointer) [with _Ty=cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level, _Ax=std::allocator<cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level>]"
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64/../../../VC/INCLUDE\vector(1305): here
instantiation of "void std::vector<_Ty, _Ax>::_Tidy() [with _Ty=cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level, _Ax=std::allocator<cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level>]"
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64/../../../VC/INCLUDE\vector(705): here
instantiation of "std::vector<_Ty, _Ax>::~vector() [with _Ty=cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level, _Ax=std::allocator<cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level>]"
(109): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/smoothed_aggregation.h(120): error: type "cusp::multilevel<MatrixType, SmootherType>::MemorySpace [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(48): here is inaccessible
detected during:
instantiation of class "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::sa_level [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64/../../../VC/INCLUDE\vector(1270): here
instantiation of "void std::vector<_Ty, _Ax>::_Destroy(std::vector<_Ty, _Ax>::pointer, std::vector<_Ty, _Ax>::pointer) [with _Ty=cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level, _Ax=std::allocator<cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level>]"
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64/../../../VC/INCLUDE\vector(1305): here
instantiation of "void std::vector<_Ty, _Ax>::_Tidy() [with _Ty=cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level, _Ax=std::allocator<cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level>]"
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64/../../../VC/INCLUDE\vector(705): here
instantiation of "std::vector<_Ty, _Ax>::~vector() [with _Ty=cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level, _Ax=std::allocator<cusp::precond::aggregation::smoothed_aggregation<int, float, cusp::device_memory, cusp::relaxation::jacobi<float, cusp::device_memory>>::sa_level>]"
(109): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/detail/smoothed_aggregation.inl(91): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during instantiation of "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::smoothed_aggregation(const MatrixType &, ValueType) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/detail/smoothed_aggregation.inl(93): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during instantiation of "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::smoothed_aggregation(const MatrixType &, ValueType) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/detail/smoothed_aggregation.inl(94): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during instantiation of "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::smoothed_aggregation(const MatrixType &, ValueType) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/detail/smoothed_aggregation.inl(141): error: type "cusp::multilevel<MatrixType, SmootherType>::MemorySpace [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(48): here is inaccessible
detected during:
instantiation of "void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::extend_hierarchy() [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
(123): here
instantiation of "void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::init(const MatrixType &, const ArrayType &) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>, ArrayType=cusp::array1d_view<thrust::constant_iterator<float, thrust::use_default, thrust::use_default>>]"
(95): here
instantiation of "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::smoothed_aggregation(const MatrixType &, ValueType) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/detail/smoothed_aggregation.inl(154): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during:
instantiation of "void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::extend_hierarchy() [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
(123): here
instantiation of "void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::init(const MatrixType &, const ArrayType &) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>, ArrayType=cusp::array1d_view<thrust::constant_iterator<float, thrust::use_default, thrust::use_default>>]"
(95): here
instantiation of "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::smoothed_aggregation(const MatrixType &, ValueType) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/detail/smoothed_aggregation.inl(157): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during:
instantiation of "void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::extend_hierarchy() [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
(123): here
instantiation of "void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::init(const MatrixType &, const ArrayType &) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>, ArrayType=cusp::array1d_view<thrust::constant_iterator<float, thrust::use_default, thrust::use_default>>]"
(95): here
instantiation of "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::smoothed_aggregation(const MatrixType &, ValueType) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/detail/smoothed_aggregation.inl(157): error: type "cusp::multilevel<MatrixType, SmootherType>::MemorySpace [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(48): here is inaccessible
detected during:
instantiation of "void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::extend_hierarchy() [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
(123): here
instantiation of "void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::init(const MatrixType &, const ArrayType &) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>, ArrayType=cusp::array1d_view<thrust::constant_iterator<float, thrust::use_default, thrust::use_default>>]"
(95): here
instantiation of "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::smoothed_aggregation(const MatrixType &, ValueType) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/detail/smoothed_aggregation.inl(164): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during:
instantiation of "void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::extend_hierarchy() [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
(123): here
instantiation of "void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::init(const MatrixType &, const ArrayType &) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>, ArrayType=cusp::array1d_view<thrust::constant_iterator<float, thrust::use_default, thrust::use_default>>]"
(95): here
instantiation of "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::smoothed_aggregation(const MatrixType &, ValueType) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/detail/smoothed_aggregation.inl(126): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during:
instantiation of "void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::init(const MatrixType &, const ArrayType &) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>, ArrayType=cusp::array1d_view<thrust::constant_iterator<float, thrust::use_default, thrust::use_default>>]"
(95): here
instantiation of "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::smoothed_aggregation(const MatrixType &, ValueType) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

d:/documents/cusplibrary-master\cusp/precond/aggregation/detail/smoothed_aggregation.inl(127): error: type "cusp::multilevel<MatrixType, SmootherType>::ValueType [with MatrixType=cusp::hyb_matrix<int, float, cusp::device_memory>, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>]"
d:/documents/cusplibrary-master\cusp/multilevel.h(47): here is inaccessible
detected during:
instantiation of "void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::init(const MatrixType &, const ArrayType &) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>, ArrayType=cusp::array1d_view<thrust::constant_iterator<float, thrust::use_default, thrust::use_default>>]"
(95): here
instantiation of "cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType>::smoothed_aggregation(const MatrixType &, ValueType) [with IndexType=int, ValueType=float, MemorySpace=cusp::device_memory, SmootherType=cusp::relaxation::jacobi<float, cusp::device_memory>, MatrixType=cusp::coo_matrix<int, float, cusp::device_memory>]"
Preconditioners/smoothed_aggregation.cu(67): here

20 errors detected in the compilation of "C:/Users/polyakov/AppData/Local/Temp/tmpxft_000015d8_00000000-8_smoothed_aggregation.cpp1.ii".
scons: *** [Preconditioners\smoothed_aggregation.obj] Error 2
scons: building terminated because of errors.

cusp for Multi-GPU

Can cusp auto-parallelize across multiple GPUs? How should I implement that?

Reconcile complex number support with thrust/complex.h

CUSP calculates incorrect number of connected components

Hi,

I am trying to use Cusp to count the connected components in large graphs. I read the graphs from Matrix Market files into the CSR format, and I have encountered the following issues.

Issue 1: CUSP reports a different number of components on different runs of the same code with the same dataset.

Run1 : command: ./cusp_cc_mtx in-2004.mtx
Output: Connected Components - [CUSP]
Input filename - in-2004.mtx
Number of Connected Components : 542

Run2 : command: ./cusp_cc_mtx in-2004.mtx
output: Connected Components - [CUSP]
Input filename - in-2004.mtx
Number of Connected Components : 526

Run3 : command: ./cusp_cc_mtx in-2004.mtx
output: Connected Components - [CUSP]
Input filename - in-2004.mtx
Number of Connected Components : 519

None of the counts above is correct. The actual number of components for this dataset is 134, which I verified with a serial code. I re-ran the test on different datasets and saw the same issue.

Issue 2: For some Matrix Market graphs I am not able to run Cusp at all. I get the following error:
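For readers who want to reproduce the serial check mentioned above, here is a minimal union-find sketch. It is not the reporter's code: `find_root` and `count_components` are illustrative names, and it assumes every matrix entry (i, j) is treated as an undirected edge, so for a non-symmetric matrix it counts weakly connected components.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Find the representative of x, halving the path on the way up.
static std::size_t find_root(std::vector<std::size_t>& parent, std::size_t x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Count connected components of an undirected graph given as an edge list.
// Assumption: each sparse-matrix entry (i, j) has been turned into one edge.
std::size_t count_components(std::size_t num_vertices,
                             const std::vector<std::pair<std::size_t, std::size_t>>& edges)
{
    std::vector<std::size_t> parent(num_vertices);
    std::iota(parent.begin(), parent.end(), std::size_t(0));

    std::size_t components = num_vertices;
    for (const auto& e : edges) {
        assert(e.first < num_vertices && e.second < num_vertices);
        const std::size_t a = find_root(parent, e.first);
        const std::size_t b = find_root(parent, e.second);
        if (a != b) {          // the edge joins two different components
            parent[a] = b;
            --components;
        }
    }
    return components;
}
```

A deterministic reference like this is useful mainly as a baseline: if a parallel routine returns a different value on every run, comparing against it on a small graph can help narrow down whether the input or the routine is at fault.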

"[/home/jayadharini/cusp/cusplibrary-0.5.1/cusp/system/cuda/detail/graph/b40c/graph/bfs/enactor_hybrid.cuh, 538] Frontier queue overflow. Please increase queue-sizing factor. (CUDA error 9: invalid configuration argument)"

Is there a limit on the size of the input dataset? I ran the code on various datasets and noticed that I get this error for datasets with a large number of connected components, while larger datasets with fewer connected components run fine. Please clarify!

I am using,
CUDA v7.5
Thrust v1.8
Cusp v0.5

I also tried Cusp 0.5.1 and the current development version on GitHub. I am compiling with sm_30. Since I am a student working on a university server, I cannot upgrade to CUDA 8.0.

Please help me fix this.

Thanks,
Dharini

Release branches should be tags

Usually releases are tagged instead of keeping separate long-running branches around for them. Are you intending to backport fixes to those release branches? If not, I suggest removing those branches and tagging the releases.

Would be nice with a conjugate residuals implementation

This is my favorite Krylov subspace method. Unlike CG, it will converge to a least-squares solution if A is symmetric but not positive definite.

This is very useful in Newton-like optimization on Lagrange multiplier problems, where one looks for saddle point solutions.

Please see

http://en.wikipedia.org/wiki/Conjugate_residual_method

and

http://en.wikipedia.org/wiki/Newton's_method_in_optimization
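As a rough illustration of what is being requested, the conjugate residual iteration can be sketched on the host with a small dense matrix. This is not Cusp code: `Vec`, `Mat`, `matvec`, `dot` and `conjugate_residual` are names made up for this sketch, the starting guess is assumed to be zero, and the loop simply stops if the quantity r^T A r vanishes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense, row-major, for illustration only

static Vec matvec(const Mat& A, const Vec& x)
{
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

static double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Conjugate residual iteration for a symmetric matrix A, starting from x = 0.
Vec conjugate_residual(const Mat& A, const Vec& b, double tol = 1e-12, int max_iter = 1000)
{
    assert(A.size() == b.size());
    Vec x(b.size(), 0.0);
    Vec r = b;                 // r = b - A*x with x = 0
    Vec p = r;
    Vec Ar = matvec(A, r);
    Vec Ap = Ar;
    double rAr = dot(r, Ar);

    for (int k = 0; k < max_iter; ++k) {
        const double ApAp = dot(Ap, Ap);
        // Stop on convergence, or when a denominator vanishes (breakdown).
        if (std::sqrt(dot(r, r)) < tol || ApAp == 0.0 || rAr == 0.0) break;

        const double alpha = rAr / ApAp;
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }

        Ar = matvec(A, r);
        const double rAr_new = dot(r, Ar);
        const double beta = rAr_new / rAr;
        for (std::size_t i = 0; i < x.size(); ++i) {
            p[i]  = r[i]  + beta * p[i];   // new search direction
            Ap[i] = Ar[i] + beta * Ap[i];  // A*p updated without an extra matvec
        }
        rAr = rAr_new;
    }
    return x;
}
```

Compared with CG, the only extra storage is the vector A*p, which is updated by recurrence so that each iteration still needs a single matrix-vector product.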

`performance` examples compile, but compilation error for `examples`

I can compile the samples in performance/solver with SCons just fine. However, if I run the exact same compilation command for one of the examples, e.g. examples/Preconditioner, I get the following error:

scons: Entering directory `/home/cusplibrary/examples'
EnvironmentError: No tool named 'nvcc': not a Zip file:
  File "/home/cusplibrary/examples/SConstruct", line 9:
    env = Environment()
  File "<string>", line 297:
    None
  File "/usr/lib/scons/SCons/Environment.py", line 1788:
    tool = SCons.Tool.Tool(tool, toolpath, **kw)
  File "/usr/lib/scons/SCons/Tool/__init__.py", line 104:
    module = self._tool_module()
  File "/usr/lib/scons/SCons/Tool/__init__.py", line 164:
    raise SCons.Errors.EnvironmentError(m)

This is on the cuda9 branch. nvcc --version gives

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

If I copy/paste the performance/solver SConscript into examples/Preconditioner, I get the same error, but if I copy e.g. examples/Preconditioner/diagonal.cu into performance/solver and run SCons there, then I can compile and run that example. Any idea what's up / how to fix this? Thanks in advance!

Agency migration

CUSP's internal implementation should be migrated to use Agency instead of Thrust. This will improve performance, flexibility, and integration with execution policies.

How can I define the combine and reduce operator in matrix-vector multiplication?

When I compute a matrix-vector product (y = A*x), I want to supply my own operators to replace the default combine function (multiplication) and the default reduce function (addition).
However, it seems that cusp::multiply only supports the functions defined in cuda/functional.h, which is based on thrust/functional.h.
Does cusp support user-defined combine and reduce functions in SpMV? If not, can you suggest a way to achieve this? I guess modifying thrust/functional.h might help.

Please give me some suggestion or solution. Any info will help, thank you !!
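To make the request concrete, here is a host-side sketch of a CSR matrix-vector product that is parameterized on the combine and reduce operators. It does not use or describe the Cusp API: `CsrMatrix` and `generalized_spmv` are hypothetical names, and the caller is assumed to pass the identity element of the chosen reduce operator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR container (hypothetical, not a Cusp type).
struct CsrMatrix {
    std::size_t num_rows;
    std::vector<std::size_t> row_offsets;     // size num_rows + 1
    std::vector<std::size_t> column_indices;  // one per stored entry
    std::vector<double> values;               // one per stored entry
};

// y[i] = reduce over stored entries of row i of combine(A(i,j), x[j]),
// starting from the identity element of reduce.
template <typename Combine, typename Reduce>
std::vector<double> generalized_spmv(const CsrMatrix& A, const std::vector<double>& x,
                                     double identity, Combine combine, Reduce reduce)
{
    assert(A.row_offsets.size() == A.num_rows + 1);
    std::vector<double> y(A.num_rows, identity);
    for (std::size_t i = 0; i < A.num_rows; ++i)
        for (std::size_t k = A.row_offsets[i]; k < A.row_offsets[i + 1]; ++k)
            y[i] = reduce(y[i], combine(A.values[k], x[A.column_indices[k]]));
    return y;
}
```

Passing multiplication and addition with identity 0 gives the ordinary product, while addition and minimum with a very large identity give a min-plus product. On the device, functors of this kind would additionally need to be callable from device code.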

thrust::copy between cusp::complex and std::complex fails (cuda 5.5, thrust 1.7)

The following code has stopped working since installation of cuda 5.5.

#include <vector>
#include <cusp/complex.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <complex>
int main(int argc, char *argv[])
{
thrust::device_vector<cusp::complex<float> > temp(10,1);
std::vector<std::complex<float> > htemp(temp.size());
thrust::copy(temp.begin(), temp.end(), htemp.begin());
return 0;
}

The error output is

/usr/local/cuda-5.5/bin/..//include/thrust/system/detail/internal/scalar/general_copy.h(43): error: no operator "=" matches these operands
operand types are: std::complex = thrust::referencecusp::complex<float, thrust::pointercusp::complex<float, thrust::system::cpp::detail::tag, thrust::use_default, thrust::use_default>, thrust::use_default>
detected during:
instantiation of "OutputIterator thrust::system::detail::internal::scalar::general_copy(InputIterator, InputIterator, OutputIterator) [with InputIterator=thrust::detail::normal_iteratorthrust::pointer<cusp::complex<float, thrust::system::cpp::detail::tag, thrust::use_default, thrust::use_default>>, OutputIterator=__gnu_cxx::__normal_iteratorstd::complex<float *, std::vectorstd::complex<float, std::allocatorstd::complex>>]"
/usr/local/cuda-5.5/bin/..//include/thrust/system/detail/internal/scalar/copy.inl(69): here
instantiation of "OutputIterator thrust::system::detail::internal::scalar::copy_detail::copy(InputIterator, InputIterator, OutputIterator, thrust::detail::false_type) [with InputIterator=thrust::detail::normal_iteratorthrust::pointer<cusp::complex<float, thrust::system::cpp::detail::tag, thrust::use_default, thrust::use_default>>, OutputIterator=__gnu_cxx::__normal_iteratorstd::complex<float *, std::vectorstd::complex<float, std::allocatorstd::complex>>]"
/usr/local/cuda-5.5/bin/..//include/thrust/system/detail/internal/scalar/copy.inl(107): here
instantiation of "OutputIterator thrust::system::detail::internal::scalar::copy(InputIterator, InputIterator, OutputIterator) [with InputIterator=thrust::detail::normal_iteratorthrust::pointer<cusp::complex<float, thrust::system::cpp::detail::tag, thrust::use_default, thrust::use_default>>, OutputIterator=__gnu_cxx::__normal_iteratorstd::complex<float *, std::vectorstd::complex<float, std::allocatorstd::complex>>]"
/usr/local/cuda-5.5/bin/..//include/thrust/system/cpp/detail/copy.h(43): here
instantiation of "OutputIterator thrust::system::cpp::detail::copy(thrust::system::cpp::detail::tag, InputIterator, InputIterator, OutputIterator) [with InputIterator=thrust::detail::normal_iteratorthrust::pointer<cusp::complex<float, thrust::system::cpp::detail::tag, thrust::use_default, thrust::use_default>>, OutputIterator=__gnu_cxx::__normal_iteratorstd::complex<float *, std::vectorstd::complex<float, std::allocatorstd::complex>>]"
/usr/local/cuda-5.5/bin/..//include/thrust/detail/copy.inl(35): here
instantiation of "OutputIterator thrust::copy(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, OutputIterator) [with System=thrust::system::cpp::detail::tag, InputIterator=thrust::detail::normal_iteratorthrust::pointer<cusp::complex<float, thrust::system::cpp::detail::tag, thrust::use_default, thrust::use_default>>, OutputIterator=__gnu_cxx::__normal_iteratorstd::complex<float *, std::vectorstd::complex<float, std::allocatorstd::complex>>]"
/usr/local/cuda-5.5/bin/..//include/thrust/system/cuda/detail/copy_cross_system.inl(207): here
[ 3 instantiation contexts not shown ]
instantiation of "OutputIterator thrust::system::cuda::detail::copy_cross_system(thrust::system::cuda::detail::cross_system<System1, System2>, InputIterator, InputIterator, OutputIterator) [with System1=thrust::system::cuda::detail::tag, System2=thrust::system::cpp::detail::tag, InputIterator=thrust::detail::normal_iteratorthrust::device_ptr<cusp::complex>, OutputIterator=__gnu_cxx::__normal_iteratorstd::complex<float *, std::vectorstd::complex<float, std::allocatorstd::complex>>]"
/usr/local/cuda-5.5/bin/..//include/thrust/system/cuda/detail/copy.inl(53): here
instantiation of "OutputIterator thrust::system::cuda::detail::copy(thrust::system::cuda::detail::cross_system<System1, System2>, InputIterator, InputIterator, OutputIterator) [with System1=thrust::system::cuda::detail::tag, System2=thrust::system::cpp::detail::tag, InputIterator=thrust::detail::normal_iteratorthrust::device_ptr<cusp::complex>, OutputIterator=__gnu_cxx::__normal_iteratorstd::complex<float *, std::vectorstd::complex<float, std::allocatorstd::complex>>]"
/usr/local/cuda-5.5/bin/..//include/thrust/detail/copy.inl(35): here
instantiation of "OutputIterator thrust::copy(const thrust::detail::execution_policy_base &, InputIterator, InputIterator, OutputIterator) [with System=thrust::system::cuda::detail::cross_system<thrust::system::cuda::detail::tag, thrust::system::cpp::detail::tag>, InputIterator=thrust::detail::normal_iteratorthrust::device_ptr<cusp::complex>, OutputIterator=__gnu_cxx::__normal_iteratorstd::complex<float *, std::vectorstd::complex<float, std::allocatorstd::complex>>]"
/usr/local/cuda-5.5/bin/..//include/thrust/detail/copy.inl(66): here
instantiation of "OutputIterator thrust::detail::two_system_copy(thrust::execution_policy &, thrust::execution_policy &, InputIterator, InputIterator, OutputIterator) [with FromSystem=thrust::system::cuda::detail::tag, ToSystem=thrust::system::cpp::detail::tag, InputIterator=thrust::detail::normal_iteratorthrust::device_ptr<cusp::complex>, OutputIterator=__gnu_cxx::__normal_iteratorstd::complex<float *, std::vectorstd::complex<float, std::allocatorstd::complex>>]"
/usr/local/cuda-5.5/bin/..//include/thrust/detail/copy.inl(102): here
instantiation of "OutputIterator thrust::copy(InputIterator, InputIterator, OutputIterator) [with InputIterator=thrust::detail::normal_iteratorthrust::device_ptr<cusp::complex>, OutputIterator=__gnu_cxx::__normal_iteratorstd::complex<float *, std::vectorstd::complex<float, std::allocatorstd::complex>>]"
test.cu(10): here

1 error detected in the compilation of "/tmp/tmpxft_00002dda_00000000-6_test.cpp1.ii".
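A possible workaround, when two complex types are not assignable to each other, is to copy the data in its original type first and then convert element by element. The sketch below shows only the conversion step on the host. `other_complex` is a hypothetical stand-in for the library type, assumed to expose `real()` and `imag()`, and `to_std_complex` is an illustrative helper, not part of Cusp or Thrust.

```cpp
#include <algorithm>
#include <complex>
#include <vector>

// Hypothetical stand-in for a complex type that std::complex cannot be assigned from.
struct other_complex {
    float re, im;
    float real() const { return re; }
    float imag() const { return im; }
};

// Convert explicitly through real() and imag() instead of relying on operator=.
std::vector<std::complex<float>> to_std_complex(const std::vector<other_complex>& in)
{
    std::vector<std::complex<float>> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   [](const other_complex& z) {
                       return std::complex<float>(z.real(), z.imag());
                   });
    return out;
}
```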

long cuda compile times

Compile times are excessively long in some cases. Consider the performance/amg/smoothed_aggregation.cu example:

nvcc smoothed_aggregation.cu -o smoothed_aggregation -I ../.. -I /usr/local/cuda/include -Xcudafe -# -O3
Front end time 94.89 (CPU) 95.00 (elapsed)
Back end time 3.88 (CPU) 4.00 (elapsed)
Total compilation time 99.15 (CPU) 99.00 (elapsed)
Front end time 3.95 (CPU) 4.00 (elapsed)
Back end time 1.56 (CPU) 2.00 (elapsed)
Total compilation time 5.51 (CPU) 6.00 (elapsed)
Front end time 1.41 (CPU) 1.00 (elapsed)
Back end time 0.15 (CPU) 0.00 (elapsed)
Total compilation time 1.56 (CPU) 1.00 (elapsed)

One workaround during development is to use the cpp or omp backends:

nvcc smoothed_aggregation.cu -I ../.. -I /usr/local/cuda/include -o smoothed_aggregation -Xcompiler -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -Xcompiler -fopenmp -Xcudafe -# -O3
Front end time 5.23 (CPU) 5.00 (elapsed)
Back end time 0.72 (CPU) 1.00 (elapsed)
Total compilation time 5.96 (CPU) 6.00 (elapsed)
Front end time 0.65 (CPU) 1.00 (elapsed)
Back end time 0.23 (CPU) 0.00 (elapsed)
Total compilation time 0.89 (CPU) 1.00 (elapsed)
Front end time 1.11 (CPU) 1.00 (elapsed)
Back end time 0.10 (CPU) 0.00 (elapsed)
Total compilation time 1.22 (CPU) 1.00 (elapsed)

Need to analyze header dependency to reduce the number of redundant header file inclusions.

delete row/column of sparse matrix

Is there an easy way to delete certain rows and columns of a sparse matrix in cusp? This would be useful for certain boundary conditions in FDM/FEM.
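One way to think about the operation is as an index remap followed by a filter. The host-side sketch below works on a COO matrix; `CooMatrix` and `remove_rows_and_columns` are hypothetical names rather than Cusp types or functions, and the keep flags are assumed to be supplied by the caller (for example, false for constrained boundary nodes).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side COO container, not a Cusp type.
struct CooMatrix {
    std::size_t num_rows, num_cols;
    std::vector<std::size_t> row_indices, column_indices;
    std::vector<double> values;
};

// keep_row[i] / keep_col[j] say whether row i / column j survives.
CooMatrix remove_rows_and_columns(const CooMatrix& A,
                                  const std::vector<bool>& keep_row,
                                  const std::vector<bool>& keep_col)
{
    assert(keep_row.size() == A.num_rows && keep_col.size() == A.num_cols);

    // An exclusive scan of the keep flags gives each surviving index its new position.
    std::vector<std::size_t> new_row(A.num_rows), new_col(A.num_cols);
    std::size_t nr = 0, nc = 0;
    for (std::size_t i = 0; i < A.num_rows; ++i) { new_row[i] = nr; if (keep_row[i]) ++nr; }
    for (std::size_t j = 0; j < A.num_cols; ++j) { new_col[j] = nc; if (keep_col[j]) ++nc; }

    // Copy only entries whose row and column both survive, with remapped indices.
    CooMatrix B{nr, nc, {}, {}, {}};
    for (std::size_t k = 0; k < A.values.size(); ++k) {
        const std::size_t i = A.row_indices[k];
        const std::size_t j = A.column_indices[k];
        if (keep_row[i] && keep_col[j]) {
            B.row_indices.push_back(new_row[i]);
            B.column_indices.push_back(new_col[j]);
            B.values.push_back(A.values[k]);
        }
    }
    return B;
}
```

The relative order of the surviving entries is preserved, so a matrix that was sorted by row stays sorted. The same two steps map naturally onto a scan and a stream compaction on the GPU.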

may be there is a bug

in cusp/detail/format_utils.inl line 72

template <typename IndexArray, typename OffsetArray>
void indices_to_offsets(const IndexArray& indices, OffsetArray& offsets)
{
    CUSP_PROFILE_SCOPED();

    typedef typename OffsetArray::value_type OffsetType;

    // convert uncompressed row indices into compressed row offsets
    thrust::lower_bound(indices.begin(),
                        indices.end(),
                        thrust::counting_iterator<OffsetType>(0),
                        // maybe a bug: should this be the number of columns of the matrix,
                        // since the length of offsets is the number of columns?
                        thrust::counting_iterator<OffsetType>(offsets.size()),
                        offsets.begin());
}
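To check what the snippet computes, it can help to write the same search on the host. The sketch below is an illustration, not Cusp code: `indices_to_offsets_host` is a made-up name, and it assumes the caller sizes `offsets` to the number of rows plus one, so the searched values run over 0 .. num_rows inclusive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host analogue of the vectorized lower_bound: offsets[i] = number of indices < i.
std::vector<std::size_t> indices_to_offsets_host(const std::vector<std::size_t>& sorted_indices,
                                                 std::size_t num_rows)
{
    assert(std::is_sorted(sorted_indices.begin(), sorted_indices.end()));
    std::vector<std::size_t> offsets(num_rows + 1);
    for (std::size_t i = 0; i < offsets.size(); ++i) {
        // Position of the first index that is not less than i.
        offsets[i] = static_cast<std::size_t>(
            std::lower_bound(sorted_indices.begin(), sorted_indices.end(), i)
            - sorted_indices.begin());
    }
    return offsets;
}
```

Under that sizing assumption, row indices {0, 0, 1, 3} for a matrix with 4 rows give offsets {0, 2, 3, 3, 4}, which is the usual CSR row pointer. Whether the upper bound is right in the case the reporter has in mind therefore depends on how the caller sized `offsets`.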

matrix market io entries conflict

Hi everyone,
I run the cusp library on Red Hat 4.1.2 with an NVIDIA 460 GPU card and CUDA 5.0.
I am measuring sparse matrix conversion performance with some matrices from Matrix Market.
However, one thing confuses me: when I test s3dkq4m2 (http://math.nist.gov/MatrixMarket/data/misc/cylshell/s3dkq4m2.html),
the output from cusp shows that s3dkq4m2 has 4820891 entries,
but the data file reports 2455670 non-zero elements in total.
The same problem shows up with the cant matrix, but the bcsstk18, pdb1HYS, mc2depi and consph matrices behave normally.
I have checked how entries are counted in /cusp/io/detail/matrix_market.inl,
but I can't figure it out.
Thank you!
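One possible explanation is symmetric storage. The Matrix Market format allows a `symmetric` qualifier in the header, in which case the file stores only the entries on or below the diagonal, and a reader that expands the matrix to general form mirrors every off-diagonal entry. The arithmetic below is a sketch under two assumptions that should be checked against the file itself: that s3dkq4m2 is stored as symmetric, and that it has 90449 rows with every diagonal entry stored. `expanded_entries` is an illustrative helper, not a Cusp function.

```cpp
#include <cassert>

// Number of entries after expanding a symmetric matrix that is stored as
// one triangle plus the diagonal: off-diagonal entries are counted twice,
// diagonal entries once.
long long expanded_entries(long long stored_entries, long long stored_diagonal_entries)
{
    assert(stored_diagonal_entries >= 0 && stored_diagonal_entries <= stored_entries);
    return 2 * stored_entries - stored_diagonal_entries;
}
```

With 2455670 stored entries and 90449 diagonal entries this gives 2 * 2455670 - 90449 = 4820891, which matches the count reported above.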

SA setup failing on Windows VS2013

I am using this configuration

Windows 8.1 64bit
Microsoft Visual Studio Professional 2013
CUDA 7.0
Thrust 1.8
Into the CUDA include directory I copy either a snapshot of CUSP 0.4.0 taken on 11/09/14, which I will call v040, or a snapshot of 0.5.0 downloaded today, which I will call v050.

I am also using two computers, a laptop with a GT 730M (an sm 3.5 device with 1GB of RAM) and a desktop with a GTX TITAN Black (sm 3.5, 6GB RAM). Both computers have the same software configuration, as above.

The SA example compiles OK with all software versions. However, it does not run on the laptop if compiled with CUSP v050: I believe it throws an exception, and I get a small debug dialog from Visual Studio (with CUSP v040 all is OK on this laptop). The same example runs fine on the GTX TITAN Black. Please see the remainder of the message below, but I believe this to be a memory problem. It looks to me as if CUSP v050 has a larger memory footprint when building the preconditioner and fails to run on the mobile GPU, where there is less memory, while it fits within the 6GB of the TITAN Black.

I am also developing an application which integrates CUSP as a solver. In this application I am using as a test case a FEM mesh with 65lk nodes to solve the Laplace equation. On both GPUs sa_initialize() throws an exception when compiled against CUSP v050, and works perfectly with CUSP v040. My explanation for this behavior is that this use case requires more memory than the smoothed_aggregation.cu example and, by my hypothesis, CUSP v050 consumes so much more memory than v040 that even the 6GB of the TITAN Black does not suffice, so the algorithm does not run on either GPU.

In the application I am just catching any exception with a block as follows:

try
{
    this->elecSysPrecond.sa_initialize(this->elecSysMatrix);
}
catch (...)
{
    return 1;
}
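A `catch (...)` block hides the information needed to test the memory hypothesis. A minimal sketch of a more informative handler is shown below. `describe_failure` is a hypothetical helper, and it assumes that whatever the setup throws derives from `std::exception`, as Thrust's documented `thrust::system_error` does (it derives from `std::runtime_error`).

```cpp
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Run a setup step and describe what it threw, if anything.
template <typename Setup>
std::string describe_failure(Setup setup)
{
    try {
        setup();
        return "ok";
    } catch (const std::bad_alloc& e) {
        // Allocation failures: the most direct evidence of a memory problem.
        return std::string("out of memory: ") + e.what();
    } catch (const std::exception& e) {
        // Everything else that carries a message.
        return std::string("exception: ") + e.what();
    } catch (...) {
        return "unknown exception";
    }
}
```

In the application above, the call would wrap the `sa_initialize` line in a lambda and log the returned string before returning the error code.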

Is there any plan to include diagonal incomplete-Cholesky (DIC) preconditioner?

Dear developer,
Diagonal incomplete Cholesky (DIC) and diagonal incomplete LU (DILU) are among the fastest preconditioners on the CPU, but they are inherently sequential and require a lot of work to run in parallel on the GPU.
Is there any plan to include a diagonal incomplete Cholesky (DIC) preconditioner in CUSP?
I cannot find any other open-source code that implements DIC or DILU in CUDA. Can someone familiar
with this algorithm give some guidance?

Best regards,
Benjamin

Refactor Bridson preconditioners

The Bridson AINV preconditioners are missing copy constructors and run sequentially on the CPU internally. See whether it is possible to execute portions on the device directly and refactor the structure to use new CUSP capabilities.

Any plans to support CUDA 10?

Add block matrix example

Add example of creating a blocked matrix view of several smaller matrices.

Given A, B, and D then

K = [[A,   B],
     [B^t, D]]
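Before looking at views, the index arithmetic behind the block layout can be sketched with explicit copies on the host. This is a simplified illustration, not the view-based approach the issue asks for: `CooMatrix`, `append_block` and `assemble_block_matrix` are hypothetical names, and the result is left unsorted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side COO container, not a Cusp type.
struct CooMatrix {
    std::size_t num_rows, num_cols;
    std::vector<std::size_t> row_indices, column_indices;
    std::vector<double> values;
};

// Append the entries of S to K, shifted by (row_shift, col_shift).
// With transpose set, row and column indices of S are swapped first.
static void append_block(CooMatrix& K, const CooMatrix& S,
                         std::size_t row_shift, std::size_t col_shift, bool transpose)
{
    for (std::size_t k = 0; k < S.values.size(); ++k) {
        const std::size_t i = transpose ? S.column_indices[k] : S.row_indices[k];
        const std::size_t j = transpose ? S.row_indices[k] : S.column_indices[k];
        K.row_indices.push_back(i + row_shift);
        K.column_indices.push_back(j + col_shift);
        K.values.push_back(S.values[k]);
    }
}

// K = [[A, B], [B^t, D]] with A (n x n), B (n x m), D (m x m).
CooMatrix assemble_block_matrix(const CooMatrix& A, const CooMatrix& B, const CooMatrix& D)
{
    assert(A.num_rows == B.num_rows && A.num_cols == B.num_rows);
    assert(D.num_rows == B.num_cols && D.num_cols == B.num_cols);

    CooMatrix K{A.num_rows + D.num_rows, A.num_cols + D.num_cols, {}, {}, {}};
    append_block(K, A, 0, 0, false);                    // top-left
    append_block(K, B, 0, A.num_cols, false);           // top-right
    append_block(K, B, A.num_rows, 0, true);            // bottom-left, B^t
    append_block(K, D, A.num_rows, A.num_cols, false);  // bottom-right
    return K;
}
```

A view-based version avoids these copies by applying the same shifts lazily through iterators, and it also has to interleave the blocks into sorted order.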

The types of the matrices are complex and need to be simplified. An example version using C++11 is shown below.

#include <cusp/multiply.h>
#include <cusp/linear_operator.h>
#include <cusp/gallery/poisson.h>
#include <cusp/krylov/cg.h>

#include <cusp/precond/aggregation/smoothed_aggregation.h>

template<typename MatrixType>
auto generate_shifted_matrix_view(const MatrixType& A, const int row, const int col)
->decltype(cusp::make_coo_matrix_view(A.num_rows + row, A.num_cols + col, A.num_entries,
                                      cusp::make_array1d_view(thrust::make_transform_iterator(A.row_indices.cbegin(), thrust::placeholders::_1 + row),
                                              thrust::make_transform_iterator(A.row_indices.cbegin(), thrust::placeholders::_1 + row) + A.num_entries),
                                      cusp::make_array1d_view(thrust::make_transform_iterator(A.column_indices.cbegin(), thrust::placeholders::_1 + col),
                                              thrust::make_transform_iterator(A.column_indices.cbegin(), thrust::placeholders::_1 + col) + A.num_entries),
                                      A.values))
{
    return cusp::make_coo_matrix_view(A.num_rows + row, A.num_cols + col, A.num_entries,
                                      cusp::make_array1d_view(thrust::make_transform_iterator(A.row_indices.cbegin(), thrust::placeholders::_1 + row),
                                              thrust::make_transform_iterator(A.row_indices.cbegin(), thrust::placeholders::_1 + row) + A.num_entries),
                                      cusp::make_array1d_view(thrust::make_transform_iterator(A.column_indices.cbegin(), thrust::placeholders::_1 + col),
                                              thrust::make_transform_iterator(A.column_indices.cbegin(), thrust::placeholders::_1 + col) + A.num_entries),
                                      A.values);
}

template<typename MatrixType1, typename MatrixType2>
class concatenate_matrix_views
{
public:

    typedef typename MatrixType1::index_type   IndexType;
    typedef typename MatrixType1::memory_space MemorySpace;

    cusp::array1d<unsigned int, MemorySpace> indices, indices_t;

    const MatrixType1& A;
    const MatrixType2& B;

    concatenate_matrix_views(const MatrixType1& A, const MatrixType2& B) : A(A), B(B), indices(A.num_entries + B.num_entries)
    {
        thrust::merge_by_key(thrust::make_zip_iterator(thrust::make_tuple(A.row_indices.begin(), A.column_indices.begin())),
                             thrust::make_zip_iterator(thrust::make_tuple(A.row_indices.begin(), A.column_indices.begin())) + A.num_entries,
                             thrust::make_zip_iterator(thrust::make_tuple(B.row_indices.begin(), B.column_indices.begin())),
                             thrust::make_zip_iterator(thrust::make_tuple(B.row_indices.begin(), B.column_indices.begin())) + B.num_entries,
                             thrust::counting_iterator<unsigned int>(0),
                             thrust::counting_iterator<unsigned int>(A.num_entries),
                             thrust::make_discard_iterator(),
                             indices.begin(),
                             cusp::detail::coo_tuple_comp<int>());
    }

    auto M(void)->decltype(
        cusp::make_coo_matrix_view(std::max(A.num_rows, B.num_rows), std::max(A.num_cols, B.num_cols), indices.size(),
                                   cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                           A.row_indices.begin(), B.row_indices.begin(), indices.begin()),
                                           cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                   A.row_indices.begin(), B.row_indices.begin(), indices.begin()) + indices.size()),
                                   cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                           A.column_indices.begin(), B.column_indices.begin(), indices.begin()),
                                           cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                   A.column_indices.begin(), B.column_indices.begin(), indices.begin()) + indices.size()),
                                   cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                           A.values.begin(), B.values.begin(), indices.begin()),
                                           cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                   A.values.begin(), B.values.begin(), indices.begin()) + indices.size())))
    {
        return cusp::make_coo_matrix_view(std::max(A.num_rows, B.num_rows), std::max(A.num_cols, B.num_cols), indices.size(),
                                          cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                  A.row_indices.begin(), B.row_indices.begin(), indices.begin()),
                                                  cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                          A.row_indices.begin(), B.row_indices.begin(), indices.begin()) + indices.size()),
                                          cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                  A.column_indices.begin(), B.column_indices.begin(), indices.begin()),
                                                  cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                          A.column_indices.begin(), B.column_indices.begin(), indices.begin()) + indices.size()),
                                          cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                  A.values.begin(), B.values.begin(), indices.begin()),
                                                  cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                          A.values.begin(), B.values.begin(), indices.begin()) + indices.size()));
    }

    auto M_T(void)->decltype(
        cusp::make_coo_matrix_view(std::max(A.num_rows, B.num_rows), std::max(A.num_cols, B.num_cols), indices.size(),
                                   cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                           A.row_indices.begin(), B.row_indices.begin(), indices.begin()),
                                           cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                   A.row_indices.begin(), B.row_indices.begin(), indices.begin()) + indices.size()),
                                   cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                           A.column_indices.begin(), B.column_indices.begin(), indices.begin()),
                                           cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                   A.column_indices.begin(), B.column_indices.begin(), indices.begin()) + indices.size()),
                                   cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                           A.values.begin(), B.values.begin(), indices.begin()),
                                           cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                   A.values.begin(), B.values.begin(), indices.begin()) + indices.size())))
    {
        auto C = cusp::make_coo_matrix_view(std::max(A.num_rows, B.num_rows), std::max(A.num_cols, B.num_cols), indices.size(),
                                            cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                    A.row_indices.begin(), B.row_indices.begin(), indices.begin()),
                                                    cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                            A.row_indices.begin(), B.row_indices.begin(), indices.begin()) + indices.size()),
                                            cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                    A.column_indices.begin(), B.column_indices.begin(), indices.begin()),
                                                    cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                            A.column_indices.begin(), B.column_indices.begin(), indices.begin()) + indices.size()),
                                            cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                    A.values.begin(), B.values.begin(), indices.begin()),
                                                    cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                            A.values.begin(), B.values.begin(), indices.begin()) + indices.size()));

        indices_t.resize(A.num_entries + B.num_entries);
        thrust::sequence(indices_t.begin(), indices_t.end());

        cusp::array1d<IndexType,MemorySpace> column_indices(C.column_indices);
        thrust::sort_by_key(column_indices.begin(), column_indices.end(), indices_t.begin());

        return cusp::make_coo_matrix_view(std::max(A.num_rows, B.num_rows), std::max(A.num_cols, B.num_cols), indices.size(),
                                          cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                  A.column_indices.begin(), B.column_indices.begin(), indices_t.begin()),
                                                  cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                          A.column_indices.begin(), B.column_indices.begin(), indices_t.begin()) + indices.size()),
                                          cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                  A.row_indices.begin(), B.row_indices.begin(), indices_t.begin()),
                                                  cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                          A.row_indices.begin(), B.row_indices.begin(), indices_t.begin()) + indices.size()),
                                          cusp::make_array1d_view(cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                  A.values.begin(), B.values.begin(), indices_t.begin()),
                                                  cusp::make_join_iterator(A.num_entries, B.num_entries,
                                                          A.values.begin(), B.values.begin(), indices_t.begin()) + indices.size()));
    }
};

template<typename MatrixType1, typename MatrixType2>
concatenate_matrix_views<MatrixType1,MatrixType2>
make_concatenate_matrix_views(const MatrixType1& A, const MatrixType2& B)
{
    return concatenate_matrix_views<MatrixType1,MatrixType2>(A,B);
}

int main(void)
{
    typedef cusp::device_memory MemorySpace;
    typedef cusp::coo_matrix<int,float,MemorySpace> MatrixType;
    MatrixType A;

    cusp::gallery::poisson5pt(A, 5, 5);
    std::cout << "Generated base operator with shape ("  << A.num_rows << "," << A.num_cols << ") and "
              << A.num_entries << " entries" << "\n\n";

    // Build block diagonal matrix
    auto A0 = generate_shifted_matrix_view(A, 0*A.num_rows, 0*A.num_cols);
    auto A1 = generate_shifted_matrix_view(A, 1*A.num_rows, 1*A.num_cols);

    // Build block column matrix
    auto B0 = generate_shifted_matrix_view(A, 0*A.num_rows, 2*A.num_cols);
    auto B1 = generate_shifted_matrix_view(A, 1*A.num_rows, 2*A.num_cols);

    // Build bottom-right diagonal matrix
    auto D  = generate_shifted_matrix_view(A, 2*A.num_rows, 2*A.num_cols);

    // Concatenate A0 and A1
    auto A0_A1 = make_concatenate_matrix_views(A0, A1);
    auto K0    = A0_A1.M();

    // Concatenate (A0,A1) with D
    auto A0_A1_D = make_concatenate_matrix_views(K0, D);
    auto K1      = A0_A1_D.M();

    // Build column matrix from B0 and B1
    auto B0_B1 = make_concatenate_matrix_views(B0, B1);
    auto K2    = B0_B1.M();

    // Build transpose row matrix from B0 and B1
    auto K2_T  = B0_B1.M_T();

    // Concatenate A0,A1,D with B0,B1
    auto K1_K2 = make_concatenate_matrix_views(K1, K2);
    auto K3 = K1_K2.M();

    // Add transpose of B0,B1 below concatenated matrix
    auto K3_K2_T = make_concatenate_matrix_views(K3, K2_T);
    auto K = K3_K2_T.M();

    cusp::print(K);

    return 0;
}

CUDA9.0 Compatibility

Add limited support for the Thrust version that ships with CUDA 9.0. Support will be limited because that Thrust version has a bug related to NVIDIA/thrust#635.

We need a way to reconcile the differences between the Thrust version on GitHub and the version that ships with CUDA 9.0.

error: namespace "cusp" has no member "monitor"

I was trying to compile the CG example for the 2D Poisson problem and got this error:

$ nvcc -O2 cusp_test.cu -o cusp_test
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
cusp_test.cu
cusp_test.cu(23): error: namespace "cusp" has no member "monitor"

cusp_test.cu(23): error: type name is not allowed

cusp_test.cu(23): error: identifier "monitor" is undefined

3 errors detected in the compilation of "C:/Users/username/AppData/Local/Temp/tmpxft_00001864_00000000-13_cusp_test.cpp1.ii".

Incompatible /usr/local/cuda/lib/libcudart.so

To compile the CUDA solver of FoamExtend 4 on Ubuntu 16.04 with CUDA toolkit 7.5, I first ran the following commands:

git clone https://github.com/cusplibrary/cusplibrary/
cp -r /home/mahmood/cusplibrary/cusp/ /usr/local/cuda/include/
cd /opt/foam/foam-extend-4.0/src/cudaSolvers/
fe40
./Allwmake

At the end, I get an error that libcudart is not compatible.


In file included from /usr/local/cuda/include/cusp/functional.h:927:0,
                 from /usr/local/cuda/include/cusp/detail/type_traits.h:30,
                 from /usr/local/cuda/include/cusp/coo_matrix.h:608,
                 from cudaSolver/cudaTypes.H:57,
                 from cudaSolver/cudaSolver.H:42,
                 from cudaSolver/cudaSolver.C:26:
/usr/local/cuda/include/cusp/detail/functional.inl: In member function ‘bool cusp::detail::speed_threshold_functor::operator()(IndexType) const’:
/usr/local/cuda/include/cusp/detail/functional.inl:130:86: warning: use of old-style cast [-Wold-style-cast]
         return relative_speed * (num_rows-rows) < num_rows || (size_t) (num_rows-rows) < breakeven_threshold;
                                                                                      ^
/usr/bin/ld: skipping incompatible /usr/local/cuda/lib/libcudart.so when searching for -lcudart
/usr/bin/ld: cannot find -lcudart
collect2: error: ld returned 1 exit status
/home/mahmood/foam/foam-extend-4.0/wmake/Makefile:186: recipe for target '/home/mahmood/foam/foam-extend-4.0/lib/linux64GccDPOpt/libcudaSolvers.so' failed
make: *** [/home/mahmood/foam/foam-extend-4.0/lib/linux64GccDPOpt/libcudaSolvers.so] Error 1
mahmood@tiger:~/foam/foam-extend-4.0/src/cudaSolvers$ ldd /usr/local/cuda/lib/libcudart.so
	linux-gate.so.1 =>  (0xf7f44000)
	libc.so.6 => /lib32/libc.so.6 (0xf7d16000)
	libdl.so.2 => /lib32/libdl.so.2 (0xf7d10000)
	libpthread.so.0 => /lib32/libpthread.so.0 (0xf7cf3000)
	librt.so.1 => /lib32/librt.so.1 (0xf7cea000)
	/lib/ld-linux.so.2 (0xf7f46000)
mahmood@tiger:~/foam/foam-extend-4.0/src/cudaSolvers$

What can I do to fix that?

Add sparse_vector datatype

It would be useful to have a sparse vector type for several graph routines, such as MIS.
Initial support should include all BLAS functions and SpMV support for CSR matrices.

Transpose matrix-vector multiply

As far as I can tell, the cusp::multiply function performs B = A * X only. Is there an equivalent for X = A' * B, i.e. a (conjugate) transpose multiply? The only way I can see to do it is to transpose and then multiply, which is not ideal. BLAS has additional flags for whether to transpose and/or conjugate the matrices, and Numerical Recipes has a dedicated function (SPRSTX) too.
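
For illustration, a transpose product can be computed directly from CSR storage by scattering into the output instead of gathering, so the transpose is never formed. The sketch below is plain host-side C++ and not part of the CUSP API; the `Csr` struct and the names `multiply_transpose` and `conj_if_complex` are hypothetical. A GPU version would additionally need atomic adds or a segmented reduction for the scatter step.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container (not a CUSP type).
template <typename T>
struct Csr {
    std::size_t num_rows;
    std::size_t num_cols;
    std::vector<std::size_t> row_offsets;     // size num_rows + 1
    std::vector<std::size_t> column_indices;  // size nnz
    std::vector<T> values;                    // size nnz
};

// Conjugation is a no-op for real types.
inline float conj_if_complex(float v) { return v; }
inline double conj_if_complex(double v) { return v; }
template <typename R>
std::complex<R> conj_if_complex(const std::complex<R>& v) { return std::conj(v); }

// Computes y = A^T * x, or y = A^H * x when conjugate is true,
// by scattering each stored entry A(i, j) into y[j].
template <typename T>
std::vector<T> multiply_transpose(const Csr<T>& A, const std::vector<T>& x,
                                  bool conjugate = false)
{
    std::vector<T> y(A.num_cols, T(0));
    for (std::size_t i = 0; i < A.num_rows; ++i) {
        for (std::size_t k = A.row_offsets[i]; k < A.row_offsets[i + 1]; ++k) {
            const T a = conjugate ? conj_if_complex(A.values[k]) : A.values[k];
            y[A.column_indices[k]] += a * x[i];
        }
    }
    return y;
}
```

The input vector must have `num_rows` entries and the result has `num_cols` entries, the reverse of the ordinary product.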

How to set stream

I need to pass a stream (cudaStream_t) to the high-level API, but I do not know how to do it.

bicgstab_m does not compile

Changing bicgstab to bicgstab_m in examples/bicgstab.cu gives:

scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
nvcc.exe -o Solvers\bicgstab.obj -c -arch=sm_13 -Xcompiler -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -Xcompiler /Ox -Xcompiler /bigobj -I d:\documents\cusplibrary-master -I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" Solvers\bicgstab.cu
bicgstab.cu
d:/documents/cusplibrary-master\cusp/krylov/detail/bicgstab_m.inl(614): error: class "cusp::verbose_monitor" has no member "end"
detected during instantiation of "void cusp::krylov::bicgstab_m(LinearOperator &, VectorType1 &, VectorType2 &, VectorType3 &, Monitor &) [with LinearOperator=cusp::hyb_matrix<int, ValueType, MemorySpace>, VectorType1=cusp::array1d<ValueType, MemorySpace>, VectorType2=cusp::array1d<ValueType, MemorySpace>, VectorType3=cusp::verbose_monitor, Monitor=cusp::identity_operator<ValueType, MemorySpace, int>]"
Solvers/bicgstab.cu(32): here

d:/documents/cusplibrary-master\cusp/krylov/detail/bicgstab_m.inl(614): error: class "cusp::verbose_monitor" has no member "begin"
detected during instantiation of "void cusp::krylov::bicgstab_m(LinearOperator &, VectorType1 &, VectorType2 &, VectorType3 &, Monitor &) [with LinearOperator=cusp::hyb_matrix<int, ValueType, MemorySpace>, VectorType1=cusp::array1d<ValueType, MemorySpace>, VectorType2=cusp::array1d<ValueType, MemorySpace>, VectorType3=cusp::verbose_monitor, Monitor=cusp::identity_operator<ValueType, MemorySpace, int>]"
Solvers/bicgstab.cu(32): here

d:/documents/cusplibrary-master\cusp/krylov/detail/bicgstab_m.inl(677): error: class "cusp::identity_operator<ValueType, MemorySpace, int>" has no member "finished"
detected during instantiation of "void cusp::krylov::bicgstab_m(LinearOperator &, VectorType1 &, VectorType2 &, VectorType3 &, Monitor &) [with LinearOperator=cusp::hyb_matrix<int, ValueType, MemorySpace>, VectorType1=cusp::array1d<ValueType, MemorySpace>, VectorType2=cusp::array1d<ValueType, MemorySpace>, VectorType3=cusp::verbose_monitor, Monitor=cusp::identity_operator<ValueType, MemorySpace, int>]"
Solvers/bicgstab.cu(32): here

d:/documents/cusplibrary-master\cusp/detail/blas.inl(44): error: class "cusp::verbose_monitor" has no member "size"
detected during:
instantiation of "void cusp::blas::detail::assert_same_dimensions(const Array1 &, const Array2 &) [with Array1=cusp::array1d<ValueType, MemorySpace>, Array2=cusp::verbose_monitor]"
(54): here
instantiation of "void cusp::blas::detail::assert_same_dimensions(const Array1 &, const Array2 &, const Array3 &) [with Array1=cusp::array1d<ValueType, MemorySpace>, Array2=cusp::array1d<ValueType, MemorySpace>, Array3=cusp::verbose_monitor]"
d:/documents/cusplibrary-master\cusp/krylov/detail/bicgstab_m.inl(355): here
instantiation of "void cusp::krylov::trans_m::compute_zb_m(const Array1 &, const Array2 &, const Array3 &, Array4 &, Array5 &, ScalarType, ScalarType, ScalarType) [with Array1=cusp::array1d<ValueType, MemorySpace>, Array2=cusp::array1d<ValueType, MemorySpace>, Array3=cusp::verbose_monitor, Array4=cusp::array1d<ValueType, MemorySpace>, Array5=cusp::array1d<ValueType, MemorySpace>, ScalarType=float]"
d:/documents/cusplibrary-master\cusp/krylov/detail/bicgstab_m.inl(686): here
instantiation of "void cusp::krylov::bicgstab_m(LinearOperator &, VectorType1 &, VectorType2 &, VectorType3 &, Monitor &) [with LinearOperator=cusp::hyb_matrix<int, ValueType, MemorySpace>, VectorType1=cusp::array1d<ValueType, MemorySpace>, VectorType2=cusp::array1d<ValueType, MemorySpace>, VectorType3=cusp::verbose_monitor, Monitor=cusp::identity_operator<ValueType, MemorySpace, int>]"
Solvers/bicgstab.cu(32): here

d:/documents/cusplibrary-master\cusp/krylov/detail/bicgstab_m.inl(359): error: class "cusp::verbose_monitor" has no member "begin"
detected during:
instantiation of "void cusp::krylov::trans_m::compute_zb_m(const Array1 &, const Array2 &, const Array3 &, Array4 &, Array5 &, ScalarType, ScalarType, ScalarType) [with Array1=cusp::array1d<ValueType, MemorySpace>, Array2=cusp::array1d<ValueType, MemorySpace>, Array3=cusp::verbose_monitor, Array4=cusp::array1d<ValueType, MemorySpace>, Array5=cusp::array1d<ValueType, MemorySpace>, ScalarType=float]"
(686): here
instantiation of "void cusp::krylov::bicgstab_m(LinearOperator &, VectorType1 &, VectorType2 &, VectorType3 &, Monitor &) [with LinearOperator=cusp::hyb_matrix<int, ValueType, MemorySpace>, VectorType1=cusp::array1d<ValueType, MemorySpace>, VectorType2=cusp::array1d<ValueType, MemorySpace>, VectorType3=cusp::verbose_monitor, Monitor=cusp::identity_operator<ValueType, MemorySpace, int>]"
Solvers/bicgstab.cu(32): here

d:/documents/cusplibrary-master\cusp/detail/blas.inl(44): error: class "cusp::verbose_monitor" has no member "size"
detected during:
instantiation of "void cusp::blas::detail::assert_same_dimensions(const Array1 &, const Array2 &) [with Array1=cusp::verbose_monitor, Array2=cusp::array1d<ValueType, MemorySpace>]"
(53): here
instantiation of "void cusp::blas::detail::assert_same_dimensions(const Array1 &, const Array2 &, const Array3 &) [with Array1=cusp::verbose_monitor, Array2=cusp::array1d<ValueType, MemorySpace>, Array3=cusp::array1d<ValueType, MemorySpace>]"
d:/documents/cusplibrary-master\cusp/krylov/detail/bicgstab_m.inl(544): here
instantiation of "void cusp::krylov::trans_m::compute_chirho_m(const Array1 &, const Array2 &, Array3 &, Array4 &, ScalarType) [with Array1=cusp::array1d<ValueType, MemorySpace>, Array2=cusp::verbose_monitor, Array3=cusp::array1d<ValueType, MemorySpace>, Array4=cusp::array1d<ValueType, MemorySpace>, ScalarType=float]"
d:/documents/cusplibrary-master\cusp/krylov/detail/bicgstab_m.inl(717): here
instantiation of "void cusp::krylov::bicgstab_m(LinearOperator &, VectorType1 &, VectorType2 &, VectorType3 &, Monitor &) [with LinearOperator=cusp::hyb_matrix<int, ValueType, MemorySpace>, VectorType1=cusp::array1d<ValueType, MemorySpace>, VectorType2=cusp::array1d<ValueType, MemorySpace>, VectorType3=cusp::verbose_monitor, Monitor=cusp::identity_operator<ValueType, MemorySpace, int>]"
Solvers/bicgstab.cu(32): here

d:/documents/cusplibrary-master\cusp/krylov/detail/bicgstab_m.inl(549): error: class "cusp::verbose_monitor" has no member "begin"
detected during:
instantiation of "void cusp::krylov::trans_m::compute_chirho_m(const Array1 &, const Array2 &, Array3 &, Array4 &, ScalarType) [with Array1=cusp::array1d<ValueType, MemorySpace>, Array2=cusp::verbose_monitor, Array3=cusp::array1d<ValueType, MemorySpace>, Array4=cusp::array1d<ValueType, MemorySpace>, ScalarType=float]"
(717): here
instantiation of "void cusp::krylov::bicgstab_m(LinearOperator &, VectorType1 &, VectorType2 &, VectorType3 &, Monitor &) [with LinearOperator=cusp::hyb_matrix<int, ValueType, MemorySpace>, VectorType1=cusp::array1d<ValueType, MemorySpace>, VectorType2=cusp::array1d<ValueType, MemorySpace>, VectorType3=cusp::verbose_monitor, Monitor=cusp::identity_operator<ValueType, MemorySpace, int>]"
Solvers/bicgstab.cu(32): here

d:/documents/cusplibrary-master\cusp/krylov/detail/bicgstab_m.inl(737): error: no operator "++" matches these operands
operand types are: ++ cusp::identity_operator<ValueType, MemorySpace, int>
detected during instantiation of "void cusp::krylov::bicgstab_m(LinearOperator &, VectorType1 &, VectorType2 &, VectorType3 &, Monitor &) [with LinearOperator=cusp::hyb_matrix<int, ValueType, MemorySpace>, VectorType1=cusp::array1d<ValueType, MemorySpace>, VectorType2=cusp::array1d<ValueType, MemorySpace>, VectorType3=cusp::verbose_monitor, Monitor=cusp::identity_operator<ValueType, MemorySpace, int>]"
Solvers/bicgstab.cu(32): here

8 errors detected in the compilation of "C:/Users/polyakov/AppData/Local/Temp/tmpxft_00000a7c_00000000-8_bicgstab.cpp1.ii".
scons: *** [Solvers\bicgstab.obj] Error 2
scons: building terminated because of errors.

std::bad_alloc: out of memory for matrices larger than 7 GB

Hi,
I am using the CUSP library with all available formats (CSR, COO, ELL, DIA, HYB). I am using a GPU server with 64 GB of RAM and four NVIDIA GeForce GTX 1080 devices (approximately 10 GB each). My sparse matrices come from the SuiteSparse Matrix Collection. A 7.9 GB matrix, "Schenk/nlpkkt240/nlpkkt240.mtx" (like other matrices larger than 7 GB), gives the error 'thrust::system::detail::bad_alloc' what(): std::bad_alloc: out of memory. Although I have enough resources (64 GB of RAM and four NVIDIA GeForce GTX devices), I am still not able to handle matrices larger than 7 GB. Can you please help me find out where and what I am doing wrong?
The method I am using is as follows:

double time_ = 0;
// read mtx on device in COO format
cusp::coo_matrix<int, float, cusp::device_memory> coo_device;
cusp::io::read_matrix_market_file(coo_device, mtx_file);

// allocate storage for output Y and input X on device
cusp::array1d<float, cusp::device_memory> Y_device(coo_device.num_rows, 0);
cusp::array1d<float, cusp::device_memory> X_device(coo_device.num_cols, 1);
cusp::csr_matrix<int, float, cusp::device_memory> csr_device;
try {
csr_device = coo_device;
// cusp::convert(coo_device, csr);
} catch (cusp::format_conversion_exception) {
std::cout << "\tUnable to convert to CSR format" << std::endl;
return -1;
}

timer t;
cusp::multiply(csr_device, X_device, Y_device);
cudaThreadSynchronize();
time_ = t.seconds_elapsed() / num_trials;

Is what I am doing correct?

I have some other questions as well.
(1) Is this code section guaranteed to use the GPU, given that I am reading and allocating memory on the device only (cusp::coo_matrix<int, float, cusp::device_memory> coo_device;)?
(2) Am I using the right function (cusp::multiply(csr_device, X_device, Y_device)) to perform the SpMV operation on the GPU?

Please help.
Thanks

Conversion from CSR to HYB results in an empty matrix

When I try to convert a sparse matrix in CSR format into HYB format using cusp::convert(), I get an empty matrix (0 rows, 0 cols and 0 entries). If I print the ELL and COO components of the converted matrix, I get 0 entries for the ELL portion; all the entries are printed under the COO portion. I know this split could be due to the sparsity pattern of the matrix, but then why do I get <0, 0> with 0 entries for the aggregate matrix in HYB format? Furthermore, I have observed this behavior for every matrix that I have tried.

The following program reproduces the problem.

#include <cusp/csr_matrix.h>
#include <cusp/gallery/poisson.h>
#include <cusp/multiply.h>
#include <cusp/print.h>
#include <stdio.h>

int main()
{
    cusp::csr_matrix<int,double,cusp::device_memory> A_csr;
    cusp::hyb_matrix<int,double,cusp::device_memory> A_hyb;

    //Create a sparse 5*5 test matrix
    cusp::gallery::poisson5pt(A_csr, 5, 1);

    //Convert the test matrix to HYB format
    cusp::convert(A_csr, A_hyb);

    printf("\n A in CSR format\n");
    cusp::print(A_csr);

    printf("\n A in HYB format\n");
    cusp::print(A_hyb); //prints an empty matrix

    printf("\n ELL component of A in HYB format\n");
    cusp::print(A_hyb.ell); //prints an empty matrix

    printf("\n COO component of A in HYB format\n");
    cusp::print(A_hyb.coo); //prints all the entries of A
}

I get the following output:

A in CSR format
sparse matrix <5, 5> with 13 entries
0 0 (4)
0 1 (-1)
1 0 (-1)
1 1 (4)
1 2 (-1)
2 1 (-1)
2 2 (4)
2 3 (-1)
3 2 (-1)
3 3 (4)
3 4 (-1)
4 3 (-1)
4 4 (4)

A in HYB format
sparse matrix <0, 0> with 0 entries

ELL component of A in HYB format
sparse matrix <5, 5> with 0 entries

COO component of A in HYB format
sparse matrix <5, 5> with 13 entries
0 0 (4)
0 1 (-1)
1 0 (-1)
1 1 (4)
1 2 (-1)
2 1 (-1)
2 2 (4)
2 3 (-1)
3 2 (-1)
3 3 (4)
3 4 (-1)
4 3 (-1)
4 4 (4)

Smoothed aggregation preconditioner incompatible with CUDA 4.2 and below

As of f525d61, using the smoothed aggregation preconditioners fails on CUDA 4.2 and below. This minimal sample, smoothed_aggregation_minsample.cu:

#include "cusp/precond/aggregation/smoothed_aggregation.h"

int main() {
  exit(0);
}

Fails as follows:

$ nvcc /tmp/cusp_minsample.cu 
/usr/local/cuda-4.2/include/cusp/detail/multilevel.inl: In member function ‘void cusp::multilevel<MatrixType, SmootherType, SolverType>::solve(const Array1&, Array2&, Monitor&)’:
/usr/local/cuda-4.2/include/cusp/detail/multilevel.inl:63:38: error: ‘__T15’ has not been declared
/usr/local/cuda-4.2/include/cusp/detail/multilevel.inl: In member function ‘void cusp::multilevel<MatrixType, SmootherType, SolverType>::_solve(const Array1&, Array2&, size_t)’:
/usr/local/cuda-4.2/include/cusp/detail/multilevel.inl:106:38: error: ‘__T15’ has not been declared
/usr/local/cuda-4.2/include/cusp/precond/aggregation/detail/smoothed_aggregation.inl: In member function ‘void cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace, SmootherType, SolverType>::extend_hierarchy()’:
/usr/local/cuda-4.2/include/cusp/precond/aggregation/detail/smoothed_aggregation.inl:149:50: error: ‘__T15’ has not been declared

Complete support for BLAS backend

Adding support for lower memory overhead for reading Matrix Market files into CSR

Hi,

I am trying to read the sk-2005 matrix (http://www.cise.ufl.edu/research/sparse/matrices/LAW/sk-2005.html) using CUSP.

The matrix has 1,949,412,601 entries, so if ints are used for the column indices, then 4 bytes/entry puts them at ~8GB. Double-precision values take ~16GB, so the column indices and values together should take around 24GB. CUSP allocates a COO matrix to read Matrix Market files, adding another 8GB for the row indices, then copies (and converts) the matrix into the output storage container[1]. Holding both matrices in memory, 32GB for the COO format and at least 24GB for the CSR format (ignoring the cost of the row offsets array), puts the total at about 56GB.
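
The estimate above can be reproduced with a few lines of arithmetic. The sketch below is plain C++ and independent of CUSP; the struct and function names are hypothetical, and it assumes, as the text does, that the COO temporary and the CSR result are alive at the same time and that the CSR row offsets are negligible.

```cpp
#include <cstdint>

// Rough storage estimate in bytes for reading into COO and converting to CSR.
struct MatrixFootprint {
    std::uint64_t coo_bytes;
    std::uint64_t csr_bytes;
    std::uint64_t total_bytes;
};

inline MatrixFootprint estimate_read_footprint(std::uint64_t num_entries,
                                               std::uint64_t index_bytes,
                                               std::uint64_t value_bytes)
{
    MatrixFootprint f;
    // COO stores a row index, a column index and a value per entry.
    f.coo_bytes = num_entries * (2 * index_bytes + value_bytes);
    // CSR stores a column index and a value per entry (row offsets ignored).
    f.csr_bytes = num_entries * (index_bytes + value_bytes);
    // Both containers exist while the conversion copies the data.
    f.total_bytes = f.coo_bytes + f.csr_bytes;
    return f;
}
```

For `estimate_read_footprint(1949412601, 4, 8)` this gives roughly 31.2 GB for COO, 23.4 GB for CSR and 54.6 GB in total (decimal gigabytes), slightly below the 56GB obtained from the rounded per-array figures.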

While reading the matrix, I get "unexpected EOF while reading MatrixMarket entries" in read_coordinate_stream inside /io/detail/matrix_market.inl.

During a discussion on cusp-users, Steve suggested:
"Allocating temporary storage could be avoided by adding another overload to specialize for CSR format therefore reusing the column_indices and values arrays, however this optimization hasn't been added to CUSP yet"

thrust::complex does not support 'volatile' qualifier

thrust::complex does not support the 'volatile' qualifier. See NVIDIA/thrust#565 for more information.

Unfortunately, the implementation of cusp::complex was recently changed to rely on thrust::complex. Without 'volatile' support, compilation of the kernel function spmv_coo_flat_kernel in <cusp/detail/device/spmv/coo_flat.h> fails when cusp::complex is used as ValueType, because the complex array in shared memory can no longer be volatile. The same issue was observed for the kernel function spmv_csr_vector_kernel.

I am not sure whether I should roll back the implementation of cusp::complex or remove the 'volatile' qualifier from those shared memory arrays. Which option is applicable or better? I look forward to comments and suggestions. Thanks.
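
The underlying C++ rule can be shown without Thrust: an implicitly generated copy assignment operator is not volatile-qualified, so assigning to a volatile object of class type does not compile, whereas assigning its scalar members one by one does. The sketch below uses a hypothetical plain struct `Cplx` as a stand-in for a complex type; whether the same member-wise approach can be applied to thrust::complex depends on the accessors that a given Thrust version provides, which is not confirmed here.

```cpp
// Hypothetical stand-in for a complex value type with public scalar members.
struct Cplx {
    float re;
    float im;
};

// Store through a volatile reference by writing each scalar member.
// A plain `dst = src;` would not compile: Cplx has no volatile-qualified operator=.
inline void volatile_store(volatile Cplx& dst, const Cplx& src)
{
    dst.re = src.re;
    dst.im = src.im;
}

// Load from a volatile reference by reading each scalar member.
inline Cplx volatile_load(const volatile Cplx& src)
{
    Cplx out;
    out.re = src.re;
    out.im = src.im;
    return out;
}
```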

Add csr_matrix produces error

I have two matrices with different sparsity patterns (one is a diagonal matrix, the other is a sparse matrix).
When I use add, some of the values are dropped from the result. My matrices are 576 by 576, with 576 elements and ~6000 elements respectively.
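
To check how many entries the sum should contain, a small host-side reference can be compared against the library result. The sketch below is plain C++ and not CUSP code; `Entry` and `add_reference` are hypothetical names. It sums entries over the union of both sparsity patterns and keeps every position that appears in either input.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One stored entry of a sparse matrix in coordinate form.
struct Entry {
    int row;
    int col;
    double value;
};

// Reference C = A + B over the union of both sparsity patterns.
// Entries sharing a (row, col) position are summed; no position is dropped.
inline std::vector<Entry> add_reference(const std::vector<Entry>& A,
                                        const std::vector<Entry>& B)
{
    std::map<std::pair<int, int>, double> acc;
    for (const Entry& e : A) acc[std::make_pair(e.row, e.col)] += e.value;
    for (const Entry& e : B) acc[std::make_pair(e.row, e.col)] += e.value;

    std::vector<Entry> C;
    C.reserve(acc.size());
    for (const auto& kv : acc) {
        C.push_back(Entry{kv.first.first, kv.first.second, kv.second});
    }
    return C;  // ordered by row, then by column
}
```

The number of entries in the result is the size of the union of the two patterns, so a diagonal matrix added to a matrix that already stores its full diagonal should not change the entry count.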

Solvers for complex <double> GMRES

Hello all,

I have a couple of questions. I am trying to run GMRES for complex numbers in double precision. I took gmres.cu as an example and modified it slightly to use complex numbers; you can find the code in the attachment. When I run the code in single precision, everything works. print(A) works fine, but I get compile errors with print(x):

Error 27 error : no operator "<<" matches these operands C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include\cusp\detail\print.inl 73 1 CUSPComplexSolver
in the following line

s << std::setw(14) << p[i] << "\n";

On the other hand, if I try to set up an example in double precision with cusp::gallery::poisson5pt(A, 10, 10);, I get an error (ErrorComplexDouble.PNG) at this line of synchronize.inl:

throw thrust::system_error(error, thrust::cuda_category(), std::string("synchronize: ") + message);

I have double-checked that the flag is on [1]. I use CUDA 5.0, compute capability 3.5, CUSP 0.3, Thrust 1.5.3, and Visual Studio 2012. The setting seems to be right: under Project properties --> CUDA C/C++ --> Device --> Code Generation I can see "compute_10,sm_10;compute_20,sm_20;compute_30,sm_30;compute_35,sm_35;compute_13,sm_13"
Compilation output

1> C:\Users\Eduardo\documents\visual studio 2012\Projects\CUSPComplexSolver\CUSPComplexSolver>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_10,code="sm_10,compute_10" -gencode=arch=compute_20,code="sm_20,compute_20" -gencode=arch=compute_30,code="sm_30,compute_30" -gencode=arch=compute_35,code="sm_35,compute_35" --use-local-env --cl-version 2010 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin" -I"../../common/inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -g -DWIN32 -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MTd " -o "Win32/Debug/cuspComplexSolver.cu.obj" "C:\Users\Eduardo\documents\visual studio 2012\Projects\CUSPComplexSolver\CUSPComplexSolver\cuspComplexSolver.cu"

I am wondering whether this is a general issue with double precision or whether I am doing something wrong. Any suggestions?

My second question is how to measure the time taken by the solver. I guess I should follow the NVIDIA guidelines [2]. You can see how I have already applied them in my code. Or do the CUSP methods need to be synchronized in a special way?

Thanks in advance for any help!

[1] http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#single-vs-double-precision
[2] http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#timing

Extra zero entries after conversions

I generate a COO matrix A with explicit zeros:

cusp::coo_matrix<int, double, cusp::host_memory> A(rows, cols, num_entries);
//...init A with explicit zeroes
cusp::ell_matrix<int, double, cusp::host_memory> B = A;
cusp::coo_matrix<int, double, cusp::host_memory> C = B;

then

cusp::print(A);
cusp::print(C);

show different outputs.
While A is printed correctly, in C all zero entries are moved to the end of the matrix, showing only zeros:
0 0 0
0 0 0
0 0 0

Is this a bug?

Monitor precision issue

I noticed that the precision which is set in the monitor is not the same as the precision used for the computation.

If I use the solver example cg.cu with the following monitor, the reported precision depends on the size of the mesh.

cusp::verbose_monitor<ValueType> monitor(b, 100, 1e-12);

For example a 1000x1000 mesh reports:

Solver will continue until residual norm 1e-09 or reaching 100 iterations

When a 10x10 mesh is used the precision is reported to be 1e-11. So it somehow multiplies the precision with the number of rows, although the value is hard-coded?

The iteration also stops at the reported precision and not at the one set in the code.

I used CUSP 0.4.

Can you reproduce this? Or what am I doing wrong?
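
The reported numbers are consistent with the third constructor argument being treated as a relative tolerance, so that the stopping threshold is the tolerance multiplied by the 2-norm of b rather than by the number of rows. This is an inference from the figures above, not a statement taken from the CUSP sources, and it assumes that b is a vector of ones with one entry per mesh point. The function name below is hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Assumed stopping rule: threshold = absolute_tolerance + relative_tolerance * ||b||_2.
// For a right-hand side filled with ones, ||b||_2 = sqrt(number of rows).
inline double residual_threshold(std::size_t num_rows,
                                 double relative_tolerance,
                                 double absolute_tolerance = 0.0)
{
    const double b_norm = std::sqrt(static_cast<double>(num_rows));
    return absolute_tolerance + relative_tolerance * b_norm;
}
```

Under these assumptions a 1000x1000 mesh has 10^6 rows and a norm of 10^3, giving a threshold of about 1e-09, while a 10x10 mesh has 100 rows and a norm of 10, giving about 1e-11. Both match the reported values.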

sparse-dense multiply output changes with dense matrix size

I have a sparse transition matrix (csr_matrix) A that I multiply by a dense array2d B into array2d C, where each column in B and C is an independent state vector. I'm hoping to optimize by increasing the number of columns but as B and C get wider the results in column 0 change. Here is a MWE:

#include <stdio.h>
#include <iostream>

#include <cusp/csr_matrix.h>
#include <cusp/coo_matrix.h>
#include <cusp/print.h>
#include <cusp/functional.h>
#include <cusp/multiply.h>
#include <cusp/array2d.h>
#include <cusp/array1d.h>
#include <cusp/gallery/poisson.h>

using namespace std;

int main()
{
  int state_size = 15;

  cusp::array2d<float, cusp::device_memory> pi(state_size, state_size);
  pi(1,0) = 0.19; pi(2,0) = 0.2; pi(3,0) = 0.21; pi(4,0) = 0.22; pi(5,0) = 0.18;
  pi(1,1) = 0.19; pi(2,2) = 0.2; pi(3,3) = 0.21; pi(4,4) = 0.22; pi(5,5) = 0.18;
  pi(1,6) = 0.19; pi(2,7) = 0.2; pi(3,8) = 0.21; pi(4,9) = 0.22; pi(5,10) = 0.18;
  pi(5,1) = 0.29; pi(6,2) = 0.3; pi(7,3) = 0.31; pi(8,4) = 0.32; pi(9,5) = 0.28;
  pi(5,6) = 0.29; pi(6,7) = 0.3; pi(7,8) = 0.31; pi(8,9) = 0.32; pi(9,10) = 0.28;
  
  cout << "Dense version of pi:" << endl;
  cusp::print(pi);
  cusp::csr_matrix<int,float,cusp::device_memory> pi_sparse(pi);
    
  for(int batch_size = 1; batch_size <= 6; batch_size++){
      cusp::array2d<float, cusp::device_memory> prev_mat(state_size, batch_size, 0.0f);
      cusp::array2d<float, cusp::device_memory> next_mat(state_size, batch_size, 0.0f);

      for(int i = 0; i < batch_size; i++){
          prev_mat(0, i) = 1;
      }
  
      cusp::multiply(pi_sparse, prev_mat, next_mat);
  
      cout << "Previous matrix:" << endl;
      cusp::print(prev_mat);
      cout << "Next matrix non-zeros:" << endl;
  
      for(int i = 0; i < batch_size; i++){
          for(int j = 0; j < state_size; j++){
              if(next_mat(j,i) > 0){
                  cout << "Value of next_mat[ " << j << ", " << i << "] = " << next_mat(j,i) << endl;
              }
          }
      }
  }
}

In the first pass where batch_size=1, the output is correct:
Value of next_mat[ 1, 0] = 0.19
Value of next_mat[ 2, 0] = 0.2
Value of next_mat[ 3, 0] = 0.21
Value of next_mat[ 4, 0] = 0.22
Value of next_mat[ 5, 0] = 0.18

(in fact all columns of next_mat should look like this).
As batch_size increases (here 3), it adds odd values into rows that should be 0 based on the transition matrix:
Value of next_mat[ 2, 0] = 0.19
Value of next_mat[ 4, 0] = 0.2
Value of next_mat[ 5, 0] = 0.21
Value of next_mat[ 9, 0] = 2.262e+16
Value of next_mat[ 14, 0] = 4.2e-17

or puts the right values into the wrong rows (batch_size=5):
Value of next_mat[ 2, 0] = 0.19
Value of next_mat[ 3, 0] = 0.2
Value of next_mat[ 4, 0] = 0.21
Value of next_mat[ 6, 0] = 0.22
Value of next_mat[ 7, 0] = 0.18

Any help will be appreciated.
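
A host-side reference can help separate a library bug from a usage error. The following is a minimal sketch in plain C++, not CUSP code: `HostCsr`, `dense_to_csr`, and `csr_times_dense` are hypothetical names, and the dense operands are assumed to be stored row-major. It computes C = A * B row by row, so each column of C depends only on the matching column of B, and column 0 should stay the same as more columns are added.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side CSR container (not a CUSP type).
struct HostCsr {
    std::size_t num_rows = 0;
    std::size_t num_cols = 0;
    std::vector<std::size_t> row_offsets;    // size num_rows + 1
    std::vector<std::size_t> column_indices; // one per stored entry
    std::vector<float> values;               // one per stored entry
};

// Build a CSR matrix from a row-major dense matrix, keeping non-zero entries.
HostCsr dense_to_csr(const std::vector<float>& dense,
                     std::size_t num_rows, std::size_t num_cols)
{
    HostCsr A;
    A.num_rows = num_rows;
    A.num_cols = num_cols;
    A.row_offsets.push_back(0);
    for (std::size_t i = 0; i < num_rows; ++i) {
        for (std::size_t j = 0; j < num_cols; ++j) {
            const float v = dense[i * num_cols + j];
            if (v != 0.0f) {
                A.column_indices.push_back(j);
                A.values.push_back(v);
            }
        }
        A.row_offsets.push_back(A.values.size());
    }
    return A;
}

// Reference C = A * B, where B is (A.num_cols x batch) and C is
// (A.num_rows x batch), both stored row-major.
std::vector<float> csr_times_dense(const HostCsr& A,
                                   const std::vector<float>& B,
                                   std::size_t batch)
{
    std::vector<float> C(A.num_rows * batch, 0.0f);
    for (std::size_t i = 0; i < A.num_rows; ++i) {
        for (std::size_t k = A.row_offsets[i]; k < A.row_offsets[i + 1]; ++k) {
            const std::size_t j = A.column_indices[k];
            for (std::size_t c = 0; c < batch; ++c) {
                C[i * batch + c] += A.values[k] * B[j * batch + c];
            }
        }
    }
    return C;
}
```

Comparing the device result against `csr_times_dense` for each `batch_size` would show at which width the two first diverge.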

Bug needs to be fixed

After downloading the releases (v0.4, v0.5), cusp::zero_functor cannot be used.
The reason is that cusp/functional.h was moved to cusp/detail/functional.h, and the struct is now cusp::detail::zero_function.
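
As an interim workaround, a locally defined stand-in may be enough. This is a sketch under the assumption that the calling code only needs a unary functor that returns zero; `local_zero_functor` is a hypothetical name and not part of CUSP.

```cpp
// Hypothetical local replacement (not a CUSP type): maps any input to T(0).
template <typename T>
struct local_zero_functor {
    T operator()(const T&) const { return T(0); }
};
```

If the functor is passed to device code, the call operator would also need the `__host__ __device__` qualifiers when compiled with nvcc.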

Performance calculation of spmv using csr_blocked

In file cusplibrary/performance/spmv/benchmark.h you are using two different equations to compute spmv performance:

The general method in function test_spmv line 171:
float GFLOPs = (time == 0) ? 0 : (2 * host_matrix.num_entries / time) / 1e9; //equation (1)

And then in function test_spmv_block line 198:
float GFLOPs = (time == 0) ? 0 : (num_cols * 2 * host_matrix.num_entries / time) / 1e9; //equation (2)

I am wondering why 2 * NNZ is multiplied by num_cols in the blocked version, even though the number of SpMV operations is the same as for any other storage format that uses the first equation.
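
For reference, the two formulas can be written side by side. This is a sketch only: `spmv_gflops` and `spmm_gflops` are hypothetical helpers, and the second assumes that the blocked test multiplies the sparse matrix by `num_cols` dense vectors at once, in which case each stored entry contributes one multiply and one add per column. Whether `test_spmv_block` actually does this is not confirmed here.

```cpp
#include <cstddef>

// Equation (1): one multiply and one add per stored entry.
double spmv_gflops(std::size_t num_entries, double seconds)
{
    return (seconds == 0.0) ? 0.0 : (2.0 * num_entries / seconds) / 1e9;
}

// Equation (2): the same work repeated for each of num_cols dense columns
// (assumption: the right-hand operand is a block of num_cols vectors).
double spmm_gflops(std::size_t num_entries, std::size_t num_cols, double seconds)
{
    return (seconds == 0.0) ? 0.0 : (num_cols * 2.0 * num_entries / seconds) / 1e9;
}
```

Under that assumption the factor `num_cols` counts real floating-point work; if the blocked kernel instead performs a single matrix-vector product, equation (2) would overstate the rate by that factor.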

using cuda9 branch of cusp with cuda-7.5

I was attempting to see if I can use the same cusp snapshot from cuda-7.5 to 9.1 (I tried only these two versions, nothing in between).

When I use the cusp/cuda9 branch with cuda-7.5, I get the following error:

                 from /sandbox/balay/petsc/src/mat/impls/aij/mpi/mpicusp/mpiaijAssemble.cu:11:
/sandbox/balay/petsc/arch-cuda-double/include/cusp/system/cuda/arch.h:27:44: fatal error: thrust/system/cuda/detail/arch.h: No such file or directory
 #include <thrust/system/cuda/detail/arch.h>
                                            ^

THRUST_VERSION for cuda-7.5 is:

$ grep '#define THRUST_VERSION' /soft/apps/packages/cuda-7.5/include/thrust/version.h 
#define THRUST_VERSION 100802

So the following change gets the compile going.

$ diff -Nru cusp/system/cuda/arch.h~ cusp/system/cuda/arch.h
--- cusp/system/cuda/arch.h~	2018-03-07 17:05:59.346856000 -0600
+++ cusp/system/cuda/arch.h	2018-03-07 17:09:28.887263722 -0600
@@ -21,7 +21,7 @@
 #include <thrust/extrema.h>
 
 #if THRUST_VERSION >= 100900
-#elif THRUST_VERSION >= 100803
+#elif THRUST_VERSION >= 100802
 #include <thrust/system/cuda/detail/detail/launch_calculator.h>
 #elif THRUST_VERSION >= 100600
 #include <thrust/system/cuda/detail/arch.h>

Is this a suitable fix for cusp/cuda9 branch?
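
The version check in the diff is easier to read once the integer is decoded. The sketch below uses the encoding documented in `thrust/version.h` (major * 100000 + minor * 100 + sub-minor); `ThrustVersion` and `decode_thrust_version` are hypothetical names and not part of Thrust.

```cpp
// Decoded fields of a THRUST_VERSION-style integer (hypothetical helper type).
struct ThrustVersion {
    int major_version;
    int minor_version;
    int subminor_version;
};

// Split an encoded version such as 100802 into its three fields.
constexpr ThrustVersion decode_thrust_version(int v)
{
    return ThrustVersion{v / 100000, (v / 100) % 1000, v % 100};
}
```

With this encoding, 100802 decodes to 1.8.2, which sits just below the original 100803 (1.8.3) threshold and therefore falls through to the `>= 100600` branch.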

Remove the dispatch system and use Thrust-like ADL backend

Compile error on CUDA 9

Trying to compile a custom CUSP interface, I get the following error (using the cuda9 branch):
1>d:\eas\cusplibrary-cuda9\cusp/system/detail/generic/multiply.inl(42): error C2923: 'cusp::system::detail::generic::has_member_operator_exec_impl': 'cusp::array2d<ValueType,cusp::system::cpp::detail::par_t,cusp::row_major_base<thrust::detail::integral_constant<bool,false>>>::T' is not a valid template type argument for parameter 'T'
This is with CUDA version 9.0.176.

Mixed real-complex matrix multiply

I would like to multiply a real matrix and a complex vector. The multiply.cu code from here works if all arrays are real, and also if they are all complex, but not for the mixed case. Is there any prospect of cusp::multiply(A, x, y) adding support for mixed operations?

Compilation error:

/usr/local/cuda/bin/nvcc -O2 testMultiply.cu
/usr/include/cusp/system/detail/sequential/multiply/array2d_mv.h(57): error: function "cusp::constant_functor<T>::operator() [with T=float]" cannot be called with the given argument list
            argument types are: (thrust::complex<float>)
            object type is: cusp::constant_functor<float>

Code testMultiply.cu:

#include <cusp/array1d.h>
#include <cusp/array2d.h>
#include <cusp/multiply.h>
#include <cusp/print.h>
#include <cusp/complex.h>
int main(void)
{
    // initialize matrix
    //cusp::array2d<cusp::complex<float>, cusp::host_memory> A(2,2);
    cusp::array2d<float, cusp::host_memory> A(2,2);
    A(0,0) = 10;  A(0,1) = 20;
    A(1,0) = 40;  A(1,1) = 50;
    // initialize input vector
    cusp::array1d<cusp::complex<float>, cusp::host_memory> x(2);
    x[0] = 1;
    x[1] = 2;
    // allocate output vector
    cusp::array1d<cusp::complex<float>, cusp::host_memory> y(2);
    // compute y = A * x
    cusp::multiply(A, x, y);
    // print y
    cusp::print(y);
    return 0;
}
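
For clarity, the arithmetic being asked for can be sketched in plain C++ with `std::complex`. This is not CUSP code and says nothing about how `cusp::multiply` would implement it; `real_times_complex` is a hypothetical name, and the matrix is assumed to be stored row-major.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// y = A * x for a real row-major matrix A (rows x cols) and a complex vector x.
std::vector<std::complex<float>> real_times_complex(
    const std::vector<float>& A, std::size_t rows, std::size_t cols,
    const std::vector<std::complex<float>>& x)
{
    std::vector<std::complex<float>> y(rows, std::complex<float>(0.0f, 0.0f));
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            // float * complex<float> scales the real and imaginary parts.
            y[i] += A[i * cols + j] * x[j];
        }
    }
    return y;
}
```

For the 2x2 matrix and the vector (1, 2) from the example above, this yields (50, 140) with zero imaginary parts. Since the all-complex case is reported to work, promoting A to a complex matrix before the call may serve as a temporary workaround, at the cost of extra memory.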

Does gmres work with complex numbers?

Changing the original file gmres.cu with the following modification:

typedef cusp::complex ValueType;

results in the following compilation errors:

/usr/local/cuda/bin/../include/cusp/krylov/detail/gmres.inl(145): error: no suitable conversion function from "ValueType" to "float" exists
detected during:
instantiation of "void cusp::krylov::gmres(LinearOperator &, Vector &, Vector &, size_t, Monitor &, Preconditioner &) [with LinearOperator=cusp::hyb_matrix<int, ValueType, MemorySpace>, Vector=cusp::array1d<ValueType, MemorySpace>, Monitor=cusp::verbose_monitor, Preconditioner=cusp::identity_operator<ValueType, MemorySpace, int>]"
(97): here
instantiation of "void cusp::krylov::gmres(LinearOperator &, Vector &, Vector &, size_t, Monitor &) [with LinearOperator=cusp::hyb_matrix<int, ValueType, MemorySpace>, Vector=cusp::array1d<ValueType, MemorySpace>, Monitor=cusp::verbose_monitor]"
gmres_prova.cu(36): here

/usr/local/cuda/bin/../include/cusp/krylov/detail/gmres.inl(174): error: no suitable conversion function from "ValueType" to "float" exists
detected during:
instantiation of "void cusp::krylov::gmres(LinearOperator &, Vector &, Vector &, size_t, Monitor &, Preconditioner &) [with LinearOperator=cusp::hyb_matrix<int, ValueType, MemorySpace>, Vector=cusp::array1d<ValueType, MemorySpace>, Monitor=cusp::verbose_monitor, Preconditioner=cusp::identity_operator<ValueType, MemorySpace, int>]"
(97): here
instantiation of "void cusp::krylov::gmres(LinearOperator &, Vector &, Vector &, size_t, Monitor &) [with LinearOperator=cusp::hyb_matrix<int, ValueType, MemorySpace>, Vector=cusp::array1d<ValueType, MemorySpace>, Monitor=cusp::verbose_monitor]"
gmres_prova.cu(36): here
