Giter Club home page Giter Club logo

aggregation's Introduction

APM example workflow example workflow

Aggregation

The aggregation repository contains a set of algorithms for grouping vertices of DAGs coming from loop-carried dependencies. Load-balance Level Coarsening (LBC) is one of the aggregation algorithms.

Hybrid Aggregation (HDagg) is another algorithm that can operate on chordal and non-chordal DAGs directly.

The algorithms in this repository can be used within code generators or libraries.

Install

Prerequisites

First following items should be installed:

  • CMake
  • C++ compiler (GCC, ICC, or CLang)
  • METIS (optional) dependency for running the demo efficiently and is handled by the cmake. If you have installed the package using a packet manager (e.g., apt of homebrew), CMake should be able to detect it. Otherwise, it installs METIS from source internally.
  • OpenMP (optional) for running some parts of the code in parallel. If you use GCC/ICC then OpenMP should be supported natively. If you use Apple CLang, you probably need to install OpenMP using homebrew install libomp. You can
  • also install LLVM usng brew install llvm which support OpenMP natively.

Build

Then build Aggregation, using the following:

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

You can always set -DCMAKE_CXX_COMPILER= and -DCMAKE_C_COMPILER= to use a different compiler. For example: cmake -DCMAKE_CXX_COMPILER=/usr/local/Cellar/gcc\@9/9.3.0_2/bin/g++-9 -DCMAKE_C_COMPILER=/usr/local/Cellar/gcc\@9/9.3.0_2/bin/gcc-9 ..

Example

The example directory shows how to call LBC API and iterate over the created partitioning. For more examples on how LBC is used for making loops with sparse dependencies parallel.

aggregation's People

Contributors

behroozzare avatar cetiniz avatar cheshmi avatar hgeorge21 avatar ihasan98 avatar kobeliu85 avatar learning-chip avatar mmsalehid avatar shujianqian avatar sympiler avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

aggregation's Issues

Invalid vector size bug during Tree-HDagg partitioning for certain input matrices

Bug description

The SpTrSv_LL_Tree_HDAGG case in SpTRSV_runtime.cpp example leads to error for certain input matrices:

Running LL HDAGG with BIN Code with #core: 4 - The runtime:7.7951e-05
core: 4; bin: 1
Compute DAG
terminate called after throwing an instance of 'std::length_error'
  what():  cannot create std::vector larger than max_size()
Aborted (core dumped)

This happens, for example, with input matrix A = random_square_sparse(200, 0.5, 1.0, 2U);, and with Metis reordering disabled. Depending on the random parameters, some matrices fail while others run fine.

Locating bug

The error occurs inside the HDAGG function, at the following call with invalid vector size nlevels = -1:

std::vector<std::vector<int>> chunk_sizes(nlevels);

The invalid value -1 is returned by the build_levelSet_CSC function at line 44:

cur_level++; //all nodes_ with zero indegree are processed.
//assert(cur_level < n);
if (cur_level >= n)
return -1; // The input graph has a cycle
levelPtr[cur_level] = cur_levelCol; // The levelPtr starts from level 1. Behrooz
while (inDegree[begin] == 1) // Why? Behrooz
{
begin++;
if (begin >= n)
break;
}
while (inDegree[end] == 1 && begin <= end) // The same why as above. Behrooz
end--;
//Updating degrees after removing the nodes_
for (int l = levelPtr[cur_level - 1]; l < levelPtr[cur_level]; ++l) // I don't get this part. Behrooz
{
int cc = levelSet[l];
for (int j = Lp[cc]; j < Lp[cc + 1]; ++j)
{
if (Li[j] != cc) //skip diagonals
inDegree[Li[j]]--; //removing corresponding edges
}
}
//print_vec("dd\n",0,n,inDegree);

When error occurs, the variable state is cur_level == n == 2, then -1 is returned. Moreover, the code in line 45~64 is placed after return so will never get executed, so I guess some logic must be wrong here. The assert at line 42 should probably be enabled?

Steps to reproduce

Modify SpTRSV_runtime.cpp with the following input matrix:

n = 200;
double density = 0.5;
matrix_name = "Random_" + std::to_string(n);
A = random_square_sparse(n, density, 1.0, 2U);

...
// disable METIS
#undef METIS
#ifdef METIS
...
#else
  // avoid name conflict with another `tmp` variable later
  CSC *tmp_csc =
      make_half(Lower_A_CSC->n, Lower_A_CSC->p, Lower_A_CSC->i, Lower_A_CSC->x);
  delete Lower_A_CSC;
  Lower_A_CSC = tmp_csc;
  Lower_A_CSR = csc_to_csr(Lower_A_CSC);
#endif

I can also send a draft PR to show such bug, and a parameterized unit test to sweep over various input matrices (update: see #10).

Then, build and run:

cmake -B build_debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build_debug --target Hdagg_SpTRSV
./build_debug/example/Hdagg_SpTRSV   # get bug

Deeper look with GDB:

gdb ./build_debug/example/Hdagg_SpTRSV
(check state of `n` before entering loop)
break src/hdagg/hdagg.cpp:28
(check state of `cur_level` before returning -1)
break src/hdagg/hdagg.cpp:44

(the first break point is inside `SpTrSv_LL_HDAGG` case, not yet to `SpTrSv_LL_Tree_HDAGG`, so no bug)
run
print n
(n equals 200 for this input)

(the second break point is inside the buggy `SpTrSv_LL_Tree_HDAGG`)
c
print n
(n equals 2 because ngroups is used as first argument of HDAGG() function)

(the third break point is just before returning -1)
print cur_level
(cur_level equals 2, so -1 is returned as result of build_levelSet_CSC() function)

c
(crashes when creating a vector of size -1)

Why is Metis reordering necessary for the sptrsv example?

The SpTRSV_runtime.cpp example orders input matrix A before extracting its lower half:

/// Re-ordering matrix A
// std::cout << "METIS IS NOT ACTIVATED" << std::endl;
#ifdef METIS
std::cout << "METIS IS ACTIVATED" << std::endl;
// We only reorder A since dependency matters more in l-solve.
A = make_full(Lower_A_CSC);
delete Lower_A_CSC;
metis_perm_general(A, perm);
Lower_A_CSC = make_half(A->n, A->p, A->i, A->x);
CSC *Lt = transpose_symmetric(Lower_A_CSC, perm);
CSC *L1_ord = transpose_symmetric(Lt, NULLPNTR);
delete Lower_A_CSC;
Lower_A_CSC = L1_ord;
Lower_A_CSR = csc_to_csr(Lower_A_CSC);
delete Lt;
delete A;
delete[] perm;
#else

I wonder how would the reordering benefits sparse triangular solve? In the ParSy&HDagg papers it is only mentioned that METIS is used for fill-in reduction reordering, but that's for cholesky factorization and not relevant for sptrsv. Disabling the reordering doesn't seem to affect performance in my experiments. It might have postive or negative impacts on data locality, but I didn't find any arguments in the paper.

How is the average memory access latency measured or calculated?

In the HDagg paper Section "Executor Evaluation" it is said that "The average memory access latency is used as a metric to measure locality." and "PAPI’s performance counters are used to measure architecture information needed in computations related to the locality and load balance metrics."

In the sptrsv_profiler.cpp example, the PAPI event list is:

std::vector<int> event_list = {PAPI_L1_DCM, PAPI_L1_TCM, PAPI_L2_DCM,
PAPI_L2_DCA,PAPI_L2_TCM,PAPI_L2_TCA,
PAPI_L3_TCM, PAPI_L3_TCA,
PAPI_TLB_DM, PAPI_TOT_CYC, PAPI_LST_INS,
PAPI_TOT_INS, PAPI_REF_CYC, PAPI_RES_STL,
PAPI_BR_INS, PAPI_BR_MSP};

I wonder how is the memory latency obtained from the above metrics?

Assertion error inside HDAGG::partialSparsification for Hdagg_SpTRSV example

At the latest commit da104fa, and fixed #4, there is still run-time error for examples/SpTRSV_runtime.cpp. The error occurs during the call to HDAGG::partialSparsification in examples/SpTRSV_runtime.h, in particular this assertion inside partialSparsification:

//The starting point of the next clique
col = first + width;
assert(col <= n);
clique_ptr_per_thread.push_back(col);

Error messages

With debug build, the error message is:

METIS IS ACTIVATED
Starting SpTrSv Runtime analysis
Write in the existing file SpTrSv_Final_20.csv
Running LL Serial Code - The runtime:1.25e-05
Running LL Levelset Code with #core: 20 - The runtime:1.58e-05
Hdagg_SpTRSV: /work_dir/aggregation/src/hdagg/hdagg.cpp:2085: void HDAGG::partialSparsification(int, int, const int*, const int*, std::vector<int>&, std::vector<int>&, bool): Assertion `col <= n' failed.
Aborted

The "LL Serial" and "LL Levelset" case work fine because they don't call partialSparsification. The "LL Tree + Levelset" and later cases fail at partialSparsification.

With release build it leads to memory errors, likely due to exceeding array bounds:

METIS IS ACTIVATED
Starting SpTrSv Runtime analysis
Write in the existing file SpTrSv_Final_20.csv
Running LL Serial Code - The runtime:1.24e-05
Running LL Levelset Code with #core: 20 - The runtime:1.46e-05
malloc_consolidate(): invalid chunk size
Aborted

The above are using the built-in random matrix generation. Using an external matrix like G2_circuit give a different error, but still memory-related.

Reading matrix /work_dir/matrix_data/G2_circuit/G2_circuit.mtx
METIS IS ACTIVATED
Starting SpTrSv Runtime analysis
Write in the existing file SpTrSv_Final_20.csv
Running LL Serial Code - The runtime:0.0009317
Running LL Levelset Code with #core: 2 - The runtime:0.0016047
Hdagg_SpTRSV: malloc.c:4118: _int_malloc: Assertion `chunk_main_arena (fwd)' failed.
Aborted

Steps to reproduce

git clone https://github.com/sympiler/aggregation
cd aggregation

# debug mode
cmake -DCMAKE_BUILD_TYPE=Debug -S . -B build_debug
cmake --build build_debug
./build_debug/example/Hdagg_SpTRSV  # generates random matrice, same error when reading a mtx file

# release mode
cmake -DCMAKE_BUILD_TYPE=Release -S . -B build_release
cmake --build build_release -j 4
./build_release/example/Hdagg_SpTRSV
# read external matrix file and use less threads than default
./build_release/example/Hdagg_SpTRSV /work_dir/matrix_data/G2_circuit/G2_circuit.mtx 2

The system environment can be reproduced using this trivial Dockerfile that installs GCC 11.3 and CMake 3.22:

FROM ubuntu:22.04

RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    git wget vim \
    gcc g++ gfortran \
    libnuma-dev \
    libhwloc-dev \
    libmetis-dev \
    libomp-dev \
    make cmake \
    && rm -rf /var/lib/apt/lists/*

Use std::isnan in example/SpTRSV_runtime.h

In example/SpTRSV_runtime.h there are two isnan calls which should be std::isnan

int node = final_node_ptr[node_ptr];
assert(!isnan(cost[node]));
Total_cost += cost[node];

int node = final_node_ptr[node_ptr];
assert(!isnan(cost[node]));
Total_cost += cost[node];

Otherwise got this compile error (with gcc 11.3 and cmake 3.22):

In file included from /usr/include/c++/11/cassert:44,
                 from /work_dir/aggregation/include/aggregation/def.h:11,
                 from /work_dir/aggregation/include/aggregation/sparse_utilities.h:12,
                 from /work_dir/aggregation/include/aggregation/metis_interface.h:8,
                 from /work_dir/aggregation/example/SpTRSV_runtime.cpp:7:
/work_dir/aggregation/example/SpTRSV_runtime.h: In member function 'double sym_lib::SpTrSv_LL_Tree_HDAGG_BFS::getPotentialGain()':
/work_dir/aggregation/example/SpTRSV_runtime.h:859:19: error: 'isnan' was not declared in this scope; did you mean 'std::isnan'?
  859 |           assert(!isnan(cost[node]));
      |                   ^~~~~
In file included from /work_dir/aggregation/include/aggregation/test_utils.h:8,
                 from /work_dir/aggregation/example/SpTRSV_runtime.cpp:12:
/usr/include/c++/11/cmath:632:5: note: 'std::isnan' declared here
  632 |     isnan(_Tp __x)
      |     ^~~~~
In file included from /usr/include/c++/11/cassert:44,
                 from /work_dir/aggregation/include/aggregation/def.h:11,
                 from /work_dir/aggregation/include/aggregation/sparse_utilities.h:12,
                 from /work_dir/aggregation/include/aggregation/metis_interface.h:8,
                 from /work_dir/aggregation/example/SpTRSV_runtime.cpp:7:
/work_dir/aggregation/example/SpTRSV_runtime.h: In member function 'void sym_lib::SpTrSv_LL_Tree_HDAGG_BFS::computeMetrics(double&, double&, double&, double&, double&, double&, double&, double&, double&, double&, double&)':
/work_dir/aggregation/example/SpTRSV_runtime.h:908:19: error: 'isnan' was not declared in this scope; did you mean 'std::isnan'?
  908 |           assert(!isnan(cost[node]));
      |                   ^~~~~
In file included from /work_dir/aggregation/include/aggregation/test_utils.h:8,
                 from /work_dir/aggregation/example/SpTRSV_runtime.cpp:12:
/usr/include/c++/11/cmath:632:5: note: 'std::isnan' declared here
  632 |     isnan(_Tp __x)
      |     ^~~~~
gmake[2]: *** [example/CMakeFiles/Hdagg_SpTRSV.dir/build.make:76: example/CMakeFiles/Hdagg_SpTRSV.dir/SpTRSV_runtime.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:261: example/CMakeFiles/Hdagg_SpTRSV.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.