Comments (7)

ShadenSmith commented on September 24, 2024

Hi there, can you share the command you use to run and the output that you receive?

Also consider running splatt check <tensor> on the tensor to ensure that there are no empty slices or duplicate non-zero entries. If you use splatt check --fix=fixed.tns <tensor>, it will fix those issues (duplicates are summed).
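
As a rough illustration of what such a check involves, here is a minimal Python sketch. It is not SPLATT's implementation; the function name check_tensor and the in-memory entry format are assumptions made for the example. It sums duplicate coordinates and reports which index values of each mode never appear (empty slices).

```python
from collections import defaultdict

def check_tensor(entries, dims):
    """Hypothetical helper, not part of SPLATT.

    entries: iterable of (index_tuple, value) with 1-based indices.
    dims:    tuple with the length of each mode.
    Returns (fixed_entries, n_duplicates, empty_slices).
    """
    summed = defaultdict(float)
    n_duplicates = 0
    for idx, val in entries:
        if idx in summed:
            n_duplicates += 1      # same coordinate seen before
        summed[idx] += val         # duplicates are summed

    # A slice is empty when an index value of some mode never appears.
    used = [set() for _ in dims]
    for idx in summed:
        for mode, i in enumerate(idx):
            used[mode].add(i)
    empty_slices = [
        [i for i in range(1, dim + 1) if i not in used[mode]]
        for mode, dim in enumerate(dims)
    ]
    return sorted(summed.items()), n_duplicates, empty_slices

entries = [((1, 1, 1), 1.0), ((2, 1, 3), 2.0), ((1, 1, 1), 0.5)]
fixed, n_dup, empty = check_tensor(entries, dims=(2, 2, 3))
print(fixed)   # coordinate (1, 1, 1) now carries the summed value 1.5
print(n_dup)   # 1
print(empty)   # [[], [2], [2]]
```

The sketch only reports empty slices; how the real tool repairs them is not described in this thread.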

from splatt.

bapriddy commented on September 24, 2024

The tensor is nell-2.tns from frostt.io. I run the command either from inside a PBS script or directly on a single node; both give the error.

Single node: ./splatt cpd nell-2.tns -r 35
Multi-node: mpiexec -n # ./splatt cpd nell-2.tns -r 35
where # = 2, 4, or 8, i.e. the number of nodes specified for the run.

ShadenSmith commented on September 24, 2024

I'm not able to reproduce the issue on my system. Can you share the output of your configuration and build?

I configured with ./configure --with-mpi --intel (the --intel is just to use the Intel compilers found on my system):

$ ./configure --with-mpi --intel
Found CMAKE: '/opt/cmake-3.7.0-Linux-x86_64/bin/cmake'
Removing old build directory 'build/Linux-x86_64'...
mkdir: created directory ‘build/Linux-x86_64’
~/src/splatt-gh/build/Linux-x86_64 ~/src/splatt-gh
Calling cmake with arguments ' -DUSE_MPI=1 -DINTEL_OPT=1 -DCMAKE_C_COMPILER=icc'
-- The C compiler identification is Intel 18.0.2.20180210
-- The CXX compiler identification is GNU 4.8.5
-- Check for working C compiler: /opt/intel-2018/compilers_and_libraries_2018.2.199/linux/bin/intel64/icc
-- Check for working C compiler: /opt/intel-2018/compilers_and_libraries_2018.2.199/linux/bin/intel64/icc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
Building in RELEASE mode.
-- Try OpenMP C flag = [-qopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -qopenmp
Building with MPI support.
-- Found MPI_C: /opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpifort.so;/opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/release_mt/libmpi.so;/opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpigi.a;/usr/lib64/libdl.so;/usr/lib64/librt.so;/usr/lib64/libpthread.so
-- Found MPI_CXX: /opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpicxx.so;/opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpifort.so;/opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/release_mt/libmpi.so;/opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpigi.a;/usr/lib64/libdl.so;/usr/lib64/librt.so;/usr/lib64/libpthread.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/shadensm/src/splatt-gh/build/Linux-x86_64

One MPI rank:

$ mpiexec -n 1 ./build/Linux-x86_64/bin/splatt cpd --nowrite --seed=10 -i 5 -t 18 ~/tensors/nell-2.tns  -r 35 -t 18
****************************************************************
splatt v2.0.0 built from b4bbad4-master

Tensor information ---------------------------------------------
FILE=/home/shadensm/tensors/nell-2.tns
DIMS=12092x9184x28818 NNZ=76879419 DENSITY=2.402239e-05
COORD-STORAGE=2.29GB

MPI information ------------------------------------------------
DISTRIBUTION=MEDIUM DIMS=1x1x1
AVG NNZ=76879419
MAX NNZ=76879419  (0.00% diff)
AVG COMMUNICATION VOL=0
MAX COMMUNICATION VOL=0  (0.00% diff)

Factoring ------------------------------------------------------
NFACTORS=35 MAXITS=5 TOL=1.0e-05 REG=0.0e+00 RANKS=1 THREADS=18
CSF-ALLOC=TWOMODE TILE=NO
CSF-STORAGE=2.54GB FACTOR-STORAGE=13.38MB

  its =   1 (0.423s)  fit = 0.12947  delta = +1.2947e-01
  its =   2 (0.387s)  fit = 0.25750  delta = +1.2803e-01
  its =   3 (0.386s)  fit = 0.29686  delta = +3.9361e-02
  its =   4 (0.401s)  fit = 0.30744  delta = +1.0581e-02
  its =   5 (0.411s)  fit = 0.31434  delta = +6.8988e-03
Final fit: 0.31434

Timing information ---------------------------------------------
  TOTAL               32.633s
  CPD                 2.008s
****************************************************************

Two MPI ranks:

$ mpiexec -n 2 ./build/Linux-x86_64/bin/splatt cpd --nowrite --seed=10 -i 5 -t 18 ~/tensors/nell-2.tns  -r 35 -t 18
****************************************************************
splatt v2.0.0 built from b4bbad4-master

Tensor information ---------------------------------------------
FILE=/home/shadensm/tensors/nell-2.tns
DIMS=12092x9184x28818 NNZ=76879419 DENSITY=2.402239e-05
COORD-STORAGE=2.29GB

MPI information ------------------------------------------------
DISTRIBUTION=MEDIUM DIMS=1x1x2
AVG NNZ=38439709
MAX NNZ=38446186  (0.02% diff)
AVG COMMUNICATION VOL=42240
MAX COMMUNICATION VOL=42240  (0.00% diff)

Factoring ------------------------------------------------------
NFACTORS=35 MAXITS=5 TOL=1.0e-05 REG=0.0e+00 RANKS=2 THREADS=18
CSF-ALLOC=TWOMODE TILE=NO
CSF-STORAGE=2.55GB FACTOR-STORAGE=19.02MB

  its =   1 (0.225s)  fit = 0.12947  delta = +1.2947e-01
  its =   2 (0.193s)  fit = 0.25750  delta = +1.2803e-01
  its =   3 (0.195s)  fit = 0.29686  delta = +3.9361e-02
  its =   4 (0.189s)  fit = 0.30744  delta = +1.0581e-02
  its =   5 (0.189s)  fit = 0.31434  delta = +6.8988e-03
Final fit: 0.31434

Timing information ---------------------------------------------
  TOTAL               25.264s
  CPD                 0.991s
****************************************************************

Four MPI ranks:

$ mpiexec -n 4 ./build/Linux-x86_64/bin/splatt cpd --nowrite --seed=10 -i 5 -t 18 ~/tensors/nell-2.tns  -r 35 -t 18
****************************************************************
splatt v2.0.0 built from b4bbad4-master

Tensor information ---------------------------------------------
FILE=/home/shadensm/tensors/nell-2.tns
DIMS=12092x9184x28818 NNZ=76879419 DENSITY=2.402239e-05
COORD-STORAGE=2.29GB

MPI information ------------------------------------------------
DISTRIBUTION=MEDIUM DIMS=1x1x4
AVG NNZ=19219854
MAX NNZ=19225913  (0.03% diff)
AVG COMMUNICATION VOL=62868
MAX COMMUNICATION VOL=63000  (0.21% diff)

Factoring ------------------------------------------------------
NFACTORS=35 MAXITS=5 TOL=1.0e-05 REG=0.0e+00 RANKS=4 THREADS=18
CSF-ALLOC=TWOMODE TILE=NO
CSF-STORAGE=2.86GB FACTOR-STORAGE=30.16MB

  its =   1 (0.130s)  fit = 0.12947  delta = +1.2947e-01
  its =   2 (0.113s)  fit = 0.25750  delta = +1.2803e-01
  its =   3 (0.107s)  fit = 0.29686  delta = +3.9361e-02
  its =   4 (0.107s)  fit = 0.30744  delta = +1.0581e-02
  its =   5 (0.107s)  fit = 0.31434  delta = +6.8988e-03
Final fit: 0.31434

Timing information ---------------------------------------------
  TOTAL               21.211s
  CPD                 0.564s
****************************************************************
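
For reference, the CPD times in the three runs above can be turned into speedup and parallel efficiency relative to the one-rank run. The short Python sketch below only reuses the numbers printed in the logs (each rank ran with 18 threads); the function name scaling is an assumption for the example.

```python
# CPD times in seconds, copied from the "Timing information" blocks above.
cpd_seconds = {1: 2.008, 2: 0.991, 4: 0.564}

def scaling(times):
    """Return {ranks: (speedup, efficiency)} relative to the 1-rank time."""
    base = times[1]
    return {
        ranks: (base / t, base / t / ranks)
        for ranks, t in sorted(times.items())
    }

for ranks, (speedup, efficiency) in scaling(cpd_seconds).items():
    print(f"{ranks} ranks: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Only the CPD line is compared here; the TOTAL line also includes reading and distributing the tensor.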

bapriddy commented on September 24, 2024

I will rebuild with a different setting for --with-mpi --type and for --blas-int=64. I have more than one compiler available, though I'm not sure this will help. It works fine with OpenMP. Here's the output from a run.

$ mpiexec -n 1 ./splatt cpd nell-2.tns -r 35
****************************************************************
splatt v2.0.0

Tensor information ---------------------------------------------
FILE=nell-2.tns
DIMS=12092x9184x28818 NNZ=76879419 DENSITY=2.402239e-05
COORD-STORAGE=2.29GB

MPI information ------------------------------------------------
DISTRIBUTION=MEDIUM DIMS=1x1x1
AVG NNZ=76879419
MAX NNZ=76879419  (0.00% diff)
AVG COMMUNICATION VOL=0
MAX COMMUNICATION VOL=0  (0.00% diff)

Factoring ------------------------------------------------------
NFACTORS=35 MAXITS=50 TOL=1.0e-05 REG=0.0e+00 RANKS=1 THREADS=1
CSF-ALLOC=TWOMODE TILE=NO
CSF-STORAGE=2.54GB FACTOR-STORAGE=13.38MB

  its =   1 (4.865s)  fit = 0.13946  delta = +1.3946e-01
SPLATT: Gram matrix is not SPD. Trying `GELSS`.
SPLATT: DGELSS returned 0
SPLATT:   DGELSS effective rank: 35
  its =   2 (5.053s)  fit = 0.24575  delta = +1.0630e-01
SPLATT: Gram matrix is not SPD. Trying `GELSS`.
SPLATT: DGELSS returned 0
SPLATT:   DGELSS effective rank: 35
  its =   3 (5.006s)  fit = 0.28344  delta = +3.7684e-02
SPLATT: Gram matrix is not SPD. Trying `GELSS`.
SPLATT: DGELSS returned 0
SPLATT:   DGELSS effective rank: 35
  its =   4 (4.977s)  fit = 0.30169  delta = +1.8251e-02

ShadenSmith commented on September 24, 2024

For what it's worth, the message in your case is benign and the factorization is proceeding correctly. The DGELSS message just indicates that the Cholesky factorization failed and we are falling back to using the SVD to solve the linear systems.
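
The fallback pattern can be sketched in a few lines of Python with NumPy. This is an illustration under assumptions, not SPLATT's code: the function name solve_normal_equations is made up, and NumPy's lstsq stands in for DGELSS (it is also SVD-based, but as far as I know it calls a different LAPACK driver).

```python
import numpy as np

def solve_normal_equations(gram, rhs):
    """Solve X @ gram = rhs for X, where gram is a symmetric R x R matrix.

    Tries a Cholesky factorization first and falls back to an SVD-based
    least-squares solve when gram is not symmetric positive definite.
    """
    try:
        chol = np.linalg.cholesky(gram)   # raises LinAlgError if not SPD
        y = np.linalg.solve(chol, rhs.T)  # forward step: chol @ y = rhs.T
        x = np.linalg.solve(chol.T, y)    # backward step: chol.T @ x = y
        return x.T, "cholesky"
    except np.linalg.LinAlgError:
        x, _, rank, _ = np.linalg.lstsq(gram, rhs.T, rcond=None)
        return x.T, f"lstsq (effective rank {rank})"

rhs = np.array([[1.0, 2.0]])
spd = np.array([[4.0, 2.0], [2.0, 3.0]])         # positive definite
indefinite = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues 3 and -1

for gram in (spd, indefinite):
    x, method = solve_normal_equations(gram, rhs)
    print(method, np.allclose(x @ gram, rhs))
```

In both cases the returned solution satisfies the system; the second matrix simply takes the slower, more general path, which is what the DGELSS messages in the log report.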

The line indicating the return code and effective rank should be moved to a debugging level of verbosity.

bapriddy commented on September 24, 2024

Thanks. I thought that was the case. I guess it interferes with the CPD timing, though maybe it's negligible. Anyway, thanks.

ShadenSmith commented on September 24, 2024

For most tensors (including NELL-2), that portion of ALS is negligible compared to the MTTKRP kernel.
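
To see why, consider a minimal coordinate-format MTTKRP for the first mode of a three-mode tensor, sketched below in Python. This is an assumption-laden illustration: SPLATT stores the tensor in CSF rather than looping over raw coordinates, and the name mttkrp_mode0 is hypothetical. The point is that the loop visits every non-zero and does work proportional to the rank for each one, while the linear solve only involves a rank-by-rank Gram matrix (35 x 35 in the runs above, against 76879419 non-zeros).

```python
def mttkrp_mode0(entries, factor_b, factor_c, n_rows, rank):
    """Hypothetical COO MTTKRP for mode 0 of a 3-mode tensor.

    entries:  iterable of (i, j, k, value) with 0-based indices.
    factor_b: factor matrix for mode 1, as a list of rows of length rank.
    factor_c: factor matrix for mode 2, as a list of rows of length rank.
    Returns an n_rows x rank matrix as a list of lists.
    """
    out = [[0.0] * rank for _ in range(n_rows)]
    for i, j, k, val in entries:
        row_b, row_c, row_out = factor_b[j], factor_c[k], out[i]
        for r in range(rank):
            # Each non-zero contributes val * B[j, r] * C[k, r] to row i.
            row_out[r] += val * row_b[r] * row_c[r]
    return out

entries = [(0, 0, 1, 2.0), (1, 1, 0, 3.0), (0, 1, 1, 1.0)]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
print(mttkrp_mode0(entries, B, C, n_rows=2, rank=2))
# [[35.0, 64.0], [45.0, 72.0]]
```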

Can you share what compiler/MPI implementation/etc. are being used? The output of your configure would be helpful to diagnose the underlying problem.
