Comments (7)
Hi there, can you share the command you use to run and the output that you receive?
Also consider running splatt check <tensor> on the tensor to ensure that there are no empty slices or duplicate nonzero entries. If you use splatt check --fix=fixed.tns <tensor>, it will fix those issues (duplicates are summed).
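To illustrate what such a check looks for, here is a minimal Python sketch. It is not SPLATT's implementation: the function name check_and_fix and the in-memory entry format are hypothetical, and since the comment above only says that duplicates are summed, the sketch merely reports empty slices rather than guessing how they would be repaired.

```python
from collections import defaultdict


def check_and_fix(entries, dims):
    """Sum duplicate coordinates and report empty slices.

    entries: list of (index_tuple, value) pairs with 1-based indices,
             as in a coordinate (.tns-style) file.
    dims:    tuple with the length of each mode.
    Returns (merged_entries, n_duplicates, empty_slices), where
    empty_slices[m] lists the indices of mode m that hold no nonzero.
    """
    summed = defaultdict(float)
    for idx, val in entries:
        # Entries sharing a coordinate are accumulated into one value.
        summed[tuple(idx)] += val
    n_duplicates = len(entries) - len(summed)

    empty_slices = []
    for mode, dim in enumerate(dims):
        used = {idx[mode] for idx in summed}
        empty_slices.append([i for i in range(1, dim + 1) if i not in used])

    return sorted(summed.items()), n_duplicates, empty_slices


entries = [((1, 1, 1), 1.0), ((1, 1, 1), 2.0), ((2, 3, 1), 4.0)]
merged, n_duplicates, empty_slices = check_and_fix(entries, dims=(2, 3, 1))
```

In this toy input the coordinate (1, 1, 1) appears twice, so the two values are merged into one entry, and index 2 of the second mode is reported as an empty slice.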
from splatt.
The tensor is nell-2.tns from frostt.io. I run the command from inside a PBS script or directly on a single node, and both give the error.
Single-node version: ./splatt cpd nell-2.tns -r 35
Multi-node version: mpiexec -n # ./splatt cpd nell-2.tns -r 35
where # = 2, 4, or 8, i.e. the number of nodes specified for the run.
from splatt.
I'm not able to reproduce the issue on my system. Can you share the output of your configuration and build?
I configured with ./configure --with-mpi --intel (the --intel flag is just to use the Intel compilers found on my system):
$ ./configure --with-mpi --intel
Found CMAKE: '/opt/cmake-3.7.0-Linux-x86_64/bin/cmake'
Removing old build directory 'build/Linux-x86_64'...
mkdir: created directory ‘build/Linux-x86_64’
~/src/splatt-gh/build/Linux-x86_64 ~/src/splatt-gh
Calling cmake with arguments ' -DUSE_MPI=1 -DINTEL_OPT=1 -DCMAKE_C_COMPILER=icc'
-- The C compiler identification is Intel 18.0.2.20180210
-- The CXX compiler identification is GNU 4.8.5
-- Check for working C compiler: /opt/intel-2018/compilers_and_libraries_2018.2.199/linux/bin/intel64/icc
-- Check for working C compiler: /opt/intel-2018/compilers_and_libraries_2018.2.199/linux/bin/intel64/icc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
Building in RELEASE mode.
-- Try OpenMP C flag = [-qopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -qopenmp
Building with MPI support.
-- Found MPI_C: /opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpifort.so;/opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/release_mt/libmpi.so;/opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpigi.a;/usr/lib64/libdl.so;/usr/lib64/librt.so;/usr/lib64/libpthread.so
-- Found MPI_CXX: /opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpicxx.so;/opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpifort.so;/opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/release_mt/libmpi.so;/opt/intel-2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpigi.a;/usr/lib64/libdl.so;/usr/lib64/librt.so;/usr/lib64/libpthread.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/shadensm/src/splatt-gh/build/Linux-x86_64
One MPI rank:
$ mpiexec -n 1 ./build/Linux-x86_64/bin/splatt cpd --nowrite --seed=10 -i 5 -t 18 ~/tensors/nell-2.tns -r 35 -t 18
****************************************************************
splatt v2.0.0 built from b4bbad4-master
Tensor information ---------------------------------------------
FILE=/home/shadensm/tensors/nell-2.tns
DIMS=12092x9184x28818 NNZ=76879419 DENSITY=2.402239e-05
COORD-STORAGE=2.29GB
MPI information ------------------------------------------------
DISTRIBUTION=MEDIUM DIMS=1x1x1
AVG NNZ=76879419
MAX NNZ=76879419 (0.00% diff)
AVG COMMUNICATION VOL=0
MAX COMMUNICATION VOL=0 (0.00% diff)
Factoring ------------------------------------------------------
NFACTORS=35 MAXITS=5 TOL=1.0e-05 REG=0.0e+00 RANKS=1 THREADS=18
CSF-ALLOC=TWOMODE TILE=NO
CSF-STORAGE=2.54GB FACTOR-STORAGE=13.38MB
its = 1 (0.423s) fit = 0.12947 delta = +1.2947e-01
its = 2 (0.387s) fit = 0.25750 delta = +1.2803e-01
its = 3 (0.386s) fit = 0.29686 delta = +3.9361e-02
its = 4 (0.401s) fit = 0.30744 delta = +1.0581e-02
its = 5 (0.411s) fit = 0.31434 delta = +6.8988e-03
Final fit: 0.31434
Timing information ---------------------------------------------
TOTAL 32.633s
CPD 2.008s
****************************************************************
Two MPI ranks:
$ mpiexec -n 2 ./build/Linux-x86_64/bin/splatt cpd --nowrite --seed=10 -i 5 -t 18 ~/tensors/nell-2.tns -r 35 -t 18
****************************************************************
splatt v2.0.0 built from b4bbad4-master
Tensor information ---------------------------------------------
FILE=/home/shadensm/tensors/nell-2.tns
DIMS=12092x9184x28818 NNZ=76879419 DENSITY=2.402239e-05
COORD-STORAGE=2.29GB
MPI information ------------------------------------------------
DISTRIBUTION=MEDIUM DIMS=1x1x2
AVG NNZ=38439709
MAX NNZ=38446186 (0.02% diff)
AVG COMMUNICATION VOL=42240
MAX COMMUNICATION VOL=42240 (0.00% diff)
Factoring ------------------------------------------------------
NFACTORS=35 MAXITS=5 TOL=1.0e-05 REG=0.0e+00 RANKS=2 THREADS=18
CSF-ALLOC=TWOMODE TILE=NO
CSF-STORAGE=2.55GB FACTOR-STORAGE=19.02MB
its = 1 (0.225s) fit = 0.12947 delta = +1.2947e-01
its = 2 (0.193s) fit = 0.25750 delta = +1.2803e-01
its = 3 (0.195s) fit = 0.29686 delta = +3.9361e-02
its = 4 (0.189s) fit = 0.30744 delta = +1.0581e-02
its = 5 (0.189s) fit = 0.31434 delta = +6.8988e-03
Final fit: 0.31434
Timing information ---------------------------------------------
TOTAL 25.264s
CPD 0.991s
****************************************************************
Four MPI ranks:
$ mpiexec -n 4 ./build/Linux-x86_64/bin/splatt cpd --nowrite --seed=10 -i 5 -t 18 ~/tensors/nell-2.tns -r 35 -t 18
****************************************************************
splatt v2.0.0 built from b4bbad4-master
Tensor information ---------------------------------------------
FILE=/home/shadensm/tensors/nell-2.tns
DIMS=12092x9184x28818 NNZ=76879419 DENSITY=2.402239e-05
COORD-STORAGE=2.29GB
MPI information ------------------------------------------------
DISTRIBUTION=MEDIUM DIMS=1x1x4
AVG NNZ=19219854
MAX NNZ=19225913 (0.03% diff)
AVG COMMUNICATION VOL=62868
MAX COMMUNICATION VOL=63000 (0.21% diff)
Factoring ------------------------------------------------------
NFACTORS=35 MAXITS=5 TOL=1.0e-05 REG=0.0e+00 RANKS=4 THREADS=18
CSF-ALLOC=TWOMODE TILE=NO
CSF-STORAGE=2.86GB FACTOR-STORAGE=30.16MB
its = 1 (0.130s) fit = 0.12947 delta = +1.2947e-01
its = 2 (0.113s) fit = 0.25750 delta = +1.2803e-01
its = 3 (0.107s) fit = 0.29686 delta = +3.9361e-02
its = 4 (0.107s) fit = 0.30744 delta = +1.0581e-02
its = 5 (0.107s) fit = 0.31434 delta = +6.8988e-03
Final fit: 0.31434
Timing information ---------------------------------------------
TOTAL 21.211s
CPD 0.564s
****************************************************************
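As a quick reading of the three runs above, the CPD timings can be turned into speedup and parallel efficiency relative to the one-rank run. This is only arithmetic on the numbers printed in the logs; the helper name scaling is made up for the example.

```python
# CPD wall-clock seconds from the "Timing information" sections above,
# keyed by the number of MPI ranks.
cpd_seconds = {1: 2.008, 2: 0.991, 4: 0.564}


def scaling(times):
    """Return {ranks: (speedup, efficiency)} relative to the 1-rank time."""
    base = times[1]
    result = {}
    for ranks, seconds in sorted(times.items()):
        speedup = base / seconds
        result[ranks] = (speedup, speedup / ranks)
    return result


table = scaling(cpd_seconds)
for ranks, (speedup, efficiency) in table.items():
    print(f"ranks={ranks} speedup={speedup:.2f} efficiency={efficiency:.2f}")
```

With these numbers the speedup is about 2.03x on two ranks and about 3.56x on four, so the CPD phase scales close to linearly here while the fit values stay identical across runs.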
from splatt.
I will rebuild with a different setting for --with-mpi --type and for --blas-int=64. I have more than one compiler available, though I'm not sure this will help. It works fine with OpenMP. Here's the output from a run:
$ mpiexec -n 1 ./splatt cpd nell-2.tns -r 35
****************************************************************
splatt v2.0.0
Tensor information ---------------------------------------------
FILE=nell-2.tns
DIMS=12092x9184x28818 NNZ=76879419 DENSITY=2.402239e-05
COORD-STORAGE=2.29GB
MPI information ------------------------------------------------
DISTRIBUTION=MEDIUM DIMS=1x1x1
AVG NNZ=76879419
MAX NNZ=76879419 (0.00% diff)
AVG COMMUNICATION VOL=0
MAX COMMUNICATION VOL=0 (0.00% diff)
Factoring ------------------------------------------------------
NFACTORS=35 MAXITS=50 TOL=1.0e-05 REG=0.0e+00 RANKS=1 THREADS=1
CSF-ALLOC=TWOMODE TILE=NO
CSF-STORAGE=2.54GB FACTOR-STORAGE=13.38MB
its = 1 (4.865s) fit = 0.13946 delta = +1.3946e-01
SPLATT: Gram matrix is not SPD. Trying `GELSS`.
SPLATT: DGELSS returned 0
SPLATT: DGELSS effective rank: 35
its = 2 (5.053s) fit = 0.24575 delta = +1.0630e-01
SPLATT: Gram matrix is not SPD. Trying `GELSS`.
SPLATT: DGELSS returned 0
SPLATT: DGELSS effective rank: 35
its = 3 (5.006s) fit = 0.28344 delta = +3.7684e-02
SPLATT: Gram matrix is not SPD. Trying `GELSS`.
SPLATT: DGELSS returned 0
SPLATT: DGELSS effective rank: 35
its = 4 (4.977s) fit = 0.30169 delta = +1.8251e-02
from splatt.
For what it's worth, the message in your case is benign and the factorization is proceeding correctly. The DGELSS message just indicates that the Cholesky factorization failed and we are falling back to using the SVD to solve the linear systems.
The line indicating the return code and effective rank should be moved to a debugging level of verbosity.
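The fallback described above can be sketched in Python with NumPy. This is an illustration of the idea, not SPLATT's code: the function name solve_normal_equations is hypothetical, and np.linalg.lstsq is used as a stand-in for DGELSS (NumPy's routine is also an SVD-based least-squares solver, but it calls a different LAPACK driver).

```python
import numpy as np


def solve_normal_equations(gram, rhs):
    """Solve X @ gram = rhs for X, where gram is a symmetric R x R matrix.

    Tries a Cholesky factorization first. If gram is not symmetric
    positive definite, falls back to SVD-based least squares and also
    returns the effective rank that solver reports.
    """
    try:
        chol = np.linalg.cholesky(gram)  # raises LinAlgError if not SPD
    except np.linalg.LinAlgError:
        # gram is symmetric, so X @ gram = rhs is gram @ X.T = rhs.T.
        sol, _, rank, _ = np.linalg.lstsq(gram, rhs.T, rcond=None)
        return sol.T, "svd", int(rank)
    # gram = L @ L.T: solve L @ Y = rhs.T, then L.T @ X.T = Y.
    # (A general solver is used for brevity instead of triangular solves.)
    y = np.linalg.solve(chol, rhs.T)
    x_t = np.linalg.solve(chol.T, y)
    return x_t.T, "cholesky", gram.shape[0]


rhs = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

spd = np.array([[4.0, 1.0], [1.0, 3.0]])
x_spd, method_spd, rank_spd = solve_normal_equations(spd, rhs)

indefinite = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues 3 and -1
x_ind, method_ind, rank_ind = solve_normal_equations(indefinite, rhs)
```

The second matrix is symmetric and full rank but not positive definite, so Cholesky fails and the SVD path still returns an exact solution with full effective rank, which mirrors the "effective rank: 35" lines in the log.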
from splatt.
Thanks. I thought that was the case. I guess it interferes with the CPD timing, though maybe it's negligible. Anyway, thanks.
from splatt.
For most tensors (including NELL-2), that portion of ALS is negligible compared to the MTTKRP kernel.
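For readers unfamiliar with the kernel, here is a minimal pure-Python sketch of MTTKRP for the first mode of a three-mode tensor stored as coordinates. It is a naive reference version for illustration, not how SPLATT computes it (the logs above show SPLATT working on CSF storage); the function name mttkrp_mode0 and the 0-based entry format are assumptions made for the example.

```python
def mttkrp_mode0(entries, dims, B, C):
    """Matricized tensor times Khatri-Rao product for mode 0.

    entries: list of (i, j, k, value) nonzeros with 0-based indices.
    dims:    (I, J, K) lengths of the three modes.
    B, C:    factor matrices for modes 1 and 2 as lists of rows,
             shaped J x R and K x R.
    Returns an I x R list of rows M with
    M[i][r] = sum over nonzeros (i, j, k) of value * B[j][r] * C[k][r].
    """
    rank = len(B[0])
    out = [[0.0] * rank for _ in range(dims[0])]
    for i, j, k, value in entries:
        row = out[i]
        for r in range(rank):
            # Every nonzero touches all R columns of the output row.
            row[r] += value * B[j][r] * C[k][r]
    return out


entries = [(0, 0, 0, 2.0), (0, 1, 1, 3.0), (1, 0, 1, 1.0)]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
M = mttkrp_mode0(entries, (2, 2, 2), B, C)
```

The loop does work proportional to the number of nonzeros times the rank, which for NELL-2 (about 76.9 million nonzeros at rank 35) is far more than solving linear systems with a 35 x 35 Gram matrix.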
Can you share what compiler, MPI implementation, etc. are being used? The output of your configure would be helpful to diagnose the underlying problem.
from splatt.