Comments (9)

grisuthedragon commented on June 2, 2024

Fixed in version 3.0.4

from flexiblas.

Enchufa2 commented on June 2, 2024

More findings about the issue:

  • If I perform an SVD decomposition with R on Fedora 33, it runs in parallel correctly.
  • If I install FlexiBLAS on Fedora 32 (with this repo) and numpy, then I run the benchmark with LD_PRELOAD=/lib64/libflexiblas.so.3, decompositions run in parallel correctly.

So this seems to happen only when numpy is built against FlexiBLAS. But AFAICT, it is correctly linked, and the output of ldd for the different numpy .so files seems sane to me. So I'm a bit lost here.

grisuthedragon commented on June 2, 2024

That sounds strange. I recall that numpy's setup.py procedure not only searches for a BLAS library, it also looks for some special symbols in the library to identify whether it is ATLAS, GotoBLAS, OpenBLAS, MKL, etc. I think we need to look into the build process of the numpy package. Are the build logs for fc33 available somewhere?

lupinix commented on June 2, 2024

That's a good pointer. It looks like flexiblas is configured using the openblas options in Fedora's numpy build: https://src.fedoraproject.org/rpms/numpy/blob/master/f/numpy.spec#_110

Enchufa2 commented on June 2, 2024

It is configured using the openblas key, because there's no specific key for flexiblas upstream and we thought it would be best if Numpy just thinks that it's using OpenBLAS, but this configuration points to libflexiblas (at the top of the spec, blaslib is set to flexiblas). In the build log, you can check that the flag being used is -lflexiblas, and libflexiblas.so.3()(64bit) is correctly listed as Requires at the end of the build.

Enchufa2 commented on June 2, 2024

Also, for Fedora 33, we have:

# ldd /usr/lib64/python3.9/site-packages/numpy/linalg/lapack_lite.cpython-39-x86_64-linux-gnu.so 
        linux-vdso.so.1 (0x00007ffcb4d80000)
        libflexiblas.so.3 => /lib64/libflexiblas.so.3 (0x00007f1291ceb000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1291b1f000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f12919d9000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f12919d2000)
        libgfortran.so.5 => /lib64/libgfortran.so.5 (0x00007f1291717000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f12916f5000)
        libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007f12916a9000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f129209e000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f129168e000)

For Fedora 32 (linked against openblas-threads), we have:

# ldd /usr/lib64/python3.8/site-packages/numpy/linalg/lapack_lite.cpython-38-x86_64-linux-gnu.so 
        linux-vdso.so.1 (0x00007fff9a9fa000)
        libopenblasp.so.0 => /lib64/libopenblasp.so.0 (0x00007f0ce9b92000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f0ce99c8000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f0ce9883000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f0ce9861000)
        libgfortran.so.5 => /lib64/libgfortran.so.5 (0x00007f0ce9596000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0cebfd4000)
        libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007f0ce954c000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f0ce952f000)

The presence of libdl.so.2 is the only difference, besides some reordering. I checked SciPy too, and we have the same problem for its SVD function (which was to be expected, because they have very similar configuration and build systems).

lupinix commented on June 2, 2024

Maybe using the build-time openblas key changes something internally?

grisuthedragon commented on June 2, 2024

According to numpy's documentation, they use dgesdd from LAPACK to compute the SVD. In the benchmark, the SVD is called such that both the singular values and the singular vectors are returned. I created a minimal working example to test this routine here: https://gist.githubusercontent.com/grisuthedragon/8956628ddde9af7505fb8c0d5c799c1e/raw/6d595f8006ef1129d16b60e82b4014e5e5a8d7ad/test_dgesdd.f90

Then I did the following on a FC33 VM / 4 Cores / Intel Sandybridge:

[root@fedora33 ~]# gfortran test_dgesdd.f90  -lflexiblas -o test_dgesdd.flexiblas
[root@fedora33 ~]# gfortran test_dgesdd.f90  -lopenblaso-r0.3.10 -o test_dgesdd.openblaso 
[root@fedora33 ~]# gfortran test_dgesdd.f90  -lopenblas-r0.3.10 -o test_dgesdd.openblass
[root@fedora33 ~]# gfortran test_dgesdd.f90  -llapack -lblas -o test_dgesdd.netlib

[root@fedora33 ~]# FLEXIBLAS=OPENBLAS-OPENMP ./test_dgesdd.flexiblas 
 Time =    1.4351626060000000     
[root@fedora33 ~]# FLEXIBLAS=OPENBLAS-SERIAL ./test_dgesdd.flexiblas 
 Time =    2.8791492609999998     
[root@fedora33 ~]# ./test_dgesdd.openblaso
 Time =    1.4242952959999999     
[root@fedora33 ~]# ./test_dgesdd.openblass
 Time =    2.9305672999999999     
[root@fedora33 ~]# ./test_dgesdd.netlib
 Time =    18.114536298000001

Looking at the runtimes we see that, as expected, FlexiBLAS with OpenBLAS and native OpenBLAS are similar, and the Netlib time is, also as expected, much higher. From this point of view it seems that in the Numpy case, FlexiBLAS falls back to its internal NETLIB fallback interface. I think it has something to do with the way Python and Numpy load their shared-library extension modules. Unusual flags passed to dlopen could be the reason. In R, as @Enchufa2 tested, this isn't a problem. I remember from my first developments with FlexiBLAS (around version 1 in 2013) that a colleague sometimes got a "symbol not found" error when using FlexiBLAS with Numpy.
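
To put numbers on that comparison, here is a small sketch that expresses each run time above as a factor of the fastest one. The labels are just descriptive names chosen for this example; the times are copied from the transcript. A NumPy timing that sits closer to the Netlib factor than to the OpenBLAS ones would be consistent with the fallback hypothesis, though it would not prove it.

```python
# Wall-clock times in seconds, copied from the runs above.
TIMES = {
    "flexiblas (OPENBLAS-OPENMP)": 1.4351626060000000,
    "flexiblas (OPENBLAS-SERIAL)": 2.8791492609999998,
    "openblas openmp (direct)": 1.4242952959999999,
    "openblas serial (direct)": 2.9305672999999999,
    "netlib": 18.114536298000001,
}


def slowdown_table(times):
    """Map each configuration to its run time relative to the fastest one."""
    fastest = min(times.values())
    return {name: t / fastest for name, t in times.items()}


if __name__ == "__main__":
    for name, factor in slowdown_table(TIMES).items():
        print(f"{name:30s} {factor:6.2f}x")
```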

I will take a deeper look at this during the week. But if anybody has ideas, especially on how to monitor what Python is doing under the hood, I am interested.
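
One low-tech way to see part of what Python does under the hood is to list the shared objects mapped into the interpreter after numpy has been imported. The sketch below is only an illustration and assumes Linux with procfs; the name pattern is a guess at typical BLAS/LAPACK library names and the function names are made up for this example. It shows which libraries were loaded, not which symbol a given call resolves to.

```python
import re
from pathlib import Path

# Substrings that usually identify BLAS/LAPACK-related shared objects.
# This pattern is an assumption; extend it for other backends.
BLAS_PATTERN = re.compile(r"blas|lapack|mkl|atlas|blis", re.IGNORECASE)


def parse_maps(text, pattern=BLAS_PATTERN):
    """Return sorted, de-duplicated file paths from /proc/<pid>/maps text
    whose file name matches the pattern."""
    paths = set()
    for line in text.splitlines():
        fields = line.split(None, 5)
        # The sixth field, when present, is the backing file path.
        if len(fields) == 6 and fields[5].startswith("/"):
            path = fields[5].strip()
            if pattern.search(Path(path).name):
                paths.add(path)
    return sorted(paths)


def loaded_blas_libraries(maps_file="/proc/self/maps"):
    """List matching shared objects mapped into the current process."""
    try:
        return parse_maps(Path(maps_file).read_text())
    except OSError:
        return []  # no readable procfs here (for example, not Linux)


if __name__ == "__main__":
    # Import numpy (or scipy) before this call to see which backend got loaded.
    for lib in loaded_blas_libraries():
        print(lib)
```

Running it once right after `import numpy` and once after a first call to an SVD routine would show whether additional backend libraries appear only after FlexiBLAS initializes.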

grisuthedragon commented on June 2, 2024

I think I have found the reason. Python loads all shared objects with RTLD_LAZY | RTLD_LOCAL, and thus the first call to a LAPACK routine picks the first symbol in the GOT that fits the requirements. FlexiBLAS first integrates all fallback symbols into the global offset table during initialization, while the remaining ones (like everything from BLAS) are not looked up yet, so the first call from Python/Numpy to LAPACK picks the fallback we loaded into the GOT. I am currently developing a fix for this.

A quick fix would be to replace the RTLD_GLOBAL flags in src/flexiblas.c / flexiblas_init with RTLD_LOCAL, but some first tests show that this is only half of the story.
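
For anyone who wants to check the flags on their own interpreter, here is a minimal sketch that decodes what `sys.getdlopenflags()` reports. The helper name and the list of flags it looks for are choices made for this example, and the reported value may differ between Python versions and platforms, so treat the output as a data point rather than a confirmation of the analysis above.

```python
import os
import sys

# Flag names to look for; which constants exist depends on the platform.
_FLAG_NAMES = (
    "RTLD_LAZY",
    "RTLD_NOW",
    "RTLD_GLOBAL",
    "RTLD_NODELETE",
    "RTLD_NOLOAD",
    "RTLD_DEEPBIND",
)


def decode_dlopen_flags(flags):
    """Return the names of the RTLD_* bits that are set in flags."""
    names = []
    for name in _FLAG_NAMES:
        value = getattr(os, name, 0)
        if value and (flags & value) == value:
            names.append(name)
    # On glibc RTLD_LOCAL is 0, so it is reported here as the
    # absence of RTLD_GLOBAL (an assumption that fits Linux).
    if "RTLD_GLOBAL" not in names:
        names.append("RTLD_LOCAL")
    return names


if __name__ == "__main__":
    # Flags the interpreter passes to dlopen() for extension modules.
    print(decode_dlopen_flags(sys.getdlopenflags()))
```

As an experiment only, `sys.setdlopenflags(os.RTLD_NOW | os.RTLD_GLOBAL)` before `import numpy` changes how the extension modules are loaded, which could help confirm whether symbol visibility is involved.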
