Comments (21)

staticfloat commented on June 4, 2024

It looks to me like LBT should already know how to deal with BLIS.

There is a completely generic way in which you can register get/set_numthreads functions for your own BLAS library, but BLIS should already be handled natively.

from blis.jl.

carstenbauer commented on June 4, 2024

Thanks for the info, that's good to know. But what if I have multiple BLAS/LAPACK libraries stacked on top of each other? Unless I'm missing something, BLAS.get_num_threads/BLAS.set_num_threads don't allow me to specify the library. Do we need to extend the API here, or is there another way to access the registered get/set_num_threads functions?

UPDATE: According to the docstrings for lbt_get_num_threads/lbt_set_num_threads, they should get/set the number of threads of all libraries at the same time. But that doesn't seem to be the case?

julia> using LinearAlgebra

julia> BLAS.get_num_threads()
8

julia> using blis_jll

julia> BLAS.lbt_forward(blis; clear=false)
157

julia> BLAS.get_num_threads()
8

julia> blis_get_num_threads() = @ccall blis.bli_thread_get_num_threads()::Cint;

julia> blis_set_num_threads(nthreads) = @ccall blis.bli_thread_set_num_threads(nthreads::Cint)::Cvoid;

julia> blis_get_num_threads()
-1

julia> blis_set_num_threads(2)

julia> blis_get_num_threads()
2

julia> BLAS.get_num_threads()
8

julia> BLAS.set_num_threads(3)

julia> blis_get_num_threads()
2

carstenbauer commented on June 4, 2024

FYI: https://github.com/carstenbauer/BLISBLAS.jl

jd-foster commented on June 4, 2024

The issue observed above (#3 (comment)) should be fixed with the latest update to the Yggdrasil recipe (JuliaPackaging/Yggdrasil#7448).
@carstenbauer As verification, it seems to work now in tandem with the direct calls wrapped in BLISBLAS.jl:

julia> import BLISBLAS
[ Info: Precompiling BLISBLAS [6f275bd8-fec0-4d39-945b-7e95a765fa1e]

julia> using LinearAlgebra

julia> BLAS.get_num_threads()
6

julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
├ [ILP64] libopenblas64_.0.3.21.dylib
└ [ILP64] libblis.4.0.0.dylib

julia> BLAS.set_num_threads(42)

julia> BLISBLAS.get_num_threads()
42

xrq-phys commented on June 4, 2024

Hi.

Thanks for reaching out.
I'm not familiar with libblastrampoline, but what I want to highlight is that BLIS provides a more flexible API than standard BLAS (e.g. generic strides and mixed precision), and I want to make use of it.

At the moment, simply substituting the backend seems insufficient in that sense.

ViralBShah commented on June 4, 2024

Right, BLIS provides a more flexible API. We should also be able to provide a way for BLIS to replace the underlying Julia BLAS with only one line of code. I will try this out and report findings - but we first need to do some more work on the LAPACK front.

xrq-phys commented on June 4, 2024

I once mimicked MKL.jl and created this toy.

I could put the switcher code directly inside this repo, but trying out libblastrampoline seems more interesting.

ViralBShah commented on June 4, 2024

Basically, lbt_forward in current Julia master (1.7-dev) allows you to switch the underlying BLAS for all routines to a new one, such as MKL or potentially BLIS, without having to rebuild the system image.

The only thing is that both OpenBLAS and MKL provide the full LAPACK, but when we use BLIS, we probably want to compile our own LAPACK from source and provide it in BinaryBuilder.

cc @staticfloat

xrq-phys commented on June 4, 2024

ViralBShah commented on June 4, 2024

In order to use BLIS in Julia, we will just have LAPACK link to BLIS' BLAS API through LBT. Then all packages that need BLAS can use BLIS and we can see how it performs.

Separately this package and FLAME.jl in the future can explore further capabilities as you articulate.

ViralBShah commented on June 4, 2024

JuliaPackaging/Yggdrasil#2657

carstenbauer commented on June 4, 2024

Just wanted to drop a +1 here. Getting MKL via a simple using MKL is awesome. Would be great for BLIS too!

carstenbauer commented on June 4, 2024

FWIW,

using blis_jll
using LinearAlgebra
BLAS.lbt_forward(blis)

seems to work nicely (except that the remaining LAPACK doesn't use / link against BLIS, as mentioned by @ViralBShah above, see also here). I would really like to have an MKL.jl-like package for BLIS that does this simple BLAS switching via LBT. As I understand it from the comments above, the package here (BLIS.jl) currently has a different goal / approach. Is this correct (@xrq-phys)? Should I therefore create a new package, say, "BLISBLAS.jl"?

Side comment: I realized that for the stacked OpenBLAS + BLIS (example above) the function BLAS.set_num_threads(N) sets the number of OpenBLAS threads. Is there a way to also set the BLIS threads or, more generally, the threads of a specific BLAS/LAPACK in the LBT stack (cc @staticfloat)? For now I use

blis_get_num_threads() = @ccall blis.bli_thread_get_num_threads()::Cint
blis_set_num_threads(nthreads) = @ccall blis.bli_thread_set_num_threads(nthreads::Cint)::Cvoid

xrq-phys commented on June 4, 2024

@carstenbauer I think LBT's failure to set the number of threads is due to this line. libblastrampoline appends the 64_ suffix to all library subroutines, not just BLAS ones, while BLIS is built with the suffix only on the latter.

xrq-phys commented on June 4, 2024

Sorry, not really.

BLIS DOES have a 64_ suffix, but it takes the form bli_thread_set_num_threads_64_ instead of bli_thread_set_num_threads64_.

I suppose in this case we should amend libblastrampoline, since BLIS in the 32-bit case also yields bli_thread_set_num_threads_.

staticfloat commented on June 4, 2024

You can teach LBT about your thread function name with the following Julia code:

julia> using Libdl, blis_jll, libblastrampoline_jll
       getter = Libdl.dlsym(blis_jll.blis_handle, "bli_thread_get_num_threads_64_")
       setter = Libdl.dlsym(blis_jll.blis_handle, "bli_thread_set_num_threads_64_")
       @ccall libblastrampoline.lbt_register_thread_interface(getter::Ptr{Cvoid}, setter::Ptr{Cvoid})::Cvoid

Note that the 32-bit version of BLIS calls its thread setter function bli_thread_set_num_threads; no trailing underscore. I think there may be a small naming incongruity here.

EDIT: Whoops, I mis-read my own API, this code chunk is wrong.

xrq-phys commented on June 4, 2024

Sorry I made a mistake.

In BLIS only the setter has an F77 interface:

  • bli_thread_set_num_threads_64_ for 64-bit.
  • bli_thread_set_num_threads_ for 32-bit.

while bli_thread_set_num_threads is provided as a C interface. So there's no incongruity here.

The problem is that bli_thread_get_num_threads doesn't have an F77-style counterpart, i.e. it is only accessible via the C calling convention.

Another issue: while Julia deploys 64-bit BLAS by default, the thread-count setter always passes in 32-bit integers. In contrast, bli_thread_set_num_threads_ is LP64/ILP64 aware. I fear that the upper 32 bits of the value passed in by lbt_set_num_threads() would break the library. The thread-setting routine used by lbt_set_num_threads is void (int) while bli_thread_set_num_threads_ is an F77 interface void (int *), while the C interface bli_thread_set_num_threads takes 64-bit integers instead of 32-bit ones.

Btw line#14 and line#21 seem to have reversed setter and getter.

xrq-phys commented on June 4, 2024

Perhaps, at least the generic registration method should support thread-num setter with and without the 64_ extension, while preferring the one with an extension.

staticfloat commented on June 4, 2024

> In BLIS only the setter has an F77 interface:
>
>   • bli_thread_set_num_threads_64_ for 64-bit.
>   • bli_thread_set_num_threads_ for 32-bit.
>
> while bli_thread_set_num_threads is provided as a C interface. So there's no incongruity here.

I'm a little confused here; is bli_thread_set_num_threads supposed to have a trailing underscore or not? Here's what I see from the blis_jll that I can download right now:

julia> using blis_jll
       run(`/bin/bash -c "nm $(blis_jll.blis_path) | grep bli_thread_set_num_threads"`)
0000000000a95520 T bli_thread_set_num_threads
0000000000a703e0 T bli_thread_set_num_threads_64_

So what I see here is that one symbol has no trailing underscore, whereas another does have the trailing underscore. I call this a trailing underscore because the ILP64 symbol suffix that the BLIS library uses (as detected by LBT) is 64_. You can see this with the following:

julia> using LinearAlgebra, blis_jll
       BLAS.lbt_forward(blis_jll.blis_path; verbose=true)
Generating forwards to /home/sabae/.julia/artifacts/b548e034d149feec83ed78f22ab942fea1ac3d12/lib/libblis.so
 -> Autodetected symbol suffix "64_"
 -> Autodetected interface ILP64 (64-bit)
 -> Autodetected gfortran calling convention
Processed 4945 symbols; forwarded 157 symbols with 64-bit interface and mangling to a suffix of "64_"
157

This symbol suffix is detected by probing for a few F77 names with a few suffixes, and if we look at the names for those symbols that are exported from BLIS:

julia> using blis_jll
       run(`/bin/bash -c "nm $(blis_jll.blis_path) | grep isamax"`);
0000000000a61340 T isamax_64_

We see that the canonical name isamax_ has 64_ suffixed to it. Now, for consistency's sake (and to allow for loading of libraries that export BOTH ILP64 and LP64 interfaces in a single .so!) LBT expects all exported names to follow a consistent naming rule, which is that the "canonical" names (whether C or FORTRAN) are suffixed reliably. This means that, for instance, if your LP64 symbol is called bli_thread_set_num_threads, then the ILP64 symbol is named bli_thread_set_num_threads64_. Otherwise, LBT has no hope of automatically finding all the different symbols. This is what I mean when I say that there is a symbol naming inconsistency.
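
To make the naming rule concrete, here is a minimal sketch in plain Julia. The helper `expected_symbol` is hypothetical and not part of LBT; it only models the "canonical name plus suffix" rule described above, and the symbol names are the ones quoted in this thread.

```julia
# Hypothetical helper, NOT an LBT function: it models the rule that the ILP64
# name is the canonical (LP64) name with the detected suffix appended verbatim.
expected_symbol(canonical::AbstractString, suffix::AbstractString) = canonical * suffix

suffix = "64_"  # the suffix autodetected for this BLIS build (see the verbose output)

# F77 BLAS symbol: the canonical name already ends in an underscore.
blas_name = expected_symbol("isamax_", suffix)   # "isamax_64_", which BLIS exports

# C-style thread setter: the canonical name has no trailing underscore.
expected = expected_symbol("bli_thread_set_num_threads", suffix)
exported = "bli_thread_set_num_threads_64_"      # name seen in the nm output

println(expected)               # bli_thread_set_num_threads64_
println(expected == exported)   # false: the exported name has an extra underscore
```

Under this rule, the BLAS symbols line up while the thread setter does not, which is why an automatic lookup by name would miss it.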

> The thread-setting routine used by lbt_set_num_threads is void (int) while bli_thread_set_num_threads_ is an F77 interface void (int *), while the C interface bli_thread_set_num_threads takes 64-bit integers instead of 32-bit ones.

Are you using a different version of libblis than I am? I do not have both bli_thread_set_num_threads and bli_thread_set_num_threads_ in my version. I'm using v0.9.0+0 of the JLL. In any case, if there were a C interface that takes 64-bit integers that's fine, as C passes arguments through registers, so when we pass a 32-bit integer it gets zero-extended. The FORTRAN interface would indeed be a problem though.

> Btw line#14 and line#21 seem to have reversed setter and getter.

Good catch! Swapped in JuliaLinearAlgebra/libblastrampoline@145bb64

> Perhaps, at least the generic registration method should support thread-num setter with and without the 64_ extension, while preferring the one with an extension.

The generic registration method doesn't pay any attention to names; it relies on you to do the dlsym() manually, then just pass in raw function pointer addresses. So you can do what I mentioned in the code snippet in my previous message and use that directly (with the C interface version of the symbols) and things should "just work".

xrq-phys commented on June 4, 2024

This line seems to work only with strings?

xrq-phys commented on June 4, 2024

@staticfloat To answer your question: the current BLIS configuration builds bli_thread_set_num_threads_ for 32-bit machines and bli_thread_set_num_threads_64_ for 64-bit machines, while bli_thread_set_num_threads (the one without an underscore) is always built as a BLIS-defined C interface.

Anyway, since libblastrampoline does not pass in pointers, I'd stick to bli_thread_set_num_threads without an underscore and manually create a bli_thread_set_num_threads64_ counterpart.
