Comments (10)

sundy-li commented on August 14, 2024

Ok, so is it possible to change the non-null case branch to an auto-vectorized version?
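For context, a minimal sketch of what an auto-vectorization-friendly non-null sum could look like. This is not arrow2's actual code: the function name `sum_nonnull`, the `LANES` constant and the choice of `f32` are assumptions made for illustration. The idea is to keep several independent accumulators so the compiler is free to map the inner loop onto SIMD lanes.

```rust
// Hypothetical lane count; 8 x f32 matches a 256-bit register, but any
// small power of two illustrates the idea.
const LANES: usize = 8;

/// Sums a slice that is known to contain no nulls (hypothetical helper).
pub fn sum_nonnull(values: &[f32]) -> f32 {
    // One accumulator per lane, so the additions in the inner loop are
    // independent of each other and can be vectorized.
    let mut acc = [0.0f32; LANES];

    let chunks = values.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are summed separately.
    let remainder = chunks.remainder();

    for chunk in chunks {
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }

    // Combine the lanes, then add the leftover elements one by one.
    let mut total: f32 = acc.iter().sum();
    for &v in remainder {
        total += v;
    }
    total
}
```

Whether the compiler actually emits SIMD instructions for this depends on the optimization level and the enabled target features, so it is worth checking the generated assembly.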

from arrow2.

Dandandan commented on August 14, 2024

Hey @leiysky

Yes, we use the multiversion crate right now for achieving auto-vectorization with specific SIMD instructions.

See for an example here:
https://github.com/jorgecarleitao/arrow2/blob/main/src/compute/aggregate/sum.rs#L22

Dandandan commented on August 14, 2024

Yes, we use the multiversion crate right now for achieving auto-vectorization with specific SIMD instructions.

To utilize avx at runtime. This still has to be compiled with a machine that supports this right? Or can it cross compile for different targets?

Rust can cross-compile to a different target architecture if you like, but the extra versions are only generated when compiling for the specified target. E.g. x86_64+avx creates additional compiled versions only when the target is x86_64, not for aarch64 or x86 (it wouldn't make sense, as the code would be invalid there).
When the target matches, the binary includes multiple versions and detects the CPU features at the first call of the function.
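A rough sketch of the mechanism described above, written with the standard library only rather than the multiversion crate, so it illustrates the idea and not the crate's API. The names `sum`, `select_sum`, `sum_fallback` and `sum_avx2` are hypothetical, and `avx2` is used here because the example sums integers.

```rust
use std::sync::OnceLock;

type SumFn = fn(&[u32]) -> u64;

// Portable version, compiled with the baseline features of the target.
fn sum_fallback(values: &[u32]) -> u64 {
    values.iter().map(|&v| u64::from(v)).sum()
}

// Same body, but the compiler may use AVX2 instructions when optimizing it.
// This version only exists when compiling for x86 or x86_64.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2_impl(values: &[u32]) -> u64 {
    values.iter().map(|&v| u64::from(v)).sum()
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn sum_avx2(values: &[u32]) -> u64 {
    // Safety: `select_sum` only returns this function after the runtime
    // check confirmed that the CPU supports AVX2.
    unsafe { sum_avx2_impl(values) }
}

// Picks the best version available on the CPU the program is running on.
fn select_sum() -> SumFn {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return sum_avx2;
        }
    }
    sum_fallback
}

/// Public entry point: feature detection runs on the first call only,
/// later calls reuse the cached function pointer.
pub fn sum(values: &[u32]) -> u64 {
    static SELECTED: OnceLock<SumFn> = OnceLock::new();
    let selected = *SELECTED.get_or_init(select_sum);
    selected(values)
}
```

On a target other than x86 or x86_64 the `cfg` attributes remove the AVX2 version entirely and `sum` always uses the fallback, which mirrors the point that the extra versions are only generated for the matching target.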

jorgecarleitao commented on August 14, 2024

yeap, it has been a battle. I actually have not used packed_simd for a while, but the null case was so important and I was unable to hit the right instructions, and so ended up adding it.

jorgecarleitao commented on August 14, 2024

it is faster; it is simpler => definitely :)

leiysky commented on August 14, 2024

Hi @jorgecarleitao, I have a question about the compatibility of vectorization here.

There are many SIMD instruction set extensions (e.g. SSE, AVX, FMA), and which ones are available depends on the microarchitecture (e.g. Intel Skylake, AMD Zen 2). If we only do simple cross-compilation, that is, only specify the target architecture, we may not make good use of SIMD.

AFAIK, this issue is usually solved by function multiversioning.

In the C++ world, there are approaches like the GCC target attribute, which can generate multiple versions of a function (typically for different SIMD instruction sets) and dispatch between them at load time.

I also noticed that there is a multiversion crate, https://docs.rs/multiversion/0.6.1/multiversion/, but I haven't tested it yet.

Is it possible to support this in arrow2?

leiysky commented on August 14, 2024

Hey @leiysky

Yes, we use the multiversion crate right now for achieving auto-vectorization with specific SIMD instructions.

See for an example here:

https://github.com/jorgecarleitao/arrow2/blob/main/src/compute/aggregate/sum.rs#L22

Nice!

I only read the code here, and it seems there is no special handling.

https://github.com/jorgecarleitao/arrow2/blob/main/src/compute/arithmetics/basic/add.rs

Sorry for my misunderstanding.

sundy-li commented on August 14, 2024

I only read the code here, and find there seems no special handling.

I had some doubts before; I think it may not work on a platform without AVX support, but I have not tested it.

ritchie46 commented on August 14, 2024

Yes, we use the multiversion crate right now for achieving auto-vectorization with specific SIMD instructions.

To utilize avx at runtime. This still has to be compiled with a machine that supports this right? Or can it cross compile for different targets?

leiysky commented on August 14, 2024

Yes, we use the multiversion crate right now for achieving auto-vectorization with specific SIMD instructions.

To utilize avx at runtime. This still has to be compiled with a machine that supports this right? Or can it cross compile for different targets?

Multiversioning allows you to define targets (e.g. avx, sse) for a function; the compiler then always produces the specified versions of the function and dispatches between them at load time (not at runtime, which is how it can achieve zero overhead).
