safe_arch's Introduction

safe_arch

Exposes arch-specific intrinsics as safe functions.

  • SIMD types are newtype'd (with a pub field) and given appropriate trait impls such as From, Into, Default, etc.
  • Each intrinsic gets either a function or macro so that you can safely use it as directly as possible.
    • Functions are used when all arguments are runtime arguments.
    • Macros are used when one of the arguments must be a compile time constant, because Rust doesn't let you "pass through" compile time constants.
  • There are hundreds and hundreds of intrinsics, so the names of functions and macros tend to be long and specific, because there are often many similar ways to do nearly the same thing.
    • This crate isn't really intended for "everyday users". It is intended to be an "unopinionated" middle layer crate that just provides the safety. Higher level abstractions should mostly come from some other crate that wraps over this crate.

All function and macro availability is done purely at compile time via #[cfg()] attributes on the various modules. If a CPU feature isn't enabled for the build then those functions or macros won't be available. If you'd like to determine what CPU features are available at runtime and then call different code accordingly, this crate is not for you.

See the crate docs for more details.
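
For orientation, here's a minimal sketch of how the crate is meant to be used. The names here (`add_m128` and the `From`/`Into` array conversions) are assumed from memory and may not match the crate exactly; the `sse` feature is assumed to be enabled at build time.

```rust
// Minimal usage sketch; names assumed, `target_feature = "sse"` assumed.
use safe_arch::*;

fn main() {
  let a = m128::from([1.0_f32, 2.0, 3.0, 4.0]);
  let b = m128::from([5.0_f32, 6.0, 7.0, 8.0]);
  // A plain function is used because every argument is a runtime value.
  let c = add_m128(a, b);
  let out: [f32; 4] = c.into();
  assert_eq!(out, [6.0, 8.0, 10.0, 12.0]);
}
```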

safe_arch's Issues

[neon] General ARM/Neon tracking issue.

Obviously we want to support the Neon intrinsics for ARM.

  • The current main blocker is that none of the Neon intrinsics are in Stable, and most of them aren't even in Nightly.

[avx][avx2] Operator overloads for 256-bit types

This will be basically like the 128-bit versions.

  • Remember that the operator overload impl goes in the module that provides the function it uses, not in the module that defines the type it works on. In other words, the BitAnd impl for m128 goes into the sse module, not in the m128_ module.
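
A sketch of what that placement looks like in practice (the wrapper name `bitand_m128` is assumed, not quoted from the crate):

```rust
// Lives in the sse module (next to the wrapper it calls), not in the
// module that defines the `m128` type itself.
impl core::ops::BitAnd for m128 {
  type Output = Self;
  #[inline(always)]
  fn bitand(self, rhs: Self) -> Self {
    // assumed name for the safe wrapper over `_mm_and_ps`
    bitand_m128(self, rhs)
  }
}
```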

widen mul

We should be more precise about how mul_i64_widen_low_bits_m128i works: specifically, whether it's a strict i32-to-i64 multiply using the odd numbered lanes, and what happens if the i64 lanes hold values outside the i32 range.

It's probably simple, but we should document it.
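
Assuming the function wraps `_mm_mul_epi32`, the Intel guide says only the low signed 32 bits of each 64-bit lane are read, so out-of-range "i64" inputs contribute just their low 32 bits. A quick check along those lines, in terms of the raw intrinsic:

```rust
// Assumes the wrapped intrinsic is `_mm_mul_epi32` (sse4.1).
use core::arch::x86_64::*;

#[cfg(target_feature = "sse4.1")]
unsafe fn widen_mul_demo() {
  // Lane 0 holds i64::MAX; only its low 32 bits (-1 as i32) are used.
  let a = _mm_set_epi64x(3, i64::MAX);
  let b = _mm_set_epi64x(4, 2);
  let out: [i64; 2] = core::mem::transmute(_mm_mul_epi32(a, b));
  assert_eq!(out, [-2, 12]);
}
```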

[avx2] gather operations

At first glance it seems like these can't be made safe and also totally zero-runtime-cost in an ergonomic way. I'm open to suggestions.

I hear these requested often enough when people talk about avx2 that, if we can't do it safely with zero runtime cost, I think it would be permissible to expose versions that do some runtime checking, as long as they're clearly marked as such (see the sketch after the list below).

  • _mm_i32gather_epi32
  • _mm_i32gather_epi64
  • _mm_i32gather_pd
  • _mm_i32gather_ps
  • _mm_i64gather_epi32
  • _mm_i64gather_epi64
  • _mm_i64gather_pd
  • _mm_i64gather_ps
  • _mm_mask_i32gather_epi32
  • _mm_mask_i32gather_epi64
  • _mm_mask_i32gather_pd
  • _mm_mask_i32gather_ps
  • _mm_mask_i64gather_epi32
  • _mm_mask_i64gather_epi64
  • _mm_mask_i64gather_pd
  • _mm_mask_i64gather_ps
  • _mm256_i32gather_epi32
  • _mm256_i32gather_epi64
  • _mm256_i32gather_pd
  • _mm256_i32gather_ps
  • _mm256_i64gather_epi32
  • _mm256_i64gather_epi64
  • _mm256_i64gather_pd
  • _mm256_i64gather_ps
  • _mm256_mask_i32gather_epi32
  • _mm256_mask_i32gather_epi64
  • _mm256_mask_i32gather_pd
  • _mm256_mask_i32gather_ps
  • _mm256_mask_i64gather_epi32
  • _mm256_mask_i64gather_epi64
  • _mm256_mask_i64gather_pd
  • _mm256_mask_i64gather_ps
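
As one possible starting point, here's a sketch of what a runtime-checked wrapper could look like. Everything here is hypothetical, not an existing safe_arch API, and the const-generic `SCALE` form of the `core::arch` gather intrinsic is assumed:

```rust
// Hypothetical runtime-checked gather; NOT an existing safe_arch function.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
pub fn i32_gather_checked(data: &[i32], indices: [i32; 4]) -> [i32; 4] {
  use core::arch::x86_64::*;
  // The runtime cost under discussion: every index gets bounds-checked.
  assert!(indices.iter().all(|&i| i >= 0 && (i as usize) < data.len()));
  unsafe {
    let idx = _mm_set_epi32(indices[3], indices[2], indices[1], indices[0]);
    // SCALE = 4 bytes per i32 element.
    let v = _mm_i32gather_epi32::<4>(data.as_ptr(), idx);
    core::mem::transmute(v)
  }
}
```

A zero-runtime-cost alternative might take a fixed-size array reference instead of a slice, but that's exactly the ergonomics question this issue raises.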

Sort out all the "convert" operations

We should inventory all the "convert" operations.

If necessary, we can come up with some sort of convention, or possibly more than one convention, and rename them for consistency.

Rename `_low` to `_s`

The low-lane-only operations should get a better name; _low already kinda clashes with the unpack low and unpack high stuff.

_s was suggested for "scalar", which is as good a name as any.

[avx] testc, testnzc, testz

I skipped them the first time around because they seem borderline useless anyway, but technically they should go in I guess.

Remove super::* imports

It may be a good idea to replace super::* imports with imports of the related types and core::arch::*.

[avx512] General AVX-512 tracking issue

There's a few problems with adding AVX-512 support:

  • I don't have a device to develop on that even supports AVX-512 (for testing).
  • Within Rust, it's all Nightly-only.

So (for now) we're blocked on adding avx512, but it'd be nice to have "some day".

This wouldn't be a 1.0 blocker, if we got to a state where we were otherwise 1.0 ready.

Put more verbs in the glossary

For any operation name that isn't already so common that it's in the Rust core lib (eg: add and sqrt), we should put it into the verb glossary.

And if it gets big enough, we might want to put the verb glossary on its own page.

[cpuid] Let the user perform runtime checks for CPU features

Whether or not the user enables features at compile time, it's helpful as an extra sanity check to be able to verify at program start that the features really are available, and maybe print an error message or something if the actual CPU can't handle what you're doing.


  • First call __get_cpuid_max(0) and check ret.0 for the max leaf.
  • If a leaf has sub-leaves you need to know the max of, call __get_cpuid_max(leaf) and check ret.1 for that max.
  • Once you know your limits, particular features can be checked for by getting the info for a leaf and checking the bits of a particular return register. Which bit you need to look for, in what register, in what leaf, is mostly covered in the CPUID Wikipedia article.
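
A minimal sketch of that flow, using the cpuid intrinsics that are already stable in `core::arch` (the leaf and bit positions below are the standard ones for AVX2, per the CPUID article):

```rust
// Runtime check for AVX2: leaf 7, sub-leaf 0, EBX bit 5.
#[cfg(target_arch = "x86_64")]
fn cpu_reports_avx2() -> bool {
  use core::arch::x86_64::{__cpuid_count, __get_cpuid_max};
  // Step 1: the max basic leaf is in ret.0 of __get_cpuid_max(0).
  let (max_leaf, _) = unsafe { __get_cpuid_max(0) };
  if max_leaf < 7 {
    return false;
  }
  // Steps 2/3: query leaf 7, sub-leaf 0, and test the feature bit.
  let info = unsafe { __cpuid_count(7, 0) };
  (info.ebx & (1 << 5)) != 0
}
```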

Functions should say in their docs what intrinsics / assembly they implement

Right now you need the source files open to get the intrinsic names, and then you'd also need to go to the Intel intrinsics guide to see what the actual assembly instruction of the intrinsic is.

We can do better.

Proposed format:

/// short description.
///
/// extra details, if any
/// ```
/// doc test example
/// ```
/// * **Intrinsic:** [`name`]
/// * **Assembly:** `op arg, arg, arg`

Remember that the first line of the docs shows up in the function's summary when listing the functions of an entire module, so we want that to be short and sweet, and then everything past that first line can be various levels of info dump about what's going on.

Contribution Guidelines: If you want to try your hand at this, pick a small module with 10 or fewer functions (such as adx or bmi1) and do it for just that one small module. Then we can see if it's a comfortable format to look at and so on.
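
For illustration, here's roughly how the format might look applied to one small bmi1-style wrapper. The function name, doc wording, and doc test are made up for this example; only `_tzcnt_u32` and its `tzcnt` instruction are real:

```rust
/// Counts the number of trailing zero bits in `a`.
///
/// If `a` is zero, the result is the full bit width (32).
/// ```
/// # use safe_arch::*;
/// assert_eq!(trailing_zero_count_u32(0b1000), 3);
/// ```
/// * **Intrinsic:** [`_tzcnt_u32`]
/// * **Assembly:** `tzcnt r32, r32`
pub fn trailing_zero_count_u32(a: u32) -> u32 {
  unsafe { core::arch::x86_64::_tzcnt_u32(a) }
}
```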

Convert all doc tests to integration tests

Currently all the tests are done via doc tests.

This means that most functions have only a single test case.

Particularly for a lot of the macros we probably want tests that check all sorts of edge cases. These should be put into the crate as integration tests (that is, modules in the tests/ folder).
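
A sketch of what one such file could look like (say `tests/bmi1.rs`; the function name is assumed, and the point is just that a single function gets several edge-case assertions instead of one doc test):

```rust
// tests/bmi1.rs -- hypothetical integration test file.
#![cfg(target_feature = "bmi1")]
use safe_arch::*;

#[test]
fn trailing_zero_count_u32_edge_cases() {
  assert_eq!(trailing_zero_count_u32(0), 32); // all-zero input
  assert_eq!(trailing_zero_count_u32(1), 0);
  assert_eq!(trailing_zero_count_u32(u32::MAX), 0);
  assert_eq!(trailing_zero_count_u32(1 << 31), 31); // only the top bit set
}
```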

Shuffle / Permute cleanup

Difference between them:

  • Shuffle is actually a = shuf(a, b, imm) (destructive)
  • Permute is a = perm(b, imm) (non-destructive)

So permute has a lot less register pressure, and should be preferred when possible, if it's available.

Full double check of all sideline functions (the ones not in `sse` through `avx2`)

A shuffled list of all the functions not from the "main line" of sse stuff up through avx2.

Note that part of "the point" of doing it in this chaotic random order is to prevent you from accidentally making assumptions about one function based on the previous similar function. And also hopefully it will reduce the "boringness" of the task.

Also we probably won't do this all at once because it's a heck of a lot.

  • unsigned __int64 _lzcnt_u64 (unsigned __int64 a)
  • unsigned __int64 _andn_u64 (unsigned __int64 a, unsigned __int64 b)
  • unsigned int _blsi_u32 (unsigned int a)
  • __m128i _mm_aeskeygenassist_si128 (__m128i a, const int imm8)
  • unsigned int _tzcnt_u32 (unsigned int a)
  • int _rdrand64_step (unsigned __int64* val)
  • unsigned __int64 _tzcnt_u64 (unsigned __int64 a)
  • unsigned int _blsr_u32 (unsigned int a)
  • int _rdseed32_step (unsigned int * val)
  • unsigned int _pdep_u32 (unsigned int a, unsigned int mask)
  • unsigned __int64 _mulx_u64 (unsigned __int64 a, unsigned __int64 b, unsigned __int64* hi)
  • __m128i _mm_clmulepi64_si128 (__m128i a, __m128i b, const int imm8)
  • __m128i _mm_aesdeclast_si128 (__m128i a, __m128i RoundKey)
  • unsigned __int64 _pext_u64 (unsigned __int64 a, unsigned __int64 mask)
  • unsigned __int64 _bextr2_u64 (unsigned __int64 a, unsigned __int64 control)
  • int _rdseed16_step (unsigned short * val)
  • unsigned int _andn_u32 (unsigned int a, unsigned int b)
  • unsigned __int64 _blsmsk_u64 (unsigned __int64 a)
  • unsigned int _blsmsk_u32 (unsigned int a)
  • unsigned int _bextr2_u32 (unsigned int a, unsigned int control)
  • int _rdrand16_step (unsigned short* val)
  • unsigned char _addcarryx_u32 (unsigned char c_in, unsigned int a, unsigned int b, unsigned int * out)
  • unsigned int _bextr_u32 (unsigned int a, unsigned int start, unsigned int len)
  • unsigned __int64 _bzhi_u64 (unsigned __int64 a, unsigned int index)
  • int _rdrand32_step (unsigned int* val)
  • unsigned char _addcarryx_u64 (unsigned char c_in, unsigned __int64 a, unsigned __int64 b, unsigned __int64 * out)
  • int _rdseed64_step (unsigned __int64 * val)
  • unsigned __int64 _blsi_u64 (unsigned __int64 a)
  • unsigned __int64 _pdep_u64 (unsigned __int64 a, unsigned __int64 mask)
  • unsigned int _mulx_u32 (unsigned int a, unsigned int b, unsigned int* hi)
  • __m128i _mm_aesenc_si128 (__m128i a, __m128i RoundKey)
  • int _popcnt64 (__int64 a)
  • int _popcnt32 (int a)
  • unsigned int _lzcnt_u32 (unsigned int a)
  • __m128i _mm_aesenclast_si128 (__m128i a, __m128i RoundKey)
  • __m128i _mm_aesdec_si128 (__m128i a, __m128i RoundKey)
  • unsigned int _bzhi_u32 (unsigned int a, unsigned int index)
  • unsigned int _pext_u32 (unsigned int a, unsigned int mask)
  • unsigned __int64 _blsr_u64 (unsigned __int64 a)
  • __m128i _mm_aesimc_si128 (__m128i a)
  • unsigned __int64 _bextr_u64 (unsigned __int64 a, unsigned int start, unsigned int len)

Not In Rust

  • int _mm_tzcnt_32 (unsigned int a)
  • __int64 _mm_tzcnt_64 (unsigned __int64 a)
  • __int64 _mm_popcnt_u64 (unsigned __int64 a)
  • int _mm_popcnt_u32 (unsigned int a)

[avx] the ops that chop 256-bit lanes to 128-bit lanes.

The first time through I skipped over these because at first I didn't realize that they do have an inverse version that doesn't have undefined memory. I thought that the only inverses of these ops would just be undefined register content.

Then I got to the end of the avx list and saw that you can zero-extend a register's high bits.

So now we want to have these in the lib again.
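
In terms of the raw intrinsics, the pair being described is something like this (assuming the `zext` intrinsics are exposed in `core::arch`; the avx target feature is assumed):

```rust
use core::arch::x86_64::*;

#[cfg(target_feature = "avx")]
unsafe fn chop_then_zero_extend(a: __m256) -> __m256 {
  // "chop": keep only the low 128 bits.
  let low: __m128 = _mm256_castps256_ps128(a);
  // The inverse that *doesn't* leave the high bits undefined:
  // zero-extend back to 256 bits with the high lane guaranteed zero.
  _mm256_zextps128_ps256(low)
}
```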

PartialEq for m128?

It'd be possible to do PartialEq for m128 by using cmp_eq and then moving the mask out and comparing with 0b1111.
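
Sketched out, that looks something like the following (the wrapper names `cmp_eq_mask_m128` and `move_mask_m128` are assumed to be the crate's sse functions and may differ):

```rust
// Possible PartialEq impl, NOT necessarily what the crate should ship.
impl PartialEq for m128 {
  #[inline]
  fn eq(&self, other: &Self) -> bool {
    // cmp_eq gives all-1s lanes where equal; move_mask packs the four
    // sign bits, so 0b1111 means all four lanes compared equal.
    move_mask_m128(cmp_eq_mask_m128(*self, *other)) == 0b1111
  }
}
```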

But movemask is relatively costly, and with SIMD you generally shouldn't be branching on comparison results so much as merging values based on a mask.

So I dunno.

splat / set_splat inconsistency

  • avx uses set_splat
  • sse2 uses just splat and drops the set; others might too.

Probably we want to change everything to set_splat; then "set" would be one of our core verbs in the glossary and splat would just be a modifier word.

improve Neg for m128

It was suggested that we don't use a const, and instead make a zeroed register and compare it to itself to get the all-1s register.

needs godbolt
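
One reading of that suggestion, written against the raw intrinsics (sse2 is assumed for the integer shift, and the const-generic immediate form of `_mm_slli_epi32` is assumed); this is exactly the kind of thing the godbolt check should confirm:

```rust
use core::arch::x86_64::*;

#[cfg(target_feature = "sse2")]
unsafe fn neg_m128_sketch(a: __m128) -> __m128 {
  let zero = _mm_setzero_ps();
  // Comparing a register with itself sets every bit (even for zero == zero).
  let all_ones = _mm_cmpeq_ps(zero, zero);
  // Shift each 32-bit lane left by 31 so only the sign bits remain set.
  let sign_bits = _mm_castsi128_ps(_mm_slli_epi32::<31>(_mm_castps_si128(all_ones)));
  // Flipping the sign bit negates an IEEE-754 float.
  _mm_xor_ps(a, sign_bits)
}
```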

Stable Intrinsics Todo

This is the "1.0 goals list".

Before 1.0, everything on this list should either be added to the crate or examined and determined to be "post-1.0".

  • If support for a feature is added to the crate (one module per feature), then we delete it from this list. There's too much stuff to keep everything in the list and just check off boxes.
  • If a feature is examined and determined to be something we shouldn't support until post-1.0 (for some reason) then we'll break that off and make a separate issue for those items. In this case, those items would also be deleted from the list here.
  • So once we're all done, the list will be empty.

This list was made with a little bit of regex being applied to the x86_64 list of what's stable in Rust 1.43.0 (2020-05-06).


  • __cpuid
  • __cpuid_count
  • __get_cpuid_max

Mark all functions with their required CPU features

This is easy to do on a module-by-module basis: you just apply the attribute to each function in a module.

// example for "avx"
#[cfg_attr(docs_rs, doc(cfg(target_feature = "avx")))]

I just didn't do this from the beginning of crate development, so someone has to go back and do it for previous work. With clever search and replace, a person could probably update the entire crate in a matter of minutes.

doc nge vs lt

Document why a person would use the negated comparisons (nge, ngt, and so on) instead of the plain ones (lt, le, and so on).
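
The short version is that the difference shows up with NaN: an ordered comparison like `lt` is false when either operand is NaN, while `nge` ("not greater-or-equal") is true, because the unordered case counts as "not greater-or-equal". A quick check with the raw intrinsics (sse assumed):

```rust
use core::arch::x86_64::*;

#[cfg(target_feature = "sse")]
unsafe fn nan_comparison_demo() {
  let a = _mm_set1_ps(f32::NAN);
  let b = _mm_set1_ps(1.0);
  // lt: ordered comparison, false when an operand is NaN.
  assert_eq!(_mm_movemask_ps(_mm_cmplt_ps(a, b)), 0b0000);
  // nge: "not greater-or-equal", true when an operand is NaN.
  assert_eq!(_mm_movemask_ps(_mm_cmpnge_ps(a, b)), 0b1111);
}
```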

_mm256_insertf128_si256 (avx) vs _mm256_inserti128_si256 (avx2)

It seems like the docs and even the signatures for _mm256_insertf128_si256 and _mm256_inserti128_si256 are essentially the same. However, based on the names and the Felix Cloutier notes, it seems like _mm256_insertf128_si256 is mis-typed and should be operating on floating-point data, not integer data.

This would be a fairly simple fix; we can just throw in an extra cast or two if needed. The question is whether this conclusion is correct and whether we should make this adjustment for the user. Normally I'd be against safe_arch doing a "fix" like this, but in this case it's the Intel intrinsics that wrap the assembly wrong, so it feels fair to give proper direct access to the assembly.
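
One way the "extra cast" fix could look, sketched against the raw intrinsics (the const-generic immediate form is assumed, and the function name here is hypothetical):

```rust
use core::arch::x86_64::*;

// Present the f-variant as the float operation its assembly actually is,
// by casting into and out of the integer-typed intrinsic signature.
#[cfg(target_feature = "avx")]
unsafe fn insert_m128_to_high_lane_m256(a: __m256, b: __m128) -> __m256 {
  let r = _mm256_insertf128_si256::<1>(_mm256_castps_si256(a), _mm_castps_si128(b));
  _mm256_castsi256_ps(r)
}
```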


clarify naming scheme for getters / setters

We've got a number of getters and setters, and the names have become a little inconsistent.

extract_i16_as_i32_m128i! should probably be "get" like the rest.

Also, some of the "getters" don't get raw data, they do a rounding conversion, so those should maybe be a "convert".

Naming is a mess.

--

Our naming rules are...

  • "cast" preserves the exact bit patterns involved.
  • ... ?

fix shuffle/permute naming

Update: It's a mess!

  • _mm_permutevar_ps
  • _mm256_permutevar_ps
  • _mm_permutevar_pd
  • _mm256_permutevar_pd
  • _mm256_permutevar8x32_ps
  • _mm256_permutevar8x32_epi32
  • _mm_shuffle_epi8
  • _mm256_shuffle_epi8
  • _mm256_permute2f128_ps
  • _mm256_permute2f128_pd
  • _mm256_permute2f128_si256
  • _mm256_permute2x128_si256
  • _mm_shuffle_ps
  • _mm256_shuffle_ps
  • _mm_shuffle_pd
  • _mm256_shuffle_pd
  • _mm_permute_ps
  • _mm_shuffle_epi32
  • _mm256_permute_ps
  • _mm_permute_pd
  • _mm256_permute4x64_pd
  • _mm256_permute_pd
  • _mm_shufflehi_epi16
  • _mm256_shufflehi_epi16
  • _mm_shufflelo_epi16
  • _mm256_shufflelo_epi16
  • _mm256_shuffle_epi32
  • _mm256_permute4x64_epi64
