safe_arch's Introduction

safe_arch

Exposes arch-specific intrinsics as safe functions.

  • SIMD types are newtype'd (with a pub field) and given appropriate trait impls such as From, Into, Default, etc.
  • Each intrinsic gets either a function or macro so that you can safely use it as directly as possible.
    • Functions are used when all arguments are runtime arguments.
    • Macros are used when one of the arguments must be a compile time constant, because Rust doesn't let you "pass through" compile time constants.
  • There are hundreds and hundreds of intrinsics, so the names of functions and macros tend to be long and specific, because there are often many similar ways to do nearly the same thing.
    • This crate isn't really intended for "everyday users". It is intended to be an "unopinionated" middle layer crate that just provides the safety. Higher level abstractions should mostly come from some other crate that wraps over this crate.

All function and macro availability is done purely at compile time via #[cfg()] attributes on the various modules. If a CPU feature isn't enabled for the build then those functions or macros won't be available. If you'd like to determine what CPU features are available at runtime and then call different code accordingly, this crate is not for you.

See the crate docs for more details.
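
For orientation, here's a minimal sketch of how the crate is meant to be used. The names here (`add_m128` and the `From`/`Into` array conversions) are assumed from memory and may not match the crate exactly; the `sse` feature is assumed to be enabled at build time.

```rust
// Minimal usage sketch; names assumed, `target_feature = "sse"` assumed.
use safe_arch::*;

fn main() {
  let a = m128::from([1.0_f32, 2.0, 3.0, 4.0]);
  let b = m128::from([5.0_f32, 6.0, 7.0, 8.0]);
  // A plain function is used because every argument is a runtime value.
  let c = add_m128(a, b);
  let out: [f32; 4] = c.into();
  assert_eq!(out, [6.0, 8.0, 10.0, 12.0]);
}
```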

safe_arch's Issues

[neon] General ARM/Neon tracking issue.

Obviously we want to support the Neon intrinsics for ARM.

  • The current main blocker is that none of the Neon intrinsics are in Stable, and most of them aren't even in Nightly.

[avx][avx2] Operator overloads for 256-bit types

This will be basically like the 128-bit versions.

  • Remember that the operator overload impl goes in the module that provides the function it uses, not in the module that defines the type it works on. In other words, the BitAnd impl for m128 goes into the sse module, not in the m128_ module.
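
A sketch of what that placement looks like in practice (the wrapper name `bitand_m128` is assumed, not quoted from the crate):

```rust
// Lives in the sse module (next to the wrapper it calls), not in the
// module that defines the `m128` type itself.
impl core::ops::BitAnd for m128 {
  type Output = Self;
  #[inline(always)]
  fn bitand(self, rhs: Self) -> Self {
    // assumed name for the safe wrapper over `_mm_and_ps`
    bitand_m128(self, rhs)
  }
}
```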

widen mul

We should be more precise about how mul_i64_widen_low_bits_m128i works: specifically, whether it's a strict i32-to-i64 multiply using the odd numbered lanes, and what happens if the i64 lanes hold values outside the i32 range.

It's probably simple, but we should document it.
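
Assuming the function wraps `_mm_mul_epi32`, the Intel guide says only the low signed 32 bits of each 64-bit lane are read, so out-of-range "i64" inputs contribute just their low 32 bits. A quick check along those lines, in terms of the raw intrinsic:

```rust
// Assumes the wrapped intrinsic is `_mm_mul_epi32` (sse4.1).
use core::arch::x86_64::*;

#[cfg(target_feature = "sse4.1")]
unsafe fn widen_mul_demo() {
  // Lane 0 holds i64::MAX; only its low 32 bits (-1 as i32) are used.
  let a = _mm_set_epi64x(3, i64::MAX);
  let b = _mm_set_epi64x(4, 2);
  let out: [i64; 2] = core::mem::transmute(_mm_mul_epi32(a, b));
  assert_eq!(out, [-2, 12]);
}
```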

[avx2] gather operations

At first glance it seems like these can't be made safe and also totally zero-runtime-cost in an ergonomic way. I'm open to suggestions.

I hear these requested often enough when people talk about avx2 that, if we can't do it safely with zero runtime cost, I think it would be permissible to expose versions that do some runtime checking, as long as they're clearly marked as such (see the sketch after the list below).

  • _mm_i32gather_epi32
  • _mm_i32gather_epi64
  • _mm_i32gather_pd
  • _mm_i32gather_ps
  • _mm_i64gather_epi32
  • _mm_i64gather_epi64
  • _mm_i64gather_pd
  • _mm_i64gather_ps
  • _mm_mask_i32gather_epi32
  • _mm_mask_i32gather_epi64
  • _mm_mask_i32gather_pd
  • _mm_mask_i32gather_ps
  • _mm_mask_i64gather_epi32
  • _mm_mask_i64gather_epi64
  • _mm_mask_i64gather_pd
  • _mm_mask_i64gather_ps
  • _mm256_i32gather_epi32
  • _mm256_i32gather_epi64
  • _mm256_i32gather_pd
  • _mm256_i32gather_ps
  • _mm256_i64gather_epi32
  • _mm256_i64gather_epi64
  • _mm256_i64gather_pd
  • _mm256_i64gather_ps
  • _mm256_mask_i32gather_epi32
  • _mm256_mask_i32gather_epi64
  • _mm256_mask_i32gather_pd
  • _mm256_mask_i32gather_ps
  • _mm256_mask_i64gather_epi32
  • _mm256_mask_i64gather_epi64
  • _mm256_mask_i64gather_pd
  • _mm256_mask_i64gather_ps
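
As one possible starting point, here's a sketch of what a runtime-checked wrapper could look like. Everything here is hypothetical, not an existing safe_arch API, and the const-generic `SCALE` form of the `core::arch` gather intrinsic is assumed:

```rust
// Hypothetical runtime-checked gather; NOT an existing safe_arch function.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
pub fn i32_gather_checked(data: &[i32], indices: [i32; 4]) -> [i32; 4] {
  use core::arch::x86_64::*;
  // The runtime cost under discussion: every index gets bounds-checked.
  assert!(indices.iter().all(|&i| i >= 0 && (i as usize) < data.len()));
  unsafe {
    let idx = _mm_set_epi32(indices[3], indices[2], indices[1], indices[0]);
    // SCALE = 4 bytes per i32 element.
    let v = _mm_i32gather_epi32::<4>(data.as_ptr(), idx);
    core::mem::transmute(v)
  }
}
```

A zero-runtime-cost alternative might take a fixed-size array reference instead of a slice, but that's exactly the ergonomics question this issue raises.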

Sort out all the "convert" operations

We should inventory all the "convert" operations.

If necessary, we can come up with some sort of convention, or possibly more than one convention, and rename them for consistency.

Rename `_low` to `_s`

The low-lane-only operations should get a better name; _low already kinda clashes with the unpack low and unpack high stuff.

_s was suggested for "scalar", which is as good a name as any.

[avx] testc, testnzc, testz

I skipped them the first time around because they seem borderline useless anyway, but technically they should go in I guess.

Remove super::* imports

It may be a good idea to replace super::* imports with imports of the related types and core::arch::*.

[avx512] General AVX-512 tracking issue

There's a few problems with adding AVX-512 support:

  • I don't have a device to develop on that even supports AVX-512 (for testing).
  • Within Rust, it's all Nightly-only.

So (for now) we're blocked on adding avx512, but it'd be nice to have "some day".

This wouldn't be a 1.0 blocker, if we got to a state where we were otherwise 1.0 ready.

Put more verbs in the glossary

For any operation name that isn't already so common that it's in the Rust core lib (eg: add and sqrt), we should put it into the verb glossary.

And if it gets big enough, we might want to put the verb glossary on its own page.

[cpuid] Let the user perform runtime checks for CPU features

Whether or not the user enables features at compile time, it's helpful as an extra sanity check to be able to verify at program start that the features really are available, and maybe print an error message or something if the actual CPU can't handle what you're doing.


  • First call __get_cpuid_max(0) and check ret.0 for the max leaf.
  • If a leaf has sub-leaves you need to know the max of, call __get_cpuid_max(leaf) and check ret.1 for that max.
  • Once you know your limits, particular features can be checked for by getting the info for a leaf and checking the bits of a particular return register. Which bit you need to look for, in what register, in what leaf, is mostly covered in the CPUID Wikipedia article.
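
A minimal sketch of that flow, using the cpuid intrinsics that are already stable in `core::arch` (the leaf and bit positions below are the standard ones for AVX2, per the CPUID article):

```rust
// Runtime check for AVX2: leaf 7, sub-leaf 0, EBX bit 5.
#[cfg(target_arch = "x86_64")]
fn cpu_reports_avx2() -> bool {
  use core::arch::x86_64::{__cpuid_count, __get_cpuid_max};
  // Step 1: the max basic leaf is in ret.0 of __get_cpuid_max(0).
  let (max_leaf, _) = unsafe { __get_cpuid_max(0) };
  if max_leaf < 7 {
    return false;
  }
  // Steps 2/3: query leaf 7, sub-leaf 0, and test the feature bit.
  let info = unsafe { __cpuid_count(7, 0) };
  (info.ebx & (1 << 5)) != 0
}
```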

Functions should say in their docs what intrinsics / assembly they implement

Right now you need the source files open to get the intrinsic names, and then you'd also need to go to the Intel intrinsics guide to see what the actual assembly instruction of the intrinsic is.

We can do better.

Proposed format:

/// short description.
///
/// extra details, if any
/// ```
/// doc test example
/// ```
/// * **Intrinsic:** [`name`]
/// * **Assembly:** `op arg, arg, arg`

Remember that the first line of the docs shows up in the function's summary when listing the functions of an entire module, so we want that to be short and sweet, and then everything past that first line can be various levels of info dump about what's going on.

Contribution Guidelines: If you want to try your hand at this, pick a small module with 10 or fewer functions (such as adx or bmi1) and do it for just that one small module. Then we can see if it's a comfortable format to look at and so on.
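
For illustration, here's roughly how the format might look applied to one small bmi1-style wrapper. The function name, doc wording, and doc test are made up for this example; only `_tzcnt_u32` and its `tzcnt` instruction are real:

```rust
/// Counts the number of trailing zero bits in `a`.
///
/// If `a` is zero, the result is the full bit width (32).
/// ```
/// # use safe_arch::*;
/// assert_eq!(trailing_zero_count_u32(0b1000), 3);
/// ```
/// * **Intrinsic:** [`_tzcnt_u32`]
/// * **Assembly:** `tzcnt r32, r32`
pub fn trailing_zero_count_u32(a: u32) -> u32 {
  unsafe { core::arch::x86_64::_tzcnt_u32(a) }
}
```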

Convert all doc tests to integration tests

Currently all the tests are done via doc tests.

This means that most functions have only a single test case.

Particularly for a lot of the macros we probably want tests that check all sorts of edge cases. These should be put into the crate as integration tests (that is, modules in the tests/ folder).
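
A sketch of what one such file could look like (say `tests/bmi1.rs`; the function name is assumed, and the point is just that a single function gets several edge-case assertions instead of one doc test):

```rust
// tests/bmi1.rs -- hypothetical integration test file.
#![cfg(target_feature = "bmi1")]
use safe_arch::*;

#[test]
fn trailing_zero_count_u32_edge_cases() {
  assert_eq!(trailing_zero_count_u32(0), 32); // all-zero input
  assert_eq!(trailing_zero_count_u32(1), 0);
  assert_eq!(trailing_zero_count_u32(u32::MAX), 0);
  assert_eq!(trailing_zero_count_u32(1 << 31), 31); // only the top bit set
}
```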

Shuffle / Permute cleanup

Difference between them:

  • Shuffle is actually a = shuf(a, b, imm) (destructive)
  • Permute is a = perm(b, imm) (non-destructive)

So permute has a lot less register pressure, and should be preferred when possible, if it's available.

Full double check of all sideline functions (the ones not in `sse` through `avx2`)

A shuffled list of all the functions not from the "main line" of sse stuff up through avx2.

Note that part of "the point" of doing it in this chaotic random order is to prevent you from accidentally making assumptions about one function based on the previous similar function. And also hopefully it will reduce the "boringness" of the task.

Also we probably won't do this all at once because it's a heck of a lot.

  • unsigned __int64 _lzcnt_u64 (unsigned __int64 a)
  • unsigned __int64 _andn_u64 (unsigned __int64 a, unsigned __int64 b)
  • unsigned int _blsi_u32 (unsigned int a)
  • __m128i _mm_aeskeygenassist_si128 (__m128i a, const int imm8)
  • unsigned int _tzcnt_u32 (unsigned int a)
  • int _rdrand64_step (unsigned __int64* val)
  • unsigned __int64 _tzcnt_u64 (unsigned __int64 a)
  • unsigned int _blsr_u32 (unsigned int a)
  • int _rdseed32_step (unsigned int * val)
  • unsigned int _pdep_u32 (unsigned int a, unsigned int mask)
  • unsigned __int64 _mulx_u64 (unsigned __int64 a, unsigned __int64 b, unsigned __int64* hi)
  • __m128i _mm_clmulepi64_si128 (__m128i a, __m128i b, const int imm8)
  • __m128i _mm_aesdeclast_si128 (__m128i a, __m128i RoundKey)
  • unsigned __int64 _pext_u64 (unsigned __int64 a, unsigned __int64 mask)
  • unsigned __int64 _bextr2_u64 (unsigned __int64 a, unsigned __int64 control)
  • int _rdseed16_step (unsigned short * val)
  • unsigned int _andn_u32 (unsigned int a, unsigned int b)
  • unsigned __int64 _blsmsk_u64 (unsigned __int64 a)
  • unsigned int _blsmsk_u32 (unsigned int a)
  • unsigned int _bextr2_u32 (unsigned int a, unsigned int control)
  • int _rdrand16_step (unsigned short* val)
  • unsigned char _addcarryx_u32 (unsigned char c_in, unsigned int a, unsigned int b, unsigned int * out)
  • unsigned int _bextr_u32 (unsigned int a, unsigned int start, unsigned int len)
  • unsigned __int64 _bzhi_u64 (unsigned __int64 a, unsigned int index)
  • int _rdrand32_step (unsigned int* val)
  • unsigned char _addcarryx_u64 (unsigned char c_in, unsigned __int64 a, unsigned __int64 b, unsigned __int64 * out)
  • int _rdseed64_step (unsigned __int64 * val)
  • unsigned __int64 _blsi_u64 (unsigned __int64 a)
  • unsigned __int64 _pdep_u64 (unsigned __int64 a, unsigned __int64 mask)
  • unsigned int _mulx_u32 (unsigned int a, unsigned int b, unsigned int* hi)
  • __m128i _mm_aesenc_si128 (__m128i a, __m128i RoundKey)
  • int _popcnt64 (__int64 a)
  • int _popcnt32 (int a)
  • unsigned int _lzcnt_u32 (unsigned int a)
  • __m128i _mm_aesenclast_si128 (__m128i a, __m128i RoundKey)
  • __m128i _mm_aesdec_si128 (__m128i a, __m128i RoundKey)
  • unsigned int _bzhi_u32 (unsigned int a, unsigned int index)
  • unsigned int _pext_u32 (unsigned int a, unsigned int mask)
  • unsigned __int64 _blsr_u64 (unsigned __int64 a)
  • __m128i _mm_aesimc_si128 (__m128i a)
  • unsigned __int64 _bextr_u64 (unsigned __int64 a, unsigned int start, unsigned int len)

Not In Rust

  • int _mm_tzcnt_32 (unsigned int a)
  • __int64 _mm_tzcnt_64 (unsigned __int64 a)
  • __int64 _mm_popcnt_u64 (unsigned __int64 a)
  • int _mm_popcnt_u32 (unsigned int a)

[avx] the ops that chop 256-bit lanes to 128-bit lanes.

The first time through I skipped over these because at first I didn't realize that they do have an inverse version that doesn't have undefined memory. I thought that the only inverses of these ops would just be undefined register content.

Then I got to the end of the avx list and saw that you can zero-extend a register's high bits.

So now we want to have these in the lib again.
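
In terms of the raw intrinsics, the pair being described is something like this (assuming the `zext` intrinsics are exposed in `core::arch`; the avx target feature is assumed):

```rust
use core::arch::x86_64::*;

#[cfg(target_feature = "avx")]
unsafe fn chop_then_zero_extend(a: __m256) -> __m256 {
  // "chop": keep only the low 128 bits.
  let low: __m128 = _mm256_castps256_ps128(a);
  // The inverse that *doesn't* leave the high bits undefined:
  // zero-extend back to 256 bits with the high lane guaranteed zero.
  _mm256_zextps128_ps256(low)
}
```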

PartialEq for m128?

It'd be possible to do PartialEq for m128 by using cmp_eq and then moving the mask out and comparing with 0b1111.
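
Sketched out, that looks something like the following (the wrapper names `cmp_eq_mask_m128` and `move_mask_m128` are assumed to be the crate's sse functions and may differ):

```rust
// Possible PartialEq impl, NOT necessarily what the crate should ship.
impl PartialEq for m128 {
  #[inline]
  fn eq(&self, other: &Self) -> bool {
    // cmp_eq gives all-1s lanes where equal; move_mask packs the four
    // sign bits, so 0b1111 means all four lanes compared equal.
    move_mask_m128(cmp_eq_mask_m128(*self, *other)) == 0b1111
  }
}
```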

But movemask is relatively costly, and with SIMD you generally shouldn't be branching on comparison results so much as merging values based on a mask.

So I dunno.

splat / set_splat inconsistency

  • avx uses set_splat
  • sse2 uses just splat and drops the set; others might too.

Probably we want to change everything to set_splat; then "set" would be one of our core verbs in the glossary and splat would just be a modifier word.

improve Neg for m128

It was suggested that we don't use a const, and instead make a zeroed register and compare it to itself to get the all-1s register.

needs godbolt
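
One reading of that suggestion, written against the raw intrinsics (sse2 is assumed for the integer shift, and the const-generic immediate form of `_mm_slli_epi32` is assumed); this is exactly the kind of thing the godbolt check should confirm:

```rust
use core::arch::x86_64::*;

#[cfg(target_feature = "sse2")]
unsafe fn neg_m128_sketch(a: __m128) -> __m128 {
  let zero = _mm_setzero_ps();
  // Comparing a register with itself sets every bit (even for zero == zero).
  let all_ones = _mm_cmpeq_ps(zero, zero);
  // Shift each 32-bit lane left by 31 so only the sign bits remain set.
  let sign_bits = _mm_castsi128_ps(_mm_slli_epi32::<31>(_mm_castps_si128(all_ones)));
  // Flipping the sign bit negates an IEEE-754 float.
  _mm_xor_ps(a, sign_bits)
}
```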

Stable Intrinsics Todo

This is the "1.0 goals list".

Before 1.0, everything on this list should either be added to the crate or examined and determined to be "post-1.0".

  • If support for a feature is added to the crate (one module per feature), then we delete it from this list. There's too much stuff to keep everything in the list and just check off boxes.
  • If a feature is examined and determined to be something we shouldn't support until post-1.0 (for some reason) then we'll break that off and make a separate issue for those items. In this case, those items would also be deleted from the list here.
  • So once we're all done, the list will be empty.

This list was made with a little bit of regex being applied to the x86_64 list of what's stable in Rust 1.43.0 (2020-05-06).


  • __cpuid
  • __cpuid_count
  • __get_cpuid_max

Mark all functions with their required CPU features

This is easy to do on a module-by-module basis: you just apply the attribute to each function in a module.

// example for "avx"
#[cfg_attr(docs_rs, doc(cfg(target_feature = "avx")))]

I just didn't do this from the beginning of crate development, so someone has to go back and do it for previous work. With clever search and replace, a person could probably update the entire crate in a matter of minutes.

doc nge vs lt

Document why a person would use the negated comparisons (nge, ngt, and so on) instead of the plain ones (lt, le, and so on).
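
The short version is that the difference shows up with NaN: an ordered comparison like `lt` is false when either operand is NaN, while `nge` ("not greater-or-equal") is true, because the unordered case counts as "not greater-or-equal". A quick check with the raw intrinsics (sse assumed):

```rust
use core::arch::x86_64::*;

#[cfg(target_feature = "sse")]
unsafe fn nan_comparison_demo() {
  let a = _mm_set1_ps(f32::NAN);
  let b = _mm_set1_ps(1.0);
  // lt: ordered comparison, false when an operand is NaN.
  assert_eq!(_mm_movemask_ps(_mm_cmplt_ps(a, b)), 0b0000);
  // nge: "not greater-or-equal", true when an operand is NaN.
  assert_eq!(_mm_movemask_ps(_mm_cmpnge_ps(a, b)), 0b1111);
}
```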

_mm256_insertf128_si256 (avx) vs _mm256_inserti128_si256 (avx2)

It seems like the docs and even the signatures for _mm256_insertf128_si256 and _mm256_inserti128_si256 are essentially the same. However, based on the names and the Felix Cloutier notes, it seems like _mm256_insertf128_si256 is mis-typed and should be operating on floating-point data, not integer data.

This would be a fairly simple fix; we can just throw in an extra cast or two if needed. The question is whether this conclusion is correct and whether we should make this adjustment for the user. Normally I'd be against safe_arch doing a "fix" like this, but in this case it's the Intel intrinsics that wrap the assembly wrong, so it feels fair to give proper direct access to the assembly.
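
One way the "extra cast" fix could look, sketched against the raw intrinsics (the const-generic immediate form is assumed, and the function name here is hypothetical):

```rust
use core::arch::x86_64::*;

// Present the f-variant as the float operation its assembly actually is,
// by casting into and out of the integer-typed intrinsic signature.
#[cfg(target_feature = "avx")]
unsafe fn insert_m128_to_high_lane_m256(a: __m256, b: __m128) -> __m256 {
  let r = _mm256_insertf128_si256::<1>(_mm256_castps_si256(a), _mm_castps_si128(b));
  _mm256_castsi256_ps(r)
}
```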


clarify naming scheme for getters / setters

We've got a number of getters and setters, and the names have become a little inconsistent.

extract_i16_as_i32_m128i! should probably be "get" like the rest.

Also, some of the "getters" don't get raw data, they do a rounding conversion, so those should maybe be a "convert".

Naming is a mess.

--

Our naming rules are...

  • "cast" preserves the exact bit patterns involved.
  • ... ?

fix shuffle/permute naming

Update: It's a mess!

  • _mm_permutevar_ps
  • _mm256_permutevar_ps
  • _mm_permutevar_pd
  • _mm256_permutevar_pd
  • _mm256_permutevar8x32_ps
  • _mm256_permutevar8x32_epi32
  • _mm_shuffle_epi8
  • _mm256_shuffle_epi8
  • _mm256_permute2f128_ps
  • _mm256_permute2f128_pd
  • _mm256_permute2f128_si256
  • _mm256_permute2x128_si256
  • _mm_shuffle_ps
  • _mm256_shuffle_ps
  • _mm_shuffle_pd
  • _mm256_shuffle_pd
  • _mm_permute_ps
  • _mm_shuffle_epi32
  • _mm256_permute_ps
  • _mm_permute_pd
  • _mm256_permute4x64_pd
  • _mm256_permute_pd
  • _mm_shufflehi_epi16
  • _mm256_shufflehi_epi16
  • _mm_shufflelo_epi16
  • _mm256_shufflelo_epi16
  • _mm256_shuffle_epi32
  • _mm256_permute4x64_epi64
