Giter Club home page Giter Club logo

bytecount's People

Contributors

atouchet avatar dependabot-preview[bot] avatar evolvedmicrobe avatar freaky avatar gelbpunkt avatar ignatenkobrain avatar jmagnuson avatar llogiq avatar palfrey avatar philippeitis avatar pthariensflame avatar ralfjung avatar stebalien avatar veedrac avatar waywardmonkeys avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

bytecount's Issues

Use quickcheck for tests

I did that and actually found some issues. See https://github.com/vks/bytecounttest. For instance, bytecount::count returns 2 for this input: &[0, 0, 0, 0, 0, 0, 0, 0, 33, 32, 0, 0, 0, 0, 0, 0], 33

After these are fixed, it might be a good idea to use AFL, which is a bit more sophisticated than quickcheck.

"avx-accel" not available

When using avx-accel feature to accelerate byte counting, the compilation failed like:

   Compiling bytecount v0.1.6
error[E0432]: unresolved import `simd::x86::avx::LowHigh128`
  --> xxx/.cargo/registry/src/github.com-1ecc6299db9ec823/bytecount-0.1.6/src/lib.rs:12:22
   |
12 | use simd::x86::avx::{LowHigh128, u8x32};
   |                      ^^^^^^^^^^ Could not find `avx` in `x86`

error[E0432]: unresolved import `simd::x86::avx::u8x32`
  --> xxx/.cargo/registry/src/github.com-1ecc6299db9ec823/bytecount-0.1.6/src/lib.rs:12:34
   |
12 | use simd::x86::avx::{LowHigh128, u8x32};
   |                                  ^^^^^ Could not find `avx` in `x86`

error: aborting due to previous error(s)

error: Could not compile `bytecount`.

And from docs.rs avx is really absent, but rust-lang-nursery.github.io seems saying it's there.

I'm really lost what's going on. Should I add other relevant crates to make it work?

Replace `Iterator::sum()` with `fold`

The recent update #14 inadvertently increased the minimum Rust version to 1.11. We should be able to use a fold without performance loss.

I'll do it once I get to my PC – unless someone beats me to it.

Returns wrong value on some inputs

This code, in 0.1.2:

 let mytext = "some potentially large text, perhaps read from disk?";
 let spaces = bytecount::count(mytext.as_bytes(), b' ');
 let text2 = "12111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111121";
 let text3 = "10111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101";
 let count2 = bytecount::count(text2.as_bytes(), b'1');
 let count3 = bytecount::count(text3.as_bytes(), b'1');
 println!("bytecount {} {} {}", spaces, count2, count3); 

prints bytecount 7 126 128. It should be 7 126 126.

See conversation in xi-editor/xi-editor#122 for where this came from.

understanding SIMD intrinsics

I read the blog post about fast byte counting and I was reading through the code to better understand the SIMD intrinsic. I was mostly able to follow along but I got confused at this part

counts = _mm256_sub_epi8(
counts,
_mm256_cmpeq_epi8(mm256_from_offset(haystack, offset), needles)
);

I don't understand why we are subtracting the results of the compare instead of adding them. It seems like this will give incorrect results, but obviously doesn't, and I can't explain why. Could someone explain it to me?

Add method to count UTF-8 chars

Basically we just need an bitwise AND or OR and compare with the respective value whose highest bits are '10' and subtract the count from the slice length.

This could also go into .chars().count() if it turns out to not lose too much time compared to naive onn short strings.

Version 0.6.5 is broken on wasm targets (version 0.6.3 worked)

While compiling bytecount as a dependency of nom_locate v4.2.0 for wasm targets, I get the following error:

   Compiling bytecount v0.6.5
error[E0080]: evaluation of `core::arch::wasm32::i32x4_extract_lane::<4>::{constant#0}` failed
    --> /home/steph/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/wasm32/simd128.rs:1212:5
     |
1212 |     static_assert!(N < 4);
     |     ^^^^^^^^^^^^^^^^^^^^^ the evaluated program panicked at 'assertion failed: N < 4', /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/../../stdarch/crates/core_arch/src/wasm32/simd128.rs:1212:5
     |
     = note: this error originates in the macro `assert` which comes from the expansion of the macro `static_assert` (in Nightly builds, run with -Z macro-backtrace for more info)

note: the above error was encountered while instantiating `fn core::arch::wasm32::u32x4_extract_lane::<4>`
  --> /home/steph/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bytecount-0.6.5/src/simd/wasm.rs:62:9
   |
62 |         u32x4_extract_lane::<4>(u32s),
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0080`.

The minimal reproducible test case is here:
https://github.com/stephanemagnenat/bytecount-89

Just do cargo build --target wasm32-unknown-unknown in bytecount-issue-89/ and you get the same error as above (rustc 1.73.0 (cc66ad468 2023-10-03)).

The test case is trivial: The repository is a newly-created project with cargo and with bytecount added as a dependency.

CI: run clippy if it's available

We may want to have one nightly build that runs clippy over our code. Clippy gets new and improved lints all the time, so having it run could be useful.

Failed to parse manifest

With the latest nightly, attempting to build bytecount 0.1.6 gives this error:

error: unable to get packages from source

Caused by:
  failed to parse manifest at `/home/jeremy/.cargo/registry/src/github.com-1ecc6299db9ec823/bytecount-0.1.6/Cargo.toml`

Caused by:
  can't find `bench` bench, specify bench.path

Failing test

This test fails

let haystack = vec![0, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
let needle = 2;
assert_eq!(count(&haystack[0..], needle), naive_count(&haystack[0..], needle));

giving a count of 2 rather than 1. The errors are with

(((x ^ needles).wrapping_sub(LO)) & !x & HI) >> 7

Firstly, this should be

let x = x ^ needles;
((x.wrapping_sub(LO)) & !x & HI) >> 7

Secondly, this doesn't actually work for counting - the x.wrapping_sub(LO) sets the high bit correctly but can overflow into the next value if the next byte is 1 - wrapping_sub will decrement that to zero and the subtraction will continue past.

It's fairly simple to deal with by using

!((((v & !HI) + !HI) | v) >> 7) & LO

instead, but it'd be interesting to see if you don't have to spend the extra operations to get that. It doesn't seem to affect speed all too much, though.

"Fancy new algorithm" doesn't work on big-endian machines

Originally from BurntSushi/ripgrep#1144, we traced down the issue to bytecount, specifically commit c6668a3:

git bisect start master f2a67d5
git bisect run sh -c 'RUSTFLAGS="-L $LD_LIBRARY_PATH" RUSTDOCFLAGS="-L $LD_LIBRARY_PATH" cargo test'
[..]
running 9 tests
test check_count_correct ... FAILED
test check_count_large ... ok
test check_count_overflow ... ok
test check_count_some ... FAILED
test check_num_chars_correct ... ok
test check_count_large_rand ... ok
test check_num_chars_overflow ... ok
test check_num_chars_some ... ok
test check_num_chars_large ... ok

failures:

---- check_count_correct stdout ----
thread 'check_count_correct' panicked at '[quickcheck] TEST FAILED. Arguments: (([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], 0))', /home/infinity0/.cargo/registry/src/github.com-eae4ba8cbf2ce1c7/quickcheck-0.6.2/src/tester.rs:179:28
note: Run with `RUST_BACKTRACE=1` for a backtrace.

---- check_count_some stdout ----
thread 'check_count_some' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', tests/check.rs:43:5


failures:
    check_count_correct
    check_count_some

test result: FAILED. 7 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out

error: test failed, to rerun pass '--test check'
c6668a3264ec208ae9733f83eb94b6312518f14a is the first bad commit
commit c6668a3264ec208ae9733f83eb94b6312518f14a
Author: Veedrac <[email protected]>
Date:   Wed Jan 31 02:54:17 2018 +0000

    Fancy new algorithm on stable SIMD

:100644 100644 f37644090b9886a0ed33293c5ac353b52631d2e3 c4b93928eda2ec61972beef49ed8c40860639d89 M      .travis.yml
:100644 100644 1eecc67f7677cdb843eba653d689faef0b3d7b13 91395c0abca0a56e7bf0f8075dbf83b25fc1a1e0 M      Cargo.toml
:100644 100644 29d6473403e8fcfe5ecb6d4b2d144cdd782fc7ea e76dfae2c603a0e87bb29efe4904200f09a56a4e M      appveyor.yml
:040000 040000 d0ac6b0edbf5fe91a4c2d499eb4e2f94ba91b799 47a3eb4dc78afe60b98f2b1f88f1ebd724ce690f M      src
bisect run success

Add no_std support

This is such a basic (in form of we need this everywhere, not in form of "simple") feature, I think it should be supported on embedded targets, too.

How does this work?

This library looks really interesting. As a person who hasn't done any real bit-twiddling yet, it is quite hard for me to get my mind around, though.

Could you explain, in non-bit-twiddler-terms, why this crate is faster than regular iterators / adaptors?

bytecount::num_chars returns wrong value if cpu doesn't support avx2 feature

PoC

use std::io;
use std::env::var;
use std::io::prelude::*;
use std::fs::File;

use bytecount;

fn main() -> io::Result<()> {
    let mut buffer = Vec::new();
    let mut f = File::open(format!("{}/sample.txt", var("HOME").unwrap()))?;
    f.read_to_end(&mut buffer)?;

    println!("num_chars = {}", bytecount::num_chars(&buffer) as isize);
    Ok(())
}

Results:

# cpu doesn't supports avx2 feature
$ ./test-avx2
num_chars = 4076

# cpu supports avx2 feature
$ ./test-avx2 
num_chars = 8172

$ python -c "from pathlib import Path; print('num_chars =', len(Path('~/sample.txt').expanduser().read_text()))"
num_chars = 8172

sample.txt

CLI version

Considering how we have projects like ripgrep, exa, and fd which aim to oxidise common shell utils, it would be great to create an alternative to the wc command that uses bytecount (I vote to name the binary ct, short for count).

Considering how the hard work has already been done for the actual counting, all that really would have to be done is parse command-line arguments with clap or similar. While we don't have an actual "word" counting functionality, 99% of the time I see wc is via wc -l anyway, and it seems pretty straightforward to make a binary that will do this with bytecount.

Aarch64 build appears broken

Intrinsic has incorrect return type!
ptr @llvm.aarch64.neon.ld1x4.v16i8.p0i8
in function _ZN4core9core_arch10arm_shared4neon9generated11vld1q_u8_x417h224d569be9c8b1b6E
LLVM ERROR: Broken function found, compilation aborted!
error: could not compile `bytecount` (lib)
warning: build failed, waiting for other jobs to finish...
$ uname -vm
Darwin Kernel Version 23.1.0: Mon Oct  9 21:28:45 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6020 arm64

benchmark -C codegen-units=N

Apparently, the default changed to codegen-units=16. However, when benchmarking I don't see a clear winner between N=1 or N=16 – it shows some small win on byte counting and some small loss on num_chars, but in both cases, it's a wash. Perhaps someone else can benchmark on a beefier machine?

SSE2 path uses SSE4.1 intrinsic

#[target_feature(enable = "sse2")]
unsafe fn sum(u8s: &__m128i) -> usize {
    let sums = _mm_sad_epu8(*u8s, _mm_setzero_si128());
    (_mm_extract_epi32(sums, 0) + _mm_extract_epi32(sums, 2)) as usize
}

Per _mm_extract_epi32 documentation -- Available on (x86 or x86-64) and target feature sse4.1 and x86-64 only

Documentation, naive_count_32 vs naive_count

These two functions are presented, with naive_count_32 having the weakness that it can overflow. That's a trade-off I might be happy to make, but the documentation doesn't say what the performance difference is. Could that be added, at least in one benchmark, to enable a more informed decision?

Needs crates.io release

The most recent version of bytecount published to crates.io (0.6.0, published August 2019) does not compile. It looks like a fix was sent to bytecount in PR #63 but that fix has never been published.

[dependencies]
bytecount = { version = "*", features = ["generic-simd"] }
error[E0432]: unresolved import `crate::arch::x86_64::_mm_movemask_pi8`

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.