Giter Club home page Giter Club logo

Comments (4)

npmccallum avatar npmccallum commented on August 22, 2024

Const generics are now stable in 1.51.

from lzma-rs.

gendx avatar gendx commented on August 22, 2024

I had a brief look, the min_const_generics feature was indeed stabilized in 1.51, but we still need the more complex const_evaluatable_checked feature due to the 1 << num_bits expression here.

In the meantime, it might be worth experimenting with it on nightly to see if benchmarks would improve with const generics.

from lzma-rs.

chyyran avatar chyyran commented on August 22, 2024

I've been doing some experiments in https://github.com/chyyran/lzma-rs/commits/feature-perf-experiments and found that const generics in BitTree and RangeCoder doesn't actually contribute that much performance benefit, except for large dictionaries.

Without const generics (but with some additional optimizations) SnowflakePowered@ecc71a0

running 8 tests
test compress_65536                  ... bench:   1,372,700 ns/iter (+/- 32,501)
test compress_empty                  ... bench:         713 ns/iter (+/- 35)
test compress_hello                  ... bench:       1,021 ns/iter (+/- 50)
test decompress_after_compress_65536 ... bench:   2,463,569 ns/iter (+/- 77,148)
test decompress_after_compress_empty ... bench:       1,720 ns/iter (+/- 146)
test decompress_after_compress_hello ... bench:       2,219 ns/iter (+/- 50)
test decompress_big_file             ... bench:   3,959,745 ns/iter (+/- 3,155,824)
test decompress_huge_dict            ... bench:       2,221 ns/iter (+/- 136)

With const generics SnowflakePowered@ad7d096

running 8 tests
test compress_65536                  ... bench:   1,379,341 ns/iter (+/- 160,163)
test compress_empty                  ... bench:         711 ns/iter (+/- 57)
test compress_hello                  ... bench:       1,016 ns/iter (+/- 30)
test decompress_after_compress_65536 ... bench:   2,442,160 ns/iter (+/- 125,797)
test decompress_after_compress_empty ... bench:         544 ns/iter (+/- 110)
test decompress_after_compress_hello ... bench:       1,003 ns/iter (+/- 47)
test decompress_big_file             ... bench:   3,939,209 ns/iter (+/- 177,650)
test decompress_huge_dict            ... bench:       1,065 ns/iter (+/- 40)

Probably worth pursuing in the future because of that benefit but SnowflakePowered@ad7d096 uses kind of a nasty workaround to get around the lack of #[feature(generic_const_exprs)] by putting 1 << N into a const fn and passing it in at the constructor site.

from lzma-rs.

chyyran avatar chyyran commented on August 22, 2024

Vec2D from #77 lets us easily experiment with different backing stores for the literal_probs array which is the main allocation.
I used arrayvec which gives us a nice API over a raw array.

Since DecoderState needs to be able to update lclppb at runtime, we can't just parameterize it with const generics unlike BitTree, so we have to account for the theoretical maximum parameters which is sizeof(u16) * 0x300 * { 1 << (4 + 8) } = 3145728 * 2. An array of 6 megabytes is way too large to fit on the stack and I was unable to actually decompress anything without a stack overflow.

For LZMA2 streams, the theoretical maximum parameters for the literal_probs array is sizeof(u16) * 0x300 * { 1 << 4 } = 12288 * 2. While this isn't great, 24KB is way smaller than 6MB so I was able to actually run some benchmarks, which are not great.

These benchmarks are with an allocating BitTree, i.e. without const-generics.

ArrayVec<u16, 0x300 * { 1 << 4 }>

running 8 tests
test compress_65536                  ... bench:   1,323,468 ns/iter (+/- 174,348)
test compress_empty                  ... bench:         695 ns/iter (+/- 20)
test compress_hello                  ... bench:       1,035 ns/iter (+/- 90)
test decompress_after_compress_65536 ... bench:   1,336,165 ns/iter (+/- 57,379)
test decompress_after_compress_empty ... bench:       6,166 ns/iter (+/- 359)
test decompress_after_compress_hello ... bench:       6,531 ns/iter (+/- 224)
test decompress_big_file             ... bench:   3,716,763 ns/iter (+/- 121,865)
test decompress_huge_dict            ... bench:       6,572 ns/iter (+/- 315)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 13.64s

Box<[u16]>

running 8 tests
test compress_65536                  ... bench:   1,334,964 ns/iter (+/- 1,371,984)
test compress_empty                  ... bench:         709 ns/iter (+/- 436)
test compress_hello                  ... bench:       1,033 ns/iter (+/- 64)
test decompress_after_compress_65536 ... bench:   1,145,697 ns/iter (+/- 82,369)
test decompress_after_compress_empty ... bench:       1,890 ns/iter (+/- 92)
test decompress_after_compress_hello ... bench:       2,231 ns/iter (+/- 295)
test decompress_big_file             ... bench:   3,620,380 ns/iter (+/- 152,660)
test decompress_huge_dict            ... bench:       2,271 ns/iter (+/- 75)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 7.76s

The huge array on the stack probably causes a lot of cache misses and substantially reduces performance. It seems the only benefit to using const generics here is with BitTree, where it does show an improvement in the benchmarks as above.

from lzma-rs.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.