Giter Club home page Giter Club logo

fluxsort's People

Contributors

scandum avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

fluxsort's Issues

fluxsort comparison behavior

Hi, I've recently been doing some research on sort implementations and noticed that in my tests that fluxsort is not stable. The test and benchmark suite is written in Rust, the following test fails for fluxsort but passes for other stable sorts such as Rust slice::sort and C++ std::stable_sort but not for fluxsort. You can find the test code here https://github.com/Voultapher/sort-research-rs/blob/2642b98dc60cb6c9a6dbdec706a71ebdfbdf89e1/tests/main.rs#L219. In essence it packs two i32 into one u64 and extracts in the comparison function the two i32 again and only uses the first one as comparison and then later checks that the order of the ascending second elements remains in order.

Also on a sidenote, I'm seeing rather different benchmark results, where fluxsort pretty much never beats pdqsort.

Collaborating?

First off, sorry again to have jumped to conclusions on Hacker News. As a programming language guy I'm in the market for a stable sort and Fluxsort seems to have the fundamentals down!

I wanted to know if you're interested in incorporating some or all of what I do into this repository. I'd like to help improve Fluxsort's worst cases and pattern recognition with techniques like pdqsort uses, and expand the interface to support other operations. Some possible topics, which I can also split into individual issues:

  • Add more sophisticated benchmarks with subtle patterns
  • Candidate randomization to avoid poor pivot selection for patterned as well as random data
  • Move the call stack into the memory buffer to avoid stack overflow and improve speed
  • Recognize signs of mostly-sorted inputs during pivot selection
  • Specialized integer code; maybe SIMD instructions, non-stable optimizations, or hybridization with counting sort

(pdqsort depends on instability a lot, but I think there are usually ways to avoid this)

As evidence that I have some clue what I'm talking about, here's a small improvement: the following inner loop for flux_partition gives about a 3% speedup in my benchmarks on random 1e6-element 4-byte int arrays.

size_t n=0;
val = cmp(ptx + 0, &piv) <= 0; pta[n] = ptx[0]; pts[0-n] = ptx[0]; n += val;
// etc.

pta += n; pts += 8-n;
ptx += 8;

Suboptimal code-gen in the fundamental branchless-swap building block

The fundamental branchless swap_if code produces suboptimal code on x86-64. I ported it to Rust and noticed that changing it yielded a 50% performance uplift for that function on Zen3, this will of course depend on the the hardware, but cmov seems to yield better results than setl/setg style code that is currently being produced. Probably helped by doing 8 instead of 10 instructions.

Here is the current version:

And here is the version that produces cmov code:

I think if you can find a way to reliably produce cmov instructions like LLVM does, you should see a noticeable speed improvement.

Improve benchmark

Time is a lousy benchmark, especially with a low numbers of elements running on an OS. There a lot of way of profiling but I would suggest using an LLVM based profiling solution like this one because it measures the number of intermediate language instructions executed. This avoids compiler/microcode optimizations and gives you the most representative output for all platforms.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.