scandum / fluxsort Goto Github PK
View Code? Open in Web Editor NEWA fast branchless stable quicksort / mergesort hybrid that is highly adaptive.
License: The Unlicense
A fast branchless stable quicksort / mergesort hybrid that is highly adaptive.
License: The Unlicense
iirc, long double
is a weird 80-bit floating point number that x86 apparently supports
Depending on the compiled platform (32bit x86 or x86_64) the 80 bits get rounded up to 12 or 16 bytes
https://github.com/scandum/fluxsort/blob/main/src/fluxsort.h#L177
Hi, I've recently been doing some research on sort implementations and noticed that in my tests that fluxsort is not stable. The test and benchmark suite is written in Rust, the following test fails for fluxsort but passes for other stable sorts such as Rust slice::sort
and C++ std::stable_sort
but not for fluxsort. You can find the test code here https://github.com/Voultapher/sort-research-rs/blob/2642b98dc60cb6c9a6dbdec706a71ebdfbdf89e1/tests/main.rs#L219. In essence it packs two i32 into one u64 and extracts in the comparison function the two i32 again and only uses the first one as comparison and then later checks that the order of the ascending second elements remains in order.
Also on a sidenote, I'm seeing rather different benchmark results, where fluxsort pretty much never beats pdqsort.
First off, sorry again to have jumped to conclusions on Hacker News. As a programming language guy I'm in the market for a stable sort and Fluxsort seems to have the fundamentals down!
I wanted to know if you're interested in incorporating some or all of what I do into this repository. I'd like to help improve Fluxsort's worst cases and pattern recognition with techniques like pdqsort uses, and expand the interface to support other operations. Some possible topics, which I can also split into individual issues:
(pdqsort depends on instability a lot, but I think there are usually ways to avoid this)
As evidence that I have some clue what I'm talking about, here's a small improvement: the following inner loop for flux_partition
gives about a 3% speedup in my benchmarks on random 1e6-element 4-byte int arrays.
size_t n=0;
val = cmp(ptx + 0, &piv) <= 0; pta[n] = ptx[0]; pts[0-n] = ptx[0]; n += val;
// etc.
pta += n; pts += 8-n;
ptx += 8;
The fundamental branchless swap_if code produces suboptimal code on x86-64. I ported it to Rust and noticed that changing it yielded a 50% performance uplift for that function on Zen3, this will of course depend on the the hardware, but cmov seems to yield better results than setl/setg style code that is currently being produced. Probably helped by doing 8 instead of 10 instructions.
Here is the current version:
And here is the version that produces cmov code:
I think if you can find a way to reliably produce cmov instructions like LLVM does, you should see a noticeable speed improvement.
Time is a lousy benchmark, especially with a low numbers of elements running on an OS. There a lot of way of profiling but I would suggest using an LLVM based profiling solution like this one because it measures the number of intermediate language instructions executed. This avoids compiler/microcode optimizations and gives you the most representative output for all platforms.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.