Giter Club home page Giter Club logo

Comments (16)

scandum avatar scandum commented on May 30, 2024

Fluxsort passes my own stability tests.

Do you have the cmp macro defined for primitive comparisons, or are you using a comparison function? Things can get a bit finicky since fluxsort uses the C interface for sorts by default.

from fluxsort.

Voultapher avatar Voultapher commented on May 30, 2024

Indeed I incorrectly supplied the comparison value.

Here is a reproducer:

#include <fluxsort.h>

#include <stdint.h>
#include <string.h>

// This is demo code and will only work on little-endian systems.

void print_i32_slice(int32_t *vals, size_t len)
{
    printf("[");
    for (size_t i = 0; i < len; i += 2)
    {
        printf("(%d, %d)", vals[i], vals[i + 1]);
    }
    printf("]\n");
}

int cmp_unpack_u64_to_i32(const void *a_ptr, const void *b_ptr)
{
    int32_t a = *(int32_t *)(a_ptr);
    int32_t b = *(int32_t *)(b_ptr);

    printf("comparing a: %d b: %d\n", a, b);

    // if (a < b)
    // {
    //     return -1;
    // }
    // else if (a > b)
    // {
    //     return 1;
    // }

    // return 0;

    if (a < b)
    {
        return -1;
    }

    return 1;
}

int main()
{
    uint64_t original[2] = {4294967296, 8589934592};
    uint64_t v[2] = {};
    memcpy(v, original, sizeof(original));

    printf("original: ");
    print_i32_slice((int32_t *)original, 4);

    fluxsort((void *)v, 2, sizeof(uint64_t), cmp_unpack_u64_to_i32);

    printf("sorted: ");
    print_i32_slice((int32_t *)v, 4);

    return 0;
}

Rust internally and C++ externally use the is_less contract where a function returns a bool. I had incorrectly mapped this concept into the glue code, that uses a Rust closure as comparison function. When I supply the full values it passes the test.

Arguably this could be avoided by limiting comparisons to is_less ie. x = cmp(pta + 1, pta) < 0; instead of x = cmp(pta, pta + 1) > 0;. This ties into the theme of how to specify the comparison function. In your benchmarks you use a - b which can invoke UB, which for example manifests itself when sorting int32_t v[2] = {-2147483644, 5}; for example by yielding an incorrect sort result. Fundamentally this is the shortcoming of the qsort interface, by limiting yourself internally to is_less one could do:

if (a < b)
{
    return -1;
}

return 1;

And still retain the stability guarantee, while losing some performance but not the 50+% extra of using the fully correct comparison function. Or bite the bullet and change the comparison interface to use is_less instead of an int.

from fluxsort.

scandum avatar scandum commented on May 30, 2024

Internally my sorts only require the (a > b) comparison, which is the right way to go about it as that translates to both greater than and not stable, as well as being compatible with the qsort interface.

C++ messed up by not being compatible with C in this regard, and less than doesn't translate to stable since that would be (a <= b). So it should be c++ and rust biting the bullet, but I doubt that's going to happen.

a - b indeed has an issue with large numbers, I'm forced to use it however in the benchmark since it calls qsort, and qsort doesn't perform proper stability checks and uses a mix of <, <=, and >.

from fluxsort.

Voultapher avatar Voultapher commented on May 30, 2024

I'm not sure this is the right place to talk about standard library API design, to clarify Rust uses Ord as their interface and is_less is an internal detail.

Indeed you seem to have already done what I suggested but biased towards is_more instead of is_less.

a - b indeed has an issue with large numbers, I'm forced to use it however in the benchmark since it calls qsort, and qsort doesn't perform proper stability checks and uses a mix of <, <=, and >.

So what you are saying is, look at my stable algorithm and how it performs against this unstable sort, with a broken compare function? There is saying that writing fast code is easy, and writing correct code is hard. Looking at the documentation of qsort https://en.cppreference.com/w/c/algorithm/qsort it tells you to do:

int compare_ints(const void* a, const void* b)
{
    int arg1 = *(const int*)a;
    int arg2 = *(const int*)b;
 
    if (arg1 < arg2) return -1;
    if (arg1 > arg2) return 1;
    return 0;
 
    // return (arg1 > arg2) - (arg1 < arg2); // possible shortcut
    // return arg1 - arg2; // erroneous shortcut (fails if INT_MIN is present)
}

Imo if you want a fair comparison you ought to use the correct comparison function for both qsort and fluxsort, and if you want the fastest absolute performance and comparison to languages with faster comparison function abstractions like C++ and Rust changing the signature seem inevitable.

from fluxsort.

scandum avatar scandum commented on May 30, 2024

Not quite, in the benchmark against pdqsort the #define cmp(a,b) (*(a) > *(b)) macro is enabled for fluxsort and no comparison functions are involved.

In the case of qsort it doesn't particularly matter for the benchmark I'm running, but admittedly arg1 - arg2 is slightly favorable for fluxsort.

from fluxsort.

Voultapher avatar Voultapher commented on May 30, 2024

Ah good to know, I'll look into benchmarking with that.

from fluxsort.

scandum avatar scandum commented on May 30, 2024

I just came across your presentation and wanted to leave a few remarks for you, figured doing so here might be the best place.

Fluxsort does work on non-integers, but what I'm doing (probably in one of the more atrocious C programming practices) is treating pointers as either 32 or 64 bit integers. The code works out of the box for 32 and 64 bit systems if sizeof(char *) is used for C strings when calling fluxsort(). In the case of 64 bit pointers the pointers are copied as long long, which isn't an issue in C, and each long long is passed as a void type to the comparison function, which can correctly cast it to the pointer it actually is. The benchmark itself performs a test using C strings.

As for stability, some kind of definite proof would be useful. One possible issue when porting is that unguarded insert requires the first 2 elements to be sorted, which the tail_swap() function that calls it takes care of.

Testing for stability is pretty rigorous in the benchmark using a dedicated method for integers, and since identical strings have different pointers, it's also detected during the string test.

Anyways, a cool presentation, and good luck with the porting effort.

from fluxsort.

Voultapher avatar Voultapher commented on May 30, 2024

Thanks for feedback and kind words :)

You are correct that it works for some user types, however because it allocates memory with malloc void* it would underalign user types with larger than pointer alignment, which would be UB. So it's support for user defined types is limited. Plus there are two other properties it doesn't have that make a port without modifications unsuitable as general purpose sort in Rust. For one, a user may see diverging observations of what value was passed into the comparison function. If in a horribly malicious but legal way the user was sorting a Mutex<Option<Box<...>>> a Mutex holding either a null-pointer or a valid pointer, and would swap the content during a comparison from owned pointer to null-pointer freeing the allocation, when the user drops the slice after sorting it would free the allocation a second time -> double free UB. The other one would be a panic/exception during comparison, leaving the slice with potential duplicate elements, which again could lead to double free UB. Of course C has neither scope based destructors (RAII) nor unwinding, so for C it's perfectly fine for user provided types as long as their type has fundamental alignment.

from fluxsort.

scandum avatar scandum commented on May 30, 2024

I don't think I use malloc void* in fluxsort. The general idea in C would be to create a custom structure in fluxsort.h for each custom size needed. It is a bit obnoxious, but writing typeless code in C is even more so.

@Voultapher

I published a simplified version of quadsort with decent performance that might be useful to your porting effort. It's around 150 lines of code. It should be some orders of magnitude easier to create a formal proof for it as well.

https://github.com/scandum/piposort

from fluxsort.

Voultapher avatar Voultapher commented on May 30, 2024

@scandum unless I misunderstand the code or the relevant C rules,

VAR *swap = malloc(nmemb * sizeof(VAR));
will be under-aligned for types with alignment larger than the fundamental alignment. Even if the type switch in the initial function is adjusted.

Regarding piposort, thanks for the suggestion. I'll certainly take a look. Note, that I've largely abandoned the porting effort and am mostly focused on improving the 'components' of the existing implementations.

from fluxsort.

scandum avatar scandum commented on May 30, 2024

Not sure if we're talking about the same thing. In fluxsort.h you'd define:

typedef struct {char bytes[20];} struct160;
#define VAR struct160
#define FUNC(NAME) NAME##160
#define STRUCT(NAME) struct NAME##160

#include "fluxsort.c"

That should allow sorting any 20 byte custom structure. Would also need to add a case 20 in void fluxsort()

from fluxsort.

Voultapher avatar Voultapher commented on May 30, 2024

I'm talking about types with custom alignment:

#include <stdalign.h>
#include <stdlib.h>

struct {
    alignas(64) char bytes[8];
} CustomType;

void* x(size_t count) {
    // wrong
    // return malloc(sizeof(CustomType) * count);

    // correct
    return aligned_alloc(alignof(CustomType), sizeof(CustomType) * count);
}

Here the malloc call as done by fluxsort would under-align CustomType.

from fluxsort.

scandum avatar scandum commented on May 30, 2024

Thanks, that's not something I've had to deal with often. I'll look into it.

from fluxsort.

scandum avatar scandum commented on May 30, 2024

As for under alignment, it shouldn't make much of a performance difference since we're not dealing with random access. People could however use their own aligned alloc and call fluxsort_swap() directly.

I'm hesitant to do anything a-typical, since people tend to make strange and inaccurate claims already. I assume it's the combination of a tempting fast algorithm, and the annoyance of trying to understand the code.

from fluxsort.

Voultapher avatar Voultapher commented on May 30, 2024

I think it's fine to leave it as is, while adding a note to the documentation. My understanding is, that for certain types such as SIMD registers it's UB to allocate and then use them if they are under-aligned.

from fluxsort.

scandum avatar scandum commented on May 30, 2024

That's a good point. I'll do some actual testing (since I can intentionally misalign the data), and at the very least suggest its usage in a code comment.

from fluxsort.

Related Issues (5)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.