Giter Club home page Giter Club logo

Comments (4)

 avatar commented on August 10, 2024

Interesting. this sounds like it is 1000 times too slow :D

from arsenal.

 avatar commented on August 10, 2024

Looking at the code, I'm wondering about three points:

  1. Does the fact that we're doing an in-place shuffle rather than building a new array have an impact on "shuffling" time ?
  2. Here, we're swapping values potentially multiple times within the array before we're done. Could it have an impact also ?
  3. The fact that we're using the cryptographically secure randRange, with all the small computations that we're asking Node.JS to do, and that loops over parseInt(GenerateSomeBinarydata().toString('hex'), 16) may really not be an optimal way of shuffling an array for a simple bootstrap list....

Luckily, this shuffling should only be done only once at startup over a bootstraplist, and as such should not be much of a pain in runtime. Can you confirm this for me, @LaurenSpiegel ?

from arsenal.

LaurenSpiegel avatar LaurenSpiegel commented on August 10, 2024

@DavidPineauScality, yes this shuffle function is only used when S3 starts up and instantiates a bucketClient and an sproxydclient. It is not used in the metadata repo (they have their own shuffle function in dbapi). Note that sproxydclient has the same function in its repo rather than calling from Arsenal so that will have to be updated to point to the super new Arsenal shuffle when it is updated. I will create an issue in sproxydclient linking to this issue.

from arsenal.

NicolasT avatar NicolasT commented on August 10, 2024

Let's see...

function shuffle(array) {
    for (let i = array.length - 1; i > 0; i--) {
        const randIndex = randomRange(0, i);
        const randIndexVal = array[randIndex];
        array[randIndex] = array[i];
        array[i] = randIndexVal;
    }
    return array;
};

Let's assume our input array is of length 8. Here goes:

function randomRange(min, max) {
    if (max < min) {
        throw new Error("Invalid range");
    }
    if (min === max) {
        return min;
    } else if (min < max) {
        const range = (max - min);
        const bits = bitsNeeded(range);
        // decide how many bytes we need to draw from nextBytes: drawing less
        // bytes means being more efficient
        const bytes = bitsToBytes(bits);
        // we use a mask as an optimization: it increases the chances for the
        // candidate to be in range
        const mask = createMaskOnes(bits);
        let candidate;
        do {
            candidate = parseInt(nextBytes(bytes).toString('hex'), 16) & mask;
        } while (candidate > range);
        return (candidate + min);
    }
}

This is called with (min=0, max=8) so we move into the last block. range becomes 8, then we calculate the number of bits that are required to represent this number:

function bitsNeeded(number) {
    if (number < 0) {
        throw new Error("Input must be greater than or equal to zero");
    } else if (number === 0) {
        return 1;
    } else {
        return Math.floor(Math.log2(number)) + 1;
    }
}

OK, 8 = 2 ** 3, so bits becomes 4. Next step:

function bitsToBytes(numBits) {
    if (numBits < 0) {
        throw new Error("Input must be greater than or equal to zero");
    }
    return Math.ceil(numBits / 8);
}

so bytes is 1, as expected. Finallly, the mask calculation:

function createMaskOnes(numBits) {
    if (numBits < 0)
        throw new Error("Input must be greater than or equal to zero");
    return Math.pow(2, numBits) - 1;
}

and mask is 15 (or 01111 in binary). Now, we go in the loop. Let's assume nextBytes, toString and parseInt are infinitely fast (they're not, but that's beside the point). We take the result, which is some random integer, and & it with mask. Then, until candidate <= range (with range = 8), we repeat the loop.

Consider what's going on here. The parseInt(nextBytes(bytes).toString('hex'), 16) call gives us some random number (in this case within [0..255] because bytes = 1). After the & with mask, in this case mask being 1111 (binary), we get a number which is somewhere between 0 and... 15 (inclusive). So there's at every iteration an almost-50% chance we have to do one more iteration. If you're "unlucky", it might take a while to hit a 'matching' random value.

Or nextBytes might be a bit slow of course, that could also explain.

What you could consider doing here is implement a very simple (and fast) PRNG (minstd/Lehmer could do perfectly well to shuffle something, see https://en.wikipedia.org/wiki/Lehmer_random_number_generator), seed it with a 'cryptographic' random number (or even a plain timestamp), and simply modulo the PRNG result every time so the generated value fits the range you're aiming for.

Also, there must be a better/more efficient way to turn a couple of bytes into the number they represent besides turning them into a hex-encoded string and parsing it base-16?

Edit: Of course 8 is a bit of a 'worst case' (and so is every power of 2) causing a near-50% miss rate (or 50% chance the loop spins). Any 2 ** N - 1 (e.g. 15) range needs only a single iteration, and others are somewhere in-between. But because the shuffle function calculates a randomRange(0, N) for every N from the length of the input array to 0, you hit all cases, including the ones most likely to spin.

from arsenal.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.