Comments (4)
Interesting. this sounds like it is 1000 times too slow :D
from arsenal.
Looking at the code, I'm wondering about three points:
- Does the fact that we're doing an in-place shuffle rather than building a new array have an impact on "shuffling" time ?
- Here, we're swapping values potentially multiple times within the array before we're done. Could it have an impact also ?
- The fact that we're using the cryptographically secure randRange, with all the small computations that we're asking Node.JS to do, and that loops over parseInt(GenerateSomeBinarydata().toString('hex'), 16) may really not be an optimal way of shuffling an array for a simple bootstrap list....
Luckily, this shuffling should only be done only once at startup over a bootstraplist, and as such should not be much of a pain in runtime. Can you confirm this for me, @LaurenSpiegel ?
from arsenal.
@DavidPineauScality, yes this shuffle function is only used when S3 starts up and instantiates a bucketClient and an sproxydclient. It is not used in the metadata repo (they have their own shuffle function in dbapi). Note that sproxydclient has the same function in its repo rather than calling from Arsenal so that will have to be updated to point to the super new Arsenal shuffle when it is updated. I will create an issue in sproxydclient linking to this issue.
from arsenal.
Let's see...
function shuffle(array) {
for (let i = array.length - 1; i > 0; i--) {
const randIndex = randomRange(0, i);
const randIndexVal = array[randIndex];
array[randIndex] = array[i];
array[i] = randIndexVal;
}
return array;
};
Let's assume our input array is of length 8. Here goes:
function randomRange(min, max) {
if (max < min) {
throw new Error("Invalid range");
}
if (min === max) {
return min;
} else if (min < max) {
const range = (max - min);
const bits = bitsNeeded(range);
// decide how many bytes we need to draw from nextBytes: drawing less
// bytes means being more efficient
const bytes = bitsToBytes(bits);
// we use a mask as an optimization: it increases the chances for the
// candidate to be in range
const mask = createMaskOnes(bits);
let candidate;
do {
candidate = parseInt(nextBytes(bytes).toString('hex'), 16) & mask;
} while (candidate > range);
return (candidate + min);
}
}
This is called with (min=0, max=8)
so we move into the last block. range
becomes 8, then we calculate the number of bits that are required to represent this number:
function bitsNeeded(number) {
if (number < 0) {
throw new Error("Input must be greater than or equal to zero");
} else if (number === 0) {
return 1;
} else {
return Math.floor(Math.log2(number)) + 1;
}
}
OK, 8 = 2 ** 3, so bits
becomes 4. Next step:
function bitsToBytes(numBits) {
if (numBits < 0) {
throw new Error("Input must be greater than or equal to zero");
}
return Math.ceil(numBits / 8);
}
so bytes
is 1, as expected. Finallly, the mask calculation:
function createMaskOnes(numBits) {
if (numBits < 0)
throw new Error("Input must be greater than or equal to zero");
return Math.pow(2, numBits) - 1;
}
and mask
is 15 (or 01111 in binary). Now, we go in the loop. Let's assume nextBytes
, toString
and parseInt
are infinitely fast (they're not, but that's beside the point). We take the result, which is some random integer, and &
it with mask
. Then, until candidate <= range
(with range = 8
), we repeat the loop.
Consider what's going on here. The parseInt(nextBytes(bytes).toString('hex'), 16)
call gives us some random number (in this case within [0..255]
because bytes = 1
). After the &
with mask
, in this case mask
being 1111
(binary), we get a number which is somewhere between 0 and... 15 (inclusive). So there's at every iteration an almost-50% chance we have to do one more iteration. If you're "unlucky", it might take a while to hit a 'matching' random value.
Or nextBytes
might be a bit slow of course, that could also explain.
What you could consider doing here is implement a very simple (and fast) PRNG (minstd/Lehmer could do perfectly well to shuffle something, see https://en.wikipedia.org/wiki/Lehmer_random_number_generator), seed it with a 'cryptographic' random number (or even a plain timestamp), and simply modulo the PRNG result every time so the generated value fits the range you're aiming for.
Also, there must be a better/more efficient way to turn a couple of bytes into the number they represent besides turning them into a hex-encoded string and parsing it base-16?
Edit: Of course 8 is a bit of a 'worst case' (and so is every power of 2) causing a near-50% miss rate (or 50% chance the loop spins). Any 2 ** N - 1
(e.g. 15) range
needs only a single iteration, and others are somewhere in-between. But because the shuffle
function calculates a randomRange(0, N)
for every N
from the length of the input array to 0, you hit all cases, including the ones most likely to spin.
from arsenal.
Related Issues (20)
- multiple high and critical issues in `yarn audit` HOT 1
- empty/invalid/unredable/none Date header: errors incompatible with AWS
- invalid amz-date header: errors incompatible with AWS
- incorrect authorization header: errors incompatible with AWS
- authv4 sort query params HOT 1
- COMPAT: Increase Signature V2 Expires param
- COMPAT: return AccessDenied if Date before epochTime (01/01/1970) HOT 4
- callApiMethod in routes is not defined
- Error messages should not have periods
- Review of https://github.com/scality/Arsenal/pull/2152
- Re-review of https://github.com/scality/Arsenal/pull/2152
- Policy evaluation, action should be case insensitive
- Tests for #240 HOT 1
- delimiter.js maxKeys==0 param issue HOT 4
- Logs are broken HOT 6
- Exception occurs when using v4 authentication with certain query parameters
- Unbalanced log
- non monotonic clock used in version ID generation
- Action required: Greenkeeper could not be activated 🚨
- Invalid Greenkeeper configuration file
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from arsenal.