First, thanks for creating this. It's really cool. I'm considering using it as a crypt

Duplicate ids generated in torture test? about flake-idgen HOT 22 CLOSED

t-pwk commented on August 25, 2024

Duplicate ids generated in torture test?

from flake-idgen.

Comments (22)

Bramzor commented on August 25, 2024

@T-PWK Any idea? Because if there are that many collisions, we might have an issue?
@dominic-p What was the rate of ids generated per second?

dominic-p commented on August 25, 2024

@Bramzor I just re-ran the tests. When running 4 instances in parallel I observed around 100,000 ids per second being written to disk (a little over 25,000 per instance), but I'm not sure if I'm measuring disk IO or some other factor. This was done on an i7-5820K Windows 10 machine.

I did not interrupt the process this time, and I wound up with 4,310 duplicate IDs (again with exactly 2 instances of each).

Bramzor commented on August 25, 2024

@dominic-p You can work out how many unique ids a generator can produce per second from the counter, which is 12 bits long. I'm not sure of the exact number, but I thought it should be safe up to at least 10,000 ids per second (per instance). Running 4 instances at the same time shouldn't matter as long as each one uses a different datacenter/worker.
When you talk about instances, these are processes on the same machine, right? If the time is off between the instances, that might also cause duplicates.

Can you somehow limit the number of IDs per second to a maximum of 10,000 (per instance) to see if that makes a difference?

Bramzor commented on August 25, 2024

Just checked. The 12-bit counter can take 4096 values per ms, so a generator could (if the ids were generated evenly) produce about 4M ids per second. That assumes ideal conditions. I think it makes more sense to check whether you can get duplicates under normal conditions. I do not think normal use is anything above 100 ids per second per instance, so to be sure you could test with 1,000 ids per second, which should not easily trigger a duplicate.
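As a rough check of the arithmetic above, here is a minimal sketch. It assumes only the 12-bit counter mentioned in this thread; the 25,000 ids per second figure is the per-instance rate reported earlier, and the variable names are illustrative.

```javascript
// Capacity of one generator, assuming a 12-bit per-millisecond counter
// (the figure quoted in this thread).
const COUNTER_BITS = 12;
const idsPerMs = 2 ** COUNTER_BITS;    // 4096
const idsPerSecond = idsPerMs * 1000;  // 4096000

// Per-instance rate reported earlier in the thread (about 25,000 ids/s).
const observedPerSecond = 25000;
const utilisation = observedPerSecond / idsPerSecond;

console.log(idsPerMs, idsPerSecond, (utilisation * 100).toFixed(2) + '%');
```

The average rate is well under 1% of the theoretical capacity, which is why the counter alone is unlikely to explain the duplicates. A burst of more than 4096 ids inside a single millisecond would still be a separate concern.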

But your 4% duplicate rate is not normal.

dominic-p commented on August 25, 2024

Yeah, I think my use case is definitely not what this was made for. That's why I decided to go another route, but I still wanted to point out what I was seeing.

It sounds like the issue is just with my tests generating IDs way too quickly which is not a real world scenario.

Thanks for the insights.

Bramzor commented on August 25, 2024

What you could do is randomize datacenter and worker. In that case you effectively get 22 bits (10 random bits plus the 12-bit counter) instead of 12 to tell ids from the same millisecond apart, which may be enough to see almost no collisions in your test.
Normal usage of the generator is to generate ids that cannot collide between different workers, with the datacenter being a static number. The reason is that if, for example, you run 2 databases in 2 datacenters, you can tell from a generated id which datacenter you need to go to. So you can have 10 DCs spread over the globe for low latency towards users, plus an easy way to know where you need to go (in case DNS geolocation does not already send you to the correct location).

Long story short: if you will never use this feature, you can use those 10 bits to lower the chance of duplicates and still use the library. That makes it much like, for example, https://github.com/leodutra/simpleflakes
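To make the bit budget concrete, here is a minimal sketch of packing and unpacking such an id with BigInt. The layout (42-bit timestamp, 5-bit datacenter, 5-bit worker, 12-bit counter, from high bits to low bits) is an assumption based on the sizes quoted in this thread, and `composeId` / `decomposeId` are hypothetical helpers, not part of flake-idgen.

```javascript
// Assumed layout, high bits to low bits:
// 42-bit timestamp | 5-bit datacenter | 5-bit worker | 12-bit counter
const COUNTER_BITS = 12n;
const WORKER_BITS = 5n;
const DATACENTER_BITS = 5n;

// Pack the four fields into one 64-bit value (as a BigInt).
function composeId({ timestamp, datacenter, worker, counter }) {
  return (BigInt(timestamp) << (DATACENTER_BITS + WORKER_BITS + COUNTER_BITS))
    | (BigInt(datacenter) << (WORKER_BITS + COUNTER_BITS))
    | (BigInt(worker) << COUNTER_BITS)
    | BigInt(counter);
}

// Recover the fields, for example to route a request to its datacenter.
function decomposeId(id) {
  const mask = (bits) => (1n << bits) - 1n;
  return {
    timestamp: Number(id >> (DATACENTER_BITS + WORKER_BITS + COUNTER_BITS)),
    datacenter: Number((id >> (WORKER_BITS + COUNTER_BITS)) & mask(DATACENTER_BITS)),
    worker: Number((id >> COUNTER_BITS) & mask(WORKER_BITS)),
    counter: Number(id & mask(COUNTER_BITS)),
  };
}

const sampleId = composeId({ timestamp: 123456789, datacenter: 9, worker: 3, counter: 42 });
console.log(decomposeId(sampleId));
```

Reading `datacenter` back out of an id is what makes the routing use case work. Filling those 10 bits with random values gives up that ability in exchange for more bits that distinguish ids created in the same millisecond.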

Bramzor commented on August 25, 2024

Just ran the exact same test as you, except that I put a hardcoded instanceId in it. The results are:

Test 1: 1 min, 1 instance
cat ids.txt | wc -l
2641900
sort ids.txt | uniq | wc -l
2641900
-> No duplicates

Test 2: 3 min, 2 instances
cat ids.txt | wc -l
100000000
sort ids.txt | uniq | wc -l
100000000

Test 3: 1 min, 4 instances
cat ids.txt | wc -l
8733200
sort ids.txt | uniq | wc -l
8733200

So I think the problem is with the instanceId in your test, as I did not hit any duplicates. I will let it run for longer, but if I set the same instanceId on every instance, this is the result of a short run:

Faulty test 4, with the same instanceId set on 4 instances:
cat ids.txt | wc -l
1650400
sort ids.txt | uniq | wc -l
490081

Instance id should be a number between 0 and 32.

dominic-p commented on August 25, 2024

Interesting results. From the README:

id (10 bit) - generator identifier. It can have values from 0 to 1023. It can be provided instead of datacenter and worker identifiers.

So, do you think I'm doing something wrong where I assign the ID here?

Bramzor commented on August 25, 2024

Is your process id between 0 and 1023?
I did not test it on Windows. On Mac that instanceId came back undefined, so I changed it to a random value between 0 and 10:
const instanceId = Math.floor(Math.random() * 11); // random integer from 0 to 10
console.log('instanceId: ' + parseInt(instanceId, 10));

This way you can confirm that the instanceIds are not identical between the processes. Maybe parseInt does something strange with it.
The rest of the code is identical.
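If the generator id is derived from something like the process id, one option is to validate it and reduce it to the 10-bit range explicitly. The sketch below is only an illustration: `toGeneratorId` is a hypothetical helper, and it makes no claim about what flake-idgen itself does with out-of-range values.

```javascript
// Hypothetical helper: reduce an arbitrary non-negative integer (for example
// process.pid or a command-line argument) to the 10-bit range 0..1023.
function toGeneratorId(value) {
  const n = Number.parseInt(value, 10);
  if (!Number.isInteger(n) || n < 0) {
    // Covers undefined, NaN and negative input.
    throw new RangeError('generator id source must be a non-negative integer: ' + value);
  }
  return n % 1024;
}

console.log(toGeneratorId(process.pid));
console.log(toGeneratorId(5000)); // 904
console.log(toGeneratorId(6024)); // 904 as well: 5000 and 6024 collide
```

Note the last two lines: two processes whose pids differ by a multiple of 1024 end up with the same generator id, so a pid-based id is not guaranteed to be unique across processes. Passing an explicit, distinct id per process avoids that.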

dominic-p commented on August 25, 2024

Very interesting. I just ran the tests again. This time I ran each instance in a separate terminal (no funky Windows pipe thing). I also modified the script to print the instance ID to the console and each one printed correctly.

I wound up with a little over 2,000 duplicates. Strange.

I think that's about all the time I can put into debugging this at the moment, but I'll leave the issue open in case anyone else wants to look into it.

Bramzor commented on August 25, 2024

Very strange. I think it could be a Windows issue. Maybe I'll try to validate that later.

gogomarine commented on August 25, 2024

Same here.

const FlakeId = require('flake-idgen');
const intformat = require('biguint-format');

const flakeOpts = { epoch: new Date(2019, 8, 21).getTime(), datacenter: 1, worker: 1};
const flake = new FlakeId(flakeOpts);

function nextId() {
    const sid = intformat(flake.next(), 'dec');
    return parseInt(sid, 10);
}

const s = new Set();
Object.keys(Array(10000).fill(0)).forEach(n => {s.add(nextId())})

output:

s.size => 5348;

That's a lot of duplicates, almost 50%.

Envs:

Node - v10.16.3
OS - macOS Catalina

prantlf commented on August 25, 2024

@gogomarine, you have a bug in your code. You must not convert the flake ID to an integer.

A flake ID needs the full precision of a 64-bit integer, while JavaScript's Number is a 64-bit double with only 53 bits of mantissa precision (IEEE 754), so distinct IDs can round to the same Number. If you change your nextId function to this, you'll see that all generated IDs are unique:

function nextId() {
  return intformat(flake.next(), 'dec');
}

My laptop is able to generate 90 IDs per millisecond using your script. That is "slow" enough to stay within the capacity of the 12-bit counter within the ID (4096 IDs per millisecond).
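The precision loss can be shown without the library at all. In this minimal sketch the two decimal strings are made-up 64-bit values chosen for illustration (2^62 and 2^62 + 1), not ids produced by flake-idgen.

```javascript
// Two distinct 64-bit values, written as decimal strings.
const idA = '4611686018427387904'; // 2^62
const idB = '4611686018427387905'; // 2^62 + 1

// Converted to Number, both round to the same double, so a Set of
// Numbers would count them as one value.
const sameAsNumber = parseInt(idA, 10) === parseInt(idB, 10);
const safe = Number.isSafeInteger(parseInt(idA, 10));

// Kept as strings (or converted to BigInt) they stay distinct.
const sameAsString = idA === idB;
const sameAsBigInt = BigInt(idA) === BigInt(idB);

console.log(sameAsNumber, safe, sameAsString, sameAsBigInt); // true false false false
```

This is why the Set in the earlier script shrank: many ids that differ only in their low bits collapse to one Number. Keeping ids as decimal strings, or as BigInt where arithmetic is needed, avoids the problem.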

prantlf commented on August 25, 2024

@dominic-p, I could not reproduce duplicates using your script. It generates IDs in batches of 100 at a rate of about 150 IDs per ms, then writes them out. That is still well below the threshold of 4096 IDs per ms. No duplicates were printed out at the end.

This is how I tested:

node nonce-gen.js 10 &
node nonce-gen.js 20 &
node nonce-gen.js 30 &
node nonce-gen.js 40 &
...wait for the finish...
cat ids-*.txt > ids.txt
uniq -d ids.txt

T-PWK commented on August 25, 2024

@dominic-p are you still facing the issue?

I've run a similar test to yours (details below) and did not get any duplicate ids. However, the test was run on macOS and not on a Windows system.

node nonce-gen.js 10 &
node nonce-gen.js 20 &
node nonce-gen.js 30 &
node nonce-gen.js 40 &
...wait for the finish...
cat ids-*.txt > ids.txt
sort ids.txt > ids-sorted.txt
uniq -d ids-sorted.txt

Identifiers generated with different generator ids (or different datacenter and worker values) will never conflict. Also, when you look at your nonce-gen.js script, each batch of 100 generated ids is saved to a file. That usually takes longer than 1 ms, so it is not very likely that you will exceed the counter limit of 4096 ids in a millisecond.

Would you be able to provide several of the duplicated identifiers (either from a single file or from different files) so that we can check where the duplication is coming from?
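To answer the "single file or different files" question, it helps to record where each id was seen. A minimal sketch follows; `findDuplicates` is a hypothetical helper, and the file names and ids are made-up sample data standing in for the contents of the real output files.

```javascript
// Hypothetical helper: given ids grouped by the file they came from,
// return a Map of duplicated id -> list of sources where it appeared.
function findDuplicates(idsBySource) {
  const seen = new Map();
  for (const [source, ids] of Object.entries(idsBySource)) {
    for (const id of ids) {
      if (!seen.has(id)) seen.set(id, []);
      seen.get(id).push(source);
    }
  }
  // Keep only ids that appeared more than once.
  return new Map([...seen].filter(([, sources]) => sources.length > 1));
}

// Made-up sample data; ids are kept as strings to avoid precision loss.
const duplicates = findDuplicates({
  'ids-10.txt': ['100', '101', '102'],
  'ids-20.txt': ['200', '101'],
  'ids-30.txt': ['300', '300'],
});
console.log(duplicates);
```

An id listed under two different files points at a generator id clash between processes, while an id listed twice under the same file points at a problem inside a single generator, such as the clock or the counter.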

dominic-p commented on August 25, 2024

Thanks for following up. As I mentioned before, I'm not using flake-idgen at the moment, so this issue really isn't affecting me.

That said, I did rerun the test today and wound up with a little under 500 duplicate IDs. I uploaded them to the gist here so you can take a look.

T-PWK commented on August 25, 2024

@dominic-p, the issue you were facing was due to the Node.js clock moving backwards. That's why you were getting duplicates. I managed to reproduce it on Windows Server 2019.

Usually the clock moves backwards by only a few milliseconds. Under average load the issue cannot be seen; under high load, as in your case, it may happen sometimes.

The latest version of the library throws an error from next() when a backwards clock move is detected. The generator starts returning correct identifiers again once the clock catches up with the last timestamp used (usually within a few milliseconds).
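The mechanism can be illustrated with a toy model. This is a sketch under stated assumptions, not flake-idgen's actual code: `makeToyGenerator` is hypothetical, it takes an injected clock function so the backwards step can be simulated, and it ignores counter overflow.

```javascript
// Toy generator, NOT flake-idgen's code: a 12-bit counter that restarts
// whenever the millisecond changes. `clock` is an injected function that
// returns the current time in ms. Counter overflow is ignored here.
function makeToyGenerator(clock, { detectBackwards = false } = {}) {
  let lastTime = -1;
  let counter = 0;
  return function next() {
    const now = clock();
    if (detectBackwards && now < lastTime) {
      throw new Error('clock moved backwards by ' + (lastTime - now) + ' ms');
    }
    counter = now === lastTime ? (counter + 1) & 0xfff : 0;
    lastTime = now;
    return (BigInt(now) << 12n) | BigInt(counter);
  };
}

// Simulated clock readings: the third reading steps back by one millisecond.
const readings = [1000, 1001, 1000];
let readIndex = 0;
const toyNext = makeToyGenerator(() => readings[readIndex++]);

const toyIds = [toyNext(), toyNext(), toyNext()];
console.log(toyIds[0] === toyIds[2]); // true: same ms again, counter restarted at 0
```

With `detectBackwards: true` the third call throws instead of returning a repeated id, which is the behaviour described above for the newer library version.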

So, error handling and delay would have to be added.
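One way to add that error handling and delay is a small retry wrapper. The sketch below assumes only that the generator has a synchronous `next()` that throws while the clock is behind, as described above; `nextWithRetry`, its options, and the stand-in generator are hypothetical and not part of flake-idgen.

```javascript
// Resolve after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical wrapper: call generator.next(), and if it throws, wait a
// little and try again, giving up after `retries` failed retries.
// Note: it retries on any error, not only clock errors.
async function nextWithRetry(generator, { retries = 5, delayMs = 2 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return generator.next();
    } catch (err) {
      if (attempt >= retries) throw err;
      await sleep(delayMs);
    }
  }
}

// Stand-in generator for illustration: fails twice, then succeeds.
let failuresLeft = 2;
const retryDemoGenerator = {
  next() {
    if (failuresLeft-- > 0) throw new Error('clock moved backwards');
    return 'id-1';
  },
};

nextWithRetry(retryDemoGenerator).then((id) => console.log(id)); // id-1
```

Since the clock usually only lags by a few milliseconds, a short delay with a handful of retries should be enough in most cases. A production version would likely check the error type before retrying.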

Unfortunately this is not an issue with the implementation but with the algorithm, which relies on a timestamp that never moves backwards. Any other implementation (in Java, C++, etc.) of this id generation algorithm will suffer from the same problem and will have to deal with it.

dominic-p commented on August 25, 2024

Thanks for looking into it and the explanation. That is really helpful.

responsible-adult commented on August 25, 2024

Very helpful info. I'm wondering, is there a way to detect when this issue could be occurring, or when not to trust the uniqueness?

Thanks!

T-PWK commented on August 25, 2024

@robert-mendoza, with the latest version of flake-idgen that issue is detected and an error is thrown when using next(). When using next(cb) with a callback function, the callback waits until time has caught up and then receives a correct id (i.e. without a duplicate).

responsible-adult commented on August 25, 2024

Perfect. Thanks,

hatemjaber commented on August 25, 2024

@robert-mendoza, with the latest version of flake-idgen that issue is detected and an error is thrown when using next(). When using next(cb) with a callback function, the callback waits until time has caught up and then receives a correct id (i.e. without a duplicate).

I've tried many different ways to return the value from the callback and I'm not able to do that. Can you provide an example? When I log it to the console it works fine and I can see the result, but when I try to return the result I get undefined.
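A value cannot be returned from inside the callback to the outer caller, because the callback runs after the outer function has already returned. A common pattern is to wrap the call in a Promise. In the sketch below, the `cb(err, id)` callback shape is an assumption (the usual Node.js convention), and `callbackDemoGenerator` is a stand-in rather than the real library; `nextId` is a hypothetical helper.

```javascript
// Stand-in with a Node-style callback API: next(cb) calls cb(err, id) later.
// It is NOT flake-idgen; pass the real generator to nextId() instead.
const callbackDemoGenerator = {
  next(cb) {
    setTimeout(() => cb(null, 'id-1'), 1);
  },
};

// Wrap the callback API in a Promise so callers can await the id.
function nextId(generator) {
  return new Promise((resolve, reject) => {
    generator.next((err, id) => (err ? reject(err) : resolve(id)));
  });
}

async function main() {
  const id = await nextId(callbackDemoGenerator);
  console.log(id); // id-1
  return id;
}

main();
```

Any code that needs the id then has to be async as well (or use `.then`), since the id only exists once the callback has fired.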
