
Comments (22)

Bramzor commented on August 25, 2024

@T-PWK Any idea? Because if there are that many collisions, we might have an issue?
@dominic-p What was the rate of ids generated per second?

from flake-idgen.

dominic-p commented on August 25, 2024

@Bramzor I just re-ran the tests. When running 4 instances in parallel I observed around 100,000 ids per second being written to disk (a little over 25,000 per instance), but I'm not sure if I'm measuring disk IO or some other factor. This was done on an i7-5820K Windows 10 machine.

I did not interrupt the process this time, and I wound up with 4,310 duplicate IDs (again with exactly 2 instances of each).

Bramzor commented on August 25, 2024

@dominic-p You can work out how many unique ids can be generated per second from the counter, which is 12 bits long. I'm not sure what the exact number is, but I thought it should be safe up to at least 10,000 ids per second per instance. Running 4 instances at the same time shouldn't matter as long as each one uses a different datacenter/worker.
When you talk about instances, these are processes on the same machine, right? Because if the time is off between the instances, that might also cause duplicates.

Can you somehow limit the number of IDs per second to a maximum of 10,000 per instance to see if that makes a difference?

Bramzor commented on August 25, 2024

Just checked. The 12-bit counter can take 4096 values per ms, so if ids were generated perfectly evenly it could produce about 4M ids per second. That assumes ideal conditions, though. I think it makes more sense to check whether you can get duplicates under normal use. I do not think normal use is anything above 100 ids per second per instance, so to be sure you could test with 1000 ids per second, which should not easily trigger a duplicate.

But your 4% duplicate rate is not normal.
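
The capacity estimate above can be checked with a few lines of arithmetic. This is only a sketch: the 12-bit counter width and the roughly 25,000 ids per second per instance are taken from this thread, and the variable names are made up for illustration.

// Back-of-the-envelope capacity for a per-millisecond sequence counter.
const COUNTER_BITS = 12;                       // counter width mentioned in this thread
const idsPerMillisecond = 2 ** COUNTER_BITS;   // 4096 distinct counter values per ms
const idsPerSecond = idsPerMillisecond * 1000; // 4,096,000 per generator

// Rate reported earlier in the thread: roughly 25,000 ids per second per instance.
const observedPerSecond = 25000;
const utilisation = observedPerSecond / idsPerSecond;

console.log(idsPerMillisecond, idsPerSecond, (utilisation * 100).toFixed(2) + '%');

An average rate this far below the ceiling suggests the counter alone is unlikely to explain the duplicates, although an average says nothing about short bursts within a single millisecond.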

dominic-p commented on August 25, 2024

Yeah, I think my use case is definitely not what this was made for. That's why I decided to go another route, but I still wanted to point out what I was seeing.

It sounds like the issue is just with my tests generating IDs way too quickly which is not a real world scenario.

Thanks for the insights.

Bramzor commented on August 25, 2024

What you could do is randomize datacenter and worker. In that case you effectively get 22 bits instead of 12 to tell ids within the same millisecond apart, which may be enough to have almost no collisions in your test.
Normal usage of the generator is to produce ids that never collide between different workers, with datacenter being a static number. The reason is that if, for example, you run 2 databases in 2 datacenters, you can tell from the generated id which datacenter you need to go to. So you can have, say, 10 DCs spread over the globe for low latency towards users and an easy way to know where you need to go (in case DNS geolocation does not already send you to the correct location).

Long story short: if you will never use this feature, you can use those 10 bits to lower the probability of duplicates and still use the library. That makes it essentially the same as, for example, https://github.com/leodutra/simpleflakes

Bramzor commented on August 25, 2024

Just ran the exact same test as you, except that I did put a hardcoded instanceId in it and the results are:

Test 1: 1min 1instance
cat ids.txt | wc -l
2641900
sort ids.txt | uniq | wc -l
2641900
-> No duplicates

Test 2: 3min 2instances
cat ids.txt | wc -l
100000000
sort ids.txt | uniq | wc -l
100000000

Test 3: 1min 4 instances
cat ids.txt | wc -l
8733200
sort ids.txt | uniq | wc -l
8733200

So I think the problem is with the instanceId in your test, as I did not hit any duplicates. I'll let it run for longer, but if I set the same instanceId on every instance, this is the result of a short run:

Faulty Test 4 with same instanceId set on 4 instances:
cat ids.txt | wc -l
1650400
sort ids.txt | uniq | wc -l
490081

Instance id should be a number between 0 and 32.

dominic-p commented on August 25, 2024

Interesting results. From the README:

id (10 bit) - generator identifier. It can have values from 0 to 1023. It can be provided instead of datacenter and worker identifiers.

So, do you think I'm doing something wrong where I assign the ID here?

Bramzor commented on August 25, 2024

Is your process id between 0 and 1023?
I did not test it on Windows. On Mac that instanceId came back undefined, so I changed it to a random value between 0 and 10:

// Random integer from 0 to 10 inclusive, used as the instance id
const instanceId = Math.floor(Math.random() * 11);
console.log('instanceId: ' + parseInt(instanceId, 10));

This way you can confirm that the instanceIds are not identical between the processes. Maybe parseInt does something strange with it.
The rest of the code is identical.

dominic-p commented on August 25, 2024

Very interesting. I just ran the tests again. This time I ran each instance in a separate terminal (no funky Windows pipe thing). I also modified the script to print the instance ID to the console and each one printed correctly.

I wound up with a little over 2,000 duplicates. Strange.

I think that's about all the time I can put into debugging this at the moment, but I'll leave the issue open in case anyone else wants to look into it.

Bramzor commented on August 25, 2024

Very strange. I think it could be a windows issue. Maybe I'll try to validate that later.

gogomarine commented on August 25, 2024

Same here.

const FlakeId = require('flake-idgen');
const intformat = require('biguint-format');

const flakeOpts = { epoch: new Date(2019, 8, 21).getTime(), datacenter: 1, worker: 1};
const flake = new FlakeId(flakeOpts);

function nextId() {
  const sid = intformat(flake.next(), 'dec');
  return parseInt(sid, 10);
}

const s = new Set();
Object.keys(Array(10000).fill(0)).forEach(n => {s.add(nextId())})

output:

s.size => 5348;

That's a lot of duplicates, almost 50%.

Envs:

  • Node - v10.16.3
  • OS - macOS Catalina

prantlf commented on August 25, 2024

@gogomarine, you have a bug in your code. You must not convert the flake ID to a JavaScript Number.

The flake ID needs the full precision of 64 bits, while JavaScript's Number is a 64-bit double with only 53 bits of mantissa precision (IEEE 754), so distinct IDs can round to the same value. If you change your nextId function to this, you'll see that all generated IDs are unique:

function nextId() {
  return intformat(flake.next(), 'dec');
}

My laptop is able to generate 90 IDs per millisecond using your script. That is "slow" enough to stay within the capacity of the 12-bit counter in the ID (4096 IDs per millisecond).
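
The rounding described in this comment can be reproduced without the library. The following is a minimal sketch using BigInt to build two made-up 64-bit values (not real flake IDs) that differ only in their low bits:

// Two distinct integers well above Number.MAX_SAFE_INTEGER (2^53 - 1).
const a = 2n ** 62n + 1n;
const b = 2n ** 62n + 2n;

// As decimal strings they stay distinct.
const aStr = a.toString();
const bStr = b.toString();

// As Numbers both round to the nearest representable double, which is 2^62.
const aNum = parseInt(aStr, 10);
const bNum = parseInt(bStr, 10);

console.log(aStr === bStr); // false
console.log(aNum === bNum); // true

Near 2^62 adjacent doubles are 1024 apart, so many consecutive IDs collapse onto one Number, which is consistent with a Set of Numbers ending up much smaller than the count of generated IDs.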

prantlf commented on August 25, 2024

@dominic-p, I could not reproduce duplicates using your script. It generates each batch of 100 IDs at a rate of about 150 IDs per ms, then writes them out. That is still well below the threshold of 4096 IDs per ms. No duplicates were printed out at the end.

This is how I tested:

node nonce-gen.js 10 &
node nonce-gen.js 20 &
node nonce-gen.js 30 &
node nonce-gen.js 40 &
...wait for the finish...
cat ids-*.txt > ids.txt
uniq -d ids.txt

T-PWK commented on August 25, 2024

@dominic-p are you still facing the issue?

I've run a similar test to yours (details below) and I did not get any duplicate ids. However, the test was run on macOS and not on a Windows system.

node nonce-gen.js 10 &
node nonce-gen.js 20 &
node nonce-gen.js 30 &
node nonce-gen.js 40 &
...wait for the finish...
cat ids-*.txt > ids.txt
sort ids.txt > ids-sorted.txt
uniq -d ids-sorted.txt

Identifiers generated with different generator ids (or different datacenter and worker values) will never conflict with each other. Also, when you look at your nonce-gen.js script, each batch of 100 generated ids is saved to a file. That usually takes longer than 1 ms, so it is not very likely that you will exceed the counter limit of 4096 ids in a millisecond.

Would you be able to provide several identifiers that are duplicated (either from a single file or from different files) so that we can check where the duplication is coming from?
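
One way to see where duplicates come from is to record, for every id, which output it appeared in. The following is a rough sketch rather than the script used in this thread: the function name, the file names, and the sample ids are all made up, and the ids are kept as strings so that no precision is lost.

// Find ids that occur more than once and record which source each came from.
// `idsBySource` maps a label (for example a file name) to the ids read from it.
function findDuplicates(idsBySource) {
  const seen = new Map(); // id -> list of sources it appeared in
  for (const [source, ids] of Object.entries(idsBySource)) {
    for (const id of ids) {
      if (!seen.has(id)) seen.set(id, []);
      seen.get(id).push(source);
    }
  }
  // Keep only ids that were seen at least twice.
  return new Map([...seen].filter(([, sources]) => sources.length > 1));
}

// Made-up sample data for illustration only.
const sample = {
  'ids-10.txt': ['1001', '1002', '1003'],
  'ids-20.txt': ['2001', '1002'],
  'ids-30.txt': ['3001', '3001'],
};
const duplicates = findDuplicates(sample);
for (const [id, sources] of duplicates) {
  console.log(id, sources.join(', '));
}

A duplicate listed under two different sources would point at clashing generator ids, while a duplicate within a single source would point at the timestamp or counter.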

dominic-p commented on August 25, 2024

Thanks for following up. As I mentioned before, I'm not using flake-idgen at the moment, so this issue really isn't affecting me.

That said, I did rerun the test today and wound up with a little under 500 duplicate IDs. I uploaded them to the gist here so you can take a look.

T-PWK commented on August 25, 2024

@dominic-p, the issue you were facing was due to the Node.js clock moving backwards. That's why you were getting duplicates. I managed to reproduce it on Windows Server 2019.

Usually the clock moves backwards by only a few milliseconds. Under average load the issue does not show up, but under high load like in your case it can happen occasionally.

The latest version of the library throws an error from next() when a backwards clock move is detected. The generator will start returning correct identifiers again once the clock catches up with the last timestamp used (usually within a few milliseconds).

So callers would have to add error handling and a short delay before retrying.

Unfortunately this is not an issue with the implementation but with the algorithm, which relies on a timestamp that never moves backwards. Any other implementation of this ID generation algorithm (in Java, C++, etc.) will suffer from the same problem and will have to deal with it.
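
To make the failure mode concrete, here is a minimal sketch of a generator in the same style that refuses to issue ids while the clock is behind the last timestamp it used. It is not the flake-idgen source: the class name, the injectable `now` clock, and the simulated tick values are assumptions for illustration, and only the 10-bit generator id and 12-bit counter widths come from this thread.

// Sketch only: a timestamp + generator id + counter scheme with a backwards-clock check.
class SketchFlake {
  constructor({ id = 0, epoch = 0, now = Date.now } = {}) {
    this.id = BigInt(id & 0x3ff); // 10-bit generator id
    this.epoch = epoch;
    this.now = now;               // injectable clock (hypothetical, eases testing)
    this.lastTime = -1;
    this.seq = 0;                 // 12-bit counter
  }

  next() {
    const time = this.now() - this.epoch;
    if (time < this.lastTime) {
      // Without this check the counter would restart at 0 for a millisecond
      // that was already used, which can repeat an earlier id.
      throw new Error(`Clock moved backwards by ${this.lastTime - time} ms`);
    }
    if (time === this.lastTime) {
      this.seq = (this.seq + 1) & 0xfff;
      if (this.seq === 0) throw new Error('Counter exhausted for this millisecond');
    } else {
      this.seq = 0;
    }
    this.lastTime = time;
    // Layout: timestamp | 10-bit id | 12-bit counter
    return (BigInt(time) << 22n) | (this.id << 12n) | BigInt(this.seq);
  }
}

// Simulated clock readings: 100, 100, 99 (a step backwards), 101.
const ticks = [100, 100, 99, 101];
const gen = new SketchFlake({ id: 1, now: () => ticks.shift() });
const first = gen.next();
const second = gen.next();
let backwardsDetected = false;
try { gen.next(); } catch (e) { backwardsDetected = true; }
const third = gen.next();
console.log(first !== second, backwardsDetected, third > second);

A caller that catches the error can wait a few milliseconds and retry, which matches the error handling and delay described in this comment.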

dominic-p commented on August 25, 2024

Thanks for looking into it and the explanation. That is really helpful.

responsible-adult commented on August 25, 2024

Very helpful info. I'm wondering whether there is a way to detect when this issue could be occurring, or when not to trust the uniqueness?

Thanks!

T-PWK commented on August 25, 2024

@robert-mendoza, with the latest version of flake-idgen that issue is detected and an error is thrown when using next(). When using next(cb) with a callback function, the callback will wait until time has caught up and will then receive a correct id (i.e. without a duplicate).

responsible-adult commented on August 25, 2024

Perfect. Thanks,

hatemjaber commented on August 25, 2024

@robert-mendoza, with the latest version of flake-idgen that issue is detected and an error is thrown when using next(). When using next(cb) with a callback function, the callback will wait until time has caught up and will then receive a correct id (i.e. without a duplicate).

I've tried many different ways to return the value from the callback and I'm not able to do that. Can you provide an example? When I log it to the console it works fine and I can see the result, but when I try to return the result I get undefined.
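
Getting undefined here is typical of returning from inside a callback: the outer function has already returned by the time the callback runs. One common pattern is to wrap the callback in a Promise. The sketch below assumes the callback receives (err, id) and that the id is a Buffer. To keep it runnable on its own, it uses a hypothetical stand-in object (`fakeGenerator`) in place of a real generator, and `nextIdAsync` is a made-up helper name.

// Stand-in with the same call shape as a generator's next(cb); hypothetical,
// used here only so the example runs without installing anything.
const fakeGenerator = {
  counter: 0,
  next(cb) {
    // Deliver the id asynchronously, as a real generator may when it has to wait.
    setTimeout(() => cb(null, Buffer.from([0, 0, 0, 0, 0, 0, 0, ++this.counter])), 1);
  },
};

// Wrap the callback API in a Promise so the value can be awaited.
function nextIdAsync(generator) {
  return new Promise((resolve, reject) => {
    generator.next((err, id) => (err ? reject(err) : resolve(id)));
  });
}

async function main() {
  const id = await nextIdAsync(fakeGenerator); // the value is available here
  console.log(id.toString('hex'));
  return id;
}

const pending = main();

Any code that needs the id then has to await `nextIdAsync(...)` itself or chain `.then(...)` on the returned Promise. The value cannot be returned synchronously from the callback.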
