Giter Club home page Giter Club logo

Comments (14)

wangyi-fudan avatar wangyi-fudan commented on July 18, 2024

Hi,
It is already known that wyhash and wyrand fails 64-bit collision test with large number of items. See xxhash page and flyingmutant page.
The xxhash page concludes that wyhash is of 62-bit strength against collisions. wyrand should be similar, say about 62 bit strength.
The BigCrush tests are testing 32 bit PRNG, so wyrand did pass that.
In one word, we should claim only 62 bit strength.

from wyhash.

vigna avatar vigna commented on July 18, 2024

Note that testing of hash functions and PRNGs is very different. The meaning of "collision" is entirely different in the two settings.

I'm sorry but looking around I can find only tests about hash functions, not tests about the PRNG. Could you please give me a precise reference (a link) to a PRNG statistical collision test applied to WyRand?

BigCrush's parameters were designed 20 years ago. They are absolutely insufficient for any modern PRNG.

Also, I do not understand what you mean by 62-bit strength in the context of PRNGs. That the upper 62 bits have the right number of collisions? The lower 62 bits?

from wyhash.

wangyi-fudan avatar wangyi-fudan commented on July 18, 2024

wyrand passed BigCrush and Practrand from this repo: https://github.com/lemire/testingRNG
The 62 or 63 bits strength instead of ideal 64 bits strength may due to some mathematical limitation hidden in MUM function. It is not the head/tail bits but "steals one bits from all possible bits".
So the take away message is wyhash and wyrand still works well in practice with 62/63 bits claim.

from wyhash.

vigna avatar vigna commented on July 18, 2024

Yes, but I asked you a reference to a failed test because you said in your first answer that it is known that it fails collisions tests. PractRand performs no collision tests. BigCrush only performs very small-scale collision test. So these tests have nothing to do with the test failures I've sent you. It is perfectly normal that PractRand and BigCrush gave you no problems. But having a good PRNG goes beyond that. WyRand fails mid-size collisions tests.

You also did not explain what it means "62 bits" for a PRNG. That might have a meaning for hash function, but it does not have any meaning for a PRNG.

So the take away message is that WyRand fails a basic statistical test on about 2TB of data. It should not be used as a PRNG except when very low statistical quality is acceptable.

from wyhash.

wangyi-fudan avatar wangyi-fudan commented on July 18, 2024

flyingmutant did birthday test. It is here, but not openable in China: https://gist.github.com/flyingmutant/cb69e96872023f9f580868e746d1128a

xxhash discussed 62 bit problem here: Cyan4973/xxHash#236

from wyhash.

wangyi-fudan avatar wangyi-fudan commented on July 18, 2024

A 63 bit PRNG is not of low quality.
Suppose you want to generate random number in [0,N]. when N is less than 2^63, wyrand works perfectly for you.
It just not of 64 bits capacity, but 63 bit.

from wyhash.

vigna avatar vigna commented on July 18, 2024

You are clearly not understanding what I'm saying. But it is easy to show that your last claim is false.

So extracting numbers less than 2^63, either upper or lower, will give too many collisions still.

It's nothing dramaticβ€”a 64-bit generator with 64 bits of states can either emit all possible values (like SplitMix64), and it will fail because there are no collisions, or it can decide to emit only a subset of all possible 64-bit integers (which someone might consider a problem, BTW), in which case it has a chance of passing a collision test of this kind.

Your generator however does not pass the test, even restricting to 63 bits. It's more than 11TiB of data, so it's a quite tough test. Users should be aware of the problem, though.

from wyhash.

avaneev avatar avaneev commented on July 18, 2024

Hi! I'm a competing developer and your reference to "collision tests" in PRNGs is interesting. What is meant by collision test in case of PRNG? What is the theoretical basis for measuring PRNGs for "collisions"? In general, PRNG tests are a bit of "statistical forgery" IMO, because no accepted "uniform randomness" theory exists.

from wyhash.

avaneev avatar avaneev commented on July 18, 2024

On the other hand, authors of xxhash have found out that wyhash has collision problems as is, so this may be related to your "collision test". Anyway, I'm interested to know theoretical basis for the test.

from wyhash.

vigna avatar vigna commented on July 18, 2024

If you're a competing developer I strongly suggest you read the parts about randomness in Knuth's "The Art of Computer Programming". Besides a theory of uniform randomness (the one you claim does not exist), the first practical test he suggests is the collision test. It is one of the most important tests for randomness. You can also read about the collision test in the "long guide" of the TestU01 framework for random testing.

from wyhash.

avaneev avatar avaneev commented on July 18, 2024

@vigna Please tell me the page numbers where the "theory" is provided. "The Art of Computer Programming" has many volumes and pages. Uniformity does assume absence of collisions, but I'm interested to know bounds of this test. You seem to treat the bounds rather freely.

from wyhash.

avaneev avatar avaneev commented on July 18, 2024

@vigna I've found this excerpt from Knuth's book: https://www.informit.com/articles/article.aspx?p=2221790 Specifically: "There are several ways to formulate decent abstract definitions of randomness". Which one is the prevailing "theory"? Knuth's own text only deals with "uniform probability" of drawing a specific number. But tests like PractRand are much more complex than that definition.

from wyhash.

avaneev avatar avaneev commented on July 18, 2024

@vigna For example, Knuth states "Thus, if we are choosing a million digits at random and if the first 999,999 of them happen to come out to be zero, the chance that the final digit is zero is..." This is a fundamentally flawed proposition, because PRNG can't produce a train of 999,999 zeroes - such sequence won't pass statistical tests. There are obvious bounds to uniform randomness exist. Knuth's source of mistake stems from basic linear "uniform probability of drawing a number" consideration.

from wyhash.

vigna avatar vigna commented on July 18, 2024

Well I guess you should write him and explain him his big mistake. πŸ€·πŸ»β€β™‚οΈ

from wyhash.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.