I'm testing this version of WyRand: <div class="snippet-clipboard-content notransl

wyrand passed BigCrush and Practrand from this repo: <a href="https://github.com/lemir

WyRand fails 64-bit 1-dimensional collision tests about wyhash HOT 14 CLOSED

vigna commented on July 18, 2024

WyRand fails 64-bit 1-dimensional collision tests

from wyhash.

Comments (14)

wangyi-fudan commented on July 18, 2024

Hi,
It is already known that wyhash and wyrand fails 64-bit collision test with large number of items. See xxhash page and flyingmutant page.
The xxhash page concludes that wyhash is of 62-bit strength against collisions. wyrand should be similar, say about 62 bit strength.
The BigCrush tests are testing 32 bit PRNG, so wyrand did pass that.
In one word, we should claim only 62 bit strength.

from wyhash.

vigna commented on July 18, 2024

Note that testing of hash functions and PRNGs is very different. The meaning of "collision" is entirely different in the two settings.

I'm sorry but looking around I can find only tests about hash functions, not tests about the PRNG. Could you please give me a precise reference (a link) to a PRNG statistical collision test applied to WyRand?

BigCrush's parameters were designed 20 years ago. They are absolutely insufficient for any modern PRNG.

Also, I do not understand what you mean by 62-bit strength in the context of PRNGs. That the upper 62 bits have the right number of collisions? The lower 62 bits?

from wyhash.

wangyi-fudan commented on July 18, 2024

wyrand passed BigCrush and Practrand from this repo: https://github.com/lemire/testingRNG
The 62 or 63 bits strength instead of ideal 64 bits strength may due to some mathematical limitation hidden in MUM function. It is not the head/tail bits but "steals one bits from all possible bits".
So the take away message is wyhash and wyrand still works well in practice with 62/63 bits claim.

from wyhash.

vigna commented on July 18, 2024

Yes, but I asked you a reference to a failed test because you said in your first answer that it is known that it fails collisions tests. PractRand performs no collision tests. BigCrush only performs very small-scale collision test. So these tests have nothing to do with the test failures I've sent you. It is perfectly normal that PractRand and BigCrush gave you no problems. But having a good PRNG goes beyond that. WyRand fails mid-size collisions tests.

You also did not explain what it means "62 bits" for a PRNG. That might have a meaning for hash function, but it does not have any meaning for a PRNG.

So the take away message is that WyRand fails a basic statistical test on about 2TB of data. It should not be used as a PRNG except when very low statistical quality is acceptable.

from wyhash.

wangyi-fudan commented on July 18, 2024

flyingmutant did birthday test. It is here, but not openable in China: https://gist.github.com/flyingmutant/cb69e96872023f9f580868e746d1128a

xxhash discussed 62 bit problem here: Cyan4973/xxHash#236

from wyhash.

wangyi-fudan commented on July 18, 2024

A 63 bit PRNG is not of low quality.
Suppose you want to generate random number in [0,N]. when N is less than 2^63, wyrand works perfectly for you.
It just not of 64 bits capacity, but 63 bit.

from wyhash.

vigna commented on July 18, 2024

You are clearly not understanding what I'm saying. But it is easy to show that your last claim is false.

The same collision test on the lower 63 bits gives p=9.05405e-44 wyrand-coll-1-9223372036854775808-1-8000000000-200-0.txt.
The same collision test using 2^63 cells, that is, using only the upper 63 bits, gives p=1.98245e-49
wyrand-coll-0-9223372036854775808-1-8000000000-200-0.txt.

So extracting numbers less than 2^63, either upper or lower, will give too many collisions still.

It's nothing dramatic—a 64-bit generator with 64 bits of states can either emit all possible values (like SplitMix64), and it will fail because there are no collisions, or it can decide to emit only a subset of all possible 64-bit integers (which someone might consider a problem, BTW), in which case it has a chance of passing a collision test of this kind.

Your generator however does not pass the test, even restricting to 63 bits. It's more than 11TiB of data, so it's a quite tough test. Users should be aware of the problem, though.

from wyhash.

avaneev commented on July 18, 2024

Hi! I'm a competing developer and your reference to "collision tests" in PRNGs is interesting. What is meant by collision test in case of PRNG? What is the theoretical basis for measuring PRNGs for "collisions"? In general, PRNG tests are a bit of "statistical forgery" IMO, because no accepted "uniform randomness" theory exists.

from wyhash.

avaneev commented on July 18, 2024

On the other hand, authors of xxhash have found out that wyhash has collision problems as is, so this may be related to your "collision test". Anyway, I'm interested to know theoretical basis for the test.

from wyhash.

vigna commented on July 18, 2024

If you're a competing developer I strongly suggest you read the parts about randomness in Knuth's "The Art of Computer Programming". Besides a theory of uniform randomness (the one you claim does not exist), the first practical test he suggests is the collision test. It is one of the most important tests for randomness. You can also read about the collision test in the "long guide" of the TestU01 framework for random testing.

from wyhash.

avaneev commented on July 18, 2024

@vigna Please tell me the page numbers where the "theory" is provided. "The Art of Computer Programming" has many volumes and pages. Uniformity does assume absence of collisions, but I'm interested to know bounds of this test. You seem to treat the bounds rather freely.

from wyhash.

avaneev commented on July 18, 2024

@vigna I've found this excerpt from Knuth's book: https://www.informit.com/articles/article.aspx?p=2221790 Specifically: "There are several ways to formulate decent abstract definitions of randomness". Which one is the prevailing "theory"? Knuth's own text only deals with "uniform probability" of drawing a specific number. But tests like PractRand are much more complex than that definition.

from wyhash.

avaneev commented on July 18, 2024

@vigna For example, Knuth states "Thus, if we are choosing a million digits at random and if the first 999,999 of them happen to come out to be zero, the chance that the final digit is zero is..." This is a fundamentally flawed proposition, because PRNG can't produce a train of 999,999 zeroes - such sequence won't pass statistical tests. There are obvious bounds to uniform randomness exist. Knuth's source of mistake stems from basic linear "uniform probability of drawing a number" consideration.

from wyhash.

vigna commented on July 18, 2024

Well I guess you should write him and explain him his big mistake. 🤷🏻‍♂️

from wyhash.

WyRand fails 64-bit 1-dimensional collision tests about wyhash HOT 14 CLOSED

Comments (14)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent