Comments (14)
Hi,
It is already known that wyhash and wyrand fails 64-bit collision test with large number of items. See xxhash page and flyingmutant page.
The xxhash page concludes that wyhash is of 62-bit strength against collisions. wyrand should be similar, say about 62 bit strength.
The BigCrush tests are testing 32 bit PRNG, so wyrand did pass that.
In one word, we should claim only 62 bit strength.
from wyhash.
Note that testing of hash functions and PRNGs is very different. The meaning of "collision" is entirely different in the two settings.
I'm sorry but looking around I can find only tests about hash functions, not tests about the PRNG. Could you please give me a precise reference (a link) to a PRNG statistical collision test applied to WyRand?
BigCrush's parameters were designed 20 years ago. They are absolutely insufficient for any modern PRNG.
Also, I do not understand what you mean by 62-bit strength in the context of PRNGs. That the upper 62 bits have the right number of collisions? The lower 62 bits?
from wyhash.
wyrand passed BigCrush and Practrand from this repo: https://github.com/lemire/testingRNG
The 62 or 63 bits strength instead of ideal 64 bits strength may due to some mathematical limitation hidden in MUM function. It is not the head/tail bits but "steals one bits from all possible bits".
So the take away message is wyhash and wyrand still works well in practice with 62/63 bits claim.
from wyhash.
Yes, but I asked you a reference to a failed test because you said in your first answer that it is known that it fails collisions tests. PractRand performs no collision tests. BigCrush only performs very small-scale collision test. So these tests have nothing to do with the test failures I've sent you. It is perfectly normal that PractRand and BigCrush gave you no problems. But having a good PRNG goes beyond that. WyRand fails mid-size collisions tests.
You also did not explain what it means "62 bits" for a PRNG. That might have a meaning for hash function, but it does not have any meaning for a PRNG.
So the take away message is that WyRand fails a basic statistical test on about 2TB of data. It should not be used as a PRNG except when very low statistical quality is acceptable.
from wyhash.
flyingmutant did birthday test. It is here, but not openable in China: https://gist.github.com/flyingmutant/cb69e96872023f9f580868e746d1128a
xxhash discussed 62 bit problem here: Cyan4973/xxHash#236
from wyhash.
A 63 bit PRNG is not of low quality.
Suppose you want to generate random number in [0,N]. when N is less than 2^63, wyrand works perfectly for you.
It just not of 64 bits capacity, but 63 bit.
from wyhash.
You are clearly not understanding what I'm saying. But it is easy to show that your last claim is false.
- The same collision test on the lower 63 bits gives p=9.05405e-44 wyrand-coll-1-9223372036854775808-1-8000000000-200-0.txt.
- The same collision test using 2^63 cells, that is, using only the upper 63 bits, gives p=1.98245e-49
wyrand-coll-0-9223372036854775808-1-8000000000-200-0.txt.
So extracting numbers less than 2^63, either upper or lower, will give too many collisions still.
It's nothing dramaticβa 64-bit generator with 64 bits of states can either emit all possible values (like SplitMix64), and it will fail because there are no collisions, or it can decide to emit only a subset of all possible 64-bit integers (which someone might consider a problem, BTW), in which case it has a chance of passing a collision test of this kind.
Your generator however does not pass the test, even restricting to 63 bits. It's more than 11TiB of data, so it's a quite tough test. Users should be aware of the problem, though.
from wyhash.
Hi! I'm a competing developer and your reference to "collision tests" in PRNGs is interesting. What is meant by collision test in case of PRNG? What is the theoretical basis for measuring PRNGs for "collisions"? In general, PRNG tests are a bit of "statistical forgery" IMO, because no accepted "uniform randomness" theory exists.
from wyhash.
On the other hand, authors of xxhash have found out that wyhash has collision problems as is, so this may be related to your "collision test". Anyway, I'm interested to know theoretical basis for the test.
from wyhash.
If you're a competing developer I strongly suggest you read the parts about randomness in Knuth's "The Art of Computer Programming". Besides a theory of uniform randomness (the one you claim does not exist), the first practical test he suggests is the collision test. It is one of the most important tests for randomness. You can also read about the collision test in the "long guide" of the TestU01 framework for random testing.
from wyhash.
@vigna Please tell me the page numbers where the "theory" is provided. "The Art of Computer Programming" has many volumes and pages. Uniformity does assume absence of collisions, but I'm interested to know bounds of this test. You seem to treat the bounds rather freely.
from wyhash.
@vigna I've found this excerpt from Knuth's book: https://www.informit.com/articles/article.aspx?p=2221790 Specifically: "There are several ways to formulate decent abstract definitions of randomness". Which one is the prevailing "theory"? Knuth's own text only deals with "uniform probability" of drawing a specific number. But tests like PractRand are much more complex than that definition.
from wyhash.
@vigna For example, Knuth states "Thus, if we are choosing a million digits at random and if the first 999,999 of them happen to come out to be zero, the chance that the final digit is zero is..." This is a fundamentally flawed proposition, because PRNG can't produce a train of 999,999 zeroes - such sequence won't pass statistical tests. There are obvious bounds to uniform randomness exist. Knuth's source of mistake stems from basic linear "uniform probability of drawing a number" consideration.
from wyhash.
Well I guess you should write him and explain him his big mistake. π€·π»ββοΈ
from wyhash.
Related Issues (20)
- Version final 4 cannot be applied to data streams HOT 1
- v4 has even more bad seeds HOT 5
- Benchmark not measuring what you expect
- How do you use practrand? HOT 2
- License issue
- Using wyhash64 to mix numbers into prngs? HOT 5
- New release for wy_hash_final4?
- Streaming hash HOT 1
- `make_secret` but for strings or other data
- Link to absl's wyhash implement seems to be changed. HOT 1
- WyRand64 (bit reversed) fails PractRand at 32TB HOT 3
- Question about wymum HOT 1
- Full round for every multiple 48-bytes, including last one HOT 1
- Secret seeds and primes HOT 4
- sprp and is_prime should be static inline? HOT 1
- c-string optimized version?
- wy2u0k returns [1, k] instead of [0, k) when WYHASH_CONDOM=2
- wyhash election started HOT 1
- ζ―ε¦ε―θ½η»ε½ζ°ζ΄ε€ηη΅ HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from wyhash.