Poor numeric domain coverage when using default arbitraries about jqwik HOT 7 CLOSED

jqwik-team commented on July 20, 2024

Poor numeric domain coverage when using default arbitraries

from jqwik.

Comments (7)

jlink commented on July 20, 2024

I chose a different path. Random number generation now uses the whole domain between the allowed min and max values. However, the full domain is divided into several partitions so that lower numbers are generated with a higher probability. The cutoff point is determined by Math.max(tries / 2, 10).

@rkraneis I tried with your original example and evenNumbersAreEvenAndSmall is now successfully falsified.
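
To make the idea concrete, here is a minimal sketch of biased generation over the whole domain. It is not the actual jqwik code: the class name, the two-partition layout and the 50/50 weighting are assumptions made for illustration. Only the cutoff formula Math.max(tries / 2, 10) comes from the comment above.

```java
import java.util.Random;

public class PartitionedIntSketch {

    // Cutoff formula quoted in the comment above.
    static int cutoff(int tries) {
        return Math.max(tries / 2, 10);
    }

    // Hypothetical generator: half of the draws come from the small
    // partition [-cutoff, cutoff] (clipped to [min, max]), the rest from
    // the whole range. The 50/50 split is an assumption.
    static int next(Random random, int min, int max, int tries) {
        int cut = cutoff(tries);
        int lo = Math.max(min, -cut);
        int hi = Math.min(max, cut);
        if (lo <= hi && random.nextBoolean()) {
            return uniform(random, lo, hi);
        }
        return uniform(random, min, max);
    }

    // Approximately uniform draw from the inclusive range [min, max].
    // The span is computed as a long so that it cannot overflow an int.
    static int uniform(Random random, int min, int max) {
        long span = (long) max - (long) min + 1L;
        return (int) (min + (long) (random.nextDouble() * span));
    }
}
```

With such a scheme, small numbers still show up in a large share of the tries, while values far from zero remain reachable, which is what a property that only fails for large inputs needs.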

jlink commented on July 20, 2024

Documenting the range in which random values are generated is surely necessary. I'll put that on top of my todo list.

There is one fundamental question, though, for which I haven't found a good answer yet: How do you determine a good domain range for numeric types? There are two opposing forces:

Making the range small leads to higher coverage in the (supposedly more common) area of smaller numbers, but misses out on higher numbers.
Making the range large leads to lower coverage in smaller numbers.

The PBT tools whose implementations I've looked at so far all cut off the range at some value. They just determine that value with different formulas. I haven't found a compelling rationale for any of these formulas yet.

Do you have a suggestion for a formula that's objectively better than (tries/2 - 3)? To be frank, I've already forgotten why I added the "-3"...
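
The two opposing forces can be put into numbers with a rough back-of-the-envelope sketch. It assumes the cutoff c = tries/2 - 3 defines a symmetric range [-c, c] and that values are drawn uniformly with replacement. Both are assumptions for illustration, and the class and method names are made up. For n equally likely values, the expected fraction hit at least once in the given number of tries is 1 - (1 - 1/n)^tries.

```java
public class CoverageSketch {

    // Number of values in the assumed symmetric range [-c, c],
    // with c = tries / 2 - 3 as in the formula quoted above.
    static long rangeSize(int tries) {
        long c = tries / 2 - 3;
        return 2L * c + 1L;
    }

    // Expected fraction of n equally likely values that are hit at least
    // once by `tries` uniform draws: 1 - (1 - 1/n)^tries.
    // log1p/expm1 keep the result accurate when n is very large.
    static double expectedCoverage(long n, int tries) {
        return -Math.expm1(tries * Math.log1p(-1.0 / n));
    }
}
```

Comparing expectedCoverage(rangeSize(1000), 1000) with expectedCoverage(1L << 32, 1000) shows the trade-off: the cut-off range gets covered to a large degree, while the same number of tries spread over the full int domain touches only a vanishing fraction of it.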

What I think would be a real improvement is choosing a different probability distribution. But that is complicated enough that I'd think twice about whether the effort is really worth it.

rkraneis commented on July 20, 2024

Yes, I fully agree that this is not an easy topic (which is why I also proposed just documenting it). Naïvely, I would think that logarithmic coverage, explicitly including <number>.MIN_VALUE, -1, 0, 1 and <number>.MAX_VALUE, might give better all-purpose coverage. But you are right, the other frameworks I looked at (VavrTest, JunitQuickcheck, QuickTheories and scalacheck) use a linear distribution between <number>.MIN_VALUE and <number>.MAX_VALUE. Only scalacheck also includes -1, 0 and 1.
FWIW my motivation for a logarithmic coverage would be Benford's Law. But that might as well just be opinion ...
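
One possible reading of "logarithmic coverage", sketched for long values: pick the bit length of the magnitude uniformly, so every order of magnitude is equally likely, and mix in the explicit edge cases with some probability. All names and the edgeCaseProbability parameter are hypothetical. This is not how any of the mentioned frameworks generate numbers.

```java
import java.util.Random;

public class LogUniformLongSketch {

    // Edge cases named in the comment above, for the long type.
    static final long[] EDGE_CASES = {Long.MIN_VALUE, -1L, 0L, 1L, Long.MAX_VALUE};

    // Hypothetical generator: with probability edgeCaseProbability return
    // an edge case, otherwise a value whose magnitude has a uniformly
    // chosen bit length between 1 and 63.
    static long next(Random random, double edgeCaseProbability) {
        if (random.nextDouble() < edgeCaseProbability) {
            return EDGE_CASES[random.nextInt(EDGE_CASES.length)];
        }
        int bits = 1 + random.nextInt(63);         // bit length in 1..63
        long top = 1L << (bits - 1);               // highest set bit of the magnitude
        long low = random.nextLong() & (top - 1);  // random lower bits
        long magnitude = top | low;                // in [2^(bits-1), 2^bits - 1]
        return random.nextBoolean() ? magnitude : -magnitude;
    }
}
```

With this scheme roughly half of the generated values fit into an int, whereas a uniform draw over all longs would almost never produce one.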

jlink commented on July 20, 2024

FWIW: Integral numbers already include 0, 1, -1, MIN, MAX explicitly. Decimal/floating point numbers also include a few more. You don't see them in your example b/c you filter them out.

A logarithmic coverage wouldn't have succeeded in falsifying your property either, would it? Depending on the log base, that is...

rkraneis commented on July 20, 2024

Good catch, I did not follow the code all the way through :-). The drop-off of the logarithmic distribution is actually already much too steep (log2(Long.MAX_VALUE) ≈ 63 XD). A linear (uniform) distribution would have caught it, as shown in the initial question.

rkraneis commented on July 20, 2024

And I have to correct myself regarding scalacheck, as this actually seems to be biased towards the lower end when given explicit bounds.

jlink commented on July 20, 2024

Change available in version 0.8.5-SNAPSHOT
