vincenthz / hs-gauge
Lean Haskell Benchmarking
License: BSD 2-Clause "Simplified" License
@vincenthz In https://github.com/composewell/streaming-benchmarks I am using the as-yet-unreleased quick-mode CSV dump feature to quickly generate graphs that otherwise take a lot of time. It may be useful for other users of these benchmarks as well; we can make a patch version release to make it available.
Background:
iters :: Maybe Int64
However, it seems this option used to exist in criterion (initially introduced as --no-measurments here) and is also available in the bench CLI tool, but in a way that, if selected, produces no statistics.
Is that the case in gauge as well? [no measure produced]
As a comparison, hyperfine has a --max-runs and also a --runs option, but imposes a minimum of two runs so it can compute a stdev.
Thanks in advance for your help on this.
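For context, a minimal sketch of how the iters field above can be set from user code (the benchmark itself is illustrative):

import Data.Int (Int64)
import Gauge

-- Request a fixed iteration count through the iters field instead of
-- letting gauge decide.
main :: IO ()
main = defaultMainWith
         (defaultConfig { iters = Just (1 :: Int64) })
         [ bench "succ" $ whnf (succ :: Int -> Int) 0 ]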
Currently there's only instance Monoid Outliers. Without a Semigroup instance it's impossible to build gauge using ghc-8.4.1 (Line 109 in 69d302b).
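For reference, a minimal sketch of the missing instance, assuming Outliers is a record of five counters combined pointwise (the constructor shape mirrors criterion's definition and is an assumption about gauge's):

-- Combine two Outliers summaries field by field.
addOutliers :: Outliers -> Outliers -> Outliers
addOutliers (Outliers s a b c d) (Outliers t w x y z) =
    Outliers (s + t) (a + w) (b + x) (c + y) (d + z)

instance Semigroup Outliers where
  (<>) = addOutliers

instance Monoid Outliers where
  mempty  = Outliers 0 0 0 0 0
  mappend = (<>)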
For example, we do not see iteration 24 in the following output:
compose/all-out-filters/streamly,22,0.366724004,804929524,0.366298,0.0,0.0,3547136,0,0,0,111,2111865480,2046,50080,0.358753,0.358753,3.284e-3,4.393824e-3
compose/all-out-filters/streamly,23,0.371523978,815465102,0.371458,0.0,0.0,3547136,0,0,0,81,2207859448,2139,52280,0.364126,0.364126,3.261e-3,4.404201e-3
compose/all-out-filters/streamly,25,0.410539736,901101540,0.410004,0.0,0.0,3547136,0,0,0,172,2399847288,2325,56680,0.401229,0.401229,3.807e-3,5.039309e-3
compose/all-out-filters/streamly,26,0.421200579,924501238,0.420976,0.0,0.0,3547136,0,0,0,111,2495841144,2418,58880,0.41244,0.41244,3.652e-3,4.978051e-3
For the average it makes sense to have better precision than what can be measured at the system level, but for some others (min, max) it's not useful to report anything more precise than nanoseconds.
It would be more accurate to do these calculations in fixed point at the 10 or 100 picosecond level instead of relying on Double doing the right thing with rounding. Some calculations might still require a Double conversion, but some (min, max, ...) don't.
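A minimal sketch of the idea, assuming an integral fixed-point representation at picosecond resolution (the names are illustrative, not gauge's API):

import Data.Word (Word64)

newtype PicoSeconds = PicoSeconds Word64
  deriving (Eq, Ord, Show)

-- min and max need no Double conversion: Ord on the integral value is exact.
minSample :: [PicoSeconds] -> PicoSeconds
minSample = minimum

-- The mean is one place where converting to Double is still reasonable.
meanSample :: [PicoSeconds] -> Double
meanSample xs =
    sum [ fromIntegral p | PicoSeconds p <- xs ] / fromIntegral (length xs)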
Otherwise it leads to a crash:
analysing with 1000 resamples
bootstrapping with 2 samples
benchmarks: ./Data/Vector/Generic.hs:245 ((!)): index out of bounds (-9223372036854775808,1000)
CallStack (from HasCallStack):
error, called at ./Data/Vector/Internal/Check.hs:87:5 in vector-0.12.0.1-JlawpRjIcMJIYPJVsWriIA:Data.Vector.Internal.Check
Benchmark benchmarks: ERROR
We can add more counters to the microbenchmarking from the linux perf_event counters. The perf list command shows the counters, some of them are:
List of pre-defined events (to be used in -e):
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
cache-references [Hardware event]
cache-misses [Hardware event]
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
ref-cycles [Hardware event]
cpu-clock [Software event]
task-clock [Software event]
page-faults OR faults [Software event]
context-switches OR cs [Software event]
cpu-migrations OR migrations [Software event]
minor-faults [Software event]
major-faults [Software event]
alignment-faults [Software event]
emulation-faults [Software event]
dummy [Software event]
...
I have used many of these in the past to analyse and improve the performance of a C/Haskell program, especially instructions, cache-misses and branch-misses. We can add some of the most useful ones to start with and then add more later. We are already using the perf_event counters for rdtsc on linux, so it should be pretty easy to just add a few more counters.
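To sketch what the Haskell side could look like, assuming a small C shim around perf_event_open(2) next to the existing rdtsc cbits (the gauge_perf_open/gauge_perf_read names are hypothetical):

{-# LANGUAGE ForeignFunctionInterface #-}

import Data.Word (Word64)
import Foreign.C.Types (CInt(..))

-- Hypothetical C shim: open a counter for a given event, read its value.
foreign import ccall unsafe "gauge_perf_open"
  c_perfOpen :: CInt -> IO CInt

foreign import ccall unsafe "gauge_perf_read"
  c_perfRead :: CInt -> IO Word64

-- Measure a counter (e.g. instructions retired) around an IO action.
withCounter :: CInt -> IO a -> IO (a, Word64)
withCounter evt act = do
  fd     <- c_perfOpen evt
  before <- c_perfRead fd
  r      <- act
  after  <- c_perfRead fd
  return (r, after - before)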
PR #3 introduced the --measure-with option for isolated measurement. This option requires the user to specify the path of the gauge executable itself so that it can be invoked again to run benchmarks in isolation. This is inconvenient for the user; we can find the path automatically in most cases if not all.
On Unices the exact path of the executable, as it was invoked, can be figured out and we can use that. On Windows that is not possible; only the name of the executable can be determined. However, if the gauge executable is in PATH we can try to find an executable of that name in the PATH. If we still cannot find it, we can ask the user to use --measure-with to specify the path.
For automatic determination we can use an --isolate command line switch, and in the same vein the --measure-with option can be renamed to --isolate-with to keep the two options intuitively related.
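A minimal sketch of that fallback policy, using getExecutablePath and findExecutable from base and directory (the policy is this proposal, not gauge's current behaviour):

import Control.Exception (SomeException, try)
import System.Directory (findExecutable)
import System.Environment (getExecutablePath, getProgName)

-- Prefer the exact path of the running executable; if that cannot be
-- determined, fall back to searching PATH by program name.
findSelf :: IO (Maybe FilePath)
findSelf = do
  r <- try getExecutablePath :: IO (Either SomeException FilePath)
  case r of
    Right p -> return (Just p)
    Left _  -> getProgName >>= findExecutable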
I am seeing this quite often:
benchmarking constantSlowConsumer/asyncly ... Adaptive: Creating an optional valid value using the optional tag
CallStack (from HasCallStack):
error, called at ./Gauge/Optional.hs:58:25 in gauge-0.2.1-HPl6LVZXpCZ5TsFkdrilsb:Gauge.Optional
toOptional, called at ./Gauge/Measurement.hs:478:44 in gauge-0.2.1-HPl6LVZXpCZ5TsFkdrilsb:Gauge.Measurement
Here are the places in the stack trace:
471 applyRUStatistics end start m
472 | RUsage.supported = m { measUtime = Optional.toOptional $ diffTV RUsage.userCpuTime
473 , measStime = Optional.toOptional $ diffTV RUsage.systemCpuTime
474 , measMaxrss = Optional.toOptional $ RUsage.maxResidentSetSize end
475 , measMinflt = Optional.toOptional $ diff RUsage.minorFault
476 , measMajflt = Optional.toOptional $ diff RUsage.majorFault
477 , measNvcsw = Optional.toOptional $ diff RUsage.nVoluntaryContextSwitch
478 , measNivcsw = Optional.toOptional $ diff RUsage.nInvoluntaryContextSwitch
479 }
55 -- | Create an optional value from a-
56 toOptional :: (HasCallStack, OptionalTag a) => a -> Optional a
57 toOptional v
58 | isOptionalTag v = error "Creating an optional valid value using the optional tag"
59 | otherwise = Optional v
Just noticed the comment on #15. Cycles is broken on linux and osx, and presumably windows.
On linux the fddev is not properly initialized, leading to an unreported invalid handle, and then an unreported read error, with the cycle value being the uninitialized variable.
On osx, the missing <Rts.h> means the CPP check for i386 and x86_64 returns False, and thus the backup function is used.
I haven't checked, but the same thing presumably happens on windows (given the same missing include).
For some odd reason, I cannot get benchmark results tables to show when using a modified Config. When I run with defaultMain [...] it gives me the results table as it finishes each group.
However, when I customize the config with a fixed iteration count like this:
fixedIters :: Config
fixedIters = defaultConfig { iters = Just (6000 :: Int64) }
main = do
...
defaultMainWith fixedIters [queryRenderGroup, dbManipulationsGroup]
I don't get any benchmark results shown in the terminal at all; it just tells me it has finished benchmarking:
Benchmark benchmarks: FINISH
I wonder if this is just a problem with my setup or if I'm doing something wrong here? The modified config also has displayMode = StatsTable. I'm running on WIN10.
Here's a repo to test it out with: https://github.com/tuomohopia/squeal
Enter the squeal-postgresql folder and run stack bench.
I'd like a simple terminal comparison output a la the graphical output of Criterion. How about if I submit a PR with something like the below if you pass --bars?
bench1 ██████░░░░░░ 41.65 ns
bench2 ████████████ 163.9 ns
bench3 ██░░░░░░░░░░ 21.25 ns
I built gauge on a MacBook Air with an Apple M1 chip and it raised this error (Line 55 in 496a8b9).
I ran:
❯ cabal install --lib gauge
and got this output:
Resolving dependencies...
Build profile: -w ghc-8.10.7 -O1
In order, the following will be built (use -v for more details):
- gauge-0.2.5 (lib) (requires build)
Starting gauge-0.2.5 (lib)
Building gauge-0.2.5 (lib)
Failed to build gauge-0.2.5.
Build log ( /Users/***/.cabal/logs/ghc-8.10.7/gg-0.2.5-a59d4712.log ):
Configuring library for gauge-0.2.5..
Preprocessing library for gauge-0.2.5..
Building library for gauge-0.2.5..
[ 1 of 43] Compiling Gauge.CSV ( Gauge/CSV.hs, dist/build/Gauge/CSV.o, dist/build/Gauge/CSV.dyn_o )
[ 2 of 43] Compiling Gauge.ListMap ( Gauge/ListMap.hs, dist/build/Gauge/ListMap.o, dist/build/Gauge/ListMap.dyn_o )
[ 3 of 43] Compiling Gauge.Optional ( Gauge/Optional.hs, dist/build/Gauge/Optional.o, dist/build/Gauge/Optional.dyn_o )
[ 4 of 43] Compiling Gauge.Source.Time ( dist/build/Gauge/Source/Time.hs, dist/build/Gauge/Source/Time.o, dist/build/Gauge/Source/Time.dyn_o )
[ 5 of 43] Compiling Gauge.Time ( Gauge/Time.hs, dist/build/Gauge/Time.o, dist/build/Gauge/Time.dyn_o )
[ 6 of 43] Compiling Gauge.Source.RUsage ( dist/build/Gauge/Source/RUsage.hs, dist/build/Gauge/Source/RUsage.o, dist/build/Gauge/Source/RUsage.dyn_o )
[ 7 of 43] Compiling Gauge.Source.GC ( Gauge/Source/GC.hs, dist/build/Gauge/Source/GC.o, dist/build/Gauge/Source/GC.dyn_o )
[ 8 of 43] Compiling Gauge.Measurement ( Gauge/Measurement.hs, dist/build/Gauge/Measurement.o, dist/build/Gauge/Measurement.dyn_o )
[ 9 of 43] Compiling Gauge.Format ( Gauge/Format.hs, dist/build/Gauge/Format.o, dist/build/Gauge/Format.dyn_o )
[10 of 43] Compiling Numeric.MathFunctions.Comparison ( math-functions/Numeric/MathFunctions/Comparison.hs, dist/build/Numeric/MathFunctions/Comparison.o, dist/build/Numeric/MathFunctions/Comparison.dyn_o )
[11 of 43] Compiling Numeric.MathFunctions.Constants ( math-functions/Numeric/MathFunctions/Constants.hs, dist/build/Numeric/MathFunctions/Constants.o, dist/build/Numeric/MathFunctions/Constants.dyn_o )
[12 of 43] Compiling Numeric.SpecFunctions.Internal ( math-functions/Numeric/SpecFunctions/Internal.hs, dist/build/Numeric/SpecFunctions/Internal.o, dist/build/Numeric/SpecFunctions/Internal.dyn_o )
[13 of 43] Compiling Numeric.SpecFunctions ( math-functions/Numeric/SpecFunctions.hs, dist/build/Numeric/SpecFunctions.o, dist/build/Numeric/SpecFunctions.dyn_o )
[14 of 43] Compiling Numeric.Sum ( math-functions/Numeric/Sum.hs, dist/build/Numeric/Sum.o, dist/build/Numeric/Sum.dyn_o )
[15 of 43] Compiling Paths_gauge ( dist/build/autogen/Paths_gauge.hs, dist/build/Paths_gauge.o, dist/build/Paths_gauge.dyn_o )
[16 of 43] Compiling Gauge.Main.Options ( Gauge/Main/Options.hs, dist/build/Gauge/Main/Options.o, dist/build/Gauge/Main/Options.dyn_o )
[17 of 43] Compiling Statistics.Distribution ( statistics/Statistics/Distribution.hs, dist/build/Statistics/Distribution.o, dist/build/Statistics/Distribution.dyn_o )
[18 of 43] Compiling Statistics.Function ( statistics/Statistics/Function.hs, dist/build/Statistics/Function.o, dist/build/Statistics/Function.dyn_o )
[19 of 43] Compiling Statistics.Internal ( statistics/Statistics/Internal.hs, dist/build/Statistics/Internal.o, dist/build/Statistics/Internal.dyn_o )
[20 of 43] Compiling Statistics.Distribution.Normal ( statistics/Statistics/Distribution/Normal.hs, dist/build/Statistics/Distribution/Normal.o, dist/build/Statistics/Distribution/Normal.dyn_o )
[21 of 43] Compiling Statistics.Math.RootFinding ( statistics/Statistics/Math/RootFinding.hs, dist/build/Statistics/Math/RootFinding.o, dist/build/Statistics/Math/RootFinding.dyn_o )
[22 of 43] Compiling Statistics.Matrix.Types ( statistics/Statistics/Matrix/Types.hs, dist/build/Statistics/Matrix/Types.o, dist/build/Statistics/Matrix/Types.dyn_o )
[23 of 43] Compiling Statistics.Matrix.Mutable ( statistics/Statistics/Matrix/Mutable.hs, dist/build/Statistics/Matrix/Mutable.o, dist/build/Statistics/Matrix/Mutable.dyn_o )
[24 of 43] Compiling Statistics.Quantile ( statistics/Statistics/Quantile.hs, dist/build/Statistics/Quantile.o, dist/build/Statistics/Quantile.dyn_o )
[25 of 43] Compiling Statistics.Sample.Histogram ( statistics/Statistics/Sample/Histogram.hs, dist/build/Statistics/Sample/Histogram.o, dist/build/Statistics/Sample/Histogram.dyn_o )
[26 of 43] Compiling Statistics.Sample.Internal ( statistics/Statistics/Sample/Internal.hs, dist/build/Statistics/Sample/Internal.o, dist/build/Statistics/Sample/Internal.dyn_o )
[27 of 43] Compiling Statistics.Sample ( statistics/Statistics/Sample.hs, dist/build/Statistics/Sample.o, dist/build/Statistics/Sample.dyn_o )
[28 of 43] Compiling Statistics.Matrix ( statistics/Statistics/Matrix.hs, dist/build/Statistics/Matrix.o, dist/build/Statistics/Matrix.dyn_o )
[29 of 43] Compiling Statistics.Matrix.Algorithms ( statistics/Statistics/Matrix/Algorithms.hs, dist/build/Statistics/Matrix/Algorithms.o, dist/build/Statistics/Matrix/Algorithms.dyn_o )
[30 of 43] Compiling Statistics.Transform ( statistics/Statistics/Transform.hs, dist/build/Statistics/Transform.o, dist/build/Statistics/Transform.dyn_o )
[31 of 43] Compiling Statistics.Sample.KernelDensity ( statistics/Statistics/Sample/KernelDensity.hs, dist/build/Statistics/Sample/KernelDensity.o, dist/build/Statistics/Sample/KernelDensity.dyn_o )
[32 of 43] Compiling Statistics.Types.Internal ( statistics/Statistics/Types/Internal.hs, dist/build/Statistics/Types/Internal.o, dist/build/Statistics/Types/Internal.dyn_o )
[33 of 43] Compiling Statistics.Types ( statistics/Statistics/Types.hs, dist/build/Statistics/Types.o, dist/build/Statistics/Types.dyn_o )
[34 of 43] Compiling System.Random.MWC ( mwc-random/System/Random/MWC.hs, dist/build/System/Random/MWC.o, dist/build/System/Random/MWC.dyn_o )
[35 of 43] Compiling Statistics.Resampling ( statistics/Statistics/Resampling.hs, dist/build/Statistics/Resampling.o, dist/build/Statistics/Resampling.dyn_o )
[36 of 43] Compiling Statistics.Resampling.Bootstrap ( statistics/Statistics/Resampling/Bootstrap.hs, dist/build/Statistics/Resampling/Bootstrap.o, dist/build/Statistics/Resampling/Bootstrap.dyn_o )
[37 of 43] Compiling Statistics.Regression ( statistics/Statistics/Regression.hs, dist/build/Statistics/Regression.o, dist/build/Statistics/Regression.dyn_o )
[38 of 43] Compiling Gauge.Monad ( Gauge/Monad.hs, dist/build/Gauge/Monad.o, dist/build/Gauge/Monad.dyn_o )
[39 of 43] Compiling Gauge.IO.Printf ( Gauge/IO/Printf.hs, dist/build/Gauge/IO/Printf.o, dist/build/Gauge/IO/Printf.dyn_o )
[40 of 43] Compiling Gauge.Benchmark ( Gauge/Benchmark.hs, dist/build/Gauge/Benchmark.o, dist/build/Gauge/Benchmark.dyn_o )
[41 of 43] Compiling Gauge.Analysis ( Gauge/Analysis.hs, dist/build/Gauge/Analysis.o, dist/build/Gauge/Analysis.dyn_o )
[42 of 43] Compiling Gauge.Main ( Gauge/Main.hs, dist/build/Gauge/Main.o, dist/build/Gauge/Main.dyn_o )
[43 of 43] Compiling Gauge ( Gauge.hs, dist/build/Gauge.o, dist/build/Gauge.dyn_o )
cbits/cycles.c:55:2: error:
error: Unsupported OS/architecture/compiler!
|
55 | #error Unsupported OS/architecture/compiler!
| ^
#error Unsupported OS/architecture/compiler!
^
1 error generated.
`gcc' failed in phase `C Compiler'. (Exit code: 1)
cabal: Failed to build gauge-0.2.5. See the build log above for details.
I specifically need the fix in #85; it is becoming very painful for me to always switch to a local copy in stack.yaml before I benchmark. There have been no breaking changes since the last release, though a few APIs (nfAppIO and friends) have been added, so a minor version bump may be enough.
The bootstrapBCA in the statistics package has a bug causing a vector index out of bounds error; see haskell/statistics#150. It needs to be fixed here as well. Copying the statistics package into gauge may not be a good idea from a maintenance perspective.
It took me some time to find out that I can select benchmarks on the command line (my-benchmark-exe -m pattern foobar).
I suggest showing some typical ways of calling an executable that uses defaultMain, right at https://hackage.haskell.org/package/gauge-0.2.4/docs/Gauge-Main.html#v:defaultMain , and maybe even at the top page https://hackage.haskell.org/package/gauge-0.2.4 , with some text like:
An executable that uses defaultMain is called from the command line as
executable <options> <arguments>
where the options are described at https://hackage.haskell.org/package/gauge/docs/src/Gauge.Main.Options.html#opts , and each argument is a benchmark name (or pattern).
The current documentation/landing page addresses people who want to switch from criterion (and wastes a lot of space by listing the libraries that are avoided), but my use case is that I want to evaluate gauge (or recommend it to my students) without prior knowledge of other frameworks.
(NB: I find criterion's defaultMain equally underdocumented.)
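Something like the following would be the kind of snippet to show (a minimal, self-contained example; the benchmark names and fib are illustrative, echoing criterion's tutorial):

import Gauge.Main (bench, defaultMain, whnf)

fib :: Int -> Int
fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

main :: IO ()
main = defaultMain
  [ bench "fib/10" $ whnf fib 10
  , bench "fib/20" $ whnf fib 20
  ]

Built as an executable, it would then be run as e.g. ./bench to run everything, or ./bench -m pattern fib/10 to select by pattern.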
@vincenthz we have been hitting the issue described in #92. Could you please upload the new version to hackage?
Big numbers are difficult to read without separators. What is the best way to format them? I found https://hackage.haskell.org/package/format-numbers on hackage for this purpose, which is a tiny package. Should we put a dependency for this or just copy over the code, or are there better options in base or something we already depend on? Or write our own?
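If we write our own, a minimal sketch of comma grouping (illustrative; format-numbers would replace this if we take the dependency):

import Data.List (intercalate)

-- Insert a separator every three digits: 1234567 -> "1,234,567".
groupThousands :: Integer -> String
groupThousands n
  | n < 0     = '-' : groupThousands (negate n)
  | otherwise = reverse . intercalate "," . chunksOf3 . reverse . show $ n
  where
    chunksOf3 [] = []
    chunksOf3 xs = take 3 xs : chunksOf3 (drop 3 xs)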
Refer to this line of code. Reproduced below for easy reference:
benchIO name f n = bench name $ nfIO $ f n >>= return
If I remove the bind in this operation and just keep f n, some of the streaming libraries (conduit for example) being benchmarked by the streaming-benchmarks package show ridiculously low results (in nanoseconds for a million operations). To reproduce, just remove the bind on that line and run this benchmark with and without it:
./run.sh elimination/toList/conduit
Clearly something is wrong here. Perhaps the function just gets optimized out because the compiler thinks the value is not being used? Or something else? I did not investigate it much; I am hoping that someone knows what's going on and can figure this out quickly.
Without knowing this, the results will remain unpredictable, and benchmarking is of little use unless it is predictable. If the bind is really required, then we can perhaps do it inside a wrapper (see the sketch below) so that the results are always predictable for users.
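A sketch of such a wrapper, keeping the bind inside the helper so every call site gets the same behaviour (benchIO is the streaming-benchmarks helper shown above, not gauge API):

import Control.DeepSeq (NFData)
import Gauge (Benchmark, bench, nfIO)

-- Keep the (>>=) inside the wrapper so callers cannot accidentally
-- hand GHC an expression it can float out and share.
benchIO :: NFData b => String -> (a -> IO b) -> a -> Benchmark
benchIO name f n = bench name $ nfIO (f n >>= \x -> return x)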
Instead of defaulting to max 0 .. for different values, it would be a good idea to just throw away this call and retry the whole gathering.
sanity: benchmarking fib/fib 10 ... FAIL
Exception: Creating an optional valid value using the optional tag
CallStack (from HasCallStack):
error, called at ./Gauge/Optional.hs:55:25 in gauge-0.2.0-1DreTMaUhTv9BugFF6fY9:Gauge.Optional
1 out of 1 tests failed (5.11s)
This happened on a slow machine, 32-bit Intel, Archlinux32.
I'm confused why the current bound on base is >= 4.7 (indicating compatibility with GHC 7.8) when support for GHC 7.8 was dropped in 4adcd00. Wouldn't it have been easier to just bump the base bounds?
I've sadly lost my build logs, but when I tried to build with lts-2.22 I ran into a bad import from math-functions and missing imports of pure.
Currently the cabal file claims that you're testing with GHC-7.8, so you should probably do that. :)
Nice package BTW. Looks very useful for benchmarking dependencies of criterion! :)
After commit c52432c, cycles are not reported on mac os x. I see this in time-osx.c:
tr->rdtsc = 0;
Older code used this for mac (cycles.c):
#if x86_64_HOST_ARCH || i386_HOST_ARCH
StgWord64 gauge_rdtsc(void)
{
StgWord32 hi, lo;
__asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
return ((StgWord64) lo) | (((StgWord64) hi)<<32);
}
@vincenthz is this deliberate or pending?
Each round of measurement in a benchmark could optionally be dumped in CSV format:
name-bench,clock-time-diff,cpu-time-diff,cycles-diff,gc-diff...
...
It should make measurement/analysis debugging easier, plus it could be loaded or transformed without re-running the measurement.
I haven't actually run Gauge yet, but I assume the output looks similar to Criterion's. One thing that always trips me up is comparing timings with different units. How do 123 ns, 234 μs, and 345 ms compare? It would be nice if Gauge always output the same unit (probably ns).
To make sure we have safer bindings, use hsc2hs for getrusage and all other foreign bindings/unmarshalling.
It seems the benchmark results are being printed on stderr. Shouldn't they be printed on stdout instead?
Ideally utime + stime = cpuTime. On Linux this holds approximately (though I would like it to be more accurate there as well; I have seen significant aberration), but on Mac it doesn't hold at all. In fact, on Mac I observed that stime is always 0. Sometimes even utime is zero while cpuTime is a significant value, for example:
cpuTime 175.8 ms
utime 0.0 s
stime 0.0 s
@vincenthz I saw some commits from you fixing cycles stuff, while your L1 cache is hot, can you take a look what's going on here. It seems there is still something wrong with measuring time on Mac. I tested at commit 6164dc4 .
On Linux there is some irregularity. I know that we measure cpuTime and utime/stime using different methods and at different points, but even then I do not expect an aberration of milliseconds. I am seeing an aberration of up to 2-3 ms, which is quite significant:
cpuTime 182.0 ms
utime 180.0 ms
stime 4.000 ms
...
cpuTime 166.6 ms
utime 156.0 ms
stime 8.000 ms
...
cpuTime 52.83 ms
utime 56.00 ms
stime 0.0 s
I also observed that utime and stime never have a non-zero fractional part (in milliseconds); maybe the aberration is due to this loss of precision, and that may be why it is always up to 2 ms.
In most cases I have not found the statistical regression very useful. The result from a single sample with a sufficient number of iterations is almost always good enough; I am yet to see an example where the regression is really useful. This fancy stuff takes more CPU and generates its own load, impacting the benchmarks and taking more time. In fact, it is also doing wrong computations, which have gone unnoticed for a long time, perhaps because of the complexity. We can use this mode when we really want that kind of rigor or have reason to believe it will be useful.
For day-to-day or minute-to-minute runs we can use a fast mode which keeps it simple and stupid: do a bit of warmup, measure one sample quickly, and report the results. We can use a --quick option to enable this. This will save time and we can rely on the simplicity.
When generating results, it would be nice to provide a --save baseline flag that would produce e.g.
bench1 ██████░░░░░░ 41.65 ns
bench2 ████████████ 163.9 ns
bench3 ██░░░░░░░░░░ 21.25 ns
Saved to: baseline
And write the benchmark results to a file in the current directory, e.g. .gauge/saves/.
And then you could re-run (like when you run a test suite again with --seed) with --compare baseline --save inline and you would get:
and you would get:
bench1
baseline: ██████░░░░░░ 41.65 ns
inline: ████░░░░░░░░ 32.65 ns
bench2
baseline: ████████████ 163.9 ns
inline: █████████░░░ 113.9 ns
bench3
baseline: ██░░░░░░░░░░ 21.25 ns
inline: ██░░░░░░░░░░ 20.22 ns
Saved to: inline
This would be a great way to optimize things and experiment with ideas. Other tooling could also use it; e.g. Emacs could automate this. The --compare flag could be passed many times to compare many benchmark results at once.
Edit: Changed issue title, somehow missed that HTML support isn't available.
So I was going to go through http://www.serpentine.com/criterion/tutorial.html to see if all the instructions there also work without modification for Gauge, and discovered that --output=file.html doesn't appear to work (or I'm holding it wrong). The benchmark completes successfully but doesn't write an output file.
Create Fibber.hs with the content of the code block under the Getting Started section, but replace import Criterion.Main with import Gauge. Then:
ghc -O --make Fibber
./Fibber --output=fibber.html
Checking the help options (./Fibber --help), it appears that --output is a valid option.
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.4.4
$ ./Fibber --help
Microbenchmark suite - built with gauge 0.2.4
<snip>
-o FILE --output=FILE File to write report to
<snip>
Given the amount of effort the original library went to computing kernel density estimates et al., wouldn't the time field be better to put into the --short version of the output rather than the mean?
AfC
I reported this against criterion; I am reporting it here as well. I have tested it using gauge and the same problem persists, as expected.
I have the following benchmarks in a group:
bgroup "map"
[ bench "machines" $ whnf drainM (M.mapping (+1))
, bench "streaming" $ whnf drainS (S.map (+1))
, bench "pipes" $ whnf drainP (P.map (+1))
, bench "conduit" $ whnf drainC (C.map (+1))
, bench "list-transformer" $ whnf drainL (lift . return . (+1))
]
The last two benchmarks take significantly more time when I run all these benchmarks in one go using stack bench --benchmark-arguments "-m glob ops/map/*".
$ stack bench --benchmark-arguments "-m glob ops/map/*"
benchmarking ops/map/machines
time 30.23 ms (29.22 ms .. 31.04 ms)
benchmarking ops/map/streaming
time 17.91 ms (17.48 ms .. 18.37 ms)
benchmarking ops/map/pipes
time 29.30 ms (28.12 ms .. 30.03 ms)
benchmarking ops/map/conduit
time 36.69 ms (35.73 ms .. 37.58 ms)
benchmarking ops/map/list-transformer
time 84.06 ms (75.02 ms .. 90.34 ms)
However when I run individual benchmarks the results are different:
$ stack bench --benchmark-arguments "-m glob ops/map/conduit"
benchmarking ops/map/conduit
time 31.64 ms (31.30 ms .. 31.86 ms)
$ stack bench --benchmark-arguments "-m glob ops/map/list-transformer"
benchmarking ops/map/list-transformer
time 68.67 ms (66.84 ms .. 70.96 ms)
To reproduce the issue just run those commands in this repo. The repo works with gauge.
I cannot figure out what the problem is here. I tried using "env" to run the benchmarks and putting a "threadDelay" for a few seconds and a "performGC" in it, but nothing helps.
I am now resorting to always running each benchmark individually in a separate process. Maybe we can have support for running each benchmark in a separate process in criterion itself to guarantee isolation of benchmarks, as I have seen this sort of problem too often. Now I am always skeptical of the results produced by criterion.
Now that Basement.Terminal.ANSI has colors, it's probably a good idea to add some simple (& hopefully tasteful) coloring to the output.
I was expecting the default match to be exact and got bitten by this. I had two benchmark names such that one was a prefix of the other. When running in isolated mode, passing the benchmark name as an argument to run that particular benchmark in isolation, I always got the results for the longer name, and therefore the results of both benchmarks were the same. I spent precious hours debugging what was going on.
In the verbose mode, the number of iterations is also printed as an average. This should not be an average, it should be an absolute number. It does not make sense to print the average of iterations.
benchmarked elimination/toNull/streamly
time 14.56 ms (13.28 ms .. 15.77 ms)
0.997 R² (0.990 R² .. 1.000 R²)
mean 14.66 ms (14.24 ms .. 15.06 ms)
std dev 571.9 μs (406.1 μs .. 712.1 μs)
variance introduced by outliers: 14% (moderately inflated)
iters 4 (1 .. 6)
time 14.66 ms (13.98 ms .. 15.34 ms)
Notice iters is printed as 4 (1 .. 6): there are 6 samples in total and we print 4 in the average field.
The relevant code to fix this is in analyseBenchmark:
_ <- traverse
(\(k, (a, s, _)) -> reportStat Verbose a s k)
measureAccessors_
We should make an exception for the iters accessor and print it differently.
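A sketch of that special case (reportStat and measureAccessors_ are the gauge internals shown above; reportIterations is a hypothetical helper that would print the absolute count):

_ <- traverse
  (\(k, (a, s, _)) ->
      if k == "iters"
        then reportIterations a      -- hypothetical: print the absolute count
        else reportStat Verbose a s k)
  measureAccessors_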
It would be nice to have the ability to select specific counters a la perf stat -e on Linux, for two reasons: one, if we support HW PMC measurements, the processor allows only a limited number of counters at a time; two, measuring one counter may impact the value of others, though I am not sure how important this is.
The Gauge.Benchmark.whnf doc says "Apply an argument to a function, and evaluate the result to weak head normal form (WHNF)." What it does not say is whether the time needed to evaluate the argument is included in the measurement or not.
I guess it is not, and I take the following experiment as confirmation:
Prelude Gauge.Main> benchmark $ whnf id (sum . enumFromTo 0 $ 1000)
benchmarking function ... took 16.17 s, total 60947101 iterations
function time 23.96 ns
Prelude Gauge.Main> benchmark $ whnf (sum . enumFromTo 0) 1000
benchmarking function ... took 6.913 s, total 20152 iterations
function time 47.58 μs
But what exactly is the semantics: is the argument (value) simply shared, so that the cost of its computation is accounted for by the first function call? Or is the argument forced (to WHNF?) before that? (That would be better?)
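One way to take the question out of gauge's hands is to force the argument up front. A minimal sketch using Control.Exception.evaluate (whether whnf itself should do this is exactly the question above):

import Control.Exception (evaluate)
import Gauge.Main (benchmark, whnf)

main :: IO ()
main = do
  -- Force the argument to WHNF before benchmarking, so its cost is
  -- excluded from the measurement regardless of sharing.
  arg <- evaluate (sum . enumFromTo 0 $ (1000 :: Int))
  benchmark $ whnf id arg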
It shouldn't really happen, but some people are seeing negative time values in reports, e.g.:
benchmarked xxx
time 136.9 ms (-340.9 ms .. 511.1 ms)
0.063 R² (0.000 R² .. 0.999 R²)
mean 1.010 s (425.0 ms .. 3.335 s)
std dev 1.833 s (12.24 ms .. 3.048 s)
Each counter has the following properties: a measureAccessor (key) to retrieve the value from the Measured record. It may be a good idea to have a Counter typeclass to represent all these in a better way than the ad hoc way we use as of now. The MeasureDiff class is better, but it is not sufficient as it represents only the diff operation.
In addition to the operations listed above, we can also have a measurement source attached to each counter, and we can batch multiple counters having the same source together to measure them in one go and then retrieve them from the common source. This can help bunch counters dynamically if we allow users to select counters at the command line.
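A minimal sketch of what such a class could look like (Measured here is a stand-in reduced to two fields; every name apart from Measured and MeasureDiff's role is illustrative):

-- Stand-in for gauge's Measured record, reduced to two fields.
data Measured = Measured { measTime :: Double, measCycles :: Int }

class Counter a where
  counterName :: a -> String     -- key used in reports/CSV
  readCounter :: Measured -> a   -- accessor into the record
  diffCounter :: a -> a -> a     -- end-minus-start, like MeasureDiff

newtype Cycles = Cycles Int

instance Counter Cycles where
  counterName _ = "cycles"
  readCounter   = Cycles . measCycles
  diffCounter (Cycles e) (Cycles s) = Cycles (e - s)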
The Linux perf tool has a nice readable output format and we can adopt the same; at least for the quick mode it directly applies:
interceptor:~$ perf stat ls
Performance counter stats for 'ls':
1.265128 task-clock (msec) # 0.703 CPUs utilized
4 context-switches # 0.003 M/sec
0 cpu-migrations # 0.000 K/sec
97 page-faults # 0.077 M/sec
2,006,759 cycles # 1.586 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1,332,533 instructions # 0.66 insns per cycle
273,542 branches # 216.217 M/sec
<not counted> branch-misses
0.001799526 seconds time elapsed
I want to run my benchmarks as quickly as possible, just to check that they work. So I'm using --time-limit=0 to get only a single sample and --min-duration=0 to get only a single iteration per sample. Nonetheless, gauge reports multiple iterations:
benchmarking Issue #108/Text ... took 341.6 ms, total 56 iterations
benchmarked Issue #108/Text
time 6.434 ms (5.934 ms .. 6.876 ms)
0.994 R² (0.990 R² .. 1.000 R²)
mean 6.017 ms (5.917 ms .. 6.214 ms)
std dev 232.3 μs (131.1 μs .. 337.9 μs)
The header should print the name of the benchmark as well.
iters,time,cycles,cpuTime,utime,stime,maxrss,minflt,majflt,nvcsw,nivcsw,allocated,numGcs,bytesCopied,mutatorWallSeconds,mutatorCpuSeconds,gcWallSeconds,gcCpuSeconds
elimination/toNull/streamly,1,1.6136415e-2,35418132,1.6138e-2,0.0,0.0,3706880,0,0,0,3,95989920,93,3304,1.5653e-2,1.5653e-2,2.62e-4,3.24074e-4
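Presumably something like the following, with a leading column matching the first field of each row (the exact column name is an illustration):
name,iters,time,cycles,cpuTime,utime,stime,maxrss,minflt,majflt,nvcsw,nivcsw,allocated,numGcs,bytesCopied,mutatorWallSeconds,mutatorCpuSeconds,gcWallSeconds,gcCpuSeconds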
Basement.Terminal has access to the utf8 codepage on windows through the initialize function.
I cannot get my head around this:
stime = measure (measTime . rescale) .
G.filter ((>= threshold) . measTime) . G.map fixTime .
G.tail $ meas
fixTime m = m { measTime = measTime m - overhead / 2 }
Why are we fixing the measurement time using the overhead? The overhead is the amount of time measure itself takes; how is that relevant? measTime has no relation to that. Maybe I am missing something, but it looks to me like overhead and its use here can just be removed.
@vincenthz Commit 3f03bc9 had fixed this problem, but this seems to have been nullified by the merge of the new getrusage FFI changes.
It may be a good idea to use a command based CLI interface using a command for logical sets of tasks. For example, I can imagine the following commands (with their own options not shown here):
$ gauge list # List all available benchmarks
$ gauge evlist # List all available event counters
$ gauge stats # Run selected benchmarks and measure selected counter stats
$ gauge report # Read the raw perf data file and generate a report (text, html, ...)
$ gauge diff # Show a nice diff between two raw perf data files
Where possible we can also take cues from the Linux perf tool.
Can this be released to hackage? I want to switch to this for benchmarking suites in most of my libraries (since I never need criterion's plots), but not having a hackage release is a showstopper.