package-benchmark's Issues

Measure benchmark overhead on both Linux and macOS and fix any bottlenecks if excessive

We should validate the overhead of the time not spent in the actual benchmark code (capturing OS and malloc statistics, sampling), see what the magnitude is, and optimise if needed. There are some cheap wins (especially on Linux, where we could cache some open files etc.) that could be done if needed. The first step is to put in probes and measure the overhead, i.e. the delta between total wall clock runtime and measured wall clock runtime.
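
A minimal sketch of the probe idea (not the package's actual instrumentation; runOneIteration and the iteration count are placeholders): compare the total wall clock time of a run against the wall clock time accumulated inside the measured closure.

import Dispatch

// Placeholder workload standing in for the code under measurement.
func runOneIteration() { _ = (0 ..< 1_000).reduce(0, +) }

let iterations = 10_000
let totalStart = DispatchTime.now().uptimeNanoseconds

var measuredNanoseconds: UInt64 = 0
for _ in 0 ..< iterations {
    let start = DispatchTime.now().uptimeNanoseconds
    runOneIteration()
    measuredNanoseconds += DispatchTime.now().uptimeNanoseconds - start
}

// Everything not spent in runOneIteration() is harness overhead (sampling, stats capture etc).
let totalNanoseconds = DispatchTime.now().uptimeNanoseconds - totalStart
let overheadNanoseconds = totalNanoseconds - measuredNanoseconds
print("harness overhead: \(overheadNanoseconds) ns over \(iterations) iterations")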

Remove all usage of fflush

Instead:

// Use unbuffered stdout to help detect exactly which test was running in the event of a crash.
setbuf(stdout, nil)

Simplify boilerplate and get rid of dynamicReplacement if possible

As suggested, maybe this can help us kill @dynamicReplacement:

Have you tried calling the same methods as Argument Parser does, but just from your library? E.g. you could provide a protocol called Benchmark that has a main method that looks like this:

  public static func main() async {
    // Setup everything that benchmark needs
    do {
      var command = try parseAsRoot()
      if var asyncCommand = command as? AsyncParsableCommand {
        try await asyncCommand.run()
      } else {
        try command.run()
      }
    } catch {
      exit(withError: error)
    }

    // Record benchmark results etc.
  }

Since all methods are public in Argument Parser you should be able to replicate anything the package is doing in your own main method. Maybe I am missing something here though!

OS and malloc capturing should only be enabled when requested

Currently we capture OS and malloc stats for any benchmark run; we should avoid that overhead for benchmark runs that don't request those stats. This is especially important for malloc stats, which turned out to have the most significant overhead.
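
A hypothetical sketch of the gating (placeholder types, not the package's implementation): only start the expensive samplers when the benchmark's configuration requests the related metrics.

// Placeholder metric and sampler types for illustration only.
enum Metric: Hashable {
    case wallClock, mallocCountTotal, peakMemoryResident
}

struct MallocStatsSampler {
    func start() { /* hook malloc statistics collection here */ }
    func stop() { /* read back the counters here */ }
}

func run(requestedMetrics: Set<Metric>, body: () -> Void) {
    let wantsMallocStats = requestedMetrics.contains(.mallocCountTotal)
    let sampler = MallocStatsSampler()

    if wantsMallocStats { sampler.start() } // skip the capture overhead otherwise
    body()
    if wantsMallocStats { sampler.stop() }
}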

consistency of CPU time units

Exploring benchmarks, I tried setting
Benchmark.defaultConfiguration.timeUnits = .microseconds. It does exactly what it says for the metric Time (wall clock), but doesn't appear to affect Time (system CPU) (μs), Time (total CPU) (μs), or Time (user CPU) (μs). And in some cases, I'm seeing the Time (system CPU) come back in ns regardless of what I've chosen.

I suspect it'll be a little easier to track if you either nail it down explicitly across the board (sometimes forcing a "round to 0" result), or just let the units float with whatever comes back from the metric-gathering system and its relevant range.

feature request: text-based output without all the pretty around it

I was applying package-benchmark to a comparison effort between a number of different existing libraries and external packages, to compare how fast they did their work. When I was dumping the output, I found that I wanted to sort and view the results as an ordered list, and then go grab the data and knock together some simple graphs comparing the different libraries. The TextTable output, which looks so great, was completely in the way, and I ended up having to do quite a lot of text editing to get the data into a spreadsheet, which I then used to re-order things, make charts, etc.

(I also found that TextTable was clipping some names, as I was getting pretty descriptive there)

The request is for a text-based output that fits this use more directly - something that can be easily copied and pasted, or a plain .csv output that you could open with the spreadsheet of your choice.

I did look at the tsv format output, but didn't quite know what it was and how to interpret it - and it wasn't in any sort of obvious columnar form - so I don't think that applies.

Add support for absolute thresholds

Currently relative (delta) and absolute (delta) thresholds are supported; we should optionally also be able to use thresholds as relative (absolute) and absolute (absolute) in addition.

Absolute thresholds should be scaled

Currently an absolute threshold is compared in the current unit of measurement, e.g. p25 = 10 for resident memory size would mean 10 MB if resident memory size is measured in M, but 10 KB if the current scale is K. It should be changed so that the absolute threshold is expressed in the unscaled unit (bytes in this case).

Add new command for generating executableTarget and source boilerplate for new benchmarks

Something like

swift package benchmark init MyBenchmark

  1. creates the Benchmarks/MyBenchmark directory

  2. creates the Benchmarks/MyBenchmark/MyBenchmark.swift source file with the boilerplate:

import Benchmark
import Foundation

let benchmarks = {
    Benchmark.defaultConfiguration = .init(scalingFactor: .kilo)

    Benchmark("SomeBenchmark") { benchmark in
        for _ in benchmark.scaledIterations {
            blackHole(Date()) // replace this line with your own benchmark
        }
    }
}
  3. generates this to standard output (preferably so we can pipe it to pbcopy):
        // MyBenchmark benchmark target
        .executableTarget(
            name: "MyBenchmark",
            dependencies: [
                .product(name: "Benchmark", package: "package-benchmark"),
                .product(name: "BenchmarkPlugin", package: "package-benchmark"),
            ],
            path: "Benchmarks/MyBenchmark"
        ),

This can then just be copy-pasted into Package.swift and we're ready to go.

Cleanup benchmark.throughputScalingFactor

It would be nice to have a function on the enum that returns a range for easier iteration, instead of the typical

for _ in 0..<benchmark.throughputScalingFactor.rawValue
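
A sketch of the suggested convenience (scaledIterations is the proposed name, not an existing API; rawValue is assumed to be Int, as used above):

import Benchmark

extension Benchmark {
    // Proposed helper: a range covering one scaled batch of iterations.
    var scaledIterations: Range<Int> {
        0 ..< throughputScalingFactor.rawValue
    }
}

// Call sites could then read:
// for _ in benchmark.scaledIterations { ... }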

no benchmark output from `swift package benchmark`

I'm sorry to be popping open a bunch of issues here - if there's a better place to ask for guidance, I'm totally game.

I tried adding a simple benchmark setup to a library I'm working on, set up as an external package that imports mine with a local reference. That all appears to work correctly: swift package resolve is happy, swift build works, and swift package benchmark appears to work as well - except that there's no benchmark data visible.

The output I get from the command is:

Building for debugging...
Build complete! (0.35s)
Building targets in release mode for benchmark run...
Build complete! Running benchmarks...

If I run swift package benchmark list, it's the exact same output.

(this is with package-benchmark 0.8.0, and swift 5.8 (installed from Xcode 14.3 beta) on macOS with an M1 processor)
swift -version:

swift-driver version: 1.75.1 Apple Swift version 5.8 (swiftlang-5.8.0.117.11 clang-1403.0.22.8.60)
Target: arm64-apple-macosx13.0

The source of what I've done is public if you're willing to take a look, or I'd be happy to have any guidance on how to debug what's happening. I suspect I may be using the plugin in a manner that was slightly unexpected.

The work in progress can be seen at https://github.com/heckj/CRDT/pull/34/files for a quick view of how I added things, and it should be very easy to reproduce with these steps:

git clone https://github.com/heckj/CRDT -b benchmark2
cd CRDT/ExternalBenchmarks
swift package resolve
swift package benchmark

Question: export option for full fidelity Histogram?

I was looking through the export options, spotted the JMH and .tsv exports, but one of the things I wanted to explore was taking the full-fidelity Histogram from package-histogram, loading the file back in using Histogram's Codable conformance, and exploring some local visualization with it (SwiftUI Charts, etc).

I'm not familiar with .tsv (other than an assumption that it stands for "time series values") and wanted to ask - is that a more sane path for reloading into my own Histogram instance, or would it be reasonable to extend the exports to drop out JSON from Histogram's Codable conformance? I'm reading and learning on package-histogram, but not 100% on all the pieces and parts, and what's legitimate for reproducing it by reading in a file.

I'd presumed, but not yet traced, that dumping a ton of values into even an HdrHistogram was fundamentally lossy - and that it doesn't store the entirety of all the values submitted to it. I'm happy to do the PR to enable this thing I want, but figured I'd best ask first - maybe there's an easy path that I'm overlooking.

Add support for listing stored baselines

E.g. swift package benchmark list baselines.

Should list baseline name, host machine, cpus, memory and timestamp for latest update - pick up from FS / .json as needed.

Use FilePath.DirectoryView for iteration to avoid pulling in Foundation.

`swift test` failing locally due to (FB12061292)

Likely a question more than a bug report. I checked out the repo (main branch) and installed jemalloc per the prerequisites (brew install jemalloc). After that, swift build worked fine, but swift test in the repository failed for me locally:

Building for debugging...
[10/10] Linking BenchmarkPackageTests
Build complete! (7.69s)
error: Exited with signal code 11

When I run the tests from within Xcode (currently using the 14.3 beta), the tests trap on jemalloc - je_free_default in the stack trace.
(screenshot of the Xcode stack trace attached)

Is there something additional I should be doing re: jemalloc?

I'm not familiar with using custom allocators and what the requirements are around that, so I suspect I'm missing something in my setup to support running swift test without issue.

Reduce dependencies

We can integrate a few of the dependencies into the project (e.g. BenchmarkClock isn't used anywhere else).

naming for iterations and/or duration

Working through writing a few sample benchmarks, I was exploring Benchmark.defaultConfiguration.desiredIterations, Benchmark.defaultConfiguration.desiredDuration, and how they relate to each other.

I do think the names could be improved by renaming them to maxIterations and maxDuration, respectively. I'm also thinking that as we put together the documentation, expanding on Benchmark.Configuration (and maybe the article WritingBenchmarks.md), it would be worth calling out specifically that a run continues until the first of these two limits is hit (as sketched below).
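
Hypothetical usage assuming the rename (today these are desiredIterations / desiredDuration):

import Benchmark

// Hypothetical property names, shown only to illustrate the suggested semantics.
Benchmark.defaultConfiguration.maxIterations = 10_000
Benchmark.defaultConfiguration.maxDuration = .seconds(5)
// The run would stop at whichever of the two limits is reached first.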

And just to double-check - if you specify warmupIterations on the configuration, does the warmup take place before either the iteration or duration limits start being measured?

Add convenience blackHole static func on Benchmark type

As we've got a module and a type that share the same name, we can get problems with disambiguation:

xxx.swift:119:23: error: type 'Benchmark' has no member 'blackHole'
            Benchmark.blackHole(transaction.getRetainedData())

The free function can be disambiguated using the little-known import (class|struct|func|protocol|enum) Module.Symbol syntax:

import func Benchmark.blackHole

We should either document this or add a static func on the Benchmark type.
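
If we add the convenience, it could look roughly like this (a sketch, not the package's implementation; withExtendedLifetime is just one way to keep the optimizer from discarding the value):

import Benchmark

extension Benchmark {
    // Sketch of a static convenience mirroring the free function blackHole(_:),
    // so call sites can write Benchmark.blackHole(...) without a targeted import.
    @inline(never)
    public static func blackHole<T>(_ value: T) {
        withExtendedLifetime(value) {}
    }
}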

chore: maxDuration should affect wall clock time of test run time, not of benchmark execution

Currently, maxDuration specifies the max wall clock time of the test under measurement, excluding benchmark measurement overhead. This leads to the non-intuitive situation where a very fast test, whose measurement overhead is larger than the actual benchmark runtime, ends up with a true wall clock time that is significantly longer than expected.

We should instead let maxDuration control real world wall clock time including benchmark overhead, as the main reason for setting maxDuration is that you want a known runtime for a test - the current implementation does not give that.

CLI exploration of the `swift package benchmark` command doesn't exist

I found the commands to invoke benchmarks in the online docs, but I was hoping to be able to invoke something like swift package benchmark --help to get a list of the commands and how to use them, or perhaps swift package plugin benchmark -help.

Getting the CLI arguments exposed is what I was after - the DocC compiler plugin exposes some of its help in this fashion. Is that possible here?

Pick up benchmark target names from FS

Currently there's a requirement that the executable target has a Benchmark suffix for discovery. Let's change the heuristics to instead pick up the targets that have source paths in Benchmarks/ without considering their names.

bug: fatal error while running default compare

Working on the main branch (commit: 7d6f5f9), and running through the documentation and trying out the various examples to double-check them, I found that a default baseline compare was failing:

reproduction:

swift package --allow-writing-to-package-directory benchmark baseline update
swift package benchmark baseline compare

Output of failure:

Building for debugging...
Build complete! (0.21s)
Building benchmark targets in release mode for benchmark run...
Building HistogramBenchmark
Building BenchmarkDateTime
Building Basic
Build complete!
Swift/RangeReplaceableCollection.swift:870: Fatal error: Can't remove last element from an empty collection
error: plugin process ended by an uncaught signal: 5 <command: /usr/bin/sandbox-exec -p '(version 1)
(deny default)
(import "system.sb")
(allow file-read*)
(allow process*)
(allow file-write*
    (subpath "/private/tmp")
    (subpath "/private/var/folders/8t/k6nw7pyx2qq77g8qq_g429080000gn/T")
)
(deny file-write*
    (subpath "/Users/heckj/src/package-benchmark")
)
(allow file-write*
    (subpath "/Users/heckj/src/package-benchmark/.build/plugins/Benchmark-Plugin/outputs")
    (subpath "/Users/heckj/src/package-benchmark/.build/plugins/Benchmark-Plugin/cache")
)
' /Users/heckj/src/package-benchmark/.build/plugins/Benchmark-Plugin/cache/Benchmark_Plugin>, <output:
'Building benchmark targets in release mode for benchmark run...
Building HistogramBenchmark
Building BenchmarkDateTime
Building Basic
Build complete!
Swift/RangeReplaceableCollection.swift:870: Fatal error: Can'\''t remove last element from an empty collection
'>

Workaround: if you name the baseline, there's no issue - so running swift package benchmark baseline compare default works.

Consistent support for scaling

We should rename throughputScalingFactor -> scalingFactor and add support for both scaled and unscaled output of benchmark metrics. This allows one to get e.g. the number of mallocs per actual invocation of the code under measurement, or the time spent in user CPU time for a single invocation, and not just the throughput.
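
To illustrate the scaled vs. unscaled distinction (made-up numbers, not package output): with a kilo scaling factor the measured closure performs 1 000 inner invocations per recorded iteration, so an unscaled malloc count of 3 000 corresponds to 3 mallocs per actual invocation.

let innerInvocationsPerIteration = 1_000   // e.g. scalingFactor = .kilo
let unscaledMallocCount = 3_000            // what the counters report per recorded iteration
let mallocsPerInvocation = unscaledMallocCount / innerInvocationsPerIteration // scaled: 3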

Support running benchmarks with isolation

Currently all benchmarks in the same suite are run in the same process context to avoid process start/stop overhead and run the benchmarks faster - this is fundamentally fine for most use cases (CPU, malloc count, context switches etc.), but fails for real/VM memory counters.

We should automatically run any memory size related benchmarks in isolation to get more usable numbers.

bug: tab separated values output may be broken

On the main branch, I ran through the various formats and exported them to verify all the formats worked as expected. The tsv case looks like it might well be "broken", although I don't have historical data to compare easily.

When I looked through the files written, there was only ever a single value in a long series down the list. For example, the exported file default.HistogramBenchmark.Mean.Syscalls_(total).tsv (attached for convenience) looks almost meaningless in its output with the variety of numbers, and flipping through the metrics makes me suspect it's iterating incorrectly.

If you flip through the various "Mean Syscalls" files in the attached zip, it looks like they build on each other - which doesn't seem correct.

Mean.Syscalls.zip
