
tachometer's Introduction

tachometer

tachometer is a tool for running benchmarks in web browsers. It uses repeated sampling and statistics to reliably identify even tiny differences in runtime.

Install

npm i tachometer

Usage

npx tachometer bench1.html [bench2.html ...]

Why?

Even if you run the same JavaScript, on the same browser, on the same machine, on the same day, you'll still get a different result every time. But if you take enough repeated samples and apply the right statistics, you can reliably identify even tiny differences in runtime.

Example

Let's test two approaches for adding elements to a page. First create two HTML files:

inner.html

<script type="module">
  import * as bench from '/bench.js';
  bench.start();
  for (let i = 0; i < 100; i++) {
    document.body.innerHTML += '<button></button>';
  }
  bench.stop();
</script>

append.html

<script type="module">
  import * as bench from '/bench.js';
  bench.start();
  for (let i = 0; i < 100; i++) {
    document.body.append(document.createElement('button'));
  }
  bench.stop();
</script>

Now run tachometer:

npx tachometer append.html inner.html

Tachometer opens Chrome and loads each HTML file, measuring the time between bench.start() and bench.stop(). It round-robins between the two files, running each at least 50 times.

[==============================================------------] 79/100 chrome append.html

After a few seconds, the results are ready:

┌─────────────┬─────────────────┬─────────────────┬─────────────────┐
│ Benchmark   │        Avg time │   vs inner.html │  vs append.html │
├─────────────┼─────────────────┼─────────────────┼─────────────────┤
│ inner.html  │ 7.23ms - 8.54ms │                 │          slower │
│             │                 │        -        │    851% - 1091% │
│             │                 │                 │ 6.49ms - 7.80ms │
├─────────────┼─────────────────┼─────────────────┼─────────────────┤
│ append.html │ 0.68ms - 0.79ms │          faster │                 │
│             │                 │       90% - 92% │        -        │
│             │                 │ 6.49ms - 7.80ms │                 │
└─────────────┴─────────────────┴─────────────────┴─────────────────┘

This tells us that using the document.body.append approach instead of the innerHTML approach would be between 90% and 92% faster on average. The ranges tachometer reports are 95% confidence intervals for the percent change from one benchmark to another. See Interpreting results for more information.

Features

  • Measure your own specific timings with the /bench.js module, by setting the window.tachometerResult global (or by polling an arbitrary JS expression), or measure First Contentful Paint on any local or remote URL.

  • Compare benchmarks by round-robin between two or more files, URLs, URL query string parameters, or browsers, to measure which is faster or slower, and by how much, with statistical significance.

  • Swap dependency versions of any NPM package you depend on, to compare published versions, remote GitHub branches, or local git repos.

  • Automatically sample until we have enough precision to answer the question you are asking.

  • Remote control browsers running on different machines using remote WebDriver.

Sampling

Minimum sample size

By default, a minimum of 50 samples are taken from each benchmark. You can change the minimum sample size with the --sample-size flag or the sampleSize JSON config option.
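
For example, a minimal sketch (the benchmark file name is a placeholder) that raises the minimum to 100 samples, first as a flag and then as a config file property:

npx tachometer mybench.html --sample-size=100

{
  "sampleSize": 100,
  "benchmarks": [{"url": "mybench.html"}]
}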

Auto sample

After the initial 50 samples, tachometer will continue taking samples until there is a clear statistically significant difference between all benchmarks, for up to 3 minutes.

You can change this duration with the --timeout flag or the timeout JSON config option, measured in minutes. Set --timeout=0 to disable auto sampling entirely. Set --timeout=60 to sample for up to an hour.

Auto sample conditions

You can also configure which statistical conditions tachometer should check for when deciding when to stop auto sampling by configuring auto sample conditions.

To set auto sample conditions from the command-line, use the --auto-sample-conditions flag with a comma-delimited list:

--auto-sample-conditions=0%,10%

To set auto sample conditions from a JSON config file, use the autoSampleConditions property with an array of strings (including if there is only one condition):

{
  "autoSampleConditions": ["0%", "10%"]
}

An auto sample condition can be thought of as a point of interest on the number-line of either absolute milliseconds, or relative percent change. By setting a condition, you are asking tachometer to try to shrink the confidence interval until it is unambiguously placed on one side or the other of that condition.

Example condition   Question
0%                  Is A faster or slower than B at all? (The default)
10%                 Is A faster or slower than B by at least 10%?
+10%                Is A slower than B by at least 10%?
-10%                Is A faster than B by at least 10%?
-10%, +10%          (Same as 10%)
0%, 10%, 100%       Is A at all, a little, or a lot slower or faster than B?
0.5ms               Is A faster or slower than B by at least 0.5 milliseconds?

In the following example, we have set --auto-sample-conditions=10%, meaning we are interested in knowing whether A differs from B by at least 10% in either direction. The sample size automatically increases until the confidence interval is narrow enough to place the estimated difference squarely on one side or the other of both conditions.

      <------------------------------->     n=50  X -10% X +10%
                <------------------>        n=100 ✔️ -10% X +10%
                    <----->                 n=200 ✔️ -10% ✔️ +10%

  |---------|---------|---------|---------| difference in runtime
-20%      -10%        0       +10%      +20%

n    = sample size
<--> = confidence interval for percent difference of mean runtimes
✔️    = resolved condition
X    = unresolved condition

In this example, by n=50 we are not sure whether A is faster or slower than B by more than 10%. By n=100 we have ruled out that B is faster than A by more than 10%, but we're still not sure if it's slower by more than 10%. By n=200 we have also ruled out that B is slower than A by more than 10%, so we stop sampling. Note that we still don't know which is absolutely faster; we only know that, whatever the difference is, it is smaller than 10% in either direction (and if we did want to know, we could add 0% to our conditions).

Note that, if the actual difference is very close to a condition, then it is likely that the condition will never be met, and the timeout will expire.

Measurement modes

Tachometer supports four modes of time interval measurements, controlled with the measurement config file property, or the --measure flag.

If measurement is an array, then all of the given measurements will be retrieved from each page load. Each measurement from a page is treated as its own benchmark.

A measurement can specify a name property that will be used to display its results.
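
For example, a sketch of a benchmark entry (the entry and measurement names are illustrative) that collects two named measurements from the same page load, one from a performance entry and one from the window.tachometerResult global:

{
  "benchmarks": [
    {
      "url": "my-benchmark.html",
      "measurement": [
        {
          "mode": "performance",
          "entryName": "foo",
          "name": "foo-duration"
        },
        {
          "mode": "global",
          "name": "total-time"
        }
      ]
    }
  ]
}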

Performance API

Retrieve a measure, mark, or paint timing from the performance.getEntriesByName API. Note this mode can only be used with a config file.

For example, in your benchmark:

performance.mark('foo-start');
// Do some work ...
performance.mark('foo-stop');
performance.measure('foo', 'foo-start', 'foo-stop');

And in your config file:

"benchmarks": [
  {
    "measurement": {
      "mode": "performance",
      "entryName": "foo"
    }
  }
]

The following performance entry types are supported:

  • measure: Retrieve the duration of a user-defined interval between two marks. Use for measuring the timing of a specific chunk of your code.
  • mark: Retrieve the startTime of a user-defined instant. Use for measuring the time between initial page navigation and a specific point in your code.
  • paint: Retrieve the startTime of a built-in paint measurement (e.g. first-contentful-paint).
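
For example, to retrieve the built-in first-contentful-paint entry through this mode rather than with --measure=fcp, a config sketch could look like:

"benchmarks": [
  {
    "measurement": {
      "mode": "performance",
      "entryName": "first-contentful-paint"
    }
  }
]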

Callback

By default with local (non-URL) benchmarks, or when the --measure flag is set to callback, your page is responsible for calling the start() and stop() functions from the /bench.js module. This mode is appropriate for micro benchmarks, or any other kind of situation where you want full control over the beginning and end times.

Global result

When the --measure flag is set to global, then you can assign an arbitrary millisecond result to the window.tachometerResult global. In this mode, tachometer will poll until it finds a result assigned here.

const start = performance.now();
for (let i = 0; i < 1000; i++) {}
window.tachometerResult = performance.now() - start;

This mode is appropriate when you need full control of the measured time, or when you can't use callback mode because you are not using tachometer's built-in server.

Alternatively, to poll an arbitrary JS expression in global measurement mode (rather than window.tachometerResult), set --measurement-expression to the JS expression to poll. This option is useful for scenarios where you cannot easily modify the code under test to assign to window.tachometerResult but are otherwise able to extract a measurement from the page using JavaScript.
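
For example, a hypothetical invocation (the URL and the global variable are placeholders for whatever your page actually exposes):

tach http://localhost:8080/my-app/ \
  --measure=global \
  --measurement-expression="window.__myAppMetrics && window.__myAppMetrics.renderTimeMs"

Tachometer polls this expression until it produces a result, just as it polls window.tachometerResult in the default global mode.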

First Contentful Paint (FCP)

When the --measure flag is set to fcp, or when the benchmark is an external URL, then the First Contentful Paint (FCP) time will be automatically extracted from your page using the Performance Timeline API. This interval begins at initial navigation, and ends when the browser first renders any DOM content. Currently, only Chrome supports the first-contentful-paint performance timeline entry. In this mode, calling the start() and stop() functions is not required, and has no effect.

Interpreting results

Average runtime

The first column of output is the average runtime of the benchmark. This is a 95% confidence interval for the number of milliseconds that elapsed during the benchmark. When you run only one benchmark, this is the only output.

Difference table

When you run multiple benchmarks together, you'll get an NxN table summarizing all of the differences in runtimes, both in absolute and relative terms (percent-change).

In this example screenshot we're comparing for loops, each running with a different number of iterations (1, 1000, 1001, and 3000):

This table tells us:

  • 1 iteration was between 65% and 73% faster than 1000 iterations.

  • 1000 iterations was between 179% and 263% slower than 1 iteration. Note that the difference between 1-vs-1000 and 1000-vs-1 is the choice of which runtime is used as the reference in the percent-change calculation, where the reference runtime comes from the column labeled "vs X".

  • The difference between 1000 and 1001 iterations was ambiguous. We can't tell which is faster, because the difference was too small. 1000 iterations could be as much as 13% faster, or as much as 21% slower, than 1001 iterations.

Confidence intervals

Loosely speaking, a confidence interval is a range of plausible values for a parameter like runtime, and the confidence level (which tachometer always fixes to 95%) corresponds to the degree of confidence we have that the interval contains the true value of that parameter. See Wikipedia for more information about confidence intervals.

    <------------->   Wider confidence interval
                      High variance and/or low sample size

         <--->   Narrower confidence interval
                 Low variance and/or high sample size

 |---------|---------|---------|---------|
-1%      -0.5%       0%      +0.5%      +1%

The way tachometer shrinks confidence intervals is by increasing the sample size. The central limit theorem means that, even when we have high variance data, and even when that data is not normally distributed, as we take more and more samples, we'll be able to calculate a more and more precise estimate of the true mean of the data.
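
Concretely, the 95% confidence interval for a mean is approximately the sample mean ± 1.96 × s / √n, where s is the sample standard deviation and n is the sample size, so quadrupling the number of samples roughly halves the width of the interval.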

Swap NPM dependencies

Tachometer has specialized support for swapping in custom versions of any NPM dependency in your package.json. This can be used to compare the same benchmark against one or more versions of a library it depends on.

Use the benchmarks.packageVersions JSON config property to specify the version to swap in, like this:

{
  "benchmarks": [
    {
      "name": "my-benchmark",
      "url": "my-benchmark.html",
      "packageVersions": {
        "label": "my-label",
        "dependencies": {
          "my-package": "github:MyOrg/my-repo#my-branch"
        }
      }
    }
  ]
}

The version for a dependency can be any of the following:

  • Any version range supported by NPM, including semver ranges, git repos, and local paths. See the NPM documentation for more details.

  • For monorepos, or other git repos where the package.json is not located at the root of the repository (which is required for NPM's git install function), you can use an advanced git configuration object (schema) in place of the NPM version string, e.g.:

    {
      "benchmarks": [
        {
          "name": "my-benchmark",
          "url": "my-benchmark.html",
          "packageVersions": {
            "label": "my-label",
            "dependencies": {
              "my-package": {
                "kind": "git",
                "repo": "[email protected]:MyOrg/my-repo.git",
                "ref": "my-branch",
                "subdir": "packages/my-package",
                "setupCommands": ["npm install", "npm run build"]
              }
            }
          }
        }
      ]
    }

You can also use the --package-version flag to specify a version to swap in from the command-line, with format [label=]package@version. Note that the advanced git install configuration is not supported from the command line:

tach mybench.html \
  --package-version=my-package@1.2.3 \
  --package-version=my-label=my-package@github:MyOrg/my-repo#my-branch

When you specify a dependency to swap, the following happens:

  1. The package.json file closest to your benchmark HTML file is found.

  2. A copy of this package.json, with the new dependency version swapped in, is written to the system's temp directory (use --npm-install-dir to change this location), and npm install is run in that directory.

  3. A separate server is started for each custom NPM installation, where any request for the benchmark's node_modules/ directory is served from that location.

NOTE: Tachometer will re-use NPM install directories as long as the dependencies you specified haven't changed, and the version of tachometer used to install it is the same. To always do a fresh npm install, set the --force-clean-npm-install flag.

JavaScript module imports

JavaScript module imports with bare module specifiers (e.g. import {foo} from 'mylib';) will be automatically transformed to browser-compatible path imports using Node-style module resolution (e.g. import {foo} from './node_modules/mylib/index.js';).

This feature can be disabled with the --resolve-bare-modules=false flag, or the resolveBareModules: false JSON config file property.

Browsers

Browser   Headless   FCP
chrome    yes        yes
firefox   yes        no
safari    no         no
edge      no         no
ie        no         no

Webdriver Plugins

Tachometer comes with WebDriver plugins for Chrome, Safari, Firefox, and Internet Explorer.

For Edge, follow the Microsoft WebDriver installation documentation.

If you encounter errors while driving IE, see the Required Configuration section of the WebDriver IE plugin documentation. In particular, setting "Enable Protected Mode" so that it is consistently either enabled or disabled across all security zones appears to resolve NoSuchSessionError errors.

On-demand dependencies

Tachometer will install WebDriver plugins for Chrome, Firefox and IE on-demand. The first time that Tachometer runs a benchmark in any of these browsers, it will install the appropriate plug-in via NPM or Yarn if it is not already installed.

If you wish to avoid on-demand installations like this, you can install the related packages (chromedriver, geckodriver and iedriver, respectively) ahead of time with npm install, for example:

npm install tachometer chromedriver

In the example above, Tachometer will detect the manually installed chromedriver package and will skip any attempt to install it on-demand later.

Headless

If supported by the browser, you can launch in headless mode by adding "headless": true to the browser JSON config, or by appending -headless to the browser name when using the CLI flag (e.g. --browser=chrome-headless).
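
For example, either of these is a sketch of running a benchmark in headless Chrome (the file name is a placeholder):

tach mybench.html --browser=chrome-headless

"browser": {
  "name": "chrome",
  "headless": true
}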

Binary path and arguments

WebDriver automatically finds the location of the browser binary, and launches it with a default set of arguments.

To customize the binary path (Chrome and Firefox only), use the binary property in the browser JSON config. For example, to launch Chrome Canary from its standard location on macOS:

{
  "name": "chrome",
  "binary": "/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary"
}

To pass additional arguments to the binary (Chrome and Firefox only), use the addArguments property in the browser JSON config. To remove one of the arguments that WebDriver sets by default (Chrome only), use removeArguments (see example in next section).

To configure Firefox preferences that are usually set from the about:config page, use the preferences property in the browser JSON config.
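
For example, a sketch of two browser config objects (the specific argument and preference values are only illustrative, not recommendations):

{
  "name": "chrome",
  "addArguments": ["--js-flags=--expose-gc"]
}

{
  "name": "firefox",
  "preferences": {
    "javascript.options.mem.gc_incremental": false
  }
}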

Profiles

It is normally recommended to use the default behavior, whereby a new, empty browser profile is created when the browser is launched, so that state from your personal profile (cookies, extensions, cache, etc.) does not influence benchmark results.

However, in some cases it may be useful to use an existing browser profile, for example if the webpage you are benchmarking requires being signed into an account.

In Chrome and Firefox, use the profile JSON config option to specify an existing profile to use. Other browsers do not yet support this option.

Chrome

To find your current profile location in Chrome, visit chrome://version and look for "Profile Path".

If there is an existing Chrome process using this profile, you must first terminate it. You also need to close all open tabs, or disable the "Continue where you left off" startup setting, because tachometer does not expect to find any existing tabs.

You may also need to remove the use-mock-keychain default argument if you encounter authentication problems.

For example, using the standard location of the default user profile on macOS:

{
  "benchmarks": [
    {
      "url": "mybench.html",
      "browser": {
        "name": "chrome",
        "profile": "/Users/<username>/Library/Application Support/Google/Chrome",
        "removeArguments": ["use-mock-keychain"]
      }
    }
  ]
}

Firefox

To find your current profile location in Firefox, visit about:support and look for "Profile Folder" or "Profile Directory".

Note when using the profile option in Firefox, the profile directory is copied to a temporary location.

You may encounter a no such file or directory, stat '.../lock' error, due to a bug in selenium-webdriver. Deleting this lock file should resolve the error.

For example, using the standard location of user profiles on macOS:

{
  "benchmarks": [
    {
      "url": "mybench.html",
      "browser": {
        "name": "firefox",
        "profile": "/Users/<username>/Library/Application Support/Firefox/Profiles/<profile-name>"
      }
    }
  ]
}

Performance traces

Once you determine that something is slower or faster in comparison to something else, investigating why is the natural next step. To assist in determining why, consider collecting performance traces. These traces can be used to determine what the browser is doing differently between two versions of code.

When the trace option is turned on in Chromium-based browsers, each tachometer sample will produce a JSON file that can be viewed in Chromium's about:tracing tool. Enter about:tracing in the URL bar of Chromium, click Load, and select the JSON file you want to view. Check out the about:tracing doc page to learn more about using the trace event profiling tool.

To turn on tracing with the default configuration, add trace: true to a Chromium browser's config object. This config turns on tracing with some default categories enabled and puts the JSON files into a directory called logs in your current working directory.

For example:

{
  "benchmarks": [
    {
      "name": "my-benchmark",
      "url": "my-benchmark.html",
      "browser": {
        "name": "chrome",
        "trace": true
      }
    }
  ]
}

To customize where the log files are placed or which categories of events are traced, pass an object to the trace config as demonstrated below. The categories property is a list of trace categories to collect. The logDir property is the directory where the log files are written; if it is a relative path, it is resolved against the current working directory.

{
  "benchmarks": [
    {
      "name": "my-benchmark",
      "url": "my-benchmark.html",
      "browser": {
        "name": "chrome",
        "trace": {
          "categories": ["blink", "cc", "netlog", "toplevel", "v8"],
          "logDir": "results/trace-logs"
        }
      }
    }
  ]
}

Available trace categories can be found by entering about:tracing in the URL bar of a Chromium browser. Press "Record" in the top right (1), then expand the "Edit categories" section (2) to see all of the categories available for tracing. Note that for the "Disabled by Default Categories", you must prefix the name with the string disabled-by-default- when adding it to your tachometer config. For example, to enable the disabled-by-default audio category shown below (3), specify disabled-by-default-audio in your browser.trace.categories tachometer config.

[Screenshot: about:tracing app demonstrating the steps above]
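
For example, a trace config sketch that records a disabled-by-default category alongside standard ones:

"trace": {
  "categories": ["v8", "blink", "disabled-by-default-audio"]
}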

Tracing can also be enabled via command line flags. See the table at the end of the file for details.

Remote control

Tachometer can control and benchmark browsers running on remote machines by using the Standalone Selenium Server, which supports macOS, Windows, and Linux.

This may be useful if you want to develop on one platform but benchmark on another, or if you want to use a dedicated benchmarking computer for better performance isolation.

Note you will need to know the IP address of both your local and remote machine for the setup steps below. You can typically use ipconfig on Windows, ifconfig on macOS, and ip on Linux to find these addresses. You'll need to be able to initiate connections between these machines in both directions, so if you encounter problems, it's possible that there is a firewall or NAT preventing the connection.

On the remote machine:

  1. Install a Java Development Kit (JDK) if you don't already have one.

  2. Download the latest Standalone Selenium Server .jar file from seleniumhq.org.

  3. Download the driver plugins for the browsers you intend to remote control from seleniumhq.org. Note that if you download a plugin archive file, the archive contents must be extracted and placed either in the current working directory for the next command, or in a directory that is included in your $PATH environment variable.

  4. Launch the Standalone Selenium Server.

    java -jar selenium-server-standalone-<version>.jar

On the local machine:

  1. Use the --browser flag or the browser config file property with syntax <browser>@<remote-url> to tell tachometer the IP address or hostname of the remote Standalone Selenium Server to launch the browser from. Note that 4444 is the default port, and the /wd/hub URL suffix is required.

    --browser=chrome@http://my-remote-machine:4444/wd/hub
  2. Use the --host flag to configure the network interface address that tachometer's built-in static server will listen on (unless you are only benchmarking external URLs that do not require the static server). By default, for security, tachometer listens on 127.0.0.1 and will not be accessible from the remote machine unless you change this to an IP address or hostname that will be accessible from the remote machine.

  3. If needed, use the --remote-accessible-host flag to configure the URL that the remote browser will use when making requests to your local tachometer static server. By default this will match --host, but in some network configurations it may need to be different (e.g. if the machines are separated by a NAT).
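
Putting the steps above together, a sketch of a local invocation (the hostnames and addresses are placeholders for your own network):

tach mybench.html \
  --browser=chrome@http://my-remote-machine:4444/wd/hub \
  --host=192.168.0.10

Here --remote-accessible-host is omitted because it defaults to the value of --host; set it explicitly only if the remote browser needs a different address to reach your machine (e.g. across a NAT).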

Config file

Use the --config flag to control tachometer with a JSON configuration file. Defaults are the same as the corresponding command-line flags.

All paths in a config file are relative to the path of the config file itself.

You will typically want to set root to the directory that contains your package's node_modules/ folder, so that the web server will be able to resolve bare-module imports.

For example, a file called benchmarks/foo/tachometer.json might look like this:

{
  "root": "../..",
  "sampleSize": 50,
  "timeout": 3,
  "autoSampleConditions": ["0%", "1%"],
  "benchmarks": [
    {
      "name": "foo",
      "url": "foo/bar.html?baz=123",
      "browser": {
        "name": "chrome",
        "headless": true,
        "windowSize": {
          "width": 800,
          "height": 600
        }
      },
      "measure": "fcp",
      "packageVersions": {
        "label": "my-branch",
        "dependencies": {
          "mylib": "github:Polymer/mylib#my-branch"
        }
      }
    }
  ]
}

Use the expand property in a benchmark object to recursively generate multiple variations of the same benchmark configuration. For example, to test the same benchmark file with two different browsers, you can use expand instead of duplicating the entire benchmark configuration:

{
  "benchmarks": [
    {
      "url": "foo/bar.html",
      "expand": [
        {
          "browser": "chrome"
        },
        {
          "browser": "firefox"
        }
      ]
    }
  ]
}

Which is equivalent to:

{
  "benchmarks": [
    {
      "url": "foo/bar.html",
      "browser": "chrome"
    },
    {
      "url": "foo/bar.html",
      "browser": "firefox"
    }
  ]
}

CLI usage

Run a benchmark from a local file:

tach foo.html

Compare a benchmark with different URL parameters:

tach foo.html?i=1 foo.html?i=2

Benchmark index.html in a directory:

tach foo/bar

Benchmark First Contentful Paint time of a remote URL:

tach http://example.com

Flag                       Default                  Description
--help                     false                    Show documentation
--root                     ./                       Root directory to search for benchmarks
--host                     127.0.0.1                Which host to run on
--port                     8080, 8081, ..., 0       Which port to run on (comma-delimited preference list, 0 for random)
--config                   (none)                   Path to JSON config file (details)
--package-version / -p     (none)                   Specify an NPM package version to swap in (details)
--browser / -b             chrome                   Which browsers to launch in automatic mode, comma-delimited (chrome, firefox, safari, edge, ie) (details)
--window-size              1024,768                 "width,height" in pixels of the browser windows that will be created
--sample-size / -n         50                       Minimum number of times to run each benchmark (details)
--auto-sample-conditions   0%                       The degrees of difference to try and resolve when auto-sampling ("N%" or "Nms", comma-delimited) (details)
--timeout                  3                        The maximum number of minutes to spend auto-sampling (details)
--measure                  callback                 Which time interval to measure (callback, global, fcp) (details)
--measurement-expression   window.tachometerResult  JS expression to poll on the page to retrieve the measurement result when --measure is set to global
--remote-accessible-host   matches --host           When using a browser over a remote WebDriver connection, the URL that those browsers should use to access the local tachometer server (details)
--npm-install-dir          system temp dir          Where to install custom package versions (details)
--force-clean-npm-install  false                    Always do a from-scratch NPM install when using custom package versions (details)
--csv-file                 none                     Save statistical summary to this CSV file
--csv-file-raw             none                     Save raw sample measurements to this CSV file
--json-file                none                     Save results to this JSON file
--manual                   false                    Don't run automatically, just show URLs and collect results
--trace                    false                    Enable performance tracing (details)
--trace-log-dir            ${cwd}/logs              The directory to put tracing log files
--trace-cat                default categories       The tracing categories to record, as a string of comma-separated category names

tachometer's People

Contributors

andrewiggins, aomarks, dependabot[bot], developit, justinfagnani, kevinpschaaf, lionralfs, outermeasure, rictic, victor-homyakov, westbrook

tachometer's Issues

Default measurement should always be global

Currently the default measurement is bench.start/stop when using the built-in server, and first-contentful-paint when using an external URL.

FCP is a particularly bad default, because if you implement either of the other two measurement styles in your code, FCP will still work and return some number, and it's easy to think you are measuring what you implemented instead of FCP.

Let's make the window.tachometerResult global the default in all cases, since it's the most universal (works in all browsers and with both internal/external URLs).

Also, the CLI flag is called "measure", but the JSON config file property is "measurement". These should be the same.

Support Internet Explorer

Since IE doesn't support modules, fetch, or first-contentful-paint, there are currently no measurements that could be done from IE even if we added launching support. So, this is blocked by #60 to add a new global-based API that IE benchmarks could use.

HTML files sometimes truncated

It appears that HTML files are sometimes truncated when served from the built-in static server. Could be related to koa-node-resolve. Needs more investigation.

CSV output

We often put tachometer results into a spreadsheet for sharing. A CSV output format would make this easier.

Ability to control browser window size and placement

cc @sorvell

We should also have a smaller default window size, and also maybe try to tile out the different browsers, because being visible on the desktop can affect whether the browser considers itself in the foreground or not (which can affect timer throttling etc.)

macOS integration test

Travis now supports macOS, and I was able to get it to launch Safari, but it seemed flaky.

EPIPE and ECONNRESET errors

Errors like this get logged to the terminal sometimes:

Error: write EPIPE
      at WriteWrap.afterWrite (net.js:836:14)

Error: read ECONNRESET
      at TCP.onread (net.js:660:25)

It doesn't crash the runner, and doesn't seem to affect the result, just logs. The reported case was a large site, with many files, some of which were large.

--resolve-bare-modules flag can turn off bare module resolution

If you pass the --resolve-bare-modules flag with no value, the flag gets set to the empty string, which is falsy (though we actually check === true anyway), so it turns off bare module resolution. No value should mean on, and anything other than no value, "true", or "false" should be an error.

cc discoverer @e111077

FCP gets measured even with no HTTP response

When the browser never gets an HTTP response (e.g. from a server that is down), we still get an FCP measurement, presumably from the built-in error page. We'll need to detect these cases somehow to make sure we aren't giving very misleading results.

Tachometer integration test

The Travis build for the tachometer repo should include an end-to-end test that actually launches a browser.

Bytes keep coming up as 0.00 KiB on remote server

I'm using the node main function to run the tests, and the size in bytes keeps coming up as zero. Here is an example of what I'm running:

await main([
  '$button:test-basic=test/benchmark/bench-runner.html?bench=test-basic&package=button',
  '--measure=fcp',
  '--browser=chrome@http://localhost:4444/wd/hub',
  '--sample-size=5'
])

It results in the following:

┌─────────────┬───────────────────────────────┐
│   Benchmark │ button:test-basic             │
├─────────────┼───────────────────────────────┤
│     Version │ <none>                        │
├─────────────┼───────────────────────────────┤
│     Browser │ chrome                        │
│             │ @http://localhost:4444/wd/hub │
├─────────────┼───────────────────────────────┤
│ Sample size │ 5                             │
├─────────────┼───────────────────────────────┤
│       Bytes │ 0.00 KiB                      │
└─────────────┴───────────────────────────────┘

It is important to note that I'm dynamically loading the tests. Here is an example of what I'm doing in bench-runner.html

<html>
<head>
  <link rel="shortcut icon" href="data:image/x-icon;," type="image/x-icon">
</head>
<body>
  <script type="module">
    const params = new URLSearchParams(window.location.search);
    const pack = params.get('package');
    const bench = params.get('bench');
    import(`./${pack}/${bench}.js`);
  </script>
</body>
</html>

This then loads a self-running test.

Cache and warmup

Now that we do bare module rewriting, we're spending time parsing every HTML and JS request. This could distort first-contentful-paint measurements, and increases time waiting for benchmark results in all cases.

The best solution might be to add an in-memory cache to the Koa server for all requests, and to always do a single warmup run for each benchmark configuration before starting measurements.

Use case: Polymer vs LitElement based components benchmarks

Hi, I have a question about using this tool for our purposes at Vaadin.

We would like to set up a sample app similar to shack to use in benchmark tests. But what we need is not to test against an older commit, but against an older version.

So we'd like to test a LitElement-based app against a Polymer-based app. Both apps would be implemented using exactly the same components.

Is this doable with tachometer at all (using another app as the baseline)?

@aomarks I would appreciate any advice on this.

Support a custom global measurement expression

Currently the result for a global measurement is read from window.tachometerResult.

It would be convenient to be able to specify a custom, arbitrary expression to poll to retrieve the result, which opens up flexibility for measuring pages in production without modifying the code.

Ability to filter rows and columns

Since we display an NxN matrix of benchmark results, the size of the table can get too large to fit in a normal terminal window. Also, in some cases we want to draw attention to only a subset of the results (e.g. for CI integration we usually only care about the results that compare the current PR to master and to the released version, not the other way around).

Add the ability to filter rows and/or columns of the NxN matrix.

package version install dirs should cache less aggressively

Currently, when you use the package version feature, a deterministic temporary directory is used per label, and it is always re-used. We should definitely invalidate this cache if the package.json dependencies change. Possibly we should always re-install (and we could then auto-delete the temp directory), but we need to think about whether that's required for real scenarios (since installing can be slow).

cc @bicknellr

Allow user to pause/resume sampling

Right now, if you want to stop sampling before the timeout, you can Ctrl-C, but that will kill the process without showing any results at all. And if you want to keep going after the results are in (e.g. because you want a tighter confidence interval), you have to start over entirely, which means throwing away the samples you've already collected.

Ideas:

  1. If the user sends SIGINT (Ctrl-C) at any time, then we should show the results we have so far, gracefully shut down, and exit.

  2. After the initial results come in, we should pause rather than exit, so that the user has the option to keep sampling.

  3. If the user presses [p]ause while sampling, we should show the results so far, and pause for more input.

  4. If the user presses [r]esume while paused, we should continue sampling (maybe for 60 seconds each time? maybe you can type how many minutes to go?)

  5. When you're running a non-headless browser, it might be nice to pause for a few seconds every minute or so, because if the browser is stealing focus, it can be hard/impossible to send any input to the runner terminal.

(6. This should probably be another issue for another time, but the entire UI could also be re-worked to fully control the terminal, and update a single table continuously using ANSI clear and cursor control sequences, so you'd just watch the intervals tighten up, and press [p] or quit when you want. My concern would be that it might make it tempting to mentally extrapolate a false convergence and draw a conclusion too early).

cc @justinfagnani @sorvell for comment

Move browser drivers to peer dependencies

Hi,

I am currently focusing on performance testing on Google Chrome.

In order to improve build times, it would help me to move the browser driver dependencies to peer dependencies. There is not going to be a need to test performance on any other browser but Chrome for me.

Remote benchmarking

It would be useful to be able to launch a browser on a remote machine. In particular, we would like to be able to launch Edge or IE on a remote Windows testing machine (on the same network), and point it at a tachometer server running on the local dev machine.

Windows integration test

We now have an end-to-end integration test running on Linux with Travis. We should also have one for Windows. Travis actually now has Windows support, but it is immature according to their own docs. I was not able to get it to launch Edge (I'm not even sure the images have Edge). We should probably use AppVeyor instead for now.

browser shows as [object Object] in result table

Since refactoring the internal representation of a browser from a string to an object, I forgot to update the result table output to show the name instead of trying to just print the object, which displays as [object Object].

Simpler top-level JSON configuration style for browser and measurement

It's not obvious how to use the generic "expand" feature in the JSON config file. People expect to just put the browser/measurement type at the top-level of the config file. We should support this, and maybe just remove the "expand" feature in the interest of simplicity. It's always possible to explicitly enumerate any more complex sets of benchmarks you might want.

chrome window size e2e test is flaky in Travis

Maybe 50% of the time, the end-to-end test that checks whether we can control Chrome's window size fails in Travis:

  68 passing (1m)
  1 failing
  1) e2e
       chrome-headless
         window size:
      AssertionError: expected 50000 to equal 100000
      + expected - actual
      -50000

Support Windows and Edge

We should be able to launch tachometer from Windows, and control Edge.

Note IE is tracked separately at #61 because it has an additional blocker.

Support for dropping outliers

One would think that the more samples we take, the more stable the result. However, taking more samples also means a higher chance of interference from the system (some daemon doing expensive work, cache flushes, etc.).

This feature request is about having a statistically rigorous way of dropping outliers before computing the confidence interval, so that one or two crazy measurements don't cause an "unsure" result, and adding more samples guarantees getting a more stable result.

This should be optional, not hard-coded, because outliers are not always independent from the page being tested (e.g. if a page has a 1% chance of hitting an expensive GC).

Crash on reporting results

$ tach --config tach.json 
Running benchmarks

[==========================================================] done
TypeError: Reduce of empty array with no initial value
    at Array.reduce (<anonymous>)
    at sumOf (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/stats.ts:142:15)
    at Object.summaryStats (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/stats.ts:51:15)
    at /Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:635:44
    at Array.map (<anonymous>)
    at makeResults (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:633:31)
    at automaticMode (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:673:27)
    at processTicksAndRejections (internal/process/task_queues.js:89:5)
    at Object.main (/Users/rictic/.nvm/versions/node/v12.4.0/lib/node_modules/tachometer/src/cli.ts:390:14)

Config:

{
  "$schema": "https://raw.githubusercontent.com/Polymer/tachometer/master/config.schema.json",
  "resolveBareModules": false,
  "sampleSize": 10,
  "timeout": 0,
  "benchmarks": [
    {
      "browser": "chrome-headless",
      "measurement": "global",
      "expand": [
        {
          "name": "v1 ytcp-video-row",
          "url": "http://localhost:8001/?tag=ytcp-video-row"
        },
        {
          "name": "v2 ytcp-video-row",
          "url": "http://localhost:8002/?tag=ytcp-video-row"
        },
        {
          "name": "v1 ytcp-video-metadata-editor-section",
          "url": "http://localhost:8001/?tag=ytcp-video-metadata-editor-section"
        },
        {
          "name": "v2 ytcp-video-metadata-editor-section",
          "url": "http://localhost:8002/?tag=ytcp-video-metadata-editor-section"
        }
      ]
    }
  ]
}

Support multiple measurements per benchmark

I think we'd like to return multiple labeled measurements from a benchmark, so that we can see metrics like FCP, TTFI, idle time, GC time, memory, etc.

To view multiple measures we probably want a table that shows results relative to a chosen baseline, rather than the NxN matrix.

Support JSON configuration file

We should support a JSON configuration file as an alternative to using command-line flags.

  • Lets you check in and share benchmark suite configurations.
  • Will allow more complex and readable configurations, e.g. with flags alone it is unclear whether something like --package-version will do a cross product, apply to just one benchmark, etc.

Add global reporting API

Currently the bench.js library is an ES module, and does a fetch call when bench.stop() is called, to send the timing data back to the server. Users can also send this response themselves if they know the right JSON format and address (though this isn't documented).

There are a few problems with this:

  • If you aren't using the built-in server, then you don't know what host/port to send your response to.
  • IE doesn't support ES modules or fetch.

So, we can add a new reporting API (or replace bench.js?), which is simply that the benchmark puts the timing data onto window somewhere (e.g. window.tachometer = 123). The runner would then just poll using WebDriver until it finds the data. We already find first contentful paint in a similar way, by polling the performance.getEntriesByName API.

Another option might be to have the user add a performance entry, but then the runner would need to know which entry to look for (so it would need to be an additional config parameter). It would also not be possible to put arbitrary numbers in there, since the performance API doesn't support that AFAIK. The global approach seems like the simplest solution that covers all the use-cases.

A follow on, related to #34, could be to support sending back key/value pairs (e.g. window.tachometer = {foo:123, bar:456}), so that multiple timing numbers could be reported from the same benchmark.

cc @sorvell @frankiefu

Support GitHub package versions with monorepos

NPM's github: protocol for specifying package versions only works when the top-level of the git repository maps to the NPM package (see npm/npm#2974 which is won't-fix). In the case of monorepos, such as the one for Material Web Components, packages are organized into sub-directories. This means there's no way to use tachometer's package-version feature for benchmarking commits of repos like this.

The simplest solution is to write a bash script that makes a few clones of the repo, and then sets the package version to the path of the local clone (plus the package directory). We should at least document this pattern if that's the answer.

We could also think about building some kind of support into tachometer. It would basically do the same thing as above, with temp directories. Not convinced yet it's worth the complexity, maybe we should start with documenting the manual pattern.

cc @e111077
