testground's Issues

Testground daemon runtime

The daemon is a long-lived server process that handles incoming GitHub webhooks and explicit API calls, orchestrates test plan execution, and serves the dashboard view. It needs to model a job queue of some kind. It schedules only against Nomad.

It will need a configuration file of some kind to define things like the Nomad daemon endpoint.
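
A minimal sketch of such a configuration file loader, assuming a TOML file parsed with BurntSushi/toml (the format and all field names are placeholders, nothing is decided yet):

package config

import "github.com/BurntSushi/toml"

// DaemonConfig is a hypothetical shape for the daemon configuration file.
type DaemonConfig struct {
    ListenAddr    string `toml:"listen_addr"`    // address the daemon HTTP API binds to
    NomadEndpoint string `toml:"nomad_endpoint"` // e.g. "http://127.0.0.1:4646"
    GitHubSecret  string `toml:"github_secret"`  // webhook HMAC secret
    QueueSize     int    `toml:"queue_size"`     // max pending jobs in the job queue
}

// Load parses the daemon configuration from a TOML file.
func Load(path string) (*DaemonConfig, error) {
    var cfg DaemonConfig
    if _, err := toml.DecodeFile(path, &cfg); err != nil {
        return nil, err
    }
    return &cfg, nil
}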

Test the TestGround

A few people have reported being uncertain whether TestGround is buildable or whether they just don't have the right setup on their machine.

Given that TestGround is designed to be runnable on a local node for non-exhaustive tests, having a Test Plan that verifies on Travis that TestGround is buildable (and therefore that at least its essential features work) would be a great thing.

Looking for a comment on how urgent this should be and/or whether there is any reason not to do it.

Trace API invocations in state db

Assuming we have a DB where we track the state of the system and jobs (#148), and that we're running in a shared, service setup (Maturity Stage 2, #643), the daemon should audit/trace all incoming testground API calls in the database, for future reference.
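
As a rough illustration, each API invocation could be persisted as a record like the following (field names are hypothetical, not a committed schema):

package state

import "time"

// APIInvocation is a hypothetical audit record persisted for every
// incoming testground API call.
type APIInvocation struct {
    ID         int64     // primary key
    ReceivedAt time.Time // when the daemon received the call
    Endpoint   string    // e.g. "/build", "/run"
    Caller     string    // authenticated user or token identity
    Payload    []byte    // raw request body, for future reference
    Outcome    string    // "accepted", "rejected", "failed"
}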

Review: Chewing strategies for Large DataSets

see: https://github.com/ipfs/testground/blob/02ec37ecee35c5ccd62911eea19a8f275b2903f0/plans/chew-large-datasets/README.md

Test Parameters

  • The directory depths allow specifying file sizes but file sizes can also be specified independently. How are these resolved?
  • As far as I can tell, one currently needs to explicitly list out every file to be added in File Sizes. Instead, we should have a list of [{average: ..., variance: ..., percent: ...}] and a final total file count (where the percent fields add to 100%). Otherwise, generating realistic tests will be really tedious.

For example:

{
  depth: 4,
  numFiles: 1e6, // maybe allow a list and test with each one?
  fileSizes: [
    {average: "1MiB", variance: "10%", percent: "90%"},
    {average: "1KiB", variance: "0%", percent: "10%"},
  ]
}
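
For illustration, a sketch of how such a spec could be resolved into concrete per-file sizes (the field names mirror the example above; the uniform-jitter strategy is just one possible interpretation):

package main

import (
    "fmt"
    "math/rand"
)

// SizeSpec mirrors one entry of the fileSizes list above, with values
// already parsed into bytes and fractions for simplicity.
type SizeSpec struct {
    Average  int64   // e.g. 1 MiB
    Variance float64 // e.g. 0.10 for "10%"
    Percent  float64 // e.g. 0.90 for "90%"
}

// resolveSizes expands the spec into one concrete size per file.
func resolveSizes(numFiles int, specs []SizeSpec) []int64 {
    sizes := make([]int64, 0, numFiles)
    for _, s := range specs {
        count := int(float64(numFiles) * s.Percent)
        for i := 0; i < count; i++ {
            // jitter uniformly within ±variance of the average
            jitter := (rand.Float64()*2 - 1) * s.Variance
            sizes = append(sizes, int64(float64(s.Average)*(1+jitter)))
        }
    }
    return sizes
}

func main() {
    specs := []SizeSpec{
        {Average: 1 << 20, Variance: 0.10, Percent: 0.90},
        {Average: 1 << 10, Variance: 0.0, Percent: 0.10},
    }
    sizes := resolveSizes(1000000, specs)
    fmt.Println("generated", len(sizes), "file sizes")
}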

IPFS MFS Write

- Pin the MFS root hash

The MFS root hash is always pinned (in go-ipfs).

IPFS Url Store / IPFS File Store

To be explicit:

  • Run FileStore doesn't really make sense. I assume you meant: add the files using the filestore (e.g., ipfs add --nocopy).
  • Verifying that all the files are listed involves running ipfs filestore ls to list all blocks and their filenames/urls.

Contrast Testground and Testlab

The libp2p testlab project is a preceding framework initiative to build a system for running large distributed test cases.

The testground spec mentions it in the "Incremental Implementation Plan":

Study how libp2p/testlab can cover the distributed deployment requirements of this design, and understand how it can be reused within this context.

Initial assessment:

  • The current version can be regarded as a domain-specific deployer, capable of launching p2p and ipfs daemons (PR in review) in a Nomad cluster. It does not use Docker, nor does it have support for deploying test logic. It essentially schedules N containers in a cluster and returns the API multiaddrs to the scheduling process, so that the code that launched the deployment can create client stubs for control.
  • In some ways, it can be regarded as a cloud-native iptb.
  • The observability/monitoring/metrics elements are not yet developed.

Testlab's roadmap covers some of the same ground as the spec for Testground. IPFS support using Docker exists as a PR (starts IPFS nodes and connects them, but no scenario yet).

Testground has a similar approach to writing test code, using a config file + go logic.

The Testlab "deployment configuration" JSON file is being used to declaratively describe stages which are transformed via plugins into deployable sets of software controlled by nomad. The test code execution is baked into the "scenario" binary that gets deployed in the final stage.

By contrast, the Testground "manifest" TOML file appears to be used to specify a series of individual tests, but the Go code is responsible for launching any instances itself - for example, the "smlbench" test uses the (currently disabled) "IPTB ensemble" wrapper that is part of the Testground SDK.

There are many other differences. As both projects are out there, and represent different design choices, it might be good to capture the "why?" and how that relates to the core needs and desires for a large scale test framework. There remains the possibility of utilizing some or all of Testlab with Testground ... but they do appear to be competing on the basis of how they are configured and tests are written, so it may make more sense to deprecate one in favour of the other if we are only going to pursue development on one of them.

This issue is a bit of a placeholder ... we could choose to do more "study" (as suggested in the spec) or we could add some documentation.

CLI describe command

Outputs information about a test plan or a test case for now. In the future it may be extended to inspect builders and runners, and their configuration properties.

AWS Tags

If you are creating virtual machines or other resources on AWS, please add some tags so we can track costs ... anything that isn't tagged will be deleted (if we can't figure out what it's being used for)

Edit the table below to add tags:

Tag Key | Tag Value      | Contact  | Description                   | Keep until date
Project | Jim_Testground | @jimpick | Experimenting with Testground | 2019.10.25
Project | Packer         | @jimpick | Building images               | 2019.10.25

archive service: design and implement API endpoint

Design an HTTP API endpoint that can ingest the following artifacts, emitted by the test runner:

  1. Go CPU profiles (e.g. 10 seconds).
  2. Go heap profiles.
  3. Go mutex profile.
  4. Go block profile.
  5. stdout logs.
  6. event logs.

Each archive request will carry, at least:

  1. name of the test case.
  2. test run number.
  3. node tag.
  4. commit hash.
  5. timestamp
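
A sketch of that request metadata as a Go struct (names are illustrative; the actual endpoint, encoding and artifact transport are still to be designed):

package archive

import "time"

// ArchiveRequest is a hypothetical envelope for artifacts ingested by the
// archive service; the artifact body itself would travel alongside it
// (e.g. as a multipart upload).
type ArchiveRequest struct {
    TestCase     string // name of the test case
    RunNumber    int    // test run number
    NodeTag      string // node tag
    CommitHash   string // commit under test
    Timestamp    time.Time
    ArtifactKind string // "cpu-profile", "heap-profile", "mutex-profile", "block-profile", "stdout", "events"
}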

Testground roadmap/timeline

I wanted to clarify and write down our goals/expectations for Testground to ensure we stay on track. This is a continuation of the conversation @Stebalien @jimpick @raulk @daviddias and I had last week.

Goal: Ship the next go-ipfs minor release (using testground to validate healthy network performance at scale) by November 15th, 2019.

Our updated release process takes ~3 weeks from the time we cut our RC, on or before October 25th.

At the time of cutting our RC, all tests need to be passing - aka, testground should have sufficient coverage to validate that master performs as well or better vs prod in simulated network conditions. Master currently has a large backlog of changes, including a large libp2p refactor IIRC, so it will probably take 2-3 weeks to use testground to validate and fix issues in master to ensure the build is green. This means we need to have testground working well enough by ~October 4th that we can start using it (or a portion of it) to identify any bugs introduced in the stabilize branch or other areas that we want to release in 0.5.0.

Does this seem feasible? If not, what do we need to change to hit this deadline? (parallelize more? more knowledge sharing? etc) Our goal was to get a go-ipfs release out in Q3. This new deadline, 11/15, is nearing the end of Q4 (given holidays). Delaying our next release to Q1 2020 would be unacceptable - what support / focus / etc is needed to get us on this trajectory?

Contrast Testground and ipfs/benchmarks

The ipfs/benchmarks project is a preceding framework initiative to build a system for running regularly scheduled test cases.

The ipfs/benchmarks system was primarily built to run several benchmarks for js-ipfs on a nightly basis in order to measure progress / catch regressions. It utilizes a dedicated bare metal server in order to get consistent benchmark runs so that it is possible to compare against historically archived benchmark results with some level of confidence.

ipfs/benchmarks architecture diagram

It also has a benchmarks dashboard (currently down) built with Grafana.

You can see the types of tests that are being run nightly:

ipfs/benchmarks#271 (note: the screenshot here is an enhanced Grafana dashboard using non-production data on a development deployment)

[Screenshot: nightly benchmark results in the enhanced Grafana dashboard]

The following tests are run nightly against js-ipfs built from master:

  • adding files to js-ipfs (multiple sizes, strategies)
  • adding files to go-ipfs (using a stable version, for comparison)
  • extracting a local file from js-ipfs (multiple sizes)
  • transfer between two js-ipfs nodes (no network throttle, multiple sizes, multiplexers, websockets, encryption)
  • transfer from a js-ipfs node to a go-ipfs node (multiple sizes)
  • transfer from a go-ipfs node to a js-ipfs node (multiple sizes)
  • multi-peer transfer - content is on 4 js-ipfs peers, a 5th peer retrieves a file (multiple sizes, multiplexers, websockets, encryption)

Some work was performed to see if we could re-use the infrastructure to also run nightly go-ipfs tests against a go-ipfs built from master (with no new tests), but the changes have not been deployed yet (needs more discussion)

An important feature of ipfs/benchmarks is that Clinic.js is used to generate runtime profiles and flamegraphs of the Node.js tests.

[Screenshots: Clinic.js profile and flamegraph output from a nightly run]

These are saved into IPFS and can be retrieved as HTML files:

https://ipfs.io/ipfs/QmeBBjfQgLfAtPwSKLVLnk6uVMq7gXmQ7omE3oDmCHFwjR/addMultiKb_balanced

Compared to what is currently implemented for Testground, ipfs/benchmarks has:

  • support for nightly scheduled test runs
  • support for building js-ipfs (go-ipfs in a PR)
  • small test suite focused on simple scenarios (mostly js-ipfs)
  • dedicated bare-metal test minion hardware for reproducible results
  • metrics collection database (InfluxDB)
  • javascript tracing/profile (Clinic.js)
  • a dashboard (Grafana) ... currently there is only one dashboard, which is poorly laid out, but Grafana is flexible and it could be greatly improved. Grafana supports multiple users, but the current setup has limited access

Testground appears to be on-track to implement the following differentiating features:

  • support for building Docker containers
  • testing across clusters of machines instead of a single machine
  • complex scenarios
  • ability to test continuously vs. nightly
  • go-ipfs and libp2p tests
  • metrics/results sent to ElasticSearch
  • no dashboards yet, evaluating Kibana ... Grafana is also capable of using ElasticSearch as a data source, so that might also be an option

In terms of maturity, ipfs/benchmarks was started last year, and is a complete working system. It has some minor docker startup issues that occasionally need devops intervention. We have been doing js-ipfs releases even though the system has been offline, so it's likely not a crucial part of the release testing process.

Testground is in a very immature state currently, but it has a planned feature set that will surpass ipfs/benchmarks and is being actively invested in with developer time. It will have a dashboard as well. There is a desire to not have 2 separate sets of dashboards for developers to regularly check.

It's not yet entirely clear if we should be trying to run two separate testing infrastructures, or if we should be actively working to merge them together.

Fake bitswap + datastore for mixed testing

It would be neat to do a "fake bitswap + datastore" thing we could slap into an IPFS test build that would pretend it was transferring over the network at a simulated rate, but wouldn't actually transfer blocks ... it would use an out-of-band backchannel to communicate any hashes that the other end needed.

We could do mixed simulation, but with many lightweight (but real) nodes on a real network for things like package manager use cases with massive numbers of clients.

It's pretty inexpensive to spin up lots of nodes in the cloud using a serverless platform such as Cloud Run (or maybe AWS Fargate), but if we want to send lots of real traffic across multiple regions, that costs $$$.
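
A very rough, self-contained sketch of the idea (deliberately not tied to the real bitswap/exchange interfaces): the fake transfer sleeps for size divided by the simulated rate, while the block bytes are handed over via an out-of-band backchannel.

package fakebitswap

import (
    "context"
    "time"
)

// Block is a stand-in for a real IPFS block: just a key and its payload.
type Block struct {
    Key  string
    Data []byte
}

// FakeExchange pretends to transfer blocks over the network: it waits for
// the time a real transfer would take at the simulated rate, then pulls the
// bytes from an out-of-band backchannel shared by both ends.
type FakeExchange struct {
    RateBytesPerSec float64
    Backchannel     func(ctx context.Context, key string) (Block, error) // out-of-band fetch
}

// GetBlock simulates fetching a block: fetch out of band, sleep for the
// simulated wire time, return the block.
func (f *FakeExchange) GetBlock(ctx context.Context, key string) (Block, error) {
    blk, err := f.Backchannel(ctx, key)
    if err != nil {
        return Block{}, err
    }
    delay := time.Duration(float64(len(blk.Data)) / f.RateBytesPerSec * float64(time.Second))
    select {
    case <-time.After(delay):
        return blk, nil
    case <-ctx.Done():
        return Block{}, ctx.Err()
    }
}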

Produce build outputs with deterministic names for caching

Idea: plan name + hash of build input struct.

For the executable build strategy, we should be storing executables under a user-determined/autogenerated-but-consistent directory, and naming them deterministically.

For the docker build strategy, we should name/tag docker images with the deterministic name.

When the build step runs, we should check if we have a cached artifact.

We should provide a clean command to prune artefacts cached by a builder.
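
One possible way to derive the deterministic name, sketched under the assumption that the build input is a serialisable struct:

package builder

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
)

// BuildInput is a stand-in for whatever struct the builder already uses to
// describe a build (plan name, upstream dependency versions, build config...).
type BuildInput struct {
    Plan         string
    Dependencies map[string]string // module path -> commit/version
    BuildConfig  map[string]string
}

// ArtifactName returns "<plan>-<hash>", where the hash is a digest of the
// canonically serialised build input, so identical inputs map to the same
// cached artifact (executable path or docker image tag).
func ArtifactName(in BuildInput) (string, error) {
    b, err := json.Marshal(in) // json.Marshal sorts map keys, so this is stable
    if err != nil {
        return "", err
    }
    h := sha256.Sum256(b)
    return fmt.Sprintf("%s-%s", in.Plan, hex.EncodeToString(h[:8])), nil
}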

smlbench tests do not run

I was looking through the code of the smlbench tests to use them as a reference for Test Case 1. They don't run at the moment, and it is easy to understand why: the manifest is a copy-and-paste of the one from the DHT tests.

Should I take any inspiration at all from these tests? It seems that the way a go-ipfs daemon is being spawned is by programmatically running a wired-in version of IPTB. Is this intended? Is this still the plan, or should we look into spawning a go-ipfs daemon and using go-ipfs-api to operate it?

Combine random/interest dataset plans

There are currently two test plans that cover transferring datasets: interest and random. However, the tests will be identical. The only difference is the input data.

We could simplify them by:

  1. Combining them.
  2. Allowing the user to specify the data source:
  • Generated files as specified in #80
  • An IPFS path.

Migrate sync service from Consul to Redis

The current version of the keyprefix watch in Consul is pretty inefficient.

It is intended to watch a tree node and return updates whenever anything under it changes.

Unfortunately, it sends the entire subtree with every update, instead of the delta. So in a 100,000-node scenario this scales very poorly: each node receives the entire subtree every time a node appears, which is roughly O(n²) traffic per node and O(n³) overall 😨

In all fairness, the Consul community is addressing this by introducing "streaming queries": hashicorp/consul#6310. However, we can't afford to wait for that to land and characterise its performance and feasibility, so we will be migrating this component to Redis.

Various patterns are possible:

  • Redis Streams (X command group).
  • (Sorted) sets/lists with keyspace notifications via pubsub.
  • Strings with keyspace notifications via pubsub.

I'll analyse the tradeoffs in comments.

Related to #23.

Produce docker container for DHT tests

  • Consume a list of upstream dependencies/commits, emit replace directives, and append them to the go.mod file.
  • Template Dockerfile.
  • Trigger Docker build.
  • Collect Docker container ID.

epic: archive service

This service will likely be backed by a blob store. It will index and store CPU, heap, mutex, etc. profiles.

Acquire a domain name

It would be nice to have a URL where we could host things like:

  • dashboard
  • grafana
  • blog for infra test updates
  • sub-namespaces for specific teams / endeavours to post results

Keep in mind that we might want to share resources between IPFS, libp2p, and possibly even Filecoin and community projects / collaborations down the road.

Use Ansible to automate cluster setup

This is close to becoming a PR ... I've been experimenting and prototyping in the aws-ansible branch to learn a bit of Ansible so I can have reproducible cluster setups (a serious pain point for me so far).

I've got playbooks working for:

  • Dynamic Inventory - retrieve the list of machines from the AWS EC2 API that are tagged with the same 'TG' tag as the current host
  • Connectivity Test - ping all the hosts in the inventory
  • Redis - set up the redis.conf config file on the first machine and start the server
  • Filebeat - set up the filebeat.yml from a template populated from an Ansible config stored in an S3 bucket
  • Networking - set up Docker networking on each machine with non-overlapping subnets and GRE tunnels (currently set up for 2 machines, will generalize to 2+ soon)

I'm planning to take these individual scripts and put them into a directory as Ansible "roles" so they can be run all together.

DHT find_peers distributed test over 1000 nodes

A test scenario that we'll use as an example/template to build the system gradually.

  1. Initially this will run within a single process.
  2. Run as multiple processes locally, coordinated by the sync service (Consul). Local test runner.
  3. Deploy to a Nomad cluster. Nomad test runner.

Impl Traffic Shaping

We need an API/tool to declaratively enforce traffic shaping rules via tc. Think of this as Terraform for traffic shaping. It takes an object (serializable into JSON or YAML or whatever), and applies the rules expressed within. It then exposes an API to "release" those rules and revert the network adaptor.
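
A minimal sketch of what the declarative shape and its application could look like; the Rules struct and helpers are hypothetical, only the tc/netem invocations are standard:

package netshape

import (
    "fmt"
    "os/exec"
    "time"
)

// Rules is a hypothetical declarative description of the desired shaping.
type Rules struct {
    Interface string        // e.g. "eth0"
    Latency   time.Duration // added one-way delay
    LossPct   float64       // packet loss percentage
}

// Apply enforces the rules with a netem qdisc on the interface.
func Apply(r Rules) error {
    args := []string{"qdisc", "add", "dev", r.Interface, "root", "netem",
        "delay", fmt.Sprintf("%dms", r.Latency.Milliseconds()),
        "loss", fmt.Sprintf("%.2f%%", r.LossPct)}
    return exec.Command("tc", args...).Run()
}

// Release reverts the interface to its default qdisc.
func Release(iface string) error {
    return exec.Command("tc", "qdisc", "del", "dev", iface, "root").Run()
}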

reporting service: client proxy to inject in test scenarios

The reporting HTTP service will be exposed by the coordinator, but we don't want the test scenarios to deal with raw HTTP calls. They should receive a proxy client that encapsulates the network calls and exposes a nice, simple API.

type Reporter interface {
    RecordMetric(name string, value float64) error
}

// Encapsulates the reporting context and implements Reporter.
type httpReporter struct {
    commit   string
    run      int
    scenario string
}
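
A sketch of what RecordMetric could do under the hood (the endpoint path, payload shape, and the extra endpoint field on httpReporter are assumptions, not the actual design):

// Assumes httpReporter also carries an `endpoint string` field with the
// coordinator's base URL, and that the file imports bytes, encoding/json,
// fmt and net/http.
func (r *httpReporter) RecordMetric(name string, value float64) error {
    payload, err := json.Marshal(map[string]interface{}{
        "commit":   r.commit,
        "run":      r.run,
        "scenario": r.scenario,
        "metric":   name,
        "value":    value,
    })
    if err != nil {
        return err
    }
    resp, err := http.Post(r.endpoint+"/metrics", "application/json", bytes.NewReader(payload))
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("reporting service returned %s", resp.Status)
    }
    return nil
}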

master test plan: sketch out the skeleton

The master test plan contains all the test scenarios that the scheduler will run for every single commit that is scheduled to be canary-tested.

  • Each test scenario can be a struct implementing a TestScenario interface. The TestScenario interface could be mono-method.
  • The master test plan instantiates an IPTB of, say, 16 nodes.
  • The grand total node count can vary over time as the master test plan evolves.
  • Test scenarios should be parallelisable.
  • Test scenarios will "check out" IPFS instances from the IPTB pool, and will conduct tests on them. For example, an "add-then-get" test scenario will check out two instances: the adder and the getter.
  • Something needs to supervise the assignment of available IPFS instances to test scenarios.
  • While those instances remain checked out, they will be unavailable to other concurrent test scenarios.
  • When a test scenario finishes, it's not clear if we should completely dispose of that IPFS instance, or if it can be reused.

The scheduler will run the master test plan N times per commit, in order to acquire various observations. However, the master test plan does not need to know it's being run repeatedly.
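
A rough sketch of the shapes described above (all type and method names are hypothetical):

package master

import "context"

// Instance is a handle to one IPFS node from the IPTB pool.
type Instance interface {
    APIAddr() string
}

// TestScenario is mono-method: it receives the instances it checked out
// and runs against them.
type TestScenario interface {
    Run(ctx context.Context, instances []Instance) error
}

// Pool supervises the assignment of available IPFS instances to scenarios.
type Pool interface {
    // Checkout blocks until n instances are free, then reserves them.
    Checkout(ctx context.Context, n int) ([]Instance, error)
    // Return releases the instances (or disposes of them, TBD).
    Return(instances []Instance)
}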

Test plan executables should generate nomad HCLs for test cases

We want the HCL definition of a nomad test case to be co-located with the test case itself. The way to achieve this is to have the test plan executable print the HCL to stdout.

Introduce a TEST_COMMAND environment variable that can take values: run, schedule.

  • When the value is run, the executable will run the test case designated by TEST_CASE_SEQ, as normal.
  • When the value is schedule, the executable will spit out the Nomad HCL for scheduling the test case designated by TEST_CASE_SEQ, with the parameters conveyed in other environment variables.

The emitted HCL may in itself be a Go template, as there are certain elements that cannot be determined by the test case itself (e.g. test run ID, etc.). We'll need to define the expectations and input/output clearly here.
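
A sketch of the dispatch a test plan executable could perform on start-up (the TEST_COMMAND and TEST_CASE_SEQ names come from this issue; the helper functions are placeholders):

package main

import (
    "fmt"
    "io"
    "os"
)

func main() {
    seq := os.Getenv("TEST_CASE_SEQ")
    switch cmd := os.Getenv("TEST_COMMAND"); cmd {
    case "run", "":
        runTestCase(seq) // execute the test case, as today
    case "schedule":
        emitNomadHCL(os.Stdout, seq) // print the (templated) Nomad HCL for this test case
    default:
        fmt.Fprintf(os.Stderr, "unknown TEST_COMMAND: %q\n", cmd)
        os.Exit(1)
    }
}

func runTestCase(seq string)               { /* run the test case designated by seq */ }
func emitNomadHCL(w io.Writer, seq string) { /* write the HCL job description to w */ }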

github integration: lay the foundation

What we need to do

Write a GitHubBridge component that:

  • sets up a webhook endpoint.
  • subscribes to commits on master, commits on pull requests, and comments on PRs.
  • prints out those events to stdout.

Can be tested manually against a personal repo (don't create test PRs or commits on go-ipfs itself!). Not sure how this can be unit tested; maybe create integration tests that use a personal repo and the GitHub API to make commits, etc. that trigger the events we'll receive via the webhook endpoint. You can register and remove webhooks dynamically via the WebHook API: https://developer.github.com/v3/repos/hooks/.

This integration test will also need to run on Travis (we need to set up Travis -- can do that in this issue too?).
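
A minimal sketch of the webhook endpoint using only the standard library (routing, signature validation via the X-Hub-Signature header, and event-specific parsing are left out):

package main

import (
    "io/ioutil"
    "log"
    "net/http"
)

// webhookHandler prints every GitHub webhook delivery to stdout. GitHub puts
// the event kind in the X-GitHub-Event header (e.g. "push", "pull_request",
// "issue_comment").
func webhookHandler(w http.ResponseWriter, r *http.Request) {
    event := r.Header.Get("X-GitHub-Event")
    body, err := ioutil.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "failed to read body", http.StatusBadRequest)
        return
    }
    log.Printf("received %s event: %s", event, body)
    w.WriteHeader(http.StatusOK)
}

func main() {
    http.HandleFunc("/webhook", webhookHandler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}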

Definition of Done

  • Documented code.
  • Integration tests against a personal repo, using the webhook API to dynamically register a webhook, and the general GitHub API to make actions that lead to webhook notifications.
  • Pull request.
  • Travis config.

epic: test scheduler

The test scheduler is the component of the system that takes a commit hash, checks out the go-ipfs tree, builds any necessary artifacts, and schedules a master test plan run against that build.

The scheduler should be developed as an abstraction, with an initial implementation that schedules test runs locally and in a serial fashion (FIFO queue).

In the near future, it should evolve to schedule test runs on a nomad cluster, so that we can parallelise the live canary testing of multiple commits at the same time.
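
A sketch of that initial serial scheduler, using a buffered channel as the FIFO queue (the Job and Runner shapes are hypothetical):

package scheduler

import "log"

// Job describes one scheduled run: a commit to check out, build and test.
type Job struct {
    Commit string
}

// Runner executes one job (build artifacts + run the master test plan).
type Runner interface {
    Run(job Job) error
}

// Serial schedules runs locally, one at a time, in submission order.
type Serial struct {
    queue chan Job
}

func NewSerial(runner Runner) *Serial {
    s := &Serial{queue: make(chan Job, 64)}
    go func() {
        for job := range s.queue { // FIFO: jobs are consumed in enqueue order
            if err := runner.Run(job); err != nil {
                log.Printf("run for commit %s failed: %v", job.Commit, err)
            }
        }
    }()
    return s
}

// Enqueue adds a job to the tail of the queue.
func (s *Serial) Enqueue(job Job) { s.queue <- job }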

Error while deleting containers

Ran the command from the readme for running the tests locally:

TESTGROUND_BASEDIR=`pwd` testground -vv run dht/lookup-peers --builder=docker:go --runner=local:docker --build-cfg bypass_cache=true

And I get errors about deleting containers. Log below, from after it created all the containers.

log:

305.87214s INFO started containers {"runner": "local:docker", "run_id": "8c5f91b5-4a26-427f-86bf-63e67e823221", "count": 50}
305.90342s ERROR Error: No such container: f60ad6ea40133850ab76fccd0e3dbe195880002db33f88d2132dfd710565f984 {"runner": "local:docker", "run_id": "8c5f91b5-4a26-427f-86bf-63e67e823221"}
github.com/ipfs/testground/pkg/runner.(*LocalDockerRunner).Run
/Users/dietrich/go/src/github.com/ipfs/testground/pkg/runner/local_docker.go:237
github.com/ipfs/testground/pkg/engine.(*Engine).DoRun
/Users/dietrich/go/src/github.com/ipfs/testground/pkg/engine/engine.go:225
github.com/ipfs/testground/cmd.runCommand
/Users/dietrich/go/src/github.com/ipfs/testground/cmd/run.go:128
github.com/urfave/cli.HandleAction
/Users/dietrich/go/src/github.com/urfave/cli/app.go:523
github.com/urfave/cli.Command.Run
/Users/dietrich/go/src/github.com/urfave/cli/command.go:174
github.com/urfave/cli.(*App).Run
/Users/dietrich/go/src/github.com/urfave/cli/app.go:276
main.main
/Users/dietrich/go/src/github.com/ipfs/testground/main.go:37
runtime.main
/usr/local/go/src/runtime/proc.go:203
305.90969s INFO deleting containers {"runner": "local:docker", "run_id": "8c5f91b5-4a26-427f-86bf-63e67e823221", "ids": ["f60ad6ea40133850ab76fccd0e3dbe195880002db33f88d2132dfd710565f984", "18de494c88af62bfb614ae9f9c1ef9f875f6de887383b586d69b2be0921df42d", "110351118efb8c56924b08176b15034b221420280c11286ec76428b758facbd4", "9ae807880b811897909ea91362467f7cdb91f872c30032ef2ea908111094d6dc", "48e64f8cc2954699db0da9c22c83d3f1c377948d2444a6d326bdc73e290670ad", "6016eaab4e6d1bedab9fe06618ecd9edc4b9c5cc7a26ce8b6e6f3d0ecc24a31f", "dd1f01c45b03ac8a23a4b4a9fb3b2b8bdc7b850848ee39d6f8cf7fabb3140ef2", "028443d4686419e13bcb945352cb56258866c47a9c62287ffa78c7de28eda44d", "8cd6b61bdb80bb25ddad18517ac6897f670fe68a380e1be98bbebdb1dc8458a7", "1a38f9c3e3add39ef5315886efc0db229d4639b60d1f1dd3eb0f085a95fd2aa0", "8e5eeb3a0707d4a8b477662f9b66a59822394099ceecf04482f54d893fe64b93", "5cbb26c5d196209d3457ca41c40fc5a5f48e5ce905ef7dceda7336f3a3a328ff", "90f1370c399dc28bca53352bf7bf12b2ab1060c1e39d88465122bd4a58276593", "0eb7d340f8453fe0a4b9b61c68fbbae16b247fa63822082c4fda19fb455fc754", "2eb0db65b7bbd22349527becda0600d86b0b6828f4267e09c04502463c04625e", "21747a4afd2c532701aca25f09feb868eab91e661ebdf3f23ee64d210ec3f122", "e2029145cce0ece593cb979bd9710e6798ad1868d2fce42aa5294fd871ffc098", "9974da148aec9a952208486fd125f3e0623344bb893469a86a24ed200f4d7a9e", "9d089215a44b52495aa5b6392a3ed18aaf8045807970f8437de02c9986bef8c9", "46e6bd86605929ad4b8aeed173c6f04e1377076d70b898ad8cde769b99ecb5c2", "a43c8327f795b83464d89ab1c90a8e5d4b06fd1734b9e0eeef04791266a6f703", "4d300eff9709b28f19d5bc03477010519e5c2f60cc08ed39c90da1f5ebbdbf8c", "6e4fd0993962816884eb305cf77812468bc53a10c16a33576531b942c3dd7fb3", "2737bac62e45936ac9a12dd93367d4fa614a12a5b461d2864e3895a9ba0b25d6", "c8418f988df67b44ad8e4b4c77029792fa19ff61e2eaf81ab6b0b6ced7b523a3", "37c2b0361e3ad53dbb39b6e85310373e9146fcf73b03d4796f2064a47b580156", "8fc69571f791d0016beec8bf9c028158e0160abf87742a70f3da322cf221dd96", "458809f2fcee4a837b95fb0d3cbf300e353290cd37b3e2c7be6c35284b0d5e1e", "9ec5921c95bad596aa4b079e63ca93d31d591cfeef1d9c5679c6659ebc9d2717", "76b9a7821f1d45a8784b1bcf291acb80e04ec003b5dc33db19c2232542241abd", "f953adaaf34ffa4e9be2e92ea6940c68cb8ac1a6b70d709e9d12ecb3e6227ead", "9b99f9b20fcc9a677ea9aa0caeef0ddf08736b96dd360cde454a169b282b2bf0", "b5764f5be1a539dc07efb4cf26dd90728b9dd9dfefda141b87ff3e69fb2176e7", "1dff45985a770f8d8d1e74e29a44e5598aae541239c51d12de123521087e62cd", "4985c012dce1425b634c025d77de53b38a700d3950d0f076e9e9e7215ef9695d", "7bf4f4f4dd9dc16cb1c66510c6bd727bdfe431a1acc41c1b40ec291b126a3ba9", "5d9e66432d9cbb82b5873420d70c6e50b01657309abb9385e6413e72bf726b59", "a04bd280e828af4759a581565a2a1c259667523e19898d988e29ede3b5bbb69c", "227b1d2c8c5aa90f0be7c2d46099aaff5f66cdd691033477d99917b75848ae24", "1b68e36153c9a48747b1faecdd509ceb697e83949ce0c54f68557076134a3ab9", "ada537bc80fdcd021d4eb44e0a1856db6584576ba542a0e9c7e7aa834c9b7d48", "40152c6cd7b1ba7e8351d070843572c840bd26467d4401089456c961497fbf6f", "286bdf4c86cf3b25f8ecbb757914ae326f9c838937f47d3137d2db79f41d5ea5", "0184c77eb0548229e37055aebe18378367b0a433fc02c739c1e7ab707cf19b82", "61bd6ab454fd86af8b01088ac1f94321ca15de605838973d3849d7e8b6e51cec", "72d2ecc82cbc9cac23830befb4da9ede1db2770001f8d029b67b9b16efce5217", "1d90304528b25fac622bd39504f5444f5c3067189b54c25dee17dc192d26209b", "185161f998f8952a632c1ccb322d217bfa758637e50db99fa8338febe7c630ca", "a13096d7ed77a712e1edaa11fb39a8fbfe9b5472ebd592553d416ff2f41f0c05", "eee6c43871d68c558d5b03ddc2732f8b6888ad1b5cc3b565a97e05f0984b9daa"]}
305.91539s DEBUG deleting container {"runner": "local:docker", "run_id": "8c5f91b5-4a26-427f-86bf-63e67e823221", "id": "f60ad6ea40133850ab76fccd0e3dbe195880002db33f88d2132dfd710565f984"}
305.93339s ERROR failed while deleting containers {"runner": "local:docker", "run_id": "8c5f91b5-4a26-427f-86bf-63e67e823221", "error": "Error: No such container: f60ad6ea40133850ab76fccd0e3dbe195880002db33f88d2132dfd710565f984"}
github.com/ipfs/testground/pkg/runner.deleteContainers
/Users/dietrich/go/src/github.com/ipfs/testground/pkg/runner/local_docker.go:286
github.com/ipfs/testground/pkg/runner.(*LocalDockerRunner).Run
/Users/dietrich/go/src/github.com/ipfs/testground/pkg/runner/local_docker.go:238
github.com/ipfs/testground/pkg/engine.(*Engine).DoRun
/Users/dietrich/go/src/github.com/ipfs/testground/pkg/engine/engine.go:225
github.com/ipfs/testground/cmd.runCommand
/Users/dietrich/go/src/github.com/ipfs/testground/cmd/run.go:128
github.com/urfave/cli.HandleAction
/Users/dietrich/go/src/github.com/urfave/cli/app.go:523
github.com/urfave/cli.Command.Run
/Users/dietrich/go/src/github.com/urfave/cli/command.go:174
github.com/urfave/cli.(*App).Run
/Users/dietrich/go/src/github.com/urfave/cli/app.go:276
main.main
/Users/dietrich/go/src/github.com/ipfs/testground/main.go:37
runtime.main
/usr/local/go/src/runtime/proc.go:203
Error: No such container: f60ad6ea40133850ab76fccd0e3dbe195880002db33f88d2132dfd710565f984

Is it known that `exec:go` does not run?

» TESTGROUND_BASEDIR=`pwd` testground run smlbench/store-get-value --builder=exec:go
resolved testground base dir from env variable: /Users/imp/code/go-projects/src/github.com/ipfs/testground
Incorrect Usage: invalid value "exec:go" for flag -builder: allowed values are docker:go; got: exec:go

epic: reporting service

Test scenarios need to record their results and observations in a database, from where the dashboard will read them. The reporting service will be an HTTP API with a proxy object that we'll inject in test scenarios so they can record metrics easily without having to worry about constructing HTTP clients, nor providing the context (i.e. test run number, test case, commit id).

ElasticSearch deployment alternatives

There are many, many different options for deploying an ElasticSearch cluster...

Right now, we're using the "Elastic Cloud" offering from Elastic, for ease of setup, unified billing (via the AWS Marketplace), and because it has the latest features and matches the online documentation. However, it's not open source ... here's the full feature matrix:

https://www.elastic.co/subscriptions (click on "Expand all features")

The "Elastic Cloud" version automates a lot of the tasks involved in running a cluster, but it's not fully managed - there are still admin tasks that somebody will have to perform.

The open source version probably gives us most of what we need for Filebeat/ElasticSearch (needs to be confirmed) ... the Elastic Cloud version of Kibana has a lot of features that aren't in the Open Source version.

"Elastic Cloud" can be purchased directly from Elastic.co or via the AWS Marketplace. If purchased direct, it can be deployed on either AWS or GCP. If purchased via AWS Marketplace, only AWS is available (no surprise) ... the nice thing is that the billing is integrated into the AWS bill. I'm only provisioning a tiny install so far, so I haven't evaluated costs - it's billed hourly on AWS Marketplace based on the size and number of servers used in the cluster.

The Open Source version of the Elastic Stack can be set up manually, and is available in many Linux distributions. Setting up a cluster is a fair amount of effort though. If the volume of data we need to store gets really large, it might be more economical to go this route.

AWS has a competing hosted ElasticSearch offering: https://aws.amazon.com/elasticsearch-service/ ... and is sponsoring an alternative open source distribution to the Elastic.co commercial version: https://opendistro.github.io/for-elasticsearch/

There are also other paid hosted services, eg. https://logz.io/

Using testground to test Bitswap in varying leech / seed combinations

There are a couple of test plans for testing Bitswap performance:

These are good real-world tests that can be used to determine if a change to Bitswap is safe to release.

As mentioned in the above test plans, for Bitswap we are most concerned about

  • overall transfer time
  • overall data transfer (ie how much unique vs duplicate data is transferred)

It would be helpful to also run more focused tests that isolate the effect of changes, in particular to benchmark each combination of L leeches pulling from S seeds:

  • 1 leech pulling from
    • 1 seed
    • 2 seeds
    • 4 seeds
    • 8 seeds
    • 16 seeds
  • 2 leeches pulling from
    • 1 seed
    • 2 seeds
    • 4 seeds
    • 8 seeds
    • 16 seeds
  • 4 leeches ...
  • 8 leeches ...
  • 16 leeches ...

For these tests we would not be concerned with connection management - all nodes would be connected directly to each other.

Conceptualise GitHub automation layer

I regard this as a mux router for GitHub webhook events. Apparently nothing like this exists (I only checked go-land).

In essence, it will be a rulebook (defined in configuration) that enumerates rules like: “on a new commit on go-ipfs/master, trigger this test plan”. Feels a bit like GitHub Actions.

Actually, at the bare minimum we should define a series of commands that trusted committers can trigger via a @testbot mention — e.g. @testbot run <testplan> with <dependency=gitref> <dependency=gitref> <dependency=gitref>

reporting service: design database schema

A metric is defined by:

  • metric name (e.g. time_to_get_1mb_file) -- unique
  • unit: full (e.g. milliseconds), abbreviated (e.g. ms)
  • direction of improvement: +1, -1 (so that the reporting UI can colour it appropriately when rendering deltas)

An observation of a metric (aka result) should be identified by:

  • metric name (e.g. time_to_get_1mb_file)
  • test case name (e.g. add_get_file)
  • test run (e.g. run number #123)
  • value: float

Minimal sync service

Create a wrapper around Consul watches for:

  • adding and removing peers to/from a test scenario.
  • leader election via locks.
  • incrementing state counters (e.g. how many nodes have entered state X).

These are important primitives for distributed test cases.
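
These primitives could surface behind an interface along the following lines (a sketch; names are hypothetical and the implementation would wrap Consul watches, sessions/locks and KV counters):

package sync

import "context"

// Service is a hypothetical facade over the Consul-backed sync primitives.
type Service interface {
    // Subscribe returns a channel of peer join/leave events for a scenario.
    Subscribe(ctx context.Context, scenario string) (<-chan PeerEvent, error)
    // AcquireLeadership blocks until this node holds the scenario lock.
    AcquireLeadership(ctx context.Context, scenario string) (release func(), err error)
    // SignalState increments the counter for a state and returns the new value
    // (e.g. how many nodes have entered state X).
    SignalState(ctx context.Context, scenario, state string) (int64, error)
}

// PeerEvent reports a peer being added to or removed from a scenario.
type PeerEvent struct {
    PeerID string
    Joined bool
}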

Dashboard concept using mock data

Let's make an HTML dashboard that displays the output from some mock test runs.

Primarily, this iteration of the dashboard will be used to further the design discussion.

We know we will be running tests, and collecting data from the tests, and we'll want to make that data viewable somewhere.

We want to further discuss or design additional responsibilities of the dashboard, which might include:

  • high-level overview of all the testing
  • easy pinpointing of improvements or regressions in key metrics
  • ability to view historical graphs of individual metrics
  • ability to navigate to different code branches / pull-requests
  • test plan scheduling management - accept commands to schedule certain test plans to run continuously, periodically, or on an ad-hoc (manual) basis
  • infrastructure resource management - view and/or change cluster setups

@raulk proposed the following for the initial dashboard concept:

A simple matrix like https://build.golang.org/ would work well for us, for now.
• rows: reported result (e.g. time to first provider record, time to get file, time to find peer); grouped by test case (e.g. add and get 8mb file, find random peer, bitswap transfer no provider lookup, bitswap transfer with provider lookup).
• columns: commits that have been tested. potentially with a second layer of nested columns inside each commit, one per test run — if we run each test plan several times in the canary setup, to account for variance
• each cell contains the numeric value of the metric for that test run.
• cell bg are traffic-light colour coded based on if the metric improved or worsened vs. a baseline.
• we pick the baseline in a dropdown (master, commit X, release Y, etc.)

I'd like to propose that we generate some realistic "mock" data for the first batch of tests that we would like to start running, and use that as a basis to rapidly prototype a design.

Contrast Grafana vs. Kibana

This issue is here to collect experiences of Grafana vs. Kibana so we can make some decisions about which ones to use...

Testground census and commands/actions

The testground contains test plans, which in turn contain test cases. Test plans are black boxes with a cleanly defined environment. They can be developed in Go, Shell scripts, JavaScript, etc. Test plans need to enlist with the testground's census.

Test plans have three actions:

  • Build.
  • Schedule.
  • Run.

Build action

The build action needs to take a dependency manifest as an input. Why? Test plans will be triggered against specific commits of upstream dependencies. A Go test plan would, for example, add replace directives to its go.mod file before it calls go build and creates the Docker container.

$ cat << EOF | testground build <testplan>
github.com/libp2p/go-libp2p=18daff102bb2...
github.com/libp2p/go-libp2p-kad-dht=89ac3dbe740...
EOF

^^ This produces a Docker container for the test plan, against those upstream dependencies. Alternatively, we can introduce a binary builder that generates an executable rather than a Docker container, --output=binary.

Schedule action

The schedule action produces the Nomad HCL job descriptions for running a testplan, provided a container ID:

$ testground schedule --container=<container-id> <testplan>[/<testcase>]

Alternatively we could have a local scheduler which produces a shell script to run locally.

Run action

The run action may not belong in test plans, but rather in test runners.

The run action takes a batch of Nomad HCL job descriptions and submits them to the Nomad cluster for execution. TBD.

Pluggable test plans (via manifests)

Currently, test plans are part of the testground source tree. It would be nice to define a clean, lightweight, abstract API between the testground and test plans, such that test plans could be built and loaded as plugins.

https://godoc.org/plugin

EDIT: after a conversation with @Stebalien, Go plugins are difficult and unergonomic to use. Instead, we're detaching test plans by introducing a "test plan manifest" -- which is a descriptor of the test plan, the supported build strategies, run strategies, and its test cases.

Example:

name = "dht"
# hashicorp/go-getter URLs, so in the future we can support fetching test plans
# from GitHub.
source_path = "file:${TESTGROUND_REPO}/plans/dht"

[[ build_strategies ]]
type = "executable:go"
go_version = "1.13"
module_path = "github.com/ipfs/testground/plans/dht"
exec_pkg = "exec"

[[ build_strategies ]]
type = "docker:go"
go_version = "1.13"
module_path = "github.com/ipfs/testground/plans/dht"
exec_pkg = "exec"

[[ run_strategies ]]
type = "local:binary"

[[ run_strategies ]]
type = "local:docker"

[[ run_strategies ]]
type = "distributed:nomad"

[[testcases]]   # seq 0
name = "lookup-peers"
instances = { min = 2, max = 100, default = 50 }

  [testcases.params.bucket_size]
  type = "int"
  desc = "bucket size"
  unit = "peers"

[[testcases]]   # seq 1
name = "lookup-providers"
instances = { min = 2, max = 100, default = 50 }

  [testcases.params.bucket_size]
  type = "int"
  desc = "bucket size"
  unit = "peers"

[[testcases]]   # seq 2
name = "store-get-value"
instances = { min = 2, max = 100, default = 50 }
roles = ["storer", "fetcher"]

  [testcases.params.bucket_size]
  type = "int"
  desc = "bucket size"
  unit = "peers"

Content-based deterministic canonical build ID

Right now the canonical build ID is computed from the build parameters and upstream dependencies. We need to walk the source tree of the test plan and digest it to add a content-addressed component. Otherwise, we can get cache hits even if the source of the test plan has changed.
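
A sketch of that content-addressed component, assuming the digest is a plain SHA-256 over every file's path and contents under the plan's source directory:

package builder

import (
    "crypto/sha256"
    "encoding/hex"
    "io"
    "os"
    "path/filepath"
)

// digestSourceTree walks the plan's source directory in lexical order and
// hashes each file's path and contents, producing a digest that changes
// whenever the source of the test plan changes.
func digestSourceTree(root string) (string, error) {
    h := sha256.New()
    err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
        if err != nil || info.IsDir() {
            return err
        }
        f, err := os.Open(path)
        if err != nil {
            return err
        }
        defer f.Close()
        io.WriteString(h, path) // include the path so renames change the digest
        _, err = io.Copy(h, f)
        return err
    })
    if err != nil {
        return "", err
    }
    return hex.EncodeToString(h.Sum(nil)), nil
}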
