confluentinc / ducktape Goto Github PK

View Code? Open in Web Editor NEW

12.0 161.0 93.0 1.47 MB

System integration and performance tests

Python 98.14% CSS 0.25% HTML 1.44% Dockerfile 0.17%

safe-settings

ducktape's Introduction

Distributed System Integration & Performance Testing Library

Overview

Ducktape contains tools for running system integration and performance tests. It provides the following features:

Isolation by default so system tests are as reliable as possible.
Utilities for pulling up and tearing down services easily in clusters in different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos, Docker, cloud providers, etc.)
Easy to write unit tests for distributed systems
Trigger special events (e.g. bouncing a service)
Collect results (e.g. logs, console output)
Report results (e.g. expected conditions met, performance results, etc.)

Documentation

For detailed documentation on how to install, run, create new tests please refer to: http://ducktape.readthedocs.io/

Contribute

Source Code: https://github.com/confluentinc/ducktape
Issue Tracker: https://github.com/confluentinc/ducktape/issues

License

The project is licensed under the Apache 2 license.

ducktape's People

Contributors

Stargazers

Watchers

ducktape's Issues

Running ducktape with no parameters returns an error

{code}
Chens-MacBook-Pro:kafka gwen$ ducktape
Traceback (most recent call last):
File "/usr/local/bin/ducktape", line 9, in
load_entry_point('ducktape==0.3.2', 'console_scripts', 'ducktape')()
File "/Library/Python/2.7/site-packages/ducktape-0.3.2-py2.7.egg/ducktape/command_line/main.py", line 139, in main
tests = loader.discover(args.test_path)
File "/Library/Python/2.7/site-packages/ducktape-0.3.2-py2.7.egg/ducktape/tests/loader.py", line 108, in discover
assert type(test_discovery_symbols) == list, "Expected test_discovery_symbols to be a list."
AssertionError: Expected test_discovery_symbols to be a list.
{code}

I'd expect a usage printout (same as ducktape -h)

test discovery problem due to module not being in sys.path

@ewencp and @Ishiihara have both found ducktape unable to discovery tests when running in muckrake.

The root of the problem was that <path_to_muckrake> was not in sys.path, so attempts to import muckrake modules failed. It's easy to work around this by explicitly setting PYTHONPATH to include <path_to_muckrake> but this would be pretty annoying for users.

Ewen's suggested approach seems simple and reasonable:
(in loader.py) maybe something like walking up the directories until we stop seeing init.py files (do this once for each input path), then add whatever set of paths result from that to sys.path?

multiple test methods per class

test writers should be able to write test classes that have multiple test methods

With this in place, the examples we present to the community can be both cleaner and look more familiar to those with junit background.

Mirror binary packages for dependencies in S3

Bringing up a fresh cluster is a lot less frustrating if you don't have to rely on downloads from third party systems. We can get reliably fast (and free) bandwidth by storing copies of the dependencies we need in S3 (e.g. JDK, CDH). This only applies to things that don't already have mirrors, e.g. all Ubuntu system dependencies already use fast mirrors inside EC2.

Update ssh to return boolean

If we call ssh(cmd, allow_fail=True), it's useful to return a boolean indicating whether cmd succeeded or failed.

This really amounts to updating _ssh_quiet to return a boolean

Automate release process

Can be an extremely simple script to begin with, but this will document the release process as well as avoid human error.

See for example
https://github.com/mitsuhiko/jinja2/blob/master/scripts/make-release.py

Current approach to cleaning up nodes doesn't guarantee a clean test run

This is potentially a pretty critical bug. Consider a test with three services: zookeeper, kafka, and a kafka client. If a previous run of the test failed and we didn't get to clean up properly, we would expect the next round to clean up before starting because we have each service do a _stop_and_clean before starting services.

However, what we actually need is for all nodes to be cleaned before we start any services. The reason is that, in the above example, we'll start the second test by making sure the zookeeper nodes are clean (killing the leftover processes, deleting their data), and then start them up. However, the Kafka nodes may still be running. Since we're also not doing any sort of port randomization, the old Kafka brokers will reconnect to the new Zookeeper nodes, and may do things like write ZK nodes in the mean time.

Cleaning up only the nodes we'll need is probably difficult because tests have no easy way to know the number of nodes they'll need. One option for fixing this bug is to just run the cleaning process on all nodes before doing anything else in a test. Another option is requiring each test to have an annotation indicating the number of nodes it needs (possibly dynamically determined after instantiation for tests that can be run on different cluster sizes) so that those nodes can all be preallocated and cleaned.

And of course all of this depends on being able to run a generic cleaning process to at least make sure that all processes are shutdown so they can't do anything bad before each individual service can do things like clean up old state before running on a node. This works ok for now because we have only simple java processes that we can find easily, but it's possible we should have a way for each service to register a cleanup method that will be run regardless of whether the current test will use that service, truly ensuring all old state is cleaned up.

Add an integration test for using AvroMessageReader and AvroMessageFormatter in kafka-console-producer and kafka-console-consumer

system stats tooling

It would be helpful to gather system stats on the driver and slave nodes while tests are running and make this visible in the test summaries.

Naarad is a possibility since has some tooling for this sort of thing.

v0.3.3 can leak cluster nodes

An error in Service.free can cause cluster nodes to not be fully freed.

Cause: v.0.3.3 introduced logic which actually removes nodes from a list while iterating through the list.

Use simple HTTP requests instead of time.sleep to more reliably detect when services are ready

Any services that provide HTTP (kafka-rest, schema-registry) can provide better implementations of start() by actually making requests against the service to check whether it has finished coming up. This will be a lot less flaky since we can specify much larger timeouts before we fail the test, but can detect that the service is up much earlier.

This should be structured as a helper on RemoteAccount that looks something like wait_for_http_service(port, timeout, path='/').

setUp -> setup and tearDown -> teardown

This is a relatively minor aesthetic thing, but it would be good to get rid of the camel-case on these methods.

Add support for multiple versions of projects

Currently we just checkout the current version of projects. We'll need to be able to write tests that have different versions of different projects running concurrently so we can test upgrade paths. To support this, we'll need to have builds of all the different versions and both services and tests will need to be parameterizable by the versions of the different services involved in the test.

Improve robustness of Service.stop, Service.clean

Implementation of Service.stop and Service.clean should have each call to node.stop etc in its own try/except block so that failure of one call does not prevent calls to stop/clean on other nodes.

rethink min_cluster_size

At best, min_cluster_size is an imperfect heuristic

Allowing multiple test methods per test class and allowing services to register themselves with test_context outside of the constructor make this inaccurate in general. Maybe this be ok since min_cluster_size just provides a way to fail fast.

Consider adding an annotation? Or?

Difficult to discover bugs in cleanup code

When developing a test, I started noticing weird behavior and eventually tracked it down to the fact that my service had not been cleaning up properly. There were still instances of it running when I ran the test again and it caused the test to break (they were auto-registering Kafka topics when the test wasn't expecting them to).

It took me awhile to discover this because even though there was a log indicating that the stop method was failing, it was buried in the info/debug logs and at the WARNING level:

[WARNING - 2015-08-14 10:20:39,176 - service_registry - stop_all - lineno:37]: Error stopping service <CopycatStandaloneService: num_nodes: 1, allocated: True, nodes: ['worker3']>: pids() takes exactly 1 argument (2 given)

Any reason this is only at the WARNING level? I think the obvious thing to do is log it at the error level, make sure it gets logged to the console (not just in the log files that I'd have to look at separately).

mechanism for data collection

current tests just pipe collected performance (etc) data to logs.

ducktape should pipe it (json is probably fine) to its own file so it's easily discoverable when regression checks are added

Gracefully handle keyboard interrupt

We want cleanup to take place, but no more tests to run.

e.g. this comment from #43
You might want the exceptions you're catching here to be BaseException. There are a few that won't be caught by Exception. See https://docs.python.org/2/library/exceptions.html#exception-hierarchy

Actually, KeyboardInterrupt in particular might need careful thought and handling -- it's unlikely we want to kill the process immediately in most cases, but we'll probably need to coordinate a couple of things. When shutting down a test, we'd probably like for that shutdown to finish (so we need to handle the exception). But then the test runner ideally wouldn't finish running any tests that were left. Might just want to change these to BaseException now and file a ticket for graceful ctrl-c handling.

BackgroundThreadService: don't set _workers to None in wait()

Occasionally this can lead to annoying NPE. Better to make the list of workers is empty

Gather log files etc even if --no-teardown flag is used

Existing behavior omits the step where logs are collected from remote machines if --no-teardown flag is supplied.

In my experience however, this behavior is somewhat annoying in particular since you'd only really use ---no-teardown while debugging

Improve code reuse in Failover tests

There is a lot of parallel structure in the various Failover tests which can probably be consolidated.

See PR #13

add python module to test result path

Currently, results for a test are stored in a directory organized thusly:

results/latest/<TestClass>/<test_method>

This however requires that TestClass names must be unique. In order to avoid surprising behavior, it's probably better to have:

results/latest/<test_module>/<TestClass>/<test_method>

test runner - per-test timeout mechanism

The test runner should have some ad-hoc mechanism for guessing that a test is in an infinite loop, and terminating the test if has run too long

Add test of REST proxy consumers behind load balancer

REST proxy instances should work reliably when the initial access point is behind a load balancer. This may require support for advertised.host.name for REST proxy instances.

test output - include cluster usage

It would be useful to have an easy way to examine the number of nodes used for each test.

Add command-line hook to specify type of test runner

Of course, this depends on the existence of another type of test runner.

Right now 'SerialTestRunner' is the only option, but it should be possible to choose between serial and parallel.

Automatic validation of Camus performance output

These tests are currently called "Performance" tests, but they really just test the basic functionality. They should be validating the output of the Hadoop job to check for at least a few things:

That we're even running where expected! We previously discovered one was running on a local runner because of a misconfiguration.
That the expected records were handled, which we should be able to do by picking out the right counters from the job output.
2.a. Even better would be if we could read the output file from HDFS and validate its contents.
The test intentionally triggers an error by publishing some null data, which Camus has to ignore. We want to see the report of this specific exception but still see the job finish.

The last item makes this especially important -- we should probably move the output to debug level since the exception output is expected but misleading.

While implementing this, it'd probably also be worthwhile to intentionally trigger some other type of error and make sure we're detecting unexpected exceptions from the job, i.e. that we can properly detect if the job did not finish as expected.

Setup Hudson CI server

Not directly ducttape related, but we need a CI server for our repositories. It can live on the same server that kicks off nightly ducttape jobs. It should do builds for all our repositories on push, as well as dependent repositories to warn us of compatibility issues.

Add command line hook to control log level for console output

When debugging tests, it's a lot easier if you can just add a flag that turns all the output to DEBUG (and include all output from all services).

Refactor RegisterSchemasService to share logic with PerformanceService

Per Ewen's comment in PR #4 -

This code looks nearly the same as PerformanceService, which I refactored so it's 4 subclasses wouldn't need to reimplement this. Maybe we should refactor further so it's available more generally? Maybe called SynchronousWorkerService since the problem it's addressing is when the service's task is synchronous and so needs to be put on a background worker thread. It could just be included in the service.py file.

min_cluster_size error can be a bit nicer

Currently we have:
There aren't enough available nodes to satisfy the resource
request. Your test has almost certainly incorrectly implemented its
min_cluster_size() method.

Will be nice to include:

Expected cluster size
Where is it actually configured (in case I want to fix my incorrect setting)
Maybe even justification ("We need 3 brokers, 1 zk, 1 runner")
Actual cluster size
How did you get the actual size (in case there is a bug and you are miscounting)
Output of Vagrant Status, so we will see who is right
"Your test has almost certainly incorrectly implemented its
min_cluster_size() method" should be changed to "Either your test incorrectly implemented its
min_cluster_size() method or you did not start enough Vagrant workers for the test to execute."

Also, kittens and unicorns :)

add synchronize decorator

Minor, but it would be nice if Python's threading module had a decorator for synchronization.

Consider implementing one and adding to a util module.

run parametrized tests with specific parameters

Now that we've added @parametrize and @matrix, we can concisely define a large group of related tests.

e.g.

@matrix(x=[1, 2, 3])
@matrix(y=['a', 'b', 'c']
def some_test(x, y):
    # ...

But when running with ducktape, if we run some_test specifically, it will now run 9 different tests. It would be very useful for poor test developers to be able to run some_test with specific parameters.

For example, if we wanted to run some_test with x = 1 and y = 'a', we might do something like:
ducktape tests/the_test.py::TheTest.some_test --args {'x': 1, 'y': 'a'}

Add decorators to control ordering

For some tests it's useful to specify ordering -- it makes more sense to validate simple operations before more complex ones. It would be useful to have 2 types of annotations:

Specify dependencies which the test runner respects, i.e. @dependency(EverythingRunsTest) to annotate more complicated tests. This can also speed up failing test runs by immediately failing all dependent tests (or report them as SKIPPED).
Indicate a levels of testing, e.g. @smoke vs. @exhaustive. We'd need to think about the levels (or make them generic as @testlevel(1)) but the idea is that we could easily run a smaller set of tests that still give good coverage, but finish much faster.

Allow tests to specify files to be downloaded and included in output

We can specify fixed log files for a service that should or should not be downloaded, but sometimes there are other files involved in tests that are defined dynamically and the service isn't directly aware of. For example, the Copycat tests echo to an input file, and then the sink Copycat worker write its output to another file. But the Copycat service never knows about these files directly since they are specified in the connector configuration.

It would be helpful for tests to somehow be able to specify additional files (per service) that should be downloaded when the test completes. This would, for example, help debug what is going wrong when the test fails since the two files are supposed to be identical.

Make shared util for querying ZK

Added convenience method for querying ZK

Ewen Cheslack-Postava
ewencp added a note 13 days ago
Do we think reading data out of ZK is going to be a common pattern? Maybe not worth it for this patch, but if we end up with more copies of code that looks like this we might want to refactor it into a utility that you only have to pass a node, ZK command and a regex for the output to match.

Example without convenience method:

def get_leader_node(self, topic, partition=0):
+        """ Get the leader replica for the given topic and partition.
+        """
+        cmd = "/opt/kafka/bin/kafka-run-class.sh kafka.tools.ZooKeeperMainWrapper -server %s " \
+              % self.zk.connect_setting()
+        cmd += "get /brokers/topics/%s/partitions/%d/state" % (topic, partition)
+        self.logger.debug(cmd)
+
+        node = self.nodes[0]
+        self.logger.debug("Querying zookeeper to find leader replica for topic %s: \n%s" % (cmd, topic))
+        partition_state = None
+        for line in node.account.ssh_capture(cmd):
+            match = re.match("^({.+})$", line)
+            if match is not None:
+                partition_state = match.groups()[0]
+                break

Test report/output should include test description

Ideally test descriptions should be easily discoverable in the test reports.

ducktape could without much trouble provide a mechanism for grabbing test descriptions and adding to the reports. Simplest most natural implementation I can think of would be to grab docstrings from test methods.

Add command-line hook to specify type of cluster

Right now command_line.main fires up a VagrantCluster, but it should be possible to choose.

configurable test output location

User should be able to specify where the output of one run of ducktape gets writting (e.g. ./results or <path_to_all_test_results>)

A sensible default would go in command_line.config

Copying logs causes exception if a service was never started

For example, due to a failure of a different service during a test, this test didn't start the ConsoleConsumer service. Therefore it never allocated any nodes and triggered this exception:

Traceback (most recent call last):
  File "/Users/ewencp/confluent/ducktape.git/ducktape/tests/runner.py", line 143, in teardown_single_test
    self.current_test.copy_service_logs()
  File "/Users/ewencp/confluent/ducktape.git/ducktape/tests/test.py", line 70, in copy_service_logs
    for node in service.nodes:
AttributeError: 'ConsoleConsumer' object has no attribute 'nodes'

Add required min # of worker method to tests

As suggested here: #6 (comment) Forcing this to be implemented would allow sanity checking before we even start the test. It's not perfect since these values can fall out of date, but it's better than just failing mid test and easier than having to guess when you need to run a test you aren't very familiar with.

templates.py should search up the directory tree for templates directory

This will make it possible for sub-packages to use template files without having to duplicate template files within subpackages.

I ran into this issue while creating a sub-package to hold a related group of services. In order to use template files, I had to then duplicate some of the files into another 'templates' folder within the subpackage.

add --cleanup (or similar) option

For test developers using the --no-teardown option, it is very useful to have a way to put cluster nodes into a known clean state.

Might look something like
ducktape --cleanup

This would exercise sort of a nuclear option and clear everything on all nodes. One question is how this should work given that what a node even means depends on the Cluster type.

This is related to #19

Looks like ducktape measures time inconsistently

The test output says:

[INFO:2015-06-24 19:58:29,198]: SerialTestRunner: kafkatest.tests.benchmark_test.Benchmark.test_three_producers_async: running test 10 of 14
[INFO:2015-06-24 19:58:29,199]: SerialTestRunner: kafkatest.tests.benchmark_test.Benchmark.test_three_producers_async: setting up
[INFO:2015-06-24 19:59:03,804]: SerialTestRunner: kafkatest.tests.benchmark_test.Benchmark.test_three_producers_async: running
[INFO:2015-06-24 19:59:29,899]: SerialTestRunner: kafkatest.tests.benchmark_test.Benchmark.test_three_producers_async: PASS
[INFO:2015-06-24 19:59:29,900]: SerialTestRunner: kafkatest.tests.benchmark_test.Benchmark.test_three_producers_async: tearing down
======================================================================================================================================================================================================
test_id:    2015-06-24--003.kafkatest.tests.benchmark_test.Benchmark.test_three_producers_async
status:     PASS
run time:   29.359 seconds

If we started at 19:58:29 and ended at 19:59:29, it took a bit longer than 29.359 to run.

I'd expect run time to match wall time, and I suspect it was longer than even the minute I can infer from the log messages.

Add ParallelTestRunner

This should happen after the basic framework has congealed a bit more.

update html report to include test data

Some tests now return data (e.g. various performance tests)

This data is now available in Result objects, and should be visible on generated html reports

Add timeouts to services

Services need to handle errors more gracefully. In some cases it's possible for run() or wait() to block indefinitely because something goes wrong but the underlying process doesn't die (e.g. it just loops forever trying to reconnect). We need timeouts on all these service methods, which tests will have to come up with reasonable, situation-specific values for when they call those methods.

As a result, we'll also need to provide cleanup code when something fails. We can catch timeout exceptions and provide a standard cleanup() method that services and tests can implement to carefully attempt to cleanup anything that might still be running.

    if os.path.exists(latest_test_dir):  # Problem: in this case, os.path.exists -> False
        os.unlink(latest_test_dir)
    os.symlink(results_dir, latest_test_dir)  # Now this fails because latest_test_dir was not unlinked