twitter / rpc-perf
A tool for benchmarking RPC services
License: Apache License 2.0
Given that size/num are now set for 4 different parameters, I wonder if it makes sense to reconsider how workload is specified. Explicitly: does it make sense to create a separate section about the dataset (key space, key generation scheme, value size, etc.) and the load (rate, and arrival pattern in the future)? @brayniac what do you think?
Originally posted by @thinkingfish in #167
Currently the log level is fixed. This should be turned into an option.
src/buffer.rs:27:5: 32:6 warning: you should consider adding a `Default` implementation for `buffer::Buffer`, #[warn(new_without_default)] on by default
src/buffer.rs:27 pub fn new() -> Buffer {
src/buffer.rs:28 let mut buffer = Vec::<u8>::new();
src/buffer.rs:29 buffer.resize(4, 0);
src/buffer.rs:30
src/buffer.rs:31 Buffer { buffer: buffer }
src/buffer.rs:32 }
src/buffer.rs:27:5: 32:6 help: for further information visit https://github.com/Manishearth/rust-clippy/wiki#new_without_default
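The clippy-suggested fix is a Default implementation that defers to new(). A minimal sketch, reproducing just enough of Buffer to be self-contained (the struct here is a stand-in for the one in src/buffer.rs):

```rust
// Stand-in for rpc-perf's Buffer, trimmed to the relevant parts.
pub struct Buffer {
    buffer: Vec<u8>,
}

impl Buffer {
    pub fn new() -> Buffer {
        let mut buffer = Vec::<u8>::new();
        buffer.resize(4, 0);
        Buffer { buffer }
    }
}

// The addition clippy asks for: Default simply delegates to new().
impl Default for Buffer {
    fn default() -> Buffer {
        Buffer::new()
    }
}

fn main() {
    let b = Buffer::default();
    assert_eq!(b.buffer.len(), 4);
    println!("ok");
}
```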
CI builds are reporting failures for 273.3 due to the following error:
$ cargo fmt -- --write-mode=diff
This version of rustfmt is deprecated. Use rustfmt-nightly. Use --force to run deprecated rustfmt.
usage: rustfmt [options] <file>...
Options:
-h, --help show this message
-V, --version show version information
-v, --verbose print verbose output
-f, --force force rustfmt to actually run
--write-mode [replace|overwrite|display|diff|coverage|checkstyle]
mode to write in (not usable when piping from stdin)
--skip-children
don't reformat child modules
--config-help show details of rustfmt configuration options
--dump-default-config PATH
Dumps the default configuration to a file and exits.
--dump-minimal-config PATH
Dumps configuration options that were checked during
formatting to a file.
--config-path [Path for the configuration file]
Recursively searches the given path for the
rustfmt.toml config file. If not found reverts to the
input file path
--file-lines JSON
Format specified line ranges. See README for more
detail on the JSON format.
The command "cargo fmt -- --write-mode=diff" exited with 1.
cache.2
store build cache
0.01s
9.76s changes detected, packing new archive
.
uploading archive
Done. Your build exited with 1.
On client systems that have multiple NICs/IP addresses, it would be useful to be able to specify which local interface (either ip address or interface name, e.g. eth0) to use for the connection to the server.
An example...I have a 10Gb NIC and a 25Gb NIC on my client system. It is not clear which will be used by rpc-perf.
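A hypothetical config extension for this might look like the sketch below; note that neither key exists in rpc-perf today, both names are assumptions for illustration:

```toml
[general]
threads = 3
connections = 50
# Hypothetical: pin outbound connections to a specific local
# address or interface name.
local-address = "10.0.0.12"
# interface = "eth0"
```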
Allow users to test their Thrift services with rpc-perf. This requires having a way for rpc-perf to understand the Thrift IDL potentially through a TOML config or parsing the IDL directly. Consideration needs to be given to how to generate the actual requests in the workload lib.
Currently the details of each protocol leak into many areas that shouldn't really care such as connection management and workload buffer generation. An example of this tight coupling leading to problems when generating a workload can be found here: each protocol needs to be concerned with types that it probably has no notion of.
I propose changing the structure. A protocol needs to do these things:
This can be balled up into a trait with a relatively simple interface:

trait Protocol {
    fn generate_request(&mut self) -> Vec<u8>;
    fn parse_response(&mut self, bytes: &[u8]) -> ParsedResponse;
}

mod thrift {
    // Could also give a concrete type and box later...
    fn parse_config(config: Toml) -> Box<Protocol>;
}
From there, each module will implement its own protocol specific parsing, types, etc. This moves the coupling of the protocols to only a single point: the initialization of the parsing when a protocol must be selected.
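As a concrete sketch, here is a toy implementation of the proposed trait for a hypothetical line-based "echo" protocol. Everything beyond the trait shape itself (the EchoProtocol type, the ParsedResponse variants) is illustrative, not rpc-perf code:

```rust
// Minimal response classification; the real enum would carry more detail.
#[derive(Debug, PartialEq)]
enum ParsedResponse {
    Ok,
    Incomplete,
}

// The proposed boundary: connection management only sees bytes in and out.
trait Protocol {
    fn generate_request(&mut self) -> Vec<u8>;
    fn parse_response(&mut self, bytes: &[u8]) -> ParsedResponse;
}

// A toy protocol that sends "echo N\r\n" with an incrementing counter.
struct EchoProtocol {
    counter: u64,
}

impl Protocol for EchoProtocol {
    fn generate_request(&mut self) -> Vec<u8> {
        self.counter += 1;
        format!("echo {}\r\n", self.counter).into_bytes()
    }

    fn parse_response(&mut self, bytes: &[u8]) -> ParsedResponse {
        // A response is complete once the line terminator arrives.
        if bytes.ends_with(b"\r\n") {
            ParsedResponse::Ok
        } else {
            ParsedResponse::Incomplete
        }
    }
}

fn main() {
    let mut p = EchoProtocol { counter: 0 };
    assert_eq!(p.generate_request(), b"echo 1\r\n".to_vec());
    assert_eq!(p.parse_response(b"echo 1\r\n"), ParsedResponse::Ok);
    assert_eq!(p.parse_response(b"echo 1"), ParsedResponse::Incomplete);
    println!("ok");
}
```

The caller only ever holds a Box<Protocol>, so connection management and buffer generation never see protocol-specific types.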
Benefits include:
swaisbrot@tw-mbp-swaisbrot:~/workspace/rpc-perf% multirust run nightly cargo build --release
Compiling histogram v0.3.2
Compiling bitflags v0.1.1
Compiling winapi-build v0.1.1
Compiling mpmc v0.1.0 (https://github.com/brayniac/mpmc#0cc33c9d)
Compiling winapi v0.2.4
Compiling regex-syntax v0.2.2
Compiling rustc-serialize v0.3.16
Compiling libc v0.1.12
Compiling getopts v0.2.14
Compiling shuteye v0.0.5
Compiling lazy_static v0.1.15
Compiling clock_ticks v0.0.5
Compiling winapi v0.1.23
Compiling crc v1.2.0
Compiling log v0.3.3
Compiling advapi32-sys v0.1.2
Compiling nix v0.3.9
Compiling kernel32-sys v0.2.1
Compiling memchr v0.1.6
Compiling bytes v0.2.11 (https://github.com/carllerche/bytes?branch=v0.2.x#d9eac250)
Compiling request v0.1.0 (file:///Users/swaisbrot/workspace/rpc-perf)
Compiling aho-corasick v0.3.4
Compiling parser v0.1.0 (file:///Users/swaisbrot/workspace/rpc-perf)
Compiling libc v0.2.2
Compiling slab v0.1.2
Compiling time v0.1.34
Compiling rand v0.3.11
Compiling mio v0.4.3 (file:///Users/swaisbrot/workspace/rpc-perf)
Compiling simple_logger v0.4.0
Compiling ratelimit v0.1.8
Compiling workload v0.1.0 (file:///Users/swaisbrot/workspace/rpc-perf)
Compiling regex v0.1.41
Compiling toml v0.1.23
Compiling rpc-perf v0.1.0 (file:///Users/swaisbrot/workspace/rpc-perf)
Assertion failed: (Ty == resolve(Ty->getRef()) && "type was not uniqued, possible ODR violation."), function getOrCreateTypeDIE, file /Users/rustbuild/src/rust-buildbot/slave/nightly-dist-rustc-mac/build/src/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp, line 713.
Could not compile `rpc-perf`.
swaisbrot@tw-mbp-swaisbrot:~/workspace/rpc-perf% multirust run nightly rustc --version
rustc 1.7.0-nightly (b4707ebca 2015-12-27)
We have found that the total heap being actively used in cache may have a significant impact on cache performance. For testing performance with different cache heap sizes, it would be helpful to make it so that the total data set size being queried by rpc-perf clients is easily configurable.
Typical clients provide the ability to set a timeout on a request; by adding timeout support to rpc-perf we can more closely emulate client behavior for slow requests by abandoning them.
I propose the timeout is implemented at a global level (e.g. a single timeout setting applies to all requests). Until we have pipelining support, the behavior should be to close and re-establish the connection when a timeout occurs.
The install instructions currently recommend 'cargo install', but this version lags behind master and may contain bugs that have since been fixed. Revert the readme to recommend building from master.
Let's add integration tests. Initial thought is to have a mock server for each protocol. Open to other ideas on this.
Include information on multirust, or a link to the official rust install documentation. Let's also specify which rust versions are targeted.
With my current set up, it seems like the open connections stat (calculated as socket_create_count - socket_close_count) is only reflecting the open connections for a single thread. For example, with 3 threads and 50 connections per thread, I am seeing in the rpc perf logs:
2018-08-01 21:07:36.410 INFO [rpc-perf] Config: Threads: 3 Poolsize: 50
...
2018-08-01 21:09:37.449 INFO [rpc-perf] Window: 1
2018-08-01 21:09:37.449 INFO [rpc-perf] Connections: Ok: 0 Error: 0 Timeout: 0 Open: 50
I am seeing on my server that all of the expected connections are there. So it would appear that rpc perf is creating all of the connections it is configured with, but the stat is counting them incorrectly. For reference, here is the config file I am using:
[general]
threads = 3
connections = 50
windows = 60
duration = 60
request-timeout = 200
connect-timeout = 500
[[workload]]
name = "get_counter"
method = "get"
rate = 800
[[workload.parameter]]
style = "random"
size = 25
num = 100000000
regenerate = true
[[workload]]
name = "get_tweet"
method = "get"
rate = 800
[[workload.parameter]]
style = "random"
size = 24
num = 40000000
regenerate = true
[[workload]]
name = "set_counter"
method = "set"
rate = 100
[[workload.parameter]]
style = "random"
size = 25
num = 100000000
regenerate = true
[[workload.parameter]]
style = "random"
size = 8
num = 10
regenerate = false
[[workload]]
name = "set_tweet"
method = "set"
rate = 100
[[workload.parameter]]
style = "random"
size = 24
num = 40000000
regenerate = true
[[workload.parameter]]
style = "random"
size = 1024
num = 10
regenerate = false
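One way to make the open-connections stat reflect the whole process rather than a single thread is a pair of counters shared across all worker threads. A minimal sketch, with names that are illustrative rather than rpc-perf's actual internals:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Process-wide counters shared by all worker threads, so that
// open = create - close covers every thread, not just one.
fn open_connections(threads: usize, per_thread: usize) -> usize {
    let created = Arc::new(AtomicUsize::new(0));
    let closed = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let created = Arc::clone(&created);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Each successful connect bumps the shared counter.
                    created.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    created.load(Ordering::Relaxed) - closed.load(Ordering::Relaxed)
}

fn main() {
    // 3 threads x 50 connections should report 150 open, not 50.
    assert_eq!(open_connections(3, 50), 150);
    println!("ok");
}
```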
It is impossible to use the memcache "version" command from rpc-perf
Config used:
[general]
threads = 1
connections = 1
duration = 1
windows = 300
protocol = "memcache"
tcp-nodelay = false
ipv4 = true
ipv6 = false
database = 0
[[workload]]
name = "version"
method = "version"
rate = 1
Expected result: rpc-perf runs and executes the version command against memcache at a rate of 1 qps
Actual result:
2018-02-06 20:45:20.794 INFO [rpc-perf] rpc-perf 2.1.0-pre initializing...
2018-02-06 20:45:20.794 INFO [rpc-perf] feature: asm: disabled
2018-02-06 20:45:20.794 ERROR [rpc-perf] malformed config: 'parameter' must be an array
Currently we use AtomicUsize within the datastructures crate because that was the only available atomic at the time. Rust 1.34 is due to stabilize atomics with different widths. Commit: rust-lang/rust@14b36fb
If we can make the datastructures generic across the different atomic types, users can better tailor the memory consumption of the datastructures by choosing to use smaller atomics, e.g. a histogram of AtomicU32 would be much smaller on x86_64 than one using AtomicUsize.
This is blocked until 1.34 is released as stable.
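One way to get that genericity is a small counter trait implemented for each atomic width; the structure then takes the atomic type as a parameter. This is a sketch of the idea, not the datastructures crate's actual API:

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

// Abstracts over atomic widths so a structure can pick its memory footprint.
trait Counter: Default {
    fn add(&self, value: u64);
    fn get(&self) -> u64;
}

impl Counter for AtomicU32 {
    fn add(&self, value: u64) {
        self.fetch_add(value as u32, Ordering::Relaxed);
    }
    fn get(&self) -> u64 {
        self.load(Ordering::Relaxed) as u64
    }
}

impl Counter for AtomicU64 {
    fn add(&self, value: u64) {
        self.fetch_add(value, Ordering::Relaxed);
    }
    fn get(&self) -> u64 {
        self.load(Ordering::Relaxed)
    }
}

// A toy histogram generic over the counter width.
struct Histogram<C: Counter> {
    buckets: Vec<C>,
}

impl<C: Counter> Histogram<C> {
    fn new(n: usize) -> Self {
        Histogram { buckets: (0..n).map(|_| C::default()).collect() }
    }
    fn increment(&self, bucket: usize) {
        self.buckets[bucket].add(1);
    }
    fn count(&self, bucket: usize) -> u64 {
        self.buckets[bucket].get()
    }
}

fn main() {
    // AtomicU32 buckets take half the space of AtomicU64 on x86_64.
    let h: Histogram<AtomicU32> = Histogram::new(8);
    h.increment(3);
    h.increment(3);
    assert_eq!(h.count(3), 2);
    println!("ok");
}
```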
client.rs fails to reset t0 on reconnect. This may result in misreported connect latencies under some circumstances.
As noted in #61 we currently leak protocol details, such as the need for a "stop" at the end of a struct, to the user through the config. If we make the Type and ThriftType recursive, we may be able to represent the structure more accurately in the config.
Our cargo build profiles are slightly dated and explicitly capture defaults - for instance, the debug profile we define is just the defaults. Reduce the config to the minimum that is necessary for the behaviors we want.
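As a sketch, the trimmed profile section would keep only deliberate deviations from cargo's defaults; the settings shown here are illustrative, not rpc-perf's actual choices:

```toml
# Keep only non-default settings; everything else inherits cargo defaults.
[profile.release]
lto = true
codegen-units = 1
```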
The current warmup behavior is to spin endlessly on the warmup workload(s) until the target hit rate is attained. However, there is the possibility that the target hit rate will never be reached; we should detect this condition by monitoring the hit rate to see if it levels off over some number of windows.
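A plateau check could be as simple as asking whether the last N window measurements stay within a small epsilon of each other. A minimal sketch; the window count and epsilon are illustrative assumptions, not rpc-perf settings:

```rust
// Returns true when the last `n` per-window hit rates span less than
// `epsilon`, i.e. the hit rate has leveled off and warmup should stop.
fn hit_rate_plateaued(windows: &[f64], n: usize, epsilon: f64) -> bool {
    if windows.len() < n {
        return false;
    }
    let recent = &windows[windows.len() - n..];
    let max = recent.iter().cloned().fold(f64::MIN, f64::max);
    let min = recent.iter().cloned().fold(f64::MAX, f64::min);
    max - min < epsilon
}

fn main() {
    // Hit rate climbs, then levels off below the target.
    let windows = [0.50, 0.70, 0.80, 0.84, 0.85, 0.85, 0.85];
    assert!(hit_rate_plateaued(&windows, 3, 0.01));
    // Still climbing early on: no plateau yet.
    assert!(!hit_rate_plateaued(&windows[..3], 3, 0.01));
    println!("plateau detected");
}
```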
Two issues here. We don't report the rpc-perf version early enough in the output. This results in it not being reported if there is an error parsing options or config. We should also add a --version flag for completeness.
Upon quick sifting I was a bit confused in the usage section. After re-reading more slowly, my interpretation is:
From chat, I believe this was correctly interpreted, though turns out there's some stuff that's only available at the cmdline.
In any case, here's a suggested phrasing that I think would have made me not re-read:
rpc-perf can be configured through a combination of a configuration file (in TOML format), and command line arguments. All settings are available in the configuration file, and most are available as command line arguments. Where a setting is specified in both places, the command line takes precedence.
The settings that are only available in the configuration file pertain to XXX, and aren't available at the command line because of YYY.
The general intent is to define profiles, or default configurations, in configuration files and use command line arguments for quick ad-hoc iteration.
rpc-perf/src/config/mod.rs, line 160 (commit fbe2748)
This can overflow when a keyspace config has a value for length greater than 19. It then gives a cryptic error that is hard to debug. I think someone would want to be able to create a workload that has keys larger than 19 bytes.
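The 19 limit is consistent with computing the keyspace size as a power in a u64: 10^19 still fits (u64::MAX is roughly 1.8e19), but 10^20 does not. Assuming that is the computation, checked arithmetic would turn the overflow into a reportable error instead of a cryptic panic:

```rust
// Sketch: assuming the keyspace size is computed as 10^length in a u64,
// checked_pow surfaces the overflow as None instead of panicking.
fn keyspace_size(length: u32) -> Option<u64> {
    10u64.checked_pow(length)
}

fn main() {
    assert!(keyspace_size(19).is_some());
    assert_eq!(keyspace_size(20), None); // would overflow u64
    println!("ok");
}
```

The None case could then be reported as "keyspace length too large for a 64-bit count" rather than a generic overflow.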
While testing with the thrift protocol, we discovered that rpc-perf does not respect the rate parameter specified in the config file.
#61 left trace logging enabled
src/lib.rs:150:16: 153:34 warning: called `map(f).unwrap_or(a)` on an Option value. This can be done more directly by calling `map_or(a, f)` instead, #[warn(option_map_unwrap_or)] on by default
src/lib.rs:150 let seed = parameter.get("seed")
src/lib.rs:151 .and_then(|k| k.as_integer())
src/lib.rs:152 .map(|i| i as usize)
src/lib.rs:153 .unwrap_or(index);
src/lib.rs:150:16: 153:34 note: replace `map(|i| i as usize).unwrap_or(index)` with `map_or(index, |i| i as usize)`
src/lib.rs:150:16: 153:34 help: for further information visit https://github.com/Manishearth/rust-clippy/wiki#option_map_unwrap_or
src/lib.rs:155:16: 158:30 warning: called `map(f).unwrap_or(a)` on an Option value. This can be done more directly by calling `map_or(a, f)` instead, #[warn(option_map_unwrap_or)] on by default
src/lib.rs:155 let size = parameter.get("size")
src/lib.rs:156 .and_then(|k| k.as_integer())
src/lib.rs:157 .map(|i| i as usize)
src/lib.rs:158 .unwrap_or(1);
src/lib.rs:155:16: 158:30 note: replace `map(|i| i as usize).unwrap_or(1)` with `map_or(1, |i| i as usize)`
src/lib.rs:155:16: 158:30 help: for further information visit https://github.com/Manishearth/rust-clippy/wiki#option_map_unwrap_or
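The fix clippy suggests is mechanical: map_or collapses the map().unwrap_or() chain into one call. A self-contained sketch (the helper name and plain Option stand in for the toml lookup in src/lib.rs):

```rust
// Before: param.map(|i| i as usize).unwrap_or(index)
// After:  param.map_or(index, |i| i as usize)
fn seed_from(param: Option<i64>, index: usize) -> usize {
    param.map_or(index, |i| i as usize)
}

fn main() {
    assert_eq!(seed_from(Some(42), 7), 42); // configured seed wins
    assert_eq!(seed_from(None, 7), 7);      // falls back to the index
    println!("ok");
}
```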
Currently, delete_channel() optimistically takes a lock and removes the channel even if it does not exist. This causes no correctness issues, but takes the lock unnecessarily, and forces checks down into the code using the library if taking the lock is to be avoided. https://github.com/twitter/rpc-perf/blob/master/metrics/src/metrics/mod.rs#L98
delete_channel() should check whether the channel exists and avoid taking action if it does not.
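One way to do this is a cheap read-locked existence check before the exclusive lock. A sketch with simplified stand-in types, not the metrics library's actual internals:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Simplified stand-in for the metrics registry.
struct Metrics {
    channels: RwLock<HashMap<String, u64>>,
}

impl Metrics {
    fn delete_channel(&self, name: &str) {
        // Read-locked existence check: skip the write lock entirely
        // when the channel is absent.
        if !self.channels.read().unwrap().contains_key(name) {
            return;
        }
        self.channels.write().unwrap().remove(name);
    }
}

fn main() {
    let m = Metrics { channels: RwLock::new(HashMap::new()) };
    m.channels.write().unwrap().insert("latency".to_string(), 0);
    m.delete_channel("latency");
    m.delete_channel("missing"); // early return, no write lock taken
    assert!(m.channels.read().unwrap().is_empty());
    println!("ok");
}
```

Note the check-then-remove is two separate lock acquisitions, so a concurrent insert between them is still removed or kept consistently; the optimization only avoids the write lock in the common absent case.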
Something I've run into hacking on #64 is that the workload::Value enum and request::thrift::ThriftValue enum are nearly identical. This results in tree translation here. By using the same tree structure the copy could be avoided. I'm suggesting making Parameter and Value the 'tree to rule them all', which can be directly used by things like thrift for generating a buffer. IMO this makes sense because all the information harvested from the config parsing should be available in this tree, rendering everything else strictly a subset.
A complication is that the build is split out in such a way that rpcperf_workload depends on rpcperf_request, but the tree is defined in rpcperf_workload, and this results in a cyclic dependency when trying to expose the tree to both modules. Two possible solutions are to make yet another module just for that tree, or to merge the rpcperf_workload and rpcperf_request modules. The workload module is a single file, so I think merging them is the easiest thing to do.
On the other hand, maybe this is not a good idea altogether for reasons I've not yet contemplated.
Readme references using multirust for rust install, which is now deprecated for rustup. Update instructions in the readme.
A few engineers on our team reported that, without specifying the request connect timeout setting, reconnect on timeout falls back to the old behavior without exponential back-off. We should make exponential back-off the default, since it is the most sensible reconnect policy.
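A minimal capped exponential back-off sketch; the base and cap values are illustrative assumptions, not rpc-perf defaults, and a real implementation would also add jitter to avoid synchronized reconnect storms:

```rust
// Capped exponential back-off: delay doubles per attempt up to max_ms.
fn backoff_ms(attempt: u32, base_ms: u64, max_ms: u64) -> u64 {
    // Clamp the shift so large attempt counts cannot overflow.
    let exp = base_ms.saturating_mul(1u64 << attempt.min(16));
    exp.min(max_ms)
}

fn main() {
    assert_eq!(backoff_ms(0, 100, 10_000), 100);     // first retry
    assert_eq!(backoff_ms(1, 100, 10_000), 200);     // doubles
    assert_eq!(backoff_ms(10, 100, 10_000), 10_000); // capped
    println!("ok");
}
```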
It currently keeps trying to run for a long time.
Add a contributors section to the readme
rename nodelay to tcp-nodelay
The command parameter has been renamed method, but the config syntax is not updated. Fix config.rs and main.rs to bring clarity to the choice in naming here.
As noted in #61, the write_* methods could be cleaned up by using the WriteBytesExt trait from byteorder. http://burntsushi.net/rustdoc/byteorder/trait.WriteBytesExt.html#method.write_u16
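For illustration, here is what such a write produces, sketched with std's to_be_bytes to stay dependency-free (with byteorder the equivalent call would be buf.write_u16::<BigEndian>(1024)):

```rust
// Writes a u16 in network (big-endian) byte order, as the thrift and
// memcache binary framing expects.
fn write_u16_be(buf: &mut Vec<u8>, v: u16) {
    buf.extend_from_slice(&v.to_be_bytes());
}

fn main() {
    let mut buf = Vec::new();
    write_u16_be(&mut buf, 1024); // 0x0400 -> [0x04, 0x00]
    assert_eq!(buf, vec![0x04, 0x00]);
    println!("ok");
}
```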
Include a section on typical usage and practices to consider when benchmarking.
Extend rpc-perf to generate random key workloads
connect is a fairly expensive operation for the cache server; at large volume it can easily overwhelm the server thread accepting new connections temporarily, leading to test instabilities. Since it's currently impossible to coordinate different rpc-perf instances at a fine granularity, I think we should introduce a few features into the connect behavior to reduce the chance of "reconnect storms":
Document how to contribute to rpc-perf
mux session layer support would be pretty nice. It is also a pretty complex topic considering it adds a session layer to the protocol stack.
I've started some work on codec support for the mux protocol. It can be found here:
https://github.com/bryce-anderson/rust-mux
Currently it uses a custom buffer type which needs to either be expanded upon or replaced. The Buf type in the bytes crate is pretty close to what I want, but not quite, and the project doesn't seem very active.
Why did you include mio source into the rpc-perf repo instead of using a version from crates.io or linking directly to its git repo? Just curious :)
There's value in exposing statistics for rpc-perf itself so we can observe details of the runtime behavior. Some of the things to expose would mirror the output (rate, percentile latencies, etc), but we could also expose things like work queue overflow.
Update the code to make use of ? instead of the try!() macro.
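The rewrite is mechanical: try!(expr) and expr? are equivalent, and ? is the idiomatic form on modern Rust. A small before/after sketch with an illustrative function:

```rust
use std::num::ParseIntError;

// Before: let n = try!(s.parse::<i64>());
// After: the ? operator propagates the error identically.
fn parse_doubled(s: &str) -> Result<i64, ParseIntError> {
    let n = s.parse::<i64>()?;
    Ok(n * 2)
}

fn main() {
    assert_eq!(parse_doubled("21"), Ok(42));
    assert!(parse_doubled("x").is_err());
    println!("ok");
}
```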
The --endpoint CLI option is marked required, and the argument parsing does not account for the ability to pass endpoints via the config.
The nightly version loses the buffer contents when the response parses as incomplete
Avoid wildcard use statements
My current use case of rpc-perf with explicitly sized dataset involves first configuring rpc-perf to send write heavy traffic to warm up the cache, then restarting rpc-perf with the workload being tested. It would be nice if we had a "warm up" option that would systematically write all keys, then automatically transition to the configured workload.
Better error messaging when the key size is insufficient to contain num strings.
Add microbenchmarks to the codec crate so that we can do performance qualification of future changes and test for potential optimizations.
Problem: Currently, unknown keys in the config do not cause errors. This may lead to confusion for users as they may not realize their config is invalid or uses features not present in the binary they are using.
Solution: Validate the config does not contain any unknown fields. Error out if it does.
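A dependency-free sketch of the validation pass: compare the keys present in a parsed section against a whitelist and error on anything unknown (with a serde-based config, deny_unknown_fields would achieve the same). Key names here are illustrative:

```rust
use std::collections::HashSet;

// Rejects any config key not in the known set, so typos and settings
// from a newer binary fail loudly instead of being silently ignored.
fn validate_keys(present: &[&str], known: &[&str]) -> Result<(), String> {
    let known: HashSet<&str> = known.iter().cloned().collect();
    for key in present {
        if !known.contains(key) {
            return Err(format!("unknown config key: '{}'", key));
        }
    }
    Ok(())
}

fn main() {
    let known = ["threads", "connections", "windows", "duration"];
    assert!(validate_keys(&["threads", "windows"], &known).is_ok());
    // A typo is caught instead of silently ignored.
    assert_eq!(
        validate_keys(&["threads", "theads"], &known),
        Err("unknown config key: 'theads'".to_string())
    );
    println!("ok");
}
```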