I clone this repo when initializing a new VM for Cassandra benchmarking (using YCSB).
This repo is also a toolbox for managing these VMs, and this README documents the hard-won lessons and best practices I picked up along the way.
Usage
Benchmark Workflow
Add an Algorithm
Change the Workload
When Servers Reboot
Gotchas
Other Things
More Best Practices
Future Works
The code blocks below are used frequently, so I put them at the top of this file. More details are in Benchmark Workflow.
setup.sh currently supports installing YCSB and algorithms such as ABD, ABD-Opt, and Treas-Opt, as well as the community version they were forked from.
See Add an Algorithm when you want to benchmark a new algorithm.
See Change the Workload when you want to modify existing workloads or add a new one.
These two sections also serve as a good tour of this codebase.
sudo su
apt-get install -y git
cd && git clone https://github.com/haochenpan/CCM.git
. ~/CCM/bash/setup.sh ycsb
sudo su
apt-get install -y git
cd && git clone https://github.com/haochenpan/CCM.git
. ~/CCM/bash/setup.sh bsr
sudo su
apt-get install -y git
cd && git clone https://github.com/haochenpan/CCM.git
. ~/CCM/bash/setup.sh abd-machine-time
- On GCP: create new virtual instances for the Cassandra servers and YCSB clients. Names could be Cass-all-1, Cass-all-2, Cass-all-3, Cass-all-ycsb-1, Cass-all-ycsb-2, and Cass-all-ycsb-3.
- On all remote instances: run the appropriate initialization code (see the section above).
- On a local computer: clone this repo and modify (or create) servers.sh, credentials.sh, ycsb.sh, and run_bench.sh. For the servers.sh format, see below.
- On a local computer: modify bash/change_seed.sh to use the server IPs, then run it from control.sh (some algorithms may need extra steps here).
- On a local computer: in control.sh, (clear Cassandra and) start Cassandra.
- On a local computer: in control.sh, upload ycsb.sh to all YCSB clients (through prepare_ycsb).
- On a server instance: check that the nodes have joined and load the table schema (see below).
- On a local computer or a remote controller: run btest.sh.
  - See the second-to-last step and the notes below about the remote controller.
- If using vnStat: in control.sh, clear and start vnStat (through misc).
- On GCP: upload servers.sh, credentials.sh, and run_bench.sh to a remote controller (and benchmark.sh the first time).
- On the remote controller instance:
nohup . run_bench.sh &
As you can see, most of the time we are working with control.sh.
To keep short commands readily available, I left them as commented-out lines in control.sh.
abd_cluster=(
35.231.111.138
34.73.116.123
34.74.107.157
35.231.193.238
35.237.236.187
)
abd_cluster_ycsb=(
35.243.215.174
34.74.192.43
35.227.30.246
)
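A minimal sketch of how the addresses in servers.sh might be turned into one ssh command per node. The function name print_ssh_commands and the user name my_user are made up for illustration, and the function only prints the commands instead of running them.

```shell
# Hypothetical helper: print (not run) one ssh command per server.
# Usage: print_ssh_commands USER KEY_FILE IP...
print_ssh_commands() {
    user=$1
    key=$2
    shift 2
    for ip in "$@"; do
        echo "ssh -o StrictHostKeyChecking=no -i $key $user@$ip"
    done
}

# Example with the first two addresses of abd_cluster above;
# my_user is a placeholder, not a real account.
print_ssh_commands my_user ./setup/id 35.231.111.138 34.73.116.123
```

In bash, after sourcing servers.sh, the whole array can be passed as "${abd_cluster[@]}" in place of the literal addresses.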
Create a keyspace and the table schema that matches the algorithm:
CREATE KEYSPACE ycsb WITH REPLICATION = {'class' : 'SimpleStrategy',
'replication_factor': 3};
CREATE KEYSPACE ycsb WITH REPLICATION = {'class' : 'SimpleStrategy',
'replication_factor': 5};
community
CREATE TABLE ycsb.usertable ( y_id varchar primary key, field0 varchar);
abd*
CREATE TABLE ycsb.usertable ( y_id varchar primary key, field0 varchar,
tag varchar);
treas*
CREATE TABLE ycsb.usertable ( y_id varchar PRIMARY KEY, field0 varchar,
field1 varchar, tag1 varchar,
field2 varchar, tag2 varchar,
field3 varchar, tag3 varchar,
field4 varchar, tag4 varchar,
field5 varchar, tag5 varchar);
oreas*
CREATE TABLE ycsb.usertable( y_id varchar PRIMARY KEY, field0 varchar,
field1 varchar, tag1 bigint,
field2 varchar, tag2 bigint,
field3 varchar, tag3 bigint);
treas2
CREATE TABLE ycsb.usertable( y_id varchar PRIMARY KEY, field0 varchar,
field1 varchar, tag1 varchar,
field2 varchar, tag2 varchar,
field3 varchar, tag3 varchar);
generic
CREATE TABLE ycsb.usertable ( y_id varchar PRIMARY KEY, field0 varchar,
z_value int, writer_id varchar);
sbq
CREATE TABLE ycsb.usertable ( y_id varchar PRIMARY KEY, field0 varchar,
ts int);
causal-3
CREATE TABLE ycsb.usertable ( y_id varchar PRIMARY KEY, field0 varchar,
vcol0 int, vcol1 int, vcol2 int,
sendfrom int);
causal-5
CREATE TABLE ycsb.usertable ( y_id varchar PRIMARY KEY, field0 varchar,
vcol0 int, vcol1 int, vcol2 int,
vcol3 int, vcol4 int, sendfrom int);
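Since the keyspace statement only varies in its replication factor and the table statement only varies by algorithm, the choice could be scripted. The sketch below is one possible way to do that; keyspace_cql and table_cql are hypothetical names, and only three of the schemas above are included to keep it short.

```shell
# Hypothetical helper: emit the CREATE KEYSPACE statement for a given
# replication factor (3 and 5 are the values used above).
keyspace_cql() {
    printf "CREATE KEYSPACE ycsb WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': %s};\n" "$1"
}

# Hypothetical helper: emit the CREATE TABLE statement for a few of the
# algorithm labels listed above; unknown names are rejected.
table_cql() {
    case "$1" in
        community) echo "CREATE TABLE ycsb.usertable ( y_id varchar primary key, field0 varchar);" ;;
        abd*)      echo "CREATE TABLE ycsb.usertable ( y_id varchar primary key, field0 varchar, tag varchar);" ;;
        sbq)       echo "CREATE TABLE ycsb.usertable ( y_id varchar PRIMARY KEY, field0 varchar, ts int);" ;;
        *)         echo "no schema sketched for: $1" >&2; return 1 ;;
    esac
}

keyspace_cql 3
table_cql abd-opt
```

The printed statements could then be passed to cqlsh with its -e option, e.g. ./cqlsh SERVER_IP -e "$(keyspace_cql 3)".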
The remote controller should have a similar codebase, i.e. ~/CCM needs to contain:
|- setup/ -- id
|- data/
|- benchmark.sh
|- credentials.sh
|- run_bench.sh
|- servers.sh
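A quick way to confirm that a controller has everything in the tree above is a small check like the following sketch. The function name check_controller_layout is hypothetical, and it assumes that "setup/ -- id" means the key file id lives inside setup/.

```shell
# Hypothetical helper: report which of the entries listed above are missing
# from a directory (defaults to ~/CCM). Returns non-zero if any is missing.
check_controller_layout() {
    root=${1:-"$HOME/CCM"}
    missing=0
    for entry in setup/id data benchmark.sh credentials.sh run_bench.sh servers.sh; do
        if [ ! -e "$root/$entry" ]; then
            echo "missing: $root/$entry"
            missing=1
        fi
    done
    return "$missing"
}

# Demo on a scratch directory so nothing under ~/CCM is touched.
# run_bench.sh is left out on purpose to show the failure path.
demo=$(mktemp -d)
mkdir -p "$demo/setup" "$demo/data"
touch "$demo/setup/id" "$demo/benchmark.sh" "$demo/credentials.sh" "$demo/servers.sh"
check_controller_layout "$demo" || echo "layout incomplete"
rm -rf "$demo"
```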
In setup.sh, follow the case-switch pattern in the function install_cass and add a case entry, so that a command like . setup.sh my_algorithm in a terminal picks up this case.
You may also need to add a table schema for this algorithm.
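For readers unfamiliar with the pattern, here is a rough sketch of what a case switch with a new entry looks like. It is not the actual install_cass from setup.sh: the function name install_cass_sketch is made up, the existing entries are guessed from the setup commands at the top of this file, and the real install steps are replaced by echo so the sketch is safe to run.

```shell
# Sketch only: the real install_cass in setup.sh may look different.
# Each case entry maps a command-line name to the steps for that algorithm.
install_cass_sketch() {
    case "$1" in
        abd-machine-time)
            echo "install abd-machine-time"
            ;;
        bsr)
            echo "install bsr"
            ;;
        my_algorithm)   # the new case entry
            echo "install my_algorithm"
            ;;
        *)
            echo "unknown algorithm: $1" >&2
            return 1
            ;;
    esac
}

install_cass_sketch my_algorithm
```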
In the ycsb folder, define a new YCSB workload.
In bash/ycsb.sh, you can override some of the workload's behavior by changing the parameter variables.
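YCSB reads a workload file given with -P and lets individual properties be overridden with -p key=value. The sketch below shows how such overrides might be assembled into a command line. It is not taken from bash/ycsb.sh: the function name ycsb_run_command and the path ycsb/my_workload are placeholders, and cassandra-cql is the upstream YCSB binding name, which the forks used here may not share.

```shell
# Hypothetical helper: print (not run) a YCSB command line in which each
# KEY=VALUE argument becomes a -p override of the workload file.
# Usage: ycsb_run_command WORKLOAD_FILE KEY=VALUE...
ycsb_run_command() {
    workload=$1
    shift
    cmd="bin/ycsb run cassandra-cql -P $workload"
    for override in "$@"; do
        cmd="$cmd -p $override"
    done
    echo "$cmd"
}

# ycsb/my_workload is a placeholder path, not a file known to exist in this repo.
ycsb_run_command ycsb/my_workload hosts=10.142.0.2 operationcount=100000 readproportion=0.9 updateproportion=0.1
```

Printing the command first makes it easy to check the overrides before starting a long benchmark run.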
- Run rm ~/.ssh/known_hosts when a server IP changes.
- Run chmod 400 id on the SSH private key before using it for the first time.
- In Ubuntu 14.04, foreground jobs (e.g. run_bench.sh, btest.sh) sometimes hang in SSH sessions, but background jobs (i.e. started with nohup ... &) do not.
- Download data from the controller to the local machine:
scp -i ./setup/id -r panhi_bc_edu@server_ip:VMCM/data ./data
- We now run ssh-keygen -m PEM -f id -C root on macOS to generate SSH keys.
- Use the Google Cloud console, plus a command, to populate an instance.
- Use a project-wide SSH key.
- control.sh -> Python + Bash
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]

ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
~/Syrupy/scripts/syrupy.py --interval=5 --poll-command='.*java' & vnstat -l
./cqlsh 10.142.0.2 -e "CREATE KEYSPACE ycsb WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 7};"
./cqlsh 10.142.0.17 -e "CREATE KEYSPACE ycsb WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 7};"
./cqlsh 10.142.0.17 -e "CREATE TABLE ycsb.usertable( y_id varchar PRIMARY KEY, field0 varchar, field1 varchar, tag1 bigint, field2 varchar, tag2 bigint, field3 varchar, tag3 bigint);"
./cqlsh 10.142.0.2 -e "CREATE TABLE ycsb.usertable( y_id varchar PRIMARY KEY, field0 varchar, field1 varchar, tag1 varchar, field2 varchar, tag2 varchar, field3 varchar, tag3 varchar);"
./cqlsh 10.142.0.31 -e "CREATE TABLE ycsb.usertable ( y_id varchar primary key, field0 varchar);"