
ccm's Introduction

Cassandra Cluster Management: Deployment hard-won wisdom & Benchmarking best practices

I clone this repo when initializing a new VM for Cassandra benchmarking (using YCSB).
The repo is also a toolbox for managing these VMs, and this README documents the hard-won wisdom and best practices I learned along the way.

Table of Contents

Usage
Benchmark Workflow
Add an Algorithm
Change the Workload
When Servers Reboot
Gotchas
Other Things
More Best Practices
Future Works

Usage

The code blocks below are used so frequently that I put them at the top of this file. More details are in Benchmark Workflow.

setup.sh currently supports installing YCSB, algorithms such as ABD, ABD-Opt, and Treas-Opt, and the community version they were forked from.

See Add an Algorithm when you want to benchmark a new algorithm.
See Change the Workload when you want to modify existing workloads or add a new one.
These two sections also serve as a good tour of the codebase.

sudo su
apt-get install -y git
cd && git clone https://github.com/haochenpan/CCM.git
. ~/CCM/bash/setup.sh ycsb


sudo su
apt-get install -y git
cd && git clone https://github.com/haochenpan/CCM.git
. ~/CCM/bash/setup.sh bsr

sudo su
apt-get install -y git
cd && git clone https://github.com/haochenpan/CCM.git
. ~/CCM/bash/setup.sh abd-machine-time
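
The three blocks above differ only in the last argument passed to setup.sh. As a minimal sketch, the sequence can be printed for any target; print_setup_commands is a hypothetical helper and is not part of this repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CCM): print the bootstrap commands for one
# setup.sh target so they can be reviewed or pasted into a fresh VM.
print_setup_commands() {
  local target="$1"   # e.g. ycsb, bsr, abd-machine-time
  cat <<EOF
sudo su
apt-get install -y git
cd && git clone https://github.com/haochenpan/CCM.git
. ~/CCM/bash/setup.sh ${target}
EOF
}

print_setup_commands ycsb
```

The function only prints text; nothing is installed until the printed lines are run on the VM.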

Benchmark Workflow

  • On GCP: create new virtual instances for the Cassandra servers and YCSB clients. Names could be Cass-all-1, Cass-all-2, Cass-all-3, Cass-all-ycsb-1, Cass-all-ycsb-2, and Cass-all-ycsb-3.
  • On all remote instances: run the appropriate initialization code (see the Usage section above).
  • On a local computer: clone this repo, then modify (or create) servers.sh, credentials.sh, ycsb.sh, and run_bench.sh. See below for the servers.sh format.
  • On a local computer: set the server IPs in bash/change_seed.sh, then run it from control.sh (some algorithms may need extra steps here).
  • On a local computer: in control.sh, (clear Cassandra and) start Cassandra.
  • On a local computer: in control.sh, upload ycsb.sh to all YCSB clients (through prepare_ycsb).
  • On a server instance: check that the nodes have joined and load the table schema (see below).
  • On a local computer or a remote controller: run btest.sh.
    • See the second-to-last step and the section below about the remote controller.
  • If you use vnStat: in control.sh, clear and start vnStat (through misc).
  • On GCP: upload servers.sh, credentials.sh, and run_bench.sh to a remote controller (plus benchmark.sh the first time).
  • On the remote controller instance: nohup bash run_bench.sh &

As you can see, most of the work happens in control.sh.
To keep the common short commands readily available, I left them in control.sh as commented-out lines.
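
For the "check nodes join" step in the workflow above, one option is to count the nodes that nodetool status reports as Up/Normal (lines starting with UN). This is a minimal sketch; count_up_nodes and expected_nodes are hypothetical names, not part of this repo.

```shell
#!/usr/bin/env bash
# Count nodes reported as Up/Normal ("UN") in `nodetool status` output read
# from stdin. grep -c exits non-zero when the count is 0, so `|| true` keeps
# the function from failing in that case (the "0" is still printed).
count_up_nodes() {
  grep -c '^UN' || true
}

# Usage on a server instance (expected_nodes is an illustrative variable):
#   expected_nodes=5
#   [ "$(nodetool status | count_up_nodes)" -eq "$expected_nodes" ] && echo "all nodes joined"
```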

servers.sh example content

abd_cluster=(
35.231.111.138
34.73.116.123
34.74.107.157
35.231.193.238
35.237.236.187
)

abd_cluster_ycsb=(
35.243.215.174
34.74.192.43
35.227.30.246
)
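
A sketch of how these arrays might be consumed: loop over one array and build one ssh command per address. print_ssh_commands, the key path, and the remote user are assumptions for illustration; control.sh may do this differently. The function prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Same array format as servers.sh; the addresses are taken from the example above.
abd_cluster=(
35.231.111.138
34.73.116.123
)

# Hypothetical helper: print (not execute) one ssh command per server.
# key_path and remote_user are illustrative parameters.
print_ssh_commands() {
  local key_path="$1" remote_user="$2" remote_cmd="$3"
  shift 3
  local ip
  for ip in "$@"; do
    # %q shell-quotes the remote command so the printed line can be pasted as-is.
    printf 'ssh -o StrictHostKeyChecking=no -i %s %s@%s %q\n' \
      "$key_path" "$remote_user" "$ip" "$remote_cmd"
  done
}

print_ssh_commands ./setup/id root uptime "${abd_cluster[@]}"
```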

cassandra table schemas

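Create a keyspace, then the table schema for the algorithm under test. The two keyspace statements below are alternatives (replication factor 3 or 5); run only one of them.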

CREATE KEYSPACE ycsb WITH REPLICATION = {'class' : 'SimpleStrategy',
                                         'replication_factor': 3};
                                         
CREATE KEYSPACE ycsb WITH REPLICATION = {'class' : 'SimpleStrategy',
                                         'replication_factor': 5};

community                                
CREATE TABLE ycsb.usertable ( y_id varchar primary key, field0 varchar);

abd*
CREATE TABLE ycsb.usertable ( y_id varchar primary key, field0 varchar,
                              tag varchar);
treas*
CREATE TABLE ycsb.usertable ( y_id varchar PRIMARY KEY, field0 varchar,
                              field1 varchar, tag1 varchar,
                              field2 varchar, tag2 varchar,
                              field3 varchar, tag3 varchar,
                              field4 varchar, tag4 varchar,
                              field5 varchar, tag5 varchar);

oreas*
CREATE TABLE ycsb.usertable( y_id varchar PRIMARY KEY, field0 varchar, 
                              field1 varchar, tag1 bigint, 
                              field2 varchar, tag2 bigint, 
                              field3 varchar, tag3 bigint);

treas2
CREATE TABLE ycsb.usertable( y_id varchar PRIMARY KEY, field0 varchar, 
                             field1 varchar, tag1 varchar, 
                             field2 varchar, tag2 varchar, 
                             field3 varchar, tag3 varchar);

                            
generic 
CREATE TABLE ycsb.usertable ( y_id varchar PRIMARY KEY, field0 varchar,
                              z_value int, writer_id varchar);

sbq
CREATE TABLE ycsb.usertable ( y_id varchar PRIMARY KEY, field0 varchar,
                              ts int);
      
causal-3
CREATE TABLE ycsb.usertable ( y_id varchar PRIMARY KEY, field0 varchar,
                              vcol0 int, vcol1 int, vcol2 int,
                              sendfrom int);
    
causal-5
CREATE TABLE ycsb.usertable ( y_id varchar PRIMARY KEY, field0 varchar,
                              vcol0 int, vcol1 int, vcol2 int, 
                              vcol3 int, vcol4 int, sendfrom int);
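
The keyspace statements differ only in the replication factor, and the causal-3 and causal-5 tables differ only in the number of vcol columns. A minimal sketch that generates both; keyspace_cql and causal_table_cql are hypothetical helpers, and extending the causal pattern to other sizes is an assumption based on the two schemas above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: they print CQL text only; pass the result to cqlsh to apply it.

# Keyspace statement for a given replication factor.
keyspace_cql() {
  local rf="$1"
  printf "CREATE KEYSPACE ycsb WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': %d};\n" "$rf"
}

# causal-N table: columns vcol0 .. vcol(N-1), following the causal-3 and
# causal-5 schemas. Other values of N are an extrapolation.
causal_table_cql() {
  local n="$1" cols="" i
  for ((i = 0; i < n; i++)); do
    cols+="vcol${i} int, "
  done
  printf 'CREATE TABLE ycsb.usertable ( y_id varchar PRIMARY KEY, field0 varchar, %ssendfrom int);\n' "$cols"
}

keyspace_cql 3
causal_table_cql 3
```

The output can be applied with the -e option of cqlsh, for example ./cqlsh 10.142.0.2 -e "$(keyspace_cql 5)".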

the remote controller instance

should have a similar codebase, i.e. ~/CCM needs to contain:

|- setup/ -- id
|- data/
|- benchmark.sh
|- credentials.sh
|- run_bench.sh
|- servers.sh
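
Before starting a run, the layout above can be checked with a short script. This is a sketch; missing_controller_files is a hypothetical helper, and it assumes that id lives inside setup/, as the tree suggests.

```shell
#!/usr/bin/env bash
# Hypothetical check: print each expected entry that is missing under the
# given CCM directory. Prints nothing when the layout is complete.
missing_controller_files() {
  local root="$1" entry
  for entry in setup/id data benchmark.sh credentials.sh run_bench.sh servers.sh; do
    [ -e "${root}/${entry}" ] || echo "$entry"
  done
}

missing_controller_files "$HOME/CCM"
```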

Add an Algorithm

In setup.sh, follow the case pattern in the install_cass function and add a case entry for the new algorithm, so that a command like . setup.sh my_algorithm in a terminal picks up this case.
You may also need to add a table schema for this algorithm.
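
The shape of such a case entry might look like the sketch below. It is illustrative only: the real install_cass in setup.sh may be structured differently, and the echo lines stand in for the actual clone and build steps.

```shell
#!/usr/bin/env bash
# Illustrative shape only; the real install_cass in setup.sh may differ.
# The echo lines are placeholders for the actual install steps.
install_cass() {
  case "$1" in
    abd-machine-time)
      echo "install abd-machine-time"
      ;;
    my_algorithm)   # new entry, matched by `. setup.sh my_algorithm`
      echo "install my_algorithm"
      ;;
    *)
      echo "unknown algorithm: $1" >&2
      return 1
      ;;
  esac
}

install_cass my_algorithm
```

The catch-all branch is a design choice in this sketch: a mistyped algorithm name fails loudly instead of leaving a VM half set up.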

Change the Workload

In the ycsb folder, define a new YCSB workload.
In bash/ycsb.sh, you can override parts of the workload's behavior by changing the parameter variables.
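
As a sketch of how such overrides typically reach YCSB: each -p option on the ycsb command line overrides one property from the workload file given with -P. The variable names, the workload path, and the cassandra-cql binding below are assumptions; the real bash/ycsb.sh and the forked YCSB builds may differ. The function prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Illustrative variables; the real names in bash/ycsb.sh may differ.
workload_file="workloads/workloada"
hosts="10.142.0.2"
operation_count=100000
read_proportion=0.5
update_proportion=0.5

# Build and print (not run) a YCSB command line. Each -p overrides one
# property from the workload file passed with -P.
ycsb_run_command() {
  printf 'bin/ycsb run cassandra-cql -P %s -p hosts=%s -p operationcount=%s -p readproportion=%s -p updateproportion=%s\n' \
    "$workload_file" "$hosts" "$operation_count" "$read_proportion" "$update_proportion"
}

ycsb_run_command
```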

When Servers Reboot

  • Run rm ~/.ssh/known_hosts when a server's IP changes.

Gotchas

  • Run chmod 400 id on the SSH private key before using it for the first time.
  • On Ubuntu 14.04, foreground jobs (e.g. run_bench.sh, btest.sh) sometimes hang in SSH sessions, while background jobs do not, so start them with nohup ... &.
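
A minimal sketch of the background-job pattern from the second gotcha, with output captured in a log file so the job keeps running after the SSH session ends. run_detached is a hypothetical helper, not part of this repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a command detached from the terminal, send its
# output to a log file, and print the PID of the background job.
run_detached() {
  local log_file="$1"
  shift
  # stdin comes from /dev/null and stdout/stderr go to the log, so nothing
  # stays attached to the SSH session.
  nohup "$@" > "$log_file" 2>&1 < /dev/null &
  echo "$!"
}

# Usage on the remote controller (the log file name is illustrative):
#   pid=$(run_detached bench.log bash run_bench.sh)
#   tail -f bench.log
```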

More Best Practices

  • Download data from the controller to the local machine: scp -i ./setup/id -r panhi_bc_edu@server_ip:VMCM/data ./data
  • We now generate SSH keys on macOS with ssh-keygen -m PEM -f id -C root

Future Works

  • Use the Google Cloud console, and a single command to populate an instance
  • A project-wide SSH key
  • control.sh -> Python + Bash

ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]

ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]
ssh -o StrictHostKeyChecking=no -i ~/Desktop/Cassandra-Cluster-Management/setup/id [email protected]

~/Syrupy/scripts/syrupy.py --interval=5 --poll-command='.*java' & vnstat -l

./cqlsh 10.142.0.2 -e "CREATE KEYSPACE ycsb WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 7};"
./cqlsh 10.142.0.17 -e "CREATE KEYSPACE ycsb WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 7};"
./cqlsh 10.142.0.17 -e "CREATE TABLE ycsb.usertable( y_id varchar PRIMARY KEY, field0 varchar, field1 varchar, tag1 bigint, field2 varchar, tag2 bigint, field3 varchar, tag3 bigint);"
./cqlsh 10.142.0.2 -e "CREATE TABLE ycsb.usertable( y_id varchar PRIMARY KEY, field0 varchar, field1 varchar, tag1 varchar, field2 varchar, tag2 varchar, field3 varchar, tag3 varchar);"
./cqlsh 10.142.0.31 -e "CREATE TABLE ycsb.usertable ( y_id varchar primary key, field0 varchar);"
