
Comments (3)

beinfluential88 commented on September 25, 2024

Procedure to implement Gatling

Objective: With the aim of monitoring the performance of our system and detecting imminent system failures preemptively, we are planning to implement the following tools for system analysis purposes.

  1. Gatling: A load-testing tool that mimics the behavior of a user. (Heavier load can be applied to the system by increasing the number of synthetic users)
  2. Grafana: A dashboard tool that can be used for real-time monitoring of system performance. Grafana queries Prometheus to retrieve the relevant metrics, and these queries can be adjusted for different analyses.
  3. Prometheus: A system that gathers metrics and stores them in a time-series database.

Pre-requisite

  1. An AWS EKS cluster should be up and running to serve the 4 microservices: db (database service), s1 (user service), s2 (music service), and s3 (playlist service).
    a. Starting a fresh EKS cluster.
    make -f eks.mak start

  2. Ensure AWS DynamoDB is initialized. The tables have to be available for db (database service) to serve the other 3 microservices. (s1/s2/s3)
    aws dynamodb list-tables

  3. Provision the cluster. This includes:
    a. Create a namespace within the cluster in which applications will be placed.
    kubectl create ns c756ns
    kubectl config set-context --current --namespace=c756ns
    b. Provision the Kubernetes cluster. This includes “Installing Istio”, “Installing Prometheus stack by calling obs.mak recursively”, and “Deploying and monitoring the four microservices”.
    make -f k8s.mak provision

  4. Get the Grafana URL, which we use to access the dashboard.
    make -f k8s.mak grafana-url
    • User: admin
    • Password: prom-operator
    Note: The hostname is obtained from the “istio-system” namespace.
    kubectl get -n istio-system svc/grafana-ingress -o jsonpath="{.status.loadBalancer.ingress[0]['ip','hostname']}"
    Parameters:
    1: path to kubectl
    2: namespace
    3: the resource to query (typically an svc)
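
The jsonpath expression above selects the `ip` or `hostname` field of the first load-balancer ingress entry. As a rough illustration only, the same lookup can be sketched in Python against the JSON that `kubectl get ... -o json` prints. The function name `ingress_address` and the sample hostname are made up for this sketch, and preferring `ip` over `hostname` when both are present is an assumption.

```python
import json


def ingress_address(service: dict) -> str:
    """Return the first ingress entry's IP or hostname, or "" if none is assigned yet."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        return ""
    first = ingress[0]
    # Assumption: prefer "ip" when both fields are present.
    return first.get("ip") or first.get("hostname") or ""


# Hypothetical, trimmed-down output of:
#   kubectl get -n istio-system svc/grafana-ingress -o json
sample = json.loads(
    '{"status": {"loadBalancer": {"ingress": [{"hostname": "example-elb.example.com"}]}}}'
)
print(ingress_address(sample))
```

An empty result would suggest that the load balancer has not been assigned an address yet, in which case retrying a little later is usually enough.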

How does Gatling work?

1. We use a Gatling Docker image to create and apply synthetic load to our system.
ghcr.io/scp-2021-jan-cmpt-756/gatling:3.4.2

2. Scenarios of user behavior are defined in “ReadTables.scala”.
a. The original package name is defined, which will be used to trigger a Gatling instance later.
package proj756
b. The required imports for Gatling to work.
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._
c. “Utility” object
envVarToInt("USERS", 1) - Utility to get an Int from an environment variable. (e.g., The number of users is defined under “docker container run” command.)
envVar("CLUSTER_IP", "127.0.0.1") - Utility to get a string from an environment variable. (e.g., The cluster IP is defined under “docker container run” command.)
d. “RMusic” and “RUser” objects – Scenarios defined to be tested for the respective services. Each sends an HTTP GET request with {UUID} continuously, once every second.
Note: “eager()” loads the whole data in memory before the Simulation starts, saving disk access at runtime. “random()” randomly picks an entry in the sequence. “circular()” goes back to the top of the sequence once the end is reached.
e. “RUserVarying” and “RMusicVarying” objects – Scenarios defined to be tested for the respective services. Each sends an HTTP GET request with {UUID} continuously, with a varying interval between calls. (Each interval is randomly selected between 1 and 60 seconds.)
f. “ReadTablesSim” class – This class extends the “Simulation” class and is used to define the HTTP protocol for simulations. (e.g., the cluster IP is read from an environment variable set in the “docker container run” command.)
g. “ReadUserSim” and “ReadMusicSim” classes – These classes are called directly by the “docker container run” command and inject independent users (as defined in the “RMusic” and “RUser” objects) via the HTTP protocol defined in the “ReadTablesSim” class.
h. “ReadBothVaryingSim” class – This class is called directly by the “docker container run” command and injects concurrent users (as defined in the “RMusicVarying” and “RUserVarying” objects) via the HTTP protocol defined in the “ReadTablesSim” class.
Note: There are two types of workload model for injection: open and closed.
Closed systems, where you control the number of concurrent users. In a closed system the number of concurrent users is capped; at full capacity, a new user can enter the system only once another exits.
Open systems, where you control the arrival rate of users. Open systems have no control over the number of concurrent users: users keep arriving even when the application has trouble serving them.
Note: For the closed model, we use two methods to inject users.
constantConcurrentUsers(nbUsers).during(duration): Inject so that the number of concurrent users in the system is constant.
rampConcurrentUsers(fromNbUsers).to(toNbUsers).during(duration): Inject so that the number of concurrent users in the system ramps linearly from one number to another.
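
To make the two closed-model profiles more concrete, here is a small Python sketch of the target number of concurrent users at a given time. It is an idealized model for illustration, not Gatling code: the function names are hypothetical, and Gatling's own scheduling may differ in granularity.

```python
def constant_concurrent_users(nb_users: int, duration_s: float, t: float) -> int:
    """Target concurrency at time t for a constant profile (0 outside the window)."""
    return nb_users if 0 <= t <= duration_s else 0


def ramp_concurrent_users(from_users: int, to_users: int, duration_s: float, t: float) -> int:
    """Target concurrency at time t for a linear ramp from from_users to to_users."""
    if t < 0 or t > duration_s:
        return 0
    if duration_s == 0:
        return to_users
    fraction = t / duration_s  # share of the ramp already elapsed
    return round(from_users + (to_users - from_users) * fraction)


# Ramp from 1 to 21 users over 60 seconds, sampled every 15 seconds.
for t in (0, 15, 30, 45, 60):
    print(t, ramp_concurrent_users(1, 21, 60, t))
```

In this model the ramp adds users at a steady pace, so halfway through the window the target sits halfway between the two endpoints.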

3. Create a script that will trigger Gatling. (e.g., gatling-1-music.sh)

docker container run --detach --rm \
  -v ${PWD}/gatling/results:/opt/gatling/results \
  -v ${PWD}/gatling:/opt/gatling/user-files \
  -v ${PWD}/gatling/target:/opt/gatling/target \
  -e CLUSTER_IP=`tools/getip.sh kubectl istio-system svc/istio-ingressgateway` \
  -e USERS=1 \
  -e SIM_NAME=ReadMusicSim \
  --label gatling \
  ghcr.io/scp-2021-jan-cmpt-756/gatling:3.4.2 \
  -s proj756.ReadMusicSim

4. To list Gatling containers currently running
tools/list-gatling.sh

5. To stop all the Gatling containers.
tools/kill-gatling.sh
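
The trigger script passes settings such as USERS and CLUSTER_IP to the simulation as environment variables, and the “Utility” object reads them with a default fallback. The Python sketch below mirrors that lookup-with-default behavior for illustration only; it is not the Scala code from ReadTables.scala. The `env` parameter is added here for testability, and falling back to the default on a non-integer value is an assumption.

```python
import os


def env_var(name, default, env=None):
    """Return the environment variable as a string, or the default if it is unset."""
    env = os.environ if env is None else env
    return env.get(name, default)


def env_var_to_int(name, default, env=None):
    """Return the environment variable as an int, or the default if unset or invalid."""
    raw = env_var(name, None, env)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # Assumption: a malformed value falls back to the default.
        return default


# Hypothetical environment, as if set by "-e USERS=5 -e CLUSTER_IP=10.0.0.1".
fake_env = {"USERS": "5", "CLUSTER_IP": "10.0.0.1"}
print(env_var_to_int("USERS", 1, fake_env), env_var("CLUSTER_IP", "127.0.0.1", fake_env))
```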

from term-project-cloudriven.

beinfluential88 commented on September 25, 2024

Prometheus Basics - Time Series

  1. We can query Prometheus directly without Grafana.
  2. We can output metrics to Prometheus.

Prometheus Basics – Two Fundamental Roles

  1. First, it gathers and records metrics in a time-series database (TSDB), which includes special compression techniques optimized for this type of data.
  2. Second, it supports queries against that database. It features a query language, PromQL, that meets the specific needs of time series data.

Prometheus Technical Details

  1. The set of metrics available from a given container is determined by that container, not Prometheus.
  2. The set of metrics available from our three microservices is defined by the Python client library we use, the Python Prometheus Flask exporter. We may define new metrics for our term project.

Pre-requisite

  1. An AWS EKS cluster should be up and running to serve the 4 microservices: db (database service), s1 (user service), s2 (music service), and s3 (playlist service).
    a. Starting a fresh EKS cluster.
    make -f eks.mak start

  2. Ensure AWS DynamoDB is initialized. The tables have to be available for db (database service) to serve the other 3 microservices. (s1/s2/s3)
    aws dynamodb list-tables

  3. Provision the cluster. This includes:
    a. Create a namespace within the cluster in which applications will be placed.
    kubectl create ns c756ns
    kubectl config set-context --current --namespace=c756ns
    b. Provision the Kubernetes cluster. This includes “Installing Istio”, “Installing Prometheus stack by calling obs.mak recursively”, and “Deploying and monitoring the four microservices”.
    make -f k8s.mak provision

  4. Get the Prometheus URL, which we use to run queries directly on Prometheus.
    make -f k8s.mak prometheus-url

Note: The hostname is obtained from the “istio-system” namespace.
kubectl get -n istio-system svc/prom-ingress -o jsonpath="{.status.loadBalancer.ingress[0]['ip','hostname']}"
Parameters:
1: path to kubectl
2: namespace
3: the resource to query (typically an svc)

A query returning a single time series

The following query requests the current values of all time-series that have their service label assigned the string cmpt756db.
flask_http_request_total{service="cmpt756db"}

flask_http_request_total{container="cmpt756db",endpoint="http", instance="10.244.1.10:30002",job="cmpt756db",method="GET",namespace="c756ns", pod="cmpt756db-79ddc5446d-2566f",service="cmpt756db",status="200"}
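
The same instant query can also be sent to Prometheus over its HTTP API (`/api/v1/query` with a `query` parameter) instead of typing it into the web UI. The sketch below only builds the request URL and parses a response body, so it runs without a cluster. The base URL, the helper names, and the sample response values are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str):
    """Return (labels, value) pairs from an instant-vector response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("query failed")
    # Each sample's "value" is [timestamp, "value-as-string"].
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]


# Hypothetical base URL; use the address printed by "make -f k8s.mak prometheus-url".
url = instant_query_url("http://prometheus.example:9090",
                        'flask_http_request_total{service="cmpt756db"}')
print(url)

# Hypothetical response body with a single time series.
sample_body = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "flask_http_request_total",
                    "service": "cmpt756db", "status": "200"},
         "value": [1616000000.0, "42"]},
    ]},
})
print(parse_instant_vector(sample_body))
```

Fetching the URL with any HTTP client should return a body in the shape that `parse_instant_vector` expects.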

Instant vector: A query returning multiple time series

Requesting any time series for our sample metric, regardless of the values for its keys. Note that the returned values were not necessarily sampled at the same time but are simply the most recent samples returned for each time series.
flask_http_request_total


Range vector: A query returning several values from a single series

Our next query will return to the single time series but we will ask for all the samples over a given time range, returning a range vector.
flask_http_request_total{service="cmpt756db"}[5m]
The [5m] suffix requests all samples from the most recent 5 minutes, ordered from oldest to most recent.
The entries in the Value column will now include both a count and a timestamp, separated by an @ symbol. The timestamp is in seconds since January 1, 1970, UTC. Copy one of the timestamps and paste it into a Unix epoch converter to decode the time into something more understandable.
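
As an alternative to an online converter, a timestamp can be decoded locally. The Python sketch below splits a "count @timestamp" entry and converts the timestamp to a UTC date and time. The helper name `parse_sample` and the sample entry are made up for illustration, and the exact spacing of entries in the Prometheus UI may differ.

```python
from datetime import datetime, timezone


def parse_sample(text: str):
    """Split a 'value @timestamp' entry into (value, UTC datetime)."""
    value_part, ts_part = text.split("@")
    value = float(value_part)
    when = datetime.fromtimestamp(float(ts_part), tz=timezone.utc)
    return value, when


# Hypothetical entry copied from the Value column.
value, when = parse_sample("42 @1616000000")
print(value, when.isoformat())
```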


Multiple range vectors

We can also run a query requesting ranges for multiple time series.
flask_http_request_total[5m]


Matching query types to vector type

The PromQL language enforces the distinction between instant and range vectors.
• Aggregation operators such as avg or min can only be applied to instant vectors.
• Functions that compute a value over time, such as increase or rate, can only be applied to range vectors.
The list of PromQL functions specifies for each function whether a vector argument must be instant or range.

Computing a rate across a range (feat. range vectors)

The per-second rate of HTTP calls: the increase over the range divided by the number of seconds in the range.
rate(flask_http_request_total{service="cmpt756db"}) WRONG!!! (an instant vector; rate needs a range vector)
rate(flask_http_request_total{service="cmpt756db"}[5m]) CORRECT!!!
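
To see why rate needs a range vector, consider a simplified version of the calculation: it takes the first and last samples in the range and divides the increase by the elapsed seconds, which is impossible with a single sample. The Python sketch below is a deliberate simplification; Prometheus's actual rate() also accounts for counter resets and extrapolates to the window boundaries. The function name and sample data are hypothetical.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) samples.

    Returns None when there are fewer than two samples, which is the
    situation an instant vector (one sample per series) would be in.
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 == t0:
        return None
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter samples taken one minute apart, as (seconds, count).
window = [(0, 100), (60, 130), (120, 160), (180, 190), (240, 220)]
print(simple_rate(window))      # 120 requests over 240 seconds
print(simple_rate(window[:1]))  # a single sample gives no rate
```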

Computing an average across an instant (feat. instant vectors)

The average number of HTTP requests per time series since each series began.
avg(flask_http_request_total{service="cmpt756db"}[5m]) WRONG!!! (a range vector; avg needs an instant vector)
avg(flask_http_request_total{service="cmpt756db"}) CORRECT!!!
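
By contrast, an aggregation such as avg works across series at a single instant: it takes the latest value of each matching time series and averages them. A minimal Python sketch of that idea follows, with the instant vector modeled as a dictionary from a label set to its most recent value. The representation and the sample numbers are assumptions for illustration.

```python
def avg_instant(vector):
    """Average the latest values of all series in an instant vector."""
    values = list(vector.values())
    if not values:
        return None
    return sum(values) / len(values)


# Hypothetical instant vector: one latest value per (method, status) series.
instant_vector = {
    ("GET", "200"): 90.0,
    ("GET", "404"): 10.0,
}
print(avg_instant(instant_vector))
```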


bingsoorim commented on September 25, 2024

This is considered done. Closing the issue.

