
Serenity

Introduction

Building the Modules

Intel and Mesosphere are working on cutting-edge oversubscription technologies for Mesos. They follow the Mesos Oversubscription Architecture: a very flexible design that drives the internal semantics in Mesos but leaves all actual estimation and controller logic to module implementors.

We consider oversubscription as a series of estimates (i.e. how much can safely be oversubscribed) and decisions (i.e. how to protect production workloads). The different substages of estimation and decision-making should be able to influence each other. For example, dramatic corrections may have to involve limiting or stopping current estimates.

We aim for a very flexible solution where both estimation and corrections are done in a pipelined approach with shared knowledge between each stage, referred to as Filters with a shared bus.
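The pipelined approach above can be sketched roughly as follows. This is an illustrative C++ sketch only; the names Consumer, Producer, Doubler, and Sink are assumptions for this example, not the real Serenity API:

```cpp
#include <vector>

// A stage that can receive values of type T from upstream.
template <typename T>
class Consumer {
 public:
  virtual ~Consumer() {}
  virtual void consume(const T& in) = 0;
};

// A stage that forwards values of type T to registered downstream consumers.
template <typename T>
class Producer {
 public:
  virtual ~Producer() {}
  void addConsumer(Consumer<T>* consumer) { consumers_.push_back(consumer); }

 protected:
  void produce(const T& out) {
    for (Consumer<T>* consumer : consumers_) consumer->consume(out);
  }

 private:
  std::vector<Consumer<T>*> consumers_;
};

// Toy filter: doubles its input and forwards it down the pipeline.
class Doubler : public Consumer<double>, public Producer<double> {
 public:
  void consume(const double& in) override { produce(in * 2.0); }
};

// Toy sink: records the last value it received.
class Sink : public Consumer<double> {
 public:
  double last = 0.0;
  void consume(const double& in) override { last = in; }
};
```

Chaining filters this way lets each stage stay small while sharing knowledge through the values (and, in Serenity's case, a shared bus) flowing between them.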

Serenity pipeline

For more documentation, please refer to docs.

Installing

Quickstart: build-and-test with Docker

With the Serenity repository cloned locally:

cd serenity
docker build .

The Dockerfile located in the project root is based on the mesosphere/mesos-modules-dev image. For convenience, this image has the newest Mesos (built from master) pre-built with unbundled dependencies. See the contents of that Dockerfile here.

Prerequisites

Building Mesos modules requires system-wide installation of google-protobuf, glog, boost and picojson.

Currently, Serenity supports Mesos 0.27.x only (incompatible Stout & libprocess changes appeared in older versions).

Build Mesos with some unbundled dependencies

Preparing Mesos source code

Start by pulling a recent version of Apache Mesos:

git clone https://git-wip-us.apache.org/repos/asf/mesos.git ~/mesos

Building and Installing Mesos

Because modules need access to a couple of libprocess dependencies, Mesos itself should be built with unbundled dependencies to reduce the chance of problems introduced by varying versions (libmesos vs. module library).

We recommend using the following configure options:

cd ~/mesos
mkdir build && cd build
../configure --with-glog=/usr/local --with-protobuf=/usr/local --with-boost=/usr/local
make
make install

Building Serenity with Cmake

Once Mesos is built and installed, clone the Serenity repository.

Build Serenity with these commands:

mkdir build && cd build
cmake -DWITH_MESOS="/usr" ..
make

Run the tests:

make test

Deploying Serenity Module

Create a JSON file that describes the shared library and its parameters to the Mesos slave process:

{
    "libraries": [
    {
        "file": "./build/libserenity.so",
        "modules": [
        {
            "name": "com_mesosphere_mesos_SerenityEstimator"
        },
        {
            "name": "com_mesosphere_mesos_SerenityController"
        }
      ]
    }
  ]
}

You can reuse the sample serenity.json.in. To use Serenity, add these lines to your mesos-slave command-line options:

--modules=file://serenity.json.in \
--resource_estimator="com_mesosphere_mesos_SerenityEstimator" \
--qos_controller="com_mesosphere_mesos_SerenityController"

Deploying Serenity Module using Deployment Scripts

The Serenity-Formula project is useful for Mesos & Serenity deployment. It can be used to prepare a cluster for Serenity end-to-end tests.

You are welcome to use it and to contribute in case of any bug or enhancement.

Contributing

Send pull requests for code review before merging. Make sure that commits describe the changes and can be applied atomically.

The code base follows the Google C++ Style Guide and is linted by cpplint.

Before submitting code, make sure to run:

$ # Run style checker
$ ./scripts/lint.sh
$ # Make sure newly added APIs are documented
$ doxygen

To install the style checker as a git pre-commit hook:

$ ln -s scripts/pre-commit .git/hooks/pre-commit

Details about using Mesos Modules

See Mesos Modules

Serenity Smoke Test Framework

Serenity includes a test Mesos framework with a convenient JSON API.

For more documentation, please refer to docs.

Serenity's People

Contributors: bwplotka, connordoyle, nqn, pittma, skonefal


Serenity's Issues

Are names "Decider", "Observer", "ChangePointDetector" proper for our specific pipeline components?

ChangePointDetector : Object responsible for detecting drops in given stream of "doubles" using different algorithms.

Observer : An Observer should rather produce signals/suggestions; it does not produce QoS corrections. It's rather "As an observer, I observe degradation of the CPI metric, which could be a result of resource contention, so I raise a ResourceContention signal" or "As an observer, I see this amount of slack".

Decider : "As a RE decider I decide to expose this amount of observed slack to Master"
"As a QoS decider I decide to kill these executors because of this contention signal"

We would like to know your opinion. For us (as Polish speakers) the names are understandable, but you, as native speakers, may feel differently.

QoS Controller guarantees

Serenity needs to add guarantees to the QoS controller. We need to make sure that revocation of BE tasks lasts until:
a) the signal that raised the interference has returned to normal, or
b) all BE tasks are revoked

Logic that has to be inside the controller:

  • The QoS controller should kill BE tasks in groups, from youngest to oldest (already done)
  • The QoS controller should revoke multiple tasks at once; the number of tasks revoked per decision should be based on the strength of the interference
  • After killing BE tasks, the QoS controller should wait (e.g. 15s) before the next kill to see if the signal has changed, and kill again only if the interference signal prevails
  • While issuing the BE revocation, the QoS controller should also tell the RE (through the communication bus) to stop sending slack estimations, so we can observe our signals without any interference
  • When Serenity has recovered, the QoS controller should send a message to the RE to re-enable oversubscription
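The cooldown step (waiting e.g. 15s between kill decisions so the signal has time to react) could be sketched as below. This is a hypothetical helper, not Serenity code:

```cpp
#include <cstdint>

// Guard that enforces a minimum delay between revocation decisions.
struct RevocationCooldown {
  int64_t lastRevocationSec = -1;  // -1 means no revocation has happened yet.
  int64_t cooldownSec = 15;        // Assumed cooldown window for this sketch.

  // True if enough time has passed since the last revocation.
  bool mayRevoke(int64_t nowSec) const {
    return lastRevocationSec < 0 || nowSec - lastRevocationSec >= cooldownSec;
  }

  void recordRevocation(int64_t nowSec) { lastRevocationSec = nowSec; }
};
```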

Extend ResourceUsage class

ResourceUsage in Serenity is currently a Mesos protobuf message.
We should extend it and add some features:

  • Executor age information
  • Helper methods for executors (moved from ResourceUsageHelper)
    TBD.

Subtract resources used by BE executors in ResourceEstimator

Currently, the RE counts only the slack from production executors (from observers/slack_resource.cpp).
The RE should subtract the BE resource usage from this value (it is currently done in estimator/serenity_estimator.cpp, which is not the best place for it).
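The intended arithmetic might look like this (a hedged sketch; the function name and parameters are made up for illustration):

```cpp
// Slack is estimated from production (PR) executors; usage by best-effort
// (BE) executors is subtracted so already-consumed resources are not
// offered to the master again. Clamped at zero.
double revocableSlack(double prAllocated, double prUsed, double beUsed) {
  double slack = prAllocated - prUsed - beUsed;
  return slack > 0.0 ? slack : 0.0;
}
```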

Refactor Exponential Moving Average filter

The current moving average filter code (ema.cpp/ema.hpp) is a mix of a "base" filter class and the "exponential moving average" algorithm.
We should refactor the code:

  • Create a separate .hpp and .cpp for a SmoothingFilter base class, and a separate folder for smoothing strategies (e.g. ExponentialMovingAverage)
  • SmoothingFilter should accept multiple getter&setter pairs to be smoothed
  • SmoothingFilter should accept a smoothing_strategy and expose an addMetric(getter_setter_tuple) function

When this is done, all instances of setEmaIps, getEmaIps, setEmaCpuUsage, etc. should be removed.
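For reference, the exponential moving average itself is a one-liner per update. A minimal sketch (not the actual ema.cpp code; alpha is the smoothing factor):

```cpp
// EMA(t) = alpha * sample(t) + (1 - alpha) * EMA(t-1), seeded with the
// first sample.
class ExponentialMovingAverage {
 public:
  explicit ExponentialMovingAverage(double alpha) : alpha_(alpha) {}

  double update(double sample) {
    if (!initialized_) {
      value_ = sample;
      initialized_ = true;
    } else {
      value_ = alpha_ * sample + (1.0 - alpha_) * value_;
    }
    return value_;
  }

 private:
  double alpha_;
  double value_ = 0.0;
  bool initialized_ = false;
};
```

Separating this algorithm behind a smoothing-strategy interface would let the base SmoothingFilter swap it for other strategies without touching pipeline code.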

Throttling BE tasks (under CPU stress) may not be enough.

Our throttled BE tasks can still interfere with the LC PR task because the Linux scheduler can still run them for short periods. They can cause context switching and interfere with PR tasks. (There may be no IPC drop even though interference appears.)
Mitigations:

  • We should try to find a way to remove BE tasks when too many of them are on our node (even throttled)
  • Move BE jobs into the Freezer cgroup?

We should run experiments on how BE task context switches affect an LC task at different levels of PR utilization and different numbers of CPU-stress BE tasks.

Add CpuUsageNormalization Filter

Currently, some statistics (e.g. cpu_usage) are cumulative - that means a value is the sum of all values since the executor began. Others are sampled/counted - a value comes from a specific timestamp/timeframe.

We should create a filter that takes cumulative values (cpu_usage) and normalizes them to sampled/counted values. This should simplify filter development and debugging because all values would be treated the same way.

Use cases:

  • As a new Serenity developer, I don’t want to double-check which metrics are cumulative and which are sampled - I would like to use them as easily as possible
  • As a Serenity developer, I don’t want to add extra logic to every filter that uses cpu_usage just because this metric is counted differently
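A minimal sketch of the proposed normalization (CumulativeToRate is a hypothetical type; the real filter would operate on ResourceUsage messages):

```cpp
// Converts a cumulative counter into a per-interval rate:
// rate = (current - previous) / elapsed seconds.
struct CumulativeToRate {
  double prevValue = 0.0;
  double prevTimeSec = 0.0;
  bool hasPrev = false;

  // Returns -1.0 until two samples are available (a convention assumed
  // only for this sketch).
  double update(double value, double timeSec) {
    if (!hasPrev) {
      prevValue = value;
      prevTimeSec = timeSec;
      hasPrev = true;
      return -1.0;
    }
    double rate = (value - prevValue) / (timeSec - prevTimeSec);
    prevValue = value;
    prevTimeSec = timeSec;
    return rate;
  }
};
```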

Building Serenity with mesos-1.3.0

Hello,

I have built Serenity with mesos-1.3.0 but when I try to load the libserenity.so to the agent I get the following error: undefined symbol: _ZNK6google8protobuf7Message13SpaceUsedLongEv

Is it actually possible to build with mesos-1.3.0, or do I have to use mesos-0.27.x?

Refactor data transformation between components in pipelines.

Currently our solution is not flexible. We are reusing some double fields in ResourceUsage (e.g. net statistics) for EMA purposes. For better code readability we should reuse the same fields and add Tags recording what filtering was done on the given data, e.g.:
Source -> RU with perf instructions, tag<> -> EMA -> RU with EMA instructions, tag<ema>
The aforementioned mitigation would let us configure the pipeline even without recompiling.

To elaborate on tagging, here's a use case:
"As a Serenity module developer, I would like to register for ResourceUsage information from two different sources, and I would like to know which message is from which filter (e.g. raw, ema)." It's about adding distinguishable labels to otherwise indistinguishable messages.
I'm reluctant about it, because it leads to hardcoded pipelines - but it's an idea that comes up from time to time (though I haven't seen a real "I need this" yet).

Create Pipeline functional tests

We have tested most of our components using unit tests, but we need to make sure that our pipelines produce correct results.

We need a few tests to check that the Estimator returns slack when:

  • No executors are on the machine
  • 3 PR executors are on the machine
  • 3 PR executors are on the machine, plus some revocable executors
  • 3 PR executors are on the machine, but two of them started very recently
  • 3 PR executors are on the machine, but two of them started very recently, plus revocable executors

And for the QoS Controller:

  • No executors are on the machine
  • 3 PR executors on the machine, no IPC drop
  • 3 PR executors on the machine, IPC drop
  • 3 PR executors on the machine, revocable executors, no IPC drop
  • 3 PR executors on the machine, revocable executors, IPC drop; some revocations expected

Make consistent error handling in all pipeline components.

We should design how serenity filters will behave in case of any error during pipeline flow.

Some definitions:
Continuing the pipeline -> run produce() and pass some form of ResourceUsage further.
Aborting the pipeline -> return Nothing() and wait for the next usage.

There are plenty of issues here:

  • How do we validate that ResourceUsage_Executor executors have the required fields filled? Different filters require different fields.
    1. One validation filter at the front of the pipeline, which strictly validates each executor given in ResourceUsage.
       • Faster pipeline - no need to implement redundant validation in each filter!
       • Less code in filters.
    2. Validation in each filter (how it currently works).
  • How do we handle error situations? E.g. a required field is missing in one executor, or not all executors have the needed fields filled (which is required, for example, by the UtilizationThreshold filter). How do we handle cutting off oversubscription?
    1. We abort the pipeline (current behaviour).
    2. We continue the pipeline with bad executors filtered out, or even an empty ResourceUsage.
    3. We continue the pipeline with data in an Error state (IMO a good idea).
  • How do we handle filters that need 2 or even 3 pipeline iterations before they can calculate anything (e.g. the EMA IPC filter needs to calculate IPC in a one-second window)?
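Option 3 (continuing with data in an Error state) could be modeled roughly like this (hypothetical types, not existing Serenity code):

```cpp
#include <string>

// Carries a value through the pipeline together with an error state, so a
// downstream filter can decide to skip, log, or repair instead of aborting.
template <typename T>
struct PipelineItem {
  T value{};
  bool ok{false};
  std::string error;
};

// Wrap a successfully produced value.
template <typename T>
PipelineItem<T> okItem(T value) {
  PipelineItem<T> item;
  item.value = value;
  item.ok = true;
  return item;
}

// Wrap an error description; the value stays default-constructed.
template <typename T>
PipelineItem<T> errorItem(const std::string& message) {
  PipelineItem<T> item;
  item.error = message;
  return item;
}
```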

Standardize constructors in Serenity

Our current filters' constructors and pipelines are messy. They accept a lot of parameters, have many of them defaulted, and it's hard to tell what they mean. It is also hard to see how our pipelines are created.
I would like to propose a standardization:

Each component would expose an API:

  • A zero-parameter constructor that initializes the default component (with the best parameters we know of)
  • A constructor that accepts a SerenityConfig. Parameter defaults would be overwritten by the values in the SerenityConfig. This should be used to expose configuration to operators in the future.
  • A set of "setters" exposing which parameters can be changed "in code"
  • If a filter is designed to use dependency injection (like the QoS Controller or the Smoothing filter), it should accept the DI object in both constructors (but no configuration for it)

For pipelines, we should move from the 'constructor initialization list' to the constructor body and create the pipeline in logical "phases" to make it easier to read and understand.
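A minimal sketch of the proposed constructor convention (SerenityConfig is modeled here as a plain key/value map; the real configuration type may differ):

```cpp
#include <map>
#include <string>

// Stand-in for the real configuration type, for illustration only.
using SerenityConfig = std::map<std::string, double>;

class SmoothingFilter {
 public:
  // Zero-parameter constructor: the best default parameters we know.
  SmoothingFilter() {}

  // Config constructor: defaults are overwritten by operator-supplied values.
  explicit SmoothingFilter(const SerenityConfig& config) {
    auto it = config.find("alpha");
    if (it != config.end()) alpha_ = it->second;
  }

  // Setter exposing what can be changed "in code".
  void setAlpha(double alpha) { alpha_ = alpha; }
  double alpha() const { return alpha_; }

 private:
  double alpha_ = 0.2;  // Assumed default, for illustration only.
};
```

With this convention, reading a pipeline constructor body tells you exactly which parameters deviate from the defaults.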

Implement flattenedConsumables() in Consumer class

#149 implemented the std::vector consumables() function.

Because some products are vectors, some components are consumers of vector<vector>.
For easier development, we should have a flattenedConsumables() that returns the flattened result.
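The flattening itself is straightforward; a sketch (flatten here is a hypothetical free function, whereas the real API would be a flattenedConsumables() method on Consumer):

```cpp
#include <vector>

// Collapse a vector of vectors into a single vector, preserving order.
template <typename T>
std::vector<T> flatten(const std::vector<std::vector<T>>& nested) {
  std::vector<T> flat;
  for (const auto& inner : nested)
    flat.insert(flat.end(), inner.begin(), inner.end());
  return flat;
}
```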
