
Serenity

Introduction

Building the Modules

Intel and Mesosphere are working on cutting-edge oversubscription technologies for Mesos. They follow the Mesos Oversubscription Architecture: a very flexible design that drives the internal semantics in Mesos but leaves all actual estimation and controller logic to module implementors.

We consider oversubscription as a series of estimates (i.e. how much can safely be oversubscribed) and decisions (i.e. how to protect production workloads). The different substages of estimation and decision-making should be able to influence each other. For example, dramatic corrections may have to involve limiting or stopping current estimates.

We aim for a very flexible solution where both estimation and corrections are done in a pipelined approach with shared knowledge between each stage, referred to as Filters with a shared bus.
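The pipelined approach above can be sketched roughly as follows. This is an illustrative C++ sketch only; the names Consumer, Producer, Doubler, and Sink are assumptions for this example, not the real Serenity API:

```cpp
#include <vector>

// A stage that can receive values of type T from upstream.
template <typename T>
class Consumer {
 public:
  virtual ~Consumer() {}
  virtual void consume(const T& in) = 0;
};

// A stage that forwards values of type T to registered downstream consumers.
template <typename T>
class Producer {
 public:
  virtual ~Producer() {}
  void addConsumer(Consumer<T>* consumer) { consumers_.push_back(consumer); }

 protected:
  void produce(const T& out) {
    for (Consumer<T>* consumer : consumers_) consumer->consume(out);
  }

 private:
  std::vector<Consumer<T>*> consumers_;
};

// Toy filter: doubles its input and forwards it down the pipeline.
class Doubler : public Consumer<double>, public Producer<double> {
 public:
  void consume(const double& in) override { produce(in * 2.0); }
};

// Toy sink: records the last value it received.
class Sink : public Consumer<double> {
 public:
  double last = 0.0;
  void consume(const double& in) override { last = in; }
};
```

Chaining filters this way lets each stage stay small while sharing knowledge through the values (and, in Serenity's case, a shared bus) flowing between them.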

Serenity pipeline

For more documentation, please refer to docs.

Installing

Quickstart: build-and-test with Docker

With the Serenity repository cloned locally:

cd serenity
docker build .

The Dockerfile located in the project root is based on the mesosphere/mesos-modules-dev image. For convenience, this image has the newest Mesos (built from master) pre-built with unbundled dependencies. See the contents of that Dockerfile here.

Prerequisites

Building Mesos modules requires system-wide installation of google-protobuf, glog, boost and picojson.

Currently, Serenity supports Mesos 0.27.x only (incompatible Stout & libprocess changes appeared in older versions).

Build Mesos with some unbundled dependencies

Preparing Mesos source code

Start by pulling a recent version of Apache Mesos:

git clone https://git-wip-us.apache.org/repos/asf/mesos.git ~/mesos

Building and Installing Mesos

Because modules need access to a couple of libprocess dependencies, Mesos itself should be built with unbundled dependencies to reduce the chance of problems introduced by varying versions (libmesos vs. module library).

We recommend using the following configure options:

cd ~/mesos
mkdir build && cd build
../configure --with-glog=/usr/local --with-protobuf=/usr/local --with-boost=/usr/local
make
make install

Building Serenity with Cmake

Once Mesos is built and installed, clone the Serenity repository.

Build Serenity with these commands:

mkdir build && cd build
cmake -DWITH_MESOS="/usr" ..
make

Run the tests:

make test

Deploying Serenity Module

Create a JSON file that describes the shared library and its parameters to the Mesos slave process:

{
    "libraries": [
    {
        "file": "./build/libserenity.so",
        "modules": [
        {
            "name": "com_mesosphere_mesos_SerenityEstimator"
        },
        {
            "name": "com_mesosphere_mesos_SerenityController"
        }
      ]
    }
  ]
}

You can reuse the sample serenity.json.in. To use Serenity, add these lines to your mesos-slave command-line options:

--modules=file://serenity.json.in \
--resource_estimator="com_mesosphere_mesos_SerenityEstimator" \
--qos_controller="com_mesosphere_mesos_SerenityController"

Deploying Serenity Module using Deployment Scripts

The Serenity-Formula project is useful for Mesos & Serenity deployment. It can be used to prepare a cluster for Serenity end-to-end tests.

You are welcome to use it and to contribute in case of any bug or enhancement.

Contributing

Send pull requests for code review before merging. Make sure that commits describe the changes and can be applied atomically.

The code base follows the Google C++ Style Guide and is linted by cpplint.

Before submitting code, make sure to run:

$ # Run style checker
$ ./scripts/lint.sh
$ # Make sure newly added APIs are documented
$ doxygen

To install the style checker as a git pre-commit hook:

$ ln -s scripts/pre-commit .git/hooks/pre-commit

Details about using Mesos Modules

See Mesos Modules

Serenity Smoke Test Framework

Serenity includes a test Mesos framework with a convenient JSON API.

For more documentation, please refer to docs.

Serenity's People

Contributors: bwplotka, connordoyle, nqn, pittma, skonefal


Serenity's Issues

Are names "Decider", "Observer", "ChangePointDetector" proper for our specific pipeline components?

ChangePointDetector : Object responsible for detecting drops in given stream of "doubles" using different algorithms.

Observer : An Observer should rather produce signals/suggestions; it does not produce QoS corrections. It's rather "As an observer, I observe degradation of the CPI metric, which could be a result of resource contention, so I raise a ResourceContention signal" or "As an observer, I see this amount of slack".

Decider : "As a RE decider I decide to expose this amount of observed slack to Master"
"As a QoS decider I decide to kill these executors because of this contention signal"

We would like to know your opinion. For us (as Polish speakers) the names are understandable, but you, as native speakers, may feel differently.

QoS Controller guarantees

Serenity needs to add guarantees to the QoS controller. We need to make sure that revocation of BE tasks lasts until:
a) the signal that raised the interference has returned to normal, or
b) all BE tasks are revoked

Logic that has to be inside the controller:

  • The QoS controller should kill BE tasks in groups, from youngest to oldest (already done)
  • The QoS controller should revoke multiple tasks at once; the number of tasks revoked per decision should be based on the strength of the interference
  • After killing BE tasks, the QoS controller should wait (e.g. 15s) before the next kill to see if the signal has changed, and kill again only if the interference signal prevails
  • While issuing the BE revocation, the QoS controller should also tell the RE (through the communication bus) to stop sending slack estimations, so we can observe our signals without any interference
  • When Serenity has recovered, the QoS controller should send a message to the RE to re-enable oversubscription
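The cooldown step (waiting e.g. 15s between kill decisions so the signal has time to react) could be sketched as below. This is a hypothetical helper, not Serenity code:

```cpp
#include <cstdint>

// Guard that enforces a minimum delay between revocation decisions.
struct RevocationCooldown {
  int64_t lastRevocationSec = -1;  // -1 means no revocation has happened yet.
  int64_t cooldownSec = 15;        // Assumed cooldown window for this sketch.

  // True if enough time has passed since the last revocation.
  bool mayRevoke(int64_t nowSec) const {
    return lastRevocationSec < 0 || nowSec - lastRevocationSec >= cooldownSec;
  }

  void recordRevocation(int64_t nowSec) { lastRevocationSec = nowSec; }
};
```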

Extend ResourceUsage class

ResourceUsage in Serenity is currently a Mesos protobuf message.
We should extend it and add some features:

  • Executor age information
  • Helper methods for executors (moved from ResourceUsageHelper)
    TBD.

Subtract resources used by BE executors in ResourceEstimator

Currently, the RE counts only the slack from production executors (from observers/slack_resource.cpp).
The RE should subtract the BE resource usage from this value (it is currently done in estimator/serenity_estimator.cpp, which is not the best place for it).
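The intended arithmetic might look like this (a hedged sketch; the function name and parameters are made up for illustration):

```cpp
// Slack is estimated from production (PR) executors; usage by best-effort
// (BE) executors is subtracted so already-consumed resources are not
// offered to the master again. Clamped at zero.
double revocableSlack(double prAllocated, double prUsed, double beUsed) {
  double slack = prAllocated - prUsed - beUsed;
  return slack > 0.0 ? slack : 0.0;
}
```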

Refactor Exponential Moving Average filter

The current moving average filter code (ema.cpp/ema.hpp) is a mix of a "base" filter class and the "exponential moving average" algorithm.
We should refactor the code:

  • Create a separate .hpp and .cpp for a SmoothingFilter base class, and a separate folder for smoothing strategies (e.g. ExponentialMovingAverage)
  • SmoothingFilter should accept multiple getter&setter pairs to be smoothed
  • SmoothingFilter should accept a smoothing_strategy and expose an addMetric(getter_setter_tuple) function

When this is done, all instances of setEmaIps, getEmaIps, setEmaCpuUsage, etc. should be removed.
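For reference, the exponential moving average itself is a one-liner per update. A minimal sketch (not the actual ema.cpp code; alpha is the smoothing factor):

```cpp
// EMA(t) = alpha * sample(t) + (1 - alpha) * EMA(t-1), seeded with the
// first sample.
class ExponentialMovingAverage {
 public:
  explicit ExponentialMovingAverage(double alpha) : alpha_(alpha) {}

  double update(double sample) {
    if (!initialized_) {
      value_ = sample;
      initialized_ = true;
    } else {
      value_ = alpha_ * sample + (1.0 - alpha_) * value_;
    }
    return value_;
  }

 private:
  double alpha_;
  double value_ = 0.0;
  bool initialized_ = false;
};
```

Separating this algorithm behind a smoothing-strategy interface would let the base SmoothingFilter swap it for other strategies without touching pipeline code.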

Throttling BE tasks (under CPU stress) may not be enough.

Our throttled BE tasks can still interfere with the LC PR task because the Linux scheduler can still run them for short periods. They can cause context switching and interfere with PR tasks. (There may be no IPC drop even though interference appears.)
Mitigations:

  • We should try to find a way to remove BE tasks when too many of them are on our node (even throttled)
  • Move BE jobs into the Freezer cgroup?

We should run experiments on how BE task context switches affect an LC task at different levels of PR utilization and different numbers of CPU-stress BE tasks.

Add CpuUsageNormalization Filter

Currently, some statistics (e.g. cpu_usage) are cumulative - that means a value is the sum of all values since the executor began. Others are sampled/counted - a value comes from a specific timestamp/timeframe.

We should create a filter that takes cumulative values (cpu_usage) and normalizes them to sampled/counted values. This should simplify filter development and debugging because all values would be treated the same way.

Use cases:

  • As a new Serenity developer, I don’t want to double-check which metrics are cumulative and which are sampled - I would like to use them as easily as possible
  • As a Serenity developer, I don’t want to add extra logic to every filter that uses cpu_usage just because this metric is counted differently
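A minimal sketch of the proposed normalization (CumulativeToRate is a hypothetical type; the real filter would operate on ResourceUsage messages):

```cpp
// Converts a cumulative counter into a per-interval rate:
// rate = (current - previous) / elapsed seconds.
struct CumulativeToRate {
  double prevValue = 0.0;
  double prevTimeSec = 0.0;
  bool hasPrev = false;

  // Returns -1.0 until two samples are available (a convention assumed
  // only for this sketch).
  double update(double value, double timeSec) {
    if (!hasPrev) {
      prevValue = value;
      prevTimeSec = timeSec;
      hasPrev = true;
      return -1.0;
    }
    double rate = (value - prevValue) / (timeSec - prevTimeSec);
    prevValue = value;
    prevTimeSec = timeSec;
    return rate;
  }
};
```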

Building Serenity with mesos-1.3.0

Hello,

I have built Serenity with mesos-1.3.0 but when I try to load the libserenity.so to the agent I get the following error: undefined symbol: _ZNK6google8protobuf7Message13SpaceUsedLongEv

Is it actually possible to build with mesos-1.3.0, or do I have to use mesos-0.27.x?

Refactor data transformation between components in pipelines.

Currently our solution is not flexible. We are reusing some double fields in ResourceUsage (e.g. net statistics) for EMA purposes. For better code readability we should reuse the same fields and add Tags recording what filtering was done on the given data, e.g.:
Source -> RU with perf instructions, tag<> -> EMA -> RU with EMA instructions, tag<ema>
The aforementioned mitigation would let us configure the pipeline even without recompiling.

To elaborate on tagging, here's a use case:
"As a Serenity module developer, I would like to register for ResourceUsage information from two different sources, and I would like to know which message is from which filter (e.g. raw, ema)." It's about adding distinguishable labels to otherwise indistinguishable messages.
I'm reluctant about it, because it leads to hardcoded pipelines - but it's an idea that comes up from time to time (though I haven't seen a real "I need this" yet).

Create Pipeline functional tests

We have tested most of our components using unit tests, but we need to make sure that our pipelines produce correct results.

We need a few tests to check that the Estimator returns slack when:

  • No executors are on the machine
  • 3 PR executors are on the machine
  • 3 PR executors are on the machine, plus some revocable executors
  • 3 PR executors are on the machine, but two of them started very recently
  • 3 PR executors are on the machine, but two of them started very recently, plus revocable executors

And for the QoS Controller:

  • No executors are on the machine
  • 3 PR executors on the machine, no IPC drop
  • 3 PR executors on the machine, IPC drop
  • 3 PR executors on the machine, revocable executors, no IPC drop
  • 3 PR executors on the machine, revocable executors, IPC drop; some revocations expected

Make consistent error handling in all pipeline components.

We should design how serenity filters will behave in case of any error during pipeline flow.

Some definitions:
Continuing the pipeline -> run produce() and pass some form of ResourceUsage further.
Aborting the pipeline -> return Nothing() and wait for the next usage.

There are plenty of issues here:

  • How do we validate that ResourceUsage_Executor executors have the required fields filled? Different filters require different fields.
    1. One validation filter at the front of the pipeline, which strictly validates each executor given in ResourceUsage.
       • Faster pipeline - no need to implement redundant validation in each filter!
       • Less code in filters.
    2. Validation in each filter (how it currently works).
  • How do we handle error situations? E.g. a required field is missing in one executor, or not all executors have the needed fields filled (which is required, for example, by the UtilizationThreshold filter). How do we handle cutting off oversubscription?
    1. We abort the pipeline (current behaviour).
    2. We continue the pipeline with bad executors filtered out, or even an empty ResourceUsage.
    3. We continue the pipeline with data in an Error state (IMO a good idea).
  • How do we handle filters that need 2 or even 3 pipeline iterations before they can calculate anything (e.g. the EMA IPC filter needs to calculate IPC in a one-second window)?
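Option 3 (continuing with data in an Error state) could be modeled roughly like this (hypothetical types, not existing Serenity code):

```cpp
#include <string>

// Carries a value through the pipeline together with an error state, so a
// downstream filter can decide to skip, log, or repair instead of aborting.
template <typename T>
struct PipelineItem {
  T value{};
  bool ok{false};
  std::string error;
};

// Wrap a successfully produced value.
template <typename T>
PipelineItem<T> okItem(T value) {
  PipelineItem<T> item;
  item.value = value;
  item.ok = true;
  return item;
}

// Wrap an error description; the value stays default-constructed.
template <typename T>
PipelineItem<T> errorItem(const std::string& message) {
  PipelineItem<T> item;
  item.error = message;
  return item;
}
```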

Standardize constructors in Serenity

Our current filters' constructors and pipelines are messy. They accept a lot of parameters, have many of them defaulted, and it's hard to tell what they mean. It is also hard to see how our pipelines are created.
I would like to propose a standardization:

Each component would expose an API:

  • A zero-parameter constructor that initializes the default component (with the best parameters we know of)
  • A constructor that accepts a SerenityConfig. Parameter defaults would be overwritten by the values in the SerenityConfig. This should be used to expose configuration to operators in the future.
  • A set of "setters" exposing which parameters can be changed "in code"
  • If a filter is designed to use dependency injection (like the QoS Controller or the Smoothing filter), it should accept the DI object in both constructors (but no configuration for it)

For pipelines, we should move from the 'constructor initialization list' to the constructor body and create the pipeline in logical "phases" to make it easier to read and understand.
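A minimal sketch of the proposed constructor convention (SerenityConfig is modeled here as a plain key/value map; the real configuration type may differ):

```cpp
#include <map>
#include <string>

// Stand-in for the real configuration type, for illustration only.
using SerenityConfig = std::map<std::string, double>;

class SmoothingFilter {
 public:
  // Zero-parameter constructor: the best default parameters we know.
  SmoothingFilter() {}

  // Config constructor: defaults are overwritten by operator-supplied values.
  explicit SmoothingFilter(const SerenityConfig& config) {
    auto it = config.find("alpha");
    if (it != config.end()) alpha_ = it->second;
  }

  // Setter exposing what can be changed "in code".
  void setAlpha(double alpha) { alpha_ = alpha; }
  double alpha() const { return alpha_; }

 private:
  double alpha_ = 0.2;  // Assumed default, for illustration only.
};
```

With this convention, reading a pipeline constructor body tells you exactly which parameters deviate from the defaults.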

Implement flattenedConsumables() in Consumer class

#149 implemented the std::vector consumables() function.

Because some products are vectors, some components are consumers of vector<vector>.
For easier development, we should have a flattenedConsumables() that returns the flattened result.
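The flattening itself is straightforward; a sketch (flatten here is a hypothetical free function, whereas the real API would be a flattenedConsumables() method on Consumer):

```cpp
#include <vector>

// Collapse a vector of vectors into a single vector, preserving order.
template <typename T>
std::vector<T> flatten(const std::vector<std::vector<T>>& nested) {
  std::vector<T> flat;
  for (const auto& inner : nested)
    flat.insert(flat.end(), inner.begin(), inner.end());
  return flat;
}
```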
