big-brother's Introduction

Big Brother

This project defines a service to effectively communicate observability events to application stakeholders.

How does it work

Basically, it collects the necessary metrics from a client-provided BB Promster cluster endpoint.

More specifically, the Cortex app monitors and stores metrics sent by the BB Promster Clusters, and starts collecting Big Brother specific metrics, with the help of some useful programming libraries.

These metrics are treated as the fundamental protocol behind Big Brother's capabilities.

Big Brother Metric Protocol

A valid Big Brother library should expose the following metrics:

request_seconds_bucket{type, status, isError, errorMessage, method, addr, le}
request_seconds_count{type, status, isError, errorMessage, method, addr}
request_seconds_sum{type, status, isError, errorMessage, method, addr}
response_size_bytes{type, status, isError, errorMessage, method, addr}
dependency_up{name}
dependency_request_seconds_bucket{name, type, status, isError, errorMessage, method, addr, le}
dependency_request_seconds_count{name, type, status, isError, errorMessage, method, addr}
dependency_request_seconds_sum{name, type, status, isError, errorMessage, method, addr}
application_info{version}

In detail:

  1. request_seconds_bucket is a metric that defines the histogram of how many requests are falling into the well defined buckets represented by the label le;
  2. request_seconds_count is a counter that counts the overall number of requests with those exact label occurrences;
  3. request_seconds_sum is a counter that counts the overall sum of how long the requests with those exact label occurrences are taking;
  4. response_size_bytes is a counter that computes how much data is being sent back to the user for a given request type. It captures the response size from the content-length response header. If there is no such header, the value exposed as metric will be zero;
  5. dependency_up is a metric that registers whether a specific dependency is up (1) or down (0). The label name registers the dependency name;
  6. dependency_request_seconds_bucket is a metric that defines the histogram of how many requests to a specific dependency are falling into the well defined buckets represented by the label le;
  7. dependency_request_seconds_count is a counter that counts the overall number of requests to a specific dependency;
  8. dependency_request_seconds_sum is a counter that counts the overall sum of how long requests to a specific dependency are taking;
  9. Finally, application_info holds static info of an application, such as its semantic version number.
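
To make the relationship between the _bucket, _count and _sum series concrete, here is a minimal sketch in plain Python that records request durations and renders them in the Prometheus text format. It is not how any of the official libraries are implemented; the class name RequestSeconds and the bucket bounds are assumptions chosen for illustration, and label values are not escaped.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds in seconds; the protocol text does not fix them.
BUCKETS = (0.1, 0.3, 1.5, 10.5)


class RequestSeconds:
    """Cumulative histogram kept per exact label combination."""

    def __init__(self, buckets=BUCKETS):
        self.buckets = tuple(sorted(buckets))
        # One slot per finite bucket plus a final slot for +Inf.
        self.hits = defaultdict(lambda: [0] * (len(self.buckets) + 1))
        self.sums = defaultdict(float)

    def observe(self, labels, seconds):
        key = tuple(sorted(labels.items()))
        # bisect_left returns the first bucket whose bound is >= seconds.
        self.hits[key][bisect_left(self.buckets, seconds)] += 1
        self.sums[key] += seconds

    def render(self):
        lines = []
        for key, hits in self.hits.items():
            base = ",".join(f'{k}="{v}"' for k, v in key)
            running = 0
            for bound, n in zip(self.buckets + ("+Inf",), hits):
                running += n  # buckets are cumulative: le="x" counts everything <= x
                lines.append(f'request_seconds_bucket{{{base},le="{bound}"}} {running}')
            lines.append(f"request_seconds_count{{{base}}} {running}")
            lines.append(f"request_seconds_sum{{{base}}} {self.sums[key]}")
        return "\n".join(lines)
```

Calling observe() once per handled request and serving render() at the metrics endpoint would yield one bucket line per bound, followed by the count and the sum for that label combination.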

Labels

For a specific request:

  1. type tells which request protocol was used (e.g. grpc, http, etc);
  2. status registers the response status (e.g. HTTP status code);
  3. method registers the request method;
  4. addr registers the requested endpoint address;
  5. version tells which version of your app handled the request;
  6. isError lets us know if the status code reported is an error or not;
  7. errorMessage registers the error message;
  8. name registers the name of the dependency;
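
As an illustration of how these labels could be derived for a single request, the sketch below builds the label set from a few request and response attributes and reads the response size from the content-length header. The function names are hypothetical, and the rule that statuses of 400 and above count as errors is an assumption; the protocol text only says that isError flags error statuses.

```python
def build_labels(protocol, method, addr, status, error_message=""):
    """Build the per-request labels described above."""
    # Assumption: HTTP-style statuses of 400 and above are treated as errors.
    is_error = int(status) >= 400
    return {
        "type": protocol,
        "status": str(status),
        "method": method,
        "addr": addr,
        "isError": "true" if is_error else "false",
        "errorMessage": error_message if is_error else "",
    }


def response_size(headers):
    """Return the content-length value, or zero when the header is missing."""
    for name, value in headers.items():
        if name.lower() == "content-length":
            return int(value)
    return 0
```

For example, build_labels("http", "GET", "/app", 404, "not found") marks the request as an error, while a 200 response gets isError="false" and an empty errorMessage.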

Ecosystem

The following are Big Brother's official libraries:

  1. express-monitor for Node JS Express apps;
  2. servlet-monitor for Java Servlets apps;
  3. quarkus-monitor for Java Quarkus apps;
  4. flask-monitor for Python Flask apps;
  5. mux-monitor for the Golang Mux apps;
  6. fiber-monitor for the Golang Fiber apps;
  7. gin-monitor for the Golang Gin apps;
  8. [TODO] iris-monitor for Golang Iris apps;

Without these, you would have to expose the metrics by yourself, possibly leading to inconsistencies and other errors when setting up your app's observability infrastructure with Big Brother.

Components

The Big Brother app is composed of an ETCD cluster, a Dialogflow Bot, a Prometheus Alertmanager, a Grafana, a Promster cluster, a Cortex, and a BB Manager, all with their own configuration needs.

ETCD

The ETCD cluster serves 3 purposes:

  1. Register client bb-promster clusters;
  2. Register versions of the apps, for updating alerts dynamically;
  3. [TODO] Register Big Brother's alertmanager cluster, for high availability;

Gets configured by:

  1. ETCD_LISTEN_CLIENT_URLS: the addresses on which the ETCD daemon listens for client traffic;
  2. ETCD_ADVERTISE_CLIENT_URLS: the list of this member's client URLs to advertise to the rest of the cluster;

Dialogflow Bot

A bot to communicate with the interested stakeholders. Its purposes are to:

  1. Enable CRUD of client apps to be observed by Big Brother; and
  2. Alert on possible problems;

Prometheus Alertmanager

A service to host alerting configuration on top of the alerts being dispatched by the Promster Cluster.

Gets configured by:

  1. WEBHOOK_URL: the bot address

Grafana

A service that generates graphs to help you query, visualize and understand your metrics.

BB-Promster Cluster

The service that federates the client's bb-promster cluster, hosts and evaluates alerting rules, and dispatches alerts accordingly.

Gets configured by:

  1. [TO BE DEPRECATED] BB_PROMSTER_LEVEL: integer greater than 0 that defines which level this promster sits on in its own federation cluster;
  2. ETCD_URLS: defines the ETCD cluster urls, separated by comma;
  3. ALERT_MANAGER_URLS: defines the alertmanager cluster urls;
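
A minimal sketch of how these settings could be read and validated from the environment. The function name, the default level of 1, and the assumption that ALERT_MANAGER_URLS is comma-separated like ETCD_URLS are illustrative choices, not documented behavior of BB-Promster.

```python
import os


def split_urls(raw):
    """Split a comma-separated URL list, dropping blanks and surrounding spaces."""
    return [u.strip() for u in raw.split(",") if u.strip()]


def load_promster_config(env=os.environ):
    """Read the BB-Promster settings listed above from an environment mapping."""
    # Assumption: a missing BB_PROMSTER_LEVEL defaults to 1.
    level = int(env.get("BB_PROMSTER_LEVEL", "1"))
    if level < 1:
        raise ValueError("BB_PROMSTER_LEVEL must be an integer greater than 0")
    return {
        "level": level,
        "etcd_urls": split_urls(env.get("ETCD_URLS", "")),
        # Assumption: same comma-separated format as ETCD_URLS.
        "alert_manager_urls": split_urls(env.get("ALERT_MANAGER_URLS", "")),
    }
```

Passing a plain dict instead of os.environ makes the parsing easy to exercise without touching the real environment.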

Cortex

The service that monitors and stores metrics sent by the BB Promster Clusters.

BB-Manager

A front-end interface to register apps and versions.

How to Run locally

Using Docker

  1. Talk to Telegram's Bot Father, create your own bot and get its Telegram Token;

  2. Open a Dialogflow account, create a new project and import the configs from the folder bot/dialogflow;

  3. Train your intents;

  4. Setup a Telegram integration with the Token obtained in step 1;

  5. Expose your port 3001 and provide a reachable HTTPS address in the Dialogflow fulfillment configuration. We recommend using ngrok for that;

  6. Type the following commands in your terminal to interact with your bot directly through Telegram:

    TELEGRAM_TOKEN=<XXXXX:YYYYYY> docker-compose up -d --build

    This will run an example app with its own bb-promster cluster and the Big Brother app with its components.

  7. Now go to the bot on Telegram and add a new App. Provide the App name (e.g. Example) and the app address (e.g. example-bb-promster:9090). You'll be automatically subscribed to the app you've just added.

[TO BE DEPRECATED] The example client app's bb-promster cluster will get registered in Big Brother's ETCD, and Big Brother will then start collecting metrics by federating it.

Open your browser on http://localhost:3000 to access the provided Grafana dashboard (user bigbrother, password bigbrother).

Also, access http://localhost:3001/test in your browser to dispatch test alerts and check that they arrive in your Telegram chat.

Using Kubernetes

Follow this tutorial to run using Kubernetes.

Trivia

The name is inspired by George Orwell's 1984 Big Brother character.

In this book, Big Brother is an omniscient entity, able to watch everyone, everywhere.

This is exactly what we aim to achieve with this project: a way for you to easily and effectively watch every project you have without any prior knowledge of observability concepts and Prometheus best practices.

big-brother's People

Contributors

eabili0, gilliardmacedo, gutorc92, karinevalenca, mfurquimdev, rafamarts, ralphg6

big-brother's Issues

Setup Oathkeeper

Oathkeeper needs to be set up to listen for requests to /api/prom/push and check for a valid Access Token issued by Big Brother's Hydra.

The token should be present in the HTTP Authorization header with a Bearer prefix.
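
For illustration only, the header check could look like the sketch below. It assumes the common "Authorization: Bearer <token>" form and a hypothetical function name; in the proposed setup Oathkeeper would perform this check through its own configuration, and validating the token against Hydra is out of scope here.

```python
def extract_bearer_token(headers):
    """Return the token from an Authorization header, or None if absent or malformed."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    # The scheme name is compared case-insensitively; an empty token is rejected.
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()
```

A request to /api/prom/push would be rejected whenever this returns None, and the returned token would otherwise be handed to the token validation step.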

All PRs should be directed to the multi-tenancy branch.

Disable/exclude metrics buckets

From labbsr0x/servlet-monitor#31

Only more advanced users of Prometheus metrics really use buckets/histograms. By removing the bucket metrics, we would reduce the processing and byte output of our metrics endpoint by over 60%, with no loss in quality.

Currently we have something like this for each metric/label combination:

request_seconds_bucket 1
request_seconds_bucket 2
request_seconds_bucket 3
request_seconds_bucket 4
request_seconds_bucket +inf
request_seconds_sum
request_seconds_count
response_size_bytes

By removing buckets, we would reduce to something like

request_seconds_sum
request_seconds_count
response_size_bytes

Proposal

  • Buckets metrics made optional for request_seconds and dependency_request_seconds
  • Disable buckets output by default (will only generate _sum and _count metrics for each request)
  • Only enable buckets if configured explicitly using existing param "buckets" with values like "0.1,0.3,2,10" for example
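
A minimal sketch of the proposed parsing, assuming a hypothetical helper named parse_buckets that receives the raw value of the "buckets" param. It returns None when buckets are disabled, so callers can skip the _bucket series entirely.

```python
def parse_buckets(param=None):
    """Return sorted bucket bounds, or None when buckets are disabled."""
    if not param or not param.strip():
        # Proposed default: only _sum and _count are generated.
        return None
    bounds = sorted({float(p) for p in param.split(",") if p.strip()})
    return bounds or None
```

With this shape, parse_buckets("0.1,0.3,2,10") enables four buckets, while an unset or empty param keeps the output down to _sum and _count.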

Integrate ETCD entries for BB-Promster and Bot Service

To enable a simpler architecture with Cortex, as proposed by @gutorc92 in #7, the way BB Promsters get registered in ETCD should follow a structure that the Bot Service can use to recognize new apps, allowing bot users to list and subscribe to apps without having to explicitly add them through the bot.

Important: BB Promster's easy federation abilities should not be lost with this integration.

Create submodules for bot and alertmanager

The Big Brother repo should be the central place for all the components, holding them as submodules.

Each component should have its own automated release cycle using GitHub Actions. For that, each component should have its own GitHub repo.

Add simple security layer to create App action

When a user requests the creation of a new app via Telegram, the Bot Service should:

  1. register the client at Hydra;
  2. perform a client_credentials flow to get a valid Access Token;
  3. return the access token to the user as a result.
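
Step 2 could be sketched as follows. The snippet only builds the OAuth 2.0 client_credentials token request and does not send it. The token URL (host, port and path) is an assumption about where Hydra's public token endpoint would be reachable, and sending the client credentials via HTTP Basic authentication is likewise an assumption about how the client is registered.

```python
import base64
import urllib.parse
import urllib.request

# Assumption: placeholder address for Hydra's public token endpoint.
TOKEN_URL = "http://hydra:4444/oauth2/token"


def build_token_request(client_id, client_secret, token_url=TOKEN_URL):
    """Build (but do not send) a client_credentials token request."""
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    # Client authentication via HTTP Basic: base64("client_id:client_secret").
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Sending the request with urllib.request.urlopen and reading the access_token field of the JSON response would give the value that step 3 returns to the user.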

All PRs should be directed to the cortex branch.

Add quarkus extension for Big Brother

Marcelo Rubim and I made a Quarkus extension for Big Brother; it is in Rubim's repo.
We used servlet-monitor as an example to create this extension.
We are thinking of moving this project to the labbsr0x repo, but some adjustments may be necessary.
Could you check whether this project is suitable for Big Brother?

Add k8s deployment and service configs

To support the latest computing trends, Big Brother should have a k8s folder with all the deployment, statefulset and service configs needed to run Big Brother in a k8s environment.

Application version info

According to the big-brother protocol, the application version info is set as histogram and counter labels.
e.g.

request_seconds_bucket{type="http",status="404",method="GET",addr="/app", version="1.0.1",isError="true",le="1.0",} 2.0

Go and other ecosystems follow a common pattern for exposing the application version: a gauge metric that carries the version as a label and always has the value 1.
The pattern is described in this post.

e.g.

application_info{version="v1.0.1"} 1

I think it would be interesting for big-brother to follow the same pattern.
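
A small sketch of that pattern, rendering the info-style gauge as a single line of Prometheus text format. The function name is hypothetical.

```python
def application_info_line(version):
    """Render the info gauge: the version lives in a label and the value is always 1."""
    # Escape backslash, double quote and newline, as required for label values.
    escaped = (
        version.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'application_info{{version="{escaped}"}} 1'
```

Because the value never changes, the version no longer needs to be repeated as a label on every histogram and counter series.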

Possibility of adding a description to DependencyChecker

Evaluate the possibility of adding a description to the result of DependencyChecker.run(), especially when DependencyState is DOWN.

For example, when an Exception occurs while run() executes, catch the Exception and use getMessage() as the description.

Or, if the API user wishes, add more information to the returned DependencyState.
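
The idea can be sketched as follows. The library in question is written in Java, so this Python version only illustrates the shape of the proposal; DependencyResult and run_checker are hypothetical names, and only DependencyState with its UP and DOWN values mirrors the issue text.

```python
from dataclasses import dataclass
from enum import Enum


class DependencyState(Enum):
    UP = 1
    DOWN = 0


@dataclass
class DependencyResult:
    state: DependencyState
    description: str = ""


def run_checker(check):
    """Run a zero-argument check; map any exception to DOWN plus its message."""
    try:
        check()
    except Exception as exc:
        # The exception message becomes the description, as the issue suggests.
        return DependencyResult(DependencyState.DOWN, str(exc))
    return DependencyResult(DependencyState.UP)
```

A check that raises ConnectionError("timeout") would then be reported as DOWN with the description "timeout", which could also be surfaced alongside dependency_up.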

Wrong build context in "alertmanager" service

While trying to run docker-compose build, the following error appears:

ERROR: build path /big-brother/alertmanager either does not exist, is not accessible, or is not a valid URL.

The value of the build context inside alertmanager service is outdated.

Should Big Brother come in two flavors?

From #38, we noticed that Big Brother somewhat silently supports both the Prometheus Federation mechanism and also HA with Cortex.

Should we keep supporting both? If yes, we have tons of improvement to make to our Readmes, examples and alerting rules (hosted at BB-Promster vs hosted at Cortex's ruler service).

If not, we should review our README, prioritize issue #27, and completely remove federation references from both our docker and k8s guidance material (example deployment files, helm charts, docker-compose, README, etc.).

Create gorilla mux middleware

Labbs' Whisper project uses a self-created observability lib. We want to use the Big Brother's protocol, but there is no lib available. With that in mind, we should:

  1. Create a new labbsr0x repo;
  2. Create a valid go module lib that exposes the Big Brother metrics at a configurable endpoint and all the features the other libs provide;
  3. When done and tested, add this new lib as a submodule to this repo.

Dev Chart

As we have made efforts to deploy Cortex on k8s, we could create a chart to deploy a test stack.

This stack will include generator-metrics, bb-promster and etcd.
The chart has to meet the following requirements:

  • 1 bb-promster for each generator-metrics

  • 1 etcd deployment and service

  • configurable number of generator-metrics replicas

Send Grafana to its own repo

Grafana and its dashboards, as they stand today, should be moved to their own repo and added as a valid dashboard to the grafana.com marketplace.

Front-end proposal

The bb team needs to update Prometheus targets for some projects. Today this is done by generating a file and copying it to the Prometheus container. Eventually, the Big Brother project will need those targets in ETCD, so a possible solution that helps the bb team and keeps the targets is a front-end that interacts with ETCD. The bb-bot project already has endpoints for ETCD operations, so the solution only needs one more endpoint to delete IPs from an app. The front-end can be implemented with a lightweight framework like Svelte. Its requirements are:

  • List apps

  • List the IPs of an app

  • Add an IP to an app

  • Generate a JSON file to update Prometheus with a given number of IPs
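
The last requirement could be sketched as below, assuming the generated file follows Prometheus' file-based service discovery layout (a JSON list of target groups). The issue does not say which format the current file uses, so that layout, the function name, the default port 9090 and the "app" label are all assumptions.

```python
import json


def build_targets_file(app_name, ips, port=9090, limit=None):
    """Return file_sd-style JSON for up to `limit` of the app's IPs."""
    chosen = list(ips)[:limit]  # limit=None keeps every IP
    groups = [
        {
            "targets": [f"{ip}:{port}" for ip in chosen],
            "labels": {"app": app_name},  # hypothetical label name
        }
    ]
    return json.dumps(groups, indent=2)
```

The front-end would call something like this with the IPs stored in ETCD for the selected app and offer the result as a download.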
