GCC Azure - Central Workload Health Dashboard (AZCWHD)

CWHD is a custom Azure monitoring solution that uses Grafana dashboards to show the health of applications and the Azure resources they depend on.

Dashboard tiles are color-coded Green, Amber or Red depending on:

  • overall resource health from Azure Resource Health signals
  • app health from App Insights Standard Test (HTTP ping) web app availability signals
  • for VMs only: configurable thresholds for CPU, Memory and Disk usage; the tile turns Amber when a threshold is met
  • the overall availability of an application, aggregated from the Resource Health of the one or more Azure resources it uses

The dashboards are organized into Level 0 and Level 1, reflecting the "depth" of monitoring.

  • Level 0 - shows the availability status of all apps.
  • Level 1 - drills into the Resource Health of each Azure resource used by an app.


Prerequisites

  • Required Telemetry / Logs

  • Azure Resources Required

    • a "central" Log Analytics Workspace
    • Azure Managed Grafana
      • enable Managed Identity
      • add Azure role assignment (RBAC) for Grafana Managed Identity with Monitor Reader to:
        • Subscriptions containing resources under monitoring
        • Log Analytics Workspace (if workspace in different subscription from above)
    • Azure Function - App Service Plan S1
      • enable Managed Identity
      • add Azure role assignment (RBAC) for Function Managed Identity with Monitor Reader to:
        • Subscriptions containing resources under monitoring
        • Log Analytics Workspace (if workspace in different subscription from above)
    • All Application Insights must be linked to the same central Log Analytics Workspace
    • Create App Insights Standard Tests to run availability tests for all App Services and Web Apps. (Standard Test logs are stored in the AppAvailabilityResults table.)
  • Assumption

    • an existing Log Analytics Workspace to which "all" Application Insights resources are linked

Architecture

image

CWHD uses a variety of Azure resources, including a core Azure Function named Resource Health Retriever. It acts as a health status aggregator, retrieving and aggregating metrics and health statuses from different data sources depending on the resource types being monitored.

For health status, the Resource Health Retriever function supports the following:

  • "General" resource types (all non-App Service types): health status comes from Azure Resource Health via the Resource Health REST API.

  • App Service: the function queries the Log Analytics AppAvailabilityResults table to get the latest Standard Test result. The Resource Health API is not used here because it still reports "Available" when an App Service is stopped; this behaviour is by design. The requirement is to show "Unavailable" when an App Service is stopped.

  • VM: health status is determined by 2 factors

    • Resource Health availability status determines whether the VM is available, giving the Green or Red status.
    • If the Resource Health status is Available/Green, 3 additional metrics (CPU, Memory and Disk usage percentage) are checked against a set of configurable thresholds. In Grafana, the VM Stat visualization shows Amber if one or more of the 3 metrics reaches its threshold.
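The rules above can be read as a small decision function per resource kind. The following is a minimal Python sketch of that logic only, not the actual Resource Health Retriever code. The names `Status`, `VmThresholds`, `general_status`, `app_service_status` and `vm_status` are hypothetical, the default threshold values are placeholders (the README only says they are configurable), and treating every non-"Available" state or a missing Standard Test result as Red is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    GREEN = "Green"
    AMBER = "Amber"
    RED = "Red"


@dataclass
class VmThresholds:
    # Placeholder defaults (assumption); the real values are configurable.
    cpu_percent: float = 80.0
    memory_percent: float = 80.0
    disk_percent: float = 80.0


def general_status(availability_state: str) -> Status:
    """Map a Resource Health availability state to Green or Red.

    Assumption: anything other than "Available" is shown as Red.
    """
    return Status.GREEN if availability_state == "Available" else Status.RED


def app_service_status(latest_test_succeeded: Optional[bool]) -> Status:
    """Use the latest Standard Test result instead of Resource Health.

    Assumption: no test result at all (None) is shown as Red.
    """
    return Status.GREEN if latest_test_succeeded else Status.RED


def vm_status(
    availability_state: str,
    cpu_percent: float,
    memory_percent: float,
    disk_percent: float,
    thresholds: VmThresholds,
) -> Status:
    """Red if the VM is not available; otherwise Amber when any metric
    reaches its threshold, else Green."""
    if general_status(availability_state) is Status.RED:
        return Status.RED
    breached = (
        cpu_percent >= thresholds.cpu_percent
        or memory_percent >= thresholds.memory_percent
        or disk_percent >= thresholds.disk_percent
    )
    return Status.AMBER if breached else Status.GREEN
```

For example, `vm_status("Available", 95.0, 20.0, 30.0, VmThresholds())` yields Amber under these placeholder thresholds, while the same metrics with an "Unavailable" state yield Red, because the usage metrics are only considered once Resource Health reports the VM as available.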

Level 0 Dashboard

image

The overall availability status (Green) of each app depends on the Azure resources the app uses. If any Azure resource used by the Cloud Crafty or Pocket Geeks app has a Resource Health status of "Unavailable", the overall health status of that app at Level 0 will be Unavailable. For example, Cloud Crafty uses 3 Azure resources: App Service, Key Vault and APIM. Its overall availability status is Green only when the Resource Health status of all 3 resources and the App Insights Standard Test availability status are Available.
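The Level 0 roll-up described here is an "all must be available" aggregation. A minimal Python sketch, assuming each resource's status has already been reduced to the string "Available" or "Unavailable"; the function name `app_availability` is hypothetical, and treating an app with no listed resources as Unavailable is an assumption rather than documented behaviour.

```python
from typing import Mapping


def app_availability(resource_states: Mapping[str, str]) -> str:
    """Return "Available" only when every dependent resource is "Available".

    resource_states maps a resource name to its status string.
    Assumption: an app with no resources listed is reported as "Unavailable".
    """
    if resource_states and all(
        state == "Available" for state in resource_states.values()
    ):
        return "Available"
    return "Unavailable"


# Cloud Crafty with its 3 resources; the status values are illustrative only.
cloud_crafty = {
    "App Service": "Available",
    "Key Vault": "Available",
    "APIM": "Unavailable",
}
overall = app_availability(cloud_crafty)
```

With APIM marked "Unavailable" in this illustration, `overall` is "Unavailable", so the Level 0 tile would not be Green even though the other two resources are healthy.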

Level 1 - Cloud Crafty Dashboard

image

Level 1 - Pocket Geek Dashboard

image image

Level 2 Dashboard

Proposed: distributed tracing with an OpenTelemetry Collector that collects OpenTelemetry traces from apps and sends them to Jaeger, backed by Azure Managed Cassandra. Grafana uses Jaeger as a data source to display traces centrally, in addition to viewing traces in the Jaeger UI.

gcc-cwhd's People

Contributors

weixian-zhang
