

Service Monitoring

This repository helps standardise our approach to monitoring, so that services are monitored consistently and the monitoring homes in on the specifics that matter, making it useful and effective.

All of this information is taken from Site Reliability Engineering (SRE) by Google, and the lessons and approaches used there.

Issues with Monitoring

A lot of the time people approach monitoring at the tool end as opposed to during the development cycle. Despite best efforts, this usually becomes a last-minute exercise in producing dashboards and alerts within whatever tool is provided, then tweaking them throughout the lifecycle. The result tends to lack any real structure about what you are monitoring and why. What does the business care about? What do you care about enough to be called up in the middle of the night?

Without the approach being considered up front, new features may lack monitoring, or some features may go unmonitored completely despite offering functionality to users.

Adoption

We have taken core aspects of the SRE principles and thought through their application at development time. As we already use BDD (Behavior Driven Development), we want to establish how to apply monitoring to those principles as you iterate on a service.

It's important to note that there will be various users who are not the subject of a normal BDD scenario, so we need to cover those aspects under the same principles as BDD, even when they are not specific to a user journey. This means knowing about things that might impact, or be about to impact, some functionality of the service before it happens.

Service Level Objective

As mentioned above, we have focused the adoption on BDD. With this in mind, we would expect certain objectives for a service to be driven by each scenario. You would also have an overall objective for the service, which might mean you focus on some scenarios more than others, i.e. losing partial functionality in some areas may be deemed less critical to the service than losing it in others.

Once you have selected the objectives for each scenario, they can drive the metrics, e.g.

  • A user would log on (<1s with x amount of concurrent users)
  • A user would then search and get a paginated response within x period (define what the period would be, would you expect this to be per data set size?)
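
The scenario objectives above could be written down in a form that code can check. The following is only a minimal sketch: the `ScenarioObjective` type, the `objective_met` helper, the figure of 100 concurrent users and the 95% target are all assumptions made for illustration, not values defined by this repository.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScenarioObjective:
    """An objective attached to one BDD scenario (all values are illustrative)."""

    scenario: str
    max_latency_s: float   # e.g. the "<1s" from the logon scenario
    concurrent_users: int  # the "x" concurrent users; 100 below is assumed
    target_ratio: float    # assumed share of requests that must be fast enough


# Hypothetical objective for the logon scenario.
LOGON = ScenarioObjective(
    scenario="A user logs on",
    max_latency_s=1.0,
    concurrent_users=100,
    target_ratio=0.95,
)


def objective_met(objective: ScenarioObjective, latencies_s: Sequence[float]) -> bool:
    """Return True when enough observed requests beat the latency threshold."""
    if not latencies_s:
        raise ValueError("no observations to evaluate")
    fast = sum(1 for latency in latencies_s if latency < objective.max_latency_s)
    return fast / len(latencies_s) >= objective.target_ratio
```

Calling `objective_met(LOGON, observed_latencies)` with latencies gathered while the assumed number of users is active tells you whether the scenario met its objective for that sample.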

Four Golden Signals

The four golden signals are core monitoring categories that help you focus on what to monitor. These are:

  • Latency: the time it takes to serve a request.
  • Traffic: transactions or retrievals per second, or HTTP requests.
  • Errors: the number of errors received, identified either by response codes or by expected headers that are missing, etc.
  • Saturation: how "full" your service is, e.g. memory constrained or CPU constrained.
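
As a rough illustration, the four signals could be derived from a window of request records plus one resource reading. This is a sketch under assumptions: `RequestRecord`, `golden_signals`, the use of the median for latency, counting HTTP 5xx responses as errors, and using memory as the saturation measure are all choices made for the example.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, Sequence


@dataclass(frozen=True)
class RequestRecord:
    duration_s: float  # time taken to serve the request
    status: int        # HTTP status code returned


def golden_signals(
    records: Sequence[RequestRecord],
    window_s: float,
    memory_used_bytes: int,
    memory_limit_bytes: int,
) -> Dict[str, float]:
    """Summarise one observation window as the four golden signals."""
    if not records:
        raise ValueError("no requests observed in this window")
    errors = sum(1 for r in records if r.status >= 500)  # assumed error rule
    return {
        "latency_median_s": median([r.duration_s for r in records]),
        "traffic_rps": len(records) / window_s,
        "error_ratio": errors / len(records),
        "saturation": memory_used_bytes / memory_limit_bytes,
    }
```

In practice a monitoring tool would normally compute these values for you; the point of the sketch is only to show which raw measurements each signal depends on.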

We would expect these to be applied to the BDD scenarios we create, as mentioned above under the SLO (Service Level Objective). We would then expect indicators to drive the specifics of what you are looking at.

Service Level Indicator

A defined measurement of some aspect of the level of service, for example:

  • How many errors a service can have and still meet an SLO.
  • How much latency, and how many errors, a logon can tolerate while still completing within the SLO period you've defined for that number of users.
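
The first indicator, how many errors a service can have while still meeting an SLO, comes down to simple arithmetic. The sketch below assumes an availability-style SLO expressed as a decimal string such as "0.999"; the function names and the example figures are hypothetical.

```python
from fractions import Fraction
from math import floor


def availability_sli(total_requests: int, failed_requests: int) -> float:
    """Indicator: the share of requests that were served successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return (total_requests - failed_requests) / total_requests


def allowed_failures(total_requests: int, slo_target: str) -> int:
    """How many requests may fail while the SLO target is still met.

    The target is passed as a string so Fraction can represent it exactly
    and avoid floating-point rounding at the boundary.
    """
    budget = (1 - Fraction(slo_target)) * total_requests
    return floor(budget)
```

For example, with 10,000 requests and an assumed target of "0.999", `allowed_failures` gives 10, so a window with 7 failed requests (an indicator of 0.9993) would still meet the objective.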

Contributors

  • jon-shanks
