slogen's People

Contributors

agaurav, arpitjain305, jinkane, kumoroku

slogen's Issues

Burnrate alerts aren't working correctly

I have an SLO burn-rate alert with a 30m short window and a 6h long window. I've set the same threshold on both.

The alert triggered quickly (within 5m), but it took 6 hours to resolve after the SLO went back to normal.

I would have expected it to be resolved quickly according to https://sre.google/workbook/alerting-on-slos/

Looking into this a bit deeper, I think that the threshold values on the monitor take 6 hours to evaluate, and it might not be possible to do "Multiwindow, Multi-Burn-Rate Alerts" using Sumo Logic's monitors.
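For reference, here is a minimal Python sketch of the multiwindow condition as I understand it from the SRE workbook chapter linked above: the alert fires only while both the short-window and long-window burn rates exceed the threshold, so it should stop firing soon after the short window is clean again. The class and function names are made up for illustration, and this is not how slogen or Sumo Logic monitors actually evaluate anything.

```python
from dataclasses import dataclass


@dataclass
class BurnRateRule:
    # Hypothetical container mirroring shortLimit / longLimit from the config.
    short_limit: float
    long_limit: float


def burn_rate(error_ratio: float, target: float) -> float:
    """Burn rate = observed error ratio divided by the error budget (1 - target)."""
    return error_ratio / (1.0 - target)


def is_firing(rule: BurnRateRule, short_error_ratio: float,
              long_error_ratio: float, target: float) -> bool:
    """Fire only when BOTH windows burn faster than their limits."""
    return (burn_rate(short_error_ratio, target) > rule.short_limit
            and burn_rate(long_error_ratio, target) > rule.long_limit)


rule = BurnRateRule(short_limit=6, long_limit=6)
target = 0.99

# During the incident: 10% errors in both windows, burn rate is about 10.
during_incident = is_firing(rule, 0.10, 0.10, target)

# 30 minutes after recovery: the short window is clean, the long window
# still remembers the incident, so the combined condition is false.
after_recovery = is_firing(rule, 0.0, 0.08, target)

print(during_incident, after_recovery)
```

If the monitor only looks at the long window, the second case would keep firing until the incident ages out of the 6h window, which matches what I saw.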

Missing Data Alerts

The Monitors created don't have useful missing data alerts.

I'd like to be alerted if I screw up a query and the total count goes to 0 for x minutes.
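To make the condition concrete, here is a rough Python sketch of what I mean by "total count goes to 0 for x minutes". It assumes per-minute total counts are available as a list, newest last. The function name and the input shape are hypothetical and not part of slogen.

```python
from typing import Sequence


def total_missing_for(total_counts: Sequence[int], minutes: int) -> bool:
    """Return True when the last `minutes` per-minute totals are all zero.

    `total_counts` is assumed to hold one total per minute, oldest first.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    if len(total_counts) < minutes:
        # Not enough history yet to decide.
        return False
    return all(count == 0 for count in total_counts[-minutes:])


healthy = [120, 98, 0, 110, 105]
broken_query = [120, 98, 0, 0, 0]

print(total_missing_for(healthy, 3))       # False
print(total_missing_for(broken_query, 3))  # True
```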

SLO Burn rate monitoring is incorrect

Hi team, I've been using your tool extensively and I am loving it!

I have come across an issue with the monitoring of an SLO.

My current alerting configuration is as follows:

apiVersion: openslo/v1alpha
kind: SLO
metadata:
  displayName: xxx
  name: xxx
spec:
  service: xxx
  budgetingMethod: Occurrences
  objectives:
    - ratioMetrics:
        total:
          source: sumologic
          queryType: Logs
          query: |
            xxx
        good: 
          source: sumologic
          queryType: Logs
          query: 'xxx'
        incremental: true
      displayName: xxx
      target: 0.99
alerts:
  burnRate:
    - shortWindow: '10m'
      shortLimit: 14
      longWindow: '1h'
      longLimit: 14
      notifications:
        - connectionType: 'Email'
          recipients:
            - 'xxx@xxx'
          triggerFor:
            - Warning
            - ResolvedWarning
    - shortWindow: '30m'
      shortLimit: 6
      longWindow: '6h'
      longLimit: 6
      notifications:
        - connectionType: 'Email'
          recipients:
            - 'xxx@xxx'
          triggerFor:
            - Warning
            - ResolvedWarning
    - shortWindow: '6h'
      shortLimit: 1
      longWindow: '24h'
      longLimit: 1
      notifications:
        - connectionType: 'Email'
          recipients:
            - 'xxx@xxx'
          triggerFor:
            - Warning
            - ResolvedWarning

When I evaluate the SLO over a 24h period, it is currently at 98.41897%, which is below the 99% target.

I would have expected to receive at least 1 email stating that this SLO is not being met. However, none of the generated monitors has triggered.

I'm wondering if the calculation of one of these items may be incorrect?
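Here is the arithmetic I would expect, as a small Python sketch. It assumes burn rate is defined as the observed error ratio divided by the error budget (1 - target), as in the SRE workbook. I don't know whether slogen computes it the same way.

```python
def burn_rate_from_sli(sli_percent: float, target: float) -> float:
    """Burn rate = (1 - SLI) / (1 - target), with the SLI given in percent."""
    error_ratio = 1.0 - sli_percent / 100.0
    error_budget = 1.0 - target
    return error_ratio / error_budget


# Values from this issue: SLI of 98.41897% over 24h, target of 0.99.
rate_24h = burn_rate_from_sli(98.41897, 0.99)

# The third rule uses longWindow '24h' with longLimit 1.
long_condition_met = rate_24h > 1

print(round(rate_24h, 5), long_condition_met)
```

By this definition the 24h burn rate is about 1.58, which is above the longLimit of 1. The 24h figure alone doesn't say whether the 6h short window is also above its shortLimit of 1, which the rule would need as well.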


Current version: There is no slogen command to output the version, but I'm pointing to the latest of the main branch.

aggregated overview dashboards at service level

Apps are made of services, and apps might be part of a portfolio. For example, a bank may have online banking as a portfolio consisting of the following hierarchy:

- mobile banking app -> consisting of account service and bill pay service
- credit card app -> consisting of account service and payment service

In this example, SLOs/SLIs defined at the service level would roll up to the corresponding app.

Pre-requisite:

- One or more SLOs have been defined for various services

Success Scenario:

1. Operations user (e.g. developer) picks which services should be grouped (ideally by consulting the service map)
2. System validates the following:
   - the evaluation type is identical for the chosen services (e.g. periodic or aggregate, but not a mix of the two)
   - compliance period and type match
3. If the grouping is valid, the SLI/SLO/error budgets are visualised
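The validation step could look roughly like the Python sketch below. The SLOConfig fields are hypothetical names chosen for this example. They are not taken from the OpenSLO spec or from slogen's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SLOConfig:
    # Hypothetical, simplified view of an SLO for grouping purposes.
    service: str
    evaluation_type: str    # e.g. "periodic" or "aggregate"
    compliance_type: str    # e.g. "rolling" or "calendar"
    compliance_period: str  # e.g. "30d"


def validate_group(slos: List[SLOConfig]) -> List[str]:
    """Return a list of problems; an empty list means the group is valid."""
    if not slos:
        return ["no SLOs selected"]
    problems = []
    if len({s.evaluation_type for s in slos}) > 1:
        problems.append("evaluation types differ across services")
    if len({(s.compliance_type, s.compliance_period) for s in slos}) > 1:
        problems.append("compliance period or type differs across services")
    return problems


mobile_banking = [
    SLOConfig("account", "aggregate", "rolling", "30d"),
    SLOConfig("bill-pay", "aggregate", "rolling", "30d"),
]
mixed = [
    SLOConfig("account", "aggregate", "rolling", "30d"),
    SLOConfig("payment", "periodic", "rolling", "7d"),
]

print(validate_group(mobile_banking))  # []
print(validate_group(mixed))
```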

Changing SLOs causing issues with scheduled views

I've noticed that when I update an SLO, often the dashboards and graphs do not update with the new data for that SLO.

As an example, I'm currently writing latency SLOs as follows:

apiVersion: openslo/v1alpha
kind: SLO
metadata:
  displayName: api Latency
  name: api-ltc
spec:
  service: account-opening
  description: The amount of POST requests to /api that are faster than 2s
  budgetingMethod: Occurrences
  objectives:
    - ratioMetrics:
        total:
          source: sumologic
          queryType: Logs
          query: |
            _sourceCategory=/xxx
            | kv "path", "elapsed_time"
            | where path matches /api/
        good: 
          source: sumologic
          queryType: Logs
          query: 'elapsed_time < 2000'
        incremental: true
      displayName: api calls that are faster than 2s
      target: 0.90

Now, if I update the SLO to a new value of 1s, the graphs and data that populate the dashboards are still the data from the original elapsed_time < 2000 query.

I have a feeling this is because the data kept in a scheduled view is not deleted, just disabled:
https://help.sumologic.com/Manage/Scheduled-Views/Pause-or-Disable-Scheduled-Views#disable-a-scheduled-view

Once disabled, no additional data can be indexed in a scheduled view. A disabled scheduled view is not technically deleted, but it can't be re-enabled. If you disable a view and later create a new view with the same name, you won't see duplicate results; instead all the data from both scheduled views are treated as one.

If this is true, I wonder if it's worth hashing the query and appending it to the scheduled view name?
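Something along these lines is what I have in mind, as a Python sketch. The naming scheme, the helper name, and the digest length are my own assumptions. The help output quoted in a later issue on this page lists a --useViewHash flag with a similar purpose, but I don't know what scheme it uses.

```python
import hashlib


def hashed_view_name(base_name: str, query: str, digest_len: int = 8) -> str:
    """Append a short hash of the query so a changed query gets a new view name."""
    # Collapse whitespace so purely cosmetic edits don't create a new view.
    normalized = " ".join(query.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:digest_len]
    return f"{base_name}_{digest}"


old_name = hashed_view_name("api_ltc", "elapsed_time < 2000")
new_name = hashed_view_name("api_ltc", "elapsed_time < 1000")

# The two names differ, so data from the old view would not be mixed in.
print(old_name)
print(new_name)
```

The trade-off is that every query change starts the new view with no history, unless the view is backfilled.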

Often when I switch an SLO, I expect it to stop firing and re-evaluate the data.

binary for linux x86_64 does not work

From the release page for v1.0.1, I downloaded slogen_1.0.0_Linux_x86_64.tar.gz (strange that the version number is different!). And I attempted to run it:

$ ./slogen --help
./slogen: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./slogen)
./slogen: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./slogen)
./slogen: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./slogen)
./slogen: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.35' not found (required by ./slogen)
./slogen: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by ./slogen)

I'm not sure what the problem is here, but the previous release works fine on my system (HP Elitebook G5 running Ubuntu 18.04.6).
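To narrow it down, here is a small Python sketch that pulls the required GLIBC versions out of the error output and compares them with an installed version. The installed value of 2.27 is my assumption for a stock Ubuntu 18.04 system; running ldd --version should show the real one. The helper names are made up.

```python
import re
from typing import List, Tuple

Version = Tuple[int, int]


def required_glibc_versions(error_output: str) -> List[Version]:
    """Extract the distinct GLIBC_x.y versions mentioned in loader errors."""
    found = set()
    for major, minor in re.findall(r"GLIBC_(\d+)\.(\d+)", error_output):
        found.add((int(major), int(minor)))
    return sorted(found)


def missing_versions(error_output: str, installed: Version) -> List[Version]:
    """Versions the binary asks for that are newer than the installed glibc."""
    return [v for v in required_glibc_versions(error_output) if v > installed]


errors = """
./slogen: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./slogen)
./slogen: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./slogen)
./slogen: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./slogen)
./slogen: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.35' not found (required by ./slogen)
./slogen: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by ./slogen)
"""

installed = (2, 27)  # assumption: stock glibc on Ubuntu 18.04
missing = missing_versions(errors, installed)
print(missing)
```

If that assumption holds, the binary looks like it was built against a much newer glibc than this system has.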

Here's the release from https://github.com/OpenSLO/slogen/releases/tag/v1.0-beta :

$ slogen --help

Generates terraform files from openslo compatible yaml configs. 
Generated terraform files can be used to configure SLO monitors, scheduled views & dashboards in sumo.
One or more config or directory containing configs can be given as arg. Doesn't supports regex/wildcards as input.

Usage:
  slogen [paths to yaml config]... [flags]
  slogen [command]

Examples:
slogen service/search.yaml 
slogen ~/team-a/slo/ ~/team-b/slo ~/core/slo/login.yaml
slogen ~/team-a/slo/ -o team-a/tf


Available Commands:
  completion  Generate the autocompletion script for the specified shell
  destroy     destroy the content generated from the slogen command, equivalent to 'terraform destroy'
  docs        generate markdown documents of this tool in the specified path
  help        Help about any command
  list        utility command to get additional info about your sumo resources e.g. 
  new         create a sample config from given profile
  validate    A brief description of your command

Flags:
  -o, --out string                output directory where to create the terraform files (default "tf")
  -d, --dashboardFolder string    root dashboard folder where to organise the dashboards per service (default "slogen-tf-dashboards")
  -m, --monitorFolder string      root monitor folder where to organise the monitors per service (default "slogen-tf-monitors")
  -i, --ignoreErrors              whether to continue validation even after encountering errors
  -p, --plan                      show plan output after generating the terraform config
  -a, --apply                     apply the generated terraform config as well
  -c, --clean                     clean the old tf files for which openslo config were not found in the path args
      --asModule                  whether to generate the terraform config as a module
      --useViewHash               whether to use descriptive or hashed name for the scheduled views, hashed names ensure data for old view is not used when the query for it changes
      --onlyNative                whether to generate only the native slo resources
      --sloFolder string          root slo folder where to organise native slos by service (default "slogen")
      --sloMonitorFolder string   root monitor folder where to organise native slo monitors by service (default "slogen")
  -h, --help                      help for slogen

Use "slogen [command] --help" for more information about a command.

And here's my machine info:

$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

$ uname -a
Linux SunilChop2019-Lunix 4.15.0-194-generic #205-Ubuntu SMP Fri Sep 16 19:49:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

enable dash variables to be regex/wildcards

Currently the filter has the form | where ("{{task}}"="*" or task="{{task}}"). It needs to be | where ("{{task}}"="*" or task matches "{{task}}") instead, so that wildcards are also applied when filtering.
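The difference between the two filters can be emulated in Python as below. This is only an illustration using fnmatch, which also understands ? and [] patterns. It is not a statement about the exact semantics of the matches operator in Sumo Logic queries, and the task names are invented.

```python
from fnmatch import fnmatchcase


def equals_filter(task: str, selected: str) -> bool:
    """Current behaviour: '*' selects everything, otherwise exact equality."""
    return selected == "*" or task == selected


def matches_filter(task: str, selected: str) -> bool:
    """Requested behaviour: the selected value is treated as a wildcard pattern."""
    return selected == "*" or fnmatchcase(task, selected)


tasks = ["ingest-a", "ingest-b", "export"]

with_equals = [t for t in tasks if equals_filter(t, "ingest-*")]
with_matches = [t for t in tasks if matches_filter(t, "ingest-*")]

print(with_equals)   # []
print(with_matches)  # ['ingest-a', 'ingest-b']
```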

SLO dashboards should also be able to do 30 days rolling

All of the dashboards created by slogen are "This Month", and since it's the 1st of March, most of the data is only showing for 1 day.

I'd like the dashboards to be the last 30 days rolling instead so that I can better assess if the SLO is being met.
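To illustrate the difference between the two time ranges, here is a short Python sketch. The year is an arbitrary assumption, since only the day matters here, and the function names are made up for the example.

```python
from datetime import date, timedelta
from typing import Tuple

Window = Tuple[date, date]


def this_month_window(today: date) -> Window:
    """Calendar window: from the 1st of the current month up to today."""
    return today.replace(day=1), today


def rolling_window(today: date, days: int = 30) -> Window:
    """Rolling window: the last `days` days, including today."""
    return today - timedelta(days=days - 1), today


def days_covered(window: Window) -> int:
    start, end = window
    return (end - start).days + 1


today = date(2023, 3, 1)  # assumption: any 1st of March behaves the same way

print(days_covered(this_month_window(today)))  # 1
print(days_covered(rolling_window(today)))     # 30
```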
