configuration's Introduction

Red Hat Observability Service

This project holds the configuration files for our internal Red Hat Observability Service based on Observatorium.

See our website for more information about RHOBS.

Requirements

  • Go 1.17+

macOS

  • findutils (for GNU xargs)
  • gnu-sed

Both can be installed using Homebrew: brew install gnu-sed findutils. Afterwards, update the SED and XARGS variables in the Makefile to use gsed and gxargs or replace them in your environment.
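As a rough sketch of the "replace them in your environment" option, the snippet below picks the GNU tools when they are on the PATH and falls back to the system ones otherwise. The helper name `resolve_tool` is hypothetical and not part of this repository; only the `SED` and `XARGS` variable names come from the Makefile.

```shell
# Hypothetical helper: print the preferred tool if it is on PATH,
# otherwise print the fallback.
resolve_tool() {
  preferred="$1"
  fallback="$2"
  if command -v "$preferred" >/dev/null 2>&1; then
    printf '%s\n' "$preferred"
  else
    printf '%s\n' "$fallback"
  fi
}

# On macOS with Homebrew's gnu-sed and findutils installed this selects
# gsed and gxargs; on Linux it keeps the default sed and xargs.
SED="$(resolve_tool gsed sed)"
XARGS="$(resolve_tool gxargs xargs)"
export SED XARGS
echo "Using SED=$SED XARGS=$XARGS"
```

With both variables exported, a later make invocation in the same shell should pick them up, assuming the Makefile still honors environment overrides for SED and XARGS.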

Usage

This repository contains Jsonnet configuration that generates the Kubernetes objects composing the RHOBS service and its observability.

RHOBS service

The Jsonnet files for the RHOBS service can be found in the services directory. To compose the RHOBS service, we import many Jsonnet libraries from different open source repositories, including kube-thanos for Thanos components; Observatorium for Observatorium, Minio, Memcached, Gubernator, and Dex components; thanos-receive-controller for the Thanos receive controller component; parca for the Parca component; observatorium api for the API component; observatorium up for the up component; and rules-objstore for the rules-objstore component.

Currently, RHOBS components are rendered as OpenShift Templates, which allow parameters. This is how we deploy to multiple clusters: they share the same configuration core but differ in details such as resources or names.

This is why there might be a gap between vanilla Observatorium and RHOBS. We have plans to resolve this gap in the future.

Running make manifests generates all required files into the resources/services directory.
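To illustrate how a generated OpenShift Template can be rendered with per-cluster values, here is a minimal sketch built around oc process. The helper names `param_flags` and `render_template` are hypothetical, and the parameter values are placeholders. The sketch also assumes parameter values contain no spaces, since the flags are word-split on purpose.

```shell
# Hypothetical helper: turn KEY=VALUE arguments into the repeated
# "-p KEY=VALUE" form that oc process accepts.
param_flags() {
  for kv in "$@"; do
    printf '%s %s ' -p "$kv"
  done
}

# Hypothetical helper: render a template locally (no cluster access needed)
# and print the resulting objects as YAML.
render_template() {
  template="$1"
  shift
  # Word splitting of the generated flags is intended here.
  # shellcheck disable=SC2046
  oc process --local -f "$template" $(param_flags "$@") -o yaml
}
```

For example, `render_template resources/services/observatorium-template.yaml NAMESPACE=observatorium-testing` would print the objects for a testing namespace, provided the template declares a NAMESPACE parameter.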

Observability

Similarly, to have observability (alerts, recording rules, dashboards) for our service, we import mixins from various projects and compose them all together in the observability directory.

Running make prometheusrules grafana generates all required files into the resources/observability directory.

Updating Dependencies

The up-to-date list of Jsonnet dependencies can be found in jsonnetfile.json. All dependencies are fetched with make vendor_jsonnet.

To update a dependency, normally the process would be:

make vendor_jsonnet # This installs dependencies like `jb` thanks to Bingo project.
JB=`ls $(go env GOPATH)/bin/jb-* -t | head -1`

# Updates `kube-thanos` to main and sets the new hash in `jsonnetfile.lock.json`.
$JB update https://github.com/thanos-io/kube-thanos/jsonnet/kube-thanos@main

# Updates all dependencies and sets the new hashes in `jsonnetfile.lock.json`.
$JB update

Testing cluster

The purpose of the RHOBS testing cluster is to experiment before changes are rolled out to the staging and production environments. The objects in the cluster are managed by app-interface; however, the testing cluster uses a different set of namespaces: observatorium{-logs,-metrics,-traces}-testing.

Changes can be applied to the cluster manually, however they will be overridden by app-interface during the next deployment cycle.

Refresh token

The refresh token can be obtained via token-refresher.

./token-refresher --url=https://observatorium.apps.rhobs-testing.qqzf.p1.openshiftapps.com --oidc.client-id=observatorium-rhobs-testing --oidc.client-secret=<token> --log.level=debug --oidc.issuer-url=https://sso.redhat.com/auth/realms/redhat-external --oidc.audience=observatorium-telemeter-testing --file /tmp/token
cat /tmp/token
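Once the token file exists, it can be turned into a request header. The sketch below assumes the file holds only the raw bearer token, as the cat above suggests. The helper name `auth_header` is hypothetical.

```shell
# Hypothetical helper: build an HTTP Authorization header from a file
# that is assumed to contain only a bearer token.
auth_header() {
  token_file="$1"
  if [ ! -s "$token_file" ]; then
    echo "token file $token_file is missing or empty" >&2
    return 1
  fi
  # tr strips newlines so the header stays on a single line.
  printf 'Authorization: Bearer %s' "$(tr -d '\n' < "$token_file")"
}
```

A request could then look like `curl -H "$(auth_header /tmp/token)" <api-url>`, where `<api-url>` is a placeholder for whichever endpoint of the testing cluster you want to query.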

App Interface

Our deployments are managed by our Red Hat AppSRE team.

Updating Dashboards

Staging: Once the PR containing the dashboard changes is merged to main, it goes directly to the stage environment, because the telemeter-dashboards resourceTemplate refers to the main branch here.

Production: Update the commit hash ref in the saas file in the telemeterDashboards resourceTemplate, for production environment.

Prometheus Rules and Alerts

Use synchronize.sh to create an MR against app-interface to update dashboards.

Components - Deployments, ServiceMonitors, ConfigMaps etc...

Staging: update the commit hash ref in https://gitlab.cee.redhat.com/service/app-interface/blob/master/data/services/telemeter/cicd/saas.yaml

Production: update the commit hash ref in https://gitlab.cee.redhat.com/service/app-interface/blob/master/data/services/telemeter/cicd/saas.yaml

CI Jobs

Job runs are posted in:

#sd-app-sre-info for grafana dashboards

and

#team-monitoring-info for everything else.

Troubleshooting

  1. Enable port forwarding for a user - example
  2. Add a pod name to the allowed list for port forwarding - example

configuration's Issues

Dashboard widgets with error information should include error code/identifier

Many of the dashboard widgets with error information in this configuration are using a hardcoded "errors" caption in the charts, like so (from the Observatorium API dashboard):

image

This hides the details about which errors these are. It would be very useful to see the error code or any other error attribute that helps identify it:

image

This happens in Observatorium's and Thanos' dashboards, across gRPC and HTTP errors charts.

Trace namespace SHOULD be a template parameter

It isn't possible to configure the trace namespace without regenerating the templates.

This means that oc process can't alter the namespaces. They aren't configurable.

resources/services/observatorium-traces-template.yaml has { name: 'OBSERVATORIUM_TRACES_NAMESPACE', value: 'observatorium-traces' }, in the Jsonnet.

It gets substituted in the correct places by resources/services/observatorium-template.yaml

- --experimental.traces.read.endpoint-template=http://observatorium-jaeger-{tenant}-query.${OBSERVATORIUM_TRACES_NAMESPACE}.svc.cluster.local:16686/
...
--experimental.traces.read.endpoint-template=http://observatorium-jaeger-{tenant}-query.${OBSERVATORIUM_TRACES_NAMESPACE}.svc.cluster.local:16686/

The problem is that the substitution happens when the Templates are being made. Rather than being something that is fixed during template generation, the traces namespace SHOULD become an OpenShift template parameter similar to NAMESPACE, OPENTELEMETRY_OPERATOR_NAMESPACE or JAEGER_OPERATOR_NAMESPACE.
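One quick way to see whether a generated template exposes a given name as a parameter is to look for its declaration in the file. This is only a heuristic sketch: the helper name `declares_param` is hypothetical, and a plain `name:` match can also hit unrelated entries such as container environment variables.

```shell
# Hypothetical helper: succeed if FILE contains a "name: NAME" entry,
# as an OpenShift Template parameter declaration would.
declares_param() {
  file="$1"
  name="$2"
  grep -Eq "name:[[:space:]]*['\"]?${name}['\"]?[[:space:]]*$" "$file"
}
```

For example, `declares_param resources/services/observatorium-template.yaml OBSERVATORIUM_TRACES_NAMESPACE` would be expected to fail until the namespace becomes a real template parameter. Running `oc process --parameters -f <template>` should give a more reliable list of what can be overridden.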

cc @pavolloffay

Incorrect health check URL for OPA AMS

The opa-ams sidecar of the dev/test Observatorium API is sending error spans to the internal Jaeger. The errors are 404s when doing GET :8082/

I suspect the reason is that the sidecar is started with --web.healthchecks.url=http://127.0.0.1:8082, but that URL 404s.
See https://github.com/rhobs/configuration/blob/main/resources/services/observatorium-template.yaml#L296

I don't know what the correct health check URL is. Note that OPA AMS is started with --ams.url=${OCM_BASE_URL}
(This is not overridden anywhere on dev/test, as can be verified with kubectl -n observatorium-testing exec deployment/observatorium-observatorium-api -c opa-ams -- ps -ef.)

Issues in deploying RHOBS on OCP cluster

Some issues observed while deploying RHOBS on an OCP cluster using the launch script:

  1. The deployment script under the test folder fails while adding rules. This happened when deploying observatorium-template.yaml.
  2. An older image of observatorium-api is used in observatorium-template.yaml, so observatorium-api remains in the CrashLoopBackOff state.
  3. An older image of Thanos is used in observatorium-metrics-template.yaml, so Thanos remains in the CrashLoopBackOff state.
  4. The PVCs that were supposed to be created by observatorium-metrics-template.yaml were all in the Pending state because the gp2 storage class is not available on the OCP cluster.

Do not annotate alerts with dashboard URL if none exists

Currently we annotate all alerts with a dashboard URL in the format https://grafana.app-sre.devshift.net/d/<dashboard_ID>/*. However, for alerts that do not have a dashboard, we return a link to a non-existent dashboard URL, https://grafana.app-sre.devshift.net/d/no-dashboard/*. We should not provide a dashboard annotation if there is none.

Build problems on MacOS under Go 1.18

Makefile commands that worked in the past such as XARGS=gxargs SED=gsed make all now fail:

(re)installing /Users/snible/go/bin/jsonnetfmt-v0.18.0
# golang.org/x/sys/unix
../../../go/pkg/mod/golang.org/x/[email protected]/unix/syscall_darwin.1_13.go:25:3: //go:linkname must refer to declared function or variable
../../../go/pkg/mod/golang.org/x/[email protected]/unix/zsyscall_darwin_amd64.1_13.go:27:3: //go:linkname must refer to declared function or variable

See https://stackoverflow.com/questions/71507321/go-1-18-build-error-on-mac-unix-syscall-darwin-1-13-go253-golinkname-mus for discussion. The solution suggested, go get -u golang.org/x/sys, isn't quite accurate for our situation.

To reproduce without our Makefile, I did

cd .bingo
GOBIN=$(go env GOPATH)/bin
go build -mod=mod -modfile=jsonnet.mod -o=${GOBIN}/jsonnet-v0.18.0 "github.com/google/go-jsonnet/cmd/jsonnet"

This fails, but if I then do go get -modfile=jsonnet.mod -u golang.org/x/sys and repeat the go build it succeeds.

I am working on a PR; expect it soon.

Compatibility issues on macOS & documentation

I was trying to update some manifests lately and encountered two different sets of issues:

Need fixing

ARM64 related issues with Golang based tools

Mostly caused by the jsonnet tools that have to be updated to be arm64 compatible. This is easy to solve.

Need updated documentation

GNU vs BSD tools

It's not documented that macOS users have to install gnu-sed and findutils (for GNU xargs) and update the Makefile to make use of them.

Jsonnet dependencies update

The instructions provided at the moment do not work, because make vendor doesn't exist anymore.
