odh-deployer's People

Contributors

accorvin, andrewballantyne, anishasthana, atheo89, cfchase, dharmitd, diegolovison, dlabaj, gmfrasca, harshad16, humairak, jkoehler-redhat, jooho, laubai, lavlas, lucferbux, maroroman, maulikjs, rimolive, samuelvl, starburst-blumbert, taneem-ibrahim, tmckayus, vaishnavihire, vannten, vedantmahabaleshwarkar, vpavlin, xaenalt, zdtsw

odh-deployer's Issues

Create full automated test on validating Prometheus alert rules

Some context

We want a fully automated test that validates the Prometheus alert rules: when a PR is opened, the test runs and checks whether the changes would cause any alerts to fire.

The alert rules are defined in the odh-deployer Prometheus ConfigMap, while components such as the CodeFlare Operator, MCAD, and others live in their own separate repos, which makes it harder to run these tests against all components.

Questions that come to mind:

  • Should this test be part of the e2e tests or stand on its own?
  • Should the test be run on each component repo as well as the odh-deployer repo?
  • How would the test check whether changes in one repo cause alerts (defined in the odh-deployer repo) to fire?

Suggestions

  • RobotFramework: A generic automation framework for acceptance testing and RPA. Test automation for RHODS already exists, and ods-ci has some tests that verify the existing alerts.
  • PromTool: Tooling for the Prometheus monitoring system. It can check rules for syntax errors and run unit tests against them. Open question: if we run those unit tests as part of the e2e tests, would that be enough to verify that an alert works as expected?
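To make the PromTool suggestion concrete, here is a minimal sketch of what a rule unit test could look like. The alert name, job label, and file names are hypothetical and are not taken from the odh-deployer config. The script writes a rules file and a matching test file, then calls `promtool test rules` only if `promtool` is found on the PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical alert rule; the real rules live in the odh-deployer ConfigMap.
RULES = """\
groups:
  - name: example-alerts
    rules:
      - alert: ExampleTargetDown
        expr: up{job="example-component"} == 0
        for: 2m
        labels:
          severity: critical
"""

# promtool unit test: feed a series that stays at 0 and expect the alert at 5m.
TESTS = """\
rule_files:
  - alerts.yaml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'up{job="example-component", instance="a"}'
        values: '0 0 0 0 0 0'
    alert_rule_test:
      - eval_time: 5m
        alertname: ExampleTargetDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: example-component
              instance: a
"""


def write_fixture(directory):
    """Write the rules file and the test file; return the test file path."""
    directory = Path(directory)
    (directory / "alerts.yaml").write_text(RULES)
    test_file = directory / "alerts_test.yaml"
    test_file.write_text(TESTS)
    return test_file


def run_promtool(test_file):
    """Run `promtool test rules`; return its exit code, or None if promtool is missing."""
    promtool = shutil.which("promtool")
    if promtool is None:
        return None
    result = subprocess.run(
        [promtool, "test", "rules", test_file.name],
        cwd=test_file.parent,  # run next to the files so rule_files resolves
        capture_output=True,
        text=True,
    )
    return result.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        code = run_promtool(write_fixture(tmp))
        if code is None:
            print("promtool not installed, skipped")
        else:
            print(f"promtool exit code: {code}")
```

A test like this only checks the rule logic against synthetic series. It does not show that a live component actually exports the metric, which is the part the e2e question is about.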

Update Jupyterhub SLI dashboard

We need to revisit the Grafana dashboard once the DB monitoring metrics are ready and do the following:

  1. Update the panels for the corresponding Jupyterhub DB metrics.
  2. Resolve the mismatch between our dashboard and the list of SLIs we want to monitor. We currently don't have a panel for Error Measurement for the Jupyterhub Database.
  3. Polish all the panels so that the scales, values, comments, and titles are clean and easy to make sense of.

Inconsistent Namespace prefix

RHODS creates the following namespaces:

redhat-ods-applications
redhat-ods-monitoring
redhat-ods-operator
rhods-notebooks

rhods-notebooks uses a different prefix from the other namespaces that RHODS creates, which makes it harder to search for and find the namespaces related to RHODS.

Add CodeFlare components as scrape targets and alerting rules

We need Prometheus to actively scrape the pods and fire alerts when the conditions are met. The CodeFlare and MCAD components need to expose a metrics endpoint before the alerts can work.

Alerts can be added to verify that Prometheus is scraping the endpoints successfully and to track the SLIs/SLOs.
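One way to check whether Prometheus is scraping the new targets is to inspect the response of its `/api/v1/targets` HTTP API. The sketch below assumes the response has already been fetched and parsed into a dict; the job names are hypothetical placeholders for whatever the CodeFlare and MCAD scrape jobs end up being called.

```python
def unhealthy_jobs(targets_response, expected_jobs):
    """Return the expected jobs that have no active target reporting health 'up'."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    healthy = {
        target.get("labels", {}).get("job")
        for target in active
        if target.get("health") == "up"
    }
    return sorted(set(expected_jobs) - healthy)


if __name__ == "__main__":
    # Trimmed-down example of a /api/v1/targets response (job names are made up).
    sample = {
        "status": "success",
        "data": {
            "activeTargets": [
                {"labels": {"job": "codeflare-operator"}, "health": "up"},
                {"labels": {"job": "mcad-controller"}, "health": "down"},
            ]
        },
    }
    print(unhealthy_jobs(sample, ["codeflare-operator", "mcad-controller"]))
```

A job missing from the result is being scraped; a job in the result is either absent from the scrape config or failing its scrapes.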

Update ODH Application CRDs and CRs.

Currently ODH Application, ODH Documentation, and ODH Quickstart CRDs and CRs are stored in the deployer repo. For consistency and maintainability we should remove these resources from the deployer repo and move them over to the manifest repo. This will bring us in line with what is currently being done with other CRDs within the project.

Here is the PR with the current implementation: #260

This will require updates to the deployer bash script as well.

Grafana datasources secret

Grafana needs the definition of its data source (in our case Prometheus) supplied in the grafana-datasources secret. In that secret we provide a bearer token from a service account. We need some templating or automation that creates this secret with the correct bearer token every time the script runs. We aren't currently tracking this secret in git, so once we have a solution we need to add it to this repo.
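As a rough sketch of what such templating could look like, the function below builds a Secret manifest around a Grafana data source provisioning document that passes the token as an `Authorization` header. Several things here are assumptions: the secret key `datasources.yaml`, the Prometheus URL, and the use of the redhat-ods-monitoring namespace. Fetching the service account token is deliberately left out, and the token is passed in as a parameter.

```python
import base64
import json


def build_datasources_secret(token, prometheus_url, namespace):
    """Return a Secret manifest (as a dict) holding a Grafana data source definition."""
    datasources = {
        "apiVersion": 1,
        "datasources": [
            {
                "name": "Prometheus",
                "type": "prometheus",
                "access": "proxy",
                "url": prometheus_url,
                # Grafana sends this custom header with every query.
                "jsonData": {"httpHeaderName1": "Authorization"},
                "secureJsonData": {"httpHeaderValue1": f"Bearer {token}"},
            }
        ],
    }
    # JSON is also valid YAML, so the payload can sit under a .yaml key.
    payload = json.dumps(datasources, indent=2).encode("utf-8")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "grafana-datasources", "namespace": namespace},
        "type": "Opaque",
        # Assumed key name; adjust to whatever the Grafana deployment mounts.
        "data": {"datasources.yaml": base64.b64encode(payload).decode("ascii")},
    }


if __name__ == "__main__":
    manifest = build_datasources_secret(
        token="example-token",  # placeholder, never commit a real token
        prometheus_url="https://prometheus.example.svc:9091",  # hypothetical URL
        namespace="redhat-ods-monitoring",
    )
    print(json.dumps(manifest, indent=2))
```

Because the manifest is generated at deploy time, the template (this script) can be tracked in git while the token itself never is.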

Refactor prometheus-configs to allow for PromTool unit testing

Issue

We would like to create and run unit tests on the rules that are added. The simplest way to achieve this is to use PromTool. PromTool requires the alerts to be in their own YAML file, because the tool cannot parse them directly from the ConfigMap.

Possible Resolution

  • Move the alerting rules, and possibly the recording rules, from the ConfigMap to their own files.
  • Create a Make target that uses PromTool to run all tests automatically.
  • Include a simple unit test to verify this new feature.
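The first step above could be sketched roughly as follows. This assumes the ConfigMap is available as JSON (for example, exported from the cluster or converted from the manifest) and that rule entries can be recognised by a key suffix; the `.rules` suffix and the key names in the demo are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def extract_rule_files(configmap, out_dir, suffix=".rules"):
    """Write every ConfigMap data entry whose key ends with `suffix` to its own YAML file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key, content in sorted(configmap.get("data", {}).items()):
        if key.endswith(suffix):
            target = out_dir / f"{key}.yaml"
            target.write_text(content)
            written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical ConfigMap with one rules entry and one unrelated entry.
    configmap = json.loads("""
    {
      "kind": "ConfigMap",
      "data": {
        "prometheus.yml": "global: {}",
        "example-alerting.rules": "groups: []"
      }
    }
    """)
    with tempfile.TemporaryDirectory() as tmp:
        for path in extract_rule_files(configmap, tmp):
            print(path.name)
```

Once the rules sit in standalone files, a Make target can point PromTool at them and at the matching test files.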

Review docker file for root usage

The Dockerfile sets the HOME env var to /root. Does this imply the image is running as root, and does it need to? If it isn't running as root, HOME should be set somewhere else.
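A quick way to start this review is to look at which `USER` and `HOME` the Dockerfile ends up with. The sketch below is a simplified scan, not a full Dockerfile parser: it ignores line continuations and multi-stage builds, and the sample Dockerfile content is made up. If no `USER` instruction is present, the image inherits the base image's user, which is root unless the base image says otherwise.

```python
def inspect_dockerfile(text):
    """Return (last USER value or None, last HOME value set via ENV or None)."""
    user = None
    home = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        instruction, rest = parts[0].upper(), parts[1]
        tokens = rest.split()
        if instruction == "USER":
            user = tokens[0]
        elif instruction == "ENV":
            if tokens[0] == "HOME" and len(tokens) > 1:
                # Legacy form: ENV HOME /some/path
                home = tokens[1].strip('"')
            else:
                # Key=value form, possibly several pairs on one line
                for token in tokens:
                    if token.startswith("HOME="):
                        home = token.split("=", 1)[1].strip('"')
    return user, home


if __name__ == "__main__":
    sample = "FROM example/base:latest\nENV HOME=/root\nCMD [\"/deploy.sh\"]\n"
    user, home = inspect_dockerfile(sample)
    print(f"USER={user!r} HOME={home!r}")
    if user is None and home == "/root":
        print("No USER set and HOME is /root: check whether root is really required")
```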
