operate-first / apps
Operate-first application manifests
License: GNU General Public License v3.0
Currently we don't have another repo to store non-odh related apps. In the broader scheme of things, from the perspective of operate-first, odh is just another app, and I don't think we really need to have this entire repo be dedicated to odh only.
I propose we change this repo to be more generic, rename it to apps, and move the ODH contents into a subdirectory.
I'm not convinced that starting with a single namespace for all components makes sense. Per some discussions offline, it makes more sense to deploy groups of components per namespace, e.g. JupyterHub + Spark in one namespace, Superset + Data Catalog (once it's in) in another. This way we can separate concerns per namespace, gain better control over quotas for namespaces/applications, and simplify management of ODH (split logs across namespaces).
We have set up Grafana, but we need to add a datasource (example resource) so that Grafana is connected to the monitoring Prometheus instance.
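With the Grafana operator, the datasource could be a GrafanaDataSource CR roughly like this (a sketch; the Prometheus service URL is an assumption and depends on where the monitoring instance actually runs):

```yaml
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: prometheus
spec:
  name: prometheus.yaml
  datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      # Assumed in-cluster service exposed by the Prometheus Operator;
      # adjust to the real monitoring Prometheus service/namespace.
      url: http://prometheus-operated:9090
      isDefault: true
```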
ODH was added in MOC CNV:
https://gitlab.com/open-infrastructure-labs/moc-cnv-sandbox/-/merge_requests/8
First confirm with ODH (likely via an upstream issue) whether there are plans to add such service monitors and the ETA. Based on this feedback, we should discuss here whether we need to proceed with adding our own and sending them upstream.
[anishasthana] At a minimum, we should update/create monitoring for the following:
Ideally via OpenShift
We need to create MOC RoleBindings for operate-first declaratively for each component/namespace; they should go here:
https://github.com/operate-first/apps/tree/master/odh/overlays/moc
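A declarative RoleBinding per component namespace could look like this (a sketch; the group name and namespace are placeholders, not the actual MOC values):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: operate-first-admins
  namespace: opf-jupyterhub       # hypothetical component namespace
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: operate-first           # hypothetical group name
roleRef:
  kind: ClusterRole
  apiGroup: rbac.authorization.k8s.io
  name: admin
```

One such file would live in each component's MOC overlay, so the bindings are versioned alongside the manifests they grant access to.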
@HumairAK @tumido can we easily add https://github.com/opendatahub-io/odh-dashboard#open-data-hub-dashboard ?
It'll be included in the 0.9 release of ODH, but having it now would ease user interaction
Since the odh-dashboard requires a single component instance installed via kfdef to pick up its route, it would be ideal if we could have only one ODH-deployed Prometheus instance via one ODH-deployed Prometheus operator.
This issue is to test whether we can have service/pod monitors all deployed in a single namespace like opf-monitoring, then give RBAC to a single Prometheus instance over other namespaces like opf-jupyterhub and opf-argo, and monitor them from a single source.
In the case of Quicklab or general OpenShift:
The purpose of this issue is to see if this is possible.
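A sketch of the cross-namespace RBAC this would need, assuming the Prometheus in opf-monitoring runs under a prometheus-k8s service account (both names are assumptions):

```yaml
# In each monitored namespace (e.g. opf-jupyterhub), grant the
# Prometheus SA from opf-monitoring permission to discover targets.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-target-discovery
  namespace: opf-jupyterhub
rules:
  - apiGroups: [""]
    resources: [services, endpoints, pods]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-target-discovery
  namespace: opf-jupyterhub
subjects:
  - kind: ServiceAccount
    name: prometheus-k8s          # assumed SA of the Prometheus in opf-monitoring
    namespace: opf-monitoring
roleRef:
  kind: Role
  apiGroup: rbac.authorization.k8s.io
  name: prometheus-target-discovery
```

The same Role/RoleBinding pair would be repeated per monitored namespace (opf-argo, etc.).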
Please link all secrets that will need to be added as part of the kustomize builds in this repository. Once we have our secrets-management infrastructure in place, we'll use this thread as an aggregate point to update all kustomizations that need to be updated.
Kube state metrics provide the individual pod resource usage metrics that we would like to have for our dashboards.
Add Prometheus alertmanager to our monitoring stack
Related PRs:
Related upstream PRs:
Related internal DH PRs:
Related: #11
The instructions to create new Kafka topics need to be updated after the recent directory restructure.
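For context, a Strimzi KafkaTopic is declared roughly like this (topic and cluster names are placeholders; the strimzi.io/cluster label must match the deployed Kafka cluster):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: example-topic                     # hypothetical topic name
  labels:
    strimzi.io/cluster: odh-message-bus   # must match the Kafka cluster CR name
spec:
  partitions: 3
  replicas: 1
```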
Superset is running into the following error on MOC:
running "VolumeBinding" filter plugin for pod "superset-1-f9zll": pod has unbound immediate PersistentVolumeClaims
The PVC shows the following error:
superset-data:
no persistent volumes available for this claim and no storage class is set
I'm guessing this is related to #22, where JH had similar issues. But the superset component doesn't seem to offer the same method of changing the storage class via a parameter. We should try overriding this for now; if it works, suggest upstream add a parameter to specify the storage class.
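If the component exposes no parameter, a kustomize patch in the MOC overlay could force the storage class; a sketch, assuming the PVC is named superset-data as in the error above (the storage class name is a placeholder):

```yaml
# kustomization.yaml in a moc overlay (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: PersistentVolumeClaim
      name: superset-data
    patch: |-
      - op: add
        path: /spec/storageClassName
        value: ocs-storagecluster-ceph-rbd   # hypothetical MOC storage class
```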
We are already starting to see some MOC-specific implementation (e.g. storage class for JH #22 and possibly superset #35). We need a way to separate these out from general deployments.
We could separate them into bases/overlays (where overlays have folders like moc and quicklab). The problem is that our argocd-apps repo will then become environment specific (since we specify a path). As a solution we could also extend the base/overlays structure to the argocd-apps repo. That's one suggestion; open to other ideas.
Once ODH merges Data Catalog, switch OPF manifests repo pointers.
Related to: #32
I accidentally pushed a commit to master (shameful commit) and had to revert it. Can we disable commits to master at the org level?
This is a master task for the categorical encoding JH images deployment within the Operate First CD. The aim is to have a fully transparent, open source, deterministic deployment of a selected use case.
Steps:
All components should go in a single namespace, based on the examples for the kfdef files provided from upstream. This will require restructuring of the operate-first/odh repo.
We encountered an issue where Prometheus could not see Alertmanager on one of the Quicklab clusters. It seems this is because the default SA that Prometheus was assigned did not have sufficient permissions to see Alertmanager. Assigning Prometheus a serviceAccountName in the .spec of the Prometheus CR, where the SA had project admin permissions, solved this issue.
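The fix sketched as a Prometheus CR fragment (SA and Alertmanager names are illustrative, not the actual Quicklab values):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: odh-monitoring
spec:
  # SA bound to a role with enough permissions to see Alertmanager
  serviceAccountName: prometheus-k8s      # hypothetical SA name
  alerting:
    alertmanagers:
      - namespace: opf-monitoring         # hypothetical namespace
        name: alertmanager-operated       # assumed Alertmanager service
        port: web
```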
We should remove the namespaces from base and assign these in the respective overlays, see #98 for more discussion.
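With namespaces stripped from base, each overlay would assign its own, e.g. (overlay path and namespace are illustrative):

```yaml
# overlays/moc/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: opf-monitoring   # assigned here rather than in base
resources:
  - ../../base
```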
Related Issues:
Recently we've been coming across situations where we'd prefer to overlay instead of override resources. The difference being, when we overlay, we are having argocd deploy them, whereas with overriding in ODH, we have ODH deploy them.
Context:
So in some cases ODH deploys operators, which include CRDs. Take KafkaTopics for example. Say I have ArgoCD deploy the KafkaTopics and a KfDef that deploys Strimzi. If ArgoCD tries to deploy the KafkaTopics first, it can't, because it needs to deploy the KfDef first, so the application will fail.
We can use ArgoCD waves to tell ArgoCD to deploy our KfDef (and thus Strimzi) first, then deploy the KafkaTopics second. This too will fail, because the ODH operator needs to read the KfDef and deploy the CRDs itself. So we need ODH to deploy the KafkaTopics via overrides if we want to prevent errors from showing up.
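For reference, sync waves are plain annotations; a minimal sketch (resource names are hypothetical). Note that, as described above, even this ordering fails here, because the KafkaTopic CRD only exists after the ODH operator has processed the KfDef:

```yaml
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: opf-odh                           # hypothetical name
  annotations:
    argocd.argoproj.io/sync-wave: "0"     # deployed first
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: example-topic                     # hypothetical topic
  annotations:
    argocd.argoproj.io/sync-wave: "1"     # deployed after the KfDef
```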
Problem:
I personally don't know if ODH is at a point where we can confidently override certain resources and expect to have them be sync'ed immediately. I have personally encountered situations where ODH will take a while to update the resources, sometimes having to restart ODH itself. I would be interested to hear what your own experiences have been.
Another situation is that, by being forced into overrides, we are forced into following the folder structure of the odh-manifests repo. For example, if I want to override Grafana dashboards, I need to update this file here. But what if I want to add another dashboard? The way I understand it, I would need to override the kustomization here. Okay, but then I have a kustomization that looks like it's pulling some files from this repo and some files from another repo (and the other repo's existence isn't even clear right away by looking at this directory). It just seems messy to me.
Yet another issue is that we cannot see kfdef child resources any more, see here for more details. So we have no sync/visual control over these resources via argocd.
Related: #11
Some of the SRE monitoring dashboards, such as the SLI/SLO dashboard for JH, require additional kube state metrics, which include useful metrics such as individual pod resource usage.
When deploying the monitoring setup in Quicklab (via ArgoCD), we ran into the following Application Sync Error:
The monitoring manifests should be updated to be deployed in waves: https://argoproj.github.io/argo-cd/user-guide/sync-waves/
so that the kfdef is deployed first, followed by the Grafana/Prometheus custom resources.
Add a proxy to the prometheus instance so that only authenticated users have access to it.
This can be done by adding a proxy container to the Prometheus config:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: odh-monitoring
  labels:
    app: prometheus
    prometheus: k8s
spec:
  serviceMonitorSelector: {}
  securityContext: {}
  ruleSelector: {}
  replicas: 2
  containers:
    - name: oauth-proxy
      image: openshift/oauth-proxy:latest
      imagePullPolicy: IfNotPresent
      ports:
        - containerPort: 8443
          name: public
      args:
        - --https-address=:8443
        - --provider=openshift
        - --openshift-service-account=proxy
        - --upstream=http://localhost:9090
        - --tls-cert=/etc/tls/private/tls.crt
        - --tls-key=/etc/tls/private/tls.key
        - --cookie-secret=SECRET
```
Add service monitor for loki
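A ServiceMonitor for Loki might look like this (namespace, labels, and port name are assumptions that have to match the actual Loki service and the Prometheus serviceMonitorSelector):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: loki
  namespace: opf-monitoring     # hypothetical namespace
  labels:
    team: opf                   # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: loki                 # assumed label on the Loki service
  endpoints:
    - port: http-metrics        # assumed metrics port name
      path: /metrics
```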
Add new Prometheus deployment for opf-observatorium namespace
Based on our discussion in operate-first/support#1086, we should add documentation for adding new grafana dashboards
(also related to operate-first/sre#5)
Add KfDefs and deploy all components added via kfdef #9 on MOC using same method as the one employed here
Related: #11
This should probably be split into separate issues for the individual data catalog components
The repo has gone through an overhaul, and a lot of the paths/structure are currently out of date in the README.
As per the discussions in this issue thread here, we should document/summarize the important bits of the discussion. Mainly we want to document the override process and include instructions on how one can override changes. We should also link to the examples presented and Vasek's blog post for further reading.
ODH Dashboard requires all sub-components to be deployed in order to show the component on the Dashboard. We're violating that in our Grafana deployment, since we deploy only the grafana-cluster while the grafana-operator is not deployed via kfdef; we use our own manifests for it instead. However, that causes trouble for the ODH Dashboard.
Solutions:
grafana-operator component to make ODH Dashboard happy.
I would prefer solution 1. WDYT?
Related: #11
The default access should be read only for users.
Continuing discussions from here #3
There is a bit of disagreement on which method to follow, please air your concerns here once more so we can continue the discussion.
Related: #11
Upstream ODH already has Prometheus monitors in place for Argo, but they haven't been updated in a long time. It's very possible that they no longer function correctly / don't correctly grab metrics.
Once the ODH dashboard can handle multi-namespace deployments, switch to the upstream images.
See: #42