Comments (14)
But I'm fine with the #13 (comment) suggestion. Try to keep the number of logical groups low and if the separation creates a problem, we'll report it and merge the components back into a single namespace so that operations can resume.
@tumido @anishasthana @4n4nd does this work for you guys?
from apps.
I think we're pretty settled on multiple namespaces, and all our current work makes this assumption. Our talks with upstream also suggest that they recognize multi-namespace deployment as something they need to support. We don't foresee this changing any time soon, so I'm closing this issue again. If the question arises again, we can revisit. Feel free to continue linking issues/threads here to keep track of any discussions surrounding this topic.
from apps.
I'm not convinced that starting with a single namespace for all components makes sense. Per some discussions offline, it makes more sense to deploy groups of components per namespace, e.g. JupyterHub + Spark in one namespace, Superset + data catalog (once it's in) in another.
This way we can separate concerns per namespace. This will give us better control on quotas for namespaces/applications. It would also simplify management of the ODH (split logs across namespaces). It would be very painful to have to debug issues if everything was in one namespace. Catastrophic failures could also snowball into everything being down. (An admin accidentally deletes all routes as opposed to just the routes for superset)
from apps.
I think the biggest concern has been to stick close to upstream, then should we find problems/issues, we relay them back upstream.
But we already know from past experience (e.g. from idh) that grouping all components in a single namespace can be counter-productive. So in line with our operate-first goals and influencing upstream, may I suggest we do the following:
- we list out the issues with single namespace deployment (Anish has already done some of this above)
- we go ahead with our initial plan to use multi-namespace, with a logical grouping of components per namespace
- we make an issue upstream linking this proposed structure and this particular thread as the motivation, and air out concerns with the upstream proposed kfdef with all components (citing reasons for why we feel this is not particularly useful) in 1 ns.
let me know what you guys think!
from apps.
Let's take this a bit broadly:
The `kfdef` resource is a namespaced resource. It's easy to put everything in it and have it deployed into a single namespace and a single network. However, it's not a realistic expectation to have a customer do it.
Deploying full ODH + Kubeflow in a single `kfdef` would result in many pods (~50 or more) running at the same time in a single namespace, all of that without any workload on top of it. I also expect many clashing resource names as a result (like configmaps for `odhargo` and Kubeflow's `argo`, or database pods, because all of them are named `postgres`, etc.).
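To make the name-clash concern concrete, here is a minimal sketch that checks an inventory of resources for names claimed by more than one component within a single namespace. The inventory entries (component names, kinds, resource names) are hypothetical placeholders for illustration, not taken from the actual ODH or Kubeflow manifests.

```python
from collections import defaultdict


def find_name_clashes(resources):
    """Return {(kind, name): [components]} for every resource name that
    more than one component would create in the same namespace."""
    owners = defaultdict(list)
    for component, kind, name in resources:
        owners[(kind, name)].append(component)
    return {key: comps for key, comps in owners.items() if len(comps) > 1}


# Hypothetical inventory of (component, kind, resource name) tuples,
# as if every component were rendered into one namespace.
resources = [
    ("airflow", "Deployment", "postgres"),
    ("thriftserver", "Deployment", "postgres"),
    ("odh-argo", "ConfigMap", "workflow-controller-configmap"),
    ("kubeflow-argo", "ConfigMap", "workflow-controller-configmap"),
    ("superset", "Deployment", "superset"),
]

clashes = find_name_clashes(resources)
for (kind, name), components in clashes.items():
    print(f"{kind}/{name} is claimed by: {', '.join(components)}")
```

Within one namespace, a Kubernetes resource of a given kind must have a unique name, so each reported pair would overwrite or conflict with the other. Splitting the components across namespaces removes the clash without renaming anything.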
On the other hand, the `kfdef` resource is flexible enough that you can define any renderable manifests as an application in it. Or any set of applications. And this only defines what the `kfctl` operator will try to deploy into your single namespace.
I don't think that anybody in ODH upstream expects customers to deploy it all into a single namespace. Customers are probably expected to pick what they want and need and modify it. In the same way, nobody is expecting customers to run a Superset instance with SQLite as a production service, yet that's what the ODH manifests suggest. It's a showcase that is expected to be further modified to be operational.
And part of that is to scope the resources into manageable pieces. With this comes the project/namespace differentiation. Let me illustrate this with a couple of examples:
- Do I want my Spark cluster to deplete all the namespace quota while I also have Superset and Argo running in there? Definitely not. My SRE's responsibility is to ensure that if I have too many Spark instances running, it blocks Spark only. Therefore I need to have it running in a separate project with an appropriate resource quota on that namespace.
- Let's say I'm deploying ODH Argo and Kubeflow's Argo in the same namespace. Now I have 2 Argo instances in different versions in the very same namespace. That's not gonna behave.
- Let's say I deploy Airflow and the data catalog into the same namespace. Now I have a namespace which is deploying multiple database clusters. And even 2 databases of the same kind. One Postgres instance for Airflow, one MySQL for Hue, another Postgres for Thriftserver... (this example is a bit artificial, since the data catalog is not in the new ODH yet, though the original implementation has 2 different databases)
- Argo users are usually expected to have permissions for low-level Kubernetes resource interactions: viewing pods, logs, PVCs... Do I want all my Argo users to have access to these resources and to mess around with them, when I'm running my personalized JupyterHub clusters and databases?
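The quota point in the first example can be sketched as follows: one `ResourceQuota` object per namespace group, so that a runaway component can only exhaust its own namespace. This is a rough illustration only; the namespace names, the grouping, the quota name `compute-quota`, and all the limits are assumptions made up for the example, not values agreed on in this thread.

```python
import json

# Hypothetical grouping of components into namespaces, with made-up limits.
GROUPS = {
    "odh-jupyterhub": {
        "components": ["jupyterhub", "spark"],
        "cpu": "16",
        "memory": "64Gi",
        "pods": "40",
    },
    "odh-superset": {
        "components": ["superset"],
        "cpu": "4",
        "memory": "16Gi",
        "pods": "10",
    },
}


def resource_quota(namespace, cpu, memory, pods):
    """Build a Kubernetes ResourceQuota manifest (as a dict) that caps
    total CPU requests, memory requests and pod count in one namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "compute-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "pods": pods,
            }
        },
    }


manifests = [
    resource_quota(ns, group["cpu"], group["memory"], group["pods"])
    for ns, group in GROUPS.items()
]
print(json.dumps(manifests, indent=2))
```

With a layout like this, too many Spark instances would hit the quota of the hypothetical `odh-jupyterhub` namespace and leave Superset untouched, which is the isolation the example above asks for.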
To sum it up, we shouldn't be deploying everything into the same namespaces only because we can do that and it's easy to do. It's not a reasonable and realistic expectation. We need to provide realistic service as an outcome of this project.
Let's do things the right way and deploy it in the way we see it being managed for the best SRE experience. And if we encounter issues with that? That's expected. Let's report it or even better fix it. We can provide overrides for our local setting or bundle them into overlays which can be applied on top of ODH to allow for proper inter-namespace networking...
from apps.
To me, this whole `kfdef` thing in a single namespace sounds like a buffet, a nice little Smörgåsbord. You can pick whatever you want and fix yourself a nice, delicious plate. But nobody ever expected you to eat it all at once. And nobody's pleased you did it.
from apps.
I agree with keeping multiple namespaces, it seems like the logical way to do it. We should create an issue on upstream ODH with a list of groupings that we think are logical and have the ODH team approve it.
This could be used as documentation by ODH "customers".
from apps.
We could also have upstream ODH point to Operate-first as a good, opinionated way to do things.
from apps.
My suggestion would be to deploy everything into one large NS and then separate out piece by piece.
Does separating them out pose any problems? Like leftovers and garbage?
But I'm fine with the #13 (comment) suggestion. Try to keep the number of logical groups low and if the separation creates a problem, we'll report it and merge the components back into a single namespace so that operations can resume.
And btw, if a conflict of resources exists because of a single NS deploy, this is also a bug to be reported.
from apps.
Great write-up @tumido , I agree with pretty much everything. For this bit right here:
I don't think that anybody in ODH upstream expects customers to deploy it all into a single namespace. Customers are probably expected to pick what they want and need and modify it. In the same way, nobody is expecting customers to run a Superset instance with SQLite as a production service, yet that's what the ODH manifests suggest. It's a showcase that is expected to be further modified to be operational.
There are a lot of assumptions here, I think. This may be true, but I'd argue that it is not clear. I wonder if there are docs that explain this level of flexibility as being intended and encouraged for such a use case. If there are, we should find them and confirm our thoughts. If there are not, we should either make an issue upstream asking for docs or submit a PR ourselves in the appropriate location.
from apps.
I'm happy with that proposal
from apps.
Cool, closing this issue, we'll go with the aforementioned proposal.
from apps.
@HumairAK @anishasthana looks like upstream is not really clear on single-ns vs multi-ns.
opendatahub-io/odh-dashboard#26 (comment)
I'm reopening this issue until we get clearer guidance
from apps.
We're also continuing further discussions upstream with regards to monitoring, follow that discussion here
Let's use this issue to track the various issues being spawned upstream from the multi/single namespace discussions. Please link any other issues you are all starting here, thanks.
from apps.