open-cluster-management-io / community
open-cluster-management governance material.
Home Page: https://open-cluster-management.io
License: Apache License 2.0
We need to onboard the submariner integration with open-cluster-management using the addon-framework.
/kind feature
To attract users who are not familiar with OCM but are interested in multi-cluster solutions, we should provide a document about what an end user can do with OCM in a sample user scenario. With this document, potential users can learn what they can do with OCM in a shorter time, instead of reading individual function documents and piecing the functions together themselves.
Some user scenarios in my mind (OCM is already installed):
We can mention some in-progress features if necessary.
Community Participant
Asked and answered questions on the Slack channel "#open-cluster-mgmt"
Initiated features/enhancements and contributed ideas, such as adding a user scenario ......, updating the placement API with PrioritizerConfigs
Contributed bug fixes to placement, registration, and community.
Requirements
+1
Do we need any of this template info?
Continues to contribute regularly, as demonstrated by having at least [TODO: Number] [TODO: Metric] a year, as demonstrated by [TODO: contributor metrics source]. -> placement, registration, community.
[TODO: Number] accepted PRs, -> 4
Reviewed [TODO: Number] PRs, -> 4
Resolved and closed [TODO: Number] Issues, -> 4
Must have been contributing for at least [TODO: Number] months -> 3
The search component is currently not operational when installing the community version due to restrictions with the database component.
We are planning to replace the database component so we can enable the search capability.
https://github.com/open-cluster-management-io/config-policy-controller
https://github.com/open-cluster-management-io/governance-policy-template-sync
https://github.com/open-cluster-management-io/governance-policy-status-sync
https://github.com/open-cluster-management-io/governance-policy-spec-sync
https://github.com/open-cluster-management-io/governance-policy-propagator
I ran make test-dependency and got an error:
mv: cannot move '/tmp/kubebuilder_2.3.0_linux_amd64' to '/usr/local/kubebuilder/kubebuilder_2.3.0_linux_amd64': File exists
This is because I had tested another repo before and had already run make test-dependency once.
Do we need to check whether /usr/local/kubebuilder/kubebuilder_2.3.0_linux_amd64 exists before moving it?
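The suggested guard could be sketched like this (the paths are the ones from the issue; the helper function name is ours, not from the repo):

```shell
# Move a directory only when the destination does not already exist.
move_if_absent() {
  src="$1"
  dst="$2"
  if [ -e "$dst" ]; then
    echo "skip: $dst already exists"
  else
    mv "$src" "$dst"
  fi
}

# In the Makefile recipe this would be invoked as:
# move_if_absent /tmp/kubebuilder_2.3.0_linux_amd64 /usr/local/kubebuilder/kubebuilder_2.3.0_linux_amd64
```

This keeps make test-dependency idempotent: a second run skips the move instead of failing.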
What have I done:
Step1: Install Policy framework
Step2: Install Policy controllers
Step3: Apply IamPolicy on OKD: kubectl apply -f policy-test/iam/policy-iam.yaml
and the yaml file is:
apiVersion: policy.open-cluster-management.io/v1
kind: IamPolicy
metadata:
  name: iam-grc-policy
  label:
    category: "System-Integrity"
spec:
  namespaceSelector:
    include: ["default","kube-*"]
    exclude: ["kube-system"]
  remediationAction: inform
  disabled: false
  maxClusterRoleBindingUsers: 5
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: binding-policy-iam
placementRef:
  name: placement-policy-iam
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:
- name: iam-grc-policy
  kind: IamPolicy
  apiGroup: policy.open-cluster-management.io
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: placement-policy-iam
spec:
  clusterConditions:
  - status: "True"
    type: ManagedClusterConditionAvailable
  clusterSelector:
    matchExpressions: []
Then I got an error:
error: unable to recognize "policy-test/iam/policy-iam.yaml": no matches for kind "IamPolicy" in version "policy.open-cluster-management.io/v1"
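A "no matches for kind" error usually means the IamPolicy CRD was never registered on the cluster. A quick check (the CRD name below is inferred from the kind and API group, so treat it as an assumption):

```shell
kubectl get crd iampolicies.policy.open-cluster-management.io
```

If this returns NotFound, the IAM policy controller's CRD needs to be applied before the policy.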
https://github.com/open-cluster-management-io/config-policy-controller
I ran make create-ns and got an error:
error: exactly one NAME is required, got 0
This is because make create-ns aims to create two namespaces:
create-ns:
	@kubectl create namespace $(CONTROLLER_NAMESPACE) || true
	@kubectl create namespace $(WATCH_NAMESPACE) || true
But there is no default value set for CONTROLLER_NAMESPACE in the Makefile.
It should be fixed by adding a default value for CONTROLLER_NAMESPACE.
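A minimal sketch of that fix, using make's conditional assignment (the default namespace name here is only an assumption, not the repo's actual choice):

```make
# Fall back to a default when the caller does not set CONTROLLER_NAMESPACE
CONTROLLER_NAMESPACE ?= open-cluster-management-agent

create-ns:
	@kubectl create namespace $(CONTROLLER_NAMESPACE) || true
	@kubectl create namespace $(WATCH_NAMESPACE) || true
```

With `?=`, an explicit `make create-ns CONTROLLER_NAMESPACE=foo` still overrides the default.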
Please provide comments on Contributor ladder template - draft data here: https://docs.google.com/document/d/1-zBXqvnGaxPPVHbcoEyyzlkK0UI3eU1RXVQYpS2dVmU/edit
Ignore preliminary draft and reference new one below.
There is no workload monitor in OCM on the management hub now; it is inconvenient for users to check the job status on the managed clusters. Please provide guidance about the integration with workload monitoring tools, like Thanos on ACM.
https://github.com/open-cluster-management-io/config-policy-controller
I ran make deploy and got an error:
make: `deploy' is up to date.
This should be fixed by adding .PHONY: deploy before:
deploy:
	kubectl apply -f deploy/ -n $(CONTROLLER_NAMESPACE)
	kubectl apply -f deploy/crds/ -n $(CONTROLLER_NAMESPACE)
	kubectl set env deployment/$(IMG) -n $(CONTROLLER_NAMESPACE) WATCH_NAMESPACE=$(WATCH_NAMESPACE)
After adding .PHONY, it still does not work; I get an error:
error: unable to recognize "deploy/crds/policy.open-cluster-management.io_v1alpha1_configurationpolicy_cr.yaml": no matches for kind "ConfigurationPolicy" in version "policy.open-cluster-management.io/v1"
I would like to know: both the placement and policy add-ons can control the distribution of resources, so what is the difference between the policy add-ons and placement?
In a hub cluster, cluster-manager-work-webhook outputs the error log below:
reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.PriorityLevelConfiguration: failed to list *v1beta1.PriorityLevelConfiguration: prioritylevelconfigurations.flowcontrol.apiserver.k8s.io is forbidden: User "system:serviceaccount:open-cluster-management-hub:cluster-manager-work-webhook-sa" cannot list resource "prioritylevelconfigurations" in API group "flowcontrol.apiserver.k8s.io" at the cluster scope
and cluster-manager-registration-webhook outputs similar errors:
reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.FlowSchema: failed to list *v1beta1.FlowSchema: flowschemas.flowcontrol.apiserver.k8s.io is forbidden: User "system:serviceaccount:open-cluster-management-hub:cluster-manager-registration-webhook-sa" cannot list resource "flowschemas" in API group "flowcontrol.apiserver.k8s.io" at the cluster scope
reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta1.PriorityLevelConfiguration: failed to list *v1beta1.PriorityLevelConfiguration: prioritylevelconfigurations.flowcontrol.apiserver.k8s.io is forbidden: User "system:serviceaccount:open-cluster-management-hub:cluster-manager-registration-webhook-sa" cannot list resource "prioritylevelconfigurations" in API group "flowcontrol.apiserver.k8s.io" at the cluster scope
image: quay.io/open-cluster-management/registration:latest (SHA256: 9a9db2eb9c8a)
clustermanager csv 0.4.0
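The forbidden errors indicate the webhook service accounts lack list/watch permission on the flowcontrol API group. A hedged sketch of the missing RBAC grant (the ClusterRole name is ours, not from the repo; it would need a ClusterRoleBinding to each service account from the logs):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: flowcontrol-reader   # hypothetical name
rules:
- apiGroups: ["flowcontrol.apiserver.k8s.io"]
  resources: ["flowschemas", "prioritylevelconfigurations"]
  verbs: ["list", "watch"]
```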
https://github.com/open-cluster-management-io/config-policy-controller
I ran make lint and got an error:
xargs: yamllint: No such file or directory
Do we need any pre-installation before make lint?
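yamllint is a Python tool, so make lint assumes it is already on the PATH. One way to pre-install it (assuming pip is available in the environment):

```shell
pip install yamllint
```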
I ran make deploy-community-managed as this document said and got an error when downloading "build-harness-bootstrap".
I fixed it after deleting the token ${GITHUB_TOKEN} in this line: https://github.com/open-cluster-management/multicloud-operators-subscription/blob/59b66a70ce1b02a243db9140c917e7caaa209b09/Makefile#L67
-include $(shell curl -H 'Authorization: token ${GITHUB_TOKEN}' -H 'Accept: application/vnd.github.v4.raw' -L https://api.github.com/repos/open-cluster-management/build-harness-extensions/contents/templates/Makefile.build-harness-bootstrap -o .build-harness-bootstrap; echo .build-harness-bootstrap)
Maybe another approach is to check whether GITHUB_TOKEN exists, as governance-policy-framework does:
ifndef GITHUB_TOKEN
-include $(shell curl -H 'Accept: application/vnd.github.v4.raw' -L https://api.github.com/repos/open-cluster-management/build-harness-extensions/contents/templates/Makefile.build-harness-bootstrap -o .build-harness-bootstrap; echo .build-harness-bootstrap)
else
-include $(shell curl -H 'Authorization: token ${GITHUB_TOKEN}' -H 'Accept: application/vnd.github.v4.raw' -L https://api.github.com/repos/open-cluster-management/build-harness-extensions/contents/templates/Makefile.build-harness-bootstrap -o .build-harness-bootstrap; echo .build-harness-bootstrap)
endif
We should be able to define that the workload should not be placed in unhealthy or unreachable clusters.
We can consider logic similar to taint/toleration.
This is a public repository I would like to contribute to open-cluster-management; it enables a Kubernetes CronJob that will hibernate and resume clusters provisioned by Hive.
After processing these steps:
Step1: Install the community operator from OperatorHub.io
Step2: Create a Cluster Manager on the console
Step3: Install the managedcluster-import-controller from source files
Step4: Manually register a cluster
We got an error:
E0526 05:51:18.661381 1 lease_controller.go:127] unable to get cluster lease "managed-cluster-lease" on hub cluster: leases.coordination.k8s.io "managed-cluster-lease" is forbidden: User "system:open-cluster-management:cluster1:cv6xm" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "cluster1"
We fixed it by changing all image versions in import.yaml from "latest" to "0.0.3" at Step4.
"To facilitate community members to easily contribute to the Open Cluster Management project, we need to find a way to securely add contributors to the open-cluster-management GitHub org." - @ray-harris
https://github.com/open-cluster-management/multicloud-operators-subscription
After running make deploy-managed, the container failed to start with the following error message:
unknown flag: --cluster-namespace
After deleting - --cluster-namespace=managed_cluster_name # cluster1 in deploy/managed/operator.yaml, the container started successfully, but then I got an error in the log:
I0706 06:32:32.132637 1 manager.go:60] LeaderElection enabled as running in a cluster
I0706 06:32:32.222951 1 manager.go:118] Starting ... Registering Components for cluster: cluster1/cluster1
I0706 06:32:32.504304 1 namespace_subscriber.go:126] default namespace subscriber with id:cluster1/cluster1
I0706 06:32:32.505549 1 namespace_subscriber.go:129] Done setup namespace subscriber
I0706 06:32:32.509991 1 subscription.go:1013] No multiclusterHub resource found, err: the server could not find the requested resource
I0706 06:32:32.510045 1 controller.go:54] Add helmrelease controller when the remote subscription is NOT running on hub or standalone subscription
I0706 06:32:32.510384 1 helmrelease_controller.go:84] The MaxConcurrentReconciles is set to: 10
E0706 06:32:32.556900 1 placement.go:131] ACM Cluster API service NOT ready: no matches for kind "ManagedCluster" in version "cluster.open-cluster-management.io/v1"
I0706 06:32:32.586759 1 spoke_token_controller.go:78] Adding klusterlet token controller.
I0706 06:32:32.635489 1 lease_controller.go:72] trying to update lease "open-cluster-management-agent-addon"/"application-manager"
I0706 06:32:32.636434 1 manager.go:187] Starting the Cmd.
I0706 06:32:32.636588 1 leaderelection.go:243] attempting to acquire leader lease kube-system/multicloud-operators-remote-subscription-leader.open-cluster-management.io...
I0706 06:32:32.650651 1 lease_controller.go:113] addon lease "open-cluster-management-agent-addon"/"application-manager" updated
I0706 06:32:49.265782 1 leaderelection.go:253] successfully acquired lease kube-system/multicloud-operators-remote-subscription-leader.open-cluster-management.io
I0706 06:32:49.266868 1 sync_server.go:170] start synchronizer
I0706 06:32:49.309966 1 discovery.go:86] Synchronizer cache (re)started
I0706 06:32:49.310041 1 sync_server.go:190] stop synchronizer
E0706 06:32:49.310268 1 leaderelection.go:325] error retrieving resource lock kube-system/multicloud-operators-remote-subscription-leader.open-cluster-management.io: Get "https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/multicloud-operators-remote-subscription-leader.open-cluster-management.io": context canceled
I0706 06:32:49.310650 1 leaderelection.go:278] failed to renew lease kube-system/multicloud-operators-remote-subscription-leader.open-cluster-management.io: timed out waiting for the condition
I0706 06:32:49.310821 1 sync_server.go:231] stop synchronizer channel
E0706 06:32:49.310886 1 manager.go:191] controller was started more than once. This is likely to be caused by being added to a manager multiple times
Manager exited non-zero
As @swopebe pointed out, there is an ongoing effort to move away from the word master and adopt the branch name main.
All new GitHub repos default to using main, but some of our older repos are still using master.
This is an issue to keep track of that work.
https://github.com/open-cluster-management-io/governance-policy-propagator
I ran make kind-deploy-controller-dev and got an error:
Error from server (AlreadyExists): namespaces "open-cluster-management" already exists
This is because the namespace was already created by the previous command, make kind-bootstrap-cluster.
We can fix it by removing the kubectl create ns $(KIND_NAMESPACE) in kind-bootstrap-cluster-dev.
During a maintenance window, some worker nodes may be cordoned as unschedulable for repair work, which may take a long time. When the managed cluster calculates total capacity, these cordoned worker nodes' capacity should be excluded.
Today, when I cordon some worker nodes in the managed cluster, there is no capacity change in the ManagedCluster CR status.
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: managed-cluster1
status:
  allocatable:
    cpu: "8"
    memory: 13504Mi
  capacity:
    cpu: "8"
    memory: 15504Mi
We have the following use cases, which require calling the K8s API or the ingress endpoint of services in the managed cluster directly from the hub cluster.
A few open source projects support an API proxy which could be leveraged in OCM to support push mode, such as ANP (https://github.com/kubernetes-sigs/apiserver-network-proxy) and clusternet (https://github.com/clusternet/clusternet). Alternatively, it could be implemented in an existing OCM component or by introducing a new OCM component.
Currently, we don’t have an easy way to stand up the entire community edition of open cluster management. Ideally, we should have an installer or an operator that performs the deployment.
This deployment is about getting all the open pieces to stack up properly and requires efforts from all components and not just from install/mch-operator.
The current deployment method is either via OKD operator hub marketplace or install from component source repo. See https://open-cluster-management.io for more details.
Most OCM components are built using Travis which limits the ability of external collaborators to submit pull requests that are automatically tested.
To address this, we are working to migrate the build infrastructure from Travis to OpenShift CI Prow. This will allow us to approve external collaborators to open PRs that can be automatically tested before they are reviewed and merged.
The work currently consists of these tasks:
The Prow based build harness extensions are here:
https://github.com/open-cluster-management/build-harness-ext-osci
The Prow image builder for Go is here (this is where the NodeJS image builder will be added as well):
https://github.com/open-cluster-management/image-builder
The Travis based build harness extensions are here (this is where the Prow based extensions will be integrated):
https://github.com/open-cluster-management/build-harness-extensions
There are a few OCM repos that were created in Prow. Those repos are configured here in OpenShift CI:
https://github.com/openshift/release/tree/master/ci-operator/config/open-cluster-management
https://github.com/openshift/release/tree/master/ci-operator/jobs/open-cluster-management
The Prow workflows will be here:
https://github.com/openshift/release/tree/master/ci-operator/step-registry/acm
Links to other relevant repos and documentation will be added as they are created.
This issue will be updated as work progresses and plans change.
We anticipate this work being completed by the end of January 2021, but this is not a firm schedule and is subject to change.
What have I done: I installed policy framework and policy controllers successfully.
What happened: The day after I installed the policy framework and policy controllers, policies could not be propagated.
The reason why:
To log in to an OKD cluster we need to use the command:
oc login ...
But oc login only generates a user token (not a certificate or username & password) in the kubeconfig:
- name: kube:admin/api-aws-okd-dev04-red-chesterfield-com:6443
  user:
    token: sha256~7Zc0cmUbXBFWTRP5pUOZL8C8ZlP45pGrdedoNixFSA4
Since we store this kubeconfig as a secret when installing "policy-framework" on the managed cluster, the information can no longer be synced after the token expires.
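One possible workaround is to build the stored kubeconfig from a service-account token instead of the short-lived oc login token. This is only a sketch: all names below are placeholders, and it assumes an older cluster where service-account token secrets are auto-created:

```shell
# Create a dedicated service account and read its long-lived token secret
kubectl -n open-cluster-management create serviceaccount policy-sync   # hypothetical name
SECRET=$(kubectl -n open-cluster-management get sa policy-sync -o jsonpath='{.secrets[0].name}')
TOKEN=$(kubectl -n open-cluster-management get secret "$SECRET" -o jsonpath='{.data.token}' | base64 -d)
# Put that token into the kubeconfig that gets stored as the secret
kubectl config set-credentials policy-sync --token="$TOKEN"
```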
As the purpose and design stated in open-cluster-management-io/enhancements#19, opening this ticket to request a new repo named "cluster-proxy" as a pluggable module in OCM that supports direct API calls from the hub to the spoke cluster.
The current policy binding uses application management's PlacementRule API. It should support the Placement API instead, because the PlacementRule API is being deprecated.
It will free up the "Application management" requirement from the policy framework. See https://open-cluster-management.io/getting-started/integration/policy-framework/#prerequisite
When scheduling placement, select managed clusters based on cluster capacity or resource usage to balance the load across the fleet.
https://github.com/open-cluster-management-io/config-policy-controller
https://github.com/open-cluster-management-io/governance-policy-template-sync
https://github.com/open-cluster-management-io/governance-policy-status-sync
https://github.com/open-cluster-management-io/governance-policy-spec-sync
https://github.com/open-cluster-management-io/governance-policy-propagator
Running make build-image failed. Checking the .build-harness-bootstrap, I got the following:
{"message":"API rate limit exceeded for 66.187.232.127. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)","documentation_url":"https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting"}
Checking the Makefile, I found we download it every time a command runs:
ifndef USE_VENDORIZED_BUILD_HARNESS
ifeq ($(TRAVIS_BUILD),1)
-include $(shell curl -H 'Accept: application/vnd.github.v4.raw' -L https://api.github.com/repos/open-cluster-management/build-harness-extensions/contents/templates/Makefile.build-harness-bootstrap -o .build-harness-bootstrap; echo .build-harness-bootstrap)
endif
else
-include vbh/.build-harness-vendorized
endif
Is it possible to skip the download if the file already exists?
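One make-native way to do this is to turn the bootstrap file into a target, so curl only runs when the file is missing (a sketch, not the repo's actual fix; make rebuilds missing included files before re-executing):

```make
.build-harness-bootstrap:
	@curl -H 'Accept: application/vnd.github.v4.raw' -L \
	  https://api.github.com/repos/open-cluster-management/build-harness-extensions/contents/templates/Makefile.build-harness-bootstrap \
	  -o $@

-include .build-harness-bootstrap
```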
To add a ManagedCluster to a ManagedClusterSet, the user needs to set a label cluster.open-cluster-management.io/clusterset={clusterset name} on the ManagedCluster.
Does this mean a ManagedCluster cannot be added to multiple ManagedClusterSets?
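For reference, the label from the question would look like this on a ManagedCluster manifest (the cluster and clusterset names are placeholders); since it is a single label key, it can hold only one value at a time:

```yaml
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: cluster1          # placeholder
  labels:
    cluster.open-cluster-management.io/clusterset: clusterset1   # placeholder value
```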
There exist globally scoped managed clusters in OCM which don't belong to any clusterset. The Placement APIs need to support selecting such clusters as well.
The README says:
Meeting dial-in details, meeting notes and agendas are announced and published to the open-cluster-management mailing list on Google Groups
...but there's no information in the google group; the meeting details appear to be on the project board instead. Update the README? Or post to the mailing list?
go get -u github.com/open-cluster-management-io/governance-policy-propagator
go get: github.com/open-cluster-management-io/governance-policy-propagator@none updating to
github.com/open-cluster-management-io/[email protected]: parsing go.mod:
module declares its path as: github.com/open-cluster-management/governance-policy-propagator
but was required as: github.com/open-cluster-management-io/governance-policy-propagator
I'm thinking it won't work for our other repos either, but we'll need to verify:
We should be able to define a delete propagation option in the work API.
Related enhancement proposal https://github.com/open-cluster-management-io/enhancements/tree/main/enhancements/sig-architecture/10-deletepropagationstrategy
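As a sketch of what that proposal describes, a ManifestWork could carry a delete option along these lines (field names follow the linked proposal, but treat the example as illustrative, not a finalized API):

```yaml
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  name: example-work        # placeholder
  namespace: cluster1       # placeholder
spec:
  deleteOption:
    propagationPolicy: Orphan   # keep resources on the managed cluster when the work is deleted
  workload:
    manifests: []
```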
Easily create ACM policies for various sources of policy:
Some sample sources of policies might include:
The goal is to point to a source above and provide placement details, and then this new tool can auto-generate ACM policies.
A script may be the starting point, and then a CLI could be integrated to enhance this.
https://github.com/open-cluster-management-io/config-policy-controller
https://github.com/open-cluster-management-io/governance-policy-template-sync
https://github.com/open-cluster-management-io/governance-policy-status-sync
https://github.com/open-cluster-management-io/governance-policy-spec-sync
https://github.com/open-cluster-management-io/governance-policy-propagator
I ran make e2e-test and got an error:
ginkgo -v --slowSpecThreshold=10 test/e2e
/bin/bash: ginkgo: command not found
make: *** [e2e-test] Error 127
This is because ginkgo is installed under .go/bin, the default GOBIN set in the Makefile.
GOPATH_DEFAULT := $(PWD)/.go
export GOPATH ?= $(GOPATH_DEFAULT)
GOBIN_DEFAULT := $(GOPATH)/bin
export GOBIN ?= $(GOBIN_DEFAULT)
Adding export PATH := $(PATH):$(GOBIN) should fix this.
There is no way for a user to submit a workload to the target managed cluster directly now; the user has to write some code to get information from the PlacementDecision, analyze it, and then forward the workload to the target cluster, which is inconvenient.
A draft idea is to enhance clusteradm so that a user can submit a workload like the following:
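The issue does not spell out the syntax; a purely hypothetical shape for such a clusteradm subcommand (none of these flags exist today) might be:

```
clusteradm create work tf-job-work \
  --placement my-placement \
  -f tf-job.yaml
```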
OCM should provide a way to let the end user know the status of the real workload (the TF job in the above scenario).
What have I done:
Step1: Install from source
Step2: Run kubectl apply -f application.yaml
and the yaml file is:
apiVersion: app.k8s.io/v1beta1
kind: Application
metadata:
  name: okd
spec:
  componentKinds:
  - group: apps.open-cluster-management.io
    kind: Subscription
  descriptor: {}
Then I got an error:
Error "failed calling webhook "applications.apps.open-cluster-management.webhook": Post "https://multicluster-operators-application-svc.multicluster-operators.svc:443/app-validate?timeout=10s": no endpoints available for service "multicluster-operators-application-svc"" for field "undefined".
The e2e test automation might fail during a pull request for some repos, e.g. https://github.com/open-cluster-management/application-ui
This is a work in progress. The current workaround is that the repo owner can run the e2e test in another branch for you.
We should be able to support policy in placement:
For the application subscription operator, deprecate the deployable API and start using the work API to propagate the application hub subscription to all managed clusters.
This might improve the performance and scalability of the subscription operator.
Hi,
Is there any public Slack channel where users can ask questions?
Thank you.
Use case scenario:
If a user deploys a ManifestWork in the OCM environment, and this ManifestWork contains a deployment A whose replicas field is 5, but the environment also has an HPA controller that owns and manages the replicas field for every deployment, the HPA controller will change deployment A's replicas field to 3. The work agent will then change it back to 5, since the work agent thinks it owns the replicas field and ensures it equals the desired value. This starts an infinite loop: deployment A's replicas field changes from 5 to 3, 3 to 5, 5 to 3, 3 to 5...
This is not what we want. What we want is to deploy a ManifestWork deployment and give ownership of one specific field to another player (for example, give ownership of the replicas field to the HPA controller), and let that player change the field (for example, the HPA changes the replicas field) without the work agent changing it back.
Placement should support match expression within the clusterSelector predicate that can operate against klusterlet addon enabled/disabled status.
I need to select managed clusters that have the policyController addon enabled, because I want to utilize GRC policies to manage those clusters.
Currently I cannot create a valid Placement that will accomplish this goal because klusterlet addon enabled/disabled status is not available for me to use in the match expression of the cluster selector.
Installing Application lifecycle management on OKD as this document said:
Container multicluster-operators-argocdcluster-xxx failed to start:
E0526 10:06:50.536999 1 manager.go:99] no matches for kind "KlusterletAddonConfig" in version "agent.open-cluster-management.io/v1"
Manager exited non-zero
Fixed by installing the klusterlet CRD first.
See how to do it in: https://github.com/open-cluster-management/klusterlet-addon-controller
The GRC Policy framework needs to expose some metrics to help get visibility into what is going on inside of the policy framework. The metrics must provide details about which policies are compliant and not compliant, and what managed cluster has caused the noncompliance. There is also a need to find out how many managed clusters a policy is distributed to.