kubewharf / kubeadmiral Goto Github PK

View Code? Open in Web Editor NEW

641.0 17.0 85.0 2.08 MB

Multi-Cluster Kubernetes Orchestration

License: Apache License 2.0

Makefile 0.28% Shell 2.92% Go 96.77% Dockerfile 0.03%

kubernetes multi-cluster

kubeadmiral's Introduction

KubeAdmiral - Enhanced Kubernetes Federation

English | 简体中文

KubeAdmiral is a multi-cluster management system for Kubernetes, developed from Kubernetes Federation v2. Kubernetes Federation v2 allows users to manage Kubernetes resources across multiple clusters through the use of federated types such as FederatedDeployment, FederatedReplicaSet, FederatedSecret, etc. KubeAdmiral extends the Kubernetes Federation v2 API, providing compatibility with the Kubernetes native API and more powerful resource management capabilities. KubeAdmiral also adds new features such as:

A new scheduling framework with a rich set of scheduling plugins.
Override policies.
Automatic propagation of dependencies with follower scheduling.
Status aggregation of member cluster resources.
Scalability, stability and user experience enhancements

Getting started

KubeAdmiral supports Kubernetes versions from 1.16 up to 1.24. Using lower or higher Kubernetes versions may cause compatibility issues. For setup please refer to Quickstart.

Community

Contributing

If you are willing to be a contributor for the KubeAdmiral project, please refer to our CONTRIBUTING document for details.

Contact

If you have any questions or wish to contribute, you are welcome to communicate via GitHub issues or pull requests. Alternatively, you may reach out to our Maintainers.

License

KubeAdmiral is under the Apache 2.0 license. See the LICENSE file for details. KubeAdmiral is a continuation of Kubernetes Federation v2, and certain features in KubeAdmiral rely on existing code from Kubernetes — all credits go to the original Kubernetes authors. We also refer to Karmada for some of our architecture and API design, all relevant credits go to the Karmada Authors.

kubeadmiral's People

Contributors

Stargazers

Watchers

kubeadmiral's Issues

cluster available resource is less than 0

When a part of the deployed Pods is in the pending state, the resources of this part of the deployed Pods are considered to be unavailable resources. When calculating the available resources, the available resources may be less than 0 if the resource request of pending pod is very large.

Readiness checks should have more explicit error messages

The error message for failed controller health checks should be more explicit, instead of just "controller not ready".

Webhook scheduler plugins should return reason when filtering out clusters

Currently, there's no Reason field in the plugin's response structs. We have added reason reporting in #231 and the reasons reported by built-in filter plugins would get reported in an event. We should allow webhook plugins to return reasons too.

[Error] in installation: make local-up

Description

Run make local-up

What happended：

Fails with

[+] Building 925.5s (9/13)                                                                                                           docker:default
 => [internal] load build definition from Dockerfile                                                                                           0.0s
 => => transferring dockerfile: 798B                                                                                                           0.0s
 => [internal] load .dockerignore                                                                                                              0.0s
 => => transferring context: 2B                                                                                                                0.0s
 => [internal] load metadata for docker.io/library/debian:buster                                                                              95.4s
 => [internal] load metadata for docker.io/library/golang:1.19                                                                               127.1s
 => [internal] load build context                                                                                                              0.1s
 => => transferring context: 40.03kB                                                                                                           0.0s
 => ERROR [builder 1/4] FROM docker.io/library/golang:1.19@sha256:3025bf670b8363ec9f1b4c4f27348e6d9b7fec607c47e401e40df816853e743a           798.4s
 => => resolve docker.io/library/golang:1.19@sha256:3025bf670b8363ec9f1b4c4f27348e6d9b7fec607c47e401e40df816853e743a                           0.0s
 => => sha256:1611d4f97c5ab666d3c123d72b3bb646dd12bbf7577dd1388fdb191d54cdf440 1.58kB / 1.58kB                                                 0.0s
 => => sha256:80b76a6c918cba6d8e68fb2b80a7afdd7ce3af457d87e413d985434ae7897533 6.87kB / 6.87kB                                                 0.0s
 => => sha256:012c0b3e998c1a0c0bedcf712eaaafb188580529dd026a04aa1ce13fdb39e42b 49.56MB / 49.56MB                                              37.1s
 => => sha256:3025bf670b8363ec9f1b4c4f27348e6d9b7fec607c47e401e40df816853e743a 2.36kB / 2.36kB                                                 0.0s
 => => sha256:00046d1e755ea94fa55a700ca9a10597e4fac7c47be19d970a359b0267a51fbf 24.03MB / 24.03MB                                              34.5s
 => => sha256:9f13f5a53d118643c1f1ff294867c09f224d00edca21f56caa71c2321f8ca004 0B / 64.11MB                                                  798.3s
 => => sha256:190fa1651026077cee00b53a754acbe0dc734b99255c6b274e798f6cd877ae18 92.27MB / 92.27MB                                             175.2s
 => => extracting sha256:012c0b3e998c1a0c0bedcf712eaaafb188580529dd026a04aa1ce13fdb39e42b                                                      0.5s
 => => sha256:0808c64687902329ac64331848304f6f0bc14b86d6be46bccf79292b564f6587 149.14MB / 149.14MB                                           252.0s
 => => extracting sha256:00046d1e755ea94fa55a700ca9a10597e4fac7c47be19d970a359b0267a51fbf                                                      0.2s
 => => sha256:5ec11cb68eac452710eadb46df5e6cf6ead699755303758bf1e262e47b013417 155B / 155B                                                   205.6s
 => CACHED [stage-1 1/4] FROM docker.io/library/debian:buster@sha256:46ca02d33c65ab188d6e56f26c323bf1aa9a99074f2f54176fdc3884304f58b8          0.0s
 => [stage-1 2/4] RUN if [ "cn" = "cn" ]; then sed -i 's#http://deb.debian.org#http://mirrors.tuna.tsinghua.edu.cn#g' /etc/apt/sources.list;   0.3s
 => [stage-1 3/4] RUN apt-get update -y && apt-get install -y ca-certificates && rm -rf /var/lib/apt/lists/*                                   5.0s
------
 > [builder 1/4] FROM docker.io/library/golang:1.19@sha256:3025bf670b8363ec9f1b4c4f27348e6d9b7fec607c47e401e40df816853e743a:
------
WARNING: buildx: failed to read current commit information with git rev-parse --is-inside-work-tree
Dockerfile:2
--------------------
   1 |     # multi-arch image building for kubeadmiral-controller-manager
   2 | >>> FROM --platform=${BUILDPLATFORM} golang:1.19 as builder
   3 |     ARG TARGETPLATFORM GOPROXY BUILD_FLAGS
   4 |     ADD . /build
--------------------
ERROR: failed to solve: failed to read expected number of bytes: unexpected EOF
make[1]: *** [Makefile:113: images] Error 1
make[1]: Leaving directory '/home/larry/code/kubeadmiral'
make: *** [Makefile:83: local-up] Error 2
root:kubeadmiral/ # client_loop: send disconnect: Connection reset                                                                       [13:33:25]

Then

And when I replace golang:1.20 with golang:1.19 in https://github.com/kubewharf/kubeadmiral/blob/main/hack/dockerfiles/Dockerfile#L2
It works and successfully deploy KubeAdmiral.
What happend?

Kubeadmiral support for versions higher than K8s v1.24

Hi, the official documentation here states that it currently supports running with K8s version 1.24. Considering, the Kubernetes ecosystem moves very fast with at-least 3 releases per year, are there any plans to support higher versions?

kubeadmiral can schedule no native k8s crd, like argo workflow or volcano job?

I find kubeadmiral transfer native k8s api to fedrate type, now support no native k8s crd, like argo workflow or volcano job?

Pod propagation should disable the use of custom pod informer

PR #46 introduces a custom pod informer that transform pod objects before caching.

With support for pod federation, we need to ensure that this custom pod informer is disabled for certain pod subcontrollers to work properly, e.g. sync, status

Clean up unused functions in `pkg/util/naming`

The unused functions are legacy we inherited from KubeFed. We should verify if we still need them and remove them as needed.

`readyz` should reflect `InformerManager` and `FederatedInformerManager`'s cache sync status

InformerManager and FederatedInformerManager manage informers for the host and member clusters, respectively, based on FederatedTypeConfigs. For each FederatedTypeConfig, the managers start an informer for the corresponding type.

If the informer for a type cannot be started or its cache cannot be synced (be it for host cluster or member clusters), we should expose this information for visibility. The current mechanism for doing so is by including the information in the response returned by the controller manager's readyz endpoint. Refer to existing controller's readyz implementation for how one can implement cache sync check for InformerManager and FederaterdInformerManager.

`make kind` does not create $HOME/.kube/kubeadmiral/kubeconfig.yaml as expected

I followed the quick start doc, in step 3("Bootstrap Kubernetes clusters") of the prerequisites session, I run make kind command, everything went well, but no "$HOME/.kube/kubeadmiral/kubeconfig.yaml" file generated as expected, as bellows:

% KUBECONFIG=$HOME/.kube/kubeadmiral/kubeconfig.yaml kubectl config get-clusters

W0504 15:22:12.931407   22785 loader.go:223] Config not found: /Users/bytedance/.kube/kubeadmiral/kubeconfig.yaml
NAME

% ls -l $HOME/.kube/kubeadmiral/kubeconfig.yaml
ls: /Users/bytedance/.kube/kubeadmiral/kubeconfig.yaml: No such file or directory


% ls -l $HOME/.kube/kubeadmiral/

total 64
-rw-------  1 bytedance  staff  5586  5  4 11:44 kubeadmiral-host.yaml
-rw-------  1 bytedance  staff  5610  5  4 11:43 kubeadmiral-member-1.yaml
-rw-------  1 bytedance  staff  5606  5  4 11:44 kubeadmiral-member-2.yaml
-rw-------  1 bytedance  staff  5606  5  4 11:44 kubeadmiral-member-3.yaml

Clean up kubebuilder marker comments

The APIs use a few deprecated kubebuilder marker comments, which are confusing and verbose. We should remove/replace them with currently supported alternatives.

Ref: https://book.kubebuilder.io/migration/legacy/migration_guide_v1tov2.html

Implement status aggregation for more native resources

At the time of creating this issue, we only support status aggregation for Deployments.

`FederatedObject.spec.template` should have type `map[string]interface{}`?

FederatedObject.spec.template is currently of type k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1.JSON which stores the raw json bytes. This type was used because controller-gen refuses to implement support for map[string]interface{} fields for CRD generation as discussed in kubernetes-sigs/controller-tools#636. However, map[string]interface{} would be more efficient and ergonomic as it avoids the unmarshaling whenever we need to access the template. We'd like to use map[string]interface{} if we can find a way to hack controller-gen to generate CRDs nonetheless.

We could start by looking at post-gen patches (which is already done in config/crds/patches). Maybe we could ask controller-gen to ignore the template field and let the post-gen patches add it?

Scheduler fixes

Scheduler should distinguish between an Unschedulable result and an Error result
Scheduler should pass all clusters to filter and score webhooks instead of passing one by one
Each unschedulable cluster should be rejected with a message, and scheduler should aggregate these messages to provide meaningful information to users

difference with karmada

I found many simular concepts shared with karmada, anyone can describe the advantage than karmada?

`pkg/util/annotation` and `pkg/util/finalizers` should use `metav1.Object` instead of `pkgruntime.Object` for more ergonomic annotation handling

k8s.io/apimachinery/pkg/runtime.Object does not have GetAnnotations and SetAnnotations methods. As a result the annotation util functions in pkg/controllers/util/annotation/annotation.go resort to calling meta.Accessor(obj) to convert the object. But meta.Accessor(obj) returns an error that must be propagated upwards.

In fact, all current usages of the annotation util functions pass in objects that implement metav1.Object interface, which allows calling SetAnnotations and GetAnnotations without returning errors.

Replace `github.com/pkg/errors` with `fmt`?

The former has not been maintained since 2021.
The latter is not a complete replacement of the former, because the former adds stack traces to errors. However, we are not making use of stack traces in errors.

Should we replace the former with the latter?

Scheduler should handle add events for SchedulingProfiles

This is required to reconcile federated objects that reference a profile which did not exist previously, and the user creates the corresponding new profile to fix it. With the current behavior of the scheduler, it would not automatically reconcile these objects on profile add, and would require a manual trigger.

Are we equipped to support Helm's deployment mode effectively?

Context

Presently, our services are uniformly deployed via Helm across two separate Kubernetes clusters catering to distinct business requirements. However, this approach has led to a proliferation of redundant code, specifically concerning the deployment of services into different clusters.

Therefore, I'd like to pose a question: Are we equipped to support Helm's deployment mode effectively? 😉

Misconfiguration of a single FTC prevents the FTC controller from progressing

Because certain controllers (e.g. status aggregation controller) run by the FTC controller wait for cache sync in the main thread, a wrongly configured FTC can prevent the whole FTC controller from progressing.

The image above shows the logs from the FTC controller indicating that it got blocked after starting the status controller.

Steps to reproduce:

Run the KubeAdmiral controller manager against a 1.23 Kubernetes host cluster using the default FTCs in config/sample/host/01-ftc.yaml.
Because the yaml file contains an older version of CronJob, informers for CronJob will fail to list and their informer cache will never be synced, resulting in the above behavior

Accidental UT errors

link: https://github.com/kubewharf/kubeadmiral/actions/runs/6307418487/job/17124013179?pr=226

The error case is t.Run("ClusterEventHandlers should receive correct old and new clusters", func(t *testing.T).
I ran it locally multiple times and found no similar errors.

I feel that there is a small time difference between the function entering assertionCh and calling callback, which is captured by the test case. I think it would be better to put the callBackCount.Add(1) before the assertionCh operation.

Data race with sourcefeedback in federate controller

Previous write at 0x00c00e78e030 by goroutine 3451:
  runtime.mapassign_faststr()
      /home/chankyin/.asdf/installs/golang/1.20/go/src/runtime/map_faststr.go:203 +0x0
  k8s.io/apimachinery/pkg/apis/meta/v1/unstructured.setNestedFieldNoCopy()
      /home/chankyin/go/pkg/mod/k8s.io/[email protected]/pkg/apis/meta/v1/unstructured/helpers.go:228 +0x2f5
  k8s.io/apimachinery/pkg/apis/meta/v1/unstructured.SetNestedStringMap()
      /home/chankyin/go/pkg/mod/k8s.io/[email protected]/pkg/apis/meta/v1/unstructured/helpers.go:255 +0x1cb
  k8s.io/apimachinery/pkg/apis/meta/v1/unstructured.(*Unstructured).setNestedMap()
      /home/chankyin/go/pkg/mod/k8s.io/[email protected]/pkg/apis/meta/v1/unstructured/unstructured.go:175 +0x178
  k8s.io/apimachinery/pkg/apis/meta/v1/unstructured.(*Unstructured).SetAnnotations()
      /home/chankyin/go/pkg/mod/k8s.io/[email protected]/pkg/apis/meta/v1/unstructured/unstructured.go:417 +0xb5
  github.com/kubewharf/kubeadmiral/pkg/controllers/util/annotation.AddAnnotation()
      /data00/home/chankyin/go/src/github.com/kubewharf/kubeadmiral/pkg/controllers/util/annotation/annotation.go:96 +0x281
  github.com/kubewharf/kubeadmiral/pkg/controllers/util/sourcefeedback.setAnnotation()
      /data00/home/chankyin/go/src/github.com/kubewharf/kubeadmiral/pkg/controllers/util/sourcefeedback/util.go:34 +0x1ef
  github.com/kubewharf/kubeadmiral/pkg/controllers/util/sourcefeedback.PopulateSchedulingAnnotation()
      /data00/home/chankyin/go/src/github.com/kubewharf/kubeadmiral/pkg/controllers/util/sourcefeedback/scheduling.go:79 +0x455
  github.com/kubewharf/kubeadmiral/pkg/controllers/federate.(*FederateController).updateFeedbackAnnotations()
      /data00/home/chankyin/go/src/github.com/kubewharf/kubeadmiral/pkg/controllers/federate/controller.go:449 +0xcd
  github.com/kubewharf/kubeadmiral/pkg/controllers/federate.(*FederateController).reconcile()
      /data00/home/chankyin/go/src/github.com/kubewharf/kubeadmiral/pkg/controllers/federate/controller.go:284 +0x13d2
  github.com/kubewharf/kubeadmiral/pkg/controllers/federate.(*FederateController).reconcile-fm()
      <autogenerated>:1 +0x7a
  github.com/kubewharf/kubeadmiral/pkg/controllers/util/worker.(*asyncWorker).worker()
      /data00/home/chankyin/go/src/github.com/kubewharf/kubeadmiral/pkg/controllers/util/worker/worker.go:165 +0xf3
  github.com/kubewharf/kubeadmiral/pkg/controllers/util/worker.(*asyncWorker).worker-fm()
      <autogenerated>:1 +0x39
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /home/chankyin/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157 +0x48
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /home/chankyin/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158 +0xce
  k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /home/chankyin/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135 +0x10d
  k8s.io/apimachinery/pkg/util/wait.Until()
      /home/chankyin/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92 +0x48
  github.com/kubewharf/kubeadmiral/pkg/controllers/util/worker.(*asyncWorker).Run.func4()
      /data00/home/chankyin/go/src/github.com/kubewharf/kubeadmiral/pkg/controllers/util/worker/worker.go:133 +0x58

Consider changing MaxClusters plugin to respect avoidMigration option

The MaxClusters scheduler plugin does not have any preference for current placements. This undermines the avoidMigration field in propagation policies when maxClusters is set. We can consider adding support for the avoidMigration field.

The new behavior when avoidDisruption == true would look something like:

if len(currentClusters) == maxClusters {
    return currentClusters
} else if len(currentClusters) > maxClusters {
    // remove lowest scoring clusters
    removeLowestScoreClusters(currentClusters, clusterScores, maxClusters)
    return currentClusters
} else {
    // add highest scoring clusters that are not already present
    addHighestScoreClusters(currentClusters, clusterScores, maxClusters)
    return currentClusters
}

Investigate whether we can combine `pkg/util/collectedstatusadapters` and `pkg/util/fedobjectadapters`

The code in the two packages are virtually identical except for the types they handle. Can we combine them, probably using generics?

Replace `pkg/client/generic` with dynamic client from `client-go`

pkg/client/generic is legacy we inherited from KubeFed. It is an unnecessary layer of abstraction and has different usage patterns from client-go's dynamic client. We'd like to standardize all clients to those provided by client-go.

Env vars HOST_CLUSTER_KUBECONFIG and HOST_CLUSTER_CONTEXT was not defined before in quick start

In quickstart "Source the kubeconfig of the host cluster", shall I define HOST_CLUSTER_KUBECONFIG and HOST_CLUSTER_CONTEXT in my bash profile manually?

Provide proxy APIs for users to access member cluster resources

What would you like to be added?

Provide proxy API for users to access member cluster resources

Why is this needed?

Users may need to check the distribution of application resources in each member cluster. And Users do not want to log in to each cloud provider’s website or switch the kubeconfig context. It will greatly improve the convenience for users to use kubeAdmiral if we provide the proxy APIs to access member cluster resources.

Support joining/unjoining member cluster via admiralctl command-line tool

What would you like to be added?

Enhance the admiralctl command-line tool to support joining or unjoining member cluster.

Why is this needed?

Currently, users need to create auth secret and FederatedCluster resource manually. when joining a member cluster. It will greatly improve the convenience for users to use kubeAdmiral if we provide the joining/unjoining command-line tool.

Should not `append` to global slices

Sometimes we append to these global slices. If a slice has cap > len, then appending to it would mutate the underlying storage. https://go.dev/play/p/f5YRITlfiGj. (Golang doesn't seem to guarantee that []int{2,3} must have a capacity of 2.)

If we append to these global slices on different goroutines, it would result in a data race.

Can you provide relevant official technical documents, similar to Karmada?

Karmada documents : https://karmada.io/docs/

ClusterResourcesFit does not work because cannot get resourceQuest of object

Currently, ClusterResourcesFit filter plugin does not work because it cannot get resourceQuest of object.
We need to interpret the logic of calculating resourceQuest according to GVK.
We used ftc's pathDefinition to interpret replicas before, but for CRD, the process of calculating resources may be a bit complicated. We may need hook functions to allow users to customize the calculation logic of resource requests.

Need a webhook to constrain user input

As the fields of policy API expand, there are more and more composite scenarios that require verification.
Though we can disable user misconfiguration at runtime, it is not as effective as blocking user input at the source.
At the same time, users configure fields in the API, but it does not take effect, which also makes users confused.

For example，
SchedulingMode is Duplicate, will not work with replicaStrategy.
ReplicaStrategy is spread, will not work with maxClusters.

Moreover, in the early stages of some features, we can prevent some unpredictable scenarios by limiting its changes.
We will not allow users to make these kinds of modifications until we have fully verified it.

For example, users change replicaStrategy=weighted to binpack.

Organize the constants

Currently constants are defined in a couple of places:

pkg/controllers/common/constants.go
controller's own package
pkg/util packages

We should organize them so they are more maintainable. When organizing, we should make sure:

the constants are exposed to all packages in the project and not just for pkg/controllers
the constants are categorized properly
unused constants should be removed

Refactor controllers to follow logging conventions

The following controllers have to be refactored to follow the new logging conventions in the code style guide.

This will reduce the amount of noise produced in the controller-manager's logs and make the logs much more readable.

kubewharf / kubeadmiral Goto Github PK

kubeadmiral's Introduction

KubeAdmiral - Enhanced Kubernetes Federation

Getting started

Community

Contributing

Contact

License

kubeadmiral's People

Contributors

Stargazers

Watchers

Forkers

kubeadmiral's Issues

Description

What happended：

Then

What would you like to be added?

Why is this needed?

What would you like to be added?

Why is this needed?

Recommend Projects

Recommend Topics

Recommend Org