kubernetes-retired / kubefed
Kubernetes Cluster Federation
License: Apache License 2.0
As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.
The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".
Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)
Thanks so much, let me know if you have any questions.
(This issue was generated from a tool, apologies for any weirdness.)
[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md
So that we don't need to vendor the client. This will ease the transition to the CRD-based cluster registry when the time comes.
See also: kubernetes-retired/cluster-registry#225
Write a user guide to document setting up a fedv2 deployment. This guide should be separate from the developer guide, and the developer guide can reference it to avoid duplicating docs.
At any given time, a federation will have some set of k8s resources federated. Users can further federate their CRD resources into the same federation. Additionally, with the layered approach of lower level federated types enabling simple federation and higher level types built on the lower level controllers, some users might eventually want all or only a subset of the APIs and controllers in their federation deployment.
This calls for a configuration mechanism which can make some APIs (and their respective controllers) optional by default, and additionally provide a way to enable/disable individual API resources in the federation control plane.
The mechanism should be easy for an end user to configure, and the implementation can ideally target the infrastructure setup after the move to CRD-based resources (if that happens).
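A minimal sketch of what such a configuration could look like; the FederationConfig kind and all field names here are hypothetical, invented only to illustrate the enable/disable intent:
apiVersion: core.federation.k8s.io/v1alpha1
kind: FederationConfig            # hypothetical kind, not an existing API
metadata:
  name: federation-config
  namespace: federation-system
spec:
  # low level federated types (template/placement/override) to enable
  enabledTypes:
  - federateddeployments
  - federatedsecrets
  - federatedconfigmaps
  # optional higher level controllers built on top of the low level types
  optionalControllers:
    replicaSchedulingPreferences: true
    multiClusterServiceDNS: false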
This issue is to achieve feature parity for external/in-cluster DNS with v1. I believe in use-case form, this would be:
@shashidharatd's current PR (#59) gets us most of the way there - but we need some additional glue between FederatedService and MultiClusterDNSRecord to make this story complete.
ReplicaSchedulingPreferences is implemented (as in #46) to support deployments, using the same name <ns/name> as the key to target the low level FederatedDeployment resources (template, override and placement). This needs to be extended and the implementation made generic, so that the same API schema can be used to target FederatedReplicaSet as well, using a targetRef. The following two items will need to be implemented for this.
... for k8s boilerplate; as I understand it, this is a precondition of donation to a kubernetes github org.
Per-resource placement, where the corresponding placement resource specifies a list of clusters in which to place the template, is already available. It also makes a lot of sense to have a default behavior for all resources in the absence of this per-object placement specification (as alluded to in #75).
Additionally, there is a need for a higher level placement intent that a user can specify. Although more extensive placement policies could be defined, this issue lists the need for a simple label-selector-based placement intent useful to the end user.
A simple spec for this resource (names could change) could be as below:
apiVersion: placement.federation.k8s.io/v1alpha1
kind: Preference
metadata:
namespace: my-ns
name: my-placement-preference
spec:
# Select resources in the namespace whose labels match
resourceSelector: app=myapp
# Select labelled clusters
clusterSelector: region=eu, zone=east
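Under these toy semantics, a controller watching such a Preference would reconcile it into per-resource placements for matching resources. A hypothetical result for a FederatedDeployment labelled app=myapp might look like the following; the cluster names are invented and the group/version and field names are approximate:
apiVersion: federation.k8s.io/v1alpha1
kind: FederatedDeploymentPlacement
metadata:
  namespace: my-ns
  name: myapp              # same name as the targeted FederatedDeployment
spec:
  # clusters whose labels matched region=eu, zone=east
  clusterNames:
  - eu-east-1
  - eu-east-2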
The current dynamic client in use has now been deprecated. We need to update the client-go vendored dependency and update the dynamic client to use the new version. See kubernetes/kubernetes#63446 for more details.
View Pods across clusters.
/cc @marun
This should be overrides when serialized, per: https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#naming-conventions
Currently PropagatedVersion resources are named according to the scheme <lowercased kind>-<resource name> (e.g. a Secret named Foo would have a PropagatedVersion named secret-foo). This scheme effectively limits the length of resource names to 253 (the max length allowed by k8s) - len(type name) - 1 (for the dash). A resource name longer than this limit could result in an ambiguous PropagatedVersion name, which is likely to invalidate version-based comparison for affected resources.
Given that PropagatedVersion is not intended to be user-facing, a potential solution could be using a hash of the resource name (e.g. <lowercased kind>-<hash of resource name>).
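As an illustration only (the exact naming scheme is undecided, and the group/version shown is approximate), a hashed name keeps the PropagatedVersion well under the length limit regardless of how long the target resource's name is:
apiVersion: core.federation.k8s.io/v1alpha1
kind: PropagatedVersion
metadata:
  namespace: my-ns
  # hypothetical scheme: <lowercased kind>-<truncated hash of the resource name>
  name: secret-9f86d081884c7d65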
Federated ingress is a much sought after feature. There was a hacky implementation for it in federation v1 which supported only the Google environment. There is a tool (kubemci) which can set up ingress across multiple clusters, which again works only in the Google environment. Also, there was recently an excellent demo from @nikhiljindal and @G-Harmon at KubeCon EU regarding multicluster ingress, which again works in the Google environment.
We need a similar feature which supports other environments (cloud and on-prem).
If you have any ideas or suggestions regarding this feature, please pitch in.
What happened:
pre-commit.sh failed with The command "./scripts/pre-commit.sh" exited with 1. during a PR merge that didn't change any files involved with pre-commit.
Upon closer inspection, the FederatedDeploymentOverride test was the site of failure, with E0622 19:32:33.123174 7101 runtime.go:66] Observed a panic: "close of closed channel" (close of closed channel)
=== RUN TestFederatedDeploymentOverride
Running Suite: FederatedDeploymentOverride Suite
================================================
Random Seed: 1529695950
Will run 1 of 1 specs
[::]:34070
[::]:39911
[::]:36361
I0622 19:32:31.545945 7101 serving.go:295] Generated self-signed cert (/tmp/apiserver-test619575690/.crt, /tmp/apiserver-test619575690/.key)
W0622 19:32:31.822413 7101 authorization.go:34] Authorization is disabled
W0622 19:32:31.822450 7101 authentication.go:56] Authentication is disabled
I0622 19:32:32.734425 7101 serve.go:89] Serving securely on [::]:36361
2018-06-22 19:32:32.953433 I | Defaulting fields for FederatedDeploymentOverride instance-1
2018-06-22 19:32:32.953859 I | Validating fields for FederatedDeploymentOverride instance-1
2018-06-22 19:32:32.984759 I | Defaulting fields for FederatedDeploymentOverride instance-1
2018-06-22 19:32:32.996063 I | Defaulting fields for FederatedDeploymentOverride instance-1
2018-06-22 19:32:33.005890 I | Defaulting fields for FederatedDeploymentOverride instance-1
2018-06-22 19:32:33.009944 I | Running reconcile FederatedDeploymentOverride for instance-1
2018-06-22 19:32:33.102584 I | Defaulting fields for FederatedDeploymentOverride instance-1
2018-06-22 19:32:33.105992 I | Defaulting fields for FederatedDeploymentOverride instance-1
2018-06-22 19:32:33.110035 I | Defaulting fields for FederatedDeploymentOverride instance-1
2018-06-22 19:32:33.110369 I | Defaulting fields for FederatedDeploymentOverride instance-1
2018-06-22 19:32:33.110771 I | Defaulting fields for FederatedDeploymentOverride instance-1
•E0622 19:32:33.123174 7101 runtime.go:66] Observed a panic: "close of closed channel" (close of closed channel)
/home/travis/gopath/src/github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/home/travis/gopath/src/github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/home/travis/gopath/src/github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/home/travis/.gimme/versions/go1.10.1.linux.amd64/src/runtime/asm_amd64.s:573
/home/travis/.gimme/versions/go1.10.1.linux.amd64/src/runtime/panic.go:502
/home/travis/.gimme/versions/go1.10.1.linux.amd64/src/runtime/chan.go:333
/home/travis/gopath/src/github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/client-go/tools/cache/shared_informer.go:529
/home/travis/gopath/src/github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/client-go/tools/cache/shared_informer.go:388
/home/travis/gopath/src/github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/home/travis/.gimme/versions/go1.10.1.linux.amd64/src/runtime/asm_amd64.s:2361
panic: close of closed channel [recovered]
panic: close of closed channel
goroutine 1241 [running]:
github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/home/travis/gopath/src/github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x17bef60, 0x447d820)
/home/travis/.gimme/versions/go1.10.1.linux.amd64/src/runtime/panic.go:502 +0x229
github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/client-go/tools/cache.(*processorListener).pop(0xc420a98d00)
/home/travis/gopath/src/github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/client-go/tools/cache/shared_informer.go:529 +0x29c
github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/client-go/tools/cache.(*processorListener).(github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/client-go/tools/cache.pop)-fm()
/home/travis/gopath/src/github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/client-go/tools/cache/shared_informer.go:388 +0x2a
github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc4214abcc8, 0xc4222a8d70)
/home/travis/gopath/src/github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
/home/travis/gopath/src/github.com/kubernetes-sigs/federation-v2/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
FAIL github.com/kubernetes-sigs/federation-v2/pkg/controller/federateddeploymentoverride 3.184s
=== RUN TestFederatedDeploymentPlacement
Running Suite: FederatedDeploymentPlacement Suite
=================================================
Resource placement currently requires resource-specific placement. To be propagated to a given cluster, a template must have that cluster listed in its associated placement resource, and the containing namespace must also have a placement resource listing that cluster.
It would be desirable to make namespace-based placement the default. If the placement for a namespace specified a given cluster, all federated resources in that namespace would be propagated to that cluster if resource-specific placement was not present.
Where more fine-grained control is required, resource-specific placement could be employed. Where both namespace and resource-specific placement resources existed, placement could be determined either additively (namespace clusters + resource clusters) or exclusively (resource clusters only). Both namespace and resource-specific placement resources could indicate via a boolean whether propagation should be restricted (e.g. propagate: true|false).
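A hedged sketch of how the two levels could combine; the field names are assumed for illustration, not an existing API. The namespace placement provides the default cluster set, and a resource-level placement narrows it for a single resource:
apiVersion: federation.k8s.io/v1alpha1
kind: FederatedNamespacePlacement
metadata:
  name: my-ns                # cluster-scoped in the current implementation
spec:
  clusterNames:              # default clusters for everything in my-ns
  - cluster-a
  - cluster-b
---
apiVersion: federation.k8s.io/v1alpha1
kind: FederatedSecretPlacement
metadata:
  namespace: my-ns
  name: my-secret
spec:
  clusterNames:              # overrides (or, if additive, extends) the namespace default
  - cluster-a
  propagate: true            # hypothetical boolean from the proposal above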
We should have an official image that we publish for this repo; should be built from master for canary build and then tagged for releases.
At step https://github.com/kubernetes-sigs/federation-v2/blob/master/docs/userguide.md#deploy-the-cluster-registry , it is requested to create a storage-provisioner. Are there any steps to help me set this up?
Make sure the storage provisioner is ready before deploying the Cluster Registry.
kubectl -n kube-system get pod storage-provisioner
Hi there, this is not so much an issue but I don't know where else to post it. Where can I find a timeline for when v2 will reach beta stage, if there is one? Also, is there a setup guide for how to create a test cluster with v2?
k8s Jobs have already been federated using the simple federation resources (template, placement and override) for Jobs. This however does not provide higher level job scheduling logic, which could provide automated distribution of parallelism and completions for a given Job across federated clusters.
This particular use case seems quite valuable and calls for a higher level job scheduling API (aka JobSchedulingPreferences) and associated controller logic.
This also provides feature parity with federation V1.
The 0.0.5 version of the cluster-registry contains the CRD refactor and a new generated client (which should be very close to the old one, if not identical). We should revendor this version so that we're working with the current version of the cluster registry.
The controller manager currently runs all controllers. Ideally it would be possible to control whether each controller is started. Flags are a quick-n-dirty solution to this, and probably preferable if controllers are likely to live in separate repos post-alpha. A more maintainable solution might be storing configuration in a ConfigMap (or whatever the current best practice is for Kube components).
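A minimal sketch of the ConfigMap option, with key names invented here purely for illustration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: federation-controller-manager
  namespace: federation-system
data:
  # hypothetical per-controller toggles read by the controller manager at startup
  controllers: |
    federatedDeployment: enabled
    federatedSecret: enabled
    replicaSchedulingPreferences: disabled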
This makes sure we have an overview of all the nodes in different clusters.
/cc @marun any comments on this?
The cluster-registry's kubebuilder has been upgraded to 0.1.12 in kubernetes-retired/cluster-registry#244; it would be better to upgrade federation v2 to this version as well.
/cc @marun
We should have an easy way to implement DNS provider integration for multi-cluster service DNS.
Goals:
Note: names have been chosen carelessly to avoid wasting time, we can figure out best names later
TLDR: The work common to every implementation of multicluster DNS can be done by a controller that watches Kubernetes Service resources in multiple clusters and maintains a resource (for the sake of discussion, let's call it MultiClusterServiceDNS) where each instance of the resource is DNS information for a single service. Vendors implement controllers that watch MultiClusterServiceDNS and program their DNS solution.
The MultiClusterServiceDNS resource programs the DNS controller to monitor a set of services. Here's a very simple example with toy semantics to illustrate the idea.
Resources of this type are aligned with K8s services via name/namespace association:
- metadata.namespace is the namespace to monitor in each specified cluster
- metadata.name is the name of the service in the namespace to monitor in each specified cluster
The spec contains information about the clusters to monitor and how to program DNS:
- spec.dnsName is the DNS name to program
- spec.clusters is a list of clusters to monitor
These are just toy semantics for illustration purposes; actual semantics TBD.
kind: MultiClusterServiceDNS
apiVersion: dns.multicluster.k8s.io
metadata:
name: my-service
namespace: my-ns
spec:
dnsName: my-service.myorg.org
clusters:
- X
- Y
- Z
status:
dns:
- cluster: us-central-1
ip: 123.4.5.0
- cluster: us-east-1
ip: 4.5.6.7
- cluster: us-west-1
ip: 10.1.254.5
Vendors integrate with this API by writing controllers that list/watch MultiClusterServiceDNS and program their DNS solutions.
Note: in the above example, a controller maintains the list of (cluster name, IP) pairs in the status of the MultiClusterServiceDNS resource. It is likely that status is not the best place for this information. When we discussed this idea in the April 18 2018 meeting of the Federation WG, we discussed a number of possibilities:
- the MultiClusterServiceDNS resource's status would contain information on what resourceVersion the controller had seen and whether there was a problem communicating with one of the named clusters
Additionally, we discussed emulating the behavior of the v1 federation dns solution and programming additional DNS names with cluster/zone information, examples:
my-service.my-ns.myfederation.svc.myorg.org
my-service.my-ns.myfederation.svc.us-central-1.myorg.org
my-service.my-ns.myfederation.svc.us-east-1.myorg.org
my-service.my-ns.myfederation.svc.us-west-1.myorg.org
Federation v2 follows a layered approach such that the higher level resources depend on the lower level federated types for propagation and reconciliation of k8s objects into federated clusters.
To create, for example, an automated distribution of deployment replicas, a user need only create a FederatedDeployment and a ReplicaSchedulingPreference (as defined in #46). The higher level RSP resource controller will observe this and create an override (FederatedDeploymentOverride) and a placement (FederatedDeploymentPlacement) to reconcile this user intent into clusters. The RSP controller can further update the corresponding override and template to reconcile cluster capacity and/or clusters leaving or joining this federation.
A peculiar case is when the higher level resource is deleted; should the controller also delete the corresponding override and placement resources, or leave them behind? This becomes peculiar because the lower level resources (override and placement) might have existed prior to creation of the higher level resource, in which case the higher level resource adopted and updated them, if needed.
The answer probably is to use a cascading deletion mechanism and/or a parentRef (ObjectMeta.OwnerReferences) in such a scenario.
The user can specify --cascade=true/false (with some default) to control this behaviour.
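A minimal sketch of the ownerReference approach, assuming the RSP controller sets itself as the owner when it creates (or adopts) the lower level resources; the names and the RSP group/version shown are illustrative only:
apiVersion: federation.k8s.io/v1alpha1
kind: FederatedDeploymentOverride
metadata:
  namespace: my-ns
  name: my-app
  ownerReferences:
  - apiVersion: federatedscheduling.k8s.io/v1alpha1   # illustrative group/version
    kind: ReplicaSchedulingPreference
    name: my-app
    uid: 1f1a2b3c-0000-0000-0000-000000000000          # uid of the owning RSP
    controller: true
    blockOwnerDeletion: true
spec:
  overrides: []            # per-cluster override entries omitted for brevity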
Currently namespace placement is cluster-scoped, but this complicates the application of fine-grained authz. If only the 'tenant' for a given namespace should be allowed to affect its placement, then an explicit rbac rule for just that placement resource would be required. Changing the namespace placement type to be namespaced would allow for namespace-based authz instead.
Created after discussion in the Jun 13 2018 WG meeting. This issue is to explore ways to allow people to deploy workloads expressed in 'normal' k8s resources to federation with minimal or no modification. There's a very wide spectrum of things that we could do here, and there may be more than one thing that makes sense to do to provide different types of facilitation (i.e., from transforming YAML resources, to direct integration in tools like helm, to any number of other things).
The constants used here are (or can be used) in multiple controllers. It makes sense to move them, and any such similar code to a common location, to reduce duplication.
A goal in transitioning to CRDs (depends on #74) is to maintain the same quality of validation for the resources embedded in templates as is currently provided by first-class API types. Potential strategies:
- For a given template type (e.g. FederatedSecret), find the FederatedTypeConfig that specifies the template type.
- From the FederatedTypeConfig, retrieve the OpenAPI spec.
- Use the OpenAPI spec to validate embedded resources, e.g. imagePullPolicy being either Always, Never, or IfNotPresent. This may require updating the OpenAPI generation utility.
It should be possible to extend the basic mechanism of federation of a new type via a set of CRDs:
It would be cool if kubefnord would generate CRD templates for you.
Currently, scripts/deploy_federation.sh runs apiserver-boot run in-cluster to deploy the federation bits into a cluster. We should instead have a set of resources (statically defined is fine at first; we might consider a helm chart or other form of packaging in the future) that can be deployed without requiring the user to have a dependency installed.
This likely depends on #82 to be usable.
Going through the instructions at https://github.com/font/k8s-example-apps/tree/master/sample-fed-v2-demo, particularly step https://github.com/font/k8s-example-apps/tree/master/sample-fed-v2-demo#update-federatednamespaceplacement, the test-namespace is successfully removed from us-east after updating the namespace placement to remove us-east from the list of clusters.
But when the cluster us-east is added back in, the test-namespace does not get recreated in the us-east cluster, causing all the other resources to not get added back, due to errors such as:
E0426 19:44:46.756125 1 controller.go:617] Failed to execute updates for FederatedReplicaSet "test-namespace/test-replicaset": [Failed to create FederatedReplicaSet "test-namespace/test-replicaset" in cluster us-east: namespaces "test-namespace" not found]
Waiting long enough will eventually see the test-namespace created, possibly because the watch operation expires and the controller retries a list operation.
If you restart the federation pod, the test-namespace will quickly reappear in the us-east cluster after the controller receives the list of namespaces and namespace placements. The rest of the resources (configmap, secret, replicaset) then reappear as well once the test-namespace exists.
Follow-on to #101; this is to add (where it makes sense) additional configuration parameters beyond basic feature gating. For example: the push reconciler is enabled/disabled by feature gate, but the resync duration for the push reconciler would be configured by a distinct component configuration API field mapped to a CLI flag.
There are two flavors of policy that we’ve discussed in the Federation WG:
This issue deals with flavor (1).
We could have a very generic validation webhook that could apply label-based policy decisions to any resource. As per #34, you should be able to apply policy to new types without doing a lot of work.
kubefed can join clusters but not unjoin them, complicating roundtripping of federation deployment.
root@gyliu-dev1:~/go/src/github.com/kubernetes-sigs/federation-v2# ./bin/kubefnord join cluster138.k8s.local --host-cluster-context cluster138.k8s.local --add-to-registry --v=2
ERROR: logging before flag.Parse: I0705 06:17:36.583065 3075 join.go:125] Defaulting cluster context to joining cluster name cluster138.k8s.local
ERROR: logging before flag.Parse: I0705 06:17:36.583150 3075 join.go:129] Args and flags: name cluster138.k8s.local, host: cluster138.k8s.local, host-system-namespace: federation-system, kubeconfig: , cluster-context: cluster138.k8s.local, secret-name: , dry-run: false
ERROR: logging before flag.Parse: I0705 06:17:36.589103 3075 join.go:183] Performing preflight checks.
ERROR: logging before flag.Parse: I0705 06:17:36.611982 3075 join.go:270] Registering cluster with the cluster registry.
ERROR: logging before flag.Parse: I0705 06:17:36.621322 3075 join.go:278] Registered cluster with the cluster registry.
ERROR: logging before flag.Parse: I0705 06:17:36.621351 3075 join.go:204] Creating federation-system namespace in joining cluster
ERROR: logging before flag.Parse: I0705 06:17:36.636999 3075 join.go:212] Created federation-system namespace in joining cluster
ERROR: logging before flag.Parse: I0705 06:17:36.637048 3075 join.go:215] Creating cluster credentials secret
ERROR: logging before flag.Parse: I0705 06:17:36.637059 3075 join.go:404] Creating service account in joining cluster
ERROR: logging before flag.Parse: I0705 06:17:36.652299 3075 join.go:413] Created service account in joining cluster
ERROR: logging before flag.Parse: I0705 06:17:36.652333 3075 join.go:415] Creating role binding for service account in joining cluster
ERROR: logging before flag.Parse: I0705 06:17:36.676963 3075 join.go:525] Could not create cluster role binding for service account in joining cluster: ClusterRoleBinding.rbac.authorization.k8s.io "federation-controller-manager:cluster138.k8s.local-cluster138.k8s.local" is invalid: subjects[0].namespace: Required value
ERROR: logging before flag.Parse: I0705 06:17:36.677000 3075 join.go:420] Error creating role binding for service account in joining cluster: ClusterRoleBinding.rbac.authorization.k8s.io "federation-controller-manager:cluster138.k8s.local-cluster138.k8s.local" is invalid: subjects[0].namespace: Required value
ERROR: logging before flag.Parse: I0705 06:17:36.677011 3075 join.go:221] Could not create cluster credentials secret: ClusterRoleBinding.rbac.authorization.k8s.io "federation-controller-manager:cluster138.k8s.local-cluster138.k8s.local" is invalid: subjects[0].namespace: Required value
ERROR: logging before flag.Parse: F0705 06:17:36.677025 3075 join.go:105] error: ClusterRoleBinding.rbac.authorization.k8s.io "federation-controller-manager:cluster138.k8s.local-cluster138.k8s.local" is invalid: subjects[0].namespace: Required value
Deployments, like other resources, can be federated using the template, placement and override resources. The capability of distributing a non-equal number of replicas per deployment per cluster as a user intent can be specified using overrides. However, a higher level automated control loop which watches cluster and pod health per deployment per cluster, and distributes (and further redistributes if need be) the replicas, has its own uses and benefits.
An API spec for a resource where a user can specify their intent, in terms of capping the replicas (min/max) per cluster and weights per cluster, is certainly useful.
This resource also provides functional parity with the federation v1 deployment scheduling type.
The spec for this resource is already specified in 4c96be3.
@marun @font @onyiny-ang
@pmorie FYI
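An illustrative sketch of such a preference; the group/version and field names here are approximate, based on the general shape discussed in #46, not a final schema:
apiVersion: federatedscheduling.k8s.io/v1alpha1   # illustrative group/version
kind: ReplicaSchedulingPreference
metadata:
  namespace: my-ns
  name: my-deployment        # matches the FederatedDeployment to schedule
spec:
  totalReplicas: 9
  clusters:
    cluster-a:
      minReplicas: 2
      maxReplicas: 6
      weight: 2              # relative share of replicas for this cluster
    cluster-b:
      minReplicas: 1
      weight: 1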
While I was trying to add Completions to the FederatedJobClusterOverride type, I realized that with today's code it's impossible to override an immutable field.
I see this as a gap. My understanding is that an overridden resource should be created directly with the override applied, while it looks like resources are created and then updated.
Modified code is in this branch: https://github.com/sdminonne/federation-v2/tree/bug_fix
I'm investigating this issue, but it would be great if you could comment.
Thanks
Related to #34; assuming that we have a CRD-based way to extend federation by creating some new CRD resources (template, placement and override types) for a given target type, then there should be an API resource to have these CRDs created automatically. Example:
kind: FederatedAPI
metadata:
name: statefulset
spec:
basis:
group: apps
version: v1
kind: StatefulSet
target:
group: federation.k8s.io
version: v1alpha1
A controller would list/watch this API resource and ensure that the correct CRDs were created for the named basis in the API group/version described by target. In this example those would be:
- FederatedStatefulSet (possibly creating the OpenAPI schema and validations based on the OpenAPI for the basis type)
- FederatedStatefulSetOverride
- FederatedStatefulSetPlacement
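To make the mechanism concrete, here is a sketch of one CRD the controller could generate under these assumptions (only the primary federated type is shown; the override and placement CRDs would follow the same pattern):
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: federatedstatefulsets.federation.k8s.io
spec:
  group: federation.k8s.io          # from the FederatedAPI target
  version: v1alpha1
  scope: Namespaced
  names:
    kind: FederatedStatefulSet      # derived from the basis kind
    plural: federatedstatefulsets
    singular: federatedstatefulset
  # validation could be generated from the OpenAPI schema of the apps/v1 StatefulSet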
In order to have automatic federation of all the registered types in an API server, there could also be a controller that examines the discovery endpoint for a cluster and ensures a FederatedAPI exists for the right API services. There are a number of ways such a controller might be configured, e.g. by watching APIService and CustomResourceDefinition resources.
Idea formulated with @marun
Work Items
Seems like we can do a lot better than fnord w.r.t. understandability
https://en.wikipedia.org/wiki/Fnord
kubefed ?
fedctl ?
Some higher level resources need e2e tests for actual end to end functionality validation. Below is the list of current such resources which certainly need additional e2e tests:
The implementation of these tests is dependent on #89 (or some other strategy, for example self validation external to CI using managed clusters or BYOC).
How do I trigger a CI build if one fails due to a temporary issue in the network? Here is one such failure, and I am unaware of the method to re-trigger the build. This could be a frequent issue for a developer.
/cc @kubernetes-sigs/federation-wg
Federation v2 is currently based on apiserver-builder, which automates the creation of API types and deployment of an aggregated API server to expose created types. An aggregated API server imposes considerable operational cost and complexity, and it is desirable to remove this requirement in preparation for an alpha release. kubebuilder provides a CRD-based alternative to apiserver-builder to transition to. The following strategy is suggested:
- Convert the non-template federation types (FederatedCluster, PropagatedVersion, FederatedTypeConfig) to CRDs
A goal in transitioning to CRDs is to maintain the same quality of validation for the resources embedded in templates as is currently provided by first-class API types. The following strategy is suggested:
- For a given template type (e.g. FederatedSecret), find the FederatedTypeConfig that specifies the template type
- From the FederatedTypeConfig, retrieve the OpenAPI spec
Currently, the setup for integration and e2e tests is quite similar, in that only the k8s API server is run to emulate an actual cluster. This is probably suitable for the majority of cases, but results in limited testing scope, especially where feedback from actual resource runtime is needed for functionality (e.g. higher level scheduling types, the service DNS type, ...). Also, at some point the overall CI will need to involve some infrastructure that provides actual clusters for e2e validation.
This point was already discussed in the federation workgroup sync and captured in notes here.
This issue is to have a trackable location for the same.
It's easy to federate a given k8s resource with the current federation v2 infrastructure, and federated HPA is no exception. Default propagation of HPAs with possible overrides per cluster (as with using template, overrides and placements) has some, but limited, usage. A higher level HPA scheduling API and controller with extended logic makes for a feature set with more usage and value.
A suitable resource name for this could be HPASchedulingPreferences (similar to ReplicaSchedulingPreferences) under the federatedscheduling group.
This also provides feature parity with federation V1.
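A purely illustrative sketch of what such a preference could express; the kind, group/version and all field names are hypothetical:
apiVersion: federatedscheduling.k8s.io/v1alpha1   # hypothetical group/version
kind: HPASchedulingPreferences                     # hypothetical kind
metadata:
  namespace: my-ns
  name: my-app
spec:
  # total min/max replicas to distribute across the federated clusters
  totalMinReplicas: 4
  totalMaxReplicas: 20
  clusters:
    cluster-a:
      weight: 2            # gets a larger share of the replica budget
    cluster-b:
      weight: 1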
Integration tests seem to be flaky and randomly failing. I have noticed this issue quite a few times. Here is one such example, where the code in the PR has no changes to integration tests: https://travis-ci.org/kubernetes-sigs/federation-v2/builds/381710195
--- FAIL: TestCrud (64.61s)
logger.go:52: Starting a federation of 2 clusters...
logger.go:52: Added cluster test-cluster-j9hwz to the federation
logger.go:52: Added cluster test-cluster-8wp7f to the federation
logger.go:52: Starting cluster controller
logger.go:48: Federation started.
--- FAIL: TestCrud/FederatedConfigMap (10.53s)
logger.go:52: Creating new FederatedConfigMap in namespace "1ac64e3e-300a-4e9e-b7da-0f528cad036f"
logger.go:52: Created new FederatedConfigMap "1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l"
logger.go:52: Creating new FederatedConfigMapPlacement in namespace "1ac64e3e-300a-4e9e-b7da-0f528cad036f"
logger.go:52: Created new FederatedConfigMapPlacement "1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l"
logger.go:52: Creating new FederatedConfigMapOverride in namespace "1ac64e3e-300a-4e9e-b7da-0f528cad036f"
logger.go:52: Created new FederatedConfigMapOverride "1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l"
logger.go:48: Resource versions for 1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l: template "79", placement "81", override "83"
logger.go:52: Waiting for ConfigMap "1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l" in cluster "test-cluster-8wp7f"
logger.go:52: Waiting for ConfigMap "1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l" in cluster "test-cluster-j9hwz"
logger.go:52: Updating FederatedConfigMap "1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l"
logger.go:52: Waiting for ConfigMap "1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l" in cluster "test-cluster-j9hwz"
logger.go:52: Waiting for ConfigMap "1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l" in cluster "test-cluster-8wp7f"
logger.go:52: Updating FederatedConfigMapPlacement "1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l"
logger.go:52: Waiting for ConfigMap "1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l" in cluster "test-cluster-j9hwz"
logger.go:52: Waiting for ConfigMap "1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l" to be deleted from cluster "test-cluster-8wp7f"
logger.go:52: Deleting FederatedConfigMap "1ac64e3e-300a-4e9e-b7da-0f528cad036f/test-crud-p762l"
logger.go:44: Expecting PropagatedVersion configmap-test-crud-p762l to be deleted
--- PASS: TestCrud/FederatedDeployment (3.06s)
logger.go:52: Creating new FederatedDeployment in namespace "cf155248-e7ec-484b-a294-1fcbd370315c"
logger.go:52: Created new FederatedDeployment "cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4"
logger.go:52: Creating new FederatedDeploymentPlacement in namespace "cf155248-e7ec-484b-a294-1fcbd370315c"
logger.go:52: Created new FederatedDeploymentPlacement "cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4"
logger.go:52: Creating new FederatedDeploymentOverride in namespace "cf155248-e7ec-484b-a294-1fcbd370315c"
logger.go:52: Created new FederatedDeploymentOverride "cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4"
logger.go:48: Resource versions for cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4: template "106", placement "108", override "109"
logger.go:52: Waiting for Deployment "cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4" in cluster "test-cluster-j9hwz"
logger.go:52: Waiting for Deployment "cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4" in cluster "test-cluster-8wp7f"
logger.go:52: Updating FederatedDeployment "cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4"
logger.go:52: Waiting for Deployment "cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4" in cluster "test-cluster-j9hwz"
logger.go:52: Waiting for Deployment "cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4" in cluster "test-cluster-8wp7f"
logger.go:52: Updating FederatedDeploymentPlacement "cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4"
logger.go:52: Waiting for Deployment "cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4" in cluster "test-cluster-j9hwz"
logger.go:52: Waiting for Deployment "cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4" to be deleted from cluster "test-cluster-8wp7f"
logger.go:52: Deleting FederatedDeployment "cf155248-e7ec-484b-a294-1fcbd370315c/test-crud-r5kn4"
--- PASS: TestCrud/FederatedJob (2.88s)
logger.go:52: Creating new FederatedJob in namespace "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a"
logger.go:52: Created new FederatedJob "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g"
logger.go:52: Creating new FederatedJobPlacement in namespace "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a"
logger.go:52: Created new FederatedJobPlacement "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g"
logger.go:52: Creating new FederatedJobOverride in namespace "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a"
logger.go:52: Created new FederatedJobOverride "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g"
logger.go:48: Resource versions for a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g: template "142", placement "144", override "145"
logger.go:52: Waiting for Job "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g" in cluster "test-cluster-j9hwz"
logger.go:52: Waiting for Job "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g" in cluster "test-cluster-8wp7f"
logger.go:52: Updating FederatedJob "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g"
logger.go:52: Waiting for Job "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g" in cluster "test-cluster-8wp7f"
logger.go:52: Waiting for Job "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g" in cluster "test-cluster-j9hwz"
logger.go:52: Updating FederatedJobPlacement "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g"
logger.go:52: Waiting for Job "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g" in cluster "test-cluster-j9hwz"
logger.go:52: Waiting for Job "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g" to be deleted from cluster "test-cluster-8wp7f"
logger.go:52: Deleting FederatedJob "a83a87f5-f4f0-4f1e-90e9-35e2b0c79e2a/test-crud-fmw4g"
--- FAIL: TestCrud/FederatedReplicaSet (4.03s)
logger.go:52: Creating new FederatedReplicaSet in namespace "cbb25e54-72ec-4719-8858-a9d5f50cbb0f"
logger.go:52: Created new FederatedReplicaSet "cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24"
logger.go:52: Creating new FederatedReplicaSetPlacement in namespace "cbb25e54-72ec-4719-8858-a9d5f50cbb0f"
logger.go:52: Created new FederatedReplicaSetPlacement "cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24"
logger.go:52: Creating new FederatedReplicaSetOverride in namespace "cbb25e54-72ec-4719-8858-a9d5f50cbb0f"
logger.go:52: Created new FederatedReplicaSetOverride "cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24"
logger.go:48: Resource versions for cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24: template "170", placement "172", override "174"
logger.go:52: Waiting for ReplicaSet "cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24" in cluster "test-cluster-j9hwz"
logger.go:52: Waiting for ReplicaSet "cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24" in cluster "test-cluster-8wp7f"
logger.go:52: Updating FederatedReplicaSet "cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24"
logger.go:52: Waiting for ReplicaSet "cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24" in cluster "test-cluster-j9hwz"
logger.go:52: Waiting for ReplicaSet "cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24" in cluster "test-cluster-8wp7f"
logger.go:52: Updating FederatedReplicaSetPlacement "cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24"
logger.go:52: Waiting for ReplicaSet "cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24" to be deleted from cluster "test-cluster-8wp7f"
logger.go:52: Waiting for ReplicaSet "cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24" in cluster "test-cluster-j9hwz"
logger.go:52: Deleting FederatedReplicaSet "cbb25e54-72ec-4719-8858-a9d5f50cbb0f/test-crud-9wq24"
logger.go:44: Expecting PropagatedVersion replicaset-test-crud-9wq24 to be deleted
If a cluster join fails, some garbage data will be left behind, such as the serviceAccount, the cluster registration, etc.; we should make join delete those garbage resources when the join fails.
There should be a relatively simple walkthrough that guides users through the experience of kicking the tires on federation-v2 by deploying a very simple application. Other attributes of this:
In 0.0.5, the cluster-registry becomes namespaced; currently, our FederatedCluster resource is cluster-scoped. We should align FederatedCluster with Cluster by making it namespaced. As part of this, we should decide how federation should use the 'globally readable' namespace in the cluster-registry.
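For illustration, a namespaced FederatedCluster aligned with the namespaced cluster-registry Cluster could look roughly like this; the group/version and namespace choice are illustrative, and the spec is abbreviated:
apiVersion: core.federation.k8s.io/v1alpha1   # illustrative group/version
kind: FederatedCluster
metadata:
  namespace: federation-system   # or the cluster-registry's 'globally readable' namespace
  name: cluster-a
spec:
  clusterRef:
    name: cluster-a              # the namespaced cluster-registry Cluster
  secretRef:
    name: cluster-a-credentials  # credentials secret in the same namespace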
In step https://github.com/kubernetes-sigs/federation-v2/blob/master/docs/userguide.md#automated-deployment ,
If you would like to have the deployment of the cluster registry and federation-v2 control plane automated, then invoke the deployment script by running:
./scripts/deploy-federation.sh <image> cluster2
Where <image> is in the form <registry>/<username>/<image name>:<tag>, e.g. docker.io/<username>/federation-v2:test.
Where can I get the federation-v2:test image?
Are there any steps for me to build this image?