gardener / gardener

Kubernetes-native system managing the full lifecycle of conformant Kubernetes clusters as a service on Alicloud, AWS, Azure, GCP, OpenStack, vSphere, KubeVirt, Hetzner, EquinixMetal, MetalStack, and OnMetal with minimal TCO.

Home Page: https://gardener.cloud

License: Apache License 2.0

Languages: Go 98.22%, Shell 1.15%, Smarty 0.27%, Makefile 0.23%, Mustache 0.07%, Python 0.04%, Dockerfile 0.01%
Topics: kubernetes, gardener, golang, aws, azure, gcp, openstack, cluster, alicloud, metalstack

gardener's Introduction

Gardener Logo

Badges: REUSE status, CI Build status, Slack channel #gardener, Go Report Card, GoDoc, CII Best Practices

Gardener implements the automated management and operation of Kubernetes clusters as a service and provides a fully validated extensibility framework that can be adjusted to any programmatic cloud or infrastructure provider.

Gardener is 100% Kubernetes-native and exposes its own Cluster API to create homogeneous clusters on all supported infrastructures. This API differs from SIG Cluster Lifecycle's Cluster API, which only harmonizes how to get to clusters, while Gardener's Cluster API goes one step further and also harmonizes the make-up of the clusters themselves. That means Gardener gives you homogeneous clusters with exactly the same bill of materials, configuration, and behavior on all supported infrastructures, which you can see further down below in the section on our K8s Conformance Test Coverage.

In 2020, SIG Cluster Lifecycle's Cluster API made a huge step forward with v1alpha3 and the newly added support for declarative control plane management. This made it possible to integrate managed services like GKE or Gardener. If the community is interested, we would be more than happy to contribute a Gardener control plane provider. For more information on the relation between the Gardener API and SIG Cluster Lifecycle's Cluster API, please see here.

Gardener's main principle is to leverage Kubernetes concepts for all of its tasks.

In essence, Gardener is an extension API server that comes along with a bundle of custom controllers. It introduces new API objects in an existing Kubernetes cluster (which is called garden cluster) in order to use them for the management of end-user Kubernetes clusters (which are called shoot clusters). These shoot clusters are described via declarative cluster specifications which are observed by the controllers. They will bring up the clusters, reconcile their state, perform automated updates and make sure they are always up and running.
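
As a small illustration of this model, the Go sketch below lists the Shoot objects registered in a garden cluster using client-go's dynamic client; the kubeconfig path, the project namespace, and the exact group/version/resource of the Shoot API are assumptions, not taken from this README:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client config for the garden cluster (kubeconfig path is a placeholder).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/garden-kubeconfig")
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Shoots are ordinary API objects in the garden cluster; this GVR is assumed.
	shootGVR := schema.GroupVersionResource{Group: "core.gardener.cloud", Version: "v1beta1", Resource: "shoots"}

	// Project namespace name is a placeholder.
	list, err := dyn.Resource(shootGVR).Namespace("garden-my-project").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, shoot := range list.Items {
		fmt.Printf("shoot %s/%s\n", shoot.GetNamespace(), shoot.GetName())
	}
}
```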

To accomplish these tasks reliably and to offer a high quality of service, Gardener controls the main components of a Kubernetes cluster (etcd, API server, controller manager, scheduler). These so-called control plane components are hosted in Kubernetes clusters themselves (which are called seed clusters). This is the main difference compared to many other OSS cluster provisioning tools: the shoot clusters do not have dedicated master VMs. Instead, the control plane is deployed as a native Kubernetes workload into the seeds (an architecture commonly referred to as kubeception or inception design). This not only effectively reduces the total cost of ownership but also allows easier implementations of "day-2 operations" (like cluster updates or robustness) by relying on all the mature Kubernetes features and capabilities.

Gardener reuses the identical Kubernetes design to span a scalable multi-cloud and multi-cluster landscape. Such familiarity with known concepts has proven to quickly ease the initial learning curve and accelerate developer productivity:

  • Kubernetes API Server = Gardener API Server
  • Kubernetes Controller Manager = Gardener Controller Manager
  • Kubernetes Scheduler = Gardener Scheduler
  • Kubelet = Gardenlet
  • Node = Seed cluster
  • Pod = Shoot cluster

Please find more information regarding the concepts and a detailed description of the architecture in our Gardener Wiki and our blog posts on kubernetes.io: Gardener - the Kubernetes Botanist (17.5.2018) and Gardener Project Update (2.12.2019).


K8s Conformance Test Coverage

Gardener takes part in the Certified Kubernetes Conformance Program to attest its compatibility with the K8s conformance test suite. Currently, Gardener is certified for K8s versions up to v1.29; see the conformance spreadsheet.

Continuous conformance test results of the latest stable Gardener release are uploaded regularly to the CNCF test grid:

Provider/K8s    v1.29   v1.28   v1.27   v1.26   v1.25
AWS             ✓       ✓       ✓       ✓       ✓
Azure           ✓       ✓       ✓       ✓       ✓
GCP             ✓       ✓       ✓       ✓       ✓
OpenStack       ✓       ✓       ✓       ✓       ✓
Alicloud        ✓       ✓       ✓       ✓       ✓
Equinix Metal   N/A     N/A     N/A     N/A     N/A
vSphere         N/A     N/A     N/A     N/A     N/A

(✓ = conformance test results for that provider/version combination are reported to TestGrid)

Get an overview of the test results at testgrid.

Start using or developing the Gardener locally

See our documentation in the /docs repository; please find the index here.

Setting up your own Gardener landscape in the Cloud

The quickest way to test drive Gardener is to install it virtually onto an existing Kubernetes cluster, just like you would install any other Kubernetes-ready application. You can do this with our Gardener Helm Chart.

Alternatively you can use our garden setup project to create a fully configured Gardener landscape which also includes our Gardener Dashboard.

Feedback and Support

Feedback and contributions are always welcome!

All channels for getting in touch or learning about our project are listed under the community section. We are cordially inviting interested parties to join our bi-weekly meetings.

Please report bugs or suggestions about our Kubernetes clusters as such or the Gardener itself as GitHub issues or join our Slack channel #gardener (please invite yourself to the Kubernetes workspace here).

Learn More!

Please find further resources about our project here:


gardener's Issues

Make Backup Status Available

Issue by vlerenc
Tuesday Oct 24, 2017 at 00:35 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/161


Right now, neither the user nor the operator has access to the status (success or failure) of the backups, their full list with date/time (when taken, how long it took), or their size. As a first step towards on-demand restore, this would be helpful.

It is not clear whether this should be handled within the operator (as additional status information) or put into additional CRD "backup" resources that are then read from the seed cluster/shoot namespace. Other implementation proposals are welcome.

Project Member RoleBindings in Seed Clusters

Story

  • As operator I want access to the shoot cluster namespace within the seed cluster, so that I can do my ops work.

Motivation

In order to run operations for all clusters within a project, the future operators (we for now) need access to the shoot cluster namespace within the seed cluster for everything in it. At present, access is only possible via the admin credentials.

Acceptance Criteria

  • Project members have access to the shoot cluster namespace within the seed cluster

Implementation Proposal

Put the seed cluster under RBAC and grant all project members of a cluster appropriate role bindings in the corresponding seed cluster and the shoot cluster namespace therein.
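
A minimal sketch of such a binding with client-go, assuming a pre-existing (Cluster)Role and a known shoot namespace in the seed; all names here are placeholders, not Gardener's actual role names:

```go
package seedrbac

import (
	"context"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// grantProjectMember binds a project member to an assumed, pre-existing ClusterRole
// inside the shoot's control plane namespace in the seed cluster.
func grantProjectMember(ctx context.Context, seedClient kubernetes.Interface, shootNamespace, user string) error {
	rb := &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "project-member-ops", // placeholder name
			Namespace: shootNamespace,       // e.g. "shoot--my-project--my-shoot" (placeholder)
		},
		Subjects: []rbacv1.Subject{{
			Kind:     rbacv1.UserKind,
			APIGroup: rbacv1.GroupName,
			Name:     user,
		}},
		RoleRef: rbacv1.RoleRef{
			APIGroup: rbacv1.GroupName,
			Kind:     "ClusterRole",
			Name:     "gardener-project-member", // assumed role carrying the required verbs
		},
	}
	_, err := seedClient.RbacV1().RoleBindings(shootNamespace).Create(ctx, rb, metav1.CreateOptions{})
	return err
}
```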

Open Questions

How do we handle changes to the project member list? Is the Gardener really supposed to watch those in the garden cluster and duplicate them across all seed clusters and shoot cluster namespaces?

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

Enable Local Storage

Story

  • As user I want local storage, so that I get optimal performance for workload that depends on it/can cope with it.

Motivation

Network-attached storage only carries you so far; some applications require local storage (e.g. Hadoop). Such software usually copes well with/tolerates hardware failures, as it must expect them anyway.

Acceptance Criteria

  • Local storage (under quota)
  • When pods die, the scheduler tries to bring them up on the same node or deletes the data otherwise

Resources

See https://github.com/vishh/community/blob/ba62a3f6cb9a301e95c4b64b9052455bdac9a3fe/contributors/design-proposals/local-storage-overview.md and community progress tracker kubernetes/enhancements#245.
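
For reference, the upstream feature that eventually landed exposes local disks as PersistentVolumes pinned to one node; a hedged Go sketch (node name, path, capacity, and storage class are placeholders):

```go
package localstorage

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// localPV describes a PersistentVolume backed by a local disk on a specific node.
// Consuming pods can then only be placed onto that node.
func localPV(nodeName, path string) *corev1.PersistentVolume {
	return &corev1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{Name: "local-pv-" + nodeName},
		Spec: corev1.PersistentVolumeSpec{
			Capacity: corev1.ResourceList{
				corev1.ResourceStorage: resource.MustParse("100Gi"), // placeholder size
			},
			AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			StorageClassName: "local-storage", // placeholder class
			PersistentVolumeSource: corev1.PersistentVolumeSource{
				Local: &corev1.LocalVolumeSource{Path: path},
			},
			// Pin the volume to the node that owns the disk.
			NodeAffinity: &corev1.VolumeNodeAffinity{
				Required: &corev1.NodeSelector{
					NodeSelectorTerms: []corev1.NodeSelectorTerm{{
						MatchExpressions: []corev1.NodeSelectorRequirement{{
							Key:      "kubernetes.io/hostname",
							Operator: corev1.NodeSelectorOpIn,
							Values:   []string{nodeName},
						}},
					}},
				},
			},
		},
	}
}
```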

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?

Multi-Tenancy

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:45 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/46


Stories

  • As user I don't want others to see or interact with (get logs, kill pod) containers deployed by me, so that nobody can spy on me.
  • As user I don't want to accidentally break containers deployed by others, so that I don't harm anybody.
  • As operator I want to utilise the hardware as well as possible, so that I can offer a competitive price.
  • As operator I want to manage as few clusters as possible, so that I can do a better job on those clusters.

Motivation

In order not to have to deploy too many individual clusters, we ought to be able to let multiple teams (later maybe even customers) work concurrently on one cluster.

Acceptance Criteria

  • No non-admin user can deploy privileged containers
  • Users from one namespace can't see or interact with (get logs, kill pod) containers deployed into other namespaces
  • Containers from one namespace can't access other containers in any direct way

Implementation Proposal

RBAC, security and network policies (defined per namespace) are the concepts that need to be explored at the time of this writing.
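
One of these building blocks, sketched in Go: a per-namespace NetworkPolicy that only admits ingress from pods of the same namespace (an assumption of how the isolation could be expressed, not an existing Gardener policy):

```go
package multitenancy

import (
	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// denyFromOtherNamespaces selects all pods in the given namespace and only allows
// ingress from pods of that same namespace (an empty pod selector in "from" with no
// namespace selector means "same namespace").
func denyFromOtherNamespaces(namespace string) *networkingv1.NetworkPolicy {
	return &networkingv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "deny-from-other-namespaces",
			Namespace: namespace,
		},
		Spec: networkingv1.NetworkPolicySpec{
			PodSelector: metav1.LabelSelector{}, // empty selector = all pods in this namespace
			PolicyTypes: []networkingv1.PolicyType{networkingv1.PolicyTypeIngress},
			Ingress: []networkingv1.NetworkPolicyIngressRule{{
				From: []networkingv1.NetworkPolicyPeer{{
					PodSelector: &metav1.LabelSelector{}, // all pods, but only from the same namespace
				}},
			}},
		},
	}
}
```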

ℹ️ Migrated from Jira issue KUBE-23

Garden, Seed & Shoot Cluster Disaster Recovery

Story

  • As provider I need a plan for disaster recovery, so that I can resurrect lost infrastructure and restore my service.

Acceptance Criteria

Please find out what we need to do when we completely lose a shoot or even seed or soil cluster:

  • Total loss and recreation of only the shoot cluster worker environment: What do we need to do to reconstruct the worker environment if we lose it from the VPC and instance profiles down to the VMs (most likely a no-brainer, just rerun the provisioning)
  • Total loss and recreation of the shoot cluster control plane: What do we need to do to reconstruct the shoot cluster control plane (assuming we lost everything from the namespace down to the etcd persistence, but have a backup of the latter)
  • Total loss and move to another soil or seed cluster: What do we need to do to move to/reconstruct the soil or seed cluster (assuming we lost everything from the cluster itself down to the etcd persistence, but have a backup of the latter)
  • Total loss and move to another garden cluster: What do we need to do to move to/reconstruct the garden cluster (assuming we lost everything from the cluster itself down to the etcd persistence, but have a backup of the latter)

Please manually execute the reconstruction of the above scenarios, validate the reconstructed clusters behave properly, and document the procedure in this ticket.

Implementation Proposal

The idea is to learn how to move a control plane from one soil/seed cluster to another.

Resources

Heptio Ark (disaster recovery) may be interesting to check out.

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?

Spot / Low-Priority / Preemptible VM Support

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:45 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/49


Story

  • As operator I want to use AWS Spot, Azure Low-Priority or GCP Preemptible VM instances, so that my landscape runs at lower costs.

Motivation

Money, sure, but also some form of chaos monkey that should help train the application developers that all resources will eventually fail.

Acceptance Criteria

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

ℹ️ Migrated from Jira issue KUBE-95

Resilience & Robustness Hardening Concept

Check out chaos tools like (ask Juergen Schneider for input):

If they are suitable, how can we use #31 and one or more of the tools above for resilience & robustness hardening, and how can we integrate this into integration testing (for example, enabled on staging to introduce random failures)? Please provide a concept paper for resilience & robustness hardening.

Investigate Multi-Tenancy

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:45 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/47


See KUBE-23 and find out what we truly need in order to implement that epic. Key issues are security (RBAC, security policies, exploits) and isolation (disk quota, network policies, fair share of resources), but also other things that become necessary in a shared environment (we know of metering, but there are certainly more topics that will require work). Please report results here.

Hints:

ℹ️ Migrated from Jira issue KUBE-41

Backup & Restore

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:43 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/7


Stories

  • As operator I want to restore a broken cluster, so that I can ensure business continuity.

Motivation

We must ensure that if disaster strikes (human error or hardware failure alike), we can recreate the entire cluster.

Acceptance Criteria

  • Delete the etcd volumes, then recreate them from a backup
  • Delete the entire cluster, then recreate it from a backup

ℹ️ Migrated from Jira issue KUBE-17

Add Prometheus Monitoring

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:43 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/8


Stories

  • As operator I want internal monitoring, so that I can check on my cluster(s).
  • As operator I want alerts on issues available in internal monitoring, so that I don't have to check on my cluster(s) manually all the time.

Motivation

We need means to monitor our own cluster(s).

Acceptance Criteria

  • All key metrics are exposed to admins in a remotely accessible UI
    • Number of nodes, pods, containers, services (running, failing)
    • Corresponding infrastructure components
    • Cluster load, free memory, free disk space
    • Key data is put onto a dashboard
  • Historic data is kept for the past two weeks
  • When we reach certain thresholds, alerts are sent to pre-configured e-mail address(es)
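
A minimal Go sketch of how such key metrics could be exposed for Prometheus to scrape, using prometheus/client_golang; the metric names and values are made up for illustration:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical gauges for some of the key metrics listed above.
var (
	nodesReady = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "garden_cluster_nodes_ready",
		Help: "Number of nodes in Ready condition.",
	})
	podsFailing = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "garden_cluster_pods_failing",
		Help: "Number of pods in a failed state.",
	})
)

func main() {
	prometheus.MustRegister(nodesReady, podsFailing)

	// In a real exporter these values would be derived from the Kubernetes API.
	nodesReady.Set(3)
	podsFailing.Set(0)

	// Prometheus scrapes this endpoint; dashboards and alerts build on the stored series.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```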

ℹ️ Migrated from Jira issue KUBE-21

Workflow Jobs Testing Capacity Shortage on Seed Clusters

We had to terminate the cluster autoscaler on the seeds because:

  • An older Calico version is deployed there, and we know that it results in issues after lots of scale-up/down due to bugs that fail to clean up IP ranges
  • It still lacks the feature to reserve excess capacity (it was said to be implemented for v1.8, but wasn't), and our seed clusters started breathing with the integration tests, which resulted in many issues as pods took minutes to be scheduled

However, now we have a fixed number of workers in the seeds. This means we have to scale manually, but lack any indication of when to do so.

Therefore we need workflow jobs per seed that compare the used vs. the available capacity and warn the operator of an upcoming shortage.

Reserve Excess Cluster Capacity

Issue by vlerenc
Tuesday Aug 08, 2017 at 05:59 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/71


Story

  • As user I want to be able to schedule my workload (if of reasonable size) in my shoot cluster in seconds, even if all nodes of the shoot cluster are busy, so that I am more productive.
  • As operator I want to be able to create new shoot cluster control planes in the seed clusters in seconds, even if all nodes of the seed cluster are busy, so that I am more productive.

Motivation

The cluster autoscaler seems to react only when pods can't be scheduled on any of the current nodes due to insufficient resources. That means it takes minutes after such an event until the pod can be scheduled.

Acceptance Criteria

  • We run the cluster with excess capacity, a configurable combination of an absolute excess-capacity minimum and a percentage, e.g. max(1, 5% of the current number of nodes); the calculation here is in "nodes", but may use CPU/memory, too, if that's easier/better (see the sketch below)
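
A tiny Go sketch of that sizing rule; the minimum and percentage are the configurable values mentioned above:

```go
package capacity

import "math"

// desiredExcessNodes implements max(minimum, percentage of current nodes),
// e.g. max(1, 5% of the current number of nodes), rounded up to whole nodes.
func desiredExcessNodes(currentNodes, minimum int, percentage float64) int {
	byPercentage := int(math.Ceil(float64(currentNodes) * percentage / 100.0))
	if byPercentage < minimum {
		return minimum
	}
	return byPercentage
}

// Example: desiredExcessNodes(40, 1, 5) == 2, desiredExcessNodes(10, 1, 5) == 1.
```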

Implementation Proposal

None, but if this can't be done with the current cluster autoscaler and nobody works on it, consider whether a contribution would be possible.

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?

Improve Logging / GroupBy Cluster and Operation + Show As Task List

Story

  • As operator (of the week) I want to access cluster operations logs (create, update/reconcile, delete) in a convenient way, so that I can easily see what the outcome of each operation was without trying to grep the information out of the overall/global Gardener logs.

Motivation

We should improve the way we interact with/access the Gardener/cluster logs. As an operator (of the week) I frequently have to check what the Gardener says/logs about a certain cluster and operation. Now, with the reconciler, the logs have grown, and with all the other planned features, they will grow even more. Of course, we will have a logging stack eventually, but that's not what this story is about. Usually I need to know something specific about a particular cluster, for a particular operation, at a particular date/time. A logging stack and UI will allow me to set filters, but it won't be semantic in this sense. Also, we can't give away the entire operator logs via a logging stack to the entirety of the people that will later use the Gardener dashboard as operators (there must be strong isolation, so that we can have only one Gardener instance running).

Acceptance Criteria

  • Keep track of logs per operation

Implementation Proposal

What springs to mind is something similar to Bosh tasks and logs. There, the user nicely sees what the Bosh director does and did: the creation of a certain deployment, the deletion of one, any scan & fix tasks that happened on it, etc. Information (a collection of logs) at that granularity (cluster + operation) would be a great improvement over a large shared operator log, where I have the access rights issue and can only filter by properties instead of seeing (in a list) which operations were carried out when, on one or all of my clusters in my project.

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

Automated Updates

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:45 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/53


Stories

  • As user I want the latest version with all security patches and bug fixes, so that my cluster is safe and sound.
  • As provider I can update the OS, so that I can deploy important security patches and bug fixes quickly.
  • As provider I can update Kubernetes and its components, so that I can deploy important security patches and bug fixes quickly.

Motivation

See above.

Acceptance Criteria

  • Changes to the OS can be rolled through all self-controlled landscapes
  • Changes to Kubernetes and its components can be rolled through all self-controlled landscapes
  • CI process that keeps track of patches, automatically pre-validates them, and prepares a PR for review

ℹ️ Migrated from Jira issue KUBE-16

Garden Operator CI

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:44 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/24


Stories

  • As contributor I depend on a continuous integration/delivery pipeline, so that I neither waste my time with repetitive tasks nor make (fatal) mistakes.

Motivation

Newly hired colleagues, or contributors from outside the group of colleagues who primarily work on a repo/subject/topic, have a hard time contributing, especially when sitting in remote locations; but in order to grow sustainably, we need everybody to be happy and able to help our cause.

Acceptance Criteria

  • Continuous integration (acting on PRs/Merges) including the following steps (in chronological order, not the order of importance):
    • Optional (can be done in phase 4): Formal checks like referenced issue
    • Optional (can be done in phase 3): Lint checks
    • Mandatory (must be done in phase 1): Build
    • Optional (can be done in phase 3): Unit tests
    • Optional (can be done in phase 3): Acceptance tests
    • Optional (can be done in phase 2): Deployment to staging
    • Optional (can be done in phase 2): Integration tests on staging
    • Mandatory (must be done in phase 1): Deployment to production
    • Mandatory (already done in phase 0): Integration tests on production

Details (not complete, to be extended by all of us as we go along)

  • Deployment of Jenkins into new cluster
  • Create secrets with kubernetes-jenkins password in new cluster (ideally automate that with e.g. jenkins-setup repo/script or the Jenkins-tools job/script), mount it into the pod/container template and adapt JenkinsFile accordingly
  • Deploy into Garden DEV cluster into separate NS (watching only this NS) and clone and run the garden-it against this NS
  • Run Jenkins pipeline also on PRs
  • Update PR during pipeline execution on certain stages, e.g. Lint tests passed, Unit tests passed, Integration tests passed, etc.
  • Exact plan of actual pipeline and how a change shall be prepared, what happens to the pull request, where it is tested, how the build result is promoted and deployed to production
  • In-depth evaluation of similar tools like Concourse or Drone which are more promising than Jenkins + Kubernetes-Plugin

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?

ℹ️ Migrated from Jira issue KUBE-149

Condense Backups

Issue by vlerenc
Tuesday Oct 24, 2017 at 00:29 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/160


Right now we only keep backups for the past 7 days (taken once a day), but it would be better to have a more flexible backup plan and condense older backups (a sketch of such a retention policy follows the list below):

  • Run backups hourly
  • Keep only the last 24 hourly backups and of all other backups only the last backup in a day
  • Keep only the last 7 daily backups and of all other backups only the last backup in a week
  • Keep only the last 4 weekly backups
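
A hedged Go sketch of this retention policy (pure bookkeeping over backup timestamps, not Gardener's actual backup code):

```go
package backup

import (
	"fmt"
	"sort"
	"time"
)

// condense decides which backup timestamps to keep, following the plan above:
// all hourly backups of the last 24 hours, then the newest backup per day for
// 7 days, then the newest backup per week for 4 weeks; everything older is dropped.
func condense(backups []time.Time, now time.Time) []time.Time {
	// Newest first, so the first backup we see for a day/week is the one to keep.
	sort.Slice(backups, func(i, j int) bool { return backups[i].After(backups[j]) })

	var keep []time.Time
	seenDay := map[string]bool{}
	seenWeek := map[string]bool{}

	for _, b := range backups {
		age := now.Sub(b)
		switch {
		case age <= 24*time.Hour:
			keep = append(keep, b) // hourly backups of the last day
		case age <= 7*24*time.Hour:
			day := b.Format("2006-01-02")
			if !seenDay[day] { // newest backup per day survives
				seenDay[day] = true
				keep = append(keep, b)
			}
		case age <= 4*7*24*time.Hour:
			year, week := b.ISOWeek()
			wk := fmt.Sprintf("%d-W%d", year, week)
			if !seenWeek[wk] { // newest backup per week survives
				seenWeek[wk] = true
				keep = append(keep, b)
			}
		}
	}
	return keep
}
```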

Provision via ACS

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:44 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/16


Stories

  • As provider I want to provision Kubernetes via ACS, so that we don't have to provision Kubernetes ourselves on Azure (when SAP or customers need to run it side-by-side) and benefit from a probably more competitive solution than whatever we can do (in terms of full management and pricing).

Motivation

Do not engage in Kubernetes provisioning when there are competitive native solutions, especially when customers already run on that infrastructure and use/manage/operate the same native solutions.

Acceptance Criteria

  • ACS Kubernetes cluster can be created via the Gardener and offers about the same functionality as what we have on AWS (to be clarified)

ℹ️ Migrated from Jira issue KUBE-83

Multi-Cloud

Stories

  • As provider I want to provision on AWS, Azure, GCP and OpenStack, so that Kubernetes runs in the Public Hyper-Scale Clouds where our customers already have their footprint.

Motivation

We shall go where our customers are, i.e. multi-cloud and also our own DCs. We shall offer this in an operator-friendly way with simplified operations.

Acceptance Criteria

Clusters can be created, operated and destroyed (following the seed & shoot cluster approach):

  • Kubernetes on AWS (covering the same functionality as our kops-based solution)
  • Kubernetes on Azure to (almost) the same extent as we have it on AWS
  • Kubernetes on GCP to (almost) the same extent as we have it on AWS
  • Kubernetes on OpenStack to (almost) the same extent as we have it on AWS

Pods/Infrastructure Not Reconciled After Restore

Issue by vlerenc
Tuesday Nov 07, 2017 at 16:06 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/186


We seem to have issues when we restore a backup: pods/containers that were created after the last backup turn into unmanaged zombies once the backup is restored, i.e. the kubelet doesn't kill them even though it should know that it created them and that they are no longer scheduled for that node.

The assumption is that we may have similar issues with PVs or LBs.

Therefore we should first check whether we indeed have reconciliation issues with PVs or LBs. Next we should check what else (that we control) might be misaligned/turn into zombies. Finally we should come up with a plan for how to address these issues. Maybe there are already issues filed by others, or we need to file new ones. Maybe we need to contribute, or we can mitigate with minimal effort (e.g. a rolling update will resolve the pod/container issue from above).

As for the customer load in the shoot cluster as such: If Kubernetes isn't addressing this issue, we can't. We should therefore inform the customers, in general (a priori) and when it happens (on occasion). Other than that, I see no way we could reconcile whatever the customer is running as cluster workload (could be custom resource definitions with all kinds of side effects or anything else). We simply don't know and can't handle this situation in a clean way.

Alert Volume

Issue by vlerenc
Tuesday Oct 24, 2017 at 07:03 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/163


Please investigate the current alert volume (at present far too high for manual inspection). Are too many alerts fired, or are they fired too early? If not, what's wrong with the clusters, what do we need to change, which features are missing?

The key point is: how can we improve either (alerts or clusters) and lower the alert volume to a reasonable amount that an operator can/should investigate (at present, that's simply not possible, given the sheer alert volume).

Add Cluster Logging

Stories

  • As operator I want cluster logging, so that I can check on my cluster(s).

Motivation

We need means to get to the logs of our own cluster(s), but we can influence/bend this BLI. The goal is to improve the work of an operator (of the week), not to provide a 100% solution. Certain logs are much more important than others: etcd and machine controller logs are kind of critical, and the API server is important as well (but may generate a lot of logs, of which I can't tell at the moment whether we can hold them all at a reasonable price), while others are less important or don't need to be preserved for long. If that's the case, we may not have to stuff them into the logging stack (which makes it less expensive), as we always have the Docker logs on the machines themselves. If e.g. networking fails, it fails immediately; there was never the need to go back two weeks to dig out some old logs, so maybe we do not need to preserve the daemon set logs in our shoot cluster. Keep that in mind to reasonably shape this BLI.

Acceptance Criteria

  • All pod/container logs are exposed to admins in a remotely accessible UI
    • Shoot cluster control plane in the seed cluster (if we need, we can cut that down to the more important pods like e.g. etcd and machine controller)
    • Shoot cluster addons in the shoot cluster (if we need, we can omit that)
  • Historic data is kept for the past two weeks (and truncated if it exceeds a certain unreasonable size)
  • CPU & Memory footprint is reasonable/small (if not we may have to consider having only one stack per seed; in that case we might also rethink our monitoring stack approach)

Out Of Scope

  • Like with the monitoring data we accept the loss of all logs when the Gardener has to drain a seed of its shoot clusters or loses its seed cluster for good and must move the shoot cluster control plane to another seed cluster

Implementation Proposal

Please check out the EFK stack (Elasticsearch + fluentd + Kibana; see various blogs, e.g. https://logz.io/blog/kubernetes-log-analysis). Is it what we should use?

If yes, should we handle it like the monitoring stack:

  • Aim at a logging stack that is co-deployed with every shoot cluster control plane and monitoring stack
  • Investigate what needs to be done technically to get to the logs in the shoot cluster namespace in the seed and the kube-system namespace in the shoot
  • Find out what needs to be done to contain the costs, i.e. keep historic data for the past two weeks and truncate it if it exceeds a certain unreasonable size

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

Hibernate/Wake-Up Clusters

Story

  • As user/operator I want to hibernate clusters, so that I can save costs when I don't need the clusters anymore/temporarily.

Motivation

We have seen that users try to hibernate their clusters to save costs (we were also asked how to do that). That's actually great, and we haven't seen this form of cost responsibility anywhere else.

However, users have no means at present to do so in a clean way. One user manually tweaked the ASG, which turned off all nodes and hence also the cluster autoscaler; this turned the cluster into NotReady in our alert dashboard/flagged the cluster, which in turn created unnecessary ops effort. With the machine controller manager that path will be blocked (we will take more control of the cluster to provide our SLAs) and we will revert, or rather prevent, such tweaks.

Acceptance Criteria

  • Cluster can be scaled down to 0 machines across all pools
  • No false monitoring alerts fired

Implementation Proposal

Use the machine controller manager to achieve this. Alerts are temporarily disabled (VPN down, daemon sets down, nodes gone).
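
A sketch of the scale-to-zero building block in Go, using the dynamic client against the seed cluster; the MachineDeployment group/version/resource, namespace, and name are assumptions about the machine controller manager API, not verified against it:

```go
package hibernation

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
)

// Assumed GVR of the machine controller manager's MachineDeployment resource.
var machineDeploymentGVR = schema.GroupVersionResource{
	Group:    "machine.sapcloud.io",
	Version:  "v1alpha1",
	Resource: "machinedeployments",
}

// hibernateWorkerPool patches a MachineDeployment in the shoot's namespace in the
// seed to zero replicas, which lets the machine controller manager delete the worker VMs.
func hibernateWorkerPool(ctx context.Context, seedClient dynamic.Interface, namespace, name string) error {
	patch := []byte(`{"spec":{"replicas":0}}`)
	_, err := seedClient.Resource(machineDeploymentGVR).Namespace(namespace).
		Patch(ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		return fmt.Errorf("scaling %s/%s to zero: %w", namespace, name, err)
	}
	return nil
}
```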

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

Automated Cluster Updates

Stories

  • As user I want the latest version with all security patches and bug fixes, so that my cluster is safe and sound.
  • As provider I can update the OS, so that I can deploy important security patches and bug fixes quickly.
  • As provider I can update Kubernetes and its components, so that I can deploy important security patches and bug fixes quickly.

Motivation

See above.

Acceptance Criteria

  • Modification of shoot CRDs with a kubernetesVersion (e.g. v1.5), and an operatorVersion (that was used to create or last update the cluster, e.g. v1.35.0)
  • Rolling update when:
    • OS version gets updated
    • Kubelet version (only for new Kubernetes minor versions, otherwise by updating the cloud-config secrets)
  • Idempotent cluster update (Terraform & Control Plane) in all other cases (operator changes, Kubernetes patches, image or configuration changes)
  • Support multiple shoot cluster Kubernetes versions

Further Considerations

  • Kubernetes version upgrades (e.g. v1.5->v1.6) must be approved and scheduled by the end-user.
  • We agreed to postpone CoreOS Container Linux FastPatch updates (#37), because the nodes require a restart also with FastPatch and this isn't worth the effort on an IaaS platform where we can simply run a rolling-update of the nodes.

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

CoreOS Container Linux FastPatch Updates

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:45 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/37


Story

  • As operator I want OS patches (functional and security) to be deployed as fast and automated as possible and ideally without any significant downtime, so that my cluster remains sound and safe.

Motivation

While we can do rolling updates on IaaS hyperscale providers (for now), we shouldn't do that, or at least shouldn't waste too much time doing it, on bare metal.

Acceptance Criteria

  • Update shoot cluster worker nodes using FastPatch (a.k.a. a/b channel updates without replacing the VM)
    • Shoot cluster TPR should hold exact garden operator version (which includes configuration such as CoreOS Container Linux image tag and version)
    • When a new garden operator gets deployed, it checks existing shoot clusters and updates them according to a canary process
      • At first only one cluster is updated, validating that it properly works after update
      • Then all other clusters are updated in parallel according to a max_in_flight property

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

ℹ️ Migrated from Jira issue KUBE-170

Provision via GKE

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:44 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/14


Stories

  • As provider I want to provision Kubernetes via GKE, so that we don't have to provision Kubernetes ourselves on GCP (when SAP or customers need to run it side-by-side) and benefit from a presumably more competitive solution than whatever we can do (in terms of full management and pricing).

Motivation

Do not engage in Kubernetes provisioning when there are competitive native solutions, especially when customers already run on that infrastructure and use/manage/operate the same native solutions.

Acceptance Criteria

  • GKE Kubernetes cluster can be created via the Gardener and offers about the same functionality as what we have on AWS (to be clarified)

ℹ️ Migrated from Jira issue KUBE-84

Improve Prometheus Grafana Dashboards

Story

  • As operator I want useful Prometheus Grafana dashboards, so that I have best observability for my ops tasks.

Motivation

As discussed in the internal review, please take some time to finalize the dashboards, so that we have a good starting point/it is clear what purpose the dashboards serve.

Acceptance Criteria

  • Remove duplicate information (some duplicate information makes sense to be repeated in different dashboards, though)
  • Find better names
    • All Nodes -> Cluster Overview (as-is)
    • Kubernetes Cluster Monitoring -> Node Details (rename Cluster gauges into Node gauges, because that's what they are)
    • Deployments -> Kubernetes Deployments (we saw the data was missing for the API server)
    • Pod -> Kubernetes Pods
    • Resource Requests -> Resource Requests (the metrics are maybe wrong; let's either fix or remove them)

In addition we should also:

  • Provide a guide on how to tweak, create, or remove dashboards

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

Optimise Cluster VPN

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:45 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/44


Story

  • As provider I want to have a fast, reliable and safe cluster VPN, so that I get all these benefits for this crucial component in the seed & shoot approach.

Motivation

Performance, simplicity, and security.

Acceptance Criteria

  • Investigate whether there is a more elegant approach than SSH for the cluster VPN between the API server in the seed cluster and the workers in the shoot cluster
  • Secure access with security groups

Related Resources

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

ℹ️ Migrated from Jira issue KUBE-160

Broken Node Detection & Retirement

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:44 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/29


Story

  • As operator I want broken nodes to be removed automatically, so that nothing gets scheduled there anymore, the existing workload is drained/moved, and the nodes don't count against my budget (in general/financially and also for the autoscaler).

Motivation

Cost saving, business continuity, depending on the reason for the retirement also security.

Acceptance Criteria

  • Nodes are retired if:
    • Kubelet fails to call home
    • Kubelet indicates health issues
    • Ephemeral disk is full
  • Retirement means (cordoning; actually implicit with next step), draining, and finally terminating the VM so that a new one can be recreated by the scaling group

Implementation Proposal

See e.g. KUBE-70.
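
A sketch of the cordoning step in Go; draining the workload and terminating the VM would follow as separate steps (eviction API and cloud provider calls, omitted here):

```go
package retirement

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// cordon marks a broken node as unschedulable so that no new pods land on it.
// Draining the existing workload and terminating the VM are separate steps.
func cordon(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	patch := []byte(`{"spec":{"unschedulable":true}}`)
	_, err := client.CoreV1().Nodes().Patch(ctx, nodeName, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```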

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

ℹ️ Migrated from Jira issue KUBE-236

GPU Support

Stories

  • As user I want to utilise GPUs if available, so that I can implement GPU-relevant applications like ML.

Motivation

See above.

Acceptance Criteria

  • Users can require nodes with GPUs
  • The GPU is actually available to (non-privileged) containers
  • CUDA usage for ML is possible

Open Questions

  • While CPUs are well compressible, how about GPUs (shader cores)?
  • How do we provision the drivers into the kops-managed nodes (AMI image is fixed, no means yet to apply direct changes through kops, so would daemon sets help)?

Resources

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

ImagePullPolicy Admission Controller

Story

  • As operator I want to control which container images I allow (black-/whitelist), so that I can ensure the safety of my Kubernetes cluster.

Motivation

Security.

Acceptance Criteria

  • Admission controller that can be switched on or off:
    • Allow images from certain registries
    • Allow images with certain signatures

Implementation Proposal

Implement an image policy webhook.
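
A condensed Go sketch of such a validating webhook handler that checks pod images against an allowlist of registries; the allowlist, TLS serving setup, and webhook registration are placeholders or omitted:

```go
package imagepolicy

import (
	"encoding/json"
	"net/http"
	"strings"

	admissionv1 "k8s.io/api/admission/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// allowedRegistries is a placeholder allowlist.
var allowedRegistries = []string{"eu.gcr.io/my-project/", "registry.example.org/"}

// serveValidate handles AdmissionReview requests for pods and rejects images
// that are not pulled from one of the allowed registries.
func serveValidate(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
		http.Error(w, "invalid admission review", http.StatusBadRequest)
		return
	}

	allowed, message := true, ""
	var pod corev1.Pod
	if err := json.Unmarshal(review.Request.Object.Raw, &pod); err == nil {
		for _, c := range append(pod.Spec.InitContainers, pod.Spec.Containers...) {
			if !fromAllowedRegistry(c.Image) {
				allowed, message = false, "image "+c.Image+" is not from an allowed registry"
				break
			}
		}
	}

	review.Response = &admissionv1.AdmissionResponse{
		UID:     review.Request.UID,
		Allowed: allowed,
		Result:  &metav1.Status{Message: message},
	}
	_ = json.NewEncoder(w).Encode(review)
}

func fromAllowedRegistry(image string) bool {
	for _, prefix := range allowedRegistries {
		if strings.HasPrefix(image, prefix) {
			return true
		}
	}
	return false
}
```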

Resources

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

Robustness and Resilience

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:45 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/54


Stories

  • As provider I want to ship a robust and resilient system, so that I get fewer complaints.

Motivation

The fewer complaints we get, the more productive our users probably are and the more we can concentrate on adding valuable features.

Acceptance Criteria

  • Vague: The system behaves correctly, also under stress

ℹ️ Migrated from Jira issue KUBE-86

ETCD Backup & Restore

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:44 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/31


Story

  • As provider I want etcd backups for my shoot clusters, so that I can restore them should they be lost.
  • Later: As provider I want etcd backups to be restored automatically should the PV be definitively lost or should the current etcd fail to start from its current state.

Motivation

Business continuity.

Acceptance Criteria

  • Regular etcd backups are taken by the garden operator for all shoot clusters and are made known to the Gardener UI via TPRs/CRDs
    • AWS
    • Azure
    • GCP
    • OpenStack

Resources

Open Questions

How frequently can we back up the data? Can we replicate the data on a lower level with a sidecar deployment, or would we need an etcd cluster then? Is there something like point-in-time recovery for etcd?
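
Partly answering the first question, a hedged Go sketch of taking a snapshot with the etcd v3 client; the endpoint and target path are placeholders, and uploading/pruning the snapshots is out of scope here:

```go
package etcdbackup

import (
	"context"
	"io"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// snapshot streams a full etcd snapshot to the given file. Uploading it to an
// object store (S3, GCS, ...) and pruning old snapshots would follow from here.
func snapshot(ctx context.Context, endpoint, targetPath string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{endpoint}, // e.g. the shoot's etcd service in the seed namespace (placeholder)
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}
	defer cli.Close()

	rc, err := cli.Snapshot(ctx) // maintenance API: streams the backend database
	if err != nil {
		return err
	}
	defer rc.Close()

	f, err := os.Create(targetPath)
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = io.Copy(f, rc)
	return err
}
```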

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

ℹ️ Migrated from Jira issue KUBE-158

Shoot Cluster Control Plane Auto-Scaling

Story

  • As provider I want my shoot cluster control planes to scale automatically (within certain limits), so that my shoot clusters remain operational and respond fast.

Motivation

Response time behaviour. Avoiding issues like #79. Generally being able to run larger and very large clusters safely (also important for our own shoot'ed seed clusters).

Acceptance Criteria

  • XS (2 nodes), S (10 nodes), M (50 nodes), L (100 nodes) and XL clusters (250 nodes) with actual workload run safely without downtime for a week (hopefully longer, but that's the KPI and then we will free the resources).
  • Control plane shows good metrics-based utilisation (no large over-provisioning)
  • Minimal control plane unavailability (that is actually the greatest problem/threat: that we cause downtimes when scaling, most critically on the API server and etcd)

Implementation Proposal

  • Use metrics-based horizontal pod autoscaling (HPA) for all shoot cluster control plane components that can be scaled horizontally, like the API server (which also brings HA as a side benefit); see the sketch below.
  • Use metrics-based vertical pod autoscaling (VPA) for all shoot cluster control plane components that cannot be scaled horizontally, like the scheduler, the controller manager and, most importantly, etcd.
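
A sketch of the HPA half in Go, targeting a hypothetical kube-apiserver Deployment in the shoot's namespace in the seed; names, replica bounds, and the 80% CPU target are placeholders:

```go
package controlplane

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// apiServerHPA scales the shoot's API server deployment horizontally on CPU load.
func apiServerHPA(namespace string) *autoscalingv2.HorizontalPodAutoscaler {
	minReplicas := int32(1)
	targetCPU := int32(80)
	return &autoscalingv2.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: "kube-apiserver", Namespace: namespace},
		Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
				APIVersion: "apps/v1",
				Kind:       "Deployment",
				Name:       "kube-apiserver", // placeholder deployment name
			},
			MinReplicas: &minReplicas,
			MaxReplicas: 4,
			Metrics: []autoscalingv2.MetricSpec{{
				Type: autoscalingv2.ResourceMetricSourceType,
				Resource: &autoscalingv2.ResourceMetricSource{
					Name: corev1.ResourceCPU,
					Target: autoscalingv2.MetricTarget{
						Type:               autoscalingv2.UtilizationMetricType,
						AverageUtilization: &targetCPU,
					},
				},
			}},
		},
	}
}
```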

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit tests are provided: Have you written automated unit tests?
  • Integration tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added/changed public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide about ops-relevant changes?
  • User documentation: Have you updated the READMEs/docs/how-tos about user-relevant changes?

Enable GPU Usage

Story

  • As user I want to run GPU-dependent applications, so that I can use libraries such as Theano or Tensorflow on GPUs.

Motivation

See above.

Acceptance Criteria

  • Users can require nodes with GPUs
  • The GPU is actually available to (non-privileged) containers
  • CUDA usage for ML is possible
  • Theano or Tensorflow can consume available node GPUs

Open Questions

  • While CPUs are well compressible, how about GPUs (shader cores)?
  • How do we provision the drivers into the OS images (image is fixed, so would daemon sets help)?

Resources

See https://vishh.github.io/docs/user-guide/gpus for details. See also https://blog.openshift.com/use-gpus-with-device-plugin-in-openshift-3-9.

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?

Compliance and Performance Testing

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:44 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/26


Story

  • As provider I want to be sure our shoot clusters are Kubernetes compliant and show sufficient performance, so that I don't deliver crap to our consumers.

Motivation

We are still in an early phase and need to further validate our seed/shoot approach, before we roll it out.

Acceptance Criteria

  • Manually run, then automate in Jenkins CI:
    • E2E tests against a shoot cluster pass
    • Performance and scalability of a shoot cluster is comparable to our kops-based seed clusters or a Tectonic cluster

Resources

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?

ℹ️ Migrated from Jira issue KUBE-184

Investigate ALB Ingress (In Contrast to NGINX Ingress)

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:45 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/45


Investigate https://coreos.com/blog/alb-ingress-controller-intro. Is it any good? Can it replace nginx ingress on AWS as the more natural and IaaS-specific solution? What impact will it have on costs? Would you advise switching to it? Please report results here.

Background: Vasu suggested to investigate the subject in https://git.infra.removed/kubernetes/landscape-template/issues/4. See comments there. Basically, we must weigh optimised support for individual infrastructures against the time we need for research, implementation and maintenance of optimised IaaS-specific solutions, while we have no issue with the general-purpose solution we have right now.

ℹ️ Migrated from Jira issue KUBE-111

Pod Container/Local Disk Quota

Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:45 GMT
Originally opened as https://git.removed/kubernetes-attic/garden-operator/issues/48


Story

  • As operator I want to limit the impact of containers that consume all disk space, so that they don't put the entire node in danger or out of operation.

Motivation

At present, any container that fills up its implicitly created root volume can break down the entire node, as there is no quota in place (Docker's /var/lib/docker is hosted on the ephemeral root disk at present, together with everything else).

Acceptance Criteria

  • Ideally, restrict a container's root volume to a certain quota, so that only the container itself is impacted if it runs out of quota
  • Alternatively, if that's not possible, discuss whether mounting /var/lib/docker separately would help (then, however, all containers would be affected, and it is unclear whether that would improve anything at all; e.g. it is not clear how operable Docker will be if /var/lib/docker sits on a single mount point with no disk space left; will it respond to remote queries, and will it be possible to remove a container and its implicit volumes?)
  • Alternatively, if that's also not possible or doesn't make sense, check whether a saturated node would automatically be dumped and recreated if it becomes unresponsive, or find out what must be done to achieve this (if nodes are cattle, we still need to terminate the node so that it can be recreated by the autoscaling group)

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Developer documentation: Have you updated our documentation and/or the CHANGELOG.txt?

ℹ️ Migrated from Jira issue KUBE-31

Shoot Cluster Control Plane High-Availability

Story

  • As provider I want my shoot cluster control planes to be HA (within certain limits), so that my shoot clusters remain operational.

Motivation

Stability and scalability of our clusters (see also performance tests here https://jam4.sapjam.com/blogs/show/vKBhw5503MHfMQuId0b7t0).

Acceptance Criteria

  • Find out whether our assumption works out that we don't have to run control plane components in HA (i.e. single pods are OK):
    • How fast is a critical control plane pod rescheduled when the cluster is updated/nodes are rolled out
    • How fast is a critical control plane pod rescheduled when a node fails
  • Decide whether we can stick to our single-pod philosophy, arguing that Kubernetes is fast enough to reschedule a pod (this depends on detecting quickly enough that a pod has failed), or whether we have to give up this philosophy and run multiple pods with anti-affinity

Definition of Done

  • Knowledge is distributed: Have you spread your knowledge in pair programming/code review?
  • Unit Tests are provided: Have you written automated unit tests or added manual NGPTT tickets?
  • Integration Tests are provided: Have you written automated integration tests?
  • Minimum API exposure: If you have added public API, was it really necessary/is it minimal?
  • Operations guide: Have you updated the operations guide?
