
helm-charts's Introduction

CockroachDB Helm Charts Repository

CockroachDB - the open source, cloud-native distributed SQL database.

Charts

Self-Cert-Signer Utility

The Certificate Self-Signer utility allows the cockroachdb Helm chart to deploy a secure cluster without depending on any outside tool to create or sign its certificates.

You can enable or disable this utility by setting the tls.certs.selfSigner.enabled option to true or false.

Certificates and CA managed by cockroachdb

This option allows cockroachdb to generate the CA, node, and client certificates and use them to form a secure cockroachdb cluster. Users can configure the duration and expiry window of each certificate type. The following options are provided as default values, in hours.

# Minimum certificate duration for all certificates; every certificate's duration is validated against this.
tls.certs.selfSigner.minimumCertDuration: 624h
# Duration of the CA certificate, in hours.
tls.certs.selfSigner.caCertDuration: 43800h
# Expiry window of the CA certificate, i.e. the window before actual expiry in which the CA cert should be rotated.
tls.certs.selfSigner.caCertExpiryWindow: 648h
# Duration of client certificates, in hours.
tls.certs.selfSigner.clientCertDuration: 672h
# Expiry window of client certificates, i.e. the window before actual expiry in which client certs should be rotated.
tls.certs.selfSigner.clientCertExpiryWindow: 48h
# Duration of node certificates, in hours.
tls.certs.selfSigner.nodeCertDuration: 8760h
# Expiry window of node certificates, i.e. the window before actual expiry in which node certs should be rotated.
tls.certs.selfSigner.nodeCertExpiryWindow: 168h

These durations can be configured by the user, subject to the following validations (see the example after this list):

  1. caCertExpiryWindow should be greater than minimumCertDuration.
  2. For the other certificates, certificateDuration - certificateExpiryWindow should be greater than minimumCertDuration.
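
For example, a sketch of overriding these durations at install time with --set (the release name and chart path are illustrative; the values chosen satisfy the rules above):

helm install crdb ./cockroachdb \
  --set tls.certs.selfSigner.minimumCertDuration=624h \
  --set tls.certs.selfSigner.caCertDuration=43800h \
  --set tls.certs.selfSigner.caCertExpiryWindow=648h \
  --set tls.certs.selfSigner.nodeCertDuration=8760h \
  --set tls.certs.selfSigner.nodeCertExpiryWindow=168h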

This utility also handles certificate rotation as certificates near expiry. You can enable or disable certificate rotation with the following setting:

 # If set, the cockroachdb cert selfSigner will rotate the certificates before expiry.
tls.certs.selfSigner.rotateCerts: true
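
When rotation is enabled, the chart is expected to schedule the rotation as Kubernetes CronJobs (see the cert rotation work items in the issues section below); a quick way to check what was created is:

kubectl get cronjobs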

Certificate managed by cockroachdb && CA provided by user

If the user has a custom CA that they already use for certificate signing in their organisation, this utility provides a way to supply that custom CA. All node and client certificates are then signed by this user-provided CA.

To provide the CA certificate to cockroachdb, create a TLS secret containing ca.crt and ca.key and reference that secret as follows:

# If set, the user should provide the CA certificate to sign other certificates.
tls.certs.selfSigner.caProvided: true
# Name of the secret that holds the CA certs. If caProvided is set, this cannot be empty.
tls.certs.selfSigner.caSecret: "custom-ca-secret"
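
For example, assuming you already have ca.crt and ca.key files on disk, the secret referenced above can be created as a generic secret whose data keys are the file names (the secret name must match tls.certs.selfSigner.caSecret):

kubectl create secret generic custom-ca-secret --from-file=ca.crt --from-file=ca.key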

You will still have options to configure the duration and expiry window of the certificates:

# Minimum certificate duration for all certificates; every certificate's duration is validated against this.
tls.certs.selfSigner.minimumCertDuration: 624h
# Expiry window of the CA certificate, i.e. the window before actual expiry in which the CA cert should be rotated.
tls.certs.selfSigner.caCertExpiryWindow: 648h
# Duration of client certificates, in hours.
tls.certs.selfSigner.clientCertDuration: 672h
# Expiry window of client certificates, i.e. the window before actual expiry in which client certs should be rotated.
tls.certs.selfSigner.clientCertExpiryWindow: 48h
# Duration of node certificates, in hours.
tls.certs.selfSigner.nodeCertDuration: 8760h
# Expiry window of node certificates, i.e. the window before actual expiry in which node certs should be rotated.
tls.certs.selfSigner.nodeCertExpiryWindow: 168h

This utility only handles the rotation of client and node certificates; rotation of the custom CA must be done by the user.

Installation of Helm Chart

When you install the cockroachdb cluster with the self-signer enabled, you will see the self-signer job:

kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
crdb-cockroachdb-self-signer-mmxp8   1/1     Running   0          15s

This job generates the CA, client, and node certificates based on the user input described in the previous section. You will see the following secrets representing each certificate:

kubectl get secrets 
NAME                                       TYPE                                  DATA   AGE
crdb-cockroachdb-ca-secret                 Opaque                                2      3m10s
crdb-cockroachdb-client-secret             kubernetes.io/tls                     3      3m9s
crdb-cockroachdb-node-secret               kubernetes.io/tls                     3      3m10s
crdb-cockroachdb-self-signer-token-qcc72   kubernetes.io/service-account-token   3      3m29s
crdb-cockroachdb-token-jpbms               kubernetes.io/service-account-token   3      3m8s
default-token-gmhdf                        kubernetes.io/service-account-token   3      11m
sh.helm.release.v1.crdb.v1                 helm.sh/release.v1                    1      3m30s
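
To check when one of the generated certificates expires, you can decode it with openssl (a sketch, assuming the certificate is stored under the conventional tls.crt key of the kubernetes.io/tls secret shown above):

kubectl get secret crdb-cockroachdb-node-secret -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate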

After this, the cockroachdb init job starts and copies these certificates to each node:

kubectl get pods
NAME                          READY   STATUS     RESTARTS   AGE
crdb-cockroachdb-0            0/1     Init:0/1   0          18s
crdb-cockroachdb-1            0/1     Init:0/1   0          18s
crdb-cockroachdb-2            0/1     Init:0/1   0          18s
crdb-cockroachdb-init-fclbb   1/1     Running    0          16s

Finally, the cockroachdb cluster comes into a running state. The output of the helm install command is shown below:

helm install crdb ./cockroachdb/
NAME: crdb
LAST DEPLOYED: Thu Aug 19 18:03:37 2021
NAMESPACE: crdb
STATUS: deployed
REVISION: 1
NOTES:
CockroachDB can be accessed via port 26257 at the
following DNS name from within your cluster:

crdb-cockroachdb-public.crdb.svc.cluster.local

Because CockroachDB supports the PostgreSQL wire protocol, you can connect to
the cluster using any available PostgreSQL client.

Note that because the cluster is running in secure mode, any client application
that you attempt to connect will either need to have a valid client certificate
or a valid username and password.

Finally, to open up the CockroachDB admin UI, you can port-forward from your
local machine into one of the instances in the cluster:

    kubectl port-forward crdb-cockroachdb-0 8080

Then you can access the admin UI at https://localhost:8080/ in your web browser.

For more information on using CockroachDB, please see the project's docs at:
https://www.cockroachlabs.com/docs/
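
As the notes above mention, any PostgreSQL client can connect. For example, with psql from inside the cluster (a sketch: the user, password, and database are placeholders you must create first, and ca.crt is a local copy of the cluster CA certificate):

psql "postgresql://myuser:mypassword@crdb-cockroachdb-public.crdb.svc.cluster.local:26257/defaultdb?sslmode=verify-full&sslrootcert=ca.crt"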

Upgrade of cockroachdb Cluster

Kick off the upgrade process by changing the new Docker image, where $new_version is the CockroachDB version to which you are upgrading:

helm upgrade my-release cockroachdb/cockroachdb \
--set image.tag=$new_version \
--reuse-values --timeout=20m

Kubernetes will carry out a safe rolling upgrade of your CockroachDB nodes one-by-one. Monitor the cluster's pods until all have been successfully restarted:
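
For example, using the instance label the chart applies to its pods (my-release is the release name from the command above):

kubectl get pods --selector app.kubernetes.io/instance=my-release --watch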

Migration from Kubernetes Signed Certificates to Self-Signer Certificates

Kubernetes-signed certificates are deprecated as of Kubernetes v1.22, and users will no longer be able to use this method for signing certificates.

Users can move away from the old Kubernetes-signed certificates by performing the following steps:

Run the upgrade command with the update strategy set to "OnDelete", which only upgrades the pods when they are deleted by the user:

helm upgrade crdb-test cockroachdb --set statefulset.updateStrategy.type="OnDelete" --timeout=20m

While monitoring all the pods, once the init job is created you can delete all the cockroachdb pods with the following command:

kubectl delete pods -l app.kubernetes.io/component=cockroachdb

This will delete all the cockroachdb pods and restart the cluster with new certificates generated by the self-signer utility. The migration will incur some downtime, since all the pods are upgraded at the same time instead of via a rolling update.
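
Once all pods are running with the new certificates, you may want to switch the update strategy back to the default rolling update (a sketch, reusing the release name from the example above):

helm upgrade crdb-test cockroachdb --set statefulset.updateStrategy.type="RollingUpdate" --timeout=20m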

Installation of Helm Chart with Cert Manager

You should have cert-manager version 1.0 or later installed.

Create an Issuer for signing the self-signed CA certificate:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: cockroachdb
spec:
  selfSigned: {}
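
Apply the Issuer in the namespace where the chart will be installed (the file name is illustrative):

kubectl apply -f issuer.yaml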

Now you can enable the cert-manager from values.yaml as follows:

# Disable the self signing certificates for cockroachdb
tls.certs.selfSigner.enabled: false
# Enable the cert manager
tls.certs.certManager: true
# Provide the kind
tls.certs.certManagerIssuer.kind: Issuer
# Provide the Issuer you have created in previous step
tls.certs.certManagerIssuer.name: cockroachdb

helm install crdb ./cockroachdb
NAME: crdb
LAST DEPLOYED: Fri Aug  4 14:42:11 2023
NAMESPACE: crdb
STATUS: deployed
REVISION: 1
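
Equivalently, the same settings can be passed on the command line instead of editing values.yaml (a sketch using the values shown above):

helm install crdb ./cockroachdb \
  --set tls.certs.selfSigner.enabled=false \
  --set tls.certs.certManager=true \
  --set tls.certs.certManagerIssuer.kind=Issuer \
  --set tls.certs.certManagerIssuer.name=cockroachdb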

helm-charts's People

Contributors

absterr08, aliher1911, arulajmani, asubiotto, azhng, cameronnunez, celiala, chrisseto, cockroach-teamcity, duskeagle, e-mbrown, eladdolev, himanshu-cockroach, jlinder, jonathanhartley, jorritsalverda, juanleon1, junaid-ali, keith-mcclellan, kpatron-cockroachlabs, mgartner, michae2, pbardea, pha91, prafull01, pseudomuto, rail, sergeyshaykhullin, udnay, zhouxing19


helm-charts's Issues

Build a final build of cert-manager utility

Produce the final build of the cert-manager utility and push it with a tag that can serve as the default tag in values.yaml.

This way, the helm installation will not fail with the default values.yaml file.

How to rotate Kubernetes signed certificates for those generated as part of the chart

As part of the cockroachdb bootstrapping, CSRs are created for the cluster in order to enable TLS communications for the cluster itself. These certificates are signed by the Kubernetes CA, but they are only valid for 1 year, after which they will need to be replaced. What is the recommended approach, or is there guidance on this, for deployments installed via the Helm chart?

Deprecate incrementing `version` in values.yaml for every commit?

Our contributing guidelines require that every commit increment the version field in values.yaml. This is easy to forget, and is also a frequent source of merge conflicts.

It should be possible to only require incrementing the version field for backwards-incompatible changes. For minor changes which are backwards-compatible, a minor version could be generated automatically post-commit.

TLS Cert migration testing

At a high level, we need to verify the following workflows:

  1. Migrate customer from k8s CA signed certs to self signed certs
  2. Migrate customer from custom certs to self signed certs
  3. Migrate customer from self signed certs to custom certs
  4. Migrate customer from self signed certs to cert manager certs

Most of these can be validated manually, but I would pick either #2 or #3 and make a permanent test to make sure we don't break this functionality in the future.

Upgrading helm chart causes error on cockroachdb-init job

I tried upgrading the Helm chart from 4.1.13 to 5.0.0 to update CockroachDB to 20.2.0, but it fails because it tries to update the Job resource, which is immutable.

There's not really a good way to handle this in helm right now, unfortunately, due to the immutability aspect of it. The current workaround I have is to delete the job before the upgrade, but perhaps it would be better to make the Job resource optional through a value like init.enabled=false so that it can be set on upgrades (I don't believe the init job needs to run on subsequent upgrades, but correct me if I'm wrong).
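
For reference, the delete-then-upgrade workaround mentioned above looks roughly like this (the job and release names are taken from the error below; adjust the chart reference and version for your setup):

kubectl delete job cockroachdb-init
helm upgrade cockroachdb cockroachdb/cockroachdb --version 5.0.0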

The full error is below:

rollback.go:83: [debug] updating status for rolled back release for cockroachdb
Error: UPGRADE FAILED: release cockroachdb failed, and has been rolled back due to atomic being set: cannot patch "cockroachdb-init" with kind Job: Job.batch "cockroachdb-init" is invalid: spec.template: Invalid value: core.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"app.kubernetes.io/component":"init", "app.kubernetes.io/instance":"cockroachdb", "app.kubernetes.io/name":"cockroachdb", "controller-uid":"d8143205-92e2-44e2-b47a-66b7da13f811", "job-name":"cockroachdb-init"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:core.PodSpec{Volumes:[]core.Volume{core.Volume{Name:"client-certs", VolumeSource:core.VolumeSource{HostPath:(*core.HostPathVolumeSource)(nil), EmptyDir:(*core.EmptyDirVolumeSource)(0xc00b786d20), GCEPersistentDisk:(*core.GCEPersistentDiskVolumeSource)(nil), AWSElasticBlockStore:(*core.AWSElasticBlockStoreVolumeSource)(nil), GitRepo:(*core.GitRepoVolumeSource)(nil), Secret:(*core.SecretVolumeSource)(nil), NFS:(*core.NFSVolumeSource)(nil), ISCSI:(*core.ISCSIVolumeSource)(nil), Glusterfs:(*core.GlusterfsVolumeSource)(nil), PersistentVolumeClaim:(*core.PersistentVolumeClaimVolumeSource)(nil), RBD:(*core.RBDVolumeSource)(nil), Quobyte:(*core.QuobyteVolumeSource)(nil), FlexVolume:(*core.FlexVolumeSource)(nil), Cinder:(*core.CinderVolumeSource)(nil), CephFS:(*core.CephFSVolumeSource)(nil), Flocker:(*core.FlockerVolumeSource)(nil), DownwardAPI:(*core.DownwardAPIVolumeSource)(nil), FC:(*core.FCVolumeSource)(nil), AzureFile:(*core.AzureFileVolumeSource)(nil), ConfigMap:(*core.ConfigMapVolumeSource)(nil), VsphereVolume:(*core.VsphereVirtualDiskVolumeSource)(nil), AzureDisk:(*core.AzureDiskVolumeSource)(nil), PhotonPersistentDisk:(*core.PhotonPersistentDiskVolumeSource)(nil), Projected:(*core.ProjectedVolumeSource)(nil), PortworxVolume:(*core.PortworxVolumeSource)(nil), ScaleIO:(*core.ScaleIOVolumeSource)(nil), StorageOS:(*core.StorageOSVolumeSource)(nil), CSI:(*core.CSIVolumeSource)(nil)}}}, InitContainers:[]core.Container{core.Container{Name:"init-certs", Image:"cockroachdb/cockroach-k8s-request-cert:0.4", Command:[]string{"/bin/ash", "-ecx", "/request-cert -namespace=${POD_NAMESPACE} -certs-dir=/cockroach-certs/ -symlink-ca-from=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt -type=client -user=root"}, Args:[]string(nil), WorkingDir:"", Ports:[]core.ContainerPort(nil), EnvFrom:[]core.EnvFromSource(nil), Env:[]core.EnvVar{core.EnvVar{Name:"POD_NAMESPACE", Value:"", ValueFrom:(*core.EnvVarSource)(0xc00b786e00)}}, Resources:core.ResourceRequirements{Limits:core.ResourceList(nil), Requests:core.ResourceList(nil)}, VolumeMounts:[]core.VolumeMount{core.VolumeMount{Name:"client-certs", ReadOnly:false, MountPath:"/cockroach-certs/", SubPath:"", MountPropagation:(*core.MountPropagationMode)(nil), SubPathExpr:""}}, VolumeDevices:[]core.VolumeDevice(nil), LivenessProbe:(*core.Probe)(nil), ReadinessProbe:(*core.Probe)(nil), StartupProbe:(*core.Probe)(nil), Lifecycle:(*core.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"IfNotPresent", SecurityContext:(*core.SecurityContext)(nil), Stdin:false, StdinOnce:false, TTY:false}}, 
Containers:[]core.Container{core.Container{Name:"cluster-init", Image:"cockroachdb/cockroach:v20.2.0", Command:[]string{"/bin/bash", "-c", "while true; do initOUT=$(set -x; /cockroach/cockroach init --certs-dir=/cockroach-certs/ --cluster-name=noteable --host=cockroachdb-0.cockroachdb:26257 2>&1); initRC=\"$?\"; echo $initOUT; [[ \"$initRC\" == \"0\" ]] && exit 0; [[ \"$initOUT\" == *\"cluster has already been initialized\"* ]] && exit 0; sleep 5; done"}, Args:[]string(nil), WorkingDir:"", Ports:[]core.ContainerPort(nil), EnvFrom:[]core.EnvFromSource(nil), Env:[]core.EnvVar(nil), Resources:core.ResourceRequirements{Limits:core.ResourceList(nil), Requests:core.ResourceList(nil)}, VolumeMounts:[]core.VolumeMount{core.VolumeMount{Name:"client-certs", ReadOnly:false, MountPath:"/cockroach-certs/", SubPath:"", MountPropagation:(*core.MountPropagationMode)(nil), SubPathExpr:""}}, VolumeDevices:[]core.VolumeDevice(nil), LivenessProbe:(*core.Probe)(nil), ReadinessProbe:(*core.Probe)(nil), StartupProbe:(*core.Probe)(nil), Lifecycle:(*core.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"Always", SecurityContext:(*core.SecurityContext)(nil), Stdin:false, StdinOnce:false, TTY:false}}, EphemeralContainers:[]core.EphemeralContainer(nil), RestartPolicy:"OnFailure", TerminationGracePeriodSeconds:(*int64)(0xc00ea0a920), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"cockroachdb", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", SecurityContext:(*core.PodSecurityContext)(0xc008a09b80), ImagePullSecrets:[]core.LocalObjectReference(nil), Hostname:"", Subdomain:"", Affinity:(*core.Affinity)(nil), SchedulerName:"default-scheduler", Tolerations:[]core.Toleration(nil), HostAliases:[]core.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(nil), PreemptionPolicy:(*core.PreemptionPolicy)(nil), DNSConfig:(*core.PodDNSConfig)(nil), ReadinessGates:[]core.PodReadinessGate(nil), RuntimeClassName:(*string)(nil), Overhead:core.ResourceList(nil), EnableServiceLinks:(*bool)(nil), TopologySpreadConstraints:[]core.TopologySpreadConstraint(nil)}}: field is immutable
helm.go:81: [debug] cannot patch "cockroachdb-init" with kind Job: Job.batch "cockroachdb-init" is invalid: spec.template: Invalid value: core.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"app.kubernetes.io/component":"init", "app.kubernetes.io/instance":"cockroachdb", "app.kubernetes.io/name":"cockroachdb", "controller-uid":"d8143205-92e2-44e2-b47a-66b7da13f811", "job-name":"cockroachdb-init"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:core.PodSpec{Volumes:[]core.Volume{core.Volume{Name:"client-certs", VolumeSource:core.VolumeSource{HostPath:(*core.HostPathVolumeSource)(nil), EmptyDir:(*core.EmptyDirVolumeSource)(0xc00b786d20), GCEPersistentDisk:(*core.GCEPersistentDiskVolumeSource)(nil), AWSElasticBlockStore:(*core.AWSElasticBlockStoreVolumeSource)(nil), GitRepo:(*core.GitRepoVolumeSource)(nil), Secret:(*core.SecretVolumeSource)(nil), NFS:(*core.NFSVolumeSource)(nil), ISCSI:(*core.ISCSIVolumeSource)(nil), Glusterfs:(*core.GlusterfsVolumeSource)(nil), PersistentVolumeClaim:(*core.PersistentVolumeClaimVolumeSource)(nil), RBD:(*core.RBDVolumeSource)(nil), Quobyte:(*core.QuobyteVolumeSource)(nil), FlexVolume:(*core.FlexVolumeSource)(nil), Cinder:(*core.CinderVolumeSource)(nil), CephFS:(*core.CephFSVolumeSource)(nil), Flocker:(*core.FlockerVolumeSource)(nil), DownwardAPI:(*core.DownwardAPIVolumeSource)(nil), FC:(*core.FCVolumeSource)(nil), AzureFile:(*core.AzureFileVolumeSource)(nil), ConfigMap:(*core.ConfigMapVolumeSource)(nil), VsphereVolume:(*core.VsphereVirtualDiskVolumeSource)(nil), AzureDisk:(*core.AzureDiskVolumeSource)(nil), PhotonPersistentDisk:(*core.PhotonPersistentDiskVolumeSource)(nil), Projected:(*core.ProjectedVolumeSource)(nil), PortworxVolume:(*core.PortworxVolumeSource)(nil), ScaleIO:(*core.ScaleIOVolumeSource)(nil), StorageOS:(*core.StorageOSVolumeSource)(nil), CSI:(*core.CSIVolumeSource)(nil)}}}, InitContainers:[]core.Container{core.Container{Name:"init-certs", Image:"cockroachdb/cockroach-k8s-request-cert:0.4", Command:[]string{"/bin/ash", "-ecx", "/request-cert -namespace=${POD_NAMESPACE} -certs-dir=/cockroach-certs/ -symlink-ca-from=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt -type=client -user=root"}, Args:[]string(nil), WorkingDir:"", Ports:[]core.ContainerPort(nil), EnvFrom:[]core.EnvFromSource(nil), Env:[]core.EnvVar{core.EnvVar{Name:"POD_NAMESPACE", Value:"", ValueFrom:(*core.EnvVarSource)(0xc00b786e00)}}, Resources:core.ResourceRequirements{Limits:core.ResourceList(nil), Requests:core.ResourceList(nil)}, VolumeMounts:[]core.VolumeMount{core.VolumeMount{Name:"client-certs", ReadOnly:false, MountPath:"/cockroach-certs/", SubPath:"", MountPropagation:(*core.MountPropagationMode)(nil), SubPathExpr:""}}, VolumeDevices:[]core.VolumeDevice(nil), LivenessProbe:(*core.Probe)(nil), ReadinessProbe:(*core.Probe)(nil), StartupProbe:(*core.Probe)(nil), Lifecycle:(*core.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"IfNotPresent", SecurityContext:(*core.SecurityContext)(nil), Stdin:false, StdinOnce:false, TTY:false}}, Containers:[]core.Container{core.Container{Name:"cluster-init", 
Image:"cockroachdb/cockroach:v20.2.0", Command:[]string{"/bin/bash", "-c", "while true; do initOUT=$(set -x; /cockroach/cockroach init --certs-dir=/cockroach-certs/ --cluster-name=noteable --host=cockroachdb-0.cockroachdb:26257 2>&1); initRC=\"$?\"; echo $initOUT; [[ \"$initRC\" == \"0\" ]] && exit 0; [[ \"$initOUT\" == *\"cluster has already been initialized\"* ]] && exit 0; sleep 5; done"}, Args:[]string(nil), WorkingDir:"", Ports:[]core.ContainerPort(nil), EnvFrom:[]core.EnvFromSource(nil), Env:[]core.EnvVar(nil), Resources:core.ResourceRequirements{Limits:core.ResourceList(nil), Requests:core.ResourceList(nil)}, VolumeMounts:[]core.VolumeMount{core.VolumeMount{Name:"client-certs", ReadOnly:false, MountPath:"/cockroach-certs/", SubPath:"", MountPropagation:(*core.MountPropagationMode)(nil), SubPathExpr:""}}, VolumeDevices:[]core.VolumeDevice(nil), LivenessProbe:(*core.Probe)(nil), ReadinessProbe:(*core.Probe)(nil), StartupProbe:(*core.Probe)(nil), Lifecycle:(*core.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"Always", SecurityContext:(*core.SecurityContext)(nil), Stdin:false, StdinOnce:false, TTY:false}}, EphemeralContainers:[]core.EphemeralContainer(nil), RestartPolicy:"OnFailure", TerminationGracePeriodSeconds:(*int64)(0xc00ea0a920), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"cockroachdb", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", SecurityContext:(*core.PodSecurityContext)(0xc008a09b80), ImagePullSecrets:[]core.LocalObjectReference(nil), Hostname:"", Subdomain:"", Affinity:(*core.Affinity)(nil), SchedulerName:"default-scheduler", Tolerations:[]core.Toleration(nil), HostAliases:[]core.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(nil), PreemptionPolicy:(*core.PreemptionPolicy)(nil), DNSConfig:(*core.PodDNSConfig)(nil), ReadinessGates:[]core.PodReadinessGate(nil), RuntimeClassName:(*string)(nil), Overhead:core.ResourceList(nil), EnableServiceLinks:(*bool)(nil), TopologySpreadConstraints:[]core.TopologySpreadConstraint(nil)}}: field is immutable

conf.single-node: true with tls.enabled: true fails due to host not matching certificate

When using values

conf:
  single-node: true

tls:
  enabled: true

The node starts up and performs the certificate request; after the CSR is approved, it continues starting. However, it then fails with:

Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for localhost, cockroachdb-0.cockroachdb.default.svc.cluster.local, cockroachdb-0.cockroachdb, cockroachdb-public, cockroachdb-public.default.svc.cluster.local, not cockroachdb-0

It seems that the following flag should be passed on startup, but isn't for conf.single-node: true:

--advertise-host=$(hostname).${STATEFULSET_FQDN}

See https://github.com/cockroachdb/helm-charts/blob/master/cockroachdb/templates/statefulset.yaml#L195

Running cockroachdb chart 5.0.6.

certGenerator: Helm chart changes for Certificate management

As part of the Certificate Management feature, we need to give helm chart users the ability to opt in to the new certificate management process. This ticket covers the Helm chart side changes required to achieve it.

Changes required:

  • Add configurations to accept CA, Node, and Client Certificates
  • Add validations against the helm inputs
  • Add pre-install helm hook to trigger cert creation at install time
  • Add cron jobs for cert rotation
  • Add ServiceAccount, Role, and Rolebinding as required
  • Add init-container related changes for database
  • Add post-install changes

Documentation for cert rotation feature in the helm-chart

@taroface we're starting to approach feature-complete for the cert rotation feature in the chart. We need to document how to use it and how it works. There are some dev docs which we'll be improving in the next 2 weeks. I wanted to get this on your radar now so we can have some of your time for this new feature.

Prometheus can't scrape the endpoint because of the ever changing pod IP

When using secure installations with your own certificates, Prometheus will fail the scraping because the pod IP was not in the original CSR.

The error on Prometheus looks like this:

Get https://10.6.26.160:8080/_status/vars: x509: certificate is valid for 127.0.0.1, not 10.6.26.160

I propose #36 to fix the problem.

Job is not run on first install when helm wait

Hi cockroachDB.

I am trying to install a new cluster.
I'm using --wait as a helm option.

My pods are all stuck waiting for the init job.

I think it comes from the job being a post-install helm hook.
As the pods are not ready (waiting for the job), the install never completes and thus the job never runs.

helm.sh/hook: post-install,post-upgrade

According to Helm documentation
https://helm.sh/docs/topics/charts_hooks/

The library loads the resulting resources into Kubernetes. Note that if the --wait flag is set, the library will wait until all resources are in a ready state and will not run the post-install hook until they are ready.

Add flag to allow usage of cert-manager.io/v1 apiVersion

We are in the special situation that for a transition period we need to run K8s 1.15.11 with cert-manager 1.1 which requires the usage of cert-manager's legacy CRD set, which is using cert-manager.io/v1 apiVersion, but lacks the features to accept older apiVersions and automatically convert them (only supported from K8s 1.16 upwards).
For that matter, we'd need a flag added to the helm chart which allows using cert-manager.io/v1 Certificates already.

I will open a PR to propose that change as an optional switch to allow the usage of the newer API Version

Chart doesn't set requests/limits for "release-name"-init Job's own init container

Just as the title says. The root of the problem appears to be that the job.init.yaml template pulls in init.resources config from values for the containers config, but not for the initContainers.

containers stanza:

      containers:
        - name: cluster-init
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy | quote }}
[...]
        {{- with .Values.init.resources }}
          resources: {{- toYaml . | nindent 12 }}
        {{- end }}

vs. initContainers stanza with no mention of Values.init.resources:

      initContainers:
        # The init-certs container sends a CSR (certificate signing request) to
        # the Kubernetes cluster.
        # You can see pending requests using:
        #   kubectl get csr
        # CSRs can be approved using:
        #   kubectl certificate approve <csr-name>
        #
        # In addition to the Node certificate and key, the init-certs entrypoint
        # will symlink the cluster CA to the certs directory.
        - name: init-certs
          image: "{{ .Values.tls.init.image.repository }}:{{ .Values.tls.init.image.tag }}"
          imagePullPolicy: {{ .Values.tls.init.image.pullPolicy | quote }}
          command:
            - /bin/ash
            - -ecx
            - >-
              /request-cert
              -namespace=${POD_NAMESPACE}
              -certs-dir=/cockroach-certs/
              -symlink-ca-from=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              -type=client
              -user=root
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          volumeMounts:
            - name: client-certs
              mountPath: /cockroach-certs/

The stated goal of making these configurable values is to allow folks who are using Resource Quotas to avoid errors with the init job, but Resource Quotas apply to all containers in a pod, including init containers.
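
A sketch of what the fix could look like (this mirrors the with-block from the containers stanza above; it is a proposal, not the current template) is to append the same snippet to the init-certs container definition:

        {{- with .Values.init.resources }}
          resources: {{- toYaml . | nindent 12 }}
        {{- end }}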

In order for this to work, we have to apply the logic to the initContainers as well, as sketched above. If we don't, users see this and are left scratching their heads as to why the chart isn't doing what the docs say it should:

$ kubectl describe job.batch/crdb-cockroachdb-init
Name:           crdb-cockroachdb-init
Namespace:      auth-login-dev
Selector:       controller-uid=5bc8024e-8f65-4a33-a04b-f4a36ca49b34
Labels:         app.kubernetes.io/component=init
                app.kubernetes.io/instance=crdb
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=cockroachdb
                helm.sh/chart=cockroachdb-4.1.11
Annotations:    meta.helm.sh/release-name: crdb
                meta.helm.sh/release-namespace: auth-login-dev
Parallelism:    1
Completions:    1
Pods Statuses:  0 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/component=init
                    app.kubernetes.io/instance=crdb
                    app.kubernetes.io/name=cockroachdb
                    controller-uid=5bc8024e-8f65-4a33-a04b-f4a36ca49b34
                    job-name=crdb-cockroachdb-init
  Service Account:  crdb-cockroachdb
  Init Containers:
   init-certs:
    Image:      cockroachdb/cockroach-k8s-request-cert:0.4
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/ash
      -ecx
      /request-cert -namespace=${POD_NAMESPACE} -certs-dir=/cockroach-certs/ -symlink-ca-from=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt -type=client -user=root
    Environment:
      POD_NAMESPACE:   (v1:metadata.namespace)
    Mounts:
      /cockroach-certs/ from client-certs (rw)
  Containers:
   cluster-init:
    Image:      cockroachdb/cockroach:v20.1.8
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -c
      while true; do initOUT=$(set -x; /cockroach/cockroach init --certs-dir=/cockroach-certs/ --host=crdb-cockroachdb-0.keycloak-crdb-cockroachdb:26257 2>&1); initRC="$?"; echo $initOUT; [[ "$initRC" == "0" ]] && exit 0; [[ "$initOUT" == *"cluster has already been initialized"* ]] && exit 0; sleep 5; done
    Limits:
      cpu:     10m
      memory:  128Mi
    Requests:
      cpu:        10m
      memory:     128Mi
    Environment:  <none>
    Mounts:
      /cockroach-certs/ from client-certs (rw)
  Volumes:
   client-certs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Events:
  Type     Reason        Age   From            Message
  ----     ------        ----  ----            -------
  Warning  FailedCreate  28s   job-controller  Error creating: pods "crdb-cockroachdb-init-vw46d" is forbidden: failed quota: compute-resources: must specify limits.memory,requests.cpu,requests.memory

Cluster init keeps failing with "no such host"

Hello,

I am trying to set up a basic 3-node cluster with minimal changes to the helm values.
However, all nodes keep failing with errors like these:

++ hostname
3/20/2021 1:10:18 PM + exec /cockroach/cockroach start --join=k-preprod-cockroachdb-0.k-preprod-cockroachdb.k-db.svc.cluster.local:26257,k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257,k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257 --advertise-host=k-preprod-cockroachdb-0.k-preprod-cockroachdb.k-db.svc.cluster.local --cluster-name=k-preprod --logtostderr=INFO --certs-dir=/cockroach/cockroach-certs/ --http-port=8080 --port=26257 --cache=25% --max-sql-memory=25% --locality=country=us,region=west,state=washington,city=seattle
3/20/2021 1:10:19 PM I210320 20:10:19.908769 1 util/log/flags.go:116  stderr capture started
3/20/2021 1:10:19 PM I210320 20:10:19.921024 1 cli/start.go:1168 ⋮ ‹CockroachDB CCL v20.2.6 (x86_64-unknown-linux-gnu, built 2021/03/15 16:04:08, go1.13.14)›
3/20/2021 1:10:19 PM I210320 20:10:19.987607 1 util/cgroups/cgroups.go:460 ⋮ running in a container; setting GOMAXPROCS to 1
3/20/2021 1:10:20 PM I210320 20:10:20.007727 1 server/config.go:428 ⋮ system total memory: ‹256 MiB›
3/20/2021 1:10:20 PM I210320 20:10:20.008056 1 server/config.go:430 ⋮ server configuration:
3/20/2021 1:10:20 PM ‹max offset             500000000›
3/20/2021 1:10:20 PM ‹cache size             64 MiB›
3/20/2021 1:10:20 PM ‹SQL memory pool size   64 MiB›
3/20/2021 1:10:20 PM ‹scan interval          10m0s›
3/20/2021 1:10:20 PM ‹scan min idle time     10ms›
3/20/2021 1:10:20 PM ‹scan max idle time     1s›
3/20/2021 1:10:20 PM ‹event log enabled      true›
3/20/2021 1:10:20 PM I210320 20:10:20.008395 1 cli/start.go:965 ⋮ using local environment variables: ‹COCKROACH_CHANNEL=kubernetes-helm›
3/20/2021 1:10:20 PM I210320 20:10:20.008515 1 cli/start.go:972 ⋮ process identity: ‹uid 0 euid 0 gid 0 egid 0›
3/20/2021 1:10:20 PM I210320 20:10:20.079634 1 cli/start.go:511 ⋮ GEOS loaded from directory ‹/usr/local/lib/cockroach›
3/20/2021 1:10:20 PM I210320 20:10:20.080034 1 cli/start.go:516 ⋮ starting cockroach node
3/20/2021 1:10:20 PM I210320 20:10:20.081760 37 rpc/tls.go:270 ⋮ [n?] server certificate addresses: ‹IP=127.0.0.1; DNS=localhost,k-preprod-cockroachdb-0.k-preprod-cockroachdb.k-db.svc.cluster.local,k-preprod-cockroachdb-0.k-preprod-cockroachdb,k-preprod-cockroachdb-public,k-preprod-cockroachdb-public.k-db.svc.cluster.local; CN=node›
3/20/2021 1:10:20 PM I210320 20:10:20.082084 37 rpc/tls.go:319 ⋮ [n?] web UI certificate addresses: ‹IP=127.0.0.1; DNS=localhost,k-preprod-cockroachdb-0.k-preprod-cockroachdb.k-db.svc.cluster.local,k-preprod-cockroachdb-0.k-preprod-cockroachdb,k-preprod-cockroachdb-public,k-preprod-cockroachdb-public.k-db.svc.cluster.local; CN=node›
3/20/2021 1:10:20 PM I210320 20:10:20.105411 37 vendor/github.com/cockroachdb/pebble/version_set.go:142 ⋮ [n?] [JOB 1] MANIFEST created 000001
3/20/2021 1:10:20 PM I210320 20:10:20.109789 37 vendor/github.com/cockroachdb/pebble/open.go:295 ⋮ [n?] [JOB 1] WAL created 000002
3/20/2021 1:10:20 PM I210320 20:10:20.179600 48 vendor/github.com/cockroachdb/pebble/table_stats.go:118 ⋮ [n?] [JOB 2] all initial table stats loaded
3/20/2021 1:10:20 PM I210320 20:10:20.384074 37 server/server.go:790 ⋮ [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
3/20/2021 1:10:20 PM I210320 20:10:20.402901 37 vendor/github.com/cockroachdb/pebble/compaction.go:1561 ⋮ [n?] [JOB 1] flushing: sstable created 000004
3/20/2021 1:10:20 PM I210320 20:10:20.411344 37 vendor/github.com/cockroachdb/pebble/open.go:295 ⋮ [n?] [JOB 1] WAL created 000005
3/20/2021 1:10:20 PM I210320 20:10:20.424900 37 vendor/github.com/cockroachdb/pebble/version_set.go:442 ⋮ [n?] [JOB 1] MANIFEST created 000006
3/20/2021 1:10:20 PM I210320 20:10:20.484591 37 vendor/github.com/cockroachdb/pebble/compaction.go:2300 ⋮ [n?] [JOB 1] WAL deleted 000002
3/20/2021 1:10:20 PM I210320 20:10:20.485025 37 vendor/github.com/cockroachdb/pebble/compaction.go:2307 ⋮ [n?] [JOB 1] MANIFEST deleted 000001
3/20/2021 1:10:20 PM I210320 20:10:20.485303 37 server/config.go:619 ⋮ [n?] 1 storage engine‹› initialized
3/20/2021 1:10:20 PM I210320 20:10:20.485482 37 server/config.go:622 ⋮ [n?] ‹Pebble cache size: 64 MiB›
3/20/2021 1:10:20 PM I210320 20:10:20.485592 37 server/config.go:622 ⋮ [n?] ‹store 0: RocksDB, max size 0 B, max open file limit 1043576›
3/20/2021 1:10:20 PM I210320 20:10:20.486129 85 vendor/github.com/cockroachdb/pebble/table_stats.go:118 ⋮ [n?] [JOB 2] all initial table stats loaded
3/20/2021 1:10:20 PM I210320 20:10:20.486348 86 vendor/github.com/cockroachdb/pebble/compaction.go:1371 ⋮ [n?] [JOB 3] compacting L0 [000004] (1.0 K) + L6 [] (0 B)
3/20/2021 1:10:20 PM I210320 20:10:20.491032 86 vendor/github.com/cockroachdb/pebble/compaction.go:1410 ⋮ [n?] [JOB 3] compacted L0 [000004] (1.0 K) + L6 [] (0 B) -> L6 [000004] (1.0 K), in 0.0s, output rate 120 M/s
3/20/2021 1:10:20 PM I210320 20:10:20.492244 37 util/log/log.go:50 ⋮ initial startup completed
3/20/2021 1:10:20 PM Node will now attempt to join a running cluster, or wait for `cockroach init`.
3/20/2021 1:10:20 PM Client connections will be accepted after this completes successfully.
3/20/2021 1:10:20 PM Check the log file(s) for progress.
3/20/2021 1:10:20 PM I210320 20:10:20.492517 37 server/init.go:208 ⋮ [n?] no stores bootstrapped
3/20/2021 1:10:20 PM I210320 20:10:20.492657 37 server/init.go:209 ⋮ [n?] awaiting `cockroach init` or join with an already initialized node
3/20/2021 1:10:20 PM W210320 20:10:20.591246 98 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 ⋮ ‹grpc: addrConn.createTransport failed to connect to {k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257  <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host". Reconnecting...›
3/20/2021 1:10:20 PM W210320 20:10:20.591708 96 server/init.go:436 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:20 PM W210320 20:10:20.599864 109 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 ⋮ ‹grpc: addrConn.createTransport failed to connect to {k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257  <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host". Reconnecting...›
3/20/2021 1:10:20 PM W210320 20:10:20.600290 96 server/init.go:436 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:21 PM W210320 20:10:21.611737 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:22 PM W210320 20:10:22.644896 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:23 PM W210320 20:10:23.652398 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:24 PM W210320 20:10:24.686188 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:25 PM W210320 20:10:25.609230 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:26 PM W210320 20:10:26.607472 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:27 PM W210320 20:10:27.612530 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:28 PM W210320 20:10:28.609428 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:29 PM W210320 20:10:29.608309 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:30 PM W210320 20:10:30.610507 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:31 PM W210320 20:10:31.612021 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:32 PM W210320 20:10:32.609341 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:33 PM W210320 20:10:33.608949 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:34 PM W210320 20:10:34.608625 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:35 PM W210320 20:10:35.607813 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:36 PM W210320 20:10:36.608642 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:37 PM W210320 20:10:37.614025 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:38 PM W210320 20:10:38.620759 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:39 PM W210320 20:10:39.772168 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:40 PM W210320 20:10:40.634994 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:42 PM W210320 20:10:42.471977 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:42 PM W210320 20:10:42.872882 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:43 PM W210320 20:10:43.611940 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:44 PM W210320 20:10:44.607079 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:45 PM W210320 20:10:45.702508 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:46 PM W210320 20:10:46.609487 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:47 PM W210320 20:10:47.608628 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:48 PM W210320 20:10:48.607700 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:49 PM W210320 20:10:49.612551 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:50 PM W210320 20:10:50.492364 248 cli/start.go:497 ⋮ The server appears to be unable to contact the other nodes in the cluster. Please try:
3/20/2021 1:10:50 PM 
3/20/2021 1:10:50 PM - starting the other nodes, if you haven't already;
3/20/2021 1:10:50 PM - double-checking that the '--join' and '--listen'/'--advertise' flags are set up correctly;
3/20/2021 1:10:50 PM - running the 'cockroach init' command if you are trying to initialize a new cluster.
3/20/2021 1:10:50 PM 
3/20/2021 1:10:50 PM If problems persist, please see ‹https://www.cockroachlabs.com/docs/v20.2/cluster-setup-troubleshooting.html›.
3/20/2021 1:10:50 PM W210320 20:10:50.636496 250 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 ⋮ ‹grpc: addrConn.createTransport failed to connect to {k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257  <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host". Reconnecting...›
3/20/2021 1:10:50 PM W210320 20:10:50.636674 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-2.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›
3/20/2021 1:10:51 PM W210320 20:10:51.608271 254 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 ⋮ ‹grpc: addrConn.createTransport failed to connect to {k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257  <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host". Reconnecting...›
3/20/2021 1:10:51 PM W210320 20:10:51.608411 96 server/init.go:474 ⋮ [n?] outgoing join rpc to ‹k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup k-preprod-cockroachdb-1.k-preprod-cockroachdb.k-db.svc.cluster.local: no such host"›

Here is my helm config:

image:
  repository: cockroachdb/cockroach
  tag: v20.2.6
  pullPolicy: IfNotPresent
  credentials:
    {}
    # registry: docker.io
    # username: john_doe
    # password: changeme

# Additional labels to apply to all Kubernetes resources created by this chart.
labels:
  {}
  # app.kubernetes.io/part-of: my-app

# Cluster's default DNS domain.
# You should overwrite it if you're using a different one,
# otherwise CockroachDB nodes discovery won't work.
clusterDomain: cluster.local

conf:
  # An ordered list of CockroachDB node attributes.
  # Attributes are arbitrary strings specifying machine capabilities.
  # Machine capabilities might include specialized hardware or number of cores
  # (e.g. "gpu", "x16c").
  attrs:
    []
    # - x16c
    # - gpu

  # Total size in bytes for caches, shared evenly if there are multiple
  # storage devices. Size suffixes are supported (e.g. `1GB` and `1GiB`).
  # A percentage of physical memory can also be specified (e.g. `.25`).
  cache: 25%

  # Sets a name to verify the identity of a cluster.
  # The value must match between all nodes specified via `conf.join`.
  # This can be used as an additional verification when either the node or
  # cluster, or both, have not yet been initialized and do not yet know their
  # cluster ID.
  # To introduce a cluster name into an already-initialized cluster, pair this
  # option with `conf.disable-cluster-name-verification: yes`.
  cluster-name: "k-preprod"

  # Tell the server to ignore `conf.cluster-name` mismatches.
  # This is meant for use when opting an existing cluster into starting to use
  # cluster name verification, or when changing the cluster name.
  # The cluster should be restarted once with `conf.cluster-name` and
  # `conf.disable-cluster-name-verification: yes` combined, and once all nodes
  # have been updated to know the new cluster name, the cluster can be restarted
  # again with `conf.disable-cluster-name-verification: no`.
  # This option has no effect if `conf.cluster-name` is not specified.
  disable-cluster-name-verification: false

  # The addresses for connecting a CockroachDB nodes to an existing cluster.
  # If you are deploying a second CockroachDB instance that should join a first
  # one, use the below list to join to the existing instance.
  # Each item in the array should be a FQDN (and port if needed) resolvable by
  # new Pods.
  join: []

  # Logs at or above this threshold to STDERR.
  logtostderr: INFO

  # Maximum storage capacity available to store temporary disk-based data for
  # SQL queries that exceed the memory budget (e.g. join, sorts, etc are
  # sometimes able to spill intermediate results to disk).
  # Accepts numbers interpreted as bytes, size suffixes (e.g. `32GB` and
  # `32GiB`) or a percentage of disk size (e.g. `10%`).
  # The location of the temporary files is within the first store dir.
  # If expressed as a percentage, `max-disk-temp-storage` is interpreted
  # relative to the size of the storage device on which the first store is
  # placed. The temp space usage is never counted towards any store usage
  # (although it does share the device with the first store) so, when
  # configuring this, make sure that the size of this temp storage plus the size
  # of the first store don't exceed the capacity of the storage device.
  # If the first store is an in-memory one (i.e. `type=mem`), then this
  # temporary "disk" data is also kept in-memory.
  # A percentage value is interpreted as a percentage of the available internal
  # memory.
  # max-disk-temp-storage: 0GB

  # Maximum allowed clock offset for the cluster. If observed clock offsets
  # exceed this limit, servers will crash to minimize the likelihood of
  # reading inconsistent data. Increasing this value will increase the time
  # to recovery of failures as well as the frequency of uncertainty-based
  # read restarts.
  # Note, that this value must be the same on all nodes in the cluster.
  # In order to change it, all nodes in the cluster must be stopped
  # simultaneously and restarted with the new value.
  # max-offset: 500ms

  # Maximum memory capacity available to store temporary data for SQL clients,
  # including prepared queries and intermediate data rows during query
  # execution. Accepts numbers interpreted as bytes, size suffixes
  # (e.g. `1GB` and `1GiB`) or a percentage of physical memory (e.g. `.25`).
  max-sql-memory: 25%

  # An ordered, comma-separated list of key-value pairs that describe the
  # topography of the machine. Topography might include country, datacenter
  # or rack designations. Data is automatically replicated to maximize
  # diversities of each tier. The order of tiers is used to determine
  # the priority of the diversity, so the more inclusive localities like
  # country should come before less inclusive localities like datacenter.
  # The tiers and order must be the same on all nodes. Including more tiers
  # is better than including fewer. For example:
  #   locality: country=us,region=us-west,datacenter=us-west-1b,rack=12
  #   locality: country=ca,region=ca-east,datacenter=ca-east-2,rack=4
  #   locality: planet=earth,province=manitoba,colo=secondary,power=3
  locality: "country=us,region=west,state=washington,city=seattle"

  # Run CockroachDB instances in standalone mode with replication disabled
  # (replication factor = 1).
  # Enabling this option causes the following values to be ignored:
  # - `conf.cluster-name`
  # - `conf.disable-cluster-name-verification`
  # - `conf.join`
  #
  # WARNING: Enabling this option makes each deployed Pod a STANDALONE
  #          CockroachDB instance, so the StatefulSet does NOT FORM A CLUSTER.
  #          Don't use this option for production deployments unless you clearly
  #          understand what you're doing.
  #          Usually, this option is intended to be used in conjunction with
  #          `statefulset.replicas: 1` for temporary one-time deployments (like
  #          running E2E tests, for example).
  single-node: false

  # If non-empty, create a SQL audit log in the specified directory.
  sql-audit-dir: ""

  # CockroachDB's port for inter-node communication and client connections.
  port: 26257

  # CockroachDB's port for HTTP requests.
  http-port: 8080

statefulset:
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: Parallel
  budget:
    maxUnavailable: 1

  # List of additional command-line arguments you want to pass to the
  # `cockroach start` command.
  args:
    []
    # - --disable-cluster-name-verification

  # List of extra environment variables to pass into container
  env:
    []
    # - name: COCKROACH_ENGINE_MAX_SYNC_DURATION
    #   value: "24h"

  # List of Secrets names in the same Namespace as the CockroachDB cluster,
  # which shall be mounted into `/etc/cockroach/secrets/` for every cluster
  # member.
  secretMounts: []

  # Additional labels to apply to this StatefulSet and all its Pods.
  labels:
    app.kubernetes.io/component: cockroachdb

  # Additional annotations to apply to the Pods of this StatefulSet.
  annotations: {}

  # Affinity rules for scheduling Pods of this StatefulSet on Nodes.
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity
  nodeAffinity: {}
  # Inter-Pod Affinity rules for scheduling Pods of this StatefulSet.
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#inter-pod-affinity-and-anti-affinity
  podAffinity: {}
  # Anti-affinity rules for scheduling Pods of this StatefulSet.
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#inter-pod-affinity-and-anti-affinity
  # You may either toggle options below for default anti-affinity rules,
  # or specify the whole set of anti-affinity rules instead of them.
  podAntiAffinity:
    # The topologyKey to be used.
    # Can be used to spread across different nodes, AZs, regions etc.
    topologyKey: kubernetes.io/hostname
    # Type of anti-affinity rules: either `soft`, `hard` or empty value (which
    # disables anti-affinity rules).
    type: hard
    # Weight for `soft` anti-affinity rules.
    # Does not apply for other anti-affinity types.
    weight: 100

  # Node selection constraints for scheduling Pods of this StatefulSet.
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector: {}

  # PriorityClassName given to Pods of this StatefulSet
  # https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
  priorityClassName: "highest"

  # Taints to be tolerated by Pods of this StatefulSet.
  # https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  tolerations:
    - effect: NoSchedule
      key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot

  # https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
  topologySpreadConstraints:
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway

  # Uncomment the following resources definitions or pass them from
  # command line to control the CPU and memory resources allocated
  # by Pods of this StatefulSet.
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: 100m
      memory: 256Mi

service:
  ports:
    # You can set different external and internal gRPC ports and their names.
    grpc:
      external:
        port: 26257
        name: grpc
      # If the port number is different from `external.port`, then it will be
      # named `internal.name` in the Service.
      internal:
        port: 26257
        # If using Istio set it to `cockroach`.
        name: cockroach
    http:
      port: 8080
      name: http

  # This Service is meant to be used by clients of the database.
  # It exposes a ClusterIP that will automatically load balance connections
  # to the different database Pods.
  public:
    type: ClusterIP
    # Additional labels to apply to this Service.
    labels:
      app.kubernetes.io/component: cockroachdb
    # Additional annotations to apply to this Service.
    annotations: {}

  # This service only exists to create DNS entries for each pod in
  # the StatefulSet such that they can resolve each other's IP addresses.
  # It does not create a load-balanced ClusterIP and should not be used directly
  # by clients in most circumstances.
  discovery:
    # Additional labels to apply to this Service.
    labels:
      app.kubernetes.io/component: cockroachdb
    # Additional annotations to apply to this Service.
    annotations: {}

# CockroachDB's ingress for web ui.
ingress:
  enabled: false
  labels: {}
  annotations: {}
  #   kubernetes.io/ingress.class: nginx
  #   cert-manager.io/cluster-issuer: letsencrypt
  paths: [/]
  hosts: []
  # - cockroachlabs.com
  tls: []
  # - hosts: [cockroachlabs.com]
  #   secretName: cockroachlabs-tls

# CockroachDB's Prometheus operator ServiceMonitor support
serviceMonitor:
  enabled: false
  labels: {}
  annotations: {}
  interval: 10s
  # scrapeTimeout: 10s

# CockroachDB's data persistence.
# If neither `persistentVolume` nor `hostPath` is used, then data will be
# persisted in an ad-hoc `emptyDir`.
storage:
  # Absolute path on host to store CockroachDB's data.
  # If not specified, then `emptyDir` will be used instead.
  # If specified, but `persistentVolume.enabled` is `true`, then it has no effect.
  hostPath: ""

  # If `enabled` is `true` then a PersistentVolumeClaim will be created and
  # used to store CockroachDB's data, otherwise `hostPath` is used.
  persistentVolume:
    enabled: true

    size: 10Gi

    # If defined, then `storageClassName: <storageClass>`.
    # If set to "-", then `storageClassName: ""`, which disables dynamic
    # provisioning.
    # If undefined or empty (default), then no `storageClassName` spec is set,
    # so the default provisioner will be chosen (gp2 on AWS, standard on
    # GKE, AWS & OpenStack).
    storageClass: "default"

    # Additional labels to apply to the created PersistentVolumeClaims.
    labels: {}
    # Additional annotations to apply to the created PersistentVolumeClaims.
    annotations: {}

# Kubernetes Job which initializes multi-node CockroachDB cluster.
# It's not created if `statefulset.replicas` is `1`.
init:
  # Additional labels to apply to this Job and its Pod.
  labels:
    app.kubernetes.io/component: init

  # Additional annotations to apply to the Pod of this Job.
  annotations: {}

  # Affinity rules for scheduling the Pod of this Job.
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity
  affinity: {}

  # Node selection constraints for scheduling the Pod of this Job.
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector:
    "k.com/burstable": "true"

  # Taints to be tolerated by the Pod of this Job.
  # https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  tolerations:
    - effect: NoSchedule
      key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
    - effect: NoSchedule
      key: k.com/burstable
      operator: Equal
      value: "true"

  # The init Pod runs at cluster creation to initialize CockroachDB. It finishes
  # quickly and doesn't continue to consume resources in the Kubernetes
  # cluster. Normally, you should leave this section commented out, but if your
  # Kubernetes cluster uses Resource Quotas and requires all pods to specify
  # resource requests or limits, you can set those here.
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "100m"
      memory: "128Mi"

# Whether to run securely using TLS certificates.
tls:
  enabled: true
  serviceAccount:
    # Specifies whether this ServiceAccount should be created.
    create: true
    # The name of this ServiceAccount to use.
    # If not set and `create` is `true`, then a name is auto-generated.
    name: ""
  certs:
    # Bring your own certs scenario. If provided, tls.init section will be ignored.
    provided: false
    # Secret name for the client root cert.
    clientRootSecret: cockroachdb-root
    # Secret name for node cert.
    nodeSecret: cockroachdb-node
    # Enable if the secret is a dedicated TLS secret.
    # TLS secrets are created by cert-manager, for example.
    tlsSecret: false

  init:
    # Image to use for requesting TLS certificates.
    image:
      repository: cockroachdb/cockroach-k8s-request-cert
      tag: "0.4"
      pullPolicy: IfNotPresent
      credentials:
        {}
        # registry: docker.io
        # username: john_doe
        # password: changeme

networkPolicy:
  enabled: false

  ingress:
    # List of sources which should be able to access the CockroachDB Pods via
    # gRPC port. Items in this list are combined using a logical OR operation.
    # Rules for allowing inter-communication are applied automatically.
    # If empty, then connections from any Pod are allowed.
    grpc:
      []
      # - podSelector:
      #     matchLabels:
      #       app.kubernetes.io/name: cockroachdb
      #       app.kubernetes.io/instance: k-preprod

    # List of sources which should be able to access the CockroachDB Pods via
    # HTTP port. Items in this list are combined using a logical OR operation.
    # If empty, then connections from any Pod are allowed.
    http:
      []
      # - podSelector:
      #     matchLabels:
      #       app.kubernetes.io/name: cockroachdb
      #       app.kubernetes.io/instance: k-preprod
      # - namespaceSelector:
      #     matchLabels:
      #       project: my-project

open /run/config/pki/apiserver-kubelet-client.crt: no such file or directory

Any idea what I'm doing wrong?

cockroachdb:
  tls:
    enabled: true
  statefulset:
    replicas: 1
    resources:
      limits:
        memory: "4Gi"
      requests:
        memory: "2Gi"
  conf:
    single-node: "yes"
    cache: "1024Mi"
    max-sql-memory: "1024Mi"

Logs:

Namespace: iron-backend-dev
Pod: temporal-cockroachdb-0
Container: db

Failed to load logs: Get "https://10.99.0.4:10250/containerLogs/iron-backend-dev/temporal-cockroachdb-0/db?tailLines=502&timestamps=true": open /run/config/pki/apiserver-kubelet-client.crt: no such file or directory

Add ingress support

Add ingress helm chart support for cockroachdb web ui

ingress:
  enabled: false
  annotations:
  hosts:
  - host:
    paths: []
  tls:
  - hosts: []
    secretName:

server,ui: infinite HTTP redirect when accessing UI for secure cluster behind nginx-ingress in k8s

Describe the problem
I am getting too many redirects when trying to access secure cockroachdb cluster behind ingress-nginx using helm chart.

Please describe the issue you observed, and any steps we can take to reproduce it:

This is happening because tls.enabled implies a secure cluster with TLS termination at the CockroachDB nodes.
I hit the same thing with Argo CD and solved it there using the --insecure flag (which only disables TLS termination).
So I either have to run an insecure cluster behind the ingress (losing the auth screen, users with passwords, etc.) or a secure cluster without an ingress inside k8s (and then I have to manage certificates and use ports other than 80 and 443, which is painful).

To Reproduce

  1. Setup basic k8s cluster
  2. Setup nginx-ingress, cert-manager
  3. Setup secure cockroachdb cluster
  4. Create k8s Ingress for cockroachdb service
  5. Try to access admin panel

Expected behavior
I can disable TLS termination for the web UI without losing the benefits of a secure cluster.

Additional data / screenshots

Environment:

  • CockroachDB version 20.1
  • Server OS: Debian 10

Jira issue: CRDB-4219

Init job as a helm hook

For now the init job is part of the chart: it is deployed once and then remains in the cluster indefinitely.

The main problem is upgrading the cluster: the Job is immutable, so it fails on a CockroachDB version upgrade. The workaround is to delete the init Job and run helm upgrade again.

I think the init job should be a Helm hook like this (IF THE INIT JOB IS IDEMPOTENT, and I think it is):

annotations:
  helm.sh/hook: post-install,post-upgrade
  # This deletes the Job after it succeeds (and before a new hook is created)
  helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation

Minio implementation of buckets provisioning reference: https://github.com/minio/charts/blob/master/minio/templates/post-install-create-bucket-job.yaml
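
For illustration, a minimal sketch (not the chart's actual template) of where these hook annotations would sit on the init Job object itself; the release name, image tag, and command below are placeholders:

apiVersion: batch/v1
kind: Job
metadata:
  name: my-release-cockroachdb-init        # placeholder name
  annotations:
    # Run the Job after install and after every upgrade.
    "helm.sh/hook": post-install,post-upgrade
    # Delete any previous Job before creating a new one, and clean up on success.
    "helm.sh/hook-delete-policy": hook-succeeded,before-hook-creation
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: cluster-init
          image: cockroachdb/cockroach:v20.2.6     # placeholder tag
          command: ["/cockroach/cockroach", "init", "--host=my-release-cockroachdb-0.my-release-cockroachdb"]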

ValidationError(Certificate.spec): unknown field "organization" in io.cert-manager.v1.Certificate.spec

When I install the Helm chart with the values:

      - tls:
          enabled: true
          certs:
            certManager: true
            certManagerIssuer:
              name: letsencrypt-prod
              kind: ClusterIssuer
              group: cert-manager.io
            useCertManagerV1CRDs: true

I get the ValidationError(Certificate.spec): unknown field "organization" in io.cert-manager.v1.Certificate.spec error, probably because the organization field of the Certificate CRD is now nested under the subject property (https://cert-manager.io/docs/usage/certificate/), but the Helm chart still puts it directly under spec:

Suggested fix:

Put it under subject (for cert-manager.io/v1 at least) and it should work.
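
For reference, a minimal sketch of a cert-manager.io/v1 Certificate with the organization moved under subject (in v1 the field is the subject.organizations list); the name, secret, and issuer below are only illustrative:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cockroachdb-node              # illustrative name
spec:
  secretName: cockroachdb-node
  subject:
    organizations:
      - Cockroach                     # was spec.organization in older API versions
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
    group: cert-manager.io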

Versions:

  • cert-manager v1.5.2
  • Cockroachdb Helm chart version: 6.0.8

Detailed error message:

Comparing release=cockroachdb, chart=cockroachdb/cockroachdb
in applications/cockroachdb/helmfile.yaml: command "/usr/sbin/helm" exited with non-zero status:

PATH:
  /usr/sbin/helm

ARGS:
  0: helm (4 bytes)
  1: diff (4 bytes)
  2: upgrade (7 bytes)
  3: --reset-values (14 bytes)
  4: --allow-unreleased (18 bytes)
  5: cockroachdb (11 bytes)
  6: cockroachdb/cockroachdb (23 bytes)
  7: --version (9 bytes)
  8: 6.0.8 (5 bytes)
  9: --namespace (11 bytes)
  10: kube-system (11 bytes)
  11: --values (8 bytes)
  12: /tmp/helmfile002135761/kube-system-cockroachdb-values-559d977b55 (64 bytes)
  13: --values (8 bytes)
  14: /tmp/helmfile103528956/kube-system-cockroachdb-values-58cccd9ddd (64 bytes)
  15: --values (8 bytes)
  16: /tmp/helmfile619607851/kube-system-cockroachdb-values-69975879d7 (64 bytes)
  17: --values (8 bytes)
  18: /tmp/helmfile188868750/kube-system-cockroachdb-values-d7bc9c658 (63 bytes)
  19: --values (8 bytes)
  20: /tmp/helmfile343799957/kube-system-cockroachdb-values-7b798f8444 (64 bytes)
  21: --detailed-exitcode (19 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

STDERR:
  Error: Failed to render chart: exit status 1: Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: ValidationError(Certificate.spec): unknown field "organization" in io.cert-manager.v1.Certificate.spec
  Error: plugin "diff" exited with error

COMBINED OUTPUT:
  Error: Failed to render chart: exit status 1: Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: ValidationError(Certificate.spec): unknown field "organization" in io.cert-manager.v1.Certificate.spec
  Error: plugin "diff" exited with error

CI Process moving forward

From prafull01

Current approach: build the cert-signing utility image, push it to the GCR repo, and use that image in our tests. To implement this approach, we had to use pull_request_target, which allows forked workflows to have access to the GCR tokens.

Advantages:
You only have to build the image once and can use it in as many tests as you want.

Disadvantages:
The forked (untrusted) workflow has access to the GCR token and is able to push images to our GCR repo.
Multiple pushes on a PR (bug fixes, review comments, etc.) each build a new image that gets pushed to GCR, which again wastes resources for us.

Every CI change has to be merged before it can actually run, because pull_request_target checks out the code from master and then applies the pull request's head ref on top of it.

Alternative approach: instead of building and pushing the image, we only build the image as a step in every e2e test job and do not push it to the GCR repo, so the workflow doesn't need access to GCR. The image built on the GitHub runner can then be used in the tests. We will add a post-merge job which triggers when a PR is merged, builds the Docker image, and pushes it to the GCR repo. So we will have only a single image per merged PR in our GCR repo.

Advantages:
The forked workflow doesn't need access to the GCR token through secrets.
Only one image per merged PR is pushed to the GCR repo.
CI changes can be tested while making them, since pull_request runs directly against the pull request; if the pull request contains a CI change, it incorporates those changes.

Disadvantages:
We might have to build the Docker image multiple times, here twice: once for the E2E install test and once for the E2E rotate test.

/cc @rail @keith-mcclellan lets discuss here

Cockroachdb fails to init with argocd

Hi folks,

I guess we hit an issue quite similar to #69 with Argo CD.

With Argo CD, Helm hooks are converted automatically to Argo CD hooks.
The init job includes annotations for the post-install and post-upgrade stages, which are converted to PostSync in Argo CD (source: https://argoproj.github.io/argo-cd/user-guide/helm/#helm-hooks).

In fact the Job will never be triggered, because the sync stage is never considered finished until all pods are healthy, which can't happen without init. Infinite loop :-)

One easy way to fix that is to allow customization of the Job's own annotations (Job.metadata.annotations, not the Pod template annotations, which are already supported via init.annotations). This would allow adding an annotation for Argo CD like argocd.argoproj.io/hook: Sync, which I guess could do the job.
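
For illustration, this is roughly what I have in mind; the Job name is a placeholder and the hook-delete-policy line is optional:

apiVersion: batch/v1
kind: Job
metadata:
  name: my-release-cockroachdb-init          # placeholder name
  annotations:
    # Run the init Job during the Argo CD sync phase instead of PostSync.
    argocd.argoproj.io/hook: Sync
    # Optionally clean up the Job once it has succeeded.
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
# spec stays the same as the chart's existing init Job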

Can log configuration be added to the chart?

Currently the chart doesn't allow setting the --log <string> flag, which is important to us: we've been waiting a long time for JSON logging and want to get rid of our ngrok parsers ASAP.

Thank you.

Init job breaks if you try to upgrade cockroach version

Jobs are immutable, but the init job uses the same values.yaml field (.Values.image.tag) as the actual DB pods. So when you update the version, helm (or rather, kube) errors out saying you're trying to change an immutable field.

Can't add items for NetworkPolicy

Reproduction:

# config.yaml
networkPolicy:
  enabled: true
  ingress:
    grpc: 
    - podSelector:
        matchLabels:
          foo: bar

helm lint --debug -n cockroachdb ./cockroachdb -f ./config.yaml
==> Linting ./cockroachdb
[ERROR] templates/: template: cockroachdb/templates/_helpers.tpl:14:14: executing "cockroachdb.fullname" at <.Values.fullnameOverride>: can't evaluate field Values in type []interface {}

Error: 1 chart(s) linted, 1 chart(s) failed
helm.go:75: [debug] 1 chart(s) linted, 1 chart(s) failed
main.newLintCmd.func1
        /private/tmp/helm-20200213-73045-zsskjg/src/helm.sh/helm/cmd/helm/lint.go:113
github.com/spf13/cobra.(*Command).execute
        /private/tmp/helm-20200213-73045-zsskjg/pkg/mod/github.com/spf13/[email protected]/command.go:826
github.com/spf13/cobra.(*Command).ExecuteC
        /private/tmp/helm-20200213-73045-zsskjg/pkg/mod/github.com/spf13/[email protected]/command.go:914
github.com/spf13/cobra.(*Command).Execute
        /private/tmp/helm-20200213-73045-zsskjg/pkg/mod/github.com/spf13/[email protected]/command.go:864
main.main
        /private/tmp/helm-20200213-73045-zsskjg/src/helm.sh/helm/cmd/helm/helm.go:74
runtime.main
        /usr/local/Cellar/go/1.13.8/libexec/src/runtime/proc.go:203
runtime.goexit
        /usr/local/Cellar/go/1.13.8/libexec/src/runtime/asm_amd64.s:135

No matches for kind "Ingress" in version "networking.k8s.io/v1" prior to Kubernetes 1.19

Unfortunately the check $.Capabilities.APIVersions.Has "networking.k8s.io/v1" succeeds on Kubernetes 1.17, but applying the generated manifests then fails with the error

no matches for kind "Ingress" in version "networking.k8s.io/v1"

Issue kubernetes/kubernetes#90077 shows that the Ingress object isn't actually available in that group even though the new apiVersion is.

Looking at how the Grafana chart does this (see https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/ingress.yaml#L7), it turns out the check needs to be $.Capabilities.APIVersions.Has "networking.k8s.io/v1/Ingress" to accomplish what I was trying to do.

I'll open yet another PR and try to figure out how to test this before it gets merged :S
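
Something along the lines of the following guard (a sketch modeled on the Grafana chart, not the exact wording of the eventual PR):

{{- if $.Capabilities.APIVersions.Has "networking.k8s.io/v1/Ingress" }}
apiVersion: networking.k8s.io/v1
{{- else if $.Capabilities.APIVersions.Has "networking.k8s.io/v1beta1/Ingress" }}
apiVersion: networking.k8s.io/v1beta1
{{- else }}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress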

Run init job for single node to create root user certificate

In order to run a single-node "cluster" in secure mode it's useful to have the init job run so it can issue the CSR for the root client certificate. I've verified that the job runs fine for a single node. It does log the following error but completes successfully:

ERROR: cluster has already been initialized Failed running "init"

certGenerator: make `tls.certs.generator` default certificate management authority

Currently, we use the k8s certificate-management methodology as the default for Helm chart installations. Change that to use the certGenerator instead.
Changes required:

  • For new installations, set tls.certs.generator.enabled to true and make the corresponding changes.
  • Upgrades of older installations should be non-breaking
    (older installations continue to use the old method unless they opt into the new one).

No securityContext support in Chart

Description:
Hello, I have noticed that the chart does not expose a pod or container securityContext, which makes it impossible to run on a PSP-enabled cluster. This is due to the user the container runs as:

Error: container has runAsNonRoot and image will run as root
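
For context, this is roughly the kind of values the chart would need to expose; the field names below are only a suggestion, not existing chart options:

statefulset:
  # Suggested pod-level security context (not currently supported by the chart).
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
  # Suggested container-level security context.
  containerSecurityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop: ["ALL"]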

copy-certs init container not created when using cert-manager

I would expect that, when using cert-manager, setting tls.certs.certManager is enough. However, the copy-certs init container is not created, since tls.certs.provided has to be true as well. Is this intended? If so, the documentation should explicitly state that.
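
For reference, based on the behaviour described above, the combination that currently produces the copy-certs init container looks roughly like this (the issuer name is illustrative):

tls:
  enabled: true
  certs:
    provided: true          # currently also required for the copy-certs init container
    certManager: true
    certManagerIssuer:
      group: cert-manager.io
      kind: Issuer
      name: cockroachdb-cert-issuer     # illustrative issuer name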

Cockroachdb installation with TLS enabled through certificate manager is failing

I have tried the cockroachdb installation with cert-manager, and it doesn't succeed: the pods go into CrashLoopBackOff.

Here is my certificate issuer:

prafull@EMPID18004:helm-charts$ kubectl get issuers.cert-manager.io 
NAME                      READY   AGE
cockroachdb-cert-issuer   True    17h

Installed the helm chart using the following values (shown as a diff against the chart defaults):

tls:
-  enabled: false
+  enabled: true
   serviceAccount:
     # Specifies whether this ServiceAccount should be created.
     create: true
@@ -395,14 +395,14 @@ tls:
     # TLS secrets are created by cert-mananger, for example.
     tlsSecret: false
     # Use cert-manager to issue certificates for mTLS.
-    certManager: false
+    certManager: true
     # Specify an Issuer or a ClusterIssuer to use, when issuing
     # node and client certificates. The values correspond to the
     # issuerRef specified in the certificate.
     certManagerIssuer:
       group: cert-manager.io
       kind: Issuer
-      name: cockroachdb
+      name: cockroachdb-cert-issuer

After installing cockroachdb:

prafull@EMPID18004:helm-charts$ kgp
NAME                          READY   STATUS             RESTARTS   AGE
crdb-cockroachdb-0            0/1     CrashLoopBackOff   2          57s
crdb-cockroachdb-1            0/1     CrashLoopBackOff   2          57s
crdb-cockroachdb-2            0/1     Error              2          57s
crdb-cockroachdb-init-srbft   1/1     Running            0          56s

The secrets are properly created by the cert manager:

prafull@EMPID18004:helm-charts$ kubectl get secrets 
NAME                           TYPE                                  DATA   AGE
ca-secret-auth                 kubernetes.io/tls                     2      17h
cockroachdb-node               kubernetes.io/tls                     3      105s
cockroachdb-root               kubernetes.io/tls                     3      105s
crdb-cockroachdb-token-8dqd5   kubernetes.io/service-account-token   3      107s
default-token-44bxk            kubernetes.io/service-account-token   3      24h
sh.helm.release.v1.crdb.v1     helm.sh/release.v1                    1      108s
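
For completeness, the issuer above is a CA issuer backed by a TLS secret; reconstructed from the output above, it looks roughly like this (treat it as illustrative, not the exact manifest used):

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: cockroachdb-cert-issuer
spec:
  ca:
    # kubernetes.io/tls secret holding the signing CA's certificate and key.
    secretName: ca-secret-auth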
