
k8s-scw-baremetal

Kubernetes Terraform installer for Scaleway bare-metal ARM and AMD64

Initial setup

Clone the repository and install the dependencies:

$ git clone https://github.com/stefanprodan/k8s-scw-baremetal.git
$ cd k8s-scw-baremetal
$ terraform init

Note that you'll need Terraform v0.10 or newer to run this project.

Before running the project, you'll have to create an access token so that Terraform can connect to the Scaleway API.

Now retrieve the <ORGANIZATION_ID> using your <ACCESS-TOKEN> from the /organizations API endpoint:

$ curl https://account.scaleway.com/organizations -H "X-Auth-Token: <ACCESS-TOKEN>"

Sample output (excerpt with organization ID):

"organizations": [{"id": "xxxxxxxxxxxxx", "name": "Organization Name"}],

Using the token and your organization ID, create two environment variables:

$ export SCALEWAY_ORGANIZATION="<ORGANIZATION_ID>"
$ export SCALEWAY_TOKEN="<ACCESS-TOKEN>"

To configure your cluster, you'll need to have jq installed on your computer.
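
For example (package names assumed for Homebrew and Debian/Ubuntu):

$ brew install jq          # macOS
$ sudo apt-get install jq  # Debian/Ubuntu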

Usage

Create an AMD64 bare-metal Kubernetes cluster with one master and a node:

$ terraform workspace new amd64

$ terraform apply \
 -var region=par1 \
 -var arch=x86_64 \
 -var server_type=C2S \
 -var nodes=1 \
 -var server_type_node=C2S \
 -var weave_passwd=ChangeMe \
 -var docker_version=18.06 \
 -var ubuntu_version="Ubuntu Bionic"

This will do the following:

  • reserves public IPs for each server
  • provisions the bare-metal servers with the Ubuntu release given in ubuntu_version (the master and node sizes may differ but must use the same architecture)
  • connects to the master server via SSH and installs Docker CE and kubeadm apt packages
  • runs kubeadm init on the master server and configures kubectl
  • downloads the kubectl admin config file to your local machine and replaces the private IP with the public one
  • creates a Kubernetes secret with the Weave Net password
  • installs Weave Net with encrypted overlay
  • installs cluster add-ons (Kubernetes dashboard, metrics server and Heapster)
  • starts the nodes in parallel and installs Docker CE and kubeadm
  • joins the nodes in the cluster using the kubeadm token obtained from the master
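
Once the apply completes, a quick sanity check could look like this (a sketch, using the kubectl_config output variable described under Remote control below):

$ kubectl --kubeconfig ./$(terraform output kubectl_config) get nodes -o wide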

Scale up by increasing the number of nodes:

$ terraform apply \
 -var nodes=3

Tear down the whole infrastructure with:

$ terraform destroy -force
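
If you created more than one workspace, select the one you want to destroy first, for example:

$ terraform workspace select amd64
$ terraform destroy -force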

Create an ARMv7 bare-metal Kubernetes cluster with one master and two nodes:

$ terraform workspace new arm

$ terraform apply \
 -var region=par1 \
 -var arch=arm \
 -var server_type=C1 \
 -var nodes=2 \
 -var server_type_node=C1 \
 -var weave_passwd=ChangeMe \
 -var docker_version=18.06 \
 -var ubuntu_version="Ubuntu Xenial"

Remote control

After applying the Terraform plan, you'll see several output variables such as the master public IP, the kubeadm join command and the current workspace admin config.

In order to run kubectl commands against the Scaleway cluster, you can use the kubectl_config output variable:

Check if Heapster works:

$ kubectl --kubeconfig ./$(terraform output kubectl_config) \
  top nodes

NAME           CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
arm-master-1   655m         16%       873Mi           45%
arm-node-1     147m         3%        618Mi           32%
arm-node-2     101m         2%        584Mi           30%

The kubectl config file is named <WORKSPACE>.conf, as in arm.conf or amd64.conf.
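
To avoid passing --kubeconfig on every command, you can also export it for the current shell (a sketch):

$ export KUBECONFIG="$PWD/$(terraform output kubectl_config)"
$ kubectl get nodes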

In order to access the dashboard, you can use port forwarding:

$ kubectl --kubeconfig ./$(terraform output kubectl_config) \
  -n kube-system port-forward deployment/kubernetes-dashboard 8888:9090

Now you can access the dashboard on your computer at http://localhost:8888.

(Dashboard screenshots: cluster Overview and Nodes.)

Expose services outside the cluster

Since we're running on bare metal and Scaleway doesn't offer a load balancer, the easiest way to expose applications outside of Kubernetes is using a NodePort service.
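
For reference, a NodePort service manifest looks roughly like this (a generic sketch, not the exact podinfo manifest applied below; the ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: podinfo-nodeport
spec:
  type: NodePort
  selector:
    app: podinfo
  ports:
  - port: 9898
    targetPort: 9898
    nodePort: 31190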

Let's deploy the podinfo app in the default namespace. Podinfo has a multi-arch Docker image and it will work on arm, arm64 or amd64.

Create the podinfo nodeport service:

$ kubectl --kubeconfig ./$(terraform output kubectl_config) \
  apply -f https://raw.githubusercontent.com/stefanprodan/k8s-podinfo/7a8506e60fca086572f16de57f87bf5430e2df48/deploy/podinfo-svc-nodeport.yaml
 
service "podinfo-nodeport" created

Create the podinfo deployment:

$ kubectl --kubeconfig ./$(terraform output kubectl_config) \
  apply -f https://raw.githubusercontent.com/stefanprodan/k8s-podinfo/7a8506e60fca086572f16de57f87bf5430e2df48/deploy/podinfo-dep.yaml

deployment "podinfo" created

Inspect the podinfo service to obtain the port number:

$ kubectl --kubeconfig ./$(terraform output kubectl_config) \
  get svc --selector=app=podinfo

NAME               TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
podinfo-nodeport   NodePort   10.104.132.14   <none>        9898:31190/TCP   3m

You can access podinfo at http://<MASTER_PUBLIC_IP>:31190 or using curl:

$ curl http://$(terraform output k8s_master_public_ip):31190

runtime:
  arch: arm
  max_procs: "4"
  num_cpu: "4"
  num_goroutine: "12"
  os: linux
  version: go1.9.2
labels:
  app: podinfo
  pod-template-hash: "1847780700"
annotations:
  kubernetes.io/config.seen: 2018-01-08T00:39:45.580597397Z
  kubernetes.io/config.source: api
environment:
  HOME: /root
  HOSTNAME: podinfo-5d8ccd4c44-zrczc
  KUBERNETES_PORT: tcp://10.96.0.1:443
  KUBERNETES_PORT_443_TCP: tcp://10.96.0.1:443
  KUBERNETES_PORT_443_TCP_ADDR: 10.96.0.1
  KUBERNETES_PORT_443_TCP_PORT: "443"
  KUBERNETES_PORT_443_TCP_PROTO: tcp
  KUBERNETES_SERVICE_HOST: 10.96.0.1
  KUBERNETES_SERVICE_PORT: "443"
  KUBERNETES_SERVICE_PORT_HTTPS: "443"
  PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
externalIP:
  IPv4: 163.172.139.112

Horizontal Pod Autoscaling

Starting with Kubernetes 1.9, kube-controller-manager is configured by default with horizontal-pod-autoscaler-use-rest-clients. In order to use HPA we need the metrics server, which enables the new metrics API used by HPA v2. Both Heapster and the metrics server are deployed by Terraform when the master node is provisioned.

The metrics server collects resource usage data from each node using the Kubelet Summary API. Check if the metrics server is running:

$ kubectl --kubeconfig ./$(terraform output kubectl_config) \
 get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "arm-master-1",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/arm-master-1",
        "creationTimestamp": "2018-01-08T15:17:09Z"
      },
      "timestamp": "2018-01-08T15:17:00Z",
      "window": "1m0s",
      "usage": {
        "cpu": "384m",
        "memory": "935792Ki"
      }
    },
    {
      "metadata": {
        "name": "arm-node-1",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/arm-node-1",
        "creationTimestamp": "2018-01-08T15:17:09Z"
      },
      "timestamp": "2018-01-08T15:17:00Z",
      "window": "1m0s",
      "usage": {
        "cpu": "130m",
        "memory": "649020Ki"
      }
    },
    {
      "metadata": {
        "name": "arm-node-2",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/arm-node-2",
        "creationTimestamp": "2018-01-08T15:17:09Z"
      },
      "timestamp": "2018-01-08T15:17:00Z",
      "window": "1m0s",
      "usage": {
        "cpu": "120m",
        "memory": "614180Ki"
      }
    }
  ]
}

Let's define an HPA that will maintain a minimum of two replicas and scale up to ten if the average CPU utilization goes over 80% or if memory usage goes over 200Mi.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 200Mi

Apply the podinfo HPA:

$ kubectl --kubeconfig ./$(terraform output kubectl_config) \
  apply -f https://raw.githubusercontent.com/stefanprodan/k8s-podinfo/7a8506e60fca086572f16de57f87bf5430e2df48/deploy/podinfo-hpa.yaml

horizontalpodautoscaler "podinfo" created

After a couple of seconds the HPA controller will contact the metrics server and will fetch the CPU and memory usage:

$ kubectl --kubeconfig ./$(terraform output kubectl_config) get hpa

NAME      REFERENCE            TARGETS                      MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   2826240 / 200Mi, 15% / 80%   2         10        2          5m

In order to increase the CPU usage we could run a load test with hey:

#install hey
go get -u github.com/rakyll/hey

#send 10K requests with 5 concurrent workers, rate limited to 10 QPS per worker
hey -n 10000 -q 10 -c 5 http://$(terraform output k8s_master_public_ip):31190

You can monitor the autoscaler events with:

$ watch -n 5 kubectl --kubeconfig ./$(terraform output kubectl_config) describe hpa

Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  7m    horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  3m    horizontal-pod-autoscaler  New size: 8; reason: cpu resource utilization (percentage of request) above target

After the load test finishes, the autoscaler will remove replicas until the deployment reaches the initial replica count:

Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  20m   horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  16m   horizontal-pod-autoscaler  New size: 8; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  12m   horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  6m    horizontal-pod-autoscaler  New size: 2; reason: All metrics below target


k8s-scw-baremetal's Issues

Error when applying terraform ssh: handshake failed: ssh: unable to authenticate

Hi there! Here is the issue.

Terraform Version

Terraform v0.10.8 (Darwin)

Terraform Command

terraform apply -var nodes=3

Terraform Error

data.scaleway_image.xenial: Refreshing state...
scaleway_ip.k8s_master_ip: Creating...
  ip:     "" => "<computed>"
  server: "" => "<computed>"
scaleway_ip.k8s_node_ip[0]: Creating...
  ip:     "" => "<computed>"
  server: "" => "<computed>"
scaleway_ip.k8s_node_ip[1]: Creating...
  ip:     "" => "<computed>"
  server: "" => "<computed>"
scaleway_ip.k8s_node_ip[2]: Creating...
  ip:     "" => "<computed>"
  server: "" => "<computed>"
scaleway_ip.k8s_node_ip[2]: Creation complete after 3s (ID: cdb907dd-679b-4c08-86f9-ce16c3bfd119)
scaleway_ip.k8s_node_ip[0]: Creation complete after 3s (ID: 598ab41f-04d5-40f2-9575-4737fc125db4)
scaleway_ip.k8s_node_ip[1]: Creation complete after 3s (ID: f9100d50-07f3-4893-bc8b-940272c87ee7)
scaleway_ip.k8s_master_ip: Creation complete after 4s (ID: 4c586a8d-b86f-4e12-ae6e-d2528681d8a0)
scaleway_server.k8s_master: Creating...
  enable_ipv6:  "" => "false"
  image:        "" => "3a1b0dd8-92e1-4ba2-aece-eea8e9d07e32"
  name:         "" => "arm-master-1"
  private_ip:   "" => "<computed>"
  public_ip:    "" => "163.172.137.9"
  public_ipv6:  "" => "<computed>"
  state:        "" => "<computed>"
  state_detail: "" => "<computed>"
  type:         "" => "C1"
scaleway_server.k8s_master: Still creating... (10s elapsed)
scaleway_server.k8s_master: Still creating... (20s elapsed)
scaleway_server.k8s_master: Still creating... (30s elapsed)
scaleway_server.k8s_master: Still creating... (40s elapsed)
scaleway_server.k8s_master: Still creating... (50s elapsed)
scaleway_server.k8s_master: Still creating... (1m0s elapsed)
scaleway_server.k8s_master: Still creating... (1m10s elapsed)
scaleway_server.k8s_master: Still creating... (1m20s elapsed)
scaleway_server.k8s_master: Still creating... (1m30s elapsed)
scaleway_server.k8s_master: Still creating... (1m40s elapsed)
scaleway_server.k8s_master: Provisioning with 'file'...
scaleway_server.k8s_master: Still creating... (1m50s elapsed)
scaleway_server.k8s_master: Still creating... (2m0s elapsed)
scaleway_server.k8s_master: Still creating... (2m10s elapsed)
scaleway_server.k8s_master: Still creating... (2m20s elapsed)
scaleway_server.k8s_master: Still creating... (2m30s elapsed)
scaleway_server.k8s_master: Still creating... (2m40s elapsed)
scaleway_server.k8s_master: Still creating... (2m50s elapsed)
scaleway_server.k8s_master: Still creating... (3m0s elapsed)
scaleway_server.k8s_master: Still creating... (3m10s elapsed)
scaleway_server.k8s_master: Still creating... (3m20s elapsed)
scaleway_server.k8s_master: Still creating... (3m30s elapsed)
scaleway_server.k8s_master: Still creating... (3m40s elapsed)
scaleway_server.k8s_master: Still creating... (3m50s elapsed)
scaleway_server.k8s_master: Still creating... (4m0s elapsed)
scaleway_server.k8s_master: Still creating... (4m10s elapsed)
scaleway_server.k8s_master: Still creating... (4m20s elapsed)
scaleway_server.k8s_master: Still creating... (4m30s elapsed)
scaleway_server.k8s_master: Still creating... (4m40s elapsed)
scaleway_server.k8s_master: Still creating... (4m50s elapsed)
scaleway_server.k8s_master: Still creating... (5m0s elapsed)
scaleway_server.k8s_master: Still creating... (5m10s elapsed)
scaleway_server.k8s_master: Still creating... (5m20s elapsed)
scaleway_server.k8s_master: Still creating... (5m30s elapsed)
scaleway_server.k8s_master: Still creating... (5m40s elapsed)
scaleway_server.k8s_master: Still creating... (5m50s elapsed)
scaleway_server.k8s_master: Still creating... (6m0s elapsed)
scaleway_server.k8s_master: Still creating... (6m10s elapsed)
scaleway_server.k8s_master: Still creating... (6m20s elapsed)
scaleway_server.k8s_master: Still creating... (6m30s elapsed)
scaleway_server.k8s_master: Still creating... (6m40s elapsed)

Error: Error applying plan:

1 error(s) occurred:

* scaleway_server.k8s_master: 1 error(s) occurred:

* ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Comments

I have my SSH key registered on Scaleway (locally ~/.ssh/id_rsa). It's really strange.
I tried testing with different keys and configurations (by modifying the Terraform code) but nothing has solved the problem.

I did follow @stefanprodan post though
https://stefanprodan.com/2018/kubernetes-scaleway-baremetal-arm-terraform-installer/

Am I the only one who has this problem?

Thanks!

Seeking feedback on assignment of roles to worker nodes

@stefanprodan Seeking some feedback as to whether this would make a useful PR.

I'm not sure if this will be useful to k8s-scw-baremetal.

Some of the changes I am starting to make to my own copy of k8s-scw-baremetal are quite opinionated but I thought that this change might be useful - although I'm not sure if the node-role.kubernetes.io/worker role is a kubernetes convention, maybe node-role.kubernetes.io/node is better.

stephenmoloney@05a51a1

any thoughts?

Does not handle SSH keys properly

A couple of the scripts rely on having the default SSH key id_rsa. The scripts that use scp or ssh should explicitly set the -i flag to specify the key to use.
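
For example, something along these lines (the key path is hypothetical):

$ ssh -i ~/.ssh/scaleway_rsa root@$(terraform output k8s_master_public_ip)
$ scp -i ~/.ssh/scaleway_rsa root@$(terraform output k8s_master_public_ip):/etc/kubernetes/admin.conf .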

Pull request coming.

Terraform remote-exec fails on K8s stable-1.11 and ARMv7

Raising a new issue, but is related to installation #19.

Switched to stable-1.11 of kubernetes, terraform remote-exec fails with the following:

scaleway_server.k8s_master (remote-exec): 		Unfortunately, an error has occurred:
scaleway_server.k8s_master (remote-exec): 			timed out waiting for the condition

scaleway_server.k8s_master (remote-exec): 		This error is likely caused by:
scaleway_server.k8s_master (remote-exec): 			- The kubelet is not running
scaleway_server.k8s_master (remote-exec): 			- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
scaleway_server.k8s_master (remote-exec): 			- No internet connection is available so the kubelet cannot pull or find the following control plane images:
scaleway_server.k8s_master (remote-exec): 				- k8s.gcr.io/kube-apiserver-arm:v1.11.1
scaleway_server.k8s_master (remote-exec): 				- k8s.gcr.io/kube-controller-manager-arm:v1.11.1
scaleway_server.k8s_master (remote-exec): 				- k8s.gcr.io/kube-scheduler-arm:v1.11.1
scaleway_server.k8s_master (remote-exec): 				- k8s.gcr.io/etcd-arm:3.2.18
scaleway_server.k8s_master (remote-exec): 				- You can check or miligate this in beforehand with "kubeadm config images pull" to make sure the images
scaleway_server.k8s_master (remote-exec): 				  are downloaded locally and cached.

scaleway_server.k8s_master (remote-exec): 		If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
scaleway_server.k8s_master (remote-exec): 			- 'systemctl status kubelet'
scaleway_server.k8s_master (remote-exec): 			- 'journalctl -xeu kubelet'

scaleway_server.k8s_master (remote-exec): 		Additionally, a control plane component may have crashed or exited when started by the container runtime.
scaleway_server.k8s_master (remote-exec): 		To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
scaleway_server.k8s_master (remote-exec): 		Here is one example how you may list all Kubernetes containers running in docker:
scaleway_server.k8s_master (remote-exec): 			- 'docker ps -a | grep kube | grep -v pause'
scaleway_server.k8s_master (remote-exec): 			Once you have found the failing container, you can inspect its logs with:
scaleway_server.k8s_master (remote-exec): 			- 'docker logs CONTAINERID'
scaleway_server.k8s_master (remote-exec): couldn't initialize a Kubernetes cluster
scaleway_server.k8s_master: Still creating... (15m1s elapsed)
scaleway_server.k8s_master (remote-exec): Unable to connect to the server: net/http: TLS handshake timeout
scaleway_server.k8s_master: Still creating... (15m11s elapsed)
scaleway_server.k8s_master (remote-exec): Unable to connect to the server: net/http: TLS handshake timeout
scaleway_server.k8s_master: Still creating... (15m21s elapsed)
scaleway_server.k8s_master: Still creating... (15m31s elapsed)
scaleway_server.k8s_master (remote-exec): error: unable to recognize "https://cloud.weave.works/k8s/net?password-secret=weave-passwd&k8s-version=Q2xpZW50IFZlcnNpb246IHZlcnNpb24uSW5mb3tNYWpvcjoiMSIsIE1pbm9yOiIxMSIsIEdpdFZlcnNpb246InYxLjExLjEiLCBHaXRDb21taXQ6ImIxYjI5OTc4MjcwZGMyMmZlY2M1OTJhYzU1ZDkwMzM1MDQ1NDMxMGEiLCBHaXRUcmVlU3RhdGU6ImNsZWFuIiwgQnVpbGREYXRlOiIyMDE4LTA3LTE3VDE4OjUzOjIwWiIsIEdvVmVyc2lvbjoiZ28xLjEwLjMiLCBDb21waWxlcjoiZ2MiLCBQbGF0Zm9ybToibGludXgvYXJtIn0K": Get https://10.1.32.123:6443/api?timeout=32s: net/http: TLS handshake timeout
scaleway_server.k8s_master: Still creating... (15m41s elapsed)
scaleway_server.k8s_master (remote-exec): error: unable to recognize "/tmp/dashboard-rbac.yaml": no matches for kind "ClusterRoleBinding" in version "rbac.authorization.k8s.io/v1beta1"

jq required on local machine

./scripts/kubeadm-token.sh requires the command-line JSON processor jq to be installed.
For macOS: brew install jq
Perhaps this should be added to the readme?

(thanks to a really nice terraform lib!)

Error applying plan. APIMessage: Authorization required

Hello,

When I run terraform apply I get this error. I've looked at the quotas, but they look good.

Regards

Error: Error applying plan:

5 error(s) occurred:

  • scaleway_ip.k8s_node_ip[0]: 1 error(s) occurred:

  • scaleway_ip.k8s_node_ip.0: StatusCode: 403, Type: authorization_required, APIMessage: Authorization required

  • scaleway_security_group.node_security_group: 1 error(s) occurred:

  • scaleway_security_group.node_security_group: StatusCode: 403, Type: authorization_required, APIMessage: Authorization required

  • scaleway_ip.k8s_master_ip: 1 error(s) occurred:

  • scaleway_ip.k8s_master_ip: StatusCode: 403, Type: authorization_required, APIMessage: Authorization required

  • scaleway_security_group.master_security_group: 1 error(s) occurred:

  • scaleway_security_group.master_security_group: StatusCode: 403, Type: authorization_required, APIMessage: Authorization required

  • scaleway_ip.k8s_node_ip[1]: 1 error(s) occurred:

  • scaleway_ip.k8s_node_ip.1: StatusCode: 403, Type: authorization_required, APIMessage: Authorization required

Possibility of choosing the size of the worker

Hi,

Currently the server type chosen for the master is also used for the worker nodes.
I would like us to change that.

I propose to create a new variable :

variable "server_type_node" {
  default     = "C1"
  description = "Use C1 for arm, ARM64-2GB for arm64 and C2S for x86_64"
}

In a classic command this gives :

terraform apply -var region=par1 -var arch=x86_64 -var server_type=C2M -var server_type_node=C2S -var nodes=2 -var weave_passwd=ChangeMe -var k8s_version=stable-1.9 -var docker_version=17.03.0~ce-0~ubuntu-xenial

Does this interest you? If yes, I can propose a PR if you wish.

Thanks.

Terraform apply fails on ARM

scaleway_server.k8s_master: Still creating... (16m50s elapsed)

scaleway_server.k8s_master (remote-exec): 		Unfortunately, an error has occurred:
scaleway_server.k8s_master (remote-exec): 			timed out waiting for the condition

scaleway_server.k8s_master (remote-exec): 		This error is likely caused by:
scaleway_server.k8s_master (remote-exec): 			- The kubelet is not running
scaleway_server.k8s_master (remote-exec): 			- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
scaleway_server.k8s_master (remote-exec): 			- No internet connection is available so the kubelet cannot pull or find the following control plane images:
scaleway_server.k8s_master (remote-exec): 				- k8s.gcr.io/kube-apiserver-arm:v1.11.4
scaleway_server.k8s_master (remote-exec): 				- k8s.gcr.io/kube-controller-manager-arm:v1.11.4
scaleway_server.k8s_master (remote-exec): 				- k8s.gcr.io/kube-scheduler-arm:v1.11.4
scaleway_server.k8s_master (remote-exec): 				- k8s.gcr.io/etcd-arm:3.2.18
scaleway_server.k8s_master (remote-exec): 				- You can check or miligate this in beforehand with "kubeadm config images pull" to make sure the images
scaleway_server.k8s_master (remote-exec): 				  are downloaded locally and cached.

scaleway_server.k8s_master (remote-exec): 		If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
scaleway_server.k8s_master (remote-exec): 			- 'systemctl status kubelet'
scaleway_server.k8s_master (remote-exec): 			- 'journalctl -xeu kubelet'

scaleway_server.k8s_master (remote-exec): 		Additionally, a control plane component may have crashed or exited when started by the container runtime.
scaleway_server.k8s_master (remote-exec): 		To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
scaleway_server.k8s_master (remote-exec): 		Here is one example how you may list all Kubernetes containers running in docker:
scaleway_server.k8s_master (remote-exec): 			- 'docker ps -a | grep kube | grep -v pause'
scaleway_server.k8s_master (remote-exec): 			Once you have found the failing container, you can inspect its logs with:
scaleway_server.k8s_master (remote-exec): 			- 'docker logs CONTAINERID'
scaleway_server.k8s_master (remote-exec): couldn't initialize a Kubernetes cluster

Error: Error applying plan:

1 error(s) occurred:

* scaleway_server.k8s_master: error executing "/tmp/terraform_1537700471.sh": Process exited with status 1

Terraform does not automatically rollback in the face of errors.

Terraform version: 0.11.10
env: arm

Error in terraform output name kubectl_config

Hi,

Terraform Versions

0.10.8

Terraform Commands

terraform apply -var region=par1 -var nodes=2 -var weave_passwd=ChangeMe -var k8s_version=stable-1.9 -var docker_version=17.03.0~ce-0~ubuntu-xenial

No workspaces have been created (default: default).

Terraform Output

k8s_master_public_ip = <IP CLUSTER>
kubeadm_join_command = kubeadm join --token <TOKEN> 10.1.52.156:6443 --discovery-token-ca-cert-hash sha256:23fe2c207f469336c5731197c2c1219664e3ce5e7c46f22a0153b9507e7182b1
kubectl_config = default.conf
nodes_public_ip = [
    default-node-1,
    default-node-2,
    <IP NODE 1>,
    <IP NODE 2>
]

I do not have any file named default.conf, only one in the <ARCH>.conf format.

I found the source of the problem and I can fix it easily.
I will propose a PR shortly.

Using our own security group

Hi,

As the project currently stands, we use Scaleway's default security group, which is really bad...

I propose to create our own security group to secure the cluster.
I will propose a PR soon.

Fixing version of terraform (0.10.X) and providers

Hi,

Terraform Versions

v0.11.X

Terraform Command

terraform apply -var nodes=3

Terraform Error

data.scaleway_image.xenial: Refreshing state...

Error: Error running plan: 1 error(s) occurred:

* output.kubeadm_join_command: Resource 'data.external.kubeadm_join' does not have attribute 'result.command' for variable 'data.external.kubeadm_join.result.command'

Comments

It seems that the problem has been reported to the HashiCorp team and there is quite a bit of activity on the subject.

While waiting for it to be solved, I can propose a PR that pins the Terraform version to 0.10.8:

terraform.tf

terraform {
  required_version = "<= 0.10.8"
}

Thanks !

Terraform 0.12.1 not supported

Getting an error on terraform init

Error: Unsupported Terraform Core version

This configuration does not support Terraform version 0.12.1. To proceed,
either choose another supported Terraform version or update the root module's
version constraint. Version constraints are normally set for good reason, so
updating the constraint may lead to other errors or unexpected behavior.

ping

Thanks for this repository, helped me a lot.

Just wanted to ask: is there any particular reason for blocking ping? As of now, you can't ping external hosts from inside pods (even though the rules are inbound only, I think they effectively block the ping replies as well), which can be a bit of a headache for debugging purposes.
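
If the goal is just to allow ICMP, a rule along these lines might do it (a rough sketch against the legacy Scaleway provider's scaleway_security_group_rule resource; field names may differ):

resource "scaleway_security_group_rule" "accept_icmp" {
  security_group = "${scaleway_security_group.node_security_group.id}"
  action         = "accept"
  direction      = "inbound"
  ip_range       = "0.0.0.0/0"
  protocol       = "ICMP"
}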

"error: metrics not available yet"

When I create a cluster following the how-to, I can't access the metrics:

kubectl --kubeconfig ./$(terraform output kubectl_config) \
  top nodes
error: metrics not available yet

It seems there is an issue visible on the dashboard:

0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

Max validated docker version

I've been able to start up the cluster with docker 17.12.0~ce-0~ubuntu, but I do see this message in the output when running terraform apply:

[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 17.12.0-ce. Max validated version: 17.03

Default storageclass volume missing

When I provision these scripts from scratch, the k8s environment is missing a default StorageClass for automatic volume claims.
So when using a simple Helm install like mysql, it gets stuck on mounting a volume.
Any clue how to get past that issue?
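
Out of the box there is no storage provisioner in the cluster, so there is nothing to mark as default. One workaround (a sketch, assuming you have already installed some provisioner and it created a StorageClass named, for example, local-path) is to annotate that class as the default:

$ kubectl patch storageclass local-path \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'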

Is VC1S supported at AMS1?

Is it possible to deploy k8s on VC1S instances?

terraform apply \
-var region=ams1 \
-var arch=x86_64 \
-var server_type=VC1S \
-var nodes=1 \
-var server_type_node=VC1S \
-var weave_passwd=fk43fnk%4$ \
-var docker_version=17.12.0-ce-0-ubuntu
provider.scaleway.organization
  The Organization ID (a.k.a. 'access key') for Scaleway API operations.

  Enter a value: XXXXXXXXXXXXXXXXXx

provider.scaleway.token
  The API key for Scaleway API operations.

  Enter a value: XXXXXXXXXXXXXXXXXXx

data.scaleway_image.xenial: Refreshing state...

Error: Error refreshing state: 1 error(s) occurred:

* data.scaleway_image.xenial: 1 error(s) occurred:

* data.scaleway_image.xenial: data.scaleway_image.xenial: The query returned more than one result. Please refine your query.

Prevent kubeadm version mismatches

Issue:

Kubeadm can be installed at a higher version than Kubernetes itself. Rather than allowing the versions to differ, it may be better to keep them at the same level to avoid cluster malfunction.

Potential solution:

Pin kubeadm at the same Major.Minor level as Kubernetes

PR to follow.
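
A minimal sketch of what the pinning could look like on the servers (the exact version string is just an example and would be derived from the k8s_version variable in practice):

#install matching versions and hold them so apt upgrades don't drift
apt-get install -y kubelet=1.11.1-00 kubeadm=1.11.1-00 kubectl=1.11.1-00
apt-mark hold kubelet kubeadm kubectl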

#24

Setting up log-rotation

Issue

I recently started to think about log handling and took a peek into the /var/log/pods folder and here is what I found:

root@node-1:~# du -sh -L /var/log/pods/
18G    /var/log/pods/

This is with just a few days' usage of the cluster and not that many pods (about 20).

The issue is that there does not seem to be any default log-rotation.

Proposed solutions

  1. Introduce log rotation at the docker daemon level by modifying the daemon.json configuration file

  2. Introduce log rotation using a cronjob inside the k8s cluster

  3. Introduce log rotation in each node and master using logrotate. Some of the options within the logrotate could come from terraform vars too. Also, it could be entirely optional from a terraform var.

Option 1 has limited options but should be easy and reliable.
Option 2 doesn't sound like a great idea.
Option 3 sounds best to me.
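
For option 1, the daemon-level change would be roughly this in /etc/docker/daemon.json (the sizes are just examples), followed by a restart of the Docker daemon:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}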

@stefanprodan
I can do some work on this if you think a PR is beneficial, what do you think?

Terraform Apply Validation Error

Hi,

When I follow your guide, I end up facing an error on the terraform apply command:
* data.scaleway_image.xenial: data.scaleway_image.xenial: StatusCode: 400, Type: invalid_request_error, APIMessage: Validation Error, Details: map[organization:[<MY_SCALEWAY_TOKEN_ACCESS_KEY> is not a valid UUID.]]

Do you have any idea where this could come from? Maybe the Scaleway API changed?
I'm using v1.4.1 of the Scaleway provider (also tried with v1.0.1).

Thanks a lot for your insight.

Master: remote-exec: error validating "STDIN"

Upon running terraform apply -var 'nodes=1' -var 'weave_password=password' -var 'docker_version=18.06', I was presented with the following error when trying to apply the monitoring script commands...

scaleway_server.k8s_master (remote-exec): error: error validating "STDIN": error validating data: [apiVersion not set, kind not set]; if you choose to ignore these errors, turn validation off with --validate=false

Both URLs return a 404 status, which in turn kills the Terraform process.

if [ "$ARCH" == "arm" ]; then
curl -s https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/alternative/kubernetes-dashboard-arm.yaml | \
sed -e 's/v2.0.0-alpha0/v1.8.3/g' | \
kubectl apply -f -;
kubectl apply -f /tmp/heapster-arm.yaml;
kubectl apply -f /tmp/metrics-server-arm.yaml;
elif [ "$ARCH" == "x86_64" ]; then
curl -s -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/alternative/kubernetes-dashboard.yaml | \
sed -e 's/v2.0.0-alpha0/v1.8.3/g' | \
kubectl apply -f -;
kubectl apply -f /tmp/heapster-amd64.yaml;
kubectl apply -f /tmp/metrics-server-amd64.yaml;
fi

So, a quick fix for anyone encountering the same problem: you'll need to change the dashboard URL to the appropriate one for your architecture.

Terraform failing to apply: multiple errors

Terraform Version

Terraform v0.11.7 (macOS High Sierra 10.13.6)

Terraform Command (as per README.md)

N.B. I only need one node for my test cluster.

terraform apply -var region=par1 -var arch=arm -var server_type=C1 -var nodes=1 -var weave_passwd=ChangeMe -var k8s_version=stable-1.9 -var docker_version=17.03.0~ce-0~ubuntu-xenial

Terraform error

scaleway_server.k8s_master (remote-exec): Processing triggers for systemd (229-4ubuntu21.2) ...
scaleway_server.k8s_master (remote-exec): this version of kubeadm only supports deploying clusters with the control plane version >= 1.10.0. Current version: v1.9.10
scaleway_server.k8s_master (remote-exec): cp: cannot stat '/etc/kubernetes/admin.conf': No such file or directory
scaleway_server.k8s_master (remote-exec): The connection to the server localhost:8080 was refused - did you specify the right host or port?
scaleway_server.k8s_master: Still creating... (4m50s elapsed)
scaleway_server.k8s_master (remote-exec): The connection to the server localhost:8080 was refused - did you specify the right host or port?
scaleway_server.k8s_master (remote-exec): error: unable to recognize "https://cloud.weave.works/k8s/net?password-secret=weave-passwd&k8s-version=Q2xpZW50IFZlcnNpb246IHZlcnNpb24uSW5mb3tNYWpvcjoiMSIsIE1pbm9yOiIxMSIsIEdpdFZlcnNpb246InYxLjExLjEiLCBHaXRDb21taXQ6ImIxYjI5OTc4MjcwZGMyMmZlY2M1OTJhYzU1ZDkwMzM1MDQ1NDMxMGEiLCBHaXRUcmVlU3RhdGU6ImNsZWFuIiwgQnVpbGREYXRlOiIyMDE4LTA3LTE3VDE4OjUzOjIwWiIsIEdvVmVyc2lvbjoiZ28xLjEwLjMiLCBDb21waWxlcjoiZ2MiLCBQbGF0Zm9ybToibGludXgvYXJtIn0K": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
scaleway_server.k8s_master (remote-exec): error: unable to recognize "/tmp/dashboard-rbac.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
Error: Error applying plan:

1 error(s) occurred:

* scaleway_server.k8s_master: error executing "/tmp/terraform_1448824324.sh": Process exited with status 1

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
