Automation around setting up the cloud-native content (Kubernetes) on Clear Linux.
- clr-k8s-examples: script tools to deploy a Kubernetes cluster
- metrics: tools to aid in measuring the scaling capabilities of Kubernetes clusters.
License: Apache License 2.0
This allows the HTTP gateway to be directly exposed on well-known ports to the rest of the network:
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 80
      protocol: TCP
    - name: https
      port: 443
      targetPort: 443
      protocol: TCP
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
---
Update admit-kata once kata-containers/tests#1080 merges
kata-deploy adds multiple entries for the kata runtime to /etc/crio/crio.conf after the node is up.
This causes CRI-O to fail, leaving the node state showing as NotReady.
To run any further pods, I had to manually remove the duplicate entries from the crio config file, stop the kata-deploy DaemonSet, and restart CRI-O (see the sketch below).
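A rough sketch of that manual workaround, assuming the duplicate entries can simply be replaced by starting again from the Clear Linux default config (which would discard any other local edits):
kubectl -n kube-system delete ds kata-deploy                     # stop kata-deploy so it stops re-adding entries
sudo cp /usr/share/defaults/crio/crio.conf /etc/crio/crio.conf   # reset to the stateless default config
# re-add a single kata runtime entry (same form the setup scripts use)
echo -e "\n[crio.runtime.runtimes.kata]\nruntime_path = \"/usr/bin/kata-runtime\"" | sudo tee -a /etc/crio/crio.conf
sudo systemctl restart crio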
Remove setup_kata_firecracker.sh once kata-containers/packaging#304 merges, and use kata-deploy going forward.
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 74s (x34 over 75m) kubelet, clr-03 Unable to mount volumes for pod "grafana-55dcb5484d-dq5lp_monitoring(faa0fcc4-d402-4ac1-a5fa-f8b4edcbbe68)": timeout expired waiting for volumes to attach or mount for pod "monitoring"/"grafana-55dcb5484d-dq5lp". list of unmounted volumes=[grafana-storage]. list of unattached volumes=[grafana-storage grafana-datasources grafana-dashboards grafana-dashboard-k8s-cluster-rsrc-use grafana-dashboard-k8s-node-rsrc-use grafana-dashboard-k8s-resources-cluster grafana-dashboard-k8s-resources-namespace grafana-dashboard-k8s-resources-pod grafana-dashboard-nodes grafana-dashboard-pods grafana-dashboard-statefulset grafana-token-6bv4n]
clear@clr-01 ~/clr-k8s-examples $ kubectl describe pvc -n monitoring grafana-storage-pvc
Name: grafana-storage-pvc
Namespace: monitoring
StorageClass: rook-ceph-block
Status: Bound
Volume: pvc-2f14f1c5-cf4f-4567-bfdb-8f1d68f5e1fc
Labels: app=grafana
Annotations: control-plane.alpha.kubernetes.io/leader:
{"holderIdentity":"888b1f2f-a35f-11e9-9ac2-b6e407c75ca1","leaseDurationSeconds":15,"acquireTime":"2019-07-10T22:10:32Z","renewTime":"2019-...
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"labels":{"app":"grafana"},"name":"grafana-storage-pvc","na...
pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: ceph.rook.io/block
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 1Gi
Access Modes: RWO
VolumeMode: Filesystem
Mounted By: grafana-55dcb5484d-dq5lp
Events: <none>
clear@clr-01 ~/clr-k8s-examples $ kubectl get pvc --all-namespaces -o wide
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE
kube-system elasticsearch-logging-elasticsearch-logging-0 Bound pvc-cd198db3-0476-4f0c-9355-140e326757b8 1Gi RWO rook-ceph-block 85m Filesystem
kube-system kubernetes-dashboard-pvc Bound pvc-ea813018-181e-47b7-a765-40cb6d28f1c9 1Gi RWO rook-ceph-block 21m Filesystem
monitoring alertmanager-main-db-alertmanager-main-0 Bound pvc-8fbf3579-8d32-42a9-b31e-0355380a6b79 1Gi RWO rook-ceph-block 84m Filesystem
monitoring grafana-storage-pvc Bound pvc-2f14f1c5-cf4f-4567-bfdb-8f1d68f5e1fc 1Gi RWO rook-ceph-block 85m Filesystem
monitoring prometheus-k8s-db-prometheus-k8s-0 Bound pvc-c8a4acf3-1fd4-4011-8c39-645930e5e709 1Gi RWO rook-ceph-block 84m Filesystem
clear@clr-03 ~ $ lsmod | grep rbd
rbd 86016 0
libceph 311296 1 rbd
Due to systemd changes in rp_filter, all non-local traffic to the pods is blocked.
The canal setup has not changed, so the cause must be either the kernel or systemd.
This can be worked around by setting FELIX_IGNORELOOSERPF to true, as in the diff below.
diff --git a/clr-k8s-examples/0-canal/canal.yaml b/clr-k8s-examples/0-canal/canal.yaml
index 6666a2c..17b3cd4 100644
--- a/clr-k8s-examples/0-canal/canal.yaml
+++ b/clr-k8s-examples/0-canal/canal.yaml
@@ -159,6 +159,8 @@ spec:
value: "info"
- name: FELIX_HEALTHENABLED
value: "true"
+ - name: FELIX_IGNORELOOSERPF
+ value: "true"
securityContext:
privileged: true
resources:
diff --git a/clr-k8s-examples/setup_system.sh b/clr-k8s-examples/setup_system.sh
index f1614bf..4f00498 100755
--- a/clr-k8s-examples/setup_system.sh
+++ b/clr-k8s-examples/setup_system.sh
@@ -22,8 +22,6 @@ fi
sudo mkdir -p /etc/sysctl.d/
cat <<EOT | sudo bash -c "cat > /etc/sysctl.d/60-k8s.conf"
net.ipv4.ip_forward=1
-net.ipv4.conf.default.rp_filter=1
-net.ipv4.conf.all.rp_filter=1
EOT
sudo systemctl restart systemd-sysctl
/cc @krsna1729
kubeadm reset leaves /var/lib/kubelet behind. Suggest adding
sudo rmdir /var/lib/kubelet
as the last line.
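For reference, a minimal sketch of how the end of the teardown could look; the surrounding reset step is an assumption, only the rmdir suggestion comes from this issue:
sudo kubeadm reset -f         # assumed existing teardown step
# kubeadm reset leaves /var/lib/kubelet behind, so remove it explicitly
sudo rmdir /var/lib/kubelet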
The minimal subcommand has steps that a user may not want when hacking on a freshly started cluster.
Let's add a new command that only starts the cluster.
crio has started enforcing registry origin checks, so all registries used by the stack need to be on the whitelist. We should update /etc/containers/registries.conf, or replace it so that it lists all valid registries, for example:
# This is a system-wide configuration file used to
# keep track of registries for various container backends.
# It adheres to TOML format and does not support recursive
# lists of registries.
# The default location for this configuration file is /etc/containers/registries.conf.
# The only valid categories are: 'registries.search', 'registries.insecure',
# and 'registries.block'.
[registries.search]
registries = ['docker.io', 'quay.io']
# If you need to access insecure registries, add the registry's fully-qualified name.
# An insecure registry is one that does not have a valid SSL certificate or only does HTTP.
[registries.insecure]
registries = ['docker.io', 'quay.io']
# If you need to block pull access from a registry, uncomment the section below
# and add the registry's fully-qualified name.
#
# Docker only
[registries.block]
registries = []
I'm having trouble (still) bringing up the Vagrantfile on an F29 machine. QEMU reports a failure loading the OVMF BIOS (although I think we have seen previously that this may be a misleading error, and the real error may be elsewhere).
I will note, I can use vagrant with libvirt to load an example machine, and I can use it to load @ganeshmaharaj's Ubuntu-based setup that is based off of this codebase.
Here is the error output:
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.
An error occurred while executing the action on the 'clr-01'
machine. Please handle this error then try again:
There was an error talking to Libvirt. The error message is shown
below:
Call to virDomainCreateWithFlags failed: internal error: process exited while connecting to monitor: qemu: could not load PC BIOS '/home/gwhaley/.vagrant.d/boxes/AntonioMeireles-VAGRANTSLASH-ClearLinux/28390/libvirt/OVMF.fd'
An error occurred while executing the action on the 'clr-02'
machine. Please handle this error then try again:
There was an error talking to Libvirt. The error message is shown
below:
Call to virDomainCreateWithFlags failed: internal error: qemu unexpectedly closed the monitor: qemu: could not load PC BIOS '/home/gwhaley/.vagrant.d/boxes/AntonioMeireles-VAGRANTSLASH-ClearLinux/28390/libvirt/OVMF.fd'
An error occurred while executing the action on the 'clr-03'
machine. Please handle this error then try again:
There was an error talking to Libvirt. The error message is shown
below:
Call to virDomainCreateWithFlags failed: internal error: qemu unexpectedly closed the monitor: qemu: could not load PC BIOS '/home/gwhaley/.vagrant.d/boxes/AntonioMeireles-VAGRANTSLASH-ClearLinux/28390/libvirt/OVMF.fd'
But if we go and look, it seems that the file does exist:
[gwhaley@fido libvirt]$ pwd
/home/gwhaley/.vagrant.d/boxes/AntonioMeireles-VAGRANTSLASH-ClearLinux/28390/libvirt
[gwhaley@fido libvirt]$ ls -la
total 2153112
drwxrwxr-x. 2 gwhaley gwhaley 4096 Mar 20 10:52 .
drwxrwxr-x. 3 gwhaley gwhaley 4096 Mar 20 10:51 ..
-rw-r--r--. 1 gwhaley gwhaley 2200567808 Mar 20 10:51 box.img
-rw-r--r--. 1 gwhaley gwhaley 201 Mar 20 10:51 info.json
-rw-r--r--. 1 gwhaley gwhaley 58 Mar 20 10:51 metadata.json
-rw-rw-r--. 1 gwhaley gwhaley 4194304 Mar 20 10:52 OVMF.fd
-rw-r--r--. 1 gwhaley gwhaley 3750 Mar 20 10:51 Vagrantfile
Here are some component version details:
[gwhaley@fido libvirt]$ qemu-system-x86_64 --version
QEMU emulator version 3.0.0 (qemu-3.0.0-3.fc29)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
[gwhaley@fido libvirt]$ libvirtd --version
libvirtd (libvirt) 4.7.0
[gwhaley@fido libvirt]$ vagrant --version
Vagrant 2.2.4
And the base level machine I'm on:
[gwhaley@fido libvirt]$ cat /etc/os-release
NAME=Fedora
VERSION="29 (Workstation Edition)"
ID=fedora
VERSION_ID=29
VERSION_CODENAME=""
PLATFORM_ID="platform:f29"
PRETTY_NAME="Fedora 29 (Workstation Edition)"
ANSI_COLOR="0;34"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:29"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f29/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=29
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=29
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation
If you need any more info, just ask. I will go back to trying the Ubuntu based derivative for now to continue testing.
Looking at how to run the k8s e2e tests (note: that is an Ubuntu-specific page, but I'm struggling to find a more generic k8s e2e test page right now), it notes that one should relate the kubernetes-e2e charm to your existing kubernetes-master nodes and easyrsa.
@ganeshmaharaj also notes that easyrsa is one of the options for client certificate authentication, which is a requirement for any level of 'production ready' stack.
Should we consider adding easyrsa to our stack?
The cloud-native setup creates two networks for the cluster.
However, when the two networks are enabled, kubectl logs does not work from the master for worker pods.
An interactive shell also does not work:
kubectl run -i --tty busybox --image=busybox -- sh # Run pod as interactive shell
Both of these features used to work on older Clear Linux releases, so the root cause of the failure is still unknown.
The default upstream cilium plugin does not work with CRI-O, which means it does not work on Clear Linux today:
kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.5/examples/kubernetes/1.14/cilium.yaml
Normal Scheduled 4m32s default-scheduler Successfully assigned kube-system/cilium-ch9mr to clr-02
Warning FailedMount 22s (x10 over 4m31s) kubelet, clr-02 MountVolume.SetUp failed for volume "docker-socket" : hostPath type check failed: /var/run/docker.sock is not a socket file
Warning FailedMount 14s (x2 over 2m29s) kubelet, clr-02 Unable to mount volumes for pod "cilium-ch9mr_kube-system(f966519a-96d7-11e9-aba4-525400f0e781)": timeout expired waiting for vo
It would be useful to have CSI-based storage enabled. CSI reached 1.0/GA in Kubernetes 1.13.
Kata now supports containerd-shim-v2. We need to support this setup, as it will allow us to do proper resource accounting. We will need to hold containerd at the version that supports device mapper, so this is not ideal from a security point of view, but it is useful for ensuring that auto-scaling and resource accounting are stress-tested with Kata.
Currently the simple admission controller mcastelino/kubewebhook-pod-annotate-example:1.0 is based on https://github.com/mcastelino/kubewebhook/tree/topic/hack-kata/examples/pod-annotate.
This needs to be moved to a Kata-specific repository and hosted under the Kata Docker Hub.
Has anyone looked at https://github.com/kubernetes/node-problem-detector? Would it be useful to package?
Also, should we look at using sources from this directory for addons testing?
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons
In particular the dashboard.
Hello.
I started testing k8s version v1.14.0, as:
I0326 15:20:57.920018 2334 version.go:237] remote version is much newer: v1.14.0; falling back to: stable-1.13
The following error shows that the configuration scripts will need updates:
+ sudo -E kubeadm init --config=./kubeadm.yaml
your configuration file uses a deprecated API spec:
"kubeadm.k8s.io/v1alpha3". Please use 'kubeadm config migrate --old-
config old.yaml --new-config new.yaml', which will write the new,
similar spec using a newer API version.
Is there a separate branch where this update is being worked on?
Get rid of iptables.
Showcase the latest kernel features w.r.t. eBPF in Clear Linux, as leveraged by cilium:
https://cilium.io/blog/2018/11/20/fb-bpf-firewall/
Some background reading:
https://cilium.io/blog/2018/12/03/cni-performance
Generate and apply the files for deployment, service and ingress
Instead of pinning the dot version kubernetesVersion: v1.12.0, is it better to use kubernetesVersion: stable-1.12?
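If we go that way, a hypothetical one-liner to switch an existing kubeadm.yaml from the pin to the floating label (the exact file layout is an assumption):
# replace the hard pin with the label that tracks the latest 1.12 patch release
sed -i 's/^kubernetesVersion: v1\.12\.0$/kubernetesVersion: stable-1.12/' kubeadm.yaml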
I'm using the master branch @f863bb7f, and running create_stack.sh returned with this error:
daemonset.apps/kata-deploy created error: the path "8-kata/runtimeclass_crd.yaml" does not exist
Looks like the latest commit removed a file that is still required elsewhere. Could anyone please check?
As exhibited in #87, the crio install won't work when nested virtualization has not been enabled for KVM on the host.
I think it would be great to have a check, whenever qemu/kvm will be used for kata-containers, that aborts or warns prominently if nested virtualization is not available, since that will make qemu fail, and thus crio too.
Thanks in advance.
kubectl logs -n kube-system canal-2d9cw -c calico-node
2019-04-19 16:56:55.642 [FATAL][1436] int_dataplane.go 824: Kernel's RPF check is set to 'loose'. This would allow endpoints to spoof their IP address. Calico requires net.ipv4.conf.all.rp_filter to be set to 0 or 1. If you require loose RPF and you are not concerned about spoofing, this check can be disabled by setting the IgnoreLooseRPF configuration parameter to 'true'
Currently we are getting the dashboard to work by giving it sudo privileges via 2-dashboard/dashboard-admin.yaml. This needs to be addressed.
Hi,
Currently the create_stack.sh script uses [ -t 0 ] to detect a terminal, but this causes Ansible to hang when it tries to run create_stack.sh. I kludged around it with a sed that makes the check never true, but it would be preferable to have a flag to disable it. Would a PR to add this flag be accepted, or is there another mechanism we should use?
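A rough sketch of what such an opt-out could look like in create_stack.sh; the --no-prompt flag name and the prompt text are purely hypothetical:
#!/bin/bash
# Hypothetical opt-out for the interactive terminal check.
INTERACTIVE=1
for arg in "$@"; do
  [ "$arg" = "--no-prompt" ] && INTERACTIVE=0
done
if [ "$INTERACTIVE" -eq 1 ] && [ -t 0 ]; then
  read -r -p "Proceed with cluster setup? [y/N] " answer
  [ "$answer" = "y" ] || exit 1
fi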
With CRI-O v1.14.0, CNI integration now supports multiple directories for loading plugins.
Adapt to that by adding /usr/libexec/cni/ and /opt/cni/bin to the configs and clean up the symlinks; a migration sketch follows.
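A hedged migration sketch, assuming the existing /etc/crio/crio.conf still uses the old singular plugin_dir key (GNU sed):
# replace the single plugin_dir entry with the new plugin_dirs list
sudo sed -i 's|^plugin_dir = .*|plugin_dirs = [\n  "/usr/libexec/cni",\n  "/opt/cni/bin",\n]|' /etc/crio/crio.conf
sudo systemctl restart crio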
Since we made the switch to @AntonioMeireles's boxes, let's take the opportunity to clean up the Vagrantfile, removing the redundant proxy settings and the unused disk/network configs.
Hi @ganeshmaharaj - I'm reporting here (but really against your https://github.com/ganeshmaharaj/vagrant-stuff/tree/master/k8s, which I believe is a derivative?) as I thought we'd get more exposure and eyes on it this way.
I was using your Ubuntu-based instance fine, but after a host reboot, when I brought it back up my kubectl didn't work any more. I had a dig, and I suspect some things do not come back up. I used a vagrant halt/up cycle to make it more refined and hopefully repeatable. Here are my logs:
Note, I don't add the two slave nodes here... just working with the master node.
~/cloud-native-setup/clr-k8s-examples$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuntu-01 Ready master 110s v1.13.4
vagrant@ubuntu-01:~/cloud-native-setup/clr-k8s-examples$
vagrant@ubuntu-01:~/cloud-native-setup/clr-k8s-examples$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:f4:a0:2b brd ff:ff:ff:ff:ff:ff
inet 192.168.121.170/24 brd 192.168.121.255 scope global dynamic eth0
valid_lft 2899sec preferred_lft 2899sec
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:5c:ba:9c brd ff:ff:ff:ff:ff:ff
inet 192.52.100.11/24 brd 192.52.100.255 scope global eth1
valid_lft forever preferred_lft forever
4: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 0a:58:0a:58:00:01 brd ff:ff:ff:ff:ff:ff
inet 10.88.0.1/16 scope global cni0
valid_lft forever preferred_lft forever
5: vethfd79254a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP group default
link/ether 16:8c:cb:2a:fd:53 brd ff:ff:ff:ff:ff:ff link-netnsid 0
6: vetha05bf341@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP group default
link/ether aa:6a:ff:b9:cc:5b brd ff:ff:ff:ff:ff:ff link-netnsid 1
7: veth9adc6a7d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP group default
link/ether be:e1:57:ef:a1:8b brd ff:ff:ff:ff:ff:ff link-netnsid 2
8: vethfdd2b8a2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP group default
link/ether e6:85:14:26:27:b4 brd ff:ff:ff:ff:ff:ff link-netnsid 3
[gwhaley@fido k8s]$ vagrant halt
==> ubuntu-03: Halting domain...
==> ubuntu-02: Halting domain...
==> ubuntu-01: Halting domain...
[gwhaley@fido k8s]$ vagrant status
Current machine states:
ubuntu-01 shutoff (libvirt)
ubuntu-02 shutoff (libvirt)
ubuntu-03 shutoff (libvirt)
# vagrant up --provider=libvirt
[gwhaley@fido k8s]$ vagrant ssh ubuntu-01
Last login: Wed Mar 20 04:13:25 2019 from 192.168.121.1
vagrant@ubuntu-01:~$ kubectl get nodes
The connection to the server 192.168.121.170:6443 was refused - did you specify the right host or port?
Wed 2019-03-20 04:21:47 PDT. --
e: Failed with result 'exit-code'.
e: Main process exited, code=exited, status=255/n/a
:47.776497 2117 server.go:261] failed to run Kubelet: failed to create kubelet: rpc error: code = Unav
:47.776368 2117 kuberuntime_manager.go:184] Get runtime version failed: rpc error: code = Unavailable
:47.776217 2117 remote_runtime.go:72] Version from runtime service failed: rpc error: code = Unavailab
:47.775764 2117 util_unix.go:77] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please con
:47.775617 2117 util_unix.go:77] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please con
:47.774978 2117 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list
:47.774897 2117 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Ser
:47.774332 2117 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Nod
:47.757270 2117 kubelet.go:306] Watching apiserver
:47.757233 2117 kubelet.go:281] Adding pod path: /etc/kubernetes/manifests
:47.757167 2117 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"
:47.757149 2117 state_mem.go:84] [cpumanager] updated default cpuset: ""
:47.757061 2117 state_mem.go:36] [cpumanager] initializing new in-memory state store
:47.757029 2117 container_manager_linux.go:272] Creating device plugin manager: true
:47.756872 2117 container_manager_linux.go:253] Creating Container Manager object based on Node Config
:47.756844 2117 container_manager_linux.go:248] container manager verified user specified cgroup-root
:47.756590 2117 server.go:666] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaul
:47.743312 2117 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-cli
:47.740768 2117 plugins.go:103] No cloud provider specified.
:47.740588 2117 server.go:407] Version: v1.13.4
lv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's
lv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's
t: The Kubernetes Node Agent.
t: The Kubernetes Node Agent.
vagrant@ubuntu-01:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:f4:a0:2b brd ff:ff:ff:ff:ff:ff
inet 192.168.121.170/24 brd 192.168.121.255 scope global dynamic eth0
valid_lft 3500sec preferred_lft 3500sec
inet6 fe80::5054:ff:fef4:a02b/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:5c:ba:9c brd ff:ff:ff:ff:ff:ff
inet 192.52.100.11/24 brd 192.52.100.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe5c:ba9c/64 scope link
valid_lft forever preferred_lft forever
Ah, no CNI?? This output is significantly different from our first successful run...
The static policy allows containers in Guaranteed pods with integer CPU requests access to exclusive CPUs on the node. This exclusivity is enforced using the cpuset cgroup controller.
https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/
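For illustration only (not from this repo), a Guaranteed pod with an integer CPU request that the static policy would pin to exclusive CPUs:
cat <<EOT | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: cpu-pinned-example          # hypothetical name
spec:
  containers:
  - name: app
    image: k8s.gcr.io/hpa-example
    resources:
      requests:
        cpu: "2"                    # integer CPU request == limit -> Guaranteed QoS
        memory: 256Mi
      limits:
        cpu: "2"
        memory: 256Mi
EOT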
Changes in 1.14 (https://github.com/kubernetes/kubernetes/pull/69366/files) are meant to change the usage of the --cri-socket argument. After setting up the k8s master with PR #80, on the client I'm getting the following errors:
kubeadm join 10.219.128.45:6443 --token ctcivd.d2dwdsi660hkjmrd --discovery-token-ca-cert-hash sha256:3bf9103c4558c31314846e691ff7136a1b337b6242f338476adca1286f746db2
**Found multiple CRI sockets, please use --cri-socket to select one: /var/run/dockershim.sock, /var/run/crio/crio.sock**
root@kube1 ~ # kubeadm join 10.219.128.45:6443 --token ctcivd.d2dwdsi660hkjmrd --discovery-token-ca-cert-hash sha256:3bf9103c4558c31314846e691ff7136a1b337b6242f338476adca1286f746db2 -**-cri-socket /var/run/crio/crio.sock**
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
**error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: Found multiple CRI sockets, please use --cri-socket to select one: /var/run/dockershim.sock, /var/run/crio/crio.sock**
root@kube1 ~ # file /var/run/crio/crio.sock /var/run/dockershim.sock
/var/run/crio/crio.sock: socket
/var/run/dockershim.sock: cannot open `/var/run/dockershim.sock' (No such file or directory)
Testing as per test-autoscale.sh, the pods are not scaling.
Name: php-apache-test
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Tue, 04 Dec 2018 22:36:01 +0000
Reference: Deployment/php-apache-test
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): <unknown> / 50%
Min replicas: 1
Max replicas: 10
Deployment pods: 1 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedComputeMetricsReplicas 27s (x12 over 3m) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource
metrics API
Warning FailedGetResourceMetric 12s (x13 over 3m) horizontal-pod-autoscaler unable to get metrics for resource cpu: no metrics returned from resource metrics API
After a successful execution of cloud-native-setup/clr-k8s-examples/setup_system.sh, the "kubeadm init" call from create_stack.sh minimal reports the following error:
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition
Do you know what I need to set up in order to fix the problem?
Full log is in: https://github.intel.com/gist/anonymous/476dd91c52ca3a9a3ecfb69d6394ec62
This will allow the use of customized upstream yaml files.
This will reduce the number of active components for Kata and also reduce its memory footprint.
Document the upgrade path for system components to limit downtime.
apiVersion: v1
kind: Service
metadata:
  name: php-apache-runc-lb
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: php-apache-runc
  sessionAffinity: None
  type: LoadBalancer
@mcastelino /cc @amshinde
Add an upgraded version of amshinde/kata-sriov-dp which has example manifests for deploying Multus-CNI, SRIOV-CNI, and the SR-IOV device plugin. Optionally, include manifests for intel-device-plugins-for-kubernetes and userspace-cni-network-plugin.
Also look into enabling CPUManager by default in the kubelet configuration, along with #8:
--cpu-manager-policy=static
--cpu-manager-reconcile-period=5s
Warning FailedCreatePodSandBox 98s kubelet, clr-02 Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-fb8b8dccf-9fpfn_kube-system_50f10a4d-947c-11e9-9e0f-52540041e026_0(fd2c062948fa94deb8214b9c088acbe3a6f69dd2606f8261f56aae85209d876b): failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/cni/bin/]
sudo swupd info
Installed version: 30030
sudo crictl version
Version: 0.1.0
RuntimeName: cri-o
RuntimeVersion: 1.14.4
RuntimeApiVersion: v1alpha1
# Paths to directories where CNI plugin binaries are located.
plugin_dirs = [
"/usr/libexec/cni",
"/opt/cni/bin/",
]
/usr/share/defaults/crio/crio.conf has the proper values, but /etc/crio/crio.conf is incorrect.
It looks like it depends on which version of crio you were on. When Clear Linux is updated from a version where crio did not support the plugin_dirs list to one that does, the existing /etc/crio/crio.conf will still have the older value.
Hence, on a crio update the user unfortunately needs to delete /etc/crio/crio.conf.
We should delete crio.conf and restart crio, which will automatically set crio up based on the latest values.
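A minimal sketch of that cleanup, assuming crio falls back to /usr/share/defaults/crio/crio.conf when the override is absent, as described above:
sudo rm /etc/crio/crio.conf     # drop the stale override with the old plugin_dir value
sudo systemctl restart crio     # pick the current defaults back up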
CRI-O panics when pulling images. This seems to happen with the newer Clear Linux kernels.
I tried vagrant up --provider=virtualbox
today. It updated to Clear Linux 29940. Then it failed with kubelet refusing to start:
clr-01: Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /usr/lib/systemd/system/kubelet.service.
clr-01: Created symlink /etc/systemd/system/multi-user.target.wants/crio.service → /usr/lib/systemd/system/crio.service.
clr-01: Job for kubelet.service failed because the control process exited with error code.
clr-01: See "systemctl status kubelet.service" and "journalctl -xe" for details.
Logging in and checking the journal shows:
Jun 17 14:27:20 clr-01 kubelet-version-check.sh[4166]: /usr/bin/kubelet-version-check.sh: line 8: /var/lib/kubelet/kubelet_version: No such file or directory
Jun 17 14:27:20 clr-01 kubelet[4165]: F0617 14:27:20.140206 4165 server.go:194] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kub>
/var/lib/kubelet/config.yaml does not exist.
Clear Linux now supports and bundles Firecracker. Add support for adding Firecracker as an additional runtime class in Kata. This allows runc, Kata with QEMU, and Kata with Firecracker to live alongside each other in the same cluster on all nodes.
Add a new script along the lines of setup_firecracker.sh:
#!/bin/bash
mkdir -p /etc/kata-containers
cat <<EOT | tee /etc/kata-containers/configuration_firecracker.toml
[hypervisor.firecracker]
path = "/usr/bin/firecracker"
kernel = "/usr//share/kata-containers/vmlinux.container"
image = "/usr//share/kata-containers/kata-containers.img"
kernel_params = ""
default_vcpus = 1
default_memory = 4096
default_maxvcpus = 0
default_bridges = 1
block_device_driver = "virtio-mmio"
disable_block_device_use = false
enable_debug = true
use_vsock = true
[shim.kata]
path = "/usr//libexec/kata-containers/kata-shim"
[runtime]
internetworking_model="tcfilter"
EOT
# quote the heredoc delimiter so "$@" is written literally into the wrapper
cat <<'EOT' | tee /usr/bin/kata-runtime-fire
#!/bin/bash
/usr/bin/kata-runtime --kata-config /etc/kata-containers/configuration_firecracker.toml "$@"
EOT
chmod +x /usr/bin/kata-runtime-fire
mkdir -p /etc/crio/
cp /usr/share/defaults/crio/crio.conf /etc/crio/crio.conf
echo -e "\n[crio.runtime.runtimes.kata]\nruntime_path = \"/usr/bin/kata-runtime\"" >> /etc/crio/crio.conf
echo -e "\n[crio.runtime.runtimes.fire]\nruntime_path = \"/usr/bin/kata-runtime-fire\"" >> /etc/crio/crio.conf
sed -i 's|\(\[crio\.runtime\]\)|\1\nmanage_network_ns_lifecycle = true|' /etc/crio/crio.conf
sed -i 's/storage_driver = \"overlay\"/storage_driver = \"devicemapper\"\
storage_option = [\
\"dm.basesize=8G\",\
\"dm.directlvm_device=\/dev\/vdb\",\
\"dm.directlvm_device_force=true\",\
\"dm.fs=ext4\"\
]/g' /etc/crio/crio.conf
systemctl daemon-reload
systemctl restart crio
And a simple test case along the lines of tests/test-deploy-fire.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: php-apache-fire
  name: php-apache-fire
spec:
  replicas: 1
  selector:
    matchLabels:
      run: php-apache-fire
  template:
    metadata:
      labels:
        run: php-apache-fire
    spec:
      runtimeClassName: fire
      containers:
      - image: k8s.gcr.io/hpa-example
        imagePullPolicy: Always
        name: php-apache
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          requests:
            cpu: 200m
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache-fire
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: php-apache-fire
  sessionAffinity: None
  type: ClusterIP
Trying to follow the create_stack.sh example, and doing:
kubectl apply -f tests/deploy-svc-ing/test-deploy-kata-qemu.yaml
I get (kubectl get pod):
NAME READY STATUS RESTARTS AGE
php-apache-kata-qemu-7d4647498f-srlmw 0/1 Pending 0 8m26s
Looking at /var/log/containers/kata-deploy-v22pt_kube-system_kube-kata-18d156ed33052c5a7d2bda4637d147f30d1d92f4812746f3e196a5479f98ea09.log I can see:
2019-04-19T17:18:13.158164470+01:00 stdout F copying kata artifacts onto host
2019-04-19T17:18:13.831374115+01:00 stdout F Add Kata Containers as a supported runtime for CRIO:
2019-04-19T17:18:13.832274647+01:00 stderr F cp: cannot stat '/etc/crio/crio.conf': No such file or directory
Something weird seems broken wrt crio :-/
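A possible workaround sketch, assuming the failure is only that /etc/crio/crio.conf has never been created on this host and that copying the stateless default into place is enough for kata-deploy to proceed:
sudo mkdir -p /etc/crio
sudo cp /usr/share/defaults/crio/crio.conf /etc/crio/crio.conf
sudo systemctl restart crio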
As per this comment (#13 (comment)) we want to avoid version pins. Today all yamls have the image pinned, since that's considered best practice, and the yamls themselves pin the API versions of the different Kubernetes objects they create. Since k8s versions will keep moving forward with Clear Linux, we need a way to keep these yamls up to date too.
What is the plan for upgrading the yamls?
Available helm charts -
I think, by default, the scripts will set up the k8s network hung off the first Ethernet controller on the master node.
In my case, that is not correct - I have two Ethernet cards, and my node pool is hung off the second one.
I'll see if I can figure out what needs to be configured to use the second (or rather, a specific) network controller to talk to the nodes. If anybody wants to chip in with some input, please do :-)
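If it ends up being a kubeadm-level setting, a hedged sketch of advertising the API server on a specific interface's address (eth1 and the flag usage here are assumptions, since the scripts normally drive kubeadm via a config file):
# pick the IPv4 address of the second NIC and hand it to kubeadm
ADVERTISE_ADDR=$(ip -4 -o addr show dev eth1 | awk '{print $4}' | cut -d/ -f1)
sudo kubeadm init --apiserver-advertise-address "$ADVERTISE_ADDR"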
If you tear down and try to re-join a slave/node, it may fail to bring up ceph-rook (fairly silently), as the installation may find an existing /var/lib/rook directory and fail to initialise.
We should see if we can find a way to remove /var/lib/rook, or error out nicely when initialising or joining nodes.
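A rough sketch of the node-side cleanup that would avoid the stale state; whether the backing disks also need wiping depends on the rook/ceph configuration and is an assumption here:
# run on a node before re-joining it to the cluster (sketch only)
sudo rm -rf /var/lib/rook    # stale rook/ceph state left over from the previous install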