
kubeadm-ha's Introduction

🎉 This project was inspired by kubeasz; if you would rather install from binaries, that project is worth a look.

kubeadm-ha builds a highly available Kubernetes cluster with kubeadm and automates the installation with ansible-playbook. It offers both a one-shot install script and the option of running the playbooks step by step to install each component separately.


  • Project features: not affected by network restrictions in mainland China; all components managed by kubelet; multi-master high availability; mutual TLS authentication; configurable TLS certificate validity; RBAC authorization; Network Policy support

  • Branches:

    • release-*: installs Kubernetes version *
    • develop: development branch, not recommended for use
  • Supported components and platforms:

    Category             Supported
    Architecture         amd64, arm64
    OS                   RedHat: 7; Rocky Linux: 8, 9; CentOS: 7.9, 8; Debian: 10, 11;
                         Ubuntu: 18.04; Kylin: V10; Anolis OS: 8; OpenEuler: 21.09, 22.03, 23.03
    Etcd                 3.5.7-0
    Container runtimes   Docker, containerd
    Kubernetes           v1.20, v1.21, v1.22, v1.23, v1.24, v1.25, v1.26, v1.27
    Kube-apiserver lb    slb, haproxy, nginx
    Network plugin       flannel, calico
    Ingress controller   traefik, nginx-ingress

    Note: entries marked in bold in the table are the versions installed by default

Known issues

  • Newer versions of the add-ons (network plugin, ingress controller) may drop compatibility with older Kubernetes releases, so specifying an old Kubernetes version at deploy time can cause the add-on deployment to fail. It is recommended to install this project's default or the latest Kubernetes version. See #28 for the discussion.

User Guide

00-Installation prerequisites 01-Cluster installation 02-Node management 03-Certificate rotation 04-Cluster upgrade
05-Cluster backup 06-Cluster restore 07-Cluster reset 08-Offline installation 09-Further reading


Contributors

carllhw, Jaywoods2, ChongmingDu, happinesslijian, zlingqu, li-sen

JetBrains Open Source License Support

kubeadm-ha is developed under a free JetBrains Open Source license; many thanks to JetBrains for the support.

License


kubeadm-ha's People

Contributors

allenliu88, carllhw, cczhung, chongmingdu, fightdou, fossabot, li-sen, taoup, timebye, vinkdong


kubeadm-ha's Issues

nginx load balancer fails to start

Bug description
When running ansible-playbook -i inventory.ini 90-init-cluster.yml,
the installation fails at the "以轮询的方式等待 nginx 运行完成" (poll and wait for nginx to come up) task.

Environment (please fill in the following information):
CentOS 7.8, 4 cores / 16 GB RAM

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
    Linux 3.10.0-1127.el7.x86_64 x86_64
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

  • Ansible version (ansible --version):
    ansible 2.10.6
    config file = None
    configured module search path = ['/home/ops/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
    ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
    executable location = /usr/local/bin/ansible
    python version = 3.6.8 (default, Nov 16 2020, 16:55:22) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]

  • Python version (python --version):
    Python 2.7.5

  • Kubeadm-ha version (commit) (git rev-parse --short HEAD):
    38be5d5

How to reproduce

Steps to reproduce:

  1. Step 1: write the inventory.ini file with the following contents

```
; Fill in the information for all nodes here

; Field 1: the node's internal IP; after deployment it becomes the Kubernetes nodeName
; Field 2: ansible_port, the node's sshd listening port
; Field 3: ansible_user, the user for remote login
; Field 4: ansible_ssh_pass, the password for remote login
[all]
192.168.16.56 ansible_port=22 ansible_user="user" ansible_ssh_pass="password"
192.168.16.52 ansible_port=22 ansible_user="user" ansible_ssh_pass="password"
192.168.16.70 ansible_port=22 ansible_user="user" ansible_ssh_pass="password"
192.168.16.60 ansible_port=22 ansible_user="user" ansible_ssh_pass="password"
; Private cloud:
; VIP load-balancing mode:
; i.e. load balancer + keepalived, for example the common haproxy + keepalived setup.
; The load balancers available in this project are nginx, openresty, haproxy and envoy; switch between them by setting lb_mode.
; Setting lb_kube_apiserver_ip enables keepalived; first reserve an IP with your infrastructure team to use as lb_kube_apiserver_ip.
; Two nodes in the lb group are usually enough; the first node in the lb group is the keepalived master, the rest are backups.
;
; Node-local load-balancing mode:
; Only the load balancer is started and keepalived is not enabled (i.e. lb_kube_apiserver_ip is not set);
; kubelet then connects to the apiserver at 127.0.0.1:lb_kube_apiserver_port.
; Leave the lb group empty when using this mode.
;
; Public cloud:
; The slb mode is not recommended; use node-local load-balancing directly.
; If you do use slb mode, deploy with node-local load-balancing first,
; then switch to slb mode after a successful deployment:
; set lb_mode to slb, set lb_kube_apiserver_ip to the internal IP of the purchased SLB,
; set lb_kube_apiserver_port to the SLB listening port,
; and run the cluster initialization playbook again to switch to slb mode.
[lb]

; Note: the etcd cluster must have an odd number of nodes (1, 3, 5, 7, ...)
[etcd]
192.168.16.56
192.168.16.52
192.168.16.70

[kube-master]
192.168.16.56
192.168.16.52
192.168.16.70

[kube-worker]
192.168.16.56
192.168.16.52
192.168.16.70
192.168.16.60

; Reserved group, used later for adding master nodes
[new-master]

; Reserved group, used later for adding worker nodes
[new-worker]

; Reserved group, used later for adding etcd nodes
[new-etcd]

; Reserved group, used later for removing the worker role
[del-worker]

; Reserved group, used later for removing the master role
[del-master]

; Reserved group, used later for removing the etcd role
[del-etcd]

; Reserved group, used later for removing nodes
[del-node]

;-------------------------------------- Basic configuration below ------------------------------------;
[all:vars]
; Whether to skip verifying the nodes' physical resources; master nodes require at least 2C/2G, worker nodes at least 2C/4G
skip_verify_node=false
; Kubernetes version
kube_version="1.16.15"

; Container runtime type, options: containerd, docker; default containerd
container_manager="containerd"

; Load balancer
; Options are nginx, openresty, haproxy, envoy and slb; nginx is used by default
lb_mode="nginx"
; Cluster apiserver IP behind the load balancer; setting the lb_kube_apiserver_ip variable enables load balancer + keepalived
; lb_kube_apiserver_ip="192.168.56.15"
; Cluster apiserver port behind the load balancer
lb_kube_apiserver_port="8443"

; Subnet selection: the pod and service subnets must not overlap with the server network.
; If they overlap, set the kube_pod_subnet and kube_service_subnet variables; for example:
; if the server network is 10.0.0.1/8,
; the pod subnet can be 192.168.0.0/18
; and the service subnet 192.168.64.0/18;
; if the server network is 172.16.0.1/12,
; the pod subnet can be 10.244.0.0/18
; and the service subnet 10.244.64.0/18;
; if the server network is 192.168.0.1/16,
; the pod subnet can be 10.244.0.0/18
; and the service subnet 10.244.64.0/18.
; Cluster pod IP range; the default /18 mask provides 16384 IPs
kube_pod_subnet="10.244.0.0/18"
; Cluster service IP range
kube_service_subnet="10.244.64.0/18"
; Pod subnet prefix length assigned to each node; the default /24 provides 256 IPs, so with these defaults the cluster can manage 16384/256 = 64 nodes
kube_network_node_prefix="24"

; Maximum number of pods per node. Related to the pod subnet assigned to the node; the number of IPs should exceed the number of pods.
; https://cloud.google.com/kubernetes-engine/docs/how-to/flexible-pod-cidr
kube_max_pods="110"

; Cluster network plugin; flannel and calico are currently supported
network_plugin="flannel"

; If the servers have separate system and data disks, change the following paths to directories on the data disk.
; Kubelet root directory
kubelet_root_dir="/u01/lib/kubelet"
; Docker container storage directory
docker_storage_dir="/u01/lib/docker"
; containerd container storage directory
containerd_storage_dir="/u01/lib/containerd"
; Etcd data directory
etcd_data_dir="/u01/lib/etcd"
```
  2. Step 2: write the variables.yaml file

     No changes were made.
  3. Step 3: run the deployment command:

     ansible-playbook -i inventory.ini 90-init-cluster.yml
  4. The following error occurs:

```
    TASK [load-balancer : 以轮询的方式等待 nginx 运行完成] ****************************************************************************************************************************
    

changed: [192.168.16.52]
changed: [192.168.16.70]
changed: [192.168.16.60]
FAILED - RETRYING: 以轮询的方式等待 nginx 运行完成 (12 retries left).
FAILED - RETRYING: 以轮询的方式等待 nginx 运行完成 (11 retries left).
FAILED - RETRYING: 以轮询的方式等待 nginx 运行完成 (10 retries left).
FAILED - RETRYING: 以轮询的方式等待 nginx 运行完成 (9 retries left).
FAILED - RETRYING: 以轮询的方式等待 nginx 运行完成 (8 retries left).
FAILED - RETRYING: 以轮询的方式等待 nginx 运行完成 (7 retries left).
FAILED - RETRYING: 以轮询的方式等待 nginx 运行完成 (6 retries left).
FAILED - RETRYING: 以轮询的方式等待 nginx 运行完成 (5 retries left).
FAILED - RETRYING: 以轮询的方式等待 nginx 运行完成 (4 retries left).
FAILED - RETRYING: 以轮询的方式等待 nginx 运行完成 (3 retries left).
FAILED - RETRYING: 以轮询的方式等待 nginx 运行完成 (2 retries left).
FAILED - RETRYING: 以轮询的方式等待 nginx 运行完成 (1 retries left).
fatal: [192.168.16.56]: FAILED! => {"attempts": 12, "changed": true, "cmd": "nc -z -w 3 127.0.0.1 8443", "delta": "0:00:00.063242", "end": "2021-03-15 17:20:35.532305", "msg": "non-zero return code", "rc": 1, "start": "2021-03-15 17:20:35.469063", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

NO MORE HOSTS LEFT ****************************************************************************************************************************************************

PLAY RECAP ************************************************************************************************************************************************************
192.168.16.52 : ok=63 changed=15 unreachable=0 failed=0 skipped=27 rescued=0 ignored=0
192.168.16.56 : ok=70 changed=15 unreachable=0 failed=1 skipped=30 rescued=0 ignored=0
192.168.16.60 : ok=60 changed=15 unreachable=0 failed=0 skipped=30 rescued=0 ignored=0
192.168.16.70 : ok=63 changed=15 unreachable=0 failed=0 skipped=27 rescued=0 ignored=0
```

Expected result

Installation completes normally

Screenshots
(screenshot not included)

Other notes

The firewall is disabled between the servers on the internal network.
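
The failing task is only a port probe (nc -z -w 3 127.0.0.1 8443), so it can help to re-run the same check by hand on the node that failed and look at the load-balancer container directly. A minimal diagnostic sketch, assuming Docker is the runtime and the nginx load balancer runs as a local container (the container name shown by `docker ps` may differ):

```
# Re-run the exact probe the playbook uses on the failing master (192.168.16.56 here):
nc -z -w 3 127.0.0.1 8443 && echo "lb is listening" || echo "nothing on 8443"

# Is anything bound to the load-balancer port at all?
ss -lntp | grep 8443

# Inspect the local load-balancer container; <container-id> is a placeholder taken from `docker ps -a`:
docker ps -a | grep -i nginx
docker logs <container-id>
```

If the container is restarting, its log usually shows whether the generated nginx configuration (the upstream apiserver addresses) is the problem.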

coredns image cannot be pulled; is the registry path wrong?

TASK [kube-master : 开始拉取 master 节点相关镜像] *********************************************************************************************************
changed: [192.168.0.11] => (item=registry.aliyuncs.com/kubeadm-ha/kube-apiserver:v1.20.4)
changed: [192.168.0.11] => (item=registry.aliyuncs.com/kubeadm-ha/kube-controller-manager:v1.20.4)
changed: [192.168.0.11] => (item=registry.aliyuncs.com/kubeadm-ha/kube-scheduler:v1.20.4)
changed: [192.168.0.11] => (item=registry.aliyuncs.com/kubeadm-ha/kube-proxy:v1.20.4)
changed: [192.168.0.11] => (item=registry.aliyuncs.com/kubeadm-ha/pause:3.4.1)
failed: [192.168.0.11] (item=registry.aliyuncs.com/kubeadm-ha/coredns/coredns:v1.8.0) => {"ansible_loop_var": "item", "changed": true, "cmd": "docker pull registry.aliyuncs.com/kubeadm-ha/coredns/coredns:v1.8.0", "delta": "0:00:00.476402", "end": "2021-04-11 20:54:51.240249", "item": "registry.aliyuncs.com/kubeadm-ha/coredns/coredns:v1.8.0", "msg": "non-zero return code", "rc": 1, "start": "2021-04-11 20:54:50.763847", "stderr": "Error response from daemon: manifest for registry.aliyuncs.com/kubeadm-ha/coredns/coredns:v1.8.0 not found: manifest unknown: manifest unknown", "stderr_lines": ["Error response from daemon: manifest for registry.aliyuncs.com/kubeadm-ha/coredns/coredns:v1.8.0 not found: manifest unknown: manifest unknown"], "stdout": "", "stdout_lines": []}
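
The failing item uses a nested repository path (kubeadm-ha/coredns/coredns), which the mirror rejects with `manifest unknown`. As a workaround sketch, one can check whether the image exists under the flat path and retag it locally to the name the deployment expects; the flat path and tag below are assumptions to verify first:

```
# Try the flat repository path instead of the nested one:
docker pull registry.aliyuncs.com/kubeadm-ha/coredns:v1.8.0

# Retag it to the nested name referenced by the playbook:
docker tag registry.aliyuncs.com/kubeadm-ha/coredns:v1.8.0 \
  registry.aliyuncs.com/kubeadm-ha/coredns/coredns:v1.8.0
```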

Cluster initialization fails

TASK [prepare/base : 清除 yum 缓存] *********************************************************************************************************************************************************************************
task path: /etc/ansible/roles/prepare/base/tasks/centos.yml:28
fatal: [cxs-master01]: FAILED! => {
"changed": true,
"cmd": [
"yum",
"clean",
"all"
],
"delta": "0:00:00.388357",
"end": "2021-02-04 18:31:53.233119",
"invocation": {
"module_args": {
"_raw_params": "yum clean all",
"_uses_shell": false,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"stdin_add_newline": true,
"strip_empty_ends": true,
"warn": false
}
},
"msg": "non-zero return code",
"rc": 1,
"start": "2021-02-04 18:31:52.844762",
"stderr": "There are no enabled repos.\n Run "yum repolist all" to see the repos you have.\n To enable Red Hat Subscription Management repositories:\n subscription-manager repos --enable \n To enable custom repositories:\n yum-config-manager --enable ",
"stderr_lines": [
"There are no enabled repos.",
" Run "yum repolist all" to see the repos you have.",
" To enable Red Hat Subscription Management repositories:",
" subscription-manager repos --enable ",
" To enable custom repositories:",
" yum-config-manager --enable "
],
"stdout": "Loaded plugins: fastestmirror\nLoading mirror speeds from cached hostfile",
"stdout_lines": [
"Loaded plugins: fastestmirror",
"Loading mirror speeds from cached hostfile"
]
}
Using module file /usr/lib/python3.8/site-packages/ansible/modules/command.py
Pipelining is enabled.
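
The stderr above ("There are no enabled repos.") points at the host rather than the playbook: `yum clean all` exits non-zero when no repository is enabled. A quick check to run on the affected node before retrying (repository IDs are environment specific and deliberately not filled in here):

```
# List every repository yum knows about and whether it is enabled:
yum repolist all

# After enabling at least one repository (subscription-manager or yum-config-manager,
# as the error message suggests), confirm the command the role runs now succeeds:
yum clean all
```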

Problem installing NFS on k8s 1.20

After installing NFS, PVs cannot be created. The error is: Warning FailedMount 2m14s kubelet Unable to attach or mount volumes: unmounted volumes=[nfs-client-root], unattached volumes=[nfs-client-provisioner-sa-token-9frf5 nfs-client-root]: timed out waiting for the condition
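
A FailedMount timeout on the nfs-client-root volume usually means the node itself cannot mount the export (missing NFS client packages or an unreachable export) rather than a problem inside the cluster. A rough check to run on the worker where the pod is scheduled; `<nfs-server>` and `<export-path>` are placeholders for your own values:

```
# NFS client utilities must exist on every node that mounts the volume:
rpm -q nfs-utils || yum install -y nfs-utils

# Can the node see and mount the export at all?
showmount -e <nfs-server>
mount -t nfs <nfs-server>:<export-path> /mnt && umount /mnt
```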

Can't Startup

TASK [etcd/install : 临时启动 kubelet 以引导 etcd 运行] *******************************************************************************************
changed: [10.110.150.126]
changed: [10.110.150.128]
changed: [10.110.150.127]
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (12 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (12 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (12 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (11 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (11 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (11 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (10 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (10 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (10 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (9 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (9 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (9 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (8 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (8 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (8 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (7 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (7 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (7 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (6 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (6 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (6 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (5 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (5 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (5 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (4 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (4 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (4 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (3 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (3 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (3 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (2 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (2 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (2 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (1 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (1 retries left).
FAILED - RETRYING: 以轮询的方式等待 etcd 运行完成 (1 retries left).

TASK [etcd/install : 以轮询的方式等待 etcd 运行完成] *************************************************************************************************
fatal: [10.110.150.128]: FAILED! => {"attempts": 12, "changed": true, "cmd": "docker run --net host -e ETCDCTL_API=3  -v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd --rm registry.aliyuncs.com/kubeadm-ha/etcd:3.4.13-0 etcdctl endpoint health --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt\n", "delta": "0:00:05.638651", "end": "2020-12-24 15:01:22.773392", "msg": "non-zero return code", "rc": 1, "start": "2020-12-24 15:01:17.134741", "stderr": "{\"level\":\"warn\",\"ts\":\"2020-12-24T07:01:22.550Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-9193de77-3898-43d6-bfa8-de87a9a88c8f/[127.0.0.1]:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\\\"\"}\nhttps://[127.0.0.1]:2379 is unhealthy: failed to commit proposal: context deadline exceeded\nError: unhealthy cluster", "stderr_lines": ["{\"level\":\"warn\",\"ts\":\"2020-12-24T07:01:22.550Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-9193de77-3898-43d6-bfa8-de87a9a88c8f/[127.0.0.1]:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\\\"\"}", "https://[127.0.0.1]:2379 is unhealthy: failed to commit proposal: context deadline exceeded", "Error: unhealthy cluster"], "stdout": "", "stdout_lines": []}
fatal: [10.110.150.127]: FAILED! => {"attempts": 12, "changed": true, "cmd": "docker run --net host -e ETCDCTL_API=3  -v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd --rm registry.aliyuncs.com/kubeadm-ha/etcd:3.4.13-0 etcdctl endpoint health --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt\n", "delta": "0:00:05.639123", "end": "2020-12-24 15:01:23.312397", "msg": "non-zero return code", "rc": 1, "start": "2020-12-24 15:01:17.673274", "stderr": "{\"level\":\"warn\",\"ts\":\"2020-12-24T07:01:23.099Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-764abd7e-0b5d-462c-bd63-26624b9be284/[127.0.0.1]:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\\\"\"}\nhttps://[127.0.0.1]:2379 is unhealthy: failed to commit proposal: context deadline exceeded\nError: unhealthy cluster", "stderr_lines": ["{\"level\":\"warn\",\"ts\":\"2020-12-24T07:01:23.099Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-764abd7e-0b5d-462c-bd63-26624b9be284/[127.0.0.1]:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\\\"\"}", "https://[127.0.0.1]:2379 is unhealthy: failed to commit proposal: context deadline exceeded", "Error: unhealthy cluster"], "stdout": "", "stdout_lines": []}
fatal: [10.110.150.126]: FAILED! => {"attempts": 12, "changed": true, "cmd": "docker run --net host -e ETCDCTL_API=3  -v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd --rm registry.aliyuncs.com/kubeadm-ha/etcd:3.4.13-0 etcdctl endpoint health --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt\n", "delta": "0:00:05.676692", "end": "2020-12-24 15:01:23.379762", "msg": "non-zero return code", "rc": 1, "start": "2020-12-24 15:01:17.703070", "stderr": "{\"level\":\"warn\",\"ts\":\"2020-12-24T07:01:23.109Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-fb7017d4-97ca-4953-b204-6ff049a67604/[127.0.0.1]:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\\\"\"}\nhttps://[127.0.0.1]:2379 is unhealthy: failed to commit proposal: context deadline exceeded\nError: unhealthy cluster", "stderr_lines": ["{\"level\":\"warn\",\"ts\":\"2020-12-24T07:01:23.109Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-fb7017d4-97ca-4953-b204-6ff049a67604/[127.0.0.1]:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\\\"\"}", "https://[127.0.0.1]:2379 is unhealthy: failed to commit proposal: context deadline exceeded", "Error: unhealthy cluster"], "stdout": "", "stdout_lines": []}

NO MORE HOSTS LEFT ***********************************************************************************************************************
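
Since the health probe hits 127.0.0.1:2379 on each etcd node and gets connection refused, the first thing to establish is whether the kubelet-managed etcd static pod ever started. A minimal sketch of what to look at on one of the failing nodes (Docker runtime assumed, matching the log above; `<etcd-container-id>` is a placeholder):

```
# Is the etcd container running, or crash-looping?
docker ps -a | grep etcd

# Kubelet was started temporarily to bootstrap etcd; its log shows why the pod is not up:
journalctl -u kubelet --no-pager | tail -n 50

# If the container exists, read its own log:
docker logs <etcd-container-id>

# Is anything listening on the client port?
ss -lntp | grep 2379
```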

Suggestion: bring an image registry into the offline deployment

Shouldn't a registry be started first, all images pushed to it, and the registry address defined in variables.yaml, so that every image is pulled from that local registry?
Without a registry, locally imported images get garbage-collected, every newly added worker needs images imported one by one (which is tedious), and a pod rescheduled to a new node cannot start because the image is missing.
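
For reference, the kind of setup being suggested can be prototyped with the plain Docker registry image: run it on one reachable host, push the offline images into it, and point the inventory/variables at that address. A minimal sketch; the host name, port and image name below are illustrative, not the project's defaults:

```
# Start a simple local registry on one host:
docker run -d --restart=always --name registry -p 5000:5000 registry:2

# Push an already-imported image into it so new workers can pull instead of importing tarballs:
docker tag registry.aliyuncs.com/kubeadm-ha/pause:3.4.1 <registry-host>:5000/kubeadm-ha/pause:3.4.1
docker push <registry-host>:5000/kubeadm-ha/pause:3.4.1
```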

Problem running 22-ingress-controller.yml

[root@b1xtw22-pve-192 kubeadm-ha]# ansible-playbook -i example/hosts.m-master.ip.install.ini 22-ingress-controller.yml

PLAY [kube-master[0]] *********************************************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************************************
ok: [192.168.3.23]

TASK [plugins/ingress-controller : 在第一台 master 节点创建 ingress-controller 配置文件目录] ************************************************************************************************************
ok: [192.168.3.23]

TASK [plugins/ingress-controller : 获取当前 kubernetes 版本] ************************************************************************************************************************************
changed: [192.168.3.23]
included: /root/kubeadm-ha/roles/plugins/ingress-controller/tasks/nginx-ingress-controller.yml for 192.168.3.23

TASK [plugins/ingress-controller : 渲染 nginx-ingress-controller 配置文件] **********************************************************************************************************************
fatal: [192.168.3.23]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: Unable to look up a name or access an attribute in template string (apiVersion: v1\nkind: Namespace\nmetadata:\n name: ingress-controller\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\n\n---\n\nkind: ConfigMap\napiVersion: v1\nmetadata:\n name: nginx-configuration\n namespace: ingress-controller\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\ndata:\n proxy-body-size: "0"\n server-tokens: "false"\n proxy-connect-timeout: "60"\n # log-format-upstream: >\n # '{"time": "$time_iso8601", "remote_addr": "$remote_addr", "x-forward-for": "$proxy_add_x_forwarded_for",\n # "the_real_ip": "$the_real_ip", "upstream_addr": "$upstream_addr","remote_user":"$remote_user",\n # "bytes_sent": $bytes_sent, "request_time": $request_time, "status":$status, "vhost": "$host", \n # "request_proto": "$server_protocol", "path": "$uri","request_query": "$args",\n # "request_length": $request_length, "duration": $request_time,"method": "$request_method", \n # "http_referrer": "$http_referer", "http_user_agent":"$http_user_agent",\n # "proxy_protocol_addr":"$proxy_protocol_addr" }'\n # compute-full-forwarded-for: "true"\n # forwarded-for-header: X-Forwarded-For\n # use-forwarded-headers: "true"\n # enable-vts-status: "true"\n # use-proxy-protocol: "true"\n---\nkind: ConfigMap\napiVersion: v1\nmetadata:\n name: tcp-services\n namespace: ingress-controller\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\n\n---\nkind: ConfigMap\napiVersion: v1\nmetadata:\n name: udp-services\n namespace: ingress-controller\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\n\n{% if "PodSecurityPolicy" in kube_apiserver_enable_admission_plugins %}\n---\napiVersion: policy/v1beta1\nkind: PodSecurityPolicy\nmetadata:\n name: psp-nginx-ingress\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\nspec:\n allowedCapabilities:\n - NET_BIND_SERVICE\n privileged: false\n allowPrivilegeEscalation: true\n # Allow core volume types.\n volumes:\n - 'configMap'\n #- 'emptyDir'\n #- 'projected'\n - 'secret'\n #- 'downwardAPI'\n hostNetwork: false\n hostIPC: false\n hostPID: false\n runAsUser:\n # Require the container to run without root privileges.\n rule: 'MustRunAsNonRoot'\n supplementalGroups:\n rule: 'MustRunAs'\n ranges:\n # Forbid adding the root group.\n - min: 1\n max: 65535\n fsGroup:\n rule: 'MustRunAs'\n ranges:\n # Forbid adding the root group.\n - min: 1\n max: 65535\n readOnlyRootFilesystem: false\n seLinux:\n rule: 'RunAsAny'\n{% endif %}\n---\napiVersion: v1\nkind: ServiceAccount\nmetadata:\n name: nginx-ingress-serviceaccount\n namespace: ingress-controller\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\n\n---\napiVersion: rbac.authorization.k8s.io/v1beta1\nkind: ClusterRole\nmetadata:\n name: nginx-ingress-clusterrole\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\nrules:\n{% if "PodSecurityPolicy" in kube_apiserver_enable_admission_plugins %}\n # Cluster role which grants access to the privileged pod security policy\n - apiGroups:\n - extensions\n resourceNames:\n - psp-nginx-ingress\n resources:\n - podsecuritypolicies\n verbs:\n - use\n{% endif %}\n - apiGroups:\n - ""\n resources:\n - configmaps\n - endpoints\n - nodes\n - pods\n - 
secrets\n verbs:\n - list\n - watch\n - apiGroups:\n - ""\n resources:\n - nodes\n verbs:\n - get\n - apiGroups:\n - ""\n resources:\n - services\n verbs:\n - get\n - list\n - watch\n - apiGroups:\n - ""\n resources:\n - events\n verbs:\n - create\n - patch\n - apiGroups:\n - "extensions"\n{% if kubeadm_version_output.stdout is version('v1.14.0', '>=') %}\n - "networking.k8s.io"\n{% endif %}\n resources:\n - ingresses\n verbs:\n - get\n - list\n - watch\n - apiGroups:\n - "extensions"\n{% if kubeadm_version_output.stdout is version('v1.14.0', '>=') %}\n - "networking.k8s.io"\n{% endif %}\n resources:\n - ingresses/status\n verbs:\n - update\n\n---\napiVersion: rbac.authorization.k8s.io/v1beta1\nkind: Role\nmetadata:\n name: nginx-ingress-role\n namespace: ingress-controller\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\nrules:\n - apiGroups:\n - ""\n resources:\n - configmaps\n - pods\n - secrets\n - namespaces\n verbs:\n - get\n - apiGroups:\n - ""\n resources:\n - configmaps\n resourceNames:\n # Defaults to "-"\n # Here: "-"\n # This has to be adapted if you change either parameter\n # when launching the nginx-ingress-controller.\n - "ingress-controller-leader-nginx"\n verbs:\n - get\n - update\n - apiGroups:\n - ""\n resources:\n - configmaps\n verbs:\n - create\n - apiGroups:\n - ""\n resources:\n - endpoints\n verbs:\n - get\n\n---\napiVersion: rbac.authorization.k8s.io/v1beta1\nkind: RoleBinding\nmetadata:\n name: nginx-ingress-role-nisa-binding\n namespace: ingress-controller\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\nroleRef:\n apiGroup: rbac.authorization.k8s.io\n kind: Role\n name: nginx-ingress-role\nsubjects:\n - kind: ServiceAccount\n name: nginx-ingress-serviceaccount\n namespace: ingress-controller\n\n---\napiVersion: rbac.authorization.k8s.io/v1beta1\nkind: ClusterRoleBinding\nmetadata:\n name: nginx-ingress-clusterrole-nisa-binding\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\nroleRef:\n apiGroup: rbac.authorization.k8s.io\n kind: ClusterRole\n name: nginx-ingress-clusterrole\nsubjects:\n - kind: ServiceAccount\n name: nginx-ingress-serviceaccount\n namespace: ingress-controller\n\n---\n\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: nginx-ingress-controller\n namespace: ingress-controller\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\nspec:\n replicas: {{ INGRESS_MIN_REPLICAS }}\n selector:\n matchLabels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\n template:\n metadata:\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\n annotations:\n prometheus.io/port: "10254"\n prometheus.io/scrape: "true"\n spec:\n # wait up to five minutes for the drain of connections\n terminationGracePeriodSeconds: 300\n affinity:\t\n podAntiAffinity:\t\n requiredDuringSchedulingIgnoredDuringExecution:\n - labelSelector:\n matchExpressions:\n - key: app.kubernetes.io/name\n operator: In\n values:\n - ingress-nginx\t\n topologyKey: kubernetes.io/hostname\n serviceAccountName: nginx-ingress-serviceaccount\n nodeSelector:\n beta.kubernetes.io/os: linux\n containers:\n - name: nginx-ingress-controller\n image: {{ nginx_ingress_image }}\n args:\n - /nginx-ingress-controller\n - --configmap=$(POD_NAMESPACE)/nginx-configuration\n - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services\n - 
--udp-services-configmap=$(POD_NAMESPACE)/udp-services\n - --publish-service=$(POD_NAMESPACE)/ingress-nginx\n - --annotations-prefix=nginx.ingress.kubernetes.io\n securityContext:\n allowPrivilegeEscalation: true\n capabilities:\n drop:\n - ALL\n add:\n - NET_BIND_SERVICE\n # www-data -> 101\n runAsUser: 101\n env:\n - name: POD_NAME\n valueFrom:\n fieldRef:\n fieldPath: metadata.name\n - name: POD_NAMESPACE\n valueFrom:\n fieldRef:\n fieldPath: metadata.namespace\n ports:\n - name: http\n containerPort: 80\n protocol: TCP\n - name: https\n containerPort: 443\n protocol: TCP\n livenessProbe:\n failureThreshold: 3\n httpGet:\n path: /healthz\n port: 10254\n scheme: HTTP\n initialDelaySeconds: 10\n periodSeconds: 10\n successThreshold: 1\n timeoutSeconds: 10\n readinessProbe:\n failureThreshold: 3\n httpGet:\n path: /healthz\n port: 10254\n scheme: HTTP\n periodSeconds: 10\n successThreshold: 1\n timeoutSeconds: 10\n lifecycle:\n preStop:\n exec:\n command:\n - /wait-shutdown\n resources:\n limits:\n cpu: 1000m\n memory: 1Gi\n requests:\n cpu: 500m\n memory: 512Mi\n---\n\napiVersion: v1\nkind: Service\nmetadata:\n name: ingress-nginx\n namespace: ingress-controller\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\nspec:\n type: NodePort\n externalTrafficPolicy: {{ ingress_controller_external_traffic_policy }}\n{% if kube_proxy_mode == "iptables" %}\n externalIPs:\n{{ INGRESS_EXTERNALIPS }}\n{% endif %}\n ports:\n - name: http\n port: 80\n targetPort: 80\n protocol: TCP\n nodePort: {{ ingress_controller_http_nodeport }}\n - name: https\n port: 443\n targetPort: 443\n protocol: TCP\n nodePort: {{ ingress_controller_https_nodeport }}\n selector:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\n---\napiVersion: autoscaling/v2beta1\nkind: HorizontalPodAutoscaler\nmetadata:\n name: nginx-ingress-controller\n namespace: ingress-controller\n labels:\n app.kubernetes.io/name: ingress-nginx\n app.kubernetes.io/part-of: ingress-nginx\nspec:\n scaleTargetRef:\n apiVersion: apps/v1\n kind: Deployment\n name: nginx-ingress-controller\n minReplicas: {{ INGRESS_MIN_REPLICAS }}\n maxReplicas: {{ INGRESS_MAX_REPLICAS }}\n metrics:\n - resource:\n name: cpu\n targetAverageUtilization: 60\n type: Resource\n - resource:\n name: memory\n targetAverageUtilization: 60\n type: Resource).\nMake sure your variable name does not contain invalid characters like '-': argument of type 'AnsibleUndefined' is not iterable"}

NO MORE HOSTS LEFT ************************************************************************************************************************************************************************

PLAY RECAP ********************************************************************************************************************************************************************************
192.168.3.23 : ok=4 changed=1 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
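
The AnsibleUndefinedVariable error means the nginx-ingress template referenced a variable that was never loaded, which typically happens when 22-ingress-controller.yml is run against an inventory/variables file that does not define everything the template expects (the dump above references, among others, kube_apiserver_enable_admission_plugins, INGRESS_MIN_REPLICAS, INGRESS_MAX_REPLICAS and ingress_controller_http_nodeport). A quick way to see which of them are missing before re-running; file locations follow the example layout used in this issue and may differ in your checkout:

```
# Check whether the variables used by the template are defined anywhere in your config:
for v in kube_apiserver_enable_admission_plugins INGRESS_MIN_REPLICAS \
         INGRESS_MAX_REPLICAS ingress_controller_http_nodeport; do
  echo "== $v =="
  grep -rn "$v" example/ variables.yaml 2>/dev/null || echo "not defined"
done
```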

Image sync question

Following this project, I used the setzero/images-sync:0.3.1 image to sync images to a private Harbor registry and got an error. My private registry is reachable over the public internet. Is the source code of this image open source? I would like to look into what is going wrong.

http: server gave HTTP response to HTTPS client

Following the offline installation procedure on Ubuntu, this error occurs (in theory CentOS would hit it as well): containerd/config.toml.j2 has no HTTPS-related registry configuration; something like the following is needed

      [plugins."io.containerd.grpc.v1.cri".registry.configs."registry.custom.local:12480".tls]
          insecure_skip_verify = true
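
A rough sketch of applying that change by hand on an affected node; the registry address comes from the error above, and appending only works if the same table is not already defined elsewhere in /etc/containerd/config.toml (otherwise merge it in manually):

```
# Mark the offline registry as insecure for containerd's CRI plugin:
cat <<'EOF' | tee -a /etc/containerd/config.toml
  [plugins."io.containerd.grpc.v1.cri".registry.configs."registry.custom.local:12480".tls]
    insecure_skip_verify = true
EOF

systemctl restart containerd

# Verify with any image hosted on that registry (the image path is illustrative):
crictl pull registry.custom.local:12480/kubeadm-ha/pause:3.4.1
```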

Kernel parameter kernel/panic conflicts with kubelet validation; master cannot start after a reboot

Bug description

After the master machine is rebooted, kubectl commands can no longer be run.
(screenshot not included)

Environment (please fill in the following information):

Run the commands in parentheses below and provide the output

  • Linux 192-168-252-26 5.4.92-1.el7.elrepo.x86_64 #1 SMP Sat Jan 23 09:25:42 EST 2021 x86_64 x86_64 x86_64 GNU/Linux

  • Ansible version (ansible --version): ansible 2.10.9

  • Python version (python --version): Python 2.7.5

  • Kubeadm-ha version (commit) (git rev-parse --short HEAD): latest version

How to reproduce

Steps to reproduce:

  1. Install normally

  2. Start normally

  3. Reboot the master machine

  4. The error occurs
    (screenshot not included)

Expected result

kubectl works normally

Screenshots

Output of journalctl -xefu kubelet (screenshot not included)

Other notes

The behaviour is the same with hosts.m-master.ip.ini and hosts.s-master.ip.ini.
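
Given the title, a likely suspect is kubelet's protect-kernel-defaults check: after a reboot the sysctls it validates (kernel.panic, kernel.panic_on_oops, vm.overcommit_memory) fall back to values kubelet refuses, so kubelet never starts and kubectl cannot reach the apiserver on that master. A hedged sketch of how to inspect and persist the values kubelet conventionally expects; confirm the exact complaint in the kubelet log first:

```
# Compare the current values with what the protect-kernel-defaults check expects:
sysctl kernel.panic kernel.panic_on_oops vm.overcommit_memory

# Persist the commonly expected values, reload, and restart kubelet:
cat <<'EOF' | tee /etc/sysctl.d/99-kubelet.conf
kernel.panic = 10
kernel.panic_on_oops = 1
vm.overcommit_memory = 1
EOF
sysctl --system
systemctl restart kubelet
```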

Additional configuration needed for newer Ansible versions

  1. Newer Ansible versions require sshpass to be installed on the control host, otherwise the run fails.

Suggest adding a note for macOS to the documentation:

# https://gist.github.com/arunoda/7790979
$ brew install http://git.io/sshpass.rb
  2. Newer Ansible versions print a cowsay banner for every task by default; it is recommended to disable this in ansible.cfg by default:

ansible.cfg

[defaults]
nocows = 1

Some errors when deploying a single-node k8s environment in a VM; are they normal?

VM environment: 2 cores, 4 GB RAM (which seems to be exactly what the deploy script checks for)
The errors are as follows:
TASK [plugins/ingress-controller : 轮询等待 nginx-ingress-controller 运行] ***********
fatal: [192.168.68.130]: FAILED! => {"attempts": 12, "changed": true, "cmd": "kubectl get pod --all-namespaces -o wide | grep 'nginx-ingress' | awk '{print $4}'", "delta": "0:00:00.089931", "end": "2019-06-26 20:41:20.396806", "rc": 0, "start": "2019-06-26 20:41:20.306875", "stderr": "", "stderr_lines": [], "stdout": "ContainerCreating", "stdout_lines": ["ContainerCreating"]}
...ignoring

TASK [plugins/metrics-server : 轮询等待 metrics-server 运行] *************************
fatal: [192.168.68.130]: FAILED! => {"attempts": 12, "changed": true, "cmd": "kubectl get pod --all-namespaces -o wide | grep 'metrics-server' | awk '{print $4}'", "delta": "0:00:00.081390", "end": "2019-06-26 20:42:24.123638", "rc": 0, "start": "2019-06-26 20:42:24.042248", "stderr": "", "stderr_lines": [], "stdout": "ContainerCreating", "stdout_lines": ["ContainerCreating"]}
...ignoring

TASK [plugins/kubernetes-dashboard : 轮询等待 kubernetes-dashboard 运行] *************
fatal: [192.168.68.130]: FAILED! => {"attempts": 12, "changed": true, "cmd": "kubectl get pod --all-namespaces -o wide | grep 'kubernetes-dashboard' | awk '{print $4}'", "delta": "0:00:00.060985", "end": "2019-06-26 20:43:27.131886", "rc": 0, "start": "2019-06-26 20:43:27.070901", "stderr": "", "stderr_lines": [], "stdout": "ContainerCreating", "stdout_lines": ["ContainerCreating"]}
...ignoring
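
All three failures end in `...ignoring` with the pods still in ContainerCreating, so on a 2-core/4-GB VM this is often just the add-on images taking longer than the 12 retries allow rather than a hard error. A quick way to confirm after the playbook finishes:

```
# If the pods eventually reach Running, the ignored timeouts were harmless:
kubectl get pod --all-namespaces -o wide

# If one stays in ContainerCreating, its events explain why (image pull, volume, etc.);
# <pod-name> and <namespace> are placeholders from the output above:
kubectl describe pod <pod-name> -n <namespace>
```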

Problem encountered during installation

Before the production installation I tested many times in virtual machines without any problem; when run on the servers, the following error appeared.
Environment: CentOS 7 with the GNOME desktop installed, development environment

fatal: [192.168.20.86]: FAILED! => {"changed": true, "cmd": "kubectl apply -f /etc/kubernetes/plugins/network-plugin/calico-typha.yaml", "delta": "0:00:00.485489", "end": "2020-12-01 09:57:47.017930", "msg": "non-zero return code", "rc": 1, "start": "2020-12-01 09:57:46.532441", "stderr": "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"\nunable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "stderr_lines": ["unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind 
"CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1"", "unable to recognize "/etc/kubernetes/plugins/network-plugin/calico-typha.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1""], "stdout": "configmap/calico-config created\nclusterrole.rbac.authorization.k8s.io/calico-kube-controllers created\nclusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created\nclusterrole.rbac.authorization.k8s.io/calico-node created\nclusterrolebinding.rbac.authorization.k8s.io/calico-node created\nservice/calico-typha created\ndeployment.apps/calico-typha created\npoddisruptionbudget.policy/calico-typha created\ndaemonset.apps/calico-node created\nserviceaccount/calico-node created\ndeployment.apps/calico-kube-controllers created\nserviceaccount/calico-kube-controllers created", "stdout_lines": ["configmap/calico-config created", "clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created", "clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created", "clusterrole.rbac.authorization.k8s.io/calico-node created", "clusterrolebinding.rbac.authorization.k8s.io/calico-node created", "service/calico-typha created", "deployment.apps/calico-typha created", "poddisruptionbudget.policy/calico-typha created", "daemonset.apps/calico-node created", "serviceaccount/calico-node created", "deployment.apps/calico-kube-controllers created", "serviceaccount/calico-kube-controllers created"]}

Root certificate renewal rules are inconsistent

As follows:
pki/etcd/ca.csr
pki/ca.csr
are not renewed automatically,

while pki/front-proxy-ca.crt is renewed automatically.

```
- name: 创建 etcd-ca 根证书
  shell: >
    openssl req -x509 -new -nodes -extensions v3_ca
    -subj "/CN=kubernetes"
    -config /etc/kubernetes/pki/etcd/etcd-openssl.cnf
    -key /etc/kubernetes/pki/etcd/ca.key
    -out /etc/kubernetes/pki/etcd/ca.crt
    -days {{ etcd_ca_certs_expired }}
  when: etcd_ca_crt_stat.stat.isreg is not defined

- name: 创建 kubernetes-ca 根证书
  shell: >
    openssl req -x509 -new -nodes
    -extensions v3_ca -subj "/CN=kubernetes"
    -config /etc/kubernetes/pki/kube-openssl.cnf
    -key /etc/kubernetes/pki/ca.key
    -out /etc/kubernetes/pki/ca.crt
    -days {{ kube_ca_certs_expired }}
  when: ca_crt_stat.stat.isreg is not defined

- name: 创建 front-proxy-ca 根证书
  shell: >
    openssl req -x509 -new -nodes
    -extensions v3_ca -subj "/CN=kubernetes"
    -config /etc/kubernetes/pki/kube-openssl.cnf
    -key /etc/kubernetes/pki/front-proxy-ca.key
    -out /etc/kubernetes/pki/front-proxy-ca.crt
    -days {{ kube_ca_certs_expired }}
```
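
A small sketch for comparing the three CA certificates' expiry dates directly, which makes the inconsistency described above easy to verify on a running cluster:

```
for crt in /etc/kubernetes/pki/ca.crt \
           /etc/kubernetes/pki/front-proxy-ca.crt \
           /etc/kubernetes/pki/etcd/ca.crt; do
  echo "== $crt =="
  openssl x509 -noout -enddate -in "$crt"
done
```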

Question about reserving resources on worker nodes

When configuring resource reservation on kubeadm worker nodes,
kubelet fails to start with: Dec 16 19:29:15 k8s-master01 kubelet[2960]: F1216 19:29:15.930514 2960 kubelet.go:1372] Failed to start ContainerManager Failed to enforce System Reserved Cgroup Limits on "/system.slice": ["system.slice"] cgroup does not exist

Creating system.slice manually did not help either; see the sketch below.
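
The error comes from kubelet trying to enforce System Reserved limits on a cgroup (system.slice) that does not exist under every controller. One commonly used workaround, shown here only as a sketch with illustrative values, is to keep the reservations but enforce only the pods allocatable; the fields below are standard KubeletConfiguration options and should be merged into /var/lib/kubelet/config.yaml rather than overwriting it:

```
# In /var/lib/kubelet/config.yaml, merge something like:
#
#   enforceNodeAllocatable:
#     - pods
#   systemReserved:
#     cpu: "500m"
#     memory: "512Mi"
#
# then restart kubelet and confirm ContainerManager starts cleanly:
systemctl restart kubelet
journalctl -u kubelet --no-pager | grep -i containermanager | tail -n 5
```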

Installation fails

I am not sure what is special about my setup; it fails when it reaches the nginx installation step.
OS: CentOS 7.4
Number of nodes: 3
Apart from setting vip_Ip,
everything else uses the default configuration.

CentOS 8 installation fails

PLAY [all] *********************************************************************

TASK [设置代理服务器环境变量] *************************************************************
ok: [192.168.85.131]

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
ok: [192.168.85.131]
included: /opt/k8s/git/kubeadm-ha/roles/prepare/base/tasks/verify_variables.yml for 192.168.85.131

TASK [prepare/base : 校验 NodeName 是否合法] *****************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 etcd 节点数量] *********************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 master 节点数量] *******************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 worker 节点数量] *******************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 new-etcd 节点组数量] ****************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 etcd 节点数量] *********************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 lb 模式类型] ***********************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 lb 模式端口设置] *********************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 kube-proxy 模式类型] ***************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 ingress controller node port 端口设置] *********************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 ingress controller http node port 端口设置] ****************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 ingress controller https node port 端口设置] ***************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}
included: /opt/k8s/git/kubeadm-ha/roles/prepare/base/tasks/verify_node.yml for 192.168.85.131

TASK [prepare/base : 校验节点操作系统] *************************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验节点 systemd 类型操作系统] **************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验节点系统内核] *************************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验节点系统架构] *************************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验节点系统版本] *************************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 master 节点内存] *******************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 worker 节点内存] *******************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 master 节点CPU核数] ****************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}

TASK [prepare/base : 校验 worker 节点CPU核数] ****************************************
ok: [192.168.85.131] => {
"changed": false,
"msg": "All assertions passed"
}
included: /opt/k8s/git/kubeadm-ha/roles/prepare/base/tasks/common.yml for 192.168.85.131

TASK [prepare/base : 统一时区为 Asia/Shanghai] **************************************
ok: [192.168.85.131]

TASK [prepare/base : 禁用系统 swap] ************************************************
changed: [192.168.85.131]

TASK [prepare/base : 删除 fstab swap 相关配置] ***************************************
changed: [192.168.85.131]

TASK [prepare/base : 加载内核模块] ***************************************************
ok: [192.168.85.131] => (item=sunrpc)
changed: [192.168.85.131] => (item=ip_vs)
changed: [192.168.85.131] => (item=ip_vs_rr)
changed: [192.168.85.131] => (item=ip_vs_sh)
changed: [192.168.85.131] => (item=ip_vs_wrr)
changed: [192.168.85.131] => (item=br_netfilter)

TASK [prepare/base : 加载nf_conntrack for kernel < 4.19] *************************
ok: [192.168.85.131]

TASK [prepare/base : 设置 systemd-modules-load 配置] *******************************
changed: [192.168.85.131]

TASK [prepare/base : 启动/重启 systemd-modules-load] *******************************
changed: [192.168.85.131]

TASK [prepare/base : 设置系统参数] ***************************************************
changed: [192.168.85.131]

TASK [prepare/base : 生效系统参数] ***************************************************
changed: [192.168.85.131]

TASK [prepare/base : 优化 nfs clinet 配置] *****************************************
changed: [192.168.85.131]

TASK [prepare/base : 生效 nfs clinet 配置] *****************************************
changed: [192.168.85.131]

TASK [prepare/base : 添加集群节点 hostname 信息到 hosts 文件中] ****************************
ok: [192.168.85.131]

TASK [prepare/base : 确认 hosts 文件中 localhost ipv4 配置正确] *************************
changed: [192.168.85.131]

TASK [prepare/base : 确认 hosts 文件中 localhost ipv6 配置正确] *************************
changed: [192.168.85.131]

TASK [prepare/base : 创建 systemd 配置目录] ******************************************
changed: [192.168.85.131]

TASK [prepare/base : 设置系统 ulimits] *********************************************
changed: [192.168.85.131]
included: /opt/k8s/git/kubeadm-ha/roles/prepare/base/tasks/centos.yml for 192.168.85.131

TASK [prepare/base : 判断 firewalld 是否安装] ****************************************
changed: [192.168.85.131]

TASK [prepare/base : 禁用防火墙] ****************************************************
changed: [192.168.85.131]

TASK [prepare/base : 设置 yum obsoletes 值为 0] ************************************
changed: [192.168.85.131]

TASK [prepare/base : 添加 epel 仓库] ***********************************************
changed: [192.168.85.131]

TASK [prepare/base : 刷新 yum 缓存] ************************************************
[WARNING]: Consider using the yum module rather than running 'yum'. If you
need to use command because yum is insufficient you can add 'warn: false' to
this command task or set 'command_warnings=False' in ansible.cfg to get rid of
this message.

fatal: [192.168.85.131]: FAILED! => {"changed": true, "cmd": "yum clean all && yum makecache fast", "delta": "0:00:00.697239", "end": "2020-11-23 17:29:17.150879", "msg": "non-zero return code", "rc": 2, "start": "2020-11-23 17:29:16.453640", "stderr": "usage: yum makecache [-c [config file]] [-q] [-v] [--version]\n [--installroot [path]] [--nodocs] [--noplugins]\n [--enableplugin [plugin]] [--disableplugin [plugin]]\n [--releasever RELEASEVER] [--setopt SETOPTS]\n [--skip-broken] [-h] [--allowerasing] [-b | --nobest]\n [-C] [-R [minutes]] [-d [debug level]] [--debugsolver]\n [--showduplicates] [-e ERRORLEVEL] [--obsoletes]\n [--rpmverbosity [debug level name]] [-y] [--assumeno]\n [--enablerepo [repo]] [--disablerepo [repo] | --repo\n [repo]] [--enable | --disable] [-x [package]]\n [--disableexcludes [repo]] [--repofrompath [repo,path]]\n [--noautoremove] [--nogpgcheck] [--color COLOR]\n [--refresh] [-4] [-6] [--destdir DESTDIR]\n [--downloadonly] [--comment COMMENT] [--bugfix]\n [--enhancement] [--newpackage] [--security]\n [--advisory ADVISORY] [--bz BUGZILLA] [--cve CVES]\n [--sec-severity {Critical,Important,Moderate,Low}]\n [--forcearch ARCH] [--timer]\nyum makecache: error: argument timer: invalid choice: 'fast' (choose from 'timer')", "stderr_lines": ["usage: yum makecache [-c [config file]] [-q] [-v] [--version]", " [--installroot [path]] [--nodocs] [--noplugins]", " [--enableplugin [plugin]] [--disableplugin [plugin]]", " [--releasever RELEASEVER] [--setopt SETOPTS]", " [--skip-broken] [-h] [--allowerasing] [-b | --nobest]", " [-C] [-R [minutes]] [-d [debug level]] [--debugsolver]", " [--showduplicates] [-e ERRORLEVEL] [--obsoletes]", " [--rpmverbosity [debug level name]] [-y] [--assumeno]", " [--enablerepo [repo]] [--disablerepo [repo] | --repo", " [repo]] [--enable | --disable] [-x [package]]", " [--disableexcludes [repo]] [--repofrompath [repo,path]]", " [--noautoremove] [--nogpgcheck] [--color COLOR]", " [--refresh] [-4] [-6] [--destdir DESTDIR]", " [--downloadonly] [--comment COMMENT] [--bugfix]", " [--enhancement] [--newpackage] [--security]", " [--advisory ADVISORY] [--bz BUGZILLA] [--cve CVES]", " [--sec-severity {Critical,Important,Moderate,Low}]", " [--forcearch ARCH] [--timer]", "yum makecache: error: argument timer: invalid choice: 'fast' (choose from 'timer')"], "stdout": "39 文件已删除", "stdout_lines": ["39 文件已删除"]}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
192.168.85.131 : ok=47 changed=17 unreachable=0 failed=1 skipped=14 rescued=0 ignored=0

[root@localhost kubeadm-ha]# kubectl get cs
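
The fatal task above boils down to `yum clean all && yum makecache fast`, and on CentOS 8 yum is dnf, whose makecache no longer accepts `fast` (the usage text in the error shows only `--timer`). This can be confirmed by hand on the node; the playbook itself needs a CentOS 8 aware branch for this step:

```
# On CentOS 8 (dnf), "fast" is rejected:
yum makecache fast   # fails exactly as in the log above

# The equivalent that works on dnf:
yum clean all && yum makecache
```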

Other cluster nodes cannot telnet to port 5000, so images cannot be pulled either; is there a problem with the iptables rules?

TASK [load-balancer : Nginx lb | 拉取相关镜像] ****************************************************************************************
task path: /etc/ansible/roles/load-balancer/tasks/nginx.yml:9
failed: [cxs-master01] (item=registry.custom.local:5000/kubeadm-ha/nginx:1.19-alpine) => {
"ansible_loop_var": "item",
"changed": true,
"cmd": "docker pull registry.custom.local:5000/kubeadm-ha/nginx:1.19-alpine",
"delta": "0:00:00.217152",
"end": "2021-02-05 11:52:08.815262",
"invocation": {
"module_args": {
"_raw_params": "docker pull registry.custom.local:5000/kubeadm-ha/nginx:1.19-alpine",
"_uses_shell": true,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"stdin_add_newline": true,
"strip_empty_ends": true,
"warn": true
}
},
"item": "registry.custom.local:5000/kubeadm-ha/nginx:1.19-alpine",
"msg": "non-zero return code",
"rc": 1,
"start": "2021-02-05 11:52:08.598110",
"stderr": "Error response from daemon: Get http://registry.custom.local:5000/v2/: dial tcp 172.19.70.37:5000: connect: connec
"stderr_lines": [
"Error response from daemon: Get http://registry.custom.local:5000/v2/: dial tcp 172.19.70.37:5000: connect: connection r
],
"stdout": "",
"stdout_lines": []
}
Using module file /usr/lib/python3.8/site-packages/ansible/modules/command.py
Pipelining is enabled.
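
Since the pull fails with `dial tcp 172.19.70.37:5000: connect: connection refused`, it helps to separate "registry not listening" from "traffic blocked between nodes" before blaming iptables. A rough checklist; the host name and IP are the ones from the log above:

```
# On the registry host (172.19.70.37): is anything listening on 5000?
ss -lntp | grep 5000

# From another cluster node: is the port reachable at all?
nc -z -w 3 registry.custom.local 5000; echo "exit code: $?"

# If it is not reachable, check the host firewall on the registry node:
systemctl is-active firewalld && firewall-cmd --list-ports
iptables -S | grep 5000
```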

Unable to install

{"changed": false, "changes": {"installed": ["kubectl-1.16.9", "kubelet-1.16.9", "kubeadm-1.16.9", "kubernetes-cni-0.7.5"]}, "msg": "Error: Package: kubeadm-1.16.9-0.x86_64 (kubernetes)
Requires: kubernetes-cni >= 0.7.5
Available: kubernetes-cni-0.3.0.1-0.07a8a2.x86_64 (kubernetes)
kubernetes-cni = 0.3.0.1-0.07a8a2
Available: kubernetes-cni-0.5.1-0.x86_64 (kubernetes)
kubernetes-cni = 0.5.1-0
Available: kubernetes-cni-0.5.1-1.x86_64 (kubernetes)
kubernetes-cni = 0.5.1-1
Available: kubernetes-cni-0.6.0-0.x86_64 (kubernetes)
kubernetes-cni = 0.6.0-0
Available: kubernetes-cni-0.7.5-0.x86_64 (kubernetes)
kubernetes-cni = 0.7.5-0
Error: Package: kubelet-1.16.9-0.x86_64 (kubernetes)
Requires: kubernetes-cni >= 0.7.5
Available: kubernetes-cni-0.3.0.1-0.07a8a2.x86_64 (kubernetes)
kubernetes-cni = 0.3.0.1-0.07a8a2
Available: kubernetes-cni-0.5.1-0.x86_64 (kubernetes)
kubernetes-cni = 0.5.1-0
Available: kubernetes-cni-0.5.1-1.x86_64 (kubernetes)
kubernetes-cni = 0.5.1-1
Available: kubernetes-cni-0.6.0-0.x86_64 (kubernetes)
kubernetes-cni = 0.6.0-0
Available: kubernetes-cni-0.7.5-0.x86_64 (kubernetes)
kubernetes-cni = 0.7.5-0
", "rc": 1, "results": ["Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile

  • base: mirrors.aliyun.com
  • extras: mirrors.aliyun.com
  • updates: mirrors.aliyun.com
    Package kubernetes-cni is obsoleted by kubelet, trying to install kubelet-1.18.4-0.x86_64 instead
    Resolving Dependencies
    --> Running transaction check
    ---> Package kubeadm.x86_64 0:1.16.9-0 will be installed
    --> Processing Dependency: kubernetes-cni >= 0.7.5 for package: kubeadm-1.16.9-0.x86_64
    Package kubernetes-cni is obsoleted by kubelet, but obsoleting package does not provide for requirements
    ---> Package kubectl.x86_64 0:1.16.9-0 will be installed
    ---> Package kubelet.x86_64 0:1.16.9-0 will be installed
    --> Processing Dependency: kubernetes-cni >= 0.7.5 for package: kubelet-1.16.9-0.x86_64
    Package kubernetes-cni is obsoleted by kubelet, but obsoleting package does not provide for requirements
    ---> Package kubelet.x86_64 0:1.18.4-0 will be installed
    --> Finished Dependency Resolution
    You could try using --skip-broken to work around the problem
    ** Found 2 pre-existing rpmdb problem(s), 'yum check' output follows:
    2:postfix-2.10.1-7.el7.x86_64 has missing requires of libmysqlclient.so.18()(64bit)
    2:postfix-2.10.1-7.el7.x86_64 has missing requires of libmysqlclient.so.18(libmysqlclient_18)(64bit)
    "]}
    fatal: [172.16.116.217]: FAILED! => {"changed": false, "changes": {"installed": ["kubectl-1.16.9", "kubelet-1.16.9", "kubeadm-1.16.9", "kubernetes-cni-0.7.5"]}, "msg": "Error: Package: kubeadm-1.16.9-0.x86_64 (kubernetes)
    Requires: kubernetes-cni >= 0.7.5
    Available: kubernetes-cni-0.3.0.1-0.07a8a2.x86_64 (kubernetes)
    kubernetes-cni = 0.3.0.1-0.07a8a2
    Available: kubernetes-cni-0.5.1-0.x86_64 (kubernetes)
    kubernetes-cni = 0.5.1-0
    Available: kubernetes-cni-0.5.1-1.x86_64 (kubernetes)
    kubernetes-cni = 0.5.1-1
    Available: kubernetes-cni-0.6.0-0.x86_64 (kubernetes)
    kubernetes-cni = 0.6.0-0
    Available: kubernetes-cni-0.7.5-0.x86_64 (kubernetes)
    kubernetes-cni = 0.7.5-0
    Error: Package: kubelet-1.16.9-0.x86_64 (kubernetes)
    Requires: kubernetes-cni >= 0.7.5
    Available: kubernetes-cni-0.3.0.1-0.07a8a2.x86_64 (kubernetes)
    kubernetes-cni = 0.3.0.1-0.07a8a2
    Available: kubernetes-cni-0.5.1-0.x86_64 (kubernetes)
    kubernetes-cni = 0.5.1-0
    Available: kubernetes-cni-0.5.1-1.x86_64 (kubernetes)
    kubernetes-cni = 0.5.1-1
    Available: kubernetes-cni-0.6.0-0.x86_64 (kubernetes)
    kubernetes-cni = 0.6.0-0
    Available: kubernetes-cni-0.7.5-0.x86_64 (kubernetes)
    kubernetes-cni = 0.7.5-0
    ", "rc": 1, "results": ["Loaded plugins: fastestmirror, product-id, search-disabled-repos, subscription-
    : manager

This system is not registered with an entitlement server. You can use subscription-manager to register.

Loading mirror speeds from cached hostfile

  • base: mirrors.aliyun.com
  • extras: mirrors.aliyun.com
  • updates: mirrors.aliyun.com
    Package kubernetes-cni is obsoleted by kubelet, trying to install kubelet-1.18.4-0.x86_64 instead
    Resolving Dependencies
    --> Running transaction check
    ---> Package kubeadm.x86_64 0:1.16.9-0 will be installed
    --> Processing Dependency: kubernetes-cni >= 0.7.5 for package: kubeadm-1.16.9-0.x86_64
    Package kubernetes-cni is obsoleted by kubelet, but obsoleting package does not provide for requirements
    ---> Package kubectl.x86_64 0:1.16.9-0 will be installed
    ---> Package kubelet.x86_64 0:1.16.9-0 will be installed
    --> Processing Dependency: kubernetes-cni >= 0.7.5 for package: kubelet-1.16.9-0.x86_64
    Package kubernetes-cni is obsoleted by kubelet, but obsoleting package does not provide for requirements
    ---> Package kubelet.x86_64 0:1.18.4-0 will be installed
    --> Finished Dependency Resolution
    You could try using --skip-broken to work around the problem
    ** Found 2 pre-existing rpmdb problem(s), 'yum check' output follows:
    2:postfix-2.10.1-7.el7.x86_64 has missing requires of libmysqlclient.so.18()(64bit)
    2:postfix-2.10.1-7.el7.x86_64 has missing requires of libmysqlclient.so.18(libmysqlclient_18)(64bit)
    "]}
    fatal: [172.16.116.215]: FAILED! => {"changed": false, "changes": {"installed": ["kubectl-1.16.9", "kubelet-1.16.9", "kubeadm-1.16.9", "kubernetes-cni-0.7.5"]}, "msg": "Error: Package: kubeadm-1.16.9-0.x86_64 (kubernetes)
    Requires: kubernetes-cni >= 0.7.5
    Available: kubernetes-cni-0.3.0.1-0.07a8a2.x86_64 (kubernetes)
    kubernetes-cni = 0.3.0.1-0.07a8a2
    Available: kubernetes-cni-0.5.1-0.x86_64 (kubernetes)
    kubernetes-cni = 0.5.1-0
    Available: kubernetes-cni-0.5.1-1.x86_64 (kubernetes)
    kubernetes-cni = 0.5.1-1
    Available: kubernetes-cni-0.6.0-0.x86_64 (kubernetes)
    kubernetes-cni = 0.6.0-0
    Available: kubernetes-cni-0.7.5-0.x86_64 (kubernetes)
    kubernetes-cni = 0.7.5-0
    Error: Package: kubelet-1.16.9-0.x86_64 (kubernetes)
    Requires: kubernetes-cni >= 0.7.5
    Available: kubernetes-cni-0.3.0.1-0.07a8a2.x86_64 (kubernetes)
    kubernetes-cni = 0.3.0.1-0.07a8a2
    Available: kubernetes-cni-0.5.1-0.x86_64 (kubernetes)
    kubernetes-cni = 0.5.1-0
    Available: kubernetes-cni-0.5.1-1.x86_64 (kubernetes)
    kubernetes-cni = 0.5.1-1
    Available: kubernetes-cni-0.6.0-0.x86_64 (kubernetes)
    kubernetes-cni = 0.6.0-0
    Available: kubernetes-cni-0.7.5-0.x86_64 (kubernetes)
    kubernetes-cni = 0.7.5-0
    ", "rc": 1, "results": ["Loaded plugins: fastestmirror, product-id, search-disabled-repos, subscription-
    : manager

This system is not registered with an entitlement server. You can use subscription-manager to register.

Loading mirror speeds from cached hostfile

  • base: mirrors.aliyun.com
  • extras: mirrors.aliyun.com
  • updates: mirrors.aliyun.com
    Package kubernetes-cni is obsoleted by kubelet, trying to install kubelet-1.18.4-0.x86_64 instead
    Resolving Dependencies
    --> Running transaction check
    ---> Package kubeadm.x86_64 0:1.16.9-0 will be installed
    --> Processing Dependency: kubernetes-cni >= 0.7.5 for package: kubeadm-1.16.9-0.x86_64
    Package kubernetes-cni is obsoleted by kubelet, but obsoleting package does not provide for requirements
    ---> Package kubectl.x86_64 0:1.16.9-0 will be installed
    ---> Package kubelet.x86_64 0:1.16.9-0 will be installed
    --> Processing Dependency: kubernetes-cni >= 0.7.5 for package: kubelet-1.16.9-0.x86_64
    Package kubernetes-cni is obsoleted by kubelet, but obsoleting package does not provide for requirements
    ---> Package kubelet.x86_64 0:1.18.4-0 will be installed
    --> Finished Dependency Resolution
    You could try using --skip-broken to work around the problem
    You could try running: rpm -Va --nofiles --nodigest
    "]}
    fatal: [172.16.116.209]: FAILED! => {"changed": false, "changes": {"installed": ["kubectl-1.16.9", "kubelet-1.16.9", "kubeadm-1.16.9", "kubernetes-cni-0.7.5"]}, "msg": "Error: Package: kubeadm-1.16.9-0.x86_64 (kubernetes)
    Requires: kubernetes-cni >= 0.7.5
    Available: kubernetes-cni-0.3.0.1-0.07a8a2.x86_64 (kubernetes)
    kubernetes-cni = 0.3.0.1-0.07a8a2
    Available: kubernetes-cni-0.5.1-0.x86_64 (kubernetes)
    kubernetes-cni = 0.5.1-0
    Available: kubernetes-cni-0.5.1-1.x86_64 (kubernetes)
    kubernetes-cni = 0.5.1-1
    Available: kubernetes-cni-0.6.0-0.x86_64 (kubernetes)
    kubernetes-cni = 0.6.0-0
    Available: kubernetes-cni-0.7.5-0.x86_64 (kubernetes)
    kubernetes-cni = 0.7.5-0
    Error: Package: kubelet-1.16.9-0.x86_64 (kubernetes)
    Requires: kubernetes-cni >= 0.7.5
    Available: kubernetes-cni-0.3.0.1-0.07a8a2.x86_64 (kubernetes)
    kubernetes-cni = 0.3.0.1-0.07a8a2
    Available: kubernetes-cni-0.5.1-0.x86_64 (kubernetes)
    kubernetes-cni = 0.5.1-0
    Available: kubernetes-cni-0.5.1-1.x86_64 (kubernetes)
    kubernetes-cni = 0.5.1-1
    Available: kubernetes-cni-0.6.0-0.x86_64 (kubernetes)
    kubernetes-cni = 0.6.0-0
    Available: kubernetes-cni-0.7.5-0.x86_64 (kubernetes)
    kubernetes-cni = 0.7.5-0
    ", "rc": 1, "results": ["Loaded plugins: fastestmirror, product-id, search-disabled-repos, subscription-
    : manager

This system is not registered with an entitlement server. You can use subscription-manager to register.

Loading mirror speeds from cached hostfile

  • base: mirrors.aliyun.com
  • extras: mirrors.aliyun.com
  • updates: mirrors.aliyun.com
    Package kubernetes-cni is obsoleted by kubelet, trying to install kubelet-1.18.4-0.x86_64 instead
    Resolving Dependencies
    --> Running transaction check
    ---> Package kubeadm.x86_64 0:1.16.9-0 will be installed
    --> Processing Dependency: kubernetes-cni >= 0.7.5 for package: kubeadm-1.16.9-0.x86_64
    Package kubernetes-cni is obsoleted by kubelet, but obsoleting package does not provide for requirements
    ---> Package kubectl.x86_64 0:1.16.9-0 will be installed
    ---> Package kubelet.x86_64 0:1.16.9-0 will be installed
    --> Processing Dependency: kubernetes-cni >= 0.7.5 for package: kubelet-1.16.9-0.x86_64
    Package kubernetes-cni is obsoleted by kubelet, but obsoleting package does not provide for requirements
    ---> Package kubelet.x86_64 0:1.18.4-0 will be installed
    --> Finished Dependency Resolution
    You could try using --skip-broken to work around the problem
    You could try running: rpm -Va --nofiles --nodigest
    "]}

Add netaddr installation

Feature request:
When the certificate-generation step in kube-certificates/common.yml runs,

the error is:
fatal: [192.168.1.100]: FAILED! => {"changed": false, "msg": "AnsibleFilterError: {{ SERVICE_CIDR | ipaddr(‘net‘) | ipaddr(1) | ipaddr(‘address‘) }}: The ipaddr filter requires python-netaddr be installed on the ansible controller"}

Fix:
Run pip install netaddr on the Ansible controller (a minimal Ansible sketch follows below).

OS:
CentOS Linux release 7.6.1810 (Core)
Minimal installation
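A minimal sketch of the fix as an Ansible play, assuming it is run on the controller itself and that pip is available there; the play structure is illustrative and not part of this project:

```yaml
# Install netaddr on the Ansible controller so the ipaddr filter used by
# the certificate tasks can be evaluated.
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Install python netaddr for the ipaddr filter
      ansible.builtin.pip:
        name: netaddr
```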

lb_kube_apiserver_port is not appropriate

For a cluster with a single master node, when roles\kube-master\certificates\tasks\kubeconfig.yml generates the individual kubeconfig files, the --server=https://{{ KUBE_APISERVER_IP | trim }}:{{ lb_kube_apiserver_port | trim }} segment uses lb_kube_apiserver_port, which is not appropriate.

Suggestion: change the lb_kube_apiserver_port setting in roles\kube-master\certificates\defaults\main.yml so that it defaults to 6443, or so that it matches the corresponding variable in host.ini (a hypothetical sketch follows below).
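A hypothetical sketch of that suggestion for roles\kube-master\certificates\defaults\main.yml; the Jinja expression and the groups['lb'] check are illustrative, not the project's actual defaults:

```yaml
# Illustrative default only: fall back to the apiserver's own port (6443)
# when no lb nodes are defined, otherwise keep the load-balanced port.
lb_kube_apiserver_port: "{{ '6443' if (groups['lb'] | default([]) | length == 0) else '8443' }}"
```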

Failed to create the apiserver certificate

Bug description

A clear and concise description of what the bug is.

TASK [kube-certificates : 创建 apiserver 证书] *************************************************************************
fatal: [10.6.1.57]: FAILED! => {"changed": true, "cmd": "openssl x509 -req -CAcreateserial  -extensions v3_req_server  -extfile /etc/kubernetes/pki/kube-openssl.cnf  -CA /etc/kubernetes/pki/ca.crt  -CAkey /etc/kubernetes/pki/ca.key  -in /etc/kubernetes/pki/apiserver.csr -out /etc/kubernetes/pki/apiserver.crt -days 3650 \n", "delta": "0:00:00.036481", "end": "2021-05-29 13:19:55.367519", "msg": "non-zero return code", "rc": 1, "start": "2021-05-29 13:19:55.331038", "stderr": "Error Loading extension section v3_req_server\n140474631620496:error:220A4076:X509 V3 routines:a2i_GENERAL_NAME:bad ip address:v3_alt.c:476:value=False\n140474631620496:error:22098080:X509 V3 routines:X509V3_EXT_nconf:error in extension:v3_conf.c:95:name=subjectAltName, value=@alt_kube_apiserver", "stderr_lines": ["Error Loading extension section v3_req_server", "140474631620496:error:220A4076:X509 V3 routines:a2i_GENERAL_NAME:bad ip address:v3_alt.c:476:value=False", "140474631620496:error:22098080:X509 V3 routines:X509V3_EXT_nconf:error in extension:v3_conf.c:95:name=subjectAltName, value=@alt_kube_apiserver"], "stdout": "", "stdout_lines": []}

Environment (please fill in the following information):

Run the commands in the parentheses below and submit their output

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 3.10.0-1062.el7.x86_64 x86_64
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Ansible version (ansible --version):
ansible 2.10.10
  config file = /root/choerodon/kubeadm-ha/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.6.8 (default, Nov 16 2020, 16:55:22) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
  • Python version (python --version):
    Python 2.7.5
  • Kubeadm-ha version (commit) (git rev-parse --short HEAD):
    67e2af0
How to reproduce

Steps to reproduce:

1. Step 1: write the inventory.ini file with the following content

```
; Fill in the information for all nodes here
; Field 1: the node's internal IP; after deployment it becomes the kubernetes node nodeName
; Field 2: ansible_port, the port the node's sshd listens on
; Field 3: ansible_user, the user name for remote login to the node
; Field 4: ansible_ssh_pass, the password for remote login to the node
[all]
10.6.1.51 ansible_port=22 ansible_user="root" ansible_ssh_pass="Wxjc@3705"
10.6.1.57 ansible_port=22 ansible_user="root" ansible_ssh_pass="Wxjc@3705"
10.6.1.58 ansible_port=22 ansible_user="root" ansible_ssh_pass="Wxjc@3705"
10.6.1.59 ansible_port=22 ansible_user="root" ansible_ssh_pass="Wxjc@3705"

; Private cloud:
; VIP load-balancing mode:
; i.e. load balancer + keepalived, for example the common haproxy + keepalived.
; The load balancer in this script can be nginx, openresty, haproxy or envoy; set lb_mode to switch between them.
; Setting lb_kube_apiserver_ip enables keepalived; first agree with the server provider to reserve an IP as lb_kube_apiserver_ip.
; Two nodes in the lb group are usually enough; the first node in the lb group is the keepalived master, the rest are backup nodes.
;
; Node-local load-balancing mode:
; Only the load balancer is started and keepalived is not enabled (i.e. lb_kube_apiserver_ip is not set);
; in this case kubelet connects to the apiserver at 127.0.0.1:lb_kube_apiserver_port.
; When using this mode, leave the lb group empty.
;
; Public cloud:
; slb mode is not recommended; use node-local load-balancing mode instead.
; If you do use slb mode, first deploy with node-local load-balancing mode,
; then switch to slb mode after the deployment succeeds:
; change lb_mode to slb, set lb_kube_apiserver_ip to the purchased SLB's internal IP,
; and change lb_kube_apiserver_port to the SLB listener port.
; Run the cluster initialization script again to switch to slb mode.
[lb]

; Note: the etcd cluster must have an odd number of nodes (1, 3, 5, 7...)
[etcd]
10.6.1.57
10.6.1.58
10.6.1.59

[kube-master]
10.6.1.57
10.6.1.58
10.6.1.59

[kube-worker]
10.6.1.51
10.6.1.57
10.6.1.58
10.6.1.59

; Reserved group, used later to add master nodes
[new-master]

; Reserved group, used later to add worker nodes
[new-worker]

; Reserved group, used later to add etcd nodes
[new-etcd]

; Reserved group, used later to remove the worker role
[del-worker]

; Reserved group, used later to remove the master role
[del-master]

; Reserved group, used later to remove the etcd role
[del-etcd]

; Reserved group, used later to remove nodes
[del-node]

;-------------------------------------- Basic configuration below ------------------------------------;
[all:vars]
; Whether to skip the node hardware check; master nodes require at least 2c2g, worker nodes at least 2c4g
skip_verify_node=false
; kubernetes version
kube_version="1.16.15"

; Container runtime type, options: containerd, docker; default containerd
container_manager="containerd"

; Load balancer
; Options are nginx, openresty, haproxy, envoy and slb; nginx is used by default
lb_mode="nginx"
; Cluster apiserver IP when a load balancer is used; setting the lb_kube_apiserver_ip variable enables load balancer + keepalived
; lb_kube_apiserver_ip="192.168.56.15"
; Cluster apiserver port when a load balancer is used
lb_kube_apiserver_port="8443"

; Subnet selection: the pod and service subnets must not overlap with the server subnet;
; if they do, set the kube_pod_subnet and kube_service_subnet variables to change them, for example:
; if the server subnet is 10.0.0.1/8,
; the pod subnet can be set to 192.168.0.0/18
; and the service subnet to 192.168.64.0/18;
; if the server subnet is 172.16.0.1/12,
; the pod subnet can be set to 10.244.0.0/18
; and the service subnet to 10.244.64.0/18;
; if the server subnet is 192.168.0.1/16,
; the pod subnet can be set to 10.244.0.0/18
; and the service subnet to 10.244.64.0/18
; Cluster pod IP range; the default prefix length of 18 gives 16384 IPs
kube_pod_subnet="172.16.200.1/18"
; Cluster service IP range
kube_service_subnet="172.16.201.1/18"
; Prefix length of the pod subnet allocated to each node; the default of 24 gives 256 IPs, so with these defaults the cluster can manage 16384/256=64 nodes.
kube_network_node_prefix="24"

; Maximum number of pods per node. It is related to the pod subnet allocated to the node; the number of IPs should be larger than the number of pods.
; https://cloud.google.com/kubernetes-engine/docs/how-to/flexible-pod-cidr
kube_max_pods="110"

; Cluster network plugin; currently flannel and calico are supported
network_plugin="calico"

; If the servers have separate system and data disks, change the following paths to custom directories on the data disk.
; Kubelet root directory
kubelet_root_dir="/var/lib/kubelet"
; docker container storage directory
docker_storage_dir="/var/lib/docker"
; containerd container storage directory
containerd_storage_dir="/var/lib/containerd"
; Etcd data root directory
etcd_data_dir="/var/lib/etcd"
......
```

2. Step 2: write the variables.yaml file, contents as follows

3. Step 3: run the deployment command, as follows

```
ansible-playbook -i inventory.ini 90-init-cluster.yml
```

4. The following error occurs (a diagnostic sketch follows after the error output)

```
fatal: [10.6.1.57]: FAILED! => {"changed": true, "cmd": "openssl x509 -req -CAcreateserial -extensions v3_req_server -extfile /etc/kubernetes/pki/kube-openssl.cnf -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -in /etc/kubernetes/pki/apiserver.csr -out /etc/kubernetes/pki/apiserver.crt -days 3650 \n", "delta": "0:00:00.036481", "end": "2021-05-29 13:19:55.367519", "msg": "non-zero return code", "rc": 1, "start": "2021-05-29 13:19:55.331038", "stderr": "Error Loading extension section v3_req_server\n140474631620496:error:220A4076:X509 V3 routines:a2i_GENERAL_NAME:bad ip address:v3_alt.c:476:value=False\n140474631620496:error:22098080:X509 V3 routines:X509V3_EXT_nconf:error in extension:v3_conf.c:95:name=subjectAltName, value=@alt_kube_apiserver", "stderr_lines": ["Error Loading extension section v3_req_server", "140474631620496:error:220A4076:X509 V3 routines:a2i_GENERAL_NAME:bad ip address:v3_alt.c:476:value=False", "140474631620496:error:22098080:X509 V3 routines:X509V3_EXT_nconf:error in extension:v3_conf.c:95:name=subjectAltName, value=@alt_kube_apiserver"], "stdout": "", "stdout_lines": []}
```
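A diagnostic sketch, offered as an assumption rather than a confirmed cause: the openssl error reports a bad IP address value ("value=False") in the alt_kube_apiserver section of /etc/kubernetes/pki/kube-openssl.cnf, so dumping that section on the first master should show which subjectAltName entry was rendered incorrectly:

```yaml
# Print the alt_kube_apiserver SAN section of the rendered openssl config
# so entries that openssl rejects (e.g. a literal "False") are visible.
- hosts: "kube-master[0]"
  gather_facts: false
  tasks:
    - name: Show the SAN section of kube-openssl.cnf
      ansible.builtin.shell: |
        sed -n '/alt_kube_apiserver/,/^\[/p' /etc/kubernetes/pki/kube-openssl.cnf
      register: san_section
      changed_when: false
    - name: Print the section
      ansible.builtin.debug:
        var: san_section.stdout_lines
```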

Expected result
The apiserver certificate can be created.
A clear and concise description of what you expected to happen.

3 masters, 1 worker node, deployment machine is master01; the worker node cannot join the cluster

TASK [kube-worker : Worker 节点加入集群] *********************************************************************************************************************************************************
task path: /etc/ansible/roles/kube-worker/tasks/main.yml:41
fatal: [node01]: FAILED! => {
"changed": true,
"cmd": "kubeadm join --config /etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests,FileAvailable--etc-kubernetes-pki-ca.crt\n",
"delta": "0:00:01.127363",
"end": "2021-02-21 11:20:19.672170",
"invocation": {
"module_args": {
"_raw_params": "kubeadm join --config /etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests,FileAvailable--etc-kubernetes-pki-ca.crt\n",
"_uses_shell": true,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"stdin_add_newline": true,
"strip_empty_ends": true,
"warn": true
}
},
"msg": "non-zero return code",
"rc": 1,
"start": "2021-02-21 11:20:18.544807",
"stderr": "\t[WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty\n\t[WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists\nerror execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://127.0.0.1:8443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s\": x509: certificate has expired or is not yet valid: current time 2021-02-21T11:20:19+08:00 is before 2021-02-21T11:17:01Z\nTo see the stack trace of this error execute with --v=5 or higher",
"stderr_lines": [
"\t[WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty",
"\t[WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists",
"error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://127.0.0.1:8443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": x509: certificate has expired or is not yet valid: current time 2021-02-21T11:20:19+08:00 is before 2021-02-21T11:17:01Z",
"To see the stack trace of this error execute with --v=5 or higher"
],
"stdout": "[preflight] Running pre-flight checks\n[preflight] Reading configuration from the cluster...\n[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'",
"stdout_lines": [
"[preflight] Running pre-flight checks",
"[preflight] Reading configuration from the cluster...",
"[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'"
]
}
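The failing check in the log is a time comparison: the node reports 2021-02-21T11:20:19+08:00 (03:20 UTC), which is before the certificate's NotBefore time of 11:17 UTC, so clock or timezone skew between the joining node and the masters is a likely cause. A minimal check sketch, offered as an assumption rather than a confirmed fix:

```yaml
# Print UTC time on every host so drift between the joining node and the
# masters is immediately visible.
- hosts: all
  gather_facts: false
  tasks:
    - name: Show current UTC time on each node
      ansible.builtin.command: date -u "+%Y-%m-%dT%H:%M:%SZ"
      register: node_time
      changed_when: false
    - name: Report the times
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }}: {{ node_time.stdout }}"
```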

slb mode: re-running the cluster initialization script as documented makes the cluster unavailable

slb mode is not recommended; use node-local load-balancing mode instead.
; If you do use slb mode, first deploy with node-local load-balancing mode,
; then switch to slb mode after the deployment succeeds:
; change lb_mode to slb, set lb_kube_apiserver_ip to the purchased SLB's internal IP,
; and change lb_kube_apiserver_port to the SLB listener port.
; Run the cluster initialization script again to switch to slb mode.

I followed the documentation as quoted above; is there a problem with this approach?
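For reference, the switch quoted above changes only three inventory variables; they are shown here as YAML purely for illustration, with placeholder values (the real ones come from the purchased SLB):

```yaml
# Illustrative values only; set the equivalents in the inventory after a
# successful node-local deployment, then re-run the initialization playbook.
lb_mode: "slb"
lb_kube_apiserver_ip: "10.0.0.100"   # placeholder: the SLB's internal IP
lb_kube_apiserver_port: "6443"       # placeholder: the SLB listener port
```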
