Comments (23)
The lighthouse and lighthouse-plugin process configuration matches the community code. For lighthouse-plugin we added two options: --hostname-override= and --kubeconfig=
from caelus.
lighthouse configuration:
/etc/lighthouse/config:
ARGS="--config=/etc/lighthouse/config.yaml --logtostderr=false --v=10 --log-dir=/data0/gelanjie/caelus_log/lighthouse"
/etc/lighthouse/config.yaml:
apiVersion: componentconfig.lighthouse.io/v1alpha1
kind: HookConfiguration
timeout: 10
listenAddress: unix:///var/run/lighthouse.sock
webhooks:
- name: versioned
  endpoint: unix://@plugin-server
  failurePolicy: Fail
  stages:
  - urlPattern: /{id:v[.0-9]+}/containers/create
    method: post
    type: PreHook
- name: non-versioned
  endpoint: unix://@plugin-server
  failurePolicy: Fail
  stages:
  - urlPattern: /containers/create
    method: post
    type: PreHook
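The two urlPattern values in this configuration differ only in the leading version segment. The sketch below shows which request paths each template would accept, assuming gorilla/mux-style semantics (`{name:regex}` captures a variable constrained by `regex`). That assumption is not confirmed by the thread, and `mux_pattern_to_regex` is a hypothetical helper written for illustration, not part of lighthouse.

```python
import re


def mux_pattern_to_regex(pattern):
    """Translate a mux-style path template into an anchored regex.

    Assumed semantics: `{name}` matches one path segment and
    `{name:expr}` matches the regular expression `expr`.
    """
    pieces = []
    i = 0
    while i < len(pattern):
        if pattern[i] != "{":
            # Literal character: escape it so it matches itself.
            pieces.append(re.escape(pattern[i]))
            i += 1
            continue
        # Find the closing brace, allowing nested braces inside the regex.
        depth, j = 1, i + 1
        while j < len(pattern) and depth > 0:
            if pattern[j] == "{":
                depth += 1
            elif pattern[j] == "}":
                depth -= 1
            j += 1
        if depth != 0:
            raise ValueError("unbalanced braces in %r" % pattern)
        name, sep, expr = pattern[i + 1:j - 1].partition(":")
        pieces.append("(?P<%s>%s)" % (name, expr if sep else "[^/]+"))
        i = j
    return re.compile("^" + "".join(pieces) + "$")


versioned = mux_pattern_to_regex("/{id:v[.0-9]+}/containers/create")
non_versioned = mux_pattern_to_regex("/containers/create")

for path in ("/v1.39/containers/create", "/containers/create", "/v1.39/info"):
    print(path, bool(versioned.match(path)), bool(non_versioned.match(path)))
```

Under these assumed semantics a path such as /v1.39/info matches neither template, which fits the requests that lighthouse later logs as unhandled.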
from caelus.
plugin-server:
/etc/plugin-server/config:
ARGS="--feature-gates=AllAlpha=true --logtostderr=false --hostname-override=sbd2-lgna2-a10-bcc-2 --v=10 --log-dir=/data0/gelanjie/caelus_log/plugin-server --listen-address=unix://@plugin-server --kubeconfig=/root/.kube/config"
from caelus.
kubelet error:
Connecting to docker on unix:///var/run/lighthouse.sock
Start docker client with request timeout=2m0s
failed to run Kubelet: failed to create kubelet: failed to get docker version: request returned Bad Gateway for API route and
from caelus.
Change the lighthouse configuration to:
apiVersion: lighthouse.io/v1alpha1
kind: hookConfiguration
timeout: 10
listenAddress: unix:///var/run/lighthouse.sock
webhooks:
- name: docker
  endpoint: unix://@plugin-server
  failurePolicy: Fail
  stages:
  - urlPattern: /containers/create
    method: post
    type: PreHook
  - urlPattern: /containers/{name:.*}/update
    method: post
    type: PreHook
Could you test with this?
from caelus.
Please provide the output of docker info.
from caelus.
docker info:
Containers: 34
Running: 17
Paused: 0
Stopped: 17
Images: 17
Server Version: 18.09.6
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: nvidia
Init Binary: docker-init
containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30-dirty
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.19.95-21
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 236
Total Memory: 957.9GiB
Name: sbd2-lgna2-a10-bcc-2
ID: MSMC:CI72:UZC2:332D:O5SZ:MIJR:4DWF:5ITS:DDP7:EJ3A:PYDK:NNDY
Docker Root Dir: /data0/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
Live Restore Enabled: true
Product License: Community Engine
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
from caelus.
Is your docker daemon listening on /var/run/docker.sock? If not, you need to configure RemoteEndpoint as described in the documentation.
In your case, lighthouse cannot reach docker.
from caelus.
The docker listen address is indeed /var/run/docker.sock:
[root@sbd2-lgna2-a10-bcc-2] /data0$ ls -al /var/run/docker.sock
srw-rw---- 1 root docker 0 12月 9 20:31 /var/run/docker.sock
Are there any tools or techniques for troubleshooting this? Thanks a lot!
from caelus.
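Run docker -H unix:///var/run/lighthouse.sock info directly and check whether it returns data and how long it takes. The log you provided shows that the request was canceled. Either the docker request timeout configured on your kubelet is too short, or the timeout in the lighthouse configuration needs to be increased.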
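The same check can be scripted so that the status code and the elapsed time are both recorded. This is a minimal sketch using only the Python standard library. `UnixHTTPConnection` and `timed_get` are hypothetical names introduced here, and the socket paths and the /v1.39/info route are the ones quoted in this thread.

```python
import http.client
import socket
import time


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10.0):
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def timed_get(socket_path, path, timeout=10.0):
    """Send GET `path` over `socket_path`; return (status, seconds, body)."""
    conn = UnixHTTPConnection(socket_path, timeout=timeout)
    start = time.monotonic()
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        return response.status, time.monotonic() - start, body
    finally:
        conn.close()
```

Calling `timed_get("/var/run/lighthouse.sock", "/v1.39/info")` and then the same route on /var/run/docker.sock gives a side-by-side comparison: a fast error status from lighthouse next to a normal response from docker would point at the proxy hop rather than at the daemon or a timeout.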
from caelus.
[root@sbd2-lgna2-a10-bcc-2] /data0$ docker -H unix:///var/run/lighthouse.sock info
request returned Bad Gateway for API route and version http://%2Fvar%2Frun%2Flighthouse.sock/v1.39/info, check if the server supports the requested API version
[root@sbd2-lgna2-a10-bcc-2] /data0$ docker -H unix:///var/run/docker.sock info
Containers: 34
Running: 1
Paused: 0
Stopped: 33
Images: 17
Server Version: 18.09.6
from caelus.
Please post the lighthouse log from that moment. We need the complete request log.
from caelus.
I rebuilt a binary from the community code and tried again.
I ran docker -H unix:///var/run/lighthouse.sock info twice,
as well as curl -v --unix-socket /var/run/lighthouse.sock http://127.0.0.1/v1.39/info
The complete lighthouse log at that point:
[root@sbd2-lgna2-a10-bcc-2] /data0$ cat /data0/gelanjie/caelus_log/lighthouse/lighthouse.INFO
Log file created at: 2021/12/10 15:16:43
Running on machine: sbd2-lgna2-a10-bcc-2
Binary: Built with gc go1.16.10 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1210 15:16:43.107530 17467 util.go:69] FLAG: --add-dir-header="false"
I1210 15:16:43.107680 17467 util.go:69] FLAG: --alsologtostderr="false"
I1210 15:16:43.107683 17467 util.go:69] FLAG: --config="/etc/lighthouse/config.yaml"
I1210 15:16:43.107693 17467 util.go:69] FLAG: --help="false"
I1210 15:16:43.107695 17467 util.go:69] FLAG: --log-backtrace-at=":0"
I1210 15:16:43.107699 17467 util.go:69] FLAG: --log-dir="/data0/gelanjie/caelus_log/lighthouse"
I1210 15:16:43.107701 17467 util.go:69] FLAG: --log-file=""
I1210 15:16:43.107703 17467 util.go:69] FLAG: --log-file-max-size="1800"
I1210 15:16:43.107704 17467 util.go:69] FLAG: --log-flush-frequency="5s"
I1210 15:16:43.107706 17467 util.go:69] FLAG: --logtostderr="false"
I1210 15:16:43.107708 17467 util.go:69] FLAG: --skip-headers="false"
I1210 15:16:43.107709 17467 util.go:69] FLAG: --skip-log-headers="false"
I1210 15:16:43.107711 17467 util.go:69] FLAG: --stderrthreshold="2"
I1210 15:16:43.107712 17467 util.go:69] FLAG: --v="10"
I1210 15:16:43.107714 17467 util.go:69] FLAG: --version="false"
I1210 15:16:43.107716 17467 util.go:69] FLAG: --vmodule=""
I1210 15:16:43.108173 17467 hook_manager.go:124] Hook timeout: 10 seconds
I1210 15:16:43.108178 17467 hook_manager.go:132] Register hook docker, endpoint unix://@plugin-server
I1210 15:16:43.108182 17467 hook_manager.go:136] Register PreHook post /containers/create with unix://@plugin-server
I1210 15:16:43.108186 17467 hook_manager.go:136] Register PreHook post /containers/{name:.*}/update with unix://@plugin-server
I1210 15:16:43.108188 17467 hook_manager.go:164] Build router: post /containers/create
I1210 15:16:43.108202 17467 hook_manager.go:164] Build router: post /containers/{name:.*}/update
I1210 15:16:43.108328 17467 hook_manager.go:101] Hook manager is running
I1210 15:17:32.250451 17467 hook_manager.go:343] Unhandled request GET /_ping
I1210 15:17:32.250505 17467 log.go:184] http: proxy error: context canceled
I1210 15:17:32.251220 17467 hook_manager.go:343] Unhandled request GET /v1.39/info
I1210 15:17:32.251236 17467 log.go:184] http: proxy error: context canceled
I1210 15:17:38.916794 17467 hook_manager.go:343] Unhandled request GET /_ping
I1210 15:17:38.916843 17467 log.go:184] http: proxy error: context canceled
I1210 15:17:38.917189 17467 hook_manager.go:343] Unhandled request GET /v1.39/info
I1210 15:17:38.917221 17467 log.go:184] http: proxy error: context canceled
I1210 15:18:03.286403 17467 hook_manager.go:343] Unhandled request GET /v1.39/info
I1210 15:18:03.286440 17467 log.go:184] http: proxy error: context canceled
I1210 15:18:07.702430 17467 hook_manager.go:343] Unhandled request GET /v1.39/info
I1210 15:18:07.702468 17467 log.go:184] http: proxy error: context canceled
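In the log above, every request logged as "Unhandled request" is immediately followed by "http: proxy error: context canceled". For longer logs, a small script can pair the two line types and count them. This is a sketch that assumes the glog line format stated in the log header. `failed_passthroughs` is a hypothetical helper, and it only pairs directly adjacent lines.

```python
import re
from collections import Counter

# [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)
UNHANDLED = re.compile(r"^Unhandled request (?P<method>[A-Z]+) (?P<path>\S+)$")
PROXY_ERROR = "http: proxy error:"


def failed_passthroughs(lines):
    """Count (method, path, error) for requests directly followed by a proxy error."""
    counts = Counter()
    pending = None  # (method, path) of the previous "Unhandled request" line
    for raw in lines:
        parsed = GLOG_LINE.match(raw.strip())
        if not parsed:
            pending = None
            continue
        msg = parsed.group("msg")
        request = UNHANDLED.match(msg)
        if request:
            pending = (request.group("method"), request.group("path"))
        elif msg.startswith(PROXY_ERROR) and pending:
            error = msg[len(PROXY_ERROR):].strip()
            counts[(pending[0], pending[1], error)] += 1
            pending = None
        else:
            pending = None
    return counts
```

Running it over the file, for example `failed_passthroughs(open("lighthouse.INFO"))`, shows at a glance whether every forwarded request fails or only particular routes.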
from caelus.
Which user do you start lighthouse as?
from caelus.
As the root user:
[root@sbd2-lgna2-a10-bcc-2] /data0$ ls -al /var/run/lighthouse.sock
srwxr-xr-x 1 root root 0 12月 10 13:56 /var/run/lighthouse.sock
[root@sbd2-lgna2-a10-bcc-2] /data0$ ls -al /var/run/docker.sock
srw-rw---- 1 root docker 0 12月 10 11:08 /var/run/docker.sock
from caelus.
Does running curl --unix-socket /var/run/lighthouse.sock http://127.0.0.1/v1.39/info produce any output?
from caelus.
No output:
[root@sbd2-lgna2-a10-bcc-2] /var/run$ curl --unix-socket /var/run/lighthouse.sock http://127.0.0.1/v1.39/info
[root@sbd2-lgna2-a10-bcc-2] /var/run$ curl --unix-socket /var/run/lighthouse.sock http://127.0.0.1/v1.39/version
[root@sbd2-lgna2-a10-bcc-2] /var/run$ curl --unix-socket /var/run/lighthouse.sock http://127.0.0.1/info
from caelus.
[root@sbd2-lgna2-a10-bcc-2] /data0$ curl -v --unix-socket /var/run/lighthouse.sock http://127.0.0.1/v1.39/info
About to connect() to 127.0.0.1 port 80 (#0)
Trying /var/run/lighthouse.sock...
Failed to set TCP_KEEPIDLE on fd 3
Failed to set TCP_KEEPINTVL on fd 3
Connected to 127.0.0.1 (/var/run/lighthouse.sock) port 80 (#0)
GET /v1.39/info HTTP/1.1
User-Agent: curl/7.29.0
Host: 127.0.0.1
Accept: */*
HTTP/1.1 502 Bad Gateway
Date: Fri, 10 Dec 2021 07:18:23 GMT
Content-Length: 0
Connection #0 to host 127.0.0.1 left intact
from caelus.
@GeorgeSen Please pull the latest code and try again.
from caelus.
It works now.
from caelus.
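@mYmNeo I tried the new code and it runs without problems. When creating a pod I added this annotation:
mixer.kubernetes.io/app-class: "greedy"
However, the cgroup of the pod's container was not placed under the /sys/fs/cgroup/cpu,cpuacct/kubepods/offline directory.
lighthouse log: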
I1210 16:57:12.913446 118028 hook_manager.go:343] Unhandled request GET /images/nm-operator:v0_1/json
I1210 16:57:12.914741 118028 hook_manager.go:343] Unhandled request GET /images/nm-operator:v0_1/json
I1210 16:57:12.916540 118028 hook_manager.go:343] Unhandled request GET /version
I1210 16:57:12.917098 118028 hook_manager.go:333] Handle request POST /containers/create
I1210 16:57:12.917134 118028 hook_manager.go:313] PreHook request /containers/create, body: {"Hostname":"","Domainname":"","User":"0","AttachStdin":false,"AttachStdout":false,"AttachStderr":false,"Tty":false,"OpenStdin":false,"StdinOnce":false,"Env":["USER=root","PID_FILE=/tmp/hadoop-root-nodemanager.pid","CONTAINER_EXECUTOR=org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor","GINIT_PORT=10010","GROUP=root","HADOOP_CONF_DIR=/opt/module/hadoop-3.1.3/etc/hadoop","HADOOP_YARN_HOME=/opt/module/hadoop-3.1.3/","CGROUP_PATH=/sys/fs/cgroup","NM_LOCAL_DIRS=/hadoop-data","MY_NODE_IP=10.26.0.10","KUBERNETES_PORT=tcp://10.243.0.1:443","KUBERNETES_PORT_443_TCP=tcp://10.243.0.1:443","KUBERNETES_PORT_443_TCP_PROTO=tcp","KUBERNETES_PORT_443_TCP_PORT=443","KUBERNETES_PORT_443_TCP_ADDR=10.243.0.1","KUBERNETES_SERVICE_HOST=10.243.0.1","KUBERNETES_SERVICE_PORT=443","KUBERNETES_SERVICE_PORT_HTTPS=443"],"Cmd":null,"Healthcheck":{"Test":["NONE"]},"Image":"sha256:30ce197b3c5b0d06d3bf0e6a320f141067321af7845d588c2ea8d8bea3464bb9","Volumes":null,"WorkingDir":"","Entrypoint":null,"OnBuild":null,"Labels":{"annotation.io.kubernetes.container.hash":"bd015f2b","annotation.io.kubernetes.container.restartCount":"0","annotation.io.kubernetes.container.terminationMessagePath":"/dev/termination-log","annotation.io.kubernetes.container.terminationMessagePolicy":"File","annotation.io.kubernetes.pod.terminationGracePeriod":"30","io.kubernetes.container.logpath":"/var/log/pods/hadoop-yarn_node-manager-9rzvq_03ec582f-954a-4dee-a3dd-ea0ddf9b04b3/node-manager/0.log","io.kubernetes.container.name":"node-manager","io.kubernetes.docker.type":"container","io.kubernetes.pod.name":"node-manager-9rzvq","io.kubernetes.pod.namespace":"hadoop-yarn","io.kubernetes.pod.uid":"03ec582f-954a-4dee-a3dd-ea0ddf9b04b3","io.kubernetes.sandbox.id":"a7853f436d7aa84756f2a00d429f7df946099c98f952f122937c58e8c29e1435"},"HostConfig":{"Binds":["/var/lib/kubelet/pods/03ec582f-954a-4dee-a3dd-ea0ddf9b04b3/volumes/kubernetes.i
o~secret/default-token-lvn46:/var/run/secrets/kubernetes.io/serviceaccount:ro","/var/lib/kubelet/pods/03ec582f-954a-4dee-a3dd-ea0ddf9b04b3/etc-hosts:/etc/hosts","/var/lib/kubelet/pods/03ec582f-954a-4dee-a3dd-ea0ddf9b04b3/containers/node-manager/f78b29f6:/dev/termination-log"],"ContainerIDFile":"","LogConfig":{"Type":"","Config":null},"NetworkMode":"container:a7853f436d7aa84756f2a00d429f7df946099c98f952f122937c58e8c29e1435","PortBindings":null,"RestartPolicy":{"Name":"no","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":null,"CapDrop":null,"Capabilities":null,"Dns":null,"DnsOptions":null,"DnsSearch":null,"ExtraHosts":null,"GroupAdd":null,"IpcMode":"container:a7853f436d7aa84756f2a00d429f7df946099c98f952f122937c58e8c29e1435","Cgroup":"","Links":null,"OomScoreAdj":1000,"PidMode":"","Privileged":true,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":["seccomp=unconfined"],"UTSMode":"","UsernsMode":"","ShmSize":67108864,"ConsoleSize":[0,0],"Isolation":"","CpuShares":2,"Memory":0,"NanoCpus":0,"CgroupParent":"kubepods-besteffort-pod03ec582f_954a_4dee_a3dd_ea0ddf9b04b3.slice","BlkioWeight":0,"BlkioWeightDevice":null,"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":100000,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"DiskQuota":0,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":null,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":null,"ReadonlyPaths":null},"NetworkingConfig":null}
I1210 16:57:12.917140 118028 hook_manager.go:200] Send to PreHook handler 0
I1210 16:57:12.917148 118028 hook_connector.go:70] Send request POST /prehook/containers/create for non-versioned
I1210 16:57:12.919436 118028 hook_connector.go:82] Decode response /prehook/containers/create for non-versioned
I1210 16:57:12.919766 118028 hook_manager.go:174] Send data to backend path /containers/create
I1210 16:57:12.926452 118028 hook_manager.go:177] Finish backend path /containers/create
I1210 16:57:12.926467 118028 hook_manager.go:280] PostHook request /containers/create, body: {"statusCode":201,"body":{"Id":"9f17870589ebcc5593dccb0dc67b83c72815c360028f31061f597d6716d34f8e","Warnings":null}
}
I1210 16:57:12.926860 118028 hook_manager.go:343] Unhandled request POST /containers/9f17870589ebcc5593dccb0dc67b83c72815c360028f31061f597d6716d34f8e/start
I1210 16:57:12.960252 118028 hook_manager.go:343] Unhandled request GET /containers/9f17870589ebcc5593dccb0dc67b83c72815c360028f31061f597d6716d34f8e/json
I1210 16:57:13.006822 118028 hook_manager.go:343] Unhandled request GET /containers/9f17870589ebcc5593dccb0dc67b83c72815c360028f31061f597d6716d34f8e/json
I1210 16:57:13.007458 118028 hook_manager.go:343] Unhandled request GET /containers/9f17870589ebcc5593dccb0dc67b83c72815c360028f31061f597d6716d34f8e/json
I1210 16:57:13.008184 118028 hook_manager.go:343] Unhandled request GET /containers/a7853f436d7aa84756f2a00d429f7df946099c98f952f122937c58e8c29e1435/json
from caelus.
Your docker info shows that the cgroup driver is systemd, not cgroupfs.
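This remark refers to the "Cgroup Driver: systemd" line in the docker info output posted earlier. With that driver, the CgroupParent in the create request above is a systemd slice name (kubepods-besteffort-pod….slice) rather than a cgroupfs path. The sketch below reads the driver from docker info text and expands a slice name into its nested directory path, relying on systemd's standard slice naming in which dashes encode nesting. Both function names are hypothetical, and nothing here shows how caelus itself places offline cgroups.

```python
def cgroup_driver(docker_info_text):
    """Return the value of the 'Cgroup Driver' line, or None if absent."""
    for line in docker_info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Cgroup Driver":
            return value.strip()
    return None


def expand_systemd_slice(slice_name):
    """Expand 'a-b-c.slice' into '/a.slice/a-b.slice/a-b-c.slice'.

    systemd places the slice 'foo-bar.slice' inside 'foo.slice', so each
    dash-separated prefix of the name becomes one directory level.
    """
    suffix = ".slice"
    if not slice_name.endswith(suffix):
        raise ValueError("not a slice unit name: %r" % slice_name)
    stem = slice_name[:-len(suffix)]
    path = ""
    prefix = ""
    for part in stem.split("-"):
        prefix = part if not prefix else prefix + "-" + part
        path += "/" + prefix + suffix
    return path


parent = "kubepods-besteffort-pod03ec582f_954a_4dee_a3dd_ea0ddf9b04b3.slice"
print(expand_systemd_slice(parent))
```

The expanded path sits under kubepods.slice/kubepods-besteffort.slice/, which is a different hierarchy from the kubepods/offline directory the question expected. That is presumably why the reply points at the cgroup driver.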
from caelus.
@GeorgeSen
More detailed documentation has been uploaded:
https://github.com/Tencent/caelus/blob/master/doc/start.md
https://github.com/Tencent/caelus/blob/master/doc/config.md
The entry point is the "欢迎使用caelus" (Welcome to caelus) page.
from caelus.