Comments (23)

GeorgeSen commented on May 6, 2024

The lighthouse and lighthouse-plugin process configurations are the same as in the community code, except that lighthouse-plugin adds two options: --hostname-override= and --kubeconfig=.

from caelus.

GeorgeSen commented on May 6, 2024

lighthouse configuration:

/etc/lighthouse/config:

ARGS="--config=/etc/lighthouse/config.yaml --logtostderr=false --v=10 --log-dir=/data0/gelanjie/caelus_log/lighthouse"

/etc/lighthouse/config.yaml:

apiVersion: componentconfig.lighthouse.io/v1alpha1
kind: HookConfiguration
timeout: 10
listenAddress: unix:///var/run/lighthouse.sock
webhooks:
- name: versioned
  endpoint: unix://@plugin-server
  failurePolicy: Fail
  stages:
  - urlPattern: /{id:v[.0-9]+}/containers/create
    method: post
    type: PreHook
- name: non-versioned
  endpoint: unix://@plugin-server
  failurePolicy: Fail
  stages:
  - urlPattern: /containers/create
    method: post
    type: PreHook
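The two webhooks above differ only in urlPattern. The {id:v[.0-9]+} syntax looks like a gorilla/mux route variable (an assumption; the thread does not say which router lighthouse uses), so the versioned stage should catch API-versioned paths such as /v1.39/containers/create, while the non-versioned stage catches only the bare /containers/create. A minimal sketch with Go's standard regexp package, using hand-translated anchored expressions and a hypothetical matchHook helper, shows which request paths each stage would hook:

```go
package main

import (
	"fmt"
	"regexp"
)

// Hand-translated equivalents of the two urlPattern values in the config.
// Assumption: {id:v[.0-9]+} is a router variable whose value must match v[.0-9]+.
var (
	versioned    = regexp.MustCompile(`^/v[.0-9]+/containers/create$`)
	nonVersioned = regexp.MustCompile(`^/containers/create$`)
)

// matchHook returns the name of the webhook whose pattern matches path,
// or "" if the request would not be hooked at all.
func matchHook(path string) string {
	switch {
	case versioned.MatchString(path):
		return "versioned"
	case nonVersioned.MatchString(path):
		return "non-versioned"
	default:
		return ""
	}
}

func main() {
	for _, p := range []string{
		"/v1.39/containers/create",
		"/containers/create",
		"/v1.39/info",
	} {
		fmt.Printf("%-26s -> %q\n", p, matchHook(p))
	}
}
```

Paths that match neither pattern, such as /v1.39/info, are not hooked; in the lighthouse logs later in this thread such requests show up as "Unhandled request" lines and appear to be forwarded straight to docker.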

GeorgeSen commented on May 6, 2024

plugin-server:

/etc/plugin-server/config:

ARGS="--feature-gates=AllAlpha=true --logtostderr=false --hostname-override=sbd2-lgna2-a10-bcc-2 --v=10 --log-dir=/data0/gelanjie/caelus_log/plugin-server --listen-address=unix://@plugin-server --kubeconfig=/root/.kube/config"
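Both lighthouse (endpoint: unix://@plugin-server) and plugin-server (--listen-address=unix://@plugin-server) point at the same socket. The leading @ conventionally denotes a Linux abstract-namespace Unix socket, which has no file on disk, so it will not appear under /var/run. The sketch below uses a hypothetical splitEndpoint helper (not lighthouse's own parser) to split such an endpoint into the network and address that Go's net.Dial expects; on Linux, Go maps a Unix address starting with @ to the abstract namespace, so the same string can be dialed directly:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// splitEndpoint turns "unix://<addr>" into ("unix", "<addr>").
// Hypothetical helper for illustration only.
func splitEndpoint(endpoint string) (network, addr string, err error) {
	const prefix = "unix://"
	if !strings.HasPrefix(endpoint, prefix) {
		return "", "", fmt.Errorf("unsupported endpoint %q", endpoint)
	}
	addr = strings.TrimPrefix(endpoint, prefix)
	if addr == "" {
		return "", "", fmt.Errorf("empty address in %q", endpoint)
	}
	return "unix", addr, nil
}

func main() {
	network, addr, err := splitEndpoint("unix://@plugin-server")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("network=%s addr=%s abstract=%v\n", network, addr, strings.HasPrefix(addr, "@"))

	// Probe the socket: this only succeeds if plugin-server is running on this host.
	conn, err := net.Dial(network, addr)
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("plugin-server socket is reachable")
}
```

Because abstract sockets have no filesystem entry, ls cannot confirm them; ss -xl lists listening Unix sockets and shows abstract ones with a leading @.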

GeorgeSen commented on May 6, 2024

kubelet error:

Connecting to docker on unix:///var/run/lighthouse.sock
Start docker client with request timeout=2m0s
failed to run Kubelet: failed to create kubelet: failed to get docker version: request returned Bad Gateway for API route and

ddongchen commented on May 6, 2024

Change the lighthouse configuration to:

apiVersion: lighthouse.io/v1alpha1
kind: hookConfiguration
timeout: 10
listenAddress: unix:///var/run/lighthouse.sock
webhooks:
- name: docker
  endpoint: unix://@plugin-server
  failurePolicy: Fail
  stages:
  - urlPattern: /containers/create
    method: post
    type: PreHook
  - urlPattern: /containers/{name:.*}/update
    method: post
    type: PreHook

Can you give this a try?

mYmNeo commented on May 6, 2024

Please provide the output of docker info.

GeorgeSen commented on May 6, 2024

docker info:

Containers: 34
 Running: 17
 Paused: 0
 Stopped: 17
Images: 17
Server Version: 18.09.6
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: nvidia
Init Binary: docker-init
containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30-dirty
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.19.95-21
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 236
Total Memory: 957.9GiB
Name: sbd2-lgna2-a10-bcc-2
ID: MSMC:CI72:UZC2:332D:O5SZ:MIJR:4DWF:5ITS:DDP7:EJ3A:PYDK:NNDY
Docker Root Dir: /data0/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:

Live Restore Enabled: true
Product License: Community Engine

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

mYmNeo commented on May 6, 2024

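Is your docker daemon listening on /var/run/docker.sock? If not, you need to set RemoteEndpoint as described in the documentation:

RemoteEndpoint string `json:"remoteEndpoint,omitempty"`

The problem on your side is that docker cannot be reached.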
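RemoteEndpoint is the docker endpoint that lighthouse forwards requests to. The sketch below shows how an omitted value might fall back to the default socket. It assumes a hypothetical, trimmed-down hookConfig struct and assumes the default is unix:///var/run/docker.sock (as the question above implies); only the RemoteEndpoint field and its JSON tag come from the thread, and the real lighthouse type and defaulting code may differ:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hookConfig is a hypothetical configuration type. Only RemoteEndpoint and
// its JSON tag are taken from the thread; ListenAddress is illustrative.
type hookConfig struct {
	ListenAddress  string `json:"listenAddress,omitempty"`
	RemoteEndpoint string `json:"remoteEndpoint,omitempty"`
}

// Assumed default: the docker daemon's standard socket.
const defaultRemoteEndpoint = "unix:///var/run/docker.sock"

// loadConfig decodes the config and fills in the default upstream endpoint.
func loadConfig(data []byte) (hookConfig, error) {
	var cfg hookConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return hookConfig{}, err
	}
	if cfg.RemoteEndpoint == "" {
		cfg.RemoteEndpoint = defaultRemoteEndpoint
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig([]byte(`{"listenAddress":"unix:///var/run/lighthouse.sock"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.RemoteEndpoint) // unix:///var/run/docker.sock
}
```

If dockerd listens somewhere else, this presumably means adding a remoteEndpoint: key to /etc/lighthouse/config.yaml; the caelus configuration docs are the place to confirm where it belongs.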

GeorgeSen commented on May 6, 2024

Docker is indeed listening on /var/run/docker.sock:

[root@sbd2-lgna2-a10-bcc-2] /data0$ ls -al /var/run/docker.sock
srw-rw---- 1 root docker 0 12月 9 20:31 /var/run/docker.sock

Are there any tools or techniques for troubleshooting this? Thanks!

mYmNeo commented on May 6, 2024

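Run docker -H unix:///var/run/lighthouse.sock info directly and check whether it returns data, and how long it takes. The log you provided shows that the request was canceled. Either the docker request timeout configured for your kubelet is too short, or the timeout in the lighthouse configuration needs to be increased.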

GeorgeSen commented on May 6, 2024

[root@sbd2-lgna2-a10-bcc-2] /data0$ docker -H unix:///var/run/lighthouse.sock info
request returned Bad Gateway for API route and version http://%2Fvar%2Frun%2Flighthouse.sock/v1.39/info, check if the server supports the requested API version

[root@sbd2-lgna2-a10-bcc-2] /data0$ docker -H unix:///var/run/docker.sock info
Containers: 34
 Running: 1
 Paused: 0
 Stopped: 33
Images: 17
Server Version: 18.09.6

mYmNeo commented on May 6, 2024

Please post the lighthouse log from when you ran these commands; I need the complete request log.

GeorgeSen commented on May 6, 2024

I rebuilt a binary from the community code and tried again.

I sent docker -H unix:///var/run/lighthouse.sock info twice,

as well as curl -v --unix-socket /var/run/lighthouse.sock http://127.0.0.1/v1.39/info.

Here is the complete lighthouse log from that time:

[root@sbd2-lgna2-a10-bcc-2] /data0$ cat /data0/gelanjie/caelus_log/lighthouse/lighthouse.INFO
Log file created at: 2021/12/10 15:16:43
Running on machine: sbd2-lgna2-a10-bcc-2
Binary: Built with gc go1.16.10 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1210 15:16:43.107530 17467 util.go:69] FLAG: --add-dir-header="false"
I1210 15:16:43.107680 17467 util.go:69] FLAG: --alsologtostderr="false"
I1210 15:16:43.107683 17467 util.go:69] FLAG: --config="/etc/lighthouse/config.yaml"
I1210 15:16:43.107693 17467 util.go:69] FLAG: --help="false"
I1210 15:16:43.107695 17467 util.go:69] FLAG: --log-backtrace-at=":0"
I1210 15:16:43.107699 17467 util.go:69] FLAG: --log-dir="/data0/gelanjie/caelus_log/lighthouse"
I1210 15:16:43.107701 17467 util.go:69] FLAG: --log-file=""
I1210 15:16:43.107703 17467 util.go:69] FLAG: --log-file-max-size="1800"
I1210 15:16:43.107704 17467 util.go:69] FLAG: --log-flush-frequency="5s"
I1210 15:16:43.107706 17467 util.go:69] FLAG: --logtostderr="false"
I1210 15:16:43.107708 17467 util.go:69] FLAG: --skip-headers="false"
I1210 15:16:43.107709 17467 util.go:69] FLAG: --skip-log-headers="false"
I1210 15:16:43.107711 17467 util.go:69] FLAG: --stderrthreshold="2"
I1210 15:16:43.107712 17467 util.go:69] FLAG: --v="10"
I1210 15:16:43.107714 17467 util.go:69] FLAG: --version="false"
I1210 15:16:43.107716 17467 util.go:69] FLAG: --vmodule=""
I1210 15:16:43.108173 17467 hook_manager.go:124] Hook timeout: 10 seconds
I1210 15:16:43.108178 17467 hook_manager.go:132] Register hook docker, endpoint unix://@plugin-server
I1210 15:16:43.108182 17467 hook_manager.go:136] Register PreHook post /containers/create with unix://@plugin-server
I1210 15:16:43.108186 17467 hook_manager.go:136] Register PreHook post /containers/{name:.*}/update with unix://@plugin-server
I1210 15:16:43.108188 17467 hook_manager.go:164] Build router: post /containers/create
I1210 15:16:43.108202 17467 hook_manager.go:164] Build router: post /containers/{name:.*}/update
I1210 15:16:43.108328 17467 hook_manager.go:101] Hook manager is running
I1210 15:17:32.250451 17467 hook_manager.go:343] Unhandled request GET /_ping
I1210 15:17:32.250505 17467 log.go:184] http: proxy error: context canceled
I1210 15:17:32.251220 17467 hook_manager.go:343] Unhandled request GET /v1.39/info
I1210 15:17:32.251236 17467 log.go:184] http: proxy error: context canceled
I1210 15:17:38.916794 17467 hook_manager.go:343] Unhandled request GET /_ping
I1210 15:17:38.916843 17467 log.go:184] http: proxy error: context canceled
I1210 15:17:38.917189 17467 hook_manager.go:343] Unhandled request GET /v1.39/info
I1210 15:17:38.917221 17467 log.go:184] http: proxy error: context canceled
I1210 15:18:03.286403 17467 hook_manager.go:343] Unhandled request GET /v1.39/info
I1210 15:18:03.286440 17467 log.go:184] http: proxy error: context canceled
I1210 15:18:07.702430 17467 hook_manager.go:343] Unhandled request GET /v1.39/info
I1210 15:18:07.702468 17467 log.go:184] http: proxy error: context canceled

mYmNeo commented on May 6, 2024

Which user do you run lighthouse as?

GeorgeSen commented on May 6, 2024

As root.

[root@sbd2-lgna2-a10-bcc-2] /data0$ ls -al /var/run/lighthouse.sock
srwxr-xr-x 1 root root 0 12月 10 13:56 /var/run/lighthouse.sock

[root@sbd2-lgna2-a10-bcc-2] /data0$ ls -al /var/run/docker.sock
srw-rw---- 1 root docker 0 12月 10 11:08 /var/run/docker.sock

mYmNeo commented on May 6, 2024

Does curl --unix-socket /var/run/lighthouse.sock http://127.0.0.1/v1.39/info print anything when you run it?

GeorgeSen commented on May 6, 2024

No output:

[root@sbd2-lgna2-a10-bcc-2] /var/run$ curl --unix-socket /var/run/lighthouse.sock http://127.0.0.1/v1.39/info
[root@sbd2-lgna2-a10-bcc-2] /var/run$ curl --unix-socket /var/run/lighthouse.sock http://127.0.0.1/v1.39/version
[root@sbd2-lgna2-a10-bcc-2] /var/run$ curl --unix-socket /var/run/lighthouse.sock http://127.0.0.1/info

GeorgeSen commented on May 6, 2024

[root@sbd2-lgna2-a10-bcc-2] /data0$ curl -v --unix-socket /var/run/lighthouse.sock http://127.0.0.1/v1.39/info
* About to connect() to 127.0.0.1 port 80 (#0)
*   Trying /var/run/lighthouse.sock...
* Failed to set TCP_KEEPIDLE on fd 3
* Failed to set TCP_KEEPINTVL on fd 3
* Connected to 127.0.0.1 (/var/run/lighthouse.sock) port 80 (#0)
> GET /v1.39/info HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1
> Accept: */*
>
< HTTP/1.1 502 Bad Gateway
< Date: Fri, 10 Dec 2021 07:18:23 GMT
< Content-Length: 0
<
* Connection #0 to host 127.0.0.1 left intact

mYmNeo commented on May 6, 2024

@GeorgeSen Please pull the latest code and try again.

GeorgeSen commented on May 6, 2024

It works now.

GeorgeSen commented on May 6, 2024

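@mYmNeo I tried the new code and it runs without problems. When creating a pod, I added this annotation:

mixer.kubernetes.io/app-class: "greedy"

However, the cgroups of the pod's containers were not placed under /sys/fs/cgroup/cpu,cpuacct/kubepods/offline:

lighthouse log: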

I1210 16:57:12.913446 118028 hook_manager.go:343] Unhandled request GET /images/nm-operator:v0_1/json
I1210 16:57:12.914741 118028 hook_manager.go:343] Unhandled request GET /images/nm-operator:v0_1/json
I1210 16:57:12.916540 118028 hook_manager.go:343] Unhandled request GET /version
I1210 16:57:12.917098 118028 hook_manager.go:333] Handle request POST /containers/create
I1210 16:57:12.917134 118028 hook_manager.go:313] PreHook request /containers/create, body: {"Hostname":"","Domainname":"","User":"0","AttachStdin":false,"AttachStdout":false,"AttachStderr":false,"Tty":false,"OpenStdin":false,"StdinOnce":false,"Env":["USER=root","PID_FILE=/tmp/hadoop-root-nodemanager.pid","CONTAINER_EXECUTOR=org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor","GINIT_PORT=10010","GROUP=root","HADOOP_CONF_DIR=/opt/module/hadoop-3.1.3/etc/hadoop","HADOOP_YARN_HOME=/opt/module/hadoop-3.1.3/","CGROUP_PATH=/sys/fs/cgroup","NM_LOCAL_DIRS=/hadoop-data","MY_NODE_IP=10.26.0.10","KUBERNETES_PORT=tcp://10.243.0.1:443","KUBERNETES_PORT_443_TCP=tcp://10.243.0.1:443","KUBERNETES_PORT_443_TCP_PROTO=tcp","KUBERNETES_PORT_443_TCP_PORT=443","KUBERNETES_PORT_443_TCP_ADDR=10.243.0.1","KUBERNETES_SERVICE_HOST=10.243.0.1","KUBERNETES_SERVICE_PORT=443","KUBERNETES_SERVICE_PORT_HTTPS=443"],"Cmd":null,"Healthcheck":{"Test":["NONE"]},"Image":"sha256:30ce197b3c5b0d06d3bf0e6a320f141067321af7845d588c2ea8d8bea3464bb9","Volumes":null,"WorkingDir":"","Entrypoint":null,"OnBuild":null,"Labels":{"annotation.io.kubernetes.container.hash":"bd015f2b","annotation.io.kubernetes.container.restartCount":"0","annotation.io.kubernetes.container.terminationMessagePath":"/dev/termination-log","annotation.io.kubernetes.container.terminationMessagePolicy":"File","annotation.io.kubernetes.pod.terminationGracePeriod":"30","io.kubernetes.container.logpath":"/var/log/pods/hadoop-yarn_node-manager-9rzvq_03ec582f-954a-4dee-a3dd-ea0ddf9b04b3/node-manager/0.log","io.kubernetes.container.name":"node-manager","io.kubernetes.docker.type":"container","io.kubernetes.pod.name":"node-manager-9rzvq","io.kubernetes.pod.namespace":"hadoop-yarn","io.kubernetes.pod.uid":"03ec582f-954a-4dee-a3dd-ea0ddf9b04b3","io.kubernetes.sandbox.id":"a7853f436d7aa84756f2a00d429f7df946099c98f952f122937c58e8c29e1435"},"HostConfig":{"Binds":["/var/lib/kubelet/pods/03ec582f-954a-4dee-a3dd-ea0ddf9b04b3/volumes/kubernetes.io~secret/default-token-lvn46:/var/run/secrets/kubernetes.io/serviceaccount:ro","/var/lib/kubelet/pods/03ec582f-954a-4dee-a3dd-ea0ddf9b04b3/etc-hosts:/etc/hosts","/var/lib/kubelet/pods/03ec582f-954a-4dee-a3dd-ea0ddf9b04b3/containers/node-manager/f78b29f6:/dev/termination-log"],"ContainerIDFile":"","LogConfig":{"Type":"","Config":null},"NetworkMode":"container:a7853f436d7aa84756f2a00d429f7df946099c98f952f122937c58e8c29e1435","PortBindings":null,"RestartPolicy":{"Name":"no","MaximumRetryCount":0},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"CapAdd":null,"CapDrop":null,"Capabilities":null,"Dns":null,"DnsOptions":null,"DnsSearch":null,"ExtraHosts":null,"GroupAdd":null,"IpcMode":"container:a7853f436d7aa84756f2a00d429f7df946099c98f952f122937c58e8c29e1435","Cgroup":"","Links":null,"OomScoreAdj":1000,"PidMode":"","Privileged":true,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":["seccomp=unconfined"],"UTSMode":"","UsernsMode":"","ShmSize":67108864,"ConsoleSize":[0,0],"Isolation":"","CpuShares":2,"Memory":0,"NanoCpus":0,"CgroupParent":"kubepods-besteffort-pod03ec582f_954a_4dee_a3dd_ea0ddf9b04b3.slice","BlkioWeight":0,"BlkioWeightDevice":null,"BlkioDeviceReadBps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":100000,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"DeviceCgroupRules":null,"DeviceRequests":null,"DiskQuota":0,"KernelMemory":0,"KernelMemoryTCP":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":null,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":null,"ReadonlyPaths":null},"NetworkingConfig":null}
I1210 16:57:12.917140 118028 hook_manager.go:200] Send to PreHook handler 0
I1210 16:57:12.917148 118028 hook_connector.go:70] Send request POST /prehook/containers/create for non-versioned
I1210 16:57:12.919436 118028 hook_connector.go:82] Decode response /prehook/containers/create for non-versioned
I1210 16:57:12.919766 118028 hook_manager.go:174] Send data to backend path /containers/create
I1210 16:57:12.926452 118028 hook_manager.go:177] Finish backend path /containers/create
I1210 16:57:12.926467 118028 hook_manager.go:280] PostHook request /containers/create, body: {"statusCode":201,"body":{"Id":"9f17870589ebcc5593dccb0dc67b83c72815c360028f31061f597d6716d34f8e","Warnings":null}
}
I1210 16:57:12.926860 118028 hook_manager.go:343] Unhandled request POST /containers/9f17870589ebcc5593dccb0dc67b83c72815c360028f31061f597d6716d34f8e/start
I1210 16:57:12.960252 118028 hook_manager.go:343] Unhandled request GET /containers/9f17870589ebcc5593dccb0dc67b83c72815c360028f31061f597d6716d34f8e/json
I1210 16:57:13.006822 118028 hook_manager.go:343] Unhandled request GET /containers/9f17870589ebcc5593dccb0dc67b83c72815c360028f31061f597d6716d34f8e/json
I1210 16:57:13.007458 118028 hook_manager.go:343] Unhandled request GET /containers/9f17870589ebcc5593dccb0dc67b83c72815c360028f31061f597d6716d34f8e/json
I1210 16:57:13.008184 118028 hook_manager.go:343] Unhandled request GET /containers/a7853f436d7aa84756f2a00d429f7df946099c98f952f122937c58e8c29e1435/json

mYmNeo commented on May 6, 2024

Your docker info shows that the cgroup driver is systemd, not cgroupfs.
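The driver matters for where cgroups end up. With cgroupfs, a parent such as /kubepods/offline maps directly to /sys/fs/cgroup/cpu,cpuacct/kubepods/offline. With systemd, the CgroupParent in the create request logged above is a slice name (kubepods-besteffort-pod03ec582f_954a_4dee_a3dd_ea0ddf9b04b3.slice), and systemd nests every dash-separated prefix as its own parent slice; kubelet replaces the dashes in the pod UID with underscores so they do not create extra levels. The sketch below is an independent reimplementation of that naming rule from systemd.slice(5) (runc's libcontainer has a similar ExpandSlice helper; this is not that code), using a hypothetical expandSlice function:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandSlice converts a systemd slice name into its cgroup path, e.g.
// "a-b-c.slice" -> "/a.slice/a-b.slice/a-b-c.slice".
func expandSlice(slice string) (string, error) {
	const suffix = ".slice"
	if !strings.HasSuffix(slice, suffix) || strings.Contains(slice, "/") {
		return "", fmt.Errorf("invalid slice name %q", slice)
	}
	name := strings.TrimSuffix(slice, suffix)
	if name == "-" { // "-.slice" is the root slice
		return "/", nil
	}
	var b strings.Builder
	prefix := ""
	for _, part := range strings.Split(name, "-") {
		if part == "" {
			return "", fmt.Errorf("invalid slice name %q", slice)
		}
		if prefix == "" {
			prefix = part
		} else {
			prefix += "-" + part
		}
		// Each growing prefix becomes one nested slice directory.
		b.WriteString("/" + prefix + suffix)
	}
	return b.String(), nil
}

func main() {
	parent := "kubepods-besteffort-pod03ec582f_954a_4dee_a3dd_ea0ddf9b04b3.slice"
	p, err := expandSlice(parent)
	if err != nil {
		panic(err)
	}
	fmt.Println(path.Join("/sys/fs/cgroup/cpu,cpuacct", p))
}
```

For the pod in the log this prints /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod03ec582f_954a_4dee_a3dd_ea0ddf9b04b3.slice, a kubepods.slice hierarchy rather than kubepods/offline, which is why the cgroup driver setting is relevant here.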

ddongchen commented on May 6, 2024

@GeorgeSen
More detailed documentation has been uploaded:
https://github.com/Tencent/caelus/blob/master/doc/start.md
https://github.com/Tencent/caelus/blob/master/doc/config.md
The entry point is:
[image]

Welcome to caelus!
