kubewharf / kelemetry
Global control plane tracing for Kubernetes
License: Apache License 2.0
Kelemetry meets our needs perfectly, and I have been running it for a few days.
But one thing that confuses me is that the etcd database size keeps increasing. This puts I/O pressure on the etcd server and sometimes causes pending proposals. This could be due to a high number of k8s events (we have also optimized the etcd data disk by using SSDs).
If Kelemetry could support event filtering, it would be a great help, for example filtering out periodic events that I don't care about.
(Is redactPattern a similar filtering feature? I tried it, but it did not work well.)
No response
Sometimes Kelemetry's linkers are insufficient to relate two traces, or linking is too expensive/large (e.g. linking a leader election object would link all operations that interact with the component). The user could tell us explicitly that two traces are related through a new endpoint, e.g. /merge?trace={traceid1}&trace={traceid2}, that redirects to a new cached ID.
It is often useful to debug why a controller is not working by comparing its output activity against its leader lease update frequency.
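A minimal sketch of how such an endpoint could derive the cached ID (the function names, endpoint path, and hashing scheme here are illustrative assumptions, not the actual Kelemetry API):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// mergedTraceID derives a deterministic identifier for a merged view of
// several traces. Sorting first makes the result order-independent, so
// /merge?trace=a&trace=b and /merge?trace=b&trace=a map to the same
// tracecache entry.
func mergedTraceID(traceIDs []string) string {
	ids := append([]string(nil), traceIDs...)
	sort.Strings(ids)
	sum := sha256.Sum256([]byte(strings.Join(ids, "\x00")))
	return hex.EncodeToString(sum[:16])
}

// handleMerge is a hypothetical handler: it would register the merged ID in
// the trace cache (omitted here) and redirect to the merged trace page.
func handleMerge(w http.ResponseWriter, r *http.Request) {
	ids := r.URL.Query()["trace"]
	if len(ids) < 2 {
		http.Error(w, "need at least two trace parameters", http.StatusBadRequest)
		return
	}
	http.Redirect(w, r, fmt.Sprintf("/trace/%s", mergedTraceID(ids)), http.StatusFound)
}

func main() {
	// The same cached ID results regardless of parameter order.
	fmt.Println(mergedTraceID([]string{"traceid1", "traceid2"}) ==
		mergedTraceID([]string{"traceid2", "traceid1"}))
}
```

Making the cached ID deterministic means repeated merge requests for the same pair reuse one tracecache entry instead of allocating a new one each time.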
helm install kelemetry oci://ghcr.io/kubewharf/kelemetry-chart --values values.yaml
Installs successfully.
#helm install kelemetry oci://ghcr.io/kubewharf/kelemetry-chart --values values.yaml
Pulled: ghcr.io/kubewharf/kelemetry-chart:0.2.3
Digest: sha256:5c6ae9ee1e30fc8f5f2409db3a5cfa502b0dabf8c42fba3e122e420317f5e89a
Error: INSTALLATION FAILED: template: kelemetry-chart/templates/storage.deployment.yaml:30:16: executing "kelemetry-chart/templates/storage.deployment.yaml" at <include "kelemetry.storage-options-stateless" .>: error calling include: template: no template "kelemetry.storage-options-stateless" associated with template "gotpl"
0.2.3
k8s:1.2.4
Jaeger:1.42
Installed Kelemetry via Helm, but cannot see audit events in the k8s resource trace. Audit events are visible in the kind setup after uncommenting the extension trace config in tfconfig.yaml. How can they be enabled or configured in a Helm install?
No response
We can try to infer the diff (although not exact) by reading the RequestObject field and comparing it against the original object state. However, this requires caching each resourceVersion of an object in the database.
An admission webhook gets a lot of requests, most of which fail. As the admission webhook owner does not know which component created these requests, an audit diff would help them diagnose why the admission webhook rejects them.
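A rough sketch of that inference, assuming the cached state (looked up by resourceVersion) and the decoded RequestObject are available as generic maps. The inferDiff name and shape are hypothetical, and as noted above the result is only approximate, since defaulting and server-side mutation can change what is actually applied:

```go
package main

import "fmt"

// inferDiff compares the cached object state against the audit event's
// decoded RequestObject and reports top-level fields that changed, were
// added, or were removed, as pairs of (old value, new value).
func inferDiff(cached, requestObject map[string]any) map[string][2]any {
	diff := map[string][2]any{}
	for key, newVal := range requestObject {
		oldVal, ok := cached[key]
		if !ok || fmt.Sprint(oldVal) != fmt.Sprint(newVal) {
			diff[key] = [2]any{oldVal, newVal} // changed or added
		}
	}
	for key, oldVal := range cached {
		if _, ok := requestObject[key]; !ok {
			diff[key] = [2]any{oldVal, nil} // removed
		}
	}
	return diff
}

func main() {
	cached := map[string]any{"replicas": 3, "image": "nginx:1.24"}
	request := map[string]any{"replicas": 5, "image": "nginx:1.24"}
	fmt.Println(inferDiff(cached, request)) // only "replicas" differs
}
```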
Hello, sorry to disturb you. I appreciate this project very much. Is there any design document for it?
Furthermore, there are hardly any comments in the code, which may make it harder for other contributors to participate. Would it be possible to add comments to key fields and structures to improve clarity?
During a FindTraces call, given startTime and endTime, retrieve all trace IDs from floor(startTime, 30 minutes) to floor(endTime, 30 minutes), grouped by the object identity tags (cluster/group/resource/namespace/name). Each group is displayed as a separate trace and has its own tracecache entry. The tracecache value stores a list of trace identifiers instead of a single identifier. During a GetTrace call, all identifiers are fetched, then the tf package shall merge spans with the same object identity + nestingLevel tags together (by merging their underlying spans).
When a user searches spans from 12:15 to 12:45, they expect to see events within this period, not events whose root object span is within this period (i.e. only 12:30). The current behavior is very confusing to users, and very inconvenient, because debugging an actual event that happened across multiple traces requires opening two separate pages.
List trace without any filters
Click on a non-object span
No response
goroutine 398 [running]:
github.com/kubewharf/kelemetry/pkg/frontend/tf/step.ReplaceNameVisitor.Enter({}, {0xc0466fd800, 0xc0466fd830, 0xc0466fd860, 0xc0466fd890, 0x0}, 0x0)
/data00/home/chankyin/go/src/github.com/kubewharf/kelemetry/pkg/frontend/tf/step/prune_tags.go:30 +0x6e
github.com/kubewharf/kelemetry/pkg/frontend/tf/tree.spanNode.visit({{0xc0466fd800, 0xc0466fd830, 0xc0466fd860, 0xc0466fd890, 0x0}, 0x0}, {0x408a578, 0x5596210})
/data00/home/chankyin/go/src/github.com/kubewharf/kelemetry/pkg/frontend/tf/tree/tree.go:120 +0xf9
github.com/kubewharf/kelemetry/pkg/frontend/tf/tree.SpanTree.Visit(...)
/data00/home/chankyin/go/src/github.com/kubewharf/kelemetry/pkg/frontend/tf/tree/tree.go:116
github.com/kubewharf/kelemetry/pkg/frontend/tf.(*Transformer).Transform(0xc00037bf20, 0xc042691570, 0xc04659f8f0?, 0x459836d0?)
/data00/home/chankyin/go/src/github.com/kubewharf/kelemetry/pkg/frontend/tf/transform.go:62 +0x2c8
github.com/kubewharf/kelemetry/pkg/frontend/reader.(*spanReader).GetTrace(0xc000514d80, {0x409e6b8, 0xc04659f8f0}, {0x10?, 0xc041061458?})
/data00/home/chankyin/go/src/github.com/kubewharf/kelemetry/pkg/frontend/reader/reader.go:176 +0x2ca
github.com/jaegertracing/jaeger/plugin/storage/grpc/shared.(*GRPCHandler).GetTrace(0xc000614020, 0xc04659f920, {0x40a6b50, 0xc041c9fe20})
/home/chankyin/go/pkg/mod/github.com/jaegertracing/[email protected]/plugin/storage/grpc/shared/grpc_handler.go:152 +0xf6
github.com/jaegertracing/jaeger/proto-gen/storage_v1._SpanReaderPlugin_GetTrace_Handler({0x37ddcc0?, 0xc000614020}, {0x40a45c0, 0xc0410613b0})
/home/chankyin/go/pkg/mod/github.com/jaegertracing/[email protected]/proto-gen/storage_v1/storage.pb.go:1456 +0xf9
google.golang.org/grpc.(*Server).processStreamingRPC(0xc040c24000, {0x40a9ae0, 0xc0435da9c0}, 0xc0464978c0, 0xc00051def0, 0x553e740, 0x0)
/home/chankyin/go/pkg/mod/google.golang.org/[email protected]/server.go:1639 +0x1fe8
google.golang.org/grpc.(*Server).handleStream(0xc040c24000, {0x40a9ae0, 0xc0435da9c0}, 0xc0464978c0, 0x0)
/home/chankyin/go/pkg/mod/google.golang.org/[email protected]/server.go:1726 +0xfaf
google.golang.org/grpc.(*Server).serveStreams.func1.2()
/home/chankyin/go/pkg/mod/google.golang.org/[email protected]/server.go:966 +0xed
created by google.golang.org/grpc.(*Server).serveStreams.func1
/home/chankyin/go/pkg/mod/google.golang.org/[email protected]/server.go:964 +0x4de
No response
All pods are running except the frontend:
kelemetry-1689762474-collector-79f5c59df4-52q4z 1/1 Running 2 (16h ago) 16h
kelemetry-1689762474-consumer-b88789bb4-5kzbf 1/1 Running 0 16h
kelemetry-1689762474-etcd-0 1/1 Running 0 10m
kelemetry-1689762474-frontend-755b8f47ff-rl4jl 0/2 CrashLoopBackOff 8 (8s ago) 2m14s
kelemetry-1689762474-informers-76fb5d4458-s5ftw 1/1 Running 0 16h
kelemetry-1689762474-storage-0 1/1 Running 0 16h
error.log
{"level":"warn","ts":1689822124.0339894,"caller":"channelz/funcs.go:342","msg":"[core][Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {\n \"Addr\": \"localhost:17271\",\n \"ServerName\": \"localhost:17271\",\n \"Attributes\": null,\n \"BalancerAttributes\": null,\n \"Type\": 0,\n \"Metadata\": null\n}. Err: connection error: desc = \"transport: Error while dialing dial tcp [::1]:17271: connect: connection refused\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822124.034001,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1 SubChannel #2] Subchannel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822124.0340135,"caller":"grpclog/component.go:71","msg":"[core]pickfirstBalancer: UpdateSubConnState: 0xc000196540, {TRANSIENT_FAILURE connection error: desc = \"transport: Error while dialing dial tcp [::1]:17271: connect: connection refused\"}","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822124.0340188,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1] Channel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822126.1586947,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1] Channel Connectivity change to SHUTDOWN","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822126.1587367,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1 SubChannel #2] Subchannel Connectivity change to SHUTDOWN","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822126.1587467,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1 SubChannel #2] Subchannel deleted","system":"grpc","grpc_log":true}
{"level":"info","ts":1689822126.1587508,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1] Channel deleted","system":"grpc","grpc_log":true}
{"level":"fatal","ts":1689822126.1587627,"caller":"./main.go:107","msg":"Failed to init storage factory","error":"grpc-plugin builder failed to create a store: error connecting to remote storage: context deadline exceeded","stacktrace":"main.main.func1\n\t./main.go:107\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:968\nmain.main\n\t./main.go:170\nruntime.main\n\truntime/proc.go:250"}
kubernetes version:
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-15T13:40:17Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:51:45Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
cloud provider: local vm
Jaeger version: jaegertracing/jaeger-collector:1.42
use deployment collector of Kelemetry
storage: custom nfs storage
Deploy following by steps in https://github.com/kubewharf/kelemetry/blob/a832d2856bccdbdedefbe2a437d05c709e22f5bd/docs/DEPLOY.md
#helm install kelemetry oci://ghcr.io/kubewharf/kelemetry-chart --values values.yaml
All pods and svc run up fine and tracing data can be browsed from UI.
The frontend pod is in CrashLoopBackOff; from its log:
time="2023-07-17T13:59:40Z" level=error msg="unknown flag: --trace-server-enable"
Running the docker image shows there is no --trace-server-enable flag in kelemetry:0.1.0:
#docker run -d ghcr.io/kubewharf/kelemetry:0.1.0
af6e0f708046cbae26858de58df97a56442fabe8490646cc0aa61689564ac8d4
#docker exec -it af6e0f708046cbae26858de58df97a56442fabe8490646cc0aa61689564ac8d4 sh
...
--span-cache-local-cleanup-frequency duration frequency to collect garbage from span cache (default 30m0s)
--tracer string implementation of tracer. Possible values are ["otel"]. (default "otel")
--tracer-otel-endpoint string otel endpoint (default "127.0.0.1:4317")
--tracer-otel-insecure allow insecure otel connections
--tracer-otel-resource-attributes stringToString otel resource service attributes (default [service.version=dev])
--usage string
Since the frontend failed, I am not sure if I need to switch to kelemetry:0.2.0, and what about the other components, kelemetry-consumer and kelemetry-informers? (Tried with 0.2.0; the informers pod goes into CrashLoopBackOff with:
kubectl logs kelemetry-informers-84c49bc575-hs5dg
time="2023-07-18T01:30:12Z" level=error msg="unknown flag: --diff-cache-use-old-rv")
As for kelemetry-collector, there is already issue #122 for version 1.42: it does not listen on 4317. What version should we use to get tracing to work?
There is also a flood of messages like these in the kelemetry-collector pod:
{"level":"info","ts":1689602342.3532155,"caller":"[email protected]/server.go:932","msg":"[core][Server #6] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: \"\\x16\\x03\\x01\\x01\\x1b\\x01\\x00\\x01\\x17\\x03\\x03a\\x9a\\xd36\\x81\\x87\\xba\\xb0\\xdb\\xe8\\xdcp\\xa9\""","system":"grpc","grpc_log":true}
{"level":"info","ts":1689602359.958463,"caller":"[email protected]/server.go:932","msg":"[core][Server #6] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: \"\\x16\\x03\\x01\\x01\\x1b\\x01\\x00\\x01\\x17\\x03\\x03c\\uf448\\xb2$\\xf7\\x02\\x1f\\xee\\x13\\xe9\\x87\""","system":"grpc","grpc_log":true}
{"level":"info","ts":1689602562.2759514,"caller":"[email protected]/server.go:932","msg":"[core][Server #6] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: \"\\x16\\x03\\x01\\x01\\x1b\\x01\\x00\\x01\\x17\\x03\\x03Z\\xfb\\x05A=0\\xb0mi\\x12\\x1a\\xfa\\xd6\""","system":"grpc","grpc_log":true}
{"level":"info","ts":1689602634.1497142,"caller":"[email protected]/server.go:932","msg":"[core][Server #6] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: \"\\x16\\x03\\x01\\x01\\x1b\\x01\\x00\\x01\\x17\\x03\\x03PO\\xb9#%W־\\x03\\x1a\\xac\\t\\x9a\""","system":"grpc","grpc_log":true}
{"level":"info","ts":1689602778.7651076,"caller":"[email protected]/server.go:932","msg":"[core][Server #6] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: \"\\x16\\x03\\x01\\x01\\x1b\\x01\\x00\\x01\\x17\\x03\\x03e\\xd6z4\\xccLq\\xb286\\xb4\\x02z\""","system":"grpc","grpc_log":true}
{"level":"info","ts":1689602918.8066566,"caller":"[email protected]/server.go:932","msg":"[core][Server #6] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: \"\\x16\\x03\\x01\\x01\\x1b\\x01\\x00\\x01\\x17\\x03\\x03\\xb7\\x98\\xea\\xe3\\xd0\\x12\\xb2@\\xe7\\xc7S\\xe0\\x19\""","system":"grpc","grpc_log":true}
{"level":"info","ts":1689602953.5312855,"caller":"[email protected]/server.go:932","msg":"[core][Server #6] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: \"\\x16\\x03\\x01\\x01\\x1b\\x01\\x00\\x01\\x17\\x03\\x03K\\x15\\x01\\xf9\\x1cϋ\\xfb\\xc6xr7\\xe2\""","system":"grpc","grpc_log":true}
{"level":"info","ts":1689603063.8368163,"caller":"[email protected]/server.go:932","msg":"[core][Server #6] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams received bogus greeting from client: \"\\x16\\x03\\x01\\x01\\x1b\\x01\\x00\\x01\\x17\\x03\\x03\\xa9J\\x9bȨު\\xa8\\xb6\\x1bP\\xca\\xee\""","system":"grpc","grpc_log":true}
0.1.0
Kubernetes version :1.24.15
Jaeger version: 1.4.2
support for redis
No response
Currently diff controller runs a multi-leader elector to ensure high availability. However this puts a lot of stress on the diff cache backend.
To optimize this, we can introduce a new elector for writing to diff cache. While the original lease continues to restrict the number of watch streams, the new elector only restricts the number of diff cache writers.
stateDiagram
A --> B: acquire watch lease
B --> C: acquire cache writer lease
C --> D: lose cache writer lease
D --> B: maintained state\nfor more than\n15 seconds
| State | Watch lease | Cache writer lease | Behavior |
|---|---|---|---|
| A | Not acquired | Do not participate in election | Try to acquire watch lease. Do not watch objects. |
| B | Acquired | Not acquired | Try to acquire cache writer lease. Watch objects and maintain a local cache for 15 seconds. Do not write to remote diff cache. |
| B -> C | Acquired | Not acquired -> acquired | Write the local cache for the previous 15 seconds to the remote diff cache. |
| C | Acquired | Acquired | Watch objects and write to remote diff cache. |
| D | Acquired | Acquired -> lost election for less than 15 seconds | Watch objects and write to remote diff cache. |
| B | Acquired | Lost election for more than 15 seconds | Try to acquire cache writer lease. Watch objects and maintain a local cache for 15 seconds. Do not write to remote diff cache. |
No response
With the use of lazy frontend transformation, the pseudospan field is no longer helpful for readability. Instead, it increases the complexity of trace transformation, increases the number of entries in the span cache, and doubles the length of the parent span creation chain (an extra "children" span for every parent object).
No response
Allow configuring link pattern rules in jaeger-ui for chart deployment method
#25 introduced new tags that can reference other objects, but the helm chart does not support utilizing this because there is no jaeger-ui config option.
When a pull request is created/synchronized, render the sample API response similar to what the generate-page workflow does now. Instead of uploading the page, start a headless Chromium instance to load the page, take a screenshot, and post it to the pull request.
This makes it easier to spot UI breakage in the frontend.
Originally posted by calvinxu July 14, 2023
apiVersion: v1
clusters:
- cluster:
    server: http://xxxxx:8080/audit
  name: audit
contexts:
- context:
    cluster: audit
  name: main
current-context: main
kind: Config
preferences: {}
The apiserver logs are normal, while the kelemetry consumer pod log has the following errors:
time="2023-07-14T05:00:40Z" level=error msg="cannot access cluster from object reference" error="cluster \"10.120.127.221\" is not available" mod=owner-linker object=10.120.127.221//pods/default/hello-6c64f775c6-hlmdn
time="2023-07-14T05:00:40Z" level=error msg="cannot access cluster from object reference" error="cluster \"10.120.127.221\" is not available" mod=owner-linker object=10.120.127.221//pods/default/hello-5ccd6d84d6-zldzw
time="2023-07-14T05:00:40Z" level=error msg="cannot access cluster from object reference" error="cluster \"10.120.127.221\" is not available" mod=owner-linker object=10.120.127.221//pods/default/hello-5ccd6d84d6-8dwk6
time="2023-07-14T05:00:41Z" level=error msg="cannot access cluster from object reference" error="cluster \"10.120.127.221\" is not available" mod=owner-linker object=10.120.127.221//pods/default/hello-5ccd6d84d6-qp7g4
Any advice on the webhook setting, or any other settings that need to be adjusted to make it work? There is no tracing data in the Jaeger UI.
#NAME READY STATUS RESTARTS AGE
hello-6c64f775c6-56ghh 1/1 Running 0 5m31s
hello-6c64f775c6-hlmdn 1/1 Running 0 5m30s
hello-6c64f775c6-kwbvw 1/1 Running 0 5m31s
hello-6c64f775c6-nx2gd 1/1 Running 0 5m31s
hello-6c64f775c6-wb5kz 1/1 Running 0 5m30s
kelemetry-collector-7459c48b85-psdk9 1/1 Running 0 3h6m
kelemetry-collector-7459c48b85-qzxrn 1/1 Running 0 3h6m
kelemetry-collector-7459c48b85-smm84 1/1 Running 0 3h6m
kelemetry-consumer-549d4b664b-nh7fd 1/1 Running 0 153m
kelemetry-consumer-549d4b664b-xbxtq 1/1 Running 0 153m
kelemetry-consumer-549d4b664b-zfglq 1/1 Running 0 153m
kelemetry-etcd-0 1/1 Running 0 14h
kelemetry-etcd-1 1/1 Running 0 14h
kelemetry-etcd-2 1/1 Running 0 14h
kelemetry-frontend-d984d9fb9-7mb6j 2/2 Running 0 3h6m
kelemetry-frontend-d984d9fb9-p2w7h 2/2 Running 0 3h6m
kelemetry-frontend-d984d9fb9-qp4h4 2/2 Running 0 3h6m
kelemetry-informers-fb8ddb6b4-8wvsn 1/1 Running 0 3h6m
kelemetry-informers-fb8ddb6b4-drh8w 1/1 Running 0 3h6m
kelemetry-informers-fb8ddb6b4-l4cpk 1/1 Running 0 3h6m
kelemetry-storage-0 1/1 Running 0 14h
# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kelemetry-collector LoadBalancer 10.100.80.232 10.120.127.224 4317:31324/TCP 80m
kelemetry-etcd LoadBalancer 10.98.116.229 10.120.127.224 2379:31447/TCP,2380:32611/TCP 75m
kelemetry-query LoadBalancer 10.110.123.5 10.120.127.224 16686:32328/TCP,8090:30873/TCP 86m
kelemetry-storage ClusterIP None <none> 17271/TCP 86m
kelemetry-webhook LoadBalancer 10.108.9.80 10.120.127.224 8080:31597/TCP 86m
#kubectl logs kelemetry-consumer-8599cb4cc5-cbfkf
...
2023/07/14 07:21:02 traces export: context deadline exceeded: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.100.80.232:4317: connect: connection refused"
2023/07/14 07:21:02 traces export: context deadline exceeded: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.100.80.232:4317: connect: connection refused"
2023/07/14 07:21:02 traces export: context deadline exceeded: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.100.80.232:4317: connect: connection refused"
2023/07/14 07:21:17 traces export: context deadline exceeded: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.100.80.232:4317: connect: connection refused"
2023/07/14 07:21:27 traces export: context deadline exceeded: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.100.80.232:4317: connect: connection refused"
2023/07/14 07:21:27 traces export: context deadline exceeded: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.100.80.232:4317: connect: connection refused"
2023/07/14 07:21:27 traces export: context deadline exceeded: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.100.80.232:4317: connect: connection refused"
.....
No process is listening on port 4317 in the collector container; is something wrong with the collector?
#kubectl exec -it kelemetry-collector-7cf7988fd5-pfvcb -- sh
/ # netstat -tulp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 :::14268 :::* LISTEN 1/collector-linux
tcp 0 0 :::14269 :::* LISTEN 1/collector-linux
tcp 0 0 :::14250 :::* LISTEN 1/collector-linux
1. check out 0.2.2 tag from kelemetry repo
2. helm install kelemetry oci://ghcr.io/kubewharf/kelemetry-chart --values values.yaml
On the Jaeger UI, when browsing the tracing data for a deployment/pod, the trace should contain both event and audit data,
but it only displays the event data.
It is quite odd that the cluster is "foo"; is that from a test value?
My audit-log setting is as below, enabled in the apiserver settings. Not sure whether the webhook setting is correct; I roughly followed the quickstart demo setting.
#cat audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse
# cat webhook-config.yaml
apiVersion: v1
clusters:
- cluster:
    server: http://10.200.112.184:8080/audit
  name: audit
contexts:
- context:
    cluster: audit
  name: main
current-context: main
kind: Config
preferences: {}
In the quickstart deployment, both event and audit data are displayed.
0.2.2
k8s:1.23.17
Jaeger:1.4.2
I want to aggregate multi-cluster events with Kelemetry as mentioned in #136, but I am confused about how to configure the storageBackend to use elasticsearch. I get errors when I set storageBackend.type to elasticsearch (I'm using kelemetry-chart-0.2.2):
Error: UPGRADE FAILED: template: kelemetry-chart/templates/frontend.deployment.yaml:57:15: executing "kelemetry-chart/templates/frontend.deployment.yaml" at <include "kelemetry.storage-plugin-options" .>: error calling include: template: kelemetry-chart/templates/_helpers.yaml:263:4: executing "kelemetry.storage-plugin-options" at <include "kelemetry.storage-plugin-options-raw" .>: error calling include: template: kelemetry-chart/templates/_helpers.yaml:276:27: executing "kelemetry.storage-plugin-options-raw" at <include "kelemetry.storage-options-raw" .>: error calling include: template: kelemetry-chart/templates/_helpers.yaml:299:4: executing "kelemetry.storage-options-raw" at <include "kelemetry.storage-options-raw-stateless" .>: error calling include: template: no template "kelemetry.storage-options-raw-stateless" associated with template "gotpl"
Is there any example of this?
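For reference, a values.yaml sketch of what such a configuration might look like. The keys under options are an assumption that the chart forwards Jaeger's --es.* flags; check the chart's default values.yaml for the authoritative layout:

```yaml
# Hypothetical values.yaml fragment; key names are NOT verified
# against kelemetry-chart 0.2.2.
storageBackend:
  type: elasticsearch
  options:
    es:
      server-urls: http://elasticsearch.logging.svc:9200  # your ES endpoint
      index-prefix: kelemetry
```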
No response
Set up tests and CI to ensure the chart works correctly in different setup scenarios.
The Helm chart often falls out of maintenance due to rapid feature iteration, but our CI only tests the make quickstart script, and development usually just uses the make stack script.
From k8s/config/mapoption, there are two fields, targetClusterName and kubeconfig, which seem to mean that Kelemetry can watch multi-cluster events from one master (control-plane) cluster. But I found that the informer only watches targetCluster; the other clusters seem to be used only for the diff API. So I want to know how to aggregate multi-cluster events with Kelemetry. I have some guesses; please correct me if I am wrong:
make kind quickstart
Start process normally
kelemetry_1 | time="2023-04-20T07:14:56Z" level=fatal msg="Cannot disable \"resource-object-tagger\" because [aggregator] depend on it but are not disabled"
GitHub CI https://github.com/kubewharf/kelemetry/actions/runs/4751485893
graph TB
A --> B & C
Search B on the frontend with "ancestors".
Only A and B appear in the output trace.
A, B and C all appear in the output trace.
If we change ff22 to ff20, the entire trace still appears.
But if we search in exclusive mode directly to begin with, the behavior is correct.
Jaeger storage backend
Show trace with the same trace ID
It uses the raw trace ID in the storage backend.
This is because the (*spanReader).GetTrace
method does not update TraceID in the trace like FindTraces
does:
kelemetry/pkg/frontend/reader/reader.go
Lines 151 to 156 in 0a8b7d6
No response
frontend deployment manifest
...
- command:
- /usr/local/bin/kelemetry
- --log-level=info
- --pprof-enable=true
- --jaeger-backend=jaeger-storage
- --jaeger-cluster-names=cluster1
- --jaeger-redirect-server-enable=true
- --jaeger-storage-plugin-address=:17271 # localhost:17271
- --jaeger-storage-plugin-enable=true
- --jaeger-storage.grpc-storage.server=kelemetry-1689762474-storage.kelemetry.svc:17271 # storage-svc
- --jaeger-storage.span-storage.type=grpc-plugin
- --jaeger-trace-cache=etcd
- --jaeger-trace-cache-etcd-endpoints=kelemetry-1689762474-etcd.kelemetry.svc:2379
- --jaeger-trace-cache-etcd-prefix=/trace/
- --trace-server-enable=true
image: ghcr.io/kubewharf/kelemetry:0.1.0
...
It would help users become more familiar with the project's source code.
jaeger UI Service count is 8
No response
No response
No response
No response
Instead of finding the trace as the linker, the aggregator just emits a tag that indicates the CGRNN (cluster/group/resource/namespace/name) of the parent span. Children are emitted with a separate trace ID so that each object has its own trace.
To be precise, we can remove the "parent"/"child" relationship in general and emit a new pseudospan type called "link":
| tag name | tag value |
|---|---|
| zzz-traceSource | "link" |
| linkedCluster | linked object cluster |
| linkedGroup | linked object group |
| linkedResource | linked object resource |
| linkedNamespace | linked object namespace |
| linkedName | linked object name |
| linkedRole | If "parent", the current span is displayed as a child of the linked span. If empty, the linked span is displayed as a child of the current span. Otherwise, the linked span is displayed under a display-only virtual span with the specified role. |
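Emitting the tags for such a "link" pseudospan could look like this (an illustrative Go sketch; the LinkRef type and linkTags function are hypothetical, not part of the actual aggregator):

```go
package main

import "fmt"

// LinkRef identifies the linked object (its CGRNN) plus its display role.
type LinkRef struct {
	Cluster, Group, Resource, Namespace, Name string
	Role                                      string // "parent", "", or a virtual-span role
}

// linkTags builds the tag map for a "link" pseudospan as proposed above.
func linkTags(ref LinkRef) map[string]string {
	return map[string]string{
		"zzz-traceSource": "link",
		"linkedCluster":   ref.Cluster,
		"linkedGroup":     ref.Group,
		"linkedResource":  ref.Resource,
		"linkedNamespace": ref.Namespace,
		"linkedName":      ref.Name,
		"linkedRole":      ref.Role,
	}
}

func main() {
	tags := linkTags(LinkRef{
		Cluster: "cluster1", Group: "apps", Resource: "replicasets",
		Namespace: "default", Name: "hello-6c64f775c6", Role: "parent",
	})
	fmt.Println(tags["zzz-traceSource"], tags["linkedResource"])
}
```

Because the tags only name the linked object rather than embed its trace, the frontend can resolve links lazily at query time, which is what enables multi-object links.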
Currently, frontend queries that use the exclusive mode have to search the entire trace, making queries unnecessarily slow. Since the user may only be interested in a particular object, we can speed up the query by limiting each trace to one object only.
This allows us to use more aggressive linkers, which may become more significant with #109. Furthermore, dynamic linking means that we can now have multi-object links, such as linking pods to their secrets/nodes (this was previously not possible because nodes can relate to multiple pods).
jaeger ui page error: No trace results. Try another query.
kelemetry-informers container log error: level=error msg="query discovery API failed: unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request" cluster=cluster1 mod=discovery
No response
No response
v0.2.4
No response
Kelemetry components come up and run as expected:
kelemetry-collector-5ccb7986df-58zdz 1/1 Running 2 (17m ago) 17m
kelemetry-collector-5ccb7986df-5cj8k 1/1 Running 2 (17m ago) 17m
kelemetry-collector-5ccb7986df-vxchw 1/1 Running 2 (17m ago) 17m
kelemetry-consumer-bdb696b46-dpp6c 0/1 CrashLoopBackOff 7 (2m29s ago) 17m
kelemetry-consumer-bdb696b46-f6bht 0/1 CrashLoopBackOff 7 (2m19s ago) 17m
kelemetry-consumer-bdb696b46-xz9xm 0/1 CrashLoopBackOff 7 (2m54s ago) 17m
kelemetry-etcd-0 1/1 Running 0 17m
kelemetry-etcd-1 1/1 Running 0 17m
kelemetry-etcd-2 1/1 Running 0 17m
kelemetry-frontend-5cb6746769-4tc4v 0/2 CrashLoopBackOff 16 (47s ago) 17m
kelemetry-frontend-5cb6746769-kl6xx 0/2 CrashLoopBackOff 17 (38s ago) 17m
kelemetry-frontend-5cb6746769-qhjwd 0/2 CrashLoopBackOff 16 (50s ago) 17m
kelemetry-informers-866df8f86d-2nh7x 1/1 Running 0 17m
kelemetry-informers-866df8f86d-4vlvk 1/1 Running 0 17m
kelemetry-informers-866df8f86d-gwml2 1/1 Running 0 17m
kelemetry-storage-0 1/1 Running 0 17m
...
time="2023-07-19T14:47:48Z" level=info msg=Initializing mod=jaeger-backend
time="2023-07-19T14:47:48Z" level=info msg=Initializing mod=tf-step/replace-name-visitor
time="2023-07-19T14:47:48Z" level=info msg=Initializing mod=tf-step/prune-tags-visitor
time="2023-07-19T14:47:48Z" level=info msg=Initializing mod=tf-step/compact-duration-visitor
time="2023-07-19T14:47:48Z" level=info msg=Initializing mod=tf-step/extract-nesting-visitor
time="2023-07-19T14:47:48Z" level=info msg=Initializing mod=tf-step/service-operation-replace-visitor
time="2023-07-19T14:47:48Z" level=info msg=Initializing mod=tf-step/cluster-name-visitor
time="2023-07-19T14:47:48Z" level=info msg=Initializing mod=tf-step/group-by-trace-source-visitor
time="2023-07-19T14:47:48Z" level=info msg=Initializing mod=tf-step/object-tags-visitor
time="2023-07-19T14:47:48Z" level=info msg=Initializing mod=tfconfig.RegisteredStep-list
time="2023-07-19T14:47:48Z" level=info msg=Initializing mod=jaeger-transform-config/file
time="2023-07-19T14:47:53Z" level=error msg="error initializing "jaeger-transform-config/file": parse tfconfig modifier error: invalid modifier args: parse extension provider config error: cannot initialize extension storage: grpc-plugin builder failed to create a store: error connecting to remote storage: context deadline exceeded"
2. #kubectl logs kelemetry-consumer-bdb696b46-f6bht
time="2023-07-19T14:46:18Z" level=info msg=Starting mod=pprof
time="2023-07-19T14:46:18Z" level=info msg="Startup complete"
panic: metric "diff_decorator_retry_count" was not initialized
panic: metric "diff_decorator" was not initialized
goroutine 84 [running]:
github.com/kubewharf/kelemetry/pkg/metrics.(*Metric[...]).With(0x4069d9?, 0x4ff15a?)
/src/pkg/metrics/interface.go:158 +0xf4
github.com/kubewharf/kelemetry/pkg/metrics.(*Metric[...]).DeferCount(0xc00071c2f8?, {0x51e68e?, 0x2bde0d0?, 0x3fd9a20?}, 0x2bc0320?)
/src/pkg/metrics/interface.go:169 +0x49
panic({0x226f460, 0xc005d20970})
/usr/local/go/src/runtime/panic.go:884 +0x213
github.com/kubewharf/kelemetry/pkg/metrics.(*Metric[...]).With(0xc005d28390?, 0x3fd9a20?)
/src/pkg/metrics/interface.go:158 +0xf4
github.com/kubewharf/kelemetry/pkg/diff/decorator.(*decorator).tryDecorate(0xc0002e20e0, {0x2bde0d0, 0xc000338730}, {0x2bfccf0, 0xc005d193b0}, 0xc005d10b00, 0xc005cba840)
/src/pkg/diff/decorator/decorator.go:282 +0x1325
github.com/kubewharf/kelemetry/pkg/diff/decorator.(*decorator).Decorate(0xc0002e20e0, {0x2bde0d0, 0xc000338730}, 0xc005d10b00, 0xc005cba840)
/src/pkg/diff/decorator/decorator.go:144 +0x465
github.com/kubewharf/kelemetry/pkg/audit/consumer.(*receiver).handleItem(0xc0001ac0e0, {0x2bde0d0, 0xc000338730}, {0x2bfccf0?, 0xc005d18fc0?}, 0xc005d10b00, 0xc005c71da0)
/src/pkg/audit/consumer/consumer.go:278 +0x12d3
github.com/kubewharf/kelemetry/pkg/audit/consumer.(*receiver).handleMessage(0xc0001ac0e0, {0x2bde0d0, 0xc000338730}, {0x2bfccf0?, 0xc005d18ee0?}, {0xc005d9fbc0, 0x33, 0x0?}, {0xc005dbac80, 0xc58, ...}, ...)
/src/pkg/audit/consumer/consumer.go:175 +0x4de
github.com/kubewharf/kelemetry/pkg/audit/consumer.(*receiver).Init.func1({0x2bde0d0?, 0xc000338730?}, {0x2bfccf0?, 0xc005d18ee0?}, {0xc005d9fbc0?, 0xd40010?, 0xc0005c2f00?}, {0xc005dbac80, 0xc58, 0xc80})
/src/pkg/audit/consumer/consumer.go:130 +0x9d
github.com/kubewharf/kelemetry/pkg/audit/mq/local.(*localConsumer).start.func1()
/src/pkg/audit/mq/local/local.go:203 +0x14c
created by github.com/kubewharf/kelemetry/pkg/audit/mq/local.(*localConsumer).start
/src/pkg/audit/mq/local/local.go:193 +0x95
0.2.1
k8s:1.23.17
Jaeger:1.4.2
Installed via the Helm chart; it started normally, then the consumer panicked:
panic: metric "diff_decorator_retry_count" was not initialized
panic: metric "diff_decorator" was not initialized
0.2.2
Installed through the Helm chart; the Docker containers are up and running, but kelemetry-jaeger-query-1 keeps restarting with these logs:
...
{"level":"info","ts":1689923838.4368327,"caller":"grpclog/component.go:71","msg":"[core]pickfirstBalancer: UpdateSubConnState: 0xc000202018, {IDLE connection error: desc = \"transport: Error while dialing dial tcp 172.17.0.1:17271: connect: connection refused\"}","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923838.4368534,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1] Channel Connectivity change to IDLE","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923838.4369116,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923838.4369738,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1 SubChannel #2] Subchannel picks a new address \"172.17.0.1:17271\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923838.4370942,"caller":"grpclog/component.go:71","msg":"[core]pickfirstBalancer: UpdateSubConnState: 0xc000202018, {CONNECTING <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923838.437123,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1] Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923838.4372501,"caller":"grpclog/component.go:71","msg":"[core]Creating new client transport to \"{\\n \\\"Addr\\\": \\\"172.17.0.1:17271\\\",\\n \\\"ServerName\\\": \\\"172.17.0.1:17271\\\",\\n \\\"Attributes\\\": null,\\n \\\"BalancerAttributes\\\": null,\\n \\\"Type\\\": 0,\\n \\\"Metadata\\\": null\\n}\": connection error: desc = \"transport: Error while dialing dial tcp 172.17.0.1:17271: connect: connection refused\"","system":"grpc","grpc_log":true}
{"level":"warn","ts":1689923838.4372795,"caller":"channelz/funcs.go:342","msg":"[core][Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {\n \"Addr\": \"172.17.0.1:17271\",\n \"ServerName\": \"172.17.0.1:17271\",\n \"Attributes\": null,\n \"BalancerAttributes\": null,\n \"Type\": 0,\n \"Metadata\": null\n}. Err: connection error: desc = \"transport: Error while dialing dial tcp 172.17.0.1:17271: connect: connection refused\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923838.437295,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1 SubChannel #2] Subchannel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923838.437333,"caller":"grpclog/component.go:71","msg":"[core]pickfirstBalancer: UpdateSubConnState: 0xc000202018, {TRANSIENT_FAILURE connection error: desc = \"transport: Error while dialing dial tcp 172.17.0.1:17271: connect: connection refused\"}","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923838.4373517,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1] Channel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923840.6481597,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1] Channel Connectivity change to SHUTDOWN","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923840.6482139,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1 SubChannel #2] Subchannel Connectivity change to SHUTDOWN","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923840.6482327,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1 SubChannel #2] Subchannel deleted","system":"grpc","grpc_log":true}
{"level":"info","ts":1689923840.6482399,"caller":"channelz/funcs.go:340","msg":"[core][Channel #1] Channel deleted","system":"grpc","grpc_log":true}
{"level":"fatal","ts":1689923840.6482666,"caller":"./main.go:107","msg":"Failed to init storage factory","error":"grpc-plugin builder failed to create a store: error connecting to remote storage: context deadline exceeded","stacktrace":"main.main.func1\n\t./main.go:107\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:968\nmain.main\n\t./main.go:170\nruntime.main\n\truntime/proc.go:250"}
...
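The fatal error above ("error connecting to remote storage: context deadline exceeded") means jaeger-query never established a TCP connection to the gRPC storage plugin at 172.17.0.1:17271. A quick reachability probe from the host can rule out basic port-wiring problems before digging into configuration (a diagnostic sketch; the address is taken from the log above and will differ per deployment):

```shell
#!/usr/bin/env bash
# Probe a TCP endpoint the way the gRPC dialer would, with a short timeout.
check_endpoint() {
  local host=$1 port=$2
  if timeout 3 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# Address taken from the jaeger-query log above; adjust for your setup.
check_endpoint 172.17.0.1 17271
```

If this prints "unreachable", the Kelemetry storage plugin is not listening where jaeger-query expects it (wrong address, container not started, or a firewall between the Docker bridge and the host).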
dev.docker-compose.yaml
# cat dev.docker-compose.yaml
# docker-compose setup for development setup.
# Use quickstart.docker-compose.yaml if you just want to try out Kelemetry.
# Use the helm chart if you want to deploy in production.
version: "2.2"
services:
  # ETCD cache storage, only required if etcd cache is used
  etcd:
    image: quay.io/coreos/etcd:v3.2
    entrypoint: [etcd]
    command:
      - -name=main
      - -advertise-client-urls=http://etcd:2479
      - -listen-client-urls=http://0.0.0.0:2479
      - -initial-advertise-peer-urls=http://etcd:2380
      - -listen-peer-urls=http://0.0.0.0:2380
      - -initial-cluster-state=new
      - -initial-cluster=main=http://etcd:2380
      - -initial-cluster-token=etcd-cluster-1
      - -data-dir=/var/run/etcd/default.etcd
    volumes:
      - etcd:/var/run/etcd/
    ports:
      - 2479:2379
    restart: always
  # Web frontend for trace view.
  jaeger-query:
    image: jaegertracing/jaeger-query:1.42
    environment:
      SPAN_STORAGE_TYPE: grpc-plugin
      GRPC_STORAGE_SERVER: host.docker.internal:17272 # run on host directly
    ports:
      - 0.0.0.0:16686:16686
    restart: always
  # OTLP collector that writes to Badger
  jaeger-collector:
    image: jaegertracing/jaeger-collector:1.42
    environment:
      COLLECTOR_OTLP_ENABLED: "true"
      SPAN_STORAGE_TYPE: grpc-plugin
      GRPC_STORAGE_SERVER: remote-badger:17271
    ports:
      - 0.0.0.0:4317:4317
    restart: always
  # Backend badger storage
  # Feel free to override environment.SPAN_STORAGE_TYPE to other storages given the proper configuration.
  remote-badger:
    image: jaegertracing/jaeger-remote-storage:1.42
    environment:
      SPAN_STORAGE_TYPE: badger
      BADGER_EPHEMERAL: "false"
      BADGER_DIRECTORY_KEY: /mnt/badger/key
      BADGER_DIRECTORY_VALUE: /mnt/badger/data
    ports:
      - 127.0.0.1:17272:17271
    volumes:
      - badger:/mnt/badger
  # Web frontend for raw trace database view.
  jaeger-query-raw:
    image: jaegertracing/jaeger-query:1.42
    environment:
      SPAN_STORAGE_TYPE: grpc-plugin
      GRPC_STORAGE_SERVER: remote-badger:17271
    ports:
      - 0.0.0.0:26686:16686
    restart: always
volumes:
  etcd: {}
  badger: {}
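As the comment on `remote-badger` notes, the Badger backend can be swapped for any storage the `jaeger-remote-storage` image supports. A hedged sketch of an override file for Elasticsearch (the file name, service name, and image tag are assumptions; `SPAN_STORAGE_TYPE` and `ES_SERVER_URLS` are standard Jaeger environment variables):

```yaml
# dev.docker-compose.elasticsearch.yaml -- hypothetical override file
version: "2.2"
services:
  remote-badger:
    environment:
      SPAN_STORAGE_TYPE: elasticsearch
      ES_SERVER_URLS: http://elasticsearch:9200
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.10
    environment:
      discovery.type: single-node
    restart: always
```

Apply both files together: `docker compose -f dev.docker-compose.yaml -f dev.docker-compose.elasticsearch.yaml up -d`.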
0.2.2
k8s:1.24.15
Jaeger:1.4.2
The other etcd pods run normally and pass their health checks, but kelemetry-etcd-0 is stuck in CrashLoopBackOff:
kelemetry-etcd-0 0/1 CrashLoopBackOff 18 (83s ago) 74m
kelemetry-etcd-1 1/1 Running 0 74m
kelemetry-etcd-2 1/1 Running 0 74m
# kubectl logs kelemetry-etcd-0
2023-07-20 08:54:32.056926 I | etcdmain: etcd Version: 3.3.13
2023-07-20 08:54:32.056972 I | etcdmain: Git SHA: 98d3084
2023-07-20 08:54:32.056976 I | etcdmain: Go Version: go1.10.8
2023-07-20 08:54:32.056979 I | etcdmain: Go OS/Arch: linux/amd64
2023-07-20 08:54:32.056982 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2023-07-20 08:54:32.057030 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2023-07-20 08:54:32.057261 I | embed: listening for peers on http://0.0.0.0:2380
2023-07-20 08:54:32.057293 I | embed: listening for client requests on 0.0.0.0:2379
2023-07-20 08:54:32.057826 I | pkg/netutil: resolving kelemetry-etcd-0:2380 to 192.168.253.80:2380
2023-07-20 08:54:32.060864 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:33.065013 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:34.069684 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:35.073983 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:36.079512 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:37.083088 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:38.102341 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:39.107733 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:40.112295 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:41.116753 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:42.121587 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:43.126384 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:44.131495 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:45.136237 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:46.141086 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:47.145136 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:48.150200 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:49.157245 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:50.162159 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:51.166931 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:52.172077 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:53.177795 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:54.183211 W | pkg/netutil: failed resolving host kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 (lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host); retrying in 1s
2023-07-20 08:54:55.187278 I | pkg/netutil: resolving kelemetry-etcd-0.kelemetry-etcd.default.svc:2380 to 192.168.253.80:2380
2023-07-20 08:54:55.197378 C | etcdmain: member 6fa7a00416c5d67d has already been bootstrapped
# kubectl logs kelemetry-etcd-1
...
2023-07-20 08:58:06.740587 W | etcdserver: cannot get the version of member 6fa7a00416c5d67d (Get http://kelemetry-etcd-0.kelemetry-etcd.default.svc:2380/version: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host)
2023-07-20 08:58:08.670076 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_RAFT_MESSAGE")
2023-07-20 08:58:08.670482 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_SNAPSHOT")
2023-07-20 08:58:10.745502 W | etcdserver: failed to reach the peerURL(http://kelemetry-etcd-0.kelemetry-etcd.default.svc:2380) of member 6fa7a00416c5d67d (Get http://kelemetry-etcd-0.kelemetry-etcd.default.svc:2380/version: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host)
2023-07-20 08:58:10.745557 W | etcdserver: cannot get the version of member 6fa7a00416c5d67d (Get http://kelemetry-etcd-0.kelemetry-etcd.default.svc:2380/version: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host)
2023-07-20 08:58:13.670395 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_RAFT_MESSAGE")
2023-07-20 08:58:13.670855 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_SNAPSHOT")
2023-07-20 08:58:14.750999 W | etcdserver: failed to reach the peerURL(http://kelemetry-etcd-0.kelemetry-etcd.default.svc:2380) of member 6fa7a00416c5d67d (Get http://kelemetry-etcd-0.kelemetry-etcd.default.svc:2380/version: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host)
2023-07-20 08:58:14.751047 W | etcdserver: cannot get the version of member 6fa7a00416c5d67d (Get http://kelemetry-etcd-0.kelemetry-etcd.default.svc:2380/version: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host)
# kubectl logs kelemetry-etcd-2
...
2023-07-20 08:58:53.156209 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_RAFT_MESSAGE")
2023-07-20 08:58:53.156809 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_SNAPSHOT")
2023-07-20 08:58:58.156996 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_RAFT_MESSAGE")
2023-07-20 08:58:58.157404 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_SNAPSHOT")
2023-07-20 08:59:03.157340 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_RAFT_MESSAGE")
2023-07-20 08:59:03.157662 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_SNAPSHOT")
2023-07-20 08:59:08.157610 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_RAFT_MESSAGE")
2023-07-20 08:59:08.157900 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_SNAPSHOT")
2023-07-20 08:59:13.158076 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_RAFT_MESSAGE")
2023-07-20 08:59:13.158478 W | rafthttp: health check for peer 6fa7a00416c5d67d could not connect: dial tcp: lookup kelemetry-etcd-0.kelemetry-etcd.default.svc on 10.96.0.10:53: no such host (prober "ROUND_TRIPPER_SNAPSHOT")
0.2.2
k8s:1.23.17
jaeger:1.4.2
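All three pods' failures above trace back to the same DNS record: `kelemetry-etcd-0.kelemetry-etcd.default.svc` does not resolve, so the peers cannot reach the member and the restarted member eventually aborts with "member ... has already been bootstrapped". Before touching member state, it is worth confirming whether the per-pod record behind the headless service resolves at all (a diagnostic sketch; the service and namespace names are taken from the logs and assume a default-namespace install):

```shell
#!/usr/bin/env bash
# Report whether a hostname resolves, mirroring the lookup etcd performs.
check_dns() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "resolves"
  else
    echo "no such host"
  fi
}

# Per-pod record from the etcd logs; run this inside the cluster
# (e.g. from a debug pod) so the cluster DNS at 10.96.0.10 is consulted.
check_dns kelemetry-etcd-0.kelemetry-etcd.default.svc
```

If the record is missing while the pod exists, check whether the headless Service sets `publishNotReadyAddresses: true` (common for etcd StatefulSets): without it, a pod's DNS record is only published once the pod is Ready, which a crash-looping member never becomes, so the failure can feed itself.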