Comments (22)
@Random-Liu Hi. I'm facing the same issue. I have been running my cluster without the ServiceAccount
admission control plugin, since I'm running the cluster without auth.
I faced a similar situation when trying to run Heapster, but it has an option, inClusterConfig=false,
that allows running without service account tokens.
from node-problem-detector.
We may close this once #49 is merged. @sols1 @ApsOps @Random-Liu
from node-problem-detector.
@sols1 It should work.
/var/run/secrets/kubernetes.io/serviceaccount/token
is the token file used by the api client. It should be set by default in each pod.
I started a random pod and then exec'd into it:
/ # cat /var/run/secrets/kubernetes.io/serviceaccount/token
eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3Nlcn...
Which K8s version are you using?
Can you do the same to make sure there is this token file in your pod? :)
from node-problem-detector.
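The check described above can also be done programmatically. The following is a minimal sketch, not code from node-problem-detector: `tokenAvailable` is a hypothetical helper that reports whether a non-empty token file exists at the path quoted in the thread.

```go
package main

import (
	"fmt"
	"os"
)

// defaultTokenPath is the token location quoted in the thread.
const defaultTokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

// tokenAvailable (hypothetical helper) reports whether a non-empty regular
// file exists at path. os.Stat follows symlinks, so a symlinked token counts.
func tokenAvailable(path string) bool {
	info, err := os.Stat(path)
	if err != nil {
		return false
	}
	return info.Mode().IsRegular() && info.Size() > 0
}

func main() {
	if tokenAvailable(defaultTokenPath) {
		fmt.Println("service account token found")
	} else {
		fmt.Println("no service account token; in-cluster config is likely to fail")
	}
}
```

Run inside a pod, this gives the same answer as the `cat` command, without printing the token itself.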
When I try to do the same thing on my k8s cluster, I get a different result:
kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                                READY   STATUS    RESTARTS   AGE   NODE
default       collectd-0e6qk                      1/1     Running   0          3h    192.168.78.15
default       collectd-bal2i                      1/1     Running   0          3h    192.168.78.16
default       graphite-5rr7m                      1/1     Running   0          4h    192.168.78.16
default       ha-service-loadbalancer-egvmk       1/1     Running   0          4h    192.168.78.15
default       ha-service-loadbalancer-p6ikc       1/1     Running   0          4h    192.168.78.16
kube-system   kube-dns-v11-b8grr                  4/4     Running   0          4h    192.168.78.16
kube-system   kube-registry-v0-5fly7              1/1     Running   0          4h    192.168.78.16
kube-system   kubernetes-dashboard-v1.0.0-t458m   1/1     Running   0          4h    192.168.78.16
kubectl exec collectd-0e6qk cat /var/run/secrets/kubernetes.io/serviceaccount/token
cat: /var/run/secrets/kubernetes.io/serviceaccount/token: No such file or directory
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1
As you can see, many pods are running, but /var/run/secrets/kubernetes.io/serviceaccount/token does not exist.
from node-problem-detector.
- Can you try starting the pod in the kube-system namespace to see whether that makes any difference?
- Which version of Kubernetes are you using?
I'll check whether the token file is set by default. At least according to the documentation, for pods in the kube-system
namespace the file should be there. http://kubernetes.io/docs/user-guide/accessing-the-cluster/#accessing-the-api-from-a-pod
from node-problem-detector.
As you can see from above, I am running several pods in the kube-system namespace, and those don't have this token file either:
kubectl exec kube-dns-v11-b8grr --namespace=kube-system cat /var/run/secrets/kubernetes.io/serviceaccount/token
cat: can't open '/var/run/secrets/kubernetes.io/serviceaccount/token': No such file or directory
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1
from node-problem-detector.
kubernetes version ~1.3
from node-problem-detector.
The thing that might be unusual about this setup is that Kubernetes runs on top of RancherOS, which is containerized.
But when I try to access this file from the host namespace, I get the same result:
docker run -v /:/rootfs --net=host --pid=host --privileged phusion/baseimage cat /rootfs/var/run/secrets/kubernetes.io/serviceaccount/token
cat: /rootfs/var/run/secrets/kubernetes.io/serviceaccount/token: No such file or directory
Or inside kubelet container:
docker exec kubelet cat /var/run/secrets/kubernetes.io/serviceaccount/token
cat: /var/run/secrets/kubernetes.io/serviceaccount/token: No such file or directory
Or kube-proxy:
docker exec kube-proxy cat /var/run/secrets/kubernetes.io/serviceaccount/token
cat: /var/run/secrets/kubernetes.io/serviceaccount/token: No such file or directory
from node-problem-detector.
- Can you check whether you have a default service account? It should be created by default.
$ kubectl get serviceaccount
NAME SECRETS AGE
default 1 2d
- If you don't see it, I guess you didn't start your apiserver with admission-control:
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,ResourceQuota
I believe you need to enable ServiceAccount in the admission control.
The thing that might be unusual about this setup is that kubernetes runs on top of RancherOS, which is containerized.
That may be the reason; it looks like RancherOS didn't enable the service account admission control.
from node-problem-detector.
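To check whether a given --admission-control value includes ServiceAccount, the comma-separated list can be split and searched. This is a small illustrative sketch. `hasAdmissionPlugin` is a hypothetical helper and not part of Kubernetes or node-problem-detector.

```go
package main

import (
	"fmt"
	"strings"
)

// hasAdmissionPlugin (hypothetical helper) reports whether name appears in a
// comma-separated --admission-control value.
func hasAdmissionPlugin(flagValue, name string) bool {
	for _, plugin := range strings.Split(flagValue, ",") {
		if strings.TrimSpace(plugin) == name {
			return true
		}
	}
	return false
}

func main() {
	// The flag value suggested in the comment above.
	value := "NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,ResourceQuota"
	fmt.Println(hasAdmissionPlugin(value, "ServiceAccount"))

	// An apiserver started without the flag has no plugin list at all.
	fmt.Println(hasAdmissionPlugin("", "ServiceAccount"))
}
```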
Yes, serviceaccount does not show any secrets:
kubectl get serviceaccount
NAME SECRETS AGE
default 0 5d
And yes, there is no --admission-control option set on the apiserver.
But nothing says it is a required option, particularly for a bare-metal cluster.
This brings back the original question: is this tool supposed to work for any k8s cluster?
from node-problem-detector.
@sols1 It is supposed to be.
If there is no admission control and the pod can talk to the apiserver without auth, then it should still work. I'll see whether we can fix it.
from node-problem-detector.
@sols1 Please see kubernetes/kubernetes#27973 (comment). :)
from node-problem-detector.
Even after kubernetes/kubernetes#27973 (comment), I got:
kubectl logs node-problem-detector-bcrpv
I0625 00:35:56.475486 1 kernel_monitor.go:86] Finish parsing log file: {WatcherConfig:{KernelLogPath:/log/kern.log} BufferSize:10 Source:kernel-monitor DefaultConditions:[{Type:KernelDeadlock Status:false Transition:0001-01-01 00:00:00 +0000 UTC Reason:KernelHasNoDeadlock Message:kernel has no deadlock}] Rules:[{Type:temporary Condition: Reason:OOMKilling Pattern:Kill process \d+ (.+) score \d+ or sacrifice child\nKilled process \d+ (.+) total-vm:\d+kB, anon-rss:\d+kB, file-rss:\d+kB} {Type:temporary Condition: Reason:TaskHung Pattern:task \S+:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:AUFSUmountHung Pattern:task umount\.aufs:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:DockerHung Pattern:task docker:\w+ blocked for more than \w+ seconds\.} {Type:permanent Condition:KernelDeadlock Reason:UnregisterNetDeviceIssue Pattern:unregister_netdevice: waiting for \w+ to become free. Usage count = \d+}]}
I0625 00:35:56.475661 1 kernel_monitor.go:93] Got system boot time: 2016-06-17 17:51:02.475654619 +0000 UTC
I0625 00:35:56.476730 1 kernel_monitor.go:102] Start kernel monitor
I0625 00:35:56.476768 1 kernel_log_watcher.go:87] kernel log "/log/kern.log" is not found, kernel monitor doesn't support the os distort
I0625 00:35:56.476782 1 problem_detector.go:60] Problem detector started
E0625 00:35:57.536246 1 manager.go:130] failed to update node conditions: the server does not allow this method on the requested resource
E0625 00:35:58.478941 1 manager.go:130] failed to update node conditions: the server does not allow this method on the requested resource
E0625 00:35:59.478941 1 manager.go:130] failed to update node conditions: the server does not allow this method on the requested resource
So, which OS versions are supported?
Also, is there another option I need to set to avoid the error "the server does not allow this method on the requested resource"?
from node-problem-detector.
@sols1 This is because node-problem-detector uses the Patch method to update node status (see #9).
The apiserver before kubernetes/kubernetes#26381 didn't allow the patch operation on node status. I believe your Kubernetes version doesn't contain kubernetes/kubernetes#26381 yet.
from node-problem-detector.
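For context, a patch against node status is an HTTP PATCH request to the node's status subresource. The sketch below builds such a request without sending it. It is an illustration only, not node-problem-detector's client code: `newNodeStatusPatch`, the apiserver address, the node name and the patch body are all assumptions made for the example. An apiserver that does not allow the method would typically answer with HTTP 405 (Method Not Allowed), which matches the wording of the error in the log above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newNodeStatusPatch (hypothetical helper) builds, but does not send, a PATCH
// request against /api/v1/nodes/<name>/status using a strategic merge patch.
func newNodeStatusPatch(apiserver, nodeName, body string) (*http.Request, error) {
	target := strings.TrimRight(apiserver, "/") +
		"/api/v1/nodes/" + url.PathEscape(nodeName) + "/status"
	req, err := http.NewRequest(http.MethodPatch, target, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/strategic-merge-patch+json")
	return req, nil
}

func main() {
	// Illustrative values only: address, node name and condition are made up.
	body := `{"status":{"conditions":[{"type":"KernelDeadlock","status":"False"}]}}`
	req, err := newNodeStatusPatch("http://127.0.0.1:8080", "node-1", body)
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```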
@sols1 And from the log, it seems that the kernel log is not at /var/log/kern.log on your host. What is your OS distro?
from node-problem-detector.
@ApsOps That's quite useful. I'll also add an insecure option to node problem detector.
from node-problem-detector.
@Random-Liu Maybe we can refer to Heapster, since it is convenient to deploy both inside a Kubernetes cluster and standalone.
from node-problem-detector.
Closing this one since #49 is merged. Please take a look at the documentation on how to use the newly introduced --apiserver-override flag: https://github.com/kubernetes/node-problem-detector#flags.
from node-problem-detector.
@Random-Liu I'm unable to build (I've seen the open issue regarding journald usage). I also tried using the image that @shyamjvs pushed (tagged v0.3), but it fails with:
panic: strconv.ParseBool: parsing "": invalid syntax
goroutine 1 [running]:
panic(0x18330c0, 0xc42000cd80)
/usr/local/google/home/shyamjvs/.gvm/gos/go1.7/src/runtime/panic.go:500 +0x1a1
k8s.io/node-problem-detector/pkg/problemclient.NewClientOrDie(0x7ffed64509e3, 0x55, 0xd, 0xc4204d4300)
/usr/local/google/home/shyamjvs/go/src/k8s.io/node-problem-detector/pkg/problemclient/problem_client.go:62 +0x3f1
k8s.io/node-problem-detector/pkg/problemdetector.NewProblemDetector(0x2684c80, 0xc4204ea0d0, 0x7ffed64509e3, 0x55, 0x32, 0xc4200001a0)
/usr/local/google/home/shyamjvs/go/src/k8s.io/node-problem-detector/pkg/problemdetector/problem_detector.go:45 +0x3f
main.main()
/usr/local/google/home/shyamjvs/go/src/k8s.io/node-problem-detector/node_problem_detector.go:45 +0x77
It seems to be missing a few commits or I might be doing something wrong in config. Could you please release a docker image, or suggest something I can try? :)
from node-problem-detector.
I'm unable to build (I've seen the open issue regarding journald usage).
@ApsOps To build with journald support, you need to install libsystemd-journal-dev or libsystemd-dev.
After #39 is merged, you can avoid building journald support.
I also tried using image that @shyamjvs pushed (tagged v0.3), but it fails.
The error suggests to me that you set inClusterConfig or some other boolean option in --apiserver-override but did not give it a value.
I've tried v0.3 myself and it seems to work for me, except for a recently introduced issue, #61.
Could you please release a docker image?
The issue is tracked here #60.
from node-problem-detector.
Ah, makes sense. I had &auth=&insecure= from the README. Removing these params worked!
from node-problem-detector.
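The panic message above is what Go's strconv.ParseBool returns for an empty string. The sketch below reproduces that behaviour for a query parameter that is present but has no value. It is a minimal illustration, not the actual code in problem_client.go: `boolParam` is a hypothetical helper and the override URLs are made-up examples.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// boolParam (hypothetical helper) reads a boolean query parameter from rawURL.
// An absent parameter falls back to def. A parameter that is present but empty
// is handed to strconv.ParseBool unchanged, which rejects it.
func boolParam(rawURL, key string, def bool) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	vals, ok := u.Query()[key]
	if !ok || len(vals) == 0 {
		return def, nil
	}
	return strconv.ParseBool(vals[0])
}

func main() {
	// Illustrative override values; the address is an assumption.
	overrides := []string{
		"http://127.0.0.1:8080?inClusterConfig=false",
		"http://127.0.0.1:8080?inClusterConfig=false&auth=&insecure=",
	}
	for _, raw := range overrides {
		v, err := boolParam(raw, "insecure", false)
		fmt.Printf("%s -> value=%v err=%v\n", raw, v, err)
	}
}
```

With the second URL, the error text is `strconv.ParseBool: parsing "": invalid syntax`, the same message as in the stack trace. Dropping the empty parameters, or giving them explicit values such as `insecure=true`, avoids it.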
OK. Then the README is a bit misleading.
Could you file an issue or PR for this? Thanks a lot~ :) @ApsOps
from node-problem-detector.