kubectl-flame's Introduction

kubectl flame 🔥

A kubectl plugin that allows you to profile production applications with low-overhead by generating FlameGraphs

Running kubectl-flame does not require any modification to existing pods.

Requirements

  • Supported languages: Go, Java (any JVM based language), Python, Ruby, and NodeJS
  • A Kubernetes cluster that uses Docker as the container runtime (tested on GKE, EKS, and AKS)

Usage

Profiling Kubernetes Pod

To profile a Java application in pod mypod for 1 minute and save the flame graph to /tmp/flamegraph.svg, run:

kubectl flame mypod -t 1m --lang java -f /tmp/flamegraph.svg

Profiling Alpine based container

Profiling a Java application in an Alpine-based container requires the --alpine flag:

kubectl flame mypod -t 1m -f /tmp/flamegraph.svg --lang java --alpine

NOTICE: this is only required for Java apps; the --alpine flag is unnecessary for Go profiling.

Profiling sidecar container

Pods that contain more than one container require specifying the target container as an argument:

kubectl flame mypod -t 1m --lang go -f /tmp/flamegraph.svg mycontainer

Profiling Golang multi-process container

Profiling a Go application in a pod that contains more than one process requires specifying the target process name via the --pgrep flag:

kubectl flame mypod -t 1m --lang go -f /tmp/flamegraph.svg --pgrep go-app

Java profiling assumes that the process name is java. Use the --pgrep flag if your process name is different.

Profiling on clusters running containerd

To run this tool on Kubernetes clusters that use containerd as the runtime engine, you must specify the path to the containerd runtime files:

kubectl flame mypod -t 1m --docker-path /run/containerd

Installing

Krew

You can install kubectl flame using Krew, the package manager for kubectl plugins.

Once you have Krew installed, just run:

kubectl krew install flame

Pre-built binaries

See the release page for the full list of pre-built assets.

How it works

kubectl-flame launches a Kubernetes Job on the same node as the target pod. Under the hood, kubectl-flame uses async-profiler to generate flame graphs for Java applications; interaction with the target JVM is done via a shared /tmp folder. Golang support is based on eBPF profiling, Python support is based on py-spy, Ruby support is based on rbspy, and NodeJS support is based on perf. For JavaScript symbols to be resolved, the node process needs to be run with the --perf-basic-prof flag.
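
The pod descriptions quoted in the issues further down make the shape of that Job concrete. Here is a minimal Go sketch of such a Job using the Kubernetes API types, reconstructed from those descriptions; the field values are illustrative and this is not the project's actual constructor (the real CLI also uses node affinity rather than nodeName to co-locate the job, as one issue below discusses):

package main

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildProfilingJob sketches the Job kubectl-flame launches next to the
// target pod. Values mirror the pod descriptions quoted in the issues
// below; the real CLI uses node affinity instead of nodeName.
func buildProfilingJob(id, targetContainerID, nodeName string) *batchv1.Job {
	privileged := true
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:        "kubectl-flame-" + id,
			Labels:      map[string]string{"kubectl-flame/id": id},
			Annotations: map[string]string{"sidecar.istio.io/inject": "false"},
		},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					NodeName:      nodeName, // run on the target pod's node
					RestartPolicy: corev1.RestartPolicyNever,
					HostPID:       true, // the agent must see the target's processes
					Containers: []corev1.Container{{
						Name:            "kubectl-flame",
						Image:           "verizondigital/kubectl-flame:v0.1.5-jvm",
						Command:         []string{"/app/agent"},
						Args:            []string{id, targetContainerID, "1m0s", "java"},
						SecurityContext: &corev1.SecurityContext{Privileged: &privileged},
						VolumeMounts: []corev1.VolumeMount{{
							Name:      "target-filesystem",
							MountPath: "/var/lib/docker", // container filesystems live here
						}},
					}},
					Volumes: []corev1.Volume{{
						Name: "target-filesystem",
						VolumeSource: corev1.VolumeSource{
							HostPath: &corev1.HostPathVolumeSource{Path: "/var/lib/docker"},
						},
					}},
				},
			},
		},
	}
}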

Contribute

Please refer to the contributing.md file for information about how to get involved. We welcome issues, questions, and pull requests.

Maintainers

License

This project is licensed under the terms of the Apache 2.0 open source license. Please refer to LICENSE for the full terms.


kubectl-flame's Issues

Can the flame pod work with Istio?

Hi, it seems the kubectl-flame pod doesn't work with the Istio sidecar. I got the following error when I executed the example command on a pod in my Istio-enabled namespace.

> kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l java CONTAINER-NAME
Verifying target pod ... ✔
Launching profiler ... ✔
❌
a container name must be specified for pod kubectl-flame-831b66f2-5a01-4b92-b821-24c23277fa1e-8k9vf, choose one of: [kubectl-flame istio-proxy] or one of the init containers: [istio-validation]

I can try to disable Istio sidecar injection on this namespace, but I think the side effect is too heavy.
Alternatively, a solution would be to expose a way to add the annotation sidecar.istio.io/inject: "false" to the kubectl-flame pod so the injector skips it; a sketch of that idea follows.
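
A minimal Go sketch of that workaround, using the apimachinery types (and indeed, job descriptions quoted later on this page show the plugin setting exactly this annotation):

package main

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// disableIstioInjection stamps the flame job's pod template so the Istio
// sidecar injector skips it. The helper name is hypothetical.
func disableIstioInjection(meta *metav1.ObjectMeta) {
	if meta.Annotations == nil {
		meta.Annotations = map[string]string{}
	}
	meta.Annotations["sidecar.istio.io/inject"] = "false"
}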

Any idea?
Thanks!

Java application Profiling Error

Hello,

I am having a problem profiling a Java application running in GCP. I am using a MacBook Pro (M1).

kubectl flame app-78d789d49d-hj6vr -t 1m --lang java -f /tmp/flamegraph.svg sidecar
Verifying target pod ... ✔
Launching profiler ... ✔
Profiling ... Error: open /var/lib/docker/image/overlay2/layerdb/mounts/containerd://6059a9bc5b364a995b452628ea78597e05fc283d662efe83fc7df8f1a942d1c8/mount-id: no such file or directory

Is anyone having a similar problem?

Thanks!

Yi

Issue installing flame on M1 chips

Hello,

I'm having trouble installing flame on my macbook with an M1 chip, encountering this issue
[screenshot of the error]

Is anyone else having this issue? If it's a known issue, any idea if there will be a fix?

Profiling on Kotlin Pods fail

Does kubectl-flame support profiling Kotlin pods? It seems to fail at the "Launching profiler" step with "pod failed".

Upgrade to async-profiler 2

Upgrade to async-profiler 2 for JVM applications.
Version 2 supports HTML flame graphs, which are easier to use.

I can try to provide a PR if you point me in the right direction; if I understand it correctly, the Docker container image should be updated to use the new version. By the way, you build async-profiler from source using a forked repo and not the official one; is there a reason why?

Is there a way to run as unprivileged?

First of all, thanks for writing the plugin. Profiling on kubernetes is difficult and I'm looking forward to seeing if this plugin can ease the pain.

Currently, our clusters have a default PodSecurityPolicy which disallows privileged containers, resulting in something like this when I try to run the plugin:

2021/01/18 13:53:47 Job.batch "kubectl-flame-cx36dx4d-e2xe-4987-879a-d64776cb5543" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy

I see this plugin needs to run the jobs on Kubernetes currently as privileged: https://github.com/VerizonMedia/kubectl-flame/blob/cb7290125d6d471bfb159be5e3ff3bf7178bef94/cli/cmd/kubernetes/job/python.go#L72 (same on jvm and golang)

Is there a way to reduce the number of privileges it needs to run? E.g. by setting the right capabilities?
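
For illustration only, here is what a reduced-privilege securityContext might look like with the core/v1 types. Whether these capabilities are actually sufficient for the various profilers is an untested assumption; the eBPF- and perf-based modes in particular may still need more:

package main

import corev1 "k8s.io/api/core/v1"

// reducedPrivilegeContext sketches one possible alternative to privileged
// mode. CAUTION: this is an assumption, not a tested configuration.
func reducedPrivilegeContext() *corev1.SecurityContext {
	return &corev1.SecurityContext{
		Capabilities: &corev1.Capabilities{
			Add: []corev1.Capability{
				"SYS_PTRACE", // attach to and inspect the target process
				"SYS_ADMIN",  // perf_event_open / eBPF on older kernels
			},
		},
	}
}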

Exit status 255

I tried to run a job, but it fails with this error:
[xxx@xxx ~]$ kubectl flame xxx -t 1m --lang java -f flamegraph.svg xxx Verifying target pod ... ✔ Launching profiler ... ✔ Profiling ... Error: exit status 255
and the log looks like:
[xxx@xxx ~]$ kubectl logs kubectl-flame-24059523-04a3-4a87-b759-266e3c74d768-8627v {"type":"progress","data":{"time":"2021-06-17T19:14:12.666546141Z","stage":"started"}} {"type":"error","data":{"reason":"exit status 255"}}

Can choose the wrong container based on unrelated volume mounts

Steps to Reproduce

  • Have a pod with two containers, e.g. example1 and example2
  • Mount a ConfigMap into the example1 container with a name that contains example2
  • Attempt to profile the example2 container

Observed

  • You might actually choose the example1 container

Desired

  • If you specify a container you get that container

More info

We have a pod with containers named app and multiplexer, and a config map with the key app_env.json. I took a look at the mountinfo for one of the processes in the multiplexer container, and cross-referencing with this logic, I believe this line is being incorrectly matched:

14828 14802 259:1 /var/lib/kubelet/pods/7a102c4a-964f-43a4-8477-8a5f1e7c99b4/volumes/kubernetes.io~configmap/app-env/..2021_09_14_22_01_44.499123521/app_env.json /volumes/settings/app_env.json ro,noatime - xfs /dev/nvme0n1p1 rw,attr2,inode64,logbufs=8,logbsize=32k,noquota

this is the line it's probably trying to match:

14820 14804 259:1 /var/lib/kubelet/pods/7a102c4a-964f-43a4-8477-8a5f1e7c99b4/containers/multiplexer/0216267a /dev/termination-log rw,noatime - xfs /dev/nvme0n1p1 rw,attr2,inode64,logbufs=8,logbsize=32k,noquota

As a result, I try to profile the app container but end up with data from the multiplexer container.

Maybe it should be stricter about matching, e.g. on /<pod_id>/containers/<container_name>/ ... though I don't know how much of an assumption about the internals of the kubelet this is.
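
A minimal, runnable Go sketch of that stricter match; the kubelet path layout it relies on is exactly the assumption flagged above:

package main

import (
	"fmt"
	"regexp"
)

// containerMountPattern anchors on /pods/<pod-uid>/containers/<name>/ so a
// volume path that merely contains the container name no longer matches.
func containerMountPattern(podUID, containerName string) *regexp.Regexp {
	return regexp.MustCompile(
		"/pods/" + regexp.QuoteMeta(podUID) +
			"/containers/" + regexp.QuoteMeta(containerName) + "/")
}

func main() {
	re := containerMountPattern("7a102c4a-964f-43a4-8477-8a5f1e7c99b4", "multiplexer")

	// The termination-log line for the multiplexer container matches ...
	fmt.Println(re.MatchString("/var/lib/kubelet/pods/7a102c4a-964f-43a4-8477-8a5f1e7c99b4/containers/multiplexer/0216267a")) // true

	// ... while the ConfigMap volume line no longer does.
	fmt.Println(re.MatchString("/var/lib/kubelet/pods/7a102c4a-964f-43a4-8477-8a5f1e7c99b4/volumes/kubernetes.io~configmap/app-env/app_env.json")) // false
}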

Path issues causing kubectl-flame to fail - container error - patched image can't locate fsroot

We have been attempting to try out your kubectl flame plugin.

We immediately ran into the problem described in https://github.com/yahoo/kubectl-flame/pull/77/commits because our k8s clusters are using containerd not docker.

After doing some digging, we found out how to apply the above-mentioned patch and build (hopefully) using your compile options, and were able to build the CLI. Here is our best guess at your build options:

$ FLAME_SEMVER=0.2.5
$ BUILD_DATE=2022-08-15
$ CGO_ENABLED=0 go build -ldflags="-X github.com/VerizonMedia/kubectl-flame/cli/cmd/version.semver=${FLAME_SEMVER} -X github.com/VerizonMedia/kubectl-flame/cli/cmd/version.date=${BUILD_DATE} -X github.com/VerizonMedia/kubectl-flame/cli/cmd/version.commit=OUR-BUILD" -o kubectl-flame ./cli

However, we then ran into the next problem: the image pull did not work because the image was not found.

After building a Docker image from our patched code, we spoke with our internal IT and found the naming tricks/tags to get this Docker image published to our internal registry where all our k8s deployments pull images from.
So now we have the agent built and published, and the image pull works.

But now we have errors with the file system paths.
{"type":"progress","data":{"time":"2022-08-16T20:40:11.087607299Z","stage":"started"}}
{"type":"error","data":{"reason":"open /var/lib/docker/io.containerd.runtime.v2.task/k8s.io/47838c0fdf292987d5e03dbf8291286f842e6688a993c14bbb02d354c97b2aaa/rootfs: no such file or directory "}}
[scwald@spaserver2609 71]> kubectl -n onebismp logs kubectl-flame-f7db470c-014c-4b1a-91ed-7a6550106d50--1-5qqgg
{"type":"progress","data":{"time":"2022-08-16T20:40:09.602407356Z","stage":"started"}}
{"type":"error","data":{"reason":"open /var/lib/docker/io.containerd.runtime.v2.task/k8s.io/47838c0fdf292987d5e03dbf8291286f842e6688a993c14bbb02d354c97b2aaa/rootfs: no such file or directory "}}

The paths above were created from the patch in pull/77 mentioned above. I have exec-ed into my containers to look for the available mount points from containerd, but within our deployments there doesn't appear to be any mount point specific to docker or containerd. The only generally available folders in my containers are / and /opt. I tried hardcoding the above directory names to /opt; while that makes the directory exist, if there is some assumed content, nothing is there. So I would like to know: what is the purpose behind these directories, and is there some content expected at either location?

Now the agent just exits with very little logging information to help with debugging:
{"type":"progress","data":{"time":"2022-08-17T10:36:08.827199035Z","stage":"started"}}
{"type":"error","data":{"reason":"exit status 1"}}

Do you have plans to add support for containerd in the near future and do you have any suggestions for us to try in the meantime?

Launching profiler failure sometimes: timed out waiting for the condition

I see that the total time of a complete profiling execution is unreasonably long [0]. Sometimes it works and outputs the SVG result successfully, but sometimes it fails during profiler launch with the following error (two separate issues?):

โฏ time kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l java CONTAINER-NAME
Verifying target pod ... โœ”
Launching profiler ... โŒ
timed out waiting for the condition
kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l   0.29s user 0.15s system 0% cpu 5:02.61 total

At the moment I can see a timeout warning in the events of the flame job description, as follows, whether the profiling succeeds or fails, and the job pod will not be created:

Name:           kubectl-flame-2fde7132-33d5-4344-9305-9dd427128f7f
Namespace:      default
Selector:       controller-uid=026b12f1-a439-4acb-9650-1756a37d4435
Labels:         kubectl-flame/id=2fde7132-33d5-4344-9305-9dd427128f7f
Annotations:    sidecar.istio.io/inject: false
Parallelism:    1
Completions:    1
Pods Statuses:  0 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       controller-uid=026b12f1-a439-4acb-9650-1756a37d4435
                job-name=kubectl-flame-2fde7132-33d5-4344-9305-9dd427128f7f
                kubectl-flame/id=2fde7132-33d5-4344-9305-9dd427128f7f
  Annotations:  sidecar.istio.io/inject: false
  Containers:
   kubectl-flame:
    Image:      verizondigital/kubectl-flame:v0.1.5-jvm
    Port:       <none>
    Host Port:  <none>
    Command:
      /app/agent
    Args:
      2fde7132-33d5-4344-9305-9dd427128f7f
      13efad1b-6569-47db-9e19-be1b22e6e288
      CONTAINER-NAME
      docker://c7e037b70817e3e8fa8a49ce90c1e2d427aa05fd867f4ff8989916366b1b5580
      1m0s
      java
    Environment:  <none>
    Mounts:
      /var/lib/docker from target-filesystem (rw)
  Volumes:
   target-filesystem:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker
    HostPathType:
Events:
  Type     Reason        Age   From            Message
  ----     ------        ----  ----            -------
  Warning  FailedCreate  12s   job-controller  Error creating: Timeout: request did not complete within requested timeout

kubectl-flame version:
Version: v0.1.5, Commit: 5cb73b3

kubernetes version:

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T21:51:49Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-eks-2ba888", GitCommit:"2ba888155c7f8093a1bc06e3336333fbdb27b3da", GitTreeState:"clean", BuildDate:"2020-07-17T18:48:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

[0] The timing of a successful profiling run (the total time is even longer than the failed run above, but it ends in success):

time kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l java CONTAINER-NAME
Verifying target pod ... ✔
Launching profiler ... ✔
Profiling ... ✔
FlameGraph saved to: /tmp/flamegraph.svg 🔥
kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l   0.29s user 0.17s system 0% cpu 5:45.96 total

Any idea?
Thanks.

ImagePullBackOff..

I have verizondigital/kubectl-flame:v0.2.4-bpf on my machine:
[screenshot]

but the k8s job fails:
[screenshot]

nodejs support

Hi, from the description it seems NodeJS is not supported.

Did I understand it correctly?

thanks in advance

Profiling failed

I'm trying to use flame with a Go 1.13 image running in a GKE cluster, but every time I try to run the profiling I get the error below:

kubectl flame k8slearning-6744c89ff5-vsxsq -n liuchuan -t 10s --lang go -f /Users/liuchuan/Desktop/flamegraph.svg --pgrep web
Verifying target pod ... ✔
Launching profiler ... ✔
Profiling ... Error: profiling failed: exit status 1

Looking into the failed pod logs I got the error below:

{"type":"progress","data":{"time":"2021-02-21T14:14:28.537218275Z","stage":"started"}} {"type":"error","data":{"reason":"profiling failed: exit status 1"}}

Can anyone help me?

nodeaffinity vs nodeName

Why use node affinity when you can schedule the pod directly to the node you want (i.e., the same node as the pod you're taking the flame graph of) by specifying nodeName?
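
For illustration, the suggestion amounts to something like this (core/v1 types; the function name is hypothetical):

package main

import corev1 "k8s.io/api/core/v1"

// pinToNode bypasses the scheduler entirely by setting spec.nodeName to
// the target pod's node instead of using node affinity.
func pinToNode(spec *corev1.PodSpec, targetPod *corev1.Pod) {
	spec.NodeName = targetPod.Spec.NodeName
	spec.Affinity = nil // node affinity is no longer needed
}

One side effect worth noting: because nodeName skips the scheduler, NoSchedule taints would no longer block the job either (compare the taints issue further down).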

[Feature] Be able to specify resource requests and limits for jobs

Hi,

first of all, thank you for this nice plugin 🎉

It would be really nice to be able to specify CPU/memory requests and limits on the pods that are launched. I imagine we could add a field in TargetDetails or create a new struct which would allow further customization of the created JobSpec. We could then also pass labels, annotations, etc.; see the sketch below.
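
A sketch of what that customization could produce, using the core/v1 resource types; the values and the TargetDetails wiring are illustrative, not an existing API:

package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// jobResources builds a ResourceRequirements block that the proposed field
// could drop into the launched container. Values are placeholders.
func jobResources() corev1.ResourceRequirements {
	return corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("100m"),
			corev1.ResourceMemory: resource.MustParse("128Mi"),
		},
		Limits: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("500m"),
			corev1.ResourceMemory: resource.MustParse("512Mi"),
		},
	}
}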

WDYT? If you want, I could draft a PR too 🙂

Cheers,
Alex

Fail to produce flamegraph --lang go

kubectl flame mypod -t 1m --lang go -f /tmp/flamegraph.svg --pgrep go-app

This returns: Profiling ... Error: flamegraph generation failed: exit status 2
I'm pretty sure that is a "/app/FlameGraph/flamegraph.pl" error.
It works fine with --lang java on a different container.

C/C++ binaries support

Since there is eBPF-based profiling for Go, maybe this would not be too hard to support?

Launching JVM alpine profiler pod failed

I'm trying to use flame with a JVM-based image running in an EKS cluster with Kubernetes version 1.15, but every time I try to run the profiling I get the error below:

kubectl flame POD -f /tmp/flamegraph.svg -n NAMESPACE -l java --alpine CONTAINER
Verifying target pod ... ✔
Launching profiler ... ❌
pod failed

Looking into the failed pod logs I got the error below:

{"type":"progress","data":{"time":"2020-12-11T01:43:28.02366121Z","stage":"started"}}
{"type":"error","data":{"reason":"exit status 255"}}

Also, I've attached the describe output of the failed pod:

Name:           kubectl-flame-6a3fb104-aaf4-4f86-92a0-2c53c7f6601c-tqpft
Namespace:      NAMESPACE
Priority:       0
Node:           ip-172-20-124-211.ec2.internal/172.20.124.211
Start Time:     Thu, 10 Dec 2020 22:43:26 -0300
Labels:         controller-uid=9fcbed51-3f1c-42fb-af18-118e53b1a3cf
                job-name=kubectl-flame-6a3fb104-aaf4-4f86-92a0-2c53c7f6601c
                kubectl-flame/id=6a3fb104-aaf4-4f86-92a0-2c53c7f6601c
Annotations:    kubernetes.io/psp: eks.privileged
                sidecar.istio.io/inject: false
Status:         Failed
IP:             172.20.126.116
IPs:            <none>
Controlled By:  Job/kubectl-flame-6a3fb104-aaf4-4f86-92a0-2c53c7f6601c
Containers:
  kubectl-flame:
    Container ID:  docker://ee90601c752f38721ecd9c4d3ce8fd0bf5aaecadfbfa385c524d9514b35762e2
    Image:         verizondigital/kubectl-flame:v0.1.5-jvm
    Image ID:      docker-pullable://verizondigital/kubectl-flame@sha256:463c8ef5075c42644818c7646bc6e9371905a4253248a8f3be1c7acfa146c343
    Port:          <none>
    Host Port:     <none>
    Command:
      /app/agent
    Args:
      6a3fb104-aaf4-4f86-92a0-2c53c7f6601c
      7f317a90-b1d9-4772-ae1e-ec429df6cdf1
      CONTAINER
      docker://6ab1c5da9dca6267bf24f82691d3464825dff639ec6b7d83d25af77716ff6e39
      1m0s
      java
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 10 Dec 2020 22:43:28 -0300
      Finished:     Thu, 10 Dec 2020 22:43:28 -0300
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/docker from target-filesystem (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-p67s5 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  target-filesystem:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker
    HostPathType:
  default-token-p67s5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-p67s5
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age    From     Message
  ----    ------   ----   ----     -------
  Normal  Pulling  4m52s  kubelet  Pulling image "verizondigital/kubectl-flame:v0.1.5-jvm"
  Normal  Pulled   4m52s  kubelet  Successfully pulled image "verizondigital/kubectl-flame:v0.1.5-jvm"
  Normal  Created  4m52s  kubelet  Created container kubectl-flame
  Normal  Started  4m51s  kubelet  Started container kubectl-flame

kubectl-flame does not take into account the node taints

I tried to profile a pod that runs on a certain type of node that has a taint, e.g.:

taints:
  - effect: NoSchedule
    key: some-key
    value: some-value 

The job/pod that is launched by kubectl-flame, however, does not have tolerations for such taints, which makes it impossible to actually run the pod. A sketch of a possible fix follows.
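
A minimal sketch of one possible fix: copy the target pod's tolerations onto the flame job's pod spec so the job can land on the same tainted node (core/v1 types; the function name is hypothetical, and this is a proposal, not the plugin's current behavior):

package main

import corev1 "k8s.io/api/core/v1"

// copyTolerations gives the flame job the same tolerations as the pod it
// profiles, so a NoSchedule taint on the target's node no longer blocks it.
func copyTolerations(jobSpec *corev1.PodSpec, targetPod *corev1.Pod) {
	jobSpec.Tolerations = append(jobSpec.Tolerations, targetPod.Spec.Tolerations...)
}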

make kubectl-flame exit automatically when specific errors occur

I got this error when I used kubectl-flame to profile a Golang application, which is expected, as I did not specify the process name:
[screenshot]

What I did not expect is that the process could not terminate automatically, and I had to stop it manually.
As the error cause is clear and the best way to solve it is to let the user run kubectl-flame again with the correct command, is it possible to make kubectl-flame exit automatically when specific errors occur?

Best wishes.

Failed to inject profiler

Failed to inject profiler into 147932
	linux-vdso.so.1 (0x00007fff57388000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f33097b2000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3309595000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f330938d000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f330900b000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3308d07000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f3308af0000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3308751000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f3309dfb000)
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)

The target container runs openjdk:8-jdk.

[Feature] Allow to choose profiling event type

Hi,

I would propose that we add an option to specify the profiling mode. The GitHub page lists some of these, and especially the allocation mode would be interesting for us:

-e event - the profiling event: cpu, alloc, lock, cache-misses etc. Use list to see the complete list of available events.

As far as I can see, the profiling mode is currently hardcoded: https://github.com/VerizonMedia/kubectl-flame/blob/5cb73b39e20a9998308c6b1960add543da23558f/agent/profiler/jvm.go#L47
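
A sketch of how the agent's invocation could thread the event through; -d, -e, and -f are real async-profiler flags (the -e flag is quoted above), but the argument layout here is illustrative, not the agent's actual command line:

package main

import "fmt"

// profilerArgs builds an async-profiler invocation with a configurable
// event type (cpu, alloc, lock, ...) instead of a hardcoded one.
func profilerArgs(event string, pid int, duration string) []string {
	return []string{
		"/app/async-profiler/profiler.sh",
		"-d", duration,
		"-e", event, // would come from a new --event flag
		"-f", "/tmp/flamegraph.svg",
		fmt.Sprint(pid),
	}
}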

If you agree that this could be useful, I'd come up with a PR.

Cheers, u6f6o

couldn't record ruby subprocess

Hi, I would like to record a flame graph of a Ruby application running under unicorn_rails. However, the flame graph seems to only record the master process.

Currently, the running processes look like this:

$ ps aux

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   4288   768 ?        Ss   Apr22   0:00 sh -c ./start.sh
root         9  0.0  1.2 399664 192104 ?       Sl   Apr22   0:14 unicorn_rails master -c config/unicorn.rb
root      4741  0.1  1.3 551768 202820 ?       Sl   06:36   0:01 unicorn_rails worker[6] -c config/unicorn.rb
root      4939  0.1  1.2 752732 199644 ?       Sl   06:40   0:01 unicorn_rails worker[5] -c config/unicorn.rb
root      4978  0.1  1.3 552028 201684 ?       Sl   06:40   0:01 unicorn_rails worker[2] -c config/unicorn.rb
root      5005  0.1  1.2 551712 199332 ?       Sl   06:40   0:01 unicorn_rails worker[0] -c config/unicorn.rb
root      5152  0.2  1.3 822852 201480 ?       Sl   06:43   0:01 unicorn_rails worker[4] -c config/unicorn.rb
root      5171  0.1  1.2 551804 199864 ?       Sl   06:43   0:01 unicorn_rails worker[7] -c config/unicorn.rb
root      5193  0.1  1.2 552108 195444 ?       Sl   06:43   0:01 unicorn_rails worker[1] -c config/unicorn.rb
root      5397  0.0  1.1 399664 182352 ?       Sl   06:55   0:00 unicorn_rails worker[3] -c config/unicorn.rb

I ran the following command to record the flame graph, but it only records the master process.

$ kubectl flame my-app -t 1m --lang ruby -f flamegraph.svg

Also, I tried rbspy with the --subprocesses flag, and the flame graph records all the worker processes correctly.

$ ./rbspy record --duration 60 --pid 9 --subprocesses

Therefore, I'm wondering if we need to set the --subprocesses flag in agent/profiler/ruby.go line 31 to make it record all the subprocesses. Thanks!
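
A sketch of that suggested change as the agent might build the command; record, --pid, --duration, and --subprocesses are rbspy flags confirmed above, while the binary path and argument order are illustrative, not the agent's exact code:

package main

import (
	"os/exec"
	"strconv"
)

// rbspyCommand samples the unicorn master plus all forked workers by
// passing --subprocesses, per the suggestion above.
func rbspyCommand(pid, durationSeconds int) *exec.Cmd {
	return exec.Command("/app/rbspy", "record",
		"--pid", strconv.Itoa(pid),
		"--duration", strconv.Itoa(durationSeconds),
		"--subprocesses")
}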

uWSGI profile only shows threads, no stack traces

I am using a uWSGI Flask app inside a Docker container in Azure Kubernetes. The flame profiler works, but shows no stack traces from my code. I only see process and thread IDs.

Is this a known issue?
[flame graph screenshot]

Bug fix from issue added in #42

Description

Even after the changes applied in #42, the reported error still happens; this was due to a missing argument (--event) and also missing validation of --pgrep.

Add cleanup and add signal handling

Add a cleanup flag to delete jobs created by kubectl-flame on failures, and/or handle Ctrl+C and other OS signals to do the cleanup.
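
A minimal sketch of what the signal-handling part could look like, assuming a recent client-go (where Delete takes a context); the function name is hypothetical:

package main

import (
	"context"
	"os/signal"
	"syscall"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanupOnInterrupt waits for Ctrl+C or SIGTERM, then deletes the flame
// job with foreground propagation so its pod is removed as well.
func cleanupOnInterrupt(clientset *kubernetes.Clientset, namespace, jobName string) {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	<-ctx.Done() // block until a signal arrives

	propagation := metav1.DeletePropagationForeground
	_ = clientset.BatchV1().Jobs(namespace).Delete(
		context.Background(), jobName,
		metav1.DeleteOptions{PropagationPolicy: &propagation},
	)
}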

Bug: ShareProcessNamespace and HostPID cannot both be enabled

Command used
kubectl flame pod -t 1m --lang java -f flamegraph.svg

Error
Error creating: Pod "kubectl-flame-c3eb2760-35f7-4ba3-9ee5-df31432682e2-vznhn" is invalid: spec.securityContext.shareProcessNamespace: Invalid value: true: ShareProcessNamespace and HostPID cannot both be enabled

From the Kubernetes docs, PodShareProcessNamespace has been set to true by default since it went GA in Kubernetes 1.17. Therefore kubectl-flame will not work with the hostPID: true flag on recent Kubernetes versions:
https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
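
The conflict lives in the pod spec itself: a pod may use the host PID namespace or a shared in-pod PID namespace, never both. A minimal Go sketch with the core/v1 types, the fix being to leave ShareProcessNamespace unset whenever HostPID is true:

package main

import corev1 "k8s.io/api/core/v1"

// hostPIDSpec shows the constraint hit above: with HostPID set, the API
// server rejects ShareProcessNamespace=true, so the flame job's pod spec
// must leave that field nil when using the host PID namespace.
func hostPIDSpec() corev1.PodSpec {
	return corev1.PodSpec{
		HostPID: true,
		// ShareProcessNamespace must remain nil here; setting it to true
		// alongside HostPID fails API validation.
	}
}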

Installation issue with Apple M1 "flame": plugin "flame" does not offer installation for this platform

Getting the below error when trying to install on an Apple M1.

kubectl krew install  flame

Updated the local copy of plugin index.
Installing plugin: flame
W0811 10:26:30.223383   21587 install.go:164] failed to install plugin "flame": plugin "flame" does not offer installation for this platform
F0811 10:26:30.223437   21587 root.go:79] failed to install some plugins: [flame]: plugin "flame" does not offer installation for this platform

Add option to specify serviceAccountName for flame job pod template

On clusters where running in privileged mode by default is not possible, a good workaround would be to add a new flag to specify a service account that is allowed to run the pod in privileged mode.

While running:

 kubectl flame test-pod -t 1m --lang go -f /tmp/flamegraph.svg
Verifying target pod ... ✔
Launching profiler ... ❌
2021/07/13 14:29:56 timed out waiting for the condition

the job is failing to start the pod because of:

Events:
  Type     Reason        Age                From            Message
  ----     ------        ----               ----            -------
  Warning  FailedCreate  17s (x3 over 47s)  job-controller  Error creating: pods "kubectl-flame-fbbed42d-a529-4cd6-a21f-6b435157bd32-" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]
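
A minimal sketch of the proposed flag's effect, using the core/v1 types; the function name and the flag itself are hypothetical:

package main

import corev1 "k8s.io/api/core/v1"

// withServiceAccount sets a user-supplied service account on the flame
// job's pod template, e.g. one bound to a PodSecurityPolicy that permits
// privileged pods, hostPath volumes, and hostPID.
func withServiceAccount(spec *corev1.PodSpec, serviceAccountName string) {
	if serviceAccountName != "" {
		spec.ServiceAccountName = serviceAccountName
	}
}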

Use to generate perf.data

Can I use this to generate perf.data in the container, so that I can record resource utilization during a given time period?

Launching the profiler failed.

Using the latest version.

kubectl flame authz-7b995986b4-xddkf -n test -l java
Verifying target pod ... ✔
Launching profiler ... ❌
pod failed

When I look into the pod logs I see this:

kubectl logs pods/kubectl-flame-57b65fbd-6667-48f5-9835-bb7f5fefc7c0-7tj95 -n test
{"type":"progress","data":{"time":"2020-10-08T20:38:50.79538225Z","stage":"started"}}
{"type":"error","data":{"reason":"open /var/lib/docker/image/overlay2/layerdb/mounts/f39d32b7755fd31190f3919a3e94870dd7147b5349dd3c268a85411af101184f/mount-id: no such file or directory"}}

Any pointers for debugging further what is happening?

Java container: Could not find any process

I'm trying to profile a Java container (hivemq/hivemq4:k8s-4.4.1 to be specific, deployed using this chart); however, I am seeing the Error: could not find any process error from the title.

My Kubernetes version is 1.18.8, deployed with a pretty standard kubespray inventory.

The process is just regularly running as PID 1 (and can also be found in /proc/1/), and I have also used async-profiler manually from within the container before, so there should be no issues. Maybe the process name search isn't working here?

Open to a PR to add Ruby support?

It looks like it should be fairly straightforward to add Ruby support to this project based on the excellent rbspy. This would be quite useful for my team; would you be interested in a PR?

Hang on the error during profiling: flamegraph.svg: no such file or directory

> kubectl flame POD-ID -t 1m -f /tmp/flamegraph.svg -l java CONTAINER-NAME
Verifying target pod ... ✔
Launching profiler ... ✔
Profiling ... Error: open /tmp/async-profiler/flamegraph.svg: no such file or directory

The job pod is in an error status and logs:

{"type":"progress","data":{"time":"2020-10-02T03:03:30.473657566Z","stage":"started"}}
{"type":"error","data":{"reason":"exit status 1"}}

kubectl-flame version:

v0.1.4, Commit: 52a33a1bfeeeead31d9b77dcf29d4f2fbe92a300, Build Date: 2020-10-01T05:55:08Z

k8s version:

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T21:51:49Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-eks-2ba888", GitCommit:"2ba888155c7f8093a1bc06e3336333fbdb27b3da", GitTreeState:"clean", BuildDate:"2020-07-17T18:48:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Any idea?
Thanks!

Error: could not find root process

$ ./kubectl-flame -n PythonPod -l python -f /tmp/python.svg
Verifying target pod ... ✔
Launching profiler ... ✔
Profiling ... Error: could not find root process

ignore sidecar container

Make kubectl-flame work in environments where a sidecar container is injected (for example, a service mesh).

Provide ability to override image

I am running into this issue in my cluster since we disable images tagged with latest. Generally, it's good to provide a way to override the image.

admission webhook "admission.XXX.com" denied the request: Forbidden: Usage of latest tag for images is prohibited. Resource kubectl-flame-f2a23476-f254-4558-afee-6401ab8fe5f1 has a container kubectl-flame that contains image tag as latest

pod failed: open /var/lib/docker/image/overlay2/layerdb/mounts/containerd://.../mount-id: no such file or directory

I am trying to run kubectl flame on a new k8s cluster and it fails to launch

% kubectl flame `kubectl get pods -n dev-load| grep sharding | awk '{print $1}'` -n=dev-load -t 10s -f ./flamegraph.svg --lang java
Verifying target pod ... ✔
Launching profiler ... ❌
2022/01/12 12:32:20 pod failed

% kubectl version

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T21:16:14Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:32:32Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}

The flame pod contains an error message like this:

{"type":"progress","data":{"time":"2022-01-12T10:32:43.529897521Z","stage":"started"}}
{"type":"error","data":{"reason":"open /var/lib/docker/image/overlay2/layerdb/mounts/containerd://166013c88125d0123a7d957165e87ca0d3cb28ee6c42327d8b905e1c8796e58c/mount-id: no such file or directory"}}

everything works just fine with older versions of k8s

% kubectl flame `kubectl get pods -n jenkins| grep hub-list | awk '{print $1}'` -n=jenkins -t 10s -f ./flamegraph.svg --lang java
Verifying target pod ... ✔
Launching profiler ... ✔
Profiling ... ✔
FlameGraph saved to: ./flamegraph.svg 🔥
√
% kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T21:16:14Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.4", GitCommit:"3cce4a82b44f032d0cd1a1790e6d2f5a55d20aae", GitTreeState:"clean", BuildDate:"2021-08-11T18:10:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
% kubectl flame `kubectl get pods -n solid| grep sharding | awk '{print $1}'` -n=solid -t 10s -f ./flamegraph.svg --lang java
Verifying target pod ... ✔
Launching profiler ... ✔
Profiling ... ✔
FlameGraph saved to: ./flamegraph.svg 🔥
√
% kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T21:16:14Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:49Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

QUESTION:
Is there a workaround, or some special security or filesystem settings in the k8s cluster I need to check?

Improve ability to triage errors

I tried this plugin and I see this error:

kubectl logs -n XXX kubectl-flame-80bde19a-17e5-4a15-8aa0-edf8dce6c58c-sb2zb
{"type":"progress","data":{"time":"2020-08-24T13:17:47.12752075Z","stage":"started"}}
{"type":"error","data":{"reason":"exit status 1"}}

kubectl flame -n XXXX XXX-894cd9c96-452nl
Verifying target pod ... ✔
Launching profiler ... ✔
Profiling ... Error: exit status 1
