acln0 / perf
tentative golang.org/x/sys/unix/linux/perf package
Hi all,
I've reached out to @acln0 privately to ask about the future of this package. Unfortunately, he won't be returning to OSS anytime soon, so we'll need to find an alternative working solution in the meantime. He agrees we should try to focus the community's efforts around a single repository, since there are currently multiple forks of this repo with different patch sets.
There's also a 'competing' implementation at https://github.com/hodgesds/perf-utils (@hodgesds), though I'm not sure about the feature overlap.
My current goal is to use this library to complement https://github.com/cilium/ebpf for easy tracepoint and kprobe support, and to move out most of the cilium/ebpf/perf package. (cc @lmb)
Given my (very) limited familiarity with the kernel's perf API, I don't feel positioned to take up maintainership long-term, but I can already start gathering patches from forks for inclusion. (also, my availability over the summer will be limited)
I'd also like to gather the community's thoughts around this lib in particular.
Also, where should the package live in the meantime?
github.com/elastic/go-perf? (@adriansr)
github.com/cilium/perf is available
@pwaller @andrewkroh @florianl @tklauser @mdlayher @pengfei-su @markpash @fbegyn
cc, feel free to ignore/unsub if not interested, sorry for spamming!
Thanks, take care!
As noted in #18, the code is full of these.
In #25, I accidentally commented out the evwait(sawcommevfd) call in the testComm test. This was unintentional, and doesn't otherwise relate to the fixes introduced in that PR.
This issue is here as a reminder to revisit this.
Lines 528 to 544 in 4d8e4e5
The current state for me is that if I apply the patch of #26 onto master at 4d8e4e5, then sudo ./perf.test -test.count=1000 -test.run=Record/Comm passes with no failures for me.
However, if I uncomment the evwait above, then the tests fail. In that case, ReadRecord doesn't return, even if the context deadline is set many seconds into the future. What I see is that the child ends up waiting for the signal sawcommevfd after changing its COMM, and never exits.
So the mystery is why ReadRecord doesn't return when this wait is present. It feels as though the kernel isn't respecting our wakeup events = 1. When the wait is commented out (erroneously by me), the process exits, and we receive the event.
While looking into #23, I spied the skipped testComm test, and suspected it might be related to some of the issues I've been seeing.
I noticed that there were some ignored errors:
Line 1234 in 6861f4b
Line 1240 in 6861f4b
If I remove the skip and run go test -count=100 -run TestReadRecord/Comm, then every ~100 tests or so I see EBADF returning from those, and sometimes I see hangs while one side or the other is waiting for an eventfd which will never get messaged.
The skip:
Line 548 in 6861f4b
https://github.com/acln0/perf/blob/master/perf.go#L24-L25
I assume the above is correct but wanted to double check. The perf docs say:
pid == 0 and cpu == -1
This measures the calling process/thread on any CPU.
I note that it uses the word process/thread for pid==0 and I'm not really sure what this means.
The other part to my question is, how does this interact with runtime.LockOSThread()? Can we ensure that Measure(func() {...}) doesn't measure other things going on within the process?
I note that golang/go#20458 was fixed and makes LockOSThread and UnlockOSThread safe to call in pairs, even if you don't know the outer state with which you are being called.
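As a rough illustration of the question above, here is a minimal sketch of pinning the measured function to one OS thread. The helper name measureLocked and its enable/disable callbacks are hypothetical stand-ins for whatever starts and stops a per-thread counter; nothing here calls the perf package. While the goroutine is locked, no other goroutine is scheduled onto that thread, though goroutines started by f run elsewhere and runtime work on the same thread would presumably still be counted.

```go
package main

import (
	"fmt"
	"runtime"
)

// measureLocked is a hypothetical helper (not part of the perf package).
// It locks the calling goroutine to its OS thread for the duration of f,
// so a counter attached to the calling thread only observes this thread.
// enable and disable stand in for starting and stopping such a counter.
func measureLocked(enable, disable func(), f func()) {
	runtime.LockOSThread()
	// Lock/Unlock calls nest, so this pair is safe even if the caller
	// has already locked the thread.
	defer runtime.UnlockOSThread()

	enable()
	defer disable() // runs before UnlockOSThread (deferred calls are LIFO)
	f()
}

func main() {
	measureLocked(
		func() { fmt.Println("counter enabled") },
		func() { fmt.Println("counter disabled") },
		func() { fmt.Println("doing the measured work") },
	)
}
```

Whether this is sufficient depends on how the event was opened (for example, with or without inheritance), so treat it as a starting point for discussion rather than a guarantee.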
In #4 (comment) I commented that I was getting perf: failed to open event leader: perf_event_open: permission denied with perf.perf_event_paranoid=2.
Stepping through perf_event_open with a kernel debugger I isolated the cause. The ExcludeKernel option isn't being passed through.
Here's the code I wrote where I expected the option to be set:
    var g perf.Group
    g.CountFormat = perf.CountFormat{
        Running: true,
        ID:      true,
    }
    g.Options.ExcludeKernel = true
    g.Options.ExcludeHypervisor = true
    g.Add(perf.Instructions, perf.CPUCycles)
Indeed, if I look at the leader attributes being used:
Lines 73 to 75 in e7587bd
... I find that g.Options aren't being respected.
Line 1142 in 6861f4b
In sampling mode, I found that the number of samples drops dramatically when preciseIP is set to 1, 2, or 3 instead of 0. Did I miss something important?
From @pwaller:
Calling g.Add(perf.Instructions, perf.CPUCycles, perf.ContextSwitches, tracepoint) on a Group with ExcludeKernel = true causes the tracepoint to not register any values.
Perhaps the Configurator returned by Tracepoint could be more clever, and set ExcludeKernel = false.
Or perhaps a different API is in order, such as IncludeKernel(Configurator) Configurator, or more generally, WithSomeOption(Configurator) Configurator.
Thinking.
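The wrapper idea floated above could look roughly like the following. This is only a sketch: Attr, Configurator, and ConfiguratorFunc are simplified local stand-ins defined here so the example compiles on its own, and they are assumptions about the shape of the API rather than the package's real types. IncludeKernel is the hypothetical name from the suggestion.

```go
package main

import "fmt"

// Attr is a simplified stand-in for the event attributes (assumption).
type Attr struct {
	ExcludeKernel bool
}

// Configurator is a simplified stand-in for the package's interface (assumption).
type Configurator interface {
	Configure(attr *Attr) error
}

// ConfiguratorFunc adapts a plain function to the Configurator interface.
type ConfiguratorFunc func(attr *Attr) error

func (f ConfiguratorFunc) Configure(attr *Attr) error { return f(attr) }

// IncludeKernel wraps c so that, after c has configured the attributes,
// ExcludeKernel is forced to false for this one event.
func IncludeKernel(c Configurator) Configurator {
	return ConfiguratorFunc(func(attr *Attr) error {
		if err := c.Configure(attr); err != nil {
			return err
		}
		attr.ExcludeKernel = false
		return nil
	})
}

func main() {
	// A do-nothing configurator standing in for a tracepoint.
	tracepoint := ConfiguratorFunc(func(attr *Attr) error { return nil })

	// Pretend the group options already set ExcludeKernel on this event.
	attr := Attr{ExcludeKernel: true}
	if err := IncludeKernel(tracepoint).Configure(&attr); err != nil {
		fmt.Println("configure failed:", err)
		return
	}
	fmt.Println("ExcludeKernel after wrapping:", attr.ExcludeKernel)
}
```

For this to help, the wrapper would have to run after the group-wide options are applied to each event, which is a design decision the sketch does not settle.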
As far as I can tell, there is no way to pass flags into perf.Open.
Here it's passed as 0:
Line 121 in 6861f4b
I want to pass PERF_FLAG_FD_OUTPUT so that I can coalesce tracepoint events into a single ring buffer, where it is useful to preserve their ordering.
This is only a minor suggestion, but I found this confusing:
Lines 219 to 225 in 239c48f
Variables/consts beginning with the name err are usually errors.
It refers to testErrDisabledProcessExist which doesn't exist.
It's unclear from the comment which side of the branch is "in" something. Is it in the returning side, or the non-returning side? This might be a rare case where "early return" is a minor hindrance and perhaps it would be better to phrase it positively, and move the body of code out, to make it clear when the special condition matches:
    func init() {
        if specialCondition {
            doSpecialCondition()
        }
    }
--- FAIL: TestCount (0.08s)
    --- FAIL: TestCount/Software (0.02s)
        --- FAIL: TestCount/Software/PageFaults (0.02s)
            count_test.go:99: failed to drop VM cache: open /proc/sys/vm/drop_caches: permission denied
    --- FAIL: TestCount/IoctlAndCountIDsMatch (0.00s)
        count_test.go:366: failed to drop VM cache: open /proc/sys/vm/drop_caches: permission denied
--- FAIL: TestReadRecord (0.01s)
    --- FAIL: TestReadRecord/CPUWideSwitch (0.00s)
        record_test.go:846: perf_event_open: permission denied
Please note CPUWideSwitch in particular - I have a suggestion/another discussion for the TestCounts.
One use case I have is to measure another process I want to start from Go. A slight challenge I immediately came up against was how to start the measurement before the process starts doing useful work.
Assumptions:
cmd := exec.Command()
cmd.Start() // required to get the Pid of the subprocess
... := perf.Open(..., cmd.Process.Pid, ...) // by the time we get here, the process may have already done some of the work we want to measure.
So my idea is to have some sort of user interface which takes the process to run: either the *exec.Cmd, or the arguments that are passed to exec.Command (though I guess that would be less flexible).
First there is a challenge to solve: How to create a process in Go such that it is in the stopped state, so that you can start measurement before it does work you want to measure? Searching around I'm surprised there is not an immediate solution to this. Ptrace on SysProcAttr might do it - I haven't checked yet.
The next best thing I can think of is to rig up a shell process along the lines of sh -c 'kill -STOP $$; exec "$@"' -- ...
Whatever the solution is, it could be wrapped up in perf.Command or alike.
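A minimal sketch of the shell-wrapper idea, assuming a Linux system with sh on the PATH. The helpers startStopped and resume are hypothetical names invented for this example, and the place where counters would be attached is only marked with a comment. The parent waits with WUNTRACED until the wrapper shell has stopped itself, so the target program has not run yet when the function returns.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// startStopped is a hypothetical helper. It starts name via a shell wrapper
// that stops itself with SIGSTOP before exec'ing the real program, and
// returns once the child is observed in the stopped state.
func startStopped(name string, args ...string) (*exec.Cmd, error) {
	shArgs := append([]string{"-c", `kill -STOP $$; exec "$@"`, "--", name}, args...)
	cmd := exec.Command("sh", shArgs...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		return nil, err
	}

	// WUNTRACED reports a stopped child without reaping it.
	var ws syscall.WaitStatus
	for {
		_, err := syscall.Wait4(cmd.Process.Pid, &ws, syscall.WUNTRACED, nil)
		if err == syscall.EINTR {
			continue // interrupted by a signal, retry
		}
		if err != nil {
			return nil, err
		}
		break
	}
	if !ws.Stopped() {
		return nil, errors.New("child exited before stopping")
	}
	return cmd, nil
}

// resume lets the stopped child continue and waits for it to finish.
func resume(cmd *exec.Cmd) error {
	if err := cmd.Process.Signal(syscall.SIGCONT); err != nil {
		return err
	}
	return cmd.Wait()
}

func main() {
	cmd, err := startStopped("echo", "hello from the child")
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	// The child is stopped here; this is where events would be opened
	// against cmd.Process.Pid before any useful work has happened.
	if err := resume(cmd); err != nil {
		fmt.Println("child failed:", err)
	}
}
```

The PID is preserved across exec, so anything attached to the wrapper shell's PID keeps following the real program. The shell's own exec overhead is still measured, which may or may not matter for a given use case.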
I'd like to record hw_interrupts.received:
perf stat -v -e hw_interrupts.received bash -c 'for i in {1..10000}; do echo hi; done > /dev/null'
Using CPUID GenuineIntel-6-9E
hw_interrupts.received -> cpu/umask=0x1,period=100003,event=0xcb/
hw_interrupts.received: 3 13527404 13527404
Performance counter stats for 'bash -c for i in {1..10000}; do echo hi; done > /dev/null':
3 hw_interrupts.received
I haven't figured out how to do it yet, but I'll update the thread if I do.
I'd be interested in seeing the perf API eventually cover this and the other events listed by perf list in an intuitive way.
Tested on master just now @ 46e3c14.
$ go test -c
$ sudo ./perf.test -test.count=10
--- FAIL: TestCount (0.15s)
    --- FAIL: TestCount/Software (0.00s)
        --- FAIL: TestCount/Software/PageFaults (0.00s)
            count_test.go:104: PERF_EVENT_IOC_DISABLE: bad file descriptor
--- FAIL: TestCount (0.12s)
    --- FAIL: TestCount/Software (0.00s)
        --- FAIL: TestCount/Software/PageFaults (0.00s)
            count_test.go:104: PERF_EVENT_IOC_DISABLE: bad file descriptor
--- FAIL: TestReadRecord (0.24s)
    --- FAIL: TestReadRecord/Comm (0.05s)
        record_test.go:635: got context deadline exceeded, want valid record
--- FAIL: TestCount (0.11s)
    --- FAIL: TestCount/Software (0.00s)
        --- FAIL: TestCount/Software/PageFaults (0.00s)
            count_test.go:104: PERF_EVENT_IOC_DISABLE: bad file descriptor
    --- FAIL: TestCount/IoctlAndCountIDsMatch (0.02s)
        count_test.go:368: didn't see a page fault
--- FAIL: TestCount (0.09s)
    --- FAIL: TestCount/Software (0.00s)
        --- FAIL: TestCount/Software/PageFaults (0.00s)
            count_test.go:104: PERF_EVENT_IOC_DISABLE: bad file descriptor
    --- FAIL: TestCount/IoctlAndCountIDsMatch (0.02s)
        count_test.go:368: didn't see a page fault
--- FAIL: TestReadRecord (0.24s)
    --- FAIL: TestReadRecord/Comm (0.05s)
        record_test.go:635: got context deadline exceeded, want valid record
--- FAIL: TestCount (0.10s)
    --- FAIL: TestCount/Software (0.01s)
        --- FAIL: TestCount/Software/PageFaults (0.01s)
            count_test.go:104: PERF_EVENT_IOC_DISABLE: bad file descriptor
    --- FAIL: TestCount/IoctlAndCountIDsMatch (0.02s)
        count_test.go:368: didn't see a page fault
--- FAIL: TestReadRecord (0.25s)
    --- FAIL: TestReadRecord/Comm (0.05s)
        record_test.go:635: got context deadline exceeded, want valid record
--- FAIL: TestCount (0.09s)
    --- FAIL: TestCount/Software (0.00s)
        --- FAIL: TestCount/Software/PageFaults (0.00s)
            count_test.go:104: PERF_EVENT_IOC_DISABLE: bad file descriptor
    --- FAIL: TestCount/IoctlAndCountIDsMatch (0.01s)
        count_test.go:368: didn't see a page fault
--- FAIL: TestReadRecord (0.27s)
    --- FAIL: TestReadRecord/Comm (0.05s)
        record_test.go:635: got context deadline exceeded, want valid record
--- FAIL: TestCount (0.09s)
    --- FAIL: TestCount/Software (0.01s)
        --- FAIL: TestCount/Software/PageFaults (0.01s)
            count_test.go:104: PERF_EVENT_IOC_DISABLE: bad file descriptor
    --- FAIL: TestCount/IoctlAndCountIDsMatch (0.02s)
        count_test.go:368: didn't see a page fault
--- FAIL: TestReadRecord (0.30s)
    --- FAIL: TestReadRecord/Comm (0.05s)
        record_test.go:635: got context deadline exceeded, want valid record
--- FAIL: TestCount (0.09s)
    --- FAIL: TestCount/Software (0.01s)
        --- FAIL: TestCount/Software/PageFaults (0.01s)
            count_test.go:104: PERF_EVENT_IOC_DISABLE: bad file descriptor
    --- FAIL: TestCount/IoctlAndCountIDsMatch (0.02s)
        count_test.go:368: didn't see a page fault
--- FAIL: TestReadRecord (0.26s)
    --- FAIL: TestReadRecord/Comm (0.06s)
        record_test.go:635: got context deadline exceeded, want valid record
--- FAIL: TestCount (0.10s)
    --- FAIL: TestCount/Software (0.00s)
        --- FAIL: TestCount/Software/PageFaults (0.00s)
            count_test.go:104: PERF_EVENT_IOC_DISABLE: bad file descriptor
    --- FAIL: TestCount/IoctlAndCountIDsMatch (0.02s)
        count_test.go:368: didn't see a page fault
--- FAIL: TestReadRecord (0.28s)
    --- FAIL: TestReadRecord/Comm (0.05s)
        record_test.go:635: got context deadline exceeded, want valid record
FAIL
I had a group filled with hardware counters, then I added a Tracepoint for sys_enter_write to the group. I always got zero for my tracepoint counter, even though I was not expecting this.
Subsequently, I made a group only containing tracepoint counters, but then I get:
perf: empty event group
Here's a reproducer:
main.go:
    package main

    import (
        "fmt"
        "log"
        "os"

        "acln.ro/perf"
    )

    func main() {
        var g perf.Group
        g.CountFormat = perf.CountFormat{}
        g.Options.ExcludeKernel = true
        g.Options.ExcludeHypervisor = true
        tp := perf.Tracepoint("syscalls", "sys_enter_write")
        g.Add(tp)
        counts, err := g.Open(perf.CallingThread, perf.AnyCPU)
        if err != nil {
            log.Fatal(err)
        }
        c, err := counts.MeasureGroup(func() {
            os.Stdout.WriteString("Hi\n")
        })
        if err != nil {
            log.Fatal(err)
        }
        for _, v := range c.Values {
            fmt.Println(v)
        }
    }
I've been messing around with perf, using inherit to trace a process tree.
What I've found is that when the process tree exits, we get HUP on the poll, and whatever events are in the kernel buffer at that time are lost.
I'm not sure I fully understand what's going on, but I've found anecdotally that here:
Lines 130 to 133 in 6861f4b
If I insert:
    if ev.readRawRecordNonblock(raw) {
        return nil
    }
Before return ErrDisabled, then it appears the events are not lost.
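The drain-before-giving-up pattern described above can be modelled with a toy in-memory buffer. Everything in this sketch (the ring type, its fields, errDisabled, errWouldBlock) is a hypothetical stand-in written for illustration; it is not the package's ring buffer code and says nothing about why the kernel behaves this way.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errDisabled   = errors.New("event disabled")
	errWouldBlock = errors.New("no record available")
)

// ring is a toy model of an event buffer: pending holds unread records and
// hup is set once the traced processes have exited.
type ring struct {
	pending []string
	hup     bool
}

// readNonblock pops one pending record, if any.
func (r *ring) readNonblock() (string, bool) {
	if len(r.pending) == 0 {
		return "", false
	}
	rec := r.pending[0]
	r.pending = r.pending[1:]
	return rec, true
}

// readRecord returns the next record. After a hangup it keeps draining
// buffered records and only reports errDisabled once the buffer is empty.
func (r *ring) readRecord() (string, error) {
	if r.hup {
		if rec, ok := r.readNonblock(); ok {
			return rec, nil
		}
		return "", errDisabled
	}
	rec, ok := r.readNonblock()
	if !ok {
		return "", errWouldBlock
	}
	return rec, nil
}

func main() {
	r := &ring{pending: []string{"exec", "exit"}, hup: true}
	for {
		rec, err := r.readRecord()
		if err != nil {
			fmt.Println("stopped:", err)
			return
		}
		fmt.Println("record:", rec)
	}
}
```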