ranweiler / pete



A friendly wrapper around ptrace(2)

License: ISC License

Rust 93.41% Python 1.20% Shell 5.06% Makefile 0.30% C 0.03%

ptrace linux debugging

pete's Introduction

Pete

A friendly wrapper around the Linux ptrace(2) syscall.

Requirements

The current minimum supported OS and compiler versions are:

Linux 4.8
rustc 1.64

Continuous testing is only run for x86_64-unknown-linux-gnu.

Support for earlier Linux versions is possible, but low priority. Eventually, we would like to support any platform that provides ptrace(2).

Summary

The ptrace(2) interface entails interpreting a series of wait(2) statuses. The context used to interpret a status includes the attach options set on each tracee, previously-seen stops, recent ptrace requests, and in some cases, extra event data that must be queried using additional ptrace calls.

Pete is meant to instead permit reasoning directly about ptrace-stops, as described in the manual. We hide the lowest-level contextual bookkeeping required to disambiguate ptrace-stops. Whenever we can, we avoid extraneous ptrace calls, deferring to downstream tracers implemented on top of the library. For example, Pete can distinguish a syscall-enter-stop and syscall-exit-stop, but does not automatically query register state to identify the specific syscall.
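To make that bookkeeping concrete, here is a minimal std-only sketch of the kind of interpretation Pete hides. It is not Pete's code. It decodes a raw Linux wait(2) status using the same bits the WIFEXITED/WIFSTOPPED/WSTOPSIG macros inspect, and it assumes PTRACE_O_TRACESYSGOOD is set, so syscall-stops report SIGTRAP | 0x80. Because syscall-enter-stop and syscall-exit-stop produce identical statuses, a per-tracee toggle supplies the missing context. The `Decoded`, `SyscallTracker`, and `SyscallSide` names are hypothetical, and the toggle deliberately ignores edge cases such as exec or restarts with requests other than PTRACE_SYSCALL.

```rust
use std::collections::HashMap;

const SIGTRAP: i32 = 5;
/// Set on the stop signal of syscall-stops when PTRACE_O_TRACESYSGOOD is in effect.
const SYSGOOD_BIT: i32 = 0x80;

/// Coarse classification of a raw Linux wait(2) status.
#[derive(Debug, PartialEq)]
pub enum Decoded {
    /// WIFEXITED; carries WEXITSTATUS.
    Exited(i32),
    /// WIFSIGNALED; carries WTERMSIG.
    Signaled(i32),
    /// Stop signal is SIGTRAP | 0x80. Enter vs. exit is NOT encoded in the status.
    SyscallStop,
    /// `status >> 16` holds a PTRACE_EVENT_* value (fork, exec, exit, ...).
    EventStop(i32),
    /// Signal-delivery-stop or group-stop; telling these apart needs more context.
    SignalStop(i32),
    Other,
}

pub fn decode(status: i32) -> Decoded {
    if (status & 0x7f) == 0 {
        return Decoded::Exited((status >> 8) & 0xff);
    }
    if (status & 0xff) == 0x7f {
        let sig = (status >> 8) & 0xff;
        let event = (status >> 16) & 0xffff;
        return if sig == (SIGTRAP | SYSGOOD_BIT) {
            Decoded::SyscallStop
        } else if event != 0 {
            Decoded::EventStop(event)
        } else {
            Decoded::SignalStop(sig)
        };
    }
    if (status & 0x7f) != 0x7f {
        return Decoded::Signaled(status & 0x7f);
    }
    Decoded::Other
}

#[derive(Debug, PartialEq)]
pub enum SyscallSide {
    Enter,
    Exit,
}

/// Per-tracee "currently inside a syscall" flag: only history tells enter from exit.
#[derive(Default)]
pub struct SyscallTracker {
    in_syscall: HashMap<i32, bool>,
}

impl SyscallTracker {
    /// Call on every syscall-stop for `pid`; flips the flag and reports the side.
    pub fn on_syscall_stop(&mut self, pid: i32) -> SyscallSide {
        let inside = self.in_syscall.entry(pid).or_insert(false);
        *inside = !*inside;
        if *inside {
            SyscallSide::Enter
        } else {
            SyscallSide::Exit
        }
    }
}
```

This is the disambiguation Pete performs so that callers see syscall-enter and syscall-exit stops directly, while still leaving register queries (which syscall, which arguments) to the downstream tracer.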

License

Pete is licensed under the ISC License.

Contributing

Unless you explicitly state otherwise, any contribution intentionally submitted by you for inclusion in pete shall be licensed as ISC, without any additional terms or conditions.

pete's People

glebpom adir-shemesh travitch bmc-msft s1341 porges openmina fridayortiz rstenvi tevoinea

pete's Issues

Improve docs

Among other things:

Add module docs
Hide re-exports

Sometimes problems when tracing

I am tracing a bash script that forks some sub-processes, maybe some sub-sub-processes, too.

In some cases the tracing runs without any problems; other times I get an error when accessing the registers, like:
ESRCH: No such process

while let Some(tracee) = self.tracer.wait()? {
    let regs = tracee.registers()?;
    // ...
}

I thought that the tracee should be in a stopped state in that situation. So how can it die in the meantime?

new release with bump of nix requirement

It would be useful to create a new release now that the nix prereq has been bumped. Otherwise users will have to pin to a GIT revision.

mark changes from 0.6.0 in changelog as part of 0.6.0

As is, 0.6.0 does not have an entry in the CHANGELOG.md. Many (most?) of the entries currently in the section named "[Unreleased]" were included in the 0.6.0 release.

Breaking in 0.12: tracees that call wait(2)

In our build tracing tool (https://github.com/travitch/build-bom), upgrading to pete 0.12 causes the process to hang, whereas 0.11 runs as expected. This is likely due to the changes in #102 that went into 0.12.

The build-bom tool runs a build process that is monitored via pete. If the build process is make, make runs sub-processes and then calls wait(2) to wait for their completion before performing the next action. After upgrading to pete 0.12, this wait call never seems to complete. This can be demonstrated by the test_blddir test for build_bom (the last test in the test_bom.rs file there). Modifying the test to run with the -d debug flag, we observe that the pete 0.12 run stops on the first attempted make action:

...
  Finished prerequisites of target file 'blddir/obj'.
  Must remake target 'blddir/obj'.
make: Entering directory '/tmp/nix-shell.Nztet5/.tmpTcLAfb/blddir_test'
Makefile:16: update target 'blddir/obj' due to: target does not exist
mkdir -p blddir/obj
Putting child 0x474d00 (blddir/obj) PID 1852635 on the chain.
Live child 0x474d00 (blddir/obj) PID 1852635

with no further output and here the test hangs. Comparatively with pete 0.11 we can see it continue past that point:

...
  Finished prerequisites of target file 'blddir/obj'.
  Must remake target 'blddir/obj'.
make: Entering directory '/tmp/nix-shell.Nztet5/.tmp3Nd8RL/blddir_test'
Makefile:16: update target 'blddir/obj' due to: target does not exist
mkdir -p blddir/obj
Putting child 0x4742a0 (blddir/obj) PID 1830875 on the chain.
Live child 0x4742a0 (blddir/obj) PID 1830875
Reaping winning child 0x4742a0 PID 1830875
Removing child 0x4742a0 PID 1830875 from chain.
  Successfully remade target file 'blddir/obj'.
  Considering target file 'headers/target.h'.
... [more output...]

Using strace on the pete 0.12 version:

...
mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7ffff71d9000
rt_sigprocmask(SIG_BLOCK, ~[], [], 8)   = 0
clone3({flags=CLONE_VM|CLONE_VFORK, exit_signal=SIGCHLD, stack=0x7ffff71d9000, stack_size=0x9000}, 88) = 1855335
munmap(0x7ffff71d9000, 36864)           = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(1855335,

Conjecture as to cause:

We suspect that the subprocess run by make is not in the tracee list at https://github.com/ranweiler/pete/pull/102/files#diff-862cffa434b0d152dd6cd08f8eb0e84105690698957786052056808fd65b6667R439, thus causing the loop at https://github.com/ranweiler/pete/pull/102/files#diff-862cffa434b0d152dd6cd08f8eb0e84105690698957786052056808fd65b6667R474 to never exit, whereas the broader waitpid call in the previous version would have accepted the pid for the make subprocess and continued to the following code that would allow the make process to proceed.

Idea: Easy handling

Just an idea.

Right now: tracer.restart(tracee, Restart::Syscall).unwrap();

If the tracee held a reference to the tracer, you could allow:
tracee.restart(Restart::Syscall).unwrap();
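One way to get this ergonomics in Rust is for `wait()` to return a tracee that mutably borrows its tracer. The sketch below uses hypothetical std-only stand-ins (`Tracer`, `Tracee`, `Restart`, and a `restarted` log that replaces the real ptrace(2) calls); it is not pete's API. The trade-off: while that borrow is alive the tracer cannot be used for anything else, so only one stopped tracee can be held at a time. That fits a simple wait/restart loop, but not a tracer that wants to keep several tracees stopped at once.

```rust
/// Hypothetical restart modes (mirrors the idea of pete's `Restart`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Restart {
    Continue,
    Syscall,
}

/// Stand-in tracer: records restarts instead of calling ptrace(2).
#[derive(Default)]
pub struct Tracer {
    pub restarted: Vec<(i32, Restart)>,
    pending: Vec<i32>,
}

/// A stopped tracee that borrows the tracer it belongs to.
pub struct Tracee<'t> {
    pub pid: i32,
    tracer: &'t mut Tracer,
}

impl Tracer {
    /// Simulates the kernel reporting a stop for `pid`.
    pub fn queue_stop(&mut self, pid: i32) {
        self.pending.push(pid);
    }

    /// Returns the next stopped tracee, borrowing `self` until it is restarted.
    pub fn wait(&mut self) -> Option<Tracee<'_>> {
        let pid = self.pending.pop()?;
        Some(Tracee { pid, tracer: self })
    }

    fn restart_pid(&mut self, pid: i32, mode: Restart) {
        self.restarted.push((pid, mode));
    }
}

impl Tracee<'_> {
    /// Consumes the tracee, so the same stop cannot be restarted twice.
    pub fn restart(self, mode: Restart) {
        self.tracer.restart_pid(self.pid, mode);
    }
}
```

With this shape, `while let Some(tracee) = tracer.wait() { tracee.restart(Restart::Syscall); }` compiles, and the borrow checker rejects calling `tracer.wait()` again before the current tracee has been restarted.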

Enable use of `PTRACE_GET_SYSCALL_INFO`

This request is available as of Linux 5.3. It lets us access a ptrace_syscall_info struct, which lets us reliably distinguish syscall-enter-stop and syscall-exit-stop. It also provides more direct access to information about the syscall itself (number, args, ret val).
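For reference, a sketch of a `#[repr(C)]` Rust mirror of `struct ptrace_syscall_info` as introduced in Linux 5.3 (`<linux/ptrace.h>`), plus a safe decoder keyed on its `op` field. Issuing the `PTRACE_GET_SYSCALL_INFO` request itself is left out, and the type and function names are hypothetical rather than part of pete or nix.

```rust
/// `op` values from <linux/ptrace.h> (0 is PTRACE_SYSCALL_INFO_NONE).
const PTRACE_SYSCALL_INFO_ENTRY: u8 = 1;
const PTRACE_SYSCALL_INFO_EXIT: u8 = 2;
const PTRACE_SYSCALL_INFO_SECCOMP: u8 = 3;

#[repr(C)]
#[derive(Clone, Copy)]
pub struct EntryInfo {
    pub nr: u64,
    pub args: [u64; 6],
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct ExitInfo {
    pub rval: i64,
    pub is_error: u8,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct SeccompInfo {
    pub nr: u64,
    pub args: [u64; 6],
    pub ret_data: u32,
}

/// Which member is valid depends on `op`.
#[repr(C)]
#[derive(Clone, Copy)]
pub union SyscallData {
    pub entry: EntryInfo,
    pub exit: ExitInfo,
    pub seccomp: SeccompInfo,
}

/// Mirror of `struct ptrace_syscall_info`.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct PtraceSyscallInfo {
    pub op: u8,
    pub pad: [u8; 3],
    pub arch: u32,
    pub instruction_pointer: u64,
    pub stack_pointer: u64,
    pub data: SyscallData,
}

#[derive(Debug, PartialEq)]
pub enum SyscallStop {
    Enter { nr: u64, args: [u64; 6] },
    Exit { rval: i64, is_error: bool },
    Seccomp { nr: u64, ret_data: u32 },
    NotASyscallStop,
}

pub fn decode(info: &PtraceSyscallInfo) -> SyscallStop {
    // SAFETY: `op` says which union member the kernel filled in.
    unsafe {
        match info.op {
            PTRACE_SYSCALL_INFO_ENTRY => SyscallStop::Enter {
                nr: info.data.entry.nr,
                args: info.data.entry.args,
            },
            PTRACE_SYSCALL_INFO_EXIT => SyscallStop::Exit {
                rval: info.data.exit.rval,
                is_error: info.data.exit.is_error != 0,
            },
            PTRACE_SYSCALL_INFO_SECCOMP => SyscallStop::Seccomp {
                nr: info.data.seccomp.nr,
                ret_data: info.data.seccomp.ret_data,
            },
            _ => SyscallStop::NotASyscallStop,
        }
    }
}
```

Because `op` alone says whether a stop is an entry or an exit, a tracer using this request would not need the per-tracee toggling that plain PTRACE_SYSCALL stops require.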

Add way to set Ptracer.options

First of all can I just say this is an amazing crate, saved me tons of work!

I might be being blind, but as far as I can see there's no public way to set Ptracer.options to anything other than the default Options::all(). Is that intentional?

Document platform support

We only continuously test support for x86-64 Linux. Eventually, we want to support any platform that supports ptrace(2), including ARM variants and macOS. But right now, to the extent that we do, it's due to luck. Further document both the status quo and eventual goals.

Make marker field private

We use a marker field to force Tracee to be !Send. This is accidentally pub.
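A minimal sketch of the pattern with a hypothetical stand-in `Tracee`: a zero-sized `PhantomData<*const ()>` field makes the type `!Send` (and `!Sync`), because raw pointers are neither. Keeping the field private means code outside the defining module cannot build a `Tracee` with a struct literal, so every instance comes from the module's own constructors and carries the marker.

```rust
mod tracee {
    use std::marker::PhantomData;

    pub struct Tracee {
        pub pid: i32,
        // Private: raw pointers are neither Send nor Sync, so this zero-sized
        // marker makes `Tracee` !Send + !Sync without changing its layout.
        _not_send: PhantomData<*const ()>,
    }

    impl Tracee {
        /// Outside this module, this is the only way to obtain a `Tracee`.
        pub fn new(pid: i32) -> Self {
            Tracee { pid, _not_send: PhantomData }
        }
    }
}

use tracee::Tracee;

// Uncommenting these lines fails to compile, because `*const ()` is not Send:
// fn assert_send<T: Send>() {}
// fn check() { assert_send::<Tracee>(); }
```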

Automate issue creation for out-of-date pre-1.0 deps

Pre-1.0 dependencies like libc and nix can make semver-legal breaking changes on minor version updates. In practice, they may even break on patch updates. It's probably best to track their latest minor version, at least, and to make sure we continuously test with the latest patch release of all deps.

If we can, automate issue creation against this policy, e.g. with Dependabot.

Remove `Error::Restart`

This doesn't convey any actionable information, and obscures that the real problem is a tracee that has died while ptrace-stopped. The only extra data it includes (the requested restart mode) will never be the cause of a restart error, so including it by default isn't useful.

Remove this error variant, and replace the one occurrence of it with an Error::TraceeDied.

Return `Err` from `wait()` if no registered tracees

Calls to Ptracer::wait() block, and in general, can hang. However, the Ptracer maintains an index of known tracees. If we have no tracees at all, then we should fail fast in wait() before blocking on a doomed internal call to waitpid().

To investigate: there may be out-of-order stop deliveries such that we temporarily have no registered tracees, e.g. around exec. This needs to be ruled out as an edge case. Even if this is true, we shouldn't hang on a call to wait() if attach() or spawn() were never called.
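A sketch of the proposed fail-fast check on a hypothetical stand-in tracer that keeps its known tracees in a `HashMap`. `blocking_waitpid` is a placeholder for the real waitpid(2) loop and only counts calls here; the exec edge case mentioned above is not handled.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum WaitError {
    /// `wait()` was called with no registered tracees, so waitpid(2) could hang.
    NoTracees,
}

#[derive(Default)]
pub struct Tracer {
    tracees: HashMap<i32, ()>, // pid -> per-tracee state (elided)
    waitpid_calls: usize,      // how often the simulated blocking call ran
}

impl Tracer {
    pub fn wait(&mut self) -> Result<Option<i32>, WaitError> {
        // Fail fast instead of blocking on a doomed waitpid(2).
        if self.tracees.is_empty() {
            return Err(WaitError::NoTracees);
        }
        Ok(self.blocking_waitpid())
    }

    /// Placeholder for the real waitpid(2)-based loop.
    fn blocking_waitpid(&mut self) -> Option<i32> {
        self.waitpid_calls += 1;
        self.tracees.keys().next().copied()
    }
}
```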

Conditionally define arch-specific features

We have several places where we unconditionally use values that are actually arch-specific.

Examples:

We always alias the x86 type libc::user_regs_struct with Register
nix::sys::ptrace::getregs() is only defined for some x86-64 platforms (this is a nix limitation)

And we'd like to add more. For now, just make it so the existing features are appropriately gated.
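A sketch of the gating with `#[cfg]` on `target_os` and `target_arch`. The `Registers` structs and `SUPPORTS_GETREGS` constant are hypothetical stand-ins; in pete the x86-64 alias would point at `libc::user_regs_struct` instead. On targets matching neither block, nothing is defined, so misuse becomes a compile error rather than a silent wrong type.

```rust
// x86-64 Linux: PTRACE_GETREGS exists here.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
mod arch {
    /// Stand-in for `libc::user_regs_struct` on x86-64 (most fields elided).
    #[derive(Debug, Default)]
    pub struct Registers {
        pub rip: u64,
        pub rsp: u64,
    }
    pub const SUPPORTS_GETREGS: bool = true;
}

// aarch64 Linux: no PTRACE_GETREGS; registers come via PTRACE_GETREGSET.
#[cfg(all(target_os = "linux", target_arch = "aarch64"))]
mod arch {
    #[derive(Debug, Default)]
    pub struct Registers {
        pub pc: u64,
        pub sp: u64,
    }
    pub const SUPPORTS_GETREGS: bool = false;
}

#[cfg(all(target_os = "linux", any(target_arch = "x86_64", target_arch = "aarch64")))]
pub use arch::{Registers, SUPPORTS_GETREGS};
```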

Revisit `Stop` variants

The Stop variants are tuple structs. This can be nice for pattern matching, but is unhelpful when we are presenting two PIDs (e.g. in Stop::Exec).

Revisit this, and probably switch to variants with named fields. Explore whether there is value in adding a struct for the data associated with each stop type.
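A sketch of what named-field variants could look like. The variant and field names are hypothetical, not pete's current definition; they also drop the redundant `Stop` suffix. With labeled fields, the two PIDs of an exec stop cannot be silently swapped in a pattern.

```rust
pub type Pid = i32;

/// Hypothetical named-field variants.
#[derive(Debug, PartialEq)]
pub enum Stop {
    SyscallEnter,
    SyscallExit,
    /// Both PIDs are labeled, so callers can't mix them up.
    Exec { old_pid: Pid, new_pid: Pid },
    Fork { child: Pid },
    Exiting { exit_code: i32 },
}

pub fn describe(stop: &Stop) -> String {
    match stop {
        Stop::Exec { old_pid, new_pid } => format!("exec: {old_pid} -> {new_pid}"),
        Stop::Fork { child } => format!("fork: child {child}"),
        Stop::Exiting { exit_code } => format!("exiting with {exit_code}"),
        Stop::SyscallEnter | Stop::SyscallExit => "syscall".to_string(),
    }
}
```

Named fields also allow partial patterns such as `Stop::Exec { new_pid, .. }`, which tuple variants only offer positionally.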

Trouble tracing children through vfork

I was playing around with this and ran into some panics at line 421 of ptracer.rs. I think the cause might be a missing call to set_tracee_state around https://github.com/ranweiler/pete/blob/master/src/ptracer.rs#L346. It seems to work with that. If you think that sounds reasonable, I can put together a PR.

Compatibility to spawn_ptrace

Hi there,

your Command structure is not as powerful as the one shipped with rust when is comes to stdou-stdin-piping. But with the crate spawn_ptrace you can activate ptrace for the default rust command structure. I needed some hack to allow pete to work with the resulting process, because pete has no option to add an already attached pid. Maybe you have a look!

Update examples for aarch64

Update MSRV to 1.46.0

Support dependencies which use const_if_match, which was stabilized in 1.46.0.
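For context, `const_if_match` is what allows `if` and `match` inside `const fn` bodies. A hypothetical example of code that compilers before 1.46 reject, using the PTRACE_O_TRACESYSGOOD (0x01) and PTRACE_O_TRACEEXEC (0x10) bits from `<linux/ptrace.h>`:

```rust
/// PTRACE_O_* bits from <linux/ptrace.h>.
pub const TRACESYSGOOD: u32 = 0x01;
pub const TRACEEXEC: u32 = 0x10;

/// `if` inside a `const fn` requires Rust >= 1.46 (`const_if_match`).
pub const fn with_sysgood(opts: u32, enable: bool) -> u32 {
    if enable {
        opts | TRACESYSGOOD
    } else {
        opts & !TRACESYSGOOD
    }
}

/// Evaluated entirely at compile time.
pub const DEFAULT_OPTIONS: u32 = with_sysgood(TRACEEXEC, true);
```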

Remove redundant suffix from `Stop` variants

Some of the variants of Stop have an identifier that ends in Stop. This is redundant, and done inconsistently, so remove it.

Ease user detection of tracee death via `ESRCH` errors

If a tracee dies in ptrace-stop, any ptrace operation will fail with ESRCH.

In some cases, like Ptracer::wait(), we detect extremal cases of this (when there are no more tracees), and we return None for the next stop event.

However, for our Error type, this detail gets buried, and typically requires inspection of the source.

One easy improvement would be to add a method like Error::tracee_died(&self). This would let users decide whether to attempt to continue waiting for more stops after an "expected" tracee death (if other tracees exist), or else bail.
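A sketch of the proposed helper on a simplified, hypothetical error type (not pete's actual `Error`). It classifies an OS error of `ESRCH` from a ptrace request as tracee death; `3` is the Linux value of `ESRCH`.

```rust
use std::fmt;
use std::io;

/// Linux errno value for ESRCH ("No such process").
const ESRCH: i32 = 3;

#[derive(Debug)]
pub enum Error {
    /// A ptrace request failed because the tracee no longer exists.
    TraceeDied { pid: i32, source: io::Error },
    /// Any other OS-level failure.
    Os(io::Error),
}

impl Error {
    /// Wraps an OS error from a ptrace request on `pid`, classifying ESRCH as death.
    pub fn from_ptrace(pid: i32, err: io::Error) -> Self {
        if err.raw_os_error() == Some(ESRCH) {
            Error::TraceeDied { pid, source: err }
        } else {
            Error::Os(err)
        }
    }

    /// True if the failure means the tracee died (e.g. killed while ptrace-stopped).
    pub fn tracee_died(&self) -> bool {
        matches!(self, Error::TraceeDied { .. })
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::TraceeDied { pid, source } => write!(f, "tracee {pid} died: {source}"),
            Error::Os(e) => write!(f, "OS error: {e}"),
        }
    }
}
```

A tracer loop could then skip to the next `wait()` when `e.tracee_died()` and other tracees remain, and propagate every other error.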

Add support for `SEIZE`, `INTERRUPT`, `LISTEN` requests

Right now, the nix crate only provides ptrace::seize().

We could implement the others via raw calls to libc. It'd be better to wait for (or help) nix to grow support for these calls, including the missing PTRACE_EVENT_STOP variant of Event.
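Raw libc calls would need the request numbers and the new event value from `<linux/ptrace.h>` (available since Linux 3.4). A std-only sketch of those constants, plus one way to classify a `PTRACE_EVENT_STOP` wait status: under `PTRACE_SEIZE`, a group-stop reports `PTRACE_EVENT_STOP` together with the job-control stop signal, while stops caused by `PTRACE_INTERRUPT` (or the initial stop after seizing) report it with `SIGTRAP`. `EventStopKind` and `classify_event_stop` are hypothetical names, and the signal numbers assume x86-64/aarch64 Linux.

```rust
/// Request and event numbers from <linux/ptrace.h>.
pub const PTRACE_SEIZE: u32 = 0x4206;
pub const PTRACE_INTERRUPT: u32 = 0x4207;
pub const PTRACE_LISTEN: u32 = 0x4208;
pub const PTRACE_EVENT_STOP: i32 = 128;

const SIGTRAP: i32 = 5;
/// SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU.
const STOP_SIGNALS: [i32; 4] = [19, 20, 21, 22];

#[derive(Debug, PartialEq)]
pub enum EventStopKind {
    /// Group-stop of a seized tracee; carries the stopping signal.
    GroupStop(i32),
    /// PTRACE_INTERRUPT-induced or initial post-seize stop (reported with SIGTRAP).
    Interrupt,
}

/// Returns None unless `status` is a PTRACE_EVENT_STOP ptrace-stop.
pub fn classify_event_stop(status: i32) -> Option<EventStopKind> {
    let stopped = (status & 0xff) == 0x7f;
    if !stopped || (status >> 16) != PTRACE_EVENT_STOP {
        return None;
    }
    let sig = (status >> 8) & 0xff;
    if STOP_SIGNALS.contains(&sig) {
        Some(EventStopKind::GroupStop(sig))
    } else if sig == SIGTRAP {
        Some(EventStopKind::Interrupt)
    } else {
        None
    }
}
```

A tracer that sees `GroupStop` would typically restart with `PTRACE_LISTEN` to keep the tracee stopped while still receiving further notifications.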

Tracee hangs indefinitely if it execs on a non-main thread

Tracee code:

fn main() -> anyhow::Result<()> {
    std::thread::spawn(|| unsafe {
        libc::execl("/home/test/xxxx\0".as_ptr() as *const _, std::ptr::null());
    })
    .join()
    .unwrap();
    Ok(())
}

The tracer receives the main thread's exit event and then removes it from the tracees map. After that, the tracer never waits for that thread's events via waitpid.

But the ptrace(2) man page says:

PTRACE_O_TRACEEXEC (since Linux 2.5.46)
       Stop the tracee at the next execve(2). A waitpid(2) by the tracer will return a status value such that

         status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))

       If the execing thread is not a thread group leader, the thread ID is reset to thread group leader's ID before this stop. Since Linux 3.0, the former thread ID can be retrieved with PTRACE_GETEVENTMSG.

Pay attention to this part:

If the execing thread is not a thread group leader, the thread ID is reset to thread group leader's ID before this stop.

As the man page says, the exec event is reported under the (already exited) main thread's ID rather than the execing thread's ID, so ptracer.wait() never receives the exec event, because the tracees map no longer contains the main thread.

This seems to be a thorny problem: it means you cannot simply remove an exited tracee from the tracees map, or else you need to take the TGID into account when removing a tracee in response to an exit event.
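The "execve(2) under ptrace" section of ptrace(2) describes this case: a thread group leader can report its PTRACE_EVENT_EXIT stop and later appear to "reappear from nowhere" when another thread execs, and it does not report its actual death via WIFEXITED while other threads are still alive. A sketch of bookkeeping that tolerates this, on a hypothetical std-only tracee table (not pete's internals): keep the entry on an exit-event stop, remove it only on a real death, and on the exec stop (always reported under the leader's TID) drop the former TID, which is what PTRACE_GETEVENTMSG returns since Linux 3.0.

```rust
use std::collections::HashMap;

pub type Pid = i32;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Running,
    /// Reported PTRACE_EVENT_EXIT; the TID may still "reappear" via exec.
    Exiting,
}

#[derive(Default)]
pub struct TraceeTable {
    pub tracees: HashMap<Pid, State>,
}

impl TraceeTable {
    /// PTRACE_EVENT_EXIT stop: do not forget the TID yet.
    pub fn on_exit_event(&mut self, tid: Pid) {
        self.tracees.insert(tid, State::Exiting);
    }

    /// WIFEXITED / WIFSIGNALED: the TID is really gone.
    pub fn on_death(&mut self, tid: Pid) {
        self.tracees.remove(&tid);
    }

    /// PTRACE_EVENT_EXEC stop reported for `tid` (the leader's TID).
    /// `former_tid` is the execing thread's old TID, from PTRACE_GETEVENTMSG.
    pub fn on_exec(&mut self, tid: Pid, former_tid: Pid) {
        if former_tid != tid {
            self.tracees.remove(&former_tid);
        }
        self.tracees.insert(tid, State::Running);
    }

    /// Stops are only accepted for TIDs still present in the table.
    pub fn accepts(&self, tid: Pid) -> bool {
        self.tracees.contains_key(&tid)
    }
}
```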

Tracer calls to `waitpid()` leave non-tracees un-waitable

Due to how Ptracer invokes waitpid(2) under the hood, we will consume wait(2) statuses of non-tracee child processes. This most notably breaks calls to std::process::Child::wait() for untraced children.

Repro case:

use std::process::Command;

use anyhow::Result;
use pete::{Ptracer, Restart};

fn main() -> Result<()> {
    // Not a tracee. We should be able to spawn this normally and `wait()` on its exit status.
    let mut untraced = Command::new("sleep").arg("2").spawn()?;

    // We will trace this, and it will exit before `untraced`.
    let mut traceme = Command::new("sleep");
    traceme.arg("0.2");

    let mut tracer = Ptracer::new();
    tracer.spawn(traceme)?;

    while let Some(tracee) = tracer.wait()? {
        eprintln!("{}: {:?}", tracee.pid, tracee.stop);

        tracer.restart(tracee, Restart::Continue)?;
    }

    eprintln!("waiting on: {}", untraced.id());
    untraced.wait()?;  // Fails due to the silent consumption of its wait status by `tracer`.

    eprintln!("ok!");

    Ok(())
}

target environment variables

It would be useful to be able to set the target's environment variables in Command.
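In the version used by the repro above, `Ptracer::spawn()` accepts a `std::process::Command`, whose builder already covers this. A small sketch; `command_with_env` and `explicit_env` are hypothetical helpers, while `envs`, `env_clear`, and `get_envs` are standard `Command` methods.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Builds a command whose environment adds to (or overrides) the inherited one.
pub fn command_with_env(program: &str, vars: &[(&str, &str)]) -> Command {
    let mut cmd = Command::new(program);
    cmd.envs(vars.iter().copied());
    cmd
}

/// Looks up a variable explicitly set on `cmd` (inherited variables are not listed).
pub fn explicit_env<'a>(cmd: &'a Command, key: &str) -> Option<&'a OsStr> {
    cmd.get_envs()
        .find(|(k, _)| *k == OsStr::new(key))
        .and_then(|(_, v)| v)
}
```

The resulting command could then be passed to `tracer.spawn(cmd)`. For a fully controlled tracee environment, call `env_clear()` first; note that this also drops `PATH` from the child's environment.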

update nix dependency

Since nix is pre-1.0 (0.x), a caret (^) requirement does not allow the update from 0.17 to 0.19.

Syscall tracing very very very slow

On current master, the following command takes a very long time:

❯ time cargo run --release --example syscalls -- ls
    Finished release [optimized] target(s) in 0.01s
     Running `target/release/examples/syscalls ls`
pid = 60232, pc = 7f33dc5e0c50: [execve], SyscallExit
pid = 60232, pc = 7f33dc5e5ccb: [brk], SyscallEnter
pid = 60232, pc = 7f33dc5e5ccb: [brk], SyscallExit
...
pid = 60232, pc = 7f33dbec4b06: Exiting { exit_code: 0 }

real	0m43.973s
user	0m0.054s
sys	0m0.049s

strace ls only takes 17 ms. Any idea what the difference is? I was under the impression that strace uses ptrace in the same way as Pete. So how is it 2500x faster?

Support accessing x86 debug registers

We do not currently support setting hardware breakpoints and watchpoints using pete APIs, but we could.

The ptrace(2) API provides the PEEKUSER/POKEUSER requests to read/write individual words from an arch-specific struct user context associated with a task. In the ptrace(2) setting, this struct is wholly virtual. That is, it does not directly correspond to any struct actually used internally by the kernel. The user struct must be used with offsetof(3) to make a successful request. For example, on x86, PEEKUSER can only be used to access the user registers and the debug registers (and nothing else defined in struct user). See the x86 PEEKUSER impl. We already provide access to the same user registers via Tracee, so we only need the USER requests to access the debug registers. See also the old mailing list discussion here.

The nix wrapper for ptrace(2) does not expose the USER requests, so we'll need to invoke ptrace(2) directly. Rather than just exposing the full USER requests, we could instead provide feature-gated support for debug register access (only) on x86.
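A sketch of the arithmetic that feature-gated x86-64 support would need: the PEEKUSER/POKEUSER offset of each debug register inside `struct user`, and the DR7 bits that enable a breakpoint or watchpoint in one of the four address slots (DR0 to DR3). The value 848 is assumed to be `offsetof(struct user, u_debugreg)` for the glibc x86-64 layout; verify it with `offsetof` on the target rather than treating it as portable. The helper names are hypothetical, and the ptrace calls themselves are omitted.

```rust
/// Assumed offsetof(struct user, u_debugreg) on x86-64 glibc; verify with offsetof.
const U_DEBUGREG_OFFSET: usize = 848;

/// Byte offset to pass as `addr` to PTRACE_PEEKUSER/POKEUSER for DR`n` (0..=7).
pub fn debugreg_offset(n: usize) -> usize {
    assert!(n < 8, "x86 has DR0..DR7");
    U_DEBUGREG_OFFSET + n * std::mem::size_of::<u64>()
}

/// DR7 R/W field values (0b10, I/O breakpoints, omitted).
#[derive(Clone, Copy)]
pub enum Condition {
    Execute = 0b00,
    Write = 0b01,
    ReadWrite = 0b11,
}

/// DR7 LEN field values; execution breakpoints must use `One`.
#[derive(Clone, Copy)]
pub enum Length {
    One = 0b00,
    Two = 0b01,
    Eight = 0b10,
    Four = 0b11,
}

/// DR7 bits that locally enable address slot `slot` (0..=3).
pub fn dr7_enable_bits(slot: u32, cond: Condition, len: Length) -> u64 {
    assert!(slot < 4, "only DR0..DR3 hold addresses");
    let local_enable = 1u64 << (2 * slot);
    let rw = (cond as u64) << (16 + 4 * slot);
    let len = (len as u64) << (18 + 4 * slot);
    local_enable | rw | len
}
```

Setting a write watchpoint would then mean POKEUSER-ing the address into `debugreg_offset(slot)` and OR-ing `dr7_enable_bits(...)` into the current DR7 value at `debugreg_offset(7)`.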

Re-export public types from dependencies

If a type from a dependency is accessible from a pete type, re-export it. Example: nix::sys::signal::Signal.

replacing panics & expect with returning results

It would be great to bubble out errors via Result rather than panics for stability of consumers of Pete
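A sketch of the kind of change this asks for, on a hypothetical tracee lookup: the `expect` version takes down the whole consumer on an internal inconsistency, while the `Result` version turns the same failure into a value the caller can handle.

```rust
use std::collections::HashMap;
use std::fmt;

#[derive(Debug, PartialEq)]
pub enum TracerError {
    /// A stop arrived for a PID the tracer has no record of.
    UnknownTracee(i32),
}

impl fmt::Display for TracerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TracerError::UnknownTracee(pid) => write!(f, "no tracee with pid {pid}"),
        }
    }
}

impl std::error::Error for TracerError {}

pub struct Tracees {
    states: HashMap<i32, &'static str>,
}

impl Tracees {
    /// Before: panics if `pid` is unknown.
    pub fn state_or_panic(&self, pid: i32) -> &'static str {
        self.states.get(&pid).copied().expect("tracee should be registered")
    }

    /// After: the same lookup, with the failure returned to the caller.
    pub fn state(&self, pid: i32) -> Result<&'static str, TracerError> {
        self.states.get(&pid).copied().ok_or(TracerError::UnknownTracee(pid))
    }
}
```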

Add support for reading C strings

Some syscalls have const char * parameters pointing to a NUL-terminated string residing in the program's memory (in my case, this is execve(2)). If the tracing program needs to analyze such a parameter's value, it has to read memory byte by byte somehow, instead of reading a fragment of a known size.

A workaround would be to manually construct the path to /proc/[pid]/mem file, open, seek and read it sequentially.
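A sketch of that workaround using only the standard library. It opens `/proc/<pid>/mem`, uses positional reads (`FileExt::read_at`), and never lets a single read cross a 4 KiB boundary, so a string that ends just before an unmapped page can still be read; it stops at the first NUL. It assumes the caller already has ptrace access to `pid` (e.g. the tracee is ptrace-stopped) and that page sizes are a multiple of 4 KiB; `read_c_string` and its `max_len` cap are hypothetical.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// Reads stay within 4 KiB-aligned chunks so one read never spans two pages.
const CHUNK: u64 = 4096;

/// Reads the NUL-terminated string at `addr` in `pid`'s address space,
/// returning its bytes without the terminator. Fails past `max_len` bytes.
pub fn read_c_string(pid: u32, addr: u64, max_len: usize) -> io::Result<Vec<u8>> {
    let mem = File::open(format!("/proc/{pid}/mem"))?;
    let mut out = Vec::new();
    let mut buf = [0u8; CHUNK as usize];
    let mut cur = addr;

    while out.len() < max_len {
        // Bytes until the next 4 KiB boundary, capped by the remaining budget.
        let want = ((CHUNK - cur % CHUNK) as usize).min(max_len - out.len());
        let n = mem.read_at(&mut buf[..want], cur)?;
        if n == 0 {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "no readable memory"));
        }
        if let Some(nul) = buf[..n].iter().position(|&b| b == 0) {
            out.extend_from_slice(&buf[..nul]);
            return Ok(out);
        }
        out.extend_from_slice(&buf[..n]);
        cur += n as u64;
    }
    Err(io::Error::new(io::ErrorKind::InvalidData, "string longer than max_len"))
}
```

For an execve(2) syscall-enter-stop, `addr` would be the first syscall argument read from the tracee's registers.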

Add continuous integration

Trigger via Actions, use Azure Pipelines.

Add integration tests

Automate some integration tests and add to CI, at least for some easy cases like syscall tracing, where we can compare tracepoints against the syscalls example binary.

Prevent invalid restarts from non-tracer threads

It is possible to compile code like the following:

let mut tracer = Ptracer::new();

// Attach to the tracee here, on the main thread.
let tracee = tracer.spawn(..)?;

// Move tracer/tracee off-thread, and attempt to restart _from_ a new TID.
std::thread::spawn(move || -> anyhow::Result<()> {
    tracer.restart(tracee, ..)?;
    // ..
    Ok(())
});

This will always fail with an ESRCH, because the newly-spawned thread is not the tracer of tracee.

Instead, prevent this at compile-time by ensuring the right types are !Send.

Add a changelog

We're using semver, but still pre-1.0, so this is extra important for consumers. Keep it simple and follow the format here.