dyninst / spi

spi's Introduction

Dyninst

Notes

Known issues should have open issues associated with them.
ARMv8 (64 bit) support for dynamic instrumentation is experimental and incomplete. For details about currently supported functionality, refer to Dyninst Support for the ARMv8 (64 bit).

Build DyninstAPI and its subcomponents

Docker Containers

Containers are provided that can be used for Dyninst development (e.g., make changes to Dyninst and quickly rebuild it) or for development of your own tools (e.g., have a container ready to go with Dyninst). Links will be added here when the containers are pushed to the Dyninst associated package registries. Instructions for usage and building locally are provided in the docker directory.

Install with Spack

spack install dyninst

Build from source

Configure Dyninst with CMake

cmake /path/to/dyninst/source -DCMAKE_INSTALL_PREFIX=/path/to/installation

Build and install Dyninst in parallel

make install -jN

If this does not work for you, please refer to the Wiki for detailed instructions. If you encounter any errors, see the Building Dyninst page or open a GitHub issue.

Known Issues

Windows 64-bit mode is not yet supported
Windows rewriter mode is not yet supported
Exceptions in relocated code will not be caught
Linux rewriter mode for 32-bit, statically linked binaries does not support binaries with .plt, .rel, or .rela sections.
Callbacks at thread or process exit that stop the process will deadlock when a SIGSEGV occurs on a thread other than the main thread of a process
Stackwalker is fragile on Windows
Parsing a binary with no functions (typically a single object file) will crash at CodeObject destruction time.

spi's Issues

SP_TRAP does not enable trap-only instrumentation

The SP_TRAP environment variable is received by SPI and debugging output is logged to indicate that it forces SPI to use trap-only instrumentation. However, the instrumentation workers don't use the variable or support forcing only trap-based instrumentation yet.

Smarter memory allocation mechanism in SPI

Currently SPI finds free intervals between objects and tries to find a free interval for each CodeObject. A better memory allocation mechanism should be developed.

Ideally we should have a single pool of memory for all CodeObjects, and memory allocation requests would look for a chunk of memory within a specific address range. When this pool runs out of memory, we would mmap more memory from the free intervals listed in the /proc/$pid/maps file.
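
The free-interval lookup can be sketched as follows. This is a minimal illustration, not SPI's implementation: the names Interval and free_intervals are hypothetical, and the code assumes the maps entries arrive sorted by start address, as they do in /proc/$pid/maps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical half-open address range [start, end).
struct Interval {
    std::uint64_t start;
    std::uint64_t end;
};

// Returns the unmapped gaps inside [lo, hi), given text in the
// /proc/<pid>/maps format ("start-end perms offset dev inode path").
// Assumes the entries are sorted by start address.
std::vector<Interval> free_intervals(std::istream& maps,
                                     std::uint64_t lo, std::uint64_t hi) {
    assert(lo <= hi);
    std::vector<Interval> gaps;
    std::uint64_t cursor = lo;  // lowest address not yet known to be mapped
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream fields(line);
        std::uint64_t start = 0, end = 0;
        char dash = 0;
        if (!(fields >> std::hex >> start >> dash >> end) || dash != '-') {
            continue;  // skip lines that do not look like a mapping
        }
        if (end <= cursor) continue;  // mapping lies entirely below the cursor
        if (start >= hi) break;       // mapping lies entirely above the range
        if (start > cursor) gaps.push_back({cursor, start});
        cursor = std::max(cursor, end);
        if (cursor >= hi) break;
    }
    if (cursor < hi) gaps.push_back({cursor, hi});
    return gaps;
}
```

For a live process the same function could be fed a std::ifstream opened on /proc/self/maps. A pool allocator would then mmap from the returned gaps that fall within reach of the CodeObject it serves.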

Right now we are trying some simpler workarounds.

Getting callee in payload exit function in the case of indirect call

For an indirect call that uses a register-based addressing mode, SPI finds the callee by fetching the saved register values from the stack and computing the effective callee address. This is not possible in the payload exit function, because the saved registers have already been consumed by the original function call.
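
The address computation can be sketched as base + index * scale + displacement over the saved register values. The register indices, MemOperand, and effective_address below are hypothetical names for illustration and do not reflect SPI's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical indices into the block of registers saved on the stack.
enum Reg : std::size_t { RAX, RCX, RDX, RBX, kNumRegs };
constexpr std::size_t kNoReg = kNumRegs;  // marks an absent base or index

// Hypothetical decoded operand of an indirect call.
struct MemOperand {
    std::size_t base;    // register index, or kNoReg
    std::size_t index;   // register index, or kNoReg
    std::uint8_t scale;  // 1, 2, 4, or 8
    std::int32_t disp;   // signed displacement
};

// Computes base + index * scale + disp from the saved register values.
std::uint64_t effective_address(const std::uint64_t (&saved)[kNumRegs],
                                const MemOperand& op) {
    assert(op.scale == 1 || op.scale == 2 || op.scale == 4 || op.scale == 8);
    // Sign-extend the displacement before adding it to a 64-bit address.
    std::uint64_t addr =
        static_cast<std::uint64_t>(static_cast<std::int64_t>(op.disp));
    if (op.base != kNoReg) addr += saved[op.base];
    if (op.index != kNoReg) addr += saved[op.index] * op.scale;
    return addr;
}
```

For a register operand such as call *%rbx the result is the callee itself. For a memory operand such as call *0x10(%rax,%rcx,8) the result is the address of the slot that holds the callee pointer, so one more load is needed.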

SPI not ready for public consumption due to the problem of using dyninst internal

SPI uses one of Dyninst's internal header files.

In particular, SPI uses arch-x86.h under common/src. Because this file is platform specific, Dyninst is not supposed to expose it.

Right now, a custom built dyninst is used. To build this customized dyninst:

Move arch-x86.h from common/src to common/h
Remove the #include "common/src/Types.h" statement from the new arch-x86.h, include dyntypes.h instead, and manually add the missing definitions to arch-x86.h
Add COMMON_EXPORT before class and function names
Change the include statements #include "common/src/arch-x86.h" in other files inside dyninst to #include "arch-x86.h". Current files that need to be changed are:
- common/src/arch-x86.C
- common/src/arch.h
- instructionAPI/src/Instruction.C
- instructionAPI/src/InstructionDecoder-x86.C
- instructionAPI/src/Operation.C

A long term solution would be to expose codegenAPI from dyninst, so that SPI can directly use emit-* functions.

Change SPI to a CMAKE based project

Change SPI to use CMake so that we can share flags and dependencies with Dyninst.

FPVA plugin events

Exec calls:

execve: captures path/argvs/envs arguments, does not record XML trace event
execl, execlp, execle, execv, execvp: captures path argument, does not record XML trace event
execvpe, fexecve, execveat, etc.: not recorded by FPVA plugin

Process calls:

fork: captures child pid, records XML trace event
clone, clone2, clone3: not recorded by FPVA plugin
exit, exit_group: not recorded by FPVA plugin

File calls:

open, fopen: captures path argument, records XML trace event
openat, fdopen, freopen: not recorded by FPVA plugin
chmod: captures file name and mode, records XML trace event
close, fclose: not recorded by FPVA plugin
setuid, seteuid: captures username and uid, records XML trace event

Connection calls:

connect, accept: captures host ip and port, records XML trace event

Overall trace data:

pid, exe_name, working_dir, host, parent pid, real_user name and id, effective_user name and id, real_group name and id, effective_group name and id captured in XML trace
hostname, PID extracted directly from XML trace for python graph
parent PID extracted directly from XML trace, parent_exe obtained using os.path.basename during python graphing
init_exe and cur_exe extracted from XML trace exe_name using os.path.basename during python graphing, always the same
init_euid and cur_euid extracted from XML trace effective_user during python graphing, always the same

Python graphing:

fork, connect, accept, seteuid, execv, execve: events recorded in XML trace and parsed for python graphing
- accept events are dropped in final graph visualization
send, recv, clone, exit: not recorded in XML trace but they are parsed for python graphing
Procedure improvements: Refactor python graph builder to ignore miscellaneous trace errors (missing fields, missing parent/child nodes). Render raw/dedup image sets separately in Javascript instead of failing if one or the other doesn't exist.

New test suite is needed

The old test suite is outdated. We need a new test suite for unit testing and blackbox testing.

Create new blackbox tests
Create unit tests for inter-process propelling

dyninst is not exported normally in Self-Propelled Instrumentation

Include dyninstAPI in SPI

The Self-Propelled Instrumentation project uses Dyninst as source code. As @hainest suggested, this could be because Dyninst is not exported properly in SPI, or because SPI uses some of Dyninst's internals.
One example is:
#include "proccontrol/src/int_process.h"

SPI's makefile issue

In SPI's makefile, the Dyninst library paths point to platform-dependent folders that do not exist in the current release of Dyninst. @hainest suggests that Dyninst's directory structure may change over time, so the paths should point to the lib folder in the install directory. Solving this issue seems to require reorganizing the makefile.
Example: -L$(DYNINST_DIR)/proccontrol/$(PLATFORM)

Unable to find callee when the library functions are not bound

SPI is currently not able to find the callee of plt stubs if the function address is not bound.

Programs now use plt stubs to call exported functions. The plt stubs usually consist of three instructions:

indirect jump to an entry in the GOT
push an index onto stack
jump to the resolver function

When a program calls an exported function, the call enters the plt stub at its first instruction, and there are two cases. If the exported function is already bound (its address has been resolved and saved in the GOT), the indirect jump goes straight to the function. If the exported function is not bound yet, the GOT entry still points back at the second instruction of the stub, so execution continues with the push and then the jump to the resolver function. The first case is easy: we can get the callee by computing the effective address and looking up the function by address. In the second case, we cannot recognize any function call in the plt stub, so we miss this callee.

The tentative fix is to instrument the resolver function directly. The resolver function finds the exported function's address and then calls that function. If we can correctly parse the resolver function and recognize the call instruction at its end, we will be able to find all the exported functions that we missed at the plt stub.
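
Telling the two cases apart can be sketched with a simple range check. This assumes the usual lazy-binding layout, in which an unresolved GOT entry points back into its own plt stub; PltStub and is_bound are hypothetical names, not part of SPI or Dyninst.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one plt stub: its address and byte length.
struct PltStub {
    std::uint64_t start;
    std::uint64_t size;
};

// Returns true if the GOT entry used by the stub already holds the address
// of the real function. Assumption: an unbound entry points back into the
// stub itself (at the push instruction that follows the indirect jump).
bool is_bound(const PltStub& stub, std::uint64_t got_value) {
    assert(stub.size > 0);
    const bool points_into_stub =
        got_value >= stub.start && got_value < stub.start + stub.size;
    return !points_into_stub;
}
```

When is_bound returns false, looking up a function at the GOT value finds nothing useful, which is the situation that instrumenting the resolver is meant to cover.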

parser hangs when using dyninst with openmp

When SPI is injected with LD_PRELOAD, the parser hangs while parsing the code objects inside the agent library's init.

Instrumentation needs to be stopped after exit function

The exit function in glibc calls destructors of class objects in global scope, including the ones in dyninst and SPI. So we need to stop instrumentation after the exit function is called.
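
One way to picture this is a global switch that the payload turns off as soon as it sees a call to exit. The sketch below is only an illustration: g_instrumentation_enabled and payload_entry are hypothetical names, and it assumes the payload receives the callee's name.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical global switch checked before any instrumentation work.
std::atomic<bool> g_instrumentation_enabled{true};

// Hypothetical payload entry; 'callee' names the function about to be called.
// Returns true if the payload ran, false if instrumentation is switched off.
bool payload_entry(const std::string& callee) {
    assert(!callee.empty());
    if (!g_instrumentation_enabled.load(std::memory_order_acquire)) {
        return false;  // exit() was already seen: do nothing
    }
    if (callee == "exit") {
        // Destructors of global objects run after this point, so stop here.
        g_instrumentation_enabled.store(false, std::memory_order_release);
    }
    return true;
}
```

The call to exit itself is still observed, and everything that runs afterwards, including the destructors of global objects, is left uninstrumented.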

Accessing Register Value can have race condition

In SPI's generated instrumentation snippet, e.g.

    7f0533470000( 1 bytes): push %rdi                 | 57 
    7f0533470001( 1 bytes): push %rsi                 | 56 
    7f0533470002( 1 bytes): push %rdx                 | 52 
    7f0533470003( 1 bytes): push %rcx                 | 51 
    7f0533470004( 2 bytes): push %r8                  | 41 50 
    7f0533470006( 2 bytes): push %r9                  | 41 51 
    7f0533470008( 1 bytes): push %rax                 | 50 
    7f0533470009( 1 bytes): lahf                      | 9f 
    7f053347000a( 3 bytes): seto %al                  |  f 90 c0 
    7f053347000d( 1 bytes): push %rax                 | 50 
    7f053347000e(10 bytes): mov 69f68088,%rax         | 48 b8 88 80 f6 69 3e 56  0  0   # Save stack pointer 
    7f0533470018( 3 bytes): mov %rsp,(%rax)           | 48 89 20                        # to a class member variable
    7f053347001b( 8 bytes): lea 0xffffff78(%rsp),%rsp | 48 8d a4 24 78 ff ff ff 
    7f0533470023( 3 bytes): mov %rsp,%rax             | 48 8b c4 
    7f0533470026( 6 bytes): add 8,%rax                | 48  5  8  0  0  0 
    7f053347002c( 4 bytes): movdqa %xmm0,(%rax)       | 66  f 7f  0 
    7f0533470030( 5 bytes): movdqa %xmm1,0x10(%rax)   | 66  f 7f 48 10 
    7f0533470035( 5 bytes): movdqa %xmm2,0x20(%rax)   | 66  f 7f 50 20 
    7f053347003a( 5 bytes): movdqa %xmm3,0x30(%rax)   | 66  f 7f 58 30 
    7f053347003f( 5 bytes): movdqa %xmm4,0x40(%rax)   | 66  f 7f 60 40 
    7f0533470044( 5 bytes): movdqa %xmm5,0x50(%rax)   | 66  f 7f 68 50 
    7f0533470049( 5 bytes): movdqa %xmm6,0x60(%rax)   | 66  f 7f 70 60 
    7f053347004e( 5 bytes): movdqa %xmm7,0x70(%rax)   | 66  f 7f 78 70 
    7f0533470053( 1 bytes): push %rax                 | 50 
    7f0533470054(10 bytes): mov 4d712380,%rdi         | 48 bf 80 23 71 4d 3e 56  0  0 
    7f053347005e(10 bytes): mov 9317c714,%rsi         | 48 be 14 c7 17 93  5 7f  0  0 
    7f0533470068( 5 bytes): call 9317bb0a             | e8 9d ba d0 5f

It pushes registers onto the stack, and the instructions at 7f053347000e and 7f0533470018 move %rsp to a class member variable of SpSnippet, which is unique per SpPoint.

Now consider the multi-threaded case, where two threads execute this same snippet in parallel. One thread saves the stack pointer to the class member variable but has not yet called the payload entry; the other thread then saves its own stack pointer to the same member variable and overwrites the value stored by the first thread.
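
The interleaving can be modelled deterministically without real threads. In the sketch below, SavedFrame, SharedSlotPoint, and PerCallPoint are hypothetical stand-ins: the first layout mirrors the shared member variable, and the second is one possible alternative (an assumption, not SPI's current design) in which the frame pointer travels as an argument and therefore stays on each thread's own stack.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the block of registers the snippet pushes.
struct SavedFrame {
    std::uint64_t rdi;
    std::uint64_t rsi;
};

// Racy layout: one slot per instrumentation point, shared by all threads.
struct SharedSlotPoint {
    const SavedFrame* saved = nullptr;
    void save(const SavedFrame* frame) { saved = frame; }
    std::uint64_t first_arg() const {
        assert(saved != nullptr);
        return saved->rdi;
    }
};

// Alternative layout: the frame pointer is passed to the payload as an
// argument, so no state is shared between concurrent callers.
struct PerCallPoint {
    std::uint64_t first_arg(const SavedFrame* frame) const {
        assert(frame != nullptr);
        return frame->rdi;
    }
};
```

If thread A calls save, then thread B calls save, and only then A's payload reads first_arg, the shared layout returns B's register value to A. With the per-call layout each caller reads its own frame regardless of the interleaving.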

Need to reduce the amount of debug log

SPI produces a huge amount of debug log output, which can reach tens of gigabytes.

We need better configuration for debug logging, and we need to get rid of unnecessary debug logs.

TrampGuard to prevent recursive instrumentation

SPI now suffers from recursive instrumentation, where the instrumented function is called when we are inside the instrumentation code.

One example: SPI instruments the new operator at the plt section. While the instrumentation code is running, it calls the new operator, which triggers the instrumentation code again. This causes infinite recursion.

The ideal solution is to implement a trampGuard similar to the one in dyninstAPI. The trampGuard checks whether we are already inside instrumentation code and, if so, skips the instrumentation.

The current workaround is to discover all the functions that can cause recursive instrumentation and specifically avoid instrumenting them.
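
The trampGuard idea can be sketched with a per-thread flag and an RAII guard. This is an illustration only and does not reproduce dyninstAPI's implementation; the TrampGuard class, g_in_instrumentation, and the two functions are hypothetical.

```cpp
#include <cassert>

// Hypothetical per-thread flag: true while this thread runs instrumentation.
thread_local bool g_in_instrumentation = false;

// RAII guard: claims the flag on construction, releases it on destruction.
class TrampGuard {
public:
    TrampGuard() : owns_(!g_in_instrumentation) {
        if (owns_) g_in_instrumentation = true;
    }
    ~TrampGuard() {
        if (owns_) {
            assert(g_in_instrumentation);
            g_in_instrumentation = false;
        }
    }
    TrampGuard(const TrampGuard&) = delete;
    TrampGuard& operator=(const TrampGuard&) = delete;

    // False when this thread is already inside instrumentation code.
    bool entered() const { return owns_; }

private:
    bool owns_;
};

int g_payload_runs = 0;  // counts payload executions, for illustration

void instrumented_callee();

// Hypothetical payload that itself calls an instrumented function.
void payload_entry() {
    TrampGuard guard;
    if (!guard.entered()) return;  // recursive entry: skip instrumentation
    ++g_payload_runs;
    instrumented_callee();  // stands in for e.g. a call to operator new
}

// Stands in for an instrumented function: its call site fires the payload.
void instrumented_callee() {
    payload_entry();
}
```

Each outer call to payload_entry runs the payload exactly once, and the nested call made from inside the payload returns immediately instead of recursing.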

Limited access to parameter value and return value

Because of the structure of the emitted baseTramp, we can access the parameter values only in the pre-instrumentation function, and we can access the return value of the call only in the post-instrumentation function.

In the former case, it is possible to pass the parameter information from the pre-instrumentation function to the post-instrumentation function. In the latter case, it is impossible to get the return value in the pre-instrumentation function, since the function has not actually been called yet.
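
Passing parameter information forward can be sketched with a per-thread stack of pending calls. All names here (CallRecord, CompletedCall, pre_instrumentation, post_instrumentation) are hypothetical, and the sketch assumes every instrumented call returns normally so that pushes and pops stay balanced.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of what the pre-instrumentation function observed.
struct CallRecord {
    std::uint64_t first_arg;
};

// Hypothetical result that pairs the saved argument with the return value.
struct CompletedCall {
    std::uint64_t first_arg;
    std::uint64_t return_value;
};

// One stack per thread, so nested and concurrent calls do not mix.
thread_local std::vector<CallRecord> g_pending_calls;

// Called before the original function: remember the parameter.
void pre_instrumentation(std::uint64_t first_arg) {
    g_pending_calls.push_back({first_arg});
}

// Called after the original function: pair its return value with the
// parameter saved by the matching pre-instrumentation call.
CompletedCall post_instrumentation(std::uint64_t return_value) {
    assert(!g_pending_calls.empty());
    const CallRecord record = g_pending_calls.back();
    g_pending_calls.pop_back();
    return {record.first_arg, return_value};
}
```

A call that leaves through longjmp or an exception would skip its post-instrumentation function and unbalance the stack, so a real implementation would need extra bookkeeping.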

Checking for libc++ functions is artificial in propeller

spi/src/agent/propeller.cc

Line 77 in cccf5b3

 if (func->name().find("std::")!=std::string::npos || func->name().find("cxx")!=std::string::npos) { 

We are currently using pattern matching to skip propelling into libc++ functions. This is not an outstanding issue and will be resolved after the trampGuard fix.

Reduce calls to getenv

SPI currently calls getenv() repeatedly during instrumentation instead of storing variables at start.

Additionally, getenv("X") should be replaced with getenv_bool("X"), defined as not (undefined v || v is "" || v is "0" || v is "false"). This checks for environment variable values instead of just definition, making configuration more user-friendly.
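
A minimal sketch of both suggestions follows. It implements getenv_bool as defined above, and SpiConfig is a hypothetical holder that reads SP_TRAP once and caches the result; neither is existing SPI code.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// True unless the variable is undefined, "", "0", or "false".
bool getenv_bool(const char* name) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    if (value == nullptr) return false;  // undefined
    const std::string v(value);
    return !(v.empty() || v == "0" || v == "false");
}

// Hypothetical configuration holder: each variable is read once, on first
// use, instead of calling getenv() repeatedly during instrumentation.
struct SpiConfig {
    bool trap_only;

    static const SpiConfig& get() {
        static const SpiConfig config{getenv_bool("SP_TRAP")};
        return config;
    }
};
```

Instrumentation code would then consult SpiConfig::get().trap_only, which costs one getenv call per process rather than one per instrumentation point.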