PyStack

Print the stack trace of a running Python process, or of a Python core dump.

PyStack is a tool that uses forbidden magic to let you inspect the stack frames of a running Python process or a Python core dump, helping you quickly and easily learn what it's doing (or what it was doing when it crashed) without having to interpret nasty CPython internals.

What PyStack can do

PyStack has the following amazing features:

  • 💻 Works with both running processes and core dump files.
  • 🧵 Shows if each thread currently holds the Python GIL, is waiting to acquire it, or is currently dropping it.
  • 🗑️ Shows if a thread is running a garbage collection cycle.
  • 🐍 Optionally shows native function calls, as well as Python ones. In this mode, PyStack prints the native stack trace (C/C++/Rust function calls), except that the calls to Python callables are replaced with frames showing the Python code being executed, instead of showing the internal C code the interpreter used to make the call.
  • 🔍 Automatically demangles symbols shown in the native stack.
  • 📈 Includes calls to inlined functions in the native stack whenever enough debug information is available.
  • 🔍 Optionally shows the values of local variables and function arguments in Python stack frames.
  • 🔒 Safe to use on running processes. PyStack does not modify any memory or execute any code in a process that is running. It simply attaches just long enough to read some of the process's memory.
  • ⚡ Optionally, it can perform a Python stack analysis without pausing the process at all. This minimizes impact on the debugged process, at the cost of potentially failing due to data races.
  • 🚀 Super fast! It can analyze core files 10x faster than general-purpose tools like GDB.
  • 🎯 Even works with aggressively optimized Python interpreter binaries.
  • 🔍 Even works with Python interpreter binaries that do not have symbols or debug information (Python stack only).
  • 💥 Tolerates memory corruption well. Even if the process crashed due to memory corruption, PyStack can usually reconstruct the stack.
  • 💼 Self-contained: it does not depend on external tools or programs other than the Python interpreter used to run PyStack itself.

What platforms are supported?

At this time only Linux is supported.

Building from source

If you wish to build PyStack from source, you need the following binary dependencies installed on your system:

  • libdw
  • libelf

Note that sometimes both libraries are provided together as part of a distribution's elfutils package.

Check your package manager for how to install these dependencies (e.g., apt-get install libdw-dev libelf-dev on Debian-based systems). Note that you may need to tell the compiler where to find the header and library files of the dependencies for the build to succeed. If pkg-config is available (e.g. apt-get install pkg-config on Debian-based systems), it will automatically be used to locate the libraries and configure the correct build flags. Check your distribution's documentation to determine the location of the header and library files or for more detailed information.

When building on Alpine Linux (or any other distribution that doesn't use glibc), you'll need elfutils 0.188 or newer. You may need to build it from source if your distribution's package manager doesn't provide it.

Once you have these binary dependencies installed, you can clone the repository and follow the typical build process for Python libraries:

git clone [email protected]:bloomberg/pystack.git pystack
cd pystack
python3 -m venv ../pystack-env/  # just an example, put this wherever you want
source ../pystack-env/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -e .
python3 -m pip install -r requirements-test.txt -r requirements-extra.txt

This will install PyStack in the virtual environment in development mode (the -e of the last pip install command), and then install the Python libraries needed to test it, lint it, and generate its documentation.

If you plan to contribute back, you should install the pre-commit hooks:

pre-commit install

This will ensure that your contribution passes our linting checks.

Documentation

You can find the full documentation here.

Usage

PyStack uses distinct subcommands for analyzing running processes and core dump files.

usage: pystack [-h] [-v] [--no-color] {remote,core} ...

Get Python stack trace of a remote process

options:
  -h, --help     show this help message and exit
  -v, --verbose
  --no-color     Deactivate colored output

commands:
  {remote,core}  What should be analyzed by PyStack (use <command> --help for a command-specific help section).
    remote       Analyze a remote process given its PID
    core         Analyze a core dump file given its location and the executable

Analyzing running processes

The remote command is used to analyze the status of a running (remote) process. The analysis is always done in a safe and non-intrusive way, as no code is loaded in the memory space of the process under analysis and no memory is modified in the remote process. This makes analysis using PyStack a great option even for those services and applications that are running in environments where the running process must not be impacted in any way (other than being temporarily paused, though --no-block can avoid even that). There are several options available:

usage: pystack remote [-h] [-v] [--no-color] [--no-block] [--native] [--native-all] [--locals] [--exhaustive] pid

positional arguments:
  pid            The PID of the remote process

options:
  -h, --help     show this help message and exit
  -v, --verbose
  --no-color     Deactivate colored output
  --no-block     do not block the process when inspecting its memory
  --native       Include the native (C) frames in the resulting stack trace
  --native-all   Include native (C) frames from threads not registered with the interpreter (implies --native)
  --locals       Show local variables for each frame in the stack trace
  --exhaustive   Use all possible methods to obtain the Python stack info (may be slow)

To use PyStack, you just need to provide the PID of the process:

$ pystack remote 112
Traceback for thread 112 [] (most recent call last):
    (Python) File "/test.py", line 17, in <module>
        first_func()
    (Python) File "/test.py", line 6, in first_func
        second_func()
    (Python) File "/test.py", line 10, in second_func
        third_func()
    (Python) File "/test.py", line 14, in third_func
        time.sleep(1000)
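For context, here is a minimal script that would produce a traceback like the one above. This is a hypothetical reconstruction of the example's /test.py: the function names and the sleep come from the output shown, everything else is illustrative.

```python
# Hypothetical reconstruction of the /test.py from the example output above.
import time


def third_func():
    # The thread pystack catches sleeping here.
    time.sleep(1000)


def second_func():
    third_func()


def first_func():
    second_func()


# Call first_func() in one terminal, then attach from another:
#   pystack remote <PID>
```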

Analyzing core dumps

The core subcommand is used to analyze the status of a core dump file. Analyzing core files is very similar to analyzing processes but there are some differences, as the core file does not contain the totality of the memory that was valid when the program was live. In most cases, this makes no difference, as PyStack will try to adapt automatically. However, in some cases, you will need to specify extra command line options to help PyStack locate the information it needs. When analyzing cores, there are several options available:

usage: pystack core [-h] [-v] [--no-color] [--native] [--native-all] [--locals] [--exhaustive] [--lib-search-path LIB_SEARCH_PATH | --lib-search-root LIB_SEARCH_ROOT] core [executable]

positional arguments:
  core                  The path to the core file
  executable            (Optional) The path to the executable of the core file

options:
  -h, --help            show this help message and exit
  -v, --verbose
  --no-color            Deactivate colored output
  --native              Include the native (C) frames in the resulting stack trace
  --native-all          Include native (C) frames from threads not registered with the interpreter (implies --native)
  --locals              Show local variables for each frame in the stack trace
  --exhaustive          Use all possible methods to obtain the Python stack info (may be slow)
  --lib-search-path LIB_SEARCH_PATH
                        List of paths to search for shared libraries loaded in the core. Paths must be separated by the ':' character
  --lib-search-root LIB_SEARCH_ROOT
                        Root directory to search recursively for shared libraries loaded into the core.

In most cases, you just need to provide the location of the core to use PyStack with core dump files:

$ pystack core ./the_core_file
Using executable found in the core file: /usr/bin/python3.8

Core file information:
state: t zombie: True niceness: 0
pid: 570 ppid: 1 sid: 1
uid: 0 gid: 0 pgrp: 570
executable: python3.8 arguments: python3.8

The process died due receiving signal SIGSTOP
Traceback for thread 570 [] (most recent call last):
    (Python) File "/test.py", line 19, in <module>
        first_func({1: None}, [1,2,3])
    (Python) File "/test.py", line 7, in first_func
        second_func(x, y)
    (Python) File "/test.py", line 12, in second_func
        third_func(x, y)
    (Python) File "/test.py", line 16, in third_func
        time.sleep(1000)

License

PyStack is Apache-2.0 licensed, as found in the LICENSE file.

Code of Conduct

This project has adopted a Code of Conduct. If you have any concerns about the Code, or behavior that you have experienced in the project, please contact us at [email protected].

Security Policy

If you believe you have identified a security vulnerability in this project, please send an email to the project team at [email protected], detailing the suspected issue and any methods you've found to reproduce it.

Please do NOT open an issue in the GitHub repository, as we'd prefer to keep vulnerability reports private until we've had an opportunity to review and address them.

Contributing

We welcome your contributions to help us improve and extend this project!

Below you will find some basic steps required to be able to contribute to the project. If you have any questions about this process or any other aspect of contributing to a Bloomberg open source project, feel free to send an email to [email protected] and we'll get your questions answered as quickly as we can.

Contribution Licensing

Since this project is distributed under the terms of an open source license, contributions that you make are licensed under the same terms. For us to be able to accept your contributions, we will need explicit confirmation from you that you are able and willing to provide them under these terms, and the mechanism we use to do this is called a Developer's Certificate of Origin (DCO). This is similar to the process used by the Linux kernel, Samba, and many other major open source projects.

To participate under these terms, all that you must do is include a line like the following as the last line of the commit message for each commit in your contribution:

Signed-Off-By: Random J. Developer <[email protected]>

The simplest way to accomplish this is to add -s or --signoff to your git commit command.

You must use your real name (sorry, no pseudonyms, and no anonymous contributions).

Steps

  • Create an Issue, select 'Feature Request', and explain the proposed change.
  • Follow the guidelines in the issue template presented to you.
  • Submit the Issue.
  • Submit a Pull Request and link it to the Issue by including "#" in the Pull Request summary.

pystack's People

Contributors

aelsayed95, alicederyn, apurvakhatri, austinlg96, chaimhaas, dependabot[bot], ecalifornica, flpm, godlygeek, gwen-sarapata, helithumper, ivonastojanovic, jayybhatt, kkkoo7, mgmacias95, olgarithms, pablogsal, pistachiocannoli, sarahmonod, stefmolin, yize415

pystack's Issues

Support running on gz files directly

Is there an existing proposal for this?

  • I have searched the existing proposals

Is your feature request related to a problem?

Core dumps are often automatically archived to save disk space (in my case using the .gz format). However, pystack does not allow running directly on these files; it requires a raw core dump. This introduces an extra manual step of unzipping the core dump before I can use pystack to see the stack trace.

Describe the solution you'd like

pystack should automate this kind of core dump extraction: when it sees that the file ends with .gz, it could decompress it with gzip into a temporary file and run against that. This would be a quality-of-life improvement.
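A minimal sketch of that behavior, using only the standard library (the helper name is hypothetical, not part of pystack):

```python
import gzip
import shutil
import tempfile


def maybe_decompress_core(path: str) -> str:
    """Return a path to an uncompressed core file.

    Hypothetical helper sketching the proposed behavior: if the file ends
    in .gz, decompress it into a temporary file and return that path;
    otherwise return the original path unchanged.
    """
    if not path.endswith(".gz"):
        return path
    with tempfile.NamedTemporaryFile(suffix=".core", delete=False) as tmp, \
            gzip.open(path, "rb") as src:
        # Stream the decompressed bytes into the temp file.
        shutil.copyfileobj(src, tmp)
        return tmp.name
```

pystack core could then be run against the returned path, cleaning up the temporary file afterwards.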

Alternatives you considered

Keep doing the manual method...

Reach 100% coverage of Python code

Our make pycoverage command currently has --cov-fail-under=97 to fail the test suite if the Python code coverage level drops below 97%. Let's get that up to 100% (adding a # pragma: no cover for anything that's not reasonably testable), and update this to --cov-fail-under=100.

Global state is ugly state

We currently handle setting the version using a very ugly piece of global state, and that makes us sad. Although there is nothing inherently wrong with this, because we don't intend to process several applications in parallel (at the moment; maybe there is space for pystack as a library), it is not very elegant.

It would fill us with joy if we could move all this global state to some encapsulation.
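One possible shape for that encapsulation, sketched with illustrative names (not pystack's actual API): carry the detected version in a small immutable object that is passed explicitly, instead of storing it in a module-level global.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterpreterInfo:
    """Illustrative container for per-analysis state."""

    major: int
    minor: int


def needs_new_frame_layout(info: InterpreterInfo) -> bool:
    # Example consumer: decisions keyed off the version it is handed,
    # rather than off a global that was set somewhere else.
    return (info.major, info.minor) >= (3, 11)
```

Each analysis would construct its own InterpreterInfo, which also removes the obstacle to running several analyses in parallel.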

Failure to unwind stack in an after-fork handler

After causing a deadlock in an after-fork handler installed with pthread_atfork, an attempt to unwind that process with pystack remote --native-all is giving me

Engine error: basic_string::_S_construct null not valid

That's happening because dwfl_getthread_frames is not finding any frames, and also not setting dwfl_errno to something non-zero. Interestingly, this doesn't seem to reproduce with eu-stack, so we might be doing something wrong here that's causing this.

#101 fixes the failure mode that we get here, but we should figure out why unwinding is failing, as both gdb and eu-stack succeed.

I think the documentation is poor, or the Makefile is poor, or there's a bug.

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

On AWS EC2 (t2 family), I followed your documentation. The README has no build section, so I typed make to build with the Makefile. I got the following errors:
-> no pkgconfig
-> ok, pip install pkgconfig
-> no Cython
-> ok, pip install cython
-> then Python.h: No such file or directory

(pystack-env) ubuntu@ip-172-31-14-29:~/utils/pystack$ make
python setup.py build_ext --inplace
Traceback (most recent call last):
  File "/home/ubuntu/utils/pystack/setup.py", line 6, in <module>
    import pkgconfig
ModuleNotFoundError: No module named 'pkgconfig'
make: *** [Makefile:16: build] Error 1
(pystack-env) ubuntu@ip-172-31-14-29:~/utils/pystack$ vi Makefile 
(pystack-env) ubuntu@ip-172-31-14-29:~/utils/pystack$ vi setup.py 
(pystack-env) ubuntu@ip-172-31-14-29:~/utils/pystack$ pip install pkgconfig
Collecting pkgconfig
  Using cached pkgconfig-1.5.5-py3-none-any.whl (6.7 kB)
Installing collected packages: pkgconfig
Successfully installed pkgconfig-1.5.5
(pystack-env) ubuntu@ip-172-31-14-29:~/utils/pystack$ make
python setup.py build_ext --inplace
Traceback (most recent call last):
  File "/home/ubuntu/utils/pystack/setup.py", line 8, in <module>
    from Cython.Build import cythonize
ModuleNotFoundError: No module named 'Cython'
make: *** [Makefile:16: build] Error 1
(pystack-env) ubuntu@ip-172-31-14-29:~/utils/pystack$ pip install Cython
Collecting Cython
  Using cached Cython-3.0.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.2 kB)
Using cached Cython-3.0.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
Installing collected packages: Cython
Successfully installed Cython-3.0.5
(pystack-env) ubuntu@ip-172-31-14-29:~/utils/pystack$ make
python setup.py build_ext --inplace
pkg-config not found. pkg-config probably not installed: FileNotFoundError(2, 'No such file or directory')
Falling back to static flags.
Compiling src/pystack/_pystack.pyx because it depends on /home/ubuntu/utils/pystack-env/lib/python3.11/site-packages/Cython/Includes/libcpp/vector.pxd.
[1/1] Cythonizing src/pystack/_pystack.pyx
running build_ext
building 'pystack._pystack' extension
creating build
creating build/temp.linux-x86_64-cpython-311
creating build/temp.linux-x86_64-cpython-311/src
creating build/temp.linux-x86_64-cpython-311/src/pystack
creating build/temp.linux-x86_64-cpython-311/src/pystack/_pystack
x86_64-linux-gnu-gcc -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -Isrc/pystack/_pystack -I/home/ubuntu/utils/pystack-env/include -I/usr/include/python3.11 -c src/pystack/_pystack.cpp -o build/temp.linux-x86_64-cpython-311/src/pystack/_pystack.o -std=c++17
src/pystack/_pystack.cpp:66:10: fatal error: Python.h: No such file or directory
   66 | #include "Python.h"
      |          ^~~~~~~~~~
compilation terminated.
error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
make: *** [Makefile:16: build] Error 1

Expected Behavior

make should complete successfully.

Steps To Reproduce

  1. AWS EC2 (t2.micro)
  2. Install Python with the deadsnakes PPA
  3. git clone this repository
  4. Follow the documentation

Pystack Version

from source code ( Always run required CI check commit 772a0bc)

Python Version

3.11

Linux distribution

Ubuntu

Anything else?

No response

Report stacks for suspended asyncio tasks

Add a new command line switch to ask pystack to try to find suspended asyncio tasks and print their stacks. It would do this by finding and walking the CPython implementation's linked list of Python objects to find all asyncio tasks, identify the coroutine wrapped by each one, and finding the most recent stack frame of that coroutine.

This is probably quite a tricky change, but would be a very cool feature.
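An in-process approximation of the requested output can be sketched with asyncio's own introspection (pystack itself would instead recover this out-of-process by walking interpreter memory, as described above):

```python
import asyncio
import contextlib

suspended_in = []


async def worker():
    await asyncio.sleep(3600)  # the task stays suspended at this await


async def main():
    task = asyncio.create_task(worker())
    await asyncio.sleep(0)  # let worker run until its first await
    # For a suspended coroutine, Task.get_stack() returns the frame where
    # it is currently awaiting -- the information this feature would surface.
    suspended_in.extend(frame.f_code.co_name for frame in task.get_stack())
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task


asyncio.run(main())
print(suspended_in)  # expected to include 'worker'
```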

💀 Engine error: Function not implemented 💀

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I am SSHing into a Spark executor, and I'm trying to look at what it's doing. I am getting the following unusual error:

$ pystack remote <PID>
💀 Engine error: Function not implemented 💀

$ pystack -v remote 197 --exhaustive
INFO(process_remote): Analyzing process with pid 197 using stack method StackMethod.ALL with native mode NativeReportingMode.OFF
INFO(parse_maps_file_for_binary): python binary first map found: VirtualMap(start=0x0000000000400000, end=0x0000000000423000, filesize=0x23000, offset=0x0, device='09:7f', flags='r--p', inode=54676376, path='/usr/bin/python3.8')
INFO(parse_maps_file_for_binary): Process does not have a libpython.so, reading from binary
INFO(parse_maps_file_for_binary): Heap map found: VirtualMap(start=0x00000000011e4000, end=0x0000000001873000, filesize=0x68f000, offset=0x0, device='00:00', flags='rw-p', inode=0, path='[heap]')
INFO(_get_bss): Determined exact addr of .bss section: 0x93db20 (0x400000 + 0x53db20)
INFO(parse_maps_file_for_binary): bss map found: VirtualMap(start=0x000000000093db20, end=0x0000000000960ba8, filesize=0x23088, offset=0x53cb20, device='', flags='', inode=0, path='None')
INFO(parse_maps_file_for_binary): python binary first map found: VirtualMap(start=0x0000000000400000, end=0x0000000000423000, filesize=0x23000, offset=0x0, device='09:7f', flags='r--p', inode=54676376, path='/usr/bin/python3.8')
INFO(parse_maps_file_for_binary): Process does not have a libpython.so, reading from binary
INFO(parse_maps_file_for_binary): Heap map found: VirtualMap(start=0x00000000011e4000, end=0x0000000001873000, filesize=0x68f000, offset=0x0, device='00:00', flags='rw-p', inode=0, path='[heap]')
INFO(_get_bss): Determined exact addr of .bss section: 0x93db20 (0x400000 + 0x53db20)
INFO(parse_maps_file_for_binary): bss map found: VirtualMap(start=0x000000000093db20, end=0x0000000000960ba8, filesize=0x23088, offset=0x53cb20, device='', flags='', inode=0, path='None')
INFO(process_remote): Trying to stop thread 197
INFO(process_remote): Waiting for thread 197 to be stopped
INFO(process_remote): Process 197 attached
INFO(process_remote): Attempting to find symbol 'Py_Version' in /usr/bin/python3.8
💀 Engine error: Function not implemented 💀

Expected Behavior

Normal PyStack session

Steps To Reproduce

Inside a PySpark 3.3.2 executor (python3 -m pyspark.daemon),

$ pgrep python

# pick any of the PIDs and
$ pystack remote <PID>

No combination of CLI flags brings back any information beyond the 💀 Engine error: Function not implemented 💀 error.

Pystack Version

1.0.1

Python Version

3.8

Linux distribution

Ubuntu

Anything else?

No response

Build on aarch64 fails

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

On an aarch64 (Ubuntu) system, installation fails (it's a build from source, due to missing wheels):

$ sudo apt install libdw-dev libelf-dev 
$ pip install pystack

Collecting pystack
  Using cached pystack-1.0.1.tar.gz (82 kB)                                                                                                 
  Installing build dependencies ... done                              
  Getting requirements to build wheel ... done                                                                                              
  Preparing metadata (pyproject.toml) ... done                                                                                              
Building wheels for collected packages: pystack                       
  Building wheel for pystack (pyproject.toml) ... error                                                                                     
  error: subprocess-exited-with-error                                                                                                       
                                                                      
  × Building wheel for pystack (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [89 lines of output]
      warning: no previously-included files found matching 'src/pystack/*.h'
      /tmp/pip-build-env-yll9scnv/overlay/lib/python3.11/site-packages/setuptools/command/build_py.py:201: _Warning: Package 'pystack._pystack' is absent from the `packages` configuration.
      !!
      
              ********************************************************************************
              ############################
              # Package would be ignored #
              ############################
              Python recognizes 'pystack._pystack' as an importable package[^1],
              but it is absent from setuptools' `packages` configuration.
      
              This leads to an ambiguous overall configuration. If you want to distribute this
              package, please make sure that 'pystack._pystack' is explicitly added
              to the `packages` configuration field.
      
              Alternatively, you can also rely on setuptools' discovery methods
              (for example by using `find_namespace_packages(...)`/`find_namespace:`
              instead of `find_packages(...)`/`find:`).
      
              You can read more about "package discovery" on setuptools documentation page:
      
              - https://setuptools.pypa.io/en/latest/userguide/package_discovery.html
      
              If you don't want 'pystack._pystack' to be distributed and are
              already explicitly excluding 'pystack._pystack' via
              `find_namespace_packages(...)/find_namespace` or `find_packages(...)/find`,
              you can try to use `exclude_package_data`, or `include-package-data=False` in
              combination with a more fine grained `package-data` configuration.
      
              You can read more about "package data files" on setuptools documentation page:
      
              - https://setuptools.pypa.io/en/latest/userguide/datafiles.html
      
      
              [^1]: For Python, any directory (with suitable naming) can be imported,
                    even if it does not contain any `.py` files.
                    On the other hand, currently there is no concept of package data
                    directory, all directories are treated like packages.
              ********************************************************************************
      
      !!
        check.warn(importable)
      /tmp/pip-build-env-yll9scnv/overlay/lib/python3.11/site-packages/setuptools/command/build_py.py:201: _Warning: Package 'pystack._pystack.cpython' is absent from the `packages` configuration.
      !!
      
              ********************************************************************************
              ############################
              # Package would be ignored #
              ############################
              Python recognizes 'pystack._pystack.cpython' as an importable package[^1],
              but it is absent from setuptools' `packages` configuration.
      
              This leads to an ambiguous overall configuration. If you want to distribute this
              package, please make sure that 'pystack._pystack.cpython' is explicitly added
              to the `packages` configuration field.
      
              Alternatively, you can also rely on setuptools' discovery methods
              (for example by using `find_namespace_packages(...)`/`find_namespace:`
              instead of `find_packages(...)`/`find:`).
      
              You can read more about "package discovery" on setuptools documentation page:
      
              - https://setuptools.pypa.io/en/latest/userguide/package_discovery.html
      
              If you don't want 'pystack._pystack.cpython' to be distributed and are
              already explicitly excluding 'pystack._pystack.cpython' via
              `find_namespace_packages(...)/find_namespace` or `find_packages(...)/find`,
              you can try to use `exclude_package_data`, or `include-package-data=False` in
              combination with a more fine grained `package-data` configuration.
      
              You can read more about "package data files" on setuptools documentation page:
      
              - https://setuptools.pypa.io/en/latest/userguide/datafiles.html
      
      
              [^1]: For Python, any directory (with suitable naming) can be imported,
                    even if it does not contain any `.py` files.
                    On the other hand, currently there is no concept of package data
                    directory, all directories are treated like packages.
              ********************************************************************************
      
      !!
        check.warn(importable)
      src/pystack/_pystack/unwinder.cpp: In function ‘int pystack::frameCallback(Dwfl_Frame*, void*)’:
      src/pystack/_pystack/unwinder.cpp:105:18: error: ‘dwfl_frame_reg’ was not declared in this scope; did you mean ‘dwfl_frame_pc’?
        105 |         if (0 != dwfl_frame_reg(state, stackPointerRegNo.value(), &stackPointer.value())) {
            |                  ^~~~~~~~~~~~~~
            |                  dwfl_frame_pc
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]
   
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pystack
Failed to build pystack
ERROR: Could not build wheels for pystack, which is required to install pyproject.toml-based projects

Expected Behavior

Installation works

Steps To Reproduce

Use an aarch64 system with Debian installed:

$ sudo apt install libdw-dev libelf-dev 
$ pip install pystack
# Observe failure

As far as I can tell, the non-Python dependencies documented in the "Building from source" section are installed, so I'd expect this to work.

Pystack Version

latest

Python Version

3.11

Linux distribution

Ubuntu

Anything else?

Ideally, pystack would provide wheels for the aarch64 architecture, which would circumvent this problem altogether.

Remove --self flag

Originally posted by @godlygeek in #139 (comment)

I think we should probably just remove this flag. It isn't exercised by our tests, and it appears to have bit rotted:

$ pystack --self
usage: pystack [-h] [-v] [--no-color] {remote,core} ...
pystack: error: the following arguments are required: command
$ pystack --self remote
usage: pystack remote [-h] [-v] [--no-color] [--no-block] [--native] [--native-all] [--locals] [--exhaustive] [--self] pid
pystack remote: error: the following arguments are required: pid

I think we removed the code for handling this flag at some point, and missed removing the flag itself.

@alicederyn Would you be interested in updating the PR to remove all mentions of --self instead?

Analyzing core files generated from Python files with shebang

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When we try to analyze a core file that was generated from a script with a shebang, or any other form of non-ELF file with executable permissions, Pystack will fail to analyze it: it detects that the provided executable is not an ELF (Executable and Linkable Format) file and therefore cannot be used to analyze the core file.

Expected Behavior

When Pystack detects that the provided file is not an ELF file, it should analyze the core file we provided in order to determine which Python executable to use. As a workaround, we can pass the full path to the executable that was used in the original invocation as the second argument to the pystack core command:

$ pystack core $(COREFILE) /full/path/to/the/python/executable

Steps To Reproduce

To reproduce the issue generate a core file from a Python file with a shebang.

[pgalindo3@pyfrad-ob-299 ~]$ cat program.py
#!/opt/bb/bin/python3.10
import time
time.sleep(100)

[pgalindo3@pyfrad-ob-299 ~]$ chmod +x program.py

[pgalindo3@pyfrad-ob-299 ~]$ ./program.py &
[1] 104173

[pgalindo3@pyfrad-ob-299 ~]$ gcore 104173
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f89765ec49b in select () from /lib64/libc.so.6
warning: target file /proc/104173/cmdline contained unexpected null characters
Saved corefile core.104173
[Inferior 1 (process 104173) detached]

[pgalindo3@pyfrad-ob-299 ~]$ pystack core core.104173
Using executable found in the core file: program.py

Core file information:
state: t zombie: True niceness: 0
pid: 104173 ppid: 102709 sid: 102709
uid: 26835 gid: 100 pgrp: 104173
executable: python3.10 arguments: /opt/bb/bin/python3.10

The process died due receiving signal SIGSTOP
💀 The provided executable (program.py) doesn't have an executable format 💀

Pystack Version

1.3.0

Python Version

3.7, 3.8, 3.9, 3.10, 3.11, 3.12

Linux distribution

Debian, Ubuntu, Fedora, Red Hat, Arch Linux, Alpine Linux, Other

Anything else?

No response

Utilize `pkgconfig` to find elfutils

The https://pypi.org/project/pkgconfig/ library provides a way to determine the right compiler and linker flags to use for building against a particular C library. We're not currently leveraging this, and are instead assuming that the elfutils headers and libraries are in the system's default search paths. We should update our setup.py to try to use pkgconfig to locate the right flags for building and linking against libelf and libdw, but fall back to our current behavior if the pkgconfig query fails.
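
A minimal sketch of how setup.py could do this, with a hypothetical helper name (not PyStack's actual build code):

```python
# Hypothetical helper for setup.py: query pkg-config for libelf/libdw flags,
# falling back to today's behavior (assuming the headers and libraries are in
# the system's default search paths) if the query fails for any reason.
def elfutils_build_flags():
    flags = {"libraries": ["elf", "dw"]}  # current behavior: default paths
    try:
        import pkgconfig

        # parse() returns include_dirs/library_dirs/libraries lists suitable
        # for passing to setuptools.Extension(**flags)
        flags.update(pkgconfig.parse("libelf libdw"))
    except Exception:
        pass  # pkg-config binary missing, package not installed, etc.
    return flags
```

These flags could then be splatted into the `Extension(...)` call, so builds keep working on systems without pkg-config.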

Rewrite the README

The current README is using the internal version and is not very descriptive or engaging. We should bootstrap it with some improved version and make it better over time.

Segments can be missing read permissions

For some reason (probably some security shenanigans) some segments lack all permissions, in particular the read permission. This can cause process_vm_readv to fail as if the requested memory address were invalid and did not exist.

We should filter out these segments when we do our analysis to locate the relevant segments for the heap, .bss, etc.
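
A sketch of the proposed filtering, with hypothetical names (PyStack's real segment handling lives in the C++/Cython layer):

```python
from dataclasses import dataclass

PF_R = 0x4  # ELF program header "readable" permission bit


@dataclass
class Segment:
    start: int
    end: int
    flags: int


def readable_segments(segments):
    # Skip segments that process_vm_readv would fail on anyway.
    return [seg for seg in segments if seg.flags & PF_R]
```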

elfutils unwinding may be busted on Linux aarch64 when unwinding muslc

When I try to unwind in an Alpine Linux docker container on an aarch64 host, it seems that elfutils goes brrrrrrrr and chokes on an instruction pointer and keeps looping on it, so unwinding takes as long as the heat death of the universe.

We should try to add some magic sparkles so that unwinding doesn't take forever.

Reach 100% coverage of Cython code

Our Python + Cython coverage currently is close to (but not quite at) 100%. #79 has made the Python code 100% covered, but there are a few things in src/pystack/_pystack.pyx that still aren't covered, and #86 will force the --cov-fail-under=100 to be bumped back to a lower minimum coverage. Let's get that up to 100% again, including Cython this time!

Fix type error in traceback formatter

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The mypy check is failing due to a mismatch between the signature and the return type. It looks like the return value can be removed, as the only call to the function expects an iterable, matching the function signature.

Error:

python -m mypy src/pystack --strict --ignore-missing-imports
src/pystack/traceback_formatter.py:113: error: No return value expected  [return-value]
Found 1 error in 1 file (checked 11 source files)

Relevant function:

def _format_merged_stacks(
    thread: PyThread, current_frame: Optional[PyFrame]
) -> Iterable[str]:
    for frame in thread.native_frames:
        if frame_type(frame, thread.python_version) == NativeFrame.FrameType.EVAL:
            assert current_frame is not None
            yield from format_frame(current_frame)
            current_frame = current_frame.next
            while current_frame and not current_frame.is_entry:
                yield from format_frame(current_frame)
                current_frame = current_frame.next
            continue
        elif frame_type(frame, thread.python_version) == NativeFrame.FrameType.IGNORE:
            continue
        elif frame_type(frame, thread.python_version) == NativeFrame.FrameType.OTHER:
            function = colored(frame.symbol, "yellow")
            yield (
                f' {colored("(C)", "blue")} File "{frame.path}",'
                f" line {frame.linenumber},"
                f" in {function} ({colored(frame.library, attrs=['faint'])})"
            )
        else:  # pragma: no cover
            raise ValueError(
                f"Invalid frame type: {frame_type(frame, thread.python_version)}"
            )
    return current_frame

Usage:

yield from _format_merged_stacks(thread, current_frame)
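
A tiny self-contained demonstration of the mismatch (not PyStack's code): a generator annotated as returning Iterable[str] whose return value is discarded by the caller anyway, so the statement can simply be dropped:

```python
from typing import Iterable


def frames() -> Iterable[str]:
    yield "frame-1"
    yield "frame-2"
    # return current_frame  # <- the offending kind of statement: a caller
    #                       #    using "yield from" discards this value anyway
```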

Expected Behavior

No CI failure.

Steps To Reproduce

  1. In development container.
  2. Run make lint

Pystack Version

1.3.0

Python Version

3.12

Linux distribution

Ubuntu

Anything else?

No response

Unwinding cores that stack overflowed can take forever

This program:

max_iters = 1000000
i = filter(bool, range(max_iters))
for _ in range(max_iters):
    i = filter(bool, i)
del i

can generate a core that takes forever to unwind on some machines with gigantic stack sizes. For example, this can generate 58854+ frames. We may want to limit the number of frames we unwind. This may also be some kind of bug where we don't advance the IP when unwinding. Maybe we need to investigate a bit more.

Example of using eu-stack from elfutils:

...
#82404 0x00007f0888f7c93c filter_dealloc
#82405 0x00007f0888f7c93c filter_dealloc
#82406 0x00007f0888f7c93c filter_dealloc
#82407 0x00007f0888f7c93c filter_dealloc
#82408 0x00007f0888f7c93c filter_dealloc
#82409 0x00007f0888f7c93c filter_dealloc
#82410 0x00007f0888f7c93c filter_dealloc
#82411 0x00007f0888f7c93c filter_dealloc
#82412 0x00007f0888f7c93c filter_dealloc
#82413 0x00007f0888f7c93c filter_dealloc
#82414 0x00007f0888f7c93c filter_dealloc
#82415 0x00007f0888f7c93c filter_dealloc
#82416 0x00007f0888f7c93c filter_dealloc
#82417 0x00007f0888f7c93c filter_dealloc
#82418 0x00007f0888f7c93c filter_dealloc
#82419 0x00007f0888f7c93c filter_dealloc
#82420 0x00007f0888f7c93c filter_dealloc
#82421 0x00007f0888f7c93c filter_dealloc
#82422 0x00007f0888f7c93c filter_dealloc
#82423 0x00007f0888f7c93c filter_dealloc
#82424 0x00007f0888f7c93c filter_dealloc
#82425 0x00007f0888f7c93c filter_dealloc
#82426 0x00007f0888f7c93c filter_dealloc
#82427 0x00007f0888f7c93c filter_dealloc
#82428 0x00007f0888f7c93c filter_dealloc
#82429 0x00007f0888f7c93c filter_dealloc
#82430 0x00007f0888f7c93c filter_dealloc
#82431 0x00007f0888f7c93c filter_dealloc
#82432 0x00007f0888f7c93c filter_dealloc
#82433 0x00007f0888f7c93c filter_dealloc
#82434 0x00007f0888f7c93c filter_dealloc
#82435 0x00007f0888f7c93c filter_dealloc
#82436 0x00007f0888f7c93c filter_dealloc
#82437 0x00007f0888f7c93c filter_dealloc
#82438 0x00007f0888f7c93c filter_dealloc
#82439 0x00007f0888f7c93c filter_dealloc
#82440 0x00007f0888f7c93c filter_dealloc
#82441 0x00007f0888f7c93c filter_dealloc
#82442 0x00007f0888f7c93c filter_dealloc
#82443 0x00007f0888f7c93c filter_dealloc
#82444 0x00007f0888f7c93c filter_dealloc
...
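
A hedged sketch of the frame cap suggested above; MAX_FRAMES is an arbitrary illustrative limit, not a PyStack constant. (Detecting a stuck IP is trickier, since deep recursion like the filter_dealloc frames above legitimately repeats the same address.)

```python
MAX_FRAMES = 65536  # illustrative cap, not a PyStack constant


def unwind(step):
    """Collect frames from step(), which returns the next instruction
    pointer or None when the walk is finished, stopping at MAX_FRAMES."""
    frames = []
    while len(frames) < MAX_FRAMES:
        ip = step()
        if ip is None:
            break
        frames.append(ip)
    return frames
```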

Add a page to our published docs for our changelog

Currently our changelog isn't very interesting, but it will be as we add new features. To prepare ourselves for the future, add NEWS.rst to our published PyStack documentation, like it is in our Memray documentation.

Exercise C++ code coverage in CI

Run the make ccoverage command in CI, and somehow publish the coverage report for each CI run so that it can be reached from the PR.

Report pure Python stacks for multiple subinterpreters

When multiple subinterpreters are being run in the same Python process and --native mode is not in use, iterate over every Python interpreter in the process, dumping the tracebacks for the threads in each one in turn.

Version check makes potentially invalid assumptions about ELF layout

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I have been trying to debug why pystack thinks I am using Python version 12.41, and it turns out that my Python binary has a different layout. This problem seems to only occur (or at least I've only noticed it so far) when it attempts to find the value of Py_Version. The version of Python supplied by Ubuntu has an ELF section header for .rodata that looks like this (obtained with readelf -S):

  [18] .rodata           PROGBITS         0000000000312000  00312000
       000000000008806d  0000000000000000   A       0     0     32

Mine looks like this:

  [16] .rodata           PROGBITS         00000000008741c0  004741c0
       000000000035f330  0000000000000000   A       0     0     64

Expected Behavior

The proper way to look up the address would be something like

<addr of Py_Version> - 0x00000000008741c0 + 0x004741c0

Since these values are the same in most Python binaries, the issue would go unnoticed. I am not sure whether the normal Python build process guarantees this, so it could also bite regular Python builds later.

I was able to validate that the calculation above would work for my binary, where 0x0000000000ba2c68 is the address that pystack is attempting to look up.

 dd if=(path to python) bs=1 skip=$((0x0000000000ba2c68-0x00000000008741c0+0x004741c0)) count=8 | hexdump
16+0 records in
16+0 records out
16 bytes copied, 8.1777e-05 s, 196 kB/s
0000000 04f0 030b 0000 0000
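
The calculation above, spelled out (values taken from the readelf output in this report):

```python
# Translate a virtual address to a file offset using the section header's
# sh_addr (load address) and sh_offset (file offset), instead of assuming
# the two are equal.
def vaddr_to_file_offset(vaddr, sh_addr, sh_offset):
    return vaddr - sh_addr + sh_offset


# The reporter's .rodata: sh_addr=0x8741c0, sh_offset=0x4741c0, and
# 0xba2c68 is the address pystack was attempting to look up.
offset = vaddr_to_file_offset(0x0000000000BA2C68, 0x00000000008741C0, 0x004741C0)
```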

Steps To Reproduce

I am not sure how you would easily reproduce this issue, as you'd need to produce a Python binary that has .rodata addresses like the ones in my example. If you are able to do that, the issue reproduces very easily and all functionality of pystack will fail.

Pystack Version

1.3.0

Python Version

3.11

Linux distribution

Ubuntu

Anything else?

No response

Python API for running on current process

Is there an existing proposal for this?

  • I have searched the existing proposals

Is your feature request related to a problem?

I want to be able to get a full stack dump of the running process, like faulthandler.dump_traceback(), but with native stack frames

Describe the solution you'd like

A function in the pystack module that could dump the stack of the running process including native stack frames, something like

import pystack
pystack.dump_traceback()

Alternatives you considered

I can invoke pystack via subprocess, but it would be much more intuitive to use it directly since it's written in Python :)
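
The subprocess workaround, sketched with a hypothetical helper (running the command still requires pystack to be installed and permission to attach to the target process):

```python
import os
import sys


def pystack_command(pid=None):
    # Build the CLI invocation for the subprocess workaround; pass it to
    # subprocess.run() to actually dump the stack.
    pid = os.getpid() if pid is None else pid
    return [sys.executable, "-m", "pystack", "remote", str(pid)]
```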

Report stacks for all greenlets

Add a new command line switch to ask pystack to try to find suspended greenlets and print their stacks. It would do this by finding and walking the CPython implementation's linked list of Python objects to find all greenlets and to identify the most recent stack frame of each one.

This is probably quite a tricky change, but would be a very cool feature.
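
An in-process analogue of the idea, for illustration only (PyStack would walk the interpreter's GC linked lists by reading the remote process's memory rather than calling the gc module):

```python
import gc


def find_instances(type_name):
    # Enumerate every object the collector tracks and keep those whose
    # type name matches.
    return [obj for obj in gc.get_objects() if type(obj).__name__ == type_name]
```

In a process using greenlets, `find_instances("greenlet")` would return every greenlet object, from which each one's most recent stack frame could then be read.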

Consider a "brief" native mode only showing C frames below the last Python frame

Is there an existing proposal for this?

  • I have searched the existing proposals

Is your feature request related to a problem?

We default to showing pure Python stacks, both for performance and for legibility to people who aren't systems programmers, but often these aren't very useful for figuring out exactly why a program had a hang or crash, since it's not always clear what might be happening inside of the current Python call. The solution to this is running with --native or --native-all mode, but these modes are both very verbose.

Describe the solution you'd like

@alicederyn has pitched an idea to me: it might be possible to eliminate some of the noise and performance implications of --native mode, while still providing enough information to puzzle out deadlocks or blocking in the C code called from the user's Python code, if we have a mode that only shows the C frames after the last Python frame. That is, instead of our pure-Python stacks:

Traceback for thread 112 [] (most recent call last):
    (Python) File "/test.py", line 17, in <module>
        first_func()
    (Python) File "/test.py", line 6, in first_func
        second_func()
    (Python) File "/test.py", line 10, in second_func
        third_func()
    (Python) File "/test.py", line 14, in third_func
        time.sleep(1000)

And instead of the hybrid stacks that we show with --native:

Traceback for thread 112 [] (most recent call last):
    (C) File "???", line 0, in _start ()
    (C) File "???", line 0, in __libc_start_main ()
    (C) File "Modules/main.c", line 743, in Py_BytesMain (/usr/lib/libpython3.8.so.1.0)
    (C) File "Modules/main.c", line 689, in Py_RunMain (/usr/lib/libpython3.8.so.1.0)
    (C) File "Modules/main.c", line 610, in pymain_run_python (inlined) (/usr/lib/libpython3.8.so.1.0)
    (C) File "Modules/main.c", line 385, in pymain_run_file (inlined) (/usr/lib/libpython3.8.so.1.0)
    (C) File "Python/pythonrun.c", line 472, in PyRun_SimpleFileExFlags (/usr/lib/libpython3.8.so.1.0)
    (C) File "Python/pythonrun.c", line 439, in pyrun_simple_file (inlined) (/usr/lib/libpython3.8.so.1.0)
    (C) File "Python/pythonrun.c", line 1085, in pyrun_file (/usr/lib/libpython3.8.so.1.0)
    (C) File "Python/pythonrun.c", line 1188, in run_mod (/usr/lib/libpython3.8.so.1.0)
    (C) File "Python/pythonrun.c", line 1166, in run_eval_code_obj (/usr/lib/libpython3.8.so.1.0)
    (Python) File "/test.py", line 17, in <module>
        first_func()
    (C) File "Python/ceval.c", line 4963, in call_function (inlined) (/usr/lib/libpython3.8.so.1.0)
    (C) File "Objects/call.c", line 284, in function_code_fastcall (inlined) (/usr/lib/libpython3.8.so.1.0)
    (Python) File "/test.py", line 6, in first_func
        second_func()
    (C) File "Python/ceval.c", line 4963, in call_function (inlined) (/usr/lib/libpython3.8.so.1.0)
    (C) File "Objects/call.c", line 284, in function_code_fastcall (inlined) (/usr/lib/libpython3.8.so.1.0)
    (Python) File "/test.py", line 10, in second_func
        third_func()
    (C) File "Python/ceval.c", line 4963, in call_function (inlined) (/usr/lib/libpython3.8.so.1.0)
    (C) File "Objects/call.c", line 284, in function_code_fastcall (inlined) (/usr/lib/libpython3.8.so.1.0)
    (Python) File "/test.py", line 14, in third_func
        time.sleep(1000)
    (C) File "Python/ceval.c", line 4963, in call_function (inlined) (/usr/lib/libpython3.8.so.1.0)
    (C) File "Modules/timemodule.c", line 338, in time_sleep (/usr/lib/libpython3.8.so.1.0)
    (C) File "Modules/timemodule.c", line 1866, in pysleep (inlined) (/usr/lib/libpython3.8.so.1.0)
    (C) File "???", line 0, in __select ()

We'd show a stack that switches to showing C frames automatically after the last Python frame:

Traceback for thread 112 [] (most recent call last):
    (Python) File "/test.py", line 17, in <module>
        first_func()
    (Python) File "/test.py", line 6, in first_func
        second_func()
    (Python) File "/test.py", line 10, in second_func
        third_func()
    (Python) File "/test.py", line 14, in third_func
        time.sleep(1000)
    (C) File "Python/ceval.c", line 4963, in call_function (inlined) (/usr/lib/libpython3.8.so.1.0)
    (C) File "Modules/timemodule.c", line 338, in time_sleep (/usr/lib/libpython3.8.so.1.0)
    (C) File "Modules/timemodule.c", line 1866, in pysleep (inlined) (/usr/lib/libpython3.8.so.1.0)
    (C) File "???", line 0, in __select ()

This could be faster than running with --native, as we should be able to avoid unwinding and symbolizing the entire call stack and instead stop when we reach the first _PyEval_EvalFrameDefault call (though currently we do all the unwinding before any of the symbolizing, and that would need to change if we wanted to break out of unwinding early).

@pablogsal feels this would be a confusing default, but is willing to consider it as a --native=brief or --native-all=brief flag.
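
A sketch of the proposed filtering applied to an already-merged stack (the frame representation is simplified to (kind, description) tuples; real frames carry much more data):

```python
def brief_stack(frames):
    """Keep every Python frame, plus only the C frames that come after the
    last Python frame. frames is ordered oldest call first."""
    last_py = max(
        (i for i, (kind, _) in enumerate(frames) if kind == "python"),
        default=-1,
    )
    return [
        frame
        for i, frame in enumerate(frames)
        if frame[0] == "python" or i > last_py
    ]
```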

Alternatives you considered

No response

Handle bss sections that span multiple segments

It is possible that the .bss section of a core file spans multiple segments. We currently handle this suboptimally: we create a virtual "segment" for the entire .bss section when we pass it to the process manager. The problem is that when we need to copy the entire .bss section at once (or really anything that is not fully contained in a single segment), our memory copy machinery complains because it is not prepared to work with a chunk of memory that spans multiple segments.

There are two possibilities here:

  • Fix the memory copy class to deal with multiple segments.
  • Change the process manager to receive a list of segments for the .bss section instead of a single one.
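
A sketch of the first option, with hypothetical types: satisfy a read that spans several segments by stitching together the per-segment chunks.

```python
def read_range(segments, start, size):
    """segments: list of (vaddr, bytes) pairs sorted by vaddr; the pairs
    must together cover [start, start + size) contiguously."""
    out = bytearray()
    end = start + size
    for vaddr, data in segments:
        seg_end = vaddr + len(data)
        if seg_end <= start or vaddr >= end:
            continue  # segment doesn't overlap the requested range
        lo, hi = max(start, vaddr), min(end, seg_end)
        out += data[lo - vaddr : hi - vaddr]
    if len(out) != size:
        raise ValueError("requested range not fully covered by segments")
    return bytes(out)
```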

Can it be used to analyze flaky tests with pytest?

Is there an existing proposal for this?

  • I have searched the existing proposals

Is your feature request related to a problem?

Hello team,

This looks like a very promising tool.

I was wondering if there is a chance to use it together with pytest to get traces and debug flaky tests?

Thanks for the help

Describe the solution you'd like

I would love for PyStack to be able to be used as a pytest plugin to analyze core dumps or flaky processes when errors occur.

Alternatives you considered

For now, there is no real alternative I have considered.

Decide on a LICENSE

We should decide on a license for the project, taking into account the things we are linking against and other (likely internal) considerations.

Optimize performance with caching

PyStack issues a lot of system calls, some of which are very expensive, and that is what makes PyStack slower. One of the most expensive operations is copying memory from a remote process to the local process: each time the local process needs some portion of memory from the remote process, it issues a system call to copy it. Running strace -c -- python3 -m pystack remote PID --locals reports some statistics on the traced program. In the table below, the process_vm_readv syscall takes a lot of time:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 35.36    0.005828           2      2390           process_vm_readv
 12.80    0.002109           2       716        92 stat
 10.55    0.001738           2       751           read
  8.53    0.001405           3       360        31 open
  7.32    0.001206         603         2           wait4
  3.57    0.000589           3       171           mmap
  3.18    0.000524           0       570           fstat
  2.79    0.000460         460         1           clone
  2.74    0.000451           1       356           close
  2.40    0.000395           0       518         2 lseek
  2.35    0.000387          19        20           munmap
  1.76    0.000290           0       453       447 ioctl
  1.66    0.000274           4        57           mprotect
  1.31    0.000216          11        19           openat
  0.91    0.000150           0       236           write
  0.64    0.000106           2        36           getdents
  0.39    0.000064           0        74           brk
  0.34    0.000056           5        10        10 access
  0.30    0.000049          12         4         1 connect
  0.22    0.000036           4         9         1 readlink
  0.14    0.000023           3         7           poll
  0.14    0.000023           5         4           socket
  0.10    0.000017           8         2           ptrace
  0.10    0.000017           2         8           futex
  0.09    0.000015           3         5           sendto
  0.04    0.000007           1         6           fcntl
  0.04    0.000006           3         2           recvmsg
  0.03    0.000005           0        68           rt_sigaction
  0.03    0.000005           5         1           execve
  0.03    0.000005           5         1           epoll_create1
  0.02    0.000004           1         3           dup
  0.02    0.000004           1         3           getuid
  0.01    0.000002           2         1           rt_sigprocmask
  0.01    0.000002           2         1           getrlimit
  0.01    0.000002           2         1           getgid
  0.01    0.000002           2         1           geteuid
  0.01    0.000002           2         1           getegid
  0.01    0.000002           2         1           arch_prctl
  0.01    0.000002           2         1           set_tid_address
  0.01    0.000002           2         1           set_robust_list
  0.00    0.000000           0        10           lstat
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1           recvfrom
  0.00    0.000000           0         1           setsockopt
  0.00    0.000000           0         1           getsockopt
  0.00    0.000000           0         1           gettid
------ ----------- ----------- --------- --------- ----------------
100.00    0.016480                  6887       584 total

Adding a cache should be a good way to reduce the number of these system calls, which will make PyStack faster. PyStack would first try to find the requested portion of process memory in the cache, and only if it is missing issue the system call to copy the memory. This cache would be used both when analyzing a remote process and when analyzing a core file.
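
A hedged sketch of such a cache (the page granularity and interfaces are illustrative): memoize reads per page so repeated lookups into the same region don't each cost a process_vm_readv call.

```python
PAGE = 4096  # illustrative cache granularity


class CachedReader:
    def __init__(self, read_page):
        # read_page(base) performs the actual memory copy (the expensive
        # syscall), one page at a time, and must cover the requested range.
        self._read_page = read_page
        self._pages = {}

    def read(self, addr, size):
        out = bytearray()
        while size:
            base, off = addr - addr % PAGE, addr % PAGE
            if base not in self._pages:  # only miss once per page
                self._pages[base] = self._read_page(base)
            chunk = self._pages[base][off : off + size]
            out += chunk
            addr += len(chunk)
            size -= len(chunk)
        return bytes(out)
```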

Create a logo for the project

We need a cool logo for the pystack project that we can show in the README and in the documentation (for now ๐Ÿ˜„ ).

Decide what to do about debug strings and 3.7

As this comment says, the debug format specifier in f-strings was introduced in 3.8. We should decide what to do about it, as PyStack officially supports 3.7 and therefore should be able to run on 3.7. Options that I can think of:

  1. Degrade the __repr__ output in 3.7, keeping it the same in other versions
  2. Reimplement the same output with 3.7-compatible features
  3. Remove the enhanced output altogether
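
Option 2 could look like replacing the 3.8-only debug specifier with an explicit spelling that 3.7 accepts:

```python
value = 42

# On 3.8+ one can write f"{value=}", which renders as "value=42".
# A 3.7-compatible equivalent spells out the name explicitly:
compat = f"value={value!r}"
```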

Dockerfile missing `file` binary

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When attempting to run make check without the file binary installed, we get a series of FileNotFoundError: [Errno 2] No such file or directory: 'file' errors:

python3.10 -m pytest -vvv --log-cli-level=info -s --color=yes  tests
==================================================== test session starts =====================================================
platform linux -- Python 3.10.11, pytest-7.3.1, pluggy-1.0.0 -- /venv/bin/python3.10
cachedir: .pytest_cache
rootdir: /src
plugins: xdist-3.2.1, cov-4.0.0
collected 319 items / 8 errors                                                                                               

=========================================================== ERRORS ===========================================================
__________________________________ ERROR collecting tests/integration/test_core_analyzer.py __________________________________
tests/integration/test_core_analyzer.py:19: in <module>
    from tests.utils import ALL_PYTHONS
tests/utils.py:72: in <module>
    AVAILABLE_PYTHONS = tuple(find_all_available_pythons())
tests/utils.py:59: in find_all_available_pythons
    result = subprocess.run(
/usr/lib/python3.10/subprocess.py:503: in run
    with Popen(*popenargs, **kwargs) as process:
/usr/lib/python3.10/subprocess.py:971: in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
/usr/lib/python3.10/subprocess.py:1863: in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
E   FileNotFoundError: [Errno 2] No such file or directory: 'file'
__________________________________ ERROR collecting tests/integration/test_gather_stacks.py __________________________________
tests/integration/test_gather_stacks.py:18: in <module>
    from tests.utils import ALL_PYTHONS
tests/utils.py:72: in <module>
    AVAILABLE_PYTHONS = tuple(find_all_available_pythons())
tests/utils.py:59: in find_all_available_pythons
    result = subprocess.run(
/usr/lib/python3.10/subprocess.py:503: in run
    with Popen(*popenargs, **kwargs) as process:
/usr/lib/python3.10/subprocess.py:971: in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
/usr/lib/python3.10/subprocess.py:1863: in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
E   FileNotFoundError: [Errno 2] No such file or directory: 'file'
_______________________________________ ERROR collecting tests/integration/test_gc.py ________________________________________
tests/integration/test_gc.py:7: in <module>
    from tests.utils import ALL_PYTHONS
tests/utils.py:72: in <module>
    AVAILABLE_PYTHONS = tuple(find_all_available_pythons())
tests/utils.py:59: in find_all_available_pythons
    result = subprocess.run(
/usr/lib/python3.10/subprocess.py:503: in run
    with Popen(*popenargs, **kwargs) as process:
/usr/lib/python3.10/subprocess.py:971: in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
/usr/lib/python3.10/subprocess.py:1863: in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
E   FileNotFoundError: [Errno 2] No such file or directory: 'file'
_______________________________________ ERROR collecting tests/integration/test_gil.py _______________________________________
tests/integration/test_gil.py:5: in <module>
    from tests.utils import ALL_PYTHONS
tests/utils.py:72: in <module>
    AVAILABLE_PYTHONS = tuple(find_all_available_pythons())
tests/utils.py:59: in find_all_available_pythons
    result = subprocess.run(
/usr/lib/python3.10/subprocess.py:503: in run
    with Popen(*popenargs, **kwargs) as process:
/usr/lib/python3.10/subprocess.py:971: in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
/usr/lib/python3.10/subprocess.py:1863: in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
E   FileNotFoundError: [Errno 2] No such file or directory: 'file'
_________________________________ ERROR collecting tests/integration/test_local_variables.py _________________________________
tests/integration/test_local_variables.py:8: in <module>
    from tests.utils import ALL_PYTHONS
tests/utils.py:72: in <module>
    AVAILABLE_PYTHONS = tuple(find_all_available_pythons())
tests/utils.py:59: in find_all_available_pythons
    result = subprocess.run(
/usr/lib/python3.10/subprocess.py:503: in run
    with Popen(*popenargs, **kwargs) as process:
/usr/lib/python3.10/subprocess.py:971: in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
/usr/lib/python3.10/subprocess.py:1863: in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
E   FileNotFoundError: [Errno 2] No such file or directory: 'file'
_____________________________________ ERROR collecting tests/integration/test_process.py _____________________________________
tests/integration/test_process.py:14: in <module>
    from tests.utils import ALL_PYTHONS
tests/utils.py:72: in <module>
    AVAILABLE_PYTHONS = tuple(find_all_available_pythons())
tests/utils.py:59: in find_all_available_pythons
    result = subprocess.run(
/usr/lib/python3.10/subprocess.py:503: in run
    with Popen(*popenargs, **kwargs) as process:
/usr/lib/python3.10/subprocess.py:971: in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
/usr/lib/python3.10/subprocess.py:1863: in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
E   FileNotFoundError: [Errno 2] No such file or directory: 'file'
________________________________ ERROR collecting tests/integration/test_relocatable_cores.py ________________________________
tests/integration/test_relocatable_cores.py:13: in <module>
    from tests.utils import generate_core_file
tests/utils.py:72: in <module>
    AVAILABLE_PYTHONS = tuple(find_all_available_pythons())
tests/utils.py:59: in find_all_available_pythons
    result = subprocess.run(
/usr/lib/python3.10/subprocess.py:503: in run
    with Popen(*popenargs, **kwargs) as process:
/usr/lib/python3.10/subprocess.py:971: in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
/usr/lib/python3.10/subprocess.py:1863: in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
E   FileNotFoundError: [Errno 2] No such file or directory: 'file'
______________________________________ ERROR collecting tests/integration/test_smoke.py ______________________________________
tests/integration/test_smoke.py:11: in <module>
    from tests.utils import generate_core_file
tests/utils.py:72: in <module>
    AVAILABLE_PYTHONS = tuple(find_all_available_pythons())
tests/utils.py:59: in find_all_available_pythons
    result = subprocess.run(
/usr/lib/python3.10/subprocess.py:503: in run
    with Popen(*popenargs, **kwargs) as process:
/usr/lib/python3.10/subprocess.py:971: in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
/usr/lib/python3.10/subprocess.py:1863: in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
E   FileNotFoundError: [Errno 2] No such file or directory: 'file'
================================================== short test summary info ===================================================
ERROR tests/integration/test_core_analyzer.py - FileNotFoundError: [Errno 2] No such file or directory: 'file'
ERROR tests/integration/test_gather_stacks.py - FileNotFoundError: [Errno 2] No such file or directory: 'file'
ERROR tests/integration/test_gc.py - FileNotFoundError: [Errno 2] No such file or directory: 'file'
ERROR tests/integration/test_gil.py - FileNotFoundError: [Errno 2] No such file or directory: 'file'
ERROR tests/integration/test_local_variables.py - FileNotFoundError: [Errno 2] No such file or directory: 'file'
ERROR tests/integration/test_process.py - FileNotFoundError: [Errno 2] No such file or directory: 'file'
ERROR tests/integration/test_relocatable_cores.py - FileNotFoundError: [Errno 2] No such file or directory: 'file'
ERROR tests/integration/test_smoke.py - FileNotFoundError: [Errno 2] No such file or directory: 'file'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 8 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================================================== 8 errors in 0.63s ======================================================
make: *** [Makefile:53: check] Error 2

Expected Behavior

Tests run past collection.

Steps To Reproduce

  1. Build the Dockerfile
  2. Open bash in the resulting container
  3. Ensure other dependencies are installed
  4. Run make check

Pystack Version

https://github.com/bloomberg/pystack/commit/6de749f015486a26c9c57265ce32a50976f59bd4

Python Version

3.10

Linux distribution

Ubuntu, Other

Anything else?

No response

Additional Documentation and make Targets for Docker

Is there an existing proposal for this?

  • I have searched the existing proposals

Is your feature request related to a problem?

When working on this project from a macOS device, I had issues with a few different files, in particular errors related to elf.h. To solve this, I used the included Dockerfile to work on the code (with #69 to fix an issue there). A Dockerfile is provided, but the README.md lacks documentation about using Docker with this project.

Describe the solution you'd like

I think adding additional info to README.md around using Docker with this project would be useful. In particular,

  • Adding a copy-paste-able command docker run -it -v $(pwd):/src $IMAGENAME and docker build . -t $IMAGENAME
  • Possibly adding a target to the Makefile to build the image and run bash in a container based on that image.

Alternatives you considered

Alternatively, a Vagrantfile could be created to spin up a Linux VM for development, for users not on a Linux system.

Documentation on how to run this program natively on macOS would also be a good idea.

The test suite is not prepared to run in systems with poor symbol information

Running the test suite on systems without enough symbolic information makes it fail even when nothing is wrong with pystack. This can sometimes be fixed by installing debug packages for the interpreter (python3-dbg on Debian), but sometimes this is not possible, either because we are not using system packages for the interpreter (e.g. pyenv) or for other arcane reasons.

We should skip these failing tests using some mechanism that validates that we have enough information to run them.
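
One possible (illustrative) probe for such a mechanism: check whether the running interpreter's binary exposes symbols at all, and skip the sensitive tests when it doesn't. A real check would look for the specific CPython symbols the failing tests rely on.

```python
import shutil
import subprocess
import sys


def interpreter_has_symbols():
    # Crude probe: ask nm whether the interpreter binary has any symbols.
    nm = shutil.which("nm")
    if nm is None:
        return False
    result = subprocess.run([nm, sys.executable], capture_output=True)
    return result.returncode == 0 and bool(result.stdout.strip())
```

This could back a pytest skip marker such as `pytest.mark.skipif(not interpreter_has_symbols(), ...)`.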

Optimize core file reading

PyStack can print the stack trace of a Python core dump by running pystack core $CORE_FILE. During the execution of this command, PyStack needs to copy memory from the core file and/or several shared libraries. To read something from a file, one needs to open the file, change the current read position, and then read the data; all of these require system calls, and these operations take time. Running strace -cw -- python3 -m pystack core CORE_FILE gives some statistics on how expensive system calls are when we execute pystack with a core file. The table below shows the system calls issued when we run pystack over a core, together with the number of calls per system call and the total wall time spent in each. What we want to do is to reduce the total time we spend in the open/close/read system calls when operating over files:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 15.20    0.066126          55      1188       392 stat
 14.50    0.063094          50      1238        45 openat
 13.38    0.058219          48      1199         3 close
 11.82    0.051429          50      1015           read
  9.08    0.039498          45       863         3 lseek
  7.63    0.033201          52       632           fstat
  6.75    0.029364          57       507           mmap
  5.59    0.024325          60       405           munmap
  4.24    0.018435          48       380           fcntl
  4.20    0.018264          44       413       329 readlink
  3.31    0.014414          52       273       173 ioctl
  1.47    0.006394          96        66           write
  0.94    0.004084          60        68           rt_sigaction
  0.53    0.002298          95        24           getdents64
  0.45    0.001950          67        29           brk
  0.36    0.001555          59        26           mprotect
  0.12    0.000508          50        10           lstat
  0.09    0.000402          50         8           pread64
  0.06    0.000245         244         1           execve
  0.04    0.000185          46         4           futex
  0.04    0.000173          57         3           sigaltstack
  0.03    0.000151          75         2           getcwd
  0.03    0.000149          49         3           dup
  0.02    0.000102          50         2         1 arch_prctl
  0.02    0.000084          83         1           sysinfo
  0.02    0.000068          68         1           gettid
  0.01    0.000055          55         1           set_robust_list
  0.01    0.000053          52         1           getrandom
  0.01    0.000052          51         1         1 access
  0.01    0.000051          50         1           rt_sigprocmask
  0.01    0.000050          50         1           prlimit64
  0.01    0.000043          43         1           set_tid_address
------ ----------- ----------- --------- --------- ----------------
100.00    0.435018                  8367       947 total

When a program reads data from a file, a location in that file is accessed and its contents are transferred from disk into a buffer in memory. To reduce these system calls, we want to use memory-mapped files: the idea is to map the core file directly into virtual memory pages. One reason we expect this to be faster is that directly accessing memory mapped by mmap does not require switching to kernel mode and back (it skips the syscall cost). To read more about mmap: https://man7.org/linux/man-pages/man2/mmap.2.html
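A minimal sketch of the proposed approach: map the file once, then read arbitrary offsets as plain memory accesses, with no further lseek/read syscall pairs. The scratch file again stands in for a real core dump:

```python
import mmap
import tempfile

with tempfile.NamedTemporaryFile() as scratch:
    # Fake "core file": an ELF magic number followed by zero padding.
    scratch.write(b"\x7fELF" + b"\x00" * 4092)
    scratch.flush()

    with open(scratch.name, "rb") as f:
        # One mmap(2) syscall up front; after this, slicing `mem` is a
        # plain memory access served from the page cache.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mem:
            magic = mem[:4]            # no lseek/read needed
            middle = mem[1024:1040]    # random access at any offset

print(magic)  # b'\x7fELF'
```

The same handle supports random access anywhere in the mapping, which matches PyStack's access pattern of reading many small, scattered regions of the core file.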
