
Comments (13)

zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024 4

> This one should be solvable by a simple cython bump

guppy doesn't use Cython. It's direct C code using the CPython API.

> @zhuyifei1999 do you have rough ETA when you'll be able to have a look on the issue? Thanks for any info!

Let me try this weekend.

from guppy3.

zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024 2

I haven't had time to try to add Python 3.11 compatibility yet. I'll look into this.


zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024 2

The frame optimization in Python 3.11 in python/cpython#88756 made it quite a bit more difficult to support it. Guppy needs the ability to read all locals and globals of all stack frames for traversing purposes, and knowing names of variables (the "relate" part of guppy).

Previously this was easy: Every frame will have a Python-visible frame object, so we can traverse the rootstate for frame objects:

    int frameno = -1;
    int numframes = 0;
    PyFrameObject *frame;
    for (frame = (PyFrameObject *)ts->frame; frame; frame = frame->f_back) {
        numframes++;
        if (r->tgt == (PyObject *)frame)
            frameno = numframes;
    }
    if (frameno != -1) {
        frameno = numframes - frameno;
        if (r->visit(NYHR_ATTRIBUTE, PyUnicode_FromFormat("i%d_t%lu_f%d", isno, THREAD_ID(ts), frameno), r))
            return 1;
    }
}

and then let frame object traversal do its thing:

guppy3/src/heapy/stdtypes.c

Lines 199 to 270 in 4cb9fcb

static int
frame_relate(NyHeapRelate *r)
{
    PyFrameObject *v = (void *)r->src;
    PyCodeObject *co = v->f_code;
    Py_ssize_t ncells = PyTuple_GET_SIZE(co->co_cellvars);
    Py_ssize_t nlocals = co->co_nlocals;
    Py_ssize_t nfreevars = PyTuple_GET_SIZE(co->co_freevars);
    ATTR(f_back)
    ATTR(f_code)
    ATTR(f_builtins)
    ATTR(f_globals)
    ATTR(f_locals)
    ATTR(f_trace)
/*
    ATTR(f_exc_type)
    ATTR(f_exc_value)
    ATTR(f_exc_traceback)
*/
    /* locals */
    if (
        frame_locals(r, co->co_varnames, 0, nlocals, 0) ||
        frame_locals(r, co->co_cellvars, nlocals, ncells, 0) ||
        frame_locals(r, co->co_cellvars, nlocals, ncells, 1) ||
        frame_locals(r, co->co_freevars, nlocals + ncells, nfreevars, 0) ||
        frame_locals(r, co->co_freevars, nlocals + ncells, nfreevars, 1))
        return 1;
    /* stack */
#if PY_MAJOR_VERSION >= 3 && PY_MINOR_VERSION >= 10
    PyObject **p;
    PyObject **l = v->f_valuestack + v->f_stackdepth;
    for (p = v->f_valuestack; p < l; p++) {
        if (*p == r->tgt) {
            if (r->visit(NYHR_STACK, PyLong_FromSsize_t(p-v->f_valuestack), r))
                return 1;
        }
    }
#else
    if (v->f_stacktop != NULL) {
        PyObject **p;
        for (p = v->f_valuestack; p < v->f_stacktop; p++) {
            if (*p == r->tgt) {
                if (r->visit(NYHR_STACK, PyLong_FromSsize_t(p-v->f_valuestack), r))
                    return 1;
            }
        }
    }
#endif
    return 0;
}

static int
frame_traverse(NyHeapTraverse *ta) {
    PyFrameObject *v = (void *)ta->obj;
    PyCodeObject *co = v->f_code;
    int nlocals = co->co_nlocals;
    if (PyTuple_Check(co->co_varnames)) {
        int i;
        for (i = 0; i < nlocals; i++) {
            PyObject *name = PyTuple_GET_ITEM(co->co_varnames, i);
            if (strcmp(PyUnicode_AsUTF8(name), "_hiding_tag_") == 0) {
                if (v->f_localsplus[i] == ta->_hiding_tag_)
                    return 0;
                else
                    break;
            }
        }
    }
    return Py_TYPE(v)->tp_traverse(ta->obj, ta->visit, ta->arg);
}
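
For readers who don't read C, here is a rough Python analogue of the f_back walk in the first snippet. It is only a sketch: `number_frames`, `innermost` and `caller` are names made up for this example, and it uses `sys._getframe()` where the C code starts from the thread state.

```python
import sys


def number_frames(top_frame):
    """Number every frame reachable from top_frame through f_back.

    The outermost frame gets 0 and the most recent frame gets the highest
    number, mirroring `frameno = numframes - frameno` in the C loop.
    """
    chain = []
    frame = top_frame
    while frame is not None:
        chain.append(frame)
        frame = frame.f_back
    numframes = len(chain)
    # chain[0] is the most recent frame, chain[-1] the outermost one.
    return {numframes - 1 - position: f for position, f in enumerate(chain)}


def innermost():
    # Hypothetical helper: start the walk from this function's own frame.
    return number_frames(sys._getframe())


def caller():
    return innermost()


numbered = caller()
```

The highest key in `numbered` is the frame of `innermost`, and key 0 is whatever frame sits at the bottom of the call stack.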

But now I need to expose frame attributes as "rootstate" attributes, since all the intermediate structs are not Python-visible objects. And it's not just traverse I need to implement (pre-3.11 I could just traverse to the most recent frame object and it would recursively traverse its f_back, but now I can't rely on this), but also "relate" (finding names of objects), getattr, and dir()...

I'll keep working on it.


zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024 2

My apologies, this has been on my back burner for way too long. I always had other things to work on that I find more interesting :/

I'll get to it soon.


zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024 2

v3.1.3 released with support for Python 3.11

Edit: With all the pre-built wheels too ;) https://pypi.org/project/guppy3/3.1.3/#files


zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024 1

I was waiting for a response to #41 (comment) then I forgot (too many things I'm working on)... oops

I guess with a lack of opinions I'll just go with the simplest solution:

> Or alternatively, we invoke all the lazy loaders for the frames when profiling. Honestly, that's probably not too bad of an idea, considering it's an interactive profiler that's already very expensive (it traverses the whole reference graph and generates a lot of objects internally). What we would lose is the ability to figure out, prior to running the profiler, which frame objects have been lazily loaded and which have not. At least this isn't a feature regression, since for Python < 3.11 they are always loaded. (Note to self: _PyFrame_GetFrameObject isn't exposed. Use PyThreadState_GetFrame)

Will do.


zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024 1

Gonna leave this open till I get the release done later


frenck avatar frenck commented on July 18, 2024

This one should be solvable by a simple cython bump


maciejfic avatar maciejfic commented on July 18, 2024

@zhuyifei1999 do you have rough ETA when you'll be able to have a look on the issue? Thanks for any info!


frenck avatar frenck commented on July 18, 2024

With Python 3.11 becoming more commonly used (especially because of performance improvements), we are looking forward to starting to support it as well.

Not wanting to push this issue or change in any way, is there anything to report on the progress of this issue? (please don't feel pushed, I'm just trying to get information).

../Frenck


zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024

Note to self: Whatever I do here will be broken again by python/cpython@1e197e6 in 3.12. I need to read that commit in more detail some time.


zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024

The d13f5cc commit above mentions an issue with localsplus. I think I need to explain this one in plain English so I can maybe ask for opinions.

As mentioned before, python/cpython#88756 in 3.11 changed how frames work. Previously we had (simplified, from pystate.h and frameobject.h):

typedef struct _ts {
[...]
    PyFrameObject *frame;
[...]
    CFrame *cframe;
[...]
} PyThreadState;

typedef struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;      /* previous frame, or NULL */
    PyCodeObject *f_code;       /* code segment */
    PyObject *f_builtins;       /* builtin symbol table (PyDictObject) */
    PyObject *f_globals;        /* global symbol table (PyDictObject) */
    PyObject *f_locals;         /* local symbol table (any mapping) */
    PyObject **f_valuestack;    /* points after the last local */
    PyObject *f_trace;          /* Trace function */
[...]
    PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
} PyFrameObject;

Traversing frames is easy. Since every Python frame is guaranteed a Python-visible "frame object", to find everything each frame uses we can just traverse into these objects. All we have to do is provide accessors from the "RootState" to every frame in existence.

In Python 3.11 this changed. Now we have (from pystate.h and pycore_frame.h):

typedef struct _ts {
[...]
    _PyCFrame *cframe;
[...]
} PyThreadState;

typedef struct _PyCFrame {
[...]
    struct _PyInterpreterFrame *current_frame;
[...]
} _PyCFrame;

typedef struct _PyInterpreterFrame {
    /* "Specials" section */
    PyFunctionObject *f_func; /* Strong reference */
    PyObject *f_globals; /* Borrowed reference */
    PyObject *f_builtins; /* Borrowed reference */
    PyObject *f_locals; /* Strong reference, may be NULL */
    PyCodeObject *f_code; /* Strong reference */
    PyFrameObject *frame_obj; /* Strong reference, may be NULL */
    /* Linkage section */
    struct _PyInterpreterFrame *previous;
[...]
    /* Locals and stack */
    PyObject *localsplus[1];
} _PyInterpreterFrame;

typedef struct _frame {
    PyObject_HEAD
    PyFrameObject *f_back;      /* previous frame, or NULL */
    struct _PyInterpreterFrame *f_frame; /* points to the frame data */
    PyObject *f_trace;          /* Trace function */
[...]
    PyObject *_f_frame_data[1];
} PyFrameObject;

Most of the interesting values are in _PyInterpreterFrame which is no longer an object.

Why is this an issue? Guppy can figure out how an object is reached, for memory analysis purposes. Internally this is called "relate", which I think means figuring out the relationship at each step of the traversal path. In the README we have:

>>> from guppy import hpy; h=hpy()
>>> h.heap().byid[0].sp
 0: h.Root.i0_modules['os'].__dict__

sp shows the shortest path to an object from the "RootState". This is one of the many uses of "relate". Another great one is the "Via" classifier:

>>> h.heap()
[...]
>>> _.byvia
Partition of a set of 37968 objects. Total size = 4480344 bytes.
 Index  Count   %     Size   % Cumulative  % Referred Via:
     0   1144   3   456096  10    456096  10 '.__dict__'
     1   2331   6   412466   9    868562  19 '.__code__'
     2   2668   7   254698   6   1123260  25 '.co_code'
     3   2088   5   168888   4   1292148  29 '.co_names'
     4    637   2   159671   4   1451819  32 '.__doc__', '[0]'
     5    514   1   157669   4   1609488  36 "['__doc__']"
     6   1856   5   130485   3   1739973  39 '.__qualname__'
     7   1571   4   111336   2   1851309  41 '.co_consts'
     8   2408   6   111178   2   1962487  44 '.co_lnotab'
     9   1467   4   108416   2   2070903  46 '.co_varnames'
<7211 more rows. Type e.g. '_.more' to view.>

Let's say you have a lot of large strings; the type classifier will tell you they are strings, but it'll take you a while to find the name of these strings (i.e. the attribute / variable name that refers to these strings). Byvia makes this easy:

>>> h.heap()
Partition of a set of 38027 objects. Total size = 4484757 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  10956  29   974101  22    974101  22 str
     1   7747  20   537504  12   1511605  34 tuple
     2    576   2   481208  11   1992813  44 type
     3   2669   7   472264  11   2465077  55 types.CodeType
     4   5240  14   377012   8   2842089  63 bytes
     5   2448   6   332928   7   3175017  71 function
     6    576   2   284384   6   3459401  77 dict of type
     7     99   0   183432   4   3642833  81 dict of module
     8    300   1   123912   3   3766745  84 dict (no owner)
     9   1138   3    81936   2   3848681  86 types.WrapperDescriptorType
<117 more rows. Type e.g. '_.more' to view.>
>>> _[0]
Partition of a set of 10956 objects. Total size = 974101 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  10956 100   974101 100    974101 100 str
>>> _.byvia
Partition of a set of 10956 objects. Total size = 974101 bytes.
 Index  Count   %     Size   % Cumulative  % Referred Via:
     0    637   6   159671  16    159671  16 '.__doc__', '[0]'
     1    502   5   156901  16    316572  32 "['__doc__']"
     2   1856  17   130485  13    447057  46 '.__qualname__'
     3    699   6    43171   4    490228  50 'list(_)[0]'
     4    509   5    34407   4    524635  54 '[1]'
     5    422   4    30315   3    554950  57 '[2]'
     6    276   3    18602   2    573552  59 '[3]'
     7    248   2    17091   2    590643  61 '[4]'
     8    227   2    13788   1    604431  62 '.__name__', '.co_name'
     9    186   2    11737   1    616168  63 '[0]'
<2178 more rows. Type e.g. '_.more' to view.>

(In this example it's a lot of docstrings)
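
To make "relate" and "Via" a bit more concrete, here is a toy pure-Python sketch. It is not how guppy implements them (that lives in C); `relate`, `count_by_via` and `Holder` are hypothetical names for this example, and it only handles dicts, sequences and plain instance attributes.

```python
from collections import Counter


def relate(src, tgt):
    """Return path fragments describing how `src` directly refers to `tgt`."""
    if isinstance(src, dict):
        return [f"[{key!r}]" for key, value in src.items() if value is tgt]
    if isinstance(src, (list, tuple)):
        return [f"[{index}]" for index, value in enumerate(src) if value is tgt]
    # Fall back to instance attributes; objects without __dict__ yield nothing.
    attributes = getattr(src, "__dict__", {})
    return [f".{name}" for name, value in attributes.items() if value is tgt]


def count_by_via(targets, sources):
    """Count targets per relation, a toy version of grouping by 'Referred Via'."""
    counts = Counter()
    for tgt in targets:
        for src in sources:
            counts.update(relate(src, tgt))
    return counts


class Holder:
    """Hypothetical container type used only for this example."""


first, second = Holder(), Holder()
first.payload = "x" * 1000
second.payload = "y" * 1000
cache = {"greeting": first.payload}

via = count_by_via([first.payload, second.payload], [first, second, cache])
```

Both large strings end up grouped under `.payload`, which is the kind of answer the Via classifier gives for "what is holding all these strings".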

The problem arises with local variables. The relation originally would show <some frame>.f_locals['<the local name>']. Say you are in a recursion and one of the local variables (call it foo) is really large. Via would have a classification such as:

     0    637   6   159671  16    159671  16 ".locals['foo']"

Now let's look at 3.11. Guppy's "RootState" maintains a wonderful property: the traversable attributes are not only relate-able, but also available through getattr and even dir(). This is Python 3.9:

>>> import guppy.heapy.heapyc
>>> import pprint
>>> guppy.heapy.heapyc.RootState is h.Root
True
>>> pprint.pprint(dir(h.Root))
['i0_after_forkers_child',
 'i0_after_forkers_parent',
 'i0_audit_hooks',
 'i0_before_forkers',
 'i0_builtins',
 'i0_builtins_copy',
 'i0_codec_error_registry',
 'i0_codec_search_cache',
 'i0_codec_search_path',
 'i0_dict',
 'i0_import_func',
 'i0_importlib',
 'i0_modules',
 'i0_modules_by_index',
 'i0_pyexitmodule',
 'i0_sysdict',
 'i0_t140325531354944_async_exc',
 'i0_t140325531354944_async_gen_finalizer',
 'i0_t140325531354944_async_gen_firstiter',
 'i0_t140325531354944_c_profileobj',
 'i0_t140325531354944_c_traceobj',
 'i0_t140325531354944_context',
 'i0_t140325531354944_curexc_traceback',
 'i0_t140325531354944_curexc_type',
 'i0_t140325531354944_curexc_value',
 'i0_t140325531354944_dict',
 'i0_t140325531354944_exc_traceback',
 'i0_t140325531354944_exc_type',
 'i0_t140325531354944_exc_value',
 'i0_t140325531354944_f0']
>>> h.Root.i0_t140325531354944_f0
<frame at 0x7fa013d30680, file '<stdin>', line 1, code <module>>
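
The "available by getattr and even dir()" property can be sketched in pure Python with `__getattr__` and `__dir__`. This is only an illustration, assuming a hypothetical `DynamicRoot` class and placeholder values; the real RootState is a C type.

```python
class DynamicRoot:
    """Expose entries from a provider callable as attributes.

    The provider is called on every lookup, so the attribute set can change
    between calls, as the set of threads and frames does.
    """

    def __init__(self, provider):
        self._provider = provider

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        entries = self._provider()
        try:
            return entries[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # dir() lists exactly the names that __getattr__ can resolve.
        return sorted(self._provider())


# Placeholder values; a real root would hand out live interpreter objects.
modules = {"os": "placeholder module"}
root = DynamicRoot(lambda: {"i0_modules": modules, "i0_t1_f0": "placeholder frame"})
```

The point of the design is that `dir(root)` and `getattr(root, name)` are driven by the same source, so every name that shows up in a path is also something you can type at the prompt.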

I want to have attribute access for all these values in _PyInterpreterFrame, so in d13f5cc it looks like this:

>>> import guppy.heapy.heapyc
>>> import pprint
>>> pprint.pprint(guppy.heapy.heapyc.RootState)
RootState
>>> pprint.pprint(dir(guppy.heapy.heapyc.RootState))
['i0_after_forkers_child',
 'i0_after_forkers_parent',
 'i0_audit_hooks',
 'i0_before_forkers',
 'i0_builtins',
 'i0_builtins_copy',
 'i0_codec_error_registry',
 'i0_codec_search_cache',
 'i0_codec_search_path',
 'i0_dict',
 'i0_import_func',
 'i0_importlib',
 'i0_modules',
 'i0_modules_by_index',
 'i0_sysdict',
 'i0_t140336295487296_async_exc',
 'i0_t140336295487296_async_gen_finalizer',
 'i0_t140336295487296_async_gen_firstiter',
 'i0_t140336295487296_c_profileobj',
 'i0_t140336295487296_c_traceobj',
 'i0_t140336295487296_context',
 'i0_t140336295487296_curexc_traceback',
 'i0_t140336295487296_curexc_type',
 'i0_t140336295487296_curexc_value',
 'i0_t140336295487296_dict',
 'i0_t140336295487296_exc_traceback',
 'i0_t140336295487296_exc_type',
 'i0_t140336295487296_exc_value',
 'i0_t140336295487296_f0_f_builtins',
 'i0_t140336295487296_f0_f_code',
 'i0_t140336295487296_f0_f_func',
 'i0_t140336295487296_f0_f_globals',
 'i0_t140336295487296_f0_f_locals',
 'i0_t140336295487296_f0_frame_obj']

The change is that the attributes of _PyInterpreterFrame are now inlined into the RootState... but not localsplus.

A bit of background for those unfamiliar with Python's bytecode architecture: CPython's bytecode runs on a stack machine, similar to Java bytecode in some ways. Instead of registers, temporaries are stored on a stack. This is a disassembly of lambda a: a+a, with my annotations:

>>> import dis
>>> dis.dis(lambda a: a+a)
# push(argument[0])
  1           0 LOAD_FAST                0 (a)
# push(argument[0])
              2 LOAD_FAST                0 (a)
# push(add(pop(), pop()))
              4 BINARY_ADD
# return pop()
              6 RETURN_VALUE
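
The annotations above can be turned into a tiny interpreter. This is a sketch that assumes only the three opcodes from the listing; `run` and `DOUBLE` are names invented for the example, and it is nothing like the real evaluation loop.

```python
def run(program, arguments):
    """Execute a list of (opcode, operand) pairs on an explicit value stack."""
    stack = []
    for opcode, operand in program:
        if opcode == "LOAD_FAST":
            # push(argument[operand])
            stack.append(arguments[operand])
        elif opcode == "BINARY_ADD":
            # The right operand was pushed last, so it is popped first.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    raise ValueError("program ended without RETURN_VALUE")


# The body of `lambda a: a+a`, transcribed from the disassembly.
DOUBLE = [
    ("LOAD_FAST", 0),
    ("LOAD_FAST", 0),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

result = run(DOUBLE, [21])
```

The `stack` list here plays the role of the stack part of localsplus, and `arguments` plays the role of the locals part.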

The first few elements of localsplus are the local variables, followed by the value stack. This is why it's called localsplus. And... it's also inlined in _PyInterpreterFrame.
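
As a rough illustration of the "locals" part, the slot names can be listed from a code object in the order that frame_relate above walks them (co_varnames, then co_cellvars, then co_freevars). This reflects the pre-3.11 layout; Python 3.11 may arrange the slots differently. `localsplus_names` and `make_adder` are hypothetical names for this sketch.

```python
def localsplus_names(code):
    """Names of the leading localsplus slots, in the order frame_relate uses."""
    return list(code.co_varnames) + list(code.co_cellvars) + list(code.co_freevars)


def make_adder(n):
    total = 0  # a plain local that is never captured

    def add(x):
        return x + n  # `n` is a free variable here and a cell variable in make_adder

    return add


add = make_adder(1)
outer_slots = localsplus_names(make_adder.__code__)
inner_slots = localsplus_names(add.__code__)
```

After these named slots, the frame reserves room for the value stack; `code.co_stacksize` gives its maximum depth.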

One might say... okay, one can just use h.Root.i0_t140336295487296_f0_f_locals["foo"], right? Not really. In Python, the locals dict (f_locals) is lazy. Guppy also maintains a wonderful property: most of the paths returned by guppy, in string form, are actually valid Python expressions leading to the same object. E.g.:

>>> p = h.heap().byid[0]
>>> p.sp
 0: h.Root.i0_modules['os'].__dict__
>>> h.Root.i0_modules['os'].__dict__ is p.theone
True

Now, for a frame object, actually accessing f_locals would invoke the lazy loader for the locals dict (PyFrame_FastToLocalsWithError is the lazy loader). From frameobject.c:

static PyObject *
frame_getlocals(PyFrameObject *f, void *closure)
{
    if (PyFrame_FastToLocalsWithError(f) < 0)
        return NULL;
    PyObject *locals = f->f_frame->f_locals;
    Py_INCREF(locals);
    return locals;
}

static PyGetSetDef frame_getsetlist[] = {
[...]
    {"f_locals",        (getter)frame_getlocals, NULL, NULL},
[...]
};

I'm not sure I want getattr for RootState to invoke a lazy loader, when, say, accessing i0_t140336295487296_f0_f_locals.
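
From the Python side, the same getter runs whenever `frame.f_locals` is read. A minimal sketch, with `show_locals` as a made-up function; note that the exact type returned by `f_locals` differs between Python versions, so the example only relies on item lookup.

```python
import sys


def show_locals():
    foo = ["a large local"]
    frame = sys._getframe()
    # Reading f_locals is what asks the interpreter to expose the
    # fast locals of this frame as a mapping.
    mapping = frame.f_locals
    return foo, mapping["foo"]


foo, via_mapping = show_locals()
```

The value found under "foo" is the same object as the local variable, so the mapping is another route to the object rather than a copy of it.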

Another issue concerns relate. If i0_t140336295487296_f0_f_locals["foo"] is lazy, the traversal should almost certainly not invoke the lazy loader, so i0_t140336295487296_f0_f_locals["foo"] becomes a single element in the path that carries two arguments: the frame "i0_t140336295487296_f0" (interpreter 0, thread 140336295487296, frame 0) and the local variable name "foo". That largely defeats the earlier use case of the Via classifier finding the names of local variables: the frame identifier is unique per process, so the relation i0_t140336295487296_f0_f_locals["foo"] is unique too, and Via will no longer group identical local variable names together.
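
The grouping problem is easy to see with a counter. This sketch assumes a recursion four frames deep where each frame holds a local named foo; the two helper functions are hypothetical and only format relation strings.

```python
from collections import Counter


def via_with_frame_id(thread_id, frame_no, name):
    """Relation string when the frame identifier is part of the relation."""
    return f"i0_t{thread_id}_f{frame_no}_f_locals[{name!r}]"


def via_generic(name):
    """Relation string when the frame is a separate step in the path."""
    return f".f_locals[{name!r}]"


depth = 4  # four recursive calls, each holding a local named foo
specific = Counter(via_with_frame_id(140336295487296, n, "foo") for n in range(depth))
generic = Counter(via_generic("foo") for _ in range(depth))
```

`specific` has four rows with a count of one each, while `generic` has a single row with a count of four, and only the latter tells you that "foo" is the variable to look at.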

One might say we could maybe create multiple components in the path somehow. The thing is, the path records not only the relations themselves but also every object in between:

>>> p.sp
 0: h.Root.i0_modules['os'].__dict__
>>> _[0]
"%s.i0_modules['os'].__dict__"
>>> [type(t) for t in _.path]
[<class 'guppy.heapy.UniSet.IdentitySetSingleton'>, <class 'guppy.heapy.Path.Based_R_ATTRIBUTE'>, <class 'guppy.heapy.UniSet.IdentitySetSingleton'>, <class 'guppy.heapy.Path.Based_R_INDEXVAL'>, <class 'guppy.heapy.UniSet.IdentitySetSingleton'>, <class 'guppy.heapy.Path.Based_R_ATTRIBUTE'>, <class 'guppy.heapy.UniSet.IdentitySetSingleton'>]
>>> p.sp[0].path[0].theone
RootState
>>> p.sp[0].path[2].theone is h.Root.i0_modules
True
>>> p.sp[0].path[4].theone
<module 'os' from '/usr/lib/python3.9/os.py'>
>>> p.sp[0].path[6].theone is p.theone
True
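
The alternating shape of that path list (object, relation, object, relation, ...) can be sketched as follows. `walk` and the relation tuples are hypothetical; guppy uses its own classes such as Based_R_ATTRIBUTE and Based_R_INDEXVAL for the relation steps.

```python
import os
import sys
import types


def walk(root, relations):
    """Follow relations from root, recording every intermediate object."""
    path = [root]
    current = root
    for kind, key in relations:
        if kind == "attribute":
            current = getattr(current, key)
        else:  # "indexval"
            current = current[key]
        path.extend([(kind, key), current])
    return path


# Hypothetical stand-in for h.Root; only i0_modules is provided.
root = types.SimpleNamespace(i0_modules=sys.modules)
path = walk(root, [("attribute", "i0_modules"), ("indexval", "os"), ("attribute", "__dict__")])
```

Every even index holds a real Python object, which is exactly what breaks down when an intermediate step is a C struct that is not a PyObject.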

Unless I want to invoke the lazy loader for every frame when guppy generates the profile, I'd need to make an IdentitySet for structs that are not PyObjects, some sort of virtual IdentitySet. Guppy doesn't support this, and I have no idea what it would take to make it happen. I could try, but it's potentially very complicated.

Or alternatively, we invoke all the lazy loaders for the frames when profiling. Honestly, that's probably not too bad of an idea, considering it's an interactive profiler that's already very expensive (it traverses the whole reference graph and generates a lot of objects internally). What we would lose is the ability to figure out, prior to running the profiler, which frame objects have been lazily loaded and which have not. At least this isn't a feature regression, since for Python < 3.11 they are always loaded. (Note to self: _PyFrame_GetFrameObject isn't exposed. Use PyThreadState_GetFrame)
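
At the Python level, "invoke all the lazy loaders" would look roughly like the sketch below: visit every frame of every thread and touch its f_locals. This is only an analogue of the idea, not the C implementation; `load_all_frame_locals` and `probe` are made-up names, and depth 0 here means the most recent frame, unlike guppy's f0.

```python
import sys
import threading


def load_all_frame_locals():
    """Touch f_locals on every frame of every thread and record the names."""
    loaded = {}
    for thread_id, frame in sys._current_frames().items():
        depth = 0  # 0 is the most recent frame of that thread
        while frame is not None:
            # Reading f_locals forces the locals mapping to be built.
            loaded[(thread_id, depth)] = list(frame.f_locals)
            frame = frame.f_back
            depth += 1
    return loaded


def probe():
    marker = object()  # a local we expect to find one frame up
    return load_all_frame_locals()


snapshot = probe()
current = threading.get_ident()
```

For the calling thread, depth 0 is `load_all_frame_locals` itself and depth 1 is `probe`, so "marker" shows up under `(current, 1)`.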

Or we drop some features...

Thoughts?

CC @svenil too


gothicVI avatar gothicVI commented on July 18, 2024

With python 3.11 already released in arch a week ago and now in manjaro testing since today the userbase is continuously growing.
@zhuyifei1999 could you please give a status update?

