Giter Club home page Giter Club logo

Comments (8)

zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024 1

I am aware that nodes[1] is just a pointer that points to the second element of nodes, not an array of size 1

Don't understand your question. I'm referring to the declaration PyObject *nodes[1]; being an array of indefinite size. The literal declaration of this is "nodes is an array of 1 element, of a pointer to PyObject". What I'm saying is, rather than that, it is "nodes is an array of indefinite size, of pointers to PyObject".

In other words, the assumption "NyNodeSetObject holds the pointer to nodes[1]" is incorrect.

This might help you: https://cdecl.org/?q=int+*nodes%5B1%5D%3B

Edit: I resized I was skipping too fast in my thoughts in my answer above. Sorry,

from guppy3.

michael-hahn avatar michael-hahn commented on July 18, 2024 1

My bad! It all makes sense! I need to brush up on my C syntax! Thank you!

from guppy3.

zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024

Hmm... so the things such as NyNodeSet is just, think of it like a python set, except instead of equality being governed by the object's defined __eq__, it's equality based on the objects identity (a.k.a. memory address). Almost everything else from the API standpoint should be similar to that of a python set. You can see the test cases: https://github.com/zhuyifei1999/guppy3/blob/master/guppy/sets/test.py#L1327

Because nodeset contains objects, it has to keep the objects alive. This is where Py_INCREF you mentioned comes in.

Things like IdentitySet is more of a Python wrapper around the C NyNodeSet that provides the lots of methods and printers and stuffs to better interact with and visualize the sets.

Since you're looking at hv_heap, I assume you want to see all the way from how nodesets gets created, so let's assume somehow the heap contains only one object, so you did hpy().heap().theone:

Use.heap():

guppy3/guppy/heapy/Use.py

Lines 175 to 195 in a1bd557

def heap(self):
"""heap() -> IdentitySet[1]
Traverse the heap from a root to find all reachable and visible
objects. The objects that belong to a heapy instance are normally not
included. Return an IdentitySet with the objects found, which is
presented as a table partitioned according to a default equivalence
relation (Clodo [3]).
See also: setref[2]
References
[0] heapy_Use.html#heapykinds.Use.heap
[1] heapy_UniSet.html#heapykinds.IdentitySet
[2] heapy_Use.html#heapykinds.Use.setref
[3] heapy_Use.html#heapykinds.Use.Clodo"""
h = self.View.heap()
h |= self.gcobjs
h -= self.relheap
return h

View.heap():

guppy3/guppy/heapy/View.py

Lines 340 to 367 in a1bd557

def heap(self):
"""V.heap() -> idset
Return the set of objects in the visible heap.
"""
global heap_one_time_initialized
# This is to make sure that the first time called
# the heap will contain things that may likely be loaded later
# because of common operations.
if not heap_one_time_initialized:
old_root = self.root
objs = [[], 'a', 1, 1.23, {'a': 'b'}, self]
self.root = objs
try:
repr(self.idset(objs))
repr(self.iso(objs[0]).shpaths)
repr(self.iso(objs[0]).rp)
finally:
self.root = old_root
del objs
del old_root
heap_one_time_initialized = True
self.gc.collect() # Sealing a leak at particular usage ; Notes Apr 13 2005
# Exclude current frame by encapsulting in enter(). Note Apr 20 2005
return self.enter(lambda:
self.idset(self.hv.heap()))

View.enter is for limiting the visibility of guppy-internal frames during reference traversal, so the important part is self.idset(self.hv.heap())

hv.heap() is your mentioned hv_heap:

guppy3/src/heapy/hv.c

Lines 889 to 922 in a1bd557

static PyObject *
hv_heap(NyHeapViewObject *self, PyObject *args, PyObject *kwds)
{
HeapTravArg ta;
ta.hv = self;
ta.visited = hv_mutnodeset_new(self);
ta.to_visit = PyList_New(0);
if (!(ta.visited && ta.to_visit))
goto err;
if (hv_heap_rec(ta.hv->root, &ta) == -1)
goto err;
while (PyList_Size(ta.to_visit)) {
PyObject *obj = hv_PyList_Pop(ta.to_visit);
if (!obj)
goto err;
if (hv_std_traverse(ta.hv, obj, (visitproc)hv_heap_rec, &ta) == -1) {
Py_DECREF(obj);
goto err;
}
Py_DECREF(obj);
}
if (hv_cleanup_mutset(ta.hv, ta.visited) == -1)
goto err;
if (PyObject_Length(self->static_types) == 0) {
if (hv_update_static_types(self, (PyObject *)ta.visited) == -1)
goto err;
}
Py_XDECREF(ta.to_visit);
return (PyObject *)ta.visited;
err:
Py_XDECREF(ta.visited);
Py_XDECREF(ta.to_visit);
return 0;
}

The important thing here is that it creates and returns a mutnodeset (a mutable nodeset) that contains everything it saw during the traversal.

Then View.idset() is an import / alias:

'_parent.Use:idset',

Use.idset():

'_parent.UniSet:idset',

UniSet.idset():

guppy3/guppy/heapy/UniSet.py

Lines 2214 to 2215 in a1bd557

def idset(self, iterable, er=None):
return self.fam_IdentitySet._cons(self.immnodeset(iterable), er=er)

Here immnodeset first creates a copy of the mutnodeset, but now it's immutable, then it does IdentitySetFamily._cons():

guppy3/guppy/heapy/UniSet.py

Lines 1526 to 1538 in a1bd557

def _cons(self, arg, er=None):
# arg is a sequence of nodes
arg = self.immnodeset(arg)
if not arg:
return self.mod.Nothing
elif len(arg) == 1:
r = IdentitySetSingleton(self, tuple(arg)[0])
else:
r = IdentitySetMulti(self, arg)
if er is not None:
r._er = er
return r

If the size of the set is 1, then it gets the single element of the set via tuple(arg)[0], and calls the constructor of IdentitySetSingleton with it

And IdentitySetSingleton's constructor sets self._node with the given element:

class IdentitySetSingleton(IdentitySet):
__slots__ = '_node',
_help_url_ = 'heapy_UniSet.html#heapykinds.IdentitySetSingleton'
def __init__(self, fam, node):
super().__init__(fam)
self._node = node

So tl;dr: in hpy().heap().theone, heap() calls hv_heap() to create a mutnodeset, and then IdentitySetFamily._cons() sees that it's a set of size 1 and calls IdentitySetSingleton with the single element it contains, and the constructor sets the _node attribute.

Does this answer how it works?

from guppy3.

michael-hahn avatar michael-hahn commented on July 18, 2024

Hi @zhuyifei1999,

Thank you for your thorough explanation. This is definitely very helpful! I was able to follow the exact control flow when debugging with PyCharm.

Because nodeset contains objects, it has to keep the objects alive. This is where Py_INCREF you mentioned comes in.

This makes sense! To confirm: when adding an object by calling NyNodeSet_setobj():

guppy3/src/sets/nodeset.c

Lines 599 to 619 in a1bd557

int
NyNodeSet_setobj(NyNodeSetObject *v, PyObject *obj)
{
if (NyMutNodeSet_Check(v)) {
NyBit bitno = nodeset_obj_to_bitno(obj);
int r = NyMutBitSet_setbit((NyMutBitSetObject *)v->u.bitset, bitno);
if (r == -1)
return -1;
if (!r) {
Py_SIZE(v)++;
if (v->flags & NS_HOLDOBJECTS) {
Py_INCREF(obj);
}
}
return r;
} else {
PyErr_Format(PyExc_ValueError,
"mutable nodeset required");
return -1;
}
}

The core functionality that adds the object is actually this:
int r = NyMutBitSet_setbit((NyMutBitSetObject *)v->u.bitset, bitno);

where it adds the object's adjusted address to the bitset (or nodes, depending on whether the set is immutable or not) within the NyNodeSetObject struct:

guppy3/src/sets/nodeset.c

Lines 325 to 329 in a1bd557

static NyBit
nodeset_obj_to_bitno(PyObject *obj)
{
return (NyBit) obj / ALIGN;
}

Am I correct?

When retrieving the object from theone, I am assuming this method:

def _get_theone(self):
return self._node

somehow calls some C code to get the object from the address? I am sorry, this is where I get a bit fuzzy. I presume the following C function is triggered by some function(s) when self._node in the code above is called?

guppy3/src/sets/nodeset.c

Lines 331 to 335 in a1bd557

static PyObject *
nodeset_bitno_to_obj(NyBit bitno)
{
return (PyObject *)(bitno * ALIGN);
}

Thank you!

from guppy3.

zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024

where it adds the object's adjusted address to the bitset (or nodes, depending on whether the set is immutable or not) within the NyNodeSetObject struct:

Immutable one cannot be added. NyMutNodeSet_Check(v) fails and you get ValueError: mutable nodeset required.

When retrieving the object from theone, I am assuming this method:
somehow calls some C code to get the object from the address? I am sorry, this is where I get a bit fuzzy. I presume the following C function is triggered by some function(s) when self._node in the code above is called?

No, custom C code is completely uninvolved here.

This corresponds to

class IdentitySetSingleton(IdentitySet):
__slots__ = '_node',
_help_url_ = 'heapy_UniSet.html#heapykinds.IdentitySetSingleton'
def __init__(self, fam, node):
super().__init__(fam)
self._node = node

In __init__:

self._node = node

In _get_theone:

return self._node 

How __init__ was called with the element / node was explained above. The argument to __init__ is the object returned by theone. Nothing fancy involved.

from guppy3.

michael-hahn avatar michael-hahn commented on July 18, 2024

OK. I think I see what is going on here. I believe, correct me if I am wrong, that the "trick" here is:

guppy3/guppy/heapy/UniSet.py

Lines 2214 to 2215 in a1bd557

def idset(self, iterable, er=None):
return self.fam_IdentitySet._cons(self.immnodeset(iterable), er=er)

It is when self.immnodeset() is called where we store actual objects into the immutable node set.
I see that when a copy is made:
NyNodeSetObject *
NyImmNodeSet_SubtypeNewCopy(PyTypeObject *type, NyNodeSetObject *v)
{
NSISetArg sa;
sa.i = 0;
sa.ns = NyImmNodeSet_SubtypeNew(type, Py_SIZE(v), v->_hiding_tag_);
if (!sa.ns)
return 0;
NyNodeSet_iterate(v, (visitproc)as_immutable_visit, &sa);
return sa.ns;
}
NyNodeSetObject *
NyImmNodeSet_NewCopy(NyNodeSetObject *v)
{
return NyImmNodeSet_SubtypeNewCopy(&NyImmNodeSet_Type, v);
}

NyNodeSet_iterate() would call mutnodeset_iterate_visit() to dereference the address from bitno, which is stored in mutable node set's bitset, and get the object. The object is then stored in nodes:
typedef struct {
PyObject_VAR_HEAD
int flags;
PyObject *_hiding_tag_;
union {
PyObject *bitset; /* If mutable type, a mutable bitset with addresses (divided). */
PyObject *nodes[1]; /* If immutable type, the start of node array, in address order. */
} u;
} NyNodeSetObject;

I guess my last question is, what is stored in nodes[0] since the NyNodeSetObject holds the pointer to nodes[1] instead of simply nodes?

Thanks!

from guppy3.

zhuyifei1999 avatar zhuyifei1999 commented on July 18, 2024

I guess my last question is, what is stored in nodes[0] since the NyNodeSetObject holds the pointer to nodes[1] instead of simply nodes?

This you can answer with GDB:

$ find build guppy -name '*.so' -delete; pip install --global-option build --global-option --debug -e .
Obtaining file:///home/zhuyifei1999/guppy3
Installing collected packages: guppy3
  Attempting uninstall: guppy3
    Found existing installation: guppy3 3.1.0
    Uninstalling guppy3-3.1.0:
      Successfully uninstalled guppy3-3.1.0
  Running setup.py develop for guppy3
Successfully installed guppy3-3.1.0
$ gdb python
GNU gdb (Gentoo 10.2 vanilla) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
[...]
(gdb) r
Starting program: /home/zhuyifei1999/guppy3/venv/bin/python 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Python 3.9.5 (default, Jun  2 2021, 13:52:24) 
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import guppy.sets.setsc
>>> def a():
...  b = object()
...  print(b)
...  c = guppy.sets.setsc.ImmNodeSet([b])
...  while True: pass
... 
>>> a()
<object object at 0x7ffff6bec3a0>
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7c2d1cc in _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /usr/src/debug/dev-lang/python-3.9.5_p2/Python-3.9.5/Python/ceval.c:3256
3256	            DISPATCH();
(gdb) py-locals
b = <object at remote 0x7ffff6bec3a0>
c = <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff6ad8ae0>
(gdb) p *(NyNodeSetObject *)0x7ffff6ad8ae0
$1 = {
  ob_base = {
    ob_base = {
      ob_refcnt = 0x1,
      ob_type = 0x7ffff6abd5c0 <NyImmNodeSet_Type>
    },
    ob_size = 0x1
  },
  flags = 0x1,
  _hiding_tag_ = 0x0,
  u = {
    bitset = <object at remote 0x7ffff6bec3a0>,
    nodes = {<object at remote 0x7ffff6bec3a0>}
  }
}
(gdb) 

The nodes[1] doesn't really mean it's an array of size 1, it's a C array of indefinite size. The array contains whatever objects the nodeset have. This is similar to how CPython implements tuples:
https://github.com/python/cpython/blob/769d7d0c66c5b86e2dd29b9ce67ac2daaab1bb38/Include/cpython/tupleobject.h#L5-L11

typedef struct {
    PyObject_VAR_HEAD
    /* ob_item contains space for 'ob_size' elements.
       Items must normally not be NULL, except during construction when
       the tuple is not yet visible outside the function that builds it. */
    PyObject *ob_item[1];
} PyTupleObject;

from guppy3.

michael-hahn avatar michael-hahn commented on July 18, 2024

I am not sure if you actually answered my question. I am aware that nodes[1] is just a pointer that points to the second element of nodes, not an array of size 1. But why does it skip the first element? Why does bitset not do the same (i.e., bitset[1])?

from guppy3.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.