Comments (32)

mmckerns avatar mmckerns commented on June 30, 2024

"Some work, some don't" is not really that nice. I think that it's possible to get the module's __dict__, then pickle it (along with the module by reference), and then reconstitute it on unpickling.

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

The issue is pickling things in the __dict__ like sys.flags.

>>> import dill, sys
>>> dill.dumps(sys.flags)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/matthew/GitHub/dill/tests/dill/dill.py", line 143, in dumps
    dump(obj, file, protocol, byref)
  File "/home/matthew/GitHub/dill/tests/dill/dill.py", line 136, in dump
    pik.dump(obj)
  File "/usr/lib/python3.4/pickle.py", line 410, in dump
    self.save(obj)
  File "/usr/lib/python3.4/pickle.py", line 522, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.4/pickle.py", line 600, in save_reduce
    save(func)
  File "/usr/lib/python3.4/pickle.py", line 477, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/matthew/GitHub/dill/tests/dill/dill.py", line 813, in save_type
    StockPickler.save_global(pickler, obj)
  File "/usr/lib/python3.4/pickle.py", line 918, in save_global
    (obj, module_name, name))
_pickle.PicklingError: Can't pickle <class 'sys.flags'>: it's not the same object as sys.flags
>>> sys.flags in sys.__dict__.values()
True
>>> 

Do you think it is acceptable to pop off the objects that don't pickle?
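
For context, the PicklingError in the traceback above comes from pickle's by-name lookup: a class is saved as a module/name pair, and pickle checks that looking the name up again returns the very same object. A minimal sketch of that check, using only the standard library (the variable names are illustrative, not part of dill):

```python
import sys

# The type of sys.flags reports its module as "sys" and its name as "flags".
cls = type(sys.flags)

# pickle saves a class by reference, so it looks the name up in the module...
looked_up = getattr(sys.modules[cls.__module__], cls.__name__)

# ...but in sys that name is bound to the instance, not to the class itself.
is_same_object = looked_up is cls
print(cls.__module__, cls.__name__, is_same_object)
```

Because the lookup returns the instance rather than the class, the identity check fails and the class cannot be saved by reference.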

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

dill.detect.errors could be used to remove "bad objects":

>>> dill.detect.errors(sys, depth=1, safe=1).keys()
dict_keys(['int_info', 'stderr', 'stdin', 'hash_info', 'flags', 'float_info', 'stdout', 'modules', 'thread_info', 'last_traceback', '__stdout__', 'version_info', 'implementation', '__stderr__', '__stdin__'])
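
The same kind of filtering can be sketched without dill. The example below is an assumption-laden stand-in: it uses the standard pickle module rather than dill.detect.errors, so the set of "bad objects" it reports will differ from dill's (dill can save lambdas, for instance). The helper name unpicklable_names and the demo module are hypothetical.

```python
import pickle
import types

def unpicklable_names(module):
    """Return the names in module.__dict__ whose values fail to pickle."""
    bad = []
    for name, value in vars(module).items():
        try:
            pickle.dumps(value)
        except Exception:  # pickle raises several different exception types
            bad.append(name)
    return sorted(bad)

# Hypothetical module built only for this demonstration.
demo = types.ModuleType("demo")
demo.number = 1
demo.handle = lambda: None  # the standard pickler cannot save a lambda by name
print(unpicklable_names(demo))
```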

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

Yes and no. :) The only things that need to be pickled are the items the user adds to __dict__. All the other non-pickleable stuff should just "be there" when the module is loaded. So the thing to do is to see if there's a way to distinguish how dynamically added items are different than the __dict__ items that come with the module… and only the dynamically-added items need to be pickled.

I can think of a way that might work but it also might be slow. Use an import hook to get a reference to the module (hopefully, it'd be a "clean" copy, not one with the user-added stuff in it)… and then pop off all the items that are identical in both __dict__.

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

Can you get a reference to the module through an import hook? I thought import hooks can only be active while searching for and loading the module, so they could not get a ref to a loaded module. Once a ref is obtained, you could just do dill.module_dicts[module.__name__] = module.__dict__.copy(), and then when pickling the module:

saved = dill.module_dicts[module.__name__]
changed = []
for key, item in module.__dict__.items():
    # new keys have no saved counterpart, so treat them as changed
    if key not in saved or item != saved[key]:
        changed.append(key)  # add to list of attrs to pickle

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

I don't want a ref to the loaded module -- you can just grab that either from the current namespace, or globals, or sys.modules. I'm thinking leveraging some local scope to do a new import, or better yet, some form of a partial import that would have the "unaltered" __dict__ -- then comparing the altered and unaltered __dict__. It needs some digging to find if there's a reasonable way to do that.
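
One way this "unaltered __dict__" idea could be sketched with the standard library is to execute a second copy of the module from its spec, leaving sys.modules untouched. This is only an illustration under assumptions: the helper names load_fresh_copy and added_names are hypothetical, colorsys is used merely as a convenient pure-Python module, re-executing a module repeats its import-time side effects, and the comparison only finds added names (values in the fresh copy are distinct objects, so rebinding is not detected).

```python
import importlib.util
import colorsys  # any pure-Python module; chosen only for illustration

def load_fresh_copy(name):
    """Execute a second copy of a module without replacing sys.modules[name]."""
    spec = importlib.util.find_spec(name)
    fresh = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(fresh)
    return fresh

def added_names(module):
    """Names present in the live module but absent from a fresh copy."""
    fresh = load_fresh_copy(module.__name__)
    return set(vars(module)) - set(vars(fresh))

colorsys.xxx = 1  # simulate a user adding an attribute
added = added_names(colorsys)
print(added)
del colorsys.xxx  # restore the module
```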

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

Well, you could get a ref, and process it before the module is turned over to the user. Could do it by replacing __import__, that's always great fun!

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

Doing it the fun way: dump this at the end of dill.py:

module_dicts = {}

def _add_to_mod_dicts(mod):
    if mod.__name__ in module_dicts:
        return
    module_dicts[mod.__name__] = mod.__dict__.copy()
    for key, value in module_dicts[mod.__name__].items():
        if value is sys.__stdin__ or value is sys.__stdout__ \
           or value is sys.__stderr__:
            continue
        try:
            x = copy(value)
        except (ValueError, TypeError, AssertionError, AttributeError,
                PicklingError, UnpicklingError):
            pass
        else:
            module_dicts[mod.__name__][key] = x

def _imp(*args, __import__=__import__, **kwargs):
    mod = __import__(*args, **kwargs)
    _add_to_mod_dicts(mod)
    return mod

for mod in list(sys.modules.values()):
    _add_to_mod_dicts(mod)

__builtins__["__import__"] = _imp

# when pickling
@register(ModuleType)
def save_module(pickler, obj):
    if obj.__name__ in module_dicts:
        log.info("M1: %s" % obj)
        _main_dict = obj.__dict__.copy() #XXX: better no copy? option to copy?
        for item in singletontypes:
            _main_dict.pop(item, None)
        for key, value in list(_main_dict.items()):
            if key in module_dicts[obj.__name__] and \
               value == module_dicts[obj.__name__][key]:
                _main_dict.pop(key)
        pickler.save_reduce(_import_module, (obj.__name__,), obj=obj,
                            state=_main_dict)
    else:
        log.info("M2: %s" % obj)
        pickler.save_reduce(_import_module, (obj.__name__,), obj=obj)
    return

The disadvantage is that importing dill takes much longer, but detection of changed attributes works:

>>> import abc
>>> abc.xxx = 1
>>> dill.dumps(abc)
b'\x80\x03cdill.dill\n_import_module\nq\x00X\x03\x00\x00\x00abcq\x01\x85q\x02Rq\x03}q\x04X\x03\x00\x00\x00xxxq\x05K\x01sb.'
>>> 

which translates to:

    0: \x80 PROTO      3
    2: c    GLOBAL     'dill.dill _import_module'
   28: q    BINPUT     0
   30: X    BINUNICODE 'abc'
   38: q    BINPUT     1
   40: \x85 TUPLE1
   41: q    BINPUT     2
   43: R    REDUCE
   44: q    BINPUT     3
   46: }    EMPTY_DICT
   47: q    BINPUT     4
   49: X    BINUNICODE 'xxx'
   57: q    BINPUT     5
   59: K    BININT1    1
   61: s    SETITEM
   62: b    BUILD
   63: .    STOP
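
A listing in this format can be produced with the standard library's pickletools module. The sketch below feeds it the bytes from the session above; disassembling only reads the opcodes, so dill does not need to be importable.

```python
import io
import pickletools

# The pickle bytes shown above; pickletools.dis does not execute or import anything.
data = (b'\x80\x03cdill.dill\n_import_module\nq\x00X\x03\x00\x00\x00abcq\x01'
        b'\x85q\x02Rq\x03}q\x04X\x03\x00\x00\x00xxxq\x05K\x01sb.')

out = io.StringIO()
pickletools.dis(data, out=out)  # write the opcode listing into the buffer
listing = out.getvalue()
print(listing)
```

The BUILD opcode at the end is what applies the saved state (here {'xxx': 1}) to the re-imported module.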

Also, as types are not copied, abc.ABCMeta.xxx = 1 will still not work.

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

There's got to be a better way to do that.

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

Yup, but I haven't found it yet. Any ideas?

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

Aside from the above, no.

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

Next crazy idea:
Put https://gist.github.com/matsjoyce/fee2e234fac425a52f6b in dill/memo_test.py, then add

from . import memo_test

@register(ModuleType)
def save_module(pickler, obj):
    log.info("M1: %s" % obj)
    _main_dict = memo_test.whats_changed(obj)[0]
    print(_main_dict)
    pickler.save_reduce(_import_module, (obj.__name__,), obj=obj,
                        state=_main_dict)
    return

to the end of dill.py. It's probably more complicated than the other solution, as it almost reimplements part of pickle, but it's faster, and it works for the abc.ABCMeta.x = 1 problem. E.g.:

>>> abc.ABCMeta.x=2
>>> dill.dumps(abc)
b'\x80\x03cdill.dill\n_import_module\nq\x00X\x03\x00\x00\x00abcq\x01\x85q\x02Rq\x03}q\x04(X\x07\x00\x00\x00ABCMetaq\x05cabc\nABCMeta\nq\x06X\x02\x00\x00\x00xxq\x07K\x01ub.'
>>> 

If you think it'll be useful, I'll add comments, try to make it more efficient, neater, etc. But if there is a smaller solution, it should probably be used.
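
The general memorise-then-diff idea can be sketched as follows. This is not the code from the gist: the names memorise and changed_attrs are hypothetical, the snapshot only looks one level into classes, and a real implementation would need to recurse and handle many more object types.

```python
import types

def memorise(module):
    """Keep a reference to every attribute and, for classes, to their own attributes."""
    snapshot = {}
    for name, value in vars(module).items():
        inner = dict(vars(value)) if isinstance(value, type) else None
        snapshot[name] = (value, inner)
    return snapshot

def changed_attrs(module, snapshot):
    """Return the attributes that were added, rebound, or (for classes) modified."""
    changed = {}
    for name, value in vars(module).items():
        if name not in snapshot:
            changed[name] = value          # newly added name
            continue
        old_value, old_inner = snapshot[name]
        if value is not old_value:
            changed[name] = value          # name rebound to another object
        elif old_inner is not None:
            now = vars(value)
            if set(now) != set(old_inner) or any(now[k] is not old_inner[k] for k in now):
                changed[name] = value      # class body was modified in place
    return changed

# Hypothetical module and class, built only for this demonstration.
demo = types.ModuleType("demo")
class Meta(type):
    pass
demo.Meta = Meta
demo.value = 1

snapshot = memorise(demo)
unchanged = changed_attrs(demo, snapshot)
demo.extra = 2      # like abc.xxx = 1
Meta.x = 1          # like abc.ABCMeta.x = 1
changed = changed_attrs(demo, snapshot)
print(sorted(changed))
```

Comparing by identity avoids calling arbitrary __eq__ methods, at the cost of missing in-place mutation of containers such as lists and dicts.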

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

This is by far the longest block of code used to solve a serialization issue in dill, and the case it handles is pretty narrow. So that's not a good sign, but also maybe it's not that big of an issue. Pickling the __dict__ for a function is a huge feature, and also for classes. I don't know how often in the course of normal programming a module is modified. However, having said that… the solution is not a bad one. It's fast, and fairly general, so in that regard, it's a good solution. So, probably worthy of accepting, if it works for all the use cases.

Nice idea by the way, and def not crazy.

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

It could also possibly be used to fix #42, to reduce pickle bloat. I'll try tidying it up a bit, then.

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

Re: #42, I agree.

The biggest sources of bloat, however, are (1) pulling in all globals for function closures, and (2) similar for storing the source code for class definitions (instead of byref). I have done some work to reduce the bloat on the above, but more strategic dropping of unnecessary __dict__ items and strategic use of name references is needed.
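
As a rough illustration of "strategic dropping" for source (1), one could keep only the globals whose names appear in a function's bytecode. This is a sketch under assumptions, not how dill does it: referenced_globals is a hypothetical helper, the namespace dict stands in for a module's globals, and co_names also lists attribute names and ignores nested functions, so the result is only an approximation.

```python
def referenced_globals(func):
    """Globals whose names appear in the function's own bytecode."""
    names = func.__code__.co_names
    return {n: func.__globals__[n] for n in names if n in func.__globals__}

# A stand-in for a module's globals: one name is used, one is dead weight.
namespace = {"SCALE": 3, "UNUSED": list(range(1000))}
exec("def scaled(x):\n    return x * SCALE\n", namespace)
scaled = namespace["scaled"]

needed = referenced_globals(scaled)
print(sorted(needed))
```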

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

Thanks for taking up this issue, btw.

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

I've pushed the file + py2 compatibility fixes to matsjoyce/dill@228611477 if you want to test it a bit. Wait, I've found a 🪲

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

OK, fixed them. It's at matsjoyce/dill@d97de9e
Notes:

  • Timings:

    State               Operation      Time   Notes
    Without new method  Pickle module  0.004  This does not save the state of the module at all
    Without new method  Import dill    0.184
    With new method     Pickle module  0.407
    With new method     Import dill    0.359  Extra time due to memorising all imported modules

  • Tests:
    • test_module.py does not pass yet, as it cannot find some types

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

Ouch.

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

Seems I don't know how timeit works... 😫. Here are some more reliable timings:

Operation                                  Memorise time  Master time  Slow down
Import dill                                0.28599        0.15788      1.811
Dump small python module                   0.00508        0.00031      16.378
Dump small python module which imports os  0.01992        0.00032      62.244
Dump abc                                   0.00597        0.00005      129.113
Dump abc with attr changed                 0.00546        0.00005      118.588

Note: For the dumping of abc, master only dumps a reference.
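
For reference, a common way to get stable numbers from timeit is to repeat the measurement and keep the minimum. The sketch below is not the benchmark used for the table: best_time is a hypothetical helper, and two standard-pickle protocols stand in for the "memorise" and "master" branches. The "Slow down" column appears to be the ratio of the two times.

```python
import pickle
import timeit

def best_time(func, number=1000, repeat=5):
    """Smallest total time over several runs, which reduces scheduling noise."""
    return min(timeit.repeat(func, number=number, repeat=repeat))

payload = {"xxx": 1}

# Stand-ins for the two code paths being compared.
baseline = best_time(lambda: pickle.dumps(payload, protocol=2))
candidate = best_time(lambda: pickle.dumps(payload, protocol=0))

# Ratio of candidate to baseline, analogous to the "Slow down" column.
slow_down = candidate / baseline
print(round(slow_down, 3))
```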

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

Ok, that's much less of an "ouch". I'll see if I can eke out some time to play with your solution in the next few days. Not that I expect to see a better route right now, but just to stay on pulse with it at least.

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

Added optimisations in mastjoyce@72dacaf

Operation                                  Memorise time  Master time  Slow down
Import dill                                0.25513        0.14618      1.745
Dump small python module                   0.00417        0.00031      13.250
Dump small python module which imports os  0.01187        0.00032      36.520
Dump abc                                   0.00458        0.00005      97.586
Dump abc with attr changed                 0.00456        0.00005      96.963

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

All tests now succeed with Python 2 at matsjoyce/dill@5ea2d6d, while with Python 3 only test_objects.py fails, on some ctypes stuff.

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

Fixing this will also fix #49.

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

Sorry to be a little slow on this (it'll be another few days). When you serialize the module… and you need a "pure" module to compare to… could you not get a "pure" module by serializing it with the StockPickler? Then you could compare that "pure" copy with the one with the augmented __dict__ to find out what else you need to save.

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

Do you mean:

  • Use StockPickler to pickle then unpickle the module to get a pure copy?
    • This doesn't work as it uses __import__, which will just return the entry in sys.modules
  • Use StockPickler to pickle the module's __dict__ on first import (like the above solution)
    • This is what #43 (comment) does
    • The disadvantage is that although this solution can find additions to modules, it cannot detect the abc.ABCMeta.x = 1 situation. It can also lead to weird behaviour (as usual, when c modules and pickle meet...)
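
The first point can be checked directly: importing an already-loaded module hands back the cached object from sys.modules, user-added attributes included, so a pickle/unpickle round trip by reference cannot yield a "pure" copy. A small sketch (colorsys is used only as an arbitrary example module):

```python
import sys
import colorsys  # arbitrary standard-library module, for illustration only

colorsys.xxx = 1  # simulate a user modification

# Importing by name again returns the cached, already-modified module.
again = __import__("colorsys")
same_object = again is sys.modules["colorsys"]
still_modified = getattr(again, "xxx", None) == 1
print(same_object, still_modified)

del colorsys.xxx  # restore the module
```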

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

@matsjoyce: I meant the first… you are right, and I forgot it wouldn't work.

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

I played with this a little bit more… it's really nice. However, it needs more testing, polishing, cases, etc. I'm punting this to the next release. I'm now not so concerned about the speed as I am about error cases, naming, etc.

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

It may be possible to better unify byref and use_diff as they seem to be shades of the same concept (like the different FMODE).

from dill.

matsjoyce avatar matsjoyce commented on June 30, 2024

I think the distinction is that diff aims to produce the smallest pickle to reproduce the object (assuming a similar unpickling environment) and no diff aims to be more 'complete'. We could almost split the entire package along those lines. The no diff option would be interesting for saving game state, etc.

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

@matsjoyce: this, #69, #70, and other related issues are a bit of a mess. Do you have a plan for this? Note also that #111 etc. demonstrates there are issues with this code on Windows. I keep punting on these… and they are not showing up as a high priority for me. But, nonetheless, they are still something that needs cleaning up.

from dill.

mmckerns avatar mmckerns commented on June 30, 2024

@matsjoyce: You may want to check how 1938291 affects the diff branch. Basically, the patch I added was to pickle the builtins (or __builtin__) module by reference regardless of the value of settings['recurse']. Release of 0.2.4 is imminent unless you have a reason I should hold it off.

from dill.
