Comments (32)
"Some work, some don't" is not really that nice. I think it's possible to get the module's `__dict__`,
pickle it (along with the module by reference), and then reconstitute it on unpickling.
from dill.
The issue is pickling things in the `__dict__` like `sys.flags`.
>>> import dill, sys
>>> dill.dumps(sys.flags)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/matthew/GitHub/dill/tests/dill/dill.py", line 143, in dumps
dump(obj, file, protocol, byref)
File "/home/matthew/GitHub/dill/tests/dill/dill.py", line 136, in dump
pik.dump(obj)
File "/usr/lib/python3.4/pickle.py", line 410, in dump
self.save(obj)
File "/usr/lib/python3.4/pickle.py", line 522, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python3.4/pickle.py", line 600, in save_reduce
save(func)
File "/usr/lib/python3.4/pickle.py", line 477, in save
f(self, obj) # Call unbound method with explicit self
File "/home/matthew/GitHub/dill/tests/dill/dill.py", line 813, in save_type
StockPickler.save_global(pickler, obj)
File "/usr/lib/python3.4/pickle.py", line 918, in save_global
(obj, module_name, name))
_pickle.PicklingError: Can't pickle <class 'sys.flags'>: it's not the same object as sys.flags
>>> sys.flags in sys.__dict__.values()
True
>>>
Do you think it is acceptable to pop off the objects that don't pickle?
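The traceback above comes down to how classes are pickled by name: `type(sys.flags)` reports itself as living at `sys.flags`, but that name is bound to the instance, not the class. Here is a minimal sketch using the standard-library `pickle` rather than dill; `is_picklable` is a hypothetical helper introduced only for illustration:

```python
import pickle
import sys


def is_picklable(obj):
    """Return True if the stdlib pickler can serialize obj, else False."""
    try:
        pickle.dumps(obj)
    except Exception:  # PicklingError, TypeError, AttributeError, ...
        return False
    return True


flags_type = type(sys.flags)

# The class advertises module 'sys' and name 'flags' ...
print(flags_type.__module__, flags_type.__name__)
# ... but looking that name up yields the instance, so the by-name check fails.
print("sys.flags is its own class:", sys.flags is flags_type)
print("sys.flags picklable:", is_picklable(sys.flags))
```

The same helper could be used to filter a module's `__dict__`, although that drops objects silently, which is the trade-off being asked about.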
from dill.
`dill.detect.errors` could be used to remove "bad objects":
>>> dill.detect.errors(sys, depth=1, safe=1).keys()
dict_keys(['int_info', 'stderr', 'stdin', 'hash_info', 'flags', 'float_info', 'stdout', 'modules', 'thread_info', 'last_traceback', '__stdout__', 'version_info', 'implementation', '__stderr__', '__stdin__'])
from dill.
Yes and no. :) The only things that need to be pickled are the items the user adds to `__dict__`. All the other non-pickleable stuff should just "be there" when the module is loaded. So the thing to do is to see if there's a way to distinguish dynamically added items from the `__dict__` items that come with the module… and only the dynamically added items need to be pickled.
I can think of a way that might work, but it also might be slow. Use an import hook to get a reference to the module (hopefully, it'd be a "clean" copy, not one with the user-added stuff in it)… and then pop off all the items that are identical in both `__dict__`s.
from dill.
Can you get a reference to the module through an import hook? I thought import hooks can only be active while searching for and loading the module, so they could not get a ref to an already loaded module. Once you have a ref, you could just do `dill.module_dicts[module.__name__] = module.__dict__.copy()`, and then when pickling the module:
for key, item in module.__dict__.items():
    if item != dill.module_dicts[module.__name__][key]:
        # add to list of attrs to pickle
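The snapshot-and-compare idea can be sketched without dill. This is only an illustration under stated assumptions: `snapshot` and `changed_items` are hypothetical names, a throwaway `types.ModuleType` stands in for an imported module, and identity (`is not`) is used instead of `!=` so that objects with unusual `__eq__` behaviour cannot break the comparison.

```python
import types


def snapshot(module):
    """Shallow copy of the module namespace, taken before user changes."""
    return dict(vars(module))


def changed_items(module, before):
    """Return names that are new, or rebound to a different object."""
    return {
        name: value
        for name, value in vars(module).items()
        if name not in before or before[name] is not value
    }


demo = types.ModuleType("demo")   # stand-in for a freshly imported module
demo.original = [1, 2, 3]
before = snapshot(demo)

demo.added = "new"                # dynamically added attribute: detected
demo.original.append(4)           # in-place mutation: same object, not detected

print(changed_items(demo, before))  # → {'added': 'new'}
```

A shallow comparison like this also handles names that were not in the snapshot, but it cannot see changes made inside an existing object.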
from dill.
I don't want a ref to the loaded module -- you can just grab that either from the current namespace, or `globals`, or `sys.modules`. I'm thinking of leveraging some local scope to do a new import, or better yet, some form of a partial import that would have the "unaltered" `__dict__` -- then comparing the altered and unaltered `__dict__`. It needs some digging to find if there's a reasonable way to do that.
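One possible way to obtain an "unaltered" namespace, offered as an assumption rather than anything proposed in this thread, is to execute the module's source a second time into a new module object via `importlib.util` (Python 3.5+). `fresh_copy` is a hypothetical helper; it runs the module's top-level code again, so it is only reasonable for side-effect-free pure-Python modules, and extension or built-in modules may behave differently.

```python
import colorsys
import importlib.util
import sys


def fresh_copy(name):
    """Execute a module's source again into a new module object.

    The result is not placed in sys.modules, so the loaded module is untouched.
    """
    spec = importlib.util.find_spec(name)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


colorsys.xxx = 1                      # user-added attribute on the loaded module
pristine = fresh_copy("colorsys")

# Names present on the altered module but absent from the pristine copy.
added = sorted(set(vars(colorsys)) - set(vars(pristine)))
print(added)
print(sys.modules["colorsys"] is colorsys)  # the loaded module was not replaced

del colorsys.xxx                      # undo the demonstration change
```

Functions and classes in the pristine copy are new objects, so only names (not identities) can be compared between the two namespaces.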
from dill.
Well, you could get a ref, and process it before the module is turned over to the user. Could do it by replacing `__import__`, that's always great fun!
from dill.
Doing it the fun way: dump this at the end of `dill.py`:
module_dicts = {}

def _add_to_mod_dicts(mod):
    if mod.__name__ in module_dicts:
        return
    module_dicts[mod.__name__] = mod.__dict__.copy()
    for key, value in module_dicts[mod.__name__].items():
        if value is sys.__stdin__ or value is sys.__stdout__ \
                or value is sys.__stderr__:
            continue
        try:
            x = copy(value)
        except (ValueError, TypeError, AssertionError, AttributeError,
                PicklingError, UnpicklingError) as ex:
            pass
        else:
            module_dicts[mod.__name__][key] = x

def _imp(*args, __import__=__import__, **kwargs):
    mod = __import__(*args, **kwargs)
    _add_to_mod_dicts(mod)
    return mod

for mod in list(sys.modules.values()):
    _add_to_mod_dicts(mod)

__builtins__["__import__"] = _imp

# when pickling
@register(ModuleType)
def save_module(pickler, obj):
    if obj.__name__ in module_dicts:
        log.info("M1: %s" % obj)
        _main_dict = obj.__dict__.copy() #XXX: better no copy? option to copy?
        [_main_dict.pop(item, None) for item in singletontypes]
        for key, value in list(_main_dict.items()):
            if key in module_dicts[obj.__name__] and \
                    value == module_dicts[obj.__name__][key]:
                _main_dict.pop(key)
        pickler.save_reduce(_import_module, (obj.__name__,), obj=obj,
                            state=_main_dict)
    else:
        log.info("M2: %s" % obj)
        pickler.save_reduce(_import_module, (obj.__name__,), obj=obj)
    return
The disadvantage is that importing dill takes much longer, but detection of changed attributes works:
>>> import abc
>>> abc.xxx = 1
>>> dill.dumps(abc)
b'\x80\x03cdill.dill\n_import_module\nq\x00X\x03\x00\x00\x00abcq\x01\x85q\x02Rq\x03}q\x04X\x03\x00\x00\x00xxxq\x05K\x01sb.'
>>>
which translates to:
0: \x80 PROTO 3
2: c GLOBAL 'dill.dill _import_module'
28: q BINPUT 0
30: X BINUNICODE 'abc'
38: q BINPUT 1
40: \x85 TUPLE1
41: q BINPUT 2
43: R REDUCE
44: q BINPUT 3
46: } EMPTY_DICT
47: q BINPUT 4
49: X BINUNICODE 'xxx'
57: q BINPUT 5
59: K BININT1 1
61: s SETITEM
62: b BUILD
63: . STOP
Also, as types are not copied, `abc.ABCMeta.xxx = 1` will still not work.
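The "import by name, then apply a state dict" shape of the pickle above can be reproduced with the standard library alone. The sketch below is not dill's implementation: it assumes Python 3.8+ for `Pickler.reducer_override`, uses `importlib.import_module` where dill uses `_import_module`, and `ModuleStatePickler` with its `extra_state` argument is a hypothetical class in which the caller supplies the changed attributes directly.

```python
import colorsys
import importlib
import io
import pickle
import pickletools
import types


class ModuleStatePickler(pickle.Pickler):
    """Pickle a module as: import it by name, then update its __dict__."""

    def __init__(self, file, extra_state, protocol=3):
        super().__init__(file, protocol)
        self.extra_state = extra_state      # {module name: {attr: value}}

    def reducer_override(self, obj):
        if isinstance(obj, types.ModuleType):
            state = self.extra_state.get(obj.__name__) or None
            # (callable, args, state): state is applied by the BUILD opcode.
            return importlib.import_module, (obj.__name__,), state
        return NotImplemented               # everything else pickles as usual


colorsys.xxx = 1
buffer = io.BytesIO()
ModuleStatePickler(buffer, {"colorsys": {"xxx": 1}}).dump(colorsys)
payload = buffer.getvalue()

del colorsys.xxx                            # simulate a fresh interpreter
restored = pickle.loads(payload)            # imports colorsys, then sets xxx

pickletools.dis(payload)                    # prints an opcode listing like the one above
print(restored is colorsys, restored.xxx)
```

Because the state is applied by updating the module's `__dict__` on load, only plain attribute additions and rebinding are restored; a change made inside an existing class or object has to be carried by the state itself.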
from dill.
There's got to be a better way to do that.
from dill.
Yup, but I haven't found it yet. Any ideas?
from dill.
Aside from the above, no.
from dill.
Next crazy idea:
Put https://gist.github.com/matsjoyce/fee2e234fac425a52f6b in dill/memo_test.py, then add
from . import memo_test

@register(ModuleType)
def save_module(pickler, obj):
    log.info("M1: %s" % obj)
    _main_dict = memo_test.whats_changed(obj)[0]
    print(_main_dict)
    pickler.save_reduce(_import_module, (obj.__name__,), obj=obj,
                        state=_main_dict)
    return
to the end of dill.py. It's probably more complicated than the other solution, as it almost reimplements part of pickle, but it's faster, and works for the `abc.ABCMeta.x = 1` problem. E.g.:
>>> abc.ABCMeta.x=2
>>> dill.dumps(abc)
b'\x80\x03cdill.dill\n_import_module\nq\x00X\x03\x00\x00\x00abcq\x01\x85q\x02Rq\x03}q\x04(X\x07\x00\x00\x00ABCMetaq\x05cabc\nABCMeta\nq\x06X\x02\x00\x00\x00xxq\x07K\x01ub.'
>>>
If you think it'll be useful, I'll add comments, try to make it more efficient, neater, etc. But if there is a smaller solution, it should probably be used.
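To illustrate why the `abc.ABCMeta.x = 1` case needs more than a shallow comparison, here is a deliberately small sketch that looks one level into classes. It is not the gist's code: `memorise` and `whats_changed_sketch` are hypothetical names, and a real implementation would have to recurse through arbitrary objects rather than stopping at class attributes.

```python
import types


def memorise(module):
    """Snapshot module attributes and, one level down, each class's attributes."""
    snap = {"attrs": dict(vars(module)), "classes": {}}
    for name, value in vars(module).items():
        if isinstance(value, type):
            snap["classes"][name] = dict(vars(value))
    return snap


def whats_changed_sketch(module, snap):
    """Return module attributes that were added, rebound, or changed inside."""
    changed = {}
    for name, value in vars(module).items():
        if name not in snap["attrs"] or snap["attrs"][name] is not value:
            changed[name] = value           # new or rebound at module level
        elif isinstance(value, type):
            before = snap["classes"].get(name, {})
            current = vars(value)
            if any(k not in before or before[k] is not v
                   for k, v in current.items()):
                changed[name] = value       # same class object, new contents
    return changed


class Meta:
    pass


demo = types.ModuleType("demo")             # stand-in for a module such as abc
demo.Meta = Meta
snap = memorise(demo)

Meta.x = 1                                  # analogous to abc.ABCMeta.x = 1
print(whats_changed_sketch(demo, snap))
```

A plain `__dict__` comparison would report nothing here, because `demo.Meta` is still the same object.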
from dill.
This is by far the longest block of code used to solve a serialization issue in dill, and the case it handles is pretty narrow. So that's not a good sign, but also maybe it's not that big of an issue. Pickling the `__dict__` for a function is a huge feature, and also for classes. I don't know how often in the course of normal programming a module is modified. However, having said that… the solution is not a bad one. It's fast, and fairly general, so in that regard, it's a good solution. So, probably worthy of accepting, if it works for all the use cases.
Nice idea by the way, and def not crazy.
from dill.
It could also be possibly used in fixing #42, to reduce pickle bloat. I'll try tidying it up a bit, then.
from dill.
Re: #42, I agree.
The biggest sources of bloat, however, are (1) pulling in all globals for function closures, and (2) similar for storing the source code for class definitions (instead of `byref`). I have done some work to reduce the bloat on the above, but more strategic dropping of unnecessary `__dict__` items and strategic use of name references is needed.
from dill.
Thanks for taking up this issue, btw.
from dill.
I've pushed the file + py2 compatibility fixes to matsjoyce/dill@228611477 if you want to test it a bit. Wait, I've found a 🪲
from dill.
OK, fixed them. It's at matsjoyce/dill@d97de9e
Notes:
- Timings:
State | Operation | Time | Notes |
---|---|---|---|
Without new method | Pickle module | 0.004 | This does not save the state of the module at all |
Without new method | Import dill | 0.184 | |
With new method | Pickle module | 0.407 | |
With new method | Import dill | 0.359 | Extra time due to memorising all imported modules |
- Tests:
- test_module.py does not pass yet, as it cannot find some types
from dill.
Ouch.
from dill.
Seems I don't know how timeit works... Here are some more reliable timings:
Operation | Memorise time | Master time | Slow down |
---|---|---|---|
Import dill | 0.28599 | 0.15788 | 1.811 |
Dump small python module | 0.00508 | 0.00031 | 16.378 |
Dump small python module which imports os | 0.01992 | 0.00032 | 62.244 |
Dump abc | 0.00597 | 0.00005 | 129.113 |
Dump abc with attr changed | 0.00546 | 0.00005 | 118.588 |
Note: For the dumping of abc, master only dumps a reference.
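For reference, per-call timings of this kind could be collected with `timeit.repeat`, taking the minimum over several repeats and dividing by the number of calls. This is only a sketch of the measurement approach: the standard-library `pickle.dumps` and the `payload` object are stand-ins, not the setup used for the table above.

```python
import pickle
import timeit

payload = {"numbers": list(range(100)), "name": "example"}   # stand-in object

calls_per_run = 1000
# Each entry is the total time for `calls_per_run` calls.
runs = timeit.repeat(lambda: pickle.dumps(payload),
                     number=calls_per_run, repeat=5)

# The minimum is the least noisy estimate; divide to get time per call.
per_call = min(runs) / calls_per_run
print(f"{per_call:.8f} s per dump")
```

Forgetting to divide by `number` is an easy way to end up with totals that look far slower than a single call really is.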
from dill.
Ok, that's much less of an "ouch". I'll see if I can eke out some time to play with your solution in the next few days. Not that I expect to see a better route right now, but just to stay on pulse with it at least.
from dill.
Added optimisations in mastjoyce@72dacaf
Operation | Memorise time | Master time | Slow down |
---|---|---|---|
Import dill | 0.25513 | 0.14618 | 1.745 |
Dump small python module | 0.00417 | 0.00031 | 13.250 |
Dump small python module which imports os | 0.01187 | 0.00032 | 36.520 |
Dump abc | 0.00458 | 0.00005 | 97.586 |
Dump abc with attr changed | 0.00456 | 0.00005 | 96.963 |
from dill.
All tests now succeed with python 2 at matsjoyce/dill@5ea2d6d, while with python 3 only test_objects.py fails on some ctypes stuff
from dill.
Fixing this will also fix #49.
from dill.
Sorry to be a little slow on this (it'll be another few days). When you serialize the module… and you need a "pure" module to compare to… could you not get a "pure" module by serializing it with the `StockPickler`? Then you could compare that "pure" copy with the one with the augmented `__dict__` to find out what else you need to save.
from dill.
Do you mean:
- Use `StockPickler` to pickle then unpickle the module to get a pure copy?
  - This doesn't work as it uses `__import__`, which will just return the entry in `sys.modules`
- Use `StockPickler` to pickle the module's `__dict__` on first import (like above solution)
  - This is what #43 (comment) does
  - The disadvantage is that although this solution can find additions to modules, it cannot detect the `abc.ABCMeta.x = 1` situation. It can also lead to weird behaviour (as usual, when C modules and pickle meet...)
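The first point can be checked directly: importing an already-loaded module by name hands back the very object stored in `sys.modules`, modifications included. A short sketch, with `importlib.import_module` standing in for the import that unpickling by reference performs:

```python
import colorsys
import importlib
import sys

colorsys.xxx = 1                                  # modify the loaded module
again = importlib.import_module("colorsys")       # import by name, as on unpickling

same_object = again is colorsys is sys.modules["colorsys"]
still_modified = getattr(again, "xxx", None) == 1
print(same_object, still_modified)  # → True True

del colorsys.xxx                                  # undo the demonstration change
```

So a round trip through a by-name pickle cannot produce a "pure" copy within the same interpreter.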
from dill.
@matsjoyce: I meant the first… you are right, and I forgot it wouldn't work.
from dill.
I played with this a little bit more… it's really nice. However, it needs more testing, polishing, cases, etc. I'm punting this to the next release. I'm now not so concerned about the speed as I am about error cases, naming, etc.
from dill.
It may be possible to better unify `byref` and `use_diff` as they seem to be shades of the same concept (like the different `FMODE`).
from dill.
I think the distinction is that `diff` aims to produce the smallest pickle to reproduce the object (assuming a similar unpickling environment) and `no diff` aims to be more 'complete'. We could almost split the entire package along those lines. The `no diff` option would be interesting for saving game state, etc.
from dill.
@matsjoyce: this, #69, #70, and other related issues are a bit of a mess. Do you have a plan for this? Note also that #111 etc. demonstrates there are issues with this code on Windows. I keep punting on these… and they are not showing up as a high priority for me. But, nonetheless, they are still something that needs cleaning up.
from dill.
@matsjoyce: You may want to check how 1938291 affects the `diff` branch. Basically, the patch I added was to pickle the `builtins` (or `__builtin__`) module by reference regardless of the value of `settings['recurse']`. Release of `0.2.4` is imminent unless you have a reason I should hold it off.
from dill.