Comments (11)
I'd disagree… dill can serialize this type of object. What pickle does is serialize the class "by reference", which means using the module name (__main__ in this case) and the class name. If you pass the byref flag to dill, it can use the same mechanism.
>>> pickle.dumps(obj)
"ccopy_reg\n_reconstructor\np0\n(c__main__\nInheritsList\np1\ncdill.dill\n_load_type\np2\n(S'ListType'\np3\ntp4\nRp5\n(lp6\nS'string'\np7\natp8\nRp9\n."
>>>
>>> dill.dumps(obj, byref=True)
'\x80\x02c__main__\nInheritsList\nq\x00)\x81q\x01U\x06stringq\x02a}q\x03b.'
>>>
>>> dill.loads(dill.dumps(obj, byref=True))
['string']
This is fine for many cases, but not if you have an interactively defined class that you want to serialize and keep persistent across interpreter sessions. For those cases, pickle fails, while dill will serialize the entire code for the class, so it should work to serialize the object to a file, start up a new interpreter, and unpickle the object. The setting byref=False is the default for dill.dumps.
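As a rough illustration of what "by reference" means, here is a minimal sketch using only the standard library pickle. The module name byref_demo and the Replacement class are made up for the example; the module is registered by hand so the sketch does not depend on __main__.

```python
import pickle
import sys
import types

# "byref_demo" is a hypothetical module name, registered by hand so the
# example does not depend on how the script is launched.
demo = types.ModuleType("byref_demo")
sys.modules["byref_demo"] = demo


class InheritsList(list):
    def append(self, obj):
        super(InheritsList, self).append(obj)


InheritsList.__module__ = "byref_demo"
demo.InheritsList = InheritsList

obj = InheritsList()
obj.append("string")
payload = pickle.dumps(obj, protocol=2)

# The payload names the module and the class, but carries no method code.
names_class = b"byref_demo" in payload and b"InheritsList" in payload


class Replacement(list):
    """A different class bound to the same module attribute."""


demo.InheritsList = Replacement

# Loading resolves the stored name again, so it now finds Replacement.
restored = pickle.loads(payload)
```

Since only the names travel in the payload, whatever class is bound to that name at load time is the one that gets used, which is also why this cannot work for a class that no longer exists in a fresh interpreter session.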
It does appear that while the object serializes, the super call causes problems when deserializing… and thus is worth some investigation.
>>> dill.dumps(obj, byref=False)
'\x80\x02cdill.dill\n_create_type\nq\x00(cdill.dill\n_load_type\nq\x01U\x08TypeTypeq\x02\x85q\x03Rq\x04U\x0cInheritsListq\x05h\x01U\x08ListTypeq\x06\x85q\x07Rq\x08\x85q\t}q\n(U\r__slotnames__q\x0b]q\x0cU\n__module__q\rU\x08__main__q\x0eU\x08__init__q\x0fh\x01U\x0cFunctionTypeq\x10\x85q\x11Rq\x12(cdill.dill\n_unmarshal\nq\x13U\x94c\x01\x00\x00\x00\x01\x00\x00\x00\x03\x00\x00\x00C\x00\x00\x00s\x17\x00\x00\x00t\x00\x00t\x01\x00|\x00\x00\x83\x02\x00j\x02\x00\x83\x00\x00\x01d\x00\x00S(\x01\x00\x00\x00N(\x03\x00\x00\x00t\x05\x00\x00\x00supert\x0c\x00\x00\x00InheritsListt\x08\x00\x00\x00__init__(\x01\x00\x00\x00t\x04\x00\x00\x00self(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>R\x02\x00\x00\x00\x03\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x14\x85q\x15Rq\x16c__builtin__\n__main__\nh\x0fNNtq\x17Rq\x18U\x07__doc__q\x19U\x04blahq\x1aU\x06appendq\x1bh\x12(h\x13U\x9dc\x02\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00C\x00\x00\x00s\x1a\x00\x00\x00t\x00\x00t\x01\x00|\x00\x00\x83\x02\x00j\x02\x00|\x01\x00\x83\x01\x00\x01d\x00\x00S(\x01\x00\x00\x00N(\x03\x00\x00\x00t\x05\x00\x00\x00supert\x0c\x00\x00\x00InheritsListt\x06\x00\x00\x00append(\x02\x00\x00\x00t\x04\x00\x00\x00selft\x03\x00\x00\x00obj(\x00\x00\x00\x00(\x00\x00\x00\x00s\x07\x00\x00\x00<stdin>R\x02\x00\x00\x00\x05\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x1c\x85q\x1dRq\x1ec__builtin__\n__main__\nh\x1bNNtq\x1fRq utq!Rq")\x81q#U\x06stringq$a}q%b.'
>>> dill.loads(dill.dumps(obj, byref=False))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2b2.dev-py2.7.egg/dill/dill.py", line 138, in loads
return load(file)
File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2b2.dev-py2.7.egg/dill/dill.py", line 131, in load
obj = pik.load()
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1182, in load_append
list.append(value)
File "<stdin>", line 6, in append
TypeError: super(type, obj): obj must be an instance or subtype of type
>>>
from dill.
Thanks for making the title more specific. The deserialization is indeed the issue and my code snippet is basically a boiled down version of https://github.com/opencobra/cobrapy/blob/master/cobra/core/DictList.py#L4 which I don't control. Anyways, I am just trying to get some code parallelized using IPython.parallel and would prefer using dill for the serialization using .use_dill() because dill is just awesome. However, that also means that I don't have low level control over how IPython handles dill.
I just discovered that dill==0.2a1 (installed via pip) does not throw the exception that you've posted above while 0.2b1 does. Hope this helps.
from dill.
I'm glad you like dill. Yes, in dill=0.2b1 I set byref=False as the default. If there's no way to pass that flag to IPython (apparently not), then it'll have to wait until the issue outlined above gets resolved. Maybe there's a better approach; having to use the byref=True flag sometimes is annoying.
from dill.
Some notes on follow-up…
>>> class Bar(list):
...     def __init__(self):
...         super(Bar, self).__init__()
...
>>> b = Bar()
>>>
>>> dill.loads(dill.dumps(Bar))
<class '__main__.Bar'>
>>>
>>> Bar.mro()
[<class '__main__.Bar'>, <type 'list'>, <type 'object'>]
>>> dill.loads(dill.dumps(Bar)).mro()
[<class '__main__.Bar'>, <type 'list'>, <type 'object'>]
>>>
>>> dill.loads(dill.dumps(b))
[]
>>>
>>> b.append('foo')
>>> b
['foo']
>>> dill.loads(dill.dumps(b))
['foo']
So far so good.
>>> class Foo(list):
...     def __init__(self):
...         super(Foo, self).__init__()
...     def count(self, obj):
...         return super(Foo, self).count(obj)
...
>>> f = Foo()
>>> dill.loads(dill.dumps(Foo))
<class '__main__.Foo'>
>>>
>>> f.extend([1,2,3])
>>> f
[1, 2, 3]
>>> dill.loads(dill.dumps(f))
[1, 2, 3]
>>> f.count(2)
1
>>> dill.loads(dill.dumps(f)).count(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in count
TypeError: super(type, obj): obj must be an instance or subtype of type
Ah, that's the issue.
>>> class Test(list):
...     def append(self, obj):
...         super(Test, self).append(obj)
...
>>> t = Test()
>>> t.append('foo')
>>>
>>> class Test(list):
...     def append(me, obj):
...         super(Test, me).append(obj)
...
>>> t.append('bar')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in append
TypeError: super(type, obj): obj must be an instance or subtype of type
So apparently, when dill is serializing the class code… the class code is being recompiled, and any existing instances are considered "stale" relative to the new class. So that's what needs a fix...
from dill.
Looks like pickle does this:
def find_class(self, module, name):
    # Subclasses may override this
    __import__(module)
    mod = sys.modules[module]
    klass = getattr(mod, name)
    return klass
so, to make this work for classes in cases like the one in this issue, I'd either have to update the reference in the instance to point to the new class definition... or build a new instance from the new class definition, and then transfer the state from the existing (serialized) instance. Both are tricky, but feasible, I think.
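The second option (build a new instance and transfer the state) could look roughly like the sketch below. It runs without dill: redefining Foo stands in for the class being recreated on load, and rebuild is a hypothetical helper that only handles list subclasses with a plain __dict__.

```python
class Foo(list):
    def count(self, obj):
        return super(Foo, self).count(obj)


stale = Foo([1, 2, 3])
stale.tag = "kept"  # instance attribute, to show the state transfer


class Foo(list):  # redefinition: the name Foo now means a new class
    def count(self, obj):
        return super(Foo, self).count(obj)


def rebuild(instance, new_cls):
    """Hypothetical helper: copy a list-like instance into new_cls."""
    fresh = new_cls.__new__(new_cls)
    list.extend(fresh, instance)           # transfer the list contents
    fresh.__dict__.update(vars(instance))  # transfer instance attributes
    return fresh


# The old instance fails, because super(Foo, ...) now sees the new Foo.
try:
    stale.count(2)
    stale_failed = False
except TypeError:
    stale_failed = True

fresh = rebuild(stale, Foo)
```

The rebuilt object is an instance of the current Foo, so fresh.count(2) goes through super without the TypeError, at the cost of ending up with a different object than the one that was loaded.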
from dill.
Changing this to an "enhancement" instead of a "bug". What might be considered a bug is to have byref=False be the default value for dumps.
from dill.
A simple fix for Foo is:
>>> isinstance(f, Foo)
False
>>> f.__class__ = Foo
>>> isinstance(f, Foo)
True
>>> f.count(2)
1
or possibly better, since dill already has a handle to __main__:
>>> import __main__ as _main_module
>>> isinstance(f, Foo)
False
>>> f.__class__ = _main_module.Foo
>>> isinstance(f, Foo)
True
>>> f.count(2)
1
from dill.
then in load, we'd have something like:
obj.__class__ = getattr(_main_module, type(obj).__name__)
after the object is loaded, but before it's returned.
Better than that might be to push the re-referencing back down to the save_reduce level.
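A minimal sketch of that post-load re-referencing step, written as a standalone function. This is not dill's actual code: rereference is a hypothetical helper, main_like is a stand-in for the _main_module handle, and redefining Foo simulates the class having been recreated.

```python
import types


def rereference(obj, module):
    """Hypothetical helper: point obj at the class currently bound in module."""
    current = getattr(module, type(obj).__name__, None)
    if isinstance(current, type) and current is not type(obj):
        obj.__class__ = current  # raises TypeError if the layouts differ
    return obj


class Foo(list):
    def count(self, obj):
        return super(Foo, self).count(obj)


f = Foo([1, 2, 3])


class Foo(list):  # the redefined class that the name Foo now refers to
    def count(self, obj):
        return super(Foo, self).count(obj)


# Stand-in for dill's handle to __main__ (called _main_module in the thread).
main_like = types.ModuleType("main_like")
main_like.Foo = Foo

was_instance = isinstance(f, Foo)  # False before the fix
rereference(f, main_like)
```

After the call, isinstance(f, Foo) holds and f.count(2) works again. The lookup is by class name only, so the sketch assumes the name in the module really is the intended replacement.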
from dill.
should be 'fixed' in 07be939
from dill.
Strange: with Python 2.7 and dill 2.7.1, loading classes that use super still fails. This seems to be related to the above?
> class List_Wrap(list):
>     def __init__(self):
>         super(List_Wrap,self).__init__()
>         return
> def test_dill():
>     a_dill=dill.dumps(List_Wrap)
>     AB=dill.loads(a_dill)
>     AB()
>     return
Still getting the error:
super(List_Wrap,self).__init__()
TypeError: super(type, obj): obj must be an instance or subtype of type
Update: the fix would be to run:
globals()[AB.__name__] = AB
before AB().
Sorry, I'm not sure why that fixed it.
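One plausible explanation, assuming the deserialized class is a new class object whose methods still look up the name List_Wrap in the module globals: super(List_Wrap, self) resolves List_Wrap at call time, so it keeps finding the original class until the name is rebound. The sketch below simulates this without dill by cloning the class with type(); the clone is only a stand-in for dill.loads(dill.dumps(List_Wrap)).

```python
class List_Wrap(list):
    def __init__(self):
        super(List_Wrap, self).__init__()


# Stand-in for the class returned by dill.loads: same name, bases and
# methods, but a distinct class object.
namespace = {
    key: value
    for key, value in vars(List_Wrap).items()
    if key not in ("__dict__", "__weakref__")
}
AB = type(List_Wrap.__name__, List_Wrap.__bases__, namespace)

# __init__ still resolves the global name List_Wrap to the original class,
# and an AB instance is not an instance of that class.
try:
    AB()
    failed_before = False
except TypeError:
    failed_before = True

# Rebinding the global name makes super(List_Wrap, self) see the new class.
globals()[AB.__name__] = AB
instance = AB()
```

With the name rebound, List_Wrap and AB are the same object, so the check inside super passes.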
from dill.
@panamantis: Thanks for the comment. I believe it's a duplicate of this or #75 or #56. Cases where serialization fails in the presence of super have been documented in other issues.
from dill.