Comments (8)
This is the intended behavior. It's partially due to historical choices with python 2.x, where just by an import dill
one could then extend the registry so that any python module that used pickle
could utilize dill
serialization without being modified or worked around. If this behavior is not what you want, then there's a function that will remove the injected types from the pickle
registry:
Python 3.8.19 (default, Mar 22 2024, 01:55:18)
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import io
>>> import pickle
>>> from collections import namedtuple
>>> SomeTuple = namedtuple('SomeTuple', 'a b')
>>> thing = SomeTuple(1, 2)
>>> f = io.BytesIO()
>>> pickle._Pickler(f).dump(thing)
>>> before_dill = f.getvalue()
>>> len(pickle._Pickler.dispatch)
15
>>>
>>> import dill
>>> dill.extend(False)
>>> f = io.BytesIO()
>>> pickle._Pickler(f).dump(thing)
>>> after_dill = f.getvalue()
>>> assert before_dill == after_dill
>>> len(pickle._Pickler.dispatch)
15
from dill.
This is an incredibly unfortunate default and I think it should be reconsidered.
As a user I do not control the behavior of third party libraries (dill and otherwise).
I may have neither a direct dependency on dill or pickle and be bitten by this behavior.
Consider the following:
- I use a third party library that uses
pickle._Pickler
internally - I use another third party library that imports dill
- I am not directly use either pickle or dill in my own code
- I use this third party library to send an object to another process which doesn't have dill available in its environment
- Things explode
from dill.
I agree it's an unfortunate default. We looked at the impact of changing a poor decision made over a decade ago in a different ecosystem, and it'd also be very disruptive to reverse it. Hence, the addition of the "extend" function, even it has an unfortunate default.
from dill.
This behavior in dill breaks our software, despite our software not using dill.
We get dill transitively imported in some applications but not others depending on whether another third party library depends on it.
Given it seems this behavior won't be changed in dill, the only path forward for users in my position is to try and convince other library authors to avoid using dill themselves?
from dill.
Thoughts on something like the following in dill/__init__.py
try:
import dill_noextend
except ImportError:
extend()
I thought about an environment variable, but I think the above is more desirable, since it allows users to make the decision once at environment creation time.
from dill.
That's an interesting idea.
I don't understand your particular case, so maybe that would help as well
In general, importing dill
then using extend(False)
should globally drop the dill
extensions to the registry, even if your code doesn't use dll
otherwise. If another library is using dill
for serialization, it also is using a modified Pickler
and Unpickler
(i.e. dill
subclasses those of pickle
)... so if an object is serialized with dill
, one should also have dill
installed when you unpickle the object.
Some thought needs to go into what the impact of a dependency-based global switch is. While it would seem that the right thing to do in the abstract is to not extend the registry by default... an even better solution is to not extend the pickle registry at all (unfortunately, that's not feasible). Practically, regardless of what you do when dill
is imported, either inject into the registry by default or not, modifying the global pickle registry can have an impact on other codes. It's not really a problem specific to dill
-- any code that modifies the global pickle registry by adding a new type is going to cause the same kind of issue, I believe.... so the most conservative thing I can think of is to in your own code, explicitly set a switch to extend the registry when needed, and to explicitly 'clear' the registry when that's needed. dill
is the only code that I know of that gives that as an option (although an improved behavior would be to return the extended types (from dill
or elsewhere) as a dict, then extend
take True
for dill
, or a dict
(returned earlier).
from dill.
I don't understand your particular case, so maybe that would help as well
The scenario is something like this:
- we have a large active codebase with many hundreds of scripts, with new scripts being created frequently
- nothing in our own codebase uses dill
- we have a component that uses
pickle._Pickler
and depends on consistent pickle output between different scripts in our codebase - some scripts depend on third party libraries that transitively import dill
This results in a scenario where sometimes dill is imported and sometimes it is not, which changes the pickle output between scripts.
I don't want to import dill and unregister in the component that uses pickle._Pickler
because that component is also used in environments that don't have dill installed.
So my options are vaguely:
- completely ban dill from the environment and try to get it and any third party libraries that use it uninstalled
- go fix all scripts that transitively import dill to call unregister and hope more don't pop up
- add some try/except dill import to our
_Pickler
using component
None of those are very great - especially the last, since I really don't want to continue propagating the python antipattern of "try to import some thing and silently swallow the exception if it isn't there" within our codebase since it is already a great source of pain.
A better option for us would be to disable this behavior in dill outright. The existence of an empty file in our python environment called dill_noextend.py
that signals to dill "if you are imported, please don't screw with the python pickler" would do exactly that for us.
and now for something completely different
I'm not really sure why this default cannot be changed - it already seems to be not functioning as intended?
import io
import pickle
from collections import namedtuple
SomeTuple = namedtuple('SomeTuple', 'a b')
thing = SomeTuple(1, 2)
f = io.BytesIO()
pickle._Pickler(f).dump(thing)
before_dill = f.getvalue()
import dill
f = io.BytesIO()
pickle.Pickler(f).dump(thing)
after_dill_native = f.getvalue()
assert before_dill == after_dill_native
f = io.BytesIO()
pickle._Pickler(f).dump(thing)
after_dill_python = f.getvalue()
assert before_dill != after_dill_python
dill only extends the pure-python pickler, not the native pickler implementation, which is contrary to how this is described in the docs.
Given that you have to go very much out of your way to use the python pickle implementation, and most people will have the native implementation, I can't imagine very many people are relying on this extend behavior anymore.
from dill.
I believe all serialization packages, dill
included, globally edit the pickle registry by adding new types. python
gives the functionality to do that with copy_reg
. If any of the third party codes use something that edits the pickle registry breaks your usage model, then it would be helpful to understand what your exact issue is. If that's an unreasonable ask, my suggestion can only be that you take action on your end (i.e. change your model, or ban any third party code from editing the pickle registry). However, if you can clarify what your exact issue is, and why it's an issue, there might be potential to come up with a solution that could go into dill
.
Auto-extending the registry originated in python 2.x, and has never extended the native registry. When python switched pickle.Pickler
to use the native pickler, and moved the python pickler to pickle._Pickler
, ultimately from our perspective that was good in that it minimized the impact of any serialization package on the default global registry (i.e. the native one in python 3.x). I agree, it's a corner case to depend on the integrity of the registry by using pickle._Pickler
... so it would help to understand why exactly you are doing that, as to help inform the most appropriate mitigation. I'll reopen this issue, as it would seem there is more to discuss here.
from dill.
Related Issues (20)
- dill.source.getsource returns decorated function instead of function HOT 2
- 0.3.7 incorrectly pickles the class definition for module/class with the same name HOT 7
- Exception: --- Logging error --- HOT 3
- Use dump_module() in jupyter notebook with remote kernel HOT 1
- UnpicklingError using dill but not stdlib pickle HOT 2
- Compatibility with Pyodide HOT 6
- tfp.math.psd_kernels.ExponentiatedQuadratic no longer serialiases with latest versions
- `dill.load_session()` raises `NameError: name 'UnpicklingError' is not defined` HOT 5
- pydantic>=2.5 classes can't be serialized HOT 4
- Failure to load dumped partial function (in older python) properly (in newer python)
- threading.Thread fails to pickle in python > 3.13.0a5 HOT 4
- Error with typing.AbstractContextManager in Python 3.8 and newer dill versions (>=0.3.6) HOT 1
- File gets truncated upon unpickling
- PyTorch C++-generated module "not found as" itself? HOT 1
- Recursive self-references that trigger a RecursionError
- Attempting to `dill` a function defined in a `doctest` run with `pytest` causes a `TypeError: cannot pickle 'EncodedFile' object`.
- NumPy 2.0.0 test regression: `test_diff.py`: `RecursionError: maximum recursion depth exceeded`
- Python 3.13.0b3: `test_detect.py`: `assert len(s) is len(a)` fails HOT 1
- TypeError: 'bytes' object cannot be interpreted as an integer HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dill.