Comments (9)
Firstly, what an epic tool! Super useful when working with Jupyter notebooks that take a long time to complete, where recomputing everything is either a) impossible or b) merely a massive pain. Thank you for making it!
I think it would be great if this ^ above hack were incorporated into the package itself, i.e.
dill.dump_session(fileName, ignore=True)
Produces
Variables:
- var1
- var3
- var5
could not be pickled and will not be restored. Do you wish to continue (everything else that can be will be stored)? [y/n]:
The user types yes to continue and dill saves everything about the current environment that it can, ignoring the variables specified.
In a large file with a lot of globals, I imagine that even running the check might take a while (it does in some of my files), so there should probably be an additional flag that lets the user choose whether dill pickles automatically even if it can't pickle everything. The default would be yes; if the user selects no and dill can't pickle everything, as above, they would be prompted to confirm before continuing.
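A minimal sketch of the confirmation step proposed above, assuming the list of unpicklable names has already been computed. The function name confirm_partial_dump and the parameters auto_continue and ask are hypothetical; nothing here is part of dill.

```python
def confirm_partial_dump(unpicklable, auto_continue=True, ask=input):
    """Return True if the dump should go ahead without the listed names.

    Hypothetical helper: `auto_continue` plays the role of the extra flag
    described above, and `ask` is injectable so the prompt can be tested.
    """
    # Nothing to warn about, or the user opted in to automatic skipping.
    if not unpicklable or auto_continue:
        return True
    listing = "\n".join(f"- {name}" for name in unpicklable)
    prompt = (
        f"Variables:\n{listing}\n"
        "could not be pickled and will not be restored. "
        "Do you wish to continue? [y/n]: "
    )
    return ask(prompt).strip().lower() in ("y", "yes")
```

With the default auto_continue=True the function never prompts, which matches the suggested default of saving whatever can be saved.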
It would be a big quality-of-life improvement for me and, since I'm hardly unique, I'm guessing for many others too.
Hunting online for a workaround isn't easy (this post, which presents the best solution, is quite hidden away, and I'm guessing many people miss it or don't read all the way through).
######################################################
P.S.: Until then, @mmckerns's fix needs a tiny bit of updating. iteritems was removed in Python 3 (use items instead). Also, map is lazy in Python 3, so on its own it never actually calls pop. An easy way to force execution is to wrap it in list. In all:
list(map(globals().pop, tuple(i for (i,j) in globals().items() if not dill.pickles(j) and i not in ('__builtins__',))))
dill.dump_session("testing.db")
from dill.
We could replace the objects with a dill.Ignored singleton. It'd be a relatively simple change, but I'm having difficulty visualising a circumstance where it's OK to have a load_session where half the things don't work... But, as you say, it's the user's problem. It would just have to come with a big docstring saying "Use as last resort! The object will NOT work on the other end!"
It might make sense when there's an isolated object, such as a generator that was created but not used… but in the case that it's a matplotlib plot in an IPython session, and it doesn't serialize, but the user is primarily wanting to capture it… I think it's not so good.
Maybe a better alternative would be dump_session(ignore=True), where dill just "skips" anything that it can't serialize (i.e. catch all serialization errors, and move on). Then only some of the corner cases would blow up on load… and the same could be done there. Is a session with missing bits then worthwhile for the user? That's for the user to decide.
Yup, I suppose so. What would be the best way to implement that? Overload Pickler.save and Unpickler.load?
I think either overload the dump method of the dill.Pickler, or wrap the behavior into the dump_session function call -- probably the former. I could see a pure try-except approach, or an approach driven by the methods in dill.detect. Similarly for load.
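As a rough illustration of the pure try-except approach, the sketch below works on a plain dict rather than a live session, and it is not how dill implements anything. The helpers dump_picklable and load_into are hypothetical names; the only library calls used are dumps, dump and load, which exist in both dill and the standard pickle module (used as a fallback here so the sketch runs without dill installed).

```python
import io
import pickle

try:
    import dill as pickler  # preferred when available
except ImportError:
    pickler = pickle  # stdlib fallback so the sketch still runs


def dump_picklable(namespace, file):
    """Pickle every entry that serializes; return the names that were skipped."""
    kept, skipped = {}, []
    for name, value in namespace.items():
        try:
            pickler.dumps(value)  # trial serialization of this one value
        except Exception:
            skipped.append(name)  # catch any serialization error and move on
        else:
            kept[name] = value
    pickler.dump(kept, file)
    return sorted(skipped)


def load_into(namespace, file):
    """Merge whatever was saved back into `namespace`."""
    namespace.update(pickler.load(file))


# A generator is used as the unpicklable object, as in the discussion above.
session = {"total": 42, "label": "run-1", "pending": (n for n in range(3))}
buffer = io.BytesIO()
skipped = dump_picklable(session, buffer)

buffer.seek(0)
restored = {}
load_into(restored, buffer)
```

Each value is serialized twice (once as a trial, once in the final dump), so this trades speed for simplicity. A Pickler subclass could avoid the duplicate work, at the cost of handling partially written output.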
I just asked a StackOverflow question that might be a use case for this:
http://stackoverflow.com/questions/27351980/how-to-add-a-custom-type-to-dills-pickleable-types
OK, @mittenchops, question answered. It could be a use case, but remember that if anything on the other end needs your collection, it wouldn't be there.
Currently when I do a dump_session, I do something like this:
map(globals().pop, tuple(i for (i,j) in globals().iteritems() if not dill.pickles(j) and i not in ('__builtins__',)))
to remove all objects that will not pickle. There's probably a better way to do it, but I tend to at least do some variant of the above on-the-fly. This will not be the most efficient, but will work as long as dill.pickles is correct (which is overwhelmingly most of the time).
Hello, I'm working on a new feature like this for dump_session(). It is not always possible or convenient to delete unpicklable objects, or large objects that are cheap to regenerate, from the namespace before saving the session, since they may still be needed afterwards, so I consider this a relevant feature.
Before I submit a draft PR, what do you think the API should look like? And how should load_session() behave in this case? Should it simply ignore the variables that were not saved, restore them as a dill singleton as suggested, or do that only for variables not already defined in the namespace?
I already have a working prototype that can deal with IPython's command history variables. 👌🏼
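To make the three load-time options in the question concrete, here is a small sketch over plain dicts. The restore function, the policy names and the MISSING placeholder are all hypothetical and only illustrate the choices being discussed; none of them exist in dill.

```python
MISSING = object()  # hypothetical stand-in for a dill-provided singleton


def restore(namespace, saved, skipped, policy="ignore"):
    """Merge `saved` into `namespace` and handle `skipped` names per `policy`."""
    namespace.update(saved)
    if policy == "placeholder":
        # Every skipped name is bound to the placeholder, even if it exists.
        for name in skipped:
            namespace[name] = MISSING
    elif policy == "placeholder_if_undefined":
        # Existing bindings win; only undefined names get the placeholder.
        for name in skipped:
            namespace.setdefault(name, MISSING)
    elif policy != "ignore":
        raise ValueError(f"unknown policy: {policy!r}")
    return namespace
```

For example, with a live namespace of {"fig": "live"}, saved values {"x": 1} and skipped names ["fig", "gen"], the "placeholder_if_undefined" policy keeps the existing fig and binds only gen to MISSING.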