
Comments (9)

RuneScape314159265 commented on July 18, 2024

Firstly, what an epic tool! Super useful when working with Jupyter notebooks that take a long time to complete, where recomputing everything is either a) impossible or b) merely a massive pain. Thank you for making it!

I think it would be great if the hack posted by @mmckerns in this thread were incorporated into the package itself, i.e.

dill.dump_session(fileName, ignore=True)

Produces

Variables:
- var1
- var3
- var5 
could not be pickled and will not be restored. Do you wish to continue (everything else that can be will be stored)? [y/n]: 

The user types yes to continue and dill saves everything about the current environment that it can, ignoring the variables specified.

In a large file with a lot of globals, I imagine that even running the check might take a while (it does in some of my files), so there should probably be an additional flag that lets the user choose whether dill pickles automatically even when it can't pickle everything. The default would be yes; if the user sets it to no and dill can't pickle everything, as above, they would be prompted to confirm whether to continue.

It would be a big quality-of-life improvement for me and, since I'm hardly unique, I'm guessing for many others too.

Going hunting online for a workaround isn't easy (this post, which presents the best solution, is quite hidden away, and I'm guessing many people miss it or don't read through it).

######################################################

P.S.: Until such a time, @mmckerns's fix needs a tiny bit of updating. iteritems was removed in Python 3 (use items instead). Also, map is lazy in Python 3, so on its own it doesn't actually do anything. An easy way to force execution is to turn it into a list. In all:

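import dill
# Remove every global that dill cannot pickle (keeping __builtins__), then save what is left.
list(map(globals().pop, tuple(i for (i,j) in globals().items() if not dill.pickles(j) and i not in ('__builtins__',))))
dill.dump_session("testing.db")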
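
To illustrate the laziness point, here is a minimal sketch that uses only built-ins (the names `seen` and `lazy` are made up for the example). In Python 3, `map` returns an iterator, so the mapped function runs only once something consumes that iterator.

```python
seen = []

# In Python 3, map() returns a lazy iterator: append has not been called yet.
lazy = map(seen.append, ["var1", "var3", "var5"])
print(seen)  # still empty at this point

# Consuming the iterator (here with list()) is what triggers the calls.
list(lazy)
print(seen)  # now holds the three names
```

A plain `for` loop over the names would have the same effect without building a throwaway list.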

from dill.

matsjoyce commented on July 18, 2024

We could replace the objects with a dill.Ignored singleton. It'd be a relatively simple change, but I'm having difficulty visualising a circumstance where it's OK to have a load_session where half the things don't work... But, as you say, it's the user's problem. It would just have to come with a big docstring saying "Use as last resort! The object will NOT work on the other end!".
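
A rough sketch of what such a placeholder could look like. `IgnoredType`, `IGNORED`, and `replace_unpicklable` are hypothetical names for illustration and are not part of dill. The sketch falls back to the standard `pickle` module when dill is not installed, which changes which objects count as unpicklable.

```python
import pickle

try:
    import dill as serializer
except ImportError:
    serializer = pickle  # stand-in so the sketch runs without dill


class IgnoredType:
    """Hypothetical placeholder for an object that could not be pickled."""

    _instance = None

    def __new__(cls):
        # Always hand back the same instance, so `value is IGNORED` works.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __reduce__(self):
        # Unpickling calls IgnoredType() again. That yields the same object
        # only when the class itself is pickled by reference, i.e. when it
        # lives in an importable module.
        return (IgnoredType, ())

    def __repr__(self):
        return "<Ignored: object was not pickled>"


IGNORED = IgnoredType()


def replace_unpicklable(namespace):
    """Return a copy of `namespace` with unpicklable values set to IGNORED."""
    result = {}
    for name, value in namespace.items():
        try:
            serializer.dumps(value)  # trial run; the bytes are discarded
        except Exception:
            value = IGNORED
        result[name] = value
    return result
```

After loading, code on the other end could test `value is IGNORED` before using a variable, which is the "will NOT work on the other end" warning made explicit.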

mmckerns commented on July 18, 2024

It might make sense when there's an isolated object, such as a generator that was created but not used… but in the case that it's a matplotlib plot in an IPython session, and it doesn't serialize, but the user is primarily wanting to capture it… I think it's not so good.

Maybe a better alternative would be dump_session(ignore=True), where dill just "skips" anything that it can't serialize… (i.e. catch all serialization errors, and move on). Then there's only some of the corner cases that blow up on load… and the same could be done there. Is then a session with missing bits worthwhile for the user? That's for the user to decide.
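
The "catch all serialization errors, and move on" idea could be sketched roughly as follows. This is an illustration under assumptions, not dill's implementation: it works on a plain dict rather than on `__main__` as `dump_session` does, the helper name `dump_picklable` is made up, and the standard `pickle` module stands in when dill is not installed.

```python
import io
import pickle

try:
    import dill as serializer
except ImportError:
    serializer = pickle  # stand-in so the sketch runs without dill


def dump_picklable(namespace, file):
    """Write the entries of `namespace` that serialize; return skipped names."""
    kept, skipped = {}, []
    for name, value in namespace.items():
        try:
            serializer.dumps(value)  # trial run on each value
        except Exception:  # catch all serialization errors and move on
            skipped.append(name)
        else:
            kept[name] = value
    serializer.dump(kept, file)
    return skipped


# Usage: the generator is skipped, everything else is written.
buffer = io.BytesIO()
skipped = dump_picklable({"a": 1, "gen": (i for i in range(3))}, buffer)
```

Returning the skipped names leaves it to the caller to decide whether a session with missing bits is acceptable.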

matsjoyce commented on July 18, 2024

Yup, I suppose so. What would be the best way to implement that? Overload Pickler.save and Unpickler.load?

mmckerns commented on July 18, 2024

I think either overload the dump method of the dill.Pickler, or wrap the behavior into the dump_session function call -- probably the former. I could see a pure try-except approach, or an approach driven by the methods in dill.detect. Similarly for load.
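
For the first option, overloading `dump` on a Pickler subclass might look something like this pure try-except sketch. `SkippingPickler` and its `skipped` attribute are hypothetical names, only top-level dict entries are checked, and the standard `pickle` module stands in when dill is not installed.

```python
import io
import pickle

try:
    import dill as serializer
except ImportError:
    serializer = pickle  # stand-in so the sketch runs without dill


class SkippingPickler(serializer.Pickler):
    """Hypothetical Pickler whose dump() drops dict entries that fail."""

    def dump(self, obj):
        self.skipped = []  # names left out of the most recent dump
        if isinstance(obj, dict):
            kept = {}
            for name, value in obj.items():
                try:
                    serializer.dumps(value)  # trial run on each value
                except Exception:
                    self.skipped.append(name)
                else:
                    kept[name] = value
            obj = kept
        super().dump(obj)


buffer = io.BytesIO()
pickler = SkippingPickler(buffer)
pickler.dump({"a": 1, "gen": (i for i in range(3))})
```

The trial run serializes every value twice, so this trades speed for simplicity; an approach driven by dill.detect would replace the try-except with an up-front check.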

mittenchops commented on July 18, 2024

I just asked a StackOverflow question that might be a use case for this:
http://stackoverflow.com/questions/27351980/how-to-add-a-custom-type-to-dills-pickleable-types

matsjoyce commented on July 18, 2024

OK, @mittenchops, question answered. It could be a use case, but remember that if anything on the other end needs your collection, it wouldn't be there.

mmckerns commented on July 18, 2024

Currently when I do a dump_session, I do something like this:

map(globals().pop, tuple(i for (i,j) in globals().iteritems() if not dill.pickles(j) and i not in ('__builtins__',)))

to remove all objects that will not pickle. There's probably a better way to do it, but I tend to at least do some variant of the above on-the-fly. This will not be the most efficient, but will work as long as dill.pickles is correct (which is overwhelmingly most of the time).

leogama commented on July 18, 2024

Hello, I'm working on a new feature like this for dump_session(). It is not always possible or convenient to delete unpicklable objects, or large objects that are cheap to regenerate, from the namespace before saving the session, since they may still be needed afterwards. That is why I consider this a relevant feature.

Before I submit a draft PR, what do you think the API should look like? And how should load_session() behave in this case? Should it simply ignore the unsaved variables, restore them as a dill singleton as suggested, or do that only for variables not already defined in the namespace?
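
The three load-side options can be compared with a small sketch. Everything here is hypothetical (the `restore` function, its `policy` argument, and the `MISSING` placeholder) and it operates on plain dicts; it is not the prototype mentioned in this comment, nor dill's API.

```python
MISSING = object()  # hypothetical stand-in for the proposed dill singleton


def restore(namespace, saved, skipped, policy="ignore"):
    """Update `namespace` from `saved`, handling `skipped` names per `policy`.

    "ignore":      leave skipped names untouched
    "placeholder": set every skipped name to MISSING
    "undefined":   set MISSING only for names not already in the namespace
    """
    if policy not in ("ignore", "placeholder", "undefined"):
        raise ValueError(f"unknown policy: {policy!r}")
    namespace.update(saved)
    for name in skipped:
        if policy == "placeholder":
            namespace[name] = MISSING
        elif policy == "undefined" and name not in namespace:
            namespace[name] = MISSING
    return namespace
```

With "undefined", a live object that is still in the namespace (say, an open figure) survives the load, while names that no longer exist are marked clearly instead of raising a NameError later.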

I already have a working prototype that can deal with IPython's command history variables. 👌🏼
