dill's Issues

ExitType = type(exit) gives NameError: name 'exit' is not defined

Running a frozen Python application (frozen using cx_Freeze) on Windows throws "NameError: name 'exit' is not defined" at dill.py line 133.

ExitType = type(exit)  

This line is reached (as I understand it) whenever dill is imported outside an IPython shell. Why it only errors when frozen is unclear to me. However, changing this line to:

ExitType = type(sys.exit)  

...appears to solve the problem.
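A defensive variant of the line in question can be sketched as follows (this is an illustration, not dill's actual fix): try the site-provided `exit` first and fall back to `sys.exit`'s type when it is missing, as happens in some frozen executables.

```python
import sys

try:
    ExitType = type(exit)  # `exit` is injected into builtins by site.py
except NameError:
    # Frozen apps may skip site.py, so `exit` can be absent entirely.
    ExitType = type(sys.exit)
```

This keeps the original behavior for normal interpreter runs while avoiding the NameError under cx_Freeze.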

leverage cloudpickle for missing functionality in dill

As mentioned on Stack Overflow, picloud has an impressive pickler that handles numpy objects, closures, and a lot of the other clever tricks that seem to be on the dill TODO list, and it is available under reasonably permissive licensing. Moreover, since picloud is shutting down, it is a propitious time to negotiate alternative licensing. I can't see any mention of it in the commit logs for dill, so it seems worth mentioning here or on the pathos tracker; I don't have an account on the pathos tracker, so I can't raise a ticket there.

Depending, of course, on coding conventions and methods, it might save some time for the dill project.

The code appears to be available only as a tarball (creating a free picloud.com account may be required), but the LGPL conditions are clearly stated therein, so re-use is permissible, although it may not fit entirely with the modified BSD license of dill.

extend dill.source.getsource for non-function-like objects

dill.source.getsource is pretty good about getting the source for imported and interactively defined functions, lambdas, and class methods. It'd be nice to get the source code for classes and class instances and such. Currently, it assumes the inspected object has a func_code attribute.

dumping functions does not store dependencies

If I use this script to serialize something:

import pandas as pd
import dill

def func(x):
    return pd.DataFrame({'a': x})

def func2(x):
    return func(x) + func(x)

with open("out.dill", "w+") as f:
    dill.dump(func2, f)

And load it with:

import dill

with open("out.dill") as f:
    func2 = dill.load(f)

print func2([1,2,3,4,5])

I get

Traceback (most recent call last):
  File "read_test.py", line 6, in <module>
    print func2([1,2,3,4,5])
  File "write_test.py", line 8, in func2
    return func(x) + func(x)
NameError: global name 'func' is not defined

What is the intended way for the user to handle this?

Thanks
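One possible workaround (a sketch, not the project's recommended approach) is to close over the dependency so it travels inside the serialized function's closure instead of being looked up in __main__'s globals at call time; dill can serialize closures. The pandas helper is replaced with a plain list helper here to keep the example self-contained.

```python
def make_func2():
    def func(x):
        return [v * 2 for v in x]      # stand-in for the pandas helper
    def func2(x):
        return func(x) + func(x)       # `func` now lives in the closure
    return func2

func2 = make_func2()
```

Since `func` is a closure cell of `func2`, it is carried along when `func2` is pickled, rather than being resolved by name in the loading process's globals.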

Type's __dict__s are not pickled

Follow-up for #41:

Python 3.4.1 (default, May 19 2014, 17:23:49) 
[GCC 4.9.0 20140507 (prerelease)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import abc, dill
>>> abc.ABCMeta.zzz=1
>>> dill.dump_session()
>>> 
================Restart================
Python 3.4.1 (default, May 19 2014, 17:23:49) 
[GCC 4.9.0 20140507 (prerelease)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session()
>>> abc.ABCMeta
<class 'abc.ABCMeta'>
>>> abc.ABCMeta.zzz
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'ABCMeta' has no attribute 'zzz'

This is due to https://github.com/uqfoundation/dill/blob/master/dill/dill.py#L808-L813 not adding the __dict__ attribute:

>>> dill.dill._trace(True)
>>> dill.dumps(abc.ABCMeta)
T4: <class 'abc.ABCMeta'>
b'\x80\x03cabc\nABCMeta\nq\x00.'

decrease size of pickle for class with byref=False

A pickle of a class with byref=False tends to be large; the serializing function is _dict_from_dictproxy. This often causes the pickle to contain things like copyright; at first glance, the contents look similar to __builtins__.__dict__.

Would it be possible to reduce the size of the pickle for such a class, by removing any unnecessary items that are currently pickled?
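The trimming idea can be sketched like this (an illustrative helper, not dill's API): copy the class __dict__ while dropping values that are identical to entries in builtins, which is exactly the kind of content (e.g. `copyright`) that bloats these pickles.

```python
import builtins

def _trimmed_class_dict(cls):
    # Compare by object identity: anything that *is* a builtins entry
    # (copyright, None, licence text, etc.) is dropped from the copy.
    builtin_ids = {id(v) for v in vars(builtins).values()}
    return {k: v for k, v in vars(cls).items() if id(v) not in builtin_ids}

class Demo:
    answer = 42

trimmed = _trimmed_class_dict(Demo)
```

Identity comparison is deliberate: it removes only objects that would be picklable by reference anyway, never a class's own attributes that merely compare equal to a builtin.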

Unpicklable objects defined in doctests interfere with serialization of lambdas

I found another example where Dill fails to pickle objects when running under doctest. This is a one-line addition to my test-case in #18.

I'm using Python 2.7.6 and Dill eb122e6.

/cc @distobj

import dill as pickle
import doctest

pickle.dill._trace(1)

class SomeUnreferencedUnpicklableClass(object):
    def __reduce__(self):
        raise Exception

unpicklable = SomeUnreferencedUnpicklableClass()

# This works fine outside of Doctest:
serialized = pickle.dumps(lambda x: x)

# This fails because it tries to pickle the unpicklable object:
def tests():
    """
    >>> unpicklable = SomeUnreferencedUnpicklableClass()  # <-- Added since #18
    >>> serialized = pickle.dumps(lambda x: x)
    """
    return

print "\n\nRunning Doctest:"
doctest.testmod()

Output:

F1: <function <lambda> at 0x101f8f848>
F2: <function _create_function at 0x101c96d70>
Co: <code object <lambda> at 0x100474ab0, file "dillbugtwo.py", line 13>
F2: <function _unmarshal at 0x101c96c08>
D1: <dict object at 0x10031ba20>
D2: <dict object at 0x101da9de0>


Running Doctest:
F1: <function <lambda> at 0x101f8f938>
F2: <function _create_function at 0x101c96d70>
Co: <code object <lambda> at 0x101f369b0, file "<doctest __main__.tests[1]>", line 1>
F2: <function _unmarshal at 0x101c96c08>
D2: <dict object at 0x102206120>
F1: <function tests at 0x101f8f848>
Co: <code object tests at 0x100474e30, file "dillbugtwo.py", line 16>
D1: <dict object at 0x10031ba20>
D2: <dict object at 0x10222e940>
**********************************************************************
File "dillbugtwo.py", line 19, in __main__.tests
Failed example:
    serialized = pickle.dumps(lambda x: x)
Exception raised:
    Traceback (most recent call last):
      File "/Users/joshrosen/anaconda/lib/python2.7/doctest.py", line 1289, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__.tests[1]>", line 1, in <module>
        serialized = pickle.dumps(lambda x: x)
      File "/Users/joshrosen/anaconda/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 165, in dumps
        dump(obj, file, protocol, byref)
      File "/Users/joshrosen/anaconda/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 158, in dump
        pik.dump(obj)
      File "/Users/joshrosen/anaconda/lib/python2.7/pickle.py", line 224, in dump
        self.save(obj)
      File "/Users/joshrosen/anaconda/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/Users/joshrosen/anaconda/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 506, in save_function
        obj.__dict__), obj=obj)
      File "/Users/joshrosen/anaconda/lib/python2.7/pickle.py", line 401, in save_reduce
        save(args)
      File "/Users/joshrosen/anaconda/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/Users/joshrosen/anaconda/lib/python2.7/pickle.py", line 562, in save_tuple
        save(element)
      File "/Users/joshrosen/anaconda/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/Users/joshrosen/anaconda/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 538, in save_module_dict
        StockPickler.save_dict(pickler, obj)
      File "/Users/joshrosen/anaconda/lib/python2.7/pickle.py", line 649, in save_dict
        self._batch_setitems(obj.iteritems())
      File "/Users/joshrosen/anaconda/lib/python2.7/pickle.py", line 681, in _batch_setitems
        save(v)
      File "/Users/joshrosen/anaconda/lib/python2.7/pickle.py", line 306, in save
        rv = reduce(self.proto)
      File "dillbugtwo.py", line 8, in __reduce__
        raise Exception
    Exception
**********************************************************************
1 items had failures:
   1 of   2 in __main__.tests
***Test Failed*** 1 failures.

add tutorial

add tutorial to cover major features; should be built white-paper style to demonstrate solving a problem or set of problems

decrease import time by not loading "dill.objects"

Import time can be significantly reduced by not loading "dill.objects". I believe the easiest way is to convert the local imports of dill.objects to strings that will be exec'd upon a call to load_types.
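The deferred-import idea can be sketched as follows (the names here are hypothetical, and `collections` stands in for dill.objects): keep the heavy import as a string and exec it only when load_types() is actually called, so merely importing the package stays cheap.

```python
# Stand-in for the expensive "import dill.objects" line.
_DEFERRED_IMPORT = "import collections as _objects"

def load_types():
    # Pay the import cost only on first explicit request.
    ns = {}
    exec(_DEFERRED_IMPORT, ns)
    return ns['_objects']

objects = load_types()
```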

Should dill provide an option to not overload the pickle registry?

Importing dill registers pickling handlers. Is it bad behavior that dill cannot be imported without side-effects on another module?

This could lead to subtle bugs, when one library does

import dill as pickle    # only needs dill

while another library uses standard pickle and perhaps relies in
some way on the fact that some objects cannot be pickled.

What about this:

import dill
dill.register_with_pickle()
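One way the opt-in idea could work (a sketch of the proposal, not dill's design) is to keep reducers in a private dispatch table and attach it per Pickler instance, so importing the library never mutates the global pickle registry.

```python
import io
import pickle

def _reduce_complex(z):
    # Trivial stand-in reducer, purely for illustration.
    return (complex, (z.real, z.imag))

# Private table of handlers; nothing is registered globally.
_handlers = {complex: _reduce_complex}

def dumps_with_handlers(obj):
    buf = io.BytesIO()
    p = pickle.Pickler(buf)
    p.dispatch_table = _handlers   # scoped to this pickler only
    p.dump(obj)
    return buf.getvalue()

restored = pickle.loads(dumps_with_handlers(3 + 4j))
```

With this design, another library's plain `pickle.dumps` calls are untouched, and any reliance on certain objects being unpicklable is preserved.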

after dump_session/load_session, source.getsource can confuse lambdas?

I've not been able to reproduce this, but I did observe an interpreter session in which two lambdas were built, dill.dump_session / dill.load_session was used, and dill.source.getsource then picked up the wrong lambda. Attempts to reproduce this erroneous behavior in a less ad hoc way have been unsuccessful.

methods in dill.source should not pollute global namespace

dill.source methods should only spawn new objects in the calling namespace when those objects are specifically requested. For example, the following code introduces f, when all that was desired was bar. The code should be "enclosed" in a dummy closure, if nothing else.

>>> def foo(f):
...   def bar(x):
...     return f(x)+x
...   return bar
... 
>>> import math   
>>> zap = foo(math.sin)
>>> import dill
>>> print dill.source.importable(zap)             
from math import sin as f

def bar(x):
  return f(x)+x
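The "dummy closure" fix can be sketched like this (names are hypothetical): exec the recovered source into a private namespace so helper names like `f` never leak into the caller's globals, and return only the requested object.

```python
_src = """
from math import sin as f

def bar(x):
    return f(x) + x
"""

def _import_enclosed(source, name):
    scope = {}            # `f` stays confined to this dict
    exec(source, scope)
    return scope[name]

bar = _import_enclosed(_src, 'bar')
```

`bar` still works because its __globals__ is the private `scope` dict, but the calling namespace only ever receives `bar` itself.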

dill doctests fail when run as `python -m doctest`

I'm encountering problems running doctests via python -m doctest and nosetests --with-doctest. This seems related to #18 (and also occurs in PySpark, FWIW), but differs in that doctest.testmod() executed inside the module doesn't trigger the problem. Sample code:

import dill
import doctest

def test_dill():
    """
    >>> out = dill.dumps(lambda x: x)
    """

    out = dill.dumps(lambda x: x)

doctest.testmod()

So when executed via python -m doctest under Python 2.7.3, I get a long recursive stack trace of save_module_dict and save_module calls, concluding with:

      ...
      File "/home/mbaker/venvs/bibframe/local/lib/python2.7/site-packages/dill/dill.py", line 773, in save_module
        state=_main_dict)
      File "/usr/lib/python2.7/pickle.py", line 419, in save_reduce
        save(state)
      File "/usr/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/home/mbaker/venvs/bibframe/local/lib/python2.7/site-packages/dill/dill.py", line 504, in save_module_dict
        StockPickler.save_dict(pickler, obj)
      File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
        self._batch_setitems(obj.iteritems())
      File "/usr/lib/python2.7/pickle.py", line 681, in _batch_setitems
        save(v)
      File "/usr/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/home/mbaker/venvs/bibframe/local/lib/python2.7/site-packages/dill/dill.py", line 816, in save_type
        StockPickler.save_global(pickler, obj)
      File "/usr/lib/python2.7/pickle.py", line 748, in save_global
        (obj, module, name))
    PicklingError: Can't pickle <class 'unittest.util.Mismatch'>: it's not found as unittest.util.Mismatch

Can't dump_session with inline plots

dump_session() seems to fail when used in an IPython notebook set up for inline plots. When the %matplotlib inline magic is used you get a traceback resulting in
PicklingError: Can't pickle 'RendererAgg' object: <RendererAgg object at 0x42a2bb8>
and when using the --pylab=inline option at start up you get a traceback resulting in
PicklingError: Can't pickle <class 'matplotlib.axes.AxesSubplot'>: it's not found as matplotlib.axes.AxesSubplot

Using Dill 0.2a.dev, ipython 1.1.0

Cannot read arguments from terminal when using dill

Using the master branch of dill at the latest commit eb96239, I cannot read command-line arguments in a Python script, for example test_args.py:

import sys
import dill

if __name__ == "__main__":
    print sys.argv[1]

and run test_args.py in a terminal:

$ python test_args.py d
usage: PROG [-h]
PROG: error: unrecognized arguments: d

When I remove import dill, it works.

$ python test_args.py d
d

I am using Python 2.7.3 on Ubuntu 12.04 LTS.

Add Travis tests

It will be easier to keep track of whether or not pull requests break anything if you test things with Travis. What test framework do you prefer to use to run the tests?

installation issue with easy_install and pip

Hello, I have been trying to install or upgrade pip with easy_install, but I keep getting this error:
easy_install pip
Traceback (most recent call last):
File "/Users/djibrilkeita/bin/easy_install", line 5, in
from pkg_resources import load_entry_point
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 2603, in
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 666, in require
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 565, in resolve
pkg_resources.DistributionNotFound: distribute==0.6.49

I have deleted all the easy_install versions in /usr/bin, /usr/local/bin, and /Users/djibrilkeita/bin and reinstalled setuptools, but it still doesn't work. Any suggestions?

select pickling new-style classes by reference or by code

Generally it's more robust to pickle new-style classes by reference (as pickle does), except when the class definition is changing (or being deleted); dill serializes the class definition instead of using a reference. However, for some cases it may be better to serialize by reference, and the resulting pickle is also much smaller. It would be good to be able to select how a new-style class is serialized.
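The size argument can be seen with the stock pickler, which always stores new-style classes by reference: the pickle contains little more than the module and qualified name, so it stays tiny.

```python
import collections
import pickle

# By reference: the stream encodes roughly "collections.OrderedDict"
# plus framing, nothing from the class body itself.
by_ref = pickle.dumps(collections.OrderedDict)
```

By contrast, serializing the class definition pulls in its entire __dict__, which is where the large byref=False pickles come from.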

desired behavior for file handles?

Loading a pickled file handle can create a file. Is that what is desired? If so, what should the file-pointer position in the file be? Right now, the state of the file is preserved, as well as the mode and the position. Here are the consequences...

dude@hilbert>$ python
Python 2.7.8 (default, Jul  3 2014, 05:59:29) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> f = open('test.txt', 'w+')
>>> f.write('hello')
>>> dill.dump_session('test.pkl')
>>> 

Deleting the file and then loading the session simulates going to another computer where test.txt does not exist.

dude@hilbert>$ rm test.txt 
remove test.txt? y
dude@hilbert>$ python      
Python 2.7.8 (default, Jul  3 2014, 05:59:29) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('test.pkl')
>>> f
<open file 'test.txt', mode 'w+' at 0x1096fac90>
>>> f.write('world')
>>> f.seek(0)
>>> f.read()
'\x00\x00\x00\x00\x00world'
>>> 

Similarly, if 'test.txt' existed, but had the contents "goodbye" instead of "hello"… it gets even nastier.

dude@hilbert>$ python
Python 2.7.8 (default, Jul  3 2014, 05:59:29) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> g = open('test.txt', 'r')
>>> g.read()
'goodbye\n'
>>> dill.load_session('test.pkl')
>>> f
<open file 'test.txt', mode 'w+' at 0x100e69d20>
>>> f.write('world')
>>> f.seek(0)
>>> f.read()
'\x00\x00\x00\x00\x00world'
>>> g.seek(0)
>>> g.read()
'\x00\x00\x00\x00\x00world'
>>> 

This indicates that dill is actually serializing more than the file handle; it's in essence serializing the file, but with "filler" if it's creating a new file and the position is greater than zero. Is that the desired behavior? Would it be better that, if a new file is created, the position be reset with seek(0)? Or that, if the file doesn't exist, the file handle be closed?
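The seek(0) alternative raised above could look like this (a policy sketch with a hypothetical helper name, not dill's behavior): when the target file no longer exists, recreate it empty and ignore the saved offset instead of padding with NUL bytes.

```python
import os

def reopen_handle(name, mode, saved_position):
    if os.path.exists(name):
        f = open(name, mode)
        f.seek(saved_position)     # file survives: restore the position
    else:
        f = open(name, 'w+')       # file is gone: start from scratch at 0
    return f
```

Under this policy, the "\x00\x00\x00\x00\x00world" surprise above becomes plain "world" on the machine where test.txt did not exist.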

Dill is verbose under nosetests

mrocklin@notebook$ cat testdill.py
import dill

def test_dill():
    assert dill.dumps({'foo': 'bar'})

mrocklin@notebook$ nosetests testdill.py
D2: <dict object at 0x17ed1d0>
.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK

Note the line output during nosetests.

D2: <dict object at 0x17ed1d0>

For larger examples this swamps my screen. It doesn't seem to happen during normal execution. Perhaps this is going out to stderr?

attributes not pickled in subclasses of numpy.ndarrays

forked from question on issue #13.

Looks like when you import a subclassed numpy.ndarray, it only routes through the StockPickler.

>>> from numpy_new import *
{'color': 'green'}
B2: <built-in function _reconstruct>
T4: <class 'numpy_new.TestArray'>
T4: <type 'numpy.dtype'>
{}
B2: <built-in function _reconstruct>
T4: <class 'numpy_new.TestArray'>
T4: <type 'numpy.dtype'>
{}
{'color': 'green'}
B2: <built-in function _reconstruct>
T4: <class 'numpy_new.TestArray'>
T4: <type 'numpy.dtype'>
{}
B2: <built-in function _reconstruct>
T4: <class 'numpy_new.TestArray'>
T4: <type 'numpy.dtype'>
{}

function pickles are only quasi-deterministic, does this matter?

Pickles of functions (and pickles of things that contain functions, like classes) are not quite deterministic: they depend on the iteration order of the _reverse_typemap inside dill.py. Depending on the order, either the symbol "LambdaType" or "FunctionType" will be used to represent functions. Either works as far as unpickling goes, but having different representations of the same value can cause trouble with, e.g., caching.

While most invocations of Python 2.x yield the same iteration order for _reverse_typemap, use of the -R flag (recommended for user-facing services; cf. http://www.ocert.org/advisories/ocert-2011-003.html) randomizes this order.

Note that the functionality of -R is on by default for versions >= 3.3:
http://docs.python.org/3/whatsnew/3.3.html

dill can't pickle functions wrapped with ipythons @require

If you have a function like this:

@require("module_name")
def require_test(x):
    return True

and then try to use IPython parallel's parallel map, you get this error:

  File "/n/home05/kirchner/anaconda/envs/gemini/lib/python2.7/site-packages/IPython/kernel/zmq/serialize.py", line 102, in serialize_object
    buffers.insert(0, pickle.dumps(cobj,-1))
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 419, in save_reduce
    save(state)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/site-packages/dill-0.2a2.dev-py2.7.egg/dill/dill.py", line 443, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 681, in _batch_setitems
    save(v)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 419, in save_reduce
    save(state)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/site-packages/dill-0.2a2.dev-py2.7.egg/dill/dill.py", line 443, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 681, in _batch_setitems
    save(v)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/site-packages/dill-0.2a2.dev-py2.7.egg/dill/dill.py", line 421, in save_function
    obj.func_closure), obj=obj)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 405, in save_reduce
    self.memoize(obj)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 244, in memoize
    assert id(obj) not in self.memo

But if you just use the regular pickle, it works fine. I have a minimal example here:

https://github.com/roryk/ipython-cluster-helper/blob/master/example/example.py

Do you have any thoughts about why? I dug around a bunch but quickly got out of my depth. :)

source.importable fails for double-lambdas

Works when doit and squared are defined in a file. Misidentifies squared when they are built in the interpreter.

>>> doit = lambda f: lambda x: f(x)**2
>>> @doit
... def squared(x):
...   return x
... 

installation issues with pip and easy_install

This is off of a clean environment with Python 2.7.6 and pip.

(test-pathos)mrocklin@linux2:~$ pip install dill
Downloading/unpacking dill
  Could not find a version that satisfies the requirement dill (from versions: 0.2a1, 0.2a1, 0.1a1)
Cleaning up...
No distributions matching the version for dill

Maybe I'm doing something overly naive. This is what I as a novice user would expect to work though.

New release?

I noticed that the version installed via pip install dill (I think 0.1) is broken. The github version seems to work. Any plans on making a new pypi release soon?

Dump of decorated function in module fails

To reproduce:
Place this code in a file:

def f(func):
    def w(*args):
        return func(*args)
    return w

@f
def f2(): pass

import dill

print(dill.dumps(f2))

Then, in an interactive session or another file, import that file.
I get a crash in both versions of Python with the latest source, which looks like:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "./debug.py", line 11, in <module>
    print(dill.dumps(f2))
  File "./dill/dill.py", line 130, in dumps
    dump(obj, file, protocol, byref)
  File "./dill/dill.py", line 123, in dump
    pik.dump(obj)
  File "/usr/lib/python3.3/pickle.py", line 235, in dump
    self.save(obj)
  File "/usr/lib/python3.3/pickle.py", line 297, in save
    f(self, obj) # Call unbound method with explicit self
  File "./dill/dill.py", line 438, in save_function
    obj.__closure__), obj=obj)
  File "/usr/lib/python3.3/pickle.py", line 416, in save_reduce
    self.memoize(obj)
  File "/usr/lib/python3.3/pickle.py", line 255, in memoize
    assert id(obj) not in self.memo
AssertionError

It looks like it has something to do with pickling the __code__ attribute and then repickling it when doing the __globals__ attribute, since if you comment out the __globals__ attribute in dill.py, there is no crash (just an incomplete pickle, presumably).

Builtin modules do not have the __dict__ pickled

Follow-up for #41, number 2:

>>> import dill
>>> import numpy as np
>>> np.min = np.max
>>> dill.dump_session()
>>> ^D
############ restart ############
>>> import dill
>>> dill.load_session()
>>> np.min([1,2,3,4,5])
1

This could be fixed by deciding to pickle a module's __dict__ (https://github.com/uqfoundation/dill/blob/master/dill/dill.py#L763-L764) using a function like:

def _can_pickle_module(mod, _cache={}, _recursion_protection=[None]):
    if _recursion_protection[0] is not None:
        return mod is _recursion_protection[0]
    if mod not in _cache:
        _recursion_protection[0] = mod
        try:
            dumps(mod)
        except:
            _cache[mod] = False
        else:
            _cache[mod] = True
        finally:
            _recursion_protection[0] = None
    return _cache[mod]

and the condition in save_module being:

if _can_pickle_module(obj) or is_dill(pickler) and obj is pickler._main_module:

However, this approach is less efficient. This needs discussion.

dynamically generated class methods fail to pickle

When using full class pickling (e.g. cls.__module__ = '__main__'), pickling dynamically generated class methods fails with "maximum recursion depth exceeded".
Looks like a separate issue from #56. Maybe it's due to handling of the circular references?
Class.__dict__ --> method ; method.im_class --> Class

test case:

import dill
import types

def proto_method(self):
    pass

def make_class(name):
    cls = type(name, (object,), dict())
    setattr(cls, 'methodname', types.MethodType(proto_method, None, cls))
    globals()[name] = cls
    return cls

if __name__ == '__main__':
    NewCls = make_class('NewCls')
    print(dill.pickles(NewCls))

dill.detect.trace:

INFO:dill:T2: <class '__main__.NewCls'>
INFO:dill:F2: <function _create_type at 0x7fe706a281b8>
INFO:dill:T1: <type 'type'>
INFO:dill:F2: <function _load_type at 0x7fe706a28140>
INFO:dill:T1: <type 'object'>
INFO:dill:D2: <dict object at 0x7fe706a31e88>
INFO:dill:Me: <unbound method NewCls.proto_method>
INFO:dill:T1: <type 'instancemethod'>
INFO:dill:F1: <function proto_method at 0x7fe70f2a1a28>
INFO:dill:F2: <function _create_function at 0x7fe706a28230>
INFO:dill:Co: <code object proto_method at 0x7fe710832530, file "/path/test.py", line 4>
INFO:dill:F2: <function _unmarshal at 0x7fe706a280c8>
INFO:dill:D4: <dict object at 0x7fe706a32280>
INFO:dill:D2: <dict object at 0x7fe706a1ab40>
INFO:dill:T2: <class '__main__.NewCls'>
INFO:dill:D2: <dict object at 0x7fe706a22c58>
INFO:dill:Me: <unbound method NewCls.proto_method>
INFO:dill:T2: <class '__main__.NewCls'>
INFO:dill:D2: <dict object at 0x7fe706a20b40>
INFO:dill:Me: <unbound method NewCls.proto_method>
INFO:dill:T2: <class '__main__.NewCls'>
INFO:dill:D2: <dict object at 0x7fe706a22d70>
INFO:dill:Me: <unbound method NewCls.proto_method>
...etc...
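For reference, here is a Python 3 translation of the test case above; Python 3 has no unbound methods (and no im_class back-pointer from a method to its class), so the types.MethodType(proto_method, None, cls) step becomes a plain attribute assignment. This is a sketch for comparison, not code from the issue:

```python
def proto_method(self):
    return 'called'

def make_class(name):
    # In Python 3, storing the plain function on the class is enough:
    # it becomes an ordinary method on instances.
    cls = type(name, (object,), {})
    setattr(cls, 'methodname', proto_method)
    globals()[name] = cls
    return cls

NewCls = make_class('NewCls')
result = NewCls().methodname()
```

The Class.__dict__ --> method --> im_class cycle suspected above therefore cannot arise in Python 3.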

Python 3 support

Python 3 support would be great. I recommend using a single codebase. You just have to add a few compatibility definitions (or you can depend on six, but IMHO it's overkill).
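For context, the compatibility definitions in question are usually just a few lines. A sketch of the kind of shims meant here (not dill's actual compatibility layer):

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    string_types = (str,)
    def func_code(func):
        return func.__code__
else:
    string_types = (basestring,)  # noqa: F821 -- Python 2 only
    def func_code(func):
        return func.func_code
```

Code elsewhere then tests `isinstance(x, string_types)` or calls `func_code(f)` instead of branching on the interpreter version at each use site.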

is os.devnull good choice for nonexistent files?

From #57: when a file is dumped, then the file is deleted, then the pickled file is loaded, and the file_mode is not such that the entire file was pickled (e.g. just the file handle was pickled), a new name is needed. Should it be os.devnull? Look at what python does for similar cases, if possible.

importable vomits entire file on closure built with inner lambda

There's a bug (I'm sure more than one) such that when you make a closure with a lambda as the inner function (as in foo below), dill.source.importable(bar) will puke out the entire history file. So that's not good. The following code reproduces the error.

>>> def foo(f):
...   squared = lambda x: f(x)**2
...   return squared
... 
>>> @foo
... def bar(x):
...   return 2*x
>>> 
>>> print dill.source.importable(bar)

should dump_session accept a list/dict of objects to ignore?

I'm not sure what kind of impact ignoring an object might have if one then expects to start up the session again and have everything work. Maybe it's not up to dill to care, and it's the user's problem if it blows things up in the dump/load of the session.
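One possible shape for such an API, sketched with stdlib pickle rather than dill (the function name `dump_session_filtered` and the `exclude` parameter are hypothetical): skip excluded names, and silently drop anything that fails to pickle, leaving the consequences to the user as discussed above.

```python
import io
import pickle
import sys

def dump_session_filtered(stream, exclude=()):
    """Pickle the picklable, non-excluded top-level names of __main__."""
    main = sys.modules['__main__']
    state = {}
    for name, value in list(vars(main).items()):
        if name in exclude or name.startswith('__'):
            continue
        try:
            pickle.dumps(value)
        except Exception:
            continue  # unpicklable: the user's problem, per the discussion
        state[name] = value
    pickle.dump(state, stream)

# tiny demonstration
sys.modules['__main__'].kept = 41
sys.modules['__main__'].ignored = 42
buf = io.BytesIO()
dump_session_filtered(buf, exclude=('ignored',))
restored = pickle.loads(buf.getvalue())
```

Loading such a dump restores a partial namespace, so code depending on an excluded object fails at use time rather than load time.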

Setting of _main_module

I've been using dill via the direct dill.Pickler() and dill.Unpickler() interface. This was crashing because the normal constructors don't set dill.Pickler._main_module and dill.Unpickler._main_module. This seems unusual, and I'm wondering what the rationale for it is.

I added the following __init__() methods to dill.Pickler and dill.Unpickler, and they seemed to solve my problem. Am I missing something here? Should this change be incorporated into the github repo?

### Extend the Picklers
class Pickler(StockPickler):
    """python's Pickler extended to interpreter sessions"""
    dispatch = StockPickler.dispatch.copy()
    _main_module = None
    _session = False
    pass

    def __init__(self, *args, **kwargs):
        StockPickler.__init__(self, *args, **kwargs)
        self._main_module = _main_module

class Unpickler(StockUnpickler):
    """python's Unpickler extended to interpreter sessions and more types"""
    _main_module = None
    _session = False

    def find_class(self, module, name):
        if (module, name) == ('__builtin__', '__main__'):
            return self._main_module.__dict__ #XXX: above set w/save_module_dict
        return StockUnpickler.find_class(self, module, name)
    pass

    def __init__(self, *args, **kwargs):
        StockUnpickler.__init__(self, *args, **kwargs)
        self._main_module = _main_module

dill.source.getsource() does not return expected results in IPython terminal or Notebook

dill.source.getsource(some_function, enclosing=True) appears to return incorrect output when called from within an IPython terminal or Notebook session. The results differ from those of an ordinary Python terminal.

I'm running Python 2.7.7 |Anaconda 2.0.1 (64-bit)| (default, Jun 11 2014, 10:40:02) [MSC v.1500 64 bit (AMD64)] on Windows 7 professional.

Missing `pickler._session` attribute

Code like this:

import pickle
import dill
...
pickle.dumps(f)

Yields an error like this:

  File "/home/mrocklin/Software/anaconda/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/home/mrocklin/Software/anaconda/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/home/mrocklin/Software/anaconda/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/mrocklin/Software/anaconda/lib/python2.7/pickle.py", line 419, in save_reduce
    save(state)
  File "/home/mrocklin/Software/anaconda/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/mrocklin/Software/anaconda/lib/python2.7/site-packages/dill/dill.py", line 501, in save_module_dict
    if pickler._session: # we only care about session the first pass thru
AttributeError: Pickler instance has no attribute '_session'
In [1]: import dill

In [2]: dill.__version__ 
Out[2]: '0.2'
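Until the attribute is set in the constructors, a defensive workaround on the dill side would be to read the flag with a getattr default, so that stock pickle.Pickler instances (which never had _session assigned) don't crash. A sketch of the guard, not dill's actual fix:

```python
import io
import pickle

def session_flag(pickler):
    # A plain pickle.Pickler has no _session attribute; getattr with a
    # default returns False instead of raising AttributeError.
    return getattr(pickler, '_session', False)

stock = pickle.Pickler(io.BytesIO())
```

With this guard, `session_flag(stock)` quietly reports False for any pickler that never opted into session handling.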

dill fails to pickle classes with super, if byref=False

I stumbled over a weird case where dill fails to serialize an object that pickle handles fine. I guess the issue has something to do with inheriting from a builtin type and the usage of super in overriding append.

import pickle
import dill

class InheritsList(list):
    """docstring for InheritsList"""
    def __init__(self):
        super(InheritsList, self).__init__()

    def append(self, obj):
        super(InheritsList, self).append(obj)

obj = InheritsList()
obj.append('string')

# Works
repickled = pickle.loads(pickle.dumps(obj))
print repickled

# Does not work
redilled = dill.loads(dill.dumps(obj))
print redilled

It's related to an issue I posted here.
opencobra/cobrapy#72

dill.source.getblocks should include decorators

I believe inspect.getsource retrieves code blocks with decorators on decorated functions; however, dill.source.getblocks doesn't. It should, or should at least have the option to do so.

Additionally, there should be an option to return 'enclosing' code (i.e. just the inner function, or the enclosing block as well).
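The claim about inspect.getsource can be checked with a throwaway module written to a real file (getsource needs one). This is a self-contained experiment, not dill code; the module name "tmpmod" and the `deco` decorator are made up:

```python
import importlib.util
import inspect
import os
import tempfile
import textwrap

source = textwrap.dedent("""\
    def deco(f):
        return f

    @deco
    def decorated():
        return 1
""")

fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as handle:
    handle.write(source)
try:
    spec = importlib.util.spec_from_file_location("tmpmod", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    text = inspect.getsource(mod.decorated)
    has_decorator = "@deco" in text
finally:
    os.remove(path)
```

Under CPython 3, a decorated function's co_firstlineno points at its first decorator line, which is why inspect.getsource picks the `@deco` line up.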

functions defined in doctests pickle differently than in scripts/shell

When pickling functions defined inside of doctests, Dill seems to include additional objects in the function closure even if the function doesn't reference them. This can cause the pickling to fail if some of these unreferenced objects are unpicklable; this also adds bloat to the serialized function.

I'm one of the authors of PySpark, a Python API for the Spark cluster computing framework, and I'm trying to use Dill to replace our current function serializer. We're currently using PiCloud's cloudpickle library, which seems to handle these doctest cases properly; I'd like to switch to Dill because it seems to be more actively developed and handles some cases that cloudpickle doesn't handle properly.

Dill seems to work perfectly from the Python shell, but its different behavior in doctests is causing our test suite to break (unpicklable Py4J-wrapped Java objects are included in closures, among other issues).

Here's a small standalone testcase that reproduces the issue on Python 2.7.5:

import dill as pickle
import doctest
import logging
logging.basicConfig(level=logging.DEBUG)

class SomeUnreferencedUnpicklableClass(object):
    def __reduce__(self):
        raise Exception

unpicklable = SomeUnreferencedUnpicklableClass()

# This works fine outside of Doctest:
serialized = pickle.dumps(lambda x: x)

# This fails because it tries to pickle the unpicklable object:
def tests():
    """
    >>> serialized = pickle.dumps(lambda x: x)
    """
    return

print "\n\nRunning Doctest:"
doctest.testmod()

Here's the output, which shows that the unpicklable object is being included in the closure when running under doctest:

F1: <function <lambda> at 0x110b65de8>
INFO:dill:F1: <function <lambda> at 0x110b65de8>
T1: <type 'function'>
INFO:dill:T1: <type 'function'>
F2: <function _load_type at 0x110b626e0>
INFO:dill:F2: <function _load_type at 0x110b626e0>
Co: <code object <lambda> at 0x10ff4d5b0, file "unpickleable.py", line 14>
INFO:dill:Co: <code object <lambda> at 0x10ff4d5b0, file "unpickleable.py", line 14>
F2: <function _unmarshal at 0x110b62668>
INFO:dill:F2: <function _unmarshal at 0x110b62668>
D1: <dict object at 0x7f9401c1aee0>
INFO:dill:D1: <dict object at 0x7f9401c1aee0>


Running Doctest:
F1: <function <lambda> at 0x110b6e398>
INFO:dill:F1: <function <lambda> at 0x110b6e398>
T1: <type 'function'>
INFO:dill:T1: <type 'function'>
F2: <function _load_type at 0x110b626e0>
INFO:dill:F2: <function _load_type at 0x110b626e0>
Co: <code object <lambda> at 0x110b664b0, file "<doctest __main__.tests[0]>", line 1>
INFO:dill:Co: <code object <lambda> at 0x110b664b0, file "<doctest __main__.tests[0]>", line 1>
F2: <function _unmarshal at 0x110b62668>
INFO:dill:F2: <function _unmarshal at 0x110b62668>
D2: <dict object at 0x7f9401e930c0>
INFO:dill:D2: <dict object at 0x7f9401e930c0>
F1: <function tests at 0x110b65de8>
INFO:dill:F1: <function tests at 0x110b65de8>
Co: <code object tests at 0x10ff4d630, file "unpickleable.py", line 17>
INFO:dill:Co: <code object tests at 0x10ff4d630, file "unpickleable.py", line 17>
D1: <dict object at 0x7f9401c1aee0>
INFO:dill:D1: <dict object at 0x7f9401c1aee0>
M2: <module 'logging' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/logging/__init__.py'>
INFO:dill:M2: <module 'logging' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/logging/__init__.py'>
F2: <function _import_module at 0x110b62f50>
INFO:dill:F2: <function _import_module at 0x110b62f50>
M2: <module '__builtin__' (built-in)>
INFO:dill:M2: <module '__builtin__' (built-in)>
M2: <module 'doctest' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py'>
INFO:dill:M2: <module 'doctest' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py'>
T2: <class '__main__.SomeUnreferencedUnpicklableClass'>
INFO:dill:T2: <class '__main__.SomeUnreferencedUnpicklableClass'>
F2: <function _create_type at 0x110b62758>
INFO:dill:F2: <function _create_type at 0x110b62758>
T1: <type 'type'>
INFO:dill:T1: <type 'type'>
T1: <type 'object'>
INFO:dill:T1: <type 'object'>
D2: <dict object at 0x7f9402853920>
INFO:dill:D2: <dict object at 0x7f9402853920>
F1: <function __reduce__ at 0x110b6e1b8>
INFO:dill:F1: <function __reduce__ at 0x110b6e1b8>
Co: <code object __reduce__ at 0x10ff4fa30, file "unpickleable.py", line 8>
INFO:dill:Co: <code object __reduce__ at 0x10ff4fa30, file "unpickleable.py", line 8>
D1: <dict object at 0x7f9401c1aee0>
INFO:dill:D1: <dict object at 0x7f9401c1aee0>
M2: <module 'dill' from '/Users/joshrosen/env/lib/python2.7/site-packages/dill/__init__.pyc'>
INFO:dill:M2: <module 'dill' from '/Users/joshrosen/env/lib/python2.7/site-packages/dill/__init__.pyc'>
**********************************************************************
File "unpickleable.py", line 19, in __main__.tests
Failed example:
    serialized = pickle.dumps(lambda x: x)
Exception raised:
    Traceback (most recent call last):
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py", line 1254, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__.tests[0]>", line 1, in <module>
        serialized = pickle.dumps(lambda x: x)
      File "/Users/joshrosen/env/lib/python2.7/site-packages/dill/dill.py", line 121, in dumps
        dump(obj, file, protocol)
      File "/Users/joshrosen/env/lib/python2.7/site-packages/dill/dill.py", line 115, in dump
        pik.dump(obj)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
        self.save(obj)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/Users/joshrosen/env/lib/python2.7/site-packages/dill/dill.py", line 418, in save_function
        obj.func_closure), obj=obj)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 401, in save_reduce
        save(args)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 562, in save_tuple
        save(element)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/Users/joshrosen/env/lib/python2.7/site-packages/dill/dill.py", line 440, in save_module_dict
        StockPickler.save_dict(pickler, obj)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
        self._batch_setitems(obj.iteritems())
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 681, in _batch_setitems
        save(v)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 306, in save
        rv = reduce(self.proto)
      File "unpickleable.py", line 9, in __reduce__
        raise Exception
    Exception
**********************************************************************
1 items had failures:
   1 of   1 in __main__.tests

dynamic classes not created in __main__ fail to pickle

I would like to generate new classes programmatically at runtime using type(), then serialize these (for use on other computers on a cluster). Is there a way to get dill to pickle these?

Strangely, the classes seem to pickle ok if this is done by the __main__ module, but not by any other module. Minimal test:

classmaker.py:

import dill

def f():
    cls = type('NewCls', (object,), dict())
    print(dill.pickles(cls))

if __name__ == "__main__":
    f()

consumer.py:

import classmaker
classmaker.f()

running these:

$ python classmaker.py
True
$ python consumer.py
False

In the second case the pickling exception is: Can't pickle <class 'classmaker.NewCls'>: it's not found as classmaker.NewCls
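For comparison, stdlib pickle succeeds here as soon as the class is reachable as `module.name` for by-reference lookup. One way to guarantee that for dynamically built classes is to register (or reuse) the defining module in sys.modules explicitly. This is a sketch with a made-up module name ('dynmod'), not a statement about what dill does internally:

```python
import pickle
import sys
import types

def make_class(name, module_name='dynmod'):
    # Ensure the defining module exists in sys.modules, then install the
    # class under its own name so `module_name.name` lookup succeeds.
    mod = sys.modules.setdefault(module_name, types.ModuleType(module_name))
    cls = type(name, (object,), {'__module__': module_name})
    setattr(mod, name, cls)
    return cls

DynCls = make_class('DynCls')
roundtrip = pickle.loads(pickle.dumps(DynCls))
```

Because both dump and load resolve the class through sys.modules['dynmod'], the round trip yields the very same class object.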
