Query: pickling across SciPy versions (scipy issue, closed)

Prateek1410 commented on June 4, 2024

Query: pickling across SciPy versions

Comments (11)

rkern commented on June 4, 2024 3

The module scipy.sparse._arrays was removed in 1.11.0 (the classes there were reorganized into other private modules). Pickles of these sparse arrays created with 1.10 and earlier cannot be unpickled with later versions. This is one of the fundamental drawbacks of pickles for long-term storage. When pickle serializes an instance, it stores a reference to the precise name of the class, including the module in which it is defined. That means the unpickler won't be able to find the class again if a new version of the library has reorganized the code and moved the class somewhere else. In numpy, it's worth the effort for us to go to extensive lengths to keep old pickles working because it sits at the very bottom of the scientific computing stack and we only have a few classes to worry about, but it's not sustainable to do this for every class in every library.
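
The failure mode described here can be reproduced without SciPy. The following is a minimal sketch using only the standard library; the module name fakelib_private_v1 and the class SparseThing are hypothetical stand-ins for a private module and one of its classes, not real SciPy names.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a private library module.
old_module = types.ModuleType("fakelib_private_v1")


class SparseThing:
    def __init__(self, values):
        self.values = values


# Make the class look as if it were defined at the top level of that module.
SparseThing.__module__ = "fakelib_private_v1"
SparseThing.__qualname__ = "SparseThing"
old_module.SparseThing = SparseThing
sys.modules["fakelib_private_v1"] = old_module

payload = pickle.dumps(SparseThing([1, 2, 3]))

# The byte stream records the module path and the class name, not the class body.
names_are_stored = b"fakelib_private_v1" in payload and b"SparseThing" in payload
print(names_are_stored)

# Simulate upgrading to a release in which the private module no longer exists.
del sys.modules["fakelib_private_v1"]

try:
    pickle.loads(payload)
    error_name = None
except ModuleNotFoundError as exc:
    error_name = type(exc).__name__
    print(exc)
```

The exception raised at the end is the same kind of error as the one in the traceback reported in this thread, only with the stand-in module name.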

andyfaff commented on June 4, 2024 1

Pickling and unpickling work for me with csr_array (or with csr_matrix substituted for csr_array) in the following example. Are you trying to pickle and unpickle across SciPy versions? That is not guaranteed to work.

import numpy as np
from scipy.sparse import csr_array
import pickle

A = csr_array([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
B = csr_array([[1, 3, 0], [0, 0, 3], [4, 0, 5]])

l = [A, B]

lp = pickle.dumps(l)

rl = pickle.loads(lp)

rl[0]
print(rl[0])

rkern commented on June 4, 2024 1

Untested:

from pickle import Unpickler


class Scipy_1_10_Unpickler(Unpickler):
    def find_class(self, module_name, global_name):
        if module_name == "scipy.sparse._arrays":
            # This module has been refactored away. All of the classes can be found in the public API.
            module_name = "scipy.sparse"
        return super().find_class(module_name, global_name)


def load_old_pickle(fp, **kwds):
    unp = Scipy_1_10_Unpickler(fp, **kwds)
    obj = unp.load()
    return obj

There is no general alternative to pickling that handles this any better. It's an inherent complexity of being able to serialize any class. Class objects know where they are defined (i.e. in scipy.sparse._arrays in 1.10 but in scipy.sparse._coo or one of the other specific modules in 1.11+), but not where they are publicly available (i.e. scipy.sparse) in the current version, much less in future versions.
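
The gap between "where a class is defined" and "where it is publicly available" is not specific to SciPy. As a small illustration, here is the same effect with a standard-library class: json.JSONDecoder is the public name, but the class object only records its defining module, and that is what pickle writes down.

```python
import json
import pickle

# The class is publicly reachable as json.JSONDecoder, but the class object
# itself only knows the module in which it was defined.
defined_in = json.JSONDecoder.__module__
print(defined_in)  # json.decoder

# Pickling a class stores a reference to it, built from the defining module.
ref = pickle.dumps(json.JSONDecoder)
uses_defining_module = b"json.decoder" in ref
print(uses_defining_module)  # True
```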

If you have important long-term storage needs, you should look into doing the serialization yourself with a file format that you control, e.g. a format defined on top of HDF5. Use only public APIs (e.g. scipy.sparse.coo_array rather than scipy.sparse._arrays.coo_array) in the load routine, and your file will continue to be loadable for as long as we maintain our promises about that public API (we might deprecate it at some point, but we won't casually move it around like we do for _private modules). This is more work than just throwing a general tool like pickle at your objects, so you have to weigh your needs judiciously.
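
As a rough sketch of what such a hand-rolled format might look like, the example below stores the raw CSR components of each matrix in a NumPy .npz archive instead of HDF5 (a swapped-in container, chosen only to keep the example short) and rebuilds the matrices through the public scipy.sparse.csr_array constructor. The function names save_csr_list and load_csr_list and the key layout (count, data_0, indices_0, ...) are assumptions made up for this illustration; it also assumes a SciPy version that provides csr_array.

```python
import numpy as np
from scipy.sparse import csr_array


def save_csr_list(path, matrices):
    """Write the CSR components of each matrix under self-chosen key names."""
    payload = {"count": np.array(len(matrices))}
    for i, matrix in enumerate(matrices):
        m = csr_array(matrix)  # normalize to CSR via the public constructor
        payload[f"data_{i}"] = m.data
        payload[f"indices_{i}"] = m.indices
        payload[f"indptr_{i}"] = m.indptr
        payload[f"shape_{i}"] = np.array(m.shape)
    # Only plain arrays are written; no class or module names end up in the file.
    np.savez_compressed(path, **payload)


def load_csr_list(path):
    """Rebuild the matrices using only the public scipy.sparse API."""
    matrices = []
    with np.load(path) as archive:
        for i in range(int(archive["count"])):
            shape = tuple(int(n) for n in archive[f"shape_{i}"])
            components = (
                archive[f"data_{i}"],
                archive[f"indices_{i}"],
                archive[f"indptr_{i}"],
            )
            matrices.append(csr_array(components, shape=shape))
    return matrices
```

Calling save_csr_list("graphs.npz", list_of_matrices) and later load_csr_list("graphs.npz") would round-trip the list. Because the file holds nothing but arrays, loading it depends only on the csr_array constructor keeping its public signature.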

lucascolley commented on June 4, 2024

Thanks for the report @Prateek1410, can you include how you are using pickle and csr_matrix? Can you also include the full error traceback?

Prateek1410 commented on June 4, 2024

Sure, thanks for the prompt response Lucas.

ModuleNotFoundError Traceback (most recent call last)
Cell In[3], line 2
1 with open('list_of_100_sparcematrices_of_random_graphs.pkl', 'rb') as file:
----> 2 list_of_sparcematrices_of_random_graphs = pickle.load(file)

ModuleNotFoundError: No module named 'scipy.sparse._arrays'

I had originally generated these 100 random graphs like so:

import networkx as nx
import pickle
from scipy.sparse import csr_matrix

list_of_sparcematrices_of_random_graphs = []
for i in range(100):
    with open('graph.pkl', 'rb') as file:
        graph_to_be_randomized = pickle.load(file)
    random_graph = nx.directed_edge_swap(graph_to_be_randomized, nswap = 15000000, max_tries = 35000000)
    random_sparsematrix = nx.to_scipy_sparse_array(random_graph)
    list_of_sparcematrices_of_random_graphs.append(random_sparsematrix)
    
with open('list_of_100_sparcematrices_of_random_graphs.pkl', 'wb') as f:
    pickle.dump(list_of_sparcematrices_of_random_graphs , f)

lucascolley commented on June 4, 2024

Which version of networkx did you use to generate the graphs?

lucascolley commented on June 4, 2024

@dschult I see you in the blame of nx.to_scipy_sparse_array so you're probably the best person for the job here

rkern commented on June 4, 2024

To work around this in your case, you can subclass Unpickler to override the find_class(module, name) method. Whenever module == "scipy.sparse._arrays", you can redirect it to wherever the class pointed to by name currently resides. Then you can repickle the object with your current version of scipy and carry on (until we muck things up again, but that doesn't happen too often).
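
The redirect-then-repickle idea can be tried end to end without SciPy installed. This is a sketch under stated assumptions: fakelib_old_private and fakelib_public are hypothetical stand-in modules (playing the roles of the removed private module and the public namespace), and Matrix is a made-up class.

```python
import io
import pickle
import sys
import types


class Matrix:
    def __init__(self, values):
        self.values = values


# "Old release": the class lives in a private module (hypothetical name).
Matrix.__module__ = "fakelib_old_private"
Matrix.__qualname__ = "Matrix"
old_private = types.ModuleType("fakelib_old_private")
old_private.Matrix = Matrix
sys.modules["fakelib_old_private"] = old_private
old_bytes = pickle.dumps([Matrix([1, 2]), Matrix([3])])

# "New release": the private module is gone and the class is reachable
# from a public namespace instead.
del sys.modules["fakelib_old_private"]
public = types.ModuleType("fakelib_public")
public.Matrix = Matrix
Matrix.__module__ = "fakelib_public"
sys.modules["fakelib_public"] = public


class RedirectingUnpickler(pickle.Unpickler):
    def find_class(self, module_name, global_name):
        # Send lookups for the removed module to the public namespace.
        if module_name == "fakelib_old_private":
            module_name = "fakelib_public"
        return super().find_class(module_name, global_name)


restored = RedirectingUnpickler(io.BytesIO(old_bytes)).load()

# Repickle with the current layout; the new bytes no longer mention the old module.
new_bytes = pickle.dumps(restored)
reloaded = pickle.loads(new_bytes)
print([m.values for m in reloaded])
```

After the repickle step, a plain pickle.loads is enough again, at least until the class moves once more.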

lucascolley commented on June 4, 2024

thanks for the explanation @rkern 🙏

Prateek1410 commented on June 4, 2024

Thank you all for your inputs! I can now proceed with my analysis feeling much relieved.

I downgraded scipy to 1.10.0 and the file got loaded in a few seconds.

A small request to @rkern, could you please simplify your suggestion in your last comment? I don't understand what to do.

Lastly, how can I avoid this issue in the future? Is there a better alternative to pickling?

Prateek1410 commented on June 4, 2024

Thank you so much for the comprehensive response @rkern
