Comments (11)

rkern commented on June 4, 2024 3

The module scipy.sparse._arrays was removed in 1.11.0 (the classes there were reorganized into other private modules). Pickles of these sparse arrays created with 1.10 and earlier cannot be unpickled with later versions. This is one of the fundamental drawbacks of pickles for long-term storage. When it serializes an instance, the pickle file stores a reference to the precise name of the class (including the module in which it is defined). That means that the unpickler won't be able to find it again if the new version of the library has reorganized the code and moved the class somewhere else. In numpy, it's worth the effort for us to go to extensive lengths to keep old pickles working because it's at the very bottom of the scientific computing stack and we only have a few classes to worry about. It's not sustainable to do this for every class in every library.
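To make the "reference to the precise name of the class" point concrete, here is a minimal sketch using only the standard library. The choice of OrderedDict is just an illustration (it is not from the original report); any importable class behaves the same way.

```python
import pickle
from collections import OrderedDict

# Serialize an instance of a class that lives in the "collections" module.
payload = pickle.dumps(OrderedDict(a=1))

# The class itself is not stored by value. The pickle only records the
# module name and the class name, and the unpickler imports them on load.
has_module_name = b"collections" in payload
has_class_name = b"OrderedDict" in payload
print(has_module_name, has_class_name)
```

If you want to inspect a real pickle file the same way, pickletools.dis(payload) prints the opcode stream, including those name strings.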

from scipy.

andyfaff commented on June 4, 2024 1

Pickling/unpickling works for me with csr_array (or with csr_matrix substituted for csr_array) in the following example. Are you trying to pickle/unpickle across scipy versions? That is not guaranteed to work for pickles.

import pickle

from scipy.sparse import csr_array

A = csr_array([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
B = csr_array([[1, 3, 0], [0, 0, 3], [4, 0, 5]])

matrices = [A, B]

# Round-trip through pickle within the same scipy version.
pickled = pickle.dumps(matrices)
restored = pickle.loads(pickled)

print(restored[0])

rkern commented on June 4, 2024 1

Untested:

from pickle import Unpickler


class Scipy_1_10_Unpickler(Unpickler):
    def find_class(self, module_name, global_name):
        if module_name == "scipy.sparse._arrays":
            # This module has been refactored away. All of the classes can be found in the public API.
            module_name = "scipy.sparse"
        return super().find_class(module_name, global_name)


def load_old_pickle(fp, **kwds):
    unp = Scipy_1_10_Unpickler(fp, **kwds)
    obj = unp.load()
    return obj

There is no general alternative to pickling that handles this any better. It's an inherent complexity of being able to serialize any class. Class objects know where they are defined (i.e. in scipy.sparse._arrays in 1.10 but in scipy.sparse._coo or one of the other specific modules in 1.11+), but not where they are publicly available (i.e. scipy.sparse) in the current version much less future versions.

If you have important long-term storage needs, you should look into doing the serialization yourself with a file format that you control, e.g. a format defined on top of HDF5. Use only public APIs (e.g. scipy.sparse.coo_array rather than scipy.sparse._arrays.coo_array) in the load routine, and your file will continue to be loadable for as long as we maintain our promises about that public API (we might deprecate it at some point, but we won't casually move it around like we do for _private modules). This is more work than just throwing a general tool like pickle at your objects, so you have to weigh your needs judiciously.
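As a rough illustration of "doing the serialization yourself", here is a sketch that stores only the raw CSR components and rebuilds the array through the public constructor. It uses NumPy's .npz container instead of HDF5 purely to keep the example dependency-light; the helper names save_csr and load_csr are made up for this sketch and are not scipy functions.

```python
import io

import numpy as np
from scipy.sparse import csr_array


def save_csr(fp, matrix):
    """Write the CSR components as plain arrays (no class references stored)."""
    m = csr_array(matrix)
    np.savez(
        fp,
        data=m.data,
        indices=m.indices,
        indptr=m.indptr,
        shape=np.asarray(m.shape),
    )


def load_csr(fp):
    """Rebuild the array using only the public scipy.sparse constructor."""
    with np.load(fp, allow_pickle=False) as npz:
        return csr_array(
            (npz["data"], npz["indices"], npz["indptr"]),
            shape=tuple(int(n) for n in npz["shape"]),
        )


# Round trip through an in-memory buffer; a real file opened in binary
# mode would work the same way.
buffer = io.BytesIO()
original = csr_array([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
save_csr(buffer, original)
buffer.seek(0)
restored = load_csr(buffer)
```

For a list of matrices you would write one such file (or one group of arrays) per matrix. scipy also has the public scipy.sparse.save_npz and scipy.sparse.load_npz functions, which may already be enough for this use case.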

lucascolley commented on June 4, 2024

Thanks for the report @Prateek1410, can you include how you are using pickle and csr_matrix? Can you also include the full error traceback?

Prateek1410 commented on June 4, 2024

Sure, thanks for the prompt response Lucas.


ModuleNotFoundError                       Traceback (most recent call last)
Cell In[3], line 2
      1 with open('list_of_100_sparcematrices_of_random_graphs.pkl', 'rb') as file:
----> 2     list_of_sparcematrices_of_random_graphs = pickle.load(file)

ModuleNotFoundError: No module named 'scipy.sparse._arrays'


I had originally generated these 100 random graphs like so:

import networkx as nx
import pickle
from scipy.sparse import csr_matrix

list_of_sparcematrices_of_random_graphs = []
for i in range(100):
    with open('graph.pkl', 'rb') as file:
        graph_to_be_randomized = pickle.load(file)
    random_graph = nx.directed_edge_swap(graph_to_be_randomized, nswap = 15000000, max_tries = 35000000)
    random_sparsematrix = nx.to_scipy_sparse_array(random_graph)
    list_of_sparcematrices_of_random_graphs.append(random_sparsematrix)
    
with open('list_of_100_sparcematrices_of_random_graphs.pkl', 'wb') as f:
    pickle.dump(list_of_sparcematrices_of_random_graphs, f)

lucascolley commented on June 4, 2024

Which version of networkx did you use to generate the graphs?

lucascolley commented on June 4, 2024

@dschult I see you in the blame of nx.to_scipy_sparse_array so you're probably the best person for the job here

rkern commented on June 4, 2024

To work around this in your case, you can subclass Unpickler to override the find_class(module, name) method. Whenever module=="scipy.sparse._arrays", you can redirect it to wherever the class pointed to by name currently resides. Then you can repickle the object with your current version of scipy and carry on (until we muck things up again, but that doesn't happen too often).
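Here is a self-contained sketch of that workaround that runs without scipy. The modules fakelib_old_private and fakelib_public and the class Point are hypothetical stand-ins for scipy.sparse._arrays, scipy.sparse and a sparse array class; they only simulate a class being moved between two releases.

```python
import io
import pickle
import sys
import types


class Point:
    """Stand-in for a library class that gets moved between releases."""

    def __init__(self, x, y):
        self.x, self.y = x, y


# Simulate the old release: the class lives in a private module.
old_module = types.ModuleType("fakelib_old_private")
old_module.Point = Point
Point.__module__ = "fakelib_old_private"
sys.modules["fakelib_old_private"] = old_module
old_payload = pickle.dumps(Point(1, 2))

# Simulate the new release: the private module is gone and the class is
# reachable from a different module.
del sys.modules["fakelib_old_private"]
new_module = types.ModuleType("fakelib_public")
new_module.Point = Point
Point.__module__ = "fakelib_public"
sys.modules["fakelib_public"] = new_module

# A plain load now fails, much like the traceback in the report.
try:
    pickle.loads(old_payload)
except ModuleNotFoundError as exc:
    print("plain unpickling fails:", exc)


class RedirectingUnpickler(pickle.Unpickler):
    """Unpickler that maps removed module names to their new location."""

    REDIRECTS = {"fakelib_old_private": "fakelib_public"}

    def find_class(self, module_name, global_name):
        module_name = self.REDIRECTS.get(module_name, module_name)
        return super().find_class(module_name, global_name)


# Load the old pickle through the redirect, then re-pickle it so the new
# payload refers to the current module name.
restored = RedirectingUnpickler(io.BytesIO(old_payload)).load()
fresh_payload = pickle.dumps(restored)
```

After the re-pickle step, the fresh payload loads with a plain pickle.loads again, at least until the class moves once more.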

lucascolley commented on June 4, 2024

thanks for the explanation @rkern 🙏

Prateek1410 commented on June 4, 2024

Thank you all for your inputs! I can now proceed with my analysis feeling much relieved.

I downgraded scipy to 1.10.0 and the file got loaded in a few seconds.

A small request to @rkern: could you please simplify the suggestion in your last comment? I don't understand what to do.

Lastly, how can I avoid this issue in the future? Is there a better alternative to pickling?

Prateek1410 commented on June 4, 2024

Thank you so much for the comprehensive response @rkern
