Comments (11)
The module scipy.sparse._arrays was removed in 1.11.0 (the classes there were reorganized into other private modules). Pickles of these sparse arrays created with 1.10 and earlier cannot be unpickled with later versions. This is one of the fundamental drawbacks of pickles for long-term storage. When it serializes an instance, the pickle file stores a reference to the precise name of the class, including the module in which it is defined. That means the unpickler won't be able to find the class again if a new version of the library has reorganized the code and moved the class somewhere else. In numpy, it's worth the effort for us to go to extensive lengths to keep old pickles working because numpy sits at the very bottom of the scientific computing stack and we only have a few classes to worry about; it's not sustainable to do this for every class in every library.
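The coupling can be seen directly: the pickle stream of any instance embeds the defining module and class name as plain strings. A minimal illustration with a stdlib class (not scipy-specific, just to show the mechanism):

```python
import pickle
from fractions import Fraction

data = pickle.dumps(Fraction(1, 3))
# the stream stores the defining module and class name as strings,
# so moving or renaming the class breaks old pickles
print(b"fractions" in data, b"Fraction" in data)  # -> True True
```

If `fractions.Fraction` were ever moved to a different module, this byte stream would stop loading, exactly as happened with scipy.sparse._arrays.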
Pickling/unpickling works for me with csr_array or csr_matrix (substituting csr_matrix for csr_array) in the following example. Are you trying to pickle/unpickle across scipy versions? That is not guaranteed to work for pickles.
import pickle
from scipy.sparse import csr_array

A = csr_array([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
B = csr_array([[1, 3, 0], [0, 0, 3], [4, 0, 5]])
l = [A, B]
lp = pickle.dumps(l)   # serialize the list of sparse arrays
rl = pickle.loads(lp)  # round trip within the same scipy version
print(rl[0])
Untested:

from pickle import Unpickler

class Scipy_1_10_Unpickler(Unpickler):
    def find_class(self, module_name, global_name):
        if module_name == "scipy.sparse._arrays":
            # This module was refactored away; all of its classes
            # can be found in the public API.
            module_name = "scipy.sparse"
        return super().find_class(module_name, global_name)

def load_old_pickle(fp, **kwds):
    unp = Scipy_1_10_Unpickler(fp, **kwds)
    return unp.load()
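The redirection mechanism can be exercised end to end without an old scipy pickle on hand, by fabricating a module, pickling against it, and deleting it to simulate a refactor (the module and class names here are made up for the demonstration):

```python
import io
import pickle
import sys
import types
from pickle import Unpickler

# fabricate an "old" module, pickle an instance of its class,
# then delete the module to simulate a library refactor
old_mod = types.ModuleType("old_mod")

class Thing:
    pass

Thing.__module__ = "old_mod"
old_mod.Thing = Thing
sys.modules["old_mod"] = old_mod
data = pickle.dumps(Thing())
del sys.modules["old_mod"]

# the class now "lives" in a new module; a plain pickle.loads(data)
# would raise ModuleNotFoundError: No module named 'old_mod'
new_mod = types.ModuleType("new_mod")
new_mod.Thing = Thing
sys.modules["new_mod"] = new_mod

class RedirectingUnpickler(Unpickler):
    def find_class(self, module_name, global_name):
        if module_name == "old_mod":
            module_name = "new_mod"  # point stale references at the new home
        return super().find_class(module_name, global_name)

obj = RedirectingUnpickler(io.BytesIO(data)).load()
print(type(obj).__name__)  # -> Thing
```

The same pattern with "scipy.sparse._arrays" in place of "old_mod" and "scipy.sparse" in place of "new_mod" is exactly the workaround sketched above.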
There is no general alternative to pickling that handles this any better. It's an inherent complexity of being able to serialize any class. Class objects know where they are defined (i.e. in scipy.sparse._arrays in 1.10 but in scipy.sparse._coo or one of the other specific modules in 1.11+), but not where they are publicly available (i.e. scipy.sparse) in the current version, much less future versions.
If you have important long-term storage needs, you should look into doing the serialization yourself with a file format that you can control, e.g. a format defined on top of HDF5. Use only public APIs (e.g. scipy.sparse.coo_array rather than scipy.sparse._arrays.coo_array) in the load routine, and your file will continue to be loadable as long as we maintain our promises about that public API (we might deprecate it at some point, but we won't casually move it around like we do with _private modules). This is more work than just throwing a general tool like pickle at your objects, so you have to weigh your needs judiciously.
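A sketch of that approach, using NumPy's .npz container instead of HDF5 purely to keep the example dependency-free: store only the raw CSR component arrays, and rebuild through the public scipy.sparse.csr_array constructor on load, so no class reference ever ends up in the file.

```python
import io

import numpy as np
from scipy.sparse import csr_array

A = csr_array([[1, 2, 0], [0, 0, 3], [4, 0, 5]])

# save only raw arrays -- no class references end up in the file
buf = io.BytesIO()
np.savez(buf, data=A.data, indices=A.indices, indptr=A.indptr,
         shape=np.asarray(A.shape))
buf.seek(0)

# the load routine touches only the public scipy.sparse.csr_array API
with np.load(buf) as f:
    B = csr_array((f["data"], f["indices"], f["indptr"]),
                  shape=tuple(f["shape"]))

print((A != B).nnz)  # -> 0, i.e. the round trip is exact
```

An HDF5 version would look the same with `h5py.File` datasets in place of the npz keys; either way the file format outlives any internal reorganization of scipy.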
Thanks for the report @Prateek1410, can you include how you are using pickle and csr_matrix? Can you also include the full error traceback?
Sure, thanks for the prompt response Lucas.
ModuleNotFoundError Traceback (most recent call last)
Cell In[3], line 2
1 with open('list_of_100_sparcematrices_of_random_graphs.pkl', 'rb') as file:
----> 2 list_of_sparcematrices_of_random_graphs = pickle.load(file)
ModuleNotFoundError: No module named 'scipy.sparse._arrays'
I had originally generated these 100 random graphs like so:
import networkx as nx
import pickle
from scipy.sparse import csr_matrix

list_of_sparcematrices_of_random_graphs = []
for i in range(100):
    with open('graph.pkl', 'rb') as file:
        graph_to_be_randomized = pickle.load(file)
    random_graph = nx.directed_edge_swap(graph_to_be_randomized, nswap=15000000, max_tries=35000000)
    random_sparsematrix = nx.to_scipy_sparse_array(random_graph)
    list_of_sparcematrices_of_random_graphs.append(random_sparsematrix)
with open('list_of_100_sparcematrices_of_random_graphs.pkl', 'wb') as f:
    pickle.dump(list_of_sparcematrices_of_random_graphs, f)
Which version of networkx did you use to generate the graphs?
@dschult I see you in the blame of nx.to_scipy_sparse_array so you're probably the best person for the job here
To work around this in your case, you can subclass Unpickler to override the find_class(module, name) method. Whenever module == "scipy.sparse._arrays", you can redirect it to wherever the class pointed to by name currently resides. Then you can repickle the object with your current version of scipy and carry on (until we muck things up again, but that doesn't happen too often).
thanks for the explanation @rkern 🙏
Thank you all for your inputs! I can now proceed with my analysis feeling much relieved.
I downgraded scipy to 1.10.0 and the file got loaded in a few seconds.
A small request to @rkern, could you please simplify your suggestion in your last comment? I don't understand what to do.
Lastly, how can I avoid this issue in the future? Is there a better alternative to pickling?
Thank you so much for the comprehensive response @rkern