@Hoeze Also, you can control this behavior using the environment variable `SPARSE_AUTO_DENSIFY`. Please note that this variable is read once, at import time. You can set it by starting Python with

```
SPARSE_AUTO_DENSIFY=1 python
```

or by running

```python
import os
os.environ["SPARSE_AUTO_DENSIFY"] = "1"  # must be a string, not an int
```

before `sparse` is imported.
from sparse.
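To illustrate why setting the variable later has no effect, here is a minimal sketch of the read-once-at-import pattern (the parsing shown is an assumption for illustration, not sparse's actual code):

```python
import os

os.environ["SPARSE_AUTO_DENSIFY"] = "1"

def _read_flag():
    # Sketch of what a library does at import time: the environment is
    # consulted exactly once, when the module is first loaded.
    return bool(int(os.environ.get("SPARSE_AUTO_DENSIFY", "0")))

AUTO_DENSIFY = _read_flag()  # captured once, at "import"

os.environ["SPARSE_AUTO_DENSIFY"] = "0"  # too late: the flag is already read
print(AUTO_DENSIFY)  # True
```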
That's a tough one... I'd probably avoid returning sparse or dense conditionally from the same routine, as that's a recipe for hard-to-catch bugs in corner cases.
Yeah, last week I would have come out strongly in favor of type stability. As I've been running computations in practice my confidence has decreased somewhat here. I'm still not entirely in favor of optional conversions, but if I'm unable to optimize down some sparse operations it may become an interesting option.
The saving grace of losing type stability is that this may become less of an issue if the `__array_ufunc__` protocol emerges in NumPy 1.13. The type of container may start to matter much less in NumPy code in the future.
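As a toy illustration of that protocol (the `Wrapped` class is hypothetical, not part of sparse): any class defining `__array_ufunc__` gets to intercept NumPy ufuncs applied to it, so the container keeps control of the result type.

```python
import numpy as np

class Wrapped:
    """Toy container demonstrating the __array_ufunc__ protocol."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented
        # Unwrap any Wrapped inputs, apply the ufunc, re-wrap the result.
        arrays = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(ufunc(*arrays, **kwargs))

w = Wrapped([1, 2, 3])
result = np.add(w, 1)  # dispatches to Wrapped.__array_ufunc__, not ndarray
print(type(result).__name__, result.data)  # Wrapped [2 3 4]
```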
One issue that I see in the code is that `operator.add` for `COO` is treated differently from other operators, like `operator.mul`: we `maybe_densify` for `add` but not for the other operators, and when I tried to correct this, a unit test failed. I ran into this when I tried to implement `operator.gt`, etc., and wanted to unify behavior across operators and consistently use the same code. Since there is a unit test, I'm assuming someone has a use for the current behavior, and I didn't want to change it without asking. We should decide on a concrete strategy to follow:
1. We use `maybe_densify` consistently.
2. We always raise an error.
3. We always densify when needed.
4. We provide a global flag or option that can be used with `with`, similar to other libraries.
5. We use `maybe_densify` consistently, with global flags/options for `maybe_densify`.
Whatever we choose, we should bake it into `elemwise` and `elemwise_binary` (and possibly provide kwargs). Since `__array_ufunc__` is now here, it should be okay to lose type stability. I'm in favor of option 5, as it is the most general.
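A rough sketch of what option 5 could look like (the `OPTIONS` dict, its keys, and this `maybe_densify` signature are assumptions for illustration, not sparse's actual API): one helper, called uniformly by `elemwise`/`elemwise_binary` for every operator, gated by global options.

```python
import numpy as np

# Assumed global options -- not sparse's real configuration.
OPTIONS = {"auto_densify": False, "max_densify_elements": 10_000}

def maybe_densify(coords, data, shape):
    """Convert COO-style (coords, data, shape) to a dense array, but
    only if the global options permit it; otherwise raise."""
    if not OPTIONS["auto_densify"]:
        raise ValueError("result would be dense; auto-densification is off")
    if np.prod(shape) > OPTIONS["max_densify_elements"]:
        raise ValueError("result too large to densify")
    out = np.zeros(shape, dtype=np.asarray(data).dtype)
    out[tuple(np.asarray(coords))] = data
    return out

OPTIONS["auto_densify"] = True
dense = maybe_densify(coords=[[0, 1], [1, 0]], data=[3, 4], shape=(2, 2))
print(dense.tolist())  # [[0, 3], [4, 0]]
```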
Two other things that would be helpful to think about are:

- Do we auto-densify above a certain sparsity level?
- Do we treat operators like `operator.mul`, where the result is sparse if any one of the operands is sparse, differently from `operator.add`?
Paging @mrocklin @nils-werner @jakevdp. Your input would be very welcome.
At the moment I'm not in favor of auto-densification by default. However, I do agree that we should make it easy for users to do this on their own. One approach would be to expose something like `maybe_densify` to the user? Config options with a context manager also seem sensible to me. I don't have confidence here in any direction.
Personally I'm inclined to ignore this issue until it comes up in a few more use cases.
I wanted to implement `__array_ufunc__`, so would it be okay (for now) to remove the auto-densification in `__add__` and the test for it?
If we currently do auto-densification then yes, I think it makes sense to remove it.
An `__array_ufunc__` implementation would be quite welcome.
When researching how to do configurations with context managers the proper, thread-safe way, I came across PEP 567. I'm surprised it doesn't already exist.
When making our own implementation (which I'm somewhat skeptical about), the following problems come to mind:
- Inheriting from the parent thread.
- Making it so race conditions don't make threads override each others' data.
- I considered thread-local storage but that (probably?) won't be inherited by a child thread.
All of this is handled for us by PEP 567. However, if we're not worried about thread safety we can just use a stack.
Am I correct in assuming that by thread-safe context managers you mean you want to do something like

```python
a = sparse.random((20, 20))

with sparse.allow_densify():
    a = a + 1
```

In that case I would seriously warn against it, for exactly the issues you are fearing. Plus, you would have to couple some singleton/module state to the behaviour of your `COO` instances, which can be a huge pain: thread safety, unexpected behaviours, all unit tests become much less meaningful, etc.
This system of the module having an internal state is what gave us the horrible MATLAB/Matplotlib principle where you need to track the state of `plt` in order to manipulate your plots:

```python
plt.subplot(2, 1, 1)  # All commands after this implicitly target the first subplot
plt.plot(...)

plt.subplot(2, 1, 2)  # All commands after this implicitly target the second subplot
plt.plot(...)

plt.show()
```
Only recently did they start introducing (and encouraging) the right way of doing it: instead of the module having a state, you get back instances which have state:

```python
f, (a, b) = plt.subplots(2, 1)  # a is the first subplot, b the second

a.plot(...)
b.plot(...)

plt.show()
```
So instead, if possible, I would recommend a solution that keeps the `allow_densify` state in the `COO` array itself:

```python
a = sparse.random((20, 20))

with a.allow_densify():
    a = a + 1
```
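For concreteness, a toy version of that per-instance context manager (the class and attribute names here are hypothetical, not sparse's API):

```python
from contextlib import contextmanager

class ToyArray:
    """Toy stand-in for COO that keeps the allow-densify state on the
    instance itself rather than in module-global state."""

    def __init__(self):
        self._allow_densify = False

    @contextmanager
    def allow_densify(self):
        self._allow_densify = True
        try:
            yield self
        finally:
            self._allow_densify = False  # always restored, even on error

t = ToyArray()
with t.allow_densify():
    print(t._allow_densify)  # True
print(t._allow_densify)      # False
```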
While this is a very good solution, I was thinking more of the use case where `np.ndarray` and `COO` can be used interchangeably once we've implemented a significant amount of the API. Then other libraries might want to do something like this:

```python
# Coming from somewhere else, we don't know where.
a = sparse.random((20, 20))
b = sparse.random((20, 20))

def get_neg_add_plus_one(a, b):
    """In another library not related to us."""
    return -(a + b) + 1
```
Then, even if we allow things like:

```python
with a.allow_densify(), b.allow_densify():
    c = get_neg_add_plus_one(a, b)
```

we would have to make sure that `(a + b)` has `allow_densify` properly set. For arbitrary operations, we would need to maintain a DAG where each `allow_densify` node is only set if all its parents are set.
I'm not entirely against this approach, I think it's more elegant, but we would need a bit of processing at the densifying stage.
Edit: Not to mention... Memory leaks if we actually store the parents in the child object...
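One way to sidestep both the DAG walk and the memory leak is to combine the flags eagerly at each operation (everything here is a hypothetical sketch): the result may densify only if all of its operands may, which is a plain AND, and only the boolean is stored, never the parent objects.

```python
class ToyCOO:
    """Toy stand-in for COO carrying an allow_densify flag."""

    def __init__(self, data, allow_densify=False):
        self.data = data
        self.allow_densify = allow_densify

    def __add__(self, other):
        # The result may densify only if *all* parents allow it. Storing
        # just the combined boolean (not the parents) keeps no DAG alive.
        flag = self.allow_densify and other.allow_densify
        return ToyCOO(self.data + other.data, allow_densify=flag)

a = ToyCOO(1, allow_densify=True)
b = ToyCOO(2, allow_densify=False)
print((a + b).allow_densify)  # False
b.allow_densify = True
print((a + b).allow_densify)  # True
```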
Another thing I was thinking about is how to manage mixed sparse-dense operations. For example, if `a` is dense and `b` is sparse:

- `a + b` is dense, BUT:
- `a * b` is sparse, and should be handled that way. This becomes even more important if `a` is being broadcast to a higher dimension.
In this case:

- Auto-densification rules should kick in for `a + b`.
- `a * b` should be handled the sparse way.
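The underlying reason can be shown with plain NumPy by counting zeros (no sparse containers involved): addition destroys the zeros of the sparse operand, while multiplication preserves them.

```python
import numpy as np

a = np.full((4, 4), 0.5)  # dense operand: no zeros at all
b = np.zeros((4, 4))      # sparse-like operand...
b[0, 0] = 1.0             # ...with a single nonzero entry

# Addition: every zero of b becomes nonzero in a + b -> dense result.
print(np.count_nonzero(a + b))  # 16
# Multiplication: zeros of b stay zero in a * b -> as sparse as b.
print(np.count_nonzero(a * b))  # 1
```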
I got around this by storing the parent configs (just the configs, not the full objects) in each child object, using a `set`. This stores the configs of only the "root" parents.
I also found a way to handle the issues in my previous comment in #75. My implementation is available in #88. Feedback welcome.
`np.asarray(x)` should definitely return a dense NumPy array, no matter what type of (possibly sparse) array `x` is.
This is also my current problem, and why I cannot use it with Dask.
Hello,
A design decision was made to disallow it (read the discussion over in #218). If you must have dense arrays, the best thing to do is to map blocks in Dask using `.todense()`.
@hameerabbasi Thank you very much, you just saved my day!
Apologies if this is a dumb question, but I am having the same issue as described in this thread and was wondering: would anyone be able to tell me how to change the `SPARSE_AUTO_DENSIFY` environment variable for `sparse`?
Ah! Sorry yes, I see! Thank you!