@Hoeze Also, you can control this behavior using the environment variable `SPARSE_AUTO_DENSIFY`. Please note that this variable is read once, at import time. You can set it by starting Python with

```
SPARSE_AUTO_DENSIFY=1 python
```

or by running

```python
import os
os.environ["SPARSE_AUTO_DENSIFY"] = "1"  # must be a string, not an int
```

before `sparse` is imported.
from sparse.
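To illustrate why setting the variable later has no effect, here is a minimal sketch of the read-once-at-import pattern (the parsing shown is an assumption for illustration, not sparse's actual code):

```python
import os

os.environ["SPARSE_AUTO_DENSIFY"] = "1"

def _read_flag():
    # Sketch of what a library does at import time: the environment is
    # consulted exactly once, when the module is first loaded.
    return bool(int(os.environ.get("SPARSE_AUTO_DENSIFY", "0")))

AUTO_DENSIFY = _read_flag()  # captured once, at "import"

os.environ["SPARSE_AUTO_DENSIFY"] = "0"  # too late: the flag is already read
print(AUTO_DENSIFY)  # True
```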
That's a tough one... I'd probably avoid returning sparse or dense conditionally from the same routine, as that's a recipe for hard-to-catch bugs in corner cases.
Yeah, last week I would have come out strongly in favor of type stability. As I've been running computations in practice my confidence has decreased somewhat here. I'm still not entirely in favor of optional conversions, but if I'm unable to optimize down some sparse operations it may become an interesting option.
The saving grace of losing type stability is that this may become less of an issue if the `__array_ufunc__` protocol emerges in NumPy 1.13. The type of container may start to matter much less in NumPy code in the future.
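As a toy illustration of that protocol (the `Wrapped` class is hypothetical, not part of sparse): any class defining `__array_ufunc__` gets to intercept NumPy ufuncs applied to it, so the container keeps control of the result type.

```python
import numpy as np

class Wrapped:
    """Toy container demonstrating the __array_ufunc__ protocol."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented
        # Unwrap any Wrapped inputs, apply the ufunc, re-wrap the result.
        arrays = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(ufunc(*arrays, **kwargs))

w = Wrapped([1, 2, 3])
result = np.add(w, 1)  # dispatches to Wrapped.__array_ufunc__, not ndarray
print(type(result).__name__, result.data)  # Wrapped [2 3 4]
```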
One issue that I see in the code is that `operator.add` for `COO` is treated differently from other operators, like `operator.mul`: we `maybe_densify` for `add` but not for the other operators, and when I tried to correct this, a unit test failed. I ran into this when I tried to implement `operator.gt`, etc., and wanted to unify behavior across operators and consistently use the same code. Since there is a unit test, I'm assuming someone has a use for the current behavior, and I didn't want to change it without asking. We should decide on a concrete strategy to follow:
1. We use `maybe_densify` consistently.
2. We always raise an error.
3. We always densify when needed.
4. We provide a global flag or option that can be used with `with`, similar to other libraries.
5. We use `maybe_densify` consistently, with global flags/options for `maybe_densify`.
Whatever we choose, we should bake it into `elemwise` and `elemwise_binary` (and possibly provide kwargs). Since `__array_ufunc__` is now here, it should be okay to lose type stability. I'm in favor of option 5, as it is the most general.
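A rough sketch of what option 5 could look like (the `OPTIONS` dict, its keys, and this `maybe_densify` signature are assumptions for illustration, not sparse's actual API): one helper, called uniformly by `elemwise`/`elemwise_binary` for every operator, gated by global options.

```python
import numpy as np

# Assumed global options -- not sparse's real configuration.
OPTIONS = {"auto_densify": False, "max_densify_elements": 10_000}

def maybe_densify(coords, data, shape):
    """Convert COO-style (coords, data, shape) to a dense array, but
    only if the global options permit it; otherwise raise."""
    if not OPTIONS["auto_densify"]:
        raise ValueError("result would be dense; auto-densification is off")
    if np.prod(shape) > OPTIONS["max_densify_elements"]:
        raise ValueError("result too large to densify")
    out = np.zeros(shape, dtype=np.asarray(data).dtype)
    out[tuple(np.asarray(coords))] = data
    return out

OPTIONS["auto_densify"] = True
dense = maybe_densify(coords=[[0, 1], [1, 0]], data=[3, 4], shape=(2, 2))
print(dense.tolist())  # [[0, 3], [4, 0]]
```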
Two other things that would be helpful to think about are:

- Do we auto-densify above a certain sparsity level?
- Do we treat operators like `operator.mul`, where the result is sparse if any one of the operands is sparse, differently from `operator.add`?
Paging @mrocklin @nils-werner @jakevdp. Your input would be very welcome.
At the moment I'm not in favor of auto-densification by default. However, I do agree that we should make it easy for users to do this on their own. One approach would be to expose something like `maybe_densify` to the user? Config options with a context manager also seem sensible to me. I don't have confidence here in any direction.
Personally I'm inclined to ignore this issue until it comes up in a few more use cases.
I wanted to implement `__array_ufunc__`, so would it be okay (for now) to remove the auto-densification in `__add__` and the test for it?
If we currently do auto-densification then yes, I think it makes sense to remove it.
An `__array_ufunc__` implementation would be quite welcome.
When researching how to do configurations with context managers the proper, thread-safe way, I came across PEP 567. I'm surprised it doesn't already exist.
When making our own implementation (which I'm somewhat skeptical about), the following problems come to mind:
- Inheriting from the parent thread.
- Making it so race conditions don't make threads override each others' data.
- I considered thread-local storage but that (probably?) won't be inherited by a child thread.
All of this is handled for us by PEP 567. However, if we're not worried about thread safety we can just use a stack.
Am I correct in assuming that by thread-safe context managers you mean you want to do something like

```python
a = sparse.random((20, 20))

with sparse.allow_densify():
    a = a + 1
```

In that case I would seriously warn against it, for exactly the issues you are fearing. Plus, you would have to couple some singleton/module state to the behaviour of your `COO` instances, which can be a huge pain: thread safety, unexpected behaviours, all unit tests become much less meaningful, etc.
This system of the module having an internal state is what gave us the horrible MATLAB/Matplotlib principle where you need to track the state of `plt` in order to manipulate your plots:

```python
plt.subplot(2, 1, 1)  # All commands after this implicitly target the first subplot
plt.plot(...)

plt.subplot(2, 1, 2)  # All commands after this implicitly target the second subplot
plt.plot(...)

plt.show()
```
Only recently did they start introducing (and encouraging) the right way of doing it: instead of the module having a state, you get back instances which have state:

```python
f, (a, b) = plt.subplots(2, 1)  # a is the first subplot, b the second

a.plot(...)
b.plot(...)

plt.show()
```
So instead, if possible, I would recommend a solution that keeps the `allow_densify` state in the `COO` array itself:

```python
a = sparse.random((20, 20))

with a.allow_densify():
    a = a + 1
```
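For concreteness, a toy version of that per-instance context manager (the class and attribute names here are hypothetical, not sparse's API):

```python
from contextlib import contextmanager

class ToyArray:
    """Toy stand-in for COO that keeps the allow-densify state on the
    instance itself rather than in module-global state."""

    def __init__(self):
        self._allow_densify = False

    @contextmanager
    def allow_densify(self):
        self._allow_densify = True
        try:
            yield self
        finally:
            self._allow_densify = False  # always restored, even on error

t = ToyArray()
with t.allow_densify():
    print(t._allow_densify)  # True
print(t._allow_densify)      # False
```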
While this is a very good solution, I was thinking more of the use case where `np.ndarray` and `COO` can be used interchangeably once we've implemented a significant amount of the API. Then other libraries might want to do something like this:

```python
# Coming from somewhere else, we don't know where.
a = sparse.random((20, 20))
b = sparse.random((20, 20))

def get_neg_add_plus_one(a, b):
    """In another library not related to us."""
    return -(a + b) + 1
```
Then, even if we allow things like:

```python
with a.allow_densify(), b.allow_densify():
    c = get_neg_add_plus_one(a, b)
```

we would have to make sure that `(a + b)` has `allow_densify` properly set. For arbitrary operations, we would need to maintain a DAG where each `allow_densify` node is only set if all its parents are set.
I'm not entirely against this approach, I think it's more elegant, but we would need a bit of processing at the densifying stage.
Edit: Not to mention... Memory leaks if we actually store the parents in the child object...
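One way to sidestep both the DAG walk and the memory leak is to combine the flags eagerly at each operation (everything here is a hypothetical sketch): the result may densify only if all of its operands may, which is a plain AND, and only the boolean is stored, never the parent objects.

```python
class ToyCOO:
    """Toy stand-in for COO carrying an allow_densify flag."""

    def __init__(self, data, allow_densify=False):
        self.data = data
        self.allow_densify = allow_densify

    def __add__(self, other):
        # The result may densify only if *all* parents allow it. Storing
        # just the combined boolean (not the parents) keeps no DAG alive.
        flag = self.allow_densify and other.allow_densify
        return ToyCOO(self.data + other.data, allow_densify=flag)

a = ToyCOO(1, allow_densify=True)
b = ToyCOO(2, allow_densify=False)
print((a + b).allow_densify)  # False
b.allow_densify = True
print((a + b).allow_densify)  # True
```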
Another thing I was thinking about is how to manage mixed sparse-dense operations. For example, if `a` is dense and `b` is sparse:

- `a + b` is dense, BUT:
- `a * b` is sparse, and should be handled that way. This becomes even more important if `a` is being broadcast to a higher dimension.
In this case:

- Auto-densification rules should kick in for `a + b`.
- `a * b` should be handled the sparse way.
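The underlying reason can be shown with plain NumPy by counting zeros (no sparse containers involved): addition destroys the zeros of the sparse operand, while multiplication preserves them.

```python
import numpy as np

a = np.full((4, 4), 0.5)  # dense operand: no zeros at all
b = np.zeros((4, 4))      # sparse-like operand...
b[0, 0] = 1.0             # ...with a single nonzero entry

# Addition: every zero of b becomes nonzero in a + b -> dense result.
print(np.count_nonzero(a + b))  # 16
# Multiplication: zeros of b stay zero in a * b -> as sparse as b.
print(np.count_nonzero(a * b))  # 1
```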
I got around this by storing the parent configs (just the configs, not the full objects) in each child object, using a `set`. This stores the configs of only the "root" parents.
I also found a way to handle the issues in my previous comment in #75. My implementation is available in #88. Feedback welcome.
`np.asarray(x)` should definitely return a dense NumPy array, no matter what type of (possibly sparse) array `x` is.
This is also my current problem, and why I cannot use it with Dask.
Hello,
A design decision was made to disallow it (read the discussion over in #218). If you must have dense arrays, the best thing to do is to map blocks in Dask using `.todense()`.
@hameerabbasi Thank you very much, you just saved my day!
Apologies if this is a dumb question, but I am having the same issue as described in this thread and was wondering: would anyone be able to tell me how to change the `SPARSE_AUTO_DENSIFY` environment variable for `sparse`?
Ah! Sorry yes, I see! Thank you!