numpy-dispatch's Introduction

Get Array Module

This package is an implementation of NumPy's get_array_module proposal, which is written up in NEP 37.

Important Note

The API implemented here includes certain arguments that are not currently part of NEP 37. These enable things that may never be part of the final implementation. Note also that the first implementation required explicit opt-in by the user, either by enabling dispatching globally or by using a context manager. This has been reversed: a library can require opt-in, but dispatching is the default.

There are three reasons for the additional arguments:

  1. To show what a transition could look like and how it can be implemented.
  2. To give library authors the flexibility to try various models. It is not clear that library authors should have the option to limit which array-modules are acceptable.
  3. To see how context managers on the user side can be designed to deal with opt-in and transitions.

These features could become part of the API, or we could provide guidance on how to implement them in a specific library, but for now the more likely outcome is that they will not be part of it.

Point 2 in particular is my own idea, and mostly an option I want library authors to be aware of.

If libraries choose to use option 2, it may make sense for them to let users extend the "acceptable-array-modules" list manually. That way a library can limit itself to tested array types without preventing users from trying other array types at their own risk.

For End-Users

End users will not use much of the module, except to enable the behaviour (which is only necessary if a library explicitly requires it!) or control FutureWarnings. In some cases a library is also a (localized) end-user.

For end users, there are thus two functionalities available:

import numpy_dispatch

numpy_dispatch.enable_dispatching_globally()

This opts in globally for all libraries that require explicit opt-in. The function is meant to be called exactly once, and only by the end-user; calling it more than once currently gives a warning. The global switch cannot be disabled, so a user of this flag must be prepared for changes in behaviour, e.g. when updating downstream libraries.

Local control is safer and may be required in some cases. Dispatching can be enabled locally, in a thread-safe manner, by using:

import numpy_dispatch

with numpy_dispatch.ensure_dispatching():
    dispatching_aware_library_function(arr1, arr2)

We currently assume that disabling dispatching is not necessary (once enabled within a context/globally). Effectively, disabling is only possible by converting all inputs to NumPy arrays.

This control over dispatching behaviour using a context manager may be useful in libraries.
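To make the global/local distinction concrete, here is a minimal sketch of how a one-way global switch and a thread-safe local context manager could be combined using the standard-library contextvars module. All names here (enable_globally, ensure_enabled, dispatching_enabled) are hypothetical; this is not numpy_dispatch's actual implementation.

```python
import contextlib
import contextvars
import warnings

# Hypothetical state, not numpy_dispatch's real internals.
_globally_enabled = False
_locally_enabled = contextvars.ContextVar("dispatching_enabled", default=False)


def enable_globally():
    """One-way global opt-in: repeated calls warn, and there is no way back."""
    global _globally_enabled
    if _globally_enabled:
        warnings.warn("dispatching is already enabled globally", stacklevel=2)
    _globally_enabled = True


@contextlib.contextmanager
def ensure_enabled():
    """Enable dispatching for the current thread (or asyncio task) only."""
    token = _locally_enabled.set(True)
    try:
        yield
    finally:
        # Restore whatever value was active before entering the block.
        _locally_enabled.reset(token)


def dispatching_enabled():
    """What a library would check before dispatching."""
    return _globally_enabled or _locally_enabled.get()
```

Because a ContextVar value set in one thread is not seen by other threads, enabling dispatching inside the with block does not leak elsewhere, whereas the global flag affects everything and has no off switch.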

For library authors

Library authors are the main audience for this package; the central function is get_array_module. A typical use case may look like this:

import numpy as np

try:
    from numpy_dispatch import get_array_module
except ImportError:
    # Simply use NumPy if numpy_dispatch is unavailable
    get_array_module = lambda *args, **kwargs: np


def library_function(arr1, arr2, **other_parameters):
    npx = get_array_module(arr1, arr2)

    arr1 = npx.asarray(arr1)
    arr2 = npx.asarray(arr2)

    # use npx instead of all numpy code

If your library has internal helper functions, the best way to write these is probably:

def internal_helper(*args, npx=np):
    # old code, but replace `np` with `npx`.
    ...

That way the module is passed around in a safe manner. If your library calls into other libraries which may or may not dispatch (now or in the future), it is probably best to make sure all inputs have consistent types, so that enabling dispatching is not expected to make a difference. Remember that you have no way to disable dispatching.
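Putting both patterns together, here is a small hypothetical example (center_both and _center are made-up names) in which the public function determines the module once and passes it on to an internal helper. It falls back to NumPy when numpy_dispatch is not installed, and with NumPy inputs it simply uses NumPy.

```python
import numpy as np

try:
    from numpy_dispatch import get_array_module
except ImportError:
    # Fall back to plain NumPy when numpy_dispatch is not installed.
    def get_array_module(*arrays, **kwargs):
        return np


def _center(arr, *, npx=np):
    # Internal helper: uses the module it is given instead of `np`.
    return arr - npx.mean(arr, axis=0)


def center_both(arr1, arr2):
    """Hypothetical public function: subtract the column means of each input."""
    npx = get_array_module(arr1, arr2)
    arr1 = npx.asarray(arr1)
    arr2 = npx.asarray(arr2)
    return _center(arr1, npx=npx), _center(arr2, npx=npx)
```

The helper's npx=np default keeps it usable on its own, while the public function decides the module in exactly one place.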

More fine-grained control

To enable a transition towards allowing certain (or all) array types, there are a few additional options, for example:

npx = get_array_module(*arrays, modules="numpy", future_modules="dask.array")

could be a way of saying that NumPy is currently fully supported, while Dask support is planned: for now, Dask inputs give a FutureWarning instead (the user can silence the warning).

Passing future_modules=None would allow any module to be returned in the future. Note that, in general, modules=None is probably the desired end result.
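The rules for modules and future_modules can be summarized as a small decision function. The sketch below is hypothetical: it works on module names rather than on arrays, treats None as "any module", and assumes that a module listed in neither argument is rejected with a TypeError; the real get_array_module may behave differently.

```python
import warnings


def _allowed(name, allowed):
    # None means "any module"; a string names a single module.
    if allowed is None:
        return True
    if isinstance(allowed, str):
        allowed = (allowed,)
    return name in allowed


def choose_module_name(candidate, modules="numpy", future_modules=(),
                       default="numpy", future_behavior=False):
    """Hypothetical sketch: filter a dispatched module by its name."""
    if _allowed(candidate, modules):
        return candidate  # fully supported today
    if _allowed(candidate, future_modules):
        if future_behavior:
            return candidate  # the user opted in to the new behaviour
        warnings.warn(f"{candidate!r} will be used in the future; using "
                      f"{default!r} for now", FutureWarning, stacklevel=2)
        return default
    raise TypeError(f"array module {candidate!r} is not supported")
```

With modules="numpy" and future_modules="dask.array", a Dask input resolves to "numpy" plus a FutureWarning, or to "dask.array" once future behaviour is enabled.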

See the section below for details on what a transition can currently look like.

Library authors also have the option to provide the default module, in case the library was originally written for something other than NumPy.

Additionally, some libraries may want to disable dispatching unless the user explicitly opts in, as a design choice (or perhaps during an experimental stage). These libraries can use opt_in=False to signal this.

Transitioning for users and libraries

The future_modules keyword argument is a way to transition users (enabling new dispatching). When such a transition happens, two things change:

  1. When no array-module can be found because none of the array types understands the others, a default has to be returned during a transition phase. This can be done using fallback="warn".
  2. future_modules allows new modules to be supported without using them by default (a FutureWarning is given instead).

In both cases, to opt out of the change, the user will have to cast their input arrays. To opt in to the new behaviour (an error in 1., dispatching in 2.), the user can use the context manager locally:

from numpy_dispatch import future_dispatch_behavior

with future_dispatch_behavior():
    library_function_doing_a_transition()
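The FutureWarning given during a transition is an ordinary Python warning, so a user can also silence it, or turn it into an error to find affected call sites, with the standard warnings module. library_function_doing_a_transition below is a hypothetical stand-in that only emits the warning. Silencing keeps the old behaviour; it does not opt in.

```python
import warnings


def library_function_doing_a_transition(x):
    # Hypothetical stand-in for a library function in its transition phase.
    warnings.warn("this input will be dispatched to its own array module "
                  "in a future version", FutureWarning, stacklevel=2)
    return x


# Silence only FutureWarnings, and only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = library_function_doing_a_transition(3)

# Turn the warning into an error (e.g. in a test suite) to find every
# call site that the transition would affect.
with warnings.catch_warnings():
    warnings.simplefilter("error", category=FutureWarning)
    try:
        library_function_doing_a_transition(3)
    except FutureWarning as exc:
        caught = str(exc)
```

Note that warnings.catch_warnings modifies process-wide state and is documented as not thread-safe, unlike a context-variable based opt-in.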

Library during transition

A typical use case should be a library transitioning from not supporting array-module dispatching to always supporting it. This can currently be implemented by using:

npx = get_array_module(*arrays, modules="numpy", future_modules=None,
                       fallback="warn")
npx.asarray(arrays[0])  # etc.

That is, for a library that used to just call np.asarray(), use the call above during the transition, and

npx = get_array_module(*arrays)

once the transition is done.
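The fallback="warn" behaviour can be illustrated with a simplified, hypothetical take on the lookup NEP 37 describes: each array type that implements __array_module__ is asked in turn, and if every one returns NotImplemented there is no common module. During the transition the default is returned with a FutureWarning; afterwards a TypeError is raised. As simplifications (assumptions, not numpy_dispatch's code), the types tuple contains every input type because NumPy itself does not implement __array_module__, and NEP 37's rule of trying subclasses before their base classes is omitted.

```python
import warnings

import numpy as np


def get_array_module_sketch(*arrays, default=np, fallback="error"):
    """Loose, simplified take on NEP 37 dispatching (hypothetical helper)."""
    # Unique input types, in order of first appearance.
    types = []
    for arr in arrays:
        if type(arr) not in types:
            types.append(type(arr))
    types = tuple(types)

    # One representative array per type that implements the protocol.
    implementing = [next(a for a in arrays if type(a) is t)
                    for t in types if hasattr(t, "__array_module__")]
    if not implementing:
        return default  # e.g. only NumPy arrays and Python scalars

    for arr in implementing:
        module = arr.__array_module__(types)
        if module is not NotImplemented:
            return module

    # No array type understood all the others.
    if fallback == "warn":
        # Transition phase: keep the old result, but announce the change.
        warnings.warn("no common array module found; returning the default, "
                      "this will raise an error in the future",
                      FutureWarning, stacklevel=2)
        return default
    raise TypeError("no common array module found")
```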

There are no additional features to allow an array-library to transition to implementing __array_module__. Such a library will need to give a generic FutureWarning and create its own context manager to opt-in to the new behaviour.

Array object implementors

Please read NEP 37. The only addition here is that module.__name__ must be a good and stable name that libraries can use.
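A minimal sketch of the array-side protocol, assuming a hypothetical duck array ToyArray that understands only itself and NumPy arrays. The module it returns is created with types.ModuleType so that its __name__ ("toyarray") is stable; a real array library would return its actual module instead.

```python
from types import ModuleType

import numpy as np

# Hypothetical stand-in for an array library's public module.
toyarray = ModuleType("toyarray")
toyarray.asarray = lambda obj: ToyArray(np.asarray(obj))


class ToyArray:
    """Hypothetical duck array wrapping a NumPy array."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_module__(self, types):
        # Claim the call only if every participating type is understood.
        if all(issubclass(t, (ToyArray, np.ndarray)) for t in types):
            return toyarray
        return NotImplemented
```

Returning NotImplemented for unknown types lets another participating array type try instead.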

Testing!

To allow some (silly) testing, there are two functionalities that make it a bit simpler:

import numpy as np
from numpy_dispatch import dummy  # will give a warning

# Create a dummy array, that returns its own module, although that
# module effectively is just NumPy. The dummy array accepts only NumPy
# arrays aside from itself.
dummy.array([1, 2, 3])

# Add `__array_module__` which accepts the given types as types that
# it can understand. This example is with cupy (untested), which is
# probably the best module to test:
import cupy
dummy.inject_array_module(cupy.ndarray, (np.ndarray, dummy.DummyArray,),
                          module=cupy)

numpy-dispatch's People

Contributors

seberg, shoyer, thomasjpfan

numpy-dispatch's Issues

Why is end-user opt-in required with this package?

The readme says this was "an early thought when scikit-learn experimented with using this type of functionality"

To me this feels like a layer that should live downstream e.g., in Scikit-Learn with sklearn.set_config. I appreciate that it can make sense for end-users to opt-in to breaking changes, but I think this really is a decision to make on a library-by-library basis.

The problem is that as soon as this global switch is used for compatibility with one library, it's useless for others. For example, suppose sklearn adds experimental dispatch support recommending that users opt-in. If matplotlib adds dispatching later, some of its users will have already opted into dispatching for sklearn, even though they may not have updated their matplotlib code.

That said, NEP 37 should probably recommend that library authors considering get_array_module() add their own opt-in mechanism. We could probably even include an example of how to do so in a concurrency-safe fashion using contextvars.

What does the alias "onp" stand for?

I agree that a short alias is a good idea to encourage for readable code, but I'm not quite sure where this name comes from.

There is also a loose convention in parts of the JAX codebase (which we are trying to remove) to use "import numpy as onp" to refer to "original" NumPy.

Exposed API is overloaded (too many features)

Creating an issue to be clear about it. The currently exposed API is overloaded and has features we probably do not want or will have to modify. Features related to transition and even the opt-in part are simply here for consideration.

Especially the feature concerning picking certain "supported" modules is here only as a thought experiment by me.

Duck array libraries that support __array_module__

I thought it might be worth assembling a list of duck array libraries that support __array_module__.

In particular, both TensorFlow-NumPy and JAX now support __array_module__ and work with numpy-dispatch, but you'll need to install the latest development versions of TensorFlow/JAX to use it.

Injecting does not work for cupy

When trying to inject the array module:

dummy.inject_array_module(cupy.ndarray, (np.ndarray, dummy.DummyArray,),
                          module=cupy)

the following error appears:

numpy_dispatch/dummy.py in inject_array_module(arrtype, known_types, module)
    147         return NotImplemented
    148 
--> 149     arrtype.__array_module__ = __array_module__
    150 

TypeError: can't set attributes of built-in/extension type 'cupy.core.core.ndarray'
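The same TypeError can be reproduced with numpy.ndarray: extension types defined in C do not accept new attributes, so monkeypatching __array_module__ onto them fails. A Python-level subclass is an ordinary class and can be patched. The sketch below shows both; whether subclassing cupy.ndarray is a practical workaround for the dummy tests is untested and an assumption.

```python
import numpy as np


def __array_module__(self, types):
    # Placeholder protocol method; a real one would inspect `types`.
    return np


# Extension types such as numpy.ndarray (and, per the report above,
# cupy.ndarray) reject new attributes with a TypeError.
try:
    np.ndarray.__array_module__ = __array_module__
    patched_extension_type = True
except TypeError:
    patched_extension_type = False


# A Python-level subclass is an ordinary class, so the same assignment works.
class PatchableArray(np.ndarray):
    pass


PatchableArray.__array_module__ = __array_module__
arr = np.arange(3).view(PatchableArray)
```

The drawback is that only arrays explicitly viewed as the subclass participate, so this is not equivalent to injecting the method into the original type.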
