Comments (48)
I wrote up some thoughts on the use of Numba here: http://matthewrocklin.com/blog/work/2018/01/30/the-case-for-numba
from sparse.
If we need to use C-level dynamic data structures like std::vector, then my recommendation would be to just write the algorithms in straight C and hook them up with Cython, rather than writing this sort of logic in Cython itself.
We might also look into stealing logic from Pandas, which I suspect has already had to solve a similar problem.
from sparse.
If we need to use C-level dynamic data structures like std::vector then my recommendation would be to just write algorithms in straight C
By "straight C" I assume you mean "straight C++"?
Anyways, I agree that straight C++ has its virtues if you want to use C++ data structures.
from sparse.
The problem I often have with Numba is its lack of support for many features and the fact that it's limited to a subset of Numpy for fast performance. It does have the advantages of automatic optimization and pure Python, though.
On the other hand, Cython is a superset of Python and can be hand optimized. Of course, I don't have personal experience with either of these, just going off a few blog posts that I read. I'll leave the decision up to you.
from sparse.
Just a warning: be aware of the way Cython handles the installation and build step (see cython#477). They require you to import cython and put code in your setup.py, which creates a chicken-and-egg problem during installation.
I am not saying I am against using cython, as it's probably the best solution. I would just like to see us find a tidy solution that doesn't require weird hacks because of issues we didn't see coming when we started working on this.
from sparse.
@sklam do you happen to know of (or want to produce) a fast radix sort? It would be interesting to see what a numba solution would look like here and how much added hassle it would generate.
from sparse.
There are a few implementations on this SO question, particularly this answer. However, these will need to be modified as what we really need is an argsort, rather than a sort.
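To illustrate the needed modification: a counting-sort pass becomes an argsort by writing the original index into the output instead of the key itself. A minimal NumPy sketch (the function name and structure are mine, not from the linked answer):

```python
import numpy as np

def counting_argsort(keys, base=256):
    # Counting sort, modified to return indices instead of sorted keys.
    keys = np.asarray(keys)
    counts = np.bincount(keys.astype(np.intp), minlength=base)
    # offsets[d] = where the first element with key d goes
    offsets = np.concatenate(([0], np.cumsum(counts[:-1])))
    order = np.empty(len(keys), dtype=np.intp)
    for i, k in enumerate(keys):
        order[offsets[k]] = i  # writing 'i' instead of 'k' makes it an argsort
        offsets[k] += 1
    return order
```

The placement loop visits elements in input order, so the result is a stable argsort, matching `np.argsort(keys, kind='stable')` for small integer keys.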
from sparse.
@mrocklin , the only radixsort I used is a GPU implementation from CUB. Internally, numba uses quicksort for both np.sort and np.argsort.
from sparse.
It seems we need hacks (as @nils-werner put it) to actually distribute Cython code without requiring Cython as a dependency. See this link for some methods.
Edit: It seems you need the Numba package installed to even use Numba code; I'd love to be proven wrong, though.
from sparse.
@mrocklin, the difficulty of writing radixsort in numba will depend on the kinds of keys you need. The algorithm itself is simple enough, and numba can do bitwise operations very easily. If your keys are representable as fixed-width machine integer types (int32, int64, and no bigger), it is very doable. If the keys are bytearrays, it gets a lot harder.
And yes, I have an interest in writing fast sort algorithms in numba, but I don't think I have the bandwidth in the near term. Btw, what's the size of the dataset you're sorting?
@hameerabbasi , the ahead-of-time compilation feature in numba is incomplete (it only supports a very small and static subset of the language). But we do have plans to improve that front.
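For what it's worth, here is a rough sketch of what an LSD radix argsort over fixed-width unsigned integers could look like in plain Python/NumPy (illustrative only; the name `radix_argsort` and the structure are mine, and the inner placement loop is exactly the kind of thing one would hope to hand to a JIT):

```python
import numpy as np

def radix_argsort(keys, bits=8):
    # LSD radix argsort for fixed-width unsigned integer keys:
    # one stable counting pass per 'bits'-wide digit, least significant first.
    keys = np.asarray(keys)
    mask = (1 << bits) - 1
    order = np.arange(len(keys), dtype=np.intp)
    for shift in range(0, keys.dtype.itemsize * 8, bits):
        digits = ((keys[order] >> shift) & mask).astype(np.intp)
        counts = np.bincount(digits, minlength=1 << bits)
        offsets = np.concatenate(([0], np.cumsum(counts[:-1])))
        new_order = np.empty_like(order)
        for pos, d in zip(order, digits):  # stable placement by digit
            new_order[offsets[d]] = pos
            offsets[d] += 1
        order = new_order
    return order
```

Each pass needs only fixed-size auxiliary arrays of length `len(keys)` and `2 ** bits`; stability of the per-digit passes is what makes the whole thing correct.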
from sparse.
It could be uint8, uint16, uint32 or uint64 though.
from sparse.
good, these are the easy cases.
from sparse.
I started playing around with it a little bit, writing one version in NumPy, one in a Python loop and one in C++. The results are not exactly amazing yet ;-)
from sparse.
You might want to change the base to another power of two. ;-)
from sparse.
np.argsort() is still 2+ times faster.
from sparse.
We could just use the SciPy implementation in this SO answer. No need to implement ourselves, no need for any kind of dependency. We could also indirectly use it to fix #60 (which I realized yesterday).
from sparse.
I was also considering using std::vector and Cython. Not sure if Numba transforms lists into std::vectors.
I think the point where we're failing is that we're computing segments of arrays for each modulus in the NumPy version, each of which is (in and of itself) O(n). And in the C++ version, we're allocating too much memory.
from sparse.
Cython vs. Numba (vs. Pythran, vs. ...) is a hard choice (maturity and better debugging vs. better performance and a simple decorator - plus a bunch of other considerations). The Pythran maintainer just asked about using Pythran in SciPy, for which I came up with a list of everything that needs to be evaluated: https://mail.python.org/pipermail/scipy-dev/2018-January/022334.html
It seems we need hacks (as @nils-werner put it) to actually distribute Cython code without requiring Cython as a dependency. See this link for some methods.
Distributing generated C code in sdists is not much of a hack; it has been standard practice for many libraries for a long time. At this point I'd say it has become a lot less of a necessity, because there are wheels and conda packages for all the most common platforms.
from sparse.
The problem with Numba is that it doesn't accelerate dicts or lists, AFAIK. That's a deal-breaker, because we need these for the most critical parts, and allocating for the worst case makes some things as slow as the worst case.
Another factor to consider is merging upstream with SciPy. I'm not sure how willing they'd be if we had Numba in our codebase.
from sparse.
Cython won't meaningfully accelerate these either. We might hope for a 2-3x speedup, but that requires significant specialization. Instead, we might intentionally search for algorithms that can be done with fixed data structures, or we might go straight to C.
FWIW it's not clear to me that we should aim to merge this into the SciPy monolith. However I do think that we should aim to satisfy the same community constraints so that projects like SciPy would be willing to depend on this. The blogpost referred to above was intended to start the conversation of "well, maybe it should be ok to depend on numba today"
from sparse.
Actually, it's simple with Cython to wrap std::vector, so the basic purpose of a list is fulfilled. I'd be willing to go straight to C, but learning all the Python APIs would take a significant amount of time and effort; Cython sidesteps that.
from sparse.
FWIW it's not clear to me that we should aim to merge this into the SciPy monolith. However I do think that we should aim to satisfy the same community constraints so that projects like SciPy would be willing to depend on this.
That was basically what I meant. :-)
Edit: To clarify, I'm assuming data structures/basic ops in here, linear algebra etc. there.
from sparse.
If we need to use C-level dynamic data structures like std::vector then my recommendation would be to just write algorithms in straight C and hook them up with Cython rather than writing this sort of logic in Cython itself.
That is something I had not considered before, and it's actually pretty viable without diving too much into the Python C API. Thanks for alerting me to it.
Edit:
We might also look into stealing logic from Pandas, which I suspect has already had to solve a similar problem.
Sounds reasonable to think Pandas has a fast radix sort/group by functionality.
from sparse.
If we need to use C-level dynamic data structures
Although ideally we would find linear-time sorting algorithms for which dynamic data structures are not necessary. I don't know this space particularly well, but I would hope that algorithms that only require fixed-size data structures exist.
from sparse.
Actually, matching (for broadcasting/binary operations) can get significant speedups from this as well, I'm playing with an implementation over at this branch.
from sparse.
good, these are the easy cases.
@sklam can you recommend an algorithm for an O(n) argsort over unsigned integers?
from sparse.
I would hope that algorithms that only require fixed-size data structures exist
Not just that. np.empty/np.zeros would kill speed in Numba. They can't even be dynamically allocated, I'm assuming.
from sparse.
But if fixed size is an issue, we can always do two passes over the data: one to count and one to actually add data. Of course, this would be slow.
@sklam can you recommend an algorithm for an O(n) argsort over unsigned integers?
The standard algorithm for this is in the title: radix sort. :-) I'm not aware of one without memory allocation or dynamic structures, though.
from sparse.
But if fixed size is an issue, we can always do two passes over the data: one to count and one to actually add data. Of course, this would be slow.
Two passes over an array can be much much faster than lots of allocations in dynamic data structures.
from sparse.
No harm in trying both and benchmarking. :-)
Edit: Depends on how expensive the passes are, I guess.
from sparse.
I will add that allocation will still be dynamic, even with multiple passes over the data. The outer loop of radix sort is where we need to allocate the std::vector or array, not the absolute top.
I guess we could do counting sort but the base for that would be so high that it's practically useless.
Edit: Fixed terms.
from sparse.
Do algorithms for linear-time sorting exist that use only fixed-size data structures and a reasonable number of passes?
from sparse.
My experimental C++ implementation of radixsort uses fixed size data structures (but copies between them for each pass).
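For comparison, the copy-between-fixed-buffers scheme might be sketched in Python like this (my own illustration, not the C++ code referenced above): two buffers of fixed size, with each pass stably redistributing keys from one to the other by a single digit.

```python
import numpy as np

def radix_sort_pingpong(keys, bits=8):
    # Two fixed-size buffers; each pass stably redistributes keys from
    # src to dst by one digit, then the buffers swap roles.
    src = np.array(keys, copy=True)
    dst = np.empty_like(src)
    mask = (1 << bits) - 1
    for shift in range(0, src.dtype.itemsize * 8, bits):
        digits = ((src >> shift) & mask).astype(np.intp)
        counts = np.bincount(digits, minlength=1 << bits)
        offsets = np.concatenate(([0], np.cumsum(counts[:-1])))
        for v, d in zip(src, digits):
            dst[offsets[d]] = v
            offsets[d] += 1
        src, dst = dst, src
    return src
```

No dynamic data structures are needed, at the cost of one full copy of the data per pass.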
from sparse.
It does, however, make them bigger than they have to be... I'm not sure how much of a performance penalty this incurs, though. I haven't really tested allocation against other operations, TBH. I usually use GC languages like C# (which optimizes away the cost of multiple closely spaced allocations); I know how to do memory management but haven't done much of it.
I'm assuming this is the reason for the low performance at bigger bases in your benchmarks.
May be relevant: https://stackoverflow.com/questions/282926/time-complexity-of-memory-allocation
from sparse.
Since I'm only allocating needlessly large virtual memory, shouldn't the performance penalty be fairly insignificant? Come to think of it, maybe it's not the malloc; perhaps I'm raising many page faults during the actual sort run.
I am not that experienced with actual memory management and optimization, and I haven't profiled it in detail, so I don't know.
from sparse.
That might be the reason, since you're moving between needlessly widely spaced chunks of memory. This does have the advantage of a low allocation cost (take this with a grain of salt), but it does introduce a lot of space between the (actually used parts of the) chunks, which is usually bad.
from sparse.
We could use this, although they're in-place sorts rather than argsorts. (MIT Licensed, so we're good)
They have tuned the bases for integer sorts, and applied other tricks like histogramming.
Edit: This fork is slightly newer.
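If I understand the histogramming trick correctly, it amounts to counting the buckets for every digit position in one pass over the keys, so the later radix passes can skip their counting step. A hypothetical sketch (`all_digit_histograms` is my name, not theirs):

```python
import numpy as np

def all_digit_histograms(keys, bits=8):
    # One pass over the data computes the histogram for every digit
    # position at once; each radix pass then reuses its precomputed row.
    keys = np.asarray(keys)
    mask = (1 << bits) - 1
    npasses = keys.dtype.itemsize * 8 // bits
    hist = np.zeros((npasses, 1 << bits), dtype=np.intp)
    for v in keys:
        v = int(v)
        for p in range(npasses):
            hist[p, (v >> (p * bits)) & mask] += 1
    return hist
```

This trades `npasses` counting scans for one, at the cost of holding all the histograms (a fixed-size `npasses x 2 ** bits` table) in memory at once.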
from sparse.
Radix sort in Numba needs list[list]. xref numba/numba#2560.
from sparse.
Unfortunately, in this case, you would need either list[list] or list[ndarray] (if we used two passes), so that's not really an option right now. (We could upper-bound the lists and use a static 2-D ndarray, but the upper bound is completely unreasonable.)
from sparse.
I haven't looked into radix sort much in the past, so please correct me if I'm saying dumb things, but some references seem to implement radix sort as just many counting sorts, each of which is computable with fixed-size data structures in two passes.
https://www.geeksforgeeks.org/radix-sort/
from sparse.
Yes, that works for radix sort, but unfortunately not for radix argsort. You would need to keep an array of indices corresponding to each modulus. In the worst case, that can be len(arr) long for each possible modulus, so len(arr) * base is the total size for fixed data structures. To give an idea of how big the base is, usort uses a base of 2 ** 10 or 2 ** 11 for int64.
from sparse.
I'm in the process of adding radix sort to NumPy. You can see the PR: numpy/numpy#12586
from sparse.
I'm glad to see it! Hooray for pushing work upstream :) Thanks for the link, I enjoyed reading through the comments.
from sparse.
Closing this since Radix sort is in NumPy and indexing is implemented.
from sparse.