
Comments (48)


mrocklin avatar mrocklin commented on June 6, 2024 1

I wrote up some thoughts on the use of Numba here: http://matthewrocklin.com/blog/work/2018/01/30/the-case-for-numba

from sparse.

mrocklin avatar mrocklin commented on June 6, 2024 1

If we need to use C-level dynamic data structures like std::vector, then my recommendation would be to write the algorithms in straight C and hook them up with Cython, rather than writing this sort of logic in Cython itself.

We might also look into stealing logic from Pandas, which I suspect has already had to solve a similar problem.


shoyer avatar shoyer commented on June 6, 2024 1

If we need to use C-level dynamic data structures like std::vector then my recommendation would be to just write algorithms in straight C

By "straight C" I assume you mean "straight C++"?

Anyways, I agree that straight C++ has its virtues if you want to use C++ data structures.



hameerabbasi avatar hameerabbasi commented on June 6, 2024

The problem I often have with Numba is its lack of support for many features and the fact that it's limited to a subset of Numpy for fast performance. It does have the advantages of automatic optimization and pure Python, though.

On the other hand, Cython is a superset of Python and can be hand optimized. Of course, I don't have personal experience with either of these, just going off a few blog posts that I read. I'll leave the decision up to you.


nils-werner avatar nils-werner commented on June 6, 2024

Just a warning: be aware of the way Cython handles the installation and build step (see cython#477). It requires you to import Cython and put code in your setup.py, which creates a chicken-and-egg problem during installation.

I am not saying I am against using cython, as it's probably the best solution. I would just like to see us find a tidy solution that doesn't require weird hacks because of issues we didn't see coming when we started working on this.


mrocklin avatar mrocklin commented on June 6, 2024

@sklam do you happen to know of (or want to produce) a fast radix sort? It would be interesting to see what a numba solution would look like here and how much added hassle it would generate.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

There are a few implementations on this SO question, particularly this answer. However, these will need to be modified as what we really need is an argsort, rather than a sort.
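To make the sort/argsort distinction concrete, here is a minimal NumPy sketch (hypothetical and untuned; the function name is made up) of an LSD radix argsort built from repeated stable passes, returning the permutation rather than the sorted keys:

```python
import numpy as np

def radix_argsort(keys, digit_bits=8):
    # Hypothetical sketch: LSD radix argsort. Instead of moving the keys,
    # we permute an index array, one digit per pass, least significant first.
    keys = np.asarray(keys, dtype=np.uint64)
    order = np.arange(len(keys))
    mask = np.uint64((1 << digit_bits) - 1)
    for shift in range(0, 64, digit_bits):
        digits = (keys[order] >> np.uint64(shift)) & mask
        # the per-pass sort must be stable for LSD radix sort to be correct
        order = order[np.argsort(digits, kind="stable")]
    return order
```

Each pass leans on `np.argsort(kind="stable")` as a stand-in for a real counting pass, so this is a correctness sketch rather than a fast implementation.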


sklam avatar sklam commented on June 6, 2024

@mrocklin , the only radixsort I used is a GPU implementation from CUB. Internally, numba uses quicksort for both np.sort and np.argsort.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

It seems we need hacks (as @nils-werner put it) to actually distribute Cython code without requiring Cython as a dependency. See this link for some methods.

Edit: It seems you need the Numba package installed to even use Numba code, I'd love to be proven wrong though.


sklam avatar sklam commented on June 6, 2024

@mrocklin, the difficulty of writing radix sort in numba will depend on the kinds of keys you need. The algorithm itself is simple enough, and numba can do bitwise operations very easily. If your keys are representable as fixed-width machine integer types (int32, int64, and no bigger), it is very doable. If the keys are bytearrays, it gets a lot harder.

And yes, I have an interest in writing fast sort algorithms in numba, but I don't think I have the bandwidth in the near term. Btw, what's the size of the dataset you're sorting?

@hameerabbasi , the ahead-of-time compilation feature in numba is incomplete (only support a very small and static subset of the language). But, we do have plans to improve that front.



hameerabbasi avatar hameerabbasi commented on June 6, 2024

It could be uint8, uint16, uint32 or uint64 though.


sklam avatar sklam commented on June 6, 2024

good, these are the easy cases.


nils-werner avatar nils-werner commented on June 6, 2024

I started playing around with it a little bit, writing one version in NumPy, one in a Python loop and one in C++. The results are not exactly amazing yet ;-)


hameerabbasi avatar hameerabbasi commented on June 6, 2024

You might want to change the base to another power of two. ;-)


nils-werner avatar nils-werner commented on June 6, 2024

np.argsort() is still 2+ times faster


hameerabbasi avatar hameerabbasi commented on June 6, 2024

We could just use the SciPy implementation in this SO answer. No need to implement ourselves, no need for any kind of dependency. We could also indirectly use it to fix #60 (which I realized yesterday).


hameerabbasi avatar hameerabbasi commented on June 6, 2024

I was also considering using std::vector and Cython. Not sure if Numba transforms lists into std::vectors.

I think the point where we're failing is we're computing segments of arrays for each modulus in the Numpy version... Each of which is (in and of itself) O(n). And in the C++ version we're allocating too much memory.


rgommers avatar rgommers commented on June 6, 2024

Cython vs. Numba (vs. Pythran, vs. ...) is a hard choice (maturity and better debugging vs. better performance and a simple decorator - plus a bunch of other considerations). The Pythran maintainer just asked about using Pythran in SciPy, for which I came up with a list of everything that needs to be evaluated: https://mail.python.org/pipermail/scipy-dev/2018-January/022334.html

It seems we need hacks (as @nils-werner put it) to actually distribute Cython code without requiring Cython as a dependency. See this link for some methods.

Distributing generated C code in sdists is not much of a hack, it has been standard practice for many libraries for a long time. At this point I'd say it has become a lot less of a necessity, because there are wheels and conda packages for all the most common platforms.
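For illustration, a common way to support both cases in a setup.py (a sketch; the module and file names here are hypothetical) is to cythonize the .pyx when Cython is available and otherwise fall back to the pre-generated .c file shipped in the sdist:

```python
# Hypothetical setup.py fragment: build from radix.pyx when Cython is
# installed, otherwise compile the pre-generated radix.c from the sdist.
from setuptools import setup, Extension

try:
    from Cython.Build import cythonize
    extensions = cythonize([Extension("radix", ["radix.pyx"])])
except ImportError:
    extensions = [Extension("radix", ["radix.c"])]

setup(name="example", ext_modules=extensions)
```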


hameerabbasi avatar hameerabbasi commented on June 6, 2024

The problem with Numba is that it doesn't accelerate dicts or lists, AFAIK. That's a deal-breaker, because we need these for the most critical parts, and allocating for the worst case makes some operations as slow as the worst case.

Another factor to consider is merging upstream with SciPy. I'm not sure how willing they'd be if we had Numba in our codebase.


mrocklin avatar mrocklin commented on June 6, 2024

Cython won't meaningfully accelerate these either. We might hope for a 2-3x speedup, but that requires significant specialization. Instead we might intentionally search for algorithms that can be done with fixed data structures, or we might go straight to C.

FWIW it's not clear to me that we should aim to merge this into the SciPy monolith. However I do think that we should aim to satisfy the same community constraints so that projects like SciPy would be willing to depend on this. The blogpost referred to above was intended to start the conversation of "well, maybe it should be ok to depend on numba today"


hameerabbasi avatar hameerabbasi commented on June 6, 2024

Actually, it's simple to wrap std::vector with Cython, so the basic purpose of a list is fulfilled. I'd be willing to go straight to C, but learning all the Python C APIs would take significant time and effort; Cython sidesteps that.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

FWIW it's not clear to me that we should aim to merge this into the SciPy monolith. However I do think that we should aim to satisfy the same community constraints so that projects like SciPy would be willing to depend on this.

That was basically what I meant. :-)

Edit: To clarify, I'm assuming data structures/basic ops in here, linear algebra etc. there.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

If we need to use C-level dynamic data structures like std::vector then my recommendation would be to just write algorithms in straight C and hook them up with Cython rather than writing this sort of logic in Cython itself.

That is actually something I had not considered before, and it's pretty viable without diving too deep into the Python C API. Thanks for alerting me to it.

Edit:

We might also look into stealing logic from Pandas, which I suspect has already had to solve a similar problem.

It sounds reasonable to expect that Pandas has fast radix sort/group-by functionality.


mrocklin avatar mrocklin commented on June 6, 2024

If we we need to use C-level dynamic data structures

Although ideally we would find linear-time sorting algorithms for which dynamic data structures are not necessary. I don't know this space particularly well, but I would hope that algorithms that only require fixed-size data structures exist.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

Actually, matching (for broadcasting/binary operations) can get significant speedups from this as well, I'm playing with an implementation over at this branch.


mrocklin avatar mrocklin commented on June 6, 2024

good, these are the easy cases.

@sklam can you recommend an algorithm for an O(n) argsort over unsigned integers?


hameerabbasi avatar hameerabbasi commented on June 6, 2024

I would hope that algorithms that only require fixed-size data structures exist

Not just that: np.empty/np.zeros would kill speed in Numba, and I'm assuming they can't even be dynamically allocated.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

But if fixed size is an issue; we can always do two passes over the data: One to count and one to actually add data. Of course, this would be slow.

@sklam can you recommend an algorithm for an O(n) argsort over unsigned integers?

The standard algorithm for this is in the title: radix sort. :-) I'm not aware of one without memory allocation or dynamic structures, though.
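For reference, the two-pass idea above — one pass to count, one to place — might look like this (a hypothetical sketch; the function name and return layout are made up):

```python
import numpy as np

def group_indices(keys, n_groups):
    # Hypothetical two-pass "count, then fill" scheme: pass 1 sizes every
    # bucket via a histogram, pass 2 writes element indices into a single
    # preallocated flat array, avoiding dynamic per-bucket lists entirely.
    counts = np.bincount(keys, minlength=n_groups)          # pass 1: count
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))  # bucket offsets
    out = np.empty(len(keys), dtype=np.int64)
    cursor = starts.copy()
    for i, k in enumerate(keys):                            # pass 2: place
        out[cursor[k]] = i
        cursor[k] += 1
    return out, starts, counts
```

The only allocations are the histogram (size `n_groups`) and the flat output (size `len(keys)`), both fixed before the placement pass begins.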


mrocklin avatar mrocklin commented on June 6, 2024

But if fixed size is an issue; we can always do two passes over the data: One to count and one to actually add data. Of course, this would be slow.

Two passes over an array can be much much faster than lots of allocations in dynamic data structures.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

No harm in trying both and benchmarking. :-)

Edit: Depends on how expensive the passes are; I guess.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

I will add that allocation will still be dynamic, even with multiple passes over the data. The outer loop of radix sort is where we need to allocate the std::vector or array, not at the very top.

I guess we could do counting sort but the base for that would be so high that it's practically useless.

Edit: Fixed terms.


mrocklin avatar mrocklin commented on June 6, 2024

Do algorithms for linear-time sorting exist that use only fixed-size data structures and a reasonable number of passes?


nils-werner avatar nils-werner commented on June 6, 2024

My experimental C++ implementation of radixsort uses fixed size data structures (but copies between them for each pass).
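I haven't seen the C++ code itself, but the "fixed buffers, copy each pass" scheme can be sketched in Python as two equally sized arrays that swap roles every digit pass (hypothetical and unoptimized; the name is made up):

```python
import numpy as np

def radix_sort_pingpong(keys, bits=8):
    # Sketch of the fixed-buffer scheme: src and dst are allocated once and
    # swap roles after every stable counting pass over one digit.
    src = np.asarray(keys, dtype=np.uint32).copy()
    dst = np.empty_like(src)
    mask = (1 << bits) - 1
    for shift in range(0, 32, bits):
        counts = np.zeros(1 << bits, dtype=np.int64)
        for k in src:                                # count this digit
            counts[(int(k) >> shift) & mask] += 1
        offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))
        for k in src:                                # stable placement
            d = (int(k) >> shift) & mask
            dst[offsets[d]] = k
            offsets[d] += 1
        src, dst = dst, src   # ping-pong: no reallocation between passes
    return src
```

Only two length-n buffers plus one base-sized histogram exist for the whole sort, at the cost of a full copy per digit pass.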


hameerabbasi avatar hameerabbasi commented on June 6, 2024

It does, however, make them bigger than they have to be... I'm not sure how much of a performance penalty this incurs, though; I haven't really tested allocation against other operations, to be honest. I usually use garbage-collected languages like C# (which optimizes away the cost of multiple closely spaced allocations), so while I know how to do memory management, I haven't done much of it.

I'm assuming this is the reason there is a low performance for bigger bases in your benchmarks.

May be relevant: https://stackoverflow.com/questions/282926/time-complexity-of-memory-allocation


nils-werner avatar nils-werner commented on June 6, 2024

Since I'm only allocating needlessly large virtual memory, shouldn't the performance penalty be fairly small? Come to think of it, maybe it's not the malloc; I may be raising many page faults during the actual sort run...

I am not that experienced with actual memory management and optimization, and I haven't profiled it in detail, so I don't know.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

That might be the reason, since you're moving between needlessly widely spaced chunks of memory. This has the advantage of a low allocation cost (take that with a grain of salt), but it does introduce a lot of space between the (actually used parts of the) chunks, which is usually bad.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

We could use this, although they're in-place sorts rather than argsorts. (MIT Licensed, so we're good)

They have tuned the bases for integer sorts, and applied other tricks like histogramming.

Edit: This fork is slightly newer.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

Radix sort in Numba needs list[list]. xref numba/numba#2560.



hameerabbasi avatar hameerabbasi commented on June 6, 2024

Unfortunately, in this case you would need list[list] (or list[ndarray] if we used two passes), so that's not really an option right now. We could upper-bound the lists and use a static 2-D ndarray, but the upper bound is completely unreasonable.


mrocklin avatar mrocklin commented on June 6, 2024

I haven't looked into radix sort much in the past, so please correct me if I'm saying dumb things, but some references seem to implement radix sort as just many counting sorts, each of which is computable in fixed sized data structures in two passes.

https://www.geeksforgeeks.org/radix-sort/
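That reference describes it correctly; a single such counting pass, written with the plain loops that Numba or C handle well, could be sketched like this (hypothetical; name and signature made up):

```python
import numpy as np

def counting_sort_by_digit(keys, shift, bits=8):
    # One LSD counting pass: histogram, exclusive prefix sum, stable scatter.
    # All structures are fixed-size: one base-sized count array, one output.
    base = 1 << bits
    mask = base - 1
    counts = np.zeros(base, dtype=np.int64)
    out = np.empty_like(keys)
    for k in keys:                        # pass 1: count each digit
        counts[(k >> shift) & mask] += 1
    offsets = np.zeros(base, dtype=np.int64)
    for d in range(1, base):              # exclusive prefix sum -> offsets
        offsets[d] = offsets[d - 1] + counts[d - 1]
    for k in keys:                        # pass 2: stable placement
        d = (k >> shift) & mask
        out[offsets[d]] = k
        offsets[d] += 1
    return out
```

Chaining this pass over successive `shift` values (0, 8, 16, ...) yields the full radix sort, since each pass is stable.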


hameerabbasi avatar hameerabbasi commented on June 6, 2024

Yes, that works for radix sort, but unfortunately not for radix argsort. You would need to keep an array of indices corresponding to each modulus. In the worst case, that can be len(arr) long for each possible modulus, so len(arr) * base is the total size for fixed data structures. To give an idea of how big the base is, usort uses a base of 2 ** 10 or 2 ** 11 for int64.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

I'm in the process of adding radix sort to NumPy. You can see the PR: numpy/numpy#12586


mrocklin avatar mrocklin commented on June 6, 2024

I'm glad to see it! Hooray for pushing work upstream :) Thanks for the link, I enjoyed reading through the comments.


hameerabbasi avatar hameerabbasi commented on June 6, 2024

Closing this since radix sort is in NumPy and indexing is implemented.

