Comments (3)
cupy.partiotion
as well as numpy.partition
belong to a problem domain called unordered partial sort, which is a very similar problem with selection of kth element.
https://en.wikipedia.org/wiki/Selection_algorithm#Unordered_partial_sorting
There are the following algorithms to solve such kind of problems:
- Quickselect
- Median of median
- Heap select
- Introselect
- Radixselect
numpy.partition
uses introselect, which is a hybrid algorithm of quickselect and median of medians and does not paralellize well. For GPU environment with CuPy, radixselect would be the most likely option.
- https://docs.scipy.org/doc/numpy/reference/generated/numpy.partition.html
- moderngpu/moderngpu#10 (comment)
The following papers mention on radixselect:
-
Fast K-selection Algorithms for Graphics Processing Units (http://www.math.grin.edu/~blanchaj/Research/ABGS_KSelection.pdf)
-
Radix Selection Algorithm for the kth Order Statistics (http://csis.pace.edu/~ctappert/srd2012/a4.pdf)
The first paper has its public source code and Tensorflow guys have some discussion on it recently.
from cupy.
I compared the performance of:
- radixselect on CPU
- std::sort & choose on CPU
- thrust::sort & choose on GPU
(Celeron G3900 @ 2.80GHz and GeForce GTX 750 Ti)
Just thrust::sort
& choose may be enough at first because its performance fit well to O(n)
in time complexity.
https://gist.github.com/sonots/7e11231af2f2cc6b1f519ad832632566
$ nvcc radixselect.cu -O2 -std=c++11 -Wno-deprecated-gpu-targets && ./a.out
----- n = 1000, k = 100
radixselect : 59233 Time elapsed 0.000006 sec
std::sort : 59233 Time elapsed 0.000043 sec
thrust::sort: 59233 Time elapsed 0.000583 sec
----- n = 10000, k = 1000
radixselect : 59017 Time elapsed 0.000032 sec
std::sort : 59017 Time elapsed 0.000597 sec
thrust::sort: 59017 Time elapsed 0.000586 sec
----- n = 100000, k = 10000
radixselect : 58992 Time elapsed 0.000347 sec
std::sort : 58992 Time elapsed 0.006836 sec
thrust::sort: 58992 Time elapsed 0.000436 sec
----- n = 1000000, k = 100000
radixselect : 58953 Time elapsed 0.004255 sec
std::sort : 58953 Time elapsed 0.074907 sec
thrust::sort: 58953 Time elapsed 0.002840 sec
----- n = 10000000, k = 1000000
radixselect : 58986 Time elapsed 0.039464 sec
std::sort : 58986 Time elapsed 0.745077 sec
thrust::sort: 58986 Time elapsed 0.023981 sec
----- n = 100000000, k = 10000000
radixselect : 58983 Time elapsed 0.489075 sec
std::sort : 58983 Time elapsed 7.569387 sec
thrust::sort: 58983 Time elapsed 0.235514 sec
from cupy.
According to Fig. 7 in the paper,
- Fast K-selection Algorithms for Graphics Processing Units (http://www.math.grin.edu/~blanchaj/Research/ABGS_KSelection.pdf)
sort & choose has O(N) complexity in time for vector length N
as theoretically shown, and is at least just four times slower than radixselect in case vector length is 2^27.
It seems reasonable to first implement cupy.partition
with naive sort & choose.
from cupy.
Related Issues (20)
- `matmul` VRAM outage that affects ubuntu, but runs okay on ubuntu inside WSL2 HOT 5
- Unexpected shape with cupy.fuse and multiple outputs HOT 1
- bug in distributed/_store.py HOT 1
- `expm` fails to compute matrix exponential of complex matrix
- cuDNN 9 causes build-time error HOT 1
- Pre-populate and ship a Jitify cache? HOT 4
- Support Cuda Stream creation with Priority HOT 3
- [Deleted]
- Typecasting issue HOT 2
- Support cuDNN 9.0 HOT 4
- Jitify cache paths on Windows in Cupy13 HOT 3
- Revisit: Using upstream Thrust complex headers and drop the vendored ones HOT 1
- Failed to import CuPy
- Failed to import CuPy. HOT 2
- How does CuPy work? HOT 1
- Can I launch cupy kernels in C++? HOT 15
- Cannot install CuPy from source on ROCm 6.0.2 HOT 1
- `ndimage.shift` results in incorrect values for inplace for float arrays HOT 2
- Possibly mistaken usage of the `cupyx.jit.thrust.device` execution policy in the JIT context? HOT 3
- Issue with cupy.random.XORWOW when generating random numbers beyond 2.1 billion HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cupy.