Comments (12)
That's a good question. As mentioned in the select documentation, it's mostly used with flattened indices. What I mean by that is that when you run something like find on a tensor, it must work on tensors of arbitrary rank, but it can't return an arbitrary-rank tensor of indices. Instead, it returns a 1D vector of indices that map into a flattened view of the tensor. So a 2x3 tensor with the following indices:
[[(0,0), (0,1), (0,2)],
 [(1,0), (1,1), (1,2)]]
Would map to a flattened view like:
[[0, 1, 2],
 [3, 4, 5]]
This flattened view is useful because it can be used with select to pull out values for functions that need that format.
remap is different in that it lets you select values from an operator along one or more dimensions, where each dimension you select on takes its own operator/tensor of indices. Notably, it does not need the flattened view, so it's likely a bit more intuitive and closer to Python-style indexing. You can think of remap as Python/NumPy fancy indexing, except you add an extra template parameter for the dimension you want to select along.
from matx.
Hi @HugoPhibbs, this can be done with select: https://nvidia.github.io/MatX/api/manipulation/selecting/select.html#select
Let me know if that works
Can you explain what preprocessing you need for remap? It does exactly this:
(tov = remap<0>(tiv, idx)).run(exec);
The above remaps dimension 0 of the tensor tiv using the index tensor idx and assigns the result to the output tensor tov.
@cliffburdick I'm doing something a bit like this:
// X (indices) has shape [20], Y (data) has shape [10, 3]
auto X_batch = matx::slice(X, {0}, {9});
matx::print(X_batch); // Prints fine, tensor of shape [10]
auto Y_batch = matx::make_tensor<float>({10, 3});
(Y_batch = matx::select(Y, X_batch)).run(cudaStream); // Fails here
matx::print(Y_batch);
Which prints the error (which honestly I don't know how to interpret):
unknown file: Failure
C++ exception with description "matxException (matxInvalidSize: size == 0 || size == Size(i): incompatib" thrown in the test body.
How does select work? Does the index go along the first dimension? Say you had an idx tensor of shape [n] and a data tensor of shape [a, b, c]; would the select output have shape [n, b, c]?
Hi @HugoPhibbs, it's not clear to me from your example what you're trying to select. But let's assume X_batch is a 1D vector with values {0, 5, 7, 8}. To select rows {0, 5, 7, 8} from Y you would do:
auto Y_batch = matx::make_tensor<float>({4, 3}); // 4 since we're selecting 4 rows
(Y_batch = remap<0>(Y, X_batch)).run(cudaStream);
After this expression, Y_batch will contain rows {0, 5, 7, 8} from Y, with all columns of those rows.
@cliffburdick thanks for the reply. In this case, why are we using remap and not select? What are the use cases for each?
Hi again. I wasn't sure whether to make a new issue for this, or just add a comment, so I'm just gonna do it here...
So I've done some more implementations, and I have to ask: how lightweight is remap? I know you've said before that all operators are inherently lightweight, but I have an issue in my code where remap is taking surprisingly long. The code looks like this:
// BBatch_t flattened is approx 2*300*2*2000 = 2_400_000, X_t has approx shape (70_000, 800)
// FYI tu is a module with timing functions
start = tu::timeNow();
(XBatch_t = matx::remap<0>(X_t, matx::flatten(BBatch_t))).run();
cudaDeviceSynchronize();
tu::printDurationSinceStart(start, "XBatch_t"); // Around 1.5 seconds (includes sync time)
I added the cudaDeviceSynchronize() because operations are run asynchronously, and I want to see exactly what part of my algorithm is taking so long.
Also, is it best to call run() on operators one at a time (which I'm doing now in order to profile), or to combine everything into one run() at the end? If the latter is faster, calling run() per operator may be what's slowing my code down.
I've found that comparable code with CuPy is much, much faster (using array on array indexing).
Edit: BBatch_t size is actually 2_400_000 not 240_000
It should be very lightweight and result in a coalesced load of the indices followed by a potentially scattered read, depending on the indexing.
Have you verified the shapes of all of the bits are what you expect?
What are the shapes of the following?
X_t
matx::flatten(BBatch_t)
matx::remap<0>(X_t, matx::flatten(BBatch_t))
XBatch_t
Can you also paste the equivalent python code and the shapes for the same bits?
Ok, I made a stupid mistake: I wasn't comparing apples to apples, since the matrices in the C++ code were much larger. However, what I did find was that having intermediate run() calls was making the code a lot slower; I'm not sure why.
Every time you call run(), it launches a kernel. It's usually faster to combine as many operations as possible into one. If you paste more of your code, we can take a look.
@cliffburdick ok, please see #686; I'm getting weird results depending on whether I call run() immediately or defer it.