Comments (12)

cliffburdick commented on August 23, 2024

That's a good question. As mentioned in the select documentation, it's mostly used for flattened indices. What I mean by that is when you run something like find on a tensor, it must be able to work on tensors of arbitrary rank. But we can't return indices of arbitrary rank. Instead, it returns a 1D vector of indices that map to a flattened view of the tensor. So normally a 2x3 tensor that has the following indices:

[[(0,0), (0,1), (0,2)],
 [(1,0), (1,1), (1,2)]]

Would map to a flattened view like:

[[0, 1, 2],
 [3, 4, 5]]

This flattened view is useful because it can be used with select to pull out values for functions that need that format.
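The mapping from multi-dimensional indices to flat indices can be sketched in plain C++. This is not MatX code: `flat_index`, `find_flat_indices`, and `select_flat` are hypothetical helper names, and the sketch assumes a row-major layout, which matches the flattened view shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major flattening: element (row, col) of a tensor with `cols` columns
// sits at flat index row * cols + col.
std::size_t flat_index(std::size_t row, std::size_t col, std::size_t cols) {
    assert(col < cols);
    return row * cols + col;
}

// Stand-in for a find-style operation: returns the flat indices of every
// element greater than `threshold`, regardless of the tensor's rank.
std::vector<std::size_t> find_flat_indices(const std::vector<float>& data,
                                           float threshold) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] > threshold) idx.push_back(i);
    }
    return idx;
}

// Stand-in for a select-style operation: pulls out the values at the given
// flat indices and returns them as a 1D vector.
std::vector<float> select_flat(const std::vector<float>& data,
                               const std::vector<std::size_t>& idx) {
    std::vector<float> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) {
        assert(i < data.size());
        out.push_back(data[i]);
    }
    return out;
}
```

For the 2x3 example, `flat_index(1, 2, 3)` gives 5, the last entry in the flattened view.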

remap is different in that it allows you to select values from an operator along one or more dimensions, and for each of those dimensions you pass an operator/tensor of indices to select. Notably, it does not need the flattened view, and it is likely a bit more intuitive and closer to Python. You can think of remap as Python's array indexing, except that you need to add the extra template parameter for the dimension you want to select from.

cliffburdick commented on August 23, 2024

Hi @HugoPhibbs, this can be done with select: https://nvidia.github.io/MatX/api/manipulation/selecting/select.html#select

Let me know if that works

luitjens commented on August 23, 2024

Can you explain what preprocessing you need for remap? It does exactly this:

(tov = remap<0>(tiv, idx)).run(exec);

The above remaps dimension 0 of the tensor tiv by the index tensor idx and assigns the result to the output tensor tov.

HugoPhibbs commented on August 23, 2024

@cliffburdick I'm doing something a bit like this:

// X (indices) has shape [20], Y (data) has shape [10, 3]
auto X_batch = matx::slice(X, {0}, {9});

matx::print(X_batch); // Prints fine, matrix of shape [10]

auto Y_batch = matx::make_tensor<float>({10, 3});

(Y_batch = matx::select(Y, X_batch)).run(cudaStream); // Fails here

matx::print(Y_batch);

Which prints the error (which honestly I don't know how to interpret):

unknown file: Failure
C++ exception with description "matxException (matxInvalidSize: size == 0 || size == Size(i): incompatib" thrown in the test body.

How does select work? Does the index go along the first dimension? Say you had an idx tensor of shape [n] and a data tensor of shape [a, b, c]: is the select output going to be of shape [n, b, c]?

cliffburdick commented on August 23, 2024

Hi @HugoPhibbs, it's not clear to me from your example what you're trying to select. But let's assume X_batch is a 1D vector with values {0, 5, 7, 8}. To select rows {0, 5, 7, 8} from Y you would do:

auto Y_batch = matx::make_tensor<float>({4, 3}); // 4 rows, since we're selecting 4 row indices
(Y_batch = matx::remap<0>(Y, X_batch)).run(cudaStream);

After this expression Y_batch will contain rows {0, 5, 7, 8} from Y and all columns for those rows.
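To make the row-selection semantics concrete, here is a plain C++ sketch of what remapping dimension 0 amounts to. It is only an illustration, not MatX's implementation: `gather_rows` is a hypothetical helper, and Y is assumed to be stored row-major in a flat buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y holds a [num_rows, cols] matrix in row-major order.
// Returns a [rows.size(), cols] matrix whose i-th row is row rows[i] of y,
// with all columns of that row copied over.
std::vector<float> gather_rows(const std::vector<float>& y,
                               std::size_t cols,
                               const std::vector<std::size_t>& rows) {
    assert(cols > 0 && y.size() % cols == 0);
    const std::size_t num_rows = y.size() / cols;

    std::vector<float> out;
    out.reserve(rows.size() * cols);
    for (std::size_t r : rows) {
        assert(r < num_rows);  // every index must name an existing row
        for (std::size_t c = 0; c < cols; ++c) {
            out.push_back(y[r * cols + c]);
        }
    }
    return out;
}
```

With a [10, 3] input and row indices {0, 5, 7, 8}, the result has shape [4, 3], matching the size of Y_batch above.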

HugoPhibbs commented on August 23, 2024

@cliffburdick thanks for the reply. In this case, why are we using remap and not select? What are the use cases for each?

HugoPhibbs commented on August 23, 2024

Hi again. I wasn't sure whether to make a new issue for this, or just add a comment, so I'm just gonna do it here...

So I've done some more implementations, and I gotta ask: how lightweight is remap? I know you've said before that all operators are inherently lightweight. But I have this issue in my code where remap is taking surprisingly long. The code looks like this:

// BBatch_t flattened is approx 2*300*2*2000 = 2_400_000, X_t has approx shape (70_000, 800)
// FYI tu is a module with timing functions

start = tu::timeNow();

(XBatch_t = matx::remap<0>(X_t, matx::flatten(BBatch_t))).run();

cudaDeviceSynchronize();
tu::printDurationSinceStart(start, "XBatch_t"); // Around 1.5 seconds (includes sync time)

I added the cuda synchronise because operations are done asynchronously, and I want to see exactly what part of my algo is taking so long.

Also, is it best to call run() on operators one at a time (which I'm doing now in order to profile it) or all in one go at the end? If the latter, calling run() separately may be what's slowing my code down.

I've found that comparable code with CuPy is much, much faster (using array on array indexing).

Edit: BBatch_t size is actually 2_400_000 not 240_000

luitjens commented on August 23, 2024

It should be very lightweight and result in a coalesced load of the index followed by a potentially scattered read, depending on the indexing.

Have you verified the shapes of all of the bits are what you expect?

What are the shapes of the following?

X_t
matx::flatten(BBatch_t)
matx::remap<0>(X_t, matx::flatten(BBatch_t))
XBatch_t

luitjens commented on August 23, 2024

Can you also paste the equivalent python code and the shapes for the same bits?

HugoPhibbs commented on August 23, 2024

OK, I made a stupid mistake.

I wasn't comparing oranges to oranges. The matrices in the C++ code were much larger. However, what I did find was that having intermediate calls to run() was making the code a lot slower - not sure why.

cliffburdick commented on August 23, 2024

Every time you call run() it will launch a kernel. It's usually faster to try to combine as many things as possible into one. If you paste more of your code we can take a look.
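As a rough CPU-side analogy (plain C++, not MatX or CUDA code), the difference looks like the sketch below. The function names are hypothetical, and each loop stands in for one kernel launch: the two-pass version writes and re-reads an intermediate buffer, while the fused version touches the data once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Analogous to two separate run() calls: the first pass materializes an
// intermediate buffer, and the second pass reads it back.
void scale_offset_two_passes(const std::vector<float>& x, float a, float b,
                             std::vector<float>& out) {
    assert(out.size() == x.size());
    std::vector<float> tmp(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) tmp[i] = a * x[i];    // pass 1
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = tmp[i] + b;  // pass 2
}

// Analogous to one combined expression with a single run(): one pass over
// the data and no intermediate buffer.
void scale_offset_fused(const std::vector<float>& x, float a, float b,
                        std::vector<float>& out) {
    assert(out.size() == x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = a * x[i] + b;
}
```

Both functions compute the same values. The fused form saves the extra pass and the temporary storage, which is roughly the saving you get from combining operators into one expression before calling run().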

HugoPhibbs commented on August 23, 2024

@cliffburdick OK. Please see #686; I'm getting weird results depending on whether I call run() immediately or defer it.
