Giter Club home page Giter Club logo

Comments (5)

manodeep avatar manodeep commented on September 2, 2024 1

Glad the comments are (potentially) helpful :)

Good luck with the profiling - hopefully, that uncovers some easy-to-fix bottlenecks

from suave.

kstoreyf avatar kstoreyf commented on September 2, 2024

hi @manodeep, apologies for the delay here i lost track of this! this is very helpful feedback, thanks so much; all makes sense on a first read.

i just set up some basic speed and accuracy tests, so that when i go to address these comments I can see the difference. the scripts for those are here - i started with the most basic test of pair counting vs my estimator's version of it, will add more.

i will also check out the profiler to figure out where the current bottleneck is, and see how fixing these helps. will update here how it goes.

finally just a heads up that i'm about to change the name of my version of the repo/package!

from suave.

manodeep avatar manodeep commented on September 2, 2024

Meant to add earlier - will you please see if this numpy trick works for the brute-force calculation that you have? Presumably, you will need to adapt for the projection but may be some clever formulation will work for your case

from suave.

kstoreyf avatar kstoreyf commented on September 2, 2024

hi @manodeep, i've been trying to get that numpy trick to work - thanks for pointing me to it, the brute force calculation is hugely limiting right now. i can use the trick to compute the distances for all of the pairs, which works well. however I can't figure out how to adapt it to my formulation without hitting memory limits.

from what i understand, the trick constructs an NxN matrix of distances between the pairs, where N is the number of data points, like so:
dists = np.sqrt( -2 * np.dot(data, data.T) + np.sum(data**2, axis=1) + np.sum(data**2, axis=1)[:, np.newaxis] )
this is very fast but already seems a bit wild memory-wise so maybe i'm missing something?

anyway, then for my estimator, i need to for each pair distance return a (k,k) matrix (where k is usually 5-20, e.g. the number of bins), and then sum up these matrices over all the pairs. i can't figure out how to do this in a vectorized way, short of computing a (N,N,k,k) matrix which is ridiculous memory-wise. (more specifically, i compute a k-vector for each pair, and the (k,k) matrix is the outer product of the k-vector with itself. maybe there's a trick with this?)

i'll keep working at it but any advice appreciated!

from suave.

manodeep avatar manodeep commented on September 2, 2024

How about working in fixed chunks of 8/16/32 of N, N (say 8, 8 or 32, 32) and then generating the k, k matrix (so the output matrix would be [k, k, 8, 8] or [k, k, 32, 32]) and then summing over the last two dimensions. (Could be important to make the last two indices for the chunking, but I am not sure. Worth checking if the performance is better when the calculation is performed as [N, N, k, k] vs [k, k, N, N])

Stepwise:

  • create the k, k matrix before all loops and zero out the matrix
  • create a double for loop over i, j in chunks of 8 (or something small)
  • calculate the [k, k, 8, 8] matrix for each chunk of 8, 8 in the vectorized way
  • sum the last two dimensions and add to the matrix you created out of the loop
  • account for the remaining pairs that are not evaluated (because of the chunks)

Would that work?

from suave.

Related Issues (4)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.