Leonardo dos Santos to start work on this

Improve the speed of the column-by-column background subtraction (eureka, open issue)

kevin218 commented on June 11, 2024

improve the speed of the column-by-column background subtraction

from eureka.

Comments (10)

kevin218 commented on June 11, 2024

This sounds great! In place of editing the existing routine, I suggest copying the code to a new routine and giving users the option to use either one. This will also allow us to compare the effectiveness of each method.

ladsantos commented on June 11, 2024

> Thanks for looking into this, Leo, and too bad it's proving to be a bit tricky!
> One naive question: have you looked at using np.apply_along_axis()? I think this function will apply a function of your choice along a whole axis (e.g. all the columns) and that it's coded up efficiently so that it's a lot faster than a normal for loop. That might be able to remove at least one of the loops.
>
> If that doesn't work here, I'm happy to give some pointers on how to implement a C extension.

Oh, I didn't know about numpy.apply_along_axis(), I will give it a try and see what happens.

kevin218 commented on June 11, 2024

Good idea! Let's also add it to our C2 proposal.

ladsantos commented on June 11, 2024

I will take a look at this and see how to optimize it.

ladsantos commented on June 11, 2024

So, I did a brief test here, and I saw that a for-loop to fit ~1500 polynomials should take only a fraction of a second, so that is not the bottleneck. I think the bottleneck happens because we have a while-loop nested inside the for-loop, and a bunch of if-/try-statements to check for bad pixels. Python is sluggish with nested loops.

What I suggest is, instead of using this nested while-loop, we could simply assign weights of zero to the bad pixels when fitting the polynomials (it's the parameter w when calling numpy.polyfit). This should do the trick. I will work on implementing this.
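
A minimal sketch of the zero-weight idea, for illustration only. The function name `fit_column_background` and the synthetic column are assumptions, not Eureka code; the only library feature relied on is the `w` argument of `numpy.polyfit`. Zero weights only handle pixels that are already flagged, so this does not by itself perform sigma-clipping.

```python
import numpy as np


def fit_column_background(column, bad_mask, deg=1):
    """Fit a polynomial to one detector column, ignoring flagged pixels.

    Hypothetical helper: `bad_mask` is True where a pixel must be ignored.
    """
    y = np.arange(column.size, dtype=float)
    # A pixel is usable if it is not flagged and holds a finite value.
    usable = ~bad_mask & np.isfinite(column)
    # Weight 0 removes a pixel from the least-squares fit, weight 1 keeps it.
    weights = usable.astype(float)
    # Replace unusable values with 0 so that NaN * 0 cannot poison the fit.
    safe = np.where(usable, column, 0.0)
    coeffs = np.polyfit(y, safe, deg, w=weights)
    return np.polyval(coeffs, y)


# Synthetic column: a linear background plus two flagged outliers.
y = np.arange(64, dtype=float)
truth = 2.0 + 0.05 * y
column = truth.copy()
column[[5, 30]] += 500.0
bad = np.zeros(column.size, dtype=bool)
bad[[5, 30]] = True

model = fit_column_background(column, bad)
```

Because the flagged pixels carry zero weight, the fitted model follows the underlying linear background rather than being pulled toward the outliers.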

ladsantos commented on June 11, 2024

So, I have a bit of unsatisfying news. I realized that the while-loop is necessary for sigma-clipping, so it won't be trivial to remove the nested looping. I tried using Astropy's fitting modules with their implementation of sigma-clipping, and that was actually slower than the current implementation by a factor of 2-3.

I'm afraid that the code may already be at its limit for optimization, unless this background removal is implemented in a faster compiled language, like C or Fortran. Or, there could be a clever way of fitting all integrations at the same time somehow, perhaps by scaling and concatenating all of the arrays together... But that would require "open-heart surgery" on the code, particularly in how util.BGsubtraction works.
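
One possible reading of "fitting everything at the same time" is sketched below: solve the weighted normal equations for every column in a single batched call and apply the sigma-clipping to the whole array at each iteration, so the only Python loop left is the short clipping loop. This is a sketch under stated assumptions, not how util.BGsubtraction works; the function name `fit_bg_all_columns`, its parameters, and the synthetic frame are all hypothetical. It also assumes every column keeps at least `deg + 1` usable pixels, otherwise `numpy.linalg.solve` raises an error.

```python
import numpy as np


def fit_bg_all_columns(frame, mask, deg=1, sigma=5.0, max_iter=5):
    """Sigma-clipped polynomial fit of all columns of a 2D frame at once.

    frame: (ny, nx) array; mask: (ny, nx) bool, True = background pixel.
    Returns the background model (ny, nx) and the final boolean mask.
    """
    ny, nx = frame.shape
    y = np.linspace(-1.0, 1.0, ny)          # scaled coordinate for conditioning
    A = np.vander(y, deg + 1)               # design matrix, shape (ny, deg + 1)
    good = mask & np.isfinite(frame)
    data = np.where(good, frame, 0.0)       # neutralize NaNs and masked pixels
    for _ in range(max_iter):
        w = good.astype(float)
        # Per-column normal equations: (A^T W A) c = A^T W d
        ata = np.einsum('yi,yx,yj->xij', A, w, A)    # (nx, deg+1, deg+1)
        atd = np.einsum('yi,yx->xi', A, w * data)    # (nx, deg+1)
        coeffs = np.linalg.solve(ata, atd[..., None])[..., 0]
        model = A @ coeffs.T                         # (ny, nx)
        resid = np.where(good, data - model, np.nan)
        std = np.nanstd(resid, axis=0)               # one value per column
        new_good = good & (np.abs(data - model) <= sigma * std)
        if np.array_equal(new_good, good):
            break                                    # nothing new was clipped
        good = new_good
    return model, good


# Synthetic frame: a different linear background per column, noise, 2 outliers.
rng = np.random.default_rng(0)
ny, nx = 64, 32
rows = np.arange(ny, dtype=float)[:, None]
truth = 10.0 + 0.02 * rows * np.arange(1, nx + 1)[None, :]
frame = truth + rng.normal(0.0, 0.1, size=(ny, nx))
frame[10, 3] += 50.0
frame[40, 20] += 50.0

model, good = fit_bg_all_columns(frame, np.ones((ny, nx), dtype=bool))
```

A stack of integrations with shape (n_int, ny, nx) could in principle be rearranged to (ny, n_int * nx) before calling such a routine, which is roughly the "concatenating all of the arrays together" idea; whether that is faster in practice would need to be measured.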

I can push the implementation in Astropy if you want, by the way. But it does not solve the issue 😬

lkreidberg commented on June 11, 2024

Thanks for looking into this, Leo, and too bad it's proving to be a bit tricky!
One naive question: have you looked at using np.apply_along_axis()? I think this function will apply a function of your choice along a whole axis (e.g. all the columns) and that it's coded up efficiently so that it's a lot faster than a normal for loop. That might be able to remove at least one of the loops.

If that doesn't work here, I'm happy to give some pointers on how to implement a C extension.
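
For reference, a small sketch of what the np.apply_along_axis() suggestion could look like. The helper `fit_column` and the synthetic frame are assumptions for illustration, not Eureka code. One caveat worth keeping in mind: as far as I know, np.apply_along_axis() still calls the Python function once per column internally, so it mainly tidies the code and any speed-up over an explicit for-loop may be small.

```python
import numpy as np


def fit_column(column, deg=1):
    """Fit a polynomial to the finite pixels of one column (hypothetical helper)."""
    y = np.arange(column.size, dtype=float)
    ok = np.isfinite(column)
    coeffs = np.polyfit(y[ok], column[ok], deg)
    return np.polyval(coeffs, y)


# Synthetic frame: linear background along each column, with one NaN pixel.
ny, nx = 32, 8
y = np.arange(ny, dtype=float)[:, None]
truth = 1.0 + 0.1 * y + np.arange(nx)[None, :]
frame = truth.copy()
frame[5, 2] = np.nan

# axis=0 hands each column (a 1D array of length ny) to fit_column.
bg_model = np.apply_along_axis(fit_column, 0, frame)
```

The result has the same shape as the input frame, with each column replaced by its fitted background.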

taylorbell57 commented on June 11, 2024

Honestly, I haven't personally found the time spent on background subtraction to be prohibitive at this point. Do we still want to look into improving the runtime, or can we close this issue?

kevin218 commented on June 11, 2024

It's the slowest step within S3, and I have thoughts on how to speed it up for quick and dirty analyses, but it's also low on the priority list until we get real data.

taylorbell57 commented on June 11, 2024

One thing I don't think we've tried for this is using numba's jit decorator to allow for just-in-time compiled functions. I can't remember what all we use inside the relevant functions, but it's entirely possible that we could see some significant speed improvements by just adding @jit before some key lower-level functions.
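
A rough sketch of the numba idea, under several assumptions: the function `clipped_column_means` is hypothetical and only computes a sigma-clipped mean per column (a degree-0 background), as a simplified stand-in for the real routine. It is written with plain loops because numba compiles those well, whereas not every NumPy function is supported inside compiled code. The fallback decorator only exists so the sketch still runs where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    def njit(func):
        """No-op stand-in so the sketch runs without numba (no speed-up)."""
        return func


@njit
def clipped_column_means(frame, sigma, max_iter):
    """Sigma-clipped mean of each column, using explicit nested loops."""
    ny, nx = frame.shape
    out = np.empty(nx)
    for x in range(nx):
        good = np.ones(ny, dtype=np.bool_)
        mean = 0.0
        for _ in range(max_iter):
            # Mean of the pixels that are still accepted.
            n = 0
            total = 0.0
            for y in range(ny):
                if good[y]:
                    total += frame[y, x]
                    n += 1
            mean = total / n
            # Standard deviation of the accepted pixels.
            var = 0.0
            for y in range(ny):
                if good[y]:
                    var += (frame[y, x] - mean) ** 2
            std = np.sqrt(var / n)
            # Reject pixels further than sigma * std from the mean.
            changed = False
            for y in range(ny):
                if good[y] and abs(frame[y, x] - mean) > sigma * std:
                    good[y] = False
                    changed = True
            if not changed:
                break
        out[x] = mean
    return out


# Three constant columns (1, 2, 3) with one strong outlier in the last one.
frame = np.tile(np.array([1.0, 2.0, 3.0]), (50, 1))
frame[7, 2] += 100.0
means = clipped_column_means(frame, 3.0, 5)
```

Whether this kind of change helps the actual background subtraction depends on what the relevant functions call internally, so it would need to be benchmarked against the existing code.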
