Comments (5)
Hi,
Yes, I agree. I was already checking for a known memory leak in scipy.signal.convolve, but I couldn't find one.
I'll try to provide example code asap.
Cheers.
from kdepy.
Thanks for letting me know about this, @tglauch. I will look at it when time permits. I have to admit I am not sure why it happens. Are you able to post code reproducing the problem, e.g. using random data and a seed?
Seems to be related to the convolution in scipy.signal.convolve, based on the traceback.
Here we go with a snippet:
import numpy as np
import itertools
from KDEpy import FFTKDE

@profile  # injected by memory_profiler when run via `python -m memory_profiler`
def run_code():
    nbins_x, nbins_y, nbins_z, nbins_z2 = 120, 30, 100, 200
    bins_x= np.linspace(0,1.0,nbins_x, dtype = np.float32)
    bins_y = np.linspace(0,1.0, nbins_y, dtype = np.float32)
    bins_z = np.linspace(0,1.0,nbins_z, dtype = np.float32)
    bins_z2 = np.linspace(0,1.0, nbins_z2, dtype = np.float32)

    np.random.seed(1)
    x = np.random.random(int(3e6))
    w = np.random.random(int(3e6))
    #Generate KDE1
    inp_data = np.column_stack((x, x, x, x))

    print('start KDE1')
    KDE = FFTKDE(kernel='gaussian', bw=0.1).fit(inp_data, w)
    print('evaluate bins')
    bins1 = np.array(list(itertools.product(bins_x, bins_y, bins_z, bins_z2)))
    y = KDE.evaluate(bins1)

if __name__ == '__main__':
    run_code()
Slightly changing the binning takes the memory increment of KDE.evaluate from about 2 GB (nbins_x = 110)
Line #      Mem usage      Increment   Line Contents
================================================
     5     42.320 MiB     42.320 MiB   @profile
     6                                 def run_code():
     7     42.320 MiB      0.000 MiB       nbins_x, nbins_y, nbins_z, nbins_z2 = 110, 30, 100, 200
     8     42.320 MiB      0.000 MiB       bins_x= np.linspace(0,1.0,nbins_x, dtype = np.float32)
     9     42.320 MiB      0.000 MiB       bins_y = np.linspace(0,1.0, nbins_y, dtype = np.float32)
    10     42.320 MiB      0.000 MiB       bins_z = np.linspace(0,1.0,nbins_z, dtype = np.float32)
    11     42.320 MiB      0.000 MiB       bins_z2 = np.linspace(0,1.0, nbins_z2, dtype = np.float32)
    12
    13     42.320 MiB      0.000 MiB       np.random.seed(1)
    14     65.293 MiB     22.973 MiB       x = np.random.random(int(3e6))
    15     88.090 MiB     22.797 MiB       w = np.random.random(int(3e6))
    16                                     #Generate KDE1
    17    179.719 MiB     91.629 MiB       inp_data = np.column_stack((x, x, x, x))
    18
    19    179.719 MiB      0.000 MiB       print('start KDE1')
    20    205.508 MiB     25.789 MiB       KDE = FFTKDE(kernel='gaussian', bw=0.1).fit(inp_data, w)
    21    205.508 MiB      0.000 MiB       print('evaluate bins')
    22   1226.445 MiB   1020.938 MiB       bins1 = np.array(list(itertools.product(bins_x, bins_y, bins_z, bins_z2)))
    23   3515.254 MiB   2288.809 MiB       y = KDE.evaluate(bins1)
to about 52 GB (nbins_x = 120)
Line #      Mem usage      Increment   Line Contents
================================================
     5     42.320 MiB     42.320 MiB   @profile
     6                                 def run_code():
     7     42.320 MiB      0.000 MiB       nbins_x, nbins_y, nbins_z, nbins_z2 = 120, 30, 100, 200
     8     42.320 MiB      0.000 MiB       bins_x= np.linspace(0,1.0,nbins_x, dtype = np.float32)
     9     42.320 MiB      0.000 MiB       bins_y = np.linspace(0,1.0, nbins_y, dtype = np.float32)
    10     42.320 MiB      0.000 MiB       bins_z = np.linspace(0,1.0,nbins_z, dtype = np.float32)
    11     42.320 MiB      0.000 MiB       bins_z2 = np.linspace(0,1.0, nbins_z2, dtype = np.float32)
    12
    13     42.320 MiB      0.000 MiB       np.random.seed(1)
    14     65.355 MiB     23.035 MiB       x = np.random.random(int(3e6))
    15     88.086 MiB     22.730 MiB       w = np.random.random(int(3e6))
    16                                     #Generate KDE1
    17    179.781 MiB     91.695 MiB       inp_data = np.column_stack((x, x, x, x))
    18
    19    179.781 MiB      0.000 MiB       print('start KDE1')
    20    205.637 MiB     25.855 MiB       KDE = FFTKDE(kernel='gaussian', bw=0.1).fit(inp_data, w)
    21    205.637 MiB      0.000 MiB       print('evaluate bins')
    22   1318.488 MiB   1112.852 MiB       bins1 = np.array(list(itertools.product(bins_x, bins_y, bins_z, bins_z2)))
    23  53852.961 MiB  52534.473 MiB       y = KDE.evaluate(bins1)
Let me add:
numpy version is 1.15.4
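For scale, here is a rough size check of the evaluation grid in the two runs. It is only a sketch: it assumes bins1 ends up as an (N, 4) array of float32 values (4 bytes each), and the helper name grid_stats is made up for illustration.

```python
from math import prod


def grid_stats(bins, columns=4, bytes_per_value=4):
    """Return (number of grid points, size in MiB of an (N, columns) array).

    Assumes one float32 (4 bytes) per coordinate, as in the snippet's bins.
    """
    points = prod(bins)
    mib = points * columns * bytes_per_value / 2**20
    return points, mib


if __name__ == "__main__":
    for bins in [(110, 30, 100, 200), (120, 30, 100, 200)]:
        points, mib = grid_stats(bins)
        print(bins, points, f"{mib:.2f} MiB")
```

This gives 66 million points (about 1007 MiB) and 72 million points (about 1099 MiB), in line with the 1020.938 MiB and 1112.852 MiB increments on line 22 of the two runs. So the grid grows by only about 9%, while the increment for KDE.evaluate grows roughly 23-fold.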
Hi again @tglauch. I'm afraid I won't be of much help regarding this. Running the code snippet you provided grinds my desktop computer to a halt, so it's a difficult problem to debug. Again, I believe the problem is related to scipy.signal.convolve, based on the traceback.
Ideas you could try for debugging:
- Check if the memory jump still happens when the bin counts in the other dimensions are decreased.
- Check if the memory jump still happens when the shape is permuted, e.g. (a, b, c, d) -> (b, c, d, a).
- Check if the memory jump can be reproduced when convolving a random 4D array of the same shape as in the provided example. In other words, convolving a 4D array of shape (120, 30, 100, 200) with a smaller 4D kernel. This is essentially what FFTKDE does in the line ans = convolve(data, kernel_weights, mode='same').reshape(-1, 1).
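The last idea could be sketched roughly as below. This is an untested-at-scale sketch, not the FFTKDE code path: the helper name peak_mib_for_convolve and the (3, 3, 3, 3) kernel are made up for illustration, and the shapes are scaled down by a factor of 10 per dimension so it runs anywhere. The peak comes from tracemalloc, which may miss allocations made outside Python's and NumPy's allocators, so treat it as a lower bound.

```python
import tracemalloc

import numpy as np
from scipy.signal import convolve


def peak_mib_for_convolve(shape, kernel_shape=(3, 3, 3, 3), seed=1):
    """Convolve a random array of `shape` with a small random kernel.

    Returns the shape of the flattened result and the peak traced
    memory (MiB) during the convolution itself.
    """
    rng = np.random.RandomState(seed)
    data = rng.random_sample(shape)
    kernel = rng.random_sample(kernel_shape)

    tracemalloc.start()
    ans = convolve(data, kernel, mode="same").reshape(-1, 1)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return ans.shape, peak / 2**20


if __name__ == "__main__":
    # Scaled-down stand-ins for (110, 30, 100, 200), (120, 30, 100, 200)
    # and a permuted version of the latter.
    for shape in [(11, 3, 10, 20), (12, 3, 10, 20), (3, 10, 20, 12)]:
        out_shape, peak = peak_mib_for_convolve(shape)
        print(shape, out_shape, f"{peak:.2f} MiB")
```

If the jump shows up here once the shapes are scaled back up, the problem can be reported against scipy with no KDEpy dependency at all.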
An idea for working around this:
import numpy as np

# Utility functions for automatic grids and binning
from KDEpy.utils import autogrid
from KDEpy.binning import linear_binning

def run_code():
    # Bins in each dimension, 50 * 30 * 80 * 100 = 12 million grid points
    nbins_x, nbins_y, nbins_z, nbins_z2 = 50, 30, 80, 100

    # Generate random data and weights
    np.random.seed(1)
    x = np.random.random(int(3e6))
    w = np.random.random(int(3e6))
    inp_data = np.column_stack((x, x, x, x))

    # Create a grid. Maximum of relative and absolute boundary limits is used
    # Here the grid will go from -0.5 to 1.5, not -0.05 to 1.05
    grid = autogrid(inp_data, boundary_abs=0.5,
                    num_points=(nbins_x, nbins_y, nbins_z, nbins_z2),
                    boundary_rel=0.05)
    print(grid[0, :])  # Check boundaries
    print(grid.shape)

    # Bin the data on the grid points, the weights are used
    data_vals = linear_binning(inp_data, grid, weights=w)

    # At this point, `data_vals` is a linear approximation to your data, at the points `grid`.
    # You can try to convolve this 4D weighted compression of your data with
    # a kernel, or feed this to a different algorithm, but it must handle weighted data.

if __name__ == '__main__':
    run_code()
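One possible way to do the convolution step without building a 4D kernel is to use the fact that an isotropic Gaussian is separable, so it can be applied as four 1D convolutions, one per axis. The sketch below is not what FFTKDE does internally. The function names are made up, a random array stands in for the binned data, and it assumes `data_vals` can be reshaped to the grid's (nbins_x, nbins_y, nbins_z, nbins_z2) layout, which has not been checked against KDEpy.

```python
import numpy as np
from scipy.ndimage import convolve1d


def gaussian_kernel_1d(bw, dx, cutoff=3.0):
    """Gaussian weights sampled every `dx`, truncated at `cutoff` * bw."""
    half = int(np.ceil(cutoff * bw / dx))
    offsets = np.arange(-half, half + 1) * dx
    weights = np.exp(-0.5 * (offsets / bw) ** 2)
    return weights / weights.sum()  # weights sum to 1


def separable_gaussian_smooth(values, bw, spacings):
    """Smooth an N-D array with a Gaussian, one axis at a time."""
    out = np.asarray(values, dtype=np.float64)
    for axis, dx in enumerate(spacings):
        kernel = gaussian_kernel_1d(bw, dx)
        # Zero padding outside the grid, similar in spirit to mode='same'
        out = convolve1d(out, kernel, axis=axis, mode="constant", cval=0.0)
    return out


if __name__ == "__main__":
    shape = (10, 6, 8, 10)  # small stand-in for (50, 30, 80, 100)
    rng = np.random.RandomState(1)
    binned = rng.random_sample(shape)  # stand-in for the reshaped data_vals
    # Grid spans -0.5 to 1.5 in every dimension, as in the snippet above
    spacings = [2.0 / (n - 1) for n in shape]
    smoothed = separable_gaussian_smooth(binned, bw=0.1, spacings=spacings)
    print(smoothed.shape)
```

Each pass only allocates one extra array of the grid's size. Since the 1D weights sum to one, the result would still need to be rescaled (for example by the grid cell volume) before it can be read as a density.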