Memory error for large arrays (KDEpy issue, open)

tommyod commented on June 15, 2024

Memory error for large arrays

from kdepy.

Comments (5)

tglauch commented on June 15, 2024

Hi,
yes, I agree. I had already checked for a known memory leak in scipy.signal.convolve, but I couldn't find one.

I'll try to provide example code ASAP.

Cheers.

tommyod commented on June 15, 2024

Thanks for letting me know about this, @tglauch. I will look at it when time permits. I have to admit I am not sure why it happens. Are you able to post code that reproduces the problem, e.g. using random data and a seed?

tommyod commented on June 15, 2024

Based on the traceback, this seems to be related to the convolution in scipy.signal.convolve.

tglauch commented on June 15, 2024

Here we go with a snippet that reproduces it:

import numpy as np
import itertools
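from KDEpy import FFTKDE
from memory_profiler import profile  # provides @profile; alternatively run with python -m memory_profiler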
@profile
def run_code():
    nbins_x, nbins_y, nbins_z, nbins_z2 = 120, 30, 100, 200
    bins_x= np.linspace(0,1.0,nbins_x, dtype = np.float32)
    bins_y = np.linspace(0,1.0, nbins_y, dtype = np.float32)
    bins_z = np.linspace(0,1.0,nbins_z, dtype = np.float32)
    bins_z2 = np.linspace(0,1.0, nbins_z2, dtype = np.float32)

    np.random.seed(1)
    x = np.random.random(int(3e6))
    w = np.random.random(int(3e6))
    #Generate KDE1
    inp_data = np.column_stack((x, x, x, x))

    print('start KDE1')
    KDE = FFTKDE(kernel='gaussian', bw=0.1).fit(inp_data, w)
    print('evaluate bins')
    bins1 = np.array(list(itertools.product(bins_x, bins_y, bins_z, bins_z2)))
    y = KDE.evaluate(bins1)

if __name__ == '__main__':
    run_code()
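A side note on the snippet itself: np.array(list(itertools.product(...))) first builds a Python list of 72 million tuples before NumPy copies it into an array. memory_profiler reports memory after each line has run, so that transient list may not show up in the per-line numbers. Below is a minimal sketch of a replacement that fills the (N, 4) grid in place with broadcasting and keeps the same row order (last axis varying fastest). product_grid is a hypothetical helper, not part of KDEpy, and it does not address the memory jump inside KDE.evaluate.

```python
import itertools

import numpy as np


def product_grid(*axes):
    """Return the Cartesian product of 1D arrays as an (N, D) array.

    Rows come out in the same order as itertools.product (last axis varies
    fastest), but the result is filled in place with broadcasting, so no
    intermediate Python list of N tuples is ever built.
    """
    axes = [np.asarray(a) for a in axes]
    lengths = [a.size for a in axes]
    out = np.empty(lengths + [len(axes)], dtype=np.result_type(*axes))
    for i, a in enumerate(axes):
        shape = [1] * len(axes)
        shape[i] = a.size
        # Broadcast axis i across the whole grid and write it into column i.
        out[..., i] = a.reshape(shape)
    return out.reshape(-1, len(axes))


if __name__ == "__main__":
    bins = [np.linspace(0, 1.0, n, dtype=np.float32) for n in (4, 3, 5, 2)]
    fast = product_grid(*bins)
    slow = np.array(list(itertools.product(*bins)))
    print(fast.shape, fast.dtype, np.array_equal(fast, slow))
```

In the snippet above this would read bins1 = product_grid(bins_x, bins_y, bins_z, bins_z2). Only the final float32 array is allocated.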

Changing the binning only slightly takes the memory increment of KDE.evaluate from about 2,300 MiB (nbins_x = 110):

Line #    Mem usage    Increment   Line Contents
================================================
     5   42.320 MiB   42.320 MiB   @profile
     6                             def run_code():
     7   42.320 MiB    0.000 MiB       nbins_x, nbins_y, nbins_z, nbins_z2 = 110, 30, 100, 200
     8   42.320 MiB    0.000 MiB       bins_x= np.linspace(0,1.0,nbins_x, dtype = np.float32)
     9   42.320 MiB    0.000 MiB       bins_y = np.linspace(0,1.0, nbins_y, dtype = np.float32)
    10   42.320 MiB    0.000 MiB       bins_z = np.linspace(0,1.0,nbins_z, dtype = np.float32)
    11   42.320 MiB    0.000 MiB       bins_z2 = np.linspace(0,1.0, nbins_z2, dtype = np.float32)
    12                             
    13   42.320 MiB    0.000 MiB       np.random.seed(1)
    14   65.293 MiB   22.973 MiB       x = np.random.random(int(3e6))
    15   88.090 MiB   22.797 MiB       w = np.random.random(int(3e6))
    16                                 #Generate KDE1
    17  179.719 MiB   91.629 MiB       inp_data = np.column_stack((x, x, x, x))
    18                             
    19  179.719 MiB    0.000 MiB       print('start KDE1')
    20  205.508 MiB   25.789 MiB       KDE = FFTKDE(kernel='gaussian', bw=0.1).fit(inp_data, w)
    21  205.508 MiB    0.000 MiB       print('evaluate bins')
    22 1226.445 MiB 1020.938 MiB       bins1 = np.array(list(itertools.product(bins_x, bins_y, bins_z, bins_z2)))
    23 3515.254 MiB 2288.809 MiB       y = KDE.evaluate(bins1)

to about 52,500 MiB (nbins_x = 120):

Line #    Mem usage    Increment   Line Contents
================================================
     5   42.320 MiB   42.320 MiB   @profile
     6                             def run_code():
     7   42.320 MiB    0.000 MiB       nbins_x, nbins_y, nbins_z, nbins_z2 = 120, 30, 100, 200
     8   42.320 MiB    0.000 MiB       bins_x= np.linspace(0,1.0,nbins_x, dtype = np.float32)
     9   42.320 MiB    0.000 MiB       bins_y = np.linspace(0,1.0, nbins_y, dtype = np.float32)
    10   42.320 MiB    0.000 MiB       bins_z = np.linspace(0,1.0,nbins_z, dtype = np.float32)
    11   42.320 MiB    0.000 MiB       bins_z2 = np.linspace(0,1.0, nbins_z2, dtype = np.float32)
    12                             
    13   42.320 MiB    0.000 MiB       np.random.seed(1)
    14   65.355 MiB   23.035 MiB       x = np.random.random(int(3e6))
    15   88.086 MiB   22.730 MiB       w = np.random.random(int(3e6))
    16                                 #Generate KDE1
    17  179.781 MiB   91.695 MiB       inp_data = np.column_stack((x, x, x, x))
    18                             
    19  179.781 MiB    0.000 MiB       print('start KDE1')
    20  205.637 MiB   25.855 MiB       KDE = FFTKDE(kernel='gaussian', bw=0.1).fit(inp_data, w)
    21  205.637 MiB    0.000 MiB       print('evaluate bins')
    22 1318.488 MiB 1112.852 MiB       bins1 = np.array(list(itertools.product(bins_x, bins_y, bins_z, bins_z2)))
    23 53852.961 MiB 52534.473 MiB       y = KDE.evaluate(bins1)
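A quick size check helps separate the expected memory from the unexpected. The sketch below computes how much a float32 (N, 4) grid should take and compares it with the increments copied from the two profiles above. grid_nbytes is a hypothetical helper for illustration only.

```python
import math

MIB = 2**20


def grid_nbytes(bin_counts, itemsize=4):
    """Bytes for an (N, D) array of all grid points, N = product of bin counts.

    itemsize=4 matches the float32 bins used in the snippet.
    """
    return math.prod(bin_counts) * len(bin_counts) * itemsize


# Increments (MiB) reported by memory_profiler in the two runs above:
# (bins1 line, KDE.evaluate line)
small, large = (110, 30, 100, 200), (120, 30, 100, 200)
observed = {small: (1020.938, 2288.809), large: (1112.852, 52534.473)}

for counts, (bins_mib, eval_mib) in observed.items():
    print(counts,
          f"expected bins1: {grid_nbytes(counts) / MIB:.0f} MiB,",
          f"observed bins1: {bins_mib:.0f} MiB, evaluate: {eval_mib:.0f} MiB")

print("grid points grow by", math.prod(large) / math.prod(small))
print("evaluate increment grows by", observed[large][1] / observed[small][1])
```

The bins1 line matches the arithmetic (about 1,007 MiB and 1,099 MiB expected versus 1,021 MiB and 1,113 MiB observed). The grid, however, only gets about 9% larger, while the KDE.evaluate increment grows roughly 23 times. That points at a jump inside the evaluation rather than at the size of the grid itself.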

I should also add: the NumPy version is 1.15.4.

tommyod commented on June 15, 2024

Hi again, @tglauch. I'm afraid I won't be of much help here. Running the code snippet you provided grinds my desktop computer to a halt, so it's a difficult problem to debug. Again, based on the traceback, I believe the problem is related to scipy.signal.convolve.

Ideas you could try for debugging:

- Check whether the memory jump still happens when the bin counts in the other dimensions are decreased.
- Check whether the memory jump still happens when the shape is permuted, e.g. (a, b, c, d) -> (b, c, d, a).
- Check whether the memory jump can be reproduced by convolving a random 4D array of the same shape as in the provided example, i.e. convolving a 4D array of shape (120, 30, 100, 200) with a smaller 4D kernel. This is essentially what FFTKDE does in the line ans = convolve(data, kernel_weights, mode='same').reshape(-1, 1).
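The three checks above can be scripted so that one measurement is repeated across shapes. This is a minimal sketch, assuming NumPy and SciPy are installed (KDEpy already needs both). probe_convolve is a hypothetical helper, and the kernel shape is a placeholder because the real shape of FFTKDE's kernel_weights is not given in the thread. It reports both the memory retained after the call (where a leak would show) and the peak during the call (where a transient spike would show). tracemalloc only sees allocations reported to it, so its numbers are a lower bound.

```python
import tracemalloc

import numpy as np
from scipy.signal import choose_conv_method, convolve


def probe_convolve(data_shape, kernel_shape, seed=1):
    """Convolve a random array with a random kernel (mode='same'), report memory.

    Returns (method, retained_mib, peak_mib):
      method: what method='auto' resolves to for these inputs ('fft' or 'direct')
      retained_mib: traced memory still allocated after the result is dropped
      peak_mib: peak traced memory during the call
    """
    rng = np.random.RandomState(seed)  # RandomState also works on NumPy 1.15
    data = rng.random_sample(data_shape)
    kernel = rng.random_sample(kernel_shape)
    method = choose_conv_method(data, kernel, mode="same")

    tracemalloc.start()
    try:
        out = convolve(data, kernel, mode="same")  # same call style as in FFTKDE
        del out  # drop the result so "retained" only counts leftovers
        retained, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return method, retained / 2**20, peak / 2**20


if __name__ == "__main__":
    kernel = (3, 3, 3, 3)  # placeholder kernel shape
    # Start small and scale up gradually; (12, 6, 10, 20) and a permutation of it.
    for shape in [(12, 6, 10, 20), (6, 10, 20, 12)]:
        print(shape, probe_convolve(shape, kernel))
```

Scaling the shapes step by step toward (120, 30, 100, 200) and watching where method flips or peak_mib jumps would narrow down the trigger without grinding the machine to a halt on the first run.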

An idea for working around this:

import numpy as np

# Utility functions for automatic grids and binning
from KDEpy.binning import linear_binning
from KDEpy.utils import autogrid


def run_code():
    # Bins in each dimension, 50 * 30 * 80 * 100 = 12 million grid points
    nbins_x, nbins_y, nbins_z, nbins_z2 = 50, 30, 80, 100

    # Generate random data and weights
    np.random.seed(1)
    x = np.random.random(int(3e6))
    w = np.random.random(int(3e6))
    inp_data = np.column_stack((x, x, x, x))

    # Create a grid. The larger of the relative and absolute boundary margins is used,
    # so here the grid goes from -0.5 to 1.5, not from -0.05 to 1.05
    grid = autogrid(inp_data, boundary_abs=0.5,
                    num_points=(nbins_x, nbins_y, nbins_z, nbins_z2),
                    boundary_rel=0.05)

    print(grid[0, :])  # Check boundaries
    print(grid.shape)

    # Bin the data on the grid points, using the weights
    data_vals = linear_binning(inp_data, grid, weights=w)

    # At this point, `data_vals` is a linear approximation to your data at the points `grid`.
    # You can try to convolve this 4D weighted compression of your data with
    # a kernel, or feed it to a different algorithm, but that algorithm must handle weighted data.


if __name__ == '__main__':
    run_code()
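The closing comment suggests convolving data_vals with a kernel. For a Gaussian kernel, one lower-memory option is to smooth axis by axis, because a Gaussian with a diagonal bandwidth factorizes over dimensions. This is a swapped-in technique, not what FFTKDE does as far as this thread shows. The sketch below is NumPy-only. separable_gaussian_smooth and gaussian_kernel_1d are hypothetical helpers. It assumes that data_vals reshapes to (50, 30, 80, 100) in C order (last axis fastest) and that bw is the kernel's standard deviation, so sigma in grid steps would be bw divided by the grid spacing, e.g. 0.1 / (2.0 / 49) ≈ 2.45 along the first axis of the example grid. Both are assumptions worth verifying.

```python
import numpy as np


def gaussian_kernel_1d(sigma, truncate=4.0):
    """Normalized 1D Gaussian on integer offsets; sigma is in grid steps."""
    half = int(np.ceil(truncate * sigma))
    offsets = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (offsets / sigma) ** 2)
    return k / k.sum()


def separable_gaussian_smooth(values, shape, sigmas, truncate=4.0):
    """Smooth binned weights on a regular grid, one axis at a time.

    values: flat array of binned weights, last axis varying fastest
    shape: number of grid points per axis
    sigmas: kernel standard deviation per axis, in grid steps
    A Gaussian factorizes over axes, so D passes of a 1D convolution (zero
    padding at the edges) match one D-dimensional convolution with the
    product kernel, without ever building a D-dimensional kernel.
    """
    arr = np.asarray(values, dtype=float).reshape(shape)
    for axis, sigma in enumerate(sigmas):
        k = gaussian_kernel_1d(sigma, truncate)
        if k.size > arr.shape[axis]:
            raise ValueError(f"kernel is wider than axis {axis}; use a smaller sigma")
        # np.convolve(v, k, mode="same") applied to every 1D slice along `axis`
        arr = np.apply_along_axis(np.convolve, axis, arr, k, mode="same")
    return arr.ravel()


if __name__ == "__main__":
    shape = (15, 15, 15, 15)
    values = np.zeros(np.prod(shape))
    values[np.ravel_multi_index((7, 7, 7, 7), shape)] = 1.0  # a single unit weight
    smoothed = separable_gaussian_smooth(values, shape, sigmas=(1.0,) * 4, truncate=3.0)
    print(smoothed.sum())  # mass is preserved when the kernel stays inside the grid
```

np.apply_along_axis loops in Python, so on 12 million points it will be slow; scipy.ndimage.gaussian_filter1d with mode='constant' would be a faster way to do each pass. To turn the result into density values it would still need dividing by the total weight and the grid cell volume, depending on the normalization you want.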
