Giter Club home page Giter Club logo

Comments (8)

ehpor avatar ehpor commented on September 15, 2024

Confirmed: switching the order of the two subtests makes the smaller focal length fail, not the larger. So this is not focal length related. Test output:

z2 [  0.   6.  12.  18.  24.  30.  36.  42.  48.  54.  60.  66.  72.  78.
  84.  90.  96. 102. 108. 114. 120.]
peak2 [Field(1.94038034e-05), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan)]
z [ 0.  3.  6.  9. 12. 15. 18. 21. 24. 27. 30. 33. 36. 39. 42. 45. 48. 51.
 54. 57. 60.]
peak [Field(1.94038034e-05), Field(3.33415885e-05), Field(4.23350558e-05), Field(5.55878397e-05), Field(0.0001673), Field(0.00027638), Field(0.00023232), Field(0.00084203), Field(0.00173035), Field(0.00345173), Field(0.08296981), Field(0.00397364), Field(0.00105164), Field(0.00078266), Field(0.00023865), Field(0.00030616), Field(0.00010211), Field(9.74986306e-05), Field(0.00010154), Field(4.61751568e-05), Field(2.95089006e-05)]

from hcipy.

ehpor avatar ehpor commented on September 15, 2024

May also fail with a FresnelPropagator instead of an AngularSpectrumPropagator. This bug however disappeared upon a second confirmation run. I'll have to pick this back up when it resurfaces.

from hcipy.

ehpor avatar ehpor commented on September 15, 2024

The bug reappeared a few CI runs later. Failed for AngularSpectrumPropagator but passed for FresnelPropagator in the same run. I'm gonna push a change to FresnelPropagator to master and rebase #115 to see if this fixes it. The underlying cause has not been found yet.

from hcipy.

ehpor avatar ehpor commented on September 15, 2024

Breakthrough: the NaNs appear right after taking the FFT of the input array inside the FourierFilter class. There are no NaNs in the input array to the FFT, while there are NaNs on the output array. The transfer function itself has no NaNs.

While the number of pixels in the input array is not standard (513x513px), this should never happen with a commercial library. This however explains the intermittent behaviour, and why this only happens on the first run of the test and not the second, independent of the actual math that is going on.

Logging operating system and build image for the Ubuntu system running the failed test:

Agent name: 'Azure Pipelines 4'
Agent machine name: 'fv-az105-922'
Current agent version: '2.195.0'
Operating System
Ubuntu
20.04.3
LTS
Virtual Environment
Environment: ubuntu-20.04
Version: 20211108.1
Included Software: https://github.com/actions/virtual-environments/blob/ubuntu20/20211108.1/images/linux/Ubuntu2004-README.md
Image Release: https://github.com/actions/virtual-environments/releases/tag/ubuntu20%2F20211108.1
Virtual Environment Provisioner
1.0.0.0-master-20211108-1
Current image version: '20211108.1'
Agent running as: 'vsts'
Prepare build directory.
Set build variables.
Download all required tasks.
Downloading task: Bash (3.189.0)
Downloading task: PublishTestResults (2.192.1)

Installed mkl versions:

  mkl                pkgs/main/linux-64::mkl-2021.4.0-h06a4308_640
  mkl-service        pkgs/main/linux-64::mkl-service-2.4.0-py39h7f8727e_0
  mkl_fft            pkgs/main/linux-64::mkl_fft-1.3.1-py39hd3c417c_0
  mkl_random         pkgs/main/linux-64::mkl_random-1.2.2-py39h51133e4_0

Output of the offending test.

##[warning]Nans detected after fftn
##[warning]Nans detected after multiplication with tf.
##[warning]Nans detected after ifftn
##[warning]Nans detected in f
##[warning]NaNs detected in peak: [Field(1.94038034e-05), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan), Field(nan)] for focal length 30.0.
##[warning]tf nans for 30.0? False.
##[warning]transfer function: [[1.        -2.94491137e-05j 0.99999996-2.65068558e-04j

Relevant part of the code in FourierFilter._operation():

# Don't overwrite f if it's the input array.
if _use_mkl:
    kwargs = {'overwrite_x': self.cutout is not None and field.grid.ndim > 1}
else:
    kwargs = {}

if not np.all(np.isfinite(f)):
    import warnings
    warnings.warn('Nans detected in f')

f = _fft_module.fftn(f, axes=tuple(range(-self.input_grid.ndim, 0)), **kwargs)

if not np.all(np.isfinite(f)):
    import warnings
    warnings.warn('Nans detected after fftn')

# Omitted code for getting tf in the right format.
f *= tf

if not np.all(np.isfinite(f)):
    import warnings
    warnings.warn('Nans detected after multiplication with tf.')

if _use_mkl:
    kwargs = {'overwrite_x': True}
else:
    kwargs = {}
f = _fft_module.ifftn(f, axes=tuple(range(-self.input_grid.ndim, 0)), **kwargs)

if not np.all(np.isfinite(f)):
    import warnings
    warnings.warn('Nans detected after ifftn')

from hcipy.

syhaffert avatar syhaffert commented on September 15, 2024

This is still super weird. I have never had this issue with python 3.8 on Ubuntu, or on my windows machines. I really don't understand what changed in python 3.9. It already happens after the first fftn. Could it be an error in the new build of mkl? Does this also happen with pyfftw?

from hcipy.

syhaffert avatar syhaffert commented on September 15, 2024

I just ran the test code 20 times or so and I passed it every time. I never get this error. I used:
Ubuntu 18.04.4 LTS
Python 3.9.4
Numpy 1.21.4 (standard linked with mkl).

No errors whatsoever, even after many reruns. Is it the CI machines?

from hcipy.

ehpor avatar ehpor commented on September 15, 2024

I'm like 95% sure this is mkl fft related, yes, combined with their machines. There should be no reason for an array with no NaNs to be generating NaNs after an FFT. However, I can't find any bug report on this at all from others. It might be related to the 513px, so I might change that to 512.

All builds on CI use mkl_fft. So I didn't check other FFT libraries (nor pocketfft that is built into numpy).

I printed MKL versions + Azure Ubuntu image info to check for any software related differences between working and non-working runs. But so far I haven't seen the mkl versions change. I haven't been able to reproduce this on my Ubuntu 20.04 machine with an identical conda env. I haven't tried the Azure dockerfile yet.

from hcipy.

ehpor avatar ehpor commented on September 15, 2024

I've been manually triggering more CI runs over the day. A few runs with 512px across and a few with 513px across. The 512px runs all passed (6/6 passed). The 513px runs pass intermittently (1/7 passed). I'm gonna make the change to 512px.

from hcipy.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.