Comments (22)

rmjarvis avatar rmjarvis commented on August 17, 2024

This is the test that reads in the output of the piffify command. So can you try running

piffify simple.yaml

in the tests directory? It should make a number of files in the output directory:

$ ls -l output/simple_psf*
-rw-rw-r--  1 Mike  staff  40320 Dec 15 20:53 output/simple_psf.fits
-rw-rw-r--  1 Mike  staff  28298 Dec 15 20:53 output/simple_psf_rhostats.pdf
-rw-rw-r--  1 Mike  staff  24448 Dec 15 20:53 output/simple_psf_shapestats.pdf
-rw-rw-r--@ 1 Mike  staff  21711 Dec 15 20:53 output/simple_psf_twodhiststats.pdf
-rw-rw-r--@ 1 Mike  staff  20445 Dec 15 20:53 output/simple_psf_twodhiststats_std.pdf


esheldon avatar esheldon commented on August 17, 2024
[esheldon@forest tests] piffify simple.yaml
Segmentation fault (core dumped)


rmjarvis avatar rmjarvis commented on August 17, 2024

Do piffify -v3 simple.yaml and let me know how far it gets before crashing.


esheldon avatar esheldon commented on August 17, 2024

here you go

Using config file simple.yaml
chipnums = None
image files = ['data/simple_image.fits']
cat files = ['data/simple_cat.fits']
Using default chipnums: range(0, 1)
Reading image file data/simple_image.fits
Making trivial (wt==1) weight image
Reading star catalog data/simple_cat.fits.
Removing objects with flag (col flag) != 0
Removing objects with use (col use) == 0
Making star list from catalog data/simple_cat.fits
Processing catalog data/simple_cat.fits with 7 stars
Read a total of 7 stars from 1 image
Parsing PSF based on config dict:
interp:
  type: Mean
model:
  fastfit: true
  include_pixel: false
  type: Gaussian

PSF type is SimplePSF
Building SimplePSF
Done building PSF
Initializing models
Segmentation fault (core dumped)


rmjarvis avatar rmjarvis commented on August 17, 2024

This looks like it's crashing the first time it tries to import galsim. I'm not sure why it would fail when doing so from piffify, but not when running the tests directly.

Maybe check that head $PREFIX/bin/piffify (or wherever piffify lives on your machine) shows it using the same python as what nosetests uses.
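The interpreter check suggested above can also be scripted. This is a minimal sketch, not part of Piff: the helper names `shebang_interpreter` and `same_interpreter` are made up for illustration, and it assumes the installed `piffify` is a plain script with a `#!` line.

```python
import os
import shutil
import sys


def shebang_interpreter(script_path):
    """Return the interpreter named on a script's '#!' line, or None."""
    with open(script_path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # Handle the '#!/usr/bin/env python' form by resolving the name on PATH.
    if os.path.basename(parts[0]) == "env" and len(parts) > 1:
        return shutil.which(parts[1])
    return parts[0]


def same_interpreter(script_path):
    """True if the script's shebang resolves to the running interpreter."""
    interp = shebang_interpreter(script_path)
    if interp is None:
        return False
    return os.path.realpath(interp) == os.path.realpath(sys.executable)
```

Run from the same python that nosetests uses, `same_interpreter(shutil.which("piffify"))` returning False would point to the two entry points using different installations.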

And if you're willing to investigate a bit, maybe try running it within gdb, so you can get a backtrace. Maybe that will give a better clue about the nature of the seg fault.
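As a lighter-weight complement to gdb, the standard-library `faulthandler` module can report where the Python code was when the seg fault hit. The sketch below is an assumption about how one might wire it in, not something Piff does; `enable_crash_traceback` is a hypothetical helper name. It only shows the Python-level stack, so the C++ frames still need gdb or valgrind.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Write a Python traceback to log_path if the process hits a fatal signal.

    The returned file object must stay open for as long as the handler is
    enabled, so the caller should keep a reference to it.
    """
    log_file = open(log_path, "w")
    # all_threads=True dumps every thread, not just the one that crashed.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

The same handler can be switched on without editing any code by starting the interpreter with `python -X faulthandler`, in which case the traceback goes to stderr.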


rmjarvis avatar rmjarvis commented on August 17, 2024

I just added some print statements on branch "#50". Could you try checking out this branch and running python test_simple.py in the tests directory? That might tell us more about why something different is happening in the two cases.


esheldon avatar esheldon commented on August 17, 2024
[esheldon@forest tests] piffify -v3 simple.yaml
Using config file simple.yaml
chipnums = None
image files = ['data/simple_image.fits']
cat files = ['data/simple_cat.fits']
Using default chipnums: range(0, 1)
Reading image file data/simple_image.fits
Making trivial (wt==1) weight image
Reading star catalog data/simple_cat.fits.
Removing objects with flag (col flag) != 0
Removing objects with use (col use) == 0
Making star list from catalog data/simple_cat.fits
Processing catalog data/simple_cat.fits with 7 stars
Read a total of 7 stars from 1 image
Parsing PSF based on config dict:
interp:
  type: Mean
model:
  fastfit: true
  include_pixel: false
  type: Gaussian

PSF type is SimplePSF
Building SimplePSF
Done building PSF
Initializing models
start with_hsm for  <piff.star.Star object at 0x7faeacf965c0>
imported galsim:  1.5 /home/esheldon/miniconda3/lib/python3.5/site-packages/galsim/__init__.py
Segmentation fault (core dumped)


esheldon avatar esheldon commented on August 17, 2024

Here is the tail end of valgrind

...snip...
Read a total of 7 stars from 1 image
Parsing PSF based on config dict:
interp:
  type: Mean
model:
  fastfit: true
  include_pixel: false
  type: Gaussian

PSF type is SimplePSF
Building SimplePSF
Done building PSF
Initializing models
==5985== Invalid read of size 4
==5985==    at 0x1A3726BB: galsim::BaseImage<float>::nonZeroBounds() const (in /home/esheldon/miniconda3/lib/libgalsim.so.1.5)
==5985==    by 0x1A4B33CB: galsim::ImageView<double> galsim::hsm::MakeMaskedImage<float>(galsim::ImageAlloc<double>&, galsim::BaseImage<float> const&, galsim::BaseImage<int> const&) (in /home/esheldon/miniconda3/lib/libgalsim.so.1.5)
==5985==    by 0x1A4B3989: galsim::hsm::CppShapeData galsim::hsm::FindAdaptiveMomView<float>(galsim::BaseImage<float> const&, galsim::BaseImage<int> const&, double, double, galsim::Position<double>, boost::shared_ptr<galsim::hsm::HSMParams>) (in /home/esheldon/miniconda3/lib/libgalsim.so.1.5)
==5985==    by 0x18D89943: boost::python::detail::caller_arity<6u>::impl<galsim::hsm::CppShapeData (*)(galsim::BaseImage<float> const&, galsim::BaseImage<int> const&, double, double, galsim::Position<double>, boost::shared_ptr<galsim::hsm::HSMParams>), boost::python::default_call_policies, boost::mpl::vector7<galsim::hsm::CppShapeData, galsim::BaseImage<float> const&, galsim::BaseImage<int> const&, double, double, galsim::Position<double>, boost::shared_ptr<galsim::hsm::HSMParams> > >::operator()(_object*, _object*) (in /home/esheldon/miniconda3/lib/python3.5/site-packages/galsim/_galsim.so)
==5985==    by 0x190C000C: boost::python::objects::function::call(_object*, _object*) const (in /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0)
==5985==    by 0x190C0207: ??? (in /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0)
==5985==    by 0x190C8052: boost::python::handle_exception_impl(boost::function0<void>) (in /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0)
==5985==    by 0x190BD408: ??? (in /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0)
==5985==    by 0x4EA1235: PyObject_Call (abstract.c:2165)
==5985==    by 0x4F7B313: do_call (ceval.c:4936)
==5985==    by 0x4F7B313: call_function (ceval.c:4732)
==5985==    by 0x4F7B313: PyEval_EvalFrameEx (ceval.c:3236)
==5985==    by 0x4F7EB48: _PyEval_EvalCodeWithName (ceval.c:4018)
==5985==    by 0x4F7DDF4: fast_function (ceval.c:4813)
==5985==    by 0x4F7DDF4: call_function (ceval.c:4730)
==5985==    by 0x4F7DDF4: PyEval_EvalFrameEx (ceval.c:3236)
==5985==  Address 0x1f9651cc is not stack'd, malloc'd or (recently) free'd
==5985== 
==5985== 
==5985== Process terminating with default action of signal 11 (SIGSEGV)
==5985==  Access not within mapped region at address 0x1F9651CC
==5985==    at 0x1A3726BB: galsim::BaseImage<float>::nonZeroBounds() const (in /home/esheldon/miniconda3/lib/libgalsim.so.1.5)
==5985==    by 0x1A4B33CB: galsim::ImageView<double> galsim::hsm::MakeMaskedImage<float>(galsim::ImageAlloc<double>&, galsim::BaseImage<float> const&, galsim::BaseImage<int> const&) (in /home/esheldon/miniconda3/lib/libgalsim.so.1.5)
==5985==    by 0x1A4B3989: galsim::hsm::CppShapeData galsim::hsm::FindAdaptiveMomView<float>(galsim::BaseImage<float> const&, galsim::BaseImage<int> const&, double, double, galsim::Position<double>, boost::shared_ptr<galsim::hsm::HSMParams>) (in /home/esheldon/miniconda3/lib/libgalsim.so.1.5)
==5985==    by 0x18D89943: boost::python::detail::caller_arity<6u>::impl<galsim::hsm::CppShapeData (*)(galsim::BaseImage<float> const&, galsim::BaseImage<int> const&, double, double, galsim::Position<double>, boost::shared_ptr<galsim::hsm::HSMParams>), boost::python::default_call_policies, boost::mpl::vector7<galsim::hsm::CppShapeData, galsim::BaseImage<float> const&, galsim::BaseImage<int> const&, double, double, galsim::Position<double>, boost::shared_ptr<galsim::hsm::HSMParams> > >::operator()(_object*, _object*) (in /home/esheldon/miniconda3/lib/python3.5/site-packages/galsim/_galsim.so)
==5985==    by 0x190C000C: boost::python::objects::function::call(_object*, _object*) const (in /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0)
==5985==    by 0x190C0207: ??? (in /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0)
==5985==    by 0x190C8052: boost::python::handle_exception_impl(boost::function0<void>) (in /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0)
==5985==    by 0x190BD408: ??? (in /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0)
==5985==    by 0x4EA1235: PyObject_Call (abstract.c:2165)
==5985==    by 0x4F7B313: do_call (ceval.c:4936)
==5985==    by 0x4F7B313: call_function (ceval.c:4732)
==5985==    by 0x4F7B313: PyEval_EvalFrameEx (ceval.c:3236)
==5985==    by 0x4F7EB48: _PyEval_EvalCodeWithName (ceval.c:4018)
==5985==    by 0x4F7DDF4: fast_function (ceval.c:4813)
==5985==    by 0x4F7DDF4: call_function (ceval.c:4730)
==5985==    by 0x4F7DDF4: PyEval_EvalFrameEx (ceval.c:3236)
==5985==  If you believe this happened as a result of a stack
==5985==  overflow in your program's main thread (unlikely but
==5985==  possible), you can try to increase the size of the
==5985==  main thread stack using the --main-stacksize= flag.
==5985==  The main thread stack size used in this run was 8388608.
==5985== 
==5985== HEAP SUMMARY:
==5985==     in use at exit: 20,510,130 bytes in 47,591 blocks
==5985==   total heap usage: 346,881 allocs, 299,290 frees, 207,064,066 bytes allocated
==5985== 
==5985== LEAK SUMMARY:
==5985==    definitely lost: 153,164 bytes in 81 blocks
==5985==    indirectly lost: 0 bytes in 0 blocks
==5985==      possibly lost: 651,475 bytes in 614 blocks
==5985==    still reachable: 19,705,491 bytes in 46,896 blocks
==5985==         suppressed: 0 bytes in 0 blocks
==5985== Rerun with --leak-check=full to see details of leaked memory
==5985== 
==5985== For counts of detected and suppressed errors, rerun with: -v
==5985== Use --track-origins=yes to see where uninitialised values come from
==5985== ERROR SUMMARY: 8871 errors from 188 contexts (suppressed: 0 from 0)


rmjarvis avatar rmjarvis commented on August 17, 2024

Hm. Not what I expected at all. Maybe the image is getting corrupted somehow.

I added some more print statements to investigate. This time could you please run python test_simple.py. This will run the tests from within python first (which had seemed to work, at least when running via nosetests) and then call out to piffify where I expect it to fail. Looking at the difference between the two ways of running may be instructive.

The output might be long, so it may be easier to redirect it to a file and send me that file by email instead of posting it here.


rmjarvis avatar rmjarvis commented on August 17, 2024

Actually, one quick sanity check first. Was this where you were expecting boost to come from?

==5985==    by 0x190C000C: boost::python::objects::function::call(_object*, _object*) const (in /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0)

You didn't do conda install boost for this installation?
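On Linux, the question of which boost (or stdc++) a process actually loaded can be answered from inside Python by reading `/proc/self/maps`. This is a rough sketch under that Linux-only assumption; the function names are hypothetical and not part of GalSim or Piff.

```python
def mapped_libraries(maps_text, name_fragment):
    """Return sorted unique file paths in /proc/<pid>/maps text that contain name_fragment."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, device, inode, then an optional path.
        fields = line.split(None, 5)
        if len(fields) == 6 and name_fragment in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


def loaded_libraries(name_fragment):
    """Linux only: list files mapped into the current process that match name_fragment."""
    with open("/proc/self/maps") as f:
        return mapped_libraries(f.read(), name_fragment)
```

Calling `loaded_libraries("libboost")` and `loaded_libraries("libstdc++")` right after `import galsim` would show whether the system copies or the conda copies were picked up.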


esheldon avatar esheldon commented on August 17, 2024

That is what I expected. I'm using the system libraries for everything that I can. I did notice one of your dependencies installed mkl into anaconda however.


esheldon avatar esheldon commented on August 17, 2024

Output of python test_simple.py attached
test_simple_log.txt


rmjarvis avatar rmjarvis commented on August 17, 2024

Anaconda's going to be the death of me someday probably.

I've managed to reproduce this error on my system with Anaconda python3, so I'm investigating. (The same system where I reproduced this error.)


rmjarvis avatar rmjarvis commented on August 17, 2024

This is very frustrating. About 1/3 of the time, I even get seg faults doing

python setup.py install --prefix=~

in the Piff directory. Here's a run of 5 in a row that seg faulted before the 6th finally worked.

mjarvis@susi:~/rmjarvis/Piff[#50*] $ python setup.py install --prefix=~
python setup.py install --prefix=~
Using setuptools version 27.2.0
Python version =  3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
packages =  ['piff', 'piff.des']
Segmentation fault
mjarvis@susi:~/rmjarvis/Piff[#50*] $ python setup.py install --prefix=~
Using setuptools version 27.2.0
Python version =  3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
packages =  ['piff', 'piff.des']
Segmentation fault
mjarvis@susi:~/rmjarvis/Piff[#50*] $ python setup.py install --prefix=~
Using setuptools version 27.2.0
Python version =  3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
packages =  ['piff', 'piff.des']
Segmentation fault
mjarvis@susi:~/rmjarvis/Piff[#50*] $ python setup.py install --prefix=~
Using setuptools version 27.2.0
Python version =  3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
packages =  ['piff', 'piff.des']
Segmentation fault
mjarvis@susi:~/rmjarvis/Piff[#50*] $ python setup.py install --prefix=~
Using setuptools version 27.2.0
Python version =  3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
packages =  ['piff', 'piff.des']
Segmentation fault
mjarvis@susi:~/rmjarvis/Piff[#50*] $ python setup.py install --prefix=~
Using setuptools version 27.2.0
Python version =  3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
packages =  ['piff', 'piff.des']
Piff version is 0.1
running install
running bdist_egg
running egg_info
writing dependency_links to Piff.egg-info/dependency_links.txt
writing requirements to Piff.egg-info/requires.txt
writing Piff.egg-info/PKG-INFO
writing top-level names to Piff.egg-info/top_level.txt
reading manifest file 'Piff.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'include'
writing manifest file 'Piff.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
copying piff/input.py -> build/lib/piff
...

This makes me inclined to believe this is a bug in Anaconda, but I'm still trying to work down to a minimum test case.

So far I've determined that when running piffify simple.yaml I reliably get a seg fault if I access one of the image arrays after returning from piff.Input.process, but not before returning. Pretty weird.

Maybe Anaconda is garbage collecting in between, so maybe there's some kind of memory bug where memory gets freed but GalSim still tries to access it? I'm trying to keep an open mind that this could be a GalSim or Piff bug, rather than just blaming Anaconda, even though it all works fine on every other system I have...
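The symptom described above (data valid before a function returns, gone afterwards) is what one would expect when the last owning reference is a local variable. The following pure-Python sketch only illustrates that lifetime pattern; `Buffer` and `process` are hypothetical stand-ins, and the thread does not say that this is how the actual GalSim bug arose.

```python
import gc
import weakref


class Buffer:
    """Stand-in for an object that owns pixel memory (hypothetical)."""

    def __init__(self, n):
        self.data = bytearray(n)


def process():
    """Create a buffer but hand back only a non-owning handle to it."""
    buf = Buffer(16)
    handle = weakref.ref(buf)
    events = []
    # Record the moment the buffer is destroyed.
    weakref.finalize(buf, events.append, "buffer freed")
    alive_inside = handle() is not None  # buf is still referenced here
    return handle, events, alive_inside


def demo():
    """Return (alive inside process, alive after return, recorded events)."""
    handle, events, alive_inside = process()
    gc.collect()  # not needed on CPython, but makes other interpreters agree
    return alive_inside, handle() is not None, list(events)
```

In Python the weak reference simply becomes None once the buffer is gone. In C++ code holding a raw pointer to the same memory, the equivalent access is an invalid read like the one valgrind reported.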


rmjarvis avatar rmjarvis commented on August 17, 2024

I finally tracked it down. Turns out it was a bug in GalSim that apparently has so far shown up only on this system. But it's a bona fide bug that is worth fixing.

So not Anaconda's fault. :)

I think there's probably still some kind of bug in Anaconda causing seg faults, since I still get seg faults randomly at the very start or end of python programs. Even python setup.py install as I mentioned above. So it's hard to imagine that this could be a bug in anything I did. I think it's related to having gcc 6 with Anaconda's stdc++ library intermingling badly. I switched to using gcc 4.4 on that machine for all python-related things, and those seg faults went away.
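When checking for this kind of compiler mismatch, it helps to see which compiler built the interpreter next to the gcc currently on the PATH. A small sketch using only the standard library follows; the function names are made up for illustration, and it assumes the system compiler is invoked as `gcc`.

```python
import platform
import shutil
import subprocess


def interpreter_compiler():
    """Compiler string the running Python was built with, e.g. 'GCC 4.4.7 ...'."""
    return platform.python_compiler()


def system_gcc_version():
    """Version reported by the gcc found on PATH, or None if there is none."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    result = subprocess.run(
        [gcc, "-dumpversion"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip() or None
```

A large gap between the two versions does not prove anything by itself, but it flags the configuration described above, where extensions built with a newer g++ run against an older bundled stdc++.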


esheldon avatar esheldon commented on August 17, 2024

FYI for reference, I'm using miniconda and I've not installed numba, so I don't have a stdc++ library in Anaconda /lib. And on this system I don't see the random seg faults in python setup.py install, so your diagnosis of that may be correct.


rmjarvis avatar rmjarvis commented on August 17, 2024

Good. I think I'm going to post an issue about gcc 5+ working badly with Anaconda, since it shows up more purely with TreeCorr. Then there's no boost or TMV to connect with, so it's pure python-compiled-with-g++ leading to library errors at run time. Maybe something they'd be willing to consider trying to fix. But I'm glad this isn't an issue with miniconda at least.


TallJimbo avatar TallJimbo commented on August 17, 2024

We've been having some trouble with LSST's conda packages recently due to anaconda's reference system being CentOS 5 with (I think) a custom gcc 5. That caused trouble when we built against it with later glibc versions or older gcc versions. I haven't been fully following the discussion, but if you think it may be the same thing, LSST's #dm-release-builds Slack channel might be useful.


esheldon avatar esheldon commented on August 17, 2024

anaconda uses [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]


rmjarvis avatar rmjarvis commented on August 17, 2024

@esheldon The bug fix was merged into GalSim master branch. Could you confirm that this fixes things on your system?


rmjarvis avatar rmjarvis commented on August 17, 2024

@TallJimbo Thanks for letting us know. I was actually wondering about that, since I know you guys have been moving toward using c++11 features now, and I thought you were using Anaconda. I was wondering whether those two facts were causing trouble for you.


esheldon avatar esheldon commented on August 17, 2024

Here is what went to stderr (I noticed a lot of other info went to stdout)

................./home/esheldon/miniconda3/lib/python3.5/site-packages/galsim/phase_psf.py:404: UserWarning: Input pupil plane image may not be sampled well enough!
Consider increasing sampling by a factor 28.444900, and/or check PhaseScreenPSF outputs for signs of folding in real space.
  "PhaseScreenPSF outputs for signs of folding in real space."%ratio)
................................
----------------------------------------------------------------------
Ran 49 tests in 114.373s

OK

