wdmerger's People

Contributors

alancalder, asalmgren, bcfriesen, kissformiss, maxpkatz, weiqunzhang, zingale

wdmerger's Issues

wdmerger_collision crashes on 2nd-gen Xeon Phi (KNL) with Intel compilers

The test problem wdmerger_collision crashes somewhere in the multigrid solver when using Intel compilers v16 and v17 (beta) compiled for 2nd-gen Xeon Phi ("Knights Landing"). The exact point where it crashes tends to vary. Below are a few example outputs:

Example 1:

Initializing the data at level 1
Done initializing the level 1 data 
STEP = 0 TIME = 0 : REGRID  with lbase = 0
  Level 1   24 grids  393216 cells  5.555555556 % of domain
            smallest grid: 32 x 32 x 16  biggest grid: 32 x 32 x 16

... multilevel solve for new phi at base level 0 to finest level 1
Gravity::make_radial_phi() time = 0.3054199219
 ... Making bc's for phi at level 0 
Gravity::fill_multipole_BCs() time = 3.019109011
*** Error in `/global/u2/f/friesen/wdmerger/tests/wdmerger_3D/./Castro3d.intel.MPI.OMP.ex': free(): invalid next size (normal): 0x00000000033c1380 ***

Example 2:

Castro::numpts_1d at level  1 is 340
Initializing the data at level 1
Done initializing the level 1 data 
STEP = 0 TIME = 0 : REGRID  with lbase = 0
  Level 1   12 grids  393216 cells  5.555555556 % of domain
            smallest grid: 32 x 32 x 32  biggest grid: 32 x 32 x 32

... multilevel solve for new phi at base level 0 to finest level 1
Gravity::make_radial_phi() time = 0.007010936737
 ... Making bc's for phi at level 0 
Gravity::fill_multipole_BCs() time = 0.1369400024
 BOXLIB ERROR: fab_dataptr_bx_c: bx is too large
forrtl: severe (40): recursive I/O operation, unit -1, file unknown
forrtl: severe (40): recursive I/O operation, unit -1, file unknown
forrtl: severe (40): recursive I/O operation, unit -1, file unknown
forrtl: severe (40): recursive I/O operation, unit -1, file unknown
forrtl: severe (40): recursive I/O operation, unit -1, file unknown
forrtl: severe (40): recursive I/O operation, unit -1, file unknown
forrtl: severe (40): recursive I/O operation, unit -1, file unknown
forrtl: severe (40): recursive I/O operation, unit -1, file unknown
forrtl: severe (40): recursive I/O operation, unit -1, file unknown
*** Error in `/global/u2/f/friesen/wdmerger/tests/wdmerger_3D/./Castro3d.intel.MPI.OMP.ex': free(): corrupted unsorted chunks: 0x0000000001e61050 ***
*** Error in `/global/u2/f/friesen/wdmerger/tests/wdmerger_3D/./Castro3d.intel.MPI.OMP.ex': double free or corruption (!prev): 0x0000000001e61620 ***
*** Error in `/global/u2/f/friesen/wdmerger/tests/wdmerger_3D/./Castro3d.intel.MPI.OMP.ex': free(): corrupted unsorted chunks: 0x0000000001e60a50 ***
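Examples 1 and 2 decompose level 1 differently (24 boxes of 32 x 32 x 16 versus 12 boxes of 32 x 32 x 32), yet both cover the same 393216 cells and both runs crash. The short Python check below confirms the arithmetic in the REGRID summaries. The grid counts, box shapes, and the 5.555555556 % coverage figure come from the logs; the assumption that the level-1 index space is a cube is ours and is used only to recover an edge length.

```python
# Sanity-check the REGRID summaries printed in Examples 1 and 2.
# Grid counts, box shapes, and the coverage percentage are copied from the logs.

def level_cells(n_grids, shape):
    """Total cells covered by n_grids boxes that all share the same shape."""
    nx, ny, nz = shape
    return n_grids * nx * ny * nz

ex1 = level_cells(24, (32, 32, 16))  # Example 1: 24 grids of 32 x 32 x 16
ex2 = level_cells(12, (32, 32, 32))  # Example 2: 12 grids of 32 x 32 x 32

# Back out the size of the whole level-1 index space from "% of domain".
fraction = 5.555555556 / 100
domain_cells = round(ex1 / fraction)

# Assumption (not stated in the logs): the level-1 domain is a cube.
edge = round(domain_cells ** (1 / 3))

print(ex1, ex2, domain_cells, edge)
```

Under the cubic-domain assumption this points to a 192^3 level-1 index space; the logs alone do not say what refinement ratio separates it from level 0.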

Example 3:

Castro::numpts_1d at level  1 is 340
Initializing the data at level 1
Done initializing the level 1 data 
STEP = 0 TIME = 0 : REGRID  with lbase = 0
  Level 1   12 grids  393216 cells  5.555555556 % of domain
            smallest grid: 32 x 32 x 32  biggest grid: 32 x 32 x 32

... multilevel solve for new phi at base level 0 to finest level 1
Gravity::make_radial_phi() time = 0.008671045303
 ... Making bc's for phi at level 0 
Gravity::fill_multipole_BCs() time = 0.1363129616
0::Segfault !!!
See Backtrace.rg_0_rl_0.0 file for details
Rank 0 [Fri Nov 25 13:52:41 2016] [c10-4c0s0n1] application called MPI_Abort(comm=0x84000000, -1) - process 0
srun: error: nid11137: task 0: Aborted
srun: Terminating job step 3155671.7

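Because the failure mode changes from run to run, it may help to record which signature each run hits first. Below is a minimal Python sketch, assuming each run's output is available as text. The regular expressions are built from the messages in Examples 1 to 3 and the DEBUG run; the category labels and the helper names `first_failure` and `tally` are hypothetical, chosen for illustration.

```python
import re
from collections import Counter

# Failure signatures taken from the messages in the example outputs.
# The labels are hypothetical names for this sketch.
SIGNATURES = [
    ("glibc heap corruption", re.compile(r"\*\*\* Error in .*(free\(\)|double free)")),
    ("BoxLib error", re.compile(r"BOXLIB ERROR")),
    ("forrtl runtime error", re.compile(r"forrtl: severe \(\d+\)")),
    ("segfault", re.compile(r"Segfault")),
]

def first_failure(log_text):
    """Label of the earliest line in log_text that matches a known signature, else None."""
    for line in log_text.splitlines():
        for label, pattern in SIGNATURES:
            if pattern.search(line):
                return label
    return None

def tally(logs):
    """Count the first failure signature seen in each run's output."""
    return Counter(first_failure(text) for text in logs)

# Usage: abbreviated tails of Examples 1, 2 and 3.
runs = [
    "Gravity::fill_multipole_BCs() time = 3.019109011\n"
    "*** Error in `./Castro3d.intel.MPI.OMP.ex': free(): invalid next size (normal): 0x00000000033c1380 ***",
    " BOXLIB ERROR: fab_dataptr_bx_c: bx is too large\n"
    "forrtl: severe (40): recursive I/O operation, unit -1, file unknown",
    "0::Segfault !!!\nSee Backtrace.rg_0_rl_0.0 file for details",
]
print(tally(runs))
```

Only the first match per run is counted, so follow-on messages (such as the repeated forrtl lines in Example 2) do not inflate the tally.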
Interestingly, when compiled with DEBUG set to TRUE, it fails with this error:

Gravity::make_radial_phi() time = 0.008027076721
 ... Making bc's for phi at level 0 
Gravity::fill_multipole_BCs() time = 0.1858928204
forrtl: severe (408): fort: (2): Subscript #1 of the array IN has value 2 which is greater than the upper bound of 1

However, I have been unable to determine which array it refers to: the DDT debugger does not capture errors raised by forrtl, so it cannot show a call stack.

This error does not occur with either the GCC or Cray compilers targeting 2nd-gen Xeon Phi.

Another data point: a Nyx user on a different 2nd-gen Xeon Phi system also sees segfaults in Nyx when it is built with the Intel compilers. Since Nyx is also built on BoxLib, this suggests the problem may not be specific to Castro or wdmerger.