
mcdc's People

Contributors

braxtoncuneo, clemekay, goodman17c, ilhamv, jpmorgan98, kyleniemeyer, northroj, rohanpankaj, spasmann


mcdc's Issues

Missing/incomplete Documentation

Hi MCDC team,

Here are some of my concerns about the project's documentation.

Major points:

  • The API is not fully documented. For example, cell, universe, lattice, tally, eigenmode, setting, and iQMC are among the functions not documented. Please add docstrings for all user-space functions detailed in the docs. I would also highly suggest adding docstrings to internal functions.

Minor points:

  • Following through the user guide, I've noticed that the reference solution isn't explicitly given in the docs. It might be a good idea to add either an explicit link to it or, as you've done for the rest of the example, the full code of the reference solution.
  • It might be a good idea to match the MPI running instructions given in the readme with those in the docs, for example by adding the bit about mpiexec/mpirun and how to control the number of cores used.
  • It might be due to me not being an expert in neutron transport, but the examples are a bit opaque to me. It would be helpful if each example had a blurb or readme explaining what experiment/setup is being simulated, or even which parts of MCDC are being shown off. It might also be good to hint whether to use Numba and/or MPI for each example, so as not to waste the user's time.
  • On line 303 of user.rst there is a typo: "statical" should be "statistical".

openjournals/joss-reviews#6415

mpi4py issue with large k-eigenvalue simulations in Lassen

On Lassen, MCDC breaks after the first eigenvalue cycle if the number of histories per cycle is larger than 1e5 and numba is enabled.

 #     k        k (avg)
 ====  =======  ===================
 1     1.43481

(All MPI ranks report the same traceback; only the final exception differs by rank.)

Traceback (most recent call last):
  File "/usr/WS1/northroj/SMR/mcdc/c5g7td/inffuel/input.py", line 78, in <module>
    mcdc.run()
  File "/usr/WS1/northroj/miniconda3/MCDC/mcdc/main.py", line 41, in run
    loop_main(mcdc)
  File "mpi4py/MPI/Comm.pyx", line 1438, in mpi4py.MPI.Comm.recv
  File "mpi4py/MPI/msgpickle.pxi", line 341, in mpi4py.MPI.PyMPI_recv
  File "mpi4py/MPI/msgpickle.pxi", line 306, in mpi4py.MPI.PyMPI_recv_match
  File "mpi4py/MPI/msgpickle.pxi", line 152, in mpi4py.MPI.pickle_load
  File "mpi4py/MPI/msgpickle.pxi", line 141, in mpi4py.MPI.cloads
_pickle.UnpicklingError: invalid load key, '_'.

Distinct final exceptions seen across the other ranks:

_pickle.UnpicklingError: invalid load key, '\xba'.
_pickle.UnpicklingError: invalid load key, '\x00'.
_pickle.UnpicklingError: invalid load key, '\x0f'.
_pickle.UnpicklingError: invalid load key, '\xfe'.
_pickle.UnpicklingError: invalid load key, '\x9f'.
_pickle.UnpicklingError: unexpected MARK found
OverflowError: BINBYTES exceeds system's maximum size of 9223372036854775807 bytes

merging `harmonize-integration` with `main`

The situation

harmonize-integration, which really was more of the GPU development branch, had fallen significantly behind main. @braxtoncuneo started the merge and successfully got all kernels compiled, operable, and running with Harmonize. Unfortunately, when those kernels exit, the data cannot be moved back from the GPU to the CPU. This is due to some issue with the data layout of the mcdc global variable.

Other factors

The np.ndarray() structure is clearly at issue, not just here but also in #158, since it requires every entry to be the same size. Finding an alternative data structure is probably the long-term solution, but is very annoying. Troubleshooting the merge to get all the data working, then adding on-the-fly checks, is maybe the lower-overhead option, but it is still fraught.

Update installation script to match pyproject requirements

Currently, the install.sh script installs mpi4py from source when the --hpc flag is used. It installs mpi4py version 3.1.4, which does not meet the pyproject requirement of mpi4py 3.1.5, so the source-installed version ends up not being used. I'll need to update this.

iqmc_cooper2 regression test fails in numba mode on linux OS

My branch iqmc/source_tilt is failing the iqmc_cooper2 regression test in Numba mode when I push to GitHub. You can see the test results here.
What's odd is that it passes in both pure Python and Numba mode on Windows and Mac.

This seems to be a Linux issue, because it also passes in Python but fails in Numba on Quartz. I've tried messing with some of the input parameters, increasing/decreasing particles, but still can't get the test to pass.

Mac results:
image

PC results:
image

Linux (Quartz) results:
image

`kernel.split_particle` returns by reference in Numba

The use of kernel.split_particle is treated as a call by reference in Numba. The current use of the function is always followed by kernel.add_particle, which passes by value to particle banks. However, if we (naively) use it to create intermediate particles for some operations (like in the embedded sensitivity method), this would cause issues. What is interesting is that the function returns by value in Python.

Non-uniform structs of arrays in Numba

Optimizations for how cross section data, tally meshes, and cells are stored. Right now, with np.ndarray, the memory allocated for each item is sized to the largest array and is the same for all others. numba.jitclass is a remedy for this but is not GPU-operable. Some initial ideas are:

  • numba.jitclass with a puller function that converts to np.ndarrays for GPU runs
  • An offset scheme where data is stored as a single flat array with offsets (see the sketch after this list)
  • Use of other C-type data structures, e.g. from PyTorch or CuPy
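
To make the offset idea concrete, here is a minimal sketch (hypothetical names, not MC/DC's data layout) of ragged per-material data packed into one flat array with an offsets index:

import numpy as np

# Hypothetical sketch of the offset scheme: ragged per-material data (e.g.
# group-wise cross sections of different lengths) packed into a single flat
# array, with an offsets array marking where each material's data starts.
xs_per_material = [
    np.array([1.0, 2.0, 3.0]),        # material 0: 3 groups
    np.array([4.0, 5.0]),             # material 1: 2 groups
    np.array([6.0, 7.0, 8.0, 9.0]),   # material 2: 4 groups
]

flat = np.concatenate(xs_per_material)
offsets = np.cumsum([0] + [len(a) for a in xs_per_material])  # [0, 3, 5, 9]

def get_xs(material_ID, group):
    # Look up one material's slice without padding every material
    # to the size of the largest one.
    start, end = offsets[material_ID], offsets[material_ID + 1]
    return flat[start:end][group]

print(get_xs(1, 0))  # 4.0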

"A particle is lost" recursive loop

Running the 2D C5G7 problem, I've run into a "Particle is lost" bug using both Monte Carlo and iQMC. A particle being lost could be due to an error in the user's input deck, so I'm not as concerned that a particle was lost as I am that the "Particle is lost" statement repeats infinitely. Shouldn't it print the error, kill the particle, and move on?

When get_particle_cell() fails to find a cell_ID we set:

  • P["alive"]=False,
  • return a cell_ID = -1,
  • and expect the while True loop in get_particle_material() to break?

However, it gets stuck in this recursive loop where it repeats the "Particle is lost" statement until the output file exceeds my disk quota and the run fails.

Sample of output:

A particle is lost at ( 16.612528973121755 -3.824586145564854 10000000000.0 )
A particle is lost at ( 16.612528973121755 -3.824586145564854 10000000000.0 )
A particle is lost at ( 16.612528973121755 -3.824586145564854 10000000000.0 )
A particle is lost at ( 16.612528973121755 -3.824586145564854 10000000000.0 )
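
For illustration, a minimal self-contained sketch (hypothetical code, not MC/DC's actual kernel) of the suggested behavior: warn once, kill the particle, and return instead of looping:

def get_particle_cell(P, mcdc):
    # Hypothetical stub that always fails to find a cell, mimicking the bug trigger.
    return -1

def get_particle_material(P, mcdc):
    cell_ID = get_particle_cell(P, mcdc)
    if cell_ID == -1:
        print("A particle is lost at (", P["x"], P["y"], P["z"], ")")
        P["alive"] = False   # kill the particle so the transport loop moves on
        return -1            # no infinite retry
    return mcdc["cells"][cell_ID]["material_ID"]

P = {"x": 16.61, "y": -3.82, "z": 1e10, "alive": True}
get_particle_material(P, {})   # prints the message once and returns -1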

Broken examples

Hi MCDC team,

I've found the following examples won't run or plot correctly:

  • fixed_source/pulsed_sphere gives the following error:
    mat_iron = mcdc.material(
               ^^^^^^^^^^^^^^
  File "/home/JOSS/MCDC/mcdc/input_.py", line 326, in material
    with h5py.File(dir_name + "/" + nuc_name + ".h5", "r") as f:
                   ~~~~~~~~~^~~~~
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
  • fixed_source/inf_pin_ce has the same error as above.
  • fixed_source/slab_ce has the same issue as above. This example could also benefit from more instructions (what is build_xml.py's purpose?) and a mention that openmc is a dependency.
  • fixed_source/slab_reed_iqmc is missing a plt.show() or savefig from the process.py file.
  • eigenvalue/2d_c5g7 fails to plot with the following error. I think this is due to the "C" argument to pcolormesh only having 1 dimension (a possible reshape fix is sketched after this list).
  File "/home/JOSS/MCDC/examples/eigenvalue/2d_c5g7/process.py", line 95, in <module>
    plt.pcolormesh(X, Y, phi_thermal_sd, shading="nearest")
  File "/home/anaconda3/envs/mcdc-env/lib/python3.11/site-packages/matplotlib/pyplot.py", line 3493, in pcolormesh
    __ret = gca().pcolormesh(
            ^^^^^^^^^^^^^^^^^
  File "/home/anaconda3/envs/mcdc-env/lib/python3.11/site-packages/matplotlib/__init__.py", line 1465, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anaconda3/envs/mcdc-env/lib/python3.11/site-packages/matplotlib/axes/_axes.py", line 6292, in pcolormesh
    X, Y, C, shading = self._pcolorargs('pcolormesh', *args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anaconda3/envs/mcdc-env/lib/python3.11/site-packages/matplotlib/axes/_axes.py", line 5815, in _pcolorargs
    nrows, ncols = C.shape[:2]
    ^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 1)
  • eigenvalue/slab_kornreich_iqmc is missing a plt.show() or savefig from the process.py file.
  • eigenvalue/smrg7 fails with ERROR: Particle census bank is full
  • c5g7/3d/TDX the default particle number does not resolve to the reference solution.
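
For the 2d_c5g7 plotting failure, a hedged sketch of one possible fix, assuming phi_thermal_sd comes out of the tally file flattened; the mesh shape below is a placeholder, not the example's actual dimensions:

import numpy as np
import matplotlib.pyplot as plt

# Placeholder mesh dimensions; the real grid comes from the example's tally output.
ny, nx = 34, 51
X, Y = np.meshgrid(np.linspace(0.0, 1.0, nx), np.linspace(0.0, 1.0, ny))
phi_thermal_sd = np.random.rand(ny * nx)  # stand-in for the 1-D tally array

# pcolormesh needs a 2-D C argument; reshape the flattened tally to the mesh shape.
plt.pcolormesh(X, Y, phi_thermal_sd.reshape(ny, nx), shading="nearest")
plt.colorbar()
plt.show()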

openjournals/joss-reviews#6415

Numba does not reproduce Python results

Numba does not reproduce the Python result for cooper2 in the regression test. The difference comes only from work index 17 (idx_work == 17 in loop.py). The issue only shows up when run with the original problem's reflective BCs AND mcdc.implicit_capture.

The current, temporary, "fix" is to turn off the implicit capture.

The recommended action is to locate the issue by following all tally scorings happening in that work index.

Outside GPU libraries for GPU functions

As we start to integrate more advanced hybrid methods on the GPU, we are finding that most numpy functions are not supported on the GPU. I think we have two options here: (1) reimplement all operations (gemm, LU decomposition, etc.) in our own Python-Numba functions, or (2) use CuPy's supposed interoperability, which allows zero-overhead copies.

I think 2 is the way to go, but it would probably require an object-mode call to work, which is way less than ideal. This is related to #158 and how best to store data, with a CuPy array potentially being the way to go.
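
A hedged sketch of what option (2) could look like: Numba CUDA device arrays and CuPy arrays both expose __cuda_array_interface__, so views can in principle be exchanged without a copy (assumes a CUDA-capable GPU with both packages installed; not a statement about what MC/DC will actually adopt):

import numpy as np
import cupy as cp
from numba import cuda

d_arr = cuda.to_device(np.arange(6, dtype=np.float64))  # Numba device array
c_view = cp.asarray(d_arr)   # zero-copy CuPy view of the same device memory
c_view *= 2.0                # do CuPy-backed math (linear algebra, etc.) here
print(d_arr.copy_to_host())  # [ 0.  2.  4.  6.  8. 10.] — change is visible in the Numba array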

Updating Regression Tests and Examples

MPI issues keep slipping through after repos are forked, and we need to do better about testing all functionality. The specific tests we need to add:

  • Parallel regression test
  • Numba regression test
  • MPI & Numba Regression tests
  • Updating examples to have current functionality
I think as a stop-gap measure we can write a number of SLURM or LSF batch scripts for Lassen or Quartz to at least somewhat optimize the process, though let me know what y'all think.
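
A rough sketch of how the mode/MPI combinations above could be exercised with a single parametrized test; paths, rank counts, and the timeout are placeholders, not the repo's actual CI configuration:

import subprocess
import pytest

# Each case launches an example input under an optional MPI launcher and in
# either pure-Python or Numba mode, then checks that the run exits cleanly.
CASES = [
    ("python", []),                      # serial, pure Python
    ("numba", []),                       # serial, Numba
    ("python", ["mpiexec", "-n", "2"]),  # MPI, pure Python
    ("numba", ["mpiexec", "-n", "2"]),   # MPI + Numba
]

@pytest.mark.parametrize("mode,launcher", CASES)
def test_regression(mode, launcher):
    cmd = launcher + ["python", "input.py", f"--mode={mode}"]
    subprocess.run(cmd, check=True, timeout=600)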

adding CSG operators

Initially this needs to be union, intersection, and difference operations for the OpenMC -> MC/DC input converter. Some tracking operations will be needed for locating a particle in a cell, among potentially others.
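
As a rough illustration only (not MC/DC's or OpenMC's API), regions can be modeled as point predicates and the three operators as predicate combinators; point-in-cell tracking then reduces to evaluating the predicate:

class Region:
    def __init__(self, contains):
        self.contains = contains  # callable: (x, y, z) point -> bool

    def __and__(self, other):  # intersection
        return Region(lambda p: self.contains(p) and other.contains(p))

    def __or__(self, other):   # union
        return Region(lambda p: self.contains(p) or other.contains(p))

    def __sub__(self, other):  # difference
        return Region(lambda p: self.contains(p) and not other.contains(p))

sphere = Region(lambda p: sum(x * x for x in p) < 1.0)
upper_half = Region(lambda p: p[2] > 0.0)
inner = Region(lambda p: sum(x * x for x in p) < 0.25)

half_shell = (sphere & upper_half) - inner
print(half_shell.contains((0.0, 0.0, 0.7)))  # True
print(half_shell.contains((0.0, 0.0, 0.3)))  # False (inside the carved-out inner sphere)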

Documentation website initialization

Ticket to track progress on this. Individual tasks:

  • make a docs branch of mcdc? I am not sure if this is needed
  • set up a Sphinx build in the Read the Docs format that works with our Black code style
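
A minimal sketch of what the Sphinx configuration could look like, assuming sphinx and sphinx-rtd-theme are installed; the file path and options are suggestions, not the project's final setup:

# docs/source/conf.py (hypothetical location)
project = "MC/DC"
extensions = [
    "sphinx.ext.autodoc",    # pull API documentation from docstrings
    "sphinx.ext.napoleon",   # accept NumPy/Google-style docstrings
]
html_theme = "sphinx_rtd_theme"  # Read the Docs look and feel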

Performance testing via `cron` jobs

In the test/performance directory there is currently only one test. I am thinking we just start with whatever that test is on a given number of Quartz nodes, then go from there, but I have some other questions:

  • Are we going to be comparing the solutions for correctness? If so, where are we storing that tally data (which might be massive)?
  • What are the jobs (C5G7, pulsed sphere)?
  • What are the job parameters we are shooting for (machine, # of nodes, # of MPI ranks, Numba vs. Python mode, etc.)?
  • Where will we be storing the job runtimes? (I say another GitHub repo in the CEMeNT organization.)

Function Timers

Adding some kind of timer for individual functions to more readily identify hotspots in both Numba and Python mode. Working fork and branch.

Object mode in Numba screws us over, and the timing doesn't seem to be correct when compared against the total wall-clock runtime.

Things to do:

  • Investigate Numba-mode timing functions
  • Make it run on multiple processors
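
A rough Python-mode sketch (hypothetical helper, not the working branch's implementation) of a per-function wall-clock accumulator; Numba-jitted functions would need timing at their call sites instead, since a decorator like this is invisible inside nopython code:

import time
from collections import defaultdict
from functools import wraps

function_times = defaultdict(float)  # accumulated seconds per function name

def timed(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            function_times[func.__name__] += time.perf_counter() - start
    return wrapper

@timed
def example_kernel(n):
    return sum(i * i for i in range(n))

example_kernel(100_000)
print(dict(function_times))  # e.g. {'example_kernel': 0.01...}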

SEED_SPLIT_UQ OverflowError on PC

I'm running into the error below on all MC/DC runs but only on my desktop PC. Everything runs fine on my Mac laptop.

Traceback (most recent call last):
  File "C:\Users\Sam\Documents\Github\MCDC\test\regression\c5g7_2d_k_eigenvalue\input.py", line 4, in <module>
    import mcdc
  File "c:\users\sam\documents\github\mcdc\mcdc\__init__.py", line 2, in <module>
    from mcdc.input_ import (
  File "c:\users\sam\documents\github\mcdc\mcdc\input_.py", line 11, in <module>
    from mcdc.card import (
  File "c:\users\sam\documents\github\mcdc\mcdc\card.py", line 3, in <module>
    from mcdc.constant import INF, GYRATION_RADIUS_ALL, PCT_NONE, PI, SHIFT
  File "c:\users\sam\documents\github\mcdc\mcdc\constant.py", line 51, in <module>
    SEED_SPLIT_UQ = nb.uint(0x5368656261)
  File "C:\Users\Sam\mambaforge\envs\mcdc_env\lib\site-packages\numba\core\types\abstract.py", line 176, in __call__
    return self.cast_python_value(args[0])
  File "C:\Users\Sam\mambaforge\envs\mcdc_env\lib\site-packages\numba\core\types\scalars.py", line 49, in cast_python_value
    return getattr(np, self.name)(value)
OverflowError: Python int too large to convert to C long
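
A hedged guess at a workaround: the traceback shows nb.uint casting the constant through a platform-dependent C long, which is 32-bit on Windows, while 0x5368656261 needs 39 bits. This is an assumption about the cause, not necessarily the fix MC/DC adopts:

import numba as nb

# Hypothetical workaround: the constant does not fit in the 32-bit C long that
# nb.uint casts through on Windows (see the OverflowError above); an explicitly
# 64-bit unsigned type holds it on every platform.
SEED_SPLIT_UQ = nb.uint64(0x5368656261)
print(SEED_SPLIT_UQ)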

`Seg fault` with `*** Process received signal ***`

So on the OSU CI machine, certain Numba problems would compile but fail to run. This happened on a number of the regression tests as well that were passing in the GH Actions runner. The full error is here:

(mcdc_dev) cement ~/workspace/MCDC/examples/fixed_source/slab_absorbium 1026$ python input.py --mode=numba
  __  __  ____  __ ____   ____ 
 |  \/  |/ ___|/ /_  _ \ / ___|
 | |\/| | |   /_  / | | | |    
 | |  | | |___ / /| |_| | |___ 
 |_|  |_|\____|// |____/ \____|

           Mode | Numba
      Algorithm | History-based
  MPI Processes | 1
 OpenMP Threads | 1
 Now running TNT...
[cement:17804] *** Process received signal ***
[cement:17804] Signal: Segmentation fault (11)
[cement:17804] Signal code: Address not mapped (1)
[cement:17804] Failing at address: 0x256990c7fa14
[cement:17804] [ 0] /lib64/libpthread.so.0(+0xf630)[0x7fb3607a9630]
[cement:17804] [ 1] [0x7fb2ac5d2160]
[cement:17804] [ 2] [0x7fb2abf790a6]
[cement:17804] [ 3] [0x7fb2ac93d32b]
[cement:17804] [ 4] [0x7fb2ac6ac375]
[cement:17804] [ 5] [0x7fb2a6c13443]
[cement:17804] [ 6] [0x7fb2a6c1381e]
[cement:17804] [ 7] /nfs/stak/users/morgajoa/miniconda3/envs/mcdc_dev/lib/python3.11/site-packages/numba/_dispatcher.cpython-311-x86_64-linux-gnu.so(+0x53f4)[0x7fb3555cc3f4]
[cement:17804] [ 8] /nfs/stak/users/morgajoa/miniconda3/envs/mcdc_dev/lib/python3.11/site-packages/numba/_dispatcher.cpython-311-x86_64-linux-gnu.so(+0x5712)[0x7fb3555cc712]
[cement:17804] [ 9] python(_PyObject_MakeTpCall+0x26c)[0x5041ac]
[cement:17804] [10] python(_PyEval_EvalFrameDefault+0x6a7)[0x5116e7]
[cement:17804] [11] python[0x5cbeda]
[cement:17804] [12] python(PyEval_EvalCode+0x9f)[0x5cb5af]
[cement:17804] [13] python[0x5ec6a7]
[cement:17804] [14] python[0x5e8240]
[cement:17804] [15] python[0x5fd192]
[cement:17804] [16] python(_PyRun_SimpleFileObject+0x19f)[0x5fc55f]
[cement:17804] [17] python(_PyRun_AnyFileObject+0x43)[0x5fc283]
[cement:17804] [18] python(Py_RunMain+0x2ee)[0x5f6efe]
[cement:17804] [19] python(Py_BytesMain+0x39)[0x5bbc79]
[cement:17804] [20] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fb35fce5555]
[cement:17804] [21] python[0x5bbac3]
[cement:17804] *** End of error message ***
Segmentation fault (core dumped)

Whenever I see errors like lib64/libc.so.6 my mind immediately goes to incompatible compiler issues. First thing I tried was

conda install -c conda-forge gxx

and that fixed it for some problems, but others, specifically the regression tests, still seg-faulted. I am running this in a manual terminal right now, but eventually this will be the env we run GH Actions on for GPU regression testing. I am going to try other modules that have g++ and maybe look at LLVM versions.

One thing to emphasize: this does seem like a runtime issue, not a compilation failure.

material dnp size error

Problem

If materials have differing numbers of delayed neutron groups, all created arrays are of the same size even when they shouldn't be. Delayed parameters need to be specified in the input, otherwise it will break. The issue is in main.py.

Potential solution

Having better default values set up for delayed neutron parameters.

Issue on Quartz: HYDU_sock_connect unable to connect

Running into an issue on Quartz when I submit jobs from my PC (jobs run normally when submitting from my Mac). All PC batch jobs fail with multiple errors like:

[proxy:0:23@quartz2664] HYDU_sock_connect (lib/utils/sock.c:140): unable to connect from "quartz2664" to "quartz79" (No route to host)
[proxy:0:23@quartz2664] main (proxy/pmip.c:105): unable to connect to server quartz79 at port 43377 (check for firewalls!)
srun: error: quartz2664: task 23: Exited with exit code 5

This is occurring regardless of branch or input deck. Based on similar bugs here and here, it seems to be an issue with either my ssh configuration or my hostname.

I've submitted an LC-Hotline ticket and will report back what they say.

De-`object mode`-ing MC/DC

Object mode has a large set of issues that force MC/DC to crash. Removing object mode from everywhere possible is important as we do larger runs: removing it from places like MPI functions where it isn't needed, or reimplementing the object-mode functionality in functions where we can.

"Particle Lost" Monte Carlo Eigenvalue Bug

I'm running into a bug with the 2D C5G7 input deck in examples/eigenvalue/. An infinite "particle is lost..." loop occurs, but only at very high particle counts, N ~= 5e8. I have been unable to recreate it with any lower particle count on pdebug nodes. Error output below is from Quartz.

 __  __  ____  __ ____   ____ 
 |  \/  |/ ___|/ /_  _ \ / ___|
 | |\/| | |   /_  / | | | |    
 | |  | | |___ / /| |_| | |___ 
 |_|  |_|\____|// |____/ \____|

           Mode | Numba
      Algorithm | History-based
  MPI Processes | 9216
 OpenMP Threads | 1
 Now running TNT...

 #     k        k (avg)            
 ====  =======  ===================
 1     0.65242
 2     1.09593
 3     1.12501
 4     1.14159
 5     1.15266
 6     1.16068
 7     1.16651
 8     1.17110
 9     1.17463
 10    1.17729
 11    1.17924
 12    1.18093
 13    1.18206
 14    1.18316
 15    1.18404
 16    1.18450
 17    1.18501
 18    1.18518
 19    1.18554
 20    1.18574
 21    1.18592
 22    1.18600
 23    1.18618
 24    1.18621
 25    1.18636
 26    1.18629
 27    1.18648
 28    1.18635
 29    1.18658
 30    1.18643
 31    1.18640
 32    1.18641
 33    1.18640
 34    1.18656
[============                    ] 38%A particle is lost at ( 5.4429562620289635 9.999771016501171e-11 1.6518620649803937 )
A particle is lost at ( 5.4429562620289635 9.999771016501171e-11 1.6518620649803937 )
A particle is lost at ( 5.4429562620289635 9.999771016501171e-11 1.6518620649803937 )
A particle is lost at ( 5.4429562620289635 9.999771016501171e-11 1.6518620649803937 )
...

CI and Testing

So after determining that the build bot is a wee bit too complex for our needs, I have been looking at something recommended by some folks on the Cantara project: a process using git hooks. I have it working with my own accounts and running tests, but there are some issues:

  1. Only works with my local machine and my local account
  2. Not automated (it probably could be with a timer, but not with a webhook because we don't have a web server)

I am trying to think up some way of doing this with email notifications, but that seems janky. For now, when someone makes a PR, they are probably going to have to ask me to run the performance tests manually.

Visualizer crashes on 3D C5G7 and SMRG7

Using the latest version of MCDC, after inserting mcdc.visualize() after defining the geometry in the 3D C5G7 and SMRG7 input decks, I get the following error:

(mcdc-vis-env) PS C:\Users\Sam\Documents\Github\Experiments\3d_c5g7> python .\input.py
optfile ./ng.opt does not exist - using default values
togl-version : 2
OCC module loaded
loading ngsolve library
NGSolve-6.2.2307
Using Lapack
Including sparse direct solver Pardiso
Running parallel using 16 thread(s)
Traceback (most recent call last):
  File "C:\Users\Sam\Documents\Github\Experiments\3d_c5g7\input.py", line 272, in <module>
    mcdc.visualize()
  File "C:\Users\Sam\mambaforge\envs\mcdc-vis-env\Lib\site-packages\mcdc\visualizer.py", line 435, in visualize
    color_key_dic = draw_Geometry(
                    ^^^^^^^^^^^^^^
  File "C:\Users\Sam\mambaforge\envs\mcdc-vis-env\Lib\site-packages\mcdc\visualizer.py", line 318, in draw_Geometry
    cell_geometry = create_cell_geometry(
                    ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Sam\mambaforge\envs\mcdc-vis-env\Lib\site-packages\mcdc\visualizer.py", line 233, in create_cell_geometry
    math.sqrt(abs(surface_list[surface_ID]["J"][0, 0] - z**2 - y**2))
                                                        ^
UnboundLocalError: cannot access local variable 'z' where it is not associated with a value

Any ideas @RohanPankaj ?

`cache` issues

Unknown behaviors and bugs of the new cache feature:

  • cache workflows before sending to MPI ranks
  • Size of the saved cache growing unbounded
  • cache requiring recompilation for altered meshes, census time, and census bank buffer

mesh_get_index returns a value of -1 when a particle lies on mesh minimum boundary

mesh_get_index returns a value of -1 when a particle is at the lower boundary of a mesh in either space or time. This is problematic for an initial condition or boundary source. Often the effect goes unnoticed, since Python will treat the -1 index as referencing the last element of the array without error or warning.

A workaround has been to modify the mesh in the input file with some small perturbation.
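
A minimal illustration (hypothetical code, not MC/DC's actual mesh_get_index) of how a strict lower-bound check misses a particle sitting exactly on the mesh minimum, and how the resulting -1 silently scores in the last bin:

import numpy as np

grid = np.array([0.0, 1.0, 2.0, 3.0])   # mesh boundaries
tally = np.zeros(len(grid) - 1)         # one bin between each pair of boundaries

def get_index(x, grid):
    # Requires grid[i] < x <= grid[i+1], so x exactly on the minimum is never matched.
    for i in range(len(grid) - 1):
        if grid[i] < x <= grid[i + 1]:
            return i
    return -1

i = get_index(0.0, grid)   # particle exactly on the lower mesh boundary
print(i)                   # -1
tally[i] += 1.0            # Python wraps -1 to the *last* bin, with no warning
print(tally)               # [0. 0. 1.]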

Future documentation improvements

Adding more documentation to various parts of MC/DC:

  • Doc-strings to all internal functions as well as the user API functions
  • Chapters on the documentation website that work through each one of the examples
  • Theory guide explaining the physics of our methods and algorithms better

This is planned work for the April 2024 hackathon. Future PRs should also include additions to docstrings and documentation when adding functions and features.

Installation issue

Hi MCDC team,
I'm one of the reviewers working on reviewing your JOSS submission openjournals/joss-reviews#6415.

Currently, following the instructions in the README and the Read the Docs webpage fails to install the package. Creating a Conda env, cloning the repo, and running bash install.sh gives:

CommandNotFoundError: No command 'conda mpi4py'.

I see from PRs #162 and #165 that it has been fixed but not merged into the main branch.
Can this be merged into the main branch so I can tick off the installation item on my review list?

Setup.py file has mcdc version as 0.1.0

We have a setup.py file that has mcdc's version as 0.1.0. Is this setup file still used for anything? It hasn't been updated in 6 months, so I'd assume not.
