
moment_kinetics's Introduction

Getting started

The full documentation is online at https://mabarnes.github.io/moment_kinetics.

Setup

First clone this git repository, e.g. (to clone it into a directory with the default name moment_kinetics)

$ git clone git@github.com:mabarnes/moment_kinetics

The command above assumes that you have an account on Github.com, and that account has ssh keys set up. If that is not the case you can clone using https instead

$ git clone https://github.com/mabarnes/moment_kinetics

When using https some things (e.g. pushing to the remote repository) may require you to use 2-factor authentication, see https://docs.github.com/en/get-started/getting-started-with-git/about-remote-repositories#cloning-with-https-urls.

!!! warning Do not download the zip-file from the Github.com page. This gives you the source code files but does not create a git repository. We get some version information from git when running the code, so without the git repository you will not be able to run a simulation.

  1. If you have already installed Julia, ensure that the Julia version is >= 1.9.0 by doing

    $ julia --version
    

    at command line. The setup script in step 2 can also download a Julia binary if you have not already installed Julia.

  2. If you are running on a desktop/laptop (rather than an HPC cluster) ensure that you have an MPI implementation installed (using whatever the usual way of installing software is on your system). It should not matter which MPI implementation you choose - openmpi is often a good choice if you have no reason to prefer a particular one. Check that the MPI compiler wrapper mpicc is available, e.g.

    $ mpicc --version
    

    should run without an error.

  3. Run the setup script

    $ machines/machine_setup.sh
    

    This script will prompt you for various options. The default choices should be sensible in most cases. On a laptop/desktop the 'name of machine to set up' will be 'generic-pc' and will set up for interactive use. On supported clusters, 'name of machine' will be the name of the cluster. On other clusters 'generic-batch' can be used, but requires some manual setup (see machines/generic-batch-template/README.md).

    For more information, see machine_setup notes.

    If you want or need to set up 'by hand' without using machines/machine_setup.sh, see Manual setup.

Some other notes that might sometimes be useful:

  • To speed up running scripts or the first call of run_moment_kinetics in a REPL session, it is possible to compile a 'system image' (moment_kinetics.so). Run

    $ julia --project -O3 precompile.jl
    

    and then start Julia with, for example,

    $ julia --project -O3 -Jmoment_kinetics.so 
    

    This significantly decreases the load time, but code changes will not take effect while moment_kinetics.so is used until you re-compile the system image. Note that this also prevents the Revise package from updating moment_kinetics when you edit the code during an interactive session.

    System images are created by default on HPC clusters, and are required to use the provided jobscript-*.template submission scripts (used by submit-run.sh and submit-restart.sh). This is to try and minimise the compilation that has to be replicated on all the (possibly thousands of) processes in a parallel run. After changing source code, you should run

    $ precompile-submit.sh
    

    (to re-compile the moment_kinetics.so system image).

  • In the course of development, it is sometimes helpful to upgrade the Julia version. Upgrading the version of Julia or upgrading packages may require a fresh installation of moment_kinetics. To make a fresh install with the latest package versions you should be able to just run

    pkg> update

    (to enter 'Package mode' enter ']' at the julia> prompt). It might sometimes be necessary or helpful to instead remove (or rename) the Manifest.toml file in the main directory, and re-run the setup from step 3) above. It can sometimes be necessary to remove or rename the .julia/ directory (located by default in your home directory) to force all the dependencies to be rebuilt.

  • When using the Plots-based post-processing library, one may have to set an environment variable to avoid error messages from the Qt library. If you execute the command

    $ julia --project run_post_processing.jl runs/your_run_dir/
    

    and see the error message

    qt.qpa.xcb: could not connect to display
    qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
    This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
    

    this can be suppressed by setting

    export QT_QPA_PLATFORM=offscreen
    

    in your .bashrc or .bash_profile files.

Run a simulation

To run julia with optimization, type

$ julia -O3 --project run_moment_kinetics.jl input.toml

Note that the middle character in -O3 is a capital letter 'O', not a zero. (On HPC clusters, or if you selected the "set up separate packages for post processing" option from machines/machine_setup.sh, you should use -O3 --check-bounds=no instead of just -O3, and the same in the Restarting section.)

Options are specified in a TOML file, e.g. input.toml here. The defaults are specified in moment_kinetics_input.jl.

  • To run in parallel, just put mpirun -np <n> in front of the call you would normally use, with <n> the number of processes to use (see the example after this list).
  • It may be more convenient when running moment_kinetics more than once to work from the Julia REPL, e.g.
    $ julia -O3 --project
    julia> using moment_kinetics
    julia> run_moment_kinetics("input.toml")
    
    where "input.toml" is the name of a TOML file containing the desired options. It is also possible to pass a Dict() containing any non-default options desired, which might sometimes be useful in tests or scripts
    julia> run_moment_kinetics(input)
    
    Especially when developing the code, a lot of compilation time can be saved by using Revise.jl, and re-running a test case in the REPL (without restarting julia) - this is enabled by default when setting up using machines/machine_setup.sh for 'generic-pc'.
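
For example, to run the command-line invocation above on 4 processes (the process count here is just an illustration)

$ mpirun -np 4 julia -O3 --project run_moment_kinetics.jl input.toml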

On an HPC cluster, you can submit a simulation (using the input file input.toml) to the batch queue using the convenience script

$ ./submit-run.sh input.toml

See the help text

$ ./submit-run.sh -h

for various command line options to change parameters (e.g. number of nodes, etc.).

If you need to rebuild the system images moment_kinetics.so and makie_postproc.so or plots_postproc.so because you have updated the code since they were built, it may be convenient to use

$ ./submit-precompile-and-run.sh input.toml

which will submit jobs for compilation, to run the simulation, and to do post processing. The simulation job will wait for the compilation job creating moment_kinetics.so to finish before starting. The post processing job will wait for the compilation job creating makie_postproc.so or plots_postproc.so to finish before starting.

Stopping a run

When running in the REPL (especially with MPI) interrupting a run using Ctrl-C can mess things up, and require you to restart Julia. There is also a chance that you might interrupt while writing the output files and corrupt them. To avoid these problems, you can stop the run cleanly (including writing the distribution functions at the last time point, so that it is possible to restart the run from where you stopped it), by creating an empty file called stop in the run directory. For example, if the name of your run is 'my_example'

$ touch runs/my_example/stop

moment_kinetics checks for this file when it is going to write output, and if it is present writes all output and then returns cleanly. The 'stop file' is deleted when a run is (re-)started, if present, so you do not have to manually delete it before (re-)starting the run again.

Restarting

To restart a simulation using input.toml from the last time point in the existing run directory,

$ julia -O3 --project run_moment_kinetics.jl --restart input.toml

or to restart from a specific output file - either from the same run or (if the settings are compatible, see below) a different one - here runs/example/example.dfns.h5

$ julia -O3 --project run_moment_kinetics.jl input.toml runs/example/example.dfns.h5

The output file must include distribution functions. When not using parallel I/O there will be multiple output files from different MPI ranks - any one of these can be passed.

To do the same from the Julia REPL

$ julia -O3 --project
julia> run_moment_kinetics("input.toml", restart=true)

or

julia> run_moment_kinetics("input.toml", restart="runs/example/example.dfns.h5")

When calling the run_moment_kinetics() function you can also choose a particular time index to restart from, e.g.

julia> run_moment_kinetics("input.toml", restart="runs/example/example.dfns.h5", restart_time_index=42)

On an HPC cluster, you can submit a restart (using the input file input.toml) to the batch queue using the convenience script

$ ./submit-restart.sh input.toml

or to restart from a particular output file

$ ./submit-restart.sh -r runs/example/example.dfns.h5 input.toml

See the help text

$ ./submit-restart.sh -h

for various other command line options to change parameters (e.g. number of nodes, etc.).

If you need to rebuild the system images moment_kinetics.so and makie_postproc.so or plots_postproc.so because you have updated the code since they were built, it may be convenient to use

$ ./submit-precompile-and-restart.sh [-r runs/example/example.dfns.h5] input.toml

which will submit jobs for compilation, to restart the simulation, and to do post processing. The simulation job will wait for the compilation job creating moment_kinetics.so to finish before starting. The post processing job will wait for the compilation job creating makie_postproc.so or plots_postproc.so to finish before starting.

It is possible to restart a run from another output file with different resolution settings or different moment-kinetic options. This is done by interpolating variables from the old run onto the new grid.

  • When interpolating in spatial dimensions it is not recommended to change the length of the domain.
  • For velocity space dimensions, changing the size of the domain should be OK. Points outside the original domain will be filled with $\propto \exp(-v^2)$ decreasing values.
  • When changing from 1D (no $r$-dimension) to 2D (with $r$-dimension), the interpolated values will be constant in $r$.
  • When changing from 1V to 2V or 3V, the interpolated values will be proportional to $\exp(-v_j^2)$ in the new dimension(s).

When running in parallel, both the old and the new grids must be compatible with the distributed-MPI parallelisation. When not using Parallel I/O, the distributed-MPI domain decomposition must be identical in the old and new runs (as each block only reads from a single file).

Post-processing with makie_post_processing

The default post-processing module, written to be a bit more generic and flexible than the original Plots-based one, and able to be used interactively, is provided in makie_post_processing, see Post processing.

On an HPC cluster, when you call ./submit-run.sh or ./submit-restart.sh, a job will (by default) be submitted to run makie_post_processing.makie_post_process or plots_post_processing.analyze_and_plot_data (depending on which you have set up, or on whether you pass the -o argument when both are set up) on the output after the run is finished. You can skip this by passing the -a argument to ./submit-run.sh or ./submit-restart.sh.

Original, Plots-based post processing quickstart

This post-processing functionality is now disabled by default, but you can enable it by entering y at the "Would you like to set up plots_post_processing?" prompt in machines/machine_setup.sh.

To make plots and calculate frequencies/growth rates, run

$ julia --project -O3 run_post_processing.jl runs/<directory to process>

passing the directory to process as a command line argument. Input options for post-processing can be specified in post_processing_input.jl. Note that even when running interactively, it is necessary to restart Julia after modifying post_processing_input.jl.

Post processing can be done for several directories at once using

$ julia --project -O3 post_processing_driver.jl runs/<directory1> runs/<directory2> ...

passing the directories to process as command line arguments. Optionally pass a number as the first argument to parallelise post processing of different directories.

Parallel I/O

To enable parallel I/O, HDF5.jl needs to be configured to use an HDF5 library which has MPI enabled and is compiled using the same MPI as you run Julia with. To ensure this happens, machines/machine_setup.sh will download the HDF5 source code and compile a local copy of the library under machines/artifacts, unless you enter n at the "Do you want to download, and compile a local version of HDF5" prompt (except on known HPC clusters where an MPI-enabled HDF5 is provided by a module - this is currently true on ARCHER2 - where the module-provided HDF5 is used).
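
One quick sanity check (not part of the setup script) is to ask HDF5.jl directly whether the library it is configured to use supports parallel I/O:

$ julia -O3 --project
julia> using HDF5
julia> HDF5.has_parallel()
true    # expected if the linked libhdf5 was built against the same MPI you run Julia with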

Running parameter scans

Parameter scans (see Parameter scans) can be performed by running

$ julia -O3 --project run_parameter_scan.jl path/to/scan/input.toml

If running a scan, it can be parallelised by passing the -p argument to julia, e.g. to run on 8 processes

$ julia -p 8 -O3 --project run_parameter_scan.jl path/to/scan/input.toml

Tests

There is a test suite in the test/ subdirectory. It can be run in a few ways:

  • Execute some or all of the tests as a script. For example in the terminal run
    $ julia -O3 --project moment_kinetics/test/runtests.jl
    
    or in the REPL run
    julia> include("moment_kinetics/test/runtests.jl")
    
    Individual test files can also be used instead of runtests.jl, which runs all the tests.
  • You can also run the tests using Pkg. Either using pkg> mode
    $ julia -O3 --project
    julia> <press ']' to enter pkg mode>
    (moment_kinetics) pkg> test moment_kinetics
    
    using Pkg in the REPL
    $ julia -O3 --project
    julia> import Pkg
    julia> Pkg.test("moment_kinetics")
    
    or run on the command line
    $ julia -O3 --project -e 'import Pkg; Pkg.test("moment_kinetics")'
    
    The downside of this method is that it will cause NCDatasets to be installed if you did not install it already, which might sometimes cause linking errors (related to the HDF5 library, see Optional dependencies).

By default the test suite should run fairly quickly (in a few minutes). To do so, it skips many cases. To run more comprehensive tests, you can activate the --long option:

  • In the REPL, run
    julia> push!(ARGS, "--long")
    
    before running the tests.
  • Running from the terminal, pass as a command line argument, e.g.
    $ julia -O3 --project moment_kinetics/test/runtests.jl --long
    
  • Using test_args argument
    julia> Pkg.test("moment_kinetics"; test_args=["--long"])
    
    Note the semicolon is necessary.

To get more output on what tests were successful, an option --verbose (or -v) can be passed in a similar way to --long (if any tests fail, the output is printed by default).


moment_kinetics's Issues

Gyroaverages

Gyroaverages may be crucial for providing a physical cut-off to the wavenumber spectrum required to be resolved in the moment_kinetics model.

Initial experiments implementing an ion gyroaverage are in the branch https://github.com/mabarnes/moment_kinetics/tree/feature-gyroaverages.

To test the feature, run https://github.com/mabarnes/moment_kinetics/blob/feature-gyroaverages/test_scripts/gyroaverage_test.jl.

Here the Lagrange Polynomial representation is used to precompute integration weights as a function of z and r, in place of the integral in gyrophase, which is only a line integral in (z,r) space. The test plots the field phi and the gyroaverage gphi for several values of vperp. Boundary conditions are not yet dealt with, meaning that the code will definitely produce poor results near the domain boundary. The code is written assuming fully local grids in memory.
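
Schematically (a sketch of the quadrature described above, not the exact implementation), the gyroaverage of the potential at fixed guiding-centre position and $v_\perp$ is

$$\langle \phi \rangle (z, r, v_\perp) = \frac{1}{2\pi} \oint \phi\bigl(z(\gamma), r(\gamma)\bigr)\, d\gamma \approx \sum_k w_k(z, r, v_\perp)\, \phi(z_k, r_k),$$

where $\gamma$ is the gyrophase, $(z(\gamma), r(\gamma))$ traces the gyro-orbit through the $(z,r)$ plane, and the weights $w_k$ are precomputed from the Lagrange polynomial representation of $\phi$ on the $(z,r)$ grid.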

To do list:

  • Implement sandbox-gyroaverage
  • Create a test for the gyroaverage feature
  • Determine good ideal behaviour for boundary regions
  • Implement good boundary behaviour
  • Implement the gyroaverage with shared-memory MPI
  • Consider how to efficiently implement the gyroaverage with distributed memory MPI
  • Consider how to leverage the sparse nature of the gyroaverage operator to speedup evaluation
  • Extend gyroaverage feature to main code
  • Compare to numerical dissipation

Comments appreciated @johnomotani @mabarnes.

Documentation/Implementation for output timing data

Currently the code has a TimerOutput dependency in moment_kinetics.jl; however, by default timing data does not seem to be output.

An enhancement therefore is to document or implement the output of timing data.

Default initial condition for charged pdf & wall boundary condition does not satisfy Chodura.

The default initial condition for the charged pdf, set at this line

for iz ∈ zrange
(in master), does not appear to satisfy the kinetic Chodura condition calculated in the diagnostic here:
function update_chodura_integral_species!(ff,dffdr,ff_dummy,vpa,vperp,z,r,composition,geometry,vz,dens,del_vpa)

This may be an issue for production simulations.

Reduce numbers of arguments to functions?

At the moment we have a lot of functions, especially in time_advance.jl with very long argument lists. That makes the function signatures hard to read, and makes maintenance more difficult.

One possibility could be to wrap things up into bigger structs so that there are only a few things to pass around. Not sure what the best choice is for what to include in a struct or what to name them, but as a first stab at this we could for example take the output of setup_time_advance() and put it all in a wrapper struct like

struct time_advance_state{
    T_moments,
    T_fields,
    T_vpa_advect,
    T_z_advect,
    T_r_advect,
    T_vpa_SL,
    T_z_SL,
    T_r_SL,
    T_scratch,
    T_advance,
    T_scratch_dummy_sr,
}
    moments::T_moments
    fields::T_fields
    vpa_advect::T_vpa_advect
    z_advect::T_z_advect
    r_advect::T_r_advect
    vpa_SL::T_vpa_SL
    z_SL::T_z_SL
    r_SL::T_r_SL
    scratch::T_scratch
    advance::T_advance
    scratch_dummy_sr::T_scratch_dummy_sr
end

Then setup_time_advance!() would end with

return time_advance_state(moments, fields, vpa_advect, z_advect, r_advect, vpa_SL, z_SL, r_SL,
                          scratch, advance, scratch_dummy_sr)

and we pass around that struct anywhere any of its contents is needed. The type-parameterized definition of time_advance_state should mean that every function where it's used is still type-stable and (I suspect though I'm not sure and would have to check performance after the change) getting a field from a struct should be inline-able so the change should have no effect on performance.
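
As a purely illustrative sketch of how a signature could then shrink (the argument lists below are invented for this example, not the actual current signatures):

# hypothetical 'before': many separate arguments
# time_advance!(pdf, scratch, t, t_input, moments, fields, vpa_advect, z_advect, r_advect,
#               vpa_SL, z_SL, r_SL, advance, scratch_dummy_sr, ...)

# hypothetical 'after': most of the state travels in the wrapper struct
function time_advance!(pdf, ta::time_advance_state, t, t_input)
    # contents are reached through the wrapper, e.g. ta.moments, ta.fields, ta.vpa_advect
    # (function body otherwise unchanged apart from the renamed references)
end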

Create docs page

@mabarnes could you please create a 'Github pages' page pointing to the gh-pages branch?

Super quick how-to:

  • go to Settings->Pages
  • Set the 'Source' to gh-pages from the drop-down menu
  • Pick any theme - doesn't matter which because whatever the theme does is overridden by Documenter.jl anyway
  • Click save

Then our docs should get published to https://mabarnes.github.io/moment_kinetics/ 😃

--package option not recognised when trying to run using new README.md instructions

I have tried running moment_kinetics with the latest version of the source using the command

julia -O3 --package -Jdependencies.so run_moment_kinetics.jl
as indicated in the README.md file. When I do so I get the error message
ERROR: unknown option --package

Not sure if there is something else I am meant to do (other than creating dependencies.so) before using the above command or if there is some other problem.

finite differences broken for anything other than periodic BC

The finite difference functions incorrectly assume information beyond the boundary domain at the downwind boundaries. This is true both for third-order upwinded and all centered differences. The upwind derivatives can be fixed simply by using an expression that does not need any downwind info. The centered derivatives are more of a pain because information will need to be passed to the function about which boundary is the downwind boundary, and the passing of such information is currently used to choose between upwind and centred differencing. For the moment I don't have time to fix this but thought I'd flag it here so we do not forget!
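
For reference, a standard second-order one-sided stencil of the kind that could replace the downwind evaluation (written for the last grid point $x_N$ on a uniform grid with spacing $\Delta x$; this is the generic formula, not the code's current implementation):

$$f'(x_N) \approx \frac{3 f_N - 4 f_{N-1} + f_{N-2}}{2\,\Delta x}.$$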

Resolving the phi and Ez near z = z_wall

We typically observe a diverging Ez = - d phi / dz as z -> z_wall. Analytical considerations lead us to expect phi ~ sqrt(z - z_wall) as z -> z_wall. Using a finite-element grid with uniformly spaced element boundaries is potentially a wasteful use of resolution when we require many points near the wall, and few in the centre of the domain.

An optimisation of the code which permits flexibility is to change the grid specification to allow non-uniformly spaced element boundaries.

Branch and PR where this is carried out will shortly be pushed.

Synchronisation using shared-memory formalism in moment kinetics

When generalising the Fokker-Planck collision operator test script, I need to convert a series of code blocks from supporting parallelism in vpa & vperp to covering the whole of z r s vpa and vperp. This is not trivial because the operations are not wholly embarrassingly parallel, but there are steps that are trivially parallelised which we should parallelise to get an optimally fast version of the operator.

For example, in the test script there is a block which has the following form

begin_vpa_vperp_region()
@loop_vperp_vpa ivperp ivpa begin
# do something parallelisable, 
# e.g., set sources for an elliptic solve
   S[ivpa,ivperp] = ... 
# recast S into a 1D array indexed by c
   S[ic] = ...
end 

begin_serial_region()
# synchronise with the above call and do the matrix solve
# in the compound index c
 field = lu_obj \ S

begin_vpa_vperp_region()
# get back to ivpa, ivperp
@loop_vperp_vpa ivperp ivpa begin
    result[ivpa,ivperp] = ....
end

The problem with adding z r and s is that we would ideally place the whole of the above inside a simple loop. None of the elliptic solves above are connected in z r and s and so they should be parallelised over, i.e., we would like the following.

begin_z_r_s_region()
@loop_s_r_z is ir iz begin
 # do stuff in velocity space
end

However, in the current shared memory formalism, it does not seem possible to call a begin_*_region() within a loop. This seems to mean that we cannot synchronise the array data whilst inside the z r s loop. Previously, I have avoided this difficulty by introducing buffer arrays to store the appropriate data. The downside is that one needs to store data of size nz*nr*ns*nvpa*nvperp. For the Fokker-Planck operator this leads to unacceptable memory usage, as we would need ~ 10 pdf sized buffer arrays.

Would it be possible to consider an upgrade to the shared-memory framework to permit synchronisation over a subset of the shared-memory block? In the example above, we would want to synchronise all of vpa and vperp at fixed iz ir is. I am sure that this is possible with normal distributed memory MPI with an MPI.Allreduce. Unfortunately, the present distributed-memory MPI only parallelises elements, and not points, meaning that we cannot have a single spatial point on the shared-memory process.

@johnomotani Discussion is appreciated!

Introduce magnetic mirror physics/ generalised geometry into moment_kinetics

To implement magnetic geometry we need to modify several features of moment_kinetics.

@johnomotani thoughts appreciated on order of developments.

Separate velocity integration test on 2D grids

The branch https://github.com/mabarnes/moment_kinetics/tree/radial-vperp-standard-DKE-Julia-1.7.2-mpi provides a test script for diagnosing the performance of the 2D velocity integration on a Maxwellian and a bi-Maxwellian distribution. See https://github.com/mabarnes/moment_kinetics/blob/radial-vperp-standard-DKE-Julia-1.7.2-mpi/test_velocity_integrals.jl.

This script is also useful for making sure that the definitions of the pressure are consistent with the definitions of the thermal speed and the temperature. I propose to add this script to the existing check-in tests.

Syntax for loop macros

I've made a branch (https://github.com/mabarnes/moment_kinetics/tree/loop-macros) that generates 'simple' macros for parallelized loops. I think the implementation is OK, but I think the names I've ended up giving the macros are not great, so before I make a PR to merge the branch, I thought I'd ask for opinions on what makes most sense to everyone.

There are 2 kinds of macro, one to loop over several dimensions at once, the other with separate macros for each nested level, e.g.

@loop_z_s is iz begin
     for ivpa ∈ 1:nvpa
         pdf.unnorm[ivpa,iz,is] = new_scratch.pdf[ivpa,iz,is]
     end
 end

and

@s_loop_z_s is begin
    derivative!(z.scratch, view(fvec.ppar,:,is), z, z_spectral)
    @z_loop_z_s iz begin
        @views advect[is].speed[:,iz] .= z.scratch[iz]/(fvec.density[iz,is]*moments.vth[iz,is])
    end
    derivative!(z.scratch, view(moments.qpar,:,is), z, z_spectral)
    @z_loop_z_s iz begin
        @views @. advect[is].speed[:,iz] += 0.5*vpa.grid*z.scratch[iz]/fvec.ppar[iz,is]
    end
end

Each combination of dimensions is split up among processors so that the load balance is optimal, and so results in a different 'type' of loop. The 'loop type' is labelled (at the moment) by the letters after loop_. For the second type of macro, the level of the loop is labelled by the prefix letter.

Questions:

  1. Should I swap the naming so that the 'loop type' comes before _loop, and the loop level is indicated by a suffix?
  2. When listing the dimensions being looped over, which order should they be in? I've gone for the same order as array indices, but that's the opposite order from the loop nesting (and, currently, the opposite of the loop variables after the macro in the first type of macro).
  3. When naming the loop variables (after the macro and before begin), which order should they be in? I've gone for the order of the nested loops at the moment.
  4. Should we use different/longer name(s) to make the distinction between the two types of loop macro clearer?

machine setup fails on macbook

I am running on a Mac, and in the machine setup options I obtain the following error when entering the Julia executable path.

Please enter the path to the Julia executable to be used ['/opt/local/bin/julia']:

/opt/local/bin/julia
realpath: illegal option -- s
usage: realpath [-q] [path ...]

I can install manually but would be nice to get the script working. Maybe I'm not understanding something I should be doing?

Better deallocation handling for shared-memory arrays?

When cleanup_moment_kinetics!() is called, the memory being used by shared-memory arrays is deallocated, but the arrays themselves are not deleted. Then accessing one of these shared-memory arrays causes a segfault which crashes Julia. It doesn't seem like there is a better solution for the deallocation, but maybe it's possible to provide nice error messages and avoid the segfaults? I.e. some error checking (using @boundscheck so it can be disabled for performance) that records if the shared-memory array is allocated and raises an error when trying to access it if it is not allocated.
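
A minimal sketch of the kind of check suggested above, assuming a wrapper type with a data array and an allocation flag (the type and field names here are invented; the real MPISharedArray/DebugMPISharedArray types differ):

mutable struct CheckedSharedArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    is_allocated::Bool
end

Base.size(A::CheckedSharedArray) = size(A.data)

Base.@propagate_inbounds function Base.getindex(A::CheckedSharedArray, I...)
    # elided when bounds checking is disabled (e.g. --check-bounds=no), so no cost in production runs
    @boundscheck A.is_allocated ||
        error("shared-memory array accessed after cleanup_moment_kinetics!() freed its memory")
    return A.data[I...]
end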

Interactivity in Makie plots

(This assumes #105 gets merged.)

If you switch the backend to GLMakie

julia> using GLMakie; GLMakie.activate!()

Then calling plot functions, e.g. plot_vs_z(run_info, "phi") will pop up a plot window that you can interact with (zoom, pan, etc.). There are no icons or hints on how to interact, and I can't find any docs at the moment except this issue MakieOrg/Makie.jl#827, but it does work. This process could do with being documented. Also, do we want to add GLMakie to the Project.toml dependencies? Not sure if it would increase the precompile time. If we don't it can be installed in the base environment and still be available when running julia --project.

Makie has an option to create a DataInspector (https://docs.makie.org/stable/documentation/inspector/index.html) that shows data/coordinate values when you hover over a line. It would be nice to add these by default (if GLMakie is active - need to find out how to test that).

At the moment though the DataInspector is broken with heatmap plots (MakieOrg/Makie.jl#3101). Fingers crossed that bug will be fixed by the time we get around to implementing this.
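
A minimal sketch of the interactive workflow described above (assuming GLMakie and makie_post_processing are installed; run_info stands for whatever the post-processing setup returns):

julia> using GLMakie; GLMakie.activate!()
julia> fig = plot_vs_z(run_info, "phi")
julia> DataInspector(fig)   # hover over the data to see values (currently broken for heatmaps, as noted above)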

Clarify the magnetic geometry.

The model used in moment_kinetics is derived in the limit bzed << 1 and rho* << 1, in a magnetic geometry where \vec{b} = bzed \hat{z} + bzeta \hat{\zeta}. As such the equations that we implement do not explicitly feature bzeta ~ 1. This may be confusing when working with relations for the physics at the sheath entrance, which often depend on the very small grazing angle and the sign of bzeta. It may be useful to insert bzeta into the appropriate places in the code to make the origin of the terms more transparent.

Parallelising and loop performance

I have been looking at documentation for Julia and some packages for thread-based parallelism and other loop optimisations. Opening this issue to keep notes...

  • Just sticking the @threads macro on a loop seems to work pretty well (test case and results posted in a comment below; a generic sketch of this kind of loop is shown after this list).
    • variations on putting the work in a function or just writing it out in full seem to make minimal difference
  • I didn't find a nice solution for getting threading with array broadcasting. This thread JuliaLang/julia#19777 is a bit old unfortunately. Suggestions from it:
    • GPUArrays.jl. Might be interesting in the long run for actual GPU usage. At one point it had a non-GPU implementation that used threads, but that has been removed from the main repo (they still use it for testing).
    • Strided.jl. I haven't understood what it's really for yet - some kind of optimisation of array operations. Does support threading of broadcast operations, but for a trivial test case is slower than just using a threaded loop (as this is not its main purpose). Maybe worth looking at again and testing with multi-dimensional array operations.
  • This comment Jutho/Strided.jl#9 (comment) suggested looking at LoopVectorization.jl. This package is aimed at optimising loops using things like AVX instructions. I ran into a bug trying to use a tan function - ERROR: LoadError: UndefVarError: tan_fast not defined - and for a trivial test (similar to the one in the comment below but swapping exp in place of tan) didn't get better performance than just using @threads.
  • According to the README.md of LoopVectorization.jl https://github.com/JuliaSIMD/LoopVectorization.jl/blob/master/README.md#broadcasting

    Note that loops will be faster than broadcasting in general. This is because the behavior of broadcasts is determined by runtime information (i.e., dimensions other than the leading dimension of size 1 will be broadcasted; it is not known which these will be at compile time).
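
For context (as mentioned in the first bullet above), the kind of trivial threaded-loop test being referred to looks something like the following generic sketch, which is not the actual benchmark from the linked comment:

using Base.Threads

function threaded_exp!(out, x)
    # each thread handles a chunk of the index range
    @threads for i in eachindex(out, x)
        out[i] = exp(x[i])
    end
    return out
end

x = rand(10^7)
out = similar(x)
threaded_exp!(out, x)   # start julia with e.g. `julia -t 4` to use 4 threads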

2D code appears to hang without error when running on HPC

Using the branch radial-vperp-standard-DKE-with-neutrals-pdf-animations (commit da276a4) for simulations on an HPC cluster, I noted that the simulation appeared to hang after a relatively small number of timesteps, with no error messages. Using the new screen printout that shows the time when the last write to file was completed, I could see that the simulation failed to advance by the next nwrite steps even after many multiples of the time those steps should have taken. The simulation could only be cancelled.

e.g.

 beginning time advance...14:27:41
 finished time step 100  14:32:35
 finished time step 200  14:37:25

But the simulation ran for another hour without further output of any kind.

Global variables for dimension numbers

Looking around one of the latest versions of master, one finds various structure definitions containing arrays with integer numbers of dimensions, like below, e.g.,

from: initial_conditions.jl

struct pdf_struct
    norm::MPISharedArray{mk_float,3}
    unnorm::MPISharedArray{mk_float,3}
end

It is not necessarily clear what these integers refer to if one is not already familiar with Julia Syntax. I think that the readability and longevity of the code could be improved if these integers were replaced with constant global variables set in a new module file. There already exists code, for example, below, where the structure has a much more transparent meaning. I see no reason why this could not be extended.

from: time_advance.jl

scratch = Vector{scratch_pdf{3,2}}(undef, n_rk_stages+1)

struct scratch_pdf{n_distribution, n_moment}
    pdf::MPISharedArray{mk_float, n_distribution}
    density::MPISharedArray{mk_float, n_moment}
    upar::MPISharedArray{mk_float, n_moment}
    ppar::MPISharedArray{mk_float, n_moment}
    temp_z_s::MPISharedArray{mk_float, n_moment}
end

Bearing in mind that we will later need to increase the dimensionality of the code beyond (v||,z,r) to (v||,vperp,z,r,zeta) for charged particles and (v||,vx,vy,z,r,zeta) for neutral particles, making transparent code showing where the dimensions enter as clearly as possible seems to be quite important. If we are required to duplicate the number of layouts to handle charged particles and neutral particles separately, it would make sense to reduce the number of times we write "3" for spatial dimensions or "2" (or "3") for velocity dimensions.
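
A minimal sketch of the suggestion (the module and constant names are invented and would need to be agreed on; the values are chosen to match the scratch_pdf{3,2} example above):

module array_dims
    "number of dimensions of distribution-function arrays"
    const n_distribution = 3
    "number of dimensions of moment arrays"
    const n_moment = 2
end

# pdf_struct from initial_conditions.jl could then be written as
struct pdf_struct
    norm::MPISharedArray{mk_float, array_dims.n_distribution}
    unnorm::MPISharedArray{mk_float, array_dims.n_distribution}
end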

Convergence in time diagnostic

The current post_processing.jl script does not provide a global test of the convergence of the simulations in time. The best option is to look at the heatmaps of z or r vs t, but these do not neatly capture the entire solution. It may be useful to introduce a plot of the volume-averaged moments with time as a way of quickly determining if the simulation is close to steady state.
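
A rough sketch of such a diagnostic, assuming the density has been loaded as an array density[z, r, t] together with the integration weights z_wgts and r_wgts of the spatial grids (these variable names are assumptions, not the actual post_processing.jl interface):

function volume_average_vs_time(density, z_wgts, r_wgts)
    volume = sum(z_wgts) * sum(r_wgts)
    nt = size(density, 3)
    # volume-averaged density at each output time
    return [sum(density[:, :, it] .* (z_wgts .* r_wgts')) / volume for it in 1:nt]
end

The result could then be plotted against time, and similarly for upar and ppar, to give a quick global check of whether the run is close to steady state.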

Swap order of z and vpa dimensions?

Was there a reason for choosing to make the z dimension the fastest-changing index ('inner' dimension) in the distribution function arrays, rather than vpa? In the places I'm currently looking at, because we have to integrate over vpa and do operations like normalise f[iz,:,is] by density[iz,is], etc., it looks like it would be more efficient to have vpa as the fastest-changing index (contiguous in memory).

I'm currently doing some profiling and trying to fix the extra allocations that julia does if you're not very careful, and I think in some places this would be simplified (as well as more efficient) by swapping z and vpa dimensions.

@mabarnes unless there are reasons why this is a bad idea, I'll have a go at implementing, and I'll take responsibility for fixing any merge conflicts with stuff you've been working on...

Satisfaction of the Chodura/Bohm condition

Using the branch https://github.com/mabarnes/moment_kinetics/tree/wall-boundary-condition-experiment I have experimented with changes to the wall boundary condition to test how sensitive the PDF at sheath entrance is to this condition.

The nominal wall boundary condition is that no particles should return from the wall, meaning that f = 0 for dzdt <0 at the lower wall plate, and f=0 for dzdt > 0 at the upper wall plate. However, due to the huge electric field generated in the steady state sheath solutions there are particle characteristics that would reverse direction before even reaching the first grid point in space away from the wall. One modification to the boundary condition (justified in a purely Lagrangian code) could be to set to zero the parts of the incoming f that should be filled by these characteristics. This is the experiment made in this branch.

Plots with the modified BC
modified_wall_BC.zip
Note that the Chodura integral is ~ 0.9, meaning that the Chodura/Bohm condition is over satisfied. The distribution function over vpa^2 is smooth and has the "exponentially" small region enforced to be zero near the origin.

Plots with the standard BC:
standard_wall_BC.zip
Note that the Chodura integral is ~ 1.20, meaning that the Chodura/Bohm condition is not satisfied. This is due to an unphysical spike near the origin.

Clearly the choice of where to set the incoming PDF to zero is somewhat arbitrary, which means that this experiment just shows how hard it is to resolve this physics in an Eulerian code. Note that the switch epsz that differentiates continuously between the standard BC (epsz = 0) and the modified BC (epsz = 1.0) is a hardcoded parameter in the current revision of the experiment branch. It may be that there is a tunable value of epsz that corresponds to zeroing out the part of the incoming f for particles that make it a certain fraction of the grid spacing towards the first point away from the wall.

@johnomotani Perhaps of interest as a quick project to scan over epsz if it can be made an input parameter?

Broken tests in master

Some of the tests provided in master return as broken. What does this indicate?

Looking at the current master branch (using git log)

commit 74e16e9 (HEAD -> master, origin/master, origin/HEAD)

we can run the tests via the REPL, to find

using moment_kinetics
include("test/runtests.jl")
calculus tests
interpolation tests
sound wave tests
- testing finite_difference
- testing finite_difference_split_3_moments
- testing chebyshev_pseudospectral
- testing chebyshev_pseudospectral_split_1_moment
- testing chebyshev_pseudospectral_split_2_moments
- testing chebyshev_pseudospectral_split_3_moments
nonlinear sound wave tests
- testing finite_difference
- testing finite_difference_split_1_moment
- testing chebyshev_pseudospectral
- testing chebyshev_pseudospectral_split_1_moment
Harrison-Thompson wall boundary condition tests
- testing chebyshev_pseudospectral
Wall boundary condition tests
- testing chebyshev_pseudospectral
Test Summary: | Pass Broken Total
moment_kinetics tests | 29496 6 29502

Note the 6 broken tests.

error downloading/installing dependencies after PR 39

Just pulled the latest Master (post-PR39 merge) and tried pkg> instantiate, but got the error message "ERROR: TypeError: in typeassert, expected VersionNumber, got a value of type Pkg.Types.VersionSpec". The stack trace for the error does not look very helpful to me, with pointers to Julia's own files. I do not receive this error if running instantiate with the pre-PR39 version of Master.

Simplify/future-proof parallel loops

Looking ahead, implementing MPI domain decomposition might require moving some loops inside generic advection functions. Also as we add more dimensions, the current structure (in #34) where we need to get the correct index-range for each level of the loop will get more complicated and fragile.

This issue is to discuss possible improvements that would make the code more maintainable (for example if we want to re-order array dimensions in future without it being as painful as #22 was).

NamedDims.jl

I think using NamedDims.jl (https://github.com/invenia/NamedDims.jl) would be a good start:

NamedDimsArray is a zero-cost abstraction to add names to the dimensions of an array.

It would let us index arrays in a more readable way, and because names are used the order of indices doesn't matter! E.g.

julia> using NamedDims

julia> f = NamedDimsArray{(:z,:y,:x)}(rand(4,3,2));

julia> f[x=2,z=3] # this works, giving a slice
3-element NamedDimsArray{(:y,), Float64, 1, Vector{Float64}}:
 0.8249968213028751
 0.9973308654206787
 0.2462796811020469

julia> f[z=3,x=2] # this works too - index order doesn't matter!
3-element NamedDimsArray{(:y,), Float64, 1, Vector{Float64}}:
 0.8249968213028751
 0.9973308654206787
 0.2462796811020469

Where to keep loop ranges?

It would be useful to have just one struct containing all the loop ranges. One way, keeping the choice from #34 to put the ranges in coordinate structs, would be to create a wrapper coordinates struct containing all the coordinates. An alternative would be to take the loop ranges out of coordinate and put them into some struct of their own. I think the second option might be the best because as we go to more dimensions, I think we will end up with a large number of ranges so it would be best to hide them away somewhere.

loop macros?

The final part, which I'm not so sure about a good design for, would be to design a macro (?) to create for-loops. I'd like to replace something like

for is in loop_ranges.species_range_level0,
        ix in loop_ranges.x_range_level1,
        iy in loop_ranges.y_range_level2,
        iz in loop_ranges.z_range_level3,
        ivpa in loop_ranges.vpa_range_level4
    @views update_f1!(pdf[vpa=ivpa,z=iz,y=iy,x=ix], ...)
end
block_synchronize()
for is in loop_ranges.species_range_level0,
        ix in loop_ranges.x_range_level1,
        iy in loop_ranges.y_range_level2,
        ivpa in loop_ranges.vpa_range_level3,
        ivperp in loop_ranges.vperp_range_level4
    @views update_f2!(pdf[vperp=ivperp,vpa=ivpa,y=iy,x=ix], ...)
end

with maybe

@loop_s_x_y_z_vpa is ix iy iz ivpa begin
    @views update_f1!(pdf[vpa=ivpa,z=iz,y=iy,x=ix], ...)
end
block_synchronize()
@loop_s_x_y_vpa_vperp is ix iy ivpa ivperp begin
    @views update_f2!(pdf[vperp=ivperp,vpa=ivpa,y=iy,x=ix], ...)
end

Questions I have:

  • is it better to pass loop_ranges (or equivalent) as a parameter to the macro, or keep it as a 'global' (within some module namespace) variable. I'm leaning towards taking the loop ranges out of coordinate, putting them in a dedicated struct like loop_ranges, and then make that a pseudo-global because it will be constant for the whole of any run, and it will be much simpler not to have to pass it around into every function.
  • Do we pass the loop variables as parameters to the macro (as in the example above) or hard code them? I have slight doubts about how clear things are if we do re-order array dimensions, but once we're using NamedDims.jl we can think of the layout in memory as an implementation detail, and write code just using a fixed, logical order (like {species,x,y,z,vpa,vperp}). If we 'hard code' the loop variables, there is less to type but the loop variables appear 'by magic' out of the macro (which I don't like much), i.e. the loop would look like
    @loop_s_x_y_z_vpa begin
        @views update_f1!(pdf[vpa=ivpa,z=iz,y=iy,x=ix], ...)
    end
    

Fancier generic looping?

I'd originally hoped we'd be able to do something generic like make a derivative!() function that takes a dim argument and then internally loops over all dimensions except for dim while calculating a derivative along dim. Still thinking about if this is possible...

One thing that doesn't work is to get the dimensions from an array in a macro and operate on them, because the dimensions are encoded in the type of the NamedDimsArray, but macros only operate on symbols and do not have access to type information, so this doesn't work.

It might be possible to just hard-code a list of dimension names and have a macro select the appropriate loop (like @loop_s_x_y_z_vpa, etc.) using a dim parameter.

Another possibility might be to somehow use dynamic dispatch and pass in the 'kernel function' that should be called inside the loop. Something like (for inner loop over z)

function multidim_loop(inner_dim::Val{:z}, kernel::Function, args...)
    @loop_s_x_y_vpa_vperp is ix iy ivpa ivperp begin
        kernel((s=is, x=ix, y=iy, vpa=ivpa, vperp=ivperp), args...)
    end
end

and duplicates (possibly auto-generated) for inner loops being other dimensions. Not sure it's possible to pass the indices into the kernel like this (it seems not to be supported by NamedDims.jl natively at the moment; not sure if it's an easy function to add with a new version of to_index(), or if it's bad because it wouldn't compile away, so we would have to pay the cost of creating a NamedTuple to pass into kernel() at every step of the loop...). Might be possible to assume more about the structure of the arguments to kernel, and do something like

function multidim_loop(inner_dim::Val{:z}, kernel::Function, f::AbstractArray, args...)
    @loop_s_x_y_vpa_vperp is ix iy ivpa ivperp begin
        kernel(f[s=is, x=ix, y=iy, vpa=ivpa, vperp=ivperp], args...)
    end
end

MPI issues

We get MPI errors if we run in parallel with the latest version of all the dependencies (as of 19/11/2023).

I think the problem is a slightly complicated mixup, that will probably be fixed in the Julia packages relatively soon:

  • The HDF5_jll.jl package (which bundles a version of HDF5 with Julia) has started supporting MPI (I think just if MPI.jl is being used, but that is always the case for us). However, it always links to the Julia-installed MPI (not the 'system-provided' one, which is what we usually use). Linking to two different MPI libraries causes errors. This is a bug in the Julia packages (not in moment_kinetics), see JuliaIO/HDF5.jl#1079, JuliaPackaging/Yggdrasil#6893.
  • For HDF5.jl the bug isn't actually a problem, because we tell HDF5 to use a 'system-provided' HDF5 library, which is (at least should be!) linked with the correct MPI library. This means that HDF5.jl doesn't actually use HDF5_jll.jl, so we wouldn't be affected by the bug, except that...
  • The NetCDF wrapper NCDatasets.jl doesn't support MPI, and does link to the HDF5 provided by HDF5_jll.jl. Previously this wasn't (or at least didn't seem to be) a problem - apparently linking two versions of libhdf5.so that are used in different places is OK - but now that HDF5_jll.jl links to MPI, it means linking two versions of the MPI library, which causes errors.

Possible workarounds:

  1. Wait for the Julia packages to be fixed, then the problem should go away.
  2. Pin HDF5_jll to a slightly older, working version (i.e. version 1.12.x) at least until the Julia packages are fixed.
  3. Get rid of the NetCDF file I/O.
  4. Tell NCDatasets to use a system-provided libnetcdf.so, so it doesn't link to the HDF5_jll.jl version of HDF5. On systems where we have to compile HDF5 for ourselves, this would be annoying as we would have to compile NetCDF as well, and link it to the local version of HDF5.

I think option 2 is the best and easiest solution, while we wait for a fix to JuliaPackaging/Yggdrasil#6893. I think it's possible to tell Julia to pin a package to a certain version, rather than everyone having to do it by hand (and even if we did it 'by hand' the CI jobs would have to do the same thing, which would probably be more work than pinning a package). I'll try to make a PR...

Krook source term for sheath simulation "boundary condition"

In sheath simulations in moment_kinetics we have two wall boundaries at z = +-L/2. We prescribe a source in the volume of the domain to fuel the ions. It may be useful to prescribe a source which forces the solution towards a specific function in a small range of the plasma. For example

S_krook[F_imposed,F] = nu * ( F_imposed - F),

with F_imposed = a Maxwellian with a specified and fixed density, parallel flow and thermal speed, and nu a frequency that may (or should) depend on spatial coordinates. For example, we might use the source only near z = 0.

This is potentially desirable in simulations with the self-collision operator C[F,F], because it is plausible that we require that there is no source term proportional to the null space of C[F,F] for a steady state to be reached. F = F_imposed would be a solution for which C = S_krook = 0.
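
Written out, a sketch of the proposed term in 1V (the Maxwellian form and the spatial localisation of $\nu$ below are illustrative choices, not a fixed design):

$$S_{\mathrm{krook}} = \nu(z)\left(F_{\mathrm{imposed}} - F\right), \qquad F_{\mathrm{imposed}} = \frac{n_0}{\sqrt{\pi}\, v_{\mathrm{th},0}} \exp\!\left(-\frac{(v_\parallel - u_{\parallel,0})^2}{v_{\mathrm{th},0}^2}\right),$$

with, for example, $\nu(z) = \nu_0 \exp\left(-(z/w)^2\right)$ so that the forcing acts only near $z = 0$.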

@johnomotani Does such a source term already exist in the moment kinetics code?

Factors of 2 in normalisation of pressure and definition of the thermal speed.

In the initial development of the 1D1V moment_kinetics models, the (parallel) pressure ppar was defined such that the thermal speed vth was calculated in the code like

vth = sqrt(2*ppar/density)

with density the species density. At the same time, the velocities were normalised to a reference speed cref = sqrt(2 Tref /mref). Whilst self-consistent for a certain normalisation of the pressure, the above convention leads to a pressure that is 1/2 of the conventionally normalised pressure. Typically, we would normalise the pressure to nref Tref = (1/2) mref nref cref^2, where nref is a reference density. In this normalisation

vth = sqrt(ppar/density)

The branch https://github.com/mabarnes/moment_kinetics/tree/radial-vperp-standard-DKE-Julia-1.7.2-mpi currently has all pressures defined in the latter convention, whereas the master branch currently uses the former convention. Developers beware!

N.B. This discussion assumes that all masses are unity.
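
Restating the two conventions with the reference quantities written out (unit masses, velocities normalised to $c_{\mathrm{ref}} = \sqrt{2 T_{\mathrm{ref}}/m_{\mathrm{ref}}}$):

$$\text{old:}\quad \hat{p}_\parallel = \frac{p_\parallel}{m_{\mathrm{ref}} n_{\mathrm{ref}} c_{\mathrm{ref}}^2} \;\Rightarrow\; v_{\mathrm{th}} = \sqrt{\frac{2\hat{p}_\parallel}{\hat{n}}}, \qquad \text{new:}\quad \hat{p}_\parallel = \frac{p_\parallel}{n_{\mathrm{ref}} T_{\mathrm{ref}}} \;\Rightarrow\; v_{\mathrm{th}} = \sqrt{\frac{\hat{p}_\parallel}{\hat{n}}},$$

so the 'old' normalised pressure is half of the 'new' one.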

To Do: agree to a standard normalisation convention that are natural for both 1V and 2V simulations (where pperp and ppar are present).

dependency error during instantiation

After typing 'instantiate' during the build, I receive a message that '1 dependency errored. To see a full report either run import Pkg; Pkg.precompile() or load the package'. When I do this, I get the more useful error message 'ERROR: LoadError: InitError: MPI library has changed, please re-run Pkg.build("MPI").'. After building MPI again, instantiate no longer returns an error and the code seems to run fine.

Trying to understand manual setup for new split package version of moment kinetics

Using https://github.com/mabarnes/moment_kinetics/tree/geometry-upgrade-separate-packages with Julia 1.10 I am trying to understand the manual set up steps on a Linux machine to understand the machine setup scripts.

I fall down at the first hurdle https://mabarnes.github.io/moment_kinetics/previews/PR174/manual_setup/#Manual-setup

pkg> develop ./moment_kinetics/
ERROR: MethodError: no method matching get(::Pair{String,Any}, ::String, ::Nothing)
Closest candidates are:
  get(::Base.EnvDict, ::AbstractString, ::Any) at env.jl:77
  get(::REPL.Terminals.TTYTerminal, ::Any, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/Terminals.jl:176
  get(::IdDict{K,V}, ::Any, ::Any) where {K, V} at abstractdict.jl:594

Is this a problem with Julia 1.1.0, or have I broken something locally in the repo?

Diagnosis of 2D wall BC MMS test issue

The MMS test suite is designed to allow for rapid development of new features using a symbolic algebra package to compute the sources in the DKE given a 'target' manufactured solution. For the 2D-3V version of the code with E_r =/= 0 and wall boundary conditions, the test has identified a possible issue of non-convergence with increasing resolution.

Here I start to try to track down the error. I start with commit ecfa0f8. Running the input files
runs/2D-wall_cheb-with-neutrals_nel_r_1_z_2_vpa_2_vperp_2
runs/2D-wall_cheb-with-neutrals_nel_r_1_z_4_vpa_4_vperp_4
runs/2D-wall_cheb-with-neutrals_nel_r_1_z_8_vpa_8_vperp_8
runs/2D-wall_cheb-with-neutrals_nel_r_1_z_16_vpa_16_vperp_16
where I increase the number of elements of all the z, v||, vperp, vz, vr, vzeta grids (for a single r point with E_r = 0) and reduce the time step in tandem, I generate the following figures (with run_MMS_test.jl).
MMS_test_velocity_nelement_1D-3V-wall_cheb-updated.pdf
MMS_test_velocity_nelement_1D-3V-wall_cheb-updated_fields_no_Er.pdf

These figures demonstrate declining errors in all fields, approaching machine precision. The error in the neutral density remains surprisingly high.

NCDataset incompatibility

After leaving the source code for a period, I have rebuilt from scratch the branch radial-vperp-standard-DKE-with-neutrals on commit 19ba012 , using Julia/1.7.2

I have found an issue where I am unable to run the code if I build a new Manifest.toml, whereas if I use the old Manifest that is contained within the repo, the code runs fine without complaint.

$ git clone https://github.com/mabarnes/moment_kinetics.git moment_kinetics
$ cd moment_kinetics/; git checkout radial-vperp-standard-DKE-with-neutrals
$ mv Manifest.toml Manifest_old.toml
$ julia --project

(moment_kinetics) pkg> instantiate

julia> using Pkg

julia> Pkg.test(; test_args=["--long"])

Leads to the following error

Got exception outside of a @test
MethodError: Cannot convert an object of type
NCDatasets.CFVariable{Float64, 1, NCDatasets.Variable{Float64, 1, NCDatasets.NCDataset{Nothing}}, NCDatasets.Attributes{NCDatasets.NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Nothing, Tuple{}, Nothing, Nothing, Nothing, Nothing, Nothing}}} to an object of type
Union{NCDatasets.CFVariable{Float64, 1, NCDatasets.Variable{Float64, 1, NCDatasets.NCDataset}, NCDatasets.Attributes{NCDatasets.NCDataset{Nothing}}, NamedTuple{(:fillvalue, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), NTuple{6, Nothing}}}, NCDatasets.CFVariable{Float64, 1, NCDatasets.Variable{Float64, 1, NCDatasets.NCDataset{Nothing}}, NCDatasets.Attributes{NCDatasets.NCDataset{Nothing}}, NamedTuple{(:fillvalue, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), NTuple{6, Nothing}}}}

All MPI controlled processes do not exit with error on a single process

When running with MPI, I have noticed that if there is an error that causes Julia to abort the calculation on a single process, then the program does not exit all processes. Instead, the MPI processes which have not met an error remain stalled. This may be dependent on the architecture on which we run the code.

sphinx documentation?

README.md is getting a bit long now. It might be nice to set up better documentation. I've used sphinx plus readthedocs.io before to generate online docs, and I think they work with Julia. Anyone have other preferences?

Reshape `DebugMPISharedArray`

If we want to use reshape() to reshape arrays in the collision operator (as an optimization to replace a copy from a vperp-vpa indexed array to a flattened-index array), then we would need to be able to reshape a DebugMPISharedArray in order for the debug checks to work. For this we would need to define reshape(A::DebugMPISharedArray, ...).

Manufactured solutions testing with nontrivial geometry

To perform MMS tests when using general geometry, it makes sense to upgrade the geometric coefficients in the manufactured_solns.jl module to symbolic functions, and to house these symbolic functions in a struct which can be passed around like the old geometry struct within this module.

Is it possible to make a struct of symbolic functions in julia?

@johnomotani Thoughts? What would be your preferred way to tackle this problem?
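
For what it is worth, a plain struct holding symbolic expressions does work, e.g. with Symbolics.jl (a generic sketch; the field names and the example coefficients below are invented and not tied to the existing geometry struct):

using Symbolics

@variables z r

# a container for symbolic geometric coefficients; field names are illustrative
struct geometry_symbolic
    bzed
    bzeta
    Bmag
end

# arbitrary illustrative coefficients, symbolic in z and r
geo_sym = geometry_symbolic(0.1 + 0.05*cos(2*π*r), sqrt(1 - (0.1 + 0.05*cos(2*π*r))^2), 1 + 0*z)

# the symbolic expressions can then be differentiated when building manufactured sources, e.g.
dbzed_dr = Symbolics.derivative(geo_sym.bzed, r)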

vperp BC for chebyshev pseudospectral option

The vperp regularity boundary condition at vperp = 0 is dFdvperp = 0. This is imposed in the "gausslegendre_pseudospectral" option by computing the derivative vector for vperp = 0 -- note that this matrix is independent of the normal derivative matrix for the element near vperp =0, which uses a Gauss-Radau grid without vperp =0. The same derivative vector for d F / dvperp = 0 is required for the chebyshev radau grid.

File i/o for dfn in 2D code and beyond

Currently we write all time-dependent data (fields, moments, dfn) to file every nwrite timesteps. This is OK for the 1D-1V code as the distribution functions are manageable in size (nvpa*nz). For the 2D-3V code, the neutral distribution function is significantly larger (nz*nr*nvz*nvr*nvzeta), as is the ion distribution function (nz*nr*nvpa*nvperp). We need to decide how to write distribution function data to file in the future.

It would be desirable to

  1. write slices of the dfn as a function of time (e.g. f(z,v||) at fixed r, or f(r,v||) at the wall)
  2. keep the whole distribution for restart files
  3. write out the distribution less frequently than every nwrite timesteps, but more than once.
  4. have a separate .cdf file for distribution function data

This would require reworking the post processing diagnostics and file i/o routines, and so should not be undertaken lightly without a plan.

@mabarnes @johnomotani Do you have preferred solutions or features that we should keep or add going forward? I have in mind that we should make any changes now that make life easy for us to go 3D in the future.

Precompilation timeouts on cluster systems

This issue is a reminder that on some HPC systems the precompilation step can take in excess of 1 hr. Whilst this may not be due to choices made within the moment_kinetics package, it does represent a block on development of large-scale simulations. Reducing the time to compile the main code may be a useful feature.

Distributed memory bug affecting 2D simulations

Using commit 3b3af85 of https://github.com/mabarnes/moment_kinetics/tree/generalised-chodura-diagnostic, I observe behaviour on a HPC cluster that suggests a bug in the distributed memory MPI. Input files for the tests described are in the attached .zip file.

For simulation wall-bc_cheb_2D1V_constant_source_small, the solver appears to crash with NaNs in the output:

$ tail wall-bc_cheb_2D1V_constant_source_small/slurm-4552646.out
Found NaN, stopping simulation
Found NaN, stopping simulation
Found NaN, stopping simulation
Found NaN, stopping simulation
Found NaN, stopping simulation
Found NaN, stopping simulation
finished time step 33000    11:44:56
writing distribution functions at step 33000  11:44:56
finished file io         11:44:57
finished runs/wall-bc_cheb_2D1V_constant_source_small.toml Thu 28 Sep 11:45:01 BST 2023

Making the simulation local to a single shared memory region appears to let the simulation run without fault:

$ tail wall-bc_cheb_2D1V_constant_source_local_small/slurm-4552596.out
finished time step 397000   14:34:18
writing distribution functions at step 397000  14:34:18
finished time step 398000   14:34:45
writing distribution functions at step 398000  14:34:45
finished time step 399000   14:35:12
writing distribution functions at step 399000  14:35:12
finished time step 400000   14:35:39
writing distribution functions at step 400000  14:35:39
finished file io         14:35:59
finished runs/wall-bc_cheb_2D1V_constant_source_local_small.toml Thu 28 Sep 14:36:25 BST 2023

Parallelising over the z dimension also appears to be safe:

$ tail wall-bc_cheb_2D1V_constant_source_smallz/slurm-4552856.out
finished time step 397000   14:49:57
writing distribution functions at step 397000  14:49:57
finished time step 398000   14:50:08
writing distribution functions at step 398000  14:50:08
finished time step 399000   14:50:18
writing distribution functions at step 399000  14:50:18
finished time step 400000   14:50:29
writing distribution functions at step 400000  14:50:29
finished file io         14:50:31
finished runs/wall-bc_cheb_2D1V_constant_source_smallz.toml Thu 28 Sep 14:50:46 BST 2023

Whereas parallelising over r leads to issues:

$ tail wall-bc_cheb_2D1V_constant_source_smallr/slurm-4552853.out
Found NaN, stopping simulation
Found NaN, stopping simulation
Found NaN, stopping simulation
Found NaN, stopping simulation
Found NaN, stopping simulation
Found NaN, stopping simulation
finished time step 46000    13:50:29
writing distribution functions at step 46000  13:50:29
finished file io         13:50:30
finished runs/wall-bc_cheb_2D1V_constant_source_smallr.toml Thu 28 Sep 13:50:33 BST 2023

Setting rhostar = 0 allows the simulation to run successfully

$ tail wall-bc_cheb_2D1V_constant_source_smallr_rhostar0/slurm-4557971.out
finished time step 397000   10:34:55
writing distribution functions at step 397000  10:34:55
finished time step 398000   10:35:09
writing distribution functions at step 398000  10:35:09
finished time step 399000   10:35:23
writing distribution functions at step 399000  10:35:23
finished time step 400000   10:35:37
writing distribution functions at step 400000  10:35:37
finished file io         10:35:39
finished runs/wall-bc_cheb_2D1V_constant_source_smallr_rhostar0.toml Fri 29 Sep 10:35:49 BST 2023

debugging.zip

Unify and simplify the imposition of boundary conditions on the charged particle distribution function and collision operator calculation.

The charged particle boundary conditions are used in #149 to impose numerical conservation on the density, upar, and pressure. The boundary conditions are imposed on the distribution function in initial_conditions.jl, but the boundary conditions on the collision operator are imposed via a function in fokker_planck_calculus.jl.

The reason for this behaviour is that the @loop macro that is naturally used is different in each case. This is related to issue #140.

[time_evolving_new.zip](https://github.com/mabarnes/moment_kinetics/files/13459088/time_evolving_new.zip)

time_evolving_1D2V_new.zip
Uploads including example input files and output .gif/.pdfs for a 1D2V sheath problem with C[F,F] and wall boundaries, and the corresponding Maxwellian relaxation problem (0D2V). These simulations suggest that commit c3b6261 behaves acceptably for 1) small enough time steps and 2) simulations where the stagnation point z=0 is eliminated with the choice of grids.

Originally posted by @mrhardman in #149 (comment)
