
Oceananigans.jl's People

Contributors

ali-ramadhan, charleskawczynski, christophernhill, elise-palethorpe, fadaie91, francispoulin, github-actions[bot], glwagner, hennyg888, iuryt, jagoosw, jbisits, jm-c, kburns, maeckha, maleadt, milankl, mohansha, navidcy, pitmonticone, sandreza, sbozzolo, simonbyrne, simone-silvestri, suyashbire1, tomchor, vchuravy, whitleyv, wsmoses, xiaozhour

Oceananigans.jl's Issues

Horizontal and vertical transforms

The wavenumbers and transforms are being worked on.

@ali-ramadhan, see here for how FourierFlows.jl constructs wavenumbers:

https://github.com/FourierFlows/FourierFlows.jl/blob/master/src/domains.jl#L130

I think it makes sense to do the r2r transform in z first. That way you can take advantage of the realness of your output in the subsequent horizontal transform.

For the 2D horizontal transform you can use the 2D rfft. If your input array has size (nx, ny), the 2D rfft outputs an array of size (nx/2 + 1, ny): it uses a real-to-complex transform (exploiting conjugate symmetry) along x, the first dimension, and a full complex transform along y, since after transforming along x your quantities are no longer real.
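For concreteness, a minimal FFTW sketch of how the transforms compose and what shapes result (sizes and names here are illustrative, not from the package):

using FFTW

nx, ny, nz = 8, 8, 4
a = rand(nx, ny, nz)

# r2r cosine transform (DCT-II) along z first; the output stays real:
b = FFTW.r2r(a, FFTW.REDFT10, 3)     # size (nx, ny, nz)

# then the 2D rfft over the horizontal: real-to-complex along x (the first
# transformed dimension), full complex transform along y:
c = rfft(b, 1:2)                     # size (nx ÷ 2 + 1, ny, nz)
size(c) == (nx ÷ 2 + 1, ny, nz)      # true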

Hopefully this makes sense. Let me know if you have questions.

Ultimately I believe it will be best to use a tridiagonal solve in the vertical + Fourier transforms in the horizontal to permit arbitrary vertical grids for similar computational cost (but slightly more complex algorithm).
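For reference, a minimal sketch of the Thomas algorithm such a vertical solve would rely on (a generic O(N) tridiagonal solver, not code from this repository):

# Solve a tridiagonal system with subdiagonal a (length N-1), diagonal b
# (length N), superdiagonal c (length N-1), and right-hand side d (length N).
function thomas_solve(a, b, c, d)
    N = length(b)
    c′, d′ = similar(c, Float64), similar(d, Float64)
    c′[1] = c[1] / b[1]
    d′[1] = d[1] / b[1]
    for i in 2:N-1
        denom = b[i] - a[i-1] * c′[i-1]
        c′[i] = c[i] / denom
        d′[i] = (d[i] - a[i-1] * d′[i-1]) / denom
    end
    d′[N] = (d[N] - a[N-1] * d′[N-1]) / (b[N] - a[N-1] * c′[N-1])
    x = similar(d′)
    x[N] = d′[N]
    for i in N-1:-1:1
        x[i] = d′[i] - c′[i] * x[i+1]
    end
    return x
end

Applied once per horizontal wavenumber, this gives the arbitrary-vertical-grid solve at roughly the cost of a vertical transform.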

β-plane and full Coriolis

For long Cartesian slices? Will this work on a Cartesian grid? I know a full treatment of Coriolis in the ocean is nontrivial...

Rayleigh–Bénard convection example should produce steady convective rolls at Ra=5000.

Thanks to @SandreOuza for raising these points regarding this issue:

  • Aspect ratio should be 6:1.
  • Prandtl number should be 0.7.
  • Buoyancy b is related to temperature T through b = αgT, where g is the acceleration due to gravity and α is the thermal expansion coefficient of water. (A sketch after this list picks ν and κ to match Ra and Pr.)
  • Boundary conditions at the top and bottom must be constant T. Random perturbations should only be applied at the first time step.
  • The convective rolls should show up in 2D as well (easier to debug).
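The diffusivities implied by these targets, as a sketch: the standard nondimensionalization Ra = αgΔT H³ / (νκ), Pr = ν/κ is assumed, and the specific values of α, g, ΔT, and H below are illustrative.

Ra, Pr = 5000, 0.7
α  = 2e-4    # thermal expansion coefficient [K⁻¹] (illustrative value)
g  = 9.81    # gravitational acceleration [m s⁻²]
ΔT = 1.0     # top-to-bottom temperature difference [K]
H  = 1.0     # layer depth [m]; a 6:1 aspect ratio gives L = 6H

κ = sqrt(α * g * ΔT * H^3 / (Ra * Pr))  # thermal diffusivity from Ra and Pr
ν = Pr * κ                              # kinematic viscosity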

Abstract diagnostics framework.

Should work similarly to the abstract OutputWriter framework.

Some ideas (a sketch of one possible interface follows the list):

  • Calculate averages and write them to disk or just print some statistics.
  • Check for and locate blowups.
  • Check for NaNs and where they first pop up (useful for debugging).
  • Calculate the Nusselt number.
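A sketch of what the interface might look like, with hypothetical names mirroring an OutputWriter-style trigger; model.tracers and field.data are assumed from the surrounding issues:

abstract type Diagnostic end

# Each diagnostic stores how often it runs and implements run_diagnostic.
struct NaNChecker <: Diagnostic
    frequency :: Int            # run every `frequency` iterations
    fields    :: Vector{Symbol}
end

function run_diagnostic(model, nc::NaNChecker)
    for name in nc.fields
        data = getfield(model.tracers, name).data
        locs = findall(isnan, data)
        isempty(locs) || error("NaN in $name, first at $(first(locs))")
    end
end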

Docs are failing.

I'm using Documenter.jl to compile the documentation from Markdown. I'm hoping to use mkdocs (with the mkdocs-material theme) to create the static documentation website and then publish it on readthedocs.io.

I haven't tried too hard, but I can't seem to get this setup to work. Maybe it's too complicated? Anyway, this is a reminder to come back and work on getting some documentation online.

Spectral solver for the nonhydrostatic pressure must produce divergence-free velocity field.

I guess this is not something I was thinking of, but John pointed out that it's crucial that the Fourier-spectral solver return a nonhydrostatic pressure that, when used to update the velocity field, produces a velocity field that is non-divergent at every grid point. Otherwise mass accumulates unphysically, and tracer quantities accumulate as well through the nonzero Q(∇·u) term in the flux divergence ∇·(uQ) = Q(∇·u) + u·∇Q, leading to divergences and blowups.

Right now the wavenumbers are computed as

kx = 2π/Lx  # DFT
ky = 2π/Ly  # DFT
kz = π/Lz   # DCT

which should lead to a solver whose solutions converge spectrally. While it may solve for the pressure at the center of the cells very accurately, if ∇·u is non-zero this will be a big problem.

This will require some testing on my part to see which solver best satisfies ∇·u. If we can satisfy it to machine precision, that would be amazing. If not, hopefully it can satisfy it better than the conjugate-gradient method and then we can use the continuity equation to enforce ∇·u=0.
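One way to measure this, as a sketch with second-order differences and periodic wraparound for simplicity (the function and names are illustrative, not package code):

# Maximum |∇·u| over the grid for staggered velocities u, v, w.
function max_divergence(u, v, w, Δx, Δy, Δz)
    Nx, Ny, Nz = size(u)
    maxdiv = 0.0
    for k in 1:Nz, j in 1:Ny, i in 1:Nx
        div = (u[mod1(i+1, Nx), j, k] - u[i, j, k]) / Δx +
              (v[i, mod1(j+1, Ny), k] - v[i, j, k]) / Δy +
              (w[i, j, mod1(k+1, Nz)] - w[i, j, k]) / Δz
        maxdiv = max(maxdiv, abs(div))
    end
    return maxdiv
end

Comparing this number across candidate solvers (spectral vs. second-order wavenumbers, see below) would show which one satisfies continuity closest to machine precision.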

An alternative (not sure if this would work) is to discretize the derivative operators using a second-order centered-difference scheme (which I believe I've done for the 1D solver and a previous 3D solver), which explicitly places the discretization points at the centers of the cells. Then the wavenumbers are

kˣ² = (4 / Δx²) * sin(πl / Nˣ)²  # DFT
kʸ² = (4 / Δy²) * sin(πm / Nʸ)²  # DFT
kᶻ² = (2 / Δz²) * (cos(πn / Nᶻ) - 1)  # DCT

and of course you expect second-order convergence. But if it better satisfies ∇·u = 0 then it might be the way to go. You can also derive wavenumbers for a fourth-order discretization.

EDIT: Fixed second-order wavenumbers.
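Transcribed into code, the discrete eigenvalue arrays look like the following (a sketch: the grid sizes are illustrative, l, m, n run from 0 as in the formulas, and the sign convention has to match how the solver divides through):

Nx, Ny, Nz = 64, 64, 32
Δx, Δy, Δz = 1.0, 1.0, 1.0

kx² = [(4 / Δx^2) * sin(π * l / Nx)^2 for l in 0:Nx-1]       # DFT
ky² = [(4 / Δy^2) * sin(π * m / Ny)^2 for m in 0:Ny-1]       # DFT
kz² = [(2 / Δz^2) * (cos(π * n / Nz) - 1) for n in 0:Nz-1]   # DCT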

Broadcasting and operations on different Field types

Right now I've decided not to implement broadcasting for Field types. It would be really nice, but I'd have to figure out how to take the different field types into account: e.g. T .+ S can be broadcast since both are of type CellField, but not u .+ v, since u::FaceFieldX and v::FaceFieldY live at different staggered-grid locations.

Maybe even take it further and implement u .+ v, which would have to compute u .+ avgx(avgy(v)) and so is no longer commutative (the result lives at the location of the left operand). A sketch of one possible design follows.
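One possible direction, sketched with a hypothetical location type parameter (not the current structs): encode the staggered-grid location in the type, so same-location operations work and mixed-location ones fail loudly until interpolation is implemented.

struct Field{L, A<:AbstractArray}
    data::A
end
Field{L}(data::A) where {L, A} = Field{L, A}(data)

const CellField  = Field{:cell}
const FaceFieldX = Field{:face_x}
const FaceFieldY = Field{:face_y}

# Elementwise addition is only defined when locations match:
Base.:+(a::Field{L}, b::Field{L}) where L = Field{L}(a.data .+ b.data)

T, S = CellField(rand(4, 4, 4)), CellField(rand(4, 4, 4))
T + S    # works: both cell-centered
u, v = FaceFieldX(rand(4, 4, 4)), FaceFieldY(rand(4, 4, 4))
# u + v  # MethodError: locations differ, needs avgx/avgy first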

Model implicitly assumes Clock.Δt is constant.

This is probably fine, as I don't think the MITgcm uses adaptive time stepping, and for what we do I doubt we'll change Δt halfway through a simulation. But as it stands, if Δt changes it will break some code, the read_output(...) methods in particular.

Element-wise kernels: Prefetch data needed for time stepping at the start?

Jumping the gun here, but instead of accessing, e.g., model.tracers.T[i, j, k+1] multiple times during a time step, can it be prefetched once, i.e. T_kp1 = model.tracers.T[i, j, k+1], and then reused? Would the value or the pointer be accessed?

The only reason to do this is performance gain. Will this work or will the code turn into spaghetti? Can some sort of compiler figure this stuff out for us?
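The value is what's copied (a scalar load), not a pointer, so reuse is safe. A sketch of the idea with generic arrays (names are illustrative; whether it helps depends on whether the compiler can already prove the loads redundant):

function kernel_hoisted!(out, T)
    Nx, Ny, Nz = size(out)
    for k in 1:Nz-1, j in 1:Ny, i in 1:Nx   # top level omitted for brevity
        @inbounds begin
            T_kp1 = T[i, j, k+1]                  # load the neighbor once...
            out[i, j, k] = T_kp1 * T_kp1 - T_kp1  # ...reuse it several times
        end
    end
    return out
end

In practice LLVM often performs this hoisting itself when it can rule out aliasing between out and T.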

Running with Float32 leaves numerical artifacts.

Using 32-bit floats leaves numerical artifacts. See the attached images of deep convection: the top image is surface temperature and the bottom a vertical slice through the center of the cooling disk at time step 1500. The artifacts appear when using 32-bit floats on both the CPU and the GPU. I suspect the Poisson pressure solve is to blame. There are things we can consider to increase the accuracy of the FFT in single precision.

[Image: Float64 on CPU (good)]

[Image: Float32 on CPU (bad)]

[Image: Float32 on GPU (even worse, but still stable!)]

Generic vertical integral operator.

∫δρgdz!(g::Grid, c::PlanetaryConstants, δρ::CellField, δρz::FaceFieldZ, pHY′::CellField) should be converted to a generic vertical integral operator. Or is this one particularly special?
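A sketch of what a generic version could look like (hypothetical signature: a cumulative integral from the top with constant Δz, where k = 1 is the surface):

# out[i, j, k] ≈ ∫ c dz from the surface down to level k.
function ∫dz!(out, c, Δz)
    Nx, Ny, Nz = size(c)
    for j in 1:Ny, i in 1:Nx
        acc = zero(eltype(out))
        for k in 1:Nz
            acc += c[i, j, k] * Δz
            out[i, j, k] = acc
        end
    end
    return out
end

The hydrostatic pressure anomaly would then be one call on g·δρ, up to constants, signs, and staggering conventions.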

Current Field structs are pretty unintuitive to use...

@glwagner Mostly a couple of notes of where I was in case you're thinking of working on the abstractions:

Currently Field is a struct with an f::Array inside, and fields are collected together in a struct of type FieldCollection. This kind of sucks because, for example, to create a new set of velocity + tracer fields and set the surface temperature to 300 K, the code looks like:

using OceanDispatch
g = RegularCartesianGrid((100, 100, 50), (2000, 2000, 1000), Float64)
fs = Fields(g)
fs.θ.f[:, :, 1] .= 300

when I think it would be much more intuitive to be able to just write fs.θ[:, :, 1] .= 300.
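One way to get that ergonomics (a sketch, not the current implementation): make Field itself an AbstractArray and forward the array interface to the wrapped f.

struct Field{T, A<:AbstractArray{T, 3}} <: AbstractArray{T, 3}
    f::A
end

Base.size(fld::Field) = size(fld.f)
Base.@propagate_inbounds Base.getindex(fld::Field, inds...) = fld.f[inds...]
Base.@propagate_inbounds Base.setindex!(fld::Field, v, inds...) = (fld.f[inds...] = v)

With this, fs.θ[:, :, 1] .= 300 broadcasts straight into the underlying array.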

Feedback

  • const all the things.
  • For embarrassingly parallel work: pmap (alternatives: @parallel, DistributedArrays, MPIArrays).
  • GPU: CuArrays (quick and dirty), GPUArrays (architecture agnostic).

Use multiple dispatch to control Poisson solver

This if-statement is not necessary, because the architecture can be known from the array types of the CellFields.

Specifically, this line can be changed to

function solve_poisson_3d_ppn_planned!(ssp::SpectralSolverParameters, g::RegularCartesianGrid, f::CellField{T}, ϕ::CellField{T}) where T<:CuArray

...

end

I think it would actually be preferable to dispatch on the type of the arrays in RegularCartesianGrid. However, it does not appear that the arrays representing the coordinates of the grid are parameterized in the definition of RegularCartesianGrid. Why is that?

If the array type of the problem is part of RegularCartesianGrid, then dispatch can be used in place of if-statements all over the code, especially in fields.jl.
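A sketch of that (the field names are illustrative; CuArrays is assumed available, as mentioned elsewhere in these issues):

using CuArrays: CuArray

struct RegularCartesianGrid{A<:AbstractArray}
    xC::A   # e.g. cell-center coordinates; the array type tags the architecture
end

architecture(::RegularCartesianGrid{<:Array})   = :cpu
architecture(::RegularCartesianGrid{<:CuArray}) = :gpu

Each if arch == ... branch then becomes a pair of methods dispatching on the grid.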

Finally, the term 'spectral solver' is misleading here. The solver does not have spectral accuracy; it simply uses the FFT, which also happens to be the workhorse of methods that do have a 'spectral' character.

Better way of writing data to disk?

Right now FieldWriter writes field.data using write(filepath, array), so the size of the array (and the field type) is lost when writing. This is probably unavoidable with raw binary output.

A better solution would be to write output as NetCDF.
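For instance, with NCDatasets.jl (one NetCDF option among several; dimension names and sizes here are illustrative):

using NCDatasets

ds = NCDataset("output.nc", "c")                 # "c" = create
defDim(ds, "x", 100); defDim(ds, "y", 100); defDim(ds, "z", 50)
T = defVar(ds, "T", Float64, ("x", "y", "z"))
T[:, :, :] = rand(100, 100, 50)                  # stand-in for field.data
close(ds)

The file then self-describes the array sizes and can carry field metadata as attributes.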

getindex and setindex! much slower than just accessing array contents directly.

Apparently it's 4-5x faster to do operations on Field.data instead of Field, even though I've inlined getindex and setindex! (inlining doesn't change things much).

import Base: getindex, setindex!

@inline getindex(f::Field, inds...) = getindex(f.data, inds...)
@inline setindex!(f::Field, v, inds...) = setindex!(f.data, v, inds...)

Probably just missing something simple but for now I'll use Field.data. Would be nice to figure this out though.

using BenchmarkTools

# decmod1 is assumed to decrement an index with periodic wraparound in 1:n:
decmod1(i, n) = mod1(i - 1, n)

g = RegularCartesianGrid((100, 100, 100), (10, 10, 10))
f1, f2 = CellField(g), FaceFieldX(g)

function δx1!(g::RegularCartesianGrid, f::CellField, δxf::FaceField)
    for k in 1:g.Nz, j in 1:g.Ny, i in 1:g.Nx
        @inbounds δxf[i, j, k] = f[i, j, k] - f[decmod1(i, g.Nx), j, k]
    end
end

julia> @benchmark δx1!(g, f1, f2)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.542 ms (0.00% GC)
  median time:      5.007 ms (0.00% GC)
  mean time:        5.120 ms (0.00% GC)
  maximum time:     11.010 ms (0.00% GC)
  --------------
  samples:          975
  evals/sample:     1

function δx2!(g::RegularCartesianGrid, f::CellField, δxf::FaceField)
    for k in 1:g.Nz, j in 1:g.Ny, i in 1:g.Nx
        @inbounds δxf.data[i, j, k] = f.data[i, j, k] - f.data[decmod1(i, g.Nx), j, k]
    end
end

julia> @benchmark δx2!(g, f1, f2)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.099 ms (0.00% GC)
  median time:      1.198 ms (0.00% GC)
  mean time:        1.253 ms (0.00% GC)
  maximum time:     2.679 ms (0.00% GC)
  --------------
  samples:          3967
  evals/sample:     1

IDCT for dim=3 in 3D Poisson solver does not work on the GPU.

I could not get the Poisson pressure solver to work on the GPU. Most of it works, but CUDA's FFT library does not provide a DCT, so I had to perform the DCT/IDCT in terms of the FFT/IFFT.

The DCT/IDCT functions work in isolation (regression tested with FFTW.r2r!, see link to Jupyter notebook below) but not in the Poisson solver. More specifically, the IDCT fails when applied to the third dimension (after or before the IFFT is applied to dimensions 1 and 2).

For now I got around this by copying the right-hand side to the CPU, doing the transform on the CPU, and copying the geopotential back to the GPU. This operation is so much slower than the time stepping that it takes up 98%+ of wall-clock time. It might also be introducing further numerical errors.

Link to current Poisson GPU solver:
https://github.com/ali-ramadhan/Oceananigans.jl/blob/93aa0038b3126470f263475d648bceb9562bbe91/src/spectral_solvers.jl#L421

Messy Jupyter notebook: Testing DCT/IDCT on the GPU

Messy Jupyter notebook: Testing GPU Poisson solver
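For reference, the standard even-odd permutation trick (Makhoul) for computing a DCT-II with a complex FFT, sketched in 1D. This is not necessarily the exact construction in the notebooks, but it regression-tests against FFTW.r2r the same way, and the same reordering works with a cuFFT-backed fft:

using FFTW

# DCT-II via a complex FFT: permute to evens then reversed odds, FFT,
# and take the real part after a quarter-sample phase shift.
function dct_via_fft(x::AbstractVector{<:Real})
    N = length(x)
    v = [x[1:2:N]; reverse(x[2:2:N])]
    V = fft(v)
    return [2 * real(exp(-im * π * k / (2N)) * V[k+1]) for k in 0:N-1]
end

x = rand(8)
dct_via_fft(x) ≈ FFTW.r2r(x, FFTW.REDFT10)   # true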

Vertically stretched Cartesian grid

Current operators assume a constant Δz, which allows the model to use faster operators and slightly less storage, so they'd have to be rewritten a bit to account for a variable Δz when that gets implemented.

We can either write new operators that get dispatched on HorizontallyRegularCartesianGrid structs (already possible), or, if the performance gain turns out to be tiny, make RegularCartesianGrid a special case of HorizontallyRegularCartesianGrid and keep only one set of operators.

HorizontallyRegularCartesianGrid might be a descriptive but pretty bad struct name.
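For illustration, the kind of change involved (hypothetical struct layout; sign and staggering conventions are illustrative):

# A grid carrying one spacing per vertical level; a regular grid is the
# special case where Δz is filled with a single value.
struct HorizontallyRegularCartesianGrid{V<:AbstractVector}
    Δz::V   # Δz[k] for each vertical cell
end

# A vertical derivative operator that reads the per-level spacing:
∂z(f, g::HorizontallyRegularCartesianGrid, i, j, k) =
    (f[i, j, k] - f[i, j, k+1]) / g.Δz[k]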
