gridtools / gt4py Goto Github PK

Python library for generating high-performance implementations of stencil kernels for weather and climate modeling from a domain-specific language (DSL).

Home Page: https://GridTools.github.io/gt4py

License: Other

Python 99.96% Dockerfile 0.04%

gt4py's Introduction

The GridTools framework is a set of libraries and utilities to develop performance portable applications in the area of weather and climate. To achieve the goal of performance portability, the user-code is written in a generic form which is then optimized for a given architecture at compile-time. The core of GridTools is the stencil composition module which implements a DSL embedded in C++ for stencils and stencil-like patterns. Further, GridTools provides modules for halo exchanges, boundary conditions, data management and bindings to C and Fortran.

GridTools is successfully used to accelerate the dynamical core of the COSMO model with improved performance on CUDA-GPUs compared to the current official version, demonstrating production quality and feature-completeness of the library for models on lat-lon grids. The GridTools-based dynamical core is shipped with COSMO v5.7 and later, see release notes COSMO v5.7.

Although GridTools was developed for weather and climate applications it might be applicable for other domains with a focus on stencil-like computations.

A detailed introduction can be found in the documentation.

Installation instructions

git clone https://github.com/GridTools/gridtools.git
cd gridtools
mkdir -p build && cd build
cmake ..
make -j8
make test

For choosing the compiler, use the standard CMake techniques, e.g. setting the environment variables

CXX=`which g++` # full path to the C++ compiler
CC=`which gcc` # full path to theC compiler
FC=`which gfortran` # full path to theFortran compiler
CUDACXX=`which nvcc` # full path to NVCC
CUDAHOSTCXX=`which g++` # full path to the C++ compiler to be used as CUDA host compiler

Requirements

C++17 compiler (see also list of tested compilers)
Boost headers (1.73 or later)
CMake (3.18.1 or later)
CUDA Toolkit (11.0 or later, optional)
MPI (optional, CUDA-aware MPI for the GPU communication module gcl_gpu)

Supported compilers

The GridTools libraries are currently nightly tested with the following compilers on CSCS supercomputers.

Compiler	Backend	Tested on	Comments
Cray clang version 12.0.3	all backends	Piz Daint	P100 GPU
Cray clang version 10.0.2 + NVCC 11.2	all backends	Piz Daint	P100 GPU
Cray clang version 12.0.3	all backends	Piz Daint	with -std=c++20
GNU 11.2.0 + NVCC 11.0	all backends	Piz Daint	P100 GPU
GNU 11.2.0 + NVCC 11.2	all backends	Dom	P100 GPU
GNU 8.3.0 + NVCC 11.2	all backends	Tsa	V100 GPU

Known issues

Some tests are failing with ROCm3.8.0 (Clang 11).
CUDA 11.0.x has a severe issue, see #1522. Under certain conditions, GridTools code will not compile for this version of CUDA. CUDA 11.1.x and later should not be affected by this issue.
Cray Clang version 11.0.0 has a problem with the gridtools::tuple conversion constructor, see #1615.

Partly supported (expected to work, but not tested regularly)

Compiler	Backend	Date	Comments
Intel 19.1.1.217	all backends	2021-09-30	with `cmake . -DCMAKE_CXX_FLAGS=-qnextgen`
NVHPC 23.3	all backends	2023-04-20	only compilation is tested regularly in CI

Contributing

Contributions to the GridTools framework are welcome. Please open an issue for any bugs that you encounter or provide a fix or enhancement as a PR. External contributions to GridTools require us a signed copy of a copyright release form to ETH Zurich. We will contact you on the PR.

gt4py's People

Contributors

Stargazers

Watchers

gt4py's Issues

Slices of gt4py storages cannot be copied

I have a piece of code which extracts variables (kind of like tracer variables) out of a larger array. After this, I needed to copy the array, and ran into this issue (shown as a MCVE):

>>> arr = gt4py.storage.empty("numpy", default_origin=[0, 0, 0], shape=[10, 10, 10], dtype=float)
>>> s = arr[:, :, 0]
>>> copy.deepcopy(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/usr/src/gt4py/src/gt4py/storage/storage.py", line 350, in __deepcopy__
    res = super().__deepcopy__(memo=memo)
  File "/usr/src/gt4py/src/gt4py/storage/storage.py", line 186, in __deepcopy__
    managed_memory=not isinstance(self, ExplicitlySyncedGPUStorage),
  File "/usr/src/gt4py/src/gt4py/storage/storage.py", line 38, in empty
    shape=shape, dtype=dtype, backend=backend, default_origin=default_origin, mask=mask
  File "/usr/src/gt4py/src/gt4py/storage/storage.py", line 141, in __new__
    shape = storage_utils.normalize_shape(shape, mask)
  File "/usr/src/gt4py/src/gt4py/storage/utils.py", line 49, in normalize_shape
    "len(shape) must be equal to len(mask) or the number of 'True' entries in mask."
ValueError: len(shape) must be equal to len(mask) or the number of 'True' entries in mask.

I may find a way to work around the issue, but copy.deepcopy should probably work when used on slices?

Validate GridTools parallel model

Put in place tests to validate that gt4py backend implementations follow the gridtools parallel model.

Refactor options systems

Redesign the option system to make it more general

Proposal for gtscript region decorator

We would like your opinion on a decorator extension to GT4Py that we have been thinking about. This would mark regions that could be fused into one gtscript.stencil, so that corners and edges could be treated inside a single dawn stencil.

Example usage:

@gtscript.region
def execute_code():
  stencil1(..., domain=..., origin=...)
  if cond1:
    stencil2(..., domain=..., origin=...)
  elif cond2:
    stencil3(..., domain=..., origin=...)
  for i in range(3):
    stencil4(..., domain=..., origin=...)

What do you think of such a feature?

Document possible incompatibilities with user installed GridTools

Problem

If a user has previously installed GridTools (C++ libs) in a standard prefix (like /usr/local) then it is possible that during setuptools.build_ext the compiler will find those headers. Namely if

the python version being used was compiled using includes from the same prefix
boost is installed in the same prefix

This is a problem if the user installed GridTools is a different version from what GT4Py uses and there are breaking changes, like different header structure.

Solution

document the GidTools version requirement for GT4Py and advise users to (re-) install other versions of GT C++ to a separate prefix

Luxury solution

get actual include paths used for compilation and scan them for existing gridtools sources at gt4py.gt_src_manager install time, decide to give an error message or use the existing sources if compatible

Evaluate memory usage in all the backends

Introduce tests/benchmarks to evaluate memory usage of gt4py.

Are Cupy/Gridtools memory pools compatible?
Are there memory leaks?

Duck storages reference implementation

Work can be tracked here: #29

Have a first implementation by the end of the sprint.

We try to make the initial implementation as "optional" module to be used instead of the current storage. Later the current storage will be replaced by the new one.

PR #14 errors for FV3 stencils using dawn:gtmc backend

The following error occurs when testing FV3 stencils with dawn:gtmc backend using arguments: pytest --exec_backend=dawn:gtmc --data_backend=gtmc:

file "test_python_modules.py", line 357, in test_serialized_savepoints()
ERROR running stencilKE_C_SW-In
	The layout of the field ke is not compatible with the backend.
Traceback (most recent call last):
  File "/fv3/test/test_python_modules.py", line 348, in test_serialized_savepoints
    process_savepoint(serializer, sp, args)
  File "/fv3/test/test_python_modules.py", line 299, in process_savepoint
    process_test_savepoint(serializer, sp, split_name, args)
  File "/fv3/test/test_python_modules.py", line 277, in process_test_savepoint
    process_input_savepoint(serializer, sp, testobj, test_name, args)
  File "/fv3/test/test_python_modules.py", line 239, in process_input_savepoint
    args["output_data"][test_name] = testobj.compute(input_data)
  File "/fv3/translate/translate_ke_c_sw.py", line 35, in compute
    ke_c, vort_c = KE_C_SW.compute(**inputs)
  File "/fv3/stencils/ke_c_sw.py", line 64, in compute
    copy_uc_values(ke_c, uc, ua, origin=origin, domain=copy_domain)
  File "/.gt_cache/py37_1013/dawngtmc/fv3/stencils/ke_c_sw/m_copy_uc_values__dawngtmc_92bfa304e2.py", line 80, in __call__
    field_args=field_args, parameter_args=parameter_args, domain=domain, origin=origin, exec_info=exec_info
  File "/usr/src/gt4py/src/gt4py/stencil_object.py", line 188, in _call_run
    f"The layout of the field {name} is not compatible with the backend."
ValueError: The layout of the field ke is not compatible with the backend.
```file "test_python_modules.py", line 357, in test_serialized_savepoints()
ERROR running stencilKE_C_SW-In
	The layout of the field ke is not compatible with the backend.
Traceback (most recent call last):
  File "/fv3/test/test_python_modules.py", line 348, in test_serialized_savepoints
    process_savepoint(serializer, sp, args)
  File "/fv3/test/test_python_modules.py", line 299, in process_savepoint
    process_test_savepoint(serializer, sp, split_name, args)
  File "/fv3/test/test_python_modules.py", line 277, in process_test_savepoint
    process_input_savepoint(serializer, sp, testobj, test_name, args)
  File "/fv3/test/test_python_modules.py", line 239, in process_input_savepoint
    args["output_data"][test_name] = testobj.compute(input_data)
  File "/fv3/translate/translate_ke_c_sw.py", line 35, in compute
    ke_c, vort_c = KE_C_SW.compute(**inputs)
  File "/fv3/stencils/ke_c_sw.py", line 64, in compute
    copy_uc_values(ke_c, uc, ua, origin=origin, domain=copy_domain)
  File "/.gt_cache/py37_1013/dawngtmc/fv3/stencils/ke_c_sw/m_copy_uc_values__dawngtmc_92bfa304e2.py", line 80, in __call__
    field_args=field_args, parameter_args=parameter_args, domain=domain, origin=origin, exec_info=exec_info
  File "/usr/src/gt4py/src/gt4py/stencil_object.py", line 188, in _call_run
    f"The layout of the field {name} is not compatible with the backend."
ValueError: The layout of the field ke is not compatible with the backend.

Please find the code from the ke_c_sw.py file below:

def compute(uc, vc, u, v, ua, va, dt2):
    grid = spec.grid
    # co = grid.compute_origin()
    origin = (grid.is_ - 1, grid.js - 1, 0)

    # Create storage objects to hold the new vorticity and kinetic energy values
    ke_c = utils.make_storage_from_shape(uc.shape, origin=origin)
    vort_c = utils.make_storage_from_shape(vc.shape, origin=origin)

    # Set vorticity and kinetic energy values (ignoring edge values)
    copy_domain = (grid.nic + 2, grid.njc + 2, grid.npz)
    copy_uc_values(ke_c, uc, ua, origin=origin, domain=copy_domain)
    copy_vc_values(vort_c, vc, va, origin=origin, domain=copy_domain)
    ...

# Kinetic energy field computations
@gtscript.stencil(backend=utils.exec_backend, rebuild=True)
def copy_uc_values(ke: sd, uc: sd, ua: sd):
    with computation(PARALLEL), interval(...):
        ke[0, 0, 0] = uc if ua > 0.0 else uc[1, 0, 0]

The make_storage methods are included below for reference:

def make_storage_data(array, full_shape, istart=0, jstart=0, kstart=0, origin=origin, backend=data_backend):
    full_np_arr = np.zeros(full_shape)
    if len(array.shape) == 2:
        return make_storage_data_from_2d(array, full_shape, istart=istart, jstart=jstart, origin=origin, backend=backend)
    elif len(array.shape) == 1:
        return make_storage_data_from_1d(array, full_shape, kstart=kstart, origin=origin, backend=backend)
    else:
        isize, jsize, ksize = array.shape
        full_np_arr[istart:istart+isize, jstart:jstart+jsize, kstart:kstart+ksize] = array
        return gt.storage.from_array(data=full_np_arr, backend=backend, default_origin=origin, shape=full_shape)

def make_storage_data_from_2d(array2d, full_shape, istart=0, jstart=0, origin=origin, backend=data_backend):
    shape2d = full_shape[0:2]
    isize, jsize = array2d.shape
    full_np_arr_2d = np.zeros(shape2d)
    full_np_arr_2d[istart:istart+isize, jstart:jstart+jsize] = array2d
    #full_np_arr_3d = np.lib.stride_tricks.as_strided(full_np_arr_2d, shape=full_shape, strides=(*full_np_arr_2d.strides, 0))
    full_np_arr_3d = np.repeat(full_np_arr_2d[:, :, np.newaxis], full_shape[2], axis=2)
    return gt.storage.from_array(data=full_np_arr_3d, backend=backend, default_origin=origin, shape=full_shape)

# TODO: surely there's a shorter, more generic way to do this.
def make_storage_data_from_1d(array1d, full_shape, kstart=0, origin=origin, backend=data_backend, axis=2):
    # r = np.zeros(full_shape)
    tilespec = list(full_shape)
    full_1d = np.zeros(full_shape[axis])
    full_1d[kstart:kstart+len(array1d)] = array1d
    tilespec[axis] = 1
    if axis == 2:
        r = np.tile(full_1d, tuple(tilespec))
        # r[:, :, kstart:kstart+len(array1d)] = np.tile(array1d, tuple(tilespec))
    elif axis == 1:
        x = np.repeat(full_1d[np.newaxis, :], full_shape[0], axis=0)
        r = np.repeat(x[:, :, np.newaxis], full_shape[2], axis=2)
    else:
        y = np.repeat(full_1d[:, np.newaxis], full_shape[1], axis=1)
        r = np.repeat(y[:, :, np.newaxis], full_shape[2], axis=2)
    return gt.storage.from_array(data=r, backend=backend, default_origin=origin, shape=full_shape)

def make_storage_from_shape(shape, origin, backend=data_backend):
    return gt.storage.from_array(data=np.zeros(shape), backend=backend, default_origin=origin, shape=shape)

Investigate if tests (specially with hypothesis) are always properly rebuilt

gt4py.stencil(definition=...., rebuild=TRUE)

Improve parametrization of tests

As suggested be @gronerl , and according to pytest-dev/pytest#815, we could update this:

 ["name", "backend"], itertools.product(stencil_definitions.names, CPU_BACKENDS)

to this cleaner option:

@pytest.mark.parametrize("name", stencil_definitions.names)
@pytest.mark.parametrize("backend", CPU_BACKENDS)
def test_generation_cpu(name, backend):

Dynamic compilation of stencils for types found at run-time (like numba)

Extract types of input fields at runtime and call the actual @Stencil decorator with the function and the extracted types to load or compile an appropriate version of the stencil.

Adapt backend API for CLI

This is the first step towards the implementation of the CLI (GDP-1). This issue serves to track the implementation of the described changes. Each change will be treated in a separate issue.

Current situation

The GDP-1 proof-of-concept had to use internals of concrete backends which are not part of the public API to achieve a number of things. This would obviously not be maintainable.

Specifically the problems arise from the intention of making the CLI output easy to integrate into an external build system, while the current backend API is geared towards JIT generation/compilation for use from python only. Code generation is not cleanly separated from compilation and stencil-ids are baked into file / code object names and paths.

Changes to the API

Functionality:

Generate the primary language code without (or with optional) unique stencil-id, returning a hierarchy of source files ready to be independently compiled in a client-defined location (or in-memory in a way that allows programmatically writing out).
Generate language bindings (for python or other secondary language), again without (or optionally) baking in stencil-id and return in a format ready to be written / copied programatically to client-defined location (without compiling anything).
Generate a secondary language module / source file (for bindings) without actually compiling the bindings, in a way that if the language bindings are compiled correctly the module / source file can be used from the target language.
~~After generating bindings and secondary language source, compile bindings to make the target language source immediately usable.~~ (not a backend refactoring anymore)

Data:

set of secondary languages for which bindings can be generated

Design:

separate caching from generating in backend API

Explanation:

Generate primary language code

For the GridTools backends this would be C++ or Cuda code. Both the JIT process as well as the CLI require the source, however for the CLI to embed transparently into external build systems the unique stencil-id must be optional. Also the user of the CLI should have control over where the source files end up to be written and the source file hierarchy (if there are more than one) must make sense for an external build tool. Obviously compiling for JIT usage must hapen separately from this step.

Generate language bindings

For GridTools, this is the pybind11 .cpp file for python as a secondary language. If no stencil-id is given, it should be generated under the assumption that the source files it refers to are generated without stencil-id and the output must make it clear where this file expects to be located relative to the primary language source files.

If a backend does not support secondary languages (if python is the primary language), or if the secondary language can call the primary at runtime this functionality may be absent.

Generate secondary language source

For GridTools backends this is the extension module that imports the compiled bindings from an .so object file and provides the python wrapper on top of that. In general it is the entry point for the secondary language to call the optimized stencils in an idiomatic way.

This may be absent if no secondary languages are supported (if python is the primary) or if the primary and secondary languages are so compatible that they do not require a wrapper.

Compile bindings

This is for when the CLI is also to be used as the build system, when the client code calling the stencils is written in a secondary language supported by the backend. This should be the same process as JIT compilation, except without / optional stencil-id in the file names of course, with the source files in the CLI-user specified location. The Idea is that afterwards the secondary language wrapper can be used immediately.

Explicit IJK column-based indexing in GTScript

Add syntax support to specify offsets in fields related to the axis (e.g. [I+3]) instead of always using a tuple of ints (e.g. [3, 0, 0])

Using a runtime float as both a condition and inside the conditional

A stencil with a conditional that triggers off a runtime float variable, and is also used in the condition, triggers an error.
For example:

@GTScript.stencil(backend="numpy")
def cap_var(q: gtscript.Field[_dtype], b: gtscript.Field[_dtype], q_max: float):
if q > q_max:
b = q_max

triggers an Assertion error in the merge_extents method
def _merge_extents(self, refs: list):
result = {}
params = set()

    # Merge offsets for same symbol
    for name, extent in refs:
        if extent is None:
           assert name in params or name not in result
           AssertionError

usr/src/gt4py/src/gt4py/analysis/passes.py:359: AssertionError

docs: Intro Page

Non-technical motivation (usecase)
Concise
motivating code sample / figure(s)
Fits on one screen

At least put
Opening Paragraph and feature highlights. Sample code / figures can be place holders

Support for 1d and 2d fields

Define GTScript syntax for argument fields
Define syntax for temporary fields
Define broadcasting behavior in GTScript
Update analysis pipeline and code generation backends

Depends on #28

Asynchronous (non-blocking) compilation of multiple stencils

A possible implementation strategy could be to return a proxy StencilObject with a concurrent.future inside. The first time that any attribute is accessed (_getattr_) wait for the completion of the future and load the actual stencil object (similar to JAX arrays implementation).

Notes: check if workarounds modifying setuptools objects to compile CUDA code still work when different stencils at compile simultaenously.

Add support for new Python versions with AST modifications

Python 3.8 introduced changes in the Python AST that breaks the current frontend and this may happen again in the future. A new generic version-independent AST could be used to avoid this problems in the future.

Write documentation

Review Stefano's sample codes to be included in the paper

Code samples can be found @ https://gist.github.com/mbianco/418f10a72aad3ad373ad428ceaa3be14

Does it make sense to use these also for testing and/or examples in the repository?

Add support for augmented assignments

Propose adding support for augmented assignments (e.g., a += b * c). An example of this use case is the Coriolis stencil:

@gtscript.stencil(backend=backend)
def coriolis_stencil(
    u_nnow: gtscript.Field[dtype],
    v_nnow: gtscript.Field[dtype],
    fc: gtscript.Field[dtype],
    u_tens: gtscript.Field[dtype],
    v_tens: gtscript.Field[dtype]

    with computation(FORWARD), interval(...):
        z_fv_north = fc * (v_nnow + v_nnow[1, 0, 0])
        z_fv_south = fc[0, -1, 0] * (v_nnow[0, -1, 0] + v_nnow[1, -1, 0])
        u_tens += (0.25 * (z_fv_north + z_fv_south))
        z_fu_east = fc * (u_nnow + u_nnow[0, 1, 0])
        z_fu_west = fc[-1, 0, 0] * (u_nnow[-1, 0, 0] + u_nnow[-1, 1, 0])
        v_tens -= (0.25 * (z_fu_east + z_fu_west))

The current version returns None for these statements. One simple solution is to implement the AugAssign visitor method in the gtscript_frontend.IRMaker class, convert it to a standard Assign node, and visit that.

    def visit_AugAssign(self, node: ast.AugAssign) -> list:
        bin_op = ast.BinOp(left=node.target, op=node.op, right=node.value)
        assign = ast.Assign(targets=[node.target], value=bin_op)
        return self.visit_Assign(assign)

This approach has been implemented in the augmented_assign branch.

Study GPU optimizations

Bugs in Stefano stencils (microphysics, ...) with MC backend

All the paths I will mention in the following refer to the prognostic_saturation branch of the tasmania repo

Vertical stencils with gtmc backend: validation fails unless I perform any stage (really doesn’t matter which) sequentially, although not needed. To reproduce this issue: in tests/, run pytest isentropic/test_isentropic_vertical_advection_debug.py. The error does not occur with the numpy and gtx86 backend. I haven’t tried the gtcuda backend yet, but I would wait for the issue with the storages to be sorted out.

Detect and deal with decorators properly in stencil definition functions

Implement differential operators as part of GTScript standard library

Differential operators should be implemented in one or more gtscript modules and not as builtins, to avoid an explosion of builtins in the language.

Fix stefano's NaN/Managed memory issues

We need to verify if this is relevant, and add a test if it is.

Refactor backend subsystem

Convert backend class methods to regular methods
Pass the name of the backend to the constructor
Extract cache management functionality of a different class

Generate computation source in CLI friendly way

backend.generate_computation(stencil_id=None) or similar should return the stencil in computation language source together with the intended file hierarchy (relative locations of the source files, file names if required).

The return format must be standardized enough for the CLI to write the source to files (or copy the temporary files) to a file system location specified by the user, so that they can be compiled or interpreted without additional changes.

The stencil_id=None is intended to allow reuse for the JIT machinery. It might be replaced or dropped by another mechanism to the same effect.

This is part of #57, which describes the context of this change.

Add optional data member to backend API to expose supported secondary languages

This is an enabler for the final GDP-1 implementation. See #57 for more details.

Support for member methods as stencil definition functions

In order to support object-oriented stencil definitions, we could try a similar approach to dataclasses.
Instance methods could be allowed as stencil definitions by relying in the convention of naming the first parameter as self. In this way, the gt.stencil decorator should check for self as the first parameter and add a tag attribute to the function to be compiled. The class needs to be also decorated with a class_with_stencils decorator function that looks for the tag attribute in all the member methods and then adds a new _init_ that calls the original one after compiling the tagged method and assigning them to the instance as bound methods.

Example:

@class_with_stencils
class Component:
    ...

    @stencil
    def my_stencil_definition(self, field_a: Field[], ....):
         ....

Fix star imports from packages' init.py

As suggested by @DropD, fix import * statements in all package's __init__.py.

From:

from .concepts import *

to:

from .concepts import Concept1, Concep2, func

Support callable objects as gtscript function definitions

It can be easily done by using the __call__ member of the object as a function definition, if the object is a callable object but not a regular function.

Unexpected behaviour of nested subroutines with vertical data dependencies

Results differ for the same calculation done in a subroutine depending on whether it is called by the stencil directly or through another subroutine.

The attached ipython notebook illustrates the problem:

nested_subroutines_with_data_dependencies.ipynb.zip

calling gtscript functions from conditionals

Perhaps this should not be supported, but when a gtscript function is called inside of a conditional, often an error raises "Temporary field {name} implicitly defined within run-time if-else region.", even if there are no temporaries in the function, if it's just using and assigning fields passed from the input arguments to the stencil... I assume using a function causes a temporary variable to be generated, and since that is not declared before the conditional, it is not allowed.
Here's and example:

@gtscript.function
def qcon_func(qcon, q0_liquid + q0_rain):
    qcon = q0_liquid  + q0_rain
    return qcon

then the stencil

with computation(BACKWARD), interval(0, -1):
    if ri < ri_ref:
        qcon = qcon_func(qcon, q0_liquid, q0_rain)

This works if instead of assigning qcon in the function

@gtscript.function
def qcon_func(qcon, q0_liquid, q0_rain):
    return q0_liquid  + q0_rain

So it is totally reasonable in this case to say do not assign qcon inside the function. But more complex functions can't be returned as one-line operations, and assign an input field that wouldn't cause problems inside the stencil spec. And the field being assigned is not temporary, it is just not known in the context of the function?

This is a lower order issue, but has tripped me up a couple of times so thought I'd bring it up! It's nice to be able to reuse functions, but of course less essential than having all the other functionality.

StencilFusion Library Transform

Add proper check for redundant fields in the analysis pipeline

Verify if the check for fields with redundancy (horizontal blocks) is still needed and thus it should be added to the analysis pipeline for GridTools backends.

Note: started from discussions with @egparedes and @lukasm91

TEs

Enhance templates for code generation backends

Improve the code generation templates for the internal backends.

Reduce the number of conditionals and logic to implement cpu/gpu specific logic by using a master Jinja template inheritance from a common base.
Evaluate the possibility to switch from Jinja to Mako, since it's probably better suited for source code generation and is likely easier to learn for developers.

Move concept explanations from "Quickstart" to a new "Concepts" section

Let's keep the Quickstart tutorial lean and fast-paced by explaining only exactly what is needed to understand the code and linking to more in-depth explanations of concepts in the "Concepts" section (needs to be created).

The "Concepts" section will also serve as a normalization point for a shared language surrounding GT4Py.

k interval offset larger than 2?

I have a stencil where the code specification indicates the need for a with interval(3, 4), but this causes an error.
E.g. :

@gtscript.stencil(backend="numpy")
def example(ub:  gtscript.Field[_dtype]):
 with computation(PARALLEL), interval(3, 4):
        ub = 5.

This raises and error indicating this interval is invalid, which looks like is stemming from the default offset_limit value in the make_axis_interval function:

def make_axis_interval(bounds: tuple, *, offset_limit: int = 2)
 assert isinstance(bounds[0], (VarRef, UnaryOpExpr, BinOpExpr)) or (
          isinstance(bounds[0], int) and abs(bounds[0]) <= offset_limit
        )
       AssertionError

Is there a way to override the offset_limit from the frontend?

use temp field in self-assignment

An assignment such as:

a = a[J - 1]  + 2

has a data race. It should be transformed into

 tmp = a[J - 1]  + 2 ; a = tmp

Cannot install GT sources

When I try installing the GT library sources using the command: pip3 install ./gt4py/setup.py install_gt_sources, I get the following error:

WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see pypa/pip#5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Defaulting to user installation because normal site-packages is not writeable
ERROR: Invalid requirement: './gt4py/setup.py'
Hint: It looks like a path. It does exist.

This worked previously so I don't know why it has stopped working. I am trying to install on my local laptop running Ubuntu 18.04 and I am using Python 3.6.9.

Any assistance would be appreciated.

Thanks,
Mark

Move concept explanations from "Quickstart" to a new "Concepts" section in the Docs

The "Concepts" section will also serve as a normalization point for a shared language surrounding GT4Py.

multiple 'ands' in a conditional without parenthesis

I had a conditional with 3 terms joined by 'ands' --

elif a > 0.0 and q > 0.0 and b < q

But this gave the wrong answer (did not go into the conditional when it should have) until I added parenthesis:

elif a > 0.0 and (q > 0.0 and b < q)

Perhaps there is something incorrect about chaining ands like that without marking the order?

Runtime conditionals

For backends of 'debug' and 'numpy', runtime conditionals are not implemented (and the error nicely tells you so). A runtime conditional on data can be achieved with vectorized expressions...e.g. a = q[0, 2, 0]* (c[0,0,0] > 0.0) + q[0, 3, 0] * (c[0,0,0] <= 0.0) to achieve a conditional:
if (c[0,0,0] > 0.0):
a = q[0,2,0]
else:
a = q[0,3,0]
It is likely not performant, but works. If I switch the backend to 'gtmc' with this expression I no longer get the same answer -- significant differences, not just roundoff error (changes on the order of 10% seen). The non-vectorized conditional form of that stencil nicely does not crash with the gtmc backend, but produces the same incorrect answers. It's possible there is an issue with my setup, but I would not expect a dramatic answer change with a change in backend. It is known that the runtime conditionals are future work, I am not insisting this be resolved asap, but wanted to share it as an issue.

Add example of working OpenMP settings on MacOS

For MacOS users, the installation and configuration of OpenMP can be tricky, so we could add to the documentation an example of a possible way to install this on MacOS:

export BOOST_ROOT=/usr/local/opt/boost
export OPENMP_CPPFLAGS="-Xpreprocessor -fopenmp"
export OPENMP_LDFLAGS="$(brew --prefix libomp)/lib/libomp.a"

# Apple Command Line Tools and Boost and OpenMP should have been installed via homebrew:
$ brew install boost libomp

put gdp-1 in list of accepted gdps in the docs

stencil failure with gtmc that works for numpy

Hello! I have specified a stencil that works with the numpy backend, but not gtmc. It appears too break because it's a long stencil, and can be fixed by using temporary variables.
Here is the problem stencil:

@gtscript.stencil(backend=backend, rebuild=True)
def mystencil(uc: sd, vc: sd, ut: sd, vt: sd, cosa_u: sd, cosa_v: sd):
    with computation(PARALLEL), interval(0, None):
        damp_u = 1. / (1.0 - 0.0625 * cosa_u[0, 0, 0] * cosa_v[-1, 0, 0])
        ut[0, 0, 0] = (uc[0, 0, 0]-0.25 * cosa_u[0, 0, 0] * (vt[-1, 1, 0] + vt[0, 1, 0] + vt[0, 0, 0] + vc[-1, 0, 0] - 0.25 * cosa_v[-1, 0, 0] * (ut[-1, 0, 0] + ut[-1, -1, 0] + ut[0, -1, 0]))) * damp_u

This gives:
GRIDTOOLS ERROR=> Horizontal extents of the outputs of ESFs are not all empty. All outputs must have empty (horizontal) extents

48 | GT_STATIC_ASSERT(extent_t::iminus::value == 0 && extent_t::iplus::value == 0
I don't quite know what this means..

It works if I break this long stencil into several functions.

temporaries inside of conditionals

Related to #52, we are thinking it might be reasonable to turn off the check for temporaries inside of conditionals, e.g. make the user responsible if they end up with a read before write.
if a > 0.
b = 1.
else:
b = 2.

This triggers the temporaries error even though it is defined in all pathways. I am not suggesting you try to determine whether a new temporary is defined in all pathways, but rather could let the user figure it out. Perhaps the errors would be hard to understand when they did mess it up.

In many programming languages you can have a variable that is only defined in a certain condition, and if outside that condition.
if cond:
f = 1.

...
if cond:
f = f + 1

There may be barriers to this I am not thinking of.