meteoswiss-apn / dawn

Compiler toolchain to enable generation of high-level DSLs for geophysical fluid dynamics models

License: MIT License

CMake 2.94% Python 6.23% C++ 88.62% Shell 0.31% Dockerfile 0.11% Cuda 1.73% Fortran 0.07%

dawn's Issues

Add support for splitting code generation into multiple translation units

When compiling large files it would be convenient to generate multiple .cpp files instead of just one.

Reasoning:

Improve compilation speed by leveraging multi-threaded compilation.
Allow incremental builds, i.e., only regenerate files when necessary. For example, if only one stencil changes, only the corresponding translation unit needs to be regenerated. To check whether a file needs to be touched, we could simply compute a hash and compare.
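
The hash check for incremental builds could look roughly like the sketch below. It is only an illustration: `writeIfChanged` is a hypothetical helper, not part of dawn, and it hashes the file already on disk instead of keeping a separate hash store.

```cpp
#include <cassert>
#include <fstream>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical helper: write `content` to `path` only if it differs from what
// is already on disk. Returns true if the file was (re)written.
inline bool writeIfChanged(const std::string& path, const std::string& content) {
  assert(!path.empty() && "output path must not be empty");
  std::hash<std::string> hasher;

  std::ifstream in(path, std::ios::binary);
  if(in) {
    std::ostringstream existing;
    existing << in.rdbuf();
    // Compare hashes first, then the full text to rule out hash collisions.
    if(hasher(existing.str()) == hasher(content) && existing.str() == content)
      return false; // unchanged: keep the timestamp so the build skips this file
  }

  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  out << content;
  return true;
}
```

Leaving an unchanged file untouched keeps its timestamp, so a timestamp-based build system would not recompile that translation unit.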

Possible problems:

It is unknown how to integrate this nicely with the CMake module as we do not a priori know how many files will be generated. A solution could be to set a fixed number of files which need to be generated.

Caching can lead to invalid results

The fill and flush strategy does not fully account for ranges so far. If a flush (that is not an EP-Flush) is needed on a certain range but we only wrote into part of the field while it was cached, we flush uninitialized data back to main memory, since flushes span the full domain.

To prevent this there are two options:

Allow for flushes on specific ranges (similar to PB-Fill / EP-Flush)
Do not cache once this behavior is found

demote in/out storage to temporary storage

When a user in/out storage is used as in/out within a single MSS, there will be false sharing that degrades performance. We should version it.
An example is lap in hori_diff_stencil_01, although in this case we should check that the storage is not read before it is written, in which case there is no need to version it.

Invalid SourceLocation comparison

Is this a bug or intended?
https://github.com/MeteoSwiss-APN/dawn/blob/master/src/dawn/Support/SourceLocation.cpp#L25

extern bool operator==(const SourceLocation& a, const SourceLocation& b) {
  return a.Line == b.Line && b.Column == b.Column;
}

I guess it should be .. && a.Column == b.Column.
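
A sketch of the suggested fix, using a minimal stand-in for `SourceLocation` (only the two members used above are modelled, and `checkColumnIsCompared` is a hypothetical regression check):

```cpp
#include <cassert>

// Minimal stand-in for dawn's SourceLocation.
struct SourceLocation {
  int Line;
  int Column;
};

// Compare both members of `a` against the matching members of `b`.
inline bool operator==(const SourceLocation& a, const SourceLocation& b) {
  return a.Line == b.Line && a.Column == b.Column;
}

// Two locations on the same line but in different columns must not compare
// equal; with `b.Column == b.Column` the column check is always true.
inline void checkColumnIsCompared() {
  assert(!(SourceLocation{3, 1} == SourceLocation{3, 7}));
  assert((SourceLocation{3, 7} == SourceLocation{3, 7}));
}
```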

Bug in field versioning (in vertical reduction pattern)

The following example will version tmp, which leads to a wrong result:

stencil stencil {
  storage a, tmp, c;

  Do {
    vertical_region(k_end, k_end)
      tmp=0;

    vertical_region(k_end-1, k_start) {
      a = tmp[k+1];
      tmp = a;
      c = tmp;
    }
  }
};

    struct stage_0_0 {
      using c = gridtools::accessor<0, gridtools::enumtype::inout, gridtools::extent<0, 0, 0, 0, 0, 0>>;
      using tmp = gridtools::accessor<1, gridtools::enumtype::inout, gridtools::extent<0, 0, 0, 0, 0, 0>>;
      using a = gridtools::accessor<2, gridtools::enumtype::inout, gridtools::extent<0, 0, 0, 0, 0, 0>>;
      using tmp_1 = gridtools::accessor<3, gridtools::enumtype::inout, gridtools::extent<0, 0, 0, 0, 0, 1>>;
      using arg_list = boost::mpl::vector<c, tmp, a, tmp_1>;

      template <typename Evaluation>
      GT_FUNCTION static void Do(Evaluation& eval, interval_end_0_end_0) {
        eval(tmp_1(0, 0, 0)) = (int)0;
      }

      template <typename Evaluation>
      GT_FUNCTION static void Do(Evaluation& eval, interval_start_0_end_minus_1) {
        eval(a(0, 0, 0)) = eval(tmp_1(0, 0, 1));
        eval(tmp(0, 0, 0)) = eval(a(0, 0, 0));
        eval(c(0, 0, 0)) = eval(tmp(0, 0, 0));
      }
    };

I guess tmp should not be versioned in this case

Create Iterators to loop over a StencilInstantiation's Stages

Technical Description

Currently there is no way to loop over a StencilInstantiation's Stages directly. The only ways to access all stages are (1) a nested loop over all MultiStages and their Stages, or (2) a loop over the StageIDX and finding each stage (which is implemented using 1). An iterator over all Stages that finds the next MultiStage and moves on to its stages could be implemented to make this more efficient.
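
One possible shape for such an iterator is sketched below. `Stage`, `MultiStage`, `StageIterator` and `StageRange` are simplified stand-ins invented for this illustration, not dawn's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for dawn's classes; only the nesting matters for this sketch.
struct Stage { int id; };
struct MultiStage { std::vector<Stage> stages; };

// Hypothetical forward iterator that walks all Stages of all MultiStages and
// skips MultiStages without stages.
class StageIterator {
  const std::vector<MultiStage>* ms_;
  std::size_t msIdx_, stageIdx_;

  // Move on to the next MultiStage that still has a Stage to visit.
  void skipExhausted() {
    while(msIdx_ < ms_->size() && stageIdx_ >= (*ms_)[msIdx_].stages.size()) {
      ++msIdx_;
      stageIdx_ = 0;
    }
  }

public:
  StageIterator(const std::vector<MultiStage>& ms, std::size_t msIdx)
      : ms_(&ms), msIdx_(msIdx), stageIdx_(0) {
    skipExhausted();
  }
  const Stage& operator*() const {
    assert(msIdx_ < ms_->size() && "dereferencing the end iterator");
    return (*ms_)[msIdx_].stages[stageIdx_];
  }
  StageIterator& operator++() {
    ++stageIdx_;
    skipExhausted();
    return *this;
  }
  bool operator!=(const StageIterator& other) const {
    return msIdx_ != other.msIdx_ || stageIdx_ != other.stageIdx_;
  }
};

// Range adaptor so the iterator works in a range-based for loop.
struct StageRange {
  const std::vector<MultiStage>& ms;
  StageIterator begin() const { return StageIterator(ms, 0); }
  StageIterator end() const { return StageIterator(ms, ms.size()); }
};
```

With this, `for(const Stage& s : StageRange{multiStages})` visits every stage in program order without a nested loop at the call site.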

Add support for customizing the type of backend and temporary_storage

It is currently not possible for users to provide their own typedef for the backend or temporary_storage. This could be solved by templating the stencil wrapper (on the backend and/or temporary storage type) or by providing a macro.
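
The templating option might look like the following sketch. All names here (`HostBackend`, `CudaBackend`, `DefaultTmpStorage`, `StencilWrapper`) are hypothetical placeholders; the real GridTools backend and storage types are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical backend tags.
struct HostBackend { static const char* name() { return "host"; } };
struct CudaBackend { static const char* name() { return "cuda"; } };

// Hypothetical default temporary storage type.
using DefaultTmpStorage = std::vector<double>;

// The generated wrapper takes the backend and the temporary storage type as
// template parameters with defaults, so existing user code keeps compiling.
template <class Backend = HostBackend, class TmpStorage = DefaultTmpStorage>
class StencilWrapper {
  TmpStorage tmp_;

public:
  explicit StencilWrapper(int size) : tmp_(size) { assert(size > 0); }
  std::string backendName() const { return Backend::name(); }
  std::size_t tmpSize() const { return tmp_.size(); }
};
```

A user who is happy with the defaults writes `StencilWrapper<>`, while `StencilWrapper<CudaBackend, std::vector<float>>` overrides both choices.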

KCaches

The field in of the following example is not being k-cached

    vertical_region(k_start, k_end-1) {
        in = 3.5;
        out = in[i+1] + in;
}

because the condition here does not pass (not temporary and output)
https://github.com/MeteoSwiss-APN/dawn/blob/master/src/dawn/Optimizer/PassSetCaches.cpp#L231

Wrong code generation of GT interval

In the following example:

vertical_region(k_start, k_start) {}
vertical_region(k_start+1, k_end) {
  // code
}

the empty interval for k_start is required, otherwise it generates an interval<0,0> for GT

https://github.com/MeteoSwiss-APN/clang-gridtools/blob/673a219112bcae1841bd199b9aa1c7eedf2af04b/src/dycore/horizontal_advection_wwcon.cpp#L95

we should remove that hack in the wwcon (and probably others) once GT solves this issue

Temporary to Stencil Functions

This is a list of pending issues:

In here:
https://github.com/cosunae/dawn/blob/temporary_to_stencil_functions/src/dawn/Optimizer/PassTemporaryToStencilFunction.cpp#L78
this would assert in case of a stmt w/o assignment
field++;
Also in case of blockstmt, if, etc.
Currently new sir::StencilFunction are added to the SIR. We should not modify the SIR, once we refactor the internal IIR

Stencil Functions trailing ; in code gen

The trailing ; are not removed from stencil functions in code-gen leaving bad looking code

Specification of precision of local variables

Technical Description

There might be use-cases where the user wants to specify the precision of specific pieces of computation. The idea is to enhance the language with var[double] or var[float] to achieve this. We do not want to fall back on double / float as this still needs to go through the checks of local variables vs temporary storages.

Bug in -write-sir with block

When there is a block like if, or {} , sir fails to write

    vertical_region(k_start, k_end) {
      out = u[i+1];
      out += u[j-1];
      if(out==1) {
//        out *= u[k+2]+2.4;
//        out -= u[k-1];
        out=2;
      }
    }

codegen precision

Currently we codegen clang::gridtools::float_type, which can be specified at runtime. If a precision is specified, it should be code-generated as such.

protection for uninitialized tmp accesses

In the following example:
the temporary u is not initialized in many vertical regions. We would need a protection that issues an error.

stencil compute_extent_test_stencil {
  storage in, out1, out2;

  var u;
  Do {
    vertical_region(k_start+2, k_start+3)
      u = in;
    vertical_region(k_start+2,k_start+10)
      out1 = u[k+2] + u[k+1];
    vertical_region(k_start,k_start+8)
      out2 = u[k+6];
  }
};

PassSetSyncStage using graph

PassSetSyncStage currently uses a linear traversal of the stages to detect whether we need synchronization. That is correct but can sometimes add more syncs than required.
We would like to use an algorithm on a DAG.

for(stage: stages)
  if(stage has edge) 
    add_sync
    remove all edges of stage
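
A runnable reading of the pseudocode above, under stated assumptions: an edge `(from, to)` means that stage `to` reads what stage `from` produced, and a sync placed before a stage satisfies every dependency whose producer ran earlier. `stagesNeedingSync` is a hypothetical name, and this is only one interpretation of "remove all edges of stage".

```cpp
#include <cassert>
#include <utility>
#include <vector>

// An edge (from, to) means stage `to` reads data produced by stage `from`,
// with from < to in program order.
using Edge = std::pair<int, int>;

// Returns the indices of the stages that need a synchronization before them.
inline std::vector<int> stagesNeedingSync(int numStages, std::vector<Edge> edges) {
  std::vector<int> syncBefore;
  for(int stage = 0; stage < numStages; ++stage) {
    bool hasIncoming = false;
    for(const Edge& e : edges) {
      assert(e.first < e.second && "edges must point forward in program order");
      if(e.second == stage)
        hasIncoming = true;
    }
    if(!hasIncoming)
      continue;

    syncBefore.push_back(stage);
    // The sync satisfies every dependency whose producer ran before `stage`,
    // so those edges are dropped and cannot trigger another sync later.
    std::vector<Edge> remaining;
    for(const Edge& e : edges)
      if(e.first >= stage)
        remaining.push_back(e);
    edges.swap(remaining);
  }
  return syncBefore;
}
```

For the edges 0→2, 1→2 and 1→3 this places a single sync before stage 2, because that sync already covers the dependency of stage 3 on stage 1.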

Static asserts are overly cautious

Currently we merge the field extents with the stage extents and add those up for the static asserts. This is too greedy, since we do not consider the Do-method intervals when computing halos in the k-dimension. See the example below: we get an assert that we need a halo of 2, even though this comes from the start_start interval:

stencil foo {
  storage a, b, c;
  storage out;

  Do {
    vertical_region(k_start, k_start) {
      a = b;
    }
    vertical_region(k_start + 1, k_end) {
      c = b;
    }
    vertical_region(k_start, k_start) {
      out = c + a[k + 2, j + 1];
    }
  }
};

demote temporary field to variable even if used in function

Here we could demote the temp field to a temp variable even if used in function,
as far as it is not used with an extent

dawn/dawn/src/dawn/Optimizer/PassTemporaryType.cpp

Line 183 in fa22a7d

 temporary.extent_.isPointwise() && !usedAsArgumentInStencilFun(stencilPtr, accessID) && 

illegal redundant computation with `+=` on fields

The following state of IIR should be illegal:
+= on a field (not temporary) in a stage with non-null extents

stage0 <extent<0,1> > {
  u += 3.14;
}
stage1 {
  res = sum(i+1, u);
}

A possible solution would be to run a second pass of the field versioning, after the extent of the stages has been computed.

bug in field versioning pass

In the following gtclang example, u, v, w and pp should be versioned, since the output is in the same storage as the input. Instead, the field versioning pass crashes:

#include "gridtools/clang_dsl.hpp"

using namespace gridtools::clang;

stencil_function avg {
  offset off;
  storage in;

  Do { return 0.5 * (in[off] + in); }
};

stencil_function delta {
  offset off;
  storage data;

  Do { return data[off] - data; }
};

stencil_function laplacian {
  storage data, crlato, crlatv;

  Do {
    return data[i + 1] + data[i - 1] - 2.0 * data +
           crlato * delta(j + 1, data) + crlatv * delta(j - 1, data);
  }
};

stencil_function diffusive_flux_x {
  storage lap, data;

  Do {
    const double flx = delta(i + 1, lap);
    return (flx * delta(i + 1, data)) > 0.0 ? 0.0 : flx;
  }
};

stencil_function diffusive_flux_y {
  storage lap, data, crlato;

  Do {
    const double fly = crlato * delta(j + 1, lap);
    return (fly * delta(j + 1, data)) > 0.0 ? 0.0 : fly;
  }
};

#pragma gtclang no_codegen
stencil type2 {
  storage data, crlato, crlatu, hdmask;
  var lap;

  Do {
    vertical_region(k_start, k_end) {
      lap = laplacian(data, crlato, crlatu);
      const double delta_flux_x = diffusive_flux_x(lap, data) -
                                  diffusive_flux_x(lap[i - 1], data[i - 1]);
      const double delta_flux_y =
          diffusive_flux_y(lap, data, crlato) -
          diffusive_flux_y(lap[j - 1], data[j - 1], crlato[j - 1]);
      data = data - hdmask * (delta_flux_x + delta_flux_y);
    }
  }
};

stencil horizontal_diffusion_type2 {
  // output
  storage u, v, w, pp;
  // input
  storage crlato, crlatu, hdmask;

  Do {
    type2(u, crlato, crlatu, hdmask);
    type2(v, crlato, crlatu, hdmask);
    type2(w, crlato, crlatu, hdmask);
    type2(pp, crlato, crlatu, hdmask);
  }
};

(gdb)
(gdb) bt
#0 0x00007ffff6811e70 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007ffff6802ec6 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x000000000163754e in dawn::DiagnosticsBuilder::operator<< <std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (this=0x7fffffffc0a0,
value=<error: Cannot access memory at address 0x6d68636e65622f77>) at /code/dawn/src/dawn/Compiler/DiagnosticsMessage.h:90
#3 0x0000000001657e69 in dawn::(anonymous namespace)::reportRaceCondition (statement=..., instantiation=0x22a49c0) at /code/dawn/src/dawn/Optimizer/PassFieldVersioning.cpp:84
#4 0x0000000001658cd2 in dawn::PassFieldVersioning::fixRaceCondition (this=0x22a5c80, graph=0x22a5e00, stencil=..., doMethod=..., loopOrder=dawn::LoopOrderKind::LK_Forward, stageIdx=3, index=2)
at /code/dawn/src/dawn/Optimizer/PassFieldVersioning.cpp:237
#5 0x000000000165833e in dawn::PassFieldVersioning::run (this=0x22a5c80, stencilInstantiation=0x22a49c0) at /code/dawn/src/dawn/Optimizer/PassFieldVersioning.cpp:129
#6 0x00000000016732be in dawn::PassManager::runPassOnStecilInstantiation (this=0x22a5dc8, instantiation=0x22a49c0, pass=0x22a5c80) at /code/dawn/src/dawn/Optimizer/PassManager.cpp:47
#7 0x000000000167314f in dawn::PassManager::runAllPassesOnStecilInstantiation (this=0x22a5dc8, instantiation=0x22a49c0) at /code/dawn/src/dawn/Optimizer/PassManager.cpp:36
#8 0x00000000016336af in dawn::DawnCompiler::runOptimizer (this=0x7fffffffce90, SIR=0x22a3da0) at /code/dawn/src/dawn/Compiler/DawnCompiler.cpp:167
#9 0x0000000001633bc7 in dawn::DawnCompiler::compile (this=0x7fffffffce90, SIR=0x22a3da0, codeGen=dawn::DawnCompiler::CG_GTClangNaiveCXX) at /code/dawn/src/dawn/Compiler/DawnCompiler.cpp:192
#10 0x0000000000a4e47e in gtclang::GTClangASTConsumer::HandleTranslationUnit (this=0x1e2e410, ASTContext=...) at /code/gtclang/src/gtclang/Frontend/GTClangASTConsumer.cpp:143
#11 0x000000000141cc2a in clang::ParseAST(clang::Sema&, bool, bool) ()
#12 0x0000000000e2f66e in clang::FrontendAction::Execute() ()
#13 0x0000000000e05146 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) ()
#14 0x0000000000a16671 in gtclang::Driver::run (args=...) at /code/gtclang/src/gtclang/Driver/Driver.cpp:69
#15 0x0000000000a15a3d in main (argc=5, argv=0x7fffffffdac8) at /code/gtclang/src/gtclang/Driver/gtclang.cpp:21

kcache intervals

Currently, the kcaches are identified based on non-pointwise accesses in the vertical
and the interval where the field is accessed.
Sometimes the field is accessed with, for example, extent [0,1] in interval1, but with pointwise access in interval2.

The kcache is still used and generated for interval2, even though it does not make sense in that case.

Example from vertical_diffusion_T:

    // Center fill of kcaches
    if (iblock >= 0 && iblock <= block_size_i - 1 + 0 && jblock >= 0 && jblock <= block_size_j - 1 + 0) {
      pia_kcache[0] = gridtools::clang::math::pow(
          ((gridtools::clang::float_type)1.0000000000000001E-5 * __ldg(&(p_s[idx110]))),
          ((gridtools::clang::float_type)287.05000000000001 / (gridtools::clang::float_type)1005));
    }
    // Flush of kcaches
    if (iblock >= 0 && iblock <= block_size_i - 1 + 0 && jblock >= 0 && jblock <= block_size_j - 1 + 0) {
      if (ksize - 1 + 1 - k >= 1) {
        pia[idx111 + stride_111_2 * 1] = pia_kcache[1];
      }
    }

assert instead of ignoring non-temp caches

The line here can be reached with a stencil of the shape

stencil foo {
  storage a, b, c;

  Do {
    vertical_region(k_start, k_start) {
      a = b[k + 2];
    }
    vertical_region(k_start + 1, k_end) {
      c = a;
      b = a;
    }
  }
};

I don't think this code should be considered illegal; instead of asserting, the pass should resolve the problem. Am I missing something?

Promotion of Local Variables

If we want to promote local variables to temporaries and their first occurrence is in an if-stmt, the pass breaks:

stencil Stencil {
  storage out;
  var temporary;
  Do {
    vertical_region(k_start, k_end) {
      if(false) {
        temporary = 0;
      }
      out = temporary;
    }
  }
};

This should be addressed in the checks in

dawn/src/dawn/IIR/StencilInstantiation.cpp

Line 384 in 1295420

ExprStmt* exprStmt = dyn_cast<ExprStmt>(oldStatement->ASTStmt.get());

and should handle these cases

error in dump()

dump() will crash in vertical_diffusion_dqvdt.cpp
because getNameFromAccessID() crashes (the accessID is not registered)
for the local variable increment (shown below as NOTFOUND):

  Stencil_0
  {
    MultiStage_0 [parallel]
    {
      Stage_0
      {
        Do_0 { Start : End }
        {
          float_type increment = ((fun-call:gridtools::clang::math::max(0, QV[0, 0, 0]) - QV_nnow[0, 0, 0]) * zdtr);
            Write Accesses:
              __local_increment_16 : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              QV_nnow : [(0, 0), (0, 0), (0, 0)]
              zdtr : [(0, 0), (0, 0), (0, 0)]
              QV : [(0, 0), (0, 0), (0, 0)]
              0 : [(0, 0), (0, 0), (0, 0)]

          if((true && (1 > 0)))
          {
            dqvdt[0, 0, 0] += increment;
          }
          else
          {
            dqvdt[0, 0, 0] = increment;
          }
            Write Accesses:
              dqvdt : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              __local_increment_16 : [(0, 0), (0, 0), (0, 0)]
              NOTFOUND : [(0, 0), (0, 0), (0, 0)]
              NOTFOUND : [(0, 0), (0, 0), (0, 0)]
              0 : [(0, 0), (0, 0), (0, 0)]
              dqvdt : [(0, 0), (0, 0), (0, 0)]

        }
        Extents: [(0, 0), (0, 0), (0, 0)]
      }
    }
  }

illegal field versioning

Technical issue

If field versioning is applied to an input field that has not been written to before, no synchronization happens. This leads to the input data being completely ignored.

Example

stencil test {
  storage a;
  Do {
    vertical_region(k_start, k_end) { 
        a += a[i + 1];
    }
  }
};

The output of the field (initialized with one) is 1 rather than 2.
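
The expected behaviour can be illustrated on a 1D field. The sketch below is not dawn's generated code: `applyVersioned` is a hypothetical helper, and the assumption that a missing copy leaves the version at zero is only one way to arrive at the reported value of 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluate `a += a[i + 1]` on a 1D field whose last point acts as a halo.
// `copyInput` controls whether the version is filled from `a` beforehand.
inline std::vector<double> applyVersioned(std::vector<double> a, bool copyInput) {
  assert(a.size() >= 2 && "need at least one interior point plus a halo point");
  // The version holds the values that are read with an offset.
  std::vector<double> version(a.size(), 0.0);
  if(copyInput)
    version = a; // the synchronization step that the issue reports as missing
  for(std::size_t i = 0; i + 1 < a.size(); ++i)
    a[i] += version[i + 1];
  return a;
}
```

With a field of ones, the interior points become 2 when the version is filled from the input and stay at 1 when it is not.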

Unary operators are not supported for storages

Technical Description

Unary operators are not allowed in gtclang if applied on storages but for temporaries (built-in types) it works fine.

Example

#include "gridtools/clang_dsl.hpp"

using namespace gridtools::clang;

stencil globals_stencil {
  storage out;

  Do {
    vertical_region(k_start, k_end) {
      out[i, j, k]++;
    }
  }
};

namespace-comments miss last letter

in code-gen, the last letter of the closing comments for namespaces is dropped

Static Asserts in GT-codegen dependent on the size (including halo) of Storages

Technical Description

We have the full extent information of any storages used in stencils. In order to avoid (hard to read) Gridtools-runtime errors, we could add static asserts to the constructor of the class holding the stencils
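
A sketch of what such a constructor check could look like, assuming the halo width is available as a compile-time property of the storage type. `Storage`, `GeneratedStencil` and the required halo values are made up for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical storage whose halo widths are part of its type.
template <int HaloI, int HaloJ>
struct Storage {
  static constexpr int haloI = HaloI;
  static constexpr int haloJ = HaloJ;
  std::vector<double> data;
};

// Hypothetical generated stencil class. The required halos would be emitted
// by the code generator from the extents it computed for each field.
struct GeneratedStencil {
  static constexpr int requiredHaloI = 2;
  static constexpr int requiredHaloJ = 1;

  template <class StorageT>
  explicit GeneratedStencil(const StorageT& in) {
    static_assert(StorageT::haloI >= requiredHaloI,
                  "storage halo in i is too small for this stencil");
    static_assert(StorageT::haloJ >= requiredHaloJ,
                  "storage halo in j is too small for this stencil");
    assert(!in.data.empty() && "storage must be allocated");
  }
};
```

Passing a `Storage<1, 1>` to the constructor would then fail at compile time with a readable message instead of a runtime error.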

Staggering Fields leads to other use of halos in K

Currently we do not have halos in K, as this would require new types of fields. In order to have production code, we extend the domain by one. This leads to a minor change in the static asserts for the storages, which needs to be removed once ESCAPE solves the frontend-language problem.

Add support for defining stencil functions in headers

Add support for defining stencil functions in header files (currently we only parse stencil functions in the main file of the translation unit).

Issue with Promoting Temporaries

If you call

stencil_function footest {
  storage a, b;
  var c;
  Do {
    var d;
    b = a[i + 1];
    c = b[i + 1];
    d = a;
    b = d + a;
  }
};

stencil Test04 {
  storage a, b;
  var bar;
  void Do() {
    vertical_region(k_start, k_end) {
      bar = 20;
      footest(a, b, bar);
    }
  }
};

you get a compile-time assertion as the promotion fails

Jenkins tests are failing due to protobuf XML's

[xUnit] [INFO] - [GoogleTest-1.6] - 25 test report file(s) were found with the pattern '**/*.xml' relative to '/scratch/snx3000/jenkins/workspace/dawn/build_type/debug/label/daint' for the testing framework 'GoogleTest-1.6'.
[xUnit] [ERROR] - Test reports were found but not all of them are new. Did all the tests run?.

/scratch/snx3000/jenkins/workspace/dawn/build_type/debug/label/daint/bundle/build/protobuf-prefix/src/protobuf/java/compatibility_tests/v2.5.0/deps/pom.xml is 10 mo old
....

Bug in stage extents

The following example does not properly compute extents of stages:

stencil hori_diff_stencil {
  storage b, a;
  var tmp;

  Do {
    vertical_region(k_start, k_start) {
      tmp = a;
    }
    vertical_region(k_start+1, k_end-1) {
      tmp = tmp[k-1];
    }
    vertical_region(k_end, k_end) {
      tmp = tmp[k-1]+a;
    }

    vertical_region(k_start, k_start) {
      b = tmp[i+2];
    }
    vertical_region(k_start+1, k_end-4) {
      b = tmp[k-1,i+2];
    }
    vertical_region(k_end-3, k_end) {
      b = tmp[k-1,i-1]+a;
    }
  }
};

  Stencil_0
  {
    MultiStage_0 [forward]
    {
      Stage_0
      {
        Do_0 { Start : Start }
        {
          tmp[0, 0, 0] = a[0, 0, 0];
            Write Accesses:
              tmp : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              a : [(0, 0), (0, 0), (0, 0)]

        }
        Do_1 { Start+1 : End-1 }
        {
          tmp[0, 0, 0] = tmp[0, 0, -1];
            Write Accesses:
              tmp : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              tmp : [(0, 0), (0, 0), (-1, 0)]

        }
        Extents: [(-1, 0), (0, 0), (-1, 0)]
      }
      Stage_1
      {
        Do_0 { Start : Start }
        {
          b[0, 0, 0] = tmp[2, 0, 0];
            Write Accesses:
              b : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              tmp : [(0, 2), (0, 0), (0, 0)]

        }
        Do_1 { Start+1 : End-4 }
        {
          b[0, 0, 0] = tmp[2, 0, -1];
            Write Accesses:
              b : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              tmp : [(0, 2), (0, 0), (-1, 0)]

        }
        Do_2 { End : End }
        {
          tmp[0, 0, 0] = (tmp[0, 0, -1] + a[0, 0, 0]);
            Write Accesses:
              tmp : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              a : [(0, 0), (0, 0), (0, 0)]
              tmp : [(0, 0), (0, 0), (-1, 0)]

        }
        Extents: [(-1, 0), (0, 0), (-1, 0)]
      }
      Stage_2
      {
        Do_0 { End-3 : End }
        {
          b[0, 0, 0] = (tmp[-1, 0, -1] + a[0, 0, 0]);
            Write Accesses:
              b : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              a : [(0, 0), (0, 0), (0, 0)]
              tmp : [(-1, 0), (0, 0), (-1, 0)]

        }
        Extents: [(0, 0), (0, 0), (0, 0)]
      }
    }
  }

The line tmp[i+2]; should request an extent of at least i+2 for the stage computing tmp

Separation of Concerns between Cuda-Codegen and PassSetCaches

The most recent implementation of more advanced k-caches (coming with PR #158) mixed the algorithmic part of determining caches with the codegen. This part should be moved to the caching-pass.

We've decided to do this separately to not clutter up the PR too much

Literal expression derived from constexpr globals not found in StencilFunctionInstantiation

Error triggering from horizontal_advection.

The problem was fixed with this hack

https://github.com/MeteoSwiss-APN/dawn/blob/master/src/dawn/IIR/StencilFunctionInstantiation.cpp#L320

The problem is that globals with a constexpr value are replaced by literals but are not inserted in the StencilFunctionInstantiation (if used there), only in the StencilInstantiation.

might be related to #88

Fix Splitting dependent on Max-Halo

Technical Description

Currently, we greedily merge multistages until we either hit a synchronisation issue or the halo extent becomes bigger than the specified max-halo. The former is correct, but the latter leads to wrong results.

Example:

out of

#include "gtclang_dsl_defs/gtclang_dsl.hpp"

using namespace gtclang::dsl;
stencil test {

  storage in, out, thrid;
  Do {
    vertical_region(k_start, k_end) { out = in; }
    vertical_region(k_start, k_end) { thrid = out[i + 1]; }
  }
};

with the arguments -max-fields=1 -fsplit-stencils we currently create:

 Stencil_0
  {
    MultiStage_0 [parallel]
    {
      Stage_0
      {
        Do_0 { Start : End }
        {
          out[<no_horizontal_offset>,0] = in[<no_horizontal_offset>,0];
            Write Accesses:
              out : [<no_horizontal_extent>,(0,0)]
            Read Accesses:
              in : [<no_horizontal_extent>,(0,0)]

        }
        Extents: [<no_horizontal_extent>,(0,0)]
      }
    }
  }
  Stencil_1
  {
    MultiStage_0 [parallel]
    {
      Stage_0
      {
        Do_0 { Start : End }
        {
          thrid[<no_horizontal_offset>,0] = out[1,0,0];
            Write Accesses:
              thrid : [<no_horizontal_extent>,(0,0)]
            Read Accesses:
              out : [(1,1),(0,0),(0,0)]

        }
        Extents: [<no_horizontal_extent>,(0,0)]
      }
    }
  }

We would need to apply boundary conditions between these stencils. Since these are currently disabled, we should not be able to split stencils here in the first place.

Compiler flag for report of passes

In order to minimize the amount of options shown by the compiler, we would like to have the convention

-freport-pass-<PassName>

and additionally a -freport-pass-all. Then the individual report flags don't need to be shown in the --help output.
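
A minimal sketch of how such a flag convention could be parsed generically. `ReportOptions` and `parseReportFlags` are hypothetical, and the pass names used with it (for example `FieldVersioning`) are placeholder spellings, not dawn's actual option names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Collected report settings: either everything, or a set of pass names.
struct ReportOptions {
  bool all = false;
  std::set<std::string> passes;

  bool enabled(const std::string& passName) const {
    assert(!passName.empty() && "pass name must not be empty");
    return all || passes.count(passName) > 0;
  }
};

// Pick up every argument of the form -freport-pass-<PassName>; the special
// name "all" enables the report of every pass.
inline ReportOptions parseReportFlags(const std::vector<std::string>& args) {
  const std::string prefix = "-freport-pass-";
  ReportOptions opts;
  for(const std::string& arg : args) {
    if(arg.compare(0, prefix.size(), prefix) != 0)
      continue; // not a report flag
    const std::string name = arg.substr(prefix.size());
    if(name == "all")
      opts.all = true;
    else if(!name.empty())
      opts.passes.insert(name);
  }
  return opts;
}
```

Because the pass name is taken from the flag itself, no per-pass option has to be registered or listed in the help text.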

Mutations for metaheuristics

This is an issue to discuss which mutations to consider, how to perform them on the IIR, and what functionality we need to apply them:

<i,j,k> block sizes: We need to add these parameters to the IIR level
stencil function <-> tmp precomputation replacement. We would need to configure the Pass so that it replaces a single (random?) tmp computation in either direction.
Elemental EPU transformation: In a DAG of EPUs, take a single EPU inside a MS into another MS. We need a new pass that does that. And we will need to re-run some passes inside the modified MSs, like the fusion of stages, identification of caches, etc.
we need to add EPUs label as a derived info of the StatementAccessPair level. We need to use the ms splitting and stage with maxSplit option in such a way that does not reorganize the stms but rather extracts EPUs labels. We also need to build the DAG of EPUs
Cache mutation pass: The IIR will contain a subset of caches that is generated by the Caches Pass. We need to either remove one cache from the IIR or re-run the Caches pass and from the
Level Fusion of Stages: Sometimes fusing all the statements that the dependencies allow will create artificially large redundant computations, which could be avoided with a more fine-grained split of stages. We are not sure how we can experiment with multiple permutations here; it can probably be left for a second stage.

More???

Naive CodeGen does a lot more synchronization than needed

Currently, all the fields of any given stencil are synchronized before every MSS. See here. This could be reduced to only include the ones that are needed.

bug in SIR serializer with if/else blocks

The following gtclang code produces a SIR with if blocks (no else).
The absence of an else triggers a crash in
https://github.com/MeteoSwiss-APN/dawn/blob/master/src/dawn/SIR/SIRSerializer.cpp#L292

stencil hori_diff_stencil {
  storage u, out, coeff;

  var flx, fly, lap;
  Do {
    vertical_region(k_start, k_end) {
      lap = u[i + 1] + u[i - 1] + u[j + 1] + u[j - 1] - 4.0 * u;
      flx = lap[i+1] - lap;
      if (flx * (u[i+1] - u) > 0)
        flx = 0.;
      fly = lap[j+1] - lap;
      if (fly * (u[j+1] - u) > 0)
        fly = 0.;
      out = u - coeff * (flx - flx[i-1] + fly - fly[j-1]);
    }
  }
};

naive code requires BOOST_MPL_INCLUDE

After the update to the new Gridtools version, stencils require BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS to be set for unknown reasons. For example, horizontal_diffusion_limiter breaks.

Bug parsing

parsing the following

double sqrtgrhor;
storage kh, vdtch;
      var gct = computeGct(kh, vdtch) * sqrtgrhor;

where computeGct is a stencil function, I get the following error

../src/dycore/vertical_diffusion_T.cpp:176:7: error: only single declarations are currently supported: expected ; got ,
      var gct = computeGct(kh, vdtch) * sqrtgrhor;

Diagnostics: Stack trace

During detection of unresolvable race conditions a proper stack trace should be issued to the user.

(partially implemented)

fix var for non void do-methods

Technical Description

The current GTClang syntax supports three ways of writing Do-methods:

Do { ...}
void Do {...}
void Do() {...}

The preprocessor always rewrites versions 1 and 2 to read void Do(). If temporary variables are declared inside the Do-method, the preprocessor messes up the replacement and only variant 3 works.

Example

// RUN: %gtclang% %file% -dump-pp

#include "gridtools/clang_dsl.hpp"
using namespace gridtools::clang;

stencil test {
  storage a;
  Do {
    vertical_region(k_start, k_end) {
      var ee = 2;
      a += a[i + 1] + ee;
    }
  }
};

IIR refactoring TODO

The fields store the extent and extentRB, starting from the Stage level. Additionally, StatementAccessPair and DoMethod compute a similar quantity with computeMaximumExtent(). We should probably make the latter a derived info, pushing it to the level of the statement access or Do-method. I am not sure, though, whether they actually produce the same numbers, since the method takes into account block if/else statements; we would need to double check.
computeEnclosingAccessInterval is another candidate to move into Fields or derived info?
Currently StencilInstantiation contains an IIR. We discussed that it should be the other way around: the current StencilInstantiation could be the DerivedInfo of an IIR. We first need to extract what of the StencilInstantiation really belongs to the IIR and what is derived info.
One more consideration, we also want to have a StencilInstantiation and StencilFunctionInstantiation inheriting from the same base class. Additionally the StencilInstantiation captures global info that is used by multiple levels of the tree. See for example, the Stage is using the maps stored in StencilInstantiation. Therefore I think we need to distinguish between derived info and global info. For example, GlobalVariableAccessIDSet_ is not a derived info, on the contrary it is used by other levels, like the Stage in order to compute derived info. Derived info should be information that is solely computed with the information of the IIR of the current level and children. And list of access ids of global variables is not, it is computed from the input HIR ? So I would propose to :

Separate from StencilInstantiation what really belongs to the IIR
Create a Context object with all the global maps information (that is not derived info), like GlobalVariableAccessIDSet_ . Additionally it contains methods, and getters like getNameFromAccessID, etc... We can then have StencilContext and StencilFunctionContext inheriting from Context base class. Other levels of the tree and visitors can use a Context.
Finally create a derived info that contains the derived info from the IIR of the current level and below.

In PassStencilSplitter, we return in some methods a container of iir::Stencil. We should create a new IIR object

Greedy Algorithm for StageExtents needs optimization

Technical Description

Proper stage extent propagation is not working, since one of the patterns assumed to be illegal when designing the algorithm is actually legal: we can encounter a read before write within a stage:

If we had a stencil of this sort:

vertical_region(k_start, k_end){ 
  a = b[k-1];
  c = a + 10;
  b = a + c;
}

we would have an iterative process that requires the reading of b beforehand. The current algorithm breaks here.
With PR #103 we resolve this issue to maintain legality, but move to a greedier algorithm. We would need the following (functionality is not all in place):

auto readInterval = computeReadAccessInterval(accessID);
        if(readInterval.empty())
            continue;
        Extents fieldExtent = fromFieldExtents;
        fieldExtent.expand(stageExtent);
        for(int j = i - 1; j >= 0; --j) {
          Stage& toStage = *(stencil.getStage(j));
            if(!readInterval.overlaps(toStage.interval()))
              continue;
      ......

1 argument stencil function with no return

Currently it is not possible to parse stencil functions which have 1 argument and no return statements

foo(a);

The reason is this is interpreted as a variable declaration (clang::VarDecl) i.e

foo a;

which shadows the type of a (it is thus not possible to determine if a was a storage without comparing the name to the members of the stencil)
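
The parsing rule behind this can be reproduced in plain C++. In the sketch below, `foo` is a stand-in struct playing the role of a stencil function, and `parseDemo` is a hypothetical name; compilers may emit a warning about the redundant parentheses.

```cpp
#include <cassert>

// `foo` plays the role of a stencil function, which the DSL models as a type.
struct foo {
  int tag = 42;
};

inline int parseDemo() {
  int a = 1; // outer `a`, standing in for a storage member of the stencil
  assert(a == 1);
  {
    foo(a);       // parsed as the declaration `foo a;`, not as a call with `a`
    return a.tag; // `a` now names the new foo object and shadows the outer int
  }
}
```

Since the inner `a` is a fresh object of type `foo`, the information that the outer `a` was an `int` (or, in the DSL, a storage) is no longer reachable through the declaration itself.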

Reduce Redundant Computation on GPU

Technical Description

If the computation demands it, we increase the local compute domain to ensure correctness.
If this dependency ends up being split across Multistages, it leads to unnecessary computation on the GPU, since every Multistage is its own kernel call. Ideally we would only extend the global compute domain and ignore the locally increased extent.

Example

#include "gtclang_dsl_defs/gtclang_dsl.hpp"

using namespace gtclang::dsl;
stencil test {

  storage in, out, thrid;
  Do {
    vertical_region(k_start, k_end) {
      out = in[j + 1] + out[k - 1];
      thrid = out[i + 1, k + 1];
    }
  }
};

Generates

Stencil_0
  {
    MultiStage_0 [forward]
    {
      Stage_0
      {
        Do_0 { Start : End }
        {
          out[<no_horizontal_offset>,0] = (in[0,1,0] + out[<no_horizontal_offset>,-1]);
            Write Accesses:
              out : [<no_horizontal_extent>,(0,0)]
            Read Accesses:
              out : [<no_horizontal_extent>,(-1,-1)]
              in : [(0,0),(1,1),(0,0)]

        }
        Extents: [(0,1),(0,0),(0,0)]
      }
    }
    MultiStage_1 [parallel]
    {
      Stage_0
      {
        Do_0 { Start : End }
        {
          thrid[<no_horizontal_offset>,0] = out[1,0,1];
            Write Accesses:
              thrid : [<no_horizontal_extent>,(0,0)]
            Read Accesses:
              out : [(1,1),(0,0),(1,1)]

        }
        Extents: [<no_horizontal_extent>,(0,0)]
      }
    }
  }

syntax for output parameters to function calls

cosunae:
With the C++17 structured bindings feature (supposedly available in clang 4)
https://clang.llvm.org/cxx_status.html

we can improve the semantic of input/output in the function call

auto [u_tens_stage, v_tens_stage] = horizontal_advection::uv( u_stage, v_stage, tgrlatda0,
tgrlatda1);
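
In plain C++17 the call syntax above works when the function returns a tuple. The sketch below uses a stand-in `horizontal_advection::uv` with placeholder arithmetic; the real stencil function, its argument types and its computation are not modelled, and `example` is a hypothetical caller.

```cpp
#include <cassert>
#include <tuple>

namespace horizontal_advection {
// Stand-in with placeholder arithmetic: both outputs are returned as a tuple.
inline std::tuple<double, double> uv(double u_stage, double v_stage,
                                     double tgrlatda0, double tgrlatda1) {
  return std::make_tuple(u_stage * tgrlatda0, v_stage * tgrlatda1);
}
} // namespace horizontal_advection

inline double example() {
  // Outputs appear on the left-hand side, inputs in the argument list.
  auto [u_tens_stage, v_tens_stage] = horizontal_advection::uv(1.0, 2.0, 3.0, 4.0);
  assert(u_tens_stage == 3.0 && v_tens_stage == 8.0);
  return u_tens_stage + v_tens_stage;
}
```

This requires compiling with `-std=c++17` or later.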

global generation bug

The global member of
https://github.com/MeteoSwiss-APN/gtclang/blob/master/test/integration-test/CodeGen/globals_stencil.cpp#L23
is generated as int instead of double since the type is taken from the literal (2)
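
The effect of taking the type from the literal can be seen in a small C++ sketch, with `auto` playing the role of the type deduction and `showDifference` being a hypothetical name:

```cpp
#include <cassert>
#include <type_traits>

// The literal 2 has type int, so a type derived from it alone is int.
static_assert(std::is_same<decltype(2), int>::value, "literal 2 is an int");

inline void showDifference() {
  auto from_literal = 2;    // deduced from the initializer: int
  double from_declared = 2; // declared type wins: double
  assert(from_literal / 4 == 0);    // integer division
  assert(from_declared / 4 == 0.5); // floating-point division
}
```

A global that is meant to be a double but is generated as int would silently switch expressions like these to integer arithmetic.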
