dawn's People

Contributors

benweber42, cosunae, eddie-c-davis, havogt, jdahm, lukasm91, mroethlin, muellch, rupertford, samkellerhals, stagno, stefanmoosbrugger, thfabian, twicki

dawn's Issues

Add support for splitting code generation into multiple translation units

When compiling large files it would be convenient to generate multiple .cpp files instead of just one.

Reasoning:

  • Improve compilation speed by leveraging multi-threaded compilation.
  • Allow incremental builds, i.e. only generate the files if necessary. E.g., if only one stencil is changed, we only need to regenerate the corresponding translation unit. To check whether a file needs to be touched, we could simply compute a hash and compare.

Possible problems:

  • It is unknown how to integrate this nicely with the CMake module as we do not a priori know how many files will be generated. A solution could be to set a fixed number of files which need to be generated.
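
The hash check could look roughly like the sketch below. The names `fnv1aHash` and `GenerationCache` are made up for illustration, FNV-1a is an arbitrary choice of hash, and none of this exists in dawn. In practice the hashes would have to be persisted between compiler runs, which the sketch leaves out.

```cpp
#include <cstdint>
#include <map>
#include <string>

// 64-bit FNV-1a hash of the generated code (the hash function is an assumption).
std::uint64_t fnv1aHash(const std::string& text) {
  std::uint64_t hash = 14695981039346656037ull;
  for (unsigned char c : text) {
    hash ^= c;
    hash *= 1099511628211ull;
  }
  return hash;
}

// Hypothetical helper: remembers the hash of each generated translation unit.
class GenerationCache {
public:
  // Returns true if `code` differs from what was last recorded for `unit`
  // (or if `unit` is new), and records the new hash.
  bool needsRewrite(const std::string& unit, const std::string& code) {
    const std::uint64_t hash = fnv1aHash(code);
    auto it = hashes_.find(unit);
    if (it != hashes_.end() && it->second == hash)
      return false; // leave the file untouched so the build system skips it
    hashes_[unit] = hash;
    return true;
  }

private:
  std::map<std::string, std::uint64_t> hashes_;
};
```

Only the units for which `needsRewrite` returns true would be written to disk, so unchanged .cpp files keep their timestamps.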

Caching can lead to invalid results

The fill and flush strategy does not fully take ranges into account so far. If on a certain range we need a flush (that is not an EP-Flush) but we only wrote into a part of the field while it was cached, we flush uninitialized data back to main memory, since flushes span the full domain.

To prevent this there are two options:

  • Allow for flushes on specific ranges (similar to PB-Fill / EP-Flush)
  • Do not cache once this behavior is found

demote in/out storage to temporary storage

When a user in/out storage is used as in/out within a single mss, there will be false sharing that deteriorates performance. We should version it.
An example is lap in hori_diff_stencil_01, although in this case we should check whether the storage is read before it is written; if it is not, there is no need to version it.

Invalid SourceLocation comparison

Is this a bug or intended?
https://github.com/MeteoSwiss-APN/dawn/blob/master/src/dawn/Support/SourceLocation.cpp#L25

extern bool operator==(const SourceLocation& a, const SourceLocation& b) {
  return a.Line == b.Line && b.Column == b.Column;
}

I guess it should be .. && a.Column == b.Column.
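
For reference, a self-contained version of the comparison with the suggested fix applied. The `SourceLocation` struct here is a minimal stand-in with only the two members used above, not the real class from dawn.

```cpp
// Minimal stand-in for dawn's SourceLocation (assumption: only Line and Column matter here).
struct SourceLocation {
  int Line;
  int Column;
};

// Compares both the line and the column of `a` against `b`.
bool operator==(const SourceLocation& a, const SourceLocation& b) {
  return a.Line == b.Line && a.Column == b.Column;
}

bool operator!=(const SourceLocation& a, const SourceLocation& b) {
  return !(a == b);
}
```

With the original code, two locations on the same line but in different columns compare equal, because `b.Column == b.Column` is always true.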

Bug in field versioning (in vertical reduction pattern)

The following example will version tmp, which leads to a wrong result:

stencil stencil {
  storage a, tmp, c;

  Do {
    vertical_region(k_end, k_end)
      tmp=0;

    vertical_region(k_end-1, k_start) {
      a = tmp[k+1];
      tmp = a;
      c = tmp;
    }
  }
};

The generated code is:

    struct stage_0_0 {
      using c = gridtools::accessor<0, gridtools::enumtype::inout, gridtools::extent<0, 0, 0, 0, 0, 0>>;
      using tmp = gridtools::accessor<1, gridtools::enumtype::inout, gridtools::extent<0, 0, 0, 0, 0, 0>>;
      using a = gridtools::accessor<2, gridtools::enumtype::inout, gridtools::extent<0, 0, 0, 0, 0, 0>>;
      using tmp_1 = gridtools::accessor<3, gridtools::enumtype::inout, gridtools::extent<0, 0, 0, 0, 0, 1>>;
      using arg_list = boost::mpl::vector<c, tmp, a, tmp_1>;

      template <typename Evaluation>
      GT_FUNCTION static void Do(Evaluation& eval, interval_end_0_end_0) {
        eval(tmp_1(0, 0, 0)) = (int)0;
      }

      template <typename Evaluation>
      GT_FUNCTION static void Do(Evaluation& eval, interval_start_0_end_minus_1) {
        eval(a(0, 0, 0)) = eval(tmp_1(0, 0, 1));
        eval(tmp(0, 0, 0)) = eval(a(0, 0, 0));
        eval(c(0, 0, 0)) = eval(tmp(0, 0, 0));
      }
    };

I guess tmp should not be versioned in this case

Create Iterators to loop over a StencilInstantiation's Stages

Technical Description

Currently there is no way to loop over a StencilInstantiation's Stages directly. The only way to access all stages is either a nested loop over all MultiStages and their Stages, or a loop over the stage index that looks each stage up (which itself is implemented via the nested loop). An iterator over all the Stages that finds the next MultiStage and continues with its stages could be implemented more efficiently.
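
A rough sketch of such an iterator is given below. `Stage`, `MultiStage`, `StageIterator` and `StageRange` are simplified, hypothetical stand-ins and do not mirror the actual dawn classes; the point is only to show how the nested loop can be flattened.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real classes.
struct Stage {
  std::string name;
};
struct MultiStage {
  std::vector<Stage> stages;
};

// Forward iterator that visits every Stage of every MultiStage in order.
class StageIterator {
public:
  StageIterator(const std::vector<MultiStage>& multiStages, std::size_t msIdx)
      : multiStages_(&multiStages), msIdx_(msIdx), stageIdx_(0) {
    skipExhausted();
  }

  const Stage& operator*() const { return (*multiStages_)[msIdx_].stages[stageIdx_]; }

  StageIterator& operator++() {
    ++stageIdx_;
    skipExhausted();
    return *this;
  }

  bool operator!=(const StageIterator& other) const {
    return msIdx_ != other.msIdx_ || stageIdx_ != other.stageIdx_;
  }

private:
  // Moves on to the next MultiStage while the current one has no stage left
  // (this also skips empty MultiStages).
  void skipExhausted() {
    while (msIdx_ < multiStages_->size() &&
           stageIdx_ >= (*multiStages_)[msIdx_].stages.size()) {
      ++msIdx_;
      stageIdx_ = 0;
    }
  }

  const std::vector<MultiStage>* multiStages_;
  std::size_t msIdx_;
  std::size_t stageIdx_;
};

// Lets callers write: for (const Stage& s : StageRange{multiStages}) { ... }
struct StageRange {
  const std::vector<MultiStage>& multiStages;
  StageIterator begin() const { return StageIterator(multiStages, 0); }
  StageIterator end() const { return StageIterator(multiStages, multiStages.size()); }
};
```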

Specification of precision of local variables

Technical Description

There might be use-cases where the user wants to specify the precision of specific pieces of computation. The idea is to enhance the language with var[double] or var[float] to achieve this. We do not want to fall back on double / float as this still needs to go through the checks of local variables vs temporary storages.

Bug in -write-sir with block

When the code contains a block such as an if or a bare {}, writing the SIR fails:

    vertical_region(k_start, k_end) {
      out = u[i+1];
      out += u[j-1];
      if(out==1) {
//        out *= u[k+2]+2.4;
//        out -= u[k-1];
        out=2;
      }
    }

codegen precision

Currently we code-generate gridtools::clang::float_type, which can be specified at runtime. If a precision is specified, it should be code-generated as such.

protection for uninitialized tmp accesses

In the following example, the temporary u is not initialized in many of the vertical regions where it is read. We would need a check that issues an error:

stencil compute_extent_test_stencil {
  storage in, out1, out2;

  var u;
  Do {
    vertical_region(k_start+2, k_start+3)
      u = in;
    vertical_region(k_start+2,k_start+10)
      out1 = u[k+2] + u[k+1];
    vertical_region(k_start,k_start+8)
      out2 = u[k+6];
  }
};
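
The kind of check meant here can be sketched as follows. `Interval`, `readLevels` and `isCovered` are hypothetical names, levels are counted relative to k_start, and the sketch ignores everything except the vertical direction.

```cpp
#include <algorithm>
#include <vector>

// Inclusive range of vertical levels, relative to k_start (hypothetical type).
struct Interval {
  int lo;
  int hi;
};

// Levels touched by a read with vertical offset `offset` inside `region`.
Interval readLevels(const Interval& region, int offset) {
  return {region.lo + offset, region.hi + offset};
}

// Returns true if every level in `needed` lies in at least one written interval.
bool isCovered(const Interval& needed, const std::vector<Interval>& written) {
  for (int k = needed.lo; k <= needed.hi; ++k) {
    const bool found =
        std::any_of(written.begin(), written.end(),
                    [k](const Interval& w) { return w.lo <= k && k <= w.hi; });
    if (!found)
      return false; // level k is read but was never written
  }
  return true;
}
```

In the example, u is written on levels [2, 3] only, while `u[k+2]` inside `vertical_region(k_start+2, k_start+10)` reads levels [4, 12], so `isCovered` returns false and an error could be reported.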

PassSetSyncStage using graph

PassSetSyncStage currently uses a linear traversal of the stages to detect whether we need synchronization. That is correct, but it can sometimes add more syncs than required.
We would like to use an algorithm on a DAG instead:

for(stage: stages)
  if(stage has edge) 
    add_sync
    remove all edges of stage
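
One possible reading of this pseudocode, as a sketch only: edges are pairs (from, to) meaning that stage `to` needs a sync after stage `from`, and a sync placed before a stage is assumed to satisfy every edge that starts at an earlier stage. Both the `Edge` representation and that removal rule are assumptions, not what the pass does today.

```cpp
#include <set>
#include <utility>
#include <vector>

// Edge (from, to): stage `to` depends on stage `from` and needs a sync in between.
// Assumption: from < to, i.e. stages are numbered in execution order.
using Edge = std::pair<int, int>;

// Returns the indices of the stages that need a sync right before they run.
std::vector<int> placeSyncs(int numStages, std::set<Edge> edges) {
  std::vector<int> syncBefore;
  for (int stage = 0; stage < numStages; ++stage) {
    bool hasIncoming = false;
    for (const Edge& e : edges) {
      if (e.second == stage) {
        hasIncoming = true;
        break;
      }
    }
    if (!hasIncoming)
      continue;

    syncBefore.push_back(stage);
    // The sync before `stage` also covers every dependency on an earlier stage,
    // so those edges no longer force a sync later on.
    for (auto it = edges.begin(); it != edges.end();) {
      if (it->first < stage)
        it = edges.erase(it);
      else
        ++it;
    }
  }
  return syncBefore;
}
```

For the edges (0, 2), (1, 2) and (1, 3) this places a single sync before stage 2; stage 3 needs none, because its dependency on stage 1 is already covered.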

Static asserts are overly cautious

Currently we merge the field extents with the stage extents and add those up for the static asserts. This is too greedy, since we do not consider the Do-method intervals when computing halos in the k-dimension. In the example below we get an assert demanding a halo of 2, even though the access comes from the start_start interval:

stencil foo {
  storage a, b, c;
  storage out;

  Do {
    vertical_region(k_start, k_start) {
      a = b;
    }
    vertical_region(k_start + 1, k_end) {
      c = b;
    }
    vertical_region(k_start, k_start) {
      out = c + a[k + 2, j + 1];
    }
  }
};

illegal redundant computation with `+=` on fields

The following state of IIR should be illegal:
+= on a field (not temporary) in a stage with non-null extents

stage0 <extent<0,1> > {
  u += 3.14;
}
stage1 {
  res = sum(i+1, u);
}

A possible solution would be to run a second pass of the field versioning, after the extent of the stages has been computed.
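
A legality check along these lines could look like the sketch below. All types are simplified, hypothetical stand-ins for IIR concepts. As I understand it, the pattern is a problem because points in the extended compute domain are computed redundantly, so a `+=` on a non-temporary field would be applied more than once there.

```cpp
// Hypothetical, simplified stand-ins; not dawn's actual classes.
struct Extent {
  int minus; // most negative offset (<= 0)
  int plus;  // most positive offset (>= 0)
  bool isNull() const { return minus == 0 && plus == 0; }
};

struct StageExtents {
  Extent i;
  Extent j;
  bool isNull() const { return i.isNull() && j.isNull(); }
};

struct Assignment {
  bool isCompound;        // true for +=, -=, *=, ...
  bool targetIsTemporary; // false for user-visible fields
};

// A compound assignment to a non-temporary field is only legal
// if the stage does not compute on an extended domain.
bool isLegalInStage(const Assignment& assignment, const StageExtents& stage) {
  if (assignment.isCompound && !assignment.targetIsTemporary && !stage.isNull())
    return false;
  return true;
}
```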

bug in field versioning pass

In the following gtclang example, u, v, w and pp should be versioned, since the output is in the same storage as the input. Instead, the field versioning pass crashes:

#include "gridtools/clang_dsl.hpp"

using namespace gridtools::clang;

stencil_function avg {
  offset off;
  storage in;

  Do { return 0.5 * (in[off] + in); }
};

stencil_function delta {
  offset off;
  storage data;

  Do { return data[off] - data; }
};

stencil_function laplacian {
  storage data, crlato, crlatv;

  Do {
    return data[i + 1] + data[i - 1] - 2.0 * data +
           crlato * delta(j + 1, data) + crlatv * delta(j - 1, data);
  }
};

stencil_function diffusive_flux_x {
  storage lap, data;

  Do {
    const double flx = delta(i + 1, lap);
    return (flx * delta(i + 1, data)) > 0.0 ? 0.0 : flx;
  }
};

stencil_function diffusive_flux_y {
  storage lap, data, crlato;

  Do {
    const double fly = crlato * delta(j + 1, lap);
    return (fly * delta(j + 1, data)) > 0.0 ? 0.0 : fly;
  }
};

#pragma gtclang no_codegen
stencil type2 {
  storage data, crlato, crlatu, hdmask;
  var lap;

  Do {
    vertical_region(k_start, k_end) {
      lap = laplacian(data, crlato, crlatu);
      const double delta_flux_x = diffusive_flux_x(lap, data) -
                                  diffusive_flux_x(lap[i - 1], data[i - 1]);
      const double delta_flux_y =
          diffusive_flux_y(lap, data, crlato) -
          diffusive_flux_y(lap[j - 1], data[j - 1], crlato[j - 1]);
      data = data - hdmask * (delta_flux_x + delta_flux_y);
    }
  }
};

stencil horizontal_diffusion_type2 {
  // output
  storage u, v, w, pp;
  // input
  storage crlato, crlatu, hdmask;

  Do {
    type2(u, crlato, crlatu, hdmask);
    type2(v, crlato, crlatu, hdmask);
    type2(w, crlato, crlatu, hdmask);
    type2(pp, crlato, crlatu, hdmask);
  }
};

(gdb)
(gdb) bt
#0 0x00007ffff6811e70 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007ffff6802ec6 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x000000000163754e in dawn::DiagnosticsBuilder::operator<< <std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (this=0x7fffffffc0a0,
value=<error: Cannot access memory at address 0x6d68636e65622f77>) at /code/dawn/src/dawn/Compiler/DiagnosticsMessage.h:90
#3 0x0000000001657e69 in dawn::(anonymous namespace)::reportRaceCondition (statement=..., instantiation=0x22a49c0) at /code/dawn/src/dawn/Optimizer/PassFieldVersioning.cpp:84
#4 0x0000000001658cd2 in dawn::PassFieldVersioning::fixRaceCondition (this=0x22a5c80, graph=0x22a5e00, stencil=..., doMethod=..., loopOrder=dawn::LoopOrderKind::LK_Forward, stageIdx=3, index=2)
at /code/dawn/src/dawn/Optimizer/PassFieldVersioning.cpp:237
#5 0x000000000165833e in dawn::PassFieldVersioning::run (this=0x22a5c80, stencilInstantiation=0x22a49c0) at /code/dawn/src/dawn/Optimizer/PassFieldVersioning.cpp:129
#6 0x00000000016732be in dawn::PassManager::runPassOnStecilInstantiation (this=0x22a5dc8, instantiation=0x22a49c0, pass=0x22a5c80) at /code/dawn/src/dawn/Optimizer/PassManager.cpp:47
#7 0x000000000167314f in dawn::PassManager::runAllPassesOnStecilInstantiation (this=0x22a5dc8, instantiation=0x22a49c0) at /code/dawn/src/dawn/Optimizer/PassManager.cpp:36
#8 0x00000000016336af in dawn::DawnCompiler::runOptimizer (this=0x7fffffffce90, SIR=0x22a3da0) at /code/dawn/src/dawn/Compiler/DawnCompiler.cpp:167
#9 0x0000000001633bc7 in dawn::DawnCompiler::compile (this=0x7fffffffce90, SIR=0x22a3da0, codeGen=dawn::DawnCompiler::CG_GTClangNaiveCXX) at /code/dawn/src/dawn/Compiler/DawnCompiler.cpp:192
#10 0x0000000000a4e47e in gtclang::GTClangASTConsumer::HandleTranslationUnit (this=0x1e2e410, ASTContext=...) at /code/gtclang/src/gtclang/Frontend/GTClangASTConsumer.cpp:143
#11 0x000000000141cc2a in clang::ParseAST(clang::Sema&, bool, bool) ()
#12 0x0000000000e2f66e in clang::FrontendAction::Execute() ()
#13 0x0000000000e05146 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) ()
#14 0x0000000000a16671 in gtclang::Driver::run (args=...) at /code/gtclang/src/gtclang/Driver/Driver.cpp:69
#15 0x0000000000a15a3d in main (argc=5, argv=0x7fffffffdac8) at /code/gtclang/src/gtclang/Driver/gtclang.cpp:21

kcache intervals

Currently, the kcaches are identified based on non-pointwise accesses in the vertical
and the interval where the field is accessed.
Sometimes the field is accessed with, for example, extent [0,1] in interval1, but with a pointwise access in interval2.

The kcache is still used and generated for interval2, even though it does not make sense in that case.

Example from vertical_diffusion_T:

    // Center fill of kcaches
    if (iblock >= 0 && iblock <= block_size_i - 1 + 0 && jblock >= 0 && jblock <= block_size_j - 1 + 0) {
      pia_kcache[0] = gridtools::clang::math::pow(
          ((gridtools::clang::float_type)1.0000000000000001E-5 * __ldg(&(p_s[idx110]))),
          ((gridtools::clang::float_type)287.05000000000001 / (gridtools::clang::float_type)1005));
    }
    // Flush of kcaches
    if (iblock >= 0 && iblock <= block_size_i - 1 + 0 && jblock >= 0 && jblock <= block_size_j - 1 + 0) {
      if (ksize - 1 + 1 - k >= 1) {
        pia[idx111 + stride_111_2 * 1] = pia_kcache[1];
      }
    }
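
A per-interval decision could be sketched as below. `VerticalExtent` and `intervalsNeedingKCache` are hypothetical names, and the sketch deliberately ignores whether a value has to be carried across an interval boundary, which a real implementation would need to consider.

```cpp
#include <cstddef>
#include <vector>

// Vertical access extent of one field inside one interval (hypothetical type).
struct VerticalExtent {
  int minus;
  int plus;
};

// Returns the indices of the intervals in which the field is accessed
// non-pointwise in the vertical, i.e. where a kcache can pay off.
std::vector<std::size_t>
intervalsNeedingKCache(const std::vector<VerticalExtent>& accessPerInterval) {
  std::vector<std::size_t> result;
  for (std::size_t idx = 0; idx < accessPerInterval.size(); ++idx) {
    const VerticalExtent& extent = accessPerInterval[idx];
    if (extent.minus != 0 || extent.plus != 0)
      result.push_back(idx);
  }
  return result;
}
```

With extent [0,1] in interval1 and a pointwise access in interval2, only the first interval is returned.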

assert instead of ignoring non-temp caches

The line here can be reached with a stencil of the shape

stencil foo {
  storage a, b, c;

  Do {
    vertical_region(k_start, k_start) {
      a = b[k + 2];
    }
    vertical_region(k_start + 1, k_end) {
      c = a;
      b = a;
    }
  }
};

I don't think this code should be considered illegal; instead of asserting, we should resolve the problem. Am I missing something?

Promotion of Local Variables

If we want to promote local variables to temporaries and their first occurrence is in an if-stmt, the pass breaks:

stencil Stencil {
  storage out;
  var temporary;
  Do {
    vertical_region(k_start, k_end) {
      if(false) {
        temporary = 0;
      }
      out = temporary;
    }
  }
};

This should be addressed in the checks in

ExprStmt* exprStmt = dyn_cast<ExprStmt>(oldStatement->ASTStmt.get());

and should handle these cases

error in dump()

dump() will crash in vertical_diffusion_dqvdt.cpp
because getNameFromAccessID() crashes: the accessID is not registered
for the local variable increment (shown below as NOTFOUND):

  Stencil_0
  {
    MultiStage_0 [parallel]
    {
      Stage_0
      {
        Do_0 { Start : End }
        {
          float_type increment = ((fun-call:gridtools::clang::math::max(0, QV[0, 0, 0]) - QV_nnow[0, 0, 0]) * zdtr);
            Write Accesses:
              __local_increment_16 : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              QV_nnow : [(0, 0), (0, 0), (0, 0)]
              zdtr : [(0, 0), (0, 0), (0, 0)]
              QV : [(0, 0), (0, 0), (0, 0)]
              0 : [(0, 0), (0, 0), (0, 0)]

          if((true && (1 > 0)))
          {
            dqvdt[0, 0, 0] += increment;
          }
          else
          {
            dqvdt[0, 0, 0] = increment;
          }
            Write Accesses:
              dqvdt : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              __local_increment_16 : [(0, 0), (0, 0), (0, 0)]
              NOTFOUND : [(0, 0), (0, 0), (0, 0)]
              NOTFOUND : [(0, 0), (0, 0), (0, 0)]
              0 : [(0, 0), (0, 0), (0, 0)]
              dqvdt : [(0, 0), (0, 0), (0, 0)]

        }
        Extents: [(0, 0), (0, 0), (0, 0)]
      }
    }
  }

illegal field versioning

Technical issue

If field versioning is applied to an input field that has not been written to before, no synchronization happens. This leads to the input data being completely ignored.

Example

stencil test {
  storage a;
  Do {
    vertical_region(k_start, k_end) { 
        a += a[i + 1];
    }
  }
};

The output of the field [initialized with one] is 1 rather than 2.
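
To make the expected value concrete, here is a plain C++ reference for `a += a[i + 1]` on a single line of points. The function name and the 1D layout are assumptions for illustration; the important part is that the versioned copy is filled from the input before it is read.

```cpp
#include <cstddef>
#include <vector>

// Reference for `a += a[i + 1]` with a versioned copy of the input.
// The last point has no right neighbour here and is left unchanged.
std::vector<double> applyWithVersionedInput(const std::vector<double>& a) {
  const std::vector<double> a_in = a; // versioned copy, initialized from the input
  std::vector<double> a_out = a;
  for (std::size_t i = 0; i + 1 < a.size(); ++i)
    a_out[i] = a[i] + a_in[i + 1];
  return a_out;
}
```

For an input of all ones, every interior point becomes 2. Skipping the initialization of `a_in` corresponds to the reported behaviour, where the input data is ignored.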

Unary operators are not supported for storages

Technical Description

Unary operators are not allowed in gtclang when applied to storages, whereas for temporaries (built-in types) they work fine.

Example

#include "gridtools/clang_dsl.hpp"

using namespace gridtools::clang;

stencil globals_stencil {
  storage out;

  Do {
    vertical_region(k_start, k_end) {
      out[i, j, k]++;
    }
  }
};

Staggering Fields leads to other use of halos in K

Currently we do not have halos in K, as this would require new types of fields. In order to have production code, we extend the domain by one. This leads to a minor change in the static asserts for the storages, which needs to be removed once ESCAPE solves the frontend-language problem.

Issue with Promoting Temporaries

If you call

stencil_function footest {
  storage a, b;
  var c;
  Do {
    var d;
    b = a[i + 1];
    c = b[i + 1];
    d = a;
    b = d + a;
  }
};

stencil Test04 {
  storage a, b;
  var bar;
  void Do() {
    vertical_region(k_start, k_end) {
      bar = 20;
      footest(a, b, bar);
    }
  }
};

you get a compile-time assertion because the promotion fails.

Jenkins tests are failing due to protobuf XML's

[xUnit] [INFO] - [GoogleTest-1.6] - 25 test report file(s) were found with the pattern '**/*.xml' relative to '/scratch/snx3000/jenkins/workspace/dawn/build_type/debug/label/daint' for the testing framework 'GoogleTest-1.6'.
[xUnit] [ERROR] - Test reports were found but not all of them are new. Did all the tests run?.

  • /scratch/snx3000/jenkins/workspace/dawn/build_type/debug/label/daint/bundle/build/protobuf-prefix/src/protobuf/java/compatibility_tests/v2.5.0/deps/pom.xml is 10 mo old
  • ....

Bug in stage extents

The following example does not properly compute extents of stages:

stencil hori_diff_stencil {
  storage b, a;
  var tmp;

  Do {
    vertical_region(k_start, k_start) {
      tmp = a;
    }
    vertical_region(k_start+1, k_end-1) {
      tmp = tmp[k-1];
    }
    vertical_region(k_end, k_end) {
      tmp = tmp[k-1]+a;
    }

    vertical_region(k_start, k_start) {
      b = tmp[i+2];
    }
    vertical_region(k_start+1, k_end-4) {
      b = tmp[k-1,i+2];
    }
    vertical_region(k_end-3, k_end) {
      b = tmp[k-1,i-1]+a;
    }
  }
};

This produces the following dump:

  Stencil_0
  {
    MultiStage_0 [forward]
    {
      Stage_0
      {
        Do_0 { Start : Start }
        {
          tmp[0, 0, 0] = a[0, 0, 0];
            Write Accesses:
              tmp : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              a : [(0, 0), (0, 0), (0, 0)]

        }
        Do_1 { Start+1 : End-1 }
        {
          tmp[0, 0, 0] = tmp[0, 0, -1];
            Write Accesses:
              tmp : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              tmp : [(0, 0), (0, 0), (-1, 0)]

        }
        Extents: [(-1, 0), (0, 0), (-1, 0)]
      }
      Stage_1
      {
        Do_0 { Start : Start }
        {
          b[0, 0, 0] = tmp[2, 0, 0];
            Write Accesses:
              b : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              tmp : [(0, 2), (0, 0), (0, 0)]

        }
        Do_1 { Start+1 : End-4 }
        {
          b[0, 0, 0] = tmp[2, 0, -1];
            Write Accesses:
              b : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              tmp : [(0, 2), (0, 0), (-1, 0)]

        }
        Do_2 { End : End }
        {
          tmp[0, 0, 0] = (tmp[0, 0, -1] + a[0, 0, 0]);
            Write Accesses:
              tmp : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              a : [(0, 0), (0, 0), (0, 0)]
              tmp : [(0, 0), (0, 0), (-1, 0)]

        }
        Extents: [(-1, 0), (0, 0), (-1, 0)]
      }
      Stage_2
      {
        Do_0 { End-3 : End }
        {
          b[0, 0, 0] = (tmp[-1, 0, -1] + a[0, 0, 0]);
            Write Accesses:
              b : [(0, 0), (0, 0), (0, 0)]
            Read Accesses:
              a : [(0, 0), (0, 0), (0, 0)]
              tmp : [(-1, 0), (0, 0), (-1, 0)]

        }
        Extents: [(0, 0), (0, 0), (0, 0)]
      }
    }
  }

The line tmp[i+2]; should request an extent of at least i+2 for the stage computing tmp
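
The expected propagation can be sketched as follows: the stage that writes tmp has to cover the union of the horizontal extents of all later reads of tmp. `Extent`, `merge` and `requiredWriterExtent` are hypothetical helpers for the i-direction only and are not the algorithm dawn implements.

```cpp
#include <algorithm>
#include <vector>

// Horizontal extent in one direction (hypothetical type).
struct Extent {
  int minus; // most negative offset (<= 0)
  int plus;  // most positive offset (>= 0)
};

// Smallest extent that contains both `a` and `b`.
Extent merge(const Extent& a, const Extent& b) {
  return {std::min(a.minus, b.minus), std::max(a.plus, b.plus)};
}

// Extent the writing stage has to be computed on, given the read extents
// of the field in all later stages.
Extent requiredWriterExtent(const std::vector<Extent>& readExtents) {
  Extent result{0, 0};
  for (const Extent& read : readExtents)
    result = merge(result, read);
  return result;
}
```

For the reads in the dump above, with i-extents (0, 2), (0, 2) and (-1, 0), the stage computing tmp would need the i-extent (-1, 2).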

Fix Splitting dependent on Max-Halo

Technical Description

Currently, we greedily merge multistages until we either hit a synchronisation issue or the halo extent becomes bigger than the specified max-halo. The former is correct, but the latter leads to wrong results.

Example:

out of

#include "gtclang_dsl_defs/gtclang_dsl.hpp"

using namespace gtclang::dsl;
stencil test {

  storage in, out, thrid;
  Do {
    vertical_region(k_start, k_end) { out = in; }
    vertical_region(k_start, k_end) { thrid = out[i + 1]; }
  }
};

with the arguments -max-fields=1 -fsplit-stencils we currently create:

 Stencil_0
  {
    MultiStage_0 [parallel]
    {
      Stage_0
      {
        Do_0 { Start : End }
        {
          out[<no_horizontal_offset>,0] = in[<no_horizontal_offset>,0];
            Write Accesses:
              out : [<no_horizontal_extent>,(0,0)]
            Read Accesses:
              in : [<no_horizontal_extent>,(0,0)]

        }
        Extents: [<no_horizontal_extent>,(0,0)]
      }
    }
  }
  Stencil_1
  {
    MultiStage_0 [parallel]
    {
      Stage_0
      {
        Do_0 { Start : End }
        {
          thrid[<no_horizontal_offset>,0] = out[1,0,0];
            Write Accesses:
              thrid : [<no_horizontal_extent>,(0,0)]
            Read Accesses:
              out : [(1,1),(0,0),(0,0)]

        }
        Extents: [<no_horizontal_extent>,(0,0)]
      }
    }
  }

We would need to apply boundary conditions between these stencils. Since these are currently disabled, we should not be able to split stencils here in the first place.

Compiler flag for report of passes

In order to minimize the amount of options shown by the compiler, we would like to have the convention

-freport-pass-<PassName>

and additionally a -freport-pass-all. Then the individual report flags do not need to be shown in the --help output.
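
Matching such flags could be as simple as the sketch below. `shouldReportPass` is a hypothetical helper, not an existing function of the compiler driver, and it assumes that no pass is literally named "all".

```cpp
#include <string>
#include <vector>

// Returns true if the command line asks for a report of `passName`,
// either via -freport-pass-<PassName> or via -freport-pass-all.
bool shouldReportPass(const std::vector<std::string>& args, const std::string& passName) {
  const std::string prefix = "-freport-pass-";
  for (const std::string& arg : args) {
    if (arg.compare(0, prefix.size(), prefix) != 0)
      continue; // not a report flag
    const std::string suffix = arg.substr(prefix.size());
    if (suffix == "all" || suffix == passName)
      return true;
  }
  return false;
}
```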

Mutations for metaheuristics

This is an issue to discuss which mutations we want, how to perform them on the IIR and what functionality we need to apply them:

  • <i,j,k> block sizes: We need to add these parameters to the IIR level
  • stencil function <-> tmp precomputation replacement. We would need to configure the pass so that it makes a replacement, in either direction, of a single (random?) tmp computation.
  • Elemental EPU transformation: In a DAG of EPUs, move a single EPU from one MS into another MS. We need a new pass that does that. We will also need to re-run some passes inside the modified MSs, like the fusion of stages, identification of caches, etc.
  • We need to add EPU labels as derived info at the StatementAccessPair level. We need to use the MS splitting and stage splitting with the maxSplit option in such a way that it does not reorganize the statements but rather extracts EPU labels. We also need to build the DAG of EPUs.
  • Cache mutation pass: The IIR will contain a subset of caches that is generated by the Caches Pass. We need to either remove one cache from the IIR or re-run the Caches pass and from the
  • Level Fusion of Stages: Sometimes fusing all the statements that dependencies allow will create artificially large redundant computations, which could be avoided with a more fine-grained split of stages. We are not sure how we can experiment with multiple permutations here; probably it can be left for a second stage.

More???

bug in SIR serializer with if/else blocks

The following gtclang code produces a SIR with if blocks (no else).
The absence of an else triggers a crash in
https://github.com/MeteoSwiss-APN/dawn/blob/master/src/dawn/SIR/SIRSerializer.cpp#L292

stencil hori_diff_stencil {
  storage u, out, coeff;

  var flx, fly, lap;
  Do {
    vertical_region(k_start, k_end) {
      lap = u[i + 1] + u[i - 1] + u[j + 1] + u[j - 1] - 4.0 * u;
      flx = lap[i+1] - lap;
      if (flx * (u[i+1] - u) > 0)
        flx = 0.;
      fly = lap[j+1] - lap;
      if (fly * (u[j+1] - u) > 0)
        fly = 0.;
      out = u - coeff * (flx - flx[i-1] + fly - fly[j-1]);
    }
  }
};

naive code requires BOOST_MPL_INCLUDE

After the update to the new GridTools version, stencils require BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS to be set for unknown reasons. For example, horizontal_diffusion_limiter breaks.

Bug parsing

parsing the following

double sqrtgrhor;
storage kh, vdtch;
      var gct = computeGct(kh, vdtch) * sqrtgrhor;

where computeGct is a stencil function I get the following error

../src/dycore/vertical_diffusion_T.cpp:176:7: error: only single declarations are currently supported: expected ; got ,
      var gct = computeGct(kh, vdtch) * sqrtgrhor;

Diagnostics: Stack trace

During detection of unresolvable race conditions a proper stack trace should be issued to the user.

(partially implemented)

fix var for non void do-methods

Technical Description

The current gtclang syntax supports three ways of writing Do-methods:

  • Do { ...}
  • void Do {...}
  • void Do() {...}

The preprocessor always rewrites variants 1 and 2 to read void Do(). If temporary variables are declared inside the Do-method, the preprocessor messes up the replacement and only variant 3 works.

Example

// RUN: %gtclang% %file% -dump-pp

#include "gridtools/clang_dsl.hpp"
using namespace gridtools::clang;

stencil test {
  storage a;
  Do {
    vertical_region(k_start, k_end) {
      var ee = 2;
      a += a[i + 1] + ee;
    }
  }
};
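
For illustration, the intended normalization of the three variants can be expressed with a single regular expression. This is a standalone sketch using `std::regex`; the function name is made up, it only looks at the header line, and it is not how the gtclang preprocessor is implemented.

```cpp
#include <regex>
#include <string>

// Rewrites `Do {`, `void Do {` and `void Do() {` at the start of a line
// to the canonical `void Do() {`. Other lines are returned unchanged.
std::string normalizeDoHeader(const std::string& line) {
  static const std::regex header(R"(^(\s*)(?:void\s+)?Do\s*(?:\(\s*\))?\s*\{)");
  return std::regex_replace(line, header, "$1void Do() {");
}
```

Because only the header is touched, declarations such as `var ee = 2;` inside the body are left alone.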

IIR refactoring TODO

  • The fields store the extent and extentRB, starting from the Stage level. Additionally, StatementAccessPair and DoMethod compute a similar quantity with computeMaximumExtent(). We should probably make the latter a derived info, pushing it down to the level of the statement access pair or Do-method. I am not sure, though, whether they actually produce the same numbers, since the method takes block if/else statements into account; we would need to double check.
  • computeEnclosingAccessInterval is another candidate to move into Fields or derived info?
  • Currently StencilInstantiation contains an IIR. We discussed that it should be the other way around: the current StencilInstantiation could be the DerivedInfo of an IIR. We first need to extract what in the StencilInstantiation really belongs to the IIR and what is derived info.
    One more consideration: we also want StencilInstantiation and StencilFunctionInstantiation to inherit from the same base class. Additionally, the StencilInstantiation captures global info that is used by multiple levels of the tree. For example, the Stage uses the maps stored in StencilInstantiation. Therefore I think we need to distinguish between derived info and global info. For example, GlobalVariableAccessIDSet_ is not derived info; on the contrary, it is used by other levels, like the Stage, in order to compute derived info. Derived info should be information that is computed solely from the IIR of the current level and its children. The list of access IDs of global variables is not; it is computed from the input HIR? So I would propose to:
  • Separate from StencilInstantiation what really belongs to the IIR
  • Create a Context object with all the global maps information (that is not derived info), like GlobalVariableAccessIDSet_ . Additionally it contains methods, and getters like getNameFromAccessID, etc... We can then have StencilContext and StencilFunctionContext inheriting from Context base class. Other levels of the tree and visitors can use a Context.
    Finally create a derived info that contains the derived info from the IIR of the current level and below.
  • In PassStencilSplitter, we return in some methods a container of iir::Stencil. We should create a new IIR object

Greedy Algorithm for StageExtents needs optimization

Technical Description

Proper stage extent propagation is not working, since one of the patterns assumed to be illegal when designing the algorithm is actually legal: we can encounter a read before write within a stage.

If we had a stencil of this sort:

vertical_region(k_start, k_end){ 
  a = b[k-1];
  c = a + 10;
  b = a + c;
}

we would have an iterative process that requires reading b beforehand. The current algorithm breaks here.
With PR #103 we resolve this issue to maintain legality, but move to a greedier algorithm. We would need the following (the functionality is not all in place):

auto readInterval = computeReadAccessInterval(accessID);
        if(readInterval.empty())
            continue;
        Extents fieldExtent = fromFieldExtents;
        fieldExtent.expand(stageExtent);
        for(int j = i - 1; j >= 0; --j) {
          Stage& toStage = *(stencil.getStage(j));
            if(!readInterval.overlaps(toStage.interval()))
              continue;
      ......

1 argument stencil function with no return

Currently it is not possible to parse stencil functions which have 1 argument and no return statements

foo(a);

The reason is that this is interpreted as a variable declaration (clang::VarDecl), i.e.

foo a;

which shadows the type of a (it is thus not possible to determine if a was a storage without comparing the name to the members of the stencil)
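
The same effect can be reproduced in plain C++, where a statement of the form `T(a);` is treated as a declaration whenever `T` names a type. In the sketch below, `foo` is a made-up struct standing in for a stencil function and the outer `a` is an int standing in for a storage.

```cpp
#include <type_traits>

// Stand-in for a stencil function type (hypothetical).
struct foo {
  int value = 7;
};

// Returns true if `foo(a);` declared a new variable instead of using the outer `a`.
bool callIsParsedAsDeclaration() {
  int a = 1; // stand-in for the storage
  (void)a;
  {
    foo(a); // parsed as the declaration `foo a;`, shadowing the outer `a`
    return std::is_same<decltype(a), foo>::value && a.value == 7;
  }
}
```

Inside the inner block, `a` has type `foo`, so the original type of `a` is no longer visible at that point.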

Reduce Redundant Computation on GPU

Technical Description

If the computation demands it, we increase the local compute domain to ensure correctness.
If this dependency ends up being split across multistages, it leads to unnecessary computation on GPUs, since every multistage is its own kernel call. Ideally we would only extend the global compute domain and ignore the locally increased extent.

Example

#include "gtclang_dsl_defs/gtclang_dsl.hpp"

using namespace gtclang::dsl;
stencil test {

  storage in, out, thrid;
  Do {
    vertical_region(k_start, k_end) {
      out = in[j + 1] + out[k - 1];
      thrid = out[i + 1, k + 1];
    }
  }
};

Generates

Stencil_0
  {
    MultiStage_0 [forward]
    {
      Stage_0
      {
        Do_0 { Start : End }
        {
          out[<no_horizontal_offset>,0] = (in[0,1,0] + out[<no_horizontal_offset>,-1]);
            Write Accesses:
              out : [<no_horizontal_extent>,(0,0)]
            Read Accesses:
              out : [<no_horizontal_extent>,(-1,-1)]
              in : [(0,0),(1,1),(0,0)]

        }
        Extents: [(0,1),(0,0),(0,0)]
      }
    }
    MultiStage_1 [parallel]
    {
      Stage_0
      {
        Do_0 { Start : End }
        {
          thrid[<no_horizontal_offset>,0] = out[1,0,1];
            Write Accesses:
              thrid : [<no_horizontal_extent>,(0,0)]
            Read Accesses:
              out : [(1,1),(0,0),(1,1)]

        }
        Extents: [<no_horizontal_extent>,(0,0)]
      }
    }
  }
