meteoswiss-apn / dawn
Compiler toolchain to enable generation of high-level DSLs for geophysical fluid dynamics models
License: MIT License
Is this a bug or intended?
https://github.com/MeteoSwiss-APN/dawn/blob/master/src/dawn/Support/SourceLocation.cpp#L25
extern bool operator==(const SourceLocation& a, const SourceLocation& b) {
return a.Line == b.Line && b.Column == b.Column;
}
I guess it should be ... && a.Column == b.Column.
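A minimal sketch of the corrected comparison, assuming SourceLocation exposes public integer members Line and Column as the quoted snippet suggests. The struct below is a stand-in for illustration, not the actual Dawn header:

```cpp
// Stand-in for dawn::SourceLocation; the real class may carry more state.
struct SourceLocation {
  int Line;
  int Column;
};

// Compare each coordinate of a against the matching coordinate of b.
bool operator==(const SourceLocation& a, const SourceLocation& b) {
  return a.Line == b.Line && a.Column == b.Column;
}

bool operator!=(const SourceLocation& a, const SourceLocation& b) {
  return !(a == b);
}
```

With the original `b.Column == b.Column`, two locations on the same line but in different columns would compare equal.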
Currently, we greedily merge multistages until we either hit a synchronisation issue or the halo extent becomes bigger than the specified max-halo. The former is correct, but the latter yields wrong results.
out of
#include "gtclang_dsl_defs/gtclang_dsl.hpp"
using namespace gtclang::dsl;
stencil test {
storage in, out, thrid;
Do {
vertical_region(k_start, k_end) { out = in; }
vertical_region(k_start, k_end) { thrid = out[i + 1]; }
}
};
with the arguments -max-fields=1 -fsplit-stencils
we currently create:
Stencil_0
{
MultiStage_0 [parallel]
{
Stage_0
{
Do_0 { Start : End }
{
out[<no_horizontal_offset>,0] = in[<no_horizontal_offset>,0];
Write Accesses:
out : [<no_horizontal_extent>,(0,0)]
Read Accesses:
in : [<no_horizontal_extent>,(0,0)]
}
Extents: [<no_horizontal_extent>,(0,0)]
}
}
}
Stencil_1
{
MultiStage_0 [parallel]
{
Stage_0
{
Do_0 { Start : End }
{
thrid[<no_horizontal_offset>,0] = out[1,0,0];
Write Accesses:
thrid : [<no_horizontal_extent>,(0,0)]
Read Accesses:
out : [(1,1),(0,0),(0,0)]
}
Extents: [<no_horizontal_extent>,(0,0)]
}
}
}
We would need to apply boundary conditions between these stencils. Since boundary conditions are currently disabled, we should not be able to split stencils here in the first place.
In the following example:
vertical_region(k_start, k_start) {}
vertical_region(k_start+1, k_end) {
// code
}
the empty interval for k_start is required, otherwise it generates an interval<0,0> for GT.
We should remove that hack in the wwcon (and probably others) once GT solves this issue.
cosunae:
With the C++17 structured bindings feature (supposedly available in clang 4)
https://clang.llvm.org/cxx_status.html
we can improve the semantics of input/output in the function call:
auto [u_tens_stage, v_tens_stage] = horizontal_advection::uv( u_stage, v_stage, tgrlatda0,
tgrlatda1);
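To make the idea concrete, here is a small plain C++17 sketch of a function that returns two outputs and a caller that names them with structured bindings. The Field struct, the body of uv and the helper sumTendencies are hypothetical stand-ins, not gtclang constructs:

```cpp
#include <utility>

// Hypothetical stand-in for a field; the real DSL would use storages.
struct Field {
  double value;
};

namespace horizontal_advection {
// Hypothetical function returning both outputs instead of using out-parameters.
inline std::pair<Field, Field> uv(const Field& u, const Field& v,
                                  double tgrlatda0, double tgrlatda1) {
  return {Field{u.value * tgrlatda0}, Field{v.value * tgrlatda1}};
}
} // namespace horizontal_advection

inline double sumTendencies(const Field& u_stage, const Field& v_stage) {
  // C++17 structured bindings name both results at the call site.
  auto [u_tens_stage, v_tens_stage] =
      horizontal_advection::uv(u_stage, v_stage, 2.0, 3.0);
  return u_tens_stage.value + v_tens_stage.value;
}
```

Inputs stay on the right-hand side of the call and outputs on the left, which is the readability gain the proposal is after.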
The following example does not properly compute extents of stages:
stencil hori_diff_stencil {
storage b, a;
var tmp;
Do {
vertical_region(k_start, k_start) {
tmp = a;
}
vertical_region(k_start+1, k_end-1) {
tmp = tmp[k-1];
}
vertical_region(k_end, k_end) {
tmp = tmp[k-1]+a;
}
vertical_region(k_start, k_start) {
b = tmp[i+2];
}
vertical_region(k_start+1, k_end-4) {
b = tmp[k-1,i+2];
}
vertical_region(k_end-3, k_end) {
b = tmp[k-1,i-1]+a;
}
}
};
Stencil_0
{
MultiStage_0 [forward]
{
Stage_0
{
Do_0 { Start : Start }
{
tmp[0, 0, 0] = a[0, 0, 0];
Write Accesses:
tmp : [(0, 0), (0, 0), (0, 0)]
Read Accesses:
a : [(0, 0), (0, 0), (0, 0)]
}
Do_1 { Start+1 : End-1 }
{
tmp[0, 0, 0] = tmp[0, 0, -1];
Write Accesses:
tmp : [(0, 0), (0, 0), (0, 0)]
Read Accesses:
tmp : [(0, 0), (0, 0), (-1, 0)]
}
Extents: [(-1, 0), (0, 0), (-1, 0)]
}
Stage_1
{
Do_0 { Start : Start }
{
b[0, 0, 0] = tmp[2, 0, 0];
Write Accesses:
b : [(0, 0), (0, 0), (0, 0)]
Read Accesses:
tmp : [(0, 2), (0, 0), (0, 0)]
}
Do_1 { Start+1 : End-4 }
{
b[0, 0, 0] = tmp[2, 0, -1];
Write Accesses:
b : [(0, 0), (0, 0), (0, 0)]
Read Accesses:
tmp : [(0, 2), (0, 0), (-1, 0)]
}
Do_2 { End : End }
{
tmp[0, 0, 0] = (tmp[0, 0, -1] + a[0, 0, 0]);
Write Accesses:
tmp : [(0, 0), (0, 0), (0, 0)]
Read Accesses:
a : [(0, 0), (0, 0), (0, 0)]
tmp : [(0, 0), (0, 0), (-1, 0)]
}
Extents: [(-1, 0), (0, 0), (-1, 0)]
}
Stage_2
{
Do_0 { End-3 : End }
{
b[0, 0, 0] = (tmp[-1, 0, -1] + a[0, 0, 0]);
Write Accesses:
b : [(0, 0), (0, 0), (0, 0)]
Read Accesses:
a : [(0, 0), (0, 0), (0, 0)]
tmp : [(-1, 0), (0, 0), (-1, 0)]
}
Extents: [(0, 0), (0, 0), (0, 0)]
}
}
}
The line tmp[i+2]; should request an extent of at least i+2 for the stage computing tmp.
During detection of unresolvable race conditions, a proper stack trace should be issued to the user (partially implemented).
If field versioning is applied to an input field that has not been written to before, no synchronization happens. This leads to the input data being completely ignored:
stencil test {
storage a;
Do {
vertical_region(k_start, k_end) {
a += a[i + 1];
}
}
};
The output of the field (initialized with one) is 1 rather than 2.
In order to minimize the number of options shown by the compiler, we would like to have the convention -freport-pass-<PassName> and additionally a -freport-pass-all. Then the individual report flags don't need to be shown in the --help.
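One way the proposed convention could be checked is sketched below. The function name isReportEnabled and the plain vector of argument strings are assumptions for illustration only; this is not Dawn's option parser:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: a pass reports if its own flag or the catch-all flag is given.
bool isReportEnabled(const std::string& passName,
                     const std::vector<std::string>& args) {
  const std::string prefix = "-freport-pass-";
  for (const std::string& arg : args) {
    if (arg == prefix + "all" || arg == prefix + passName)
      return true;
  }
  return false;
}
```

Because every flag is derived from the pass name, only the pattern and -freport-pass-all would need to appear in the help text.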
There might be use cases where the user wants to specify the precision of specific pieces of computation. The idea is to enhance the language with var[double] or var[float] to achieve this. We do not want to fall back on double / float, as this still needs to go through the checks of local variables vs temporary storages.
The following gtclang code produces a SIR with if blocks (no else).
The absence of an else triggers a crash in
https://github.com/MeteoSwiss-APN/dawn/blob/master/src/dawn/SIR/SIRSerializer.cpp#L292
stencil hori_diff_stencil {
storage u, out, coeff;
var flx, fly, lap;
Do {
vertical_region(k_start, k_end) {
lap = u[i + 1] + u[i - 1] + u[j + 1] + u[j - 1] - 4.0 * u;
flx = lap[i+1] - lap;
if (flx * (u[i+1] - u) > 0)
flx = 0.;
fly = lap[j+1] - lap;
if (fly * (u[j+1] - u) > 0)
fly = 0.;
out = u - coeff * (flx - flx[i-1] + fly - fly[j-1]);
}
}
};
This is a list of pending issues:
In here:
https://github.com/cosunae/dawn/blob/temporary_to_stencil_functions/src/dawn/Optimizer/PassTemporaryToStencilFunction.cpp#L78
this would assert in the case of a statement without assignment, e.g.
field++;
The same applies to block statements, if statements, etc.
Currently, new sir::StencilFunction objects are added to the SIR. We should not modify the SIR once we refactor the internal IIR.
When compiling large files it would be convenient to generate multiple .cpp
files instead of just one.
Reasoning:
Possible problems:
PassSetSyncStage currently uses a linear traversal of the stages to detect whether we need synchronization. That is correct but can sometimes add more syncs than required.
We would like to use an algorithm on a DAG:
for(stage: stages)
if(stage has edge)
add_sync
remove all edges of stage
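The pseudocode above could be fleshed out roughly as follows. This is a sketch under stated assumptions: stages are numbered in execution order, an edge (from, to) means stage `to` depends on data written by stage `from`, and a sync placed before a stage satisfies every dependency whose producer ran earlier. The Edge alias and stagesNeedingSync are hypothetical names, and clearing all earlier-producer edges is one interpretation of "remove all edges of stage":

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical dependency edge: second depends on first (first < second).
using Edge = std::pair<int, int>;

// Returns the indices of stages that need a synchronization placed before them.
std::vector<int> stagesNeedingSync(int numStages, std::vector<Edge> edges) {
  std::vector<int> syncBefore;
  for (int stage = 0; stage < numStages; ++stage) {
    const bool hasPendingEdge =
        std::any_of(edges.begin(), edges.end(),
                    [stage](const Edge& e) { return e.second == stage; });
    if (!hasPendingEdge)
      continue;
    syncBefore.push_back(stage);
    // The sync satisfies every dependency whose producer ran before this stage.
    edges.erase(std::remove_if(edges.begin(), edges.end(),
                               [stage](const Edge& e) { return e.first < stage; }),
                edges.end());
  }
  return syncBefore;
}
```

For the edges (0,2), (1,2) and (1,3), a single sync before stage 2 also covers the dependency of stage 3 on stage 1, so no second sync is emitted.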
This is an issue to discuss which mutations we need, how to perform them on the IIR, and what functionality we need to apply them:
More???
In the following gtclang example, u, v, w and pp should be versioned, since the output is in the same storage as the input, but field versioning crashes:
#include "gridtools/clang_dsl.hpp"
using namespace gridtools::clang;
stencil_function avg {
offset off;
storage in;
Do { return 0.5 * (in[off] + in); }
};
stencil_function delta {
offset off;
storage data;
Do { return data[off] - data; }
};
stencil_function laplacian {
storage data, crlato, crlatv;
Do {
return data[i + 1] + data[i - 1] - 2.0 * data +
crlato * delta(j + 1, data) + crlatv * delta(j - 1, data);
}
};
stencil_function diffusive_flux_x {
storage lap, data;
Do {
const double flx = delta(i + 1, lap);
return (flx * delta(i + 1, data)) > 0.0 ? 0.0 : flx;
}
};
stencil_function diffusive_flux_y {
storage lap, data, crlato;
Do {
const double fly = crlato * delta(j + 1, lap);
return (fly * delta(j + 1, data)) > 0.0 ? 0.0 : fly;
}
};
#pragma gtclang no_codegen
stencil type2 {
storage data, crlato, crlatu, hdmask;
var lap;
Do {
vertical_region(k_start, k_end) {
lap = laplacian(data, crlato, crlatu);
const double delta_flux_x = diffusive_flux_x(lap, data) -
diffusive_flux_x(lap[i - 1], data[i - 1]);
const double delta_flux_y =
diffusive_flux_y(lap, data, crlato) -
diffusive_flux_y(lap[j - 1], data[j - 1], crlato[j - 1]);
data = data - hdmask * (delta_flux_x + delta_flux_y);
}
}
};
stencil horizontal_diffusion_type2 {
// output
storage u, v, w, pp;
// input
storage crlato, crlatu, hdmask;
Do {
type2(u, crlato, crlatu, hdmask);
type2(v, crlato, crlatu, hdmask);
type2(w, crlato, crlatu, hdmask);
type2(pp, crlato, crlatu, hdmask);
}
};
(gdb)
(gdb) bt
#0 0x00007ffff6811e70 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007ffff6802ec6 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x000000000163754e in dawn::DiagnosticsBuilder::operator<< <std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (this=0x7fffffffc0a0,
value=<error: Cannot access memory at address 0x6d68636e65622f77>) at /code/dawn/src/dawn/Compiler/DiagnosticsMessage.h:90
#3 0x0000000001657e69 in dawn::(anonymous namespace)::reportRaceCondition (statement=..., instantiation=0x22a49c0) at /code/dawn/src/dawn/Optimizer/PassFieldVersioning.cpp:84
#4 0x0000000001658cd2 in dawn::PassFieldVersioning::fixRaceCondition (this=0x22a5c80, graph=0x22a5e00, stencil=..., doMethod=..., loopOrder=dawn::LoopOrderKind::LK_Forward, stageIdx=3, index=2)
at /code/dawn/src/dawn/Optimizer/PassFieldVersioning.cpp:237
#5 0x000000000165833e in dawn::PassFieldVersioning::run (this=0x22a5c80, stencilInstantiation=0x22a49c0) at /code/dawn/src/dawn/Optimizer/PassFieldVersioning.cpp:129
#6 0x00000000016732be in dawn::PassManager::runPassOnStecilInstantiation (this=0x22a5dc8, instantiation=0x22a49c0, pass=0x22a5c80) at /code/dawn/src/dawn/Optimizer/PassManager.cpp:47
#7 0x000000000167314f in dawn::PassManager::runAllPassesOnStecilInstantiation (this=0x22a5dc8, instantiation=0x22a49c0) at /code/dawn/src/dawn/Optimizer/PassManager.cpp:36
#8 0x00000000016336af in dawn::DawnCompiler::runOptimizer (this=0x7fffffffce90, SIR=0x22a3da0) at /code/dawn/src/dawn/Compiler/DawnCompiler.cpp:167
#9 0x0000000001633bc7 in dawn::DawnCompiler::compile (this=0x7fffffffce90, SIR=0x22a3da0, codeGen=dawn::DawnCompiler::CG_GTClangNaiveCXX) at /code/dawn/src/dawn/Compiler/DawnCompiler.cpp:192
#10 0x0000000000a4e47e in gtclang::GTClangASTConsumer::HandleTranslationUnit (this=0x1e2e410, ASTContext=...) at /code/gtclang/src/gtclang/Frontend/GTClangASTConsumer.cpp:143
#11 0x000000000141cc2a in clang::ParseAST(clang::Sema&, bool, bool) ()
#12 0x0000000000e2f66e in clang::FrontendAction::Execute() ()
#13 0x0000000000e05146 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) ()
#14 0x0000000000a16671 in gtclang::Driver::run (args=...) at /code/gtclang/src/gtclang/Driver/Driver.cpp:69
#15 0x0000000000a15a3d in main (argc=5, argv=0x7fffffffdac8) at /code/gtclang/src/gtclang/Driver/gtclang.cpp:21
Currently, all the fields of any given stencil are synchronized before every MSS. This could be reduced to include only the ones that are needed.
Here we could demote the temp field to a temp variable even if it is used in a function, as long as it is not used with an extent.
Currently we code-generate clang::gridtools::float_type, which can be specified at runtime. If a precision is specified, it should be code-generated as such.
The following example will do versioning of tmp, which will lead to a wrong result:
stencil stencil {
storage a, tmp, c;
Do {
vertical_region(k_end, k_end)
tmp=0;
vertical_region(k_end-1, k_start) {
a = tmp[k+1];
tmp = a;
c = tmp;
}
}
};
struct stage_0_0 {
using c = gridtools::accessor<0, gridtools::enumtype::inout, gridtools::extent<0, 0, 0, 0, 0, 0>>;
using tmp = gridtools::accessor<1, gridtools::enumtype::inout, gridtools::extent<0, 0, 0, 0, 0, 0>>;
using a = gridtools::accessor<2, gridtools::enumtype::inout, gridtools::extent<0, 0, 0, 0, 0, 0>>;
using tmp_1 = gridtools::accessor<3, gridtools::enumtype::inout, gridtools::extent<0, 0, 0, 0, 0, 1>>;
using arg_list = boost::mpl::vector<c, tmp, a, tmp_1>;
template <typename Evaluation>
GT_FUNCTION static void Do(Evaluation& eval, interval_end_0_end_0) {
eval(tmp_1(0, 0, 0)) = (int)0;
}
template <typename Evaluation>
GT_FUNCTION static void Do(Evaluation& eval, interval_start_0_end_minus_1) {
eval(a(0, 0, 0)) = eval(tmp_1(0, 0, 1));
eval(tmp(0, 0, 0)) = eval(a(0, 0, 0));
eval(c(0, 0, 0)) = eval(tmp(0, 0, 0));
}
};
I guess tmp should not be versioned in this case
Proper stage extent propagation is not working, since one of the patterns assumed to be illegal when designing the algorithm is actually legal: we can encounter a read before write within a stage.
If we had a stencil of this sort:
vertical_region(k_start, k_end){
a = b[k-1];
c = a + 10;
b = a + c;
}
we would have an iterative process that requires the reading of b beforehand. The current algorithm breaks here.
With PR #103 we resolve this issue to maintain legality but move to a greedier algorithm. We would need the following (the functionality is not all in place):
auto readInterval = computeReadAccessInterval(accessID);
if(readInterval.empty())
continue;
Extents fieldExtent = fromFieldExtents;
fieldExtent.expand(stageExtent);
for(int j = i - 1; j >= 0; --j) {
Stage& toStage = *(stencil.getStage(j));
if(!readInterval.overlaps(toStage.interval()))
continue;
......
Currently we merge the field-extents with the stage-extents and add those up for the static asserts. This is too greedy, since we are not considering the do-method intervals for the computation of halos in the k-dimension. See the example here: we get the assert that we need a halo of 2, even though this is coming from the start_start interval:
stencil foo {
storage a, b, c;
storage out;
Do {
vertical_region(k_start, k_start) {
a = b;
}
vertical_region(k_start + 1, k_end) {
c = b;
}
vertical_region(k_start, k_start) {
out = c + a[k + 2, j + 1];
}
}
};
After the update to the new GridTools version, stencils require BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS to be set for unknown reasons. For example, horizontal_diffusion_limiter breaks.
Currently it is not possible to parse stencil functions which have one argument and no return statement:
foo(a);
The reason is that this is interpreted as a variable declaration (clang::VarDecl), i.e.
foo a;
which shadows the type of a (it is thus not possible to determine if a was a storage without comparing the name to the members of the stencil).
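The underlying C++ rule can be reproduced outside of gtclang. In the sketch below, the struct foo and the function shadowingDemo are hypothetical stand-ins; the point is only that a statement of the form `foo(a);` is a declaration whenever foo names a type:

```cpp
// Hypothetical type standing in for a stencil function named foo.
struct foo {
  int tag = 42;
};

int shadowingDemo() {
  double a = 1.5; // Plays the role of the storage `a`.
  (void)a;        // Silence the unused-variable warning.
  {
    // Parsed as the declaration `foo a;`, not as a call foo(a):
    // a new variable `a` of type foo shadows the outer double.
    foo(a);
    return a.tag;
  }
}
```

Inside the inner block the original type of a is no longer visible, which is why the frontend has to fall back on comparing names against the stencil members.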
The global member of
https://github.com/MeteoSwiss-APN/gtclang/blob/master/test/integration-test/CodeGen/globals_stencil.cpp#L23
is generated as int instead of double, since the type is taken from the literal (2).
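The effect resembles plain C++ type deduction from an initializer, sketched below. The variable names are made up for illustration, and whether the code generator deduces the type in exactly this way is an assumption:

```cpp
#include <type_traits>

// A type deduced from the literal 2 is int; from 2.0 it is double.
auto fromIntLiteral = 2;
auto fromDoubleLiteral = 2.0;

static_assert(std::is_same<decltype(fromIntLiteral), int>::value,
              "the literal 2 deduces int");
static_assert(std::is_same<decltype(fromDoubleLiteral), double>::value,
              "the literal 2.0 deduces double");

// Taking the type from the declaration keeps double regardless of the literal.
double fromDeclaredType = 2;
```

Generating the type from the declaration rather than from the literal would avoid the mismatch.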
The line here can be reached with a stencil of the shape
stencil foo {
storage a, b, c;
Do {
vertical_region(k_start, k_start) {
a = b[k + 2];
}
vertical_region(k_start + 1, k_end) {
c = a;
b = a;
}
}
};
I don't think this code should be considered illegal; the assert should not be asserting but resolving the problem. Am I missing something?
The current GTClang syntax supports three ways of writing Do-methods:
Do { ...}
void Do {...}
void Do() {...}
The preprocessor always modifies variants 1 and 2 to read void Do(). If temporary variables are declared inside the Do-method, the preprocessor messes up the replacement and only variant 3 works.
// RUN: %gtclang% %file% -dump-pp
#include "gridtools/clang_dsl.hpp"
using namespace gridtools::clang;
stencil test {
storage a;
Do {
vertical_region(k_start, k_end) {
var ee = 2;
a += a[i + 1] + ee;
}
}
};
When there is a block like if or {}, the SIR fails to write:
vertical_region(k_start, k_end) {
out = u[i+1];
out += u[j-1];
if(out==1) {
// out *= u[k+2]+2.4;
// out -= u[k-1];
out=2;
}
}
computeMaximumExtent(): we should probably make the latter a derived info, pushing the derived info to the level of the stmt access or do method. I am not sure, though, whether they actually produce the same numbers, since the method takes into account block if/else statements... we would need to double check.
Fields or derived info?
GlobalVariableAccessIDSet_ is not a derived info; on the contrary, it is used by other levels, like the Stage, in order to compute derived info. Derived info should be information that is solely computed with the information of the IIR of the current level and its children. The list of access IDs of global variables is not; it is computed from the input HIR? So I would propose to:
Context object with all the global maps information (that is not derived info), like GlobalVariableAccessIDSet_. Additionally it contains methods and getters like getNameFromAccessID, etc. We can then have StencilContext and StencilFunctionContext inheriting from a Context base class. Other levels of the tree and visitors can use a Context.
The most recent implementation of more advanced k-caches (coming with PR #158) mixed the algorithmic part of determining caches with the codegen. This part should be moved to the caching pass.
We've decided to do this separately to not clutter up the PR too much
It is currently not possible for users to provide their own typedef for the backend or temporary_storage. This could be solved by templating the stencil wrapper (on the backend and/or temporary storage type) or providing a macro.
The trailing ; is not removed from stencil functions in code-gen, leaving bad-looking code.
The following state of the IIR should be illegal: += on a field (not temporary) in a stage with non-null extents
stage0 <extent<0,1> > {
u += 3.14;
}
stage1 {
res = sum(i+1, u);
}
A possible solution would be to run a second pass of the field versioning, after the extent of the stages has been computed.
[xUnit] [INFO] - [GoogleTest-1.6] - 25 test report file(s) were found with the pattern '**/*.xml' relative to '/scratch/snx3000/jenkins/workspace/dawn/build_type/debug/label/daint' for the testing framework 'GoogleTest-1.6'.
[xUnit] [ERROR] - Test reports were found but not all of them are new. Did all the tests run?.
Currently, the kcaches are identified based on non-pointwise accesses in the vertical and the interval where the field is accessed.
Sometimes the field is accessed with, for example, extent [0,1] in interval1, but with pointwise access in interval2.
The kcache is still used and generated for interval2, even though it does not make sense in that case.
Example from vertical_diffusion_T:
// Center fill of kcaches
if (iblock >= 0 && iblock <= block_size_i - 1 + 0 && jblock >= 0 && jblock <= block_size_j - 1 + 0) {
pia_kcache[0] = gridtools::clang::math::pow(
((gridtools::clang::float_type)1.0000000000000001E-5 * __ldg(&(p_s[idx110]))),
((gridtools::clang::float_type)287.05000000000001 / (gridtools::clang::float_type)1005));
}
// Flush of kcaches
if (iblock >= 0 && iblock <= block_size_i - 1 + 0 && jblock >= 0 && jblock <= block_size_j - 1 + 0) {
if (ksize - 1 + 1 - k >= 1) {
pia[idx111 + stride_111_2 * 1] = pia_kcache[1];
}
}
In the following example, the temporary u is not initialized in many vertical regions. We would need a protection that issues an error:
stencil compute_extent_test_stencil {
storage in, out1, out2;
var u;
Do {
vertical_region(k_start+2, k_start+3)
u = in;
vertical_region(k_start+2,k_start+10)
out1 = u[k+2] + u[k+1];
vertical_region(k_start,k_start+8)
out2 = u[k+6];
}
};
dump() will crash in vertical_diffusion_dqvdt.cpp because getNameFromAccessID() crashes: the accessID is not registered for the local variable increment (shown below as NOTFOUND).
Stencil_0
{
MultiStage_0 [parallel]
{
Stage_0
{
Do_0 { Start : End }
{
float_type increment = ((fun-call:gridtools::clang::math::max(0, QV[0, 0, 0]) - QV_nnow[0, 0, 0]) * zdtr);
Write Accesses:
__local_increment_16 : [(0, 0), (0, 0), (0, 0)]
Read Accesses:
QV_nnow : [(0, 0), (0, 0), (0, 0)]
zdtr : [(0, 0), (0, 0), (0, 0)]
QV : [(0, 0), (0, 0), (0, 0)]
0 : [(0, 0), (0, 0), (0, 0)]
if((true && (1 > 0)))
{
dqvdt[0, 0, 0] += increment;
}
else
{
dqvdt[0, 0, 0] = increment;
}
Write Accesses:
dqvdt : [(0, 0), (0, 0), (0, 0)]
Read Accesses:
__local_increment_16 : [(0, 0), (0, 0), (0, 0)]
NOTFOUND : [(0, 0), (0, 0), (0, 0)]
NOTFOUND : [(0, 0), (0, 0), (0, 0)]
0 : [(0, 0), (0, 0), (0, 0)]
dqvdt : [(0, 0), (0, 0), (0, 0)]
}
Extents: [(0, 0), (0, 0), (0, 0)]
}
}
}
Currently there is no way to loop over a StencilInstantiation's Stages directly. The only way to access all stages is either a nested loop over all MultiStages and their Stages, or a loop over StageIDX and finding them (which internally implements the first option). An iterator over all the Stages that finds the next MultiStage and moves on to its stages could be implemented to be more efficient.
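As a starting point, the nested loop can at least be hidden behind a single helper. The sketch below uses a visitor function rather than a true iterator, which is a simpler substitute for what the issue asks for. Stage, MultiStage, Stencil, forEachStage and collectStageIds are simplified, hypothetical stand-ins for the IIR classes:

```cpp
#include <vector>

// Hypothetical, simplified stand-ins for the IIR nodes.
struct Stage { int id; };
struct MultiStage { std::vector<Stage> stages; };
struct Stencil { std::vector<MultiStage> multiStages; };

// Visit every stage in order; empty multistages are skipped naturally.
template <typename Visitor>
void forEachStage(Stencil& stencil, Visitor&& visit) {
  for (MultiStage& ms : stencil.multiStages)
    for (Stage& stage : ms.stages)
      visit(stage);
}

// Example client: gather the ids of all stages without writing a nested loop.
std::vector<int> collectStageIds(Stencil& stencil) {
  std::vector<int> ids;
  forEachStage(stencil, [&ids](Stage& s) { ids.push_back(s.id); });
  return ids;
}
```

A proper forward iterator would keep a (multistage, stage) position pair and advance to the next non-empty multistage on increment; the visitor form trades that flexibility for brevity.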
If we want to promote local variables to temporaries and their first occurrence is in an if-stmt, the pass breaks:
stencil Stencil {
storage out;
var temporary;
Do {
vertical_region(k_start, k_end) {
if(false) {
temporary = 0;
}
out = temporary;
}
}
};
This should be addressed in the checks in dawn/src/dawn/IIR/StencilInstantiation.cpp (line 384 in 1295420).
When a user in/out storage is used within a single MSS as in/out, there will be false sharing that will deteriorate the performance. We should version it.
An example is lap in hori_diff_stencil_01, although in this case we should check that the storage is not read before write, so there is no need to version it.
Currently we do not have halos in K, as this would require new types of fields. In order to have production code, we extend the domain by one. This leads to a minor change in the static asserts for the storages that needs to be removed once ESCAPE solves the frontend-language problem.
The following example is not being kcache in
vertical_region(k_start, k_end-1) {
in = 3.5;
out = in[i+1] + in;
}
because the condition here does not pass (not temporary and output)
https://github.com/MeteoSwiss-APN/dawn/blob/master/src/dawn/Optimizer/PassSetCaches.cpp#L231
If you call
stencil_function footest {
storage a, b;
var c;
Do {
var d;
b = a[i + 1];
c = b[i + 1];
d = a;
b = d + a;
}
};
stencil Test04 {
storage a, b;
var bar;
void Do() {
vertical_region(k_start, k_end) {
bar = 20;
footest(a, b, bar);
}
}
};
you get a compile-time assertion, as the promotion fails.
Add support for defining stencil functions in header files (currently we only parse stencil functions in the main file of the translation unit).
The fill and flush strategy does not fully care about ranges so far. So if on a certain range we need a flush (that is not an EP-Flush) but we only wrote into a part of the field while it was cached, we flush uninitialized data back to main memory, since flushes span the full domain.
To prevent this there are two options:
If computation demands it, we increase the local compute domain to ensure correctness.
If we end up in a case where this dependency is split across multistages, this leads to unnecessary computation on GPUs, since every multistage is its own kernel call. Ideally we would only extend the global compute domain and ignore the locally increased extent.
#include "gtclang_dsl_defs/gtclang_dsl.hpp"
using namespace gtclang::dsl;
stencil test {
storage in, out, thrid;
Do {
vertical_region(k_start, k_end) {
out = in[j + 1] + out[k - 1];
thrid = out[i + 1, k + 1];
}
}
};
Generates
Stencil_0
{
MultiStage_0 [forward]
{
Stage_0
{
Do_0 { Start : End }
{
out[<no_horizontal_offset>,0] = (in[0,1,0] + out[<no_horizontal_offset>,-1]);
Write Accesses:
out : [<no_horizontal_extent>,(0,0)]
Read Accesses:
out : [<no_horizontal_extent>,(-1,-1)]
in : [(0,0),(1,1),(0,0)]
}
Extents: [(0,1),(0,0),(0,0)]
}
}
MultiStage_1 [parallel]
{
Stage_0
{
Do_0 { Start : End }
{
thrid[<no_horizontal_offset>,0] = out[1,0,1];
Write Accesses:
thrid : [<no_horizontal_extent>,(0,0)]
Read Accesses:
out : [(1,1),(0,0),(1,1)]
}
Extents: [<no_horizontal_extent>,(0,0)]
}
}
}
We have the full extent information of all storages used in stencils. In order to avoid (hard to read) GridTools runtime errors, we could add static asserts to the constructor of the class holding the stencils.
Parsing the following
double sqrtgrhor;
storage kh, vdtch;
var gct = computeGct(kh, vdtch) * sqrtgrhor;
where computeGct is a stencil function, I get the following error:
../src/dycore/vertical_diffusion_T.cpp:176:7: error: only single declarations are currently supported: expected ; got ,
var gct = computeGct(kh, vdtch) * sqrtgrhor;
Error triggered from horizontal_advection.
The problem was fixed with this hack
The problem is that globals with a constexpr value are replaced by literals but not inserted in the StencilFunctionInstantiation (if used there), only in the StencilInstantiation.
Might be related to #88.
Unary operators are not allowed in gtclang if applied on storages but for temporaries (built-in types) it works fine.
#include "gridtools/clang_dsl.hpp"
using namespace gridtools::clang;
stencil globals_stencil {
storage out;
Do {
vertical_region(k_start, k_end) {
out[i, j, k]++;
}
}
};
In code-gen, the last letter of the closing comments for namespaces is dropped.