hiperc's Introduction

High Performance Computing Strategies for Boundary Value Problems

Welcome to the HiPerC repository.

This stub branch (nist-pages) is only used to generate and serve the HiPerC Docs: you're probably looking for main.

hiperc's People

Contributors

actions-user, guyer, tkphd

hiperc's Issues

blank result from CUDA

The resulting dataset is blank, giving an extremely large weighted residual sum of squares (wrss). Think about how data transfers are undertaken (addresses and pointers in cudaMemcpy), how the mask is constructed, and look for off-by-one errors.

write final CSV only

CSV output files are plain text, and therefore large. Only write the last one for scrutiny -- PNG is fine for the rest.

trivial BCs in CPU diffusion

Having a constant source along the left wall, only, is a simple testing condition with an analytical solution. However, its solution is trivial, and will not effectively exercise accelerator hardware. Introduce modest complexity.

init and finalize for CUDA

GPU array handling is inefficient: arrays are allocated with cudaMalloc and released with cudaFree every time a function is called. Improve flow by moving these operations into gpu_init and gpu_finalize functions, called once at the beginning and once at the end of main().
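
The allocate-once pattern described above can be sketched in plain C. This is only a host-side analogue: malloc-family calls stand in for cudaMalloc and cudaFree so the sketch builds without the CUDA toolkit, and the names device_arrays, sketch_init, sketch_step, and sketch_finalize are hypothetical, not functions from this repository.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for buffers that live for the whole run. */
struct device_arrays {
    double *conc_old;
    double *conc_new;
    size_t  n;
};

/* Allocate once, at the start of main(). Returns 0 on success, -1 on failure.
 * calloc stands in for cudaMalloc (assumption made for this sketch). */
int sketch_init(struct device_arrays *dev, size_t n)
{
    assert(dev != NULL && n > 0);
    dev->conc_old = calloc(n, sizeof *dev->conc_old);
    dev->conc_new = calloc(n, sizeof *dev->conc_new);
    dev->n = n;
    if (dev->conc_old == NULL || dev->conc_new == NULL) {
        free(dev->conc_old);
        free(dev->conc_new);
        dev->conc_old = dev->conc_new = NULL;
        dev->n = 0;
        return -1;
    }
    return 0;
}

/* Per-step work reuses the existing buffers: no allocation in the time loop. */
void sketch_step(struct device_arrays *dev)
{
    for (size_t i = 0; i < dev->n; i++)
        dev->conc_new[i] = dev->conc_old[i] + 1.0; /* placeholder update */

    /* Swap roles so the next step reads what this step wrote. */
    double *tmp = dev->conc_old;
    dev->conc_old = dev->conc_new;
    dev->conc_new = tmp;
}

/* Release once, at the end of main(). free stands in for cudaFree. */
void sketch_finalize(struct device_arrays *dev)
{
    free(dev->conc_old);
    free(dev->conc_new);
    dev->conc_old = dev->conc_new = NULL;
    dev->n = 0;
}
```

In use, main() would call sketch_init once, call sketch_step inside the time loop, and call sketch_finalize once before returning. A CUDA version would follow the same shape, with the device pointers held in the struct.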

implement better algorithms

  • Comment on expected values of wrss: 0.2% means success, 7% means failure.
  • Replace parallel reduce in weighted residual sum of squares with a coprocessor-compatible algorithm, either prefix sum (parallel scan) or an approximation (calculate wrss, store in an array, then sum the array)
  • Recast the nested loop timestep equation (B[j][i] = A[j][i] + kDC[j][i]) as vector sum (B[n] = A[n] + kDC[n]).
    • This would not be a better algorithm, since boundary values would get updated in addition to the bulk.
  • Call boundaries, convolution, and solve from one function in main, not each separately.
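
The objection to the vector-sum recast can be seen in a small C sketch. It assumes row-major storage and reads the update as new = old + kD * C, with kD a scalar prefactor; that reading of the equation, and the function names update_interior and update_flat, are assumptions made for illustration.

```c
#include <assert.h>

/* Nested form: update interior cells only, so the boundary
 * rows and columns of B keep whatever values they already hold. */
void update_interior(double *B, const double *A, const double *C,
                     int nx, int ny, double kD)
{
    assert(nx > 2 && ny > 2);
    for (int j = 1; j < ny - 1; j++) {
        for (int i = 1; i < nx - 1; i++) {
            int n = j * nx + i;
            B[n] = A[n] + kD * C[n];
        }
    }
}

/* Flat form: one loop over every cell, boundary cells included. */
void update_flat(double *B, const double *A, const double *C,
                 int nx, int ny, double kD)
{
    assert(nx > 0 && ny > 0);
    for (int n = 0; n < nx * ny; n++)
        B[n] = A[n] + kD * C[n];
}
```

The two forms give the same interior values. They differ only on the boundary, so the flat form is safe only if the boundary entries of C are zero or the boundary conditions are reapplied after every step.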

explain what this is for

Per @wd15:
README leaves it unclear who this code benefits, and how it's intended to be used. Is it for a specific software? What's the audience? Why bother?

Warnings when compiling cpu version

make[1]: Entering directory '/home/amj/projects/phasefield-accelerator-benchmarks/cpu/tbb'
g++ -O2 -Wall -pedantic -std=c++11 -I../ -c boundaries.cpp
In file included from /opt/moose/tbb/include/tbb/tbb.h:68:0,
from boundaries.cpp:10:
/opt/moose/tbb/include/tbb/pipeline.h:328:74: warning: ‘template struct std::has_trivial_copy_constructor’ is deprecated [-Wdeprecated-declarations]
template struct tbb_trivially_copyable { enum { value = std::has_trivial_copy_constructor::value }; };
^
In file included from /opt/moose/gcc-5.3.0/include/c++/5.3.0/bits/move.h:57:0,
from /opt/moose/gcc-5.3.0/include/c++/5.3.0/bits/stl_pair.h:59,
from /opt/moose/gcc-5.3.0/include/c++/5.3.0/bits/stl_algobase.h:64,
from /opt/moose/gcc-5.3.0/include/c++/5.3.0/memory:62,
from /opt/moose/tbb/include/tbb/tbb_stddef.h:421,
from /opt/moose/tbb/include/tbb/aligned_space.h:24,
from /opt/moose/tbb/include/tbb/tbb.h:35,
from boundaries.cpp:10:
/opt/moose/gcc-5.3.0/include/c++/5.3.0/type_traits:1389:12: note: declared here
struct has_trivial_copy_constructor
^
g++ -O2 -Wall -pedantic -std=c++11 -I../ -c discretization.cpp
In file included from /opt/moose/tbb/include/tbb/tbb.h:68:0,
from discretization.cpp:10:
/opt/moose/tbb/include/tbb/pipeline.h:328:74: warning: ‘template struct std::has_trivial_copy_constructor’ is deprecated [-Wdeprecated-declarations]
template struct tbb_trivially_copyable { enum { value = std::has_trivial_copy_constructor::value }; };
^
In file included from /opt/moose/gcc-5.3.0/include/c++/5.3.0/bits/move.h:57:0,
from /opt/moose/gcc-5.3.0/include/c++/5.3.0/bits/stl_pair.h:59,
from /opt/moose/gcc-5.3.0/include/c++/5.3.0/bits/stl_algobase.h:64,
from /opt/moose/gcc-5.3.0/include/c++/5.3.0/memory:62,
from /opt/moose/tbb/include/tbb/tbb_stddef.h:421,
from /opt/moose/tbb/include/tbb/aligned_space.h:24,
from /opt/moose/tbb/include/tbb/tbb.h:35,
from discretization.cpp:10:
/opt/moose/gcc-5.3.0/include/c++/5.3.0/type_traits:1389:12: note: declared here
struct has_trivial_copy_constructor
^
g++ -O2 -Wall -pedantic -std=c++11 -I../ -c ../mesh.c
g++ -O2 -Wall -pedantic -std=c++11 -I../ -c ../output.c
g++ -O2 -Wall -pedantic -std=c++11 -I../ -c ../timer.c
g++ -O2 -Wall -pedantic -std=c++11 -I../ boundaries.o discretization.o mesh.o output.o timer.o ../main.c -o diffusion -lm -lpng -ltbb
make[1]: Leaving directory '/home/amj/projects/phasefield-accelerator-benchmarks/cpu/tbb'

highlight subtleties of CUDA code

  • CUDA code looks weird due to thread allocation model
    • Traditional: many memory addresses per core -> looping constructs
    • Accelerator: one memory address per thread -> no loops
  • cached mask
  • input vs. output tiles, indexing
  • linear vs. dimensional array access
  • block size and domain boundaries
  • domain-boundary cells (ghosts) vs. tile-boundary cells (halos)
  • tile size, block size, grid size; local, source, and destination arrays
    • tile sizes are static: each block is the same, ignorant of domain boundaries
    • meaning of cuda_kernel<<<num_blocks, threads_per_block, shared_array_size>>> construct
    • block size, misfit, and GPU utilization
  • take care with int, ceil(), and floor()
  • troubleshooting
    • wrss==0.0029?
    • Segfault? printf array locations and sizes; recompile with -g then use cuda-memtest and backtrace
    • CUDA slower than OpenCL? Enable persistence mode.
  • Makefile flags and function locations: main() should be in a .c file, CUDA functions in .cu, and objects built with -dc flags
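
The points about int arithmetic, block size, misfit, and utilization can be illustrated with a one-dimensional sketch in plain C. The function names are hypothetical, and the sketch ignores halo cells, which would further reduce the number of output cells per block in a tiled convolution.

```c
#include <assert.h>

/* Truncating division: drops the partial block, leaving cells uncovered
 * whenever nx is not a multiple of threads_per_block. */
int blocks_truncated(int nx, int threads_per_block)
{
    assert(nx > 0 && threads_per_block > 0);
    return nx / threads_per_block;
}

/* Integer ceiling division: rounds up so every cell is covered. */
int blocks_needed(int nx, int threads_per_block)
{
    assert(nx > 0 && threads_per_block > 0);
    return (nx + threads_per_block - 1) / threads_per_block;
}

/* Fraction of launched threads that map to a real cell.
 * Values below 1 measure the misfit between block size and domain size. */
double utilization(int nx, int threads_per_block)
{
    int launched = blocks_needed(nx, threads_per_block) * threads_per_block;
    return (double)nx / launched;
}
```

Note that calling ceil() on the result of an int division does not help: the quotient has already been truncated before ceil() sees it. Either cast one operand to double first, or use the integer form shown in blocks_needed. Threads beyond the domain edge also need a bounds check inside the kernel.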

inflexible convolution kernel

Convolution hard-codes [1, nx-1] rather than [nm, nx-nm] (for example).

  • Assume square convolution kernels
  • Accept user-defined kernel sizes
  • Dynamically allocate mask using user-specified nm
  • Use nm/2 in place of hard-coded 1 in loops
  • Provide example kernels:
    • 5-point Laplacian
    • 9-point Laplacian

landing page README needs clarification for making

You have the "work in progress" section that tells the reader about the different sections, but there is no Makefile in the top-level directory. It would be nice to have a brief note that there are multiple versions in the different directories and that they are built separately. I know that's kind of picky, but my first instinct is to look for a makefile in the top level directory...

edit: also, the "Basic Algorithm" section is confusing. A little text telling the reader that you implement the same Basic Algorithm in each of the different directories would clear that up.

gpu Makefile directory for OpenACC is wrong

The makefile has acc, but the actual directory is openacc. No compiling here :(

[amj][~/projects/phasefield-accelerator-benchmarks/gpu]> make
make -C acc
make[1]: *** acc: No such file or directory. Stop.
Makefile:7: recipe for target 'acc/diffusion' failed
make: *** [acc/diffusion] Error 2

Initial conditions / png color scheme are scary

It looks like the initial condition is two high-concentration lines on the edge of the computational domain, and concentration diffuses toward the center (see attached). This IC combined with the black-and-white color scheme made the first output image look like the code had errored. Perhaps choose a different color scheme (viridis?) and maybe consider a different initial condition. Will it serve your purposes to put a blob of mass in the center of the domain and let it diffuse outward?
[Attached image: dat_ic]

variable names

Per @amjokisaari:

  • Change from math labels (A, B, C) to matsci labels (oldGrid, newGrid, convGrid)
  • Eliminate dataX arrays

dynamic documentation

Per @amjokisaari:

  • Apply lessons from the Art of README
  • Apply lessons from Code Complete
  • Include terms of use in implementation files
  • Annotate function implementations with brief and extended statements of purpose, inputs, outputs, etc. (Doxygen-compatible syntax)
  • Document flow of data, explain how c gets computed
  • Use Doxygen to convert comments to documentation
  • Bridge Doxygen to Sphinx using Breathe

clarity of mission

  • Focus this effort by writing a mission statement into the top-level README
  • rename the repository to reflect the mission
  • transfer from @tkphd to @usnistgov

add console output during execution

Performing 'make run' gives me some make command info, but then it seems like the code is just hanging (it's not, but my data is being written out ...somewhere... I believe to the subdirectories). It would be nice to:

  1. print additional console output during execution, so I know how far along the run is, and

  2. have the README in each subdirectory tell the reader where their output is going.
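
One way the first request could look is sketched below in C. The function names, the reporting interval, and the message format are all hypothetical; the sketch only shows periodic, flushed status lines that also name the output location.

```c
#include <assert.h>
#include <stdio.h>

/* Return 1 when a progress line is due: every `interval` steps,
 * and always on the final step. */
int progress_due(int step, int total_steps, int interval)
{
    assert(total_steps > 0 && interval > 0);
    return (step % interval == 0) || (step == total_steps);
}

/* Print a one-line status and flush it so it appears immediately.
 * Returns the number of characters written, or a negative value on error. */
int report_progress(FILE *stream, int step, int total_steps,
                    const char *out_dir)
{
    assert(stream != NULL && out_dir != NULL && total_steps > 0);
    int n = fprintf(stream, "step %d of %d (%3.0f%%), output in %s\n",
                    step, total_steps,
                    100.0 * step / total_steps, out_dir);
    fflush(stream);
    return n;
}
```

Inside the time loop, a call such as `if (progress_due(step, total_steps, 1000)) report_progress(stdout, step, total_steps, out_dir);` would print a line every 1000 steps without noticeably slowing the run.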
