usnistgov / hiperc

High Performance Computing Strategies for Boundary Value Problems

Home Page: https://pages.nist.gov/hiperc/en/latest/index.html

HTML 93.21% CSS 3.16% JavaScript 3.63%

phase-field gpgpu xeon-phi materials-science computational-science cuda openacc diffusion-equation scientific-computing diffusion gpu gpu-computing finite-difference shared-memory-parallel

hiperc's Introduction

Welcome to the HiPerC repository.

This stub branch (nist-pages) is only used to generate and serve the HiPerC Docs: you're probably looking for main.

hiperc's Issues

nth does not limit thread count in TBB diffusion

use lambdas in TBB diffusion

require C++11
replace classes with lambda functions
substitute .cpp for .c extensions on affected sources

Resulting dataset is blank, giving extremely large wrss. Think about how data transfers are undertaken (addresses and pointers in cudaMemcpy), how the mask is constructed, and look for off-by-one errors.

discuss OpenMP+X in GPU README

missing GPU diffusion code

missing PHI ripening code

Implement using

Knights Landing

OpenACC is slow

CUDA is 2× faster

automate scaling studies

data generation
visualization

write final CSV only

CSV output files are plain text, and therefore large. Only write the last one for scrutiny -- PNG is fine for the rest.

discuss vectorAdd in README

OpenMP targets GPUs

http://on-demand.gputechconf.com/gtc/2016/presentation/s6510-jeff-larkin-targeting-gpus-openmp.pdf

trivial BCs in CPU diffusion

A constant source along the left wall alone is a simple test condition with an analytical solution. However, that solution is trivial and will not effectively exercise accelerator hardware. Introduce modest complexity.

missing PHI diffusion code

Implement using

Knights Landing

init and finalize for CUDA

GPU array handling is inefficient: arrays are allocated with cudaMalloc and released with cudaFree every time a function is called. Improve flow by moving these operations into gpu_init and gpu_finalize functions, called once at the beginning and once at the end of main().
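
The allocate-once lifecycle can be sketched in plain C. This is a host-side analog only: malloc-family calls stand in for cudaMalloc and cudaFree, and the DeviceArrays struct, the step function, and the signatures of gpu_init and gpu_finalize are assumptions made for illustration, not the repository's actual interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the buffers the solver reuses every step. */
typedef struct {
    double* conc_old;
    double* conc_new;
    size_t  n;
} DeviceArrays;

/* Allocate every buffer once, before the time loop. Returns 0 on success.
 * In a CUDA version, calloc would be replaced by cudaMalloc. */
int gpu_init(DeviceArrays* dev, size_t n)
{
    dev->n = n;
    dev->conc_old = (double*)calloc(n, sizeof(double));
    dev->conc_new = (double*)calloc(n, sizeof(double));
    if (dev->conc_old == NULL || dev->conc_new == NULL) {
        free(dev->conc_old);
        free(dev->conc_new);
        dev->conc_old = dev->conc_new = NULL;
        return 1;
    }
    return 0;
}

/* Placeholder work function: reuses the buffers and allocates nothing. */
void step(DeviceArrays* dev)
{
    assert(dev->conc_old != NULL && dev->conc_new != NULL);
    for (size_t i = 0; i < dev->n; i++)
        dev->conc_new[i] = dev->conc_old[i] + 1.0;
    /* Swap pointers so the new values become the old values. */
    double* tmp = dev->conc_old;
    dev->conc_old = dev->conc_new;
    dev->conc_new = tmp;
}

/* Release every buffer once, after the time loop.
 * In a CUDA version, free would be replaced by cudaFree. */
void gpu_finalize(DeviceArrays* dev)
{
    free(dev->conc_old);
    free(dev->conc_new);
    dev->conc_old = dev->conc_new = NULL;
    dev->n = 0;
}
```

With this shape, main() would call gpu_init once, call step inside the time loop, and call gpu_finalize once before returning.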

missing GPU ripening code

Implement using

OpenACC
OpenCL
CUDA

GPU monitoring tools

nvidia-smi
intel_gpu_top
radeontop

make parameters file human-readable

implement better algorithms

Comment on expected values of wrss: 0.2% means success, 7% means failure.
Replace parallel reduce in weighted residual sum of squares with a coprocessor-compatible algorithm, either prefix sum (parallel scan) or an approximation (calculate wrss, store in an array, then sum the array)
~~Recast the nested loop timestep equation (B[j][i] = A[j][i] + k*D*C[j][i]) as a vector sum (B[n] = A[n] + k*D*C[n]).~~
- This would not be a better algorithm, since boundary values would get updated in addition to the bulk.
Call boundaries, convolution, and solve from one function in main, not each separately.
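
The "approximation" option in the list above (calculate each term, store it in an array, then sum the array) might look like the following serial sketch. The function names, the flattened array layout, and the caller-supplied weights are assumptions; the issue does not say how the weights or the analytical values are defined.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: store one weighted squared residual per grid point.
 * Every iteration is independent of the others, so this loop needs
 * no reduction and could be offloaded as-is. */
void wrss_terms(const double* numerical, const double* analytical,
                const double* weight, double* terms, size_t n)
{
    assert(terms != NULL);
    for (size_t i = 0; i < n; i++) {
        const double r = numerical[i] - analytical[i];
        terms[i] = weight[i] * r * r;
    }
}

/* Step 2: sum the stored terms in a separate pass. */
double sum_terms(const double* terms, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += terms[i];
    return sum;
}
```

Splitting the work this way costs one extra array of n values, in exchange for a first pass with no shared accumulator.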

explain what this is for

Per @wd15:
README leaves it unclear who this code benefits, and how it's intended to be used. Is it for a specific software? What's the audience? Why bother?

Warnings when compiling cpu version

make[1]: Entering directory '/home/amj/projects/phasefield-accelerator-benchmarks/cpu/tbb'
g++ -O2 -Wall -pedantic -std=c++11 -I../ -c boundaries.cpp
In file included from /opt/moose/tbb/include/tbb/tbb.h:68:0,
from boundaries.cpp:10:
/opt/moose/tbb/include/tbb/pipeline.h:328:74: warning: ‘template struct std::has_trivial_copy_constructor’ is deprecated [-Wdeprecated-declarations]
template struct tbb_trivially_copyable { enum { value = std::has_trivial_copy_constructor::value }; };
^
In file included from /opt/moose/gcc-5.3.0/include/c++/5.3.0/bits/move.h:57:0,
from /opt/moose/gcc-5.3.0/include/c++/5.3.0/bits/stl_pair.h:59,
from /opt/moose/gcc-5.3.0/include/c++/5.3.0/bits/stl_algobase.h:64,
from /opt/moose/gcc-5.3.0/include/c++/5.3.0/memory:62,
from /opt/moose/tbb/include/tbb/tbb_stddef.h:421,
from /opt/moose/tbb/include/tbb/aligned_space.h:24,
from /opt/moose/tbb/include/tbb/tbb.h:35,
from boundaries.cpp:10:
/opt/moose/gcc-5.3.0/include/c++/5.3.0/type_traits:1389:12: note: declared here
struct has_trivial_copy_constructor
^
g++ -O2 -Wall -pedantic -std=c++11 -I../ -c discretization.cpp
In file included from /opt/moose/tbb/include/tbb/tbb.h:68:0,
from discretization.cpp:10:
/opt/moose/tbb/include/tbb/pipeline.h:328:74: warning: ‘template struct std::has_trivial_copy_constructor’ is deprecated [-Wdeprecated-declarations]
template struct tbb_trivially_copyable { enum { value = std::has_trivial_copy_constructor::value }; };
^
In file included from /opt/moose/gcc-5.3.0/include/c++/5.3.0/bits/move.h:57:0,
from /opt/moose/gcc-5.3.0/include/c++/5.3.0/bits/stl_pair.h:59,
from /opt/moose/gcc-5.3.0/include/c++/5.3.0/bits/stl_algobase.h:64,
from /opt/moose/gcc-5.3.0/include/c++/5.3.0/memory:62,
from /opt/moose/tbb/include/tbb/tbb_stddef.h:421,
from /opt/moose/tbb/include/tbb/aligned_space.h:24,
from /opt/moose/tbb/include/tbb/tbb.h:35,
from discretization.cpp:10:
/opt/moose/gcc-5.3.0/include/c++/5.3.0/type_traits:1389:12: note: declared here
struct has_trivial_copy_constructor
^
g++ -O2 -Wall -pedantic -std=c++11 -I../ -c ../mesh.c
g++ -O2 -Wall -pedantic -std=c++11 -I../ -c ../output.c
g++ -O2 -Wall -pedantic -std=c++11 -I../ -c ../timer.c
g++ -O2 -Wall -pedantic -std=c++11 -I../ boundaries.o discretization.o mesh.o output.o timer.o ../main.c -o diffusion -lm -lpng -ltbb
make[1]: Leaving directory '/home/amj/projects/phasefield-accelerator-benchmarks/cpu/tbb'

create website for results and discussion

column headers in runlog.csv need to be described

...in the readme, in a manual, something (please don't make me dig through the source...)

dual accelerator support

Some (many?) GPUs are, in fact, two GPUs stuck together. Figure out how to use both.

missing CPU diffusion code

Implement using

serial
OpenMP
TBB

highlight subtleties of CUDA code

missing ACC spinodal code

missing GPU diffusion code

Implement using:

document structure and usage

inflexible convolution kernel

Convolution hard-codes [1, nx-1] rather than [nm, nx-nm] (for example).

Assume square convolution kernels
Accept user-defined kernel sizes
Dynamically allocate mask using user-specified nm
Use nm/2 in place of hard-coded 1 in loops
Provide example kernels:
- 5-point Laplacian
- 9-point Laplacian
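
A minimal sketch of a convolution that takes the mask size nm as a parameter, using nm/2 as the loop offset, is given here. The function name, the argument order, and the flattened row-major storage are assumptions for illustration, not the repository's existing code.

```c
#include <assert.h>

/* Convolve conc_old with a square nm-by-nm mask, writing into conc_lap.
 * Grids are flat row-major arrays with nx columns and ny rows.
 * Only points at least nm/2 cells away from every edge are updated;
 * the outer layer is left for the boundary-condition routine. */
void convolve(const double* conc_old, double* conc_lap,
              const double* mask, int nx, int ny, int nm)
{
    assert(nm % 2 == 1 && nm <= nx && nm <= ny);
    const int half = nm / 2;
    for (int j = half; j < ny - half; j++) {
        for (int i = half; i < nx - half; i++) {
            double value = 0.0;
            for (int mj = 0; mj < nm; mj++) {
                for (int mi = 0; mi < nm; mi++) {
                    value += mask[mj * nm + mi]
                           * conc_old[(j + mj - half) * nx + (i + mi - half)];
                }
            }
            conc_lap[j * nx + i] = value;
        }
    }
}
```

With nm = 3 the loop bounds reduce to the hard-coded [1, nx-1] range, so the existing behavior is the special case of a 3-by-3 mask.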

missing GPU spinodal code

Implement using

~~OpenACC~~
~~OpenCL~~
CUDA

nine-point stencil in README

Per @amjokisaari:

Create table showing 5-point and 9-point stencil
Explain why the coefficients sum to zero if dx=dy=h
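
For reference, the two stencils can be written out as 3-by-3 masks and their coefficients summed. The sketch below assumes dx = dy = h. The 9-point weights used here (1 on corners, 4 on edges, -20 at the center, all divided by 6h^2) are one common choice and are an assumption, since the issue does not name a specific variant; the function names are hypothetical too.

```c
#include <assert.h>

/* 5-point Laplacian stencil for grid spacing h, stored row-major
 * in a flat 3x3 array:   0  1  0
 *                        1 -4  1   all divided by h^2
 *                        0  1  0                       */
void five_point_mask(double h, double* mask)
{
    assert(h > 0.0);
    const double w = 1.0 / (h * h);
    const double m[9] = {0.0, w, 0.0,
                         w, -4.0 * w, w,
                         0.0, w, 0.0};
    for (int k = 0; k < 9; k++)
        mask[k] = m[k];
}

/* 9-point Laplacian stencil (assumed variant):
 *                        1   4  1
 *                        4 -20  4   all divided by 6 h^2
 *                        1   4  1                        */
void nine_point_mask(double h, double* mask)
{
    assert(h > 0.0);
    const double w = 1.0 / (6.0 * h * h);
    const double m[9] = {w, 4.0 * w, w,
                         4.0 * w, -20.0 * w, 4.0 * w,
                         w, 4.0 * w, w};
    for (int k = 0; k < 9; k++)
        mask[k] = m[k];
}

/* Sum of all coefficients in an nm-by-nm mask. */
double mask_sum(const double* mask, int nm)
{
    double sum = 0.0;
    for (int k = 0; k < nm * nm; k++)
        sum += mask[k];
    return sum;
}
```

The sums cancel because a discrete Laplacian applied to a uniform field must return zero: 4(1) - 4 = 0 for the 5-point form, and 4(1) + 4(4) - 20 = 0 for the 9-point form. In floating point the 9-point sum may differ from zero by rounding error.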

Need documentation on what results to compare

and how!

landing page README needs clarification for making

You have the "work in progress" section that tells the reader about the different sections, but no Makefile in the directory. It would be nice to have a brief note that there are multiple versions in the different directories and that they are built separately. I know that's kind of picky, but my first instinct is to look for a makefile in the top level directory...

edit: also, the "Basic Algorithm" section is confusing. A little text telling the reader that you implement the same Basic Algorithm in each of the different directories would clear that up.

substitute typedef for double
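
One way this could look: a single typedef controls the precision, and routines are written against it instead of double. The alias name fp_t and the example routine are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Single point of control for floating-point precision.
 * Changing `double` to `float` here switches every array and
 * scalar declared with fp_t. */
typedef double fp_t;

/* Example routine written against fp_t rather than double. */
fp_t mean(const fp_t* values, size_t n)
{
    assert(n > 0);
    fp_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum / (fp_t)n;
}
```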

gpu Makefile directory for OpenACC is wrong

The makefile has acc, but the actual directory is openacc. No compiling here :(

[amj][~/projects/phasefield-accelerator-benchmarks/gpu]> make
make -C acc
make[1]: *** acc: No such file or directory. Stop.
Makefile:7: recipe for target 'acc/diffusion' failed
make: *** [acc/diffusion] Error 2

Initial conditions / png color scheme are scary

It looks like the initial condition is two high-concentration lines on the edge of the computational domain, and concentration diffuses toward the center (see attached). This IC combined with the black-and-white color scheme made the first output image look like the code had errored. Perhaps choose a different color scheme (viridis?) and maybe consider a different initial condition. Will it serve your purposes to put a blob of mass in the center of the domain and let it diffuse outward?

compare results between libraries and architectures

variable names

Per @amjokisaari:

Change from math labels (A, B, C) to matsci labels (oldGrid, newGrid, convGrid)
Eliminate dataX arrays

dynamic documentation

Per @amjokisaari:

Apply lessons from the Art of README
Apply lessons from Code Complete
Include terms of use in implementation files
Annotate function implementations with brief and extended statements of purpose, inputs, outputs, etc. (Doxygen-compatible syntax)
Document flow of data, explain how c gets computed
Use Doxygen to convert comments to documentation
Bridge Doxygen to Sphinx using Breathe
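
As an illustration of the annotation style, a Doxygen-compatible comment block on a small function might read as follows. The function, its parameters, and the wording are hypothetical; only the comment syntax (\brief, \param, \return) is standard Doxygen.

```c
#include <assert.h>

/**
 * \brief Advance one grid point by one explicit timestep
 *
 * Computes the new composition from the old composition and the
 * discrete Laplacian at the same point. This is the step where
 * the composition c gets updated.
 *
 * \param conc_old  composition at the current time
 * \param conc_lap  discrete Laplacian of the composition at this point
 * \param D         diffusion coefficient
 * \param dt        timestep, must be positive
 * \return          composition at the next time
 */
double update_point(double conc_old, double conc_lap, double D, double dt)
{
    assert(dt > 0.0);
    return conc_old + dt * D * conc_lap;
}
```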

missing ACC ripening code

timestep within GPU

parallelize CPU diffusion

missing CPU ripening code

Implement using

serial
OpenMP
TBB

clarity of mission

Focus this effort by writing a mission statement into the top-level README
rename the repository to reflect the mission
transfer from @tkphd to @usnistgov

missing TBB diffusion code

missing CPU spinodal code

Implement for

~~serial~~
OpenMP
~~TBB~~

missing PHI spinodal code

Implement using

Knights Landing

summarize usage in pseudocode

add console output during execution

Running 'make run' prints some make command info, but then the code appears to hang (it doesn't; my data is being written out ...somewhere... I believe to the subdirectories). It would be nice to:

get additional console output during execution, so I know how far along the run is, and
have the README in the subdirectory tell the reader where their output is going.
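
A minimal sketch of the kind of progress line requested here, printed at a fixed interval and naming the output location. The function name, its parameters, and the message format are all assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Print a progress line to `out` every `interval` steps and on the
 * final step. Returns 1 if a line was printed, 0 otherwise. */
int report_progress(FILE* out, int step, int steps, int interval,
                    const char* output_dir)
{
    assert(steps > 0 && interval > 0);
    if (step % interval != 0 && step != steps)
        return 0;
    fprintf(out, "step %d of %d (%.0f%%), writing output to %s\n",
            step, steps, 100.0 * step / steps, output_dir);
    fflush(out); /* make the line visible immediately, even when piped */
    return 1;
}
```

Called once per iteration of the time loop, this stays quiet on most steps and reports at each checkpoint.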
