openpmd-standard's Introduction

The openPMD Standard

TL;DR

Technical files of the openPMD standard.

Introduction

The openPMD standard, short for open standard for particle-mesh data files, is not a file format per se. It is a standard for meta data and naming schemes.

openPMD provides naming and attribute conventions that allow the exchange of particle and mesh based data from scientific simulations and experiments. The primary goals are to define

  • a minimal set/kernel of meta information

that allows sharing and exchanging data to achieve

  • portability between various applications and differing algorithms
  • a unified open-access description for scientific data (publishing and archiving)
  • a unified description for post-processing, visualization and analysis.

openPMD is suitable for any kind of hierarchical, self-describing data format, such as, but not limited to

Motivation

Open, hierarchical, machine-independent, self-describing (binary) data formats have been available for a while now. Nevertheless, without a certain agreement for a domain of applications, standard tasks like automated data processing and import/export do not come for free.

This standard tries to bridge the gap between the common "blob of data" and the algorithms, methods and/or schemes that created these.

Users or "Why should I care?"

If output from programs, devices (such as cameras), simulations or post-processed data-sets contains a minimal set of meta information as provided by openPMD, you can exchange data between those with minimal effort and use the same tools for visualization.

Furthermore, since openPMD is not a file format but just an object-oriented markup and meta data naming convention, you can still use the large variety of tools that come with the intrinsic data format that you chose to use (e.g., HDF5 or ADIOS BP). Of course, you are completely free to use your favorite software (open source or proprietary) to create or process your files.

If the software you are using is not yet able to read/write the information needed to fulfill the openPMD standard, please talk to your software developers and point them to these documents: further adoptions of the current standard and contributions for the design of upcoming versions are very welcome!

License

The content of this standard is provided under the CC-BY 4.0 license (see list of authors) and auxiliary software, if not stated otherwise, under the ISC license.

For more details, see the contributions page.

openpmd-standard's People

Contributors

ax3l, c0nsultant, hightower8083, remilehe, skuschel

openpmd-standard's Issues

Particles: Neighbor lists

On 02.04.2015 17:38, Huebl, Axel wrote:

Hi Yaser,

let us stay connected so we can test and add an image of yours that is
represented via "particles".

That would be a wonderful check and example. Tensor properties welcome! :D

Thanks,
Axel

P.S.: In the long run, we should discuss how/if you want to store things like neighbor lists.

I was thinking about putting two records per particle species: one for the length (or start offset) of the neighbor list per particle and one with the actual, concatenated lists.
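
The two-record idea above can be sketched in plain Python. This is only an illustration: the function names and the choice of start offsets (rather than lengths) are assumptions, not something the standard defines.

```python
def pack_neighbor_lists(neighbors):
    """Flatten per-particle neighbor lists into two flat records.

    Returns (offsets, flat): offsets[i] is the start index of particle i's
    list inside flat, which holds all lists concatenated.
    """
    offsets = []
    flat = []
    for neighbor_list in neighbors:
        offsets.append(len(flat))  # start offset of this particle's list
        flat.extend(neighbor_list)
    return offsets, flat


def neighbors_of(i, offsets, flat):
    """Recover the neighbor list of particle i from the two records."""
    start = offsets[i]
    end = offsets[i + 1] if i + 1 < len(offsets) else len(flat)
    return flat[start:end]


# Four particles; particle 2 has no neighbors.
neighbors = [[1, 2], [0], [], [0, 1, 2]]
offsets, flat = pack_neighbor_lists(neighbors)
```

Storing start offsets instead of lengths lets a reader jump directly to one particle's list without summing over all preceding lengths.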

ED-PIC: Start of in-cell position

For general meshes, gridSpacing defines the size of each element on a mesh but not where it starts. Nevertheless, the start and end of the mesh node can be derived from gridGlobalOffset.

ED-PIC: The problem occurs for the definition of globalCellId & (in-cell)position. One should either define the offset/start of the cell again so it's independent or force it to be the same as the E or B offset (which would be cumbersome if they are not stored).

ED-PIC: Particle-Target (e.g., Ionization)

ED-PIC extension

It might be convenient to add an attribute pointing to the electron species that shall be used as a "target" for newly created free electrons from ionization methods. Other methods might need similar pointers (e.g., boundary conditions that create particles, etc.)

Particle Attributes: Scaling with Weighting

The property momentum (for the particles) is not very well defined in the standard. Is it $m\gamma v$ (in units of kg·m·s^-1) or $\gamma \beta$ (dimensionless momentum)?
(Notice that this is not only a problem of units, since $m$ will vary from one particle to another.)

The first solution is maybe the most conventional, but will typically result in numbers of the order 1e-20 (due to the low mass of the electrons) while the second solution will result in numbers of the order of 1.
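
To make the difference in magnitude concrete, here is a small sketch that converts a dimensionless momentum $\gamma \beta$ into SI units via $m\gamma v = m c \, \gamma\beta$. The function name is hypothetical; the constants are the standard electron mass and speed of light.

```python
ELECTRON_MASS = 9.1093837015e-31  # kg
SPEED_OF_LIGHT = 299792458.0      # m/s


def si_momentum(gamma_beta, mass):
    """Convert dimensionless momentum gamma*beta to m*gamma*v in kg*m/s."""
    return gamma_beta * mass * SPEED_OF_LIGHT


# A mildly relativistic electron with gamma*beta = 1:
p_si = si_momentum(1.0, ELECTRON_MASS)
print(p_si)  # many orders of magnitude below 1
```

The conversion needs the particle mass, which is why the choice is more than a unit factor when species with different masses are stored.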

One solution of course would be to put a conversion attribute (like momentumUnit), but I am somewhat against this, since it will add yet another attribute to implement in a PIC code when doing the output. (We have to keep in mind that the simpler the standard is, the more likely other PIC codes will adopt it.)

So I guess we should pick either $m\gamma v$ or $\gamma \beta$, and write it explicitly in the STANDARD. What is your opinion?

clarify format of particle positions

I am confused about where exactly particle properties are saved. In the ED-PIC extension it says ( https://github.com/ComputationalRadiationPhysics/openPMD/blob/draft/EXT_ED-PIC.md#additional-records-per-particle-species ) for the momenta:
/particles/<species name>/momentum/x
/particles/<species name>/momentum/y
/particles/<species name>/momentum/z
would all return an array of floats. But for the positions:
particles/<species name>/position would return an array of vectors. Then you have to look into https://github.com/ComputationalRadiationPhysics/openPMD/blob/draft/STANDARD.md#naming-conventions to understand that vectors are saved in exactly the same way. It should be clarified that
/particles/<species name>/position/x
/particles/<species name>/position/y
/particles/<species name>/position/z
is meant by that. I think it's confusing that the definitions in EXT_ED-PIC differ but actually mean the same thing (which is good, as it keeps things simple).
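
The naming scheme described above, where every vector record is stored as one data set per component, can be sketched as follows. The helper name and the default component labels are assumptions for illustration only.

```python
def component_paths(species, record, components=("x", "y", "z")):
    """Build the per-component paths of a vector record of one species."""
    base = "/particles/{}/{}".format(species, record)
    return ["{}/{}".format(base, component) for component in components]


# position and momentum follow exactly the same layout:
print(component_paths("electrons", "position"))
print(component_paths("electrons", "momentum"))
```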

particlePatches: patchID -> numParticlesOffset

The patchID in particlePatches is actually confusing and might imply a sorting that would need to be clarified.

We should therefore remove it and rather write the 1D offset where the patch begins in the global array. That makes the object easier to understand and less error prone for implementations.
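
A minimal sketch of the proposed offset, assuming each patch knows how many particles it holds and that patches are written back to back in the global array (the function name is hypothetical):

```python
def patch_offsets(num_particles):
    """Return the 1D start offset of each patch in the global particle array.

    This is an exclusive prefix sum over the per-patch particle counts.
    """
    offsets = []
    total = 0
    for count in num_particles:
        offsets.append(total)
        total += count
    return offsets


# Four patches with 3, 0, 5 and 2 particles:
print(patch_offsets([3, 0, 5, 2]))
```

With the offset and the count, a reader can slice out one patch directly without relying on any implicit ordering by ID.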

componentOrder

I am a little bit confused by the componentOrder attribute in the STANDARD.md. I know we probably discussed earlier, but unfortunately I do not remember why it is useful.

Actually, I am wondering what the meaning of a "natural order" is here. Is that from the point of view of how the HDF5 file stores the information on the disk, or from the point of view of how the PIC code stores it in memory before writing it?
Also, how would one use this information when reading/writing an OpenPMD file ? It seems that, since the components are labeled in the file ('x', 'y', 'z' or 'r', 't', 'z'), one could access them by label, without having to deal with the order.

Since it is a required attribute, I think there should be a strong rationale for having this attribute. I suggest that we either remove this attribute if there is no such rationale, or that we add more explanation in STANDARD.md if there is one.

Irregular (Cartesian) Grids

From Jean-Luc Vay on Thursday 26th, 2015:

  • gridSpacing: for support of irregular logically cartesian grid, may
    add (now or in revision) support for N*(n-1) floats where n is the
    number of nodes in a given dimension (e.g. nx, ny or nz)
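
As an illustration of the suggestion above, the node positions along one axis of an irregular but logically Cartesian grid can be rebuilt from its n-1 spacings. This is a sketch under the assumption that the spacings are stored per axis and that a start coordinate is known; the function name is hypothetical.

```python
def node_positions(start, spacings):
    """Rebuild the n node coordinates of one axis from n-1 spacings."""
    positions = [start]
    for spacing in spacings:
        positions.append(positions[-1] + spacing)
    return positions


# Four nodes along x, with cells that shrink towards the end:
print(node_positions(0.0, [1.0, 0.5, 0.25]))
```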

New Attribute `componentOrder`: Order of Components

For additional records such as the particleGrouping (bad name, it collides with the definition of group) particlePatches the order of the components is relevant.

We should generally add an attribute for each record that is not a scalar, that tells the order of components (and hopefully makes their individual naming irrelevant).

  • type: (string)
  • examples:
    • x;y;z
    • r;z
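
A reader could interpret such a string roughly as follows. This is a sketch that assumes the semicolon-separated form shown in the examples above; the function names are hypothetical.

```python
def parse_component_order(value):
    """Split a componentOrder string such as 'x;y;z' into component names."""
    return [name.strip() for name in value.split(";") if name.strip()]


def missing_components(value, existing):
    """List components named in componentOrder that the record lacks."""
    return [name for name in parse_component_order(value) if name not in existing]


print(parse_component_order("x;y;z"))
print(missing_components("r;z", {"r", "t", "z"}))
```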

simplify definition of particle positions

Another question related to my last one: Why would you want the definition of position to depend on the presence of another variable (globalCellId)? That is confusing and not easy to use. A standard should be as simple as possible, and ideally we want it in a way that you can either use it or not. But this adds the option to use it incorrectly without noticing, because you get the basics running but do not implement special cases. This is complicated and dangerous!
My suggestion:

  • the particle property position must save the position in the lab frame of reference. Never something else. That would also define what position to save in case of a moving window, boosted frame,....
  • if you need a record to save the offset position relative to the position of the cell identified by the cell ID globalCellId use some other record (maybe position_cell ?) instead.
  • if you need a record to save the offset position relative to the simulation box, use some other record (maybe position_box ?) instead.

I think that one record should always save the same thing in order to avoid confusion (like Linux tools: do one thing, but do it well). I know that this may result in various particle records which will all save something related to the particle position. But that's not a drawback -- that's a good thing, because the standard defines how to calculate position_cell if only position is given and vice versa. So there will be one tool to do the job (the reader) regardless of which PIC code wrote the files and what was easiest for the PIC code to write. This way it might even become an argument for other PIC codes to use this standard. What do you think?
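
The conversion a reader would have to perform can be sketched as below. Everything here is an assumption made for illustration: the in-cell position is taken to be a fraction of the cell size, the cell ID is taken to be already unraveled into one index per axis, and the grid start is passed in explicitly.

```python
def lab_position(cell_index, position_cell, grid_spacing, grid_offset):
    """Combine a per-axis cell index and an in-cell position into one coordinate.

    position_cell is assumed to be normalized to the cell size (0 <= value < 1).
    """
    return [
        offset + (index + fraction) * spacing
        for index, fraction, spacing, offset in zip(
            cell_index, position_cell, grid_spacing, grid_offset
        )
    ]


# 2D example: cell (2, 0), particle halfway along x and a quarter along y.
print(lab_position((2, 0), (0.5, 0.25), (1.0, 2.0), (10.0, 0.0)))
```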

gridSpacing and gridUnitSI

For cylindrical and spherical coordinates, the unit of the gridSpacing must be set per component.

We could write an array, so a unit for each component.

Allow flexible `basePath`

In future versions, a flexible basePath should be allowed.

For that we have to provide easy-to-use helper functions for both fileBased and groupBased files.

Rename version Attribute

Rename version to openPMD, or add another attribute that allows general readers to detect the standard.

Strings: Variable Length?

Recently, I switched to fixed-length strings due to this h5py note.

It looks like implementing and accessing variable-length strings is not so hard (ComputationalRadiationPhysics/libSplash#167) but using them is nearly the same complexity for users (pretty easy).

It might be possible to go back to VL such as str (Python) instead of np.string_, but probably performance and compatibility (and the latter one is definitely not optional) might be harmed for variable length strings. Due to that, fixed length (np.string_) is probably still the best option.

Allowing both is also an option, but increases complexity for all file readers without a huge benefit.

Open Questions

  • check ADIOS string support, check performance implications (-> only fixed len)
  • minimal check for IDL/Matlab and the like if they understand variable length strings (-> rather not)

Move `longName` to general Standard

Some attributes such as longName (recommended for particles in the ED-PIC extension) are so general, that they could be moved back to the general standard.

Keyword for dt

OpenPMD doesn't yet specify the keywords for grid spacing and timestep. PIConGPU calls it delta_t for time, but cell_width and cell_height for the grid. That seems inconsistent to me.

OpenPMD should specify to call those delta_t, delta_x, delta_y and delta_z instead.

Filtered Particles: Mark as Duplicate

Some PIC Codes (such as EPOCH) have the ability to dump particle subsets based on a variable set of requirements. For Example: in LWFA dump only the electrons that have gamma >2. Is this planned?

Alternative Idea

@skuschel commented

Instead of marking duplicates we could mark entries as 'original' data, meaning: not downsampled, not only a part of the simulation grid. The grid is the actual full simulation grid and the data is the actual quantity that was used for calculations. For a particle species: the particle species is complete, this is not a subset.

Coordinate System

@RemiLehe @ax3l

  • cartesian
  • cylindrical:<+m|-m>
  • other

We still have to decide if it is that general that it goes in the openPMD base format or into openPMD + ED-PIC for the beginning.

ED-PIC: Checkpoints and Dumps

The "required" particle attributes in the ED-PIC extension are currently pretty broad.

One could relax the "required" to "reserved" attributes when defining a minimum kernel of attributes for checkpoints. Nevertheless, non-complete data sets are not ideal for reproducibility.

First noted by @skuschel

Boundary Conditions for Particles & Fields

Boundary conditions for particles & fields are still missing.

Most of them, such as periodic, Lindman open-space, "absorbing", ... might be general enough for STANDARD.md, others such as PML can be added by extensions.

On a side note: also initial conditions are interesting, but not directly necessary for, e.g., restarting from a checkpoint file (open new issue for future?).

Downsampling: General Attributes

For (field) data sets that were down-sampled, we could add that information so one can calculate the original grid spacing and size again that was used in the simulation.

Things that we have to make sure:

  • down-sampling via striding or some kind of averaging (as an other attribute "...parameter")
  • striding is also interesting for particle data (but a bit more dangerous due to internal data structures that might not represent a perfect statistical sample if strided)
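
For the striding case, recovering the original grid could look like the sketch below. It assumes a hypothetical per-axis stride attribute and that striding starts at index 0; neither is defined by the standard.

```python
def original_spacing(strided_spacing, stride):
    """Grid spacing of the simulation, given the spacing of the strided output."""
    return [spacing / step for spacing, step in zip(strided_spacing, stride)]


def original_size_range(strided_size, stride):
    """Smallest and largest original axis length that yields strided_size."""
    return ((strided_size - 1) * stride + 1, strided_size * stride)


print(original_spacing([2.0, 4.0], [2, 4]))
print(original_size_range(5, 2))
```

The stride alone only bounds the original size, so an exact reconstruction would also need the original extent stored as an attribute.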

`dataOrder`: be more explicit and apply also to attributes

STANDARD.md -> Field records:

the dataOrder attribute should define that it also applies to the order of arguments in:

  • gridSpacing
  • gridGlobalOffset
  • position

Add an example, e.g., that gridSpacing should be an array in order z,y,x for C style
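
The requested example could look like this sketch. The z,y,x order for C style is taken from the text above; treating "F" as x,y,z order and the function name itself are assumptions.

```python
def order_for_data_order(values_xyz, data_order):
    """Arrange per-axis values (given as x, y, z) to match dataOrder."""
    if data_order == "C":
        return list(reversed(values_xyz))  # z, y, x for C style
    if data_order == "F":
        return list(values_xyz)  # assumed: x, y, z for Fortran style
    raise ValueError("unknown dataOrder: {!r}".format(data_order))


dx, dy, dz = 0.1, 0.2, 0.4
grid_spacing = order_for_data_order([dx, dy, dz], "C")
print(grid_spacing)
```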

Global definition of the time at the top file level

When working with time series of OpenPMD files, it is rather cumbersome to get the time to which each file corresponds.

The main reason is that the time is an attribute of a record. I understand the rationale for this (for instance E and B may be staggered in time). However, that means one has to dive into the file (checking if there is any field at all; if yes, looking up which fields are defined and finally getting the time; if not checking if there are any particles; if yes looking up which particles are defined, and accessing the position record which contains the time.)

This seems unnecessarily complicated. Could we define a variable at the file root level ( '/' ), e.g. "centered_time", which would correspond to the time of the time-centered quantities? In this case, we would also keep the more fine-grained time at the record level.
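
The lookup described above can be sketched with nested dictionaries standing in for the file. The layout, the "attrs" key and the function name are illustrative assumptions; "centered_time" is the name suggested above, not an existing attribute.

```python
def find_time(file_tree):
    """Return the time of a file, preferring a root-level attribute."""
    root_attrs = file_tree.get("attrs", {})
    if "centered_time" in root_attrs:
        return root_attrs["centered_time"]  # proposed shortcut
    # Fallback: dive into the fields, then into the particle positions.
    for record in file_tree.get("fields", {}).values():
        if "time" in record.get("attrs", {}):
            return record["attrs"]["time"]
    for species in file_tree.get("particles", {}).values():
        position = species.get("position", {})
        if "time" in position.get("attrs", {}):
            return position["attrs"]["time"]
    return None


only_particles = {"particles": {"electrons": {"position": {"attrs": {"time": 2.5}}}}}
with_root_time = {"attrs": {"centered_time": 3.0}, "fields": {}}
print(find_time(only_particles), find_time(with_root_time))
```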

Define <> and []

Required and optional parameters. Define that <> and [] are not part of the variable.

Is there a std rfc for that?

particle record: offset

the particle record needs a counterpart for the mesh's gridGlobalOffset (positionGlobalOffset)

Constant Data

For data that is not changing over time, such as geometry information for detector positions or grids, one could think about a general path to store it (outside of /data/%T/, e.g., /data/constant/).

This can cause problems when only sharing a single time step of a simulation.

Checkpoints: Additional Information

Special treatments for moving window required

  • offsets / "slides" if implemented

and minimal information for ED-PIC necessary:

  • E, B
  • all particle attributes

make clear use of keywords like "required", "mandatory", "recommended", "additional", "reserved", "optional"

I think we are mixing up 2 different things here while talking about openPMD, because people have different things in mind about what the standard should and shouldn't do.

  1. I (perspective of a PIC user) usually have in mind a unified dump format that specifies, for example, where E and B fields and particle properties are dumped, in case they are dumped. It also says WHERE all information about the simulation is and how to convert the data to SI units, IF the data is dumped. But I want to be able to define what I need to dump, because I care how much storage I need when doing 100s of simulations and evaluating them afterwards.

  2. perspective of a PIC code writer: I need a format to dump everything to be able to restart the sim from there if needed.

In the end this is why keywords like "required", "mandatory", "recommended", "additional", "reserved", "optional" can mean different things depending on the way you think what it is designed for.

I think we need to make the point clearer what openPMD can do in a very minimalist version (i.e. imagine the only thing you want is dumping the energy density of the simulation on a grid that is reduced in number of cells compared to your real simulation grid -- Just visualization purposes.) to using its full capabilities (i.e. using an openPMD dump to restart a simulation with code X, that was dumped in the first place by code Y).

Checker: Test Types and Formats

Test and create (valid on 32 and 64bit machines):

  • types of attributes (numpy docs)
  • format of some strings, e.g., the date attribute
  • length of array attributes (e.g., particlePatches)
  • types of specific records (usually free type for records)
  • add test that componentOrder actually describes existing record components

thetaMode: require geometryParameters

For the geometry thetaMode we should make geometryParameters mandatory, so people do not skip it thinking of "oh it's implicit in the dimensions of the record".

Allow non-uniform steps

decouple "time" and "step" - a step does not need to be in time.

  • step #
  • current time
  • current dt

-> absolute time & t_0 != 0 offset
-> might need extra opt. attributes
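
A sketch of what decoupled steps could carry, assuming each step stores its number, its absolute time and its own dt, with a start time t_0 that may differ from zero (all names are hypothetical):

```python
def build_steps(t0, dts):
    """Return one entry per step with its number, current time and current dt."""
    steps = []
    time = t0
    for number, dt in enumerate(dts):
        steps.append({"step": number, "time": time, "dt": dt})
        time += dt  # the next step starts where this one ends
    return steps


# Non-uniform steps starting at t_0 = 1.0:
for entry in build_steps(1.0, [0.5, 0.25, 0.25]):
    print(entry)
```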

Remove EOL Whitespaces & 80 Char Limit

  • copy paste into word/writer and check typos/spelling again
  • Remove EOL whitespaces and other minor style cleanups: `sed -i 's/ *$//' fileName`
  • 80 char limit for all files: `cat fileName | fold -w 80`
  • run `sed 's/\t/ /g' -i fileName` on all files one more time

Linter to fulfill VizSchema

Comparison of the openPMD standard to Tech-X's VizSchema:

Facts about VizSchema:

  • very free set of namings of groups and records (can be good and bad at the same time)
  • developed for visualization of various particle & mesh codes, but mainly for VORPAL
  • very close to (serial) VTK/XDMF namings (can again be good and bad at the same time)
  • they sometimes refer to it as a "markup language" for HDF5/NetCDF which is actually a good naming (e.g., in their composer)
  • wiki, talk, IPAC2010, ICAP2009
  • ADIOS (meshes only?), HDF5 & NetCDF VisIt plugins
  • also a minimal kernel of required mark-ups
  • provides C/Fortran/Python API (HDF5 & NetCDF? open source?)
  • time: also with data sets, but allow axis of a multi-dim mesh to "represent time"
  • markup-features:
    • allows "time" as an axis for multi-dimensional data sets
    • general section: try to capture also: compiler (version, tries to record flags...), host, cmd startup-line
    • some labels (e.g., strings including non-generic units for axis)
    • meshes are records that are defined and can be referenced by other records (useful abstraction for complicated geometries)
    • dx, dy, ... implicit from lower and upper bounds & number of elements (for structured grids)
    • derived records via groups; functions only what VisIt (better say VTK?) directly supports
    • checker tool VizSchemaH5DataValidator.py open source?
  • (intentionally) missing in VizSchema:
    • not open source? (or only parts of it)
    • out of VizSchema-scope: human-readability, documentation for archives/supplementary materials, domain-specific extensions, code-interoperability, filtered data sets (e.g., storing the same particle in multiple records)
    • generally missing: a documentation that is not the wiki and not the code itself (xml descr.)? lacking parallel particle patch hints? staggered positions of components? units (besides hard-coded string labels)?

It might be actually possible to add all required "vs..." attributes out of an openPMD file.

Move of Repository

Dear @ComputationalRadiationPhysics/openpmd-contributors and @ComputationalRadiationPhysics/openpmd-maintainers ,

we will move the repository to its own organization during the week under
https://github.com/openPMD

we will split up the repo and add additional tools/scripts.

Since the repository with version 1.0.0 will become public with the change, you should still be able to access it :)

Best regards,
Axel

Step: Self-Aware

Some file formats, such as ADIOS, can be used in a step-aware manner.

We could consider relaxing the requirements for the basePath in that case, but if it complicates the format by adding more exceptions we might still enforce it.

timeStepFormat

Hey,

I'm writing a Python function that outputs to the OpenPMD format.
However, I'm not sure I understand the "timeStepFormat" attribute.
For instance, if I'm using a "fileBased" system and I'm writing a file named "fields0005000.h5", what should I put as a "timeStepFormat"?

  • "fields0005000.h5" ?
  • "fields%T.h5" ?

I'm guessing the second one, but maybe it would be nice to make this clearer with an example in STANDARD.md.

What do you think ?

Remi
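
For illustration, the second option could be expanded into concrete file names as sketched below. This assumes that %T stands for the time step number and that the zero-padding width is chosen by the writer; the function name and the width parameter are hypothetical.

```python
def expand_time_step_format(time_step_format, step, width=0):
    """Replace the %T placeholder with the (optionally zero-padded) step number."""
    return time_step_format.replace("%T", str(step).zfill(width))


print(expand_time_step_format("fields%T.h5", 5000, width=7))
```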
