hpc4cmb / tidas
TIDAS (TImestream DAta Storage)
License: Other
Under two different OS X toolchains (clang++ / mpich from conda, and gcc-mp / mpi from MacPorts), the Python MPI unit tests produce a segmentation fault.
Currently the Python unit tests are ad-hoc. The C++ unit tests use gtest; the Python unit tests should use the built-in unittest package, with a custom runner invoked by the tidas.test() method.
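For reference, a minimal sketch of what a unittest-based tidas.test() entry point could look like. The test case and all names below are placeholders for illustration, not the actual tidas test suite:

```python
import unittest

# Tests written against the standard unittest package, collected into a
# suite and executed by a runner that a tidas.test() helper could wrap.
# PlaceholderTest is illustrative only.

class PlaceholderTest(unittest.TestCase):
    def test_trivial(self):
        self.assertEqual(1 + 1, 2)

def test(verbosity=1):
    """Shape of a tidas.test() entry point built on unittest."""
    loader = unittest.TestLoader()
    suite = loader.loadTestsFromTestCase(PlaceholderTest)
    runner = unittest.TextTestRunner(verbosity=verbosity)
    return runner.run(suite)

result = test()
print(result.wasSuccessful())
```

In the real package, loader.discover() on the installed tests directory would replace the explicit loadTestsFromTestCase call.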
Before release, we need:
We should enable Travis CI integration. There are relatively few dependencies, so this should be easy.
The C and Python bindings do not yet have all the functionality of the C++ interface. This should be fixed eventually.
Currently the selection of the backend format (e.g. HDF5) and all options to that backend (compression, chunk size, etc.) are set at the volume level and applied to every group within the volume. This is not ideal, since different kinds of data will require different options. Each object already stores all of its backend options in its location (backend_path object). We should devise a way to set these on a per-object basis. One possibility that would be easy to implement is a selection string specifying which objects receive a particular configuration.
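The selection-string idea could be as simple as glob matching on object paths. A sketch, where the option names and object paths are invented for the example:

```python
import fnmatch

# Apply one set of backend options (e.g. HDF5 compression / chunk size) to
# every object whose path in the volume matches a glob-style pattern.

def apply_backend_options(object_paths, pattern, options, current=None):
    """Return a dict mapping each matching object path to its backend options."""
    current = dict(current or {})
    for path in object_paths:
        if fnmatch.fnmatch(path, pattern):
            current[path] = dict(options)
    return current

paths = ["/det/group_a", "/det/group_b", "/housekeeping/group_c"]
opts = apply_backend_options(
    paths, "/det/*", {"compression": "gzip", "chunksize": 4096}
)
print(sorted(opts))
```

A real implementation would attach the resolved options to each object's backend_path rather than returning a dict, but the matching logic would be the same.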
Installed tidas from GitHub with autoconf, MPI disabled, and the default prefix.
Running Ubuntu 16.04, Python 2.7.12, NumPy 1.12.0.
When trying to import tidas from Python 2.7:
Traceback (most recent call last):
  File "./demo_telescope.py", line 29, in <module>
    import tidas
  File "/usr/local/lib/python2.7/dist-packages/tidas/__init__.py", line 17, in <module>
    from .ctidas import (
  File "/usr/local/lib/python2.7/dist-packages/tidas/ctidas.py", line 153, in <module>
    npu8 = wrapped_ndptr(dtype=np.uint8, ndim=1, flags="C_CONTIGUOUS")
  File "/usr/local/lib/python2.7/dist-packages/tidas/ctidas.py", line 146, in wrapped_ndptr
    base = npc.ndpointer(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/numpy/ctypeslib.py", line 288, in ndpointer
    num = _num_fromflags(flags)
  File "/usr/local/lib/python2.7/dist-packages/numpy/ctypeslib.py", line 163, in _num_fromflags
    num += flagdict[val]
KeyError: u''
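As a reproduction aid, the failing call can be isolated from tidas entirely. On a healthy NumPy install the same ndpointer arguments succeed; the traceback above shows them raising KeyError when an empty flag value reaches _num_fromflags. This snippet is for bisecting the environment, not a fix:

```python
import numpy as np
import numpy.ctypeslib as npc

# Standalone use of ndpointer with the same arguments as the failing call
# in ctidas.py line 153.
npu8 = npc.ndpointer(dtype=np.uint8, ndim=1, flags="C_CONTIGUOUS")

# from_param accepts a matching C-contiguous array and rejects others.
arr = np.zeros(4, dtype=np.uint8)
print(npu8.from_param(arr) is not None)
```

If this snippet also fails, the problem is in the NumPy/Python installation rather than in tidas itself.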
The docs use Sphinx, and the C++ code is first passed through Doxygen before importing the results using the "breathe" plugin for Sphinx. For some reason, using the doxygenclass directive from breathe emits both the C++ class docs AND a poorly formatted copy of the corresponding Python class. See for example the source here:
https://github.com/hpc4cmb/tidas/blob/master/docs/sphinx/group.rst
and the generated output here:
http://hpc4cmb.github.io/tidas/group.html
Looking at the HTML source, it seems the div ID for the constructor of the C++ class is getting mixed up with the Python class. This makes the output ugly and should be fixed.
Although the read and broadcast of metadata seem to work, the MPI gather and replay of transactions appears to have a problem.
The SQLite index is so slow that it is a blocking factor. Looking through the code, there are several key mistakes:
If the MPI compilers are not found, the configure check assigns MPICC and MPICXX to the serial compilers rather than disabling MPI.
Currently the read_times() and write_times() methods force I/O of the entire timestamp vector. This should be changed to support partial I/O, just like normal fields.
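A sketch of what partial timestamp I/O could look like, mirroring how normal fields support reading a sub-range. The (offset, n) signature is an assumption about the extended API, and a plain list stands in for the backend storage:

```python
# Hypothetical partial-I/O interface for timestamps.

class TimesStub:
    def __init__(self, nsamp):
        self._times = [0.0] * nsamp

    def write_times(self, data, offset=0):
        """Write len(data) timestamps starting at sample offset."""
        self._times[offset:offset + len(data)] = data

    def read_times(self, offset=0, n=None):
        """Read n timestamps starting at offset (all remaining if n is None)."""
        end = None if n is None else offset + n
        return self._times[offset:end]

ts = TimesStub(10)
ts.write_times([1.0, 2.0, 3.0], offset=4)
print(ts.read_times(offset=4, n=3))
```

Defaulting offset and n keeps the current whole-vector behavior available as read_times() with no arguments.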
The serial volume class uses an SQLite DB as its metadata store. When adding new objects to the volume, this DB is updated. Since we can't predict what the user is doing, each object insertion is a discrete SQL transaction. This takes a fraction of a second, but when first creating a volume and adding thousands of objects, this can be slow.
Note that this does not impact the MPI volume, since in that case metadata operations are stored locally in memory (very fast) and replayed to the main SQLite DB during a metadata sync using a single transaction (also very fast).
One solution is to also use an in-memory metadata store even in the serial case with explicit sync to disk.
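The in-memory approach can be sketched with the sqlite3 module directly: batch all insertions into an in-memory database, then sync to disk in one transaction using the backup API (Python 3.7+). The table layout here is invented for the example:

```python
import os
import sqlite3
import tempfile

# Build the metadata index in memory; no per-insert disk transaction cost.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE objects (path TEXT PRIMARY KEY, kind TEXT)")

# Thousands of object insertions, all within a single in-memory transaction.
with mem:
    mem.executemany(
        "INSERT INTO objects VALUES (?, ?)",
        [("/block_%04d" % i, "block") for i in range(1000)],
    )

# Explicit sync: replay the whole in-memory DB to disk in one shot.
path = os.path.join(tempfile.mkdtemp(), "volume_index.db")
disk = sqlite3.connect(path)
with disk:
    mem.backup(disk)

count = disk.execute("SELECT COUNT(*) FROM objects").fetchone()[0]
print(count)
```

This mirrors what the MPI volume already does: fast local metadata operations, replayed to the on-disk SQLite DB in a single transaction at sync time.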
Create C bindings, needed for the Python and Fortran bindings.