
Comments (7)

cboulay commented on June 12, 2024

Hi Andrew,

I don't think HDF5 lends itself well to having multiple interleaved streams of data. I might be wrong, but I don't think it can have multiple datasets that are all able to grow arbitrarily in size after they are created. Can it? I know it can have one, but I seem to recall running into a problem when I tried to have more than one.

Even if it can have multiple resizable datasets, HDF5 is just a container. The organization of the metadata, datasets, timestamps, etc. doesn't follow any universal specification. The only thing that would improve (over XDF) from a user's perspective is that they could use any widely available HDF5 importer to get the data into memory, but they would still have to write a custom, layout-specific importer to get an intuitive representation of the data. That is arguably worse than using the existing XDF importers we have.

So then we would need to specify a data layout, and create (and maintain) xdfh5 tools.

There are some nice tools for HDF5 (e.g., Dask) with great features for loading large data, so I see the value in HDF5 support. However, I think this value can be obtained more easily with an XDF-to-HDF5 converter. You can already do this pretty easily with NeuroPype without really losing any information, but you still face the problem of having to know the NeuroPype HDF5 layout when you want to load the file with tools other than NeuroPype. I think NeuroPype can also write to NWB; if it doesn't yet, then it should be able to soon.

I'm also of the opinion that importing arbitrary HDF5 files in Matlab kind of sucks, and I don't see it as an improvement over the xdf-Matlab importer.


dioptre commented on June 12, 2024

This is an interesting argument about declarative vs. imperative approaches to storing data. I see what you're saying about not being explicit enough and losing the meaning of the data; that said, we can use metadata, so I'm not sure you'd lose any specification info. I don't think there's an issue with multiple resizable datasets (see the sketch below).
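For what it's worth, here's a minimal h5py sketch (the file name, dataset names, and shapes are made up) of two datasets in one file that grow independently after creation. Any chunked dataset created with maxshape=None along an axis can be resized along that axis later:

```python
import h5py
import numpy as np

# Two chunked, resizable datasets in the same file; maxshape=(None, ...)
# lets each grow independently along the first axis.
with h5py.File("streams.h5", "w") as f:
    eeg = f.create_dataset("eeg", shape=(0, 8), maxshape=(None, 8),
                           chunks=(1024, 8), dtype="f4")
    markers = f.create_dataset("markers", shape=(0,), maxshape=(None,),
                               chunks=(256,), dtype="f8")

    # Append chunks in interleaved order, as they would arrive from LSL.
    for _ in range(10):
        chunk = np.random.randn(32, 8).astype("f4")
        eeg.resize(eeg.shape[0] + len(chunk), axis=0)
        eeg[-len(chunk):] = chunk

        mark = np.random.rand(2)
        markers.resize(markers.shape[0] + len(mark), axis=0)
        markers[-len(mark):] = mark
```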

On the other hand, you'd be able to use the format nearly everywhere and in whatever language you'd like... which would open us up to more sophisticated tooling than Matlab? Perhaps ubiquity might be a better option than specialization? But it's an old, ongoing argument in the tech world.


cboulay commented on June 12, 2024

> that said, we can use metadata, so I'm not sure you'd lose any specification info

We'd still have to agree on the specification: what exactly will the field names be, and what are the type and layout of the information they hold? But maybe it's actually better to create and use a specification that is an extension of NWB, e.g. NWB:X? (I would suggest NWB:N, but I don't think that holds all of the information we need for different modalities; the LSL ecosystem records more than just EEG.)

I would be interested to see a pull request wherein LabRecorder users have the option of saving either to XDF or to NWB.

But that seems like a lot of work, and the added value isn't much greater than what you would get from the comparatively easier job of writing a lightweight pyxdf --> pynwb converter, if for some reason you aren't happy with the conversion options available in NeuroPype or MNE-Python; see the sketch below. Such a tool could also come with an anonymization feature and a BIDS validator. I think there's a lot of value here.
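As a rough sketch of what such a converter could look like (file names and the identifier are placeholders, and real handling of channel metadata and units is elided):

```python
from datetime import datetime, timezone

import pyxdf
from pynwb import NWBFile, NWBHDF5IO, TimeSeries

# Load all streams from the XDF file (pyxdf applies the clock-offset
# synchronization by default).
streams, header = pyxdf.load_xdf("recording.xdf")

nwbfile = NWBFile(
    session_description="converted from XDF",
    identifier="recording-001",                     # placeholder
    session_start_time=datetime.now(timezone.utc),  # placeholder; take from header
)

# One TimeSeries per XDF stream, keeping the per-sample timestamps.
# Note: string/marker streams would need an AnnotationSeries instead (not shown).
for stream in streams:
    name = stream["info"]["name"][0]
    nwbfile.add_acquisition(TimeSeries(
        name=name,
        data=stream["time_series"],
        timestamps=stream["time_stamps"],
        unit="unknown",  # real units live in the stream's channel metadata
    ))

with NWBHDF5IO("recording.nwb", mode="w") as io:
    io.write(nwbfile)
```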

There's a much bigger problem that I forgot to mention before.
XDF stores clock offsets between the recording computer and every stream it records. Then, upon loading, the XDF importers (pyxdf and xdf-Matlab) use those clock offsets to synchronize the streams. While LSL provides functionality to do the clock adjustments in real time, in practice it is better and more accurate to do this synchronization offline, after all the data and clock offsets have been recorded.

So, if we were to write to another format, we would have to either:

  1. Do the synchronization online; this gives a worse experience than XDF.
  2. Store the clock offsets somewhere in the HDF5 file; then, when people load the file with something other than the official importer we provide and inevitably ignore the offsets, they will complain about how terrible the synchronization is and/or bombard us with questions about how to synchronize.

So once again, a converter is a much better option.
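To illustrate what that offline step looks like, here is a simplified sketch (the function name and the plain least-squares fit are mine; pyxdf actually uses a more robust fit to reject outlier offset measurements):

```python
import numpy as np

def synchronize(stream_timestamps, offset_times, offset_values):
    # Fit a line through the recorded (time, offset) measurements.
    # Assumes offsets are "recorder clock minus stream clock".
    slope, intercept = np.polyfit(offset_times, offset_values, 1)
    # Shift every sample timestamp onto the recorder's clock.
    return stream_timestamps + (slope * stream_timestamps + intercept)
```

Skipping this step is exactly what a naive reader of the raw file would do, which is the failure mode in option 2 above.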


agricolab commented on June 12, 2024

I fully agree with Chadwick. HDF5 is not well suited to continuous recording, so we would need a converter first. But a format like XDF is more compact, and we already have XDF loaders for many languages. So where is the benefit of HDF5?


dioptre commented on June 12, 2024

Just to fill in the background: my use case involves far more than neuro data. The point about doing the clock-offset correction in post-processing is interesting; I totally understand why you like XDF now. Is that perhaps something we could improve in the underlying LSL protocol? I wonder if we could include synchronization and buffering while writing in real time. We use our own format in neuromore.com Studio, but I'm pushing the team more toward open source, so I'm interested to see what best practices we can develop with the community. NWB looks like what I might be after; super interesting, thanks @cboulay.

Not sure what you're on about, @agricolab.


agricolab commented on June 12, 2024

In my limited experience, manipulating or extending an HDF5 file usually results in bloated files; see https://support.hdfgroup.org/HDF5/doc/H5.user/Performance.html.
That's why I feel it is not as suitable for online recording of data with variable chunk sizes, and why it seemed to me that the approach yielding the most compact files is a post-hoc conversion from XDF to HDF5.
But I might just not be clever enough to understand the intricacies of setting up extendible HDF5 files.


dioptre commented on June 12, 2024

Interesting, @agricolab; I haven't come across that issue myself, as I don't remove chunks from the data I'm capturing. From what I read, though, if you do get bloat you can rewrite the file (e.g., with the h5repack utility) and it will reclaim the space. Astronomers use HDF5, so I'm sure it can capture streams at massive bitrates. I'm a fan of using standards; that way you benefit from so much extra tooling and platform support.

