timholy commented on May 30, 2024

Agreed 100%. I was basically waiting for immutables to land before getting serious about compound type support, and never got back to it.

Because I don't need this right away, I probably won't get to it immediately; feel free to tackle it, or I'll tackle it myself in a week or two.

from hdf5.jl.

timholy commented on May 30, 2024

(I have some julia/Profile.jl bugs that need fixing first.)

simonster commented on May 30, 2024

I've started on this, but I'm still thinking about the best way to handle things. The first decision is which types to store as HDF5 compound types: only immutable bits types, all immutable types, or all Julia types.

There is an undeniable appeal to storing all Julia types as compound types. Reading/writing objects that contain bits type fields would be significantly faster, since those fields wouldn't need to be references. The differences between immutables and ordinary objects would just be in the way arrays are handled (i.e., as arrays of references or arrays of values). I think we could even reconstruct missing/changed types by dynamically generating a new type based on the compound type definition. The major downside is that this would break compatibility with existing JLD files, so we'd need to either keep a method around to read them or create a converter.
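As a rough illustration of the "store all Julia types as compound types" option, here is a sketch in current Julia syntax that walks a type's fields and plans a packed compound record: bits fields go inline, everything else becomes an object reference. `CompoundMember`, `compound_layout`, and the 8-byte reference size (assumed to match HDF5's `hobj_ref_t` on typical builds) are hypothetical, not HDF5.jl or JLD API.

```julia
# Hypothetical helper type; not part of HDF5.jl or JLD.
struct CompoundMember
    name::Symbol
    inline::Bool    # true: stored by value; false: stored as an object reference
    offset::Int     # byte offset within the packed (unpadded) file record
    size::Int       # bytes occupied in the file record
end

const REF_SIZE = 8  # assumed size of an HDF5 object reference

# Plan a packed compound layout for T: bits fields inline, others as references.
function compound_layout(::Type{T}) where {T}
    members = CompoundMember[]
    offset = 0
    for i in 1:fieldcount(T)
        FT = fieldtype(T, i)
        inline = isbitstype(FT)
        sz = inline ? sizeof(FT) : REF_SIZE
        push!(members, CompoundMember(fieldname(T, i), inline, offset, sz))
        offset += sz
    end
    return members, offset  # member list and total packed record size
end

# Example type with two bits fields and one non-bits field.
struct Sample
    id::Int32
    value::Float64
    label::String
end

members, total = compound_layout(Sample)
```

Here `id` and `value` land inline at offsets 0 and 4, and `label` becomes a reference at offset 12, for a 20-byte record. On a typical 64-bit build Julia's own layout aligns `value` to offset 8, so reading such a record back means re-inserting padding.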

I've also been thinking about how to efficiently convert HDF5 compound types to Julia types. "Efficiently" ideally means that, once the compound type is read into memory, we convert the compound type to a Julia type in place and avoid additional allocations. For immutable bits types and arrays thereof, this is easy, since we just need to add padding in the right places. It might even be possible to get the HDF5 library to perform this conversion for us. For normal Julia types, where arrays are stored as references, there isn't necessarily a big advantage to in-place conversion, since we'll never be converting very much data at a time, although we would avoid an allocation for each object. For arrays of immutable types with pointers, we would need to convert HDF5 object references to pointers in-place to avoid allocating a second buffer.
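For immutable bits types, "adding padding in the right places" amounts to copying each field from its packed offset to the offset Julia uses in memory. A minimal sketch, assuming the file record stores fields back to back in declaration order with no padding; `Pair8` and `unpack_record` are hypothetical names. For clarity it copies into a fresh buffer rather than converting in place.

```julia
# Hypothetical example type: Julia typically pads 7 bytes after `tag` in memory.
struct Pair8
    tag::Int8
    value::Float64
end

# Copy each field of a packed record (fields back to back, no padding)
# to the offset Julia uses, then load the padded bytes as a T.
function unpack_record(::Type{T}, packed::AbstractVector{UInt8}) where {T}
    isbitstype(T) || throw(ArgumentError("$T is not an isbits type"))
    padded = zeros(UInt8, sizeof(T))      # buffer in Julia's padded layout
    src = 0                               # running offset in the packed record
    for i in 1:fieldcount(T)
        n = sizeof(fieldtype(T, i))
        dst = Int(fieldoffset(T, i))      # where Julia expects field i
        copyto!(padded, dst + 1, packed, src + 1, n)
        src += n
    end
    src == length(packed) || throw(ArgumentError("packed record has wrong size"))
    return GC.@preserve padded unsafe_load(Ptr{T}(pointer(padded)))
end

# A packed record built by hand: a 1-byte tag followed by 8 bytes of Float64.
packed = vcat(UInt8[0x05], reinterpret(UInt8, [2.5]))
rec = unpack_record(Pair8, packed)
```

A truly in-place version could reuse one buffer by moving fields from last to first: each field's padded offset is at least its packed offset, so working backwards never overwrites bytes that still have to be read.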

Allocating a second buffer isn't that bad, and to start with I'll probably just do that, but it limits the maximum size of an array of non-bits immutables to half of the system's available memory. It might be possible to avoid this, though, either by registering custom conversion functions with the HDF5 library (via H5Tregister) that convert objects containing object references in place, or by reading the HDF5 file directly from Julia (although I'm not sure how to make that work for chunked datasets).
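The "half of available memory" limit follows from simple accounting. As a sketch, let $N$ be the number of elements, $s_{\text{file}}$ and $s_{\text{mem}}$ the per-element sizes of the file-side and converted buffers, and $M$ the available memory. The assumption that the two sizes are equal holds roughly when each 8-byte HDF5 object reference becomes an 8-byte pointer on a 64-bit system:

$$
N\,s_{\text{file}} + N\,s_{\text{mem}} \le M
\;\Longrightarrow\;
N_{\max} = \frac{M}{s_{\text{file}} + s_{\text{mem}}} = \frac{M}{2s}
\quad \text{if } s_{\text{file}} = s_{\text{mem}} = s.
$$

The converted array then occupies at most $N_{\max}\,s = M/2$ bytes, whereas an in-place conversion would allow up to $M/s$ elements.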

timholy commented on May 30, 2024

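I haven't thought about this in ages, but how would one handle a type declaration like

```julia
mutable struct MyType   # written `type MyType` in pre-0.7 Julia
    x::Real             # abstract field type: any subtype of Real fits
end
```

and an array of them in which some `x` values are Float64 and others Int16?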

simonster commented on May 30, 2024

We'd store anything that's not a bits type as a reference in the compound type, effectively mirroring the way Julia stores object fields in memory. If we have to reconstruct the type from the compound type definition because the Julia type has changed or no longer exists, we'd just leave the reference fields untyped (i.e., typed as `Any`).
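The dynamic-reconstruction idea from earlier in the thread can be sketched in a few lines of current Julia. This assumes the stored compound definition can be reduced to (field name, type) pairs, with `nothing` standing for a field that was written as a reference; `reconstruct_type` and that description format are hypothetical, not JLD's actual mechanism.

```julia
# Hypothetical sketch: rebuild a stand-in type from a stored compound definition.
# Assumed description format: (field name, Julia type) pairs, with `nothing`
# marking a field that was stored as an object reference.
function reconstruct_type(mod::Module, name::Symbol, members::AbstractVector)
    fields = map(members) do (fname, T)
        # Reference fields are left untyped, which Julia treats as `::Any`.
        T === nothing ? fname : Expr(:(::), fname, T)
    end
    # Equivalent to evaluating `struct <name> ... end` inside `mod`.
    Core.eval(mod, Expr(:struct, false, name, Expr(:block, fields...)))
    # Look the new type up in the latest world, where its binding exists.
    return Base.invokelatest(getfield, mod, name)
end

# Stand-in for a type like `MyType`, plus one bits field for contrast.
Recon = reconstruct_type(@__MODULE__, :ReconstructedMyType, [(:id, Int32), (:x, nothing)])
a = Recon(1, 1.5)          # `x` holds a Float64
b = Recon(2, Int16(3))     # `x` holds an Int16 in another element
```

Because `x` is untyped, elements of the same array can hold a Float64 in one place and an Int16 in another, which is exactly the `x::Real` case above, at the cost of each `x` being boxed.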

timholy commented on May 30, 2024

That seems very reasonable.

Overall I think this sounds like a great plan. As much as it pains me to break JLD compatibility, the reality is that these files are not yet in heavy use, but probably will be some day (I'm only now starting to use them in practice in my own work). So now is the time for breakage, if there ever is one. Moreover, since the files carry a version number, in principle we have all the information we need. The "converter," if we need one, could even be a current snapshot of jld.jl with a different module name (e.g., JLD01).

As far as efficiency goes, there are presumably places where it matters and places where it doesn't (since I/O is expected to be somewhat limiting anyway). Your plan seems great to me. I agree that, ultimately, we could probably get the HDF5 library to insert padding etc. for us, but also that such optimizations can come second if they're nontrivial to get working.

simonster commented on May 30, 2024

Making progress: https://gist.github.com/simonster/50d282a533a76eaebbb3

Next step: reading it out.

timholy commented on May 30, 2024

Oo ooh! Very nice! I'm really excited about this.

Aside from my Vector{Vector{T}} blunder (#123), the major changes I expect you'll be making here were another motivation for the JLDArchives package: it's always nice to have an easy way to run tests and see what has broken.
